Commit graph

1 commit

Author SHA1 Message Date
Dario Gieselaar
7d20301289
Load huggingface content datasets (#224543)
Implements a huggingface dataset loader for RAG evals - see
[x-pack/platform/packages/shared/kbn-ai-tools-cli/src/hf_dataset_loader/README.md](https://github.com/dgieselaar/kibana/blob/hf-dataset-loader/x-pack/platform/packages/shared/kbn-ai-tools-cli/src/hf_dataset_loader/README.md).
Additionally, a `@kbn/cache-cli` tool was added that allows tooling
authors to cache to disk (possibly remote storage later).

Used o3 for finding datasets on HuggingFace and doing an initial pass on
a line-by-line dataset processor ([see
conversation](https://chatgpt.com/share/6853e49a-e870-8000-9c65-f7a5a3a72af0))

Libraries added:

- `cache-manager`, `cache-manager-fs-hash`, `keyv`,
`@types/cache-manager-fs-hash`: caching libraries and plugins. could not
find any existing caching libraries in the repo.
- `@huggingface/hub`: api client for HF.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-06-26 17:24:45 +02:00