kibana/src/platform/packages/shared/kbn-cache-cli/README.md
Dario Gieselaar 7d20301289
Load huggingface content datasets (#224543)
Implements a huggingface dataset loader for RAG evals - see
[x-pack/platform/packages/shared/kbn-ai-tools-cli/src/hf_dataset_loader/README.md](https://github.com/dgieselaar/kibana/blob/hf-dataset-loader/x-pack/platform/packages/shared/kbn-ai-tools-cli/src/hf_dataset_loader/README.md).
Additionally, a `@kbn/cache-cli` tool was added that allows tooling
authors to cache to disk (possibly remote storage later).

Used o3 for finding datasets on HuggingFace and doing an initial pass on
a line-by-line dataset processor ([see
conversation](https://chatgpt.com/share/6853e49a-e870-8000-9c65-f7a5a3a72af0))

Libraries added:

- `cache-manager`, `cache-manager-fs-hash`, `keyv`,
`@types/cache-manager-fs-hash`: caching libraries and plugins. could not
find any existing caching libraries in the repo.
- `@huggingface/hub`: api client for HF.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-06-26 17:24:45 +02:00

2.3 KiB
Raw Blame History

@kbn/cache-cli

Centralised caching helpers for scripts and CLIs in the Kibana repo.

The goal is to make it easy for engineers to cache computationally or I/O expensive operations on disk, or in the future, possible remote.


Quick start

import { fromCache, createLocalDirDiskCacheStore } from '@kbn/cache-cli';
import { createCache } from 'cache-manager';

const DOC_CACHE = createCache({
  stores: [createLocalDirDiskCacheStore({ dir: 'my_docs', ttl: 60 * 60 /* 1h */ })],
});

const docs = await fromCache('docs', DOC_CACHE, async () => fetchDocs());

fromCache(key, cache, cb, validator?) semantics:

  1. Tries cache.get(key) (skipped when process.env.DISABLE_KBN_CACHE is truthy).
  2. Runs the optional validator(cached) return false to force a refresh.
  3. Calls cb() if the cache miss / invalid.
  4. Persists the fresh value via cache.set(key, value) and returns it.

Available cache stores

@kbn/cache-cli wraps cache-manager so any Keyv compatible store works. The helpers below ship out-of-the-box:

Helper Backing store Typical use-case
createLocalDirDiskCacheStore({ dir, ttl? }) cache-manager-fs-hash on <REPO_ROOT>/data/{dir} Persist in ./data with an unknown ttl
createTmpDirDiskCacheStore({ dir, ttl? }) cache-manager-fs-hash on <OS_TMP_DIR>/{dir} Persist in os tmp dir which might be cleared over restarts

Cache invalidation strategies

  1. Manual bypass set DISABLE_KBN_CACHE=true to force fresh data (useful in CI workflows).
  2. Time-to-live (TTL) pass ttl when creating a store to let the backend expire entries automatically.
  3. Programmatic validation supply the cacheValidator callback to fromCache(); it receives the cached value and should return true when it is still valid.
  4. Clear on disk delete the relevant directory under data/ if you need a hard reset.

Choose whichever fits your script. They can be combined (e.g. a TTL plus a validator).