mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 15:47:23 -04:00
This adds a new quantization mechanism for HNSW and flat indices: `int4`
quantization, exposed through the `int4_hnsw` and `int4_flat` index
types. This quantization further reduces the memory required for fast
HNSW search, to 8x less than is needed with regular float32 values.
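As a rough illustration of how the new index types might be selected, the sketch below builds a `dense_vector` mapping body as a Python dictionary. The field name `my_vector`, the choice of cosine similarity, and the helper `int4_mapping` are assumptions made for the example and are not taken from this change.

```python
import json


def int4_mapping(field: str, dims: int, index_type: str = "int4_hnsw") -> dict:
    """Build a mapping body for a dense_vector field using int4 quantization.

    `field` and the cosine similarity are illustrative choices, not part of
    the original change description.
    """
    if index_type not in ("int4_hnsw", "int4_flat"):
        raise ValueError(f"unsupported index type: {index_type}")
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                    # Selects the quantized index type described above.
                    "index_options": {"type": index_type},
                }
            }
        }
    }


mapping = int4_mapping("my_vector", 1024)
print(json.dumps(mapping, indent=2))
```

The printed JSON could be sent as the body of an index-creation request; passing `index_type="int4_flat"` selects the flat variant instead.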
An 8x reduction means that 1M 1024-dimension vectors go from requiring
3.8GB to 477MB.
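The arithmetic behind these figures can be checked with a short sketch. It counts raw vector storage only and assumes 32 bits per dimension for float32 and 4 bits per dimension for int4; any per-vector correction values and the HNSW graph itself are ignored. The helper name `vector_bytes` is hypothetical.

```python
def vector_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    """Raw storage for the vectors alone, in bytes."""
    return num_vectors * dims * bits_per_dim // 8


GIB = 1024 ** 3

float32_bytes = vector_bytes(1_000_000, 1024, 32)
int4_bytes = vector_bytes(1_000_000, 1024, 4)

print(f"float32: {float32_bytes} bytes = {float32_bytes / GIB:.2f} GiB")
print(f"int4:    {int4_bytes} bytes = {int4_bytes / GIB:.3f} GiB")
print(f"ratio:   {float32_bytes // int4_bytes}x")
```

Under these assumptions the float32 vectors take about 3.81 GiB and the int4 vectors about 0.477 GiB, which matches the quoted 3.8GB and 477MB if the smaller figure is read as thousandths of a GiB.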
Recall stays largely steady. There is some reduction, but it is
recoverable by slightly oversampling and reranking. For example, over
500k CohereV3 vectors, only 5 extra vectors need to be gathered
to achieve over 0.98 recall in a brute-force scenario.
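The oversample-and-rerank idea can be sketched in plain Python. This is not the implementation in this change: it assumes a simple min/max scalar quantizer to 16 levels, dot-product scoring, and random data, and every function name here is hypothetical.

```python
import random


def quantize_int4(vec, lo, scale):
    """Map each float to an integer level in 0..15 (simple min/max quantizer)."""
    return [min(15, max(0, round((x - lo) / scale))) for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def approx_dot(qa, qb, lo, scale):
    """Approximate float dot product from int4 levels, where x ~ lo + scale * q."""
    return (scale * scale * dot(qa, qb)
            + lo * scale * (sum(qa) + sum(qb))
            + len(qa) * lo * lo)


def search(query, vectors, quantized, lo, scale, k, extra):
    """Brute-force search: gather k + extra candidates by quantized score,
    then rerank those candidates with the original float vectors."""
    q = quantize_int4(query, lo, scale)
    by_quantized = sorted(range(len(vectors)),
                          key=lambda i: approx_dot(q, quantized[i], lo, scale),
                          reverse=True)
    candidates = by_quantized[:k + extra]
    reranked = sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)
    return reranked[:k]


random.seed(0)
dims, n, k = 32, 200, 10
vectors = [[random.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
query = [random.uniform(-1, 1) for _ in range(dims)]

lo, hi = -1.0, 1.0
scale = (hi - lo) / 15
quantized = [quantize_int4(v, lo, scale) for v in vectors]

exact = sorted(range(n), key=lambda i: dot(query, vectors[i]), reverse=True)[:k]
for extra in (0, 5):
    found = search(query, vectors, quantized, lo, scale, k, extra)
    recall = len(set(found) & set(exact)) / k
    print(f"extra={extra} recall={recall:.2f}")
```

Gathering a few extra candidates gives the float rerank a chance to recover neighbors that quantization error pushed just outside the top `k`. The recall printed for random data is only illustrative and says nothing about the CohereV3 result quoted above.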
[Figure: recall]
| Name |
|---|
| dynamic |
| fields |
| params |
| types |
| dynamic-mapping.asciidoc |
| explicit-mapping.asciidoc |
| fields.asciidoc |
| mapping-settings-limit.asciidoc |
| params.asciidoc |
| removal_of_types.asciidoc |
| runtime.asciidoc |
| types.asciidoc |