[[tune-knn-search]]
== Tune approximate kNN search

{es} supports <<approximate-knn, approximate k-nearest neighbor search>> for
efficiently finding the _k_ nearest vectors to a query vector. Since
approximate kNN search works differently from other queries, there are special
considerations around its performance.

Many of these recommendations help improve search speed. With approximate kNN,
the indexing algorithm runs searches under the hood to create the vector index
structures. So these same recommendations also help with indexing speed.

[discrete]
=== Reduce vector memory footprint

The default <<dense-vector-element-type,`element_type`>> is `float`. But this
can be automatically quantized during index time through
<<dense-vector-quantization,`quantization`>>. Quantization reduces the required
memory by 4x, 8x, or as much as 32x, but it also reduces the precision of the
vectors and increases disk usage for the field (by up to 25%, 12.5%, or
3.125%, respectively). The increased disk usage is a result of {es} storing
both the quantized and the unquantized vectors. For example, when int8
quantizing 40GB of floating point vectors, an extra 10GB of data is stored for
the quantized vectors. The total disk usage amounts to 50GB, but the memory
usage for fast search is reduced to 10GB.

For `float` vectors with `dim` greater than or equal to `384`, using a
<<dense-vector-quantization,`quantized`>> index is highly recommended.
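
For example, this sketch of a mapping enables `int8` quantization through the
`index_options` of a `dense_vector` field. The index name, field name, and
`dims` value are hypothetical and depend on your embedding model:

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 768,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
----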

[discrete]
=== Reduce vector dimensionality

The speed of kNN search scales linearly with the number of vector dimensions,
because each similarity computation considers each element in the two vectors.
Whenever possible, it's better to use vectors with a lower dimension. Some
embedding models come in different "sizes", with both lower and higher
dimensional options available. You could also experiment with dimensionality
reduction techniques like PCA. When experimenting with different approaches,
it's important to measure the impact on relevance to ensure the search
quality is still acceptable.

[discrete]
=== Exclude vector fields from `_source`

{es} stores the original JSON document that was passed at index time in the
<<mapping-source-field, `_source` field>>. By default, each hit in the search
results contains the full document `_source`. When the documents contain
high-dimensional `dense_vector` fields, the `_source` can be quite large and
expensive to load. This can significantly slow down kNN search.

NOTE: <<docs-reindex, reindex>>, <<docs-update, update>>,
and <<docs-update-by-query, update by query>> operations generally
require the `_source` field. Disabling `_source` for a field might result in
unexpected behavior for these operations. For example, the reindexed
destination index might not actually contain the `dense_vector` field.

You can disable storing `dense_vector` fields in the `_source` through the
<<include-exclude, `excludes`>> mapping parameter. This prevents loading and
returning large vectors during search, and also cuts down on the index size.
Vectors that have been omitted from `_source` can still be used in kNN search,
since it relies on separate data structures to perform the search. Before
using the <<include-exclude, `excludes`>> parameter, make sure to review the
downsides of omitting fields from `_source`.
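
A minimal sketch of such a mapping (index and field names are hypothetical):

[source,console]
----
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "my_vector"
      ]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 768
      }
    }
  }
}
----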

Another option is to use <<synthetic-source,synthetic `_source`>>.

[discrete]
=== Ensure data nodes have enough memory

{es} uses the https://arxiv.org/abs/1603.09320[HNSW] algorithm for approximate
kNN search. HNSW is a graph-based algorithm which only works efficiently when
most vector data is held in memory. You should ensure that data nodes have at
least enough RAM to hold the vector data and index structures. To check the
size of the vector data, you can use the <<indices-disk-usage>> API.
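
For example, this request reports the on-disk size of each field in a
hypothetical `my-index`, including its `dense_vector` fields:

[source,console]
----
POST my-index/_disk_usage?run_expensive_tasks=true
----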

Here are estimates for different element types and quantization levels:

* `element_type: float`: `num_vectors * num_dimensions * 4`
* `element_type: float` with `quantization: int8`: `num_vectors * (num_dimensions + 4)`
* `element_type: float` with `quantization: int4`: `num_vectors * (num_dimensions/2 + 4)`
* `element_type: float` with `quantization: bbq`: `num_vectors * (num_dimensions/8 + 12)`
* `element_type: byte`: `num_vectors * num_dimensions`
* `element_type: bit`: `num_vectors * (num_dimensions/8)`

If utilizing HNSW, the graph must also be in memory. To estimate the required
bytes, use `num_vectors * 4 * HNSW.m`. The default value for `HNSW.m` is 16,
so by default `num_vectors * 4 * 16`.
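
As an illustrative estimate, 1 million 768-dimensional `float` vectors
quantized to `int8` need about `1,000,000 * (768 + 4) ≈ 772MB` for the
quantized vectors, plus `1,000,000 * 4 * 16 = 64MB` for the HNSW graph with
the default `m`, or roughly 836MB of filesystem cache in total.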

Note that the required RAM is for the filesystem cache, which is separate from
the Java heap.

The data nodes should also leave a buffer for other ways that RAM is needed.
For example, your index might also include text fields and numerics, which
also benefit from using the filesystem cache. It's recommended to run
benchmarks with your specific dataset to ensure there's a sufficient amount of
memory to give good search performance. You can find
https://elasticsearch-benchmarks.elastic.co/#tracks/so_vector[here] and
https://elasticsearch-benchmarks.elastic.co/#tracks/dense_vector[here] some
examples of datasets and configurations that we use for our nightly
benchmarks.

[discrete]
[[dense-vector-preloading]]
include::search-speed.asciidoc[tag=warm-fs-cache]

The following file extensions are used for approximate kNN search.
Each extension is broken down by quantization type:

* `vex` for the HNSW graph
* `vec` for all non-quantized vector values. This includes all element types: `float`, `byte`, and `bit`.
* `veq` for quantized vectors indexed with <<dense-vector-quantization,`quantization`>>: `int4` or `int8`
* `veb` for binary vectors indexed with <<dense-vector-quantization,`quantization`>>: `bbq`
* `vem`, `vemf`, `vemq`, and `vemb` for metadata, usually small and not a concern for preloading

Generally, if you are using a quantized index, you should only preload the
relevant quantized values and the HNSW graph. Preloading the raw vectors is
not necessary and might be counterproductive.
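
For example, this sketch of a create-index request preloads only the
`int8`-quantized vectors (`veq`) and the HNSW graph (`vex`). The index name is
hypothetical, and `index.store.preload` is a static setting that must be set
at index creation:

[source,console]
----
PUT my-index
{
  "settings": {
    "index.store.preload": ["veq", "vex"]
  }
}
----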

[discrete]
=== Reduce the number of index segments

{es} shards are composed of segments, which are internal storage elements in
the index. For approximate kNN search, {es} stores the vector values of
each segment as a separate HNSW graph, so kNN search must check each segment.
The recent parallelization of kNN search made it much faster to search across
multiple segments, but kNN search can still be up to several times faster if
there are fewer segments. By default, {es} periodically merges smaller
segments into larger ones through a background
<<index-modules-merge, merge process>>. If this isn't sufficient, you can take
explicit steps to reduce the number of index segments.

[discrete]
==== Increase maximum segment size

{es} provides many tunable settings for controlling the merge process. One
important setting is `index.merge.policy.max_merged_segment`. This controls
the maximum size of the segments that are created during the merge process.
By increasing the value, you can reduce the number of segments in the index.
The default value is `5GB`, but that might be too small for larger
dimensional vectors. Increasing this value to `10GB` or `20GB` can help
reduce the number of segments.
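
A minimal sketch of updating the setting on an existing index (the index name
and the `10gb` value are illustrative):

[source,console]
----
PUT my-index/_settings
{
  "index.merge.policy.max_merged_segment": "10gb"
}
----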

[discrete]
==== Create large segments during bulk indexing

A common pattern is to first perform an initial bulk upload, then make an
index available for searches. Instead of force merging, you can adjust the
index settings to encourage {es} to create larger initial segments:

* Ensure there are no searches during the bulk upload and disable
<<index-refresh-interval-setting,`index.refresh_interval`>> by setting it to
`-1`. This prevents refresh operations and avoids creating extra segments (see
the sketch after this list).
* Give {es} a large indexing buffer so it can accept more documents before
flushing. By default, the <<indexing-buffer,`indices.memory.index_buffer_size`>>
is set to 10% of the heap size. With a substantial heap size like 32GB, this
is often enough. To allow the full indexing buffer to be used, you should also
increase the <<index-modules-translog,`index.translog.flush_threshold_size`>>
limit.
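
A sketch of these index-level settings for the bulk upload, with illustrative
values (`indices.memory.index_buffer_size` is a node-level setting configured
in `elasticsearch.yml`, so it doesn't appear here):

[source,console]
----
PUT my-index/_settings
{
  "index.refresh_interval": "-1",
  "index.translog.flush_threshold_size": "2gb"
}
----

Remember to restore <<index-refresh-interval-setting,`index.refresh_interval`>>
once the bulk upload completes so new documents become searchable.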

[discrete]
=== Avoid heavy indexing during searches

Actively indexing documents can have a negative impact on approximate kNN
search performance, since indexing threads steal compute resources from
search. When indexing and searching at the same time, {es} also refreshes
frequently, which creates several small segments. This also hurts search
performance, since approximate kNN search is slower when there are more
segments.

When possible, it's best to avoid heavy indexing during approximate kNN
search. If you need to reindex all the data, perhaps because the vector
embedding model changed, then it's better to reindex the new documents into a
separate index rather than update them in-place. This helps avoid the slowdown
mentioned above, and prevents expensive merge operations due to frequent
document updates.
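
For example, here is a sketch of reindexing into a fresh index after an
embedding model change. The index and pipeline names are hypothetical; the
destination index would be created with the new vector mapping first, and the
ingest pipeline would compute the new embeddings:

[source,console]
----
POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-v2",
    "pipeline": "my-embedding-pipeline"
  }
}
----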

[discrete]
include::search-speed.asciidoc[tag=readahead]