mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 15:47:23 -04:00
Even better(er) binary quantization (#117994)
This measurably improves BBQ by changing the underlying algorithm to an optimized per-vector scalar quantization. This is a brand new way to quantize vectors: instead of using a global set of upper and lower quantile bands, the bands are calculated and optimized for each individual vector. Additionally, vectors are centered on a common centroid.

This allows for an almost 32x reduction in memory and even better recall than before, at the cost of slightly increased indexing time. The new approach also generalizes easily to other bit sizes (e.g. 2 bits). While we do not take advantage of this yet, we may update our scalar quantized indices in the future to use the new algorithm, giving significant boosts in recall.

Compared with current BBQ, the recall gains range from 2% to almost 10% for certain datasets, with an additional 5-10% indexing cost when indexing with HNSW.
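To make the idea in the commit message concrete, here is a minimal sketch of 1-bit per-vector quantization. It is not the code from this commit. The class and method names (`PerVectorQuantizationSketch`, `quantize`, `dequantize`) are hypothetical, and the interval optimization is replaced by a plain 1-D two-means (Lloyd) refinement as a stand-in for whatever optimization the commit actually implements. The sketch centers a vector on a centroid, fits a lower and an upper value for that vector alone, and stores one bit per dimension.

```java
public class PerVectorQuantizationSketch {

    /** One quantized vector: packed bits plus the interval fitted to this vector only. */
    public static final class Quantized {
        public final long[] bits;
        public final float lower;
        public final float upper;
        public final int dims;

        Quantized(long[] bits, float lower, float upper, int dims) {
            this.bits = bits;
            this.lower = lower;
            this.upper = upper;
            this.dims = dims;
        }
    }

    /** Hypothetical helper: center on the centroid, fit [lower, upper], emit 1 bit per dimension. */
    public static Quantized quantize(float[] vector, float[] centroid) {
        int dims = vector.length;
        float[] centered = new float[dims];
        for (int i = 0; i < dims; i++) {
            centered[i] = vector[i] - centroid[i];
        }

        // Start from the min and max of this vector's centered components.
        float lower = Float.POSITIVE_INFINITY;
        float upper = Float.NEGATIVE_INFINITY;
        for (float v : centered) {
            lower = Math.min(lower, v);
            upper = Math.max(upper, v);
        }

        // Stand-in optimization (assumption): a few rounds of 1-D two-means,
        // which reduces the squared reconstruction error for this vector.
        for (int iter = 0; iter < 5; iter++) {
            float mid = (lower + upper) / 2f;
            double sumLow = 0, sumHigh = 0;
            int countLow = 0, countHigh = 0;
            for (float v : centered) {
                if (v > mid) {
                    sumHigh += v;
                    countHigh++;
                } else {
                    sumLow += v;
                    countLow++;
                }
            }
            if (countLow > 0) lower = (float) (sumLow / countLow);
            if (countHigh > 0) upper = (float) (sumHigh / countHigh);
        }

        // Pack one bit per dimension: 1 means "closer to upper", 0 means "closer to lower".
        long[] bits = new long[(dims + 63) / 64];
        float mid = (lower + upper) / 2f;
        for (int i = 0; i < dims; i++) {
            if (centered[i] > mid) {
                bits[i >>> 6] |= 1L << (i & 63);
            }
        }
        return new Quantized(bits, lower, upper, dims);
    }

    /** Approximate reconstruction: centroid plus the lower or upper value selected by each bit. */
    public static float[] dequantize(Quantized q, float[] centroid) {
        float[] out = new float[q.dims];
        for (int i = 0; i < q.dims; i++) {
            boolean set = ((q.bits[i >>> 6] >>> (i & 63)) & 1L) != 0;
            out[i] = centroid[i] + (set ? q.upper : q.lower);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] centroid = {2f, 2f, 2f, 2f};
        Quantized q = quantize(new float[] {1f, 3f, 1f, 3f}, centroid);
        System.out.println(Long.toBinaryString(q.bits[0]) + " [" + q.lower + ", " + q.upper + "]");
    }
}
```

In this sketch each dimension costs one bit instead of a 32-bit float, so a 1024-dimension vector shrinks from 4096 bytes to 128 bytes of bits. The per-vector `lower` and `upper` values add a few bytes on top, which is presumably why the reduction is described as almost 32x and not exactly 32x.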
parent 0586cbfb34
commit 5e859d9301
32 changed files with 3501 additions and 137 deletions
@@ -459,7 +459,9 @@ module org.elasticsearch.server {
     org.elasticsearch.index.codec.vectors.ES815HnswBitVectorsFormat,
     org.elasticsearch.index.codec.vectors.ES815BitFlatVectorFormat,
     org.elasticsearch.index.codec.vectors.es816.ES816BinaryQuantizedVectorsFormat,
-    org.elasticsearch.index.codec.vectors.es816.ES816HnswBinaryQuantizedVectorsFormat;
+    org.elasticsearch.index.codec.vectors.es816.ES816HnswBinaryQuantizedVectorsFormat,
+    org.elasticsearch.index.codec.vectors.es818.ES818BinaryQuantizedVectorsFormat,
+    org.elasticsearch.index.codec.vectors.es818.ES818HnswBinaryQuantizedVectorsFormat;
 
     provides org.apache.lucene.codecs.Codec
     with