elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-24 15:17:30 -04:00


Benjamin Trent caec612fea Make cosine similarity faster by storing magnitude and normalizing vectors (#99445)	2023-12-01 13:45:43 -05:00
src/test	Make cosine similarity faster by storing magnitude and normalizing vectors (#99445)	2023-12-01 13:45:43 -05:00
build.gradle	Remove unnecessary test logging (#101710)	2023-11-03 10:04:57 +01:00

`cosine` is our default similarity, so it should be fast out of the
box.

`dot_product` is faster than `cosine` because it doesn't need to
calculate vector magnitudes inside the similarity comparison loop.
Instead, it can assume every vector has a magnitude of `1` and use an
optimized `dot_product` calculation.
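
The reasoning above can be written out as a short derivation. The symbols $a$, $b$, $\hat{a}$ and $\hat{b}$ are notation introduced here for illustration; they do not appear in the commit itself.

```latex
\[
\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
\]
% If both vectors already have magnitude 1, the denominator is 1:
\[
\lVert a \rVert = \lVert b \rVert = 1
\quad\Longrightarrow\quad
\cos(a, b) = a \cdot b
\]
% For vectors of any non-zero magnitude, normalizing first gives the same value:
\[
\hat{a} = \frac{a}{\lVert a \rVert}, \qquad
\hat{b} = \frac{b}{\lVert b \rVert}
\quad\Longrightarrow\quad
\cos(a, b) = \hat{a} \cdot \hat{b}
\]
```

Since normalization can be done once per vector, the per-comparison cost drops to a single dot product.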

However, `cosine` as it exists today accepts vectors of any magnitude,
so it cannot take advantage of this optimization.

This commit addresses this by:

 - Normalizing all vectors passed when indexing via `cosine`
 - Storing the calculated magnitude in an additional field (only if it is `!= 1`)
 - Using the `dot_product` Lucene calculation
 - Normalizing query vectors when used against these new `cosine` fields
 - De-normalizing vectors when accessed via scripts
 - Allowing scripts to access these stored magnitudes
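
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the Elasticsearch or Lucene implementation: the function names (`index_vector`, `cosine_query`, `denormalize`), the `eps` tolerance used to decide whether a magnitude counts as `1`, and the rejection of zero-magnitude vectors are all assumptions made for this example.

```python
import math


def magnitude(v):
    """Euclidean (L2) magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v))


def dot(a, b):
    """Plain dot product; the only work done per comparison."""
    return sum(x * y for x, y in zip(a, b))


def index_vector(v, eps=1e-6):
    """Normalize v at index time.

    Returns (unit_vector, stored_magnitude). The magnitude is kept only
    when it differs from 1; `eps` is a hypothetical tolerance.
    """
    mag = magnitude(v)
    if mag == 0.0:
        # Assumption of this sketch: cosine is undefined for zero vectors.
        raise ValueError("cosine is undefined for zero-magnitude vectors")
    unit = [x / mag for x in v]
    stored = None if abs(mag - 1.0) <= eps else mag
    return unit, stored


def cosine_query(query, indexed_unit):
    """Normalize the query, then score with a plain dot product."""
    q_unit, _ = index_vector(query)
    return dot(q_unit, indexed_unit)


def denormalize(unit, stored_magnitude):
    """Recover the original vector, as a script accessing it would expect."""
    if stored_magnitude is None:
        return list(unit)
    return [x * stored_magnitude for x in unit]
```

For example, `index_vector([3.0, 4.0])` yields a unit vector close to `[0.6, 0.8]` together with a stored magnitude of `5.0`, and `denormalize` on that pair gives back a vector close to the original `[3.0, 4.0]`.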
