elasticsearch/qa/rolling-upgrade-legacy
Benjamin Trent caec612fea
Make cosine similarity faster by storing magnitude and normalizing vectors (#99445)
`cosine` is our default similarity and should be fast out of the box.

`dot_product` is faster than `cosine` because it doesn't need to calculate
vector magnitudes inside the similarity comparison loop. Instead, it can
assume every vector has a magnitude of `1` and use an optimized
`dot_product` calculation.
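
As a minimal illustration of why this works (not the Lucene implementation), the sketch below checks that cosine similarity of two arbitrary non-zero vectors equals the plain dot product of their unit-length versions. The helper names `dot`, `magnitude`, `cosine`, and `normalize` are hypothetical and exist only for this example.

```python
import math

def dot(a, b):
    # Plain dot product: one multiply-add per dimension.
    return sum(x * y for x, y in zip(a, b))

def magnitude(v):
    # Euclidean norm of the vector.
    return math.sqrt(dot(v, v))

def cosine(a, b):
    # Full cosine: needs both magnitudes on every comparison.
    return dot(a, b) / (magnitude(a) * magnitude(b))

def normalize(v):
    # Scale to unit length; assumes v is non-zero.
    m = magnitude(v)
    return [x / m for x in v]

a = [3.0, 4.0]
b = [1.0, 2.0]

full = cosine(a, b)                      # magnitudes computed at compare time
fast = dot(normalize(a), normalize(b))   # magnitudes paid once, up front
```

The two values agree up to floating-point rounding, so the per-comparison cost of computing magnitudes can be moved out of the loop.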

However, `cosine` as it exists today accepts vectors of any magnitude
and cannot take advantage of this.

This commit addresses this by:

 - Normalizing all vectors indexed into `cosine` fields
 - Storing the calculated magnitude in an additional field (only if it is `!= 1`)
 - Using the `dot_product` Lucene calculation
 - Normalizing query vectors when used against these new `cosine` fields
 - De-normalizing vectors when accessed via scripts
 - Allowing scripts to access these stored magnitudes
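
The steps above can be sketched roughly as follows. This is an illustrative Python model, not the Elasticsearch or Lucene code: the function names (`index_vector`, `score`, `script_value`) and the tuple-based storage are assumptions made for the example, and rejecting zero vectors is likewise an assumption of the sketch.

```python
import math

def index_vector(v):
    """Normalize at index time; keep the magnitude only when it is not 1."""
    mag = math.sqrt(sum(x * x for x in v))
    if mag == 0.0:
        raise ValueError("cannot normalize a zero vector")
    unit = [x / mag for x in v]
    stored_mag = mag if mag != 1.0 else None  # extra field only if != 1
    return unit, stored_mag

def score(query, indexed_unit):
    """Normalize the query, then use a plain dot product against the unit vector."""
    q_mag = math.sqrt(sum(x * x for x in query))
    return sum((q / q_mag) * d for q, d in zip(query, indexed_unit))

def script_value(indexed_unit, stored_mag):
    """De-normalize for script access so the original vector is returned."""
    mag = 1.0 if stored_mag is None else stored_mag
    return [x * mag for x in indexed_unit]

unit, mag = index_vector([3.0, 4.0])       # magnitude 5 is stored
unit1, mag1 = index_vector([1.0, 0.0])     # already unit length: nothing stored
s = score([1.0, 2.0], unit)
original = script_value(unit, mag)
```

Storing the magnitude only when it differs from `1` means indices that already hold unit vectors pay no extra storage, while scripts can still recover the original values for everything else.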
2023-12-01 13:45:43 -05:00
src/test Make cosine similarity faster by storing magnitude and normalizing vectors (#99445) 2023-12-01 13:45:43 -05:00
build.gradle Remove unnecessary test logging (#101710) 2023-11-03 10:04:57 +01:00