Commit graph

14 commits

Author SHA1 Message Date
Benjamin Trent
d33a03ce6b
Add support for bitwise inner-product in painless (#116082)
This adds bitwise inner product to painless. 

The idea here is:

 - For two bit arrays, which we recognize as byte arrays whose length matches `dense_vector.dim/8`, we simply return the bitwise `&`.
 - For a stored bit array (remember, with `dense_vector.dim/8` bytes), we sum up the provided byte or float array using the bit array as a mask.
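
As an illustrative sketch only (Python, not the Painless or Java code from this change), the two cases above might look like the following. It assumes that the bit-to-bit case counts the bits set in both operands, and that the first dimension maps to the most significant bit of the first byte; the function names are hypothetical.

```python
from typing import Sequence


def bit_inner_product(stored: bytes, query: bytes) -> int:
    """Both operands are packed bit vectors: count the bits set in both."""
    if len(stored) != len(query):
        raise ValueError("packed vectors must have the same number of bytes")
    return sum(bin(s & q).count("1") for s, q in zip(stored, query))


def masked_inner_product(stored: bytes, query: Sequence[float]) -> float:
    """Query has one value per bit: sum the values where the stored bit is 1."""
    if len(query) != len(stored) * 8:
        raise ValueError("query must have exactly one value per stored bit")
    total = 0.0
    for i, value in enumerate(query):
        # Assumption: dimension 0 is the most significant bit of byte 0.
        if (stored[i // 8] >> (7 - i % 8)) & 1:
            total += value
    return total
```

With a stored byte `0b10100000`, a packed query of `0b11000000` shares one set bit, while a float query contributes only its first and third values to the sum.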

This is effectively supporting asymmetric quantization. A prime
example of how this works is:
https://github.com/cohere-ai/BinaryVectorDB

Basically, you do your initial search against the binary space and then
rerank with a differently quantized vector allowing for more information
without additional storage space. 

closes:  https://github.com/elastic/elasticsearch/issues/111232
2024-11-06 09:22:04 +11:00
Benjamin Trent
5add44d7d1
Adds new bit element_type for dense_vectors (#110059)
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization-based codec works with this element type, which is
consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexadecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.
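
A minimal sketch of that encoding, written in Python for illustration rather than taken from the change itself. The helper names are hypothetical, and the choice of most-significant-bit-first ordering within each byte is an assumption.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, 8 bits per byte."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be divisible by 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            # Assumption: the first bit of each group is the most significant.
            byte = (byte << 1) | (1 if bit else 0)
        packed.append(byte)
    return bytes(packed)


def to_signed(packed):
    """Reinterpret each byte as a signed 8-bit value, as a Java byte[] would."""
    return [b - 256 if b > 127 else b for b in packed]


bits = [1, 0, 1, 0, 0, 0, 0, 0] + [1] * 8   # a 16-dimensional bit vector
packed = pack_bits(bits)
print(packed.hex())        # a0ff
print(to_signed(packed))   # [-96, -1]
```

Either form carries the same 16 bits: the hexadecimal string is two characters per byte, and the array has `dim/8` elements.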

`bit` vectors support script usage and regular query usage. When
indexed, all comparisons done are `xor` and `popcount` summations (aka,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` to be
the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.
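
The relationship between these functions for bit vectors can be sketched as follows. This is illustrative Python over packed bytes, not the Painless implementation, and the `_bits` function names are hypothetical.

```python
import math


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits: popcount of the xor of each byte pair."""
    if len(a) != len(b):
        raise ValueError("packed vectors must have the same number of bytes")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def l1norm_bits(a: bytes, b: bytes) -> int:
    """For bit vectors, |a_i - b_i| is 1 exactly where the bits differ."""
    return hamming(a, b)


def l2norm_bits(a: bytes, b: bytes) -> float:
    """Squared differences of bits are also 0 or 1, so l2norm = sqrt(l1norm)."""
    return math.sqrt(l1norm_bits(a, b))
```

For example, `0b11110000` and `0b00000000` differ in four bits, so the hamming distance and `l1norm` are 4 and `l2norm` is 2.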

Note, the dimensions expected by this element_type are always to be
divisible by `8`, and the `byte[]` vectors provided for indexing must
have size `dim/8`, where each byte element represents `8` bits of
the vectors.
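
A hedged sketch of that size rule as a client-side check, in Python. The function name is hypothetical, and the sketch assumes unsigned byte values (0 to 255) when a list is passed.

```python
def as_packed_bytes(value, dims):
    """Return the packed bytes for a bit vector, checking the dim/8 size rule."""
    if dims <= 0 or dims % 8 != 0:
        raise ValueError("dims must be a positive multiple of 8")
    # Accept either a hexadecimal string or an iterable of unsigned byte values.
    packed = bytes.fromhex(value) if isinstance(value, str) else bytes(value)
    if len(packed) != dims // 8:
        raise ValueError(
            f"expected {dims // 8} bytes for {dims} dims, got {len(packed)}"
        )
    return packed
```

Calling `as_packed_bytes("a0ff", 16)` returns two bytes, while a mismatched size or a `dims` value such as 12 raises `ValueError`.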

closes: https://github.com/elastic/elasticsearch/issues/48322
2024-06-27 04:48:41 +10:00
Benjamin Trent
acc99302c6
Adding hamming distance function to painless for dense_vector fields (#109359)
This adds `hamming` distance, the pop-count of the `xor` of two byte
vectors, as a first-class citizen in Painless.

For byte vectors, this means that we can compute hamming distances via
script_score (aka, brute-force).
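
A request body for such a brute-force query might be assembled as below. This is a sketch in Python: the field name `my_byte_vector`, the parameter name `query_vector`, and the helper function are placeholders, and because `hamming` returns a distance, a real script would probably transform it so that closer vectors score higher.

```python
import json


def hamming_script_score(field, query_vector):
    """Build a script_score search body that calls hamming() on every match."""
    return {
        "query": {
            "script_score": {
                # match_all means every document is scored (brute force).
                "query": {"match_all": {}},
                "script": {
                    "source": f"hamming(params.query_vector, '{field}')",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }


body = hamming_script_score("my_byte_vector", [4, 100, -13])
print(json.dumps(body, indent=2))
```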

The implementation of `hamming` is the same that is available in Lucene,
and when Lucene 9.11 is merged, we should update our logic where
applicable to utilize it.

NOTE: this does not yet add hamming distance as a metric for indexed
vectors. This will be a future PR after the Lucene 9.11 upgrade.
2024-06-18 03:41:20 +10:00
Abdon Pijpelink
7d01d768c2
[DOCS] Warn about calling vector functions repeatedly (#91864)
* [DOCS] Add script score vector function clarification

* [DOCS] Warn about calling vector functions repeatedly
2022-12-12 09:43:46 +01:00
James Rodewig
f56a0f4b66
[DOCS] Remove testenv annotations from doc snippet tests (#80023)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
2021-11-05 18:38:50 -04:00
Mayya Sharipova
853e68dfdf
Add access to dense_vector values (#71313)
Allow direct access to a dense_vector's values in script
through the following functions:

- getVectorValue – returns a vector's value as an array of floats
- getMagnitude – returns a vector's magnitude
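
The commit message does not define the magnitude. Assuming it is the usual Euclidean length of the vector, the value can be sketched in Python as follows (the function name is hypothetical):

```python
import math


def magnitude(vector):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(v * v for v in vector))


print(magnitude([3.0, 4.0]))  # 5.0
```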

Closes #51964
2021-04-19 08:02:05 -04:00
James Rodewig
441c3a21b1
[DOCS] Update my-index examples (#60132)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 14:46:39 -04:00
Julie Tibshirani
dd6f0a35e4
Remove the 'experimental' marking from vector fields. (#49120)
We wrapped up the API changes we wanted to make, and vector fields can now be
considered GA.
2019-11-18 11:57:18 -08:00
Julie Tibshirani
460d545921
Remove support for sparse vectors. (#48781)
Follow up to #48368. This PR removes support for `sparse_vector` on new
indices. On 7.x indices a `sparse_vector` can still be defined, but it is not
possible to index or search on it.
2019-11-14 16:54:48 -05:00
Julie Tibshirani
f863dd12b4
Update the signature of vector script functions. (#48604)
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

```
Before: cosineSimilarity(params.query_vector, doc['field'])
After:  cosineSimilarity(params.query_vector, 'field')
```

This seems more intuitive, since we don't allow direct access to vector doc
values and the meaning of `doc['field']` is unclear.

The PR makes the following changes (broken into distinct commits):
* Add new function signatures of the form `function(params.query_vector,
'field')` and deprecate the old ones. Because Painless doesn't allow two
methods with the same name and number of arguments, we allow a generic `Object`
to be passed in to the function and decide on the behavior through an
`instanceof` check.
* Refactor the class bindings so that the document field is passed to the
constructor instead of the instance method. This allows us to avoid retrieving
the vector doc values on every function invocation, which gives a tiny speed-up
in benchmarks.
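
The two changes above can be loosely illustrated in Python. This is an analogy, not the actual Painless class binding: `DocValuesRef`, `CosineSimilarityBinding`, and the dictionary of fields are all hypothetical stand-ins. The constructor accepts either argument form through a runtime type check and resolves the document vector once, so repeated calls do not look it up again.

```python
import math


class DocValuesRef:
    """Hypothetical stand-in for the old doc['field'] style argument."""

    def __init__(self, field_name):
        self.field_name = field_name


class CosineSimilarityBinding:
    def __init__(self, fields, field):
        # One generic parameter, with the behavior chosen by a type check.
        if isinstance(field, str):
            name = field                      # new form: field name
        elif isinstance(field, DocValuesRef):
            name = field.field_name           # old, deprecated form
        else:
            raise TypeError("field must be a name or a doc values reference")
        # Resolved once here instead of on every invocation.
        self._vector = fields[name]
        self._norm = math.sqrt(sum(v * v for v in self._vector))

    def cosine_similarity(self, query_vector):
        dot = sum(q * d for q, d in zip(query_vector, self._vector))
        query_norm = math.sqrt(sum(q * q for q in query_vector))
        return dot / (query_norm * self._norm)


fields = {"field": [1.0, 0.0]}
new_style = CosineSimilarityBinding(fields, "field")
old_style = CosineSimilarityBinding(fields, DocValuesRef("field"))
```

Both objects give the same result for the same query vector; only the way the field is identified differs.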

Note that this PR adds new signatures for the sparse vector functions too, even
though sparse vectors are deprecated. It seemed simplest to understand (for both
us and users) to keep everything symmetric between dense and sparse vectors.
2019-10-29 13:26:36 -07:00
Julie Tibshirani
5478fff640
Deprecate the sparse_vector field type. (#48315)
We have not seen much adoption of this experimental field type, and don't see a
clear use case as it's currently designed. This PR deprecates the field type in
7.x. It will be removed from 8.0 in a follow-up PR.
2019-10-22 18:06:50 -07:00
James Rodewig
f5827ba0ae
[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159)
2019-09-04 12:51:02 -04:00
Mayya Sharipova
de8b9f3039
Add filters in examples of vector functions (#45327)
2019-08-08 09:38:05 -04:00
Mayya Sharipova
16747f811f
Add l1norm and l2norm distances for vectors (#44116)
* Add l1norm and l2norm distances for vectors

Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947

* Address Christoph's feedback

- organize vector functions as a separate doc
- increase precision in tests calculations
- add a separate test when sparse doc dims
are bigger and less than query vector dims

* Made examples more realistic
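
For reference, the two distances named in this commit can be sketched in Python as follows. This illustrates the standard Manhattan and Euclidean definitions for dense vectors of equal length; it is not the code added by the commit.

```python
import math


def l1norm(query, doc):
    """Manhattan distance: sum of absolute component differences."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(abs(q - d) for q, d in zip(query, doc))


def l2norm(query, doc):
    """Euclidean distance: square root of the sum of squared differences."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))
```

For the vectors `[1.0, 2.0]` and `[4.0, 6.0]`, the differences are 3 and 4, so `l1norm` gives 7 and `l2norm` gives 5.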
2019-07-11 14:14:23 -04:00