elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-29 01:44:36 -04:00

Author	SHA1	Message	Date
Benjamin Trent	5add44d7d1	Adds new `bit` element_type for dense_vectors (#110059) This commit adds `bit` vector support by adding `element_type: bit` for vectors. This new element type works for indexed and non-indexed vectors. Additionally, it works with `hnsw` and `flat` index types. No quantization-based codec works with this element type; this is consistent with `byte` vectors. `bit` vectors accept up to `32768` dimensions in size and expect vectors that are being indexed to be encoded either as a hexadecimal string or a `byte[]` array where each element of the `byte` array represents `8` bits of the vector. `bit` vectors support script usage and regular query usage. When indexed, all comparisons done are `xor` and `popcount` summations (aka, hamming distance), and the scores are transformed and normalized given the vector dimensions. Note, indexed bit vectors require `l2_norm` to be the similarity. For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is `sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported. Note, the dimensions expected by this element_type must always be divisible by `8`, and the `byte[]` vectors provided for indexing must have size `dim/8`, where each byte element represents `8` bits of the vector. closes: https://github.com/elastic/elasticsearch/issues/48322	2024-06-27 04:48:41 +10:00
Benjamin Trent	acc99302c6	Adding hamming distance function to painless for dense_vector fields (#109359) This adds `hamming` distance, the pop-count of the `xor` of two byte vectors, as a first-class citizen in Painless. For byte vectors, this means that we can compute hamming distances via script_score (aka, brute-force). The implementation of `hamming` is the same as the one available in Lucene, and when Lucene 9.11 is merged, we should update our logic where applicable to utilize it. NOTE: this does not yet add hamming distance as a metric for indexed vectors. This will be a future PR after the Lucene 9.11 upgrade.	2024-06-18 03:41:20 +10:00
Abdon Pijpelink	7d01d768c2	[DOCS] Warn about calling vector functions repeatedly (#91864) * [DOCS] Add script score vector function clarification * [DOCS] Warn about calling vector functions repeatedly	2022-12-12 09:43:46 +01:00
James Rodewig	f56a0f4b66	[DOCS] Remove `testenv` annotations from doc snippet tests (#80023) Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible. Relates to #79309, #31619	2021-11-05 18:38:50 -04:00
Mayya Sharipova	853e68dfdf	Add access to dense_vector values (#71313) Allow direct access to a dense_vector's values in a script through the following functions: - getVectorValue – returns a vector's value as an array of floats - getMagnitude – returns a vector's magnitude Closes #51964	2021-04-19 08:02:05 -04:00
James Rodewig	441c3a21b1	[DOCS] Update my-index examples (#60132) Changes the following example index names to `my-index-000001` for consistency: * `my-index` * `my_index` * `myindex`	2020-07-27 14:46:39 -04:00
Julie Tibshirani	dd6f0a35e4	Remove the 'experimental' marking from vector fields. (#49120) We wrapped up the API changes we wanted to make, and vector fields can now be considered GA.	2019-11-18 11:57:18 -08:00
Julie Tibshirani	460d545921	Remove support for sparse vectors. (#48781) Follow up to #48368. This PR removes support for `sparse_vector` on new indices. On 7.x indices a `sparse_vector` can still be defined, but it is not possible to index or search on it.	2019-11-14 16:54:48 -05:00
Julie Tibshirani	f863dd12b4	Update the signature of vector script functions. (#48604) Previously the functions accepted a doc values reference, whereas they now accept the name of the vector field. Here's an example of how a vector function was called before and after the change. ``` Before: cosineSimilarity(params.query_vector, doc['field']) After: cosineSimilarity(params.query_vector, 'field') ``` This seems more intuitive, since we don't allow direct access to vector doc values and the meaning of `doc['field']` is unclear. The PR makes the following changes (broken into distinct commits): * Add new function signatures of the form `function(params.query_vector, 'field')` and deprecate the old ones. Because Painless doesn't allow two methods with the same name and number of arguments, we allow a generic `Object` to be passed in to the function and decide on the behavior through an `instanceof` check. * Refactor the class bindings so that the document field is passed to the constructor instead of the instance method. This allows us to avoid retrieving the vector doc values on every function invocation, which gives a tiny speed-up in benchmarks. Note that this PR adds new signatures for the sparse vector functions too, even though sparse vectors are deprecated. It seemed simplest to understand (for both us and users) to keep everything symmetric between dense and sparse vectors.	2019-10-29 13:26:36 -07:00
Julie Tibshirani	5478fff640	Deprecate the sparse_vector field type. (#48315) We have not seen much adoption of this experimental field type, and don't see a clear use case as it's currently designed. This PR deprecates the field type in 7.x. It will be removed from 8.0 in a follow-up PR.	2019-10-22 18:06:50 -07:00
James Rodewig	f5827ba0ae	[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159)	2019-09-04 12:51:02 -04:00
Mayya Sharipova	de8b9f3039	Add filters in examples of vector functions (#45327)	2019-08-08 09:38:05 -04:00
Mayya Sharipova	16747f811f	Add l1norm and l2norm distances for vectors (#44116) * Add l1norm and l2norm distances for vectors Add L1norm - Manhattan distance Add L2norm - Euclidean distance relates to #37947 * Address Christoph's feedback - organize vector functions as a separate doc - increase precision in tests calculations - add a separate test when sparse doc dims are bigger and less than query vector dims * Made examples more realistic	2019-07-11 14:14:23 -04:00

13 commits
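
The `bit` element type commit (5add44d7d1) describes vectors packed eight bits per byte, compared by `xor` plus `popcount`. The following is a minimal Python sketch of that arithmetic only, not Elasticsearch or Lucene code. The function names `parse_bit_vector`, `hamming`, and `l2norm_bits` are hypothetical, and the score transformation applied to indexed vectors is deliberately left out because the commit message does not spell it out.

```python
import math


def parse_bit_vector(hex_string: str, dims: int) -> bytes:
    """Decode a hex-encoded bit vector and check it holds dims/8 bytes.

    Hypothetical helper: mirrors the constraints stated in the commit
    message (dims divisible by 8, one byte per 8 bits).
    """
    if dims % 8 != 0:
        raise ValueError("dims must be divisible by 8")
    packed = bytes.fromhex(hex_string)
    if len(packed) != dims // 8:
        raise ValueError(f"expected {dims // 8} bytes, got {len(packed)}")
    return packed


def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance: pop-count of the byte-wise xor of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def l2norm_bits(a: bytes, b: bytes) -> float:
    """For bit vectors, l1norm equals hamming and l2norm is sqrt(l1norm)."""
    return math.sqrt(hamming(a, b))


if __name__ == "__main__":
    query = parse_bit_vector("ff00", dims=16)
    doc = parse_bit_vector("0f0f", dims=16)
    print(hamming(query, doc))      # number of differing bits
    print(l2norm_bits(query, doc))  # square root of that count
```

With a 16-dimension vector, each hex string must decode to exactly two bytes; a string of any other length is rejected before any distance is computed.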