Fixed a typo and a small grammatical error in the explanation of the `null_value` option
(cherry picked from commit fa52f82838)
Co-authored-by: Nimrod Dolev <nimrodavid@gmail.com>
Adds new `quantization_options` to `dense_vector`. This allows for
vectors to be automatically quantized to `byte` when indexed.
Example:
```
PUT vectors
{
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"index": true,
"index_options": {
"type": "int8_hnsw"
}
}
}
}
}
```
When querying, the query vector is automatically quantized and used when
querying the HNSW graph. This reduces the memory required to only `25%`
of what was previously required for `float` vectors at a slight loss of
accuracy.
This is currently only available when `index: true` and when using
`hnsw`.
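The `25%` figure follows from the component sizes: a `float` component takes 4 bytes and a `byte` component takes 1. A rough Python sketch of that arithmetic, together with a simple linear quantizer, is below. The quantizer (`quantize_to_byte`, its `lo`/`hi` bounds and the 0..127 target range) is an illustrative assumption and not necessarily the algorithm Lucene uses; the sizes cover raw vector values only and ignore the HNSW graph and other index overhead.

```python
FLOAT_BYTES = 4  # size of one float32 component
BYTE_BYTES = 1   # size of one quantized component


def raw_vector_bytes(num_vectors, dims, bytes_per_component):
    """Bytes needed for the raw vector values only (no graph, no metadata)."""
    return num_vectors * dims * bytes_per_component


def quantize_to_byte(vector, lo, hi):
    """Hypothetical linear quantizer: clamp to [lo, hi], then map to 0..127."""
    scale = 127 / (hi - lo)
    return [round((min(max(v, lo), hi) - lo) * scale) for v in vector]


float_size = raw_vector_bytes(1_000_000, 768, FLOAT_BYTES)
byte_size = raw_vector_bytes(1_000_000, 768, BYTE_BYTES)
print(byte_size / float_size)  # ratio of quantized to float storage
print(quantize_to_byte([-2.0, -1.0, 0.25, 1.0], lo=-1.0, hi=1.0))
```

Values outside `[lo, hi]` are clamped, which is one source of the accuracy loss mentioned above.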
* Represent histogram value count as long
Histograms currently use integers to store the count of each value,
which can overflow. Switch to using long integers to avoid this.
TDigestState was updated to use long for centroid value count in #99491.
Fixes #99820
* Update docs/changelog/99912.yaml
* spotless fix
* Nested dense_vector support
* Adjust nested support based on new lucene version
* fixing after rebase
* fixing some code
* fixing tests adding transport version
* spotless
* [Automated] Update Lucene snapshot to 9.9.0-snapshot-b3e67403aaf
* Adds new max_inner_product vector similarity function (#99527)
Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:
Doesn't require vectors to be normalized
Scales the similarity between vectors differently to prevent negative scores
* requiring top level filter to be parent filter
* adding docs & fixing tests
* adding and fixing docs
* adding changelog
* removing unnecessary file changes
* removing unused imports
* fixing test
* maybe fix doc tests
* continue tests in docs
* fixing more tests
* fixing tests
---------
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
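For the "Represent histogram value count as long" change listed above, the overflow can be illustrated with a short Python sketch. Python integers do not overflow, so the hypothetical helper `wrap_signed` simulates fixed-width two's-complement storage; it is not code from the change itself.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


INT_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold

# One more occurrence of a value whose count is already at INT_MAX:
count_as_int = wrap_signed(INT_MAX + 1, 32)   # wraps around to a negative count
count_as_long = wrap_signed(INT_MAX + 1, 64)  # still fits in a 64-bit long
print(count_as_int, count_as_long)
```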
Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:
* Doesn't require vectors to be normalized
* Scales the similarity between vectors differently to prevent negative scores
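A small Python sketch of the two differences. The scaling used here (`1 / (1 - ip)` for a negative inner product, `ip + 1` otherwise) follows the `dense_vector` similarity documentation as best understood; this commit message does not state the formula, so treat it as an assumption.

```python
def inner_product(a, b):
    """Plain dot product; the inputs do not have to be unit-length."""
    return sum(x * y for x, y in zip(a, b))


def max_inner_product_score(query, vector):
    """Map the raw inner product to a strictly positive score (assumed scaling)."""
    ip = inner_product(query, vector)
    if ip < 0:
        return 1 / (1 - ip)  # negative products land in (0, 1)
    return ip + 1            # non-negative products land in [1, inf)


print(max_inner_product_score([1.0, 2.0], [3.0, 4.0]))   # positive inner product
print(max_inner_product_score([1.0, 0.0], [-3.0, 0.0]))  # negative inner product
```

Under this scaling the ordering of the raw inner products is preserved, while every score stays above zero.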
`dot_product` requires vectors to be unit-length. Previously, we would
check that vectors were unit-length and throw if they were not.
Instead, we will now auto-normalize vectors as they are indexed.
`cosine` will continue to behave as usual, not normalizing the vectors.
closes: https://github.com/elastic/elasticsearch/issues/98935
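As a sketch of what auto-normalization means, the Python below scales a vector to unit length before it would be indexed. The function name `normalize` and the choice to reject zero-length vectors are assumptions for illustration, not the behavior of the actual implementation.

```python
import math


def normalize(vector):
    """Return the vector scaled to unit length (L2 norm of 1)."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    if magnitude == 0:
        # Assumption: a zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize a zero-length vector")
    return [x / magnitude for x in vector]


unit = normalize([3.0, 4.0])
print(unit)
print(math.sqrt(sum(x * x for x in unit)))  # length after normalization
```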
* First version
* Spotless, I liked my version better
* Fix param default values
* Add a supplier for default value to ensure it's calculated correctly
* Can't improve this without breaking tests
* Added checks for not specifying a body in PUT requests
* Fix default provider for enum params
* Added yaml test
* Changed docs and fix TODO
* Removing synonyms changes
* Added separate methods for providing default value as suppliers in enums
* Fixed test
* Add a supplier for default value to ensure it's calculated correctly
* Added checks for not specifying a body in PUT requests
* Remove synonyms changes
* Remove some supplier changes
* Better call enumParam with supplier version
* Fix compiler error on supplier
* Apply validators or requires depending on index version
* Solved BWC tests that involved using validators instead of requiresParameters
* Add tests
* Spotless
* Update docs/changelog/98268.yaml
* Update changelog
* Update docs/changelog/98268.yaml
* PR comments
* PR feedback
* Serialize index only for new index versions
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Update field-mapping.asciidoc to state that the epoch format is not supported as a dynamic date format
* Update docs/reference/mapping/dynamic/field-mapping.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Documentation for time-series geo_line
* Fix incorrect ids in geoline docs
* Some updates from review
Added an image of a Kibana map, improved the first example, linked to TSDS, and added a section on line simplification with a link to Wikipedia.
* Diagrams of truncation versus simplification
* Allow multiple field names/patterns for (path_)(un)match (#66364)
Arrays of patterns are now allowed for dynamic_templates in the match,
unmatch, path_match and path_unmatch fields. DynamicTemplate has been modified to
support List<String> for these fields. The patterns can be either simple wildcards
or regex. As with previous functionality, when match_pattern="regex", simple wildcards
will be flagged with an error, but when match_pattern="simple", using regular expressions
in the match will not throw an error.
One new error pathway was added: if a user specifies a list of non-strings for
one of these pattern fields (e.g., "match": [10, false]) a MapperParserException
will be thrown.
A dynamic_template yamlRestTest was added. This is a BWC change, so the REST test
that uses arrays of patterns is limited to v8.9 and above.
Closes #66364.
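The list handling described above can be sketched in Python. This is an approximation under stated assumptions: `fnmatchcase` stands in for simple wildcard matching (it also understands `?` and `[...]`, which the simple pattern style may not), a field is assumed to match when any pattern in the list matches, and `ValueError` stands in for MapperParserException. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def as_patterns(value):
    """Accept a single pattern or a list of patterns; reject non-string entries."""
    patterns = value if isinstance(value, list) else [value]
    for pattern in patterns:
        if not isinstance(pattern, str):
            raise ValueError(f"pattern must be a string, got {pattern!r}")
    return patterns


def template_applies(field_name, match=None, unmatch=None):
    """True if the name matches any `match` pattern and no `unmatch` pattern."""
    if match is not None:
        if not any(fnmatchcase(field_name, p) for p in as_patterns(match)):
            return False
    if unmatch is not None:
        if any(fnmatchcase(field_name, p) for p in as_patterns(unmatch)):
            return False
    return True


print(template_applies("ip_source", match=["ip_*", "*_ip"]))
print(template_applies("ip_text", match=["ip_*", "*_ip"], unmatch=["*_text", "*_raw"]))
```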
Currently Lucene limits the max number of vector dimensions to 1024.
This commit overrides KnnFloatVectorField and KnnByteVectorField
classes to increase the limit to 2048 for indexed vectors in ES.
Here we add synthetic source support for fields whose type is flattened.
Note that flattened fields and synthetic source have the following limitations,
all arising from the fact that in synthetic source we just see key/value pairs
when reconstructing the original object and have no type information in mappings:
* flattened fields use sorted set doc values of keywords, which means two things:
first we do not allow duplicate values, second we treat all values as keywords
* reconstructing array of objects results in nested objects (no array)
* reconstructing arrays with just one element results in a single-value field since we
have no way to distinguish single-valued from multi-valued fields other than looking
at the count of values
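The limitations above can be mimicked with a small Python model. It is only a sketch of the described behavior, not the actual implementation: the helper names are hypothetical, `json.dumps` stands in for however non-string values are turned into keywords, and Python's string ordering stands in for the doc values sort order.

```python
import json


def as_keyword(value):
    """All values are treated as keywords, so non-strings become strings."""
    return value if isinstance(value, str) else json.dumps(value)


def synthesize_leaf(values):
    """Sorted set semantics: duplicates dropped, sorted, single value unwrapped."""
    unique_sorted = sorted({as_keyword(v) for v in values})
    return unique_sorted[0] if len(unique_sorted) == 1 else unique_sorted


def collect_pairs(node, path, out):
    """Flatten nested objects and arrays into dotted-key -> list-of-values pairs."""
    if isinstance(node, dict):
        for key, child in node.items():
            collect_pairs(child, path + [key], out)
    elif isinstance(node, list):
        for child in node:
            collect_pairs(child, path, out)  # array position is not recorded
    else:
        out.setdefault(".".join(path), []).append(node)


def synthesize(source):
    """Rebuild an object from key/value pairs alone, without type information."""
    pairs = {}
    collect_pairs(source, [], pairs)
    rebuilt = {}
    for dotted, values in sorted(pairs.items()):
        *parents, leaf = dotted.split(".")
        target = rebuilt
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = synthesize_leaf(values)
    return rebuilt


original = {"tags": ["b", "a", "b"], "items": [{"id": 1}, {"name": "x"}], "single": ["only"]}
print(synthesize(original))
```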
`runtime_mappings` is the name of the param in the search request. In the
document `put` statement, it's called `runtime`.
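To make the naming difference concrete, the Python sketch below builds both request bodies as dictionaries. The field name `day_of_week`, its script, and the `match_all` query are placeholders chosen for illustration; only the two key names, `runtime_mappings` and `runtime`, come from the text above.

```python
import json

# Hypothetical runtime field definition, reused in both places.
day_of_week = {
    "type": "keyword",
    "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum"
                  ".getDisplayName(TextStyle.FULL, Locale.ROOT))"
    },
}

# Body of a search request: the section is called `runtime_mappings`.
search_body = {
    "runtime_mappings": {"day_of_week": day_of_week},
    "query": {"match_all": {}},
}

# Body of a mapping update: the section is called `runtime`.
mapping_body = {
    "runtime": {"day_of_week": day_of_week},
}

print(json.dumps(search_body, indent=2))
print(json.dumps(mapping_body, indent=2))
```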
Co-authored-by: Matthew Hinea <matthew.hinea@gmail.com>
This PR enables the `ignore_malformed` parameter to be accepted as an option in
boolean field mappings. Support for synthetic source is not added yet, so if
`ignore_malformed` is set to true, synthetic source isn't supported.
Closes #89542
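A minimal Python sketch of the intended effect, assuming that a boolean field accepts `true`, `false`, `"true"`, `"false"` and the empty string (treated as false). The function `parse_boolean` and its return convention are hypothetical and only model the behavior: with `ignore_malformed` enabled, a bad value is skipped instead of rejecting the document.

```python
def parse_boolean(value, ignore_malformed=False):
    """Return True/False for accepted values, or None when a malformed value is ignored."""
    if isinstance(value, bool):
        return value
    if value == "true":
        return True
    if value in ("false", ""):
        return False
    if ignore_malformed:
        return None  # value is dropped from the field; the document is still indexed
    raise ValueError(f"cannot parse {value!r} as a boolean")


print(parse_boolean("true"))
print(parse_boolean("yes", ignore_malformed=True))
```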
This adds term query capabilities for rank_features fields. Term queries against rank_features fields are not scored the way they are for regular fields. The stored feature values reuse the term frequency storage mechanism, so regular BM25 does not apply.
Instead, a term query against a rank_features field is very similar to a linear rank_feature query. If more complicated combinations of features and values are required, the rank_feature query should be used.
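As a rough model of "similar to a linear rank_feature query", the Python sketch below scores a matching document by its stored feature value times a boost. This is an assumption about the shape of the scoring, not the actual implementation; the function `term_score` and the sample document are hypothetical, and real stored values may lose some precision.

```python
def term_score(doc, field, term, boost=1.0):
    """Score a term query on a rank_features-like field: boost * stored feature value."""
    features = doc.get(field, {})
    if term not in features:
        return None  # the document does not match the term query
    return boost * features[term]


doc = {"topics": {"politics": 20.0, "economics": 50.8}}
print(term_score(doc, "topics", "economics"))
print(term_score(doc, "topics", "sports"))
```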
* enhancement: boolean field to support ignore_malformed
* fix: changes in current builder for BooleanFieldMappers within tests files.
* Updating documentation
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Documentation incorrectly states that all aggregations are supported by
the `aggregate_metric_double` field.
This PR rectifies this error.
Closes #92236
Docs around the `index` option were not very precise. The term "typical" was used without describing which fields can still be queried when `index: false` is set. More precise docs for the index option already existed in the `doc_values` documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html These docs were mostly copied over.
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Currently Elasticsearch always returns a shard failure once a runtime error arises from using a runtime field, the exception being script-less runtime fields. This also means that execution of the query for that shard stops, which is okay for development and exploration. In a production scenario, however, it is often desirable to ignore runtime errors and continue with the query execution.
This change adds a new on_script_error parameter to runtime field definitions, similar to the already existing parameter for index-time scripted fields. When `on_script_error` is set to `continue`, errors from script execution are effectively ignored. This means affected documents don't show up in query results, but also don't prevent other matches from the same shard. Runtime fields accessed through the fields API don't return values on errors, and aggregations ignore documents that throw errors.
Note that this change affects scripted runtime fields only, while leaving default behaviour untouched. Also, ignored errors are not reported back to users for now.
Relates to #72143
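The difference between the two settings can be modeled with a short Python sketch. It simulates one shard evaluating a script per document; the function `run_shard_query`, the sample script, and the use of a Python exception as the "shard failure" are all hypothetical stand-ins.

```python
def run_shard_query(docs, script, predicate, on_script_error="fail"):
    """Return matching docs; a script error either aborts the shard or skips the doc."""
    hits = []
    for doc in docs:
        try:
            value = script(doc)
        except Exception:
            if on_script_error == "continue":
                continue  # affected document is left out, the others still match
            raise         # default: the whole shard fails
        if predicate(value):
            hits.append(doc)
    return hits


docs = [{"id": 1, "bytes": 2048}, {"id": 2}, {"id": 3, "bytes": 4096}]
kilobytes = lambda doc: doc["bytes"] / 1024  # raises KeyError for doc 2

hits = run_shard_query(docs, kilobytes, lambda kb: kb >= 2, on_script_error="continue")
print([doc["id"] for doc in hits])
```

With the default `fail`, the same call raises on the second document and returns no hits for the shard.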
* The exception is inserted in a code block
* Update docs/reference/mapping/types/text.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>