821 commits

Author SHA1 Message Date
Abdon Pijpelink
1612ad1d65
fix typo (#103149) (#103381)
Fixed a typo and a small grammatical error in the explanation of the `null_value` option

(cherry picked from commit fa52f82838)

Co-authored-by: Nimrod Dolev <nimrodavid@gmail.com>
2023-12-13 07:17:00 -05:00
Chris Hegarty
ff22c90735
Merge branch 'main' into lucene_snapshot_9_9 2023-12-02 09:42:22 +00:00
Jorge Sanz
c622dad8dd
[Docs] Move coordinate note for geojson/wkt up to the beginning of the geo_shape page (#102857)
* Move coordinate note for geojson/wkt up to the beginning of the page

* Add links to GeoJSON and WKT specs
2023-12-01 15:20:42 +01:00
Benjamin Trent
f00364aefd
Add byte quantization for float vectors in HNSW (#102093)
Adds a new `int8_hnsw` type to the `index_options` of `dense_vector`. This allows
vectors to be automatically quantized to `byte` when indexed.

Example:

```
PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
```

At query time, the query vector is automatically quantized before it is used
to search the HNSW graph. Because each dimension takes 1 byte instead of the
4 bytes of a `float`, this reduces the memory required to about `25%` of what
`float` vectors previously needed, at a slight loss of accuracy.
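
To illustrate where the 4x saving comes from, here is a minimal Python sketch of min/max scalar quantization from float32 to signed int8. It is not Lucene's quantizer, which is not reproduced here. The helper names `quantize_int8` and `dequantize_int8` are hypothetical, and the bounds are simply the vector's own min and max.

```python
import array

def quantize_int8(vector, lo, hi):
    """Linearly map each float in [lo, hi] onto an integer in [-128, 127]."""
    scale = 255.0 / (hi - lo)
    out = []
    for x in vector:
        x = min(max(x, lo), hi)            # clamp outliers into the chosen range
        out.append(round((x - lo) * scale) - 128)
    return array.array("b", out)           # signed 8-bit storage: 1 byte per dimension

def dequantize_int8(codes, lo, hi):
    """Approximate inverse of quantize_int8 (precision lost to rounding is not recovered)."""
    step = (hi - lo) / 255.0
    return [(q + 128) * step + lo for q in codes]

vec = array.array("f", [0.12, -0.5, 0.93, 0.0])   # float32: 4 bytes per dimension
lo, hi = min(vec), max(vec)
codes = quantize_int8(vec, lo, hi)
print(codes.itemsize / vec.itemsize)              # 0.25
```

The reconstruction error per dimension is bounded by one quantization step, `(hi - lo) / 255`. That bound is the "slight loss of accuracy" in this simplified model.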

This is currently only available when `index: true` is set and HNSW is
used.
2023-11-29 12:29:55 -05:00
amyjtechwriter
d25435e185
disabling source (#101839) 2023-11-07 13:43:28 +00:00
James Rodewig
4c69746c24
[DOCS] Update tech preview copy (#101606)
Updates the copy for tech preview and experimental features in the Elasticsearch docs.

Relates to https://github.com/elastic/docs/pull/2807
2023-10-31 10:31:07 -04:00
Carlos Delgado
f2dfbfe8c4
[DOCS] Add sparse-vector field type to docs, changed references (#100348) 2023-10-06 14:25:27 +02:00
Luca Cavanna
689a1e490a
Merge branch 'main' into lucene_snapshot_9_8 2023-10-02 13:56:12 +02:00
Kostas Krikellas
98b9e819ee
Represent histogram value count as long (#99912)
* Represent histogram value count as long

Histograms currently use integers to store the count of each value,
which can overflow. Switch to using long integers to avoid this.
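
Python integers never overflow. The sketch below therefore uses `ctypes` fixed-width types to mimic a count held in a 32-bit Java `int` versus a 64-bit `long`. It is an illustration only, not the histogram code itself.

```python
import ctypes

INT_MAX = 2**31 - 1  # same value as Java's Integer.MAX_VALUE

# One more value on top of INT_MAX wraps around in a 32-bit signed integer...
count32 = ctypes.c_int32(INT_MAX + 1).value
# ...but fits comfortably in a 64-bit signed integer.
count64 = ctypes.c_int64(INT_MAX + 1).value

print(count32)  # -2147483648
print(count64)  # 2147483648
```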

TDigestState was updated to use long for centroid value count in #99491

Fixes #99820

* Update docs/changelog/99912.yaml

* spotless fix
2023-09-29 12:30:55 +03:00
Benjamin Trent
92cea2797e
Add nested support for dense_vector fields and knn search (#99763)
* Nested dense_vector support

* Adjust nested support based on new lucene version

* fixing after rebase

* fixing some code

* fixing tests adding transport version

* spotless

* [Automated] Update Lucene snapshot to 9.9.0-snapshot-b3e67403aaf

* Adds new max_inner_product vector similarity function (#99527)

Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:

- Doesn't require vectors to be normalized
- Scales the similarity between vectors differently to prevent negative scores

* requiring top level filter to be parent filter

* adding docs & fixing tests

* adding and fixing docs

* adding changelog

* removing unnecessary file changes

* removing unused imports

* fixing test

* maybe fix doc tests

* continue tests in docs

* fixing more tests

* fixing tests

---------

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2023-09-28 11:38:04 -04:00
Luca Cavanna
15c87b681c
Merge branch 'main' into lucene_snapshot_9_8 2023-09-28 12:19:14 +02:00
Kostas Krikellas
137bb45662
Support runtime fields in synthetic source (#99796)
* Support runtime fields in synthetic source

* Update docs/changelog/99796.yaml

* Introduce SyntheticSourceProvider

* Address comments

* More fixes

* Fix checkstyle violation

* More unittest updates

* Use SourceProvider in MapperServiceTestCase

* Remove runtime field from unittest

* Update synthetic source doc
2023-09-26 14:29:56 +03:00
Luca Cavanna
b3e769987d
Merge branch 'main' into lucene_snapshot_9_8 2023-09-22 13:11:10 +02:00
Mayya Sharipova
ddf17e6be5
Increase the max vector dims to 4096 (#99682) 2023-09-20 15:43:40 -04:00
Benjamin Trent
dee85de61c
Adds new max_inner_product vector similarity function (#99527)
Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:

- Doesn't require vectors to be normalized
- Scales the similarity between vectors differently to prevent negative scores
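
The Python sketch below shows one way to keep scores positive without normalizing vectors. It assumes the scaling given in the Elasticsearch `dense_vector` reference for `max_inner_product`: `1 / (1 + -1 * dot)` when the dot product is negative, and `dot + 1` otherwise. The helper names are hypothetical.

```python
def dot(a, b):
    """Plain inner product; the vectors do not need to be unit length."""
    return sum(x * y for x, y in zip(a, b))

def max_inner_product_score(query, vector):
    """Map an unbounded dot product onto a positive score (assumed formula)."""
    d = dot(query, vector)
    if d < 0:
        return 1.0 / (1.0 + -1.0 * d)   # negative dot products land in (0, 1)
    return d + 1.0                      # non-negative dot products land in [1, inf)

print(max_inner_product_score([2.0, 0.0], [3.0, 1.0]))   # 7.0
print(max_inner_product_score([2.0, 0.0], [-3.0, 1.0]))  # 0.14285714285714285
```

Both branches equal 1 when the dot product is 0, and each branch increases with the dot product. The mapping therefore preserves ranking order while keeping every score positive.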
2023-09-20 20:51:46 +02:00
Benjamin Trent
83b70e37ef
Revert "Auto-normalize dot_product vectors at index & query (#98944)" (#99421)
This reverts commit 7b9c367aeb.
2023-09-11 09:33:17 -04:00
Kathleen DeRusso
258d0cb0be
Automatically map floats as dense vector (#98512) 2023-09-06 16:06:29 -04:00
Benjamin Trent
7b9c367aeb
Auto-normalize dot_product vectors at index & query (#98944)
`dot_product` requires vectors to be unit-length. Previously, we would
check that vectors were unit-length and throw if they were not. 

Instead, we will now auto-normalize vectors as they are indexed.

`cosine` will continue to behave as usual, not normalizing the vectors.
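
The two behaviours described above are sketched below in Python: rejecting vectors that are not unit length (the previous behaviour) and L2-normalizing them (this commit, later reverted by #99421). The tolerance value and the helper names are hypothetical, not the ones Elasticsearch uses.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def check_unit_length(v, tol=1e-4):
    """Previous behaviour: throw if the vector is not (approximately) unit length."""
    n = l2_norm(v)
    if abs(n - 1.0) > tol:
        raise ValueError(f"dot_product requires unit-length vectors, got norm {n}")
    return v

def normalize(v):
    """Auto-normalize: scale the vector to unit length instead of rejecting it."""
    n = l2_norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```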

closes: https://github.com/elastic/elasticsearch/issues/98935
2023-08-30 09:50:49 -04:00
Carlos Delgado
2b838ae853
Dense vector field types are indexed by default (#98268)
* First version

* Spotless, I liked my version better

* Fix param default values

* Add a supplier for default value to ensure it's calculated correctly

* Can't improve this without breaking tests

* Added checks for not specifying a body in PUT requests

* Fix default provider for enum params

* Added yaml test

* Changed docs and fix TODO

* Removing synonyms changes

* Added separate methods for providing default value as suppliers in enums

* Fixed test

* Add a supplier for default value to ensure it's calculated correctly

* Added checks for not specifying a body in PUT requests

* Remove synonyms changes

* Remove some supplier changes

* Better call enumParam with supplier version

* Fix compiler error on supplier

* Apply validators or requires depending on index version

* Solved BWC tests that involved using validators instead of requiresParameters

* Add tests

* Spotless

* Update docs/changelog/98268.yaml

* Update changelog

* Update docs/changelog/98268.yaml

* PR comments

* PR feedback

* Serialize index only for new index versions

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-08-17 10:53:14 -04:00
Kuni Sen
225503a447
Update field-mapping.asciidoc to note that epoch format is not supported as a dynamic date format (#98338)
* Update field-mapping.asciidoc to note that epoch format is not supported as a dynamic date format

Update field-mapping.asciidoc to note that epoch format is not supported as a dynamic date format

* Update docs/reference/mapping/dynamic/field-mapping.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-08-10 16:44:44 +09:00
Mayya Sharipova
2076183dee
Move vectors of > 1024 dims out of experimental (#96850)
With the max dims check moving into the codec in Lucene 9.8, we will always
be able to provide our own codec with a max dims limit that we define.
2023-08-03 14:30:14 -04:00
Abdon Pijpelink
5947f3b455
[DOCS] Clarify TSDS/synthetic source/runtime field restrictions (#97980) 2023-08-03 18:28:08 +02:00
Craig Taverner
8151092b45
Documentation for time-series geo_line (#97373)
* Documentation for time-series geo_line

* Fix incorrect ids in geoline docs

* Some updates from review

Added an image of the Kibana map, improved the first example, linked to TSDS, and added a section on line simplification with a link to Wikipedia.

* Diagrams of truncation versus simplification
2023-07-05 17:53:27 +02:00
Abdon Pijpelink
16aba067a0
[DOCS] Make 2048 dims 'experimental' warning inline (#96369) 2023-05-30 10:13:38 +02:00
debadair
777598d602
[DOCS] Remove redirect pages (#88738)
* [DOCS] Remove manual redirects

* [DOCS] Removed refs to modules-discovery-hosts-providers

* [DOCS] Fixed broken internal refs

* Fixing bad cross links in ES book, and adding redirects.asciidoc[] back into docs/reference/index.asciidoc.

* Update docs/reference/search/point-in-time-api.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/setup/restart-cluster.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/sql/endpoints/translate.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/snapshot-restore/restore-snapshot.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update repository-azure.asciidoc

* Update node-tool.asciidoc

* Update repository-azure.asciidoc

---------

Co-authored-by: amyjtechwriter <61687663+amyjtechwriter@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-05-24 12:32:46 +01:00
Salvatore Campagna
6b1e0603ce
Test histogram with zero-count buckets and synthetic source (#95400) 2023-05-03 15:23:36 +02:00
Michael Peterson
5169011325
Allow multiple field names/patterns for (path_)(un)match (#66364) (#95558)
* Allow multiple field names/patterns for (path_)(un)match (#66364)

Arrays of patterns are now allowed for dynamic_templates in the match,
unmatch, path_match and path_unmatch fields. DynamicTemplate has been modified to
support List<String> for these fields. The patterns can be either simple wildcards
or regex. As with previous functionality, when match_pattern="regex", simple wildcards
will be flagged with an error, but when match_pattern="simple", using regular expressions
in the match will not throw an error.

One new error pathway was added: if a user specifies a list of non-strings for
one of these pattern fields (e.g., "match": [10, false]), a MapperParsingException
will be thrown.
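
The matching rules above are sketched below in Python. A field matches when any pattern in the list matches. In `simple` mode, `*` is a wildcard and everything else is literal. In `regex` mode, a bare wildcard such as `*foo` is an invalid regex, and non-string entries are rejected. The helper names are hypothetical, and treating only `*` as special in `simple` mode is an assumption about that syntax.

```python
import re

def compile_pattern(pattern, match_pattern="simple"):
    """Compile one dynamic-template pattern (hypothetical helper)."""
    if not isinstance(pattern, str):
        # Mirrors the new error path for input such as "match": [10, false]
        raise TypeError(f"pattern must be a string, got {pattern!r}")
    if match_pattern == "regex":
        # A simple wildcard such as "*foo" is not a valid regex and raises re.error
        return re.compile(pattern)
    # "simple": '*' is a wildcard, everything else is matched literally (assumed syntax)
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

def matches(field_name, patterns, match_pattern="simple"):
    """True if field_name fully matches ANY pattern; a single string is also accepted."""
    if isinstance(patterns, str):
        patterns = [patterns]
    # Compile every entry first so a bad entry is reported even after an earlier match
    compiled = [compile_pattern(p, match_pattern) for p in patterns]
    return any(rx.fullmatch(field_name) for rx in compiled)

print(matches("request_count", ["*_count", "*_total"]))  # True
print(matches("host", ["*_count", "*_total"]))           # False
```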

A dynamic_template yamlRestTest was added. This is a BWC change, so the REST test
that uses arrays of patterns is limited to v8.9 and above.

Closes #66364.
2023-04-27 12:58:49 -04:00
Martijn van Groningen
49e8ee4269
Remove remaining tsdb tech preview labels (#95563)
Remove tech preview label from a number of tsdb settings and mapping attributes.
2023-04-26 12:11:03 +02:00
Mayya Sharipova
4d6e451d8b
Add an experimental label for 2048 vector dims (#95395)
Add an experimental label for increased vector dims.

Relates to PR#95257
2023-04-20 07:48:12 -04:00
Salvatore Campagna
ec2bdee31b
Add time_series_dimensions param to flattened docs (#95374) 2023-04-20 10:58:12 +02:00
Martijn van Groningen
1f40ced134
Tiny tsdb docs update (#95333)
Update the definition of the counter metric type to note that it resets to zero.

This matches the definition on the TSDS page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html#time-series-metric
2023-04-18 11:17:31 -04:00
Mayya Sharipova
32c17d79c5
Increase max number of vector dims to 2048 (#95257)
Currently Lucene limits the max number of vector dimensions to 1024.
This commit overrides KnnFloatVectorField and KnnByteVectorField
classes to increase the limit to 2048 for indexed vectors in ES.
2023-04-17 09:05:49 -04:00
Salvatore Campagna
0eeef45ea2
Synthetic source support for flattened fields (#94842)
Here we add synthetic source support for fields whose type is flattened.
Note that flattened fields and synthetic source have the following limitations,
all arising from the fact that in synthetic source we just see key/value pairs
when reconstructing the original object and have no type information in mappings:

* flattened fields use sorted set doc values of keywords, which means two things:
   first, duplicate values are not preserved; second, all values are treated as keywords
* reconstructing an array of objects results in nested objects (no array)
* reconstructing an array with just one element results in a single-valued field, since we
   have no way to distinguish single-valued from multi-valued fields other than looking
   at the count of values
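
The toy Python model below reproduces the first and third limitations for a single flattened key. It assumes that each leaf value is stored as a keyword in a per-key sorted set, so duplicates collapse and values come back in lexicographic order. It also assumes that a single remaining value is emitted as a scalar. The functions are hypothetical and ignore the array-of-objects case.

```python
import json

def as_keyword(value):
    """Flattened leaves are indexed as keywords, so every value becomes a string."""
    return value if isinstance(value, str) else json.dumps(value)

def reconstruct(values):
    """Toy model: rebuild one flattened key from its sorted-set keyword doc values."""
    keywords = sorted({as_keyword(v) for v in values})  # the set collapses duplicates
    # A single remaining value cannot be told apart from a single-valued field
    return keywords[0] if len(keywords) == 1 else keywords

print(reconstruct([10, 9, 9, True]))  # ['10', '9', 'true']
print(reconstruct(["only"]))          # only
```

The output shows that the numbers lose their type, duplicates disappear, and values are sorted as strings. A one-element array also comes back as a plain value.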
2023-04-11 10:54:28 +02:00
Jim Ferenczi
57cbbb3fcd
Minor ann docs update (#94783)
Replaces the link to the deprecated knn search API and
adds a link to the nightly benchmarks in Rally.
2023-03-31 17:59:25 +01:00
Alan Woodward
b2cf4757f3
Fix backwards description in runtime fields documentation (#94608) (#94642)
`runtime_mappings` is the name of the parameter in the search request. In the
mapping `PUT` request, it's called `runtime`.

Co-authored-by: Matthew Hinea <matthew.hinea@gmail.com>
2023-03-22 11:53:35 -04:00
Ignacio Vera
397d52e24b
Allow docvalues-only search on geo_shape (#94396)
Allows searching a `geo_shape` field that is not indexed (`index: false`) but has doc values enabled.
2023-03-08 16:30:06 +01:00
Hritik Kumar
f5af004117
Support ignore_malformed in boolean fields (#93239)
This PR allows the `ignore_malformed` parameter to be set in
boolean field mappings. Synthetic source support is not added yet, so if
`ignore_malformed` is set to true, synthetic source isn't supported.
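
Here is a minimal Python sketch of what `ignore_malformed` changes for a boolean field. It assumes the accepted inputs listed in the Elasticsearch `boolean` field reference: `true`/`"true"` and `false`/`"false"`/`""`. The parser and its error text are hypothetical simplifications.

```python
def parse_boolean(value, ignore_malformed=False):
    """Return True/False, or None when a malformed value is ignored."""
    # Identity checks on True/False so that 1 and 0 are not accepted as booleans
    if value is True or value == "true":
        return True
    if value is False or value in ("false", ""):
        return False
    if ignore_malformed:
        return None  # only this field is skipped; the rest of the document is indexed
    raise ValueError(f"malformed boolean value: {value!r}")  # illustrative error text

print(parse_boolean("true"))                        # True
print(parse_boolean("yes", ignore_malformed=True))  # None
```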

Closes #89542
2023-02-21 18:22:10 +01:00
Przemyslaw Gomulka
b0ba832791
[doc] Mention date_nanos in date field type page (#93828) 2023-02-15 16:58:24 +01:00
Benjamin Trent
e8c5ed46c6
Fixing our docs for vector sizing calculation (#93703) 2023-02-13 07:52:53 -05:00
Benjamin Trent
323a13ac3f
Add term query support to rank_features mapped field (#93247)
This adds term query support for rank_features fields. Term queries against rank_features fields are not scored the same way as queries against regular fields: the stored feature values take advantage of the term frequency storage mechanism, so regular BM25 does not apply.

Instead, a term query against a rank_features field behaves much like a rank_feature query with the linear function. If more complex combinations of features and values are required, use the rank_feature query.
2023-02-01 13:32:13 -05:00
David Turner
ce736dd0e0
Revert "enhancement: boolean field to support ignore_malformed (#90122)"
This was merged in error without a full CI run, and has some issues.

This reverts commit edcdc43519.
This reverts commit 26c0a35558.
2023-01-25 15:09:59 +00:00
Hritik Kumar
edcdc43519
enhancement: boolean field to support ignore_malformed (#90122)
* enhancement: boolean field to support ignore_malformed

* fix: changes in current builder for BooleanFieldMappers within test files.

* Updating documentation

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
2023-01-25 13:56:50 +00:00
Christos Soulios
a183843893
[DOCS] Fix incorrect statement for aggregate_metric_double field type (#92961)
Documentation incorrectly states that all aggregations are supported by
the `aggregate_metric_double` field.

This PR rectifies this error.

Closes #92236
2023-01-16 12:33:20 -05:00
Dale Visser
1a9150dddb
[Docs] Differentiate runtime field and indexed field (#91057)
Clarify wording about upgrading runtime fields to indexed fields.
2023-01-13 17:05:26 +01:00
Abdon Pijpelink
85e965a35c
[DOCS] Remove experimental flag from index vectors for kNN search docs (#92867) 2023-01-12 15:57:28 +01:00
Nicolas Ruflin
71739416cf
[Docs] Add more details to the index option docs (#92606)
Docs around the `index` option were not very precise. The term "typical" was used without describing which fields can still be queried when `index: false` is set. More precise docs for the `index` option already existed in the `doc_values` documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html These docs were mostly copied over.

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-01-04 09:09:21 +01:00
Christoph Büscher
8067f01d48
Runtime fields to optionally ignore script errors (#92380)
Currently, Elasticsearch always returns a shard failure when a runtime error arises from using a runtime field, with the exception of script-less runtime fields. This also means that query execution stops for that shard, which is fine during development and exploration. In a production scenario, however, it is often desirable to ignore runtime errors and continue executing the query.

This change adds a new `on_script_error` parameter to runtime field definitions, similar to the existing
parameter for index-time scripted fields. When `on_script_error` is set to `continue`, errors from script execution are effectively ignored. This means affected documents don't show up in query results, but they also don't prevent other matches from the same shard. Runtime fields accessed through the fields API don't return values on errors, and aggregations ignore documents that throw errors.

Note that this change affects scripted runtime fields only, while leaving default behaviour untouched. Also, ignored errors are not reported back to users for now.
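
The small Python simulation below contrasts the two modes on a single shard. It assumes `fail` is the default value, as it is for index-time scripted fields. `ShardFailure`, `run_runtime_field` and the per-document script are hypothetical stand-ins, not Elasticsearch code.

```python
class ShardFailure(Exception):
    """Stands in for the shard failure returned under the default behaviour."""

def run_runtime_field(docs, script, on_script_error="fail"):
    """Evaluate `script` for each document and return {doc_id: value} for successes."""
    values = {}
    for doc_id, doc in docs.items():
        try:
            values[doc_id] = script(doc)
        except Exception as exc:
            if on_script_error == "continue":
                continue  # drop this doc; other matches on the shard are unaffected
            raise ShardFailure(f"script failed on doc {doc_id}") from exc
    return values

def price_times_two(doc):
    return doc["price"] * 2  # raises KeyError when "price" is missing

docs = {"1": {"price": 10}, "2": {}, "3": {"price": 7}}
print(run_runtime_field(docs, price_times_two, on_script_error="continue"))  # {'1': 20, '3': 14}
```

In line with the note above, the skipped document is not reported back to the caller in this model.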

Relates to #72143
2022-12-23 09:29:12 +01:00
Madhusudhan Konda
af65e71114
The exception is inserted in a code block (#90325)
* The exception is inserted in a code block

* Update docs/reference/mapping/types/text.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-12-21 17:22:35 +01:00
QY
7b17e1b5dc
[DOCS] Remove outdated note in Date field type (#92408)
Negative epoch timestamps are supported as of 8.2.0 (PR #80208).
2022-12-20 14:01:11 +01:00
Nik Everett
b9bb7252be
Docs: synthetic _source can't use params._source (#91630)
This documents that `params._source` isn't available for synthetic
`_source` indices and suggests to instead use `doc['foo']` or
`field('foo')`.
2022-11-22 15:23:30 -05:00