Commit graph

832 commits

Author SHA1 Message Date
Carlos Delgado
f8e516eb9c
Update sparse_vector docs on index version availability (#107315) 2024-04-10 17:41:42 +02:00
Benjamin Trent
89bf4b33e8
Make int8_hnsw our default index for new dense-vector fields (#106836)
For float32, there is no compelling reason to use all the memory
required by default for HNSW. Using `int8_hnsw` provides a much saner
default when it comes to cost vs relevancy. 

So, on all new indices that use `dense_vector` and want to index them
for fast search, we will default to `int8_hnsw`. 

Users can still customize their parameters, or prefer `hnsw` over
float32 if they so desire.
2024-04-01 08:23:32 -04:00
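As a rough illustration of the opt-out described in #106836 above, the sketch below builds two create-index request bodies: one that leaves `index_options` unset (so the server default applies) and one that asks for plain `hnsw`. The helper name `dense_vector_mapping`, the field name, and the `dims` value are assumptions made for this example, not part of the commit.

```python
import json


def dense_vector_mapping(field, dims, index_type=None):
    """Build a create-index body for a single dense_vector field.

    Hypothetical helper: with index_type=None the `index_options` block is
    omitted, so whatever default the server uses is applied.
    """
    field_def = {"type": "dense_vector", "dims": dims, "index": True}
    if index_type is not None:
        field_def["index_options"] = {"type": index_type}
    return {"mappings": {"properties": {field: field_def}}}


# Rely on the default index type for new indices.
default_body = dense_vector_mapping("my_vector", 384)

# Explicitly keep float32 vectors in a plain HNSW index.
float_body = dense_vector_mapping("my_vector", 384, index_type="hnsw")

print(json.dumps(float_body, indent=2))
```

Either body would be sent as the payload of a `PUT <index>` request.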
Oleksandr Kolomiiets
9e6b893896
Text fields are stored by default in TSDB indices (#106338)
* Text fields are stored by default with synthetic source

Synthetic source requires text fields to be stored or to have a keyword
sub-field that supports synthetic source. If there are no keyword
sub-fields, users currently have to explicitly set `store` to `true` or get a
validation exception. This is not the best experience. It is quite
likely that setting `store` to `true` is the correct thing to do, but
users still get an error and need to investigate it. With this change, if
the `store` setting is not specified in such a context, it will be set to
`true` by default. Setting it explicitly to `false` results in the
exception.

Closes #97039
2024-03-26 13:37:19 -07:00
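The defaulting rule described in #106338 above can be sketched as a small decision function. This is a simplified model, not the mapper code: `resolve_store`, its parameters, and the use of `ValueError` in place of the real validation exception are all assumptions for illustration.

```python
def resolve_store(store, synthetic_source, has_keyword_subfield):
    """Return the effective `store` value for a text field.

    store: True, False, or None when the mapping does not set it.
    Hypothetical model of the rule in the commit message.
    """
    needs_store = synthetic_source and not has_keyword_subfield
    if not needs_store:
        # Outside that context nothing changes: unset means not stored.
        return bool(store)
    if store is None:
        # New behavior: default to stored instead of failing validation.
        return True
    if store is False:
        # Explicitly disabling `store` is still rejected.
        raise ValueError("text field must be stored or have a keyword sub-field")
    return True


print(resolve_store(None, synthetic_source=True, has_keyword_subfield=False))
```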
István Zoltán Szabó
5afc59b07e
[DOCS] Creates a semantic_text field type docs page. (#106528) 2024-03-20 11:05:52 +01:00
Oleksandr Kolomiiets
28f3977a2e
[DOCS] time_series_dimension fields do not support ignore_above (#106203)
* [DOCS] `time_series_dimension` fields do not support `ignore_above`

There is existing validation for this combination of parameters but
it was not documented.

Closes #99044

* Remove maximum size constraint

* Add reasoning for constraints
2024-03-13 08:40:16 -07:00
Benjamin Trent
61b3d98227
Add note about optional times and epochs (#105786) 2024-03-05 08:44:03 -05:00
Liam Thompson
4bea4a7a10
[Docs] Tiny format fix (#105820) 2024-02-29 09:32:42 +01:00
Felix Barnsteiner
dee0be589c
Flatten object mappings when subobjects is false (#103542) 2024-02-22 11:43:12 +01:00
Andrew Wilkins
5f90978296
Add unmatch_mapping_type, and support array of types (#103171)
Add an `unmatch_mapping_type` condition to dynamic templates (supporting
one or more types), and add support for specifying a list of types to
`match_mapping_type`.

Closes https://github.com/elastic/elasticsearch/issues/102795
Closes https://github.com/elastic/elasticsearch/issues/102807
2024-02-09 10:42:26 -05:00
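A minimal sketch of how the two type conditions from #103171 above could combine, assuming a template matches when the detected type is in `match_mapping_type` (if given) and is not in `unmatch_mapping_type`. The function names are hypothetical and the real evaluation order inside Elasticsearch may differ.

```python
def _as_list(value):
    """Accept a single type name or a list of type names."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def type_condition_matches(template, detected_type):
    """Hypothetical model of match_mapping_type / unmatch_mapping_type."""
    match_types = _as_list(template.get("match_mapping_type"))
    unmatch_types = _as_list(template.get("unmatch_mapping_type"))
    # "*" is treated as matching every detected type.
    if match_types and "*" not in match_types and detected_type not in match_types:
        return False
    return detected_type not in unmatch_types


template = {
    "match_mapping_type": ["long", "double"],
    "unmatch_mapping_type": "double",
}
for detected in ("long", "double", "string"):
    print(detected, type_condition_matches(template, detected))
```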
Benjamin Trent
43362d5de5
Add new int8_flat and flat vector index types (#104872)
This adds two new vector index types:
- flat
- int8_flat

Both store the vectors in a flat space and search is brute-force over
the vectors in the index.   For the regular `flat` index, this can be
considered syntactic sugar that allows `knn` queries without having to
put indices within HNSW. 

For `int8_flat`, this allows float vectors to be stored in a flat
manner, but also automatically quantized.
2024-02-05 12:56:13 -05:00
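To make "brute-force over the vectors in the index" from #104872 above concrete, here is a small exact nearest-neighbour search in plain Python. It is only a sketch of the idea: the function names, the choice of L2 distance, and the in-memory dict are assumptions, and it says nothing about how Lucene stores or scans flat vectors.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(query, vectors, k):
    """Score every stored vector and keep the k closest doc ids.

    vectors: dict mapping doc id -> vector. No graph is built; cost grows
    linearly with the number of vectors.
    """
    return heapq.nsmallest(
        k, vectors, key=lambda doc_id: l2_distance(query, vectors[doc_id])
    )


docs = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
print(brute_force_knn([0.9, 1.2], docs, k=2))
```

The trade-off is the usual one: results are exact and there is no graph to build or hold in memory, but every query touches every vector.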
Felix Barnsteiner
f642b8a3aa
Add setting to ignore dynamic fields when field limit is reached (#96235)
Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit`
index setting.

When set to `true`, new fields are added to the mapping as long as the
field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields
that would exceed the limit are not added to the mapping, similar to
`dynamic: false`.  Ignored fields are added to the `_ignored` metadata
field.

Relates to https://github.com/elastic/elasticsearch/issues/89911

To make this easier to review, this is split into the following PRs:
- [x] https://github.com/elastic/elasticsearch/pull/102915
- [x] https://github.com/elastic/elasticsearch/pull/102936
- [x] https://github.com/elastic/elasticsearch/pull/104769

Related but not a prerequisite:
- [ ] https://github.com/elastic/elasticsearch/pull/102885
2024-02-02 05:53:52 -05:00
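The behavior of `index.mapping.total_fields.ignore_dynamic_beyond_limit` described in #96235 above can be modelled roughly as follows. This is a deliberately simplified sketch: it counts only flat field names, the function name and error text are made up for the example, and the real field limit accounting is more involved.

```python
def apply_dynamic_fields(mapped, incoming, limit, ignore_beyond_limit):
    """Return (mapped_fields, ignored_fields) after a document is indexed.

    Hypothetical model: fields are added until `limit` is reached; the rest
    are either ignored (and reported for `_ignored`) or rejected.
    """
    mapped = list(mapped)
    ignored = []
    for name in incoming:
        if name in mapped:
            continue  # already mapped, no new field needed
        if len(mapped) < limit:
            mapped.append(name)
        elif ignore_beyond_limit:
            ignored.append(name)  # indexed document keeps going without it
        else:
            raise ValueError(f"limit of total fields [{limit}] would be exceeded")
    return mapped, ignored


mapped, ignored = apply_dynamic_fields(
    ["host"], ["host", "cpu", "mem", "disk"], limit=3, ignore_beyond_limit=True
)
print(mapped, ignored)
```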
Abdon Pijpelink
1612ad1d65
fix typo (#103149) (#103381)
Fixed a typo and a small grammatical error in the explanation of the `null_value` option

(cherry picked from commit fa52f82838)

Co-authored-by: Nimrod Dolev <nimrodavid@gmail.com>
2023-12-13 07:17:00 -05:00
Chris Hegarty
ff22c90735
Merge branch 'main' into lucene_snapshot_9_9 2023-12-02 09:42:22 +00:00
Jorge Sanz
c622dad8dd
[Docs] Move coordinate note for geojson/wkt up to the beginning of the geo_shape page (#102857)
* Move coordinate note for geojson/wkt up to the beginning of the page

* Add links to GeoJSON and WKT specs
2023-12-01 15:20:42 +01:00
Benjamin Trent
f00364aefd
Add byte quantization for float vectors in HNSW (#102093)
Adds new `quantization_options` to `dense_vector`. This allows for
vectors to be automatically quantized to `byte` when indexed.

Example:

```
PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
```

When querying, the query vector is automatically quantized and used when
querying the HNSW graph. This reduces the memory required to only `25%`
of what was previously required for `float` vectors at a slight loss of
accuracy.

This is currently only available when `index: true` and when using
`hnsw`.
2023-11-29 12:29:55 -05:00
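The "25%" figure in #102093 above follows from storing one byte per dimension instead of a four-byte float. The sketch below checks that arithmetic with a plain min/max linear quantizer. The quantizer, its `[0, 127]` target range, and the function name are assumptions chosen for illustration and are not necessarily what Lucene does; the calculation also covers only the raw vector values, not the HNSW graph itself.

```python
from array import array


def quantize_int8(vector, lo, hi):
    """Linearly map floats in [lo, hi] onto integers in [0, 127].

    Values outside [lo, hi] are clamped first. Hypothetical quantizer.
    """
    scale = 127.0 / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vector]


vec = [-1.0, -0.5, 0.0, 0.5, 1.0]
quantized = quantize_int8(vec, lo=-1.0, hi=1.0)

# Compare raw storage: 4 bytes per float32 value vs 1 byte per int8 value.
float_bytes = len(array("f", vec).tobytes())
int8_bytes = len(array("b", quantized).tobytes())
ratio = int8_bytes / float_bytes

print(quantized, float_bytes, int8_bytes, ratio)
```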
amyjtechwriter
d25435e185
disabling source (#101839) 2023-11-07 13:43:28 +00:00
James Rodewig
4c69746c24
[DOCS] Update tech preview copy (#101606)
Updates the copy for tech preview and experimental features in the Elasticsearch docs.

Relates to https://github.com/elastic/docs/pull/2807
2023-10-31 10:31:07 -04:00
Carlos Delgado
f2dfbfe8c4
[DOCS] Add sparse-vector field type to docs, changed references (#100348) 2023-10-06 14:25:27 +02:00
Luca Cavanna
689a1e490a
Merge branch 'main' into lucene_snapshot_9_8 2023-10-02 13:56:12 +02:00
Kostas Krikellas
98b9e819ee
Represent histogram value count as long (#99912)
* Represent histogram value count as long

Histograms currently use integers to store the count of each value,
which can overflow. Switch to using long integers to avoid this.

TDigestState was updated to use long for centroid value count in #99491

Fixes #99820

* Update docs/changelog/99912.yaml

* spotless fix
2023-09-29 12:30:55 +03:00
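The overflow that #99912 above guards against is easy to reproduce. Python integers do not overflow on their own, so this sketch uses `ctypes` fixed-width types to imitate 32-bit and 64-bit signed counters; the helper names are made up for the example.

```python
import ctypes

INT32_MAX = 2**31 - 1


def add_int32(a, b):
    """Add two counts as a 32-bit signed integer would (wraps on overflow)."""
    return ctypes.c_int32(a + b).value


def add_int64(a, b):
    """Add two counts as a 64-bit signed integer would."""
    return ctypes.c_int64(a + b).value


# One more observation on a bucket that already holds INT32_MAX values.
wrapped = add_int32(INT32_MAX, 1)
widened = add_int64(INT32_MAX, 1)

print(wrapped, widened)
```

With a 32-bit counter the count turns negative, while the 64-bit counter keeps the expected value.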
Benjamin Trent
92cea2797e
Add nested support for dense_vector fields and knn search (#99763)
* Nested dense_vector support

* Adjust nested support based on new lucene version

* fixing after rebase

* fixing some code

* fixing tests adding transport version

* spotless

* [Automated] Update Lucene snapshot to 9.9.0-snapshot-b3e67403aaf

* Adds new max_inner_product vector similarity function (#99527)

Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:

- Doesn't require vectors to be normalized
- Scales the similarity between vectors differently to prevent negative scores

* requiring top level filter to be parent filter

* adding docs & fixing tests

* adding and fixing docs

* adding changelog

* removing unnecessary file changes

* removing unused imports

* fixing test

* maybe fix doc tests

* continue tests in docs

* fixing more tests

* fixing tests

---------

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2023-09-28 11:38:04 -04:00
Luca Cavanna
15c87b681c
Merge branch 'main' into lucene_snapshot_9_8 2023-09-28 12:19:14 +02:00
Kostas Krikellas
137bb45662
Support runtime fields in synthetic source (#99796)
* Support runtime fields in synthetic source

* Update docs/changelog/99796.yaml

* Introduce SyntheticSourceProvider

* Address comments

* More fixes

* Fix checkstyle violation

* More unittest updates

* Use SourceProvider in MapperServiceTestCase

* Remove runtime field from unittest

* Update synthetic source doc
2023-09-26 14:29:56 +03:00
Luca Cavanna
b3e769987d
Merge branch 'main' into lucene_snapshot_9_8 2023-09-22 13:11:10 +02:00
Mayya Sharipova
ddf17e6be5
Increase the max vector dims to 4096 (#99682) 2023-09-20 15:43:40 -04:00
Benjamin Trent
dee85de61c
Adds new max_inner_product vector similarity function (#99527)
Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:

- Doesn't require vectors to be normalized
- Scales the similarity between vectors differently to prevent negative scores
2023-09-20 20:51:46 +02:00
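The two differences listed in #99527 above can be seen by comparing the score transforms. The formulas below follow my reading of the Elasticsearch `dense_vector` similarity documentation for float vectors and should be treated as an assumption to verify against the release in use; the function names are invented for this sketch.

```python
def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dot_product_score(query, vector):
    """Assumed dot_product scoring: (1 + dot) / 2, meant for unit vectors."""
    return (1.0 + dot(query, vector)) / 2.0


def max_inner_product_score(query, vector):
    """Assumed max_inner_product scoring: never negative, any vector length."""
    d = dot(query, vector)
    if d < 0:
        return 1.0 / (1.0 - d)  # negative products are squeezed into (0, 1)
    return d + 1.0  # non-negative products are shifted to [1, inf)


query = [2.0, 0.0]       # not unit length
opposite = [-1.0, 0.0]   # points the other way

print(dot_product_score(query, opposite))        # negative for unnormalized input
print(max_inner_product_score(query, opposite))  # stays positive
```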
Benjamin Trent
83b70e37ef
Revert "Auto-normalize dot_product vectors at index & query (#98944)" (#99421)
This reverts commit 7b9c367aeb.
2023-09-11 09:33:17 -04:00
Kathleen DeRusso
258d0cb0be
Automatically map floats as dense vector (#98512) 2023-09-06 16:06:29 -04:00
Benjamin Trent
7b9c367aeb
Auto-normalize dot_product vectors at index & query (#98944)
`dot_product` requires vectors to be unit-length. Previously, we would
check that vectors were unit-length and throw if they were not. 

Instead, we will now auto-normalize vectors as they are indexed.

`cosine` will continue to behave as usual, not normalizing the vectors.

closes: https://github.com/elastic/elasticsearch/issues/98935
2023-08-30 09:50:49 -04:00
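For reference, "unit-length" and "auto-normalize" in #98944 above amount to the following arithmetic. Note that the log shows this change was later reverted (#99421), so this is only a sketch of the idea; the function names and the tolerance value are assumptions.

```python
import math


def is_unit_length(vector, tol=1e-4):
    """Check whether the squared magnitude is within `tol` of 1."""
    return abs(sum(x * x for x in vector) - 1.0) <= tol


def normalize(vector):
    """Scale a vector to unit length; a zero vector cannot be normalized."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-magnitude vector")
    return [x / norm for x in vector]


raw = [3.0, 4.0]
unit = normalize(raw)

print(is_unit_length(raw), unit, is_unit_length(unit))
```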
Carlos Delgado
2b838ae853
Dense vector field types are indexed by default (#98268)
* First version

* Spotless, I liked my version better

* Fix param default values

* Add a supplier for default value to ensure it's calculated correctly

* Can't improve this without breaking tests

* Added checks for not specifying a body in PUT requests

* Fix default provider for enum params

* Added yaml test

* Changed docs and fix TODO

* Removing synonyms changes

* Added separate methods for providing default value as suppliers in enums

* Fixed test

* Add a supplier for default value to ensure it's calculated correctly

* Added checks for not specifying a body in PUT requests

* Remove synonyms changes

* Remove some supplier changes

* Better call enumParam with supplier version

* Fix compiler error on supplier

* Apply validators or requires depending on index version

* Solved BWC tests that involved using validators instead of requiresParameters

* Add tests

* Spotless

* Update docs/changelog/98268.yaml

* Update changelog

* Update docs/changelog/98268.yaml

* PR comments

* PR feedback

* Serialize index only for new index versions

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-08-17 10:53:14 -04:00
Kuni Sen
225503a447
Update field-mapping.asciidoc that Epoch format is not supported as dynamic date format (#98338)
* Update field-mapping.asciidoc that Epoch format is not supported as dynamic date format

Update field-mapping.asciidoc that Epoch format is not supported as dynamic date format

* Update docs/reference/mapping/dynamic/field-mapping.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-08-10 16:44:44 +09:00
Mayya Sharipova
2076183dee
Move vectors of > 1024 dims out of experimental (#96850)
With the max dims check moved to the codec as of Lucene 9.8, we will always
have a way to provide our own codec with the max dims
defined by us.
2023-08-03 14:30:14 -04:00
Abdon Pijpelink
5947f3b455
[DOCS] Clarify TSDS/synthetic source/runtime field restrictions (#97980) 2023-08-03 18:28:08 +02:00
Craig Taverner
8151092b45
Documentation for time-series geo_line (#97373)
* Documentation for time-series geo_line

* Fix incorrect ids in geoline docs

* Some updates from review

Added image of Kibana map, improved first example, linked to TSDS, and added a section on line simplification with a link to Wikipedia.

* Diagrams of truncation versus simplification
2023-07-05 17:53:27 +02:00
Abdon Pijpelink
16aba067a0
[DOCS] Make 2048 dims 'experimental' warning inline (#96369) 2023-05-30 10:13:38 +02:00
debadair
777598d602
[DOCS] Remove redirect pages (#88738)
* [DOCS] Remove manual redirects

* [DOCS] Removed refs to modules-discovery-hosts-providers

* [DOCS] Fixed broken internal refs

* Fixing bad cross links in ES book, and adding redirects.asciidoc[] back into docs/reference/index.asciidoc.

* Update docs/reference/search/point-in-time-api.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/setup/restart-cluster.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/sql/endpoints/translate.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/snapshot-restore/restore-snapshot.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update repository-azure.asciidoc

* Update node-tool.asciidoc

* Update repository-azure.asciidoc

---------

Co-authored-by: amyjtechwriter <61687663+amyjtechwriter@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-05-24 12:32:46 +01:00
Salvatore Campagna
6b1e0603ce
Test histogram with zero-count buckets and synthetic source (#95400) 2023-05-03 15:23:36 +02:00
Michael Peterson
5169011325
Allow multiple field names/patterns for (path_)(un)match (#66364) (#95558)
* Allow multiple field names/patterns for (path_)(un)match (#66364)

Arrays of patterns are now allowed for dynamic_templates in the match,
unmatch, path_match and path_unmatch fields. DynamicTemplate has been modified to
support List<String> for these fields. The patterns can be either simple wildcards
or regex. As with previous functionality, when match_pattern="regex", simple wildcards
will be flagged with an error, but when match_pattern="simple", using regular expressions
in the match will not throw an error.

One new error pathway was added: if a user specifies a list of non-strings for
one of these pattern fields (e.g., "match": [10, false]) a MapperParserException
will be thrown.

A dynamic_template yamlRestTest was added. This is a BWC change, so the REST test
that uses arrays of patterns is limited to v8.9 and above.

Closes #66364.
2023-04-27 12:58:49 -04:00
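A rough model of the list-valued `match` / `unmatch` conditions from #95558 above, limited to simple `*` wildcards. The helper names are hypothetical, `TypeError` stands in for the `MapperParserException` mentioned in the commit, and `path_match` / `path_unmatch` and the regex mode are left out to keep the sketch short.

```python
import re


def _simple_match(pattern, name):
    """Match `name` against a pattern where `*` means any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def _patterns(value, key):
    """Normalize a string or list of strings; reject non-string entries."""
    if value is None:
        return []
    items = list(value) if isinstance(value, (list, tuple)) else [value]
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"[{key}] values must be strings, got {item!r}")
    return items


def name_matches(template, field_name):
    """True if any `match` pattern hits and no `unmatch` pattern does."""
    match = _patterns(template.get("match"), "match")
    unmatch = _patterns(template.get("unmatch"), "unmatch")
    if match and not any(_simple_match(p, field_name) for p in match):
        return False
    return not any(_simple_match(p, field_name) for p in unmatch)


template = {"match": ["ip_*", "*_ip"], "unmatch": ["*_test"]}
for field in ("ip_src", "src_ip", "ip_test", "hostname"):
    print(field, name_matches(template, field))
```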
Martijn van Groningen
49e8ee4269
Remove remaining tsdb tech preview labels (#95563)
Remove tech preview label from a number of tsdb settings and mapping attributes.
2023-04-26 12:11:03 +02:00
Mayya Sharipova
4d6e451d8b
Add an experimental label for 2048 vector dims (#95395)
Add an experimental label for increased vector dims.

Relates to PR#95257
2023-04-20 07:48:12 -04:00
Salvatore Campagna
ec2bdee31b
Add time_series_dimensions param to flattened docs (#95374) 2023-04-20 10:58:12 +02:00
Martijn van Groningen
1f40ced134
Tiny tsdb docs update (#95333)
Update the definition of the counter metric type to note that it resets to zero.

This matches the definition on the tsdb page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html#time-series-metric
2023-04-18 11:17:31 -04:00
Mayya Sharipova
32c17d79c5
Increase max number of vector dims to 2048 (#95257)
Currently Lucene limits the max number of vector dimensions to 1024.
This commit overrides KnnFloatVectorField and KnnByteVectorField
classes to increase the limit to 2048 for indexed vectors in ES.
2023-04-17 09:05:49 -04:00
Salvatore Campagna
0eeef45ea2
Synthetic source support for flattened fields (#94842)
Here we add synthetic source support for fields whose type is flattened.
Note that flattened fields and synthetic source have the following limitations,
all arising from the fact that in synthetic source we just see key/value pairs
when reconstructing the original object and have no type information in mappings:

* flattened fields use sorted set doc values of keywords, which means two things:
   first, we do not allow duplicate values; second, we treat all values as keywords
* reconstructing an array of objects results in nested objects (no array)
* reconstructing arrays with just one element results in a single-valued field, since we
   have no way to distinguish single-valued from multi-valued fields other than looking
   at the count of values
2023-04-11 10:54:28 +02:00
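The first and third limitations listed in #94842 above can be imitated in a few lines: values come back deduplicated, sorted, and as strings, and a lone value comes back as a scalar. `reconstruct_leaf` is a hypothetical stand-in for the synthetic source reconstruction, not the actual implementation.

```python
def reconstruct_leaf(values):
    """Imitate reading one flattened key back from sorted-set doc values.

    Every value is treated as a keyword (string); duplicates collapse and
    order follows the sorted set, not the original document.
    """
    unique_sorted = sorted({str(v) for v in values})
    if len(unique_sorted) == 1:
        # A one-element array cannot be told apart from a single value.
        return unique_sorted[0]
    return unique_sorted


print(reconstruct_leaf(["b", "a", "b"]))
print(reconstruct_leaf([7]))
```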
Jim Ferenczi
57cbbb3fcd
Minor ann docs update (#94783)
Replace the link to the deprecated knn search API and
add a link to the nightly benchmarks in Rally.
2023-03-31 17:59:25 +01:00
Alan Woodward
b2cf4757f3
Fix backwards description in runtime fields documentation (#94608) (#94642)
`runtime_mappings` is the name of the param in the search request. In the 
document `put` statement, it's called `runtime`

Co-authored-by: Matthew Hinea <matthew.hinea@gmail.com>
2023-03-22 11:53:35 -04:00
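The naming difference that #94642 above corrects is easiest to see side by side. The sketch below builds the two request bodies as Python dicts; the field name `day_of_week` and its script are illustrative assumptions, and only the key names `runtime` and `runtime_mappings` are the point.

```python
import json

# Hypothetical runtime field definition reused in both bodies.
day_of_week = {
    "type": "keyword",
    "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum"
                  ".getDisplayName(TextStyle.FULL, Locale.ENGLISH))"
    },
}

# In the index mapping, runtime fields live under `runtime`.
mapping_body = {"mappings": {"runtime": {"day_of_week": day_of_week}}}

# In a search request, the same definition goes under `runtime_mappings`.
search_body = {
    "runtime_mappings": {"day_of_week": day_of_week},
    "query": {"match_all": {}},
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```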
Ignacio Vera
397d52e24b
Allow docvalues-only search on geo_shape (#94396)
Allows searching on a geo_shape field when the field is not indexed (index: false) but doc values are enabled.
2023-03-08 16:30:06 +01:00
Hritik Kumar
f5af004117
Support ignore_malformed in boolean fields (#93239)
This PR enables the `ignore_malformed` parameter to be accepted as an option in
boolean field mappings. Support for synthetic source is not added yet, so if
`ignore_malformed` is set to true, synthetic source isn't supported.

Closes #89542
2023-02-21 18:22:10 +01:00
Przemyslaw Gomulka
b0ba832791
[doc] Mention dates_nanos in dates field type page (#93828) 2023-02-15 16:58:24 +01:00
Benjamin Trent
e8c5ed46c6
Fixing our docs for vector sizing calculation (#93703) 2023-02-13 07:52:53 -05:00