Commit graph

3822 commits

Author SHA1 Message Date
Quentin Pradet
c473da5e64
Fix rest-api-spec and docs for bulk API (#118415) 2024-12-11 12:38:42 +04:00
Carlos Delgado
59967727cf
kNN vector rescoring for quantized vectors (#116663) 2024-12-11 09:14:18 +01:00
Keith Massey
ae598ee513
Adding get migration reindex status (#118267)
This adds a new transport action to get the status of a migration
reindex (started via the API at #118109), and a new rest action to use
it. The rest action accepts the data stream or index name, and returns
the status. For example if the reindex task exists for data stream
`my-data-stream`:

```
GET /_migration/reindex/my-data-stream/_status?pretty
```

returns

```
{
  "start_time" : 1733519098570,
  "complete" : true,
  "total_indices" : 1,
  "total_indices_requiring_upgrade" : 0,
  "successes" : 0,
  "in_progress" : 0,
  "pending" : 0,
  "errors" : [ ]
}
```

If a reindex task does not exist:

```
GET /_migration/reindex/my-data-stream/_status?pretty
```

Then a 404 is returned:

```
{
  "error" : {
    "root_cause" : [
      {
        "type" : "resource_not_found_exception",
        "reason" : "No migration reindex status found for [my-data-stream]"
      }
    ],
    "type" : "resource_not_found_exception",
    "reason" : "No migration reindex status found for [my-data-stream]"
  },
  "status" : 404
}
```
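A minimal standard-library Python client sketch for the status endpoint described above. It assumes an unauthenticated cluster at a base URL such as `http://localhost:9200` (hypothetical) and treats a 404 as "no reindex task exists", following the responses shown above. The helper names are hypothetical. The endpoint may also sit behind the reindex data streams feature flag added in #118109.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def reindex_status_url(base_url: str, name: str) -> str:
    """Build the status URL for a data stream or index name."""
    # Quote the name so it is safe to embed as a single path segment.
    return f"{base_url.rstrip('/')}/_migration/reindex/{urllib.parse.quote(name, safe='')}/_status"


def parse_status(http_status: int, body: str) -> Optional[dict]:
    """Return the decoded status, or None when no reindex task exists."""
    if http_status == 404:
        # resource_not_found_exception: nothing was started for this name.
        return None
    if http_status != 200:
        raise RuntimeError(f"unexpected HTTP status {http_status}: {body}")
    return json.loads(body)


def get_reindex_status(base_url: str, name: str) -> Optional[dict]:
    """Fetch the status over HTTP (no authentication; hypothetical setup)."""
    request = urllib.request.Request(reindex_status_url(base_url, name), method="GET")
    try:
        with urllib.request.urlopen(request) as response:
            return parse_status(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; route those through the same parser.
        return parse_status(err.code, err.read().decode("utf-8"))
```

For example, `get_reindex_status("http://localhost:9200", "my-data-stream")` would return the status dict, or `None` if no task was started for that name.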
2024-12-11 03:34:13 +11:00
Benjamin Trent
645657cc56
Remove old _knn_search tech preview API in v9 (#118104)
Removes the old `_knn_search` API, which never left tech preview and was
deprecated throughout the v8 cycle.

The API remains reachable through REST API compatibility by requesting
`compatible-with=8`.
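A Python sketch of how a client might opt into v8 compatibility, assuming the usual REST API compatibility mechanism of sending the `application/vnd.elasticsearch+json; compatible-with=8` media type in both `Accept` and `Content-Type`. The index name, field name, and request body below are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def compat_headers(major: int = 8) -> dict:
    """Headers asking Elasticsearch to apply REST API compatibility."""
    media_type = f"application/vnd.elasticsearch+json; compatible-with={major}"
    return {"Accept": media_type, "Content-Type": media_type}


def knn_search_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a v8-compatible _knn_search request."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_knn_search",
        data=json.dumps(body).encode("utf-8"),
        headers=compat_headers(8),
        method="POST",
    )


# Hypothetical index, field, and vector values, for illustration only.
req = knn_search_request(
    "http://localhost:9200",
    "my-index",
    {"knn": {"field": "my_vector", "query_vector": [0.1, 0.2, 0.3], "k": 10, "num_candidates": 100}},
)
```

Without the compatibility headers, a v9 cluster would no longer route this request.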
2024-12-11 02:01:25 +11:00
Benjamin Trent
31678a377d
Rename multi-dense vector to rank vectors (#118183)
Renames the `multi_dense_vector` field mapper and related code to
`rank_vectors` to better describe its restricted usage.
2024-12-10 05:35:21 +11:00
Benjamin Trent
5e859d9301
Even better(er) binary quantization (#117994)
This measurably improves BBQ by changing the underlying algorithm to an
optimized per-vector scalar quantization.

This is a brand new way to quantize vectors. Instead of there being a
global set of upper and lower quantile bands, these are optimized and
calculated per individual vector. Additionally, vectors are centered on
a common centroid. 

This allows for an almost 32x reduction in memory, and even better
recall than before at the cost of slightly increasing indexing time.

Additionally, this new approach is easily generalizable to various other
bit sizes (e.g. 2 bits, etc.). While not taken advantage of yet, we may
update our scalar quantized indices in the future to use this new
algorithm, giving significant boosts in recall.

Recall gains range from 2% to almost 10% on certain datasets, at an
additional 5-10% indexing cost when indexing with HNSW compared with
the current BBQ.
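A simplified Python sketch of the idea described above. Vectors are centered on a shared centroid, and each vector then gets its own quantization interval. For brevity this assumes the interval is just the min and max of the centered vector; the actual change optimizes these bounds per vector, which this sketch does not attempt. The `bits` parameter shows how the same scheme extends to 2 bits or more. All names are hypothetical.

```python
from typing import List, Tuple


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean shared by all vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def quantize(vec: List[float], center: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Center vec, then scalar-quantize it with a per-vector [lower, upper] interval."""
    centered = [x - c for x, c in zip(vec, center)]
    lower, upper = min(centered), max(centered)  # simplified per-vector bounds
    levels = (1 << bits) - 1
    step = (upper - lower) / levels if upper > lower else 1.0
    # Round to the nearest level and clamp against floating-point overshoot.
    codes = [min(levels, max(0, round((x - lower) / step))) for x in centered]
    return codes, lower, step


def dequantize(codes: List[int], lower: float, step: float, center: List[float]) -> List[float]:
    """Approximate reconstruction: undo the interval mapping, then re-add the centroid."""
    return [lower + q * step + c for q, c in zip(codes, center)]
```

Because each vector stores only its codes plus two floats (`lower`, `step`), the interval adapts to that vector instead of relying on global quantile bands.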
2024-12-10 03:06:27 +11:00
Keith Massey
8107cc9e5e
Adding reindex data stream rest action (#118109)
* Adding a _migration/reindex endpoint

* Adding rest api spec and test

* Adding a feature flag for reindex data streams

* updating json spec

* fixing a typo

* Changing mode to an enum

* Moving ParseFields into public static finals

* Commenting out test that leaves task running, until we add a cancel API

* Removing persistent task id from output

* replacing a string with a variable
2024-12-09 10:56:09 +11:00
Jim Ferenczi
0901a2734e
Add option to store sparse_vector outside _source (#117917)
This PR introduces an option for `sparse_vector` to store its values separately from `_source` by using term vectors.
This capability is primarily needed by the semantic text field.
2024-12-04 17:29:46 +00:00
Niels Bauman
032b42fcf7
Make TransportLocalClusterStateAction wait for cluster to unblock (#117230)
This will make `TransportLocalClusterStateAction` wait for a new state
that is not blocked. This means we need a timeout (again). For
consistency's sake, we're reusing the REST param `master_timeout` for
this timeout as well.

The only class that was using `TransportLocalClusterStateAction` was
`TransportGetAliasesAction`, so its request needed to accept a timeout
again as well.
2024-12-04 12:17:13 +01:00
Kostas Krikellas
f2addbc69a
Parse the contents of dynamic objects for [subobjects:false] (#117762)
* Parse the contents of dynamic objects for [subobjects:false]

* Update docs/changelog/117762.yaml

* add tests

* tests

* test dynamic field

* test dynamic field

* fix tests
2024-12-03 18:10:30 +00:00
Dimitris Rempapis
a514aad3c2
Fix/meta fields bad request (#117229)
A 400 error, rather than a 5xx error, is now returned when `_source`, `_seq_no`, `_feature`, `_nested_path`, or `_field_names` is requested via `fields`.
2024-12-03 10:58:20 +02:00
Quentin Pradet
e19f2b7fbb
Remove unsupported async_search parameters from rest-api-spec (#117626) 2024-11-29 17:22:37 +04:00
David Turner
17d280363c
Add YAML test for status in indices stats (#116711)
The feature added in #81954 lacks coverage in BwC situations. This
commit adds a YAML test to address that.
2024-11-29 09:54:38 +00:00
Martijn van Groningen
6a4b68d263
Add source mode stats to MappingStats (#117463) 2024-11-28 10:53:39 +01:00
Carlos Delgado
930a99cc38
Fix and unmute synonyms tests using timeout (#117486) 2024-11-25 20:19:24 +01:00
Quentin Pradet
2f8bb0b23c
Add missing async_search query parameters to rest-api-spec (#117312) 2024-11-25 11:43:36 +04:00
Panagiotis Bailis
7c18f1108d
Adding missing json spec for allow_partial_search_results in point-in-time (#117121) 2024-11-21 13:25:43 +02:00
Nhat Nguyen
fe7818af04
Deprecate _source.mode in mappings (#117172)
Re-introduce #116689
2024-11-20 10:27:44 -08:00
Martijn van Groningen
ac06a84e0a
Revert "Deprecate _source.mode in mappings (#116689)" (#117150)
This reverts commit 0d7b90e22a, because of bwc testing failures.
2024-11-20 13:38:26 +01:00
Nhat Nguyen
0d7b90e22a
Deprecate _source.mode in mappings (#116689)
This change deprecates _source.mode in mappings, replacing it with the 
index.mapping.source.mode index setting.
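A small Python sketch of what migrating a create-index request body might look like under this change: a `_source.mode` value in the mappings moves to the `index.mapping.source.mode` index setting. The helper name and the example body are hypothetical; this is not code from the commit.

```python
import copy


def migrate_source_mode(create_index_body: dict) -> dict:
    """Move mappings._source.mode into the index.mapping.source.mode setting."""
    body = copy.deepcopy(create_index_body)  # leave the caller's dict untouched
    source = body.get("mappings", {}).get("_source", {})
    mode = source.pop("mode", None)
    if mode is not None:
        body.setdefault("settings", {})["index.mapping.source.mode"] = mode
        if not source:
            # Drop the now-empty _source block from the mappings.
            body["mappings"].pop("_source")
    return body


old_body = {"mappings": {"_source": {"mode": "synthetic"}, "properties": {"msg": {"type": "text"}}}}
new_body = migrate_source_mode(old_body)
```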
2024-11-19 17:53:52 -08:00
Mayya Sharipova
f9c5bc0b06
Remove legacy params from range query (#116970)
Removes the `to`, `from`, `include_lower`, and `include_upper` range query params.
These params were removed from our documentation in v0.90.4 (d6ecdec)
and deprecated in 8.16 in #113286.
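A hypothetical Python helper showing how the removed params map onto the remaining `gt`/`gte`/`lt`/`lte` bounds. It assumes the legacy semantics: both `include_*` flags default to `true`, and a `null` bound means unbounded. It rewrites only the per-field body of a `range` query.

```python
LEGACY_KEYS = {"from", "to", "include_lower", "include_upper"}


def convert_legacy_range(params: dict) -> dict:
    """Rewrite legacy range params (from/to/include_*) into gt/gte/lt/lte."""
    # Keep any other options (e.g. boost, format) as they are.
    out = {k: v for k, v in params.items() if k not in LEGACY_KEYS}
    # Legacy defaults: both bounds are inclusive unless stated otherwise.
    include_lower = params.get("include_lower", True)
    include_upper = params.get("include_upper", True)
    if params.get("from") is not None:
        out["gte" if include_lower else "gt"] = params["from"]
    if params.get("to") is not None:
        out["lte" if include_upper else "lt"] = params["to"]
    return out


legacy = {"from": 10, "to": 20, "include_upper": False, "boost": 2.0}
modern = convert_legacy_range(legacy)  # {"boost": 2.0, "gte": 10, "lt": 20}
```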
2024-11-19 15:18:31 -05:00
David Turner
c72d5fdf1c
Revert "Index stats enhancement: creation date and tier_preference (#116339)" (#116959)
This reverts commit e0af1238fc.
2024-11-18 17:45:40 +01:00
Kostas Krikellas
c1c59eb41d
Rename tsdb integration test (#116909)
The current name doesn't allow skipping it to work around compatibility
test failures:

```
> Task :rest-api-spec:yamlRestCompatTestTransform FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':rest-api-spec:yamlRestCompatTestTransform'.
> class com.fasterxml.jackson.databind.node.ObjectNode cannot be cast to class com.fasterxml.jackson.databind.node.ArrayNode (com.fasterxml.jackson.databind.node.ObjectNode and com.fasterxml.jackson.databind.node.ArrayNode are in unnamed module of loader org.gradle.internal.classloader.VisitableURLClassLoader$InstrumentingVisitableURLClassLoader @15eaac09)
```
2024-11-18 19:42:33 +11:00
David Kyle
590b75df21
[ML] Set inference API stability to stable (#116828) 2024-11-15 08:10:28 +00:00
Alexis Charveriat
e0af1238fc
Index stats enhancement: creation date and tier_preference (#116339)
* Expose tier preference as part of the index stats
* Also expose index creation date in index stats
* Added test
2024-11-15 09:08:42 +01:00
Niels Bauman
103a8b0960
Avoid ignoring yaml tests for retrieving index templates (#116446)
The `skip` caused the tests to be ignored instead of included.
2024-11-13 10:33:14 +01:00
Benjamin Trent
7369c0818d
Add new multi_dense_vector field for brute-force search (#116275)
This adds a new `multi_dense_vector` field that focuses on the maxSim
use case used by ColBERT and ColPali.

Indexing these vectors in HNSW, as it stands, makes no sense, either for
performance or for cost. However, we should fully support rescoring and
brute-force search over vectors with maxSim.

This is step one of many. Behind a feature flag, this adds support for
indexing any number of vectors of the same dimension.

Supports bit/byte/float.

Scripting support will be a follow up.

Marking as non-issue as it's behind a flag and currently unusable.
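A brute-force maxSim sketch in Python, assuming dot-product similarity. For each query vector, take its best match among a document's vectors, then sum those maxima. Documents are ranked by that score with no HNSW graph involved. The document IDs and vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vectors: List[Vector], doc_vectors: List[Vector]) -> float:
    """Sum over query vectors of the best dot product against any doc vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def top_k(query_vectors: List[Vector], docs: Dict[str, List[Vector]], k: int) -> List[str]:
    """Brute force: score every document, return the k best IDs."""
    return sorted(docs, key=lambda doc_id: max_sim(query_vectors, docs[doc_id]), reverse=True)[:k]


query = [[1.0, 0.0], [0.0, 1.0]]
docs = {"a": [[1.0, 0.0], [0.0, 0.0]], "b": [[0.9, 0.9]]}
```

Each document may hold any number of vectors, but all must share the query's dimension.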
2024-11-09 01:14:19 +11:00
Rassyan
c8a8d4d931
Add docvalue_fields Support for dense_vector Fields (#114484)
Currently, `dense_vector` fields don't support `docvalue_fields`.

This adds that support for debugging purposes: users can inspect the
raw values of their vectors even if the source is disabled.

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-11-07 17:24:39 -05:00
Quentin Pradet
90600bf7a3
Add missing header in put_data_lifecycle rest-api-spec (#116292) 2024-11-07 06:40:19 +04:00
Carlos Delgado
e59407251b
Synonyms test fix - update number of shards (#116224) 2024-11-06 09:52:26 +01:00
Kostas Krikellas
89bd58e04b
[TEST] Restore rest compat tests (#116229)
* Track source for objects and fields with [synthetic_source_keep:arrays] in arrays as ignored

* Update TransportResumeFollowActionTests.java

* rest compat fixes

* rest compat fixes

* update test

* Restore rest compat tests
2024-11-05 12:30:42 +02:00
Kostas Krikellas
6cf45366d5
Track source for objects and fields with [synthetic_source_keep:arrays] in arrays as ignored (#116065)
* Track source for objects and fields with [synthetic_source_keep:arrays] in arrays as ignored

* Update TransportResumeFollowActionTests.java

* rest compat fixes

* rest compat fixes

* update test
2024-11-04 11:32:43 +01:00
Kostas Krikellas
4573ab8ec1
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests - take 2 (#116072)
* Reapply "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)

This reverts commit e8bf344a28.

* [TEST] Replace _source.mode with index.mapping.source.mode in integration tests

* add reason

* add reason

* spotless

* revert unneeded
2024-11-04 09:39:34 +02:00
Oleksandr Kolomiiets
edd4ebf1af
Fix new logsdb tests (#116108) 2024-11-01 13:29:43 -07:00
Artem Prigoda
d93d333141
Remove checking of sync commit ids (#114246)
A Lucene commit (`SegmentInfos`) no longer contains sync ids, so we can't rely on them during recovery. The field was marked as deprecated in #102343.
2024-11-01 16:18:12 +01:00
Pete Gillin
64e4c38708
Remove code for ?verbose in _segments API (#116030)
A change made in 8.0 intended to deprecate this parameter. However,
because the new code only checked for the presence of the parameter
and never consumed it, the effect was actually to remove support for
the parameter. This code therefore basically does nothing and can be
removed.
2024-11-01 09:29:07 +00:00
Kostas Krikellas
e8bf344a28
Revert "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)
This reverts commit a360757968.
2024-11-01 10:53:08 +02:00
Kostas Krikellas
a360757968
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests (#115926)
* Replace _source.mode with index.mapping.source.mode in integration tests

* fix tests

* revert 40_source_mode_setting.yml
2024-11-01 09:46:06 +02:00
Salvatore Campagna
3cbbcc5748
Default LogsDB value for ignore_dynamic_beyond_limit (#115265)
When ingesting logs, it's important to ensure that documents are not dropped due to mapping issues, including when dealing with dynamically mapped fields. Elasticsearch provides two key settings that help manage the total number of field mappings and handle situations where this limit might be exceeded:

1. **`index.mapping.total_fields.limit`**: This setting defines the maximum number of fields allowed in an index. If this limit is reached, any further mapped fields would cause indexing to fail.

2. **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: This setting determines whether Elasticsearch should ignore any dynamically mapped fields that exceed the limit defined by `index.mapping.total_fields.limit`. If set to `false`, indexing will fail once the limit is surpassed. However, if set to `true`, Elasticsearch will continue indexing the document but will silently ignore any additional dynamically mapped fields beyond the limit.

To prevent indexing failures due to dynamic mapping issues, especially in logs where the schema might change frequently, we change the default value of **`index.mapping.total_fields.ignore_dynamic_beyond_limit` from `false` to `true` in LogsDB**. This change ensures that even when the number of dynamically mapped fields exceeds the set limit, documents will still be indexed, and additional fields will simply be ignored rather than causing an indexing failure.

This adjustment is important for LogsDB, where dynamically mapped fields may be common and we want to avoid dropping documents.
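A toy Python model of the two behaviors described above, for illustration only. It treats every new dynamic field as exactly one mapped field, which simplifies how Elasticsearch actually counts toward `index.mapping.total_fields.limit`. The function name and error message are hypothetical and are not Elasticsearch's.

```python
from typing import List, Tuple


def apply_dynamic_fields(
    mapped_count: int,
    new_fields: List[str],
    limit: int,
    ignore_dynamic_beyond_limit: bool,
) -> Tuple[List[str], List[str]]:
    """Return (newly mapped, ignored) fields, or raise if the limit blocks indexing."""
    mapped: List[str] = []
    ignored: List[str] = []
    for name in new_fields:
        if mapped_count + len(mapped) < limit:
            mapped.append(name)
        elif ignore_dynamic_beyond_limit:
            # LogsDB default (true): keep indexing the document, skip the field.
            ignored.append(name)
        else:
            # Previous default (false): the whole document fails to index.
            raise ValueError(f"field [{name}] exceeds total fields limit [{limit}] (illustrative)")
    return mapped, ignored
```

With 998 fields already mapped and a limit of 1000, `apply_dynamic_fields(998, ["a", "b", "c"], 1000, True)` maps `a` and `b` and ignores `c`; with the flag set to `false`, the same call raises.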
2024-10-31 15:54:42 +01:00
Stanislav Malyshev
5f4e681788
Fix CCS stats test (#115801)
Sets index stats to be refreshed immediately, since a cached size of 0
may be why the test fails.

Fixes #115600
2024-10-31 03:37:24 +11:00
Kostas Krikellas
06eb0727c2
Use flattened names in ignored source (#115822)
* Use flattened names in ignored source

* spotless

* fix rest compat

* fix unittests

* expand dots
2024-10-29 20:12:43 +01:00
Artem Prigoda
ef2cf37a6d
Revert "Don't return or accept node_version in the Desired Nodes API (#114580)" (#115829)
This reverts commit c64226c350.
2024-10-29 12:03:19 +01:00
John Wagster
d4ac705d57
[CI] MixedClusterClientYamlTestSuiteIT test {p0=range/20_synthetic_source/Date range} failing - Removed Old Date range test because it's no longer validating useful code (#114057)
unmuting test and removing bwc test to get mixedClusterTest working
2024-10-25 09:26:51 -05:00
Nhat Nguyen
5714b989fa
Do not run lookup index YAML with two shards (#115608)
We can randomly inject a global template that defaults to 2 shards
instead of 1. This causes the lookup index YAML tests to fail. To avoid
this, the change requires specifying the default_shards setting for
these tests.
2024-10-24 16:58:41 -07:00
Nhat Nguyen
f444c86f85
Add lookup index mode (#115143)
This change introduces a new index mode, lookup, for indices intended 
for lookup operations in ES|QL. Lookup indices must have a single shard
and be replicated to all data nodes by default. Aside from these
requirements, they function as standard indices. Documentation will be
added later when the lookup operator in ES|QL is implemented.
2024-10-24 13:47:20 -07:00
Artem Prigoda
c64226c350
Don't return or accept node_version in the Desired Nodes API (#114580)
It was deprecated in #104209 (8.13) and shouldn't be set or returned in 9.0.

The Desired Nodes API is an internal API, and users shouldn't depend on its backward compatibility.
2024-10-24 18:19:14 +02:00
Pete Gillin
d7a9575d03
Remove deprecated local parameter from alias APIs (#115393)
This removes the `local` parameter from the `GET /_alias`, `HEAD /_alias`, and `GET /_cat/aliases` APIs. This option became a no-op and was deprecated in 8.12 by https://github.com/elastic/elasticsearch/pull/101815.

We continue to accept the parameter (deprecated, with no other effect) in v8 compatibility mode for `GET /_alias` and `HEAD /_alias`. We don't do this for `GET /_cat/aliases` where the [compatibility policy does not apply](https://github.com/elastic/elasticsearch/blob/main/REST_API_COMPATIBILITY.md#when-not-to-apply).
2024-10-24 15:58:24 +01:00
Oleksandr Kolomiiets
f04bf5c356
Apply workaround for synthetic source of object arrays inside nested objects (#115275) 2024-10-23 13:22:26 -07:00
Carlos Delgado
ba7d0954ef
Fix synonyms CI tests timeout (#114641) 2024-10-23 07:57:32 +02:00
Stanislav Malyshev
ffcd62e32b
Fix test - times can be 0 sometimes (#115260) 2024-10-21 12:01:46 -06:00