3798 commits

Author SHA1 Message Date
Chris Hegarty
19550a838f
Add dense vector off-heap stats to Node stats and Index stats APIs (#126704)
This change enhances the dense_vector section of the Node stats and Index stats APIs so that they report the desired size of off-heap memory for all indexed vectors. The dense_vector section of the Cluster stats API remains unchanged.

The retrieval mechanism and structure of the new stats are the same across the three stats APIs, but more fine-grained information is disclosed when moving from Cluster -> Node -> Index API.

For Node stats, we aggregate the total byte sizes for all vectors, categorised by the data type. For example:

"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1
  }
}
Index stats: same as Node stats, with an additional per-field breakdown. For example:

"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1,
    "fielddata" : {
      "bar" : {
        "veb_size_in_bytes" : 3,
        "vec_size_in_bytes" : 14,
        "vex_size_in_bytes" : 1
      },
      "foo" : {
        "vec_size_in_bytes" : 9
      }
    }
  }
}
The implementation accesses the actual statistics through reflection. This will be completely removed when Lucene exposes these statistics, which is expected in Lucene 10.3.
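The numbers in the Index stats example are consistent with the totals being plain sums of the per-field values. The sketch below checks that relationship; the helper name `aggregate_off_heap` is hypothetical, and the summing rule is an assumption read off the example, not a description of the actual implementation.

```python
# Per-field breakdown copied from the Index stats example above.
fielddata = {
    "bar": {"veb_size_in_bytes": 3, "vec_size_in_bytes": 14, "vex_size_in_bytes": 1},
    "foo": {"vec_size_in_bytes": 9},
}

def aggregate_off_heap(fields):
    """Sum per-field sizes into totals (hypothetical helper, assumed rule)."""
    # Start every category at zero so absent categories (veq here) still appear.
    totals = {f"total_{ext}_size_in_bytes": 0 for ext in ("veb", "vec", "veq", "vex")}
    for sizes in fields.values():
        for key, size in sizes.items():
            totals[f"total_{key}"] += size
    # The grand total is the sum over all categories.
    totals["total_size_in_bytes"] = sum(totals.values())
    return totals

off_heap_totals = aggregate_off_heap(fielddata)
print(off_heap_totals)
```

Under that assumption, the result matches the `off_heap` totals shown in both the Node stats and Index stats examples.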
2025-04-23 15:04:44 +01:00
Carlos Delgado
4d4b962fd1
Synonyms API - Add refresh parameter to check synonyms index and reload analyzers (#126935)
* Add timeout to SynonymsManagementAPIService put synonyms

* Remove replicas 0, as that may impact serverless

* Add timeout to put synonyms action, fix tests

* Fix number of replicas

* Remove cluster.health checks for synonyms index

* Revert debugging

* Add integration test for timeouts

* Use TimeValue instead of an int

* Add YAML tests and REST API specs

* Fix a validation bug in put synonym rule

* Spotless

* Update docs/changelog/126314.yaml

* Remove unnecessary checks for null

* Fix equals / HashCode

* Checks that timeout is passed correctly to the check health method

* Use correctly the default timeout

* spotless

* Add monitor cluster privilege to internal synonyms user

* [CI] Auto commit changes from spotless

* Add capabilities to avoid failing on bwc tests

* Replace timeout for refresh param

* Add param to specs

* Add YAML tests

* Fix changelog

* [CI] Auto commit changes from spotless

* Use BWC serialization tests

* Fix bug in test parser

* Spotless

* Delete doesn't need reloading 🤦 removing it

* Revert "Delete doesn't need reloading 🤦 removing it"

This reverts commit 9c8e0b62be.

* [CI] Auto commit changes from spotless

* Fix refresh for delete synonym rule

* Fix tests

* Update docs/changelog/126935.yaml

* Add reload analyzers test

* reload_analyzers is not available on serverless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-22 17:23:06 +02:00
James Baiera
7b89f4d4a6
Add ability to redirect ingestion failures on data streams to a failure store (#126973)
Removes the feature flags and guards that prevent the new failure store functionality 
from operating in production runtimes.
2025-04-18 16:33:03 -04:00
Quentin Pradet
1f68bfbc3e
Add back inference.inference API (#126601) 2025-04-11 14:09:51 +04:00
Josh Mock
5f871c5cf5
Remove reference to dropped EIS API (#126422)
Co-authored-by: Quentin Pradet <quentin.pradet@elastic.co>
2025-04-09 12:06:00 +04:00
Gal Lalouche
953b9fbb83
ESQL: List/get query API (#124832)
This PR adds two new REST endpoints, for listing queries and getting information on a current query.

* Resolves #124827 
* Related to #124828 (initial work)

Changes from the API specified in the above issues:
* The get API is pretty initial, as we don't have a way of fetching the memory used or number of rows processed.

List queries response:
```
GET /_query/queries
// returns for each of the running queries
// query_id, start_time, running_time, query

{ "queries" : {
 "abc": {
  "id": "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname"
  },
 "4321": {
  "id":"4321",
  "start_time_millis": 14585858823573,
  "running_time_nanos": 90231,
  "query": "FROM orders | LOOKUP country_code ON country"
  }
 } 
}
```

Get query response:
```
GET /_query/queries/abc

{
  "id": "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname",
  "coordinating_node": "oTUltX4IQMOUUVeiohTt8A",
  "data_nodes": [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7" ]
}
```
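As a rough illustration of how a client might consume the list response, the sketch below parses the example payload and ranks queries by running time. The payload is copied from the example above; the helper `longest_running` is hypothetical and no HTTP call is made.

```python
import json

# Example body of GET /_query/queries, copied from the commit message above.
list_response = json.loads("""
{ "queries": {
    "abc": {
      "id": "abc",
      "start_time_millis": 14585858875292,
      "running_time_nanos": 762794,
      "query": "FROM logs* | STATS BY hostname"
    },
    "4321": {
      "id": "4321",
      "start_time_millis": 14585858823573,
      "running_time_nanos": 90231,
      "query": "FROM orders | LOOKUP country_code ON country"
    }
  }
}
""")

def longest_running(queries):
    """Return query entries sorted by running time, longest first (hypothetical helper)."""
    return sorted(queries.values(), key=lambda q: q["running_time_nanos"], reverse=True)

ranked = longest_running(list_response["queries"])
for q in ranked:
    print(q["id"], q["running_time_nanos"] / 1_000_000, "ms")
```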
2025-04-08 22:21:32 +03:00
David Turner
527d2a203b
Improve handling of empty response (#125562)
Today `ActionResponse$Empty` implements `ToXContentObject`, but yields
no bytes of content when serialized which creates an invalid JSON
response. This commit removes the bogus interface and adjusts the
affected REST APIs to send a `text/plain` response instead.
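
To see why a zero-byte body is a problem for clients that expect JSON, consider this minimal sketch. It uses Python's standard `json` module purely as a stand-in for any strict JSON parser; `is_valid_json` is a hypothetical helper.

```python
import json

def is_valid_json(body: str) -> bool:
    """Return True if body parses as a JSON document (hypothetical helper)."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False

empty_ok = is_valid_json("")      # zero bytes of content
object_ok = is_valid_json("{}")   # smallest valid JSON object
print(empty_ok, object_ok)
```

An empty string is not a JSON value, so a strict parser rejects it, whereas `{}` parses fine.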
2025-04-07 12:10:07 +01:00
Jordan Powers
4c174a891f
Use Lucene101 postings format by default (#126080)
Update the PerFieldFormatSupplier so that new standard indices use the
Lucene101PostingsFormat instead of the current default ES812PostingsFormat.

Currently, use of the new codec is gated behind a feature flag.
2025-04-04 12:41:27 -07:00
Alexey Ivanov
fd7efe587e
[main] Move system indices migration to migrate plugin (#125437)
* [main] Move system indices migration to migrate plugin

It seems the best way to fix #122949 is to use existing data stream reindex API. However, this API is located in the migrate x-pack plugin. This commit moves the system indices migration logic (REST handlers, transport actions, and task) to the migrate plugin.

Port of #123551

* [CI] Auto commit changes from spotless

* Fix compilation

* Fix tests

* Fix test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-04 18:49:38 +01:00
Stanislav Malyshev
6043d9c675
Update allow_partial_results docs (#126257) 2025-04-03 22:13:49 -06:00
Niels Bauman
483f97915c
Run TransportGetIndexAction on local node (#125652)
This action solely needs the cluster state, so it can run on any node.
Since this is the last class/action that extends the `ClusterInfo`
abstract classes, we remove those classes too as they're not required
anymore.

Relates #101805
2025-04-02 18:41:35 +01:00
Mary Gouseti
25050495b9
Data stream options convert to javaRestTests to yamlRestTests. (#126037)
In this PR we introduce the data stream API in the `es-rest-api` using
the feature flag feature. This enables us to use the `yamlRestTests`
tests instead of the `javaRestTests`.
2025-04-03 01:32:54 +11:00
Niels Bauman
eb4d64f94a
Run TransportGetSettingsAction on local node (#126051)
This action solely needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

Relates #101805
2025-04-02 15:05:31 +01:00
Niels Bauman
8028d5adde
Fix cat allocation YAML test (#126003)
This test failed when the `disk.indices.forecast` value was a decimal number.
We adjust the regex to allow decimal values and for consistency we also allow negative values.

Fixes #125711
Fixes #125848
Fixes #125661
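
The commit does not show the adjusted regex, so the following is only a sketch of a pattern that accepts integers, decimals, and negative values. The pattern `NUMBER` is an assumption for illustration, not the one used in the YAML test.

```python
import re

# Hypothetical pattern: optional minus sign, digits, optional fractional part.
NUMBER = re.compile(r"-?\d+(\.\d+)?")

samples = ["0", "42", "12.5", "-1", "-0.75", "1.", "abc"]
# fullmatch requires the whole string to match, not just a prefix.
matches = {s: NUMBER.fullmatch(s) is not None for s in samples}
print(matches)
```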
2025-04-01 11:25:13 +01:00
Benjamin Trent
505f21ba42
Simplify tests, bypassing raw score test (#125877)
I was debating on having these tests in the original PR anyways. It ain't
worth the flakiness. We know the oversampling setting gets updated given
the other tests.

closes: https://github.com/elastic/elasticsearch/issues/125851
2025-03-31 23:49:29 +11:00
Armin Braun
fd2cc97541
Introduce batched query execution and data-node side reduce (#121885)
This change moves the query phase to a single roundtrip per node, just like can_match or field_caps already work.
As a result of executing multiple shard queries from a single request, we can also partially reduce each node's query results on the data node side before responding to the coordinating node.

As a result this change significantly reduces the impact of network latencies on the end-to-end query performance, reduces the amount of work done (memory and cpu) on the coordinating node and the network traffic by factors of up to the number of shards per data node!

Benchmarking shows up to orders of magnitude improvements in heap and network traffic dimensions in querying across a larger number of shards.
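
The idea of a data-node side partial reduce can be sketched with a simple top-k merge. This is a toy model under stated assumptions: hits are `(score, doc_id)` tuples, the reduce is a plain top-k by score, and the node and shard layout is made up. It is not the actual search implementation.

```python
import heapq

def top_k(hits, k):
    """Keep the k highest-scoring hits; each hit is a (score, doc_id) tuple."""
    return heapq.nlargest(k, hits, key=lambda h: h[0])

# Hypothetical layout: two data nodes, two shards each.
shards_by_node = {
    "node_a": [[(0.9, "a1"), (0.2, "a2")], [(0.7, "a3"), (0.6, "a4")]],
    "node_b": [[(0.8, "b1")], [(0.95, "b2"), (0.1, "b3")]],
}
k = 3

# Step 1: each data node reduces the results of its own shards to at most k hits.
per_node = {
    node: top_k([hit for shard in shards for hit in shard], k)
    for node, shards in shards_by_node.items()
}

# Step 2: the coordinating node merges one partial result per node.
final = top_k([hit for hits in per_node.values() for hit in hits], k)

# Reference: merging every shard result directly on the coordinating node.
direct = top_k(
    [hit for shards in shards_by_node.values() for shard in shards for hit in shard], k
)
print(final)
```

In this model the coordinating node receives at most `k` hits per node instead of per shard, and the final top-k is unchanged.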
2025-03-29 16:53:18 +01:00
Carlos Delgado
968bddc462
Non existing synonyms sets do not fail shard recovery (#125659) 2025-03-27 18:04:20 +02:00
Benjamin Trent
d84eb1f53f
Update bbq test data to better distinguish docs (#125705)
Adjust the test data. I verified that the scores are now more
distinguishable when:

 - each doc has its own segment
 - when 1 & 2 are in the same segment but 3 is alone
 - 2 & 3 in the same segment but 1 alone
 - 1 & 3 in the same segment but 2 alone
 - all three in the same segment

closes: https://github.com/elastic/elasticsearch/issues/123727
closes: https://github.com/elastic/elasticsearch/issues/124848
2025-03-28 00:12:56 +11:00
Benjamin Trent
dd58b0b6fa
Return appropriate error on null dims update instead of npe (#125716)
Calling `Object::toString` was trying to call `null.toString()`, really
it should have been `Objects::toString`, which accepts `null`.

closes: https://github.com/elastic/elasticsearch/issues/125713
2025-03-27 08:47:20 +11:00
Benjamin Trent
009a86a0e3
Allow zero for rescore_vector.oversample to indicate by-passing oversample and rescoring (#125599)
This allows a `rescore_vector: {oversample: 0}` to indicate bypassing
oversampling and rescoring. 

This is useful for:

 - Updating a quantized mapping to turn off automatic rescoring
 - Bypassing oversampling at query time in an ad-hoc manner if it's on by default in the mapping

closes: https://github.com/elastic/elasticsearch/issues/125157
2025-03-27 06:56:51 +11:00
Stanislav Malyshev
07921a78a6
Handle long overflow in dates (#124048)
* Handle long overflow in dates
2025-03-26 18:57:04 +02:00
Niels Bauman
fdd453734d
Fix NPE in rolling over unknown target and return 404 (#125352)
Since #122905 we were throwing NPEs (i.e. 5xxs) when a rollover request has an unknown/non-existent target. Before that, we returned a 400 - illegal argument exception. We now return a 404, which matches "missing target" better. Additionally, to prevent this from happening again, we add a YAML test that asserts the correct exception behavior.
2025-03-22 12:59:13 +02:00
Lisa Cawley
97c5d4e149
Add more inference API REST specifications (#125187) 2025-03-21 09:44:37 +02:00
Benjamin Trent
e9c4b267c2
Adjusting 41_knn_search_bbq_hnsw tests to have explicit refresh (#125255) 2025-03-20 17:15:05 -04:00
Tommaso Teofili
6d3dac32c6
Let random_score yaml test explicitly fail on _id field (#125230)
* constrain the no-field scenario to 9.x
2025-03-20 14:16:02 +01:00
István Zoltán Szabó
8a741bfd62
Adds VoyageAI PUT Inference API. (#125198) 2025-03-19 13:29:14 +01:00
Quentin Pradet
7070f3fdbe
Add missing cause param to indices.put_template API (#125189) 2025-03-19 14:57:30 +04:00
Lisa Cawley
b5bc681191
Add Mistral inference API (#125063) 2025-03-18 22:11:12 -07:00
István Zoltán Szabó
c3222aba74
Adds EIS inference PUT API (#125082) 2025-03-18 16:19:00 +01:00
Stanislav Malyshev
f0ee146f7f
Document allow_partial_results (#125044) 2025-03-17 12:37:10 -06:00
Quentin Pradet
0bacede6cc
Add missing OpenAI and Watsonx inference APIs (#124989) 2025-03-17 18:42:09 +04:00
Tommaso Teofili
51877bb33c
Add yaml test for random_score in function_score query (#124893) 2025-03-17 10:59:01 +01:00
Niels Bauman
481d91c428
Run TransportGetMappingsAction on local node (#122921)
This action solely needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

Relates #101805
2025-03-15 07:59:28 +00:00
Luca Cavanna
05c8453b2b
Remove search throttled index setting and thread pool (#124519)
Frozen indices, the freeze index API and the private index.frozen setting have been removed with #120539.

There is also a search throttled thread pool that can now be removed, as well as a private search.throttled index setting that is no longer used, as it could only be set internally by freezing an index.

While the index setting is private and can be removed, as it should no longer be present in any index on 9.0+ indices, the thread pool settings associated with the removed pool are still accepted as no-op in case users have customized them and are upgrading without removing these. These will also trigger a deprecation warning.

This change also removes the search.throttled related output from the thread pool section of the cluster info API.
2025-03-14 12:04:35 +01:00
Benjamin Trent
b2c1c4e0f0
New vector_rescore parameter as a quantized index type option (#124581)
This adds a new parameter to the quantized index mapping that allows
default oversampling and rescoring to occur. 

This doesn't adjust any of the defaults. It allows it to be configured.
When the user provides `rescore_vector: {oversample: <number>}` in the
query it will overwrite it.

For example, here is how to use it with bbq:

```
PUT rescored_bbq
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "index_options": {
          "type": "bbq_hnsw",
          "rescore_vector": {"oversample": 3.0}
        }
      }
    }
  }
}
```

Then, when querying, it will auto oversample the `k` by `3x` and rerank
with the raw vectors.

```
POST _search
{
  "knn": {
    "query_vector": [...],
    "field": "vector"
  }
}
```
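
The interplay between the mapping-level default and a query-level `rescore_vector` can be sketched as follows. The precedence rule (query overrides mapping) and the meaning of `0` (bypass, per commit 009a86a0e3 above) come from the commit messages; the function names and the `ceil(k * oversample)` candidate count are assumptions for illustration only.

```python
import math
from typing import Optional

def effective_oversample(mapping_value: Optional[float],
                         query_value: Optional[float]) -> Optional[float]:
    """Resolve the oversample factor; None means no oversampling or rescoring."""
    # A value given in the query overrides the mapping default.
    chosen = query_value if query_value is not None else mapping_value
    # Zero explicitly bypasses oversampling and rescoring.
    if chosen is None or chosen == 0:
        return None
    return chosen

def candidates_to_collect(k: int, oversample: Optional[float]) -> int:
    """Assumed rule: gather ceil(k * oversample) candidates before rescoring."""
    if oversample is None:
        return k
    return math.ceil(k * oversample)

from_mapping = effective_oversample(3.0, None)   # mapping default applies
bypassed = effective_oversample(3.0, 0)          # query turns rescoring off
print(candidates_to_collect(10, from_mapping), candidates_to_collect(10, bypassed))
```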
2025-03-14 00:40:08 +11:00
Benjamin Trent
a1ee3c9291
Have create index return a bad request on poor formatting (#123761)
closes: https://github.com/elastic/elasticsearch/issues/123661
2025-03-07 04:24:54 +11:00
Jonathan Buttner
2a006ec1f4
Updating description of stream API (#124209) 2025-03-06 15:34:21 +01:00
Jonathan Buttner
3a472ebae9
[ML] Update inference api rest spec (#124151)
* Pulling api spec changes

* Fixing test and updating code javadoc
2025-03-06 08:26:49 -05:00
Benjamin Trent
a92b1d6892
Adjust exception thrown when unable to load hunspell dict (#123743)
On index creation, it's possible to configure a hunspell analyzer, but
reference a locale file that actually doesn't exist or isn't accessible.

This error, like our other user dictionary errors, should be an IAE not
an ISE.

closes: https://github.com/elastic/elasticsearch/issues/123729
2025-03-06 06:19:21 +11:00
David Kyle
c3e7493d7a
[ML] Remove deprecated routes for ml trained models APIs (#124019)
The 7.x routes for ml trained models _ml/inference/ have been deprecated
since 8 and replaced with _ml/trained_models. Also removes query 
parameters that are no longer supported.
2025-03-05 16:09:37 +00:00
Martijn van Groningen
26de5343a2
Remove synthetic recovery source feature flag. (#122615)
This feature flag controls whether synthetic recovery source is enabled by default when the source mode is synthetic.

The synthetic recovery source feature itself is already available via the index.recovery.use_synthetic_source index setting and can be enabled by anyone using synthetic source.

The index.recovery.use_synthetic_source setting defaults to true when index.mapping.source.mode is synthetic. The index.mapping.source.mode setting defaults to synthetic if index.mode is logsdb or time_series.

In other words, with this change synthetic recovery source will be enabled by default for logsdb and tsdb.
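
The chain of defaults described above can be written out as a small sketch. The setting names come from the commit message; the helper functions, the `standard` index mode fallback, and the `stored` source mode fallback are assumptions made for illustration.

```python
def default_source_mode(index_mode: str) -> str:
    """Assumed default: synthetic for logsdb and time_series, stored otherwise."""
    return "synthetic" if index_mode in ("logsdb", "time_series") else "stored"

def use_synthetic_recovery_source(settings: dict) -> bool:
    """Resolve index.recovery.use_synthetic_source, falling back to its default."""
    index_mode = settings.get("index.mode", "standard")
    source_mode = settings.get("index.mapping.source.mode", default_source_mode(index_mode))
    # An explicit setting wins; otherwise the default follows the source mode.
    return settings.get("index.recovery.use_synthetic_source", source_mode == "synthetic")

logsdb_default = use_synthetic_recovery_source({"index.mode": "logsdb"})
standard_default = use_synthetic_recovery_source({})
print(logsdb_default, standard_default)
```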

Closes #116726
2025-03-05 15:43:33 +01:00
Rene Groeschke
496c38e5a5
Reapply "Update Gradle wrapper to 8.13 (#122421)" (#123889) (#123896)
This reverts commit 36660f2e5f.
2025-03-05 08:02:13 +01:00
Rene Groeschke
36660f2e5f
Revert "Update Gradle wrapper to 8.13 (#122421)" (#123889)
This reverts commit e19b2264af.
2025-03-03 15:51:07 +01:00
Rene Groeschke
e19b2264af
Update Gradle wrapper to 8.13 (#122421)
* Fix Gradle Deprecation warning as declaring an is- property with a Boolean type has been deprecated.
* Make use of new layout.settingsFolder api to address some cross project references
* Fix buildParams snapshot check for multiproject projects
2025-03-03 14:10:00 +01:00
Kathleen DeRusso
ae6474db63
Deprecate Behavioral Analytics CRUD apis (#122960)
* Deprecate Behavioral Analytics CRUD APIs

* Add allowed warning for REST Compatibility tests

* Update docs/changelog/122960.yaml

* Update changelog

* Update docs to add deprecation flags and fix failing tests

* Update changelog

* Update changelog again

* Update docs formatting

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Skip asciidoc test

---------

Co-authored-by: Efe Gürkan YALAMAN <efeyalaman@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Efe Gürkan YALAMAN <efeguerkan.yalaman@elastic.co>
2025-02-25 16:02:50 +01:00
Samiul Monir
5664f4f2ba
Improved error message when index field type is unknown (#122860)
* Updating error message when index field type is unknown

* Fix style issue

* Add yaml test for invalid field type error message

* Update docs/changelog/122860.yaml

* Updating error message for runtime and multi field type parser

* add and fix yaml tests

* Fix code styles by running spotlessApply

* Update changelog

* Updating the test in yml

* Updating error message for runtime

* Fix failing yaml tests

* Update error message to Fix unit tests

* fix serverless qa test

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-02-24 13:16:22 -05:00
Quentin Pradet
17ed01471b
Add missing body to ML rest-api-spec API (#123235) 2025-02-24 19:56:01 +04:00
Quentin Pradet
d8284fba1a
Fix cat APIs query parameters (#123020) 2025-02-21 14:46:05 +04:00
Martijn van Groningen
43665f0a35
Store arrays offsets for keyword fields natively with synthetic source (#113757)
The keyword doc values field gets an extra sorted doc values field, that encodes the order of how array values were specified at index time. This also captures duplicate values. This is stored in an offset to ordinal array that gets zigzag vint encoded into a sorted doc values field.

For example, the string array ["c", "b", "a", "c"] for a keyword field yields sorted set doc values ["a", "b", "c"] with ordinals 0, 1 and 2. The offset array will be: [2, 1, 0, 2]

Null values are also supported. For example, ["c", "b", null, "c"] results in sorted set doc values ["b", "c"] with ordinals 0 and 1. The offset array will be: [1, 0, -1, 1]

Empty arrays are also supported by encoding a zigzag vint array of zero elements.
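
The offset construction described above can be reproduced with a short sketch. The function names are hypothetical, and the zigzag step only shows how signed offsets such as `-1` map to unsigned values; the actual on-disk vint layout is not modeled here.

```python
def encode_array(values):
    """Return (sorted unique values, offset-to-ordinal array); None maps to -1."""
    ordered = sorted({v for v in values if v is not None})
    ordinal = {v: i for i, v in enumerate(ordered)}
    offsets = [-1 if v is None else ordinal[v] for v in values]
    return ordered, offsets

def zigzag(n: int) -> int:
    """Zigzag-encode a signed 64-bit integer so small negatives stay small."""
    return (n << 1) ^ (n >> 63)

doc_values, offsets = encode_array(["c", "b", "a", "c"])
null_values, null_offsets = encode_array(["c", "b", None, "c"])
print(doc_values, offsets)
print(null_values, null_offsets, [zigzag(o) for o in null_offsets])
```

Both results match the examples in the commit message: duplicates are preserved through repeated ordinals, and the original order is recoverable from the offsets alone.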

Limitations:

* Currently only doc values based array support for the keyword field mapper.
* Multi-level leaf arrays are flattened. For example: [[b], [c]] -> [b, c]
* Arrays are always synthesized as one type. In case of a keyword field, [1, 2] gets synthesized as ["1", "2"].

These limitations can be addressed, but some require more complexity and/or additional storage.

With this PR, keyword field array will no longer be stored in ignored source, but array offsets are kept track of in an adjacent sorted doc value field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).
2025-02-20 09:20:49 +01:00
Niels Bauman
618de4855d
Remove local param from field mapping API spec (#122922)
The `local` param for the `GetFieldMapping` API was deprecated in #55014
and I think #57265 aimed to propagate that deprecation to the REST API
spec, but it changed `get_mapping.json` instead of
`get_field_mapping.json`. #55100 removed the `local` param for the
_field_ mapping API so we can safely remove the field from the spec and
remove the YAML test.
2025-02-19 22:51:11 +01:00