This PR adds two new REST endpoints, for listing queries and getting information on a current query.
* Resolves #124827
* Related to #124828 (initial work)
Changes from the API specified in the above issues:
* The get API is still preliminary, as we don't yet have a way of fetching the memory used or the number of rows processed.
List queries response:
```
GET /_query/queries
// returns for each of the running queries
// query_id, start_time, running_time, query
{ "queries" : {
"abc": {
"id": "abc",
"start_time_millis": 14585858875292,
"running_time_nanos": 762794,
"query": "FROM logs* | STATS BY hostname"
},
"4321": {
"id":"4321",
"start_time_millis": 14585858823573,
"running_time_nanos": 90231,
"query": "FROM orders | LOOKUP country_code ON country"
}
}
}
```
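As a rough illustration of how a client might consume the list response, the sketch below parses the sample body shown above and converts `running_time_nanos` to milliseconds. The helper name `summarize_queries` is hypothetical and not part of this PR.

```python
import json

# Sample body copied from the list-queries example above.
LIST_RESPONSE = """
{ "queries": {
    "abc": {"id": "abc", "start_time_millis": 14585858875292,
            "running_time_nanos": 762794,
            "query": "FROM logs* | STATS BY hostname"},
    "4321": {"id": "4321", "start_time_millis": 14585858823573,
             "running_time_nanos": 90231,
             "query": "FROM orders | LOOKUP country_code ON country"}
  }
}
"""


def summarize_queries(body: str) -> list[tuple[str, float]]:
    """Return (query id, running time in ms) pairs, longest-running first.

    Hypothetical client-side helper; only the field names come from the
    response format described in this PR.
    """
    queries = json.loads(body)["queries"]
    rows = [(qid, q["running_time_nanos"] / 1_000_000) for qid, q in queries.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```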
Get query response:
```
GET /_query/queries/abc
{
"id" : "abc",
"start_time_millis": 14585858875292,
"running_time_nanos": 762794,
"query": "FROM logs* | STATS BY hostname",
"coordinating_node": "oTUltX4IQMOUUVeiohTt8A",
"data_nodes" : [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7"]
}
```
Today `ActionResponse$Empty` implements `ToXContentObject`, but yields
no bytes of content when serialized, which creates an invalid JSON
response. This commit removes the bogus interface and adjusts the
affected REST APIs to send a `text/plain` response instead.
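To see why a zero-byte body is a problem when the response claims to be JSON, here is a small client-side sketch. The helper `is_valid_json` is hypothetical and only illustrates the parsing failure, not the server-side change.

```python
import json


def is_valid_json(body: str) -> bool:
    """Return True if `body` parses as a JSON document."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False
```

An empty string fails to parse, while `{}` is the smallest valid JSON object, so an empty body is only correct under a non-JSON content type such as `text/plain`.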
Update the PerFieldFormatSupplier so that new standard indices use the
Lucene101PostingsFormat instead of the current default ES812PostingsFormat.
Currently, use of the new codec is gated behind a feature flag.
* [main] Move system indices migration to migrate plugin
It seems the best way to fix #122949 is to use the existing data stream reindex API. However, this API is located in the migrate x-pack plugin. This commit moves the system indices migration logic (REST handlers, transport actions, and task) to the migrate plugin.
Port of #123551
* [CI] Auto commit changes from spotless
* Fix compilation
* Fix tests
* Fix test
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This action solely needs the cluster state, it can run on any node.
Since this is the last class/action that extends the `ClusterInfo`
abstract classes, we remove those classes too as they're not required
anymore.
Relates #101805
In this PR we introduce the data stream API in the `es-rest-api` behind
a feature flag. This enables us to use the `yamlRestTests`
tests instead of the `javaRestTests`.
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805
This test failed when the `disk.indices.forecast` value was a decimal number.
We adjust the regex to allow decimal values and for consistency we also allow negative values.
Fixes #125711
Fixes #125848
Fixes #125661
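The adjusted pattern is not quoted in this description, so the following is only a sketch of a regex with the stated properties (optional minus sign, optional decimal part). The pattern and the names `FORECAST_VALUE` and `matches_forecast` are assumptions for illustration.

```python
import re

# Assumed pattern: optional leading minus, digits, optional fractional part.
# The real regex used by the test is not shown in this description.
FORECAST_VALUE = re.compile(r"-?\d+(\.\d+)?")


def matches_forecast(value: str) -> bool:
    """Return True if the whole string is an integer or decimal number."""
    return FORECAST_VALUE.fullmatch(value) is not None
```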
I was debating on having this test in the original PR anyway. It ain't
worth the flakiness. We know the oversampling setting gets updated given
the other tests.
closes: https://github.com/elastic/elasticsearch/issues/125851
This change moves the query phase to a single round trip per node, just as can_match and field_caps already work.
As a result of executing multiple shard queries from a single request, we can also partially reduce each node's query results on the data node side before responding to the coordinating node.
Consequently, this change significantly reduces the impact of network latencies on end-to-end query performance, and reduces both the amount of work done (memory and CPU) on the coordinating node and the network traffic by factors of up to the number of shards per data node!
Benchmarking shows up to orders of magnitude improvements in heap and network traffic dimensions in querying across a larger number of shards.
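The idea of a partial reduce on the data node can be sketched with a simple top-k merge. This is a toy model, not the actual implementation: hits are `(score, doc_id)` tuples and all function names are hypothetical.

```python
import heapq

Hit = tuple[float, str]  # (score, doc id); a simplified stand-in for a search hit


def top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Return the k highest-scoring hits, best first."""
    return heapq.nlargest(k, hits)


def node_partial_reduce(shard_hits: list[list[Hit]], k: int) -> list[Hit]:
    """Merge the per-shard results on one data node down to k hits.

    The node then sends at most k hits instead of k hits per shard.
    """
    return top_k([hit for shard in shard_hits for hit in shard], k)


def coordinator_reduce(node_hits: list[list[Hit]], k: int) -> list[Hit]:
    """Final merge on the coordinating node over one result list per node."""
    return top_k([hit for node in node_hits for hit in node], k)
```

Because top-k is preserved under merging, reducing per node first gives the same final result as merging every shard's hits on the coordinator, while the coordinator receives one list per node rather than one per shard.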
This allows a `rescore_vector: {oversample: 0}` to indicate bypassing
oversampling and rescoring.
This is useful for:
- Updating a quantized mapping to turn off automatic rescoring
- Bypassing oversampling at query time in an ad-hoc manner if it's on by default in the mapping
closes: https://github.com/elastic/elasticsearch/issues/125157
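One way to picture the resulting precedence is the sketch below. It assumes a query-level value takes priority over the mapping default and that `0` means "skip oversampling and rescoring"; the function name and the use of `None` for "no rescoring" are illustrative only.

```python
from typing import Optional


def effective_oversample(
    mapping_default: Optional[float], query_value: Optional[float]
) -> Optional[float]:
    """Resolve the oversample factor; None means no oversampling or rescoring.

    Assumption: a value given in the query wins over the mapping default,
    and 0 at either level disables rescoring for that level.
    """
    chosen = query_value if query_value is not None else mapping_default
    if chosen is None or chosen == 0:
        return None
    return chosen
```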
Since #122905 we were throwing NPEs (i.e. 5xxs) when a rollover request has an unknown/non-existent target. Before that, we returned a 400 - illegal argument exception. We now return a 404 which matches "missing target" better. Additionally, to prevent this from happening again, we add a YAML test that asserts the correct exception behavior.
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805
Frozen indices, the freeze index API and the private index.frozen setting have been removed with #120539.
There is also a search throttled thread pool that can now be removed, as well as a private search.throttled index setting that is no longer used, as it could only be set internally by freezing an index.
While the index setting is private and can be removed, as it should no longer be present in any index on 9.0+, the thread pool settings associated with the removed pool are still accepted as no-ops in case users have customized them and are upgrading without removing them. These will also trigger a deprecation warning.
This change also removes the search.throttled related output from the thread pool section of the cluster info API.
This adds a new parameter to the quantized index mapping that allows
default oversampling and rescoring to occur.
This doesn't adjust any of the defaults. It allows it to be configured.
When the user provides `rescore_vector: {oversample: <number>}` in the
query, the query value overrides the mapping default.
For example, here is how to use it with bbq:
```
PUT rescored_bbq
{
"mappings": {
"properties": {
"vector": {
"type": "dense_vector",
"index_options": {
"type": "bbq_hnsw",
"rescore_vector": {"oversample": 3.0}
}
}
}
}
}
```
Then, when querying, it will auto oversample the `k` by `3x` and rerank
with the raw vectors.
```
POST _search
{
"knn": {
"query_vector": [...],
"field": "vector"
}
}
```
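Conceptually, oversampling followed by reranking looks like the toy sketch below. A sign-based quantizer stands in for bbq, dot product stands in for the similarity function, and rounding the candidate count up with `ceil` is an assumption; none of the names come from the Elasticsearch code base.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def quantize(vector: list[float]) -> list[float]:
    """Toy quantizer: keep only the sign of each component."""
    return [1.0 if x >= 0 else -1.0 for x in vector]


def knn_with_rescore(
    query: list[float], vectors: dict[str, list[float]], k: int, oversample: float
) -> list[str]:
    """Pick k * oversample candidates by quantized score, rerank with raw vectors."""
    num_candidates = math.ceil(k * oversample)  # assumed rounding
    by_quantized = sorted(
        vectors, key=lambda doc: dot(query, quantize(vectors[doc])), reverse=True
    )
    candidates = by_quantized[:num_candidates]
    # Rerank the wider candidate set using the raw (unquantized) vectors.
    reranked = sorted(candidates, key=lambda doc: dot(query, vectors[doc]), reverse=True)
    return reranked[:k]
```

With coarse quantization many documents tie on the approximate score, so a wider candidate set gives the raw-vector rerank a chance to surface the true nearest neighbor.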
On index creation, it's possible to configure a hunspell analyzer, but
reference a locale file that actually doesn't exist or isn't accessible.
This error, like our other user dictionary errors, should be an IAE not
an ISE.
closes: https://github.com/elastic/elasticsearch/issues/123729
The 7.x routes for ml trained models, _ml/inference/, have been deprecated
since 8 and replaced with _ml/trained_models. This commit removes them,
along with query parameters that are no longer supported.
This feature flag controls whether synthetic recovery source is enabled by default when the source mode is synthetic.
The synthetic recovery source feature itself is already available via the index.recovery.use_synthetic_source index setting and can be enabled by anyone using synthetic source.
The index.recovery.use_synthetic_source setting defaults to true when index.mapping.source.mode is synthetic. The index.mapping.source.mode setting defaults to synthetic if index.mode is logsdb or time_series.
In other words, with this change synthetic recovery source will be enabled by default for logsdb and tsdb.
Closes #116726
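The chain of defaults described above can be sketched as two small functions. This is a simplified model: the function names are hypothetical, and treating `stored` as the source mode default for other index modes is an assumption not stated in this description.

```python
def default_source_mode(index_mode: str) -> str:
    """Default for index.mapping.source.mode given index.mode."""
    if index_mode in ("logsdb", "time_series"):
        return "synthetic"
    return "stored"  # assumption: default for all other index modes


def default_use_synthetic_recovery_source(source_mode: str, feature_flag_enabled: bool) -> bool:
    """Default for index.recovery.use_synthetic_source.

    Models the feature flag as gating the default only; the setting itself
    can still be set explicitly by anyone using synthetic source.
    """
    return feature_flag_enabled and source_mode == "synthetic"
```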
* Fix Gradle Deprecation warning as declaring an is- property with a Boolean type has been deprecated.
* Make use of new layout.settingsFolder api to address some cross project references
* Fix buildParams snapshot check for multiproject projects
* Updating error message when index field type is unknown
* Fix style issue
* Add yaml test for invalid field type error message
* Update docs/changelog/122860.yaml
* Updating error message for runtime and multi field type parser
* add and fix yaml tests
* Fix code styles by running spotlessApply
* Update changelog
* Updating the test in yml
* Updating error message for runtime
* Fix failing yaml tests
* Update error message to Fix unit tests
* fix serverless qa test
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The keyword doc values field gets an extra sorted doc values field, that encodes the order of how array values were specified at index time. This also captures duplicate values. This is stored in an offset to ordinal array that gets zigzag vint encoded into a sorted doc values field.
For example, in case of the following string array for a keyword field: ["c", "b", "a", "c"].
Sorted set doc values: ["a", "b", "c"] with ordinals: 0, 1 and 2. The offset array will be: [2, 1, 0, 2]
Null values are also supported. For example ["c", "b", null, "c"] results in sorted set doc values: ["b", "c"] with ordinals: 0 and 1. The offset array will be: [1, 0, -1, 1]
Empty arrays are also supported by encoding a zigzag vint array of zero elements.
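The offset-to-ordinal mapping can be reproduced with a short sketch. It mirrors the examples above but is not the actual mapper code: the function names are hypothetical, and the exact on-disk layout of the zigzag vint array is not modelled here.

```python
from typing import Optional


def encode_offsets(values: list[Optional[str]]) -> tuple[list[str], list[int]]:
    """Return (sorted set doc values, offset-to-ordinal array) for one array.

    Nulls get the ordinal -1; duplicates reuse the ordinal of their value.
    """
    sorted_set = sorted({v for v in values if v is not None})
    ordinal = {v: i for i, v in enumerate(sorted_set)}
    offsets = [-1 if v is None else ordinal[v] for v in values]
    return sorted_set, offsets


def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small negatives (like -1) stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def decode_array(sorted_set: list[str], offsets: list[int]) -> list[Optional[str]]:
    """Rebuild the original array order, including duplicates and nulls."""
    return [None if o == -1 else sorted_set[o] for o in offsets]
```

For `["c", "b", "a", "c"]` this yields the sorted set `["a", "b", "c"]` and offsets `[2, 1, 0, 2]`, and decoding those restores the original order.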
Limitations:
* Currently only doc values based array support for the keyword field mapper.
* Multi-level leaf arrays are flattened. For example: [[b], [c]] -> [b, c]
* Arrays are always synthesized as one type. In the case of a keyword field, [1, 2] gets synthesized as ["1", "2"].
These limitations can be addressed, but some require more complexity and/or additional storage.
With this PR, keyword field array will no longer be stored in ignored source, but array offsets are kept track of in an adjacent sorted doc value field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).
The `local` param for the `GetFieldMapping` API was deprecated in #55014
and I think #57265 aimed to propagate that deprecation to the REST API
spec, but it changed `get_mapping.json` instead of
`get_field_mapping.json`. #55100 removed the `local` param for the
_field_ mapping API so we can safely remove the field from the spec and
remove the YAML test.
We shouldn't run the post-snapshot-delete cleanup work on the master
thread, since it can be quite expensive and need not block subsequent
cluster state updates. This commit forks it onto a `SNAPSHOT` thread.
This PR extends the work done in #121751 by enabling a sparse doc values index for the @timestamp field in LogsDB.
Similar to the previous PR, the setting index.mapping.use_doc_values_skipper will override the index mapping parameter when all of the following conditions are met:
* The index mode is LogsDB.
* The field name is @timestamp.
* Index sorting is configured on @timestamp (regardless of whether it is a primary sort field or not).
* Doc values are enabled.
This ensures that only one index structure is defined on the @timestamp field:
* If the conditions above are met, the inverted index is replaced with a sparse doc values index.
* This prevents both the inverted index and sparse doc values index from being enabled together, reducing unnecessary storage overhead.
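The conditions above can be read as a single predicate. The sketch below is an illustrative model only: the function and parameter names are hypothetical, and only the four listed conditions plus the index.mapping.use_doc_values_skipper setting are taken from the description.

```python
def use_sparse_doc_values_index(
    use_doc_values_skipper: bool,
    index_mode: str,
    field_name: str,
    sort_fields: list[str],
    has_doc_values: bool,
) -> bool:
    """Return True if @timestamp should use a sparse doc values index
    in place of the inverted index."""
    return (
        use_doc_values_skipper
        and index_mode == "logsdb"
        and field_name == "@timestamp"
        # Any position in the index sort qualifies, not just the primary one.
        and "@timestamp" in sort_fields
        and has_doc_values
    )
```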
This change aligns with our goal of optimizing LogsDB for storage efficiency while possibly maintaining reasonable query latency performance. It will enable us to run benchmarks and evaluate the impact of sparse indexing on the @timestamp field as well.