* Adding new bbq_ivf format behind a feature flag
* adding tests
* [CI] Auto commit changes from spotless
* addressing pr comments
* fixing flagging for yaml tests
* adjust ivf search to utilize num candidates as approximation measure
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Add multi-project support for more stats APIs
This affects the following APIs:
- `GET _nodes/stats`:
- For `indices`, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name.
- For `ingest`, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project.
- `GET /_cluster/stats`:
- For `ingest`, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project.
- `GET /_info`:
- For `ingest`, same as for `GET /_nodes/stats`.
This is done by making `IndicesService.stats()` and `IngestService.stats()` include project IDs in the `NodeIndicesStats` and `IngestStats` objects they return, and making those stats objects incorporate the project IDs when converting to XContent.
The transitive callers of these two methods are rather extensive (including all callers to `NodeService.stats()`, all callers of `TransportNodesStatsAction`, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases:
- The behaviour change is one of the desired enhancements described above.
- There is no behaviour change because it was getting node stats but neither `indices` nor `ingest` stats were requested.
- There is no behaviour change because it was getting `indices` and/or `ingest` stats but only using aggregate values.
- In `MachineLearningUsageTransportAction` and `TransportGetTrainedModelsStatsAction`, the `IngestStats` returned will now include stats from all projects instead of just the default project, but these actions have been changed to filter out the non-default project stats, so this change is a no-op there. (These actions are not MP-ready yet.)
- `MonitoringService` will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)
* Update test/external-modules/multi-project/build.gradle
Change suggested by Niels.
Co-authored-by: Niels Bauman <33722607+nielsbauman@users.noreply.github.com>
* Respond to review comments
* fix merge weirdness
* [CI] Auto commit changes from spotless
* Fix test compilation following upstream change to base class
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/datatiers/DataTierUsageFixtures.java
Co-authored-by: Niels Bauman <33722607+nielsbauman@users.noreply.github.com>
* Make projects-by-index map nullable and omit in single-project; always include project prefix in XContent in multi-project, even if default; also incorporate one other review comment
* Add a TODO
* update IT to reflect changed behaviour
* Switch to using XContent.Params to indicate whether it is multi-project or not
* Refactor NodesStatsMultiProjectIT to common up repeated assertions
* Defer use of ProjectIdResolver in REST handlers to keep tests happy
* Include index UUID in "unknown project" case
* Make the index-to-project map empty rather than null in the BWC deserialization case.
This works out fine, for the reasons given in the comment. As it happens, I'd already forgotten to do the null check in the one place it's actively used.
* remove a TODO that is done, and add a comment
* fix typo
* Get REST YAML tests working with project ID prefix TODO finish this
* As a drive-by, fix and un-suppress one of the health REST tests
* [CI] Auto commit changes from spotless
* TODO ugh
* Experiment with different stashing behaviour
* [CI] Auto commit changes from spotless
* Try a more sensible stash behaviour for assertions
* clarify comment
* Make checkstyle happy
* Make the way `Assertion` works more consistent, and simplify implementation
* [CI] Auto commit changes from spotless
* In RestNodesStatsAction, set the XContent params to channel.request(), which is the value they would have had before this change
---------
Co-authored-by: Niels Bauman <33722607+nielsbauman@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
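To make the final key-naming rules above concrete (no prefix in single-project mode, a project prefix for every index in multi-project mode even for the default project, and the index UUID in the "unknown project" case), here is a minimal Java sketch. The class name, the UUID-keyed map, the `/` separator and the `<unknown>` placeholder are all assumptions for illustration; the real logic lives in `NodeIndicesStats` and may differ.

```java
import java.util.Map;

// Hypothetical sketch of how an index's stats key could be built.
// Not the actual NodeIndicesStats code: the separator, the UUID-keyed map
// and the "<unknown>" placeholder are assumptions.
final class ProjectQualifiedKeys {
    static final String SEPARATOR = "/"; // assumed separator

    static String statsKey(
        boolean multiProject,                   // in the PR this is signalled via XContent.Params
        Map<String, String> projectByIndexUuid, // index UUID -> project ID
        String indexName,
        String indexUuid
    ) {
        if (multiProject == false) {
            // Single-project: output is unchanged, no prefix.
            return indexName;
        }
        String projectId = projectByIndexUuid.get(indexUuid);
        if (projectId == null) {
            // Unknown project: include the UUID so duplicate names stay distinguishable.
            return "<unknown>" + SEPARATOR + indexName + SEPARATOR + indexUuid;
        }
        // Multi-project: always prefix, even for the default project.
        return projectId + SEPARATOR + indexName;
    }
}
```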
Not sure why this was defined as `private` in #112348, it should have
been `public`. This commit fixes the visibility so we generate docs for
this API.
After creating an index with an alias using the following request
```
PUT test-index
{
  "aliases": {
    "alias1": {
      "is_write_index": "true"
    }
  }
}
```
we got the following result for the get index request:
```
{
  "test-index" : {
    "aliases" : {
      "alias1" : { }
    },
    "mappings" : { },
    "settings" : {
      ...
    }
  }
}
```
The `is_write_index` field is missing because string boolean values are
not supported for this field, and no warning message is shown, which
misleads users. In #120453 I opened a PR to let the createIndex API
accept string boolean values for the `is_write_index` field, but @dakrone
thinks it's better to be strict about boolean values. So this PR makes
the Alias class throw an exception for the unsupported value type, to
avoid silently swallowing this case.
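A minimal Java sketch of the strict behaviour, assuming the parsed JSON value is handed over as a plain `Object` (the real `Alias` class reads it from an `XContentParser`, and its exact error message may differ). The class and method names are hypothetical.

```java
// Hypothetical sketch of strict boolean handling for is_write_index.
final class WriteIndexFlag {
    static boolean parseIsWriteIndex(Object value) {
        if (value instanceof Boolean b) {
            return b;
        }
        // Strings such as "true" are rejected instead of being silently ignored.
        throw new IllegalArgumentException(
            "[is_write_index] expected a boolean value but got [" + value + "]"
        );
    }
}
```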
* Unit test to validate default behavior
* adding default value to oversample for bbq
* Fix code style issue
* Update docs/changelog/127134.yaml
* Update changelog
* Adding index version to support only new indices
* Update index version name to better match
* Adding a simple yaml test to verify the yaml functionality for oversample value
* Refactor knn float to add rescore vector by default when index type is one of bbq
* adding yaml tests to verify oversample default value
* Fixing format issue for not_exists
This change enhances the dense_vector section of the Nodes stats and Index stats APIs so that they report the desired size of off-heap memory for all indexed vectors. The dense_vector section of the Cluster stats API remains unchanged.
The retrieval mechanism and structure of the new stats are the same across the three stats APIs, but progressively more fine-grained information is disclosed when moving from the Cluster to the Node to the Index API.
For Node stats, we aggregate the total byte sizes for all vectors, categorised by the data type. For example:
"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1
  }
}
For Index stats, the output is the same as for Node stats, with an additional per-field breakdown. For example:
"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1,
    "fielddata" : {
      "bar" : {
        "veb_size_in_bytes" : 3,
        "vec_size_in_bytes" : 14,
        "vex_size_in_bytes" : 1
      },
      "foo" : {
        "vec_size_in_bytes" : 9
      }
    }
  }
}
The implementation accesses the actual statistics through reflection. This reflection-based access will be removed entirely once Lucene exposes these statistics directly, which is expected in Lucene 10.3.
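The Node-level totals can be derived from the per-field breakdown by summing per file extension. A minimal Java sketch, assuming per-field sizes are available as a map of field name to (extension to bytes); the class name is hypothetical, and seeding the four extensions shown in the examples with zero is an assumption made so that `total_veq_size_in_bytes` appears even when no field reports it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: roll per-field off-heap sizes up into the
// "total_*_size_in_bytes" values shown in the Node stats example.
final class OffHeapTotals {
    // Extensions that appear in the examples (assumed fixed set).
    static final List<String> KNOWN_EXTENSIONS = List.of("veb", "vec", "veq", "vex");

    static Map<String, Long> totals(Map<String, Map<String, Long>> sizesByField) {
        Map<String, Long> byExtension = new TreeMap<>();
        for (String ext : KNOWN_EXTENSIONS) {
            byExtension.put(ext, 0L); // report zero for extensions no field uses
        }
        long total = 0;
        for (Map<String, Long> fieldSizes : sizesByField.values()) {
            for (Map.Entry<String, Long> e : fieldSizes.entrySet()) {
                byExtension.merge(e.getKey(), e.getValue(), Long::sum);
                total += e.getValue();
            }
        }
        Map<String, Long> result = new TreeMap<>();
        result.put("total_size_in_bytes", total);
        byExtension.forEach((ext, bytes) -> result.put("total_" + ext + "_size_in_bytes", bytes));
        return result;
    }
}
```

With the field sizes from the Index stats example (`bar`: veb 3, vec 14, vex 1; `foo`: vec 9), this yields a total of 27 bytes, 23 of them for `vec`, matching the Node stats example.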
* Add timeout to SynonymsManagementAPIService put synonyms
* Remove replicas 0, as that may impact serverless
* Add timeout to put synonyms action, fix tests
* Fix number of replicas
* Remove cluster.health checks for synonyms index
* Revert debugging
* Add integration test for timeouts
* Use TimeValue instead of an int
* Add YAML tests and REST API specs
* Fix a validation bug in put synonym rule
* Spotless
* Update docs/changelog/126314.yaml
* Remove unnecessary checks for null
* Fix equals / HashCode
* Checks that timeout is passed correctly to the check health method
* Use the default timeout correctly
* spotless
* Add monitor cluster privilege to internal synonyms user
* [CI] Auto commit changes from spotless
* Add capabilities to avoid failing on bwc tests
* Replace timeout for refresh param
* Add param to specs
* Add YAML tests
* Fix changelog
* [CI] Auto commit changes from spotless
* Use BWC serialization tests
* Fix bug in test parser
* Spotless
* Delete doesn't need reloading 🤦 removing it
* Revert "Delete doesn't need reloading 🤦 removing it"
This reverts commit 9c8e0b62be.
* [CI] Auto commit changes from spotless
* Fix refresh for delete synonym rule
* Fix tests
* Update docs/changelog/126935.yaml
* Add reload analyzers test
* reload_analyzers is not available on serverless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This PR adds two new REST endpoints, for listing running queries and for getting information on a single running query.
* Resolves #124827
* Related to #124828 (initial work)
Changes from the API specified in the above issues:
* The get API is still preliminary, as we don't yet have a way of fetching the memory used or the number of rows processed.
List queries response:
```
GET /_query/queries
// returns for each of the running queries
// query_id, start_time, running_time, query
{
  "queries" : {
    "abc": {
      "id": "abc",
      "start_time_millis": 14585858875292,
      "running_time_nanos": 762794,
      "query": "FROM logs* | STATS BY hostname"
    },
    "4321": {
      "id": "4321",
      "start_time_millis": 14585858823573,
      "running_time_nanos": 90231,
      "query": "FROM orders | LOOKUP country_code ON country"
    }
  }
}
```
Get query response:
```
GET /_query/queries/abc
{
  "id" : "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname",
  "coordinating_node": "oTUltX4IQMOUUVeiohTt8A",
  "data_nodes" : [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7"]
}
```
Today `ActionResponse$Empty` implements `ToXContentObject`, but yields
no bytes of content when serialized, which creates an invalid JSON
response. This commit removes the bogus interface and adjusts the
affected REST APIs to send a `text/plain` response instead.
Update the PerFieldFormatSupplier so that new standard indices use the
Lucene101PostingsFormat instead of the current default ES812PostingsFormat.
Currently, use of the new codec is gated behind a feature flag.
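A minimal sketch of the selection logic, assuming the choice depends on the feature flag, the index mode, and whether the index was created on or after a new index version (the "new standard indices" condition). The class, method and parameter names are hypothetical, and it returns class names as strings rather than Lucene `PostingsFormat` instances to stay self-contained.

```java
// Hypothetical sketch of the per-field postings format choice; the real
// PerFieldFormatSupplier returns PostingsFormat instances and has more cases.
final class PostingsFormatChoice {
    static String defaultPostingsFormat(
        boolean featureFlagEnabled,  // new codec is gated behind a feature flag
        boolean standardIndexMode,   // only standard indices are affected
        boolean createdOnNewVersion  // assumed: only newly created indices switch
    ) {
        if (featureFlagEnabled && standardIndexMode && createdOnNewVersion) {
            return "Lucene101PostingsFormat";
        }
        return "ES812PostingsFormat"; // current default
    }
}
```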
* [main] Move system indices migration to migrate plugin
It seems the best way to fix #122949 is to use the existing data stream reindex API. However, this API is located in the migrate x-pack plugin. This commit moves the system indices migration logic (REST handlers, transport actions, and task) to the migrate plugin.
Port of #123551
* [CI] Auto commit changes from spotless
* Fix compilation
* Fix tests
* Fix test
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This action only needs the cluster state, so it can run on any node.
Since this is the last class/action that extends the `ClusterInfo`
abstract classes, we remove those classes too as they're not required
anymore.
Relates #101805
In this PR we introduce the data stream API in the `es-rest-api` using
the feature flag mechanism. This enables us to use the `yamlRestTests`
tests instead of the `javaRestTests`.
This action only needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805
This test failed when the `disk.indices.forecast` value was a decimal number.
We adjust the regex to allow decimal values and, for consistency, we also allow negative values.
Fixes #125711
Fixes #125848
Fixes #125661
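An illustrative Java pattern for the relaxed numeric match (optional minus sign, digits, optional fractional part). It is a sketch only: the actual test regex is not shown here and may, for example, also need to match a unit suffix.

```java
import java.util.regex.Pattern;

// Illustrative only: optional sign, integer digits, optional fraction.
final class ForecastValuePattern {
    static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?");

    static boolean isNumber(String value) {
        // matches() requires the whole string to fit the pattern.
        return NUMBER.matcher(value).matches();
    }
}
```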
I was debating whether to include this test in the original PR anyway. It ain't
worth the flakiness. We know the oversampling setting gets updated given
the other tests.
closes: https://github.com/elastic/elasticsearch/issues/125851
This change moves the query phase to a single roundtrip per node, just like can_match and field_caps already work.
Because multiple shard queries are executed from a single request, we can also partially reduce each node's query results on the data node side before responding to the coordinating node.
Overall, this change significantly reduces the impact of network latencies on end-to-end query performance, and reduces both the amount of work done (memory and CPU) on the coordinating node and the network traffic, by factors of up to the number of shards per data node!
Benchmarking shows improvements of up to orders of magnitude in heap usage and network traffic when querying across a large number of shards.
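A minimal Java sketch of the two ideas, under stated assumptions: shard-level requests are grouped by node so that each node receives a single request, and each data node keeps only its local top `k` hits before responding. The record and method names are hypothetical and this is not the actual search-phase code; real partial reduction also covers aggregations, field sorts and more.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of batching the query phase per node.
final class BatchedQueryPhase {
    record ShardTarget(String nodeId, int shardId) {}

    record Hit(int shardId, int doc, float score) {}

    // Coordinator side: one request per node, carrying all of that node's shards.
    static Map<String, List<Integer>> groupByNode(List<ShardTarget> targets) {
        Map<String, List<Integer>> shardsByNode = new LinkedHashMap<>();
        for (ShardTarget t : targets) {
            shardsByNode.computeIfAbsent(t.nodeId(), n -> new ArrayList<>()).add(t.shardId());
        }
        return shardsByNode;
    }

    // Data-node side: partially reduce hits from all local shards to the best k,
    // so only k hits (not k per shard) travel back to the coordinator.
    static List<Hit> partialReduce(List<Hit> localHits, int k) {
        return localHits.stream()
            .sorted(Comparator.<Hit>comparingDouble(Hit::score).reversed())
            .limit(k)
            .toList();
    }
}
```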
This allows a `rescore_vector: {oversample: 0}` to indicate bypassing
oversampling and rescoring.
This is useful for:
- Updating a quantized mapping to turn off automatic rescoring
- Bypassing oversampling at query time in an ad-hoc manner if it's on by default in the mapping
closes: https://github.com/elastic/elasticsearch/issues/125157
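A small Java sketch of the intended semantics, assuming the number of candidates collected is `k` multiplied by the oversample factor and rounded up (the rounding is an assumption). The names are hypothetical; an oversample of `0`, or no oversample at all, means collecting exactly `k` and skipping the rescore.

```java
// Hypothetical sketch: how many candidates to collect and whether to rescore.
final class OversamplePlan {
    record Plan(int candidates, boolean rescore) {}

    static Plan plan(int k, Float oversample) {
        // No factor, or an explicit 0: collect exactly k and skip rescoring.
        if (oversample == null || oversample == 0f) {
            return new Plan(k, false);
        }
        // Otherwise gather k * oversample candidates (rounded up) and rerank them.
        return new Plan((int) Math.ceil(k * oversample), true);
    }
}
```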
Since #122905 we were throwing NPEs (i.e. 5xxs) when a rollover request had an unknown/non-existent target. Before that, we returned a 400 (illegal argument exception). We now return a 404, which better matches a missing target. Additionally, to prevent this from happening again, we add a YAML test that asserts the correct exception behavior.
This action only needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805
Frozen indices, the freeze index API and the private index.frozen setting have been removed with #120539.
There is also a search throttled thread pool that can now be removed, as well as a private search.throttled index setting that is no longer used, as it could only be set internally by freezing an index.
While the index setting is private and can be removed, since it should no longer be present on any 9.0+ index, the thread pool settings associated with the removed pool are still accepted as no-ops in case users have customized them and are upgrading without removing them. These will also trigger a deprecation warning.
This change also removes the search.throttled related output from the thread pool section of the cluster info API.
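A minimal Java sketch of the upgrade-compatibility idea: settings for the removed pool are dropped as no-ops, and a deprecation warning is recorded for each. The `thread_pool.search_throttled.` prefix, the class name and the warning text are assumptions for illustration, not the actual settings registration code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: accept settings of the removed pool as no-ops.
final class RemovedPoolSettings {
    // Assumed prefix of the removed pool's settings.
    static final String REMOVED_PREFIX = "thread_pool.search_throttled.";

    record Result(Map<String, String> effective, List<String> deprecationWarnings) {}

    static Result apply(Map<String, String> settings) {
        Map<String, String> effective = new LinkedHashMap<>();
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (e.getKey().startsWith(REMOVED_PREFIX)) {
                // Still accepted so upgrades don't fail, but ignored and flagged.
                warnings.add("[" + e.getKey() + "] has no effect and will be removed in a future version");
            } else {
                effective.put(e.getKey(), e.getValue());
            }
        }
        return new Result(effective, warnings);
    }
}
```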
This adds a new parameter to the quantized index mapping that allows
default oversampling and rescoring to occur.
This doesn't adjust any of the defaults; it only makes them configurable.
When the user provides `rescore_vector: {oversample: <number>}` in the
query, it overrides the mapping value.
For example, here is how to use it with bbq:
```
PUT rescored_bbq
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "index_options": {
          "type": "bbq_hnsw",
          "rescore_vector": {"oversample": 3.0}
        }
      }
    }
  }
}
```
Then, when querying, it will auto oversample the `k` by `3x` and rerank
with the raw vectors.
```
POST _search
{
  "knn": {
    "query_vector": [...],
    "field": "vector"
  }
}
```
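The precedence rule can be sketched in a few lines of Java, assuming both values are optional: a query-level `rescore_vector.oversample` wins, otherwise the mapping default applies. The class and method names are hypothetical.

```java
// Hypothetical sketch of resolving the effective oversample factor.
final class OversampleResolution {
    static Float effectiveOversample(Float fromQuery, Float fromMapping) {
        // A value supplied in the query always overrides the mapping default.
        if (fromQuery != null) {
            return fromQuery;
        }
        // May be null when the mapping has no rescore_vector configured.
        return fromMapping;
    }
}
```

With the `rescored_bbq` mapping above and no `rescore_vector` in the query, `effectiveOversample(null, 3.0f)` returns `3.0`, so `k` is oversampled by 3x before reranking with the raw vectors.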