It's not obvious that a YAML test with a `catch` stanza also permits
`match` blocks to assert things about the structure of the error
response, but this structure may be an important part of the API spec.
This commit adds this info to the docs about YAML tests.
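For example, a test along these lines can assert on the error body (the request and error details here are illustrative, not from a real suite):
```yaml
- do:
    catch: bad_request
    search:
      index: test
      body:
        query:
          # intentionally invalid query to trigger the error response
          no_such_query: {}
- match: { status: 400 }
- match: { error.type: "parsing_exception" }
```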
Adds REST tests for the `percentiles_bucket` pipeline aggregation. This
gives us forwards and backwards compatibility tests for this agg as well
as mixed version cluster tests for it.
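For reference, the kind of request these tests exercise looks roughly like this (index and field names are illustrative):
```yaml
- do:
    search:
      index: test
      body:
        size: 0
        aggs:
          histo:
            date_histogram:
              field: timestamp
              calendar_interval: month
            aggs:
              sales:
                sum:
                  field: price
          sales_percentiles:
            percentiles_bucket:
              buckets_path: "histo>sales"
              percents: [25.0, 50.0, 75.0]
```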
Relates to #26220
Adds REST tests for the `cumulative_cardinality` and `cumulative_sum`
pipeline aggregations. This gives us forwards and backwards compatibility
tests for these aggs as well as mixed version cluster tests for these
aggs.
Relates to #26220
Adds support for loading `text` and `keyword` fields that have
`store: true`. We could likely load *any* stored fields, but I
wanted to blaze the trail using something fairly useful.
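A minimal sketch of the kind of mapping this now supports (names are illustrative):
```yaml
- do:
    indices.create:
      index: test
      body:
        mappings:
          properties:
            title:
              type: text
              store: true     # stored copy of the field, loadable without _source
            tag:
              type: keyword
              store: true
```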
I broke shard splitting when `_routing` is required and you use `nested`
docs. The mapping would look like this:
```
"mappings": {
"_routing": {
"required": true
},
"properties": {
"n": { "type": "nested" }
}
}
```
If you attempt to split an index with a mapping like this it'll blow up
with an exception like this:
```
Caused by: [idx] org.elasticsearch.action.RoutingMissingException: routing is required for [idx]/[0]
at org.elasticsearch.cluster.routing.IndexRouting$IdAndRoutingOnly.checkRoutingRequired(IndexRouting.java:181)
at org.elasticsearch.cluster.routing.IndexRouting$IdAndRoutingOnly.getShard(IndexRouting.java:175)
```
This fixes the problem by avoiding that branch of code entirely. The
branch was trying to find any top-level documents that don't have a
`_routing`. But we *know* that there aren't any top-level documents
without a routing in this case - the routing is "required", so ES
wouldn't have let you index any top-level documents without it.
This also adds a small pile of REST layer tests for shard splitting that
hit various branches in this area, for extra paranoia.
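For reference, a rough YAML REST test reproducing the original failure (index names are illustrative):
```yaml
- do:
    indices.create:
      index: idx
      body:
        settings:
          index.number_of_shards: 1
        mappings:
          _routing:
            required: true
          properties:
            n:
              type: nested
- do:
    index:
      index: idx
      id: "1"
      routing: "r1"
      body:
        n: [ { foo: bar } ]
- do:
    indices.put_settings:
      index: idx
      body:
        index.blocks.write: true    # splitting requires a write block
- do:
    indices.split:
      index: idx
      target: idx-split
      body:
        settings:
          index.number_of_shards: 2
```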
Closes #88109
This PR expands the existing GetProfile API to support getting multiple
profiles by ID. As a result, the response format has also changed to
align with the latest version of the API design guidelines. Concretely,
this means moving the profiles into an array inside a top-level
"profiles" field, so that (1) dynamic fields (the uid) are not mixed
with static fields and (2) the response has a defined order, which is
desirable for clients.
The change also reports any errors encountered during retrieval in a
top-level "errors" field.
Relates: #81910
This adds a new `_ml/trained_models/<model_id>/deployment/cache/_clear` API. This will clear the inference cache on every node where the model is allocated.
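Invoked like so, assuming `ml.clear_trained_model_deployment_cache` is the client name for the new endpoint (the model ID is illustrative):
```yaml
- do:
    ml.clear_trained_model_deployment_cache:
      model_id: my-model            # hypothetical deployed model
```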
If a doc-values field matches multiple field patterns, ES will return
the value of that field multiple times. As when fetching fields from
source, we should deduplicate the matching doc-values fields.
We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.
This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.
Closes #86875
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field
caps API per field.
Especially for metric fields, it must be clear which fields are metrics and whether they belong
only to time-series indices or to mixed time-series and non-time-series indices.
Metric fields must be further distinguished when they belong to any of the following index types:
- Standard (non-time-series) indices
- Time series indices
- Downsampled time series indices
This PR modifies the field caps API so that the mapping parameters `time_series_dimension`
and `time_series_metric` are presented only when they are set on fields of time-series indices.
Those parameters are completely ignored when they are set on standard (non-time-series) indices.
This PR revisits some of the conventions adopted by #78790
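As a sketch, a field caps request against a time-series index could then surface the metric type (index, field, and type names are illustrative):
```yaml
- do:
    field_caps:
      index: tsdb-index             # hypothetical time-series index
      fields: cpu_usage
- match: { fields.cpu_usage.double.time_series_metric: gauge }
```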
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns
Relates #77065
Co-authored-by: David Roberts <dave.roberts@elastic.co>
This change deprecates the kNN search API in favor of the new 'knn' option
inside the search API. The 'knn' option is now the preferred way of performing
kNN search.
Relates to #87625
This formats the result of the `fields` section of the `_search` API for
runtime `geo_point` fields using the `format` parameter like we do for
non-runtime `geo_point` fields. This changes the default format for
those fields from `lat, lon` to `geojson` with the option to get `wkt`
or any other format we support.
The fix preserves the `double, double` nature of the `geo_point` rather
than encoding it immediately in the script, so callers can format the
results as they need. The field fetchers use the `double, double` values
natively, preserving as much precision as possible. The queries quantize
the points exactly like Lucene indexing does, and like the script did
before this PR.
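With this change, a request like the following (illustrative names) behaves the same for runtime and indexed `geo_point` fields:
```yaml
- do:
    search:
      index: test
      body:
        fields:
          - field: location         # a runtime geo_point field
            format: wkt             # or geojson (the new default)
- match: { hits.hits.0.fields.location.0: "POINT (-71.34 41.12)" }
```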
Closes #85245
This change adds support for kNN vector fields to the `_disk_usage` API. The
strategy:
* Iterate the vector values (using the same strategy as for doc values) to
estimate the vector data size
* Run some random vector searches to estimate the vector index size
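Assuming the usual analyze-disk-usage client name `indices.disk_usage`, the estimate is produced by a call like this (index name is illustrative):
```yaml
- do:
    indices.disk_usage:
      index: my-knn-index
      run_expensive_tasks: true   # required; the analysis is CPU intensive
```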
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Closes #84801
Add the `dry_run` query parameter to support simulating updates of the desired nodes. The update request will be validated, but no cluster state updates will be performed. To indicate that the response is the result of a dry run, we add a `dry_run` field to the JSON representation of the response.
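A sketch of a dry run, assuming the internal endpoint's client name and an illustrative desired-nodes payload:
```yaml
- do:
    _internal.update_desired_nodes:
      history_id: "test-history"
      version: 1
      dry_run: true
      body:
        nodes:
          - settings:
              node.name: "instance-1"
            processors: 8
            memory: "32gb"
            storage: "128gb"
            node_version: "8.4.0"   # assumed field shape for this payload
- match: { dry_run: true }
```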
See #82975
This commit removes the notion of components from the health API. They are gone from being
a top-level field in the response, and indicators is promoted into their place.
It also removes `help_url`, renames `summary` to `symptom` and `user_actions` to `diagnosis`,
and separates the diagnosis `message` field into `cause` and `action`.
Co-authored-by: Mary Gouseti <mgouseti@gmail.com>
This PR adds a new `knn` option to the `_search` API to support ANN search.
It's powered by the same Lucene ANN capabilities as the old `_knn_search`
endpoint. The `knn` option can be combined with other search features like
queries and aggregations.
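For example (index, field, and vector values are illustrative):
```yaml
- do:
    search:
      index: my-index
      body:
        knn:
          field: image_vector
          query_vector: [0.12, 0.45, 0.67]
          k: 10
          num_candidates: 100
        aggs:
          avg_price:
            avg:
              field: price
```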
Addresses #87625
This adds support for the `cardinality` aggregation within a `random_sampler` aggregation.
This use case is helpful in determining the ratio of unique values to the total document count within the sampled set.
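A sketch of the combination (field names and probability are illustrative):
```yaml
- do:
    search:
      index: test
      body:
        size: 0
        aggs:
          sampled:
            random_sampler:
              probability: 0.1
            aggs:
              unique_users:
                cardinality:
                  field: user_id
```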
Propagate alias filters to significance aggs filters
If we have an alias filter, use it as part of the background filter on a
significant terms agg. Previously, alias filters did not apply to background
filters, so this will change `bg_count` results for some significant terms aggs
that use a background filter.
Closes #81585
With https://github.com/elastic/ml-cpp/pull/2305 we now support caching PyTorch inference responses per node per model.
By default, the cache will be the same size as the model's on-disk size. This is because our current best estimate for memory used (for deploying) is 2*model_size + constant_overhead,
which is due to the model having to be loaded in memory twice when serializing to the native process.
But once the model is in memory and accepting requests, its actual memory usage is reduced vs. what we have "reserved" for it within the node.
Consequently, a cache layer that takes advantage of that unused (but reserved) memory is effectively free. When used in production, especially in search scenarios, caching inference results is critical for decreasing latency.
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
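A minimal sketch of the new parameter:
```yaml
- do:
    indices.create:
      index: test
      body:
        mappings:
          _source:
            mode: synthetic   # or stored / disabled
          properties:
            kwd:
              type: keyword
```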
The build_flavor was previously removed since it is no longer relevant;
only the default distribution now exists. However, the removal of build
flavor included removing it from the version information on the info
response for the root path. This API is supposed to be stable, so
removing that key was a compatibility break. This commit adds the
build_flavor back to that API, hardcoded to `default`. Additionally, a
test is added to ensure the key exists going forward, until it can be
properly deprecated.
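The check is along these lines:
```yaml
- do:
    info: {}
- match: { version.build_flavor: default }
```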
Closes #88318
Plumbs through a new parameter for the cardinality aggregation that allows configuring the execution mode. This can have significant impacts on speed and memory usage. This PR exposes three collection modes and two heuristics that we can tune going forward. All of these are treated as hints and can be silently ignored, e.g. if not applicable to the given field type. I've changed the default behavior to optimize for time, which potentially uses more memory. Users can override this for the old behavior if needed.
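A sketch of the hint, assuming `execution_hint` is the parameter name and `save_memory_heuristic` one of its values (both are assumptions here):
```yaml
- do:
    search:
      index: test
      body:
        size: 0
        aggs:
          distinct_users:
            cardinality:
              field: user_id
              execution_hint: save_memory_heuristic   # assumed name; hints may be silently ignored
```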
This adds the generation and upload logic of Gradle dependency graphs to Snyk.
We directly implemented a REST-API-based Snyk plugin because the existing Snyk
Gradle plugin delegates to the Snyk command line tool, and the command line tool
uses custom Gradle logic by injecting an init file that
a) uses deprecated build logic, which we definitely want to avoid, and
b) uses Gradle APIs we avoid, like eager task creation.
Shipping this as an internal Gradle plugin gives us the most flexibility. As we only want to monitor
production code for now, we apply this plugin as part of the elasticsearch.build plugin;
that usage has so far been the de-facto indicator of whether a project is considered a "production" project
that ends up in our distribution or public Maven repositories. This isn't yet ideal and we will revisit
the distinction between production and non-production code / projects in a separate effort.
As part of this effort we added the elasticsearch.build plugin to more projects that actually end up
in the distribution. To unblock us, we have for now disabled a few check tasks that started failing once elasticsearch.build was applied.
Addresses #87620
Adds REST layer tests for some sneaky cases in the `avg_bucket`,
`max_bucket`, `min_bucket`, and `sum_bucket` pipeline aggregations.
This gives us forwards and backwards compatibility tests for these
aggs as well as mixed version cluster tests for these aggs.
Relates to #26220
Bootstrap plugins were an internal mechanism added to allow a
filesystem provider for cloud with the quota-aware-fs plugin. Since that
was removed, bootstrap plugins no longer serve a purpose. They were
never officially documented because they were for internal use only.
This commit removes the bootstrap plugins infrastructure.
This PR moves kNN search and dense vector support out of an xpack plugin and
into server.
In #87625 we plan to integrate ANN search into the main `_search` endpoint as a
new top-level component called `knn`. So kNN will be a dedicated part of the
search request, and we'll have kNN logic within the search phases. The classes
and logic will live in server, matching the other search components like
suggesters, field collapsing, etc.
This adds the option to force synthetic source to the MGET API. See
#87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in MGET.
This adds tests to make sure that we use all of the normal synthetic
source machinery, even when loading from the translog. So all GETs on
synthetic source indices will require an in memory index. That'll be an
extra cost on indices that are updated very very frequently.
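A sketch of the option on MGET, assuming `force_synthetic_source` is the parameter name:
```yaml
- do:
    mget:
      force_synthetic_source: true    # assumed query parameter name
      body:
        docs:
          - { _index: test, _id: "1" }
```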
Adds REST layer tests for the `avg_bucket`, `max_bucket`, `min_bucket`,
and `sum_bucket` pipeline aggregations. This gives us forwards and
backwards compatibility tests for these aggs as well as mixed version
cluster tests for these aggs.
Relates to #26220
The synthetic source highlighting tests would sometimes fail in a
strange way - they expected the entire search request to fail but it
*didn't* - only a single shard would fail. This locks the tests to
always use single-shard indices so the failures are consistent.
Closes #87730
Synthetic source has a habit of reordering text fields. This frustrates
highlighting because it *often* wants to use index structures to find
the offsets to values in the field. This disables the FVH highlighter
for multi-valued text fields when synthetic source is enabled and runs
the unified highlighter in "analyze" mode when synthetic source is
enabled. That's *enough* to stop them from spitting out wrong answers.
We might be leaving some performance on the table when the unified
highlighter works on a single valued text field that is indexed with
offsets or term vectors. We don't really expect that to be common at all
though because *generally* folks will enable synthetic source to save
space and adding offsets or term vectors is quite space inefficient. If
it comes up, we might be able to improve here.
Adds measures of the total size of all mappings and the total number of
fields in the cluster (both before and after deduplication).
Relates #86639
Relates #77466
This adds the option to force synthetic source to the GET API. See
#87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in GET.