Commit graph

2826 commits

Author SHA1 Message Date
David Turner
c238aa1b46
Add YAML spec docs about matching errors (#89370)
It's not obvious that a YAML test with a `catch` stanza also permits
`match` blocks to assert things about the structure of the error
response, but this structure may be an important part of the API spec.
This commit adds this info to the docs about YAML tests.
2022-08-18 22:20:13 +09:30
Nik Everett
b46d95b2fb
REST tests for percentiles_bucket agg (#88029)
Adds REST tests for the `percentiles_bucket` pipeline bucket
aggregation. This gives us forwards and backwards compatibility tests
for these aggs as well as mixed version cluster tests for these aggs.

Relates to #26220
2022-08-17 13:19:49 -04:00
Nik Everett
63b850cac9
REST tests for cumulative pipeline aggs (#88966)
Adds REST tests for the `cumulative_cardinality` and `cumulative_sum`
pipeline aggregations. This gives us forwards and backwards compatibility
tests for these aggs as well as mixed version cluster tests for these
aggs.

Relates to #26220
2022-08-17 13:05:47 -04:00
Nik Everett
79a89790e3
Synthetic source: load text from stored fields (#87480)
Adds support for loading `text` and `keyword` fields that have
`store: true`. We could likely load *any* stored fields, but I
wanted to blaze the trail using something fairly useful.
2022-08-17 10:18:36 -04:00
Nik Everett
b327b17653
Fix shard splitting for nested (#89351)
I broke shard splitting when `_routing` is required and you use `nested`
docs. The mapping would look like this:
```
"mappings": {
  "_routing": {
    "required": true
  },
  "properties": {
    "n": { "type": "nested" }
  }
}
```

If you attempt to split an index with a mapping like this it'll blow up
with an exception like this:
```
Caused by: [idx] org.elasticsearch.action.RoutingMissingException: routing is required for [idx]/[0]
	at org.elasticsearch.cluster.routing.IndexRouting$IdAndRoutingOnly.checkRoutingRequired(IndexRouting.java:181)
	at org.elasticsearch.cluster.routing.IndexRouting$IdAndRoutingOnly.getShard(IndexRouting.java:175)
```

This fixes the problem by entirely avoiding the branch of code. That
branch was trying to find any top level documents that don't have a
`_routing`. But we *know* that there aren't any top level documents
without a routing in this case - the routing is "required". ES wouldn't
have let you index any top level documents without the routing.
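
A minimal Python sketch of the short circuit. It assumes a simplified model in which each document is a dict with hypothetical `is_root` and `routing` keys, and nested documents carry no routing of their own. `docs_to_route_by_id` is an illustrative helper, not Elasticsearch code.

```python
def docs_to_route_by_id(docs, routing_required):
    """Return top-level docs without _routing (hypothetical helper).

    In this simplified model nested docs have routing None, so the
    is_root check keeps them out rather than routing them by id.
    """
    if routing_required:
        # Indexing rejects top-level docs without routing, so none can
        # exist; skip the scan entirely instead of routing nested docs.
        return []
    return [d for d in docs if d["is_root"] and d["routing"] is None]


docs = [
    {"id": "1", "is_root": True, "routing": "r1"},
    {"id": "1-n0", "is_root": False, "routing": None},  # nested doc of "1"
]
print(docs_to_route_by_id(docs, routing_required=True))  # []
```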

This also adds a small pile of REST layer tests for shard splitting that
hit various branches in this area. For extra paranoia.

Closes #88109
2022-08-16 11:55:46 -04:00
weizijun
104ad7fd92
TSDB: fix time series field caps bwc yaml test (#89236)
Stops the repeated test failures due to #89171
2022-08-15 09:46:09 +01:00
Yang Wang
d663231a83
User Profile - GetProfile API now supports multiple UIDs (#89023)
This PR expands the existing GetProfile API to support getting multiple
profiles by IDs. As a result, the response format is also changed to
align with the latest version of API design guideline. Concretely, this
means moving the profiles as an array inside a top level "profiles"
field so that (1) dynamic fields (uids) are not mixed with static fields
and (2) the response has a defined order, which is desirable for
clients.

The change also reports any errors encountered during retrieval in
a top level "errors" field.
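
A minimal Python sketch of the new response shape. It assumes a hypothetical `fetch` callback that returns a profile dict, returns `None` when the uid has no profile, or raises on failure. The exact layout of the real "errors" object is not described here, so the `{uid: message}` map below is an assumption.

```python
from typing import Callable, Dict, List, Optional


def build_get_profile_response(
    uids: List[str], fetch: Callable[[str], Optional[dict]]
) -> Dict[str, object]:
    """Collect profiles in request order and report failures separately."""
    profiles = []
    errors = {}
    for uid in uids:
        try:
            profile = fetch(uid)
        except Exception as exc:  # one failed lookup doesn't fail the request
            errors[uid] = str(exc)
            continue
        if profile is not None:
            profiles.append(profile)
    response: Dict[str, object] = {"profiles": profiles}  # array keeps the order
    if errors:
        response["errors"] = errors  # hypothetical shape of the errors field
    return response


store = {"u1": {"uid": "u1"}, "u2": {"uid": "u2"}}


def fetch(uid):
    if uid == "broken":
        raise RuntimeError("lookup failed")
    return store.get(uid)


print(build_get_profile_response(["u2", "u1", "broken"], fetch))
```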

Relates: #81910
2022-08-10 10:51:38 +09:30
Benjamin Trent
d588d456f0
[ML] add new trained model deployment cache clear API (#89074)
This adds a new `_ml/trained_models/<model_id>/deployment/cache/_clear` API. This will clear the inference cache on every node where the model is allocated.
2022-08-04 19:45:15 +01:00
Nhat Nguyen
e3c33e2acd
Deduplicate fetching doc-values fields (#89094)
If a doc-values field matches multiple field patterns, then ES will
return the value of that doc-values field multiple times. As we already
do when fetching fields from source, we should deduplicate the matching
doc-values fields.
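
A minimal Python sketch of the deduplication. It uses `fnmatch`-style wildcards as a stand-in for Elasticsearch's own pattern matching. `match_docvalue_fields` is a hypothetical helper, not the actual implementation.

```python
from fnmatch import fnmatchcase


def match_docvalue_fields(patterns, mapped_fields):
    """Return every mapped field matching at least one pattern, exactly once.

    Hypothetical helper: fields keep the order in which a pattern first
    matched them, so a field hit by several patterns is not repeated.
    """
    seen = set()
    matched = []
    for pattern in patterns:
        for field in mapped_fields:
            if fnmatchcase(field, pattern) and field not in seen:
                seen.add(field)
                matched.append(field)
    return matched


fields = ["user.id", "user.name", "timestamp"]
# "user.id" matches both patterns but is returned once.
print(match_docvalue_fields(["user.*", "*.id"], fields))  # ['user.id', 'user.name']
```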
2022-08-04 14:05:09 -04:00
likzn
f28f4545b2
In the field capabilities API, re-add support for fields in the request body (#88972)
We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.

This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.
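
A minimal Python sketch of that rule, assuming the URL form is the usual comma-separated list. `resolve_fields` and its error message are hypothetical, not the actual Elasticsearch code.

```python
from typing import List, Optional


def resolve_fields(
    url_fields: Optional[str], body_fields: Optional[List[str]]
) -> Optional[List[str]]:
    """Pick `fields` from the URL or the body, rejecting requests that use both."""
    if url_fields is not None and body_fields is not None:
        raise ValueError("[fields] may be set in the URL or the body, not both")
    if url_fields is not None:
        # URL form: comma-separated list, e.g. ?fields=rating,title
        return [f.strip() for f in url_fields.split(",") if f.strip()]
    return body_fields


print(resolve_fields("rating,title", None))  # ['rating', 'title']
print(resolve_fields(None, ["user.*"]))      # ['user.*']
```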

Closes #86875
2022-08-04 13:44:50 -04:00
Christos Soulios
b81f4187ab
[TSDB] Metric fields in the field caps API (#88695)
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field 
caps API per field.

Especially for metrics fields, it must be clear which fields are metrics and if they belong 
to only time-series indexes or mixed time-series and non-time-series indexes.

It should also be possible to distinguish metric fields by whether they belong to any of the following kinds of index:

  -  Standard (non-time-series) indexes
  -  Time series indexes
  -  Downsampled time series indexes

This PR modifies the field caps API so that the mapping parameters time_series_dimension
and time_series_metric are presented only when they are set on fields of time-series indexes.
Those parameters are completely ignored when they are set on standard (non-time-series) indexes.
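
A minimal Python sketch of the per-index rule. It assumes the index kind is identified by the `index.mode` setting (`standard` or `time_series`). `field_caps_entry` is a hypothetical helper, and the entry shape is simplified.

```python
def field_caps_entry(field_mapping: dict, index_mode: str) -> dict:
    """Build a per-index field caps entry for one field (hypothetical helper).

    TSDB parameters are reported only for time-series indexes and are
    dropped for standard ones, even when they are set in the mapping.
    """
    caps = {"type": field_mapping["type"]}
    if index_mode == "time_series":
        for param in ("time_series_dimension", "time_series_metric"):
            if param in field_mapping:
                caps[param] = field_mapping[param]
    return caps


mapping = {"type": "double", "time_series_metric": "gauge"}
print(field_caps_entry(mapping, "standard"))     # {'type': 'double'}
print(field_caps_entry(mapping, "time_series"))  # {'type': 'double', 'time_series_metric': 'gauge'}
```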

This PR revisits some of the conventions adopted by #78790
2022-08-04 20:42:34 +03:00
Ed Savage
188f8872c6
[ML] ECS Grok patterns in the _text_structure/find_structure endpoint (#88982)
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns

Relates #77065

Co-authored-by: David Roberts <dave.roberts@elastic.co>
2022-08-04 18:39:04 +01:00
Julie Tibshirani
0bed7f768a Fix failures in vector field usage mixed cluster test 2022-08-03 16:14:46 -04:00
Julie Tibshirani
21eb984e64
Deprecate the _knn_search endpoint (#88828)
This change deprecates the kNN search API in favor of the new 'knn' option
inside the search API. The 'knn' option is now the preferred way of performing
kNN search.

Relates to #87625
2022-08-03 15:19:01 -04:00
Nikolaj Volgushev
a124bafe7e
REST tests and spec for bulk update API keys (#89027)
This PR adds REST API spec and YAML test files for the BulkUpdateApiKey
operation.
2022-08-03 12:42:54 +02:00
Artem Prigoda
f4e617e894
Add a test for checking for misspelled "dry_run" parameters for Desired Nodes API (#88898)
Check that the API rejects a misspelled parameter and returns a client error.
2022-07-28 16:15:43 +02:00
Nik Everett
3bcee8eaa0
Format runtime geo_points (#85449)
This formats the result of the `fields` section of the `_search` API for
runtime `geo_point` fields using the `format` parameter like we do for
non-runtime `geo_point` fields. This changes the default format for
those fields from `lat, lon` to `geojson` with the option to get `wkt`
or any other format we support.

The fix does so by preserving the `double, double` nature of the
`geo_point` rather than encoding it immediately in the script, so callers
can use the values as they need: the field fetchers use the `double, double`
natively, preserving as much precision as possible, and the queries quantize
the points exactly like Lucene indexing does, just as the script did before this PR.
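
A minimal Python sketch of formatting a point that is still held as two doubles. It assumes only the `geojson` and `wkt` formats, and `format_geo_point` is a hypothetical helper. Both formats put longitude first, the reverse of the old `lat, lon` default.

```python
def format_geo_point(lat: float, lon: float, fmt: str = "geojson"):
    """Format a (lat, lon) pair kept as two doubles until output time.

    Hypothetical helper covering two of the formats mentioned above.
    """
    if fmt == "geojson":
        # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
        return {"type": "Point", "coordinates": [lon, lat]}
    if fmt == "wkt":
        # WKT also puts x (longitude) before y (latitude).
        return f"POINT ({lon} {lat})"
    raise ValueError(f"unsupported format [{fmt}]")


print(format_geo_point(41.12, -71.34))         # {'type': 'Point', 'coordinates': [-71.34, 41.12]}
print(format_geo_point(41.12, -71.34, "wkt"))  # POINT (-71.34 41.12)
```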

Closes #85245
2022-07-27 13:11:07 -04:00
Przemko Robakowski
539434dbb4
Add min_* conditions to rollover (#83345) 2022-07-26 11:46:39 -04:00
Julie Tibshirani
abd561a277
Support kNN vectors in disk usage action (#88785)
This change adds support for kNN vector fields to the `_disk_usage` API. The
strategy:
* Iterate the vector values (using the same strategy as for doc values) to
estimate the vector data size
* Run some random vector searches to estimate the vector index size 
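
A minimal Python sketch of the first step. It assumes float32 vectors (4 bytes per dimension) and ignores any per-vector or index overhead. `estimate_vector_data_bytes` is a hypothetical helper and a simplification of the real estimate.

```python
def estimate_vector_data_bytes(vectors) -> int:
    """Sum the raw size of every vector, visiting each one once."""
    bytes_per_dim = 4  # assumption: float32 elements
    total = 0
    for vector in vectors:
        total += bytes_per_dim * len(vector)
    return total


print(estimate_vector_data_bytes([[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]))  # 24
```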

Co-authored-by: Yannick Welsch <yannick@welsch.lu>

Closes #84801
2022-07-26 07:57:47 -07:00
Artem Prigoda
c0bc85522d
Clean up desired nodes in between dry run tests (#88797) 2022-07-26 12:04:06 +02:00
Artem Prigoda
72a6fdc2b8
Support "dry run" mode for updating Desired Nodes (#88305)
Add the dry_run query parameter to support simulating an update of desired nodes. The update request will be validated, but no cluster state updates will be performed. To indicate that the response is the result of a dry run, we add a dry_run field to the JSON representation of the response.
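
A minimal Python sketch of the dry-run flow: validate always, mutate state only on a real update, and echo the mode in the response. The validation rule, the `state` layout and `update_desired_nodes` itself are hypothetical stand-ins for the real cluster-state update.

```python
def update_desired_nodes(state: dict, request: dict, dry_run: bool = False) -> dict:
    """Validate a desired-nodes update and apply it unless dry_run is set."""
    nodes = request.get("nodes")
    if not nodes:  # stand-in for the real request validation
        raise ValueError("nodes must not be empty")
    if not dry_run:
        state["desired_nodes"] = list(nodes)  # only a real update changes state
    # The response echoes the mode so callers can tell the two apart.
    return {"dry_run": dry_run}


state = {}
print(update_desired_nodes(state, {"nodes": [{"node_name": "n1"}]}, dry_run=True))
print(state)  # {} : the dry run left the state untouched
```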

See #82975
2022-07-26 09:03:12 +02:00
Keith Massey
4b060a6046
Removing the notion of components from the health API (#88663)
This commit removes the notion of components from the health API. Components
are no longer a top-level field in the response; `indicators` is promoted into their place.
2022-07-25 12:29:06 -05:00
Andrei Dan
da765ced7f
Remove help_url, rename summary to symptom, and user_actions to diagnosis (#88553)
Remove help_url, rename summary -> symptom, user_actions -> diagnosis
Separate the diagnosis `message` field into `cause` and `action`
Co-authored-by: Mary Gouseti <mgouseti@gmail.com>
2022-07-25 10:35:16 +01:00
Julie Tibshirani
e3ede67262
Integrate ANN into _search endpoint (#88694)
This PR adds a new `knn` option to the `_search` API to support ANN search.
It's powered by the same Lucene ANN capabilities as the old `_knn_search`
endpoint. The `knn` option can be combined with other search features like
queries and aggregations.

Addresses #87625
2022-07-22 08:02:07 -07:00
Benjamin Trent
94f2544998
Adding cardinality support for random_sampler agg (#86838)
This adds support for the `cardinality` aggregation within a random_sampler.

This use case is helpful in determining the ratio of unique values compared to the count of total documents within the sampled set.
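
A minimal Python sketch of that ratio on a Bernoulli sample, in which each document is kept independently with a fixed probability. `sampled_unique_ratio` is a hypothetical helper, not the aggregation itself.

```python
import random


def sampled_unique_ratio(values, probability, seed=0):
    """Ratio of distinct values to documents in a Bernoulli sample.

    Hypothetical helper: each document is kept independently with the
    given probability, roughly like a random sampler.
    """
    rng = random.Random(seed)
    sample = [v for v in values if rng.random() < probability]
    if not sample:
        return 0.0
    return len(set(sample)) / len(sample)


values = ["a"] * 50 + ["b"] * 50
print(sampled_unique_ratio(values, 1.0))  # 0.02 (2 distinct values / 100 docs)
```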
2022-07-21 07:19:35 -04:00
Seth Michael Larson
fffabae10a
Add pagination parameters to API spec and docs for 'snapshot.get' API 2022-07-20 06:35:52 -05:00
tmgordeeva
ab2602ecb0
Propagate alias filters to significance aggs filters (#88221)
Propagate alias filters to significance aggs filters

If we have an alias filter, use it as part of the background filter on a
significant terms agg. Previously, alias filters did not apply to background
filters so this will change bg_count results for some significant terms aggs
using background filter.
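
A minimal Python sketch of the combination, using query-DSL-shaped dicts. `effective_background_filter` is a hypothetical helper. ANDing the two filters in a `bool` `filter` clause is an assumption about how the combination behaves, not the actual implementation.

```python
def effective_background_filter(background_filter, alias_filter):
    """Combine an optional background filter with an optional alias filter.

    Hypothetical helper using query-DSL-shaped dicts; None means "no filter".
    """
    filters = [f for f in (background_filter, alias_filter) if f is not None]
    if not filters:
        return None  # no filter at all: the background is unrestricted
    if len(filters) == 1:
        return filters[0]
    # Both must match, so AND them in a non-scoring bool filter.
    return {"bool": {"filter": filters}}


bg = {"term": {"genre": "rock"}}
alias = {"term": {"tenant": "acme"}}
print(effective_background_filter(bg, alias))
```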

Closes #81585
2022-07-19 10:03:08 -07:00
Seth Michael Larson
478c06ef29
Verify that 'details' aren't sent when explain=false 2022-07-18 09:48:11 -05:00
Benjamin Trent
afa28d49b4
[ML] add new cache_size parameter to trained_model deployments API (#88450)
With https://github.com/elastic/ml-cpp/pull/2305 we now support caching PyTorch inference responses per node per model.

By default, the cache will be the same size as the model's on-disk size. This is because our current best estimate for memory used (for deploying) is 2*model_size + constant_overhead.

The factor of two is due to the model having to be loaded in memory twice when serializing it to the native process.

But, once the model is in memory and accepting requests, its actual memory usage is reduced vs. what we have "reserved" for it within the node.

Consequently, having a cache layer that takes advantage of that unused (but reserved) memory is effectively free. When used in production, especially in search scenarios, caching inference results is critical for decreasing latency.
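
A back-of-the-envelope Python sketch of that reasoning. It assumes that once the model is loaded, the process holds one copy of it plus the constant overhead. `cache_budget` and the sizes below are hypothetical.

```python
def cache_budget(model_size: int, overhead: int) -> dict:
    """Illustrate why a cache equal to the model size fits in reserved memory.

    Assumption: after loading, the process holds one copy of the model plus
    the constant overhead (the second copy is only needed while loading).
    """
    reserved = 2 * model_size + overhead    # estimate used when deploying
    steady_state = model_size + overhead    # assumed usage once serving
    default_cache = model_size              # default cache_size
    return {
        "reserved": reserved,
        "steady_state": steady_state,
        "default_cache": default_cache,
        "fits": steady_state + default_cache <= reserved,
    }


MIB = 1024 * 1024
# Hypothetical sizes: a 400 MiB model with 240 MiB of constant overhead.
print(cache_budget(400 * MIB, 240 * MIB)["fits"])  # True
```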
2022-07-18 09:19:01 -04:00
Alan Woodward
5c11a81913
Add 'mode' option to _source field mapper (#88211)
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
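
A minimal Python sketch of how the legacy booleans map onto `mode`, assuming the three valid combinations described above. `source_mode` and its error message are hypothetical.

```python
def source_mode(enabled: bool = True, synthetic: bool = False) -> str:
    """Map the legacy `enabled`/`synthetic` booleans onto a `mode` value.

    Hypothetical helper: only three of the four combinations are valid.
    """
    if not enabled and synthetic:
        raise ValueError("synthetic source cannot be combined with enabled: false")
    if not enabled:
        return "disabled"
    return "synthetic" if synthetic else "stored"


print(source_mode())                # stored
print(source_mode(synthetic=True))  # synthetic
print(source_mode(enabled=False))   # disabled
```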
2022-07-18 12:50:10 +01:00
Chen Ni
c45c205c33
Add test execution guide in yamlRestTest asciidoc (#88490) 2022-07-14 08:22:35 -07:00
Nhat Nguyen
227d80975b
Add tests for query/agg on lookup runtime fields (#88389)
Adds tests to ensure that querying and aggregating on lookup runtime fields
aren't supported.

Relates #88296
2022-07-09 02:02:13 +09:30
Nikolaj Volgushev
f42b15bc8c
Updatable API keys - REST API spec and tests (#88270)
This PR adds REST API spec and YAML test files for the UpdateApiKey
operation.
2022-07-08 11:48:02 +02:00
Ryan Ernst
9016883e1c
Add build_flavor back to info api rest response (#88336)
The build_flavor was previously removed since it is no longer relevant;
only the default distribution now exists. However, the removal of build
flavor included removing it from the version information on the info
response for the root path. This API is supposed to be stable, so
removing that key was a compatibility break. This commit adds the
build_flavor back to that API, hardcoded to `default`. Additionally, a
test is added to ensure the key exists going forward, until it can be
properly deprecated.

closes #88318
2022-07-08 09:54:29 +09:30
Mark Tozzi
9ee6a19187
Add ability to select execution mode for cardinality aggregation (#87704)
Plumbs through a new parameter for the cardinality aggregation to allow configuring the execution mode. This can have significant impacts on speed and memory usage. This PR exposes three collection modes and two heuristics that we can tune going forward. All of these are treated as hints and can be silently ignored, e.g. if not applicable to the given field type. I've changed the default behavior to optimize for time, which potentially uses more memory. Users can override this for the old behavior if needed.
2022-07-05 09:11:22 -04:00
Rene Groeschke
8ccae4da71
Setup elasticsearch dependency monitoring with Snyk for production code (#88036)
This adds logic to generate Gradle dependency graphs and upload them to Snyk.

We implemented our own Snyk plugin on top of the REST API because the
existing Snyk Gradle plugin delegates to the Snyk command line tool, and the
command line tool uses custom Gradle logic by injecting an init file that

a) uses deprecated build logic, which we definitely want to avoid, and
b) uses Gradle APIs we avoid, like eager task creation.

Shipping this as an internal Gradle plugin gives us the most flexibility. As we only want to monitor
production code for now, we apply this plugin as part of the elasticsearch.build plugin;
so far, applying that plugin has been the de facto indicator that a project is considered a "production" project
that ends up in our distribution or public Maven repositories. This isn't ideal yet, and we will revisit
the distinction between production and non-production code / projects in a separate effort.

As part of this effort we added the elasticsearch.build plugin to more projects that actually end up
in the distribution. To unblock this work, we have temporarily disabled a few check tasks that started failing after applying elasticsearch.build.

Addresses #87620
2022-06-29 13:29:14 +02:00
Nik Everett
d88dfb11c7
More REST tests for avg/max/min/sum_bucket aggs (#88027)
Adds REST layer tests for some sneaky cases in the `avg_bucket`,
`max_bucket`, `min_bucket`, and `sum_bucket` pipeline aggregations.
This gives us forwards and backwards compatibility tests for these
aggs as well as mixed version cluster tests for these aggs.

Relates to #26220
2022-06-27 13:49:29 -04:00
Ryan Ernst
e3c4cddbe2
Remove legacy bootstrap plugins (#87775)
Bootstrap plugins were an internal mechanism added to allow a
filesystem provider for cloud with the quota-aware-fs plugin. Since that
was removed, bootstrap plugins no longer serve a purpose. They were
never officially documented because they were for internal use only.
This commit removes the bootstrap plugins infrastructure.
2022-06-23 20:38:06 -04:00
Julie Tibshirani
572a5b9bb4 Skip dense_vector field usage test before 8.1
Fixes #87971.
2022-06-23 10:25:17 -07:00
Julie Tibshirani
3a9e511117
Move kNN search and dense vectors to core (#87815)
This PR moves kNN search and dense vector support out of an xpack plugin and
into server.

In #87625 we plan to integrate ANN search into the main `_search` endpoint as a
new top-level component called `knn`. So kNN will be a dedicated part of the
search request, and we'll have kNN logic within the search phases. The classes
and logic will live in server, matching the other search components like
suggesters, field collapsing, etc.
2022-06-22 21:10:20 -07:00
Nik Everett
463d46cd79
Add force_synthetic_source to mget (#87574)
This adds the option to force synthetic source to the MGET API. See
#87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in MGET.
2022-06-22 08:55:55 -04:00
Mark Tozzi
5f2411a3b8
Revert "Correct skip versions for new flattened terms test (#87540)" (#87764)
This reverts commit f72c7da7ee.
2022-06-16 16:35:39 -04:00
Nik Everett
cf154fd367
Tests for synthetic _source from translog (#87578)
This adds tests to make sure that we use all of the normal synthetic
source machinery, even when loading from the translog. So all GETs on
synthetic source indices will require an in-memory index. That'll be an
extra cost on indices that are updated very very frequently.
2022-06-16 14:51:17 -04:00
Mark Tozzi
f72c7da7ee
Correct skip versions for new flattened terms test (#87540)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-06-16 14:38:05 -04:00
Nik Everett
5b525a290e
REST tests for avg/max/min/sum_bucket aggs (#87009)
Adds REST layer tests for the `avg_bucket`, `max_bucket`, `min_bucket`,
and `sum_bucket` pipeline aggregations. This gives us forwards and
backwards compatibility tests for these aggs as well as mixed version
cluster tests for these aggs.

Relates to #26220
2022-06-16 12:16:06 -04:00
Nik Everett
d74b45b9b1
REST tests for stats_bucket aggs (#87006)
Adds REST tests for `stats_bucket` and `extended_stats_bucket` aggs.

Relates to #26220
2022-06-16 12:14:52 -04:00
Nik Everett
48ab87f60b
Fix synthetic source highlighting tests (#87749)
The synthetic source highlighting tests would sometimes fail in a
strange way - they expect the entire search request to fail but it
*didn't* - only a single shard would fail. This locks the tests to
always make single shard indices so the failures are consistent.

Closes #87730
2022-06-16 12:07:43 -04:00
Nik Everett
8ebf39b7e1
Fixup highlighting with synthetic source (#87667)
Synthetic source has a habit of reordering text fields. This frustrates
highlighting because it *often* wants to use index structures to find
the offsets to values in the field. This disables the FVH highlighter
for multi-valued text fields when synthetic source is enabled and runs
the unified highlighter in "analyze" mode when synthetic source is
enabled. That's *enough* to stop them from spitting out wrong answers.

We might be leaving some performance on the table when the unified
highlighter works on a single valued text field that is indexed with
offsets or term vectors. We don't really expect that to be common at all
though because *generally* folks will enable synthetic source to save
space and adding offsets or term vectors is quite space inefficient. If
it comes up, we might be able to improve here.
2022-06-15 14:49:06 -04:00
David Turner
fcf293f87c
Report overall mapping size in cluster stats (#87556)
Adds measures of the total size of all mappings and the total number of
fields in the cluster (both before and after deduplication).

Relates #86639
Relates #77466
2022-06-14 13:55:14 +01:00
Nik Everett
a37edb7796
Add force_synthetic_source to GET (#87536)
This adds the option to force synthetic source to the GET API. See
#87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in GET.
2022-06-09 09:40:36 -04:00