2817 commits

Author SHA1 Message Date
likzn
f28f4545b2
In the field capabilities API, re-add support for fields in the request body (#88972)
We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.

This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.
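
The rule described above (accept `fields` from either place, reject it in both) can be sketched as follows. This is a minimal illustration, not Elasticsearch's implementation: the helper name `resolve_fields`, the comma-separated URL form, and the error wording are all assumptions made for the example.

```python
from typing import List, Optional


def resolve_fields(url_fields: Optional[str], body: Optional[dict]) -> List[str]:
    """Return the requested field patterns from the URL or the body.

    Hypothetical helper: raises if `fields` is given in both places.
    """
    body_fields = (body or {}).get("fields")
    if url_fields is not None and body_fields is not None:
        # Assumed error wording; the source only says an error is thrown.
        raise ValueError("'fields' was specified in both the URL and the request body")
    if body_fields is not None:
        return list(body_fields)
    if url_fields is not None:
        # Assumes the URL parameter is a comma-separated list.
        return url_fields.split(",")
    return []
```

For example, `resolve_fields(None, {"fields": ["rating", "user.*"]})` returns the body's list unchanged, while passing both arguments raises.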

Closes #86875
2022-08-04 13:44:50 -04:00
Christos Soulios
b81f4187ab
[TSDB] Metric fields in the field caps API (#88695)
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field 
caps API per field.

Especially for metrics fields, it must be clear which fields are metrics and if they belong 
to only time-series indexes or mixed time-series and non-time-series indexes.

Metric fields must be further distinguished according to which of the following indices they belong to:

  -  Standard (non-time-series) indexes
  -  Time series indexes
  -  Downsampled time series indexes

This PR modifies the field caps API so that the mapping parameters time_series_dimension
and time_series_metric are presented only when they are set on fields of time-series indexes.
Those parameters are completely ignored when they are set on standard (non-time-series) indexes.

This PR revisits some of the conventions adopted by #78790
2022-08-04 20:42:34 +03:00
Ed Savage
188f8872c6
[ML] ECS Grok patterns in the _text_structure/find_structure endpoint (#88982)
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns

Relates #77065

Co-authored-by: David Roberts <dave.roberts@elastic.co>
2022-08-04 18:39:04 +01:00
Julie Tibshirani
0bed7f768a Fix failures in vector field usage mixed cluster test 2022-08-03 16:14:46 -04:00
Julie Tibshirani
21eb984e64
Deprecate the _knn_search endpoint (#88828)
This change deprecates the kNN search API in favor of the new 'knn' option
inside the search API. The 'knn' option is now the preferred way of performing
kNN search.

Relates to #87625
2022-08-03 15:19:01 -04:00
Nikolaj Volgushev
a124bafe7e
REST tests and spec for bulk update API keys (#89027)
This PR adds REST API spec and YAML test files for the BulkUpdateApiKey
operation.
2022-08-03 12:42:54 +02:00
Artem Prigoda
f4e617e894
Add a test for checking for misspelled "dry_run" parameters for Desired Nodes API (#88898)
Check that the API doesn't accept a misspelled parameter and returns a client error.
2022-07-28 16:15:43 +02:00
Nik Everett
3bcee8eaa0
Format runtime geo_points (#85449)
This formats the result of the `fields` section of the `_search` API for
runtime `geo_point` fields using the `format` parameter like we do for
non-runtime `geo_point` fields. This changes the default format for
those fields from `lat, lon` to `geojson` with the option to get `wkt`
or any other format we support.

The fix does so by preserving the `double, double` nature of the
`geo_point` rather than encoding it immediately in the script. Callers can
use the results. The field fetchers use the `double, double` natively,
preserving as much precision as possible. The queries quantize the points
exactly like Lucene indexing does, and like the script did before this PR.

Closes #85245
2022-07-27 13:11:07 -04:00
Przemko Robakowski
539434dbb4
Add min_* conditions to rollover (#83345) 2022-07-26 11:46:39 -04:00
Julie Tibshirani
abd561a277
Support kNN vectors in disk usage action (#88785)
This change adds support for kNN vector fields to the `_disk_usage` API. The
strategy:
* Iterate the vector values (using the same strategy as for doc values) to
estimate the vector data size
* Run some random vector searches to estimate the vector index size 

Co-authored-by: Yannick Welsch <yannick@welsch.lu>

Closes #84801
2022-07-26 07:57:47 -07:00
Artem Prigoda
c0bc85522d
Clean up desired nodes in between dry run tests (#88797) 2022-07-26 12:04:06 +02:00
Artem Prigoda
72a6fdc2b8
Support "dry run" mode for updating Desired Nodes (#88305)
Add the dry_run query parameter to support simulating an update of desired nodes. The update request will be validated, but no cluster state updates will be performed. To indicate that the response was the result of a dry run, we add the dry_run field to the JSON representation of the response.
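
As a rough sketch of the behaviour described above: validate the request, skip the state update on a dry run, and flag the response. The function name, the shape of `state`, and the validation rule are hypothetical and chosen only for illustration; whether the real response carries `dry_run` on non-dry runs is not stated in the source.

```python
def update_desired_nodes(state: dict, request: dict, dry_run: bool = False) -> dict:
    """Validate a desired-nodes update; apply it only when not a dry run."""
    nodes = request.get("nodes")
    # Hypothetical validation: the real API checks far more than this.
    if not isinstance(nodes, list) or not nodes:
        raise ValueError("request must contain a non-empty 'nodes' list")

    if not dry_run:
        # Only a real run mutates the (stand-in) cluster state.
        state["desired_nodes"] = nodes

    # Assumption: the flag is always echoed back in the response.
    return {"dry_run": dry_run}
```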

See #82975
2022-07-26 09:03:12 +02:00
Keith Massey
4b060a6046
Removing the notion of components from the health API (#88663)
This commit removes the notion of components from the health API. Components are no longer
a top-level field in the response, and indicators is promoted into their place.
2022-07-25 12:29:06 -05:00
Andrei Dan
da765ced7f
Remove help_url, rename summary to symptom, and user_actions to diagnosis (#88553)
Remove help_url, rename summary->symptom, user_actions->diagnosis
Separate the diagnosis `message` field into `cause` and `action`
Co-authored-by: Mary Gouseti <mgouseti@gmail.com>
2022-07-25 10:35:16 +01:00
Julie Tibshirani
e3ede67262
Integrate ANN into _search endpoint (#88694)
This PR adds a new `knn` option to the `_search` API to support ANN search.
It's powered by the same Lucene ANN capabilities as the old `_knn_search`
endpoint. The `knn` option can be combined with other search features like
queries and aggregations.
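
A request body combining the `knn` option with a query and an aggregation might be assembled like this. It is a sketch only: the helper name, the index field names (`image_vector`, `title`, `category`), and the example values are assumptions; the `knn` keys shown (`field`, `query_vector`, `k`, `num_candidates`) are the ones used by the search API's kNN option.

```python
from typing import Any, Dict, List, Optional


def build_knn_search(
    field: str,
    query_vector: List[float],
    k: int,
    num_candidates: int,
    query: Optional[Dict[str, Any]] = None,
    aggs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a `_search` body with a top-level `knn` section."""
    body: Dict[str, Any] = {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }
    # Other search features sit next to `knn` in the same body.
    if query is not None:
        body["query"] = query
    if aggs is not None:
        body["aggs"] = aggs
    return body


body = build_knn_search(
    field="image_vector",  # hypothetical dense_vector field
    query_vector=[0.1, 0.2, 0.3],
    k=10,
    num_candidates=100,
    query={"match": {"title": "mountain lake"}},
    aggs={"by_category": {"terms": {"field": "category"}}},
)
```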

Addresses #87625
2022-07-22 08:02:07 -07:00
Benjamin Trent
94f2544998
Adding cardinality support for random_sampler agg (#86838)
This adds support for the `cardinality` aggregation within a random_sampler.

This use case is helpful in determining the ratio of unique values compared to the count of total documents within the sampled set.
2022-07-21 07:19:35 -04:00
Seth Michael Larson
fffabae10a
Add pagination parameters to API spec and docs for 'snapshot.get' API 2022-07-20 06:35:52 -05:00
tmgordeeva
ab2602ecb0
Propagate alias filters to significance aggs filters (#88221)
If we have an alias filter, use it as part of the background filter on a
significant terms agg. Previously, alias filters did not apply to background
filters, so this will change bg_count results for some significant terms aggs
that use a background filter.

Closes #81585
2022-07-19 10:03:08 -07:00
Seth Michael Larson
478c06ef29
Verify that 'details' aren't sent when explain=false 2022-07-18 09:48:11 -05:00
Benjamin Trent
afa28d49b4
[ML] add new cache_size parameter to trained_model deployments API (#88450)
With https://github.com/elastic/ml-cpp/pull/2305, we now support caching PyTorch inference responses per node per model.

By default, the cache will be the same size as the model's size on disk. This is because our current best estimate for memory used (for deploying) is 2*model_size + constant_overhead.

This is due to the model having to be loaded in memory twice when serializing to the native process. 

But, once the model is in memory and accepting requests, its actual memory usage is reduced vs. what we have "reserved" for it within the node.

Consequently, having a cache layer that takes advantage of that unused (but reserved) memory is effectively free. When used in production, especially in search scenarios, caching inference results is critical for decreasing latency.
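
The arithmetic behind that argument can be sketched as below. The reservation formula comes from the text above; the steady-state figure (one copy of the model plus the overhead) and the example sizes are assumptions made only to show why a cache of roughly one model size fits inside memory that is already reserved.

```python
MB = 1024 * 1024


def reserved_memory(model_size: int, constant_overhead: int) -> int:
    """Memory reserved for a deployment: 2*model_size + constant_overhead."""
    return 2 * model_size + constant_overhead


def default_cache_size(model_size: int) -> int:
    """Default cache size: the same as the model's size on disk."""
    return model_size


def spare_after_load(model_size: int, constant_overhead: int) -> int:
    """Reserved memory left over once the model is loaded.

    Assumption: in steady state only one copy of the model stays resident.
    """
    steady_state = model_size + constant_overhead
    return reserved_memory(model_size, constant_overhead) - steady_state


model = 400 * MB      # hypothetical model size
overhead = 240 * MB   # hypothetical constant overhead
spare = spare_after_load(model, overhead)
```

Under these assumptions the spare memory equals the default cache size, which is the sense in which the cache is "effectively free".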
2022-07-18 09:19:01 -04:00
Alan Woodward
5c11a81913
Add 'mode' option to _source field mapper (#88211)
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
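
The relationship between the old booleans and the new `mode` values can be sketched as a small resolver. This is illustrative only: the function name, the defaults (`enabled` true, `synthetic` false), and the error messages are assumptions, not the mapper's actual code.

```python
VALID_MODES = {"stored", "synthetic", "disabled"}


def resolve_source_mode(mapping: dict) -> str:
    """Resolve a `_source` mapping to one of: stored, synthetic, disabled."""
    if "mode" in mapping:
        # `mode` cannot be set in combination with `enabled`.
        if "enabled" in mapping:
            raise ValueError("cannot set both 'mode' and 'enabled'")
        if mapping["mode"] not in VALID_MODES:
            raise ValueError(f"unknown mode: {mapping['mode']!r}")
        return mapping["mode"]

    # Legacy booleans; the defaults here are assumed.
    enabled = mapping.get("enabled", True)
    synthetic = mapping.get("synthetic", False)
    if not enabled and synthetic:
        # The one disallowed combination of the two booleans.
        raise ValueError("'enabled: false' cannot be combined with 'synthetic: true'")
    if not enabled:
        return "disabled"
    return "synthetic" if synthetic else "stored"
```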
2022-07-18 12:50:10 +01:00
Chen Ni
c45c205c33
Add test execution guide in yamlRestTest asciidoc (#88490) 2022-07-14 08:22:35 -07:00
Nhat Nguyen
227d80975b
Add tests for query/agg on lookup runtime fields (#88389)
Adds tests to ensure that querying and aggregating on lookup runtime
fields aren't supported.

Relates #88296
2022-07-09 02:02:13 +09:30
Nikolaj Volgushev
f42b15bc8c
Updatable API keys - REST API spec and tests (#88270)
This PR adds REST API spec and YAML test files for the UpdateApiKey
operation.
2022-07-08 11:48:02 +02:00
Ryan Ernst
9016883e1c
Add build_flavor back to info api rest response (#88336)
The build_flavor was previously removed since it is no longer relevant;
only the default distribution now exists. However, the removal of build
flavor included removing it from the version information on the info
response for the root path. This API is supposed to be stable, so
removing that key was a compatibility break. This commit adds the
build_flavor back to that API, hardcoded to `default`. Additionally, a
test is added to ensure the key exists going forward, until it can be
properly deprecated.

closes #88318
2022-07-08 09:54:29 +09:30
Mark Tozzi
9ee6a19187
Add ability to select execution mode for cardinality aggregation (#87704)
Plumbs through a new parameter for the cardinality aggregation, to allow configuring the execution mode. This can have significant impacts on speed and memory usage. This PR exposes three collection modes and two heuristics that we can tune going forward. All of these are treated as hints and can be silently ignored, e.g. if not applicable to the given field type. I've changed the default behavior to optimize for time, which potentially uses more memory. Users can override this for the old behavior if needed.
2022-07-05 09:11:22 -04:00
Rene Groeschke
8ccae4da71
Setup elasticsearch dependency monitoring with Snyk for production code (#88036)
This adds the logic for generating Gradle dependency graphs and uploading them to Snyk.

We implemented a REST API based Snyk plugin directly because the existing Snyk Gradle plugin
delegates to the Snyk command line tool, and the command line tool uses custom Gradle logic,
injecting an init file that:

a) uses deprecated build logic, which we definitely want to avoid
b) uses Gradle APIs we avoid, like eager task creation.

Shipping this as an internal Gradle plugin gives us the most flexibility. As we only want to monitor
production code for now, we apply this plugin as part of the elasticsearch.build plugin;
so far, usage of that plugin has been the de facto indicator of whether a project is considered a "production" project
that ends up in our distribution or public Maven repositories. This isn't ideal yet, and we will revisit
the distinction between production and non-production code / projects in a separate effort.

As part of this effort we added the elasticsearch.build plugin to more projects that actually end up
in the distribution. To unblock us on this, we have for now disabled a few check tasks that started failing after applying elasticsearch.build.

Addresses #87620
2022-06-29 13:29:14 +02:00
Nik Everett
d88dfb11c7
More REST tests for avg/max/min/sum_bucket aggs (#88027)
Adds REST layer tests for some sneaky cases in the `avg_bucket`,
`max_bucket`, `min_bucket`, and `sum_bucket` pipeline aggregations.
This gives us forwards and backwards compatibility tests for these
aggs as well as mixed version cluster tests for these aggs.

Relates to #26220
2022-06-27 13:49:29 -04:00
Ryan Ernst
e3c4cddbe2
Remove legacy bootstrap plugins (#87775)
Bootstrap plugins were an internal mechanism added to allow a
filesystemprovider for cloud with the quota-aware-fs plugin. Since that
was removed, bootstrap plugins no longer serve a purpose. They were
never officially documented because they were for internal use only.
This commit removes the bootstrap plugins infrastructure.
2022-06-23 20:38:06 -04:00
Julie Tibshirani
572a5b9bb4 Skip dense_vector field usage test before 8.1
Fixes #87971.
2022-06-23 10:25:17 -07:00
Julie Tibshirani
3a9e511117
Move kNN search and dense vectors to core (#87815)
This PR moves kNN search and dense vector support out of an xpack plugin and
into server.

In #87625 we plan to integrate ANN search into the main `_search` endpoint as a
new top-level component called `knn`. So kNN will be a dedicated part of the
search request, and we'll have kNN logic within the search phases. The classes
and logic will live in server, matching the other search components like
suggesters, field collapsing, etc.
2022-06-22 21:10:20 -07:00
Nik Everett
463d46cd79
Add force_synthetic_source to mget (#87574)
This adds the option to force synthetic source to the MGET API. See
 #87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in MGET.
2022-06-22 08:55:55 -04:00
Mark Tozzi
5f2411a3b8
Revert "Correct skip versions for new flattened terms test (#87540)" (#87764)
This reverts commit f72c7da7ee.
2022-06-16 16:35:39 -04:00
Nik Everett
cf154fd367
Tests for synthetic _source from translog (#87578)
This adds tests to make sure that we use all of the normal synthetic
source machinery, even when loading from the translog. So all GETs on
synthetic source indices will require an in memory index. That'll be an
extra cost on indices that are updated very very frequently.
2022-06-16 14:51:17 -04:00
Mark Tozzi
f72c7da7ee
Correct skip versions for new flattened terms test (#87540)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-06-16 14:38:05 -04:00
Nik Everett
5b525a290e
REST tests for avg/max/min/sum_bucket aggs (#87009)
Adds REST layer tests for the `avg_bucket`, `max_bucket`, `min_bucket`,
and `sum_bucket` pipeline aggregations. This gives us forwards and
backwards compatibility tests for these aggs as well as mixed version
cluster tests for these aggs.

Relates to #26220
2022-06-16 12:16:06 -04:00
Nik Everett
d74b45b9b1
REST tests for stats_bucket aggs (#87006)
Adds REST tests for `stats_bucket` and `extended_stats_bucket` aggs.

Relates to #26220
2022-06-16 12:14:52 -04:00
Nik Everett
48ab87f60b
Fix synthetic source highlighting tests (#87749)
The synthetic source highlighting tests would sometimes fail in a
strange way - they expect the entire search request to fail but it
*didn't* - only a single shard would fail. This locks the tests to
always make single shard indices so the failures are consistent.

Closes #87730
2022-06-16 12:07:43 -04:00
Nik Everett
8ebf39b7e1
Fixup highlighting with synthetic source (#87667)
Synthetic source has a habit of reordering text fields. This frustrates
highlighting because it *often* wants to use index structures to find
the offsets to values in the field. This disables the FVH highlighter
for multi-valued text fields when synthetic source is enabled and runs
the unified highlighter in "analyze" mode when synthetic source is
enabled. That's *enough* to stop them from spitting out wrong answers.

We might be leaving some performance on the table when the unified
highlighter works on a single valued text field that is indexed with
offsets or term vectors. We don't really expect that to be common at all
though because *generally* folks will enable synthetic source to save
space and adding offsets or term vectors is quite space inefficient. If
it comes up, we might be able to improve here.
2022-06-15 14:49:06 -04:00
David Turner
fcf293f87c
Report overall mapping size in cluster stats (#87556)
Adds measures of the total size of all mappings and the total number of
fields in the cluster (both before and after deduplication).

Relates #86639
Relates #77466
2022-06-14 13:55:14 +01:00
Nik Everett
a37edb7796
Add force_synthetic_source to GET (#87536)
This adds the option to force synthetic source to the GET API. See
 #87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in GET.
2022-06-09 09:40:36 -04:00
Nik Everett
2ec59e799b
Fix test in synthetic source (#87534)
Fixes a test for forcing synthetic source that sometimes fails if the
index has more than one shard. We're just looking for a sensible failure
message here so we can lock it to one shard.
2022-06-08 15:17:25 -04:00
Yang Wang
f5ceed19fc
User Profile - remove feature flag (#87383)
The feature flag is no longer necessary in the 8.4 release cycle. The
feature itself is still in beta.
2022-06-08 10:18:18 -04:00
Mark Tozzi
c9af118237
Fix a bug with flattened fields in terms aggregations (#87392)
The root cause here was that missing did not correctly delegate `supportsGlobalOrdinalsMapping` to the wrapped values source, instead falling back to the default. I've added the delegation, and made the base method abstract so this doesn't happen again.
2022-06-08 08:08:18 -04:00
Salvatore Campagna
5d062f9fdd
Make the metric in the buckets_path parameter optional (#87220)
With this change the metric field name becomes optional if the
'buckets_path' is pointing to a multi-value aggregation with a single
metric field. Normally the full path would be required including
the aggregation name followed by the metric field.

If the metric is not specified in the path and the multi-value
aggregation computes more than one value, an error is thrown.

The old notation is still supported for backward compatibility in case
the full path is specified and the target multi-value aggregation
computes a single value.
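
The resolution rule described above can be sketched roughly as follows. This is a simplified model, not the aggregation framework's code: it handles only paths of the form `agg` or `agg.metric`, and the function name, the `agg_metrics` lookup table, and the error messages are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_metric(buckets_path: str, agg_metrics: Dict[str, List[str]]) -> Tuple[str, str]:
    """Resolve a simplified buckets_path to (aggregation name, metric name).

    `agg_metrics` maps each multi-value aggregation to the metrics it computes.
    """
    agg_name, sep, metric = buckets_path.partition(".")
    metrics = agg_metrics[agg_name]

    if sep:
        # Full path given: the old notation keeps working.
        if metric not in metrics:
            raise ValueError(f"aggregation {agg_name!r} has no metric {metric!r}")
        return agg_name, metric

    # Metric omitted: only unambiguous when exactly one value is computed.
    if len(metrics) != 1:
        raise ValueError(
            f"aggregation {agg_name!r} computes more than one value; "
            "specify the metric in buckets_path"
        )
    return agg_name, metrics[0]
```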
2022-06-08 10:44:02 +02:00
Nik Everett
d0b50b56a0
Add an option to _search to force synthetic source (#87068)
This adds `?force_synthetic_source` to, well, force running the fetch
phase with synthetic source. If the mapping is incompatible with
synthetic source it'll throw a 400 error.
2022-06-07 11:11:14 -04:00
Albert Zaharovits
7be60d6068
[DOCS] Profile Has Privileges API (#87360)
Docs for the new Has Privileges API for profiles from #85898.

[Has privileges user profile API
preview](https://elasticsearch_87360.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-api-has-privileges-user-profiles.html).
2022-06-07 03:58:32 -04:00
Keith Massey
c95230d155
Master stability health indicator part 1 (when a master has been seen recently) (#86524)
This is the first PR for the master stability check, which is part of the health API. It handles the case
when we have seen a master node recently. The more complicated case when we have not seen a
master node recently will be in subsequent PRs.
2022-06-06 14:40:15 -05:00
Nik Everett
5f70d30330
Synthetic source: paranoid tests for configuration (#87182)
This adds some paranoid REST layer tests for modifying the `synthetic`
configuration.
2022-06-06 09:37:57 -04:00
Luca Cavanna
50793a68a8
Fields API to allow fetching values when _source is disabled (#87267)
Back when we introduced the fields parameter to the search API, it could only fetch values from _source, hence
the corresponding sub-fetch phase fails early whenever _source is disabled. Today though runtime fields can
be retrieved from a separate value fetcher that reads from fielddata, and metadata fields can be retrieved
from stored fields. These two scenarios currently throw an unnecessary error whenever _source is disabled.

This commit removes the check for disabled _source, so that runtime fields and metadata fields can be retrieved even when _source is disabled. Fields that need to be loaded from _source are simply skipped whenever _source is disabled, similar to when a field is not found in _source.
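
In outline, the new behaviour looks something like the sketch below. It is an assumption-laden simplification: the function name and the three plain dictionaries standing in for runtime-field values, stored metadata fields, and `_source` are hypothetical, and `source=None` models a disabled `_source`.

```python
from typing import Any, Dict, List, Optional


def fetch_fields(
    requested: List[str],
    source: Optional[Dict[str, Any]],
    runtime_values: Dict[str, Any],
    stored_metadata: Dict[str, Any],
) -> Dict[str, Any]:
    """Collect values for the requested fields without failing on a disabled _source."""
    result: Dict[str, Any] = {}
    for name in requested:
        if name in runtime_values:
            # Runtime fields do not need _source.
            result[name] = runtime_values[name]
        elif name in stored_metadata:
            # Metadata fields come from stored fields.
            result[name] = stored_metadata[name]
        elif source is not None and name in source:
            result[name] = source[name]
        # Otherwise skip: _source is disabled or the field is absent from it.
    return result
```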

Closes #87072
2022-06-02 11:28:36 +02:00