This prevents docs files from *starting* with a "response" because when
that happens the response is converted to an assertion and appended
to the last snippet that was processed. If that last snippet was in a
different file then it's very hard to reason about the tests. That goes
double because the order we iterate files isn't defined....
Anyway! This adds a guard in the build, removes the offending
"response", and re-enables the tests that we'd thought were failing here.
Closes #91081
This PR extends the basic Prevalidation API so that in case there are
red non-searchable-snapshot indices in the cluster, we reach out to
the nodes (whose removal is being prevalidated) to find out if they
have a local copy of any red indices.
Closes #87776
This PR adds the first part of the Prevalidate Node Removal API. This
API allows checking whether attempting to remove some node(s) from the
cluster is likely to succeed or not. This check is useful when a node
needs to be removed from a RED cluster, without risking losing the last
copy of some RED shards.
In this PR, we only check whether a RED index is a Searchable Snapshot
index or not, in which case the removal of any node is safe as the RED
index is backed by a snapshot.
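A minimal sketch of the check described above, assuming a simplified view of index health. The `IndexHealth` type and `removal_known_safe` function are hypothetical names for illustration and are not the classes used in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexHealth:
    name: str
    status: str  # "green", "yellow" or "red"
    searchable_snapshot: bool


def removal_known_safe(indices):
    """Return (safe, unresolved).

    Removal is reported as known safe only when every RED index is a
    searchable snapshot index. Any other RED index is returned in
    `unresolved`, meaning this first-stage check cannot decide.
    """
    unresolved = [
        i.name for i in indices
        if i.status == "red" and not i.searchable_snapshot
    ]
    return (not unresolved, unresolved)
```

A cluster with no RED indices, or with only snapshot-backed RED indices, yields `(True, [])` under this sketch.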
Relates #87776
This is the continuation of #90176 which leverages #90425 to count query types. This PR adds search usage stats to the existing telemetry by counting sections being used as part of a search request, as well as query types. Each distinct query type is counted once per search request.
The counting is performed while parsing, for the following REST search endpoints:
- _search
- _msearch
- _async_search
- _search/template
- _msearch/template
- _fleet/_fleet_search
- _fleet/_fleet_msearch
All other APIs that use search internally, such as reindex, ML transform, rank eval and SQL, are not counted as part of these search usage stats. Such functionality should have its own dedicated telemetry if needed.
The counting of search sections is not exhaustive; only the sections that are interesting to collect counts for are tracked.
The following is the new section added to the cluster stats API response, including some sample stats:
```
"search" : {
  "total" : 63,
  "sections" : {
    "knn" : 42,
    "query" : 21,
    "aggs" : 46
  },
  "query" : {
    "match" : 58
  }
}
```
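To illustrate the "counted once per search request" rule, here is a rough sketch of the counting. The `SearchUsage` class is a hypothetical stand-in, not the service class added in this PR, and it assumes sections are also de-duplicated per request.

```python
from collections import Counter


class SearchUsage:
    """Hypothetical counter holder; names are illustrative only."""

    def __init__(self):
        self.total = 0
        self.sections = Counter()
        self.queries = Counter()

    def record(self, sections, query_types):
        # One call per parsed search request.
        self.total += 1
        # set() collapses repeats, so a query type that appears several
        # times in the same request is still counted once.
        self.sections.update(set(sections))
        self.queries.update(set(query_types))


usage = SearchUsage()
usage.record(["query", "aggs"], ["match", "match", "bool"])
usage.record(["knn"], [])
```

After these two calls `usage.total` is 2 and `usage.queries["match"]` is 1, even though `match` appeared twice in the first request.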
A big part of the change is actually the plumbing to make a common service class that holds the counters available to all the different callers of the parsing methods, especially plugins. Ideally, there would be a separate component that exposes the search parsing functionality rather than static methods, but changing that would require making the additional component available to the REST layer which is not trivial. I reused the existing UsageService which the RestController already holds, and is already used to count access to the different REST endpoints.
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.
Closes #90102
So that they are visible in NodeIndicesStats only at the node and index (but not shard) levels. Also visible in the _cat/nodes table. And make an exact count yaml REST test.
Add the dry_run query parameter to support simulating an update of desired nodes. The update request will be validated, but no cluster state updates will be performed. To indicate that the response was the result of a dry run, we add the dry_run field to the JSON representation of the response.
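As a sketch of how a client might opt in, the helper below builds the request path with the new parameter. The function name is hypothetical, and the `/_internal/desired_nodes/{history_id}/{version}` layout is an assumption inferred from the request examples in the desired nodes commits.

```python
from urllib.parse import urlencode


def desired_nodes_update_path(history_id, version, dry_run=False):
    """Build the path for an update request (hypothetical helper)."""
    path = f"/_internal/desired_nodes/{history_id}/{version}"
    if dry_run:
        # Ask the server to validate the update without applying it.
        path += "?" + urlencode({"dry_run": "true"})
    return path
```

With `dry_run=True` the request is only validated, and the response is expected to carry the dry_run field described above.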
See #82975
Adds measures of the total size of all mappings and the total number of
fields in the cluster (both before and after deduplication).
Relates #86639
Relates #77466
Add cluster mapping stats for indexed dense_vectors
Currently _cluster/stats mapping section displays all mapping types
along with their count. In 8.0 we introduced indexed dense_vector
types, and we would like to collect more enhanced stats on them:
- number of indexed dense_vector fields
- sum of dims across all indexed dense_vector fields
This makes it possible to differentiate how indexed dense_vector types are
used as opposed to unindexed dense_vector types.
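The two stats can be sketched as a walk over a mapping's `properties`. This is an illustration only, assuming the mapping is available as nested dicts; the function name is hypothetical and this is not the code used by `_cluster/stats`.

```python
def indexed_dense_vector_stats(properties):
    """Return (field_count, dims_sum) for indexed dense_vector fields."""
    count, dims = 0, 0
    for field in properties.values():
        if field.get("type") == "dense_vector" and field.get("index") is True:
            count += 1
            dims += field.get("dims", 0)
        # Recurse into object fields, which nest their own "properties".
        sub_count, sub_dims = indexed_dense_vector_stats(field.get("properties", {}))
        count += sub_count
        dims += sub_dims
    return count, dims


mapping = {
    "title": {"type": "text"},
    "embedding": {"type": "dense_vector", "dims": 128, "index": True},
    "raw": {"type": "dense_vector", "dims": 64, "index": False},
    "doc": {"properties": {"vec": {"type": "dense_vector", "dims": 32, "index": True}}},
}
```

Only `embedding` and `doc.vec` are indexed here, so the unindexed `raw` field does not contribute to either number.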
This commit adds support for CPU ranges in the desired nodes API.
This aligns better with environments where administrators/orchestrators
can define lower and upper bounds for the amount of CPUs that the
desired node would get once deployed.
This makes it possible to provide information about the expected CPU and any
allowed overcommit that the desired node will run on.
This was the previous expected body for the desired nodes API (we still support it):
```
PUT /_internal/desired_nodes/history/1
{
  "nodes" : [
    {
      "settings" : {
        "node.name" : "instance-000187",
        "node.external_id": "instance-000187",
        "node.roles" : ["data_hot", "master"],
        "node.attr.data" : "hot",
        "node.attr.logical_availability_zone" : "zone-0"
      },
      "processors" : 8,
      "memory" : "58gb",
      "storage" : "1700gb",
      "node_version" : "8.3.0"
    }
  ]
}
```
Now it's possible to define `processors` or `processors_range` as in:
```
PUT /_internal/desired_nodes/history/1
{
  "nodes" : [
    {
      "settings" : {
        "node.name" : "instance-000187",
        "node.external_id": "instance-000187",
        "node.roles" : ["data_hot", "master"],
        "node.attr.data" : "hot",
        "node.attr.logical_availability_zone" : "zone-0"
      },
      "processors_range" : {"min": 8.0, "max": 16.0},
      "memory" : "58gb",
      "storage" : "1700gb",
      "node_version" : "8.3.0"
    }
  ]
}
```
Note that `max` in `processors_range` is optional.
This commit also moves from representing CPUs as integers to
accepting floating-point numbers.
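A rough sketch of how a client could normalise the two forms before sending them. The function name and the validation rules (exactly one of the two fields, positive `min`, `max` not below `min`) are assumptions for illustration, not the server-side validation in this commit.

```python
def resolve_processors(node):
    """Return (min, max) as floats; max is None when it is omitted."""
    has_fixed = "processors" in node
    has_range = "processors_range" in node
    if has_fixed == has_range:
        raise ValueError("specify exactly one of 'processors' or 'processors_range'")
    if has_fixed:
        # Assumption: a fixed value is treated as a degenerate range.
        value = float(node["processors"])
        return value, value
    bounds = node["processors_range"]
    low = float(bounds["min"])
    high = float(bounds["max"]) if "max" in bounds else None  # max is optional
    if low <= 0 or (high is not None and high < low):
        raise ValueError("invalid processors_range")
    return low, high
```

For example, `{"processors_range": {"min": 1.5}}` resolves to `(1.5, None)` because `max` is optional.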
Note: I disabled the bwc yamlRestTests for versions < 8.3 since we introduced
a few "breaking changes" but since this is an internal API it should be fine.
Today the add/clear voting config exclusions APIs route a request to the
master node but do not expose the usual `?master_timeout` parameter
that lets callers change the timeout for this phase of execution. This commit
adds the missing parameter.
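For illustration, a client-side helper that adds the parameter when building the request path might look like the sketch below. The helper name is hypothetical, and the `60s` value is only an example.

```python
from urllib.parse import urlencode


def add_voting_exclusions_path(node_names, master_timeout=None):
    """Build the path for adding voting config exclusions (hypothetical helper)."""
    params = {"node_names": ",".join(node_names)}
    if master_timeout is not None:
        # Only sent when the caller wants to override the default.
        params["master_timeout"] = master_timeout
    # Keep commas unescaped so the node list stays readable.
    return "/_cluster/voting_config_exclusions?" + urlencode(params, safe=",")
```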
Remove usage of deprecated elasticsearch.rest-test in DocsTestPlugin.
We keep some files in src/test in docs projects, as moving them would require more changes
in the build-docs project outside this repository.
The default distribution is the only remaining build flavor, and has been for
quite a while now. This commit removes flavor from the internal Build
class. It keeps rest api compat for nodes info for now by hardcoding
`default`.
The no-jdk distributions exist in 7.x and before. They were removed with
8.0. This commit removes the remaining deprecation messages for using
the no-jdk distribution. Note that when talking with an older node, we
drop the bundledJdk attribute. This is ok because it is only possible
for this to not be true when talking with a 7.17 node, during an upgrade,
and the usingBundledJdk is retained, which is the important thing if
debugging a problem.
relates #76896
relates #85758
This commit adds an explanation for the relation between `allow_partial_search_results` and `skip_unavailable` in CCS requests.
Relates to #33915
Closes #82407
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
The cluster allocation explain API includes a top-level status
indicating to the user whether the shard can be assigned/rebalanced/etc
or not. Today this status is fairly terse and experience shows that
users sometimes struggle to understand how to interpret it and to decide
on follow-up actions.
This commit makes the top-level explanation more detailed and
actionable. For instance, in the cases like `THROTTLED` where the status
is transient we instruct the user to wait; if a shard is lost we say to
restore it from a snapshot; if a shard cannot be assigned we say to
choose a specific node where its assignment is expected and to address
the obstacles.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
This commit adds the Desired Nodes API, allowing orchestrators
that manage Elasticsearch clusters to let the system know about the
current/planned topology that the cluster will run on.
This allows the system to make better decisions based on the entire
cluster topology, including nodes that will be added/removed in the
near future.
The `GET _cluster/state` API is really only suitable for debugging or
diagnostics. Its response format is not documented since it changes
fairly freely between versions.
Today we mention in its docs that this API is unstable, and deliberately
omit a description of its response format, but we don't explicitly say
that it's only for diagnostics and is unsuitable for consumption by
external tools that might try and use it for monitoring.
This commit adjusts the docs to give some more explicit guidance about
how it should and shouldn't be used.
`GET _nodes/stats` returns statistics about indexing pressure for each node.
With this commit `GET _cluster/stats` now returns stats about indexing pressure
computed by aggregating the indexing pressure stats of each node in the
cluster.
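A minimal sketch of the aggregation idea, assuming every numeric leaf is simply summed across nodes. The real implementation may treat some fields differently, and the field names in the sample data are illustrative.

```python
def aggregate(node_stats):
    """Sum the numeric leaves of each node's stats into one nested dict."""
    total = {}
    for stats in node_stats:
        _add_into(total, stats)
    return total


def _add_into(total, stats):
    for key, value in stats.items():
        if isinstance(value, dict):
            # Descend into nested sections, creating them on first use.
            _add_into(total.setdefault(key, {}), value)
        else:
            total[key] = total.get(key, 0) + value


per_node = [
    {"memory": {"current": {"all_in_bytes": 10}, "total": {"coordinating_rejections": 1}}},
    {"memory": {"current": {"all_in_bytes": 5}, "total": {"coordinating_rejections": 0}}},
]
cluster_view = aggregate(per_node)
```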
Closes #79788
Adds to the transport node stats a record of the distribution of the
times for which a transport thread was handling a message, represented
as a histogram.
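The general shape of such a histogram can be sketched as follows. The class name and the bucket boundaries are hypothetical and are not the ones used by the transport stats.

```python
import bisect


class HandlingTimeHistogram:
    """Fixed-boundary histogram of handling times in milliseconds."""

    def __init__(self, boundaries_millis):
        self.boundaries = sorted(boundaries_millis)
        # One bucket below the first boundary, one between each pair of
        # boundaries, and a final open-ended bucket.
        self.counts = [0] * (len(self.boundaries) + 1)

    def record(self, millis):
        # bisect_right places a value equal to a boundary in the bucket
        # that starts at that boundary.
        self.counts[bisect.bisect_right(self.boundaries, millis)] += 1


histogram = HandlingTimeHistogram([1, 2, 4])
for sample in (0.5, 1, 3, 10):
    histogram.record(sample)
```

With boundaries `[1, 2, 4]` the buckets are `[0, 1)`, `[1, 2)`, `[2, 4)` and `[4, ∞)`, so the four samples above land in one bucket each.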
Closes #80428
* Revert "Return 200 OK response code for a cluster health timeout (#78968)"
This reverts commit a2c3daea
* Revert "Allow deprecation warning for the return_200_for_cluster_health_timeout parameter (#80178)"
This reverts commit 1c711e35fc.
* Revert "Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)"
This reverts commit b9fbe66ab0.
* Revert "Adjust the BWC version for the return200ForClusterHealthTimeout field (#79436)"
This reverts commit f60bda5685.
* Revert "Use query param instead of a system property for opting in for new cluster health response code (#79351)"
This reverts commit 8901a999
* Revert "Deprecate returning 408 for a server timeout on `_cluster/health` (#78180)"
This reverts commit f266eb32
* Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)
This reverts commit fa4d562c
* Revert "Disable BWC for #80821 (#80839)"
This reverts commit cb0e73e2fc.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
#80556 reverted the deprecation of transient cluster settings. This replaces deprecation language in the docs with a warning/recommendation to avoid transient settings.
Closes #80557
# Conflicts:
# docs/reference/migration/migrate_7_16.asciidoc
Changes:
* Adds a transient settings migration guide to the 7.16 docs.
* Updates the related deprecation docs to link to the guide.
Closes #80055
Relates to #79167.
The original change was implemented in #78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt in to the new status code.
Relates #70849
Since #65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.
This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.
The system property is `es.total_memory_bytes`, so, for example,
could be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
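Since the value must be given in plain bytes, a small helper can do the conversion. This is only a convenience sketch; the function name is hypothetical and it assumes binary gigabytes (1 GB = 1024^3 bytes), which matches the example value above.

```python
def total_memory_flag(gigabytes):
    """Format the JVM system property for a whole number of GB."""
    return f"-Des.total_memory_bytes={gigabytes * 1024 ** 3}"


flag = total_memory_flag(3)  # "-Des.total_memory_bytes=3221225472"
```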