Today the `current_node` parameter is given in several sample requests
illustrating how to explain an unassigned shard using the cluster
allocation explain API. This doesn't make sense: an unassigned shard has
no `current_node`. This commit removes the misleading parameter in these
cases.
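
For illustration, a request body that explains an unassigned shard only needs to identify the shard. The Python sketch below builds such a body for the cluster allocation explain API (`GET _cluster/allocation/explain`); the helper name `explain_unassigned_body` and the index name are hypothetical and not part of this commit.

```python
import json


def explain_unassigned_body(index: str, shard: int, primary: bool) -> dict:
    """Build an allocation-explain request body for an unassigned shard."""
    # An unassigned shard is not held by any node, so `current_node` is omitted.
    return {"index": index, "shard": shard, "primary": primary}


# Hypothetical index name, used only for the example.
body = explain_unassigned_body("my-index-000001", 0, True)
print(json.dumps(body))
```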
Today we document that tasks may not react to cancellations immediately,
but in practice it's surprising to users and kind of a bug if they run
for too long after being cancelled. This commit adds a little extra
detail about the information to collect to troubleshoot such a
situation.
* Add repo throttle metrics to node stats api response
* Update docs/changelog/96678.yaml
* Change x-content output structure
* Fix test after merge from main
* Follow PR comments
* minor fixes
* minor fixes 2
* Introduce new TransportVersion (V_8_500_010)
* Fix yaml test
* Follow PR comments
* Make stats datapoints human readable
* Follow common pattern for human readable output
* Bump up TransportVersion
Add a new target (`script`) to the `/_info` API. It consolidates all the script information from the cluster nodes and returns a summary at the cluster level (compared with `_nodes/stats/script` it lacks the `<node>` dimension).
Add a new target (`thread_pool`) to the `/_info` API. It consolidates all the thread pool information from the cluster nodes and returns a summary at the cluster level (compared with `_nodes/stats/thread_pool` it lacks the `<node>` dimension).
Adding a new endpoint under `_info/http`. This endpoint summarises the HTTP info of all the nodes into one big response, at cluster level. Compared with `_nodes/stats`, it lacks the nodes dimension.
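
The idea of dropping the node dimension can be sketched as a simple fold over per-node stats. This is only an illustration in Python, not the actual implementation: the function name `summarise_http` is hypothetical, the counter names `current_open` and `total_opened` are assumed for the example, and the real endpoint may combine some fields differently than a plain sum.

```python
from collections import Counter


def summarise_http(per_node: dict) -> dict:
    """Sum numeric HTTP counters over all nodes, dropping the node dimension."""
    total = Counter()
    for node_stats in per_node.values():
        # Counter.update adds the values of matching keys together.
        total.update(node_stats)
    return dict(total)


# Hypothetical per-node input, shaped like a trimmed node stats response.
per_node = {
    "node-a": {"current_open": 3, "total_opened": 10},
    "node-b": {"current_open": 1, "total_opened": 7},
}
cluster = summarise_http(per_node)
print(cluster)
```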
This change adds:
* Total global ordinal build time for all fields and per field.
* Max shard value count per field. This is the value count of the shard with the highest count; tracking the value count at index level or across indices would be too expensive.
This is added to common stats, which is exposed in several stats APIs.
The following API call:
```
GET /_nodes/stats?filter_path=nodes.*.indices.fielddata&fields=key,key2
```
Returns:
```
{
  "nodes": {
    "pcMNy4GsQ8ef6Rw-bI2EFg": {
      "indices": {
        "fielddata": {
          "memory_size_in_bytes": 2552,
          "evictions": 0,
          "fields": {
            "key2": {
              "memory_size_in_bytes": 1320
            },
            "key": {
              "memory_size_in_bytes": 1232
            }
          },
          "global_ordinals": {
            "build_time_in_millis": 8,
            "fields": {
              "key2": {
                "build_time_in_millis": 4,
                "shard_max_value_count": 4
              },
              "key": {
                "build_time_in_millis": 4,
                "shard_max_value_count": 4
              }
            }
          }
        }
      }
    }
  }
}
```
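
A consumer of this response could flatten the new `global_ordinals` section into one row per node and field. The Python sketch below works on a trimmed copy of the sample response above; the helper name `global_ordinals_rows` is hypothetical.

```python
# Trimmed copy of the sample response, keeping only the new section.
response = {
    "nodes": {
        "pcMNy4GsQ8ef6Rw-bI2EFg": {
            "indices": {
                "fielddata": {
                    "global_ordinals": {
                        "build_time_in_millis": 8,
                        "fields": {
                            "key2": {"build_time_in_millis": 4, "shard_max_value_count": 4},
                            "key": {"build_time_in_millis": 4, "shard_max_value_count": 4},
                        },
                    }
                }
            }
        }
    }
}


def global_ordinals_rows(resp: dict) -> list:
    """Return (node_id, field, build_time_ms, shard_max_value_count) tuples."""
    rows = []
    for node_id, node in resp["nodes"].items():
        ordinals = node["indices"]["fielddata"]["global_ordinals"]
        for field, stats in ordinals.get("fields", {}).items():
            rows.append(
                (node_id, field, stats["build_time_in_millis"], stats["shard_max_value_count"])
            )
    return sorted(rows)


rows = global_ordinals_rows(response)
for row in rows:
    print(row)
```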
This introduces an endpoint to reset the desired balance.
It can be used when the computed balance has diverged significantly from the
actual one, to start a new computation from the current state.
Today we report node stats by name, but the desired nodes work in terms
of node IDs. This commit adds a mapping between node name and ID to make
the output easier to interpret.
This prevents docs files from *starting* with a "response" because when
that happens the response is converted to an assertion and appended
to the last snippet that was processed. If that last snippet was in a
different file then it's very hard to reason about the tests. That goes
double because the order we iterate files isn't defined....
Anyway! This adds a guard in the build, removes the offending
"response", and reenables the tests that we'd thought were failing here.
Closes #91081
This PR extends the basic Prevalidation API so that in case there are
red non-searchable-snapshot indices in the cluster, we reach out to
the nodes (whose removal is being prevalidated) to find out if they
have a local copy of any red indices.
Closes #87776
This PR adds the first part of the Prevalidate Node Removal API. This
API allows checking whether attempting to remove some node(s) from the
cluster is likely to succeed or not. This check is useful when a node
needs to be removed from a RED cluster, without risking losing the last
copy of some RED shards.
In this PR, we only check whether a RED index is a Searchable Snapshot
index or not, in which case the removal of any node is safe as the RED
index is backed by a snapshot.
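
The check in this first part can be sketched as follows. This is a simplified Python model, not the actual implementation: the function name `prevalidate_removal` and the input shape (a mapping from RED index name to whether it is a Searchable Snapshot index) are assumptions made for the example.

```python
def prevalidate_removal(red_indices: dict) -> dict:
    """Decide whether node removal is known to be safe.

    `red_indices` maps each RED index name to True if it is a
    Searchable Snapshot index (hypothetical input shape).
    """
    # RED indices not backed by a snapshot cannot be declared safe by this check.
    unverified = sorted(name for name, is_snapshot in red_indices.items() if not is_snapshot)
    return {"safe": not unverified, "unverified_red_indices": unverified}


# Hypothetical index names.
result_ok = prevalidate_removal({"restored-logs": True})
result_unknown = prevalidate_removal({"restored-logs": True, "orders": False})
print(result_ok)
print(result_unknown)
```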
Relates #87776
This is the continuation of #90176 which leverages #90425 to count query types. This PR adds search usage stats to the existing telemetry by counting sections being used as part of a search request, as well as query types. Each distinct query type is counted once per search request.
The counting is performed while parsing, for the following REST search endpoints:
- _search
- _msearch
- _async_search
- _search/template
- _msearch/template
- _fleet/_fleet_search
- _fleet/_fleet_msearch
All other APIs that use search internally, like reindex, ML transform, rank eval, sql etc., are not counted as part of these search usage stats. Such additional functionality should have its own dedicated telemetry if needed.
The counting of the search sections is not exhaustive: only the sections that are interesting to collect counts for are tracked.
The following is the new section added to the cluster stats API response, including some sample stats:
```
"search" : {
  "total" : 63,
  "sections" : {
    "knn" : 42,
    "query" : 21,
    "aggs" : 46
  },
  "query" : {
    "match" : 58
  }
}
```
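
The once-per-request counting described above can be sketched in Python as follows. The class `SearchUsage` and its method names are hypothetical and only model the counting rule; they do not mirror the actual classes in this change, and deduplicating sections per request is an assumption of the sketch.

```python
from collections import Counter


class SearchUsage:
    """Toy model of the search usage counters."""

    def __init__(self):
        self.total = 0
        self.sections = Counter()
        self.queries = Counter()

    def record(self, sections, query_types):
        """Record one parsed search request."""
        self.total += 1
        # Converting to a set ensures each distinct name counts once per request.
        self.sections.update(set(sections))
        self.queries.update(set(query_types))

    def to_dict(self):
        return {
            "total": self.total,
            "sections": dict(self.sections),
            "query": dict(self.queries),
        }


usage = SearchUsage()
# A request whose query tree contains two `match` queries and one `term` query.
usage.record(["query", "aggs"], ["match", "match", "term"])
# A kNN-only request with no query section.
usage.record(["knn"], [])
stats = usage.to_dict()
print(stats)
```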
A big part of the change is actually the plumbing to make a common service class that holds the counters available to all the different callers of the parsing methods, especially plugins. Ideally, there would be a separate component that exposes the search parsing functionality rather than static methods, but changing that would require making the additional component available to the REST layer which is not trivial. I reused the existing UsageService which the RestController already holds, and is already used to count access to the different REST endpoints.
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.
Closes #90102
So that they are visible in NodeIndicesStats only at the node and index (but not shard) levels. Also visible in the _cat/nodes table. And make an exact count yaml REST test.
Add the `dry_run` query parameter to support simulating an update of desired nodes. The update request will be validated, but no cluster state updates will be performed. To indicate that the response was the result of a dry run, we add the `dry_run` field to the JSON representation of the response.
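
The dry-run behaviour can be modelled with a small Python sketch: validate first, then skip the state change when `dry_run` is set. Everything here is a simplified assumption for illustration (the function `update_desired_nodes`, the dictionary-based state, the validation rule, and returning the `dry_run` field on every response); it is not the actual transport action.

```python
def update_desired_nodes(state: dict, request: dict, dry_run: bool = False) -> dict:
    """Validate an update and apply it unless this is a dry run."""
    nodes = request.get("nodes")
    # Validation always runs, even for a dry run (hypothetical rule).
    if not nodes:
        raise ValueError("request must list at least one desired node")
    if not dry_run:
        # Only a real update touches the (toy) cluster state.
        state["desired_nodes"] = list(nodes)
    return {"dry_run": dry_run}


state = {"desired_nodes": []}
request = {"nodes": [{"name": "node-1"}]}  # hypothetical node description

dry_response = update_desired_nodes(state, request, dry_run=True)
state_after_dry_run = list(state["desired_nodes"])

real_response = update_desired_nodes(state, request)
print(dry_response, real_response, state)
```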
See #82975
Adds measures of the total size of all mappings and the total number of
fields in the cluster (both before and after deduplication).
Relates #86639
Relates #77466