Today the add/clear voting config exclusions APIs route a request to the
master node but do not expose the usual `?master_timeout` parameter
allowing to change the timeout for this phase of execution. This commit
adds the missing parameter.
Remove usage of deprecated elasticsearch.rest-test in DocsTestPlugin
we keep some files in src/test in docs projects as moving them would require more changes
in build-docs project outside this repository
The default distribution is the only remaining build flavor, and has been for
quite a while now. This commit removes flavor from the internal Build
class. It keeps rest api compat for nodes info for now by hardcoding
`default`.
The no-jdk distributions exist in 7.x and before. They were removed with
8.0. This commit removes the remaining deprecation messages for using
the no-jdk distribution. Note that when talking with an older node, we
drop the bundledJdk attribute. This is ok because it is only possible
for this to not be true when talking with a 7.17 node, during an upgrade,
and the usingBundledJdk is retained, which is the important thing if
debugging a problem.
relates #76896
relates #85758
This commit adds an explanation for the relation between `allow_partial_search_results` and `skip_unavailable` in CCS requests.
Relates to #33915Closes#82407
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
The cluster allocation explain API includes a top-level status
indicating to the user whether the shard can be assigned/rebalanced/etc
or not. Today this status is fairly terse and experience shows that
users sometimes struggle to understand how to interpret it and to decide
on follow-up actions.
This commit makes the top-level explanation more detailed and
actionable. For instance, in the cases like `THROTTLED` where the status
is transient we instruct the user to wait; if a shard is lost we say to
restore it from a snapshot; if a shard cannot be assigned we say to
choose a specific node where its assignment is expected and to address
the obstacles.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
This commit adds the Desired Nodes API, allowing orchestrators
that manage Elasticsearch clusters to let the system know about the
current/planned topology that the cluster will run on.
This allows the system to take better decisions based on the entire
cluster topology, including nodes that will be added/removed in the
near future.
The `GET _cluster/state` API is really only suitable for debugging or
diagnostics. Its response format is not documented since it changes
fairly freely between versions.
Today we mention in its docs that this API is unstable, and deliberately
omit a description of its response format, but we don't explicitly say
that it's only for diagnostics and is unsuitable for consumption by
external tools that might try and use it for monitoring.
This commit adjusts the docs to give some more explicit guidance about
how it should and shouldn't be used.
`GET _nodes/stats` returns statistics about indexing pressure for each node.
With this commit `GET _cluster/stats` now returns stats about indexing pressure
computed by aggregating the indexing pressure stats of each node in the
cluster.
Closes#79788
Adds to the transport node stats a record of the distribution of the
times for which a transport thread was handling a message, represented
as a histogram.
Closes#80428
* Revert "Return 200 OK response code for a cluster health timeout (#78968)"
This reverts commit a2c3daea
* Revert "Allow deprecation warning for the return_200_for_cluster_health_timeout parameter (#80178)"
This reverts commit 1c711e35fc.
* Revert "Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)"
This reverts commit b9fbe66ab0.
* Revert "Adjust the BWC version for the return200ForClusterHealthTimeout field (#79436)"
This reverts commit f60bda5685.
* Revert "Use query param instead of a system property for opting in for new cluster health response code (#79351)"
This reverts commit 8901a999
* Revert "Deprecate returning 408 for a server timeout on `_cluster/health` (#78180)"
This reverts commit f266eb32
* Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)
This reverts commit fa4d562c
* Revert "Disable BWC for #80821 (#80839)"
This reverts commit cb0e73e2fc.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
#80556 reverted the deprecation of transient cluster settings. This replaces deprecation language in the docs with a warning/recommendation to avoid transient settings.
Closes#80557
# Conflicts:
# docs/reference/migration/migrate_7_16.asciidoc
Changes:
* Adds a transient settings migration guide to the 7.16 docs.
* Updates the related deprecation docs to link to the guide.
Closes#80055
Relates to #79167.
The original change was implemented in #78940, bu we have decided to move from a system property to an a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code.
Relates #70849
Since #65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.
This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.
The system property is `es.total_memory_bytes`, so, for example,
could be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
This PR changes uses of transient cluster settings to
persistent cluster settings.
The PR also deprecates the transient settings usage.
Relates to #49540
The docs for `GET _nodes/<node>/<metric>` omitted a couple of metrics
and indicated that this API returned dynamic stats rather than static
info. They also didn't mention that `_all` is a legal value, nor
did it give a way to suppress all metrics even though this is possible.
This commit adjusts the docs and adds tests to ensure that selecting
metrics works as expected and to ensure that there is a future-proof
legal way to suppress all metrics.
Closes#79187
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
If the _nodes/stats API received a level=shards request parameter, then the response would have two "shards" fields,
which would cause problems with json parsers. This commit renames the "shards" field that currently only contains
"total_count" to "shard_stats".
Relates #78311#75433
This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.
Relates #76625
To return the JVM `uptime` metric, the `human` query parameter must be `true`.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Today we often encounter users that are confused by the behaviour of
calling `GET _cluster/allocation/explain` without a body: it _seems_ to
work, but it explains a random shard, and if this isn't the shard
they're thinking of then it's unclear how to proceed.
With this commit we add a note to the response when a shard was randomly
chosen indicating that it is possible, and possibly useful, to explain a
different shard. We also adjust the exception message in the case when
all shards are assigned to indicate why it's an invalid request and what
to do to make it valid.
* Adding shard count to _nodes/stats api
Added a shards section to each node returned by the _nodes/stats api. Currently this new section only contains a total count of all shards on the node.
This commit adds a `cancelled` flag to each cancellable task in the
response to the list tasks API, allowing users to see that a task has
been properly cancelled and will complete as soon as possible.
Closes#72907
Changes:
* Renames 'full copy searchable snapshot' to 'fully mounted index.'
* Renames 'shared cache searchable snapshot' to 'partially mounted index.'
* Removes some unneeded cache setup instructions for the frozen tier. We added a default cache size with #71844.
Today the only example of calling the cluster allocation explain API above the
fold is the bare `GET /_cluster/allocation/explain` which kind of works but is
not usually what the user wants. This commit changes the docs so that we open
with an example showing how we usually expect it to be called. This will make
it clearer that you should normally specify exactly for which shard you want an
explanation. It also tidies up a few other wrinkles in these docs.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
We have recently introduced the ability to associate an indexed field with a script. This commit updates the existing mappings stats to output stats about the script, similar to what we already do for runtime fields.
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, allowing to
differ between the two.
Relates #69820
Runtime fields usage is currently reported as part of the xpack feature usage API. Now that runtime fields are part of server, their corresponding stats can be moved to be part of the ordinary mapping stats exposed by the cluster stats API.
Adds support for the include_unloaded_segments flag in node stats, which helps with understanding resource usage of
shared_cache-style searchable snapshots on a per-node basis.
This change adds a new "architectures" section to the
cluster stats, containing a summary of how many nodes
in the cluster are on each processor architecture.
The intention is to make it easier to see whether
clusters are running on aarch64, or mixed x86_64/aarch64,
which may aid support as aarch64 becomes more commonly
used.
Today's network config docs are split into "Network", "HTTP" and
"Transport" pages, with unclear relationships between them. We often
encounter users with weird configs that indicate they don't really
understand how these settings all relate. In fact these pages are all
very interrelated, and the HTTP and Transport pages are almost all only
for advanced users. This commit brings these docs into a single page and
rewords some things to try and guide users away from the advanced
settings unless their configuration needs all the extra complexity.
It also adds a section entitled "Binding and publishing" which clarifies
the meanings of the `bind_host` and `publish_host` parameters. This is
also a common source of confusion amongst users.
It also clarifies that many of these settings accept a list of
addresses, and warns that this may not be what you want. Closes#67956.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This commit adds statistics about the index creation versions to the `/_cluster/stats` endpoint. The
stats look like:
```
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"indices" : {
"count" : 3,
...
"versions" : [
{
"version" : "8.0.0",
"index_count" : 1,
"primary_shard_count" : 2,
"total_primary_size" : "8.6kb",
"total_primary_bytes" : 8831
},
{
"version" : "7.11.0",
"index_count" : 1,
"primary_shard_count" : 1,
"total_primary_size" : "4.6kb",
"total_primary_bytes" : 4230
}
]
},
...
}
```
(`total_primary_size` is only shown with the `?human` flag)
This is useful for telemetry as it allows us to see if/when a cluster has indices created on a
previous version that would need to be either upgraded or supported during an upgrade.