* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to the fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to the few files that use it, and remove it from index.asciidoc
Add a new `total_time_excluding_waiting_on_lock` metric to the index flush stats that measures the flushing time excluding the time spent waiting on the flush lock. This metric provides a more granular view of flush performance, without the overhead of flush throttling.
Resolves ES-7201
Moving https://github.com/elastic/elasticsearch/pull/103472 here.
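For illustration, a minimal sketch of reading the new value from the flush section of the index stats; the host, the index name, and the exact JSON key (an `_in_millis` suffix is assumed here, following the convention of the other flush time fields) are assumptions:

```python
# Hypothetical sketch: read the flush section of the index stats and print the
# new metric. Host, index name, and the exact JSON key are assumptions.
import requests

resp = requests.get("http://localhost:9200/my-index/_stats/flush")
flush = resp.json()["_all"]["primaries"]["flush"]

# The new field sits alongside the existing flush totals; the "_in_millis"
# suffix is assumed, mirroring the existing time fields.
print(flush.get("total_time_excluding_waiting_on_lock_in_millis"))
print(flush["total_time_in_millis"])  # existing metric, for comparison
```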
---
👋 howdy, team!
Could we include "XFS quotas" as an example under "depending on OS or process level restrictions" in this doc? That would make the page easier to find and help users better understand how to investigate this potential lever's impact.
TIA!
This change adds the total dense vector count to the output of the indices stats.
This is useful for observability in order to track the number of indexed vectors
in a cluster.
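As a rough sketch (the host, index name, and the exact section/field names are assumptions), the count could be read from the indices stats like this:

```python
# Hypothetical sketch: pull the dense vector count from the indices stats.
# Host, index name, and the exact section/field names are assumptions.
import requests

stats = requests.get("http://localhost:9200/my-index/_stats").json()
dense_vector = stats["_all"]["total"].get("dense_vector", {})
print("indexed vectors:", dense_vector.get("value_count"))  # field name assumed
```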
---------
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
* Add repo throttle metrics to node stats api response
* Update docs/changelog/96678.yaml
* Change x-content output structure
* Fix test after merge from main
* Follow PR comments
* minor fixes
* minor fixes 2
* Introduce new TransportVersion (V_8_500_010)
* Fix yaml test
* Follow PR comments
* Make stats datapoints human readable
* Follow common pattern for human readable output
* Bump up TransportVersion
Adding a new endpoint under `_info/http`. This endpoint summarises the HTTP info of all the nodes into one big response at the cluster level. Compared with `_nodes/stats`, it lacks the per-node dimension.
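A minimal sketch of calling the new endpoint; the host is an assumption, and the response shape described in the comment is assumed rather than taken from this change:

```python
# Hypothetical sketch: fetch the cluster-level HTTP summary. The host is an
# assumption; unlike _nodes/stats, the response is not broken down per node.
import requests

info = requests.get("http://localhost:9200/_info/http").json()
print(info)  # expected to contain a single aggregated HTTP section (shape assumed)
```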
This prevents docs files from *starting* with a "response" because when
that happens the response is converted to an assertion and appended
to the last snippet that was processed. If that last snippet was in a
different file then it's very hard to reason about the tests. That goes
double because the order we iterate files isn't defined....
Anyway! This adds a guard in the build, removes the offending
"response", and reenables the tests that we'd thought we failing here.
Closes #91081
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.
Closes #90102
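A hedged sketch of reading the new value; the host, the index name, and the exact location of `write_load` in the shard stats body are assumptions:

```python
# Hypothetical sketch: read the new write_load value from the shard stats.
# Host, index name, and the exact response path are assumptions.
import requests

stats = requests.get(
    "http://localhost:9200/my-index/_stats", params={"level": "shards"}
).json()

for shard_copies in stats["indices"]["my-index"]["shards"].values():
    for shard in shard_copies:
        indexing = shard.get("indexing", {})
        # average number of write threads used while indexing (location assumed)
        print(indexing.get("write_load"))
```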
So that they are visible in NodeIndicesStats only at the node and index (but not shard) levels. They are also visible in the `_cat/nodes` table. Also adds an exact-count YAML REST test.
Adds to the transport node stats a record of the distribution of the
times for which a transport thread was handling a message, represented
as a histogram.
Closes #80428
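A minimal sketch of fetching the transport section that carries the new histogram; the host and the histogram field name are assumptions:

```python
# Hypothetical sketch: fetch the transport section of the node stats, which now
# carries a histogram of message handling times. Host and the exact histogram
# field name are assumptions.
import requests

nodes = requests.get("http://localhost:9200/_nodes/stats/transport").json()["nodes"]
for node_id, node in nodes.items():
    transport = node["transport"]
    # the histogram is assumed to sit alongside the existing transport counters
    print(node_id, transport.get("inbound_handling_time_histogram"))
```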
Since #65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.
This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.
The system property is `es.total_memory_bytes`, so it could, for example,
be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
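For reference, the value in the example above is simply 3GB, interpreted as 3 * 1024^3 bytes:

```python
# The example value above is 3GB interpreted as 3 * 1024**3 bytes.
assert 3 * 1024**3 == 3221225472
print(3 * 1024**3)  # 3221225472
```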
If the _nodes/stats API received a level=shards request parameter, then the response would have two "shards" fields,
which would cause problems with json parsers. This commit renames the "shards" field that currently only contains
"total_count" to "shard_stats".
Relates #78311, #75433
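A hedged sketch of reading the renamed field; the host is an assumption, and so is the exact location of the renamed object within the per-node body:

```python
# Hypothetical sketch: with level=shards, the per-node response now carries both
# the renamed "shard_stats" object (total_count) and the per-shard "shards"
# object. Host is an assumption.
import requests

nodes = requests.get(
    "http://localhost:9200/_nodes/stats/indices", params={"level": "shards"}
).json()["nodes"]

for node_id, node in nodes.items():
    # the exact location of the renamed object in the per-node body is assumed
    shard_stats = node.get("indices", {}).get("shard_stats") or node.get("shard_stats")
    print(node_id, (shard_stats or {}).get("total_count"))
```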
This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.
Relates #76625
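A minimal sketch of where these statistics surface; the host is an assumption, and the section name (`discovery`) plus the shape of its contents are assumed rather than taken from this change:

```python
# Hypothetical sketch: the publication timing stats are reported by the node
# stats of the elected master. Host and the section name are assumptions.
import requests

nodes = requests.get("http://localhost:9200/_nodes/stats/discovery").json()["nodes"]
for node_id, node in nodes.items():
    print(node_id, node.get("discovery"))
```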
To return the JVM `uptime` metric, the `human` query parameter must be `true`.
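For example (a sketch; the host is an assumption), requesting the JVM node stats with `human=true` makes the derived `uptime` string appear alongside `uptime_in_millis`:

```python
# Hypothetical sketch: request JVM node stats with human=true so the derived
# "uptime" string is included. Host is an assumption.
import requests

nodes = requests.get(
    "http://localhost:9200/_nodes/stats/jvm", params={"human": "true"}
).json()["nodes"]

for node_id, node in nodes.items():
    jvm = node["jvm"]
    print(node_id, jvm.get("uptime"), jvm.get("uptime_in_millis"))
```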
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Adding shard count to _nodes/stats api
Added a shards section to each node returned by the _nodes/stats api. Currently this new section only contains a total count of all shards on the node.
Changes:
* Renames 'full copy searchable snapshot' to 'fully mounted index.'
* Renames 'shared cache searchable snapshot' to 'partially mounted index.'
* Removes some unneeded cache setup instructions for the frozen tier. We added a default cache size with #71844.
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to the node and indices stats, making it
possible to distinguish between the two.
Relates #69820
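A minimal sketch comparing the two values in the store stats; the host, the index name, and the `_in_bytes` field names are assumptions:

```python
# Hypothetical sketch: compare the locally occupied store size with the total
# data set size (which, for partially mounted indices, includes data that only
# lives in the snapshot repository). Host and index name are assumptions.
import requests

store = requests.get("http://localhost:9200/my-index/_stats/store").json()
totals = store["_all"]["total"]["store"]
print("on disk: ", totals["size_in_bytes"])
print("data set:", totals.get("total_data_set_size_in_bytes"))  # field name assumed
```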
Adds support for the include_unloaded_segments flag in node stats, which helps with understanding resource usage of
shared_cache-style searchable snapshots on a per-node basis.
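A hedged sketch of passing the flag; the host is an assumption:

```python
# Hypothetical sketch: pass include_unloaded_segments so that segments of
# shared-cache searchable snapshot shards are counted even when not loaded.
# Host is an assumption.
import requests

nodes = requests.get(
    "http://localhost:9200/_nodes/stats/indices/segments",
    params={"include_unloaded_segments": "true"},
).json()["nodes"]

for node_id, node in nodes.items():
    print(node_id, node["indices"]["segments"]["count"])
```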
Today's network config docs are split into "Network", "HTTP" and
"Transport" pages, with unclear relationships between them. We often
encounter users with weird configs that indicate they don't really
understand how these settings all relate. In fact these pages are all
very interrelated, and the HTTP and Transport pages are almost all only
for advanced users. This commit brings these docs into a single page and
rewords some things to try and guide users away from the advanced
settings unless their configuration needs all the extra complexity.
It also adds a section entitled "Binding and publishing" which clarifies
the meanings of the `bind_host` and `publish_host` parameters. This is
also a common source of confusion amongst users.
It also clarifies that many of these settings accept a list of
addresses, and warns that this may not be what you want. Closes #67956.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.
This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
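A minimal sketch of watching that counter via the node stats; the host and the exact counter name are assumptions:

```python
# Hypothetical sketch: watch the cumulative count of outbound transport
# connections a node has opened; a steadily climbing number suggests an
# unstable network. Host and the exact counter name are assumptions.
import requests

nodes = requests.get("http://localhost:9200/_nodes/stats/transport").json()["nodes"]
for node_id, node in nodes.items():
    print(node_id, node["transport"].get("total_outbound_connections"))
```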
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.
This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest of the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
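For illustration, the same information can be derived from the per-path entries that the API already returns; the host is an assumption:

```python
# Hypothetical sketch: derive the least/most available data path per node from
# the per-path fs stats the API already returns, instead of relying on the
# removed *_usage_estimate fields. Host is an assumption.
import requests

nodes = requests.get("http://localhost:9200/_nodes/stats/fs").json()["nodes"]
for node_id, node in nodes.items():
    paths = node["fs"]["data"]
    least = min(paths, key=lambda p: p["available_in_bytes"])
    most = max(paths, key=lambda p: p["available_in_bytes"])
    print(node_id, "least:", least["path"], "most:", most["path"])
```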
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.
This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
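A minimal sketch of the accounting idea (all names are illustrative; this is not Elasticsearch code):

```python
# Illustrative sketch: instead of reserving the full estimated shard size, only
# the portion that has not yet been recovered is subtracted from the node's
# free space. All names here are hypothetical.
def usable_free_space(free_bytes: int, expected_shard_size: int, bytes_already_recovered: int) -> int:
    remaining_growth = max(expected_shard_size - bytes_already_recovered, 0)
    return free_bytes - remaining_growth

# e.g. a 50 GB shard that has already copied 45 GB only "costs" 5 GB more
print(usable_free_space(100_000_000_000, 50_000_000_000, 45_000_000_000))
```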