* Note in docs about incorrect IO stats when running in docker
* Update docs/reference/cluster/nodes-stats.asciidoc
Co-authored-by: David Turner <david.turner@elastic.co>
* Requested PR changes to wording
* Update docs/reference/cluster/nodes-stats.asciidoc
This change returns the total number of fields at the segment level,
allowing for a more accurate estimate of the memory used by Lucene. The
new estimate is expected to be closer to the actual memory usage than
the current estimate using the index-level field count, due to the
non-trivial overhead incurred by each Lucene segment. Two new fields are
introduced: total_segment_fields, which is the total number of fields at
the segment level, and average_fields_per_segment. The overhead per
field in segments with fewer fields is larger than in segments with many
fields.
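As a rough illustration of how the two new fields relate, the sketch below sums per-segment field counts and divides by the number of segments. The function name and the list-of-counts input are hypothetical, and computing the average as total divided by segment count is an assumption rather than something this change confirms.

```python
def segment_field_stats(fields_per_segment):
    """Return (total_segment_fields, average_fields_per_segment).

    fields_per_segment is a hypothetical list holding one field count
    per Lucene segment. Assumption: the average is simply the total
    divided by the number of segments.
    """
    total = sum(fields_per_segment)
    average = total / len(fields_per_segment) if fields_per_segment else 0.0
    return total, average


# Three segments of an index whose mapping has 50 fields overall.
# The segment-level total (90) exceeds the index-level count (50)
# because each segment carries its own per-field overhead.
total, average = segment_field_stats([50, 10, 30])
print(total, average)  # 90 30.0
```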
* Add SparseVectorStats
* Update to use mappings in engine
* Update to be unique to primary shards
* Fix doc
* Fix null error in test
* Cleanup
* fix yaml
* remove comment
* add version to yaml
* Revert whitespace changes to stats doc
* fix yml test
* Checkstyle
* Fix NPE in test
* Update docs/changelog/108793.yaml
* Add link to sparse_vector field type in docs
* PR feedback
* Flesh out test a bit more
* PR feedback - alphabetize placement in docs
* Fix doc change
Change the ingest byte stats to always be returned
whether or not they have a value of 0. Add human readable
form of byte stats. Update docs to reflect changes.
Current ingest byte stat fields could easily be confused.
Add more descriptive name to make it clear that they do not
count all docs processed by the pipeline.
Add ingested_in_bytes and produced_in_bytes stats to pipeline ingest stats.
These track how many bytes are ingested and produced by a given pipeline.
For efficiency, these stats are recorded for the first pipeline to process a
document. Thus, if a pipeline is called as a final pipeline after a default pipeline,
as a pipeline processor, or after a reroute request, a document will not
contribute to the stats for that pipeline. If a given pipeline has 0 bytes recorded
for both of these stats, due to not being the first pipeline to run any doc, these
stats will not appear in the pipeline's entry in ingest stats.
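The first-pipeline rule described above can be sketched as a small simulation. This is not the actual ingest code: the function, the pipeline names, and the byte sizes are all hypothetical, and the only behavior modeled is that byte counts go to the first pipeline that processes a document.

```python
from collections import defaultdict


def record_byte_stats(stats, pipeline_chain, ingested_size, produced_size):
    """Attribute byte counts to the first pipeline in the chain only.

    pipeline_chain is a hypothetical ordered list of the pipelines a
    single document passes through.
    """
    if not pipeline_chain:
        return
    first = pipeline_chain[0]
    stats[first]["ingested_in_bytes"] += ingested_size
    stats[first]["produced_in_bytes"] += produced_size


stats = defaultdict(lambda: {"ingested_in_bytes": 0, "produced_in_bytes": 0})

# Document 1 runs through a default pipeline and then a final pipeline.
record_byte_stats(stats, ["default-pipeline", "final-pipeline"], 120, 150)
# Document 2 runs through the default pipeline only.
record_byte_stats(stats, ["default-pipeline"], 80, 80)

print(dict(stats))
```

In this sketch `final-pipeline` never gets an entry, because it was never the first pipeline to process a document.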
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was likewise updated to a fuller path.
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc
Add a new `total_time_excluding_waiting_on_lock` metric to the index flush stats that measures flush time excluding time spent waiting on the flush lock. This metric provides a more granular view of flush performance, without the overhead of flush throttling.
Resolves ES-7201
Moving https://github.com/elastic/elasticsearch/pull/103472 here.
---
👋 howdy, team!
Could we include "XFS quotas" as an example for "depending on OS or process level restrictions"? That would make this doc easier to find in search and help users understand how to investigate the impact of this potential lever.
TIA!
This change adds the total dense vector count to the output of the indices stats.
This is useful for observability in order to track the number of indexed vectors
in a cluster.
---------
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
* Add repo throttle metrics to node stats api response
* Update docs/changelog/96678.yaml
* Change x-content output structure
* Fix test after merge from main
* Follow PR comments
* minor fixes
* minor fixes 2
* Introduce new TransportVersion (V_8_500_010)
* Fix yaml test
* Follow PR comments
* Make stats datapoints human readable
* Follow common pattern for human readable output
* Bump up TransportVersion
Adding a new endpoint under `_info/http`. This endpoint summarises the HTTP info of all the nodes into one big response, at cluster level. Compared with `_nodes/stats`, it lacks the nodes dimension.
This prevents docs files from *starting* with a "response" because when
that happens the response is converted to an assertion and appended
to the last snippet that was processed. If that last snippet was in a
different file then it's very hard to reason about the tests. That goes
double because the order we iterate files isn't defined....
Anyway! This adds a guard in the build, removes the offending
"response", and reenables the tests that we'd thought were failing here.
Closes #91081
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.
Closes #90102
So that they are visible in NodeIndicesStats only at the node and index (but not shard) levels. Also visible in the _cat/nodes table. And make an exact count yaml REST test.
Adds to the transport node stats a record of the distribution of the
times for which a transport thread was handling a message, represented
as a histogram.
Closes #80428
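A minimal sketch of the idea, assuming fixed bucket upper bounds: each handling time is counted in the first bucket whose upper bound it does not exceed, with one extra overflow bucket at the end. The bounds and the function name are hypothetical and are not taken from the actual node stats output.

```python
import bisect

# Hypothetical bucket upper bounds in milliseconds (assumption for
# illustration only; the real bucket boundaries are not stated here).
BUCKET_UPPER_BOUNDS_MS = [1, 2, 4, 8, 16, 32, 64]


def build_histogram(handling_times_ms):
    """Count handling times per bucket.

    Bucket i holds times t with bounds[i-1] < t <= bounds[i]; the last
    bucket collects everything above the largest bound.
    """
    counts = [0] * (len(BUCKET_UPPER_BOUNDS_MS) + 1)
    for t in handling_times_ms:
        counts[bisect.bisect_left(BUCKET_UPPER_BOUNDS_MS, t)] += 1
    return counts


print(build_histogram([0.5, 1.5, 3, 3, 100]))
# [1, 1, 2, 0, 0, 0, 0, 1]
```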
Since #65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.
This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.
The system property is `es.total_memory_bytes`, so, for example,
could be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
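The byte value in the example above is 3 × 1024³. A small sketch for deriving the flag for other sizes follows; the helper name is hypothetical, and only the property name and the 3GB value come from this change.

```python
GIB = 1024 ** 3  # bytes in one binary gigabyte


def total_memory_flag(gib):
    """Build the JVM system property for an assumed total memory size."""
    return f"-Des.total_memory_bytes={gib * GIB}"


print(total_memory_flag(3))  # -Des.total_memory_bytes=3221225472
```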
If the _nodes/stats API received a level=shards request parameter, then the response would have two "shards" fields,
which would cause problems with json parsers. This commit renames the "shards" field that currently only contains
"total_count" to "shard_stats".
Relates #78311, #75433
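To illustrate why two "shards" fields are a problem, the sketch below parses a heavily trimmed, hypothetical response fragment with Python's standard `json` module, which silently keeps only the last of two duplicate keys. The JSON shapes and index name are assumptions for illustration and do not reproduce the real response.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises instead of silently dropping keys."""
    keys = [key for key, _ in pairs]
    dupes = {key for key in keys if keys.count(key) > 1}
    if dupes:
        raise ValueError(f"duplicate keys: {sorted(dupes)}")
    return dict(pairs)


# Hypothetical fragments before and after the rename.
old_style = '{"shards": {"total_count": 3}, "shards": {"my-index": []}}'
new_style = '{"shard_stats": {"total_count": 3}, "shards": {"my-index": []}}'

# The default parser keeps only the last "shards" value, so the
# total count is lost without any error.
print(json.loads(old_style))  # {'shards': {'my-index': []}}

# A strict parser rejects the old shape outright.
try:
    json.loads(old_style, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print(err)  # duplicate keys: ['shards']

# After the rename both pieces of information survive parsing.
print(json.loads(new_style, object_pairs_hook=reject_duplicate_keys))
```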
This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.
Relates #76625
To return the JVM `uptime` metric, the `human` query parameter must be `true`.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Adding shard count to _nodes/stats api
Added a shards section to each node returned by the _nodes/stats api. Currently this new section only contains a total count of all shards on the node.
Changes:
* Renames 'full copy searchable snapshot' to 'fully mounted index.'
* Renames 'shared cache searchable snapshot' to 'partially mounted index.'
* Removes some unneeded cache setup instructions for the frozen tier. We added a default cache size with #71844.
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, making it
possible to distinguish between the two.
Relates #69820
Adds support for the include_unloaded_segments flag in node stats, which helps with understanding resource usage of
shared_cache-style searchable snapshots on a per-node basis.
Today's network config docs are split into "Network", "HTTP" and
"Transport" pages, with unclear relationships between them. We often
encounter users with weird configs that indicate they don't really
understand how these settings all relate. In fact these pages are all
very interrelated, and the HTTP and Transport pages are almost all only
for advanced users. This commit brings these docs into a single page and
rewords some things to try and guide users away from the advanced
settings unless their configuration needs all the extra complexity.
It also adds a section entitled "Binding and publishing" which clarifies
the meanings of the `bind_host` and `publish_host` parameters. This is
also a common source of confusion amongst users.
It also clarifies that many of these settings accept a list of
addresses, and warns that this may not be what you want. Closes #67956.
Co-authored-by: Adam Locke <adam.locke@elastic.co>