106 commits

Author SHA1 Message Date
Howard
dcb248a013
Add node stats effective watermark thresholds docs (#107668)
Relates https://github.com/elastic/elasticsearch/pull/107244
2024-04-21 15:31:11 +01:00
Liam Thompson
33a71e3289
[DOCS] Refactor book-scoped variables in docs/reference/index.asciidoc (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to the full path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way.

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to the few files that use it, and remove it from `index.asciidoc`
2024-04-17 14:37:07 +02:00
Artem Prigoda
6a300509cd
Add metric for calculating index flush time excluding waiting on locks (#107196)
Add a new `total_time_excluding_waiting_on_lock` metric to the index flush stats that measures the flushing time excluding time spent waiting on the flush lock. This metric provides a more granular view of flush performance, without the overhead of flush throttling.

Resolves ES-7201
2024-04-12 15:04:08 +02:00
Yang Wang
5632380ecd
[Doc] Trivial correction for shard allocator choice (#106216)
Relates: #105894
2024-03-12 18:34:22 +11:00
Ievgen Degtiarenko
5e52059947
Add allocation stats (#105894)
This change adds an allocation section to the node stats in
order to simplify debugging of unbalanced clusters. It is required for
https://github.com/elastic/elasticsearch/pull/97561
2024-03-11 11:07:56 -04:00
Stef Nestor
18a509a18f
(DOC+) Node Stats fs.available reflects XFS quotas (#106085)
Moving https://github.com/elastic/elasticsearch/pull/103472 here.

---

👋 howdy, team!

Could we include "XFS quotas" as an example of "depending on OS or process level restrictions"? This would make the doc easier to find by search and help users understand how to investigate the impact of this potential cause.

TIA!
2024-03-08 10:19:27 -05:00
Jim Ferenczi
a5d21ce800
Add the total dense vector count in the indices stats output (#98275)
This change adds the total dense vector count to the output of the indices stats.
This is useful for observability in order to track the number of indexed vectors
in a cluster.

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2023-08-11 23:17:38 +09:00
Abdon Pijpelink
efc0cb5422
[DOCS] Node stats API: fix descriptions of 'cache_size' and 'cache_count' (#98092)
2023-08-03 09:45:22 +02:00
Volodymyr Krasnikov
7abe8cb974
Add repo throttle metrics to node stats api response (#96678)
* Add repo throttle metrics to node stats api response

* Update docs/changelog/96678.yaml

* Change x-content output structure

* Fix test after merge from main

* Follow PR comments

* minor fixes

* minor fixes 2

* Introduce new TransportVersion (V_8_500_010)

* Fix yaml test

* Follow PR comments

* Make stats datapoints human readable

* Follow common pattern for human readable output

* Bump up TransportVersion
2023-06-13 09:04:36 -07:00
Pablo Alcantar Morales
dc5d0546e9
New HTTP info (/_info/http) endpoint (#96198)
Adding a new endpoint under `_info/http`. This endpoint summarises the HTTP info of all the nodes into one big response, at cluster level. Compared with `_nodes/stats`, it lacks the nodes dimension.
2023-05-22 07:43:23 +02:00
David Turner
4ef9965d47
Report transport message size per action (#94543)
Adds to the transport stats a histogram of transport message sizes for
each transport action.

Closes https://github.com/elastic/elasticsearch/issues/88151
2023-03-28 12:05:18 -04:00
David Turner
e43e7c2f4a
Improve transport stats histogram (#93598)
- omits empty buckets at the start and end of the histogram
- includes human-readable representation of the bucket boundaries if `?human` specified
2023-03-17 18:01:58 -04:00
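The bucket trimming described in #93598 can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: the function name `trim_empty_buckets` and the plain list-of-counts representation are assumptions made for illustration only. It drops empty buckets at the start and end of a histogram and keeps any empty buckets in the middle.

```python
from typing import List


def trim_empty_buckets(counts: List[int]) -> List[int]:
    """Drop empty buckets at the start and end; keep interior empty buckets.

    Hypothetical helper for illustration, not Elasticsearch code.
    """
    # Index of the first non-empty bucket, or None if every bucket is empty.
    first = next((i for i, c in enumerate(counts) if c), None)
    if first is None:
        return []
    # Index of the last non-empty bucket, found by scanning from the end.
    offset_from_end = next(i for i, c in enumerate(reversed(counts)) if c)
    last = len(counts) - 1 - offset_from_end
    return counts[first:last + 1]


print(trim_empty_buckets([0, 0, 3, 0, 5, 0]))
```

Keeping the interior empty bucket preserves the shape of the distribution between the first and last populated boundaries.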
Abdon Pijpelink
c5b1d997d1
[DOCS] Add 'total' object for io_stats in nodes stats response (#93854)
2023-02-23 13:08:42 +01:00
Nik Everett
6481342466
Fix sneaky docs test failure (#91829)
This prevents docs files from *starting* with a "response" because when
that happens the response is converted to an assertion and appended
to the last snippet that was processed. If that last snippet was in a
different file then it's very hard to reason about the tests. That goes
double because the order we iterate files isn't defined....

Anyway! This adds a guard in the build, removes the offending
"response", and reenables the tests that we'd thought were failing here.

Closes #91081
2022-12-07 11:02:44 -05:00
Iraklis Psaroudakis
756fcc212d
Log YAML test file on failure (#91349)
Relates #91081
2022-11-09 18:35:36 +02:00
Iraklis Psaroudakis
aa083ce419
[CI] Mute reference/cluster/nodes-stats (#91399)
relates #91081
2022-11-08 14:57:37 +02:00
Iraklis Psaroudakis
dcdf58721d
[CI] Mute reference/cluster/nodes-stats/line_2735 (#91380)
relates #91081
2022-11-08 05:04:49 -05:00
Hendrik Muhs
1b556d75fa
mute another node stats test (#91346)
muting another test part as it causes a lot of CI failures

relates #91081
2022-11-07 06:07:09 -05:00
Mary Gouseti
d55059afab
Mute reference/cluster/nodes-stats/line_2751 (#91174)
2022-10-28 11:55:53 +02:00
Francisco Fernández Castaño
1a3032beb6
Keep track of average shard write load (#90768)
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.

Closes #90102
2022-10-13 16:34:45 +02:00
Iraklis Psaroudakis
3ed7a04d22
Introduce node mappings stats (#89807)
Adds mappings stats to NodeIndicesStats so that they are visible only at the node and index (but not shard) levels. They are also visible in the _cat/nodes table. Also adds a YAML REST test that checks an exact count.
2022-09-19 15:47:47 +03:00
Joe Gallo
79990fa49b
Remove "Push back excessive requests for stats (#83832)" (#87054)
2022-05-23 12:58:02 -04:00
Gabi Davar
43ab984639
Add documentation for "io_time_in_millis" (#84911)
Add documentation for "io_time_in_millis"

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-04-25 16:43:19 +01:00
Mary Gouseti
ed0bb2a8af
Push back excessive requests for stats (#83832)
Resolves #51992
2022-02-28 08:46:18 +01:00
Jake Landis
fd6f04bb24
[docs] clarify purged http stats (#82123)
2022-01-04 09:51:41 -06:00
David Turner
54e0370b3e
Track histogram of transport handling times (#80581)
Adds to the transport node stats a record of the distribution of the
times for which a transport thread was handling a message, represented
as a histogram.

Closes #80428
2021-11-29 15:41:33 +00:00
Stuart Tettemer
30e15ba838
Script: Time series compile and cache evict metrics (#79078)
Collects compilation and cache eviction metrics for
each script context.

Metrics are available in _nodes/stats in 5m/15m/1d
buckets.

Refs: #62899
2021-11-03 13:13:42 -05:00
David Roberts
e86de065cf
Allow total memory to be overridden (#78750)
Since #65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.

This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.

The system property is `es.total_memory_bytes`, so, for example,
could be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
2021-10-16 12:01:37 +01:00
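As a small worked example for #78750: the value `3221225472` in the message is 3 × 1024³ bytes. The sketch below builds the flag from a whole number of binary gigabytes. The helper name `total_memory_flag` is hypothetical; only the property name `es.total_memory_bytes` comes from the commit message.

```python
def total_memory_flag(gib: int) -> str:
    """Build the JVM system property for a whole number of GiB.

    Hypothetical helper for illustration; the property must be given in
    plain bytes because the startup code cannot parse byte-size units.
    """
    return f"-Des.total_memory_bytes={gib * 1024 ** 3}"


print(total_memory_flag(3))  # -Des.total_memory_bytes=3221225472
```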
Keith Massey
4df15f5177
Changing name of shards field in node/stats api to shard_stats (#78531)
If the _nodes/stats API received a level=shards request parameter, then the response would have two "shards" fields,
which would cause problems with json parsers. This commit renames the "shards" field that currently only contains
"total_count" to "shard_stats".
Relates #78311 #75433
2021-10-06 17:19:04 -05:00
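To see why two "shards" fields cause trouble, as described in #78531, consider how a typical JSON parser treats duplicate keys. The sketch below uses Python's standard `json` module, which silently keeps only the last value. The response body is a simplified, made-up stand-in, not real `_nodes/stats` output, and `find_duplicate_keys` is a hypothetical helper.

```python
import json

# Simplified stand-in for a response with two "shards" fields
# (illustrative only, not actual _nodes/stats output).
raw = '{"shards": {"total_count": 3}, "shards": {"0": []}}'

# The standard parser keeps the last duplicate, so total_count is lost.
parsed = json.loads(raw)
print(parsed)


def find_duplicate_keys(text: str) -> list:
    """Return keys that appear more than once within any single JSON object."""
    duplicates = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


print(find_duplicate_keys(raw))
```

Other parsers may reject the document outright or keep the first value instead, which is why renaming one of the fields removes the ambiguity.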
David Turner
4a17847b85
Add timing stats to publication process (#76771)
This commit introduces into the node stats API various statistics to
track the time that the elected master spends in various phases of the
cluster state publication process.

Relates #76625
2021-08-23 17:38:32 +01:00
Peter Dyson
cad55c8393
[DOCS] Clarify usage of optional human readable jvm uptime metric in Nodes Stats API (#76545)
To return the JVM `uptime` metric, the `human` query parameter must be `true`.

Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-08-20 08:55:23 -04:00
Keith Massey
ddc3b37580
Adding shard count to node stats api (#75433)
* Adding shard count to _nodes/stats api

Added a shards section to each node returned by the _nodes/stats api. Currently this new section only contains a total count of all shards on the node.
2021-07-27 10:39:53 -05:00
James Rodewig
ba66669eb3
[DOCS] Rename mount types for searchable snapshots (#72699)
Changes:

* Renames 'full copy searchable snapshot' to 'fully mounted index.'
* Renames 'shared cache searchable snapshot' to 'partially mounted index.'
* Removes some unneeded cache setup instructions for the frozen tier. We added a default cache size with #71844.
2021-05-05 16:35:33 -04:00
James Rodewig
693807a6d3
[DOCS] Fix double spaces (#71082)
2021-03-31 09:57:47 -04:00
Henning Andersen
0f28e97857
Total data set size in stats (#70625)
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, making it
possible to distinguish between the two.

Relates #69820
2021-03-30 15:23:29 +02:00
Dan Hermann
8ff7360901
[DOCS] HTTP client stats (#70512)
2021-03-19 06:22:17 -05:00
Yannick Welsch
529c6227fe
Support include_unloaded_segments in node stats (#69682)
Adds support for the include_unloaded_segments flag in node stats, which helps with understanding resource usage of
shared_cache-style searchable snapshots on a per-node basis.
2021-03-01 17:18:47 +01:00
David Turner
2adeb4a666
Expand and consolidate networking docs (#68051)
Today's network config docs are split into "Network", "HTTP" and
"Transport" pages, with unclear relationships between them. We often
encounter users with weird configs that indicate they don't really
understand how these settings all relate. In fact these pages are all
very interrelated, and the HTTP and Transport pages are almost all only
for advanced users. This commit brings these docs into a single page and
rewords some things to try and guide users away from the advanced
settings unless their configuration needs all the extra complexity.

It also adds a section entitled "Binding and publishing" which clarifies
the meanings of the `bind_host` and `publish_host` parameters. This is
also a common source of confusion amongst users.

It also clarifies that many of these settings accept a list of
addresses, and warns that this may not be what you want. Closes #67956.

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2021-02-01 13:06:20 +00:00
James Rodewig
3e34247570
[DOCS] Add security privileges to cluster API docs (#67589)
2021-01-19 10:18:59 -05:00
bellengao
d14492ca13
[DOCS] Fix some typos in docs (#66672)
2020-12-21 12:45:51 +02:00
James Rodewig
1ea83359bb
[DOCS] Fix case for 'Boolean' (#64299)
2020-10-29 09:04:43 -04:00
Adam Locke
789ee2d73e
[DOCS] Combining important config settings into a single page (#63849)
* Combining important config settings into a single page.

* Updating ids for two pages causing link errors and implementing redirects.
2020-10-19 10:02:22 -04:00
James Rodewig
136275e3e6
[DOCS] Fix typo in nodes stats docs (#61601) (#61716)
Co-authored-by: Henry <henryloh@ucla.edu>
2020-08-31 09:29:40 -04:00
James Rodewig
a94e5cb7c4
[DOCS] Replace Wikipedia links with attribute (#61171)
2020-08-17 09:44:24 -04:00
Tim Brooks
b1a6271ec8
Add configured indexing memory limit to node stats (#60342)
This commit adds the configured memory limit to the node stats API.
2020-07-29 11:20:59 -06:00
David Turner
940d618186
Log and track open/close of transport connections (#60297)
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.

This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
2020-07-28 16:58:00 +01:00
Tim Brooks
5c227dac88
Implement human readable indexing pressure stats (#60022)
The indexing pressure stats do not currently have human readable
variants. This commit adds human readable variants and updates the
documentation.
2020-07-22 09:54:51 -06:00
Tim Brooks
08506de861
Add indexing pressure documentation (#59456)
This commit adds documentation about the new indexing pressure memory
limit setting and the exposure of these metrics in node stats.
2020-07-20 19:35:26 -06:00
David Turner
7bb748da8c
Remove sporadic min/max usage estimates from stats (#59755)
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.

This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest of the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
2020-07-20 14:48:53 +01:00
David Turner
83d6589b2a
Account for remaining recovery in disk allocator (#58029)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
2020-07-01 08:04:45 +01:00
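The accounting change in #58029 can be sketched with a little arithmetic. This is a simplified illustration, not the allocator's actual code: both function names and the byte figures are assumptions. It compares subtracting the whole estimated shard size with subtracting only the growth still expected from the ongoing recovery.

```python
def free_space_conservative(free_bytes: int, shard_size_bytes: int) -> int:
    """Old-style estimate: subtract the full size of the incoming shard."""
    return free_bytes - shard_size_bytes


def free_space_adjusted(free_bytes: int, remaining_growth_bytes: int) -> int:
    """New-style estimate: subtract only the growth still to come."""
    return free_bytes - remaining_growth_bytes


# Hypothetical numbers: an 80-byte shard whose recovery has already
# written 75 bytes, on a node currently reporting 100 bytes free.
free_now = 100
shard_size = 80
already_recovered = 75
remaining_growth = shard_size - already_recovered

print(free_space_conservative(free_now, shard_size))
print(free_space_adjusted(free_now, remaining_growth))
```

Because the reported free space already reflects the bytes written so far, subtracting the whole shard size counts them twice; using the remaining growth avoids that.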