elasticsearch/docs/reference/indices/field-usage-stats.asciidoc
Liam Thompson 33a71e3289
[DOCS] Refactor book-scoped variables in docs/reference/index.asciidoc (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated idem

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
2024-04-17 14:37:07 +02:00

246 lines
8.2 KiB
Text

[[field-usage-stats]]
=== Field usage stats API
++++
<titleabbrev>Field usage stats</titleabbrev>
++++
experimental[]
Returns field usage information for each shard and field
of an index.
Field usage statistics are automatically captured when
queries are running on a cluster. A shard-level search
request that accesses a given field, even if multiple times
during that request, is counted as a single use.
[source,console]
--------------------------------------------------
GET /my-index-000001/_field_usage_stats
--------------------------------------------------
// TEST[setup:messages]
[[field-usage-stats-api-request]]
==== {api-request-title}
`GET /<index>/_field_usage_stats`
[[field-usage-stats-api-request-prereqs]]
==== {api-prereq-title}
* If the {es} {security-features} are enabled, you must have the `manage`
<<privileges-list-indices,index privilege>> for the target index or index alias.
[[field-usage-stats-api-path-params]]
==== {api-path-parms-title}
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=index]
[[field-usage-stats-api-query-params]]
==== {api-query-parms-title}
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=allow-no-indices]
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=expand-wildcards]
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailable]
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=timeoutparms]
`fields`::
+
--
(Optional, string)
Comma-separated list or wildcard expressions of fields
to include in the statistics.
--
[role="child_attributes"]
[[field-usage-stats-api-response-body]]
==== {api-response-body-title}
The response body reports the per-shard usage count of the data structures that back the fields in the index.
A given request will increment each count by a maximum value of 1, even if the request accesses the same field multiple times.
`any`::
(integer)
Denotes any kind of use of the field (e.g. via the _inverted index_, _stored fields_, _doc values_, etc.)
such that any usage is counted once for a given search request.
`inverted_index`::
(object)
The _inverted index_ is enabled by the <<mapping-index,`index`>> mapping parameter and configured by setting the <<index-options,`index_options`>> for the field.
+
.Properties of `inverted_index`:
[%collapsible%open]
====
`terms`::
(integer)
Denotes the usages of _terms_ in the _inverted index_, answering the question "Is this field's inverted index used?".
`postings`::
(integer)
Denotes the usage of the _posting list_ which contains the document ids for a given term.
`proximity`::
(integer)
Denotes any kind of usage of either _positions_, _offsets_ or _payloads_ in the _inverted index_
such that any usage is counted once for a given search request.
`positions`::
(integer)
Denotes the usage of _position_ data (order of the term) in the _inverted index_.
`term_frequencies`::
(integer)
Denotes the usage of the _term frequencies_ in the _inverted index_ which are used to calculate scores.
`offsets`::
(integer)
Denotes the usage of the _offsets_ in the _inverted index_ which store the start and end character offsets of the terms.
`payloads`::
(integer)
Denotes the usage of _payloads_ in the _inverted index_,
e.g. via the <<analysis-delimited-payload-tokenfilter, delimited payload token filter>>, or by user-defined analysis components and plugins.
====
`stored_fields`::
(integer)
Denotes the usage of _stored fields_. These are enabled via the <<mapping-store,`store`>> mapping option,
and accessed by specifying the <<stored-fields, `stored_fields`>> query option.
Note that the <<mapping-source-field,`_source`>> and <<mapping-id-field, `_id`>> fields are stored by default and their usage is counted here.
`doc_values`::
(integer)
Denotes the usage of _doc values_, which are primarily used for sorting and aggregations. These are enabled via the <<doc-values,`doc_values`>> mapping parameter.
`points`::
(integer)
Denotes the usage of the Lucene _PointValues_ which are the basis of most numeric _field data types_, including <<spatial_datatypes, spacial data types>>, <<number,numbers>>, <<_core_datatypes,dates>>, and more.
These are used by queries/aggregations for ranges, counts, bucketing, min/max, histograms, spacial, etc.
`norms`::
(integer)
Denotes the usage of <<norms,_norms_>> which contain index-time boost values used for scoring.
`term_vectors`::
(integer)
Denotes the usage of <<term-vector,_term vectors_>> which allow for a document's terms to be retrieved at search time.
Usages include <<highlighting,highlighting>> and the <<query-dsl-mlt-query,More Like This Query>>.
`knn_vectors`::
(integer)
Denotes the usage of the <<dense-vector,_knn_vectors_>> field type,
primarily used for <<knn-search,k-nearest neighbor (kNN) search>>.
[[field-usage-stats-api-example]]
==== {api-examples-title}
//////////////////////////
[source,console]
--------------------------------------------------
POST /my-index-000001/_search
{
"query" : {
"match" : { "context" : "bar" }
},
"aggs": {
"message_stats": {
"string_stats": {
"field": "message.keyword",
"show_distribution": true
}
}
}
}
--------------------------------------------------
// TEST[setup:messages]
//////////////////////////
The following request retrieves field usage information of index `my-index-000001`
on the currently available shards.
[source,console]
--------------------------------------------------
GET /my-index-000001/_field_usage_stats
--------------------------------------------------
// TEST[continued]
The API returns the following response:
[source,console-response]
--------------------------------------------------
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"my-index-000001": {
"shards": [
{
"tracking_id": "MpOl0QlTQ4SYYhEe6KgJoQ",
"tracking_started_at_millis": 1625558985010,
"routing": {
"state": "STARTED",
"primary": true,
"node": "gA6KeeVzQkGURFCUyV-e8Q",
"relocating_node": null
},
"stats" : {
"all_fields": { <1>
"any": "6",
"inverted_index": {
"terms" : 1,
"postings" : 1,
"proximity" : 1,
"positions" : 0,
"term_frequencies" : 1,
"offsets" : 0,
"payloads" : 0
},
"stored_fields" : 2,
"doc_values" : 1,
"points" : 0,
"norms" : 1,
"term_vectors" : 0,
"knn_vectors" : 0
},
"fields": {
"_id": { <2>
"any" : 1,
"inverted_index": {
"terms" : 1,
"postings" : 1,
"proximity" : 1,
"positions" : 0,
"term_frequencies" : 1,
"offsets" : 0,
"payloads" : 0
},
"stored_fields" : 1,
"doc_values" : 0,
"points" : 0,
"norms" : 0,
"term_vectors" : 0,
"knn_vectors" : 0
},
"_source": {...},
"context": {...},
"message.keyword": {...}
}
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/: \{\.\.\.\}/: $body.$_path/]
// TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]
// TESTRESPONSE[s/: "[^"]*"/: $body.$_path/]
<1> Reports the sums of the usage-counts for all fields in the index (on the listed shard).
<2> The field name for which the following usage-counts are reported (on the listed shard).