mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because they don't need this sort of `instanceof` tree:
a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)
Relates to #69291
Co-authored-by: Adam Locke <adam.locke@elastic.co>
421 lines
9.7 KiB
Text
[[search-aggregations]]
= Aggregations

[partintro]
--
An aggregation summarizes your data as metrics, statistics, or other analytics.
Aggregations help you answer questions like:

* What's the average load time for my website?
* Who are my most valuable customers based on transaction volume?
* What would be considered a large file on my network?
* How many products are in each product category?

{es} organizes aggregations into three categories:

* <<search-aggregations-metrics,Metric>> aggregations that calculate metrics,
such as a sum or average, from field values.

* <<search-aggregations-bucket,Bucket>> aggregations that
group documents into buckets, also called bins, based on field values, ranges,
or other criteria.

* <<search-aggregations-pipeline,Pipeline>> aggregations that take input from
other aggregations instead of documents or fields.

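The following Python sketch makes the three categories concrete by modeling them over a few in-memory documents. It is a simplified illustration, not how {es} executes aggregations. The documents and the `category` and `price` fields are hypothetical.

[source,python]
----
from collections import defaultdict
from statistics import mean

# Hypothetical documents; the field names are illustrative only.
docs = [
    {"category": "books", "price": 12.0},
    {"category": "books", "price": 18.0},
    {"category": "games", "price": 60.0},
]

# Metric: compute a single value from field values.
avg_price = mean(d["price"] for d in docs)

# Bucket: group documents by a field value.
buckets = defaultdict(list)
for d in docs:
    buckets[d["category"]].append(d)

# A metric per bucket, so the pipeline step below has input to consume.
avg_by_category = {
    key: mean(d["price"] for d in group) for key, group in buckets.items()
}

# Pipeline: read another aggregation's output instead of the documents.
# Here the step picks the bucket with the highest average.
max_bucket = max(avg_by_category, key=avg_by_category.get)
----

The last step never reads `docs`. It consumes only the per-bucket averages, which is what separates a pipeline aggregation from the other two categories.
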
[discrete]
[[run-an-agg]]
=== Run an aggregation

You can run aggregations as part of a <<search-your-data,search>> by specifying the <<search-search,search API>>'s `aggs` parameter. The
following search runs a
<<search-aggregations-bucket-terms-aggregation,terms aggregation>> on
`my-field`:

[source,console]
----
GET /my-index-000001/_search
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]

Aggregation results are in the response's `aggregations` object:

[source,console-result]
----
{
  "took": 78,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [...]
  },
  "aggregations": {
    "my-agg-name": { <1>
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}
----
// TESTRESPONSE[s/"took": 78/"took": "$body.took"/]
// TESTRESPONSE[s/\.\.\.$/"took": "$body.took", "timed_out": false, "_shards": "$body._shards", /]
// TESTRESPONSE[s/"hits": \[\.\.\.\]/"hits": "$body.hits.hits"/]
// TESTRESPONSE[s/"buckets": \[\]/"buckets":\[\{"key":"get","doc_count":5\}\]/]

<1> Results for the `my-agg-name` aggregation.

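In client code, you read aggregation results under `aggregations`, keyed by the name you chose in the request. The following Python sketch assumes you already have the raw JSON response body as a string. The sample body is abbreviated and hypothetical.

[source,python]
----
import json

# Abbreviated, hypothetical response body for the terms request above.
body = """
{
  "took": 78,
  "timed_out": false,
  "aggregations": {
    "my-agg-name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [{"key": "get", "doc_count": 5}]
    }
  }
}
"""

response = json.loads(body)

# Results are keyed by the aggregation name used in the request.
terms_result = response["aggregations"]["my-agg-name"]

# Map each bucket key to its document count.
counts = {b["key"]: b["doc_count"] for b in terms_result["buckets"]}
----
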
[discrete]
[[change-agg-scope]]
=== Change an aggregation's scope

Use the `query` parameter to limit the documents on which an aggregation runs:

[source,console]
----
GET /my-index-000001/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]

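The range in this query uses date math. `now-1d/d` rounds down to midnight at the start of yesterday, and `now/d` rounds down to midnight at the start of today. Combined with `gte` and `lt`, the query keeps documents from the start of yesterday up to, but not including, the start of today, so the aggregation only sees the last full day. The following Python sketch computes the same bounds for a fixed `now`. It assumes UTC, which is the rounding time zone when the query sets no `time_zone`. The helper names are hypothetical.

[source,python]
----
from datetime import datetime, timedelta, timezone


def previous_day_bounds(now):
    """Return the half-open [start, end) range for gte now-1d/d, lt now/d (UTC)."""
    # now/d: round down to midnight of the current UTC day.
    start_of_today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # now-1d/d: subtract one day, then round down to midnight.
    start_of_yesterday = start_of_today - timedelta(days=1)
    return start_of_yesterday, start_of_today


def in_scope(ts, now):
    """True if a document timestamp falls inside the query's range."""
    start, end = previous_day_bounds(now)
    return start <= ts < end
----
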
[discrete]
[[return-only-agg-results]]
=== Return only aggregation results

By default, searches containing an aggregation return both search hits and
aggregation results. To return only aggregation results, set `size` to `0`:

[source,console]
----
GET /my-index-000001/_search
{
  "size": 0,
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]

[discrete]
[[run-multiple-aggs]]
=== Run multiple aggregations

You can specify multiple aggregations in the same request:

[source,console]
----
GET /my-index-000001/_search
{
  "aggs": {
    "my-first-agg-name": {
      "terms": {
        "field": "my-field"
      }
    },
    "my-second-agg-name": {
      "avg": {
        "field": "my-other-field"
      }
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]
// TEST[s/my-other-field/http.response.bytes/]

[discrete]
[[run-sub-aggs]]
=== Run sub-aggregations

Bucket aggregations support bucket or metric sub-aggregations. For example, a
terms aggregation with an <<search-aggregations-metrics-avg-aggregation,avg>>
sub-aggregation calculates an average value for each bucket of documents. There
is no level or depth limit for nesting sub-aggregations.

[source,console]
----
GET /my-index-000001/_search
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "aggs": {
        "my-sub-agg-name": {
          "avg": {
            "field": "my-other-field"
          }
        }
      }
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/_search/_search?size=0/]
// TEST[s/my-field/http.request.method/]
// TEST[s/my-other-field/http.response.bytes/]

The response nests sub-aggregation results under their parent aggregation:

[source,console-result]
----
{
  ...
  "aggregations": {
    "my-agg-name": { <1>
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "foo",
          "doc_count": 5,
          "my-sub-agg-name": { <2>
            "value": 75.0
          }
        }
      ]
    }
  }
}
----
// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits",/]
// TESTRESPONSE[s/"key": "foo"/"key": "get"/]
// TESTRESPONSE[s/"value": 75.0/"value": $body.aggregations.my-agg-name.buckets.0.my-sub-agg-name.value/]

<1> Results for the parent aggregation, `my-agg-name`.
<2> Results for `my-agg-name`'s sub-aggregation, `my-sub-agg-name`.

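Conceptually, the terms aggregation partitions the matching documents into buckets, and the `avg` sub-aggregation then runs once inside each bucket. The following Python sketch reproduces that nesting, including the shape of the response, over hypothetical in-memory documents. It leaves out details of the real terms aggregation, such as its default limit of 10 buckets and per-shard counting. The function name and the `method` and `bytes` fields are hypothetical.

[source,python]
----
from collections import defaultdict


def terms_with_avg(docs, field, avg_field):
    """Group docs by `field`, then average `avg_field` within each bucket.

    Assumes every doc has both fields; missing fields raise KeyError.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc[avg_field])

    buckets = [
        {
            "key": key,
            "doc_count": len(values),
            # The sub-aggregation result is nested inside its bucket.
            "my-sub-agg-name": {"value": sum(values) / len(values)},
        }
        for key, values in groups.items()
    ]
    # Largest buckets first; ties are broken by key in this sketch.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return {"my-agg-name": {"buckets": buckets}}


# Hypothetical documents with an HTTP method and a response size.
docs = [
    {"method": "get", "bytes": 100},
    {"method": "get", "bytes": 50},
    {"method": "post", "bytes": 300},
]
result = terms_with_avg(docs, "method", "bytes")
----
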
[discrete]
[[add-metadata-to-an-agg]]
=== Add custom metadata

Use the `meta` object to associate custom metadata with an aggregation:

[source,console]
----
GET /my-index-000001/_search
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "meta": {
        "my-metadata-field": "foo"
      }
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/_search/_search?size=0/]

The response returns the `meta` object in place:

[source,console-result]
----
{
  ...
  "aggregations": {
    "my-agg-name": {
      "meta": {
        "my-metadata-field": "foo"
      },
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}
----
// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits",/]

[discrete]
[[return-agg-type]]
=== Return the aggregation type

By default, aggregation results include the aggregation's name but not its type.
To return the aggregation type, use the `typed_keys` query parameter.

[source,console]
----
GET /my-index-000001/_search?typed_keys
{
  "aggs": {
    "my-agg-name": {
      "histogram": {
        "field": "my-field",
        "interval": 1000
      }
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/typed_keys/typed_keys&size=0/]
// TEST[s/my-field/http.response.bytes/]

The response returns the aggregation type as a prefix to the aggregation's name.

IMPORTANT: Some aggregations return a different aggregation type from the
type in the request. For example, the terms,
<<search-aggregations-bucket-significantterms-aggregation,significant terms>>,
and <<search-aggregations-metrics-percentile-aggregation,percentiles>>
aggregations return different aggregation types depending on the data type of
the aggregated field.

[source,console-result]
----
{
  ...
  "aggregations": {
    "histogram#my-agg-name": { <1>
      "buckets": []
    }
  }
}
----
// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits",/]
// TESTRESPONSE[s/"buckets": \[\]/"buckets":\[\{"key":1070000.0,"doc_count":5\}\]/]

<1> The aggregation type, `histogram`, followed by a `#` separator and the aggregation's name, `my-agg-name`.

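Client code that receives `typed_keys` responses can split each key at the first `#` to recover the type and the name. The following Python sketch shows one way to do this. The helper names are hypothetical. The sketch assumes the type prefix never contains `#`, although the name after the separator may. It handles only top-level keys, even though sub-aggregations inside buckets carry typed keys too.

[source,python]
----
def split_typed_key(typed_key):
    """Split 'type#name' into (type, name) at the first '#'."""
    agg_type, sep, name = typed_key.partition("#")
    if not sep:
        raise ValueError(f"not a typed key: {typed_key!r}")
    return agg_type, name


def index_by_name(aggregations):
    """Map each top-level aggregation name to a (type, result) pair."""
    indexed = {}
    for typed_key, result in aggregations.items():
        agg_type, name = split_typed_key(typed_key)
        indexed[name] = (agg_type, result)
    return indexed
----

Keeping the type alongside the result lets callers branch on it. This matters for aggregations whose response type depends on the field's data type, as the note above describes.
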
[discrete]
[[use-scripts-in-an-agg]]
=== Use scripts in an aggregation

When a field doesn't exactly match the aggregation you need, you
should aggregate on a <<runtime,runtime field>>:

[source,console]
----
GET /my-index-000001/_search?size=0
{
  "runtime_mappings": {
    "message.length": {
      "type": "long",
      "script": "emit(doc['message.keyword'].value.length())"
    }
  },
  "aggs": {
    "message_length": {
      "histogram": {
        "interval": 10,
        "field": "message.length"
      }
    }
  }
}
----
// TEST[setup:my_index]

////
[source,console-result]
----
{
  "timed_out": false,
  "took": "$body.took",
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "hits": "$body.hits",
  "aggregations": {
    "message_length": {
      "buckets": [
        {
          "key": 30.0,
          "doc_count": 5
        }
      ]
    }
  }
}
----
////

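The following Python sketch mimics what this request computes, using plain strings. It derives a length for each message, as the Painless script does with `emit`, and then assigns each value to a bucket using the <<search-aggregations-bucket-histogram-aggregation,histogram>> key rule with no `offset`: `floor(value / interval) * interval`. The sample messages and helper names are hypothetical. Unlike the real aggregation, which by default also returns empty buckets between the lowest and highest keys, the sketch lists only non-empty buckets.

[source,python]
----
import math
from collections import Counter


def message_length(message):
    # Stand-in for: emit(doc['message.keyword'].value.length())
    # Java counts UTF-16 code units; this matches len() for ASCII text.
    return len(message)


def histogram(values, interval):
    """Count values per bucket, where key = floor(value / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return [{"key": float(k), "doc_count": n} for k, n in sorted(counts.items())]


messages = [
    "GET /search HTTP/1.1 200 1070000",  # hypothetical log line, 32 chars
    "short",                             # hypothetical, 5 chars
]
buckets = histogram((message_length(m) for m in messages), interval=10)
----
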
Scripts calculate field values dynamically, which adds a little
overhead to the aggregation. In addition to the time spent calculating,
some aggregations, such as <<search-aggregations-bucket-terms-aggregation,`terms`>>
and <<search-aggregations-bucket-filters-aggregation,`filters`>>, can't use
some of their optimizations with runtime fields. As a result, the performance
cost of using a runtime field varies from aggregation to aggregation.

// TODO when we have calculated fields we can link to them here.

[discrete]
[[agg-caches]]
=== Aggregation caches

For faster responses, {es} caches the results of frequently run aggregations in
the <<shard-request-cache,shard request cache>>. To get cached results, use the
same <<shard-and-node-preference,`preference` string>> for each search. If you
don't need search hits, <<return-only-agg-results,set `size` to `0`>> to avoid
filling the cache.

{es} routes searches with the same preference string to the same shards. If the
shards' data doesn't change between searches, the shards return cached
aggregation results.

[discrete]
[[limits-for-long-values]]
=== Limits for `long` values

When running aggregations, {es} uses <<number,`double`>> values to hold and
represent numeric data. As a result, aggregations on <<number,`long`>> numbers
greater than +2^53^+ are approximate.

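A 64-bit `double` has a 53-bit significand, so it can represent every integer exactly only up to 2^53^. Above that point, neighboring `long` values can collapse into the same `double`. Python's `float` is an IEEE 754 double, so the effect is easy to reproduce. The following sketch uses a hypothetical `round_trip` helper.

[source,python]
----
LIMIT = 2**53  # 9007199254740992


def round_trip(n):
    """Convert a long-like int to a double (Python float) and back."""
    return int(float(n))


exact = round_trip(LIMIT - 1)  # below the limit: value is unchanged
lossy = round_trip(LIMIT + 1)  # above it: snaps to the nearest double, 2**53

# Arithmetic in doubles drifts too: adding 1.0 to 2**53 has no effect.
drift = float(LIMIT) + 1.0 == float(LIMIT)
----

This is why metrics computed over very large `long` values can differ slightly from exact integer arithmetic.
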
--

include::aggregations/bucket.asciidoc[]

include::aggregations/metrics.asciidoc[]

include::aggregations/pipeline.asciidoc[]