[docs] Prepare for docs-assembler (#125118)

* reorg files for docs-assembler and create toc.yml files

* fix build error, add redirects

* only toc

* move images
Colleen McGinnis 2025-03-20 12:09:12 -05:00 committed by GitHub
parent 52bc96240c
commit 9bcd59596d
396 changed files with 1905 additions and 2214 deletions

View file

@@ -22,7 +22,15 @@ cross_links:
   - kibana
   - logstash
 toc:
-  - toc: reference
+  - toc: reference/elasticsearch
+  - toc: reference/community-contributed
+  - toc: reference/enrich-processor
+  - toc: reference/search-connectors
+  - toc: reference/elasticsearch-plugins
+  - toc: reference/query-languages
+  - toc: reference/scripting-languages
+  - toc: reference/text-analysis
+  - toc: reference/aggregations
   - toc: release-notes
   - toc: extend
 subs:
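The hunk above replaces the single `reference` toc entry with nine scoped toc entries while leaving `release-notes` and `extend` untouched. A minimal sketch of that before/after relationship, using only the entries shown in the diff:

```python
# Reproduce the toc lists from the docset.yml hunk above and confirm
# that the new entries are a split of the old top-level `reference` toc.
old_toc = ["reference", "release-notes", "extend"]
new_toc = [
    "reference/elasticsearch",
    "reference/community-contributed",
    "reference/enrich-processor",
    "reference/search-connectors",
    "reference/elasticsearch-plugins",
    "reference/query-languages",
    "reference/scripting-languages",
    "reference/text-analysis",
    "reference/aggregations",
    "release-notes",
    "extend",
]

# Entries split out from the removed top-level `reference` toc.
split_out = [t for t in new_toc if t.startswith("reference/")]
print(len(split_out))  # 9
```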

View file

@@ -41,3 +41,102 @@ redirects:
 'reference/query-languages/query-dsl-function-score-query.md': 'reference/query-languages/query-dsl/query-dsl-function-score-query.md'
 'reference/query-languages/query-dsl-knn-query.md': 'reference/query-languages/query-dsl/query-dsl-knn-query.md'
 'reference/query-languages/query-dsl-text-expansion-query.md': 'reference/query-languages/query-dsl/query-dsl-text-expansion-query.md'
+# Related to https://github.com/elastic/elasticsearch/pull/125118
+'reference/community-contributed.md': 'reference/community-contributed/index.md'
+'reference/data-analysis/aggregations/bucket.md': 'reference/aggregations/bucket.md'
+'reference/data-analysis/aggregations/index.md': 'reference/aggregations/index.md'
+'reference/data-analysis/aggregations/metrics.md': 'reference/aggregations/metrics.md'
+'reference/data-analysis/aggregations/pipeline.md': 'reference/aggregations/pipeline.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md': 'reference/aggregations/search-aggregations-bucket-composite-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md': 'reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-filter-aggregation.md': 'reference/aggregations/search-aggregations-bucket-filter-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-filters-aggregation.md': 'reference/aggregations/search-aggregations-bucket-filters-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-geodistance-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geodistance-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md': 'reference/aggregations/search-aggregations-bucket-histogram-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md': 'reference/aggregations/search-aggregations-bucket-multi-terms-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md': 'reference/aggregations/search-aggregations-bucket-range-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-significantterms-aggregation.md': 'reference/aggregations/search-aggregations-bucket-significantterms-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md': 'reference/aggregations/search-aggregations-bucket-terms-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-change-point-aggregation.md': 'reference/aggregations/search-aggregations-change-point-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-avg-aggregation.md': 'reference/aggregations/search-aggregations-metrics-avg-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-cardinality-aggregation.md': 'reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-geo-line.md': 'reference/aggregations/search-aggregations-metrics-geo-line.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-geobounds-aggregation.md': 'reference/aggregations/search-aggregations-metrics-geobounds-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-geocentroid-aggregation.md': 'reference/aggregations/search-aggregations-metrics-geocentroid-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-max-aggregation.md': 'reference/aggregations/search-aggregations-metrics-max-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md': 'reference/aggregations/search-aggregations-metrics-percentile-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md': 'reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-stats-aggregation.md': 'reference/aggregations/search-aggregations-metrics-stats-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-sum-aggregation.md': 'reference/aggregations/search-aggregations-metrics-sum-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-top-hits-aggregation.md': 'reference/aggregations/search-aggregations-metrics-top-hits-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-metrics-top-metrics.md': 'reference/aggregations/search-aggregations-metrics-top-metrics.md'
+'reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-selector-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-bucket-selector-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-pipeline-cumulative-sum-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-cumulative-sum-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-pipeline-derivative-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-derivative-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md'
+'reference/data-analysis/aggregations/search-aggregations-pipeline-movfn-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-movfn-aggregation.md'
+'reference/data-analysis/text-analysis/analysis-asciifolding-tokenfilter.md': 'reference/text-analysis/analysis-asciifolding-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-condition-tokenfilter.md': 'reference/text-analysis/analysis-condition-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-htmlstrip-charfilter.md': 'reference/text-analysis/analysis-htmlstrip-charfilter.md'
+'reference/data-analysis/text-analysis/analysis-hunspell-tokenfilter.md': 'reference/text-analysis/analysis-hunspell-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-keyword-marker-tokenfilter.md': 'reference/text-analysis/analysis-keyword-marker-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-kstem-tokenfilter.md': 'reference/text-analysis/analysis-kstem-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-lang-analyzer.md': 'reference/text-analysis/analysis-lang-analyzer.md'
+'reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md': 'reference/text-analysis/analysis-lowercase-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-mapping-charfilter.md': 'reference/text-analysis/analysis-mapping-charfilter.md'
+'reference/data-analysis/text-analysis/analysis-pattern-replace-charfilter.md': 'reference/text-analysis/analysis-pattern-replace-charfilter.md'
+'reference/data-analysis/text-analysis/analysis-pattern-tokenizer.md': 'reference/text-analysis/analysis-pattern-tokenizer.md'
+'reference/data-analysis/text-analysis/analysis-porterstem-tokenfilter.md': 'reference/text-analysis/analysis-porterstem-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-snowball-tokenfilter.md': 'reference/text-analysis/analysis-snowball-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-standard-analyzer.md': 'reference/text-analysis/analysis-standard-analyzer.md'
+'reference/data-analysis/text-analysis/analysis-standard-tokenizer.md': 'reference/text-analysis/analysis-standard-tokenizer.md'
+'reference/data-analysis/text-analysis/analysis-stemmer-override-tokenfilter.md': 'reference/text-analysis/analysis-stemmer-override-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-stemmer-tokenfilter.md': 'reference/text-analysis/analysis-stemmer-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md': 'reference/text-analysis/analysis-stop-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-synonym-graph-tokenfilter.md': 'reference/text-analysis/analysis-synonym-graph-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md': 'reference/text-analysis/analysis-synonym-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-whitespace-tokenizer.md': 'reference/text-analysis/analysis-whitespace-tokenizer.md'
+'reference/data-analysis/text-analysis/analysis-word-delimiter-graph-tokenfilter.md': 'reference/text-analysis/analysis-word-delimiter-graph-tokenfilter.md'
+'reference/data-analysis/text-analysis/analysis-word-delimiter-tokenfilter.md': 'reference/text-analysis/analysis-word-delimiter-tokenfilter.md'
+'reference/data-analysis/text-analysis/analyzer-reference.md': 'reference/text-analysis/analyzer-reference.md'
+'reference/data-analysis/text-analysis/character-filter-reference.md': 'reference/text-analysis/character-filter-reference.md'
+'reference/data-analysis/text-analysis/index.md': 'reference/text-analysis/index.md'
+'reference/data-analysis/text-analysis/normalizers.md': 'reference/text-analysis/normalizers.md'
+'reference/data-analysis/text-analysis/token-filter-reference.md': 'reference/text-analysis/token-filter-reference.md'
+'reference/data-analysis/text-analysis/tokenizer-reference.md': 'reference/text-analysis/tokenizer-reference.md'
+'reference/ingestion-tools/enrich-processor/attachment.md': 'reference/enrich-processor/attachment.md'
+'reference/ingestion-tools/enrich-processor/convert-processor.md': 'reference/enrich-processor/convert-processor.md'
+'reference/ingestion-tools/enrich-processor/csv-processor.md': 'reference/enrich-processor/csv-processor.md'
+'reference/ingestion-tools/enrich-processor/date-index-name-processor.md': 'reference/enrich-processor/date-index-name-processor.md'
+'reference/ingestion-tools/enrich-processor/date-processor.md': 'reference/enrich-processor/date-processor.md'
+'reference/ingestion-tools/enrich-processor/dissect-processor.md': 'reference/enrich-processor/dissect-processor.md'
+'reference/ingestion-tools/enrich-processor/dot-expand-processor.md': 'reference/enrich-processor/dot-expand-processor.md'
+'reference/ingestion-tools/enrich-processor/enrich-processor.md': 'reference/enrich-processor/enrich-processor.md'
+'reference/ingestion-tools/enrich-processor/fingerprint-processor.md': 'reference/enrich-processor/fingerprint-processor.md'
+'reference/ingestion-tools/enrich-processor/geoip-processor.md': 'reference/enrich-processor/geoip-processor.md'
+'reference/ingestion-tools/enrich-processor/grok-processor.md': 'reference/enrich-processor/grok-processor.md'
+'reference/ingestion-tools/enrich-processor/gsub-processor.md': 'reference/enrich-processor/gsub-processor.md'
+'reference/ingestion-tools/enrich-processor/htmlstrip-processor.md': 'reference/enrich-processor/htmlstrip-processor.md'
+'reference/ingestion-tools/enrich-processor/index.md': 'reference/enrich-processor/index.md'
+'reference/ingestion-tools/enrich-processor/inference-processor.md': 'reference/enrich-processor/inference-processor.md'
+'reference/ingestion-tools/enrich-processor/ingest-geo-grid-processor.md': 'reference/enrich-processor/ingest-geo-grid-processor.md'
+'reference/ingestion-tools/enrich-processor/ingest-node-set-security-user-processor.md': 'reference/enrich-processor/ingest-node-set-security-user-processor.md'
+'reference/ingestion-tools/enrich-processor/json-processor.md': 'reference/enrich-processor/json-processor.md'
+'reference/ingestion-tools/enrich-processor/lowercase-processor.md': 'reference/enrich-processor/lowercase-processor.md'
+'reference/ingestion-tools/enrich-processor/pipeline-processor.md': 'reference/enrich-processor/pipeline-processor.md'
+'reference/ingestion-tools/enrich-processor/remove-processor.md': 'reference/enrich-processor/remove-processor.md'
+'reference/ingestion-tools/enrich-processor/rename-processor.md': 'reference/enrich-processor/rename-processor.md'
+'reference/ingestion-tools/enrich-processor/reroute-processor.md': 'reference/enrich-processor/reroute-processor.md'
+'reference/ingestion-tools/enrich-processor/script-processor.md': 'reference/enrich-processor/script-processor.md'
+'reference/ingestion-tools/enrich-processor/set-processor.md': 'reference/enrich-processor/set-processor.md'
+'reference/ingestion-tools/enrich-processor/trim-processor.md': 'reference/enrich-processor/trim-processor.md'
+'reference/ingestion-tools/enrich-processor/user-agent-processor.md': 'reference/enrich-processor/user-agent-processor.md'
+'reference/ingestion-tools/search-connectors/connectors-ui-in-kibana.md': 'reference/search-connectors/connectors-ui-in-kibana.md'
+'reference/ingestion-tools/search-connectors/es-connectors-github.md': 'reference/search-connectors/es-connectors-github.md'
+'reference/ingestion-tools/search-connectors/index.md': 'reference/search-connectors/index.md'
+'reference/ingestion-tools/search-connectors/self-managed-connectors.md': 'reference/search-connectors/self-managed-connectors.md'
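The redirects file above is a flat map of old path to new path. A minimal sketch of how such a map could be applied (the `resolve` helper is illustrative, not part of docs-assembler; the map is a small excerpt from the hunk above):

```python
# Excerpt of the redirects map from the hunk above.
REDIRECTS = {
    "reference/community-contributed.md": "reference/community-contributed/index.md",
    "reference/data-analysis/aggregations/index.md": "reference/aggregations/index.md",
    "reference/ingestion-tools/enrich-processor/grok-processor.md": "reference/enrich-processor/grok-processor.md",
}

def resolve(path: str) -> str:
    """Follow redirects until the path is no longer remapped."""
    seen = set()
    while path in REDIRECTS:
        if path in seen:
            # Guard against an accidental redirect cycle in the map.
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
        path = REDIRECTS[path]
    return path

print(resolve("reference/data-analysis/aggregations/index.md"))
# reference/aggregations/index.md
```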

View file

Binary image moved (25 KiB, unchanged)

View file

Binary image moved (26 KiB, unchanged)

View file

Binary image moved (24 KiB, unchanged)

View file

Binary image moved (106 KiB, unchanged)

View file

Binary image moved (1.5 MiB, unchanged)

View file

Binary image moved (255 KiB, unchanged)

View file

Binary image moved (2.1 MiB, unchanged)

View file

Binary image moved (34 KiB, unchanged)

View file

Binary image moved (31 KiB, unchanged)

View file

Binary image moved (136 KiB, unchanged)

View file

Binary image moved (20 KiB, unchanged)

View file

Binary image moved (134 KiB, unchanged)

View file

Binary image moved (304 KiB, unchanged)
View file

@@ -221,7 +221,7 @@ POST /sales/_search
 ## Dealing with dots in agg names [dots-in-agg-names]
-An alternate syntax is supported to cope with aggregations or metrics which have dots in the name, such as the `99.9`th [percentile](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md). This metric may be referred to as:
+An alternate syntax is supported to cope with aggregations or metrics which have dots in the name, such as the `99.9`th [percentile](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md). This metric may be referred to as:
 ```js
 "buckets_path": "my_percentile[99.9]"
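The `my_percentile[99.9]` bracket syntax above keys into a metric whose name contains dots. A sketch of a request body using it (index and field names here are made up for illustration):

```python
# The percentiles agg exposes its 99.9th percentile under the dotted key
# `99.9`, so buckets_path uses the bracket form `my_percentile[99.9]`
# instead of the dot-separated form. Field/index names are hypothetical.
body = {
    "size": 0,
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "timestamp", "calendar_interval": "day"},
            "aggs": {
                "my_percentile": {
                    "percentiles": {"field": "latency", "percents": [99.9]}
                },
                "p999_delta": {
                    # Pipeline agg referencing the dotted metric name.
                    "derivative": {"buckets_path": "my_percentile[99.9]"}
                },
            },
        }
    },
}
print(body["aggs"]["per_day"]["aggs"]["p999_delta"]["derivative"]["buckets_path"])
# my_percentile[99.9]
```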

View file

@@ -7,7 +7,7 @@ mapped_pages:
 # Auto-interval date histogram aggregation [search-aggregations-bucket-autodatehistogram-aggregation]
-A multi-bucket aggregation similar to the [Date histogram](/reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number.
+A multi-bucket aggregation similar to the [Date histogram](/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number.
 The buckets field is optional, and will default to 10 buckets if not specified.
@@ -55,7 +55,7 @@ POST /sales/_search?size=0
 }
 ```
-1. Supports expressive date [format pattern](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
+1. Supports expressive date [format pattern](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
 Response:

View file

@@ -10,7 +10,7 @@ mapped_pages:
 A multi-bucket aggregation that groups semi-structured text into buckets. Each `text` field is re-analyzed using a custom analyzer. The resulting tokens are then categorized creating buckets of similarly formatted text values. This aggregation works best with machine generated text like system logs. Only the first 100 analyzed tokens are used to categorize the text.
 ::::{note}
-If you have considerable memory allocated to your JVM but are receiving circuit breaker exceptions from this aggregation, you may be attempting to categorize text that is poorly formatted for categorization. Consider adding `categorization_filters` or running under [sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-sampler-aggregation.md), [diversified sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md), or [random sampler](/reference/data-analysis/aggregations/search-aggregations-random-sampler-aggregation.md) to explore the created categories.
+If you have considerable memory allocated to your JVM but are receiving circuit breaker exceptions from this aggregation, you may be attempting to categorize text that is poorly formatted for categorization. Consider adding `categorization_filters` or running under [sampler](/reference/aggregations/search-aggregations-bucket-sampler-aggregation.md), [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md), or [random sampler](/reference/aggregations/search-aggregations-random-sampler-aggregation.md) to explore the created categories.
 ::::
@@ -24,14 +24,14 @@ The algorithm used for categorization was completely changed in version 8.3.0. A
 `categorization_analyzer`
 : (Optional, object or string) The categorization analyzer specifies how the text is analyzed and tokenized before being categorized. The syntax is very similar to that used to define the `analyzer` in the [Analyze endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-analyze). This property cannot be used at the same time as `categorization_filters`.
-The `categorization_analyzer` field can be specified either as a string or as an object. If it is a string it must refer to a [built-in analyzer](/reference/data-analysis/text-analysis/analyzer-reference.md) or one added by another plugin. If it is an object it has the following properties:
+The `categorization_analyzer` field can be specified either as a string or as an object. If it is a string it must refer to a [built-in analyzer](/reference/text-analysis/analyzer-reference.md) or one added by another plugin. If it is an object it has the following properties:
 :::::{dropdown} Properties of `categorization_analyzer`
 `char_filter`
-: (array of strings or objects) One or more [character filters](/reference/data-analysis/text-analysis/character-filter-reference.md). In addition to the built-in character filters, other plugins can provide more character filters. This property is optional. If it is not specified, no character filters are applied prior to categorization. If you are customizing some other aspect of the analyzer and you need to achieve the equivalent of `categorization_filters` (which are not permitted when some other aspect of the analyzer is customized), add them here as [pattern replace character filters](/reference/data-analysis/text-analysis/analysis-pattern-replace-charfilter.md).
+: (array of strings or objects) One or more [character filters](/reference/text-analysis/character-filter-reference.md). In addition to the built-in character filters, other plugins can provide more character filters. This property is optional. If it is not specified, no character filters are applied prior to categorization. If you are customizing some other aspect of the analyzer and you need to achieve the equivalent of `categorization_filters` (which are not permitted when some other aspect of the analyzer is customized), add them here as [pattern replace character filters](/reference/text-analysis/analysis-pattern-replace-charfilter.md).
 `tokenizer`
-: (string or object) The name or definition of the [tokenizer](/reference/data-analysis/text-analysis/tokenizer-reference.md) to use after character filters are applied. This property is compulsory if `categorization_analyzer` is specified as an object. Machine learning provides a tokenizer called `ml_standard` that tokenizes in a way that has been determined to produce good categorization results on a variety of log file formats for logs in English. If you want to use that tokenizer but change the character or token filters, specify `"tokenizer": "ml_standard"` in your `categorization_analyzer`. Additionally, the `ml_classic` tokenizer is available, which tokenizes in the same way as the non-customizable tokenizer in old versions of the product (before 6.2). `ml_classic` was the default categorization tokenizer in versions 6.2 to 7.13, so if you need categorization identical to the default for jobs created in these versions, specify `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
+: (string or object) The name or definition of the [tokenizer](/reference/text-analysis/tokenizer-reference.md) to use after character filters are applied. This property is compulsory if `categorization_analyzer` is specified as an object. Machine learning provides a tokenizer called `ml_standard` that tokenizes in a way that has been determined to produce good categorization results on a variety of log file formats for logs in English. If you want to use that tokenizer but change the character or token filters, specify `"tokenizer": "ml_standard"` in your `categorization_analyzer`. Additionally, the `ml_classic` tokenizer is available, which tokenizes in the same way as the non-customizable tokenizer in old versions of the product (before 6.2). `ml_classic` was the default categorization tokenizer in versions 6.2 to 7.13, so if you need categorization identical to the default for jobs created in these versions, specify `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
 ::::{note}
 From {{es}} 8.10.0, a new version number is used to track the configuration and state changes in the {{ml}} plugin. This new version number is decoupled from the product version and will increment independently.
@@ -39,7 +39,7 @@ The algorithm used for categorization was completely changed in version 8.3.0. A
 `filter`
-: (array of strings or objects) One or more [token filters](/reference/data-analysis/text-analysis/token-filter-reference.md). In addition to the built-in token filters, other plugins can provide more token filters. This property is optional. If it is not specified, no token filters are applied prior to categorization.
+: (array of strings or objects) One or more [token filters](/reference/text-analysis/token-filter-reference.md). In addition to the built-in token filters, other plugins can provide more token filters. This property is optional. If it is not specified, no token filters are applied prior to categorization.
 :::::
@@ -90,7 +90,7 @@ The algorithm used for categorization was completely changed in version 8.3.0. A
 ## Basic use [_basic_use]
 ::::{warning}
-Re-analyzing *large* result sets will require a lot of time and memory. This aggregation should be used in conjunction with [Async search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit). Additionally, you may consider using the aggregation as a child of either the [sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation. This will typically improve speed and memory use.
+Re-analyzing *large* result sets will require a lot of time and memory. This aggregation should be used in conjunction with [Async search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit). Additionally, you may consider using the aggregation as a child of either the [sampler](/reference/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation. This will typically improve speed and memory use.
 ::::

View file

@@ -237,7 +237,7 @@ GET /_search
 }
 ```
-1. Supports expressive date [format pattern](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
+1. Supports expressive date [format pattern](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
 **Time Zone**

View file

@@ -12,7 +12,7 @@ A sibling pipeline aggregation which executes a correlation function on the conf
 ## Parameters [bucket-correlation-agg-syntax]
 `buckets_path`
-: (Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax).
+: (Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).
 `function`
 : (Required, object) The correlation function to execute.
@@ -76,7 +76,7 @@ A `bucket_correlation` aggregation looks like this in isolation:
 ## Example [bucket-correlation-agg-example]
-The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
+The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
 This example is only using the 10s percentiles.

View file

@@ -12,7 +12,7 @@ A sibling pipeline aggregation which executes a two sample Kolmogorov-Smirnov
 ## Parameters [bucket-count-ks-test-agg-syntax]
 `buckets_path`
-: (Required, string) Path to the buckets that contain one set of values to correlate. Must be a `_count` path For syntax, see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax).
+: (Required, string) Path to the buckets that contain one set of values to correlate. Must be a `_count` path For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).
 `alternative`
 : (Optional, list) A list of string values indicating which K-S test alternative to calculate. The valid values are: "greater", "less", "two_sided". This parameter is key for determining the K-S statistic used when calculating the K-S test. Default value is all possible alternative hypotheses.
@@ -46,7 +46,7 @@ A `bucket_count_ks_test` aggregation looks like this in isolation:
 ## Example [bucket-count-ks-test-agg-example]
-The following snippet runs the `bucket_count_ks_test` on the individual terms in the field `version` against a uniform distribution. The uniform distribution reflects the `latency` percentile buckets. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
+The following snippet runs the `bucket_count_ks_test` on the individual terms in the field `version` against a uniform distribution. The uniform distribution reflects the `latency` percentile buckets. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
 This example is only using the deciles of `latency`.


@@ -7,7 +7,7 @@ mapped_pages:
 # Date histogram aggregation [search-aggregations-bucket-datehistogram-aggregation]
-This multi-bucket aggregation is similar to the normal [histogram](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md), but it can only be used with date or date range values. Because dates are represented internally in Elasticsearch as long values, it is possible, but not as accurate, to use the normal `histogram` on dates as well. The main difference in the two APIs is that here the interval can be specified using date/time expressions. Time-based data requires special support because time-based intervals are not always a fixed length.
+This multi-bucket aggregation is similar to the normal [histogram](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md), but it can only be used with date or date range values. Because dates are represented internally in Elasticsearch as long values, it is possible, but not as accurate, to use the normal `histogram` on dates as well. The main difference in the two APIs is that here the interval can be specified using date/time expressions. Time-based data requires special support because time-based intervals are not always a fixed length.
 Like the histogram, values are rounded **down** into the closest bucket. For example, if the interval is a calendar day, `2020-01-03T07:00:01Z` is rounded to `2020-01-03T00:00:00Z`. Values are rounded as follows:
@@ -236,7 +236,7 @@ POST /sales/_search?size=0
 }
 ```
-1. Supports expressive date [format pattern](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
+1. Supports expressive date [format pattern](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
 Response:
@@ -600,7 +600,7 @@ POST /sales/_search?size=0
 ## Parameters [date-histogram-params]
-You can control the order of the returned buckets using the `order` settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds` setting, which enables extending the bounds of the histogram beyond the data itself, and `hard_bounds` that limits the histogram to specified bounds. For more information, see [`Extended Bounds`](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-extended-bounds) and [`Hard Bounds`](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-hard-bounds).
+You can control the order of the returned buckets using the `order` settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds` setting, which enables extending the bounds of the histogram beyond the data itself, and `hard_bounds` that limits the histogram to specified bounds. For more information, see [`Extended Bounds`](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-extended-bounds) and [`Hard Bounds`](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-hard-bounds).
 ### Missing value [date-histogram-missing-value]
@@ -629,7 +629,7 @@ POST /sales/_search?size=0
 ### Order [date-histogram-order]
-By default the returned buckets are sorted by their `key` ascending, but you can control the order using the `order` setting. This setting supports the same `order` functionality as [`Terms Aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).
+By default the returned buckets are sorted by their `key` ascending, but you can control the order using the `order` setting. This setting supports the same `order` functionality as [`Terms Aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).
 ### Using a script to aggregate by day of the week [date-histogram-aggregate-scripts]


@@ -7,7 +7,7 @@ mapped_pages:
 # Date range aggregation [search-aggregations-bucket-daterange-aggregation]
-A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal [range](/reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation is that the `from` and `to` values can be expressed in [Date Math](/reference/elasticsearch/rest-apis/common-options.md#date-math) expressions, and it is also possible to specify a date format by which the `from` and `to` response fields will be returned. Note that this aggregation includes the `from` value and excludes the `to` value for each range.
+A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal [range](/reference/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation is that the `from` and `to` values can be expressed in [Date Math](/reference/elasticsearch/rest-apis/common-options.md#date-math) expressions, and it is also possible to specify a date format by which the `from` and `to` response fields will be returned. Note that this aggregation includes the `from` value and excludes the `to` value for each range.
 Example:


@@ -83,7 +83,7 @@ POST /sales/_search?size=0&filter_path=aggregations
 ## Use the `filters` aggregation for multiple filters [use-filters-agg-for-multiple-filters]
-To group documents using multiple filters, use the [`filters` aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-filters-aggregation.md). This is faster than multiple `filter` aggregations.
+To group documents using multiple filters, use the [`filters` aggregation](/reference/aggregations/search-aggregations-bucket-filters-aggregation.md). This is faster than multiple `filter` aggregations.
 For example, use this:


@@ -178,7 +178,7 @@ The response would be something like the following:
 ## Non-keyed Response [non-keyed-response]
-By default, the named filters aggregation returns the buckets as an object. But in some sorting cases, such as [bucket sort](/reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-sort-aggregation.md), the JSON doesn't guarantee the order of elements in the object. You can use the `keyed` parameter to specify the buckets as an array of objects. The value of this parameter can be as follows:
+By default, the named filters aggregation returns the buckets as an object. But in some sorting cases, such as [bucket sort](/reference/aggregations/search-aggregations-pipeline-bucket-sort-aggregation.md), the JSON doesn't guarantee the order of elements in the object. You can use the `keyed` parameter to specify the buckets as an array of objects. The value of this parameter can be as follows:
 `true`
 : (Default) Returns the buckets as an object


@@ -7,7 +7,7 @@ mapped_pages:
 # Geo-distance aggregation [search-aggregations-bucket-geodistance-aggregation]
-A multi-bucket aggregation that works on `geo_point` fields and conceptually works very similar to the [range](/reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket).
+A multi-bucket aggregation that works on `geo_point` fields and conceptually works very similar to the [range](/reference/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket).
 $$$geodistance-aggregation-example$$$


@@ -286,7 +286,7 @@ The table below shows the metric dimensions for cells covered by various string
 Aggregating on [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md) fields works just as it does for points, except that a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:
-![geoshape grid](../../../images/geoshape_grid.png "")
+![geoshape grid](images/geoshape_grid.png "")
 ## Options [_options_3]


@@ -204,9 +204,9 @@ Response:
 Aggregating on [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md) fields works almost as it does for points. There are two key differences:
 * When aggregating over `geo_point` data, points are considered within a hexagonal tile if they lie within the edges defined by great circles. In other words the calculation is done using spherical coordinates. However, when aggregating over `geo_shape` data, the shapes are considered within a hexagon if they lie within the edges defined as straight lines on an equirectangular projection. The reason is that Elasticsearch and Lucene treat edges using the equirectangular projection at index and search time. In order to ensure that search results and aggregation results are aligned, we therefore also use equirectangular projection in aggregations. For most data, the difference is subtle or not noticed. However, for low zoom levels (low precision), especially far from the equator, this can be noticeable. For example, if the same point data is indexed as `geo_point` and `geo_shape`, it is possible to get different results when aggregating at lower resolutions.
-* As is the case with [`geotile_grid`](/reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md#geotilegrid-aggregating-geo-shape), a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:
-![geoshape hexgrid](../../../images/geoshape_hexgrid.png "")
+* As is the case with [`geotile_grid`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md#geotilegrid-aggregating-geo-shape), a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:
+![geoshape hexgrid](images/geoshape_hexgrid.png "")
 ## Options [_options_4]


@@ -208,7 +208,7 @@ Response:
 Aggregating on [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md) fields works almost as it does for points, except that a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:
-![geoshape grid](../../../images/geoshape_grid.png "")
+![geoshape grid](images/geoshape_grid.png "")
 ## Options [_options_5]


@@ -156,7 +156,7 @@ POST /sales/_search?size=0
 }
 ```
-When aggregating ranges, buckets are based on the values of the returned documents. This means the response may include buckets outside of a query's range. For example, if your query looks for values greater than 100, and you have a range covering 50 to 150, and an interval of 50, that document will land in 3 buckets - 50, 100, and 150. In general, it's best to think of the query and aggregation steps as independent - the query selects a set of documents, and then the aggregation buckets those documents without regard to how they were selected. See [note on bucketing range fields](/reference/data-analysis/aggregations/search-aggregations-bucket-range-field-note.md) for more information and an example.
+When aggregating ranges, buckets are based on the values of the returned documents. This means the response may include buckets outside of a query's range. For example, if your query looks for values greater than 100, and you have a range covering 50 to 150, and an interval of 50, that document will land in 3 buckets - 50, 100, and 150. In general, it's best to think of the query and aggregation steps as independent - the query selects a set of documents, and then the aggregation buckets those documents without regard to how they were selected. See [note on bucketing range fields](/reference/aggregations/search-aggregations-bucket-range-field-note.md) for more information and an example.
 $$$search-aggregations-bucket-histogram-aggregation-hard-bounds$$$
 The `hard_bounds` is a counterpart of `extended_bounds` and can limit the range of buckets in the histogram. It is particularly useful in the case of open [data ranges](/reference/elasticsearch/mapping-reference/range.md) that can result in a very large number of buckets.
@@ -191,7 +191,7 @@ In this example even though the range specified in the query is up to 500, the h
 ## Order [_order_2]
-By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using the `order` setting. Supports the same `order` functionality as the [`Terms Aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).
+By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using the `order` setting. Supports the same `order` functionality as the [`Terms Aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).
 ## Offset [_offset]


@@ -7,7 +7,7 @@ mapped_pages:
 # IP range aggregation [search-aggregations-bucket-iprange-aggregation]
-Just like the dedicated [date](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md) range aggregation, there is also a dedicated range aggregation for IP typed fields:
+Just like the dedicated [date](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md) range aggregation, there is also a dedicated range aggregation for IP typed fields:
 Example:


@@ -7,9 +7,9 @@ mapped_pages:
 # Multi Terms aggregation [search-aggregations-bucket-multi-terms-aggregation]
-A multi-bucket value source based aggregation where buckets are dynamically built - one per unique set of values. The multi terms aggregation is very similar to the [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), however in most cases it will be slower than the terms aggregation and will consume more memory. Therefore, if the same set of fields is constantly used, it would be more efficient to index a combined key for this fields as a separate field and use the terms aggregation on this field.
-The multi_term aggregations are the most useful when you need to sort by a number of document or a metric aggregation on a composite key and get top N results. If sorting is not required and all values are expected to be retrieved using nested terms aggregation or [`composite aggregations`](/reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md) will be a faster and more memory efficient solution.
+A multi-bucket value source based aggregation where buckets are dynamically built - one per unique set of values. The multi terms aggregation is very similar to the [`terms aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), however in most cases it will be slower than the terms aggregation and will consume more memory. Therefore, if the same set of fields is constantly used, it would be more efficient to index a combined key for this fields as a separate field and use the terms aggregation on this field.
+The multi_term aggregations are the most useful when you need to sort by a number of document or a metric aggregation on a composite key and get top N results. If sorting is not required and all values are expected to be retrieved using nested terms aggregation or [`composite aggregations`](/reference/aggregations/search-aggregations-bucket-composite-aggregation.md) will be a faster and more memory efficient solution.
 Example:
@@ -32,7 +32,7 @@ GET /products/_search
 }
 ```
-1. `multi_terms` aggregation can work with the same field types as a [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and supports most of the terms aggregation parameters.
+1. `multi_terms` aggregation can work with the same field types as a [`terms aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and supports most of the terms aggregation parameters.
 Response:
@@ -93,7 +93,7 @@ By default, the `multi_terms` aggregation will return the buckets for the top te
 ## Aggregation Parameters [search-aggregations-bucket-multi-terms-aggregation-parameters]
-The following parameters are supported. See [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) for more detailed explanation of these parameters.
+The following parameters are supported. See [`terms aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) for more detailed explanation of these parameters.
 size
 : Optional. Defines how many term buckets should be returned out of the overall terms list. Defaults to 10.


@@ -104,7 +104,7 @@ Response:
 }
 ```
-You can use a [`filter`](/reference/data-analysis/aggregations/search-aggregations-bucket-filter-aggregation.md) sub-aggregation to return results for a specific reseller.
+You can use a [`filter`](/reference/aggregations/search-aggregations-bucket-filter-aggregation.md) sub-aggregation to return results for a specific reseller.
 ```console
 GET /products/_search?size=0


@@ -7,7 +7,7 @@ mapped_pages:
 # Rare terms aggregation [search-aggregations-bucket-rare-terms-aggregation]
-A multi-bucket value source based aggregation which finds "rare" terms - terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation
+A multi-bucket value source based aggregation which finds "rare" terms - terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation
 ## Syntax [_syntax_3]
@@ -152,21 +152,21 @@ The X-axis shows the number of distinct values the aggregation has seen, and the
 This first chart shows precision `0.01`:
-![accuracy 01](../../../images/accuracy_01.png "")
+![accuracy 01](images/accuracy_01.png "")
 And precision `0.001` (the default):
-![accuracy 001](../../../images/accuracy_001.png "")
+![accuracy 001](images/accuracy_001.png "")
 And finally `precision 0.0001`:
-![accuracy 0001](../../../images/accuracy_0001.png "")
+![accuracy 0001](images/accuracy_0001.png "")
 The default precision of `0.001` maintains an accuracy of < 2.5% for the tested conditions, and accuracy slowly degrades in a controlled, linear fashion as the number of distinct values increases.
 The default precision of `0.001` has a memory profile of `1.748⁻⁶ * n` bytes, where `n` is the number of distinct values the aggregation has seen (it can also be roughly eyeballed, e.g. 20 million unique values is about 30mb of memory). The memory usage is linear to the number of distinct values regardless of which precision is chosen, the precision only affects the slope of the memory profile as seen in this chart:
-![memory](../../../images/memory.png "")
+![memory](images/memory.png "")
 For comparison, an equivalent terms aggregation at 20 million buckets would be roughly `20m * 69b == ~1.38gb` (with 69 bytes being a very optimistic estimate of an empty bucket cost, far lower than what the circuit breaker accounts for). So although the `rare_terms` agg is relatively heavy, it is still orders of magnitude smaller than the equivalent terms aggregation


@@ -524,13 +524,13 @@ Use of background filters will slow the query as each term's postings must be
 ### Filtering Values [_filtering_values_2]
-It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and `exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features described in the [terms aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) documentation.
+It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and `exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features described in the [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) documentation.
 ## Collect mode [_collect_mode]
-To avoid memory issues, the `significant_terms` aggregation always computes child aggregations in `breadth_first` mode. A description of the different collection modes can be found in the [terms aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-collect) documentation.
+To avoid memory issues, the `significant_terms` aggregation always computes child aggregations in `breadth_first` mode. A description of the different collection modes can be found in the [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-collect) documentation.
 ## Execution hint [_execution_hint_2]


@ -7,14 +7,14 @@ mapped_pages:
# Significant text aggregation [search-aggregations-bucket-significanttext-aggregation] # Significant text aggregation [search-aggregations-bucket-significanttext-aggregation]
An aggregation that returns interesting or unusual occurrences of free-text terms in a set. It is like the [significant terms](/reference/data-analysis/aggregations/search-aggregations-bucket-significantterms-aggregation.md) aggregation but differs in that: An aggregation that returns interesting or unusual occurrences of free-text terms in a set. It is like the [significant terms](/reference/aggregations/search-aggregations-bucket-significantterms-aggregation.md) aggregation but differs in that:
* It is specifically designed for use on type `text` fields * It is specifically designed for use on type `text` fields
* It does not require field data or doc-values * It does not require field data or doc-values
* It re-analyzes text content on-the-fly meaning it can also filter duplicate sections of noisy text that otherwise tend to skew statistics. * It re-analyzes text content on-the-fly meaning it can also filter duplicate sections of noisy text that otherwise tend to skew statistics.
::::{warning} ::::{warning}
Re-analyzing *large* result sets will require a lot of time and memory. It is recommended that the significant_text aggregation is used as a child of either the [sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation to limit the analysis to a *small* selection of top-matching documents e.g. 200. This will typically improve speed, memory use and quality of results. Re-analyzing *large* result sets will require a lot of time and memory. It is recommended that the significant_text aggregation is used as a child of either the [sampler](/reference/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation to limit the analysis to a *small* selection of top-matching documents e.g. 200. This will typically improve speed, memory use and quality of results.
:::: ::::
@@ -257,7 +257,7 @@ The results from analysing our deduplicated text are obviously of higher quality

Mr Pozmantier and other one-off associations with elasticsearch no longer appear in the aggregation results as a consequence of copy-and-paste operations or other forms of mechanical repetition.

If your duplicate or near-duplicate content is identifiable via a single-value indexed field (perhaps a hash of the article's `title` text or an `original_press_release_url` field) then it would be more efficient to use a parent [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation to eliminate these documents from the sample set based on that single key. The less duplicate content you can feed into the significant_text aggregation up front, the better in terms of performance.
::::{admonition} How are the significance scores calculated?
The numbers returned for scores are primarily intended for ranking different suggestions sensibly rather than something easily understood by end users. The scores are derived from the doc frequencies in the *foreground* and *background* sets. In brief, a term is considered significant if there is a noticeable difference between the frequency with which it appears in the subset and in the background. The way the terms are ranked can be configured; see the "Parameters" section.
@@ -306,7 +306,7 @@ Like most design decisions, this is the basis of a trade-off in which we have ch

### Significance heuristics [_significance_heuristics]

This aggregation supports the same scoring heuristics (JLH, mutual_information, gnd, chi_square, etc.) as the [significant terms](/reference/aggregations/search-aggregations-bucket-significantterms-aggregation.md) aggregation.

### Size & Shard Size [sig-text-shard-size]
@@ -403,7 +403,7 @@ GET news/_search

### Filtering Values [_filtering_values_3]

It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and `exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features described in the [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) documentation.
@@ -69,7 +69,7 @@ By default, you cannot run a `terms` aggregation on a `text` field. Use a `keywo

By default, the `terms` aggregation returns the top ten terms with the most documents. Use the `size` parameter to return more terms, up to the [search.max_buckets](/reference/elasticsearch/configuration-reference/search-settings.md#search-settings-max-buckets) limit.

If your data contains 100 or 1000 unique terms, you can increase the `size` of the `terms` aggregation to return them all. If you have more unique terms and you need them all, use the [composite aggregation](/reference/aggregations/search-aggregations-bucket-composite-aggregation.md) instead.

Larger values of `size` use more memory to compute and push the whole aggregation closer to the `max_buckets` limit. You'll know you've gone too large if the request fails with a message about `max_buckets`.
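
A sketch of raising `size` to collect every term of a low-cardinality field (the `genre` keyword field name is hypothetical):

```console
GET /_search
{
  "size": 0,
  "aggs": {
    "all_genres": {
      "terms": {
        "field": "genre",
        "size": 1000
      }
    }
  }
}
```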
@@ -133,7 +133,7 @@ By default, the `terms` aggregation orders terms by descending document `_count`

You can use the `order` parameter to specify a different sort order, but we don't recommend it. It is extremely easy to create a terms ordering that will just return wrong results, and it is not obvious when you have done so. Change this only with caution.

::::{warning}
Especially avoid using `"order": { "_count": "asc" }`. If you need to find rare terms, use the [`rare_terms`](/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md) aggregation instead. Due to the way the `terms` aggregation [gets terms from shards](#search-aggregations-bucket-terms-aggregation-shard-size), sorting by ascending doc count often produces inaccurate results.
::::
@@ -216,7 +216,7 @@ GET /_search

::::{admonition} Pipeline aggs cannot be used for sorting
:class: note

[Pipeline aggregations](/reference/aggregations/pipeline.md) are run during the reduce phase after all other aggregations have already completed. For this reason, they cannot be used for ordering.
::::
@@ -548,7 +548,7 @@ There are three approaches that you can use to perform a `terms` agg across mult

[`copy_to` field](/reference/elasticsearch/mapping-reference/copy-to.md)
: If you know ahead of time that you want to collect the terms from two or more fields, then use `copy_to` in your mapping to create a new dedicated field at index time which contains the values from both fields. You can aggregate on this single field, which will benefit from the global ordinals optimization.

[`multi_terms` aggregation](/reference/aggregations/search-aggregations-bucket-multi-terms-aggregation.md)
: Use multi_terms aggregation to combine terms from multiple fields into a compound key. This also disables the global ordinals and will be slower than collecting terms from a single field. It is faster but less flexible than using a script.
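
A sketch of the `multi_terms` approach (the `genre` and `product` field names are illustrative):

```console
GET /_search
{
  "size": 0,
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          { "field": "genre" },
          { "field": "product" }
        ]
      }
    }
  }
}
```

Each bucket key is the compound of one value from each listed field.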
@@ -7,7 +7,7 @@ mapped_pages:

# Variable width histogram aggregation [search-aggregations-bucket-variablewidthhistogram-aggregation]

This is a multi-bucket aggregation similar to [Histogram](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md). However, the width of each bucket is not specified. Rather, a target number of buckets is provided and bucket intervals are dynamically determined based on the document distribution. This is done using a simple one-pass document clustering algorithm that aims to obtain low distances between bucket centroids. Unlike other multi-bucket aggregations, the intervals will not necessarily have a uniform width.

::::{tip}
The number of buckets returned will always be less than or equal to the target number.
@@ -22,7 +22,7 @@ It is recommended to use the change point aggregation to detect changes in time-

## Parameters [change-point-agg-syntax]

`buckets_path`
: (Required, string) Path to the buckets that contain one set of values in which to detect a change point. There must be at least 22 bucketed values. Fewer than 1,000 is preferred. For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).

## Syntax [_syntax_11]
@@ -56,7 +56,7 @@ POST /museums/_search?size=0

::::{note}
Unlike the case with the [`geo_bounds`](/reference/aggregations/search-aggregations-metrics-geobounds-aggregation.md#geobounds-aggregation-geo-shape) aggregation, there is no option to set [`wrap_longitude`](/reference/aggregations/search-aggregations-metrics-geobounds-aggregation.md#geo-bounds-wrap-longitude). This is because the cartesian space is Euclidean and does not wrap back on itself, so the bounds will always have a minimum x value less than or equal to the maximum x value.
::::
@@ -91,7 +91,7 @@ POST /museums/_search?size=0
  }
}
```

The above example uses `cartesian_centroid` as a sub-aggregation to a [terms](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) bucket aggregation for finding the central location for museums in each city.

The response for the above aggregation:
@@ -145,7 +145,7 @@ The response for the above aggregation:

## Cartesian Centroid Aggregation on `shape` fields [cartesian-centroid-aggregation-geo-shape]

The centroid metric for shapes is more nuanced than for points. The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket. For example, if a bucket contains shapes consisting of polygons and lines, then the lines do not contribute to the centroid metric. Each type of shape's centroid is calculated differently. Envelopes and circles ingested via the [Circle](/reference/enrich-processor/ingest-circle-processor.md) are treated as polygons.

| Geometry Type | Centroid Calculation |
| --- | --- |
@@ -9,7 +9,7 @@ mapped_pages:

A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents.

The `extended_stats` aggregation is an extended version of the [`stats`](/reference/aggregations/search-aggregations-metrics-stats-aggregation.md) aggregation, where additional metrics are added such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.

Assuming the data consists of documents representing exam grades (between 0 and 100) of students
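
A request of this shape would compute the extended statistics (assuming an `exams` index with a numeric `grade` field):

```console
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": { "field": "grade" }
    }
  }
}
```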
@@ -77,7 +77,7 @@ The resulting [GeoJSON Feature](https://tools.ietf.org/html/rfc7946#section-3.2)

This result could be displayed in a map user interface:

![Kibana map with museum tour of Amsterdam](images/geo_line.png "")

## Options [search-aggregations-metrics-geo-line-options]
@@ -183,7 +183,7 @@ POST /tour/_bulk?refresh

## Grouping with terms [search-aggregations-metrics-geo-line-grouping-terms]

Using this data, for a non-time-series use case, the grouping can be done using a [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) based on city name. This would work whether or not we had defined the `tour` index as a time series index.

$$$search-aggregations-metrics-geo-line-terms$$$
@@ -273,7 +273,7 @@ This functionality is in technical preview and may be changed or removed in a fu
::::

Using the same data as before, we can also perform the grouping with a [`time_series` aggregation](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md). This will group by TSID, which is defined as the combination of all fields with `time_series_dimension: true`, in this case the same `city` field used in the previous [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md). This example will only work if we defined the `tour` index as a time series index using `index.mode="time_series"`.

$$$search-aggregations-metrics-geo-line-time-series$$$
@@ -296,7 +296,7 @@ POST /tour/_search?filter_path=aggregations
```

::::{note}
The `geo_line` aggregation no longer requires the `sort` field when nested within a [`time_series` aggregation](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md). This is because the sort field defaults to `@timestamp`, which all time-series indexes are pre-sorted by. If you do set this parameter to something other than `@timestamp`, you will get an error.
::::
@@ -366,7 +366,7 @@ These results are essentially the same as with the previous `terms` aggregation

## Why group with time-series? [search-aggregations-metrics-geo-line-grouping-time-series-advantages]

When reviewing these examples, you might think that there is little difference between using [`terms`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) or [`time_series`](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md) to group the geo-lines. However, there are some important differences in behaviour between the two cases. Time series indexes are stored in a very specific order on disk. They are pre-grouped by the time-series dimension fields, and pre-sorted by the `@timestamp` field. This allows the `geo_line` aggregation to be considerably optimized:

* The same memory allocated for the first bucket can be re-used over and over for all subsequent buckets. This is substantially less memory than required for non-time-series cases, where all buckets are collected concurrently.
* No sorting needs to be done, since the data is pre-sorted by `@timestamp`. The time-series data will naturally arrive at the aggregation collector in `DESC` order. This means that if we specify `sort_order:ASC` (the default), we still collect in `DESC` order, but perform an efficient in-memory reverse before generating the final `LineString` geometry.
@@ -377,19 +377,19 @@ Note: There are other significant advantages to working with time-series data an

## Streaming line simplification [search-aggregations-metrics-geo-line-simplification]

Line simplification is a great way to reduce the size of the final results sent to the client and displayed in a map user interface. However, these algorithms normally use a lot of memory to perform the simplification, requiring the entire geometry to be maintained in memory together with supporting data for the simplification itself. The use of a streaming line simplification algorithm allows for minimal memory usage during the simplification process, by constraining memory to the bounds defined for the simplified geometry. This is only possible if no sorting is required, which is the case when grouping is done by the [`time_series` aggregation](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md) running on an index with the `time_series` index mode.

Under these conditions the `geo_line` aggregation allocates memory for the `size` specified, and then fills that memory with the incoming documents. Once the memory is completely filled, documents within the line are removed as new documents are added. The choice of document to remove is made to minimize the visual impact on the geometry. This process makes use of the [Visvalingam–Whyatt algorithm](https://en.wikipedia.org/wiki/Visvalingam%E2%80%93Whyatt_algorithm). Essentially, points are removed if they have the minimum triangle area, with the triangle defined by the point under consideration and the two points before and after it in the line. In addition, we calculate the area using spherical coordinates so that no planar distortions affect the choice.

To demonstrate how much better line simplification is than line truncation, consider this example of the north shore of Kodiak Island. The data for this is only 209 points, but if we want to set `size` to `100` we get dramatic truncation.

![North shore of Kodiak Island truncated to 100 points](images/kodiak_geo_line_truncated.png "")

The grey line is the entire geometry of 209 points, while the blue line is the first 100 points, a very different geometry from the original.

Now consider the same geometry simplified to 100 points.

![North shore of Kodiak Island simplified to 100 points](images/kodiak_geo_line_simplified.png "")

For comparison we have shown the original in grey, the truncated line in blue, and the new simplified geometry in magenta. It is possible to see where the simplified line deviates from the original, but the overall geometry appears almost identical and is still clearly recognizable as the north shore of Kodiak Island.
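
A sketch of requesting a simplified line of at most 100 points, grouped by `time_series` (the `location` geo_point field name is an assumption; the `tour` index must use `index.mode="time_series"` as described above):

```console
POST /tour/_search?filter_path=aggregations
{
  "aggs": {
    "path": {
      "time_series": {},
      "aggs": {
        "museum_tour": {
          "geo_line": {
            "point": { "field": "location" },
            "size": 100
          }
        }
      }
    }
  }
}
```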
@@ -91,7 +91,7 @@ POST /museums/_search?size=0
  }
}
```

The above example uses `geo_centroid` as a sub-aggregation to a [terms](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) bucket aggregation for finding the central location for museums in each city.

The response for the above aggregation:
@@ -145,7 +145,7 @@ The response for the above aggregation:

## Geo Centroid Aggregation on `geo_shape` fields [geocentroid-aggregation-geo-shape]

The centroid metric for geoshapes is more nuanced than for points. The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket. For example, if a bucket contains shapes comprising polygons and lines, then the lines do not contribute to the centroid metric. Each type of shape's centroid is calculated differently. Envelopes and circles ingested via the [Circle](/reference/enrich-processor/ingest-circle-processor.md) are treated as polygons.

| Geometry Type | Centroid Calculation |
| --- | --- |
@@ -204,7 +204,7 @@ POST /places/_search?size=0

::::{admonition} Using `geo_centroid` as a sub-aggregation of `geohash_grid`
:class: warning

The [`geohash_grid`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md) aggregation places documents, not individual geopoints, into buckets. If a document's `geo_point` field contains [multiple values](/reference/elasticsearch/mapping-reference/array.md), the document could be assigned to multiple buckets, even if one or more of its geopoints are outside the bucket boundaries.

If a `geocentroid` sub-aggregation is also used, each centroid is calculated using all geopoints in a bucket, including those outside the bucket boundaries. This can result in centroids outside of bucket boundaries.
@@ -60,9 +60,9 @@ The resulting median absolute deviation of `2` tells us that there is a fair amo

## Approximation [_approximation]

The naive implementation of calculating median absolute deviation stores the entire sample in memory, so this aggregation instead calculates an approximation. It uses the [TDigest data structure](https://github.com/tdunning/t-digest) to approximate the sample median and the median of deviations from the sample median. For more about the approximation characteristics of TDigests, see [Percentiles are (usually) approximate](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation).

The tradeoff between resource usage and accuracy of a TDigest's quantile approximation, and therefore the accuracy of this aggregation's approximation of median absolute deviation, is controlled by the `compression` parameter. A higher `compression` setting provides a more accurate approximation at the cost of higher memory usage. For more about the characteristics of the TDigest `compression` parameter, see [Compression](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-compression).

```console
GET reviews/_search

View file

@@ -175,7 +175,7 @@ GET latency/_search
 ## Percentiles are (usually) approximate [search-aggregations-metrics-percentile-aggregation-approximation]
-:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
+:::{include} /reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
 :::
 ::::{warning}

View file

@@ -10,7 +10,7 @@ mapped_pages:
 A `multi-value` metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents. These values can be extracted from specific numeric or [histogram fields](/reference/elasticsearch/mapping-reference/histogram.md) in the documents.
 ::::{note}
-Please see [Percentiles are (usually) approximate](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation), [Compression](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-compression) and [Execution hint](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-execution-hint) for advice regarding approximation, performance and memory use of the percentile ranks aggregation
+Please see [Percentiles are (usually) approximate](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation), [Compression](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-compression) and [Execution hint](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-execution-hint) for advice regarding approximation, performance and memory use of the percentile ranks aggregation
 ::::

View file

@@ -375,7 +375,7 @@ By default `sum` mode is used.
 ## Relationship between bucket sizes and rate [_relationship_between_bucket_sizes_and_rate]
-The `rate` aggregation supports all rate that can be used [calendar_intervals parameter](/reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_intervals) of `date_histogram` aggregation. The specified rate should compatible with the `date_histogram` aggregation interval, i.e. it should be possible to convert the bucket size into the rate. By default the interval of the `date_histogram` is used.
+The `rate` aggregation supports all rate that can be used [calendar_intervals parameter](/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_intervals) of `date_histogram` aggregation. The specified rate should compatible with the `date_histogram` aggregation interval, i.e. it should be possible to convert the bucket size into the rate. By default the interval of the `date_histogram` is used.
 `"rate": "second"`
 : compatible with all intervals

View file

@@ -39,7 +39,7 @@ The top_hits aggregation returns regular search hits, because of this many per h
 * [Include Sequence Numbers and Primary Terms](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search)
 ::::{important}
-If you **only** need `docvalue_fields`, `size`, and `sort` then [Top metrics](/reference/data-analysis/aggregations/search-aggregations-metrics-top-metrics.md) might be a more efficient choice than the Top Hits Aggregation.
+If you **only** need `docvalue_fields`, `size`, and `sort` then [Top metrics](/reference/aggregations/search-aggregations-metrics-top-metrics.md) might be a more efficient choice than the Top Hits Aggregation.
 ::::

View file

@@ -44,7 +44,7 @@ Which returns:
 }
 ```
-`top_metrics` is fairly similar to [`top_hits`](/reference/data-analysis/aggregations/search-aggregations-metrics-top-hits-aggregation.md) in spirit but because it is more limited it is able to do its job using less memory and is often faster.
+`top_metrics` is fairly similar to [`top_hits`](/reference/aggregations/search-aggregations-metrics-top-hits-aggregation.md) in spirit but because it is more limited it is able to do its job using less memory and is often faster.
 ## `sort` [_sort]
@@ -268,7 +268,7 @@ If `size` is more than `1` the `top_metrics` aggregation can't be the **target
 ### Use with terms [search-aggregations-metrics-top-metrics-example-terms]
-This aggregation should be quite useful inside of [`terms`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation, to, say, find the last value reported by each server.
+This aggregation should be quite useful inside of [`terms`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation, to, say, find the last value reported by each server.
 $$$search-aggregations-metrics-top-metrics-terms$$$

View file

@@ -23,10 +23,10 @@ A sibling pipeline aggregation which calculates the mean value of a specified me
 ## Parameters [avg-bucket-params]
 `buckets_path`
-: (Required, string) Path to the buckets to average. For syntax, see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax).
+: (Required, string) Path to the buckets to average. For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).
 `gap_policy`
-: (Optional, string) Policy to apply when gaps are found in the data. For valid values, see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy). Defaults to `skip`.
+: (Optional, string) Policy to apply when gaps are found in the data. For valid values, see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy). Defaults to `skip`.
 `format`
 : (Optional, string) [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property.

View file

@@ -33,8 +33,8 @@ $$$bucket-script-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
 | `script` | The script to run for this aggregation. The script can be inline, file or indexed. (see [Scripting](docs-content://explore-analyze/scripting.md) for more details) | Required | |
-| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
 The following snippet calculates the ratio percentage of t-shirt sales compared to total sales each month:

View file

@@ -38,8 +38,8 @@ $$$bucket-selector-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
 | `script` | The script to run for this aggregation. The script can be inline, file or indexed. (see [Scripting](docs-content://explore-analyze/scripting.md) for more details) | Required | |
-| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 The following snippet only retains buckets where the total sales for the month is more than 200:

View file

@@ -42,7 +42,7 @@ $$$bucket-sort-params$$$
 | `sort` | The list of fields to sort on. See [`sort`](/reference/elasticsearch/rest-apis/sort-search-results.md) for more details. | Optional | |
 | `from` | Buckets in positions prior to the set value will be truncated. | Optional | `0` |
 | `size` | The number of buckets to return. Defaults to all buckets of the parent aggregation. | Optional | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 The following snippet returns the buckets corresponding to the 3 months with the highest total sales in descending order:

View file

@@ -27,7 +27,7 @@ $$$cumulative-cardinality-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
-| `buckets_path` | The path to the cardinality aggregation we wish to find the cumulative cardinality for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `buckets_path` | The path to the cardinality aggregation we wish to find the cumulative cardinality for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
 The following snippet calculates the cumulative cardinality of the total daily `users`:

View file

@@ -25,7 +25,7 @@ $$$cumulative-sum-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to find the cumulative sum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `buckets_path` | The path to the buckets we wish to find the cumulative sum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
 The following snippet calculates the cumulative sum of the total monthly `sales`:

View file

@@ -23,8 +23,8 @@ $$$derivative-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to find the derivative for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | The path to the buckets we wish to find the derivative for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |

View file

@@ -27,8 +27,8 @@ $$$extended-stats-bucket-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
 | `sigma` | The number of standard deviations above/below the mean to display | Optional | 2 |

View file

@@ -43,7 +43,7 @@ $$$inference-bucket-params$$$
 | --- | --- | --- | --- |
 | `model_id` | The ID or alias for the trained model. | Required | - |
 | `inference_config` | Contains the inference type and its options. There are two types: [`regression`](#inference-agg-regression-opt) and [`classification`](#inference-agg-classification-opt) | Optional | - |
-| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | - |
+| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | - |
 ## Configuration options for {{infer}} models [_configuration_options_for_infer_models]

View file

@@ -25,8 +25,8 @@ $$$max-bucket-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to find the maximum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | The path to the buckets we wish to find the maximum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
 The following snippet calculates the maximum of the total monthly `sales`:

View file

@@ -25,8 +25,8 @@ $$$min-bucket-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to find the minimum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | The path to the buckets we wish to find the minimum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
 The following snippet calculates the minimum of the total monthly `sales`:

View file

@@ -27,10 +27,10 @@ $$$moving-fn-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
-| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | |
+| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | |
 | `window` | The size of window to "slide" across the histogram. | Required | |
 | `script` | The script that should be executed on each window of data | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data. See [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy). | Optional | `skip` |
+| `gap_policy` | The policy to apply when gaps are found in the data. See [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy). | Optional | `skip` |
 | `shift` | [Shift](#shift-parameter) of window position. | Optional | 0 |
 `moving_fn` aggregations must be embedded inside of a `histogram` or `date_histogram` aggregation. They can be embedded like any other metric aggregation:
@@ -67,7 +67,7 @@ POST /_search
 3. Finally, we specify a `moving_fn` aggregation which uses "the_sum" metric as its input.
-Moving averages are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add numeric metrics, such as a `sum`, inside of that histogram. Finally, the `moving_fn` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`.
+Moving averages are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add numeric metrics, such as a `sum`, inside of that histogram. Finally, the `moving_fn` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`.
 An example response from the above aggregation may look like:

View file

@ -7,9 +7,9 @@ mapped_pages:
# Moving percentiles aggregation [search-aggregations-pipeline-moving-percentiles-aggregation] # Moving percentiles aggregation [search-aggregations-pipeline-moving-percentiles-aggregation]
Given an ordered series of [percentiles](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md), the Moving Percentile aggregation will slide a window across those percentiles and allow the user to compute the cumulative percentile. Given an ordered series of [percentiles](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md), the Moving Percentile aggregation will slide a window across those percentiles and allow the user to compute the cumulative percentile.
This is conceptually very similar to the [Moving Function](/reference/data-analysis/aggregations/search-aggregations-pipeline-movfn-aggregation.md) pipeline aggregation, except it works on the percentiles sketches instead of the actual buckets values. This is conceptually very similar to the [Moving Function](/reference/aggregations/search-aggregations-pipeline-movfn-aggregation.md) pipeline aggregation, except it works on the percentiles sketches instead of the actual buckets values.
## Syntax [_syntax_19] ## Syntax [_syntax_19]
@ -28,9 +28,9 @@ $$$moving-percentiles-params$$$
| Parameter Name | Description | Required | Default Value | | Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `buckets_path` | Path to the percentile of interest (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | | | `buckets_path` | Path to the percentile of interest (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | |
| `window` | The size of window to "slide" across the histogram. | Required | | | `window` | The size of window to "slide" across the histogram. | Required | |
| `shift` | [Shift](/reference/data-analysis/aggregations/search-aggregations-pipeline-movfn-aggregation.md#shift-parameter) of window position. | Optional | 0 | | `shift` | [Shift](/reference/aggregations/search-aggregations-pipeline-movfn-aggregation.md#shift-parameter) of window position. | Optional | 0 |
`moving_percentiles` aggregations must be embedded inside of a `histogram` or `date_histogram` aggregation. They can be embedded like any other metric aggregation:
@@ -68,7 +68,7 @@ POST /_search
3. Finally, we specify a `moving_percentiles` aggregation which uses "the_percentile" sketch as its input.
-Moving percentiles are built by first specifying a `histogram` or `date_histogram` over a field. You then add a percentile metric inside of that histogram. Finally, the `moving_percentiles` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at the percentiles aggregation inside of the histogram (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
+Moving percentiles are built by first specifying a `histogram` or `date_histogram` over a field. You then add a percentile metric inside of that histogram. Finally, the `moving_percentiles` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at the percentiles aggregation inside of the histogram (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
And the following may be the response:
@@ -132,7 +132,7 @@ And the following may be the response:
}
```
-The output format of the `moving_percentiles` aggregation is inherited from the format of the referenced [`percentiles`](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
+The output format of the `moving_percentiles` aggregation is inherited from the format of the referenced [`percentiles`](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
Moving percentiles pipeline aggregations always run with `skip` gap policy.
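The windowed construction described above can be sketched outside Elasticsearch. A minimal Python sketch, assuming a trailing window and nearest-rank percentile selection (the real aggregation merges TDigest sketches rather than sorting raw values):

```python
def moving_percentile(values, window, percent):
    """For each bucket, the given percentile over the trailing `window` values."""
    out = []
    for i in range(len(values)):
        chunk = sorted(values[max(0, i - window + 1):i + 1])
        # Nearest-rank selection; a simplification of the TDigest-based result.
        idx = round(percent / 100 * (len(chunk) - 1))
        out.append(chunk[idx])
    return out

print(moving_percentile([3, 1, 4, 1, 5], window=3, percent=100))  # [3, 3, 4, 4, 5]
```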
@@ -7,7 +7,7 @@ mapped_pages:
# Normalize aggregation [search-aggregations-pipeline-normalize-aggregation]
-A parent pipeline aggregation which calculates the specific normalized/rescaled value for a specific bucket value. Values that cannot be normalized will be skipped using the [skip gap policy](/reference/data-analysis/aggregations/pipeline.md#gap-policy).
+A parent pipeline aggregation which calculates the specific normalized/rescaled value for a specific bucket value. Values that cannot be normalized will be skipped using the [skip gap policy](/reference/aggregations/pipeline.md#gap-policy).
## Syntax [_syntax_20]
@@ -26,7 +26,7 @@ $$$normalize_pipeline-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to normalize (see [`buckets_path` syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `buckets_path` | The path to the buckets we wish to normalize (see [`buckets_path` syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `method` | The specific [method](#normalize_pipeline-method) to apply | Required | |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
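To illustrate what one of the normalization methods computes, here is a hedged Python sketch of min-max rescaling (assuming a `rescale_0_1`-style method; `None` marks a bucket that cannot be normalized and is skipped, per the gap policy above):

```python
def rescale_0_1(values):
    """Min-max rescaling: smallest present value maps to 0, largest to 1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    return [None if v is None
            else 0.0 if hi == lo          # degenerate: all values equal
            else (v - lo) / (hi - lo)
            for v in values]

print(rescale_0_1([10, 20, None, 30]))  # [0.0, 0.5, None, 1.0]
```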
@@ -25,8 +25,8 @@ $$$percentiles-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to find the percentiles for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `buckets_path` | The path to the buckets we wish to find the percentiles for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
| `percents` | The list of percentiles to calculate | Optional | `[ 1, 5, 25, 50, 75, 95, 99 ]` |
| `keyed` | Flag which returns the range as a hash instead of an array of key-value pairs | Optional | `true` |
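Conceptually, this sibling aggregation collects every bucket value found on `buckets_path` and computes percentiles over that set. A rough Python sketch (the nearest-rank selection rule here is an assumption, not the documented rounding behavior; gaps are skipped per the default policy):

```python
def percentiles_bucket(values, percents=(1, 5, 25, 50, 75, 95, 99)):
    """Keyed result: percentile -> nearest value in the sorted bucket values."""
    vals = sorted(v for v in values if v is not None)  # `skip` gap policy
    return {str(p): vals[round(p / 100 * (len(vals) - 1))] for p in percents}

print(percentiles_bucket([50, 30, 10, 20, 40], percents=(25, 50, 75)))
```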
@@ -15,7 +15,7 @@ Single periods are also useful for transforming data into a stationary series. I
By calculating the first-difference, we de-trend the data (e.g. remove a constant, linear trend). We can see that the data becomes a stationary series (e.g. the first difference is randomly distributed around zero, and doesn't seem to exhibit any pattern/behavior). The transformation reveals that the dataset is following a random walk; the value is the previous value +/- a random amount. This insight allows selection of further tools for analysis.
-:::{image} ../../../images/dow.png
+:::{image} images/dow.png
:alt: dow
:title: Dow Jones plotted and made stationary with first-differencing
:name: serialdiff_dow
@@ -25,7 +25,7 @@ Larger periods can be used to remove seasonal / cyclic behavior. In this example
The first-difference removes the constant trend, leaving just a sine wave. The 30th-difference is then applied to the first-difference to remove the cyclic behavior, leaving a stationary series which is amenable to other analysis.
-:::{image} ../../../images/lemmings.png
+:::{image} images/lemmings.png
:alt: lemmings
:title: Lemmings data plotted and made stationary with 1st and 30th difference
:name: serialdiff_lemmings
@@ -48,7 +48,7 @@ $$$serial-diff-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
-| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `lag` | The historical bucket to subtract from the current value. E.g. a lag of 7 will subtract the value 7 buckets ago from the current value. Must be a positive, non-zero integer | Optional | `1` |
| `gap_policy` | Determines what should happen when a gap in the data is encountered. | Optional | `insert_zeros` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
@@ -88,6 +88,6 @@ POST /_search
3. Finally, we specify a `serial_diff` aggregation which uses "the_sum" metric as its input.
-Serial differences are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add normal metrics, such as a `sum`, inside of that histogram. Finally, the `serial_diff` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
+Serial differences are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add normal metrics, such as a `sum`, inside of that histogram. Finally, the `serial_diff` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
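The construction above boils down to subtracting a lagged value from each bucket. A minimal Python sketch of `lag`-th differencing (the `insert_zeros` gap handling is modeled on the default named in the parameter table):

```python
def serial_diff(values, lag=1, gap_policy="insert_zeros"):
    """values: per-bucket metric values, oldest first; None marks a gap."""
    if gap_policy == "insert_zeros":
        values = [0 if v is None else v for v in values]
    return [None if i < lag  # not enough history for the first `lag` buckets
            or values[i] is None or values[i - lag] is None
            else values[i] - values[i - lag]
            for i in range(len(values))]

print(serial_diff([1, 4, 9, 16]))  # first difference: [None, 3, 5, 7]
```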
@@ -25,8 +25,8 @@ $$$stats-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property | Optional | `null` |
The following snippet calculates the stats for monthly `sales`:
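What this sibling aggregation returns can be sketched as plain summary statistics over the collected bucket values (`None` gaps are dropped, matching the default `skip` policy; the key names mirror the usual stats output):

```python
def stats_bucket(values):
    """count/min/max/sum/avg over all bucket values on buckets_path."""
    vals = [v for v in values if v is not None]  # `skip` gap policy
    return {"count": len(vals), "min": min(vals), "max": max(vals),
            "sum": sum(vals), "avg": sum(vals) / len(vals)}

print(stats_bucket([550.0, 60.0, 375.0]))
```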
@@ -25,8 +25,8 @@ $$$sum-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
-| `buckets_path` | The path to the buckets we wish to find the sum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `buckets_path` | The path to the buckets we wish to find the sum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation's `value_as_string` property. | Optional | `null` |
The following snippet calculates the sum of all the total monthly `sales` buckets:
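The sum itself is trivial; the `gap_policy` column only changes how missing buckets contribute. A hedged sketch of both policies (the `insert_zeros` behavior here is an assumption based on the policy's name):

```python
def sum_bucket(values, gap_policy="skip"):
    """Sum of all bucket values on buckets_path; None marks a gap."""
    if gap_policy == "insert_zeros":
        values = [0 if v is None else v for v in values]
    return sum(v for v in values if v is not None)

print(sum_bucket([550.0, None, 375.0]))  # 925.0 under either policy
```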