[docs] Prepare for docs-assembler (#125118)

* reorg files for docs-assembler and create toc.yml files
* fix build error, add redirects
* only toc
* move images
@@ -22,7 +22,15 @@ cross_links:
- kibana
- logstash
toc:
- toc: reference
- toc: reference/elasticsearch
- toc: reference/community-contributed
- toc: reference/enrich-processor
- toc: reference/search-connectors
- toc: reference/elasticsearch-plugins
- toc: reference/query-languages
- toc: reference/scripting-languages
- toc: reference/text-analysis
- toc: reference/aggregations
- toc: release-notes
- toc: extend
subs:
@@ -41,3 +41,102 @@ redirects:
 'reference/query-languages/query-dsl-function-score-query.md': 'reference/query-languages/query-dsl/query-dsl-function-score-query.md'
 'reference/query-languages/query-dsl-knn-query.md': 'reference/query-languages/query-dsl/query-dsl-knn-query.md'
 'reference/query-languages/query-dsl-text-expansion-query.md': 'reference/query-languages/query-dsl/query-dsl-text-expansion-query.md'
+
+ # Related to https://github.com/elastic/elasticsearch/pull/125118
+ 'reference/community-contributed.md': 'reference/community-contributed/index.md'
+ 'reference/data-analysis/aggregations/bucket.md': 'reference/aggregations/bucket.md'
+ 'reference/data-analysis/aggregations/index.md': 'reference/aggregations/index.md'
+ 'reference/data-analysis/aggregations/metrics.md': 'reference/aggregations/metrics.md'
+ 'reference/data-analysis/aggregations/pipeline.md': 'reference/aggregations/pipeline.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md': 'reference/aggregations/search-aggregations-bucket-composite-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md': 'reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-filter-aggregation.md': 'reference/aggregations/search-aggregations-bucket-filter-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-filters-aggregation.md': 'reference/aggregations/search-aggregations-bucket-filters-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-geodistance-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geodistance-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md': 'reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md': 'reference/aggregations/search-aggregations-bucket-histogram-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md': 'reference/aggregations/search-aggregations-bucket-multi-terms-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md': 'reference/aggregations/search-aggregations-bucket-range-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-significantterms-aggregation.md': 'reference/aggregations/search-aggregations-bucket-significantterms-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md': 'reference/aggregations/search-aggregations-bucket-terms-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-change-point-aggregation.md': 'reference/aggregations/search-aggregations-change-point-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-avg-aggregation.md': 'reference/aggregations/search-aggregations-metrics-avg-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-cardinality-aggregation.md': 'reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-geo-line.md': 'reference/aggregations/search-aggregations-metrics-geo-line.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-geobounds-aggregation.md': 'reference/aggregations/search-aggregations-metrics-geobounds-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-geocentroid-aggregation.md': 'reference/aggregations/search-aggregations-metrics-geocentroid-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-max-aggregation.md': 'reference/aggregations/search-aggregations-metrics-max-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md': 'reference/aggregations/search-aggregations-metrics-percentile-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md': 'reference/aggregations/search-aggregations-metrics-scripted-metric-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-stats-aggregation.md': 'reference/aggregations/search-aggregations-metrics-stats-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-sum-aggregation.md': 'reference/aggregations/search-aggregations-metrics-sum-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-top-hits-aggregation.md': 'reference/aggregations/search-aggregations-metrics-top-hits-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-metrics-top-metrics.md': 'reference/aggregations/search-aggregations-metrics-top-metrics.md'
+ 'reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-bucket-script-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-selector-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-bucket-selector-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-pipeline-cumulative-sum-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-cumulative-sum-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-pipeline-derivative-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-derivative-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md'
+ 'reference/data-analysis/aggregations/search-aggregations-pipeline-movfn-aggregation.md': 'reference/aggregations/search-aggregations-pipeline-movfn-aggregation.md'
+ 'reference/data-analysis/text-analysis/analysis-asciifolding-tokenfilter.md': 'reference/text-analysis/analysis-asciifolding-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-condition-tokenfilter.md': 'reference/text-analysis/analysis-condition-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-htmlstrip-charfilter.md': 'reference/text-analysis/analysis-htmlstrip-charfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-hunspell-tokenfilter.md': 'reference/text-analysis/analysis-hunspell-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-keyword-marker-tokenfilter.md': 'reference/text-analysis/analysis-keyword-marker-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-kstem-tokenfilter.md': 'reference/text-analysis/analysis-kstem-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-lang-analyzer.md': 'reference/text-analysis/analysis-lang-analyzer.md'
+ 'reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md': 'reference/text-analysis/analysis-lowercase-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-mapping-charfilter.md': 'reference/text-analysis/analysis-mapping-charfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-pattern-replace-charfilter.md': 'reference/text-analysis/analysis-pattern-replace-charfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-pattern-tokenizer.md': 'reference/text-analysis/analysis-pattern-tokenizer.md'
+ 'reference/data-analysis/text-analysis/analysis-porterstem-tokenfilter.md': 'reference/text-analysis/analysis-porterstem-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-snowball-tokenfilter.md': 'reference/text-analysis/analysis-snowball-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-standard-analyzer.md': 'reference/text-analysis/analysis-standard-analyzer.md'
+ 'reference/data-analysis/text-analysis/analysis-standard-tokenizer.md': 'reference/text-analysis/analysis-standard-tokenizer.md'
+ 'reference/data-analysis/text-analysis/analysis-stemmer-override-tokenfilter.md': 'reference/text-analysis/analysis-stemmer-override-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-stemmer-tokenfilter.md': 'reference/text-analysis/analysis-stemmer-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md': 'reference/text-analysis/analysis-stop-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-synonym-graph-tokenfilter.md': 'reference/text-analysis/analysis-synonym-graph-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md': 'reference/text-analysis/analysis-synonym-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-whitespace-tokenizer.md': 'reference/text-analysis/analysis-whitespace-tokenizer.md'
+ 'reference/data-analysis/text-analysis/analysis-word-delimiter-graph-tokenfilter.md': 'reference/text-analysis/analysis-word-delimiter-graph-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analysis-word-delimiter-tokenfilter.md': 'reference/text-analysis/analysis-word-delimiter-tokenfilter.md'
+ 'reference/data-analysis/text-analysis/analyzer-reference.md': 'reference/text-analysis/analyzer-reference.md'
+ 'reference/data-analysis/text-analysis/character-filter-reference.md': 'reference/text-analysis/character-filter-reference.md'
+ 'reference/data-analysis/text-analysis/index.md': 'reference/text-analysis/index.md'
+ 'reference/data-analysis/text-analysis/normalizers.md': 'reference/text-analysis/normalizers.md'
+ 'reference/data-analysis/text-analysis/token-filter-reference.md': 'reference/text-analysis/token-filter-reference.md'
+ 'reference/data-analysis/text-analysis/tokenizer-reference.md': 'reference/text-analysis/tokenizer-reference.md'
+ 'reference/ingestion-tools/enrich-processor/attachment.md': 'reference/enrich-processor/attachment.md'
+ 'reference/ingestion-tools/enrich-processor/convert-processor.md': 'reference/enrich-processor/convert-processor.md'
+ 'reference/ingestion-tools/enrich-processor/csv-processor.md': 'reference/enrich-processor/csv-processor.md'
+ 'reference/ingestion-tools/enrich-processor/date-index-name-processor.md': 'reference/enrich-processor/date-index-name-processor.md'
+ 'reference/ingestion-tools/enrich-processor/date-processor.md': 'reference/enrich-processor/date-processor.md'
+ 'reference/ingestion-tools/enrich-processor/dissect-processor.md': 'reference/enrich-processor/dissect-processor.md'
+ 'reference/ingestion-tools/enrich-processor/dot-expand-processor.md': 'reference/enrich-processor/dot-expand-processor.md'
+ 'reference/ingestion-tools/enrich-processor/enrich-processor.md': 'reference/enrich-processor/enrich-processor.md'
+ 'reference/ingestion-tools/enrich-processor/fingerprint-processor.md': 'reference/enrich-processor/fingerprint-processor.md'
+ 'reference/ingestion-tools/enrich-processor/geoip-processor.md': 'reference/enrich-processor/geoip-processor.md'
+ 'reference/ingestion-tools/enrich-processor/grok-processor.md': 'reference/enrich-processor/grok-processor.md'
+ 'reference/ingestion-tools/enrich-processor/gsub-processor.md': 'reference/enrich-processor/gsub-processor.md'
+ 'reference/ingestion-tools/enrich-processor/htmlstrip-processor.md': 'reference/enrich-processor/htmlstrip-processor.md'
+ 'reference/ingestion-tools/enrich-processor/index.md': 'reference/enrich-processor/index.md'
+ 'reference/ingestion-tools/enrich-processor/inference-processor.md': 'reference/enrich-processor/inference-processor.md'
+ 'reference/ingestion-tools/enrich-processor/ingest-geo-grid-processor.md': 'reference/enrich-processor/ingest-geo-grid-processor.md'
+ 'reference/ingestion-tools/enrich-processor/ingest-node-set-security-user-processor.md': 'reference/enrich-processor/ingest-node-set-security-user-processor.md'
+ 'reference/ingestion-tools/enrich-processor/json-processor.md': 'reference/enrich-processor/json-processor.md'
+ 'reference/ingestion-tools/enrich-processor/lowercase-processor.md': 'reference/enrich-processor/lowercase-processor.md'
+ 'reference/ingestion-tools/enrich-processor/pipeline-processor.md': 'reference/enrich-processor/pipeline-processor.md'
+ 'reference/ingestion-tools/enrich-processor/remove-processor.md': 'reference/enrich-processor/remove-processor.md'
+ 'reference/ingestion-tools/enrich-processor/rename-processor.md': 'reference/enrich-processor/rename-processor.md'
+ 'reference/ingestion-tools/enrich-processor/reroute-processor.md': 'reference/enrich-processor/reroute-processor.md'
+ 'reference/ingestion-tools/enrich-processor/script-processor.md': 'reference/enrich-processor/script-processor.md'
+ 'reference/ingestion-tools/enrich-processor/set-processor.md': 'reference/enrich-processor/set-processor.md'
+ 'reference/ingestion-tools/enrich-processor/trim-processor.md': 'reference/enrich-processor/trim-processor.md'
+ 'reference/ingestion-tools/enrich-processor/user-agent-processor.md': 'reference/enrich-processor/user-agent-processor.md'
+ 'reference/ingestion-tools/search-connectors/connectors-ui-in-kibana.md': 'reference/search-connectors/connectors-ui-in-kibana.md'
+ 'reference/ingestion-tools/search-connectors/es-connectors-github.md': 'reference/search-connectors/es-connectors-github.md'
+ 'reference/ingestion-tools/search-connectors/index.md': 'reference/search-connectors/index.md'
+ 'reference/ingestion-tools/search-connectors/self-managed-connectors.md': 'reference/search-connectors/self-managed-connectors.md'
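The bulk of these redirects only swap a directory prefix while keeping the filename. A small sanity check along these lines can catch typos in such a table. This is a sketch with an abbreviated, hypothetical subset of the entries, not part of the PR; index-style redirects such as `community-contributed.md` -> `community-contributed/index.md` would need to be exempted.

```python
from pathlib import PurePosixPath

# Abbreviated sample of the redirect table above (subset chosen for illustration).
redirects = {
    "reference/data-analysis/aggregations/pipeline.md":
        "reference/aggregations/pipeline.md",
    "reference/data-analysis/text-analysis/tokenizer-reference.md":
        "reference/text-analysis/tokenizer-reference.md",
    "reference/ingestion-tools/enrich-processor/grok-processor.md":
        "reference/enrich-processor/grok-processor.md",
}

def check_redirects(table):
    """Return a list of problems: self-redirects and renamed files."""
    problems = []
    for old, new in table.items():
        if old == new:
            problems.append(f"self-redirect: {old}")
        elif PurePosixPath(old).name != PurePosixPath(new).name:
            problems.append(f"filename changed: {old} -> {new}")
    return problems

print(check_redirects(redirects))  # -> []
```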
(13 binary image files moved to the new directory layout; width, height, and file size unchanged for all of them)
@@ -221,7 +221,7 @@ POST /sales/_search

 ## Dealing with dots in agg names [dots-in-agg-names]

-An alternate syntax is supported to cope with aggregations or metrics which have dots in the name, such as the `99.9`th [percentile](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md). This metric may be referred to as:
+An alternate syntax is supported to cope with aggregations or metrics which have dots in the name, such as the `99.9`th [percentile](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md). This metric may be referred to as:

 ```js
 "buckets_path": "my_percentile[99.9]"
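The `name[key]` bracket convention described above can be split with a small helper. This is an illustrative sketch of the syntax only, not Elasticsearch's actual `buckets_path` parser:

```python
import re

def parse_buckets_path_key(path):
    """Split a buckets_path component like 'my_percentile[99.9]' into
    (aggregation name, metric key). Plain names return (name, None)."""
    m = re.fullmatch(r"([^\[\]]+)(?:\[([^\[\]]+)\])?", path)
    if m is None:
        raise ValueError(f"unparseable buckets_path component: {path!r}")
    return m.group(1), m.group(2)

print(parse_buckets_path_key("my_percentile[99.9]"))  # ('my_percentile', '99.9')
print(parse_buckets_path_key("the_sum"))              # ('the_sum', None)
```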
@@ -7,7 +7,7 @@ mapped_pages:

 # Auto-interval date histogram aggregation [search-aggregations-bucket-autodatehistogram-aggregation]


-A multi-bucket aggregation similar to the [Date histogram](/reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number.
+A multi-bucket aggregation similar to the [Date histogram](/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number.

 The buckets field is optional, and will default to 10 buckets if not specified.
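The interval selection the paragraph describes can be sketched as picking the smallest interval from a ladder that keeps the bucket count at or below the target. The ladder and arithmetic here are illustrative assumptions, not the actual Elasticsearch rounding logic:

```python
from datetime import timedelta

# Assumed interval ladder, smallest to largest (illustrative only).
INTERVALS = [
    ("1s", timedelta(seconds=1)), ("1m", timedelta(minutes=1)),
    ("1h", timedelta(hours=1)), ("1d", timedelta(days=1)),
    ("7d", timedelta(days=7)), ("30d", timedelta(days=30)),
    ("365d", timedelta(days=365)),
]

def pick_interval(span, target_buckets=10):
    """Choose the smallest interval whose bucket count stays at or below the target."""
    for name, width in INTERVALS:
        if span / width <= target_buckets:
            return name
    return INTERVALS[-1][0]  # fall back to the coarsest interval

print(pick_interval(timedelta(hours=5)))  # '1h' -> 5 buckets, within the target of 10
```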
@@ -55,7 +55,7 @@ POST /sales/_search?size=0
 }
 ```

-1. Supports expressive date [format pattern](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
+1. Supports expressive date [format pattern](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)


 Response:
@@ -10,7 +10,7 @@ mapped_pages:
 A multi-bucket aggregation that groups semi-structured text into buckets. Each `text` field is re-analyzed using a custom analyzer. The resulting tokens are then categorized creating buckets of similarly formatted text values. This aggregation works best with machine generated text like system logs. Only the first 100 analyzed tokens are used to categorize the text.

 ::::{note}
-If you have considerable memory allocated to your JVM but are receiving circuit breaker exceptions from this aggregation, you may be attempting to categorize text that is poorly formatted for categorization. Consider adding `categorization_filters` or running under [sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-sampler-aggregation.md), [diversified sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md), or [random sampler](/reference/data-analysis/aggregations/search-aggregations-random-sampler-aggregation.md) to explore the created categories.
+If you have considerable memory allocated to your JVM but are receiving circuit breaker exceptions from this aggregation, you may be attempting to categorize text that is poorly formatted for categorization. Consider adding `categorization_filters` or running under [sampler](/reference/aggregations/search-aggregations-bucket-sampler-aggregation.md), [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md), or [random sampler](/reference/aggregations/search-aggregations-random-sampler-aggregation.md) to explore the created categories.
 ::::
@@ -24,14 +24,14 @@ The algorithm used for categorization was completely changed in version 8.3.0. A
 `categorization_analyzer`
 : (Optional, object or string) The categorization analyzer specifies how the text is analyzed and tokenized before being categorized. The syntax is very similar to that used to define the `analyzer` in the [Analyze endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-analyze). This property cannot be used at the same time as `categorization_filters`.

-The `categorization_analyzer` field can be specified either as a string or as an object. If it is a string it must refer to a [built-in analyzer](/reference/data-analysis/text-analysis/analyzer-reference.md) or one added by another plugin. If it is an object it has the following properties:
+The `categorization_analyzer` field can be specified either as a string or as an object. If it is a string it must refer to a [built-in analyzer](/reference/text-analysis/analyzer-reference.md) or one added by another plugin. If it is an object it has the following properties:

 :::::{dropdown} Properties of `categorization_analyzer`
 `char_filter`
-: (array of strings or objects) One or more [character filters](/reference/data-analysis/text-analysis/character-filter-reference.md). In addition to the built-in character filters, other plugins can provide more character filters. This property is optional. If it is not specified, no character filters are applied prior to categorization. If you are customizing some other aspect of the analyzer and you need to achieve the equivalent of `categorization_filters` (which are not permitted when some other aspect of the analyzer is customized), add them here as [pattern replace character filters](/reference/data-analysis/text-analysis/analysis-pattern-replace-charfilter.md).
+: (array of strings or objects) One or more [character filters](/reference/text-analysis/character-filter-reference.md). In addition to the built-in character filters, other plugins can provide more character filters. This property is optional. If it is not specified, no character filters are applied prior to categorization. If you are customizing some other aspect of the analyzer and you need to achieve the equivalent of `categorization_filters` (which are not permitted when some other aspect of the analyzer is customized), add them here as [pattern replace character filters](/reference/text-analysis/analysis-pattern-replace-charfilter.md).

 `tokenizer`
-: (string or object) The name or definition of the [tokenizer](/reference/data-analysis/text-analysis/tokenizer-reference.md) to use after character filters are applied. This property is compulsory if `categorization_analyzer` is specified as an object. Machine learning provides a tokenizer called `ml_standard` that tokenizes in a way that has been determined to produce good categorization results on a variety of log file formats for logs in English. If you want to use that tokenizer but change the character or token filters, specify `"tokenizer": "ml_standard"` in your `categorization_analyzer`. Additionally, the `ml_classic` tokenizer is available, which tokenizes in the same way as the non-customizable tokenizer in old versions of the product (before 6.2). `ml_classic` was the default categorization tokenizer in versions 6.2 to 7.13, so if you need categorization identical to the default for jobs created in these versions, specify `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
+: (string or object) The name or definition of the [tokenizer](/reference/text-analysis/tokenizer-reference.md) to use after character filters are applied. This property is compulsory if `categorization_analyzer` is specified as an object. Machine learning provides a tokenizer called `ml_standard` that tokenizes in a way that has been determined to produce good categorization results on a variety of log file formats for logs in English. If you want to use that tokenizer but change the character or token filters, specify `"tokenizer": "ml_standard"` in your `categorization_analyzer`. Additionally, the `ml_classic` tokenizer is available, which tokenizes in the same way as the non-customizable tokenizer in old versions of the product (before 6.2). `ml_classic` was the default categorization tokenizer in versions 6.2 to 7.13, so if you need categorization identical to the default for jobs created in these versions, specify `"tokenizer": "ml_classic"` in your `categorization_analyzer`.

 ::::{note}
 From {{es}} 8.10.0, a new version number is used to track the configuration and state changes in the {{ml}} plugin. This new version number is decoupled from the product version and will increment independently.
@@ -39,7 +39,7 @@ The algorithm used for categorization was completely changed in version 8.3.0. A


 `filter`
-: (array of strings or objects) One or more [token filters](/reference/data-analysis/text-analysis/token-filter-reference.md). In addition to the built-in token filters, other plugins can provide more token filters. This property is optional. If it is not specified, no token filters are applied prior to categorization.
+: (array of strings or objects) One or more [token filters](/reference/text-analysis/token-filter-reference.md). In addition to the built-in token filters, other plugins can provide more token filters. This property is optional. If it is not specified, no token filters are applied prior to categorization.

 :::::
@@ -90,7 +90,7 @@ The algorithm used for categorization was completely changed in version 8.3.0. A
 ## Basic use [_basic_use]

 ::::{warning}
-Re-analyzing *large* result sets will require a lot of time and memory. This aggregation should be used in conjunction with [Async search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit). Additionally, you may consider using the aggregation as a child of either the [sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation. This will typically improve speed and memory use.
+Re-analyzing *large* result sets will require a lot of time and memory. This aggregation should be used in conjunction with [Async search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit). Additionally, you may consider using the aggregation as a child of either the [sampler](/reference/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation. This will typically improve speed and memory use.
 ::::
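A request shaped the way the warning suggests, with a `categorize_text` aggregation nested under a `sampler`, might look as follows. This is a hedged sketch: the field name `message` and the `shard_size` value are made-up placeholders, not from the PR.

```python
import json

# Sketch of a categorize_text aggregation run under a sampler to bound
# the number of documents that get re-analyzed (placeholder values).
request = {
    "aggs": {
        "sample": {
            "sampler": {"shard_size": 500},
            "aggs": {
                "categories": {
                    "categorize_text": {"field": "message"}
                }
            }
        }
    }
}
print(json.dumps(request, indent=2))
```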
@@ -237,7 +237,7 @@ GET /_search
 }
 ```

-1. Supports expressive date [format pattern](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
+1. Supports expressive date [format pattern](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)


 **Time Zone**
@@ -12,7 +12,7 @@ A sibling pipeline aggregation which executes a correlation function on the conf
 ## Parameters [bucket-correlation-agg-syntax]

 `buckets_path`
-: (Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax).
+: (Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).

 `function`
 : (Required, object) The correlation function to execute.
@@ -76,7 +76,7 @@ A `bucket_correlation` aggregation looks like this in isolation:

 ## Example [bucket-correlation-agg-example]

-The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
+The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.

 This example is only using the 10s percentiles.
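The idea of correlating per-term bucket counts with a precomputed indicator can be illustrated with a plain Pearson correlation. The numbers below are invented, and the built-in `count_correlation` function does more than this sketch:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Doc counts for one `version` term across latency percentile buckets,
# against the latency indicator values (made-up numbers):
counts = [1, 2, 4, 8, 16]
latency = [10.0, 20.0, 35.0, 70.0, 150.0]
print(round(pearson(counts, latency), 3))
```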
@@ -12,7 +12,7 @@ A sibling pipeline aggregation which executes a two sample Kolmogorov–Smirnov
 ## Parameters [bucket-count-ks-test-agg-syntax]

 `buckets_path`
-: (Required, string) Path to the buckets that contain one set of values to correlate. Must be a `_count` path For syntax, see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax).
+: (Required, string) Path to the buckets that contain one set of values to correlate. Must be a `_count` path For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).

 `alternative`
 : (Optional, list) A list of string values indicating which K-S test alternative to calculate. The valid values are: "greater", "less", "two_sided". This parameter is key for determining the K-S statistic used when calculating the K-S test. Default value is all possible alternative hypotheses.
@@ -46,7 +46,7 @@ A `bucket_count_ks_test` aggregation looks like this in isolation:

 ## Example [bucket-count-ks-test-agg-example]

-The following snippet runs the `bucket_count_ks_test` on the individual terms in the field `version` against a uniform distribution. The uniform distribution reflects the `latency` percentile buckets. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
+The following snippet runs the `bucket_count_ks_test` on the individual terms in the field `version` against a uniform distribution. The uniform distribution reflects the `latency` percentile buckets. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.

 This example is only using the deciles of `latency`.
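The statistic behind this aggregation can be illustrated in a few lines: the two-sided two-sample Kolmogorov-Smirnov statistic is the largest absolute gap between the two empirical CDFs. This sketch computes only the statistic, not the p-values the aggregation reports:

```python
def ks_statistic(a, b):
    """Two-sided two-sample Kolmogorov-Smirnov statistic:
    the largest absolute gap between the two empirical CDFs."""
    points = sorted(set(a) | set(b))

    def ecdf(xs, t):
        # fraction of samples less than or equal to t
        return sum(1 for x in xs if x <= t) / len(xs)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in points)

print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0 (identical samples)
print(ks_statistic([1, 1, 1, 1], [2, 2, 2, 2]))  # 1.0 (fully separated samples)
```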
@@ -7,7 +7,7 @@ mapped_pages:
 # Date histogram aggregation [search-aggregations-bucket-datehistogram-aggregation]

-This multi-bucket aggregation is similar to the normal [histogram](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md), but it can only be used with date or date range values. Because dates are represented internally in Elasticsearch as long values, it is possible, but not as accurate, to use the normal `histogram` on dates as well. The main difference in the two APIs is that here the interval can be specified using date/time expressions. Time-based data requires special support because time-based intervals are not always a fixed length.
+This multi-bucket aggregation is similar to the normal [histogram](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md), but it can only be used with date or date range values. Because dates are represented internally in Elasticsearch as long values, it is possible, but not as accurate, to use the normal `histogram` on dates as well. The main difference in the two APIs is that here the interval can be specified using date/time expressions. Time-based data requires special support because time-based intervals are not always a fixed length.

 Like the histogram, values are rounded **down** into the closest bucket. For example, if the interval is a calendar day, `2020-01-03T07:00:01Z` is rounded to `2020-01-03T00:00:00Z`. Values are rounded as follows:
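As context for the `date_histogram` hunk above, a minimal request is sketched below; the `sales` index and `date` field are illustrative, not taken from the diff:

```console
POST /sales/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      }
    }
  }
}
```

`calendar_interval` accepts the date/time expressions mentioned in the paragraph above (`month`, `1d`, and so on); `fixed_interval` would instead take fixed-length units such as `90s` or `12h`.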
@@ -236,7 +236,7 @@ POST /sales/_search?size=0
 }
 ```

-1. Supports expressive date [format pattern](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)
+1. Supports expressive date [format pattern](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern)

 Response:
@@ -600,7 +600,7 @@ POST /sales/_search?size=0

 ## Parameters [date-histogram-params]

-You can control the order of the returned buckets using the `order` settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds` setting, which enables extending the bounds of the histogram beyond the data itself, and `hard_bounds` that limits the histogram to specified bounds. For more information, see [`Extended Bounds`](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-extended-bounds) and [`Hard Bounds`](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-hard-bounds).
+You can control the order of the returned buckets using the `order` settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds` setting, which enables extending the bounds of the histogram beyond the data itself, and `hard_bounds` that limits the histogram to specified bounds. For more information, see [`Extended Bounds`](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-extended-bounds) and [`Hard Bounds`](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md#search-aggregations-bucket-histogram-aggregation-hard-bounds).

 ### Missing value [date-histogram-missing-value]
@@ -629,7 +629,7 @@ POST /sales/_search?size=0

 ### Order [date-histogram-order]

-By default the returned buckets are sorted by their `key` ascending, but you can control the order using the `order` setting. This setting supports the same `order` functionality as [`Terms Aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).
+By default the returned buckets are sorted by their `key` ascending, but you can control the order using the `order` setting. This setting supports the same `order` functionality as [`Terms Aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).

 ### Using a script to aggregate by day of the week [date-histogram-aggregate-scripts]
@@ -7,7 +7,7 @@ mapped_pages:
 # Date range aggregation [search-aggregations-bucket-daterange-aggregation]

-A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal [range](/reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation is that the `from` and `to` values can be expressed in [Date Math](/reference/elasticsearch/rest-apis/common-options.md#date-math) expressions, and it is also possible to specify a date format by which the `from` and `to` response fields will be returned. Note that this aggregation includes the `from` value and excludes the `to` value for each range.
+A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal [range](/reference/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation is that the `from` and `to` values can be expressed in [Date Math](/reference/elasticsearch/rest-apis/common-options.md#date-math) expressions, and it is also possible to specify a date format by which the `from` and `to` response fields will be returned. Note that this aggregation includes the `from` value and excludes the `to` value for each range.

 Example:
@@ -83,7 +83,7 @@ POST /sales/_search?size=0&filter_path=aggregations

 ## Use the `filters` aggregation for multiple filters [use-filters-agg-for-multiple-filters]

-To group documents using multiple filters, use the [`filters` aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-filters-aggregation.md). This is faster than multiple `filter` aggregations.
+To group documents using multiple filters, use the [`filters` aggregation](/reference/aggregations/search-aggregations-bucket-filters-aggregation.md). This is faster than multiple `filter` aggregations.

 For example, use this:
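The example under "For example, use this:" falls outside this hunk; as a hedged sketch, a `filters` request that groups documents under two named filters might look like the following (index and field names assumed):

```console
GET /logs/_search?size=0
{
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors":   { "match": { "body": "error" } },
          "warnings": { "match": { "body": "warning" } }
        }
      }
    }
  }
}
```

Each named filter produces one bucket, so a single pass over the documents replaces two separate `filter` aggregations.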
@@ -178,7 +178,7 @@ The response would be something like the following:

 ## Non-keyed Response [non-keyed-response]

-By default, the named filters aggregation returns the buckets as an object. But in some sorting cases, such as [bucket sort](/reference/data-analysis/aggregations/search-aggregations-pipeline-bucket-sort-aggregation.md), the JSON doesn’t guarantee the order of elements in the object. You can use the `keyed` parameter to specify the buckets as an array of objects. The value of this parameter can be as follows:
+By default, the named filters aggregation returns the buckets as an object. But in some sorting cases, such as [bucket sort](/reference/aggregations/search-aggregations-pipeline-bucket-sort-aggregation.md), the JSON doesn’t guarantee the order of elements in the object. You can use the `keyed` parameter to specify the buckets as an array of objects. The value of this parameter can be as follows:

 `true`
 : (Default) Returns the buckets as an object
@@ -7,7 +7,7 @@ mapped_pages:
 # Geo-distance aggregation [search-aggregations-bucket-geodistance-aggregation]

-A multi-bucket aggregation that works on `geo_point` fields and conceptually works very similar to the [range](/reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket).
+A multi-bucket aggregation that works on `geo_point` fields and conceptually works very similar to the [range](/reference/aggregations/search-aggregations-bucket-range-aggregation.md) aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket).

 $$$geodistance-aggregation-example$$$
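A hedged sketch of the `geo_distance` aggregation described above, with an origin point and three distance rings in metres (index and field names are illustrative):

```console
POST /museums/_search?size=0
{
  "aggs": {
    "rings_around_amsterdam": {
      "geo_distance": {
        "field": "location",
        "origin": "POINT (4.894 52.3760)",
        "ranges": [
          { "to": 100000 },
          { "from": 100000, "to": 300000 },
          { "from": 300000 }
        ]
      }
    }
  }
}
```

Each document lands in the bucket whose range contains its distance from the origin; as with the range aggregation, `from` is inclusive and `to` is exclusive.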
@@ -286,7 +286,7 @@ The table below shows the metric dimensions for cells covered by various string

 Aggregating on [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md) fields works just as it does for points, except that a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:

-![Image showing bird eye view of the Manhattan skyline](../../images/geoshape_grid.png "")
+![Image showing bird eye view of the Manhattan skyline](../../images/geoshape_grid.png "")

 ## Options [_options_3]
@@ -204,9 +204,9 @@ Response:
 Aggregating on [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md) fields works almost as it does for points. There are two key differences:

 * When aggregating over `geo_point` data, points are considered within a hexagonal tile if they lie within the edges defined by great circles. In other words the calculation is done using spherical coordinates. However, when aggregating over `geo_shape` data, the shapes are considered within a hexagon if they lie within the edges defined as straight lines on an equirectangular projection. The reason is that Elasticsearch and Lucene treat edges using the equirectangular projection at index and search time. In order to ensure that search results and aggregation results are aligned, we therefore also use equirectangular projection in aggregations. For most data, the difference is subtle or not noticed. However, for low zoom levels (low precision), especially far from the equator, this can be noticeable. For example, if the same point data is indexed as `geo_point` and `geo_shape`, it is possible to get different results when aggregating at lower resolutions.
-* As is the case with [`geotile_grid`](/reference/data-analysis/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md#geotilegrid-aggregating-geo-shape), a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:
+* As is the case with [`geotile_grid`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md#geotilegrid-aggregating-geo-shape), a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:

-![Image showing bird eye view of the Manhattan skyline](../../images/geoshape_hexgrid.png "")
+![Image showing bird eye view of the Manhattan skyline](../../images/geoshape_hexgrid.png "")

 ## Options [_options_4]
@@ -208,7 +208,7 @@ Response:

 Aggregating on [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md) fields works almost as it does for points, except that a single shape can be counted for in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile. Below is an image that demonstrates this:

-![Image showing bird eye view of the Manhattan skyline](../../images/geoshape_hashgrid.png "")
+![Image showing bird eye view of the Manhattan skyline](../../images/geoshape_hashgrid.png "")

 ## Options [_options_5]
@@ -156,7 +156,7 @@ POST /sales/_search?size=0
 }
 ```

-When aggregating ranges, buckets are based on the values of the returned documents. This means the response may include buckets outside of a query’s range. For example, if your query looks for values greater than 100, and you have a range covering 50 to 150, and an interval of 50, that document will land in 3 buckets - 50, 100, and 150. In general, it’s best to think of the query and aggregation steps as independent - the query selects a set of documents, and then the aggregation buckets those documents without regard to how they were selected. See [note on bucketing range fields](/reference/data-analysis/aggregations/search-aggregations-bucket-range-field-note.md) for more information and an example.
+When aggregating ranges, buckets are based on the values of the returned documents. This means the response may include buckets outside of a query’s range. For example, if your query looks for values greater than 100, and you have a range covering 50 to 150, and an interval of 50, that document will land in 3 buckets - 50, 100, and 150. In general, it’s best to think of the query and aggregation steps as independent - the query selects a set of documents, and then the aggregation buckets those documents without regard to how they were selected. See [note on bucketing range fields](/reference/aggregations/search-aggregations-bucket-range-field-note.md) for more information and an example.

 $$$search-aggregations-bucket-histogram-aggregation-hard-bounds$$$
 The `hard_bounds` is a counterpart of `extended_bounds` and can limit the range of buckets in the histogram. It is particularly useful in the case of open [data ranges](/reference/elasticsearch/mapping-reference/range.md) that can result in a very large number of buckets.
@@ -191,7 +191,7 @@ In this example even though the range specified in the query is up to 500, the h

 ## Order [_order_2]

-By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using the `order` setting. Supports the same `order` functionality as the [`Terms Aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).
+By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using the `order` setting. Supports the same `order` functionality as the [`Terms Aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order).

 ## Offset [_offset]
@@ -7,7 +7,7 @@ mapped_pages:
 # IP range aggregation [search-aggregations-bucket-iprange-aggregation]

-Just like the dedicated [date](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md) range aggregation, there is also a dedicated range aggregation for IP typed fields:
+Just like the dedicated [date](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md) range aggregation, there is also a dedicated range aggregation for IP typed fields:

 Example:
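The example that follows in the source file sits outside this hunk; as a hedged sketch, an `ip_range` request might look like this (index and field names assumed):

```console
GET /ip_addresses/_search?size=0
{
  "aggs": {
    "ip_ranges": {
      "ip_range": {
        "field": "ip",
        "ranges": [
          { "to": "10.0.0.5" },
          { "from": "10.0.0.5" }
        ]
      }
    }
  }
}
```

CIDR masks (for example `{ "mask": "10.0.0.0/25" }`) can be used in place of explicit `from`/`to` bounds.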
@@ -7,9 +7,9 @@ mapped_pages:
 # Multi Terms aggregation [search-aggregations-bucket-multi-terms-aggregation]

-A multi-bucket value source based aggregation where buckets are dynamically built - one per unique set of values. The multi terms aggregation is very similar to the [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), however in most cases it will be slower than the terms aggregation and will consume more memory. Therefore, if the same set of fields is constantly used, it would be more efficient to index a combined key for this fields as a separate field and use the terms aggregation on this field.
+A multi-bucket value source based aggregation where buckets are dynamically built - one per unique set of values. The multi terms aggregation is very similar to the [`terms aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), however in most cases it will be slower than the terms aggregation and will consume more memory. Therefore, if the same set of fields is constantly used, it would be more efficient to index a combined key for this fields as a separate field and use the terms aggregation on this field.

-The multi_term aggregations are the most useful when you need to sort by a number of document or a metric aggregation on a composite key and get top N results. If sorting is not required and all values are expected to be retrieved using nested terms aggregation or [`composite aggregations`](/reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md) will be a faster and more memory efficient solution.
+The multi_term aggregations are the most useful when you need to sort by a number of document or a metric aggregation on a composite key and get top N results. If sorting is not required and all values are expected to be retrieved using nested terms aggregation or [`composite aggregations`](/reference/aggregations/search-aggregations-bucket-composite-aggregation.md) will be a faster and more memory efficient solution.

 Example:
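The example referenced above is outside this hunk; a hedged sketch of a `multi_terms` request that builds one bucket per unique `(genre, product)` pair (index and field names assumed):

```console
GET /products/_search?size=0
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          { "field": "genre" },
          { "field": "product" }
        ]
      }
    }
  }
}
```

Each returned bucket carries a compound `key` array holding one value per source field, ordered as listed under `terms`.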
@@ -32,7 +32,7 @@ GET /products/_search
 }
 ```

-1. `multi_terms` aggregation can work with the same field types as a [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and supports most of the terms aggregation parameters.
+1. `multi_terms` aggregation can work with the same field types as a [`terms aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and supports most of the terms aggregation parameters.

 Response:
@@ -93,7 +93,7 @@ By default, the `multi_terms` aggregation will return the buckets for the top te

 ## Aggregation Parameters [search-aggregations-bucket-multi-terms-aggregation-parameters]

-The following parameters are supported. See [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) for more detailed explanation of these parameters.
+The following parameters are supported. See [`terms aggregation`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) for more detailed explanation of these parameters.

 size
 : Optional. Defines how many term buckets should be returned out of the overall terms list. Defaults to 10.
@@ -104,7 +104,7 @@ Response:
 }
 ```

-You can use a [`filter`](/reference/data-analysis/aggregations/search-aggregations-bucket-filter-aggregation.md) sub-aggregation to return results for a specific reseller.
+You can use a [`filter`](/reference/aggregations/search-aggregations-bucket-filter-aggregation.md) sub-aggregation to return results for a specific reseller.

 ```console
 GET /products/_search?size=0
@@ -7,7 +7,7 @@ mapped_pages:
 # Rare terms aggregation [search-aggregations-bucket-rare-terms-aggregation]

-A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation
+A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation

 ## Syntax [_syntax_3]
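The syntax section itself falls outside this hunk; a hedged minimal `rare_terms` request (the `genre` field is assumed for illustration) looks like:

```console
GET /_search
{
  "aggs": {
    "genres": {
      "rare_terms": {
        "field": "genre",
        "max_doc_count": 1
      }
    }
  }
}
```

`max_doc_count` bounds how many documents a term may appear in and still count as "rare"; `1` (the default) returns only terms that occur in a single document.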
@@ -152,21 +152,21 @@ The X-axis shows the number of distinct values the aggregation has seen, and the

 This first chart shows precision `0.01`:

-![accuracy 01](../../images/accuracy_01.png "")
+![accuracy 01](../../images/accuracy_01.png "")

 And precision `0.001` (the default):

-![accuracy 001](../../images/accuracy_001.png "")
+![accuracy 001](../../images/accuracy_001.png "")

 And finally `precision 0.0001`:

-![accuracy 0001](../../images/accuracy_0001.png "")
+![accuracy 0001](../../images/accuracy_0001.png "")

 The default precision of `0.001` maintains an accuracy of < 2.5% for the tested conditions, and accuracy slowly degrades in a controlled, linear fashion as the number of distinct values increases.

 The default precision of `0.001` has a memory profile of `1.748⁻⁶ * n` bytes, where `n` is the number of distinct values the aggregation has seen (it can also be roughly eyeballed, e.g. 20 million unique values is about 30mb of memory). The memory usage is linear to the number of distinct values regardless of which precision is chosen, the precision only affects the slope of the memory profile as seen in this chart:

-![memory](../../images/memory.png "")
+![memory](../../images/memory.png "")

 For comparison, an equivalent terms aggregation at 20 million buckets would be roughly `20m * 69b == ~1.38gb` (with 69 bytes being a very optimistic estimate of an empty bucket cost, far lower than what the circuit breaker accounts for). So although the `rare_terms` agg is relatively heavy, it is still orders of magnitude smaller than the equivalent terms aggregation
@@ -524,13 +524,13 @@ Use of background filters will slow the query as each term’s postings must be

 ### Filtering Values [_filtering_values_2]

-It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and `exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features described in the [terms aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) documentation.
+It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and `exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features described in the [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) documentation.

 ## Collect mode [_collect_mode]

-To avoid memory issues, the `significant_terms` aggregation always computes child aggregations in `breadth_first` mode. A description of the different collection modes can be found in the [terms aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-collect) documentation.
+To avoid memory issues, the `significant_terms` aggregation always computes child aggregations in `breadth_first` mode. A description of the different collection modes can be found in the [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-collect) documentation.

 ## Execution hint [_execution_hint_2]
@@ -7,14 +7,14 @@ mapped_pages:
 # Significant text aggregation [search-aggregations-bucket-significanttext-aggregation]

-An aggregation that returns interesting or unusual occurrences of free-text terms in a set. It is like the [significant terms](/reference/data-analysis/aggregations/search-aggregations-bucket-significantterms-aggregation.md) aggregation but differs in that:
+An aggregation that returns interesting or unusual occurrences of free-text terms in a set. It is like the [significant terms](/reference/aggregations/search-aggregations-bucket-significantterms-aggregation.md) aggregation but differs in that:

 * It is specifically designed for use on type `text` fields
 * It does not require field data or doc-values
 * It re-analyzes text content on-the-fly meaning it can also filter duplicate sections of noisy text that otherwise tend to skew statistics.

 ::::{warning}
-Re-analyzing *large* result sets will require a lot of time and memory. It is recommended that the significant_text aggregation is used as a child of either the [sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation to limit the analysis to a *small* selection of top-matching documents e.g. 200. This will typically improve speed, memory use and quality of results.
+Re-analyzing *large* result sets will require a lot of time and memory. It is recommended that the significant_text aggregation is used as a child of either the [sampler](/reference/aggregations/search-aggregations-bucket-sampler-aggregation.md) or [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation to limit the analysis to a *small* selection of top-matching documents e.g. 200. This will typically improve speed, memory use and quality of results.
 ::::
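Following the warning above, a hedged sketch that pairs `significant_text` with a `sampler` parent so only the top-matching documents are re-analyzed (index and field names assumed):

```console
GET /news/_search
{
  "query": { "match": { "content": "Bird flu" } },
  "aggs": {
    "my_sample": {
      "sampler": { "shard_size": 100 },
      "aggs": {
        "keywords": {
          "significant_text": { "field": "content" }
        }
      }
    }
  }
}
```

The `sampler` caps the per-shard document set at `shard_size`, so the expensive on-the-fly re-analysis runs over at most 100 documents per shard rather than the full result set.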
@@ -257,7 +257,7 @@ The results from analysing our deduplicated text are obviously of higher quality

 Mr Pozmantier and other one-off associations with elasticsearch no longer appear in the aggregation results as a consequence of copy-and-paste operations or other forms of mechanical repetition.

-If your duplicate or near-duplicate content is identifiable via a single-value indexed field (perhaps a hash of the article’s `title` text or an `original_press_release_url` field) then it would be more efficient to use a parent [diversified sampler](/reference/data-analysis/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation to eliminate these documents from the sample set based on that single key. The less duplicate content you can feed into the significant_text aggregation up front the better in terms of performance.
+If your duplicate or near-duplicate content is identifiable via a single-value indexed field (perhaps a hash of the article’s `title` text or an `original_press_release_url` field) then it would be more efficient to use a parent [diversified sampler](/reference/aggregations/search-aggregations-bucket-diversified-sampler-aggregation.md) aggregation to eliminate these documents from the sample set based on that single key. The less duplicate content you can feed into the significant_text aggregation up front the better in terms of performance.

 ::::{admonition} How are the significance scores calculated?
 The numbers returned for scores are primarily intended for ranking different suggestions sensibly rather than something easily understood by end users. The scores are derived from the doc frequencies in *foreground* and *background* sets. In brief, a term is considered significant if there is a noticeable difference in the frequency in which a term appears in the subset and in the background. The way the terms are ranked can be configured, see "Parameters" section.
|
|||
|
||||
### Significance heuristics [_significance_heuristics]
|
||||
|
||||
This aggregation supports the same scoring heuristics (JLH, mutual_information, gnd, chi_square etc) as the [significant terms](/reference/data-analysis/aggregations/search-aggregations-bucket-significantterms-aggregation.md) aggregation
|
||||
This aggregation supports the same scoring heuristics (JLH, mutual_information, gnd, chi_square etc) as the [significant terms](/reference/aggregations/search-aggregations-bucket-significantterms-aggregation.md) aggregation
|
||||
|
||||
|
||||
### Size & Shard Size [sig-text-shard-size]
|
||||
|
@ -403,7 +403,7 @@ GET news/_search
|
|||
|
||||
### Filtering Values [_filtering_values_3]
|
||||
|
||||
It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and `exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features described in the [terms aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) documentation.
|
||||
It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and `exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features described in the [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) documentation.
|
||||
|
||||
|
||||
|
|
@@ -69,7 +69,7 @@ By default, you cannot run a `terms` aggregation on a `text` field. Use a `keywo

 By default, the `terms` aggregation returns the top ten terms with the most documents. Use the `size` parameter to return more terms, up to the [search.max_buckets](/reference/elasticsearch/configuration-reference/search-settings.md#search-settings-max-buckets) limit.

-If your data contains 100 or 1000 unique terms, you can increase the `size` of the `terms` aggregation to return them all. If you have more unique terms and you need them all, use the [composite aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md) instead.
+If your data contains 100 or 1000 unique terms, you can increase the `size` of the `terms` aggregation to return them all. If you have more unique terms and you need them all, use the [composite aggregation](/reference/aggregations/search-aggregations-bucket-composite-aggregation.md) instead.

 Larger values of `size` use more memory to compute and, push the whole aggregation close to the `max_buckets` limit. You’ll know you’ve gone too large if the request fails with a message about `max_buckets`.
@@ -133,7 +133,7 @@ By default, the `terms` aggregation orders terms by descending document `_count`
 You can use the `order` parameter to specify a different sort order, but we don’t recommend it. It is extremely easy to create a terms ordering that will just return wrong results, and not obvious to see when you have done so. Change this only with caution.

 ::::{warning}
-Especially avoid using `"order": { "_count": "asc" }`. If you need to find rare terms, use the [`rare_terms`](/reference/data-analysis/aggregations/search-aggregations-bucket-rare-terms-aggregation.md) aggregation instead. Due to the way the `terms` aggregation [gets terms from shards](#search-aggregations-bucket-terms-aggregation-shard-size), sorting by ascending doc count often produces inaccurate results.
+Especially avoid using `"order": { "_count": "asc" }`. If you need to find rare terms, use the [`rare_terms`](/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md) aggregation instead. Due to the way the `terms` aggregation [gets terms from shards](#search-aggregations-bucket-terms-aggregation-shard-size), sorting by ascending doc count often produces inaccurate results.
 ::::
@@ -216,7 +216,7 @@ GET /_search
 ::::{admonition} Pipeline aggs cannot be used for sorting
 :class: note

-[Pipeline aggregations](/reference/data-analysis/aggregations/pipeline.md) are run during the reduce phase after all other aggregations have already completed. For this reason, they cannot be used for ordering.
+[Pipeline aggregations](/reference/aggregations/pipeline.md) are run during the reduce phase after all other aggregations have already completed. For this reason, they cannot be used for ordering.

 ::::
@@ -548,7 +548,7 @@ There are three approaches that you can use to perform a `terms` agg across mult
 [`copy_to` field](/reference/elasticsearch/mapping-reference/copy-to.md)
 : If you know ahead of time that you want to collect the terms from two or more fields, then use `copy_to` in your mapping to create a new dedicated field at index time which contains the values from both fields. You can aggregate on this single field, which will benefit from the global ordinals optimization.

-[`multi_terms` aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md)
+[`multi_terms` aggregation](/reference/aggregations/search-aggregations-bucket-multi-terms-aggregation.md)
 : Use multi_terms aggregation to combine terms from multiple fields into a compound key. This also disables the global ordinals and will be slower than collecting terms from a single field. It is faster but less flexible than using a script.
@@ -7,7 +7,7 @@ mapped_pages:
 # Variable width histogram aggregation [search-aggregations-bucket-variablewidthhistogram-aggregation]
 
-This is a multi-bucket aggregation similar to [Histogram](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md). However, the width of each bucket is not specified. Rather, a target number of buckets is provided and bucket intervals are dynamically determined based on the document distribution. This is done using a simple one-pass document clustering algorithm that aims to obtain low distances between bucket centroids. Unlike other multi-bucket aggregations, the intervals will not necessarily have a uniform width.
+This is a multi-bucket aggregation similar to [Histogram](/reference/aggregations/search-aggregations-bucket-histogram-aggregation.md). However, the width of each bucket is not specified. Rather, a target number of buckets is provided and bucket intervals are dynamically determined based on the document distribution. This is done using a simple one-pass document clustering algorithm that aims to obtain low distances between bucket centroids. Unlike other multi-bucket aggregations, the intervals will not necessarily have a uniform width.
 
 ::::{tip}
 The number of buckets returned will always be less than or equal to the target number.
@@ -22,7 +22,7 @@ It is recommended to use the change point aggregation to detect changes in time-
 ## Parameters [change-point-agg-syntax]
 
 `buckets_path`
-: (Required, string) Path to the buckets that contain one set of values in which to detect a change point. There must be at least 22 bucketed values. Fewer than 1,000 is preferred. For syntax, see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax).
+: (Required, string) Path to the buckets that contain one set of values in which to detect a change point. There must be at least 22 bucketed values. Fewer than 1,000 is preferred. For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).
 
 ## Syntax [_syntax_11]
@@ -56,7 +56,7 @@ POST /museums/_search?size=0
 
 ::::{note}
-Unlike the case with the [`geo_bounds`](/reference/data-analysis/aggregations/search-aggregations-metrics-geobounds-aggregation.md#geobounds-aggregation-geo-shape) aggregation, there is no option to set [`wrap_longitude`](/reference/data-analysis/aggregations/search-aggregations-metrics-geobounds-aggregation.md#geo-bounds-wrap-longitude). This is because the cartesian space is euclidean and does not wrap back on itself. So the bounds will always have a minimum x value less than or equal to the maximum x value.
+Unlike the case with the [`geo_bounds`](/reference/aggregations/search-aggregations-metrics-geobounds-aggregation.md#geobounds-aggregation-geo-shape) aggregation, there is no option to set [`wrap_longitude`](/reference/aggregations/search-aggregations-metrics-geobounds-aggregation.md#geo-bounds-wrap-longitude). This is because the cartesian space is euclidean and does not wrap back on itself. So the bounds will always have a minimum x value less than or equal to the maximum x value.
 ::::
 
@@ -91,7 +91,7 @@ POST /museums/_search?size=0
 }
 ```
 
-The above example uses `cartesian_centroid` as a sub-aggregation to a [terms](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) bucket aggregation for finding the central location for museums in each city.
+The above example uses `cartesian_centroid` as a sub-aggregation to a [terms](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) bucket aggregation for finding the central location for museums in each city.
 
 The response for the above aggregation:
|
|||
|
||||
## Cartesian Centroid Aggregation on `shape` fields [cartesian-centroid-aggregation-geo-shape]
|
||||
|
||||
The centroid metric for shapes is more nuanced than for points. The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket. For example, if a bucket contains shapes consisting of polygons and lines, then the lines do not contribute to the centroid metric. Each type of shape’s centroid is calculated differently. Envelopes and circles ingested via the [Circle](/reference/ingestion-tools/enrich-processor/ingest-circle-processor.md) are treated as polygons.
|
||||
The centroid metric for shapes is more nuanced than for points. The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket. For example, if a bucket contains shapes consisting of polygons and lines, then the lines do not contribute to the centroid metric. Each type of shape’s centroid is calculated differently. Envelopes and circles ingested via the [Circle](/reference/enrich-processor/ingest-circle-processor.md) are treated as polygons.
|
||||
|
||||
| Geometry Type | Centroid Calculation |
|
||||
| --- | --- |
|
|
@@ -9,7 +9,7 @@ mapped_pages:
 A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents.
 
-The `extended_stats` aggregations is an extended version of the [`stats`](/reference/data-analysis/aggregations/search-aggregations-metrics-stats-aggregation.md) aggregation, where additional metrics are added such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.
+The `extended_stats` aggregations is an extended version of the [`stats`](/reference/aggregations/search-aggregations-metrics-stats-aggregation.md) aggregation, where additional metrics are added such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.
 
 Assuming the data consists of documents representing exams grades (between 0 and 100) of students
@@ -77,7 +77,7 @@ The resulting [GeoJSON Feature](https://tools.ietf.org/html/rfc7946#section-3.2)
 
 This result could be displayed in a map user interface:
 
-
+
 
 ## Options [search-aggregations-metrics-geo-line-options]
@@ -183,7 +183,7 @@ POST /tour/_bulk?refresh
 
 ## Grouping with terms [search-aggregations-metrics-geo-line-grouping-terms]
 
-Using this data, for a non-time-series use case, the grouping can be done using a [terms aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) based on city name. This would work whether or not we had defined the `tour` index as a time series index.
+Using this data, for a non-time-series use case, the grouping can be done using a [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) based on city name. This would work whether or not we had defined the `tour` index as a time series index.
 
 $$$search-aggregations-metrics-geo-line-terms$$$
@@ -273,7 +273,7 @@ This functionality is in technical preview and may be changed or removed in a fu
 ::::
 
-Using the same data as before, we can also perform the grouping with a [`time_series` aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-time-series-aggregation.md). This will group by TSID, which is defined as the combinations of all fields with `time_series_dimension: true`, in this case the same `city` field used in the previous [terms aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md). This example will only work if we defined the `tour` index as a time series index using `index.mode="time_series"`.
+Using the same data as before, we can also perform the grouping with a [`time_series` aggregation](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md). This will group by TSID, which is defined as the combinations of all fields with `time_series_dimension: true`, in this case the same `city` field used in the previous [terms aggregation](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md). This example will only work if we defined the `tour` index as a time series index using `index.mode="time_series"`.
 
 $$$search-aggregations-metrics-geo-line-time-series$$$
@@ -296,7 +296,7 @@ POST /tour/_search?filter_path=aggregations
 ```
 
 ::::{note}
-The `geo_line` aggregation no longer requires the `sort` field when nested within a [`time_series` aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-time-series-aggregation.md). This is because the sort field is set to `@timestamp`, which all time-series indexes are pre-sorted by. If you do set this parameter, and set it to something other than `@timestamp` you will get an error.
+The `geo_line` aggregation no longer requires the `sort` field when nested within a [`time_series` aggregation](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md). This is because the sort field is set to `@timestamp`, which all time-series indexes are pre-sorted by. If you do set this parameter, and set it to something other than `@timestamp` you will get an error.
 ::::
 
@@ -366,7 +366,7 @@ These results are essentially the same as with the previous `terms` aggregation
 
 ## Why group with time-series? [search-aggregations-metrics-geo-line-grouping-time-series-advantages]
 
-When reviewing these examples, you might think that there is little difference between using [`terms`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) or [`time_series`](/reference/data-analysis/aggregations/search-aggregations-bucket-time-series-aggregation.md) to group the geo-lines. However, there are some important differences in behaviour between the two cases. Time series indexes are stored in a very specific order on disk. They are pre-grouped by the time-series dimension fields, and pre-sorted by the `@timestamp` field. This allows the `geo_line` aggregation to be considerably optimized:
+When reviewing these examples, you might think that there is little difference between using [`terms`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) or [`time_series`](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md) to group the geo-lines. However, there are some important differences in behaviour between the two cases. Time series indexes are stored in a very specific order on disk. They are pre-grouped by the time-series dimension fields, and pre-sorted by the `@timestamp` field. This allows the `geo_line` aggregation to be considerably optimized:
 
 * The same memory allocated for the first bucket can be re-used over and over for all subsequent buckets. This is substantially less memory than required for non-time-series cases where all buckets are collected concurrently.
 * No sorting needs to be done, since the data is pre-sorted by `@timestamp`. The time-series data will naturally arrive at the aggregation collector in `DESC` order. This means that if we specify `sort_order:ASC` (the default), we still collect in `DESC` order, but perform an efficient in-memory reverse order before generating the final `LineString` geometry.
@@ -377,19 +377,19 @@ Note: There are other significant advantages to working with time-series data an
 
 ## Streaming line simplification [search-aggregations-metrics-geo-line-simplification]
 
-Line simplification is a great way to reduce the size of the final results sent to the client, and displayed in a map user interface. However, normally these algorithms use a lot of memory to perform the simplification, requiring the entire geometry to be maintained in memory together with supporting data for the simplification itself. The use of a streaming line simplification algorithm allows for minimal memory usage during the simplification process by constraining memory to the bounds defined for the simplified geometry. This is only possible if no sorting is required, which is the case when grouping is done by the [`time_series` aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-time-series-aggregation.md), running on an index with the `time_series` index mode.
+Line simplification is a great way to reduce the size of the final results sent to the client, and displayed in a map user interface. However, normally these algorithms use a lot of memory to perform the simplification, requiring the entire geometry to be maintained in memory together with supporting data for the simplification itself. The use of a streaming line simplification algorithm allows for minimal memory usage during the simplification process by constraining memory to the bounds defined for the simplified geometry. This is only possible if no sorting is required, which is the case when grouping is done by the [`time_series` aggregation](/reference/aggregations/search-aggregations-bucket-time-series-aggregation.md), running on an index with the `time_series` index mode.
 
 Under these conditions the `geo_line` aggregation allocates memory to the `size` specified, and then fills that memory with the incoming documents. Once the memory is completely filled, documents from within the line are removed as new documents are added. The choice of document to remove is made to minimize the visual impact on the geometry. This process makes use of the [Visvalingam–Whyatt algorithm](https://en.wikipedia.org/wiki/Visvalingam%E2%80%93Whyatt_algorithm). Essentially this means points are removed if they have the minimum triangle area, with the triangle defined by the point under consideration and the two points before and after it in the line. In addition, we calculate the area using spherical coordinates so that no planar distortions affect the choice.
 
 In order to demonstrate how much better line simplification is to line truncation, consider this example of the north shore of Kodiak Island. The data for this is only 209 points, but if we want to set `size` to `100` we get dramatic truncation.
 
-
+
 
 The grey line is the entire geometry of 209 points, while the blue line is the first 100 points, a very different geometry than the original.
 
 Now consider the same geometry simplified to 100 points.
 
-
+
 
 For comparison we have shown the original in grey, the truncated in blue and the new simplified geometry in magenta. It is possible to see where the new simplified line deviates from the original, but the overall geometry appears almost identical and is still clearly recognizable as the north shore of Kodiak Island.
@@ -91,7 +91,7 @@ POST /museums/_search?size=0
 }
 ```
 
-The above example uses `geo_centroid` as a sub-aggregation to a [terms](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) bucket aggregation for finding the central location for museums in each city.
+The above example uses `geo_centroid` as a sub-aggregation to a [terms](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) bucket aggregation for finding the central location for museums in each city.
 
 The response for the above aggregation:
@@ -145,7 +145,7 @@ The response for the above aggregation:
 
 ## Geo Centroid Aggregation on `geo_shape` fields [geocentroid-aggregation-geo-shape]
 
-The centroid metric for geoshapes is more nuanced than for points. The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket. For example, if a bucket contains shapes comprising of polygons and lines, then the lines do not contribute to the centroid metric. Each type of shape’s centroid is calculated differently. Envelopes and circles ingested via the [Circle](/reference/ingestion-tools/enrich-processor/ingest-circle-processor.md) are treated as polygons.
+The centroid metric for geoshapes is more nuanced than for points. The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket. For example, if a bucket contains shapes comprising of polygons and lines, then the lines do not contribute to the centroid metric. Each type of shape’s centroid is calculated differently. Envelopes and circles ingested via the [Circle](/reference/enrich-processor/ingest-circle-processor.md) are treated as polygons.
 
 | Geometry Type | Centroid Calculation |
 | --- | --- |
@@ -204,7 +204,7 @@ POST /places/_search?size=0
 ::::{admonition} Using `geo_centroid` as a sub-aggregation of `geohash_grid`
 :class: warning
 
-The [`geohash_grid`](/reference/data-analysis/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md) aggregation places documents, not individual geopoints, into buckets. If a document’s `geo_point` field contains [multiple values](/reference/elasticsearch/mapping-reference/array.md), the document could be assigned to multiple buckets, even if one or more of its geopoints are outside the bucket boundaries.
+The [`geohash_grid`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md) aggregation places documents, not individual geopoints, into buckets. If a document’s `geo_point` field contains [multiple values](/reference/elasticsearch/mapping-reference/array.md), the document could be assigned to multiple buckets, even if one or more of its geopoints are outside the bucket boundaries.
 
 If a `geocentroid` sub-aggregation is also used, each centroid is calculated using all geopoints in a bucket, including those outside the bucket boundaries. This can result in centroids outside of bucket boundaries.
@@ -60,9 +60,9 @@ The resulting median absolute deviation of `2` tells us that there is a fair amo
 
 ## Approximation [_approximation]
 
-The naive implementation of calculating median absolute deviation stores the entire sample in memory, so this aggregation instead calculates an approximation. It uses the [TDigest data structure](https://github.com/tdunning/t-digest) to approximate the sample median and the median of deviations from the sample median. For more about the approximation characteristics of TDigests, see [Percentiles are (usually) approximate](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation).
+The naive implementation of calculating median absolute deviation stores the entire sample in memory, so this aggregation instead calculates an approximation. It uses the [TDigest data structure](https://github.com/tdunning/t-digest) to approximate the sample median and the median of deviations from the sample median. For more about the approximation characteristics of TDigests, see [Percentiles are (usually) approximate](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation).
 
-The tradeoff between resource usage and accuracy of a TDigest’s quantile approximation, and therefore the accuracy of this aggregation’s approximation of median absolute deviation, is controlled by the `compression` parameter. A higher `compression` setting provides a more accurate approximation at the cost of higher memory usage. For more about the characteristics of the TDigest `compression` parameter see [Compression](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-compression).
+The tradeoff between resource usage and accuracy of a TDigest’s quantile approximation, and therefore the accuracy of this aggregation’s approximation of median absolute deviation, is controlled by the `compression` parameter. A higher `compression` setting provides a more accurate approximation at the cost of higher memory usage. For more about the characteristics of the TDigest `compression` parameter see [Compression](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-compression).
 
 ```console
 GET reviews/_search
@@ -175,7 +175,7 @@ GET latency/_search
 
 ## Percentiles are (usually) approximate [search-aggregations-metrics-percentile-aggregation-approximation]
 
-:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
+:::{include} /reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
 :::
 
 ::::{warning}
@@ -10,7 +10,7 @@ mapped_pages:
 A `multi-value` metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents. These values can be extracted from specific numeric or [histogram fields](/reference/elasticsearch/mapping-reference/histogram.md) in the documents.
 
 ::::{note}
-Please see [Percentiles are (usually) approximate](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation), [Compression](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-compression) and [Execution hint](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-execution-hint) for advice regarding approximation, performance and memory use of the percentile ranks aggregation
+Please see [Percentiles are (usually) approximate](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-approximation), [Compression](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-compression) and [Execution hint](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md#search-aggregations-metrics-percentile-aggregation-execution-hint) for advice regarding approximation, performance and memory use of the percentile ranks aggregation
 
 ::::
@@ -375,7 +375,7 @@ By default `sum` mode is used.
 
 ## Relationship between bucket sizes and rate [_relationship_between_bucket_sizes_and_rate]
 
-The `rate` aggregation supports all rate that can be used [calendar_intervals parameter](/reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_intervals) of `date_histogram` aggregation. The specified rate should compatible with the `date_histogram` aggregation interval, i.e. it should be possible to convert the bucket size into the rate. By default the interval of the `date_histogram` is used.
+The `rate` aggregation supports all rate that can be used [calendar_intervals parameter](/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#calendar_intervals) of `date_histogram` aggregation. The specified rate should compatible with the `date_histogram` aggregation interval, i.e. it should be possible to convert the bucket size into the rate. By default the interval of the `date_histogram` is used.
 
 `"rate": "second"`
 : compatible with all intervals
@@ -39,7 +39,7 @@ The top_hits aggregation returns regular search hits, because of this many per h
 * [Include Sequence Numbers and Primary Terms](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search)
 
 ::::{important}
-If you **only** need `docvalue_fields`, `size`, and `sort` then [Top metrics](/reference/data-analysis/aggregations/search-aggregations-metrics-top-metrics.md) might be a more efficient choice than the Top Hits Aggregation.
+If you **only** need `docvalue_fields`, `size`, and `sort` then [Top metrics](/reference/aggregations/search-aggregations-metrics-top-metrics.md) might be a more efficient choice than the Top Hits Aggregation.
 ::::
 
@@ -44,7 +44,7 @@ Which returns:
 }
 ```
 
-`top_metrics` is fairly similar to [`top_hits`](/reference/data-analysis/aggregations/search-aggregations-metrics-top-hits-aggregation.md) in spirit but because it is more limited it is able to do its job using less memory and is often faster.
+`top_metrics` is fairly similar to [`top_hits`](/reference/aggregations/search-aggregations-metrics-top-hits-aggregation.md) in spirit but because it is more limited it is able to do its job using less memory and is often faster.
 
 ## `sort` [_sort]
@@ -268,7 +268,7 @@ If `size` is more than `1` the `top_metrics` aggregation can’t be the **target
 
 ### Use with terms [search-aggregations-metrics-top-metrics-example-terms]
 
-This aggregation should be quite useful inside of [`terms`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation, to, say, find the last value reported by each server.
+This aggregation should be quite useful inside of [`terms`](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation, to, say, find the last value reported by each server.
 
 $$$search-aggregations-metrics-top-metrics-terms$$$
@@ -23,10 +23,10 @@ A sibling pipeline aggregation which calculates the mean value of a specified me
 ## Parameters [avg-bucket-params]
 
 `buckets_path`
-: (Required, string) Path to the buckets to average. For syntax, see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax).
+: (Required, string) Path to the buckets to average. For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).
 
 `gap_policy`
-: (Optional, string) Policy to apply when gaps are found in the data. For valid values, see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy). Defaults to `skip`.
+: (Optional, string) Policy to apply when gaps are found in the data. For valid values, see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy). Defaults to `skip`.
 
 `format`
 : (Optional, string) [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property.
@@ -33,8 +33,8 @@ $$$bucket-script-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
 | `script` | The script to run for this aggregation. The script can be inline, file or indexed. (see [Scripting](docs-content://explore-analyze/scripting.md)for more details) | Required | |
-| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable(see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable(see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 | `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for theoutput value. If specified, the formatted value is returned in the aggregation’s`value_as_string` property | Optional | `null` |
 
 The following snippet calculates the ratio percentage of t-shirt sales compared to total sales each month:
@@ -38,8 +38,8 @@ $$$bucket-selector-params$$$
 | Parameter Name | Description | Required | Default Value |
 | --- | --- | --- | --- |
 | `script` | The script to run for this aggregation. The script can be inline, file or indexed. (see [Scripting](docs-content://explore-analyze/scripting.md)for more details) | Required | |
-| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable(see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `buckets_path` | A map of script variables and their associated path to the buckets we wish to use for the variable(see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 
 The following snippet only retains buckets where the total sales for the month is more than 200:
@@ -42,7 +42,7 @@ $$$bucket-sort-params$$$
 | `sort` | The list of fields to sort on. See [`sort`](/reference/elasticsearch/rest-apis/sort-search-results.md) for more details. | Optional | |
 | `from` | Buckets in positions prior to the set value will be truncated. | Optional | `0` |
 | `size` | The number of buckets to return. Defaults to all buckets of the parent aggregation. | Optional | |
-| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
+| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
 
 The following snippet returns the buckets corresponding to the 3 months with the highest total sales in descending order:
@ -27,7 +27,7 @@ $$$cumulative-cardinality-params$$$
|
|||
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the cardinality aggregation we wish to find the cumulative cardinality for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `buckets_path` | The path to the cardinality aggregation we wish to find the cumulative cardinality for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
The following snippet calculates the cumulative cardinality of the total daily `users`:
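A minimal sketch (the `user_hits` index and the `timestamp`/`user_id` fields are assumptions for illustration) points `cumulative_cardinality` at a sibling `cardinality` metric inside a daily histogram:

```console
GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "distinct_users": { "cardinality": { "field": "user_id" } },
        "total_new_users": {
          "cumulative_cardinality": { "buckets_path": "distinct_users" }
        }
      }
    }
  }
}
```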
@@ -25,7 +25,7 @@ $$$cumulative-sum-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to find the cumulative sum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `buckets_path` | The path to the buckets we wish to find the cumulative sum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
The following snippet calculates the cumulative sum of the total monthly `sales`:
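A minimal sketch (index and field names assumed for illustration) embeds `cumulative_sum` next to the `sum` metric it reads from:

```console
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": {
        "sales": { "sum": { "field": "price" } },
        "cumulative_sales": {
          "cumulative_sum": { "buckets_path": "sales" }
        }
      }
    }
  }
}
```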
@@ -23,8 +23,8 @@ $$$derivative-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to find the derivative for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `buckets_path` | The path to the buckets we wish to find the derivative for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
@@ -27,8 +27,8 @@ $$$extended-stats-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
| `sigma` | The number of standard deviations above/below the mean to display | Optional | 2 |
@@ -43,7 +43,7 @@ $$$inference-bucket-params$$$
| --- | --- | --- | --- |
| `model_id` | The ID or alias for the trained model. | Required | - |
| `inference_config` | Contains the inference type and its options. There are two types: [`regression`](#inference-agg-regression-opt) and [`classification`](#inference-agg-classification-opt) | Optional | - |
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | - |
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details | Required | - |
## Configuration options for {{infer}} models [_configuration_options_for_infer_models]
@@ -25,8 +25,8 @@ $$$max-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to find the maximum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `buckets_path` | The path to the buckets we wish to find the maximum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
The following snippet calculates the maximum of the total monthly `sales`:
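A minimal sketch (index and field names assumed for illustration): `max_bucket` is a sibling aggregation of the histogram, so its `buckets_path` crosses into it with `>`:

```console
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": { "sales": { "sum": { "field": "price" } } }
    },
    "max_monthly_sales": {
      "max_bucket": { "buckets_path": "sales_per_month>sales" }
    }
  }
}
```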
@@ -25,8 +25,8 @@ $$$min-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to find the minimum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `buckets_path` | The path to the buckets we wish to find the minimum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
The following snippet calculates the minimum of the total monthly `sales`:
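A minimal sketch (index and field names assumed for illustration), mirroring the `max_bucket` form:

```console
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": { "sales": { "sum": { "field": "price" } } }
    },
    "min_monthly_sales": {
      "min_bucket": { "buckets_path": "sales_per_month>sales" }
    }
  }
}
```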
@@ -27,10 +27,10 @@ $$$moving-fn-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `window` | The size of window to "slide" across the histogram. | Required | |
| `script` | The script that should be executed on each window of data | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data. See [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy). | Optional | `skip` |
| `gap_policy` | The policy to apply when gaps are found in the data. See [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy). | Optional | `skip` |
| `shift` | [Shift](#shift-parameter) of window position. | Optional | 0 |
`moving_fn` aggregations must be embedded inside of a `histogram` or `date_histogram` aggregation. They can be embedded like any other metric aggregation:
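A minimal sketch (index and field names assumed for illustration) of a `moving_fn` nested inside a `date_histogram`, averaging a sibling `sum` over a sliding window:

```console
POST /_search
{
  "size": 0,
  "aggs": {
    "my_date_histo": {
      "date_histogram": { "field": "date", "calendar_interval": "1M" },
      "aggs": {
        "the_sum": { "sum": { "field": "price" } },
        "the_movfn": {
          "moving_fn": {
            "buckets_path": "the_sum",
            "window": 10,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    }
  }
}
```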
@@ -67,7 +67,7 @@ POST /_search
3. Finally, we specify a `moving_fn` aggregation which uses "the_sum" metric as its input.
Moving averages are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add numeric metrics, such as a `sum`, inside of that histogram. Finally, the `moving_fn` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
Moving averages are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add numeric metrics, such as a `sum`, inside of that histogram. Finally, the `moving_fn` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
An example response from the above aggregation may look like:
@@ -7,9 +7,9 @@ mapped_pages:
# Moving percentiles aggregation [search-aggregations-pipeline-moving-percentiles-aggregation]
Given an ordered series of [percentiles](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md), the Moving Percentile aggregation will slide a window across those percentiles and allow the user to compute the cumulative percentile.
Given an ordered series of [percentiles](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md), the Moving Percentile aggregation will slide a window across those percentiles and allow the user to compute the cumulative percentile.
This is conceptually very similar to the [Moving Function](/reference/data-analysis/aggregations/search-aggregations-pipeline-movfn-aggregation.md) pipeline aggregation, except it works on the percentile sketches instead of the actual bucket values.
This is conceptually very similar to the [Moving Function](/reference/aggregations/search-aggregations-pipeline-movfn-aggregation.md) pipeline aggregation, except it works on the percentile sketches instead of the actual bucket values.
## Syntax [_syntax_19]
@@ -28,9 +28,9 @@ $$$moving-percentiles-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | Path to the percentile of interest (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `buckets_path` | Path to the percentile of interest (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `window` | The size of window to "slide" across the histogram. | Required | |
| `shift` | [Shift](/reference/data-analysis/aggregations/search-aggregations-pipeline-movfn-aggregation.md#shift-parameter) of window position. | Optional | 0 |
| `shift` | [Shift](/reference/aggregations/search-aggregations-pipeline-movfn-aggregation.md#shift-parameter) of window position. | Optional | 0 |
`moving_percentiles` aggregations must be embedded inside of a `histogram` or `date_histogram` aggregation. They can be embedded like any other metric aggregation:
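A minimal sketch (index and field names assumed for illustration) of a `moving_percentiles` reading a sibling `percentiles` sketch inside a `date_histogram`:

```console
POST /_search
{
  "size": 0,
  "aggs": {
    "my_date_histo": {
      "date_histogram": { "field": "date", "calendar_interval": "1M" },
      "aggs": {
        "the_percentile": {
          "percentiles": { "field": "price", "percents": [ 1.0, 99.0 ] }
        },
        "the_movperc": {
          "moving_percentiles": { "buckets_path": "the_percentile", "window": 10 }
        }
      }
    }
  }
}
```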
@@ -68,7 +68,7 @@ POST /_search
3. Finally, we specify a `moving_percentiles` aggregation which uses "the_percentile" sketch as its input.
Moving percentiles are built by first specifying a `histogram` or `date_histogram` over a field. You then add a percentile metric inside of that histogram. Finally, the `moving_percentiles` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at the percentiles aggregation inside of the histogram (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
Moving percentiles are built by first specifying a `histogram` or `date_histogram` over a field. You then add a percentile metric inside of that histogram. Finally, the `moving_percentiles` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at the percentiles aggregation inside of the histogram (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
And the following may be the response:
@@ -132,7 +132,7 @@ And the following may be the response:
}
```
The output format of the `moving_percentiles` aggregation is inherited from the format of the referenced [`percentiles`](/reference/data-analysis/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
The output format of the `moving_percentiles` aggregation is inherited from the format of the referenced [`percentiles`](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
Moving percentiles pipeline aggregations always run with `skip` gap policy.
@@ -7,7 +7,7 @@ mapped_pages:
# Normalize aggregation [search-aggregations-pipeline-normalize-aggregation]
A parent pipeline aggregation which calculates the specific normalized/rescaled value for a specific bucket value. Values that cannot be normalized will be skipped using the [skip gap policy](/reference/data-analysis/aggregations/pipeline.md#gap-policy).
A parent pipeline aggregation which calculates the specific normalized/rescaled value for a specific bucket value. Values that cannot be normalized will be skipped using the [skip gap policy](/reference/aggregations/pipeline.md#gap-policy).
## Syntax [_syntax_20]
@@ -26,7 +26,7 @@ $$$normalize_pipeline-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to normalize (see [`buckets_path` syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `buckets_path` | The path to the buckets we wish to normalize (see [`buckets_path` syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `method` | The specific [method](#normalize_pipeline-method) to apply | Required | |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
@@ -25,8 +25,8 @@ $$$percentiles-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to find the percentiles for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `buckets_path` | The path to the buckets we wish to find the percentiles for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
| `percents` | The list of percentiles to calculate | Optional | `[ 1, 5, 25, 50, 75, 95, 99 ]` |
| `keyed` | Flag which returns the range as a hash instead of an array of key-value pairs | Optional | `true` |
@@ -15,7 +15,7 @@ Single periods are also useful for transforming data into a stationary series. I
By calculating the first-difference, we de-trend the data (e.g. remove a constant, linear trend). We can see that the data becomes a stationary series (e.g. the first difference is randomly distributed around zero, and doesn’t seem to exhibit any pattern/behavior). The transformation reveals that the dataset is following a random-walk; the value is the previous value +/- a random amount. This insight allows selection of further tools for analysis.
:::{image} ../../../images/dow.png
:::{image} images/dow.png
:alt: dow
:title: Dow Jones plotted and made stationary with first-differencing
:name: serialdiff_dow
@@ -25,7 +25,7 @@ Larger periods can be used to remove seasonal / cyclic behavior. In this example
The first-difference removes the constant trend, leaving just a sine wave. The 30th-difference is then applied to the first-difference to remove the cyclic behavior, leaving a stationary series which is amenable to other analysis.
:::{image} ../../../images/lemmings.png
:::{image} images/lemmings.png
:alt: lemmings
:title: Lemmings data plotted and made stationary with 1st and 30th difference
:name: serialdiff_lemmings
@@ -48,7 +48,7 @@ $$$serial-diff-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `buckets_path` | Path to the metric of interest (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `lag` | The historical bucket to subtract from the current value. E.g. a lag of 7 will subtract the current value from the value 7 buckets ago. Must be a positive, non-zero integer | Optional | `1` |
| `gap_policy` | Determines what should happen when a gap in the data is encountered. | Optional | `insert_zeros` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
@@ -88,6 +88,6 @@ POST /_search
3. Finally, we specify a `serial_diff` aggregation which uses "the_sum" metric as its input.
Serial differences are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add normal metrics, such as a `sum`, inside of that histogram. Finally, the `serial_diff` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
Serial differences are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally add normal metrics, such as a `sum`, inside of that histogram. Finally, the `serial_diff` is embedded inside the histogram. The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for a description of the syntax for `buckets_path`).
@@ -25,8 +25,8 @@ $$$stats-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `buckets_path` | The path to the buckets we wish to calculate stats for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property | Optional | `null` |
The following snippet calculates the stats for monthly `sales`:
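A minimal sketch (index and field names assumed for illustration): `stats_bucket` sits beside the histogram and reads its `sales` metric across all monthly buckets:

```console
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": { "sales": { "sum": { "field": "price" } } }
    },
    "stats_monthly_sales": {
      "stats_bucket": { "buckets_path": "sales_per_month>sales" }
    }
  }
}
```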
@@ -25,8 +25,8 @@ $$$sum-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `buckets_path` | The path to the buckets we wish to find the sum for (see [`buckets_path` Syntax](/reference/data-analysis/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/data-analysis/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `buckets_path` | The path to the buckets we wish to find the sum for (see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details) | Required | |
| `gap_policy` | The policy to apply when gaps are found in the data (see [Dealing with gaps in the data](/reference/aggregations/pipeline.md#gap-policy) for more details) | Optional | `skip` |
| `format` | [DecimalFormat pattern](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/DecimalFormat.html) for the output value. If specified, the formatted value is returned in the aggregation’s `value_as_string` property. | Optional | `null` |
The following snippet calculates the sum of all the total monthly `sales` buckets:
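A minimal sketch (index and field names assumed for illustration), following the same sibling-aggregation pattern:

```console
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": { "sales": { "sum": { "field": "price" } } }
    },
    "sum_monthly_sales": {
      "sum_bucket": { "buckets_path": "sales_per_month>sales" }
    }
  }
}
```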