mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-24 23:27:25 -04:00
202 lines
6 KiB
Text
202 lines
6 KiB
Text
[discrete]
|
|
[[esql-stats-by]]
|
|
=== `STATS`
|
|
|
|
The `STATS` processing command groups rows according to a common value
|
|
and calculates one or more aggregated values over the grouped rows.
|
|
|
|
**Syntax**
|
|
|
|
[source,esql]
|
|
----
|
|
STATS [column1 =] expression1 [WHERE boolean_expression1][,
|
|
...,
|
|
[columnN =] expressionN [WHERE boolean_expressionN]]
|
|
[BY grouping_expression1[, ..., grouping_expressionN]]
|
|
----
|
|
|
|
*Parameters*
|
|
|
|
`columnX`::
|
|
The name by which the aggregated value is returned. If omitted, the name is
|
|
equal to the corresponding expression (`expressionX`).
|
|
If multiple columns have the same name, all but the rightmost column with this
|
|
name will be ignored.
|
|
|
|
`expressionX`::
|
|
An expression that computes an aggregated value.
|
|
|
|
`grouping_expressionX`::
|
|
An expression that outputs the values to group by.
|
|
If its name coincides with one of the computed columns, that column will be ignored.
|
|
|
|
`boolean_expressionX`::
|
|
The condition that must be met for a row to be included in the evaluation of `expressionX`.
|
|
|
|
NOTE: Individual `null` values are skipped when computing aggregations.
|
|
|
|
*Description*
|
|
|
|
The `STATS` processing command groups rows according to a common value
|
|
and calculates one or more aggregated values over the grouped rows. For the
|
|
calculation of each aggregated value, the rows in a group can be filtered with
|
|
`WHERE`. If `BY` is omitted, the output table contains exactly one row with
|
|
the aggregations applied over the entire dataset.
|
|
|
|
The following <<esql-agg-functions,aggregation functions>> are supported:
|
|
|
|
include::../functions/aggregation-functions.asciidoc[tag=agg_list]
|
|
|
|
The following <<esql-group-functions,grouping functions>> are supported:
|
|
|
|
include::../functions/grouping-functions.asciidoc[tag=group_list]
|
|
|
|
NOTE: `STATS` without any groups is much much faster than adding a group.
|
|
|
|
NOTE: Grouping on a single expression is currently much more optimized than grouping
|
|
on many expressions. In some tests we have seen grouping on a single `keyword`
|
|
column to be five times faster than grouping on two `keyword` columns. Do
|
|
not try to work around this by combining the two columns together with
|
|
something like <<esql-concat>> and then grouping - that is not going to be
|
|
faster.
|
|
|
|
*Examples*
|
|
|
|
Calculating a statistic and grouping by the values of another column:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=stats]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=stats-result]
|
|
|===
|
|
|
|
Omitting `BY` returns one row with the aggregations applied over the entire
|
|
dataset:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result]
|
|
|===
|
|
|
|
It's possible to calculate multiple values:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
|
|
|===
|
|
|
|
To filter the rows that go into an aggregation, use the `WHERE` clause:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=aggFiltering]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=aggFiltering-result]
|
|
|===
|
|
|
|
The aggregations can be mixed, with and without a filter and grouping is
|
|
optional as well:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup-result]
|
|
|===
|
|
|
|
[[esql-stats-mv-group]]
|
|
If the grouping key is multivalued then the input row is in all groups:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=mv-group]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=mv-group-result]
|
|
|===
|
|
|
|
It's also possible to group by multiple values:
|
|
|
|
[source,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
|
|
----
|
|
|
|
If all the grouping keys are multivalued then the input row is in all groups:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=multi-mv-group]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
|
|
|===
|
|
|
|
Both the aggregating functions and the grouping expressions accept other
|
|
functions. This is useful for using `STATS` on multivalue columns.
|
|
For example, to calculate the average salary change, you can use `MV_AVG` to
|
|
first average the multiple values per employee, and use the result with the
|
|
`AVG` function:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
|
|
|===
|
|
|
|
An example of grouping by an expression is grouping employees on the first
|
|
letter of their last name:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression-result]
|
|
|===
|
|
|
|
Specifying the output column name is optional. If not specified, the new column
|
|
name is equal to the expression. The following query returns a column named
|
|
`AVG(salary)`:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result]
|
|
|===
|
|
|
|
Because this name contains special characters, <<esql-identifiers,it needs to be
|
|
quoted>> with backticks (+{backtick}+) when using it in subsequent commands:
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result]
|
|
|===
|