elasticsearch/docs/reference/esql/processing-commands/stats.asciidoc
Abdon Pijpelink 76ab37b35d
[DOCS] Uniform formatting for ES|QL commands (#101728)
* Source commands

* Missing word

* Processing commands

* Apply suggestions from code review

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>

* Review feedback

* Add sort detail for mv

* More review feedback

2023-11-06 08:42:13 +01:00

[discrete]
[[esql-stats-by]]
=== `STATS ... BY`

**Syntax**

[source,esql]
----
STATS [column1 =] expression1[, ..., [columnN =] expressionN] [BY grouping_column1[, ..., grouping_columnN]]
----
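As a rough illustration of the syntax above, the two queries below differ only in whether the output column is given a name. This is a sketch that assumes a hypothetical `employees` index with a numeric `salary` field and a `languages` field; neither is defined on this page.

[source,esql]
----
// Explicit name: the aggregated value is returned in a column called avg_salary.
FROM employees
| STATS avg_salary = AVG(salary) BY languages
----

[source,esql]
----
// No name given: the column is expected to be named after the expression, AVG(salary).
FROM employees
| STATS AVG(salary) BY languages
----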
*Parameters*

`columnX`::
The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (`expressionX`).
`expressionX`::
An expression that computes an aggregated value.
`grouping_columnX`::
The column containing the values to group by.

*Description*

The `STATS ... BY` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows. If `BY` is
omitted, the output table contains exactly one row with the aggregations applied
over the entire dataset.

The following aggregation functions are supported:

include::../functions/aggregation-functions.asciidoc[tag=agg_list]

NOTE: `STATS` without any groups is much faster than adding a group.

NOTE: Grouping on a single column is currently much more optimized than grouping
on many columns. In some tests, grouping on a single `keyword` column was five
times faster than grouping on two `keyword` columns. Do not try to work around
this by combining the two columns with something like <<esql-concat>> and then
grouping on the result: that is not going to be faster.

*Examples*

Calculating a statistic and grouping by the values of another column:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=stats]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=stats-result]
|===
Omitting `BY` returns one row with the aggregations applied over the entire
dataset:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=statsWithoutBy]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=statsWithoutBy-result]
|===
It's possible to calculate multiple values:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=statsCalcMultipleValues]
----
It's also possible to group by multiple values (only supported for long and
keyword family fields):

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=statsGroupByMultipleValues]
----
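Related to the note on grouping performance above, the following sketch contrasts the discouraged workaround with plain multi-column grouping. It assumes a hypothetical `employees` index with `keyword` fields `first_name` and `last_name` and an `emp_no` field; these names are for illustration only.

[source,esql]
----
// Discouraged: concatenating two columns into one key before grouping.
// Per the note above, this is not expected to be faster.
FROM employees
| EVAL full_name = CONCAT(first_name, last_name)
| STATS headcount = COUNT(emp_no) BY full_name
----

[source,esql]
----
// Preferred: group on both columns directly.
FROM employees
| STATS headcount = COUNT(emp_no) BY first_name, last_name
----

Besides the performance point, the direct form keeps each grouping value in its own output column, whereas the concatenated key merges them into a single string.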