[discrete] [[esql-stats-by]] === `STATS ... BY` The `STATS ... BY` processing command groups rows according to a common value and calculates one or more aggregated values over the grouped rows. **Syntax** [source,esql] ---- STATS [column1 =] expression1[, ..., [columnN =] expressionN] [BY grouping_expression1[, ..., grouping_expressionN]] ---- *Parameters* `columnX`:: The name by which the aggregated value is returned. If omitted, the name is equal to the corresponding expression (`expressionX`). If multiple columns have the same name, all but the rightmost column with this name will be ignored. `expressionX`:: An expression that computes an aggregated value. `grouping_expressionX`:: An expression that outputs the values to group by. If its name coincides with one of the computed columns, that column will be ignored. NOTE: Individual `null` values are skipped when computing aggregations. *Description* The `STATS ... BY` processing command groups rows according to a common value and calculate one or more aggregated values over the grouped rows. If `BY` is omitted, the output table contains exactly one row with the aggregations applied over the entire dataset. The following <> are supported: include::../functions/aggregation-functions.asciidoc[tag=agg_list] The following <> are supported: include::../functions/grouping-functions.asciidoc[tag=group_list] NOTE: `STATS` without any groups is much much faster than adding a group. NOTE: Grouping on a single expression is currently much more optimized than grouping on many expressions. In some tests we have seen grouping on a single `keyword` column to be five times faster than grouping on two `keyword` columns. Do not try to work around this by combining the two columns together with something like <> and then grouping - that is not going to be faster. *Examples* Calculating a statistic and grouping by the values of another column: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=stats] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=stats-result] |=== Omitting `BY` returns one row with the aggregations applied over the entire dataset: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result] |=== It's possible to calculate multiple values: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result] |=== [[esql-stats-mv-group]] If the grouping key is multivalued then the input row is in all groups: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=mv-group] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=mv-group-result] |=== It's also possible to group by multiple values: [source,esql] ---- include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues] ---- If the all grouping keys are multivalued then the input row is in all groups: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=multi-mv-group] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result] |=== Both the aggregating functions and the grouping expressions accept other functions. This is useful for using `STATS...BY` on multivalue columns. For example, to calculate the average salary change, you can use `MV_AVG` to first average the multiple values per employee, and use the result with the `AVG` function: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result] |=== An example of grouping by an expression is grouping employees on the first letter of their last name: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression-result] |=== Specifying the output column name is optional. If not specified, the new column name is equal to the expression. The following query returns a column named `AVG(salary)`: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result] |=== Because this name contains special characters, <> with backticks (+{backtick}+) when using it in subsequent commands: [source.merge.styled,esql] ---- include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result] |===