elasticsearch/docs/reference/esql/processing-commands/inlinestats.asciidoc
Nik Everett b5c6c2da30
ESQL: INLINESTATS (#109583)
This implements `INLINESTATS`. Most of the heavy lifting is done by
`LOOKUP`, with this change mostly adding a new abstraction to logical
plans, and interface I'm calling `Phased`. Implementing this interface
allows a logical plan node to cut the query into phases. `INLINESTATS`
implements it by asking for a "first phase" that's the same query, up to
`INLINESTATS`, but with `INLINESTATS` replaced with `STATS`. The next
phase replaces the `INLINESTATS` with a `LOOKUP` on the results of the
first phase.

So, this query:
```
FROM foo
| EVAL bar = a * b
| INLINESTATS m = MAX(bar) BY b
| WHERE m = bar
| LIMIT 1
```

gets split into
```
FROM foo
| EVAL bar = a * b
| STATS m = MAX(bar) BY b
```

followed by
```
FROM foo
| EVAL bar = a * b
| LOOKUP (results of m = MAX(bar) BY b) ON b
| WHERE m = bar
| LIMIT 1
```
2024-07-24 17:16:37 -04:00

102 lines
3 KiB
Text

[discrete]
[[esql-inlinestats-by]]
=== `INLINESTATS ... BY`
experimental::["INLINESTATS is highly experimental and only available in SNAPSHOT versions."]
The `INLINESTATS` command calculates an aggregate result and adds new columns
with the result to the stream of input data.
**Syntax**
[source,esql]
----
INLINESTATS [column1 =] expression1[, ..., [columnN =] expressionN]
[BY grouping_expression1[, ..., grouping_expressionN]]
----
*Parameters*
`columnX`::
The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (`expressionX`). If multiple columns
have the same name, all but the rightmost column with this name will be ignored.
`expressionX`::
An expression that computes an aggregated value. If its name coincides with one
of the computed columns, that column will be ignored.
`grouping_expressionX`::
An expression that outputs the values to group by.
NOTE: Individual `null` values are skipped when computing aggregations.
*Description*
The `INLINESTATS` command calculates an aggregate result and merges that result
back into the stream of input data. Without the optional `BY` clause this will
produce a single result which is appended to each row. With a `BY` clause this
will produce one result per grouping and merge the result into the stream based on
matching group keys.
All of the <<esql-agg-functions,aggregation functions>> are supported.
*Examples*
Find the employees that speak the most languages (it's a tie!):
[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=max-languages]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=max-languages-result]
|===
Find the longest tenured employee who's last name starts with each letter of the alphabet:
[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=longest-tenured-by-first]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=longest-tenured-by-first-result]
|===
Find the northern and southern most airports:
[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=extreme-airports]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=extreme-airports-result]
|===
NOTE: Our test data doesn't have many "small" airports.
If a `BY` field is multivalued then `INLINESTATS` will put the row in *each*
bucket like <<esql-stats-by>>:
[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=mv-group]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=mv-group-result]
|===
To treat each group key as its own row use <<esql-mv_expand>> before `INLINESTATS`:
[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=mv-expand]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=mv-expand-result]
|===