The files in these subdirectories are generated by ESQL's test suite (a sketch of the source annotations follows the list):

- `description` - description of each function, scraped from `@FunctionInfo#description`
- `examples` - examples of each function, scraped from `@FunctionInfo#examples`
- `parameters` - descriptions of each function's parameters, scraped from `@Param`
- `signature` - railroad diagram of the syntax to invoke each function
- `types` - a table of each supported type combination for each parameter, generated from tests
- `layout` - a fully generated description for each function
- `kibana/definition` - function definitions for Kibana's ESQL editor
- `kibana/docs` - the inline docs for Kibana
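
For orientation, here is a hedged sketch of what those annotations look like on a function's constructor, loosely modeled on ESQL's `Sin` function. The field names (`returnType`, `description`, `examples`, `file`, `tag`, `name`, `type`) follow the `@FunctionInfo`, `@Example`, and `@Param` annotations, but the values are illustrative, not copied from the source:

```java
// Illustrative sketch only: the enclosing class and imports are elided, and
// the values are made up. The test suite scrapes docs from annotations like these.
@FunctionInfo(
    returnType = "double",
    description = "Returns the sine of an angle.",         // -> description/
    examples = { @Example(file = "floats", tag = "sin") }  // -> examples/
)
public Sin(
    Source source,
    @Param(
        name = "angle",
        type = { "double", "integer", "long", "unsigned_long" },
        description = "An angle, in radians."              // -> parameters/
    ) Expression angle
) {
    super(source, angle);
}
```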
Most functions can use the docs generated in the `layout` directory.
If we need something more custom for a function we can make a file in this
directory that can `include::` any parts of the files above.
To regenerate the files for a function, run its tests using Gradle:

```
./gradlew :x-pack:plugin:esql:test -Dtests.class='*SinTests'
```

To regenerate the files for all functions, run all of ESQL's tests:

```
./gradlew :x-pack:plugin:esql:test
```