mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 09:28:55 -04:00
Re-implement `CATEGORIZE` in a way that works for multi-node clusters.

This requires that data is categorized on each data node in a first pass. The categorizers from each data node are then merged on the coordinator node, and previously categorized rows are re-categorized.

BlockHashes, used in HashAggregations, already work in a very similar way. E.g. for queries like `... | STATS ... BY field1, field2` they map values for `field1` and `field2` to unique integer ids that are then passed to the actual aggregate functions to identify which "bucket" a row belongs to. When passed from the data nodes to the coordinator, the BlockHashes are also merged to obtain unique ids for every value in `field1, field2` that is seen on the coordinator (not only on the local data nodes).

Therefore, we re-implement `CATEGORIZE` as a special BlockHash.

To choose the correct BlockHash when a query plan is mapped to physical operations, the `AggregateExec` query plan node needs to know that we will be categorizing the field `message` in a query containing `... | STATS ... BY c = CATEGORIZE(message)`. For this reason, _we do not extract the expression_ `c = CATEGORIZE(message)` into an `EVAL` node, in contrast to e.g. `STATS ... BY b = BUCKET(field, 10)`. The expression `c = CATEGORIZE(message)` simply remains inside the `AggregateExec`'s groupings.

**Important limitation:** For now, to use `CATEGORIZE` in a `STATS` command, it must be the only grouping.

||
---|---|---|
functions | ||
processing-commands | ||
source-commands | ||
esql-across-clusters.asciidoc | ||
esql-apis.asciidoc | ||
esql-async-query-api.asciidoc | ||
esql-async-query-delete-api.asciidoc | ||
esql-async-query-get-api.asciidoc | ||
esql-commands.asciidoc | ||
esql-enrich-data.asciidoc | ||
esql-examples.asciidoc | ||
esql-functions-operators.asciidoc | ||
esql-get-started.asciidoc | ||
esql-kibana.asciidoc | ||
esql-language.asciidoc | ||
esql-limitations.asciidoc | ||
esql-multi-index.asciidoc | ||
esql-process-data-with-dissect-grok.asciidoc | ||
esql-query-api.asciidoc | ||
esql-rest.asciidoc | ||
esql-security-solution.asciidoc | ||
esql-syntax.asciidoc | ||
esql-using.asciidoc | ||
implicit-casting.asciidoc | ||
index.asciidoc | ||
metadata-fields.asciidoc | ||
multivalued-fields.asciidoc | ||
task-management.asciidoc | ||
time-spans.asciidoc |
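
The two-pass scheme described in the commit message above (assign ids locally on each data node, then merge on the coordinator and re-map) can be illustrated with a minimal Python sketch. This is not the Elasticsearch implementation: `ToyBlockHash`, `toy_category`, and the sample rows are hypothetical names and data invented for illustration, and the "categorizer" here is a crude placeholder that groups messages by their first token. Real categorizers may also combine categories when they are merged, which this sketch does not model.

```python
class ToyBlockHash:
    """Hypothetical stand-in for a BlockHash: maps grouping keys to dense integer ids."""

    def __init__(self):
        self.ids = {}  # key -> id, in first-seen order

    def add(self, key):
        # Return the existing id, or assign the next free one.
        return self.ids.setdefault(key, len(self.ids))

    def merge(self, other):
        """Absorb another hash; return a map from its local ids to ids in this hash."""
        return {local_id: self.add(key) for key, local_id in other.ids.items()}


def toy_category(message):
    # Assumption: a placeholder for real text categorization.
    return message.split()[0]


# First pass: each "data node" categorizes its own rows and counts per local id.
node_rows = [
    ["Connected to 10.0.0.1", "Disconnected", "Connected to 10.0.0.2"],
    ["Disconnected", "Timeout after 30s", "Disconnected"],
]
partials = []
for rows in node_rows:
    local = ToyBlockHash()
    counts = {}
    for row in rows:
        bucket = local.add(toy_category(row))
        counts[bucket] = counts.get(bucket, 0) + 1
    partials.append((local, counts))

# Second pass: the "coordinator" merges the per-node hashes and re-maps
# each node's local bucket ids to ids that are unique across all nodes.
coordinator = ToyBlockHash()
totals = {}
for local, counts in partials:
    remap = coordinator.merge(local)
    for local_id, n in counts.items():
        global_id = remap[local_id]
        totals[global_id] = totals.get(global_id, 0) + n

result = {key: totals[i] for key, i in coordinator.ids.items()}
print(result)
```

Note that `Disconnected` receives local id 1 on the first node and local id 0 on the second, which is why the coordinator cannot combine partial results by id alone and needs the re-mapping step.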