elasticsearch/docs/reference/esql
Nik Everett 9022cccba7
ESQL: CATEGORIZE as a BlockHash (#114317)
Re-implement `CATEGORIZE` in a way that works for multi-node clusters.

This requires two passes: data is first categorized on each data node, then the categorizers from all data nodes are merged on the coordinator node and the previously categorized rows are re-categorized.
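
The two passes can be illustrated with a small Python sketch. This is not the Elasticsearch implementation: `ToyCategorizer`, `template`, `first_pass`, and `coordinator_merge` are hypothetical names, and the rule "a token containing a digit is variable" is an assumption made only to keep the example short. It shows local categorization on each node, followed by a merge on the coordinator that remaps local category ids to merged ones.

```python
import re
from collections import Counter


def template(message):
    # Toy rule (an assumption for this sketch): tokens with a digit are variable.
    return " ".join("*" if re.search(r"\d", tok) else tok for tok in message.split())


class ToyCategorizer:
    """Hypothetical stand-in for a per-node categorizer."""

    def __init__(self):
        self.templates = []  # category id -> template
        self._ids = {}       # template -> category id

    def id_for(self, tmpl):
        # Assign the next free id to a template seen for the first time.
        if tmpl not in self._ids:
            self._ids[tmpl] = len(self.templates)
            self.templates.append(tmpl)
        return self._ids[tmpl]

    def categorize(self, message):
        return self.id_for(template(message))


def first_pass(messages):
    # Runs on a data node: categorize rows and count them per local id.
    categorizer = ToyCategorizer()
    counts = Counter(categorizer.categorize(m) for m in messages)
    return categorizer, counts


def coordinator_merge(partials):
    # Runs on the coordinator: merge categorizers, then re-categorize
    # by translating every local id into the merged id space.
    merged = ToyCategorizer()
    totals = Counter()
    for categorizer, counts in partials:
        remap = [merged.id_for(t) for t in categorizer.templates]
        for local_id, n in counts.items():
            totals[remap[local_id]] += n
    return {merged.templates[i]: n for i, n in totals.items()}


node_a = first_pass(["Connected to 10.0.0.1", "Disk full on sda1", "Connected to 10.0.0.2"])
node_b = first_pass(["Disk full on sdb2", "Connected to 10.0.0.3"])
result = coordinator_merge([node_a, node_b])
print(result)
```

In this toy version merging only renumbers categories; a real categorizer may also coalesce similar categories during the merge, which is why previously categorized rows need to be re-categorized.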

BlockHashes, used in HashAggregations, already work in a very similar way. E.g. for queries like `... | STATS ... BY field1, field2` they map values for `field1` and `field2` to unique integer ids that are then passed to the actual aggregate functions to identify which "bucket" a row belongs to. When passed from the data nodes to the coordinator, the BlockHashes are also merged to obtain unique ids for every value in `field1, field2` that is seen on the coordinator (not only on the local data nodes).
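
As a rough illustration of the id mapping and merge described above, here is a minimal Python sketch. `ToyBlockHash` and `merge_into` are hypothetical names invented for this example and do not mirror the actual Java classes; the only assumption is that each distinct `(field1, field2)` combination gets the next free integer id.

```python
class ToyBlockHash:
    """Hypothetical sketch: maps grouping keys to dense integer ids."""

    def __init__(self):
        self._ids = {}

    def add(self, key):
        # Existing keys keep their id; new keys get the next free id.
        return self._ids.setdefault(key, len(self._ids))

    def keys(self):
        # Dicts preserve insertion order, so position equals id.
        return list(self._ids)


def merge_into(coordinator, local):
    # Returns a list where index = local id and value = coordinator id.
    return [coordinator.add(key) for key in local.keys()]


# Each data node assigns ids independently.
node_a = ToyBlockHash()
ids_a = [node_a.add(k) for k in [("x", 1), ("y", 2), ("x", 1)]]
node_b = ToyBlockHash()
ids_b = [node_b.add(k) for k in [("y", 2), ("z", 3)]]

# The coordinator merges both hashes into one id space.
coordinator = ToyBlockHash()
remap_a = merge_into(coordinator, node_a)
remap_b = merge_into(coordinator, node_b)
print(ids_a, ids_b, remap_a, remap_b)
```

Note that `("y", 2)` has id 1 on the first node and id 0 on the second; after the merge both map to the same coordinator id, so partial aggregation results for that bucket can be combined.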

Therefore, we re-implement `CATEGORIZE` as a special BlockHash.

To choose the correct BlockHash when a query plan is mapped to physical operations, the `AggregateExec` query plan node needs to know that we will be categorizing the field `message` in a query containing `... | STATS ... BY c = CATEGORIZE(message)`. For this reason, _we do not extract the expression_ `c = CATEGORIZE(message)` into an `EVAL` node, in contrast to e.g. `STATS ... BY b = BUCKET(field, 10)`. The expression `c = CATEGORIZE(message)` simply remains inside the `AggregateExec`'s groupings.

**Important limitation:** For now, a `STATS` command that uses `CATEGORIZE` can have only one grouping overall, and that grouping must be the `CATEGORIZE`.
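
The selection logic and the limitation can be pictured with the following Python sketch. It is only an illustration under stated assumptions: `FieldRef`, `Categorize`, and `choose_block_hash` are hypothetical names, and the returned strings stand in for whichever BlockHash implementation the real planner would pick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRef:
    """Hypothetical plain grouping on a field, e.g. BY field1."""
    name: str


@dataclass(frozen=True)
class Categorize:
    """Hypothetical grouping expression, e.g. BY c = CATEGORIZE(message)."""
    field: FieldRef


def choose_block_hash(groupings):
    # Because CATEGORIZE stays inside the aggregation's groupings, the
    # planner can inspect them directly when picking a hash strategy.
    if any(isinstance(g, Categorize) for g in groupings):
        if len(groupings) != 1:
            raise ValueError("CATEGORIZE must be the only grouping in STATS")
        return "categorize"
    return "generic"


print(choose_block_hash([Categorize(FieldRef("message"))]))
print(choose_block_hash([FieldRef("field1"), FieldRef("field2")]))
```

Had the expression been extracted into an `EVAL` node, the groupings would contain only a plain reference to `c`, and this kind of check could no longer tell that categorization is required.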
2024-11-27 17:44:55 +01:00
functions ESQL: CATEGORIZE as a BlockHash (#114317) 2024-11-27 17:44:55 +01:00
processing-commands Add docs for aggs filtering (#116681) 2024-11-22 13:26:30 +01:00
source-commands ESQL: Validate unique plan attribute names (#110488) 2024-07-17 11:39:02 +02:00
esql-across-clusters.asciidoc CCS metadata is opt-in in ESQL JSON responses (#114437) 2024-10-11 15:03:26 -04:00
esql-apis.asciidoc (Doc+) Link API doc to parent object - part1 (#111951) 2024-08-20 14:58:18 -06:00
esql-async-query-api.asciidoc Remove esql version from docs (#108933) 2024-05-23 10:36:15 -04:00
esql-async-query-delete-api.asciidoc (Doc+) Link API doc to parent object - part1 (#111951) 2024-08-20 14:58:18 -06:00
esql-async-query-get-api.asciidoc Add ES|QL async query api docs (#104054) 2024-01-09 09:17:02 +00:00
esql-commands.asciidoc Esql/lookup join grammar (#116515) 2024-11-19 17:52:24 -08:00
esql-enrich-data.asciidoc Added stricter range type checks and runtime warnings for ENRICH (#115091) 2024-11-19 16:34:21 +01:00
esql-examples.asciidoc [DOCS] Small ES|QL improvements (#101877) 2023-11-07 17:24:59 +01:00
esql-functions-operators.asciidoc ES|QL Add full-text search to the functions docs page (#116024) 2024-11-01 12:04:55 +00:00
esql-get-started.asciidoc Revert "[DOCS] Remove ESQL demo env link from 8.14+ (#109562)" (#109579) 2024-06-11 17:04:37 +02:00
esql-kibana.asciidoc Docs for starred esql queries in Kibana (#117468) 2024-11-25 15:13:23 +01:00
esql-language.asciidoc [ES|QL][DOCS] Add docs for date_period and time_duration (#116368) 2024-11-19 07:48:35 -05:00
esql-limitations.asciidoc Esql Enable Date Nanos (#117080) 2024-11-20 09:31:01 -05:00
esql-multi-index.asciidoc ESQL: Fix for overzealous validation in case of invalid mapped fields (#111475) 2024-08-09 09:38:14 +02:00
esql-process-data-with-dissect-grok.asciidoc ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (#115320) 2024-10-24 09:19:46 +02:00
esql-query-api.asciidoc Esql/lookup join grammar (#116515) 2024-11-19 17:52:24 -08:00
esql-rest.asciidoc Collect and display execution metadata for ES|QL cross cluster searches (#112595) 2024-09-30 16:03:39 -04:00
esql-security-solution.asciidoc Update esql-security-solution.asciidoc (#104531) 2024-01-18 15:48:43 +01:00
esql-syntax.asciidoc [ES|QL][DOCS] Add docs for date_period and time_duration (#116368) 2024-11-19 07:48:35 -05:00
esql-using.asciidoc Union types documentation (#110183) 2024-07-16 12:06:19 +02:00
implicit-casting.asciidoc [ES|QL][DOCS] Add docs for date_period and time_duration (#116368) 2024-11-19 07:48:35 -05:00
index.asciidoc [DOCS] ESQL goes GA (#108342) 2024-05-07 14:12:50 +02:00
metadata-fields.asciidoc Reapply "ESQL: Expose "_ignored" metadata field" (#108864) (#108871) 2024-05-22 07:06:04 -04:00
multivalued-fields.asciidoc Docs: ESQL doesn't preserve nulls in a list (#114335) 2024-10-09 03:17:56 +11:00
task-management.asciidoc [DOCS] One more round of restructuring the ES|QL documentation (#101340) 2023-10-26 10:57:05 +02:00
time-spans.asciidoc [ES|QL][DOCS] Add docs for date_period and time_duration (#116368) 2024-11-19 07:48:35 -05:00