elasticsearch/docs/reference/esql/functions
Nik Everett 9022cccba7
ESQL: CATEGORIZE as a BlockHash (#114317)
Re-implement `CATEGORIZE` in a way that works for multi-node clusters.

This requires two passes: data is first categorized on each data node, then the categorizers from all data nodes are merged on the coordinator node and the previously categorized rows are re-categorized.

BlockHashes, used in HashAggregations, already work in a very similar way. E.g. for queries like `... | STATS ... BY field1, field2` they map values for `field1` and `field2` to unique integer ids that are then passed to the actual aggregate functions to identify which "bucket" a row belongs to. When passed from the data nodes to the coordinator, the BlockHashes are also merged to obtain unique ids for every value in `field1, field2` that is seen on the coordinator (not only on the local data nodes).

Therefore, we re-implement `CATEGORIZE` as a special BlockHash.
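
The id mapping and merge step described above can be pictured with a minimal sketch. This is not Elasticsearch's `BlockHash` API: `ToyBlockHash`, `add`, `merge`, and `size` are hypothetical names chosen for illustration, and plain Java collections stand in for the real block-based data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a BlockHash: maps grouping keys to dense integer ids. */
final class ToyBlockHash {
    private final Map<List<Object>, Integer> ids = new HashMap<>();
    private final List<List<Object>> keys = new ArrayList<>();

    /** Returns the id for a (field1, field2, ...) key, assigning the next free id if unseen. */
    int add(Object... values) {
        return addKey(List.copyOf(Arrays.asList(values)));
    }

    private int addKey(List<Object> key) {
        Integer existing = ids.get(key);
        if (existing != null) {
            return existing;
        }
        int id = keys.size();
        ids.put(key, id);
        keys.add(key);
        return id;
    }

    /**
     * Merges a hash built elsewhere (e.g. on a data node) into this one.
     * Returns a remap array: remap[localId] is the id of the same key in this hash.
     */
    int[] merge(ToyBlockHash other) {
        int[] remap = new int[other.keys.size()];
        for (int localId = 0; localId < remap.length; localId++) {
            remap[localId] = addKey(other.keys.get(localId));
        }
        return remap;
    }

    /** Number of distinct keys seen so far. */
    int size() {
        return keys.size();
    }
}
```

In this sketch each data node would build its own `ToyBlockHash`, and the coordinator would call `merge` once per node, using the returned array to translate node-local bucket ids into coordinator-wide ones before combining aggregate state.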

To choose the correct BlockHash when a query plan is mapped to physical operations, the `AggregateExec` query plan node needs to know that we will be categorizing the field `message` in a query containing `... | STATS ... BY c = CATEGORIZE(message)`. For this reason, _we do not extract the expression_ `c = CATEGORIZE(message)` into an `EVAL` node, in contrast to e.g. `STATS ... BY b = BUCKET(field, 10)`. The expression `c = CATEGORIZE(message)` simply remains inside the `AggregateExec`'s groupings.

**Important limitation:** For now, to use `CATEGORIZE` in a `STATS` command, there can be only 1 grouping (the `CATEGORIZE`) overall.
2024-11-27 17:44:55 +01:00
appendix ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (#111409) 2024-07-30 03:07:15 +10:00
description ES|QL kql function. (#116764) 2024-11-25 14:22:11 +01:00
examples ES|QL kql function. (#116764) 2024-11-25 14:22:11 +01:00
kibana ESQL: CATEGORIZE as a BlockHash (#114317) 2024-11-27 17:44:55 +01:00
layout ES|QL kql function. (#116764) 2024-11-25 14:22:11 +01:00
parameters ES|QL kql function. (#116764) 2024-11-25 14:22:11 +01:00
signature ES|QL kql function. (#116764) 2024-11-25 14:22:11 +01:00
types ESQL: CATEGORIZE as a BlockHash (#114317) 2024-11-27 17:44:55 +01:00
aggregation-functions.asciidoc [ES|QL] Add a standard deviation function (#116531) 2024-11-22 12:33:46 -10:00
binary.asciidoc [ES|QL][DOCS] Add docs for date_period and time_duration (#116368) 2024-11-19 07:48:35 -05:00
cast.asciidoc ESQL: Document the cast operator (::) (#107871) 2024-04-25 10:10:59 -04:00
conditional-functions-and-expressions.asciidoc [ES|QL] Add/Modify annotations for spatial and conditional functions for better doc generation (#107722) 2024-05-10 14:49:25 -04:00
date-time-functions.asciidoc ES|QL: Add unit tests for now() function (#108498) 2024-05-10 14:28:19 +02:00
grouping-functions.asciidoc ESQL: Document BUCKET as a grouping function (#107864) 2024-04-25 12:38:12 -04:00
in.asciidoc [DOCS] Examples for ES|QL DISSECT and WHERE (#102591) 2023-11-27 10:56:48 +01:00
ip-functions.asciidoc ESQL: Add aggregations testing base and docs (#110042) 2024-06-27 21:21:55 +10:00
like.asciidoc ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (#115320) 2024-10-24 09:19:46 +02:00
logical.asciidoc Restructure ES|QL docs (#100806) 2023-10-17 17:36:14 +02:00
math-functions.asciidoc [ES|QL] Add hypot function (#114382) 2024-10-11 09:33:45 -10:00
mv-functions.asciidoc ESQL: Add docs for MV_PERCENTILE (#117377) 2024-11-23 06:41:18 +11:00
operators.asciidoc ESQL - match operator included in non-snapshot builds (#116819) 2024-11-21 07:45:22 +01:00
predicates.asciidoc [DOCS] Examples for ES|QL DISSECT and WHERE (#102591) 2023-11-27 10:56:48 +01:00
README.md ESQL: Generate kibana inline docs (#106782) 2024-04-09 14:19:48 -04:00
rlike.asciidoc ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (#115320) 2024-10-24 09:19:46 +02:00
search-functions.asciidoc ES|QL Add full-text search to the functions docs page (#116024) 2024-11-01 12:04:55 +00:00
search.asciidoc ESQL - match operator included in non-snapshot builds (#116819) 2024-11-21 07:45:22 +01:00
spatial-functions.asciidoc Make spatial search functions not preview (#117489) 2024-11-25 17:04:48 +01:00
string-functions.asciidoc [ES|QL] Add support BYTE_LENGTH scalar function (#116591) 2024-11-13 00:42:19 +02:00
type-conversion-functions.asciidoc [ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations (#109193) 2024-09-09 14:56:43 -04:00
unary.asciidoc ESQL: Add type tables for operators to docs (#103206) 2023-12-11 10:51:38 -05:00

The files in these subdirectories are generated by ESQL's test suite:

  • description - description of each function scraped from @FunctionInfo#description
  • examples - examples of each function scraped from @FunctionInfo#examples
  • parameters - description of each function's parameters scraped from @Param
  • signature - railroad diagram of the syntax to invoke each function
  • types - a table of each supported combination of types for each parameter. These are generated from tests.
  • layout - a fully generated description for each function
  • kibana/definition - function definitions for kibana's ESQL editor
  • kibana/docs - the inline docs for kibana

Most functions can use the docs generated in the layout directory. If a function needs something more custom, we can make a file in this directory that uses include:: to pull in any parts of the files above.

To regenerate the files for a function, run its tests using Gradle:

./gradlew :x-pack:plugin:esql:test -Dtests.class='*SinTests'

To regenerate the files for all functions, run all of ESQL's tests using Gradle:

./gradlew :x-pack:plugin:esql:test