[ES|QL] Automate the retrieval of grouping functions (#210513)

## Summary

Closes https://github.com/elastic/kibana/issues/210313

Automates the retrieval of the grouping functions (`categorize` and `bucket`) for both the function definitions and the docs. The `bucket` signatures are tricky, so I overwrite them with our implementation. Everything else is retrieved from ES.
2025-04-24 01:38:56 -04:00 · 2025-02-12 13:28:05 +01:00 · 2025-02-12 13:28:05 +01:00 · 7683f01564
commit 7683f01564
parent bcfdd13c11
17 changed files with 1077 additions and 351 deletions
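
The `bucket` override mentioned in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the script itself: the `ParamType`, `Param`, `Signature`, and `BucketRow` types and the `toSignature` helper are simplified stand-ins assumed for this example (the real types come from `../src/definitions/types`), and only two of the rows from the table in the diff are reproduced.

```typescript
// Hypothetical, simplified stand-ins for the real types in ../src/definitions/types.
type ParamType = 'date' | 'date_period' | 'time_literal' | 'double' | 'integer' | 'long';

interface Param {
  name: string;
  type: ParamType;
  constantOnly?: boolean;
}

interface Signature {
  params: Param[];
  returnType: ParamType;
}

// One row per overload: [field, buckets, from, to, result].
// A null in the from/to slots means that range bound is not part of the overload.
type BucketRow = [ParamType, ParamType, ParamType | null, ParamType | null, ParamType];

// Expands one table row into a signature, skipping the range bounds when they are null.
function toSignature([fieldType, bucketType, fromType, toType, resultType]: BucketRow): Signature {
  const params: Param[] = [
    { name: 'field', type: fieldType },
    { name: 'buckets', type: bucketType, constantOnly: true },
  ];
  if (fromType) params.push({ name: 'startDate', type: fromType, constantOnly: true });
  if (toType) params.push({ name: 'endDate', type: toType, constantOnly: true });
  return { params, returnType: resultType };
}

// Two rows copied from the table in the diff: a fixed-size overload and a ranged overload.
const rows: BucketRow[] = [
  ['date', 'date_period', null, null, 'date'],
  ['date', 'integer', 'date', 'date', 'date'],
];

const signatures: Signature[] = rows.map(toSignature);
console.log(JSON.stringify(signatures, null, 2));
```

Keeping the overloads as a flat table of tuples makes the two-parameter and four-parameter forms easy to audit side by side, at the cost of maintaining the list by hand when Elasticsearch adds a new combination.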
--- a/src/platform/packages/private/kbn-language-documentation/scripts/generate_esql_docs.ts
+++ b/src/platform/packages/private/kbn-language-documentation/scripts/generate_esql_docs.ts
@@ -20,7 +20,8 @@ interface DocsSectionContent {

 (function () {
  const pathToElasticsearch = process.argv[2];
-  const { scalarFunctions, aggregationFunctions } = loadFunctionDocs(pathToElasticsearch);
+  const { scalarFunctions, aggregationFunctions, groupingFunctions } =
+    loadFunctionDocs(pathToElasticsearch);
  writeFunctionDocs(
    scalarFunctions,
    path.join(__dirname, '../src/sections/generated/scalar_functions.tsx')
@@ -29,6 +30,10 @@ interface DocsSectionContent {
    aggregationFunctions,
    path.join(__dirname, '../src/sections/generated/aggregation_functions.tsx')
  );
+  writeFunctionDocs(
+    groupingFunctions,
+    path.join(__dirname, '../src/sections/generated/grouping_functions.tsx')
+  );
 })();

 function loadFunctionDocs(pathToElasticsearch: string) {
@@ -48,6 +53,7 @@ function loadFunctionDocs(pathToElasticsearch: string) {

  const scalarFunctions = new Map<string, DocsSectionContent>();
  const aggregationFunctions = new Map<string, DocsSectionContent>();
+  const groupingFunctions = new Map<string, DocsSectionContent>();

  // Iterate over each file in the directory
  for (const file of docsFiles) {
@@ -80,10 +86,16 @@ function loadFunctionDocs(pathToElasticsearch: string) {
          preview: functionDefinition.preview,
        });
      }
+      if (functionDefinition.type === 'grouping') {
+        groupingFunctions.set(functionName, {
+          description: content,
+          preview: functionDefinition.preview,
+        });
+      }
    }
  }

-  return { scalarFunctions, aggregationFunctions };
+  return { scalarFunctions, aggregationFunctions, groupingFunctions };
 }

 function writeFunctionDocs(functionDocs: Map<string, DocsSectionContent>, pathToDocsFile: string) {
--- a/src/platform/packages/private/kbn-language-documentation/src/sections/esql_documentation_sections.tsx
+++ b/src/platform/packages/private/kbn-language-documentation/src/sections/esql_documentation_sections.tsx
@@ -682,139 +682,6 @@ Refer to **Operators** for an overview of the supported operators.
  ],
 };

-export const groupingFunctions = {
-  label: i18n.translate('languageDocumentation.documentationESQL.groupingFunctions', {
-    defaultMessage: 'Grouping functions',
-  }),
-  description: i18n.translate(
-    'languageDocumentation.documentationESQL.groupingFunctionsDocumentationESQLDescription',
-    {
-      defaultMessage: `These grouping functions can be used with \`STATS...BY\`:`,
-    }
-  ),
-  items: [
-    {
-      label: i18n.translate('languageDocumentation.documentationESQL.autoBucketFunction', {
-        defaultMessage: 'BUCKET',
-      }),
-      description: (
-        <Markdown
-          markdownContent={i18n.translate(
-            'languageDocumentation.documentationESQL.autoBucketFunction.markdown',
-            {
-              defaultMessage: `### BUCKET
-Creates groups of values - buckets - out of a datetime or numeric input. The size of the buckets can either be provided directly, or chosen based on a recommended count and values range.
-
-\`BUCKET\` works in two modes:
-
-1. Where the size of the bucket is computed based on a buckets count recommendation (four parameters) and a range.
-2. Where the bucket size is provided directly (two parameters).
-
-Using a target number of buckets, a start of a range, and an end of a range, \`BUCKET\` picks an appropriate bucket size to generate the target number of buckets or fewer.
-
-For example, requesting up to 20 buckets for a year will organize data into monthly intervals:
-
-\`\`\`
-FROM employees
-| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
-| STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
-| SORT hire_date
-\`\`\`
-
-**NOTE**: The goal isn’t to provide the exact target number of buckets, it’s to pick a range that provides _at most_ the target number of buckets.
-
-You can combine \`BUCKET\` with an aggregation to create a histogram:
-
-\`\`\`
-FROM employees
-| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
-| STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
-| SORT month
-\`\`\`
-
-**NOTE**: \`BUCKET\` does not create buckets that match zero documents. That’s why the previous example is missing \`1985-03-01\` and other dates.
-
-Asking for more buckets can result in a smaller range. For example, requesting at most 100 buckets in a year results in weekly buckets:
-
-\`\`\`
-FROM employees
-| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
-| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
-| SORT week
-\`\`\`
-
-**NOTE**: \`BUCKET\` does not filter any rows. It only uses the provided range to pick a good bucket size. For rows with a value outside of the range, it returns a bucket value that corresponds to a bucket outside the range. Combine \`BUCKET\` with \`WHERE\` to filter rows.
-
-If the desired bucket size is known in advance, simply provide it as the second argument, leaving the range out:
-
-\`\`\`
-FROM employees
-| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
-| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week)
-| SORT week
-\`\`\`
-
-**NOTE**: When providing the bucket size as the second parameter, it must be a time duration or date period.
-
-\`BUCKET\` can also operate on numeric fields. For example, to create a salary histogram:
-
-\`\`\`
-FROM employees
-| STATS COUNT(*) by bs = BUCKET(salary, 20, 25324, 74999)
-| SORT bs
-\`\`\`
-
-Unlike the earlier example that intentionally filters on a date range, you rarely want to filter on a numeric range. You have to find the min and max separately. ES|QL doesn’t yet have an easy way to do that automatically.
-
-The range can be omitted if the desired bucket size is known in advance. Simply provide it as the second argument:
-
-\`\`\`
-FROM employees
-| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
-| STATS c = COUNT(1) BY b = BUCKET(salary, 5000.)
-| SORT b
-\`\`\`
-
-**NOTE**: When providing the bucket size as the second parameter, it must be of a **floating point type**.
-
-Here's an example to create hourly buckets for the last 24 hours, and calculate the number of events per hour:
-
-\`\`\`
-FROM sample_data
-| WHERE @timestamp >= NOW() - 1 day and @timestamp < NOW()
-| STATS COUNT(*) BY bucket = BUCKET(@timestamp, 25, NOW() - 1 day, NOW())
-\`\`\`
-
-Here's an example  to create monthly buckets for the year 1985, and calculate the average salary by hiring month:
-
-\`\`\`
-FROM employees
-| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
-| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
-| SORT bucket
-\`\`\`
-
-\`BUCKET\` may be used in both the aggregating and grouping part of the \`STATS … BY …\` command, provided that in the aggregating part the function is **referenced by an alias defined in the grouping part**, or that it is invoked with the exact same expression.
-
-For example:
-
-\`\`\`
-FROM employees
-| STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.)
-| SORT b1, b2
-| KEEP s1, b1, s2, b2
-\`\`\`
-              `,
-              description:
-                'Text is in markdown. Do not translate function names, special characters, or field names like sum(bytes)',
-            }
-          )}
-        />
-      ),
-    },
-  ],
-};
-
 export const operators = {
  label: i18n.translate('languageDocumentation.documentationESQL.operators', {
    defaultMessage: 'Operators',
@@ -1005,3 +872,4 @@ FROM employees

 export { functions as scalarFunctions } from './generated/scalar_functions';
 export { functions as aggregationFunctions } from './generated/aggregation_functions';
+export { functions as groupingFunctions } from './generated/grouping_functions';
--- a/src/platform/packages/private/kbn-language-documentation/src/sections/generated/grouping_functions.tsx
+++ b/src/platform/packages/private/kbn-language-documentation/src/sections/generated/grouping_functions.tsx
@@ -0,0 +1,99 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+import React from 'react';
+import { i18n } from '@kbn/i18n';
+import { Markdown } from '@kbn/shared-ux-markdown';
+
+// DO NOT RENAME!
+export const functions = {
+  label: i18n.translate('languageDocumentation.documentationESQL.groupingFunctions', {
+    defaultMessage: 'Grouping functions',
+  }),
+  description: i18n.translate(
+    'languageDocumentation.documentationESQL.groupingFunctionsDocumentationESQLDescription',
+    {
+      defaultMessage: `These grouping functions can be used with \`STATS...BY\`:`,
+    }
+  ),
+  // items are managed by scripts/generate_esql_docs.ts
+  items: [
+    // Do not edit manually... automatically generated by scripts/generate_esql_docs.ts
+    {
+      label: i18n.translate('languageDocumentation.documentationESQL.bucket', {
+        defaultMessage: 'BUCKET',
+      }),
+      preview: false,
+      description: (
+        <Markdown
+          openLinksInNewTab
+          readOnly
+          enableSoftLineBreaks
+          markdownContent={i18n.translate(
+            'languageDocumentation.documentationESQL.bucket.markdown',
+            {
+              defaultMessage: `<!--
+  This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
+  -->
+
+  ### BUCKET
+  Creates groups of values - buckets - out of a datetime or numeric input.
+  The size of the buckets can either be provided directly, or chosen based on a recommended count and values range.
+
+  \`\`\`
+  FROM employees
+  | WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
+  | STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
+  | SORT hire_date
+  \`\`\`
+  `,
+              description:
+                'Text is in markdown. Do not translate function names, special characters, or field names like sum(bytes)',
+              ignoreTag: true,
+            }
+          )}
+        />
+      ),
+    },
+    // Do not edit manually... automatically generated by scripts/generate_esql_docs.ts
+    {
+      label: i18n.translate('languageDocumentation.documentationESQL.categorize', {
+        defaultMessage: 'CATEGORIZE',
+      }),
+      preview: true,
+      description: (
+        <Markdown
+          openLinksInNewTab
+          readOnly
+          enableSoftLineBreaks
+          markdownContent={i18n.translate(
+            'languageDocumentation.documentationESQL.categorize.markdown',
+            {
+              defaultMessage: `<!--
+  This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
+  -->
+
+  ### CATEGORIZE
+  Groups text messages into categories of similarly formatted text values.
+
+  \`\`\`
+  FROM sample_data
+  | STATS count=COUNT() BY category=CATEGORIZE(message)
+  \`\`\`
+  `,
+              description:
+                'Text is in markdown. Do not translate function names, special characters, or field names like sum(bytes)',
+              ignoreTag: true,
+            }
+          )}
+        />
+      ),
+    },
+  ],
+};
--- a/src/platform/packages/private/kbn-language-documentation/src/sections/generated/scalar_functions.tsx
+++ b/src/platform/packages/private/kbn-language-documentation/src/sections/generated/scalar_functions.tsx
@@ -212,42 +212,6 @@ export const functions = {
  | EVAL fn_length = LENGTH(city), fn_bit_length = BIT_LENGTH(city)
  \`\`\`
  Note: All strings are in UTF-8, so a single character can use multiple bytes.
-  `,
-              description:
-                'Text is in markdown. Do not translate function names, special characters, or field names like sum(bytes)',
-              ignoreTag: true,
-            }
-          )}
-        />
-      ),
-    },
-    {
-      label: i18n.translate('languageDocumentation.documentationESQL.bucket', {
-        defaultMessage: 'BUCKET',
-      }),
-      preview: false,
-      description: (
-        <Markdown
-          openLinksInNewTab
-          readOnly
-          enableSoftLineBreaks
-          markdownContent={i18n.translate(
-            'languageDocumentation.documentationESQL.bucket.markdown',
-            {
-              defaultMessage: `<!--
-  This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
-  -->
-
-  ### BUCKET
-  Creates groups of values - buckets - out of a datetime or numeric input.
-  The size of the buckets can either be provided directly, or chosen based on a recommended count and values range.
-
-  \`\`\`
-  FROM employees
-  | WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
-  | STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
-  | SORT hire_date
-  \`\`\`
  `,
              description:
                'Text is in markdown. Do not translate function names, special characters, or field names like sum(bytes)',
@@ -334,37 +298,6 @@ export const functions = {
        />
      ),
    },
-    {
-      label: i18n.translate('languageDocumentation.documentationESQL.categorize', {
-        defaultMessage: 'CATEGORIZE',
-      }),
-      preview: true,
-      description: (
-        <Markdown
-          openLinksInNewTab
-          readOnly
-          enableSoftLineBreaks
-          markdownContent={i18n.translate(
-            'languageDocumentation.documentationESQL.categorize.markdown',
-            {
-              defaultMessage: `<!--
-  This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
-  -->
-  ### CATEGORIZE
-  Groups text messages into categories of similarly formatted text values.
-  \`\`\`
-  FROM sample_data
-  | STATS count=COUNT() BY category=CATEGORIZE(message)
-  \`\`\`
-  `,
-              description:
-                'Text is in markdown. Do not translate function names, special characters, or field names like sum(bytes)',
-              ignoreTag: true,
-            }
-          )}
-        />
-      ),
-    },
    // Do not edit manually... automatically generated by scripts/generate_esql_docs.ts
    {
      label: i18n.translate('languageDocumentation.documentationESQL.cbrt', {
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/scripts/generate_function_definitions.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/scripts/generate_function_definitions.ts
@@ -12,7 +12,12 @@ import { writeFile } from 'fs/promises';
 import { join } from 'path';
 import _ from 'lodash';
 import type { RecursivePartial } from '@kbn/utility-types';
-import { FunctionDefinition, FunctionParameterType, Signature } from '../src/definitions/types';
+import {
+  FunctionDefinition,
+  FunctionParameterType,
+  FunctionReturnType,
+  Signature,
+} from '../src/definitions/types';
 import { FULL_TEXT_SEARCH_FUNCTIONS } from '../src/shared/constants';
 const aliasTable: Record<string, string[]> = {
  to_version: ['to_ver'],
@@ -25,6 +30,52 @@ const aliasTable: Record<string, string[]> = {
 };
 const aliases = new Set(Object.values(aliasTable).flat());

+const bucketParameterTypes: Array<
+  [
+    FunctionParameterType,
+    FunctionParameterType,
+    FunctionParameterType | null,
+    FunctionParameterType | null,
+    FunctionReturnType
+  ]
+> = [
+  // field   // bucket   //from    // to   //result
+  ['date', 'date_period', null, null, 'date'],
+  ['date', 'integer', 'date', 'date', 'date'],
+  // Modified time_duration to time_literal
+  ['date', 'time_literal', null, null, 'date'],
+  ['double', 'double', null, null, 'double'],
+  ['double', 'integer', 'double', 'double', 'double'],
+  ['double', 'integer', 'double', 'integer', 'double'],
+  ['double', 'integer', 'double', 'long', 'double'],
+  ['double', 'integer', 'integer', 'double', 'double'],
+  ['double', 'integer', 'integer', 'integer', 'double'],
+  ['double', 'integer', 'integer', 'long', 'double'],
+  ['double', 'integer', 'long', 'double', 'double'],
+  ['double', 'integer', 'long', 'integer', 'double'],
+  ['double', 'integer', 'long', 'long', 'double'],
+  ['integer', 'double', null, null, 'double'],
+  ['integer', 'integer', 'double', 'double', 'double'],
+  ['integer', 'integer', 'double', 'integer', 'double'],
+  ['integer', 'integer', 'double', 'long', 'double'],
+  ['integer', 'integer', 'integer', 'double', 'double'],
+  ['integer', 'integer', 'integer', 'integer', 'double'],
+  ['integer', 'integer', 'integer', 'long', 'double'],
+  ['integer', 'integer', 'long', 'double', 'double'],
+  ['integer', 'integer', 'long', 'integer', 'double'],
+  ['integer', 'integer', 'long', 'long', 'double'],
+  ['long', 'double', null, null, 'double'],
+  ['long', 'integer', 'double', 'double', 'double'],
+  ['long', 'integer', 'double', 'integer', 'double'],
+  ['long', 'integer', 'double', 'long', 'double'],
+  ['long', 'integer', 'integer', 'double', 'double'],
+  ['long', 'integer', 'integer', 'integer', 'double'],
+  ['long', 'integer', 'integer', 'long', 'double'],
+  ['long', 'integer', 'long', 'double', 'double'],
+  ['long', 'integer', 'long', 'integer', 'double'],
+  ['long', 'integer', 'long', 'long', 'double'],
+];
+
 const scalarSupportedCommandsAndOptions = {
  supportedCommands: ['stats', 'inlinestats', 'metrics', 'eval', 'where', 'row', 'sort'],
  supportedOptions: ['by'],
@@ -36,7 +87,7 @@ const aggregationSupportedCommandsAndOptions = {

 // coalesce can be removed when a test is added for version type
 // (https://github.com/elastic/elasticsearch/pull/109032#issuecomment-2150033350)
-const excludedFunctions = new Set(['bucket', 'case', 'categorize']);
+const excludedFunctions = new Set(['case']);

 const extraFunctions: FunctionDefinition[] = [
  {
@@ -609,6 +660,38 @@ const replaceParamName = (str: string) => {
  }
 };

+const enrichGrouping = (
+  groupingFunctionDefinitions: FunctionDefinition[]
+): FunctionDefinition[] => {
+  return groupingFunctionDefinitions.map((op) => {
+    if (op.name === 'bucket') {
+      const signatures = [
+        ...bucketParameterTypes.map((signature) => {
+          const [fieldType, bucketType, fromType, toType, resultType] = signature;
+          return {
+            params: [
+              { name: 'field', type: fieldType },
+              { name: 'buckets', type: bucketType, constantOnly: true },
+              ...(fromType ? [{ name: 'startDate', type: fromType, constantOnly: true }] : []),
+              ...(toType ? [{ name: 'endDate', type: toType, constantOnly: true }] : []),
+            ],
+            returnType: resultType,
+          };
+        }),
+      ];
+      return {
+        ...op,
+        signatures,
+        supportedOptions: ['by'],
+      };
+    }
+    return {
+      ...op,
+      supportedOptions: ['by'],
+    };
+  });
+};
+
 const enrichOperators = (
  operatorsFunctionDefinitions: FunctionDefinition[]
 ): FunctionDefinition[] => {
@@ -686,7 +769,7 @@ const enrichOperators = (

 function printGeneratedFunctionsFile(
  functionDefinitions: FunctionDefinition[],
-  functionsType: 'aggregation' | 'scalar' | 'operators'
+  functionsType: 'aggregation' | 'scalar' | 'operators' | 'grouping'
 ) {
  /**
   * Deals with asciidoc internal cross-references in the function descriptions
@@ -823,6 +906,7 @@ ${functionsType === 'operators' ? `import { isNumericType } from '../../shared/e
  const scalarFunctionDefinitions: FunctionDefinition[] = [];
  const aggFunctionDefinitions: FunctionDefinition[] = [];
  const operatorDefinitions: FunctionDefinition[] = [];
+  const groupingFunctionDefinitions: FunctionDefinition[] = [];

  for (const ESDefinition of ESFunctionDefinitions) {
    if (aliases.has(ESDefinition.name) || excludedFunctions.has(ESDefinition.name)) {
@@ -843,6 +927,8 @@ ${functionsType === 'operators' ? `import { isNumericType } from '../../shared/e
      scalarFunctionDefinitions.push(functionDefinition);
    } else if (functionDefinition.type === 'agg') {
      aggFunctionDefinitions.push(functionDefinition);
+    } else if (functionDefinition.type === 'grouping') {
+      groupingFunctionDefinitions.push(functionDefinition);
    }
  }

@@ -860,4 +946,8 @@ ${functionsType === 'operators' ? `import { isNumericType } from '../../shared/e
    join(__dirname, '../src/definitions/generated/operators.ts'),
    printGeneratedFunctionsFile(enrichOperators(operatorDefinitions), 'operators')
  );
+  await writeFile(
+    join(__dirname, '../src/definitions/generated/grouping_functions.ts'),
+    printGeneratedFunctionsFile(enrichGrouping(groupingFunctionDefinitions), 'grouping')
+  );
 })();
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/tests/autocomplete.command.stats.test.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/tests/autocomplete.command.stats.test.ts
@@ -57,7 +57,12 @@ describe('autocomplete.suggest', () => {
    describe('... <aggregates> ...', () => {
      test('lists possible aggregations on space after command', async () => {
        const { assertSuggestions } = await setup();
-        const expected = ['var0 = ', ...allAggFunctions, ...allEvaFunctions];
+        const expected = [
+          'var0 = ',
+          ...allAggFunctions,
+          ...allGroupingFunctions,
+          ...allEvaFunctions,
+        ];

        await assertSuggestions('from a | stats /', expected);
        await assertSuggestions('FROM a | STATS /', expected);
@@ -66,7 +71,11 @@ describe('autocomplete.suggest', () => {
      test('on assignment expression, shows all agg and eval functions', async () => {
        const { assertSuggestions } = await setup();

-        await assertSuggestions('from a | stats a=/', [...allAggFunctions, ...allEvaFunctions]);
+        await assertSuggestions('from a | stats a=/', [
+          ...allAggFunctions,
+          ...allGroupingFunctions,
+          ...allEvaFunctions,
+        ]);
      });

      test('on space after aggregate field', async () => {
@@ -81,6 +90,7 @@ describe('autocomplete.suggest', () => {
        await assertSuggestions('from a | stats a=max(b), /', [
          'var0 = ',
          ...allAggFunctions,
+          ...allGroupingFunctions,
          ...allEvaFunctions,
        ]);
      });
@@ -99,7 +109,6 @@ describe('autocomplete.suggest', () => {
        await assertSuggestions('from a | stats round(/', [
          ...getFunctionSignaturesByReturnType('stats', roundParameterTypes, {
            agg: true,
-            grouping: true,
          }),
          ...getFieldNamesByType(roundParameterTypes),
          ...getFunctionSignaturesByReturnType(
@@ -206,11 +215,13 @@ describe('autocomplete.suggest', () => {
          // TODO verify that this change is ok
          ...allAggFunctions,
          ...allEvaFunctions,
+          ...allGroupingFunctions,
        ]);
        await assertSuggestions('from a | stats var0=min(b),var1=c,/', [
          'var2 = ',
          ...allAggFunctions,
          ...allEvaFunctions,
+          ...allGroupingFunctions,
        ]);
      });
    });
@@ -252,6 +263,7 @@ describe('autocomplete.suggest', () => {
          'var0 = ',
          ...allAggFunctions,
          ...allEvaFunctions,
+          ...allGroupingFunctions,
        ]);
        await assertSuggestions('from a | stats avg(b) by c, /', [
          'var0 = ',
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/tests/helpers.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/tests/helpers.ts
@@ -14,7 +14,7 @@ import { builtinFunctions } from '../../definitions/builtin';
 import { NOT_SUGGESTED_TYPES } from '../../shared/resources_helpers';
 import { aggregationFunctionDefinitions } from '../../definitions/generated/aggregation_functions';
 import { timeUnitsToSuggest } from '../../definitions/literals';
-import { groupingFunctionDefinitions } from '../../definitions/grouping';
+import { groupingFunctionDefinitions } from '../../definitions/generated/grouping_functions';
 import * as autocomplete from '../autocomplete';
 import type { ESQLCallbacks } from '../../shared/types';
 import type { EditorContext, SuggestionRawDefinition } from '../types';
@@ -159,8 +159,6 @@ export function getFunctionSignaturesByReturnType(
  const list = [];
  if (agg) {
    list.push(...aggregationFunctionDefinitions);
-    // right now all grouping functions are agg functions too
-    list.push(...groupingFunctionDefinitions);
  }
  if (grouping) {
    list.push(...groupingFunctionDefinitions);
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/autocomplete.test.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/autocomplete.test.ts
@@ -593,7 +593,11 @@ describe('autocomplete', () => {
    // STATS argument
    testSuggestions('FROM index1 | STATS f/', [
      'var0 = ',
-      ...getFunctionSignaturesByReturnType('stats', 'any', { scalar: true, agg: true }),
+      ...getFunctionSignaturesByReturnType('stats', 'any', {
+        scalar: true,
+        agg: true,
+        grouping: true,
+      }),
    ]);

    // STATS argument BY
@@ -953,9 +957,11 @@ describe('autocomplete', () => {
      'FROM a | STATS /',
      [
        'var0 = ',
-        ...getFunctionSignaturesByReturnType('stats', 'any', { scalar: true, agg: true }).map(
-          attachAsSnippet
-        ),
+        ...getFunctionSignaturesByReturnType('stats', 'any', {
+          scalar: true,
+          agg: true,
+          grouping: true,
+        }).map(attachAsSnippet),
      ].map(attachTriggerCommand)
    );

--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/autocomplete.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/autocomplete.ts
@@ -1151,6 +1151,8 @@ async function getFunctionArgsSuggestions(
    } else {
      fnToIgnore.push(
        ...getFunctionsToIgnoreForStats(command, finalCommandArgIndex),
+        // ignore grouping functions, they are only used for grouping
+        ...getAllFunctions({ type: 'grouping' }).map(({ name }) => name),
        ...(isAggFunctionUsedAlready(command, finalCommandArgIndex)
          ? getAllFunctions({ type: 'agg' }).map(({ name }) => name)
          : [])
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/factories.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/autocomplete/factories.ts
@@ -10,7 +10,7 @@
 import { i18n } from '@kbn/i18n';
 import { memoize } from 'lodash';
 import { SuggestionRawDefinition } from './types';
-import { groupingFunctionDefinitions } from '../definitions/grouping';
+import { groupingFunctionDefinitions } from '../definitions/generated/grouping_functions';
 import { aggregationFunctionDefinitions } from '../definitions/generated/aggregation_functions';
 import { scalarFunctionDefinitions } from '../definitions/generated/scalar_functions';
 import { getFunctionSignatures } from '../definitions/helpers';
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/definitions/generated/grouping_functions.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/definitions/generated/grouping_functions.ts
@@ -0,0 +1,839 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+/**
+ * __AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.__
+ *
+ * @note This file is generated by the `generate_function_definitions.ts`
+ * script. Do not edit it manually.
+ *
+ *
+ *
+ *
+ *
+ *
+ *
+ *
+ *
+ *
+ *
+ *
+ */
+
+import { i18n } from '@kbn/i18n';
+import type { FunctionDefinition } from '../types';
+
+// Do not edit this manually... generated by scripts/generate_function_definitions.ts
+const bucketDefinition: FunctionDefinition = {
+  type: 'grouping',
+  name: 'bucket',
+  description: i18n.translate('kbn-esql-validation-autocomplete.esql.definitions.bucket', {
+    defaultMessage:
+      'Creates groups of values - buckets - out of a datetime or numeric input.\nThe size of the buckets can either be provided directly, or chosen based on a recommended count and values range.',
+  }),
+  preview: false,
+  alias: undefined,
+  signatures: [
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'date',
+        },
+        {
+          name: 'buckets',
+          type: 'date_period',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'date',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'date',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'date',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'date',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'date',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'date',
+        },
+        {
+          name: 'buckets',
+          type: 'time_literal',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'date',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'double',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'integer',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'double',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'double',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'integer',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'long',
+        },
+        {
+          name: 'buckets',
+          type: 'integer',
+          constantOnly: true,
+        },
+        {
+          name: 'startDate',
+          type: 'long',
+          constantOnly: true,
+        },
+        {
+          name: 'endDate',
+          type: 'long',
+          constantOnly: true,
+        },
+      ],
+      returnType: 'double',
+    },
+  ],
+  supportedCommands: ['stats', 'inlinestats', 'metrics'],
+  supportedOptions: ['by'],
+  validate: undefined,
+  examples: [
+    'FROM employees\n| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"\n| STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")\n| SORT hire_date',
+    'FROM employees\n| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"\n| STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")\n| SORT month',
+    'FROM employees\n| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"\n| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")\n| SORT week',
+    'FROM employees\n| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"\n| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week)\n| SORT week',
+    'FROM employees\n| STATS COUNT(*) by bs = BUCKET(salary, 20, 25324, 74999)\n| SORT bs',
+    'FROM employees\n| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"\n| STATS c = COUNT(1) BY b = BUCKET(salary, 5000.)\n| SORT b',
+    'FROM sample_data \n| WHERE @timestamp >= NOW() - 1 day and @timestamp < NOW()\n| STATS COUNT(*) BY bucket = BUCKET(@timestamp, 25, NOW() - 1 day, NOW())',
+    'FROM employees\n| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"\n| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")\n| SORT bucket',
+    'FROM employees\n| STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.)\n| SORT b1, b2\n| KEEP s1, b1, s2, b2',
+    'FROM employees\n| STATS dates = MV_SORT(VALUES(birth_date)) BY b = BUCKET(birth_date + 1 HOUR, 1 YEAR) - 1 HOUR\n| EVAL d_count = MV_COUNT(dates)\n| SORT d_count, b\n| LIMIT 3',
+  ],
+};
+
+// Do not edit this manually... generated by scripts/generate_function_definitions.ts
+const categorizeDefinition: FunctionDefinition = {
+  type: 'grouping',
+  name: 'categorize',
+  description: i18n.translate('kbn-esql-validation-autocomplete.esql.definitions.categorize', {
+    defaultMessage: 'Groups text messages into categories of similarly formatted text values.',
+  }),
+  preview: true,
+  alias: undefined,
+  signatures: [
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'keyword',
+          optional: false,
+        },
+      ],
+      returnType: 'keyword',
+    },
+    {
+      params: [
+        {
+          name: 'field',
+          type: 'text',
+          optional: false,
+        },
+      ],
+      returnType: 'keyword',
+    },
+  ],
+  supportedCommands: ['stats', 'inlinestats', 'metrics'],
+  supportedOptions: ['by'],
+  validate: undefined,
+  examples: ['FROM sample_data\n| STATS count=COUNT() BY category=CATEGORIZE(message)'],
+};
+export const groupingFunctionDefinitions = [bucketDefinition, categorizeDefinition];
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/definitions/grouping.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/definitions/grouping.ts
@ -1,124 +0,0 @@
-/*
- * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
- * or more contributor license agreements. Licensed under the "Elastic License
- * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
- * Public License v 1"; you may not use this file except in compliance with, at
- * your election, the "Elastic License 2.0", the "GNU Affero General Public
- * License v3.0 only", or the "Server Side Public License, v 1".
- */
-
-import { i18n } from '@kbn/i18n';
-import { FunctionDefinition, FunctionParameterType, FunctionReturnType } from './types';
-
-const groupingTypeTable: Array<
-  [
-    FunctionParameterType,
-    FunctionParameterType,
-    FunctionParameterType | null,
-    FunctionParameterType | null,
-    FunctionReturnType
-  ]
-> = [
-  // field   // bucket   //from    // to   //result
-  ['date', 'date_period', null, null, 'date'],
-  ['date', 'integer', 'date', 'date', 'date'],
-  // Modified time_duration to time_literal
-  ['date', 'time_literal', null, null, 'date'],
-  ['double', 'double', null, null, 'double'],
-  ['double', 'integer', 'double', 'double', 'double'],
-  ['double', 'integer', 'double', 'integer', 'double'],
-  ['double', 'integer', 'double', 'long', 'double'],
-  ['double', 'integer', 'integer', 'double', 'double'],
-  ['double', 'integer', 'integer', 'integer', 'double'],
-  ['double', 'integer', 'integer', 'long', 'double'],
-  ['double', 'integer', 'long', 'double', 'double'],
-  ['double', 'integer', 'long', 'integer', 'double'],
-  ['double', 'integer', 'long', 'long', 'double'],
-  ['integer', 'double', null, null, 'double'],
-  ['integer', 'integer', 'double', 'double', 'double'],
-  ['integer', 'integer', 'double', 'integer', 'double'],
-  ['integer', 'integer', 'double', 'long', 'double'],
-  ['integer', 'integer', 'integer', 'double', 'double'],
-  ['integer', 'integer', 'integer', 'integer', 'double'],
-  ['integer', 'integer', 'integer', 'long', 'double'],
-  ['integer', 'integer', 'long', 'double', 'double'],
-  ['integer', 'integer', 'long', 'integer', 'double'],
-  ['integer', 'integer', 'long', 'long', 'double'],
-  ['long', 'double', null, null, 'double'],
-  ['long', 'integer', 'double', 'double', 'double'],
-  ['long', 'integer', 'double', 'integer', 'double'],
-  ['long', 'integer', 'double', 'long', 'double'],
-  ['long', 'integer', 'integer', 'double', 'double'],
-  ['long', 'integer', 'integer', 'integer', 'double'],
-  ['long', 'integer', 'integer', 'long', 'double'],
-  ['long', 'integer', 'long', 'double', 'double'],
-  ['long', 'integer', 'long', 'integer', 'double'],
-  ['long', 'integer', 'long', 'long', 'double'],
-];
-export const groupingFunctionDefinitions: FunctionDefinition[] = [
-  {
-    name: 'bucket',
-    alias: ['bin'],
-    description: i18n.translate('kbn-esql-validation-autocomplete.esql.definitions.autoBucketDoc', {
-      defaultMessage: `Automatically bucket dates based on a given range and bucket target.`,
-    }),
-    // type agg because it can also be used as an aggregation...
-    type: 'agg',
-    supportedCommands: ['stats'],
-    supportedOptions: ['by'],
-    signatures: [
-      ...groupingTypeTable.map((signature) => {
-        const [fieldType, bucketType, fromType, toType, resultType] = signature;
-        return {
-          params: [
-            { name: 'field', type: fieldType },
-            { name: 'buckets', type: bucketType, constantOnly: true },
-            ...(fromType ? [{ name: 'startDate', type: fromType, constantOnly: true }] : []),
-            ...(toType ? [{ name: 'endDate', type: toType, constantOnly: true }] : []),
-          ],
-          returnType: resultType,
-        };
-      }),
-    ],
-    examples: [
-      'from index | eval hd = bucket(bytes, 1 hour)',
-      'from index | eval hd = bucket(hire_date, 1 hour)',
-      'from index | eval hd = bucket(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")',
-      'from index | eval hd = bucket(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")',
-      'from index | eval bs = bucket(bytes, 20, 25324, 74999)',
-    ],
-  },
-  {
-    name: 'categorize',
-    alias: ['bin'],
-    description: i18n.translate('kbn-esql-validation-autocomplete.esql.definitions.categorize', {
-      defaultMessage: `Groups text messages into categories of similarly formatted text values.`,
-    }),
-    type: 'agg',
-    supportedCommands: ['stats'],
-    supportedOptions: ['by'],
-    signatures: [
-      {
-        params: [
-          {
-            name: 'field',
-            type: 'keyword',
-            optional: false,
-          },
-        ],
-        returnType: 'keyword',
-      },
-      {
-        params: [
-          {
-            name: 'field',
-            type: 'text',
-            optional: false,
-          },
-        ],
-        returnType: 'keyword',
-      },
-    ],
-    examples: ['FROM sample_data\n| STATS count=COUNT() BY category=CATEGORIZE(message)'],
-  },
-];
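
Reviewer note: the removed `groupingTypeTable` and the generated `bucket` signatures above describe the same 33 overloads. The sketch below is a minimal, self-contained illustration of that equivalence and is not the code used by `scripts/generate_function_definitions.ts`. `ParamType`, `Row`, `buildRows`, and `toSignature` are hypothetical names introduced here, and the observation that the numeric rows form a 3 x 3 x 3 product (plus one two-argument form per field type) is inferred from the table rather than stated in the commit.

```typescript
// Hypothetical, simplified stand-in for FunctionParameterType / FunctionReturnType.
type ParamType = 'date' | 'date_period' | 'time_literal' | 'double' | 'integer' | 'long';

interface Param {
  name: string;
  type: ParamType;
  constantOnly?: boolean;
}

interface Signature {
  params: Param[];
  returnType: ParamType;
}

// [field, buckets, from, to, result], the same column order as the removed table.
type Row = [ParamType, ParamType, ParamType | null, ParamType | null, ParamType];

const numericTypes: ParamType[] = ['double', 'integer', 'long'];

// Rebuild the rows of the removed table: three date rows, then for each numeric
// field type one two-argument row and nine four-argument rows.
function buildRows(): Row[] {
  const rows: Row[] = [
    ['date', 'date_period', null, null, 'date'],
    ['date', 'integer', 'date', 'date', 'date'],
    ['date', 'time_literal', null, null, 'date'],
  ];
  for (const field of numericTypes) {
    rows.push([field, 'double', null, null, 'double']);
    for (const from of numericTypes) {
      for (const to of numericTypes) {
        rows.push([field, 'integer', from, to, 'double']);
      }
    }
  }
  return rows;
}

// Same expansion the removed `groupingTypeTable.map(...)` performed:
// the range parameters are only emitted when the row defines them.
function toSignature([fieldType, bucketType, fromType, toType, resultType]: Row): Signature {
  const params: Param[] = [
    { name: 'field', type: fieldType },
    { name: 'buckets', type: bucketType, constantOnly: true },
  ];
  if (fromType) params.push({ name: 'startDate', type: fromType, constantOnly: true });
  if (toType) params.push({ name: 'endDate', type: toType, constantOnly: true });
  return { params, returnType: resultType };
}

const signatures: Signature[] = buildRows().map(toSignature);
console.log(signatures.length); // 33
```

The generated file spells out every overload explicitly, which is more verbose than the table but keeps the definitions in the same shape as the other generated function files.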
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/definitions/types.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/definitions/types.ts
@ -170,7 +170,7 @@ export interface Signature {
 }

 export interface FunctionDefinition {
-  type: 'builtin' | 'agg' | 'scalar' | 'operator';
+  type: 'builtin' | 'agg' | 'scalar' | 'operator' | 'grouping';
  preview?: boolean;
  ignoreAsSuggestion?: boolean;
  name: string;
--- a/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/shared/helpers.ts
+++ b/src/platform/packages/shared/kbn-esql-validation-autocomplete/src/shared/helpers.ts
@ -29,7 +29,7 @@ import { aggregationFunctionDefinitions } from '../definitions/generated/aggrega
 import { builtinFunctions } from '../definitions/builtin';
 import { commandDefinitions } from '../definitions/commands';
 import { scalarFunctionDefinitions } from '../definitions/generated/scalar_functions';
-import { groupingFunctionDefinitions } from '../definitions/grouping';
+import { groupingFunctionDefinitions } from '../definitions/generated/grouping_functions';
 import { getTestFunctions } from './test_functions';
 import { getFunctionSignatures } from '../definitions/helpers';
 import { timeUnits } from '../definitions/literals';
--- a/x-pack/platform/plugins/private/translations/translations/fr-FR.json
+++ b/x-pack/platform/plugins/private/translations/translations/fr-FR.json
--- a/x-pack/platform/plugins/private/translations/translations/ja-JP.json
+++ b/x-pack/platform/plugins/private/translations/translations/ja-JP.json
--- a/x-pack/platform/plugins/private/translations/translations/zh-CN.json
+++ b/x-pack/platform/plugins/private/translations/translations/zh-CN.json
@ -5277,7 +5277,6 @@
    "kbn-esql-validation-autocomplete.esql.definitions.asin": "返回输入数字表达式的反正弦作为角度，以弧度表示。",
    "kbn-esql-validation-autocomplete.esql.definitions.atan": "返回输入数字表达式的反正切作为角度，以弧度表示。",
    "kbn-esql-validation-autocomplete.esql.definitions.atan2": "笛卡儿平面中正 x 轴与从原点到点 (x , y) 构成的射线之间的角度，以弧度表示。",
-    "kbn-esql-validation-autocomplete.esql.definitions.autoBucketDoc": "根据给定范围和存储桶目标自动收集存储桶日期。",
    "kbn-esql-validation-autocomplete.esql.definitions.avg": "数字字段的平均值。",
    "kbn-esql-validation-autocomplete.esql.definitions.byDoc": "依据",
    "kbn-esql-validation-autocomplete.esql.definitions.case": "接受成对的条件和值。此函数返回属于第一个评估为 `true` 的条件的值。如果参数数量为奇数，则最后一个参数为在无条件匹配时返回的默认值。",
@ -5533,8 +5532,6 @@
    "languageDocumentation.documentationESQL.atan.markdown": "<!-- This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it. --> ### ATAN 以角度形式返回输入数字表达式的反正切，以弧度表示。``` ROW a=12.9 | EVAL atan=ATAN(a) ```",
    "languageDocumentation.documentationESQL.atan2": "ATAN2",
    "languageDocumentation.documentationESQL.atan2.markdown": "<!-- This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it. --> ### ATAN2 笛卡儿平面中正 x 轴与从原点到点 (x , y) 构成的射线之间的角度，以弧度表示。``` ROW y=12.9, x=.6 | EVAL atan2=ATAN2(y, x) ```",
-    "languageDocumentation.documentationESQL.autoBucketFunction": "BUCKET",
-    "languageDocumentation.documentationESQL.autoBucketFunction.markdown": "### BUCKET 用日期时间或数字输入创建值（存储桶）的分组。存储桶的大小可以直接提供，或基于建议的计数和值范围进行选择。`BUCKET` 以两种模式运行：1.在此模式下基于存储桶计数建议（四个参数）和范围计算存储桶的大小。2.在此模式下直接提供存储桶大小（两个参数）。使用存储桶的目标数量、起始范围和结束范围，`BUCKET` 将选取适当的存储桶大小以生成目标数量或更小数量的存储桶。例如，一年请求多达 20 个存储桶会按每月时间间隔组织数据：``` FROM employees | WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\" | STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\") | SORT hire_date ``` **注意**：目标并不是提供存储桶的确切目标数量，而是选择一个范围，最多提供存储桶的目标数量。可以组合 `BUCKET` 与聚合以创建直方图：``` FROM employees | WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\" | STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\") | SORT month ``` **注意**：`BUCKET` 不会创建未匹配任何文档的存储桶。因此，上一示例缺少 `1985-03-01` 和其他日期。如果需要更多存储桶，可能导致更小的范围。例如，如果一年内最多请求 100 个存储桶，则会生成周期为周的存储桶：``` FROM employees | WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\" | STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\") | SORT week ``` **注意**：`BUCKET` 不筛选任何行。它只会使用提供的范围来选取适当的存储桶大小。对于值超出范围的行，它会返回与超出范围的存储桶对应的存储桶值。组合 `BUCKET` 与 `WHERE` 可筛选行。如果提前已知所需存储桶大小，则只需提供它作为第二个参数，而忽略范围：``` FROM employees | WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\" | STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week) | SORT week ``` **注意**：提供存储桶大小作为第二个参数时，它必须为持续时间或日期期间。`BUCKET` 还可对数字字段执行操作。例如，要创建工资直方图：``` FROM employees | STATS COUNT(*) by bs = BUCKET(salary, 20, 25324, 74999) | SORT bs ```与前面有意筛选日期范围的示例不同，您极少想要筛选数值范围。您必须分别查找最小值和最大值。ES|QL 尚未提供简便方法来自动执行此操作。如果提前已知所需存储桶大小，则可以忽略该范围。只需提供它作为第二个参数即可：``` FROM employees | WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\" | STATS c = COUNT(1) BY b = BUCKET(salary, 5000.) 
| SORT b ``` **注意**：提供存储桶大小作为第二个参数时，它必须为 **浮点类型**。这里提供了一个示例，用于为过去 24 小时创建小时存储桶，并计算每小时的事件数：``` FROM sample_data | WHERE @timestamp >= NOW() - 1 day and @timestamp < NOW() | STATS COUNT(*) BY bucket = BUCKET(@timestamp, 25, NOW() - 1 day, NOW()) ```这里提供了一个示例，用于为 1985 年创建月度存储桶，并按聘用月份计算平均工资：``` FROM employees | WHERE hire_date >= \"1985-01-01T00:00:00Z\" AND hire_date < \"1986-01-01T00:00:00Z\" | STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, \"1985-01-01T00:00:00Z\", \"1986-01-01T00:00:00Z\") | SORT bucket ``` `BUCKET` 可用在 `STATS … BY …` 命令的聚合和分组部分， 前提是在聚合部分中，该函数 **由在分组部分中定义的别名引用**，或使用完全相同的表达式调用。例如：``` FROM employees | STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.) | SORT b1, b2 | KEEP s1, b1, s2, b2 ```",
    "languageDocumentation.documentationESQL.avg": "AVG",
    "languageDocumentation.documentationESQL.avg.markdown": "<!-- This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it. --> ### AVG 数字字段的平均值。``` FROM employees | STATS AVG(height) ```",
    "languageDocumentation.documentationESQL.binaryOperators": "二进制运算符",