mirror of
https://github.com/elastic/kibana.git
synced 2025-04-24 01:38:56 -04:00
[ES|QL] AST package documentation (#194296)
Updates documentation for the ES|QL AST package.
parent
896dce358c
commit
5def848d2c
6 changed files with 578 additions and 111 deletions
|
@ -1,89 +1,38 @@
|
|||
# ES|QL utility library
|
||||
# ES|QL AST library
|
||||
|
||||
## Folder structure
|
||||
The general idea of this package is to provide low-level ES|QL parsing,
|
||||
building, traversal, pretty-printing, and manipulation features on top of a
|
||||
custom compact AST representation, which is designed to be resilient to many
|
||||
grammar changes.
|
||||
|
||||
This library brings all the foundation data structure to enable all advanced features within an editor for ES|QL as validation, autocomplete, hover, etc...
|
||||
The package is structured as follows:
|
||||
Contents of this package:
|
||||
|
||||
```
|
||||
src
|
||||
|- antlr // => contains the ES|QL grammar files and various compilation assets
|
||||
| ast_factory.ts // => binding to the Antlr that generates the AST data structure
|
||||
| ast_errors.ts // => error translation utility from raw Antlr to something understandable (somewhat)
|
||||
| antlr_error_listener.ts // => The ES|QL syntax error listener
|
||||
| antlr_facade.ts // => getParser and getLexer utilities
|
||||
| ... // => miscellaneous utilities to work with AST
|
||||
- [`builder` — Contains the `Builder` class for AST node construction](./src/builder/README.md).
|
||||
- [`parser` — Contains text to ES|QL AST parsing code](./src/parser/README.md).
|
||||
- [`walker` — Contains the ES|QL AST `Walker` utility](./src/walker/README.md).
|
||||
- [`visitor` — Contains the ES|QL AST `Visitor` utility](./src/visitor/README.md).
|
||||
- [`pretty_print` — Contains code for formatting AST to text](./src/pretty_print/README.md).
|
||||
|
||||
|
||||
## Demo
|
||||
|
||||
Much of the functionality of this package is demonstrated in the demo UI. You
|
||||
can run it in Storybook, using the following command:
|
||||
|
||||
```bash
|
||||
yarn storybook esql_ast_inspector
|
||||
```
|
||||
|
||||
### Basic usage
|
||||
Alternatively, you can start Kibana with *Example Plugins* enabled, using:
|
||||
|
||||
#### Get AST from a query string
|
||||
|
||||
This module contains the entire logic to translate from a query string into the AST data structure.
|
||||
The `getAstAndSyntaxErrors` function returns the AST data structure, unless a syntax error happens in which case the `errors` array gets populated with a Syntax error.
|
||||
|
||||
##### Usage
|
||||
|
||||
```js
|
||||
import { getAstAndSyntaxErrors } from '@kbn/esql-ast';
|
||||
|
||||
const queryString = "from index | stats 1 + avg(myColumn) ";
|
||||
const { ast, errors } = getAstAndSyntaxErrors(queryString);
|
||||
|
||||
if (errors.length > 0) {
|
||||
console.log({ syntaxErrors: errors });
|
||||
}
|
||||
// do stuff with the ast
|
||||
```bash
|
||||
yarn start --run-examples
|
||||
```
|
||||
|
||||
## How does it work
|
||||
Then navigate to the *ES|QL AST Inspector* plugin in the Kibana UI.
|
||||
|
||||
The general idea of this package is to provide all ES|QL features on top of a custom compact AST definition (all data structure types defined in `./types.ts`) which is designed to be resilient to many grammar changes.
|
||||
The pipeline is the following:
|
||||
|
||||
```
|
||||
Antlr grammar files
|
||||
=> Compiled grammar files (.ts assets in the antlr folder)
|
||||
=> AST Factory (Antlr Parser tree => custom AST)
|
||||
```
|
||||
## Keeping ES|QL AST library up to date
|
||||
|
||||
Each feature function works with the combination of the AST and the definition files: the former describes the current statement in an easy-to-traverse way, while the definitions describe the expected behaviour of each node in the AST (i.e. which arguments it should accept, how many arguments, etc.).
|
||||
While AST requires the grammar to be compiled to be updated, definitions are static files which can be dynamically updated without running the ANTLR compile task.
|
||||
|
||||
#### AST
|
||||
|
||||
The AST is generated by 2 files: `ast_factory.ts` and its buddy `ast_walker.ts`:
|
||||
* `ast_factory.ts` is a binding to Antlr and accesses the Parser tree
|
||||
* Parser tree is passed over to `ast_walker` to append new AST nodes
|
||||
|
||||
In general Antlr is resilient to grammar errors, in the sense that it can produce a Parser tree up to the point of the error, and then stop. This is useful to perform partial tasks even with broken queries, and it means that a partial AST can be produced even with an invalid query.
|
||||
|
||||
### Keeping ES|QL up to date
|
||||
|
||||
In general when operating on changes here use the `yarn kbn watch` in a terminal window to make sure changes are correctly compiled.
|
||||
|
||||
### How to add new commands/options
|
||||
|
||||
When a new command/option is added to ES|QL it is done via a grammar update.
|
||||
Therefore adding them requires a two step phase:
|
||||
* Update the grammar with the new one
|
||||
* add/fix all AST generator bindings in case of new/changed TOKENS in the `lexer` grammar file
|
||||
* Update the definition files for commands/options
|
||||
|
||||
To update the grammar:
|
||||
1. Make sure the `lexer` and `parser` files are up to date with their ES counterparts
|
||||
* an existing Kibana CI job is updating them already automatically
|
||||
2. Run the script defined in `package.json` to compile the ES|QL grammar.
|
||||
3. open the `ast_factory.ts` file and add a new `exit<Command/Option>` method
|
||||
4. write some code in the `ast_walker.ts` to translate the Antlr Parser tree into the custom AST (there are already a few utilities for that, but sometimes it is required to write some more code if the `parser` introduced a new flow)
|
||||
* pro tip: use the `http://lab.antlr.org/` to visualize/debug the parser tree for a given statement (copy and paste the grammar files there)
|
||||
5. if something goes wrong with new quoted/unquoted identifier token, open the `ast_helpers.ts` and check the ids of the new tokens in the `getQuotedText` and `getUnquotedText` functions - please make sure to leave a comment on the token name
|
||||
|
||||
#### Debug and fix grammar changes (tokens, etc...)
|
||||
|
||||
On TOKEN renaming or with subtle `lexer` grammar changes, tests can break. This usually happens for one of two reasons:
|
||||
* A TOKEN name changed so the `ast_walker.ts` doesn't find it any more. Go there and rename the TOKEN name.
|
||||
* TOKEN order changed and tests started failing. This probably generated some TOKEN id reorder and there are two functions in `ast_helpers.ts` that rely on hardcoded ids: `getQuotedText` and `getUnquotedText`.
|
||||
* Note that the `getQuotedText` and `getUnquotedText` are automatically updated on grammar changes detected by the Kibana CI sync job.
|
||||
* to fix this just look at the commented tokens and update the ids. If a new token was introduced, add it and leave a comment pointing to the new token name.
|
||||
* This choice was made to reduce the bundle size, as importing the `esql_parser` adds some hundreds of Kbs to the bundle otherwise.
|
||||
In general when operating on changes here use the `yarn kbn watch` in a terminal
|
||||
window to make sure changes are correctly compiled.
|
||||
|
|
39
packages/kbn-esql-ast/src/builder/README.md
Normal file
|
@ -0,0 +1,39 @@
|
|||
# Builder
|
||||
|
||||
Contains the `Builder` class for AST node construction. It provides the most
|
||||
low-level stateless AST node construction API.
|
||||
|
||||
The `Builder` API can be used when constructing AST nodes from scratch manually,
|
||||
and it is also used by the parser to construct the AST nodes during the parsing
|
||||
process.
|
||||
|
||||
When parsing, the AST nodes will typically have more information, such as the
|
||||
position in the source code, and other metadata. When constructing the AST nodes
|
||||
manually, this information is not available, but the `Builder` API can still be
|
||||
used, as it allows the metadata to be omitted.
|
||||
|
||||
|
||||
## Usage
|
||||
|
||||
Construct a `literal` expression node:
|
||||
|
||||
```typescript
|
||||
import { Builder } from '@kbn/esql-ast';
|
||||
|
||||
const node = Builder.expression.literal.numeric({ value: 42, literalType: 'integer' });
|
||||
```
|
||||
|
||||
Returns:
|
||||
|
||||
```js
|
||||
{
|
||||
type: 'literal',
|
||||
literalType: 'integer',
|
||||
value: 42,
|
||||
name: '42',
|
||||
|
||||
location: { min: 0, max: 0 },
|
||||
text: '',
|
||||
incomplete: false,
|
||||
}
|
||||
```
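
The idea of a stateless builder that lets callers omit parser metadata can be sketched as follows. This is a minimal illustration and not the package's implementation: `ParserFields`, `LiteralNode`, and `numericLiteral` are hypothetical names, and the defaults simply mirror the output shown above.

```typescript
// Hypothetical, simplified shapes modelled on the output shown above.
interface ParserFields {
  location: { min: number; max: number };
  text: string;
  incomplete: boolean;
}

interface LiteralNode extends ParserFields {
  type: 'literal';
  literalType: 'integer' | 'double';
  value: number;
  name: string;
}

// Defaults used when a node is built by hand and no source position exists.
function defaultParserFields(): ParserFields {
  return { location: { min: 0, max: 0 }, text: '', incomplete: false };
}

// Stateless factory: callers may pass parser metadata, or skip it entirely.
function numericLiteral(
  template: { value: number; literalType: 'integer' | 'double' },
  fromParser: Partial<ParserFields> = {}
): LiteralNode {
  return {
    ...defaultParserFields(),
    ...fromParser,
    type: 'literal',
    literalType: template.literalType,
    value: template.value,
    name: String(template.value),
  };
}

// Built manually: metadata falls back to the defaults.
const manual = numericLiteral({ value: 42, literalType: 'integer' });

// Built as a parser might: position and source text are supplied.
const parsed = numericLiteral(
  { value: 42, literalType: 'integer' },
  { location: { min: 6, max: 7 }, text: '42' }
);
```

Keeping the metadata in a separate optional argument means the same function can serve both manual construction and a parser, which is the property the section describes.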
|
|
@ -1,6 +1,91 @@
|
|||
# ES|QL Parser
|
||||
|
||||
The Kibana ES|QL parser uses the ANTLR library for lexing and parse tree (CST)
|
||||
generation. The ANTLR grammar is imported from the Elasticsearch repository in
|
||||
an automated CI job.
|
||||
|
||||
We use two ANTLR outputs, (1) the token stream and (2) the parse tree, for four
purposes: (1) to generate the Abstract Syntax Tree (AST), (2) for syntax
validation, (3) for syntax highlighting, and (4) to extract formatting (comments
and whitespace) and assign it to AST nodes.
|
||||
|
||||
In general ANTLR is resilient to grammar errors, in the sense that it can
|
||||
produce a parse tree up to the point of the error, and then stop. This is useful
|
||||
to perform partial tasks even with broken queries and this means that a partial
|
||||
AST can be produced even with an invalid query.
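
To make the "partial result plus errors" behaviour concrete, here is a toy sketch. It does not use ANTLR and is not how the Kibana parser works internally: `toyParse`, `ToyCommand`, `ToyError`, and the list of known commands are all hypothetical, and the "grammar" is just pipe-separated commands.

```typescript
// Toy stand-in for the real ANTLR-based parser; all names are hypothetical.
interface ToyCommand {
  type: 'command';
  name: string;
  args: string[];
}

interface ToyError {
  message: string;
  position: number;
}

interface ToyResult {
  commands: ToyCommand[];
  errors: ToyError[];
}

const KNOWN_COMMANDS = ['from', 'limit', 'where', 'stats'];

function toyParse(src: string): ToyResult {
  const commands: ToyCommand[] = [];
  const errors: ToyError[] = [];
  let position = 0;

  for (const part of src.split('|')) {
    const [name = '', ...args] = part.trim().split(/\s+/);
    if (KNOWN_COMMANDS.indexOf(name.toLowerCase()) === -1) {
      // Record the error and stop, but keep everything parsed so far.
      errors.push({ message: 'Unknown command "' + name + '"', position });
      break;
    }
    commands.push({ type: 'command', name: name.toLowerCase(), args });
    position += part.length + 1; // +1 accounts for the pipe character
  }

  return { commands, errors };
}

// The third command is invalid, yet the first two are still returned.
const partial = toyParse('FROM index | LIMIT 10 | BOGUS x');
```

Because the result always contains an `errors` array, callers of a parser shaped like this would check `errors.length` rather than the truthiness of `errors`.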
|
||||
|
||||
|
||||
## Folder structure
|
||||
|
||||
The parser is structured as follows:
|
||||
|
||||
```
|
||||
src/
|
||||
|- parser/ Contains the logic to parse the ES|QL query and generate the AST.
|
||||
| |- factories.ts Contains AST node factories.
|
||||
| |- antlr_error_listener.ts Contains code which traverses ANTLR CST and collects syntax errors.
|
||||
| |- esql_ast_builder_listener.ts Contains code which traverses ANTLR CST and builds the AST.
|
||||
|
|
||||
|- antlr/ Contains the autogenerated ES|QL ANTLR grammar files and various compilation assets.
|
||||
|- esql_lexer.g4 Contains the ES|QL ANTLR lexer grammar.
|
||||
|- esql_parser.g4 Contains the ES|QL ANTLR parser grammar.
|
||||
```
|
||||
|
||||
|
||||
## Usage
|
||||
|
||||
### Get AST from a query string
|
||||
|
||||
The `parse` function returns the AST data structure, unless a syntax error
|
||||
happens, in which case the `errors` array is populated with syntax errors.
|
||||
|
||||
```js
|
||||
import { parse } from '@kbn/esql-ast';
|
||||
|
||||
const src = "FROM index | STATS 1 + AVG(myColumn) ";
|
||||
const { root, errors } = parse(src);
|
||||
|
||||
if (errors.length > 0) {
|
||||
console.log({ syntaxErrors: errors });
|
||||
}
|
||||
|
||||
// do stuff with the ast
|
||||
```
|
||||
|
||||
The `root` is the root node of the AST. The AST is a tree structure where each
|
||||
node represents a part of the query. Each node has a `type` property which
|
||||
indicates the type of the node.
|
||||
|
||||
|
||||
### Parse a query and populate the AST with comments
|
||||
|
||||
When calling the `parse` method with the `withFormatting` flag set to `true`,
|
||||
the AST will be populated with comments.
|
||||
|
||||
```js
|
||||
import { parse } from '@kbn/esql-ast';
|
||||
|
||||
const src = "FROM /* COMMENT */ index";
|
||||
const { root } = parse(src, { withFormatting: true });
|
||||
```
|
||||
|
||||
|
||||
## Comments
|
||||
|
||||
### Inter-node comment places
|
||||
By default, the parsed AST does not include any *formatting* information,
|
||||
such as comments or whitespace. This is because the AST is designed to be
|
||||
compact and to be used for syntax validation, syntax highlighting, and other
|
||||
high-level operations.
|
||||
|
||||
However, sometimes it is useful to have comments attached to the AST nodes. The
|
||||
parser can collect all comments when the `withFormatting` flag is set to `true`
|
||||
and attach them to the AST nodes. The comments are attached to the closest node,
|
||||
while also considering the surrounding punctuation.
|
||||
|
||||
### Inter-node comments
|
||||
|
||||
Currently, when parsed, inter-node comments are attached to the node from the
|
||||
left side.
|
||||
|
||||
Around colon in source identifier:
|
||||
|
||||
|
@ -25,3 +110,60 @@ Time interface expressions:
|
|||
```esql
|
||||
STATS 1 /* asdf */ DAY
|
||||
```
|
||||
|
||||
|
||||
## Internal Details
|
||||
|
||||
|
||||
### How does it work?
|
||||
|
||||
The pipeline is the following:
|
||||
|
||||
1. ANTLR grammar files are added to Kibana.
|
||||
2. ANTLR grammar files are compiled to `.ts` assets in the `antlr` folder.
|
||||
3. A query is parsed to a CST by ANTLR.
|
||||
4. The `ESQLAstBuilderListener` traverses the CST and builds the AST.
|
||||
5. Optionally:
|
||||
1. Comments and whitespace are extracted from the ANTLR lexer's token stream.
|
||||
2. The comments and whitespace are attached to the AST nodes.
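
The optional step 5 can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the actual implementation: `LexToken`, `AstNode`, `extractComments`, and `attachComments` are hypothetical, positions are inclusive character offsets, and the attachment rule used here (nearest node that ends before the comment) is only one plausible reading of "closest node".

```typescript
// Hypothetical, simplified shapes; the real token and node types differ.
interface LexToken {
  kind: 'word' | 'comment';
  text: string;
  start: number;
  end: number;
}

interface AstNode {
  name: string;
  location: { min: number; max: number };
  comments: string[];
}

// Step 5.1: pull the comment tokens out of the lexer's token stream.
function extractComments(tokens: LexToken[]): LexToken[] {
  return tokens.filter((token) => token.kind === 'comment');
}

// Step 5.2: attach each comment to the nearest node that ends before it.
function attachComments(nodes: AstNode[], comments: LexToken[]): void {
  for (const comment of comments) {
    let target: AstNode | undefined;
    for (const node of nodes) {
      const endsBefore = node.location.max < comment.start;
      if (endsBefore && (!target || node.location.max > target.location.max)) {
        target = node;
      }
    }
    if (target) target.comments.push(comment.text);
  }
}

// Tokens for: FROM index /* COMMENT */ | LIMIT 10 (whitespace omitted)
const tokens: LexToken[] = [
  { kind: 'word', text: 'FROM', start: 0, end: 3 },
  { kind: 'word', text: 'index', start: 5, end: 9 },
  { kind: 'comment', text: '/* COMMENT */', start: 11, end: 23 },
  { kind: 'word', text: '|', start: 25, end: 25 },
  { kind: 'word', text: 'LIMIT', start: 27, end: 31 },
  { kind: 'word', text: '10', start: 33, end: 34 },
];

const nodes: AstNode[] = [
  { name: 'index', location: { min: 5, max: 9 }, comments: [] },
  { name: '10', location: { min: 33, max: 34 }, comments: [] },
];

attachComments(nodes, extractComments(tokens));
```

In this sketch the comment ends up on the `index` node, since that is the last node that finishes before the comment starts.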
|
||||
|
||||
|
||||
### How to add new commands/options?
|
||||
|
||||
When a new command/option is added to ES|QL it is done via a grammar update.
|
||||
Therefore adding them requires a two step phase:
|
||||
|
||||
To update the grammar:
|
||||
|
||||
1. Make sure the `lexer` and `parser` files are up to date with their ES
|
||||
counterparts.
|
||||
* an existing Kibana CI job is updating them already automatically
|
||||
2. Run the script defined in `package.json` to compile the ES|QL grammar.
|
||||
3. open the `ast_factory.ts` file and add a new `exit<Command/Option>` method
|
||||
4. write some code in the `ast_walker.ts` to translate the Antlr Parser tree
|
||||
into the custom AST (there are already a few utilities for that, but sometimes
|
||||
it is required to write some more code if the `parser` introduced a new flow)
|
||||
* pro tip: use the `http://lab.antlr.org/` to visualize/debug the parser tree
|
||||
for a given statement (copy and paste the grammar files there)
|
||||
5. if something goes wrong with new quoted/unquoted identifier token, open
|
||||
the `ast_helpers.ts` and check the ids of the new tokens in the `getQuotedText`
|
||||
and `getUnquotedText` functions; please make sure to leave a comment on the
|
||||
token name
|
||||
|
||||
|
||||
#### Debug and fix grammar changes (tokens, etc...)
|
||||
|
||||
On token renaming or with subtle `lexer` grammar changes, tests can break. This
usually happens for one of two reasons:
|
||||
|
||||
* A token name changed so the `esql_ast_builder_listener.ts` doesn't find it any
|
||||
more. Go there and rename the TOKEN name.
|
||||
* Token order changed and tests started failing. This probably generated some
|
||||
token id reorder and there are two functions in `helpers.ts` that rely on
|
||||
hardcoded ids: `getQuotedText` and `getUnquotedText`.
|
||||
* Note that the `getQuotedText` and `getUnquotedText` are automatically
|
||||
updated on grammar changes detected by the Kibana CI sync job.
|
||||
* to fix this just look at the commented tokens and update the ids. If a new
|
||||
token was introduced, add it and leave a comment pointing to the new token name.
|
||||
* This choice was made to reduce the bundle size, as importing the
|
||||
`esql_parser` would otherwise add several hundred kilobytes to the bundle.
|
||||
|
|
|
@ -4,20 +4,82 @@
|
|||
human-readable string. This is useful for debugging or for displaying
|
||||
the AST to the user.
|
||||
|
||||
This module provides a number of pretty-printing options.
|
||||
This module provides a number of pretty-printing facilities. There are two
|
||||
main classes that provide pretty-printing:
|
||||
|
||||
- `BasicPrettyPrinter` — provides the basic pretty-printing to a single
|
||||
line.
|
||||
- `WrappingPrettyPrinter` — provides more advanced pretty-printing, which
|
||||
can wrap the query to multiple lines, and can also wrap the query to a
|
||||
specific width.
|
||||
|
||||
|
||||
## `BasicPrettyPrinter`
|
||||
|
||||
The `BasicPrettyPrinter` class provides the most basic pretty-printing—it
|
||||
prints a query to a single line. Or it can print a query with each command on
|
||||
a separate line, with the ability to customize the indentation before the pipe
|
||||
character.
|
||||
The `BasicPrettyPrinter` class provides the simpler pretty-printing
|
||||
functionality—it prints a query to a single line. Or, it can print a query
|
||||
with each command on a separate line, with the ability to customize the
|
||||
indentation before the pipe character.
|
||||
|
||||
Usage:
|
||||
|
||||
```typescript
|
||||
import { parse, BasicPrettyPrinter } from '@kbn/esql-ast';
|
||||
|
||||
const src = 'FROM index | LIMIT 10';
|
||||
const { root } = parse(src);
|
||||
const text = BasicPrettyPrinter.print(root);
|
||||
|
||||
console.log(text); // FROM index | LIMIT 10
|
||||
```
|
||||
|
||||
It can print each command on a separate line, with a custom indentation before
|
||||
the pipe character:
|
||||
|
||||
```typescript
|
||||
const text = BasicPrettyPrinter.multiline(root, { pipeTab: ' ' });
|
||||
```
|
||||
|
||||
It can also print a single command to a single line; or an expression to a
|
||||
single line.
|
||||
single line. Below is the summary of the top-level functions:
|
||||
|
||||
- `BasicPrettyPrinter.print()` — prints query to a single line.
|
||||
- `BasicPrettyPrinter.multiline()` — prints a query to multiple lines.
|
||||
- `BasicPrettyPrinter.command()` — prints a command to a single line.
|
||||
- `BasicPrettyPrinter.expression()` — prints an expression to a single line.
|
||||
- `BasicPrettyPrinter.expression()` — prints an expression to a single
|
||||
line.
|
||||
|
||||
See `BasicPrettyPrinterOptions` for formatting options. For example, a
|
||||
`lowercase` option allows you to lowercase all ES|QL keywords:
|
||||
|
||||
```typescript
|
||||
const text = BasicPrettyPrinter.print(root, { lowercase: true });
|
||||
```
|
||||
|
||||
The `BasicPrettyPrinter` prints only *left* and *right* multi-line comments,
|
||||
which do not have line breaks, as this formatter is designed to print a query
|
||||
to a single line. If you need to print a query to multiple lines, use the
|
||||
`WrappingPrettyPrinter`.
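
The behaviours described in this section (single-line output, one command per line with a configurable `pipeTab`, and lowercased keywords) can be sketched as below. This is a toy model, not the `BasicPrettyPrinter` implementation: `printQuery`, `PrintOptions`, and the keyword list are hypothetical, each command is assumed to be already rendered to text, and the exact multiline layout of the real printer may differ.

```typescript
// Hypothetical stand-in: each command is already rendered to text.
interface PrintOptions {
  multiline?: boolean;
  pipeTab?: string;
  lowercase?: boolean;
}

// A small, illustrative subset of ES|QL keywords.
const KEYWORDS = /\b(FROM|LIMIT|WHERE|STATS|SORT|KEEP)\b/g;

function printQuery(commands: string[], options: PrintOptions = {}): string {
  const { multiline = false, pipeTab = '  ', lowercase = false } = options;

  const rendered = lowercase
    ? commands.map((command) =>
        command.replace(KEYWORDS, (keyword) => keyword.toLowerCase())
      )
    : commands;

  // Single line: commands joined by " | ".
  // Multiline: one command per line, with `pipeTab` before each pipe.
  return multiline ? rendered.join('\n' + pipeTab + '| ') : rendered.join(' | ');
}

const commands = ['FROM index', 'LIMIT 10'];

const singleLine = printQuery(commands);
const multiLine = printQuery(commands, { multiline: true, pipeTab: ' ' });
const lowercased = printQuery(commands, { lowercase: true });
```

Here `singleLine` is `FROM index | LIMIT 10`, `multiLine` places `| LIMIT 10` on its own line after one space of indentation, and `lowercased` is `from index | limit 10`.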
|
||||
|
||||
|
||||
## `WrappingPrettyPrinter`
|
||||
|
||||
The *wrapping pretty printer* can print a query to multiple lines, and can wrap
|
||||
the text to a new line if the line width exceeds a certain threshold. It also
|
||||
prints all comments attached to the AST (including ones that force the text
|
||||
to be wrapped).
|
||||
|
||||
Usage:
|
||||
|
||||
```typescript
|
||||
import { parse, WrappingPrettyPrinter } from '@kbn/esql-ast';
|
||||
|
||||
const src = `
|
||||
FROM index /* this is a comment */
|
||||
| LIMIT 10`;
|
||||
const { root } = parse(src, { withFormatting: true });
|
||||
const text = WrappingPrettyPrinter.print(root);
|
||||
```
|
||||
|
||||
See `WrappingPrettyPrinterOptions` interface for available formatting options.
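
The two wrapping triggers mentioned above (a line width threshold, and comments that force a line break) can be illustrated with a deliberately small sketch. It is not the `WrappingPrettyPrinter` algorithm: `wrapQuery` and `PrintableCommand` are hypothetical, only whole commands are wrapped, and a trailing line comment is assumed to be the kind of comment that forces wrapping.

```typescript
// Hypothetical input: a rendered command plus an optional trailing comment.
interface PrintableCommand {
  text: string;
  trailingLineComment?: string;
}

function wrapQuery(
  commands: PrintableCommand[],
  width: number = 80,
  pipeTab: string = '  '
): string {
  // A line comment runs to the end of the line, so it forces a break.
  const forcesBreak = commands.some(
    (command) => command.trailingLineComment !== undefined
  );
  const oneLine = commands.map((command) => command.text).join(' | ');

  // Keep the query on one line only when nothing forces a break and it fits.
  if (!forcesBreak && oneLine.length <= width) return oneLine;

  return commands
    .map((command) =>
      command.trailingLineComment !== undefined
        ? command.text + ' // ' + command.trailingLineComment
        : command.text
    )
    .join('\n' + pipeTab + '| ');
}

const fits = wrapQuery([{ text: 'FROM index' }, { text: 'LIMIT 10' }]);
const tooWide = wrapQuery([{ text: 'FROM index' }, { text: 'LIMIT 10' }], 10);
const withComment = wrapQuery([
  { text: 'FROM index', trailingLineComment: 'source' },
  { text: 'LIMIT 10' },
]);
```

A real wrapping printer also has to break long argument lists inside a command; this sketch leaves that out to keep the two triggers visible.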
|
||||
|
||||
|
|
|
@ -1,4 +1,28 @@
|
|||
## High-level AST structure
|
||||
# `Visitor` Traversal API
|
||||
|
||||
The `Visitor` traversal API provides a feature-rich way to traverse the ES|QL
|
||||
AST. It is more powerful than the [`Walker` API](../walker/README.md), as it
|
||||
allows traversing the AST in a more flexible way.
|
||||
|
||||
The `Visitor` API allows traversing the AST starting from the root node or a
|
||||
command statement, or an expression. Unlike in the `Walker` API, the `Visitor`
|
||||
does not automatically traverse the entire AST. Instead, the developer has to
|
||||
manually call the necessary *visit* methods to traverse the AST. This makes the
traversal more flexible: the developer can traverse only the parts of the AST
|
||||
that are needed, or maybe traverse the AST in a different order, or multiple
|
||||
times.
|
||||
|
||||
The `Visitor` API is also more powerful than the `Walker` API, as for each
|
||||
visitor callback it provides a *context* object, which contains the information
|
||||
about the current node as well as the parent node, and the whole parent chain
|
||||
up to the root node.
|
||||
|
||||
In addition, each visitor callback can return a value (*output*), which is then
|
||||
passed to the parent node, in the place where the visitor was called. Also, when
|
||||
a child is visited, the parent node can pass in *input* to the child visitor.
|
||||
|
||||
|
||||
## About ES|QL AST structure
|
||||
|
||||
Broadly, there are two AST node types: (1) commands (say `FROM ...`, like
|
||||
*statements* in other languages), and (2) expressions (say `a + b`, or `fn()`).
|
||||
|
@ -59,7 +83,8 @@ As of this writing, the following expressions are defined:
|
|||
- Column identifier expression, `{type: "column"}`, like `@timestamp`
|
||||
- Function call expression, `{type: "function"}`, like `fn(123)`
|
||||
- Literal expression, `{type: "literal"}`, like `123`, `"hello"`
|
||||
- List literal expression, `{type: "list"}`, like `[1, 2, 3]`, `["a", "b", "c"]`, `[true, false]`
|
||||
- List literal expression, `{type: "list"}`, like `[1, 2, 3]`,
|
||||
`["a", "b", "c"]`, `[true, false]`
|
||||
- Time interval expression, `{type: "interval"}`, like `1h`, `1d`, `1w`
|
||||
- Inline cast expression, `{type: "cast"}`, like `abc::int`, `def::string`
|
||||
- Unknown node, `{type: "unknown"}`
|
||||
|
@ -67,3 +92,176 @@ As of this writing, the following expressions are defined:
|
|||
Each expression has a `visitExpressionX` callback, where `X` is the type of the
|
||||
expression. If an expression-specific callback is not found, the generic
|
||||
`visitExpression` callback is called.
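
The fallback rule can be sketched in a few lines. This is a hypothetical miniature and not the real `Visitor` class: `MiniDispatcher` and `ExprNode` are invented for illustration, and the specific callback names are derived from the node type in a simplified way that happens to match names such as `visitColumnExpression`.

```typescript
// Hypothetical miniature of the dispatch rule; not the real `Visitor` class.
interface ExprNode {
  type: 'column' | 'literal' | 'source';
  name: string;
}

type ExprCallback = (node: ExprNode) => string;

class MiniDispatcher {
  private callbacks: Record<string, ExprCallback | undefined> = {};

  on(name: string, callback: ExprCallback): this {
    this.callbacks[name] = callback;
    return this;
  }

  visitExpression(node: ExprNode): string {
    // "column" becomes "visitColumnExpression", and so on.
    const specific =
      'visit' + node.type.charAt(0).toUpperCase() + node.type.slice(1) + 'Expression';

    // Prefer the type-specific callback; otherwise fall back to the generic one.
    const callback = this.callbacks[specific] ?? this.callbacks['visitExpression'];
    if (!callback) {
      throw new Error('No callback registered for "' + node.type + '"');
    }
    return callback(node);
  }
}

const dispatcher = new MiniDispatcher()
  .on('visitExpression', (node) => 'generic:' + node.name)
  .on('visitColumnExpression', (node) => 'column:' + node.name);

// A column has a specific callback; a literal falls back to the generic one.
const fromSpecific = dispatcher.visitExpression({ type: 'column', name: '@timestamp' });
const fromGeneric = dispatcher.visitExpression({ type: 'literal', name: '123' });
```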
|
||||
|
||||
|
||||
## `Visitor` API Usage
|
||||
|
||||
The `Visitor` API is used to traverse the AST. The process is as follows:
|
||||
|
||||
1. Create a new `Visitor` instance.
|
||||
2. Register callbacks for the nodes you are interested in.
|
||||
3. Call the `visitQuery`, `visitCommand`, or `visitExpression` method to start
|
||||
the traversal.
|
||||
|
||||
For example, the below code snippet prints the type of each expression node:
|
||||
|
||||
```typescript
|
||||
new Visitor()
|
||||
.on('visitExpression', (ctx) => console.log(ctx.node.type))
|
||||
.on('visitCommand', (ctx) => [...ctx.visitArguments()])
|
||||
.on('visitQuery', (ctx) => [...ctx.visitCommands()])
|
||||
.visitQuery(root);
|
||||
```
|
||||
|
||||
In the `visitQuery` callback it visits all commands, using the `visitCommands`.
|
||||
In the `visitCommand` callback it visits all arguments, using the
|
||||
`visitArguments`. And finally, in the `visitExpression` callback it prints the
|
||||
type of the expression node.
|
||||
|
||||
Above we started the traversal from the root node, using the `.visitQuery(root)`
|
||||
method. However, one can start the traversal from any node, by calling the
|
||||
following methods:
|
||||
|
||||
- `.visitQuery()` — Start traversal from the root node.
|
||||
- `.visitCommand()` — Start traversal from a command node.
|
||||
- `.visitExpression()` — Start traversal from an expression node.
|
||||
|
||||
|
||||
### Specifying Callbacks
|
||||
|
||||
The simplest way to traverse the AST is to specify the below three callbacks:
|
||||
|
||||
- `visitQuery` — Called for every query node. (Normally once.)
|
||||
- `visitCommand` — Called for every command node.
|
||||
- `visitExpression` — Called for every expression node.
|
||||
|
||||
|
||||
However, you can be more specific and specify callbacks for commands and
|
||||
expression types. This way the context `ctx` provided to the callback will have
|
||||
helpful methods specific to the node type.
|
||||
|
||||
When a more specific callback is not found, the generic `visitCommand` or
|
||||
`visitExpression` callback is called for that node.
|
||||
|
||||
You can specify a specific callback for each command, instead of the generic
|
||||
`visitCommand`:
|
||||
|
||||
- `visitFromCommand` — Called for every `FROM` command node.
|
||||
- `visitLimitCommand` — Called for every `LIMIT` command node.
|
||||
- `visitExplainCommand` — Called for every `EXPLAIN` command node.
|
||||
- `visitRowCommand` — Called for every `ROW` command node.
|
||||
- `visitMetricsCommand` — Called for every `METRICS` command node.
|
||||
- `visitShowCommand` — Called for every `SHOW` command node.
|
||||
- `visitMetaCommand` — Called for every `META` command node.
|
||||
- `visitEvalCommand` — Called for every `EVAL` command node.
|
||||
- `visitStatsCommand` — Called for every `STATS` command node.
|
||||
- `visitInlineStatsCommand` — Called for every `INLINESTATS` command node.
|
||||
- `visitLookupCommand` — Called for every `LOOKUP` command node.
|
||||
- `visitKeepCommand` — Called for every `KEEP` command node.
|
||||
- `visitSortCommand` — Called for every `SORT` command node.
|
||||
- `visitWhereCommand` — Called for every `WHERE` command node.
|
||||
- `visitDropCommand` — Called for every `DROP` command node.
|
||||
- `visitRenameCommand` — Called for every `RENAME` command node.
|
||||
- `visitDissectCommand` — Called for every `DISSECT` command node.
|
||||
- `visitGrokCommand` — Called for every `GROK` command node.
|
||||
- `visitEnrichCommand` — Called for every `ENRICH` command node.
|
||||
- `visitMvExpandCommand` — Called for every `MV_EXPAND` command node.
|
||||
|
||||
Similarly, you can specify a specific callback for each expression type, instead
|
||||
of the generic `visitExpression`:
|
||||
|
||||
- `visitColumnExpression` — Called for every column expression node, say
|
||||
`@timestamp`.
|
||||
- `visitSourceExpression` — Called for every source expression node, say
|
||||
`tsdb_index`.
|
||||
- `visitFunctionCallExpression` — Called for every function call
|
||||
expression node. Including binary expressions, such as `a + b`.
|
||||
- `visitLiteralExpression` — Called for every literal expression node, say
|
||||
`123`, `"hello"`.
|
||||
- `visitListLiteralExpression` — Called for every list literal expression
|
||||
node, say `[1, 2, 3]`, `["a", "b", "c"]`.
|
||||
- `visitTimeIntervalLiteralExpression` — Called for every time interval
|
||||
literal expression node, say `1h`, `1d`, `1w`.
|
||||
- `visitInlineCastExpression` — Called for every inline cast expression
|
||||
node, say `abc::int`, `def::string`.
|
||||
- `visitRenameExpression` — Called for every rename expression node, say
|
||||
`a AS b`.
|
||||
- `visitOrderExpression` — Called for every order expression node, say
|
||||
`@timestamp ASC`.
|
||||
|
||||
|
||||
### Using the Node Context
|
||||
|
||||
Each visitor callback receives a `ctx` object, which contains the reference to
|
||||
the parent node's context:
|
||||
|
||||
```typescript
|
||||
new Visitor()
|
||||
.on('visitExpression', (ctx) => {
|
||||
ctx.parent
|
||||
});
|
||||
```
|
||||
|
||||
Each visitor context also provides various methods to visit the children nodes,
|
||||
if needed. For example, to visit all arguments of a command node:
|
||||
|
||||
```typescript
|
||||
const expressions = [];
|
||||
|
||||
new Visitor()
|
||||
.on('visitExpression', (ctx) => expressions.push(ctx.node))
|
||||
.on('visitCommand', (ctx) => {
|
||||
for (const output of ctx.visitArguments()) {
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
The node context object may also have node specific methods. For example, the
|
||||
`LIMIT` command context has the `.numeric()` method, which returns the numeric
|
||||
value of the `LIMIT` command:
|
||||
|
||||
```typescript
|
||||
new Visitor()
|
||||
.on('visitLimitCommand', (ctx) => {
|
||||
console.log(ctx.numeric());
|
||||
})
|
||||
.on('visitCommand', () => null)
|
||||
.on('visitQuery', (ctx) => [...ctx.visitCommands()])
|
||||
.visitQuery(root);
|
||||
```
|
||||
|
||||
|
||||
### Using the Visitor Output
|
||||
|
||||
Each visitor callback can return an *output*, which is then passed to the parent
|
||||
callback. This allows passing information from the child node to the parent
|
||||
node.
|
||||
|
||||
For example, the below code snippet collects all column names in the AST:
|
||||
|
||||
```typescript
|
||||
const columns = new Visitor()
|
||||
.on('visitExpression', (ctx) => null)
|
||||
.on('visitColumnExpression', (ctx) => ctx.node.name)
|
||||
.on('visitCommand', (ctx) => [...ctx.visitArguments()])
|
||||
.on('visitQuery', (ctx) => [...ctx.visitCommands()])
|
||||
.visitQuery(root);
|
||||
```
|
||||
|
||||
|
||||
### Using the Visitor Input
|
||||
|
||||
Analogous to the output, each visitor callback can receive an *input* value.
|
||||
This allows passing information from the parent node to the child node.
|
||||
|
||||
For example, the below code snippet prints all column names prefixed with the
|
||||
text `"prefix"`:
|
||||
|
||||
```typescript
|
||||
new Visitor()
|
||||
.on('visitExpression', (ctx) => null)
|
||||
.on('visitColumnExpression', (ctx, INPUT) => console.log(INPUT + ctx.node.name))
|
||||
.on('visitCommand', (ctx) => [...ctx.visitArguments("prefix")])
|
||||
.on('visitQuery', (ctx) => [...ctx.visitCommands()])
|
||||
.visitQuery(root);
|
||||
```
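
How input flows down and output flows up can also be seen in a self-contained miniature. The sketch below is not the `Visitor` API: `visitQuery`, `Handlers`, `Cmd`, and `Expr` are hypothetical, and there are only two node levels. What it keeps is the essential contract: the parent callback decides whether to visit its children and with which input, and it receives whatever the child callbacks return.

```typescript
// Hypothetical miniature visitor; the real `Visitor` API is richer.
interface Expr {
  type: 'column' | 'literal';
  name: string;
}

interface Cmd {
  name: string;
  args: Expr[];
}

interface Handlers<I, O> {
  // Child callback: receives the parent's input and returns an output.
  expression: (node: Expr, input: I) => O;
  // Parent callback: chooses the input and collects the children's outputs.
  command: (node: Cmd, visitArguments: (input: I) => O[]) => O[];
}

function visitQuery<I, O>(commands: Cmd[], handlers: Handlers<I, O>): O[] {
  const outputs: O[] = [];
  for (const command of commands) {
    // Children are visited only if the command callback calls this function.
    const visitArguments = (input: I): O[] =>
      command.args.map((arg) => handlers.expression(arg, input));
    outputs.push(...handlers.command(command, visitArguments));
  }
  return outputs;
}

// Hand-built stand-in for: KEEP a, b | LIMIT 10
const query: Cmd[] = [
  {
    name: 'keep',
    args: [
      { type: 'column', name: 'a' },
      { type: 'column', name: 'b' },
    ],
  },
  { name: 'limit', args: [{ type: 'literal', name: '10' }] },
];

const names = visitQuery<string, string | null>(query, {
  expression: (node, input) => (node.type === 'column' ? input + node.name : null),
  command: (_command, visitArguments) => visitArguments('prefix.'),
});
```

With this input, `names` contains the two prefixed column names followed by `null` for the literal.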
|
||||
|
|
|
@ -1,41 +1,118 @@
|
|||
# ES|QL AST Walker
|
||||
# `Walker` Traversal API
|
||||
|
||||
The ES|QL AST Walker is a utility that traverses the ES|QL AST and provides a
|
||||
set of callbacks that can be used to perform introspection of the AST.
|
||||
The ES|QL AST `Walker` is a utility that traverses the ES|QL AST. The developer
|
||||
can provide a set of callbacks which are called when the walker visits a
|
||||
specific type of node.
|
||||
|
||||
The `Walker` utility allows traversing the AST starting from any node, not just
|
||||
the root node.
|
||||
|
||||
|
||||
## Low-level API
|
||||
|
||||
To start a new *walk* you create a `Walker` instance and call the `walk()` method
|
||||
with the AST node to start the walk from.
|
||||
|
||||
```ts
|
||||
|
||||
import { Walker, getAstAndSyntaxErrors } from '@kbn/esql-ast';
|
||||
import { Walker } from '@kbn/esql-ast';
|
||||
|
||||
const walker = new Walker({
|
||||
// Called every time a function node is visited.
|
||||
visitFunction: (fn) => {
|
||||
/**
|
||||
* Visit commands
|
||||
*/
|
||||
visitCommand: (node: ESQLCommand) => {
|
||||
// Called for every command node.
|
||||
},
|
||||
visitCommandOption: (node: ESQLCommandOption) => {
|
||||
// Called for every command option node.
|
||||
},
|
||||
|
||||
/**
|
||||
* Visit expressions
|
||||
*/
|
||||
visitFunction: (fn: ESQLFunction) => {
|
||||
// Called every time a function expression is visited.
|
||||
console.log('Function:', fn.name);
|
||||
},
|
||||
// Called every time a source identifier node is visited.
|
||||
visitSource: (source) => {
|
||||
visitSource: (source: ESQLSource) => {
|
||||
// Called every time a source identifier expression is visited.
|
||||
console.log('Source:', source.name);
|
||||
},
|
||||
visitQuery: (node: ESQLAstQueryExpression) => {
|
||||
// Called for every query node.
|
||||
},
|
||||
visitColumn: (node: ESQLColumn) => {
|
||||
// Called for every column node.
|
||||
},
|
||||
visitLiteral: (node: ESQLLiteral) => {
|
||||
// Called for every literal node.
|
||||
},
|
||||
visitListLiteral: (node: ESQLList) => {
|
||||
// Called for every list literal node.
|
||||
},
|
||||
visitTimeIntervalLiteral: (node: ESQLTimeInterval) => {
|
||||
// Called for every time interval literal node.
|
||||
},
|
||||
visitInlineCast: (node: ESQLInlineCast) => {
|
||||
// Called for every inline cast node.
|
||||
},
|
||||
});
|
||||
|
||||
const { ast } = getAstAndSyntaxErrors('FROM source | STATS fn()');
|
||||
walker.walk(ast);
|
||||
```
|
||||
|
||||
Conceptual structure of an ES|QL AST:
|
||||
It is also possible to provide a single `visitAny` callback that is called for
|
||||
any node type that does not have a specific visitor.
|
||||
|
||||
- A single ES|QL query is composed of one or more source commands and zero or
|
||||
more transformation commands.
|
||||
- Each command is represented by a `command` node.
|
||||
- Each command contains a list of expressions named in ES|QL AST as *AST Item*.
|
||||
- `function` — function call expression.
|
||||
- `option` — a list of expressions with a specific role in the command.
|
||||
- `source` — a source identifier expression.
|
||||
- `column` — a field identifier expression.
|
||||
- `timeInterval` — a time interval expression.
|
||||
- `list` — a list literal expression.
|
||||
- `literal` — a literal expression.
|
||||
- `inlineCast` — an inline cast expression.
|
||||
```ts
|
||||
import { Walker } from '@kbn/esql-ast';
|
||||
|
||||
const walker = new Walker({
|
||||
visitAny: (node: ESQLProperNode) => {
|
||||
// Called for any node type that does not have a specific visitor.
|
||||
},
|
||||
});
|
||||
|
||||
walker.walk(ast);
|
||||
```
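
The relationship between specific callbacks and `visitAny` can be sketched with a miniature walker. This is an illustration under simplifying assumptions, not the `Walker` class: `walk`, `TreeNode`, and `MiniWalkerOptions` are hypothetical, and only two specific callbacks are modelled.

```typescript
// Hypothetical miniature walker; node shapes are simplified for illustration.
interface TreeNode {
  type: string;
  name: string;
  args?: TreeNode[];
}

type NodeCallback = (node: TreeNode) => void;

interface MiniWalkerOptions {
  visitFunction?: NodeCallback;
  visitSource?: NodeCallback;
  visitAny?: NodeCallback;
}

function walk(node: TreeNode, options: MiniWalkerOptions): void {
  const specific =
    node.type === 'function'
      ? options.visitFunction
      : node.type === 'source'
      ? options.visitSource
      : undefined;

  // `visitAny` is used only when no specific callback handles the node.
  const callback = specific ?? options.visitAny;
  if (callback) callback(node);

  // The walker descends into children automatically.
  for (const child of node.args ?? []) walk(child, options);
}

// Hand-built stand-in for the command: STATS avg(price)
const tree: TreeNode = {
  type: 'command',
  name: 'stats',
  args: [
    { type: 'function', name: 'avg', args: [{ type: 'column', name: 'price' }] },
  ],
};

const seen: string[] = [];
walk(tree, {
  visitFunction: (node) => {
    seen.push('function:' + node.name);
  },
  visitAny: (node) => {
    seen.push('any:' + node.name);
  },
});
```

The walk can start from any node: passing the `avg` function node instead of `tree` would visit only that subtree.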
|
||||
|
||||
|
||||
## High-level API
|
||||
|
||||
There are a few high-level utility functions that are implemented on top of the
|
||||
low-level API, for your convenience:
|
||||
|
||||
- `Walker.walk` — Walks the AST and calls the appropriate visitor functions.
|
||||
- `Walker.commands` — Walks the AST and extracts all command statements.
|
||||
- `Walker.params` — Walks the AST and extracts all parameter literals.
|
||||
- `Walker.find` — Finds and returns the first node that matches the search criteria.
|
||||
- `Walker.findAll` — Finds and returns all nodes that match the search criteria.
|
||||
- `Walker.match` — Matches a single node against a template object.
|
||||
- `Walker.matchAll` — Matches all nodes against a template object.
|
||||
- `Walker.findFunction` — Finds the first function that matches the predicate.
|
||||
- `Walker.hasFunction` — Searches for at least one occurrence of a function or expression in the AST.
|
||||
- `Walker.visitComments` — Visits all comments in the AST.
|
||||
|
||||
The `Walker.walk()` method is simply syntactic sugar for the low-level
|
||||
`new Walker().walk()` method.
|
||||
|
||||
The `Walker.commands()` method returns a list of all commands. This also
|
||||
includes nested commands, once they become supported in ES|QL.
|
||||
|
||||
The `Walker.params()` method collects all param literals, such as unnamed `?` or
|
||||
named `?param`, or ordered `?1`.
|
||||
|
||||
The `Walker.find()` and `Walker.findAll()` methods are used to search for nodes
|
||||
in the AST that match specific criteria. The criteria are specified using a
|
||||
predicate function.
|
||||
|
||||
The `Walker.match()` and `Walker.matchAll()` methods are also used to search for
|
||||
nodes in the AST, but unlike `find` and `findAll`, they use a template object
|
||||
to match the nodes.
|
||||
|
||||
The `Walker.findFunction()` is a simple utility to find the first function that
|
||||
matches a predicate. The `Walker.hasFunction()` returns `true` if at least one
|
||||
function or expression in the AST matches the predicate.
|
||||
|
||||
The `Walker.visitComments()` method is used to visit all comments in the AST.
|
||||
You specify a callback that is called for each comment node.
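
The difference between predicate-based and template-based search can be sketched on a simplified tree. The functions below are hypothetical stand-ins written for illustration (`collect`, `findAll`, `find`, `matchAll`); they are not the `Walker` static methods, whose signatures and matching rules may differ.

```typescript
// Hypothetical simplified node shape.
interface TreeNode {
  type: string;
  name: string;
  args?: TreeNode[];
}

// Depth-first collection of every node, starting from any node.
function collect(node: TreeNode, out: TreeNode[] = []): TreeNode[] {
  out.push(node);
  for (const child of node.args ?? []) collect(child, out);
  return out;
}

// Predicate-based search, in the spirit of `find` and `findAll`.
function findAll(root: TreeNode, predicate: (node: TreeNode) => boolean): TreeNode[] {
  return collect(root).filter(predicate);
}

function find(root: TreeNode, predicate: (node: TreeNode) => boolean): TreeNode | undefined {
  return findAll(root, predicate)[0];
}

// Template-based search, in the spirit of `match` and `matchAll`:
// a node matches when every field present in the template is equal.
interface Template {
  type?: string;
  name?: string;
}

function matchAll(root: TreeNode, template: Template): TreeNode[] {
  return findAll(
    root,
    (node) =>
      (template.type === undefined || node.type === template.type) &&
      (template.name === undefined || node.name === template.name)
  );
}

// Hand-built stand-in for the command: STATS avg(price), max(price)
const stats: TreeNode = {
  type: 'command',
  name: 'stats',
  args: [
    { type: 'function', name: 'avg', args: [{ type: 'column', name: 'price' }] },
    { type: 'function', name: 'max', args: [{ type: 'column', name: 'price' }] },
  ],
};

const firstFunction = find(stats, (node) => node.type === 'function');
const priceColumns = matchAll(stats, { type: 'column', name: 'price' });
const hasMax =
  findAll(stats, (node) => node.type === 'function' && node.name === 'max').length > 0;
```

A template is more compact for simple equality checks, while a predicate can express arbitrary conditions.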
|
||||
|
|