---
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html
---

# Regular expression syntax [regexp-syntax]

A [regular expression](https://en.wikipedia.org/wiki/Regular_expression) is a way to match patterns in data using placeholder characters, called operators.

{{es}} supports regular expressions in the following queries:

* [`regexp`](/reference/query-languages/query-dsl/query-dsl-regexp-query.md)
* [`query_string`](/reference/query-languages/query-dsl/query-dsl-query-string-query.md)

{{es}} uses [Apache Lucene](https://lucene.apache.org/core/)'s regular expression engine to parse these queries.

## Reserved characters [regexp-reserved-characters]

Lucene’s regular expression engine supports all Unicode characters. However, the following characters are reserved as operators:

```
. ? + * | { } [ ] ( ) " \
```
Depending on the [optional operators](#regexp-optional-operators) enabled, the following characters may also be reserved:

```
# @ & < > ~
```
To use one of these characters literally, escape it with a preceding backslash or surround it with double quotes. For example:

```
\@                  # renders as a literal '@'
\\                  # renders as a literal '\'
"john@smith.com"    # renders as 'john@smith.com'
```
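
When patterns are assembled from user input, the escaping rule above can be applied programmatically. The following is a minimal sketch, not part of {{es}} or any client library: `escape_lucene_regexp` is a hypothetical helper name, and it assumes that a backslash placed before any reserved character yields that literal character, as the examples above show for `@` and `\`.

```python
# Characters the section above lists as reserved. The second group is only
# reserved when the matching optional operators are enabled; this sketch
# escapes both groups unconditionally.
RESERVED = set('.?+*|{}[]()"\\' + '#@&<>~')


def escape_lucene_regexp(text: str) -> str:
    """Hypothetical helper: backslash-escape every reserved character."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print(escape_lucene_regexp("john@smith.com"))  # john\@smith\.com
```

The result is the regular expression itself. It still needs JSON string escaping before it is placed in a request body.
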
::::{note}
The backslash is an escape character in both JSON strings and regular expressions. In a query, a literal backslash must therefore be escaped twice, once for the regular expression and once for the JSON string, unless you use a language client, which handles the JSON-level escaping for you. For example, the string `a\b` needs to be indexed as `"a\\b"`:

```console
PUT my-index-000001/_doc/1
{
  "my_field": "a\\b"
}
```

This document matches the following `regexp` query:

```console
GET my-index-000001/_search
{
  "query": {
    "regexp": {
      "my_field.keyword": "a\\\\.*"
    }
  }
}
```

::::

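
The two layers of escaping in the note above can be seen separately when the request body is built in code. This sketch uses only Python's standard `json` module rather than an {{es}} client; the field name is reused from the console example, and the variable names are illustrative.

```python
import json

# The regular expression as Lucene should receive it: 'a', an escaped
# (literal) backslash, then any characters. The raw string keeps both
# backslashes exactly as written.
lucene_pattern = r"a\\.*"

body = {"query": {"regexp": {"my_field.keyword": lucene_pattern}}}

# json.dumps adds the JSON-level escaping, doubling each backslash again.
serialized = json.dumps(body)
print(serialized)
# {"query": {"regexp": {"my_field.keyword": "a\\\\.*"}}}
```

Only the regular-expression escaping is written by hand here. The serializer adds the second layer, which is why the four backslashes appear only in the raw JSON.
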
## Standard operators [regexp-standard-operators]

Lucene’s regular expression engine does not use the [Perl Compatible Regular Expressions (PCRE)](https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) library, but it does support the following standard operators.

`.`
:   Matches any character. For example:

    ```
    ab.     # matches 'aba', 'abb', 'abz', etc.
    ```

`?`
:   Repeat the preceding character zero or one times. Often used to make the preceding character optional. For example:

    ```
    abc?     # matches 'ab' and 'abc'
    ```

`+`
:   Repeat the preceding character one or more times. For example:

    ```
    ab+     # matches 'ab', 'abb', 'abbb', etc.
    ```

`*`
:   Repeat the preceding character zero or more times. For example:

    ```
    ab*     # matches 'a', 'ab', 'abb', 'abbb', etc.
    ```

`{}`
:   Minimum and maximum number of times the preceding character can repeat. For example:

    ```
    a{2}    # matches 'aa'
    a{2,4}  # matches 'aa', 'aaa', and 'aaaa'
    a{2,}   # matches 'a' repeated two or more times
    ```

`|`
:   OR operator. The match will succeed if the longest pattern on either the left side OR the right side matches. For example:

    ```
    abc|xyz  # matches 'abc' and 'xyz'
    ```

`( … )`
:   Forms a group. You can use a group to treat part of the expression as a single character. For example:

    ```
    abc(def)?  # matches 'abc' and 'abcdef' but not 'abcd'
    ```

`[ … ]`
:   Match one of the characters in the brackets. For example:

    ```
    [abc]   # matches 'a', 'b', 'c'
    ```
    Inside the brackets, `-` indicates a range unless `-` is the first character or escaped. For example:

    ```
    [a-c]   # matches 'a', 'b', or 'c'
    [-abc]  # '-' is first character. Matches '-', 'a', 'b', or 'c'
    [abc\-] # Escapes '-'. Matches 'a', 'b', 'c', or '-'
    ```
    A `^` before a character in the brackets negates the character or range. For example:

    ```
    [^abc]      # matches any character except 'a', 'b', or 'c'
    [^a-c]      # matches any character except 'a', 'b', or 'c'
    [^-abc]     # matches any character except '-', 'a', 'b', or 'c'
    [^abc\-]    # matches any character except 'a', 'b', 'c', or '-'
    ```

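
The standard operators above can be tried out locally before sending a query. Python's `re` module is a different engine from Lucene's, so this is only an approximation for experimenting: it assumes that the operators listed in this section behave the same way in both engines, and it uses `re.fullmatch` to mimic the whole-string matching described on this page. The `full_match` helper and the test cases are illustrative.

```python
import re


def full_match(pattern: str, text: str) -> bool:
    """Return True when `pattern` matches all of `text`, not just a part."""
    return re.fullmatch(pattern, text) is not None


# (pattern, strings expected to match, strings expected not to match)
cases = [
    ("ab.", ["aba", "abz"], ["ab", "abcd"]),
    ("abc?", ["ab", "abc"], ["abcc"]),
    ("ab+", ["ab", "abbb"], ["a"]),
    ("ab*", ["a", "abb"], ["b"]),
    ("a{2,4}", ["aa", "aaaa"], ["a", "aaaaa"]),
    ("abc|xyz", ["abc", "xyz"], ["abcxyz"]),
    ("abc(def)?", ["abc", "abcdef"], ["abcd"]),
    ("[^a-c]", ["d", "-"], ["a", "b"]),
]

for pattern, expected_hits, expected_misses in cases:
    assert all(full_match(pattern, s) for s in expected_hits), pattern
    assert not any(full_match(pattern, s) for s in expected_misses), pattern
```

The optional operators described in the next section (`~`, `#`, `<>`, `&`, `@`) have no equivalent in `re`, so they can only be checked against {{es}} itself.
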
## Optional operators [regexp-optional-operators]

You can use the `flags` parameter to enable more optional operators for Lucene’s regular expression engine.

To enable multiple operators, use a `|` separator. For example, a `flags` value of `COMPLEMENT|INTERVAL` enables the `COMPLEMENT` and `INTERVAL` operators.

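
As a rough illustration of the `|` separator, the sketch below builds a `regexp` query body as a plain Python dictionary. It assumes the `regexp` query's `value` and `flags` parameters; `regexp_query` is a hypothetical helper, and the field name and pattern are placeholders.

```python
import json


def regexp_query(field: str, pattern: str, *flags: str) -> dict:
    """Hypothetical helper: build a `regexp` query body.

    Flag names are joined with '|'. With no flags given, the parameter is
    omitted so the default applies.
    """
    params = {"value": pattern}
    if flags:
        params["flags"] = "|".join(flags)
    return {"query": {"regexp": {field: params}}}


body = regexp_query("my_field", "foo<1-100>", "COMPLEMENT", "INTERVAL")
print(json.dumps(body, indent=2))
```

Here `flags` is serialized as the single string `COMPLEMENT|INTERVAL`, which enables only those two operators.
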
### Valid values [_valid_values]

`ALL` (Default)
:   Enables all optional operators.

`""` (empty string)
:   Alias for the `ALL` value.

`COMPLEMENT`
:   Enables the `~` operator. You can use `~` to negate the shortest following pattern. For example:

    ```
    a~bc   # matches 'adc' and 'aec' but not 'abc'
    ```

`EMPTY`
:   Enables the `#` (empty language) operator. The `#` operator doesn’t match any string, not even an empty string.

    If you create regular expressions by programmatically combining values, you can pass `#` to specify "no string." This lets you avoid accidentally matching empty strings or other unwanted strings. For example:

    ```
    #|abc  # matches 'abc' but nothing else, not even an empty string
    ```

`INTERVAL`
:   Enables the `<>` operators. You can use `<>` to match a numeric range. For example:

    ```
    foo<1-100>      # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
    foo<01-100>     # matches 'foo01', 'foo02' ... 'foo99', 'foo100'
    ```

`INTERSECTION`
:   Enables the `&` operator, which acts as an AND operator. The match will succeed if the patterns on both the left side AND the right side match. For example:

    ```
    aaa.+&.+bbb  # matches 'aaabbb'
    ```

`ANYSTRING`
:   Enables the `@` operator. You can use `@` to match any entire string.

    You can combine the `@` operator with `&` and `~` operators to create an "everything except" logic. For example:

    ```
    @&~(abc.+)  # matches everything except terms that begin with 'abc' followed by at least one more character
    ```

`NONE`
:   Disables all optional operators.

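
The programmatic use case described under `EMPTY` can be sketched as follows. `alternation` is a hypothetical helper, not part of any {{es}} client. It assumes the sub-patterns are already valid, escaped regular expressions, and that the query is sent with the `EMPTY` operator enabled.

```python
from typing import Iterable


def alternation(patterns: Iterable[str]) -> str:
    """Hypothetical helper: OR together already-escaped sub-patterns.

    With nothing to combine, return '#' (match no string at all) rather
    than an empty pattern, which could match the empty string.
    """
    parts = [f"({p})" for p in patterns]
    return "|".join(parts) if parts else "#"


print(alternation(["abc", "x.*"]))  # (abc)|(x.*)
print(alternation([]))              # #
```

Returning `#` for an empty input keeps the generated expression valid and ensures it matches no documents, instead of unintentionally matching empty terms.
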
## Unsupported operators [regexp-unsupported-operators]

Lucene’s regular expression engine does not support anchor operators, such as `^` (beginning of line) or `$` (end of line). To match a term, the regular expression must match the entire string.
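
Because the pattern must cover the whole term, a substring match has to be spelled out with leading and trailing wildcards. The sketch below illustrates the difference using Python's `re` module as a stand-in: `re.fullmatch` approximates the whole-string behavior described above, while `re.search` shows the unanchored behavior that Lucene does not provide. The `term_matches` helper is illustrative.

```python
import re


def term_matches(pattern: str, term: str) -> bool:
    """Approximate whole-term matching: the pattern must cover all of `term`."""
    return re.fullmatch(pattern, term) is not None


term = "xxabcxx"

# An unanchored search finds 'abc' inside the longer term...
assert re.search("abc", term) is not None
# ...but a whole-term match does not, so no anchors are needed to exclude it.
assert not term_matches("abc", term)
# To match the substring, make the surrounding characters explicit.
assert term_matches(".*abc.*", term)
```
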