elasticsearch/docs/reference/query-languages/regexp-syntax.md
Colleen McGinnis b7e3a1e14b
[docs] Migrate docs from AsciiDoc to Markdown (#123507)
2025-02-27 17:56:14 +01:00


mapped_pages
https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html

Regular expression syntax [regexp-syntax]

A regular expression is a way to match patterns in data using placeholder characters, called operators.

{{es}} supports regular expressions in the following queries:

* regexp
* query_string

{{es}} uses Apache Lucene's regular expression engine to parse these queries.

Reserved characters [regexp-reserved-characters]

Lucene's regular expression engine supports all Unicode characters. However, the following characters are reserved as operators:

. ? + * | { } [ ] ( ) " \

Depending on the optional operators enabled, the following characters may also be reserved:

# @ & < > ~

To use one of these characters literally, escape it with a preceding backslash or surround it with double quotes. For example:

\@                  # renders as a literal '@'
\\                  # renders as a literal '\'
"john@smith.com"    # renders as 'john@smith.com'

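When a pattern is assembled from arbitrary text, the escaping rule above can be applied mechanically. The following Python sketch is illustrative only: `escape_regexp_literal` is a hypothetical helper, not part of any {{es}} client, and it conservatively escapes the optional-operator characters too, on the assumption that any of the optional operators might be enabled.

```python
# Characters listed above as always reserved, plus those that are
# reserved only when the matching optional operators are enabled.
RESERVED = set('.?+*|{}[]()"\\') | set('#@&<>~')


def escape_regexp_literal(text: str) -> str:
    """Backslash-escape every reserved character in `text` (hypothetical helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print(escape_regexp_literal("john@smith.com"))  # john\@smith\.com
print(escape_regexp_literal("a\\b"))            # a\\b
```

The result is the regular expression itself. It still needs JSON escaping before it goes into a request body.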
::::{note} The backslash is an escape character in both JSON strings and regular expressions. A literal backslash in a query must therefore be escaped at both levels, unless you use a language client, which takes care of the JSON escaping. For example, the string a\b needs to be indexed as "a\\b":

PUT my-index-000001/_doc/1
{
  "my_field": "a\\b"
}

This document matches the following regexp query:

GET my-index-000001/_search
{
  "query": {
    "regexp": {
      "my_field.keyword": "a\\\\.*"
    }
  }
}

::::
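The double escaping described in the note can be reproduced with any JSON serializer. This is a minimal Python sketch using only the standard `json` module. The field name is the one from the example above, and no request is sent to {{es}}.

```python
import json

# The regular expression itself: an escaped backslash, then ".*".
# Written as a raw string it is five characters long: a \ \ . *
pattern = r"a\\.*"

body = {"query": {"regexp": {"my_field.keyword": pattern}}}

# Serializing to JSON escapes each backslash a second time.
encoded = json.dumps(body)
print(encoded)
# {"query": {"regexp": {"my_field.keyword": "a\\\\.*"}}}

# Decoding the JSON recovers the original two-backslash pattern.
assert json.loads(encoded)["query"]["regexp"]["my_field.keyword"] == pattern
```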

Standard operators [regexp-standard-operators]

Lucene's regular expression engine does not use the Perl Compatible Regular Expressions (PCRE) library, but it does support the following standard operators.

.
Matches any character. For example:
ab.     # matches 'aba', 'abb', 'abz', etc.
?
Repeat the preceding character zero or one times. Often used to make the preceding character optional. For example:
abc?     # matches 'ab' and 'abc'
+
Repeat the preceding character one or more times. For example:
ab+     # matches 'ab', 'abb', 'abbb', etc.
*
Repeat the preceding character zero or more times. For example:
ab*     # matches 'a', 'ab', 'abb', 'abbb', etc.
{}
Minimum and maximum number of times the preceding character can repeat. For example:
a{2}    # matches 'aa'
a{2,4}  # matches 'aa', 'aaa', and 'aaaa'
a{2,}   # matches 'a' repeated two or more times
|
OR operator. The match will succeed if the longest pattern on either the left side OR the right side matches. For example:
abc|xyz  # matches 'abc' and 'xyz'
( … )
Forms a group. You can use a group to treat part of the expression as a single character. For example:
abc(def)?  # matches 'abc' and 'abcdef' but not 'abcd'
[ … ]
Match one of the characters in the brackets. For example:
[abc]   # matches 'a', 'b', 'c'

Inside the brackets, - indicates a range unless - is the first character or escaped. For example:

[a-c]   # matches 'a', 'b', or 'c'
[-abc]  # '-' is first character. Matches '-', 'a', 'b', or 'c'
[abc\-] # Escapes '-'. Matches 'a', 'b', 'c', or '-'

A ^ before a character in the brackets negates the character or range. For example:

[^abc]      # matches any character except 'a', 'b', or 'c'
[^a-c]      # matches any character except 'a', 'b', or 'c'
[^-abc]     # matches any character except '-', 'a', 'b', or 'c'
[^abc\-]    # matches any character except 'a', 'b', 'c', or '-'
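As a rough way to experiment with the standard operators, several of the examples above can be checked with Python's `re` module. This is only an approximation: `re` is a different engine from Lucene's, and `re.fullmatch` is used on the assumption that it mirrors Lucene's whole-string matching for these particular patterns. Results for other patterns, especially ones that use optional operators, may differ.

```python
import re

# (pattern, strings expected to match, strings expected not to match)
cases = [
    (r"ab.", ["aba", "abz"], ["ab"]),
    (r"abc?", ["ab", "abc"], ["abcc"]),
    (r"ab+", ["ab", "abbb"], ["a"]),
    (r"ab*", ["a", "abb"], ["b"]),
    (r"a{2,4}", ["aa", "aaaa"], ["a", "aaaaa"]),
    (r"abc|xyz", ["abc", "xyz"], ["abcxyz"]),
    (r"abc(def)?", ["abc", "abcdef"], ["abcd"]),
    (r"[^a-c]", ["d", "-"], ["a", "b"]),
]

for pattern, should_match, should_not_match in cases:
    for s in should_match:
        assert re.fullmatch(pattern, s), (pattern, s)
    for s in should_not_match:
        assert not re.fullmatch(pattern, s), (pattern, s)

print("all examples behave as described")
```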

Optional operators [regexp-optional-operators]

You can use the flags parameter to enable more optional operators for Lucene's regular expression engine.

To enable multiple operators, use a | separator. For example, a flags value of COMPLEMENT|INTERVAL enables the COMPLEMENT and INTERVAL operators.
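A request body that sets flags might be assembled as follows. This Python sketch only builds and prints the JSON. It assumes the long form of the `regexp` query with `value` and `flags` keys, and the field name `my_field` and the pattern are placeholders chosen for illustration.

```python
import json

# Optional operators to enable, joined with the "|" separator.
enabled = ["COMPLEMENT", "INTERVAL"]
flags = "|".join(enabled)

body = {
    "query": {
        "regexp": {
            "my_field": {               # placeholder field name
                "value": "foo<1-100>",  # placeholder pattern using INTERVAL
                "flags": flags,
            }
        }
    }
}

print(flags)  # COMPLEMENT|INTERVAL
print(json.dumps(body, indent=2))
```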

Valid values [_valid_values]

ALL (Default)
Enables all optional operators.
"" (empty string)
Alias for the ALL value.
COMPLEMENT
Enables the ~ operator. You can use ~ to negate the shortest following pattern. For example:
a~bc   # matches 'adc' and 'aec' but not 'abc'
EMPTY
Enables the # (empty language) operator. The # operator doesn't match any string, not even an empty string.

If you create regular expressions by programmatically combining values, you can pass # to specify "no string." This lets you avoid accidentally matching empty strings or other unwanted strings. For example:

#|abc  # matches 'abc' but nothing else, not even an empty string
INTERVAL
Enables the <> operators. You can use <> to match a numeric range. For example:
foo<1-100>      # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
foo<01-100>     # matches 'foo01', 'foo02' ... 'foo99', 'foo100'
INTERSECTION
Enables the & operator, which acts as an AND operator. The match will succeed if the patterns on both the left side AND the right side match. For example:
aaa.+&.+bbb  # matches 'aaabbb'
ANYSTRING
Enables the @ operator. You can use @ to match any entire string.

You can combine the @ operator with & and ~ operators to create an "everything except" logic. For example:

@&~(abc.+)  # matches everything except terms that begin with 'abc' followed by at least one more character
NONE
Disables all optional operators.
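The programmatic use of # described under EMPTY can be sketched in Python. `build_alternation` is a hypothetical helper. It assumes each input is already a valid sub-pattern and that the EMPTY operator is enabled through flags.

```python
def build_alternation(patterns):
    """Join sub-patterns with '|', or return '#' when there are none."""
    if not patterns:
        # Joining an empty list would produce an empty pattern rather
        # than "no string"; '#' (EMPTY operator) matches nothing at all.
        return "#"
    # Parentheses keep each sub-pattern intact inside the alternation.
    return "|".join(f"({p})" for p in patterns)


print(build_alternation(["abc", "xyz"]))  # (abc)|(xyz)
print(build_alternation([]))              # #
```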

Unsupported operators [regexp-unsupported-operators]

Lucene's regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.
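Because the whole term must match, a pattern meant to find a substring or a prefix has to allow for the rest of the term explicitly with `.*`. The following Python sketch imitates this with `re.fullmatch`. Python's `re` is not Lucene's engine, so treat it as an analogy for the matching rule only.

```python
import re

term = "xxabcxx"

# Whole-string semantics: "abc" alone does not match a longer term.
print(bool(re.fullmatch(r"abc", term)))        # False

# Wrapping the pattern in ".*" allows anything before and after it.
print(bool(re.fullmatch(r".*abc.*", term)))    # True

# A prefix match needs only a trailing ".*".
print(bool(re.fullmatch(r"abc.*", "abcdef")))  # True
```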