Painless: =~ and ==~ operators

Adds support for the find operator (=~) and the match operator (==~)
to painless's regexes. Also whitelists most of the Matcher class and
documents regex support in painless.

The find operator (=~) returns a boolean that is the result of building
a Matcher for the text on the LHS from the Pattern on the RHS and calling
`find` on it. Use it like this:

```
if (ctx._source.last =~ /b/)
```

The match operator (==~) returns a boolean like the find operator, but
instead of calling `find` on the Matcher it calls `matches`.

```
if (ctx._source.last ==~ /[^aeiou].*[aeiou]/)
```

Finally, if you want the actual Matcher, call `matcher` on the pattern:

```
Matcher m = /[aeiou]/.matcher(ctx._source.last)
```
Nik Everett 2016-06-13 21:42:37 -04:00
parent 3c9712794e
commit 8d3ef742db
17 changed files with 947 additions and 700 deletions


@@ -33,6 +33,8 @@ to `painless`.
* Shortcuts for list, map access using the dot `.` operator
* Native support for regular expressions with `/pattern/`, `=~`, and `==~`
[[painless-examples]]
[float]
@@ -199,6 +201,79 @@ POST hockey/player/1/_update
----------------------------------------------------------------
// CONSOLE
[float]
=== Regular expressions
Painless's native support for regular expressions has three syntax constructs:
* `/pattern/`: Pattern literals create patterns. This is the only way to create
a pattern in painless.
* `=~`: The find operator returns a `boolean`, `true` if a subsequence of the
text matches, `false` otherwise.
* `==~`: The match operator returns a `boolean`, `true` if the text matches,
`false` if it doesn't.
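Because the two operators are described in terms of `Matcher.find` and `Matcher.matches`, the difference between them can be sketched in plain Java. This is only an illustration under that assumption: the class `RegexOperatorSketch` and its method names are hypothetical and are not part of Painless.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not part of Painless.
final class RegexOperatorSketch {
    // Rough equivalent of `text =~ /regex/`: true if any subsequence of the text matches.
    static boolean find(String text, Pattern pattern) {
        return pattern.matcher(text).find();
    }

    // Rough equivalent of `text ==~ /regex/`: true only if the entire text matches.
    static boolean match(String text, Pattern pattern) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        Pattern b = Pattern.compile("b");
        System.out.println(find("abc", b));   // true: "b" occurs inside "abc"
        System.out.println(match("abc", b));  // false: the whole text is not just "b"
        System.out.println(match("b", b));    // true: the whole text is "b"
    }
}
```

The practical consequence is that a pattern used with `==~` usually needs `.*` on either side if it is meant to match only part of the text, while `=~` does not.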
Using the find operator (`=~`) you can update all hockey players with "b" in
their last name:
[source,js]
----------------------------------------------------------------
POST hockey/player/_update_by_query
{
"script": {
"lang": "painless",
"inline": "if (ctx._source.last =~ /b/) {ctx._source.last += \"matched\"} else {ctx.op = 'noop'}"
}
}
----------------------------------------------------------------
// CONSOLE
Using the match operator (`==~`) you can update all the hockey players whose
names start with a consonant and end with a vowel:
[source,js]
----------------------------------------------------------------
POST hockey/player/_update_by_query
{
"script": {
"lang": "painless",
"inline": "if (ctx._source.last ==~ /[^aeiou].*[aeiou]/) {ctx._source.last += \"matched\"} else {ctx.op = 'noop'}"
}
}
----------------------------------------------------------------
// CONSOLE
Or you can use `Pattern.matcher` directly to get a `Matcher` instance and
remove all of the vowels in all of their names:
[source,js]
----------------------------------------------------------------
POST hockey/player/_update_by_query
{
"script": {
"lang": "painless",
"inline": "ctx._source.last = /[aeiou]/.matcher(ctx._source.last).replaceAll('')"
}
}
----------------------------------------------------------------
// CONSOLE
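The `matcher(...).replaceAll('')` call in the script above is the ordinary `java.util.regex.Matcher` method, so its effect can be checked outside Elasticsearch. The sketch below assumes nothing beyond the standard Java API; the class name `MatcherReplaceSketch` and the sample strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; it mirrors the script's logic in plain Java.
final class MatcherReplaceSketch {
    // Same character class as the script's /[aeiou]/ literal (lowercase vowels only).
    private static final Pattern VOWEL = Pattern.compile("[aeiou]");

    // Builds a Matcher over the text and replaces every match with the empty string.
    static String stripVowels(String text) {
        Matcher matcher = VOWEL.matcher(text);
        return matcher.replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripVowels("hockey")); // prints "hcky"
    }
}
```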
Note: all of the `_update_by_query` examples above could really do with a
`query` to limit the data that they pull back. While you *could* use a
<<query-dsl-script-query>> it wouldn't be as efficient as using any other query
because script queries aren't able to use the inverted index to limit the
documents that they have to check.
The pattern syntax is just
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expressions].
We intentionally don't allow scripts to call `Pattern.compile` to get a new
pattern on the fly because building a `Pattern` is (comparatively) slow.
Pattern literals (`/apattern/`) have fancy constant extraction so no matter
where they show up in the painless script they are built only when the script
is first used. It is fairly similar to how `String` literals work in Java.
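A loose Java analogy for this, not a description of how Painless implements constant extraction, is a `static final` pattern that is compiled once and then reused by every call. The class `PatternConstantSketch` and its methods are hypothetical names chosen for the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class illustrating "compile once, reuse many times".
final class PatternConstantSketch {
    // Compiled a single time when the class is initialized.
    private static final Pattern VOWEL = Pattern.compile("[aeiou]");

    // Always hands back the same Pattern instance; nothing is recompiled per call.
    static Pattern vowelPattern() {
        return VOWEL;
    }

    // Counts matches by calling find() repeatedly on a fresh Matcher.
    static int countVowels(String text) {
        Matcher matcher = VOWEL.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("hockey"));            // prints 2
        System.out.println(vowelPattern() == vowelPattern()); // prints true
    }
}
```

Creating a `Matcher` per call is cheap by comparison; the expensive step that the pattern literal avoids repeating is `Pattern.compile`.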
[[painless-api]]
[float]
== Painless API