Expose splitOnWhitespace in Query String Query (#20965)

This change adds an option called `split_on_whitespace`. When it is set to `false`, the query parser does not split the free-text part on whitespace before analysis. Instead, the query parser only parses around real 'operators'. The option defaults to `true`.
For instance, with `split_on_whitespace` set to `false`, the query `"foo bar"` lets the analyzer of the targeted field decide how the tokens should be split.
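The following minimal Python sketch makes the difference concrete. It is not Elasticsearch code. `build_request_body` assembles a `query_string` request body that turns the new option off, and `content` is a placeholder field name. `SYNONYMS`, `toy_analyze` and `toy_parse` are hypothetical helpers that model the parser. With splitting enabled, each whitespace-separated chunk is analyzed on its own. With splitting disabled, the whole free-text run reaches the analyzer, so a multi-word synonym rule such as `wi fi` -> `wifi` can fire.

```python
import json

# Hypothetical multi-word synonym rule for the toy analyzer, mirroring the
# "wi fi" -> "wifi" example from the documentation change.
SYNONYMS = {("wi", "fi"): "wifi"}


def build_request_body(text, field, split_on_whitespace):
    """Build a query_string request body; `field` is a placeholder name."""
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": text,
                "split_on_whitespace": split_on_whitespace,
            }
        }
    }


def toy_analyze(text):
    """Toy analyzer: lowercase, tokenize on whitespace, then apply SYNONYMS."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            # Two adjacent tokens match a synonym rule: emit the replacement.
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def toy_parse(text, split_on_whitespace=True):
    """Return the terms the toy parser would search for (OR-ed together)."""
    if split_on_whitespace:
        # Each whitespace-separated chunk is analyzed on its own,
        # so the analyzer never sees "wi" and "fi" together.
        chunks = text.split()
    else:
        # The whole free-text run reaches the analyzer in one piece.
        chunks = [text]
    terms = []
    for chunk in chunks:
        terms.extend(toy_analyze(chunk))
    return terms


print(json.dumps(build_request_body("wi fi", "content", False)))
print(toy_parse("wi fi", split_on_whitespace=True))   # ['wi', 'fi']
print(toy_parse("wi fi", split_on_whitespace=False))  # ['wifi']
```

The toy analyzer knows a single two-token synonym rule. A real field analyzer derives its tokenization from the field's mapping.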
Some options are missing from this change, but I'd like to add them in a follow-up PR to simplify the backport to 5.x. The missing options (changes) are:
* A `type` option which, similarly to the `multi_match` query, defines how the free text should be parsed when multiple fields are targeted.
* Simple range queries with additional tokens, like `>100 50`, are broken when `split_on_whitespace` is set to `false`. It should be possible to preserve this syntax and make the parser aware of it even when `split_on_whitespace` is set to `false`.
* Since all these options would make the `query_string_query` very similar to a `match` (`multi_match`) query, we should be able to share the code that produces the final Lucene query.
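The range regression in the second bullet can be pictured with another toy model. For illustration only, assume the one-sided range shortcut is recognized by inspecting the leading characters of a free-text chunk. Under that assumption, splitting on whitespace yields the chunks `>100` and `50`. Disabling splitting instead hands the parser the single chunk `>100 50`, whose bound is not a number. `RANGE_SHORTCUT`, `toy_chunk_query` and `toy_range_parse` are hypothetical names, and this is not how Elasticsearch's parser is written.

```python
import re

# Toy pattern for the one-sided range shortcut (e.g. ">100", ">=10");
# an assumption for illustration, not Elasticsearch's parser code.
RANGE_SHORTCUT = re.compile(r"^(>=|<=|>|<)(\d+)$")


def toy_chunk_query(chunk):
    """Turn one free-text chunk into a toy clause, or raise if malformed."""
    if chunk.startswith(("<", ">")):
        match = RANGE_SHORTCUT.match(chunk)
        if match is None:
            # The whole chunk was read as a range, but its bound is not a number.
            raise ValueError(f"cannot parse range bound in {chunk!r}")
        op, bound = match.groups()
        return ("range", op, int(bound))
    return ("term", chunk)


def toy_range_parse(text, split_on_whitespace=True):
    """Split (or not) into chunks, then build one toy clause per chunk."""
    chunks = text.split() if split_on_whitespace else [text]
    return [toy_chunk_query(chunk) for chunk in chunks]


print(toy_range_parse(">100 50", split_on_whitespace=True))
# [('range', '>', 100), ('term', '50')]
try:
    toy_range_parse(">100 50", split_on_whitespace=False)
except ValueError as err:
    print(err)  # cannot parse range bound in '>100 50'
```

In this picture, preserving the syntax means teaching the parser to detect the shortcut inside a larger free-text run rather than only at chunk boundaries.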
Jim Ferenczi 2016-11-02 10:00:40 +01:00 committed by GitHub
parent aa6cd93e0f
commit 9d6fac809c
7 changed files with 182 additions and 4 deletions


@@ -282,8 +282,8 @@ A space may also be a reserved character. For instance, if you have a
 synonym list which converts `"wi fi"` to `"wifi"`, a `query_string` search
 for `"wi fi"` would fail. The query string parser would interpret your
 query as a search for `"wi OR fi"`, while the token stored in your
-index is actually `"wifi"`. Escaping the space will protect it from
-being touched by the query string parser: `"wi\ fi"`.
+index is actually `"wifi"`. The option `split_on_whitespace=false` will protect it from
+being touched by the query string parser and will let the analysis run on the entire input (`"wi fi"`).
 ****
 
 ===== Empty Query