Expose splitOnWhitespace in Query String Query (#20965)

This change adds an option called `split_on_whitespace` which, when set to false, prevents the query parser from splitting the free text part on whitespace prior to analysis. Instead, the query parser parses around only real 'operators'. Defaults to true.
For instance, with `split_on_whitespace` set to false, the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be split.
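To make the difference concrete, here is a toy Python model of the two parsing modes. It is only a sketch and not Lucene's or Elasticsearch's implementation. The synonym-aware analyzer, the `SYNONYMS` table and the `analyze`/`parse` helpers are hypothetical, and operator handling is reduced to bare `AND`/`OR`/`NOT` words (no quotes, parentheses, `+`/`-` or field prefixes).

```python
# Hypothetical analyzer rule: the two-word synonym "new york" becomes "ny".
SYNONYMS = {("new", "york"): "ny"}
# Simplified operator set; the real query parser understands far more syntax.
OPERATORS = {"AND", "OR", "NOT"}


def analyze(text):
    """Lowercase, split into words, then collapse known two-word synonyms."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in SYNONYMS:
            tokens.append(SYNONYMS[pair])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def parse(query, split_on_whitespace=True):
    """Return the token lists produced by each call to the analyzer."""
    # First break the query into free-text runs around operators.
    chunks, current = [], []
    for word in query.split():
        if word in OPERATORS:
            if current:
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))

    if split_on_whitespace:
        # Every whitespace-separated term is analyzed on its own.
        return [analyze(term) for chunk in chunks for term in chunk.split()]
    # The analyzer sees each free-text run between operators as a whole.
    return [analyze(chunk) for chunk in chunks]
```

With splitting on, `parse("new york")` analyzes `new` and `york` separately, so the multi-word synonym never fires. With splitting off, the analyzer receives `new york` as one run and can emit `ny`.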
Some options are missing from this change, but I'd like to add them in a follow-up PR to simplify the backport to 5.x. The missing options (changes) are:
* A `type` option which, similarly to the `multi_match` query, defines how the free text should be parsed when multiple fields are defined.
* Simple range queries with additional tokens, like `>100 50`, are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of it even when `split_on_whitespace` is set to false.
* Since all these options would make the `query_string_query` very similar to a `match` (`multi_match`) query, we should be able to share the code that produces the final Lucene query.
Jim Ferenczi 2016-11-02 10:00:40 +01:00 committed by GitHub
parent aa6cd93e0f
commit 9d6fac809c
7 changed files with 182 additions and 4 deletions


@@ -90,6 +90,11 @@ http://www.joda.org/joda-time/apidocs/org/joda/time/DateTimeZone.html[JODA timez
the query string. This allows using a field that has a different analysis chain
for exact matching. Look <<mixing-exact-search-with-stemming,here>> for a
comprehensive example.
|`split_on_whitespace` |Whether query text should be split on whitespace prior to analysis.
When set to `false`, the query parser splits the text only around real 'operators' instead.
Defaults to `true`.
|=======================================================================
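A request body that turns off whitespace splitting might look like the following sketch, built as a Python dict and serialized to JSON. It assumes an Elasticsearch version that accepts `split_on_whitespace` on the `query_string` query; the field name `content` and the helper `query_string_body` are hypothetical.

```python
import json


def query_string_body(text, field, split_on_whitespace):
    """Build a search request body that uses the query_string query."""
    return {
        "query": {
            "query_string": {
                "default_field": field,  # hypothetical field name chosen by the caller
                "query": text,
                # false: let the field's analyzer see whole free-text runs
                "split_on_whitespace": split_on_whitespace,
            }
        }
    }


body = query_string_body("new york", "content", False)
print(json.dumps(body, indent=2))
```

Passing `False` explicitly keeps the request readable regardless of the server-side default.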
When a multi term query is being generated, one can control how it gets