Expose splitOnWhitespace in Query String Query (#20965)

This change adds an option called `split_on_whitespace` which, when set to `false`, prevents the query parser from splitting the free-text part on whitespace prior to analysis. Instead, the query parser splits only around real 'operators'. Defaults to `true`. For instance, with `split_on_whitespace` set to `false`, the query `foo bar` is passed as a whole to the analyzer of the targeted field, which decides how the text should be split into tokens.

Some options are missing from this change; I'd like to add them in a follow-up PR to simplify the backport to 5.x. The missing options (changes) are:

* A `type` option which, similarly to the `multi_match` query, defines how the free text should be parsed when multiple fields are defined.
* Simple range queries with additional tokens, like `>100 50`, are broken when `split_on_whitespace` is set to `false`. It should be possible to preserve this syntax and make the parser aware of it even when `split_on_whitespace` is set to `false`.
* Since all these options would make the `query_string_query` very similar to a `match` (`multi_match`) query, we should be able to share the code that produces the final Lucene query.
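The effect of the option is easiest to see with a multi-word synonym. The sketch below is a toy Java model, not Lucene's `QueryParser` and not Elasticsearch code: the `AND`/`OR`/`NOT` operator set, the `chunks`/`analyze`/`terms` helpers and the analyzer that maps "new york" to a single `new_york` token are all hypothetical, chosen only to show which text reaches the analyzer under each setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: this is NOT Lucene's QueryParser or Elasticsearch code.
// It shows which pieces of free text reach the analyzer with and without splitting.
class SplitOnWhitespaceDemo {

    // Hypothetical operator set for the toy parser (no quotes, parentheses or ranges).
    static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    // Hypothetical analyzer: lowercases, splits on whitespace, and maps the
    // two-word sequence "new york" to the single synonym token "new_york".
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String[] words = text.trim().toLowerCase().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) {
                continue; // split() yields one empty string for blank input
            }
            if (i + 1 < words.length && words[i].equals("new") && words[i + 1].equals("york")) {
                tokens.add("new_york");
                i++; // "york" has been consumed by the synonym
            } else {
                tokens.add(words[i]);
            }
        }
        return tokens;
    }

    // Returns the text chunks handed to the analyzer. With splitOnWhitespace=true every
    // word is its own chunk; with false, words are grouped until the next operator.
    static List<String> chunks(String query, boolean splitOnWhitespace) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (OPERATORS.contains(word)) {
                flush(current, result); // an operator ends the current chunk
            } else if (splitOnWhitespace) {
                result.add(word);
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
        }
        flush(current, result);
        return result;
    }

    private static void flush(StringBuilder current, List<String> result) {
        if (current.length() > 0) {
            result.add(current.toString());
            current.setLength(0);
        }
    }

    // Analyzes each chunk and concatenates the resulting terms (operators are dropped).
    static List<String> terms(String query, boolean splitOnWhitespace) {
        List<String> all = new ArrayList<>();
        for (String chunk : chunks(query, splitOnWhitespace)) {
            all.addAll(analyze(chunk));
        }
        return all;
    }

    public static void main(String[] args) {
        String query = "new york AND pizza";
        System.out.println(terms(query, true));  // prints [new, york, pizza]
        System.out.println(terms(query, false)); // prints [new_york, pizza]
    }
}
```

With splitting enabled the analyzer only ever sees single words, so the two-word synonym can never match; with splitting disabled the whole run `new york` between operators reaches it. The toy deliberately ignores field prefixes, quoting and range syntax such as `>100 50`.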
commit 9d6fac809c
parent aa6cd93e0f
date 2016-11-02 10:00:40 +01:00
7 changed files with 182 additions and 4 deletions
--- a/docs/reference/query-dsl/query-string-query.asciidoc
+++ b/docs/reference/query-dsl/query-string-query.asciidoc
@@ -90,6 +90,11 @@ http://www.joda.org/joda-time/apidocs/org/joda/time/DateTimeZone.html[JODA timez
 the query string. This allows to use a field that has a different analysis chain
 for exact matching. Look <<mixing-exact-search-with-stemming,here>> for a
 comprehensive example.
+
+|`split_on_whitespace` |Whether query text should be split on whitespace prior to analysis.
+                        When `false`, the query parser splits the text only around real operators.
+                        Defaults to `true`.
+
 |=======================================================================

 When a multi term query is being generated, one can control how it gets