Expose splitOnWhitespace in Query String Query (#20965)

This change adds an option called `split_on_whitespace`. When it is set to false, the query parser does not split the free-text part on whitespace prior to analysis. Instead, the query parser parses around real 'operators' only. Defaults to true. For instance, with `split_on_whitespace` set to false, the query `"foo bar"` lets the analyzer of the targeted field decide how the tokens should be split.

Some options are missing in this change, but I'd like to add them in a follow-up PR to simplify the backport to 5.x. The missing options (changes) are:

* A `type` option which, similarly to the `multi_match` query, defines how the free text should be parsed when multiple fields are defined.
* Simple range queries with additional tokens, like ">100 50", are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of it even when `split_on_whitespace` is set to false.
* Since all these options would make the `query_string_query` very similar to a match (`multi_match`) query, we should be able to share the code that produces the final Lucene query.
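A minimal sketch of a search request body that turns the new option off, so that an input such as `wi fi` reaches the field's analyzer as one unit. Python is used here only to assemble the JSON. The helper name `query_string_body` and the field name `title` are hypothetical. The sketch also assumes `default_field` is the usual `query_string` parameter for the target field. Only `split_on_whitespace` comes from this change.

```python
import json


def query_string_body(text, field, split_on_whitespace=True):
    """Build a hypothetical search request body for a `query_string` query.

    `field` is an illustrative field name (assumption); `split_on_whitespace`
    is the option introduced by this change and defaults to true.
    """
    return {
        "query": {
            "query_string": {
                "query": text,
                "default_field": field,
                # False: the free text is handed to the analyzer as a whole
                # instead of being split on whitespace before analysis.
                "split_on_whitespace": split_on_whitespace,
            }
        }
    }


if __name__ == "__main__":
    body = query_string_body("wi fi", "title", split_on_whitespace=False)
    # Serializes the Python False as the JSON literal false.
    print(json.dumps(body, indent=2))
```

Leaving the argument at its default reproduces the existing behaviour, which matches the option defaulting to true.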
2025-04-25 15:47:23 -04:00 · 2016-11-02 10:00:40 +01:00 · 2016-11-02 10:00:40 +01:00 · 9d6fac809c
commit 9d6fac809c
parent aa6cd93e0f
7 changed files with 182 additions and 4 deletions
--- a/docs/reference/query-dsl/query-string-syntax.asciidoc
+++ b/docs/reference/query-dsl/query-string-syntax.asciidoc
@@ -282,8 +282,8 @@ A space may also be a reserved character.  For instance, if you have a
 synonym list which converts `"wi fi"` to `"wifi"`, a `query_string` search
 for `"wi fi"` would fail. The query string parser would interpret your
 query as a search for `"wi OR fi"`, while the token stored in your
-index is actually `"wifi"`.  Escaping the space will protect it from
-being touched by the query string parser: `"wi\ fi"`.
+index is actually `"wifi"`.  The option `split_on_whitespace=false` will protect it from
+being touched by the query string parser and will let the analysis run on the entire input (`"wi fi"`).
 ****

 ===== Empty Query