[[query-dsl-intervals-query]]
=== Intervals query
++++
Intervals
++++

Returns documents based on the order and proximity of matching terms.

The `intervals` query uses *matching rules*, constructed from a small set of
definitions. These rules are then applied to terms from a specified `field`.

The definitions produce sequences of minimal intervals that span terms in a
body of text. These intervals can be further combined and filtered by
parent sources.

[[intervals-query-ex-request]]
==== Example request

The following `intervals` search returns documents containing
`my favorite food` without any gap, followed by `hot water` or
`cold porridge` in the `my_text` field.

This search would match a `my_text` value of
`my favorite food is cold porridge` but not
`when it's cold my favorite food is porridge`.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------

[[intervals-top-level-params]]
==== Top-level parameters for `intervals`
[[intervals-rules]]
`<field>`::
+
--
(Required, rule object) Field you wish to search.

The value of this parameter is a rule object used to match documents
based on matching terms, order, and proximity.

Valid rules include:

* <>
* <>
* <>
* <>
* <>
* <>
* <>
* <>
--

[[intervals-match]]
==== `match` rule parameters

The `match` rule matches analyzed text.

`query`::
(Required, string) Text you wish to find in the provided `<field>`.

`max_gaps`::
+
--
(Optional, integer) Maximum number of positions between the matching terms.
Terms further apart than this are not considered matches. Defaults to
`-1`.

If unspecified or set to `-1`, there is no width restriction on the match.
If set to `0`, the terms must appear next to each other.
--

`ordered`::
(Optional, Boolean) If `true`, matching terms must appear in their specified
order. Defaults to `false`.

`analyzer`::
(Optional, string) <> used to analyze terms in the `query`. Defaults to the
top-level `<field>`'s analyzer.

`filter`::
(Optional, <> rule object) An optional interval filter.

`use_field`::
(Optional, string) If specified, then match intervals from this field rather
than the top-level `<field>`. Terms are analyzed using the search analyzer from
this field. This allows you to search across multiple fields as if they were
all the same field; for example, you could index the same text into stemmed and
unstemmed fields, and search for stemmed tokens near unstemmed ones.

[[intervals-prefix]]
==== `prefix` rule parameters

The `prefix` rule matches terms that start with a specified set of characters.
This prefix can expand to match at most `indices.query.bool.max_clause_count`
<> terms. If the prefix matches more terms, {es} returns an error. You can use
the <> option in the field mapping to avoid this limit.

`prefix`::
(Required, string) Beginning characters of terms you wish to find in the
top-level `<field>`.

`analyzer`::
(Optional, string) <> used to normalize the `prefix`. Defaults to the
top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, then match intervals from this field rather
than the top-level `<field>`.

The `prefix` is normalized using the search analyzer from this field, unless a
separate `analyzer` is specified.
--

[[intervals-wildcard]]
==== `wildcard` rule parameters

The `wildcard` rule matches terms using a wildcard pattern. This pattern can
expand to match at most `indices.query.bool.max_clause_count` <>
terms. If the pattern matches more terms, {es} returns an error.

`pattern`::
(Required, string) Wildcard pattern used to find matching terms.
+
--
This parameter supports two wildcard operators:

* `?`, which matches any single character
* `*`, which can match zero or more characters, including an empty one

WARNING: Avoid beginning patterns with `*` or `?`. This can increase
the iterations needed to find matching terms and slow search performance.
--

`analyzer`::
(Optional, string) <> used to normalize the `pattern`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

The `pattern` is normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
--

[[intervals-regexp]]
==== `regexp` rule parameters

The `regexp` rule matches terms using a regular expression pattern.
This pattern can expand to match at most `indices.query.bool.max_clause_count`
<> terms. If the pattern matches more terms, {es} returns an error.

`pattern`::
+
--
(Required, string) Regexp pattern used to find matching terms.
For a list of operators supported by the `regexp` pattern, see <>.

WARNING: Avoid using wildcard patterns, such as `.*` or `.*?+`. This can
increase the iterations needed to find matching terms and slow search
performance.
--

`analyzer`::
(Optional, string) <> used to normalize the `pattern`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

The `pattern` is normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
--

[[intervals-fuzzy]]
==== `fuzzy` rule parameters

The `fuzzy` rule matches terms that are similar to the provided term, within an
edit distance defined by <>. If the fuzzy expansion matches more than
`indices.query.bool.max_clause_count` <> terms, {es} returns an error.

`term`::
(Required, string) The term to match.

`prefix_length`::
(Optional, integer) Number of beginning characters left unchanged when creating
expansions.
Defaults to `0`.

`transpositions`::
(Optional, Boolean) Indicates whether edits include transpositions of two
adjacent characters (ab → ba). Defaults to `true`.

`fuzziness`::
(Optional, string) Maximum edit distance allowed for matching. See <>
for valid values and more information. Defaults to `auto`.

`analyzer`::
(Optional, string) <> used to normalize the `term`.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
+
--
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

The `term` is normalized using the search analyzer from this field, unless
`analyzer` is specified separately.
--

[[intervals-range]]
==== `range` rule parameters

The `range` rule matches terms contained within a provided range.
This range can expand to match at most `indices.query.bool.max_clause_count`
<> terms. If the range matches more terms, {es} returns an error.

`gt`::
(Optional, string) Greater than: match terms greater than the provided term.

`gte`::
(Optional, string) Greater than or equal to: match terms greater than or
equal to the provided term.

`lt`::
(Optional, string) Less than: match terms less than the provided term.

`lte`::
(Optional, string) Less than or equal to: match terms less than or
equal to the provided term.

NOTE: You must provide either `gt` or `gte`, and either `lt` or `lte`.

`analyzer`::
(Optional, string) <> used to normalize the provided range terms.
Defaults to the top-level `<field>`'s analyzer.

`use_field`::
(Optional, string) If specified, match intervals from this field rather than the
top-level `<field>`.

[[intervals-all_of]]
==== `all_of` rule parameters

The `all_of` rule returns matches that span a combination of other rules.

`intervals`::
(Required, array of rule objects) An array of rules to combine. All rules must
produce a match in a document for the overall source to match.

`max_gaps`::
+
--
(Optional, integer) Maximum number of positions between the matching terms.
Intervals produced by the rules further apart than this are not considered
matches. Defaults to `-1`.

If unspecified or set to `-1`, there is no width restriction on the match. If
set to `0`, the intervals produced by the rules must appear next to each other.

Inner rules can have their own `max_gaps` values. In this case, {es} first finds
the inner intervals using their own `max_gaps` values, then combines them and
checks whether the gaps between the inner intervals satisfy the `max_gaps` value
of the `all_of` rule.

For examples of how `max_gaps` works, see <>.
--

`ordered`::
(Optional, Boolean) If `true`, intervals produced by the rules should appear in
the order in which they are specified. Defaults to `false`.
If `ordered` is `false`, intervals can appear in any order,
including overlapping with each other.

`filter`::
(Optional, <> rule object) Rule used to filter returned intervals.

[[intervals-any_of]]
==== `any_of` rule parameters

The `any_of` rule returns intervals produced by any of its sub-rules.

`intervals`::
(Required, array of rule objects) An array of rules to match.

`filter`::
(Optional, <> rule object) Rule used to filter returned intervals.

[[interval_filter]]
==== `filter` rule parameters

The `filter` rule returns intervals based on a query. See
<> for an example.

`after`::
(Optional, query object) Query used to return intervals that follow an interval
from the `filter` rule.

`before`::
(Optional, query object) Query used to return intervals that occur before an
interval from the `filter` rule.

`contained_by`::
(Optional, query object) Query used to return intervals contained by an interval
from the `filter` rule.

`containing`::
(Optional, query object) Query used to return intervals that contain an interval
from the `filter` rule.

`not_contained_by`::
(Optional, query object) Query used to return intervals that are *not*
contained by an interval from the `filter` rule.
`not_containing`::
(Optional, query object) Query used to return intervals that do *not* contain
an interval from the `filter` rule.

`not_overlapping`::
(Optional, query object) Query used to return intervals that do *not* overlap
with an interval from the `filter` rule.

`overlapping`::
(Optional, query object) Query used to return intervals that overlap with an
interval from the `filter` rule.

`script`::
(Optional, <>) Script used to return matching documents. This script must
return a boolean value, `true` or `false`. See <> for an example.

[[intervals-query-note]]
==== Notes

[[interval-filter-rule-ex]]
===== Filter example

The following search includes a `filter` rule. It returns documents that have
the words `hot` and `porridge` within 10 positions of each other, without the
word `salty` in between:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "max_gaps" : 10,
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "salty"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

[[interval-script-filter]]
===== Script filters

You can use a script to filter intervals based on their start position, end
position, and internal gap count. The following `filter` script uses the
`interval` variable with the `start`, `end`, and `gaps` methods:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "filter" : {
            "script" : {
              "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

[[interval-minimization]]
===== Minimization

The intervals query always minimizes intervals, to ensure that queries can
run in linear time. This can sometimes cause surprising results, particularly
when using `max_gaps` restrictions or filters.
For example, take the following query, searching for `salty` contained
within the phrase `hot porridge`:

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "salty",
          "filter" : {
            "contained_by" : {
              "match" : {
                "query" : "hot porridge"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

This query does *not* match a document containing the phrase
`hot porridge is salty porridge`, because the intervals returned by the match
query for `hot porridge` only cover the initial two terms in this document, and
these do not overlap the intervals covering `salty`.

[[interval-max_gaps-all-rule]]
===== `max_gaps` in ordered and unordered `all_of` rules

The following `intervals` search returns documents containing `my favorite food`
without any gap, followed by `cold porridge` with at most 4 tokens between
`cold` and `porridge`. When the two inner intervals are combined in the outer
`all_of` rule, they must have at most 1 gap between each other.

Because the `all_of` rule has `ordered` set to `true`, the inner intervals are
expected to appear in the provided order. Thus, this search would match a
`my_text` value of `my favorite food is cold porridge` but not
`when it's cold my favorite food is porridge`.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true, <1>
          "max_gaps": 1,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "match" : {
                "query" : "cold porridge",
                "max_gaps" : 4,
                "ordered" : true
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
<1> The `ordered` parameter is set to `true`, so intervals must appear in the
order specified.

Below is the same query, but with `ordered` set to `false`. This means that
intervals can appear in any order and can even overlap with each other.
Thus, this search would match a `my_text` value of
`my favorite food is cold porridge`, as well as
`when it's cold my favorite food is porridge`.
In `when it's cold my favorite food is porridge`, the `cold ... porridge`
interval overlaps with the `my favorite food` interval.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : false, <1>
          "max_gaps": 1,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "match" : {
                "query" : "cold porridge",
                "max_gaps" : 4,
                "ordered" : true
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
<1> The `ordered` parameter is set to `false`, so intervals can appear in any
order and can even overlap with each other.
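To make the outer gap count concrete, here is a sketch of the ordered query
with the outer `max_gaps` lowered from `1` to `0`. It assumes `my_text` is
analyzed with the standard analyzer, so every word, including `is`, occupies
one position. Under that assumption, `my favorite food is cold porridge` would
likely no longer match: `is` leaves one position between the
`my favorite food` interval and the `cold porridge` interval. A value such as
`my favorite food cold porridge`, with no position between the two inner
intervals, would still match.

[source,console]
--------------------------------------------------
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "max_gaps": 0,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "match" : {
                "query" : "cold porridge",
                "max_gaps" : 4,
                "ordered" : true
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------

The inner `max_gaps` values are unchanged. Only the allowance between the
two inner intervals is tightened, which shows that the outer and inner limits
are checked separately.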