diff --git a/docs/plugins/mapper-size.asciidoc b/docs/plugins/mapper-size.asciidoc
index 604cbe10859e..aa1d4f491e03 100644
--- a/docs/plugins/mapper-size.asciidoc
+++ b/docs/plugins/mapper-size.asciidoc
@@ -86,7 +86,7 @@ GET my-index-000001/_search
 {ref}/search-request-body.html#request-body-search-script-fields[script field]
 to return the `_size` field in the search response.
 <5> Uses a
-{ref}/run-a-search.html#docvalue-fields[doc value
+{ref}/search-your-data.html#docvalue-fields[doc value
 field] to return the `_size` field in the search response. Doc value fields are
 useful if
 {ref}/modules-scripting-security.html#allowed-script-types-setting[inline
diff --git a/docs/reference/index.asciidoc b/docs/reference/index.asciidoc
index 12d612393bc6..27661caff422 100644
--- a/docs/reference/index.asciidoc
+++ b/docs/reference/index.asciidoc
@@ -24,7 +24,7 @@ include::indices/index-templates.asciidoc[]

 include::data-streams/data-streams.asciidoc[]

-include::search/index.asciidoc[]
+include::search/search-your-data.asciidoc[]

 include::query-dsl.asciidoc[]

diff --git a/docs/reference/redirects.asciidoc b/docs/reference/redirects.asciidoc
index 29103fc8e3c2..1330d0c35fc1 100644
--- a/docs/reference/redirects.asciidoc
+++ b/docs/reference/redirects.asciidoc
@@ -956,6 +956,16 @@ The `xpack.sql.enabled` setting has been deprecated. SQL access is always enable

 See <>.

+[role="exclude",id="run-a-search"]
+=== Run a search
+
+See <>.
+
+[role="exclude",id="how-highlighters-work-internally"]
+=== How highlighters work internally
+
+See <>.
+
 ////
 [role="exclude",id="search-request-body"]
 === Request body search
@@ -983,14 +993,17 @@ See <>.

 [role="exclude",id="request-body-search-highlighting"]
 ==== Highlighting
+
 See <>.

 [role="exclude",id="highlighter-internal-work"]
 ==== How highlighters work internally
-See <>.
+
+See <>.

 [role="exclude",id="request-body-search-sort"]
 ==== Sort
+
 See <>.
 [role="exclude",id="request-body-search-source-filtering"]
diff --git a/docs/reference/search/index.asciidoc b/docs/reference/search/index.asciidoc
deleted file mode 100644
index 74a75525f7a2..000000000000
--- a/docs/reference/search/index.asciidoc
+++ /dev/null
@@ -1,41 +0,0 @@
-[[search-your-data]]
-= Search your data
-
-[partintro]
---
-
-[[search-query]]
-A _search query_, or _query_, is a request for information about data in
-{es} data streams or indices.
-
-You can think of a query as a question, written in a way {es} understands.
-Depending on your data, you can use a query to get answers to questions like:
-
-* What pages on my website contain a specific word or phrase?
-* What processes on my server take longer than 500 milliseconds to respond?
-* What users on my network ran `regsvr32.exe` within the last week?
-* How many of my products have a price greater than $20?
-
-A _search_ consists of one or more queries that are combined and sent to {es}.
-Documents that match a search's queries are returned in the _hits_, or
-_search results_, of the response.
-
-A search may also contain additional information used to better process its
-queries. For example, a search may be limited to a specific index or only return
-a specific number of results.
- -[discrete] -[[search-toc]] -=== In this section - -* <> -* <> -* <> -* <> - --- - -include::run-a-search.asciidoc[] -include::{es-repo-dir}/search/near-real-time.asciidoc[] -include::{es-repo-dir}/async-search.asciidoc[] -include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[] diff --git a/docs/reference/search/request/collapse.asciidoc b/docs/reference/search/request/collapse.asciidoc index 596a4582f88d..9780116f61ab 100644 --- a/docs/reference/search/request/collapse.asciidoc +++ b/docs/reference/search/request/collapse.asciidoc @@ -1,5 +1,5 @@ [[collapse-search-results]] -=== Collapse search results +== Collapse search results You can use the `collapse` parameter to collapse search results based on field values. The collapsing is done by selecting only the top sorted @@ -35,8 +35,9 @@ The field used for collapsing must be a single valued <> or NOTE: The collapsing is applied to the top hits only and does not affect aggregations. +[discrete] [[expand-collapse-results]] -==== Expand collapse results +=== Expand collapse results It is also possible to expand each collapsed top hits with the `inner_hits` option. @@ -118,8 +119,9 @@ The default is based on the number of data nodes and the default search thread p WARNING: `collapse` cannot be used in conjunction with <>, <> or <>. +[discrete] [[second-level-of-collapsing]] -==== Second level of collapsing +=== Second level of collapsing Second level of collapsing is also supported and is applied to `inner_hits`. 
 For example, the following request finds the top scored tweets for
diff --git a/docs/reference/search/request/highlighters-internal.asciidoc b/docs/reference/search/request/highlighters-internal.asciidoc
index 494b0f93b7cd..c91f4a4cb6e8 100644
--- a/docs/reference/search/request/highlighters-internal.asciidoc
+++ b/docs/reference/search/request/highlighters-internal.asciidoc
@@ -1,5 +1,6 @@
-[[how-highlighters-work-internally]]
-=== How highlighters work internally
+[discrete]
+[[how-es-highlighters-work-internally]]
+== How highlighters work internally

 Given a query and a text (the content of a document field), the goal of a
 highlighter is to find the best text fragments for the query, and highlight
@@ -10,7 +11,8 @@ address several questions:
 - How to find the best fragments among all fragments?
 - How to highlight the query terms in a fragment?

-==== How to break a text into fragments?
+[discrete]
+=== How to break a text into fragments?
 Relevant settings: `fragment_size`, `fragmenter`, `type` of highlighter,
 `boundary_chars`, `boundary_max_scan`, `boundary_scanner`, `boundary_scanner_locale`.

@@ -27,8 +29,8 @@ Unified or FVH highlighters do a better job of breaking up a text into
 fragments by utilizing Java's `BreakIterator`. This ensures that a fragment
 is a valid sentence as long as `fragment_size` allows for this.

-
-==== How to find the best fragments?
+[discrete]
+=== How to find the best fragments?
 Relevant settings: `number_of_fragments`.

 To find the best, most relevant, fragments, a highlighter needs to score
@@ -60,8 +62,8 @@ if they are available. Otherwise, similar to Plain Highlighter, it has to create
 an in-memory index from the text. Unified highlighter uses the BM25 scoring model
 to score fragments.

-
-==== How to highlight the query terms in a fragment?
+[discrete]
+=== How to highlight the query terms in a fragment?
 Relevant settings: `pre-tags`, `post-tags`.
 The goal is to highlight only those terms that participated in generating the 'hit' on the document.
@@ -77,8 +79,8 @@ fragments in some raw form, and then populate them with actual text.

 A highlighter uses `pre-tags`, `post-tags` to encode highlighted terms.

-
-==== An example of the work of the unified highlighter
+[discrete]
+=== An example of the work of the unified highlighter

 Let's look in more details how unified highlighter works.

diff --git a/docs/reference/search/request/highlighting.asciidoc b/docs/reference/search/request/highlighting.asciidoc
index b15e6c85fc70..5b49c3f04396 100644
--- a/docs/reference/search/request/highlighting.asciidoc
+++ b/docs/reference/search/request/highlighting.asciidoc
@@ -1,5 +1,5 @@
 [[highlighting]]
-=== Highlighting
+== Highlighting

 Highlighters enable you to get highlighted snippets from one or more fields
 in your search results so you can show users where the query matches are.
@@ -40,16 +40,18 @@ GET /_search
 highlighter). You can specify the highlighter `type` you want to use
 for each field.

+[discrete]
 [[unified-highlighter]]
-==== Unified highlighter
+=== Unified highlighter
 The `unified` highlighter uses the Lucene Unified Highlighter. This
 highlighter breaks the text into sentences and uses the BM25 algorithm to score
 individual sentences as if they were documents in the corpus. It also supports
 accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the
 default highlighter.

+[discrete]
 [[plain-highlighter]]
-==== Plain highlighter
+=== Plain highlighter
 The `plain` highlighter uses the standard Lucene highlighter. It attempts to
 reflect the query matching logic in terms of understanding word importance and
 any word positioning criteria in phrase queries.
@@ -63,8 +65,9 @@ This is repeated for every field and every document that needs to be highlighted
 If you want to highlight a lot of fields in a lot of documents with complex queries,
 we recommend using the `unified` highlighter on `postings` or `term_vector` fields.

+[discrete]
 [[fast-vector-highlighter]]
-==== Fast vector highlighter
+=== Fast vector highlighter
 The `fvh` highlighter uses the Lucene Fast Vector highlighter.
 This highlighter can be used on fields with `term_vector` set to
 `with_positions_offsets` in the mapping. The fast vector highlighter:
@@ -82,8 +85,9 @@ This highlighter can be used on fields with `term_vector` set to
 The `fvh` highlighter does not support span queries. If you need support for
 span queries, try an alternative highlighter, such as the `unified` highlighter.

+[discrete]
 [[offsets-strategy]]
-==== Offsets strategy
+=== Offsets strategy
 To create meaningful search snippets from the terms being queried,
 the highlighter needs to know the start and end character offsets of each word
 in the original text. These offsets can be obtained from:
@@ -115,8 +119,9 @@ To protect against this, the maximum number of text characters that will be anal
 limited to 1000000. This default limit can be changed
 for a particular index with the index setting `index.highlight.max_analyzed_offset`.

+[discrete]
 [[highlighting-settings]]
-==== Highlighting settings
+=== Highlighting settings

 Highlighting settings can be set on a global level and overridden at
 the field level.
@@ -215,7 +220,7 @@ order:: Sorts highlighted fragments by score when set to `score`. By default,
 fragments will be output in the order they appear in the field (order: `none`).
 Setting this option to `score` will output the most relevant fragments first.
 Each highlighter applies its own logic to compute relevancy scores. See
-the document <>
+the document <>
 for more details how different highlighters find the best fragments.
 phrase_limit:: Controls the number of matching phrases in a document that are
@@ -253,8 +258,9 @@ schema defines the following `pre_tags` and defines `post_tags` as
 type:: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to
 `unified`.

+[discrete]
 [[highlighting-examples]]
-==== Highlighting examples
+=== Highlighting examples

 * <>
 * <>
@@ -270,7 +276,7 @@ type:: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to

 [[override-global-settings]]
 [discrete]
-=== Override global settings
+== Override global settings

 You can specify highlighter settings globally and selectively override them for
 individual fields.
@@ -298,7 +304,7 @@ GET /_search

 [discrete]
 [[specify-highlight-query]]
-=== Specify a highlight query
+== Specify a highlight query

 You can specify a `highlight_query` to take additional information into account
 when highlighting. For example, the following query includes both the search
@@ -367,7 +373,7 @@ GET /_search

 [discrete]
 [[set-highlighter-type]]
-=== Set highlighter type
+== Set highlighter type

 The `type` field allows to force a specific highlighter type.
 The allowed values are: `unified`, `plain` and `fvh`.
@@ -391,7 +397,7 @@ GET /_search

 [[configure-tags]]
 [discrete]
-=== Configure highlighting tags
+== Configure highlighting tags

 By default, the highlighting will wrap highlighted text in `` and
 ``. This can be controlled by setting `pre_tags` and `post_tags`,
@@ -457,7 +463,7 @@ GET /_search

 [discrete]
 [[highlight-source]]
-=== Highlight on source
+== Highlight on source

 Forces the highlighting to highlight fields based on the source even if fields
 are stored separately. Defaults to `false`.
@@ -481,7 +487,7 @@ GET /_search

 [[highlight-all]]
 [discrete]
-=== Highlight in all fields
+== Highlight in all fields

 By default, only fields that contains a query match are highlighted. Set
 `require_field_match` to `false` to highlight all fields.
@@ -505,7 +511,7 @@ GET /_search

 [[matched-fields]]
 [discrete]
-=== Combine matches on multiple fields
+== Combine matches on multiple fields

 WARNING: This is only supported by the `fvh` highlighter

@@ -639,7 +645,7 @@ to

 [[explicit-field-order]]
 [discrete]
-=== Explicitly order highlighted fields
+== Explicitly order highlighted fields
 Elasticsearch highlights the fields in the order that they are sent, but per the
 JSON spec, objects are unordered. If you need to be explicit about the order
 in which fields are highlighted specify the `fields` as an array:
@@ -666,7 +672,7 @@ fields are highlighted but a plugin might.

 [discrete]
 [[control-highlighted-frags]]
-=== Control highlighted fragments
+== Control highlighted fragments

 Each field highlighted can control the size of the highlighted fragment
 in characters (defaults to `100`), and the maximum number of fragments
@@ -763,7 +769,7 @@ GET /_search

 [discrete]
 [[highlight-postings-list]]
-=== Highlight using the postings list
+== Highlight using the postings list

 Here is an example of setting the `comment` field in the index mapping to
 allow for highlighting using the postings:
@@ -803,7 +809,7 @@ PUT /example

 [discrete]
 [[specify-fragmenter]]
-=== Specify a fragmenter for the plain highlighter
+== Specify a fragmenter for the plain highlighter

 When using the `plain` highlighter, you can choose between the `simple` and
 `span` fragmenters:
diff --git a/docs/reference/search/request/sort.asciidoc b/docs/reference/search/request/sort.asciidoc
index cb790ccf958d..9a3043b33f07 100644
--- a/docs/reference/search/request/sort.asciidoc
+++ b/docs/reference/search/request/sort.asciidoc
@@ -1,5 +1,5 @@
 [[sort-search-results]]
-=== Sort search results
+== Sort search results

 Allows you to add one or more sorts on specific fields. Each sort can be
 reversed as well. The sort is defined on a per field level, with special
@@ -48,12 +48,14 @@ NOTE: `_doc` has no real use-case besides being the most efficient sort order.
 So if you don't care about the order in which documents are returned, then you
 should sort by `_doc`. This especially helps when <>.

-==== Sort Values
+[discrete]
+=== Sort Values

 The sort values for each document returned are also returned as part of the
 response.

-==== Sort Order
+[discrete]
+=== Sort Order

 The `order` option can have the following values:

@@ -64,7 +66,8 @@ The `order` option can have the following values:
 The order defaults to `desc` when sorting on the `_score`, and defaults
 to `asc` when sorting on anything else.

-==== Sort mode option
+[discrete]
+=== Sort mode option

 Elasticsearch supports sorting by array or multi-valued fields. The `mode` option
 controls what array value is picked for sorting the document it belongs
@@ -84,7 +87,8 @@ The default sort mode in the ascending sort order is `min` -- the lowest value
 is picked. The default sort mode in the descending order is `max` --
 the highest value is picked.

-===== Sort mode example usage
+[discrete]
+==== Sort mode example usage

 In the example below the field price has multiple prices per document.
 In this case the result hits will be sorted by price ascending based on
@@ -109,7 +113,8 @@ POST /_search
 }
 --------------------------------------------------

-==== Sorting numeric fields
+[discrete]
+=== Sorting numeric fields

 For numeric fields it is also possible to cast the values from one type
 to another using the `numeric_type` option.
@@ -226,8 +231,9 @@ POST /index_long,index_double/_search
 To avoid overflow, the conversion to `date_nanos` cannot be applied on dates before
 1970 and after 2262 as nanoseconds are represented as longs.

+[discrete]
 [[nested-sorting]]
-==== Sorting within nested objects.
+=== Sorting within nested objects.

 Elasticsearch also supports sorting by fields that are inside one or more nested
 objects.
 The sorting by nested
@@ -253,7 +259,8 @@ field support has a `nested` sort option with the following properties:
 NOTE: Elasticsearch will throw an error if a nested field is defined in a sort without
 a `nested` context.

-===== Nested sorting examples
+[discrete]
+==== Nested sorting examples

 In the below example `offer` is a field of type `nested`.
 The nested `path` needs to be specified; otherwise, Elasticsearch doesn't know on what nested level sort values need to be captured.
@@ -331,7 +338,8 @@ POST /_search
 Nested sorting is also supported when sorting by scripts and sorting by geo
 distance.

-==== Missing Values
+[discrete]
+=== Missing Values

 The `missing` parameter specifies how docs which are missing
 the sort field should be treated: The `missing` value can be
@@ -357,7 +365,8 @@ GET /_search
 NOTE: If a nested inner object doesn't match with
 the `nested.filter` then a missing value is used.

-==== Ignoring Unmapped Fields
+[discrete]
+=== Ignoring Unmapped Fields

 By default, the search request will fail if there is no mapping
 associated with a field. The `unmapped_type` option allows you to ignore
@@ -382,8 +391,9 @@ If any of the indices that are queried doesn't have a mapping for `price`
 then Elasticsearch will handle it as if there was a mapping of type
 `long`, with all documents in this index having no value for this field.

+[discrete]
 [[geo-sorting]]
-==== Geo Distance Sorting
+=== Geo Distance Sorting

 Allow to sort by `_geo_distance`. Here is an example, assuming `pin.location` is a field of type `geo_point`:

@@ -438,7 +448,8 @@ have values for the field that is used for distance computation.
 The following formats are supported in providing the coordinates:

-===== Lat Lon as Properties
+[discrete]
+==== Lat Lon as Properties

 [source,console]
 --------------------------------------------------
@@ -462,7 +473,8 @@ GET /_search
 }
 --------------------------------------------------

-===== Lat Lon as String
+[discrete]
+==== Lat Lon as String

 Format in `lat,lon`.

@@ -485,7 +497,8 @@ GET /_search
 }
 --------------------------------------------------

-===== Geohash
+[discrete]
+==== Geohash

 [source,console]
 --------------------------------------------------
@@ -506,7 +519,8 @@ GET /_search
 }
 --------------------------------------------------

-===== Lat Lon as Array
+[discrete]
+==== Lat Lon as Array

 Format in `[lon, lat]`, note, the order of lon/lat here in order to
 conform with http://geojson.org/[GeoJSON].
@@ -530,8 +544,8 @@ GET /_search
 }
 --------------------------------------------------

-
-==== Multiple reference points
+[discrete]
+=== Multiple reference points

 Multiple geo points can be passed as an array containing any `geo_point` format, for example

@@ -559,8 +573,8 @@ and so forth.

 The final distance for a document will then be `min`/`max`/`avg` (defined via `mode`) distance of all points contained in the document to all points given in the sort request.

-
-==== Script Based Sorting
+[discrete]
+=== Script Based Sorting

 Allow to sort based on custom scripts, here is an example:

@@ -587,8 +601,8 @@ GET /_search
 }
 --------------------------------------------------

-
-==== Track Scores
+[discrete]
+=== Track Scores

 When sorting on a field, scores are not computed. By setting
 `track_scores` to true, scores will still be computed and tracked.
@@ -609,7 +623,8 @@ GET /_search
 }
 --------------------------------------------------

-==== Memory Considerations
+[discrete]
+=== Memory Considerations

 When sorting, the relevant sorted field values are loaded into memory.
 This means that per shard, there should be enough memory to contain
diff --git a/docs/reference/search/run-a-search.asciidoc b/docs/reference/search/search-your-data.asciidoc
similarity index 85%
rename from docs/reference/search/run-a-search.asciidoc
rename to docs/reference/search/search-your-data.asciidoc
index e8f2a92db4dd..0babfb0c1491 100644
--- a/docs/reference/search/run-a-search.asciidoc
+++ b/docs/reference/search/search-your-data.asciidoc
@@ -1,11 +1,35 @@
-[[run-a-search]]
+[[search-your-data]]
+= Search your data
+
+[[search-query]]
+A _search query_, or _query_, is a request for information about data in
+{es} data streams or indices.
+
+You can think of a query as a question, written in a way {es} understands.
+Depending on your data, you can use a query to get answers to questions like:
+
+* What processes on my server take longer than 500 milliseconds to respond?
+* What users on my network ran `regsvr32.exe` within the last week?
+* How many of my products have a price greater than $20?
+* What pages on my website contain a specific word or phrase?
+
+A _search_ consists of one or more queries that are combined and sent to {es}.
+Documents that match a search's queries are returned in the _hits_, or
+_search results_, of the response.
+
+A search may also contain additional information used to better process its
+queries. For example, a search may be limited to a specific index or only return
+a specific number of results.
+
+[discrete]
+[[run-an-es-search]]
 == Run a search

 You can use the <> to search data stored in
 {es} data streams or indices.

 The API can run two types of searches, depending on how you provide
-<>:
+queries:

 <>:: Queries are provided through a query parameter.
 URI searches tend to be
@@ -267,11 +291,10 @@ GET /*/_search
 ----

 include::request/from-size.asciidoc[]
-
 include::search-fields.asciidoc[]
-
 include::request/collapse.asciidoc[]
-
 include::request/highlighting.asciidoc[]
-
 include::request/sort.asciidoc[]
+include::{es-repo-dir}/async-search.asciidoc[]
+include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[]
+include::{es-repo-dir}/search/near-real-time.asciidoc[]