[[query-dsl-semantic-query]] === Semantic query ++++ Semantic ++++ beta[] The `semantic` query type enables you to perform <> on data stored in a <> field. [discrete] [[semantic-query-example]] ==== Example request [source,console] ------------------------------------------------------------ GET my-index-000001/_search { "query": { "semantic": { "field": "inference_field", "query": "Best surfing places" } } } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] [discrete] [[semantic-query-params]] ==== Top-level parameters for `semantic` `field`:: (Required, string) The `semantic_text` field to perform the query on. `query`:: (Required, string) The query text to be searched for on the field. `inner_hits`:: (Optional, object) Retrieves the specific passages that match the query. See <> for more information. + .Properties of `inner_hits` [%collapsible%open] ==== `from`:: (Optional, integer) The offset from the first matching passage to fetch. Used to paginate through the passages. Defaults to `0`. `size`:: (Optional, integer) The maximum number of matching passages to return. Defaults to `3`. ==== Refer to <> to learn more about semantic search using `semantic_text` and `semantic` query. [discrete] [[semantic-query-passage-ranking]] ==== Passage ranking with the `semantic` query The `inner_hits` parameter can be used for _passage ranking_, which allows you to determine which passages in the document best match the query. For example, if you have a document that covers varying topics: [source,console] ------------------------------------------------------------ POST my-index/_doc/lake_tahoe { "inference_field": [ "Lake Tahoe is the largest alpine lake in North America", "When hiking in the area, please be on alert for bears" ] } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] You can use passage ranking to find the passage that best matches your query: [source,console] ------------------------------------------------------------ GET my-index/_search { "query": { "semantic": { "field": "inference_field", "query": "mountain lake", "inner_hits": { } } } } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] [source,console-result] ------------------------------------------------------------ { "took": 67, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 10.844536, "hits": [ { "_index": "my-index", "_id": "lake_tahoe", "_score": 10.844536, "_source": { ... }, "inner_hits": { <1> "inference_field": { "hits": { "total": { "value": 2, "relation": "eq" }, "max_score": 10.844536, "hits": [ { "_index": "my-index", "_id": "lake_tahoe", "_nested": { "field": "inference_field.inference.chunks", "offset": 0 }, "_score": 10.844536, "_source": { "text": "Lake Tahoe is the largest alpine lake in North America" } }, { "_index": "my-index", "_id": "lake_tahoe", "_nested": { "field": "inference_field.inference.chunks", "offset": 1 }, "_score": 3.2726858, "_source": { "text": "When hiking in the area, please be on alert for bears" } } ] } } } } ] } } ------------------------------------------------------------ <1> Ranked passages will be returned using the <>, with `` set to the `semantic_text` field name. By default, the top three matching passages will be returned. You can use the `size` parameter to control the number of passages returned and the `from` parameter to page through the matching passages: [source,console] ------------------------------------------------------------ GET my-index/_search { "query": { "semantic": { "field": "inference_field", "query": "mountain lake", "inner_hits": { "from": 1, "size": 1 } } } } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] [source,console-result] ------------------------------------------------------------ { "took": 42, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 10.844536, "hits": [ { "_index": "my-index", "_id": "lake_tahoe", "_score": 10.844536, "_source": { ... }, "inner_hits": { "inference_field": { "hits": { "total": { "value": 2, "relation": "eq" }, "max_score": 10.844536, "hits": [ { "_index": "my-index", "_id": "lake_tahoe", "_nested": { "field": "inference_field.inference.chunks", "offset": 1 }, "_score": 3.2726858, "_source": { "text": "When hiking in the area, please be on alert for bears" } } ] } } } } ] } } ------------------------------------------------------------ [discrete] [[hybrid-search-semantic]] ==== Hybrid search with the `semantic` query The `semantic` query can be used as a part of a hybrid search where the `semantic` query is combined with lexical queries. For example, the query below finds documents with the `title` field matching "mountain lake", and combines them with results from a semantic search on the field `title_semantic`, that is a `semantic_text` field. The combined documents are then scored, and the top 3 top scored documents are returned. [source,console] ------------------------------------------------------------ POST my-index/_search { "size" : 3, "query": { "bool": { "should": [ { "match": { "title": { "query": "mountain lake", "boost": 1 } } }, { "semantic": { "field": "title_semantic", "query": "mountain lake", "boost": 2 } } ] } } } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] You can also use semantic_text as part of <> to make ranking relevant results easier: [source,console] ------------------------------------------------------------ GET my-index/_search { "retriever": { "rrf": { "retrievers": [ { "standard": { "query": { "term": { "text": "shoes" } } } }, { "standard": { "query": { "semantic": { "field": "semantic_field", "query": "shoes" } } } } ], "rank_window_size": 50, "rank_constant": 20 } } } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] [discrete] [[advanced-search]] ==== Advanced search on `semantic_text` fields The `semantic` query uses default settings for searching on `semantic_text` fields for ease of use. If you want to fine-tune a search on a `semantic_text` field, you need to know the task type used by the `inference_id` configured in `semantic_text`. You can find the task type using the <>, and check the `task_type` associated with the {infer} service. Depending on the `task_type`, use either the <> or the <> query for greater flexibility and customization. NOTE: While it is possible to use the `sparse_vector` query or the `knn` query on a `semantic_text` field, it is not supported to use the `semantic_query` on a `sparse_vector` or `dense_vector` field type. [discrete] [[search-sparse-inference]] ===== Search with `sparse_embedding` inference When the {infer} endpoint uses a `sparse_embedding` model, you can use a <> on a <> field in the following way: [source,console] ------------------------------------------------------------ GET test-index/_search { "query": { "nested": { "path": "inference_field.inference.chunks", "query": { "sparse_vector": { "field": "inference_field.inference.chunks.embeddings", "inference_id": "my-inference-id", "query": "mountain lake" } } } } } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] You can customize the `sparse_vector` query to include specific settings, like <>. [discrete] [[search-text-inferece]] ===== Search with `text_embedding` inference When the {infer} endpoint uses a `text_embedding` model, you can use a <> on a `semantic_text` field in the following way: [source,console] ------------------------------------------------------------ GET test-index/_search { "query": { "nested": { "path": "inference_field.inference.chunks", "query": { "knn": { "field": "inference_field.inference.chunks.embeddings", "query_vector_builder": { "text_embedding": { "model_id": "my_inference_id", "model_text": "mountain lake" } } } } } } } ------------------------------------------------------------ // TEST[skip: Requires inference endpoints] You can customize the `knn` query to include specific settings, like `num_candidates` and `k`.