mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
Adds inner hits support to the semantic query through a restricted inner_hits parameter, which exposes from and size from the inner_hits options
395 lines
12 KiB
Text
395 lines
12 KiB
Text
[[query-dsl-semantic-query]]
|
|
=== Semantic query
|
|
++++
|
|
<titleabbrev>Semantic</titleabbrev>
|
|
++++
|
|
|
|
beta[]
|
|
|
|
The `semantic` query type enables you to perform <<semantic-search,semantic search>> on data stored in a <<semantic-text,`semantic_text`>> field.
|
|
|
|
|
|
[discrete]
|
|
[[semantic-query-example]]
|
|
==== Example request
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
GET my-index-000001/_search
|
|
{
|
|
"query": {
|
|
"semantic": {
|
|
"field": "inference_field",
|
|
"query": "Best surfing places"
|
|
}
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
|
|
[discrete]
|
|
[[semantic-query-params]]
|
|
==== Top-level parameters for `semantic`
|
|
|
|
`field`::
|
|
(Required, string)
|
|
The `semantic_text` field to perform the query on.
|
|
|
|
`query`::
|
|
(Required, string)
|
|
The query text to be searched for on the field.
|
|
|
|
`inner_hits`::
|
|
(Optional, object)
|
|
Retrieves the specific passages that match the query.
|
|
See <<semantic-query-passage-ranking, passage ranking with the `semantic` query>> for more information.
|
|
+
|
|
.Properties of `inner_hits`
|
|
[%collapsible%open]
|
|
====
|
|
`from`::
|
|
(Optional, integer)
|
|
The offset from the first matching passage to fetch.
|
|
Used to paginate through the passages.
|
|
Defaults to `0`.
|
|
|
|
`size`::
|
|
(Optional, integer)
|
|
The maximum number of matching passages to return.
|
|
Defaults to `3`.
|
|
====
|
|
|
|
Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and `semantic` query.
|
|
|
|
[discrete]
|
|
[[semantic-query-passage-ranking]]
|
|
==== Passage ranking with the `semantic` query
|
|
The `inner_hits` parameter can be used for _passage ranking_, which allows you to determine which passages in the document best match the query.
|
|
For example, if you have a document that covers varying topics:
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
POST my-index/_doc/lake_tahoe
|
|
{
|
|
"inference_field": [
|
|
"Lake Tahoe is the largest alpine lake in North America",
|
|
"When hiking in the area, please be on alert for bears"
|
|
]
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
You can use passage ranking to find the passage that best matches your query:
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
GET my-index/_search
|
|
{
|
|
"query": {
|
|
"semantic": {
|
|
"field": "inference_field",
|
|
"query": "mountain lake",
|
|
"inner_hits": { }
|
|
}
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
[source,console-result]
|
|
------------------------------------------------------------
|
|
{
|
|
"took": 67,
|
|
"timed_out": false,
|
|
"_shards": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"skipped": 0,
|
|
"failed": 0
|
|
},
|
|
"hits": {
|
|
"total": {
|
|
"value": 1,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 10.844536,
|
|
"hits": [
|
|
{
|
|
"_index": "my-index",
|
|
"_id": "lake_tahoe",
|
|
"_score": 10.844536,
|
|
"_source": {
|
|
...
|
|
},
|
|
"inner_hits": { <1>
|
|
"inference_field": {
|
|
"hits": {
|
|
"total": {
|
|
"value": 2,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 10.844536,
|
|
"hits": [
|
|
{
|
|
"_index": "my-index",
|
|
"_id": "lake_tahoe",
|
|
"_nested": {
|
|
"field": "inference_field.inference.chunks",
|
|
"offset": 0
|
|
},
|
|
"_score": 10.844536,
|
|
"_source": {
|
|
"text": "Lake Tahoe is the largest alpine lake in North America"
|
|
}
|
|
},
|
|
{
|
|
"_index": "my-index",
|
|
"_id": "lake_tahoe",
|
|
"_nested": {
|
|
"field": "inference_field.inference.chunks",
|
|
"offset": 1
|
|
},
|
|
"_score": 3.2726858,
|
|
"_source": {
|
|
"text": "When hiking in the area, please be on alert for bears"
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
<1> Ranked passages will be returned using the <<inner-hits,`inner_hits` response format>>, with `<inner_hits_name>` set to the `semantic_text` field name.
|
|
|
|
By default, the top three matching passages will be returned.
|
|
You can use the `size` parameter to control the number of passages returned and the `from` parameter to page through the matching passages:
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
GET my-index/_search
|
|
{
|
|
"query": {
|
|
"semantic": {
|
|
"field": "inference_field",
|
|
"query": "mountain lake",
|
|
"inner_hits": {
|
|
"from": 1,
|
|
"size": 1
|
|
}
|
|
}
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
[source,console-result]
|
|
------------------------------------------------------------
|
|
{
|
|
"took": 42,
|
|
"timed_out": false,
|
|
"_shards": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"skipped": 0,
|
|
"failed": 0
|
|
},
|
|
"hits": {
|
|
"total": {
|
|
"value": 1,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 10.844536,
|
|
"hits": [
|
|
{
|
|
"_index": "my-index",
|
|
"_id": "lake_tahoe",
|
|
"_score": 10.844536,
|
|
"_source": {
|
|
...
|
|
},
|
|
"inner_hits": {
|
|
"inference_field": {
|
|
"hits": {
|
|
"total": {
|
|
"value": 2,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 10.844536,
|
|
"hits": [
|
|
{
|
|
"_index": "my-index",
|
|
"_id": "lake_tahoe",
|
|
"_nested": {
|
|
"field": "inference_field.inference.chunks",
|
|
"offset": 1
|
|
},
|
|
"_score": 3.2726858,
|
|
"_source": {
|
|
"text": "When hiking in the area, please be on alert for bears"
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
|
|
[discrete]
|
|
[[hybrid-search-semantic]]
|
|
==== Hybrid search with the `semantic` query
|
|
|
|
The `semantic` query can be used as a part of a hybrid search where the `semantic` query is combined with lexical queries.
|
|
For example, the query below finds documents with the `title` field matching "mountain lake", and combines them with results from a semantic search on the field `title_semantic`, that is a `semantic_text` field.
|
|
The combined documents are then scored, and the top 3 top scored documents are returned.
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
POST my-index/_search
|
|
{
|
|
"size" : 3,
|
|
"query": {
|
|
"bool": {
|
|
"should": [
|
|
{
|
|
"match": {
|
|
"title": {
|
|
"query": "mountain lake",
|
|
"boost": 1
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"semantic": {
|
|
"field": "title_semantic",
|
|
"query": "mountain lake",
|
|
"boost": 2
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
You can also use semantic_text as part of <<rrf,Reciprocal Rank Fusion>> to make ranking relevant results easier:
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
GET my-index/_search
|
|
{
|
|
"retriever": {
|
|
"rrf": {
|
|
"retrievers": [
|
|
{
|
|
"standard": {
|
|
"query": {
|
|
"term": {
|
|
"text": "shoes"
|
|
}
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"standard": {
|
|
"query": {
|
|
"semantic": {
|
|
"field": "semantic_field",
|
|
"query": "shoes"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
],
|
|
"rank_window_size": 50,
|
|
"rank_constant": 20
|
|
}
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
|
|
[discrete]
|
|
[[advanced-search]]
|
|
==== Advanced search on `semantic_text` fields
|
|
|
|
The `semantic` query uses default settings for searching on `semantic_text` fields for ease of use.
|
|
If you want to fine-tune a search on a `semantic_text` field, you need to know the task type used by the `inference_id` configured in `semantic_text`.
|
|
You can find the task type using the <<get-inference-api>>, and check the `task_type` associated with the {infer} service.
|
|
Depending on the `task_type`, use either the <<query-dsl-sparse-vector-query,`sparse_vector`>> or the <<query-dsl-knn-query,`knn`>> query for greater flexibility and customization.
|
|
|
|
NOTE: While it is possible to use the `sparse_vector` query or the `knn` query
|
|
on a `semantic_text` field, it is not supported to use the `semantic_query` on a
|
|
`sparse_vector` or `dense_vector` field type.
|
|
|
|
|
|
[discrete]
|
|
[[search-sparse-inference]]
|
|
===== Search with `sparse_embedding` inference
|
|
|
|
When the {infer} endpoint uses a `sparse_embedding` model, you can use a <<query-dsl-sparse-vector-query,`sparse_vector` query>> on a <<semantic-text,`semantic_text`>> field in the following way:
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
GET test-index/_search
|
|
{
|
|
"query": {
|
|
"nested": {
|
|
"path": "inference_field.inference.chunks",
|
|
"query": {
|
|
"sparse_vector": {
|
|
"field": "inference_field.inference.chunks.embeddings",
|
|
"inference_id": "my-inference-id",
|
|
"query": "mountain lake"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
You can customize the `sparse_vector` query to include specific settings, like <<sparse-vector-query-with-pruning-config-and-rescore-example,pruning configuration>>.
|
|
|
|
|
|
[discrete]
|
|
[[search-text-inferece]]
|
|
===== Search with `text_embedding` inference
|
|
|
|
When the {infer} endpoint uses a `text_embedding` model, you can use a <<query-dsl-knn-query,`knn` query>> on a `semantic_text` field in the following way:
|
|
|
|
[source,console]
|
|
------------------------------------------------------------
|
|
GET test-index/_search
|
|
{
|
|
"query": {
|
|
"nested": {
|
|
"path": "inference_field.inference.chunks",
|
|
"query": {
|
|
"knn": {
|
|
"field": "inference_field.inference.chunks.embeddings",
|
|
"query_vector_builder": {
|
|
"text_embedding": {
|
|
"model_id": "my_inference_id",
|
|
"model_text": "mountain lake"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
------------------------------------------------------------
|
|
// TEST[skip: Requires inference endpoints]
|
|
|
|
You can customize the `knn` query to include specific settings, like `num_candidates` and `k`.
|