mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
270 lines
7.9 KiB
Text
270 lines
7.9 KiB
Text
[[query-dsl-text-expansion-query]]
|
|
=== Text expansion query
|
|
|
|
++++
|
|
<titleabbrev>Text expansion</titleabbrev>
|
|
++++
|
|
|
|
deprecated[8.15.0, This query has been replaced by <<query-dsl-sparse-vector-query>>.]
|
|
|
|
.Deprecation usage note
|
|
****
|
|
You can continue using `rank_features` fields with `text_expansion` queries in the current version.
|
|
However, if you plan to upgrade, we recommend updating mappings to use the `sparse_vector` field type and <<docs-reindex,reindexing your data>>.
|
|
This will allow you to take advantage of the new capabilities and improvements available in newer versions.
|
|
****
|
|
|
|
The text expansion query uses a {nlp} model to convert the query text into a list of token-weight pairs which are then used in a query against a
|
|
<<sparse-vector,sparse vector>> or <<rank-features,rank features>> field.
|
|
|
|
[discrete]
|
|
[[text-expansion-query-ex-request]]
|
|
==== Example request
|
|
|
|
[source,console]
|
|
----
|
|
GET _search
|
|
{
|
|
"query":{
|
|
"text_expansion":{
|
|
"<sparse_vector_field>":{
|
|
"model_id":"the model to produce the token weights",
|
|
"model_text":"the query string"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[skip: TBD]
|
|
|
|
[discrete]
|
|
[[text-expansion-query-params]]
|
|
==== Top level parameters for `text_expansion`
|
|
|
|
`<sparse_vector_field>`:::
|
|
(Required, object) The name of the field that contains the token-weight pairs the NLP model created based on the input text.
|
|
|
|
[discrete]
|
|
[[text-expansion-rank-feature-field-params]]
|
|
==== Top level parameters for `<sparse_vector_field>`
|
|
|
|
`model_id`::::
|
|
(Required, string) The ID of the model to use to convert the query text into token-weight pairs.
|
|
It must be the same model ID that was used to create the tokens from the input text.
|
|
|
|
`model_text`::::
|
|
(Required, string) The query text you want to use for search.
|
|
|
|
`pruning_config` ::::
|
|
(Optional, object)
|
|
preview:[]
|
|
Optional pruning configuration.
|
|
If enabled, this will omit non-significant tokens from the query in order to improve query performance.
|
|
Default: Disabled.
|
|
+
|
|
--
|
|
Parameters for `<pruning_config>` are:
|
|
|
|
`tokens_freq_ratio_threshold`::
|
|
(Optional, integer)
|
|
preview:[]
|
|
Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned.
|
|
This value must between 1 and 100.
|
|
Default: `5`.
|
|
|
|
`tokens_weight_threshold`::
|
|
(Optional, float)
|
|
preview:[]
|
|
Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned.
|
|
This value must be between 0 and 1.
|
|
Default: `0.4`.
|
|
|
|
`only_score_pruned_tokens`::
|
|
(Optional, boolean)
|
|
preview:[]
|
|
If `true` we only input pruned tokens into scoring, and discard non-pruned tokens.
|
|
It is strongly recommended to set this to `false` for the main query, but this can be set to `true` for a rescore query to get more relevant results.
|
|
Default: `false`.
|
|
|
|
NOTE: The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen based on tests using ELSER that provided the most optimal results.
|
|
--
|
|
|
|
[discrete]
|
|
[[text-expansion-query-example]]
|
|
==== Example ELSER query
|
|
|
|
The following is an example of the `text_expansion` query that references the ELSER model to perform semantic search.
|
|
For a more detailed description of how to perform semantic search by using ELSER and the `text_expansion` query, refer to <<semantic-search-elser,this tutorial>>.
|
|
|
|
[source,console]
|
|
----
|
|
GET my-index/_search
|
|
{
|
|
"query":{
|
|
"text_expansion":{
|
|
"ml.tokens":{
|
|
"model_id":".elser_model_2",
|
|
"model_text":"How is the weather in Jamaica?"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[skip: TBD]
|
|
|
|
Multiple `text_expansion` queries can be combined with each other or other query types.
|
|
This can be achieved by wrapping them in <<query-dsl-bool-query, boolean query clauses>> and using linear boosting:
|
|
|
|
[source,console]
|
|
----
|
|
GET my-index/_search
|
|
{
|
|
"query": {
|
|
"bool": {
|
|
"should": [
|
|
{
|
|
"text_expansion": {
|
|
"ml.inference.title_expanded.predicted_value": {
|
|
"model_id": ".elser_model_2",
|
|
"model_text": "How is the weather in Jamaica?",
|
|
"boost": 1
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"text_expansion": {
|
|
"ml.inference.description_expanded.predicted_value": {
|
|
"model_id": ".elser_model_2",
|
|
"model_text": "How is the weather in Jamaica?",
|
|
"boost": 1
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"multi_match": {
|
|
"query": "How is the weather in Jamaica?",
|
|
"fields": [
|
|
"title",
|
|
"description"
|
|
],
|
|
"boost": 4
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[skip: TBD]
|
|
|
|
This can also be achieved using <<rrf, reciprocal rank fusion (RRF)>>, through an <<rrf-retriever, `rrf` retriever>> with multiple
|
|
<<standard-retriever, `standard` retrievers>>.
|
|
|
|
[source,console]
|
|
----
|
|
GET my-index/_search
|
|
{
|
|
"retriever": {
|
|
"rrf": {
|
|
"retrievers": [
|
|
{
|
|
"standard": {
|
|
"query": {
|
|
"multi_match": {
|
|
"query": "How is the weather in Jamaica?",
|
|
"fields": [
|
|
"title",
|
|
"description"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"standard": {
|
|
"query": {
|
|
"text_expansion": {
|
|
"ml.inference.title_expanded.predicted_value": {
|
|
"model_id": ".elser_model_2",
|
|
"model_text": "How is the weather in Jamaica?"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"standard": {
|
|
"query": {
|
|
"text_expansion": {
|
|
"ml.inference.description_expanded.predicted_value": {
|
|
"model_id": ".elser_model_2",
|
|
"model_text": "How is the weather in Jamaica?"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
],
|
|
"window_size": 10,
|
|
"rank_constant": 20
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[skip: TBD]
|
|
|
|
[discrete]
|
|
[[text-expansion-query-with-pruning-config-and-rescore-example]]
|
|
==== Example ELSER query with pruning configuration and rescore
|
|
|
|
The following is an extension to the above example that adds a preview:[] pruning configuration to the `text_expansion` query.
|
|
The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.
|
|
|
|
Token pruning happens at the shard level.
|
|
While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard.
|
|
Therefore, if you are running `text_expansion` with a `pruning_config` on a multi-shard index, we strongly recommend adding a <<rescore>> function with the tokens that were originally pruned from the query.
|
|
This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
|
|
|
|
[source,console]
|
|
----
|
|
GET my-index/_search
|
|
{
|
|
"query":{
|
|
"text_expansion":{
|
|
"ml.tokens":{
|
|
"model_id":".elser_model_2",
|
|
"model_text":"How is the weather in Jamaica?",
|
|
"pruning_config": {
|
|
"tokens_freq_ratio_threshold": 5,
|
|
"tokens_weight_threshold": 0.4,
|
|
"only_score_pruned_tokens": false
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"rescore": {
|
|
"window_size": 100,
|
|
"query": {
|
|
"rescore_query": {
|
|
"text_expansion": {
|
|
"ml.tokens": {
|
|
"model_id": ".elser_model_2",
|
|
"model_text": "How is the weather in Jamaica?",
|
|
"pruning_config": {
|
|
"tokens_freq_ratio_threshold": 5,
|
|
"tokens_weight_threshold": 0.4,
|
|
"only_score_pruned_tokens": true
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
//TEST[skip: TBD]
|
|
|
|
[NOTE]
|
|
====
|
|
Depending on your data, the text expansion query may be faster with `track_total_hits: false`.
|
|
====
|