Commit graph

1296 commits

Author SHA1 Message Date
Liam Thompson
1be1110740
[DOCS] Clarify retriever is not API (#108295) 2024-05-06 15:52:25 +02:00
Michael Peterson
a451511e3a
Change skip_unavailable default value to true (#105792)
In order to improve the experience of cross-cluster search, we are changing
the default value of the remote cluster `skip_unavailable` setting from `false` to `true`.

This setting causes any cross-cluster _search (or _async_search) to fail entirely when
any remote cluster with `skip_unavailable=false` is unavailable (the connection to it fails)
or when the search on it fails on all shards.

Setting `skip_unavailable=true` allows partial results from other clusters to be
returned. In that case, the search response cluster metadata will show a `skipped`
status, so the user can see that no data came in from that cluster. Kibana also
now leverages this metadata in the cross-cluster search responses to allow users
to see how many clusters returned data and drill down into which clusters did not
(including failure messages).

Currently, the user/admin has to specifically set the value to `true` in the configs, like so:

```
cluster:
    remote:
        remote1:
            seeds: 10.10.10.10:9300
            skip_unavailable: true
```

even though that is probably what search admins want in the vast majority of cases.

Setting `skip_unavailable=false` should be a conscious (and probably rare) choice
by an Elasticsearch admin that a particular cluster's results are so essential to a
search (or visualization in dashboard or Discover panel) that no results at all should
be shown if it cannot return any results.
2024-04-29 15:53:47 -04:00
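
The failure rule described in #105792 can be modeled in a few lines. This is a minimal sketch and not Elasticsearch code: `ClusterResult` and `merge_ccs_results` are hypothetical names, the `successful` label is illustrative, and the model only distinguishes "returned hits" from "unavailable or failed on all shards".

```python
from dataclasses import dataclass, field


@dataclass
class ClusterResult:
    """Hypothetical per-cluster outcome of a cross-cluster search."""
    name: str
    skip_unavailable: bool
    ok: bool  # False: cluster unreachable, or search failed on all shards
    hits: list = field(default_factory=list)


def merge_ccs_results(results):
    """Merge per-cluster results following the rule in the commit message."""
    merged, statuses = [], {}
    for r in results:
        if r.ok:
            merged.extend(r.hits)
            statuses[r.name] = "successful"
        elif r.skip_unavailable:
            # Partial results are allowed: report the cluster as skipped.
            statuses[r.name] = "skipped"
        else:
            # skip_unavailable=false: the whole search fails.
            raise RuntimeError(f"remote cluster {r.name!r} is unavailable")
    return merged, statuses


results = [
    ClusterResult("remote1", True, True, ["doc1"]),
    ClusterResult("remote2", True, False),
]
hits, statuses = merge_ccs_results(results)
```

With the new default (`skip_unavailable=true`), the failed `remote2` is reported as skipped and the hits from `remote1` are still returned; with `skip_unavailable=false` the same input would raise.
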
eyalkoren
ee262954ee
Adding aggregations support for the _ignored field (#101373)
Enables aggregations on the _ignored metadata field replacing the stored field
with doc values.
2024-04-29 16:41:34 +02:00
Jim Ferenczi
4380cd1bd5
Allow rescorer with field collapsing (#107779)
This change adds the support for rescoring collapsed documents.
The rescoring is applied on the top document per group on each shard.

Closes #27243
2024-04-29 08:48:12 +01:00
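
The statement in #107779 that rescoring is applied "on the top document per group on each shard" can be illustrated with a toy model. This sketch assumes a single shard and uses hypothetical names (`collapse_then_rescore`, the `user` and `likes` fields); it does not model how Elasticsearch combines the original and rescore scores.

```python
def collapse_then_rescore(shard_hits, collapse_key, rescore):
    """Keep the best hit per group, then rescore only those hits."""
    top_per_group = {}
    for hit in shard_hits:
        key = hit[collapse_key]
        best = top_per_group.get(key)
        if best is None or hit["score"] > best["score"]:
            top_per_group[key] = hit
    # Only the group leaders are rescored; other hits are never considered.
    rescored = [dict(h, score=rescore(h)) for h in top_per_group.values()]
    return sorted(rescored, key=lambda h: h["score"], reverse=True)


shard_hits = [
    {"id": 1, "user": "a", "score": 2.0, "likes": 1},
    {"id": 2, "user": "a", "score": 1.0, "likes": 50},
    {"id": 3, "user": "b", "score": 1.5, "likes": 10},
]
ranked = collapse_then_rescore(
    shard_hits, "user", lambda h: h["score"] + 0.1 * h["likes"]
)
ranked_ids = [h["id"] for h in ranked]
```

Note that document 2 is dropped by the collapse step before rescoring, even though the rescore function would have ranked it highest.
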
Panagiotis Bailis
fdefe09041
Fix for from parameter when using sub_searches and rank (#106253) 2024-04-25 20:11:44 +03:00
Luca Cavanna
223e7f829b
Avoid attempting to load the same empty field twice in fetch phase (#107551)
During the fetch phase, a number of stored fields are requested explicitly or loaded by default. That information is included in the `StoredFieldsSpec` that each fetch sub phase exposes.

We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via `SearchLookup`). This is done in `PreloadedFieldLookupProvider`. The current logic makes values available for fields that have been found, so that scripts or value fetchers that request them don't load them again ad hoc. Stored fields that don't have a value for a specific doc, however, are treated like any other field that was not requested and are loaded again, although they will not be found, which causes overhead.

This change makes available to `PreloadedFieldLookupProvider` the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time).

This is an existing issue, that has become evident as we moved fetching of metadata fields to `FetchFieldsPhase`, that relies on value fetchers, and hence on `SearchLookup`. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document providing a `SingleFieldVisitor` that will never find a value.

Another existing issue that this PR fixes is for the `FetchFieldsPhase` to extend the `StoredFieldsSpec` that it exposes to include the metadata fields that the phase is now responsible for loading. That results in `_ignored` being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in `StoredFieldLoader#fieldsToLoad`).

Yet another existing issue that this PR fixes is that `_id` has until now always been loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the `PreloadedFieldLookupProvider`. That causes overhead, as the field has already been loaded and should not be loaded again when explicitly requested.
2024-04-17 19:37:04 +02:00
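
The distinction that #107551 introduces, between fields that were already attempted (and possibly not found) and fields that still need an ad hoc load, can be sketched as follows. This is a toy model in Python, not the Java implementation: the class name `PreloadedFieldLookup` and its `loader` callback are hypothetical.

```python
class PreloadedFieldLookup:
    """Toy model: remembers which stored fields were already requested."""

    def __init__(self, required_fields, preloaded_values, loader):
        self.required = set(required_fields)  # fields the fetch phase asked for
        self.values = dict(preloaded_values)  # requested fields that had a value
        self.loader = loader                  # expensive ad hoc load

    def get(self, field):
        if field in self.values:
            return self.values[field]
        if field in self.required:
            # Already attempted for this doc and not found: do not reload.
            return None
        # Never requested before: load on demand and remember the attempt.
        value = self.loader(field)
        self.required.add(field)
        if value is not None:
            self.values[field] = value
        return value


loader_calls = []


def loader(field):
    loader_calls.append(field)
    return {"tag": "x"}.get(field)


lookup = PreloadedFieldLookup(["_id", "_ignored", "_routing"], {"_id": "1"}, loader)
ignored = lookup.get("_ignored")  # absent in this doc, but no reload happens
tag = lookup.get("tag")           # first request: loaded ad hoc
tag_again = lookup.get("tag")     # served from the remembered value
```
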
Liam Thompson
33a71e3289
[DOCS] Refactor book-scoped variables in docs/reference/index.asciidoc (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated likewise.

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc
2024-04-17 14:37:07 +02:00
Salvatore Campagna
4dfcb0897e
Fetch meta fields in FetchFieldsPhase using ValueFetcher (#106325)
Here we extract the logic to populate metadata fields such as _ignored, _routing, _size and the deprecated _type into FetchFieldsPhase so that we can use the ValueFetcher interface to retrieve field values. This allows us to fetch values no matter if the Mapper uses stored or doc values.
2024-04-15 11:02:18 +02:00
István Zoltán Szabó
afb492272a
[DOCS] Adds HuggingFace example to inference API tutorial (#107298) 2024-04-10 17:57:18 +02:00
Bogdan Pintea
f9ae6db319
ESQL: Add docs for the OPTIONS directive (#107013)
This adds the docs for the newly added `OPTIONS` directive to `FROM`.
2024-04-03 16:23:36 +02:00
Liam Thompson
573c03262f
[Docs] Fix CCS matrix for 8.13 (#107028) 2024-04-03 10:54:49 +02:00
Albert Zaharovits
df0fd30e7a
[Doc] Privileges required to retrieve the status of async searches
Document that users can retrieve the status of the
async searches they submitted without any extra privileges.
2024-04-02 09:35:02 +03:00
Benjamin Trent
89bf4b33e8
Make int8_hnsw our default index for new dense-vector fields (#106836)
For float32, there is no compelling reason to use all the memory
required by default for HNSW. Using `int8_hnsw` provides a much saner
default when it comes to cost vs relevancy. 

So, on all new indices that use `dense_vector` and want to index them
for fast search, we will default to `int8_hnsw`. 

Users can still customize their parameters, or prefer `hnsw` over
float32 if they so desire.
2024-04-01 08:23:32 -04:00
Albert Zaharovits
b4938e1645
Query API Key Information API support for the typed_keys request parameter (#106873)
The typed_keys request parameter is the canonical parameter,
that's also used in the regular index _search endpoint, in order to
return the types of aggregations in the response.
This is required by typed language clients of the _security/_query/api_key
endpoint that are using aggregations.

Closes #106817
2024-03-29 09:24:52 +02:00
Jack Conradson
5ef0b57f77
Remove rank and sub_searches elements from documentation (#106827)
This change removes the technical preview elements rank and sub_searches from the search API 
documentation now that retrievers are available.
2024-03-27 10:51:13 -07:00
István Zoltán Szabó
a3d96b9333
[DOCS] Changes model_id path param to inference_id (#106719) 2024-03-26 08:20:34 +01:00
Liam Thompson
e92420dc86
[DOCS] Update cross cluster search compatibility matrix (#106677) 2024-03-22 15:28:30 +01:00
István Zoltán Szabó
32dbc28e82
[DOCS] Adds disclaimer to semantic search tutorials (#106590) 2024-03-21 11:32:57 +01:00
Ioana Tagirta
d01adfff60
Add links to text_expansion in ELSER tutorial (#106490)
* Add links to text_expansion in ELSER tutorial

* Apply suggestions from code review

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

2024-03-20 10:03:04 +01:00
Aurélien FOUCRET
e944619e01
Fix typo in the LTR guide. (#106276) 2024-03-13 09:05:47 +01:00
Panagiotis Bailis
d471ccb5bb
Adding support for hex-encoded byte vectors on knn-search (#105393) 2024-03-13 09:24:51 +02:00
Jack Conradson
68b0acac8f
Add retrievers using the parser-only approach (#105470)
This enhancement adds a new abstraction to the _search API called "retriever." A 
retriever is something that returns top hits. This adds three initial retrievers called
"standard", "knn", and "rrf". The retrievers use a parser-only approach where they
are parsed and then translated into a SearchSourceBuilder to execute the actual
search.
---------

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-03-12 10:11:55 -07:00
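
The "parser-only approach" in #105470 means a retriever is parsed and then rewritten into an ordinary search body. The sketch below is a toy translator under strong assumptions: `translate_retriever` is a hypothetical name, a plain dict stands in for `SearchSourceBuilder`, only the `query` element of a `standard` retriever is handled, and `rrf` is deliberately left out.

```python
def translate_retriever(retriever):
    """Translate a retriever object into an equivalent plain search body."""
    if len(retriever) != 1:
        raise ValueError("expected exactly one retriever type")
    kind, spec = next(iter(retriever.items()))
    if kind == "standard":
        # A standard retriever wraps a regular query.
        return {"query": spec["query"]}
    if kind == "knn":
        # A knn retriever maps onto the top-level knn search section.
        return {"knn": dict(spec)}
    raise NotImplementedError(f"retriever {kind!r} is not modeled here")


standard_body = translate_retriever(
    {"standard": {"query": {"match": {"title": "moose"}}}}
)
knn_body = translate_retriever(
    {"knn": {"field": "vector", "query_vector": [0.1, 0.2], "k": 3, "num_candidates": 10}}
)
```
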
Aurélien FOUCRET
5f81c1bbe6
First version of the LTR guide. (#105956) 2024-03-11 17:26:01 +01:00
Nhat Nguyen
863cbf6bb4
Add docs for cross cluster search in ES|QL (#105934)
This change adds documentation for cross cluster search in ES|QL.

Relates #102954
Closes #105529
2024-03-07 13:15:01 -08:00
István Zoltán Szabó
3dcfbe0732
[DOCS] Changes the cohere example to use a different model (#106037) 2024-03-06 19:40:04 +01:00
István Zoltán Szabó
6ae9dbfda7
[DOCS] Adds cohere service example to the inference API tutorial (#105904)
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
2024-03-04 16:43:41 +01:00
Liam Thompson
9e5fe197ca
[DOCS] Fix sublist syntax (#105625) 2024-02-19 16:25:31 +01:00
Matteo Piergiovanni
54cfce4379
Flag in _field_caps to return only fields with values in index (#103651)
We are adding a query parameter to the field_caps API in order to filter out
fields with no values. The parameter is called `include_empty_fields` and
defaults to true; if set to false, it filters out of the field_caps
response all fields that have no value in the index.
We keep track of FieldInfos during refresh in order to know which fields have
a value in an index. We also added a system property,
`es.field_caps_empty_fields_filter`, to disable this feature if needed.

---------

Co-authored-by: Matthias Wilhelm <ankertal@gmail.com>
2024-02-08 17:52:21 +01:00
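
The effect of `include_empty_fields` from #103651 can be shown with a small filter. This is a sketch only: `filter_field_caps` is a hypothetical helper, and the set of fields with values is passed in directly rather than derived from FieldInfos.

```python
def filter_field_caps(field_caps, fields_with_values, include_empty_fields=True):
    """Drop fields without values unless include_empty_fields is true."""
    if include_empty_fields:
        return dict(field_caps)
    return {
        name: caps
        for name, caps in field_caps.items()
        if name in fields_with_values
    }


field_caps = {
    "title": {"type": "text"},
    "legacy_code": {"type": "keyword"},  # mapped, but no document has a value
}
default_response = filter_field_caps(field_caps, {"title"})
filtered_response = filter_field_caps(field_caps, {"title"}, include_empty_fields=False)
```
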
Panagiotis Bailis
7ce8d76559
Making k and num_candidates optional for knn search (#101209) 2024-02-01 15:43:09 +02:00
Michael Peterson
06a25b60c9
Add keep_alive param to the async-search status endpoint (#104629) 2024-01-31 17:25:37 -05:00
David Kyle
2cbe23a189
[DOCS] Dense vector element type should be float for OpenAI (#104966) 2024-01-31 11:13:03 +00:00
Liam Thompson
dac0f4a371
[DOCS] Update CCS compatibility matrix for 8.12 (#104663) 2024-01-24 10:18:11 +01:00
Michael Peterson
e8370f8c43
Update search-across-clusters API docs to include incremental partial results (#104489) 2024-01-22 08:34:20 -05:00
Benjamin Trent
e4feaff900
Add support for more than one inner_hit when searching nested vectors (#104006)
This commit adds the ability to gather more than one inner_hit when
searching nested kNN.

# Global kNN example

```
POST test/_search
{
    "_source": false,
    "fields": [
        "name"
    ],
    "knn": {
        "field": "nested.vector",
        "query_vector": [
            -0.5,
            90,
            -10,
            14.8,
            -156
        ],
        "k": 3,
        "num_candidates": 3,
        "inner_hits": {
            "size": 2,
            "fields": [
                "nested.paragraph_id"
            ],
            "_source": false
        }
    }
}
```

Results in

<details>

```
{
    "took": 66,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.009090909,
        "hits": [
            {
                "_index": "test",
                "_id": "2",
                "_score": 0.009090909,
                "fields": {
                    "name": [
                        "moose.jpg"
                    ]
                },
                "inner_hits": {
                    "nested": {
                        "hits": {
                            "total": {
                                "value": 2,
                                "relation": "eq"
                            },
                            "max_score": 0.009090909,
                            "hits": [
                                {
                                    "_index": "test",
                                    "_id": "2",
                                    "_nested": {
                                        "field": "nested",
                                        "offset": 0
                                    },
                                    "_score": 0.009090909,
                                    "fields": {
                                        "nested": [
                                            {
                                                "paragraph_id": [
                                                    "0"
                                                ]
                                            }
                                        ]
                                    }
                                },
                                {
                                    "_index": "test",
                                    "_id": "2",
                                    "_nested": {
                                        "field": "nested",
                                        "offset": 1
                                    },
                                    "_score": 0.004968944,
                                    "fields": {
                                        "nested": [
                                            {
                                                "paragraph_id": [
                                                    "2"
                                                ]
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index": "test",
                "_id": "3",
                "_score": 0.0021519717,
                "fields": {
                    "name": [
                        "rabbit.jpg"
                    ]
                },
                "inner_hits": {
                    "nested": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 0.0021519717,
                            "hits": [
                                {
                                    "_index": "test",
                                    "_id": "3",
                                    "_nested": {
                                        "field": "nested",
                                        "offset": 0
                                    },
                                    "_score": 0.0021519717,
                                    "fields": {
                                        "nested": [
                                            {
                                                "paragraph_id": [
                                                    "0"
                                                ]
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}
```

</details>

# kNN Query example

With a kNN query, this opens an interesting door, which allows for
multiple inner_hit scoring schemes.

## Nearest by max passage only

```
POST test/_search
{
    "size": 3,
    "query": {
        "nested": {
            "path": "nested",
            "score_mode": "max",
            "query": {
                "knn": {
                    "field": "nested.vector",
                    "query_vector": [
                        -0.5,
                        90,
                        -10,
                        14.8,
                        -156
                    ],
                    "num_candidates": 5
                }
            },
            "inner_hits": {
                "size": 2,
                "_source": false,
                "fields": [
                    "nested.paragraph_id"
                ]
            }
        }
    }
}
```

closes: https://github.com/elastic/elasticsearch/issues/102950
2024-01-17 11:32:46 -05:00
Benjamin Trent
73f537170b
Update nested knn search documentation about inner-hits (#104154)
Adding a link tag for inner hits behavior and kNN search. Additionally
adding a note that if you are using multiple knn clauses, that the inner
hit name should be provided.
2024-01-10 07:46:42 -05:00
Kathleen DeRusso
bdde29720a
Update synonyms doc with warning about index creation (#103476)
* Update synonyms doc with warning about index creation

* PR feedback

* Moved warning in docs
2023-12-18 13:18:51 -05:00
István Zoltán Szabó
c55495d502
[DOCS] Adds inference API end-to-end example (#103042)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-12-12 12:02:47 +01:00
Benjamin Trent
7fde357f3a
Improve docs around knn similarity search (#103158)
Adding equations to the docs around how to best calculate similarity & score. The similarity parameter for search was added in 8.8.

The max-inner-product mentions will be removed for all versions before 8.11 when backporting.

closes: https://github.com/elastic/elasticsearch/issues/102924
2023-12-11 14:56:16 -05:00
Abdon Pijpelink
6b60a53732
Update rrf.asciidoc (#103078) (#103109)
typo

(cherry picked from commit 851cab63eb)

Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>
2023-12-11 13:02:49 +01:00
Benjamin Trent
47b57537ae
Add docs for the include_named_queries_score param (#103155)
The only docs for this _search param were mentioned in the bool query docs. While it makes contextual sense to have it there, we should also add it as a _search parameter in the search API docs.

It was introduced in 8.8.
2023-12-08 14:39:18 -05:00
Kathleen DeRusso
4dd9e2a772
[Query Rules] Add some usability clarifications to docs (#102990)
* [Query Rules] Add some usability clarifications to docs

* Fix typo
2023-12-06 17:16:56 -05:00
Benjamin Trent
f00364aefd
Add byte quantization for float vectors in HNSW (#102093)
Adds new `quantization_options` to `dense_vector`. This allows for
vectors to be automatically quantized to `byte` when indexed.

Example:

```
PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
```

When querying, the query vector is automatically quantized and used when
querying the HNSW graph. This reduces the memory required to only `25%`
of what was previously required for `float` vectors at a slight loss of
accuracy.

This is currently only available when `index: true` and when using
`hnsw`.
2023-11-29 12:29:55 -05:00
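
The 25% figure in #102093 follows from the element sizes: a `float` dimension takes 4 bytes and a quantized `byte` dimension takes 1. The arithmetic below is a sketch that counts raw vector storage only; the vector count and dimension are illustrative numbers, not taken from the commit, and HNSW graph links and any other overhead are ignored.

```python
FLOAT32_BYTES = 4  # bytes per dimension for float vectors
INT8_BYTES = 1     # bytes per dimension after quantization to byte


def raw_vector_bytes(num_vectors, dims, bytes_per_dim):
    """Raw vector storage only; graph links and other overhead are ignored."""
    return num_vectors * dims * bytes_per_dim


num_vectors, dims = 1_000_000, 768  # illustrative values
float_bytes = raw_vector_bytes(num_vectors, dims, FLOAT32_BYTES)
int8_bytes = raw_vector_bytes(num_vectors, dims, INT8_BYTES)
ratio = int8_bytes / float_bytes
```
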
Luca Cavanna
7c9e8356e6
Merge branch 'main' into lucene_snapshot 2023-11-24 09:57:22 +01:00
Saikat Sarkar
d4f01fc7b3
Gather vector_operation count for knn search (#102032) 2023-11-21 12:16:21 -07:00
Luca Cavanna
9cd96df179
Add support for index_filter to open pit (#102388)
The open point in time API accepts a list of indices and opens a point in time view against those indices.
Like we do already for field caps, this commit allows users to provide an index_filter parameter as part of
the request body, that will be used to execute the can match phase and exclude the indices that can't possibly
match such filter.

Closes #99740
2023-11-21 15:35:49 +01:00
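
A request using the `index_filter` parameter from #102388 might be assembled as below. This is a sketch under assumptions: `open_pit_request` is a hypothetical helper that only builds the method, path, and body (it sends nothing), and the index names and `@timestamp` range are made up for illustration.

```python
import json


def open_pit_request(indices, keep_alive, index_filter=None):
    """Build (method, path, body) for an open point in time request."""
    path = f"/{','.join(indices)}/_pit?keep_alive={keep_alive}"
    body = {"index_filter": index_filter} if index_filter else {}
    return "POST", path, json.dumps(body)


method, path, body = open_pit_request(
    ["logs-a", "logs-b"],
    "1m",
    index_filter={"range": {"@timestamp": {"gte": "now-1d"}}},
)
```

Indices that cannot possibly match the filter are excluded by the can match phase, as the commit message describes.
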
Kathleen DeRusso
4567d397fa
Clarify text expansion query docs to not suggest enabling track_total_hits for performance (#102102) 2023-11-20 08:56:26 -05:00
István Zoltán Szabó
c303ab885a
[DOCS] Simplifies dense vector mapping in semantic search example (#102080) 2023-11-14 10:52:56 +01:00
Abdon Pijpelink
70128f5b74
[DOCS] Mark 'ignore_throttled' deprecated in all docs (#101838) 2023-11-07 13:03:49 +01:00
Abdon Pijpelink
49c5b03d57
[DOCS] Update CCS compatibility matrix for 8.11 (#101786) 2023-11-06 08:41:15 +01:00
Mayya Sharipova
61c7483fc9
Make knn search a query (#98916)
This introduces a new knn query:
- The knn query is executed during the query phase, like all other queries.
- There is no k parameter; k defaults to size.
- num_candidates is the size of the queue of candidates to consider while
  searching the graph on each shard.
- For aggregations: "size" results are collected on each shard, so
  aggregations see size * shards results in total.
- All filters from the DSL are applied as post-filters, except 1) an alias filter,
  which is applied as a pre-filter, and 2) a filter provided as a parameter
  inside the knn query.
2023-11-01 14:21:40 -04:00
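
The points listed in #98916 can be made concrete with a request body and the aggregation arithmetic. This is a sketch: `knn_query_body` and `docs_seen_by_aggregations` are hypothetical helpers, and the field name and vector are placeholders. The body has no `k`, since the commit states that k defaults to `size`.

```python
def knn_query_body(field, query_vector, size, num_candidates):
    """Build a search body that uses knn as a regular query (no k parameter)."""
    return {
        "size": size,
        "query": {
            "knn": {
                "field": field,
                "query_vector": query_vector,
                "num_candidates": num_candidates,
            }
        },
    }


def docs_seen_by_aggregations(size, shards):
    """Each shard collects `size` results, so aggregations see size * shards."""
    return size * shards


body = knn_query_body("vector", [-0.5, 90, -10], size=3, num_candidates=10)
agg_docs = docs_seen_by_aggregations(size=3, shards=5)
```
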