elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-28 17:34:17 -04:00

Author	SHA1	Message	Date
Liam Thompson	1be1110740	[DOCS] Clarify `retriever` is not API (#108295 )	2024-05-06 15:52:25 +02:00
Michael Peterson	a451511e3a	Change skip_unavailable default value to true (#105792 ) In order to improve the experience of cross-cluster search, we are changing the default value of the remote cluster `skip_unavailable` setting from `false` to `true`. This setting causes any cross-cluster _search (or _async_search) to entirely fail when any remote cluster with `skip_unavailable=false` is either unavailable (connection to it fails) or if the search on it fails on all shards. Setting `skip_unavailable=true` allows partial results from other clusters to be returned. In that case, the search response cluster metadata will show a `skipped` status, so the user can see that no data came in from that cluster. Kibana also now leverages this metadata in the cross-cluster search responses to allow users to see how many clusters returned data and drill down into which clusters did not (including failure messages). Currently, the user/admin has to specifically set the value to `true` in the configs, like so: ``` cluster: remote: remote1: seeds: 10.10.10.10:9300 skip_unavailable: true ``` even though that is probably what search admins want in the vast majority of cases. Setting `skip_unavailable=false` should be a conscious (and probably rare) choice by an Elasticsearch admin that a particular cluster's results are so essential to a search (or visualization in dashboard or Discover panel) that no results at all should be shown if it cannot return any results.	2024-04-29 15:53:47 -04:00
eyalkoren	ee262954ee	Adding aggregations support for the `_ignored` field (#101373 ) Enables aggregations on the _ignored metadata field replacing the stored field with doc values.	2024-04-29 16:41:34 +02:00
Jim Ferenczi	4380cd1bd5	Allow rescorer with field collapsing (#107779 ) This change adds support for rescoring collapsed documents. The rescoring is applied on the top document per group on each shard. Closes #27243	2024-04-29 08:48:12 +01:00
Panagiotis Bailis	fdefe09041	Fix for from parameter when using sub_searches and rank (#106253 )	2024-04-25 20:11:44 +03:00
Luca Cavanna	223e7f829b	Avoid attempting to load the same empty field twice in fetch phase (#107551 ) During the fetch phase, there are a number of stored fields that are requested explicitly or loaded by default. That information is included in the `StoredFieldsSpec` that each fetch sub phase exposes. We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via `SearchLookup`). This is done in `PreloadedFieldLookupProvider`. The current logic makes available values for fields that have been found, so that scripts or value fetchers that request them don't load them again ad-hoc. Stored fields that don't have a value for a specific doc, however, are treated like any other field that was not requested and are loaded again, although they will not be found, which causes overhead. This change makes available to `PreloadedFieldLookupProvider` the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time). This is an existing issue that has become evident as we moved fetching of metadata fields to `FetchFieldsPhase`, which relies on value fetchers, and hence on `SearchLookup`. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document, providing a `SingleFieldVisitor` that will never find a value. Another existing issue that this PR fixes: the `FetchFieldsPhase` now extends the `StoredFieldsSpec` that it exposes to include the metadata fields that the phase is now responsible for loading. 
That results in `_ignored` being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in `StoredFieldLoader#fieldsToLoad`). Yet another existing issue that this PR fixes is that `_id` has until now always been loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the `PreloadedFieldLookupProvider`. That causes overhead as the field has already been loaded, and should not be loaded once again when explicitly requested.	2024-04-17 19:37:04 +02:00
Liam Thompson	33a71e3289	[DOCS] Refactor book-scoped variables in `docs/reference/index.asciidoc` (#107413 ) * Remove `es-test-dir` book-scoped variable * Remove `plugins-examples-dir` book-scoped variable * Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables - In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed. - In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path - In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way * Replace `es-repo-dir` with `es-ref-dir` * Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc	2024-04-17 14:37:07 +02:00
Salvatore Campagna	4dfcb0897e	Fetch meta fields in FetchFieldsPhase using ValueFetcher (#106325 ) Here we extract the logic to populate metadata fields such as _ignored, _routing, _size and the deprecated _type into FetchFieldsPhase so that we can use the ValueFetcher interface to retrieve field values. This allows us to fetch values no matter if the Mapper uses stored or doc values.	2024-04-15 11:02:18 +02:00
István Zoltán Szabó	afb492272a	[DOCS] Adds HuggingFace example to inference API tutorial (#107298 )	2024-04-10 17:57:18 +02:00
Bogdan Pintea	f9ae6db319	ESQL: Add docs for the OPTIONS directive (#107013 ) This adds the docs for the newly added `OPTIONS` directive to `FROM`.	2024-04-03 16:23:36 +02:00
Liam Thompson	573c03262f	[Docs] Fix CCS matrix for 8.13 (#107028 )	2024-04-03 10:54:49 +02:00
Albert Zaharovits	df0fd30e7a	[Doc] Privileges required to retrieve the status of async searches Document that users can retrieve the status of the async searches they submitted without any extra privileges.	2024-04-02 09:35:02 +03:00
Benjamin Trent	89bf4b33e8	Make int8_hnsw our default index for new dense-vector fields (#106836 ) For float32, there is no compelling reason to use all the memory required by default for HNSW. Using `int8_hnsw` provides a much saner default when it comes to cost vs relevancy. So, on all new indices that use `dense_vector` and want to index them for fast search, we will default to `int8_hnsw`. Users can still customize their parameters, or prefer `hnsw` over float32 if they so desire.	2024-04-01 08:23:32 -04:00
Albert Zaharovits	b4938e1645	Query API Key Information API support for the `typed_keys` request parameter (#106873 ) The typed_keys request parameter is the canonical parameter, also used in the regular index _search endpoint, to return the types of aggregations in the response. This is required by typed language clients of the _security/_query/api_key endpoint that are using aggregations. Closes #106817	2024-03-29 09:24:52 +02:00
Jack Conradson	5ef0b57f77	Remove rank and sub_searches elements from documentation (#106827 ) This change removes the technical preview elements rank and sub_searches from the search API documentation now that retrievers are available.	2024-03-27 10:51:13 -07:00
István Zoltán Szabó	a3d96b9333	[DOCS] Changes model_id path param to inference_id (#106719 )	2024-03-26 08:20:34 +01:00
Liam Thompson	e92420dc86	[DOCS] Update cross cluster search compatibility matrix (#106677 )	2024-03-22 15:28:30 +01:00
István Zoltán Szabó	32dbc28e82	[DOCS] Adds disclaimer to semantic search tutorials (#106590 )	2024-03-21 11:32:57 +01:00
Ioana Tagirta	d01adfff60	Add links to text_expansion in ELSER tutorial (#106490 ) * Add links to text_expansion in ELSER tutorial * Apply suggestions from code review Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> --------- Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>	2024-03-20 10:03:04 +01:00
Aurélien FOUCRET	e944619e01	Fix typo in the LTR guide. (#106276 )	2024-03-13 09:05:47 +01:00
Panagiotis Bailis	d471ccb5bb	Adding support for hex-encoded byte vectors on knn-search (#105393 )	2024-03-13 09:24:51 +02:00
Jack Conradson	68b0acac8f	Add retrievers using the parser-only approach (#105470 ) This enhancement adds a new abstraction to the _search API called "retriever." A retriever is something that returns top hits. This adds three initial retrievers called "standard", "knn", and "rrf". The retrievers use a parser-only approach where they are parsed and then translated into a SearchSourceBuilder to execute the actual search. --------- Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2024-03-12 10:11:55 -07:00
Aurélien FOUCRET	5f81c1bbe6	First version of the LTR guide. (#105956 )	2024-03-11 17:26:01 +01:00
Nhat Nguyen	863cbf6bb4	Add docs for cross cluster search in ES|QL (#105934 ) This change adds documentation for cross cluster search in ES|QL. Relates #102954 Closes #105529	2024-03-07 13:15:01 -08:00
István Zoltán Szabó	3dcfbe0732	[DOCS] Changes the cohere example to use a different model (#106037 )	2024-03-06 19:40:04 +01:00
István Zoltán Szabó	6ae9dbfda7	[DOCS] Adds cohere service example to the inference API tutorial (#105904 ) Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>	2024-03-04 16:43:41 +01:00
Liam Thompson	9e5fe197ca	[DOCS] Fix sublist syntax (#105625 )	2024-02-19 16:25:31 +01:00
Matteo Piergiovanni	54cfce4379	Flag in _field_caps to return only fields with values in index (#103651 ) We are adding a query parameter to the field_caps API in order to filter out fields with no values. The parameter is called `include_empty_fields` and defaults to true; if set to false, it filters out of the field_caps response all the fields that have no value in the index. We keep track of FieldInfos during refresh in order to know which fields have a value in an index. We also added a system property `es.field_caps_empty_fields_filter` in order to disable this feature if needed. --------- Co-authored-by: Matthias Wilhelm <ankertal@gmail.com>	2024-02-08 17:52:21 +01:00
Panagiotis Bailis	7ce8d76559	Making k and num_candidates optional for knn search (#101209 )	2024-02-01 15:43:09 +02:00
Michael Peterson	06a25b60c9	Add keep_alive param to the async-search status endpoint (#104629 )	2024-01-31 17:25:37 -05:00
David Kyle	2cbe23a189	[DOCS] Dense vector element type should be float for OpenAI (#104966 )	2024-01-31 11:13:03 +00:00
Liam Thompson	dac0f4a371	[DOCS] Update CCS compatibility matrix for 8.12 (#104663 )	2024-01-24 10:18:11 +01:00
Michael Peterson	e8370f8c43	Update search-across-clusters API docs to include incremental partial results (#104489 )	2024-01-22 08:34:20 -05:00
Benjamin Trent	e4feaff900	Add support for more than one inner_hit when searching nested vectors (#104006 ) This commit adds the ability to gather more than one inner_hit when searching nested kNN. # Global kNN example ``` POST test/_search { "_source": false, "fields": [ "name" ], "knn": { "field": "nested.vector", "query_vector": [ -0.5, 90, -10, 14.8, -156 ], "k": 3, "num_candidates": 3, "inner_hits": { "size": 2, "fields": [ "nested.paragraph_id" ], "_source": false } } } ``` Results in <details> ``` { "took": 66, "timed_out": false, "_shards": { "total": 2, "successful": 2, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 2, "relation": "eq" }, "max_score": 0.009090909, "hits": [ { "_index": "test", "_id": "2", "_score": 0.009090909, "fields": { "name": [ "moose.jpg" ] }, "inner_hits": { "nested": { "hits": { "total": { "value": 2, "relation": "eq" }, "max_score": 0.009090909, "hits": [ { "_index": "test", "_id": "2", "_nested": { "field": "nested", "offset": 0 }, "_score": 0.009090909, "fields": { "nested": [ { "paragraph_id": [ "0" ] } ] } }, { "_index": "test", "_id": "2", "_nested": { "field": "nested", "offset": 1 }, "_score": 0.004968944, "fields": { "nested": [ { "paragraph_id": [ "2" ] } ] } } ] } } } }, { "_index": "test", "_id": "3", "_score": 0.0021519717, "fields": { "name": [ "rabbit.jpg" ] }, "inner_hits": { "nested": { "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 0.0021519717, "hits": [ { "_index": "test", "_id": "3", "_nested": { "field": "nested", "offset": 0 }, "_score": 0.0021519717, "fields": { "nested": [ { "paragraph_id": [ "0" ] } ] } } ] } } } } ] } } ``` </details> # kNN Query example With a kNN query, this opens an interesting door, which allows for multiple inner_hit scoring schemes. 
## Nearest by max passage only ``` POST test/_search { "size": 3, "query": { "nested": { "path": "nested", "score_mode": "max", "query": { "knn": { "field": "nested.vector", "query_vector": [ -0.5, 90, -10, 14.8, -156 ], "num_candidates": 5 } }, "inner_hits": { "size": 2, "_source": false, "fields": [ "nested.paragraph_id" ] } } } } ``` </details> closes: https://github.com/elastic/elasticsearch/issues/102950	2024-01-17 11:32:46 -05:00
Benjamin Trent	73f537170b	Update nested knn search documentation about inner-hits (#104154 ) Adding a link tag for inner hits behavior and kNN search. Additionally adding a note that if you are using multiple knn clauses, the inner hit name should be provided.	2024-01-10 07:46:42 -05:00
Kathleen DeRusso	bdde29720a	Update synonyms doc with warning about index creation (#103476 ) * Update synonyms doc with warning about index creation * PR feedback * Moved warning in docs	2023-12-18 13:18:51 -05:00
István Zoltán Szabó	c55495d502	[DOCS] Adds inference API end-to-end example (#103042 ) Co-authored-by: David Kyle <david.kyle@elastic.co>	2023-12-12 12:02:47 +01:00
Benjamin Trent	7fde357f3a	Improve docs around knn similarity search (#103158 ) Adding equations to the docs around how to best calculate similarity & score. The similarity parameter for search was added in 8.8. The max-inner-product mentions will be removed for all versions before 8.11 when backporting. closes: https://github.com/elastic/elasticsearch/issues/102924	2023-12-11 14:56:16 -05:00
Abdon Pijpelink	6b60a53732	Update rrf.asciidoc (#103078 ) (#103109 ) typo (cherry picked from commit `851cab63eb`) Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>	2023-12-11 13:02:49 +01:00
Benjamin Trent	47b57537ae	Add docs for the include_named_queries_score param (#103155 ) The only docs for this _search param were mentioned in the bool query docs. While it makes contextual sense to have it there, we should also add it as a _search parameter in the search API docs. It was introduced in 8.8.	2023-12-08 14:39:18 -05:00
Kathleen DeRusso	4dd9e2a772	[Query Rules] Add some usability clarifications to docs (#102990 ) * [Query Rules] Add some usability clarifications to docs * Fix typo	2023-12-06 17:16:56 -05:00
Benjamin Trent	f00364aefd	Add byte quantization for float vectors in HNSW (#102093 ) Adds new `quantization_options` to `dense_vector`. This allows for vectors to be automatically quantized to `byte` when indexed. Example: ``` PUT vectors { "mappings": { "properties": { "my_vector": { "type": "dense_vector", "index": true, "index_options": { "type": "int8_hnsw" } } } } } ``` When querying, the query vector is automatically quantized and used when querying the HNSW graph. This reduces the memory required to only `25%` of what was previously required for `float` vectors at a slight loss of accuracy. This is currently only available when `index: true` and when using `hnsw`	2023-11-29 12:29:55 -05:00
Luca Cavanna	7c9e8356e6	Merge branch 'main' into lucene_snapshot	2023-11-24 09:57:22 +01:00
Saikat Sarkar	d4f01fc7b3	Gather vector_operation count for knn search (#102032 )	2023-11-21 12:16:21 -07:00
Luca Cavanna	9cd96df179	Add support for index_filter to open pit (#102388 ) The open point in time API accepts a list of indices and opens a point in time view against those indices. Like we do already for field caps, this commit allows users to provide an index_filter parameter as part of the request body, that will be used to execute the can match phase and exclude the indices that can't possibly match such filter. Closes #99740	2023-11-21 15:35:49 +01:00
Kathleen DeRusso	4567d397fa	Clarify text expansion query docs to not suggest enabling track_total_hits for performance (#102102 )	2023-11-20 08:56:26 -05:00
István Zoltán Szabó	c303ab885a	[DOCS] Simplifies dense vector mapping in semantic search example (#102080 )	2023-11-14 10:52:56 +01:00
Abdon Pijpelink	70128f5b74	[DOCS] Mark 'ignore_throttled' deprecated in all docs (#101838 )	2023-11-07 13:03:49 +01:00
Abdon Pijpelink	49c5b03d57	[DOCS] Update CCS compatibility matrix for 8.11 (#101786 )	2023-11-06 08:41:15 +01:00
Mayya Sharipova	61c7483fc9	Make knn search a query (#98916 ) This introduces a new knn query: - knn query is executed during the Query phase similar to all other queries. - No k parameter, k defaults to size - num_candidates is the size of the queue of candidates to consider while searching the graph on each shard - For aggregations: "size" results are collected with total = size * shards. Aggregations will see size * shards results. - All filters from DSL are applied as post-filters, except: 1) alias filter is applied as pre-filter or 2) a filter provided as a parameter inside knn query.	2023-11-01 14:21:40 -04:00
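
As a rough illustration of the knn query described in the last entry above (#98916), the following sketch builds a `_search` request body with a top-level knn query and no `k`, and computes how many results aggregations would see under the stated `size * shards` rule. The helper names, the field name `my_vector`, and the shard count are assumptions made for this example; they do not come from the commit itself.

```python
import json


def knn_query_body(field, query_vector, num_candidates, size):
    """Build a _search body that uses knn as a regular query.

    There is no `k` here: per the commit message, k defaults to `size`.
    (Hypothetical helper, for illustration only.)
    """
    return {
        "size": size,
        "query": {
            "knn": {
                "field": field,
                "query_vector": list(query_vector),
                "num_candidates": num_candidates,
            }
        },
    }


def results_seen_by_aggregations(size, shards):
    """Each shard collects `size` results, so aggregations see size * shards."""
    return size * shards


# `my_vector` and the shard count of 2 are illustrative assumptions.
body = knn_query_body("my_vector", [-0.5, 90, -10, 14.8, -156], num_candidates=5, size=3)
print(json.dumps(body, indent=2))
print(results_seen_by_aggregations(size=3, shards=2))
```

The body would be sent as the JSON payload of a `POST <index>/_search` request; only the request construction is sketched here, not the HTTP call.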


1296 commits