elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-29 01:44:36 -04:00

Author	SHA1	Message	Date
Abdon Pijpelink	27061a530e	Revert "[DOCS] Update search_after section with an example (#89328 )" (#89411 ) Reverts elastic/elasticsearch#89328	2022-08-17 18:20:15 +09:30
Anthony McGlone	af8ac50788	[DOCS] Update search_after section with an example (#89328 ) * [DOCS] Update search_after section with an example * Update docs/reference/search/search-your-data/paginate-search-results.asciidoc Co-authored-by: Abdon Pijpelink <abdon@abdon.nl> * Update docs/reference/search/search-your-data/paginate-search-results.asciidoc Co-authored-by: Abdon Pijpelink <abdon@abdon.nl> * Update docs/reference/search/search-your-data/paginate-search-results.asciidoc Co-authored-by: Abdon Pijpelink <abdon@abdon.nl> Co-authored-by: Abdon Pijpelink <abdon@abdon.nl>	2022-08-17 09:53:14 +02:00
Julie Tibshirani	acf9a67480	Document kNN with aggregations (#89359 ) This commit adds a short note to the 'search your data' docs around kNN search to explain how approximate kNN works with aggregations: * Make section on 'hybrid retrieval' more general and include aggregations info * Remove an example response from the previous section on filtering, since this page was getting long	2022-08-16 15:28:32 -07:00
Christos Soulios	b81f4187ab	[TSDB] Metric fields in the field caps API (#88695 ) To assist the user in configuring the visualizations correctly while leveraging TSDB functionality, information about TSDB configuration should be exposed via the field caps API per field. Especially for metrics fields, it must be clear which fields are metrics and whether they belong to only time-series indexes or to mixed time-series and non-time-series indexes. To further distinguish metric fields when they belong to any of the following indices: - Standard (non-time-series) indexes - Time series indexes - Downsampled time series indexes This PR modifies the field caps API so that the mapping parameters time_series_dimension and time_series_metric are presented only when they are set on fields of time-series indexes. Those parameters are completely ignored when they are set on standard (non-time-series) indexes. This PR revisits some of the conventions adopted by #78790	2022-08-04 20:42:34 +03:00
Abdon Pijpelink	b96c39e7ad	[DOCS] Move completion type asciidoc (#89086 ) * [DOCS] Move completion type asciidoc * Fix failing code snippet test	2022-08-04 10:02:28 +02:00
Julie Tibshirani	21eb984e64	Deprecate the _knn_search endpoint (#88828 ) This change deprecates the kNN search API in favor of the new 'knn' option inside the search API. The 'knn' option is now the preferred way of performing kNN search. Relates to #87625	2022-08-03 15:19:01 -04:00
Navanit Dubey	9afb01e14e	Update rank-eval.asciidoc (#88771 )	2022-07-25 18:00:49 +02:00
Julie Tibshirani	e3ede67262	Integrate ANN into _search endpoint (#88694 ) This PR adds a new `knn` option to the `_search` API to support ANN search. It's powered by the same Lucene ANN capabilities as the old `_knn_search` endpoint. The `knn` option can be combined with other search features like queries and aggregations. Addresses #87625	2022-07-22 08:02:07 -07:00
Ignacio Vera	04bdefd58c	Remove Collector implementation from BucketCollector (#88444 ) BucketCollector now has a method called #asCollector that returns the current BucketCollector wrapped as a Lucene Collector.	2022-07-18 08:18:13 +02:00
Nhat Nguyen	4732fc2343	Implement count for wrapped Weight in ContextIndexSearcher (#88396 ) Implements Weight#count() for wrapped Weights that don't change matching documents. Relates #88284	2022-07-13 16:57:12 -04:00
David Kilfoyle	40e9f3097c	[DOCS] Add TSDS docs, take two (#87703 ) * Revert "Revert "[DOCS] Add TSDS docs (#86905)" (#87702)" This reverts commit `0c86d7b9b2`. * First fix to tests * Add data_stream object to index template * small rewording * Add enable data stream object in gradle example setup * Add bullet about data stream must be enabled in template	2022-06-16 12:44:10 -04:00
David Kilfoyle	0c86d7b9b2	Revert "[DOCS] Add TSDS docs (#86905 )" (#87702 ) Reverts elastic/elasticsearch#86905	2022-06-15 13:32:12 -04:00
David Kilfoyle	d57f4ac2c6	[DOCS] Add TSDS docs (#86905 ) * [DOCS] Add TSDB docs * Update docs/build.gradle Co-authored-by: Adam Locke <adam.locke@elastic.co> * Address Nik's comments, part 1 * Address Nik's comments, part deux * Reword write index * Add feature flags * Wrap one more section in feature flag * Small fixes * set index.routing_path to optional * Update storage reduction value * Update create index template code example Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com> Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-06-15 12:22:07 -04:00
Julie Tibshirani	fab547bef2	Improve kNN with filtering docs (#87538 ) This change tries to make it easier to find kNN with filtering in the docs: * Mention filtering support in the kNN API description * In kNN tutorial, link to the kNN search API page more prominently	2022-06-09 10:42:54 -07:00
Luca Cavanna	50793a68a8	Fields API to allow fetching values when _source is disabled (#87267 ) Back when we introduced the fields parameter to the search API, it could only fetch values from _source, hence the corresponding sub-fetch phase fails early whenever _source is disabled. Today, though, runtime fields can be retrieved from a separate value fetcher that reads from fielddata, and metadata fields can be retrieved from stored fields. These two scenarios currently throw an unnecessary error whenever _source is disabled. This commit removes the check for disabled _source, so that runtime fields and metadata fields can be retrieved even when _source is disabled. Fields that need to be loaded from _source are simply skipped whenever _source is disabled, similar to when a field is not found in _source. Closes #87072	2022-06-02 11:28:36 +02:00
Craig Taverner	5f7ea792ac	Soft-deprecation of point/geo_point formats (#86835 ) * Soft-deprecation of point/geo_point formats Since GeoJSON and WKT are now common formats for all three types: geo_shape, geo_point and point We decided to soft-deprecate the other point formats by ordering: * GeoJSON (object with keys `type` and `coordinates`) * WKT `POINT(x y)` * Object with keys `lat` and `lon` (or `x` and `y` for point) * Array [lon,lat] * String `"lat,lon"` (or `"x,y"` in point) * String with geohash (only in `geo_point`) The geohash is last because it is only in one field type. The string version is second last because it is the most controversial being the only version to reverse the coordinate order from all other formats (for geo_point only, since the coordinates are not reversed in point). In addition we replaced many examples in both documentation and tests to prioritize WKT over the plain string format. Many remaining examples of array format or object with keys still exist and could be replaced by, for example, GeoJSON, if we feel the need. * Incorrect quote position	2022-05-17 23:46:43 +02:00
Craig Taverner	db08d61998	Support geo label position through REST vector tiles API (#86458 ) Support label position in REST vector tiles There is a need to provide sensibly calculated label positions for polygons and lines in Kibana maps. A very convenient way to satisfy this need is through a runtime field that the rest API can make use of when labels are requested. This has the advantage of providing painless access to the label position as well. This work adds support for the REST API to provide label positions to MVT queries, both for the HITS layer and the AGGS layer. To enable this feature, set with_labels to true as a query parameter to the vector tile search query.	2022-05-17 15:33:29 +02:00
Sohail Mirza	9117f0e42a	Docs: Remove extraneous backtick (#86750 )	2022-05-16 10:49:22 +02:00
Nik Everett	a589456b81	Synthetic source (#85649 ) This attempts to shrink the index by implementing a "synthetic _source" field. You configure it in the mapping: ``` { "mappings": { "_source": { "synthetic": true } } } ``` And we just stop storing the `_source` field - kind of. When you go to access the `_source`, we regenerate it on the fly by loading doc values. Doc values don't preserve the original structure of the source you sent, so we have to make some educated guesses. And we have a rule: the source we generate would result in the same index if you sent it back to us. That way you can use it for things like `_reindex`. Fetching the `_source` from doc values does slow down loading somewhat. See numbers further down. ## Supported fields This only works for the following fields: * `boolean` * `byte` * `date` * `double` * `float` * `geo_point` (with precision loss) * `half_float` * `integer` * `ip` * `keyword` * `long` * `scaled_float` * `short` * `text` (when there is a `keyword` sub-field that is compatible with this feature) ## Educated guesses The synthetic source generator makes `_source` fields that are: * sorted alphabetically * as "objecty" as possible * pushes all arrays to the "leaf" fields * sorts most array values * removes duplicate text and keyword values These are mostly artifacts of how doc values are stored. 
### sorted alphabetically ``` { "b": 1, "c": 2, "a": 3 } ``` becomes ``` { "a": 3, "b": 1, "c": 2 } ``` ### as "objecty" as possible ``` { "a.b": "foo" } ``` becomes ``` { "a": { "b": "foo" } } ``` ### pushes all arrays to the "leaf" fields ``` { "a": [ { "b": "foo", "c": "bar" }, { "c": "bort" }, { "b": "snort" } ] } ``` becomes ``` { "a": { "b": ["foo", "snort"], "c": ["bar", "bort"] } } ``` ### sorts most array values ``` { "a": [2, 3, 1] } ``` becomes ``` { "a": [1, 2, 3] } ``` ### removes duplicate text and keyword values ``` { "a": ["bar", "baz", "baz", "baz", "foo", "foo"] } ``` becomes ``` { "a": ["bar", "baz", "foo"] } ``` ## `_recovery_source` Elasticsearch's shard "recovery" process needs `_source` sometimes. So does cross cluster replication. If you disable source or filter it somehow, we store a `_recovery_source` field for as long as the recovery process might need it. When everything is running smoothly that's generally a few seconds or minutes. Then the field is removed on merge. This synthetic source feature continues to produce `_recovery_source` and relies on it for recovery. It's possible to synthesize `_source` during recovery but we don't do it. That means that synthetic source doesn't speed up writing the index. But in the future we might be able to turn this on to trade writing less data at index time for slower recovery and cross cluster replication. That's an area of future improvement. ## perf numbers I loaded the entire tsdb data set with this change and the size: ``` standard -> synthetic store size 31.0 GB -> 7.0 GB (77.5% reduction) _source 24695.7 MB -> 47.6 MB (99.8% reduction - synthetic is in _recovery_source) ``` A second _forcemerge a few minutes after rally finishes should remove the remaining 47.6MB of _recovery_source. With this, fetching source for 1,000 documents seems to take about 500ms. I spot checked a lot of different areas and haven't seen any different hit. 
I expect this performance impact is based on the number of doc values fields in the index and how sparse they are.	2022-05-10 07:46:58 -04:00
Julie Tibshirani	10aa947707	Remove out-of-date note about kNN with filters We implemented this in #84734 but forgot to update these docs.	2022-04-14 10:18:07 -07:00
Yannick Welsch	78789e2b5d	Fix wildcard highlighting on match_only_text (#85500 ) Fixes a bug where match_only_text fields were ignored during highlighting when a field name with wildcard was specified. Closes #85493	2022-04-01 08:12:08 +02:00
Craig Taverner	0b84eb1a53	Added buffer to vector tile REST API docs (#85460 )	2022-03-30 14:29:01 +02:00
Alan Woodward	a5452603cc	Extra testing and some cleanups for filtering on field caps (#85068 ) * adds a test for mixed cluster requests * fixes a bad stream version check (above test will fail if this isn't included) * replaces private FieldCapsFilter interface with Predicate * renames 'allowedTypes' to 'types' to maintain consistency with external API * adds javadoc to ResponseRewriter * removes isRuntimeField from FieldTypeLookup Relates to #83636	2022-03-29 11:38:52 +01:00
Ignacio Vera	a780558e4c	[DOCS] Fix Vector tiles search docs for features.id (#85067 ) Removes the `features.id` property from the response body. This property was actually generated by the tool used to decode the mvt file to JSON.	2022-03-17 16:06:49 -04:00
Ignacio Vera	3f6d460d01	Integrate GeoHexGridAggregation with vector tiles API (#84553 ) This commit adds a new optional parameter on the vector tiles API called `grid_agg` with two possible values, geotile (default) and geohex. This allows the aggs layer to be built with different grid aggregations; for example, a grid aggregation built from hexagons.	2022-03-16 11:16:30 +01:00
Julie Tibshirani	15708d5454	Integrate filtering support for ANN (#84734 ) This PR integrates support for ANN with filtering added in Lucene 9.1. It adds a new `filter` section to the `_knn_search` endpoint, which accepts a query (in the Elasticsearch query DSL). The value can either be a single query or a list of queries, which matches the syntax we use for defining filter clauses in a `bool` query. Closes #81788.	2022-03-10 15:53:51 -08:00
Craig Taverner	397eccf789	Added buffer pixels to vector tile spec parsing (#84710 ) * Added buffer pixels to vector tile spec parsing Previously this was hard-coded to 5, but is now configurable using the format z/x/y@extent:buffer, where both extent and buffer are optional and default to 4096 and 5 pixels respectively. Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2022-03-10 16:42:29 +01:00
Julie Tibshirani	713017f0e3	Improve readability of field retrieval docs (#84373 ) * Collapse more specialized sections around nested fields, unmapped fields, and ignored values * Move information on metadata fields to a 'note' and streamline it a bit Closes #82983.	2022-02-28 09:52:39 -08:00
James Rodewig	6f5541a9d6	[DOCS] Update CCS forward compatibility docs (#84055 ) Documents the following: * FWC for CCS within the same major version. * A local cluster running the last minor of a major can search a remote cluster running any minor in the following major. * Only features that exist across all searched clusters are supported.	2022-02-28 08:18:04 -05:00
Julie Tibshirani	d9ef39f7c2	Remove 'under development' note in suggester docs (#84366 ) In the intro, we mention that parts of the feature are still under development. This is not very helpful information for users, and could give the wrong impression about its maturity.	2022-02-24 13:27:03 -08:00
Nhat Nguyen	86964c9752	Document partial search results with skip_unavailable (#84057 ) This commit adds an explanation for the relation between `allow_partial_search_results` and `skip_unavailable` in CCS requests. Relates to #33915 Closes #82407 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2022-02-23 10:04:52 -05:00
Nhat Nguyen	31d703f24c	Introduce lookup runtime fields (#82385 ) This PR introduces the lookup runtime fields which are used to retrieve data from the related indices. The below search request enriches its search hits with the location of each IP address from the `ip_location` index. ``` POST logs/_search { "runtime_mappings": { "location": { "type": "lookup", "lookup_index": "ip_location", "query_type": "term", "query_input_field": "ip", "query_target_field": "_id", "fetch_fields": [ "country", "city" ] } }, "fields": [ "timestamp", "message", "location" ] } ``` Response: ``` { "hits": { "hits": [ { "_index": "logs", "_id": "1", "fields": { "location": [ { "city": [ "Montreal" ], "country": [ "Canada" ] } ], "message": [ "the first message" ] } } ] } } ```	2022-02-22 21:36:19 -05:00
Alan Woodward	8bc46ad959	Add filtering to fieldcaps endpoint (#83636 ) Many consumers of the field caps API need to do some post-processing of the results before they can use them; for instance, Kibana would like to exclude multifields from certain field selections, or would like to display only geo_point fields in Maps. ML and QL consumers exclude nested fields in certain circumstances. This post-processing is possible at the moment, but can be hacky; and in all cases it involves sending the whole (possibly very large) field caps response over the wire and then whittling it down in the client. It is also not guaranteed to be accurate - runtime fields may be incorrectly classified as multifields, for example. This commit pushes filtering into elasticsearch itself, reducing the amount of data that needs to be transported and ensuring better accuracy. The field caps API gets two new parameters: * filters - a comma-delimited list that may contain any combination of: `+metadata`, `-metadata`, `-nested`, `-parent`, `-multifield` * types - a comma-delimited list of field types; only fields that have a type in this set will be returned The API will make best-effort attempts to apply the filters post-hoc to responses from older nodes, so this should still work in a mixed-cluster or cross-cluster situation. Fixes #82966, #72174	2022-02-10 14:06:26 +00:00
James Rodewig	2f03112b5b	[DOCS] Synced with 8.0 stack upgrade changes (#83489 ) (#83596 ) This moves the bulk of the upgrade information into the consolidated upgrade guide, but leaves the primary upgrade topic in place as a cross reference. Relates to: https://github.com/elastic/stack-docs/pull/1970 Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com> (cherry picked from commit `f6473d71f9`) Co-authored-by: debadair <debadair@elastic.co>	2022-02-07 11:01:42 -05:00
James Rodewig	882bac8948	[DOCS] Fix CCS compatibility typo	2022-02-02 13:09:40 -05:00
Eric Beahan	540a40093c	[DOCS] Correct header syntax (#83275 ) * correct header syntax * Update docs/reference/search/search-your-data/retrieve-selected-fields.asciidoc Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-01-28 14:55:54 -05:00
Julie Tibshirani	e7ba03e0a6	Add notes on indexing to kNN search guide (#83188 ) This change adds a new 'indexing considerations' section that explains why index calls can be slow and how force merge can help search latency.	2022-01-28 10:23:35 -08:00
Christoph Büscher	61e1b080dd	[Docs] Add supported _terms_enum field types (#83244 ) The `_terms_enum` API currently supports keyword, constant_keyword and flattened fields. This should be documented more clearly.	2022-01-28 12:47:12 +01:00
James Rodewig	dfb9f6f18d	[DOCS] Document 8.0 BWC support for CCS (#80809 ) As of 8.0, the compatibility window for cross-cluster search (CCS) to an earlier release will be one minor release. This updates the CCS docs and adds a related 8.0 breaking change. Closes https://github.com/elastic/elasticsearch/issues/80782	2022-01-11 10:33:12 -05:00
James Rodewig	7142b47e69	[DOCS] Add prerequisites for CCS (#81782 ) * Adds a prerequisites section covering remote cluster config, node roles, and security. * Moves existing content about remote cluster config to the prereqs. * Updates the remote cluster docs to include information about eligible gateway nodes and tagging for gateway nodes. Closes https://github.com/elastic/elasticsearch/issues/72001	2022-01-10 09:17:44 -05:00
Bogdan Pintea	13a0e420a3	SQL: Add CCS SQL documentation (#81545 ) This adds the documentation for CCS SQL. Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2022-01-05 20:01:01 +01:00
Ignacio Vera	8c6ed1efc0	Remove experimental flag from geo field format mvt (#81721 ) Small leftover from 7.16, where the mvt feature became GA	2021-12-14 15:21:05 +01:00
Julie Tibshirani	19eed47159	Improve kNN error message when index is disabled (#81561 ) In order to perform a kNN search on a `dense_vector` field, it must have `index: true` in its mapping. This commit clarifies the error message. Before, the message was confusing, because the user likely didn't touch the `index` parameter and might not even be aware of it. It adds a note to the docs clarifying that when coming from 7.x, you must explicitly set `index: true` and reindex the vectors. Relates to #78473.	2021-12-08 16:20:35 -08:00
Nhat Nguyen	d0d91c690e	Handle partial search result with point in time (#81349 ) Today, a search request with PIT would fail immediately if any associated indices or nodes are gone, which is inconsistent when allow_partial_search_results is true. Relates #81256	2021-12-08 10:04:38 -05:00
James Rodewig	229d2d7a77	[DOCS] Add high-level guide for kNN search (#80857 ) Adds a high-level guide for running an approximate or exact kNN search in Elasticsearch. Relates to https://github.com/elastic/elasticsearch/issues/78473.	2021-11-30 14:17:39 -05:00
happybin92	0aa9767f3d	Support combining _shards preference param with <custom-string> (#80024 ) Adds support for combining the _shards search preference parameter with the <custom-string> search preference parameter. Closes #80021	2021-11-10 14:08:27 +01:00
Julie Tibshirani	8ca693b271	Add docs for kNN search endpoint (#80378 ) This commit adds docs for the new `_knn_search` endpoint. It focuses on being an API reference and is light on details in terms of how exactly the kNN search works, and how the endpoint contrasts with `script_score` queries. We plan to add a high-level guide on kNN search that will explain this in depth. Relates to #78473.	2021-11-09 09:28:12 -08:00
James Rodewig	f56a0f4b66	[DOCS] Remove `testenv` annotations from doc snippet tests (#80023 ) Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible. Relates to #79309, #31619	2021-11-05 18:38:50 -04:00
Ignacio Vera	508ed02ed2	Document _key tag added on the agg layer features (#80205 )	2021-11-03 07:12:46 +01:00
James Rodewig	ee1f71d421	[DOCS] Add experimental label to TSDB mapping params and settings (#79647 ) Adds an `experimental` annotation to the following: * `time_series_metric` mapping parameter * `time_series_dimension` mapping parameter * `index.mapping.dimension_fields.limit` index setting * `time_series_dimension` and `time_series_metric` properties in the field caps API response	2021-10-27 09:09:54 -04:00
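
The "educated guesses" listed in the synthetic source commit (a589456b81) can be illustrated with a small Python sketch that rebuilds a document from its leaf values. This is a hypothetical model, not Elasticsearch's implementation (which loads doc values per field): `synthesize` and `_collect` are invented names, the sketch assumes each field holds values of a single comparable type, it sorts every array even though the commit says "most", and it emits a lone value as a scalar.

```python
from typing import Any, Dict, List


def _collect(value: Any, path: str, out: Dict[str, List[Any]]) -> None:
    """Record every leaf value under its full dotted path (hypothetical helper)."""
    if isinstance(value, dict):
        for key, child in value.items():
            _collect(child, f"{path}.{key}" if path else key, out)
    elif isinstance(value, list):
        # Arrays are flattened, so array-ness survives only on leaf fields.
        for item in value:
            _collect(item, path, out)
    else:
        out.setdefault(path, []).append(value)


def synthesize(source: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild a document following the commit's "educated guesses".

    Assumes each field holds values of one comparable type and that no
    path is both a leaf and an object; a lone value is emitted as a scalar.
    """
    leaves: Dict[str, List[Any]] = {}
    _collect(source, "", leaves)
    result: Dict[str, Any] = {}
    # Sorting by path segments yields alphabetical keys at every level.
    for path in sorted(leaves, key=lambda p: p.split(".")):
        values = sorted(leaves[path])
        if all(isinstance(v, str) for v in values):
            values = sorted(set(values))  # drop duplicate text/keyword values
        *parents, leaf = path.split(".")
        node = result
        for part in parents:
            # Dotted names become nested objects ("objecty" output).
            node = node.setdefault(part, {})
        node[leaf] = values[0] if len(values) == 1 else values
    return result
```

For example, `synthesize({"a": [2, 3, 1]})` returns `{"a": [1, 2, 3]}`, matching the commit's "sorts most array values" case, and numeric duplicates are kept because only text and keyword values are deduplicated.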

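Commit 5f7ea792ac notes that the `"lat,lon"` string is the only `geo_point` format that puts latitude first. A hypothetical client-side helper, `to_lon_lat`, makes that difference concrete by normalizing each listed format except geohash to a `(lon, lat)` tuple. It is a sketch under these assumptions, not an Elasticsearch API, and it does not validate coordinate ranges.

```python
import re
from typing import Any, Tuple

# WKT point: POINT(x y), where x is longitude and y is latitude.
_WKT_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)


def to_lon_lat(value: Any) -> Tuple[float, float]:
    """Normalize a geo_point value to (lon, lat); hypothetical helper, geohash unsupported."""
    if isinstance(value, dict):
        if str(value.get("type", "")).lower() == "point":
            # GeoJSON: coordinates are [lon, lat].
            lon, lat = value["coordinates"][:2]
            return float(lon), float(lat)
        if "lat" in value and "lon" in value:
            # Object with explicit lat/lon keys: key order does not matter.
            return float(value["lon"]), float(value["lat"])
    elif isinstance(value, (list, tuple)):
        # Array format: [lon, lat], the same order as GeoJSON.
        return float(value[0]), float(value[1])
    elif isinstance(value, str):
        match = _WKT_POINT.match(value)
        if match:
            return float(match.group(1)), float(match.group(2))
        if "," in value:
            # "lat,lon" string: the only format that puts latitude first.
            lat, lon = (part.strip() for part in value.split(",", 1))
            return float(lon), float(lat)
    raise ValueError(f"unsupported geo_point value: {value!r}")
```

With this helper, `{"type": "Point", "coordinates": [-71.34, 41.12]}`, `"POINT (-71.34 41.12)"`, `{"lat": 41.12, "lon": -71.34}`, `[-71.34, 41.12]` and `"41.12,-71.34"` all normalize to `(-71.34, 41.12)`.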

1137 commits
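
Commit 397eccf789 describes the vector tile spec format `z/x/y@extent:buffer`, where extent and buffer are optional and default to 4096 and 5 pixels. The Python sketch below parses such a string. `parse_tile_spec` and `TileSpec` are hypothetical names, treating a bare `@` or `@:buffer` as "use the default extent" is an assumption the commit does not confirm, and the sketch does not check that x and y are valid for the zoom level.

```python
import re
from typing import NamedTuple

# Defaults stated in commit 397eccf789.
DEFAULT_EXTENT = 4096
DEFAULT_BUFFER = 5

# z/x/y, optionally followed by @extent and/or :buffer (assumed grammar).
_TILE_SPEC = re.compile(r"^(\d+)/(\d+)/(\d+)(?:@(\d+)?(?::(\d+))?)?$")


class TileSpec(NamedTuple):
    z: int
    x: int
    y: int
    extent: int
    buffer: int


def parse_tile_spec(spec: str) -> TileSpec:
    """Parse 'z/x/y@extent:buffer', filling in defaults for missing parts."""
    match = _TILE_SPEC.match(spec)
    if match is None:
        raise ValueError(f"invalid tile spec: {spec!r}")
    z, x, y, extent, buffer = match.groups()
    return TileSpec(
        int(z),
        int(x),
        int(y),
        int(extent) if extent is not None else DEFAULT_EXTENT,
        int(buffer) if buffer is not None else DEFAULT_BUFFER,
    )
```

For example, `parse_tile_spec("12/654/1583@256:10")` returns `TileSpec(z=12, x=654, y=1583, extent=256, buffer=10)`, while `parse_tile_spec("12/654/1583")` falls back to an extent of 4096 and a buffer of 5.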