We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.
This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.
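For example, a field caps request of roughly this shape is accepted again (index and field names illustrative):
```
POST /my-index/_field_caps
{
  "fields": ["user.id", "http.response.*"]
}
```
Combining a `fields` list in the body with a `fields` query parameter in the URL returns an error.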
Closes #86875
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field
caps API per field.
Especially for metric fields, it must be clear which fields are metrics and whether they belong
to only time-series indexes or to mixed time-series and non-time-series indexes.
Metric fields must be further distinguished when they belong to any of the following index types:
- Standard (non-time-series) indexes
- Time series indexes
- Downsampled time series indexes
This PR modifies the field caps API so that the mapping parameters `time_series_dimension`
and `time_series_metric` are reported only when they are set on fields of time-series indexes.
Those parameters are ignored entirely when they are set on standard (non-time-series) indexes.
This PR revisits some of the conventions adopted by #78790
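For illustration, a field caps response for a metric field of a time-series index could then look like this (index, field, and values hypothetical; response abridged):
```
GET /my-tsdb-index/_field_caps?fields=cpu.usage

{
  "indices": ["my-tsdb-index"],
  "fields": {
    "cpu.usage": {
      "double": {
        "type": "double",
        "searchable": true,
        "aggregatable": true,
        "time_series_metric": "gauge"
      }
    }
  }
}
```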
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns
Relates #77065
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Represent transactions as bitsets for faster lookups when iterating over candidate sets. This PR implements
a lookup table and a bit-based subset check. The lookup table maps transactions to items; this
so-called horizontal representation speeds up the check of whether a transaction contains a
candidate item set.
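A minimal sketch of the bit-based subset check (illustrative only; the actual implementation differs):
```
// Each transaction and candidate item set is a bitset over the top items,
// stored as a long[] of equal length.
class BitSetSubset {
    static boolean isSubset(long[] candidate, long[] transaction) {
        for (int i = 0; i < candidate.length; i++) {
            // every bit set in the candidate must also be set in the transaction
            if ((candidate[i] & transaction[i]) != candidate[i]) {
                return false;
            }
        }
        return true;
    }
}
```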
This change deprecates the kNN search API in favor of the new `knn` option
inside the search API. The `knn` option is now the preferred way of performing
kNN search.
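For reference, the preferred form looks like this (index, field, and vector values illustrative):
```
POST /my-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [0.3, 0.1, 1.2],
    "k": 10,
    "num_candidates": 100
  }
}
```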
Relates to #87625
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.
See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.
The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start an agent from within Elasticsearch
programmatically, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.
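For illustration, the assembled options might look something like this (agent path and file locations purely illustrative):
```
-javaagent:/path/to/elastic-apm-agent.jar
-Delastic.apm.config_file=/path/to/config/elasticapm.properties
```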
To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since the secret would then be readable
by all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.
There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.
I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to set up a complete system so that traces can be captured in
APM server, and the results inspected in Elasticsearch.
As discussed in #73569 the current implementation is too slow in certain scenarios.
The inefficient part of the code can be stated as the following problem:
Given a text (`getText()`) and a position in this text (`offset`), find the sentence
boundaries before and after the offset, in such a way that the after boundary is
maximal but respects `end boundary - start boundary < fragment size`.
If it's impossible to produce an after boundary that respects this
condition, use the nearest boundary following the offset.
The current approach begins by finding the nearest preceding and following boundaries,
and expands the following boundary greedily while it respects the problem restriction. This
is fine asymptotically, but BreakIterator which is used to find each boundary is sometimes
expensive.
This new approach maximizes the after boundary by scanning for the last boundary
preceding the position that would cause the condition to be violated (i.e. knowing start
boundary and offset, how many characters are left before resulting length is fragment size).
If this scan finds the start boundary, it means it's impossible to satisfy the problem
restriction, and we get the first boundary following offset instead (or better, since we
already scanned [offset, targetEndOffset], start from targetEndOffset + 1).
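A minimal sketch of the idea using `java.text.BreakIterator` (names and structure illustrative; the real implementation differs):
```
import java.text.BreakIterator;

class BoundaryScan {
    // Find [start, end) such that end is maximal with end - start < fragmentSize,
    // falling back to the first boundary after the scanned range when impossible.
    static int[] boundaries(String text, int offset, int fragmentSize) {
        BreakIterator bi = BreakIterator.getSentenceInstance();
        bi.setText(text);
        // nearest boundary at or before the offset
        int start = bi.preceding(Math.min(offset + 1, text.length()));
        if (start == BreakIterator.DONE) {
            start = 0;
        }
        int targetEndOffset = start + fragmentSize; // first end that would violate the limit
        if (targetEndOffset >= text.length()) {
            return new int[] { start, text.length() };
        }
        // scan backwards: last boundary strictly before the limit
        int end = bi.preceding(targetEndOffset);
        if (end <= start) {
            // impossible to satisfy the restriction: take the first boundary
            // after the range [offset, targetEndOffset] we already scanned
            end = bi.following(targetEndOffset);
            if (end == BreakIterator.DONE) {
                end = text.length();
            }
        }
        return new int[] { start, end };
    }
}
```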
A transform persists its internal state (e.g. the data cursor) in a state document.
This change improves the error handling and fixes the problem described in #88905. A transform
can now recover from this problem.
Fixes #88905
This change adds a SourceValueFetcherSortedDoubleIndexFieldData class to support double doc value types for source fallback. It also adds support for the double, float and half_float field types.
Introduced in: #88439
* [ML] add text_similarity nlp task documentation
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/ml-shared.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Computing routing nodes and the indices lookup takes considerable time
for large states. Both are needed during cluster state application and,
prior to this change, were computed on the applier thread in all cases.
By running the creation of both objects concurrently with publication, the
many-shards benchmark sees a 10%+ reduction in the time to bootstrap
50k indices.
The parser used to parse Mount API requests is configured to
ignore unknown fields. I suspect we made it this way when it
was created because we were expecting to change the
request's body in the future, but that never happened.
This leniency confuses users (#75982) so we think it is better
to simply reject requests with unknown fields starting v8.5.0.
Because the High Level REST Client has a bug (to be fixed in #79604)
that injects a wrong `ignored_index_settings` parameter, we deliberately
ignore, rather than reject, that one.
Closes #75982
Clean up network setting docs
- Add types for all params
- Remove mention of JDKs before 11
- Clarify some wording
Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
This change adds source fallback support for byte, short, and long fields. These use the already
existing class SourceValueFetcherSortedNumericIndexFieldData.
This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:
```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster": "other-cluster",
  "leader_index_patterns": ["logs-*"],
  "follow_index_pattern": "{{leader_index}}_copy"
}
```
And then the data stream `logs-mysql-error` was created, creating the backing index
`.ds-logs-mysql-error-2022-07-29-000001`.
Prior to this commit, replicating this data stream meant that the backing index would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` while the data stream would *not* be renamed. This
tripped a check in `TransportPutLifecycleAction` asserting that a backing index is not
renamed for a data stream during following.
After this commit, there are a couple of changes:
First, the data stream will also be renamed. This means that the `logs-mysql-error` data stream becomes
`logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: `data_stream_name`. It works like this:
```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```
Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of the data stream backing pattern. Previously the index
`.ds-logs-mysql-error-2022-07-29-000001` would have been renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name, since it doesn't end with the
rollover digits); after this commit it is renamed to
`.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream. This means that for
the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:
| Leader Cluster | Follower Cluster |
|--------------|-----------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001` | `.ds-logs-mysql-error_copy-2022-07-29-000001` |
Internally, this means the auto-follow request is turned into the following create-follower request:
```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```
Relates to https://github.com/elastic/elasticsearch/pull/84940 (cherry-picked the commit for a test)
Relates to https://github.com/elastic/elasticsearch/pull/61993 (where data stream support was first introduced for CCR)
Resolves https://github.com/elastic/elasticsearch/issues/81751
When a model is starting, it has rarely been observed to lock up while trying to restore the model objects to the native process.
This manifests as a trained model being stuck in "starting" while also being assigned to a node: a native process has started and a task is available on the assigned node, but the model state never leaves "starting".
Speed up frequent_items by using bitsets instead of lists of longs. With this, item sets
can be de-duplicated faster. A bit is set according to the order of the top items (by count).
There were some cases where synthetic source wasn't rounding properly in
round trips. `0.15527719259262085` with a scaling factor of
`2.4206374697469164E16` was round tripping to `0.15527719259262088`,
which then round trips up to `0.1552771925926209`, rounding in the wrong
direction! This fixes the round tripping in this case through ever more
paranoid double checking and nudging.
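A sketch of the round-trip property the fix enforces (illustrative; the real logic lives in the `scaled_float` synthetic source support):
```
class ScaledRoundTrip {
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // Decode so that re-encoding yields the same long; if the re-encoded
    // value drifts, nudge the decoded double one ulp in the correcting
    // direction (the real fix iterates this check as needed).
    static double decode(long scaled, double scalingFactor) {
        double decoded = scaled / scalingFactor;
        long reencoded = encode(decoded, scalingFactor);
        if (reencoded > scaled) {
            decoded = Math.nextDown(decoded); // rounds too high: nudge down
        } else if (reencoded < scaled) {
            decoded = Math.nextUp(decoded); // rounds too low: nudge up
        }
        return decoded;
    }
}
```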
Closes #88854
text_similarity is a cross-encoding task that compares two text inputs at inference time.
It can be used for cross-encoding re-ranking:
```
POST _ml/trained_models/cross-encoder__ms-marco-tinybert-l-2-v2/_infer
{
  "docs": [
    { "text_field": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers." },
    { "text_field": "New York City is famous for the Metropolitan Museum of Art." }
  ],
  "inference_config": {
    "text_similarity": {
      "text": "How many people live in Berlin?"
    }
  }
}
```
With results:
```
{
  "inference_results": [
    { "predicted_value": 7.235751628875732 },
    { "predicted_value": -11.562295913696289 }
  ]
}
```
It can also be used for raw text similarity. Here is an example that checks whether pairs of questions are similar:
```
POST _ml/trained_models/cross-encoder__quora-distilroberta-base/_infer
{
  "docs": [
    { "text_field": "what is your quest?" },
    { "text_field": "what is your favorite color?" },
    { "text_field": "is the swallow african or european?" },
    { "text_field": "what is the airspeed velocity of a swallow carrying coconuts?" },
    { "text_field": "how fast is an unladen swallow?" }
  ],
  "inference_config": {
    "text_similarity": {
      "text": "what is the airspeed velocity of an unladen swallow?"
    }
  }
}
```
With results:
```
{
  "inference_results": [
    { "predicted_value": -8.312414169311523 },
    { "predicted_value": -8.239330291748047 },
    { "predicted_value": -8.256011009216309 },
    { "predicted_value": -4.1945390701293945 },
    { "predicted_value": -3.294121742248535 }
  ]
}
```
This PR adds a new API route to support bulk updates of API keys:
`POST _security/api_key/_bulk_update`
The route takes a list of IDs (`ids`) of API keys to update, along
with the same request parameters as the single operation route:
- `role_descriptors` - The list of role descriptors specified for the
key. This is one of the two parts that determines an API key’s
privileges.
- `metadata_flattened` - The searchable metadata associated
with an API key
Analogously to the single operation route, a call to `_bulk_update`
automatically updates the `limited_by_role_descriptors`, `creator`, and
`version` fields for each API key.
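For illustration, a bulk update could look like this (IDs and role descriptor purely hypothetical):
```
POST /_security/api_key/_bulk_update
{
  "ids": ["VuaCfGcBCdbkQm-e5aOx", "H3_AhoIBA9hmeQJdg7ij"],
  "role_descriptors": {
    "role-a": {
      "indices": [
        { "names": ["index-a*"], "privileges": ["read"] }
      ]
    }
  }
}
```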
The implementation ports the single API key update operation to use the
new bulk functionality under the hood, translating as necessary at the
transport layer.
Relates: #88758
Plugin APIs are defined by a set of interfaces from server. Many of
these APIs are actually implementation details of the system. As we move
these implementation details to use different hook mechanisms so that
internals are only implementable by builtin components, the existing
plugin APIs need to be deprecated. Java provides a means to indicate
deprecation: the `@Deprecated` annotation. But that annotation
is only seen when compiling a plugin that implements deprecated hooks, and
even then only if deprecation warnings are not disabled.
This commit adds an introspection step to plugin initialization that
inspects each loaded plugin and looks for any APIs marked with the
`@Deprecated` annotation that are overridden by the plugin. If any are
found, deprecation messages are emitted to the deprecation log.
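A minimal sketch of the introspection idea (reflection-based and purely illustrative; the actual implementation may differ):
```
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class DeprecationIntrospector {
    // Report every @Deprecated interface method that the plugin class overrides.
    static List<String> deprecatedOverrides(Class<?> pluginClass) {
        List<String> found = new ArrayList<>();
        for (Class<?> iface : pluginClass.getInterfaces()) {
            for (Method m : iface.getMethods()) {
                if (m.isAnnotationPresent(Deprecated.class)) {
                    try {
                        // succeeds only if the plugin itself declares the method
                        pluginClass.getDeclaredMethod(m.getName(), m.getParameterTypes());
                        found.add(iface.getSimpleName() + "#" + m.getName());
                    } catch (NoSuchMethodException e) {
                        // not overridden by the plugin; nothing to report
                    }
                }
            }
        }
        return found;
    }
}
```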
DiscoveryPlugin allows extending getJoinValidator and
getElectionStrategies. These are implementation details of the system.
This commit deprecates these methods so that plugin authors are
discouraged from overriding them.
Network plugins provide network implementations. In the past this has
been used for alternatives to netty based networking, using the JDK's
nio. However, nio has now been removed, and it is inadvisable for a
plugin to implement this low level part of the system.
Therefore, this commit marks the NetworkPlugin interface as deprecated.
When handling Unicode accents, BERT tokenization could occasionally remove the wrong characters. This would produce exceptionally strange results and possibly an error.
Closes #88900
Adds metadata classes for Reindex and UpdateByQuery contexts.
For Reindex metadata:
* _index can't be null
* _id, _routing and _version are writable and nullable
* _now is read-only
* op is read-write and must be 'noop', 'index' or 'delete'
Reindex metadata keeps the original values of `_index`, `_id`, `_routing` and `_version`
so that `Reindexer` can see if they've changed.
If `_version` is null in the ctx map, or, equivalently, if the script called the
`setVersionToInternal()` augmentation, `Reindexer` sets document versioning
to internal, and `getVersion` returns `Long.MIN_VALUE`.
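For illustration, a reindex script can drive these metadata fields through the ctx map like this (index and field names hypothetical):
```
POST _reindex
{
  "source": { "index": "src" },
  "dest": { "index": "dst" },
  "script": {
    "source": "if (ctx._source.status == 'obsolete') { ctx.op = 'noop' }"
  }
}
```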
For UpdateByQuery metadata:
* _index, _id, _version, _routing are all read-only
* _routing is also nullable
* _now is read-only
* op is read-write and one of 'index', 'noop', 'delete'
Closes: #86472
This change adds an operation parameter to FieldDataContext that allows us to specialize the field data returned from fielddataBuilder in MappedFieldType. Keyword, integer, and geo point field types now support source fallback, where we build a doc values wrapper using source if doc values don't exist for the field under the SCRIPT operation. This allows us to have source fallback in scripting for the scripting fields API.
Adds some docs giving more detailed background about what data
corruption really means and some suggestions about how to narrow down
the root cause.
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
CtxMap delegates all metadata keys to its `Metadata` container and
all other keys to its source map. In most write contexts (update,
update by query, reindex), the source map should contain only one
key, `_source`, whose value is a `Map<String, Object>`.
This change adds validation of writes to the source map, rejecting
insertion of invalid keys, removal of the `_source` key, and
overwriting of the `_source` mapping with the wrong type.
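A sketch of the kind of check this adds (illustrative; not the actual CtxMap code):
```
import java.util.Map;

class SourceMapValidation {
    // Writes to the source map must keep a single `_source` key whose value is a Map.
    static void validatePut(String key, Object value) {
        if ("_source".equals(key) == false) {
            throw new IllegalArgumentException("unexpected key [" + key + "] in source map");
        }
        if (value instanceof Map == false) {
            throw new IllegalArgumentException("_source must be a Map, not [" + value + "]");
        }
    }
}
```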