ReindexDataStreamIndexAction.cleanupCluster called EsIntegTestCase.cleanupCluster but did not override it. As a result, EsIntegTestCase.cleanupCluster ran twice: once via the explicit call in ReindexDataStreamIndexAction.cleanupCluster and once when JUnit invoked the @After-annotated method on EsIntegTestCase.
(cherry picked from commit 89ba03ecff)
# Conflicts:
# muted-tests.yml
To avoid having AggregateMapper find aggregators by name via reflection, I'm making some changes:
- Make the suppliers have methods returning the intermediate states
- To allow this, the suppliers' constructors won't receive the channels as params. Instead, their methods will ask for them (see the sketch below)
- Most changes in this PR are because of this
- After those changes, I'm leaving AggregateMapper in place, as it still converts AggregateFunctions to their NamedExpressions
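A minimal sketch of the resulting supplier shape, with placeholder types standing in for the real ESQL interfaces (the actual method names may differ):
```
import java.util.List;

// Placeholder types standing in for the real ESQL interfaces.
record IntermediateStateDesc(String name, String type) {}
interface AggregatorFunction {}
interface GroupingAggregatorFunction {}
interface DriverContext {}

interface AggregatorFunctionSupplier {
    // Intermediate states are exposed directly, instead of being discovered
    // through reflection on aggregator class names.
    List<IntermediateStateDesc> nonGroupingIntermediateStateDesc();
    List<IntermediateStateDesc> groupingIntermediateStateDesc();

    // Channels are no longer constructor arguments; each factory method
    // receives them when the aggregator is actually created.
    AggregatorFunction aggregator(DriverContext context, List<Integer> channels);
    GroupingAggregatorFunction groupingAggregator(DriverContext context, List<Integer> channels);
}
```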
(cherry picked from commit 7bea3a5610)
# Conflicts:
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/AggregateMapper.java
* Fix privileges for system index migration WRITE block (#121327)
This PR removes a potential cause of data loss when migrating system indices. It does this by changing the way we set a "write-block" on the system index to migrate - now using a dedicated transport request rather than a settings update. Furthermore, we no longer delete the write-block prior to deleting the index, as this was another source of potential data loss. Additionally, we now remove the block if the migration fails.
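Conceptually, the change swaps a settings update for the dedicated add-block API; a rough sketch, with client/builder names that may vary across ES versions:
```
import org.elasticsearch.client.internal.Client;
import org.elasticsearch.cluster.metadata.IndexMetadata;

class WriteBlockSketch {
    // Before: a settings update ("index.blocks.write": true), which gave no
    // confirmation that the block actually took effect on every shard.
    // After: the dedicated add-block request, which waits for the block to
    // be applied before returning, closing the data-loss window.
    static void addWriteBlock(Client client, String index) {
        client.admin().indices()
            .prepareAddBlock(IndexMetadata.APIBlock.WRITE, index)
            .get();
    }
}
```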
* Update release notes
* Delete docs/changelog/122214.yaml
* Adding condition to verify if the field belongs to an index
* Update docs/changelog/121720.yaml
* Remove unnecessary comma from yaml file
* remove duplicate inference endpoint creation
* updating isMetadata to return true if mapper has the correct type
* remove unnecessary index creation in yaml tests
* Adding check if the document has returned in the yaml test
* Updating test to skip time series check if index mode is standard
* Refactor tests to verify every metafield with all index modes
* refactoring test to verify for all cases
* Adding assertFalse if not time_series and fields are from time_series
* Updating test descriptions to read better
This commit removes "TLSv1.1" from the list of default protocols in
Elasticsearch (starting with ES9.0).
TLSv1.1 has been deprecated by the IETF since March 2021.
This affects a variety of TLS contexts, including:
- The HTTP server (REST API)
- The transport protocol (including CCS and CCR)
- Outgoing connections for features that have configurable SSL
  settings, including:
- reindex
- watcher
- security realms (SAML, OIDC, LDAP, etc)
- monitoring exporters
- inference services
In practice, however, TLSv1.1 has been disabled in most Elasticsearch
deployments since around 7.12, because most JDK releases have disabled
TLSv1.1 by default since April 2021.
That is, if you run a default installation of Elasticsearch (for any
currently supported version of ES) that uses the bundled JVM then
TLSv1.1 is already disabled.
And, since ES9+ requires JDK21+, all supported JDKs ship with TLSv1.1
disabled by default.
In addition, incoming HTTP connections to Elastic Cloud deployments
have required TLSv1.2 or higher since April 2020.
This change simply makes it clear that Elasticsearch does not
attempt to enable TLSv1.1 and administrators who wish to use that
protocol will need to explicitly enable it in both the JVM and in
Elasticsearch.
Resolves: #108057
The downsample task sometimes needs a little bit longer to complete so
we bump the timeout from 60s to 120s.
Fixes #122056
(cherry picked from commit 0ec2fe05ef)
# Conflicts:
# muted-tests.yml
When a node is shutting down, scheduling tasks for the Driver can result
in a rejection exception. In this case, we drain and close all
operators. However, we don't clear the pending tasks in the scheduler,
which can lead to a pending task being triggered unexpectedly, causing a
ConcurrentModificationException.
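A simplified model of the failure mode and the fix (not the actual Driver code): the rejection handler must clear the queued tasks in addition to closing the operators:
```
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

class DriverSchedulerSketch {
    private final Queue<Runnable> pendingTasks = new ConcurrentLinkedQueue<>();

    void schedule(Executor executor, Runnable task) {
        pendingTasks.add(task);
        try {
            executor.execute(() -> {
                if (pendingTasks.remove(task)) {
                    task.run();
                }
            });
        } catch (RejectedExecutionException e) {
            // Node is shutting down: drain and close the operators, and also
            // clear the pending tasks so a stale task cannot fire later while
            // the operator list is being torn down.
            pendingTasks.clear();
            drainAndCloseOperators();
        }
    }

    private void drainAndCloseOperators() {
        // release operator resources
    }
}
```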
* [Deprecation API] Adjust details in the SourceFieldMapper deprecation warning (#122041)
In this PR we improve the deprecation warning about configuring source
in the mapping.
- We reduce the size of the warning message so it looks better in kibana.
- We keep the original message in the details.
- We use an alias help url, so we can associate it with the guide when it's created.
* Remove bwc code
The aggs timeout test waits for the agg to return and then double checks
that the agg is stopped using the tasks API. We're seeing some failures
where the tasks API reports that the agg is still running. I can't
reproduce them because computers. This adds two things:
1. Logs the hot_threads so we can see if the query is indeed still
running.
2. Retries the _tasks API for a minute. If it goes away soon after the
_search returns that's *fine*. If it sticks around for more than a
few seconds then the cancel isn't working. We wait for a minute
because CI can't be trusted to do anything quickly.
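Roughly what the retry looks like in ES integration-test style; `assertBusy` and `client()` come from the test framework, `empty()` from Hamcrest, and the task action filter here is illustrative:
```
// Inside an ESIntegTestCase subclass.
private void assertAggTaskFinishes() throws Exception {
    assertBusy(() -> {
        var tasks = client().admin().cluster().prepareListTasks()
            .setActions("indices:data/read/search*")
            .get()
            .getTasks();
        assertThat(tasks, empty());
    }, 1, TimeUnit.MINUTES);
}
```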
Closes #121993
It is possible to create an index in 7.x with a single type. This fixes the CreateIndexFromSourceAction to not copy that type over when creating a destination index from a source index with a type.
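A hedged sketch of the normalization, assuming the source mapping arrives as a parsed `Map<String, Object>` (the real fix lives in CreateIndexFromSourceAction):
```
import java.util.Map;

class MappingTypeSketch {
    // A 7.x mapping with a type looks like {"my_type": {"properties": ...}}
    // rather than {"properties": ...}; unwrap the type layer before reuse.
    @SuppressWarnings("unchecked")
    static Map<String, Object> stripTypeWrapper(Map<String, Object> mappings) {
        if (mappings.size() == 1) {
            String key = mappings.keySet().iterator().next();
            if (key.equals("properties") == false) {
                return (Map<String, Object>) mappings.get(key);
            }
        }
        return mappings;
    }
}
```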
Since introducing the fail_fast (see #117410) option to remote sinks,
the ExchangeSource can propagate failures that can lead to circular
references. The issue occurs as follows:
1. remote-sink-1 fails with exception e1, and the failure collector collects e1.
2. remote-sink-2 fails with exception e2, and the failure collector collects e2.
3. The listener of remote-sink-2 propagates e2 before the listener of
remote-sink-1 propagates e1.
4. The failure collector in ExchangeSource sees [e1, e2] and adds e2 to e1
as a suppressed exception. The upstream sees [e2, e1] and adds e1 to e2 as
suppressed, creating a circular reference between the two exceptions.
With this change, we stop collecting failures in ExchangeSource.
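A standalone demonstration of the cycle; Java's own `printStackTrace` detects it, but any naive recursive walk of the suppressed chain (such as serializing the exception tree) never terminates:
```
public class CircularSuppression {
    public static void main(String[] args) {
        Exception e1 = new RuntimeException("remote-sink-1 failed");
        Exception e2 = new RuntimeException("remote-sink-2 failed");
        e1.addSuppressed(e2); // one failure collector picks e1 as primary
        e2.addSuppressed(e1); // the other picks e2 as primary
        // printStackTrace copes, printing "[CIRCULAR REFERENCE: ...]", but a
        // naive recursive traversal of the suppressed chain never terminates.
        e1.printStackTrace();
    }
}
```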
Labelled this as a non-issue since the bug was never in a release.
Relates #117410
This updates the kibana signature json files in two ways:
* Renames `eval` to `scalar` - that's the name we use inside of ESQL and
we may as well make the names match.
* Calls the `CATEGORIZE` and `BUCKET` functions `grouping` because they
can only be used in the "grouping" positions of the `STATS` command.
Closes #113411
* Add 9.0 patch transport version constants #121985
Transport version changes must be unique per branch. Some transport
version changes meant for 9.0 are missing unique backport constants.
This is a backport of #121985, adding unique transport version patch
numbers for each change intended for 9.0.
* match constant naming in main
When we are already parsing events, we can receive errors as the next
event.
OpenAI formats these as:
```
event: error
data: <payload>
```
Elastic formats these as:
```
data: <payload>
```
Unified will consolidate them into the new error structure.
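A sketch of how a parser can distinguish the two shapes while consuming SSE lines; the class and callback names here are illustrative, not the actual inference code:
```
import java.util.List;
import java.util.function.Consumer;

final class SseErrorDetector {
    static void parse(List<String> lines, Consumer<String> onData, Consumer<String> onError) {
        String currentEvent = null;
        for (String line : lines) {
            if (line.startsWith("event: ")) {
                currentEvent = line.substring("event: ".length()).trim();
            } else if (line.startsWith("data: ")) {
                String payload = line.substring("data: ".length());
                if ("error".equals(currentEvent)) {
                    onError.accept(payload); // OpenAI style: flagged by "event: error"
                } else {
                    onData.accept(payload);  // Elastic style: error lives inside the payload
                }
                currentEvent = null;
            }
        }
    }
}
```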
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
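A sketch of the shape of that check; names are approximate (ES throws a TaskCancelledException) and the real cancellation flag lives in the search context:
```
import java.util.function.BooleanSupplier;

final class CancellationCheck {
    private final BooleanSupplier cancelled; // volatile read under the hood

    CancellationCheck(BooleanSupplier cancelled) {
        this.cancelled = cancelled;
    }

    // Called once per sub-aggregation build, where deeply nested result
    // trees spend most of their construction time.
    void check() {
        if (cancelled.getAsBoolean()) {
            throw new RuntimeException("task cancelled"); // ES: TaskCancelledException
        }
    }
}
```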
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Nik Everett <nik9000@gmail.com>
Today, the exchange buffer of an exchange source is finished in two
cases: (1) when the downstream pipeline has received enough data and (2)
when all remote sinks have completed. In the first case, outstanding
pages could be safely discarded. In the second case, no new pages should
be received after finishing. In both scenarios, discarding all
outstanding pages was safe if noMoreInputs was switched while adding
pages.
However, with the stop API, the buffer may now finish while keeping
outstanding pages, and new pages may still be received. This change
updates the exchange buffer to discard only the incoming page when
noMoreInputs is switched, rather than all pages in the buffer.
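A simplified model of the buffer behavior after this change (the real ExchangeBuffer also tracks sizes and completion listeners):
```
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class ExchangeBufferSketch<P> {
    private final Queue<P> pages = new ConcurrentLinkedQueue<>();
    private volatile boolean noMoreInputs;

    // Returns false when the page was rejected; the caller releases it.
    boolean addPage(P page) {
        if (noMoreInputs) {
            // Discard only the incoming page; pages already buffered stay
            // readable, which matters when the stop API finished the buffer.
            return false;
        }
        pages.add(page);
        return true;
    }

    void finish(boolean drainBuffered) {
        noMoreInputs = true;
        if (drainBuffered) { // only when the downstream truly has enough data
            pages.clear();
        }
    }
}
```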
Closes #120757
* [ML] Support revoking inference default endpoint authorization (#121326)
* Starting revoke
* Adding integration tests
* More integration tests
* Adding test for deleting default inference endpoint via rest call
* Removing task type any
* Addressing feedback and adding test
* Fixing tests
The node environment has many paths. The accessors for these currently
use a "file" suffix, but they are always directories. This commit
renames the accessors to make it clear these paths are directories.
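An illustrative before/after of the rename; the actual accessor names in NodeEnvironment may differ:
```
import java.nio.file.Path;

// Before: "file" suffix on accessors that always return directories.
interface EnvironmentBefore {
    Path configFile();
    Path dataFile();
}

// After: "dir" suffix makes the contract obvious at the call site.
interface EnvironmentAfter {
    Path configDir();
    Path dataDir();
}
```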
This adds a `task_description` field to `profile` output and task
`status`. This looks like:
```
...
"profile" : {
  "drivers" : [
    {
      "task_description" : "final",
      "start_millis" : 1738768795349,
      "stop_millis" : 1738768795405,
      ...
      "task_description" : "node_reduce",
      "start_millis" : 1738768795392,
      "stop_millis" : 1738768795406,
      ...
      "task_description" : "data",
      "start_millis" : 1738768795391,
      "stop_millis" : 1738768795404,
      ...
```
Previously you had to look at the signature of the operators in the
driver to figure out what the driver is *doing*. You had to know enough
about how ESQL works to guess. Now you can look at this description to
see what the server *thinks* it is doing. No more manual classification.
This will be useful when debugging failures and performance regressions
because it is much easier to use `jq` to group on it:
```
| jq '.profile[] | group_by(.task_description)[]'
```
Fix a bug in TOP which surfaces when merging results from ordinals. We
weren't always accounting for oversized arrays when checking if we'd
ever seen a field. This changes the oversize itself to always size on a bucket boundary.
The test for this required a random `bucketSize` - without that the
oversizing frequently wouldn't cause trouble.
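A sketch of what sizing on a bucket boundary means; ES's BigArrays does the real oversizing, this only shows the rounding that keeps a grown array from ending mid-bucket:
```
final class OversizeSketch {
    // Grow like ArrayUtil.oversize (~+12.5%), then round the result up to a
    // whole number of buckets so a "have we seen this bucket" check never
    // lands in a region that only partially exists.
    static long overSizeToBucketBoundary(long minSize, int bucketSize) {
        long oversized = minSize + (minSize >>> 3);
        return ((oversized + bucketSize - 1) / bucketSize) * bucketSize;
    }
}
```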
Unified Chat Completion error responses now forward code, type, and
param in the response payload. `reason` has been renamed to
`message`.
Notes:
- `XContentFormattedException` is a `ChunkedToXContent` so that the REST listener can call `toXContentChunked` to format the output structure. By default, the structure forwards to our existing ES exception structure.
- `UnifiedChatCompletionException` will override the structure to match the new unified format.
- The Rest, Transport, and Stream handlers all check the exception to verify it is a UnifiedChatCompletionException.
- OpenAI response handler now reads all the fields in the error message and forwards them to the user.
- In the event that a `Throwable` is an `Error`, we rethrow it on another thread so the JVM can catch and handle it (see the sketch below). We also stop surfacing the JVM details to the user in the error message (but it's still logged for debugging purposes).
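A minimal sketch of that `Error` handling; ES has a similar helper (`ExceptionsHelper.maybeDieOnAnotherThread`), but this standalone version shows the idea:
```
final class ErrorRethrow {
    static void maybeRethrowOffThread(Throwable t) {
        if (t instanceof Error error) {
            // Rethrow on a fresh thread so the JVM's uncaught-exception
            // handling sees it; the calling thread must not swallow a
            // JVM-level failure such as an OutOfMemoryError.
            new Thread(() -> { throw error; }, "rethrow-error").start();
        }
    }
}
```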
A future.actionGet call was missing from the delete pipeline action execution in the test cleanup, causing all tests to fail intermittently. Also replaced actionGet with safeGet.
(cherry picked from commit 0f6b80a98f)
# Conflicts:
# muted-tests.yml
Backports #114496 to 9.0
> Failure handling for snapshots was made stricter in #107191 (8.15), so this
field is always empty since then. Clients don't need to check it anymore for
failure handling, we can remove it from API responses in 9.0
Add the pipeline "reindex-data-stream-pipeline" to the reindex request within ReindexDataStreamIndexAction. This cleans up documents as needed before inserting into the destination index. Currently, the pipeline only sets a timestamp field with a value of 0, if the document is missing a timestamp field. This is needed because existing indices which are added to a data stream may not contain a timestamp, but reindex validates that a timestamp field exists when creating data stream destination indices.
This pipeline is managed by ES, but can be overridden by users if necessary. To do this, the version field of the pipeline should be set to a value higher than the MigrateRegistry version.
Some areas of the code call this field type
AggregateDoubleMetric and others AggregateMetricDouble, but the docs
use aggregate_metric_double, so for consistency this commit refactors
the former into the latter.
If the query hits the failing index first, we will cancel the request,
preventing exchange-sink requests and data-node requests from reaching
another data node. As a result, exchange sinks could linger for up to 30
seconds.
* Fix inference update API calls with task_type in body or deployment_id defined
* Update docs/changelog/121231.yaml
* Fixing test
* Reuse existing deployment ID retrieval logic
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This refactoring was motivated by the following issues with the current
state of the code:
- The `TransformDeprecationChecker` is listed as a plugin checker, but later we remove it from the `plugin_settings` and add it to the `cluster_settings`. This made me consider that the checker might be dealing with transform deprecation warnings, but if they are listed under the `cluster_settings`, it fits better to be part of `ClusterDeprecationChecker`.
- The `DeprecationInfo` is a data class, but it has a method `from` which constructs a `DeprecationInfo.Response` instance. However, this is not a simple factory: it actually runs all the checks and also tries to assert that it is not executed on a transport thread. Considering this, I thought it might fit better in the `TransportDeprecationInfoAction`; this way all the logic is in one place and all the checkers are wired and used in the same class.
- Constructing the node settings deprecation issues requires merging the deprecation warnings of the individual nodes. We considered bringing together the execution of the remote request and the construction of the response in a new class called `NodeDeprecationChecker` that resembles the patterns of the other checker classes.
- Reinstated the `PLUGIN_CHECKERS` even though we have only one check, so other developers can more easily add their plugin checks.
- Finally, we noticed that the way we synthesise the remote requests is difficult to read and maintain because each call is nested under the previous one. We propose in this PR a different pattern that uses the `RefCountingListener` to combine the different remote calls and store their results in a container class named `PrecomputedData` (see the sketch after this list).
- **Bonus**: Removed the `LegacyIndexTemplateDeprecationChecker.java` which was not used.
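A hedged sketch of that `RefCountingListener` pattern; the fetch methods and the `PrecomputedData` setters are placeholders for the real remote calls:
```
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.support.RefCountingListener;

class DeprecationFanOutSketch {
    void precompute(ActionListener<Void> done, PrecomputedData data) {
        try (var refs = new RefCountingListener(done)) {
            fetchNodeIssues(refs.acquire(data::setNodeIssues));
            fetchTransformConfigs(refs.acquire(data::setTransformConfigs));
            // "done" completes once every acquired listener does, replacing
            // the old style where each remote call nested in the previous
            // callback.
        }
    }

    // Placeholders for the real remote calls and container class.
    void fetchNodeIssues(ActionListener<NodeIssues> listener) {}
    void fetchTransformConfigs(ActionListener<TransformConfigs> listener) {}
    record NodeIssues() {}
    record TransformConfigs() {}
    interface PrecomputedData {
        void setNodeIssues(NodeIssues issues);
        void setTransformConfigs(TransformConfigs configs);
    }
}
```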