ReindexDataStreamIndexAction.cleanupCluster called EsIntegTestCase.cleanupCluster, but did not override it. This caused EsIntegTestCase.cleanupCluster to be called twice: once from ReindexDataStreamIndexAction.cleanupCluster and once when the @After-annotated method on EsIntegTestCase was run.
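A minimal sketch of the double-invocation pattern, using hypothetical class names rather than the actual test classes:
```java
import org.junit.After;

// Hypothetical reproduction of the pattern, not the actual test classes.
class BaseIntegTestCase {
    @After
    public void cleanupCluster() {
        // base cleanup; JUnit runs every inherited @After method automatically
    }
}

class ReindexTests extends BaseIntegTestCase {
    // Different method name, so this does NOT override cleanupCluster().
    // JUnit runs this method and then the inherited cleanupCluster(),
    // so the explicit call below makes the base cleanup run twice.
    @After
    public void cleanupClusterAndIndices() {
        super.cleanupCluster();
        // additional cleanup ...
    }
}
```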
(cherry picked from commit 89ba03ecff)
# Conflicts:
# muted-tests.yml
To avoid having AggregateMapper find aggregators based on their names via reflection, I'm making some changes:
- Make the suppliers have methods returning the intermediate states
- To allow this, the suppliers' constructors won't receive the channels as params. Instead, their methods will ask for them (see the sketch after this list)
- Most changes in this PR are because of this
- After those changes, I'm leaving AggregateMapper in place, as it still converts AggregateFunctions to their NamedExpressions
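A rough, hypothetical sketch of the shape of that change (names are illustrative, not the actual ESQL interfaces):
```java
import java.util.List;

// Illustrative placeholders only, not the real ESQL types.
interface Aggregator {}
record IntermediateStateDesc(String name, String type) {}

// Before: the channels were handed to the supplier's constructor, and the
// intermediate states had to be discovered reflectively from the name.
interface OldAggregatorFunctionSupplier {
    Aggregator aggregator(); // channels fixed at construction time
}

// After: the supplier holds no channels; its methods take them as arguments
// and it can describe its intermediate states directly.
interface NewAggregatorFunctionSupplier {
    Aggregator aggregator(List<Integer> channels);
    List<IntermediateStateDesc> intermediateStateDesc();
}
```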
(cherry picked from commit 7bea3a5610)
# Conflicts:
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/AggregateMapper.java
* Fix privileges for system index migration WRITE block (#121327)
This PR removes a potential cause of data loss when migrating system indices. It does this by changing the way we set a "write-block" on the system index to migrate - now using a dedicated transport request rather than a settings update. Furthermore, we no longer delete the write-block prior to deleting the index, as this was another source of potential data loss. Additionally, we now remove the block if the migration fails.
* Update release notes
* Delete docs/changelog/122214.yaml
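For illustration of the write-block change above, a write block can be added through a dedicated request rather than an index settings update along these lines (a sketch using the add-index-block API; the exact request used for system index migration may differ):
```java
import org.elasticsearch.client.internal.Client;
import org.elasticsearch.cluster.metadata.IndexMetadata;

// Sketch: set a write block via the dedicated add-index-block action
// instead of updating "index.blocks.write" through a settings update.
void addWriteBlock(Client client, String index) {
    client.admin().indices()
        .prepareAddBlock(IndexMetadata.APIBlock.WRITE, index)
        .get();
}
```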
Two of the timeout tests have been muted for several months. The reason is that we tightened the assertions to cover partial results being returned, but there were edge cases in which partial results were not actually returned.
The timeout used in the test was time dependent, hence precisely when the timeout will be thrown is unpredictable: we have timeout checks in different places in the codebase, when iterating through the leaves, before scoring any document, and while scoring documents. The edge case that caused failures is a typical timing issue where the initial timeout check in CancellableBulkScorer already triggers the timeout, before any document has been collected.
I made several adjustments to the test to make it more robust:
- use indexRandom to index documents, which speeds it up
- share indexing across test methods, so that it happens once at the suite level
- replace the custom query that triggers a timeout: instead of a script query, use a Lucene query that is not time dependent and throws a time-exceeded exception precisely where we expect it, so that we can test how the system reacts to that. This lets us verify that partial results are always returned when a timeout happens while scoring documents, and that partial results are never returned when a timeout happens before we even started to score documents.
Closes #98369
Closes #98053
Improve LuceneSyntheticSourceChangesSnapshot by using a sequential stored field reader when the doc ids are dense. This is done by computing the doc ids for which recovery source needs to be synthesized. If the requested doc ids are dense and monotonically increasing, a sequential stored field reader is used, which provides recovery source for many documents without repeatedly decompressing the same block of stored fields.
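Schematically, the dense/monotonic check could look like this (a hypothetical helper, not the actual implementation):
```java
// Hypothetical helper: doc ids qualify for a sequential stored field reader
// when they are strictly increasing with no gaps (dense and monotonic).
static boolean useSequentialReader(int[] docIds) {
    for (int i = 1; i < docIds.length; i++) {
        if (docIds[i] != docIds[i - 1] + 1) {
            return false;
        }
    }
    return docIds.length > 0;
}
```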
* Adding condition to verify if the field belongs to an index
* Update docs/changelog/121720.yaml
* Remove unnecessary comma from yaml file
* remove duplicate inference endpoint creation
* updating isMetadata to return true if mapper has the correct type
* remove unnecessary index creation in yaml tests
* Adding a check that the document was returned in the yaml test
* Updating test to skip time series check if index mode is standard
* Refactor tests to verify every metafield with all index modes
* refactoring test to verify all cases
* Adding assertFalse if not time_series and fields are from time_series
* updating test texts to have better descriptions
This attempts to fix a flaky test where the term_freq returned
by the multiple terms vectors API was `null`.
I was not able to reproduce this failure, but this proposes a fix
based on the following running theory:
- an Elasticsearch cluster comprised of at least 2 nodes
- we create a couple of indices with 1 primary and 1 replica
- we index a document that was acknowledged only by the primary
(because `wait_for_active_shards` defaults to `1`)
- the test executes the multiple terms vectors API and it hits the
node hosting the replica shard, which hasn't yet received the
document we ingested in the primary shard.
This race condition between the document replication and the test
running the terms vectors API on the replica shard could yield
a `null` value for the term's `term_freq` (as the replica shard
contains 0 documents).
This PR proposes we change the `wait_for_active_shards` value to
`all` so each write is acknowledged by all replicas before the client
receives the response.
(cherry picked from commit a148fa2828)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This commit removes "TLSv1.1" from the list of default protocols in
Elasticsearch (starting with ES9.0).
TLSv1.1 has been deprecated by the IETF since March 2021.
This affects a variety of TLS contexts, including:
- The HTTP Server (Rest API)
- Transport protocol (including CCS and CCR)
- Outgoing connections for features that have configurable SSL
settings. This includes
- reindex
- watcher
- security realms (SAML, OIDC, LDAP, etc)
- monitoring exporters
- inference services
In practice, however, TLSv1.1 has been disabled in most Elasticsearch
deployments since around 7.12 because most JDK releases have disabled
TLSv1.1 (by default) starting in April 2021
That is, if you run a default installation of Elasticsearch (for any
currently supported version of ES) that uses the bundled JVM then
TLSv1.1 is already disabled.
And, since ES9+ requires JDK21+, all supported JDKs ship with TLSv1.1
disabled by default.
In addition, incoming HTTP connections to Elastic Cloud deployments
have required TLSv1.2 or higher since April 2020
This change simply makes it clear that Elasticsearch does not
attempt to enable TLSv1.1 and administrators who wish to use that
protocol will need to explicitly enable it in both the JVM and in
Elasticsearch.
Resolves: #108057
The downsample task sometimes needs a little bit longer to complete so
we bump the timeout from 60s to 120s.
Fixes #122056
(cherry picked from commit 0ec2fe05ef)
# Conflicts:
# muted-tests.yml
When a node is shutting down, scheduling tasks for the Driver can result
in a rejection exception. In this case, we drain and close all
operators. However, we don't clear the pending tasks in the scheduler,
which can lead to a pending task being triggered unexpectedly, causing a
ConcurrentModificationException.
* [Deprecation API] Adjust details in the SourceFieldMapper deprecation warning (#122041)
In this PR we improve the deprecation warning about configuring source
in the mapping.
- We reduce the size of the warning message so it looks better in kibana.
- We keep the original message in the details.
- We use an alias help url, so we can associate it with the guide when it's created.
* Remove bwc code
Like the plugin being tested, the entitled test plugin needs access to
dynamic elements (namely, file paths). This commit dynamically generates
the entitlement policy for the entitled test plugin when it is
installed. It also adds use of the file entitlement as an example.
The only real path separators are either forward or back slash. Trying
to use something else, like a newline, fails to even parse as a path on
Windows. This commit removes testing of other separators.
closes #121872
The aggs timeout test waits for the agg to return and then double checks
that the agg is stopped using the tasks API. We're seeing some failures
where the tasks API reports that the agg is still running. I can't
reproduce them because computers. This adds two things:
1. Logs the hot_threads so we can see if the query is indeed still
running.
2. Retries the _tasks API for a minute. If it goes away soon after the
_search returns that's *fine*. If it sticks around for more than a
few seconds then the cancel isn't working. We wait for a minute
because CI can't be trusted to do anything quickly.
Closes #121993
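A rough sketch of the retry in point 2 above, assuming ESTestCase's assertBusy and a hypothetical helper for fetching the search tasks:
```java
import java.util.concurrent.TimeUnit;

// Sketch: poll the tasks API for up to a minute until the cancelled search
// task disappears, instead of asserting on a single snapshot of the tasks.
private void assertSearchTaskGone() throws Exception {
    assertBusy(() -> assertTrue(
        "the search task should have been cancelled by now",
        fetchSearchTasks().isEmpty() // fetchSearchTasks() is a hypothetical helper
    ), 1, TimeUnit.MINUTES);
}
```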
We shouldn't run the post-snapshot-delete cleanup work on the master
thread, since it can be quite expensive and need not block subsequent
cluster state updates. This commit forks it onto a `SNAPSHOT` thread.
It is possible to create an index in 7.x with a single type. This fixes the CreateIndexFromSourceAction to not copy that type over when creating a destination index from a source index with a type.
Since introducing the fail_fast (see #117410) option to remote sinks,
the ExchangeSource can propagate failures that can lead to circular
references. The issue occurs as follows:
1. remote-sink-1 fails with exception e1, and the failure collector collects e1.
2. remote-sink-2 fails with exception e2, and the failure collector collects e2.
3. The listener of remote-sink-2 propagates e2 before the listener of
remote-sink-1 propagates e1.
4. The failure collector in ExchangeSource sees [e1, e2] and adds e2 as a
suppressed exception of e1. The upstream collector sees [e2, e1] and adds
e1 as a suppressed exception of e2, creating a circular reference.
With this change, we stop collecting failures in ExchangeSource.
Labelled this a non-issue since the bug is unreleased.
Relates #117410
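The cycle can be shown with plain exceptions: each collector treats the other's exception as the primary failure and suppresses the rest.
```java
// Plain-Java illustration of the cycle described above: the two collectors
// see the same pair of exceptions in opposite order.
Exception e1 = new RuntimeException("remote-sink-1 failed");
Exception e2 = new RuntimeException("remote-sink-2 failed");

e1.addSuppressed(e2); // ExchangeSource collector: e1 is primary, e2 suppressed
e2.addSuppressed(e1); // upstream collector: e2 is primary, e1 suppressed

// e1 and e2 now reference each other through their suppressed lists,
// which is the circular reference the change avoids.
```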
This updates the kibana signature json files in two ways:
* Renames `eval` to `scalar` - that's the name we use inside of ESQL and
we may as well make the name the same.
* Calls the `CATEGORIZE` and `BUCKET` functions `grouping` because they
can only be used in the "grouping" positions of the `STATS` command.
Closes #113411
* Add 9.0 patch transport version constants #121985
Transport version changes must be unique per branch. Some transport
version changes meant for 9.0 are missing unique backport constants.
This is a backport of #121985, adding unique transport version patch
numbers for each change intended for 9.0.
* match constant naming in main
When we are already parsing events, we can receive errors as the next
event.
OpenAI formats these as:
```
event: error
data: <payload>
```
Elastic formats these as:
```
data: <payload>
```
Unified will consolidate them into the new error structure.
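A hypothetical sketch of routing both shapes to the same error structure (illustrative only, not the actual parser):
```java
// Hypothetical sketch: remember whether the current SSE event was declared
// as an error ("event: error") so that the following data line is treated
// as an error payload. Elastic's data-only errors are detected from the
// payload itself (not shown here).
final class SseErrorRouter {
    private boolean inErrorEvent = false;

    boolean isErrorPayload(String line) {
        if (line.startsWith("event:")) {
            inErrorEvent = "error".equals(line.substring("event:".length()).trim());
            return false; // the event line itself carries no payload
        }
        if (line.startsWith("data:") && inErrorEvent) {
            inErrorEvent = false; // reset once the error payload is consumed
            return true;
        }
        return false;
    }
}
```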