This test would fail to see the expected response headers if the task
timed out before it started executing, which could happen very rarely.
It's also not a very good test because it never actually executed any of
the paths involving acking.
This commit fixes the rare failure and tightens up the assertions to
verify that the test sees the right thread context while handling
the end of the acking process, and that it always completes the
acking process.
Closes #118914
Fixes two bugs in _resolve/cluster.
First, the code that detects older cluster versions and falls back to the _resolve/index
endpoint was using an outdated string match for error detection. That has been adjusted.
Second, upon security exceptions, the _resolve/cluster endpoint was marking clusters as connected: true,
under the assumption that all security exceptions related to cross-cluster calls and remote index access
came from the remote cluster, but that is not always the case. Some cross-cluster security violations are
detected on the local querying cluster after the remoteClient.execute call is issued but before the transport
layer actually sends the request remotely. We now mark the connected status as false for all ElasticsearchSecurityException cases. End user docs have been updated with this information.
* fix: do not let `_resolve/cluster` hang if remote is unresponsive
Previously, `_resolve/cluster` would wait for a response from a remote
as part of the connection strategy. If the remote was unresponsive, this
API would wait until `netty` terminated the connection with a handshake
exception. The threshold for terminating the connection is `10s`, so the
API would wait `10s` before determining that the remote was unresponsive.
This strategy is now replaced with a fail-fast approach: a response is
sent back to the user immediately rather than after a connection
termination, as in the sketch below.
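A minimal sketch of the fail-fast idea; all names here are illustrative, not the actual transport code:
```java
/** Illustrative sketch of the fail-fast behaviour described above. */
final class ResolveClusterSketch {
    record ClusterResponse(boolean connected, String reason) {}

    static ClusterResponse resolve(boolean haveUsableConnection) {
        if (haveUsableConnection == false) {
            // Respond immediately instead of letting the request hang until
            // the transport layer's ~10s handshake timeout kills the connection.
            return new ClusterResponse(false, "unable to connect to remote cluster");
        }
        return new ClusterResponse(true, null);
    }
}
```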
* Update docs/changelog/119516.yaml
This updates the gradle wrapper to 8.12
We addressed deprecation warnings resulting from the update, including:
- Fix change in TestOutputEvent API
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
(cherry picked from commit ba61f8c7f7)
# Conflicts:
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/distribution/DockerUbiElasticsearchDistributionType.java
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/test/Fixture.java
# plugins/repository-hdfs/hadoop-client-api/build.gradle
# qa/entitlements/build.gradle
# server/src/main/java/org/elasticsearch/indices/IndicesFeatures.java
# x-pack/plugin/migrate/build.gradle
# x-pack/plugin/security/qa/security-basic/build.gradle
Document IDs are frequently used in HTTP requests, such as `GET /index/_doc/{id}`, where they must be URL-safe to avoid issues with invalid characters. This change ensures that IDs generated by `TimeBasedKOrderedUUIDGenerator` are properly Base64 URL-encoded, free of characters that could break URLs. We also test that no IDs include invalid characters like +, /, or = to guarantee they are fully compliant with URL-safe requirements.
Moreover, `TimeBasedKOrderedUUIDGenerator` and `TimeBasedUUIDGenerator` are refactored to allow injection of dependencies, which lets us increase test coverage with tests for high-throughput scenarios, sequence ID overflow, and unreliable clocks.
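For illustration, the JDK's URL-safe Base64 encoder already guarantees the required alphabet; a minimal sketch (not the generator's actual code):
```java
import java.nio.ByteBuffer;
import java.util.Base64;

public class UrlSafeIdExample {
    public static void main(String[] args) {
        // 15 raw bytes encode to exactly 20 Base64 characters with no padding
        byte[] rawId = ByteBuffer.allocate(15)
            .put((byte) 0x12)                      // illustrative payload
            .putLong(System.currentTimeMillis())
            .putInt(42)
            .putShort((short) 7)
            .array();
        // The URL-safe alphabet uses '-' and '_' instead of '+' and '/', and
        // withoutPadding() drops the trailing '=' characters.
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(rawId);
        assert id.contains("+") == false && id.contains("/") == false && id.contains("=") == false;
        System.out.println(id);
    }
}
```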
If a flattened field is configured as a non-dimension, non-metric field, then downsampling fails to execute: it doesn't know how to use the flattened field or how to serialize it. This change addresses that.
Closes #116319
* Handle all exceptions in data nodes can match (#117469)
During the can match phase, prior to the query phase, we may have exceptions
that are returned to the coordinating node and handled gracefully as if the
shard returned canMatch=true.
During the query phase, we perform an additional rewrite and can match phase
to eventually shortcut the query phase for the shard. That needs to handle
exceptions as well. Currently, an exception there causes shard failures, while
we should rather go ahead and execute the query on the shard.
Instead of adding another try/catch in consumers' code, this commit adds exception handling to the method itself, so that it can no longer throw exceptions and similar mistakes can no longer be made in the future.
At the same time, this commit makes the can match method more easily testable without requiring a full-blown SearchService instance.
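The shape of the fix, as a sketch with hypothetical names (not the real SearchService API):
```java
import java.util.function.Supplier;

/** Sketch only: hypothetical names, not the real SearchService API. */
final class CanMatchSketch {
    static boolean canMatch(Supplier<Boolean> rewriteAndCheck) {
        try {
            return rewriteAndCheck.get();   // rewrite plus the actual can-match logic
        } catch (Exception e) {
            // Any failure means "run the query phase on the shard anyway",
            // mirroring how the coordinating node treats can-match failures.
            return true;
        }
    }
}
```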
Closes #104994
* fix compile
* Address mapping and compute engine runtime field issues (#117792)
This change addresses the following issues:
Fields mapped as runtime fields not getting stored if source mode is synthetic.
Address java.io.EOFException when an ES|QL query uses multiple runtime fields that fall back to source when source mode is synthetic. (1)
Address concurrency issue when runtime fields get pushed down to Lucene. (2)
1: ValueSourceOperator can read values in row-striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthesize source, the same SourceProvider can end up evaluating the same range of doc ids multiple times. This can then result in unexpected I/O errors at the codec level, because the same doc value instances are reused by the SourceProvider. Re-evaluating the same doc ids violates the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which document that unexpected behaviour can occur if the target doc id is lower than the current position.
Note that this is only an issue for the synthetic source loader, not for the stored source loader, and not when executing in row-stride fashion, which sometimes happens in the compute engine and always happens in the _search API.
2: The concurrency issue arises with the source provider when the source operator executes in parallel with data partitioning set to DOC. The same SourceProvider instance then gets accessed by multiple threads concurrently, and SourceProvider implementations are not designed to handle concurrent access.
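For reference, the contract being violated is that of Lucene's `DocIdSetIterator`: iteration is strictly forward-only. A sketch:
```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

/** Sketch of the forward-only contract that columnar re-evaluation violated. */
final class ForwardOnlyIteration {
    static void readRange(DocIdSetIterator it, int from, int to) throws IOException {
        int doc = it.advance(from);   // legal only while 'from' is beyond the current doc
        while (doc < to && doc != DocIdSetIterator.NO_MORE_DOCS) {
            // ... consume doc ...
            doc = it.nextDoc();       // strictly forward; the iterator cannot rewind
        }
        // Calling it.advance(from) again here is undefined behaviour, which is
        // why re-evaluating the same doc id range surfaced as codec-level errors
        // and why each pass/thread needs its own SourceProvider and doc values.
    }
}
```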
Closes #117644
* fixed compile error after backporting
* Parse the contents of dynamic objects for [subobjects:false]
* Update docs/changelog/117762.yaml
* add tests
* tests
* test dynamic field
* test dynamic field
* fix tests
(cherry picked from commit f2addbc69a)
# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
* Don't skip shards in coord rewrite if timestamp is an alias (#117271)
The coordinator rewrite has logic to skip indices if the provided date range
filter is not within the min and max range of all of its shards. This mechanism
is enabled for event.ingested and @timestamp fields, against searchable snapshots.
We have basic checks that such fields need to be of date field type, yet if they
are defined as an alias of a date field, their range will be reported as empty, which
indicates that the shards are empty, so the coord rewrite logic resolves the alias and
ends up skipping shards that may have matching docs.
This commit adds an explicit check that declares the range UNKNOWN instead of EMPTY
in these circumstances. The same check is also performed in the coord rewrite logic,
so that shards are no longer skipped by mistake.
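Conceptually (with illustrative names; the real constants live in the field-range tracking code):
```java
/** Sketch of the UNKNOWN-vs-EMPTY distinction; names are illustrative. */
final class TimestampRangeSketch {
    enum FieldRange { UNKNOWN, EMPTY, PRESENT }

    static FieldRange rangeFor(boolean fieldIsAliasOfDateField, boolean shardsHaveDocs) {
        if (fieldIsAliasOfDateField) {
            // "We don't know" must not be conflated with "there are no docs",
            // otherwise the coordinator rewrite skips shards with matching docs.
            return FieldRange.UNKNOWN;
        }
        return shardsHaveDocs ? FieldRange.PRESENT : FieldRange.EMPTY;
    }
}
```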
* fix compile
We had a race here where the non-blocking pending execution
would be starved of executing threads.
This happened when all the current holders of permits from the semaphore
would release their permit after a producer thread failed to acquire a
permit and then enqueued its task.
=> we need to peek the queue again after releasing the permit and try to
acquire a new permit if there's work left to be done, to avoid this
scenario.
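A sketch of the fixed pattern (illustrative, not the actual executor code):
```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

/** Sketch of the release-then-recheck fix described above. */
final class PendingExecutionSketch {
    private final Semaphore permits = new Semaphore(4);
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();

    void submit(Runnable task) {
        queue.add(task);   // enqueue first, then try to become a worker
        drain();
    }

    private void drain() {
        while (permits.tryAcquire()) {
            try {
                Runnable task;
                while ((task = queue.poll()) != null) {
                    task.run();
                }
            } finally {
                permits.release();
            }
            // The fix: after releasing, peek the queue again. A producer may
            // have enqueued between our last poll() and release(); without
            // this re-check its task could sit in the queue with no permit
            // holder left to run it.
            if (queue.isEmpty()) {
                return;
            }
        }
    }
}
```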
Currently, we emit a deprecation warning in the parser of the source
field when source mode is used in mappings. However, this behavior
causes warnings to be emitted for every mapping update. In tests with
assertions enabled, warnings are also triggered for every change to
index metadata. As a result, deprecation warnings are inadvertently
emitted for index or update requests.
This change relocates the deprecation check to the mapper, limiting it
to cases where a new index is created or a template is created/updated.
Relates to #11752
Indicates whether the `es.mapping.synthetic_source_fallback_to_stored_source.cutoff_date_restricted_override` system property has been configured.
A follow up from #116647
I goofed on the bit * byte and bit * float comparisons. Naturally, these
should be big-endian and compare the dimensions with the binary ones
appropriately.
Additionally, I added a test to ensure that this is handled correctly.
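A minimal sketch of the corrected orientation, assuming dimension 0 maps to the most significant bit of the first byte (that mapping is my assumption, not quoted from the fix):
```java
/** Sketch of a bit x byte dot product with big-endian bit order (assumed mapping). */
final class BitDotProductSketch {
    static int dotProduct(byte[] bits, byte[] dims) {
        int sum = 0;
        for (int d = 0; d < dims.length; d++) {
            int bit = 7 - (d & 7);                // big-endian: MSB of byte 0 is dimension 0
            if ((bits[d >> 3] & (1 << bit)) != 0) {
                sum += dims[d];
            }
        }
        return sum;
    }
}
```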
(cherry picked from commit 374c88a832)
The transposition of the bits in half-byte queries for BBQ is pretty
convoluted and slow. This commit greatly simplifies & improves
performance for this small part of BBQ queries and indexing.
Here are the results of a small JMH benchmark for this particular
function.
```
TransposeBinBenchmark.transposeBinNew 1024 thrpt 5 857.779 ± 44.031 ops/ms
TransposeBinBenchmark.transposeBinOrig 1024 thrpt 5 94.950 ± 2.898 ops/ms
```
While this is a huge improvement for this small function, the impact at
query and index time is only marginal. But, the code simplification
itself is enough to warrant this change in my opinion.
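For reference, the shape of such a JMH harness (a skeleton; the actual transposition bodies are elided and replaced with a placeholder):
```java
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class TransposeBinBenchmarkSkeleton {
    @Param("1024")
    public int dims;

    public byte[] quantizedQuery;   // half-byte (int4) query: 4 bits per dimension
    public byte[] transposed;

    @Setup
    public void setup() {
        quantizedQuery = new byte[dims / 2];
        transposed = new byte[dims / 2];
        new Random(42).nextBytes(quantizedQuery);
    }

    @Benchmark
    public byte[] transposeBinNew() {
        // the transposition under test would go here; placeholder keeps this runnable
        System.arraycopy(quantizedQuery, 0, transposed, 0, quantizedQuery.length);
        return transposed;
    }
}
```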
* Propagate scoring function through random sampler.
* Update docs/changelog/116957.yaml
* Correct score mode in random sampler weight
* Fix random sampling with scores and p=1.0
* Unit test with scores
* YAML test
* Add capability
This reverts #117106. BWC tests fail, because older nodes are killed with the following error:
```
[2024-11-20T10:54:58,600][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v8.17.0-0] fatal error in thread [elasticsearch[v8.17.0-0
][clusterApplierService#updateTask][T#1]], exiting java.lang.AssertionError: provided source [{"_doc":{"_data_stream_timestamp":{"enabled":true},"_source":{},"properties":{"@timestamp":{"type":"date"},"k8s":{"properties":{"pod":{"properties":{"ip":{"type":"ip"},"name":{"type":"keyword"},"network":{"properties":{"rx":{"type":"long"},"tx":{"type":"long"}}},"uid":{"type":"keyword","time_series_dimension":true}}}}},"metricset":{"type":"keyword","time_series_dimension":true}}}}] differs from mapping [{"_doc":{"_data_stream_timestamp":{"enabled":true},"_source":{"mode":"synthetic"},"properties":{"@timestamp":{"type":"date"},"k8s":{"properties":{"pod":{"properties":{"ip":{"type":"ip"},"name":{"type":"keyword"},"network":{"properties":{"rx":{"type":"long"},"tx":{"type":"long"}}},"uid":{"type":"keyword","time_series_dimension":true}}}}},"metricset":{"type":"keyword","time_series_dimension":true}}}}]
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.mapper.DocumentMapper.<init>(DocumentMapper.java:66)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.mapper.MapperService.newDocumentMapper(MapperService.java:588)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.mapper.MapperService.updateMapping(MapperService.java:346)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.IndexService.updateMapping(IndexService.java:840)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndicesAndUpdateShards(IndicesClusterStateService.java:583)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.doApplyClusterState(IndicesClusterStateService.java:306)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:260)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:530)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:157)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:956)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:218)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:184)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1575)
```
The `mode` parameter no longer gets serialized for new indices. However, older nodes still serialize the `mode` parameter, which causes the mentioned assertion to fail. Reverting for now while we figure out how best to address this BWC serialization issue.
We can only stop serializing `mode` when all nodes are on the same version. Unfortunately, we can't invoke `c.clusterTransportVersion().get()` from the parser or builder, because the calling thread isn't allowed to call `clusterService.state()`.
* Added stricter range type checks and runtime warnings for ENRICH (#115091)
It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' used in an int_range match: https://github.com/elastic/elasticsearch/issues/107357
This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.
However, runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since its inception; in particular, we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.
In order to not create a backwards compatibility problem, we have compromised with the following:
* Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD
* For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning
Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
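The runtime behaviour for KEYWORD inputs amounts to parse-or-null with a warning; a sketch (illustrative, not the actual ES|QL code):
```java
import java.util.List;

/** Sketch of the lenient KEYWORD path described above. */
final class LenientEnrichParse {
    /** Try to parse a keyword for an int_range match; on failure, null plus a warning. */
    static Integer parseIntOrNull(String keyword, List<String> warnings) {
        try {
            return Integer.parseInt(keyword);
        } catch (NumberFormatException e) {
            // ES|QL's runtime policy: never throw, return null and warn instead
            warnings.add("Cannot parse [" + keyword + "] for int_range match, treating as null");
            return null;
        }
    }
}
```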
* Fix compile error likely due to mismatched ordering of backports
Backport of PR #116658.
The changes in this PR are ineffective for stateful, and the backport is not
used in serverless. This is mostly to adopt the new transport version
in stateful to keep the versions consecutive.
Relates ES-9573
* Skip eager reconciliation for empty routing table (#116903)
No need to start the eager reconciliation when the routing table is
empty. An empty routing table means either the cluster has no shards or
the state has not recovered. The eager reconciliation is not necessary
in both cases.
Resolves: #115885
* fix compilation
This commit changes the signature of InternalAggregation#buildAggregations(long[]) to
InternalAggregation#buildAggregations(LongArray) to avoid allocations of humongous arrays.
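A sketch of the difference, assuming the usual `BigArrays` entry point:
```java
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.common.util.LongArray;

// Sketch: a LongArray is paged under the hood, so a very large bucket-ordinal
// array no longer requires one contiguous (potentially humongous) heap
// allocation the way `new long[n]` does.
final class LongArraySketch {
    static LongArray ordinals(long size) {
        LongArray ords = BigArrays.NON_RECYCLING_INSTANCE.newLongArray(size);
        for (long i = 0; i < ords.size(); i++) {
            ords.set(i, i);   // same random-access usage as a plain long[]
        }
        return ords;
    }
}
```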
This was trying to assert that the code under test threw an exception
using the 'try-act-fail-catch-assert' pattern, but the 'fail' step
was missing, meaning that the tests would have incorrectly passed if
the method didn't throw.
This switches it to using `assertThrows`, which is less easy to get
wrong.
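The two patterns side by side (a minimal JUnit example, not the actual test):
```java
import static org.junit.Assert.assertThrows;
import static org.junit.Assert.fail;

// With the manual pattern, forgetting fail() makes the test pass
// even when no exception is thrown; assertThrows cannot be misused that way.
public class ThrowingAssertionExample {
    void fragile() {
        try {
            codeUnderTest();
            fail("expected an IllegalStateException");  // the step that was missing
        } catch (IllegalStateException e) {
            // assertions on e
        }
    }

    void robust() {
        IllegalStateException e = assertThrows(IllegalStateException.class, this::codeUnderTest);
        // assertions on e
    }

    private void codeUnderTest() {
        throw new IllegalStateException("boom");
    }
}
```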
Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.
It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.
Backport of #116918 to 8.x
Clarifies that insecure settings are stored in plaintext and must not be
used. Also removes the mention of the (wrong) system property from the
error message if insecure settings are not permitted.
Backport of #116915 to `8.x`
The fetch phase is subject to timeouts like any other search phase. Timeouts
may happen when low-level cancellation is enabled (true by default), in which
case the directory reader is wrapped into an ExitableDirectoryReader and a
timeout is provided to the search request.
The exception that is used is TimeExceededException, but it is an internal
exception that should never be returned to the user. When it is thrown, we
need to catch it and either throw an error or mark the response as timed out,
depending on whether partial results are allowed or not.
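The handling, sketched with stand-in names (the real exception type is internal to the search code):
```java
/** Illustrative sketch; names are hypothetical, not the real search internals. */
final class FetchTimeoutSketch {
    static class TimeExceeded extends RuntimeException {}   // stands in for the internal exception

    static boolean runFetch(Runnable doFetch, boolean allowPartialResults) {
        try {
            doFetch.run();
            return false;                       // not timed out
        } catch (TimeExceeded e) {
            if (allowPartialResults) {
                return true;                    // response carries timed_out: true
            }
            // the internal exception must not leak: rethrow as a user-facing error
            throw new RuntimeException("Time exceeded during fetch phase");
        }
    }
}
```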
This implementation restructures auto-generated document IDs to maximize compression within Lucene's terms dictionary. The key insight is placing stable or slowly-changing components at the start of the ID: the most significant bytes of the timestamp change very gradually (the first byte shifts only every 35 years, the second every 50 days). This careful ordering means that large sequences of IDs generated close in time will share common prefixes, allowing Lucene's Finite State Transducer (FST) to store terms more compactly.
To maintain uniqueness while preserving these compression benefits, the ID combines three elements: a timestamp that ensures time-based ordering, the coordinator's MAC address for cluster-wide uniqueness, and a sequence number for handling high-throughput scenarios. The timestamp handling is particularly robust, using atomic operations to prevent backwards movement even if the system clock shifts.
For high-volume indices generating millions of documents, this optimization can lead to substantial storage savings while maintaining strict guarantees about ID uniqueness and ordering.
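Putting the pieces together, a minimal sketch of such a layout (illustrative; the field widths and exact ordering here are assumptions, not the generator's actual format):
```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a k-ordered, URL-safe ID layout as described above. */
final class KOrderedIdSketch {
    private final AtomicLong lastTimestamp = new AtomicLong();
    private final AtomicLong sequence = new AtomicLong();
    private final byte[] macAddress;   // 6 bytes, coordinator-unique

    KOrderedIdSketch(byte[] macAddress) {
        this.macAddress = macAddress;
    }

    String nextId() {
        // Monotonic timestamp: never move backwards even if the clock does.
        long ts = lastTimestamp.updateAndGet(prev -> Math.max(prev, System.currentTimeMillis()));
        int seq = (int) (sequence.incrementAndGet() & 0xFFFFFF);
        ByteBuffer buf = ByteBuffer.allocate(15);
        // Most significant timestamp bytes first: they change slowest, so IDs
        // generated close in time share long common prefixes in the FST.
        buf.put((byte) (ts >>> 40)).put((byte) (ts >>> 32)).put((byte) (ts >>> 24))
           .put((byte) (ts >>> 16)).put((byte) (ts >>> 8)).put((byte) ts);
        buf.put(macAddress, 0, 6);
        buf.put((byte) (seq >>> 16)).put((byte) (seq >>> 8)).put((byte) seq);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }
}
```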