Commit graph

13415 commits

Author SHA1 Message Date
Benjamin Trent
d3efd51bc1
Fix automatic tracking of collapse with docvalue_fields (#110103) (#110154)
Some optimizations broke the automatic addition of collapse fields to
`docvalue_fields` during the fetch phase.

Consequently, users could get confusing errors such as
`unsupported_operation_exception`. This commit restores the intended
behavior of automatically including the collapse field in the
`docvalue_fields` context during fetch if it isn't already included.

closes: https://github.com/elastic/elasticsearch/issues/96510
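A minimal sketch of the restored rule, assuming the `docvalue_fields` context can be modeled as a plain list of field names. `CollapseFetchSketch` and `withCollapseField` are hypothetical names, not the actual fetch-phase API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Elasticsearch code: models docvalue_fields as a list of names.
final class CollapseFetchSketch {
    private CollapseFetchSketch() {}

    /**
     * Returns the docvalue field names to load during fetch, appending the
     * collapse field only when the request did not already include it.
     */
    static List<String> withCollapseField(List<String> requested, String collapseField) {
        List<String> fields = new ArrayList<>(requested); // never mutate the caller's list
        if (collapseField != null && !fields.contains(collapseField)) {
            fields.add(collapseField);
        }
        return fields;
    }
}
```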
2024-06-26 07:55:11 -04:00
Mary Gouseti
978557ab65
[Bug] During heavy indexing load it's possible for lazy rollover to trigger multiple rollovers (#109636) (#110031)
Suppose the `my-metrics` data stream is receiving a lot of
indexing requests. The following scenario can result in multiple
unnecessary rollovers:

1. We update the mapping and mark the data stream to be lazily rolled over.
2. We receive 5 bulk index requests that all contain a write request for this data stream.
3. Each of these requests is picked up “at the same time”; each sees that the data stream needs to be rolled over and issues a lazy rollover request.
4. The `my-metrics` data stream now has 5 tasks executing an unconditional rollover.
5. The data stream gets rolled over 5 times instead of once.

This scenario is captured in the `LazyRolloverDuringDisruptionIT`.

We have also witnessed this in the wild, where a data stream was rolled
over 281 extra times, resulting in 281 empty indices.

This PR proposes:

- Creating a new task queue with a more efficient executor that further batches/deduplicates the requests.
- Adding two safeguards. The first ensures we do not enqueue the rollover task if we see that a rollover has already occurred. The second applies during task execution: if the data stream does not have the `rolloverOnWrite` flag set to `true`, we skip the rollover.
- When we skip the rollover, we return the following response:

```
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": ".ds-my-data-stream-2099.05.07-000002",
  "new_index": ".ds-my-data-stream-2099.05.07-000002",
  "rolled_over": false,
  "dry_run": false,
  "lazy": false
}
```
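The batching/deduplication idea and the execution-time safeguard can be sketched as follows, assuming a single-threaded executor that receives one batch of rollover requests at a time. `RolloverBatchSketch`, `DataStreamState` and `executeBatch` are hypothetical names, not the real task-queue API, and the enqueue-time safeguard is omitted:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Elasticsearch code: one batch, executed on one thread.
final class RolloverBatchSketch {
    private RolloverBatchSketch() {}

    /** Hypothetical, mutable view of one data stream's state. */
    static final class DataStreamState {
        int generation;
        boolean rolloverOnWrite; // set when the data stream is marked for lazy rollover

        DataStreamState(int generation, boolean rolloverOnWrite) {
            this.generation = generation;
            this.rolloverOnWrite = rolloverOnWrite;
        }
    }

    /**
     * Executes one batch of lazy rollover requests, each naming a data stream,
     * and returns how many rollovers were actually performed.
     */
    static int executeBatch(List<String> requests, Map<String, DataStreamState> streams) {
        Set<String> unique = new LinkedHashSet<>(requests); // deduplicate within the batch
        int performed = 0;
        for (String name : unique) {
            DataStreamState state = streams.get(name);
            if (state == null || !state.rolloverOnWrite) {
                continue; // safeguard: nothing to do, answer with "rolled_over": false
            }
            state.generation++;            // roll over to a new write index
            state.rolloverOnWrite = false; // clear the flag so later requests become no-ops
            performed++;
        }
        return performed;
    }
}
```

With five concurrent requests for `my-metrics` in one batch, only one rollover happens, and a later batch sees the cleared flag and skips.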
2024-06-21 15:09:08 +03:00
Artem Prigoda
cc36b7f346
[8.14] Fix TasksIT#testGetTaskWaitForCompletionWithoutStoringResult (#108094) (#110012)
Backport #108094 to 8.14

Resolves #106043
2024-06-21 07:16:00 +10:00
David Turner
342a4517dc
[8.14] Fix TasksIT#testTasksCancellation (#109929) (#109941)
* Fix `TasksIT#testTasksCancellation` (#109929)

The tasks are removed from the task manager _after_ sending the
response, so we cannot reliably assert they're done. With this commit we
wait for them to complete properly first.

Closes #109686

* Introduce safeGet
2024-06-20 03:56:46 +10:00
Panagiotis Bailis
65ec438730
[8.14] backporting fix for RRF pagination (#109788) 2024-06-19 18:40:35 +03:00
David Turner
da1ba774be AwaitsFix for #109686 2024-06-19 14:56:54 +01:00
Luca Cavanna
06172d7e00
Check array size before returning array item in script doc values (#109824) (#109840)
When accessing array elements from a script, we only check the size of the backing array. If a
previous doc had enough values, the backing array is large enough and the request goes through, so
when the current doc does not have enough elements we end up returning the item left at that
position by the previous doc.

We should instead validate the index against the number of values of the current doc and throw an
error if it goes over the available number of values.

Closes #104998
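The shape of the fix can be sketched as follows, assuming doc values are exposed through a reusable backing array plus a per-document value count. `SortedLongsSketch` and its methods are hypothetical, not the actual script doc-values classes:

```java
import java.util.Arrays;

// Hypothetical sketch, not Elasticsearch code: a reusable buffer plus a per-doc count.
final class SortedLongsSketch {
    private long[] values = new long[0]; // reused across documents, only ever grows
    private int count;                   // number of values for the current document

    /** Loads the values of the next document into the reusable buffer. */
    void setNextDoc(long... docValues) {
        if (docValues.length > values.length) {
            values = Arrays.copyOf(values, docValues.length);
        }
        System.arraycopy(docValues, 0, values, 0, docValues.length);
        count = docValues.length;
    }

    long get(int index) {
        // Checking values.length here would let stale entries from an earlier,
        // larger document leak through; check against the current doc's count.
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException(
                "index " + index + " out of bounds for document with " + count + " values");
        }
        return values[index];
    }
}
```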
2024-06-18 11:00:40 +02:00
David Turner
4a0fc736f7 Revert "[ML] Handle the "output memory allocator bytes" field (#109653) (#109833)"
This reverts commit 44200c7743.
2024-06-18 08:35:06 +01:00
Ed Savage
44200c7743
[ML] Handle the "output memory allocator bytes" field (#109653) (#109833)
Handle the "output memory allocator bytes" field if and only if it is present in the model size stats, as reported by the C++ backend.

This PR must be merged prior to the corresponding ml-cpp one, to keep CI tests happy.

Backports #109653
2024-06-18 16:34:54 +12:00
Nhat Nguyen
8c1b168fd5
Fix ESQL cancellation for exchange requests (#109695) (#109712)
Currently, we do not register task cancellations for exchange requests, 
which leads to a long delay in failing the main request when a data-node 
request is rejected.
2024-06-14 08:13:04 +10:00
elasticsearchmachine
7ae24b31d3 Bump versions after 7.17.22 release 2024-06-13 15:15:24 +00:00
Ryan Ernst
308f2ace5f
Guard file settings readiness on file settings support (#109500) (#109556)
* Guard file settings readiness on file settings support (#109500)

Consistency of file settings is an important invariant. However, when
upgrading from Elasticsearch versions before file settings existed,
cluster state will not yet have the file settings metadata. If the first
node upgraded is not the master node, new nodes will never become ready
while they wait for file settings metadata to exist.

This commit adds a node feature for file settings to guard waiting on
file settings for readiness. Although file settings have existed since
8.4, the feature is not a historical feature, because historical features
are not applied to the cluster state that readiness checks. A historical
feature is not needed in this case anyway, since clusters upgraded from
8.4+ will already contain file settings metadata.

* fix test

* iter

* Revert "fix test"

This reverts commit 570e16a788.

* cleanup

* remove test from 8.15

* spotless
2024-06-14 01:04:09 +10:00
Mary Gouseti
a32f1ae2f3
[Data streams] Fix the source of a lazy rollover task (#109629) (#109637) 2024-06-12 19:59:25 +03:00
elasticsearchmachine
0cb9ee9b62 Bump versions after 8.14.1 release 2024-06-12 16:32:48 +00:00
Benjamin Trent
829ab67d57
[8.14] add hexstring support byte painless scorers (#109492) (#109595)
* add hexstring support byte painless scorers (#109492)

Hexadecimal strings are supported for index input and for kNN queries. We should support them for byte vectors in Painless.

This commit addresses this for our common scoring functions.

closes: #109412

* adjust bwc test version
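For illustration, decoding a hex string into the `byte[]` that a byte-vector scoring function works on can be done with the JDK's `java.util.HexFormat` (Java 17+), where each pair of hex digits is one signed byte. `HexVectorSketch` and its `dotProduct` helper are hypothetical and are not the Painless scoring API:

```java
import java.util.HexFormat;

// Hypothetical sketch, not the Painless API: decode a hex string, then score it.
final class HexVectorSketch {
    private HexVectorSketch() {}

    /** Decodes e.g. "0aff" into {10, -1}; each pair of hex digits is one signed byte. */
    static byte[] parseHexVector(String hex) {
        return HexFormat.of().parseHex(hex);
    }

    /** Hypothetical byte-vector dot product; bytes are widened to int before multiplying. */
    static int dotProduct(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```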
2024-06-12 06:03:30 +10:00
Martijn van Groningen
a20fa1713d
Re-define index.mapper.dynamic setting in 8.x (#109341) (#109564)
Currently, when upgrading a 7.x cluster to 8.x with the
`index.mapper.dynamic` index setting defined, the following happens:

- In case of a full cluster restart upgrade, the index setting gets archived and after the upgrade the cluster health is green.
- In case of a rolling restart upgrade, shards of indices with the index setting fail to allocate as nodes start with version 8.x. The result is that the cluster health is red and the index setting isn't archived. Closing and opening the index should archive the index setting and allocate the shards.

This change ensures the same behavior happens when upgrading a
cluster from 7.x to 8.x with indices that have the
`index.mapper.dynamic` index setting defined. By re-defining the
`index.mapper.dynamic` index setting with the
`IndexSettingDeprecatedInV7AndRemovedInV8` property, the index setting is
allowed to exist in 7.x indices, but can't be defined in new indices
after the upgrade. This way we don't have to rely on index archiving, and
upgrading via full cluster restart or rolling restart will yield the
same outcome.

Based on the test in #109301. Relates to #109160 and #96075
2024-06-11 19:11:13 +10:00
Albert Zaharovits
5a2cef53c3
Fix task cancellation on remote cluster when original request fails (#109440) (#109484)
Fixes a bug where the task on the remote cluster node is not cancelled
when the original request (that started the task) fails (returns an
exception).
2024-06-08 02:29:12 +10:00
Benjamin Trent
fcb713ec6d
[8.14] Correct how hex strings are handled when dynamically updating vector dims (#109423) (#109448)
* Correct how hex strings are handled when dynamically updating vector dims (#109423)

closes: https://github.com/elastic/elasticsearch/issues/109411

* adjusting version

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-06-07 22:56:45 +10:00
Albert Zaharovits
807dc6c115
[8.14] Fix task cancellation authz on fulfilling cluster (#109357) (#109422)
This fixes task cancellation actions (i.e. internal:admin/tasks/cancel_child and internal:admin/tasks/ban) not being authorized by the fulfilling cluster. This can result in orphaned tasks on the fulfilling cluster.

Backport of #109357
2024-06-06 17:58:48 +03:00
elasticsearchmachine
62f28786ab Bump versions after 8.14.0 release 2024-06-05 15:07:02 +00:00
Nhat Nguyen
99b62b5f7b
Add remove index setting command (#109276) (#109327)
The new subcommand `elasticsearch-node remove-index-settings` can be used
to remove index settings from the cluster state in cases where it
contains incompatible index settings that prevent the cluster from
forming. This tool can cause data loss and should only be used as a last
resort.

Relates #96075
2024-06-05 02:19:26 +10:00
Aurélien FOUCRET
fe3ef71223
Wrap "Pattern too complex" exception into an IllegalArgumentException (#109173) (#109255) 2024-05-31 19:47:00 +02:00
David Turner
0f93250a85
Fix double-pausing shard snapshot (#109148) (#109245)
Closes #109143
2024-05-31 09:29:37 -04:00
eyalkoren
80e1f431a6
Raw mapping merge fix for properties field (#108867) (#109147)
(cherry picked from commit 92dc76ee22)
2024-05-29 05:31:21 -04:00
Nhat Nguyen
a02f2f59d9
Harden field-caps request dispatcher (#108736) (#108833)
ExceptionHelper#useAndSuppress can throw exceptions if both input
exceptions have the same root cause. If this happens, the field-caps
request dispatcher might fail to notify the caller of completion. I
found this while running ES|QL with disruptions.

Relates #107347
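One JDK rule that makes this kind of failure possible: `Throwable.addSuppressed` throws `IllegalArgumentException` when an exception is asked to suppress itself. Assuming the two inputs can end up as the same exception object (for example after unwrapping to a shared root cause), a defensive merge might look like this; `SuppressSketch.mergeFailures` is a hypothetical helper, not the actual `ExceptionHelper#useAndSuppress` code:

```java
// Hypothetical sketch, not Elasticsearch code: a merge that cannot fail on self-suppression.
final class SuppressSketch {
    private SuppressSketch() {}

    /**
     * Keeps the first failure and attaches the second as suppressed, skipping
     * the attachment when both are the same object, because
     * Throwable.addSuppressed(this) throws IllegalArgumentException.
     */
    static Exception mergeFailures(Exception first, Exception second) {
        if (first == null) {
            return second;
        }
        if (second != null && second != first) {
            first.addSuppressed(second);
        }
        return first;
    }
}
```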
2024-05-20 14:52:24 -04:00
Armin Braun
80c5ff28f9
Add internalClusterTest for and fix leak in ExpandSearchPhase (#108562) (#108582)
`ExpandSearchPhase` was leaking `SearchHits` when a pooled `SearchHits`
that was read from the wire was added to an unpooled `SearchHit`.
This commit makes the relevant `SearchHit` instances pooled, so that they
release their nested hits. This requires a couple of
smaller adjustments in the codebase, mainly around error handling.
2024-05-13 15:10:45 -04:00
elasticsearchmachine
948fab0a25 Bump versions after 8.13.4 release 2024-05-10 21:38:45 +00:00
Jim Ferenczi
9341c15633
[8.14] Handle must_not clauses when disabling the weight matches highlighting mode (#108500)
* Handle must_not clauses when disabling the weight matches highlighting mode (#108453)

This change makes sure we check all queries, even the must_not ones, to decide if we should disable weight matches highlighting or not.

Closes #101667
Closes #106693

* adapt test skip version
2024-05-10 05:50:40 -04:00
Nhat Nguyen
e07f4b9f5a
Fix tsdb codec when doc-values spread in two blocks (#108276) (#108281)
Currently, loading ordinals multiple times (after advanceExact) for
documents with values spread across multiple blocks in the TSDB codec
fails because we do not re-seek the ordinals block.

Doc-values of a document can spread across multiple blocks in two cases:
when it has more than 128 values or when it exceeds the remaining space
in the current block.
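The two cases can be illustrated with a small arithmetic sketch, assuming values are laid out contiguously in fixed blocks of 128 values, so a document spans blocks whenever its first and last values fall in different blocks. `BlockSpanSketch` is hypothetical and does not reflect the codec's actual on-disk format:

```java
// Hypothetical sketch, not the TSDB codec: fixed 128-value blocks laid out back to back.
final class BlockSpanSketch {
    private BlockSpanSketch() {}

    static final int BLOCK_SIZE = 128; // assumed number of values per encoded block

    /** Index of the block holding the value at the given global value offset. */
    static long blockOf(long valueOffset) {
        return valueOffset / BLOCK_SIZE;
    }

    /**
     * True if a document whose values occupy [firstValue, firstValue + count)
     * crosses a block boundary, so a reader must seek to more than one block,
     * and must seek back to the first block when the values are loaded again.
     */
    static boolean spansBlocks(long firstValue, int count) {
        return count > 0 && blockOf(firstValue) != blockOf(firstValue + count - 1);
    }
}
```

More than 128 values always spans blocks; fewer values span blocks only when they do not fit in the space left in the current block.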
2024-05-04 17:51:00 -04:00
Simon Cooper
810e86be1a
Backport serialization fix of put/delete shutdown requests to 8.14 (#108251)
Backport of #107862 to 8.14
2024-05-03 16:58:04 +01:00
elasticsearchmachine
176061e897 Bump versions after 7.17.21 release 2024-05-03 15:39:48 +00:00
elasticsearchmachine
6948daea89 Bump versions after 8.13.3 release 2024-05-03 15:12:16 +00:00
Christoph Büscher
ae1d15d305
Mute SearchTransportTelemetryTests testSearchTransportMetricsDfsQueryThenFetch (#107942) (#108141) 2024-05-01 09:03:02 -04:00
Luca Cavanna
b6c927823c
Handle parallel calls to createWeight when profiling is on (#108041)
We disable inter-segment concurrency in the query phase whenever profiling is on, because there
are known concurrency issues that need fixing. The way we disable concurrency is by creating a single
slice that search will execute against. We still offload the execution to the search workers thread pool.

However, inter-segment concurrency in Lucene is not always based on slices. The kNN query (as well as terms enum loading
and other places) parallelizes across all segments, independently of the slices that group multiple segments together.
That behavior is not easy to disable short of not setting an executor on the searcher, but that would entirely
disable the separate executor for potentially heavy CPU/IO-bound workloads, which is not desirable.

This means that a kNN query executes in parallel (in DFS as well as in the query phase)
even if inter-segment concurrency has been disabled because profiling is on. With pre-filtering,
queries such as multi-term queries call createWeight from each segment, in parallel, when
pulling the scorer. That causes non-deterministic behavior, as the profiler does not support concurrent access
to some of its data structures.

This commit protects the profiler from concurrent access to its data structures by synchronizing access to its tree.
Performance is not a concern here, as the profiler is already known to slow down query execution.

Closes #104235
Closes #104131
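A minimal sketch of the approach, assuming the profile tree can be reduced to a map from query description to node id. `ProfileTreeSketch` is hypothetical and far simpler than the real profiler, but it shows how synchronizing every access to the shared tree keeps concurrent `createWeight`-style calls from several segment threads consistent:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Elasticsearch profiler: every access goes through the same monitor.
final class ProfileTreeSketch {
    private final Map<String, Integer> nodeIds = new HashMap<>(); // not thread-safe on its own
    private final List<String> nodes = new ArrayList<>();

    /** Returns the node id for a query, creating the node on first use; safe from many threads. */
    synchronized int getOrCreateNode(String queryDescription) {
        Integer id = nodeIds.get(queryDescription);
        if (id == null) {
            id = nodes.size();
            nodes.add(queryDescription);
            nodeIds.put(queryDescription, id);
        }
        return id;
    }

    synchronized int size() {
        return nodes.size();
    }
}
```

Without `synchronized`, two threads could both miss the lookup and create duplicate nodes, or corrupt the `HashMap`; that is the kind of non-determinism the commit describes.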
2024-04-30 11:27:33 +02:00
Simon Cooper
e4d7a7c8db
[8.14] Update min CCS version for 8.14 release (#107939) (#108006)
* Update min CCS version for 8.14 release (#107939)

* Update constant name
2024-04-29 08:27:41 -04:00
Simon Cooper
aeb8bc1b1c
Update several references to IndexVersion.toString to use toReleaseVersion (#107828) (#107889) 2024-04-26 16:45:36 +01:00
Simon Cooper
2b4ce40728
[8.14] Update several references to TransportVersion.toString to use toReleaseVersion (#107902) (#107935) 2024-04-26 13:13:11 +01:00
Alexander Spies
73a025ebfe
[8.14] ESQL: Fix MV_DEDUPE when using data from an index (#107577) (#107850)
* ESQL: Fix MV_DEDUPE when using data from an index (#107577)

Correctly label numerical/boolean blocks loaded from indices, so that MV_DEDUPE works correctly.

(cherry picked from commit 70cfe6f016)

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlFeatures.java

* Do not require MvOrdering.SORTED_ASCENDING

Adapt this fix so that we do not have to introduce SORTED_ASCENDING, and
thus do not have to bump the transport version for this to work.

Instead, boolean/numerical doc values are simply labelled as UNORDERED,
which is still correct.
2024-04-24 16:02:48 -04:00
Ignacio Vera
5d0e72af83
Validate stats formatting in standard InternalStats constructor (#107678) (#107851)
We want to validate stats formatting before we serialize to XContent, as chunked x-content serialization
assumes that we don't throw exceptions at that point. It is not necessary to do it in the StreamInput constructor,
as the data it reads was serialized from an already-checked object.

This commit adds stats formatting validation to the standard InternalStats constructor.
2024-04-24 12:03:04 -04:00
Ignacio Vera
a69605c167
Use LogDocMergePolicy in DiversifiedSamplerTests#testDiversifiedSampler (#107826) (#107835)
Similar to other cases, the addition of a new merge policy that reverses the order of the documents in Lucene causes
this test to fail in edge cases. To avoid the randomisation, we hardcode the merge policy to LogDocMergePolicy.
2024-04-24 09:20:37 -04:00
Nhat Nguyen
4adbc8473c
Fix minimized_round_trips in lookup runtime fields (#107785) (#107802)
Today, we have disabled ccs_minimized_round_trips for lookup requests, 
under the assumption that cross-cluster lookups occur when
ccs_minimized_round_trips is disabled in the main search request.
However, this assumption does not hold true for cases where the search
is local but the lookup happens remotely.
2024-04-23 20:45:27 -04:00
James Baiera
48131293ec
Fix bulk NPE when retrying failure redirect after cluster block (#107598) (#107793)
This PR fixes a bug in the bulk operation when retrying blocked cluster states before 
executing a failure store write by correctly wrapping the retry runnable to keep it from 
prematurely returning a null response.
2024-04-23 17:17:42 -04:00
Craig Taverner
d888413525
Fails 3/10000 times, so we're slightly less restrictive with numerical errors (#107679) (#107684)
Fixes #106126
2024-04-22 07:42:43 -04:00
Martijn van Groningen
cbfe5670e1
Address zstd release test failures in CodecTests. (#107477) (#107660)
The tests can only be run when the zstd feature flag is enabled.

Closes #107417
2024-04-19 18:12:04 -04:00
David Turner
b7f37c872a
Fix CONCURRENT_REPOSITORY_WRITERS link (#107603) (#107609)
This page was split up in #104614, but the `ReferenceDocs` symbol still links
to the top-level page rather than the correct subpage. This fixes
the link.
2024-04-18 08:29:03 -04:00
Luca Cavanna
77a23e5d9f
Avoid attempting to load the same empty field twice in fetch phase (#107551)
During the fetch phase, there are a number of stored fields that are requested explicitly or loaded by default. That information is included in the `StoredFieldsSpec` that each fetch sub phase exposes.

We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via `SearchLookup`). This is done in `PreloadedFieldLookupProvider`. The current logic makes values available for fields that have been found, so that scripts or value fetchers that request them don't load them again ad hoc. However, stored fields that don't have a value for a specific doc are treated like any other field that was not requested and are loaded again, even though they will not be found, which causes overhead.

This change makes the list of required stored fields available to `PreloadedFieldLookupProvider`, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad hoc (for instance because a script is requesting them for the first time).

This is an existing issue that became evident when we moved fetching of metadata fields to `FetchFieldsPhase`, which relies on value fetchers, and hence on `SearchLookup`. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document, providing a `SingleFieldVisitor` that will never find a value.

Another existing issue that this PR fixes is that `FetchFieldsPhase` now extends the `StoredFieldsSpec` it exposes to include the metadata fields that the phase is now responsible for loading. As a result, `_ignored` is included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in `StoredFieldLoader#fieldsToLoad`).

Yet another existing issue that this PR fixes is that `_id` has until now always been loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the `PreloadedFieldLookupProvider`. That causes overhead, as the field has already been loaded and should not be loaded once again when explicitly requested.
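The distinction this change introduces can be sketched as follows, assuming preloaded values are kept in a map and the required stored fields in a set. `PreloadedLookupSketch` and its `adHocLoader` callback are hypothetical stand-ins for `PreloadedFieldLookupProvider` and the ad-hoc stored-field load:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not Elasticsearch code: "attempted but absent" is not the same as "never requested".
final class PreloadedLookupSketch {
    private final Map<String, List<String>> preloaded;        // values found for the current doc
    private final Set<String> attempted;                      // stored fields the fetch phase already tried
    private final Function<String, List<String>> adHocLoader; // expensive per-field stored-fields load

    PreloadedLookupSketch(Map<String, List<String>> preloaded, Set<String> attempted,
                          Function<String, List<String>> adHocLoader) {
        this.preloaded = preloaded;
        this.attempted = attempted;
        this.adHocLoader = adHocLoader;
    }

    List<String> get(String field) {
        List<String> values = preloaded.get(field);
        if (values != null) {
            return values;               // loaded and found: reuse it
        }
        if (attempted.contains(field)) {
            return List.of();            // already tried, doc has no value: don't reload
        }
        return adHocLoader.apply(field); // never requested before: load it ad hoc
    }
}
```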
2024-04-18 09:26:32 +02:00
Mary Gouseti
b13aa013de
[DSL] Remove REST APIs for global retention (#107565) (#107596)
(cherry picked from commit 732c7c4c30)
2024-04-18 02:55:47 -04:00
Simon Cooper
eb6af0e6b5
Refactor PathTrie to tidy it up (#107542) 2024-04-17 15:29:22 +01:00
David Turner
a94f2b056a
Always validate node ID on relocation (#107420)
Follow-up to complete the change started in #107407, removing the
temporary compatibility shim.
2024-04-17 05:41:26 -04:00
Armin Braun
1d0c470de0
Stop using ReleasableLock in o.e.c.cache.Cache to save O(10M) in heap (#107555)
I have a couple of heap dumps that show the lock wrapper alone wastes O(10M)
of heap for these things. Also, I suspect the indirection costs
non-trivial performance here in some cases. => let's spend a couple more
lines of code to save that overhead
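The "couple more lines of code" pattern can be sketched as follows: instead of holding `Releasable`-style wrapper objects around the lock, each cache segment uses the JDK `ReentrantReadWriteLock` directly with `try`/`finally`. `CacheSegmentSketch` is hypothetical and is not the actual `o.e.c.cache.Cache` code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Elasticsearch code: no wrapper objects, just the JDK lock.
final class CacheSegmentSketch<K, V> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> map = new HashMap<>();

    V get(K key) {
        lock.readLock().lock(); // shared: many readers may hold it at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    V put(K key, V value) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The trade-off is a few lines of explicit `try`/`finally` per method in exchange for not retaining extra wrapper objects for every segment of every cache.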
2024-04-17 04:55:32 -04:00