This test was difficult to write in the first place. We had to come up
with a threshold for the maximum number of tasks that would be created, but that is
not easy to calculate, as it depends on how quickly such tasks can be created
and executed.
We should rather have started with a higher threshold; the important part
is that we create a total number of tasks that is no longer dependent on the
number of segments, given there are far fewer threads available to execute them.
Closes #116048
Split the test in two: one to verify behaviour with a threshold greater than 1, and a specific test for the edge case of the threshold set to 1. Added a comment that explains the nuance around the behaviour and what influences it.
Closes #106647
This fixes a bug when concurrently executing index requests that have different types for the same field.
(cherry picked from commit 9658940a51)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Resolve pipelines from template if lazy rollover write (#116031)
If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index.
Fixes: #112781
* Remute tests block merge
* Remute tests block merge
Until now if `store.cleanupAndVerify` was called on a store with no
commits, it would throw `IndexNotFoundException`. Based on variable
naming (`metadataOrEmpty`), this appears to be unintentional, though the
issue has been present since the `cleanupAndVerify` method was
introduced.
This change is motivated by #104473 - I would like to be able to use
this method to clean up a store prior to recovery regardless of how
far along a previous recovery attempt got.
This PR adds telemetry for logsdb. However, this change only tracks the
count of indices using logsdb and those that use synthetic source.
Additional stats, such as shard, indexing, and search stats, will be
added in a follow-up, as they require reaching out to data nodes.
* Enable _tier based coordinator rewrites for all indices (not just mounted indices) (#115797)
As part of https://github.com/elastic/elasticsearch/pull/114990 we
enabled using the `_tier` field as part of the coordinator rewrite in
order to skip shards that do not match a `_tier` filter, but only for
fully/partially mounted indices.
This PR enhances the previous work by allowing a coordinator rewrite to
skip shards that will not match the `_tier` query for all indices,
irrespective of their lifecycle state (i.e. hot and warm indices can
now skip shards based on the `_tier` query).
Note, however, that hot/warm indices will not automatically take
advantage of the `can_match` coordinator rewrite (like read-only
indices do); only search requests that surpass the
`pre_filter_shard_size` threshold will.
Relates to
[#114910](https://github.com/elastic/elasticsearch/issues/114910)
(cherry picked from commit 71dfb0689b)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* Fix test compilation
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
When ingesting logs, it's important to ensure that documents are not dropped due to mapping issues, even when dealing with dynamically mapped fields. Elasticsearch provides two key settings that help manage the total number of field mappings and handle situations where this limit might be exceeded:
1. **`index.mapping.total_fields.limit`**: This setting defines the maximum number of fields allowed in an index. If this limit is reached, any further mapped fields would cause indexing to fail.
2. **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: This setting determines whether Elasticsearch should ignore any dynamically mapped fields that exceed the limit defined by `index.mapping.total_fields.limit`. If set to `false`, indexing will fail once the limit is surpassed. However, if set to `true`, Elasticsearch will continue indexing the document but will silently ignore any additional dynamically mapped fields beyond the limit.
To prevent indexing failures due to dynamic mapping issues, especially in logs where the schema might change frequently, we change the default value of **`index.mapping.total_fields.ignore_dynamic_beyond_limit` from `false` to `true` in LogsDB**. This change ensures that even when the number of dynamically mapped fields exceeds the set limit, documents will still be indexed, and additional fields will simply be ignored rather than causing an indexing failure.
This adjustment is important for LogsDB, where dynamically mapped fields may be common, and we want to avoid documents being dropped.
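A minimal sketch of the resulting settings (the index name and the limit value are illustrative; `true` is the new LogsDB default described above):

```
PUT logs-example
{
  "settings": {
    "index.mapping.total_fields.limit": 1000,
    "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
  }
}
```
With these settings, a document whose dynamic fields would push the mapping past 1000 fields is still indexed; the overflow fields are ignored rather than failing the request.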
* Check index setting for source mode in SourceOnlySnapshotRepository
* update
* Revert "update"
This reverts commit 9bbf0490f7.
(cherry picked from commit 37a4ee3102)
* Use directory name as project name for libs (#115720)
The libs projects are configured to all begin with `elasticsearch-`.
While this is desirable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.
This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
* fixes
Since we removed the search workers thread pool in #111099, we execute many
more tasks in the search thread pool, given that each shard search request
parallelizes across slices or even segments (e.g. knn query rewrite). There are
also rare situations where segment-level tasks may parallelize further
(e.g. createWeight), causing the creation of very many tasks for a single
top-level request. These are rather small tasks that previously queued up in
the unbounded search workers queue. With recent improvements in Lucene,
these tasks queue up in the search queue, yet they get executed by the caller
thread while they are still in the queue, and remain in the queue as no-ops
until they are pulled out. We have protection against rejections
based on turning off search concurrency when we have more than maxPoolSize
items in the queue, yet that is not enough if enough parallel requests see
an empty queue and manage to submit enough tasks to fill the queue at once.
That causes rejections for top-level searches that should not be rejected.
This commit introduces wrapping for the executor to limit the number of tasks
that a single search instance can submit to the executor, to prevent the situation
where a single search submits way more tasks than threads available.
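The wrapping can be sketched as follows. This is a hypothetical, simplified illustration (not the actual Elasticsearch wrapper): an `Executor` that caps how many tasks one search may hand to the shared pool, running any tasks over the cap on the caller thread instead of queueing them.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap the number of tasks a single search instance may
// submit to the shared pool; beyond the cap, run the task inline so the queue
// cannot be flooded by one request.
final class ThrottlingExecutor implements Executor {
    private final Executor delegate;
    private final int maxTasks;
    private final AtomicInteger submitted = new AtomicInteger();

    ThrottlingExecutor(Executor delegate, int maxTasks) {
        this.delegate = delegate;
        this.maxTasks = maxTasks;
    }

    @Override
    public void execute(Runnable task) {
        if (submitted.incrementAndGet() <= maxTasks) {
            delegate.execute(task); // under the cap: hand off to the pool
        } else {
            task.run();             // over the cap: execute on the caller thread
        }
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ThrottlingExecutor limited = new ThrottlingExecutor(pool, 3);
        AtomicInteger ran = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            limited.execute(ran::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // All 10 tasks ran, but only 3 were ever queued on the shared pool.
        System.out.println(ran.get());
    }
}
```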
Co-authored-by: Adrien Grand <jpountz@gmail.com>
The index.mode, source.mode, and index.sort.* settings cannot be
modified during restore, as this may lead to data corruption or issues
retrieving _source. This change enforces a restriction on modifying
these settings during restore. While a fine-grained check could permit
equivalent settings, it seems simpler and safer to reject restore
requests if any of these settings are specified.
Relates to #115811, but applies to resize requests.
The index.mode, source.mode, and index.sort.* settings cannot be
modified during resize, as this may lead to data corruption or issues
retrieving _source. This change enforces a restriction on modifying
these settings during resize. While a fine-grained check could allow
equivalent settings, it seems simpler and safer to reject resize
requests if any of these settings are specified.
Currently the thread context is lost between streaming context switches.
This commit ensures that the thread context is properly set each time
before new data is provided to the stream.
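The general pattern can be sketched like this (a hypothetical illustration, not Elasticsearch's actual `ThreadContext` API): capture the caller's context at hand-off time, and restore it around the handler invocation so the context survives the switch to another thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ContextPreserving {
    // Stand-in for a request-scoped thread context.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Wrap a task so it runs with the context captured at hand-off time.
    static Runnable preserving(Runnable task) {
        String captured = CONTEXT.get();   // capture on the submitting thread
        return () -> {
            String previous = CONTEXT.get();
            CONTEXT.set(captured);         // restore before running the handler
            try {
                task.run();
            } finally {
                CONTEXT.set(previous);     // leave the worker thread as found
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CONTEXT.set("request-context");
        // Without the wrapper, the worker thread would see a null context.
        worker.execute(preserving(() -> System.out.println(CONTEXT.get())));
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```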
This commit fixes and unmutes org.elasticsearch.script.StatsSummaryTests:testEqualsAndHashCode.
Previously, there was no guarantee that the doubles added to stats1 and stats2 would be different. In fact, the count could even be zero, which we saw in one particular failure. The simplest fix, to avoid this potential situation, is to ensure that there is at least one value, and that the values added to each stats instance are different.
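The fix described above can be sketched as follows (a hypothetical illustration, not the actual test code): force a non-zero count and offset the second value set so the two instances are guaranteed to differ.

```java
import java.util.Random;

// Hypothetical sketch of the fix: guarantee at least one value per stats
// instance, and make the second set differ from the first, so equality
// between the two instances can never hold by accident.
public class DistinctValues {
    public static void main(String[] args) {
        Random random = new Random();
        int count = 1 + random.nextInt(10);   // at least one value, never zero
        double[] values1 = new double[count];
        double[] values2 = new double[count];
        for (int i = 0; i < count; i++) {
            values1[i] = random.nextDouble();
            values2[i] = values1[i] + 1.0;    // guaranteed different
        }
        // The first pair always differs, so the two stats instances cannot
        // be equal.
        System.out.println(values1[0] != values2[0]);
    }
}
```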
* Backport
* Version fix
* Another
* Fix
* Fix again
* Skip
* One more
* Formatting fix
---------
Co-authored-by: Johannes Fredén <109296772+jfreden@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
As part of ES|QL planning of a cross-cluster search, a field-caps call is done to each cluster and,
if an ENRICH command is present, the enrich policy-resolve API is called on each remote. If a
remote cluster cannot be connected to in these calls, the outcome depends on the
skip_unavailable setting.
For skip_unavailable=false clusters, the error is fatal and will immediately be propagated
back to the client with a top-level error message and a 500 HTTP status code.
For skip_unavailable=true clusters, the error is not fatal. It will be trapped and recorded in the
EsqlExecutionInfo object for the query, marking the cluster as SKIPPED. If the user requested
CCS metadata to be included, the cluster status and connection failure will be present in the
_clusters/details section of the response.
If no clusters can be contacted and they are all marked as skip_unavailable=true, no error will be
returned. Instead, a 200 HTTP status will be returned with no columns and no values. If the
include_ccs_metadata: true setting was included in the query, the errors will be listed in the
_clusters metadata section. (Note: this is also how the _search endpoint works for CCS.)
Partially addresses https://github.com/elastic/elasticsearch/issues/114531
If a stream handler throws an uncaught exception, we should close the
channel and release associated resources to avoid leaving the channel in a
limbo state. This PR does that.
Resolves: ES-9537
Co-authored-by: Yang Wang <yang.wang@elastic.co>
When a numeric setting is too large or too small such that it can't be
parsed at all, the error message is the same as for garbage values. This
commit improves the error message in these cases to be the same as for
normal bounds checks.
Closes #115080
The version randomization has been changed recently with the unintended effect
that now randomized "old" and "new" versions can be the same, and new versions
can even be lower than old versions. This change corrects this by going back to
the previous version randomization logic.
Closes #114593
* fix: correctly update search status for a nonexistent local index
* Check for cluster existence before updating
* Remove unnecessary `println`
* Address review comment: add an explanatory code comment
* Further clarify code comment
(cherry picked from commit ad9c5a0a06)
Forking when an action completes on the current thread is a needlessly heavy-handed
way of preventing stack overflows. Also, we don't need locking/synchronization
to deal with a worker-count + queue-length problem. Both of these allow for
non-trivial optimization even in the current execution model, and this change
also helps with moving to a more efficient execution model by avoiding needless
forking to the search pool in particular.
-> refactored the code to never fork but instead avoid stack-depth issues through use
of a `SubscribableListener`
-> replaced our home-brew queue and semaphore combination with JDK primitives, which
saves blocking synchronization on task start and completion.
We use about 1M of memory for the route stats tracker instances per ES instance.
Making this lazily initialized should come at trivial overhead and in fact
makes the computation of node stats cheaper by saving spurious sums
over 0-valued long adders.
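The lazy-init idea can be sketched like this (a hypothetical, simplified illustration, not the actual Elasticsearch tracker): the `LongAdder` is allocated only when a route is first hit, and reading stats for an idle route costs nothing.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: allocate the per-route tracker only on first use, so
// idle routes allocate nothing and node-stats computation can skip them.
public class LazyStats {
    private volatile LongAdder requests; // null until the route is first hit

    void track() {
        LongAdder adder = requests;
        if (adder == null) {
            synchronized (this) {
                if (requests == null) {
                    requests = new LongAdder(); // allocate lazily, once
                }
                adder = requests;
            }
        }
        adder.increment();
    }

    long count() {
        LongAdder adder = requests;
        return adder == null ? 0 : adder.sum(); // no allocation for idle routes
    }

    public static void main(String[] args) {
        LazyStats stats = new LazyStats();
        System.out.println(stats.count()); // 0, without ever allocating
        stats.track();
        stats.track();
        System.out.println(stats.count());
    }
}
```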
* Add lookup index mode (#115143)
This change introduces a new index mode, lookup, for indices intended
for lookup operations in ES|QL. Lookup indices must have a single shard
and be replicated to all data nodes by default. Aside from these
requirements, they function as standard indices. Documentation will be
added later when the lookup operator in ES|QL is implemented.
* default shard
* minimal
* compile
The approach taken by `ExpressionList` becomes very expensive for large
numbers of indices/datastreams. It implies that large lists of concrete
names (as they are passed down from the transport layer via e.g. security)
are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches
targeting many shards/indices at once and allows for subsequent
optimizations.
The removed tests appear redundant as they tested an implementation
detail of the IndexNameExpressionResolver which itself is well covered
by its own tests.
The blob store may be triggered to create a local directory while in a
reduced privilege context. This commit guards the creation of
directories with doPrivileged.
* Allow for queries on _tier to skip shards during coordinator rewrite (#114990)
The `_tier` metadata field was not used on the coordinator when
rewriting queries in order to exclude shards that don't match. This led
to queries of the following form continuing to report failures even
though the only unavailable shards were in the tier that was excluded
from the search (the frozen tier in this example):
```
POST testing/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"_tier": "data_frozen"
}
}
]
}
}
}
```
This PR addresses this by having the queries that can execute on `_tier`
(term, match, query string, simple query string, prefix, wildcard)
execute a coordinator rewrite to exclude the indices that don't match
the `_tier` query **before** attempting to reach the shards (shards
that might not be available and would raise errors).
Fixes #114910
* Don't use getFirst
* test compilation
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
**Introduction**
> In order to make adoption of failure stores simpler for all users, we
> are introducing a new syntactical feature to index expression
> resolution: the selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will
> allow users to specify which component of an index abstraction they
> would like to operate on within an API call. In this case, an index
> abstraction is a concrete index, data stream, or alias; any abstraction
> that can be resolved to a set of indices/shards. We define a component
> of an index abstraction to be some searchable unit of the index
> abstraction.
>
> To start, we will support two components: data and failures. Concrete
> indices are their own data components, while the data component for
> index aliases is all of the indices contained therein. For data
> streams, the data component corresponds to their backing indices. Data
> stream aliases mirror this, treating all backing indices of the data
> streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data
> stream aliases. The failure component of these abstractions refers to
> the data streams' failure stores. Indices and index aliases do not have
> a failure component.
For more details and examples see
https://github.com/elastic/elasticsearch/pull/113144. All this work has
been cherry picked from there.
**Purpose of this PR**
This PR introduces `::*` as another selector option and not as a
combination of `::data` and `::failure`. The reason for this change is
that we need to differentiate between:
- `my-index::*`, which should resolve to `my-index::data` only and not to `my-index::failures`, and
- a user explicitly requesting `my-index::data, my-index::failures`, which should potentially result in an error.
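The resolution rules above can be illustrated with a few expressions (the names are illustrative):

```
my-data-stream::data      -> the stream's backing indices
my-data-stream::failures  -> the stream's failure store
my-index::data            -> the index itself
my-index::*               -> my-index::data only (indices have no failure component)
```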
Long GC disruption relies on Thread.resume, which is removed in JDK 23.
Tests that use it predate more modern disruption tests. This commit
removes GC disruption and the master disruption tests. Note that tests
relying on this scheme have not been running since JDK 20, which first
deprecated Thread.resume.