The previous fix to ensure that each thread uses its own source provider wasn't good enough. A read of the `perThreadProvider` field could be stale and therefore return a previously created source provider. Instead, the source provider should be returned from the `provider` local variable.
This change also addresses another issue: sometimes the current docid goes backwards compared to the last seen docid, which causes problems when the synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if a backwards docid is detected.
Closes #118238
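For illustration, a minimal plain-Java sketch of the pattern described above (the class, interface, and field names here are hypothetical, not the actual Elasticsearch code): the field is read into a local exactly once, and a fresh provider is built whenever the thread changes or the docid moves backwards.
```
import java.util.function.Supplier;

/**
 * Illustrative sketch only: one provider per thread, rebuilt whenever the
 * requesting thread changes or the docid moves backwards.
 */
final class PerThreadSourceProvider {

    /** Hypothetical provider that remembers the last docid it served. */
    interface Provider {
        Object loadSource(int docId);
        int lastSeenDocId();
    }

    private final Supplier<Provider> factory;
    // Written by whichever thread used the provider last; a racing read may observe a stale value.
    private volatile Thread lastThread;
    private volatile Provider perThreadProvider;

    PerThreadSourceProvider(Supplier<Provider> factory) {
        this.factory = factory;
    }

    Object loadSource(int docId) {
        // Read the field once into a local, so the checks and the return use the same instance.
        Provider provider = perThreadProvider;
        if (provider == null
            || lastThread != Thread.currentThread()
            || docId < provider.lastSeenDocId()) {
            // New thread, or the docid moved backwards: doc values cannot advance backwards,
            // so build a fresh provider instead of reusing the cached one.
            provider = factory.get();
            perThreadProvider = provider;
            lastThread = Thread.currentThread();
        }
        return provider.loadSource(docId);
    }
}
```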
* ESQL: Opt into extra data stream resolution
This opts ESQL's data node request into extra data stream resolution.
* Update docs/changelog/118378.yaml
* Fix enrich cache size setting name (#117575)
The enrich cache size setting accidentally got renamed from
`enrich.cache_size` to `enrich.cache.size` in #111412. This commit
updates the enrich plugin to accept both names and deprecates the
wrong name.
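As a rough plain-Java sketch of the intended behaviour (the real change goes through the Elasticsearch settings infrastructure; the default value below is hypothetical): the original name is preferred, while the accidental name is still accepted but triggers a deprecation warning.
```
import java.util.Map;
import java.util.logging.Logger;

/** Illustrative sketch only; not the Elasticsearch Setting infrastructure. */
final class EnrichCacheSizeResolver {
    private static final Logger logger = Logger.getLogger(EnrichCacheSizeResolver.class.getName());

    static final String CACHE_SIZE = "enrich.cache_size";            // the intended name
    static final String DEPRECATED_CACHE_SIZE = "enrich.cache.size"; // accidental rename, still accepted
    static final long DEFAULT_SIZE = 1000;                            // hypothetical default

    static long resolve(Map<String, String> settings) {
        if (settings.containsKey(CACHE_SIZE)) {
            return Long.parseLong(settings.get(CACHE_SIZE));
        }
        if (settings.containsKey(DEPRECATED_CACHE_SIZE)) {
            // Accept the wrong name for backwards compatibility, but warn so users migrate.
            logger.warning("[" + DEPRECATED_CACHE_SIZE + "] is deprecated, use [" + CACHE_SIZE + "] instead");
            return Long.parseLong(settings.get(DEPRECATED_CACHE_SIZE));
        }
        return DEFAULT_SIZE;
    }
}
```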
* Remove `UpdateForV10` annotation
Today, when an ES|QL task encounters an exception, we trigger a
cancellation on the root task, causing child tasks to fail due to
cancellation. We chose not to include cancellation exceptions in the
output, as they are unhelpful and add noise during problem analysis.
However, these exceptions are still slipping through via
RefCountingListener. This change addresses the issue by introducing
ESQLRefCountingListener, ensuring that no cancellation exceptions are
returned.
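A simplified, self-contained illustration of the idea, not the actual ESQLRefCountingListener: the first non-cancellation failure is kept, cancellation exceptions are dropped, and the caller is notified once all child tasks have reported. `java.util.concurrent.CancellationException` stands in here for Elasticsearch's task-cancelled exception.
```
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Simplified illustration: a ref-counted completion listener that drops cancellation failures. */
final class FilteringRefCountingListener {
    private final AtomicInteger pending;
    private final AtomicReference<Exception> firstFailure = new AtomicReference<>();
    private final Consumer<Exception> onComplete; // receives null on success, or the first real failure

    FilteringRefCountingListener(int participants, Consumer<Exception> onComplete) {
        this.pending = new AtomicInteger(participants);
        this.onComplete = onComplete;
    }

    void onChildSuccess() {
        countDown();
    }

    void onChildFailure(Exception e) {
        // Cancellation is a consequence of the root failure, not a cause: keep it out of the result.
        if (e instanceof CancellationException == false) {
            firstFailure.compareAndSet(null, e);
        }
        countDown();
    }

    private void countDown() {
        if (pending.decrementAndGet() == 0) {
            onComplete.accept(firstFailure.get());
        }
    }
}
```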
Some mock verifies were missing, and `LicenseState#copyCurrentLicenseState(...)` wasn't always mocked.
Because of the incorrect mocking, the testGoldOrPlatinumLicenseCustomCutoffDate() test had an incorrect assertion.
* ESQL: Fix a bug in LuceneQueryExpressionEvaluator
This fixes a Lucene usage bug in `LuceneQueryExpressionEvaluator`, the
evaluator we plan to use to run things like `MATCH` when we *can't* push
it to a source operator. That'll be useful for things like:
```
FROM foo
| STATS COUNT(),
COUNT() WHERE MATCH(message, "error")
```
Explanation:
When using Lucene's `Scorer` and `BulkScorer` you must stay on the same
thread. It's a rule. Most of the time nothing bad happens if you shift
threads, but sometimes things explode and Lucene doesn't work. Driver
can shift from one thread to another - that's just how it's designed.
It's a "yield after running a while" kind of thing.
In tests we sometimes get a version of the `Scorer` and `BulkScorer`
that assert that you don't shift threads. That is what caused this test
failure.
Anyway! This builds protection into `LuceneQueryExpressionEvaluator` so
that if it *does* shift threads then it'll rebuild the `Scorer` and
`BulkScorer`. That makes the test happy and makes even the grumpiest
Lucene object happy.
Closes #116879
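A sketch of the protection described above, using Lucene's public `Weight#scorer(LeafReaderContext)` API; the holder class is made up for illustration and is not the actual `LuceneQueryExpressionEvaluator` code.
```
import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

/** Illustrative sketch: rebuild the Scorer if the driver resumed on a different thread. */
final class ThreadSafeScorerHolder {
    private final Weight weight;
    private final LeafReaderContext context;

    private Scorer scorer;
    private Thread creationThread;

    ThreadSafeScorerHolder(Weight weight, LeafReaderContext context) {
        this.weight = weight;
        this.context = context;
    }

    Scorer scorer() throws IOException {
        // Lucene scorers must stay on the thread that created them. The driver may yield
        // and resume on another thread, so detect that and build a new scorer.
        // The same idea applies to BulkScorer via Weight#bulkScorer(LeafReaderContext).
        if (scorer == null || creationThread != Thread.currentThread()) {
            scorer = weight.scorer(context); // may be null if nothing matches in this segment
            creationThread = Thread.currentThread();
        }
        return scorer;
    }
}
```
Note that a rebuilt scorer starts over at the beginning of the segment, so a real implementation also has to re-position its iterator; this sketch only shows the thread check.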
The test setup for `ProfileIntegTests` is flawed: the full name of
a user can be a substring of other profile names (e.g., `SER` is a
substring of `User <random-string>-space1`). When that's passed into the
suggest call with the `*` space, we get a match on all profiles, instead
of only the one profile expected in the test, since we are matching on
e.g. `SER*`. This PR restricts the setup to avoid the wildcard profile
for that particular test.
Closes: https://github.com/elastic/elasticsearch/issues/117782
* Address mapping and compute engine runtime field issues (#117792)
This change addresses the following issues:
* Fields mapped as runtime fields not getting stored if source mode is synthetic.
* A java.io.EOFException when an ES|QL query uses multiple runtime fields that fall back to source when source mode is synthetic. (1)
* A concurrency issue when runtime fields get pushed down to Lucene. (2)
1: ValueSourceOperator can read values in a row-striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthesize source, the same SourceProvider can end up evaluating the same range of doc ids multiple times. This can then result in unexpected I/O errors at the codec level, because the same doc value instances are reused by the SourceProvider. Re-evaluating the same doc ids violates the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which document that unexpected behaviour can occur if the target docid is lower than the current docid position.
Note that this is only an issue for the synthetic source loader, not for the stored source loader, and not when executing in row-stride fashion, which sometimes happens in the compute engine and always happens in the _search API.
2: A concurrency issue arises with the source provider when the source operator executes in parallel with data partitioning set to DOC. The same SourceProvider instance then gets accessed by multiple threads concurrently, and SourceProvider implementations are not designed to handle concurrent access.
Closes #117644
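To make the `DocIdSetIterator#advance(...)` contract mentioned in (1) concrete, here is a small stand-alone Lucene example (not Elasticsearch code) with a guard that refuses to move to a target at or below the current position, which is exactly what re-evaluating the same docid range would do.
```
import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;

/** Demonstrates the advance() contract that the columnar read path was violating. */
final class AdvanceContract {

    /**
     * advance(target) with target <= docID() is undefined behaviour, so callers that may
     * revisit a doc range must not reuse the same iterator; they need a fresh one instead.
     */
    static int advanceOrFail(DocIdSetIterator iterator, int target) throws IOException {
        if (target <= iterator.docID()) {
            throw new IllegalStateException(
                "cannot advance backwards: current=" + iterator.docID() + " target=" + target);
        }
        return iterator.advance(target);
    }

    public static void main(String[] args) throws IOException {
        DocIdSetIterator iterator = DocIdSetIterator.all(100); // matches docids 0..99
        System.out.println(advanceOrFail(iterator, 10));       // prints 10
        try {
            advanceOrFail(iterator, 5);                        // would move backwards
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```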
* fixed compile error after backporting
We identified a BWC bug in the cluster compute request. Specifically,
the indices options were not properly selected for requests from an
older querying cluster. This caused the search_shards API on the remote
cluster to use restricted indices options, leading to failures when
resolving wildcard index patterns.
Our tests didn't catch this issue because the current BWC tests for
cross-cluster queries only cover one direction: the querying cluster on
the current version and the remote cluster on a compatible version.
This PR fixes the issue and expands BWC tests to support both
directions: the querying cluster on the current version with the remote
cluster on a compatible version, and vice versa.
* Don't skip shards in coord rewrite if timestamp is an alias (#117271)
The coordinator rewrite has logic to skip indices if the provided date range
filter is not within the min and max range of all of its shards. This mechanism
is enabled for event.ingested and @timestamp fields, against searchable snapshots.
We have basic checks that such fields need to be of date field type, yet if they
are defined as an alias of a date field, their range will be empty, which indicates
that the shards are empty, and the coord rewrite logic resolves the alias and
ends up skipping shards that may have matching docs.
This commit adds an explicit check that declares the range UNKNOWN instead of EMPTY
in these circumstances. The same check is also performed in the coord rewrite logic,
so that shards are no longer skipped by mistake.
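Purely as a hypothetical model of the decision (the enum, record, and helper below are invented for illustration; the real change operates on Elasticsearch's field range metadata): only a concrete date field may ever report an EMPTY range, while an alias yields UNKNOWN so its shards stay in the search.
```
/** Hypothetical model of the check; not the actual Elasticsearch types. */
final class TimestampRangeCheck {

    enum FieldRange {
        EMPTY,    // field provably has no values: shards may be skipped
        UNKNOWN   // nothing can be concluded: shards must be searched
    }

    record FieldInfo(String name, String type, boolean isAlias) {}

    static FieldRange rangeFor(FieldInfo field, boolean hasValues) {
        // An alias of a date field carries no range information of its own. Reporting EMPTY
        // would tell the coordinator rewrite that no docs can match, so it would skip shards
        // that may in fact contain matches; UNKNOWN keeps those shards in the search.
        if (field.isAlias() || "date".equals(field.type()) == false) {
            return FieldRange.UNKNOWN;
        }
        // A populated concrete date field would return its real min/max range here.
        return hasValues ? FieldRange.UNKNOWN : FieldRange.EMPTY;
    }
}
```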
* fix compile
> **java.lang.AssertionError: Leftover exchanges ExchangeService{sinks=[veZSyrPATq2Sg83dtgK3Jg:700/3]} on node node_s4**
I looked into the test failure described in
https://github.com/elastic/elasticsearch/issues/117253. The reason we
don't clean up the exchange sink quickly is that, once a failure occurs,
we cancel the request along with all its child requests. These exchange
sinks will be cleaned up only after they become inactive, which by
default takes 5 minutes.
We could override the `esql.exchange.sink_inactive_interval` setting in
the test to remove these exchange sinks faster. However, I think we
should allow exchange requests that close exchange sinks to bypass
cancellation, enabling quicker resource cleanup than the default
inactive interval.
Closes #117253
* ESQL: fix COUNT filter pushdown (#117503)
If the `COUNT` agg has a filter applied, the filter must also be pushed down to source. This currently does not happen, but the issue is masked by two factors:
* a logical optimisation, `ExtractAggregateCommonFilter`, that extracts the filter out of the STATS entirely (and then pushes it to source from a `WHERE`);
* the physical plan optimisation implementing the push down, `PushStatsToSource`, currently only applies if there's just one agg function to push down.
However, this fix needs to be applied since:
* the defect is still present in versions prior to the introduction of `ExtractAggregateCommonFilter`;
* the defect might resurface when the restriction in `PushStatsToSource` is lifted.
Fixes #115522.
(cherry picked from commit 560e0c5d04)
* 8.17 adaptation
Currently, we have three clients fetching pages by default, each with
its own lifecycle. This can result in scenarios where more than one
request is sent to complete the remote sink. While this does not cause
correctness issues, it is inefficient, especially for cross-cluster
requests. This change tracks the status of the remote sink and tries to
send only one finish request per remote sink.
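A minimal sketch of the tracking idea, assuming a hypothetical per-sink tracker (not the actual exchange code): whichever client wins the compare-and-set is the only one that sends the finish request.
```
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch: ensure at most one "finish" request is sent per remote sink. */
final class RemoteSinkTracker {
    private final AtomicBoolean finishRequested = new AtomicBoolean();

    /**
     * Several fetch clients may observe that the remote sink is done; only the first
     * caller to flip the flag should actually send the finish request.
     */
    boolean tryMarkFinishRequested() {
        return finishRequested.compareAndSet(false, true);
    }
}
```
Each fetching client would check `tryMarkFinishRequested()` before sending its finish request, so the other clients simply skip it.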
* Add test and fix
* Update docs/changelog/117595.yaml
* Remove test which wasn't working
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Indicates whether the es.mapping.synthetic_source_fallback_to_stored_source.cutoff_date_restricted_override system property has been configured.
A follow-up to #116647
* Adjust SyntheticSourceLicenseService (#116647)
Allow gold and platinum licenses to use synthetic source for a limited time. If the start time of a license is before the cutoff date, then gold and platinum licenses will not fall back to stored source when synthetic source is used.
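A rough sketch of the cutoff-date rule, assuming a placeholder date and license-type strings (the real values and types live in SyntheticSourceLicenseService):
```
import java.time.Instant;

/** Illustrative sketch of the cutoff-date rule; the date below is a placeholder, not the real one. */
final class SyntheticSourceFallbackCheck {

    // Hypothetical placeholder cutoff; the actual value lives in SyntheticSourceLicenseService.
    private static final Instant CUTOFF_DATE = Instant.parse("2025-01-01T00:00:00Z");

    /**
     * Gold and platinum licenses that started before the cutoff date keep using synthetic
     * source; after the cutoff, those license levels fall back to stored source.
     */
    static boolean fallbackToStoredSource(String licenseType, Instant licenseStartTime) {
        boolean goldOrPlatinum = "gold".equals(licenseType) || "platinum".equals(licenseType);
        if (goldOrPlatinum && licenseStartTime.isBefore(CUTOFF_DATE)) {
            return false; // within the grace period: keep synthetic source
        }
        return goldOrPlatinum;
    }
}
```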
Co-authored-by: Nikolaj Volgushev <n1v0lg@users.noreply.github.com>
* spotless
---------
Co-authored-by: Nikolaj Volgushev <n1v0lg@users.noreply.github.com>
In the ManyShardsIT#testRejection test, we intercept exchange requests
and fail them with EsRejectedExecutionException, verifying that we
return a 400 response instead of a 500.
The issue with the current test is that, if a data-node request never
arrives because the whole request was canceled after the exchange
request failed, the leftover exchange sink remains until it times out,
which defaults to 5 minutes. This change adjusts the test to use a
single data node and ensures exchange requests are only failed after the
data-node request has arrived.
Closes #112406 Closes #112418 Closes #112424
Each data-node request involves two exchange sinks: an external one for
fetching pages from the coordinator and an internal one for node-level
reduction. Currently, the test selects one of these sinks randomly,
leading to assertion failures. This update ensures the test consistently
selects the external exchange sink.
Closes #117397
[esql] > Unexpected error from Elasticsearch: illegal_state_exception - sink exchanger for id [ruxoDDxXTGW55oIPHoCT-g:964613010] already exists.
This issue occurs when two or more clusterAliases point to the same
physical remote cluster. The exchange service assumes the destination is
unique, which is not true in this topology. This PR addresses the
problem by appending a suffix based on a monotonically increasing number,
ensuring that different exchanges are created in such cases.
Another issue arising from this behavior is that data on a remote
cluster is processed multiple times, leading to incorrect results. I can
work on the fix for this once we agree that this is an issue.
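A minimal sketch of the suffix idea, with a made-up id format (the real exchange ids look different): a shared counter guarantees uniqueness even when two aliases resolve to the same physical cluster.
```
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch: make exchange ids unique when cluster aliases share a physical cluster. */
final class ExchangeIdGenerator {
    private static final AtomicLong SUFFIX = new AtomicLong();

    /**
     * Two aliases pointing at the same remote cluster would otherwise produce the same
     * session id on that cluster; the monotonically increasing suffix keeps them distinct.
     */
    static String uniqueExchangeId(String sessionId) {
        return sessionId + ":" + SUFFIX.incrementAndGet();
    }
}
```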
ES|QL doesn't work well with 500 clusters or clusters with 500 nodes.
The reason is that, during the initialization of the exchange for each
target (cluster or node), we enqueue three tasks to the thread pool
queue, which has a limit of 1000: with 500 targets that is 1,500 tasks,
exceeding the limit. This simple PR reduces it to one task per target.
I'm considering using AsyncProcessor for these requests, but that will
be a follow-up issue for later.
* Fix DeBERTa tokenizer bug caused by a bug in the normalizer that made offsets negative
* Update docs/changelog/117189.yaml
(cherry picked from commit 5500a5ec68)