This logic will need a bit of adjustment for bulk query execution.
Let's DRY it up first so we don't have to copy and paste the fix, which
will be a couple of lines.
This PR addresses issues around aggregation cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregation collection we respect cancellation via the mechanisms in the searcher that poison cancelled queries. But once the aggregation finishes collecting, there is no further need to interact with the searcher, so we cannot rely on it for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade-off: the check itself is somewhat expensive (it involves a volatile read), so we want to check often enough that cancelled queries don't tie up resources for a long time, but not so frequently that we slow down the typical aggregation query. Our first attempt at this is to check once when we go to build sub-aggregations, since the worst cases we've seen involve building deep sub-aggregation trees. Sub-aggregation construction also provides a conveniently centralized method call to add the check to.
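As a rough sketch of that shape (illustrative names only, not the actual
Aggregator code), the centralized check looks something like:

```java
// Sketch only: a centralized hook that checks a cancellation flag once
// per sub-aggregation build step.
import java.util.concurrent.atomic.AtomicBoolean;

class AggregatorSketch {
    // The real implementation reads the task's cancellation state; either
    // way the check boils down to a volatile read.
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    // Hypothetical centralized entry point for building sub-aggregations.
    Object buildSubAggregations(long owningBucketOrd) {
        // Once per sub-tree: cheap enough for the common case, frequent
        // enough to abort deep result-tree construction.
        if (cancelled.get()) {
            throw new RuntimeException("aggregation was cancelled");
        }
        // ... recurse into child aggregators and assemble the results ...
        return null;
    }
}
```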
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Nik Everett <nik9000@gmail.com>
The node environment has many paths. The accessors for these currently
use a "file" suffix, but they are always directories. This commit
renames the accessors to make it clear these paths are directories.
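For illustration (hypothetical accessor names, not the exact ones in
NodeEnvironment), the rename is of this shape:

```java
import java.nio.file.Path;

interface NodePathsBefore {
    Path dataFile(); // misleading: the returned path is a directory
}

interface NodePathsAfter {
    Path dataDir();  // same path, name now says what it is
}
```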
This causes the ESQL heap attack tests to grow their memory usage if
they don't first cause a circuit breaking exception. The test just tries
again with more data. That's slow, but it should stop these tests from
failing quite as much, and it'll give us even more information about
failures.
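A minimal sketch of the retry strategy (the helper names are
hypothetical, not the real test code):

```java
class HeapAttackRetrySketch {
    static class CircuitBreakingException extends RuntimeException {}

    interface Attempt {
        void run(int docs) throws CircuitBreakingException;
    }

    // Double the data size until the attempt trips the circuit breaker.
    static int growUntilCircuitBreak(Attempt attempt, int initialDocs) {
        int docs = initialDocs;
        while (true) {
            try {
                attempt.run(docs);
            } catch (CircuitBreakingException e) {
                return docs; // got the failure the test asserts on
            }
            docs *= 2; // slow, but much less flaky than one fixed size
        }
    }
}
```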
Closes #121465
We experimented with using synthetic source for recovery and observed a
quite positive impact on indexing throughput in our nightly Rally
benchmarks. As a result, here we enable it by default when synthetic
source is used. More precisely, if the `index.mapping.source.mode`
setting is `synthetic`, we derive recovery source from synthetic source.
Moreover, synthetic source recovery is enabled behind a feature flag.
That allows us to enable it in snapshot builds, which in turn lets us
see performance results in the nightly Rally benchmarks.
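For reference, the trigger is just the source mode setting; a sketch of
the relevant index settings (no extra recovery setting is needed once
the feature flag is enabled):

```java
import org.elasticsearch.common.settings.Settings;

class SyntheticRecoverySketch {
    static Settings indexSettings() {
        return Settings.builder()
            // with this mode, recovery source is derived from synthetic
            // source by default (behind the feature flag)
            .put("index.mapping.source.mode", "synthetic")
            .build();
    }
}
```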
Stateless has a subclass of PersistedClusterStateService which needs to
know the value of supportsMultipleProjects. Due to how a node is
initialized, the value is not known until the plugin components are
created, which is after the stateless service is created. Therefore this
PR changes the parameter from a boolean to a BooleanSupplier so that it
can be resolved later, when actually used.
The alternative is to change the PersistedClusterStateServiceFactory
interface so that it takes a projectResolver or a boolean
supportsMultipleProjects. I prefer not to change the interface. The
creation of the stateless PersistedClusterStateService already uses two
other suppliers, so a boolean supplier feels more aligned. The other
reason is that supportsMultipleProjects may not be needed in the long
term, so it is better not to bake it into an interface.
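A simplified sketch of the change (illustrative class, not the real
PersistedClusterStateService):

```java
import java.util.function.BooleanSupplier;

class PersistedClusterStateServiceSketch {
    private final BooleanSupplier supportsMultipleProjects;

    PersistedClusterStateServiceSketch(BooleanSupplier supportsMultipleProjects) {
        // Not resolved here: at construction time the value is unknown.
        this.supportsMultipleProjects = supportsMultipleProjects;
    }

    void writeClusterState() {
        // Resolved only when actually used, by which point the plugin
        // components have been created.
        if (supportsMultipleProjects.getAsBoolean()) {
            // ... multi-project code path ...
        }
    }
}
```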
Relates: MP-1880
This patch adds the needed data generator and source matcher to include
counted_keyword fields in our randomized testing.
This patch also updates the source matcher so that field-specific
matchers are checked before the generic matcher is used. This appears to
be the correct behavior; the generic matcher was only checked first as a
workaround for issue #111916, which is now closed.
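A sketch of the new ordering (hypothetical types, not the actual test
infrastructure):

```java
import java.util.Map;
import java.util.Optional;

class SourceMatcherSketch {
    interface FieldMatcher {
        boolean matches(Object expected, Object actual);
    }

    private final Map<String, FieldMatcher> fieldSpecific;
    private final FieldMatcher generic;

    SourceMatcherSketch(Map<String, FieldMatcher> fieldSpecific, FieldMatcher generic) {
        this.fieldSpecific = fieldSpecific;
        this.generic = generic;
    }

    boolean match(String fieldType, Object expected, Object actual) {
        // Field-specific matcher wins; the generic matcher is only the
        // fallback now that the #111916 workaround is gone.
        return Optional.ofNullable(fieldSpecific.get(fieldType))
            .orElse(generic)
            .matches(expected, actual);
    }
}
```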
Under very unfortunate conditions, tests that check xContent objects'
round-trip parsing (e.g. SearchHitsTests#testFromXContent) can fail when
we happen to randomly pick the YAML xContent type and create random
(realistic) Unicode character sequences that contain the character
U+0085 (133) from the Latin-1 code page. That specific character doesn't
get parsed back to its original form for YAML xContent, which can lead
to rare but hard-to-diagnose test failures.
This change adds logic to AbstractXContentTestCase#test(), which lies at
the core of most of our xContent round-trip tests, to disallow test
instances containing that particular character when using the YAML
xContent type.
Closes #97716
This reverts commit ae0f1a64b5.
The refresh block would be removed in a subsequent cluster state update
instead of immediately after an index is ready for searches.
Closes ES-10697
Indices from different projects could have identical mappings and hence
an identical mapping hash. Unused mapping hashes are computed for a
project scope, so the deletion must also be project-scoped to avoid
deleting mapping hashes that belong to a different project.
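An illustrative sketch of the scoping (hypothetical store interface):

```java
import java.util.Set;

class MappingHashCleanupSketch {
    interface HashStore {
        Set<String> hashesInUse(String projectId);
        Set<String> storedHashes(String projectId);
        void delete(String projectId, String hash);
    }

    static void deleteUnused(HashStore store, String projectId) {
        Set<String> inUse = store.hashesInUse(projectId);
        for (String hash : store.storedHashes(projectId)) {
            if (inUse.contains(hash) == false) {
                // project-scoped delete: an identical hash owned by a
                // different project is left alone
                store.delete(projectId, hash);
            }
        }
    }
}
```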
Resolves: ES-10568
Reenables some heap attack tests, bumping their memory requirements to
try and force a failure on all CI machines. Previously some CI machines
weren't failing, invalidating the test on those machines.
Closes #121481
Closes #121465
Under very unfortunate conditions, tests that check xContent objects'
round-trip parsing (e.g.
[SearchHitsTests#testFromXContent](https://github.com/elastic/elasticsearch/issues/97716))
can fail when we happen to randomly pick the YAML xContent type and
create random (realistic) Unicode character sequences that contain the
character U+0085 (133) from the [Latin-1 code
page](https://de.wikipedia.org/wiki/Unicodeblock_Lateinisch-1,_Erg%C3%A4nzung).
That specific character doesn't get parsed back to its original form for
YAML xContent, which can lead to [rare but hard to diagnose test
failures](https://github.com/elastic/elasticsearch/issues/97716#issuecomment-2464465939).
This change adds logic to AbstractXContentTestCase#test(), which lies at
the core of most of our xContent round-trip tests, to disallow test
instances containing that particular character when using the YAML
xContent type, as sketched below.
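A minimal sketch of the guard (the real check lives in
AbstractXContentTestCase#test()):

```java
class YamlUnicodeGuardSketch {
    // YAML 1.1 treats U+0085 (NEL) as a line break, so it does not
    // survive a YAML round trip; the other content types preserve it.
    static boolean allowed(String rendered, boolean isYaml) {
        return isYaml == false || rendered.indexOf('\u0085') < 0;
    }
}
```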
Closes #97716
* ESQL: Expand HeapAttack for LOOKUP
This expands the heap attack tests for LOOKUP. Now there are three
flavors:
1. LOOKUP a single geo_point - about 30 bytes or so.
2. LOOKUP a 1mb string.
3. LOOKUP no fields - just JOIN to alter cardinality.
Fetching a geo_point is fine with about 500 repeated docs before it
circuit breaks, which works out to about 256mb of buffered results.
That's sensible on our 512mb heap and likely to work ok for most folks.
We'll flip to a streaming method eventually and this won't be a problem
any more. But for now, we buffer.
The no-lookup-fields flavor is fine with something like 7500 matches per
incoming row. That's quite a lot, really.
The 1mb string is trouble! We circuit break properly which is great and
safe, but if you join 1mb worth of columns in LOOKUP you are going to
need bigger heaps than our test. Again, we'll move from buffering these
results to streaming them and it'll work better, but for now we buffer.
Adds non-grouping support for min, max, sum, and count, using
CompositeBlock as the underlying block type and an internal
FromAggregateMetricDouble function to handle converting from
CompositeBlock to the correct metric subfields.
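Very roughly, and with stand-in types rather than the real ESQL block
APIs, the conversion picks the subfield that matches the requested
aggregation:

```java
import java.util.Map;

class FromAggregateMetricDoubleSketch {
    enum Metric { MIN, MAX, SUM, COUNT }

    // stand-in for one row of a CompositeBlock: a value per subfield
    record CompositeRow(Map<Metric, Number> subfields) {
        // e.g. MIN(agg_metric_field) reads only the min subfield
        Number extract(Metric metric) {
            return subfields.get(metric);
        }
    }
}
```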
Closes #110649
This commit adds compatibility tests that target ES revisions that align with specific Lucene versions. In this case, we are intending to upgrade from Lucene 10.0 to 10.1. Since no on-prem Elasticsearch release exists with 10.0, we need another method to ensure compatibility with Lucene 10.0 indices.
The work here is a bit hacky since all our compatibility testing infrastructure is centered around versions, and we're now effectively doing compatibility tests between two different revisions of Elasticsearch that both report the same version. Ideally this specific testing would be replaced by unit tests, rather than reusing our full cluster restart tests for this purpose.
We'll also want to bump the commit referenced in the CI pipelines here to align with the last commit using Lucene 10.0.
Today, Elasticsearch supports two models to establish secure connections
and trust between two Elasticsearch clusters:
- API key based security model
- Certificate based security model
This PR deprecates the _certificate based security model_ in favour of the *API key based security model*.
The _API key based security model_ is the preferred way to configure remote clusters,
as it allows following security best practices when setting up remote cluster connections
and defining fine-grained access control.
Users are encouraged to migrate remote clusters from certificate to API key authentication.
This commit enhances the ShardStartedClusterStateTaskExecutor by
introducing functionality to automatically remove the
INDEX_REFRESH_BLOCK once an index becomes searchable.
The change ensures search availability by checking that at least one
copy of each searchable shard is available whenever an unpromotable
shard is started. Once this condition is met, the INDEX_REFRESH_BLOCK
is removed.
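A simplified sketch of the condition (hypothetical types, not the real
routing table API):

```java
import java.util.List;

class RefreshBlockSketch {
    interface ShardTable {
        // availability of each copy, per searchable shard
        List<List<Boolean>> searchableCopiesPerShard(String index);
    }

    static boolean readyForSearch(ShardTable table, String index) {
        // at least one available copy of every searchable shard
        return table.searchableCopiesPerShard(index).stream()
            .allMatch(copies -> copies.stream().anyMatch(a -> a));
    }
    // When this flips to true as an unpromotable shard starts, the
    // executor removes INDEX_REFRESH_BLOCK in that cluster state update.
}
```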
Closes ES-10278