The health node might not yet have received the health info from all
nodes when this test executes, resulting in an "unknown" status. We make
the status assertion more lenient to allow for this uncertainty.
Additionally, we add some more assertions for the basic response
structure of the other indicators.
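A hedged sketch of what the more lenient assertion might look like (the response accessor is an assumption; `HealthStatus` and the Hamcrest matchers are real):

```java
// Accept UNKNOWN alongside GREEN while the health node may still be
// waiting on health reports from some nodes. findIndicator() is a
// hypothetical accessor, not the actual test API.
assertThat(
    response.findIndicator("disk").status(),
    anyOf(equalTo(HealthStatus.GREEN), equalTo(HealthStatus.UNKNOWN))
);
```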
We will need to acquire a new listener in the catch block; therefore, we
should not release the listenerRefs in the try-with-resources block,
which is executed before the catch block.
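For context, a minimal self-contained sketch of the ordering (all names here are illustrative, not the production code): the resource's `close()` runs when the try block exits, before the catch block is entered, so anything released by try-with-resources is already gone by the time the catch runs.

```java
public class TryWithResourcesOrder {
    static class Ref implements AutoCloseable {
        @Override
        public void close() {
            System.out.println("released"); // runs before the catch block
        }
    }

    public static void main(String[] args) {
        try (Ref ref = new Ref()) {
            throw new RuntimeException("boom");
        } catch (RuntimeException e) {
            // prints second: the ref was already released, so a retry
            // here must acquire a fresh one
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```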
Relates #108580
ExceptionHelper#useAndSuppress can throw an exception if both input
exceptions have the same root cause. If this happens, the field-caps
request dispatcher might fail to notify the caller of completion. I
found this while running ES|QL with disruptions.
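A minimal sketch of the failure mode (the wiring is hypothetical; the `addSuppressed` behavior is standard Java): when both exceptions unwrap to the same root cause, the root ends up suppressing itself, which `Throwable#addSuppressed` rejects.

```java
public class SelfSuppressionDemo {
    public static void main(String[] args) {
        Exception sharedRoot = new Exception("shared root");
        Exception first = new RuntimeException(sharedRoot);
        Exception second = new RuntimeException(sharedRoot);
        // Unwrapping both exceptions yields the same instance...
        Throwable a = first.getCause();
        Throwable b = second.getCause();
        // ...so suppressing one with the other is self-suppression:
        a.addSuppressed(b); // throws IllegalArgumentException
    }
}
```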
Relates #107347
We noticed a regression in performance for topn last week. It turns out
that we had turned off support for skipping non-competitive docs. We
shouldn't do that!
Closes #108565
A lot of these lists are empty most of the time, so we can save memory
by moving to immutable lists. Found in a heap dump where this saves
about 10M of heap.
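A minimal sketch of the pattern (class and method names are illustrative): `List.copyOf` stores elements compactly, and on OpenJDK every empty copy is the same shared singleton, so the common empty case costs nothing per instance.

```java
import java.util.ArrayList;
import java.util.List;

class CompactLists {
    static List<String> freeze(ArrayList<String> values) {
        // Compact immutable storage; empty inputs collapse to one
        // shared instance on OpenJDK instead of a fresh ArrayList
        // with its own backing array.
        return List.copyOf(values);
    }
}
```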
Change the ingest byte stats to always be returned,
even when they have a value of 0. Add a human-readable
form of the byte stats. Update docs to reflect changes.
Add test confirming that pipelines are run after a reroute.
Fix test of two-stage reroute. Delete pipelines during teardown
so as to not break other tests using the same pipeline name.
Co-authored-by: Joe Gallo <joegallo@gmail.com>
Add the semantic query to the Query DSL, which is used to query semantic_text fields
---------
Co-authored-by: carlosdelest <carlos.delgado@elastic.co>
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Now that mock logging has a single internal appender, the "appender"
suffix for MockLogAppender doesn't make sense. This commit renames the
class to MockLog. It was completely mechanical, done with IntelliJ
renames.
We can dry up the tests a little, remove a branch that is never taken
(equality of response object and `Integer` is always false there)
and remove redundant arguments in the production code to simplify this
code a little.
This commit enables inter-segment search concurrency for numeric terms aggs over long, integer, and short field types.
It estimates the cardinality by computing the min and max value of the shard using the BKD tree. When the estimated
cardinality of the field being aggregated on is lower than the shard size, inter-segment concurrency is enabled.
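A hedged sketch of the estimate (the Lucene `PointValues` and `LongPoint` calls are real; the surrounding method is an assumption):

```java
import java.io.IOException;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.PointValues;

class CardinalityEstimate {
    /** Upper bound on distinct values of a long field, from the BKD min/max. */
    static long estimate(IndexReader reader, String field) throws IOException {
        byte[] min = PointValues.getMinPackedValue(reader, field);
        byte[] max = PointValues.getMaxPackedValue(reader, field);
        if (min == null || max == null) {
            return 0; // no points indexed for this field
        }
        long lo = LongPoint.decodeDimension(min, 0);
        long hi = LongPoint.decodeDimension(max, 0);
        return hi - lo + 1; // values are integers in [lo, hi]
    }
}
```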
This commit fixes a potential multithreading issue with the lib vec
vector scorer.
Since the implementation falls back to a Lucene scorer that needs to
read from the index input, we need to make a copy of the index input.
Otherwise, there is a potential for the stateful index input to be
accessed across threads - which would be bad.
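A minimal sketch of the pattern (the wrapper is hypothetical; `IndexInput#clone` is the standard Lucene API for this):

```java
import org.apache.lucene.store.IndexInput;

// Hand each new scorer an independent clone so the stateful read
// position is never shared across threads.
class PerScorerInput {
    private final IndexInput original;

    PerScorerInput(IndexInput original) {
        this.original = original;
    }

    IndexInput newScorerInput() {
        return original.clone(); // cheap; each clone stays on one thread
    }
}
```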
The fallback is only used when one or the other vector crosses a segment
boundary, which is 16G by default. So the likelihood of this occurring
in practice is small, but the effect is bad.
The fix is deliberately small and targeted, so that it can be
backported. After this change, I'm going to drop the custom VectorScorer
and adapter type, in favour of using the Lucene type directly. These
custom types were initially used when the code lived inside the native
module, where we didn't want to add a dependency on Lucene directly.
This PR is a syntactic change to `registerRepository` in
`RepositoriesService`. I use `SubscribableListener` to make the order of
events explicit and to reduce boilerplate around failure delegation
(`listener.delegateFailureAndWrap`).
It's part of a larger change to the verification logic, which should
take advantage of this "sequential" style of code. #108531
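A hedged sketch of the shape (the step methods are hypothetical; the `SubscribableListener` chaining API is from `org.elasticsearch.action.support`):

```java
// Each step runs after the previous one completes; a failure in any
// step skips the rest and lands in the final listener.
SubscribableListener
    .<Void>newForked(l -> validateRepository(request, l))
    .<ClusterState>andThen((l, ignored) -> registerInClusterState(request, l))
    .addListener(listener);
```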
Add the notion of a "zero page" or "hole" to big arrays. We have some use cases where we build up byte arrays of hundreds of MB that are extremely sparse.
Each page starts out as a "hole" and only gets replaced by a real page from the pool on write, similar to how filesystem holes work.
This change adds a small amount of overhead on the write side but is performance neutral or better on the read side (for sparse arrays we likely get a big improvement from using less CPU cache).
The only change needed outside of the array itself was in CCR; see inline comments for that.
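A minimal, self-contained sketch of the idea (not the actual BigArrays code): pages start as null "holes" that read as zeros and are only materialized on first write.

```java
public class HoleBackedByteArray {
    private static final int PAGE_SIZE = 1 << 14; // page size chosen for illustration

    private final byte[][] pages; // a null entry is a "hole"

    public HoleBackedByteArray(long size) {
        pages = new byte[(int) ((size + PAGE_SIZE - 1) / PAGE_SIZE)][];
    }

    public byte get(long index) {
        byte[] page = pages[(int) (index / PAGE_SIZE)];
        return page == null ? 0 : page[(int) (index % PAGE_SIZE)]; // holes read as zero
    }

    public void set(long index, byte value) {
        int p = (int) (index / PAGE_SIZE);
        if (pages[p] == null) {
            pages[p] = new byte[PAGE_SIZE]; // materialize the hole on first write
        }
        pages[p][(int) (index % PAGE_SIZE)] = value;
    }
}
```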
Current ingest byte stat fields could easily be confused.
Add more descriptive names to make it clear that they do not
count all docs processed by the pipeline.
This moves the logic for finding the offset in a table that we will use
in `LOOKUP` out of a method on `BlockHash` and some complex building logic
in `HashLookupOperator`. Now it lives in a `RowInTable` interface - both
a static builder method and some implementations.
There are three implementations:
1. One that talks to `BlockHash` just like `HashLookupOperator` used to.
Right now it talks to `PackedValuesBlockHash` because it's the only
one whose `lookup` method returns the offset in the original row, but
we'll fix that eventually.
2. A `RowInTable` that works with increasing sequences of integers,
say, `1, 2, 3, 4, 5` - this is fairly simple - it just checks that
the input is between `1` and `5` and, if it is, subtracts `1`. Easy,
obvious, and very fast - a good simple example (see the sketch after
this list).
3. A `RowInTable` that handles empty tables - this just makes
writing the rest of the code simpler. It always returns `null`.
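A hedged sketch of implementation 2 (the interface shape is assumed, not copied from the PR):

```java
// For an ascending sequence like 1, 2, 3, 4, 5 the offset of a value in
// the table is just value - min, with a bounds check.
class AscendingSequenceRowInTable {
    private final int min; // first value in the sequence, inclusive
    private final int max; // last value in the sequence, inclusive

    AscendingSequenceRowInTable(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /** The row offset for {@code value}, or {@code null} if it isn't in the table. */
    Integer rowForValue(int value) {
        return (value >= min && value <= max) ? value - min : null;
    }
}
```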
Currently, we only reinitialize the Lucene internals of the new top if
the tsid changes. This isn't enough. We should always ensure the new top
is reinitialized if necessary, regardless of tsid.
Closes #108727
Check that capabilities required in CSV tests really exist.
This avoids:
- Having old capabilities (we aren't removing them AFAIK, so this may
never happen)
- Typos in capabilities
Currently, such a typo would probably fail in the BWC tests. But this
way we avoid either waiting for those tests or hitting other potential
errors.
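A hedged sketch of the shape of such a check (the capability registry and test accessors are assumptions):

```java
// Fail fast when a CSV test requires a capability that isn't in the
// known registry, catching both typos and stale names.
Set<String> known = EsqlCapabilities.CAPABILITIES;          // assumed registry
for (String required : testCase.requiredCapabilities()) {  // assumed accessor
    assertTrue("unknown capability [" + required + "]", known.contains(required));
}
```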
_This change was extracted from [another
PR](https://github.com/elastic/elasticsearch/pull/108574) where there
was such a typo in a commit._
* add docs and embeddings tutorial pieces
* cleanup openai reference
* Suggested cleanups; add missing div tag
* one more change for clarity (requests per minute)
Some tests use MockLogAppender to assert on a single expected logging
message. The utility method assertThatLogger handles creating the
appender and asserting the expectation. However, some other tests want
to do the same but with multiple expectations. This commit adjusts
assertThatLogger to allow multiple expectations, and converts a few
tests whose helper methods are now obsolete.
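A hedged usage sketch (the service and log messages are hypothetical; the varargs shape follows this change):

```java
MockLog.assertThatLogger(
    () -> service.doWork(),
    MyService.class,
    new MockLog.SeenEventExpectation(
        "start", MyService.class.getCanonicalName(), Level.INFO, "*starting*"),
    new MockLog.SeenEventExpectation(
        "finish", MyService.class.getCanonicalName(), Level.INFO, "*finished*")
);
```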
Add ingested_in_bytes and produced_in_bytes stats to pipeline ingest stats.
These track how many bytes are ingested and produced by a given pipeline.
For efficiency, these stats are recorded for the first pipeline to process a
document. Thus, if a pipeline is called as a final pipeline after a default pipeline,
as a pipeline processor, or after a reroute request, a document will not
contribute to the stats for that pipeline. If a given pipeline has 0 bytes recorded
for both of these stats, due to never being the first pipeline to run on any doc, these
stats will not appear in the pipeline's entry in ingest stats.
This adds the tests from OptimizerRunTests in SQL to ESQL. I've opened
issues and applied the AwaitsFix annotation for the tests that are
currently failing.