Starting with empty rows and growing them causes lots of allocations, and
thus bad performance, when the rows contain many large fields.
Instead, use the size of the previously encountered row to estimate the size of the next one.
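A minimal sketch of the pre-sizing idea, with hypothetical names and a plain `ByteArrayOutputStream` standing in for the real row buffer:

```java
import java.io.ByteArrayOutputStream;

class RowWriter {
    // Seed each row buffer with the previous row's observed size so the
    // common case needs no intermediate grow-and-copy steps.
    private int previousRowSize = 64; // arbitrary initial guess

    byte[] writeRow(byte[][] fields) {
        ByteArrayOutputStream row = new ByteArrayOutputStream(previousRowSize);
        for (byte[] field : fields) {
            row.write(field, 0, field.length);
        }
        previousRowSize = row.size(); // becomes the estimate for the next row
        return row.toByteArray();
    }
}
```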
It's less code, and it actually inlines (avoiding virtual calls in most
cases), to just do the null check here instead of delegating to IOUtils
and then catching the impossible IOException. Also, there's no need to use
`Releasables` in two spots where try-with-resources works just as well and
needs less code.
Noticed this when I saw that we had a lot of strange CPU overhead in
this call in some hot loops like translog writes.
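Roughly the shape of both changes, sketched with a stand-in `Releasable` type rather than the actual Elasticsearch classes:

```java
import java.util.concurrent.locks.ReentrantLock;

class CloseExamples {
    // Stand-in for org.elasticsearch.core.Releasable: close() declares no
    // checked exception, unlike java.io.Closeable.
    interface Releasable extends AutoCloseable {
        @Override
        void close();
    }

    // The inline null check: trivially inlinable, no dispatch through a
    // utility method, and no impossible IOException to catch.
    static void close(Releasable releasable) {
        if (releasable != null) {
            releasable.close();
        }
    }

    // try-with-resources instead of pairing an acquire with Releasables.close(...).
    static void withLock(ReentrantLock lock, Runnable action) {
        lock.lock();
        try (Releasable unlock = lock::unlock) {
            action.run();
        }
    }
}
```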
Our readEnum code instantiates/clones enum value arrays on read.
Normally, this doesn't matter much, but the two spots adjusted here are
visibly hot during bulk indexing, causing GBs of allocations during e.g.
the http_logs indexing run.
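The general pattern behind the fix, as a hedged sketch (the enum and wire format here are made up):

```java
import java.io.DataInput;
import java.io.IOException;

enum Operation {
    INDEX, CREATE, DELETE; // illustrative enum, not the real one

    // Enum.values() clones the backing array on every call; caching it once
    // means hot read paths no longer allocate an array per deserialized value.
    private static final Operation[] VALUES = values();

    static Operation readFrom(DataInput in) throws IOException {
        int ordinal = in.readByte();
        if (ordinal < 0 || ordinal >= VALUES.length) {
            throw new IOException("unknown Operation ordinal [" + ordinal + "]");
        }
        return VALUES[ordinal];
    }
}
```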
This introduces a second implementation of RequestBuilder (#104778). As opposed
to ActionRequestBuilder, ActionRequestLazyBuilder does not create its request
until the request() method is called, and does not hold onto that request (so each
call to request() gets a new request instance).
This PR also updates BulkRequestBuilder to inherit from ActionRequestLazyBuilder
as an example of its use.
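A condensed sketch of the difference between the two builder styles (illustrative classes, not the real ones):

```java
class Request {
    String index;
}

class EagerBuilder { // ActionRequestBuilder-style
    private final Request request = new Request(); // created up-front, mutated in place

    EagerBuilder index(String index) {
        request.index = index;
        return this;
    }

    Request request() {
        return request; // the same instance on every call
    }
}

class LazyBuilder { // ActionRequestLazyBuilder-style
    private String index; // only the settings are stored

    LazyBuilder index(String index) {
        this.index = index;
        return this;
    }

    Request request() {
        Request request = new Request(); // a fresh request per call, nothing retained
        request.index = index;
        return request;
    }
}
```

Because the lazy variant never retains a built request, a long-lived builder doesn't pin request memory between calls, which is presumably why BulkRequestBuilder is a natural first adopter.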
* Document nested expressions for stats
* More docs
* Apply suggestions from review
  - count-distinct.asciidoc
    - Content restructured, moving the section about approximate counts to the end of the doc.
  - count.asciidoc
    - Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows.
  - percentile.asciidoc
    - Moved the note about `PERCENTILE` being approximate and non-deterministic to the end of the doc.
  - stats.asciidoc
    - Clarified the `STATS` command.
    - Added a note indicating that individual `null` values are skipped during aggregation.
* Comment out the mention of a buggy behavior
* Update sum with inline function example, update test file
* Fix typo
* Delete line
* Simplify wording
* Fix conflict, fix typo
---------
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Today, we allow ESQL to execute against an unlimited number of shards
concurrently on each node. This can lead to cases where we open and hold
too many shards at once, which translates into too many open file
descriptors or too much memory used for FieldInfos in ValuesSourceReaderOperator.
This change limits the number of concurrent shards to 10 per node. This
number was chosen based on the _search API, which limits it to 5.
Besides the primary reason stated above, this change has other
implications:
We might execute fewer shards for queries with only a LIMIT, leading to
scenarios where we execute some high-priority shards and then stop.
For now, we don't have a partial reduce at the node level, but if we
introduce one in the future, it might not be as efficient as executing
all shards at the same time. There are also pauses between batches,
because batches execute sequentially. However, I believe the
performance of queries executing against many shards (after can_match)
matters less than resiliency.
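A hedged sketch of the throttling idea using a plain semaphore; the real change batches shard execution inside the ESQL compute engine rather than spawning threads like this:

```java
import java.util.List;
import java.util.concurrent.Semaphore;

class ShardExecutor {
    private final Semaphore permits = new Semaphore(10); // the per-node limit

    void execute(List<Runnable> shardTasks) throws InterruptedException {
        for (Runnable task : shardTasks) {
            permits.acquire(); // blocks once 10 shards are in flight on this node
            new Thread(() -> {
                try {
                    task.run();
                } finally {
                    permits.release();
                }
            }).start();
        }
    }
}
```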
Closes #103666
This adds support for the `match` query type to the Query API key Information API.
Note that since string values associated with API keys are mapped as `keyword`,
a `match` query with no analyzer parameter is effectively equivalent to a `term` query
for such fields (e.g. `name`, `username`, `realm_name`).
Relates: #101691
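A small illustration of that equivalence using the server-side `QueryBuilders` helpers (the field value is hypothetical):

```java
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

class ApiKeyQuerySketch {
    // Because fields like `name` are keyword-mapped, a match query with no
    // analyzer resolves to the same matches as a term query on that field.
    static QueryBuilder byMatch() {
        return QueryBuilders.matchQuery("name", "my-api-key");
    }

    static QueryBuilder byTerm() {
        return QueryBuilders.termQuery("name", "my-api-key");
    }
}
```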
This adds support for the `type` parameter, for sorting, to the Query API key API.
The type of an API key can currently be either `rest` or `cross_cluster`.
This was overlooked in #103695, when support for the `type` parameter
was first introduced for querying only.
SearchStats#count incorrectly counts the number of documents (or rows)
in which a value appears instead of the actual number of values.
This PR fixes this by looking at the term frequency instead of the doc
count.
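The underlying Lucene distinction the fix relies on, sketched:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

class ValueCountSketch {
    // docFreq: documents containing the term at least once (what was counted before).
    static long docsWithTerm(IndexReader reader, Term term) throws IOException {
        return reader.docFreq(term);
    }

    // totalTermFreq: every occurrence of the term across all documents, which
    // is what a value count should report for multi-valued fields (the fix).
    static long valueOccurrences(IndexReader reader, Term term) throws IOException {
        return reader.totalTermFreq(term);
    }
}
```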
Fix #104795
During a promotable relocation, a `get_from_translog` sent by the
unpromotable shard to handle a real-time get might encounter
`ShardNotFoundException` or `IndexNotFoundException`. In these cases,
we should retry.
This is just for `GET`. I'll open a second PR for `mGET`. The relevant
IT is in the Stateless PR.
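A minimal sketch of the retry idea, with stand-in exception types and a local loop (the actual change plugs into the transport-layer retry handling):

```java
import java.util.function.Supplier;

class RealtimeGetRetry {
    static class ShardNotFoundException extends RuntimeException {} // stand-in
    static class IndexNotFoundException extends RuntimeException {} // stand-in

    static <T> T getWithRetry(Supplier<T> getFromTranslog, int maxAttempts)
        throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return getFromTranslog.get();
            } catch (ShardNotFoundException | IndexNotFoundException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the relocation did not settle in time
                }
                Thread.sleep(100L * attempt); // back off, then retry on the new target
            }
        }
    }
}
```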
Relates ES-5727
* [Profiling] Add the number of cores to HostMetadata
* Update AWS pricelist (remove `cost_factor`, add `usd_per_hour`)
* Switch cost calculations from `cost_factor` to `usd_per_hour`
* Remove superfluous CostEntry.toXContent()
* Check for Number type in CostEntry.fromSource()
* Add comment
This change fixes the engine to apply the current codec when retrieving documents from the translog.
We need to use the same codec as the main index in order to ensure that all the source data is indexable.
The internal codec treats some fields differently than the default one; for instance, dense_vectors are limited to 1024 dimensions.
This PR ensures that these customizations are applied when indexing documents for translog retrieval.
Closes #104639
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Clarify that in this situation there is a rebalancing move that would
improve the cluster balance, but there is some reason why rebalancing is
not happening. Also point at the `can_rebalance_cluster_decisions` as
well as the node-by-node decisions, since the action needed could be
described in either place.
BestBucketsDeferringCollector holds the documents and buckets in memory to be replayed to the child
aggregations. These objects can get large, and they are not backed by BigArrays, so let's release them as soon as they
are consumed.
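Sketch of the pattern with made-up types (the real collector buffers doc IDs and bucket ordinals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DeferredReplaySketch {
    // The buffered batches are plain heap objects, not BigArrays-backed, so
    // null each one out the moment it has been replayed instead of holding
    // everything until the aggregation is released.
    private List<long[]> bufferedBuckets = new ArrayList<>();

    void replay(Consumer<long[]> childCollector) {
        for (int i = 0; i < bufferedBuckets.size(); i++) {
            long[] batch = bufferedBuckets.set(i, null); // release as soon as consumed
            childCollector.accept(batch);
        }
        bufferedBuckets = null;
    }
}
```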
This commit addresses 3 problems in the blob cache:
* Fix a race during initChunk where the result would be a fallback to direct read.
* Fix a bug in computeDecay that led to only decaying the first item per frequency.
* Remove the time dependency of the cache by moving to a logical clock (epochs).
Trigger decay whenever freq0 is empty, ensuring we decay slowly or rapidly as needed.
Divide time into epochs, switching to a new one whenever we need to decay. A region
now promotes two freqs per access, but only once per epoch.
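A heavily simplified sketch of the epoch idea (invented names and constants; the real cache's bookkeeping is more involved):

```java
import java.util.concurrent.atomic.AtomicLong;

class EpochLfuSketch {
    private final AtomicLong epoch = new AtomicLong();

    static class Region {
        volatile int freq;
        volatile long lastAccessEpoch = -1;
    }

    void onAccess(Region region) {
        long now = epoch.get();
        if (region.lastAccessEpoch != now) { // only the first hit per epoch promotes
            region.lastAccessEpoch = now;
            region.freq = Math.min(region.freq + 2, 7); // promote two freqs, capped
        }
    }

    // Decay is a logical-clock tick, not a wall-clock timer: advancing the
    // epoch implicitly ages any region that is not touched again.
    void decay() {
        epoch.incrementAndGet();
    }

    int effectiveFreq(Region region) {
        long age = epoch.get() - region.lastAccessEpoch;
        return (int) Math.max(0, region.freq - age);
    }
}
```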
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>