Speeds up the VALUES agg when collecting from many buckets.
Specifically, this speeds up the algorithm used to `finish` the
aggregation. Most specifically, this makes the algorithm more tollerant
to large numbers of groups being collected. The old algorithm was
`O(n^2)` with the number of groups. The new one is `O(n)`
```
(groups)
1 219.683 ± 1.069 -> 223.477 ± 1.990 ms/op
1000 426.323 ± 75.963 -> 463.670 ± 7.275 ms/op
100000 36690.871 ± 4656.350 -> 7800.332 ± 2775.869 ms/op
200000 89422.113 ± 2972.606 -> 21920.288 ± 3427.962 ms/op
400000 timed out at 10 minutes -> 40051.524 ± 2011.706 ms/op
```
The `1` group version was not changed at all. That's just noise in the
measurement. The small bump in the `1000` case is almost certainly worth
it and real. The huge drop in the `100000` case is quite real.
To avoid having AggregateMapper find aggregators based on their names with reflection, I'm doing some changes:
- Make the suppliers have methods returning the intermediate states
- To allow this, the suppliers constructor won't receive the chanells as params. Instead, its methods will ask for them
- Most changes in this PR are because of this
- After those changes, I'm leaving AggregateMapper still there, as it still converts AggregateFunctions to its NamedExpressions
```
before after
(operation) Score Error Score Error Units
coalesce_2_noop 75.949 ± 3.961 -> 0.010 ± 0.001 ns/op 99.9%
coalesce_2_eager 99.299 ± 6.959 -> 4.292 ± 0.227 ns/op 95.7%
coalesce_2_lazy 113.118 ± 5.747 -> 26.746 ± 0.954 ns/op 76.4%
```
We tend to advise folks that "COALESCE is faster than CASE", but, as of
8.16.0/https://github.com/elastic/elasticsearch/pull/112295 that wasn't the true. I was working with someone a few
days ago to port a scripted_metric aggregation to ESQL and we saw
COALESCE taking ~60% of the time. That won't do.
The trouble is that CASE and COALESCE have to be *lazy*, meaning that
operations like:
```
COALESCE(a, 1 / b)
```
should never emit a warning if `a` is not `null`, even if `b` is `0`. In
8.16/https://github.com/elastic/elasticsearch/pull/112295 CASE grew an optimization where it could operate non-lazily
if it was flagged as "safe". This brings a similar optimization to
COALESCE, see it above as "case_2_eager", a 95.7% improvement.
It also brings and arguably more important optimization - entire-block
execution for COALESCE. The schort version is that, if the first
parameter of COALESCE returns no nulls we can return it without doing
anything lazily. There are a few more cases, but the upshot is that
COALESCE is pretty much *free* in cases where long strings of results
are `null` or not `null`. That's the `coalesce_2_noop` line.
Finally, when there mixed null and non-null values we were using a
single builder with some fairly inefficient paths. This specializes them
per type and skips some slow null-checking where possible. That's the
`coalesce_2_lazy` result, a more modest 76.4%.
NOTE: These %s of improvements on COALESCE itself, or COALESCE with some load-overhead operators like `+`. If COALESCE isn't taking a *ton* time in your query don't get particularly excited about this. It's fun though.
Closes#119953
* Exhaustive testParseFractionalNumber
* Refactor: encapsulate ByteSizeUnit constructor
* Refactor: store size in bytes
* Support up to 2 decimals in parsed ByteSizeValue
* Fix test for rounding up with no warnings
* ByteSizeUnit transport changes
* Update docs/changelog/120142.yaml
* Changelog details and impact
* Fix change log breaking.area
* Address PR comments
Closes https://github.com/elastic/elasticsearch/issues/119969
- Rename "pages_in/out" to "pages_received/emitted", to standardize the name along most operators
- **There are still "pages_processed" operators**, maybe it would make sense to also rename those?
- Add "pages_received/emitted" to TopN operator, as it was missing that
- Added "rows_received/emitted" to most operators
- Added a test to ensure all operators with status provide those metrics
`fold` can be surprisingly heavy! The maximally efficient/paranoid thing
would be to fold each expression one time, in the constant folding rule,
and then store the result as a `Literal`. But this PR doesn't do that
because it's a big change. Instead, it creates the infrastructure for
tracking memory usage for folding as plugs it into as many places as
possible. That's not perfect, but it's better.
This infrastructure limit the allocations of fold similar to the
`CircuitBreaker` infrastructure we use for values, but it's different
in a critical way: you don't manually free any of the values. This is
important because the plan itself isn't `Releasable`, which is required
when using a real CircuitBreaker. We could have tried to make the plan
releasable, but that'd be a huge change.
Right now there's a single limit of 5% of heap per query. We create the
limit at the start of query planning and use it throughout planning.
There are about 40 places that don't yet use it. We should get them
plugged in as quick as we can manage. After that, we should look to the
maximally efficient/paranoid thing that I mentioned about waiting for
constant folding. That's an even bigger change, one I'm not equipped
to make on my own.
This change introduces optional source filtering directly within source loaders (both synthetic and stored).
The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently.
By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.
This update also modifies the get document API to apply source filters earlier—directly through the source loader.
The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits),
and source filtering is always applied as the final step.
A follow-up will be required to ensure careful handling of all search-related scenarios.
This change loads all the modules and creates the module layers for plugins prior to entitlement
checking during the 2nd phase of bootstrap initialization. This will allow us to know what modules exist
for both validation and checking prior to actually loading any plugin classes (in a follow up change).
There are now two classes:
PluginsLoader which does the module loading and layer creation
PluginsService which uses a PluginsLoader to create the main plugin classes and start the plugins
Static fields dont do well in Gradle with configuration cache enabled.
- Use buildParams extension in build scripts
- Keep BuildParams.ci for now for easy serverless migration
- Tweak testing doc
Noticed during a code review that added yet another one of these:
We have quite a few instances of duplicate noop implementations,
lets make tests a little less verbose here.
Technically the constant is test-only but it felt right to just leave it
on the interface.
* Add benchmark for IndexNameExpressionResolver
* Extract IndicesRequest in a local class
* Added one more benchmark to capture a mixed request
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The libs projects are configured to all begin with `elasticsearch-`.
While this is desireable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.
This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
The most relevant ES changes that upgrading to Lucene 10 requires are:
- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automaton are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area
Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
This speeds up grouping by bytes valued fields (keyword, text, ip, and
wildcard) when the input is an ordinal block:
```
bytes_refs 22.213 ± 0.322 -> 19.848 ± 0.205 ns/op (*maybe* real, maybe noise. still good)
ordinal didn't exist -> 2.988 ± 0.011 ns/op
```
I see this as 20ns -> 3ns, an 85% speed up. We never hard the ordinals
branch before so I'm expecting the same performance there - about 20ns
per op.
This also speeds up grouping by a pair of byte valued fields:
```
two_bytes_refs 83.112 ± 42.348 -> 46.521 ± 0.386 ns/op
two_ordinals 83.531 ± 23.473 -> 8.617 ± 0.105 ns/op
```
The speed up is much better when the fields are ordinals because hashing
bytes is comparatively slow.
I believe the ordinals case is quite common. I've run into it in quite a
few profiles.
Lots of effectively singleton objects here and fields that can be made
static, saves a little more on the per-index overhead and might reveal
further simplifications.
Final (I wish) part of
https://github.com/elastic/elasticsearch/issues/99815 Also, fixes
https://github.com/elastic/elasticsearch/issues/113916
## Steps 1. Migrate TDigest classes to use a custom Array
implementation. Temporarily use a simple array wrapper
(https://github.com/elastic/elasticsearch/pull/112810) 2. Implement
CircuitBreaking in the `WrapperTDigestArrays` class. Add
Releasable/AutoCloseable and ensure everything is closed
(https://github.com/elastic/elasticsearch/pull/113105) 3. Pass the
CircuitBreaker as a parameter to TDigestState from wherever it's being
used (https://github.com/elastic/elasticsearch/pull/113387) - ESQL:
Pass a real CB - Other aggs: Use the deprecated methods on
`TDigestState`, that will use a No-op CB instead 4. Account remaining
TDigest classes size ("SHALLOW_SIZE") (This PR)
Every step should be safely mergeable to main: - The first and second
steps should have no impact. - The third and fourth ones will start
increasing the CB count partially.
## Remarks As TDigests are releasable now, I had to refactor all tests,
adding try-with-resources or direct close() calls. That added a lot of
changes, but most of them are trivial.
Outside of it, in ESQL, TDigestStates are closed now. Old aggregations
don't close them, as it's not trivial. However, as they are using the
NoopCircuitBreaker, there's no problem with it. There's nothing to be
closed.
## _Remarks 2_ I tried to follow the same pattern in how everything is
accounted. On each TDigest class: - Static constant "SHALLOW_SIZE" with
the object weight - Field `AtomicBoolean closed` to ensure indempotent
`close()` - Static `create()` method that accounts the SHALLOW_SIZE, and
returns a new isntance. And the important part: On exception, it
discounts SHALLOW_SIZE again - A `ramBytesUsed()` (Accountable
interface), barely used for anything really, but some assertions I
believe - A constructor, that closes everything it created on exception
(If it creates an array, and the next array surpasses the CB limit, the
first one must be closed) - And a close() that will, well, close
everything and discount SHALLOW_SIZE
A lot of steps to make sure everything works well in this multi-level
structure, but I believe the result was quite clean