* Update Gradle Wrapper to 8.2 (#96686)
- Convention usage has been deprecated and was fixed in our build files
- Fix test dependencies and deprecation
The NodePath inner class of NodeEnvironment represents path.data entries
of the node. The name however is sometimes confusing since these are the
data paths, and there is normally no singular concept of "node path" (an
installation has several different paths). This commit renames NodePath
to DataPath, as well as all methods and variables referring to it, so
that the more general term "node paths" can be utilized for something
node wide: the paths in environment (in a future PR).
backport of #86942
Backport of #79001 to 7.x branch.
Move shardsWithState(...) methods from RoutingNodes to RoutingNodesHelper helper class
in the test framework library.
This is a large change, but the majority of this change is changing tests to use
`RoutingNodesHelper#shardsWithState(...)` method. The changes to `ReactiveStorageDeciderService`
are the only changes to production code.
Originates from #78959
Today the search action send the full list of original indices on every shard request.
This change restricts the list to the concrete index of the shard or the alias that was used to resolve it.
Relates #78314
This change raises a TaskCancelledException to stop the search query if it is detected that the SearchTask has been cancelled during the reduce phase.
Issue: #71021
Co-authored-by: Daniel Hsu <daniel.hsu7@gmail.com>
Co-authored-by: Igor Motov <igor@motovs.org>
* [ML] Text/Log categorization multi-bucket aggregation (#71752)
This commit adds a new multi-bucket aggregation: `categorize_text`
The aggregation follows a similar design to significant text in that it reads from `_source`
and re-analyzes the the text as it is read.
Key difference is that it does not use the indexed field's analyzer, but instead relies on
the `ml_standard` tokenizer with specialized ML token filters. The tokenizer + filters are the
same that machine learning categorization anomaly jobs utilize.
The high level logical flow is as follows:
- at each shard, read in the text field with a custom analyzer using `ml_standard` tokenizer
- Read in the particular tokens from the analyzer
- Feed these tokens to a token tree algorithm (an adaptation of the drain categorization algorithm)
- Gather the individual log categories (the leaf nodes), sort them by doc_count, ship those buckets to be merged
- Merge all buckets that have the EXACT same key
- Once all buckets are merged, pass those keys + counts to a new token tree for additional merging
- That tree builds the final buckets and that is returned to the user
Algorithm explanation:
- Each log is parsed with the ml-standard tokenizer
- each token is passed into a token tree
- For `max_match_token` each token is stored in the tree and at `max_match_token+1` (or `len(tokens)`) a log group is created
- If another log group exists at that leaf, merge it if they have `similarity_threshold` percentage of tokens in common
- merging simply replaces tokens that are different in the group with `*`
- If a layer in the tree has `max_unique_tokens` we add a `*` child and any new tokens are passed through there. Catch here is that on the final merge, we first attempt to merge together subtrees with the smallest number of documents. Especially if the new sub tree has more documents counted.
## Aggregation configuration.
Here is an example on some openstack logs
```js
POST openstack/_search?size=0
{
"aggs": {
"categories": {
"categorize_text": {
"field": "message", // The field to categorize
"similarity_threshold": 20, // merge log groups if they are this similar
"max_unique_tokens": 20, // Max Number of children per token position
"max_match_token": 4, // Maximum tokens to build prefix trees
"size": 1
}
}
}
}
```
This will return buckets like
```json
"aggregations" : {
"categories" : {
"buckets" : [
{
"doc_count" : 806,
"key" : "nova-api.log.1.2017-05-16_13 INFO nova.osapi_compute.wsgi.server * HTTP/1.1 status len time"
}
]
}
}
```
* fixing for backport
* fixing test after backport
Backports the following PRs:
* Add dimension mapping parameter (#74450)
Added the dimension parameter to the following field types:
keyword
ip
Numeric field types (integer, long, byte, short)
The dimension parameter is of type boolean (default: false) and is used
to mark that a field is a time series dimension field.
Relates to #74014
* Add constraints to dimension fields (#74939)
This PR adds the following constraints to dimension fields:
It must be an indexed field and must has doc values
It cannot be multi-valued
The number of dimension fields in the index mapping must not be more than 16. This should be configurable through an index property (index.mapping.dimension_fields.limit)
keyword fields cannot be more than 1024 bytes long
keyword fields must not use a normalizer
Based on the code added in PR #74450
Relates to #74660
* Expand DocumentMapperTests (#76368)
Adds a test for setting the maximum number of dimensions setting and
tests the names and types of the metadata fields in the index.
Previously we just asserted the count of metadata fields. That made it
hard to read failures.
* Fix broken test for dimension keywords (#75408)
Test was failing because it was testing 1024 bytes long keyword and assertion was failing.
Closes#75225
* Checkstyle
* Add time_series_metric parameter (#76766)
This PR adds the time_series_metric parameter to the following field types:
Numeric field types
histogram
aggregate_metric_double
* Rename `dimension` mapping parameter to `time_series_dimension` (#78012)
This PR renames dimension mapping parameter to time_series_dimension to make it consistent with time_series_metric parameter (#76766)
Relates to #74450 and #74014
* Add time series params to `unsigned_long` and `scaled_float` (#78204)
Added the time_series_metric mapping parameter to the unsigned_long and scaled_float field types
Added the time_series_dimension mapping parameter to the unsigned_long field type
Fixes#78100
Relates to #76766, #74450 and #74014
Co-authored-by: Nik Everett <nik9000@gmail.com>
* Memory efficient xcontent filtering (backport of #77154)
I found myself needing support for something like `filter_path` on
`XContentParser`. It was simple enough to plug it in so I did. Then I
realized that it might offer more memory efficient source filtering
(#25168) so I put together a quick benchmark comparing the source
filtering that we do in `_search`.
Filtering using the parser is about 33% faster than how we filter now
when you select a single field from a 300 byte document:
```
Benchmark (excludes) (includes) (source) Mode Cnt Score Error Units
FetchSourcePhaseBenchmark.filterObjects message short avgt 5 2360.342 ± 4.715 ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder message short avgt 5 2010.278 ± 15.042 ns/op
FetchSourcePhaseBenchmark.filterXContentOnParser message short avgt 5 1588.446 ± 18.593 ns/op
```
The top line is the way we filter now. The middle line is adding a
filter to `XContentBuilder` - something we can do right now without any
of my plumbing work. The bottom line is filtering on the parser,
requiring all the new plumbing.
This isn't particularly impresive. 33% *sounds* great! But 700
nanoseconds per document isn't going to cut into anyone's search times.
If you fetch a thousand docuents that's .7 milliseconds of savings.
But we mostly advise folks to use source filtering on fetch when the
source is large and you only want a small part of it. So I tried when
the source is about 4.3kb and you want a single field:
```
Benchmark (excludes) (includes) (source) Mode Cnt Score Error Units
FetchSourcePhaseBenchmark.filterObjects message one_4k_field avgt 5 5957.128 ± 117.402 ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder message one_4k_field avgt 5 4999.073 ± 96.003 ns/op
FetchSourcePhaseBenchmark.filterXContentonParser message one_4k_field avgt 5 3261.478 ± 48.879 ns/op
```
That's 45% faster. Put another way, 2.7 microseconds a document. Not
bad!
But have a look at how things come out when you want a single field from
a 4 *megabyte* document:
```
Benchmark (excludes) (includes) (source) Mode Cnt Score Error Units
FetchSourcePhaseBenchmark.filterObjects message one_4m_field avgt 5 8266343.036 ± 176197.077 ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder message one_4m_field avgt 5 6227560.013 ± 68306.318 ns/op
FetchSourcePhaseBenchmark.filterXContentonParser message one_4m_field avgt 5 1617153.472 ± 80164.547 ns/op
```
These documents are very large. I've encountered documents like them in
real life, but they've always been the outlier for me. But a 6.5
millisecond per document savings ain't anything to sneeze at.
Take a look at what you get when I turn on gc metrics:
```
FetchSourcePhaseBenchmark.filterObjects message one_4m_field avgt 5 7036097.561 ± 84721.312 ns/op
FetchSourcePhaseBenchmark.filterObjects:·gc.alloc.rate message one_4m_field avgt 5 2166.613 ± 25.975 MB/sec
FetchSourcePhaseBenchmark.filterXContentOnBuilder message one_4m_field avgt 5 6104595.992 ± 55445.508 ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder:·gc.alloc.rate message one_4m_field avgt 5 2496.978 ± 22.650 MB/sec
FetchSourcePhaseBenchmark.filterXContentonParser message one_4m_field avgt 5 1614980.846 ± 31716.956 ns/op
FetchSourcePhaseBenchmark.filterXContentonParser:·gc.alloc.rate message one_4m_field avgt 5 1.755 ± 0.035 MB/sec
```
* Fixup benchmark for 7.x
When performing incremental reductions, 0 value of docCountError may mean that
the error was not previously calculated, or that the error was indeed previously
calculated and its value was 0. We end up rejecting true values set to 0 this
way. This may lead to wrong upper bound of error in result. To fix it, this PR
makes docCountError nullable. null values mean that error was not calculated
yet.
Fixes#40005, #75667
Co-authored-by: Nikita Glashenko <nikita.glashenko@gmail.com>
Adds minimal fields API support to sort and score scripts.
Example: `field('myfield').getValue(123)` where `123` is the default if the field has no values.
Refs: #61388
Backport: 6c02a6c
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.
The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.
I've also added a config file for the formatter plugin.
Other changes:
* I've quietly enabled the `toggleOnOff` option for Spotless. It was
already possible to disable formatting for sections using the markers
for docs snippets, so enabling this option just accepts this reality
and makes it possible via `formatter:off` and `formatter:on` without
the restrictions around line length. It should still only be used as
a very last resort and with good reason.
* I've removed mention of the `paddedCell` option from the contributing
guide, since I haven't had to use that option for a very long time. I
moved the docs to the spotless config.
Modularization of the JDK has been ongoing for several years. Recently
in Java 16 the JDK began enforcing module boundaries by default. While
Elasticsearch does not yet use the module system directly, there are
some side effects even for those projects not modularized (eg #73517).
Before we can even begin to think about how to modularize, we must
Prepare The Way by enforcing packages only exist in a single jar file,
since the module system does not allow packages to coexist in multiple
modules.
This commit adds a precommit check to the build which detects split
packages. The expectation is that we will add the existing split
packages to the ignore list so that any new classes will not exacerbate
the problem, and the work to cleanup these split packages can be
parallelized.
relates #73525
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
backport #73909
FieldTypeLookup and MappingLookup expose the getMatchingFieldTypes method to look up matching field type by a string pattern. We have migrated ExistsQueryBuilder to instead rely on getMatchingFieldNames, hence we can go ahead and remove the remaining usages and the method itself.
The remaining usages are to find specific field types from the mappings, specifically to eagerly load global ordinals and for the join field type. These are operations that are performed only once when loading the mappings, and may be refactored to work differently in the future. For now, we remove getMatchingFieldTypes and rather call for the two mentioned scenarios getMatchingFieldNames(*) and then getFieldType for each of the returned field name. This is a bit wasteful but performance can be sacrificed for these scenarios in favour of less code to maintain.
Sometimes our fancy "run this agg as a Query" optimizations end up
slower than running the aggregation in the old way. We know that and use
heuristics to dissable the optimization in that case. But it turns out
that the process of running the heuristics itself can be slow, depending
on the query. Worse, changing the heuristics requires an upgrade, which
means waiting. If the heurisics make a terrible choice folks need a
quick way out. This adds such a way: a cluster level setting that
contains a list of queries that are considered "too expensive" to try
and optimize. If the top level query contains any of those queries we'll
disable the "run as Query" optimization.
The default for this settings is wildcard and term-in-set queries, which
is fairly conservative. There are certainly wildcard and term-in-set
queries that the optimization works well with, but there are other queries
of that type that it works very badly with. So we're being careful.
Better, you can modify this setting in a running cluster to disable the
optimization if we find a new type of query that doesn't work well.
Closes#73426
MappingLookup has a method simpleMatchToFieldName that attempts
to return all field names that match a given pattern; if no patterns match,
then it returns a single-valued collection containing just the pattern that
was originally passed in. This is a fairly confusing semantic.
This PR replaces simpleMatchToFullName with two new methods:
* getMatchingFieldNames(), which returns a set of all mapped field names
that match a pattern. Calling getFieldType() with a name returned by
this method is guaranteed to return a non-null MappedFieldType
* getMatchingFieldTypes, that returns a collection of all MappedFieldTypes
in a mapping that match the passed-in pattern.
This allows us to clean up several call-sites because we know that
MappedFieldTypes returned from these calls will never be null. It also
simplifies object field exists query construction.
backports #72030 to 7.x
Related to #71593 we move all build logic that is for elasticsearch build only into
the org.elasticsearch.gradle.internal* packages
This makes it clearer if build logic is considered to be used by external projects
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third party plugin authors.
This is a very first step towards that direction.
This fixes the `global` aggregator when `profile` is enabled. It does so
by removing all of the special case handling for `global` aggs in
`AggregationPhase` and having the global aggregator itself perform the
scoped collection using the same trick that we use in filter-by-filter
mode of the `filters` aggregation.
Closes#71098
Write out the formatter config using the latest Eclipse. This has the
effect of configuring assertion formatting properly, which has improved
how some of our assertion messsages are formatted. Also reconfigure how
annotations are formatted, so that they are correctly line-wrapped.
For bulk operations that fall back to hotspot intrinsic code (reading short, int, long)
using this stream brings a massive speedup. The added benchmark for reading `long` values
sees a ~100x speedup in local benchmarking and the vLong read benchmark still sees a slightly
under ~10x speedup.
Also, this PR moves creation of the `StreamInput` out of the hot benchmark loop for all the bytes
reference benchmarks to make the benchmark less noisy and more practically useful.
(the `readLong` case using intrinsic operations is so fast that it wouldn't even show up in a profile relative to
instantiating the stream otherwise).
Relates work in #71181
Same optimization as in #71181 (also used by buffering Lucene DataInput implementations) but for the variable length encodings.
Benchmarks show a ~50% speedup for the benchmarked mix of values for `vLong`. Generally this change helps the most with large values but shows a slight speedup even for the 1 byte length case by avoiding some indirection and bounds checking.
Relates #70800
This reproduces the issue reported in #70800 and demonstrates that the fix in #71181 brings
about a 5x speedup for reading `long` from a bytes reference stream when backed by a paged bytes
reference.
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
This commit removes the metadata field _parent_join
that was needed to ensure that only one join field is used in a mapping.
It is replaced with a validation at the field level.
This change also fixes in [bug](https://github.com/elastic/kibana/issues/92960) in the handling of parent join fields in _field_caps.
This metadata field throws an unexpected exception in [7.11](https://github.com/elastic/elasticsearch/pull/63878)
when checking if the field is aggregatable.
That's now fixed since this unused field has been removed.
This saves 16 bytes of memory per bucket for some aggregations.
Specifically, it kicks in when there is a parent bucket and we have a good
estimate on its upper bound cardinality, and we have good estimate on the
per-bucket cardinality of this aggregation, and both those upper bounds
will fit into a single `long`.
That sounds unlikely, but there is a fairly common case where we have it:
a `date_histogram` followed by a `terms` aggregation powered by global
ordinals. This is common enough that we already had at least two rally
operations for it:
* `date-histo-string-terms-via-global-ords`
* `filtered-date-histo-string-terms-via-global-ords`
Running those rally tracks shows that the space savings yields a small
but statistically significant perform bump. The 90th percentile service
time drops by about 4% in the unfiltered case and 1% for the filtered
case. That's not great but it good to know saving 16 bytes doesn't slow
us down.
```
| 50th percentile latency | date-histo | 3185.77 | 3028.65 | -157.118 | ms |
| 90th percentile latency | date-histo | 3237.07 | 3101.32 | -135.752 | ms |
| 100th percentile latency | date-histo | 3270.53 | 3178.7 | -91.8319 | ms |
| 50th percentile service time | date-histo | 3181.55 | 3024.32 | -157.238 | ms |
| 90th percentile service time | date-histo | 3232.91 | 3097.67 | -135.238 | ms |
| 100th percentile service time | date-histo | 3266.63 | 3175.08 | -91.5494 | ms |
| 50th percentile latency | filtered-date-histo | 1349.22 | 1331.94 | -17.2717 | ms |
| 90th percentile latency | filtered-date-histo | 1402.71 | 1383.7 | -19.0131 | ms |
| 100th percentile latency | filtered-date-histo | 1412.41 | 1397.7 | -14.7139 | ms |
| 50th percentile service time | filtered-date-histo | 1345.18 | 1326.2 | -18.9806 | ms |
| 90th percentile service time | filtered-date-histo | 1397.24 | 1378.14 | -19.1031 | ms |
| 100th percentile service time | filtered-date-histo | 1406.69 | 1391.63 | -15.0529 | ms |
```
The microbenchmarks for `LongKeyedBucketOrds`, the interface we're
targeting, show a performance boost on the method in the path of about
13%. This is obvious not the entire hot path, given that th 13% savings
translated to a 4% performance savings over the whole agg. But its
something.
```
Benchmark Mode Cnt Score Error Units
multiBucketMany avgt 5 10.038 ± 0.009 ns/op
multiBucketManySmall avgt 5 8.738 ± 0.029 ns/op
singleBucketIntoMulti avgt 5 7.701 ± 0.073 ns/op
singleBucketIntoSingleImmutableBimorphicInvocation avgt 5 6.160 ± 0.029 ns/op
singleBucketIntoSingleImmutableMonmorphicInvocation avgt 5 6.571 ± 0.043 ns/op
singleBucketIntoSingleMutableBimorphicInvocation avgt 5 7.714 ± 0.010 ns/op
singleBucketIntoSingleMutableMonmorphicInvocation avgt 5 7.459 ± 0.017 ns/op
```
While I was touching the JMH benchmarks for `LongKeyedBucketOrds` I took
the opportunity to try and make the runs that collect from a single
bucket more comparable to the ones that collect from many buckets. It
only seemed fair.
This adds a microbenchmark running our traditional `ScriptScoreQuery`
race. This races Lucene Expressions, Painless, and a hand rolled
implementation of `ScoreScript`. Through the magic of the async
profiler, this revealed a few bottlenecks that hit painless that we
likely can fix! Happy times.
Co-authored-by: Rene Groeschke <rene@breskeby.com>
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
Upgrade JMH to latest (1.26) to pick up its async profiler integration
and update the documentation to include instructions to running the
async profiler and making pretty pretty flame graphs.
This lowers the contention on the `REQUEST` circuit breaker when building
many aggregations on many threads by preallocating a chunk of breaker
up front. This cuts down on the number of times we enter the busy loop
in `ChildMemoryCircuitBreaker.limit`. Now we hit it one time when building
aggregations. We still hit the busy loop if we collect many buckets.
We let the `AggregationBuilder` pick size of the "chunk" that we
preallocate but it doesn't have much to go on - not even the field types.
But it is available in a convenient spot and the estimates don't have to
be particularly accurate.
The benchmarks on my 12 core desktop are interesting:
```
Benchmark (breaker) Mode Cnt Score Error Units
sum noop avgt 10 1.672 ± 0.042 us/op
sum real avgt 10 4.100 ± 0.027 us/op
sum preallocate avgt 10 4.230 ± 0.034 us/op
termsSixtySums noop avgt 10 92.658 ± 0.939 us/op
termsSixtySums real avgt 10 278.764 ± 39.751 us/op
termsSixtySums preallocate avgt 10 120.896 ± 16.097 us/op
termsSum noop avgt 10 4.573 ± 0.095 us/op
termsSum real avgt 10 9.932 ± 0.211 us/op
termsSum preallocate avgt 10 7.695 ± 0.313 us/op
```
They show pretty clearly that not using the circuit breaker at all is
faster. But we can't do that because we don't want to bring the node
down on bad aggs. When there are many aggs (termsSixtySums) the
preallocation claws back much of the performance. It even helps
marginally when there are two aggs (termsSum). For a single agg (sum)
we see a 130 nanosecond hit. Fine.
But these values are all pretty small. At best we're seeing a 160
microsecond savings. Not so on a 160 vCPU machine:
```
Benchmark (breaker) Mode Cnt Score Error Units
sum noop avgt 10 44.956 ± 8.851 us/op
sum real avgt 10 118.008 ± 19.505 us/op
sum preallocate avgt 10 241.234 ± 305.998 us/op
termsSixtySums noop avgt 10 1339.802 ± 51.410 us/op
termsSixtySums real avgt 10 12077.671 ± 12110.993 us/op
termsSixtySums preallocate avgt 10 3804.515 ± 1458.702 us/op
termsSum noop avgt 10 59.478 ± 2.261 us/op
termsSum real avgt 10 293.756 ± 253.854 us/op
termsSum preallocate avgt 10 197.963 ± 41.578 us/op
```
All of these numbers are larger because we're running all the CPUs
flat out and we're seeing more contention everywhere. Even the "noop"
breaker sees some contention, but I think it is mostly around memory
allocation. Anyway, with many many (termsSixtySums) aggs we're looking
at 8 milliseconds of savings by preallocating. Just by dodging the busy
loop as much as possible. The error in the measurements there are
substantial. Here are the runs:
```
real:
Iteration 1: 8679.417 ±(99.9%) 273.220 us/op
Iteration 2: 5849.538 ±(99.9%) 179.258 us/op
Iteration 3: 5953.935 ±(99.9%) 152.829 us/op
Iteration 4: 5763.465 ±(99.9%) 150.759 us/op
Iteration 5: 14157.592 ±(99.9%) 395.224 us/op
Iteration 1: 24857.020 ±(99.9%) 1133.847 us/op
Iteration 2: 24730.903 ±(99.9%) 1107.718 us/op
Iteration 3: 18894.383 ±(99.9%) 738.706 us/op
Iteration 4: 5493.965 ±(99.9%) 120.529 us/op
Iteration 5: 6396.493 ±(99.9%) 143.630 us/op
preallocate:
Iteration 1: 5512.590 ±(99.9%) 110.222 us/op
Iteration 2: 3087.771 ±(99.9%) 120.084 us/op
Iteration 3: 3544.282 ±(99.9%) 110.373 us/op
Iteration 4: 3477.228 ±(99.9%) 107.270 us/op
Iteration 5: 4351.820 ±(99.9%) 82.946 us/op
Iteration 1: 3185.250 ±(99.9%) 154.102 us/op
Iteration 2: 3058.000 ±(99.9%) 143.758 us/op
Iteration 3: 3199.920 ±(99.9%) 61.589 us/op
Iteration 4: 3163.735 ±(99.9%) 71.291 us/op
Iteration 5: 5464.556 ±(99.9%) 59.034 us/op
```
That variability from 5.5ms to 25ms is terrible. It makes me not
particularly trust the 8ms savings from the report. But still,
the preallocating method has much less variability between runs
and almost all the runs are faster than all of the non-preallocated
runs. Maybe the savings is more like 2 or 3 milliseconds, but still.
Or maybe we should think of hte savings as worst vs worst? If so its
19 milliseconds.
Anyway, its hard to measure how much this helps. But, certainly some.
Closes#58647
* Move tasks in build scripts to task avoidance api (#64046)
- Some trivial cleanup on build scripts
- Change task referencing in build scripts to use task avoidance api
where replacement is trivial.
Determines the shard size of shards before allocating shards that are
recovering from snapshots. It ensures during shard allocation that the
target node that is selected as recovery target will have enough free
disk space for the recovery event. This applies to regular restores,
CCR bootstrap from remote, as well as mounting searchable snapshots.
The InternalSnapshotInfoService is responsible for fetching snapshot
shard sizes from repositories. It provides a getShardSize() method
to other components of the system that can be used to retrieve the
latest known shard size. If the latest snapshot shard size retrieval
failed, the getShardSize() returns
ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE. While
we'd like a better way to handle such failures, returning this value
allows to keep the existing behavior for now.
Note that this PR does not address an issues (we already have today)
where a replica is being allocated without knowing how much disk
space is being used by the primary.
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
This commit allows coordinating node to account the memory used to perform partial and final reduce of
aggregations in the request circuit breaker. The search coordinator adds the memory that it used to save
and reduce the results of shard aggregations in the request circuit breaker. Before any partial or final
reduce, the memory needed to reduce the aggregations is estimated and a CircuitBreakingException} is thrown
if exceeds the maximum memory allowed in this breaker.
This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced.
This estimation can be completely off for some aggregations but it is corrected with the real size after
the reduce completes.
If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations
and replace the estimation with the serialized size of the newly reduced result.
As a follow up we could trigger partial reduces based on the memory accounted in the circuit breaker instead
of relying on a static number of shard responses. A simpler follow up that could be done in the mean time is
to [reduce the default batch reduce size](https://github.com/elastic/elasticsearch/issues/51857) of blocking
search request to a more sane number.
Closes#37182
This speeds up `StreamOutput#writeVInt` quite a bit which is nice
because it is *very* commonly called when serializing aggregations. Well,
when serializing anything. All "collections" serialize their size as a
vint. Anyway, I was examining the serialization speeds of `StringTerms`
and this saves about 30% of the write time for that. I expect it'll be
useful other places.
This way is faster, saving about 8% on the microbenchmark that rounds to
the nearest month. That is in the hot path for `date_histogram` which is
a very popular aggregation so it seems worth it to at least try and
speed it up a little.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
I've always been confused by the strange behavior that I saw when
working on #57304. Specifically, I saw switching from a bimorphic
invocation to a monomorphic invocation to give us a 7%-15% performance
bump. This felt *bonkers* to me. And, it also made me wonder whether
it'd be worth looking into doing it everywhere.
It turns out that, no, it isn't needed everywhere. This benchmark shows
that a bimorphic invocation like:
```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```
is 19% slower than a monomorphic invocation like:
```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```
But *only* when the reference is mutable. In the example above, if
`ords` is never changed then both perform the same. But if the `ords`
reference is assigned twice then we start to see the difference:
```
immutable bimorphic avgt 10 6.468 ± 0.045 ns/op
immutable monomorphic avgt 10 6.756 ± 0.026 ns/op
mutable bimorphic avgt 10 9.741 ± 0.073 ns/op
mutable monomorphic avgt 10 8.190 ± 0.016 ns/op
```
So the conclusion from all this is that we've done the right thing:
`auto_date_histogram` is the only aggregation in which `ords` isn't final
and it is the only aggregation that forces monomorphic invocations. All
other aggregations use an immutable bimorphic invocation. Which is fine.
Relates to #56487
* Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build
* Fix compile usages in 7.x branch
When an index spans a daylight savings time transition we can't use our
optimization that rewrites the requested time zone to a fixed time zone
and instead we used to fall back to a java.util.time based rounding
implementation. In #55559 we optimized "time unit" rounding. This
optimizes "time interval" rounding.
The java.util.time based implementation is about 1650% slower than the
rounding implementation for a fixed time zone. This replaces it with a
similar optimization that is only about 30% slower than the fixed time
zone. The java.util.time implementation allocates a ton of short lived
objects but the optimized implementation doesn't. So it *might* end up
being faster than the microbenchmarks imply.
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.
Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)
I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.
This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.
This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.
Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
This is a backport of #54803 for 7.x.
This pull request cherry picks the squashed commit from #54803 with the additional commits:
6f50c92 which adjusts master code to 7.x
a114549 to mute a failing ILM test (#54818)
48cbca1 and 50186b2 that cleans up and fixes the previous test
aae12bb that adds a missing feature flag (#54861)
6f330e3 that adds missing serialization bits (#54864)
bf72c02 that adjust the version in YAML tests
a51955f that adds some plumbing for the transport client used in integration tests
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
IDEs can sometimes run annotation processors that leave files in
`src/main/generated/**/*.java`, causing Spotless to complain. Even
though this path ought not to exist, exclude it anyway in order to avoid
spurious failures.
Add `:qa:os` and `:benchmarks` to the list of automatically formatted
projects, and apply some manual fix-ups to polish it up.
In particular, I noticed that `Files.write(...)` when passed a list will
automaticaly apply a UTF-8 encoding and write a newline after each line,
making it easier to use than FileUtils.append. It's even available from
1.8.
Also, in the Allocators class, a number of methods declared thrown exceptions that IntelliJ reported were never thrown, and as far as I could see this is true, so I removed the exceptions.
Backport of #52542.
This commit is part of issue #40366 to remove disabled Xlint warnings
from gradle files. In particular, it removes the Xlint exclusions from
the following files:
- benchmarks/build.gradle
- client/client-benchmark-noop-api-plugin/build.gradle
- x-pack/qa/rolling-upgrade/build.gradle
- x-pack/qa/third-party/active-directory/build.gradle
- modules/transport-netty4/build.gradle
For the first three files no code adjustments were needed. For
x-pack/qa/third-party/active-directory move the suppression at the code
level. For transport-netty4 replace the variable arguments with
ArrayLists and remove any redundant casts.
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.