This commit changes the signature of InternalAggregation#buildAggregations(long[]) to
InternalAggregation#buildAggregations(LongArray) to avoid allocations of humongous arrays.
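A rough before/after sketch of the change (simplified; the real method carries more context than shown here):

```java
// before: one contiguous long[] covering every owning bucket ordinal,
// which for high-cardinality aggregations became a humongous allocation
// InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws IOException;

// after: LongArray can be paged by BigArrays, so a large set of ordinals
// no longer requires a single huge contiguous heap allocation
InternalAggregation[] buildAggregations(LongArray owningBucketOrds) throws IOException;
```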
* Refactor build params for FieldMapper
* more mappers and tests
* more mappers
* more mappers
* spotless
* spotless
* stored by default
* Revert "stored by default"
This reverts commit bbd247d64b.
* restore storeIgnored
* sync
* list valid values for SourceKeepMode
* small refactoring
* spotless
This addresses a long-standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while the MappedFieldType name does.
We have renamed Mapper.Builder#name to leafName (#109971) and Mapper#simpleName to leafName (#110030). This commit renames Mapper#name to fullPath for clarity.
This required some adjustments in FieldAliasMapper to avoid confusion between the existing path method and fullPath. I renamed path to targetPath for clarity.
ObjectMapper already had a fullPath method that simply returned name, so it could be removed.
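To illustrate the naming after these renames (hypothetical field, not taken from the commit):

```java
// For a mapper nested at "user.address.city":
//   mapper.leafName()   -> "city"                (the last path element)
//   mapper.fullPath()   -> "user.address.city"   (what Mapper#name used to return ambiguously)
//   fieldType.name()    -> "user.address.city"   (MappedFieldType always used the full path)
```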
This addresses a long-standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does. We have a method called simpleName to signal that, but leafName signals it more clearly and aligns with
the name we recently introduced in Mapper.Builder (renamed from name to leafName).
Relates to #109971
This addresses a long-standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does.
During the fetch phase, there are a number of stored fields that are requested explicitly or loaded by default. That information is included in the `StoredFieldsSpec` that each fetch sub phase exposes.
We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via `SearchLookup`). This is done in `PreloadedFieldLookupProvider`. The current logic makes values available for fields that have been found, so that scripts or value fetchers that request them don't load them again ad hoc. Stored fields that don't have a value for a specific doc, however, are treated like any other field that was not requested and are loaded again, even though no value will be found, which causes overhead.
This change makes available to `PreloadedFieldLookupProvider` the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time).
This is an existing issue that has become evident as we moved fetching of metadata fields to `FetchFieldsPhase`, which relies on value fetchers, and hence on `SearchLookup`. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document, providing a `SingleFieldVisitor` that will never find a value.
Another existing issue that this PR fixes is for the `FetchFieldsPhase` to extend the `StoredFieldsSpec` that it exposes to include the metadata fields that the phase is now responsible for loading. That results in `_ignored` being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in `StoredFieldLoader#fieldsToLoad`).
Yet another existing issue that this PR fixes is that `_id` has until now always been loaded on demand when requested via fetch fields or a script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the `PreloadedFieldLookupProvider`. This causes overhead, as the field has already been loaded and should not be loaded again when explicitly requested.
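A minimal sketch of the distinction this enables; the field and method names here are assumptions for illustration, not the actual implementation:

```java
// Sketch: separate "attempted but absent" from "never attempted".
Object getStoredValue(String field) {
    if (preloadedValues.containsKey(field)) {
        return preloadedValues.get(field);   // loaded and found during the fetch phase
    }
    if (requiredStoredFields.contains(field)) {
        return null;                         // already attempted; simply absent in this doc
    }
    return loadAdHoc(field);                 // e.g. a script requesting it for the first time
}
```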
Here we extract the logic to populate metadata fields such as `_ignored`, `_routing`, `_size` and the deprecated `_type` into `FetchFieldsPhase` so that we can use the `ValueFetcher` interface to retrieve field values. This allows us to fetch values regardless of whether the `Mapper` uses stored fields or doc values.
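A usage sketch of the `ValueFetcher` abstraction this relies on (simplified, and the exact signatures vary across versions):

```java
// ValueFetcher hides whether a value comes from stored fields or doc values.
ValueFetcher fetcher = fieldType.valueFetcher(searchExecutionContext, null);
fetcher.setNextReader(leafReaderContext);
List<Object> values = fetcher.fetchValues(source, docId, new ArrayList<>());
```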
To simplify the migration away from version based skip checks in YAML specs,
this PR adds a synthetic version feature `gte_vX.Y.Z` for any version at or before 8.14.0.
New test specs for 8.14 or later are expected to use respective new cluster features,
or a test-only feature supplied via ESRestTestCase#createAdditionalFeatureSpecifications
if sufficient.
Supporting non-keyword fields requires including non-keyword fields from
the routing path in routing calculations. Routing is performed on
coordinating nodes, which lack mappings (or the mappings haven't been
created yet, for dynamically-defined dimensions), so the routing hash they
calculate is passed to data nodes and stored in a new field, namely
`_ts_routing_hash`. This hash is included in the `_id` field, in turn, so
that get-by-id and delete-by-id operations can consistently reach the
right shard.
A few interesting points:
- The hash is passed from the coordinating to data nodes using the `routing` field in `IndexRequest`; adding another field to the latter requires updating dozens of classes.
- We explicitly skip (double-)storing the hash in the routing field, as the latter is not optimized for storage with the TSDB codec.
- The routing hash may not be available in Translog operations; in that case, it can be retrieved from the `id` prefix, as the sketch below shows.
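Roughly how that fallback works; the helper names and id layout below are assumptions for illustration:

```java
// Sketch: the routing hash is written as a fixed-length prefix of _id at
// index time, so it can be recovered when a translog operation doesn't
// carry it explicitly.
String id = encodeRoutingHash(routingHash) + tsidAndTimestampPart;  // index time
int recoveredHash = decodeRoutingHash(id);                          // reads the prefix back
```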
Related to https://github.com/elastic/elasticsearch/issues/103567
This commit enables inter-segment search concurrency for terms aggs when the cardinality of the field being aggregated on is lower than the shard size. This avoids the precision errors that would be caused by parallelizing execution across slices for fields with high cardinality.
For terms aggs that are ordered by key, we still take cardinality into account, parallelizing only low-cardinality fields to avoid the performance overhead of parallelizing high-cardinality fields.
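The decision boils down to something like the following (an illustrative simplification, not the actual code):

```java
// Parallelize across slices only when a slice can hold every distinct term,
// so no slice's local top-N can silently drop a term another slice counted.
boolean supportsParallelCollection(long fieldCardinality, int shardSize) {
    // a negative cardinality means it is unknown, so stay sequential
    return fieldCardinality >= 0 && fieldCardinality <= shardSize;
}
```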
Same as #101175, shorten `client().prepareIndex(index)` and
`client().prepareIndex().setIndex(index)` via a test utility.
Saves lots of code now and sets up some follow-up simplifications.
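The shape of the shortcut (a sketch; the actual helper lives in the shared test infrastructure):

```java
// In the test base class:
protected static IndexRequestBuilder prepareIndex(String index) {
    return client().prepareIndex(index);
}

// before: client().prepareIndex("test").setId("1").setSource("f", "v").get();
// after:  prepareIndex("test").setId("1").setSource("f", "v").get();
```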
Cleaning this up a little even though it's still quite horrible.
`.get()` in this API actually means `actionGet()` so to speak.
I think a good first step to cleaning this up is to at least reduce
the duplication though and save 1k lines.
`ExternalTestCluster` doesn't really make sense now that the transport
client is removed. We only use it in the ML integ test suite and it'd be
good to avoid expanding its usage further, so this commit deprecates it
and removes the functionality in `ESIntegTestCase` that might quietly
switch to using it in a new test suite if running with certain system
properties.
Relates #49582
Remove `assertSearchResponse`, which was just an alias for
`assertNoFailures`, and then clean up many spots as a result
by combining the hit-count and no-failures assertions into a
single method.
follow-up to #100966
Follow up to #100966 introducing new combined assertion `assertSearchHitsWithoutFailures`
to combine no-failure, count, and id assertions into one block.
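A usage sketch of the combined assertion (the exact argument order is an assumption, and the usual static imports are elided):

```java
// One call replaces assertNoFailures + assertHitCount + assertSearchHits,
// with the expected count implied by the number of ids.
assertSearchHitsWithoutFailures(
    prepareSearch("index").setQuery(termQuery("field", "value")),
    "doc-1", "doc-2"
);
```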
We'd like to make `SearchResponse` reference counted and pooled but there are around 6k
instances of tests that create a `SearchResponse` local variable that would need to be
released manually to avoid leaks in the tests.
This does away with about 10% of these spots by adding an overload of `assertHitCount`
that handles the actual execution of the search request and its release automatically,
making use of it in all spots where the `.get()` on the request builder could be inlined
semi-automatically and in a straightforward fashion without other code changes.
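Roughly what the new `assertHitCount` variant does (a sketch; the real implementation may differ in detail):

```java
// Executes the request and guarantees the response is released, so the test
// never needs to hold (and leak) a SearchResponse local variable.
public static void assertHitCount(SearchRequestBuilder requestBuilder, long expectedHitCount) {
    SearchResponse response = requestBuilder.get();
    try {
        assertHitCount(response, expectedHitCount);
    } finally {
        response.decRef();
    }
}
```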
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
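An example of the capturing-to-non-capturing effect (illustrative):

```java
import java.util.function.Supplier;

class Example {
    private String label(long v) { return "v=" + v; }
    private static String staticLabel(long v) { return "v=" + v; }

    void demo() {
        Supplier<String> capturing = () -> label(42);           // captures `this`: a new lambda object per enclosing instance
        Supplier<String> nonCapturing = () -> staticLabel(42);  // captures nothing: the JVM can reuse a single instance
    }
}
```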
As part of our effort to increase the supportability of Elasticsearch,
this PR changes many aggregations errors from being 500 class (which is
the default for `AggregationExecutionException`) to 400 class (which is
the default for `IllegalArgumentException`). All of these cases are
errors that should not be retried, as the failures relate directly
to the content of the request and/or the state of the index.
There are definitely more cases where we are returning an incorrect
error code, but for this PR I focused on just changing the low hanging
fruit.
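The typical shape of such a change (the error message here is made up):

```java
// before: AggregationExecutionException defaults to a 500-class status,
// blaming the server for what is really a bad request
throw new AggregationExecutionException("invalid order path [" + path + "]");

// after: IllegalArgumentException defaults to a 400-class status,
// telling the client not to retry the same request
throw new IllegalArgumentException("invalid order path [" + path + "]");
```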
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.
This commit detects whether a set of mappings is configured for a data stream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.
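Conceptually, the new check amounts to the following (method names are assumptions):

```java
// In DateFieldMapper.Builder (sketch): the data stream timestamp field must
// reject malformed dates even when index.mapping.ignore_malformed is true.
boolean ignoreMalformed = indexWideIgnoreMalformed;
if (context.isDataStream() && "@timestamp".equals(leafName())) {
    ignoreMalformed = false;
}
```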
Relates to #96051
Constants for TransportVersion currently live alongside the class
definition. This has been fine since there was only one set of
constants. However, to support serverless, some constants will need to
be defined elsewhere.
This commit moves the existing constants to a new holder class,
TransportVersions. It is almost entirely mechanical, using IntelliJ move
members. The only non-mechanical part was slightly shifting how CURRENT
is found: a LATEST is defined in TransportVersions that is automatically
calculated (since we already have it, there is no need to define it manually).
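A condensed sketch of the resulting layout (the constant names and helpers are illustrative):

```java
public class TransportVersions {
    // constants moved here mechanically from TransportVersion
    public static final TransportVersion V_8_500_020 = def(8_500_020);
    public static final TransportVersion V_8_500_061 = def(8_500_061);

    // LATEST is computed from the constants above rather than declared by
    // hand, so it cannot drift out of date as new constants are added
    static final TransportVersion LATEST = findLatest(TransportVersions.class);
}
```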
Hide the creation of the index searcher from the implementers by changing the signature of
AggregatorTestCase#searchAndReduce and AggregatorTestCase#createAggregationContext to take
an IndexReader instead of an IndexSearcher.
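At a call site, the change reads roughly as follows (parameter lists abbreviated):

```java
// before: tests created and configured the searcher themselves
// var result = searchAndReduce(newIndexSearcher(reader), query, aggBuilder, fieldType);

// after: hand over the reader; the searcher is created inside the test framework
var result = searchAndReduce(reader, query, aggBuilder, fieldType);
```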
When the `subobjects` property is set to false and we encounter an object
while parsing, we need a way to understand whether its FieldMapper is able
to parse an object. If that's the case, we can provide the entire object to
the FieldMapper; otherwise its name becomes part of the dotted field
name of each internal value.
This has been achieved by adding the `supportsParsingObject()` method
to the `FieldMapper` class. This method defaults to `false`, since the
majority of FieldMappers do not support parsing objects, and is
overridden to return `true` by the ones that do support objects.
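The hook as described, sketched as the default plus an override:

```java
// In FieldMapper: the default, since most mappers only understand leaf values.
public boolean supportsParsingObject() {
    return false;
}

// In a mapper that can consume a whole object:
@Override
public boolean supportsParsingObject() {
    return true;
}
```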
Lots of spots where we did weird things around streams: redundant stream creation, redundant collecting
before adding all the collected elements to another collection, redundant streams for joining strings,
using the less efficient `Collectors.toList`, and in a few cases incorrectly relying on the result being mutable.
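Examples of the patterns being cleaned up (illustrative):

```java
// redundant collect-then-addAll
result.addAll(items.stream().map(Item::name).collect(Collectors.toList()));
// simpler: add directly
items.stream().map(Item::name).forEach(result::add);

// redundant stream for joining strings
String joined = names.stream().collect(Collectors.joining(","));
// simpler:
String joined2 = String.join(",", names);

// Stream#toList() is cheaper than Collectors.toList(), but returns an
// unmodifiable list, so callers must not rely on mutating the result
List<String> list = items.stream().map(Item::name).toList();
```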
Replacing the remaining usages that I could automatically replace
and a couple that I did by hand in this PR.
Also, added the same shortcut to the single node tests to save some
duplication there.