This updates the Gradle wrapper to 8.12.
We addressed the deprecation warnings introduced by the update, including:
- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
(cherry picked from commit ba61f8c7f7)
* kNN vector rescoring for quantized vectors (#116663)
(cherry picked from commit 59967727cf)
# Conflicts:
# server/src/main/java/org/elasticsearch/search/vectors/KnnSearchBuilder.java
# x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRankBuilder.java
* FloatVectorValues have a different interface in this Lucene version
* Refactor build params for FieldMapper
* more mappers and tests
* more mappers
* more mappers
* spotless
* spotless
* stored by default
* Revert "stored by default"
This reverts commit bbd247d64b.
* restore storeIgnored
* sync
* list valid values for SourceKeepMode
* small refactoring
* spotless
This change allows querying the `index.mode` setting via a new
`_index_mode` metadata field, enabling APIs such as `field_caps` or
`resolve_indices` to target indices that are either time_series or logs
only. This approach avoids adding and handling a new parameter for
`index_mode` in these APIs. Both ES|QL and the `_search` API should also
work with this new field.
Introduce an optional k param for knn query
If k is not set, the knn query keeps its previous behaviour:
- `num_candidates` docs are collected from each shard. These `num_candidates` docs
are combined with the results of other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results
If k is set, the behaviour is instead as follows:
- `k` docs are collected from each shard. These `k` docs are
combined with the results of other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.
Having a `k` param makes it more intuitive for users to address their needs.
They can also skip the `num_candidates` param for this query,
as it is more of an internal detail for tuning how knn search operates.
Closes #108473
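The per-shard rule described above can be sketched roughly as follows. This is a hypothetical illustration, not the Elasticsearch implementation; the class and method names are assumptions made for the example.

```java
// Hypothetical sketch, not Elasticsearch code: models only the per-shard
// collection rule described in the commit message.
final class KnnShardSizing {
    private KnnShardSizing() {}

    /** Docs collected per shard: `k` when it is set, otherwise `num_candidates`. */
    static int docsPerShard(Integer k, int numCandidates) {
        return k != null ? k : numCandidates;
    }
}
```

In both cases the docs from all shards are then merged into the top global `size` results.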
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while the MappedFieldType name does.
We have renamed Mapper.Builder#name to leafName (#109971) and Mapper#simpleName to leafName (#110030). This commit renames Mapper#name to fullPath for clarity.
This required some adjustments in FieldAliasMapper to avoid confusion between the existing path method and fullPath. I renamed path to targetPath for clarity.
ObjectMapper already had a fullPath method that returned name, and was effectively a copy of name, so it could be removed.
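To illustrate the distinction between the two names, here is a minimal sketch. `MapperNameSketch` is a hypothetical stand-in and not the real Mapper class; it only assumes that a full path is the parent path joined to the leaf name with a dot.

```java
// Hypothetical model of the naming distinction; not the real Mapper API.
final class MapperNameSketch {
    private final String parentPath;
    private final String leafName;

    MapperNameSketch(String parentPath, String leafName) {
        this.parentPath = parentPath;
        this.leafName = leafName;
    }

    /** The last path segment only, e.g. "city". */
    String leafName() {
        return leafName;
    }

    /** The dotted path from the mapping root, e.g. "user.address.city". */
    String fullPath() {
        return parentPath.isEmpty() ? leafName : parentPath + "." + leafName;
    }
}
```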
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does. We have a method called simpleName to signal that, but leafName signals that more clearly and aligns with
the name we have recently introduced in Mapper.Builder (renamed from name to leafName).
Relates to #109971
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does.
PR #103084 introduced the ability to return matched_queries during the percolate
process for all percolator queries containing a `_name` field.
But there was a bug with complex queries, as they were not rewritten before
obtaining their Weight function. This fixes the bug by ensuring all
queries are first rewritten.
Closes #107176
To simplify the migration away from version based skip checks in YAML specs,
this PR adds a synthetic version feature `gte_vX.Y.Z` for any version at or before 8.14.0.
New test specs for 8.14 or later are expected to use respective new cluster features,
or a test-only feature supplied via ESRestTestCase#createAdditionalFeatureSpecifications
if sufficient.
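As a rough sketch of the naming scheme, assuming the feature name is simply the `gte_v` prefix followed by the dotted version (the helper names below are hypothetical and not part of the test framework):

```java
// Hypothetical helpers illustrating the `gte_vX.Y.Z` naming and the 8.14.0 cut-off.
final class SyntheticVersionFeatureSketch {
    private SyntheticVersionFeatureSketch() {}

    /** Builds the synthetic feature name for a version, e.g. "gte_v8.13.0". */
    static String featureName(int major, int minor, int revision) {
        return "gte_v" + major + "." + minor + "." + revision;
    }

    /** True when the version is at or before 8.14.0, i.e. a synthetic feature exists for it. */
    static boolean hasSyntheticFeature(int major, int minor, int revision) {
        if (major != 8) {
            return major < 8;
        }
        if (minor != 14) {
            return minor < 14;
        }
        return revision <= 0;
    }
}
```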
Final step in #102030 ... actually makes `SearchHit` read a releasable bytes reference.
It still falls back to copying to unrolled buffers here and there, which can be removed in follow-ups where it's worth the effort (aggs being the most important one probably).
Hard to create very reliable benchmarks for this because all our macro-benchmarks are quite noisy. Running http logs and PMC though, there's a statistically significant reduction in GC and reduced tail latencies in most benchmarks.
The overhead for ref-counting these bytes isn't visible in profiling as far as I can tell and for large source values, no corresponding large `byte[]` are created any longer outside of the few remaining spots where we copy to pooled buffers.
closes #102657
closes #102030
Return matched_queries for named queries in Percolator.
In a response, each hit together with
a `_percolator_document_slot` field will contain
`_percolator_document_slot_<slotNumber>_matched_queries` fields that will show
which sub-queries matched each percolated document.
Closes #10163
Part of the effort to fix search response leaks is to fix these. Fixed
all that I could easily find in tests. Production changes incoming once
the dependencies for those are fixed.
part of #102030 but no fancy utility here like for search responses
since we don't have so many use cases and none of them are tricky.
Same as #101175, shorten `client().prepareIndex(index)` and
`client().prepareIndex().setIndex(index)` via a test utility.
Saves lots of code now and sets up some follow-up simplifications.
Cleaning this up a little even though it's still quite horrible.
`.get()` in this API actually means `actionGet()` so to speak.
I think a good first step to cleaning this up is to at least reduce
the duplication though and save 1k lines.
Another step towards ref counting search hits. This adds leak tracking to the search context. Required 2 fixes in the production code to not fail tests: sub aggregations need to be closed eventually, found it easiest to just tie this to the parent context. If we throw in the constructor of the context (we have tests for this case), we should release/close it still (it's just impossible to fix the leak tracking otherwise, also it seems to me that this is more correct anyway since we initialise resources in that constructor).
Other than that, just trivial test changes to make sure the contexts get closed everywhere.
This introduced a new knn query:
- knn query is executed during the Query phase similar to all other queries.
- No k parameter, k defaults to size
- num_candidates is the size of the queue of candidates to consider while
searching the graph on each shard
- For aggregations: "size" results are collected with total = size * shards.
Aggregations will see size * shards results.
- All filters from DSL are applied as post-filters, except: 1) alias filter
is applied as pre-filter or 2) a filter provided as a parameter
inside knn query.
While dot expansion is disabled when parsing percolator queries at index
time, as that would interfere with query parsing, we still use a wrapper parser
that is conservative about what methods it supports, assuming that
document parsing needs nextToken and not much more. Turns out that when
parsing queries instead, we need to support all the XContentParser
methods including map, list etc.
This commit adds a test for script score query parsing through document
parsing via percolator field mapper, and removes the limitations in the
wrapper parser when dots expansion is disabled.
Similar to the TransportVersions holder class, IndexVersions is the new
place to contain all constants for IndexVersion. This commit moves all
existing constants to the new class. It is purely mechanical.
Follow up to #100966 introducing new combined assertion `assertSearchHitsWithoutFailures`
to combine no-failure, count, and id assertions into one block.
We'd like to make `SearchResponse` reference counted and pooled but there are around 6k
instances of tests that create a `SearchResponse` local variable that would need to be
released manually to avoid leaks in the tests.
This does away with about 10% of these spots by adding an override for `assertHitCount`
that handles the actual execution of the search request and its release automatically
and making use of it in all spots where the `.get()` on the request builder could be inlined
semi-automatically and in a straight-forward fashion without other code changes.
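The shape of such an override can be sketched as below. `FakeResponse` is a hypothetical stand-in for a ref-counted `SearchResponse`, and the `Supplier` stands in for the request builder; neither is the real test-framework signature.

```java
import java.util.function.Supplier;

// Hypothetical sketch of an assertion helper that runs the request and
// always releases the response, so the test never holds a local variable.
final class HitCountAssertionSketch {
    private HitCountAssertionSketch() {}

    /** Stand-in for a ref-counted search response. */
    static final class FakeResponse {
        final long totalHits;
        boolean released;

        FakeResponse(long totalHits) {
            this.totalHits = totalHits;
        }

        void decRef() {
            released = true;
        }
    }

    static void assertHitCount(Supplier<FakeResponse> request, long expected) {
        FakeResponse response = request.get();
        try {
            if (response.totalHits != expected) {
                throw new AssertionError("expected " + expected + " hits but got " + response.totalHits);
            }
        } finally {
            // Released whether or not the assertion passed.
            response.decRef();
        }
    }
}
```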
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.
This commit detects if a set of mappings is configured for a data stream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.
Relates to #96051
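The resulting rule can be sketched as a single decision. The class and parameter names are hypothetical, and the real builder reads this information from the MapperBuilderContext rather than from method arguments.

```java
// Hypothetical sketch of the decision described above; not DateFieldMapper.Builder itself.
final class TimestampIgnoreMalformedSketch {
    private TimestampIgnoreMalformedSketch() {}

    /**
     * The index-wide ignore_malformed value applies to ordinary date fields,
     * but a data stream timestamp field always ends up with false.
     */
    static boolean effectiveIgnoreMalformed(boolean indexWideIgnoreMalformed, boolean isDataStreamTimestampField) {
        return isDataStreamTimestampField == false && indexWideIgnoreMalformed;
    }
}
```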
The copyTo builder is really hard to reason about when it comes to
mapper merging, because the `reset` method would actually mutate an
existing mapper. That seems dangerous and the whole thing is quite
inefficient as well. This PR just removes it and uses a copy
constructor for copy on write, avoiding instance creation on mapper
merges here and there and leaving no doubt about these things being
immutable.
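A minimal sketch of the copy-on-write idea, assuming an immutable holder of copy_to targets. `CopyToSketch` and its methods are hypothetical and only illustrate the pattern, not the actual mapper code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable holder: merging never mutates an existing instance.
final class CopyToSketch {
    private final List<String> fields;

    CopyToSketch(List<String> fields) {
        this.fields = List.copyOf(fields);
    }

    /** Copy constructor: the existing targets plus one more. */
    private CopyToSketch(CopyToSketch other, String extraField) {
        List<String> merged = new ArrayList<>(other.fields);
        merged.add(extraField);
        this.fields = List.copyOf(merged);
    }

    List<String> fields() {
        return fields;
    }

    /** Returns this instance when nothing changes, so no new object is created. */
    CopyToSketch withField(String field) {
        return fields.contains(field) ? this : new CopyToSketch(this, field);
    }
}
```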
Constants for TransportVersion currently live alongside the class
definition. This has been fine since there was only one set of
constants. However, to support serverless, some constants will need to
be defined elsewhere.
This commit moves the existing constants to a new holder class,
TransportVersions. It is almost entirely mechanical, using IntelliJ move
members. The only non mechanical part was slightly shifting how CURRENT
is found, defining a LATEST in TransportVersions that is automatically
calculated (since we already have it, no need to manually define it).
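One way to derive a latest constant from the declared ones is sketched below. The constant names and ids are made up for illustration, and the real TransportVersions class may compute LATEST differently.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical holder class; names and ids are illustrative only.
final class VersionsHolderSketch {
    static final int V_EXAMPLE_A = 8_500_001;
    static final int V_EXAMPLE_B = 8_500_002;
    static final int V_EXAMPLE_C = 8_500_003;

    /** Derived from the declared constants rather than maintained by hand. */
    static final int LATEST = Collections.max(List.of(V_EXAMPLE_A, V_EXAMPLE_B, V_EXAMPLE_C));

    private VersionsHolderSketch() {}
}
```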
The `StreamOutput` and `StreamInput` APIs are designed so that code
which serializes objects to the transport protocol aligns closely with
the corresponding deserialization code. However today
`StreamOutput#writeCollection` pairs up with a variety of methods on
`StreamInput`, including `readList`, `readSet`, and so on. These methods
are not obviously compatible with `writeCollection` unless you look at
the implementation, and that makes verifying transport protocol code
harder than it needs to be.
This commit renames these methods to `readCollectionAsList`,
`readCollectionAsSet`, and so on, to clarify that they are compatible
with `writeCollection`.
Relates
https://github.com/elastic/elasticsearch/pull/98971#issuecomment-1697289815
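The pairing can be illustrated with plain `java.io` streams. This is a simplified sketch that assumes a length-prefixed wire format; it is not the real `StreamOutput` or `StreamInput` implementation, and only the method names mirror the ones mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch: one write method, several compatible readers.
final class CollectionWireSketch {
    private CollectionWireSketch() {}

    static void writeCollection(DataOutputStream out, Collection<String> values) throws IOException {
        out.writeInt(values.size());
        for (String value : values) {
            out.writeUTF(value);
        }
    }

    static List<String> readCollectionAsList(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<String> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(in.readUTF());
        }
        return result;
    }

    static Set<String> readCollectionAsSet(DataInputStream in) throws IOException {
        int size = in.readInt();
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i < size; i++) {
            result.add(in.readUTF());
        }
        return result;
    }

    /** Convenience wrapper so callers do not need to handle IOException. */
    static byte[] serialize(Collection<String> values) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream(); DataOutputStream out = new DataOutputStream(bytes)) {
            writeCollection(out, values);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> deserializeAsList(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readCollectionAsList(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Set<String> deserializeAsSet(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readCollectionAsSet(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both readers consume exactly the bytes that `writeCollection` produced, which is the compatibility the new names are meant to make obvious.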
An optimization introduced in:
https://github.com/elastic/elasticsearch/pull/81985 changed percolator
query behavior.
Users can specify a percolator query which expands fields based on a
wildcard pattern. Just one example is `simple_query_string`, which
allows field names like `"text_*"`. The user expects that this field
name will expand to relevant mapped fields (e.g. "text_foo"). However,
if there are no documents indexed in those fields at the time when the
percolator query is indexed, it doesn't expand to the relevant fields.
Additionally at query time, we may skip expanding fields and not match
the relevant mapped fields if they are considered "empty" (e.g. have no
values in the shard). We should instead allow expansion by indicating
that the field may exist in the shard.
closes: https://github.com/elastic/elasticsearch/issues/98819
When the subobject property is set to false and we encounter an object
while parsing, we need a way to understand if its FieldMapper is able to
parse an object. If that's the case, we can provide the entire object to
the FieldMapper; otherwise its name becomes part of the dotted field
name of each internal value.
This has been achieved by adding the `supportsParsingObject()` method
to the `FieldMapper` class. This method defaults to `false` since the
majority of FieldMappers do not support parsing objects and is
overridden to return `true` by the ones that do support objects.
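A sketch of the default-and-override pattern follows. Only the method name `supportsParsingObject()` comes from the description above; the classes and the `fieldNameFor` helper are hypothetical.

```java
// Hypothetical stand-in for FieldMapper: most mappers keep the default.
abstract class FieldMapperSketch {
    boolean supportsParsingObject() {
        return false;
    }
}

// A mapper that only understands leaf values.
final class LeafOnlyMapperSketch extends FieldMapperSketch {}

// A mapper that can consume a whole object.
final class ObjectAwareMapperSketch extends FieldMapperSketch {
    @Override
    boolean supportsParsingObject() {
        return true;
    }
}

final class SubobjectsFalseSketch {
    private SubobjectsFalseSketch() {}

    /**
     * Field name used for a value found at objectName/innerName: the mapper gets
     * the whole object under its own name, or the names are joined with a dot.
     */
    static String fieldNameFor(FieldMapperSketch mapper, String objectName, String innerName) {
        return mapper.supportsParsingObject() ? objectName : objectName + "." + innerName;
    }
}
```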
This change swaps test code that directly creates IndexSearcher instances with LuceneTestCase#newSearcher calls
that have the advantage of randomly using concurrency and also randomly using assertion wrappers internally.
While this doesn't guarantee testing the concurrent code path, it should generally increase the likelihood of doing so.
There were lots of spots where we did weird things around streams, like redundant stream creation, redundant collecting
before adding all the collected elements to another collection, redundant streams for joining strings,
using the less efficient `Collectors.toList`, and in a few cases incorrectly relying on the result being mutable.
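Two of these patterns, sketched as before/after pairs. The method names are hypothetical and the snippets are illustrative, not taken from the changed code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical before/after pairs for the kinds of cleanups described above.
final class StreamCleanupSketch {
    private StreamCleanupSketch() {}

    // Before: a stream created only to join strings.
    static String joinBefore(List<String> parts) {
        return parts.stream().collect(Collectors.joining(","));
    }

    // After: String.join does the same without a stream.
    static String joinAfter(List<String> parts) {
        return String.join(",", parts);
    }

    // Before: collecting into a temporary list just to add it to another collection.
    static List<String> appendBefore(List<String> target, List<String> source) {
        target.addAll(source.stream().collect(Collectors.toList()));
        return target;
    }

    // After: add the source elements directly.
    static List<String> appendAfter(List<String> target, List<String> source) {
        target.addAll(source);
        return target;
    }
}
```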
Drying this up further and adding the same short-cut for single node
tests. Dealing with most of the spots that I could grab via automatic
refactorings.
Replacing the remaining usages that I could automatically replace
and a couple that I did by hand in this PR.
Also, added the same shortcut to the single node tests to save some
duplication there.
This commit changes access to the latest TransportVersion constant to
use a static method instead of a public static field. By encapsulating
the field we will be able to (in a followup) lazily determine what the
latest is, outside of clinit.
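One possible way an accessor enables lazy computation is the initialization-on-demand holder idiom, sketched below. This is an assumption about how the follow-up could look, not the actual change; all names and the version id are hypothetical.

```java
// Hypothetical sketch: callers use current() instead of a public static field,
// so the value can be computed on first use rather than in the outer class initializer.
final class LatestVersionSketch {
    /** Counts computations, only to make the laziness observable. */
    static int computations = 0;

    private LatestVersionSketch() {}

    private static final class Holder {
        // Initialized when Holder is first accessed, not when LatestVersionSketch loads.
        static final int LATEST = computeLatest();
    }

    private static int computeLatest() {
        computations++;
        return 8_500_003; // illustrative id
    }

    static int current() {
        return Holder.LATEST;
    }
}
```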
Motivated by looking into allocations of listeners in detail for shared cache benchmarking.
Wrapping a listener and using `listener::onFailure` as the failure callback means that we
have a reference to the listener from both the failure and the response handler.
If we use the approach used by the `.delegate*` methods, we can often save allocating
a response handler lambda or at least make the response handler cheaper.
We also save allocating the failure handler lambda.
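The difference can be sketched with a simplified listener interface. `ListenerSketch` is a hypothetical stand-in for `ActionListener`, and `delegateFailure` here only models the idea of a wrapper that forwards failures straight to the original listener, so no separate failure lambda is needed.

```java
import java.util.function.BiConsumer;

// Hypothetical, simplified listener; not the real ActionListener interface.
interface ListenerSketch<T> {
    void onResponse(T response);

    void onFailure(Exception e);

    /**
     * Wraps this listener: responses go through the handler, failures are
     * forwarded to this listener directly without an extra failure callback.
     */
    default <R> ListenerSketch<R> delegateFailure(BiConsumer<ListenerSketch<T>, R> handler) {
        final ListenerSketch<T> delegate = this;
        return new ListenerSketch<R>() {
            @Override
            public void onResponse(R response) {
                handler.accept(delegate, response);
            }

            @Override
            public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }
}

// Records what it receives, for demonstration purposes only.
final class RecordingListenerSketch implements ListenerSketch<String> {
    String response;
    Exception failure;

    @Override
    public void onResponse(String response) {
        this.response = response;
    }

    @Override
    public void onFailure(Exception e) {
        this.failure = e;
    }
}
```

Because the handler receives the delegate as an argument, the response lambda often does not need to capture the original listener at all.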