Commit graph

14959 commits

Author SHA1 Message Date
Oleksandr Kolomiiets
8ea52ffd91
ignore_above default to 8191 for logsdb (#113442) (#115373)
In LogsDB we would like to use a default value of `8191` for the index-level setting
`index.mapping.ignore_above`. The value for `ignore_above` is the _character count_,
but Lucene counts bytes. Here we set the limit to `32766 / 4 = 8191` since UTF-8
characters may occupy at most 4 bytes.
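
The arithmetic behind this default can be checked with a short sketch. The constant and function names below are illustrative and are not taken from the Elasticsearch code base; only the value `32766` and the 4-byte worst case come from the message above. Python counts code points here, which is a simplification of the "character count" the setting refers to.

```python
# Illustrative sketch: names are hypothetical, values come from the commit message.
LUCENE_MAX_TERM_BYTES = 32766    # byte limit quoted in the commit message
MAX_UTF8_BYTES_PER_CHAR = 4      # a UTF-8 encoded code point occupies at most 4 bytes


def safe_ignore_above(max_term_bytes: int = LUCENE_MAX_TERM_BYTES) -> int:
    """Largest character count whose worst-case UTF-8 encoding fits the byte limit."""
    return max_term_bytes // MAX_UTF8_BYTES_PER_CHAR


if __name__ == "__main__":
    limit = safe_ignore_above()
    # U+1F600 needs 4 bytes in UTF-8, so this is the worst case per character.
    worst_case = "\U0001F600" * limit
    print(limit, len(worst_case.encode("utf-8")))
```

With this limit, even a value made entirely of 4-byte characters stays under the byte limit, while one more such character would exceed it.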

(cherry picked from commit 521e4341d7)

# Conflicts:
#	server/src/main/java/org/elasticsearch/common/settings/Setting.java

Co-authored-by: Salvatore Campagna <93581129+salvatore-campagna@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-23 12:44:34 -07:00
David Kyle
aa558c8b16
[ML] Add patch transport version for change to Get Inference Request (#115250) (#115446) 2024-10-24 05:58:34 +11:00
Panagiotis Bailis
65137030fd
Updating error handling for compound retrievers (#115277) (#115428)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-24 05:09:48 +11:00
Joe Gallo
552a4381e8
Refactor PipelineConfiguration#getVersion (#115423) (#115433) 2024-10-24 03:49:09 +11:00
Martijn van Groningen
3814cf4429
Sometimes delegate to SourceLoader in ValueSourceReaderOperator for required stored fields (#115114) (#115390)
If source is required by a block loader then the StoredFieldsSpec that gets populated should be enhanced by SourceLoader#requiredStoredFields(...) in ValuesSourceReaderOperator. Otherwise in case of synthetic source many stored fields aren't loaded, which causes only a subset of _source to be synthesized. For example, unmapped fields or field values that exceed the configured `ignore_above` will not appear in _source.

This happens when field types fallback to a block loader implementation that uses _source. The required field values are then extracted from the source once loaded.

This change also reverts the production code changes introduced via #114903. That change only ensured that the _ignored_source field was added to the required list of stored fields. In reality more fields could be required. This change is a better fix, since it also handles other cases and the SourceLoader implementation indicates which stored fields are needed.

Closes #115076
2024-10-23 21:06:58 +11:00
Joe Gallo
d6fe260728
Optimize IndexTemplateRegistry#clusterChanged (#115347) (#115366) 2024-10-23 12:03:38 +11:00
Lee Hinman
10f711d209
Revert "Add ResolvedExpression wrapper (#114592)" (#115317) (#115376)
This reverts commit 4c15cc0778.
This commit introduced an orders of magnitude regression when searching many shards.

(cherry picked from commit d9baf6f9db)

Co-authored-by: Armin Braun <me@obrown.io>
2024-10-23 11:22:10 +11:00
Keith Massey
9cf174f629
Adding support for simulate ingest mapping addition for indices with mappings that do not come from templates (#115359) (#115369) 2024-10-23 09:00:31 +11:00
Joe Gallo
8f568b5649
Optimize downloader task executor (#115355) (#115365) 2024-10-23 08:36:13 +11:00
Joe Gallo
66f285e272
Optimize IngestService#resolvePipelinesFromIndexTemplates (#115348) (#115367) 2024-10-23 08:17:32 +11:00
elasticsearchmachine
7976313877 Bump versions after 7.17.25 release 2024-10-22 20:43:46 +00:00
Kostas Krikellas
f9a1561c88
Handle setting merge conflicts for overruling settings providers (#115217) (#115325)
* Handle setting merge conflicts for overruling settings providers

* spotless

* update TransportSimulateIndexTemplateAction

* update comment and add test

* fix flakiness

* fix flakiness

(cherry picked from commit e3c198a23a)
2024-10-22 20:06:53 +03:00
Panagiotis Bailis
c210cd477b
Adding validation for incompatibility of compound retrievers and scroll (#115106) (#115323) 2024-10-23 01:37:51 +11:00
Keith Massey
35f7efefd1
Adding support for additional mapping to simulate ingest API (#114742) (#115284) 2024-10-22 08:13:33 -05:00
Simon Cooper
a7c7004c28
Remove ChunkedToXContentHelper.array method, swap for ChunkedToXContentBuilder (#114319) (#115252)
Backport #114319 to 8.17
2024-10-22 11:11:16 +01:00
Jim Ferenczi
467e564d4c
[8.x] Add prefilters only once in the compound and text similarity retrievers (#115296)
* Add prefilters only once in the compound and text similarity retrievers (#114983)

This change ensures that the prefilters are propagated in the downstream retrievers only once.
It also removes the ability to extend `explainQuery` in the compound retriever. This is not needed
as the rank docs are now responsible for the explanation.

* Trigger Build
2024-10-22 20:03:13 +11:00
Mary Gouseti
21312daadf
Adjust failure store to work with TSDS (#114307) (#115294)
In this PR we add a test and we fix the issues we encountered when we
enabled the failure store for TSDS and logsdb.

**Logsdb** Logsdb worked out of the box, so we just added the test that
indexes with a bulk request a couple of documents and tests how they are
ingested.

**TSDS** Here it was a bit trickier. We encountered the following
issues:

- TSDS requires a timestamp to determine the write index of the data stream meaning the failure happens earlier than we have anticipated so far. We added a special exception to detect this case and we treat it accordingly.
- The template of a TSDS data stream sets certain settings that we do not want to have in the failure store index. We added an allowlist that gets applied before we add the necessary index settings. 

Furthermore, we added a test case to capture this.
2024-10-22 19:45:24 +11:00
Ignacio Vera
2203e4c538
Grow internal arrays when growing the capacity in AbstractHash implementations (#114907) (#115289)
This commit resizes those arrays when incrementing the capacity of the hashes to the maxSize.
2024-10-22 16:40:47 +11:00
Ignacio Vera
2ad380743a
Always check the parent breaker with zero bytes in PreallocatedCircuitBreakerService (#115181) (#115274)
PreallocatedCircuitBreakerService will call the parent breaker if the number of bytes passed is zero.
2024-10-22 06:46:59 +11:00
Carlos Delgado
b73f5c55df
Do not exclude empty arrays or empty objects in source filtering with Jackson streaming (#112250) (#115222)
(cherry picked from commit 6be3036c01)

Co-authored-by: mccheah <mcheah@palantir.com>
2024-10-21 16:29:02 +02:00
Nikolaj Volgushev
dd50942dc8
[8.x] Reprocess operator file settings on service start (#114295) (#115198)
* Reprocess operator file settings on service start (#114295)

Changes `FileSettingsService` to reprocess file settings on every
restart or master node change, even if versions match between file and
cluster-state metadata. If the file version is lower than the metadata
version, processing is still skipped to avoid applying stale settings. 

This makes it easier for consumers of file settings to change their
behavior w.r.t. file settings contents. For instance, an update of how
role mappings are stored will automatically apply on the next restart,
without the need to manually increment the file settings version to
force reprocessing. 

Relates: ES-9628
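
The version rule described above can be sketched as a small decision function. This is a hedged illustration, not the `FileSettingsService` implementation: the function and parameter names are hypothetical, and skipping equal versions outside of a service start or master change is an assumption inferred from the phrase "even if versions match".

```python
def should_process_file_settings(
    file_version: int,
    metadata_version: int,
    starting_or_master_changed: bool,
) -> bool:
    """Decide whether file settings should be (re)processed.

    Hypothetical helper illustrating the rule from the commit message.
    """
    if file_version < metadata_version:
        # Stale file: always skipped so that old settings are not applied.
        return False
    if file_version == metadata_version:
        # Matching versions are reprocessed only on restart or master change.
        # Skipping them otherwise is an assumption, not stated in the message.
        return starting_or_master_changed
    # A newer file version is processed in every case.
    return True


if __name__ == "__main__":
    print(should_process_file_settings(5, 5, starting_or_master_changed=True))
    print(should_process_file_settings(4, 5, starting_or_master_changed=True))
```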

* Backport 114295

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-22 00:26:00 +11:00
Matteo Piergiovanni
b80f6770f5
Bool query early termination should also consider must_not clauses (#115031) (#115072)
* Bool query early termination should also consider must_not clauses

* Update docs/changelog/115031.yaml

(cherry picked from commit 5e381a3a89)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-21 10:44:14 +02:00
Nhat Nguyen
fdbef48ac0
Refactor TSDB doc_values util to allow introducing new codec (#115042) (#115159)
This PR refactors the doc_values utils used in the TSDB codec to allow
sharing between the current codec and the new codec.
2024-10-19 11:12:39 +11:00
Oleksandr Kolomiiets
682ed39b21
Remove IndexMode#isSyntheticSourceEnabled (#114963) (#115144)
(cherry picked from commit 16bde51891)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
2024-10-18 15:44:40 -07:00
Benjamin Trent
e7ac7c3c96
Add timeout and cancellation check to rescore phase (#115048) (#115132)
This adds cancellation checks to the rescore phase. The checks cover both
cancellation of the parent task and timeouts.

The assumption is that rescore is always significantly more expensive
than a regular query, so we check for timeout as frequently as the most
frequent check in ExitableDirectoryReader.

For LTR, we check on hit inference. Maybe we should also check per
feature extraction?

For QueryRescorer, we check in the combine method.

closes: https://github.com/elastic/elasticsearch/issues/114955
2024-10-19 05:58:04 +11:00
Martijn van Groningen
d9c930d924
Include ignored source as part of loading field values in ValueSourceReaderOperator via BlockSourceReader. (#114903) (#115064)
Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields.

This change includes the _ignored_source field as a required stored field and allows keyword fields without doc values or stored fields to be used in case of synthetic source.

Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term, in case of synthetic source, we should only load ignored source when a field has no doc values or stored field, as is being explored in #114886, thereby avoiding synthesizing the complete _source in order to get only one field.
2024-10-18 17:58:42 +11:00
Martijn van Groningen
33b96cdc81
Reduce the number of SFM singletons. (#114969) (#115036)
This removes all recovery source specific SFM singletons. Whether recovery source is enabled can be checked via `DocumentParserContext`. This reduces the number of SFM instances by half.
2024-10-18 04:27:33 +11:00
Alexander Spies
068f51d76a
Reapply "ESQL: Remove parent from FieldAttribute (#112881)" (#115006) (#115007) (#115035)
This reverts commit 17ecb66a06 and
reapplies https://github.com/elastic/elasticsearch/pull/112881 once the
previous, non-backported transport version bump is dealt with.
2024-10-18 03:46:46 +11:00
Ignacio Vera
918e8c48ed
Don't normalize coordinates in GeoTileUtils (#114929) (#115030)
The main users of this class use as input latitudes and longitudes read from doc values. These coordinates are always
within bounds, so there is no point in trying to normalise them, especially since this piece of code is in the hot path for
aggregations.
2024-10-18 03:11:16 +11:00
elasticsearchmachine
d55ddee129 Bump versions after 8.15.3 release 2024-10-17 14:52:28 +00:00
Simon Cooper
274966443b
Add transport version constants used in main, to ensure the protocol is identical (#115013)
This adds constants introduced by #115009 on main, so that the transport version id matches between main and 8.x to make backports apply easier
2024-10-17 15:24:44 +01:00
Simon Cooper
a7883aea9b
Squash transport versions for 8.15 (#114827) (#114971)
Backport #114827 to 8.x
2024-10-17 11:48:44 +01:00
Nhat Nguyen
4b690386e4
Collect query metrics on search nodes (#114267) (#114345)
When I added the query/fetch metrics, I overlooked that non-primary 
shards were being skipped during metrics collection, and the stateful
tests didn't catch it. This change ensures that search metrics are now
collected from every shard copy.
2024-10-17 12:25:25 +11:00
Panagiotis Bailis
49f359a598
[8.x] Adding deprecation warnings for rank and sub_searches (#114854) (#114950)
* Adding deprecation warnings for rank and sub_searches (#114854)

* removing updatev10 reference
2024-10-17 07:58:07 +11:00
Tim Brooks
9922d544a1
Standardize error code when bulk body is invalid (#114869) (#114944)
Currently the incremental and non-incremental bulk variations will
return different error codes when the json body provided is invalid.
This commit ensures both versions return status code 400. Additionally,
this renames the incremental rest tests to bulk tests and ensures that
all tests work with both bulk api versions. We set these tests to
randomize which version of the api we test each run.
2024-10-17 06:30:33 +11:00
Luca Cavanna
4be9122150
Enhance empty queue conditional in slicing logic (#114911) (#114940)
With recent changes in Lucene 9.12 around not forking execution when not necessary
(see https://github.com/apache/lucene/pull/13472), we have removed the search
worker thread pool in #111099. The worker thread pool had unlimited queue, and we
feared that we could have much more queueing on the search thread pool if we execute
segment level searches on the same thread pool as the shard level searches, because
every shard search would take up to a thread per slice when executing the query phase.

We have then introduced an additional conditional to stop parallelizing when there
is a queue. That is perhaps a bit extreme, as it's a decision made when creating the
searcher, while a queue may no longer be there once the search is executing.
This has caused some benchmark regressions, given that having a queue may be a transient
scenario, especially with short-lived segment searches being queued up. We may end
up disabling inter-segment concurrency more aggressively than we would want, penalizing
requests that do benefit from concurrency. At the same time, we do want to have some kind
of protection against rejections of shard searches that would be caused by excessive slicing.
When the queue is above a certain size, we can turn off the slicing and effectively disable
inter-segment concurrency. With this commit we set that threshold to be the number of
threads in the search pool.
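
The threshold described above can be illustrated with a minimal sketch. The function and parameter names are hypothetical and this is not the Elasticsearch implementation; in particular, treating a queue size exactly equal to the thread count as still allowing slicing is an assumption drawn from the word "above".

```python
def slicing_enabled(queue_size: int, search_pool_threads: int) -> bool:
    """Return True while the search queue is small enough to keep slicing.

    Hypothetical helper: slicing (inter-segment concurrency) is turned off
    only once the queue grows above the number of search pool threads.
    """
    return queue_size <= search_pool_threads


if __name__ == "__main__":
    threads = 13  # example pool size, chosen arbitrarily for the demo
    for queued in (0, 5, 13, 14, 200):
        print(queued, slicing_enabled(queued, threads))
```

Compared with disabling slicing as soon as any queue exists, a threshold of this kind tolerates short, transient queues while still protecting against rejections under sustained load.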
2024-10-17 06:01:35 +11:00
Brian Seeders
b84e990232
Bump to version 8.17.0 2024-10-16 12:32:03 -04:00
Salvatore Campagna
0cab608638
Inject the host.name field mapping only if required for logsdb index mode (#114573) (#114916)
Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.

(cherry picked from commit 9bf6e3b0ba)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-16 18:03:35 +02:00
Kostas Krikellas
bae09a8cf2
[8.x] Reset array scope tracking for nested objects (#114891) (#114906)
* Reset array scope tracking for nested objects (#114891)

* Reset array scope tracking for nested objects

* update

* update

* update

(cherry picked from commit 8ae5ca468d)

# Conflicts:
#	muted-tests.yml

* update

* update
2024-10-17 01:30:54 +11:00
Panagiotis Bailis
5aadfebfca
Removing tech-preview header and updating documentation for retrievers and RRF (#114810) (#114875)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-16 22:38:37 +11:00
David Turner
bdac01785a
Inline MockTransportService#getLocalDiscoNode() (#114883) (#114887)
This method just delegates to `getLocalNode()`, we may as well call the
more widely-used method with the shorter name directly.
2024-10-16 21:47:07 +11:00
Mary Gouseti
6de48b122f
[Failure store - selector syntax] Replace failureOptions with selector options internally. (#114812) (#114882)
**Introduction**

> In order to make adoption of failure stores simpler for all users, we are introducing a new syntactical feature to index expression resolution: The selector.
>
> Selectors, denoted with a :: followed by a recognized suffix will allow users to specify which component of an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a concrete index, data stream, or alias; Any abstraction that can be resolved to a set of indices/shards. We define a component of an index abstraction to be some searchable unit of the index abstraction.
>
> To start, we will support two components: data and failures. Concrete indices are their own data components, while the data component for index aliases are all of the indices contained therein. For data streams, the data component corresponds to their backing indices. Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refer to the data streams' failure stores. Indices and index aliases do not have a failure component.

For more details and examples see
https://github.com/elastic/elasticsearch/pull/113144. All this work has
been cherry picked from there.
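
The selector syntax quoted above can be sketched as follows. This is a hedged illustration only: the names `ParsedExpression`, `parse_expression`, and `selector_applies` are hypothetical, the abstraction kind strings are made up for the example, and the real parsing is introduced in a later PR according to the message below. Only the `::` separator, the `data` and `failures` components, and the rule about which abstractions have a failure component come from the quoted text.

```python
from typing import NamedTuple, Optional

SELECTOR_SEPARATOR = "::"
RECOGNIZED_SELECTORS = frozenset({"data", "failures"})
# Per the quoted text, only these abstractions have a failure component.
FAILURE_CAPABLE_KINDS = frozenset({"data_stream", "data_stream_alias"})


class ParsedExpression(NamedTuple):
    name: str
    selector: Optional[str]  # None when the expression carries no selector


def parse_expression(expression: str) -> ParsedExpression:
    """Split 'name::selector' into its parts, rejecting unknown selectors."""
    name, separator, suffix = expression.partition(SELECTOR_SEPARATOR)
    if not separator:
        return ParsedExpression(expression, None)
    if not name or suffix not in RECOGNIZED_SELECTORS:
        raise ValueError(f"unrecognized selector in {expression!r}")
    return ParsedExpression(name, suffix)


def selector_applies(kind: str, selector: str) -> bool:
    """Check whether an abstraction kind has the requested component."""
    if selector == "failures":
        return kind in FAILURE_CAPABLE_KINDS
    return selector == "data"


if __name__ == "__main__":
    print(parse_expression("logs-app::failures"))
    print(selector_applies("index", "failures"))
```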

**Purpose of this PR**

This PR is replacing the `FailureStoreOptions` with the
`SelectorOptions`, there shouldn't be any perceivable change to the user
since we kept the query parameter "failure_store" for now. It will be
removed in the next PR which will introduce the parsing of the
expressions. 

_The current PR is just a refactoring and does not and should not change
any existing behaviour._
2024-10-16 20:27:25 +11:00
Salvatore Campagna
87987e7c6a
[8.x] Allow synthetic source and disabled source for standard indices (#114817) (#114866)
* Allow synthetic source and disabled source for standard indices (#114817)

When using the index.mapping.source.mode setting we need to make sure
that it takes precedence and that is used also when standard index mode
is used. Without this patch we always return stored source if
_source.mode is not used and the setting is.

Relates #114433

(cherry picked from commit 3af4d67fac)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java

* fix: conflict resolution mistake

* fix: error message

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-16 08:56:47 +02:00
David Kyle
19fddef8fb
[ML] Create an ml node inference endpoint referencing an existing deployment (#114750) (#114858)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-16 09:51:37 +11:00
Benjamin Trent
c43c9b6086
Fix bbq index feature exposure for testing & remove feature flag (#114832) (#114851)
We actually don't need a cluster feature, a capability added if the
feature flag is enabled is enough for testing.

closes https://github.com/elastic/elasticsearch/issues/114787

(cherry picked from commit e87b894f68)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-16 06:51:06 +11:00
Oleksandr Kolomiiets
17022fdefc
[8.x] Allow stored source in logsdb and tsdb (#114454) (#114648)
* Allow stored source in logsdb and tsdb (#114454)

(cherry picked from commit a62228a744)

# Conflicts:
#	modules/aggregations/build.gradle
#	modules/data-streams/src/javaRestTest/java/org/elasticsearch/datastreams/logsdb/LogsIndexModeCustomSettingsIT.java
#	rest-api-spec/build.gradle

* Fix tests

* Fix tests

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-15 12:07:27 -07:00
David Turner
a47a21012f
Introduce utils for _really_ stashing the thread context (#114786) (#114841)
`ThreadContext#stashContext` does not yield a completely fresh context:
it preserves headers related to tracing the original request. That may
be appropriate in many situations, but sometimes we really do want to
detach processing entirely from the original task. This commit
introduces new utilities to do that.
2024-10-15 18:45:52 +01:00
Costin Leau
fc901055ba
ESQL: Introduce per agg filter (#113735) (#114842)
* ESQL: Introduce per agg filter (#113735)

Add support for aggregation scoped filters that work dynamically on the
 data in each group.

| STATS
    success = COUNT(*) WHERE 200 <= code AND code < 300,
   redirect = COUNT(*) WHERE 300 <= code AND code < 400,
 client_err = COUNT(*) WHERE 400 <= code AND code < 500,
 server_err = COUNT(*) WHERE 500 <= code AND code < 600,
 total_count = COUNT(*)
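
The semantics of the query above can be emulated in a few lines of Python for a single group. This is only a sketch of what per-aggregation filters compute, assuming integer status codes; the function name is hypothetical and it says nothing about how ES|QL implements the feature.

```python
from typing import Dict, Iterable


def filtered_counts(codes: Iterable[int]) -> Dict[str, int]:
    """Count rows per filter, mirroring COUNT(*) WHERE <condition> in the query above."""
    codes = list(codes)
    filters = {
        "success": lambda c: 200 <= c < 300,
        "redirect": lambda c: 300 <= c < 400,
        "client_err": lambda c: 400 <= c < 500,
        "server_err": lambda c: 500 <= c < 600,
    }
    # Each aggregate sees only the rows that pass its own filter.
    result = {name: sum(1 for c in codes if keep(c)) for name, keep in filters.items()}
    # The unfiltered aggregate still counts every row.
    result["total_count"] = len(codes)
    return result


if __name__ == "__main__":
    print(filtered_counts([200, 204, 301, 404, 500, 503, 100]))
```

Note that the filtered counts need not add up to `total_count`: a row such as code `100` matches none of the filters but is still counted by the unfiltered aggregate.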

Implementation wise, the base AggregateFunction has been extended to
 allow a filter to be passed on. This is required to incorporate the
 filter as part of the aggregate equality/identity which would fail with
 the filter as an external component.

As part of the process, the serialization for the existing aggregations
 had to be fixed so that AggregateFunction implementations delegate to
 their parent first.

(cherry picked from commit d102659dce)

* Update docs/changelog/114842.yaml

* Delete docs/changelog/114842.yaml
2024-10-16 03:09:03 +11:00
Jim Ferenczi
8204ebae57
Fix Max Score Propagation in RankDocsQuery (#114716) (#114725)
Fix rank doc query when some segments have no ranked docs
2024-10-16 01:28:04 +11:00
David Kyle
63b4d76c33
[ML] Dynamically get of num allocations (#114636) (#114805) 2024-10-15 12:35:50 +01:00