Commit graph

75574 commits

Author SHA1 Message Date
Mark Vieira
6c4e55e714
Fix file path when looking for docker exclusions file (#105304) 2024-02-08 12:27:09 -08:00
David Turner
97dbb2a27e
Fix leaked HTTP response sent after close (#105293)
Today a `HttpResponse` is always released via a `ChannelPromise` which
means the release happens on a network thread. However, it's possible we
try and send a `HttpResponse` after the node has got far enough through
shutdown that it doesn't have any running network threads left, which
means the response just leaks.

This is no big deal in production, it becomes irrelevant when the
process exits, but in tests we start and stop many nodes within the same
process so mustn't leak anything.

At this point in shutdown, all HTTP channels are now closed, so it's
sufficient to check whether the channel is open first, and to fail the
listener on the calling thread if not. That's what this commit does.

Closes #104651
2024-02-08 14:57:02 -05:00
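The check described in the commit above (#105293) can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: `Channel`, `Listener`, `RecordingListener` and `send` are hypothetical stand-ins, and the only point shown is that a closed channel fails the listener on the calling thread instead of handing the response to a network thread that may no longer exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are not Elasticsearch or Netty classes.
class SendIfOpenSketch {
    interface Listener {
        void onResponse();
        void onFailure(Exception e);
    }

    static class Channel {
        volatile boolean open = true;
        final List<String> written = new ArrayList<>();

        // A real channel would complete the listener later, on a network thread.
        void write(String response, Listener listener) {
            written.add(response);
            listener.onResponse();
        }
    }

    // Records the outcome so the caller can see that the listener was completed.
    static class RecordingListener implements Listener {
        String outcome = "pending";
        public void onResponse() { outcome = "sent"; }
        public void onFailure(Exception e) { outcome = "failed: " + e.getMessage(); }
    }

    static void send(Channel channel, String response, Listener listener) {
        if (channel.open == false) {
            // No network thread may be left to complete the listener, so fail it here.
            listener.onFailure(new IllegalStateException("channel closed"));
            return;
        }
        channel.write(response, listener);
    }

    public static void main(String[] args) {
        Channel channel = new Channel();
        channel.open = false;
        RecordingListener listener = new RecordingListener();
        send(channel, "200 OK", listener);
        System.out.println(listener.outcome);
    }
}
```

In this sketch the listener is always completed exactly once, which is what lets the owner of the response release it regardless of the shutdown state.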
Przemysław Witek
0cbc745f57
[Transform] Do not log warning when triggering an ABORTING transform (#105234) 2024-02-08 20:10:34 +01:00
John Verwolf
95570af0eb
Testfix/fixes 103781 (#105251)
Updates transport telemetry tests to use a single node test case in an attempt to reduce flakiness.
2024-02-08 11:03:47 -08:00
Michael Peterson
33e22c4467
Docs improvements for the new resolve/cluster API (#105297) 2024-02-08 13:36:07 -05:00
Jonathan Buttner
563c3e60ab
Moving byte embeddings to text_embedding_bytes field (#105290) 2024-02-08 13:21:20 -05:00
Przemysław Witek
8cfcb706f8
[Transform] Add test for cancelling transform persistent task. (#105285) 2024-02-08 19:09:13 +01:00
Michael Peterson
15ded61150
Reversing unintentional change to indices.resolve_index/10_basic_resolve_index.yml (#105300)
Was changed in #102726
2024-02-08 12:48:40 -05:00
Armin Braun
8e42535440
Fix potential huge allocations when reading TermsQueryBuilder.BinaryValues from the network (#105235)
We should not be reading a single `BytesReference` (that would be backed by a single large `byte[]`)
here when we care only about the individual values in the list.
Without breaking the behavior of only serializing once when sending to multiple targets, this change:
* lazily serializes as needed and keeps the original terms, so we don't needlessly go through serialization in e.g. a single node situation
or for requests that are handled on the coordinator directly (concurrency should be fine here: we serialize on the same thread in practice, and
should we ever not be on the same thread at all times, this will at worst lead to serializing multiple times).
* stops allocating a potentially huge byte[] when receiving these things over the wire
2024-02-08 17:54:56 +01:00
Dianna Hohensee
ffc711bf35
Add an outline for the distributed area team architecture guide (#105264) 2024-02-08 11:53:08 -05:00
Matteo Piergiovanni
54cfce4379
Flag in _field_caps to return only fields with values in index (#103651)
We are adding a query parameter to the field_caps API in order to filter out
fields with no values. The parameter is called `include_empty_fields` and
defaults to true; if set to false, it filters out of the field_caps
response all the fields that have no value in the index.
We keep track of FieldInfos during refresh in order to know which fields have
a value in an index. We also added a system property
`es.field_caps_empty_fields_filter` in order to disable this feature if needed.

---------

Co-authored-by: Matthias Wilhelm <ankertal@gmail.com>
2024-02-08 17:52:21 +01:00
Costin Leau
5c1e3e2c91
ESQL: Replace [ccq.mode] in favor of a policy prefix (#105224)
For consistency, replace [ccq.mode:<type>] with _<resolution>:policyName
`ENRICH [ccq.mode=any] policyName` becomes `ENRICH _any:policyName`
2024-02-08 11:09:00 -05:00
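The syntax change in the commit above (#105224) can be illustrated with a small rewrite helper. This is only a sketch for migrating query strings: the `migrate` function and its regular expression are hypothetical and are not part of the ES|QL parser, and the sketch assumes the policy name contains no whitespace.

```java
import java.util.regex.Pattern;

// Hypothetical migration helper; not part of Elasticsearch.
class EnrichSyntaxSketch {
    // Matches: ENRICH [ccq.mode=<mode>] <policyName>  (":" is also accepted as separator)
    private static final Pattern OLD_SYNTAX =
        Pattern.compile("(?i)\\bENRICH\\s+\\[ccq\\.mode\\s*[:=]\\s*(\\w+)\\]\\s*(\\S+)");

    // Rewrites every old-style ENRICH clause to the prefixed form ENRICH _<mode>:<policyName>.
    static String migrate(String query) {
        return OLD_SYNTAX.matcher(query).replaceAll("ENRICH _$1:$2");
    }

    public static void main(String[] args) {
        System.out.println(migrate("FROM idx | ENRICH [ccq.mode=any] policyName ON host"));
    }
}
```

Queries that already use the prefixed form are left unchanged, because the pattern only matches the bracketed variant.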
Michael Peterson
ac36aa7795
Resolve Cluster API (#102726)
To improve cross-cluster search user experience, Kibana needs an endpoint that is accessible
by arbitrary Kibana dashboard search users and provides:

1. a listing of clusters in scope for a CCS query (based on the index expression and whether 
there are any indices on each cluster that the Kibana user has access to query).
2. whether that cluster is currently connected to the querying cluster (will it come back as 
skipped or failed in a CCS search)
3. showing the skip_unavailable setting for those clusters (so you can know whether it will
return skipped or failed in a CCS search)
4. the ES version of the cluster

Since no single Elasticsearch endpoint provides all of these features, this PR creates a new endpoint `_resolve/cluster` that works alongside the existing `_resolve/index` endpoint
(and leverages some of its features).

Example usage against a cluster with 2 remote clusters configured:

GET /_resolve/cluster/*,remote*:bl*

Response:

{
  "(local)": {
    "connected": true,
    "skip_unavailable": false,
    "matching_indices": true,
    "version": {
      "number": "8.12.0-SNAPSHOT",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  },
  "remote2": {
    "connected": true,
    "skip_unavailable": true,
    "matching_indices": true,
    "version": {
      "number": "8.12.0-SNAPSHOT",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  },
  "remote1": {
    "connected": true,
    "skip_unavailable": false,
    "matching_indices": false,
    "version": {
      "number": "8.12.0-SNAPSHOT",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  }
}

Almost all errors show up as "error" entries in the response.
Only the local SecurityException returns a 403 since that happens before the ResolveCluster
Transport code kicks in.
2024-02-08 10:50:05 -05:00
Costin Leau
fca3fc82be
ESQL: Grammar - FROM METADATA no longer requires [] (#105221)
Remove usage of [ ] throughout the grammar, in this case inside
FROM METADATA.
2024-02-08 07:03:19 -08:00
David Turner
cda94ac3ca
Extend timeout in RepositoryAnalysisFailureIT (#105287)
We see occasional test failures in CI due to the analysis not completing
within this 30s timeout. It doesn't look like anything is actually
wrong, the test machine is just busy and these tests can be quite
IO-intensive. This commit gives them more time.

Closes #99422
2024-02-08 09:49:06 -05:00
Felix Barnsteiner
f36dff7485
Efficiently encode multi-valued dimensions (#105271)
Detects and efficiently encodes cyclic ordinals, as proposed by
@jpountz. This is beneficial for encoding dimensions that are
multivalued, such as host.ip.

A follow-up on #99747
2024-02-08 09:30:16 -05:00
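To make the idea of cyclic ordinals in the commit above (#105271) concrete, here is a minimal sketch. It is an assumption about the general technique, not the encoder that the commit adds: `cycleLength`, `encode` and `decode` are hypothetical names, and the sketch simply stores one cycle when the whole block repeats with a fixed period.

```java
import java.util.Arrays;

// Hypothetical illustration of detecting and encoding a repeating ordinal pattern.
class CyclicOrdinalsSketch {
    // Returns the shortest period p (at most half the block) such that ords[i] == ords[i - p]
    // for every i >= p, or -1 if the block is not cyclic.
    static int cycleLength(long[] ords) {
        for (int p = 1; p <= ords.length / 2; p++) {
            boolean cyclic = true;
            for (int i = p; i < ords.length; i++) {
                if (ords[i] != ords[i - p]) {
                    cyclic = false;
                    break;
                }
            }
            if (cyclic) {
                return p;
            }
        }
        return -1;
    }

    // Keeps only the first cycle when the block is cyclic, otherwise the whole block.
    static long[] encode(long[] ords) {
        int p = cycleLength(ords);
        return p < 0 ? ords.clone() : Arrays.copyOf(ords, p);
    }

    // Expands a stored cycle back to the original block length.
    static long[] decode(long[] cycle, int length) {
        long[] out = new long[length];
        for (int i = 0; i < length; i++) {
            out[i] = cycle[i % cycle.length];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ords = {0, 1, 2, 0, 1, 2, 0, 1, 2};
        long[] encoded = encode(ords);
        System.out.println(Arrays.toString(encoded));
        System.out.println(Arrays.equals(ords, decode(encoded, ords.length)));
    }
}
```

A multi-valued dimension such as host.ip tends to produce this kind of repeating pattern when every document carries the same set of values.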
Dmitry Cherniachenko
263ea5e987
Replace generic HashSet / HashMap with more efficient EnumSet / EnumMap (#105238) 2024-02-08 13:43:14 +00:00
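The replacement named in the commit above (#105238) follows a standard JDK pattern, sketched below. The `Stage` enum and the helper methods are hypothetical and only illustrate the pattern; `EnumSet` and `EnumMap` are backed by bit vectors and arrays indexed by ordinal, so they avoid hashing and iterate in declaration order.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical example of enum-keyed collections.
class EnumCollectionsSketch {
    enum Stage { INIT, RUNNING, DONE }

    // Instead of new HashSet<>(Arrays.asList(...)).
    static Set<Stage> activeStages() {
        return EnumSet.of(Stage.INIT, Stage.RUNNING);
    }

    // Instead of new HashMap<Stage, Integer>().
    static Map<Stage, Integer> countByStage(Stage... stages) {
        Map<Stage, Integer> counts = new EnumMap<>(Stage.class);
        for (Stage stage : stages) {
            counts.merge(stage, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(activeStages());
        System.out.println(countByStage(Stage.RUNNING, Stage.DONE, Stage.RUNNING));
    }
}
```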
Felix Barnsteiner
f426b68a82
Unmute LogsDataStreamIT.testIgnoreDynamicBeyondLimit (#105282) 2024-02-08 13:26:42 +01:00
Kostas Krikellas
b6f20ff166
Increase timeout for ensureGreen in testDownsampleIndexWithRollingRestart (#105277) 2024-02-08 14:23:28 +02:00
David Turner
e489951d84
Close currentChunkedWrite on client cancel (#105258)
If the client closes the channel while we're in the middle of a chunked
write then today we don't complete the corresponding listener. This
commit fixes the problem.
2024-02-08 07:07:04 -05:00
Felix Barnsteiner
9dfd5dbd8f
Mute LogsDataStreamIT.testIgnoreDynamicBeyondLimit (#105280) 2024-02-08 12:25:45 +01:00
Albert Zaharovits
6cd29a6331
Introduce AggregationBuilder#deepCopy (#105114)
Introduces an AggregationBuilder#deepCopy method that
iteratively copies, and optionally modifies, AggregationBuilder instances.

Relates: #104895
2024-02-08 13:23:38 +02:00
Simon Cooper
9ba5651e74
Collapse all transport versions between 8.11 and 8.12 into a constant for 8.12 (#104937)
This also cleans up the versioning checks around ELSER models
2024-02-08 10:54:19 +00:00
Ignacio Vera
8f37ef977f
Remove abstract method InternalMultiBucketAggregation#reduceBucket (#105275) 2024-02-08 11:24:02 +01:00
David Turner
7b44334727 AwaitsFix for #105276 2024-02-08 09:24:29 +00:00
Felix Barnsteiner
50902e15a6
Use new ignore_dynamic_beyond_limit setting in logs and metrics data streams (#105180)
This reduces the risk of document loss if too many fields are added.

As these component templates are imported by Fleet, this also affects
integrations.
2024-02-08 04:23:50 -05:00
David Turner
4467352887
More descriptive messages for safeAwait (#105260)
Today the various `ESTestCase#safeAwait` variants do not include a
descriptive message with their failures, which means you have to dig
through the stack trace to work out the reason for a test failure. This
commit adds the missing messages to make it a little easier on the
reader.
2024-02-08 08:45:41 +00:00
Ignacio Vera
609e8059eb
Introduce an AggregatorReducer to reduce the footprint of aggregations in the coordinating node (#105207)
This commit adds an abstraction that performs reduction of InternalAggregations in a streaming fashion.
2024-02-08 09:30:54 +01:00
Henning Andersen
d00b5d37bd
Lower G1 minimum full GC interval (#105259)
We sometimes see a need to do full GC twice within the current 5s interval.
While we should work to improve our allocation pattern for that, it also
seems too conservative to not allow more full GCs, as long as we also get
some real work done. Hence lowering it to 2s here, which would fix the
current problematic cases.
2024-02-08 08:53:17 +01:00
Bogdan Pintea
f26691f987
ESQL: Mark a few features as experimental (#105263)
Mark the following features as experimental in the docs:
* `AUTO_BUCKET()`
* `SHOW FUNCTIONS`
* unsigned_long type
2024-02-07 17:28:13 -08:00
Ryan Ernst
6375e9f443
Add native access library (#105100)
Elasticsearch requires access to some native functions. Historically
this has been achieved with the JNA library. However, JNA is a
complicated, magical library, and has caused various problems booting
Elasticsearch over the years. The new Java Foreign Function and Memory
API allows access to call native functions directly from Java. It also
has the advantage of tight integration with hotspot which can improve
performance of these functions (though performance of Elasticsearch's
native calls has never been much of an issue since they are mostly at
boot time).

This commit adds a new native lib that is internal to Elasticsearch. It
is built to use the foreign function api starting with Java 21, and
continue using JNA with Java versions below that.

Only one function, checking whether Elasticsearch is running as root, is
migrated. Future changes will migrate other native functions.
2024-02-07 18:27:09 -05:00
Ry Biesemeyer
0022005e17
Add stable ThreadPool constructor to LogstashInternalBridge (#105163) 2024-02-07 17:20:59 -05:00
Nhat Nguyen
c736c34035
Avoid wrapping searchers multiple times in mget (#104227)
Wrapping a searcher can be expensive; and this optimization avoids 
wrapping the same searcher multiple times for a MGET request.

Closes #85069
2024-02-07 12:53:34 -08:00
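The optimization in the commit above (#104227) amounts to wrapping each distinct searcher at most once per request. The sketch below shows one way to do that with a per-request cache keyed by object identity; `WrapCache` is a hypothetical helper, and how the actual change tracks searchers is not stated in the commit message.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-request cache; not the Elasticsearch implementation.
class WrapOnceSketch {
    static class WrapCache<S, W> {
        private final Map<S, W> wrapped = new IdentityHashMap<>();
        private final Function<S, W> wrapper;
        int wrapCalls = 0;

        WrapCache(Function<S, W> wrapper) {
            this.wrapper = wrapper;
        }

        // Wraps the searcher on first use and reuses the result for later items.
        W wrap(S searcher) {
            W existing = wrapped.get(searcher);
            if (existing == null) {
                wrapCalls++;
                existing = wrapper.apply(searcher);
                wrapped.put(searcher, existing);
            }
            return existing;
        }
    }

    public static void main(String[] args) {
        WrapCache<Object, String> cache = new WrapCache<>(s -> "wrapped@" + System.identityHashCode(s));
        Object searcher = new Object();
        cache.wrap(searcher);
        cache.wrap(searcher);
        System.out.println(cache.wrapCalls);
    }
}
```

The cache is meant to live only for the duration of one request, so it needs no eviction and no synchronization in this sketch.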
Fabio Busatto
b1adb78f6c
[DOCS] Update remote cluster setup instructions (#105256) 2024-02-07 21:11:57 +01:00
Niels Bauman
4b54526e8f
Fix UpdateHealthInfoCacheActionTests.testRequestSerialization failing (#105257)
Fixes #105254
2024-02-07 14:25:01 -05:00
Ryan Ernst
18a1ac09e7
Use open and fstat in preallocate (#105171)
Preallocate opens a FileInputStream in order to get a native file
descriptor to pass to native functions. However, getting at the file
descriptor requires breaking modular access. This commit adds native
posix functions for opening/closing and retrieving stats on a file in
order to avoid requiring additional permissions.
2024-02-07 13:40:05 -05:00
Ryan Ernst
2ca6df71d6
Make ProviderLocator aware of boot qualified exports (#105250)
Qualified exports in the boot layer only work when they are to other boot
modules. Yet Elasticsearch has dynamically loaded modules as in plugins.
For this purpose we have ModuleQualifiedExportsService. This commit
moves loading of ModuleQualifiedExportsService instances in the boot layer
into core so that it can be reused by ProviderLocator when a qualified
export applies to an embedded module.
2024-02-07 09:43:22 -08:00
Keith Massey
d8fdf6f04d
Releasing child request builder memory from BulkRequestBuilder (#105194) 2024-02-07 10:57:58 -06:00
Mary Gouseti
65d1d3d47d
Change the rest client configuration in the LazyRolloverDataStreamIT (#105243) 2024-02-07 17:44:40 +02:00
Tim Rühsen
0ea58c8ec2
[Profiling] Add azure_cost_factor request parameter (#105231) 2024-02-07 16:42:02 +01:00
Luca Cavanna
32cbb49a3f
Remove SearchException usages without a proper status code (#105150)
We have some usages of SearchException that don't provide a cause exception and also don't define a status code. That means that the status code of such requests will default to 500 which is in many cases not a good choice. Normally, for internal server error a cause is associated with the wrapper exception.

This scenario is not very common, and looks like a leftover of shard validation that used to happen on shards, which can be moved to the coordinating node. This commit moves some of the exceptions thrown in SearchService#parseSource to SearchRequest#validate. This way we will fail before serializing the shard level request to all the shards, which is much better.

Note that for backwards-compatibility reasons, we need to keep throwing the same exception from the data node, even though this is now intuitively replaced by the same validation on the coordinating node. This is because in a mixed-cluster scenario, an older node that does not perform the validation as coordinating node could serialize shard-level requests that need to be checked again on data nodes, to prevent unexpected situations.
2024-02-07 16:12:27 +01:00
Liam Thompson
fb743da0d7
[DOCS][ESQL] Document _source metadata field (#105237)
* [DOCS][ESQL] Document _source metadata field

* 🚗 Minor copyedit to entire page
2024-02-07 15:57:51 +01:00
Martijn van Groningen
cc67205c25
Assign index.downsample.interval setting when downsample index gets created. (#105241)
This avoids keeping the downsamplingInterval field around. Additionally, the
downsample interval is known when downsample is invoked and
doesn't change.
2024-02-07 09:31:26 -05:00
Niels Bauman
64891011d3
Extend repository_integrity health indicator for unknown and invalid repos (#104614)
This PR extends the repository integrity health indicator to cover also unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report the changes in their health regarding the unknown or invalid status.
To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks.
Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.
2024-02-07 15:18:55 +01:00
Craig Taverner
a58b2c2b05
Move doc-values classes needed by ST_INTERSECTS to server (#104980)
* Move doc-values classes needed by ST_INTERSECTS to server

These classes are needed by ESQL spatial queries, and are not licensed in a way that prevents this move.
Since they depend on Lucene, it is not possible to move them to a library.
Instead they are moved to be co-located with the GeoPoint doc-values classes that already exist in server.

* Moved to lucene package org.elasticsearch.lucene.spatial

* Moved Geo/ShapeDocValuesQuery to server because it is Lucene specific

And this gives us access to these classes from ESQL for lucene-pushdown of spatial queries.
2024-02-07 15:00:38 +01:00
Daniel Mitterdorfer
9651cd7e26
[Profiling] Use plain arrays in stack traces (#105226)
With this commit we refactor the internal representation of stacktraces
to use plain arrays instead of lists for some of its properties. The
motivation behind this change is simplicity:

* It avoids unnecessary boxing
* We could eliminate a few redundant null checks because we use
  primitive types now in some places
* We could slightly simplify run-length decoding
2024-02-07 14:39:38 +01:00
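As an illustration of the last point in the commit above (#105226), the sketch below decodes run-length data straight into a primitive `int[]`, which avoids boxing and null checks. The input layout, an array of (run length, value) pairs, is an assumption made for this example and is not necessarily the format used by the profiling plugin.

```java
import java.util.Arrays;

// Hypothetical run-length decoder producing a plain int array.
class RunLengthSketch {
    // pairs holds alternating entries: run length, value, run length, value, ...
    static int[] decode(int[] pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected (length, value) pairs");
        }
        int total = 0;
        for (int i = 0; i < pairs.length; i += 2) {
            total += pairs[i];
        }
        int[] out = new int[total];
        int pos = 0;
        for (int i = 0; i < pairs.length; i += 2) {
            // Write the value pairs[i + 1] into the next pairs[i] slots.
            Arrays.fill(out, pos, pos + pairs[i], pairs[i + 1]);
            pos += pairs[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(decode(new int[] {2, 7, 3, 1})));
    }
}
```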
Martijn van Groningen
baf8b5ae38
Fix a few downsample api issues (#105228)
Improve downsampling by making the following changes:

- Avoid NPE and assert tripping when fetching the last processed tsid.
- If the write block has been set, then there is no reason to start the downsample persistent tasks, since shard level downsampling has completed. Starting them anyway also causes ILM/DSL to get stuck on downsampling, so in this case shard level downsampling should be skipped.
- Sometimes the source index may not be allocated yet on the node performing the shard level downsampling operation. This caused an NPE; with this PR, the shard level downsample now fails with a less disturbing error.

Additionally unmute
DataStreamLifecycleDownsampleDisruptionIT#testDataStreamLifecycleDownsampleRollingRestart

Relates to #105068
2024-02-07 08:28:28 -05:00
David Turner
25dd12df3b AwaitsFix for #105236 2024-02-07 12:11:51 +00:00
Pooya Salehi
db4d31ddb4
Improve exception handling for stateless realtime-get/mget (#105028)
Relates #105003, ES-5727
2024-02-07 12:50:57 +01:00
Mary Gouseti
011876367a
Execute lazy rollover with an internal dedicated user #104732 (#104905)
The unconditional rollover that is a consequence of a lazy rollover command is triggered by the creation of a document. In many cases, the user triggering this rollover won't have sufficient privileges to ensure the successful execution of this rollover. For this reason, we introduce a dedicated rollover action and a dedicated internal user to cover this case and enable this functionality.
2024-02-07 13:01:01 +02:00