This change adds access to mapped `match_only_text` fields via the Painless scripting fields API. The
values returned from a `match_only_text` field via the scripting fields API always use source, as described
in #81246. These values are not available via doc values, so there are no bwc issues.
This change adds source fallback support for date and date_nanos by using the existing
SourceValueFetcherSortedNumericIndexFieldData to emulate doc values.
This change adds access to mapped `text` fields via the Painless scripting fields API. The values returned
from a `text` field via the scripting fields API always use source, as described in #81246. Old-style access
through `doc` will still depend on field data, so there is no change and no bwc issues.
Currently, source fallback numeric types do not match doc values numeric types. Source fallback
numeric types de-duplicate numeric values in multi-valued fields. This change removes the
de-duplication for source fallback values for numeric types using value fetchers. This also adds test cases
for all the supported source fallback types to ensure they continue to match their doc values
counterparts exactly.
Adds support for loading `text` and `keyword` fields that have
`store: true`. We could likely load *any* stored fields, but I
wanted to blaze the trail using something fairly useful.
Create more segments here to make sure the background merge always merges.
With just 3 segments and a max-segments-per-tier of 2 there is no guarantee
that a merge will actually run, so the test can fail while waiting for the
background merge.
Closes #89412
This commit adds support for a floating point `node.processors` setting.
This is useful when the nodes run in an environment where the CPU
time assigned to the ES node process is limited (e.g. using cgroups).
With this change, the system can size the thread pools
accordingly; in this case it rounds the provided setting
up to the nearest integer.
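As a rough sketch of the rounding behaviour (not the actual implementation, and the names are made up), sizing a pool from a fractional processor count looks like:
```java
// Sketch only: round a fractional node.processors value up to a whole
// processor count before sizing a thread pool.
public final class FractionalProcessorsSketch {
    public static void main(String[] args) {
        double nodeProcessors = 1.5; // e.g. a cgroup CPU quota of one and a half cores
        int poolSize = (int) Math.ceil(nodeProcessors); // 2
        System.out.println("sizing thread pools for " + poolSize + " processors");
    }
}
```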
When parsing mappings, we may find a field with the same name specified twice, either
because JSON duplicate keys are allowed, or because a mix of object notation and dot
notation is used when providing mappings. The same can happen when applying dynamic
mappings as part of parsing an incoming document, as well as when merging separate
index templates that may contain the definition for the same field using a
mix of object notation and dot notation.
While we propagate the MapperBuilderContext across merge calls thanks to #86946, we do not
propagate the right context when we call merge on objects as part of parsing/building
mappings. This causes a situation in which the leaf fields that result from the merge
have the wrong path, missing the first portion, e.g. sub.field instead of obj.sub.field.
This commit applies the correct mapper builder context when building the object mapper builder
in the case where two objects with the same name are found.
Relates to #86946
Closes #88573
`map` should be a side-effect-free function, because it is a non-terminal operation.
If we want side effects, we should use `forEach`, which is a terminal operation.
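A minimal Java Streams illustration of the distinction (not taken from the changed code):
```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public final class MapVsForEach {
    public static void main(String[] args) {
        List<String> names = List.of("a", "b", "c");
        AtomicInteger counter = new AtomicInteger();

        // Bad: map is lazy and non-terminal, so with nothing consuming the stream
        // the side effect never runs, and it obscures intent even when it does.
        names.stream().map(n -> { counter.incrementAndGet(); return n.toUpperCase(); });

        // Good: forEach is a terminal operation intended for side effects.
        names.forEach(n -> counter.incrementAndGet());

        System.out.println(counter.get()); // 3 — only the forEach ran
    }
}
```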
We have to wait for the snapshot to actually have started before we restart
the data node. This is not guaranteed since we start the snapshot via an async
client call. Otherwise the expectation of a partial snapshot failure will not hold,
because the snapshot only partially fails if the data node has actually started
work on it when it is restarted.
Closes #89039
When calling RuntimeField.parseRuntimeFields() for fields defined in the
search request, we need to wrap the Map containing field definitions in another
Map that supports value removal, so that we don't inadvertently remove the
definitions from the root request. CompositeRuntimeField was not doing this
extra wrapping, which meant that requests that went to multiple shards and
that therefore parsed the definitions multiple times would throw an error
complaining that the fields parameter was missing, because the root request
had been modified.
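Conceptually, the missing wrapping looks like this sketch; the names are illustrative rather than the actual Elasticsearch classes:
```java
import java.util.HashMap;
import java.util.Map;

// Sketch: parsing consumes (removes) entries from the map it is given, so each
// shard-level parse must work on a mutable copy rather than the root request's map.
final class RuntimeFieldParsingSketch {
    static void parseRuntimeFields(Map<String, Object> definitions) {
        // the parser removes entries as it handles them
        definitions.keySet().removeIf(k -> true);
    }

    public static void main(String[] args) {
        Map<String, Object> rootRequestFields =
            new HashMap<>(Map.of("day_of_week", Map.of("type", "keyword")));

        // Buggy: the root request's definitions are emptied after the first parse.
        // parseRuntimeFields(rootRequestFields);

        // Fixed: wrap in a new map that supports removal, leaving the root request intact.
        parseRuntimeFields(new HashMap<>(rootRequestFields));

        System.out.println(rootRequestFields); // still contains day_of_week
    }
}
```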
This change includes:
* moving `resetFailedAllocationCounter` to a common place in `RoutingNodes` so that it is accessible from multiple allocation service implementations
* splitting `ClusterRerouteTests#testClusterStateUpdateTask` into 2 distinct test scenarios to avoid reusing the task
This replaces the code that builds `_id` in tsid indices, which used to
re-parse the entire JSON object, with one that reuses the parsed values.
It speeds up writes by about 4%. Here's the rally output:
```
| Min Throughput | 8164.67 | 8547.24 | docs/s | +4.69% |
| Mean Throughput | 8891.11 | 9256.75 | docs/s | +4.11% |
| Median Throughput | 8774.52 | 9134.15 | docs/s | +4.10% |
| Max Throughput | 10246.7 | 10482.3 | docs/s | +2.30% |
```
I broke shard splitting when `_routing` is required and you use `nested`
docs. The mapping would look like this:
```
"mappings": {
"_routing": {
"required": true
},
"properties": {
"n": { "type": "nested" }
}
}
```
If you attempt to split an index with a mapping like this it'll blow up
with an exception like this:
```
Caused by: [idx] org.elasticsearch.action.RoutingMissingException: routing is required for [idx]/[0]
    at org.elasticsearch.cluster.routing.IndexRouting$IdAndRoutingOnly.checkRoutingRequired(IndexRouting.java:181)
    at org.elasticsearch.cluster.routing.IndexRouting$IdAndRoutingOnly.getShard(IndexRouting.java:175)
```
This fixes the problem by entirely avoiding the branch of code. That
branch was trying to find any top level documents that don't have a
`_routing`. But we *know* that there aren't any top level documents
without a routing in this case - the routing is "required". ES wouldn't
have let you index any top level documents without the routing.
This also adds a small pile of REST layer tests for shard splitting that
hit various branches in this area. For extra paranoia.
Closes #88109
The test `GlobalCheckpointSyncIT#testBackgroundGlobalCheckpointSync`
failed once recently due to an engine already closed exception. It has
not occurred again and the cause is unclear.
This commit adds a log line to indicate exactly when it happens, which
shard it is, and what the current state of the index shard is.
Closes #88428.
Two fixes:
1. Looking up `Custom` values over and over for every shard incurs a measurable cost.
This removes that cost for desired nodes and node shutdown metadata.
2. Node shutdown metadata logic wasn't inlining nicely because of the wrapped map.
No need to be as complicated as we were in many spots: use a simple immutable map
for all operations and remove a bunch of branching.
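A minimal sketch of the first fix, with made-up names: resolve the `Custom` value once per allocation round instead of once per shard:
```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of hoisting a repeated metadata lookup out of the per-shard loop.
final class CustomLookupSketch {
    record ClusterStateStub(Map<String, Object> customs) {}

    static void allocate(ClusterStateStub state, List<String> shards) {
        // Before: state.customs().get("node_shutdown") was evaluated once per shard.
        // After: look it up once and reuse the reference for every shard.
        Object nodeShutdownMetadata = state.customs().get("node_shutdown");
        for (String shard : shards) {
            if (nodeShutdownMetadata != null) {
                // ... factor the shutdown metadata into the decision for this shard ...
            }
        }
    }
}
```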
We have a check that enforces that the total number of fields stays below a
certain (configurable) threshold. Previously, runtime fields did not contribute
to the count. This patch makes all runtime fields contribute to the count:
- runtime fields that were explicitly defined in the mapping by a user
- runtime fields that were dynamically created by dynamic mappings
Closes #88265
Adds a simple preflight check to `JoinHelper#sendJoinRequest` to avoid
sending a join request if it looks like the `inflight_requests` circuit
breaker is going to trip on the join validation message.
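Schematically, with hypothetical names rather than the real `JoinHelper` code, the preflight check just asks whether the validation message would fit before sending:
```java
// Hypothetical sketch of a preflight circuit breaker check before sending a request.
final class PreflightCheckSketch {
    interface Breaker {
        long getLimit();
        long getUsed();
    }

    static boolean wouldTrip(Breaker inflightRequests, long estimatedMessageBytes) {
        return inflightRequests.getUsed() + estimatedMessageBytes > inflightRequests.getLimit();
    }

    static void maybeSendJoin(Breaker inflightRequests, long joinValidationBytes, Runnable send) {
        if (wouldTrip(inflightRequests, joinValidationBytes)) {
            // skip the join attempt instead of tripping the breaker mid-flight
            return;
        }
        send.run();
    }
}
```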
Closes #85003
It's possible for a cluster state update task to emit deprecation
warnings, but if the task is executed in a batch then these warnings
will be exposed to the listener for every item in the batch. With this
commit we introduce a mechanism for tasks to capture just the warnings
relevant to them, along with assertions that warnings are not
inadvertently leaked back to the master service.
Closes #85506
Replaces the two arguments to `ClusterStateTaskExecutor#execute` with a
parameter object called `BatchExecutionContext` so that #85525 can add a
new and rarely-used parameter without generating tons of noise.
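The parameter object pattern in miniature, as a sketch rather than the actual interface:
```java
import java.util.List;

// Sketch of replacing positional arguments with a parameter object so that new,
// rarely-used fields can be added later without touching every implementation.
final class ParameterObjectSketch {
    record ClusterState() {}
    record TaskContext<T>(T task) {}

    // Before: execute(ClusterState currentState, List<TaskContext<T>> taskContexts)
    // After: a single context argument that can grow over time.
    record BatchExecutionContext<T>(ClusterState initialState, List<TaskContext<T>> taskContexts) {}

    interface ClusterStateTaskExecutor<T> {
        ClusterState execute(BatchExecutionContext<T> batchExecutionContext) throws Exception;
    }
}
```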
This PR builds on #86524, #87482, and #87306 by supporting the case where there has been no
master node in the last 30 seconds, no node has been elected master, and the current node is not
master eligible.
It's not obvious from reading the code that
`TransportService#sendRequest` and friends always catch exceptions and
pass them to the response handler, which means some callers are wrapping
calls to `sendRequest` in their own unnecessary try/catch blocks. This
commit makes it clear that all exceptions are handled and removes the
unnecessary exception handling in callers.
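In caller terms, the guarantee means wrappers like the one sketched below are redundant (illustrative only, not the real `TransportService` signatures):
```java
// Illustrative sketch only: because sendRequest routes every failure to the
// handler, callers need no try/catch of their own around the call.
final class SendRequestSketch {
    interface ResponseHandler {
        void handleResponse(Object response);
        void handleException(Exception e);
    }

    static void sendRequest(Object request, ResponseHandler handler) {
        try {
            java.util.Objects.requireNonNull(request);
            // ... serialize and dispatch the request ...
        } catch (Exception e) {
            handler.handleException(e); // every exception ends up at the handler
        }
    }

    static void caller(Object request, ResponseHandler handler) {
        // Unnecessary:
        // try { sendRequest(request, handler); } catch (Exception e) { handler.handleException(e); }
        // Sufficient:
        sendRequest(request, handler);
    }
}
```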
Closes #89274
The reloading secure settings action sends one node level request with
a password (secureString) to each node. These node level requests share
the same secureString instance. This is not a problem when the requests
are sent across the wire because the secureString ends up as independent
instances after de/serialization. But when the request is handled by the
local node, it skips the de/serialization process. This means that when
the secureString gets closed, it is closed for all the node level
requests. If a node level request has not been sent across the wire when
the secureString is closed under it, the serialization process will
result in an error.
This PR fixes the bug by having each node level request create a clone
of the secureString and having the nodes level request track all node
level requests. All copies of the secureString (on the coordinating node)
are closed at the nodes request level, which is safe because it happens
after completion of all node level requests.
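A rough sketch of the fix, using hypothetical types rather than the actual request classes:
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each node-level request gets its own copy of the secret,
// and the nodes-level request closes every copy only after all node-level work is done.
final class SecureSettingsReloadSketch {
    static final class Secret implements AutoCloseable {
        private final char[] chars;
        Secret(char[] chars) { this.chars = chars.clone(); }
        Secret copy() { return new Secret(chars); }
        @Override public void close() { Arrays.fill(chars, '\0'); }
    }

    static void reload(Secret password, List<String> nodes) {
        List<Secret> copies = new ArrayList<>();
        try {
            for (String node : nodes) {
                Secret perNodeCopy = password.copy(); // safe even for the local node, which skips serialization
                copies.add(perNodeCopy);
                sendNodeRequest(node, perNodeCopy);
            }
        } finally {
            copies.forEach(Secret::close); // only after every node-level request has been handled
        }
    }

    static void sendNodeRequest(String node, Secret secret) {
        // ... dispatch the node-level request carrying its own copy of the secret ...
    }
}
```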
Resolves: #88887
This fixes an edge case in the master stability polling code from
#89014. If there has not been an elected master node for the entire life
of a non-master-eligible node, then `clusterChanged()` will have never
been called on that node, so
`beginPollingRemoteMasterStabilityDiagnostic()` will have never been
called. And even though the node might know of some master-eligible
nodes, it will never have requested diagnostic information from them.
This PR adds a call to `beginPollingRemoteMasterStabilityDiagnostic` in
`CoordinationDiagnosticsService`'s constructor to cover this edge case.
In almost all cases, `clusterChanged()` will be called within 10 seconds
so the polling will never occur. However if there is no master node then
there will be no cluster changed events, and `clusterChanged()` will not
be called, and the results of the polling will likely be useful. This PR
has several possibly controversial pieces of code. I'm listing them here
with some discussion:
1. Because there is now a call to `beginPollingRemoteMasterStabilityDiagnostic()` in the ~~constructor~~ object's initialization code, `beginPollingRemoteMasterStabilityDiagnostic()` is no longer solely called from the cluster change thread. However, this call happens before the object is registered as a cluster service listener, so there is no new thread safety concern.
2. Because there is now a call to `beginPollingRemoteMasterStabilityDiagnostic()` in the ~~constructor~~ object's initialization code, we have to explicitly switch to the system context so that the various transport requests work in secure mode.
3. ~~When we're in the constructor, we don't actually know yet whether we're a master eligible node or not, so we kick off `beginPollingRemoteMasterStabilityDiagnostic()` for all node types, including master-eligible nodes. This will be fairly harmless for master eligible nodes though. In the worst case, they'll retrieve some information that they'll never use. This explains why `clusterChanged()` now cancels polling even if we are on a master eligible node.~~
4. ~~It is now possible that we use `clusterService.state()` before it is ready when we're trying to get the list of master-eligible peers. In production mode this method returns null, so we can check that before using it. If assertions are enabled in the JVM, just calling that method throws an `AssertionError`. I'm currently catching that with the assumption that it is harmless because there does not seem to be a way around it (without even further complicating code).~~
5. ~~It is now possible that we call `transportService.sendRequest()` before the transport service is ready. This happens if the server is initializing unusually slowly (i.e. it takes more than 10 seconds to complete the `Node` constructor) and if assertions are enabled. I don't see a way around this without further complicating the code, so I'm catching `AssertionError` and moving on, with the assumption that it will work 10 seconds later when it runs again. I'm also catching and storing `Exception`, which I think I should have been doing before anyway.~~
Note: Points 3, 4, and 5 are no longer relevant because I moved the call
to `beginPollingRemoteMasterStabilityDiagnostic()` out of the
constructor, and am now calling it after the transport service and
cluster state have been initialized.
Today if a node shutdown is stalled due to unmovable shards then we tell
users to use the allocation explain API to find details. In fact, since #78727
we already include the allocation explanation in the response, so we
should tell users just to look at that instead. This commit adjusts the
message to address this.
Today if there are multiple nodes with the same name then
`POST /_cluster/voting_config_exclusions?node_names=ambiguous-name` will
return a `500 Internal Server Error` and a mysterious message. This
commit changes the behaviour to throw an `IllegalArgumentException`
(i.e. `400 Bad Request`) along with a more useful message describing the
problem.
Adds `WriteScript` as the common base class for the write scripts: `IngestScript`, `UpdateScript`, `UpdateByQueryScript` and `ReindexScript`.
This pulls the common `getCtx()` and `metadata()` methods into the base class and prepares for the implementation of the ingest fields api (https://github.com/elastic/elasticsearch/issues/79155).
As part of the refactor, `IngestScript` now takes a `CtxMap` directly rather than taking "sourceAndMetadata" (`CtxMap`) and `Metadata` (from `CtxMap`). There is a new `getCtxMap()` getter to get the typed `CtxMap`. `getSourceAndMetadata` could have been refactored to do this, but most of the callers of that don't need to know about `CtxMap` and are happy with a `Map<String, Object>`.
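In outline, with heavily simplified types, the shape of the refactor is something like:
```java
import java.util.Map;

// Heavily simplified sketch: a common base class owns the CtxMap and exposes the
// shared accessors; concrete write scripts just extend it.
final class WriteScriptSketch {
    static class CtxMap {
        Map<String, Object> asMap() { return Map.of(); }
        Object metadata() { return new Object(); }
    }

    abstract static class WriteScript {
        private final CtxMap ctxMap;
        WriteScript(CtxMap ctxMap) { this.ctxMap = ctxMap; }
        public Map<String, Object> getCtx() { return ctxMap.asMap(); }
        public Object metadata() { return ctxMap.metadata(); }
        CtxMap getCtxMap() { return ctxMap; }
    }

    static class IngestScript extends WriteScript {
        IngestScript(CtxMap ctxMap) { super(ctxMap); } // now takes the CtxMap directly
    }
}
```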
Fixing the names of the internal actions used by CoordinationDiagnosticsService to begin with "internal:" so
that they can be used in the system context with security enabled.
The equality checks on these in `DiskThresholdDecider` become very expensive
during reroute in a large cluster. Deduplicating them when building the `ClusterInfo`
saves more than 2% CPU time during many-shards benchmark bootstrapping because
the lookup of the shard data path by shard routing then mostly hits instance equality.
Also, this saves a little memory.
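The deduplication is essentially interning the path strings while the `ClusterInfo` is built, something like this sketch (names are illustrative):
```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: reuse one canonical String instance per distinct data path so
// later equality checks can short-circuit on reference equality.
final class PathDeduplicationSketch {
    private final Map<String, String> canonicalPaths = new HashMap<>();

    String dedup(String path) {
        return canonicalPaths.computeIfAbsent(path, p -> p);
    }
}
```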
This PR also moves the callback for building `ClusterInfo` from the stats response to
the management pool as it is now more expensive (though the overall CPU use from it is trivial
relative to the cost savings during reroute) and was questionable to run on
a transport thread in a large cluster to begin with.
Co-authored-by: David Turner <david.turner@elastic.co>