* Fix privileges for system index migration WRITE block (#121327)
This PR removes a potential cause of data loss when migrating system indices. It does this by changing the way we set a "write-block" on the system index to migrate - now using a dedicated transport request rather than a settings update. Furthermore, we no longer delete the write-block prior to deleting the index, as this was another source of potential data loss. Additionally, we now remove the block if the migration fails.
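As a hedged sketch of the new mechanism (the class names below exist in the Elasticsearch codebase, but treat the exact call site and signatures as assumptions, not the literal change):
```java
import org.elasticsearch.action.admin.indices.readonly.AddIndexBlockRequest;
import org.elasticsearch.cluster.metadata.IndexMetadata.APIBlock;

// Add the write block through the dedicated add-block transport action
// instead of flipping "index.blocks.write" via a settings update.
AddIndexBlockRequest request = new AddIndexBlockRequest(APIBlock.WRITE, ".my-system-index");
client.admin().indices().addBlock(request, listener); // assumed call site and listener
```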
* Update release notes
* Delete docs/changelog/122214.yaml
Two of the timeout tests have been muted for several months. The reason is that we tightened the assertions to cover partial results being returned, but there were edge cases in which partial results were not actually returned.
The timeout used in the test was time dependent, so precisely when the timeout would be thrown was unpredictable: we have timeout checks in different places in the codebase, when iterating through the leaves, before scoring any document, and while scoring documents. The edge case that caused failures is a typical timing issue, where the initial timeout check in CancellableBulkScorer already triggers the timeout before any document has been collected.
I made several adjustments to the test to make it more robust:
- use indexRandom to index documents, which speeds the test up
- share indexing across test methods, so that it happens once at the suite level
- replace the custom query that triggers a timeout: instead of a script query, use a Lucene query that is not time dependent and throws a time-exceeded exception precisely where we expect it, so that we can test how the system reacts (a sketch of the idea follows this list). This lets us verify that partial results are always returned when a timeout happens while scoring documents, and that they are never returned when a timeout happens before we even started scoring documents.
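A minimal sketch of the deterministic-timeout idea from the last bullet (the wrapper class and the simulated exception are illustrative, not the actual test code):
```java
import java.io.IOException;

import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.util.Bits;

// Illustrative wrapper: fails at one exact, predictable point instead of
// depending on wall-clock time like the previous script query did.
class TimeoutBeforeCollectionScorer extends BulkScorer {
    private final BulkScorer delegate;

    TimeoutBeforeCollectionScorer(BulkScorer delegate) {
        this.delegate = delegate;
    }

    @Override
    public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException {
        // Simulated "time exceeded", thrown before a single document is
        // collected, so the no-partial-results path is exercised deterministically.
        throw new RuntimeException("simulated timeout before any document is collected");
    }

    @Override
    public long cost() {
        return delegate.cost();
    }
}
```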
Closes #98369
Closes #98053
Improve LuceneSyntheticSourceChangesSnapshot by switching to a sequential stored field reader when docids are dense. We first compute the docids for which recovery source needs to be synthesized; if the requested docids are dense and monotonically increasing, a sequential stored field reader is used, which provides recovery source for many documents without repeatedly decompressing the same block of stored fields.
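A minimal sketch of the density check (an illustrative helper, not the actual implementation):
```java
// Docids sorted in ascending order are "dense" when they form a contiguous
// range; only then can stored fields be streamed with a sequential reader
// instead of re-decompressing the same block per document.
static boolean isDenseAndMonotonic(int[] docIds) {
    for (int i = 1; i < docIds.length; i++) {
        if (docIds[i] != docIds[i - 1] + 1) {
            return false;
        }
    }
    return docIds.length > 0;
}
```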
* Adding a condition to verify whether the field belongs to an index
* Update docs/changelog/121720.yaml
* Remove unnecessary comma from yaml file
* remove duplicate inference endpoint creation
* updating isMetadata to return true if the mapper has the correct type
* remove unnecessary index creation in yaml tests
* Adding a check that the document is returned in the yaml test
* Updating test to skip the time series check if the index mode is standard
* Refactor tests to verify every metafield with all index modes
* refactoring test to verify all cases
* Adding assertFalse if not time_series and fields are from time_series
* updating test texts to have better descriptions
* [Deprecation API] Adjust details in the SourceFieldMapper deprecation warning (#122041)
In this PR we improve the deprecation warning about configuring source
in the mapping.
- We reduce the size of the warning message so it looks better in Kibana.
- We keep the original message in the details.
- We use an alias help URL, so we can associate it with the guide once it's created.
* Remove bwc code
We shouldn't run the post-snapshot-delete cleanup work on the master
thread, since it can be quite expensive and need not block subsequent
cluster state updates. This commit forks it onto a `SNAPSHOT` thread.
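The change boils down to the usual forking pattern, sketched here with an illustrative cleanup method name (`runCleanup` is not the real method):
```java
// Fork the expensive post-delete cleanup off the master thread so it no
// longer blocks subsequent cluster state updates; SNAPSHOT is a real pool name.
threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(() -> runCleanup());
```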
* Add 9.0 patch transport version constants #121985
Transport version changes must be unique per branch. Some transport
version changes meant for 9.0 are missing unique backport constants.
This is a backport of #121985, adding unique transport version patch
numbers for each change intended for 9.0.
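For illustration only (the names and ids below are made up, not the real constants), patch constants in `TransportVersions` follow a per-branch scheme where the trailing digits leave room for backports:
```java
// Hypothetical example of the pattern: the 9.0 backport of a change gets
// its own unique patch id rather than reusing the constant from main.
public static final TransportVersion SOME_FEATURE = def(9_001_0_00);
public static final TransportVersion SOME_FEATURE_9_0 = def(9_000_0_01);
```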
* match constant naming in main
* Validate transport handshake from known version (#121747)
With parallel releases on multiple branches it's possible that an older
branch sees a transport version update that is not known to a
numerically newer but chronologically older version. In that case the
two nodes cannot intercommunicate, so with this commit we reject such
connection attempts at the version negotiation stage.
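Sketched logic of the rejection (illustrative; `knownTransportVersionIds` is an assumed lookup, not a real field):
```java
// At the version-negotiation stage: a remote transport version this node
// has never heard of cannot be trusted for wire compatibility, so fail fast.
if (knownTransportVersionIds.contains(remoteTransportVersion.id()) == false) {
    throw new IllegalStateException(
        "rejecting handshake: remote transport version [" + remoteTransportVersion + "] is not known to this node"
    );
}
```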
* Fix version/transportversion confusion
* CI poke
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade-off: the check itself is somewhat expensive (it involves a volatile read), so we want to check often enough that cancelled queries don't keep taking up resources for a long time, but not so frequently that we slow down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases we've seen involve building deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
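A minimal sketch of that centralized check (field and method names are illustrative; `TaskCancelledException` is the real exception type):
```java
import org.elasticsearch.tasks.TaskCancelledException;

// Called once per sub-aggregation build rather than per document: frequent
// enough to stop cancelled queries from hogging resources, cheap enough
// (a single volatile read) not to slow down ordinary aggregations.
private void checkCancelled() {
    if (isCancelled.getAsBoolean()) {
        throw new TaskCancelledException("cancelled while building aggregation results");
    }
}
```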
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Nik Everett <nik9000@gmail.com>
This test creates an incorrectly-serialized handshake which cannot be
validated, and #121747 made that validation compulsory. This test
corrects the serialization.
Closes #121816
* Improve logging of put-mapping failures (#121372)
No sense in converting to a list just to convert to a string, we may as
well convert directly to a string. Also removes the unnecessary extra
`[]` wrapper.
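In other words, a hypothetical before/after (not the literal diff from the PR):
```java
// Before: build a List purely for its toString, which adds its own
// brackets around the values, e.g. "indices [[idx1, idx2]]".
logger.info("failed to put mappings on indices [{}]", Arrays.asList(concreteIndices));

// After: join straight to a string, with no intermediate list and no
// doubled-up brackets, e.g. "indices [idx1, idx2]".
logger.info("failed to put mappings on indices [{}]", String.join(", ", concreteIndices));
```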
* CI poke
* CI poke
Today we use the ID from `Version#CURRENT` in this test, which only
works if its ID is no less than that of `TransportVersion#current()`.
This commit fixes the test to ensure it always picks a transport version
ID that is not from the past.
The node environment has many paths. The accessors for these currently
use a "file" suffix, but they are always directories. This commit
renames the accessors to make it clear these paths are directories.
This adds a `task_description` field to `profile` output and task
`status`. This looks like:
```
...
"profile" : {
  "drivers" : [
    {
      "task_description" : "final",
      "start_millis" : 1738768795349,
      "stop_millis" : 1738768795405,
      ...
    },
    {
      "task_description" : "node_reduce",
      "start_millis" : 1738768795392,
      "stop_millis" : 1738768795406,
      ...
    },
    {
      "task_description" : "data",
      "start_millis" : 1738768795391,
      "stop_millis" : 1738768795404,
      ...
```
Previously you had to look at the signature of the operators in the
driver to figure out what the driver is *doing*. You had to know enough
about how ESQL works to guess. Now you can look at this description to
see what the server *thinks* it is doing. No more manual classification.
This will be useful when debugging failures and performance regressions
because it is much easier to use `jq` to group on it:
```
| jq '.profile[] | group_by(.task_description)[]'
```
If a custom analyzer provided to the _analyze API cannot be built, return
400 instead of the current 500. This most probably means that the
user-provided analyzer specification is wrong.
Closes #121443
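The status-code mapping relies on a long-standing Elasticsearch convention: an `IllegalArgumentException` surfaces as HTTP 400, while unexpected exceptions surface as 500. A hedged sketch of the idea (`buildCustomAnalyzer` is an illustrative helper, not the real method):
```java
// Wrap the analyzer-construction failure in an exception type that the
// REST layer maps to 400 Bad Request: the input, not the server, is at fault.
try {
    analyzer = buildCustomAnalyzer(request);
} catch (Exception e) {
    throw new IllegalArgumentException("failed to build custom analyzer: " + e.getMessage(), e);
}
```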
Backports #114496 to 9.0
> Failure handling for snapshots was made stricter in #107191 (8.15), so this
> field has always been empty since then. Clients don't need to check it anymore
> for failure handling, so we can remove it from API responses in 9.0.
The test failed because we tried to move a shard to a node that already
has a copy. This change prevents that from happening.
Closes #119280
Closes #120772
This commit forces the delegate for ES logging to always use the String
version of LogManager.getLogger instead of the one taking a Class. The
reason is that if a classloader is not in the hierarchy of the app
classloader, the ES logging configuration will not be found. By using
the String variant, the app classloader is always used.
Co-authored-by: Ryan Ernst <ryan@iernst.net>
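The distinction in a sketch (both overloads are real Log4j 2 API; `SomeClass` is a placeholder):
```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Class overload: the logging context is resolved through the class's own
// classloader, which may sit outside the app classloader hierarchy and
// therefore miss the ES logging configuration.
Logger problematic = LogManager.getLogger(SomeClass.class);

// String overload: resolution goes through the caller, so the app
// classloader, and with it the ES configuration, is always used.
Logger preferred = LogManager.getLogger(SomeClass.class.getName());
```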
* Integrate watsonx reranking into the inference API
* Add api_version to the watsonx api call
* Fix the return_doc option
* Add top_n parameter to task_settings
* Add truncate_input_tokens parameter to task_settings
* Add test for IbmWatonxRankedResponseEntity
* Add test for IbmWatonxRankedRequestEntity
* Add test for IbmWatonxRankedRequest
* [CI] Auto commit changes from spotless
* Add changelog
* Fix transport version
* Add test for IbmWatsonxService
* Remove canHandleStreamingResponses
* Add requireNonNull for modelId and projectId
* Remove maxInputToken method
* Convert all optionals to required
* [CI] Auto commit changes from spotless
* Set minimal_supported version to be ML_INFERENCE_IBM_WATSONX_RERANK_ADDED
* Remove extraction of unused fields from IbmWatsonxRerankServiceSettings
* Add space
* Add space
---------
Co-authored-by: Saikat Sarkar <132922331+saikatsarkar056@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This commit adds the data dirs, config dir and temp dir into
entitlement bootstrapping. It doesn't yet use them in entitlement
policies, but makes them available to use within initialization.
* Refactor: separate package for entitlement records (#121204)
* Fix PolicyManagerTests after package move (#121304)
* Fix PolicyManagerTests after package move
* Unmute
If the `MasterService` needs to log a create-snapshot task description
then it will call `CreateSnapshotTask#toString`, which today calls
`RepositoryData#toString` which is not overridden so ends up calling
`RepositoryData#hashCode`. This can be extraordinarily expensive in a
large repository. Worse, if there are masses of create-snapshot tasks to
execute then it'll do this repeatedly, because each one only ends up
yielding a short hex string so we don't reach the description length
limit very easily.
With this commit we provide a more efficient implementation of
`CreateSnapshotTask#toString` and also override
`RepositoryData#toString` to protect against some other caller running
into the same issue.
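A sketch of the shape of the fix (field names are illustrative):
```java
// Cheap, allocation-light description: a couple of identifying fields
// instead of hashing the entire repository contents.
@Override
public String toString() {
    return "RepositoryData[uuid=" + uuid + ", generation=" + genId + "]";
}
```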
* Add node-local rate limiting for the inference API
* Fix integration tests by using new LocalStateInferencePlugin instead of InferencePlugin and adjust formatting.
* Correct feature flag name
* Add more docs, reorganize methods and make some methods package private
* Clarify comment in BaseInferenceActionRequest
* Fix wrong merge
* Fix checkstyle
* Fix checkstyle in tests
* Check that the service we want to read the rate limit config for actually exists
* [CI] Auto commit changes from spotless
* checkStyle apply
* Update docs/changelog/120400.yaml
* Move rate limit division logic to RequestExecutorService (see the sketch after this list)
* Spotless apply
* Remove debug sout
* Adding a few suggestions
* Adam feedback
* Fix compilation error
* [CI] Auto commit changes from spotless
* Add BWC test case to InferenceActionRequestTests
* Add BWC test case to UnifiedCompletionActionRequestTests
* Update x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/InferenceServiceNodeLocalRateLimitCalculator.java
Co-authored-by: Adam Demjen <demjened@gmail.com>
* Update x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/InferenceServiceNodeLocalRateLimitCalculator.java
Co-authored-by: Adam Demjen <demjened@gmail.com>
* Remove addressed TODO
* Spotless apply
* Only use new rate limit specific feature flag
* Use ThreadLocalRandom
* [CI] Auto commit changes from spotless
* Use Randomness.get()
* [CI] Auto commit changes from spotless
* Fix import
* Use ConcurrentHashMap in InferenceServiceNodeLocalRateLimitCalculator
* Check for null value in getRateLimitAssignment and remove AtomicReference
* Remove newAssignments
* Up the default rate limit for completions
* Put deprecated feature flag back in
* Check feature flag in BaseTransportInferenceAction
* spotlessApply
* Export inference.common
* Do not export inference.common
* Provide noop rate limit calculator, if feature flag is disabled
* Add proper dependency injection
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Co-authored-by: Adam Demjen <demjened@gmail.com>
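A hedged sketch of the node-local division idea mentioned above (all names are illustrative): if N nodes are responsible for a service's requests, each node enforces limit/N locally so the cluster-wide throughput still respects the configured rate limit.
```java
// Divide the configured cluster-wide rate limit across the nodes assigned
// to this service; clamp at 1 so every node can make some progress.
long perNodeRequestsPerTimeUnit = Math.max(1, configuredRequestsPerTimeUnit / numResponsibleNodes);
```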
Adds non-grouping support for min, max, sum, and count, using
CompositeBlock as the underlying block type and an internal
FromAggregateMetricDouble function to handle converting from
CompositeBlock to the correct metric subfields.
Closes #110649
This commit introduces the `MappedFieldType#getDefaultHighlighter`, allowing a specific highlighter to be enforced for a field.
The semantic field mapper utilizes this new functionality to set the `semantic` highlighter as the default.
All other fields will continue to use the `unified` highlighter by default.
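A hedged sketch of how the override might look on the semantic field type (the method name comes from the text above; the body is an assumption):
```java
// semantic_text picks the "semantic" highlighter when the request does not
// name one explicitly; every other field type keeps "unified" as its default.
@Override
public String getDefaultHighlighter() {
    return "semantic";
}
```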
Today, Elasticsearch supports two models to establish secure connections
and trust between two Elasticsearch clusters:
- API key based security model
- Certificate based security model
This PR deprecates the _Certificate based security model_ in favour of the *API key based security model*.
The _API key based security model_ is the preferred way to configure remote clusters,
as it allows following security best practices when setting up remote cluster connections
and defining fine-grained access control.
Users are encouraged to migrate their remote clusters from certificate to API key authentication.