Prepares the `main` branch for the backport of #125631. Specifically,
this adds the version constant for 8.19 to main and the serialization
code that lets main talk to 8.19.
The test method needs to distinguish between the available disk space update values coming from either FS, so the update values from the two FSs must not be equal.
Fixes #129149
Initial version of the patterned_text mapper. It behaves similarly to match_only_text. This version uses a single SortedSetDocValues for the template and another for the arguments. It splits the message by delimiters, then classifies a token as an argument if it contains a digit. All arguments are concatenated and inserted as a single doc value. A single inverted index is used, without positions. Phrase queries are still possible via SourceConfirmedTextQuery, but they are not fast.
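A minimal sketch of the splitting and classification idea; the delimiter set and the `%W` placeholder are assumptions for illustration, not the actual mapper code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of the template/argument split described above. The
// delimiter set and the "%W" placeholder are assumptions, not the actual
// patterned_text mapper code.
final class PatternedTextSplitter {
    private static final Pattern DELIMITERS = Pattern.compile("[\\s\\[\\]=,:]+");

    record Split(String template, String arguments) {}

    static Split split(String message) {
        List<String> templateTokens = new ArrayList<>();
        List<String> argumentTokens = new ArrayList<>();
        for (String token : DELIMITERS.split(message)) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.chars().anyMatch(Character::isDigit)) {
                // A token containing a digit is treated as an argument and
                // replaced by a placeholder in the template.
                argumentTokens.add(token);
                templateTokens.add("%W");
            } else {
                templateTokens.add(token);
            }
        }
        // One doc value holds the template, another the concatenated arguments.
        return new Split(String.join(" ", templateTokens), String.join(" ", argumentTokens));
    }
}
```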
Keep better track of shard contexts using RefCounted, so they can be released more aggressively during operator processing. For example, during TopN, we can potentially release some contexts if they don't pass the limit filter.
This is done in preparation for the TopN fetch optimization, which will delay fetching additional columns to the data node coordinator instead of doing it in each individual worker, thereby reducing IO. Since the node coordinator would need to maintain the shard contexts for a potentially longer duration, it is important that we try to release what we can earlier.
An even more advanced optimization is to delay fetching to the main cluster coordinator, but that would be more involved, since we need to first figure out how to transport the shard contexts between nodes.
Summary of main changes:
- DocVector now maintains a RefCounted instance per shard (a simplified sketch of the ref-counting discipline follows below).
- Things which can build or release DocVectors (e.g., LuceneSourceOperator, TopNOperator) can also hold RefCounted instances, so they can pass them to DocVector and also ensure contexts aren't released while they might still be used later.
- Driver's main loop iteration (runSingleLoopIteration) now closes its operators even in between processing different operators. This is extra aggressive, and was mostly done to improve testability.
- Added a couple of tests to TopNOperator and a new integration test, EsqlTopNShardManagementIT, which uses the pausable plugin framework to check that TopNOperator releases shard contexts as early as possible.
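The sketch below shows the ref-counting discipline in plain Java rather than the real `org.elasticsearch.core.RefCounted` plumbing; the release hook stands in for actually closing a shard context, and the class name is hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the ref-counting discipline described above, not the
// actual DocVector / operator wiring.
final class ShardContextRef {
    private final AtomicInteger refs = new AtomicInteger(1); // the owner holds the initial reference
    private final Runnable releaseShardContext;

    ShardContextRef(Runnable releaseShardContext) {
        this.releaseShardContext = releaseShardContext;
    }

    /** Called by anything that stores the context in a DocVector or operator. */
    boolean tryIncRef() {
        int current;
        do {
            current = refs.get();
            if (current == 0) {
                return false; // already released, caller must not use the context
            }
        } while (refs.compareAndSet(current, current + 1) == false);
        return true;
    }

    /** Called as soon as a holder (e.g. TopN dropping rows over the limit) no longer needs it. */
    void decRef() {
        if (refs.decrementAndGet() == 0) {
            releaseShardContext.run(); // the last holder frees the shard context eagerly
        }
    }
}
```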
The test might produce over-budget tasks that cannot run even after all the other tasks that were blocked while running (and were holding up budget) complete. Rather than prevent submitting such over-budget tasks, this fix simply sets the merge task queue's available budget to `Long.MAX_VALUE`, in order to ensure that all merge tasks run before the test ends.
Fixes https://github.com/elastic/elasticsearch/issues/129148
When there are multiple FSs with the same available disk space but different total sizes, it's unpredictable (and irrelevant) which one the checker uses.
Fixes #129823
With the introduction of entitlements (#120243) and exclusive file
access (#123087) it is no longer safe to watch a whole directory.
In a lot of deployments, the parent directory for SSL config files
will be the main config directory, which also contains exclusive files
such as SAML realm metadata or File realm users. Watching that
directory will cause entitlement warnings because it is not
permissible for core/ssl-config to read files that are exclusively
owned by the security module (or other modules).
This PR makes RepositoriesService project aware so that the basic Put,
Get, Delete and Verify repository actions are now project scoped.
It intentionally leaves the following aspects out of scope for the
current changes: * Repository stats reporting * Repository clean-up,
analysis and integrity verification * Repository usages for searchable
snapshots and CCR
They will be worked on separately. One main reason for leaving them out
is that they are not needed by OBS which is currently blocked by
repository/snapshot changes. They may also have their own complexities,
e.g. stats reporting.
Resolves: ES-10478
Introduces a new `RemoveBlock` API that complements the existing `AddBlock` API by allowing users to remove index blocks using `DELETE /{index}/_block/{block}`.
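A hedged usage sketch with the low-level Java REST client; the index name and the `write` block are illustrative examples, not something mandated by this change:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

// Example only: add a write block with the existing AddBlock API, then remove
// it again with the new RemoveBlock API.
public class RemoveBlockExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            client.performRequest(new Request("PUT", "/my-index/_block/write"));
            Response response = client.performRequest(new Request("DELETE", "/my-index/_block/write"));
            System.out.println(response.getStatusLine());
        }
    }
}
```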
Resolves #128966
---------
Co-authored-by: Niels Bauman <nielsbauman@gmail.com>
The comment in `TransportHandshaker` indicates (correctly) that we emit
a warning when talking to a chronologically-newer-yet-numerically-older
version, but the wording of the warning message is inverted and says
that the remote is chronologically-older-yet-numerically-newer. This
commit straightens out the message to match the situation it is
describing.
Relates #123397
To better support project restoration after deletion, this change moves project Settings from ProjectMetadata to the new custom in the ClusterState. It also introduces a new transport version for cluster state serialization. Reserved cluster state for project settings remains within ProjectMetadata.
Note: In mixed-version multiproject clusters, this may cause existing settings for projects to temporarily disappear until all nodes have been upgraded and restarted.
Relates to an effort to combine the merge schedulers from stateless and stateful. The stateless merge scheduler has MergeMetrics that we want in both stateless and stateful. This PR copies over the merge metrics from the stateless merge scheduler into the combined merge scheduler.
Relates ES-9687
* Initial checkin of refactored index_options code
* [CI] Auto commit changes from spotless
* initial unit testing
* complete unit tests; add yaml tests
* [CI] Auto commit changes from spotless
* register test feature for sparse vector
* Update docs/changelog/129089.yaml
* update changelog
* add docs
* explicit set default index_options if null
* [CI] Auto commit changes from spotless
* update yaml tests; update docs
* fix yaml tests
* readd auth for teardown
* only serialize index options if not default
* [CI] Auto commit changes from spotless
* serialization refactor; pass index version around
* [CI] Auto commit changes from spotless
* fix transport versions merge
* fix up docs
* [CI] Auto commit changes from spotless
* fix docs; add include_defaults unit and yaml test
* [CI] Auto commit changes from spotless
* override getIndexReaderManager for SemanticQueryBuilderTests
* [CI] Auto commit changes from spotless
* cleanup mapper/builder/tests; index vers. in type
still need to refactor / clean YAML tests
* [CI] Auto commit changes from spotless
* cleanups to mapper tests for clarity
* [CI] Auto commit changes from spotless
* move feature into mappers; fix yaml tests
* cleanups; add comments; remove redundant test
* [CI] Auto commit changes from spotless
* escape more periods in the YAML tests
* cleanup mapper and type tests
* [CI] Auto commit changes from spotless
* rename mapping for previous index test
* set explicit number of shards for yaml test
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
Send the Elastic API Payload to a SageMaker endpoint, and parse the
response as if it were an Elastic API response.
- SageMaker now supports all task types in the Elastic API format.
- Streaming is supported using the SageMaker client/server rpc,
rather than SSE. Payloads must be in a complete and valid JSON
structure.
- Task Settings can be used for additional passthrough settings, but
they will not be saved alongside the model. Elastic cannot make
guarantees about the structure or contents of this payload, so Elastic
will treat it like the other input payloads and only allow it during
inference.
The vast majority of ingest pipelines are lightweight CPU
operations. We don't want these to be queued behind IO work on the write
executor. Instead, execute them on the coordination pool.
* Using the STREAMS_LOGS_SUPPORT_8_19 transport version
* Update StreamsMetadata.java
Returning null from getMinimalSupportedVersion
* Return minimal supported version as 8.19 for metadata object to fix test fail
---------
Co-authored-by: Luke Whiting <luke.whiting@elastic.co>
This commit ports the IndexVersions.UPGRADE_TO_LUCENE_9_12_2 constant to the main branch.
This is required after the update of Lucene 9.12.2 in the 8.19 branch, see #129555.
This change makes the GeoIp persistent task executor/downloader multi-project aware.
- the database downloader persistent task will be at the project level, meaning there will be a downloader instance per project
- the persistent task id is prefixed with the project id, namely `<project-id>/geoip-downloader`, for clusters in MP mode
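A tiny illustrative helper for the task-id convention above; the names are hypothetical and the real downloader composes its persistent task id elsewhere:

```java
// Hypothetical helper mirroring the naming convention described above.
final class GeoIpTaskIds {
    static String downloaderTaskId(String projectId, boolean multiProjectEnabled) {
        return multiProjectEnabled ? projectId + "/geoip-downloader" : "geoip-downloader";
    }
}
```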
Due to the way stored fields get flushed when index sorting is active, it is possible that we encounter significant page cache faults when memory is scarce. In order to mitigate some of the slowness around this, we're planning to no longer mmap the fdt temp file, initially behind a feature flag to check for unforeseen side effects.
Typically, always using the mmap directory is better than the niofs directory, given that sufficient memory is available to the OS for filesystem caching. However, when that isn't the case, indexing performance can vary a lot (and is often very slow). This is especially true for the tmp files that stored fields create during flushing. These files exist only for a brief moment, to sort stored fields in the order of the configured index sorting, and are then removed. If these tmp files are mmapped, there is a risk of thrashing the file system cache.
This change only avoids using mmap for the fdt tmp file. This is the file that actually contains the data and can be large compared to other files that get flushed. The fdm (metadata) and fdi (stored field index) files remain mmapped.
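A rough sketch of the routing idea, assuming a simple file-name rule; this is not the actual Elasticsearch hybrid directory implementation:

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

// Sketch only: read stored-field temp files through NIO instead of mmap so
// that short-lived flush-time files do not evict hotter pages from the file
// system cache. The class name and routing rule are illustrative assumptions.
public final class StoredFieldsTmpAwareDirectory extends FilterDirectory {
    private final Directory niofs;

    public StoredFieldsTmpAwareDirectory(Path path) throws IOException {
        super(new MMapDirectory(path)); // default: mmap everything else
        this.niofs = new NIOFSDirectory(path);
    }

    @Override
    public IndexInput openInput(String name, IOContext context) throws IOException {
        // fdt temp files produced while sorting stored fields are read once
        // and then deleted; avoid mapping them.
        if (name.contains(".fdt") && name.endsWith(".tmp")) {
            return niofs.openInput(name, context);
        }
        return super.openInput(name, context);
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            niofs.close();
        }
    }
}
```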
With this change we will first create the tmp file and the posting list, and once the tmp file is deleted we will merge the vectors into the vec file. Therefore we only have two copies of the vectors at the same time.
There is an issue where, for flattened fields with synthetic source, if there is a key with a scalar value and a duplicate key with an object value, one of the values will be left out of the produced synthetic source. This fixes the issue by replacing the object with paths to each of its keys. These paths consist of the concatenation of all keys going down to a given scalar, joined by a period; for example, they are of the form foo.bar.baz. This applies recursively, so that every value within the object, no matter how nested, will be accessible through a fully specified path.
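A minimal sketch of the key-flattening idea on a plain map; the real change lives in the flattened field's synthetic source handling, so this is only an illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: every nested value becomes reachable through its full dotted path,
// so a scalar "foo" and an object "foo" no longer collide. Not the actual
// FlattenedFieldMapper code.
public final class DottedPaths {
    public static Map<String, Object> flatten(Map<String, Object> source) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", source, flat);
        return flat;
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                flatten(path, (Map<String, Object>) entry.getValue(), out);
            } else {
                out.put(path, entry.getValue()); // e.g. "foo.bar.baz" -> scalar
            }
        }
    }
}
```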
* Making progress on different request parameters
* Working tests
* Adding custom service validator for rerank
* Fixing embedding bug
* Adding transport version check
* Fixing tests
* Fixing license header
* Fixing writeTo
* Moving file and removing commented code
* Fixing test
* Fixing tests
* Refactoring and tests
* Fixing test
* Field infos calculation method inside Engine
* buildSeqNoStats as static public method
So it can be overridden in stateless if/as needed.
Relates ES-11457
This PR addresses ES-12071.
We want to collect metrics for the time spent waiting for the next chunk of a bulk request. This can help with diagnosing high bulk latency in cases where the latency is attributable to external factors such as the network connection.
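A minimal sketch of the measurement hook, assuming a callback wired to whatever histogram the node exposes; the names are illustrative rather than the actual incremental bulk handler:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Sketch only: time the gap between draining buffered data and receiving the
// next chunk of the bulk request, and report it to a metric callback.
final class ChunkWaitTimer {
    private final LongConsumer recordChunkWaitMillis;
    private long waitStartNanos = -1;

    ChunkWaitTimer(LongConsumer recordChunkWaitMillis) {
        this.recordChunkWaitMillis = recordChunkWaitMillis;
    }

    /** Called when the handler has consumed its buffered data and must wait for the next chunk. */
    void onWaitingForNextChunk() {
        waitStartNanos = System.nanoTime();
    }

    /** Called when the next chunk of the bulk request arrives. */
    void onChunkReceived() {
        if (waitStartNanos >= 0) {
            recordChunkWaitMillis.accept(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - waitStartNanos));
            waitStartNanos = -1;
        }
    }
}
```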
Co-authored-by: Francisco Fernández Castaño <francisco.fernandez.castano@gmail.com>