Because of #93575 it's not sufficient to mark repositories with
`readonly: true` while taking a backup. The only safe way to avoid
writes is to completely unregister them.
S3 register reads are subject to the regular client retry policy, but in
practice these reads sometimes fail with transient errors that the SDK
does not retry. This commit adds another layer of retries to these
reads.
Relates ES-9721
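As a rough illustration of the extra retry layer described above, here is
a minimal Python sketch. It is not the Elasticsearch implementation: the
names `TransientReadError` and `read_with_retries`, the attempt count and
the fixed delay are all hypothetical and chosen only for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientReadError(Exception):
    """Hypothetical marker for a failure the caller considers transient."""


def read_with_retries(
    read: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call `read`, retrying on TransientReadError up to `max_attempts` times.

    This sits on top of whatever retries `read` already performs itself.
    Any other exception type is not retried and propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read()
        except TransientReadError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

A caller would wrap the single read in a zero-argument callable, for
instance `read_with_retries(lambda: fetch_register(), max_attempts=5)`,
where `fetch_register` stands in for the real read.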
* Add data stream template validation to snapshot restore
* Add data stream template validation to data stream promotion endpoint
* Add new assertion for response headers
Add a new assertion to synchronously execute a request and check the
response contains a specific warning header
* Test for warning header on snapshot restore
When missing templates
* Test for promotion warnings
* Add documentation for the potential error states
* PR changes
* Spotless reformatting
* Add logic to look in snapshot global metadata
This checks if the snapshot contains a matching template for the DS
* Comment on test cleanup to explain it was copied
* Removed cluster service field
Today there are a handful of integer settings for `repository-s3`
repositories whose docs link to the page about numeric field types. Yet
these settings are not fields, and do not support floating-point values
either. The convention throughout the rest of the docs is to just call
these things `integer` without linking to anything. This commit aligns
the `repository-s3` docs with this convention.
Here we test reindexing logsdb indices, and creating and restoring
snapshots. Note that logsdb uses synthetic source, so restoring
source-only snapshots fails due to the missing `_source`.
Adds an API which scans all the metadata (and optionally the raw data)
in a snapshot repository to look for corruptions or other
inconsistencies.
Closes https://github.com/elastic/elasticsearch/issues/52622
Closes ES-8560
If Elasticsearch fails part-way through a multipart upload to S3 it will
generally try to abort the upload, but it's possible that the abort
attempt also fails. In this case the upload becomes _dangling_. Dangling
uploads consume storage space, and therefore cost money, until they are
eventually aborted.
Earlier versions of Elasticsearch require users to check for dangling
multipart uploads, and to manually abort any that they find. This commit
introduces a cleanup process which aborts all dangling uploads on each
snapshot delete instead.
Closes #44971
Closes #101169
Let's not mention `DefaultAzureCredentialProvider` so that we can
consider it an implementation detail and avoid committing to any
unwanted BwC guarantees.
Relates #111577
Relates #111344
Adds information about using Azure Managed Identity and Azure Workload
Identity to the relevant docs, and also reworks the docs a bit for
easier reading.
Relates #111344
Make it clear that this API should be used only if the detailed shard
info is needed and only on ongoing snapshots. Remove incorrectly
mentioned `STATE` value.
This PR moves the `doPrivileged` wrapper closer to the actual deletion
request to ensure the necessary security context is established at all
times. It also adds a new repository setting to configure the maximum
size of an S3 `deleteObjects` request.
Fixes: #108049
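To illustrate what a configurable maximum size for a bulk delete request
implies, here is a small Python sketch that splits a list of blob keys
into batches. The function name `batch_keys` and its parameter are
hypothetical and do not reflect the actual setting name or code in this
PR. The upper bound of 1000 is the documented S3 limit on keys per
`DeleteObjects` call.

```python
from typing import Iterator, List, Sequence

S3_DELETE_OBJECTS_LIMIT = 1000  # S3 accepts at most 1000 keys per DeleteObjects call


def batch_keys(keys: Sequence[str], max_batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `keys`, each at most `max_batch_size` long.

    `max_batch_size` plays the role of the configurable repository setting
    (hypothetical name); one delete request would be sent per yielded batch.
    """
    if not 1 <= max_batch_size <= S3_DELETE_OBJECTS_LIMIT:
        raise ValueError(
            f"max_batch_size must be between 1 and {S3_DELETE_OBJECTS_LIMIT}"
        )
    for start in range(0, len(keys), max_batch_size):
        yield list(keys[start:start + max_batch_size])
```

With seven keys and `max_batch_size=3`, this yields batches of three,
three and one key, in the original order.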
One user reached out mentioning that it would be a good idea to remind
users to re-upload the license after full cluster recovery from snapshot
as one can easily miss this when trying to figure out why some features
aren't working after the restore.
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way.
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc
The S3 SDK permits changing the maximum number of concurrent connections
that it will open, but today there's no way to adjust this setting
within Elasticsearch. This commit adds a setting for this parameter.
Specifying `?master_timeout=-1` on an API which performs a cluster state
update means that the cluster state update task will never time out
while waiting in the pending tasks queue. However this parameter is also
re-used in a few places where a timeout of `-1` means something else,
typically to time out immediately. This commit fixes those places so that
`?master_timeout=-1` consistently means to wait forever.
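The intended semantics can be sketched as a small parser that maps `-1`
to "no timeout" rather than to an immediate timeout. This is a simplified
Python illustration, not Elasticsearch's own time-value parsing: the
function name `parse_master_timeout` is hypothetical and only the `ms`,
`s` and `m` units are handled.

```python
from typing import Optional

# (suffix, milliseconds per unit); "ms" must be checked before "s".
_UNITS = (("ms", 1), ("s", 1000), ("m", 60000))


def parse_master_timeout(value: str) -> Optional[float]:
    """Return the timeout in seconds, or None meaning "wait forever".

    "-1" is the only accepted negative value and always means no timeout.
    "0" means time out immediately.
    """
    text = value.strip()
    if text == "-1":
        return None  # wait forever, never "time out immediately"
    if text == "0":
        return 0.0
    for suffix, millis_per_unit in _UNITS:
        if text.endswith(suffix):
            amount = float(text[: -len(suffix)])
            if amount < 0:
                raise ValueError(f"negative timeout not allowed: {value!r}")
            return amount * millis_per_unit / 1000
    raise ValueError(f"unrecognised timeout: {value!r}")
```

Returning `None` for the "forever" case keeps it distinct from every
numeric timeout, so a caller cannot accidentally treat `-1` as a very
short wait.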
Allegedly-S3-compatible APIs are very popular these days, but many
third-party systems offering such an API also support a shared
filesystem interface. Shared filesystem protocols such as NFS are much
better specified than the S3 API, and experience shows that they lead to
fewer compatibility headaches. This commit adds a recommendation to the
`repository-s3` docs to consider such an interface instead.
To prevent leaking sensitive information such as credentials and keys in logs, this
commit prevents configuring some restricted loggers (currently `org.apache.http`
and `com.amazonaws.request`) at high verbosity unless the NetworkTraceFlag
(`es.insecure_network_trace_enabled`) is enabled.
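The check described above can be sketched in Python as follows. This is
only an illustration of the idea: the function name
`validate_logger_level` is hypothetical, and two details are assumptions
not stated in this commit message, namely that "high verbosity" means
`DEBUG` or `TRACE` and that child loggers of a restricted logger are
restricted too.

```python
RESTRICTED_LOGGERS = ("org.apache.http", "com.amazonaws.request")
HIGH_VERBOSITY_LEVELS = {"DEBUG", "TRACE"}  # assumption: what counts as "high verbosity"


def is_restricted(logger_name: str) -> bool:
    """True if the logger is a restricted logger or (by assumption) a child of one."""
    return any(
        logger_name == name or logger_name.startswith(name + ".")
        for name in RESTRICTED_LOGGERS
    )


def validate_logger_level(
    logger_name: str, level: str, network_trace_enabled: bool = False
) -> None:
    """Raise ValueError if a restricted logger is set to a verbose level.

    `network_trace_enabled` stands in for the
    `es.insecure_network_trace_enabled` flag mentioned above.
    """
    if network_trace_enabled:
        return  # the operator has explicitly opted in to insecure tracing
    if level.upper() in HIGH_VERBOSITY_LEVELS and is_restricted(logger_name):
        raise ValueError(
            f"logger [{logger_name}] may not be set to [{level}] unless "
            "es.insecure_network_trace_enabled is enabled"
        )
```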
Moves some of the detail about S3 storage classes to their own section
for easier linking, and adds a note about `intelligent_tiering` archive
classes.
* docs: fix numbering in restore-snapshot.asciidoc
Fix numbering in "Restore an entire cluster" section.
Remove "3." for "Universal Profiling" and add "3." just before "If you use Elasticsearch security features"
* Keep ID, fix list rendering
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
This commit updates the docs to call out that custom certificate
authorities for S3 repositories will need to be reinstalled every time
ES is upgraded, if the node is using the bundled JDK.
These docs have various subsections under the _Client settings_ and
_Repository settings_ sections which have nothing to do with those
settings. The reasons for this are historical and no longer relevant.
This commit moves these subsections together and up a level so that they
better reflect the structure of the information.
We don't mention linearizable registers in the snapshot/restore docs
today, but these things are verified for correctness by the repository
analysis API and some users with incorrect repository implementations
struggle to understand the verification errors. This commit adds some
docs to describe them and their various implementations.
Adds the `?register_operation_count` parameter, which allows controlling
the number of register operations separately from the number of regular
blob operations.
It's often useful to quote these docs to users encountering problems
with their not-quite-S3-compatible storage system. In practice we don't
need to quote the bits in the middle but we do need the last sentence
about working with the supplier to address incompatibilities. This
commit reorders things so that the most commonly quoted sentences form a
standalone paragraph.
👋 howdy, team! Expanding reference to [internal](https://github.com/elastic/cloud/pull/118105) update, we've just confirmed ILM requires the repository name to be the same among migrating clusters. This is a hard block for Searchable Snapshots which requires un-Searchable-Snapshotting or redoing migration to resolve.
* [DOC+] snapshot-restore single index example
👋🏼 howdy, team! I'd like to append an example to snapshot-restore a single index. Support usually points users to [this page](https://www.elastic.co/guide/en/elasticsearch/reference/master/restore-snapshot-api.html) but then users attempt the `rename_pattern` example (which makes sense!). I'd like to point them to a more literal "close index > restore on that index" example in the future.
* Fix test failure and reword
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>