Today we imply that CCR will automatically fall back to a full index
copy if it cannot replay any missing history. This was true for earlier
versions of the design, but we ultimately decided against it without
adjusting the docs to match. This commit fixes the docs.
Currently, existing runtime fields can be updated, which allows correcting potential mistakes, but they cannot be removed: once a runtime field is added to the index mappings, there is no way to remove it.
With this commit we introduce the ability to remove an existing runtime field by providing a `null` value for it through the put mapping API. If a field with that name does not exist, the instruction has no effect on other existing runtime fields.
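For illustration, a minimal sketch of the new removal syntax through the put mapping API (index and field names are hypothetical):
```
PUT my-index/_mapping
{
  "runtime": {
    "day_of_week": null
  }
}
```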
Note that the removal of runtime fields makes the recently introduced assertRefreshItNotNeeded assertion trip, because when each local node merges mappings back in, the runtime fields previously removed by the master node get added back locally. This is only a problem for the assertion, which verifies that the removed refresh operation is never needed. We worked around this by tweaking the assertion to ignore runtime fields completely, for simplicity, by asserting on the serialized merged mappings and incoming mappings without the corresponding runtime section.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Today we rely on blob stores behaving in a certain way so that they can be used
as a snapshot repository. There are an increasing number of third-party blob
stores that claim to be S3-compatible, but which may not offer a suitably
correct or performant implementation of the S3 API. We rely on some subtle
semantics with concurrent readers and writers, but some blob stores may not
implement these semantics correctly. Hitting a corner case in the implementation may be rare
in normal use, and may be hard to reproduce or to distinguish from an
Elasticsearch bug.
This commit introduces a new `POST /_snapshot/.../_analyse` API which exercises
the more problematic corners of the repository implementation, looking for
correctness bugs, and measures the performance of the repository
under concurrent load.
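For illustration, a minimal sketch of invoking the new API against an existing repository (the repository name is hypothetical, and the query parameters shown for controlling blob count and size are assumptions about the available knobs):
```
POST /_snapshot/my_repository/_analyse?blob_count=10&max_blob_size=1mb
```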
This change adds a new cluster privilege `cancel_task` that allows users to:
* Cancel running tasks (`_tasks/_cancel`).
* Cancel and delete async searches.
Today the 'manage' cluster privilege is required to cancel tasks and
to delete async searches when security features are enabled.
This new, more focused privilege covers task cancellation and async search deletion only.
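For illustration, a minimal sketch of granting the new privilege to a custom role via the create-role API (role name is hypothetical):
```
PUT /_security/role/task_canceller
{
  "cluster": [ "cancel_task" ]
}
```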
The change also adds the privilege to the internal 'kibana_system'
and '_async_search' roles. They both need to be able to cancel tasks
and delete async searches.
Relates #67965
Add a `max_analyzed_offset` query parameter to allow users
to limit the highlighting of text fields to a value less than or equal to the
`index.highlight.max_analyzed_offset`, thus avoiding an exception when
the length of the text field exceeds the limit. The highlighting still takes place,
but stops at the length defined by the new parameter.
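A hedged sketch of how the new limit might be used, assuming it is accepted alongside the other per-request highlight settings (index, field, and value are hypothetical):
```
GET my-index/_search
{
  "query": { "match": { "body": "quick brown fox" } },
  "highlight": {
    "max_analyzed_offset": 1000000,
    "fields": { "body": {} }
  }
}
```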
Closes: #52155
This moves the execution of the `searchable_snapshot` action before the
`migrate` action in the `cold` and `frozen` phases for more efficient
data migration (i.e. mounting the index as a searchable snapshot directly on the
target tier).
Now that `searchable_snapshot` can precede other actions in the same phase
(e.g. in the frozen phase it is followed by `migrate`), we need to allow the mounted
index to resume executing the ILM policy starting with a step that's part
of a new action (i.e. `migrate`).
This adds support for resuming the execution of the mounted index from another
action.
With older versions, the execution would resume from the PhaseCompleteStep
as it was the last action in a phase, which was handled as a special case
in the `CopyExecutionStateStep`. This generalises the `CopyExecutionStateStep`
to be able to resume from any `StepKey`.
This PR expands the meaning of `include_global_state` for snapshots to include system indices. If `include_global_state` is `true` on creation, system indices will be included in the snapshot regardless of the contents of the `indices` field. If `include_global_state` is `true` on restoration, system indices will be restored (if included in the snapshot), regardless of the contents of the `indices` field. Index renaming is not applied to system indices, as system indices rely on their names matching certain patterns. If restored system indices are already present, they are automatically deleted prior to restoration from the snapshot to avoid conflicts.
This behavior can be overridden to an extent by including a new field in the snapshot creation or restoration call, `feature_states`, which contains an array of strings indicating the "feature" for which system indices should be snapshotted or restored. For example, this call will only restore the `watcher` and `security` system indices (in addition to `index_1`):
```
POST /_snapshot/my_repository/snapshot_2/_restore
{
"indices": "index_1",
"include_global_state": true,
"feature_states": ["watcher", "security"]
}
```
If `feature_states` is present, the system indices associated with those features will be snapshotted or restored regardless of the value of `include_global_state`. All system indices can be omitted by providing a special value of `none` (`"feature_states": ["none"]`), or included by omitting the field or explicitly providing an empty array (`"feature_states": []`), similar to the `indices` field.
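For completeness, a hedged sketch of the creation-side call using the same field (repository and snapshot names are hypothetical):
```
PUT /_snapshot/my_repository/snapshot_2?wait_for_completion=true
{
  "indices": "index_1",
  "include_global_state": false,
  "feature_states": ["kibana"]
}
```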
The list of currently available features can be retrieved via a new "Get Snapshottable Features" API:
```
GET /_snapshottable_features
```
which returns a response of the form:
```
{
  "features": [
    {
      "name": "tasks",
      "description": "Manages task results"
    },
    {
      "name": "kibana",
      "description": "Manages Kibana configuration and reports"
    }
  ]
}
```
Features currently map one-to-one with `SystemIndexPlugin`s, but this should be considered an implementation detail. The Get Snapshottable Features API and snapshot creation rely upon all relevant plugins being installed on the master node.
Further, the list of feature states included in a given snapshot is exposed by the Get Snapshot API, which now includes a new field, `feature_states`, containing a list of the feature states and their associated system indices included in the snapshot. All system indices in feature states are also included in the `indices` array for backwards compatibility, although explicitly requesting system indices included in a feature state is deprecated. For example, an excerpt from the Get Snapshot API showing `feature_states`:
```
"feature_states": [
{
"feature_name": "tasks",
"indices": [
".tasks"
]
}
],
"indices": [
".tasks",
"test1",
"test2"
]
```
Co-authored-by: William Brafford <william.brafford@elastic.co>
Currently runtime fields from search requests don't appear in the output of the
field capabilities API, but some consumers of runtime fields would like the
runtime section, defined just as it is in search requests, to be reflected and
merged into the field capabilities output.
This change adds parsing of a "runtime_mappings" section, equivalent to the one
on search requests, to the `_field_caps` endpoint, and passes this section down to
the shard level, where any runtime fields defined here overwrite the mapping of
the targeted indices.
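For illustration, a minimal sketch of the new request shape (index name, runtime field name, and script are hypothetical):
```
POST my-index/_field_caps?fields=a_runtime_field
{
  "runtime_mappings": {
    "a_runtime_field": {
      "type": "keyword",
      "script": "emit('hello')"
    }
  }
}
```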
Closes #68117
Introduce the EQL search status API,
which reports the status of a stored or async EQL search:
GET _eql/search/status/<id>
The API is restricted to the monitoring_user role.
For a running EQL search, the response has the following format:
{
  "id" : <id>,
  "is_running" : true,
  "is_partial" : true,
  "start_time_in_millis" : 1611690235000,
  "expiration_time_in_millis" : 1611690295000
}
For a completed EQL search, the response has the following format:
{
  "id" : <id>,
  "is_running" : false,
  "is_partial" : false,
  "expiration_time_in_millis" : 1611690295000,
  "completion_status" : 200
}
Closes #66955
This partially reverts #64016, adds #67839, and adds
additional tests that would have caught issues with the changes
in #64016. It's mostly Nik's code, I am just cleaning things up
a bit.
Co-authored-by: Nik Everett <nik9000@gmail.com>
Moving towards grouping of data types in the field caps API,
the internal data type `DATETIME_NANOS` introduced for `date_nanos`
support is eliminated.
Relates: #67722
Follows: #67666
* Integrate "fields" API into QL (#68467)
* QL: retry SQL and EQL requests in a mixed-node (rolling upgrade) cluster (#68602)
* Adapt nested fields extraction from "fields" API output to the new un-flattened structure (#68745)
* Fixing Painless tests.
* Update runtime field context to fix test cases.
* Remove watcher logging from usage API and replace test.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit adds support for the recently introduced partial searchable snapshot (#68509) to ILM.
Searchable snapshot ILM actions may now be specified with a `storage` option, specifying either
`full_copy` or `shared_cache` (similar to the "mount" API) to mount either a full or partial
searchable snapshot:
```json
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "actions": {
          "searchable_snapshot" : {
            "snapshot_repository" : "backing_repo",
            "storage": "shared_cache"
          }
        }
      }
    }
  }
}
```
Internally, if more than one searchable snapshot action is specified (for example, a full searchable
snapshot in the "cold" phase and a partial searchable snapshot in the "frozen" phase) ILM will
re-use the existing snapshot when doing the second mount, since a second snapshot is not required.
Currently this is allowed for actions that use the same repository; however, multiple
`searchable_snapshot` actions for the same index that use different repositories are not allowed (the
ERROR state is entered). We plan to allow this in subsequent work.
If the `storage` option is not specified in the `searchable_snapshot` action, the mount type
defaults to "shared_cache" in the frozen phase and "full_copy" in all other phases.
Relates to #68605
This commit spells out how important repository reliability is to
searchable snapshots, and also documents a procedure for taking a backup
of a snapshot repository.
Relates #54944
The doc is misleading: the following intervals search returns documents containing `my favorite food` **immediately** followed by `hot water` or `cold porridge`.
`max_gaps` applies only to the `match` query and is not used for checking proximity with the other `match`; the given example would actually match a `my_text` value of `my favorite food is cold`.
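For reference, a sketch of the kind of intervals search being described (index and field names, and the exact structure, are assumed from the docs example):
```
POST _search
{
  "query": {
    "intervals": {
      "my_text": {
        "all_of": {
          "ordered": true,
          "intervals": [
            {
              "match": {
                "query": "my favorite food",
                "max_gaps": 0,
                "ordered": true
              }
            },
            {
              "any_of": {
                "intervals": [
                  { "match": { "query": "hot water" } },
                  { "match": { "query": "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
```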
Co-authored-by: Julien Guay <guay_j@yahoo.fr>
This commit adds the `data_frozen` node role as part of the formalization of data tiers. It also
adds the `"frozen"` phase to ILM, currently allowing the same actions as the existing cold phase.
The frozen phase is intended to be used for data even less frequently searched than the cold phase,
and will eventually be loosely tied to data using partial searchable snapshots (as opposed to full
searchable snapshots in the cold phase).
Relates to #60848
Fixed the inconsistencies regarding NULL argument handling.
A NULL literal vs a NULL field value as function arguments in some cases
resulted in different function return values.
Functions should return the same value no matter whether the argument(s)
came from a field or from a literal.
The introduced integration test checks that function calls with the same
argument values (regardless of literal/field) return the same output (it
also checks that newly added functions are added to the test cases).
Fixed the following functions (a request sketch follows the list):
* Insert: NULL start, length, and replacement arguments (as fields) also
result in a NULL return value instead of returning the input.
* Locate: a NULL pattern results in a NULL return value; a NULL optional start
argument is handled the same as a missing start argument.
* Replace: NULL pattern and replacement arguments result in NULL instead of
returning the input.
* Substring: NULL start or length results in NULL instead of returning
the input.
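For illustration, a minimal sketch of the expected behavior via the SQL REST endpoint (the literal arguments are hypothetical; with a NULL argument the function returns NULL rather than the input string, and field arguments now behave the same):
```
POST /_sql?format=txt
{
  "query": "SELECT INSERT('foobar', null, 3, 'bar')"
}
```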
Fixes #58907
Changes:
- Reworks the ILM tutorial to focus on the Elastic Agent and a built-in ILM policy
- Updates several screenshots in the docs for the new ILM UI
Co-authored-by: debadair <debadair@elastic.co>
A frozen tier is backed by an external object store (like S3) and caches only a
small portion of data on local disks. In this way, users can reduce hardware
costs substantially for infrequently accessed data. For the frozen tier we only
pull in the parts of the files that are actually needed to run a given search.
Further, we don't require the node to have enough space to host all the files.
We therefore have a cache that manages which file parts are available and which
ones are not. This node-level shared cache is bounded in size (typically in relation
to the disk size), and will evict items based on an LFU policy, as we expect some
parts of the Lucene files to be used more frequently than other parts. The level
of granularity for evictions is at the level of regions of a file, and does not
require evicting full files. The on-disk representation that was chosen for the
cold tier is not a good fit here, as it won't allow evicting parts of a file.
Instead we are using fixed-size pre-allocated files and have implemented our own
memory management logic to map regions of the shard's original Lucene files onto
regions in these node-level shared files that represent the on-disk
cache.
This PR adds the core functionality to searchable snapshots to power such a
frozen tier:
- It adds the node-level shared cache that evicts file regions based on an LFU
policy
- It adds the machinery to dynamically download file regions into this cache and
serve their contents when searches execute.
- It extends the mount API with a new parameter, `storage`, which selects the
kind of local storage used to accelerate searches of the mounted index. If set
to `full_copy` (default, used for cold tier), each node holding a shard of the
searchable snapshot index makes a full copy of the shard to its local storage.
If set to `shared_cache`, the shard uses the newly introduced shared cache,
only holding a partial copy of the index on disk (used for frozen tier); a mount
request sketch follows below.
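For illustration, a minimal sketch of mounting an index with the new parameter (repository, snapshot, and index names are hypothetical):
```
POST /_snapshot/my_repository/my_snapshot/_mount?storage=shared_cache
{
  "index": "my_index"
}
```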
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: David Turner <david.turner@elastic.co>