Commit graph

147 commits

Author SHA1 Message Date
David Turner
eeedb98c60
Make cluster health API cancellable (#96990)
This API can be quite heavy in large clusters, and might spam the
`MANAGEMENT` threadpool queue with work for clients that have long-since
given up. This commit adds some basic cancellability checks to reduce
the problem.

Backport of #96551 to 7.17
2023-06-22 08:05:03 +01:00
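The "basic cancellability checks" described in eeedb98c60 can be pictured with a minimal Java sketch. It does not use the real Elasticsearch task classes: `CancellableWorkSketch`, `TaskCancelled`, and `runSteps` are hypothetical names, and the cancellation flag is modelled as a plain `BooleanSupplier`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class CancellableWorkSketch {
    /** Hypothetical exception signalling that the caller has given up. */
    static final class TaskCancelled extends RuntimeException {
        TaskCancelled(String message) {
            super(message);
        }
    }

    /** Runs the steps in order, checking the flag before each one; returns the number completed. */
    static int runSteps(List<Runnable> steps, BooleanSupplier isCancelled) {
        int completed = 0;
        for (Runnable step : steps) {
            if (isCancelled.getAsBoolean()) {
                // Stop early instead of queueing more work for a client that is gone.
                throw new TaskCancelled("cancelled after " + completed + " steps");
            }
            step.run();
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        List<Runnable> steps = Arrays.asList(
            () -> System.out.println("step 1"),
            () -> cancelled.set(true), // simulate the client closing its connection
            () -> System.out.println("step 3 (skipped once cancelled)")
        );
        try {
            runSteps(steps, cancelled::get);
        } catch (TaskCancelled e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the flag between units of work, rather than only at the start, is what lets a long-running request stop soon after the client disconnects.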
Mark Vieira
47c6fd34da
[7.17] Add JUnit rule based integration test cluster orchestration framework… (#92517)
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
2022-12-22 17:48:07 -08:00
Pooya Salehi
cd96706053
[7.17] Wait for task on master in testGetMappingsCancellation (#91709) (#91916) (#91926)
* Wait for task on master in testGetMappingsCancellation (#91709) (#91916)

* replace List.of usage
2022-11-25 04:04:50 -05:00
Pooya Salehi
83c19aec5f
Ensure tasks with banned parents always get cancelled (#90188) (#90248)
The check used to entirely skip parent lookup relies on
ConcurrentHashMap#isEmpty() which could return inconsistent results, and
potentially skip the cancellation of a task with a banned parent upon
registration, and it doesn't seem to have a benefit considering the hash
code computation.

Closes #88201
2022-09-22 21:33:11 +09:30
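As a rough illustration of 83c19aec5f, the sketch below contrasts a lookup guarded by `ConcurrentHashMap#isEmpty()` with an unconditional lookup. The class and method names (`BannedParentSketch`, `ban`, `isParentBanned`) are hypothetical and only mirror the shape of the check described in the message, not the actual task manager code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class BannedParentSketch {
    // Hypothetical stand-in for the registry of banned parent task ids and ban reasons.
    private final Map<String, String> bannedParents = new ConcurrentHashMap<>();

    void ban(String parentTaskId, String reason) {
        bannedParents.put(parentTaskId, reason);
    }

    /** Shape of the shortcut the commit message argues against: skip the lookup if the map looks empty. */
    boolean isParentBannedWithShortcut(String parentTaskId) {
        if (bannedParents.isEmpty()) {
            return false;
        }
        return bannedParents.containsKey(parentTaskId);
    }

    /** Always performs the lookup, so the result never depends on isEmpty(). */
    boolean isParentBanned(String parentTaskId) {
        return bannedParents.containsKey(parentTaskId);
    }

    public static void main(String[] args) {
        BannedParentSketch registry = new BannedParentSketch();
        registry.ban("node-1:42", "parent task cancelled");
        System.out.println(registry.isParentBanned("node-1:42"));
        System.out.println(registry.isParentBanned("node-1:43"));
    }
}
```

In a single-threaded run the two variants agree; the commit message's concern is the concurrent case, where the shortcut adds a second read of shared state without saving meaningful work.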
Pooya Salehi
a9b79a7386
Increase assertAllCancellableTasksAreCancelled timeout (#89744) (#89749)
The following two failures happen rarely, but both fail in the same
`assertBusy` block. I don't have a clue why, and couldn't reproduce
them. Considering the amount of checks in that block, maybe a larger
timeout is more suitable. (Also it seems from the test history, it is
not uncommon for those tests to take 2-3s, so every few thousand runs
hitting the 10s timeout seems likely, IMO!)  Relates
https://github.com/elastic/elasticsearch/issues/88884,
https://github.com/elastic/elasticsearch/issues/88201
2022-08-31 20:54:09 +09:30
Pooya Salehi
b08b30169f
Log more details in TaskAssertions (#88864) (#88881)
2022-07-28 17:55:24 +09:30
David Turner
021fbeb9cd
Make GetIndexAction cancellable (#87731)
The get-indices API does some nontrivial work on the master and at high
index counts the response may be very large and could take a long time
to compute. Some clients will time out and retry if it takes too long.
Today this API is not properly cancellable which leads to a good deal of
wasted work in this situation, and the potentially-enormous response is
serialized on a transport worker thread.

With this commit we make the API cancellable and move the serialization
to a `MANAGEMENT` thread.

Backport of #87681
Relates #77466
2022-06-16 06:57:27 -04:00
Rene Groeschke
c51abc6570
[7.17] Port QA projects to use javaRestTest and yamlRestTest (#86703) (#86725)
Backports the following commits to 7.17:
 - Port QA projects to use javaRestTest and yamlRestTest (#86703)
2022-05-12 19:33:38 +02:00
Sylvain Wallez
37c06a0508
[client] Fix decompressed response headers (#63419) (#84581)
When a gzip-encoded response is decompressed the response should no longer
have a content-encoding header and content-length should be set to
"unknown". GzipDecompressingEntity correctly does this for the entity
but the response still reported the original response's content-encoding
and content-length headers.
2022-03-02 13:36:00 -05:00
David Turner
c39a08fbaa
Rename InternalTestCluster#getMasterNodeInstance (#83407)
This method's name is trappy: it is easy to misinterpret it as returning
an instance from the elected master, but in fact it uses any
master-eligible node. If you want an instance from the elected master,
you have to use `getCurrentMasterNodeInstance()` instead.

This commit renames the method to clarify that it might not get an
instance from the elected master, and adds docs with cross-refs to help
developers choose the right method.
2022-02-08 15:32:05 +01:00
Rene Groeschke
c389c85e8c
[7.17] Update build tools internal dependencies (#82875) (#82947)
2022-01-24 17:56:37 +01:00
Artem Prigoda
f690f778e0
[7.17] Make testSortAndPaginateWithInProgress test stable (#80530) (#82759)
Start asserting snapshots in progress only in case when they reach
a stable state (the first index has finished, the second has been
blocked).

* Move LARGE_SNAPSHOT_SETTINGS to AbstractSnapshotRestTestCase to be reused
* Check that test-index-2 is blocked
* Be more clear that the 2nd index is blocked

Fixes #79779
Relates #78507
2022-01-19 09:31:40 +01:00
David Kyle
cfddef473b
Mute failing RestGetSnapshotsIT test (#79780) (#80141)
See #79779
2021-11-01 10:28:25 -04:00
Mark Vieira
bcfbf00074
Reformat Elasticsearch source
2021-10-27 15:23:15 -07:00
Armin Braun
5d043e02a2
Implement from_sort_value Parameter in Get Snapshots API (#77618) (#79318)
Add `from_sort_value` parameter to allow for filtering snapshots by comparing to concrete sort column
values, similar to the existing `after` parameter.
2021-10-17 17:53:15 +02:00
Armin Braun
08a83659f3
Add Filtering by SLM Policy to Get Snapshots API (#77321) (#79308)
It's in the title: add a new `slm_policy_filter` param as a filter to the get snapshots API.
2021-10-17 13:50:25 +02:00
Nikola Grcevski
a7ae031ce7
[7.x] [TEST] Switch to persistent settings in java tests (#78562) (#79103)
Migrate to persistent cluster settings in Java tests

We are deprecating transient settings, therefore this
PR changes uses of transient cluster settings to
persistent cluster settings.
2021-10-13 20:10:11 -04:00
Chris Hegarty
964180ba99
[7.x] Fix split package org.elasticsearch.common.xcontent (#79061)
* Fix split package org.elasticsearch.common.xcontent

* Fix test
2021-10-13 15:43:41 +01:00
Rene Groeschke
30f0bd8388
Enforce common dependency configuration setup (#78310) (#78430)
* Enforce common dependency configuration setup

* Tweak dependencies for plugin sql server tests

* Fix test runtime dependencies after disabling transitive support
2021-09-29 05:15:19 -04:00
Rene Groeschke
c9648b079f
[7.x] Do not create unused testCluster (#78312)
* Do not create unused testCluster (#77581)

* Do not create unused testCluster

This avoids creating test clusters that are not required during the build.
We use lazy configuration here on testClusters and only instantiate them as theyre

* Do not fail on run task (debug)

* Create more test cluster lazy

* Make more test cluster lazy

* Avoid creating unused testcluster

* Fix PluginBuildPlugin

* Fix disabling geo db download

* Fix cluster setup in repository-multi-version

* Polishing

* Fix issue with erratic groovy logic

* Fix bwc tests

* Fix more bwcTests

* Fix more bwc tests

* Fix more bwc tests

* Fix more bwc tests

* Fix typo

* Minor polishing

* Fix rolling upgrade tests

* Fix cluster config in sql qa mixedcluster project

* Fix more bwc tests

* Clean up before review

* Document test cluster usage

* API polishing after review

provide useCluster(Provider) method to TestClusterAware

Ideally we take this a step further and realize those test clusters only on use.
But out of scope of this PR.

* Allow gradle provider as value for nonSystemProperties

* Some simplification on test configuration

* Fix typo in rest test config

* Fix more typos

* Fix another typo

* Fix more typos

* Fix runEqlCorrectnessNode run task and cluster configuration (#78249)

* Fix merge issue

* Fix bwc tests after backporting
2021-09-27 14:00:29 -04:00
Armin Braun
6716f537fe
Fix Get Snapshots Request Cancellation with ignore_unavailable=true (#78004) (#78056)
Short-circuit the failure method when cancelled just like in the fail fast case.
Also, remove the special case handling that asserts but swallows exceptions in production
for when ignoring unavailable to not swallow the task cancellation exception.

closes #77980
2021-09-21 09:39:49 +02:00
Armin Braun
85431d8106
Implement Sort By Repository Name in Get Snapshots API (#77049) (#77112)
This one is the last sort column not yet implemented but used by Kibana.
2021-09-01 14:23:26 +02:00
Armin Braun
84d7831817
Add Sort By Shard Count and Failed Shard Count to Get Snapshots API (#77011) (#77018)
It's in the title. As requested by the Kibana team, adding these two additional sort columns.

relates #74350
2021-08-30 14:58:48 +02:00
Armin Braun
d356a4b603
Implement Numeric Pagination in Get Snapshots API (#76532)
* Return Total Result Count and Remaining Count in Get Snapshots Response (#76150)

Add total result count and remaining count to get snapshots response.

* Implement Numeric Offset Parameter in Get Snapshots API (#76233)

Add numeric offset parameter to this API.

Relates #74350
2021-08-14 18:29:14 +02:00
Armin Braun
9326448c89
Fix RestSnapshotsStatusCancellationIT (#75524) (#76527)
Reused the fix from the other snapshot API test.

closes #75075
2021-08-14 17:24:41 +02:00
Armin Braun
a5a5aec803
Fix NPE in Cat Snapshots API Default (#76161)
When backporting get-snapshots pagination I missed the cat snapshots API that needed adjustment
to be in line with how `8.x` works as well, leading to an NPE. Fixed by making the code the same
as in `8.x` and adding a test (that should be forward-ported to 8.x as well).

closes #76158
2021-08-05 13:34:43 +02:00
Mark Vieira
486f77df31
Set netty available processors system property for tests globally (#75699) (#75757)
(cherry picked from commit 9d14bc91d7)
2021-07-27 16:47:11 -04:00
Armin Braun
84f0b5c14d
Fix RestGetSnapshotsCancellationIT Failures (#74827) (#75282)
Found this to be the easiest fix; the alternative would have been to actually
wait for all snapshot meta threads to become blocked, but that's kind of hacky.

closes #74743
2021-07-13 13:07:27 +02:00
Armin Braun
d64a72c127
Snapshot Pagination and Scalability Improvements Backport to 7.x (#74676)
Backport of the recently introduced snapshot pagination and scalability improvements listed below.
Merged as a single backport because the snapshot status API logic had massively diverged between `master` and `7.x`. With the work in the PRs below, the logic in `master` and `7.x` is once again closely aligned.

#72842
#73172
#73199
#73570 
#73952
#74236 
#74451 (this one is only partly applicable as it was mainly a change to master to align `master` and `7.x` branches)
2021-06-29 15:16:26 +02:00
Armin Braun
a9b782c883
Dry up HTTP Smoke Tests around Snapshots (#73962) (#74048)
Drying up a few spots of code duplication with these tests. Partly to
reduce the size of PR #73952 that makes use of the smoke test infrastructure.
2021-06-13 22:27:59 +02:00
Ryan Ernst
393ab2d813
Rename o.e.common in libs/core to o.e.core (#73909) (#73920)
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.

relates #73784
backport #73909
2021-06-08 14:17:44 -07:00
Armin Braun
b351dee205
Make SnapshotStatusAction Cancellable (#73818) (#73829)
Same as #72644. This is an even longer-running action than normal
get snapshots, so it should definitely be cancellable.
Parallelization for this action will be introduced in a separate PR.
2021-06-07 14:36:34 +02:00
Armin Braun
30da196eca
Make GetSnapshotsAction Cancellable (#72644) (#73820)
If this runs needlessly for large repositories (especially in timeout/retry situations)
it's a significant memory+cpu hit => made it cancellable like we recently did for many
other endpoints.
2021-06-07 13:17:46 +02:00
Rene Groeschke
7c3630989d
Remove internal build logic from public build tool plugins (#72470) (7.x backport) (#72832)
back porting #72470 to 7.x
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and
related plugins and build logic

This includes a refactoring of ElasticsearchDistribution to handle types
better, so that we can differentiate between Elasticsearch distribution
types supported in TestClustersPlugin and types only supported
in internal plugins.

It also introduces a set of internal versions of public plugins.

As part of this we also generate the plugin descriptors now.

As a follow-up to this we can actually move these publicly used classes into
an extra project (declared as an included build).

We keep LoggedExec and VersionProperties effectively public, and add a workaround for RestTestBase.
2021-06-03 12:43:40 +02:00
Francisco Fernández Castaño
2196c18935
Add support for RestGetMapping cancellation (#72482)
Backport of #72234
2021-04-30 13:06:14 +02:00
Francisco Fernández Castaño
0831bb0891
Fix ClusterStateRestCancellationIT (#72469)
Take into account task cancellation for local requests too

Closes #72056
Backport of #72407
2021-04-29 15:51:25 +02:00
Francisco Fernández Castaño
2daa984bb4
Add support for Rest XPackUsage task cancellation (#72413)
Backport of #72304
2021-04-29 11:02:09 +02:00
Francisco Fernández Castaño
eb6afe47eb
Add support for task cancellation to RestNodesStatsAction (#71907)
Backport #71897
2021-04-20 12:10:44 +02:00
Lyudmila Fokina
3c7731dd3f
Warn users if security is implicitly disabled (#71650)
* Warn users if security is implicitly disabled (#70114)

Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it can also lead to unintentionally
insecure clusters.
This change introduces clear warnings when security features are
implicitly disabled:
- a warning header in each REST response if security is implicitly
  disabled;
- a log message during cluster boot.
2021-04-13 20:51:52 +02:00
Jay Modi
da7f2cf68a
System index descriptors support mixed versions (#71370)
System index descriptors are used to describe a system index, and are
expected to change as new versions are developed. As part of this, the
descriptors had a minimum supported version field so that the contents
within that descriptor would not be applied if there were nodes older
than that version. However, this falls short of being able to
accurately describe what a system index should look like in a given
cluster where there are mixed node versions.

This change moves us towards being able to accurately describe and
know what the system index should look like. A system index is now
able to accept a list of the prior system index descriptor objects
so that clusters with mixed versions can select the appropriate
descriptor and ensure the index is created properly. As the node
versions change during a rolling upgrade, the cluster will then be
able to adapt the system index to the most recent version once all
master and data nodes have been upgraded.

Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Yang Wang <ywangd@gmail.com>
Backport of #71144
2021-04-06 20:36:25 -06:00
David Turner
46d72542e2
Reduce size of MANAGEMENT threadpool on small node (#71171)
Today by default the `MANAGEMENT` threadpool always permits 5 threads
even if the node has a single CPU, which unfairly prioritises management
activities on small nodes. With this commit we limit the size of this
threadpool to the number of processors if less than 5.

Relates #70435
2021-04-06 13:24:00 +01:00
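The sizing rule stated in 46d72542e2 amounts to taking the smaller of the old fixed limit and the processor count. A minimal sketch, using the hypothetical names `ManagementPoolSizeSketch` and `maxThreads` rather than the real thread pool builder:

```java
final class ManagementPoolSizeSketch {
    // The previous fixed upper bound mentioned in the commit message.
    static final int DEFAULT_MAX_THREADS = 5;

    /** Caps the pool at the processor count when the node has fewer than five processors. */
    static int maxThreads(int allocatedProcessors) {
        return Math.min(DEFAULT_MAX_THREADS, allocatedProcessors);
    }

    public static void main(String[] args) {
        for (int processors : new int[] { 1, 2, 4, 8 }) {
            System.out.println(processors + " processors -> " + maxThreads(processors) + " threads");
        }
    }
}
```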
Jason Tedor
42cd3433b9
Pass override settings when creating test cluster (#71203)
Today when creating an internal test cluster, we allow the test to
supply the node settings that are applied. The extension point to
provide these settings has a single integer parameter, indicating the
index (zero-based) of the node being constructed. This allows the test
to make some decisions about the settings to return, but it is too
simplistic. For example, imagine a test that wants to provide a setting,
but some values for that setting are not valid on non-data nodes. Since
the only information the test has about the node being constructed is
its index, it does not have sufficient information to determine if the
node being constructed is a non-data node or not, since this is done by
the test framework externally by overriding the final settings with
specific settings that dictate the roles of the node. This commit changes
the test framework so that the test has information about what settings
are going to be overridden by the test framework after the test provides
its test-specific settings. This allows the test to make informed
decisions about what values it can return to the test framework.
2021-04-02 10:51:47 -04:00
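To make the motivating example in 42cd3433b9 concrete, the sketch below shows a settings hook that receives the overrides the framework will apply and uses them to decide whether a data-only setting is safe to return. Settings are modelled as a plain `Map<String, String>`; the class name, the method shape, the key `example.data_only.setting`, and the default of treating a node with no role override as a data node are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class NodeSettingsSketch {
    /**
     * Hypothetical test hook: nodeOrdinal is the zero-based node index, and
     * frameworkOverrides holds the settings the framework will apply afterwards.
     */
    static Map<String, String> nodeSettings(int nodeOrdinal, Map<String, String> frameworkOverrides) {
        Map<String, String> settings = new HashMap<>();
        // Assumption: with no role override, the node is treated as a data node.
        String roles = frameworkOverrides.getOrDefault("node.roles", "data");
        boolean isDataNode = Arrays.asList(roles.split(",")).contains("data");
        if (isDataNode) {
            // Hypothetical setting that would be invalid on a non-data node.
            settings.put("example.data_only.setting", "true");
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(nodeSettings(0, Collections.singletonMap("node.roles", "master")));
        System.out.println(nodeSettings(1, Collections.singletonMap("node.roles", "data,master")));
    }
}
```

With only the node index available, the hook could not tell these two nodes apart; passing the overrides is what makes the decision possible.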
Joe Gallo
50b0d11ddd
[REST Compatible API] Route refactoring (addendum) (#70168) (#70249)
2021-03-10 11:51:29 -05:00
Jay Modi
92c715d878
Introduce system index types including external (#69744)
This commit introduces system index types that will be used to
differentiate behavior. Previously system indices were all treated the
same regardless of whether they belonged to Elasticsearch, a stack
component, or one of our solutions. Upon further discussion and
analysis this decision was not in the best interest of the various
teams and instead a new type of system index was needed. These system
indices will be referred to as external system indices. Within external
system indices, an option exists for these indices to be managed by
Elasticsearch or to be managed by the external product.

In order to represent this within Elasticsearch, each system index will
have a type and this type will be used to control behavior.

Closes #67383
Backport of #68919
2021-03-02 09:31:16 -07:00
David Turner
7d11fe661a
Make indices stats requests cancellable (#69174)
Relates #55550
2021-02-25 11:51:22 +00:00
David Turner
926238917c
Make recovery APIs cancellable (#69177)
Relates #55550
2021-02-25 09:24:32 +00:00
Yannick Welsch
3027131fb9
Lazily load soft-deletes for searchable snapshot shards (#69203)
Opening a Lucene index that supports soft-deletes currently creates the liveDocs bitset eagerly. This requires scanning
the doc values to materialize the liveDocs bitset from the soft-delete doc values. In order for searchable snapshot shards
to be available for searches as quickly as possible (i.e. on recovery, or in case of FrozenEngine whenever a search comes
in), they should read as little as possible from the Lucene files.

This commit introduces a LazySoftDeletesDirectoryReaderWrapper, a variant of Lucene's
SoftDeletesDirectoryReaderWrapper that loads the livedocs bitset lazily on first access. It is special-tailored to
ReadOnlyEngine / FrozenEngine as it only operates on non-NRT readers.
2021-02-23 09:24:25 +01:00
David Turner
4b8c8f8d76
Make GET /_cat/segments cancellable (#69020)
A small followup to #67413 and #68965: the underlying actions of the
`GET /_cat/segments` API are now cancellable, so we may as well cancel
them if needed.
2021-02-16 09:53:20 +00:00
David Turner
77eb32eddc
Indices segments: bg serialize, make cancellable (#68965)
The response to an `IndicesSegmentsAction` might be large, perhaps 10s
of MBs of JSON, and today it is serialized on a transport thread. It
also might take so long to respond that the client times out, resulting
in the work needed to compute the response being wasted.

This commit introduces the `DispatchingRestToXContentListener` which
dispatches the work of serializing an `XContent` response to a
non-transport thread, and also makes `TransportBroadcastByNodeAction`
sensitive to the cancellability of its tasks.

It uses these two features to make the `RestIndicesSegmentsAction`
serialize its response on a `MANAGEMENT` thread, and to abort its work
more promptly if the client's channel is closed before the response is
sent.
2021-02-16 08:26:23 +00:00
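The two ideas in 77eb32eddc, serializing off the transport thread and giving up early once the client has gone away, can be sketched with standard `java.util.concurrent` types. This is not the `DispatchingRestToXContentListener` itself: `DispatchingSerializerSketch` and `serializeOffThread` are hypothetical names, the executor stands in for the `MANAGEMENT` pool, and the closed channel is modelled as a `BooleanSupplier`.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class DispatchingSerializerSketch {
    /**
     * Runs the (potentially expensive) serializer on the given executor instead of the
     * calling thread, and skips it entirely if the request was already cancelled.
     */
    static CompletableFuture<Optional<String>> serializeOffThread(
        Executor managementPool,
        BooleanSupplier isCancelled,
        Supplier<String> serializer
    ) {
        return CompletableFuture.supplyAsync(() -> {
            if (isCancelled.getAsBoolean()) {
                // The client is gone, so do not build a response nobody will read.
                return Optional.<String>empty();
            }
            return Optional.of(serializer.get());
        }, managementPool);
    }

    public static void main(String[] args) {
        ExecutorService managementPool = Executors.newSingleThreadExecutor();
        try {
            AtomicBoolean channelClosed = new AtomicBoolean(false);
            Optional<String> body = serializeOffThread(
                managementPool,
                channelClosed::get,
                () -> "{\"segments\":[]}"
            ).join();
            System.out.println(body.orElse("<cancelled>"));
        } finally {
            managementPool.shutdown();
        }
    }
}
```

The calling thread only schedules the work, which is the point of the change: a response of tens of megabytes no longer ties up a transport thread while it is rendered.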
Gordon Brown
2fde28e318
[7.x] Introduce "Feature States" for managing snapshots of system indices (#63513)
This PR expands the meaning of `include_global_state` for snapshots to include system indices. If `include_global_state` is `true` on creation, system indices will be included in the snapshot regardless of the contents of the `indices` field. If `include_global_state` is `true` on restoration, system indices will be restored (if included in the snapshot), regardless of the contents of the `indices` field. Index renaming is not applied to system indices, as system indices rely on their names matching certain patterns. If restored system indices are already present, they are automatically deleted prior to restoration from the snapshot to avoid conflicts.

This behavior can be overridden to an extent by including a new field in the snapshot creation or restoration call, `feature_states`, which contains an array of strings indicating the "feature" for which system indices should be snapshotted or restored. For example, this call will only restore the `watcher` and `security` system indices (in addition to `index_1`):

```
POST /_snapshot/my_repository/snapshot_2/_restore
{
  "indices": "index_1",
  "include_global_state": true,
  "feature_states": ["watcher", "security"]
}
```

If `feature_states` is present, the system indices associated with those features will be snapshotted or restored regardless of the value of `include_global_state`. All system indices can be omitted by providing a special value of `none` (`"feature_states": ["none"]`), or included by omitting the field or explicitly providing an empty array (`"feature_states": []`), similar to the `indices` field.

The list of currently available features can be retrieved via a new "Get Snapshottable Features" API:
```
GET /_snapshottable_features
```

which returns a response of the form:
```
{
    "features": [
        {
            "name": "tasks",
            "description": "Manages task results"
        },
        {
            "name": "kibana",
            "description": "Manages Kibana configuration and reports"
        }
    ]
}
```

Features currently map one-to-one with `SystemIndexPlugin`s, but this should be considered an implementation detail. The Get Snapshottable Features API and snapshot creation rely upon all relevant plugins being installed on the master node.

Further, the list of feature states included in a given snapshot is exposed by the Get Snapshot API, which now includes a new field, `feature_states`, which contains a list of the feature states and their associated system indices which are included in the snapshot. All system indices in feature states are also included in the `indices` array for backwards compatibility, although explicitly requesting system indices included in a feature state is deprecated. For example, an excerpt from the Get Snapshot API showing `feature_states`:
```
"feature_states": [
    {
        "feature_name": "tasks",
        "indices": [
            ".tasks"
        ]
    }
],
"indices": [
    ".tasks",
    "test1",
    "test2"
]
```

Co-authored-by: William Brafford <william.brafford@elastic.co>
2021-02-11 15:34:09 -07:00