279 commits

Author SHA1 Message Date
David Roberts
6e392a317d
Add processor architectures to cluster stats (#68264)
This change adds a new "architectures" section to the
cluster stats, containing a summary of how many nodes
in the cluster are on each processor architecture.

The intention is to make it easier to see whether
clusters are running on aarch64, or mixed x86_64/aarch64,
which may aid support as aarch64 becomes more commonly
used.
2021-02-02 09:48:20 +00:00
David Turner
2adeb4a666
Expand and consolidate networking docs (#68051)
Today's network config docs are split into "Network", "HTTP" and
"Transport" pages, with unclear relationships between them. We often
encounter users with weird configs that indicate they don't really
understand how these settings all relate. In fact these pages are all
very interrelated, and the HTTP and Transport pages are almost all only
for advanced users. This commit brings these docs into a single page and
rewords some things to try and guide users away from the advanced
settings unless their configuration needs all the extra complexity.

It also adds a section entitled "Binding and publishing" which clarifies
the meanings of the `bind_host` and `publish_host` parameters. This is
also a common source of confusion amongst users.

It also clarifies that many of these settings accept a list of
addresses, and warns that this may not be what you want. Closes #67956.

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2021-02-01 13:06:20 +00:00
Lee Hinman
ac1433d300
Add index creation version stats to cluster stats (#68141)
This commit adds statistics about the index creation versions to the `/_cluster/stats` endpoint. The
stats look like:

```
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "indices" : {
    "count" : 3,
    ...
    "versions" : [
      {
        "version" : "8.0.0",
        "index_count" : 1,
        "primary_shard_count" : 2,
        "total_primary_size" : "8.6kb",
        "total_primary_bytes" : 8831
      },
      {
        "version" : "7.11.0",
        "index_count" : 1,
        "primary_shard_count" : 1,
        "total_primary_size" : "4.6kb",
        "total_primary_bytes" : 4230
      }
    ]
  },
  ...
}
```

(`total_primary_size` is only shown with the `?human` flag)

This is useful for telemetry as it allows us to see if/when a cluster has indices created on a
previous version that would need to be either upgraded or supported during an upgrade.
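
As a rough illustration of how the `versions` array could be consumed, here is a minimal Python sketch. It uses a trimmed copy of the example response above; the helper names `parse_version` and `indices_created_before` are hypothetical, and the sketch assumes plain `major.minor.patch` version strings like those shown.

```python
import json

# Trimmed copy of the example response above (elided sections dropped).
SAMPLE_RESPONSE = """
{
  "indices": {
    "count": 3,
    "versions": [
      {"version": "8.0.0", "index_count": 1,
       "primary_shard_count": 2, "total_primary_bytes": 8831},
      {"version": "7.11.0", "index_count": 1,
       "primary_shard_count": 1, "total_primary_bytes": 4230}
    ]
  }
}
"""


def parse_version(text):
    """Turn "7.11.0" into (7, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def indices_created_before(stats, current_version):
    """Return the `versions` entries created on a version older than current_version."""
    current = parse_version(current_version)
    return [
        entry
        for entry in stats["indices"]["versions"]
        if parse_version(entry["version"]) < current
    ]


stats = json.loads(SAMPLE_RESPONSE)
older = indices_created_before(stats, "8.0.0")
for entry in older:
    print(entry["version"], entry["index_count"], entry["total_primary_bytes"])
```

Comparing parsed tuples rather than raw strings matters here, because `"7.11.0"` would otherwise sort before `"7.2.0"`.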
2021-01-28 13:58:21 -07:00
James Rodewig
3e34247570
[DOCS] Add security privileges to cluster API docs (#67589) 2021-01-19 10:18:59 -05:00
Ioannis Kakavas
bd873698bc
Ensure CI is run in FIPS 140 approved only mode (#64024)
We were depending on BouncyCastle FIPS's own mechanics to set
itself in approved only mode since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager though in
org.elasticsearch.bootstrap.Elasticsearch, and this means that
BCFIPS would not be in approved only mode, unless explicitly
configured so.

This commit sets the appropriate JVM property to explicitly set
BCFIPS in approved only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved only mode when we expect to.
It also sets xpack.security.fips_mode.enabled to true for all test clusters
used in fips mode and sets the distribution to the default one. It adds a
password to the elasticsearch keystore for all test clusters that run in fips
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140 approved
algorithms instead for better coverage.

It also addresses a number of tests that would fail in approved only mode,
mainly:

    Tests that use PBKDF2 with a password less than 112 bits (14 characters). We
    elected to change the passwords used everywhere to be at least 14
    characters long instead of mandating
    the use of pbkdf2_stretch because both pbkdf2 and
    pbkdf2_stretch are supported and allowed in fips mode and it makes sense
    to test with both. We could possibly figure out the password algorithm used
    for each test and adjust password length accordingly only for pbkdf2 but
    there is little value in that. It's good practice to use strong passwords so if
    our docs and tests use longer passwords, then it's for the best. The approach
    is brittle as there is no guarantee that the next test that will be added won't
    use a short password, so we add some testing documentation too.
    This leaves us with a possible coverage gap since we do support passwords
    as short as 6 characters but only test with 14 characters or more. However, the
    validation itself was not tested even before. Tests can be added in a followup,
    outside of fips related context.

    Tests that use a PKCS12 keystore and were not already muted.

    Tests that depend on running test clusters with a basic license or
    using the OSS distribution as FIPS 140 support is not available in
    either of these.

Finally, it adds some information around FIPS 140 testing in our testing
documentation reference so that developers can hopefully keep in
mind fips 140 related intricacies when writing/changing docs.
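
The 112-bit figure above lines up with 14 characters if each character takes 8 bits. A minimal Python sketch of that arithmetic follows; the helper names are hypothetical, and measuring length on the UTF-8 bytes is an assumption, not a statement of how BCFIPS performs the check.

```python
MIN_PASSWORD_BITS = 112  # threshold quoted in the commit message


def password_bits(password):
    """Length of the password in bits, assuming it is measured on its UTF-8 bytes."""
    return len(password.encode("utf-8")) * 8


def meets_minimum(password):
    """True if the password reaches the 112-bit (14 single-byte characters) minimum."""
    return password_bits(password) >= MIN_PASSWORD_BITS


for candidate in ("changeme", "x-pack-test-password"):
    print(candidate, password_bits(candidate), meets_minimum(candidate))
```

A helper like this could be used in a test utility to flag short passwords early, which is one way to reduce the brittleness mentioned above.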
2020-12-23 21:00:49 +02:00
James Rodewig
10b036e934
[DOCS] Fix timeout parameter defaults (#66111) 2020-12-21 09:02:06 -05:00
bellengao
d14492ca13
[DOCS] Fix some typos in docs (#66672) 2020-12-21 12:45:51 +02:00
James Rodewig
7c0f193b2c
[DOCS] Fix formatting (#66450) 2020-12-16 11:09:55 -05:00
Adam Locke
be3bc46111
[DOCS] Add description for node info settings. (#66362) 2020-12-15 11:27:42 -05:00
bellengao
e198bb233e
[DOCS] Correct the default value of wait_for_completion query param (#65800)
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2020-12-04 15:52:35 -05:00
James Rodewig
0f406f1734
[DOCS] Add cluster get settings API example (#65754) 2020-12-02 10:37:01 -05:00
James Rodewig
72621873fd
[DOCS] Remove erroneous flat_settings query param (#65670) (#65745)
Co-authored-by: Thiago Souza <thiago@elastic.co>
2020-12-02 09:42:35 -05:00
Wylie Conlon
10ee0f2878
Clarify field data cache behavior in docs (#64375)
* Clarify that field data cache includes global ordinals
* Describe that the cache should be cleared once the limit is reached
* Clarify that the `_id` field does not support aggregations anymore
* Fold the `fielddata` mapping parameter page into the `text` field docs
* Improve cross-linking
2020-11-20 13:53:23 -08:00
James Rodewig
1ea83359bb
[DOCS] Fix case for 'Boolean' (#64299) 2020-10-29 09:04:43 -04:00
Adam Locke
789ee2d73e
[DOCS] Combining important config settings into a single page (#63849)
* Combining important config settings into a single page.

* Updating ids for two pages causing link errors and implementing redirects.
2020-10-19 10:02:22 -04:00
Lee Hinman
0c3599577e
Add index.routing.allocation.prefer._tier setting (#62589)
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:

```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```

If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index to
be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm`
and `data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.

This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.
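
The preference walk described above can be sketched in Python as follows. This is an illustration only, not the `DataTierAllocationDecider` code: the function name, the node names, and the shape of the `cluster_nodes` mapping are all hypothetical. What happens when none of the listed tiers has a node is not covered by this message, so the sketch simply returns `None` in that case.

```python
def preferred_available_tier(preference, cluster_nodes):
    """Return the first tier in the comma-separated preference that some node has.

    cluster_nodes maps a node name to the set of roles on that node
    (a hypothetical shape chosen for this sketch).
    """
    available_roles = set()
    for roles in cluster_nodes.values():
        available_roles.update(roles)

    for tier in preference.split(","):
        tier = tier.strip()
        if tier in available_roles:
            return tier
    return None  # behaviour for this case is not described in the commit message


preference = "data_hot,data_warm,data_content"

no_hot_nodes = {"node-1": {"data_warm"}, "node-2": {"data_content"}}
print(preferred_available_tier(preference, no_hot_nodes))

with_hot_node = {"node-1": {"data_warm"}, "node-3": {"data_hot"}}
print(preferred_available_tier(preference, with_hot_node))
```

The order of the preference string decides the outcome, so `data_warm` wins over `data_content` in the first call even though both are present.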

Subsequent work will change the ILM migration to make additional use of this setting.

Relates to #60848
2020-09-18 14:49:59 -06:00
James Rodewig
136275e3e6
[DOCS] Fix typo in nodes stats docs (#61601) (#61716)
Co-authored-by: Henry <henryloh@ucla.edu>
2020-08-31 09:29:40 -04:00
Lee Hinman
28cec563b1
Allocate newly created indices on data_hot tier nodes (#61342)
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After it has been set initially, this setting can be treated like any other index-level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)
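
The conditions in the list above can be sketched as a small Python function. This is a hedged illustration, not the plugin code: `default_tier_settings` and its parameters are hypothetical names, the requested settings are modelled as a flat dict, and an explicit null is modelled as a key whose value is `None`.

```python
TIER_SETTING = "index.routing.allocation.include._tier"

# Any setting under these prefixes suppresses the default tier.
ALLOCATION_FILTER_PREFIXES = (
    "index.routing.allocation.include.",
    "index.routing.allocation.exclude.",
    "index.routing.allocation.require.",
)


def default_tier_settings(requested, from_existing_source=False):
    """Return the extra settings to inject for a newly created index, if any."""
    if from_existing_source:
        # Shrink, clone, split, etc. keep whatever the source metadata says.
        return {}
    if any(key.startswith(ALLOCATION_FILTER_PREFIXES) for key in requested):
        # Covers include/exclude/require filters and an explicit null _tier value,
        # because the key is present in the request either way.
        return {}
    return {TIER_SETTING: "data_hot"}


print(default_tier_settings({"index.number_of_shards": "1"}))
print(default_tier_settings({TIER_SETTING: None}))
print(default_tier_settings({"index.routing.allocation.require.box": "big"}))
print(default_tier_settings({}, from_existing_source=True))
```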

Relates to #60848
2020-08-27 12:51:12 -06:00
James Rodewig
a94e5cb7c4
[DOCS] Replace Wikipedia links with attribute (#61171) 2020-08-17 09:44:24 -04:00
James Rodewig
ae01606785
[DOCS] Replace twitter dataset in docs (#60604) 2020-08-03 12:49:56 -04:00
Tim Brooks
b1a6271ec8
Add configured indexing memory limit to node stats (#60342)
This commit adds the configured memory limit to the node stats API.
2020-07-29 11:20:59 -06:00
David Turner
940d618186
Log and track open/close of transport connections (#60297)
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.

This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
2020-07-28 16:58:00 +01:00
James Rodewig
441c3a21b1
[DOCS] Update my-index examples (#60132)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 14:46:39 -04:00
Tim Brooks
5c227dac88
Implement human readable indexing pressure stats (#60022)
The indexing pressure stats do not currently have human readable
variants. This commit adds human readable variants and updates the
documentation.
2020-07-22 09:54:51 -06:00
James Rodewig
80b674fb25
[DOCS] Reformat snippets to use two-space indents (#59973) 2020-07-21 12:24:26 -04:00
Tim Brooks
08506de861
Add indexing pressure documentation (#59456)
This commit adds documentation about the new indexing pressure memory
limit setting and exposure of these metrics in node stats.
2020-07-20 19:35:26 -06:00
David Turner
7bb748da8c
Remove sporadic min/max usage estimates from stats (#59755)
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.

This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest of the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
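
To illustrate the point that the removed fields are derivable from the rest of the output, here is a minimal Python sketch. It assumes each node's `fs` section carries a `data` list whose entries have `path` and `available_in_bytes` fields; the sample values and the helper name are made up for illustration.

```python
def least_and_most_available(fs_stats):
    """Return (path with least available space, path with most available space)."""
    paths = fs_stats["data"]
    least = min(paths, key=lambda entry: entry["available_in_bytes"])
    most = max(paths, key=lambda entry: entry["available_in_bytes"])
    return least["path"], most["path"]


# Hypothetical `fs` section for one node with two data paths.
sample_fs = {
    "data": [
        {"path": "/data/a", "available_in_bytes": 500},
        {"path": "/data/b", "available_in_bytes": 2000},
    ]
}

print(least_and_most_available(sample_fs))
```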
2020-07-20 14:48:53 +01:00
James Rodewig
2be9db01c8
[DOCS] Replace datatype with data type (#58972) 2020-07-07 13:52:10 -04:00
James Rodewig
e5a1269e6f
[DOCS] Add data streams to cluster APIs docs (#58945)
Makes existing docs for the cluster health and cluster state APIs aware
of data streams.
2020-07-02 17:04:55 -04:00
David Turner
83d6589b2a
Account for remaining recovery in disk allocator (#58029)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
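
The difference between the two estimates can be shown with a small Python sketch. All names here are hypothetical, and deriving the remaining growth as expected size minus bytes already on disk is an assumption made for illustration; the commit only says the value is computed from the ongoing recovery.

```python
from dataclasses import dataclass


@dataclass
class IncomingShard:
    expected_size_bytes: int  # estimated final size of the shard
    bytes_on_disk: int        # bytes the recovery has already written

    @property
    def remaining_growth_bytes(self):
        """How much larger the store is still expected to grow (assumed formula)."""
        return max(0, self.expected_size_bytes - self.bytes_on_disk)


def free_space_old_estimate(free_bytes, incoming):
    """Previous behaviour: subtract the full expected size of every incoming shard."""
    return free_bytes - sum(shard.expected_size_bytes for shard in incoming)


def free_space_new_estimate(free_bytes, incoming):
    """New behaviour: subtract only the growth that has not happened yet."""
    return free_bytes - sum(shard.remaining_growth_bytes for shard in incoming)


# A recovery that is 90% complete; free_bytes already reflects the 9,000 bytes written.
almost_done = [IncomingShard(expected_size_bytes=10_000, bytes_on_disk=9_000)]
print(free_space_old_estimate(50_000, almost_done))
print(free_space_new_estimate(50_000, almost_done))
```

In this example the old estimate counts the 9,000 bytes twice, once in the reported free space and once in the subtraction, which is the over-conservatism the commit describes.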
2020-07-01 08:04:45 +01:00
Lisa Cawley
27111f9faa
[DOCS] Updates pull and issue release attributes (#58348) 2020-06-18 12:38:49 -07:00
David Turner
dc3e047a16
Add admonition to cluster state instability note (#57985)
We document that the cluster state API is an internal representation which may
change, but apparently not emphatically enough. This commit adds a `NOTE:`
admonition to this paragraph.
2020-06-11 15:26:26 +01:00
Lisa Cawley
8b9293b3bf
[DOCS] Replace docdir attribute with es-repo-dir (#57489) 2020-06-01 15:55:05 -07:00
James Rodewig
b8a4e00b11
[DOCS] Document dynamic and static setting types (#56919) 2020-05-19 12:10:59 -04:00
Théophile Helleboid - chtitux
0c00a982be
Docs fix node_id spec for secure settings reload API (#55712)
Fix docs typo for the `node_id` parameter in the secure settings reload API.
2020-05-05 11:20:06 +03:00
David Turner
b04a6f4766
Improve same-shard allocation explanations (#56010)
I see occasional confusion about the explanations emitted by the same-shard
allocation decider, particularly amongst new users setting up a single-node
cluster and trying to determine why their cluster has `yellow` health. For
example:

    the shard cannot be allocated to the same node on which a copy of the shard
    already exists

This is technically correct but it's quite a complicated sentence. Also, by
starting with "the shard cannot be allocated" it makes it sound like this is
the problem, whereas in fact this message is a good thing and users should
typically focus their attention elsewhere.

This commit simplifies the wording of these messages and makes them sound more
positive, for example:

    a copy of this shard is already allocated to this node
2020-04-30 16:58:06 +01:00
Igor Motov
b909cee8e9
Expose agg usage in Feature Usage API (#55732)
* Expose agg usage in Feature Usage API

Counts usage of the aggs and exposes them on the _nodes/usage/.

Closes #53746

* Refactor to include non value sources aggregations

* Fix reported values source type for parent and children aggs

* Refactor SearchModule constructor

* Fix subtype in TTest and IPRanges

* Fix more subtypes in aggs that don't register themselves

* Fix doc tests

* Fix docs

* Fix ScriptedMetricAggregatorTests

* Fix compilation issues after merge

* Fix merge fallout

* This gets stale quickly...

* Address review comments

* Fix tests that were missing proper agg registration in the search module

* Fix ScriptedMetricAggregatorTests

* Address review comments

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-04-30 09:49:59 -04:00
David Turner
10ab397d7f
Adjust docs for voting config exclusions API (#55006)
In #50836 we deprecated the existing voting config exclusions API and added a
new one. This commit adjusts the docs to match.
2020-04-20 19:47:09 +01:00
James Rodewig
399bc86574
[DOCS] Document analysis/mapping response for cluster stats API (#55054)
PR #51260 moved usage counts about mapping field types and analysis to
the `_cluster/stats` API.

This documents those stats in the response section of the cluster stats
API docs.
2020-04-17 08:42:13 -04:00
Ioannis Kakavas
16e9433ead
Fix ReloadSecureSettings API to consume password (#54771)
The secure_settings_password was never taken into consideration in
the ReloadSecureSettings API. This commit fixes that and adds
necessary REST layer testing. Doing so, it also

- Allows TestClusters to have a password protected keystore
so that it can be set for tests.
- Adds a parameter to the run task so that elasticsearch can
be run with a password protected keystore from source.
2020-04-10 16:48:36 +03:00
Jason Tedor
a0cb977f23
Clarify available processors (#54907)
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the code and in the docs to adjust to the current terminology.
2020-04-10 08:38:00 -04:00
Vishal Patel
7003ac4ab6
[DOCS] Collapse nested objects in cluster reroute docs (#54851) 2020-04-09 14:56:37 -04:00
James Rodewig
0037334d23
[DOCS] Collapse nested objects in node stats API response (#54755)
Replaces dot notation with collapsed nested object formatting
per the [Elastic API reference template][0].

[0]:https://github.com/elastic/docs/blob/master/shared/api-ref-ex.asciidoc
2020-04-06 15:18:58 -04:00
James Rodewig
e4cb9cd737
[DOCS] Collapse nested objects in cluster stats API response (#54739)
Replaces dot notation with collapsed nested object formatting
per the [Elastic API reference template][0].

[0]:https://github.com/elastic/docs/blob/master/shared/api-ref-ex.asciidoc
2020-04-06 13:10:40 -04:00
Nhat Nguyen
61e5350e77
Support hierarchical task cancellation (#54757)
With this change, when a task is canceled, the task manager will cancel
not only its direct child tasks but also all of its descendant tasks.

Closes #50990
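
As a rough Python sketch of what cancelling all descendants means, the function below walks a parent-to-children mapping breadth-first. It is not the task manager's implementation; the task names and the `parent_of` mapping are hypothetical.

```python
from collections import defaultdict, deque


def tasks_to_cancel(parent_of, root):
    """Return root plus every descendant of root, in breadth-first order.

    parent_of maps each task id to its parent id (None for top-level tasks).
    """
    children = defaultdict(list)
    for task, parent in parent_of.items():
        if parent is not None:
            children[parent].append(task)

    cancelled = []
    queue = deque([root])
    while queue:
        task = queue.popleft()
        cancelled.append(task)
        queue.extend(children[task])  # grandchildren are reached via their parents
    return cancelled


parent_of = {
    "search": None,
    "shard-0": "search",
    "shard-1": "search",
    "fetch-0": "shard-0",  # grandchild of "search"
}
print(tasks_to_cancel(parent_of, "search"))
```

Cancelling only direct children would stop at `shard-0` and `shard-1` and leave `fetch-0` running.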
2020-04-06 12:00:02 -04:00
Nhat Nguyen
ee3d40320a
Broadcast cancellation to only nodes have outstanding child tasks (#54312)
Today when canceling a task we broadcast ban/unban requests to all nodes 
in the cluster. This strategy does not scale well for hierarchical
cancellation. With this change, we will track outstanding child requests
and broadcast the cancellation to only nodes that have outstanding child
tasks. This change also prevents a parent task from sending child
requests once it got canceled.
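
The tracking idea can be sketched in Python as below. This is an illustration under assumptions, not the actual implementation: the class and method names are hypothetical, and nodes are identified by plain strings.

```python
from collections import Counter


class ParentTaskTracker:
    """Tracks which nodes still hold outstanding child requests of one parent task."""

    def __init__(self):
        self._outstanding = Counter()
        self._cancelled = False

    def register_child(self, node):
        """Record a child request sent to node; refused once the parent is cancelled."""
        if self._cancelled:
            raise RuntimeError("parent task is cancelled")
        self._outstanding[node] += 1

    def child_completed(self, node):
        """Record that one child request on node has finished."""
        self._outstanding[node] -= 1
        if self._outstanding[node] <= 0:
            del self._outstanding[node]

    def cancel(self):
        """Mark the parent cancelled and return only the nodes that need notifying."""
        self._cancelled = True
        return sorted(self._outstanding)


tracker = ParentTaskTracker()
tracker.register_child("node-a")
tracker.register_child("node-b")
tracker.register_child("node-b")
tracker.child_completed("node-a")  # node-a has nothing outstanding any more
print(tracker.cancel())
```

Nodes with no outstanding child requests are skipped, so the number of ban requests scales with the task's footprint instead of the cluster size.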

Relates #50990
Supersedes #51157

Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2020-04-01 11:22:13 -04:00
James Rodewig
52e8f6db99
[DOCS] Document missing data types for node stats API's response parameters (#53475)
Documents missing data types for several response parameters returned
by the node stats API.

Also adds several missing human-readable parameters returned by the API.
2020-03-25 08:25:26 -04:00
Tim Brooks
8ccdaa3a35
Align remote info api with new settings (#53441)
Currently the remote info api has added a number of possible fields
(proxy, num_socket_connections, etc) that are available in proxy mode.
These fields are not aligned with what the settings are named. This
commit modifies this API to align with the settings.
2020-03-13 15:01:01 -06:00
James Rodewig
33d537fa36
[DOCS] Document nodes cluster stats (#52813)
Documents the `nodes` response parameters returned by the
`_cluster/stats` API.

Also adds collapsible attributes for the `indices` and `nodes`
sections.
2020-03-10 05:03:17 -04:00
Tim Brooks
abd8a36f9b
Add documentation for remote cluster proxy mode (#52779)
This is related to #49067.
2020-03-09 10:49:41 -06:00