Commit graph

8658 commits

Author SHA1 Message Date
Przemysław Witek
38aa474dec
Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) 2020-07-01 13:29:56 +02:00
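For reference, the pseudo-Huber loss of a residual `r = actual - predicted` with threshold `delta` is `delta^2 * (sqrt(1 + (r / delta)^2) - 1)`. It is roughly quadratic for small residuals and roughly linear for large ones. The sketch below is a plain Python illustration, not the Elasticsearch implementation. Two details are assumptions: that the metric averages over documents, and that `delta` defaults to 1.0.

```python
import math

def pseudo_huber(actual, predicted, delta=1.0):
    """Mean pseudo-Huber loss over paired values (illustrative sketch)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        scaled = (a - p) / delta
        # With r = a - p: about r^2 / 2 for small r, about delta * |r| for large r.
        total += delta * delta * (math.sqrt(1.0 + scaled * scaled) - 1.0)
    return total / len(actual)
```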
Russ Cam
39c0083eee
Update link to .NET BulkAllObservable 2020-07-01 19:36:25 +10:00
David Turner
83d6589b2a
Account for remaining recovery in disk allocator (#58029)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery, since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
2020-07-01 08:04:45 +01:00
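Here is a toy model of the difference. It assumes, as the message implies, that the node's reported free space already reflects whatever the recovering shard has written so far. Under that assumption, subtracting the full expected shard size counts those bytes twice, while subtracting only the remaining growth does not. The names `IncomingShard`, `expected_size` and `bytes_on_disk` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IncomingShard:
    expected_size: int   # estimated final size of the shard, in bytes
    bytes_on_disk: int   # bytes already written by the ongoing recovery

    @property
    def remaining_growth(self) -> int:
        # How much larger the store is still expected to grow.
        return max(0, self.expected_size - self.bytes_on_disk)

def free_after_incoming_old(free_bytes: int, incoming: list) -> int:
    # Previous behaviour: reserve the whole estimated size of every incoming shard.
    return free_bytes - sum(s.expected_size for s in incoming)

def free_after_incoming_new(free_bytes: int, incoming: list) -> int:
    # New behaviour: reserve only what each recovery still has to write.
    return free_bytes - sum(s.remaining_growth for s in incoming)
```

Consider a node with 100 GB free and one incoming 50 GB shard that has already written 45 GB. The old estimate leaves 50 GB free, while the new one leaves 95 GB.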
James Rodewig
483bab2281
[DOCS] Add data streams to API conventions (#58695)
Updates the existing API conventions docs to make them aware of data
streams.

Co-authored-by: debadair <debadair@elastic.co>
2020-06-30 17:06:17 -04:00
James Rodewig
c7ca1d5941 [DOCS] Make <target> defs consistent 2020-06-30 15:53:32 -04:00
Nik Everett
32bdf8549b
Fail variable_width_histogram that collects from many (#58619)
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.

Relates to #42035
2020-06-30 15:42:46 -04:00
James Rodewig
c9fc9c9d21
[DOCS] Clarify request formats for index API (#58768) 2020-06-30 15:09:26 -04:00
James Rodewig
b292459ab1
[DOCS] Add data streams to cat APIs (#58699) 2020-06-30 15:06:51 -04:00
James Rodewig
3d77914db7
[DOCS] Add data streams to count API (#58771) 2020-06-30 15:01:37 -04:00
James Rodewig
0edeb97206
[DOCS] Add data streams to get field mapping API docs (#58689)
Updates the existing get field mapping API docs to make them aware of
data streams. Relates to #58488.
2020-06-30 11:58:30 -04:00
Lee Hinman
3b68df2355
Add default composable templates for new indexing strategy (#57629)
This commit adds the component and composable templates, as well as ILM policies, for the new
default indexing strategy. It installs:

- logs-default-mappings (component)
- logs-default-settings (component)
- logs-default-policy (ilm policy)
- logs-default-template (composable template)
- metrics-default-mappings (component)
- metrics-default-settings (component)
- metrics-default-policy (ilm policy)
- metrics-default-template (composable template)

These templates and policies are managed by a new x-pack module, `stack`, and can be disabled by
setting `stack.templates.enabled` to `false`.

These ensure that patterns for the `logs-*-*` and `metrics-*-*` indices are set up to create data
streams with the proper mappings and settings.

This also makes changes to the `IndexTemplateRegistry` to support installing component and
composable templates (previously it supported only legacy templates).

Resolves #56709
2020-06-30 09:19:37 -06:00
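To illustrate the naming scheme, the sketch below checks which of the two default composable templates a data stream name would match. Two things here are assumptions. First, it assumes `logs-default-template` and `metrics-default-template` use exactly the `logs-*-*` and `metrics-*-*` patterns. Second, `fnmatchcase` is only a stand-in for Elasticsearch's own wildcard matching; it also understands `?` and `[...]`, which these patterns do not use.

```python
from fnmatch import fnmatchcase

# Assumed index patterns for the two default composable templates.
DEFAULT_TEMPLATE_PATTERNS = {
    "logs-default-template": "logs-*-*",
    "metrics-default-template": "metrics-*-*",
}

def matching_templates(name: str) -> list:
    """Return the default templates whose pattern matches `name`."""
    return [
        template
        for template, pattern in DEFAULT_TEMPLATE_PATTERNS.items()
        if fnmatchcase(name, pattern)
    ]
```

For example, `logs-nginx-default` matches the logs template. `logs-nginx`, which has only one dash-separated part after the prefix, matches neither.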
James Rodewig
31b89ac083
[DOCS] Fix error in stop SLM API docs (#58747) 2020-06-30 09:55:59 -04:00
James Rodewig
66bcc556ee [DOCS] Reword admon for index API and data streams 2020-06-30 09:52:03 -04:00
James Rodewig
f18e136400 [DOCS] Fix xref format in async EQL search docs 2020-06-30 09:36:08 -04:00
James Rodewig
682615a15e
[DOCS] Suppress searchable snapshots in releases (#58740) (#58743)
Fixes a searchable snapshot reference overlooked in #58652
2020-06-30 09:22:40 -04:00
James Rodewig
cc3bd3974f
[DOCS] EQL: Document head and tail pipes (#58673) 2020-06-30 08:35:37 -04:00
David Turner
f52f5c1f02
Suppress searchable snapshots docs in releases (#58652)
This commit adds conditional logic to the docs to avoid including any
docs on searchable snapshots in released versions.

Rework of #58556 which was reverted.
2020-06-30 12:24:35 +01:00
Yannick Welsch
118521d022
Account for recovery throttling when restoring snapshot (#58658)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting now also defaults to unlimited, whereas previously it defaulted to `40mb`, which is
the current default of `indices.recovery.max_bytes_per_sec`. This means that clusters that left the recovery and restore
settings at their defaults will observe no change in behavior.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-06-30 13:08:21 +02:00
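One way to read the resulting behaviour is that a restore is now subject to both limits, so the tighter one wins. The sketch models an unlimited setting as `None`, which is a convention of this sketch and not how the settings are actually expressed. It shows why keeping both defaults leaves the effective restore rate at 40 MB/s.

```python
from typing import Optional

MB = 1024 * 1024

def effective_restore_rate(
    max_restore_bytes_per_sec: Optional[int],
    recovery_max_bytes_per_sec: Optional[int],
) -> Optional[int]:
    """Return the effective restore throttle in bytes/sec, or None if unlimited."""
    limits = [
        limit
        for limit in (max_restore_bytes_per_sec, recovery_max_bytes_per_sec)
        if limit is not None
    ]
    # Both throttles apply to a restore, so the smaller limit is the effective one.
    return min(limits) if limits else None
```

With the new defaults (unlimited for the repository setting, 40 MB/s for recovery), the result is 40 MB/s. That matches the old repository default.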
Przemysław Witek
dfa06240fc
Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) 2020-06-30 13:06:15 +02:00
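MSLE compares values on a logarithmic scale, so it penalises relative rather than absolute errors: `mean((log(actual + offset) - log(predicted + offset))^2)`. The `offset` parameter is an assumption about the metric's inputs. Here it defaults to 1, which keeps `log` defined at zero. This is an illustrative Python sketch, not the Elasticsearch code.

```python
import math

def msle(actual, predicted, offset=1.0):
    """Mean squared logarithmic error over paired values (illustrative sketch)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a + offset <= 0 or p + offset <= 0:
            raise ValueError("values plus offset must be positive")
        diff = math.log(a + offset) - math.log(p + offset)
        total += diff * diff
    return total / len(actual)
```

Because only ratios matter, predicting 9 for an actual 99 costs the same as predicting 0 for an actual 9.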
Yannick Welsch
5e345e115b
Add index block api (#58094)
Adds an API for putting an index block in place. For write blocks, it also ensures that once the API returns successfully,
all shards of the index are properly accounting for the block; for example, all in-flight writes to the index have
completed.

This API makes it possible to coordinate more complex workflows in which it is crucial that an index no longer receives
writes after the API completes, for example when marking an index as read-only during an upgrade in order to reindex its
documents.
2020-06-30 09:33:15 +02:00
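For orientation, here is a hypothetical helper that builds the request for this API. The `PUT /<index>/_block/<block>` path and the `metadata`, `read`, `read_only` and `write` block names follow the API as it appears in later Elasticsearch reference docs; verify them against your version. The helper itself and the index name `my-index` are illustrative.

```python
from urllib.parse import quote

# Block names accepted by the add index block API (per later reference docs).
INDEX_BLOCKS = {"metadata", "read", "read_only", "write"}

def add_index_block_request(index: str, block: str) -> tuple:
    """Return the (method, path) pair for adding `block` to `index`."""
    if block not in INDEX_BLOCKS:
        raise ValueError(f"unknown index block: {block!r}")
    # Keep ',' and '*' unescaped so multi-target expressions still work.
    return "PUT", f"/{quote(index, safe=',*')}/_block/{block}"
```

In the upgrade workflow above, a client would send the request from `add_index_block_request("my-index", "write")` and start reindexing only after the call returns.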
James Rodewig
55e2ec6248
[DOCS] Document delete/update by query for data streams (#58679) 2020-06-29 16:31:37 -04:00
Adam Locke
23abe8ec44
[DOCS] Adding create index snapshot API docs (#58519)
* Adding create index snapshot API page.
* Condense API description.
* Remove parameter from query.
* Add POST method and remove `-name` from the snapshot variable.
* Expand description of `<snapshot>`.
* Add data streams to introduction and expand the overall description.
* Add support for data streams.
* Add support for data streams.
* Add data stream and reference for "point-in-time view".
* Add data streams.
* Change `my_backup` to `my_repository`.
* Add description of boolean options for `wait_for_completion` parameter.
* Change command --> response
* Clarify `indices` parameter description
* Update `ignore-unavailable` parameter description
* Reword example description
* Remove "index" from API name
* Incorporating review comments from James R.
* Adding a much better request + response
* Clarify `include_global_state` description
* Incorporating additional edits.
* Changing my_backup to my_repository in example.
* Update snippet test to avoid failures
* Update TESTRESPONSE snippets
* Remove errant space
* Removing the  parameter per reviewer comments
2020-06-29 14:53:30 -04:00
weizijun
974f6e66b6
LLRC RequestOptions add RequestConfig (#57972)
Different kinds of requests may need request options that differ from the client
defaults. Users can optionally set a RequestConfig on a single request's
RequestOptions to override the default. Without this, socketTimeout can only
be set at RestClient initialization.
2020-06-29 11:19:04 -04:00
James Rodewig
29da275b0a
[DOCS] EQL: Remove fields from EQL search response (#58667) 2020-06-29 09:19:07 -04:00
István Zoltán Szabó
d0042fb791
[DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 11:28:17 +02:00
David Turner
01b666bbdc Revert "Suppress searchable snapshots docs in releases (#58556)"
This reverts commit e5c3e5625c.
2020-06-29 09:27:54 +01:00
David Turner
e5c3e5625c
Suppress searchable snapshots docs in releases (#58556)
This commit adds conditional logic to the docs to avoid including any
docs on searchable snapshots in released versions.
2020-06-29 08:33:49 +01:00
Przemysław Witek
3953de4c98
Introduce DataFrameAnalyticsConfig update API (#58302) 2020-06-29 09:26:31 +02:00
Yang Wang
38185e5da0
Add cache for application privileges (#55836)
Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors.

Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied everywhere privileges are retrieved, including the "get privileges" and "has privileges" APIs and CompositeRolesStore (for authentication).
2020-06-29 13:59:00 +10:00
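Here is a minimal sketch of the caching pattern. It assumes a loader that returns every privilege document for one application, as `NativePrivilegeStore` now does. The class, the `fetch_all` callable and the `invalidate` hook are hypothetical; the commit does not describe how the real cache is invalidated.

```python
from typing import Callable, Dict, List, Optional

class ApplicationPrivilegeCache:
    """Caches all privilege documents per application (illustrative sketch)."""

    def __init__(self, fetch_all: Callable[[str], List[dict]]):
        self._fetch_all = fetch_all      # stands in for one security-index round-trip
        self._cache: Dict[str, List[dict]] = {}

    def get(self, application: str, names: Optional[set] = None) -> List[dict]:
        if application not in self._cache:
            # Fetch every document for the application, not only the requested
            # names, so later lookups for other names are served from memory.
            self._cache[application] = self._fetch_all(application)
        docs = self._cache[application]
        return [d for d in docs if names is None or d["name"] in names]

    def invalidate(self, application: Optional[str] = None) -> None:
        # Hypothetical hook: drop one application, or everything.
        if application is None:
            self._cache.clear()
        else:
            self._cache.pop(application, None)
```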
Dimitris Athanasiou
96853df6af
[ML] Rename increased_memory_estimate_bytes (#58614)
... to memory_reestimate_bytes in DF Analytics
memory usage.

Relates #58588
2020-06-27 12:04:39 +03:00
Costin Leau
d6731d659d Update JSON results in EQL docs 2020-06-27 09:45:50 +03:00
Costin Leau
4521ca3367
EQL: Add Head/Tail pipe support (#58536)
Introduce pipe support, in particular head and tail
(which can also be chained).
2020-06-27 09:08:03 +03:00
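The semantics of the two pipes can be sketched over an already-ordered list of events. `head N` keeps the first N results and `tail N` keeps the last N. Chaining applies them left to right, as in a query such as `process where true | head 5 | tail 2`. The Python functions are illustrative and are not the EQL implementation.

```python
def head(events: list, n: int) -> list:
    # First n events, in order.
    return events[:max(n, 0)]

def tail(events: list, n: int) -> list:
    # Last n events; guard n <= 0 because events[-0:] would return everything.
    return events[-n:] if n > 0 else []

PIPES = {"head": head, "tail": tail}

def apply_pipes(events: list, pipes: list) -> list:
    """Apply (name, n) pipes left to right, e.g. [("head", 5), ("tail", 2)]."""
    for name, n in pipes:
        events = PIPES[name](events, n)
    return events
```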
James Rodewig
a7aa3da3bf
[DOCS] Add data streams to multi search API docs (#58610)
Makes the existing multi search API docs aware of data streams.
2020-06-26 17:06:58 -04:00
James Rodewig
926e9aff52
[DOCS] Document open requests for data streams (#58615)
Adds an open API example to the data streams docs. Also updates the
existing open API docs to make them aware of data streams.
2020-06-26 16:28:26 -04:00
James Rodewig
9f86ce6c0e
[DOCS] Remove composable index template refs (#58567)
Replaces `composable index template` and `composable template` with
`index template` throughout data stream-related docs.

`Composable index template` is only used to contrast with legacy index
templates.
2020-06-26 11:12:36 -04:00
James Rodewig
d14b7d5399
[DOCS] EQL: Remove references to partial async EQL results (#58548)
Removes references to partial results from the async EQL search docs.
If an EQL search does not complete during the `wait_for_completion_timeout`
timeout period, it returns no results.
2020-06-26 10:27:30 -04:00
James Rodewig
05da3e0e48
[DOCS] Fix analyzer page titles (#58362)
Changes the titles for analyzer pages to sentence case.

Also changes the 'Pattern character filter' page title to sentence case.
2020-06-26 09:30:37 -04:00
Dimitris Athanasiou
0994005c2e
[ML] Add status and increased estimate to memory usage (#58588)
Adds parsing of `status` and `increased_memory_estimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`increased_memory_estimate_bytes` can be used to update the job's
limit in order to restart the job.
2020-06-26 16:10:14 +03:00
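The following Python sketch shows how a client might act on the new fields. If `status` is `hard_limit`, the client takes the estimate as the new model memory limit before restarting the job. The helper, the dict shape it expects, and the choice never to lower the limit are all hypothetical. Only the field names and the `hard_limit` value come from the commit. The estimate key is a parameter because #58614 later renames it to `memory_reestimate_bytes`.

```python
def next_model_memory_limit(memory_usage: dict, current_limit_bytes: int,
                            estimate_key: str = "increased_memory_estimate_bytes") -> int:
    """Suggest a model memory limit for restarting a data frame analytics job."""
    if memory_usage.get("status") == "hard_limit":
        estimate = memory_usage.get(estimate_key)
        if estimate is not None:
            # Sketch choice: never lower the limit, take the larger value.
            return max(current_limit_bytes, estimate)
    return current_limit_bytes
```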
James Rodewig
b2b3599012
[DOCS] Fix tokenizer page titles (#58361)
Changes the titles for tokenizer pages to sentence case.

Also moves the 'Path hierarchy tokenizer examples' page within the
'Path hierarchy tokenizer' page and adds a related redirect.
2020-06-26 09:08:44 -04:00
Bogdan Pintea
94eb5a05e7
SQL: fix handling of escaped chars in JDBC connection string (#58429)
* Fix: preserve URI query and fragment char escaping

This commit fixes an issue that arises when the connection string URI
contains escaped characters.

The original URI is pre-parsed in order to re-assemble a new URI with
the optional elements filled in with defaults. However, the new URI was
built from the unescaped query and fragment parts. So if these
contained any escaped `&` or `=` (such as in the password option value),
the unescaping would expose them, and they would then interfere with the
options parsing.

This commit changes that so the new URI is built from the still-escaped
"raw" parts of the original URI.
2020-06-26 10:19:44 +02:00
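The failure mode can be reproduced with Python's `urllib.parse`, used here only as a stand-in for the driver's Java URI handling. `urlsplit(...).query` returns the raw, still-escaped query, much like `java.net.URI.getRawQuery()`. Decoding that query before splitting on `&` and `=` turns an escaped password into extra options. The URL and password are made up.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

# Password "p&ss=word", percent-encoded in the connection string.
URL = "http://localhost:9200/?user=elastic&password=p%26ss%3Dword"

def options_decoded_first(url: str) -> dict:
    # Buggy order: unescape the whole query, then split it into options.
    return dict(parse_qsl(unquote(urlsplit(url).query)))

def options_from_raw(url: str) -> dict:
    # Fixed order: split the raw query, then unescape each key and value.
    return dict(parse_qsl(urlsplit(url).query))
```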
Przemyslaw Gomulka
ed43839a60
Update format.asciidoc to describe strict_date_optional_time_nanos (#57527)
closes #57019
2020-06-26 08:29:52 +02:00
Nik Everett
dda78ff760
Docs: Mark variable_width_histogram experimental (#58574)
We're tracking this aggregation's experimental progress in #58573. We'd
like a little time to be able to make backwards incompatible changes to
the aggregation because we're not 100% sure about the request and
response format yet.
2020-06-25 16:54:37 -04:00
James Rodewig
662cf81bbc
[DOCS] Fix EQL search snippet for tiebreaker example (#58545) 2020-06-25 09:23:50 -04:00
James Rodewig
07874ec357
[DOCS] EQL: Document search API's tiebreaker_field param (#57935) 2020-06-25 08:44:34 -04:00
James Rodewig
e33a0dfe77
[DOCS] Note that DS timestamp field mapping changes require reindex (#58444)
With #58096, data streams now track the timestamp field mapping outside
of the template associated with the stream. This means you can no longer
update the timestamp field mapping using template changes.

This updates the associated data stream docs.
2020-06-24 17:00:09 -04:00
Jason Tedor
a914d84429
Introduce node.roles setting (#54998)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

if they want to be explicit. This is also challenging in cases where a
user wants to configure a coordinating-only node, which requires
disabling all roles. Because we keep adding to that list of roles, the user
has to keep checking whether a node has acquired any new roles.

This commit addresses this by adding a list setting, node.roles, that gives
a user explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-24 14:46:31 -04:00
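Here is a hypothetical migration helper. It shows how the explicit legacy settings from the master-only example collapse into a single `node.roles` list. It deliberately treats any legacy flag that is not set as `false`, so it only matches real behaviour when every flag is spelled out. Real defaults, and which roles exist at all (roles are pluggable), are not modelled.

```python
# Role names behind the deprecated node.* boolean settings in the example above.
LEGACY_ROLE_FLAGS = (
    "master", "data", "ingest", "ml",
    "remote_cluster_client", "transform", "voting_only",
)

def to_node_roles(legacy_settings: dict) -> list:
    """Translate explicit node.<role>: true/false flags into a node.roles list.

    Assumption: a flag that is absent counts as false (no defaults are inferred).
    """
    return [
        role for role in LEGACY_ROLE_FLAGS
        if legacy_settings.get(f"node.{role}", False)
    ]
```

For the explicit master-only example, this yields `['master']`, i.e. `node.roles: [master]`. With every flag set to false, it yields `[]`, the coordinating-only case.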
Russ Cam
e54402526c
[DOCS] Update aliases to indicate array (#58469)
Updates the aliases documentation
to correct the parameter to an array.
2020-06-24 09:38:53 -04:00
markharwood
cdc1be144b
Field capabilities - make keyword a family of field types (#58315)
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.

Relates to #53175
2020-06-24 11:37:16 +01:00
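Here is a small Python model of the idea. Each field type reports a family name that defaults to its own type name. The `wildcard` and `constant_keyword` types override this to report `keyword`, so field capabilities can group them together. The class and method names are illustrative and are not the Java `MappedFieldType` API.

```python
class FieldType:
    def __init__(self, type_name: str):
        self.type_name = type_name

    def family_type_name(self) -> str:
        # Default: a type is its own family.
        return self.type_name

class KeywordFamilyFieldType(FieldType):
    def family_type_name(self) -> str:
        # Used for wildcard and constant_keyword: reported as keyword.
        return "keyword"

def field_caps_types(fields: dict) -> dict:
    """Map field name -> type reported by field capabilities."""
    return {name: ft.family_type_name() for name, ft in fields.items()}
```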
James Dorfman
e99d287fbb
Add Variable Width Histogram Aggregation (#42035)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increase the accuracy of the clustering.

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue. 

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
2020-06-23 09:26:54 -04:00
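Here is a simplified Python sketch of the reduce phase described above; the shard-level one-pass clustering is not modelled. Buckets from all shards are sorted by centroid. The pair with the closest centroids is merged until the target count is reached, and then the overlap heuristic moves the larger bucket's min. Weighting the merged centroid by doc count is an assumption of this sketch. Because sorting by centroid places the globally closest pair next to each other, checking adjacent pairs is enough.

```python
from dataclasses import dataclass, replace

@dataclass
class Bucket:
    min: float
    max: float
    centroid: float
    doc_count: int

def merge(a: Bucket, b: Bucket) -> Bucket:
    total = a.doc_count + b.doc_count
    return Bucket(
        min=min(a.min, b.min),
        max=max(a.max, b.max),
        # Doc-count-weighted centroid (assumption of this sketch).
        centroid=(a.centroid * a.doc_count + b.centroid * b.doc_count) / total,
        doc_count=total,
    )

def reduce_buckets(buckets: list, target: int) -> list:
    """Merge shard buckets down to `target` buckets, then fix overlapping keys."""
    if target < 1:
        raise ValueError("target must be at least 1")
    # Copy so the caller's buckets are not mutated; order by centroid.
    result = sorted((replace(b) for b in buckets), key=lambda b: b.centroid)
    while len(result) > target:
        # The closest pair of centroids is always adjacent in centroid order.
        i = min(range(len(result) - 1),
                key=lambda j: result[j + 1].centroid - result[j].centroid)
        result[i:i + 2] = [merge(result[i], result[i + 1])]
    # Overlap heuristic: min[large] = (min[large] + max[small]) / 2.
    for small, large in zip(result, result[1:]):
        if large.min < small.max:
            large.min = (large.min + small.max) / 2
    return result
```

Each returned bucket's `min` then serves as its key in the histogram.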
James Rodewig
48f4a8db0d
[DOCS] Add data streams to bulk, delete, and index API docs (#58340)
Updates existing docs for the bulk, delete and index APIs to make them
aware of data streams.
2020-06-23 09:18:28 -04:00