Fixes an intro sentence for the Docker install instructions.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit 472a7d8e91)
Co-authored-by: Alexander Reelsen <alexander@reelsen.net>
When starting a trained model deployment, the user may set values for `inference_threads`
or `model_threads`. The former improves latency whereas the latter improves throughput.
It is easier to reason about how a model allocation uses resources if we ensure that only
one of the two may be greater than one. In addition, it allows us to distribute
the cores of the ML nodes in the cluster across the model allocations in the future.
This commit adds a validation that prevents both `inference_threads` and `model_threads`
from being greater than one.
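As an illustration only, the rule can be sketched in Python. The function name and error messages are hypothetical and are not taken from the Elasticsearch code base; only the two setting names come from the commit message.

```python
def validate_thread_settings(inference_threads: int = 1, model_threads: int = 1) -> None:
    """Hypothetical sketch: reject deployments where both settings exceed one."""
    if inference_threads < 1 or model_threads < 1:
        raise ValueError("inference_threads and model_threads must be at least 1")
    # Only one of the two settings may be greater than one.
    if inference_threads > 1 and model_threads > 1:
        raise ValueError(
            "either inference_threads or model_threads may be greater than 1, not both"
        )


# Scaling a single dimension is accepted.
validate_thread_settings(inference_threads=4, model_threads=1)
validate_thread_settings(inference_threads=1, model_threads=4)
```

Calling `validate_thread_settings(2, 2)` raises `ValueError` in this sketch.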
Today the note in the docs about S3-compatible repositories says that
the repo must behave correctly, but it's also important that it has the
same performance profile. This commit extends the docs to include this
info.
Throughput is measured as the number of inference requests
processed per minute. The node-level stats `peak_throughput_per_minute`,
`throughput_last_minute` and `average_inference_time_ms_last_minute` are
added, along with a deployment-level stat `peak_throughput_per_minute`, which
is the summed throughput of all nodes.
* add data tier configuration information for Cloud
* Move configuration docs and add headings
* update ILM tutorial image
* add save changes step
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
The use of `apt-key` is deprecated and will no longer be available after
Debian 11 and Ubuntu 22.04. This updates the installation instructions
for Debian-based distributions.
Closes #84644
This adds a new sampling aggregation that performs a background sampling over all documents in an index.
The syntax is as follows:
```
{
  "aggregations": {
    "sampling": {
      "random_sampler": {
        "probability": 0.1
      },
      "aggs": {
        "price_percentiles": {
          "percentiles": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}
```
This aggregation provides fast random sampling over the entire document set in order to speed up costly aggregations.
In tests over a variety of aggregations and data sets, the median speedup when sampling at `0.001` over millions of documents is around 70x.
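To give an intuition for the speed/accuracy trade-off, here is a minimal Python sketch of probability-based sampling. It is a simplification and not how Elasticsearch implements `random_sampler`: it keeps each value independently with the configured probability and scales a sum back up by `1 / probability`. The function name, the synthetic prices, and the seed are all assumptions made for the example.

```python
import random


def sampled_sum(values, probability, seed=42):
    """Estimate sum(values) from a random sample (illustrative sketch only)."""
    rng = random.Random(seed)
    # Keep each value independently with the given probability.
    sample = [v for v in values if rng.random() < probability]
    # Scale the sampled sum back up to approximate the full sum.
    return sum(sample) / probability


# Synthetic data standing in for a numeric field such as taxful_total_price.
prices = [float(i % 100 + 1) for i in range(100_000)]
exact = sum(prices)
estimate = sampled_sum(prices, probability=0.1)
relative_error = abs(estimate - exact) / exact
print(f"exact={exact:.0f} estimate={estimate:.0f} relative_error={relative_error:.4f}")
```

Only about a tenth of the values are visited, at the cost of a relative error that shrinks as the number of sampled values grows.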
The relative error rate depends on the size of the data and the kind of aggregation. Here are some typically expected numbers when sampling over tens of millions of documents. `p` is the configured probability and `n` is the number of documents matched by your provided filter query.
Fixes an error and test snippets for the sum aggregation example for histograms.
Closes #84491
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit fb45ac9dea)
Co-authored-by: Maja Grubic <maja.grubic@elastic.co>
This commit adds initial windowing support for text_classification tasks.
Specifically, a user can now specify a non-negative span to use as the tokenization windowing span when creating
sub-sequences.
The default value, `span: -1`, indicates that no windowing should take place.
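The following Python sketch shows one way windowing with a span could work. It assumes that the span is the number of tokens shared by consecutive sub-sequences and that, without windowing, the input is simply truncated; neither detail is stated in this commit message. The function name and the `max_len` parameter are hypothetical.

```python
def window_tokens(tokens, max_len, span=-1):
    """Split tokens into sub-sequences of at most max_len tokens.

    Assumption: `span` is the overlap (in tokens) between consecutive windows.
    span == -1 disables windowing; here that means plain truncation.
    """
    if span == -1:
        return [tokens[:max_len]]
    if span < 0 or span >= max_len:
        raise ValueError("span must be -1 or satisfy 0 <= span < max_len")
    stride = max_len - span  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


token_ids = list(range(10))
overlapping = window_tokens(token_ids, max_len=4, span=2)
truncated = window_tokens(token_ids, max_len=4)
```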
* Collapse more specialized sections around nested fields, unmapped fields, and
ignored values
* Move information on metadata fields to a 'note' and streamline it a bit
Closes #82983.
We know that we plan to remove direct access to system indices, but we aren't sure what major version that change will fall in. This updates the docs to avoid any confusion in the meantime.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit ffd21e5259)
Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
Documents the following:
* FWC for CCS within the same major version.
* A local cluster running the last minor of a major can search a remote cluster running any minor in the following major.
* Only features that exist across all searched clusters are supported.
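The first two bullets can be restated as a small Python check. This is only a sketch of the rules as listed here, not the full compatibility matrix; the `Version` type, the function name, and the `last_minor_of_local_major` parameter are hypothetical.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int


def can_search_remote(local: Version, remote: Version, last_minor_of_local_major: int) -> bool:
    """Hypothetical sketch of the documented CCS version rules."""
    # Rule 1: clusters within the same major version can be searched.
    if local.major == remote.major:
        return True
    # Rule 2: the last minor of a major can search any minor of the following major.
    if remote.major == local.major + 1:
        return local.minor == last_minor_of_local_major
    # Anything else is outside the rules listed above.
    return False


same_major = can_search_remote(Version(8, 1), Version(8, 5), last_minor_of_local_major=17)
last_minor = can_search_remote(Version(7, 17), Version(8, 3), last_minor_of_local_major=17)
older_minor = can_search_remote(Version(7, 16), Version(8, 0), last_minor_of_local_major=17)
```

The caller supplies the last minor explicitly because it cannot be derived from the version numbers alone.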
In 8.0.0, we introduce TLS autoconfiguration. We store the key and
certificate materials in password-protected PKCS#12 keystores, and
we store these passwords in the elasticsearch keystore.
This commit adds instructions on how to get hold of the passwords
for users to inspect or alter the PKCS#12 keystores.
There has been some confusion over the definition of a field type family. This
PR clarifies the definition in the docs: the two types should have the exact
same search behavior (including supporting the same queries/aggs, and producing
the same response). It's not sufficient for them to just support the same
search operations.
This change also fixes an inaccurate statement that there is only one field type
family so far.
In the intro, we mention that parts of the feature are still under development.
This is not very helpful information for users, and could give the wrong
impression about its maturity.
This commit adds an explanation for the relation between `allow_partial_search_results` and `skip_unavailable` in CCS requests.
Relates to #33915
Closes #82407
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
This PR introduces lookup runtime fields, which are used to retrieve
data from related indices. The search request below enriches its
search hits with the location of each IP address from the `ip_location`
index.
```
POST logs/_search
{
  "runtime_mappings": {
    "location": {
      "type": "lookup",
      "lookup_index": "ip_location",
      "query_type": "term",
      "query_input_field": "ip",
      "query_target_field": "_id",
      "fetch_fields": [
        "country",
        "city"
      ]
    }
  },
  "fields": [
    "timestamp",
    "message",
    "location"
  ]
}
```
Response:
```
{
  "hits": {
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "fields": {
          "location": [
            {
              "city": [ "Montreal" ],
              "country": [ "Canada" ]
            }
          ],
          "message": [ "the first message" ]
        }
      }
    ]
  }
}
```
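Conceptually, the lookup behaves like a per-hit term query against the lookup index. The Python sketch below emulates that with in-memory dictionaries; it is not the Elasticsearch implementation. The function name, the sample IP address, and the dictionary shapes are assumptions made for the example, and only the `term` lookup on `_id` is modelled.

```python
def enrich_hits(hits, lookup_docs, runtime_field, input_field, fetch_fields):
    """Emulate a term lookup on `_id`: lookup_docs maps document IDs to sources."""
    enriched = []
    for hit in hits:
        fields = dict(hit.get("fields", {}))
        # The value of the input field is matched against the lookup index `_id`.
        match = lookup_docs.get(hit["_source"].get(input_field))
        if match is not None:
            # Fetched values are returned as arrays, as in the response above.
            fields[runtime_field] = [
                {name: [match[name]] for name in fetch_fields if name in match}
            ]
        enriched.append({**hit, "fields": fields})
    return enriched


# Hypothetical sample data; the IP address is made up for the example.
ip_location = {"192.168.1.1": {"country": "Canada", "city": "Montreal"}}
logs = [{"_index": "logs", "_id": "1",
         "_source": {"ip": "192.168.1.1", "message": "the first message"},
         "fields": {"message": ["the first message"]}}]

result = enrich_hits(logs, ip_location, "location", "ip", ["country", "city"])
```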
The cluster allocation explain API includes a top-level status
indicating to the user whether the shard can be assigned/rebalanced/etc
or not. Today this status is fairly terse and experience shows that
users sometimes struggle to understand how to interpret it and to decide
on follow-up actions.
This commit makes the top-level explanation more detailed and
actionable. For instance, in cases like `THROTTLED` where the status
is transient we instruct the user to wait; if a shard is lost we say to
restore it from a snapshot; if a shard cannot be assigned we say to
choose a specific node where its assignment is expected and to address
the obstacles.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
* Add a note that the http_ca.crt certificate that is generated and
stored in config/certs can be used to configure any client to trust
the certificate that elasticsearch uses for TLS on the HTTP layer
* Add a note that the elasticsearch-create-enrollment-token CLI
tool can only be used with auto-configured TLS settings.
Clarifies that the `orientation` mapping parameter only applies to WKT polygons. GeoJSON polygons use a default orientation of `RIGHT`, regardless of the mapping parameter.
Also notes that the document-level `orientation` parameter overrides the default orientation for both WKT and GeoJSON polygons.
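The precedence described above can be sketched as a small Python function. The function and parameter names are hypothetical, and the sketch assumes that a WKT polygon with no mapping parameter set also falls back to `RIGHT`.

```python
def effective_orientation(shape_format, mapping_orientation=None, document_orientation=None):
    """Resolve which orientation applies to a polygon (illustrative sketch).

    shape_format is "wkt" or "geojson".
    """
    # A document-level orientation overrides everything, for both formats.
    if document_orientation is not None:
        return document_orientation.upper()
    # The mapping parameter only applies to WKT polygons.
    if shape_format == "wkt" and mapping_orientation is not None:
        return mapping_orientation.upper()
    # GeoJSON polygons (and, by assumption, unconfigured WKT) default to RIGHT.
    return "RIGHT"


geojson_default = effective_orientation("geojson", mapping_orientation="left")
wkt_mapped = effective_orientation("wkt", mapping_orientation="left")
doc_override = effective_orientation("geojson", mapping_orientation="right",
                                     document_orientation="left")
```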
Closes https://github.com/elastic/elasticsearch/issues/84009.
The current `ignore_unavailable` definition is a bit misleading. The parameter primarily determines if a request that targets a missing or closed index returns an error.