* [DOCS] Warn only one date format is added to the field date formats
When `dynamic_date_formats` lists multiple formats, a dynamically mapped date field receives only one of them: the format that matches the date in the first document indexed with that field.
For example:
```
PUT my-index-000001
{
  "mappings": {
    "dynamic_date_formats": [ "yyyy/MM", "MM/dd/yyyy"]
  }
}

PUT my-index-000001/_doc/1
{
  "create_date": "09/25/2015"
}
```
The generated mappings will be:
```
"mappings": {
  "dynamic_date_formats": [
    "yyyy/MM",
    "MM/dd/yyyy"
  ],
  "properties": {
    "create_date": {
      "type": "date",
      "format": "MM/dd/yyyy"
    }
  }
},
```
If the first document had contained `2015/12` instead, the `format` of `create_date` would be `"yyyy/MM"`.
This can be misleading, especially for users who expect multiple date formats to apply to the same field.
The first document determines the format of the detected `date` field.
Maybe we should provide an additional example, such as:
```
PUT my-index-000001
{
  "mappings": {
    "dynamic_date_formats": [ "yyyy/MM||MM/dd/yyyy"]
  }
}
```
My wording is not great, so feel free to amend/edit.
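The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Elasticsearch's implementation: the `PATTERNS` table is an assumed translation of the two Java-style patterns into `strptime` patterns, and `detect_format` and `accepts` are hypothetical helper names.

```python
from datetime import datetime

# Assumed translation of the two Java-style patterns used above into
# Python strptime patterns; for illustration only.
PATTERNS = {
    "yyyy/MM": "%Y/%m",
    "MM/dd/yyyy": "%m/%d/%Y",
}


def matches(value, java_pattern):
    """Return True if value parses with the given single pattern."""
    try:
        datetime.strptime(value, PATTERNS[java_pattern])
        return True
    except ValueError:
        return False


def detect_format(value, dynamic_date_formats):
    """Return the first configured entry that matches value, else None.

    An entry may combine several patterns with "||".
    """
    for entry in dynamic_date_formats:
        if any(matches(value, p) for p in entry.split("||")):
            return entry
    return None


def accepts(value, field_format):
    """Check a later value against the format already stored in the mapping."""
    return any(matches(value, p) for p in field_format.split("||"))


# Two separate entries: the first document fixes a single format.
separate = detect_format("09/25/2015", ["yyyy/MM", "MM/dd/yyyy"])
# One combined entry: the field keeps both patterns.
combined = detect_format("09/25/2015", ["yyyy/MM||MM/dd/yyyy"])
```

With separate entries, `accepts("2015/12", separate)` is false because the field only kept `MM/dd/yyyy`. With the combined entry, `accepts("2015/12", combined)` is true.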
* Update docs/reference/mapping/dynamic/field-mapping.asciidoc
Reword and add code example
* Turned discussion of the two syntaxes into an admonition
* Fix failing tests
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
The docs for `transport.ping_schedule` note that the transport client
defaults to a 5s ping schedule, but this is no longer relevant. This
commit drops this from the docs, and also moves the docs for this
setting further down the page to reflect its relative unimportance.
We encountered a case where a substantial fraction of the heap usage was
due to per-segment-per-field `FieldInfo` objects, particularly
`FieldInfo#name`. This commit adds a note to the sizing docs about this
overhead.
Today we say that voting-only nodes require a "low-latency" network.
This term has a specific meaning in some operating environments which is
different from our intended meaning. To avoid this confusion this commit
removes the absolute term "low-latency" in favour of describing the
requirements relative to the user's own performance goals.
This adds a new `_ml/trained_models/<model_id>/deployment/cache/_clear` API. This will clear the inference cache on every node where the model is allocated.
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field
caps API per field.
Especially for metrics fields, it must be clear which fields are metrics and if they belong
to only time-series indexes or mixed time-series and non-time-series indexes.
Metric fields must also be distinguishable by the kind of index they belong to:
- Standard (non-time-series) indexes
- Time series indexes
- Downsampled time series indexes
This PR modifies the field caps API so that the mapping parameters `time_series_dimension`
and `time_series_metric` are presented only when they are set on fields of time-series indexes.
Those parameters are completely ignored when they are set on standard (non-time-series) indexes.
This PR revisits some of the conventions adopted by #78790
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns
Relates #77065
Co-authored-by: David Roberts <dave.roberts@elastic.co>
This change deprecates the kNN search API in favor of the new 'knn' option
inside the search API. The 'knn' option is now the preferred way of performing
kNN search.
Relates to #87625
Introduced in: #88439
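As a rough illustration of the `knn` option mentioned above, the sketch below builds a search request body in Python. The helper name `knn_search_body`, the field name `image-vector`, and the vector values are hypothetical; only the shape of the `knn` object is meant to reflect the search API.

```python
import json


def knn_search_body(field, query_vector, k, num_candidates):
    """Build a search request body that uses the top-level `knn` option."""
    return {
        "knn": {
            "field": field,                      # dense_vector field to search
            "query_vector": list(query_vector),  # vector to compare against
            "k": k,                              # number of nearest neighbours to return
            "num_candidates": num_candidates,    # candidates considered per shard
        }
    }


# Hypothetical field name and vector, for illustration only.
body = knn_search_body("image-vector", [0.1, 0.2, 0.3], k=10, num_candidates=100)
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the regular `_search` endpoint of an index rather than to the deprecated kNN search API.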
* [ML] add text_similarity nlp task documentation
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/ml-shared.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Clean up network setting docs
- Add types for all params
- Remove mention of JDKs before 11
- Clarify some wording
Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:
```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster" : "other-cluster",
  "leader_index_patterns" : ["logs-*"],
  "follow_index_pattern" : "{{leader_index}}_copy"
}
```
And then the data stream `logs-mysql-error` was created, creating the backing index
`.ds-logs-mysql-error-2022-07-29-000001`.
Prior to this commit, replicating this data stream meant that the backing index was renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` and the data stream was *not* renamed. This
caused a check to trip in `TransportPutLifecycleAction` asserting that a backing index was not
renamed for a data stream during following.
After this commit, there are a couple of changes:
First, the data stream will also be renamed. This means that `logs-mysql-error` becomes
`logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: `data_stream_name`. It works like this:
```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```
Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of a data stream backing pattern. Previously, the index
`.ds-logs-mysql-error-2022-07-29-000001` would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name since it doesn't end with the
rollover digit). After this commit it will be renamed to
`.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream. This means that for
the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:
| Leader Cluster | Follower Cluster |
|--------------|-----------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001` | `.ds-logs-mysql-error_copy-2022-07-29-000001` |
Internally, this means the auto-follow request is turned into the following create follower request:
```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```
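The renaming rule described above can be sketched in Python as follows. This is not the code from the commit: the regular expression is an assumption based on the backing index names shown in this message, and `apply_pattern` and `follower_names` are hypothetical helper names.

```python
import re

# Assumed shape of a backing index name, following the examples above:
# ".ds-<data stream>-<date>-<six-digit generation>".
BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+)-(?P<date>\d{4}-\d{2}-\d{2})-(?P<gen>\d{6})$"
)


def apply_pattern(follow_index_pattern, name):
    """Substitute a name into the {{leader_index}} placeholder."""
    return follow_index_pattern.replace("{{leader_index}}", name)


def follower_names(follow_index_pattern, leader_index):
    """Return (follower index name, follower data stream name or None)."""
    m = BACKING_INDEX.match(leader_index)
    if m is None:
        # Regular index: the pattern is applied to the whole index name.
        return apply_pattern(follow_index_pattern, leader_index), None
    # Backing index: apply the pattern to the data stream name only, so the
    # result still ends with the date and generation.
    stream = apply_pattern(follow_index_pattern, m.group("stream"))
    return f".ds-{stream}-{m.group('date')}-{m.group('gen')}", stream


index, stream = follower_names(
    "{{leader_index}}_copy", ".ds-logs-mysql-error-2022-07-29-000001"
)
```

Here `index` and `stream` correspond to the follower index name and the `data_stream_name` value in the request above.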
Relates to https://github.com/elastic/elasticsearch/pull/84940 (cherry-picked the commit for a test)
Relates to https://github.com/elastic/elasticsearch/pull/61993 (where data stream support was first introduced for CCR)
Resolves https://github.com/elastic/elasticsearch/issues/81751
DiscoveryPlugin allows extending getJoinValidator and
getElectionStrategies. These are implementation details of the system.
This commit deprecates these methods so that plugin authors are
discouraged from overriding them.
Network plugins provide network implementations. In the past this has
been used for alternatives to netty based networking, using the JDK's
nio. However, nio has now been removed, and it is inadvisable for a
plugin to implement this low level part of the system.
Therefore, this commit marks the NetworkPlugin interface as deprecated.
Adds some docs giving more detailed background about what data
corruption really means and some suggestions about how to narrow down
the root cause.
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
The inference node stats for deployed PyTorch inference
models now contain two new fields: `inference_cache_hit_count`
and `inference_cache_hit_count_last_minute`.
These indicate how many inferences on that node were served
from the C++-side response cache that was added in
https://github.com/elastic/ml-cpp/pull/2305. Cache hits
occur when exactly the same inference request is sent to the
same node more than once.
The `average_inference_time_ms` and
`average_inference_time_ms_last_minute` fields now refer to
the time taken to do the cache lookup, plus, if necessary,
the time to do the inference. We would expect average inference
time to be vastly reduced in situations where the cache hit
rate is high.
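A back-of-the-envelope model of this effect, as a sketch only: every request pays the cache lookup, and only cache misses pay for the inference itself. The function name and the numbers used below are hypothetical, not measurements.

```python
def average_inference_time_ms(lookup_ms, inference_ms, hit_rate):
    """Expected time per request under a simple hit/miss model.

    Every request pays the lookup; only misses also pay for the inference.
    """
    miss_rate = 1.0 - hit_rate
    return lookup_ms + miss_rate * inference_ms


# Hypothetical figures: 0.5 ms lookup, 100 ms inference.
no_cache_hits = average_inference_time_ms(0.5, 100.0, hit_rate=0.0)
mostly_hits = average_inference_time_ms(0.5, 100.0, hit_rate=0.75)
```

Under these assumed figures the average drops from 100.5 ms with no hits to 25.5 ms at a 75% hit rate.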
This change adds support for kNN vector fields to the `_disk_usage` API. The
strategy:
* Iterate the vector values (using the same strategy as for doc values) to
estimate the vector data size
* Run some random vector searches to estimate the vector index size
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Closes #84801
This change ensures that existing read_only_allow_delete blocks that
are placed on indices when the flood_stage watermark threshold is
exceeded, are removed when the disk threshold monitoring is disabled.
This is done by changing how InternalClusterInfoService behaves when
disabled. With this change, it will keep calling the registered
listeners periodically, but with an empty ClusterInfo.
Closes #86383
Add the `dry_run` query parameter to support simulating an update of desired nodes. The update request will be validated, but no cluster state updates will be performed. To indicate that the response was the result of a dry run, we add the `dry_run` field to the JSON representation of the response.
See #82975
This PR adds a user action to the SLM health indicator, which checks each SLM policy's "invocations
since last success" field and reports degraded health (YELLOW) if any policy is at or
above the failure threshold (default is 5 failures in a row).
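The check described above amounts to a threshold comparison per policy. A minimal Python sketch, assuming the input is a mapping from policy name to consecutive failures; the function name, the policy names, and the GREEN status for the healthy case are assumptions for illustration:

```python
FAILURE_THRESHOLD = 5  # default stated above


def slm_health(invocations_since_last_success, threshold=FAILURE_THRESHOLD):
    """Return (status, unhealthy policy names).

    The input maps each policy name to its number of consecutive failed
    invocations since the last success.
    """
    unhealthy = sorted(
        name
        for name, failures in invocations_since_last_success.items()
        if failures >= threshold
    )
    # GREEN for the healthy case is an assumption of this sketch.
    return ("YELLOW" if unhealthy else "GREEN"), unhealthy


# Hypothetical policies: only "nightly" has reached the threshold.
status, policies = slm_health({"nightly": 5, "hourly": 2})
```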
This commit removes the notion of components from the health API. They are no longer
a top-level field in the response, and `indicators` is promoted into their place.
Our current default for the http.max_header_size setting is 8kb. This
is lower than the current default for Kibana (16kb in 8.x), and the ESS
proxy (1mb based on the Go http library default). To align with the
current convention of other Elastic components, this PR increases the
ES header size setting default to 16kb.
Closes #88501
Remove help_url, rename summary -> symptom and user_actions -> diagnosis
Separate the diagnosis `message` field into `cause` and `action`
Co-authored-by: Mary Gouseti <mgouseti@gmail.com>
* Convert disk watermarks to RelativeByteSizeValues
Similar to the existing watermark setting for the frozen tier.
Pre-requisite for PR 88639 that plans to introduce max headroom
settings for the disk watermarks, similar to the frozen tier max
headroom setting.
* Add changelog
* Revert 20gb to 20GB
* Make formatNoTrailingZerosPercent non static
* ByteSizeValue.MINUS_ONE
* Remove getMinimumTotalSizeForBelowWatermark
* Remove comment
* Fix minor stuff
* Make parsing of RelativeByteSizeValue faster
Mimics the older definitelyNotPercentage function
* Remove Locale from Strings.format
* More MINUS_ONE
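The idea of a watermark that is either a percentage or an absolute size can be sketched in Python as below. This is not the Java `RelativeByteSizeValue` class: the names `RelativeByteSize`, `resolve`, and `parse_relative_byte_size`, and the binary unit table, are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed binary units, for illustration only.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


@dataclass(frozen=True)
class RelativeByteSize:
    """Either a ratio of some total or an absolute number of bytes."""

    ratio: Optional[float] = None
    absolute_bytes: Optional[int] = None

    def resolve(self, total_bytes):
        """Return the size in bytes, given the total it may be relative to."""
        if self.ratio is not None:
            return int(total_bytes * self.ratio)
        return self.absolute_bytes


def parse_relative_byte_size(text):
    """Parse strings such as "50%" or "20GB"."""
    value = text.strip().lower()
    # Cheap check first: only strings ending in "%" are treated as ratios.
    if value.endswith("%"):
        return RelativeByteSize(ratio=float(value[:-1]) / 100.0)
    # Try longer suffixes first so "gb" is not mistaken for "b".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            number = float(value[: -len(suffix)])
            return RelativeByteSize(absolute_bytes=int(number * UNITS[suffix]))
    raise ValueError(f"cannot parse byte size value: {text!r}")


half = parse_relative_byte_size("50%")
fixed = parse_relative_byte_size("20GB")
```

In this sketch `half.resolve(1000)` depends on the total passed in, while `fixed.resolve(...)` returns the same number of bytes for any total.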