In https://github.com/elastic/kibana/pull/113783, we renamed Kibana's **Ingest Node Pipelines** feature to **Ingest Pipelines**. This updates screenshots and references for the renamed feature. It also replaces a few remaining `ingest node pipeline` references.
Introduces a setting `cluster.deprecation_indexing.x_opaque_id_used.enabled` to disable the use of
x-opaque-id in RateLimitingFilter. This will be used for deprecation
log indexing and will not affect logging to files (which uses a different
instance of RateLimitingFilter with this flag enabled by default).
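A minimal sketch of toggling the flag, assuming the setting is dynamically updatable via the cluster settings API (otherwise it would go in `elasticsearch.yml`):
```js
PUT _cluster/settings
{
  "persistent": {
    // Assumed dynamic; disables x-opaque-id based rate limiting for deprecation log indexing
    "cluster.deprecation_indexing.x_opaque_id_used.enabled": false
  }
}
```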
Changes the indices backing a deprecation log data stream to be hidden.
Refactors DeprecationHttpIT to be more reliable
Relates #76292, closes #77936
* Index prefixes for searchable snapshots
Added a note about how ILM-managed indices are prefixed with "restored-" or "partial-" when they are fully or partially mounted as searchable snapshots (see the example below).
* Apply suggestions from code review
Co-authored-by: debadair <debadair@elastic.co>
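For example (an illustration, not part of the original note), fully mounted indices appear under the `restored-` prefix and partially mounted ones under `partial-`, so a quick way to spot them is:
```js
GET _cat/indices/restored-*,partial-*?v
```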
* Implement and test get feature upgrade status API
* Add integration test for feature upgrade endpoint
* Use constant enum for statuses
* Add unit tests for transport class methods
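A hedged sketch of calling the new status endpoint; the path shown is the `_migration/system_features` route used by the feature upgrade APIs, and the response shape is approximate:
```js
GET _migration/system_features
// Approximate response shape (field names and values are illustrative):
{
  "features": [
    { "feature_name": "security", "migration_status": "NO_MIGRATION_NEEDED" }
  ],
  "migration_status": "NO_MIGRATION_NEEDED"
}
```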
* WIP, basic implementation
* Pull `if` branch into a variable
* Remove outdated javadoc
* Remove map iteration, use target name instead of id (whoops)
* Remove streaming from isReplacementSource
* Simplify getReplacementName
* Only calculate node shutdowns if canRemain==false and forceMove==false
* Move canRebalance comment in BalancedShardsAllocator
* Rename canForceDuringVacate -> canForceAllocateDuringReplace
* Add comment to AwarenessAllocationDecider.canForceAllocateDuringReplace
* Revert changes to ClusterRebalanceAllocationDecider
* Change "no replacement" decision message in NodeReplacementAllocationDecider
* Only construct shutdown map once in isReplacementSource
* Make node shutdowns and target shutdowns available within RoutingAllocation
* Add randomization for adding the filter that is overridden in test
* Add integration test with replicas: 1
* Go nuts with the verbosity of allocation decisions
* Also check NODE_C in unit test
* Test with randomly assigned shard
* Fix test for extra verbose decision messages
* Remove canAllocate(IndexMetadata, RoutingNode, RoutingAllocation) override
* Spotless :|
* Implement 100% disk usage check during force-replace-allocate
* Add rudimentary documentation for "replace" shutdown type (see the request sketch after this list)
* Use RoutingAllocation shutdown map in BalancedShardsAllocator
* Add canForceAllocateDuringReplace to AllocationDeciders & add test
* Switch from percentage to bytes in DiskThresholdDecider force check
* Enhance docs with note about rollover, creation, & shrink
* Clarify decision messages, add test for target-only allocation
* Simplify NodeReplacementAllocationDecider.replacementOngoing
* Start nodeC before nodeB in integration test
* Spotleeeessssssss! You get me every time!
* Remove outdated comment
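To make the "replace" type concrete, here is a sketch of registering a node shutdown of that type via the shutdown API; the node id, reason, and target node name are placeholders:
```js
PUT _nodes/node-id-to-vacate/shutdown
{
  "type": "replace",                      // shards are vacated onto the target node
  "reason": "hardware swap",              // free-text, placeholder value
  "target_node_name": "replacement-node"  // required for the "replace" type
}
```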
If the _nodes/stats API received a level=shards request parameter, then the response would have two "shards" fields,
which would cause problems with JSON parsers. This commit renames the "shards" field that currently only contains
"total_count" to "shard_stats".
Relates #78311, #75433
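A rough sketch of the renamed field (response abridged and paths approximate):
```js
GET _nodes/stats?level=shards
// Abridged, approximate response:
{
  "nodes": {
    "nodeId1": {
      "indices": {
        "shard_stats": { "total_count": 12 },  // formerly named "shards"
        "shards": { /* per-shard breakdown requested via level=shards */ }
      }
    }
  }
}
```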
Fixes a couple of erroneous references related to system indices in the snapshot restore tutorial:
* Calling the delete index API on `*` will only delete
some system indices, such as `.security`. It won't delete others, such as
`.geoip_databases`.
* Not all dot indices are system indices. Some are just hidden indices.
Relates to #76929
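As a concrete illustration of the corrected wording (a sketch, not taken from the tutorial):
```js
DELETE *
// Deletes regular and hidden indices plus some system indices (for example .security),
// but leaves others such as .geoip_databases in place.
```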
The composite aggregation is considered expensive. Users should perform load testing before deploying it in production.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
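For reference, a minimal composite aggregation of the kind this guidance applies to (index and field names are placeholders):
```js
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          { "product": { "terms": { "field": "product" } } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } }
        ]
      }
    }
  }
}
```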
The documentation indicates that `stack.templates.enabled` can be used in Elasticsearch Service, but it is not part of the settings allowlist in ESS. This PR makes the documentation match the state of the allowlist.
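For context, a sketch of how the setting is normally toggled on a self-managed cluster (on ESS it is not in the allowlist, which is the mismatch the docs now reflect):
```js
PUT _cluster/settings
{
  "persistent": {
    "stack.templates.enabled": false  // disables installation of the built-in index and component templates
  }
}
```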
This PR adds a MonitoringIndexTemplateRegistry to the monitoring plugin which automatically
installs all monitoring templates locally when the plugin is initialized. Exporters have been
updated to no longer attempt installation of the monitoring templates, and instead will wait for
the templates to become available before setting themselves as started. Some older
functionality related to templates has been removed as well, such as the expectation that
version 6 monitoring templates are installed, along with the setting that controls their installation
(xpack.monitoring.exporters.<EXPORTER>.index.template.create_legacy_templates).
This change removes several pieces of deprecated code from stored scripts.
Stored scripts/templates are no longer allowed to be empty and will throw an exception when used
with PutStoredScript.
ScriptMetadata will now drop any existing stored scripts that are empty, with a deprecation warning in
case they have not been previously removed.
The `code` field is now only allowed as `source` as part of a PutStoredScript JSON blob.
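A sketch of the accepted form after this change: the script body must be non-empty and supplied under `source` (script name and body are placeholders):
```js
PUT _scripts/my-stored-script
{
  "script": {
    "lang": "painless",
    // Must be non-empty; a "code" field is no longer accepted here
    "source": "Math.log(_score * 2) + params['my_modifier']"
  }
}
```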
As the script only has access to the nested document, this should be
documented.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* [DOCS] Add Beats config example for ingest pipelines
The Elasticsearch ingest pipeline docs cover ingest pipelines for Fleet and
Elastic Agent. However, the docs don't cover Beats. This adds those docs.
Relates to https://github.com/elastic/beats/pull/28239.
* Update docs/reference/ingest.asciidoc
Co-authored-by: DeDe Morton <dede.morton@elastic.co>
The 'verbose' option to /_segments returns memory information
for each segment. However, Lucene 9 has stopped tracking this memory
information as it is largely held off-heap and so is no longer significant.
This commit deprecates the 'verbose' parameter and makes it a no-op.
Fixes #75955
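The now no-op parameter, for reference (index name is a placeholder):
```js
GET my-index/_segments?verbose=true
// Deprecated: still accepted for compatibility, but no longer returns per-segment memory details.
```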
This commit adds a new multi-bucket aggregation: `categorize_text`
The aggregation follows a similar design to significant text in that it reads from `_source`
and re-analyzes the text as it is read.
The key difference is that it does not use the indexed field's analyzer, but instead relies on
the `ml_standard` tokenizer with specialized ML token filters. The tokenizer + filters are the
same ones that machine learning categorization anomaly jobs use.
The high level logical flow is as follows:
- At each shard, read in the text field with a custom analyzer using the `ml_standard` tokenizer
- Read in the particular tokens from the analyzer
- Feed these tokens to a token tree algorithm (an adaptation of the drain categorization algorithm)
- Gather the individual log categories (the leaf nodes), sort them by doc_count, and ship those buckets to be merged
- Merge all buckets that have the EXACT same key
- Once all buckets are merged, pass those keys + counts to a new token tree for additional merging
- That tree builds the final buckets, and those are returned to the user
Algorithm explanation:
- Each log is parsed with the ml-standard tokenizer
- Each token is passed into a token tree
- For the first `max_match_token` tokens, each token is stored in the tree, and at `max_match_token+1` (or `len(tokens)`, whichever comes first) a log group is created
- If another log group already exists at that leaf, merge into it if the two have at least `similarity_threshold` percent of tokens in common
- Merging simply replaces tokens that differ within the group with `*`
- If a layer in the tree already has `max_unique_tokens` children, we add a `*` child and any new tokens are passed through it. The catch here is that on the final merge, we first attempt to merge together the subtrees with the smallest number of documents, especially if the new subtree has more documents counted.
## Aggregation configuration.
Here is an example on some openstack logs
```js
POST openstack/_search?size=0
{
"aggs": {
"categories": {
"categorize_text": {
"field": "message", // The field to categorize
"similarity_threshold": 20, // merge log groups if they are this similar
"max_unique_tokens": 20, // Max Number of children per token position
"max_match_token": 4, // Maximum tokens to build prefix trees
"size": 1
}
}
}
}
```
This will return buckets like
```json
"aggregations" : {
"categories" : {
"buckets" : [
{
"doc_count" : 806,
"key" : "nova-api.log.1.2017-05-16_13 INFO nova.osapi_compute.wsgi.server * HTTP/1.1 status len time"
}
]
}
}
```
The get SLM status API will only return one of three statuses: `RUNNING`, `STOPPING`, or `STOPPED`.
This corrects the docs to remove the `STARTED` status and document the `RUNNING` status.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
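For reference, the corrected status values come back from the status API like so (sketch):
```js
GET _slm/status
// "operation_mode" is one of RUNNING, STOPPING, or STOPPED
{
  "operation_mode": "RUNNING"
}
```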
In #77686 we added a service to clean up blob store
cache docs after a searchable snapshot is no longer
used. We noticed some situations where some cache
docs could still remain in the system index: when the
system index is not available when the searchable
snapshot index is deleted; when the system index is
restored from a backup or when the searchable
snapshot index was deleted on a version before #77686.
This commit introduces a maintenance task that
periodically scans and cleans up unused blob cache
docs. This task is scheduled to run every hour on the
data node that contains the blob store cache primary
shard. The periodic task works by using a point in
time context with search_after.
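The point-in-time plus search_after pattern the task relies on looks roughly like this (a generic sketch against a placeholder index, not the actual system-index queries):
```js
POST my-index/_pit?keep_alive=1m   // open a point in time and note the returned id

GET _search
{
  "size": 100,
  "query": { "match_all": {} },
  "pit": { "id": "<pit_id from the previous call>", "keep_alive": "1m" },
  "sort": [ { "_shard_doc": "asc" } ],
  "search_after": [ 12345 ]   // sort values from the last hit of the previous page
}
```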
For this grid type, the features on the aggregation layer are represented by a point that is computed from the
centroid of the data inside the cell.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
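Assuming this refers to the search vector tile API's `grid_type` parameter, a sketch of requesting centroid-based grid features (index, field, and tile coordinates are placeholders):
```js
GET my-index/_mvt/location/10/163/395
{
  "grid_type": "centroid",   // each cell's feature is a point at the centroid of the data in the cell
  "grid_precision": 8
}
```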
Documents the `runs` keyword for running the same event criteria successively in a sequence query.
Relates to #75082.
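A sketch of the `runs` syntax in a sequence query (index and event criteria are illustrative):
```js
GET my-data-stream/_eql/search
{
  "query": """
    sequence by host.id
      [ process where process.name == "cmd.exe" ] with runs=3
      [ network where true ]
  """
}
```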
Documents `archived.*` persistent cluster settings and index settings.
These settings are commonly produced during a major version upgrade.
Closes #28027
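For example (a sketch of the documented cleanup step), archived persistent cluster settings can be removed with a wildcard once they're no longer needed:
```js
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null   // drops all archived persistent cluster settings
  }
}
```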
* Add stubs for get API
* Add stub for post API
* Register new actions in ActionModule
* HLRC stubs
* Unit tests
* Add rest api spec and tests
* Add new action to non-operator actions list
This change removes JodaCompatibleZonedDateTime and replaces it with ZonedDateTime for use in
scripting.
Breaking changes:
* JodaCompatibleZonedDateTime no longer exists and cannot be cast to in Painless. Use ZonedDateTime
instead.
* The dayOfWeek method on ZonedDateTime returns the DayOfWeek enum instead of an int as on
JodaCompatibleZonedDateTime. dayOfWeekEnum still exists on ZonedDateTime as an augmentation to
support the transition to ZonedDateTime, but is now deprecated in favor of dayOfWeek on
ZonedDateTime.
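A small sketch of the replacement usage in Painless (index and field names are placeholders): date fields now expose a plain ZonedDateTime, so getDayOfWeek() returns the DayOfWeek enum:
```js
GET my-index/_search
{
  "script_fields": {
    "weekday": {
      "script": {
        "lang": "painless",
        // ZonedDateTime#getDayOfWeek() returns a java.time.DayOfWeek enum;
        // call getValue() if an int is needed, instead of relying on the old Joda-style int.
        "source": "doc['@timestamp'].value.getDayOfWeek().getValue()"
      }
    }
  }
}
```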
* [DOCS] Always enable file and native realms by default
Adds an 8.0 breaking change for PR #69096.
The copy is based on the 7.13 deprecation notice added with PR #69320.
* reword
* Update docs/reference/migration/migrate_8_0/security.asciidoc
Co-authored-by: Yang Wang <ywangd@gmail.com>
* Update docs/reference/migration/migrate_8_0/security.asciidoc
Co-authored-by: Yang Wang <ywangd@gmail.com>
* [ML] add documentation for get deployment stats API
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Improve docs for pre-release version compatibility
Follow-up to #78317 clarifying a couple of points:
- a pre-release build can restore snapshots from released builds
- compatibility applies if at least one of the local or remote clusters
is a released build
* Remote cluster build date nit
Monitoring installs a number of ingest pipelines which have historically been used
to upgrade documents when mappings and document structures change between
versions. Since there aren't any changes to the document format, nor will there be
by the time the format is completely retired, we can comfortably remove these
pipelines.
Zero-Shot classification allows for text classification tasks without a pre-trained collection of target labels.
This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. An example could be:
"Throughout all of history, man kind has shown itself resourceful, yet astoundingly short-sighted" could have been paired with the entailment clauses: ["This example is history", "This example is sociology"...].
This training set combined with the attention and semantic knowledge in modern day NLP models (BERT, BART, etc.) affords a powerful tool for ad-hoc text classification.
See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works.
The zero-shot classification task is configured as follows:
```js
{
// <snip> model configuration </snip>
"inference_config" : {
"zero_shot_classification": {
"classification_labels": ["entailment", "neutral", "contradiction"], // <1>
"labels": ["sad", "glad", "mad", "rad"], // <2>
"multi_label": false, // <3>
"hypothesis_template": "This example is {}.", // <4>
"tokenization": { /*<snip> tokenization configuration </snip>*/}
}
}
}
```
* <1> For all zero_shot models, three particular labels are returned when classifying the target sequence. "entailment" is the positive case, "neutral" is the case where the sequence is neither positive nor negative, and "contradiction" is the negative case
* <2> This optional parameter provides the default labels that zero_shot classification will attempt to classify against
* <3> When returning the probabilities, should the results assume there is only one true label or that multiple true labels are possible
* <4> The hypothesis template used when tokenizing the labels. When combined with `sad`, the sequence looks like `This example is sad.`
For inference in a pipeline one may provide label updates:
```js
{
//<snip> pipeline definition </snip>
"processors": [
//<snip> other processors </snip>
{
"inference": {
// <snip> general configuration </snip>
"inference_config": {
"zero_shot_classification": {
"labels": ["humanities", "science", "mathematics", "technology"], // <1>
"multi_label": true // <2>
}
}
}
}
//<snip> other processors </snip>
]
}
```
* <1> The `labels` we care about; these replace the default ones if they exist.
* <2> Whether the results should allow multiple true labels
Similarly, one may provide label changes against the `_infer` endpoint:
```js
{
"docs":[{ "text_field": "This is a very happy person"}],
"inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}}
}
```