The Redact processor uses the Grok rules engine to
redact text in the input document that matches the
Grok pattern. For example, email or IP addresses can
be redacted using the definitions from the standard
Grok pattern bank. New patterns can be defined in
the processor configuration.
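To make the idea concrete, here is a minimal Python sketch of pattern-based redaction. It is not the Grok engine: the two regular expressions are simplified stand-ins for pattern-bank entries, and the `redact` helper and the `<NAME>` placeholder format are assumptions made for illustration.

```python
import re

# Simplified stand-ins for Grok pattern-bank entries (assumption: the real
# definitions are more thorough than these regular expressions).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text, patterns=PATTERNS):
    """Replace every match of each named pattern with a <NAME> placeholder."""
    for name, regex in patterns.items():
        text = regex.sub(f"<{name}>", text)
    return text


print(redact("User alice@example.com logged in from 10.0.0.7"))
# prints: User <EMAIL> logged in from <IP>
```

Custom patterns would correspond to adding entries to the `PATTERNS` mapping in this sketch.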
This commit changes the geoip downloader so that we only download the geoip databases if you
have at least one geoip processor in your cluster, or when you add a new geoip processor (or if
`ingest.geoip.downloader.eager.download` is explicitly set to true).
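The download condition described above can be sketched as a small predicate. The function and parameter names below are hypothetical and only model the decision, not the downloader itself.

```python
def should_download_databases(geoip_processor_count, eager_download=False):
    """Return True when the geoip databases should be downloaded.

    geoip_processor_count: number of geoip processors across all pipelines
        in the cluster (hypothetical input for this sketch).
    eager_download: models `ingest.geoip.downloader.eager.download`.
    """
    return eager_download or geoip_processor_count > 0


# No geoip processor and no eager download: nothing is downloaded.
print(should_download_databases(0))
# Adding the first geoip processor triggers the download.
print(should_download_databases(1))
```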
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
JSON field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:
```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors" : [
      {
        "json" : {
          "field" : "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```
Closes #92898
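The difference between the two modes can be reproduced with Python's standard `json` module, used here purely as an analogy and not as the processor's implementation. `json.loads` rejects trailing data, while `JSONDecoder.raw_decode` stops after the first valid value. The `parse_json` helper is hypothetical.

```python
import json


def parse_json(value, strict=True):
    """Parse a JSON string, optionally tolerating trailing garbage."""
    if strict:
        # Raises json.JSONDecodeError (a ValueError) if anything follows
        # the first JSON value.
        return json.loads(value)
    # Lenient mode: decode the first value and silently ignore the rest.
    parsed, _end = json.JSONDecoder().raw_decode(value.lstrip())
    return parsed


print(parse_json('123 "foo"', strict=False))  # prints: 123

try:
    parse_json('123 "foo"')
except ValueError as err:
    print("rejected:", err)
```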
This commit adds a deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.
Relates to #86014
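The "warn when unset" behavior can be sketched as follows. This is an illustration in Python, not the processor's code: the `resolve_remove_binary` helper and the warning text are assumptions.

```python
import warnings


def resolve_remove_binary(config):
    """Return the effective `remove_binary` value for a processor config.

    Emits a DeprecationWarning when the setting is absent, because the
    implicit default (currently False) is expected to change.
    """
    if "remove_binary" not in config:
        warnings.warn(
            "remove_binary is unset; set it explicitly to keep the current "
            "behavior if the default changes",
            DeprecationWarning,
            stacklevel=2,
        )
        return False
    return bool(config["remove_binary"])


# Explicit setting: no warning, the value is used as given.
print(resolve_remove_binary({"field": "data", "remove_binary": True}))
```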
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
Changes the type of the version parameter in `IngestDocument` from
`Long` to `long` and moves it to the third argument, so all required
values occur before nullable arguments.
The `IngestService` expects a non-null version for a document and will
throw a `NullPointerException` if one is not provided.
Related: #87309
Add `ignore_missing_pipeline` option to `pipeline` processor. This
controls whether the `pipeline` processor should fail with an error if
no pipeline with a name specified in the `name` option exists.
This enhancement is useful for setting up a pipeline infrastructure that
lazily adds extension points for overrides, so that custom
pre-processing can be added to specific cluster setups at a later point
in time.
Relates to #87323
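The semantics of the option can be modeled in a few lines of Python. This is a sketch only: pipelines are represented as lists of plain functions, and `run_pipeline_processor`, the registry, and the error message are hypothetical.

```python
def run_pipeline_processor(doc, name, pipelines, ignore_missing_pipeline=False):
    """Run the pipeline called `name` against `doc`.

    pipelines: dict mapping pipeline names to lists of callables, each
        taking and returning a document dict (a stand-in for real pipelines).
    """
    pipeline = pipelines.get(name)
    if pipeline is None:
        if ignore_missing_pipeline:
            # Extension point not defined yet: pass the document through.
            return doc
        raise ValueError(f"pipeline [{name}] does not exist")
    for processor in pipeline:
        doc = processor(doc)
    return doc


registry = {"base": [lambda d: {**d, "seen": True}]}

# The optional "custom-pre" pipeline has not been created; the document
# passes through unchanged instead of failing.
doc = run_pipeline_processor({"msg": "hi"}, "custom-pre", registry,
                             ignore_missing_pipeline=True)
print(run_pipeline_processor(doc, "base", registry))
# prints: {'msg': 'hi', 'seen': True}
```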
This commit adds initial windowing support for text_classification tasks.
Specifically, a user can now set a non-negative `span` that controls the tokenization window when creating
sub-sequences.
The default value, `span: -1`, indicates that no windowing should take place.
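One common way to window a token sequence is to emit fixed-size sub-sequences that overlap by `span` tokens. The Python sketch below assumes that interpretation of `span`; the `window_tokens` helper and the `max_len` parameter are hypothetical and are not taken from the commit.

```python
def window_tokens(tokens, max_len, span=-1):
    """Split `tokens` into sub-sequences of at most `max_len` tokens.

    span == -1 disables windowing; otherwise consecutive windows share
    `span` tokens (assumed meaning of the setting in this sketch).
    """
    if span == -1:
        return [list(tokens)]
    if span < 0 or span >= max_len:
        raise ValueError("span must be -1 or in the range [0, max_len)")
    stride = max_len - span  # how far each window advances
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


print(window_tokens(list(range(10)), max_len=4, span=2))
# prints: [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```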
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.
Relates to #79309, #31619
Changes:
* Removes several `[testenv="gold+"]` attributes from the docs. `gold+` is not a valid [subscription level](https://www.elastic.co/subscriptions) or testenv value.
* Moves two `[testenv="basic"]` attributes to the file header. This makes the `testenv` placement consistent and fixes the yml file generated from `docs/reference/snapshot-restore/register-repository.asciidoc`.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This PR changes uses of transient cluster settings to
persistent cluster settings.
The PR also deprecates the transient settings usage.
Relates to #49540
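As an illustration of the change, a cluster settings request body that uses the `transient` section can be rewritten to use `persistent` instead. The Python helper below is a hypothetical sketch, not part of the PR; on conflicts it lets the transient value win, on the assumption that it reflects the currently effective setting.

```python
def to_persistent(settings_body):
    """Return a cluster-settings body with transient settings moved to persistent."""
    merged = dict(settings_body.get("persistent", {}))
    # Transient values override persistent ones when both are present.
    merged.update(settings_body.get("transient", {}))
    return {"persistent": merged}


body = {"transient": {"indices.recovery.max_bytes_per_sec": "50mb"}}
print(to_persistent(body))
# prints: {'persistent': {'indices.recovery.max_bytes_per_sec': '50mb'}}
```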
Add a section in the docs that describes a number of node-level settings
for the enrich processor.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Adjusted integration tests to use geoip test fixture or to use test databases provided via config dirs (for qa module / docs).
* Kept the geolite2-databases dependency for most of the unit tests only.
* Made the `fallback_to_default_databases` parameter on the geoip processor a no-op that emits a deprecation warning when used.
* If no geoip databases are available to a node yet, the geoip processor factory returns a processor implementation that flags documents to indicate that the databases were unavailable. This allows these documents to be reindexed later with a pipeline. These documents will have a tag string array field, which contains a string `_geoip_database_unavailable_{database_name}` for each missing database in a pipeline.
* Added pipeline reload capabilities in IngestService, so that when databases become available again on a node, pipelines with a geoip processor definition can be reloaded.
Relates to #68920
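The tagging behavior for unavailable databases can be sketched in Python as follows. The `tags` field name and both helper functions are assumptions for illustration; only the `_geoip_database_unavailable_{database_name}` tag format comes from the description above.

```python
TAG_PREFIX = "_geoip_database_unavailable_"


def flag_unavailable(doc, database_name, tags_field="tags"):
    """Append the unavailable-database tag to the document's tag array."""
    tags = doc.setdefault(tags_field, [])
    tag = f"{TAG_PREFIX}{database_name}"
    if tag not in tags:
        tags.append(tag)
    return doc


def needs_reindex(doc, tags_field="tags"):
    """True if the document was ingested while a database was missing."""
    return any(tag.startswith(TAG_PREFIX) for tag in doc.get(tags_field, []))


doc = flag_unavailable({"ip": "8.8.8.8"}, "GeoLite2-City.mmdb")
print(doc["tags"])
# prints: ['_geoip_database_unavailable_GeoLite2-City.mmdb']
```

A later reindex could select documents for which `needs_reindex` is true and run them through the pipeline again once the databases are present.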
In https://github.com/elastic/kibana/pull/113783, we renamed Kibana's **Ingest Node Pipelines** feature to **Ingest Pipelines**. This updates screenshots and references for the feature. It also replaces a few remaining `ingest node pipeline` references.
Related to issue #77823
This does the following:
- Updates several asciidoc files that contained code snippets with
invalid JSON, most involving unnecessary trailing commas.
- Makes the switch from the Groovy JSON parser to the Jackson parser,
pursuant to the general goal of eliminating Groovy dependence.
- Makes testing of JSON validity at build time more strict.
Note that this update still allows backslash escaping for any
character. Currently that matters because of the file
"docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc",
specifically this part:
    "attributes" : {
      "ml.machine_memory" :
        "$body.datafeeds.0.node.attributes.ml\.machine_memory",
      "ml.max_open_jobs" : "512"
    }
It's not clear to me what change, if any, is appropriate there. So,
I've left in the escaped period and configured the parser to ignore
it for the time being.
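The kind of strictness involved can be demonstrated with Python's standard `json` module, used here only as an analogy for a strict parser and not as the build's Jackson configuration. The `is_valid_json` helper is hypothetical. Note that Python's parser also rejects the backslash-escaped period, which the build described above deliberately tolerates.

```python
import json


def is_valid_json(snippet):
    """Return True if `snippet` parses as strict JSON."""
    try:
        json.loads(snippet)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json('{"size": 10, "from": 0}'))   # prints: True
print(is_valid_json('{"size": 10, "from": 0,}'))  # prints: False (trailing comma)
print(is_valid_json(r'{"k": "ml\.machine_memory"}'))  # prints: False (invalid escape)
```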
We are adding a _meta field to many of our REST APIs so that users can attach whatever metadata they
want. The data in this field will not be used by Elasticsearch. This commit adds the _meta field to ingest
pipelines.
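A pipeline definition carrying `_meta` might be assembled as in the Python sketch below. The `build_pipeline` helper and the metadata keys are hypothetical; the only point illustrated is that `_meta` holds arbitrary user data alongside the processors.

```python
import json


def build_pipeline(processors, description="", meta=None):
    """Build an ingest pipeline body, attaching `_meta` only when provided."""
    body = {"description": description, "processors": list(processors)}
    if meta is not None:
        body["_meta"] = dict(meta)  # free-form user metadata
    return body


pipeline = build_pipeline(
    processors=[{"set": {"field": "env", "value": "prod"}}],
    description="tag documents with their environment",
    meta={"owner": "platform-team", "ticket": "OPS-123"},  # illustrative values
)
print(json.dumps(pipeline, indent=2))
```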