This PR extends the assumptions we make about database file availability to all database file
names, not just the default ones we host at Elastic. When a geoip processor is created with
a database name that is not recognized, we unconditionally convert it to a processor that
tags documents with a missing-database message until the requested database file is
downloaded or provided via the manual configuration route. This allows the pipeline to be
created and the download service to be started, potentially sourcing the needed files.
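The fallback described above can be sketched roughly as follows. This is a minimal illustration and not the actual implementation: the class names, the `create_processor` helper, and the tag prefix are all hypothetical, since the description does not give the real tag text.

```python
from dataclasses import dataclass, field

# Hypothetical tag prefix; the real tag text is not stated in the description.
MISSING_DB_TAG_PREFIX = "_geoip_database_unavailable_"


@dataclass
class Document:
    source: dict
    tags: list = field(default_factory=list)


class GeoIpProcessor:
    """Stand-in for the real processor; performs no actual lookup."""

    def __init__(self, database):
        self.database = database

    def process(self, doc):
        # Placeholder enrichment so the sketch stays self-contained.
        doc.source["geoip"] = {"database": self.database}
        return doc


class DatabaseUnavailableProcessor:
    """Tags each document instead of failing while the file is missing."""

    def __init__(self, database):
        self.database = database

    def process(self, doc):
        doc.tags.append(MISSING_DB_TAG_PREFIX + self.database)
        return doc


def create_processor(database, available_databases):
    # Any unrecognized database name yields the tagging processor,
    # so pipeline creation succeeds either way.
    if database in available_databases:
        return GeoIpProcessor(database)
    return DatabaseUnavailableProcessor(database)
```

In this sketch, pipeline creation never fails because of a missing file; documents are only tagged until the database shows up in `available_databases`.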
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* When using a managed pipeline GeoIpDownloader is triggered only when an index exists for the pipeline.
* Adding the geoip processor back
* Adding tags to the events mapping.
* Fix a forbidden API call in tests.
* lint
* Adding integration tests for managed pipelines.
* lint
* Add a geoip_database_lazy_download param to pipelines and use it instead of managed.
* Fix an edge case: pipeline can be set after index is created.
* lint.
* Update docs/changelog/96624.yaml
* Update 96624.yaml
* Uses a processor setting (download_database_on_pipeline_creation) to decide database download strategy.
* Removing debug instruction.
* Improved documentation.
* Improved the way to check for referenced pipelines.
* Fixing an error in test.
* Improved integration tests.
* Lint.
* Fix failing tests.
* Fix failing tests (2).
* Adding javadoc.
* lint javadoc.
* Using a set instead of a list to store checked pipelines.
The Redact processor uses the Grok rules engine to
redact text in the input document that matches a
Grok pattern. For example, email or IP addresses can
be redacted using the definitions from the standard
Grok pattern bank. New patterns can be defined in
the processor configuration.
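To illustrate the idea of pattern-based redaction, here is a minimal sketch that uses plain regular expressions rather than the Grok engine. The `redact` function, the simplified `EMAIL` and `IP` definitions, and the `<NAME>` replacement format are assumptions made for this example and are not taken from the processor itself.

```python
import re

# Simplified stand-ins for pattern bank definitions (assumptions,
# much looser than real Grok patterns).
PATTERNS = {
    "EMAIL": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "IP": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}


def redact(text, names, custom_patterns=None):
    """Replace every match of each named pattern with <NAME>."""
    # Custom definitions extend (or override) the built-in ones.
    patterns = {**PATTERNS, **(custom_patterns or {})}
    for name in names:
        text = re.sub(patterns[name], f"<{name}>", text)
    return text
```

A custom pattern could then be supplied alongside the built-in ones, for example `redact("id ABC-123", ["TICKET"], {"TICKET": r"[A-Z]{3}-\d+"})`.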
This commit changes the geoip downloader so that we only download the geoip databases if you
have at least one geoip processor in your cluster, or when you add a new geoip processor (or if
`ingest.geoip.downloader.eager.download` is explicitly set to true).
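The download condition above boils down to a small predicate. The sketch below is only an illustration; the function and parameter names are hypothetical, and `eager_download` stands for the value of `ingest.geoip.downloader.eager.download`.

```python
def should_download_databases(geoip_processor_count, eager_download=False):
    """Return True when the geoip databases should be downloaded.

    eager_download mirrors `ingest.geoip.downloader.eager.download`;
    geoip_processor_count is the number of geoip processors across
    all pipelines in the cluster (hypothetical input for this sketch).
    """
    return eager_download or geoip_processor_count > 0
```

Adding the first geoip processor moves the count from 0 to 1, which is what would trigger the download in this model.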
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
json field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:
```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors" : [
      {
        "json" : {
          "field" : "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```
Closes #92898
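The difference between strict and lenient parsing can be demonstrated outside Elasticsearch as well. The following is a rough Python analogue built on the standard library's `json.JSONDecoder.raw_decode`; the `parse_json` helper is hypothetical, and it raises `ValueError` where the processor would throw an `IllegalArgumentException`.

```python
import json


def parse_json(text, strict_json_parsing=True):
    """Parse the first JSON value in text.

    With strict_json_parsing=True, any non-whitespace content after
    the first value is an error; with False it is silently ignored,
    which mimics the old behavior.
    """
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    value, end = json.JSONDecoder().raw_decode(stripped)
    trailing = stripped[end:].strip()
    if strict_json_parsing and trailing:
        raise ValueError(f"unexpected trailing content: {trailing!r}")
    return value
```

With the input `123 "foo"`, the lenient call returns `123` and the strict call raises an error instead of dropping `"foo"`.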
This commit adds a deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.
Relates to #86014
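The pattern of warning on an unset setting while keeping the current default can be sketched as follows. This is an illustration only: `resolve_remove_binary` and the warning text are hypothetical, and Python's `warnings` module stands in for the real deprecation logging.

```python
import warnings


def resolve_remove_binary(config):
    """Return the effective remove_binary value for a processor config.

    Warns when the setting is absent, because the default (currently
    False) is expected to change to True in a future release.
    """
    if "remove_binary" not in config:
        warnings.warn(
            "remove_binary is not set; set it explicitly because the "
            "default is expected to change in a future release",
            DeprecationWarning,
            stacklevel=2,
        )
        return False  # current default
    return bool(config["remove_binary"])
```

Setting the value explicitly, to either `true` or `false`, avoids the warning in this model.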
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
Changes the type of the version parameter in `IngestDocument` from
`Long` to `long` and moves it to the third argument, so all required
values occur before nullable arguments.
The `IngestService` expects a non-null version for a document and will
throw a `NullPointerException` if one is not provided.
Related: #87309
Add `ignore_missing_pipeline` option to `pipeline` processor. This
controls whether the `pipeline` processor should fail with an error if
no pipeline with a name specified in the `name` option exists.
This enhancement is useful for setting up a pipeline infrastructure that
lazily adds extension points for overrides, so that custom
pre-processing can be added at a later point in time for specific
cluster setups.
Relates to #87323
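A minimal model of the option's behavior is sketched below. The `run_pipeline_processor` function, the representation of pipelines as lists of callables, and the error message are assumptions made for illustration and do not reflect the actual implementation.

```python
def run_pipeline_processor(doc, name, pipelines, ignore_missing_pipeline=False):
    """Run the pipeline called `name` against doc.

    pipelines maps pipeline names to lists of callables (hypothetical
    representation). A missing pipeline is an error unless
    ignore_missing_pipeline is True, in which case doc passes through
    unchanged.
    """
    pipeline = pipelines.get(name)
    if pipeline is None:
        if ignore_missing_pipeline:
            return doc
        raise ValueError(f"no pipeline named [{name}] exists")
    for processor in pipeline:
        doc = processor(doc)
    return doc
```

This is what makes the extension-point pattern work: a parent pipeline can reference a pipeline name that does not exist yet, and the reference starts taking effect once a pipeline with that name is added.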
This commit adds initial windowing support for text_classification tasks.
Specifically, a user can now set a non-negative `span` value that defines the tokenization windowing span used when creating
sub-sequences.
The default value, `span: -1`, indicates that no windowing should take place.
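One common way to window a token sequence is sketched below. It assumes that `span` is the number of tokens shared by consecutive windows and that `max_length` is the model's maximum sequence length; both are assumptions for this illustration, since the description above does not define them, and `window_tokens` is a hypothetical helper.

```python
def window_tokens(tokens, max_length, span=-1):
    """Split tokens into windows of at most max_length tokens.

    Assumption: span is the overlap (in tokens) between consecutive
    windows. A negative span disables windowing and returns the input
    as a single sequence.
    """
    tokens = list(tokens)
    if span < 0:
        return [tokens]
    if span >= max_length:
        raise ValueError("span must be smaller than max_length")
    stride = max_length - span  # how far each window start advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break
        start += stride
    return windows
```

For ten tokens with `max_length=4` and `span=2`, this produces four windows, each sharing two tokens with its predecessor.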
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.
Relates to #79309, #31619
This PR changes uses of transient cluster settings to
persistent cluster settings.
The PR also deprecates the use of transient settings.
Relates to #49540
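For callers that build cluster settings request bodies, the change amounts to moving entries from the `transient` section to the `persistent` one. The helper below is a hypothetical sketch of that rewrite and is not part of this PR; the setting name in the usage note is only an example.

```python
def to_persistent(settings_body):
    """Rewrite a cluster settings body to use only persistent settings.

    Transient values override persistent ones with the same key,
    matching the precedence transient settings have when both are set.
    """
    merged = dict(settings_body.get("persistent", {}))
    merged.update(settings_body.get("transient", {}))
    return {"persistent": merged}
```

For example, `to_persistent({"transient": {"indices.recovery.max_bytes_per_sec": "50mb"}})` yields a body with the same entry under `persistent`.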