Adds a new option `trace_redact` to the redact processor that indicates when a document has been redacted in the ingest pipeline. If a document is processed by a redact processor and any field is redacted, the ingest metadata `_ingest._redact._is_redacted` is set to `true`.
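For illustration only, a minimal Python sketch of the behavior described above. It is not the processor's implementation: the `PATTERNS` table is a hypothetical regex stand-in for the Grok pattern bank, the `redact` helper is invented for this example, and the ingest metadata is nested inside the document dict for simplicity.

```python
import re

# Hypothetical stand-in for the Grok pattern bank; real Grok patterns are richer.
PATTERNS = {"EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")}


def redact(doc, field, trace_redact=False):
    """Replace pattern matches in doc[field] with <NAME> and optionally trace it."""
    original = doc[field]
    redacted = original
    for name, pattern in PATTERNS.items():
        redacted = pattern.sub(f"<{name}>", redacted)
    doc[field] = redacted
    # Only set the flag when tracing is enabled AND something was actually redacted.
    if trace_redact and redacted != original:
        doc.setdefault("_ingest", {}).setdefault("_redact", {})["_is_redacted"] = True
    return doc
```

A document with no matches is left without the metadata flag, even when `trace_redact` is enabled.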
Closes #94633
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* (Doc+) Inference Pipeline ignores Mapping Analyzers
Based on internal dev feedback (will cross-link after), this updates the docs to note that inference processors within ingest pipelines run before mapping analyzers, effectively ignoring them. If users want an analyzer to take effect, they need to select the ingest pipeline processor equivalent of that analyzer and run it earlier in the pipeline than the inference processor.
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to the full path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way.
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc
The GeoIP endpoint does not use the xpack http client. The GeoIP downloader uses the JDK's built-in cacerts.
If a customer is using a custom HTTPS endpoint, they need to add the CA certificate to the JDK's cacerts, whether that is our bundled JDK or their own JDK. Otherwise they will see something like:
```
...PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target...
```
* added override flag for rename processor along with factory tests
* added yaml tests for rename processor using the override flag
* updated renameProcessor tests to include override flag as a parameter
* updated rename processor tests to incorporate override flag = true scenario
* updated rename processor asciidoc with override option
* removed unnecessary suppresswarnings tag
* corrected formatting errors
* updated processor tests
* fixed yaml tests
* Prefer early throw style here
* Whitespace
* Move and rewrite this test
It's just a simple test of the primary behavior of the rename
processor, so put it first and simplify it.
* Rename this test
It doesn't actually exercise template snippets
* Tidy up this test
---------
Co-authored-by: Joe Gallo <joegallo@gmail.com>
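As a rough sketch of what the override flag on the rename processor changes, here is a simplified Python model. The `rename` function and its exception types are assumptions made for this example, not the processor's actual code, and options such as ignoring missing fields are left out.

```python
def rename(doc, field, target_field, override=False):
    """Move doc[field] to doc[target_field], throwing early on invalid input."""
    if field not in doc:
        raise KeyError(f"field [{field}] doesn't exist")
    if target_field in doc and not override:
        raise ValueError(f"field [{target_field}] already exists")
    # With override enabled, an existing target value is replaced.
    doc[target_field] = doc.pop(field)
    return doc
```

With `override=False` an existing target field is an error; with `override=True` the existing value is overwritten.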
This PR extends the assumptions we make about database file availability to all database file
names instead of only the default ones we host at Elastic. When creating a geoip processor with
a database name that is not recognized, we unilaterally convert the processor to one that
tags documents with a missing database message until the requested database file is
downloaded or provided via the manual configuration route. This allows a pipeline to be
created and the download service to be started, potentially sourcing the needed files.
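The fallback described above could be sketched roughly as follows. This is an illustrative Python model only: `make_geoip_processor`, the in-memory `available_databases` mapping, the `ip` and `geoip` field names, and the exact tag text are all assumptions for this example rather than the actual implementation.

```python
def make_geoip_processor(database_file, available_databases):
    """Return a lookup processor, or a tagging processor if the database is missing.

    available_databases is a hypothetical mapping: file name -> {ip: geo data}.
    """
    if database_file not in available_databases:
        def tag_unavailable(doc):
            # Assumed tag format; marks the document instead of failing the pipeline.
            tag = f"_geoip_database_unavailable_{database_file}"
            doc.setdefault("tags", []).append(tag)
            return doc
        return tag_unavailable

    database = available_databases[database_file]

    def lookup(doc):
        geo = database.get(doc.get("ip"))
        if geo is not None:
            doc["geoip"] = geo
        return doc
    return lookup
```

In this sketch the choice is made once, at creation time; swapping in the real lookup once the file arrives is left out.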
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* When using a managed pipeline GeoIpDownloader is triggered only when an index exists for the pipeline.
* Adding the geoip processor back
* Adding tags to the events mapping.
* Fix a forbidden API call in tests.
* lint
* Adding integration tests for managed pipelines.
* lint
* Add a geoip_database_lazy_download param to pipelines and use it instead of managed.
* Fix an edge case: pipeline can be set after index is created.
* lint.
* Update docs/changelog/96624.yaml
* Update 96624.yaml
* Uses a processor setting (download_database_on_pipeline_creation) to decide database download strategy.
* Removing debug instruction.
* Improved documentation.
* Improved the way to check for referenced pipelines.
* Fixing an error in test.
* Improved integration tests.
* Lint.
* Fix failing tests.
* Fix failing tests (2).
* Adding javadoc.
* lint javadoc.
* Using a set instead of a list to store checked pipelines.
The Redact processor uses the Grok rules engine to
redact text in the input document that matches the
Grok pattern. For example, email or IP addresses can
be redacted using the definitions from the standard
Grok pattern bank. New patterns can be defined in
the processor configuration.
This commit changes the geoip downloader so that we only download the geoip databases if you
have at least one geoip processor in your cluster, or when you add a new geoip processor (or if
`ingest.geoip.downloader.eager.download` is explicitly set to true).
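The download decision above can be summarized with a small Python sketch. The `should_download` helper and the pipeline dict shape are assumptions for illustration; nested processors (for example inside `on_failure` blocks) are not inspected here.

```python
def should_download(pipelines, eager_download=False):
    """Decide whether geoip databases should be downloaded.

    pipelines is a list of pipeline dicts with a "processors" list, where each
    processor is a single-key dict such as {"geoip": {...}}.
    """
    if eager_download:
        # Models ingest.geoip.downloader.eager.download being explicitly true.
        return True
    return any(
        "geoip" in processor
        for pipeline in pipelines
        for processor in pipeline.get("processors", [])
    )
```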
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
json field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:
```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors": [
      {
        "json": {
          "field": "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```
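To make the difference between the two modes concrete, here is a small Python analogy using the standard `json` module. It does not reproduce JsonProcessor; the `parse_json_field` helper is hypothetical, and `ValueError` stands in for the IllegalArgumentException mentioned above.

```python
import json


def parse_json_field(value, strict_json_parsing=True):
    """Parse a JSON string either strictly or leniently (first value only)."""
    if strict_json_parsing:
        try:
            # json.loads rejects any trailing content after the first value.
            return json.loads(value)
        except json.JSONDecodeError as e:
            raise ValueError(f"cannot parse [{value}] as JSON") from e
    # Lenient mode: take the first parseable value and ignore the rest.
    parsed, _end = json.JSONDecoder().raw_decode(value.lstrip())
    return parsed
```

With the input `123 "foo"`, the lenient call returns `123` and silently drops `"foo"`, while the strict call raises.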
Closes #92898