380 commits

Author SHA1 Message Date
Joe Gallo
9bc09d576a
Fix ignore_missing docs for a couple of Ingest processors (#95244) 2023-04-13 16:34:40 -04:00
Aurélien FOUCRET
9071d114f5
[Ingest Processor] Add ignore_missing param to the uri_parts ingest processor. (#95068) 2023-04-13 15:11:19 +02:00
Jean-Fabrice Bobo
a7e901263b
Update geoip.asciidoc (#95101)
Fix `ingest.geoip.downloader.eager.download` setting not appearing in the rendered documentation
2023-04-12 09:59:27 +02:00
Alessandro Stoltenberg
c787e3808f
docs: set-processor minor update (#94899) 2023-03-30 14:27:05 +02:00
Dimitris Kotsakos
38a09bea60
[ML] Make redact processor experimental for first release (#94683) 2023-03-23 18:28:03 +02:00
Joe Gallo
36aeb00835
Add an example of dot_expander's path option (#94291) 2023-03-06 09:26:40 -05:00
David Kyle
f8e306e688
Rewrite Redact Processor docs intro (#93856)
Focus on what redact does rather than describing Grok
2023-02-16 14:17:54 +00:00
David Kyle
b588d2ddd7
Redact Ingest Processor (#92951)
The Redact processor uses the Grok rules engine to
redact text in the input document that matches the
Grok pattern. For example, email or IP addresses can
be redacted using the definitions from the standard
Grok pattern bank. New patterns can be defined in
the processor configuration.
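
As a rough illustration of the idea only, the sketch below redacts matches with plain Python regular expressions rather than Grok. The pattern definitions, the `redact` helper, and the `<NAME>` placeholder format are assumptions made for this example, not the processor's actual implementation.

```python
import re

# Hypothetical, simplified patterns. Real Grok definitions are more thorough.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text):
    """Replace every match of each named pattern with a <NAME> placeholder."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<{name}>", text)
    return text
```

Calling `redact("mail bob@example.com from 10.0.0.1")` would replace the address and the IP with the `<EMAIL>` and `<IP>` placeholders.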
2023-02-07 17:10:07 +00:00
Craig Taverner
c18078e11e
Geo_grid ingest processor docs (#93507)
* Add docs for geo_grid ingest processor

Adds docs for https://github.com/elastic/elasticsearch/pull/93370

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Consistent GeoJSON case

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-02-06 16:17:00 +01:00
Keith Massey
13b71900a6
Download the geoip databases only when needed (#92335)
This commit changes the geoip downloader so that we only download the geoip databases if you
have at least one geoip processor in your cluster, or when you add a new geoip processor (or if
`ingest.geoip.downloader.eager.download` is explicitly set to true).
2023-01-30 13:07:48 -06:00
Keith Massey
f327352601
Making JsonProcessor stricter so that it does not silently drop data (#93179)
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
json field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors" : [
      {
        "json" : {
          "field" : "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```
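
The difference between the two modes can be mimicked with Python's standard `json` module. This is only an analogy, not how JsonProcessor is implemented: `json.loads` rejects trailing content, while `JSONDecoder.raw_decode` stops after the first value. The helper names `parse_strict` and `parse_lenient` are made up for this sketch.

```python
import json


def parse_strict(text):
    """Parse the whole string as one JSON value; trailing content is an error."""
    return json.loads(text)


def parse_lenient(text):
    """Parse only the first JSON value and silently ignore whatever follows."""
    value, _end = json.JSONDecoder().raw_decode(text.lstrip())
    return value


sample = '123 "foo"'
lenient_result = parse_lenient(sample)  # the trailing "foo" is dropped

try:
    parse_strict(sample)
    strict_error = None
except json.JSONDecodeError as exc:
    strict_error = str(exc)  # strict parsing reports the extra data
```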

Closes #92898
2023-01-24 18:43:35 -05:00
David Kilfoyle
37f7b7b325
[Docs] Update remove processor with 'keep' option (#92836) 2023-01-11 12:52:35 -05:00
Keith Massey
d5b4584612
Adding more detail about ingest.geoip.downloader.endpoint (#91182) 2022-11-09 09:17:33 -06:00
Roberto Seldner
8e35a6a846
Update documentation with supported IANA numbers (#90531)
Based on this:
https://github.com/elastic/elasticsearch/blob/main/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/CommunityIdProcessor.java#L440-L451
2022-10-19 08:23:11 -05:00
Lee Hinman
4fe9fc488c
Deprecate 'remove_binary' default of false for ingest attachment processor (#90460)
This commit adds a deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.

Relates to #86014
2022-10-04 01:04:40 +10:30
Abdon Pijpelink
00d4953df5
[DOCS] Fixes broken example in pipeline tutorial (#89315) 2022-08-16 08:40:10 +02:00
István Zoltán Szabó
5372c51dfd
[DOCS] Fixes a link that breaks the docs build. (#88111) 2022-06-28 10:22:23 +02:00
Ryan Ernst
eed8da3919
Move the ingest attachment processor to the default distribution (#87989)
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
2022-06-28 02:10:36 -04:00
Stuart Tettemer
d42211c431
Ingest: IngestDocument requires non-null version (#87665)
Changes the type of the version parameter in `IngestDocument` from
`Long` to `long` and moves it to the third argument, so all required
values occur before nullable arguments.

The `IngestService` expects a non-null version for a document and will
throw a `NullPointerException` if one is not provided.

Related: #87309
2022-06-15 07:50:45 -05:00
Martijn van Groningen
7154608abf
Allow pipeline processor to ignore missing pipelines (#87354)
Add `ignore_missing_pipeline` option to `pipeline` processor. This
controls whether the `pipeline` processor should fail with an error if
no pipeline with a name specified in the `name` option exists.

This enhancement is useful for setting up a pipeline infrastructure that
lazily adds extension points for overwrites, so that custom
pre-processing can be added at a later point in time for specific
cluster setups.
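
The behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch code: the `pipelines` registry (a dict mapping names to lists of callables) and the `run_pipeline_processor` helper are hypothetical, and only the `name` and `ignore_missing_pipeline` options come from the commit message.

```python
def run_pipeline_processor(pipelines, name, doc, ignore_missing_pipeline=False):
    """Run the named pipeline on doc, optionally tolerating a missing pipeline."""
    pipeline = pipelines.get(name)
    if pipeline is None:
        if ignore_missing_pipeline:
            # Extension point not defined yet: pass the document through as-is.
            return doc
        raise ValueError(f"pipeline [{name}] does not exist")
    for step in pipeline:
        doc = step(doc)
    return doc
```

With the flag set, a pipeline can reference an extension point such as a custom pre-processing pipeline before anyone has created it.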

Relates to #87323
2022-06-07 07:02:18 -04:00
wallrik
10f53f8766
Clarify environments with strict firewalls and GEOIP (RE: #85637) (#86648) 2022-05-23 06:43:26 -06:00
Luca Belluccini
1c52081b1f
[DOC] Air gapped environments and GEOIP (#85637)
* [DOC] Air gapped environments and GEOIP

Closing https://github.com/elastic/elasticsearch/issues/85542

* Use variable name for Elasticsearch

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-05-10 16:34:28 -04:00
Benjamin Trent
258d2b71e2
[ML] add roberta/bart docs (#85001)
adds roberta section to NLP tokenization documentation.
2022-03-17 12:14:57 -04:00
Benjamin Trent
45deac4c96
[ML] add windowing support for text_classification (#83989)
This commit adds initial windowing support for text_classification tasks.

Specifically, a user can now set a non-negative `span` that controls the tokenization window used when creating
sub-sequences.

The default value, `span: -1`, indicates that no windowing should take place.
2022-03-01 08:29:12 -05:00
James Rodewig
d3d468e5f1
[DOCS] Update screenshots for ingest pipeline docs (#83845)
https://github.com/elastic/kibana/pull/101216 adds a new ECS mapper feature to the Ingest Pipelines UI. This updates the ES docs to cover the new feature.
2022-02-23 10:50:02 -05:00
Chris
3e72ffcac9
[DOCS] Change license abbreviation (#82266)
As far as I can see the correct abbreviation for the CC `Attribution-ShareAlike 4.0 International` License is `CC BY-SA 4.0` https://creativecommons.org/licenses/by-sa/4.0/
2022-01-13 09:38:42 -05:00
David Kyle
1473b09415
[ML] Add NLP inference configs to the inference processor docs (#82320) 2022-01-11 08:50:45 +00:00
James Rodewig
f1004ee698
[DOCS] Fix xref for conditionally running ingest processor (#82001)
Closes #81966
2021-12-21 11:37:20 -05:00
Lisa Cawley
076343933f
[DOCS] Update link in inference processor (#81897) 2021-12-17 15:49:59 -08:00
Dan Hermann
b1f5373e02
Correct docs on output_format option for date processor (#81557) 2021-12-17 06:07:03 -06:00
Lisa Cawley
b18f5fd2c6
[DOCS] Fixes link to language identification example (#81347) 2021-12-03 17:21:04 -08:00
Jan Doberstein
73b3d8f639
Update execute-enrich-policy.asciidoc (#80750)
Changed the wording, as the execution of the policy does not trigger the delete. The delete is done periodically and can be configured with the `enrich.cleanup_period` setting.

https://www.elastic.co/guide/en/elasticsearch/reference/7.16/enrich-setup.html#ingest-enrich-settings
2021-11-16 11:57:04 +01:00
James Rodewig
f56a0f4b66
[DOCS] Remove testenv annotations from doc snippet tests (#80023)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
2021-11-05 18:38:50 -04:00
edh-oss
3c23a9e9cd
[DOCS] Remove [testenv="gold+"] attributes (#79309)
Changes:

* Removes several `[testenv="gold+"]` attributes from the docs. `gold+` is not a valid [subscription level](https://www.elastic.co/subscriptions) or testenv value.
* Moves two `[testenv="basic"]` attributes to the file header. This makes the `testenv` placement consistent and fixes the yml file generated from `docs/reference/snapshot-restore/register-repository.asciidoc`.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-27 16:32:30 -04:00
Michael Bischoff
c30ab868ee
[DOCS] Document range enrich policy (#79607)
Adding docs for the range enrich policy

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-26 15:15:53 +02:00
Dan Hermann
a23f58f809
[DOCS] if_version parameter for OCC on pipeline updates (#79640) 2021-10-25 08:25:26 -05:00
James Rodewig
58abbe941f
[DOCS] Fix cluster update settings refs (#79580)
The API is named 'cluster update settings,' not 'update cluster settings.'
2021-10-20 13:16:35 -04:00
Nikola Grcevski
055c770083
Deprecation of transient cluster settings (#78794)
This PR changes uses of transient cluster settings to
persistent cluster settings. 

The PR also deprecates the transient settings usage.

Relates to #49540
2021-10-15 13:00:52 -04:00
Martijn van Groningen
230e866842
Document a number of enrich node settings. (#78930)
Add a section in the docs that describes a number of node-level settings
for the enrich processor.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-14 15:00:45 +02:00
Martijn van Groningen
04e5823a69
Remove default maxmind geoip databases from distribution (#78362)
* Adjusted integration tests to use geoip test fixture or to use test databases provided via config dirs (for qa module / docs).
* Kept the geolite2-databases dependency for most of the unit tests only.
* Made fallback_to_default_databases parameter on geoip processor a noop and emit deprecation warning upon using it.
* If no geoip databases are available yet to a node, then the geoip processor factory returns a processor implementation that flags documents to indicate that databases are unavailable. This allows these documents to be reindexed later with a pipeline. These documents will have a tag string array field, which contains a string `_geoip_database_unavailable_{database_name}` for each missing database in a pipeline.
* Added reload pipeline capabilities in IngestService, so that when databases are available again on a node, pipelines with a geoip processor definition can be reloaded.
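
A minimal Python sketch of how a client might spot documents that were flagged this way and need reindexing. Only the `_geoip_database_unavailable_` prefix comes from the commit message. The `tags` field name, the `missing_databases` helper, and the sample database name are assumptions for illustration.

```python
PREFIX = "_geoip_database_unavailable_"


def missing_databases(doc, field="tags"):
    """Return the database names recorded as unavailable on this document."""
    return [tag[len(PREFIX):] for tag in doc.get(field, []) if tag.startswith(PREFIX)]


docs = [
    {"ip": "10.0.0.1", "tags": ["_geoip_database_unavailable_GeoLite2-City.mmdb"]},
    {"ip": "10.0.0.2", "tags": ["web"]},
    {"ip": "10.0.0.3"},
]

# Documents with at least one flagged database are candidates for reindexing.
to_reindex = [doc for doc in docs if missing_databases(doc)]
```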

Relates to #68920
2021-10-13 14:52:18 +02:00
James Rodewig
a763a86a0d
[DOCS] Update ingest node pipeline refs (#78770)
In https://github.com/elastic/kibana/pull/113783, we renamed Kibana's **Ingest Node Pipelines** feature to **Ingest Pipelines**. This updates screenshots and references for the feature. It also replaces a few remaining `ingest node pipeline` references.
2021-10-12 08:18:24 -04:00
edh-oss
62a471aefe
Update JSON parser and snippets (#77983)
Related to issue  #77823

This does the following:

- Updates several asciidoc files that contained code snippets with
  invalid JSON, most involving unnecessary trailing commas.

- Makes the switch from the Groovy JSON parser to the Jackson parser,
  pursuant to the general goal of eliminating Groovy dependence.

- Makes testing of JSON validity at build time more strict.

Note that this update still allows backslash escaping for any
character. Currently that matters because of the file
"docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc",
specifically this part:

    "attributes" : {
      "ml.machine_memory" :
        "$body.datafeeds.0.node.attributes.ml\.machine_memory",
      "ml.max_open_jobs" : "512"
    }

It's not clear to me what change, if any, is appropriate there. So,
I've left in the escaped period and configured the parser to ignore
it for the time being.
2021-09-20 11:08:26 +01:00
Dan Hermann
09004d30dc
[DOCS] ECS support for the grok processor (#77059) 2021-09-10 13:10:28 -05:00
Martijn van Groningen
1ae4f3c937
Add enrich node cache (#76800)
Introduce an LRU cache to avoid searches that occur frequently
from the enrich processor.
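
For readers unfamiliar with the technique, a generic least-recently-used cache can be sketched in Python as follows. This is not the enrich cache implementation. The `LruCache` class and its `get_or_compute` method are hypothetical names, and `compute` stands in for the search the cache is meant to avoid.

```python
from collections import OrderedDict


class LruCache:
    """Keep at most `capacity` results, evicting the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: run the expensive lookup and remember the result.
        value = compute(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # Drop the entry that has gone unused the longest.
            self._entries.popitem(last=False)
        return value
```

Repeated lookups for the same key then cost a dictionary access instead of a search.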

Relates to #48988
2021-09-03 09:33:44 +02:00
Francois-Clement Brossard
ec11f9f931
Execute enrich policy wait_for_completion docfix (#77046) 2021-08-31 14:02:24 -04:00
Dan Hermann
c4aad2965f
[DOCS] Map iteration support in ForEach processor (#76972) 2021-08-27 07:35:11 -05:00
James Rodewig
c238296b7a
[DOCS] Add missing timeout param to create pipeline API docs (#76432) 2021-08-12 14:32:34 -04:00
Keith Massey
498684a696
Add support for _meta field to ingest pipelines (#75905)
We are adding a _meta field to many of our REST APIs so that users can attach whatever metadata they
want. The data in this field will not be used by Elasticsearch. This commit adds the _meta field to ingest
pipelines.
2021-08-11 08:30:36 -05:00
Dan Hermann
c81cf2f7fe
Configurable media_type for mustache template encoding on append processor (#76210) 2021-08-10 15:13:36 -05:00
James Rodewig
c277110a14
[DOCS] Merge ingest APIs index to one page (#76264)
Adds some `discrete` tags to merge the ingest APIs index into a single
page.
2021-08-10 09:30:47 -04:00