408 commits

Author SHA1 Message Date
Liam Thompson
52aefa59eb
[DOCS] Ingest processors docs improvements (#104384)
* [DOCS] Categorize ingest processors on overview page, summarize use cases

* Add overview info, subheading, links

* Apply suggestions from review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Insert space

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-01-17 11:50:29 +01:00
ShourieG
147484b059
[elasticsearch][processors] - Added support for override flag in rename processor (#103565)
* added override flag for rename processor along with factory tests

* added yaml tests for rename processor using the override flag

* updated renameProcessor tests to include override flag as a parameter

* updated rename processor tests to incorporate override flag = true scenario

* updated rename processor asciidoc with override option

* updated rename processor asciidoc with override option

* removed unnecessary suppresswarnings tag

* corrected formatting errors

* updated processor tests

* fixed yaml tests

* Prefer early throw style here

* Whitespace

* Move and rewrite this test

It's just a simple test of the primary behavior of the rename
processor, so put it first and simplify it.

* Rename this test

It doesn't actually exercise template snippets

* Tidy up this test

---------

Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-01-11 16:00:02 +05:30
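The override behavior described in the commit above can be illustrated with a minimal Python sketch. This is a simulation on a plain dictionary, not the Elasticsearch implementation; the function name `rename_field` and the exception types are assumptions chosen for illustration. It follows the "early throw" style the commit message mentions.

```python
def rename_field(doc, field, target_field, override=False):
    """Rename `field` to `target_field` in a dict-based document (illustrative only)."""
    # Fail early if the source field is absent.
    if field not in doc:
        raise KeyError(f"field [{field}] doesn't exist")
    # Fail early if the target exists and the caller did not ask to override it.
    if target_field in doc and not override:
        raise ValueError(f"field [{target_field}] already exists")
    # Move the value; with override=True an existing target is replaced.
    doc[target_field] = doc.pop(field)
    return doc
```

With `override=True`, an existing target field is replaced by the renamed value; without it, the same call raises.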
Adam Demjen
a26ff243f6
[Docs] [Enterprise Search] ML inference pipeline documentation updates (#103022)
* Remove mapping step, wording and screenshot updates

* Notes about pipeline name and model deployment

* Address CR comments
2024-01-02 09:56:50 -05:00
Abdon Pijpelink
ac973f0064
[DOCS] Improve enrich policy execute 'wait_for_completion' docs (#102291)
* [DOCS] Improve enrich policy execute 'wait_for_completion' docs

* Update docs/reference/ingest/apis/enrich/execute-enrich-policy.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

---------

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-11-27 17:17:06 +01:00
Abdon Pijpelink
bc59315baa
[DOCS] Examples for ES|QL DISSECT and WHERE (#102591)
* DISSECT examples

* WHERE examples

* Remove references to empty keys

* Fix non-deterministic test
2023-11-27 10:56:48 +01:00
Keith Massey
643d825c45
Adding a simulate ingest api (#101409)
This commit introduces a new _ingest/simulate API that runs any pipelines
on the given data that would be executed for a given index, but instead of
indexing the data into the index, returns the transformed documents and
the list of pipelines that were executed.
2023-11-15 17:25:09 -06:00
Liam Thompson
ddd94446f8
[DOCS] Fix incorrect image paths (#102082) 2023-11-13 16:00:00 +01:00
Felix Barnsteiner
978a5469ce
Add support for marking component templates as deprecated (#101148) 2023-11-02 19:28:20 +01:00
István Zoltán Szabó
c34e0c0746
[DOCS] Clarifies that inference input must be single string (#101301) 2023-10-25 17:18:05 +02:00
Liam Thompson
a6ed18c144
[DOCS] [Enterprise Search] Migrate ingest pipelines/ML docs (#101156)
* WIP, port docs

- Update link syntax
- Update ids
- Fix n^n build failures :/
-

* Fix id for doclink

* Let's try this on for size

* Idem

* Update attributes, Test image rendering

* Update image name

* Fix typo

* Update filename

* Add images, cleanup, standardize naming

* Tweak heading

* Cleanup, rewordings

- Modified introduction in `search-inference-processing.asciidoc`.
- Changed "Search connector" to "Elastic connector".
- Adjusted heading levels in `search-inference-processing.asciidoc`.
- Simplified ingest pipelines intro in `search-ingest-pipelines.asciidoc`.
- Edited ingest pipelines section for the *Content* UI.
- Reordered file inclusions in `search-ingest-pipelines.asciidoc`.
- Formatted inference pipeline creation into steps in `search-nlp-tutorial.asciidoc`.

* Lingering erroneousness

* Delete FAQ
2023-10-25 17:17:24 +02:00
Abdon Pijpelink
284f81873f
[DOCS] Expand ES|QL DISSECT and GROK documentation (#101225)
* Add 'Process data with DISSECT and GROK' page

* Expand DISSECT docs

* More DISSECT and GROK enhancements

* Improve examples

* Fix CSV tests

* Review feedback

* Reword
2023-10-25 13:19:17 +02:00
Felix Barnsteiner
75d9bd7790
Rename component templates and pipelines according to the new naming conventions (#99975)
- Creates a new StackTemplateRegistry that uses the new names
- The new registry only respects stack.templates.enabled for index templates
- Renames the old registry to LegacyStackTemplateRegistry
- Component templates are not duplicated but registered under two different names
- Documents the new naming convention
- Index templates are not renamed, at least for now, as there are some challenges with it
  See 7fd0423 for more details.
2023-10-25 11:56:28 +02:00
Abdon Pijpelink
fcdeb21993
[DOCS] Expand ES|QL ENRICH documentation (#101079)
* [DOCS] Expand ES|QL ENRICH documentation

* Add examples to 'Enrich data' page

* Add another diagram

* Remove redirect that's no longer needed

* Review feedback
2023-10-19 17:14:21 +02:00
István Zoltán Szabó
446ac9f378
[DOCS] Updates ELSER tutorial with inference processor changes (#100420)
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-10-11 17:33:20 +02:00
David Kyle
6cde0df463
[ML] More checks and tests for parsing Inference processor config (#100335)
Following on from #100205 this PR adds more tests and checks 
for corner cases when parsing the configuration.
2023-10-06 15:10:45 +01:00
David Kyle
b055204b43
[ML] Simplify the Inference Ingest Processor configuration (#100205)
Adds an `input_output` option that removes the need for a `field_map` and/or
target fields. Multiple inputs can be specified in `input_output`.
2023-10-03 18:42:31 +01:00
István Zoltán Szabó
e0cc375b14
[DOCS] Adds text_expansion config to inference processor reference docs. (#99900) 2023-09-26 12:58:19 +02:00
Felix Barnsteiner
3a7bdb5838
Make reroute processor GA (#99531) 2023-09-20 13:22:36 +02:00
Marius Iversen
4b41b17772
Update documentation for Set Processor (#99191) 2023-09-07 14:47:07 -04:00
Joe Gallo
3284903205
Document the redact processor's skip_if_unlicensed option (#99063) 2023-08-31 14:00:12 -04:00
James Baiera
7d990d5a09
Allow custom geo ip database files to be downloaded (#97850)
This PR extends the assumptions we make about database file availability to all database file 
names instead of the default ones we host at Elastic. When creating a geo ip processor with 
a database name that is not recognized we unilaterally convert the processor to one that 
tags documents with a missing database message until the database file requested is 
downloaded or provided via the manual configuration route. This allows a pipeline to be 
created and for the download service to be started, potentially sourcing the needed files.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-08-16 00:31:51 -04:00
James Rodewig
fe6a42b35f
[DOCS] Update Elastic GeoIP service link (#97455)
Adds TOS-related query parameters to the Elastic GeoIP link in the [GeoIP ingest processor docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/geoip-processor.html). The current link returns a 400 HTTP status.
2023-07-07 10:53:02 -04:00
Aurélien FOUCRET
dd1d157b47
Enable analytics geoip in behavioral analytics. (#96624)
* When using a managed pipeline GeoIpDownloader is triggered only when an index exists for the pipeline.

* When using a managed pipeline GeoIpDownloader is triggered only when an index exists for the pipeline.

* Adding the geoip processor back

* Adding tags to the events mapping.

* Fix a forbidden API call in tests.

* lint

* Adding integration tests for managed pipelines.

* lint

* Add a geoip_database_lazy_download param to pipelines and use it instead of managed.

* Fix an edge case: the pipeline can be set after the index is created.

* lint.

* Update docs/changelog/96624.yaml

* Update 96624.yaml

* Uses a processor setting (download_database_on_pipeline_creation) to decide database download strategy.

* Removing debug instruction.

* Improved documentation.

* Improved the way to check for referenced pipelines.

* Fixing an error in test.

* Improved integration tests.

* Lint.

* Fix failing tests.

* Fix failing tests (2).

* Adding javadoc.

* lint javadoc.

* Using a set instead of a list to store checked pipelines.
2023-06-15 23:42:10 +02:00
debadair
777598d602
[DOCS] Remove redirect pages (#88738)
* [DOCS] Remove manual redirects

* [DOCS] Removed refs to modules-discovery-hosts-providers

* [DOCS] Fixed broken internal refs

* Fixing bad cross links in ES book, and adding redirects.asciidoc[] back into docs/reference/index.asciidoc.

* Update docs/reference/search/point-in-time-api.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/setup/restart-cluster.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/sql/endpoints/translate.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/snapshot-restore/restore-snapshot.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update repository-azure.asciidoc

* Update node-tool.asciidoc

* Update repository-azure.asciidoc

---------

Co-authored-by: amyjtechwriter <61687663+amyjtechwriter@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-05-24 12:32:46 +01:00
István Zoltán Szabó
b164555072
[DOCS] Adds deployment ID param documentation to trained model APIs (#96174) 2023-05-17 15:56:58 +02:00
amyjtechwriter
c3e186ea01
Example of dot notation to access an array field for set processor. (#95893) 2023-05-09 10:21:27 +01:00
amyjtechwriter
3d6143b829
Nodes need access to storage.googleapis.com for geoip. (#95554) 2023-04-28 10:40:18 +01:00
Felix Barnsteiner
11b598a519
Add reroute processor (#76511) 2023-04-18 19:09:25 +02:00
Joe Gallo
9bc09d576a
Fix ignore_missing docs for a couple of Ingest processors (#95244) 2023-04-13 16:34:40 -04:00
Aurélien FOUCRET
9071d114f5
[Ingest Processor] Add ignore_missing param to the uri_parts ingest processor. (#95068) 2023-04-13 15:11:19 +02:00
Jean-Fabrice Bobo
a7e901263b
Update geoip.asciidoc (#95101)
Fix `ingest.geoip.downloader.eager.download` setting not appearing in the rendered documentation
2023-04-12 09:59:27 +02:00
Alessandro Stoltenberg
c787e3808f
docs: set-processor minor update (#94899) 2023-03-30 14:27:05 +02:00
Dimitris Kotsakos
38a09bea60
[ML] Make redact processor experimental for first release (#94683) 2023-03-23 18:28:03 +02:00
Joe Gallo
36aeb00835
Add an example of dot_expander's path option (#94291) 2023-03-06 09:26:40 -05:00
David Kyle
f8e306e688
Rewrite Redact Processor docs intro (#93856)
Focus on what redact does rather than describing Grok
2023-02-16 14:17:54 +00:00
David Kyle
b588d2ddd7
Redact Ingest Processor (#92951)
The Redact processor uses the Grok rules engine to
redact text in the input document that matches the
Grok pattern. For example, email or IP addresses can
be redacted using the definitions from the standard
Grok pattern bank. New patterns can be defined in
the processor configuration.
2023-02-07 17:10:07 +00:00
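As a rough illustration of the redaction idea in the commit above, the following Python sketch replaces matches with a placeholder named after the pattern. It uses Python's `re` module rather than Grok, and the pattern names, the simplified regular expressions, and the `<NAME>` placeholder format are all assumptions made for this example.

```python
import re

# Hypothetical, simplified stand-ins for entries in a Grok pattern bank.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def redact(text):
    """Replace every match of each pattern with a <NAME> placeholder."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<{name}>", text)
    return text
```

Additional patterns could be supported by adding entries to `PATTERNS`, loosely mirroring how new patterns are defined in the processor configuration.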
Craig Taverner
c18078e11e
Geo_grid ingest processor docs (#93507)
* Add docs for geo_grid ingest processor

Adds docs for https://github.com/elastic/elasticsearch/pull/93370

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Consistent GeoJSON case

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-02-06 16:17:00 +01:00
Keith Massey
13b71900a6
Download the geoip databases only when needed (#92335)
This commit changes the geoip downloader so that we only download the geoip databases if you
have at least one geoip processor in your cluster, or when you add a new geoip processor (or if
`ingest.geoip.downloader.eager.download` is explicitly set to true).
2023-01-30 13:07:48 -06:00
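The download decision described in the commit above might be sketched as follows. This is a simplified model, not the actual downloader code: pipelines are assumed to be dictionaries with a `processors` list, each processor is assumed to be a single-key dictionary keyed by its type, and `should_download_geoip` is a hypothetical helper name.

```python
def should_download_geoip(pipelines, eager_download=False):
    """Return True if the geoip databases should be downloaded (illustrative only)."""
    # An explicit eager setting forces the download regardless of pipelines.
    if eager_download:
        return True
    # Otherwise download only if at least one pipeline has a geoip processor.
    return any(
        "geoip" in processor
        for pipeline in pipelines
        for processor in pipeline.get("processors", [])
    )
```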
Keith Massey
f327352601
Making JsonProcessor stricter so that it does not silently drop data (#93179)
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
json field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors" : [
      {
        "json" : {
          "field" : "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```

Closes #92898
2023-01-24 18:43:35 -05:00
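The difference between the old lenient behavior and the new strict default in the commit above can be approximated with Python's standard `json` module. This is an analogy rather than the JsonProcessor code: `parse_json_field` is a hypothetical helper, and Python's `ValueError` stands in for the `IllegalArgumentException` mentioned in the commit message.

```python
import json

def parse_json_field(text, strict_json_parsing=True):
    """Parse a JSON string, optionally ignoring trailing data (illustrative only)."""
    if strict_json_parsing:
        # json.loads rejects any non-whitespace content after the first value.
        return json.loads(text)
    # raw_decode parses the first value and reports where it stopped,
    # so anything after that position is silently dropped.
    value, _end = json.JSONDecoder().raw_decode(text.lstrip())
    return value
```

For the input `123 "foo"`, the lenient mode yields `123` and drops the rest, while the strict mode raises an error.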
David Kilfoyle
37f7b7b325
[Docs] Update remove processor with 'keep' option (#92836) 2023-01-11 12:52:35 -05:00
Keith Massey
d5b4584612
Adding more detail about ingest.geoip.downloader.endpoint (#91182) 2022-11-09 09:17:33 -06:00
Roberto Seldner
8e35a6a846
Update documentation with supported IANA numbers (#90531)
Based on this:
https://github.com/elastic/elasticsearch/blob/main/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/CommunityIdProcessor.java#L440-L451
2022-10-19 08:23:11 -05:00
Lee Hinman
4fe9fc488c
Deprecate 'remove_binary' default of false for ingest attachment processor (#90460)
This commit adds a deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.

Relates to #86014
2022-10-04 01:04:40 +10:30
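The pattern in the commit above, warning only when a setting is left unset so that a later default change does not surprise users, can be sketched in Python. The helper name `resolve_remove_binary`, the dictionary-based config, and the warning text are assumptions for illustration; the real processor is not written this way.

```python
import warnings

def resolve_remove_binary(config):
    """Return the effective remove_binary value, warning if it was not set."""
    if "remove_binary" not in config:
        # Unset: keep the current default (False) but tell the user to be explicit.
        warnings.warn(
            "remove_binary is unset; set it explicitly because the default may change",
            DeprecationWarning,
            stacklevel=2,
        )
        return False
    return bool(config["remove_binary"])
```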
Abdon Pijpelink
00d4953df5
[DOCS] Fixes broken example in pipeline tutorial (#89315) 2022-08-16 08:40:10 +02:00
István Zoltán Szabó
5372c51dfd
[DOCS] Fixes a link that breaks the docs build. (#88111) 2022-06-28 10:22:23 +02:00
Ryan Ernst
eed8da3919
Move the ingest attachment processor to the default distribution (#87989)
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
2022-06-28 02:10:36 -04:00
Stuart Tettemer
d42211c431
Ingest: IngestDocument requires non-null version (#87665)
Changes the type of the version parameter in `IngestDocument` from
`Long` to `long` and moves it to the third argument, so all required
values occur before nullable arguments.

The `IngestService` expects a non-null version for a document and will
throw a `NullPointerException` if one is not provided.

Related: #87309
2022-06-15 07:50:45 -05:00
Martijn van Groningen
7154608abf
Allow pipeline processor to ignore missing pipelines (#87354)
Add `ignore_missing_pipeline` option to `pipeline` processor. This
controls whether the `pipeline` processor should fail with an error if
no pipeline with a name specified in the `name` option exists.

This enhancement is useful for setting up a pipeline infrastructure that
lazily adds extension points for overwrites, so that custom
pre-processing can be added at a later point in time for specific
cluster setups.

Relates to #87323
2022-06-07 07:02:18 -04:00
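A minimal Python sketch of the `ignore_missing_pipeline` behavior from the commit above follows. It models pipelines as lists of callables held in a dictionary; the function name `run_pipeline_processor`, the registry shape, and the `LookupError` are assumptions for illustration, not the Elasticsearch implementation.

```python
def run_pipeline_processor(doc, name, pipelines, ignore_missing_pipeline=False):
    """Run the named pipeline on doc, optionally skipping it if it does not exist."""
    pipeline = pipelines.get(name)
    if pipeline is None:
        if ignore_missing_pipeline:
            # Missing extension point: leave the document untouched.
            return doc
        raise LookupError(f"pipeline [{name}] does not exist")
    # Apply each step of the referenced pipeline in order.
    for step in pipeline:
        doc = step(doc)
    return doc
```

A base pipeline could call a hypothetical `custom` pipeline this way; until that pipeline is registered, documents pass through unchanged.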
wallrik
10f53f8766
Clarify environments with strict firewalls and GEOIP (RE: #85637) (#86648) 2022-05-23 06:43:26 -06:00
Luca Belluccini
1c52081b1f
[DOC] Air gapped environments and GEOIP (#85637)
* [DOC] Air gapped environments and GEOIP

Closing https://github.com/elastic/elasticsearch/issues/85542

* Use variable name for Elasticsearch

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-05-10 16:34:28 -04:00