452 commits

Author SHA1 Message Date
Joe Gallo
aee9aa484f
[8.17] Minor docs fixes (#126144) (#126153) 2025-04-02 13:02:08 -04:00
Joe Gallo
1a252841a1
Update geolocation database documentation (#121472) (#121670) 2025-02-05 02:22:30 +11:00
Lisa Cawley
ebceb76d49
[DOCS] More links to new API site (#119377) (#119418)
(cherry picked from commit ba8beecdb0)
2024-12-31 20:21:46 +00:00
Lisa Cawley
f6a2a991d5
[DOCS] Link to new API site (#119038) (#119360)
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
2024-12-31 04:18:27 +11:00
István Zoltán Szabó
db4f33043e
[DOCS] Adds examples to inference processor docs (#116018) (#118134)
(cherry picked from commit f27cb5efd3)
2024-12-06 10:26:17 +01:00
Joe Gallo
7427eb97b6
Document new ip_location processor (#116623) (#116630) 2024-11-12 12:20:52 +11:00
Joe Gallo
807d988c9d
Document new ip_location APIs (#116611) (#116615) 2024-11-12 06:24:17 +11:00
Joe Gallo
c8134bf787
Document new ip geolocation fields (#116603) (#116606) 2024-11-12 03:41:21 +11:00
István Zoltán Szabó
45af6f97f0
[DOCS] Updates inference processor docs. (#115566) (#115627) 2024-10-25 19:44:51 +11:00
Keith Massey
35f7efefd1
Adding support for additional mapping to simulate ingest API (#114742) (#115284) 2024-10-22 08:13:33 -05:00
Pete Gillin
d3535d5a64
Actually add terminate docs page (#114440) (#114478)
A docs page for the `terminate` processor was added in
https://github.com/elastic/elasticsearch/pull/114157, but the change
to include it in the outer processor reference page was omitted. This
change corrects that oversight.
2024-10-10 19:00:53 +11:00
Pete Gillin
6ec7a3439d
Add a terminate ingest processor (#114157) (#114343)
This processor simply causes any remaining processors in the pipeline
to be skipped. It will normally be executed conditionally using the
`if` option. (If this pipeline is being called from another pipeline,
the calling pipeline is *not* terminated.)

For example, this:

```
POST /_ingest/pipeline/_simulate
{
  "pipeline":
  {
    "description": "Appends just 'before' to the steps field if the error field is present, or both 'before' and 'after' if not",
    "processors": [
      {
        "append": {
          "field": "steps",
          "value": "before"
        }
      },
      {
        "terminate": {
          "if": "ctx.error != null"
        }
      },
      {
        "append": {
          "field": "steps",
          "value": "after"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "doc1",
      "_source": {
        "name": "okay",
        "steps": []
      }
    },
    {
      "_index": "index",
      "_id": "doc2",
      "_source": {
        "name": "bad",
        "error": "oh no",
        "steps": []
      }
    }
  ]
}
```

returns something like this:

```
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc1",
        "_source": {
          "name": "okay",
          "steps": [
            "before",
            "after"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448881Z"
        }
      }
    },
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc2",
        "_source": {
          "name": "bad",
          "error": "oh no",
          "steps": [
            "before"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448932Z"
        }
      }
    }
  ]
}
```
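
The note about calling pipelines is not exercised by the request above. As a rough illustration only, here is a toy Python model of that behavior. It is not Elasticsearch code: the processor dictionaries, the `run_pipeline` helper, and the `inner`/`outer` pipeline names are all hypothetical.

```python
# Toy model of the terminate semantics described above; not Elasticsearch code.
# The processor shapes and helper names are hypothetical.

def run_pipeline(processors, doc, pipelines):
    """Apply processors to doc in order; stop early on a matching terminate."""
    for proc in processors:
        kind = proc["type"]
        if kind == "append":
            doc.setdefault(proc["field"], []).append(proc["value"])
        elif kind == "terminate":
            # Skip the remaining processors of *this* pipeline only.
            if proc["if"](doc):
                return doc
        elif kind == "pipeline":
            # A nested pipeline may terminate itself; the caller carries on.
            run_pipeline(pipelines[proc["name"]], doc, pipelines)
    return doc


pipelines = {
    "inner": [
        {"type": "append", "field": "steps", "value": "inner-before"},
        {"type": "terminate", "if": lambda d: d.get("error") is not None},
        {"type": "append", "field": "steps", "value": "inner-after"},
    ],
    "outer": [
        {"type": "pipeline", "name": "inner"},
        {"type": "append", "field": "steps", "value": "outer-after"},
    ],
}

okay = run_pipeline(pipelines["outer"], {"name": "okay", "steps": []}, pipelines)
bad = run_pipeline(
    pipelines["outer"], {"name": "bad", "error": "oh no", "steps": []}, pipelines
)
print(okay["steps"])  # ['inner-before', 'inner-after', 'outer-after']
print(bad["steps"])   # ['inner-before', 'outer-after']
```

In this model the `bad` document skips `inner-after` but still receives `outer-after`, which mirrors the statement that the calling pipeline is not terminated.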
2024-10-09 16:44:57 +01:00
Keith Massey
37125f265d
Adding index_template_substitutions to the simulate ingest API (#114128) (#114374)
This adds support for a new `index_template_substitutions` field to the
body of an ingest simulate API request. These substitutions can be used
to change the pipeline(s) used for ingest, or to change the mappings
used for validation. It is similar to the
`component_template_substitutions` added in #113276. Here is an example
that shows both of those usages working together:

```
## First, add a couple of pipelines that set a field to a boolean:
PUT /_ingest/pipeline/foo-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "foo",
        "value": true
      }
    }
  ]
}

PUT /_ingest/pipeline/bar-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "bar",
        "value": true
      }
    }
  ]
}

## Now, create three component templates. One provides a mapping that enforces that the only field is "foo"
## and that field is a keyword. The next is similar, but adds a `bar` field. The final one provides a setting
## that makes "foo-pipeline" the default pipeline.
## Remember that the "foo-pipeline" sets the "foo" field to a boolean, so using both of these templates
## together would cause a validation exception. These could be in the same template, but are provided
## separately just so that later we can show how multiple templates can be overridden.
PUT _component_template/mappings_template
{
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "foo": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT _component_template/mappings_template_with_bar
{
    "template": {
      "mappings": {
        "dynamic": "strict",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "boolean"
          }
        }
      }
    }
}

PUT _component_template/settings_template
{
  "template": {
    "settings": {
      "index": {
        "default_pipeline": "foo-pipeline"
      }
    }
  }
}

## Here we create an index template pulling in the mappings_template and settings_template component templates above
PUT _index_template/template_1
{
  "index_patterns": ["foo*"],
  "composed_of": ["mappings_template", "settings_template"]
}

## We can index a document here to create the index, or not. Either way the simulate call ought to work the same
POST foo-1/_doc
{
  "foo": "FOO"
}

## This will not blow up with validation exceptions because the substitute template_1 in
## "index_template_substitutions" uses `mappings_template_with_bar`, which adds the bar field.
## And the bar-pipeline is executed rather than the foo-pipeline because
## "component_template_substitutions" supplies a substitute `settings_template`, so the value of "foo"
## does not get set to an invalid type.
POST _ingest/_simulate?pretty&index=foo-1
{
  "docs": [
    {
      "_id": "asdf",
      "_source": {
        "foo": "foo",
        "bar": "bar"
      }
    }
  ],
  "component_template_substitutions": {
    "settings_template": {
      "template": {
        "settings": {
          "index": {
            "default_pipeline": "bar-pipeline"
          }
        }
      }
    }
  },
  "index_template_substitutions": {
    "template_1": {
      "index_patterns": ["foo*"],
      "composed_of": ["mappings_template_with_bar", "settings_template"]
    }
  }
}
```
2024-10-09 13:04:23 +11:00
István Zoltán Szabó
bca80f7797
[DOCS] Adds DeBERTA v2 to the tokenizers list in API docs (#112752) (#114203)
Co-authored-by: Max Hniebergall <137079448+maxhniebergall@users.noreply.github.com>
2024-10-07 19:48:43 +11:00
Sam Xiao
d405df9679
Tag redacted document in ingest pipeline (#113552) (#113750)
Adds a new option `trace_redact` to the redact processor to indicate that a document has been redacted in the ingest pipeline. If a document is processed by a redact processor and any field is redacted, the ingest metadata `_ingest._redact._is_redacted` is set to `true`.
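
As a minimal sketch of that behavior, the toy Python model below flags a document only when something was actually redacted. It is not the processor's implementation: the real processor matches Grok patterns, whereas the `EMAIL` regular expression and the `redact` helper here are hypothetical stand-ins.

```python
import re

# Hypothetical stand-in for a Grok pattern; the real processor uses Grok.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(doc, field, trace_redact=False):
    """Replace e-mail addresses in doc["_source"][field]; optionally flag the doc."""
    redacted, count = EMAIL.subn("<EMAIL>", doc["_source"][field])
    doc["_source"][field] = redacted
    if trace_redact and count > 0:
        # Metadata path taken from the commit message above.
        doc.setdefault("_ingest", {}).setdefault("_redact", {})["_is_redacted"] = True
    return doc


hit = redact(
    {"_source": {"message": "contact jane@example.com"}}, "message", trace_redact=True
)
miss = redact(
    {"_source": {"message": "nothing sensitive"}}, "message", trace_redact=True
)
print(hit["_source"]["message"])  # contact <EMAIL>
print("_ingest" in miss)          # False
```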

Closes #94633

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-10-04 06:26:23 +10:00
Liam Thompson
a7dfc80005
[DOCS] Port connector docs from Enterprise Search guide (#112953) (#113771)
(cherry picked from commit 6e400c12a7)
2024-09-30 19:01:25 +10:00
Simon Cooper
53d9c3cc6a
Add some information on locale database to the ES docs (#113587) 2024-09-30 09:28:13 +01:00
kosabogi
ff926182f1
Adds text_similarity task type to inference processor documentation (#113517) (#113612) 2024-09-27 00:38:48 +10:00
Keith Massey
7870e2dbe2
Adding component template substitutions to the simulate ingest API (#113276) (#113567) 2024-09-26 07:32:13 +10:00
Simon Cooper
ceb9deff89
Use deprecation logger for CLDR date format specifiers (#112917)
The addition of the logger requires several updates to tests to deal with the possible warning, or muting if there is no way to specify an allowed (but not mandatory) warning
2024-09-19 15:50:37 +01:00
Stef Nestor
b9662b505b
(Doc+) Inference Pipeline ignores Mapping Analyzers (#112522) (#112776)
* (Doc+) Inference Pipeline ignores Mapping Analyzers

From internal Dev feedback (will cross-link after), this updates the docs to state that inference processors within ingest pipelines run before mapping analyzers, effectively ignoring them. So if users want analyzers to take effect, they need to select the ingest processor equivalent of the analyzer and run it earlier in the pipeline than the inference processor.

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-09-12 08:30:07 +10:00
Keith Massey
4aa3c3d7ee
Add support for templates when validating mappings in the simulate ingest API (#111161) 2024-09-05 09:25:53 -05:00
Panos Koutsovasilis
29453cb2ce
fix: support all allowed protocol numbers (#111528)
* fix(CommunityIdProcessor): support all allowed protocol numbers

* fix(CommunityIdProcessor): update documentation
2024-08-26 08:37:40 +03:00
Niels Bauman
e0c1ccbc1e
Make enrich cache based on memory usage (#111412)
The max enrich cache size setting now also supports an absolute max size in bytes (of used heap space) and a percentage of the max heap space, next to the existing flat document count. The default is 1% of the max heap space.

This should prevent issues where the enrich cache takes up a lot of memory when there are large documents in the cache.
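
To make the three accepted forms concrete, here is a rough Python sketch of how such a value could be resolved into a limit. It is an illustration under assumptions, not the setting's actual parser: the `resolve_cache_limit` helper and the unit table are hypothetical.

```python
import re

# Hypothetical unit table, for illustration only.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def resolve_cache_limit(value, max_heap_bytes):
    """Return ("bytes", n) or ("documents", n) for a cache size value."""
    text = str(value).strip().lower()
    if text.endswith("%"):
        # Percentage of the maximum heap space.
        return ("bytes", int(max_heap_bytes * float(text[:-1]) / 100))
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb)", text)
    if match:
        # Absolute size in bytes.
        return ("bytes", int(match.group(1)) * UNITS[match.group(2)])
    # Flat document count, the form that existed before this change.
    return ("documents", int(text))


one_gib = 1024**3
print(resolve_cache_limit("1%", one_gib))     # ('bytes', 10737418)
print(resolve_cache_limit("100mb", one_gib))  # ('bytes', 104857600)
print(resolve_cache_limit(1000, one_gib))     # ('documents', 1000)
```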
2024-08-23 09:26:55 +02:00
István Zoltán Szabó
1ba72e4602
[DOCS] Documents output_field behavior after multiple inference runs (#111875)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-08-15 12:36:59 +02:00
Keith Massey
c6a7537df7
Ingest download databases docs (#111688)
Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-08-08 09:23:56 -05:00
Joe Gallo
1aa5b2face
Fix geoip processor isp_organization_name property and docs (#111372) 2024-07-26 18:28:44 -04:00
Niels Bauman
86727a8741
Add size_in_bytes to enrich cache stats (#110578)
As preparation for #106081, this PR adds the `size_in_bytes`
field to the enrich cache. This field is calculated by summing
the BytesReference sizes of all the search hits in the cache.
It's not a perfect representation of the size of the enrich cache
on the heap, but some experimentation showed that it's quite close.
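
The summation can be pictured with a small Python sketch. This is a toy model, not the cache's code: the cache is assumed to be a plain dictionary mapping a lookup key to a list of serialized hits, and `size_in_bytes` is a hypothetical helper.

```python
def size_in_bytes(cache):
    """Sum the serialized size of every cached search hit."""
    # Keys and per-object overhead are ignored, which is why the figure
    # only approximates the real heap usage.
    return sum(len(hit) for hits in cache.values() for hit in hits)


cache = {
    "query-1": [b'{"zip":"12345"}', b'{"zip":"67890"}'],
    "query-2": [b'{"zip":"00000"}'],
}
print(size_in_bytes(cache))  # 45
```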
2024-07-12 08:53:53 +02:00
Matt Culbreth
81b8495388
Mark the Redact processor as Generally Available 2024-07-02 16:58:57 -04:00
Kathleen DeRusso
7a1d532ffb
Pass over Sparse Vector docs for correctness (#110282)
* Remove legacy mentions of text expansion queries

* Add missing query_vector param to sparse_vector query docs

* Fix formatting errors in sparse vector query dsl doc

* Remove unnecessary test setup block
2024-07-02 13:37:25 -04:00
Joe Gallo
d9941f6285
Ingest geoip new databases release highlight (#109355) 2024-06-04 12:48:19 -04:00
Joe Gallo
e1b2b599de
Add continent_code support to the geoip processor (#108780) 2024-05-17 11:48:23 -04:00
Joe Gallo
babab0a8c0
Add support for the 'Connection Type' database to the geoip processor (#108683) 2024-05-15 17:58:08 -04:00
Keith Massey
639eee577e
Adding user_type support for the enterprise database for the geoip processor (#108687) 2024-05-15 12:23:52 -05:00
Keith Massey
69ec54d541
Add support for the 'ISP' database to the geoip processor (#108651) 2024-05-15 09:27:06 -05:00
Joe Gallo
cc6597df23
Add support for the 'Domain' database to the geoip processor (#108639) 2024-05-14 17:49:05 -04:00
Keith Massey
bcd62e8d03
Adding hits_time_in_millis and misses_time_in_millis to enrich cache stats (#107579) 2024-04-18 15:19:24 -05:00
Keith Massey
8adc2926a2
Fixed the spelling of the word successful in docs (#107595) 2024-04-18 08:08:30 -05:00
Liam Thompson
33a71e3289
[DOCS] Refactor book-scoped variables in docs/reference/index.asciidoc (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc
2024-04-17 14:37:07 +02:00
Keith Massey
f5c7938ab8
Adding cache_stats to geoip stats API (#107334) 2024-04-16 16:57:14 -05:00
Joe Gallo
6ff3a2628a
Add support for the 'Enterprise' database to the geoip processor (#107377) 2024-04-11 16:45:10 -04:00
Joe Gallo
5266f79b16
Add support for the 'Anonymous IP' database to the geoip processor (#107287) 2024-04-11 14:05:52 -04:00
Keith Massey
48a88c575c
Renaming GeoIpDownloaderStatsAction (#107290)
Renaming GeoIpDownloaderStatsAction to GeoIpStatsAction
2024-04-10 09:21:24 -05:00
Jennie Soria
30828a5680
Update geoip.asciidoc (#105908)
The GeoIP endpoint does not use the xpack http client. The GeoIP downloader uses the JDK's built-in cacerts.

If a customer is using a custom https endpoint, they need to add its CA certificate to the JDK's cacerts, whether that is our bundled JDK or their own JDK. Otherwise they will see something like
```
...PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target...
```
2024-03-05 11:26:49 +01:00
Liam Thompson
52aefa59eb
[DOCS] Ingest processors docs improvements (#104384)
* [DOCS] Categorize ingest processors on overview page, summarize use cases

* Add overview info, subheading, links

* Apply suggestions from review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Insert space

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-01-17 11:50:29 +01:00
ShourieG
147484b059
[elasticsearch][processors] - Added support for override flag in rename processor (#103565)
* added override flag for rename processor along with factory tests

* added yaml tests for rename processor using the override flag

* updated renameProcessor tests to include override flag as a parameter

* updated rename processor tests to incorporate override flag = true scenario

* updated rename processor asciidoc with override option

* updated rename processor asciidoc with override option

* removed unnecessary SuppressWarnings tag

* corrected formatting errors

* updated processor tests

* fixed yaml tests

* Prefer early throw style here

* Whitespace

* Move and rewrite this test

It's just a simple test of the primary behavior of the rename
processor, so put it first and simplify it.

* Rename this test

It doesn't actually exercise template snippets

* Tidy up this test

---------

Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-01-11 16:00:02 +05:30
Adam Demjen
a26ff243f6
[Docs] [Enterprise Search] ML inference pipeline documentation updates (#103022)
* Remove mapping step, wording and screenshot updates

* Notes about pipeline name and model deployment

* Address CR comments
2024-01-02 09:56:50 -05:00
Abdon Pijpelink
ac973f0064
[DOCS] Improve enrich policy execute 'wait_for_completion' docs (#102291)
* [DOCS] Improve enrich policy execute 'wait_for_completion' docs

* Update docs/reference/ingest/apis/enrich/execute-enrich-policy.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

---------

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-11-27 17:17:06 +01:00
Abdon Pijpelink
bc59315baa
[DOCS] Examples for ES|QL DISSECT and WHERE (#102591)
* DISSECT examples

* WHERE examples

* Remove references to empty keys

* Fix non-deterministic test
2023-11-27 10:56:48 +01:00
Keith Massey
643d825c45
Adding a simulate ingest api (#101409)
This commit introduces a new _ingest/_simulate API that runs any pipelines
on the given data that would be executed for a given index, but instead of
indexing the data into the index, returns the transformed documents and
the list of pipelines that were executed.
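
As a loose illustration of that contract, the toy Python model below transforms copies of the documents and reports which pipelines ran, without storing anything. It is not the API's implementation: `simulate_ingest`, the callable pipelines, and the `audit-pipeline` name are hypothetical, and only the index's default and final pipelines are modeled.

```python
import copy


def simulate_ingest(docs, index_settings, pipelines):
    """Run the index's pipelines on copies of docs; nothing is indexed."""
    results = []
    for doc in docs:
        source = copy.deepcopy(doc["_source"])  # leave the input untouched
        executed = []
        # The default pipeline runs first, then the final pipeline.
        for key in ("default_pipeline", "final_pipeline"):
            name = index_settings.get(key)
            if name is not None:
                pipelines[name](source)
                executed.append(name)
        results.append(
            {"_id": doc["_id"], "_source": source, "executed_pipelines": executed}
        )
    return results


pipelines = {
    "foo-pipeline": lambda src: src.update(foo=True),
    "audit-pipeline": lambda src: src.update(audited=True),
}
docs = [{"_id": "asdf", "_source": {"foo": "foo"}}]
settings = {"default_pipeline": "foo-pipeline", "final_pipeline": "audit-pipeline"}
results = simulate_ingest(docs, settings, pipelines)
print(results[0]["executed_pipelines"])  # ['foo-pipeline', 'audit-pipeline']
print(docs[0]["_source"])                # {'foo': 'foo'}
```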
2023-11-15 17:25:09 -06:00