mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-24 23:27:25 -04:00
* Remove `es-test-dir` book-scoped variable * Remove `plugins-examples-dir` book-scoped variable * Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables - In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed. - In `sql/index.asciidoc`, the `:sql-tests:` path was updated to fuller path - In `esql/index.asciidoc`, the `:esql-tests:` path was updated idem * Replace `es-repo-dir` with `es-ref-dir` * Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
283 lines
No EOL
9.8 KiB
Text
283 lines
No EOL
9.8 KiB
Text
[[ingest-enriching-data]]
|
|
== Enrich your data
|
|
|
|
You can use the <<enrich-processor,enrich processor>> to add data from your
|
|
existing indices to incoming documents during ingest.
|
|
|
|
For example, you can use the enrich processor to:
|
|
|
|
* Identify web services or vendors based on known IP addresses
|
|
* Add product information to retail orders based on product IDs
|
|
* Supplement contact information based on an email address
|
|
* Add postal codes based on user coordinates
|
|
|
|
[discrete]
|
|
[[how-enrich-works]]
|
|
=== How the enrich processor works
|
|
|
|
Most processors are self-contained and only change _existing_ data in incoming
|
|
documents.
|
|
|
|
image::images/ingest/ingest-process.svg[align="center"]
|
|
|
|
The enrich processor adds _new_ data to incoming documents and requires a few
|
|
special components:
|
|
|
|
image::images/ingest/enrich/enrich-process.svg[align="center"]
|
|
|
|
[[enrich-policy]]
|
|
enrich policy::
|
|
+
|
|
--
|
|
A set of configuration options used to add the right enrich data to the right
|
|
incoming documents.
|
|
|
|
An enrich policy contains:
|
|
|
|
// tag::enrich-policy-fields[]
|
|
* A list of one or more _source indices_ which store enrich data as documents
|
|
* The _policy type_ which determines how the processor matches the enrich data
|
|
to incoming documents
|
|
* A _match field_ from the source indices used to match incoming documents
|
|
* _Enrich fields_ containing enrich data from the source indices you want to add
|
|
to incoming documents
|
|
// end::enrich-policy-fields[]
|
|
|
|
Before it can be used with an enrich processor, an enrich policy must be
|
|
<<execute-enrich-policy-api,executed>>. When executed, an enrich policy uses
|
|
enrich data from the policy's source indices to create a streamlined system
|
|
index called the _enrich index_. The processor uses this index to match and
|
|
enrich incoming documents.
|
|
--
|
|
|
|
[[source-index]]
|
|
source index::
|
|
An index which stores enrich data you'd like to add to incoming documents. You
|
|
can create and manage these indices just like a regular {es} index. You can use
|
|
multiple source indices in an enrich policy. You also can use the same source
|
|
index in multiple enrich policies.
|
|
|
|
[[enrich-index]]
|
|
enrich index::
|
|
+
|
|
--
|
|
A special system index tied to a specific enrich policy.
|
|
|
|
Directly matching incoming documents to documents in source indices could be
|
|
slow and resource intensive. To speed things up, the enrich processor uses an
|
|
enrich index.
|
|
|
|
// tag::enrich-index[]
|
|
Enrich indices contain enrich data from source indices but have a few special
|
|
properties to help streamline them:
|
|
|
|
* They are system indices, meaning they're managed internally by {es} and only
|
|
intended for use with enrich processors and the {esql} `ENRICH` command.
|
|
* They always begin with `.enrich-*`.
|
|
* They are read-only, meaning you can't directly change them.
|
|
* They are <<indices-forcemerge,force merged>> for fast retrieval.
|
|
// end::enrich-index[]
|
|
--
|
|
|
|
[[enrich-setup]]
|
|
=== Set up an enrich processor
|
|
|
|
To set up an enrich processor, follow these steps:
|
|
|
|
. Check the <<enrich-prereqs, prerequisites>>.
|
|
. <<create-enrich-source-index>>.
|
|
. <<create-enrich-policy>>.
|
|
. <<execute-enrich-policy>>.
|
|
. <<add-enrich-processor>>.
|
|
. <<ingest-enrich-docs>>.
|
|
|
|
Once you have an enrich processor set up,
|
|
you can <<update-enrich-data,update your enrich data>>
|
|
and <<update-enrich-policies, update your enrich policies>>.
|
|
|
|
[IMPORTANT]
|
|
====
|
|
The enrich processor performs several operations and may impact the speed of
|
|
your ingest pipeline.
|
|
|
|
We strongly recommend testing and benchmarking your enrich processors
|
|
before deploying them in production.
|
|
|
|
We do not recommend using the enrich processor to append real-time data.
|
|
The enrich processor works best with reference data
|
|
that doesn't change frequently.
|
|
====
|
|
|
|
[discrete]
|
|
[[enrich-prereqs]]
|
|
==== Prerequisites
|
|
|
|
include::{es-ref-dir}/ingest/apis/enrich/put-enrich-policy.asciidoc[tag=enrich-policy-api-prereqs]
|
|
|
|
[[create-enrich-source-index]]
|
|
==== Add enrich data
|
|
|
|
// tag::create-enrich-source-index[]
|
|
To begin, add documents to one or more source indices. These documents should
|
|
contain the enrich data you eventually want to add to incoming data.
|
|
|
|
You can manage source indices just like regular {es} indices using the
|
|
<<docs,document>> and <<indices,index>> APIs.
|
|
|
|
You also can set up {beats-ref}/getting-started.html[{beats}], such as a
|
|
{filebeat-ref}/filebeat-installation-configuration.html[{filebeat}], to
|
|
automatically send and index documents to your source indices. See
|
|
{beats-ref}/getting-started.html[Getting started with {beats}].
|
|
//end::create-enrich-source-index[]
|
|
|
|
[[create-enrich-policy]]
|
|
==== Create an enrich policy
|
|
|
|
// tag::create-enrich-policy[]
|
|
After adding enrich data to your source indices, use the
|
|
<<put-enrich-policy-api,create enrich policy API>> or
|
|
<<manage-enrich-policies,Index Management in {kib}>> to create an enrich policy.
|
|
|
|
[WARNING]
|
|
====
|
|
Once created, you can't update or change an enrich policy.
|
|
See <<update-enrich-policies>>.
|
|
====
|
|
// end::create-enrich-policy[]
|
|
|
|
[[execute-enrich-policy]]
|
|
==== Execute the enrich policy
|
|
|
|
// tag::execute-enrich-policy1[]
|
|
Once the enrich policy is created, you need to execute it using the
|
|
<<execute-enrich-policy-api,execute enrich policy API>> or
|
|
<<manage-enrich-policies,Index Management in {kib}>> to create an
|
|
<<enrich-index,enrich index>>.
|
|
// end::execute-enrich-policy1[]
|
|
|
|
image::images/ingest/enrich/enrich-policy-index.svg[align="center"]
|
|
|
|
// tag::execute-enrich-policy2[]
|
|
include::apis/enrich/execute-enrich-policy.asciidoc[tag=execute-enrich-policy-def]
|
|
// end::execute-enrich-policy2[]
|
|
|
|
[[add-enrich-processor]]
|
|
==== Add an enrich processor to an ingest pipeline
|
|
|
|
Once you have source indices, an enrich policy, and the related enrich index in
|
|
place, you can set up an ingest pipeline that includes an enrich processor for
|
|
your policy.
|
|
|
|
image::images/ingest/enrich/enrich-processor.svg[align="center"]
|
|
|
|
Define an <<enrich-processor,enrich processor>> and add it to an ingest
|
|
pipeline using the <<put-pipeline-api,create or update pipeline API>>.
|
|
|
|
When defining the enrich processor, you must include at least the following:
|
|
|
|
* The enrich policy to use.
|
|
* The field used to match incoming documents to the documents in your enrich index.
|
|
* The target field to add to incoming documents. This target field contains the
|
|
match and enrich fields specified in your enrich policy.
|
|
|
|
You also can use the `max_matches` option to set the number of enrich documents
|
|
an incoming document can match. If set to the default of `1`, data is added to
|
|
an incoming document's target field as a JSON object. Otherwise, the data is
|
|
added as an array.
|
|
|
|
See <<enrich-processor>> for a full list of configuration options.
|
|
|
|
You also can add other <<processors,processors>> to your ingest pipeline.
|
|
|
|
[[ingest-enrich-docs]]
|
|
==== Ingest and enrich documents
|
|
|
|
You can now use your ingest pipeline to enrich and index documents.
|
|
|
|
image::images/ingest/enrich/enrich-process.svg[align="center"]
|
|
|
|
Before implementing the pipeline in production, we recommend indexing a few test
|
|
documents first and verifying enrich data was added correctly using the
|
|
<<docs-get,get API>>.
|
|
|
|
[[update-enrich-data]]
|
|
==== Update an enrich index
|
|
|
|
include::{es-ref-dir}/ingest/apis/enrich/execute-enrich-policy.asciidoc[tag=update-enrich-index]
|
|
|
|
If wanted, you can <<docs-reindex,reindex>>
|
|
or <<docs-update-by-query,update>> any already ingested documents
|
|
using your ingest pipeline.
|
|
|
|
[[update-enrich-policies]]
|
|
==== Update an enrich policy
|
|
|
|
// tag::update-enrich-policy[]
|
|
Once created, you can't update or change an enrich policy.
|
|
Instead, you can:
|
|
|
|
. Create and <<execute-enrich-policy-api,execute>> a new enrich policy.
|
|
|
|
. Replace the previous enrich policy
|
|
with the new enrich policy
|
|
in any in-use enrich processors or {esql} queries.
|
|
|
|
. Use the <<delete-enrich-policy-api, delete enrich policy>> API or
|
|
<<manage-enrich-policies,Index Management in {kib}>>
|
|
to delete the previous enrich policy.
|
|
// end::update-enrich-policy[]
|
|
|
|
[[ingest-enrich-components]]
|
|
==== Enrich components
|
|
|
|
The enrich coordinator is a component that manages and performs the searches
|
|
required to enrich documents on each ingest node. It combines searches from all enrich
|
|
processors in all pipelines into bulk <<search-multi-search,multi-searches>>.
|
|
|
|
The enrich policy executor is a component that manages the executions of all
|
|
enrich policies. When an enrich policy is executed, this component creates
|
|
a new enrich index and removes the previous enrich index. The enrich policy
|
|
executions are managed from the elected master node. The execution of these
|
|
policies occurs on a different node.
|
|
|
|
[[ingest-enrich-settings]]
|
|
==== Node Settings
|
|
|
|
The `enrich` processor has node settings for enrich coordinator and
|
|
enrich policy executor.
|
|
|
|
The enrich coordinator supports the following node settings:
|
|
|
|
`enrich.cache_size`::
|
|
Maximum number of searches to cache for enriching documents. Defaults to `1000`.
|
|
There is a single cache for all enrich processors in the cluster. This setting
|
|
determines the size of that cache.
|
|
|
|
`enrich.coordinator_proxy.max_concurrent_requests`::
|
|
Maximum number of concurrent <<search-multi-search,multi-search requests>> to
|
|
run when enriching documents. Defaults to `8`.
|
|
|
|
`enrich.coordinator_proxy.max_lookups_per_request`::
|
|
Maximum number of searches to include in a <<search-multi-search,multi-search
|
|
request>> when enriching documents. Defaults to `128`.
|
|
|
|
The enrich policy executor supports the following node settings:
|
|
|
|
`enrich.fetch_size`::
|
|
Maximum batch size when reindexing a source index into an enrich index. Defaults
|
|
to `10000`.
|
|
|
|
`enrich.max_force_merge_attempts`::
|
|
Maximum number of <<indices-forcemerge,force merge>> attempts allowed on an
|
|
enrich index. Defaults to `3`.
|
|
|
|
`enrich.cleanup_period`::
|
|
How often {es} checks whether unused enrich indices can be deleted. Defaults to
|
|
`15m`.
|
|
|
|
`enrich.max_concurrent_policy_executions`::
|
|
Maximum number of enrich policies to execute concurrently. Defaults to `50`.
|
|
|
|
include::geo-match-enrich-policy-type-ex.asciidoc[]
|
|
include::match-enrich-policy-type-ex.asciidoc[]
|
|
include::range-enrich-policy-type-ex.asciidoc[] |