mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
A docs page for the `terminate` processor was added in https://github.com/elastic/elasticsearch/pull/114157, but the change to include it in the outer processor reference page was omitted. This change corrects that oversight.
269 lines
9.7 KiB
Text
269 lines
9.7 KiB
Text
[[processors]]
|
||
== Ingest processor reference
|
||
++++
|
||
<titleabbrev>Processor reference</titleabbrev>
|
||
++++
|
||
|
||
An <<ingest,ingest pipeline>> is made up of a sequence of processors that are applied to documents as they are ingested into an index.
|
||
Each processor performs a specific task, such as filtering, transforming, or enriching data.
|
||
|
||
Each successive processor depends on the output of the previous processor, so the order of processors is important.
|
||
The modified documents are indexed into {es} after all processors are applied.
|
||
|
||
{es} includes over 40 configurable processors.
|
||
The subpages in this section contain reference documentation for each processor.
|
||
To get a list of available
|
||
processors, use the <<cluster-nodes-info,nodes info>> API.
|
||
|
||
[source,console]
|
||
----
|
||
GET _nodes/ingest?filter_path=nodes.*.ingest.processors
|
||
----
|
||
|
||
[discrete]
|
||
[[ingest-processors-categories]]
|
||
=== Ingest processors by category
|
||
|
||
We've categorized the available processors on this page and summarized their functions.
|
||
This will help you find the right processor for your use case.
|
||
|
||
* <<ingest-process-category-data-enrichment>>
|
||
* <<ingest-process-category-data-transformation>>
|
||
* <<ingest-process-category-data-filtering>>
|
||
* <<ingest-process-category-pipeline-handling>>
|
||
* <<ingest-process-category-array-json-handling>>
|
||
|
||
[discrete]
|
||
[[ingest-process-category-data-enrichment]]
|
||
=== Data enrichment processors
|
||
|
||
[discrete]
|
||
[[ingest-process-category-data-enrichment-general]]
|
||
==== General outcomes
|
||
|
||
<<append-processor, `append` processor>>::
|
||
Appends a value to a field.
|
||
|
||
<<date-index-name-processor, `date_index_name` processor>>::
|
||
Points documents to the right time-based index based on a date or timestamp field.
|
||
|
||
<<enrich-processor, `enrich` processor>>::
|
||
Enriches documents with data from another index.
|
||
[TIP]
|
||
====
|
||
Refer to <<ingest-enriching-data, Enrich your data>> for detailed examples of how to use the `enrich` processor to add data from your existing indices to incoming documents during ingest.
|
||
====
|
||
|
||
<<inference-processor, `inference` processor>>::
|
||
Uses {ml} to classify and tag text fields.
|
||
|
||
[discrete]
|
||
[[ingest-process-category-data-enrichment-specific]]
|
||
==== Specific outcomes
|
||
|
||
<<attachment, `attachment` processor>>::
|
||
Parses and indexes binary data, such as PDFs and Word documents.
|
||
|
||
<<ingest-circle-processor, `circle` processor>>::
|
||
Converts a location field to a Geo-Point field.
|
||
|
||
<<community-id-processor, `community_id` processor>>::
|
||
Computes the Community ID for network flow data.
|
||
|
||
<<fingerprint-processor, `fingerprint` processor>>::
|
||
Computes a hash of the document’s content.
|
||
|
||
<<ingest-geo-grid-processor, `geo_grid` processor>>::
|
||
Converts geo-grid definitions of grid tiles or cells to regular bounding boxes or polygons which describe their shape.
|
||
|
||
<<geoip-processor, `geoip` processor>>::
|
||
Adds information about the geographical location of an IPv4 or IPv6 address.
|
||
|
||
<<network-direction-processor, `network_direction` processor>>::
|
||
Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.
|
||
|
||
<<registered-domain-processor, `registered_domain` processor>>::
|
||
Extracts the registered domain (also known as the effective top-level domain or eTLD), sub-domain, and top-level domain from a fully qualified domain name (FQDN).
|
||
|
||
<<ingest-node-set-security-user-processor, `set_security_user` processor>>::
|
||
Sets user-related details (such as `username`, `roles`, `email`, `full_name`,`metadata`, `api_key`, `realm` and `authentication_type`) from the current authenticated user to the current document by pre-processing the ingest.
|
||
|
||
<<uri-parts-processor, `uri_parts` processor>>::
|
||
Parses a Uniform Resource Identifier (URI) string and extracts its components as an object.
|
||
|
||
<<urldecode-processor, `urldecode` processor>>::
|
||
URL-decodes a string.
|
||
|
||
<<user-agent-processor, `user_agent` processor>>::
|
||
Parses user-agent strings to extract information about web clients.
|
||
|
||
[discrete]
|
||
[[ingest-process-category-data-transformation]]
|
||
=== Data transformation processors
|
||
|
||
[discrete]
|
||
[[ingest-process-category-data-transformation-general]]
|
||
==== General outcomes
|
||
|
||
<<convert-processor, `convert` processor>>::
|
||
Converts a field in the currently ingested document to a different type, such as converting a string to an integer.
|
||
|
||
<<dissect-processor, `dissect` processor>>::
|
||
Extracts structured fields out of a single text field within a document.
|
||
Unlike the <<grok-processor,grok processor>>, dissect does not use regular expressions.
|
||
This makes the dissect's a simpler and often faster alternative.
|
||
|
||
<<grok-processor, `grok` processor>>::
|
||
Extracts structured fields out of a single text field within a document, using the <<grok, Grok>> regular expression dialect that supports reusable aliased expressions.
|
||
|
||
<<gsub-processor, `gsub` processor>>::
|
||
Converts a string field by applying a regular expression and a replacement.
|
||
|
||
<<redact-processor, `redact` processor>>::
|
||
Uses the <<grok, Grok>> rules engine to obscure text in the input document matching the given Grok patterns.
|
||
|
||
<<rename-processor, `rename` processor>>::
|
||
Renames an existing field.
|
||
|
||
<<set-processor, `set` processor>>::
|
||
Sets a value on a field.
|
||
|
||
[discrete]
|
||
[[ingest-process-category-data-transformation-specific]]
|
||
==== Specific outcomes
|
||
|
||
<<bytes-processor, `bytes` processor>>::
|
||
Converts a human-readable byte value to its value in bytes (for example `1kb` becomes `1024`).
|
||
|
||
<<csv-processor, `csv` processor>>::
|
||
Extracts a single line of CSV data from a text field.
|
||
|
||
<<date-processor, `date` processor>>::
|
||
Extracts and converts date fields.
|
||
|
||
<<dot-expand-processor, `dot_expand`>> processor::
|
||
Expands a field with dots into an object field.
|
||
|
||
<<htmlstrip-processor, `html_strip` processor>>::
|
||
Removes HTML tags from a field.
|
||
|
||
<<join-processor, `join` processor>>::
|
||
Joins each element of an array into a single string using a separator character between each element.
|
||
|
||
<<kv-processor, `kv` processor>>::
|
||
Parse messages (or specific event fields) containing key-value pairs.
|
||
|
||
<<lowercase-processor, `lowercase` processor>> and <<uppercase-processor, `uppercase` processor>>::
|
||
Converts a string field to lowercase or uppercase.
|
||
|
||
<<split-processor, `split` processor>>::
|
||
Splits a field into an array of values.
|
||
|
||
<<trim-processor, `trim` processor>>::
|
||
Trims whitespace from field.
|
||
|
||
[discrete]
|
||
[[ingest-process-category-data-filtering]]
|
||
=== Data filtering processors
|
||
|
||
<<drop-processor, `drop` processor>>::
|
||
Drops the document without raising any errors.
|
||
|
||
<<remove-processor, `remove` processor>>::
|
||
Removes fields from documents.
|
||
|
||
[discrete]
|
||
[[ingest-process-category-pipeline-handling]]
|
||
=== Pipeline handling processors
|
||
|
||
<<fail-processor, `fail` processor>>::
|
||
Raises an exception. Useful for when you expect a pipeline to fail and want to relay a specific message to the requester.
|
||
|
||
<<pipeline-processor, `pipeline` processor>>::
|
||
Executes another pipeline.
|
||
|
||
<<reroute-processor, `reroute` processor>>::
|
||
Reroutes documents to another target index or data stream.
|
||
|
||
<<terminate-processor, `terminate` processor>>::
|
||
Terminates the current ingest pipeline, causing no further processors to be run.
|
||
|
||
[discrete]
|
||
[[ingest-process-category-array-json-handling]]
|
||
=== Array/JSON handling processors
|
||
|
||
<<foreach-processor, `for_each` processor>>::
|
||
Runs an ingest processor on each element of an array or object.
|
||
|
||
<<json-processor, `json` processor>>::
|
||
Converts a JSON string into a structured JSON object.
|
||
|
||
<<script-processor, `script` processor>>::
|
||
Runs an inline or stored <<modules-scripting, script>> on incoming documents.
|
||
The script runs in the {painless}/painless-ingest-processor-context.html[painless `ingest` context].
|
||
|
||
<<sort-processor, `sort` processor>>::
|
||
Sorts the elements of an array in ascending or descending order.
|
||
|
||
[discrete]
|
||
[[ingest-process-plugins]]
|
||
=== Add additional processors
|
||
|
||
You can install additional processors as {plugins}/ingest.html[plugins].
|
||
|
||
You must install any plugin processors on all nodes in your cluster. Otherwise,
|
||
{es} will fail to create pipelines containing the processor.
|
||
|
||
Mark a plugin as mandatory by setting `plugin.mandatory` in
|
||
`elasticsearch.yml`. A node will fail to start if a mandatory plugin is not
|
||
installed.
|
||
|
||
[source,yaml]
|
||
----
|
||
plugin.mandatory: my-ingest-plugin
|
||
----
|
||
|
||
include::processors/append.asciidoc[]
|
||
include::processors/attachment.asciidoc[]
|
||
include::processors/bytes.asciidoc[]
|
||
include::processors/circle.asciidoc[]
|
||
include::processors/community-id.asciidoc[]
|
||
include::processors/convert.asciidoc[]
|
||
include::processors/csv.asciidoc[]
|
||
include::processors/date.asciidoc[]
|
||
include::processors/date-index-name.asciidoc[]
|
||
include::processors/dissect.asciidoc[]
|
||
include::processors/dot-expand.asciidoc[]
|
||
include::processors/drop.asciidoc[]
|
||
include::processors/enrich.asciidoc[]
|
||
include::processors/fail.asciidoc[]
|
||
include::processors/fingerprint.asciidoc[]
|
||
include::processors/foreach.asciidoc[]
|
||
include::processors/geo-grid.asciidoc[]
|
||
include::processors/geoip.asciidoc[]
|
||
include::processors/grok.asciidoc[]
|
||
include::processors/gsub.asciidoc[]
|
||
include::processors/html_strip.asciidoc[]
|
||
include::processors/inference.asciidoc[]
|
||
include::processors/join.asciidoc[]
|
||
include::processors/json.asciidoc[]
|
||
include::processors/kv.asciidoc[]
|
||
include::processors/lowercase.asciidoc[]
|
||
include::processors/network-direction.asciidoc[]
|
||
include::processors/pipeline.asciidoc[]
|
||
include::processors/redact.asciidoc[]
|
||
include::processors/registered-domain.asciidoc[]
|
||
include::processors/remove.asciidoc[]
|
||
include::processors/rename.asciidoc[]
|
||
include::processors/reroute.asciidoc[]
|
||
include::processors/script.asciidoc[]
|
||
include::processors/set.asciidoc[]
|
||
include::processors/set-security-user.asciidoc[]
|
||
include::processors/sort.asciidoc[]
|
||
include::processors/split.asciidoc[]
|
||
include::processors/terminate.asciidoc[]
|
||
include::processors/trim.asciidoc[]
|
||
include::processors/uppercase.asciidoc[]
|
||
include::processors/url-decode.asciidoc[]
|
||
include::processors/uri-parts.asciidoc[]
|
||
include::processors/user-agent.asciidoc[]
|