elasticsearch/docs/reference/ingest/processors.asciidoc
Pete Gillin d3535d5a64
Actually add terminate docs page (#114440) (#114478)
A docs page for the `terminate` processor was added in
https://github.com/elastic/elasticsearch/pull/114157, but the change
to include it in the outer processor reference page was omitted. This
change corrects that oversight.
2024-10-10 19:00:53 +11:00

[[processors]]
== Ingest processor reference
++++
<titleabbrev>Processor reference</titleabbrev>
++++
An <<ingest,ingest pipeline>> is made up of a sequence of processors that are applied to documents as they are ingested into an index.
Each processor performs a specific task, such as filtering, transforming, or enriching data.
Each successive processor depends on the output of the previous processor, so the order of processors is important.
The modified documents are indexed into {es} after all processors are applied.
{es} includes over 40 configurable processors.
The subpages in this section contain reference documentation for each processor.
To get a list of available
processors, use the <<cluster-nodes-info,nodes info>> API.
[source,console]
----
GET _nodes/ingest?filter_path=nodes.*.ingest.processors
----
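
Because each processor works on the output of the one before it, reordering processors can change the result. The following request is a minimal sketch of that behavior using the simulate pipeline API, which runs a pipeline against sample documents without indexing them. The `message` and `message_normalized` field names and the sample document are illustrative assumptions, not part of any shipped pipeline.

[source,console]
----
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "trim": { "field": "message" } }, <1>
      { "lowercase": { "field": "message" } },
      { "set": { "field": "message_normalized", "copy_from": "message" } } <2>
    ]
  },
  "docs": [
    { "_source": { "message": "  Hello World  " } }
  ]
}
----
<1> `message` is a hypothetical field used only for this example.
<2> `message_normalized` is a hypothetical target field. It copies whatever `message` contains at this point in the pipeline.

Here `set` runs last, so `message_normalized` should hold the trimmed, lowercased text (`hello world`). If `set` were listed first, it would copy the raw value instead.
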
[discrete]
[[ingest-processors-categories]]
=== Ingest processors by category
We've categorized the available processors on this page and summarized their functions.
This will help you find the right processor for your use case.
* <<ingest-process-category-data-enrichment>>
* <<ingest-process-category-data-transformation>>
* <<ingest-process-category-data-filtering>>
* <<ingest-process-category-pipeline-handling>>
* <<ingest-process-category-array-json-handling>>
[discrete]
[[ingest-process-category-data-enrichment]]
=== Data enrichment processors
[discrete]
[[ingest-process-category-data-enrichment-general]]
==== General outcomes
<<append-processor, `append` processor>>::
Appends a value to a field.
<<date-index-name-processor, `date_index_name` processor>>::
Points documents to the right time-based index based on a date or timestamp field.
<<enrich-processor, `enrich` processor>>::
Enriches documents with data from another index.
[TIP]
====
Refer to <<ingest-enriching-data, Enrich your data>> for detailed examples of how to use the `enrich` processor to add data from your existing indices to incoming documents during ingest.
====
<<inference-processor, `inference` processor>>::
Uses {ml} to classify and tag text fields.
[discrete]
[[ingest-process-category-data-enrichment-specific]]
==== Specific outcomes
<<attachment, `attachment` processor>>::
Parses and indexes binary data, such as PDFs and Word documents.
<<ingest-circle-processor, `circle` processor>>::
Converts circle definitions of shapes to regular polygons that approximate them.
<<community-id-processor, `community_id` processor>>::
Computes the Community ID for network flow data.
<<fingerprint-processor, `fingerprint` processor>>::
Computes a hash of the document's content.
<<ingest-geo-grid-processor, `geo_grid` processor>>::
Converts geo-grid definitions of grid tiles or cells to regular bounding boxes or polygons which describe their shape.
<<geoip-processor, `geoip` processor>>::
Adds information about the geographical location of an IPv4 or IPv6 address.
<<network-direction-processor, `network_direction` processor>>::
Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.
<<registered-domain-processor, `registered_domain` processor>>::
Extracts the registered domain (also known as the effective top-level domain plus one, or eTLD+1), subdomain, and top-level domain from a fully qualified domain name (FQDN).
<<ingest-node-set-security-user-processor, `set_security_user` processor>>::
Adds user-related details (such as `username`, `roles`, `email`, `full_name`, `metadata`, `api_key`, `realm`, and `authentication_type`) from the current authenticated user to the current document during ingest.
<<uri-parts-processor, `uri_parts` processor>>::
Parses a Uniform Resource Identifier (URI) string and extracts its components as an object.
<<urldecode-processor, `urldecode` processor>>::
URL-decodes a string.
<<user-agent-processor, `user_agent` processor>>::
Parses user-agent strings to extract information about web clients.
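
As a rough illustration of how two of these processors might be combined, the sketch below defines a pipeline that enriches web access events. The pipeline name `my-enrichment-pipeline` and the `client_ip` and `agent` field names are assumptions made for this example, and the `geoip` step assumes the GeoIP databases are available on the ingest nodes.

[source,console]
----
PUT _ingest/pipeline/my-enrichment-pipeline
{
  "description": "Hypothetical example: add location and browser details",
  "processors": [
    { "geoip": { "field": "client_ip" } }, <1>
    { "user_agent": { "field": "agent" } } <2>
  ]
}
----
<1> `client_ip` is a hypothetical field holding an IPv4 or IPv6 address. With no `target_field` set, the results are written under `geoip`.
<2> `agent` is a hypothetical field holding the raw user-agent string. With no `target_field` set, the results are written under `user_agent`.
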
[discrete]
[[ingest-process-category-data-transformation]]
=== Data transformation processors
[discrete]
[[ingest-process-category-data-transformation-general]]
==== General outcomes
<<convert-processor, `convert` processor>>::
Converts a field in the currently ingested document to a different type, such as converting a string to an integer.
<<dissect-processor, `dissect` processor>>::
Extracts structured fields out of a single text field within a document.
Unlike the <<grok-processor,grok processor>>, dissect does not use regular expressions.
This makes dissect a simpler and often faster alternative.
<<grok-processor, `grok` processor>>::
Extracts structured fields out of a single text field within a document, using the <<grok, Grok>> regular expression dialect that supports reusable aliased expressions.
<<gsub-processor, `gsub` processor>>::
Converts a string field by applying a regular expression and a replacement.
<<redact-processor, `redact` processor>>::
Uses the <<grok, Grok>> rules engine to obscure text in the input document matching the given Grok patterns.
<<rename-processor, `rename` processor>>::
Renames an existing field.
<<set-processor, `set` processor>>::
Sets a value on a field.
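
To make the difference between `dissect` and `grok` more concrete, the sketch below parses the same sample log line with each processor through the simulate pipeline API. The sample line and the extracted field names (`client`, `method`, `request`, `bytes`, `duration`) are assumptions chosen for illustration.

[source,console]
----
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{client} %{method} %{request} %{bytes} %{duration}" <1>
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "55.3.244.1 GET /index.html 15824 0.043" } }
  ]
}

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:double}" <2>
          ]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "55.3.244.1 GET /index.html 15824 0.043" } }
  ]
}
----
<1> The dissect pattern splits on the literal spaces between keys and uses no regular expressions.
<2> The grok pattern validates each token against a named expression and can convert matches to `int` or `double`.

Both requests should extract the same five fields. The dissect output stays as strings, so a `convert` processor would be needed afterward if numeric types matter.
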
[discrete]
[[ingest-process-category-data-transformation-specific]]
==== Specific outcomes
<<bytes-processor, `bytes` processor>>::
Converts a human-readable byte value to its value in bytes (for example `1kb` becomes `1024`).
<<csv-processor, `csv` processor>>::
Extracts fields from a single line of CSV data in a text field.
<<date-processor, `date` processor>>::
Extracts and converts date fields.
<<dot-expand-processor, `dot_expander` processor>>::
Expands a field with dots into an object field.
<<htmlstrip-processor, `html_strip` processor>>::
Removes HTML tags from a field.
<<join-processor, `join` processor>>::
Joins each element of an array into a single string using a separator character between each element.
<<kv-processor, `kv` processor>>::
Parses messages (or specific event fields) that contain key-value pairs.
<<lowercase-processor, `lowercase` processor>> and <<uppercase-processor, `uppercase` processor>>::
Converts a string field to lowercase or uppercase.
<<split-processor, `split` processor>>::
Splits a field into an array of values.
<<trim-processor, `trim` processor>>::
Trims whitespace from a field.
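
The following sketch chains a few of these single-purpose processors in one simulated pipeline. The `size`, `tags`, and `name` fields and their sample values are assumptions made for this example.

[source,console]
----
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "bytes": { "field": "size" } }, <1>
      { "split": { "field": "tags", "separator": "," } }, <2>
      { "trim": { "field": "name" } },
      { "uppercase": { "field": "name" } }
    ]
  },
  "docs": [
    { "_source": { "size": "1kb", "tags": "a,b,c", "name": "  widget  " } }
  ]
}
----
<1> `size` is a hypothetical field holding a human-readable byte value.
<2> `separator` is treated as a regular expression. A plain comma needs no escaping.

With this input, `size` should become `1024`, `tags` should become an array of three values, and `name` should become `WIDGET`.
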
[discrete]
[[ingest-process-category-data-filtering]]
=== Data filtering processors
<<drop-processor, `drop` processor>>::
Drops the document without raising any errors.
<<remove-processor, `remove` processor>>::
Removes fields from documents.
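
As a sketch of how the two filtering processors might be used together, the pipeline below discards some documents entirely and strips a field from the rest. The pipeline name `my-filtering-pipeline` and the `network_name` and `debug_info` fields are assumptions made for this example.

[source,console]
----
PUT _ingest/pipeline/my-filtering-pipeline
{
  "description": "Hypothetical example: drop guest traffic, remove debug data",
  "processors": [
    { "drop": { "if": "ctx.network_name == 'Guest'" } }, <1>
    { "remove": { "field": "debug_info", "ignore_missing": true } } <2>
  ]
}
----
<1> Without an `if` condition, `drop` would discard every document that reaches it.
<2> `ignore_missing` keeps the processor from failing on documents that lack the hypothetical `debug_info` field.
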
[discrete]
[[ingest-process-category-pipeline-handling]]
=== Pipeline handling processors
<<fail-processor, `fail` processor>>::
Raises an exception. Useful when you expect a pipeline to fail and want to relay a specific message to the requester.
<<pipeline-processor, `pipeline` processor>>::
Executes another pipeline.
<<reroute-processor, `reroute` processor>>::
Reroutes documents to another target index or data stream.
<<terminate-processor, `terminate` processor>>::
Terminates the current ingest pipeline, causing no further processors to be run.
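
The sketch below shows one way these control-flow processors might be arranged. The pipeline names `my-outer-pipeline` and `my-inner-pipeline`, the `tags` and `skip_processing` fields, and the error message are all assumptions made for this example. The inner pipeline would need to exist before documents are ingested.

[source,console]
----
PUT _ingest/pipeline/my-outer-pipeline
{
  "description": "Hypothetical example: validate, short-circuit, then delegate",
  "processors": [
    {
      "fail": {
        "if": "ctx.tags == null", <1>
        "message": "Document is missing the required [tags] field"
      }
    },
    { "terminate": { "if": "ctx.skip_processing == true" } }, <2>
    { "pipeline": { "name": "my-inner-pipeline" } } <3>
  ]
}
----
<1> Rejects the document with an explicit message when the hypothetical `tags` field is absent.
<2> Stops the current pipeline without an error. The document is still indexed as it stands.
<3> Runs only for documents that passed both earlier checks.
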
[discrete]
[[ingest-process-category-array-json-handling]]
=== Array/JSON handling processors
<<foreach-processor, `foreach` processor>>::
Runs an ingest processor on each element of an array or object.
<<json-processor, `json` processor>>::
Converts a JSON string into a structured JSON object.
<<script-processor, `script` processor>>::
Runs an inline or stored <<modules-scripting, script>> on incoming documents.
The script runs in the {painless}/painless-ingest-processor-context.html[Painless `ingest` context].
<<sort-processor, `sort` processor>>::
Sorts the elements of an array in ascending or descending order.
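
To show how these processors can build on one another, the sketch below parses a JSON string, transforms each array element, sorts the array, and then records its length with a script. The `payload`, `parsed`, and `tag_count` field names and the sample document are assumptions made for this example.

[source,console]
----
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "json": { "field": "payload", "target_field": "parsed" } }, <1>
      {
        "foreach": {
          "field": "parsed.tags",
          "processor": { "uppercase": { "field": "_ingest._value" } } <2>
        }
      },
      { "sort": { "field": "parsed.tags", "order": "desc" } },
      {
        "script": {
          "lang": "painless",
          "source": "ctx.tag_count = ctx.parsed.tags.size()" <3>
        }
      }
    ]
  },
  "docs": [
    { "_source": { "payload": "{\"tags\": [\"beta\", \"alpha\", \"gamma\"]}" } }
  ]
}
----
<1> Parses the hypothetical `payload` string into an object stored under `parsed`.
<2> Inside `foreach`, the current element is addressed as `_ingest._value`.
<3> `tag_count` is a hypothetical field added by the script.

With this input, `parsed.tags` should end up as `["GAMMA", "BETA", "ALPHA"]` and `tag_count` as `3`.
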
[discrete]
[[ingest-process-plugins]]
=== Add additional processors
You can install additional processors as {plugins}/ingest.html[plugins].
You must install any plugin processors on all nodes in your cluster. Otherwise,
{es} will fail to create pipelines containing the processor.
Mark a plugin as mandatory by setting `plugin.mandatory` in
`elasticsearch.yml`. A node will fail to start if a mandatory plugin is not
installed.
[source,yaml]
----
plugin.mandatory: my-ingest-plugin
----
include::processors/append.asciidoc[]
include::processors/attachment.asciidoc[]
include::processors/bytes.asciidoc[]
include::processors/circle.asciidoc[]
include::processors/community-id.asciidoc[]
include::processors/convert.asciidoc[]
include::processors/csv.asciidoc[]
include::processors/date.asciidoc[]
include::processors/date-index-name.asciidoc[]
include::processors/dissect.asciidoc[]
include::processors/dot-expand.asciidoc[]
include::processors/drop.asciidoc[]
include::processors/enrich.asciidoc[]
include::processors/fail.asciidoc[]
include::processors/fingerprint.asciidoc[]
include::processors/foreach.asciidoc[]
include::processors/geo-grid.asciidoc[]
include::processors/geoip.asciidoc[]
include::processors/grok.asciidoc[]
include::processors/gsub.asciidoc[]
include::processors/html_strip.asciidoc[]
include::processors/inference.asciidoc[]
include::processors/join.asciidoc[]
include::processors/json.asciidoc[]
include::processors/kv.asciidoc[]
include::processors/lowercase.asciidoc[]
include::processors/network-direction.asciidoc[]
include::processors/pipeline.asciidoc[]
include::processors/redact.asciidoc[]
include::processors/registered-domain.asciidoc[]
include::processors/remove.asciidoc[]
include::processors/rename.asciidoc[]
include::processors/reroute.asciidoc[]
include::processors/script.asciidoc[]
include::processors/set.asciidoc[]
include::processors/set-security-user.asciidoc[]
include::processors/sort.asciidoc[]
include::processors/split.asciidoc[]
include::processors/terminate.asciidoc[]
include::processors/trim.asciidoc[]
include::processors/uppercase.asciidoc[]
include::processors/url-decode.asciidoc[]
include::processors/uri-parts.asciidoc[]
include::processors/user-agent.asciidoc[]