elasticsearch/docs/reference/ingest/apis/simulate-ingest.asciidoc
Keith Massey 37125f265d
Adding index_template_substitutions to the simulate ingest API (#114128) (#114374)
This adds support for a new `index_template_substitutions` field to the
body of an ingest simulate API request. These substitutions can be used
to change the pipeline(s) used for ingest, or to change the mappings
used for validation. It is similar to the
`component_template_substitutions` added in #113276. Here is an example
that shows both of those usages working together:

```
## First, add a couple of pipelines that set a field to a boolean:
PUT /_ingest/pipeline/foo-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "foo",
        "value": true
      }
    }
  ]
}

PUT /_ingest/pipeline/bar-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "bar",
        "value": true
      }
    }
  ]
}

## Now, create three component templates. One provides a mapping enforces that the only field is "foo"
## and that field is a keyword. The next is similar, but adds a `bar` field. The final one provides a setting
## that makes "foo-pipeline" the default pipeline.
## Remember that the "foo-pipeline" sets the "foo" field to a boolean, so using both of these templates
## together would cause a validation exception. These could be in the same template, but are provided
## separately just so that later we can show how multiple templates can be overridden.
PUT _component_template/mappings_template
{
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "foo": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT _component_template/mappings_template_with_bar
{
    "template": {
      "mappings": {
        "dynamic": "strict",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "boolean"
          }
        }
      }
    }
}

PUT _component_template/settings_template
{
  "template": {
    "settings": {
      "index": {
        "default_pipeline": "foo-pipeline"
      }
    }
  }
}

## Here we create an index template  pulling in both of the component templates above
PUT _index_template/template_1
{
  "index_patterns": ["foo*"],
  "composed_of": ["mappings_template", "settings_template"]
}

## We can index a document here to create the index, or not. Either way the simulate call ought to work the same
POST foo-1/_doc
{
  "foo": "FOO"
}

## This will not blow up with validation exceptions because the substitute "index_template_substitutions"
## uses `mappings_template_with_bar`, which adds the bar field.
## And the bar-pipeline is executed rather than the foo-pipeline because the substitute
## "index_template_substitutions" uses a substitute `settings_template`, so the value of "foo"
## does not get set to an invalid type.
POST _ingest/_simulate?pretty&index=foo-1
{
  "docs": [
    {
      "_id": "asdf",
      "_source": {
        "foo": "foo",
        "bar": "bar"
      }
    }
  ],
  "component_template_substitutions": {
    "settings_template": {
      "template": {
        "settings": {
          "index": {
            "default_pipeline": "bar-pipeline"
          }
        }
      }
    }
  },
  "index_template_substitutions": {
    "template_1": {
      "index_patterns": ["foo*"],
      "composed_of": ["mappings_template_with_bar", "settings_template"]
    }
  }
}
```
2024-10-09 13:04:23 +11:00

501 lines
12 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[simulate-ingest-api]]
=== Simulate ingest API
++++
<titleabbrev>Simulate ingest</titleabbrev>
++++
Executes ingest pipelines against a set of provided documents, optionally
with substitute pipeline definitions. This API is meant to be used for
troubleshooting or pipeline development, as it does not actually index any
data into {es}.
////
[source,console]
----
PUT /_ingest/pipeline/my-pipeline
{
"description" : "example pipeline to simulate",
"processors": [
{
"set" : {
"field" : "field1",
"value" : "value1"
}
}
]
}
PUT /_ingest/pipeline/my-final-pipeline
{
"description" : "example final pipeline to simulate",
"processors": [
{
"set" : {
"field" : "field2",
"value" : "value2"
}
}
]
}
PUT /my-index
{
"settings": {
"index": {
"default_pipeline": "my-pipeline",
"final_pipeline": "my-final-pipeline"
}
}
}
----
// TESTSETUP
////
[source,console]
----
POST /_ingest/_simulate
{
"docs": [
{
"_index": "my-index",
"_id": "id",
"_source": {
"foo": "bar"
}
},
{
"_index": "my-index",
"_id": "id",
"_source": {
"foo": "rab"
}
}
],
"pipeline_substitutions": { <1>
"my-pipeline": {
"processors": [
{
"set": {
"field": "field3",
"value": "value3"
}
}
]
}
},
"component_template_substitutions": { <2>
"my-component-template": {
"template": {
"mappings": {
"dynamic": "true",
"properties": {
"field3": {
"type": "keyword"
}
}
},
"settings": {
"index": {
"default_pipeline": "my-pipeline"
}
}
}
}
},
"index_template_substitutions": { <3>
"my-index-template": {
"index_patterns": ["my-index-*"],
"composed_of": ["component_template_1", "component_template_2"]
}
}
}
----
<1> This replaces the existing `my-pipeline` pipeline with the contents given here for the duration of this request.
<2> This replaces the existing `my-component-template` component template with the contents given here for the duration of this request.
These templates can be used to change the pipeline(s) used, or to modify the mapping that will be used to validate the result.
<3> This replaces the existing `my-index-template` index template with the contents given here for the duration of this request.
These templates can be used to change the pipeline(s) used, or to modify the mapping that will be used to validate the result.
[[simulate-ingest-api-request]]
==== {api-request-title}
`POST /_ingest/_simulate`
`GET /_ingest/_simulate`
`POST /_ingest/<target>/_simulate`
`GET /_ingest/<target>/_simulate`
[[simulate-ingest-api-prereqs]]
==== {api-prereq-title}
* If the {es} {security-features} are enabled, you must have the
`index` or `create` <<privileges-list-indices,index privileges>>
to use this API.
[[simulate-ingest-api-desc]]
==== {api-description-title}
The simulate ingest API simulates ingesting data into an index. It
executes the default and final pipeline for that index against a set
of documents provided in the body of the request. If a pipeline
contains a <<reroute-processor,reroute processor>>, it follows that
reroute processor to the new index, executing that index's pipelines
as well the same way that a non-simulated ingest would. No data is
indexed into {es}. Instead, the transformed document is returned,
along with the list of pipelines that have been executed and the name
of the index where the document would have been indexed if this were
not a simulation. The transformed document is validated against the
mappings that would apply to this index, and any validation error is
reported in the result.
This API differs from the
<<simulate-pipeline-api,simulate pipeline API>> in that you specify a
single pipeline for that API, and it only runs that one pipeline. The
simulate pipeline API is more useful for developing a single pipeline,
while the simulate ingest API is more useful for troubleshooting the
interaction of the various pipelines that get applied when ingesting
into an index.
By default, the pipeline definitions that are currently in the system
are used. However, you can supply substitute pipeline definitions in the
body of the request. These will be used in place of the pipeline
definitions that are already in the system. This can be used to replace
existing pipeline definitions or to create new ones. The pipeline
substitutions are only used within this request.
[[simulate-ingest-api-path-params]]
==== {api-path-parms-title}
`<target>`::
(Optional, string)
The index to simulate ingesting into. This can be overridden by specifying an index
on each document. If you provide a <target> in the request path, it is used for any
documents that dont explicitly specify an index argument.
[[simulate-ingest-api-query-params]]
==== {api-query-parms-title}
`pipeline`::
(Optional, string)
Pipeline to use as the default pipeline. This can be used to override the default pipeline
of the index being ingested into.
[role="child_attributes"]
[[simulate-ingest-api-request-body]]
==== {api-request-body-title}
`docs`::
(Required, array of objects)
Sample documents to test in the pipeline.
+
.Properties of `docs` objects
[%collapsible%open]
====
`_id`::
(Optional, string)
Unique identifier for the document.
`_index`::
(Optional, string)
Name of the index that the document will be ingested into.
`_source`::
(Required, object)
JSON body for the document.
====
`pipeline_substitutions`::
(Optional, map of strings to objects)
Map of pipeline IDs to substitute pipeline definition objects.
+
.Properties of pipeline definition objects
[%collapsible%open]
====
include::put-pipeline.asciidoc[tag=pipeline-object]
====
`component_template_substitutions`::
(Optional, map of strings to objects)
Map of component template names to substitute component template definition objects.
+
.Properties of component template definition objects
[%collapsible%open]
====
include::{es-ref-dir}/indices/put-component-template.asciidoc[tag=template]
====
`index_template_substitutions`::
(Optional, map of strings to objects)
Map of index template names to substitute index template definition objects.
+
.Properties of index template definition objects
[%collapsible%open]
====
include::{es-ref-dir}/indices/put-index-template.asciidoc[tag=request-body]
====
[[simulate-ingest-api-example]]
==== {api-examples-title}
[[simulate-ingest-api-pre-existing-pipelines-ex]]
===== Use pre-existing pipeline definitions
In this example the index `index` has a default pipeline called `my-pipeline` and a final
pipeline called `my-final-pipeline`. Since both documents are being ingested into `index`,
both pipelines are executed using the pipeline definitions that are already in the system.
[source,console]
----
POST /_ingest/_simulate
{
"docs": [
{
"_index": "my-index",
"_id": "123",
"_source": {
"foo": "bar"
}
},
{
"_index": "my-index",
"_id": "456",
"_source": {
"foo": "rab"
}
}
]
}
----
The API returns the following response:
[source,console-result]
----
{
"docs": [
{
"doc": {
"_id": "123",
"_index": "my-index",
"_version": -3,
"_source": {
"field1": "value1",
"field2": "value2",
"foo": "bar"
},
"executed_pipelines": [
"my-pipeline",
"my-final-pipeline"
]
}
},
{
"doc": {
"_id": "456",
"_index": "my-index",
"_version": -3,
"_source": {
"field1": "value1",
"field2": "value2",
"foo": "rab"
},
"executed_pipelines": [
"my-pipeline",
"my-final-pipeline"
]
}
}
]
}
----
[[simulate-ingest-api-request-body-ex]]
===== Specify a pipeline substitution in the request body
In this example the index `my-index` has a default pipeline called `my-pipeline` and a final
pipeline called `my-final-pipeline`. But a substitute definition of `my-pipeline` is
provided in `pipeline_substitutions`. The substitute `my-pipeline` will be used in place of
the `my-pipeline` that is in the system, and then the `my-final-pipeline` that is already
defined in the system will be executed.
[source,console]
----
POST /_ingest/_simulate
{
"docs": [
{
"_index": "my-index",
"_id": "123",
"_source": {
"foo": "bar"
}
},
{
"_index": "my-index",
"_id": "456",
"_source": {
"foo": "rab"
}
}
],
"pipeline_substitutions": {
"my-pipeline": {
"processors": [
{
"uppercase": {
"field": "foo"
}
}
]
}
}
}
----
The API returns the following response:
[source,console-result]
----
{
"docs": [
{
"doc": {
"_id": "123",
"_index": "my-index",
"_version": -3,
"_source": {
"field2": "value2",
"foo": "BAR"
},
"executed_pipelines": [
"my-pipeline",
"my-final-pipeline"
]
}
},
{
"doc": {
"_id": "456",
"_index": "my-index",
"_version": -3,
"_source": {
"field2": "value2",
"foo": "RAB"
},
"executed_pipelines": [
"my-pipeline",
"my-final-pipeline"
]
}
}
]
}
----
[[simulate-ingest-api-substitute-component-templates-ex]]
===== Specify a component template substitution in the request body
In this example, imagine that the index `my-index` has a strict mapping with only the `foo`
keyword field defined. Say that field mapping came from a component template named
`my-mappings-template`. We want to test adding a new field, `bar`. So a substitute definition of
`my-mappings-template` is provided in `component_template_substitutions`. The substitute
`my-mappings-template` will be used in place of the existing mapping for `my-index` and in place
of the `my-mappings-template` that is in the system.
[source,console]
----
POST /_ingest/_simulate
{
"docs": [
{
"_index": "my-index",
"_id": "123",
"_source": {
"foo": "foo"
}
},
{
"_index": "my-index",
"_id": "456",
"_source": {
"bar": "rab"
}
}
],
"component_template_substitutions": {
"my-mappings_template": {
"template": {
"mappings": {
"dynamic": "strict",
"properties": {
"foo": {
"type": "keyword"
},
"bar": {
"type": "keyword"
}
}
}
}
}
}
}
----
The API returns the following response:
[source,console-result]
----
{
"docs": [
{
"doc": {
"_id": "123",
"_index": "my-index",
"_version": -3,
"_source": {
"foo": "foo"
},
"executed_pipelines": []
}
},
{
"doc": {
"_id": "456",
"_index": "my-index",
"_version": -3,
"_source": {
"bar": "rab"
},
"executed_pipelines": []
}
}
]
}
----
////
[source,console]
----
DELETE /my-index
DELETE /_ingest/pipeline/*
----
[source,console-result]
----
{
"acknowledged": true
}
----
////