[role="xpack"]
[[inference-processor]]
=== {infer-cap} processor
++++
<titleabbrev>{infer-cap}</titleabbrev>
++++

Uses a pre-trained {dfanalytics} model or a model deployed for natural
language processing tasks to infer against the data that is being
ingested in the pipeline.

[[inference-options]]
.{infer-cap} Options
[options="header"]
|======
| Name | Required | Default | Description
| `model_id` | yes | - | (String) The ID or alias for the trained model, or the ID of the deployment.
| `input_output` | no | - | (List) Input fields for {infer} and output (destination) fields for the {infer} results. This option is incompatible with the `target_field` and `field_map` options.
| `target_field` | no | `ml.inference.<processor_tag>` | (String) Field added to incoming documents to contain results objects.
| `field_map` | no | If defined the model's default field map | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration.
| `inference_config` | no | The default settings defined in the model | (Object) Contains the inference type and its options.
| `ignore_missing` | no | `false` | (Boolean) If `true` and any of the input fields defined in `input_output` are missing, those missing fields are quietly ignored; otherwise a missing field causes a failure. Only applies when using `input_output` configurations to explicitly list the input fields.
include::common-options.asciidoc[]
|======

[IMPORTANT]
==================================================
* You cannot use the `input_output` field with the `target_field` and
`field_map` fields. For NLP models, use the `input_output` option. For
{dfanalytics} models, use the `target_field` and `field_map` option.
* Each {infer} input field must be a single string, not an array of strings.
* The `input_field` is processed as is and ignores any <<mapping,index mapping>>'s <<analysis,analyzers>> at the time of the {infer} run. If you need the analyzer's normalization to apply to the {infer} input, run the equivalent ingest processor earlier in the pipeline, before the {infer} processor.
==================================================
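
For example, if the index mapping lowercases the `content` field with an
analyzer, that normalization is not applied at {infer} time. The following
sketch (the pipeline layout and model ID are illustrative) reproduces it by
placing a `lowercase` processor before the {infer} processor:

[source,js]
--------------------------------------------------
{
  "processors": [
    {
      "lowercase": {
        "field": "content"
      }
    },
    {
      "inference": {
        "model_id": "model_deployment_for_inference",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE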

[discrete]
[[inference-input-output-example]]
==== Configuring input and output fields

Select the `content` field for inference and write the result to
`content_embedding`.

IMPORTANT: If the specified `output_field` already exists in the ingest document, it won't be overwritten.
The {infer} results will be appended to the existing fields within `output_field`, which could lead to duplicate fields and potential errors.
To avoid this, use a unique `output_field` field name that does not clash with any existing fields.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
      {
        "input_field": "content",
        "output_field": "content_embedding"
      }
    ]
  }
}
--------------------------------------------------
// NOTCONSOLE

==== Configuring multiple inputs

The `content` and `title` fields will be read from the incoming document and
sent to the model for inference. The inference output is written to
`content_embedding` and `title_embedding` respectively.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
      {
        "input_field": "content",
        "output_field": "content_embedding"
      },
      {
        "input_field": "title",
        "output_field": "title_embedding"
      }
    ]
  }
}
--------------------------------------------------
// NOTCONSOLE

Selecting the input fields with `input_output` is incompatible with
the `target_field` and `field_map` options.

{dfanalytics-cap} models must use the `target_field` to specify the root
location that results are written to, and optionally a `field_map` to map field
names in the input document to the model input fields.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "target_field": "FlightDelayMin_prediction_infer",
    "field_map": {
      "your_field": "my_field"
    },
    "inference_config": { "regression": {} }
  }
}
--------------------------------------------------
// NOTCONSOLE

[discrete]
[[inference-processor-classification-opt]]
==== {classification-cap} configuration options

Classification configuration for inference.

`num_top_classes`::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`top_classes_results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]

`prediction_field_type`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]

[discrete]
[[inference-processor-fill-mask-opt]]
==== Fill mask configuration options

`num_top_classes`::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
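
For example, the following sketch (the model ID is a placeholder for a
deployed fill mask model) requests the top three predicted tokens and writes
the prediction to a `predicted_token` field inside the results object:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_fill_mask_model",
    "inference_config": {
      "fill_mask": {
        "num_top_classes": 3,
        "results_field": "predicted_token"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE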

[discrete]
[[inference-processor-ner-opt]]
==== NER configuration options

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
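
For example, the following sketch (the model ID and field name are
placeholders) writes the NER prediction to a `ner_prediction` field inside
the results object:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_ner_model",
    "inference_config": {
      "ner": {
        "results_field": "ner_prediction"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE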

[discrete]
[[inference-processor-regression-opt]]
==== {regression-cap} configuration options

Regression configuration for inference.

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

[discrete]
[[inference-processor-text-classification-opt]]
==== Text classification configuration options

`classification_labels`::
(Optional, string) An array of classification labels.

`num_top_classes`::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======

`span`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======

`span`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
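
For example, the following sketch (the model ID and labels are placeholders)
reports the predicted probabilities for both labels of a binary sentiment
model:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_text_classification_model",
    "inference_config": {
      "text_classification": {
        "classification_labels": ["POSITIVE", "NEGATIVE"],
        "num_top_classes": 2
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE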

[discrete]
[[inference-processor-text-embedding-opt]]
==== Text embedding configuration options

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
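
For example, the following sketch (the model ID is a placeholder for a
deployed text embedding model) explicitly sets the BERT tokenizer to truncate
long input, keeping the first section of the text:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_text_embedding_model",
    "input_output": [
      {
        "input_field": "content",
        "output_field": "content_embedding"
      }
    ],
    "inference_config": {
      "text_embedding": {
        "tokenization": {
          "bert": {
            "truncate": "first"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE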

[discrete]
[[inference-processor-text-expansion-opt]]
==== Text expansion configuration options

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======

`span`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======

`span`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
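
For example, the following sketch (the model ID and field name are
placeholders for a deployed text expansion model) writes the expanded terms
and their weights to a `tokens` field inside the results object:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_text_expansion_model",
    "inference_config": {
      "text_expansion": {
        "results_field": "tokens"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE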

[discrete]
[[inference-processor-zero-shot-opt]]
==== Zero shot classification configuration options

`labels`::
(Optional, array)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-labels]

`multi_label`::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-multi-label]

`results_field`::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
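
For example, the following sketch (the model ID and labels are placeholders)
classifies each document into exactly one of three candidate topics:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_zero_shot_model",
    "inference_config": {
      "zero_shot_classification": {
        "labels": ["sports", "politics", "science"],
        "multi_label": false
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE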

[discrete]
[[inference-processor-config-example]]
==== {infer-cap} processor examples

[source,js]
--------------------------------------------------
"inference":{
  "model_id": "my_model_id",
  "field_map": {
    "original_fieldname": "expected_fieldname"
  },
  "inference_config": {
    "regression": {
      "results_field": "my_regression"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

This configuration specifies a `regression` inference and the results are
written to the `my_regression` field contained in the `target_field` results
object. The `field_map` configuration maps the field `original_fieldname` from
the source document to the field expected by the model.

[source,js]
--------------------------------------------------
"inference":{
  "model_id": "my_model_id",
  "inference_config": {
    "classification": {
      "num_top_classes": 2,
      "results_field": "prediction",
      "top_classes_results_field": "probabilities"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

This configuration specifies a `classification` inference. The number of
categories for which the predicted probabilities are reported is 2
(`num_top_classes`). The result is written to the `prediction` field and the top
classes to the `probabilities` field. Both fields are contained in the
`target_field` results object.

For an example that uses {nlp} trained models, refer to
{ml-docs}/ml-nlp-inference.html[Add NLP inference to ingest pipelines].

[discrete]
[[inference-processor-feature-importance]]
==== {feat-imp-cap} object mapping

To get the full benefit of aggregating and searching for
{ml-docs}/ml-feature-importance.html[{feat-imp}], update your index mapping of
the {feat-imp} result field as shown below:

[source,js]
--------------------------------------------------
"ml.inference.feature_importance": {
  "type": "nested",
  "dynamic": true,
  "properties": {
    "feature_name": {
      "type": "keyword"
    },
    "importance": {
      "type": "double"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
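
For example, the snippet below is a minimal sketch that applies this mapping
to a hypothetical index named `my-index` with the update mapping API:

[source,js]
--------------------------------------------------
PUT my-index/_mapping
{
  "properties": {
    "ml.inference.feature_importance": {
      "type": "nested",
      "dynamic": true,
      "properties": {
        "feature_name": {
          "type": "keyword"
        },
        "importance": {
          "type": "double"
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE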

The mapping field name for {feat-imp} (in the example above, it is
`ml.inference.feature_importance`) is compounded as follows:

`<ml.inference.target_field>`.`<inference.tag>`.`feature_importance`

* `<ml.inference.target_field>`: defaults to `ml.inference`.
* `<inference.tag>`: if it is not provided in the processor definition, then it is
not part of the field path.

For example, if you provide a tag `foo` in the definition as you can see below:

[source,js]
--------------------------------------------------
{
  "tag": "foo",
  ...
}
--------------------------------------------------
// NOTCONSOLE

Then, the {feat-imp} value is written to the
`ml.inference.foo.feature_importance` field.

You can also specify the target field as follows:

[source,js]
--------------------------------------------------
{
  "tag": "foo",
  "target_field": "my_field"
}
--------------------------------------------------
// NOTCONSOLE

In this case, {feat-imp} is exposed in the
`my_field.foo.feature_importance` field.