elasticsearch/docs/reference/ml/trained-models/apis/put-trained-models.asciidoc

1149 lines
31 KiB
Text

[role="xpack"]
[[put-trained-models]]
= Create trained models API
[subs="attributes"]
++++
<titleabbrev>Create trained models</titleabbrev>
++++
.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-ml-trained-model[{ml-cap} trained model APIs].
--
Creates a trained model.
WARNING: Models created in version 7.8.0 are not backwards compatible
with older node versions. If in a mixed cluster environment,
all nodes must be at least 7.8.0 to use a model stored by
a 7.8.0 node.
[[ml-put-trained-models-request]]
== {api-request-title}
`PUT _ml/trained_models/<model_id>`
[[ml-put-trained-models-prereq]]
== {api-prereq-title}
Requires the `manage_ml` cluster privilege. This privilege is included in the
`machine_learning_admin` built-in role.
[[ml-put-trained-models-desc]]
== {api-description-title}
The create trained model API enables you to supply a trained model that is not
created by {dfanalytics}.
[[ml-put-trained-models-path-params]]
== {api-path-parms-title}
`<model_id>`::
(Required, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=model-id]
[[ml-put-trained-models-query-params]]
== {api-query-parms-title}
`defer_definition_decompression`::
(Optional, boolean)
If set to `true` and a `compressed_definition` is provided, the request defers
definition decompression and skips relevant validations.
This deferral is useful for systems or users that know a good byte size estimate for their
model and know that their model is valid and likely won't fail during inference.
`wait_for_completion`::
(Optional, boolean)
Whether to wait for all child operations such as model download
to complete, before returning or not. Defaults to `false`.
[role="child_attributes"]
[[ml-put-trained-models-request-body]]
== {api-request-body-title}
`compressed_definition`::
(Required, string)
The compressed (GZipped and Base64 encoded) {infer} definition of the model.
If `compressed_definition` is specified, then `definition` cannot be specified.
//Begin definition
`definition`::
(Required, object)
The {infer} definition for the model. If `definition` is specified, then
`compressed_definition` cannot be specified.
+
.Properties of `definition`
[%collapsible%open]
====
//Begin preprocessors
`preprocessors`::
(Optional, object)
Collection of preprocessors. See <<ml-put-trained-models-preprocessor-example>>.
+
.Properties of `preprocessors`
[%collapsible%open]
=====
//Begin frequency encoding
`frequency_encoding`::
(Required, object)
Defines a frequency encoding for a field.
+
.Properties of `frequency_encoding`
[%collapsible%open]
======
`feature_name`::
(Required, string)
The name of the resulting feature.
`field`::
(Required, string)
The field name to encode.
`frequency_map`::
(Required, object map of string:double)
Object that maps the field value to the frequency encoded value.
`custom`::
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]
======
//End frequency encoding
//Begin one hot encoding
`one_hot_encoding`::
(Required, object)
Defines a one hot encoding map for a field.
+
.Properties of `one_hot_encoding`
[%collapsible%open]
======
`field`::
(Required, string)
The field name to encode.
`hot_map`::
(Required, object map of strings)
String map of "field_value: one_hot_column_name".
`custom`::
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]
======
//End one hot encoding
//Begin target mean encoding
`target_mean_encoding`::
(Required, object)
Defines a target mean encoding for a field.
+
.Properties of `target_mean_encoding`
[%collapsible%open]
======
`default_value`:::
(Required, double)
The feature value if the field value is not in the `target_map`.
`feature_name`:::
(Required, string)
The name of the resulting feature.
`field`:::
(Required, string)
The field name to encode.
`target_map`:::
(Required, object map of string:double)
Object that maps the field value to the target mean value.
`custom`::
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]
======
//End target mean encoding
=====
//End preprocessors
//Begin trained model
`trained_model`::
(Required, object)
The definition of the trained model.
+
.Properties of `trained_model`
[%collapsible%open]
=====
//Begin tree
`tree`::
(Required, object)
The definition for a binary decision tree.
+
.Properties of `tree`
[%collapsible%open]
======
`classification_labels`:::
(Optional, string) An array of classification labels (used for
`classification`).
`feature_names`:::
(Required, string)
Features expected by the tree, in their expected order.
`target_type`:::
(Required, string)
String indicating the model target type; `regression` or `classification`.
`tree_structure`:::
(Required, object)
An array of `tree_node` objects. The nodes must be in ordinal order by their
`tree_node.node_index` value.
======
//End tree
//Begin tree node
`tree_node`::
(Required, object)
The definition of a node in a tree.
+
--
There are two major types of nodes: leaf nodes and not-leaf nodes.
* Leaf nodes only need `node_index` and `leaf_value` defined.
* All other nodes need `split_feature`, `left_child`, `right_child`,
`threshold`, `decision_type`, and `default_left` defined.
--
+
.Properties of `tree_node`
[%collapsible%open]
======
`decision_type`::
(Optional, string)
Indicates the positive value (in other words, when to choose the left node)
decision type. Supported `lt`, `lte`, `gt`, `gte`. Defaults to `lte`.
`default_left`::
(Optional, Boolean)
Indicates whether to default to the left when the feature is missing. Defaults
to `true`.
`leaf_value`::
(Optional, double)
The leaf value of the of the node, if the value is a leaf (in other words, no
children).
`left_child`::
(Optional, integer)
The index of the left child.
`node_index`::
(Integer)
The index of the current node.
`right_child`::
(Optional, integer)
The index of the right child.
`split_feature`::
(Optional, integer)
The index of the feature value in the feature array.
`split_gain`::
(Optional, double) The information gain from the split.
`threshold`::
(Optional, double)
The decision threshold with which to compare the feature value.
======
//End tree node
//Begin ensemble
`ensemble`::
(Optional, object)
The definition for an ensemble model. See <<ml-put-trained-models-model-example>>.
+
.Properties of `ensemble`
[%collapsible%open]
======
//Begin aggregate output
`aggregate_output`::
(Required, object)
An aggregated output object that defines how to aggregate the outputs of the
`trained_models`. Supported objects are `weighted_mode`, `weighted_sum`, and
`logistic_regression`. See <<ml-put-trained-models-aggregated-output-example>>.
+
.Properties of `aggregate_output`
[%collapsible%open]
=======
//Begin logistic regression
`logistic_regression`::
(Optional, object)
This `aggregated_output` type works with binary classification (classification
for values [0, 1]). It multiplies the outputs (in the case of the `ensemble`
model, the inference model values) by the supplied `weights`. The resulting
vector is summed and passed to a
{wikipedia}/Sigmoid_function[`sigmoid` function]. The result
of the `sigmoid` function is considered the probability of class 1 (`P_1`),
consequently, the probability of class 0 is `1 - P_1`. The class with the
highest probability (either 0 or 1) is then returned. For more information about
logistic regression, see
{wikipedia}/Logistic_regression[this wiki article].
+
.Properties of `logistic_regression`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End logistic regression
//Begin weighted sum
`weighted_sum`::
(Optional, object)
This `aggregated_output` type works with regression. The weighted sum of the
input values.
+
.Properties of `weighted_sum`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End weighted sum
//Begin weighted mode
`weighted_mode`::
(Optional, object)
This `aggregated_output` type works with regression or classification. It takes
a weighted vote of the input values. The most common input value (taking the
weights into account) is returned.
+
.Properties of `weighted_mode`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End weighted mode
//Begin exponent
`exponent`::
(Optional, object)
This `aggregated_output` type works with regression. It takes a weighted sum of
the input values and passes the result to an exponent function
(`e^x` where `x` is the sum of the weighted values).
+
.Properties of `exponent`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End exponent
=======
//End aggregate output
`classification_labels`::
(Optional, string)
An array of classification labels.
`feature_names`::
(Optional, string)
Features expected by the ensemble, in their expected order.
`target_type`::
(Required, string)
String indicating the model target type; `regression` or `classification.`
`trained_models`::
(Required, object)
An array of `trained_model` objects. Supported trained models are `tree` and
`ensemble`.
======
//End ensemble
=====
//End trained model
====
//End definition
`description`::
(Optional, string)
A human-readable description of the {infer} trained model.
`estimated_heap_memory_usage_bytes`::
(Optional, integer) deprecated:[7.16.0,Replaced by `model_size_bytes`]
`estimated_operations`::
(Optional, integer)
The estimated number of operations to use the trained model during inference.
This property is supported only if `defer_definition_decompression` is `true` or
the model definition is not supplied.
//Begin inference_config
`inference_config`::
(Required, object)
The default configuration for inference. This can be: `regression`,
`classification`, `fill_mask`, `ner`, `question_answering`,
`text_classification`, `text_embedding`, `text_expansion` or `zero_shot_classification`.
If `regression` or `classification`, it must match the `target_type` of the
underlying `definition.trained_model`. If `fill_mask`, `ner`,
`question_answering`, `text_classification`, `text_embedding` or `text_expansion`; the
`model_type` must be `pytorch`.
+
.Properties of `inference_config`
[%collapsible%open]
====
`classification`:::
(Optional, object)
Classification configuration for inference.
+
.Properties of classification inference
[%collapsible%open]
=====
`num_top_classes`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]
`num_top_feature_importance_values`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]
`prediction_field_type`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`top_classes_results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]
=====
`fill_mask`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-fill-mask]
+
.Properties of fill_mask inference
[%collapsible%open]
=====
`num_top_classes`::::
(Optional, integer)
Number of top predicted tokens to return for replacing the mask token. Defaults to `0`.
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
`ner`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-ner]
+
.Properties of ner inference
[%collapsible%open]
=====
`classification_labels`::::
(Optional, string)
An array of classification labels. NER only supports Inside-Outside-Beginning
labels (IOB) and only persons, organizations, locations, and miscellaneous.
Example: ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC",
"I-MISC"]
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the
properties of the `tokenization` object.
=====
`pass_through`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-pass-through]
+
.Properties of pass_through inference
[%collapsible%open]
=====
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
`question_answering`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-question-answering]
+
.Properties of question_answering inference
[%collapsible%open]
=====
`max_answer_length`::::
(Optional, integer)
The maximum amount of words in the answer. Defaults to `15`.
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Recommended to set `max_sentence_length` to `386` with `128` of `span` and set
`truncate` to `none`.
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
`regression`:::
(Optional, object)
Regression configuration for inference.
+
.Properties of regression inference
[%collapsible%open]
=====
`num_top_feature_importance_values`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
=====
`text_classification`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-text-classification]
+
.Properties of text_classification inference
[%collapsible%open]
=====
`classification_labels`::::
(Optional, string) An array of classification labels.
`num_top_classes`::::
(Optional, integer)
Specifies the number of top class predictions to return. Defaults to all classes
(-1).
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
`text_embedding`:::
(Object, optional)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-text-embedding]
+
.Properties of text_embedding inference
[%collapsible%open]
=====
`embedding_size`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-text-embedding-size]
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
`text_expansion`:::
(Object, optional)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-text-expansion]
+
.Properties of text_expansion inference
[%collapsible%open]
=====
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
`text_similarity`:::
(Object, optional)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-text-similarity]
+
.Properties of text_similarity inference
[%collapsible%open]
=====
`span_score_combination_function`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-text-similarity-span-score-func]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
`zero_shot_classification`:::
(Object, optional)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification]
+
.Properties of zero_shot_classification inference
[%collapsible%open]
=====
`classification_labels`::::
(Required, array)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-classification-labels]
`hypothesis_template`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-hypothesis-template]
`labels`::::
(Optional, array)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-labels]
`multi_label`::::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-multi-label]
`results_field`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
`tokenization`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
Refer to <<tokenization-properties>> to review the properties of the
`tokenization` object.
=====
====
//End of inference_config
//Begin input
`input`::
(Required, object)
The input field names for the model definition.
+
.Properties of `input`
[%collapsible%open]
====
`field_names`:::
(Required, string)
An array of input field names for the model.
====
//End input
// Begin location
`location`::
(Optional, object)
The model definition location. If the `definition` or `compressed_definition`
are not specified, the `location` is required.
+
.Properties of `location`
[%collapsible%open]
====
`index`:::
(Required, object)
Indicates that the model definition is stored in an index. This object must be
empty as the index for storing model definitions is configured automatically.
====
// End location
`metadata`::
(Optional, object)
An object map that contains metadata about the model.
`model_size_bytes`::
(Optional, integer)
The estimated memory usage in bytes to keep the trained model in memory. This
property is supported only if `defer_definition_decompression` is `true` or the
model definition is not supplied.
`model_type`::
(Optional, string)
The created model type. By default the model type is `tree_ensemble`.
Appropriate types are:
+
--
* `tree_ensemble`: The model definition is an ensemble model of decision trees.
* `lang_ident`: A special type reserved for language identification models.
* `pytorch`: The stored definition is a PyTorch (specifically a TorchScript) model. Currently only
NLP models are supported. For more information, refer to {ml-docs}/ml-nlp.html[{nlp-cap}].
--
`platform_architecture`::
(Optional, string)
If the model only works on one platform, because it is heavily
optimized for a particular processor architecture and OS combination,
then this field specifies which. The format of the string must match
the platform identifiers used by Elasticsearch, so one of, `linux-x86_64`,
`linux-aarch64`, `darwin-x86_64`, `darwin-aarch64`, or `windows-x86_64`.
For portable models (those that work independent of processor architecture or
OS features), leave this field unset.
//Begin prefix_strings
`prefix_strings`::
(Optional, object)
Certain NLP models are trained in such a way that a prefix string should
be applied to the input text before the input is evaluated. The prefix
may be different depending on the intention. For asymmetric tasks such
as infromation retrieval the prefix applied to a passage as it is indexed
can be different to the prefix applied when searching those passages.
+
--
`prefix_strings` has 2 options, a prefix string that is always applied
in the search context and one that is always applied when ingesting the
docs. Both are optional.
--
+
.Properties of `prefix_strings`
[%collapsible%open]
====
`search`:::
(Optional, string)
The prefix string to prepend to the input text for requests
originating from a search query.
`ingest`:::
(Optional, string)
The prefix string to prepend to the input text for requests
at ingest where the {infer} ingest processor is used.
// TODO is there a shortcut for Inference ingest processor?
====
//End prefix_strings
`tags`::
(Optional, string)
An array of tags to organize the model.
[[tokenization-properties]]
=== Properties of `tokenizaton`
The `tokenization` object has the following properties.
`bert`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
====
`do_lower_case`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-do-lower-case]
`max_sequence_length`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-max-sequence-length]
`span`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]
`truncate`:::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
`with_special_tokens`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
====
`deberta_v2`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
====
`do_lower_case`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-do-lower-case]
+
--
Defaults to `false`.
--
`max_sequence_length`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-max-sequence-length]
`span`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]
`truncate`:::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate-deberta-v2]
`with_special_tokens`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2-with-special-tokens]
====
`roberta`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
====
`add_prefix_space`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta-add-prefix-space]
`max_sequence_length`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-max-sequence-length]
`span`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]
`truncate`:::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
`with_special_tokens`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta-with-special-tokens]
====
`mpnet`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
====
`do_lower_case`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-do-lower-case]
`max_sequence_length`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-max-sequence-length]
`span`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]
`truncate`:::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
`with_special_tokens`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet-with-special-tokens]
====
`xlm_roberta`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-xlm-roberta]
+
.Properties of xlm_roberta
[%collapsible%open]
====
`max_sequence_length`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-max-sequence-length]
`span`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]
`truncate`:::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
`with_special_tokens`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta-with-special-tokens]
====
`bert_ja`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-ja]
+
.Properties of bert_ja
[%collapsible%open]
====
`do_lower_case`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-do-lower-case]
`max_sequence_length`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-max-sequence-length]
`span`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]
`truncate`:::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
`with_special_tokens`:::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-ja-with-special-tokens]
====
[[ml-put-trained-models-example]]
== {api-examples-title}
[[ml-put-trained-models-preprocessor-example]]
=== Preprocessor examples
The example below shows a `frequency_encoding` preprocessor object:
[source,js]
----------------------------------
{
"frequency_encoding":{
"field":"FlightDelayType",
"feature_name":"FlightDelayType_frequency",
"frequency_map":{
"Carrier Delay":0.6007414737092798,
"NAS Delay":0.6007414737092798,
"Weather Delay":0.024573576178086153,
"Security Delay":0.02476631010889467,
"No Delay":0.6007414737092798,
"Late Aircraft Delay":0.6007414737092798
}
}
}
----------------------------------
//NOTCONSOLE
The next example shows a `one_hot_encoding` preprocessor object:
[source,js]
----------------------------------
{
"one_hot_encoding":{
"field":"FlightDelayType",
"hot_map":{
"Carrier Delay":"FlightDelayType_Carrier Delay",
"NAS Delay":"FlightDelayType_NAS Delay",
"No Delay":"FlightDelayType_No Delay",
"Late Aircraft Delay":"FlightDelayType_Late Aircraft Delay"
}
}
}
----------------------------------
//NOTCONSOLE
This example shows a `target_mean_encoding` preprocessor object:
[source,js]
----------------------------------
{
"target_mean_encoding":{
"field":"FlightDelayType",
"feature_name":"FlightDelayType_targetmean",
"target_map":{
"Carrier Delay":39.97465788139886,
"NAS Delay":39.97465788139886,
"Security Delay":203.171206225681,
"Weather Delay":187.64705882352948,
"No Delay":39.97465788139886,
"Late Aircraft Delay":39.97465788139886
},
"default_value":158.17995752420433
}
}
----------------------------------
//NOTCONSOLE
[[ml-put-trained-models-model-example]]
=== Model examples
The first example shows a `trained_model` object:
[source,js]
----------------------------------
{
"tree":{
"feature_names":[
"DistanceKilometers",
"FlightTimeMin",
"FlightDelayType_NAS Delay",
"Origin_targetmean",
"DestRegion_targetmean",
"DestCityName_targetmean",
"OriginAirportID_targetmean",
"OriginCityName_frequency",
"DistanceMiles",
"FlightDelayType_Late Aircraft Delay"
],
"tree_structure":[
{
"decision_type":"lt",
"threshold":9069.33437193022,
"split_feature":0,
"split_gain":4112.094574306927,
"node_index":0,
"default_left":true,
"left_child":1,
"right_child":2
},
...
{
"node_index":9,
"leaf_value":-27.68987349695448
},
...
],
"target_type":"regression"
}
}
----------------------------------
//NOTCONSOLE
The following example shows an `ensemble` model object:
[source,js]
----------------------------------
"ensemble":{
"feature_names":[
...
],
"trained_models":[
{
"tree":{
"feature_names":[],
"tree_structure":[
{
"decision_type":"lte",
"node_index":0,
"leaf_value":47.64069875778043,
"default_left":false
}
],
"target_type":"regression"
}
},
...
],
"aggregate_output":{
"weighted_sum":{
"weights":[
...
]
}
},
"target_type":"regression"
}
----------------------------------
//NOTCONSOLE
[[ml-put-trained-models-aggregated-output-example]]
=== Aggregated output example
Example of a `logistic_regression` object:
[source,js]
----------------------------------
"aggregate_output" : {
"logistic_regression" : {
"weights" : [2.0, 1.0, .5, -1.0, 5.0, 1.0, 1.0]
}
}
----------------------------------
//NOTCONSOLE
Example of a `weighted_sum` object:
[source,js]
----------------------------------
"aggregate_output" : {
"weighted_sum" : {
"weights" : [1.0, -1.0, .5, 1.0, 5.0]
}
}
----------------------------------
//NOTCONSOLE
Example of a `weighted_mode` object:
[source,js]
----------------------------------
"aggregate_output" : {
"weighted_mode" : {
"weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
}
}
----------------------------------
//NOTCONSOLE
Example of an `exponent` object:
[source,js]
----------------------------------
"aggregate_output" : {
"exponent" : {
"weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
}
}
----------------------------------
//NOTCONSOLE
[[ml-put-trained-models-json-schema]]
=== Trained models JSON schema
For the full JSON schema of trained models,
https://github.com/elastic/ml-json-schemas[click here].