[[infer-service-elasticsearch]]
=== Elasticsearch {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `elasticsearch` service.

NOTE: If you use the E5 model through the `elasticsearch` service, the API request will automatically download and deploy the model if it isn't downloaded yet.


[discrete]
[[infer-service-elasticsearch-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-elasticsearch-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `rerank`,
* `text_embedding`.
--

[discrete]
[[infer-service-elasticsearch-api-request-body]]
==== {api-request-body-title}

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`elasticsearch`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `elasticsearch` service.
--

`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.
It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or a text embedding model already {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].

`num_allocations`:::
(Required, integer)
The total number of allocations this model is assigned across machine learning nodes.
Increasing this value generally increases the throughput.

`num_threads`:::
(Required, integer)
Sets the number of threads used by each model allocation during inference.
This generally increases the speed per inference request.
The inference process is compute-bound, so `num_threads` must not exceed the number of allocated processors available per node.
Must be a power of 2. Max allowed value is 32.

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `rerank` task type
[%collapsible%closed]
=====
`return_documents`:::
(Optional, Boolean)
Returns the document instead of only the index. Defaults to `true`.
A sketch of a `rerank` endpoint that uses this setting follows the E5 example below.
=====

[discrete]
[[inference-example-elasticsearch]]
==== E5 via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called `my-e5-model` to perform a `text_embedding` task type.
The API request below will automatically download the E5 model if it isn't already downloaded and then deploy the model.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small" <1>
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> The `model_id` must be the ID of one of the built-in E5 models.
Valid values are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`.
For further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation].

[NOTE]
====
You might see a 502 bad gateway error in the response when using the {kib} Console.
This error usually just reflects a timeout while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====
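
Once the endpoint is created, you can call it with the {infer} API to verify the deployment.
The request below is a minimal sketch; the input text is illustrative.

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-e5-model
{
  "input": "The quick brown fox jumps over the lazy dog." <1>
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> Illustrative input text; the response contains the embedding generated for it.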
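
[discrete]
[[inference-example-elasticsearch-rerank]]
==== Rerank via the `elasticsearch` service

The `rerank` task type is configured the same way and accepts the `return_documents` task setting described above.
The following sketch assumes a text similarity (reranker) model has already been {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland]; the `model_id` shown is illustrative, not a built-in model.

[source,console]
------------------------------------------------------------
PUT _inference/rerank/my-rerank-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "cross-encoder__ms-marco-minilm-l-6-v2" <1>
  },
  "task_settings": {
    "return_documents": true <2>
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> An illustrative ID of a reranker model uploaded through Eland; substitute your own model ID.
<2> Optional; defaults to `true`.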
[discrete]
[[inference-example-eland]]
==== Models uploaded by Eland via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called `my-msmarco-minilm-model` to perform a `text_embedding` task type.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/my-msmarco-minilm-model <1>
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "msmarco-MiniLM-L12-cos-v5" <2>
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> Provide a unique identifier for the {infer} endpoint. The `inference_id` must be unique and must not match the `model_id`.
<2> The `model_id` must be the ID of a text embedding model which has already been {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
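
As with the built-in models, you can verify the endpoint with a simple {infer} request once the deployment has started; the input text below is illustrative.

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-msmarco-minilm-model
{
  "input": "A sample sentence to embed."
}
------------------------------------------------------------
// TEST[skip:TBD]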