[role="xpack"] [[put-inference-api]] === Create {infer} API experimental[] Creates a model to perform an {infer} task. IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in {ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, or Hugging Face. For built-in models and models uploaded though Eland, the {infer} APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the {infer} APIs to use these models or if you want to use non-NLP models, use the <>. [discrete] [[put-inference-api-request]] ==== {api-request-title} `PUT /_inference//` [discrete] [[put-inference-api-prereqs]] ==== {api-prereq-title} * Requires the `manage_inference` <> (the built-in `inference_admin` role grants this privilege) [discrete] [[put-inference-api-desc]] ==== {api-description-title} The create {infer} API enables you to create and configure a {ml} model to perform a specific {infer} task. The following services are available through the {infer} API: * Cohere * ELSER * Hugging Face * OpenAI * Elasticsearch (for built-in models and models uploaded through Eland) [discrete] [[put-inference-api-path-params]] ==== {api-path-parms-title} ``:: (Required, string) The unique identifier of the {infer} endpoint. ``:: (Required, string) The type of the {infer} task that the model will perform. Available task types: * `sparse_embedding`, * `text_embedding`. [discrete] [[put-inference-api-request-body]] == {api-request-body-title} `service`:: (Required, string) The type of service supported for the specified task type. Available services: * `cohere`: specify the `text_embedding` task type to use the Cohere service. * `elser`: specify the `sparse_embedding` task type to use the ELSER service. * `hugging_face`: specify the `text_embedding` task type to use the Hugging Face service. * `openai`: specify the `text_embedding` task type to use the OpenAI service. * `elasticsearch`: specify the `text_embedding` task type to use the E5 built-in model or text embedding models uploaded by Eland. `service_settings`:: (Required, object) Settings used to install the {infer} model. These settings are specific to the `service` you specified. + .`service_settings` for the `cohere` service [%collapsible%closed] ===== `api_key`::: (Required, string) A valid API key of your Cohere account. You can find your Cohere API keys or you can create a new one https://dashboard.cohere.com/api-keys[on the API keys settings page]. IMPORTANT: You need to provide the API key only once, during the {infer} model creation. The <> does not retrieve your API key. After creating the {infer} model, you cannot change the associated API key. If you want to use a different API key, delete the {infer} model and recreate it with the same name and the updated API key. `embedding_type`:: (Optional, string) Specifies the types of embeddings you want to get back. Defaults to `float`. Valid values are: * `byte`: use it for signed int8 embeddings (this is a synonym of `int8`). * `float`: use it for the default float embeddings. * `int8`: use it for signed int8 embeddings. `model_id`:: (Optional, string) The name of the model to use for the {infer} task. To review the available models, refer to the https://docs.cohere.com/reference/embed[Cohere docs]. Defaults to `embed-english-v2.0`. ===== + .`service_settings` for the `elser` service [%collapsible%closed] ===== `num_allocations`::: (Required, integer) The number of model allocations to create. `num_allocations` must not exceed the number of available processors per node divided by the `num_threads`. `num_threads`::: (Required, integer) The number of threads to use by each model allocation. `num_threads` must not exceed the number of available processors per node divided by the number of allocations. Must be a power of 2. Max allowed value is 32. ===== + .`service_settings` for the `hugging_face` service [%collapsible%closed] ===== `api_key`::: (Required, string) A valid access token of your Hugging Face account. You can find your Hugging Face access tokens or you can create a new one https://huggingface.co/settings/tokens[on the settings page]. IMPORTANT: You need to provide the API key only once, during the {infer} model creation. The <> does not retrieve your API key. After creating the {infer} model, you cannot change the associated API key. If you want to use a different API key, delete the {infer} model and recreate it with the same name and the updated API key. `url`::: (Required, string) The URL endpoint to use for the requests. ===== + .`service_settings` for the `openai` service [%collapsible%closed] ===== `api_key`::: (Required, string) A valid API key of your OpenAI account. You can find your OpenAI API keys in your OpenAI account under the https://platform.openai.com/api-keys[API keys section]. IMPORTANT: You need to provide the API key only once, during the {infer} model creation. The <> does not retrieve your API key. After creating the {infer} model, you cannot change the associated API key. If you want to use a different API key, delete the {infer} model and recreate it with the same name and the updated API key. `model_id`::: (Optional, string) The name of the model to use for the {infer} task. Refer to the https://platform.openai.com/docs/guides/embeddings/what-are-embeddings[OpenAI documentation] for the list of available text embedding models. `organization_id`::: (Optional, string) The unique identifier of your organization. You can find the Organization ID in your OpenAI account under https://platform.openai.com/account/organization[**Settings** > **Organizations**]. `url`::: (Optional, string) The URL endpoint to use for the requests. Can be changed for testing purposes. Defaults to `https://api.openai.com/v1/embeddings`. ===== + .`service_settings` for the `elasticsearch` service [%collapsible%closed] ===== `model_id`::: (Required, string) The name of the model to use for the {infer} task. It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or a text embedding model already {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland]. `num_allocations`::: (Required, integer) The number of model allocations to create. `num_allocations` must not exceed the number of available processors per node divided by the `num_threads`. `num_threads`::: (Required, integer) The number of threads to use by each model allocation. `num_threads` must not exceed the number of available processors per node divided by the number of allocations. Must be a power of 2. Max allowed value is 32. ===== `task_settings`:: (Optional, object) Settings to configure the {infer} task. These settings are specific to the `` you specified. + .`task_settings` for the `text_embedding` task type [%collapsible%closed] ===== `input_type`::: (optional, string) For `cohere` service only. Specifies the type of input passed to the model. Valid values are: * `classification`: use it for embeddings passed through a text classifier. * `clusterning`: use it for the embeddings run through a clustering algorithm. * `ingest`: use it for storing document embeddings in a vector database. * `search`: use it for storing embeddings of search queries run against a vector data base to find relevant documents. `truncate`::: (Optional, string) For `cohere` service only. Specifies how the API handles inputs longer than the maximum token length. Defaults to `END`. Valid values are: * `NONE`: when the input exceeds the maximum input token length an error is returned. * `START`: when the input exceeds the maximum input token length the start of the input is discarded. * `END`: when the input exceeds the maximum input token length the end of the input is discarded. ===== [discrete] [[put-inference-api-example]] ==== {api-examples-title} This section contains example API calls for every service type. [discrete] [[inference-example-cohere]] ===== Cohere service The following example shows how to create an {infer} endpoint called `cohere_embeddings` to perform a `text_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/text_embedding/cohere-embeddings { "service": "cohere", "service_settings": { "api_key": "", "model_id": "embed-english-light-v3.0", "embedding_type": "byte" } } ------------------------------------------------------------ // TEST[skip:TBD] [discrete] [[inference-example-e5]] ===== E5 via the elasticsearch service The following example shows how to create an {infer} endpoint called `my-e5-model` to perform a `text_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/text_embedding/my-e5-model { "service": "elasticsearch", "service_settings": { "num_allocations": 1, "num_threads": 1, "model_id": ".multilingual-e5-small" <1> } } ------------------------------------------------------------ // TEST[skip:TBD] <1> The `model_id` must be the ID of one of the built-in E5 models. Valid values are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`. For further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation]. [discrete] [[inference-example-elser]] ===== ELSER service The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/sparse_embedding/my-elser-model { "service": "elser", "service_settings": { "num_allocations": 1, "num_threads": 1 } } ------------------------------------------------------------ // TEST[skip:TBD] Example response: [source,console-result] ------------------------------------------------------------ { "inference_id": "my-elser-model", "task_type": "sparse_embedding", "service": "elser", "service_settings": { "num_allocations": 1, "num_threads": 1 }, "task_settings": {} } ------------------------------------------------------------ // NOTCONSOLE [discrete] [[inference-example-hugging-face]] ===== Hugging Face service The following example shows how to create an {infer} endpoint called `hugging-face_embeddings` to perform a `text_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/text_embedding/hugging-face-embeddings { "service": "hugging_face", "service_settings": { "api_key": "", <1> "url": "" <2> } } ------------------------------------------------------------ // TEST[skip:TBD] <1> A valid Hugging Face access token. You can find on the https://huggingface.co/settings/tokens[settings page of your account]. <2> The {infer} endpoint URL you created on Hugging Face. Create a new {infer} endpoint on https://ui.endpoints.huggingface.co/[the Hugging Face endpoint page] to get an endpoint URL. Select the model you want to use on the new endpoint creation page - for example `intfloat/e5-small-v2` - then select the `Sentence Embeddings` task under the Advanced configuration section. Create the endpoint. Copy the URL after the endpoint initialization has been finished. [discrete] [[inference-example-eland]] ===== Models uploaded by Eland via the elasticsearch service The following example shows how to create an {infer} endpoint called `my-msmarco-minilm-model` to perform a `text_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/text_embedding/my-msmarco-minilm-model { "service": "elasticsearch", "service_settings": { "num_allocations": 1, "num_threads": 1, "model_id": "msmarco-MiniLM-L12-cos-v5" <1> } } ------------------------------------------------------------ // TEST[skip:TBD] <1> The `model_id` must be the ID of a text embedding model which has already been {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland]. [discrete] [[inference-example-openai]] ===== OpenAI service The following example shows how to create an {infer} endpoint called `openai_embeddings` to perform a `text_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/text_embedding/openai_embeddings { "service": "openai", "service_settings": { "api_key": "", "model_id": "text-embedding-ada-002" } } ------------------------------------------------------------ // TEST[skip:TBD]