[[infer-service-azure-openai]]
=== Azure OpenAI {infer} integration

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `azureopenai` service.


[discrete]
[[infer-service-azure-openai-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-azure-openai-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `completion`,
* `text_embedding`.
--

[discrete]
[[infer-service-azure-openai-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunk_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]
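+
--
For example, these options can be combined in the endpoint creation request. The following sketch uses a sentence-based strategy with illustrative values, not recommendations:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--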

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`azureopenai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `azureopenai` service.
--

`api_key` or `entra_id`:::
(Required, string)
You must provide _either_ an API key or an Entra ID.
If you do not provide either, or provide both, you will receive an error when trying to create your model.
See the https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#authentication[Azure OpenAI Authentication documentation] for more details on these authentication types.
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--
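+
--
For example, a sketch of a creation request that authenticates with an Entra ID instead of an API key (`<entra_id>` is a placeholder):

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "entra_id": "<entra_id>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--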

`resource_name`:::
(Required, string)
The name of your Azure OpenAI resource.
You can find this from the https://portal.azure.com/#view/HubsExtension/BrowseAll[list of resources] in the Azure Portal for your subscription.

`deployment_id`:::
(Required, string)
The deployment name of your deployed models.
Your Azure OpenAI deployments can be found through the https://oai.azure.com/[Azure OpenAI Studio] portal that is linked to your subscription.

`api_version`:::
(Required, string)
The Azure API version ID to use.
We recommend using the https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings[latest supported non-preview version].

`rate_limit`:::
(Optional, object)
The `azureopenai` service sets a default number of requests allowed per minute depending on the task type.
For `text_embedding` it is set to `1440`.
For `completion` it is set to `120`.
This helps to minimize the number of rate limit errors returned from Azure.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]

More information about the rate limits for Azure can be found in the https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits[Quota limits docs] and https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest[How to change the quotas].
--
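+
--
For example, a sketch of an `azureopenai` completion endpoint that raises the default limit (the value `240` is illustrative only):

[source,console]
------------------------------------------------------------
PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--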

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `completion` task type
[%collapsible%closed]
=====
`user`:::
(Optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`user`:::
(Optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====
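+
--
For example, a sketch of a creation request that passes a `user` identifier through `task_settings` (the placeholder value is illustrative):

[source,console]
------------------------------------------------------------
PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    },
    "task_settings": {
        "user": "<user_identifier>"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--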


[discrete]
[[inference-example-azure-openai]]
==== Azure OpenAI service example

The following example shows how to create an {infer} endpoint called
`azure_openai_embeddings` to perform a `text_embedding` task type.
Note that we do not specify a model here, as it is defined already via our Azure OpenAI deployment.

The list of embeddings models that you can choose from in your deployment can be found in the https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings[Azure models documentation].

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
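
Once the endpoint is created, it can be queried through the {infer} API. The following request is a sketch with an illustrative input string:

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/azure_openai_embeddings
{
    "input": "The quick brown fox jumps over the lazy dog"
}
------------------------------------------------------------
// TEST[skip:TBD]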

The next example shows how to create an {infer} endpoint called
`azure_openai_completion` to perform a `completion` task type.

[source,console]
------------------------------------------------------------
PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
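
The completion endpoint can be called in the same way; the prompt below is only an illustration:

[source,console]
------------------------------------------------------------
POST _inference/completion/azure_openai_completion
{
    "input": "Write a haiku about Elasticsearch"
}
------------------------------------------------------------
// TEST[skip:TBD]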

The list of chat completion models that you can choose from in your Azure OpenAI deployment can be found at the following places:

* https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-models[GPT-4 and GPT-4 Turbo models]
* https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-35[GPT-3.5]