[[infer-service-azure-openai]]
=== Azure OpenAI {infer} integration

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `azureopenai` service.


[discrete]
[[infer-service-azure-openai-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-azure-openai-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `completion`,
* `text_embedding`.
--

[discrete]
[[infer-service-azure-openai-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunk_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]
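+
--
For example, these options can be combined in the endpoint creation request. The following sketch uses a sentence-based strategy with illustrative values, not recommendations:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--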

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`azureopenai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `azureopenai` service.
--

`api_key` or `entra_id`:::
(Required, string)
You must provide _either_ an API key or an Entra ID.
If you do not provide either, or provide both, you will receive an error when trying to create your model.
See the https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#authentication[Azure OpenAI Authentication documentation] for more details on these authentication types.
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--
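+
--
For example, a sketch of a creation request that authenticates with an Entra ID instead of an API key (`<entra_id>` is a placeholder):

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "entra_id": "<entra_id>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--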

`resource_name`:::
(Required, string)
The name of your Azure OpenAI resource.
You can find this from the https://portal.azure.com/#view/HubsExtension/BrowseAll[list of resources] in the Azure Portal for your subscription.

`deployment_id`:::
(Required, string)
The deployment name of your deployed models.
Your Azure OpenAI deployments can be found through the https://oai.azure.com/[Azure OpenAI Studio] portal that is linked to your subscription.

`api_version`:::
(Required, string)
The Azure API version ID to use.
We recommend using the https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings[latest supported non-preview version].

`rate_limit`:::
(Optional, object)
The `azureopenai` service sets a default number of requests allowed per minute depending on the task type.
For `text_embedding` it is set to `1440`.
For `completion` it is set to `120`.
This helps to minimize the number of rate limit errors returned from Azure.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]

More information about the rate limits for Azure can be found in the https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits[Quota limits docs] and https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest[How to change the quotas].
--
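+
--
For example, a sketch of an `azureopenai` completion endpoint that raises the default limit (the value `240` is illustrative only):

[source,console]
------------------------------------------------------------
PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--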

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `completion` task type
[%collapsible%closed]
=====
`user`:::
(Optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`user`:::
(Optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====
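+
--
For example, a sketch of a creation request that passes a `user` identifier through `task_settings` (the placeholder value is illustrative):

[source,console]
------------------------------------------------------------
PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    },
    "task_settings": {
        "user": "<user_identifier>"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--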


[discrete]
[[inference-example-azure-openai]]
==== Azure OpenAI service example

The following example shows how to create an {infer} endpoint called
`azure_openai_embeddings` to perform a `text_embedding` task type.
Note that we do not specify a model here, as it is defined already via our Azure OpenAI deployment.

The list of embeddings models that you can choose from in your deployment can be found in the https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings[Azure models documentation].

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
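
Once the endpoint is created, it can be queried through the {infer} API. The following request is a sketch with an illustrative input string:

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/azure_openai_embeddings
{
    "input": "The quick brown fox jumps over the lazy dog"
}
------------------------------------------------------------
// TEST[skip:TBD]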

The next example shows how to create an {infer} endpoint called
`azure_openai_completion` to perform a `completion` task type.

[source,console]
------------------------------------------------------------
PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
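
The completion endpoint can be called in the same way; the prompt below is only an illustration:

[source,console]
------------------------------------------------------------
POST _inference/completion/azure_openai_completion
{
    "input": "Write a haiku about Elasticsearch"
}
------------------------------------------------------------
// TEST[skip:TBD]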

The list of chat completion models that you can choose from in your Azure OpenAI deployment can be found at the following places:

* https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-models[GPT-4 and GPT-4 Turbo models]
* https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-35[GPT-3.5]