elasticsearch/docs/reference/inference/service-azure-ai-studio.asciidoc

[[infer-service-azure-ai-studio]]
=== Azure AI studio {infer} integration

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `azureaistudio` service.


[discrete]
[[infer-service-azure-ai-studio-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-azure-ai-studio-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `completion`,
* `text_embedding`.
--

[discrete]
[[infer-service-azure-ai-studio-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunk_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`azureaistudio`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `azureaistudio` service.
--

`api_key`:::
(Required, string)
A valid API key of your Azure AI Studio model deployment.
This key can be found on the overview page for your deployment in the management section of your https://ai.azure.com/[Azure AI Studio] account.
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--

`target`:::
(Required, string)
The target URL of your Azure AI Studio model deployment.
This can be found on the overview page for your deployment in the management section of your https://ai.azure.com/[Azure AI Studio] account.

`provider`:::
(Required, string)
The model provider for your deployment.
Note that some providers may support only certain task types.
Supported providers include:

* `cohere` - available for `text_embedding` and `completion` task types
* `databricks` - available for `completion` task type only
* `meta` - available for `completion` task type only
* `microsoft_phi` - available for `completion` task type only
* `mistral` - available for `completion` task type only
* `openai` - available for `text_embedding` and `completion` task types

`endpoint_type`:::
(Required, string)
One of `token` or `realtime`.
Specifies the type of endpoint that is used in your model deployment.
There are https://learn.microsoft.com/en-us/azure/ai-studio/concepts/deployments-overview#billing-for-deploying-and-inferencing-llms-in-azure-ai-studio[two endpoint types available] for deployment through Azure AI Studio.
"Pay as you go" endpoints are billed per token.
For these, you must specify `token` for your `endpoint_type`.
For "real-time" endpoints which are billed per hour of usage, specify `realtime`.

`rate_limit`:::
(Optional, object)
By default, the `azureaistudio` service sets the number of requests allowed per minute to `240`.
This helps to minimize the number of rate limit errors returned from Azure AI Studio.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]
--

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `completion` task type
[%collapsible%closed]
=====
`do_sample`:::
(Optional, float)
Instructs the inference process to perform sampling or not.
Has no effect unless `temperature` or `top_p` is specified.

`max_new_tokens`:::
(Optional, integer)
Provides a hint for the maximum number of output tokens to be generated.
Defaults to 64.

`temperature`:::
(Optional, float)
A number in the range of 0.0 to 2.0 that specifies the sampling temperature to use that controls the apparent creativity of generated completions.
Should not be used if `top_p` is specified.

`top_p`:::
(Optional, float)
A number in the range of 0.0 to 2.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
Should not be used if `temperature` is specified.
=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`user`:::
(optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====


[discrete]
[[inference-example-azureaistudio]]
==== Azure AI Studio service example

The following example shows how to create an {infer} endpoint called `azure_ai_studio_embeddings` to perform a `text_embedding` task type.
Note that we do not specify a model here, as it is defined already via our Azure AI Studio deployment.

The list of embeddings models that you can choose from in your deployment can be found in the https://ai.azure.com/explore/models?selectedTask=embeddings[Azure AI Studio model explorer].

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/azure_ai_studio_embeddings
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "target": "<target_uri>",
        "provider": "<model_provider>",
        "endpoint_type": "<endpoint_type>"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]

The next example shows how to create an {infer} endpoint called `azure_ai_studio_completion` to perform a `completion` task type.

[source,console]
------------------------------------------------------------
PUT _inference/completion/azure_ai_studio_completion
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "target": "<target_uri>",
        "provider": "<model_provider>",
        "endpoint_type": "<endpoint_type>"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]

The list of chat completion models that you can choose from in your deployment can be found in the https://ai.azure.com/explore/models?selectedTask=chat-completion[Azure AI Studio model explorer].