
[[infer-service-openai]]
=== OpenAI {infer} integration

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `openai` service or with any OpenAI-compatible API, such as a self-hosted model server.

[discrete]
[[infer-service-openai-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-openai-api-path-params]]
==== {api-path-parms-title}
`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `chat_completion`,
* `completion`,
* `text_embedding`.
--

[NOTE]
====
The `chat_completion` task type only supports streaming and is only available through the `_stream` API.

include::inference-shared.asciidoc[tag=chat-completion-docs]
====
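For illustration, here is a minimal sketch of the streaming workflow; the endpoint name, model, and prompt are placeholders, not defaults. The endpoint is created first, then called through the `_stream` API:

[source,console]
------------------------------------------------------------
PUT _inference/chat_completion/openai-chat
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "gpt-4o"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]

[source,console]
------------------------------------------------------------
POST _inference/chat_completion/openai-chat/_stream
{
    "messages": [
        {
            "role": "user",
            "content": "What is Elastic?"
        }
    ]
}
------------------------------------------------------------
// TEST[skip:TBD]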
[discrete]
[[infer-service-openai-api-request-body]]
==== {api-request-body-title}
`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunk_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]
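+
--
For illustration only (a sketch; the endpoint name and the setting values are assumptions, not requirements), chunking behavior can be configured when the endpoint is created:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/openai-embeddings-chunked
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "text-embedding-3-small"
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
--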
`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`openai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `openai` service.
--

`api_key`:::
(Required, string)
A valid API key for your OpenAI account.
You can find your OpenAI API keys in your OpenAI account under the
https://platform.openai.com/api-keys[API keys section].
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--
`dimensions`:::
(Optional, integer)
The number of dimensions the resulting output embeddings should have.
Only supported in `text-embedding-3` and later models.
If not set, the OpenAI-defined default for the model is used.

`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.
Refer to the
https://platform.openai.com/docs/guides/embeddings/what-are-embeddings[OpenAI documentation]
for the list of available text embedding models.

`organization_id`:::
(Optional, string)
The unique identifier of your organization.
You can find the Organization ID in your OpenAI account under
https://platform.openai.com/account/organization[**Settings** > **Organizations**].

`url`:::
(Optional, string)
The URL endpoint to use for the requests.
Can be changed for testing purposes or to target an OpenAI-compatible API.
Defaults to `https://api.openai.com/v1/embeddings`.
An example that targets a self-hosted, OpenAI-compatible server is shown at the end of this page.

`rate_limit`:::
(Optional, object)
The `openai` service sets a default number of requests allowed per minute depending on the task type.
For `text_embedding` it is set to `3000`.
For `completion` it is set to `500`.
This helps to minimize the number of rate limit errors returned from OpenAI.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]

More information about the rate limits for OpenAI can be found in your https://platform.openai.com/account/limits[Account limits].
--
`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `completion` task type
[%collapsible%closed]
=====
`user`:::
(Optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
An example follows at the end of this section.
=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`user`:::
(Optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====
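For illustration (a sketch only; the endpoint name and the `user` value are made up), the `user` task setting can be provided when the endpoint is created:

[source,console]
------------------------------------------------------------
PUT _inference/completion/openai-completion-tracked
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "gpt-3.5-turbo"
    },
    "task_settings": {
        "user": "user-1234"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]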
[discrete]
[[inference-example-openai]]
==== OpenAI service example

The following example shows how to create an {infer} endpoint called `openai-embeddings` to perform a `text_embedding` task type.
The embeddings created by requests to this endpoint will have 128 dimensions.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/openai-embeddings
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "text-embedding-3-small",
        "dimensions": 128
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
The next example shows how to create an {infer} endpoint called `openai-completion` to perform a `completion` task type.

[source,console]
------------------------------------------------------------
PUT _inference/completion/openai-completion
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "gpt-3.5-turbo"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
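The `openai` service can also be pointed at self-hosted, OpenAI-compatible servers, such as Ollama, by overriding the `url` service setting. The following sketch is illustrative only: the URL, the model name, and the placeholder API key are assumptions for a local Ollama instance, not defaults of the `openai` service.

[source,console]
------------------------------------------------------------
PUT _inference/completion/ollama-completion
{
    "service": "openai",
    "service_settings": {
        "api_key": "ignored", <1>
        "model_id": "llama3", <2>
        "url": "http://localhost:11434/v1/chat/completions" <3>
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> The `api_key` setting is required by the service even when the backing server does not validate it; any placeholder value works in that case.
<2> Assumed to be a model that is already available on the local Ollama instance.
<3> Ollama's OpenAI-compatible chat completions endpoint; substitute the base URL of your own server.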