[role="xpack"]
[[chat-completion-inference-api]]
=== Chat completion inference API

Streams a chat completion response.

IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in {ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face.
For built-in models and models uploaded through Eland, the {infer} APIs offer an alternative way to use and manage trained models.
However, if you do not plan to use the {infer} APIs to use these models or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.


[discrete]
[[chat-completion-inference-api-request]]
==== {api-request-title}

`POST /_inference/<inference_id>/_stream`

`POST /_inference/chat_completion/<inference_id>/_stream`


[discrete]
[[chat-completion-inference-api-prereqs]]
==== {api-prereq-title}

* Requires the `monitor_inference` <<privileges-list-cluster,cluster privilege>>
(the built-in `inference_admin` and `inference_user` roles grant this privilege; see the role sketch below)
* You must use a client that supports streaming.
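
If you need to grant the privilege explicitly rather than use the built-in roles, a minimal role sketch could look like this (the role name `chat-completion-user` is illustrative):

[source,console]
------------------------------------------------------------
PUT _security/role/chat-completion-user
{
    "cluster": ["monitor_inference"]
}
------------------------------------------------------------
// TEST[skip:illustrative example]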


[discrete]
[[chat-completion-inference-api-desc]]
==== {api-description-title}

The chat completion {infer} API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation.
It only works with the `chat_completion` task type for `openai` and `elastic` {infer} services.


[NOTE]
====
* The `chat_completion` task type is only available within the _stream API and only supports streaming.
* The Chat completion {infer} API and the Stream {infer} API differ in their response structure and capabilities.
The Chat completion {infer} API provides more comprehensive customization options through more fields and function calling support.
If you use the `openai` service or the `elastic` service, use the Chat completion {infer} API.
====

[discrete]
[[chat-completion-inference-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
The unique identifier of the {infer} endpoint.


`<task_type>`::
(Optional, string)
The type of {infer} task that the model performs. If included, this must be set to the value `chat_completion`.


[discrete]
[[chat-completion-inference-api-request-body]]
==== {api-request-body-title}

`messages`::
(Required, array of objects) A list of objects representing the conversation.
Requests should generally only add new messages from the user (role `user`). The other message roles (`assistant`, `system`, or `tool`) should generally only be copied from the response to a previous completion request, so that the messages array is built up throughout a conversation; see the conversation example after the message type descriptions below.
+
.Assistant message
[%collapsible%closed]
=====
`content`::
(Required unless `tool_calls` is specified, string or array of objects)
The contents of the message.
+
include::inference-shared.asciidoc[tag=chat-completion-schema-content-with-examples]
+
`role`::
(Required, string)
The role of the message author. This should be set to `assistant` for this type of message.
+
`tool_calls`::
(Optional, array of objects)
The tool calls generated by the model.
+
.Examples
[%collapsible%closed]
======
[source,js]
------------------------------------------------------------
{
    "tool_calls": [
        {
            "id": "call_KcAjWtAww20AihPHphUh46Gd",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": "{\"location\":\"Boston, MA\"}"
            }
        }
    ]
}
------------------------------------------------------------
// NOTCONSOLE
======
+
`id`:::
(Required, string)
The identifier of the tool call.
+
`type`:::
(Required, string)
The type of tool call. This must be set to the value `function`.
+
`function`:::
(Required, object)
The function that the model called.
+
`name`::::
(Required, string)
The name of the function to call.
+
`arguments`::::
(Required, string)
The arguments to call the function with in JSON format.
=====
+
.System message
[%collapsible%closed]
=====
`content`::
(Required, string or array of objects)
The contents of the message.
+
include::inference-shared.asciidoc[tag=chat-completion-schema-content-with-examples]
+
`role`::
(Required, string)
The role of the message author. This should be set to `system` for this type of message.
=====
+
.Tool message
[%collapsible%closed]
=====
`content`::
(Required, string or array of objects)
The contents of the message.
+
include::inference-shared.asciidoc[tag=chat-completion-schema-content-with-examples]
+
`role`::
(Required, string)
The role of the message author. This should be set to `tool` for this type of message.
+
`tool_call_id`::
(Required, string)
The tool call that this message is responding to.
=====
+
.User message
[%collapsible%closed]
=====
`content`::
(Required, string or array of objects)
The contents of the message.
+
include::inference-shared.asciidoc[tag=chat-completion-schema-content-with-examples]
+
`role`::
(Required, string)
The role of the message author. This should be set to `user` for this type of message.
=====
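+
.Conversation example
[%collapsible%closed]
=====
A minimal sketch of how the `messages` array is built up over several turns: the assistant message is copied from the previous response and the new user message is appended. All values are illustrative.

[source,js]
------------------------------------------------------------
{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is Elastic?"
        },
        {
            "role": "assistant",
            "content": "Elastic is the company behind the Elasticsearch search engine."
        },
        {
            "role": "user",
            "content": "What can I use Elasticsearch for?"
        }
    ]
}
------------------------------------------------------------
// NOTCONSOLE
=====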

`model`::
(Optional, string)
The ID of the model to use. By default, the model ID is set to the value included when creating the inference endpoint.

`max_completion_tokens`::
(Optional, integer)
The upper limit on the number of tokens that can be generated for a completion request.

`stop`::
(Optional, array of strings)
A sequence of strings to control when the model should stop generating additional tokens.

`temperature`::
(Optional, float)
The sampling temperature to use.

`tools`::
(Optional, array of objects)
A list of tools that the model can call.
+
.Structure
[%collapsible%closed]
=====
`type`::
(Required, string)
The type of tool. This must be set to the value `function`.
+
`function`::
(Required, object)
The function definition.
+
`description`:::
(Optional, string)
A description of what the function does. This is used by the model to choose when and how to call the function.
+
`name`:::
(Required, string)
The name of the function.
+
`parameters`:::
(Optional, object)
The parameters the function accepts. This should be formatted as a JSON object.
+
`strict`:::
(Optional, boolean)
Whether to enable schema adherence when generating the function call.
=====
+
.Examples
[%collapsible%closed]
======
[source,js]
------------------------------------------------------------
{
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_price_of_item",
                "description": "Get the current price of an item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "12345"
                        },
                        "unit": {
                            "type": "currency"
                        }
                    }
                }
            }
        }
    ]
}
------------------------------------------------------------
// NOTCONSOLE
======

`tool_choice`::
(Optional, string or object)
Controls which tool is called by the model.
+
String representation:::
One of `auto`, `none`, or `required`. `auto` allows the model to choose between calling tools and generating a message. `none` causes the model to not call any tools. `required` forces the model to call one or more tools. A sketch of the string form appears after the examples below.
+
Object representation:::
+
.Structure
[%collapsible%closed]
=====
`type`::
(Required, string)
The type of the tool. This must be set to the value `function`.
+
`function`::
(Required, object)
+
`name`:::
(Required, string)
The name of the function to call.
=====
+
.Examples
[%collapsible%closed]
=====
[source,js]
------------------------------------------------------------
{
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_weather"
        }
    }
}
------------------------------------------------------------
// NOTCONSOLE
=====
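+
For reference, the string representation is passed as a plain value; a minimal sketch:
+
[source,js]
------------------------------------------------------------
{
    "tool_choice": "auto"
}
------------------------------------------------------------
// NOTCONSOLE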
`top_p`::
(Optional, float)
Nucleus sampling, an alternative to sampling with temperature.

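Several of these generation parameters can be combined in a single request. The following sketch assumes a `chat_completion` endpoint named `openai-completion`; all parameter values are illustrative.

[source,js]
------------------------------------------------------------
POST _inference/chat_completion/openai-completion/_stream
{
    "model": "gpt-4o",
    "max_completion_tokens": 200,
    "stop": ["\n\n"],
    "temperature": 0.2,
    "top_p": 0.9,
    "messages": [
        {
            "role": "user",
            "content": "Summarize Elastic in one sentence."
        }
    ]
}
------------------------------------------------------------
// NOTCONSOLE
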
[discrete]
[[chat-completion-inference-api-example]]
==== {api-examples-title}

The following example performs a chat completion on the example question with streaming.


[source,console]
------------------------------------------------------------
POST _inference/chat_completion/openai-completion/_stream
{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "What is Elastic?"
        }
    ]
}
------------------------------------------------------------
// TEST[skip:TBD]

The following example performs a chat completion using an Assistant message with `tool_calls`.

[source,console]
------------------------------------------------------------
POST _inference/chat_completion/openai-completion/_stream
{
    "messages": [
        {
            "role": "assistant",
            "content": "Let's find out what the weather is",
            "tool_calls": [ <1>
                {
                    "id": "call_KcAjWtAww20AihPHphUh46Gd",
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",
                        "arguments": "{\"location\":\"Boston, MA\"}"
                    }
                }
            ]
        },
        { <2>
            "role": "tool",
            "content": "The weather is cold",
            "tool_call_id": "call_KcAjWtAww20AihPHphUh46Gd"
        }
    ]
}
------------------------------------------------------------
// TEST[skip:TBD]

<1> Each tool call needs a corresponding Tool message.
<2> The corresponding Tool message.

The following example performs a chat completion using a User message with `tools` and `tool_choice`.

[source,console]
------------------------------------------------------------
POST _inference/chat_completion/openai-completion/_stream
{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's the price of a scarf?"
                }
            ]
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of an item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_price"
        }
    }
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response when a request is made to the OpenAI service:


[source,txt]
------------------------------------------------------------
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

(...)

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}} <1>

event: message
data: [DONE]
------------------------------------------------------------
// NOTCONSOLE

<1> The last object message of the stream contains the token usage information.