[role="xpack"]
[[post-inference-api]]
=== Perform inference API

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Performs an inference task on an input text by using an {infer} endpoint.

IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in {ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face.
For built-in models and models uploaded through Eland, the {infer} APIs offer an alternative way to use and manage trained models.
However, if you do not plan to use these models through the {infer} APIs or if you want to use non-NLP models, use the <>.

[discrete]
[[post-inference-api-request]]
==== {api-request-title}

`POST /_inference/`

`POST /_inference//`

[discrete]
[[post-inference-api-prereqs]]
==== {api-prereq-title}

* Requires the `monitor_inference` <> (the built-in `inference_admin` and `inference_user` roles grant this privilege).

[discrete]
[[post-inference-api-desc]]
==== {api-description-title}

The perform {infer} API enables you to use {ml} models to perform specific tasks on data that you provide as input.
The API returns a response with the results of the tasks.
The {infer} endpoint you use can perform one specific task, which was defined when the endpoint was created with the <>.

[discrete]
[[post-inference-api-path-params]]
==== {api-path-parms-title}

``::
(Required, string)
The unique identifier of the {infer} endpoint.

``::
(Optional, string)
The type of {infer} task that the model performs.

[discrete]
[[post-inference-api-query-params]]
==== {api-query-parms-title}

`timeout`::
(Optional, timeout)
Controls the amount of time to wait for the inference to complete.
Defaults to 30 seconds.

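
As a rough illustration of how the path and query parameters combine, the following Python sketch assembles a request path. The function name `build_inference_path` and its parameter names are hypothetical and not part of any client library. The segment order (task type first, then endpoint identifier) follows the examples on this page, and the `60s` value assumes the usual {es} time-unit syntax.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_inference_path(
    inference_id: str,
    task_type: Optional[str] = None,
    timeout: Optional[str] = None,
) -> str:
    """Return the request path for a perform-inference call.

    Hypothetical helper: the parameter names stand for the endpoint
    identifier and the optional task type described above.
    """
    if not inference_id:
        raise ValueError("the endpoint identifier is required")

    segments = ["_inference"]
    if task_type:
        # When given, the task type precedes the endpoint identifier.
        segments.append(quote(task_type, safe=""))
    segments.append(quote(inference_id, safe=""))

    path = "/" + "/".join(segments)
    if timeout:
        # Leaving timeout out keeps the documented default of 30 seconds.
        path += "?" + urlencode({"timeout": timeout})
    return path


print(build_inference_path("my-elser-model"))
print(build_inference_path("openai_chat_completions", task_type="completion"))
print(build_inference_path("cohere_rerank", task_type="rerank", timeout="60s"))
```

The helper only builds the path string. Sending the `POST` request and handling authentication are left to whichever HTTP client you use.
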

[discrete]
[[post-inference-api-request-body]]
==== {api-request-body-title}

`input`::
(Required, string or array of strings)
The text on which you want to perform the {infer} task.
`input` can be a single string or an array.
+
--
[NOTE]
====
Inference endpoints for the `completion` task type currently only support a single string as input.
====
--

`query`::
(Required, string)
Only for `rerank` {infer} endpoints.
The search query text.

`task_settings`::
(Optional, object)
Task settings for the individual {infer} request.
These settings are specific to the `` you specified and override the task settings specified when initializing the service.

[discrete]
[[post-inference-api-example]]
==== {api-examples-title}

[discrete]
[[inference-example-completion]]
===== Completion example

The following example performs a completion on the example question.

[source,console]
------------------------------------------------------------
POST _inference/completion/openai_chat_completions
{
  "input": "What is Elastic?"
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "completion": [
    {
      "result": "Elastic is a company that provides a range of software solutions for search, logging, security, and analytics. Their flagship product is Elasticsearch, an open-source, distributed search engine that allows users to search, analyze, and visualize large volumes of data in real-time. Elastic also offers products such as Kibana, a data visualization tool, and Logstash, a log management and pipeline tool, as well as various other tools and solutions for data analysis and management."
    }
  ]
}
------------------------------------------------------------
// NOTCONSOLE

[discrete]
[[inference-example-rerank]]
===== Rerank example

The following example performs reranking on the example input.


[source,console]
------------------------------------------------------------
POST _inference/rerank/cohere_rerank
{
  "input": ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"],
  "query": "star wars main character"
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "rerank": [
    {
      "index": "2",
      "relevance_score": "0.011597361",
      "text": "leia"
    },
    {
      "index": "0",
      "relevance_score": "0.006338922",
      "text": "luke"
    },
    {
      "index": "5",
      "relevance_score": "0.0016166499",
      "text": "star"
    },
    {
      "index": "4",
      "relevance_score": "0.0011695103",
      "text": "r2d2"
    },
    {
      "index": "1",
      "relevance_score": "5.614787E-4",
      "text": "like"
    },
    {
      "index": "6",
      "relevance_score": "3.7850367E-4",
      "text": "wars"
    },
    {
      "index": "3",
      "relevance_score": "1.2508839E-5",
      "text": "chewy"
    }
  ]
}
------------------------------------------------------------

[discrete]
[[inference-example-sparse]]
===== Sparse embedding example

The following example performs sparse embedding on the example sentence.

[source,console]
------------------------------------------------------------
POST _inference/sparse_embedding/my-elser-model
{
  "input": "The sky above the port was the color of television tuned to a dead channel."
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "sparse_embedding": [
    {
      "port": 2.1259406,
      "sky": 1.7073475,
      "color": 1.6922266,
      "dead": 1.6247464,
      "television": 1.3525393,
      "above": 1.2425821,
      "tuned": 1.1440028,
      "colors": 1.1218185,
      "tv": 1.0111054,
      "ports": 1.0067928,
      "poem": 1.0042328,
      "channel": 0.99471164,
      "tune": 0.96235967,
      "scene": 0.9020516,
      (...)
    },
    (...)
  ]
}
------------------------------------------------------------
// NOTCONSOLE

[discrete]
[[inference-example-text-embedding]]
===== Text embedding example

The following example performs text embedding on the example sentence using the Cohere integration.

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-cohere-endpoint
{
  "input": "The sky above the port was the color of television tuned to a dead channel.",
  "task_settings": {
    "input_type": "ingest"
  }
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "text_embedding": [
    {
      "embedding": [
        0.018569946,
        -0.036895752,
        0.01486969,
        -0.0045204163,
        -0.04385376,
        0.0075950623,
        0.04260254,
        -0.004005432,
        0.007865906,
        0.030792236,
        -0.050476074,
        0.011795044,
        -0.011642456,
        -0.010070801,
        (...)
      ]
    },
    (...)
  ]
}
------------------------------------------------------------
// NOTCONSOLE
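
The rerank response shown earlier returns `index` and `relevance_score` as strings, some of them in scientific notation. The following Python sketch shows one way a client might normalize those entries before using them. The function name `parse_rerank` is hypothetical, and the sample is a shortened copy of the rerank response on this page rather than live output.

```python
from typing import Any, Dict, List, Tuple


def parse_rerank(response: Dict[str, Any]) -> List[Tuple[int, float, str]]:
    """Return (index, score, text) tuples, highest score first.

    Hypothetical helper: it assumes the response shape shown in the
    rerank example, where index and relevance_score may be strings.
    """
    results = []
    for item in response["rerank"]:
        # int() and float() accept both string and numeric values,
        # including scientific notation such as "5.614787E-4".
        index = int(item["index"])
        score = float(item["relevance_score"])
        results.append((index, score, item["text"]))
    # Sort explicitly rather than relying on the order in the response.
    results.sort(key=lambda entry: entry[1], reverse=True)
    return results


# Shortened copy of the rerank response shown in the example above.
sample = {
    "rerank": [
        {"index": "2", "relevance_score": "0.011597361", "text": "leia"},
        {"index": "0", "relevance_score": "0.006338922", "text": "luke"},
        {"index": "1", "relevance_score": "5.614787E-4", "text": "like"},
    ]
}

for index, score, text in parse_rerank(sample):
    print(index, score, text)
```

The `index` value refers to the position of the text in the original `input` array, so the parsed tuples can be mapped back to the documents that were sent in the request.
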