[[infer-service-cohere]] === Cohere {infer} integration .New API reference [sidebar] -- For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs]. -- Creates an {infer} endpoint to perform an {infer} task with the `cohere` service. [discrete] [[infer-service-cohere-api-request]] ==== {api-request-title} `PUT /_inference//` [discrete] [[infer-service-cohere-api-path-params]] ==== {api-path-parms-title} ``:: (Required, string) include::inference-shared.asciidoc[tag=inference-id] ``:: (Required, string) include::inference-shared.asciidoc[tag=task-type] + -- Available task types: * `completion`, * `rerank`, * `text_embedding`. -- [discrete] [[infer-service-cohere-api-request-body]] ==== {api-request-body-title} `chunking_settings`:: (Optional, object) include::inference-shared.asciidoc[tag=chunking-settings] `max_chunk_size`::: (Optional, integer) include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] `overlap`::: (Optional, integer) include::inference-shared.asciidoc[tag=chunking-settings-overlap] `sentence_overlap`::: (Optional, integer) include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] `strategy`::: (Optional, string) include::inference-shared.asciidoc[tag=chunking-settings-strategy] `service`:: (Required, string) The type of service supported for the specified task type. In this case, `cohere`. `service_settings`:: (Required, object) include::inference-shared.asciidoc[tag=service-settings] + -- These settings are specific to the `cohere` service. -- `api_key`::: (Required, string) A valid API key of your Cohere account. You can find your Cohere API keys or you can create a new one https://dashboard.cohere.com/api-keys[on the API keys settings page]. + -- include::inference-shared.asciidoc[tag=api-key-admonition] -- `rate_limit`::: (Optional, object) By default, the `cohere` service sets the number of requests allowed per minute to `10000`. This value is the same for all task types. This helps to minimize the number of rate limit errors returned from Cohere. To modify this, set the `requests_per_minute` setting of this object in your service settings: + -- include::inference-shared.asciidoc[tag=request-per-minute-example] More information about Cohere's rate limits can be found in https://docs.cohere.com/docs/going-live#production-key-specifications[Cohere's production key docs]. -- + .`service_settings` for the `completion` task type [%collapsible%closed] ===== `model_id`:: (Optional, string) The name of the model to use for the {infer} task. To review the available `completion` models, refer to the https://docs.cohere.com/docs/models#command[Cohere docs]. ===== + .`service_settings` for the `rerank` task type [%collapsible%closed] ===== `model_id`:: (Optional, string) The name of the model to use for the {infer} task. To review the available `rerank` models, refer to the https://docs.cohere.com/reference/rerank-1[Cohere docs]. ===== + .`service_settings` for the `text_embedding` task type [%collapsible%closed] ===== `embedding_type`::: (Optional, string) Specifies the types of embeddings you want to get back. Defaults to `float`. Valid values are: * `byte`: use it for signed int8 embeddings (this is a synonym of `int8`). * `float`: use it for the default float embeddings. * `int8`: use it for signed int8 embeddings. `model_id`::: (Optional, string) The name of the model to use for the {infer} task. To review the available `text_embedding` models, refer to the https://docs.cohere.com/reference/embed[Cohere docs]. The default value for `text_embedding` is `embed-english-v2.0`. `similarity`::: (Optional, string) Similarity measure. One of `cosine`, `dot_product`, `l2_norm`. Defaults based on the `embedding_type` (`float` -> `dot_product`, `int8/byte` -> `cosine`). ===== `task_settings`:: (Optional, object) include::inference-shared.asciidoc[tag=task-settings] + .`task_settings` for the `rerank` task type [%collapsible%closed] ===== `return_documents`:: (Optional, boolean) Specify whether to return doc text within the results. `top_n`:: (Optional, integer) The number of most relevant documents to return, defaults to the number of the documents. If this {infer} endpoint is used in a `text_similarity_reranker` retriever query and `top_n` is set, it must be greater than or equal to `rank_window_size` in the query. ===== + .`task_settings` for the `text_embedding` task type [%collapsible%closed] ===== `input_type`::: (Optional, string) Specifies the type of input passed to the model. Valid values are: * `classification`: use it for embeddings passed through a text classifier. * `clusterning`: use it for the embeddings run through a clustering algorithm. * `ingest`: use it for storing document embeddings in a vector database. * `search`: use it for storing embeddings of search queries run against a vector database to find relevant documents. + IMPORTANT: The `input_type` field is required when using embedding models `v3` and higher. `truncate`::: (Optional, string) Specifies how the API handles inputs longer than the maximum token length. Defaults to `END`. Valid values are: * `NONE`: when the input exceeds the maximum input token length an error is returned. * `START`: when the input exceeds the maximum input token length the start of the input is discarded. * `END`: when the input exceeds the maximum input token length the end of the input is discarded. ===== [discrete] [[inference-example-cohere]] ==== Cohere service examples The following example shows how to create an {infer} endpoint called `cohere-embeddings` to perform a `text_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/text_embedding/cohere-embeddings { "service": "cohere", "service_settings": { "api_key": "", "model_id": "embed-english-light-v3.0", "embedding_type": "byte" } } ------------------------------------------------------------ // TEST[skip:TBD] The following example shows how to create an {infer} endpoint called `cohere-rerank` to perform a `rerank` task type. [source,console] ------------------------------------------------------------ PUT _inference/rerank/cohere-rerank { "service": "cohere", "service_settings": { "api_key": "", "model_id": "rerank-english-v3.0" }, "task_settings": { "top_n": 10, "return_documents": true } } ------------------------------------------------------------ // TEST[skip:TBD] For more examples, also review the https://docs.cohere.com/docs/elasticsearch-and-cohere#rerank-search-results-with-cohere-and-elasticsearch[Cohere documentation].