[[infer-service-hugging-face]] === HuggingFace {infer} integration .New API reference [sidebar] -- For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs]. -- Creates an {infer} endpoint to perform an {infer} task with the `hugging_face` service. [discrete] [[infer-service-hugging-face-api-request]] ==== {api-request-title} `PUT /_inference//` [discrete] [[infer-service-hugging-face-api-path-params]] ==== {api-path-parms-title} ``:: (Required, string) include::inference-shared.asciidoc[tag=inference-id] ``:: (Required, string) include::inference-shared.asciidoc[tag=task-type] + -- Available task types: * `text_embedding`. -- [discrete] [[infer-service-hugging-face-api-request-body]] ==== {api-request-body-title} `chunking_settings`:: (Optional, object) include::inference-shared.asciidoc[tag=chunking-settings] `max_chunk_size`::: (Optional, integer) include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] `overlap`::: (Optional, integer) include::inference-shared.asciidoc[tag=chunking-settings-overlap] `sentence_overlap`::: (Optional, integer) include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] `strategy`::: (Optional, string) include::inference-shared.asciidoc[tag=chunking-settings-strategy] `service`:: (Required, string) The type of service supported for the specified task type. In this case, `hugging_face`. `service_settings`:: (Required, object) include::inference-shared.asciidoc[tag=service-settings] + -- These settings are specific to the `hugging_face` service. -- `api_key`::: (Required, string) A valid access token of your Hugging Face account. You can find your Hugging Face access tokens or you can create a new one https://huggingface.co/settings/tokens[on the settings page]. + -- include::inference-shared.asciidoc[tag=api-key-admonition] -- `url`::: (Required, string) The URL endpoint to use for the requests. `rate_limit`::: (Optional, object) By default, the `huggingface` service sets the number of requests allowed per minute to `3000`. This helps to minimize the number of rate limit errors returned from Hugging Face. To modify this, set the `requests_per_minute` setting of this object in your service settings: + -- include::inference-shared.asciidoc[tag=request-per-minute-example] -- [discrete] [[inference-example-hugging-face]] ==== Hugging Face service example The following example shows how to create an {infer} endpoint called `hugging-face-embeddings` to perform a `text_embedding` task type. [source,console] ------------------------------------------------------------ PUT _inference/text_embedding/hugging-face-embeddings { "service": "hugging_face", "service_settings": { "api_key": "", <1> "url": "" <2> } } ------------------------------------------------------------ // TEST[skip:TBD] <1> A valid Hugging Face access token. You can find on the https://huggingface.co/settings/tokens[settings page of your account]. <2> The {infer} endpoint URL you created on Hugging Face. Create a new {infer} endpoint on https://ui.endpoints.huggingface.co/[the Hugging Face endpoint page] to get an endpoint URL. Select the model you want to use on the new endpoint creation page - for example `intfloat/e5-small-v2` - then select the `Sentence Embeddings` task under the Advanced configuration section. Create the endpoint. Copy the URL after the endpoint initialization has been finished. [discrete] [[inference-example-hugging-face-supported-models]] The list of recommended models for the Hugging Face service: * https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2[all-MiniLM-L6-v2] * https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2[all-MiniLM-L12-v2] * https://huggingface.co/sentence-transformers/all-mpnet-base-v2[all-mpnet-base-v2] * https://huggingface.co/intfloat/e5-base-v2[e5-base-v2] * https://huggingface.co/intfloat/e5-small-v2[e5-small-v2] * https://huggingface.co/intfloat/multilingual-e5-base[multilingual-e5-base] * https://huggingface.co/intfloat/multilingual-e5-small[multilingual-e5-small]