mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
* Adds default inference endpoints information
* Update docs/reference/inference/inference-apis.asciidoc
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
---------
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
(cherry picked from commit b2998378a3)
# Conflicts:
# docs/reference/inference/inference-apis.asciidoc
This commit is contained in:
parent 3680bd902c
commit dbd0e596ee
1 changed file with 25 additions and 12 deletions
@@ -41,21 +41,34 @@ Elastic –, then create an {infer} endpoint by using the <<put-inference-api>>.
Now use <<semantic-search-semantic-text, semantic text>> to perform
<<semantic-search, semantic search>> on your data.

//[discrete]
//[[default-enpoints]]
//=== Default {infer} endpoints
[discrete]
[[adaptive-allocations]]
=== Adaptive allocations

//Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
//The following list contains the default {infer} endpoints listed by `inference_id`:
Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.

//* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
//* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
When adaptive allocations are enabled:

//Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
//The API call will automatically download and deploy the model which might take a couple of minutes.
//Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
//For these models, the minimum number of allocations is `0`.
//If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
* The number of allocations scales up automatically when the load increases.
* Allocations scale down to a minimum of `0` when the load decreases, saving resources.

For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
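
As an illustration (a sketch, not part of this diff), adaptive allocations are configured in the `service_settings` of an endpoint created with the <<put-inference-api>>. The endpoint name and the allocation bounds below are hypothetical:

[source,console]
----
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".elser_model_2"
  }
}
----

With `min_number_of_allocations` set to `0`, the deployment consumes no allocation resources while idle, at the cost of a cold start when traffic resumes.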

[discrete]
[[default-enpoints]]
=== Default {infer} endpoints

Your {es} deployment contains preconfigured {infer} endpoints, which make it easier to use them when defining `semantic_text` fields or {infer} processors.
The following list contains the default {infer} endpoints listed by `inference_id`:

* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)

Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
The API call automatically downloads and deploys the model, which might take a couple of minutes.
Default {infer} endpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
For these models, the minimum number of allocations is `0`.
If there is no {infer} activity that uses the endpoint, the number of allocations scales down to `0` automatically after 15 minutes.
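
For example (a sketch, not part of this diff; the index and field names are hypothetical), a `semantic_text` field can reference a default endpoint by its `inference_id`:

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}
----

Indexing a document into `content` then triggers the first deployment of the model behind the default endpoint.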
[discrete]