// tag::cohere[]

[source,console]
--------------------------------------------------
PUT cohere-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1024, <3>
        "element_type": "byte"
      },
      "content": { <4>
        "type": "text" <5>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://docs.cohere.com/reference/embed[Cohere documentation] of the model you
use.
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<5> The field type, which is `text` in this example.
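
Optionally, you can verify that the index was created with the intended mapping by retrieving it; this quick check works the same way for any of the indices created on this page:

[source,console]
--------------------------------------------------
GET cohere-embeddings/_mapping
--------------------------------------------------
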
// end::cohere[]

// tag::elser[]

[source,console]
--------------------------------------------------
PUT elser-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "sparse_vector" <2>
      },
      "content": { <3>
        "type": "text" <4>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `sparse_vector` field for ELSER; see
the illustrative document after these notes.
<3> The name of the field from which to create the sparse vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<4> The field type, which is `text` in this example.
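
For illustration, a `sparse_vector` field stores token-weight pairs such as the following. The tokens and weights in this sketch are made up; in the tutorial the field is populated by the {infer} pipeline rather than indexed by hand:

[source,console]
--------------------------------------------------
POST elser-embeddings/_doc
{
  "content": "What is the capital of France?",
  "content_embedding": {
    "paris": 1.23,
    "france": 1.08,
    "capital": 0.87
  }
}
--------------------------------------------------
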
// end::elser[]

// tag::hugging-face[]

[source,console]
--------------------------------------------------
PUT hugging-face-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 768, <3>
        "element_type": "float"
      },
      "content": { <4>
        "type": "text" <5>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://huggingface.co/sentence-transformers/all-mpnet-base-v2[HuggingFace model documentation].
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<5> The field type, which is `text` in this example.
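
Since `float` is the default `element_type` for `dense_vector` fields, that line can also be omitted; the following sketch (with a hypothetical index name) creates an equivalent mapping:

[source,console]
--------------------------------------------------
PUT hugging-face-embeddings-minimal
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "dims": 768
      },
      "content": {
        "type": "text"
      }
    }
  }
}
--------------------------------------------------
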
// end::hugging-face[]

// tag::openai[]

[source,console]
--------------------------------------------------
PUT openai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1536, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://platform.openai.com/docs/guides/embeddings/embedding-models[OpenAI documentation]
of the model you use.
<4> The faster `dot_product` function can be used to calculate similarity
because OpenAI embeddings are normalized to unit length. You can check the
https://platform.openai.com/docs/guides/embeddings/which-distance-function-should-i-use[OpenAI docs]
about which similarity function to use.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
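
Once the {infer} pipeline has populated `content_embedding`, the index supports semantic search. The following sketch previews that later step; it assumes an {infer} endpoint with the ID `openai_embeddings`, created earlier in the tutorial:

[source,console]
--------------------------------------------------
GET openai-embeddings/_search
{
  "knn": {
    "field": "content_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "openai_embeddings",
        "model_text": "Calculate fuel cost"
      }
    },
    "k": 10,
    "num_candidates": 100
  },
  "_source": [
    "content"
  ]
}
--------------------------------------------------
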
// end::openai[]

// tag::azure-openai[]

[source,console]
--------------------------------------------------
PUT azure-openai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1536, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings-models[Azure OpenAI documentation]
of the model you use.
<4> For Azure OpenAI embeddings, the `dot_product` function should be used to
calculate similarity because Azure OpenAI embeddings are normalized to unit length.
See the
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/understand-embeddings[Azure OpenAI embeddings]
documentation for more information on the model specifications.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
// end::azure-openai[]

// tag::azure-ai-studio[]

[source,console]
--------------------------------------------------
PUT azure-ai-studio-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1536, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. This value can be found on the model
card in your Azure AI Studio deployment.
<4> For Azure AI Studio embeddings, the `dot_product` function should be used to
calculate similarity.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
// end::azure-ai-studio[]

// tag::google-vertex-ai[]

[source,console]
--------------------------------------------------
PUT google-vertex-ai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 768, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. This value can be found in the https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api[Google Vertex AI model reference].
The {infer} API attempts to calculate the output dimensions automatically if `dims` is not specified; see also the sketch after these notes.
<4> For Google Vertex AI embeddings, the `dot_product` function should be used to calculate similarity.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
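
A `dense_vector` mapping can also omit `dims` entirely, in which case Elasticsearch sets the dimension count from the first vector that is indexed. A minimal sketch of that variant, with a hypothetical index name:

[source,console]
--------------------------------------------------
PUT google-vertex-ai-embeddings-auto
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "similarity": "dot_product"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
--------------------------------------------------
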
// end::google-vertex-ai[]

// tag::mistral[]

[source,console]
--------------------------------------------------
PUT mistral-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1024, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. This value can be found in the https://docs.mistral.ai/getting-started/models/[Mistral model reference].
<4> For Mistral embeddings, the `dot_product` function should be used to
calculate similarity.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
// end::mistral[]

// tag::amazon-bedrock[]

[source,console]
--------------------------------------------------
PUT amazon-bedrock-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1024, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. This value may differ depending on the underlying model used.
See the https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html[Amazon Titan model] or the https://docs.cohere.com/reference/embed[Cohere Embeddings model] documentation.
<4> For Amazon Bedrock embeddings, use the `dot_product` function to calculate
similarity for Amazon Titan models, or `cosine` for Cohere models; see the
variant mapping after these notes.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
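
For example, if the underlying Amazon Bedrock endpoint serves a Cohere model, the mapping would use `cosine` similarity instead; a sketch with a hypothetical index name:

[source,console]
--------------------------------------------------
PUT amazon-bedrock-cohere-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "element_type": "float",
        "similarity": "cosine"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
--------------------------------------------------
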
// end::amazon-bedrock[]

// tag::alibabacloud-ai-search[]

[source,console]
--------------------------------------------------
PUT alibabacloud-ai-search-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1024, <3>
        "element_type": "float"
      },
      "content": { <4>
        "type": "text" <5>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. This value may differ depending on the underlying model used.
See the https://help.aliyun.com/zh/open-search/search-platform/developer-reference/text-embedding-api-details[AlibabaCloud AI Search embedding model] documentation.
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<5> The field type, which is `text` in this example.
// end::alibabacloud-ai-search[]