////
[source,console]
----
DELETE _ingest/pipeline/my-text-embeddings-pipeline
----
// TEST
// TEARDOWN
////
// tag::elser[]
The following request creates an ingest pipeline that uses the ELSER model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [ <1>
          {
            "input_field": "my_text_field",
            "output_field": "my_tokens"
          }
        ]
      }
    }
  ]
}
----
<1> Configuration object that defines the `input_field` for the {infer} process
and the `output_field` that will contain the {infer} results.

To ingest data through the pipeline to generate tokens with ELSER, refer to the
<<reindexing-data-elser>> section of the tutorial. After you have successfully
ingested documents by using the pipeline, your index will contain the tokens
generated by ELSER. Tokens are learned associations that capture relevance; they
are not synonyms. To learn more about what tokens are, refer to
{ml-docs}/ml-nlp-elser.html#elser-tokens[this page].
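
Before ingesting data, it can be useful to check what the pipeline produces for
a single document. The following is a minimal sketch that uses the simulate
pipeline API. It assumes that the ELSER model is already downloaded and
deployed, and the sample text is purely illustrative:

[source,console]
----
POST _ingest/pipeline/my-text-embeddings-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "my_text_field": "How is the weather in Jamaica?" <1>
      }
    }
  ]
}
----
// TEST[skip:TBD]
<1> Hypothetical sample value for the `input_field` configured in the pipeline.

If the model is available, the simulated document in the response should
include the `my_tokens` field next to the original `my_text_field`. Nothing is
indexed by this request.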
// end::elser[]
// tag::dense-vector[]
The following request creates an ingest pipeline that uses a text embedding
model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3", <1>
        "target_field": "my_embeddings",
        "field_map": { <2>
          "my_text_field": "text_field"
        }
      }
    }
  ]
}
----
<1> The model ID of the text embedding model you want to use.
<2> The `field_map` object maps the input document field name (which is
`my_text_field` in this example) to the name of the field that the model expects
(which is always `text_field`).

To ingest data through the pipeline to generate text embeddings with your chosen
model, refer to the
{ml-docs}/ml-nlp-text-emb-vector-search-example.html#ex-text-emb-ingest[Add the text embedding model to an inference ingest pipeline]
section. The example shows how to create the pipeline with the inference
processor and reindex your data through the pipeline. After you have
successfully ingested documents by using the pipeline, your index will contain
the text embeddings generated by the model.
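
The reindex step mentioned above could be sketched as follows. This is only an
illustration, not the request from the linked example: `my-source-index` and
`my-index` are hypothetical index names, and the destination index is assumed
to already exist with a mapping that is suitable for the embeddings.

[source,console]
----
POST _reindex
{
  "source": {
    "index": "my-source-index" <1>
  },
  "dest": {
    "index": "my-index", <2>
    "pipeline": "my-text-embeddings-pipeline" <3>
  }
}
----
// TEST[skip:TBD]
<1> Hypothetical index that holds the original documents with `my_text_field`.
<2> Hypothetical destination index that receives the enriched documents.
<3> The ingest pipeline created above. Each document passes through it while
being reindexed.

Setting `dest.pipeline` applies the pipeline only during this reindex
operation, so the source index stays unchanged.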
// end::dense-vector[]