elasticsearch/docs/reference/tab-widgets/semantic-search/generate-embeddings.asciidoc
Abdon Pijpelink 40409bf8ca
[DOCS] Semantic search page (#97715)
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2023-07-20 10:45:13 +02:00


////
[source,console]
----
DELETE _ingest/pipeline/my-text-embeddings-pipeline
----
// TEST
// TEARDOWN
////
// tag::elser[]
The following example shows how to create an ingest pipeline that uses the
ELSER model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "my_embeddings",
        "field_map": { <1>
          "my_text_field": "text_field"
        },
        "inference_config": {
          "text_expansion": { <2>
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}
----
<1> The `field_map` object maps the input document field name (which is
`my_text_field` in this example) to the name of the field that the model expects
(which is always `text_field`).
<2> The `text_expansion` inference type must be used in the inference ingest
processor.

To ingest data through the pipeline and generate tokens with ELSER, refer to
the <<reindexing-data-elser>> section of the tutorial. After you have
successfully ingested documents by using the pipeline, your index contains the
tokens generated by ELSER.
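
As a quick check, you can also index a single document through the pipeline
directly. The index name `my-index` and the field value below are placeholders;
given the pipeline definition above, the generated tokens are stored in the
`my_embeddings.tokens` field of the indexed document:

[source,console]
----
POST my-index/_doc?pipeline=my-text-embeddings-pipeline
{
  "my_text_field": "How to avoid muscle soreness after running?"
}
----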
// end::elser[]
// tag::dense-vector[]
The following example shows how to create an ingest pipeline that uses a text
embedding model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3", <1>
        "target_field": "my_embeddings",
        "field_map": { <2>
          "my_text_field": "text_field"
        }
      }
    }
  ]
}
----
<1> The model ID of the text embedding model you want to use.
<2> The `field_map` object maps the input document field name (which is
`my_text_field` in this example) to the name of the field that the model expects
(which is always `text_field`).

To ingest data through the pipeline and generate text embeddings with your
chosen model, refer to the
{ml-docs}/ml-nlp-text-emb-vector-search-example.html#ex-text-emb-ingest[Add the text embedding model to an inference ingest pipeline]
section. The example shows how to create the pipeline with the inference
processor and how to reindex your data through the pipeline. After you have
successfully ingested documents by using the pipeline, your index contains the
text embeddings generated by the model.
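
For example, a reindex request can push existing documents through the pipeline.
The index names `my-source-index` and `my-index` below are placeholders for your
own indices:

[source,console]
----
POST _reindex
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-index",
    "pipeline": "my-text-embeddings-pipeline"
  }
}
----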
// end::dense-vector[]