mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-30 18:33:26 -04:00
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co> Co-authored-by: David Roberts <dave.roberts@elastic.co>
////

[source,console]
----
DELETE _ingest/pipeline/my-text-embeddings-pipeline
----
// TEST
// TEARDOWN

////

// tag::elser[]

This is how an ingest pipeline that uses the ELSER model is created:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "my_embeddings",
        "field_map": { <1>
          "my_text_field": "text_field"
        },
        "inference_config": {
          "text_expansion": { <2>
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}
----
<1> The `field_map` object maps the input document field name (which is
`my_text_field` in this example) to the name of the field that the model expects
(which is always `text_field`).
<2> The `text_expansion` inference type needs to be used in the inference ingest
processor.

To ingest data through the pipeline to generate tokens with ELSER, refer to the
<<reindexing-data-elser>> section of the tutorial. After you have successfully
ingested documents by using the pipeline, your index contains the tokens
generated by ELSER.

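
Before reindexing, the pipeline can be checked against a sample document. The
following request is a minimal sketch that uses the simulate pipeline API. The
sample text is made up for illustration, and the request assumes that the ELSER
model is already deployed and that the pipeline above has been created.

[source,console]
----
POST _ingest/pipeline/my-text-embeddings-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "my_text_field": "How is the weather in Jamaica?"
      }
    }
  ]
}
----
// TEST[skip:requires a deployed ELSER model]

If the pipeline is configured as shown, the simulated document should contain
the generated tokens under `my_embeddings.tokens`.
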
// end::elser[]

// tag::dense-vector[]

This is how an ingest pipeline that uses a text embedding model is created:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3", <1>
        "target_field": "my_embeddings",
        "field_map": { <2>
          "my_text_field": "text_field"
        }
      }
    }
  ]
}
----
<1> The model ID of the text embedding model you want to use.
<2> The `field_map` object maps the input document field name (which is
`my_text_field` in this example) to the name of the field that the model expects
(which is always `text_field`).

To ingest data through the pipeline to generate text embeddings with your chosen
model, refer to the
{ml-docs}/ml-nlp-text-emb-vector-search-example.html#ex-text-emb-ingest[Add the text embedding model to an inference ingest pipeline]
section. The example shows how to create the pipeline with the inference
processor and reindex your data through the pipeline. After you have
successfully ingested documents by using the pipeline, your index contains the
text embeddings generated by the model.

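
As a rough sketch of the reindexing step, a request along the following lines
sends existing documents through the pipeline. The index names
`my-source-index` and `my-dest-index` are hypothetical placeholders. The sketch
also assumes that the destination index already has a mapping suited to the
embeddings, for example a `dense_vector` field for
`my_embeddings.predicted_value`.

[source,console]
----
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-dest-index",
    "pipeline": "my-text-embeddings-pipeline"
  }
}
----
// TEST[skip:requires a deployed text embedding model]

With `wait_for_completion=false`, the request returns a task ID that can be
used to monitor the progress of the reindex operation.
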
// end::dense-vector[]