////
[source,console]
----
DELETE _ingest/pipeline/my-text-embeddings-pipeline
----
// TEST
// TEARDOWN
////

// tag::elser[]

Create an ingest pipeline that uses the ELSER model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [ <1>
          {
            "input_field": "my_text_field",
            "output_field": "my_tokens"
          }
        ]
      }
    }
  ]
}
----
<1> Configuration object that defines the `input_field` for the {infer} process
and the `output_field` that will contain the {infer} results.
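
Before reindexing, you can check that the pipeline produces the expected tokens
by running a sample document through the simulate API. This is a minimal
sketch; the sample text is a placeholder and the call assumes ELSER is
deployed:

[source,console]
----
POST _ingest/pipeline/my-text-embeddings-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "my_text_field": "How do avalanches form?"
      }
    }
  ]
}
----
// TEST[skip:requires a deployed ELSER model]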

To ingest data through the pipeline and generate tokens with ELSER, refer to
the <<reindexing-data-elser>> section of the tutorial. After you have
successfully ingested documents by using the pipeline, your index contains the
tokens generated by ELSER. Tokens are learned associations that capture
relevance; they are not synonyms. To learn more about what tokens are, refer to
{ml-docs}/ml-nlp-elser.html#elser-tokens[this page].
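
For example, a minimal reindex request that pushes existing documents through
the pipeline might look like the following sketch. The `my-source-index` and
`my-destination-index` names are placeholders, not part of the tutorial:

[source,console]
----
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-destination-index",
    "pipeline": "my-text-embeddings-pipeline"
  }
}
----
// TEST[skip:requires source and destination indices]
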
// end::elser[]

// tag::dense-vector[]

Create an ingest pipeline that uses a text embedding model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3", <1>
        "target_field": "my_embeddings",
        "field_map": { <2>
          "my_text_field": "text_field"
        }
      }
    }
  ]
}
----
<1> The model ID of the text embedding model you want to use.
<2> The `field_map` object maps the input document field name (`my_text_field`
in this example) to the name of the field that the model expects (always
`text_field`).
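
You can also ingest a single document through the pipeline to see the output it
produces. This is a minimal sketch; the `my-index` name, the document ID, and
the sample text are placeholders, and it assumes the model is deployed and
`my-index` is mapped for the embeddings:

[source,console]
----
PUT my-index/_doc/1?pipeline=my-text-embeddings-pipeline
{
  "my_text_field": "A red sports car drives along the coast."
}
----
// TEST[skip:requires a deployed text embedding model]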

To ingest data through the pipeline and generate text embeddings with your
chosen model, refer to the
{ml-docs}/ml-nlp-text-emb-vector-search-example.html#ex-text-emb-ingest[Add the text embedding model to an inference ingest pipeline]
section. The example shows how to create the pipeline with the inference
processor and reindex your data through the pipeline. After you have
successfully ingested documents by using the pipeline, your index contains the
text embeddings generated by the model.
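
To spot-check the result, you can retrieve an ingested document and inspect the
`my_embeddings` target field. The document ID below refers to the hypothetical
example above:

[source,console]
----
GET my-index/_doc/1
----
// TEST[skip:requires an ingested document]
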
// end::dense-vector[]