mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-30 10:23:41 -04:00
Zero-shot classification allows text to be classified without a model trained on a fixed collection of target labels. This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. For example, "Throughout all of history, mankind has shown itself resourceful, yet astoundingly short-sighted" could be paired with the entailment clauses ["This example is history", "This example is sociology"...].

This training set, combined with the attention and semantic knowledge in modern NLP models (BERT, BART, etc.), affords a powerful tool for ad-hoc text classification.

See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works.

The zero-shot classification task is configured as follows:

```js
{
  // <snip> model configuration </snip>
  "inference_config" : {
    "zero_shot_classification": {
      "classification_labels": ["entailment", "neutral", "contradiction"], // <1>
      "labels": ["sad", "glad", "mad", "rad"], // <2>
      "multi_label": false, // <3>
      "hypothesis_template": "This example is {}.", // <4>
      "tokenization": { /*<snip> tokenization configuration </snip>*/}
    }
  }
}
```

* <1> All zero-shot models return three particular labels when classifying the target sequence. "entailment" is the positive case, "neutral" is the case where the sequence is neither positive nor negative, and "contradiction" is the negative case.
* <2> Optional. The default zero-shot labels to attempt to classify.
* <3> When returning the probabilities, whether the results should assume there is only one true label or multiple true labels.
* <4> The hypothesis template used when tokenizing the labels. When combined with `sad`, the sequence looks like `This example is sad.`

For inference in a pipeline, one may provide label updates:

```js
{
  //<snip> pipeline definition </snip>
  "processors": [
    //<snip> other processors </snip>
    {
      "inference": {
        // <snip> general configuration </snip>
        "inference_config": {
          "zero_shot_classification": {
            "labels": ["humanities", "science", "mathematics", "technology"], // <1>
            "multi_label": true // <2>
          }
        }
      }
    }
    //<snip> other processors </snip>
  ]
}
```

* <1> The `labels` we care about. These replace the default labels, if any exist.
* <2> Whether the results should allow multiple true labels.

Similarly, one may provide label changes against the `_infer` endpoint:

```js
{
  "docs":[{ "text_field": "This is a very happy person"}],
  "inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}}
}
```
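To make the `hypothesis_template` and `multi_label` settings more concrete, here is a minimal sketch of how labels might be expanded into hypotheses and how per-label logits might be turned into probabilities. The helper names (`buildHypotheses`, `softmax`, `scoreLabels`) and the logit values are hypothetical, and the scoring rule is an assumption borrowed from common zero-shot pipelines; the text above does not specify how the probabilities are actually computed.

```js
// Hypothetical helpers for illustration only; not an Elasticsearch API.

// Substitute each label into the "{}" placeholder of the hypothesis template.
function buildHypotheses(template, labels) {
  return labels.map((label) => template.replace("{}", () => label));
}

// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// ASSUMPTION: `logits` holds one { entailment, contradiction } pair per label,
// and scoring follows the convention common to zero-shot pipelines.
function scoreLabels(logits, multiLabel) {
  if (multiLabel) {
    // Each label is judged on its own: entailment vs. contradiction.
    return logits.map((l) => softmax([l.entailment, l.contradiction])[0]);
  }
  // Only one true label: entailment logits compete across all labels.
  return softmax(logits.map((l) => l.entailment));
}

const labels = ["sad", "glad", "mad", "rad"];
const hypotheses = buildHypotheses("This example is {}.", labels);

// Made-up logits, one pair per label, purely to exercise the functions.
const exampleLogits = [
  { entailment: -1.0, contradiction: 2.0 },
  { entailment: 3.0, contradiction: -2.0 },
  { entailment: -0.5, contradiction: 1.5 },
  { entailment: 0.0, contradiction: 0.0 },
];

const singleLabelScores = scoreLabels(exampleLogits, false);
const multiLabelScores = scoreLabels(exampleLogits, true);
console.log(hypotheses);
```

Under this assumed scheme, the scores with `multi_label: false` sum to 1 across all labels, while with `multi_label: true` each label receives an independent probability, so several labels can score highly at once.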