elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-30 10:23:41 -04:00

Benjamin Trent 408489310c [ML] add zero_shot_classification task for BERT nlp models (#77799)

Zero-shot classification allows for text classification tasks without a pre-trained collection of target labels.

This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. For example, "Throughout all of history, mankind has shown itself resourceful, yet astoundingly short-sighted" could be paired with the entailment clauses: ["This example is history", "This example is sociology"...].

This training set, combined with the attention and semantic knowledge in modern-day NLP models (BERT, BART, etc.), affords a powerful tool for ad-hoc text classification.

See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works.

The zero-shot classification task is configured as follows:

```js
{
  // <snip> model configuration </snip>
  "inference_config" : {
    "zero_shot_classification": {
      "classification_labels": ["entailment", "neutral", "contradiction"], // <1>
      "labels": ["sad", "glad", "mad", "rad"], // <2>
      "multi_label": false, // <3>
      "hypothesis_template": "This example is {}.", // <4>
      "tokenization": { /* <snip> tokenization configuration </snip> */ }
    }
  }
}
```
* <1> For all zero_shot models, three specific labels are returned when classifying the target sequence. "entailment" is the positive case, "neutral" is the case where the sequence is neither positive nor negative, and "contradiction" is the negative case.
* <2> An optional parameter giving the default zero_shot labels to attempt to classify.
* <3> When returning the probabilities, whether the results should assume there is only one true label or multiple true labels.
* <4> The hypothesis template used when tokenizing the labels. When combined with `sad`, the sequence looks like `This example is sad.`

For inference in a pipeline, one may provide label updates:

```js
{
  // <snip> pipeline definition </snip>
  "processors": [
    // <snip> other processors </snip>
    {
      "inference": {
        // <snip> general configuration </snip>
        "inference_config": {
          "zero_shot_classification": {
            "labels": ["humanities", "science", "mathematics", "technology"], // <1>
            "multi_label": true // <2>
          }
        }
      }
    }
    // <snip> other processors </snip>
  ]
}
```
* <1> The `labels` we care about. These replace the default ones if they exist.
* <2> Whether the results should allow multiple true labels.

Similarly, one may provide label changes against the `_infer` endpoint:

```js
{
  "docs": [{ "text_field": "This is a very happy person" }],
  "inference_config": {
    "zero_shot_classification": {
      "labels": ["glad", "sad", "bad", "rad"],
      "multi_label": false
    }
  }
}
```

2021-09-28 09:38:23 -04:00
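The commit message above describes how `hypothesis_template` is combined with each label and how `multi_label` changes the way probabilities are reported. The following is a minimal sketch of that idea, not the Elasticsearch implementation. The helper names (`buildHypothesis`, `softmax`, `scoreLabels`) and the logit values are hypothetical, and the two normalization schemes are an assumption based on common zero-shot practice: a softmax across labels for the single-label case, and a per-label entailment-versus-contradiction softmax for the multi-label case.

```js
// Hypothetical helpers for illustration only; not taken from the Elasticsearch source.

// Fill the "{}" placeholder in the hypothesis template with one label.
// A function replacement avoids special "$" patterns in the label text.
function buildHypothesis(template, label) {
  return template.replace("{}", () => label);
}

// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// scores: one { entailment, neutral, contradiction } logit triple per label.
// Assumption: this mirrors common zero-shot practice, not a confirmed
// description of how Elasticsearch computes its results.
function scoreLabels(scores, multiLabel) {
  if (multiLabel) {
    // Each label is judged on its own: entailment vs. contradiction.
    return scores.map((s) => softmax([s.entailment, s.contradiction])[0]);
  }
  // Exactly one true label is assumed: normalize entailment across labels.
  return softmax(scores.map((s) => s.entailment));
}

const labels = ["sad", "glad", "mad", "rad"];
const hypotheses = labels.map((l) => buildHypothesis("This example is {}.", l));

// Made-up model outputs, one triple per hypothesis, purely for illustration.
const exampleLogits = [
  { entailment: -1.2, neutral: 0.1, contradiction: 2.0 },
  { entailment: 3.1, neutral: 0.2, contradiction: -2.5 },
  { entailment: -0.8, neutral: 0.3, contradiction: 1.1 },
  { entailment: 1.4, neutral: 0.5, contradiction: -0.6 },
];

console.log(hypotheses);
console.log("single label:", scoreLabels(exampleLogits, false));
console.log("multi label:", scoreLabels(exampleLogits, true));
```

In the single-label case the returned values sum to 1 across all labels. In the multi-label case each value lies between 0 and 1 independently, so several labels can score highly at once.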
aggregations	Document missing_order param for composite aggregations (#77839)	2021-09-27 09:57:45 +02:00
analysis	change a typo in first letter of a user query (#76394) (#76450)	2021-08-12 14:28:51 -04:00
autoscaling	Autoscale frozen tier into existence (#73435)	2021-06-22 13:21:04 +02:00
cat	[DOCS] Update anchor and add redirect for aliases (#77349)	2021-09-07 09:42:42 -04:00
ccr	Add docs for pre-release version compatibility (#78317)	2021-09-27 16:56:35 +01:00
cluster	Adding priority list and executing description to the pending tasks doc (#74456) (#78259)	2021-09-23 11:17:18 -04:00
commands	[DOCS] Re-add docs for multiple data paths (MDP) (#78342)	2021-09-28 09:20:45 -04:00
data-management	[DOCS] How to migrate to node roles from node attrs. Closes #65855 (#71160)	2021-04-27 14:39:54 -07:00
data-streams	[DOCS] Remove 'step' from headings (#76753)	2021-08-20 08:52:04 -04:00
docs	[DOCS] Change `_routing` to `routing` in mget API docs (#76214) (#76304)	2021-08-10 13:08:50 -04:00
eql	[DOCS] Update remote cluster docs (#77043)	2021-09-22 16:02:33 -04:00
features/apis	Make feature reset API response more informative (#71240)	2021-04-27 13:47:10 -04:00
fleet	[DOCS] Relocate tip for Fleet APIs	2021-07-26 18:14:05 -04:00
graph	[DOCS] Fix API titles (#67475)	2021-01-13 15:15:37 -05:00
high-availability	[DOCS] Convert 'Restore a snapshot' to tutorial (#76929)	2021-09-20 13:17:24 -04:00
how-to	[DOCS] Use dedicated hosts for ES (#77582)	2021-09-21 17:50:21 -04:00
ilm	Add docs on searchable snaps costs (#77607)	2021-09-15 03:27:52 -04:00
images	[DOCS] Updated ILM policies table screenshot (#77240)	2021-09-10 10:17:53 +02:00
index-modules	[DOCS] Fix typo (#75635) (#75705)	2021-07-26 18:05:22 -04:00
indices	[DOCS] Update remote cluster docs (#77043)	2021-09-22 16:02:33 -04:00
ingest	Update JSON parser and snippets (#77983)	2021-09-20 11:08:26 +01:00
licensing	[DOCS] Note get license API can return a `404` (#73951)	2021-06-10 09:19:20 -04:00
mapping	[DOCS] Document `time_series_metric` mapping parameter (#78013)	2021-09-23 08:54:19 -04:00
migration	[DOCS] Re-add docs for multiple data paths (MDP) (#78342)	2021-09-28 09:20:45 -04:00
ml	[ML] add zero_shot_classification task for BERT nlp models (#77799)	2021-09-28 09:38:23 -04:00
modules	Add docs for pre-release version compatibility (#78317)	2021-09-27 16:56:35 +01:00
monitoring	update monitoring cluster node name (#74500)	2021-06-28 09:30:55 -04:00
query-dsl	[DOCS] Clarify geoshape orientation docs (#75888)	2021-09-08 11:10:03 -04:00
release-notes	8.0.0-alpha2 release notes (#77821)	2021-09-15 19:40:46 -05:00
repositories-metering-api	[DOCS] Fix name of `cluster_version` parameter (#69615)	2021-03-01 08:54:47 -05:00
rest-api	Update JSON parser and snippets (#77983)	2021-09-20 11:08:26 +01:00
rollup	Fix privileges for GetRollupIndexCapabilities API (#75614)	2021-07-29 11:57:42 +10:00
scripting	[DOCS] Add ES security principles (#76850)	2021-08-31 12:37:22 -04:00
search	Add cross cluster search test for mvt end point (#78054)	2021-09-23 07:59:44 +02:00
searchable-snapshots	Add docs on searchable snaps costs (#77607)	2021-09-15 03:27:52 -04:00
settings	Remove HTTPS check for API Keys & Service Accounts (#76801)	2021-09-22 07:32:03 +10:00
setup	[DOCS] Re-add docs for multiple data paths (MDP) (#78342)	2021-09-28 09:20:45 -04:00
shutdown/apis	Node shutdown API docs (#74505)	2021-08-19 12:42:27 -04:00
slm	[DOCS] Reuse snapshot config in put SLM policy API docs (#76712)	2021-08-20 08:29:16 -04:00
snapshot-restore	Add docs for pre-release version compatibility (#78317)	2021-09-27 16:56:35 +01:00
sql	SQL: Fix disjunctions (and `IN`) with multiple date math expressions (#76424)	2021-08-31 17:30:49 +02:00
tab-widgets	[DOCS] Re-add docs for multiple data paths (MDP) (#78342)	2021-09-28 09:20:45 -04:00
text-structure/apis	[ML] [DOCS] update find-structure reference docs (#67586)	2021-01-15 12:19:38 -05:00
transform	[DOCS] Fix formatting (#77567)	2021-09-10 09:33:32 -07:00
upgrade	[DOCS] Fix upgrade version logic for `-alpha` and `-beta` releases (#76727)	2021-08-20 08:29:35 -04:00
vectors	Add access to dense_vector values (#71313)	2021-04-19 08:02:05 -04:00
aggregations.asciidoc	Convert bucket aggs docs to runtime fields (#71202)	2021-04-02 12:12:06 -04:00
alias.asciidoc	[DOCS] Fix default for `is_write_index` (#77006) (#77362)	2021-09-07 11:34:53 -04:00
analysis.asciidoc	[DOCS] Fix Lucene's stop words links (#70405)	2021-03-16 17:06:12 -04:00
api-conventions.asciidoc	[DOCS] Adds information about version compatibility headers (#77096)	2021-09-03 14:33:23 -07:00
cat.asciidoc	[DOCS] Remove unneeded escapes	2021-04-26 12:14:45 -04:00
cluster.asciidoc	[DOCS] Reword node roles docs (#69301)	2021-02-23 11:32:46 -05:00
data-management.asciidoc	[DOCS] Move Kibana index mgmt docs to ES (#64380)	2020-10-30 09:14:52 -04:00
data-rollup-transform.asciidoc	[DOCS] Remove ifdefs for rollup refactor	2021-08-05 09:08:04 -04:00
datatiers.asciidoc	[DOCS] Note required node roles and data tiers (#74566)	2021-07-07 09:57:32 -04:00
dependencies-versions.asciidoc	[DOCS] Added appendix to show dependencies (#67962)	2021-01-26 16:16:05 -08:00
docs.asciidoc	[DOCS] Update single index APIs reference (#73103)	2021-05-14 11:53:34 -04:00
getting-started.asciidoc	[DOCS] Remove 'step' from headings (#76753)	2021-08-20 08:52:04 -04:00
gs-index.asciidoc	[DOCS] Adding index file for GS "mini book".	2017-07-18 13:44:08 -07:00
high-availability.asciidoc	[DOCS] Add docs for designing resilient clusters (#47233)	2020-06-05 11:48:44 -04:00
how-to.asciidoc	[DOCS] Add 'Fix common cluster issues' docs (#72097)	2021-04-28 08:28:51 -04:00
index-extra-title-page.html	[DOCS] Add `index-extra-title-page.html` for direct HTML migration (#50189)	2019-12-13 12:44:12 -05:00
index-modules.asciidoc	[DOCS] Document `time_series_metric` mapping parameter (#78013)	2021-09-23 08:54:19 -04:00
index.asciidoc	[DOCS] Move ES glossary to Stack docs (#74579)	2021-06-24 19:04:31 -04:00
index.x.asciidoc	[DOCS] Removes redundant index.asciidoc files (#30707)	2018-05-18 11:05:40 -07:00
indices.asciidoc	[DOCS] Fix broken doc url values in JSON API spec (#75385)	2021-07-15 13:46:00 -04:00
ingest.asciidoc	[DOCS] Document regex circuit breaker (#76048)	2021-08-04 16:37:29 -04:00
intro.asciidoc	[DOCS] Update ES intro for stretched clusters (#77651)	2021-09-13 16:50:08 -04:00
links.asciidoc	[DOCS] Rename ES Reference to ES Guide (#71198)	2021-04-01 15:38:41 -04:00
mapping.asciidoc	Minor revision missed in merge. (#67282)	2021-01-11 13:50:06 -05:00
query-dsl.asciidoc	[DOCS] Remove deprecated `geo_shape` parameters (#74519)	2021-06-29 08:52:05 -04:00
redirects.asciidoc	[DOCS] Update remote cluster docs (#77043)	2021-09-22 16:02:33 -04:00
release-notes.asciidoc	[DOCS] Add placeholder for 8.0.0-alpha2 release notes (#77782)	2021-09-15 10:13:13 -04:00
scripting.asciidoc	[DOCS] Move common scripting use cases up a level (#73445)	2021-05-27 07:38:55 -04:00
search.asciidoc	[DOCS] Document `_mvt` API (#75384)	2021-08-05 15:04:07 -04:00
setup.asciidoc	[DOCS] Edit dedicated hosts section heading	2021-09-21 17:53:07 -04:00
upgrade.asciidoc	Add docs for pre-release version compatibility (#78317)	2021-09-27 16:56:35 +01:00