elasticsearch/docs/reference/ml
Ed Savage fd20027751
[ML] Performance improvements for categorization jobs (#89824)
Categorization of strings that break down into a huge number of tokens can cause the C++ backend process to choke; see elastic/ml-cpp#2403.

This PR adds a limit filter to the default categorization analyzer which caps the number of tokens passed to the backend at 100.
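
The effect of such a cap can be sketched in a few lines of Python. This is only an illustration of the idea, not the analyzer change made in this PR: the function name `limit_tokens` and the constant `MAX_TOKENS` are hypothetical, and the only value taken from the commit message is the cap of 100.

```python
from itertools import islice

# Cap stated in the commit message; the constant name is an assumption.
MAX_TOKENS = 100


def limit_tokens(tokens, max_tokens=MAX_TOKENS):
    """Return at most max_tokens tokens from an iterable, dropping the rest.

    Hypothetical helper: it mimics what a token-limiting filter does to a
    token stream before the stream is handed to a downstream consumer.
    """
    return list(islice(tokens, max_tokens))


# A message that tokenizes into 1000 tokens is truncated to the first 100.
long_message_tokens = [f"tok{i}" for i in range(1000)]
capped = limit_tokens(long_message_tokens)
```

Short messages pass through unchanged; only token streams longer than the cap are truncated, so the backend never sees more than 100 tokens per message.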

Unfortunately, this is not a complete fix for all the issues surrounding categorization of large messages with many tokens, because verification checks on the frontend can also fail when calls to the datafeed _preview API return an excessive amount of data.
2022-09-08 18:41:01 +01:00
anomaly-detection [DOCS] Simplifies composite aggregation recommendation (#89878) 2022-09-07 17:54:05 +02:00
common/apis [ML] Performance improvements for categorization jobs (#89824) 2022-09-08 18:41:01 +01:00
df-analytics/apis [ML] Lift limit of max number of classes for classification to 100 (#89755) 2022-09-01 10:47:58 +03:00
images [DOCS] Updates anomaly detection alert rule type screenshot. (#89532) 2022-08-23 15:37:40 +02:00
trained-models/apis [DOCS] Adds text similarity task example to API docs (#89756) 2022-09-01 11:53:26 +02:00
ml-shared.asciidoc [ML] add text_similarity nlp task documentation (#88994) 2022-08-02 12:17:14 -04:00