elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-29 09:54:06 -04:00


Ed Savage fd20027751

[ML] Performance improvements for categorization jobs (#89824)

Categorization of strings that break down into a huge number of tokens can cause the C++ backend process to choke; see elastic/ml-cpp#2403.

This PR adds a limit filter to the default categorization analyzer, which caps the number of tokens passed to the backend at 100.

Unfortunately, this is not a complete fix for all the issues surrounding categorization of large, many-token messages: verification checks on the frontend can also fail when calls to the datafeed _preview API return an excessive amount of data.

2022-09-08 18:41:01 +01:00
..
apis	[ML] Performance improvements for categorization jobs (#89824)	2022-09-08 18:41:01 +01:00
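
The token cap described in the commit message can be illustrated with a minimal sketch. This is not the Elasticsearch analyzer code: the function names `naive_tokenize` and `limit_tokens` are hypothetical, and whitespace splitting is an assumed stand-in for the real categorization tokenizer. Only the cap of 100 comes from the commit message.

```python
from itertools import islice

# Cap stated in the commit message; everything else here is illustrative.
MAX_TOKEN_COUNT = 100


def naive_tokenize(message):
    """Hypothetical stand-in tokenizer: split on whitespace."""
    return message.split()


def limit_tokens(tokens, max_token_count=MAX_TOKEN_COUNT):
    """Keep at most max_token_count tokens and drop the rest."""
    return list(islice(tokens, max_token_count))


# A message that breaks down into far more tokens than the cap allows.
long_message = " ".join(f"tok{i}" for i in range(5000))
capped = limit_tokens(naive_tokenize(long_message))
print(len(capped))
```

However long the input message is, the downstream consumer never sees more than `MAX_TOKEN_COUNT` tokens, which is the effect the commit attributes to the limit filter. As the commit message notes, a cap like this does not shrink the raw message itself, so anything that returns the full text is unaffected.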
