The `_ml/info` response now includes two extra fields in its
`limits` object:
1. `max_single_ml_node_processors`
2. `total_ml_processors`
These fields are _only_ included if they can be accurately
calculated. If autoscaling is enabled and the ML nodes are
not at their maximum size then these fields _cannot_
currently be accurately calculated. (This could potentially
be improved in the future with additional settings set by
the control plane.)
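
Because the fields may be absent, callers should treat them as optional. A minimal Python sketch of that handling, assuming the `_ml/info` response has already been parsed into a dict. The helper name `ml_processor_limits` and the sample values are hypothetical and only there for illustration:

```python
from typing import Any, Dict, Optional

# Field names taken from the description above.
PROCESSOR_FIELDS = ("max_single_ml_node_processors", "total_ml_processors")


def ml_processor_limits(info: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return the two processor limits, using None for any that was omitted."""
    limits = info.get("limits") or {}
    return {name: limits.get(name) for name in PROCESSOR_FIELDS}


# Hypothetical parsed responses (values are made up).
with_fields = {
    "limits": {"max_single_ml_node_processors": 8, "total_ml_processors": 16}
}
without_fields = {"limits": {}}  # e.g. autoscaling enabled, nodes below max size

print(ml_processor_limits(with_fields))
print(ml_processor_limits(without_fields))
```

A `None` value here means "not reported", not "zero processors", so callers should fall back to whatever they did before these fields existed.
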
Categorization of strings which break down to a huge number of tokens can cause the C++ backend process to choke - see elastic/ml-cpp#2403.
This PR adds a `limit` token filter to the default categorization analyzer, capping the number of tokens passed to the backend at 100.
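
For anyone with a custom `categorization_analyzer`, the same cap can presumably be applied with Elasticsearch's `limit` token filter and its `max_token_count` parameter. A rough Python sketch of appending such a filter to an analyzer definition. The helper `with_token_limit` and the sample analyzer are hypothetical, and the sample is not meant to reproduce the exact default analyzer:

```python
import copy
from typing import Any, Dict


def with_token_limit(analyzer: Dict[str, Any], max_tokens: int = 100) -> Dict[str, Any]:
    """Return a copy of the analyzer definition with a `limit` token filter appended."""
    capped = copy.deepcopy(analyzer)  # leave the caller's definition untouched
    capped.setdefault("filter", []).append(
        {"type": "limit", "max_token_count": max_tokens}
    )
    return capped


# Hypothetical custom analyzer definition (illustrative only).
custom_analyzer = {"tokenizer": "ml_standard", "filter": []}

capped_analyzer = with_token_limit(custom_analyzer)
print(capped_analyzer)
```

The filter is appended last so that it counts only the tokens that survive any earlier filters.
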
Unfortunately, this does not fix every issue with categorizing large messages or messages with many tokens: verification checks on the frontend can still fail because calls to the datafeed `_preview` API return an excessive amount of data.