[role="xpack"]
[[get-ml-info]]
= Get machine learning info API
[subs="attributes"]
++++
<titleabbrev>Get {ml} info</titleabbrev>
++++
Returns defaults and limits used by machine learning.

[[get-ml-info-request]]
== {api-request-title}

`GET _ml/info`

[[get-ml-info-prereqs]]
== {api-prereq-title}

Requires the `monitor_ml` cluster privilege. This privilege is included in the
`machine_learning_user` built-in role.

[[get-ml-info-desc]]
== {api-description-title}

This endpoint is designed to be used by a user interface that needs to fully
understand {ml} configurations where some options are not specified, meaning
that the defaults should be used. This endpoint may be used to find out what
those defaults are. It also provides information about the maximum size of
{ml} jobs that could run in the current cluster configuration.
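
As an illustration of how a user interface might consume this endpoint, the
sketch below overlays an explicitly supplied job configuration on the returned
defaults. The `apply_ml_defaults` helper is hypothetical and not part of any
{es} client; the field names follow the example response:

[source,python]
----
# Hypothetical helper: fill in anomaly detection job options that the user
# left unset, using the "defaults" section of a GET _ml/info response.
def apply_ml_defaults(job_config, ml_info):
    defaults = ml_info["defaults"]["anomaly_detectors"]
    merged = dict(defaults)    # start from the cluster-wide defaults
    merged.update(job_config)  # explicitly set options take precedence
    return merged

# Trimmed-down stand-in for a real GET _ml/info response body.
ml_info = {
    "defaults": {
        "anomaly_detectors": {
            "model_memory_limit": "1gb",
            "categorization_examples_limit": 4,
        }
    }
}
job = {"model_memory_limit": "512mb"}
merged = apply_ml_defaults(job, ml_info)
# merged keeps the explicit "512mb" limit and picks up the default
# categorization_examples_limit of 4.
----
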
[[get-ml-info-example]]
== {api-examples-title}

The endpoint takes no arguments:

[source,console]
--------------------------------------------------
GET _ml/info
--------------------------------------------------
// TEST

This is a possible response:
[source,console-result]
----
{
  "defaults" : {
    "anomaly_detectors" : {
      "categorization_analyzer" : {
        "char_filter" : [
          "first_line_with_letters"
        ],
        "tokenizer" : "ml_standard",
        "filter" : [
          {
            "type" : "stop",
            "stopwords" : [
              "Monday",
              "Tuesday",
              "Wednesday",
              "Thursday",
              "Friday",
              "Saturday",
              "Sunday",
              "Mon",
              "Tue",
              "Wed",
              "Thu",
              "Fri",
              "Sat",
              "Sun",
              "January",
              "February",
              "March",
              "April",
              "May",
              "June",
              "July",
              "August",
              "September",
              "October",
              "November",
              "December",
              "Jan",
              "Feb",
              "Mar",
              "Apr",
              "May",
              "Jun",
              "Jul",
              "Aug",
              "Sep",
              "Oct",
              "Nov",
              "Dec",
              "GMT",
              "UTC"
            ]
          },
          {
            "type": "limit",
            "max_token_count": "100"
          }
        ]
      },
      "model_memory_limit" : "1gb",
      "categorization_examples_limit" : 4,
      "model_snapshot_retention_days" : 10,
      "daily_model_snapshot_retention_after_days" : 1
    },
    "datafeeds" : {
      "scroll_size" : 1000
    }
  },
  "upgrade_mode": false,
  "native_code" : {
    "version": "7.0.0",
    "build_hash": "99a07c016d5a73"
  },
  "limits" : {
    "effective_max_model_memory_limit": "28961mb",
    "total_ml_memory": "86883mb"
  }
}
----
// TESTRESPONSE[s/"upgrade_mode": false/"upgrade_mode": $body.upgrade_mode/]
// TESTRESPONSE[s/"version": "7.0.0",/"version": "$body.native_code.version",/]
// TESTRESPONSE[s/"build_hash": "99a07c016d5a73"/"build_hash": "$body.native_code.build_hash"/]
// TESTRESPONSE[s/"effective_max_model_memory_limit": "28961mb"/"effective_max_model_memory_limit": "$body.limits.effective_max_model_memory_limit"/]
// TESTRESPONSE[s/"total_ml_memory": "86883mb"/"total_ml_memory": "$body.limits.total_ml_memory"/]
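
The memory sizes in the response (for example `model_memory_limit`,
`effective_max_model_memory_limit`, and `total_ml_memory`) are byte-size
strings. As a hypothetical client-side sketch (the `parse_byte_size` helper is
illustrative, not part of any {es} client), such strings can be converted to
byte counts so that a proposed job's memory limit can be compared against the
effective cluster limit:

[source,python]
----
# Illustrative parser for byte-size strings such as "1gb" or "28961mb".
# Assumes binary units (1kb = 1024 bytes), matching Elasticsearch's
# byte-size convention.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_byte_size(value):
    value = value.strip().lower()
    # Try the longest suffixes first so "mb" is not mistaken for "b".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * UNITS[suffix]
    return int(value)  # no unit suffix: value is already in bytes

# Would a 1gb model_memory_limit fit under the effective cluster limit?
fits = parse_byte_size("1gb") <= parse_byte_size("28961mb")
----
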