elasticsearch/docs/reference/ml/anomaly-detection/apis/get-ml-info.asciidoc
David Roberts 0059c59e25
[ML] Make ml_standard tokenizer the default for new categorization jobs (#72805)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, and so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.

If no categorization_analyzer is specified but categorization_filters
are specified, then the categorization filters are converted to char
filters that are applied after first_non_blank_line.

Closes elastic/ml-cpp#1724
2021-06-01 15:11:32 +01:00


[role="xpack"]
[testenv="platinum"]
[[get-ml-info]]
= Get machine learning info API

[subs="attributes"]
++++
<titleabbrev>Get {ml} info</titleabbrev>
++++

Returns defaults and limits used by machine learning.

[[get-ml-info-request]]
== {api-request-title}

`GET _ml/info`

[[get-ml-info-prereqs]]
== {api-prereq-title}

Requires the `monitor_ml` cluster privilege. This privilege is included in the
`machine_learning_user` built-in role.

[[get-ml-info-desc]]
== {api-description-title}

This endpoint is designed for user interfaces that need to fully understand
{ml} configurations in which some options are not specified, so that the
defaults apply. Use this endpoint to find out what those defaults are. It also
provides information about the maximum size of {ml} jobs that could run in the
current cluster configuration.
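
The following Python sketch illustrates how a client might apply the `defaults`
object returned by this API to a job configuration that omits some settings.
The helper `apply_job_defaults` and its `DEFAULT_PATHS` mapping are hypothetical,
written for illustration only; the sketch covers just four of the
`anomaly_detectors` defaults and never overwrites a value the job sets itself.

[source,python]
----
import copy

# Hypothetical mapping from keys under defaults.anomaly_detectors in the
# GET _ml/info response to where each setting lives in a job configuration.
DEFAULT_PATHS = {
    "model_memory_limit": ("analysis_limits", "model_memory_limit"),
    "categorization_examples_limit": ("analysis_limits", "categorization_examples_limit"),
    "model_snapshot_retention_days": ("model_snapshot_retention_days",),
    "daily_model_snapshot_retention_after_days": ("daily_model_snapshot_retention_after_days",),
}


def apply_job_defaults(job_config, ml_info):
    """Return a copy of job_config with omitted settings taken from ml_info.

    Values already present in job_config are never overwritten, and the
    caller's dictionary is left unchanged.
    """
    defaults = ml_info.get("defaults", {}).get("anomaly_detectors", {})
    result = copy.deepcopy(job_config)
    for key, path in DEFAULT_PATHS.items():
        if key not in defaults:
            continue  # this response did not report a default for the key
        # Walk to (creating if needed) the parent object, e.g. analysis_limits.
        parent = result
        for part in path[:-1]:
            parent = parent.setdefault(part, {})
        parent.setdefault(path[-1], defaults[key])
    return result


# Example: the job sets its own memory limit but omits the retention settings.
info = {"defaults": {"anomaly_detectors": {"model_memory_limit": "1gb",
                                           "model_snapshot_retention_days": 10}}}
print(apply_job_defaults({"analysis_limits": {"model_memory_limit": "512mb"}}, info))
# → {'analysis_limits': {'model_memory_limit': '512mb'}, 'model_snapshot_retention_days': 10}
----

Copying the configuration first keeps the user's original input intact, so a
user interface can still distinguish explicit settings from inherited defaults.
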

[[get-ml-info-example]]
== {api-examples-title}

The endpoint takes no arguments:

[source,console]
--------------------------------------------------
GET _ml/info
--------------------------------------------------
// TEST
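
Outside the console, any HTTP client can send the same request. The following
is a minimal sketch using only the Python standard library; the base URL, the
use of HTTP basic authentication, and the helper name `build_ml_info_request`
are assumptions for illustration. It builds the request but does not send it.

[source,python]
----
import base64
import urllib.request


def build_ml_info_request(base_url, username, password):
    """Build, but do not send, a GET _ml/info request with HTTP basic auth.

    base_url, username and password are placeholders for your own cluster;
    the user must have the monitor_ml cluster privilege.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_ml/info",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


request = build_ml_info_request("https://localhost:9200/", "ml_user", "changeme")
print(request.get_method(), request.full_url)
# → GET https://localhost:9200/_ml/info

# Sending it requires a reachable cluster, for example:
#   import json
#   with urllib.request.urlopen(request) as response:
#       ml_info = json.load(response)
----
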

This is a possible response:

[source,console-result]
----
{
  "defaults" : {
    "anomaly_detectors" : {
      "categorization_analyzer" : {
        "char_filter" : [
          "first_non_blank_line"
        ],
        "tokenizer" : "ml_standard",
        "filter" : [
          {
            "type" : "stop",
            "stopwords" : [
              "Monday",
              "Tuesday",
              "Wednesday",
              "Thursday",
              "Friday",
              "Saturday",
              "Sunday",
              "Mon",
              "Tue",
              "Wed",
              "Thu",
              "Fri",
              "Sat",
              "Sun",
              "January",
              "February",
              "March",
              "April",
              "May",
              "June",
              "July",
              "August",
              "September",
              "October",
              "November",
              "December",
              "Jan",
              "Feb",
              "Mar",
              "Apr",
              "May",
              "Jun",
              "Jul",
              "Aug",
              "Sep",
              "Oct",
              "Nov",
              "Dec",
              "GMT",
              "UTC"
            ]
          }
        ]
      },
      "model_memory_limit" : "1gb",
      "categorization_examples_limit" : 4,
      "model_snapshot_retention_days" : 10,
      "daily_model_snapshot_retention_after_days" : 1
    },
    "datafeeds" : {
      "scroll_size" : 1000
    }
  },
  "upgrade_mode": false,
  "native_code" : {
    "version": "7.0.0",
    "build_hash": "99a07c016d5a73"
  },
  "limits" : {
    "effective_max_model_memory_limit": "28961mb",
    "total_ml_memory": "86883mb"
  }
}
----
// TESTRESPONSE[s/"upgrade_mode": false/"upgrade_mode": $body.upgrade_mode/]
// TESTRESPONSE[s/"version": "7.0.0",/"version": "$body.native_code.version",/]
// TESTRESPONSE[s/"build_hash": "99a07c016d5a73"/"build_hash": "$body.native_code.build_hash"/]
// TESTRESPONSE[s/"effective_max_model_memory_limit": "28961mb"/"effective_max_model_memory_limit": "$body.limits.effective_max_model_memory_limit"/]
// TESTRESPONSE[s/"total_ml_memory": "86883mb"/"total_ml_memory": "$body.limits.total_ml_memory"/]
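
A user interface might use the `limits` object to warn when a requested
`model_memory_limit` is too large for the cluster. The Python sketch below is
one rough way to do that under stated assumptions: the helper names are
hypothetical, byte sizes are treated as 1024-based integers with one of the
suffixes `b`, `kb`, `mb`, `gb`, `tb` or `pb` (fractional values are rejected),
and `effective_max_model_memory_limit` is assumed to be possibly absent, in
which case the check returns `None`.

[source,python]
----
import re

# Binary (1024-based) multipliers for the byte-size suffixes handled here.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3,
         "tb": 1024 ** 4, "pb": 1024 ** 5}


def parse_byte_size(value):
    """Convert an integer byte-size string such as '28961mb' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmgtp]?b)\s*", value.lower())
    if match is None:
        raise ValueError(f"unsupported byte size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def fits_in_cluster(requested_limit, ml_info):
    """Compare a requested model_memory_limit with the reported maximum.

    Returns True or False, or None if ml_info has no
    effective_max_model_memory_limit to compare against.
    """
    effective_max = ml_info.get("limits", {}).get("effective_max_model_memory_limit")
    if effective_max is None:
        return None
    return parse_byte_size(requested_limit) <= parse_byte_size(effective_max)


info = {"limits": {"effective_max_model_memory_limit": "28961mb"}}
print(fits_in_cluster("1gb", info))   # → True  (1024mb <= 28961mb)
print(fits_in_cluster("30gb", info))  # → False (30720mb > 28961mb)
----

Returning `None` rather than `True` when no maximum is reported lets the caller
decide whether to skip the warning instead of silently treating any size as safe.
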