elasticsearch/docs/reference/ml/anomaly-detection/apis/get-ml-info.asciidoc
Benjamin Trent 281ec58b8d
[ML] add new default char filter first_line_with_letters for machine learning categorization (#77457)
The char filter replaces the previous default of `first_non_blank_line`.

`first_non_blank_line` worked well for finding the first line that contained any characters at all, but log lines
like the following were handled poorly:
```
--------------------------------------------------------------------------------

Alias 'foo' already exists and this prevents setting up ILM for logs

--------------------------------------------------------------------------------
```
When combined with the `ml_standard` tokenizer, the first line was used:
```
--------------------------------------------------------------------------------
```
This line contains no valid tokens for our standard tokenizer, so `ml_standard` produced no tokens at all.

The new filter, `first_line_with_letters`, returns the first line that contains any letter character (i.e. any character for which `Character#isLetter` returns true).

For the poorly handled log above, combining the new filter with our `ml_standard` tokenizer yields the following, more appropriate tokens:

```
"tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"]
```
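
The selection rule can be illustrated with a small Python sketch. This is not the actual char filter implementation: `str.isalpha` stands in for `Character#isLetter`, and returning an empty string when no line contains a letter is an assumption made only for this example.

```python
def first_line_with_letters(text: str) -> str:
    """Return the first line of `text` that contains at least one letter.

    Illustrative only: str.isalpha approximates Java's Character#isLetter,
    and the empty-string fallback is an assumption, not documented behaviour.
    """
    for line in text.splitlines():
        # A line qualifies as soon as any single character is a letter.
        if any(ch.isalpha() for ch in line):
            return line
    return ""


# The log from the description above: separator lines around one message.
log = "\n".join([
    "-" * 80,
    "",
    "Alias 'foo' already exists and this prevents setting up ILM for logs",
    "",
    "-" * 80,
])

print(first_line_with_letters(log))
```

The separator and blank lines are skipped because they contain no letters, so the tokenizer receives the line with the actual message.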
2021-09-09 10:09:57 -04:00


[role="xpack"]
[testenv="platinum"]
[[get-ml-info]]
= Get machine learning info API

[subs="attributes"]
++++
<titleabbrev>Get {ml} info</titleabbrev>
++++

Returns defaults and limits used by machine learning.

[[get-ml-info-request]]
== {api-request-title}

`GET _ml/info`

[[get-ml-info-prereqs]]
== {api-prereq-title}

Requires the `monitor_ml` cluster privilege. This privilege is included in the
`machine_learning_user` built-in role.

[[get-ml-info-desc]]
== {api-description-title}

This endpoint is designed for user interfaces that need to fully understand
{ml} configurations in which some options are not specified and the defaults
therefore apply. Use it to find out what those defaults are. It also provides
information about the maximum size of {ml} jobs that could run in the current
cluster configuration.

[[get-ml-info-example]]
== {api-examples-title}

The endpoint takes no arguments:

[source,console]
--------------------------------------------------
GET _ml/info
--------------------------------------------------
// TEST

This is a possible response:

[source,console-result]
----
{
  "defaults" : {
    "anomaly_detectors" : {
      "categorization_analyzer" : {
        "char_filter" : [
          "first_line_with_letters"
        ],
        "tokenizer" : "ml_standard",
        "filter" : [
          {
            "type" : "stop",
            "stopwords" : [
              "Monday",
              "Tuesday",
              "Wednesday",
              "Thursday",
              "Friday",
              "Saturday",
              "Sunday",
              "Mon",
              "Tue",
              "Wed",
              "Thu",
              "Fri",
              "Sat",
              "Sun",
              "January",
              "February",
              "March",
              "April",
              "May",
              "June",
              "July",
              "August",
              "September",
              "October",
              "November",
              "December",
              "Jan",
              "Feb",
              "Mar",
              "Apr",
              "May",
              "Jun",
              "Jul",
              "Aug",
              "Sep",
              "Oct",
              "Nov",
              "Dec",
              "GMT",
              "UTC"
            ]
          }
        ]
      },
      "model_memory_limit" : "1gb",
      "categorization_examples_limit" : 4,
      "model_snapshot_retention_days" : 10,
      "daily_model_snapshot_retention_after_days" : 1
    },
    "datafeeds" : {
      "scroll_size" : 1000
    }
  },
  "upgrade_mode": false,
  "native_code" : {
    "version": "7.0.0",
    "build_hash": "99a07c016d5a73"
  },
  "limits" : {
    "effective_max_model_memory_limit": "28961mb",
    "total_ml_memory": "86883mb"
  }
}
----
// TESTRESPONSE[s/"upgrade_mode": false/"upgrade_mode": $body.upgrade_mode/]
// TESTRESPONSE[s/"version": "7.0.0",/"version": "$body.native_code.version",/]
// TESTRESPONSE[s/"build_hash": "99a07c016d5a73"/"build_hash": "$body.native_code.build_hash"/]
// TESTRESPONSE[s/"effective_max_model_memory_limit": "28961mb"/"effective_max_model_memory_limit": "$body.limits.effective_max_model_memory_limit"/]
// TESTRESPONSE[s/"total_ml_memory": "86883mb"/"total_ml_memory": "$body.limits.total_ml_memory"/]
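
A client that has already parsed a response like the one above might use it roughly as follows. This Python sketch is an illustration, not part of any Elasticsearch client library: the helper names `resolve_options`, `parse_byte_size`, and `fits_in_cluster` are hypothetical, the flat `user_options` dictionary is a simplification of a real job configuration, byte-size suffixes are assumed to be powers of 1024, and a missing `effective_max_model_memory_limit` is assumed to mean that no cap is known.

```python
import re

# Byte-size suffixes, assumed here to be powers of 1024.
_UNIT_BYTES = {
    "b": 1, "kb": 1024, "mb": 1024**2,
    "gb": 1024**3, "tb": 1024**4, "pb": 1024**5,
}


def resolve_options(user_options, ml_info, section="anomaly_detectors"):
    """Fill in unspecified options from the defaults of one section."""
    resolved = dict(ml_info["defaults"][section])
    # Options left as None count as "not specified" in this sketch.
    resolved.update({k: v for k, v in user_options.items() if v is not None})
    return resolved


def parse_byte_size(value):
    """Convert a string such as '28961mb' into a number of bytes."""
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb|tb|pb)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized byte size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]


def fits_in_cluster(model_memory_limit, ml_info):
    """Check a requested limit against the reported effective maximum."""
    maximum = ml_info.get("limits", {}).get("effective_max_model_memory_limit")
    if maximum is None:
        # Assumption: no reported maximum means no known cap.
        return True
    return parse_byte_size(model_memory_limit) <= parse_byte_size(maximum)


# Trimmed copy of the example response above.
ml_info = {
    "defaults": {
        "anomaly_detectors": {
            "model_memory_limit": "1gb",
            "categorization_examples_limit": 4,
            "model_snapshot_retention_days": 10,
            "daily_model_snapshot_retention_after_days": 1,
        },
        "datafeeds": {"scroll_size": 1000},
    },
    "limits": {
        "effective_max_model_memory_limit": "28961mb",
        "total_ml_memory": "86883mb",
    },
}

# The user set only the memory limit; everything else comes from defaults.
job = resolve_options({"model_memory_limit": "512mb"}, ml_info)
print(job)
print(fits_in_cluster(job["model_memory_limit"], ml_info))
```

Keeping the merge and the limit check as separate functions lets a user interface display resolved defaults even when the `limits` object is incomplete.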