elasticsearch/docs/reference/ml/anomaly-detection
David Roberts 0059c59e25
[ML] Make ml_standard tokenizer the default for new categorization jobs (#72805)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, and so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.
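A minimal sketch, assuming Python and a plain dict used as the request body for the create anomaly detection job API, of an analysis_config that opts back into ml_classic. The field name `message`, the `timestamp` time field, the `15m` bucket span and the helper name `classic_analysis_config` are hypothetical; the `mlcategory` count detector is the usual shape of a categorization job. Because an analyzer is supplied, first_non_blank_line is only applied if it is listed.

```python
import json


def classic_analysis_config(field_name: str) -> dict:
    """Build a hypothetical analysis_config that selects ml_classic.

    Supplying categorization_analyzer means its tokenizer is used instead of
    the ml_standard default. first_non_blank_line is listed explicitly here;
    omit it to categorize whole messages.
    """
    return {
        "bucket_span": "15m",
        "categorization_field_name": field_name,
        "detectors": [
            {"function": "count", "by_field_name": "mlcategory"}
        ],
        "categorization_analyzer": {
            "char_filter": ["first_non_blank_line"],
            "tokenizer": "ml_classic",
        },
    }


body = {
    "analysis_config": classic_analysis_config("message"),
    "data_description": {"time_field": "timestamp"},
}
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a job creation request; nothing here is specific to a particular client library.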

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.
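As a sketch of opting out, assuming a simplified stand-in for the default analyzer (`default_like`, hypothetical; any token filters the real default contains are omitted), the hypothetical helper below returns a copy of an analyzer with first_non_blank_line removed, so that the explicitly specified analyzer categorizes the whole message.

```python
def without_first_line_filter(analyzer: dict) -> dict:
    """Return a copy of a categorization_analyzer dict without the
    first_non_blank_line char filter (hypothetical helper)."""
    char_filters = [
        cf for cf in analyzer.get("char_filter", [])
        if cf != "first_non_blank_line"
    ]
    result = dict(analyzer)  # shallow copy; the input is left unchanged
    if char_filters:
        result["char_filter"] = char_filters
    else:
        # No char filters remain, so drop the key entirely.
        result.pop("char_filter", None)
    return result


# Simplified stand-in for the new default; not the exact built-in analyzer.
default_like = {"char_filter": ["first_non_blank_line"], "tokenizer": "ml_standard"}
opted_out = without_first_line_filter(default_like)
print(opted_out)  # {'tokenizer': 'ml_standard'}
```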

If no categorization_analyzer is specified but categorization_filters
are, then the categorization filters are converted to char filters
that are applied after first_non_blank_line.
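The ordering can be illustrated with a rough local emulation, assuming each categorization filter becomes a pattern_replace char filter that deletes its matches. The helpers `equivalent_char_filters` and `apply_chain` are hypothetical and only approximate the behaviour; they are not Elasticsearch code. The point is that the filters only ever see the first non-blank line.

```python
import re


def equivalent_char_filters(categorization_filters: list) -> list:
    """Sketch of the char_filter chain implied by categorization_filters:
    first_non_blank_line first, then one deleting pattern_replace per filter."""
    chain: list = ["first_non_blank_line"]
    for pattern in categorization_filters:
        chain.append({"type": "pattern_replace", "pattern": pattern, "replacement": ""})
    return chain


def apply_chain(message: str, chain: list) -> str:
    """Approximate the chain on a single message, in order."""
    text = message
    for step in chain:
        if step == "first_non_blank_line":
            # Keep only the first line that contains non-whitespace characters.
            text = next((line for line in text.splitlines() if line.strip()), "")
        else:
            text = re.sub(step["pattern"], step["replacement"], text)
    return text


chain = equivalent_char_filters([r"\[.*?\]"])
msg = "\n  \nJob [42] failed: timeout\nTraceback (most recent call last):"
print(apply_chain(msg, chain))
```

Here the bracketed filter is applied only to `Job [42] failed: timeout`; the traceback line never reaches it.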

Closes elastic/ml-cpp#1724
2021-06-01 15:11:32 +01:00
apis [ML] Make ml_standard tokenizer the default for new categorization jobs (#72805) 2021-06-01 15:11:32 +01:00
functions [DOCS] Swap [float] for [discrete] (#60124) 2020-07-23 11:48:22 -04:00
ml-configuring-aggregations.asciidoc [ML] adding support for composite aggs in anomaly detection (#69970) 2021-03-30 08:25:40 -04:00
ml-configuring-alerts.asciidoc [DOCS] Adds anomaly detection rule advanced settings to docs (#72072) 2021-04-26 09:55:02 +02:00
ml-configuring-categories.asciidoc [ML] Make ml_standard tokenizer the default for new categorization jobs (#72805) 2021-06-01 15:11:32 +01:00
ml-configuring-detector-custom-rules.asciidoc [DOCS] Clarifies that custom rules are job rules in Kibana (#71678) 2021-04-15 09:33:03 +02:00
ml-configuring-populations.asciidoc [DOCS] Changes level offset of anomaly detection pages (#59911) 2020-07-20 16:33:54 -07:00
ml-configuring-transform.asciidoc [DOCS] Alters examples in anomaly detection page to use runtime mappings (#71745) 2021-04-19 13:06:50 +02:00
ml-configuring-url.asciidoc [DOCS] Makes the screenshot larger on the custom URLs page. (#65269) 2020-11-20 09:29:39 +01:00
ml-delayed-data-detection.asciidoc [DOCS] Clarify impact of delayed data in anomaly detection (#66816) 2021-01-05 12:14:51 -08:00