elasticsearch/docs/reference/ml/anomaly-detection
Benjamin Trent 281ec58b8d
[ML] add new default char filter first_line_with_letters for machine learning categorization (#77457)
The char filter replaces the previous default of `first_non_blank_line`.

`first_non_blank_line` reliably found the first line containing any characters at all, but it
handled log messages like the following poorly:
```
--------------------------------------------------------------------------------

Alias 'foo' already exists and this prevents setting up ILM for logs

--------------------------------------------------------------------------------
```
When combined with the `ml_standard` tokenizer, only the first non-blank line was analyzed:
```
--------------------------------------------------------------------------------
```
That line contains no valid tokens for the `ml_standard` tokenizer, so no tokens were found for the whole message.


The new filter, `first_line_with_letters`, returns the first line that contains at least one letter character (i.e. a character for which `Character#isLetter` returns true).
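
The selection rule can be illustrated with a minimal Java sketch. This is not the char filter's actual implementation; the class name `FirstLineWithLettersSketch` and the method `firstLineWithLetters` are hypothetical, and the sketch assumes lines are separated by `\n` or `\r\n`.

```java
import java.util.Optional;

// Hypothetical illustration only; not the Elasticsearch char filter itself.
class FirstLineWithLettersSketch {

    /** Returns the first line that contains at least one letter, or empty if no line does. */
    static Optional<String> firstLineWithLetters(String message) {
        // Assumption: lines are separated by \n or \r\n; the -1 limit keeps trailing empty lines.
        for (String line : message.split("\\r?\\n", -1)) {
            // Character.isLetter(int) is applied to each code point of the line.
            if (line.codePoints().anyMatch(Character::isLetter)) {
                return Optional.of(line);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String dashes = "-".repeat(80);
        String message = dashes
            + "\n\nAlias 'foo' already exists and this prevents setting up ILM for logs\n\n"
            + dashes;
        // Prints: Alias 'foo' already exists and this prevents setting up ILM for logs
        System.out.println(firstLineWithLetters(message).orElse(""));
    }
}
```

A line made up only of dashes, digits, or whitespace is skipped, which is why the separator line no longer ends up as the analyzed text.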

For the log message above, combining the new filter with the `ml_standard` tokenizer yields the following, more appropriate, tokens:

```
"tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"]
```
2021-09-09 10:09:57 -04:00
apis [ML] add new default char filter first_line_with_letters for machine learning categorization (#77457) 2021-09-09 10:09:57 -04:00
functions [DOCS] Update datafeed details in ML docs (#76854) 2021-08-25 11:35:21 -07:00
ml-configuring-aggregations.asciidoc [DOCS] Update datafeed details in ML docs (#76854) 2021-08-25 11:35:21 -07:00
ml-configuring-alerts.asciidoc [DOCS] Adds anomaly job health alert type docs (#76659) 2021-08-30 16:11:34 +02:00
ml-configuring-categories.asciidoc [ML] add new default char filter first_line_with_letters for machine learning categorization (#77457) 2021-09-09 10:09:57 -04:00
ml-configuring-detector-custom-rules.asciidoc [DOCS] Fixes links to machine learning concepts (#75194) 2021-07-09 13:09:03 -07:00
ml-configuring-populations.asciidoc [DOCS] Clarifies terminology in Performing population analysis page. (#74237) 2021-06-18 09:03:38 +02:00
ml-configuring-transform.asciidoc [DOCS] Update datafeed details in ML docs (#76854) 2021-08-25 11:35:21 -07:00
ml-configuring-url.asciidoc [DOCS] Replaces index pattern in ML docs (#77041) 2021-09-01 10:26:06 -07:00
ml-delayed-data-detection.asciidoc [DOCS] Anomaly detection: Visualize delayed data (#75098) 2021-07-13 18:06:07 -07:00