mirror of https://github.com/elastic/kibana.git
synced 2025-04-23 17:28:26 -04:00

This commit is contained in:
parent c08c6a7bb7
commit db9fc60384

9 changed files with 27 additions and 201 deletions
@@ -16,3 +16,14 @@ See <<monitoring-kibana>>.

== Upgrade Assistant

See <<upgrade-assistant>>.

[role="exclude",id="ml-jobs"]
== Creating {anomaly-jobs}

This page has moved. Please see {stack-ov}/create-jobs.html[Creating {anomaly-jobs}].

[role="exclude",id="job-tips"]
== Machine learning job tips

This page has moved. Please see {stack-ov}/job-tips.html[Machine learning job tips].
12 docs/user/extend.asciidoc Normal file
@@ -0,0 +1,12 @@
[[extend]]
= Extend your use case

[partintro]
--
//TBD

* <<xpack-ml>>

--

include::ml/index.asciidoc[]
@@ -20,7 +20,7 @@ include::timelion.asciidoc[]

include::canvas.asciidoc[]

include::ml/index.asciidoc[]
include::extend.asciidoc[]

include::{kib-repo-dir}/maps/index.asciidoc[]
@ -1,67 +0,0 @@
|
|||
[role="xpack"]
|
||||
[[ml-jobs]]
|
||||
== Creating {anomaly-jobs}
|
||||
|
||||
{anomaly-jobs-cap} contain the configuration information and metadata
|
||||
necessary to perform an analytics task.
|
||||
|
||||
{kib} provides the following wizards to make it easier to create jobs:
|
||||
|
||||
[role="screenshot"]
|
||||
image::user/ml/images/ml-create-job.jpg[Create New Job]
|
||||
|
||||
A _single metric job_ is a simple job that contains a single _detector_. A
|
||||
detector defines the type of analysis that will occur and which fields to
|
||||
analyze. In addition to limiting the number of detectors, the single metric job
|
||||
creation wizard omits many of the more advanced configuration options.
|
||||
|
||||
A _multi-metric job_ can contain more than one detector, which is more efficient
|
||||
than running multiple jobs against the same data.
|
||||
|
||||
A _population job_ detects activity that is unusual compared to the behavior of
|
||||
the population. For more information, see
|
||||
{stack-ov}/ml-configuring-pop.html[Performing population analysis].
|
||||
|
||||
An _advanced job_ can contain multiple detectors and enables you to configure all
|
||||
job settings.
|
||||
|
||||
{kib} can also recognize certain types of data and provide specialized wizards
|
||||
for that context. For example, if you use {filebeat-ref}/index.html[{filebeat}]
|
||||
to ship access logs from your
|
||||
http://nginx.org/[Nginx] and https://httpd.apache.org/[Apache] HTTP servers to
|
||||
{es}, the following wizards appear:
|
||||
|
||||
[role="screenshot"]
|
||||
image::user/ml/images/ml-data-recognizer-filebeat.jpg[A screenshot of the {filebeat} job creation wizards]
|
||||
|
||||
Likewise, if you use {auditbeat-ref}/index.html[{auditbeat}] to audit process
|
||||
activity on your systems, the following wizards appear:
|
||||
|
||||
[role="screenshot"]
|
||||
image::user/ml/images/ml-data-recognizer-auditbeat.jpg[A screenshot of the {auditbeat} job creation wizards]
|
||||
|
||||
These wizards create {anomaly-jobs}, dashboards, searches, and visualizations that
|
||||
are customized to help you analyze your {auditbeat} and {filebeat} data.
|
||||
|
||||
If you are not certain which type of job to create, you can use the
|
||||
*Data Visualizer* to learn more about your data. If your index pattern contains
|
||||
a time field, it can identify possible fields for {ml} analysis.
|
||||
|
||||
[NOTE]
|
||||
===============================
|
||||
If your data is located outside of {es}, you cannot use {kib} to create
|
||||
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
|
||||
{anomaly-detect-cap} is still possible, however, by using APIs to
|
||||
create and manage jobs and post data to them. For more information, see
|
||||
{ref}/ml-apis.html[Machine Learning APIs].
|
||||
===============================
|
||||
|
||||
Ready to get some hands-on experience? See
|
||||
{stack-ov}/ml-getting-started.html[Getting Started with Machine Learning].
|
||||
|
||||
The following video tutorials also demonstrate single metric, multi-metric, and
|
||||
advanced jobs:
|
||||
|
||||
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
|
||||
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
|
||||
* https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning for the Elastic Stack: Detect Outliers in a Population]
|
Binary file not shown. Before: Size: 359 KiB
Binary file not shown. Before: Size: 173 KiB
Binary file not shown. Before: Size: 169 KiB
@@ -1,9 +1,6 @@
[role="xpack"]
[[xpack-ml]]
= {ml-cap}

[partintro]
--
== {ml-cap}

As datasets increase in size and complexity, the human effort required to
inspect dashboards or maintain rules for spotting infrastructure problems,
@@ -29,7 +26,7 @@ The *Data Visualizer* identifies the file format and field mappings. You can then
optionally import that data into an {es} index.

If you have a trial or platinum license, you can
<<ml-jobs,create {anomaly-jobs}>> and manage jobs and {dfeeds} from the *Job
create {anomaly-jobs} and manage jobs and {dfeeds} from the *Job
Management* pane:

[role="screenshot"]
@ -67,10 +64,6 @@ web browser so that it does not block pop-up windows or create an exception for
|
|||
your {kib} URL.
|
||||
|
||||
For more information about the {anomaly-detect} feature, see
|
||||
https://www.elastic.co/what-is/elastic-stack-machine-learning and
|
||||
{stack-ov}/xpack-ml.html[{ml-cap} {anomaly-detect}].
|
||||
|
||||
--
|
||||
|
||||
include::creating-jobs.asciidoc[]
|
||||
include::job-tips.asciidoc[]
|
||||
|
||||
|
|
|
@ -1,123 +0,0 @@
|
|||
[role="xpack"]
|
||||
[[job-tips]]
|
||||
=== Machine learning job tips
|
||||
++++
|
||||
<titleabbrev>Job tips</titleabbrev>
|
||||
++++
|
||||
|
||||
When you are creating a job in {kib}, the job creation wizards can provide
|
||||
advice based on the characteristics of your data. By heeding these suggestions,
|
||||
you can create jobs that are more likely to produce insightful {ml} results.
|
||||
|
||||
[[bucket-span]]
|
||||
==== Bucket span
|
||||
|
||||
The bucket span is the time interval that {ml} analytics use to summarize and
|
||||
model data for your job. When you create a job in {kib}, you can choose to
|
||||
estimate a bucket span value based on your data characteristics.
|
||||
|
||||
NOTE: The bucket span must contain a valid time interval. For more information,
|
||||
see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].
|
||||
|
||||
If you choose a value that is larger than one day or is significantly different
|
||||
than the estimated value, you receive an informational message. For more
|
||||
information about choosing an appropriate bucket span, see
|
||||
{xpack-ref}/ml-buckets.html[Buckets].
|
||||
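The bucket span advice above can be sketched in a few lines. This is a hypothetical illustration, not {kib}'s actual validation code: the helper names are invented, and the "significantly different" threshold (4x in either direction here) is an assumption; only the larger-than-one-day rule comes from the text.

```python
import re

# Seconds per interval unit; a minimal subset of valid time intervals.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_interval(interval: str) -> int:
    """Parse a time interval such as '15m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", interval)
    if not match:
        raise ValueError(f"not a valid time interval: {interval!r}")
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]

def bucket_span_advice(chosen: str, estimated: str) -> list[str]:
    """Return informational messages for a chosen bucket span."""
    chosen_s = parse_interval(chosen)
    estimated_s = parse_interval(estimated)
    messages = []
    if chosen_s > UNIT_SECONDS["d"]:
        messages.append("bucket span is larger than one day")
    # "Significantly different" is vague in prose; assume 4x either way.
    if chosen_s > 4 * estimated_s or estimated_s > 4 * chosen_s:
        messages.append(f"bucket span differs significantly from the estimate {estimated}")
    return messages
```

For example, `bucket_span_advice("2d", "15m")` triggers both messages, while a choice close to the estimate returns none.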
[[cardinality]]
==== Cardinality

If there are logical groupings of related entities in your data, {ml} analytics
can make data models and generate results that take these groupings into
consideration. For example, you might choose to split your data by user ID and
detect when users are accessing resources differently than they usually do.

If the field that you use to split your data has many different values, the
job uses more memory resources. In particular, if the cardinality of the
`by_field_name`, `over_field_name`, or `partition_field_name` is greater than
1000, you are advised that there might be high memory usage.

Likewise, if you are performing population analysis and the cardinality of the
`over_field_name` is below 10, you are advised that this might not be a suitable
field to use. For more information, see
{xpack-ref}/ml-configuring-pop.html[Performing Population Analysis].
[[detectors]]
==== Detectors

Each job must have one or more _detectors_. A detector applies an analytical
function to specific fields in your data. If your job does not contain a
detector or the detector does not contain a
{stack-ov}/ml-functions.html[valid function], you receive an error.

If a job contains duplicate detectors, you also receive an error. Detectors are
duplicates if they have the same `function`, `field_name`, `by_field_name`,
`over_field_name`, and `partition_field_name`.
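The duplicate rule above is a straightforward five-field comparison, which can be sketched as follows. This is a minimal illustration under the stated rule; the function name is hypothetical.

```python
# Two detectors are duplicates when these five fields all match.
DUPLICATE_KEYS = ("function", "field_name", "by_field_name",
                  "over_field_name", "partition_field_name")

def find_duplicate_detectors(detectors: list[dict]) -> list[tuple]:
    """Return the (function, field, by, over, partition) combinations
    that occur more than once in the detector list."""
    seen, duplicates = set(), []
    for detector in detectors:
        key = tuple(detector.get(k) for k in DUPLICATE_KEYS)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

Missing fields default to `None`, so two `{"function": "mean", "field_name": "bytes"}` detectors count as duplicates even though neither sets a split field.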
[[influencers]]
==== Influencers

When you create a job, you can specify _influencers_, which are also sometimes
referred to as _key fields_. Picking an influencer is strongly recommended for
the following reasons:

* It allows you to more easily assign blame for the anomaly
* It simplifies and aggregates the results

The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent influencers.
Influencers can be any field in your data; they do not need to be fields that
are specified in your detectors, though they often are.

As a best practice, do not pick too many influencers. For example, you generally
do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.

The job creation wizards in {kib} can suggest which fields to use as influencers.
[[model-memory-limits]]
==== Model memory limits

For each job, you can optionally specify a `model_memory_limit`, which is the
approximate maximum amount of memory resources that are required for analytical
processing. The default value is 1 GB. Once this limit is approached, data
pruning becomes more aggressive. Upon exceeding this limit, new entities are not
modeled.

You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
By default, it's not set, which means there is no upper bound on the acceptable
`model_memory_limit` values in your jobs.

TIP: If you set the `model_memory_limit` too high, it will be impossible to open
the job; jobs cannot be allocated to nodes that have insufficient memory to run
them.

If the estimated model memory requirement for a job is greater than the model memory
limit for the job or the maximum model memory limit for the cluster, the job
creation wizards in {kib} generate a warning. If the estimated memory
requirement is only a little higher than the `model_memory_limit`, the job will
probably produce useful results. Otherwise, the actions you take to address
these warnings vary depending on the resources available in your cluster:

* If you are using the default value for the `model_memory_limit` and the {ml}
nodes in the cluster have lots of memory, the best course of action might be to
simply increase the job's `model_memory_limit`. Before doing this, however,
double-check that the chosen analysis makes sense. The default
`model_memory_limit` is relatively low to avoid accidentally creating a job that
uses a huge amount of memory.
* If the {ml} nodes in the cluster do not have sufficient memory to accommodate
a job of the estimated size, the only options are:
** Add bigger {ml} nodes to the cluster, or
** Accept that the job will hit its memory limit and will not necessarily find
all the anomalies it could otherwise find.

If you are using {ece} or the hosted Elasticsearch Service on Elastic Cloud,
`xpack.ml.max_model_memory_limit` is set to prevent you from creating jobs
that cannot be allocated to any {ml} nodes in the cluster. If you find that you
cannot increase `model_memory_limit` for your {ml} jobs, the solution is to
increase the size of the {ml} nodes in your cluster.

For more information about the `model_memory_limit` property and the
`xpack.ml.max_model_memory_limit` setting, see
{ref}/ml-job-resource.html#ml-analysisconfig[Analysis limits] and
{ref}/ml-settings.html[Machine learning settings].