mirror of
https://github.com/elastic/kibana.git
synced 2025-04-21 16:29:04 -04:00
[role="xpack"]
[[job-tips]]
=== Machine learning job tips
++++
<titleabbrev>Job tips</titleabbrev>
++++

When you are creating a job in {kib}, the job creation wizards can provide
advice based on the characteristics of your data. By heeding these suggestions,
you can create jobs that are more likely to produce insightful {ml} results.

[[bucket-span]]
==== Bucket span

The bucket span is the time interval that {ml} analytics use to summarize and
model data for your job. When you create a job in {kib}, you can choose to
estimate a bucket span value based on your data characteristics.

NOTE: The bucket span must contain a valid time interval. For more information,
see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].

If you choose a value that is larger than one day or is significantly different
from the estimated value, you receive an informational message. For more
information about choosing an appropriate bucket span, see
{stack-ov}/ml-buckets.html[Buckets].

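To make this concrete, here is a sketch of where the bucket span sits when you
create a job through the create jobs API rather than a wizard. The job ID,
field names, and the `15m` value are hypothetical, and on older releases the
endpoint is `_xpack/ml/anomaly_detectors` rather than `_ml/anomaly_detectors`:

[source,js]
----
PUT _ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "mean", "field_name": "responsetime" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----

Any valid time interval such as `15m` or `1h` is accepted here; values larger
than one day trigger the informational message described above.
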
[[cardinality]]
==== Cardinality

If there are logical groupings of related entities in your data, {ml} analytics
can make data models and generate results that take these groupings into
consideration. For example, you might choose to split your data by user ID and
detect when users are accessing resources differently than they usually do.

If the field that you use to split your data has many different values, the
job uses more memory resources. In particular, if the cardinality of the
`by_field_name`, `over_field_name`, or `partition_field_name` is greater than
1000, you are advised that there might be high memory usage.

Likewise, if you are performing population analysis and the cardinality of the
`over_field_name` is below 10, you are advised that this might not be a suitable
field to use. For more information, see
{stack-ov}/ml-configuring-pop.html[Performing population analysis].

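As a sketch of such a split, the `analysis_config` fragment below partitions
the analysis by a hypothetical `user_id` field, so each user gets its own
model; the cardinality advice above applies to that field:

[source,js]
----
"analysis_config": {
  "bucket_span": "15m",
  "detectors": [
    {
      "function": "high_count",
      "partition_field_name": "user_id"
    }
  ]
}
----

Because one model is maintained per distinct `user_id`, memory usage grows
with the cardinality of that field.
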
[[detectors]]
==== Detectors

Each job must have one or more _detectors_. A detector applies an analytical
function to specific fields in your data. If your job does not contain a
detector or the detector does not contain a
{stack-ov}/ml-functions.html[valid function], you receive an error.

If a job contains duplicate detectors, you also receive an error. Detectors are
duplicates if they have the same `function`, `field_name`, `by_field_name`,
`over_field_name`, and `partition_field_name`.

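For example, a detector is the combination of a function and the fields it
applies to. The fragment below (field names hypothetical) defines one valid
detector; adding a second detector with identical `function`, `field_name`,
`by_field_name`, `over_field_name`, and `partition_field_name` values would
trigger the duplicate-detector error:

[source,js]
----
"detectors": [
  {
    "function": "mean",
    "field_name": "responsetime",
    "partition_field_name": "airline"
  }
]
----
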
[[influencers]]
==== Influencers

When you create a job, you can specify _influencers_, which are also sometimes
referred to as _key fields_. Picking an influencer is strongly recommended for
the following reasons:

* It allows you to more easily assign blame for the anomaly
* It simplifies and aggregates the results

The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent influencers.
Influencers can be any field in your data; they do not need to be fields that
are specified in your detectors, though they often are.

As a best practice, do not pick too many influencers. For example, you generally
do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.

The job creation wizards in {kib} can suggest which fields to use as influencers.

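Influencers are listed in the job's analysis configuration. In this
hypothetical sketch, `clientip` serves as both the partitioning field and the
sole influencer, so anomalies can be attributed to a specific client:

[source,js]
----
"analysis_config": {
  "bucket_span": "15m",
  "detectors": [
    { "function": "high_count", "partition_field_name": "clientip" }
  ],
  "influencers": [ "clientip" ]
}
----
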
[[model-memory-limits]]
==== Model memory limits

For each job, you can optionally specify a `model_memory_limit`, which is the
approximate maximum amount of memory required for analytical processing. The
default value is 1 GB. As this limit is approached, data pruning becomes more
aggressive; when it is exceeded, new entities are not modeled.

You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
By default, it is not set, which means there is no upper bound on the acceptable
`model_memory_limit` values in your jobs.

TIP: If you set the `model_memory_limit` too high, it will be impossible to open
the job; jobs cannot be allocated to nodes that have insufficient memory to run
them.

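The job-level limit is set in the `analysis_limits` object of the job
configuration; the `512mb` value here is purely illustrative:

[source,js]
----
"analysis_limits": {
  "model_memory_limit": "512mb"
}
----

The cluster-wide cap, if you choose to set one, is an Elasticsearch setting on
the {ml} nodes, for example `xpack.ml.max_model_memory_limit: 2gb` in
`elasticsearch.yml`.
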
If the estimated model memory requirement for a job is greater than its
`model_memory_limit` or the maximum model memory limit for the cluster, the job
creation wizards in {kib} generate a warning. If the estimated memory
requirement is only a little higher than the `model_memory_limit`, the job will
probably produce useful results. Otherwise, the actions you take to address
these warnings vary depending on the resources available in your cluster:

* If you are using the default value for the `model_memory_limit` and the {ml}
nodes in the cluster have lots of memory, the best course of action might be to
simply increase the job's `model_memory_limit`. Before doing this, however,
double-check that the chosen analysis makes sense. The default
`model_memory_limit` is relatively low to avoid accidentally creating a job that
uses a huge amount of memory.
* If the {ml} nodes in the cluster do not have sufficient memory to accommodate
a job of the estimated size, the only options are:
** Add bigger {ml} nodes to the cluster, or
** Accept that the job will hit its memory limit and will not necessarily find
all the anomalies it could otherwise find.

If you are using {ece} or the hosted Elasticsearch Service on Elastic Cloud,
`xpack.ml.max_model_memory_limit` is set to prevent you from creating jobs
that cannot be allocated to any {ml} nodes in the cluster. If you find that you
cannot increase `model_memory_limit` for your {ml} jobs, the solution is to
increase the size of the {ml} nodes in your cluster.

For more information about the `model_memory_limit` property and the
`xpack.ml.max_model_memory_limit` setting, see
{ref}/ml-job-resource.html#ml-analysisconfig[Analysis limits] and
{ref}/ml-settings.html[Machine learning settings].