[DOCS] Augments job validation tips (#21184)
parent c82e5a2644
commit 9395e7c25f
2 changed files with 75 additions and 14 deletions
@@ -1,6 +1,6 @@
 [role="xpack"]
 [[ml-jobs]]
-== Creating Machine Learning Jobs
+== Creating machine learning jobs
 
 Machine learning jobs contain the configuration information and metadata
 necessary to perform an analytics task.
@@ -1,8 +1,8 @@
 [role="xpack"]
 [[job-tips]]
-=== Machine Learning Job Tips
+=== Machine learning job tips
 ++++
-<titleabbrev>Job Tips</titleabbrev>
+<titleabbrev>Job tips</titleabbrev>
 ++++
 
 When you are creating a job in {kib}, the job creation wizards can provide
@@ -10,15 +10,19 @@ advice based on the characteristics of your data. By heeding these suggestions,
 you can create jobs that are more likely to produce insightful {ml} results.
 
 [[bucket-span]]
-==== Bucket Span
+==== Bucket span
 
 The bucket span is the time interval that {ml} analytics use to summarize and
 model data for your job. When you create a job in {kib}, you can choose to
-estimate a bucket span value based on your data characteristics. Typically, the
-estimated value is between 5 minutes to 1 hour. If you choose a value that is
-larger than one day or is significantly different than the estimated value, you
-receive an informational message. For more information about choosing an
-appropriate bucket span, see {xpack-ref}/ml-buckets.html[Buckets].
+estimate a bucket span value based on your data characteristics.
+
+NOTE: The bucket span must contain a valid time interval. For more information,
+see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].
+
+If you choose a value that is larger than one day or is significantly different
+than the estimated value, you receive an informational message. For more
+information about choosing an appropriate bucket span, see
+{xpack-ref}/ml-buckets.html[Buckets].
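
For readers skimming the diff, a minimal sketch of where `bucket_span` lives in a job configuration; the job name, field names, and 6.x-style endpoint are illustrative, not part of this commit:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/example-event-rate
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "count" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
----

A value such as `15m` or `1h` is a valid time interval; a span larger than `1d` would trigger the informational message described above.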
 
 [[cardinality]]
 ==== Cardinality
@@ -30,16 +34,26 @@ detect when users are accessing resources differently than they usually do.
 
 If the field that you use to split your data has many different values, the
 job uses more memory resources. In particular, if the cardinality of the
-`partition_field_name` is greater than 100, you are advised to consider
-alternative options such as population analysis.
+`by_field_name`, `over_field_name`, or `partition_field_name` is greater than
+1000, you are advised that there might be high memory usage.
 
 Likewise if you are performing population analysis and the cardinality of the
 `over_field_name` is below 10, you are advised that this might not be a suitable
-field to use.
-
-For more information, see
+field to use. For more information, see
 {xpack-ref}/ml-configuring-pop.html[Performing Population Analysis].
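
As an illustration of the population alternative these messages point at, here is a hypothetical detector that models a high-cardinality `user_id` field as a population rather than splitting on it (the field name and bucket span are assumptions for the example):

[source,js]
----
"analysis_config": {
  "bucket_span": "15m",
  "detectors": [
    {
      "function": "high_count",
      "over_field_name": "user_id"
    }
  ]
}
----

Using `over_field_name` compares each `user_id` against the behavior of the population as a whole, so memory use does not grow with the number of distinct users the way a `partition_field_name` split does.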
 
 [[detectors]]
 ==== Detectors
 
+Each job must have one or more _detectors_. A detector applies an analytical
+function to specific fields in your data. If your job does not contain a
+detector or the detector does not contain a
+{stack-ov}/ml-functions.html[valid function], you receive an error.
+
+If a job contains duplicate detectors, you also receive an error. Detectors are
+duplicates if they have the same `function`, `field_name`, `by_field_name`,
+`over_field_name` and `partition_field_name`.
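
A sketch of a `detectors` array that passes these checks, with hypothetical field names: the two detectors share `function` and `field_name` but differ in `partition_field_name`, so they are not considered duplicates.

[source,js]
----
"detectors": [
  {
    "function": "mean",
    "field_name": "responsetime",
    "partition_field_name": "airline"
  },
  {
    "function": "mean",
    "field_name": "responsetime",
    "partition_field_name": "host"
  }
]
----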
+
+[[influencers]]
+==== Influencers
+
@@ -60,3 +74,50 @@ do not need more than three. If you pick many influencers, the results can be
 overwhelming and there is a small overhead to the analysis.
 
 The job creation wizards in {kib} can suggest which fields to use as influencers.
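
For example, a job that sums bytes per client might declare a short influencer list like the following (all field names hypothetical, kept to two influencers per the guidance above):

[source,js]
----
"analysis_config": {
  "bucket_span": "15m",
  "detectors": [
    {
      "function": "sum",
      "field_name": "bytes",
      "partition_field_name": "clientip"
    }
  ],
  "influencers": [ "clientip", "user_agent" ]
}
----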
+
+[[model-memory-limits]]
+==== Model memory limits
+
+For each job, you can optionally specify a `model_memory_limit`, which is the
+approximate maximum amount of memory resources that are required for analytical
+processing. The default value is 1 GB. Once this limit is approached, data
+pruning becomes more aggressive. Upon exceeding this limit, new entities are not
+modeled.
+
+You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
+By default, it's not set, which means there is no upper bound on the acceptable
+`model_memory_limit` values in your jobs.
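
As a sketch, the job-level limit is set under `analysis_limits`; the job name and value here are illustrative:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "count" }
    ]
  },
  "analysis_limits": {
    "model_memory_limit": "512mb"
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
----

The cluster-wide cap, if you choose to set one, is a single line such as `xpack.ml.max_model_memory_limit: 1gb` in `elasticsearch.yml` on the {ml} nodes.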
+
+TIP: If you set the `model_memory_limit` too high, it will be impossible to open
+the job; jobs cannot be allocated to nodes that have insufficient memory to run
+them.
+
+If the estimated model memory limit for a job is greater than the model memory
+limit for the job or the maximum model memory limit for the cluster, the job
+creation wizards in {kib} generate a warning. If the estimated memory
+requirement is only a little higher than the `model_memory_limit`, the job will
+probably produce useful results. Otherwise, the actions you take to address
+these warnings vary depending on the resources available in your cluster:
+
+* If you are using the default value for the `model_memory_limit` and the {ml}
+nodes in the cluster have lots of memory, the best course of action might be to
+simply increase the job's `model_memory_limit`. Before doing this, however,
+double-check that the chosen analysis makes sense. The default
+`model_memory_limit` is relatively low to avoid accidentally creating a job that
+uses a huge amount of memory.
+* If the {ml} nodes in the cluster do not have sufficient memory to accommodate
+a job of the estimated size, the only options are:
+** Add bigger {ml} nodes to the cluster, or
+** Accept that the job will hit its memory limit and will not necessarily find
+all the anomalies it could otherwise find.
+
+If you are using {ece} or the hosted Elasticsearch Service on Elastic Cloud,
+`xpack.ml.max_model_memory_limit` is set to prevent you from creating jobs
+that cannot be allocated to any {ml} nodes in the cluster. If you find that you
+cannot increase `model_memory_limit` for your {ml} jobs, the solution is to
+increase the size of the {ml} nodes in your cluster.
+
+For more information about the `model_memory_limit` property and the
+`xpack.ml.max_model_memory_limit` setting, see
+{ref}/ml-job-resource.html#ml-analysisconfig[Analysis limits] and
+{ref}/ml-settings.html[Machine learning settings].