@@ -35,3 +35,18 @@ This page has moved. Please see <<infra-configure-source>>.
This page has moved. Please see <<xpack-logs-configuring>>.

[role="exclude",id="creating-df-kib"]
== Creating {transforms}

This page is deleted. Please see
{stack-ov}/ecommerce-dataframes.html[Transforming the eCommerce sample data].

[role="exclude",id="ml-jobs"]
== Creating {anomaly-jobs}

This page has moved. Please see {stack-ov}/create-jobs.html[Creating {anomaly-jobs}].

[role="exclude",id="job-tips"]
== Machine learning job tips

This page has moved. Please see {stack-ov}/job-tips.html[Machine learning job tips].
docs/user/extend.asciidoc (new file, 12 lines)
@@ -0,0 +1,12 @@
[[extend]]
= Extend your use case

[partintro]
--
//TBD

* <<xpack-ml>>

--

include::ml/index.asciidoc[]
@@ -16,7 +16,7 @@ include::dashboard.asciidoc[]

include::canvas.asciidoc[]

include::ml/index.asciidoc[]
include::extend.asciidoc[]

include::{kib-repo-dir}/maps/index.asciidoc[]
@@ -1,50 +0,0 @@
[role="xpack"]
[[creating-df-kib]]
== Creating {dataframe-transforms}

beta[]

You can create {stack-ov}/ml-dataframes.html[{dataframe-transforms}] in the
{kib} Machine Learning application.

[role="screenshot"]
image::user/ml/images/ml-definepivot.jpg["Defining a {dataframe} pivot"]

Select the index pattern or saved search you want to transform. To pivot your
data, you must group the data by at least one field and apply at least one
aggregation. The {dataframe} pivot preview on the right side provides a visual
verification.

Once you have created the pivot, add a job ID and define the index for the
transformed data (_target index_). If the target index does not exist, it is
created automatically. You can optionally create a {kib} index pattern for the
target index. At the end of the process, a {dataframe} job is created.

[role="screenshot"]
image::user/ml/images/ml-jobid.jpg["Job ID and target index"]

After you create {dataframe} jobs, you can start, stop, and delete them
and explore their progress and statistics from the jobs list.

For a more detailed example of using {dataframes} with the {kib} sample data,
see {stack-ov}/ecommerce-dataframes.html[Transforming your data].

[NOTE]
===============================
If {stack} {security-features} are enabled, you must have appropriate authority
to work with {dataframes}. For example, there are built-in
`data_frame_transforms_admin` and `data_frame_transforms_user` roles that have
the `manage_data_frame_transforms` and `monitor_data_frame_transforms` cluster
privileges, respectively. See
{stack-ov}/built-in-roles.html[Built-in roles] and
{stack-ov}/security-privileges.html[Security privileges].

Depending on what tasks you perform, you might require additional privileges.
For example, to create a {dataframe-transform} and generate a new target index,
you need the `manage_data_frame_transforms` cluster privilege, `read` and
`view_index_metadata` privileges on the source index, and `read`, `create_index`,
and `index` privileges on the target index. For more information, see the
authorization details for each {ref}/data-frame-apis.html[{dataframe} API].
===============================
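
The pivot described above (group by at least one field, apply at least one aggregation, write to a target index) maps onto a request body for the 7.x data frame transform API. The following is a hedged sketch only: the endpoint path in the comment follows the `{ref}/data-frame-apis.html` APIs cited above, while the index names, job ID, and fields are hypothetical examples, not part of the original page.

```python
import json

# Hypothetical body for: PUT _data_frame/transforms/ecommerce-customer-pivot
# Index and field names are made-up illustrations, not from the docs.
transform = {
    "source": {"index": "kibana_sample_data_ecommerce"},
    # The target index is created automatically if it does not exist.
    "dest": {"index": "ecommerce-customer-pivot"},
    "pivot": {
        # At least one group_by field...
        "group_by": {
            "customer_id": {"terms": {"field": "customer_id"}}
        },
        # ...and at least one aggregation.
        "aggregations": {
            "total_spent": {"sum": {"field": "taxful_total_price"}}
        },
    },
}

body = json.dumps(transform, indent=2)
print(body)
```

The same structure is what the {kib} wizard builds for you interactively; the preview pane renders the result of `pivot` applied to a sample of the source index.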
@@ -1,83 +0,0 @@
[role="xpack"]
[[ml-jobs]]
== Creating {anomaly-jobs}

{anomaly-jobs-cap} contain the configuration information and metadata
necessary to perform an analytics task.

{kib} provides the following wizards to make it easier to create jobs:

[role="screenshot"]
image::user/ml/images/ml-create-job.jpg[Create New Job]

A _single metric job_ is a simple job that contains a single _detector_. A
detector defines the type of analysis that will occur and which fields to
analyze. In addition to limiting the number of detectors, the single metric job
creation wizard omits many of the more advanced configuration options.

A _multi-metric job_ can contain more than one detector, which is more efficient
than running multiple jobs against the same data.

A _population job_ detects activity that is unusual compared to the behavior of
the population. For more information, see
{stack-ov}/ml-configuring-pop.html[Performing population analysis].

An _advanced job_ can contain multiple detectors and enables you to configure all
job settings.

{kib} can also recognize certain types of data and provide specialized wizards
for that context. For example, if you
<<add-sample-data,added the sample web log data set>>, the following wizard
appears:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-sample.jpg[A screenshot of the {kib} sample data web log job creation wizard]

TIP: Alternatively, after you load a sample data set on the {kib} home page, you can click *View data* > *ML jobs*. There are {anomaly-jobs} for both the sample eCommerce orders data set and the sample web logs data set.

If you use {filebeat-ref}/index.html[{filebeat}]
to ship access logs from your
http://nginx.org/[Nginx] and https://httpd.apache.org/[Apache] HTTP servers to
{es} and store them using fields and datatypes from the
{ecs-ref}/ecs-reference.html[Elastic Common Schema (ECS)], the following wizards
appear:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-filebeat.jpg[A screenshot of the {filebeat} job creation wizards]

If you use {auditbeat-ref}/index.html[{auditbeat}] to audit process
activity on your systems, the following wizards appear:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-auditbeat.jpg[A screenshot of the {auditbeat} job creation wizards]

Likewise, if you use the {metricbeat-ref}/metricbeat-module-system.html[{metricbeat} system module] to monitor your servers, the following
wizards appear:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-metricbeat.jpg[A screenshot of the {metricbeat} job creation wizards]

These wizards create {anomaly-jobs}, dashboards, searches, and visualizations
that are customized to help you analyze your {auditbeat}, {filebeat}, and
{metricbeat} data.

[NOTE]
===============================
If your data is located outside of {es}, you cannot use {kib} to create
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
{anomaly-detect-cap} is still possible, however, by using APIs to
create and manage jobs and post data to them. For more information, see
{ref}/ml-apis.html[{ml-cap} {anomaly-detect} APIs].
===============================

////
Ready to get some hands-on experience? See
{stack-ov}/ml-getting-started.html[Getting Started with Machine Learning].

The following video tutorials also demonstrate single metric, multi-metric, and
advanced jobs:

* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
* https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning for the Elastic Stack: Detect Outliers in a Population]
////
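
For the case the NOTE describes (data outside {es}, so jobs are created and fed through the APIs rather than {kib} wizards and {dfeeds}), the flow can be sketched as below. This is an illustrative sketch: the endpoint paths in the comments follow the `{ref}/ml-apis.html` APIs cited in the NOTE, while the job ID, field names, and timestamps are hypothetical.

```python
import json

job_id = "remote-logs"  # hypothetical job ID

# Body for: PUT _ml/anomaly_detectors/remote-logs
# A minimal config: one count detector over 15-minute buckets.
job_config = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],
    },
    "data_description": {"time_field": "timestamp"},
}

# Body for: POST _ml/anomaly_detectors/remote-logs/_data
# Without a {dfeed}, you post newline-delimited JSON documents yourself.
records = [
    {"timestamp": 1560000000000},
    {"timestamp": 1560000900000},
]
payload = "\n".join(json.dumps(r) for r in records)
print(payload)
```

In practice you would send these bodies over HTTP to your cluster on whatever schedule your external pipeline supports; the trade-off, as the NOTE says, is losing the real-time retrieval that {dfeeds} provide.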
(Seven image files deleted; sizes: 117, 204, 190, 156, 72, 319, and 237 KiB.)
@@ -1,9 +1,6 @@
[role="xpack"]
[[xpack-ml]]
= {ml-cap}

[partintro]
--
== {ml-cap}

As datasets increase in size and complexity, the human effort required to
inspect dashboards or maintain rules for spotting infrastructure problems,

@@ -29,7 +26,7 @@ size). The *Data Visualizer* identifies the file format and field mappings. You
can then optionally import that data into an {es} index.

If you have a trial or platinum license, you can
<<ml-jobs,create {anomaly-jobs}>> and manage jobs and {dfeeds} from the *Job
create {anomaly-jobs} and manage jobs and {dfeeds} from the *Job
Management* pane:

[role="screenshot"]

@@ -67,11 +64,6 @@ browser so that it does not block pop-up windows or create an exception for your
{kib} URL.

For more information about the {anomaly-detect} feature, see
https://www.elastic.co/what-is/elastic-stack-machine-learning and
{stack-ov}/xpack-ml.html[{ml-cap} {anomaly-detect}].

--

include::creating-jobs.asciidoc[]
include::job-tips.asciidoc[]
include::creating-df-kib.asciidoc[]
@@ -1,124 +0,0 @@
[role="xpack"]
[[job-tips]]
=== Machine learning job tips
++++
<titleabbrev>Job tips</titleabbrev>
++++

When you create an {anomaly-job} in {kib}, the job creation wizards can
provide advice based on the characteristics of your data. By heeding these
suggestions, you can create jobs that are more likely to produce insightful {ml}
results.

[[bucket-span]]
==== Bucket span

The bucket span is the time interval that {ml} analytics use to summarize and
model data for your job. When you create an {anomaly-job} in {kib}, you can
choose to estimate a bucket span value based on your data characteristics.

NOTE: The bucket span must contain a valid time interval. For more information,
see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].

If you choose a value that is larger than one day or is significantly different
from the estimated value, you receive an informational message. For more
information about choosing an appropriate bucket span, see
{stack-ov}/ml-buckets.html[Buckets].

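
The advice above can be sketched as a small check. This is not {kib}'s actual implementation: the interval parser covers only the common unit suffixes, and the "significantly different" threshold (a factor of two either way) is an assumption chosen for illustration.

```python
def interval_to_seconds(interval):
    """Parse a simple time interval such as '15m' or '1d' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    value, unit = interval[:-1], interval[-1]
    if unit not in units or not value.isdigit():
        raise ValueError(f"{interval!r} is not a valid time interval")
    return int(value) * units[unit]

def bucket_span_advice(chosen, estimated):
    """Return informational messages, mirroring the two cases described above."""
    messages = []
    chosen_s = interval_to_seconds(chosen)
    estimated_s = interval_to_seconds(estimated)
    if chosen_s > 86400:  # larger than one day
        messages.append("bucket span is larger than one day")
    # Assumed threshold: more than 2x away from the estimate in either direction.
    if chosen_s > 2 * estimated_s or 2 * chosen_s < estimated_s:
        messages.append("bucket span differs significantly from the estimate")
    return messages

print(bucket_span_advice("2d", "15m"))
```

A chosen span of `2d` against a `15m` estimate triggers both messages; a span equal to the estimate triggers none.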
[[cardinality]]
==== Cardinality

If there are logical groupings of related entities in your data, {ml} analytics
can make data models and generate results that take these groupings into
consideration. For example, you might choose to split your data by user ID and
detect when users are accessing resources differently than they usually do.

If the field that you use to split your data has many different values, the
job uses more memory resources. In particular, if the cardinality of the
`by_field_name`, `over_field_name`, or `partition_field_name` is greater than
1000, you are advised that there might be high memory usage.

Likewise, if you are performing population analysis and the cardinality of the
`over_field_name` is below 10, you are advised that this might not be a suitable
field to use. For more information, see
{stack-ov}/ml-configuring-pop.html[Performing population analysis].

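
The two cardinality rules above, with the same thresholds (above 1000 for split fields, below 10 for a population's over field), can be sketched as follows. The function name and message texts are illustrative, not {kib}'s actual wording.

```python
from typing import Optional

SPLIT_FIELDS = ("by_field_name", "over_field_name", "partition_field_name")

def cardinality_advice(field_role: str, cardinality: int) -> Optional[str]:
    """Mirror the advice described above for a single split field."""
    if field_role in SPLIT_FIELDS and cardinality > 1000:
        return "high cardinality: the job might use a lot of memory"
    if field_role == "over_field_name" and cardinality < 10:
        return "low cardinality: this might not suit population analysis"
    return None

print(cardinality_advice("partition_field_name", 5000))
print(cardinality_advice("over_field_name", 3))
```

A `partition_field_name` with 5000 distinct values gets the memory warning; an `over_field_name` with only 3 gets the population warning; anything in between gets no advice.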
[[detectors]]
==== Detectors

Each {anomaly-job} must have one or more _detectors_. A detector applies an
analytical function to specific fields in your data. If your job does not
contain a detector or the detector does not contain a
{stack-ov}/ml-functions.html[valid function], you receive an error.

If a job contains duplicate detectors, you also receive an error. Detectors are
duplicates if they have the same `function`, `field_name`, `by_field_name`,
`over_field_name`, and `partition_field_name`.

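
The duplicate rule above can be expressed directly: two detectors are duplicates when all five of those properties match. A minimal sketch (not the actual validation code):

```python
# The five properties that define a detector's identity, per the rule above.
DUP_KEYS = ("function", "field_name", "by_field_name",
            "over_field_name", "partition_field_name")

def has_duplicate_detectors(detectors):
    """Return True if any two detector configs share all five key properties."""
    seen = set()
    for detector in detectors:
        # Missing properties count as None, so two detectors that both omit
        # a property still compare as equal on it.
        signature = tuple(detector.get(k) for k in DUP_KEYS)
        if signature in seen:
            return True
        seen.add(signature)
    return False

detectors = [
    {"function": "mean", "field_name": "responsetime"},
    {"function": "mean", "field_name": "responsetime"},  # duplicate
]
print(has_duplicate_detectors(detectors))  # True
```

Note that two detectors with the same `function` but different `by_field_name` values are not duplicates; all five properties must match.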
[[influencers]]
==== Influencers

When you create an {anomaly-job}, you can specify _influencers_, which are also
sometimes referred to as _key fields_. Picking an influencer is strongly
recommended for the following reasons:

* It allows you to more easily assign blame for the anomaly.
* It simplifies and aggregates the results.

The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent influencers.
Influencers can be any field in your data; they do not need to be fields that
are specified in your detectors, though they often are.

As a best practice, do not pick too many influencers. For example, you generally
do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.

The job creation wizards in {kib} can suggest which fields to use as influencers.

[[model-memory-limits]]
|
||||
==== Model memory limits
|
||||
|
||||
For each {anomaly-job}, you can optionally specify a `model_memory_limit`, which
|
||||
is the approximate maximum amount of memory resources that are required for
|
||||
analytical processing. The default value is 1 GB. Once this limit is approached,
|
||||
data pruning becomes more aggressive. Upon exceeding this limit, new entities
|
||||
are not modeled.
|
||||
|
||||
You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
|
||||
By default, it's not set, which means there is no upper bound on the acceptable
|
||||
`model_memory_limit` values in your jobs.
|
||||
|
||||
TIP: If you set the `model_memory_limit` too high, it will be impossible to open
|
||||
the job; jobs cannot be allocated to nodes that have insufficient memory to run
|
||||
them.
|
||||
|
||||
If the estimated model memory limit for an {anomaly-job} is greater than the
|
||||
model memory limit for the job or the maximum model memory limit for the cluster,
|
||||
the job creation wizards in {kib} generate a warning. If the estimated memory
|
||||
requirement is only a little higher than the `model_memory_limit`, the job will
|
||||
probably produce useful results. Otherwise, the actions you take to address
|
||||
these warnings vary depending on the resources available in your cluster:
|
||||
|
||||
* If you are using the default value for the `model_memory_limit` and the {ml}
|
||||
nodes in the cluster have lots of memory, the best course of action might be to
|
||||
simply increase the job's `model_memory_limit`. Before doing this, however,
|
||||
double-check that the chosen analysis makes sense. The default
|
||||
`model_memory_limit` is relatively low to avoid accidentally creating a job that
|
||||
uses a huge amount of memory.
|
||||
* If the {ml} nodes in the cluster do not have sufficient memory to accommodate
|
||||
a job of the estimated size, the only options are:
|
||||
** Add bigger {ml} nodes to the cluster, or
|
||||
** Accept that the job will hit its memory limit and will not necessarily find
|
||||
all the anomalies it could otherwise find.
|
||||
|
||||
If you are using {ece} or the hosted Elasticsearch Service on Elastic Cloud,
|
||||
`xpack.ml.max_model_memory_limit` is set to prevent you from creating jobs
|
||||
that cannot be allocated to any {ml} nodes in the cluster. If you find that you
|
||||
cannot increase `model_memory_limit` for your {ml} jobs, the solution is to
|
||||
increase the size of the {ml} nodes in your cluster.
|
||||
|
||||
For more information about the `model_memory_limit` property and the
|
||||
`xpack.ml.max_model_memory_limit` setting, see
|
||||
{ref}/ml-job-resource.html#ml-analysisconfig[Analysis limits] and
|
||||
{ref}/ml-settings.html[Machine learning settings].
|
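
The warning logic described above reduces to two comparisons: the estimate against the job's `model_memory_limit`, and the estimate against the cluster-wide cap, if one is set. A hedged sketch (values in MB; the function name and message texts are illustrative, not {kib}'s actual code):

```python
def memory_limit_warnings(estimated_mb, job_limit_mb=1024, cluster_max_mb=None):
    """Return the warnings the job creation wizards would raise.

    Defaults mirror the documented behavior: model_memory_limit is 1 GB,
    and xpack.ml.max_model_memory_limit is unset (no upper bound).
    """
    warnings = []
    if estimated_mb > job_limit_mb:
        warnings.append("estimate exceeds the job's model_memory_limit")
    if cluster_max_mb is not None and estimated_mb > cluster_max_mb:
        warnings.append("estimate exceeds xpack.ml.max_model_memory_limit")
    return warnings

# A 2 GB estimate against the default 1 GB job limit triggers a warning.
print(memory_limit_warnings(2048))
# A 512 MB estimate fits under both a 1 GB job limit and a 4 GB cluster cap.
print(memory_limit_warnings(512, job_limit_mb=1024, cluster_max_mb=4096))
```

As the text notes, a warning from the first comparison may be tolerable if the estimate is only slightly above the job limit; a warning from the second means the job cannot be allocated without raising the cap or resizing the {ml} nodes.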