@@ -35,3 +35,18 @@ This page has moved. Please see <<infra-configure-source>>.
This page has moved. Please see <<xpack-logs-configuring>>.

[role="exclude",id="creating-df-kib"]
== Creating {transforms}

This page is deleted. Please see
{stack-ov}/ecommerce-dataframes.html[Transforming the eCommerce sample data].

[role="exclude",id="ml-jobs"]
== Creating {anomaly-jobs}

This page has moved. Please see {stack-ov}/create-jobs.html[Creating {anomaly-jobs}].

[role="exclude",id="job-tips"]
== Machine learning job tips

This page has moved. Please see {stack-ov}/job-tips.html[Machine learning job tips].
docs/user/extend.asciidoc (new file, 12 lines)
@@ -0,0 +1,12 @@
[[extend]]
= Extend your use case

[partintro]
--
//TBD

* <<xpack-ml>>

--

include::ml/index.asciidoc[]
@@ -16,7 +16,7 @@ include::dashboard.asciidoc[]

include::canvas.asciidoc[]

include::ml/index.asciidoc[]
include::extend.asciidoc[]

include::{kib-repo-dir}/maps/index.asciidoc[]
@@ -1,50 +0,0 @@
[role="xpack"]
[[creating-df-kib]]
== Creating {dataframe-transforms}

beta[]

You can create {stack-ov}/ml-dataframes.html[{dataframe-transforms}] in the
{kib} Machine Learning application.

[role="screenshot"]
image::user/ml/images/ml-definepivot.jpg["Defining a {dataframe} pivot"]

Select the index pattern or saved search you want to transform. To pivot your
data, you must group the data by at least one field and apply at least one
aggregation. The {dataframe} pivot preview on the right side provides a visual
verification.

Once you have created the pivot, add a job ID and define the index for the
transformed data (_target index_). If the target index does not exist, it is
created automatically. You can optionally create a {kib} index pattern for the
target index. At the end of the process, a {dataframe} job is created.

[role="screenshot"]
image::user/ml/images/ml-jobid.jpg["Job ID and target index"]

After you create {dataframe} jobs, you can start, stop, and delete them
and explore their progress and statistics from the jobs list.

For a more detailed example of using {dataframes} with the {kib} sample data,
see {stack-ov}/ecommerce-dataframes.html[Transforming your data].

[NOTE]
===============================
If {stack} {security-features} are enabled, you must have appropriate authority
to work with {dataframes}. For example, there are built-in
`data_frame_transforms_admin` and `data_frame_transforms_user` roles that have
the `manage_data_frame_transforms` and `monitor_data_frame_transforms` cluster
privileges, respectively. See
{stack-ov}/built-in-roles.html[Built-in roles] and
{stack-ov}/security-privileges.html[Security privileges].

Depending on what tasks you perform, you might require additional privileges.
For example, to create a {dataframe-transform} and generate a new target index,
you need the `manage_data_frame_transforms` cluster privilege, `read` and
`view_index_metadata` privileges on the source index, and `read`, `create_index`,
and `index` privileges on the target index. For more information, see the
authorization details for each {ref}/data-frame-apis.html[{dataframe} API].
===============================
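
The pivot described above (group by at least one field, apply at least one aggregation, write to a target index) maps onto a request body for the 7.x data frame transform API. The following is a hedged sketch only: the endpoint path in the comment follows the `{ref}/data-frame-apis.html` APIs cited above, while the index names, job ID, and fields are hypothetical examples, not part of the original page.

```python
import json

# Hypothetical body for: PUT _data_frame/transforms/ecommerce-customer-pivot
# Index and field names are made-up illustrations, not from the docs.
transform = {
    "source": {"index": "kibana_sample_data_ecommerce"},
    # The target index is created automatically if it does not exist.
    "dest": {"index": "ecommerce-customer-pivot"},
    "pivot": {
        # At least one group_by field...
        "group_by": {
            "customer_id": {"terms": {"field": "customer_id"}}
        },
        # ...and at least one aggregation.
        "aggregations": {
            "total_spent": {"sum": {"field": "taxful_total_price"}}
        },
    },
}

body = json.dumps(transform, indent=2)
print(body)
```

The same structure is what the {kib} wizard builds for you interactively; the preview pane renders the result of `pivot` applied to a sample of the source index.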
@@ -1,83 +0,0 @@
[role="xpack"]
[[ml-jobs]]
== Creating {anomaly-jobs}

{anomaly-jobs-cap} contain the configuration information and metadata
necessary to perform an analytics task.

{kib} provides the following wizards to make it easier to create jobs:

[role="screenshot"]
image::user/ml/images/ml-create-job.jpg[Create New Job]

A _single metric job_ is a simple job that contains a single _detector_. A
detector defines the type of analysis that will occur and which fields to
analyze. In addition to limiting the number of detectors, the single metric job
creation wizard omits many of the more advanced configuration options.

A _multi-metric job_ can contain more than one detector, which is more efficient
than running multiple jobs against the same data.

A _population job_ detects activity that is unusual compared to the behavior of
the population. For more information, see
{stack-ov}/ml-configuring-pop.html[Performing population analysis].

An _advanced job_ can contain multiple detectors and enables you to configure all
job settings.

{kib} can also recognize certain types of data and provide specialized wizards
for that context. For example, if you
<<add-sample-data,added the sample web log data set>>, the following wizard
appears:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-sample.jpg[A screenshot of the {kib} sample data web log job creation wizard]

TIP: Alternatively, after you load a sample data set on the {kib} home page, you can click *View data* > *ML jobs*. There are {anomaly-jobs} for both the sample eCommerce orders data set and the sample web logs data set.

If you use {filebeat-ref}/index.html[{filebeat}]
to ship access logs from your
http://nginx.org/[Nginx] and https://httpd.apache.org/[Apache] HTTP servers to
{es} and store them using fields and datatypes from the
{ecs-ref}/ecs-reference.html[Elastic Common Schema (ECS)], the following wizards
appear:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-filebeat.jpg[A screenshot of the {filebeat} job creation wizards]

If you use {auditbeat-ref}/index.html[{auditbeat}] to audit process
activity on your systems, the following wizards appear:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-auditbeat.jpg[A screenshot of the {auditbeat} job creation wizards]

Likewise, if you use the {metricbeat-ref}/metricbeat-module-system.html[{metricbeat} system module] to monitor your servers, the following
wizards appear:

[role="screenshot"]
image::user/ml/images/ml-data-recognizer-metricbeat.jpg[A screenshot of the {metricbeat} job creation wizards]

These wizards create {anomaly-jobs}, dashboards, searches, and visualizations
that are customized to help you analyze your {auditbeat}, {filebeat}, and
{metricbeat} data.

[NOTE]
===============================
If your data is located outside of {es}, you cannot use {kib} to create
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
{anomaly-detect-cap} is still possible, however, by using APIs to
create and manage jobs and post data to them. For more information, see
{ref}/ml-apis.html[{ml-cap} {anomaly-detect} APIs].
===============================

////
Ready to get some hands-on experience? See
{stack-ov}/ml-getting-started.html[Getting Started with Machine Learning].

The following video tutorials also demonstrate single metric, multi-metric, and
advanced jobs:

* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
* https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning for the Elastic Stack: Detect Outliers in a Population]
////
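
For the case the NOTE describes (data outside {es}, so jobs are created and fed through the APIs rather than {kib} wizards and {dfeeds}), the flow can be sketched as below. This is an illustrative sketch: the endpoint paths in the comments follow the `{ref}/ml-apis.html` APIs cited in the NOTE, while the job ID, field names, and timestamps are hypothetical.

```python
import json

job_id = "remote-logs"  # hypothetical job ID

# Body for: PUT _ml/anomaly_detectors/remote-logs
# A minimal config: one count detector over 15-minute buckets.
job_config = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],
    },
    "data_description": {"time_field": "timestamp"},
}

# Body for: POST _ml/anomaly_detectors/remote-logs/_data
# Without a {dfeed}, you post newline-delimited JSON documents yourself.
records = [
    {"timestamp": 1560000000000},
    {"timestamp": 1560000900000},
]
payload = "\n".join(json.dumps(r) for r in records)
print(payload)
```

In practice you would send these bodies over HTTP to your cluster on whatever schedule your external pipeline supports; the trade-off, as the NOTE says, is losing the real-time retrieval that {dfeeds} provide.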
(Seven image files deleted; sizes: 117, 204, 190, 156, 72, 319, and 237 KiB.)
@@ -1,9 +1,6 @@
[role="xpack"]
[[xpack-ml]]
= {ml-cap}

[partintro]
--
== {ml-cap}

As datasets increase in size and complexity, the human effort required to
inspect dashboards or maintain rules for spotting infrastructure problems,

@@ -29,7 +26,7 @@ size). The *Data Visualizer* identifies the file format and field mappings. You
can then optionally import that data into an {es} index.

If you have a trial or platinum license, you can
<<ml-jobs,create {anomaly-jobs}>> and manage jobs and {dfeeds} from the *Job
create {anomaly-jobs} and manage jobs and {dfeeds} from the *Job
Management* pane:

[role="screenshot"]

@@ -67,11 +64,6 @@ browser so that it does not block pop-up windows or create an exception for your
{kib} URL.

For more information about the {anomaly-detect} feature, see
https://www.elastic.co/what-is/elastic-stack-machine-learning and
{stack-ov}/xpack-ml.html[{ml-cap} {anomaly-detect}].

--

include::creating-jobs.asciidoc[]
include::job-tips.asciidoc[]
include::creating-df-kib.asciidoc[]
@@ -1,124 +0,0 @@
[role="xpack"]
[[job-tips]]
=== Machine learning job tips
++++
<titleabbrev>Job tips</titleabbrev>
++++

When you create an {anomaly-job} in {kib}, the job creation wizards can
provide advice based on the characteristics of your data. By heeding these
suggestions, you can create jobs that are more likely to produce insightful {ml}
results.

[[bucket-span]]
==== Bucket span

The bucket span is the time interval that {ml} analytics use to summarize and
model data for your job. When you create an {anomaly-job} in {kib}, you can
choose to estimate a bucket span value based on your data characteristics.

NOTE: The bucket span must contain a valid time interval. For more information,
see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].

If you choose a value that is larger than one day or is significantly different
from the estimated value, you receive an informational message. For more
information about choosing an appropriate bucket span, see
{stack-ov}/ml-buckets.html[Buckets].

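
The advice above can be sketched as a small check. This is not {kib}'s actual implementation: the interval parser covers only the common unit suffixes, and the "significantly different" threshold (a factor of two either way) is an assumption chosen for illustration.

```python
def interval_to_seconds(interval):
    """Parse a simple time interval such as '15m' or '1d' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    value, unit = interval[:-1], interval[-1]
    if unit not in units or not value.isdigit():
        raise ValueError(f"{interval!r} is not a valid time interval")
    return int(value) * units[unit]

def bucket_span_advice(chosen, estimated):
    """Return informational messages, mirroring the two cases described above."""
    messages = []
    chosen_s = interval_to_seconds(chosen)
    estimated_s = interval_to_seconds(estimated)
    if chosen_s > 86400:  # larger than one day
        messages.append("bucket span is larger than one day")
    # Assumed threshold: more than 2x away from the estimate in either direction.
    if chosen_s > 2 * estimated_s or 2 * chosen_s < estimated_s:
        messages.append("bucket span differs significantly from the estimate")
    return messages

print(bucket_span_advice("2d", "15m"))
```

A chosen span of `2d` against a `15m` estimate triggers both messages; a span equal to the estimate triggers none.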
[[cardinality]]
==== Cardinality

If there are logical groupings of related entities in your data, {ml} analytics
can make data models and generate results that take these groupings into
consideration. For example, you might choose to split your data by user ID and
detect when users are accessing resources differently than they usually do.

If the field that you use to split your data has many different values, the
job uses more memory resources. In particular, if the cardinality of the
`by_field_name`, `over_field_name`, or `partition_field_name` is greater than
1000, you are advised that there might be high memory usage.

Likewise, if you are performing population analysis and the cardinality of the
`over_field_name` is below 10, you are advised that this might not be a suitable
field to use. For more information, see
{stack-ov}/ml-configuring-pop.html[Performing population analysis].

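
The two cardinality rules above, with the same thresholds (above 1000 for split fields, below 10 for a population's over field), can be sketched as follows. The function name and message texts are illustrative, not {kib}'s actual wording.

```python
from typing import Optional

SPLIT_FIELDS = ("by_field_name", "over_field_name", "partition_field_name")

def cardinality_advice(field_role: str, cardinality: int) -> Optional[str]:
    """Mirror the advice described above for a single split field."""
    if field_role in SPLIT_FIELDS and cardinality > 1000:
        return "high cardinality: the job might use a lot of memory"
    if field_role == "over_field_name" and cardinality < 10:
        return "low cardinality: this might not suit population analysis"
    return None

print(cardinality_advice("partition_field_name", 5000))
print(cardinality_advice("over_field_name", 3))
```

A `partition_field_name` with 5000 distinct values gets the memory warning; an `over_field_name` with only 3 gets the population warning; anything in between gets no advice.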
[[detectors]]
==== Detectors

Each {anomaly-job} must have one or more _detectors_. A detector applies an
analytical function to specific fields in your data. If your job does not
contain a detector or the detector does not contain a
{stack-ov}/ml-functions.html[valid function], you receive an error.

If a job contains duplicate detectors, you also receive an error. Detectors are
duplicates if they have the same `function`, `field_name`, `by_field_name`,
`over_field_name`, and `partition_field_name`.

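
The duplicate rule above can be expressed directly: two detectors are duplicates when all five of those properties match. A minimal sketch (not the actual validation code):

```python
# The five properties that define a detector's identity, per the rule above.
DUP_KEYS = ("function", "field_name", "by_field_name",
            "over_field_name", "partition_field_name")

def has_duplicate_detectors(detectors):
    """Return True if any two detector configs share all five key properties."""
    seen = set()
    for detector in detectors:
        # Missing properties count as None, so two detectors that both omit
        # a property still compare as equal on it.
        signature = tuple(detector.get(k) for k in DUP_KEYS)
        if signature in seen:
            return True
        seen.add(signature)
    return False

detectors = [
    {"function": "mean", "field_name": "responsetime"},
    {"function": "mean", "field_name": "responsetime"},  # duplicate
]
print(has_duplicate_detectors(detectors))  # True
```

Note that two detectors with the same `function` but different `by_field_name` values are not duplicates; all five properties must match.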
[[influencers]]
==== Influencers

When you create an {anomaly-job}, you can specify _influencers_, which are also
sometimes referred to as _key fields_. Picking an influencer is strongly
recommended for the following reasons:

* It allows you to more easily assign blame for the anomaly.
* It simplifies and aggregates the results.

The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent influencers.
Influencers can be any field in your data; they do not need to be fields that
are specified in your detectors, though they often are.

As a best practice, do not pick too many influencers. For example, you generally
do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.

The job creation wizards in {kib} can suggest which fields to use as influencers.

[[model-memory-limits]]
|
||||
==== Model memory limits
|
||||
|
||||
For each {anomaly-job}, you can optionally specify a `model_memory_limit`, which
|
||||
is the approximate maximum amount of memory resources that are required for
|
||||
analytical processing. The default value is 1 GB. Once this limit is approached,
|
||||
data pruning becomes more aggressive. Upon exceeding this limit, new entities
|
||||
are not modeled.
|
||||
|
||||
You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
|
||||
By default, it's not set, which means there is no upper bound on the acceptable
|
||||
`model_memory_limit` values in your jobs.
|
||||
|
||||
TIP: If you set the `model_memory_limit` too high, it will be impossible to open
|
||||
the job; jobs cannot be allocated to nodes that have insufficient memory to run
|
||||
them.
|
||||
|
||||
If the estimated model memory limit for an {anomaly-job} is greater than the
|
||||
model memory limit for the job or the maximum model memory limit for the cluster,
|
||||
the job creation wizards in {kib} generate a warning. If the estimated memory
|
||||
requirement is only a little higher than the `model_memory_limit`, the job will
|
||||
probably produce useful results. Otherwise, the actions you take to address
|
||||
these warnings vary depending on the resources available in your cluster:
|
||||
|
||||
* If you are using the default value for the `model_memory_limit` and the {ml}
|
||||
nodes in the cluster have lots of memory, the best course of action might be to
|
||||
simply increase the job's `model_memory_limit`. Before doing this, however,
|
||||
double-check that the chosen analysis makes sense. The default
|
||||
`model_memory_limit` is relatively low to avoid accidentally creating a job that
|
||||
uses a huge amount of memory.
|
||||
* If the {ml} nodes in the cluster do not have sufficient memory to accommodate
|
||||
a job of the estimated size, the only options are:
|
||||
** Add bigger {ml} nodes to the cluster, or
|
||||
** Accept that the job will hit its memory limit and will not necessarily find
|
||||
all the anomalies it could otherwise find.
|
||||
|
||||
If you are using {ece} or the hosted Elasticsearch Service on Elastic Cloud,
|
||||
`xpack.ml.max_model_memory_limit` is set to prevent you from creating jobs
|
||||
that cannot be allocated to any {ml} nodes in the cluster. If you find that you
|
||||
cannot increase `model_memory_limit` for your {ml} jobs, the solution is to
|
||||
increase the size of the {ml} nodes in your cluster.
|
||||
|
||||
For more information about the `model_memory_limit` property and the
|
||||
`xpack.ml.max_model_memory_limit` setting, see
|
||||
{ref}/ml-job-resource.html#ml-analysisconfig[Analysis limits] and
|
||||
{ref}/ml-settings.html[Machine learning settings].
|
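
The warning logic described above reduces to two comparisons: the estimate against the job's `model_memory_limit`, and the estimate against the cluster-wide cap, if one is set. A hedged sketch (values in MB; the function name and message texts are illustrative, not {kib}'s actual code):

```python
def memory_limit_warnings(estimated_mb, job_limit_mb=1024, cluster_max_mb=None):
    """Return the warnings the job creation wizards would raise.

    Defaults mirror the documented behavior: model_memory_limit is 1 GB,
    and xpack.ml.max_model_memory_limit is unset (no upper bound).
    """
    warnings = []
    if estimated_mb > job_limit_mb:
        warnings.append("estimate exceeds the job's model_memory_limit")
    if cluster_max_mb is not None and estimated_mb > cluster_max_mb:
        warnings.append("estimate exceeds xpack.ml.max_model_memory_limit")
    return warnings

# A 2 GB estimate against the default 1 GB job limit triggers a warning.
print(memory_limit_warnings(2048))
# A 512 MB estimate fits under both a 1 GB job limit and a 4 GB cluster cap.
print(memory_limit_warnings(512, job_limit_mb=1024, cluster_max_mb=4096))
```

As the text notes, a warning from the first comparison may be tolerable if the estimate is only slightly above the job limit; a warning from the second means the job cannot be allocated without raising the cap or resizing the {ml} nodes.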