[role="xpack"]
|
|
[testenv="platinum"]
|
|
[[ml-post-data]]
|
|
= Post data to jobs API
|
|
++++
|
|
<titleabbrev>Post data to jobs</titleabbrev>
|
|
++++
|
|
|
|
deprecated::[7.11.0, "Posting data directly to anomaly detection jobs is deprecated, in a future major version a <<ml-api-datafeed-endpoint,{dfeed}>> will be required."]
|
|
|
|
Sends data to an anomaly detection job for analysis.
|
|
|
|
[[ml-post-data-request]]
|
|
== {api-request-title}
|
|
|
|
`POST _ml/anomaly_detectors/<job_id>/_data`

[[ml-post-data-prereqs]]
== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have `manage_ml` or
`manage` cluster privileges to use this API. See
<<security-privileges>> and {ml-docs-setup-privileges}.

[[ml-post-data-desc]]
== {api-description-title}

The job must have a state of `open` to receive and process the data.

The data that you send to the job must use the JSON format. Multiple JSON
documents can be sent, either adjacent with no separator in between them or
whitespace separated. Newline delimited JSON (NDJSON) is a possible whitespace
separated format, and for this the `Content-Type` header should be set to
`application/x-ndjson`.
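
For example, assuming a newline-delimited file named `it_ops_new_kpi.ndjson`
(the file name and field names below are purely illustrative), the data could
be posted like this:

[source,js]
--------------------------------------------------
$ cat it_ops_new_kpi.ndjson
{"timestamp":1454020569000,"events_per_min":100}
{"timestamp":1454020629000,"events_per_min":95}

$ curl -s -H "Content-Type: application/x-ndjson" \
  -X POST http://localhost:9200/_ml/anomaly_detectors/it_ops_new_kpi/_data \
  --data-binary @it_ops_new_kpi.ndjson
--------------------------------------------------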

Upload sizes are limited to the {es} HTTP receive buffer size
(default 100 MB). If your data is larger, split it into multiple chunks
and upload each one separately in sequential time order. When running in
real time, it is generally recommended that you perform many small uploads
rather than queueing data to upload larger files.
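
A minimal sketch of that chunked approach, assuming a large, already
time-sorted NDJSON file and the illustrative job name used elsewhere on this
page, might look like the following (`split` names the chunks in lexical
order, so the loop preserves the original time order):

[source,js]
--------------------------------------------------
$ split -l 100000 it_ops_new_kpi.ndjson chunk_
$ for f in chunk_*; do
    curl -s -H "Content-Type: application/x-ndjson" \
      -X POST http://localhost:9200/_ml/anomaly_detectors/it_ops_new_kpi/_data \
      --data-binary @"$f"
  done
--------------------------------------------------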

When uploading data, check the job data counts for progress.
The following documents will not be processed:

* Documents not in chronological order and outside the latency window
* Records with an invalid timestamp
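
The data counts are also available at any time from the job statistics. For
example, a request along these lines (using the same illustrative job name)
returns a `data_counts` object with fields such as `processed_record_count`,
`out_of_order_timestamp_count`, and `invalid_date_count`:

[source,js]
--------------------------------------------------
$ curl -s -X GET http://localhost:9200/_ml/anomaly_detectors/it_ops_new_kpi/_stats
--------------------------------------------------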

IMPORTANT: For each job, data can only be accepted from a single connection at
a time. It is not currently possible to post data to multiple jobs using wildcards
or a comma-separated list.

[[ml-post-data-path-parms]]
== {api-path-parms-title}

`<job_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]

[[ml-post-data-query-parms]]
== {api-query-parms-title}

`reset_start`::
(Optional, string) Specifies the start of the bucket resetting range.

`reset_end`::
(Optional, string) Specifies the end of the bucket resetting range.
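
For example, a request of the following shape could be used to re-send data
for a time range whose buckets should be reset before re-analysis (the job
name, file name, and epoch-millisecond timestamp values are illustrative):

[source,js]
--------------------------------------------------
$ curl -s -H "Content-type: application/json" \
  -X POST "http://localhost:9200/_ml/anomaly_detectors/it_ops_new_kpi/_data?reset_start=1454020569000&reset_end=1454021000000" \
  --data-binary @it_ops_new_kpi.json
--------------------------------------------------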

[[ml-post-data-request-body]]
== {api-request-body-title}

A sequence of one or more JSON documents containing the data to be analyzed.
Only whitespace characters are permitted in between the documents.

[[ml-post-data-example]]
== {api-examples-title}

The following example posts data from the `it_ops_new_kpi.json` file to the
`it_ops_new_kpi` job:

[source,js]
--------------------------------------------------
$ curl -s -H "Content-type: application/json" \
  -X POST http://localhost:9200/_ml/anomaly_detectors/it_ops_new_kpi/_data \
  --data-binary @it_ops_new_kpi.json
--------------------------------------------------

When the data is sent, you receive information about the operational progress of
the job. For example:

[source,js]
----
{
  "job_id":"it_ops_new_kpi",
  "processed_record_count":21435,
  "processed_field_count":64305,
  "input_bytes":2589063,
  "input_field_count":85740,
  "invalid_date_count":0,
  "missing_field_count":0,
  "out_of_order_timestamp_count":0,
  "empty_bucket_count":16,
  "sparse_bucket_count":0,
  "bucket_count":2165,
  "earliest_record_timestamp":1454020569000,
  "latest_record_timestamp":1455318669000,
  "last_data_time":1491952300658,
  "latest_empty_bucket_timestamp":1454541600000,
  "input_record_count":21435
}
----

For more information about these properties, see <<ml-get-job-stats-results>>.