mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-26 16:17:19 -04:00
Adds an API that can be used to find out how much memory ML is permitted to use and is currently using on each node, both within the JVM heap, and natively, outside of the JVM.
310 lines
8.4 KiB
Text
310 lines
8.4 KiB
Text
[role="xpack"]
|
|
[[get-ml-memory]]
|
|
= Get machine learning memory stats API
|
|
|
|
[subs="attributes"]
|
|
++++
|
|
<titleabbrev>Get {ml} memory stats</titleabbrev>
|
|
++++
|
|
|
|
Returns information on how {ml} is using memory.
|
|
|
|
[[get-ml-memory-request]]
|
|
== {api-request-title}
|
|
|
|
`GET _ml/memory/_stats` +
|
|
`GET _ml/memory/<node_id>/_stats`
|
|
|
|
[[get-ml-memory-prereqs]]
|
|
== {api-prereq-title}
|
|
|
|
Requires the `monitor_ml` cluster privilege. This privilege is included in the
|
|
`machine_learning_user` built-in role.
|
|
|
|
[[get-ml-memory-desc]]
|
|
== {api-description-title}
|
|
|
|
Get information about how {ml} jobs and trained models are using memory, on each
|
|
node, both within the JVM heap, and natively, outside of the JVM.
|
|
|
|
[[get-ml-memory-path-params]]
|
|
== {api-path-parms-title}
|
|
|
|
`<node_id>`::
|
|
(Optional, string) The names of particular nodes in the cluster to target.
|
|
For example, `nodeId1,nodeId2` or `ml:true`. For node selection options,
|
|
see <<cluster-nodes>>.
|
|
|
|
[[get-ml-memory-query-parms]]
|
|
== {api-query-parms-title}
|
|
|
|
`human`::
|
|
Specify this query parameter to include the fields with units in the response.
|
|
Otherwise only the `_in_bytes` sizes are returned in the response.
|
|
|
|
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=timeoutparms]
|
|
|
|
[role="child_attributes"]
|
|
[[get-ml-memory-response-body]]
|
|
== {api-response-body-title}
|
|
|
|
`_nodes`::
|
|
(object)
|
|
Contains statistics about the number of nodes selected by the request.
|
|
+
|
|
.Properties of `_nodes`
|
|
[%collapsible%open]
|
|
====
|
|
`failed`::
|
|
(integer)
|
|
Number of nodes that rejected the request or failed to respond. If this value
|
|
is not `0`, a reason for the rejection or failure is included in the response.
|
|
|
|
`successful`::
|
|
(integer)
|
|
Number of nodes that responded successfully to the request.
|
|
|
|
`total`::
|
|
(integer)
|
|
Total number of nodes selected by the request.
|
|
====
|
|
|
|
`cluster_name`::
|
|
(string)
|
|
Name of the cluster. Based on the <<cluster-name,cluster.name>> setting.
|
|
|
|
`nodes`::
|
|
(object)
|
|
Contains statistics for the nodes selected by the request.
|
|
+
|
|
.Properties of `nodes`
|
|
[%collapsible%open]
|
|
====
|
|
`<node_id>`::
|
|
(object)
|
|
Contains statistics for the node.
|
|
+
|
|
.Properties of `<node_id>`
|
|
[%collapsible%open]
|
|
=====
|
|
`attributes`::
|
|
(object)
|
|
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-attributes]
|
|
|
|
`ephemeral_id`::
|
|
(string)
|
|
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-ephemeral-id]
|
|
|
|
`jvm`::
|
|
(object)
|
|
Contains Java Virtual Machine (JVM) statistics for the node.
|
|
+
|
|
.Properties of `jvm`
|
|
[%collapsible%open]
|
|
======
|
|
`heap_max`::
|
|
(<<byte-units,byte value>>)
|
|
Maximum amount of memory available for use by the heap.
|
|
|
|
`heap_max_in_bytes`::
|
|
(integer)
|
|
Maximum amount of memory, in bytes, available for use by the heap.
|
|
|
|
`java_inference`::
|
|
(<<byte-units,byte value>>)
|
|
Amount of Java heap currently being used for caching inference models.
|
|
|
|
`java_inference_in_bytes`::
|
|
(integer)
|
|
Amount of Java heap, in bytes, currently being used for caching inference models.
|
|
|
|
`java_inference_max`::
|
|
(<<byte-units,byte value>>)
|
|
Maximum amount of Java heap to be used for caching inference models.
|
|
|
|
`java_inference_max_in_bytes`::
|
|
(integer)
|
|
Maximum amount of Java heap, in bytes, to be used for caching inference models.
|
|
======
|
|
|
|
`mem`::
|
|
(object)
|
|
Contains statistics about memory usage for the node.
|
|
+
|
|
.Properties of `mem`
|
|
[%collapsible%open]
|
|
======
|
|
`adjusted_total`::
|
|
(<<byte-units,byte value>>)
|
|
If the amount of physical memory has been overridden using the `es.total_memory_bytes`
|
|
system property then this reports the overridden value. Otherwise it reports the same
|
|
value as `total`.
|
|
|
|
`adjusted_total_in_bytes`::
|
|
(integer)
|
|
If the amount of physical memory has been overridden using the `es.total_memory_bytes`
|
|
system property then this reports the overridden value in bytes. Otherwise it reports
|
|
the same value as `total_in_bytes`.
|
|
|
|
`ml`::
|
|
(object)
|
|
Contains statistics about {ml} use of native memory on the node.
|
|
+
|
|
.Properties of `ml`
|
|
[%collapsible%open]
|
|
=======
|
|
`anomaly_detectors`::
|
|
(<<byte-units,byte value>>)
|
|
Amount of native memory set aside for {anomaly-jobs}.
|
|
|
|
`anomaly_detectors_in_bytes`::
|
|
(integer)
|
|
Amount of native memory, in bytes, set aside for {anomaly-jobs}.
|
|
|
|
`data_frame_analytics`::
|
|
(<<byte-units,byte value>>)
|
|
Amount of native memory set aside for {dfanalytics-jobs}.
|
|
|
|
`data_frame_analytics_in_bytes`::
|
|
(integer)
|
|
Amount of native memory, in bytes, set aside for {dfanalytics-jobs}.
|
|
|
|
`max`::
|
|
(<<byte-units,byte value>>)
|
|
Maximum amount of native memory (separate to the JVM heap) that may be used by {ml}
|
|
native processes.
|
|
|
|
`max_in_bytes`::
|
|
(integer)
|
|
Maximum amount of native memory (separate to the JVM heap), in bytes, that may be
|
|
used by {ml} native processes.
|
|
|
|
`native_code_overhead`::
|
|
(<<byte-units,byte value>>)
|
|
Amount of native memory set aside for loading {ml} native code shared libraries.
|
|
|
|
`native_code_overhead_in_bytes`::
|
|
(integer)
|
|
Amount of native memory, in bytes, set aside for loading {ml} native code shared libraries.
|
|
|
|
`native_inference`::
|
|
(<<byte-units,byte value>>)
|
|
Amount of native memory set aside for trained models that have a PyTorch `model_type`.
|
|
|
|
`native_inference_in_bytes`::
|
|
(integer)
|
|
Amount of native memory, in bytes, set aside for trained models that have a PyTorch `model_type`.
|
|
=======
|
|
|
|
`total`::
|
|
(<<byte-units,byte value>>)
|
|
Total amount of physical memory.
|
|
|
|
`total_in_bytes`::
|
|
(integer)
|
|
Total amount of physical memory in bytes.
|
|
|
|
======
|
|
|
|
`name`::
|
|
(string)
|
|
Human-readable identifier for the node. Based on the <<node-name>> setting.
|
|
|
|
`roles`::
|
|
(array of strings)
|
|
Roles assigned to the node. See <<modules-node>>.
|
|
|
|
`transport_address`::
|
|
(string)
|
|
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-transport-address]
|
|
|
|
=====
|
|
====
|
|
|
|
[[get-ml-memory-example]]
|
|
== {api-examples-title}
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET _ml/memory/_stats?human
|
|
--------------------------------------------------
|
|
// TEST[setup:node]
|
|
|
|
This is a possible response:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"_nodes": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"failed": 0
|
|
},
|
|
"cluster_name": "my_cluster",
|
|
"nodes": {
|
|
"pQHNt5rXTTWNvUgOrdynKg": {
|
|
"name": "node-0",
|
|
"ephemeral_id": "ITZ6WGZnSqqeT_unfit2SQ",
|
|
"transport_address": "127.0.0.1:9300",
|
|
"attributes": {
|
|
"ml.machine_memory": "68719476736",
|
|
"ml.max_jvm_size": "536870912"
|
|
},
|
|
"roles": [
|
|
"data",
|
|
"data_cold",
|
|
"data_content",
|
|
"data_frozen",
|
|
"data_hot",
|
|
"data_warm",
|
|
"ingest",
|
|
"master",
|
|
"ml",
|
|
"remote_cluster_client",
|
|
"transform"
|
|
],
|
|
"mem": {
|
|
"total": "64gb",
|
|
"total_in_bytes": 68719476736,
|
|
"adjusted_total": "64gb",
|
|
"adjusted_total_in_bytes": 68719476736,
|
|
"ml": {
|
|
"max": "19.1gb",
|
|
"max_in_bytes": 20615843020,
|
|
"native_code_overhead": "0b",
|
|
"native_code_overhead_in_bytes": 0,
|
|
"anomaly_detectors": "0b",
|
|
"anomaly_detectors_in_bytes": 0,
|
|
"data_frame_analytics": "0b",
|
|
"data_frame_analytics_in_bytes": 0,
|
|
"native_inference": "0b",
|
|
"native_inference_in_bytes": 0
|
|
}
|
|
},
|
|
"jvm": {
|
|
"heap_max": "512mb",
|
|
"heap_max_in_bytes": 536870912,
|
|
"java_inference_max": "204.7mb",
|
|
"java_inference_max_in_bytes": 214748364,
|
|
"java_inference": "0b",
|
|
"java_inference_in_bytes": 0
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE[s/"cluster_name": "my_cluster"/"cluster_name": $body.cluster_name/]
|
|
// TESTRESPONSE[s/"pQHNt5rXTTWNvUgOrdynKg"/\$node_name/]
|
|
// TESTRESPONSE[s/"ephemeral_id": "ITZ6WGZnSqqeT_unfit2SQ"/"ephemeral_id": "$body.$_path"/]
|
|
// TESTRESPONSE[s/"transport_address": "127.0.0.1:9300"/"transport_address": "$body.$_path"/]
|
|
// TESTRESPONSE[s/"attributes": \{[^\}]*\}/"attributes": $body.$_path/]
|
|
// TESTRESPONSE[s/"total": "64gb"/"total": "$body.$_path"/]
|
|
// TESTRESPONSE[s/"total_in_bytes": 68719476736/"total_in_bytes": $body.$_path/]
|
|
// TESTRESPONSE[s/"adjusted_total": "64gb"/"adjusted_total": "$body.$_path"/]
|
|
// TESTRESPONSE[s/"adjusted_total_in_bytes": 68719476736/"adjusted_total_in_bytes": $body.$_path/]
|
|
// TESTRESPONSE[s/"max": "19.1gb"/"max": "$body.$_path"/]
|
|
// TESTRESPONSE[s/"max_in_bytes": 20615843020/"max_in_bytes": $body.$_path/]
|
|
// TESTRESPONSE[s/"heap_max": "512mb"/"heap_max": "$body.$_path"/]
|
|
// TESTRESPONSE[s/"heap_max_in_bytes": 536870912/"heap_max_in_bytes": $body.$_path/]
|
|
// TESTRESPONSE[s/"java_inference_max": "204.7mb"/"java_inference_max": "$body.$_path"/]
|
|
// TESTRESPONSE[s/"java_inference_max_in_bytes": 214748364/"java_inference_max_in_bytes": $body.$_path/]
|