The inference node stats for deployed PyTorch inference models now contain two new fields: `inference_cache_hit_count` and `inference_cache_hit_count_last_minute`. These indicate how many inferences on that node were served from the C++-side response cache that was added in https://github.com/elastic/ml-cpp/pull/2305. Cache hits occur when exactly the same inference request is sent to the same node more than once. The `average_inference_time_ms` and `average_inference_time_ms_last_minute` fields now refer to the time taken to do the cache lookup, plus, if necessary, the time to do the inference. We would expect average inference time to be vastly reduced in situations where the cache hit rate is high.
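As a quick sketch of where the new counters surface, the per-node deployment stats returned by the get trained models statistics API include them alongside the existing timing fields. The model ID `my_pytorch_model` is hypothetical, and the response below is abridged and illustrative, not the full stats document:

```
GET _ml/trained_models/my_pytorch_model/_stats
```

```js
// Abridged, illustrative response; the values are made up.
{
  "trained_model_stats": [
    {
      "model_id": "my_pytorch_model",
      "deployment_stats": {
        "nodes": [
          {
            "inference_count": 100,
            "inference_cache_hit_count": 40,
            "inference_cache_hit_count_last_minute": 6,
            "average_inference_time_ms": 12.5,
            "average_inference_time_ms_last_minute": 2.1
          }
        ]
      }
    }
  ]
}
```

When the hit rate is high, the averages mostly measure cache lookups rather than full inferences, so `average_inference_time_ms_last_minute` can sit well below the uncached figure.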
Files in this directory:

- delete-trained-models-aliases.asciidoc
- delete-trained-models.asciidoc
- get-trained-models-stats.asciidoc
- get-trained-models.asciidoc
- index.asciidoc
- infer-trained-model-deployment.asciidoc
- infer-trained-model.asciidoc
- ml-trained-models-apis.asciidoc
- put-trained-model-definition-part.asciidoc
- put-trained-model-vocabulary.asciidoc
- put-trained-models-aliases.asciidoc
- put-trained-models.asciidoc
- start-trained-model-deployment.asciidoc
- stop-trained-model-deployment.asciidoc