elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-29 09:54:06 -04:00

History

David Roberts 15e7b06b79 [ML] Add inference cache hit count to inference node stats (#88807)	2022-07-26 17:53:43 +01:00
apis	[ML] Add inference cache hit count to inference node stats (#88807)	2022-07-26 17:53:43 +01:00

[ML] Add inference cache hit count to inference node stats (#88807)

The inference node stats for deployed PyTorch inference
models now contain two new fields: `inference_cache_hit_count`
and `inference_cache_hit_count_last_minute`.
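A minimal sketch of how a monitoring script might turn the new counter into a per-node cache hit ratio. It assumes each node's stats are available as a plain dictionary keyed by the field names above. It also assumes, hypothetically, that `inference_count` is the node's total number of inferences including cache hits. The `cache_hit_ratio` helper and the sample values are invented for illustration.

```python
from typing import Any, Mapping


def cache_hit_ratio(node_stats: Mapping[str, Any]) -> float:
    """Fraction of a node's inferences that were served from the response cache.

    Hypothetical assumption: ``inference_count`` counts every inference the
    node handled, including the ones answered from the cache.
    """
    total = node_stats.get("inference_count", 0)
    hits = node_stats.get("inference_cache_hit_count", 0)
    # Avoid dividing by zero on a node that has not served anything yet.
    return hits / total if total else 0.0


# Illustrative per-node stats; the numbers are made up.
example = {
    "inference_count": 200,
    "inference_cache_hit_count": 150,
    "inference_cache_hit_count_last_minute": 12,
}
print(cache_hit_ratio(example))
```

The same calculation could be applied to `inference_cache_hit_count_last_minute` only against a matching last-minute total, which this sketch does not assume exists.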

These indicate how many inferences on that node were served
from the C++-side response cache that was added in
https://github.com/elastic/ml-cpp/pull/2305. Cache hits
occur when exactly the same inference request is sent to the
same node more than once.
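To illustrate when a hit is counted, here is a toy single-node model of an exact-match response cache. It is hypothetical and is not the ml-cpp implementation referenced above. The `ToyResponseCache` class, the canonical-JSON cache key, and the stand-in "model" are illustrative assumptions, and the cache is unbounded with no eviction.

```python
import json
from typing import Any, Callable, Dict


class ToyResponseCache:
    """Hypothetical exact-match response cache for one node (no eviction)."""

    def __init__(self, infer: Callable[[Dict[str, Any]], Any]) -> None:
        self._infer = infer                   # the real (expensive) inference call
        self._responses: Dict[str, Any] = {}  # request key -> cached response
        self.hit_count = 0                    # analogous to inference_cache_hit_count

    @staticmethod
    def _key(request: Dict[str, Any]) -> str:
        # Canonical JSON: identical requests produce the same key
        # (sort_keys makes the key independent of dict insertion order).
        return json.dumps(request, sort_keys=True)

    def __call__(self, request: Dict[str, Any]) -> Any:
        key = self._key(request)
        if key in self._responses:
            self.hit_count += 1               # served from the cache, no inference
            return self._responses[key]
        response = self._infer(request)       # cache miss: run the inference
        self._responses[key] = response
        return response


cache = ToyResponseCache(lambda req: len(req["text"]))  # stand-in "model"
cache({"text": "hello"})
cache({"text": "hello"})  # exactly the same request again: a cache hit
cache({"text": "world"})  # different request: a miss
print(cache.hit_count)
```

Because the cache is per node, the same request sent to two different nodes would miss on each node the first time.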

The `average_inference_time_ms` and
`average_inference_time_ms_last_minute` fields now measure
the time taken to perform the cache lookup plus, on a cache
miss, the time taken to run the inference. Because a cache hit
skips the inference entirely, average inference time should
drop substantially when the cache hit rate is high.
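A simplified model of why the reported average falls as the hit rate h rises: every request pays the lookup, and only misses pay for the inference, so the average is roughly lookup time + (1 - h) x inference time. This ignores queueing and other overheads, and the timings below are hypothetical, not measurements.

```python
def expected_avg_time_ms(hit_rate: float, lookup_ms: float, inference_ms: float) -> float:
    """Simplified model of the reported average inference time.

    Every request pays the cache lookup; only the misses (fraction
    1 - hit_rate) also pay for the inference. Other overheads are ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return lookup_ms + (1.0 - hit_rate) * inference_ms


# Hypothetical timings: 0.5 ms per cache lookup, 50 ms per real inference.
for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: {expected_avg_time_ms(rate, 0.5, 50.0):.1f} ms")
```

With these made-up numbers the modelled average drops from about 50.5 ms with no hits to about 5.5 ms at a 90% hit rate.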

2022-07-26 17:53:43 +01:00
