elasticsearch/docs/reference/settings/inference-settings.asciidoc
David Kyle 3c1c8d0f32
[ML] Increase response size limit for batched requests (#110112)
Increase the default to 50MB and do not retry when the limit is exceeded
2024-06-26 10:31:06 +01:00

92 lines
4.4 KiB
Text

[role="xpack"]
[[inference-settings]]
=== Inference API settings in {es}
++++
<titleabbrev>Inference settings</titleabbrev>
++++
[[inference-settings-description]]
// tag::inference-settings-description-tag[]
You do not need to configure any settings to use the {infer} APIs. Each setting has a default.
// end::inference-settings-description-tag[]
[discrete]
[[xpack-inference-logging]]
// tag::inference-logging[]
==== Inference API logging settings
When certain failures occur, a log message is emitted. In the case of a
reoccurring failure the logging throttler restricts repeated messages from being logged.
`xpack.inference.logging.reset_interval`::
(<<cluster-update-settings,Dynamic>>) Specifies the interval for when a cleanup thread will clear an internal
cache of the previously logged messages. Defaults to one day (`1d`).
`xpack.inference.logging.wait_duration`::
(<<cluster-update-settings,Dynamic>>) Specifies the amount of time to wait after logging a message before that
message can be logged again. Defaults to one hour (`1h`).
// end::inference-logging[]
[[xpack-inference-http-settings]]
// tag::inference-http-settings[]
==== {infer-cap} API HTTP settings
`xpack.inference.http.max_response_size`::
(<<cluster-update-settings,Dynamic>>) Specifies the maximum size in bytes an HTTP response is allowed to have,
defaults to `50mb`, the maximum configurable value is `100mb`.
`xpack.inference.http.max_total_connections`::
(<<cluster-update-settings,Dynamic>>) Specifies the maximum number of connections the internal connection pool can
lease. Defaults to `50`.
`xpack.inference.http.max_route_connections`::
(<<cluster-update-settings,Dynamic>>) Specifies the maximum number of connections a single route can lease from
the internal connection pool. If this setting is set to a value equal to or greater than
`xpack.inference.http.max_total_connections`, then a single third party service could lease all available
connections and other third party services would be unable to lease connections. Defaults to `20`.
`xpack.inference.http.connection_eviction_interval`::
(<<cluster-update-settings,Dynamic>>) Specifies the interval that an eviction thread will run to remove expired and
stale connections from the internal connection pool. Decreasing this time value can help improve throughput if
multiple third party service are contending for the available connections in the pool. Defaults to one minute (`1m`).
`xpack.inference.http.connection_eviction_max_idle_time`::
(<<cluster-update-settings,Dynamic>>) Specifies the maximum duration a connection can be unused before it is marked as
idle and can be closed and removed from the shared connection pool. Defaults to one minute (`1m`).
`xpack.inference.http.request_executor.queue_capacity`::
(<<cluster-update-settings,Dynamic>>) Specifies the size of the internal queue for requests waiting to be sent. If
the queue is full and a request is sent to the {infer} API, it will be rejected. Defaults to `2000`.
[[xpack-inference-http-retry-settings]]
==== {infer-cap} API HTTP Retry settings
When a third-party service returns a transient failure code (for example, 429), the request is retried by the {infer}
API. These settings govern the retry behavior. When a request is retried, exponential backoff is used.
`xpack.inference.http.retry.initial_delay`::
(<<cluster-update-settings,Dynamic>>) Specifies the initial delay before retrying a request. Defaults to one second
(`1s`).
`xpack.inference.http.retry.max_delay_bound`::
(<<cluster-update-settings,Dynamic>>) Specifies the maximum delay for a request. Defaults to five seconds (`5s`).
`xpack.inference.http.retry.timeout`::
(<<cluster-update-settings,Dynamic>>) Specifies the maximum amount of time a request can be retried.
Once the request exceeds this time, the request will no longer be retried and a failure will be returned.
Defaults to 30 seconds (`30s`).
// end::inference-logging[]
[[xpack-inference-input-text]]
// tag::inference-input-text[]
==== {infer-cap} API Input text
For certain third-party service integrations, when the service returns an error indicating that the request
input was too large, the input will be truncated and the request is retried. These settings govern
how the truncation is performed.
`xpack.inference.truncator.reduction_percentage`::
(<<cluster-update-settings,Dynamic>>) Specifies the percentage to reduce the input text by if the 3rd party service
responds with an error indicating it is too long. Defaults to 50 percent (`0.5`).
// end::inference-input-text[]