mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-29 01:44:36 -04:00
92 lines
4.4 KiB
Text
92 lines
4.4 KiB
Text
|
|
[role="xpack"]
|
|
[[inference-settings]]
|
|
=== Inference API settings in {es}
|
|
++++
|
|
<titleabbrev>Inference settings</titleabbrev>
|
|
++++
|
|
|
|
[[inference-settings-description]]
|
|
// tag::inference-settings-description-tag[]
|
|
You do not need to configure any settings to use the {infer} APIs. Each setting has a default.
|
|
// end::inference-settings-description-tag[]
|
|
|
|
[discrete]
|
|
[[xpack-inference-logging]]
|
|
// tag::inference-logging[]
|
|
==== Inference API logging settings
|
|
|
|
When certain failures occur, a log message is emitted. In the case of a
|
|
reoccurring failure the logging throttler restricts repeated messages from being logged.
|
|
|
|
`xpack.inference.logging.reset_interval`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the interval for when a cleanup thread will clear an internal
|
|
cache of the previously logged messages. Defaults to one day (`1d`).
|
|
|
|
`xpack.inference.logging.wait_duration`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the amount of time to wait after logging a message before that
|
|
message can be logged again. Defaults to one hour (`1h`).
|
|
// end::inference-logging[]
|
|
|
|
[[xpack-inference-http-settings]]
|
|
// tag::inference-http-settings[]
|
|
==== {infer-cap} API HTTP settings
|
|
|
|
`xpack.inference.http.max_response_size`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the maximum size in bytes an HTTP response is allowed to have,
|
|
defaults to `50mb`, the maximum configurable value is `100mb`.
|
|
|
|
`xpack.inference.http.max_total_connections`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the maximum number of connections the internal connection pool can
|
|
lease. Defaults to `50`.
|
|
|
|
`xpack.inference.http.max_route_connections`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the maximum number of connections a single route can lease from
|
|
the internal connection pool. If this setting is set to a value equal to or greater than
|
|
`xpack.inference.http.max_total_connections`, then a single third party service could lease all available
|
|
connections and other third party services would be unable to lease connections. Defaults to `20`.
|
|
|
|
`xpack.inference.http.connection_eviction_interval`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the interval that an eviction thread will run to remove expired and
|
|
stale connections from the internal connection pool. Decreasing this time value can help improve throughput if
|
|
multiple third party service are contending for the available connections in the pool. Defaults to one minute (`1m`).
|
|
|
|
`xpack.inference.http.connection_eviction_max_idle_time`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the maximum duration a connection can be unused before it is marked as
|
|
idle and can be closed and removed from the shared connection pool. Defaults to one minute (`1m`).
|
|
|
|
`xpack.inference.http.request_executor.queue_capacity`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the size of the internal queue for requests waiting to be sent. If
|
|
the queue is full and a request is sent to the {infer} API, it will be rejected. Defaults to `2000`.
|
|
|
|
[[xpack-inference-http-retry-settings]]
|
|
==== {infer-cap} API HTTP Retry settings
|
|
|
|
When a third-party service returns a transient failure code (for example, 429), the request is retried by the {infer}
|
|
API. These settings govern the retry behavior. When a request is retried, exponential backoff is used.
|
|
|
|
`xpack.inference.http.retry.initial_delay`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the initial delay before retrying a request. Defaults to one second
|
|
(`1s`).
|
|
|
|
`xpack.inference.http.retry.max_delay_bound`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the maximum delay for a request. Defaults to five seconds (`5s`).
|
|
|
|
`xpack.inference.http.retry.timeout`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the maximum amount of time a request can be retried.
|
|
Once the request exceeds this time, the request will no longer be retried and a failure will be returned.
|
|
Defaults to 30 seconds (`30s`).
|
|
// end::inference-logging[]
|
|
|
|
[[xpack-inference-input-text]]
|
|
// tag::inference-input-text[]
|
|
==== {infer-cap} API Input text
|
|
|
|
For certain third-party service integrations, when the service returns an error indicating that the request
|
|
input was too large, the input will be truncated and the request is retried. These settings govern
|
|
how the truncation is performed.
|
|
|
|
`xpack.inference.truncator.reduction_percentage`::
|
|
(<<cluster-update-settings,Dynamic>>) Specifies the percentage to reduce the input text by if the 3rd party service
|
|
responds with an error indicating it is too long. Defaults to 50 percent (`0.5`).
|
|
// end::inference-input-text[]
|