elasticsearch/docs/reference/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md
Colleen McGinnis 9bcd59596d
[docs] Prepare for docs-assembler (#125118)
* reorg files for docs-assembler and create toc.yml files

* fix build error, add redirects

* only toc

* move images
2025-03-20 12:09:12 -05:00

---
navigation_title: "{{infer-cap}} bucket"
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-inference-bucket-aggregation.html
---
# {{infer-cap}} bucket aggregation [search-aggregations-pipeline-inference-bucket-aggregation]
A parent pipeline aggregation which loads a pre-trained model and performs {{infer}} on the collated result fields from the parent bucket aggregation.
To use the {{infer}} bucket aggregation, you need to have the same security privileges that are required for using the [get trained models API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-get-trained-models).
## Syntax [inference-bucket-agg-syntax]
An `inference` aggregation looks like this in isolation:
```js
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
```
1. The unique identifier or alias for the trained model.
2. The optional inference config, which overrides the model's default settings.
3. Maps the value of `avg_agg` to the model's input field `avg_cost`.
$$$inference-bucket-params$$$
| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `model_id` | The ID or alias for the trained model. | Required | - |
| `inference_config` | Contains the inference type and its options. There are two types: [`regression`](#inference-agg-regression-opt) and [`classification`](#inference-agg-classification-opt). | Optional | - |
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See [`buckets_path` syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details. | Required | - |
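
The parameter table above can be sketched as a small request-body builder. This is a minimal illustration in Python, not part of any Elasticsearch client: the function name `build_inference_agg` is hypothetical, and it only mirrors the required/optional rules from the table.

```python
def build_inference_agg(model_id, buckets_path, inference_config=None):
    """Build the body of an `inference` pipeline aggregation as a plain dict.

    Hypothetical helper: it mirrors the parameter table, where `model_id`
    and `buckets_path` are required and `inference_config` is optional.
    """
    if not model_id:
        raise ValueError("model_id is required")
    if not buckets_path:
        raise ValueError("buckets_path is required")

    body = {"model_id": model_id, "buckets_path": dict(buckets_path)}
    # Only include the override when the caller supplies one, so the
    # model's own defaults apply otherwise.
    if inference_config is not None:
        body["inference_config"] = inference_config
    return {"inference": body}


agg = build_inference_agg(
    model_id="a_model_for_inference",
    buckets_path={"avg_cost": "avg_agg", "max_cost": "max_agg"},
)
```

The resulting dict can be serialized with `json.dumps` and placed under an aggregation name inside the parent bucket aggregation's `aggs` object.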
## Configuration options for {{infer}} models [_configuration_options_for_infer_models]
The `inference_config` setting is optional and usually isn't required, because pre-trained models come with sensible defaults. In the context of aggregations, some options can be overridden for each of the two model types.
#### Configuration options for {{regression}} models [inference-agg-regression-opt]
`num_top_feature_importance_values`
: (Optional, integer) Specifies the maximum number of [{{feat-imp}}](docs-content://explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance.md) values per document. Defaults to 0, which means no {{feat-imp}} calculation occurs.
#### Configuration options for {{classification}} models [inference-agg-classification-opt]
`num_top_classes`
: (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
`num_top_feature_importance_values`
: (Optional, integer) Specifies the maximum number of [{{feat-imp}}](docs-content://explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance.md) values per document. Defaults to 0, which means no {{feat-imp}} calculation occurs.
`prediction_field_type`
: (Optional, string) Specifies the type of the predicted field to write. Valid values are `string`, `number`, and `boolean`. When `boolean` is provided, `1.0` is transformed to `true` and `0.0` to `false`.
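
The classification options above can be collected and checked before a request is sent. The following Python sketch is an assumption-laden illustration: the helper name `classification_options` is hypothetical, the validation happens client-side only, and the returned dict holds just the option values (how it is nested inside `inference_config` is left to the caller).

```python
# Valid values for `prediction_field_type`, as listed in the option description.
VALID_PREDICTION_FIELD_TYPES = {"string", "number", "boolean"}


def classification_options(
    num_top_classes=0,
    num_top_feature_importance_values=0,
    prediction_field_type=None,
):
    """Return classification options as a dict (hypothetical helper)."""
    for name, value in (
        ("num_top_classes", num_top_classes),
        ("num_top_feature_importance_values", num_top_feature_importance_values),
    ):
        # Both options are documented as integers; bool is rejected explicitly
        # because Python treats it as a subclass of int.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")

    options = {
        "num_top_classes": num_top_classes,
        "num_top_feature_importance_values": num_top_feature_importance_values,
    }
    if prediction_field_type is not None:
        if prediction_field_type not in VALID_PREDICTION_FIELD_TYPES:
            raise ValueError(
                "prediction_field_type must be one of "
                f"{sorted(VALID_PREDICTION_FIELD_TYPES)}"
            )
        options["prediction_field_type"] = prediction_field_type
    return options


opts = classification_options(num_top_classes=2, prediction_field_type="boolean")
```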
## Example [inference-bucket-agg-example]
The following snippet aggregates a web log by `client_ip` and extracts a number of features via metric and bucket sub-aggregations as input to the {{infer}} aggregation configured with a model trained to identify suspicious client IPs:
```console
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
```
1. A composite bucket aggregation that groups the data by client IP, using the `clientip` field.
2. A series of metric and bucket sub-aggregations.
3. {{infer-cap}} bucket aggregation that specifies the trained model and maps the aggregation names to the model's input fields.
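
The `buckets_path` in the example follows a simple pattern: metric aggregations are referenced by name, while `filter` aggregations contribute their document count through the `._count` suffix. A minimal Python sketch of that pattern, with the hypothetical helper name `build_buckets_path`, could look like this:

```python
def build_buckets_path(metric_aggs, filter_aggs):
    """Map model input fields to aggregation paths (hypothetical helper).

    metric_aggs: model field name -> name of a single-value metric aggregation.
    filter_aggs: model field name -> name of a filter aggregation whose
                 document count is used as the input value.
    """
    paths = dict(metric_aggs)
    for field, agg_name in filter_aggs.items():
        # A filter bucket has no metric value of its own, so the example
        # above points at its document count via `._count`.
        paths[field] = f"{agg_name}._count"
    return paths


buckets_path = build_buckets_path(
    metric_aggs={
        "response_count": "responses_total",
        "url_dc": "url_dc",
        "bytes_sum": "bytes_sum",
        "geo_src_dc": "geo_src_dc",
        "geo_dest_dc": "geo_dest_dc",
    },
    filter_aggs={
        "success": "success",
        "error404": "error404",
        "error503": "error503",
    },
)
```

Keeping the two kinds of input separate makes it harder to forget the `._count` suffix when another filter-based feature is added.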