## Summary

Part of #187684.

So far the popover to filter fields was only available when grouping was enabled. This PR updates the behavior so it's available all the time and can be used to exclude field candidates from the analysis. If we detect the index to be based on an ECS schema, we auto-select a set of predefined fields.

Changes in this PR:

- Creates a new route `/internal/aiops/log_rate_analysis/field_candidates` to be able to fetch field candidates independently of the main streaming API call.
- Fixes the code that considers "remaining" field candidates to also consider text field candidates. This was originally developed to allow continuing an analysis that errored for some reason. We use that option to also pass on the custom field list from the field selection popover.
- Fetching the field candidates is done in a new redux slice `logRateAnalysisFieldCandidatesSlice` using an async thunk.
- Filters the list of field candidates by a predefined list of allowed fields when an ECS schema gets detected.
- Renames `fieldCandidates` to `keywordFieldCandidates` for a clearer distinction from `textFieldCandidates`.
- Refactors the `getLogRateAnalysisTypeForCounts` args to a config object.
- Bumps the API version for the full log rate analysis to version 3. We missed bumping the version in https://github.com/elastic/kibana/pull/188648. This update manages proper versioning between v2 and v3, and the API integration tests cover both versions.
[aiops-log-rate-analysis-fields-filter-0001.webm](https://github.com/user-attachments/assets/e3ed8d5b-f01c-42ef-8033-caa7135b8cc0)

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
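The ECS auto-selection described above can be illustrated with a minimal sketch. The function name and the allow-list values below are purely illustrative placeholders, not the actual Kibana field list or API:

```typescript
// Hypothetical sketch of filtering field candidates down to an
// allow-list when the index is detected to be ECS-based.
// ECS_ALLOWED_FIELDS here is an invented placeholder list, not the
// predefined list used by the plugin.
const ECS_ALLOWED_FIELDS = ['event.action', 'host.name', 'service.name', 'user.name'];

function filterFieldCandidates(
  candidates: string[],
  isECS: boolean,
  allowed: string[] = ECS_ALLOWED_FIELDS
): string[] {
  // Non-ECS indices keep all field candidates.
  if (!isECS) return candidates;
  // ECS indices pre-select only fields from the allow-list.
  const allowedSet = new Set(allowed);
  return candidates.filter((field) => allowedSet.has(field));
}
```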
# aiops

The plugin provides APIs and components for AIOps features, including the "Log rate analysis" UI, maintained by the ML team.

## Log Rate Analysis

Here are some notes on the structure of the code for the API endpoint `/internal/aiops/log_rate_analysis`. The endpoint uses the `@kbn/ml-response-stream` package to return the request's response as an HTTP stream of JSON objects. The files are located in `x-pack/plugins/aiops/server/routes/log_rate_analysis/`.
`define_route.ts:defineRoute()` is the outermost wrapper that's used to define the route and its versions. It calls `route_handler_factory:routeHandlerFactory()` for each version.

The route handler sets up `response_stream_factory:responseStreamFactory()` to create the response stream and then walks through the steps of the analysis.

The response stream factory acts as a wrapper to set up the stream itself, the stream state (for example to flag if it's running), some custom actions on the stream, as well as analysis handlers that fetch data from ES and pass it on to the stream.
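The streaming idea can be illustrated with a dependency-free sketch. The real implementation lives in `@kbn/ml-response-stream` and has a different API; the class and action names below are invented for illustration only:

```typescript
// Illustrative stand-in for a response stream: actions are pushed and
// serialized as newline-delimited JSON (NDJSON), one object per line.
// This is NOT the @kbn/ml-response-stream API, just the core idea.
type StreamAction =
  | { type: 'update_loading_state'; payload: { loaded: number } }
  | { type: 'add_significant_items'; payload: string[] };

class NdjsonStreamSketch {
  private chunks: string[] = [];

  // Each pushed action becomes one NDJSON line on the wire.
  push(action: StreamAction): void {
    this.chunks.push(JSON.stringify(action) + '\n');
  }

  // In a real route handler this would end the HTTP response stream;
  // here we just return the accumulated payload.
  end(): string {
    return this.chunks.join('');
  }
}
```

A client consuming such a stream can parse each line as it arrives and dispatch it like a redux action, which is what makes incremental UI updates during a long-running analysis possible.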
### Analysis details

Here are some more details on the steps involved to do Log Rate Analysis:
- Index info: This gathers information from the selected index to identify which type of analysis will be run and which fields will be used for analysis.
  - Zero docs fallback: If there are no docs in either `baseline` or `deviation`, the analysis will not identify statistically significant items but will just run regular `terms` aggregations and return the top items for the deviation time range.
  - Field identification: This runs field caps with the `include_empty_fields=false` option to get populated fields. Custom Kibana code then identifies `keyword/ip/boolean` and `text/match_only_text` fields suitable for analysis. When there's a field with both `keyword`/`text` mappings, the `keyword` one will be preferred unless there's an override defined (currently `message` and `error.message`).
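The keyword-over-text preference from the field identification step can be sketched as follows. The function name is illustrative; the override list (`message`, `error.message`) is taken from the text above:

```typescript
// Sketch of the mapping preference rule: fields mapped as both
// keyword and text default to keyword, unless the field is on the
// override list. pickMapping is an illustrative name, not Kibana code.
const TEXT_FIELD_OVERRIDES = ['message', 'error.message'];

function pickMapping(fieldName: string, mappings: string[]): 'keyword' | 'text' {
  const hasKeyword = mappings.includes('keyword');
  const hasText = mappings.includes('text');
  if (hasKeyword && hasText) {
    // Dual-mapped fields prefer keyword, except for the overrides.
    return TEXT_FIELD_OVERRIDES.includes(fieldName) ? 'text' : 'keyword';
  }
  return hasText ? 'text' : 'keyword';
}
```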
- Statistically significant items:
  - General notes: Both aggregatable fields and log pattern queries will be wrapped in `random_sampler` aggregations. The p-value threshold to define statistically significant items is `0.02`.
  - Aggregatable fields: For this we use the ES `significant_terms` aggregation with the [p-value score option](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#p-value-score). The `baseline` time range is used as the `background_filter`, the `deviation` time range is used for the query part (= foreground).
  - Log patterns: To identify statistically significant entries in text fields there is no ES equivalent to `significant_terms`, so we cannot run a single query for a field to do this. Instead, we use the following approach: We use the `categorize_text` aggregation to identify top text patterns across the baseline and deviation time range (not yet statistically significant!). Then, for each identified text pattern, we get the document counts for both baseline and deviation. We then use the retrieved counts to run them against the same Kibana code we use for the Data Drift View to detect if there's a statistically significant difference in the counts (`@kbn/ml-chi2test` package, `x-pack/packages/ml/chi2test/critical_table_lookup.ts`). Text field pattern support was added in 8.11, see #167467 for more details.
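The count-based check for log patterns can be sketched as a standard 2x2 chi-squared test. The actual implementation lives in the `@kbn/ml-chi2test` package and uses a critical value lookup table; the function below is an illustrative stand-in. The critical value `5.412` is the standard table value for p = 0.02 at one degree of freedom:

```typescript
// Sketch of a Pearson chi-squared test on a 2x2 contingency table
// (pattern vs. other docs, baseline vs. deviation), in the spirit of
// the Data Drift check described above. Illustrative only; not the
// @kbn/ml-chi2test implementation.
const CHI2_CRITICAL_P002_DF1 = 5.412;

function isSignificantDifference(
  patternBaseline: number,
  totalBaseline: number,
  patternDeviation: number,
  totalDeviation: number
): boolean {
  const observed = [
    [patternBaseline, totalBaseline - patternBaseline],
    [patternDeviation, totalDeviation - patternDeviation],
  ];
  const rowSums = observed.map((r) => r[0] + r[1]);
  const colSums = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]];
  const n = rowSums[0] + rowSums[1];

  // Pearson's statistic: sum over all cells of (observed - expected)^2 / expected.
  let chi2 = 0;
  for (let i = 0; i < 2; i++) {
    for (let j = 0; j < 2; j++) {
      const expected = (rowSums[i] * colSums[j]) / n;
      chi2 += (observed[i][j] - expected) ** 2 / expected;
    }
  }
  return chi2 > CHI2_CRITICAL_P002_DF1;
}
```

For example, a pattern jumping from 10/1000 docs in the baseline to 200/1000 docs in the deviation clears the threshold, while 100/1000 vs. 105/1000 does not.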
- Grouping: The grouping tries to identify co-occurrences of identified significant items. Again, we have to take different approaches for aggregatable fields and log patterns, but eventually we combine the results. The `frequent_item_sets` aggregation is used as a first step to get co-occurrence stats of aggregatable fields. This can be a heavy aggregation, so we limit how many values per field we pass on to the agg (`50` at the moment). For each possible aggregatable-field-to-log-pattern relation we query the doc count. The result of the `frequent_item_sets` aggregation and those doc counts then get passed on to custom code (derived from, and over time slightly improved over, the original PoC Python notebooks) to transform that raw data into groups (`x-pack/packages/ml/aiops_log_rate_analysis/queries/get_significant_item_groups.ts`).
- Histogram data: In addition to the analysis itself, the endpoint returns histogram data for the result table sparklines.