[ML] AIOps: Updates README with more details about log rate analysis. (#180258)

## Summary

- Updates `README.md` in `plugins/aiops` to include more details about
the implementation of log rate analysis. You can view the rendered
markdown
[here](86a6297808/x-pack/plugins/aiops/README.md).
- Updates debug logging to output both `baseline` and `deviation` doc
count.

### Checklist

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Walter Rafelsberger 2024-04-10 17:53:14 +02:00 committed by GitHub
parent 1fa7a69fb6
commit 00a637c8ce
3 changed files with 16 additions and 7 deletions


@@ -14,6 +14,16 @@ The route handler sets up `response_stream_factory:responseStreamFactory()` to c
The response stream factory acts as a wrapper that sets up the stream itself, the stream state (for example, whether it's running), custom actions on the stream, as well as the analysis handlers that fetch data from ES and pass it on to the stream (see the first sketch after the list below).
## Development
See the [kibana contributing guide](https://github.com/elastic/kibana/blob/main/CONTRIBUTING.md) for instructions on setting up your development environment.
### Analysis details
Here are some more details on the steps involved in running Log Rate Analysis:
- **Index info**: This gathers information from the selected index to identify which type of analysis will be run and which fields will be used for analysis.
- **Zero Docs Fallback**: If there are no docs in either the `baseline` or the `deviation` time range, the analysis will not identify statistically significant items but will just run regular `terms` aggregations and return the top items for the deviation time range.
- **Field identification**: This runs a field caps request with the `include_empty_fields=false` option to get populated fields (see the field caps sketch below). Custom Kibana code then identifies `keyword/ip/boolean` and `text/match_only_text` fields suitable for analysis. When a field has both `keyword` and `text` mappings, the `keyword` one is preferred unless an override is defined (currently `message` and `error.message`).
- **Statistically significant items**:
- **General notes**: Both the aggregatable field and log pattern queries are wrapped in `random_sampler` aggregations (as shown in the sketches below). The p-value threshold to define statistically significant items is `0.02`.
- **Aggregatable fields**: For this we use the ES `significant_terms` aggregation with the [p-value score option](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#p-value-score). The `baseline` time range is used as the `background_filter`, and the `deviation` time range is used for the query part (the foreground); see the query sketch below.
- **Log patterns**: To identify statistically significant entries in text fields, there is no ES equivalent to `significant_terms`, so we cannot run a single query per field to do this. Instead, we use the following approach: The `categorize_text` aggregation identifies the top text patterns across the baseline and deviation time ranges (not yet statistically significant!). Then, for each identified text pattern, we get the document counts for both baseline and deviation. The retrieved counts are run through the same Kibana code we use for the Data Drift View to detect whether there's a statistically significant difference between them (the `@kbn/ml-chi2test` package, `x-pack/packages/ml/chi2test/critical_table_lookup.ts`; see the chi-squared sketch below). Text field pattern support was added in 8.11, see [#167467](https://github.com/elastic/kibana/issues/167467) for more details.
- **Grouping**: The grouping tries to identify co-occurrences of the identified significant items. Again, we have to take different approaches for aggregatable fields and log patterns, but eventually we combine the results. The `frequent_item_sets` aggregation is used as a first step to get co-occurrence stats of aggregatable fields (see the sketch below). This can be a heavy aggregation, so we limit how many values per field we pass on to the agg (`50` at the moment). For each possible aggregatable-field-to-log-pattern relation we query the doc count. The result of the `frequent_item_sets` aggregation and those doc counts are then passed on to custom code (derived from the original PoC Python notebooks and slightly improved over time) to transform that raw data into groups (`x-pack/packages/ml/aiops_log_rate_analysis/queries/get_significant_item_groups.ts`).
- **Histogram data**: In addition to the analysis itself, the endpoint returns histogram data for the result table sparklines.
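
To make the stream wrapper concept above more concrete, here's a minimal, hypothetical sketch of the pattern. Names and shapes are illustrative only and do not mirror the actual `responseStreamFactory()` implementation:

```typescript
// Hypothetical sketch of a response stream factory: names do not match Kibana's code.
import { Readable } from 'stream';

interface StreamState {
  isRunning: boolean;
}

function createResponseStream() {
  const stream = new Readable({ read() {} });
  const state: StreamState = { isRunning: false };

  return {
    stream,
    state,
    // Custom actions push newline-delimited JSON events to the consumer.
    push(action: Record<string, unknown>) {
      stream.push(`${JSON.stringify(action)}\n`);
    },
    end() {
      state.isRunning = false;
      stream.push(null); // signal end of stream
    },
  };
}
```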
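A sketch of the field identification step, assuming the official Elasticsearch JS client. The index name is a placeholder, and the `include_empty_fields` option requires a recent ES version:

```typescript
// Sketch: fetch only populated fields, then split candidates by type.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function getFieldCandidates(index: string) {
  const fieldCaps = await client.fieldCaps({
    index,
    fields: '*',
    include_empty_fields: false, // only return fields with indexed values
  });

  const fieldCandidates: string[] = [];
  const textFieldCandidates: string[] = [];

  for (const [fieldName, caps] of Object.entries(fieldCaps.fields)) {
    const types = Object.keys(caps);
    if (types.some((t) => ['keyword', 'ip', 'boolean'].includes(t))) {
      fieldCandidates.push(fieldName);
    } else if (types.some((t) => ['text', 'match_only_text'].includes(t))) {
      textFieldCandidates.push(fieldName);
    }
  }

  return { fieldCandidates, textFieldCandidates };
}
```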
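A sketch of the significant items request for one aggregatable field, using the documented `significant_terms` p-value heuristic. Index, field, timestamps, and the sampling probability are placeholders:

```typescript
// Sketch: deviation time range as foreground, baseline as background_filter,
// wrapped in a random_sampler aggregation. All concrete values are examples.
const baselineMin = '2024-04-01T00:00:00Z';
const baselineMax = '2024-04-02T00:00:00Z';
const deviationMin = '2024-04-02T00:00:00Z';
const deviationMax = '2024-04-03T00:00:00Z';

const significantTermsRequest = {
  index: 'my-logs',
  size: 0,
  // Foreground: the deviation time range.
  query: { range: { '@timestamp': { gte: deviationMin, lt: deviationMax } } },
  aggs: {
    sampling: {
      random_sampler: { probability: 0.01 },
      aggs: {
        significant_items: {
          significant_terms: {
            field: 'response_code',
            // Background: the baseline time range.
            background_filter: {
              range: { '@timestamp': { gte: baselineMin, lt: baselineMax } },
            },
            // p-value score heuristic; items are filtered at p < 0.02.
            p_value: { background_is_superset: false },
          },
        },
      },
    },
  },
};
```

Such a request body would be passed to the ES client's `search` method.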
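A sketch of the log pattern flow: `categorize_text` identifies top patterns, then a 2x2 chi-squared test checks each pattern's baseline vs. deviation doc counts. The helper below is a simplified stand-in for the `@kbn/ml-chi2test` package, not its actual API:

```typescript
// Sketch: fetch top text patterns across both time ranges (values are examples).
const categorizeTextRequest = {
  index: 'my-logs',
  size: 0,
  query: { range: { '@timestamp': { gte: '2024-04-01T00:00:00Z', lt: '2024-04-03T00:00:00Z' } } },
  aggs: {
    categories: { categorize_text: { field: 'message', size: 50 } },
  },
};

// Chi-squared critical value for p = 0.02 at 1 degree of freedom.
const CHI2_CRITICAL_P002 = 5.412;

// 2x2 test: does a pattern's share of docs differ between baseline and deviation?
function isSignificantChange(
  baselineCount: number,
  deviationCount: number,
  baselineTotal: number,
  deviationTotal: number
): boolean {
  const observed = [
    [baselineCount, baselineTotal - baselineCount],
    [deviationCount, deviationTotal - deviationCount],
  ];
  const total = baselineTotal + deviationTotal;
  let chi2 = 0;
  for (let row = 0; row < 2; row++) {
    for (let col = 0; col < 2; col++) {
      const rowSum = observed[row][0] + observed[row][1];
      const colSum = observed[0][col] + observed[1][col];
      const expected = (rowSum * colSum) / total;
      if (expected > 0) {
        chi2 += (observed[row][col] - expected) ** 2 / expected;
      }
    }
  }
  return chi2 > CHI2_CRITICAL_P002;
}
```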
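And a sketch of the grouping step's first query. Field names, `minimum_support`, and `size` are placeholders, not the actual Kibana settings:

```typescript
// Sketch: co-occurrence stats for significant aggregatable fields.
const frequentItemSetsRequest = {
  index: 'my-logs',
  size: 0,
  query: { range: { '@timestamp': { gte: '2024-04-02T00:00:00Z', lt: '2024-04-03T00:00:00Z' } } },
  aggs: {
    fi: {
      frequent_item_sets: {
        minimum_set_size: 2,
        minimum_support: 0.001,
        size: 200,
        // In Kibana, only the top values per significant field
        // (currently 50) are passed on to this aggregation.
        fields: [{ field: 'response_code' }, { field: 'url.original' }],
      },
    },
  },
};
```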


@@ -37,7 +37,6 @@ export const indexInfoHandlerFactory =
const textFieldCandidates: string[] = [];
let totalDocCount = 0;
let zeroDocsFallback = false;
if (!requestBody.overrides?.remainingFieldCandidates) {
@@ -63,10 +62,12 @@ export const indexInfoHandlerFactory =
abortSignal
);
+ logDebugMessage(`Baseline document count: ${indexInfo.baselineTotalDocCount}`);
+ logDebugMessage(`Deviation document count: ${indexInfo.deviationTotalDocCount}`);
fieldCandidates.push(...indexInfo.fieldCandidates);
fieldCandidatesCount = fieldCandidates.length;
textFieldCandidates.push(...indexInfo.textFieldCandidates);
totalDocCount = indexInfo.deviationTotalDocCount;
zeroDocsFallback = indexInfo.zeroDocsFallback;
} catch (e) {
if (!isRequestAbortedError(e)) {
@@ -77,8 +78,6 @@ export const indexInfoHandlerFactory =
return;
}
logDebugMessage(`Total document count: ${totalDocCount}`);
stateHandler.loaded(LOADED_FIELD_CANDIDATES, false);
responseStream.pushPingWithTimeout();


@@ -81,7 +81,7 @@ export function routeHandlerFactory<T extends ApiVersion>(
analysis.overridesHandler();
responseStream.pushPingWithTimeout();
- // Step 1: Index Info: Field candidates, total doc count, sample probability
+ // Step 1: Index Info: Field candidates and zero docs fallback flag
const indexInfo = await analysis.indexInfoHandler();
if (!indexInfo) {