mirror of https://github.com/elastic/kibana.git (synced 2025-04-24 09:48:58 -04:00)
# Backport

This will backport the following commits from `main` to `8.x`:
 - [[Auto Import] Improve log format recognition (#196228)](https://github.com/elastic/kibana/pull/196228)

<!--- Backport version: 9.4.3 -->

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Source commit `bdc9ce932bbfa606dd1f1e188c8b32df4327a0a4` on `main`, authored by Ilya Nikokoshev <ilya.nikokoshev@elastic.co> on 2024-10-15T12:02:00Z and merged through #196228 (labels: `bug`, `release_note:skip`, `backport missing`, `v9.0.0`, `backport:prev-minor`, `Team:Security-Scalability`, `Feature:AutomaticImport`). Its message:

> [Auto Import] Improve log format recognition (#196228)
>
> Previously the LLM would often select `unstructured` format for what (to our eye) clearly are CSV samples.
>
> We add the missing line break between the log samples (which should help format recognition in general) and change the prompt to clarify when the comma-separated list should be treated as a `csv` and when as `structured` format.
>
> See GitHub for examples.
>
> Co-authored-by: Bharat Pasupula <123897612+bhapas@users.noreply.github.com>

Co-authored-by: Ilya Nikokoshev <ilya.nikokoshev@elastic.co>
parent 8041d698d8
commit bf0432de4e

2 changed files with 13 additions and 11 deletions

```diff
@@ -26,7 +26,7 @@ export async function handleLogFormatDetection({
   const logFormatDetectionResult = await logFormatDetectionNode.invoke({
     ex_answer: state.exAnswer,
-    log_samples: samples,
+    log_samples: samples.join('\n'),
     package_title: state.packageTitle,
     datastream_title: state.dataStreamTitle,
   });

```
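Why the one-line `join('\n')` change matters can be sketched in plain TypeScript. If an array value reaches the prompt through JavaScript's default string conversion (an assumption about how the template renders non-string values; the commit message only says the line break between samples was missing), its elements are joined with commas. For comma-separated logs the row boundaries then vanish into one long comma list. The sample data below is hypothetical.

```typescript
// Hypothetical CSV-like log samples, one log line per array element.
const samples: string[] = [
  '2024-10-15,INFO,user login',
  '2024-10-15,WARN,disk almost full',
];

// Default array-to-string conversion: Array.prototype.toString joins the
// elements with ",", so the boundary between the two log lines disappears.
const implicit: string = String(samples);

// The fix from the hunk: an explicit newline keeps one log line per row.
const explicit: string = samples.join('\n');

console.log(implicit); // 2024-10-15,INFO,user login,2024-10-15,WARN,disk almost full
console.log(explicit);
```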

```diff
@@ -17,16 +17,18 @@ export const LOG_FORMAT_DETECTION_PROMPT = ChatPromptTemplate.fromMessages([
 The samples apply to the data stream {datastream_title} inside the integration package {package_title}.

 Follow these steps to do this:
-1. Go through each log sample and identify the log format. Output this as "name: <log_format>".
-2. If the samples have any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId in the beginning information then set "header: true".
-3. If the samples have a syslog header then set "header: true" , else set "header: false". If you are unable to determine the syslog header presence then set "header: false".
-4. If the log samples have structured message body with key-value pairs then classify it as "name: structured". Look for a flat list of key-value pairs, often separated by spaces, commas, or other delimiters.
-5. Consider variations in formatting, such as quotes around values ("key=value", key="value"), special characters in keys or values, or escape sequences.
-6. If the log samples have unstructured body like a free-form text then classify it as "name: unstructured".
-7. If the log samples follow a csv format then classify it with "name: csv". There are two sub-cases for csv:
-a. If there is a csv header then set "header: true".
-b. If there is no csv header then set "header: false" and try to find good names for the columns in the "columns" array by looking into the values of data in those columns. For each column, if you are unable to find good name candidate for it then output an empty string, like in the example.
-8. If you cannot put the format into any of the above categories then classify it with "name: unsupported".
+1. Go through each log sample and identify the log format. Output this as "name: <log_format>". Here are the values for log_format:
+* 'csv': If the log samples follow a Comma-Separated Values format then classify it with "name: csv". There are two sub-cases for csv:
+a. If there is a csv header then set "header: true".
+b. If there is no csv header then set "header: false" and try to find good names for the columns in the "columns" array by looking into the values of data in those columns. For each column, if you are unable to find good name candidate for it then output an empty string, like in the example.
+* 'structured': If the log samples have structured message body with key-value pairs then classify it as "name: structured". Look for a flat list of key-value pairs, often separated by some delimiters. Consider variations in formatting, such as quotes around values ("key=value", key="value"), special characters in keys or values, or escape sequences.
+* 'unstructured': If the log samples have unstructured body like a free-form text then classify it as "name: unstructured".
+* 'unsupported': If you cannot put the format into any of the above categories then classify it with "name: unsupported".
+2. Header: for structured and unstructured format:
+- if the samples have any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId in the beginning information then set "header: true".
+- if the samples have a syslog header then set "header: true"
+- else set "header: false". If you are unable to determine the syslog header presence then set "header: false".
+3. Note that a comma-separated list should be classified as 'csv' if its rows only contain values separated by commas. But if it looks like a list of comma separated key-values pairs like 'key1=value1, key2=value2' it should be classified as 'structured'.

 You ALWAYS follow these guidelines when writing your response:
 <guidelines>
```
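Rule 3 of the new prompt carries the core of this fix. The prompt leaves the decision to the LLM, so the sketch below is only a hypothetical, deterministic rendering of that rule for illustration: `classifyCommaSeparatedRow` and `classifySamples` are not Kibana functions. It treats a row as `structured` when every comma-separated field starts with `key=`, and as `csv` when the fields are plain values.

```typescript
// Hypothetical helpers illustrating rule 3 of the prompt; Kibana itself
// delegates this decision to the LLM and does not contain these functions.
type CommaListFormat = 'csv' | 'structured';

// A field counts as a key-value pair if it begins with a key followed by "=",
// e.g. `key1=value1` or ` key2="value 2"` (leading spaces are allowed).
const KEY_VALUE_FIELD = /^\s*[\w.-]+\s*=/;

function classifyCommaSeparatedRow(row: string): CommaListFormat {
  const fields = row.split(',');
  // Every field is key=value: a comma-separated key-value list -> 'structured'.
  // Otherwise the row holds plain values -> 'csv'.
  return fields.every((field) => KEY_VALUE_FIELD.test(field)) ? 'structured' : 'csv';
}

// Classify a batch of samples. Mixed or empty input returns null rather than
// a guess, since the prompt gives no rule for that case.
function classifySamples(samples: string[]): CommaListFormat | null {
  const kinds = samples.map(classifyCommaSeparatedRow);
  if (kinds.length === 0) return null;
  return kinds.every((kind) => kind === kinds[0]) ? kinds[0] : null;
}

console.log(classifyCommaSeparatedRow('2024-10-15,INFO,user login')); // csv
console.log(classifyCommaSeparatedRow('key1=value1, key2=value2')); // structured
```

Splitting on `,` ignores quoting, so a quoted value that itself contains a comma would be misread; a real classifier would need a quote-aware tokenizer.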