[8.16] [Automatic Import] Fix Structured log flow to handle different type of structured syslogs (#212611) (#212644)

# Backport

This will backport the following commits from `main` to `8.16`:
- [[Automatic Import] Fix Structured log flow to handle different type
of structured syslogs
(#212611)](https://github.com/elastic/kibana/pull/212611)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

<!--BACKPORT [{"author":{"name":"Bharat
Pasupula","email":"123897612+bhapas@users.noreply.github.com"},"sourceCommit":{"committedDate":"2025-02-27T12:32:17Z","message":"[Automatic
Import] Fix Structured log flow to handle different type of structured
syslogs (#212611)\n\n## Release note\nFix structured log flow to handle
multiple types of structured logs.\n\n## Summary\nThe structured log
flow has some issues where the KV header validation\nfails for some type
of logs. This PR fixes the flow to match variety of\nstructured syslog
messages.\n\nA variety of logs are
tested.\n\n```\n[2025-01-03T07:48:58.989821Z] [DEBUG] AuthService -
EventID=361a5289eaf8e42b4c195b9b | Message=\"Session expired\" |
UserID=2882 | Duration=376ms\n[2025-01-29T17:34:18.989830Z] [ERROR]
InventoryService - EventID=acbb20d3c955edf718e691d9 | Message=\"Item
restocked\" | UserID=9656 |
Duration=421ms\n[2025-01-11T21:51:54.989839Z] [ERROR] APIGateway -
EventID=9c273d43b946020d5fdbe36c | Message=\"Response sent\" |
UserID=1468 | Duration=409ms\n[2025-01-20T08:40:22.989848Z] [WARN]
PaymentService - EventID=ae8c1425079119b848fa451cb7a | Message=\"3D
Secure required\" | UserID=9353 | Duration=270ms\n```\n\n```\n2021-10-22
22:11:32,131 DEBUG [org.keycloak.events] (default task-3)
type=CODE_TO_TOKEN, realmId=test, clientId=security-admin-console,
userId=ce637d23--4fca-9088-1aea1d053e19, ipAddress=10.1.2.1,
token_id=561459c0-75f1-46d4-986d, grant_type=authorization_code,
refresh_token_type=Refresh, scope=openid,
refresh_token_id=07434488-ca99-412a-c2e47c93d6d1,
code_id=bae6e56e-368f-4809-48cfb6279f5e,
client_auth_method=client-secret\n2021-10-22 22:12:09,871 DEBUG
[org.keycloak.events] (default task-3) operationType=CREATE,
realmId=test, clientId=7bcaf1cb-820a-40f1-75ced03ef03b,
userId=ce637d23-b89c-4fca-1aea1d053e19, ipAddress=10.1.2.6,
resourceType=USER,
resourcePath=users/07972d16-b173-803d-90f211080f40\n```\n\n```\n[18/Feb/2025:22:39:18
+0000] CONNECT conn=730729 from=10.2.2.9:56518 to=10.2.1.14:4389
protocol=LDAP\n[18/Feb/2025:22:39:16 +0000] CONNECT conn=207223
from=10.2.1.24:55730 to=10.1.3.7:4389 protocol=LDAP\n```\n\n```\n<134>1
1647479580.487048774 MX84_2 airmarshal_events type=rogue_ssid_detected
ssid='' bssid='AA:17:C8:D8:51' src='AA:17:C8:D8:51' dst='FF:FF:FF:FF:FF'
wired_mac='AC:17:C7:D8:51' vlan_id='0' channel='6' rssi='35' fc_type='0'
fc_subtype='8'\n<134>1 1647479604.334549372 MX84_5 airmarshal_events
type=rogue_ssid_detected ssid='' bssid='92:17:C7:D8:51'
src='92:17:C8:D8:51' dst='6A:3A:3E:85:F6' wired_mac='AC:17:C7:D8:51'
vlan_id='0' channel='6' rssi='23' fc_type='0' fc_subtype='5'\n```\n\n###
Checklist\n- [x] The PR description includes the appropriate Release
Notes section,\nand the correct `release_note:*` label is applied per
the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"f579f2d637ce6e7e51f15e32ba8c5d8ba554478e","branchLabelMapping":{"^v9.1.0$":"main","^v8.19.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","backport:prev-minor","backport:prev-major","Team:Security-Scalability","backport:version","Feature:AutomaticImport","v9.1.0","backport:8.18"],"title":"[Automatic
Import] Fix Structured log flow to handle different type of structured
syslogs","number":212611,"url":"https://github.com/elastic/kibana/pull/212611","mergeCommit":{"message":"[Automatic
Import] Fix Structured log flow to handle different type of structured
syslogs (#212611)\n\n## Release note\nFix structured log flow to handle
multiple types of structured logs.\n\n## Summary\nThe structured log
flow has some issues where the KV header validation\nfails for some type
of logs. This PR fixes the flow to match variety of\nstructured syslog
messages.\n\nA variety of logs are
tested.\n\n```\n[2025-01-03T07:48:58.989821Z] [DEBUG] AuthService -
EventID=361a5289eaf8e42b4c195b9b | Message=\"Session expired\" |
UserID=2882 | Duration=376ms\n[2025-01-29T17:34:18.989830Z] [ERROR]
InventoryService - EventID=acbb20d3c955edf718e691d9 | Message=\"Item
restocked\" | UserID=9656 |
Duration=421ms\n[2025-01-11T21:51:54.989839Z] [ERROR] APIGateway -
EventID=9c273d43b946020d5fdbe36c | Message=\"Response sent\" |
UserID=1468 | Duration=409ms\n[2025-01-20T08:40:22.989848Z] [WARN]
PaymentService - EventID=ae8c1425079119b848fa451cb7a | Message=\"3D
Secure required\" | UserID=9353 | Duration=270ms\n```\n\n```\n2021-10-22
22:11:32,131 DEBUG [org.keycloak.events] (default task-3)
type=CODE_TO_TOKEN, realmId=test, clientId=security-admin-console,
userId=ce637d23--4fca-9088-1aea1d053e19, ipAddress=10.1.2.1,
token_id=561459c0-75f1-46d4-986d, grant_type=authorization_code,
refresh_token_type=Refresh, scope=openid,
refresh_token_id=07434488-ca99-412a-c2e47c93d6d1,
code_id=bae6e56e-368f-4809-48cfb6279f5e,
client_auth_method=client-secret\n2021-10-22 22:12:09,871 DEBUG
[org.keycloak.events] (default task-3) operationType=CREATE,
realmId=test, clientId=7bcaf1cb-820a-40f1-75ced03ef03b,
userId=ce637d23-b89c-4fca-1aea1d053e19, ipAddress=10.1.2.6,
resourceType=USER,
resourcePath=users/07972d16-b173-803d-90f211080f40\n```\n\n```\n[18/Feb/2025:22:39:18
+0000] CONNECT conn=730729 from=10.2.2.9:56518 to=10.2.1.14:4389
protocol=LDAP\n[18/Feb/2025:22:39:16 +0000] CONNECT conn=207223
from=10.2.1.24:55730 to=10.1.3.7:4389 protocol=LDAP\n```\n\n```\n<134>1
1647479580.487048774 MX84_2 airmarshal_events type=rogue_ssid_detected
ssid='' bssid='AA:17:C8:D8:51' src='AA:17:C8:D8:51' dst='FF:FF:FF:FF:FF'
wired_mac='AC:17:C7:D8:51' vlan_id='0' channel='6' rssi='35' fc_type='0'
fc_subtype='8'\n<134>1 1647479604.334549372 MX84_5 airmarshal_events
type=rogue_ssid_detected ssid='' bssid='92:17:C7:D8:51'
src='92:17:C8:D8:51' dst='6A:3A:3E:85:F6' wired_mac='AC:17:C7:D8:51'
vlan_id='0' channel='6' rssi='23' fc_type='0' fc_subtype='5'\n```\n\n###
Checklist\n- [x] The PR description includes the appropriate Release
Notes section,\nand the correct `release_note:*` label is applied per
the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"f579f2d637ce6e7e51f15e32ba8c5d8ba554478e"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/212611","number":212611,"mergeCommit":{"message":"[Automatic
Import] Fix Structured log flow to handle different type of structured
syslogs (#212611)\n\n## Release note\nFix structured log flow to handle
multiple types of structured logs.\n\n## Summary\nThe structured log
flow has some issues where the KV header validation\nfails for some type
of logs. This PR fixes the flow to match variety of\nstructured syslog
messages.\n\nA variety of logs are
tested.\n\n```\n[2025-01-03T07:48:58.989821Z] [DEBUG] AuthService -
EventID=361a5289eaf8e42b4c195b9b | Message=\"Session expired\" |
UserID=2882 | Duration=376ms\n[2025-01-29T17:34:18.989830Z] [ERROR]
InventoryService - EventID=acbb20d3c955edf718e691d9 | Message=\"Item
restocked\" | UserID=9656 |
Duration=421ms\n[2025-01-11T21:51:54.989839Z] [ERROR] APIGateway -
EventID=9c273d43b946020d5fdbe36c | Message=\"Response sent\" |
UserID=1468 | Duration=409ms\n[2025-01-20T08:40:22.989848Z] [WARN]
PaymentService - EventID=ae8c1425079119b848fa451cb7a | Message=\"3D
Secure required\" | UserID=9353 | Duration=270ms\n```\n\n```\n2021-10-22
22:11:32,131 DEBUG [org.keycloak.events] (default task-3)
type=CODE_TO_TOKEN, realmId=test, clientId=security-admin-console,
userId=ce637d23--4fca-9088-1aea1d053e19, ipAddress=10.1.2.1,
token_id=561459c0-75f1-46d4-986d, grant_type=authorization_code,
refresh_token_type=Refresh, scope=openid,
refresh_token_id=07434488-ca99-412a-c2e47c93d6d1,
code_id=bae6e56e-368f-4809-48cfb6279f5e,
client_auth_method=client-secret\n2021-10-22 22:12:09,871 DEBUG
[org.keycloak.events] (default task-3) operationType=CREATE,
realmId=test, clientId=7bcaf1cb-820a-40f1-75ced03ef03b,
userId=ce637d23-b89c-4fca-1aea1d053e19, ipAddress=10.1.2.6,
resourceType=USER,
resourcePath=users/07972d16-b173-803d-90f211080f40\n```\n\n```\n[18/Feb/2025:22:39:18
+0000] CONNECT conn=730729 from=10.2.2.9:56518 to=10.2.1.14:4389
protocol=LDAP\n[18/Feb/2025:22:39:16 +0000] CONNECT conn=207223
from=10.2.1.24:55730 to=10.1.3.7:4389 protocol=LDAP\n```\n\n```\n<134>1
1647479580.487048774 MX84_2 airmarshal_events type=rogue_ssid_detected
ssid='' bssid='AA:17:C8:D8:51' src='AA:17:C8:D8:51' dst='FF:FF:FF:FF:FF'
wired_mac='AC:17:C7:D8:51' vlan_id='0' channel='6' rssi='35' fc_type='0'
fc_subtype='8'\n<134>1 1647479604.334549372 MX84_5 airmarshal_events
type=rogue_ssid_detected ssid='' bssid='92:17:C7:D8:51'
src='92:17:C8:D8:51' dst='6A:3A:3E:85:F6' wired_mac='AC:17:C7:D8:51'
vlan_id='0' channel='6' rssi='23' fc_type='0' fc_subtype='5'\n```\n\n###
Checklist\n- [x] The PR description includes the appropriate Release
Notes section,\nand the correct `release_note:*` label is applied per
the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"f579f2d637ce6e7e51f15e32ba8c5d8ba554478e"}}]}]
BACKPORT-->

Co-authored-by: Bharat Pasupula <123897612+bhapas@users.noreply.github.com>
Kibana Machine 2025-02-28 01:37:44 +11:00 committed by GitHub
parent edec21f4bf
commit 0ffba3ab86
6 changed files with 54 additions and 14 deletions

View file

@ -13,6 +13,26 @@ export const KV_EXAMPLE_ANSWER = {
ignore_missing: true,
};
export const KV_HEADER_EXAMPLE_LOGS = [
{
example:
'[18/Feb/2025:22:39:16 +0000] CONNECT conn=20597223 from=10.1.1.1:1234 to=10.2.3.4:4389 protocol=LDAP',
header: '[18/Feb/2025:22:39:16 +0000] CONNECT',
structuredBody: 'conn=20597223 from=10.1.1.1:1234 to=10.2.3.4:4389 protocol=LDAP',
grok_pattern:
'[%{HTTPDATE:`{packageName}.{dataStreamName}.`timestamp}] %{WORD:`{packageName}.{dataStreamName}`action}s%{GREEDYDATA:message}',
},
{
example:
'2021-10-22 22:12:09,871 DEBUG [org.keycloak.events] (default task-3) operationType=CREATE, realmId=test, clientId=abcdefgh userId=sdfsf-b89c-4fca-9088-sdfsfsf, ipAddress=10.1.1.1, resourceType=USER, resourcePath=users/07972d16-b173-4c99-803d-90f211080f40',
header: '2021-10-22 22:12:09,871 DEBUG [org.keycloak.events] (default task-3)',
structuredBody:
'operationType=CREATE, realmId=test, clientId=7bcaf1cb-820a-40f1-91dd-75ced03ef03b, userId=ce637d23-b89c-4fca-9088-1aea1d053e19, ipAddress=10.1.1.1, resourceType=USER, resourcePath=users/07972d16-b173-4c99-803d-90f211080f40',
grok_pattern:
'%{TIMESTAMP_ISO8601:`{packageName}.{dataStreamName}.`timestamp} %{LOGLEVEL:`{packageName}.{dataStreamName}`loglevel} [%{DATA:`{packageName}.{dataStreamName}`logsource}] (%{DATA:`{packageName}.{dataStreamName}`task})s%{GREEDYDATA:message}',
},
];
export const KV_HEADER_EXAMPLE_ANSWER = {
rfc: 'RFC2454',
regex:
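
Each entry in `KV_HEADER_EXAMPLE_LOGS` pairs a sample with the header and structured body that should be separated from it. As a rough illustration of that split, here is a minimal TypeScript sketch for the first (LDAP) sample only. The names `LDAP_HEADER_REGEX`, `splitHeaderAndBody`, and `parseKeyValues` are hypothetical and not part of the Kibana code; the actual flow asks the model for a regex and grok pattern rather than relying on a hand-written expression like this one.

```typescript
// Hypothetical shape for the result of splitting one log line.
interface SplitLog {
  header: string;
  structuredBody: string;
}

// Assumption: the header is a bracketed timestamp followed by one action word,
// and a single whitespace character separates it from the key-value body.
const LDAP_HEADER_REGEX = /^(\[[^\]]+\]\s+\w+)\s(.*)$/;

function splitHeaderAndBody(line: string): SplitLog | null {
  const match = LDAP_HEADER_REGEX.exec(line);
  if (match === null) {
    return null;
  }
  const [, header = '', structuredBody = ''] = match;
  return { header, structuredBody };
}

// Split the body on whitespace, then split each pair on its first '='.
function parseKeyValues(body: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const pair of body.split(/\s+/)) {
    const idx = pair.indexOf('=');
    if (idx > 0) {
      result[pair.slice(0, idx)] = pair.slice(idx + 1);
    }
  }
  return result;
}

const sample =
  '[18/Feb/2025:22:39:16 +0000] CONNECT conn=20597223 from=10.1.1.1:1234 to=10.2.3.4:4389 protocol=LDAP';
const split = splitHeaderAndBody(sample);
const fields = split === null ? {} : parseKeyValues(split.structuredBody);
```

For this sample, `split.header` and `split.structuredBody` correspond to the `header` and `structuredBody` values of the first example entry.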

View file

@ -9,7 +9,7 @@ import { JsonOutputParser } from '@langchain/core/output_parsers';
import type { KVState } from '../../types';
import type { HandleKVNodeParams } from './types';
import { KV_HEADER_PROMPT } from './prompts';
import { KV_HEADER_EXAMPLE_ANSWER } from './constants';
import { KV_HEADER_EXAMPLE_ANSWER, KV_HEADER_EXAMPLE_LOGS } from './constants';
export async function handleHeader({
state,
@ -23,6 +23,7 @@ export async function handleHeader({
samples: state.logSamples,
packageName: state.packageName,
dataStreamName: state.dataStreamName,
example_logs: KV_HEADER_EXAMPLE_LOGS,
ex_answer: JSON.stringify(KV_HEADER_EXAMPLE_ANSWER, null, 2),
});

View file

@ -56,8 +56,8 @@ describe('Testing kv header', () => {
field: 'message',
field_split: '',
target_field: 'testPackage.testDatastream',
trim_key: '',
trim_value: '',
trim_key: null,
trim_value: null,
value_split: '',
},
},

View file

@ -29,8 +29,8 @@ export const KV_MAIN_PROMPT = ChatPromptTemplate.fromMessages([
4. The \`value_split\` is the delimeter regex pattern to use for splitting the key from the value within a key-value pair (e.g., ':' or '=' )
5. The \`field_split\` is the regex pattern to use for splitting key-value pairs in the log. Make sure the regex pattern breaks the log into key-value pairs.
6. Ensure that the KV processor can handle different scenarios, such as: Optional or missing fields in the logs , Varying delimiters between keys and values (e.g., = or :), Complex log structures (e.g., nested key-value pairs or key-value pairs within strings, whitespaces , urls, ipv4 , ipv6 address, mac address etc.,).
7. Use \`trim_key\` for string of characters to trim from extracted keys.
8. Use \`trim_value\` for string of characters to trim from extracted values.
7. Use \`trim_key\` for string of characters to trim from extracted keys. Make sure to escape single quotes like \`\\'\`.
8. Use \`trim_value\` for string of characters to trim from extracted values. Make sure to escape single quotes like \`\\'\`.
You ALWAYS follow these guidelines when writing your response:
<guidelines>
@ -68,23 +68,34 @@ export const KV_HEADER_PROMPT = ChatPromptTemplate.fromMessages([
],
[
'human',
`Looking at the multiple syslog samples provided in the context, your task is to separate the "header" and the "message body" from this log. Our goal is to identify which RFC they belong to. Then create a regex pattern that can separate the header and the structured body.
You then have to create a grok pattern using the regex pattern.
You are given a log entry in a structured format.
`Here are a series of syslog samples in a structured log format, and your task is to create a regex and a grok pattern that will correctly parse only the header part of these logs. The pattern should be critical about the following points:
Follow these steps to identify the header pattern:
1. Identify if the log samples fall under RFC5424 or RFC3164. If not, return 'Custom Format'.
2. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
3. Make sure the regex and grok pattern matches all the header information. Only the structured message body should be under GREEDYDATA in grok pattern.
2. If the log samples fall under RFC3164 or RFC5424 then parse the header and structured body according to the RFC definition.
3. If the log samples are in custom format, pay special attention to the special characters like brackets, colons or any punctuation marks in the syslog header, and ensure they are properly escaped.
4. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
You ALWAYS follow these guidelines when writing your response:
You ALWAYS follow these guidelines when writing your response:
<guidelines>
- Do not parse the message part in the regex. Just the header part should be in regex and grok_pattern.
- Timestamp Handling: Pay close attention to the timestamp format, ensuring that it is handled correctly with respect to any variations in date or time formatting. The timestamp should be extracted accurately, and make sure the pattern accounts for any variations in timezone representation, like time zone offsets or 'UTC' markers.
Also look for special characters around the timestamp in Custom Format, like a timestamp enclosed in [] or <> or (). Match these characters in the grok pattern with appropriate escaping.
- Special Characters: Ensure that all special characters, like brackets, colons, or any punctuation marks in the syslog header, are properly escaped. Be particularly cautious with characters that could interfere with the regex engine, such as periods (.), asterisks (*), or square brackets ([]), and ensure they are treated correctly in the pattern.
- Strict Parsing of the Header: The regex and grok pattern should strictly focus on parsing only the header part of the syslog sample. Do not include any logic for parsing the structured message body. The message body should be captured using the GREEDYDATA field in the grok pattern, and any non-header content should be left out of the main pattern.
- Pattern Efficiency: Ensure that both the regex and the grok pattern are as efficient as possible while still accurately capturing the header components. Avoid overly complex or overly broad patterns that could capture unintended data.
- Make sure to map the remaining message body to \'message\' in grok pattern.
- If there are special characters between header and message body like space character, make sure to include that character in the header grok pattern
- Make sure to add \`{packageName}.{dataStreamName}\` as a prefix to each field in the pattern. Refer to example response.
- Do not respond with anything except the processor as a JSON object enclosed with 3 backticks (\`), see example response above. Use strict JSON response format.
</guidelines>
Some of the example samples look like this:
<example_logs>
\`\`\`json
{example_logs}
\`\`\`
</example_logs>
You are required to provide the output in the following example response format:
<example_response>
@ -120,6 +131,7 @@ Follow these steps to fix the errors in the header pattern:
2. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
3. The message body may start with a description, followed by structured key-value pairs.
4. Make sure the regex and grok pattern matches all the header information. Only the structured message body should be under GREEDYDATA in grok pattern.
You ALWAYS follow these guidelines when writing your response:
<guidelines>
- Do not parse the message part in the regex. Just the header part should be in regex and grok_pattern.
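
The header prompt's guideline on special characters can be illustrated with a small TypeScript sketch. It shows why literal brackets, dots, and parentheses in a custom header need escaping before they are used in a regex. The helper name `escapeRegexLiteral` is hypothetical, and the fragment is taken from the Keycloak sample in the PR description; nothing here is part of the changed files.

```typescript
// Hypothetical helper: escape every regex metacharacter in a literal fragment.
function escapeRegexLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const fragment = '[org.keycloak.events] (default task-3)';

// Unescaped, '[...]' is read as a character class and '(...)' as a group,
// so the pattern does not match the literal text it was built from.
const unescapedPattern = new RegExp('^' + fragment);

// Escaped, every special character is matched literally.
const escapedPattern = new RegExp('^' + escapeRegexLiteral(fragment));

const matchesUnescaped = unescapedPattern.test(fragment);
const matchesEscaped = escapedPattern.test(fragment);
```

Here `matchesUnescaped` is `false` while `matchesEscaped` is `true`, which is the failure mode the guideline is meant to prevent.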

View file

@ -2,6 +2,6 @@
field: message
field_split: '{{ kvInput.field_split }}'
value_split: '{{ kvInput.value_split }}'
trim_key: '{{ kvInput.trim_key }}'
trim_value: '{{ kvInput.trim_value }}'
trim_key: {{ kvInput.trim_key }}
trim_value: {{ kvInput.trim_value }}
target_field: '{{ packageName }}.{{ dataStreamName }}'

View file

@ -69,6 +69,13 @@ export function createKVProcessor(kvInput: KVProcessor, state: KVState): ESProce
autoescape: false,
});
const template = env.getTemplate('kv.yml.njk');
if (kvInput.trim_key) {
kvInput.trim_key = kvInput.trim_key.replace(/['"]/g, '\\$&');
}
if (kvInput.trim_value) {
kvInput.trim_value = kvInput.trim_value.replace(/['"]/g, '\\$&');
}
const renderedTemplate = template.render({
kvInput,
packageName: state.packageName,
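
The quote escaping added to `createKVProcessor` relies on the `$&` replacement token, which stands for the matched substring. Below is a minimal, standalone TypeScript sketch of the same `replace` call; the wrapper name `escapeQuotes` and the sample input are hypothetical, and the sketch says nothing about how the rendered template is parsed afterwards.

```typescript
// Hypothetical wrapper around the replace call shown in the diff:
// prefix each single or double quote with a backslash.
function escapeQuotes(value: string): string {
  return value.replace(/['"]/g, '\\$&');
}

// A trim string made of one single quote and one double quote.
const trimValue = `'"`;
const escapedTrimValue = escapeQuotes(trimValue);

// Strings without quotes pass through unchanged.
const untouched = escapeQuotes('abc');
```

After the call, `escapedTrimValue` holds four characters: a backslash, the single quote, a backslash, and the double quote.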