Apache, nginx, syslog, and many other systems use the emergency level,
and it was missing in logstash.
Also add tests to cover all scenarios of `LOGLEVEL` expansion.
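A minimal Python sketch of the kind of expansion checks meant here. LOGLEVEL_SKETCH and is_loglevel are hypothetical names, and the regex is a simplified stand-in written for illustration, not the actual grok LOGLEVEL definition:

```python
import re

# Hypothetical, simplified stand-in for a LOGLEVEL-style pattern.
# It only illustrates "emergency" being accepted in a few spellings.
LOGLEVEL_SKETCH = re.compile(
    r"(?:[Ee]merg(?:ency)?|EMERG(?:ENCY)?|[Aa]lert|ALERT"
    r"|[Cc]rit(?:ical)?|CRIT(?:ICAL)?|[Ee]rr(?:or)?|ERR(?:OR)?"
    r"|[Ww]arn(?:ing)?|WARN(?:ING)?|[Nn]otice|NOTICE"
    r"|[Ii]nfo|INFO|[Dd]ebug|DEBUG)"
)

def is_loglevel(token):
    """Return True if the whole token is one of the sketched level names."""
    return LOGLEVEL_SKETCH.fullmatch(token) is not None

# Spellings a test suite might cover, paired with the expected verdict.
cases = {"emerg": True, "EMERGENCY": True, "Emergency": True, "emergent": False}
for token, expected in cases.items():
    assert is_loglevel(token) == expected
```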
RFC952 states of a hostname: "The last character must not be a minus sign or period."
https://tools.ietf.org/html/rfc952
Some of the limitations in RFC952 were lifted by RFC1123, but not this one.
https://tools.ietf.org/html/rfc1123
The updated regex still allows single character hostnames, but does not allow the final character in any section to be a '-'.
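To illustrate the rule, here is a small Python sketch. LABEL and HOSTNAME_SKETCH are hypothetical names and the regex is my own approximation of the constraint described above, not the pattern shipped in the patch:

```python
import re

# Hypothetical label rule: starts and ends with an alphanumeric,
# hyphens are allowed only in between. A single character is still valid.
LABEL = r"[0-9A-Za-z](?:[0-9A-Za-z-]{0,61}[0-9A-Za-z])?"

# One or more labels separated by dots.
HOSTNAME_SKETCH = re.compile(LABEL + r"(?:\." + LABEL + r")*")

def is_hostname(name):
    """Return True if the whole string satisfies the sketched rule."""
    return HOSTNAME_SKETCH.fullmatch(name) is not None

for name in ("a", "web-01.example.com", "web-.example.com", "example-"):
    print(name, is_hostname(name))
```

The last two names are rejected because a section ends in '-'.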
Removed BSD/Linux-specific TTYS, as there are several more TTY names, even under Linux, than /dev/pts/${NONNEGINT}.
This also allows:
* "/dev/ttyUSB0"
* "/dev/ttyS0"
This is a personal preference, but for web logs, I prefer the parser to capture what it can. Currently with an invalid request, it fails completely rather than capturing the other log information such as date, bytes transferred and HTTP status.
This patch captures the invalid request into @fields.rawrequest and leaves @fields.verb, @fields.request and @fields.httpversion as nulls if it cannot be properly parsed.
Here is a sample of invalid requests I have from my logs:
115.70.170.86 - - [31/Oct/2012:06:41:24 +1100] "G" 408 0 "-" "-"
165.86.71.20 - - [31/Oct/2012:04:27:01 +1100] "GET http://dis.us.criteo.com/dis/dis.aspx?&t1=sendEvent&c=2&p=3937&p1=v%3D2%26wi%3D7715628%26pt1%3D0%26pt2%3D1%26si%3D1&cb=21664477550&ref=&sc_r=1280x1024&sc_d=32 HTTP/1.0" 400 672 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E)"
Obviously these are not valid requests, and I prefer to handle them this way, but the change is up to you.
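The idea can be sketched in Python with an alternation: try the structured request first and fall back to capturing the raw text. ACCESS_LINE and parse are hypothetical names, and the regex is a simplified approximation for illustration, not the grok pattern from the patch:

```python
import re

# Simplified access-log regex (an assumption, not the real grok pattern).
# The quoted request is either "VERB request [HTTP/version]" or, failing
# that, anything at all, which lands in rawrequest.
ACCESS_LINE = re.compile(
    r'^(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)'
    r'(?: HTTP/(?P<httpversion>\d+(?:\.\d+)?))?'
    r'|(?P<rawrequest>.*?))" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

def parse(line):
    """Return a dict of captured fields, or None if the line does not match."""
    m = ACCESS_LINE.match(line)
    return m.groupdict() if m else None

bad = parse('115.70.170.86 - - [31/Oct/2012:06:41:24 +1100] "G" 408 0 "-" "-"')
# verb, request and httpversion stay None; the status and byte count
# are still captured alongside rawrequest.
print(bad["rawrequest"], bad["verb"], bad["response"], bad["bytes"])
```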
The hyphens in the regexes are creating ranges and need to be escaped. Without this change, parsing fails for logs containing URIs such as:
/test/page.html?arg=hypenated-arg
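A short Python demonstration of the effect. The character classes are made up for illustration (they are not the ones from the logstash patterns); the point is only that an unescaped '-' between two characters forms a range:

```python
import re

# Unescaped: ".-_" is read as the range from '.' (0x2E) to '_' (0x5F),
# which does not contain '-' (0x2D) but does contain 'A'..'Z'.
BROKEN = re.compile(r"[a-z=.-_]+")

# Escaped: '.', '-' and '_' are three literal characters.
FIXED = re.compile(r"[a-z=.\-_]+")

query = "arg=hypenated-arg"
print(BROKEN.fullmatch(query) is not None)  # the hyphen is rejected
print(FIXED.fullmatch(query) is not None)   # the hyphen is accepted
```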
Commit e62536a introduced a complication: there are times when one
wants to match against zero as well as the positive integers (such
as in the LINUXTTY pattern). For these times, NONNEGINT can be used.
Existing users of POSINT might continue to expect zero to match, so
this change should probably be mentioned in the release notes (on the
other hand, some could be using POSINT without wanting it to match
zero, as happened to me).
Ref: Paragraph 3 of http://en.wikipedia.org/wiki/Natural_number
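The difference between the two can be sketched in Python. The regexes below are how I understand the two definitions and should be treated as assumptions; PTS_SKETCH is a hypothetical example of a pattern that needs zero to match:

```python
import re

# Assumed definitions, for illustration only.
POSINT = r"\b(?:[1-9][0-9]*)\b"   # positive integers: zero does not match
NONNEGINT = r"\b(?:[0-9]+)\b"     # zero and the positive integers

def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Hypothetical TTY-style pattern where device number 0 must be accepted.
PTS_SKETCH = r"/dev/pts/" + NONNEGINT

print(matches(POSINT, "0"), matches(NONNEGINT, "0"))
print(matches(PTS_SKETCH, "/dev/pts/0"))
```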
RFC 3986 (the URI specification) describes the , ; and =
characters used for including parameters in path segments.
Typically these are seen only on the final segment, just before
any query parameters, i.e.
http://www.site.com/path1/path2;jsessionid=OI24B9ASD7BSSD
Adding ; and = to the regex, as , is already included
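A rough Python sketch of the change. The character classes and the helper path_regex are illustrative assumptions, not the actual pattern definitions; they only show why the path above fails before ; and = are added:

```python
import re

# Illustrative sets of characters allowed in a path segment.
SEGMENT_CHARS_BEFORE = r"A-Za-z0-9$.+!*'(),~:_\-"
SEGMENT_CHARS_AFTER = SEGMENT_CHARS_BEFORE + ";="

def path_regex(chars):
    """Build a regex for one or more '/'-prefixed segments."""
    return re.compile(r"(?:/[" + chars + r"]*)+")

path = "/path1/path2;jsessionid=OI24B9ASD7BSSD"
print(path_regex(SEGMENT_CHARS_BEFORE).fullmatch(path) is not None)
print(path_regex(SEGMENT_CHARS_AFTER).fullmatch(path) is not None)
```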
Description :
Usual syslog message :
<85>Jun 14 15:19:47 localhost sudo: root : TTY=pts/1 ; PWD=/opt/logstash ; USER=root ; COMMAND=/bin/bash
Cisco typical message :
<166> Jun 14 15:30:00 10.100.252.52 %ASA-6-302021: Teardown ICMP connection for faddr 10.100.120.120/0 gaddr 10.100.252.1/0 laddr 10.100.252.1/0
----> program name starts with a %
Can be reproduced by sending a manual syslog message with a Python script :
import logging
from logging.handlers import SysLogHandler
#message='Jun 14 15:19:47 localhost sudo: root : TTY=pts/1 ; PWD=/opt/logstash ; USER=root ; COMMAND=/bin/bash'
message=' Jun 15 09:47:36 10.100.252.1 %ASA-6-111116: Teardown UDP connection 6201992 for internet:192.168.1.1/1026 to interne:10.100.120.120/427 duration 0:02:04 bytes 588'
logger = logging.getLogger()
logger.setLevel(logging.INFO)
syslog = SysLogHandler(address=('localhost',5544))
#syslog = SysLogHandler(address='/dev/log')
#formatter = logging.Formatter('%(name)s: %(levelname)s %(message)s')
#syslog.setFormatter(formatter)
logger.addHandler(syslog)
logger.warning(message)
This leads to a "NOT SYSLOG" message in the logs and no @fields{} values.
With this change the fields are OK and the "NOT SYSLOG" message is gone. I still get a "@tags":["_grokparsefailure"] error, though...
In some cases, Oniguruma gets confused about negative matches, so
previously a pattern of '%{QS} something', on a failed match, would
cause Oniguruma to loop frantically. I haven't yet dug into
the part of Oni that does this, but it's common for regexp
engines to behave this way. Easy fix: move to non-backtracking
matches.