Add charset support to the json codec

While JSON is technically required to be UTF-8, some systems generate
JSON that is not UTF-8 encoded. One observed example is nxlog on
Windows using this nxlog config:

    <Output out>
      Module om_tcp
      Host 10.1.1.2
      Port 3515
      Exec $EventReceivedTime = integer($EventReceivedTime) / 1000000; \
      to_json();
    </Output>

nxlog configured as above emits JSON encoded as
Latin-1/CP1252/Windows-1252. This is invalid JSON, but logstash can now
work around it easily, as the sketch below shows.
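For example, a receiving pipeline could declare the actual encoding on
the codec and let logstash transcode the data. This is a minimal sketch,
not part of the commit; the tcp input and port number are assumptions
mirroring the nxlog example above:

    input {
      tcp {
        # Port matches the om_tcp output in the nxlog config (assumed).
        port => 3515
        # Declare the real encoding; the codec converts it to UTF-8.
        codec => json { charset => "CP1252" }
      }
    }
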
This commit is contained in:
Jordan Sissel 2013-09-04 09:18:46 -07:00
parent 4c32cd2851
commit 621d61e846

@@ -1,15 +1,39 @@
require "logstash/codecs/base"
require "json"
# This is the base class for logstash codecs.
# This codec will encode and decode JSON.
class LogStash::Codecs::Json < LogStash::Codecs::Base
config_name "json"
milestone 1
# The character encoding used in this codec. Examples include "UTF-8" and
# "CP1252"
#
# JSON requires valid UTF-8 strings, but in some cases, software that
# emits JSON does so in another encoding (nxlog, for example). In
# weird cases like this, you can set the charset setting to the
# actual encoding of the text and logstash will convert it for you.
#
# For nxlog users, you'll want to set this to "CP1252"
config :charset, :validate => ::Encoding.name_list, :default => "UTF-8"
public
def decode(data)
yield LogStash::Event.new(JSON.parse(data.force_encoding("UTF-8")))
data.force_encoding(@charset)
if @charset != "UTF-8"
# The user has declared the character encoding of this data is
# something other than UTF-8. Let's convert it (as cleanly as possible)
# into UTF-8 so we can use it with JSON, etc.
data = data.encode("UTF-8", :invalid => :replace, :undef => :replace)
end
begin
yield LogStash::Event.new(JSON.parse(data))
rescue JSON::ParserError => e
@logger.info("JSON parse failure. Falling back to plain-text", :error => e, :data => data)
yield LogStash::Event.new("message" => data)
end
end # def decode
public
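
For illustration, the transcoding technique in decode above can be
reproduced in plain Ruby. This is a standalone sketch; the sample
CP1252 bytes and the charset variable are assumptions for the demo,
not part of the commit:

    require "json"

    # Bytes as nxlog on Windows might send them: valid CP1252, invalid
    # UTF-8 (0xE9 is "é" in CP1252). Hypothetical sample data.
    data = "{\"message\": \"caf\xE9\"}"
    charset = "CP1252" # what a user would set via the codec's charset option

    # Re-tag the bytes with their declared encoding (no bytes change)...
    data.force_encoding(charset)
    if charset != "UTF-8"
      # ...then transcode to UTF-8, replacing anything unconvertible so
      # the JSON parser never sees invalid byte sequences.
      data = data.encode("UTF-8", :invalid => :replace, :undef => :replace)
    end

    puts JSON.parse(data)["message"] # => "café"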