This PR changes where `events.in` is calculated. Previously the
value was calculated in the `ReadClient`, which was fine before the
addition of the PQ, but it made the stats inaccurate when the PQ was
enabled and the producers were a lot faster than the consumers.
These commits move the collection of the metric into an
instrumented `WriteClient` so both implementations of the client queues use
the same code.
This also makes it possible to record `events.out` for every input and the
time spent waiting to push to the queue.
The API now exposes these values for each plugin, at the events level,
and for the pipeline.
Using a pipeline with a sleep filter and the PQ, we will see this kind of
response from the API:
```json
{
"duration_in_millis": 438624,
"in": 3011436,
"filtered": 2189,
"out": 2189,
"queue_push_duration_in_millis": 49845
}
```
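As a rough illustration of the idea, here is a minimal sketch of an instrumented write client. The class name `InstrumentedWriteClient` and its accessors are hypothetical and are not the actual Logstash classes; the sketch only assumes that the wrapped client responds to `#push`.

```ruby
# Hypothetical sketch: names are illustrative, not the real Logstash API.
# Wraps any object that responds to #push and records how many events
# went in and how long the pushes took. Not thread-safe as written.
class InstrumentedWriteClient
  attr_reader :events_in, :queue_push_duration_in_millis

  def initialize(write_client)
    @write_client = write_client
    @events_in = 0
    @queue_push_duration_in_millis = 0
  end

  def push(event)
    started = now_millis
    result = @write_client.push(event) # may block while the queue is full
    @queue_push_duration_in_millis += now_millis - started
    @events_in += 1
    result
  end

  private

  # Monotonic clock, so wall-clock adjustments do not skew the duration.
  def now_millis
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  end
end
```

Because the counting happens on the write side, the same wrapper could sit in front of either queue implementation, which is the point made above.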
Fixes: #6512
Fixes #6532
This change was harder than it first appeared! Due to the complicated
interactions between our Setting class and our monkey-patched Clamp
classes this required adding some new hooks into various places to
properly intercept the settings at the right point and set this
dynamically.
Crucially, this only changes path.queue when the user has *not*
overridden it explicitly in the settings.yml file.
Fixes #6378 and #6387
Fixes #6731
When we use the JRMonitor library to get information about the running
threads, it triggers a thread dump to get the stacktrace information. This is OK when
we do a direct call to the `hot_threads` API, but in the context of the
periodic poller this means that the threads need to be stopped to
generate their current stacktrace,
which could significantly slow down Logstash. This PR uses the **ThreadMXBean** but only calls `#getThreadCount` and `#getPeakThreadCount`. These two calls won't generate a thread dump and won't block the current
threads.
**To test**, add the following options to your `config/jvm.options`, let Logstash run for a few minutes to trigger a few periodic poller iterations, then stop Logstash and you will see the report.
```
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
```
Fixes: #6603
Fixes #6705
fix agent and pipeline and specs for queue exclusive access
added comments and swapped all sleep 0.01 to 0.1
revert explicit pipeline close in specs using sample helper
fix multiple pipelines specs
use BasePipeline for config validation which does not instantiate a new queue
review modifications
improve queue exception message
Since 5.x introduced log4j2 as the main logging mechanism, it's
necessary to be more explicit when logging complex objects.
In this case we tell the logger to use the .to_s version of the Snapshot
report generated by the Watcher.
The Snapshot#to_s calls .to_simple_hash.to_s
Fixes #6628
Instead of using a list of non-reloadable plugins, we add a new class
method on the base plugin class that plugins can override.
By default we assume that all plugins are reloadable; only the stdin
input shouldn't be reloadable.
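A minimal sketch of that pattern, with hypothetical class names (`BasePlugin`, `StdinInput`, `GrokFilter`) and a hypothetical helper `non_reloadable` that are not taken from the Logstash code base:

```ruby
# Hypothetical class names, for illustration only.
class BasePlugin
  # Plugins are assumed to be reloadable unless they say otherwise.
  def self.reloadable?
    true
  end
end

class StdinInput < BasePlugin
  # stdin opts out by overriding the class method.
  def self.reloadable?
    false
  end
end

class GrokFilter < BasePlugin; end

# Returns the plugin classes that would prevent a reload.
def non_reloadable(plugin_classes)
  plugin_classes.reject(&:reloadable?)
end
```

With this shape the knowledge lives in each plugin, so no central list has to be kept in sync when a plugin is added.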
Fixes #6499
During `Agent#start_pipeline` a new thread is launched that executes
a `pipeline.run` and a rescue block which increments the failed reload counter.
After launching the thread, the parent thread will wait for the pipeline
to start, or detect that the pipeline aborted, or sleep and check again.
There is a bug that, if the pipeline.run aborts during start_workers,
the pipeline is still marked as `ready`, and the thread will continue
running for a very short period of time, incrementing the failed reload
metric.
During this period of `pipeline.ready? == true` and `thread.alive? == true`,
the parent check code will observe all the necessary conditions to
consider the pipeline.run to be successful and thus increment the success
counter too. This failed reload can then result in both the success and
failure reload count being incremented.
This commit changes the parent thread check to use `pipeline.running?`
instead of `pipeline.ready?`. `running?` is the next logical state transition,
and it is only true if `start_workers` runs successfully.
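The check can be sketched as follows. `FakePipeline` and this standalone `start_pipeline` are simplified, hypothetical stand-ins for the real agent and pipeline code; they only model the `ready?` / `running?` transitions described above.

```ruby
# Hypothetical stand-in for a pipeline; only models the state transitions.
class FakePipeline
  def initialize(fail_in_start_workers:)
    @fail = fail_in_start_workers
    @ready = false
    @running = false
    @stop = Queue.new
  end

  def ready?; @ready; end
  def running?; @running; end

  def run
    @ready = true # set before the workers are started
    raise "start_workers failed" if @fail
    @running = true # only reached when the workers started
    @stop.pop       # block until shutdown is requested
  end

  def shutdown
    @stop << :stop
  end
end

# Returns true when the pipeline started, false when it aborted.
def start_pipeline(pipeline)
  thread = Thread.new do
    begin
      pipeline.run
    rescue StandardError
      # the real code would increment the failed reload counter here
    end
  end

  loop do
    return true if pipeline.running?   # checking ready? here would race
    return false unless thread.alive?
    sleep 0.01
  end
end
```

In the failing case the fake pipeline is briefly `ready?` while its thread is still alive, but it never becomes `running?`, so the loop can only report failure.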
Fixes #6566
re #6508.
- removed `acked_count`, `unacked_count`, and migrated `unread_count` to
top-level `events` field.
- removed `current_size_in_bytes` info from queue node stats
Fixes #6510
Record the wall clock time for each output. A new `duration_in_millis`
key will now be available for each output in the API located at http://localhost:9600/_node/stats
This commit also changes some expectations in the output_delegator_spec
that were not working as intended with the `have_received` matcher.
Fixes #6458
When we were initializing `duration_in_millis` in the batch we
were using a `Gauge` instead of a counter. Since all the objects have the
same signature, when we were actually recording the time the value was
replaced instead of incremented.
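The difference is easy to miss precisely because both metric types accept the same call. A small sketch, using simplified hypothetical `Gauge` and `Counter` classes with a shared `record` method (not the real Logstash metric classes):

```ruby
# Hypothetical, simplified metric types that share one method signature.
class Gauge
  attr_reader :value

  def initialize(value = 0)
    @value = value
  end

  # A gauge keeps only the latest value.
  def record(value)
    @value = value
  end
end

class Counter
  attr_reader :value

  def initialize(value = 0)
    @value = value
  end

  # A counter accumulates every recorded value.
  def record(value)
    @value += value
  end
end

# Records each batch duration and returns what the metric reports.
def total_duration(metric, batch_durations)
  batch_durations.each { |millis| metric.record(millis) }
  metric.value
end
```

For three batches of 10, 20 and 30 ms, the gauge reports 30 while the counter reports 60.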
Fixes #6465
We have moved the responsibility of watching the collector inside the
input itself. This feature might come back when we have a new execution
model that is better suited to watching metrics, but this would require
more granular watchers.
No tests were affected by this change since the code that required this
feature was already removed.
Fixes: #6447
Fixes #6456
When a plugin is loaded using the `plugins.path` option or comes from a
universal plugin, no gemspec can be found for that specific plugin.
We should not print any warning in that case.
Fixes: #6444
Fixes #6448
The metric store has no concept of whether a metric needs to exist, so as a rule
of thumb we need to define metrics with 0 values and send them to the
store when we initialize something.
This PR makes sure the batch object records the right default values.
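A minimal sketch of that rule of thumb. `MetricStore`, `BatchMetrics` and the key names are hypothetical and only illustrate why the zero values have to be written up front:

```ruby
# Hypothetical store: a metric only exists once something writes to it.
class MetricStore
  def initialize
    @values = {}
  end

  def increment(key, by = 1)
    @values[key] = @values.fetch(key, 0) + by
  end

  def snapshot
    @values.dup
  end
end

# Hypothetical batch object that registers its metrics on creation.
class BatchMetrics
  KEYS = [:in, :filtered, :out, :duration_in_millis].freeze

  def initialize(store)
    @store = store
    # Incrementing by 0 creates each key, so it shows up before any event.
    KEYS.each { |key| @store.increment(key, 0) }
  end
end
```

Without the initialization step, a snapshot taken before the first batch would simply lack those keys instead of reporting 0.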
Fixes: #6449
Fixes #6450
When Logstash runs inside a Linux container, we gather statistics about the cgroup and the
CPU usage. This information will be shown in the `/_node/stats` API and the result will look like this:
```
"os" : {
"cgroup" : {
"cpuacct" : {
"usage" : 789470280230,
"control_group" : "/user.slice/user-1000.slice"
},
"cpu" : {
"cfs_quota_micros" : -1,
"control_group" : "/user.slice/user-1000.slice",
"stat" : {
"number_of_times_throttled" : 0,
"time_throttled_nanos" : 0,
"number_of_periods" : 0
},
"cfs_period_micros" : 100000
}
}
}
```
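The message does not say how the `control_group` values are obtained. As an assumption for illustration, the sketch below derives them from text in the `/proc/self/cgroup` format, where each line is `hierarchy-ID:controller-list:path`. The helper name `parse_control_groups` is hypothetical.

```ruby
# Hypothetical helper; assumes input in the /proc/self/cgroup line format
# "hierarchy-ID:controller-list:path". Returns { controller => path }.
def parse_control_groups(text)
  groups = {}
  text.each_line do |line|
    # Limit the split to 3 fields so a path containing ":" stays intact.
    _id, controllers, path = line.chomp.split(":", 3)
    next if controllers.nil? || path.nil?

    # One line can name several controllers, e.g. "cpu,cpuacct".
    controllers.split(",").each do |controller|
      groups[controller] = path unless controller.empty?
    end
  end
  groups
end
```

On a Linux host this could be fed with `File.read("/proc/self/cgroup")`; the `cpu` and `cpuacct` entries would then correspond to the `control_group` fields in the response above.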
Fixes: #6252
Fixes #6357
This library provides a "log4j 1.2"-like API from the log4j2 library.
We don't seem to use this, and including it seems to be the cause of the
Logstash log4j input rejecting log4j 1.x's SocketAppender with this
message:
org.apache.log4j.spi.LoggingEvent; class invalid for deserialization
The origin of this error is that log4j2's log4j-1.2-api defines
LoggingEvent without `implements Serializable`.
This commit also includes regenerated gemspec_jars.rb and
logstash-core_jars.rb.
Reference: https://github.com/logstash-plugins/logstash-input-log4j/issues/36
Fixes #6309
add queue.max_acked_checkpoint and queue.checkpoint_rate settings
now using checkpoint.max_acks, checkpoint.max_writes and checkpoint.max_interval
rename options
wip rework checkpointing
refactored full acked pages handling on acking and recovery
correctly close queue
proper queue open/recovery
checkpoint dump utility
checkpoint on writes
removed debug code and added missing newline
added better comment on contiguous checkpoints
fix spec for new pipeline setting