Commit graph

430 commits

Author SHA1 Message Date
Colin Surprenant
154ce65c3a add queue drain option support
wip queue drain option

metrics on empty batches

reenabled spec

cosmetic fixes

stats collection, mutex, specs, empty batch handling

start_metrics
2017-03-01 14:14:36 -05:00
Joao Duarte
c5b7cbeacc introduce locking in path.data
Fixes #6738
2017-02-24 05:27:10 -05:00
Colin Surprenant
8c000445c2 namespace memory checkpoints for multiple queues support
memory checkpoint IO need the purge() method
2017-02-22 10:59:31 -05:00
Suyog Rao
8a8f04ab1d Add tests for deep nested env variables 2017-02-21 14:16:24 -08:00
emile
de289435bb add deep environment variables replacement in configuration 2017-02-21 14:16:21 -08:00
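A rough illustration of what "deep" replacement means in de289435bb: placeholders are resolved at every nesting level of the configuration, not only in top-level strings. This is a minimal Python sketch, not Logstash's Ruby implementation; the `${NAME}` / `${NAME:default}` grammar and the `substitute` helper are assumptions made for the example.

```python
import re

# Matches ${NAME} or ${NAME:default}; the exact grammar is an assumption.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def substitute(value, env):
    """Recursively replace placeholders in nested dicts, lists and strings."""
    if isinstance(value, dict):
        return {key: substitute(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, env) for item in value]
    if isinstance(value, str):
        def lookup(match):
            name, default = match.group(1), match.group(2)
            if name in env:
                return env[name]
            if default is not None:
                return default
            raise KeyError(f"environment variable {name} is not set")
        return _PLACEHOLDER.sub(lookup, value)
    return value  # numbers, booleans and None pass through unchanged
```

Passing the environment as a plain mapping (for example `dict(os.environ)`) keeps the helper easy to test.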
Colin Surprenant
5d6848f840 memory mapped buffer cleaner
resource path

add bytebuffer cleaner

added comments
2017-02-21 15:11:23 -05:00
Colin Surprenant
6838cdb588 use queue path in memory acked queue to namespace .lock file 2017-02-20 17:36:07 -05:00
Andrew Cholakian
9a130f2de1 Setting --path.data on CLI should also change path.queue
This change was harder than it first appeared! Due to the complicated
interactions between our Setting class and our monkey-patched Clamp
classes this required adding some new hooks into various places to
properly intercept the settings at the right point and set this
dynamically.

Crucially, this only changes path.queue when the user has *not*
overriden it explicitly in the settings.yml file.

Fixes #6378 and #6387

Fixes #6731
2017-02-17 16:43:07 -05:00
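The rule described in 9a130f2de1 can be sketched as follows: `path.queue` follows `path.data` unless the user set it explicitly. This Python sketch is illustrative only; the helper name `resolve_queue_path`, the `"data"` fallback and the `queue` subdirectory name are assumptions, and the real change lives in Logstash's Ruby settings code.

```python
import os


def resolve_queue_path(cli_path_data, yaml_settings, default_data="data"):
    """Return (path.data, path.queue) after applying a CLI --path.data override."""
    path_data = cli_path_data or yaml_settings.get("path.data", default_data)
    if "path.queue" in yaml_settings:
        # An explicit user choice wins, even when path.data has moved.
        path_queue = yaml_settings["path.queue"]
    else:
        # Otherwise the queue directory is derived from path.data.
        path_queue = os.path.join(path_data, "queue")
    return path_data, path_queue
```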
Suyog Rao
8515bbc8e7 Don't show config content when reload is not triggered
Fixes #6720
2017-02-16 16:43:57 -05:00
Pier-Hugues Pellerin
08827dc032 Collecting the ThreadCount and the PeakThreadCount should not trigger a threaddump
When we use the JRMonitor library to get information about the running
threads, it triggers a thread dump to get the stacktrace information. This is OK when
we do a direct call to the `hot_threads` API, but in the context of the
periodic poller this would mean that the threads need to be stopped to
generate their current stacktrace, which could significantly slow down logstash.

This PR uses the **ThreadMXBean** but only calls `#getThreadCount` and `#getPeakThreadCount`. These two calls won't generate a thread dump and won't block the current
threads.

**To test**, add the following options to your `config/jvm.options`, let logstash run for a few minutes so that the periodic poller runs a few iterations, then stop logstash; you will see the report.

```
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
```

Fixes: #6603

Fixes #6705
2017-02-14 13:13:59 -05:00
Colin Surprenant
971537c27e fix bad head positioning on recovery & tests 2017-02-14 11:09:13 -05:00
Colin Surprenant
f323e68ac7 cleanup partially initialized pipeline
DRY the plugin registration

typo

log exception message
2017-02-14 11:04:39 -05:00
Colin Surprenant
46e1068af6 support exclusive locking of PQ dir access
fix agent and pipeline and specs for queue exclusive access

added comments and swapped all sleep 0.01 to 0.1

revert explicit pipeline close in specs using sample helper

fix multiple pipelines specs

use BasePipeline for config validation which does not instantiate a new queue

review modifications

improve queue exception message
2017-02-10 17:38:03 -05:00
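For readers unfamiliar with the idea behind 46e1068af6, exclusive directory access is commonly enforced with a non-blocking lock on a `.lock` file inside the directory. The Python sketch below shows that general technique on POSIX systems using `fcntl.flock`; the `DirLock` class is hypothetical and is not the locking class added to Logstash.

```python
import fcntl
import os


class DirLock:
    """Hold an exclusive, non-blocking lock on <directory>/.lock (POSIX only)."""

    def __init__(self, directory):
        self.path = os.path.join(directory, ".lock")
        self.fd = None

    def acquire(self):
        fd = os.open(self.path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            raise RuntimeError(f"{self.path} is locked by another owner")
        self.fd = fd

    def release(self):
        if self.fd is not None:
            fcntl.flock(self.fd, fcntl.LOCK_UN)
            os.close(self.fd)
            self.fd = None
```

A second `DirLock` on the same directory fails fast in `acquire()` until the first holder calls `release()`, which is the behaviour a queue needs to refuse a second opener.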
Suyog Rao
63d7520494 Better asciidoc formatting and text for ID option
Fixes #6679
2017-02-09 14:39:21 -05:00
Suyog Rao
cb9e8cf7c9 Revert to 5.4.0 version instead of 5.5.0 (#6672) 2017-02-08 20:33:32 -08:00
Suyog Rao
530c5fc43d Bump to v5.5 (#6670)
* Bump to v5.5
2017-02-08 19:38:07 -08:00
TAC
7abd136d00 Fix document format about plugin id
Fixes #6636
2017-02-08 14:39:19 -05:00
Jordan Sissel
2957c8f150 Do not show the configuration content anymore. Plugin validation errors and other config problems are more specifically logged before we get to this point, and showing the full config can too easily obscure a more actionable 'you used an invalid setting' kind of error.
Fixes #6654
2017-02-07 19:27:10 -05:00
Joao Duarte
acdd400e97 add setting class that coerces value to array
Fixes #6630
2017-02-07 06:42:57 -05:00
Joao Duarte
ffa9c710a2 make SafeURI class clone deeper
Fixes #6645
2017-02-06 17:29:26 -05:00
Colin Surprenant
eba90f911d refactor agent pipeline reloading to avoid double live pipelines with same settings
extracted BasePipeline class to support complete config validation

minor review changes

added comment
2017-02-03 18:17:47 -05:00
Colin Surprenant
6ba6bb4037 support releaseLock
added usage comment
2017-02-03 13:24:16 -05:00
Joao Duarte
4d28c401f0 fix shutdown watcher reports
since 5.x introduced log4j2 as the main logging mechanism, it's
necessary to be more explicit when logging complex objects.

In this case we tell the logger to use the .to_s version of the Snapshot
report generated by the Watcher.
The Snapshot#to_s calls .to_simple_hash.to_s

Fixes #6628
2017-02-02 10:28:48 -05:00
Colin Surprenant
193022c97b exclusive file locking class and tests
use new bin/ruby and bundler without :development

refactor to DRY and use expected exception

added original Apache 2.0 license and some cosmetics

exclude bin/lock from packaging

rename variables
2017-01-31 13:50:24 -05:00
Colin Surprenant
7c51f46982 optimistic recovery of queue head page on open
recover method wip

recover method wip

recovered length 0 is invalid

fix firstUnackedSeqNum

DRYied open and recover

DRYed and refactored to extract AbstractByteBufferPageIO from ByteBufferPageIO and MmapPageIO

better exception messages

cleanup and remove subject idiom

rename _mapFile to mapFile

added invalid state recovery tests

use log4j

renamed TestSettings methods to improve readability

duplicate code

add version check

typo

use test exceptions annotation

use parametrized tests

added uncheck() method to clean test stream

add better message todo

proper javadoc comment

typo
2017-01-31 11:30:47 -05:00
Joao Duarte
f5601ecbd7 propagate pipeline.id to api resources
this removes explicit references to the "main" pipeline,
using instead the value of the `pipeline.id` from LogStash::SETTINGS

Fixes #6606
2017-01-30 12:46:20 -05:00
Joao Duarte
d6d0c672ae ensure pipeline.id is correctly propagated
Fixes #6530
2017-01-26 11:15:36 -05:00
Joao Duarte
b4c7c97452 include pipeline id in queue path
create queue sub directory based on pipeline id

Fixes #6540
2017-01-26 10:27:04 -05:00
Pier-Hugues Pellerin
f2486324af missing specs for the refactoring
Fixes #6499
2017-01-25 08:55:47 -05:00
Pier-Hugues Pellerin
6f9fb96818 Refactor non_reloadable_plugin
Instead of using a list of non-reloadable plugins, we add a new class
method on the base plugin class that each plugin can override.

By default we assume that all plugins are reloadable, only the stdin
shouldn't be reloadable.

Fixes #6499
2017-01-25 08:55:46 -05:00
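The design in 6f9fb96818 replaces a central deny-list with a class-level predicate that defaults to "reloadable". A minimal Python sketch of that pattern follows; the class and function names are hypothetical stand-ins for the Ruby plugin classes.

```python
class Plugin:
    """Base class: every plugin is assumed reloadable unless it says otherwise."""

    @classmethod
    def reloadable(cls):
        return True


class Stdin(Plugin):
    """stdin cannot be reopened safely, so it opts out."""

    @classmethod
    def reloadable(cls):
        return False


class Generator(Plugin):
    """Inherits the default and stays reloadable."""


def pipeline_reloadable(plugins):
    # A pipeline can be reloaded only if every one of its plugins allows it.
    return all(type(plugin).reloadable() for plugin in plugins)
```

With this shape, adding a new non-reloadable plugin means overriding one method in that plugin rather than editing a shared list.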
Joao Duarte
d73a2a90d4 use running instead of ready in some agent specs
the pipeline class has two state predicates: ready? and running?

ready? becomes true after `start_workers` terminates (successfully or not)
running? becomes true before calling `start_flusher`, which means that
`start_workers` is guaranteed to have terminated successfully

Whenever possible, we should use `running?` instead of `ready?` in the
spec setup blocks. The only place where this may be bad is when the
pipeline execution is short lived (e.g. generator w/small count) and the
spec may never observe pipeline.running? == true

Fixes #6574
2017-01-24 05:29:35 -05:00
Joao Duarte
0d47a362d0 fix return value of start_pipeline when start_workers fails
during Agent#start_pipeline a new thread is launched that executes
a pipeline.run and a rescue block which increments the failed reload counter

After launching the thread, the parent thread will wait for the pipeline
to start, or detect that the pipeline aborted, or sleep and check again.

There is a bug that, if the pipeline.run aborts during start_workers,
the pipeline is still marked as `ready`, and the thread will continue
running for a very short period of time, incrementing the failed reload
metric.

During this period of `pipeline.ready? == true` and `thread.alive? == true`,
the parent check code will observe all the necessary conditions to
consider the pipeline.run to be successful and thus increment the success
counter too. This failed reload can then result in both the success and
failure reload count being incremented.

This commit changes the parent thread check to use `pipeline.running?`
instead of `pipeline.ready?` which is the next logical state transition,
and ensures it is only true if `start_workers` runs successfully.

Fixes #6566
2017-01-20 06:47:14 -05:00
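The difference between the two predicates discussed in 0d47a362d0 can be shown with a simplified model. In this Python sketch (an illustration, not the Logstash code) `ready` is set whether or not worker startup succeeds, while `running` is set only after it succeeds; the `fail_on_start` flag is a hypothetical switch for the demo, and `run` returns immediately instead of blocking for the pipeline's lifetime.

```python
import threading


class Pipeline:
    def __init__(self, fail_on_start=False):
        self.fail_on_start = fail_on_start
        self.ready = threading.Event()
        self.running = threading.Event()

    def start_workers(self):
        try:
            if self.fail_on_start:
                raise RuntimeError("worker startup failed")
        finally:
            self.ready.set()  # set whether or not startup succeeded

    def run(self):
        self.start_workers()
        self.running.set()  # only reached when start_workers succeeded


def start_pipeline(pipeline):
    """Return True only if the pipeline reached the running state."""
    failures = []

    def target():
        try:
            pipeline.run()
        except RuntimeError as error:
            failures.append(error)

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    # Checking `ready` here would report success for a failed start.
    return pipeline.running.is_set() and not failures
```

A pipeline built with `fail_on_start=True` ends with `ready` set and `running` unset, which is exactly the window in which a `ready`-based check would have counted a success.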
Joao Duarte
705ad54443 improve user facing error when yaml is incorrect
Fixes #6546
2017-01-19 12:14:21 -05:00
Pier-Hugues Pellerin
fdc4c15e46 review comments
Fixes #6498
2017-01-16 12:28:37 -05:00
Pier-Hugues Pellerin
8e57042e25 Extract the creation of the queue into a factory
I've moved the initialization code into a factory for future work on the
pipeline.

Fixes #6498
2017-01-16 12:28:37 -05:00
Suyog Rao
c9391e964e Renamed cgroups metric to number_of_elapsed_periods
Renamed number_of_periods to number_of_elapsed_periods to be consistent with ES.

Fixes #6536
2017-01-16 12:20:16 -05:00
Tal Levy
bca87bae40 remove current_size_in_bytes and acked info from node stats
re #6508.

- removed `acked_count`, `unacked_count`, and migrated `unread_count` to
top-level `events` field.
- removed `current_size_in_bytes` info from queue node stats

Fixes #6510
2017-01-09 19:19:28 -05:00
Suyog Rao
9fc093e484 Bump version to 5.3.0 for 5.x branch (#6483) 2017-01-04 08:54:42 -08:00
Mykola Shestopal
4c4330ec2f Fixed calculation of took_in_millis for #6476
Fixes #6481
2017-01-04 11:43:56 -05:00
Colin Surprenant
8c4d8b83fc avoid resetting nonexistent tags field back to empty array plus specs
Fixes #6477
2017-01-03 22:31:31 -05:00
Pier-Hugues Pellerin
94a67cb4e0 Record the execution time for each output in the pipeline
Record the wall clock time for each output. A new `duration_in_millis`
key will now be available for each output in the API located at http://localhost:9600/_node/stats

This commit also changes some expectations in the output_delegator_spec
that were not working as intended with the `have_received` matcher.

Fixes #6458
2016-12-29 12:51:41 -05:00
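As a rough picture of what recording wall clock time per output involves (94a67cb4e0), the Python sketch below wraps an output callable and accumulates elapsed milliseconds. The `TimedOutput` wrapper and its injectable `clock` parameter are hypothetical; Logstash does this in its Ruby output delegator.

```python
import time


class TimedOutput:
    """Wrap an output callable and accumulate its wall clock time."""

    def __init__(self, output, clock=time.monotonic):
        self.output = output
        self.clock = clock
        self.duration_in_millis = 0

    def multi_receive(self, events):
        start = self.clock()
        try:
            return self.output(events)
        finally:
            # Accumulate even when the output raises.
            self.duration_in_millis += int((self.clock() - start) * 1000)
```

Injecting the clock keeps the wrapper deterministic in tests while defaulting to a monotonic clock in normal use.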
Pier-Hugues Pellerin
7715fabdc2 Initialize the metric values in the batch to the correct type
When we were initializing the `duration_in_millis` in the batch we
were using a `Gauge` instead of a counter. Since all the objects have the
same signature, when we were actually recording the time the value was
replaced instead of incremented.

Fixes #6465
2016-12-29 11:12:32 -05:00
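The bug fixed in 7715fabdc2 is easy to reproduce in miniature: two metric types share a call signature, so using the wrong one fails silently. In this Python sketch the shared `execute` method is a hypothetical name chosen for the illustration.

```python
class Gauge:
    """Keeps only the most recent value."""

    def __init__(self, value=0):
        self.value = value

    def execute(self, amount):
        self.value = amount  # replaces the previous value


class Counter:
    """Accumulates every value it is given."""

    def __init__(self, value=0):
        self.value = value

    def execute(self, amount):
        self.value += amount  # adds to the previous value


def record_durations(metric, durations):
    # The caller cannot tell the two types apart from the call alone.
    for duration in durations:
        metric.execute(duration)
    return metric.value
```

Feeding the durations 5, 7 and 3 into a `Gauge` leaves 3, while a `Counter` ends at 15, which is the total a `duration_in_millis` metric is meant to report.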
Pier-Hugues Pellerin
8cd4ae5319 Add a test to make sure the Collector#snapshot_metric returns a cloned metric store.
Fixes #6456
2016-12-29 08:59:53 -05:00
Pier-Hugues Pellerin
6250974508 Code cleanup for the collector observer
We have moved the responsibility of watching the collector inside the
input itself. This feature might come back when we have a new execution
model that can be improved in watching metrics, but this would require
more granular watchers.

No tests were affected by these changes since the code that required that
feature was already removed.

Fixes: #6447

Fixes #6456
2016-12-29 08:59:52 -05:00
Colin Surprenant
23e9d910d1 move options as constant
Fixes #6430
2016-12-23 14:34:43 -05:00
Colin Surprenant
1d156aaf14 add explicit fsync on checkpoint write
Fixes #6430
2016-12-23 14:34:43 -05:00
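An explicit fsync, as added in 1d156aaf14, matters because a completed write call only guarantees that the bytes reached the operating system's cache. The Python sketch below shows the general pattern; `write_checkpoint` is a hypothetical helper, and the actual change is in Logstash's Java checkpoint IO.

```python
import os


def write_checkpoint(path, payload: bytes):
    """Write payload and force it to stable storage before returning."""
    with open(path, "wb") as handle:
        handle.write(payload)
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to push its cache to the device
```

Without the `os.fsync` call, a crash shortly after the write could leave a checkpoint that never reached the disk.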
Pier-Hugues Pellerin
cc5b283035 Do not log a warning if a plugin is not from in #print_notice_version
When a plugin is loaded using the `plugins.path` option or is from a
universal plugin, no gemspec can be found for the specific plugin.

We should not print any warning in that case.

Fixes: #6444

Fixes #6448
2016-12-21 09:44:37 -05:00
Pier-Hugues Pellerin
ad763c81a3 Initialize with default values global events and pipeline events related metric
The metric store has no concept of whether a metric needs to exist, so as a rule
of thumb we need to define them with 0 values and send them to the
store when we initialize something.

This PR makes sure the batch object is recording the right default values

Fixes: #6449

Fixes #6450
2016-12-21 09:43:29 -05:00
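The rule of thumb in ad763c81a3 can be illustrated with a store that only knows the keys written to it: unless defaults are registered up front, a metric that has not fired yet is simply absent from a snapshot. This Python sketch uses a hypothetical `MetricStore`, and the key names are examples rather than a statement of Logstash's full metric set.

```python
class MetricStore:
    """A store that has no notion of which metrics ought to exist."""

    def __init__(self):
        self._values = {}

    def increment(self, key, amount=1):
        self._values[key] = self._values.get(key, 0) + amount

    def snapshot(self):
        return dict(self._values)  # a copy, so callers cannot mutate the store


# Example key names; treat them as assumptions for this sketch.
EVENT_KEYS = ("in", "filtered", "out", "duration_in_millis")


def register_defaults(store, namespace):
    # Incrementing by 0 creates the key without changing its value.
    for key in EVENT_KEYS:
        store.increment((namespace, key), 0)
```

Calling `register_defaults` at initialization time means API consumers see a zero rather than a missing field before the first event arrives.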
Joao Duarte
90c364e903 ensure metric collection is disabled when metric.collect is false
Fixes #6445
2016-12-21 07:20:23 -05:00
Joao Duarte
a9e474f0e1 add tests for webserver metric
Fixes #6385
2016-12-19 12:43:25 -05:00