Commit graph

10505 commits

Author SHA1 Message Date
Andres Rodriguez
ef6852b687
DRA: uploading missing docker-build-context files (#14722) 2022-11-02 16:35:45 -04:00
Ry Biesemeyer
372a61219f
Fix pipelines yaml loading (#14713)
* source/multilocal: fix detection of empty pipelines.yml

Fixes a regression introduced in elastic/logstash#13883 in which the presence
of an empty `pipelines.yml` file produces an error message indicating that
the file cannot be read.

When either `YAML::load` or `YAML::safe_load` encounters an effectively-empty
payload (such as one that is entirely comments), it uses a `fallback` param
to determine what value to emit, with the former emitting `false` and the
latter emitting `nil`.

This is problematic because a _separate_ blind-`rescue nil` causes `nil` to
be bound to the MultiLocal's `@detected_marker`, and we assume that a `nil`
value in the marker means that there was an exception reading the file (such
as a permissions issue or parse failure).

By providing a `fallback: false` directive when parsing the contents, we
ensure that an empty file is reported as such.

* source/multilocal: avoid `rescue nil` that loses helpful context

When the pipelines yaml cannot be read, or can be read but fails to parse,
the MultiLocal#read_pipelines_from_yaml emits a helpful exception including
specifics about why it failed to load or parse, but a blind `rescue nil`
here causes that helpful information to be lost.

When pipeline detection is exceptional, hold onto the helpful exception
so that it can be reported along with the config conflicts.

* source/multilocal: differentiate between reading and parsing failure

* source/multilocal: use translations for conflict messages

* source/multilocal: specs for error conditions
2022-11-02 11:05:01 -07:00
Ry Biesemeyer
9460d4d7fc
specs: assert presence of logging without risking NoMethodError (#14633)
* specs: assert presence of logging without risking NoMethodError

* Update qa/integration/specs/monitoring_api_spec.rb
2022-11-02 11:01:59 -07:00
kaisecheng
0c1dcc2334
[Doc] k8s troubleshooting (#14606)
doc for k8s troubleshooting and common issues

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
2022-10-28 14:54:51 +01:00
Andres Rodriguez
dd399b62b2
Update add-docs-preview-link.yml (#14710)
Adds docsk8s/** to build preview links.
2022-10-27 11:09:10 -04:00
kaisecheng
cf54386d01
update release version (#14709) 2022-10-26 17:55:08 +01:00
kaisecheng
08072aae0b
update release file 7.17.8 (#14692) 2022-10-26 17:44:45 +01:00
Andres Rodriguez
2e8bd20cf5
DRA - Fix docker image build (#14706)
Fix the docker image building and upload process:
 * Builds ubi8 on x86_64.
 * Uploads ironbank and ubi8 context files from x86_64 only.
2022-10-26 11:20:50 -04:00
Andrea Selva
6ad5690a8c
Adds upload of missed docker-build-context.tar.gz artifacts (#14703)
Updates the dra_docker.sh script to also upload docker-build-context.tar.gz files
2022-10-26 12:15:55 +02:00
Andres Rodriguez
17d0bb5ffb
DRA - Fix error reporting (#14698)
Ensures the DRA build script surfaces a rake error, instead of allowing the build to continue.

This ensures that the build doesn't continue if any of the steps fails.

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
2022-10-25 17:00:09 -04:00
Andres Rodriguez
6ba5cc112f
DRA - generalize docker image building (#14670)
* Generalize docker image building
* Rename and add ability to pass the architecture as a parameter
* Handle ARCH env variable
2022-10-24 12:50:11 -04:00
Andres Rodriguez
9584d1332b
DRA - fix dra_upload syntax, breaking builds (#14685)
Fix dra_upload.sh syntax that's breaking the build.
2022-10-22 13:02:34 -04:00
Andrea Selva
9c7b7b7454
[DRA] Don't download Darwin aarch64 for 7.17 (#14677)
Version 7.17 doesn't generate Darwin aarch64 artifacts. Don't download these artifacts from the GCS bucket, given that we don't build Darwin for that release.
2022-10-20 16:30:41 +02:00
Andres Rodriguez
86a18e6e3f
Exclude jruby's bundler from artifacts (#14667)
Exclude JRuby's bundler and rake from the built artifacts. The artifacts don't need to ship with these dependencies. Also, Logstash bundles its own bundler for plugin management; it does not use the one shipped with JRuby.

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
2022-10-18 11:13:02 -04:00
Andres Rodriguez
393460025e
Fix sourcing on dra_upload (#14659)
Fix sourcing on dra_upload.sh
2022-10-17 12:55:47 -04:00
Andres Rodriguez
8bfc7ef164
Fix dra_common sourcing (#14657)
Fixes the sourcing of dra_common.sh. It now first checks the directory of the file from which dra_common.sh is being sourced. This allows the common script to be sourced regardless of the directory the sourcing script is called from.
2022-10-17 11:09:56 -04:00
Andres Rodriguez
5b1d53622c
DRA: Improve shell scripts for debuggability (#14654)
The changes remove some code duplication by introducing a common file that can be sourced between all scripts. It also improves debuggability by adding better messages.
2022-10-17 10:23:26 -04:00
Mashhur
f19e9cb647
Collect queue growth events and bytes metrics when PQ is enabled. (#14554)
* Collect growth events and bytes metrics if PQ is enabled: Java changes.

* Move queue flow under queue namespace.

* Pipeline level PQ flow metrics: add unit & integration tests.

* Include queue info in node stats sample.

* Apply suggestions from code review

Change uptime precision for PQ growth metrics to uptime seconds since PQ events are based on seconds.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Add safeguard when using lazy delegating gauge type.

* flow metrics: simplify generics of lazy implementation

Enables interface `FlowMetrics::create` to take suppliers that _implement_
a `Metric<? extends Number>` instead of requiring them to be pre-cast, and
avoid unnecessary exposure of the metrics value-type into our lazy init.
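
A minimal sketch of why the wildcard bound helps. `Metric<T>`, `LongCounter`, `DoubleGauge`, and `ratio` are hypothetical stand-ins, not the Logstash types or `FlowMetrics::create`: a parameter typed `Supplier<? extends Metric<? extends Number>>` accepts a `Supplier<LongCounter>` as-is, whereas a parameter typed `Supplier<Metric<Number>>` would reject it and force a cast.

```java
import java.util.function.Supplier;

public class WildcardSketch {
    // Hypothetical stand-in for a metric that exposes a typed value.
    interface Metric<T> {
        T getValue();
    }

    static final class LongCounter implements Metric<Long> {
        private final long value;
        LongCounter(long value) { this.value = value; }
        public Long getValue() { return value; }
    }

    static final class DoubleGauge implements Metric<Double> {
        private final double value;
        DoubleGauge(double value) { this.value = value; }
        public Double getValue() { return value; }
    }

    // Accepts suppliers of any metric whose value is some Number subtype,
    // so callers never have to pre-cast to Metric<Number>.
    static double ratio(Supplier<? extends Metric<? extends Number>> numerator,
                        Supplier<? extends Metric<? extends Number>> denominator) {
        double num = numerator.get().getValue().doubleValue();
        double den = denominator.get().getValue().doubleValue();
        return num / den;
    }

    public static void main(String[] args) {
        Supplier<LongCounter> events = () -> new LongCounter(10L);
        Supplier<DoubleGauge> seconds = () -> new DoubleGauge(4.0);
        System.out.println(ratio(events, seconds)); // 2.5
    }
}
```

Keeping the value type behind `? extends Number` also means the caller's concrete metric type never leaks into the consuming code.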

* flow metrics: use lazy init for PQ gauge-based metrics

* noop: use enum equality

Avoids routing two enum values through `MetricType#toString()`
and `String#equals()` when they can be compared directly.
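
A tiny illustration, assuming a hypothetical `MetricType` enum with made-up constants: enum constants are singletons, so `==` gives the same answer as comparing their default `toString()` values, skips the string round-trip, and does not throw when one side is `null`.

```java
public class EnumEqualitySketch {
    // Hypothetical stand-in for a metric-type enum; the constants are illustrative.
    enum MetricType { COUNTER_LONG, GAUGE_NUMBER, FLOW_RATE }

    // The indirect pattern: route both values through String before comparing.
    static boolean viaStrings(MetricType a, MetricType b) {
        return a.toString().equals(b.toString());
    }

    // Direct enum equality: reference comparison is enough and is null-safe.
    static boolean viaIdentity(MetricType a, MetricType b) {
        return a == b;
    }
}
```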

* Apply suggestions from code review

Optional.ofNullable used for safe return. Doc includes real tested expected metric values.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* flow metrics: make lazy-init wrapper inherit from AbstractMetric

this allows the Jackson serialization annotations to work

* flow metrics: move pipeline queue-based flows into pipeline flow namespace

* Follow up for moving PQ growth metrics under pipeline.*.flow.
- Unit and integration tests are added or fixed.
- Documentation added along with sample response data

* flow: pipeline pq flow rates docs

* Do not expect flow in the queue section of API. Metrics moved to flow section.

Update logstash-core/spec/logstash/api/commands/stats_spec.rb

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Integration test failure fix.

Mistake: `flow_status` should be `pipeline_flow_stats`

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Integration test failures fix.

Number should be Numeric in the ruby specs.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Make CI happy.

* api specs: use PQ only where needed

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
2022-10-13 15:30:31 -07:00
Andres Rodriguez
db6a7bc619
DRA: Handle env variables better (#14644)
* DRA: Handle env variables better
* Moved the addition of SNAPSHOT suffix to the version after the VERSION_QUALIFIER
* Fix badly assigned variable: the version qualifier must also be appended to PLAIN_STACK_VERSION, not to RELEASE_VER

Co-authored-by: andsel <selva.andre@gmail.com>
2022-10-13 10:38:55 -04:00
Andrea Selva
cb76c685b7
Follow up PR of #14645, adds version qualifier to the plain version variable (#14646) 2022-10-13 11:47:30 +02:00
Andrea Selva
b8792107ad
Avoid passing the SNAPSHOT particle in the version given to release-manager (#14645)
The version passed to the release-manager doesn't need the SNAPSHOT particle because it is already handled by --workflow="snapshot"; if included, it makes the release manager search for artifacts named like 8.5.0-SNAPSHOT-SNAPSHOT.
2022-10-13 10:24:34 +02:00
Andres Rodriguez
ad71ff24c8
Disable -x in dra build scripts (#14643) 2022-10-12 17:46:27 -04:00
Andres Rodriguez
bfaa063280
Enable debug for DRA shells scripts (#14642) 2022-10-12 16:22:30 -04:00
Andres Rodriguez
363adad3b6
dra_upload.sh: Leave artifacts under build/ (#14639)
Do not move artifacts out of the build/ folder, to ensure the upload doesn't fail.
2022-10-12 14:36:57 -04:00
João Duarte
00a7ae8a75
fix PipelineIR.getPostQueue by accounting for vertex copies (#13621)
During graph composition vertices may be copied. This caused
getPostQueue to malfunction as the QueueVertex object stored in the
PipelineIR isn't the one present in the graph once it's fully generated.

This object mismatch caused Graph.getSortedVerticesBetween to not find
the QueueVertex since it takes Objects instead of ids.

This commit waits for the graph to be built and then retrieves the
QueueVertex from the graph and sets it in PipelineIR.
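
A minimal sketch of the identity mismatch, using hypothetical `Vertex` and `Graph` classes (not the PipelineIR types) that compare by identity and copy vertices during composition: the reference held before the graph is built is not in the built graph, so it has to be looked up again by id once the graph is complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class VertexCopySketch {
    // Hypothetical vertex relying on default (identity) equality.
    static final class Vertex {
        final String id;
        Vertex(String id) { this.id = id; }
        Vertex copy() { return new Vertex(id); }
    }

    // Hypothetical graph whose composition step stores copies of its inputs.
    static final class Graph {
        final List<Vertex> vertices = new ArrayList<>();

        Graph(List<Vertex> inputs) {
            for (Vertex v : inputs) {
                vertices.add(v.copy());
            }
        }

        Optional<Vertex> findById(String id) {
            return vertices.stream().filter(v -> v.id.equals(id)).findFirst();
        }
    }

    // Re-resolves a pre-build reference against the fully built graph.
    static Vertex resolve(Graph graph, Vertex original) {
        return graph.findById(original.id)
                .orElseThrow(() -> new IllegalStateException("vertex not in graph: " + original.id));
    }
}
```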

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
2022-10-12 16:19:05 +01:00
Andrea Selva
11ecaaea5a
Fix/dra use another technique to extract branch name (#14636)
Avoid relying on local git commands to guess the current branch; instead, list the branches and check them against the stack version. If no matching branch exists, use main.
2022-10-12 14:51:47 +02:00
Andrea Selva
63d5658015
Re-added execution rights to dra_upload.sh (#14626)
Re-apply execution permission to DRA upload script
2022-10-12 09:02:23 +02:00
Ry Biesemeyer
bab2e1c03e
timestamp: respect locale's decimal-style when parsing (#14628)
Uses the locale-defined decimal style first.

When encountering a failure and the locale-defined decimal style is NOT
the "standard" decimal style, retry the parse operation with the "standard"
decimal style.
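
A minimal sketch of the locale-first, standard-fallback parse with `java.time`. The class name and the date, `T`, time, optional-fraction pattern are assumptions for illustration, not the Logstash `Timestamp` parser; `DateTimeFormatter#withDecimalStyle`, `DecimalStyle.of(Locale)`, and `DecimalStyle.STANDARD` are standard JDK APIs.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.DecimalStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

public class LocaleAwareTimestampParser {
    // Assumed pattern: ISO date, 'T', time, then an optional fraction whose
    // separator character comes from the formatter's DecimalStyle.
    private static final DateTimeFormatter BASE = new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE)
            .appendLiteral('T')
            .appendPattern("HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter(Locale.ROOT);

    public static LocalDateTime parse(String text, Locale locale) {
        DecimalStyle localeStyle = DecimalStyle.of(locale);
        try {
            // First attempt: the locale-defined decimal style.
            return LocalDateTime.parse(text, BASE.withDecimalStyle(localeStyle));
        } catch (DateTimeParseException e) {
            // Retry only when the locale's style differs from the standard one.
            if (localeStyle.equals(DecimalStyle.STANDARD)) {
                throw e;
            }
            return LocalDateTime.parse(text, BASE.withDecimalStyle(DecimalStyle.STANDARD));
        }
    }
}
```

With `Locale.GERMANY`, whose decimal separator is a comma, both `2022-10-11T15:29:56,123` and `2022-10-11T15:29:56.123` parse to the same value; for a locale whose style already equals `DecimalStyle.STANDARD`, a failure is rethrown without a retry.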
2022-10-11 15:29:56 -07:00
Andrea Selva
ff8afb2293
Switch branch selector from major.minor to read the current branch name (#14619)
Switch branch selector from major.minor to read the current branch name
2022-10-11 18:59:28 +02:00
Ry Biesemeyer
de49eba22a
api: source pipelines that are fully-loaded (#14595)
* specs: detangle out-of-band pipeline initialization

Our API tests were initializing their pipelines-to-test in an out-of-band
manner that prevented the agent from having complete knowledge of the
pipelines that were running. By providing a ConfigSource to our Agent's
SourceLoader, we can rely on the normal pipeline reload behaviour to ensure
that the agent fully-manages the pipelines in question.

* api: do not emit pipeline that is not fully-initialized
2022-10-11 08:14:00 -07:00
Andrea Selva
d8d690079a
Updates DRA scripts to build snapshot artifacts (#14600)
Handle the WORKFLOW_TYPE environment variable, used to select the kind of artifacts to generate, and adapt the version name accordingly.
If WORKFLOW_TYPE is set to any value other than the empty string, it is assumed to be snapshot and snapshot artifacts are generated; otherwise, release artifacts are generated.

Co-authored-by: kaisecheng <69120390+kaisecheng@users.noreply.github.com>
2022-10-11 16:32:39 +02:00
Andrea Selva
3075029b27
DRA - Update scripts to use the version qualifier in stack_version (#14589)
Update DRA scripts to use the version qualifier in stack_version variable for alpha and beta builds

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
2022-10-10 18:08:16 +02:00
Andrea Selva
d07eb01e23
Adds new close method to Java's Filter API, used to cleanly shut down resources allocated by the filter during the registration phase. (#14485)
- Adds a new method to the public API interface
- Pass the call through the JavaFilterDelegatorExt

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
2022-10-10 17:35:29 +02:00
kaisecheng
8a8a036896
Fix DLQ fails to start due to read 1 byte file (#14605)
This commit ignores DLQ files that contain only the version number. These files have no content and should be skipped.
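
A minimal sketch of the skip rule, assuming each segment file starts with a single version byte (as the commit title's "1 byte file" suggests), so a segment no larger than that holds no events. `DlqSegmentFilterSketch`, `hasEvents`, and `readableSegments` are hypothetical names, not the Logstash DLQ reader API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class DlqSegmentFilterSketch {
    // Assumption: every segment begins with a one-byte version marker.
    static final long VERSION_BYTES = 1;

    // A segment whose size does not exceed the version marker contains no events.
    static boolean hasEvents(long sizeInBytes) {
        return sizeInBytes > VERSION_BYTES;
    }

    // Filters candidate segment paths down to the ones a reader should open.
    static List<Path> readableSegments(List<Path> segments) {
        return segments.stream()
                .filter(p -> {
                    try {
                        return hasEvents(Files.size(p));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
    }
}
```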

Fixed: #14599
2022-10-10 11:07:13 +01:00
Andrea Selva
d3b92ec20c
Extract the branch name passed to release-manager from version file (#14592)
* Extract the branch name passed to release-manager from version and not from git current branch

Co-authored-by: kaisecheng <69120390+kaisecheng@users.noreply.github.com>
2022-10-10 11:13:27 +02:00
Ry Biesemeyer
46babd6041
Extended Flow Metrics (#14571)
* flow metrics: extract to interface, sharable-common base, and implementation

In preparation of landing an additional implementation of FlowMetric, we
shuffle the current parts net-unchanged to provide interfaces for `FlowMetric`
and `FlowCapture`, along with a sharable-common `BaseFlowMetric`, and move
our initial implementation to a new `SimpleFlowMetric`, accessible only
through a static factory method on our new `FlowMetric` interface.

* flow-rates: refactor LIFETIME up to sharable base

* util: add SetOnceReference

* flow metrics: tolerate unavailable captures

While the metrics we capture from in the initial release of FlowMetrics
are all backed by `Metric<T extends Number>` whose values are non-null,
we will need to capture from nullable `Gauge<Number>` in order to
support persistent queue size and capacity metrics. This refactor uses
the newly-introduced `SetOnceReference` to defer our baseline lifetime
capture until one is available, and ensures `BaseFlowMetric#doCapture`
creates a capture if-and-only-if non-null values are available from
the provided metrics.
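
A minimal sketch of deferring the lifetime baseline, assuming hypothetical `SetOnce`, `Capture`, `doCapture`, and `lifetimeRate` names (the real `SetOnceReference` and `BaseFlowMetric#doCapture` may differ): a capture is produced only when both readings are non-null, and the first such capture becomes the baseline.

```java
import java.util.Optional;
import java.util.OptionalDouble;
import java.util.concurrent.atomic.AtomicReference;

public class DeferredBaselineSketch {
    // Hypothetical set-once holder: the first non-null offer wins; later offers are ignored.
    static final class SetOnce<T> {
        private final AtomicReference<T> ref = new AtomicReference<>();

        boolean offer(T value) {
            return value != null && ref.compareAndSet(null, value);
        }

        Optional<T> get() {
            return Optional.ofNullable(ref.get());
        }
    }

    // Hypothetical capture: a numerator reading paired with uptime in seconds.
    static final class Capture {
        final double numerator;
        final double uptimeSeconds;

        Capture(double numerator, double uptimeSeconds) {
            this.numerator = numerator;
            this.uptimeSeconds = uptimeSeconds;
        }
    }

    final SetOnce<Capture> baseline = new SetOnce<>();

    // Creates a capture if and only if both readings are available (gauges may report null).
    Optional<Capture> doCapture(Number numerator, Number uptimeSeconds) {
        if (numerator == null || uptimeSeconds == null) {
            return Optional.empty();
        }
        Capture capture = new Capture(numerator.doubleValue(), uptimeSeconds.doubleValue());
        baseline.offer(capture); // only the first successful capture becomes the baseline
        return Optional.of(capture);
    }

    // Lifetime rate between the deferred baseline and a later capture, if a baseline exists.
    OptionalDouble lifetimeRate(Capture latest) {
        Optional<Capture> base = baseline.get();
        if (!base.isPresent() || latest.uptimeSeconds <= base.get().uptimeSeconds) {
            return OptionalDouble.empty();
        }
        Capture b = base.get();
        return OptionalDouble.of((latest.numerator - b.numerator) / (latest.uptimeSeconds - b.uptimeSeconds));
    }
}
```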

* flow rates: limit precision for readability

* flow metrics: introduce policy-driven extended windows implementation

The new ExtendedFlowMetric is an alternate implementation of the FlowMetric
introduced in Logstash 8.5.0 that is capable of producing windows for a set of
policies, which dictate the desired retention for the rate along with a
desired resolution.

 - `current`: 10s retention, 1s resolution [*]
 - `last_1_minute`: one minute retention, at 3s resolution [*]
 - `last_5_minutes`: five minutes retention, at 15s resolution
 - `last_15_minutes`: fifteen minutes retention, at 30s resolution
 - `last_1_hour`: one hour retention, at 60s resolution
 - `last_24_hours`: one day retention at 15 minute resolution

A given series may report a range for slightly longer than its configured
retention period, up to either the series' configured resolution or
our capture rate (currently ~5s), whichever is greater. This approach
allows us to retain sufficient data-points to present meaningful rolling
averages while ensuring that our memory footprint is bounded.

When recording these captures, we first stage the newest capture, and then
promote the previously-staged capture to the tail of a linked list IFF
the gap between our new capture and the newest promoted capture is larger
than our desired resolution.

When _reading_ these rates, we compact the head of that linked list forward
in time as far as possible without crossing the desired retention barrier,
at which point the head points to the youngest record that is old enough
to satisfy the period for the series.

We also occasionally compact the head during writes, but only if the head
is significantly out-of-date relative to the allowed retention.
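
The staging, promotion, and read-time compaction described above can be sketched as follows. This is a simplified illustration, not the ExtendedFlowMetric source: `RetentionWindowSketch` and `Capture` are hypothetical, write-time compaction is omitted, and returning no value until the head is old enough to cover the retention period is an assumption.

```java
import java.util.LinkedList;
import java.util.OptionalDouble;

public class RetentionWindowSketch {
    // Hypothetical capture: a counter value observed at a monotonic nanosecond timestamp.
    static final class Capture {
        final long nanoTime;
        final double value;

        Capture(long nanoTime, double value) {
            this.nanoTime = nanoTime;
            this.value = value;
        }
    }

    private final long retentionNanos;
    private final long resolutionNanos;
    private final LinkedList<Capture> promoted = new LinkedList<>(); // head is the oldest
    private Capture staged; // newest capture, not yet promoted

    RetentionWindowSketch(long retentionNanos, long resolutionNanos) {
        this.retentionNanos = retentionNanos;
        this.resolutionNanos = resolutionNanos;
    }

    // Stage the newest capture; promote the previously-staged one only when the gap
    // between the newest capture and the newest promoted capture exceeds the resolution.
    void append(Capture newest) {
        Capture previous = staged;
        staged = newest;
        if (previous == null) {
            return;
        }
        Capture tail = promoted.peekLast();
        if (tail == null || newest.nanoTime - tail.nanoTime > resolutionNanos) {
            promoted.addLast(previous);
        }
    }

    // Compact the head forward while the next record is still old enough to cover the
    // retention period, then compute the rate from that head to the newest capture.
    OptionalDouble rate() {
        if (staged == null) {
            return OptionalDouble.empty();
        }
        long barrier = staged.nanoTime - retentionNanos;
        while (promoted.size() > 1 && promoted.get(1).nanoTime <= barrier) {
            promoted.removeFirst();
        }
        Capture head = promoted.peekFirst();
        if (head == null || head.nanoTime > barrier) {
            return OptionalDouble.empty(); // not enough history for this window yet
        }
        double seconds = (staged.nanoTime - head.nanoTime) / 1_000_000_000.0;
        return OptionalDouble.of((staged.value - head.value) / seconds);
    }
}
```

With a 10s retention and 1s resolution fed one capture per second, the list holds roughly one entry per second of history, and each read discards entries older than the youngest one that still spans the full window.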

As implemented here, these extended flow rates are on by default, but can be
disabled by setting the JVM system property `-Dlogstash.flowMetric=simple`

* flow metrics: provide lazy-initialized implementation

* flow metrics: append lifetime baseline if available during init

* flow metric tests: continuously monitor combined capture count

* collection of unrelated minor code-review fixes

* collection of even more unrelated minor code-review fixes
2022-10-06 18:35:33 -07:00
kaisecheng
b408638084
update ci release version (#14598) 2022-10-06 10:45:24 +01:00
David Kilfoyle
6dc5c5648a
[DOC] Add a short guide for using Logstash with K8s (#14532)
This contains an early draft structure and some content for a planned Logstash and Kubernetes Reference.
2022-10-03 09:33:23 -04:00
Andrea Selva
8ddd3ae6f3
Collect all artifacts created and upload to GCP with release-manager (#14584)
Downloads all artifacts generated for ARM and x86 by ci/dra_x86_64.sh and ci/dra_aarch64.sh, places them in the locations expected by the release-manager, and invokes it to upload them to the global bucket.
2022-10-03 12:05:30 +02:00
Andrea Selva
4fbb57a522
Upload DRA artifacts to collector GCS bucket (#14568)
Save ARM and x86 artifacts into GCS collector bucket

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
2022-09-28 17:48:31 +02:00
Andrea Selva
214d2bed64
Split ci scripts into ARM and x86 ones (#14567)
Split the DRA script into x86_64 and aarch64 versions
2022-09-28 15:42:48 +02:00
Andres Rodriguez
184fb1075e
Update add_to_projects_beta.yml 2022-09-28 09:31:45 -04:00
Mashhur
bd3451270c
Dev instructions improvement (#14219)
* LS local development guide improvements.

* Use imperative guide instead of abstraction.

* Give a proper name to Gradle task to install the dev gems.
2022-09-27 13:38:36 -07:00
Karol Bucek
74e72fb9b0
Perf: use JRuby JIT defaults (improves startup) (#14284)
Removes the -Djruby.jit.threshold=0 flag, which seems to have been introduced due to benchmarking.

No longer forcing AOT compilation means a noticeably faster startup (we do not need to 'compile' every method we bump into). The JIT threshold default is 50 in JRuby 9.2/9.3; there might be other heuristics in the future to better determine hot methods.
2022-09-27 16:28:08 -04:00
Ry Biesemeyer
228030c494
Simplify Pipeline class Hierarchy (#14551)
* refactor: pull members up from JavaBasePipelineExt to AbstractPipelineExt

* refactor: make `LogStash::JavaPipeline` inherit directly from `AbstractPipeline`
2022-09-26 18:16:20 -07:00
Andres Rodriguez
cd03c86102
Document JDK17 by default (#14511)
Document that JDK17 is now the default.
2022-09-26 13:23:05 -04:00
Andrea Selva
7e95f6ecaf
DRA fixes: (#14552)
- save docker images as tar.gz files
- move the CSV dependency report in the path that's expected by release-manager
2022-09-23 14:12:15 +02:00
github-actions[bot]
9bc2496e7b
Release notes for 8.4.2 (#14531)
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
2022-09-21 16:40:07 +01:00
João Duarte
05785e9a0b
bump to 8.6.0 (#14545) 2022-09-21 15:44:35 +01:00
Ry Biesemeyer
6e0b365c92
Feature: flow metrics integration (#14518)
* Flow metrics: initial implementation (#14509)

* metrics: eliminate race condition when registering metrics

Ensure our fast-lookup and store tables cannot diverge in a race condition
by wrapping mutation of both in a single mutex and appropriately handle
another thread winning the race to the lock by using the value that it
persisted instead of writing our own.
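
A minimal Java sketch of the single-mutex pattern (the original change is in Ruby; `MetricRegistrySketch`, `fastLookup`, `store`, and `fetchOrCreate` are hypothetical): reads hit the lock-free table first, and the slow path re-checks under the lock so a thread that loses the race reuses the winner's value instead of overwriting it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MetricRegistrySketch<V> {
    private final ConcurrentHashMap<String, V> fastLookup = new ConcurrentHashMap<>();
    private final Map<String, V> store = new HashMap<>(); // only touched while holding `lock`
    private final Object lock = new Object();

    public V fetchOrCreate(String key, Function<String, V> factory) {
        V existing = fastLookup.get(key); // lock-free fast path
        if (existing != null) {
            return existing;
        }
        synchronized (lock) {
            // Another thread may have registered this key while we waited for the lock;
            // if so, return its value rather than writing our own.
            V winner = store.get(key);
            if (winner != null) {
                return winner;
            }
            V created = factory.apply(key);
            // Both tables are updated under the same lock, so they cannot diverge.
            store.put(key, created);
            fastLookup.put(key, created);
            return created;
        }
    }
}
```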

* metrics: guard against intermediate namespace conflicts

 - ensures our safeguard that prevents using an existing metric as a namespace
   is applied to _intermediate_ nodes, not just the tail-node, eliminating a
   potential crash when sending `fetch_or_store` to a metric object that is not
   expected to respond to `fetch_or_store`.
 - uses the atomic `Concurrent::Map#compute_if_absent` instead of the
   non-atomic `Concurrent::Map#fetch_or_store`, which is prone to
   last-write-wins during contention (as-written, this method is only
   executed under lock and not subject to contention)
 - uses `Enumerable#reduce` to eliminate the need for recursion
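
The intermediate-node guard can be sketched in Java with `ConcurrentHashMap#computeIfAbsent`, which is atomic per key, standing in for `Concurrent::Map#compute_if_absent`. `NamespaceTreeSketch` is hypothetical, and it walks the path with a plain loop rather than `Enumerable#reduce`; either way, no recursion is needed.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class NamespaceTreeSketch {
    private final ConcurrentMap<String, Object> root = new ConcurrentHashMap<>();

    // Walks the path one key at a time, creating missing namespaces atomically and
    // refusing to descend through any node (intermediate or tail) that is a metric.
    @SuppressWarnings("unchecked")
    ConcurrentMap<String, Object> namespace(List<String> path) {
        ConcurrentMap<String, Object> node = root;
        for (String key : path) {
            Object child = node.computeIfAbsent(key, k -> new ConcurrentHashMap<String, Object>());
            if (!(child instanceof ConcurrentMap)) {
                throw new IllegalStateException("'" + key + "' is a metric, not a namespace");
            }
            node = (ConcurrentMap<String, Object>) child;
        }
        return node;
    }
}
```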

* flow: introduce auto-advancing UptimeMetric

* flow: introduce FlowMetric with minimal current/lifetime rates

* flow: initialize pipeline metrics at pipeline start

* Controller and service layer implementation for flow metrics. (#14514)

* Controller and service layer implementation for flow metrics.

* Add flow metrics to unit test and benchmark cli definitions.

* flow: fix tests for metric types to accommodate new one

* Renaming concurrency and backpressure metrics.

Rename `concurrency` to `worker_concurrency` and `backpressure` to `queue_backpressure` to provide proper scope naming.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* metric: register flow metrics only when we have a collector (#14529)

the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.

* Unit tests and integration tests added for flow metrics. (#14527)

* Unit tests and integration tests added for flow metrics.

* Node stat spec and pipeline spec metric updates.

* Metric keys statically imported, implicit error expectation added in metric spec.

* Fix node status API spec after renaming flow metrics.

* Removing flow metric from PipelinesInfo DS (used in periodic metric snapshot), integration QA updates.

* metric: register flow metrics only when we have a collector (#14529)

the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.

* Unit tests and integration tests added for flow metrics.

* Node stat spec and pipeline spec metric updates.

* Metric keys statically imported, implicit error expectation added in metric spec.

* Fix node status API spec after renaming flow metrics.

* Removing flow metric from PipelinesInfo DS (used in periodic metric snapshot), integration QA updates.

* Rebasing with feature branch.

* metric: register flow metrics only when we have a collector

the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.

* Apply suggestions from code review

Integration tests updated to test capturing the flow metrics.

* Flow metrics expectation updated in integration tests.

* flow: refine integration expectations for reloads/monitoring

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>

* metric: add ScaledView with sub-unit precision to UptimeMetric (#14525)

* metric: add ScaledView with sub-unit precision to UptimeMetric

By presenting a _view_ of our metric that maintains sub-unit precision,
we prevent jitter that can be caused by our periodic poller not running at
exactly our configured cadence.

This is especially important as the UptimeMetric is used as the _denominator_ of
several flow metrics, and a capture at 4.999s that truncates to 4s causes the
rate to be over-reported by ~25%.

The `UptimeMetric.ScaledView` implements `Metric<Number>`, so its full
lossless `BigDecimal` value is accessible to our `FlowMetric` at query time.
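
The roughly 25% figure follows from dividing by a truncated denominator: 4.999 / 4 is about 1.25. A small sketch, assuming hypothetical helpers `truncatedRate` and `scaledRate` (not the `UptimeMetric.ScaledView` API), compares a whole-second denominator with a lossless `BigDecimal` one.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class UptimeScalingSketch {
    // Rate against whole seconds: integer division drops the fractional part.
    static double truncatedRate(long events, long uptimeMillis) {
        long wholeSeconds = uptimeMillis / 1000; // 4999 ms becomes 4 s
        return (double) events / wholeSeconds;
    }

    // Rate against a lossless view of uptime in seconds.
    static double scaledRate(long events, long uptimeMillis) {
        BigDecimal seconds = BigDecimal.valueOf(uptimeMillis).movePointLeft(3); // 4999 ms becomes 4.999 s
        return BigDecimal.valueOf(events)
                .divide(seconds, 10, RoundingMode.HALF_UP)
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(truncatedRate(4999, 4999)); // 1249.75
        System.out.println(scaledRate(4999, 4999));    // 1000.0
    }
}
```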

* metrics: reduce window for too-frequent-captures bug and document it

* fixup: provide mocked clock to flow metric

* Flow metrics cleanup (#14535)

* flow metrics: code-style and readability pass

* remove unused imports

* cleanup: simplify usage of internal helpers

* flow: migrate internals to use OptionalDouble

* Flow metrics global (#14539)

* flow: add global top-level flows

* docs: add flow metrics

* Top level flow metrics unit tests added. (#14540)

* Top level flow metrics unit tests added.

* Add unit tests when config reloads, make sure top-level flow metrics didn't get reset.

* Apply suggestions from code review

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Validating against Hash test cases updated.

* For the safety check against exact type in unit tests.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* docs: section links and clarity in node stats API flow metrics

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>
2022-09-19 14:21:45 -07:00