Commit graph

55 commits

Author SHA1 Message Date
Mark Vieira
cdbd7ad543
Add publishing plugin to elasticsearch-grok project (#89184)
It seems https://github.com/elastic/elasticsearch/pull/88982 introduced
a dependency on `elasticsearch-grok` to `x-pack-core`. Since the latter
is published to Maven Central, this means consumers will have issues
resolving it's dependencies since `elasticsearch-grok` isn't published.
This pull request resolves this, by adding the publishing plugin to the
`grok` library. We'll then follow up separately to add that to our
release configuration.
2022-08-09 08:02:08 +09:30
Rene Groeschke
3909b5eaf9
Add verification metadata for dependencies (#88814)
Removing the custom dependency checksum functionality in favor of Gradle build-in dependency verification support. 

- Use sha256 in favor of sha1 as sha1 is not considered safe these days.

Closes https://github.com/elastic/elasticsearch/issues/69736
2022-08-04 09:51:16 +02:00
Chris Hegarty
d245458227
Modularize the ingest.common component (as well as dissect and grok dependent libs) (#87219)
This is change modularizes the ingest.common component,
by adding a module-info.java. As well as two dependent libs.

The project only requires painless SPI to compile, so that was
fixed along the way ( so that the compile module path can be
inferred directly from the dependencies ).
2022-05-30 17:08:13 +01:00
Artem Prigoda
0699c9351f
Use Java 14 switch expressions (#82178)
JEP 361[https://openjdk.java.net/jeps/361] added support for switch expressions
which can be much more terse and less error-prone than switch statements.

Another useful feature of switch expressions is exhaustiveness: we can make
sure that an enum switch expression covers all the cases at compile time.
2022-01-10 09:53:35 +01:00
Artem Prigoda
763d6d510f
Use Java 15 text blocks for JSON and multiline strings (#80751)
The ES code base is quite JSON heavy. It uses a lot of multi-line JSON requests in tests which need to be escaped and concatenated which in turn makes them hard to read. Let's try to leverage Java 15 text blocks for representing them.
2021-12-15 18:01:28 +01:00
Ryan Ernst
c66a371f8e
Upgrade Mockito to 4.0.0 (#79949)
Mockito 4.0 removes several deprecated methods. This commit updates
usages of those deprecated methods and upgrades mockito. The changes
include: * Replace anyMapOf,anyListOf,anySetOf,anyCollectionOf with the
same   method name without `Of` and no longer taking any arguments. *
Replace anyObject with any * Removing argument from isNull * Replace
verifyZeroInteractions with verifyNoMoreInteractions The changes here
were completely mechanical, done entirely with forms of find/replace
within IntelliJ.
2021-10-27 16:16:18 -04:00
Mark Vieira
12ad399c48 Reformat Elasticsearch source 2021-10-27 08:19:51 -07:00
Ryan Ernst
f8d8702b88
Convert uses of mockito Matchers to ArgumentMatchers (#79852)
Matchers is deprecated in Mockito, in favor of the newer
ArgumentMatchers class. In fact, internally Matchers just extends
ArgumentMatchers as all the methods there were moved. This commit
changes all imports of org.mockito.Matchers to
org.mockito.ArgumentMatchers.
2021-10-26 14:16:11 -07:00
Rory Hunter
e55edf937a
Fix shadowed variables in various places - part 1 (#77555)
Part of #19752.

Fix a number of locations where local variables or parameters are shadowing a field
that is defined in the same class.
2021-09-13 13:48:46 +01:00
Dan Hermann
90d2899323
ECS support for Grok processor (#76885) 2021-08-31 06:40:52 -05:00
Dan Hermann
d648b768d8
Sync grok processor patterns with Logstash (#76752) 2021-08-24 07:34:27 -05:00
David Roberts
a0d26954bd
Improve efficiency of Grok circular reference check (#74814)
This change is a tweak to #74581 which removes the N^2
loop that was added in that PR.
2021-07-01 14:30:03 +01:00
Dan Hermann
7603fded04
Improve circular reference detection in grok processor (#74581) 2021-06-30 09:23:29 -05:00
Mark Vieira
a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Nik Everett
2e346f3fae
Grok: "native" results (#62843)
This adds the ability to fetch java primitives like `long` and `float`
from grok matches rather than their boxed versions. It also allows
customizing the which fields are extracted and how they are extracted.
By default we continue to fetch a `Map<String, Object>` but runtime
fields will be able to catch *just* the fields it is interested
in, and the values will be primitives.
2020-09-24 10:46:38 -04:00
Nik Everett
59eac2262e
Grok: Handle utf-8 natively (#62794)
This adds a method to `Grok` that matches against sections offset from
utf-8 byte arrays:
```
Map<String, Object> captures(byte[] utf8Bytes, int offset, int length)
```

This'll be useful for the grok-flavored runtime fields because they
want to match against utf-8 encoded strings stored in a big array. And
joni already supports this.
2020-09-23 08:42:43 -04:00
Nik Everett
d4e014c4c2
Extract capture config from grok patterns up front (#62706)
This extracts the configuration for extracting values from a groked
string when building the grok expression to do two things:
1. Create a method exposing that configuration on `Grok` itself which
   will be used grok `grok` flavored runtime fields.
2. Marginally speed up extracting grok values by skipping a little
   string manipulation.
2020-09-22 13:55:43 -04:00
Nik Everett
152ee3aca4
Raname grok's built-in patterns (#62735)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. Its not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
2020-09-22 09:31:20 -04:00
Dan Hermann
9dcab76427
Preserve grok pattern ordering and add sort option (#61671) 2020-09-08 07:10:27 -05:00
Rene Groeschke
9526c7a4b3
Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build
2020-06-30 09:37:09 +02:00
Rene Groeschke
680ea07f7f
Remove deprecated usage of testCompile configuration (#57921)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-12 13:34:53 +02:00
Jake Landis
f5910664b7
Ensure Joni warning are logged at debug (#57302)
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN 
level. This WARN level log should only occur on pipeline creation which 
is a much lower frequency then every document. 

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 13:33:27 -05:00
Jake Landis
839ac4dd6a
Prevent stack overflow for numerous grok patterns. (#55899)
This was noticed for a pipeline that was defining hundreds of
grok patterns inline with a single grok processor.

The recursive call used to translate a Grok pattern to a regular
expression can overflow the stack. This commit converts that method 
to an iterative method. 

Co-authored-by: Przemko Robakowski <probakowski@users.noreply.github.com>
2020-04-30 19:29:18 -05:00
Ryan Ernst
842ce32870
Use task avoidance with forbidden apis (#55034)
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
2020-04-15 13:23:55 -07:00
Gautam
eb097700cf
Missing suffix for German Month "Juli" in Grok Pattern MONTH (#51579) (#51591) 2020-02-03 14:53:30 -06:00
Ryan Ernst
bf317e8c4e
Remove comparison to true for booleans (#51723)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
2020-01-31 16:34:27 -08:00
Alexander Reelsen
c9786592eb
Sync grok patterns with logstash patterns (#50381)
In order to ensure that logstash and Elasticsearch are able to understand
the same patterns, this commit adapts to changes in logstash, adds a few
patterns and changes a few.
2020-01-08 14:50:43 +01:00
Rory Hunter
3a3e5f6176
Apply 2-space indent to all gradle scripts (#48849)
Closes #48724. Update `.editorconfig` to make the Java settings the default
for all files, and then apply a 2-space indent to all `*.gradle` files.
Then reformat all the files.
2019-11-13 10:14:04 +00:00
Martijn van Groningen
2b4aaab9e9
Unmuted and fixed test.
Multiple invocations are expected.

see #48519
2019-10-28 10:36:50 +01:00
Martijn van Groningen
54eda8d7d1
Muted test
See #48519
2019-10-28 09:26:16 +01:00
Martijn van Groningen
12d32af6b4
Change grok watch dog to be Matcher based instead of thread based. (#48346)
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
2019-10-24 15:33:30 +02:00
Martijn van Groningen
9e7cfc8183
Remove redundant nested operator in builtin grok expression. (#47870)
This prevents the following warning from being printed to console:
`regular expression has redundant nested repeat operator + /%\{(?<name>(?<pattern>[A-z0-9]+)(?::(?<subname>[[:alnum:]@\[\]_:.-]+))?)(?:=(?<definition>(?:(?:[^{}]+|\.+)+)+))?\}/`

The current grok expression is not failing, but just this warning is being printed.
The warning started being printed after upgrading joni (#47374).

Closes #47861
2019-10-14 14:34:09 +02:00
Tal Levy
f6f249be15
Expose ValueException in Grok (#47368)
Previously, Grok's groupMatch would allow the code to
fall into an IndexOutOfBoundsException, which can be avoided.
The other exception that can come up is a ValueException. The times
this exception occurs is less understood, but it may make sense to expose
this since it typically means something did not go well.
2019-10-04 13:55:41 -07:00
Martijn van Groningen
785cf6bd44
Upgrade joni from 2.1.6 to 2.1.29 (#47374)
Changed the Grok class to use searchInterruptible(...) instead of search(...)
otherwise we can't interrupt long running matching via the thread watch
dog.

Joni now also provides another way to interrupt long running matches.
By invoking the interrupt() method on the Matcher. We need then to refactor
the watch thread dog to keep track of Matchers instead of Threads, but
it is a better way of doing this, since interrupting would be more direct
(not every 30k iterations) and efficient (checking a volatile field).
This work needs to be done in a follow up.
2019-10-04 06:30:41 -05:00
Alpar Torok
ca54b442bf
Remove eclipse conditionals (#44075)
* Remove eclipse conditionals

We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.

This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.

Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.

This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
2019-10-03 10:50:46 +03:00
Alexander Reelsen
08a3549a1e
Upgrade jcodings dependency to 1.0.44 (#43334) 2019-06-26 10:03:16 +02:00
Albert Zaharovits
998419c49f
Eclipse libs projects setup fix (#42852)
Fallout from #42773 for eclipse users.
2019-06-04 14:53:26 -04:00
Mark Vieira
12d583dbf6
Remove unnecessary usage of Gradle dependency substitution rules (#42773) 2019-06-03 16:18:45 -07:00
David Roberts
677c391df0
Avoid HashMap construction on Grok non-match (#42444)
This change moves the construction of the result
HashMap in Grok.captures() into the branch that
actually needs it.

This probably will not make a measurable difference
for ingest pipelines, but it is beneficial to the
ML find_file_structure endpoint, as it tries out
many Grok patterns that will fail to match.
2019-05-23 21:04:03 +01:00
austintp
8ebff0512b Updates the grok patterns to be consistent with logstash (#27181) 2019-02-05 12:37:02 -06:00
Henning Andersen
68ed72b923
Handle scheduler exceptions (#38014)
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of #28667, #36137 and also fixes #37708.
2019-01-31 17:51:45 +01:00
Alpar Torok
a7c3d5842a
Split third party audit exclusions by type (#36763) 2019-01-07 17:24:19 +02:00
John
0baffda390 ingest: grok remove duplicated patterns (#35886)
This commit removes the redundant (and incorrect) JAVACLASS
and JAVAFILE grok patterns. This helps to keep parity with 
Logstash's patterns. 

See also: https://github.com/logstash-plugins/logstash-patterns-core/pull/237
 
closes #35699
2018-11-26 11:13:46 -06:00
Christoph Büscher
ba3ceeaccf
Clean up "unused variable" warnings (#31876)
This change cleans up "unused variable" warnings. There are several cases were we 
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
2018-09-26 14:09:32 +02:00
Alpar Torok
82d10b484a
Run forbidden api checks with runtimeJavaVersion (#32947)
Run forbidden APIs checks with runtime hava version
2018-08-22 09:05:22 +03:00
Armin Braun
4dda5a990b
INGEST: Fix ThreadWatchDog Throwing on Shutdown (#32578)
* INGEST: Fix ThreadWatchDog Throwing on Shutdown

* #32539 is caused by the fact that ThreadWatchDog.Default could throw on shutdown if the ThreadPool is interrupted while `interruptLongRunningExecutions` is in progress. This is a result of the watchdog not having a lifecycle of its own (normally it terminates when the threadpool terminates).
  * We can't easily use `org.elasticsearch.common.util.concurrent.EsRejectedExecutionException#isExecutorShutdown` to catch this state the same way other components do since thatwould require adding the core lib to Grok as a dependency
  * Since we have no knowledge of the lifecycle in this compontent since we're only passed the scheduler `BiFunction` I fixed this by only scheduling the watchdog when there's actually registered threads in it.
    * I think using the patter of locking via two `Atomic*` values should not be much of a performance concern here under load since either the integer will likely be > 0 in this case (because we have multiple Grok in parallel) or the running state will be true because there likely was at least one thread registered when the watchdog ran and so the enqueing of the watchdog task during `register` will happen very rarely here (in the worst case scenario of only a single Grok thread it will happen less frequently than once every `ingest.grok.watchdog.interval`). The atomic update on the count should not be relevant relative to the cost of adding a new node to the CHM either.
* Fixes #32539
  * Also fixes the watchdog to run if it doens't have to in general.
2018-08-06 22:46:26 +02:00
Armin Braun
b7b413e55e
Extend allowed characters for grok field names (#21745) (#31653) 2018-06-29 09:12:47 +02:00
Martijn van Groningen
6030d4be1e
[INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024)
This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search()
This method can hang forever if the regex expression is too complex.

The thread interrupter in the background checks every 3 seconds whether there are threads
execution the org.joni.Matcher#search() method for longer than 5 seconds and
if so interrupts these threads.

Joni has checks that that for every 30k iterations it checks if the current thread is interrupted and
if so returns org.joni.Matcher#INTERRUPTED

Closes #28731
2018-06-12 07:49:03 +02:00
Tanguy Leroux
bf58660482
Remove all unused imports and fix CRLF (#31207)
The X-Pack opening and the recent other refactorings left a lot of 
unused imports in the codebase. This commit removes them all.
2018-06-11 15:12:12 +02:00
Nik Everett
69aabb7e40
Build: Fail if any libs depend on non-core libs (#29336)
Fails the build if any subprojects of `:libs` have dependencies in `:libs`
except for `:libs:elasticsearch-core`.

Since we now have three places where we resolve project substitutions
I've added `dependencyToProject` to `project.ext` in all projects. It
resolves both `project` style dependencies and "external" style (like
"org.elasticsearch:elasticsearch-core:${version}") dependencies to
`Project`s using the `projectSubstitutions`. I use this new function all
three places where resovle project substitutions.

Finally this pulls `apply plugin: 'elasticsearch.build'` out of
`libs/*/build.gradle` and into a subprojects clause in
`libs/build.gradle`. I do this entirely so that I can call
`tasks.precommit.dependsOn checkDependencies` without waiting for the
subprojects to be evaluated or worrying about whether or not they have
`precommit` set up in a normal way.
2018-04-16 11:49:27 -04:00