Commit graph

70 commits

Author SHA1 Message Date
Ryan Ernst
dedf9fd6d7
Use directory name as project name for libs (#115720) (#115984)
* Use directory name as project name for libs (#115720)

The libs projects are configured to all begin with `elasticsearch-`.
While this is desireable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.

This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.

* fixes
2024-10-31 07:52:10 +11:00
Mark Vieira
0279c0a909
Add AGPLv3 as a supported license 2024-09-13 14:30:33 -07:00
David Turner
bff45aaa8a
Reduce CompletableFuture usage in tests (#111848)
Fixes some spots in tests where we use `CompletableFuture` instead of
one of the preferred alternatives.
2024-08-27 08:06:20 +01:00
Keith Massey
a02dc7165c
Improve performance of grok pattern cycle detection (#111947) 2024-08-26 13:39:19 -05:00
Jan Kuipers
5dec83f69e
Endpoint to test Grok pattern (#104394)
* Add extract match ranges functionality to Grok.

* TestGrokPatternAction and Request

* TestGrokPattern response

* Update docs/changelog/104394.yaml

* Polish validation error message

* Improve test_grok_pattern API

* Add explicit CharSet

* Add endpoint to operator constants

* Add TransportTestGrokPatternActionTests

* REST API spec

* One more TransportTestGrokPatternActionTest

* Fix API spec

* Refactor REST API spec

* Polish code

* Replace TransportTestGrokPatternActionTests by a YAML REST test

* Add ecs_compatibility

* Always return arrays in the API

* Documentation

* YAML test for ecs_compatibility

* Rename doc fileø

* serverless scope

* Fix docs (hopefully)

* Update docs/reference/rest-api/index.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Add "text structure APIs" header in docs TOC

* Move file

* Remove test grok from main index

* typo

* Nested APIs underneath text structure

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-01-24 09:35:59 +01:00
Armin Braun
b7eafce32c
Make some practically static methods static (#97565)
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
2023-10-06 23:37:07 +02:00
Chris Hegarty
7e884239f1
Make GrokCaptureType public (#97659) 2023-07-14 10:31:30 +01:00
Joe Gallo
249b83e4bd
Add more information for understanding GrokTests.testExponentialExpressions failure (#96230) 2023-05-18 17:24:45 -04:00
Pablo Alcantar Morales
b51951f679
New GrokPatternBank data structure (#95269)
This refactor introduces a new data structure called `PatternBank` which is an abstraction over the old `Map<String, String>` used all over the place. This data structure has handy methods to extend the pattern bank with new patterns and also centralize the validation of pattern banks into one place. Thanks to this, the repeated code to create Grok Pattern banks is 0.
---------

Co-authored-by: Joe Gallo <joe.gallo@elastic.co>
2023-04-20 13:11:28 +02:00
Pablo Alcantar Morales
7a51c31200
stop using array elements for grok modes (#95225)
Adds 2 new constants to hold the possible ECS compatibility modes. Then 
uses these values instead to pick them up from the array of available modes.
2023-04-13 17:06:35 +02:00
Luigi Dell'Aquila
1e840e22d2
Fix Grok.match() with offset and suffix pattern (#95003) 2023-04-11 09:36:06 +02:00
Rory Hunter
fe1083f6c5
Upgrade spotless plugin to 6.17.0 (#94994)
Fixes #82794. Upgrade the spotless plugin, which addresses the issue
around formatting `instanceof` expressions. Formatting of statements
including lambdas seems to have improved too.
2023-04-04 10:03:32 +01:00
Pablo Alcantar Morales
66fcf65a58
Split Grok's responsibilities (#93987)
* extract ECS compatibility modes
* moves builtin patterns loading logic to its own class
* split loading logic into their own method and extracting commonalities
* 2 new alias methods to clearly state what pattern are you trying to get
2023-03-01 10:10:54 +01:00
David Kyle
b588d2ddd7
Redact Ingest Processor (#92951)
The Redact processor uses the Grok rules engine to
redact text in the input document that matches the
Grok pattern. For example Email or IP addresses can
be redacted using the definitions from the standard
Grok pattern bank. New patterns can be defined in
the processor configuration
2023-02-07 17:10:07 +00:00
Pablo Alcantar Morales
e053f21780
Grok returns a list of matches for repeated pattern names #92092 (#92586)
Grok returns a list of matches for repeated pattern names

This change makes the Elasticsearch Grok processor behaves in the 
same way that Logstash's grok, when handling repeated pattern 
names, returning a list of matches instead only the first only

Closes #92092
2023-01-10 09:44:23 +01:00
Mark Vieira
cdbd7ad543
Add publishing plugin to elasticsearch-grok project (#89184)
It seems https://github.com/elastic/elasticsearch/pull/88982 introduced
a dependency on `elasticsearch-grok` to `x-pack-core`. Since the latter
is published to Maven Central, this means consumers will have issues
resolving it's dependencies since `elasticsearch-grok` isn't published.
This pull request resolves this, by adding the publishing plugin to the
`grok` library. We'll then follow up separately to add that to our
release configuration.
2022-08-09 08:02:08 +09:30
Rene Groeschke
3909b5eaf9
Add verification metadata for dependencies (#88814)
Removing the custom dependency checksum functionality in favor of Gradle build-in dependency verification support. 

- Use sha256 in favor of sha1 as sha1 is not considered safe these days.

Closes https://github.com/elastic/elasticsearch/issues/69736
2022-08-04 09:51:16 +02:00
Chris Hegarty
d245458227
Modularize the ingest.common component (as well as dissect and grok dependent libs) (#87219)
This is change modularizes the ingest.common component,
by adding a module-info.java. As well as two dependent libs.

The project only requires painless SPI to compile, so that was
fixed along the way ( so that the compile module path can be
inferred directly from the dependencies ).
2022-05-30 17:08:13 +01:00
Artem Prigoda
0699c9351f
Use Java 14 switch expressions (#82178)
JEP 361[https://openjdk.java.net/jeps/361] added support for switch expressions
which can be much more terse and less error-prone than switch statements.

Another useful feature of switch expressions is exhaustiveness: we can make
sure that an enum switch expression covers all the cases at compile time.
2022-01-10 09:53:35 +01:00
Artem Prigoda
763d6d510f
Use Java 15 text blocks for JSON and multiline strings (#80751)
The ES code base is quite JSON heavy. It uses a lot of multi-line JSON requests in tests which need to be escaped and concatenated which in turn makes them hard to read. Let's try to leverage Java 15 text blocks for representing them.
2021-12-15 18:01:28 +01:00
Ryan Ernst
c66a371f8e
Upgrade Mockito to 4.0.0 (#79949)
Mockito 4.0 removes several deprecated methods. This commit updates
usages of those deprecated methods and upgrades mockito. The changes
include: * Replace anyMapOf,anyListOf,anySetOf,anyCollectionOf with the
same   method name without `Of` and no longer taking any arguments. *
Replace anyObject with any * Removing argument from isNull * Replace
verifyZeroInteractions with verifyNoMoreInteractions The changes here
were completely mechanical, done entirely with forms of find/replace
within IntelliJ.
2021-10-27 16:16:18 -04:00
Mark Vieira
12ad399c48 Reformat Elasticsearch source 2021-10-27 08:19:51 -07:00
Ryan Ernst
f8d8702b88
Convert uses of mockito Matchers to ArgumentMatchers (#79852)
Matchers is deprecated in Mockito, in favor of the newer
ArgumentMatchers class. In fact, internally Matchers just extends
ArgumentMatchers as all the methods there were moved. This commit
changes all imports of org.mockito.Matchers to
org.mockito.ArgumentMatchers.
2021-10-26 14:16:11 -07:00
Rory Hunter
e55edf937a
Fix shadowed variables in various places - part 1 (#77555)
Part of #19752.

Fix a number of locations where local variables or parameters are shadowing a field
that is defined in the same class.
2021-09-13 13:48:46 +01:00
Dan Hermann
90d2899323
ECS support for Grok processor (#76885) 2021-08-31 06:40:52 -05:00
Dan Hermann
d648b768d8
Sync grok processor patterns with Logstash (#76752) 2021-08-24 07:34:27 -05:00
David Roberts
a0d26954bd
Improve efficiency of Grok circular reference check (#74814)
This change is a tweak to #74581 which removes the N^2
loop that was added in that PR.
2021-07-01 14:30:03 +01:00
Dan Hermann
7603fded04
Improve circular reference detection in grok processor (#74581) 2021-06-30 09:23:29 -05:00
Mark Vieira
a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Nik Everett
2e346f3fae
Grok: "native" results (#62843)
This adds the ability to fetch java primitives like `long` and `float`
from grok matches rather than their boxed versions. It also allows
customizing the which fields are extracted and how they are extracted.
By default we continue to fetch a `Map<String, Object>` but runtime
fields will be able to catch *just* the fields it is interested
in, and the values will be primitives.
2020-09-24 10:46:38 -04:00
Nik Everett
59eac2262e
Grok: Handle utf-8 natively (#62794)
This adds a method to `Grok` that matches against sections offset from
utf-8 byte arrays:
```
Map<String, Object> captures(byte[] utf8Bytes, int offset, int length)
```

This'll be useful for the grok-flavored runtime fields because they
want to match against utf-8 encoded strings stored in a big array. And
joni already supports this.
2020-09-23 08:42:43 -04:00
Nik Everett
d4e014c4c2
Extract capture config from grok patterns up front (#62706)
This extracts the configuration for extracting values from a groked
string when building the grok expression to do two things:
1. Create a method exposing that configuration on `Grok` itself which
   will be used grok `grok` flavored runtime fields.
2. Marginally speed up extracting grok values by skipping a little
   string manipulation.
2020-09-22 13:55:43 -04:00
Nik Everett
152ee3aca4
Raname grok's built-in patterns (#62735)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. Its not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
2020-09-22 09:31:20 -04:00
Dan Hermann
9dcab76427
Preserve grok pattern ordering and add sort option (#61671) 2020-09-08 07:10:27 -05:00
Rene Groeschke
9526c7a4b3
Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build
2020-06-30 09:37:09 +02:00
Rene Groeschke
680ea07f7f
Remove deprecated usage of testCompile configuration (#57921)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-12 13:34:53 +02:00
Jake Landis
f5910664b7
Ensure Joni warning are logged at debug (#57302)
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN 
level. This WARN level log should only occur on pipeline creation which 
is a much lower frequency then every document. 

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 13:33:27 -05:00
Jake Landis
839ac4dd6a
Prevent stack overflow for numerous grok patterns. (#55899)
This was noticed for a pipeline that was defining hundreds of
grok patterns inline with a single grok processor.

The recursive call used to translate a Grok pattern to a regular
expression can overflow the stack. This commit converts that method 
to an iterative method. 

Co-authored-by: Przemko Robakowski <probakowski@users.noreply.github.com>
2020-04-30 19:29:18 -05:00
Ryan Ernst
842ce32870
Use task avoidance with forbidden apis (#55034)
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
2020-04-15 13:23:55 -07:00
Gautam
eb097700cf
Missing suffix for German Month "Juli" in Grok Pattern MONTH (#51579) (#51591) 2020-02-03 14:53:30 -06:00
Ryan Ernst
bf317e8c4e
Remove comparison to true for booleans (#51723)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
2020-01-31 16:34:27 -08:00
Alexander Reelsen
c9786592eb
Sync grok patterns with logstash patterns (#50381)
In order to ensure that logstash and Elasticsearch are able to understand
the same patterns, this commit adapts to changes in logstash, adds a few
patterns and changes a few.
2020-01-08 14:50:43 +01:00
Rory Hunter
3a3e5f6176
Apply 2-space indent to all gradle scripts (#48849)
Closes #48724. Update `.editorconfig` to make the Java settings the default
for all files, and then apply a 2-space indent to all `*.gradle` files.
Then reformat all the files.
2019-11-13 10:14:04 +00:00
Martijn van Groningen
2b4aaab9e9
Unmuted and fixed test.
Multiple invocations are expected.

see #48519
2019-10-28 10:36:50 +01:00
Martijn van Groningen
54eda8d7d1
Muted test
See #48519
2019-10-28 09:26:16 +01:00
Martijn van Groningen
12d32af6b4
Change grok watch dog to be Matcher based instead of thread based. (#48346)
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
2019-10-24 15:33:30 +02:00
Martijn van Groningen
9e7cfc8183
Remove redundant nested operator in builtin grok expression. (#47870)
This prevents the following warning from being printed to console:
`regular expression has redundant nested repeat operator + /%\{(?<name>(?<pattern>[A-z0-9]+)(?::(?<subname>[[:alnum:]@\[\]_:.-]+))?)(?:=(?<definition>(?:(?:[^{}]+|\.+)+)+))?\}/`

The current grok expression is not failing, but just this warning is being printed.
The warning started being printed after upgrading joni (#47374).

Closes #47861
2019-10-14 14:34:09 +02:00
Tal Levy
f6f249be15
Expose ValueException in Grok (#47368)
Previously, Grok's groupMatch would allow the code to
fall into an IndexOutOfBoundsException, which can be avoided.
The other exception that can come up is a ValueException. The times
this exception occurs is less understood, but it may make sense to expose
this since it typically means something did not go well.
2019-10-04 13:55:41 -07:00
Martijn van Groningen
785cf6bd44
Upgrade joni from 2.1.6 to 2.1.29 (#47374)
Changed the Grok class to use searchInterruptible(...) instead of search(...)
otherwise we can't interrupt long running matching via the thread watch
dog.

Joni now also provides another way to interrupt long running matches.
By invoking the interrupt() method on the Matcher. We need then to refactor
the watch thread dog to keep track of Matchers instead of Threads, but
it is a better way of doing this, since interrupting would be more direct
(not every 30k iterations) and efficient (checking a volatile field).
This work needs to be done in a follow up.
2019-10-04 06:30:41 -05:00
Alpar Torok
ca54b442bf
Remove eclipse conditionals (#44075)
* Remove eclipse conditionals

We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.

This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.

Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.

This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
2019-10-03 10:50:46 +03:00