Commit graph

99 commits

Author SHA1 Message Date
Mark Vieira
bcfbf00074 Reformat Elasticsearch source 2021-10-27 15:23:15 -07:00
Armin Braun
262db33192
Optimize XContent Object Parsers (#78813) (#79673)
Optimize object parsers a little by extracting cold paths, removing some unnecessary
lambda wrapping and some other small things.
Also, fixed a very expensive use of these APIs in Phase moving from a very hot stream
instantiation to a standard loop.
2021-10-22 11:52:55 +02:00
Keith Massey
1f0e85bae3
Exposing the ability to log deprecated settings at non-critical level(#79107) (#79492)
A recent change for the deprecation logs provided the capability to emit deprecation's at critical vs. warning levels, #77482.
However deprecated settings always log at critical level without the ability to express that the setting deprecation is only a
warning.

This commit exposes the ability to set the deprecation level when deprecating a setting.
Relates #78781
2021-10-19 14:32:17 -05:00
Nik Everett
44ea58fa27
[7.x] Move xcontent filtering tests (#79298) (#79389)
* Move xcontent filtering tests (#79298)

* Move xcontent filtering tests

Moves the xcontent filtering tests to the xcontent project because its
testing code *in* the xcontent project.

* More clear

* Spotless

* Fixup
2021-10-18 17:56:14 -04:00
Chris Hegarty
964180ba99
[7.x] Fix split package org.elasticsearch.common.xcontent (#79061)
* Fix split package org.elasticsearch.common.xcontent

* Fix test
2021-10-13 15:43:41 +01:00
Armin Braun
0d23a0bf4f
Speed up toXContent Collection Serialization in some Spots (#78742) (#78755)
Found this when benchmarking large cluster states. When serializing collections we'd mostly
not take any advantage of what we know about the collection contents (like we do in `StreamOutput`).
This PR adds a couple of helpers to the x-content-builder similar to what we have on `StreamOutput`
to allow for faster serializing by avoiding the writer lookup and some self-reference checks.
2021-10-06 15:53:45 +02:00
Chris Hegarty
0323ce6ff7
Remove stray para tags (#77699) 2021-09-14 10:46:47 -04:00
Nik Everett
05af24335d
Memory efficient xcontent filtering (backport of #77154) (#77653)
* Memory efficient xcontent filtering (backport of #77154)

I found myself needing support for something like `filter_path` on
`XContentParser`. It was simple enough to plug it in so I did. Then I
realized that it might offer more memory efficient source filtering
(#25168) so I put together a quick benchmark comparing the source
filtering that we do in `_search`.

Filtering using the parser is about 33% faster than how we filter now
when you select a single field from a 300 byte document:
```
Benchmark                                          (excludes)  (includes)  (source)  Mode  Cnt     Score    Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message     short  avgt    5  2360.342 ±  4.715  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message     short  avgt    5  2010.278 ± 15.042  ns/op
FetchSourcePhaseBenchmark.filterXContentOnParser                  message     short  avgt    5  1588.446 ± 18.593  ns/op
```

The top line is the way we filter now. The middle line is adding a
filter to `XContentBuilder` - something we can do right now without any
of my plumbing work. The bottom line is filtering on the parser,
requiring all the new plumbing.

This isn't particularly impresive. 33% *sounds* great! But 700
nanoseconds per document isn't going to cut into anyone's search times.
If you fetch a thousand docuents that's .7 milliseconds of savings.

But we mostly advise folks to use source filtering on fetch when the
source is large and you only want a small part of it. So I tried when
the source is about 4.3kb and you want a single field:
```
Benchmark                                          (excludes)  (includes)      (source)  Mode  Cnt     Score     Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message  one_4k_field  avgt    5  5957.128 ± 117.402  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message  one_4k_field  avgt    5  4999.073 ±  96.003  ns/op
FetchSourcePhaseBenchmark.filterXContentonParser                  message  one_4k_field  avgt    5  3261.478 ±  48.879  ns/op
```

That's 45% faster. Put another way, 2.7 microseconds a document. Not
bad!

But have a look at how things come out when you want a single field from
a 4 *megabyte* document:
```
Benchmark                                          (excludes)  (includes)      (source)  Mode  Cnt        Score        Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message  one_4m_field  avgt    5  8266343.036 ± 176197.077  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message  one_4m_field  avgt    5  6227560.013 ±  68306.318  ns/op
FetchSourcePhaseBenchmark.filterXContentonParser                  message  one_4m_field  avgt    5  1617153.472 ±  80164.547  ns/op
```

These documents are very large. I've encountered documents like them in
real life, but they've always been the outlier for me. But a 6.5
millisecond per document savings ain't anything to sneeze at.

Take a look at what you get when I turn on gc metrics:
```
FetchSourcePhaseBenchmark.filterObjects                          message  one_4m_field  avgt    5   7036097.561 ±  84721.312   ns/op
FetchSourcePhaseBenchmark.filterObjects:·gc.alloc.rate           message  one_4m_field  avgt    5      2166.613 ±     25.975  MB/sec
FetchSourcePhaseBenchmark.filterXContentOnBuilder                message  one_4m_field  avgt    5   6104595.992 ±  55445.508   ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder:·gc.alloc.rate message  one_4m_field  avgt    5      2496.978 ±     22.650  MB/sec
FetchSourcePhaseBenchmark.filterXContentonParser                 message  one_4m_field  avgt    5   1614980.846 ±  31716.956   ns/op
FetchSourcePhaseBenchmark.filterXContentonParser:·gc.alloc.rate  message  one_4m_field  avgt    5         1.755 ±      0.035  MB/sec
```

* Fixup benchmark for 7.x
2021-09-13 16:08:45 -04:00
Rory Hunter
934691e720 Fix shadowed variables in various places - part 1 (#77555)
Part of #19752.

Fix a number of locations where local variables or parameters are shadowing a field
that is defined in the same class.
2021-09-13 14:10:35 +01:00
Rory Hunter
a7c1ca790d
Changes to keep Checkstyle happy after reformatting (#76464) (#76647)
* Reformatting to keep Checkstyle after formatting
* Configure spotless everywhere, and disable the tasks if necessary
* Add XContentBuilder helpers, fix test
2021-08-18 09:40:17 -04:00
elasticsearchmachine
4941d0e74a
[7.x] Fix copyCurentStructure(MapXContentParser) (#76357) (#76374)
* Fix copyCurentStructure(MapXContentParser) (#76357)

This stops `MapXContentParser` from throwing an
`UnsupportedOperationException` when passed as an argument to
`XContentBuilder#copyCurrentStructure`. This is mostly useful in tests
where `Map` is a convenient way to talk about structured configuration
but the production APIs need the map to be embedded into a larger blob
of `XContent`.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

* Fixup

Co-authored-by: Nik Everett <nik9000@gmail.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-08-11 14:36:25 -04:00
Armin Braun
b806cdaf0d
Dry Up XContent Parser Construction (#75114) (#75126)
Cleanup duplication in how we parse byte arrrays directly.
2021-07-08 15:32:29 +02:00
elasticsearchmachine
d855d82284
[7.x] Json processor: allow duplicate keys (#74956) 2021-07-06 16:37:13 +02:00
Nhat Nguyen
0ebddea945
Allow build XContent directly from Writable (#73804)
Today, writing a Writable value to XContent in Base64 format performs
these steps: (1) create a BytesStreamOutput, (2) write Writable to that
output, (3) encode a copy of bytes from that output stream, (4) create a
string from the encoded bytes, (5) write the encoded string to XContent.
These steps allocate/use memory 5 times than writing the encode chars
directly to the output of XContent.

This API would help reduce memory usage when storing a large response
of an async search.

Relates #67594
2021-06-09 12:54:34 -04:00
Ryan Ernst
1f94aaf280
Move ParseField to o.e.c.xcontent (#73923) (#73929)
ParseField is part of the x-content lib, yet it doesn't exist under the
same root package as the rest of the lib. This commit moves the class to
the appropriate package.

relates #73784
2021-06-08 15:51:18 -07:00
Ryan Ernst
393ab2d813
Rename o.e.common in libs/core to o.e.core (#73909) (#73920)
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.

relates #73784
backport #73909
2021-06-08 14:17:44 -07:00
Ryan Ernst
ddf4c69f42
Revert "Upgrade Azure SDK and Jackson (#72833) (#72995) (#73493)" (#73861)
The recent upgrade of the Azure SDK has caused a few test failures that
have been difficult to debug and do not yet have a fix. In particular, a
change to the netty reactor resolving
(reactor/reactor-netty#1655). We need to wait
for a fix for that issue, so this reverts commit
f454cefc26.

relates #73493
2021-06-07 18:17:25 -07:00
Ryan Ernst
f454cefc26
Upgrade Azure SDK and Jackson (#72833) (#72995) (#73493)
This commit upgrades the Azure SDK to 12.11.0 and Jackson to 2.12.2. The
Jackson upgrade must happen at the same time due to Azure depending on
this new version of Jackson.

closes #66555
closes #67214

backport #72995
backport #73011

Co-authored-by: Francisco Fernández Castaño <francisco.fernandez.castano@gmail.com>
Co-authored-by: Mark Vieira <portugee@gmail.com>
2021-05-27 16:45:01 -07:00
Alan Woodward
5b0f267181
Remove 'external values', and replace with swapped out XContentParsers (#72203) (#72448)
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063
2021-04-29 10:44:17 +01:00
Alan Woodward
416c2b1aa8
Rework geo mappers to index value by value (#71696) (#71828)
The various geo field mappers are organised in a hierarchy that shares
parsing and indexing code. This ends up over-complicating things,
particularly when we have some mappers that accept multiple values
and others that only accept singletons. It also leads to confusing
behaviour around ignore_malformed behaviour: geo fields will ignore
all values if a single one is badly formed, while all other field mappers
will only ignore the problem value and index the rest. Finally, this
structure makes adding index-time scripts to geo_point needlessly
complex.

This commit refactors the indexing logic of the hierarchy to move the
individual value indexing logic into the concrete implementations,
and aligns the ignore_malformed behaviour with that of other mappers.

It contains two breaking changes:

* The geo field mappers no longer check for external field values on the
  parse context. This added considerable complication to the refactored
  parse methods, and is unused anywhere in our codebase, but may
  impact plugin-based field mappers which expect to use geo fields
  as multifields
* The geo_point field mapper now passes geohashes to its multifields
  one-by-one, instead of formatting them into a comma-delimited
  string and passing them all at once. Completion multifields using
  this as an input should still behave as normal because by default
  they would split this combined geohash string on the commas in any
  case, but keyword subfields may look different.

Fixes #69601
2021-04-19 13:43:27 +01:00
Jack Conradson
0dcd59e2a1 Add missing boolean array to unknown value writers for xcontent (#71651)
This change adds the ability to call value on an XContentBuilder and consume a boolean[]. This was 
missing from the set of other writers for the unknown value call.
2021-04-13 12:51:57 -07:00
Rory Hunter
1b6ad967cb Replace NOT operator with explicit false check - part 8 (#68625)
Part 8.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:32:53 +00:00
Mark Vieira
2d1e8b3abd Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

- Updating LICENSE and NOTICE files throughout the code base, as well
  as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
  created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
  license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
  with updated header (vendored code with Apache 2.0 headers obviously
  remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 18:07:23 -08:00
Rory Hunter
e8da7e33fd Replace NOT operator with explicit false check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-01-27 20:51:31 +00:00
Rene Groeschke
68fce39562
Avoid tasks materialized during configuration phase (#65922) (#66218)
* Avoid tasks materialized during configuration phase
* Fix RestTestFromSnippet testRoot setup
2020-12-12 22:13:38 +01:00
Hendrik Muhs
8e377da291
[7.x][Transform] use ISO dates in output instead of epoch millis (#65584) (#65952)
Transform writes dates as epoch millis, this does not work for historic data in some cases or is
unsupported. Dates should be written as such. With this PR transform starts writing dates in ISO
format, but as existing transform might rely on the format it provides backwards compatibility for
old jobs as well as a setting to write dates as epoch millis.

fixes #63787
backport #65584
2020-12-07 17:18:55 +01:00
Przemyslaw Gomulka
20cd2e4046
Allow passing versioned media types to 7.x server (#65362)
a follow up after #63071 where it missed the XContentType.fromMediaType
method.
That method also have to remove the vendor specific substrings
(vnd.elasticsearch+ and compatible-with parameter) from mediaType value

relates #51816
2020-11-25 11:52:41 +01:00
Rene Groeschke
709643e649
Move tasks in build scripts to task avoidance api (7.x backport) (#64990)
* Move tasks in build scripts to task avoidance api (#64046)

- Some trivial cleanup on build scripts
- Change task referencing in build scripts to use task avoidance api
where replacement is trivial.
2020-11-12 13:57:01 +01:00
Lee Hinman
dd99125193
[7.x] Add assert that raw and readable xcontent field names are different (#63332) (#63343)
This adds asserts that will catch the case where we accidentally provide the same raw and readable
field name in xcontent.
2020-10-06 11:32:41 -06:00
Przemyslaw Gomulka
eb630e599d
Allow passing versioned media types to 7.x server (#63071)
7.x client can pass media type with a version which will return a 7.x
version of the api in ES 8.
In ES server 7 this media type shoulld be accepted but it serve the same
version of the API (7x)
relates #61427
2020-10-02 09:17:11 +02:00
Armin Braun
9be36865ef
Speed up XContent Collection Parsing (#61442) (#61617)
1. Get rid of the capturing lambda on the hot path that inlines very badly
2. Remove as many bounds checks as possible, thereby reducing method size and improving inlining
2020-08-27 14:15:46 +02:00
Armin Braun
af2e2782eb
Stop Needlessly Copying Bytes in XContent Parsing (#61447) (#61469)
Wrapping a `BytesArray` in a `StreamInput` for deserialization is inefficient.
This forces Jackson to internally buffer (i.e. copy) all bytes from the `BytesArray`
before deserializing, adding overhead for copying the bytes and managing the buffers.

This commit fixes a number of spots where `BytesArray` is the most common type of
`BytesReference` to special case this type and parse it more efficiently.
Also improves parsing `String`s to use the more efficient direct `String` parsing APIs.
2020-08-24 15:49:15 +02:00
Armin Braun
7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Yang Wang
a84469742c
Improve role cache efficiency for API key roles (#58156) (#59397)
This PR ensure that same roles are cached only once even when they are from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
2020-07-13 22:58:11 +10:00
Przemysław Witek
4a791e835b
Simplify parser declarations when specialist types are stored in strings (#58996) (#59056) 2020-07-06 13:05:03 +02:00
Rene Groeschke
d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Rene Groeschke
abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api accross the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
Rene Groeschke
01e9126588
Remove deprecated usage of testCompile configuration (#57921) (#58083)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
Christoph Büscher
56625e35b7 Fix bool query behaviour on null value (#56817)
Until 7.7 we used to ignore `null` values for `bool`queries `minimum_should_match`,
parameters and also for the `must`,  `must_not`, `should` and `filter` clauses.
An internal refactoring has changed this so now we get a parsing error. While `null` 
should not a common value here, we should restore the old behaviour for bwc for now.

Closes #56812
2020-05-26 16:23:40 +02:00
Ryan Ernst
9fb80d3827
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 20:23:07 -07:00
Jason Tedor
33669c0420
Upgrade to Jackson 2.10.4 (#56188)
Another Jackson release is available. There are some CVEs addressed,
none of which impact us, but since we can now bump Jackson easily, let
us move along with the train to avoid the false positives from security
scanners.
2020-05-06 17:20:23 -04:00
Hendrik Muhs
e177a38504
[7.x][Transform] add throttling (#56007) (#56184)
add throttling to transform, throttling will slow down search requests by
delaying the execution based on a documents per second metric.

fixes #54862
2020-05-05 13:09:02 +02:00
Igor Motov
3504755f44
Add InstantiatingObjectParser (#55483) (#55604)
Introduces InstantiatingObjectParser which is similar to the
ConstructingObjectParser, but instantiates the object using its constructor
instead of a builder function.

Closes #52499
2020-04-22 12:28:52 -04:00
William Brafford
2ba3be9db6
Remove deprecated third-party methods from tests (#55255) (#55269)
I've noticed that a lot of our tests are using deprecated static methods
from the Hamcrest matchers. While this is not a big deal in any
objective sense, it seems like a small good thing to reduce compilation
warnings and be ready for a new release of the matcher library if we
need to upgrade. I've also switched a few other methods in tests that
have drop-in replacements.
2020-04-15 17:54:47 -04:00
Ryan Ernst
29b70733ae
Use task avoidance with forbidden apis (#55034)
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
2020-04-15 13:27:53 -07:00
Mark Vieira
ce85063653
[7.x] Re-add origin url information to publish POM files (#55173) 2020-04-14 13:24:15 -07:00
Alan Woodward
d23112f441 Report parser name and location in XContent deprecation warnings (#53805)
It's simple to deprecate a field used in an ObjectParser just by adding deprecation
markers to the relevant ParseField objects. The warnings themselves don't currently
have any context - they simply say that a deprecated field has been used, but not
where in the input xcontent it appears. This commit adds the parent object parser
name and XContentLocation to these deprecation messages.

Note that the context is automatically stripped from warning messages when they
are asserted on by integration tests and REST tests, because randomization of
xcontent type during these tests means that the XContentLocation is not constant
2020-03-20 11:52:55 +00:00
Alan Woodward
580bc40c0c Make it possible to deprecate all variants of a ParseField with no replacement (#53722)
Sometimes we want to deprecate and remove a ParseField entirely, without replacement;
for example, the various places where we specify a _type field in 7x. Currently we can
tell users only that a particular field name should not be used, and that another name should
be used in its place. This commit adds the ability to say that a field should not be used at
all.
2020-03-18 14:16:19 +00:00
Ryan Ernst
5c472fcb47 Upgrade jackson to 2.10.3 and GeoIP to 2.13.1 (#53642)
Re-applies the change from #53523 along with test fixes.

closes #53626
closes #53624
closes #53622
closes #53625

Co-authored-by: Nik Everett <nik9000@gmail.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Jake Landis <jake.landis@elastic.co>
2020-03-17 10:28:51 -07:00
Mark Vieira
2f0aca992b
Revert "Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576)"
This reverts commit b7dbadeea0.
2020-03-15 18:10:40 -07:00