Commit graph

40 commits

Author SHA1 Message Date
Jordan Powers
96300a9d80
Optimized text for full unicode and some escape sequences (#129169)
Follow-up to #126492 to apply the json parsing optimization to strings
containing unicode characters and some backslash-escaped characters.

Supporting backslash-escaped strings is tricky as it requires modifying the
string. There are two types of modification: some just remove the backslash
(e.g. \", \\), and some replace the whole escape sequence with a new
character (e.g. \n, \r, \u00e5). In this implementation, the optimization
only supports the first case: removing the backslash. This is done by
copying the data while skipping the backslash. It should still be faster
than full String decoding, but it won't be as fast as strings without
backslashes, where we can directly reference the input bytes.
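A minimal sketch of the backslash-removal path, assuming the string body is available as a raw UTF-8 byte slice. `EscapeStripper` and `stripSimpleEscapes` are hypothetical names, not the Elasticsearch code. Scanning byte by byte is safe because every byte of a multi-byte UTF-8 sequence is >= 0x80, so none can be mistaken for a backslash (0x5C). Treating `\/` as a drop-the-backslash escape is an assumption beyond the two examples in the message; returning null stands in for "fall back to full String decoding".

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not the Elasticsearch implementation): copies the raw
 * UTF-8 bytes of a JSON string body, dropping the backslash from escapes that
 * only protect the next character. Returns null when an escape needs a
 * replacement character, so the caller can fall back to full String decoding.
 */
public final class EscapeStripper {
    private EscapeStripper() {}

    public static byte[] stripSimpleEscapes(byte[] input, int offset, int length) {
        // Dropping backslashes can only shrink the data, so this buffer is large enough.
        byte[] out = new byte[length];
        int written = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            byte b = input[i];
            if (b != '\\') {
                out[written++] = b; // plain byte, including parts of multi-byte UTF-8 sequences
                continue;
            }
            if (i + 1 >= end) {
                return null; // dangling backslash: let the full decoder report the error
            }
            byte next = input[i + 1];
            if (next == '"' || next == '\\' || next == '/') {
                out[written++] = next; // keep the escaped character, drop the backslash
                i++;                   // and skip past it
            } else {
                return null; // newline, tab, unicode escapes, etc. need real decoding
            }
        }
        return Arrays.copyOf(out, written);
    }
}
```

The extra copy is the cost mentioned above: unescaped strings can be referenced in place, while escaped ones need a new array.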

Relates to #129072.
2025-06-12 09:55:07 -07:00
Jordan Powers
496fb2d5a4
Skip UTF8 to UTF16 conversion during document indexing (#126492)
When parsing documents, we receive the document as UTF-8 encoded data, which
we then parse, converting the fields to Java-native UTF-16 encoded Strings.
We then convert these strings back to UTF-8 for storage in Lucene.

This patch skips the redundant conversion, instead passing Lucene a
direct reference to the received UTF-8 bytes when possible.
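To illustrate why the round trip is redundant, here is a small sketch assuming the field value is a slice of the request buffer. `Utf8PassThrough` and `Utf8Slice` are hypothetical names, not the types used by the patch. For well-formed UTF-8, decoding to a String and re-encoding yields exactly the original bytes; note that this equality only holds for well-formed input, since invalid sequences are replaced with U+FFFD during decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class Utf8PassThrough {
    private Utf8PassThrough() {}

    /** Hypothetical zero-copy view over part of the received request bytes. */
    public static final class Utf8Slice {
        public final byte[] bytes;
        public final int offset;
        public final int length;

        public Utf8Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Copies the slice out; only used here to compare against the old path. */
        public byte[] toArray() {
            return Arrays.copyOfRange(bytes, offset, offset + length);
        }
    }

    /** Old path: UTF-8 -> UTF-16 String -> UTF-8 again (two conversions). */
    public static byte[] viaString(byte[] src, int offset, int length) {
        String field = new String(src, offset, length, StandardCharsets.UTF_8);
        return field.getBytes(StandardCharsets.UTF_8);
    }

    /** New path: reference the bytes already in hand instead of re-encoding them. */
    public static Utf8Slice direct(byte[] src, int offset, int length) {
        return new Utf8Slice(src, offset, length);
    }
}
```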
2025-06-05 19:50:09 -07:00
Ryan Ernst
6174acdc39
Workaround max name limit imposed by Jackson 2.17 (#126806)
In Jackson 2.15 a maximum string length of 50k characters was
introduced. We worked around that by overriding the length to max int on
all parsers created by xcontent. Jackson 2.17 introduced a similar limit
on field names. This commit mimics the workaround for string length by
overriding the max name length to be unlimited.

relates #58952
2025-04-15 11:40:27 -07:00
Moritz Mack
a608f0626e
Added query param ?include_source_on_error for ingest requests (#120725)
A new query parameter `?include_source_on_error` was added for the create / index, update and bulk REST APIs to control
whether the document source is included in the error response when parsing fails. The default value is `true`.
2025-01-28 09:33:22 +01:00
Jim Ferenczi
b40a52035f
Add Optional Source Filtering to Source Loaders (#113827)
This change introduces optional source filtering directly within source loaders (both synthetic and stored).
The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently.
By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.
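A minimal sketch of filtering while loading, assuming each field's value is stored independently and the filter is a plain list of exact field names (no wildcards or nested paths). `FilteringSourceLoader` is a hypothetical class, not the Elasticsearch source loader API. Because only the included names are looked up, the work grows with the size of the include list rather than with the total number of stored fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical loader: each field is stored independently (as with synthetic
 * source), so we look up only the included fields instead of rebuilding the
 * whole document and filtering it afterwards.
 */
public final class FilteringSourceLoader {
    private final Map<String, Object> storedFields;

    public FilteringSourceLoader(Map<String, Object> storedFields) {
        this.storedFields = storedFields;
    }

    /** Work is proportional to includes.size(), not to the number of stored fields. */
    public Map<String, Object> load(List<String> includes) {
        Map<String, Object> source = new LinkedHashMap<>();
        for (String field : includes) {
            Object value = storedFields.get(field); // expected O(1) for a HashMap
            if (value != null) {
                source.put(field, value);
            }
        }
        return source;
    }
}
```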

This update also modifies the get document API to apply source filters earlier—directly through the source loader.
The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits),
and source filtering is always applied as the final step.
A follow-up will be required to ensure careful handling of all search-related scenarios.
2024-12-11 13:17:19 +00:00
Henrique Paes
4740b02a9b
Wrap jackson exception on malformed json string (#114445)
This commit hides the underlying Jackson parse exception when encountered while parsing string tokens.
2024-12-05 09:22:48 -08:00
Ryan Ernst
e5d5c17c99
Use directory name as project name for libs (#115720)
The libs projects are configured to all begin with `elasticsearch-`.
While this is desirable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.

This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
2024-10-29 13:02:28 -07:00
mccheah
6be3036c01
Do not exclude empty arrays or empty objects in source filtering with Jackson streaming (#112250) 2024-10-21 10:28:44 +02:00
Mark Vieira
a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Benjamin Trent
281ee04f7a
JSON parse failures should be 4xx codes (#112703)
If there isn't any text to parse, this is not an internal issue but
an invalid argument.

I simply changed the exception thrown. If we don't agree with this, I
can adjust `query` parsing directly, but this seemed like the better
choice.

closes: https://github.com/elastic/elasticsearch/issues/112296
2024-09-12 00:15:56 +10:00
Christoph Büscher
000ebaf7c2
Json parsing exceptions should not cause 500 errors (#111548)
Currently we wrap JsonEOFException from advancing the json parser into our own
XContentEOFException, but this has the drawback that it results in 500 errors on
the client side. These should instead be 400 errors.
This changes XContentEOFException to extend XContentParseException so we report
a 400 error instead.
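The fix works because the status is chosen by exception type: once the EOF exception is a subclass of the parse exception, any mapping that treats parse exceptions as client errors covers it automatically. The sketch below uses hypothetical stand-in classes (`ParseError`, `EofError`, `StandaloneEofError`) and a hypothetical `statusFor` method, not the actual Elasticsearch types or status mapping.

```java
public final class StatusMapping {
    private StatusMapping() {}

    /** Stand-in for a parse-exception base class that maps to 400. */
    public static class ParseError extends RuntimeException {
        public ParseError(String message) { super(message); }
    }

    /** Before the change: an EOF error outside the parse hierarchy. */
    public static class StandaloneEofError extends RuntimeException {
        public StandaloneEofError(String message) { super(message); }
    }

    /** After the change: the EOF error extends the parse error. */
    public static class EofError extends ParseError {
        public EofError(String message) { super(message); }
    }

    /** Malformed client input is a 400; anything unrecognised is treated as a server error. */
    public static int statusFor(Throwable t) {
        if (t instanceof ParseError) {
            return 400;
        }
        return 500;
    }
}
```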

Closes #111542
2024-09-06 09:13:30 +02:00
Nhat Nguyen
98fe686da4
Upgrade xcontent to Jackson 2.17.2 (#112320)
Avoid FasterXML/jackson-core#1256
2024-08-28 15:59:12 -07:00
Ryan Ernst
5a10545d37
Upgrade xcontent to Jackson 2.17.0 (#111948) 2024-08-20 06:07:08 -07:00
Mikhail Berezovskiy
1163d2e4f9
Rename streamContent/Separator to bulkContent/Separator (#111716)
Rename `xContent.streamSeparator()` and
`RestHandler.supportsStreamContent()` to `xContent.bulkSeparator()` and
`RestHandler.supportsBulkContent()`.

I want to reserve use of "supportsStreamContent" for current work in
HTTP layer to [support incremental content
handling](https://github.com/elastic/elasticsearch/pull/111438) besides
fully aggregated byte buffers. `supportsStreamContent` would indicate
that a handler can parse chunks of HTTP content as they arrive.
2024-08-09 06:32:20 +10:00
Daniel Mitterdorfer
890bd4b8a5
Consider context in raw serialization (#106163)
With this commit we use `writeRawValue` instead of `writeRaw` when
serializing raw strings as XContent. The latter method does not consider
context (e.g. is the value being written as part of an array and
requires a comma separator?) whereas the former does. This ensures that
pre-rendered double values as we use them in the flamegraph response are
rendered correctly as XContent.
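A toy illustration of the difference, assuming a single flat JSON array. `MiniArrayWriter` is hypothetical; its `writeRaw` and `writeRawValue` methods only mirror the names of Jackson's `JsonGenerator` methods to show why a raw write that ignores context drops the comma between array elements.

```java
/**
 * Toy generator for a single flat JSON array, used only to contrast a
 * context-free raw write with a context-aware raw value write.
 */
public final class MiniArrayWriter {
    private final StringBuilder out = new StringBuilder();
    private boolean hasValue = false; // has the array received a value yet?

    public MiniArrayWriter startArray() {
        out.append('[');
        hasValue = false;
        return this;
    }

    public MiniArrayWriter endArray() {
        out.append(']');
        return this;
    }

    /** Context-free: appends the text verbatim and never adds a separator. */
    public MiniArrayWriter writeRaw(String text) {
        out.append(text);
        return this;
    }

    /** Context-aware: treats the text as an array element, adding a comma when needed. */
    public MiniArrayWriter writeRawValue(String text) {
        if (hasValue) {
            out.append(',');
        }
        out.append(text);
        hasValue = true;
        return this;
    }

    @Override
    public String toString() {
        return out.toString();
    }
}
```

Writing two pre-rendered doubles with `writeRaw` yields `[1.52.25]`, while `writeRawValue` yields `[1.5,2.25]`.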

Closes #106103
2024-03-11 13:48:12 +01:00
Ryan Ernst
83585315fe
Only apply build to direct libs (#106101)
Sometimes libs have subprojects that may not be java projects. This commit adjusts the shared
configuration for libs to only affect direct subprojects of :lib.
2024-03-08 13:48:26 -08:00
Daniel Mitterdorfer
7179c12b24
[Profiling] Speed up serialization of flamegraph (#105779)
The response of the flamegraph is quite large: A typical response can
easily reach 50MB (uncompressed). In order to reduce memory pressure and
also to start sending the response sooner, we chunk the response.
However, this produces many very small chunks, which cause high
overhead. In our experiments, the serialization alone takes more than
500ms.

With this commit we take the following measures:

1. We split the response into chunks only when it makes sense and
   otherwise send one larger chunk.
2. Serialization of doubles is very expensive: Just the serialization of
   annual CO2 tons takes around 80ms in our test setup. Therefore, we
   apply a custom serialization that is both faster than the built-in
   serialization and reduces the number of bytes sent over the wire,
   because we round to four decimal places (which is more than sufficient
   for our purposes).
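The commit does not show the custom serializer, so the following is only a sketch of one way to round to four decimal places and emit a compact representation, assuming finite values small enough that the scaled value fits in a `long`. `RoundedDoubles.formatRounded` is a hypothetical name. Fixed-point formatting avoids exponent notation and trailing zeros: `123.456789` becomes `123.4568`, and `2.5` stays `2.5`.

```java
public final class RoundedDoubles {
    private RoundedDoubles() {}

    private static final long SCALE = 10_000L; // four decimal places

    /**
     * Formats a finite value rounded to four decimal places, without exponent
     * notation or trailing zeros. Assumes |value| < 9e14 so the scaled value fits in a long.
     */
    public static String formatRounded(double value) {
        if (!Double.isFinite(value) || Math.abs(value) >= 9.0e14) {
            throw new IllegalArgumentException("unsupported value: " + value);
        }
        long scaled = Math.round(value * SCALE); // fixed-point representation
        StringBuilder sb = new StringBuilder();
        if (scaled < 0) {
            sb.append('-');
            scaled = -scaled;
        }
        sb.append(scaled / SCALE); // integer part
        long fraction = scaled % SCALE;
        if (fraction != 0) {
            // Zero-pad to four digits (e.g. 500 -> "0500"), then drop trailing zeros.
            String digits = Long.toString(fraction + SCALE).substring(1);
            int end = digits.length();
            while (digits.charAt(end - 1) == '0') {
                end--;
            }
            sb.append('.').append(digits, 0, end);
        }
        return sb.toString();
    }
}
```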
2024-03-07 15:31:02 +01:00
Stuart Tettemer
a359b1f648
Relax limit on max string size in CBOR, Smile, YAML (#103930)
Remove the rough limit on string length from Jackson 2.15. The limit was already relaxed for JSON in #96031, this extends that change to other XContent types.

Refs: #96031
Fixes: #104009
2024-01-08 13:31:54 -06:00
Armin Braun
37d55dac1c
Speed up String array writes to XContent (#98957)
Jackson has a direct method for writing string arrays
that saves us some of the indirection we have when looping
over a string array. This normally doesn't gain much, but for extreme
cases like long index name lists in field caps it saves a couple percent
in CPU time.
2023-08-30 12:02:41 +02:00
Rene Groeschke
b8627079b4
Update Gradle Wrapper to 8.2 (#96686)
- Convention usage has been deprecated and was fixed in our build files
- Fix test dependencies and deprecation
2023-07-04 15:35:15 +02:00
Ryan Ernst
1208c02cee
Relax limit on max string size (#96031)
Jackson 2.15 introduced a (rough) maximum limit on string length. This
commit relaxes that limit to its maximum size, leaving document size
constraints to other existing limits in the system. We can revisit
whether string length within a document should be independently
constrained later.
2023-05-11 08:54:27 -07:00
Ryan Ernst
8b8a2be7dd
Upgrade Jackson xml to 2.15.0 (#95641)
Additionally this commit updates snakeyaml to 2.0 as that is the version
now used by Jackson.
2023-05-02 13:59:17 -07:00
Simon Cooper
c6487f64f2
Use double wildcards for JSON filtered excludes properly (#94195) 2023-03-10 08:50:28 +00:00
Przemyslaw Gomulka
d065d4b76d
Remove jackson override and upgrade jackson to 2.14.2 (#93342)
Before Jackson 2.14.2, Elasticsearch had to override Jackson locally to avoid a bug when filtering empty arrays (#92984).
This commit reverts the local override and upgrades Jackson to 2.14.2, which contains the fix for the bug.
2023-01-30 16:58:09 +01:00
Przemyslaw Gomulka
26ccfab8bb
Exclude jackson patched class from spotlessApply (#93059) 2023-01-18 19:08:32 +01:00
Przemyslaw Gomulka
8f37934a76
Exclude the class from jackson jar (#93052)
In #92984 we override a file in the Jackson jar, but that relies on Gradle internals which might change at any point.
This fixes it by excluding an element from the jar and allowing a new class to be added.
2023-01-18 16:59:09 +01:00
Przemyslaw Gomulka
441e77c8cf
Patch jackson-core with locally modified class (#92984)
While Jackson 2.14.2, which includes FasterXML/jackson-core#882, is not yet released,
we patch the jackson-core used by x-content with a locally modified class that fixes bug #92480.

closes #92480
2023-01-18 14:48:14 +01:00
Przemyslaw Gomulka
d19721b701
Update jackson to 2.14.1 (#92990)
Closes #92341
2023-01-17 16:30:49 +01:00
Chris Hegarty
cc9e13c307
Upgrade XContent to Jackson 2.14.0 and enable Fast Double Parser (#90553)
Co-authored-by: Nikola Grcevski <nikola.grcevski@elastic.co>
2022-11-07 16:46:01 +00:00
Jake Landis
58c0625f37
update snakeyaml dependency (#90414)
This commit updates snakeyaml to 1.33 and consolidates which version is used.
2022-09-28 13:13:03 -05:00
Rene Groeschke
3909b5eaf9
Add verification metadata for dependencies (#88814)
Removing the custom dependency checksum functionality in favor of Gradle's built-in dependency verification support.

- Use sha256 instead of sha1, as sha1 is no longer considered safe.

Closes https://github.com/elastic/elasticsearch/issues/69736
2022-08-04 09:51:16 +02:00
Nik Everett
87ab933c8b
Remove calls to deprecated xcontent method (#84733)
This removes many calls to the last remaining `createParser` method that
I deprecated in #79814, migrating callers to one of the new methods that
it created.
2022-08-01 22:18:03 +09:30
Chris Hegarty
3071c6a055
Modularize Elasticsearch (#81066)
This PR represents the initial phase of Modularizing Elasticsearch (with
Java Modules).

This initial phase modularizes the core of the Elasticsearch server
with Java Modules, which is then used to load and configure extension
components atop the server. Only a subset of extension components are
modularized at this stage (other components come in a later phase).
Components are loaded dynamically at runtime with custom class loaders
(same as is currently done). Components with a module-info.class are
defined to a module layer.

This architecture is somewhat akin to the Modular JDK, where
applications run on the classpath. In the analogy, the Elasticsearch
server modules are the platform (thus are always resolved and present),
while components without a module-info.class are non-modular code
running atop the Elasticsearch server modules. The extension components
cannot access types from non-exported packages of the server modules, in
the same way that classpath applications cannot access types from
non-exported packages of modules from the JDK. Broadly, the core
Elasticsearch java modules simply "wrap" the existing packages and export
them. There are opportunities to export less, which is best done in more
narrowly focused follow-up PRs.

The Elasticsearch distribution startup scripts are updated to put jars
on the module path (the class path is empty), so the distribution will
run the core of the server as java modules. A number of key components
have been retrofitted with module-info.java's too, and the remaining
components can follow later. Unit and functional tests run as
non-modular (since they commonly require package-private access), while
higher-level integration tests, that run the distribution, run as
modular.

Co-authored-by: Chris Hegarty <christopher.hegarty@elastic.co>
Co-authored-by: Ryan Ernst <ryan@iernst.net>
Co-authored-by: Rene Groeschke <rene@elastic.co>
2022-05-20 13:11:42 +01:00
Armin Braun
943c0d551b
Use Faster API for Parsing Maps from XContent (#85732)
We can be a little more efficient when parsing maps and exploit
the fact that we know the next token is a name in a couple of cases.
I fixed the most performance-relevant one, but there are a couple more
that could make use of this API in a follow-up.
2022-04-29 12:20:33 +02:00
Ryan Ernst
b2c9028384
Move io utils to core package (#85954)
Most classes under elasticsearch-core had been moved to the o.e.core
package. However, a couple io related classes remained in an "internal"
package. This commit moves Streams and IOUtils to the core package, as
they are no more "internal" than the rest of the classes in core.
2022-04-19 21:26:28 -07:00
Armin Braun
caa0b61812
Speed up JsonXContentParser token conversion and text() (#85621)
This reduces the compiled size of the token conversion method by about 20 bytes, so that it inlines in a couple
of places where it previously did not, once `unknownTokenException` is deoptimized as dead code.

The change to `text()` similarly saves some method size, so that it inlines into the frequently used
`textOrNull` in several places.
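The inlining argument rests on a common JIT-friendly pattern: keep the hot method's bytecode small by moving rarely executed code, such as building an exception message, into a separate method. HotSpot's inlining decisions depend in part on bytecode size limits (for example `-XX:MaxInlineSize` and `-XX:FreqInlineSize`). The sketch below is hypothetical: `TokenConverter` maps characters rather than Jackson tokens, purely to show the shape of the pattern.

```java
public final class TokenConverter {
    private TokenConverter() {}

    public enum Token { START_OBJECT, END_OBJECT, VALUE_STRING }

    /** Hot path: kept small so the JIT is more likely to inline it into callers. */
    public static Token convert(char c) {
        switch (c) {
            case '{':
                return Token.START_OBJECT;
            case '}':
                return Token.END_OBJECT;
            case '"':
                return Token.VALUE_STRING;
            default:
                throw unknownToken(c); // the message-building code lives out of line
        }
    }

    /** Cold path: string concatenation and allocation stay out of convert(). */
    private static IllegalStateException unknownToken(char c) {
        return new IllegalStateException("No matching token for [" + c + "]");
    }
}
```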
2022-04-01 18:06:54 +02:00
Armin Braun
898d84998b
Make classes+methods that can be static static in many spots (#85370)
Just some quick static analysis+fixing here. Not much in terms of code changes
besides adding the `static` keywords with the exception of some simplifications
to some of the search objects that don't need the search controller instance
passed down in many spots.
This was done mostly automatically by the IDE, but a quick manual inspection shows
quite a few spots where this should make things behave better, for example by making lambdas
non-capturing.
2022-03-30 00:21:56 +02:00
Armin Braun
0bd269d3bd
Speed up CompressedXContent Serialization (#84802)
Add a much faster and less allocation-heavy path for deserializing and then writing
xcontent (as is common when serializing states) to speed up large cluster state handling
in APIs etc.
Also, optimize instantiating compressed xcontent some more by removing an allocation and
improving the extraction of byte arrays from a bytes output stream.

This is quite helpful in speeding up snapshotting large clusters in particular where we
serialize index metadata in a very hot loop on the master node.
2022-03-15 07:44:12 +01:00
Ryan Ernst
72be951535
Upgrade jackson for x-content to 2.13.2 (#84905)
This commit upgrades Jackson used in x-content. It does not affect
Jackson used in modules or plugins.
2022-03-14 09:06:21 -07:00
Ryan Ernst
070fcaa0ad
Move x-content implementation to a separate classloader (#83705)
This change isolates the Jackson implementation of x-content parsers and generators to a separate classloader. The code is loaded dynamically upon accessing any x-content functionality.

The x-content implementation is embedded inside the x-content jar, as a hidden set of resource files. These are loaded through a special classloader created to initialize the XContentProvider through service loader. One caveat to this approach is that IDEs will no longer trigger building the x-content implementation when it changes. However, running any test from the command line, or running a full Build in IntelliJ will trigger the directory to be built.

Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>
2022-03-07 15:44:59 -08:00