Commit graph

424 commits

Author SHA1 Message Date
Lorenzo Dematté
d18b6790f4
[Entitlements] Refactor: create/parse entitlement policies earlier during bootstrap (#120611) 2025-01-22 14:29:57 +01:00
Tim Vernum
552cec7ff0 Merge revision 34059c9dbd into multi-project 2025-01-17 16:32:15 +11:00
Patrick Doyle
34059c9dbd
Limit ByteSizeUnit to 2 decimals (#120142)
* Exhaustive testParseFractionalNumber

* Refactor: encapsulate ByteSizeUnit constructor

* Refactor: store size in bytes

* Support up to 2 decimals in parsed ByteSizeValue

* Fix test for rounding up with no warnings

* ByteSizeUnit transport changes

* Update docs/changelog/120142.yaml

* Changelog details and impact

* Fix change log breaking.area

* Address PR comments
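
A minimal sketch of the idea behind the bullets above (parse a fractional size, allow at most 2 decimals, store the result in bytes). The class and method names are hypothetical, and rejecting inputs with more than 2 decimals is an assumption of this sketch, not necessarily what the actual change does:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Locale;

// Hypothetical illustration; not the Elasticsearch ByteSizeValue implementation.
final class ByteSizeSketch {
    private ByteSizeSketch() {}

    /** Parses strings such as "512b", "1.5kb" or "2.25mb" into a byte count. */
    static long parseToBytes(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        long multiplier;
        int suffixLength = 2;
        // Check the two-letter units first, because "kb" also ends with "b".
        if (s.endsWith("kb")) {
            multiplier = 1024L;
        } else if (s.endsWith("mb")) {
            multiplier = 1024L * 1024L;
        } else if (s.endsWith("gb")) {
            multiplier = 1024L * 1024L * 1024L;
        } else if (s.endsWith("b")) {
            multiplier = 1L;
            suffixLength = 1;
        } else {
            throw new IllegalArgumentException("missing unit: " + text);
        }
        BigDecimal value = new BigDecimal(s.substring(0, s.length() - suffixLength).trim());
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative size: " + text);
        }
        // Assumption of this sketch: more than 2 decimals is an error.
        if (value.scale() > 2) {
            throw new IllegalArgumentException("more than 2 decimals: " + text);
        }
        // The size is kept in bytes, rounded to the nearest whole byte.
        return value.multiply(BigDecimal.valueOf(multiplier))
            .setScale(0, RoundingMode.HALF_UP)
            .longValueExact();
    }
}
```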
2025-01-16 19:30:23 +00:00
Simon Cooper
5a70623d8d Merge remote-tracking branch 'upstream-main/main' into merge-main-16-01-25 2025-01-16 09:23:46 +00:00
Iván Cea Fontenla
b7ab8f8bb7
ESQL: Add row counts to profile results (#120134)
Closes https://github.com/elastic/elasticsearch/issues/119969

- Rename "pages_in/out" to "pages_received/emitted", to standardize the name along most operators
  - **There are still "pages_processed" operators**, maybe it would make sense to also rename those?
- Add "pages_received/emitted" to TopN operator, as it was missing that
- Added "rows_received/emitted" to most operators
- Added a test to ensure all operators with status provide those metrics
2025-01-15 15:30:41 +00:00
Nik Everett
c990377c95
ESQL: Limit memory usage of fold (#118602)
`fold` can be surprisingly heavy! The maximally efficient/paranoid thing
would be to fold each expression one time, in the constant folding rule,
and then store the result as a `Literal`. But this PR doesn't do that
because it's a big change. Instead, it creates the infrastructure for
tracking memory usage for folding and plugs it into as many places as
possible. That's not perfect, but it's better.

This infrastructure limits the allocations of fold similar to the
`CircuitBreaker` infrastructure we use for values, but it's different
in a critical way: you don't manually free any of the values. This is
important because the plan itself isn't `Releasable`, which is required
when using a real CircuitBreaker. We could have tried to make the plan
releasable, but that'd be a huge change.

Right now there's a single limit of 5% of heap per query. We create the
limit at the start of query planning and use it throughout planning.

There are about 40 places that don't yet use it. We should get them
plugged in as quickly as we can manage. After that, we should look to the
maximally efficient/paranoid thing that I mentioned about waiting for
constant folding. That's an even bigger change, one I'm not equipped
to make on my own.
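
A rough sketch of the "count up, never free" idea described above. The names are hypothetical and this is not the actual infrastructure added by the PR; it assumes a single planning thread and only illustrates a limit that is created once per query and never released manually:

```java
// Hypothetical sketch; names are illustrative and not the Elasticsearch API.
final class FoldMemoryLimit {
    private final long limitBytes;
    private long usedBytes;

    FoldMemoryLimit(long limitBytes) {
        if (limitBytes < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        this.limitBytes = limitBytes;
    }

    /** Builds a limit as a fraction of the maximum heap, e.g. 0.05 for 5%. */
    static FoldMemoryLimit fractionOfHeap(double fraction) {
        return new FoldMemoryLimit((long) (Runtime.getRuntime().maxMemory() * fraction));
    }

    /** Records an allocation. There is deliberately no matching release method. */
    void track(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must not be negative");
        }
        if (bytes > limitBytes - usedBytes) {
            throw new IllegalStateException("folding would use more than " + limitBytes + " bytes");
        }
        usedBytes += bytes;
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

In this sketch the tracker is simply dropped with the plan once planning finishes, which is why nothing has to be `Releasable`.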
2025-01-13 15:04:27 +00:00
Yang Wang
e1151ef1ba Merge main into multi-project 2025-01-06 13:30:02 +11:00
Oleksandr Kolomiiets
8ca74fb956
Revert "Extract synthetic source logic from DocumentParser (#116049)" (#119530)
This reverts commit e8d32afdf4.
2025-01-03 12:03:40 -08:00
Tim Vernum
4ff691f066 Merge revision 7fb6ca447a into multi-project 2024-12-31 15:41:02 +11:00
Oleksandr Kolomiiets
e8d32afdf4
Extract synthetic source logic from DocumentParser (#116049) 2024-12-24 11:41:44 -08:00
Niels Bauman
3738202979 Merge main into multi-project 2024-12-24 18:26:13 +01:00
Chris Hegarty
3a2f8f62c4
Add square distance query variants to the vector distance benchmark (#119219)
This commit adds square distance query variants to the vector distance benchmark.
2024-12-23 17:18:23 +00:00
Tim Vernum
e5a0739005 Merge main into multi-project 2024-12-12 17:23:24 +11:00
Jim Ferenczi
b40a52035f
Add Optional Source Filtering to Source Loaders (#113827)
This change introduces optional source filtering directly within source loaders (both synthetic and stored).
The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently.
By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.

This update also modifies the get document API to apply source filters earlier—directly through the source loader.
The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits),
and source filtering is always applied as the final step.
A follow-up will be required to ensure careful handling of all search-related scenarios.
2024-12-11 13:17:19 +00:00
Yang Wang
92867cdf50 Merge main into multi-project 2024-11-29 08:50:54 +11:00
Jack Conradson
656b5f9480
Refactor PluginsLoader to better support tests (#117522)
This refactors the way PluginsLoader is created to better support
various types of testing.
2024-11-27 14:31:30 -08:00
Tim Vernum
192ed6c5a4 Merge main into multi-project 2024-11-21 11:25:11 +11:00
Jack Conradson
4f46924f36
Split plugin loading into two different phases to support entitlements (#116998)
This change loads all the modules and creates the module layers for plugins prior to entitlement 
checking during the 2nd phase of bootstrap initialization. This will allow us to know what modules exist 
for both validation and checking prior to actually loading any plugin classes (in a follow up change).

There are now two classes:

- PluginsLoader, which does the module loading and layer creation
- PluginsService, which uses a PluginsLoader to create the main plugin classes and start the plugins
2024-11-20 15:05:42 -08:00
Niels Bauman
0edb9fa778 Merge remote-tracking branch 'public/main' into merge-main
# Conflicts:
#	server/src/main/java/org/elasticsearch/action/search/TransportSearchShardsAction.java
#	server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationStatsService.java
#	server/src/main/java/org/elasticsearch/gateway/GatewayMetaState.java
#	server/src/main/java/org/elasticsearch/plugins/Plugin.java
#	server/src/test/java/org/elasticsearch/gateway/GatewayMetaStateTests.java
#	server/src/test/java/org/elasticsearch/ingest/IngestMetadataTests.java
2024-11-18 10:53:12 +01:00
Rene Groeschke
13c8aaeffa
[Gradle] Remove static use of BuildParams (#115122)
Static fields don't do well in Gradle with configuration cache enabled.

- Use buildParams extension in build scripts
- Keep BuildParams.ci for now for easy serverless migration
- Tweak testing doc
2024-11-15 17:58:57 +01:00
Armin Braun
77a7c9c2e2
Add singleton for noop BitSetFilterCache.Listener (#116753)
Noticed during a code review that added yet another one of these:
We have quite a few instances of duplicate noop implementations,
let's make tests a little less verbose here.

Technically the constant is test-only but it felt right to just leave it
on the interface.
2024-11-13 21:55:14 +01:00
Tim Vernum
17c27bc42b Merge main into multi-project 2024-11-11 16:28:45 +11:00
Mary Gouseti
6c959b7e75
Add benchmark for IndexNameExpressionResolver (#115982)
* Add benchmark for IndexNameExpressionResolver

* Extract IndicesRequest in a local class

* Added one more benchmark to capture a mixed request

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-08 11:26:11 +02:00
Tim Vernum
2ba2d2a995 Merge main into multi-project 2024-10-31 11:55:04 +11:00
Ryan Ernst
e5d5c17c99
Use directory name as project name for libs (#115720)
The libs projects are configured to all begin with `elasticsearch-`.
While this is desirable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.

This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
2024-10-29 13:02:28 -07:00
Tim Vernum
d4e4b5abb0 Merge main into multi-project 2024-10-22 13:03:12 +11:00
Luca Cavanna
8efd08b019
Upgrade to Lucene 10 (#114741)
The most relevant ES changes that upgrading to Lucene 10 requires are:

- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automata are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area

Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
2024-10-21 13:38:23 +02:00
Tim Vernum
883471e3b2 Merge main into multi-project 2024-10-21 16:35:42 +11:00
Nhat Nguyen
d0c8ff5932
Refactor TSDB doc_values util to allow introducing a new codec (#115042)
This PR refactors the doc_values utils used in the TSDB codec to allow 
sharing between the current codec and the new codec.
2024-10-18 08:01:04 -07:00
Tim Vernum
1c62e4f533 Merge main into multi-project 2024-10-14 16:30:28 +11:00
Nik Everett
e304c1d5c1
ESQL: Speed up grouping by bytes (#114021)
This speeds up grouping by bytes valued fields (keyword, text, ip, and
wildcard) when the input is an ordinal block:
```
    bytes_refs 22.213 ± 0.322 -> 19.848 ± 0.205 ns/op (*maybe* real, maybe noise. still good)
       ordinal didn't exist   ->  2.988 ± 0.011 ns/op
```
I see this as 20ns -> 3ns, an 85% speed up. We never had the ordinals
branch before so I'm expecting the same performance there - about 20ns
per op.

This also speeds up grouping by a pair of byte valued fields:
```
two_bytes_refs 83.112 ± 42.348  -> 46.521 ± 0.386 ns/op
  two_ordinals 83.531 ± 23.473  ->  8.617 ± 0.105 ns/op
```
The speed up is much better when the fields are ordinals because hashing
bytes is comparatively slow.

I believe the ordinals case is quite common. I've run into it in quite a
few profiles.
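
To illustrate why ordinals help, here is a simplified sketch (hypothetical names, strings instead of bytes, not the actual ESQL block hash code). With ordinals, each distinct dictionary entry is hashed once and every row becomes a plain array lookup:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the ESQL BlockHash implementation.
final class OrdinalGroupingSketch {
    private OrdinalGroupingSketch() {}

    /** Hashes the value of every row: one hash lookup per row. */
    static int[] groupByValues(String[] rows) {
        Map<String, Integer> groupIds = new HashMap<>();
        int[] groups = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            groups[i] = idFor(groupIds, rows[i]);
        }
        return groups;
    }

    /** Hashes each dictionary entry once, then maps rows with an int lookup. */
    static int[] groupByOrdinals(String[] dictionary, int[] ordinals) {
        Map<String, Integer> groupIds = new HashMap<>();
        int[] ordinalToGroup = new int[dictionary.length];
        for (int d = 0; d < dictionary.length; d++) {
            ordinalToGroup[d] = idFor(groupIds, dictionary[d]);
        }
        int[] groups = new int[ordinals.length];
        for (int i = 0; i < ordinals.length; i++) {
            groups[i] = ordinalToGroup[ordinals[i]];
        }
        return groups;
    }

    /** Returns the existing group id for a value, or assigns the next free one. */
    private static int idFor(Map<String, Integer> groupIds, String value) {
        Integer existing = groupIds.get(value);
        if (existing != null) {
            return existing;
        }
        int id = groupIds.size();
        groupIds.put(value, id);
        return id;
    }
}
```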
2024-10-11 21:13:11 +02:00
Albert Zaharovits
36fcdb5d0b Merge main into multi-project 2024-10-06 13:54:49 +03:00
Armin Braun
ddfdd40b16
Reduce footprint of codec instances further (#114072)
Lots of effectively singleton objects here and fields that can be made
static, saves a little more on the per-index overhead and might reveal
further simplifications.
2024-10-04 08:34:24 +02:00
Iván Cea Fontenla
1d8c94b7e4
Add CircuitBreaker to TDigest, Step 4: Take into account shallow classes size (#113613)
Final (I wish) part of https://github.com/elastic/elasticsearch/issues/99815
Also, fixes https://github.com/elastic/elasticsearch/issues/113916

## Steps
1. Migrate TDigest classes to use a custom Array implementation. Temporarily use a simple array wrapper (https://github.com/elastic/elasticsearch/pull/112810)
2. Implement CircuitBreaking in the `WrapperTDigestArrays` class. Add Releasable/AutoCloseable and ensure everything is closed (https://github.com/elastic/elasticsearch/pull/113105)
3. Pass the CircuitBreaker as a parameter to TDigestState from wherever it's being used (https://github.com/elastic/elasticsearch/pull/113387)
    - ESQL: Pass a real CB
    - Other aggs: Use the deprecated methods on `TDigestState`, that will use a No-op CB instead
4. Account remaining TDigest classes size ("SHALLOW_SIZE") (This PR)

Every step should be safely mergeable to main:
- The first and second steps should have no impact.
- The third and fourth ones will start increasing the CB count partially.

## Remarks
As TDigests are releasable now, I had to refactor all tests,
adding try-with-resources or direct close() calls. That added a lot of
changes, but most of them are trivial.

Outside of it, in ESQL, TDigestStates are closed now. Old aggregations
don't close them, as it's not trivial. However, as they are using the
NoopCircuitBreaker, there's no problem with it. There's nothing to be
closed.

## _Remarks 2_
I tried to follow the same pattern in how everything is accounted. On each TDigest class:
- Static constant "SHALLOW_SIZE" with the object weight
- Field `AtomicBoolean closed` to ensure idempotent `close()`
- Static `create()` method that accounts the SHALLOW_SIZE, and returns a new instance. And the important part: On exception, it discounts SHALLOW_SIZE again
- A `ramBytesUsed()` (Accountable interface), barely used for anything really, but some assertions I believe
- A constructor, that closes everything it created on exception (If it creates an array, and the next array surpasses the CB limit, the first one must be closed)
- And a close() that will, well, close everything and discount SHALLOW_SIZE

A lot of steps to make sure everything works well in this multi-level
structure, but I believe the result was quite clean
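
The accounting pattern from "Remarks 2" can be sketched roughly as below. `SimpleBreaker` and `TrackedDigest` are hypothetical stand-ins rather than the real `CircuitBreaker` or TDigest classes, and the `SHALLOW_SIZE` value is an arbitrary assumption:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a circuit breaker; not the Elasticsearch API.
final class SimpleBreaker {
    private final long limit;
    private final AtomicLong used = new AtomicLong();

    SimpleBreaker(long limit) {
        this.limit = limit;
    }

    void add(long bytes) {
        long now = used.addAndGet(bytes);
        if (now > limit) {
            used.addAndGet(-bytes); // undo the failed reservation
            throw new IllegalStateException("breaker tripped at " + now + " bytes");
        }
    }

    void release(long bytes) {
        used.addAndGet(-bytes);
    }

    long used() {
        return used.get();
    }
}

// Hypothetical class following the pattern; not a real TDigest.
final class TrackedDigest implements AutoCloseable {
    static final long SHALLOW_SIZE = 32; // assumed object weight

    private final SimpleBreaker breaker;
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final long arrayBytes;

    private TrackedDigest(SimpleBreaker breaker, int capacity) {
        this.breaker = breaker;
        long perArray = (long) capacity * Double.BYTES;
        breaker.add(perArray); // first array
        try {
            breaker.add(perArray); // second array
        } catch (RuntimeException e) {
            breaker.release(perArray); // close what was already created
            throw e;
        }
        this.arrayBytes = 2 * perArray;
    }

    /** Accounts SHALLOW_SIZE first and discounts it again on failure. */
    static TrackedDigest create(SimpleBreaker breaker, int capacity) {
        breaker.add(SHALLOW_SIZE);
        try {
            return new TrackedDigest(breaker, capacity);
        } catch (RuntimeException e) {
            breaker.release(SHALLOW_SIZE);
            throw e;
        }
    }

    long ramBytesUsed() {
        return SHALLOW_SIZE + arrayBytes;
    }

    @Override
    public void close() {
        // The flag makes close() idempotent: bytes are released only once.
        if (closed.compareAndSet(false, true)) {
            breaker.release(ramBytesUsed());
        }
    }
}
```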
2024-10-03 22:53:50 +10:00
Nik Everett
a18b331336
ESQL: Fix filtering all elements in aggs (#113804)
This adds a test to *every* agg for when it's entirely filtered away and
another when filtering is enabled but unused. I'll follow up with
another test later for partial filtering.

That test caught a bug where some aggs would think they'd been `seen`
when they hadn't. This fixes that too.
2024-10-03 00:42:43 +10:00
Niels Bauman
11d8665d99 Merge main into multi-project
# Conflicts:
#	.buildkite/hooks/pre-command
2024-09-26 14:07:03 -03:00
Iván Cea Fontenla
1faa351760
Add CircuitBreaker to TDigest, Step 2: Add CB to array wrappers (#113105)
Part of https://github.com/elastic/elasticsearch/issues/99815

## Steps
1. Migrate TDigest classes to use a custom Array implementation. Temporarily use a simple array wrapper (https://github.com/elastic/elasticsearch/pull/112810)
2. Implement CircuitBreaking in the `MemoryTrackingTDigestArrays` class. Add `Releasable` and ensure it's always closed within TDigest (This PR)
3. Pass the CircuitBreaker as a parameter to TDigestState from wherever it's being used
4. Account remaining TDigest classes size ("SHALLOW_SIZE")

Every step should be safely mergeable to main:
- The first and second steps should have no impact.
- The third and fourth ones will start increasing the CB count partially.

## Remarks
To simplify testing the CircuitBreaker, added a helper method + `@After` to ESTestCase.

Right now CBs are usually tested through MockBigArrays. E.g:
f7a0196b45/x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/AbstractFunctionTestCase.java (L1263-L1265)
So I guess there was no need for this yet. But I may have missed something somewhere.

Also, I'm separating this PR from the "step 3" as integrating this (CB) in the current usages may require some refactor of external code, which may be somewhat more _dangerous_
2024-09-26 16:03:29 +02:00
Tim Vernum
ad6435dede Merge main into multi-project 2024-09-25 12:49:01 +10:00
Nik Everett
5c91edda9f
ESQL: Speed up CASE for some parameters (#112295)
This speeds up the `CASE` function when it has two or three arguments
and both of the arguments are constants or fields. This works because
`CASE` is lazy so it can avoid warnings in cases like
```
CASE(foo != 0, 2 / foo, 1)
```

And, in the case where the function is *very* slow, it can avoid the
computations.

But if the lhs and rhs of the `CASE` are constant then there isn't any
work to avoid.

The performance improvement is pretty substantial:
```
 (operation)  Before   Error   After    Error  Units
 case_1_lazy  97.422 ± 1.048  101.571 ± 0.737  ns/op
case_1_eager  79.312 ± 1.190    4.601 ± 0.049  ns/op
```

The top line is a `CASE` that has to be lazy - it shouldn't change. The
4 nanos change here is noise. The eager version improves by about 94%.
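
A simplified sketch of the lazy versus eager distinction, using plain `int` arrays and hypothetical names (this is not the ESQL evaluator code). Note that in Java `2 / 0` throws rather than producing a warning, so the lazy variant here avoids an exception instead:

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch; not the ESQL CASE implementation.
final class CaseSketch {
    private CaseSketch() {}

    /** Lazy: a branch is only evaluated for the rows that select it. */
    static int[] lazyCase(boolean[] condition, IntUnaryOperator ifTrue, IntUnaryOperator ifFalse) {
        int[] out = new int[condition.length];
        for (int row = 0; row < condition.length; row++) {
            out[row] = condition[row] ? ifTrue.applyAsInt(row) : ifFalse.applyAsInt(row);
        }
        return out;
    }

    /** Eager: both sides are already materialized (constants or fields), so this is a plain select. */
    static int[] eagerCase(boolean[] condition, int[] ifTrue, int[] ifFalse) {
        int[] out = new int[condition.length];
        for (int row = 0; row < condition.length; row++) {
            out[row] = condition[row] ? ifTrue[row] : ifFalse[row];
        }
        return out;
    }
}
```

With `foo = {0, 2, 4}`, `lazyCase(fooNotZero, row -> 2 / foo[row], row -> 1)` never divides by zero, while `eagerCase` has no per-row work to skip and can run as a tight loop.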
2024-09-24 12:54:40 -04:00
Tim Vernum
d5d5131e25 Merge main into multi-project 2024-09-19 18:52:20 +10:00
Iván Cea Fontenla
182c9fb95e
Add CircuitBreaker to TDigest, Step 1: Arrays to BigArrays (#112810)
Part of https://github.com/elastic/elasticsearch/issues/99815

## Steps
1. Migrate TDigest classes to use a custom Array implementation. Temporarily use a simple array wrapper (This PR)
2. Implement a BigArrays class and replace the wrapper with it. Add Releasable/AutoCloseable and ensure everything is closed
3. Account remaining TDigest classes size

Every step should be safely mergeable to main:
- The first one should have no impact.
- The second and third ones will start increasing the CB count partially.

## Considerations
The third step will probably require some other interface to manually count used memory _before_ creation of the classes. Something like a `TDigestCircuitBreaker`.
After building this one, I've started considering if it would make sense to just do the breaker, and not migrate things to BigArrays. Simply call a `breaker.increase(...)` before array creation.

The pros I see of BigArrays:
- Automatically verified in ESTestCases. Which means tests will fail unless everything is correctly `.close()`d
- Automatic byte counting, without added calculations to the TDigests (apart from the close() ones)

And the cons:
- Filling the code with .get()/.set()
- `.sort()` will require a custom implementation for BigArrays. Same for the `.set()` method that copies a range of an array to another

## Benchmarks
This is a comparison of the benchmarks between main (left) and this branch (right). Bigger is worse:
```
TDigest
(compression)  (distribution)  (tdigestFactory)  Score   Error  ->  Score   Error  Units
          100          NORMAL             MERGE  0,157 ± 0,248  ->  0,170 ± 0,071  us/op
          100          NORMAL          AVL_TREE  0,263 ± 0,078  ->  0,309 ± 0,170  us/op
          100          NORMAL            HYBRID  0,167 ± 0,040  ->  0,169 ± 0,042  us/op
          100        GAUSSIAN             MERGE  0,159 ± 0,041  ->  0,163 ± 0,070  us/op
          100        GAUSSIAN          AVL_TREE  0,339 ± 0,127  ->  0,336 ± 0,029  us/op
          100        GAUSSIAN            HYBRID  0,163 ± 0,049  ->  0,167 ± 0,072  us/op
          300          NORMAL             MERGE  0,174 ± 0,044  ->  0,183 ± 0,031  us/op
          300          NORMAL          AVL_TREE  0,443 ± 0,084  ->  0,438 ± 0,079  us/op
          300          NORMAL            HYBRID  0,180 ± 0,059  ->  0,184 ± 0,039  us/op
          300        GAUSSIAN             MERGE  0,167 ± 0,040  ->  0,173 ± 0,054  us/op
          300        GAUSSIAN          AVL_TREE  0,403 ± 0,098  ->  0,387 ± 0,125  us/op
          300        GAUSSIAN            HYBRID  0,183 ± 0,031  ->  0,178 ± 0,049  us/op
```

```
StableSort
(sortDirection)   Score   Error  ->   Score   Error Units
              0  16,435 ± 1,443  ->  15,909 ± 0,745 ms/op
              1   5,237 ± 0,184  ->   4,994 ± 0,461 ms/op
             -1   5,458 ± 0,398  ->   4,696 ± 0,265 ms/op
```

I can see barely any relevant impact.
2024-09-17 15:29:58 +02:00
Niels Bauman
c41ed527b3 Merge main into multi-project 2024-09-14 10:52:45 +02:00
Mark Vieira
a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Albert Zaharovits
62e93779d8 Merge main into multi-project 2024-08-29 17:46:47 +03:00
Ryan Ernst
7b4443016f
Use test util for finding platform dir (#112286)
The native platform dir can be found using a TestUtil method, but
benchmarks was trying to construct it on its own. This commit switches
to using the util method.
2024-08-28 12:10:24 -07:00
Tim Vernum
a100bc3131 Merge main into multi-project 2024-08-28 20:22:59 +10:00
Nik Everett
c05f7e9c81
ESQL: Add way for Block to keepMask (#112160)
This adds a `Block#keepMask(BooleanVector)` method that will make a new
block, keeping all of the values where the vector is `true` and
`null`ing all of the values where the vector is false.
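
As a rough illustration of those semantics, using boxed `Integer` arrays instead of real `Block` and `BooleanVector` instances (the class name is hypothetical):

```java
// Hypothetical sketch of the keepMask semantics; not the ESQL Block API.
final class KeepMaskSketch {
    private KeepMaskSketch() {}

    /** Returns a new array with values[i] where mask[i] is true and null elsewhere. */
    static Integer[] keepMask(Integer[] values, boolean[] mask) {
        if (values.length != mask.length) {
            throw new IllegalArgumentException("values and mask must have the same length");
        }
        Integer[] out = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = mask[i] ? values[i] : null;
        }
        return out;
    }
}
```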

This will be useful for implementing partial aggregation application
like `| STATS MAX(a WHERE b > 1), MIN(j WHERE b > 2) BY bar`. Or however
the syntax ends up being. We already skip `null` group keys and we can
evaluate the `b > 2` bits to a mask pretty easily. It should also be
useful in optimizing `CASE(a > 2, foo)` - but only when the RHS of the
CASE is `null` and the LHS is a constant or constant-like.

This is something that's very optimize-able. I haven't really optimized
it in this PR, but it should be possible to speed this up a ton and
remove a lot of copying. Here's where the benchmarks start:
```
(dataTypeAndBlockKind)  Mode  Cnt  Score   Error  Units
             int/array  avgt    7  3.705 ± 0.153  ns/op
            int/vector  avgt    7  3.234 ± 0.078  ns/op
```

That's about the same speed as reading the block. In a few of these
cases I expect we can get them to constant performance rather than
per-record performance.
2024-08-27 13:54:40 -04:00
Aurélien FOUCRET
29121fdf8f
New version of the script_score term stats helpers. (#108634) 2024-08-27 18:12:46 +02:00
Patrick Doyle
ae41e9ab65
Pluggable BuiltInExecutorBuilders (#111939)
* Refactor: move static calculations to Util

* BuiltInExecutorBuilders

* Spotless

* Change to getBuilders

* Move helper functions back into ThreadPool
2024-08-27 11:22:54 -04:00
Ryan Ernst
0aa4758f02
Stop setting java.library.path (#112119)
Native libraries in Java are loaded by calling System.loadLibrary. This
method inspects paths in the java.library.path to find the requested
library. Elasticsearch previously used this to find libsystemd, but now
the only remaining use is to set the additional platform directory in
which Elasticsearch keeps its own native libraries.

One issue with setting java.library.path is that it's not set for the cli
process, which makes loading the native library infrastructure from clis
difficult. This commit reworks how Elasticsearch native libraries are
found in order to avoid needing to set java.library.path. There are two
cases. The simplest is production, where the working directory is the
Elasticsearch installation directory, so the platform specific directory
can be constructed. The second case is for tests where we don't have an
installation. We already pass in java.library.path there, so this change
renames the system property to be a test specific property that the new
loading infrastructure looks for.
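
The two cases can be sketched as follows. The property name, the `lib/platform/<os>-<arch>` layout, and the class itself are assumptions made for illustration, not the names used by the actual change:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; property name and directory layout are assumptions.
final class NativeLibDirSketch {
    static final String TEST_PROPERTY = "example.native.libs.path"; // hypothetical name

    private NativeLibDirSketch() {}

    /**
     * Tests pass an explicit directory; production derives the directory
     * from the working (installation) directory.
     */
    static Path platformDir(String testOverride, Path workingDir, String os, String arch) {
        if (testOverride != null && !testOverride.isEmpty()) {
            return Paths.get(testOverride);
        }
        return workingDir.resolve("lib").resolve("platform").resolve(os + "-" + arch);
    }

    /** Full path of a library file, using the platform's naming convention. */
    static Path libraryFile(Path platformDir, String name) {
        return platformDir.resolve(System.mapLibraryName(name));
    }
}
```

A caller could then read `System.getProperty(NativeLibDirSketch.TEST_PROPERTY)` and load the resolved file by absolute path, which does not depend on java.library.path.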
2024-08-23 11:16:18 -07:00