Commit graph

149 commits

Author SHA1 Message Date
Chris Hegarty
45a08b94b3
Upgrade to Lucene 9.12.0 (#113333) (#113835)
This commit upgrades to Lucene 9.12.0.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: John Wagster <john.wagster@elastic.co>
Co-authored-by: Luca Cavanna <javanna@apache.org>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-10-01 13:55:02 +01:00
Ryan Ernst
8b795d4048
Remove plugin classloader indirection (#113154) (#113273)
Extensible plugins use a custom classloader for other plugin jars. When
extensible plugins were first added, the transport client still existed,
and elasticsearch plugins did not exist in the transport client (at
least not the ones that create classloaders). Yet the transport client
still created a PluginsService. An indirection was used to avoid
creating separate classloaders when the transport client had created the
PluginsService.

The transport client was removed in 8.0, but the indirection still
exists. This commit removes that indirection layer.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-09-27 03:55:44 +10:00
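For the plugin classloader change above (#113154), here is a rough illustration of a direct setup with no indirection layer: the extending plugin's classloader is created with the extended plugin's loader as its parent. `ExtendedPluginLoaders` and `create` are hypothetical names. This is a sketch under that assumption, and the real PluginsService wiring may be more involved.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (not Elasticsearch's API): one loader per extending plugin,
// parented directly by the extended plugin's loader.
final class ExtendedPluginLoaders {
    private ExtendedPluginLoaders() {}

    static URLClassLoader create(String pluginName, List<Path> pluginJars, ClassLoader extendedPluginLoader) {
        URL[] urls = new URL[pluginJars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = pluginJars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("bad plugin jar path: " + pluginJars.get(i), e);
            }
        }
        // URLClassLoader delegates to its parent first, so the extending plugin
        // sees the extended plugin's classes (and its extension points).
        return new URLClassLoader(pluginName, urls, extendedPluginLoader);
    }
}
```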
Mark Vieira
0279c0a909
Add AGPLv3 as a supported license 2024-09-13 14:30:33 -07:00
Patrick Doyle
35a375329a
Move Guice to org.elasticsearch.injection.guice (#111723)
* Move files and fix imports & module exports
* Other consequences of moving Guice
2024-08-12 10:47:46 -04:00
Ryan Ernst
e6713a5c0a
Remove JNA from server dependencies (#110809)
All native methods are now bound through NativeAccess. This commit
removes the jna dependency from server.

relates #104876
2024-07-12 19:49:13 -07:00
Ryan Ernst
8417d3f141
Move preallocate functionality to native access (#110678)
This commit moves the file preallocation functionality into
NativeAccess. The code is basically the same. One small tweak is that
instead of breaking Java access boundaries in order to get an open file
handle, the new code uses posix open directly.

relates #104876
2024-07-11 09:42:44 -07:00
Volodymyr Krasnikov
6dbf8d59e5
Avoid possible flaky builds (#110301)
* Segregate tests that depend on system properties into separate Gradle tasks

* Add dependency to gradle check task + style

* Update server/src/test/java/org/elasticsearch/index/IndexSettingsOverrideTests.java

---------

Co-authored-by: Yang Wang <ywangd@gmail.com>
2024-07-02 10:00:03 -07:00
Carlos Delgado
d332ed7d16
Enforce synonyms limit on APIs (#109981) 2024-06-21 18:16:16 +02:00
Chris Hegarty
fa364bfcaf
Rename the vec module to better reflect that it provides SIMD optimized vector scorers (#109661)
This commit renames the vector module to better reflect its intent - to provide SIMD optimized vector scorer implementations.
2024-06-17 11:10:02 +01:00
Benjamin Trent
cf84416fc5
Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-04 12:50:52 -04:00
Rene Groeschke
8ac3e3dd90
Update Gradle wrapper to 8.8 (#108021)
Fix incompatibility with 8.8 and our internal api usages

- Update ospackage to a version that contains a fix we provided
- Tweak build logic to avoid deprecation warnings
- Use newer permission api
- Use custom shadowplugin
- Rework ElasticsearchDistribution dependencies resolution
- Update Gradle wrapper to 8.8
2024-06-04 12:43:02 +02:00
ChrisHegarty
cd834e325c
Fix lucene_snapshot build 2024-05-27 14:52:26 +01:00
Chris Hegarty
6b52d7837b
Add an optimised int8 vector distance function for aarch64. (#106133)
This commit adds an optimised int8 vector distance implementation for aarch64. Additional platforms like, say, x64, will be added as a follow-up.

The vector distance implementation outperforms Lucene's Panama Vector implementation for binary comparisons by approx 5x (depending on the number of dimensions). It does so by means of compiler intrinsics built into a separate native library and linked via Panama's FFI. Comparisons are performed on off-heap mmap'ed vector data.

The implementation is currently only used during merging of scalar quantized segments, through a custom format ES814HnswScalarQuantizedVectorsFormat, but its usage will likely be expanded over time.

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co>
Co-authored-by: Mark Vieira <portugee@gmail.com>
Co-authored-by: Ryan Ernst <ryan@iernst.net>
2024-04-12 08:44:21 +01:00
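For reference, the arithmetic that the int8 kernel from #106133 accelerates is simple. The scalar Java below computes an int8 dot product and squared Euclidean distance. It assumes signed bytes in on-heap arrays, whereas the native library works on off-heap mmap'ed data with SIMD intrinsics; `Int8VectorMath` is a hypothetical name.

```java
// Scalar reference versions of the int8 comparisons; a SIMD kernel computes
// the same sums, many lanes at a time.
final class Int8VectorMath {
    private Int8VectorMath() {}

    static int dotProduct(byte[] a, byte[] b) {
        checkSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i]; // bytes are promoted to int before multiplying
        }
        return sum;
    }

    static int squareDistance(byte[] a, byte[] b) {
        checkSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            int diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    private static void checkSameLength(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ: " + a.length + " vs " + b.length);
        }
    }
}
```

The `int` accumulators are fine for typical dimension counts; inputs with tens of thousands of dimensions could overflow `squareDistance` in the worst case and would need a `long` accumulator.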
Moritz Mack
6b50b6ddf9
Block updates to log level for restricted loggers if less specific than INFO (#105020)
To prevent leaking sensitive information such as credentials and keys in logs, this 
commit prevents configuring some restricted loggers (currently `org.apache.http` 
and `com.amazonaws.request`) at high verbosity unless the NetworkTraceFlag 
(`es.insecure_network_trace_enabled`) is enabled.
2024-02-21 17:45:51 +01:00
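The restricted-logger guard from #105020 can be pictured as a check that runs before a log level change is applied. In this hypothetical sketch, the two logger prefixes come from the commit message. The prefix-matching rule, the simplified `Level` enum, and passing the NetworkTraceFlag in as a boolean are all assumptions. The sketch also does not handle raising a parent logger, such as the root logger.

```java
import java.util.List;

final class RestrictedLoggers {
    // Simplified levels, ordered from least to most verbose.
    enum Level { ERROR, WARN, INFO, DEBUG, TRACE }

    // Logger prefixes named in the commit message.
    static final List<String> RESTRICTED = List.of("org.apache.http", "com.amazonaws.request");

    private RestrictedLoggers() {}

    // A logger is restricted if it is a listed logger or one of its descendants.
    static boolean isRestricted(String loggerName) {
        for (String prefix : RESTRICTED) {
            if (loggerName.equals(prefix) || loggerName.startsWith(prefix + ".")) {
                return true;
            }
        }
        return false;
    }

    // Rejects levels more verbose than INFO on restricted loggers unless network tracing is enabled.
    static boolean isAllowed(String loggerName, Level requested, boolean networkTraceEnabled) {
        boolean moreVerboseThanInfo = requested.compareTo(Level.INFO) > 0;
        return networkTraceEnabled || !(moreVerboseThanInfo && isRestricted(loggerName));
    }
}
```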
Ryan Ernst
6375e9f443
Add native access library (#105100)
Elasticsearch requires access to some native functions. Historically
this has been achieved with the JNA library. However, JNA is a
complicated, magical library, and has caused various problems booting
Elasticsearch over the years. The new Java Foreign Function and Memory
API allows access to call native functions directly from Java. It also
has the advantage of tight integration with hotspot which can improve
performance of these functions (though performance of Elasticsearch's
native calls has never been much of an issue since they are mostly at
boot time).

This commit adds a new native lib that is internal to Elasticsearch. It
is built to use the foreign function api starting with Java 21, and
continue using JNA with Java versions below that.

Only one function, checking whether Elasticsearch is running as root, is
migrated. Future changes will migrate other native functions.
2024-02-07 18:27:09 -05:00
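For the native access library in #105100, one way to express "foreign function API starting with Java 21, JNA below that" is a check on the running JDK's feature version. This sketch is hypothetical. The real library may select its implementation by a different mechanism, and the sketch only reports which backend would be used. It does not bind any native function.

```java
// Hypothetical sketch: choose a native-call backend from the JDK feature version.
final class NativeBackendSelector {
    enum Backend { FOREIGN_FUNCTION_API, JNA }

    // Per the commit message, the foreign function API is used starting with Java 21.
    static final int FFM_MIN_JAVA_VERSION = 21;

    private NativeBackendSelector() {}

    static Backend select(int javaFeatureVersion) {
        return javaFeatureVersion >= FFM_MIN_JAVA_VERSION ? Backend.FOREIGN_FUNCTION_API : Backend.JNA;
    }

    static Backend selectForCurrentRuntime() {
        // Runtime.version().feature() is e.g. 21 on JDK 21.0.2.
        return select(Runtime.version().feature());
    }
}
```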
James Baiera
6fa7f60073
Add ability to create a data stream failure store (#99134)
Adds the ability to configure a data stream to create a new kind of backing index called a failure store which will eventually be used to store error information when ingest pipelines fail to ingest a document or when a document fails to be parsed correctly by the configured mapping on the data stream.
2023-11-15 15:32:51 -05:00
Andrei Dan
01ed7de99f
GA the data stream lifecycle (#98644)
This makes the data stream lifecycle generally available. This will allow
data streams to take advantage of a native simplified and resilient
lifecycle implementation.
2023-08-21 17:28:54 +01:00
Rene Groeschke
b8627079b4
Update Gradle Wrapper to 8.2 (#96686)
- Convention usage has been deprecated and was fixed in our build files
- Fix test dependencies and deprecation
2023-07-04 15:35:15 +02:00
Ryan Ernst
7d8aac3a3e
Implement custom JUL bridge (#96872)
The log4j JUL bridge turned out to have issues because it relied on java
beans. This commit implements a custom bridge between JUL and Log4j.

closes #94613
2023-06-20 09:48:25 -07:00
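A custom JUL bridge like the one in #96872 can be built on the JDK's `java.util.logging.Handler`. This sketch forwards each record to a caller-supplied sink instead of a Log4j logger, so it runs without Log4j on the classpath. The class name and the level mapping (for example, CONFIG and FINE to DEBUG) are assumptions and may not match what the Elasticsearch bridge does.

```java
import java.util.function.BiConsumer;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

// Hypothetical JUL handler that forwards records to another backend via a sink.
final class JulBridgeHandler extends Handler {
    private final BiConsumer<String, String> sink; // (target level name, formatted message)
    private final Formatter messageFormatter = new SimpleFormatter();

    JulBridgeHandler(BiConsumer<String, String> sink) {
        this.sink = sink;
    }

    // Assumed JUL-to-Log4j level mapping.
    static String mapLevel(Level level) {
        int value = level.intValue();
        if (value >= Level.SEVERE.intValue()) return "ERROR";
        if (value >= Level.WARNING.intValue()) return "WARN";
        if (value >= Level.INFO.intValue()) return "INFO";
        if (value >= Level.FINE.intValue()) return "DEBUG"; // CONFIG, FINE
        return "TRACE"; // FINER, FINEST, ALL
    }

    @Override
    public void publish(LogRecord record) {
        if (record == null || !isLoggable(record)) {
            return;
        }
        // formatMessage applies {0}-style parameters without adding a timestamp prefix.
        String message = messageFormatter.formatMessage(record);
        sink.accept(mapLevel(record.getLevel()), record.getLoggerName() + ": " + message);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
}
```

In use, such a handler would typically be attached to the JUL root logger (`Logger.getLogger("")`) in place of the default console handler.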
Przemyslaw Gomulka
31e20d9239
Revert "Add JUL bridge (#96683)" (#96832)
This reverts commit 2bdf1bc0d6.
2023-06-14 14:37:53 +02:00
Ryan Ernst
2bdf1bc0d6
Add JUL bridge (#96683)
This commit adds the Log4j JUL bridge so that messages using JUL are
more nicely converted to log4j messages. Currently these messages are
captured via the stdout logging stream. This commit also adds a log4j
filter to replace the logging stream filtering mechanism used to quiet
some Lucene log messages that may be confusing to users.

closes #94613
2023-06-13 19:31:05 -04:00
Kostas Krikellas
67211be81d
Fork TDigest library (#96086)
* Initial import for TDigest forking.

* Fix MedianTest.

More work needed for TDigestPercentile*Tests and the TDigestTest (and
the rest of the tests) in the tdigest lib to pass.

* Fix Dist.

* Fix AVLTreeDigest.quantile to match Dist for uniform centroids.

* Update docs/changelog/96086.yaml

* Fix `MergingDigest.quantile` to match `Dist` on uniform distribution.

* Add merging to TDigestState.hashCode and .equals.

Remove wrong asserts from tests and MergingDigest.

* Fix style violations for tdigest library.

* Fix typo.

* Fix more style violations.

* Fix more style violations.

* Fix remaining style violations in tdigest library.

* Update results in docs based on the forked tdigest.

* Fix YAML tests in aggs module.

* Fix YAML tests in x-pack/plugin.

* Skip failing V7 compat tests in modules/aggregations.

* Fix TDigest library unittests.

Remove redundant serializing interfaces from the library.

* Remove YAML test versions for older releases.

These tests don't address compatibility issues in mixed cluster tests as
the latter contain a mix of older and newer nodes, so the output depends
on which node is picked as a data node since the forked TDigest library
is not backwards compatible (produces slightly different results).

* Fix test failures in docs and mixed cluster.

* Reduce buffer sizes in MergingDigest to avoid oom.

* Exclude more failing V7 compatibility tests.

* Update results for JdbcCsvSpecIT tests.

* Update results for JdbcDocCsvSpecIT tests.

* Revert unrelated change.

* More test fixes.

* Use version skips instead of blacklisting in mixed cluster tests.

* Switch TDigestState back to AVLTreeDigest.

* Update docs and tests with AVLTreeDigest output.

* Update flaky test.

* Remove dead code, esp around tracking of incoming data.

* Update docs/changelog/96086.yaml

* Delete docs/changelog/96086.yaml

* Remove explicit compression calls.

This was added to prevent concurrency tests from failing, but it leads
to reduced precision. Submit this to see if the concurrency tests are
still failing.

* Revert "Remove explicit compression calls."

This reverts commit 5352c96f65.

* Remove explicit compression calls to MedianAbsoluteDeviation input.

* Add unittests for AVL and merging digest accuracy.

* Fix spotless violations.

* Delete redundant tests and benchmarks.

* Fix spotless violation.

* Use the old implementation of AVLTreeDigest.

The latest library version is 50% slower and less accurate, as verified
by ComparisonTests.

* Update docs with latest percentile results.

* Update docs with latest percentile results.

* Remove repeated compression calls.

* Update more percentile results.

* Use approximate percentile values in integration tests.

This helps with mixed cluster tests, where some of the tests were
blocked.

* Fix expected percentile value in test.

* Revert in-place node updates in AVL tree.

Update quantile calculations between centroids and min/max values to
match v.3.2.

* Add SortingDigest and HybridDigest.

The SortingDigest tracks all samples in an ArrayList that
gets sorted for quantile calculations. This approach
provides perfectly accurate results and is the most
efficient implementation for up to millions of samples,
at the cost of a bloated memory footprint.

The HybridDigest uses a SortingDigest for small sample
populations, then switches to a MergingDigest. This
approach combines the best performance and results for
small sample counts with very good performance and
acceptable accuracy for effectively unbounded sample
counts.

* Remove deps to the 3.2 library.

* Remove unused licenses for tdigest.

* Revert changes for SortingDigest and HybridDigest.

These will be submitted in a follow-up PR for enabling MergingDigest.

* Remove unused Histogram classes and unit tests.

Delete dead and commented out code, make the remaining tests run
reasonably fast. Remove unused annotations, esp. SuppressWarnings.

* Remove Comparison class, not used.

* Small fixes.

* Add javadoc and tests.

* Remove special logic for singletons in the boundaries.

While this helps with the case where the digest contains only
singletons (perfect accuracy), it has a major problem
(non-monotonic quantile function) when the first singleton is followed
by a non-singleton centroid. It's preferable to revert to the old
version from 3.2; inaccuracies in a singleton-only digest should be
mitigated by using a sorted array for small sample counts.

* Revert changes to expected values in tests.

This is due to restoring quantile functions to match head.

* Revert changes to expected values in tests.

This is due to restoring quantile functions to match head.

* Tentatively restore percentile rank expected results.

* Use cdf version from 3.2

Update Dist.cdf to use interpolation, use the same cdf
version in AVLTreeDigest and MergingDigest.

* Revert "Tentatively restore percentile rank expected results."

This reverts commit 7718dbba59.

* Revert remaining changes compared to main.

* Revert excluded V7 compat tests.

* Exclude V7 compat tests still failing.

* Exclude V7 compat tests still failing.

* Restore bySize function in TDigest and subclasses.
2023-06-13 11:43:54 +03:00
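The SortingDigest/HybridDigest idea from the squashed list in #96086 can be sketched as follows. Everything here is hypothetical. `SimpleDigest` is a minimal interface, not the library's `TDigest` API. The quantile uses linear interpolation between the closest ranks, which may differ from how the forked library interpolates. The approximate digest is injected as a `Supplier` rather than being a concrete `MergingDigest`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Minimal digest contract for this sketch only.
interface SimpleDigest {
    void add(double value);
    double quantile(double q);
    long size();
}

// Exact digest: keeps every sample and sorts lazily when a quantile is requested.
final class SortingDigestSketch implements SimpleDigest {
    private final List<Double> samples = new ArrayList<>();
    private boolean sorted = true;

    @Override public void add(double value) { samples.add(value); sorted = false; }
    @Override public long size() { return samples.size(); }
    List<Double> values() { return samples; }

    @Override
    public double quantile(double q) {
        if (q < 0.0 || q > 1.0) throw new IllegalArgumentException("q must be in [0, 1]");
        if (samples.isEmpty()) return Double.NaN;
        if (!sorted) {
            Collections.sort(samples);
            sorted = true;
        }
        // Linear interpolation between the two closest ranks (an assumed convention).
        double pos = q * (samples.size() - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        double low = samples.get(lo);
        return low + (pos - lo) * (samples.get(hi) - low);
    }
}

// Exact while small; past the threshold, replays the samples into an approximate digest.
final class HybridDigestSketch implements SimpleDigest {
    private final long maxExactSamples;
    private final Supplier<SimpleDigest> approximateFactory;
    private SortingDigestSketch exact = new SortingDigestSketch();
    private SimpleDigest approximate; // stays null until the switch

    HybridDigestSketch(long maxExactSamples, Supplier<SimpleDigest> approximateFactory) {
        this.maxExactSamples = maxExactSamples;
        this.approximateFactory = approximateFactory;
    }

    @Override
    public void add(double value) {
        if (approximate != null) {
            approximate.add(value);
            return;
        }
        exact.add(value);
        if (exact.size() > maxExactSamples) {
            approximate = approximateFactory.get();
            for (double v : exact.values()) approximate.add(v);
            exact = null; // release the per-sample list
        }
    }

    @Override public double quantile(double q) { return approximate != null ? approximate.quantile(q) : exact.quantile(q); }
    @Override public long size() { return approximate != null ? approximate.size() : exact.size(); }
    boolean isApproximate() { return approximate != null; }
}
```

The switch copies the exact samples into the approximate digest once and then drops the list. This stops the per-sample memory from growing after the threshold is passed.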
Simon Cooper
6670b778db
Introduce IndexVersion class (#94827)
This adds IndexVersion, which represents the index data & metadata version, separate from the release version. Similar to TransportVersion, this will eventually be completely separated from the release version.
2023-06-01 15:11:08 +01:00
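To show what "separate from the release version" can mean in code, here is a hypothetical and heavily simplified version type for #94827 that is ordered only by an opaque integer id. The record name, the ids used in the usage, and the `onOrAfter`/`before` helpers are assumptions for this sketch. They are not a description of the real class.

```java
// Hypothetical, simplified index version: ordered only by its own integer id,
// with no built-in mapping to a release version string.
record IndexVersionSketch(int id) implements Comparable<IndexVersionSketch> {
    @Override
    public int compareTo(IndexVersionSketch other) {
        return Integer.compare(id, other.id);
    }

    boolean onOrAfter(IndexVersionSketch other) {
        return compareTo(other) >= 0;
    }

    boolean before(IndexVersionSketch other) {
        return compareTo(other) < 0;
    }
}
```

Code that gates index-format behaviour could then compare against such a constant rather than a release `Version`. That way, two releases can share an index version when the on-disk format does not change.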
Mark Vieira
f58f0d612b
Remove Version.transportVersion field (#95282)
This is the final part of separating Version and TransportVersion. There is now no definitive mapping between the two; the two version numbers need to be managed separately.
2023-05-02 11:00:50 -07:00
Mark Vieira
b5af53db4f
Revert "Remove Version.transportVersion field (#95282)"
This reverts commit 2017e76f40.
2023-05-02 09:37:53 -07:00
Simon Cooper
2017e76f40
Remove Version.transportVersion field (#95282)
This is the final part of separating Version and TransportVersion. There is now no definitive mapping between the two; the two version numbers need to be managed separately.
2023-04-28 15:50:11 +01:00
Joe Gallo
abc495d355
Move redact ingest processor into x-pack (#95426) 2023-04-21 15:04:49 -04:00
Ryan Ernst
c619be4b5e
Move preallocate module to libs (#94884)
The preallocate module needs access to java.io internals. However, in
order to open java.io to a specific module, rather than the unnamed
module as was previously done, the said module must be in the boot
layer.

This commit moves the preallocate module to libs. It adds it to the main
lib dir, though it does not add it as a compile dependency of server.
2023-04-10 13:05:43 -07:00
Mark Vieira
c5c8543b24
Publish test artifact from server project (#94906)
This allows other projects to extend tests from server. This supports
running some of these unit tests in different configurations.
2023-03-30 11:12:18 -04:00
Mary Gouseti
d38b8fc3b6
Enable dlm flag on non-snapshot builds tests (#94639) 2023-03-22 17:31:22 +01:00
Mark Vieira
cf95c34700
Fix third party audit task when running with Java 20 (#94601)
The upgrade to Lucene 9.6 snapshot broke third party audit when running
against Java 20, presumably because of its usage of a MemorySegment API
that has since been removed.
2023-03-21 13:41:02 -04:00
David Turner
421c2d4731
Add request/response body logging to HTTP tracer (#93133)
Adds another logger, `org.elasticsearch.http.HttpBodyTracer`, which logs
the body of every HTTP request and response as well as the usual
summaries.
2023-03-15 11:13:36 -04:00
Mark Vieira
915b475fbc
Ignore Version.java file when applying spotless formatting 2023-02-02 10:43:07 -08:00
Mark Vieira
8e44603c06
Fix thirdPartyAudit tasks when running with Java 20 (#93394) 2023-02-01 09:10:51 -08:00
Rene Groeschke
43a0377735
Update forbiddenapis to 3.4 (#90624)
Fix breaking changes to source validation after change in default jdk rule set
2022-10-06 16:52:06 +02:00
Mark Vieira
3791d6da99
Silence server third party audit on Java 19 builds
The forbidden apis plugin bundles an older version of ASM that doesn't
support Java 19. The version of Lucene we use is a MR jar that contains
Java 19 classes. Until forbidden apis updates their bundled ASM we'll
just mute these checks on Java 19 for now.
2022-09-29 09:37:54 -07:00
Przemyslaw Gomulka
35ea2b13b5
[Stable plugin API] Load plugin named components (#89969)
Stable plugins use the @Extensible and @NamedComponent annotations
to mark components to be loaded.
This commit loads extensible class names from extensibles.json and
named components from named_components.json.

The scanning mechanism that generates these files will be implemented later in a Gradle plugin/plugin installer.

relates #88980
2022-09-13 09:05:08 +02:00
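Once the index files from #89969 exist, loading comes down to resolving class names and instantiating them. This sketch assumes `named_components.json` has already been parsed into a map from component name to class name, so JSON parsing is omitted. It also assumes each component has a public no-argument constructor, which real stable-plugin components may not have. `NamedComponentLoader` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical loader: instantiates each named component through the plugin's classloader.
final class NamedComponentLoader {
    private NamedComponentLoader() {}

    static Map<String, Object> instantiate(Map<String, String> namedComponents, ClassLoader pluginLoader) {
        Map<String, Object> instances = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : namedComponents.entrySet()) {
            try {
                // initialize=false defers static initialization until the instance is created
                Class<?> type = Class.forName(entry.getValue(), false, pluginLoader);
                instances.put(entry.getKey(), type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load named component [" + entry.getKey() + "]", e);
            }
        }
        return instances;
    }
}
```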
Rene Groeschke
98b789c940
Update to Gradle wrapper 7.5 (#85141)
This updates the Gradle wrapper to 7.5

Fixes #85123
2022-07-19 08:12:19 +02:00
Chris Hegarty
453f12c72d
Upgrade to Log4J 2.18.0 (#88237) 2022-07-04 11:30:38 +01:00
Rene Groeschke
cdf5bd7ed0
Rework testing conventions gradle plugin (#87213)
This PR reworks the testing conventions precommit plugin. This plugin now:
- is compatible with yaml, java rest tests and internalClusterTest (aka different sourceSets per test type)
- enforces test base class and simple naming conventions (as it did before)
- adds one check task per test sourceSet
- uses the worker api to improve task execution parallelism and encapsulation
- is gradle configuration cache compatible  

This also ports the TestingConventions integration testing to Spock and removes the build-tools-internal/test kit folder that is no longer required. We also add some common logic for testing Java-related Gradle plugins.
We will apply further cleanup to other tests within our test suite in a dedicated follow-up.
2022-06-20 16:26:38 +02:00
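The naming and base-class rule from #87213 can be sketched as a reflection check over compiled test classes. The suffix and base class are parameters here because the commit does not spell out the rules that apply to each source set. `TestingConventionsCheck` is a hypothetical name, and the real task runs through Gradle's worker API rather than as a plain method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical convention check: every test class must use the suffix and extend the base.
final class TestingConventionsCheck {
    private TestingConventionsCheck() {}

    static List<String> violations(List<Class<?>> testClasses, Class<?> requiredBase, String requiredSuffix) {
        List<String> problems = new ArrayList<>();
        for (Class<?> type : testClasses) {
            if (!type.getSimpleName().endsWith(requiredSuffix)) {
                problems.add(type.getName() + " should be named *" + requiredSuffix);
            }
            if (!requiredBase.isAssignableFrom(type)) {
                problems.add(type.getName() + " should extend " + requiredBase.getName());
            }
        }
        return problems;
    }
}
```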
Przemyslaw Gomulka
0ef15b49e9
Stable logging API - the basic use case (#86612)
Introducing a stable logging API under libs/logging.
This change covers the most common use cases for logging: fetching a logger with LogManager and emitting log messages with Logger and Level.
It is influenced by log4j2-api, but does not include Marker and LogBuilder methods.
Also, methods using org.apache.logging.log4j.util.Supplier are replaced with java.util.function.Supplier

The basic implementation is present in server and injected statically in LogConfigurator

relates #84478
2022-06-13 10:25:54 +02:00
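Replacing Log4j's `Supplier` with `java.util.function.Supplier` in #86612 keeps lazily built messages possible without a Log4j dependency. A hypothetical minimal logger shows the effect: the supplier runs only when the level is enabled. This logger is not the `libs/logging` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal logger used only to illustrate lazy message construction.
final class SketchLogger {
    enum Level { ERROR, WARN, INFO, DEBUG, TRACE } // least to most verbose

    private final Level threshold;
    private final List<String> output = new ArrayList<>();

    SketchLogger(Level threshold) {
        this.threshold = threshold;
    }

    boolean isEnabled(Level level) {
        return level.compareTo(threshold) <= 0;
    }

    // The supplier is invoked only if the message will actually be emitted.
    void log(Level level, Supplier<String> message) {
        if (isEnabled(level)) {
            output.add(level + " " + message.get());
        }
    }

    List<String> output() {
        return output;
    }
}
```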
Ryan Ernst
f5c0be5c89
Move spatial3d dependency to spatial (#87397)
Server depends on spatial3d, but it is only ever used by the spatial
xpack component. This commit moves the dependency there.

closes #87026
2022-06-07 12:54:11 -07:00
Chris Hegarty
14fab4e4cd
Fix generated plugins.txt resource dependency (#87107) 2022-05-26 07:48:27 +01:00
Ryan Ernst
52c52b996d
Migrate all uses of hppc BitMixer to Lucene (#85470)
Lucene has its own copy of BitMixer. Rather than giving Elasticsearch
yet another copy of these functions, this commit converts the uses to
Lucene's BitMixer.

relates #84735
2022-03-31 20:27:52 -07:00
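The BitMixer change in #85470 concerns bit-mixing functions, which scramble hash bits so that similar keys land in different hash-table slots. The sketch below uses the MurmurHash3 `fmix64` finalizer as a representative mixer. Lucene's `BitMixer.mix64` may use different shifts and constants, so treat this as an illustration of the technique, not a copy of either class. `BitMixSketch` is a hypothetical name.

```java
// MurmurHash3-style 64-bit finalizer; BitMixer's exact constants may differ.
final class BitMixSketch {
    private BitMixSketch() {}

    static long mix64(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    // Typical use: spread keys before masking into a power-of-two table.
    static int slot(long key, int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        return (int) (mix64(key) & (capacity - 1));
    }
}
```

Each step, an xor-shift or a multiplication by an odd constant, is invertible. The mix is therefore a bijection on 64-bit values, so distinct keys never collide before masking.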
Ryan Ernst
00bf5dd88f
Restrict hppc to server only (#85041)
This commit removes the final leakage of hppc from ImmutableOpenMap and
then moves hppc to an implementation dependency. Modules and plugins
will no longer get hppc on their compile classpath, so new uses should
not pop up.

relates #84735
2022-03-17 12:34:09 -07:00
Ryan Ernst
070fcaa0ad
Move x-content implementation to a separate classloader (#83705)
This change isolates the Jackson implementation of x-content parsers and generators to a separate classloader. The code is loaded dynamically upon accessing any x-content functionality.

The x-content implementation is embedded inside the x-content jar, as a hidden set of resource files. These are loaded through a special classloader created to initialize the XContentProvider through service loader. One caveat to this approach is that IDEs will no longer trigger building the x-content implementation when it changes. However, running any test from the command line, or running a full Build in IntelliJ will trigger the directory to be built.

Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>
2022-03-07 15:44:59 -08:00
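Here is a minimal sketch of the loading pattern from #83705. It assumes the implementation classes are reachable through a list of URLs. The real x-content jar embeds them as hidden resources, which this sketch does not reproduce. `ContentProviderSketch` stands in for `XContentProvider`, and both it and `IsolatedProviderLoader` are hypothetical names.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Optional;
import java.util.ServiceLoader;

// Stand-in for the provider interface that lives in the API jar (hypothetical).
interface ContentProviderSketch {
    String name();
}

final class IsolatedProviderLoader {
    private IsolatedProviderLoader() {}

    // Builds a dedicated loader over the implementation URLs and asks ServiceLoader for
    // the first registered provider. The parent is the loader that defined the service
    // interface, so provider and callers agree on that type, while implementation-only
    // dependencies are visible only through the child loader (if the parent lacks them).
    static Optional<ContentProviderSketch> load(URL[] implementationUrls) {
        ClassLoader parent = ContentProviderSketch.class.getClassLoader();
        // Deliberately not closed: the provider's classes must remain loadable.
        URLClassLoader isolated = new URLClassLoader(implementationUrls, parent);
        return ServiceLoader.load(ContentProviderSketch.class, isolated).findFirst();
    }
}
```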
Nikola Grcevski
487077cc05
Remove Lucene split packages (#82132)
This PR fixes the Lucene split package
issue in LazySoftDeletesDirectoryReaderWrapper.
2022-03-07 09:22:21 -05:00
Benjamin Trent
b592d2bf01
New random_sampler aggregation for sampling documents in aggregations (#84363)
This adds a new sampling aggregation that performs a background sampling over all documents in an index. 

The syntax is as follows:
```
{
  "aggregations": {
    "sampling": {
      "random_sampler": {
        "probability": 0.1
      },
      "aggs": {
        "price_percentiles": {
          "percentiles": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}
```

This aggregation provides fast random sampling over the entire document set in order to speed up costly aggregations.

Testing this over a variety of aggregations and data sets, the median speedup when sampling at `0.001` over millions of documents is around 70x.

The relative error depends on the size of the data and the kind of aggregation. Here are some typically expected numbers when sampling over tens of millions of documents. `p` is the configured probability and `n` is the number of documents matched by your provided filter query.
2022-03-02 14:32:30 -05:00
Mayya Sharipova
26c3dd6857
Upgrade to lucene-9.1.0-snapshot-1336263051c (#83667)
Lucene issues that resulted in Elasticsearch changes:

LUCENE-9820: Separate logic for reading the BKD index from logic to intersect it.
LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator()
LUCENE-10301: make the test-framework a proper module by moving all test
classes to org.apache.lucene.tests
LUCENE-10300: rewrite how resources are read in Ukrainian morfologik analyzer
LUCENE-10054: Make HnswGraph hierarchical
2022-02-22 09:53:20 +01:00
Benjamin Trent
b610aeeabb
[ML] add new random_sampler aggregation for background sampling documents (#81228)
This is a reincarnation of #53200

This commit adds a new `random_sampler` aggregation for randomly including documents in the collected result.

API format is
```js
{
  "aggs": {
    "sampler": {
      "random_sampler": {
        "probability": 0.001, // the probability that a doc is included
        "seed": 42 // optional seed for consistent results
      },
      "aggs": {
        "mean": {
          "avg": {
            "field": "value"
          }
        }
      }
    }
  }
}
```

The sampling skips `n` documents, where `n` is drawn from an optimized geometric distribution whose probability of success is the provided `probability`. Additionally, each shard queried has a separate random stream (even when the seed is provided). One may think of `probability` as the "percentage of documents matched", but that comparison is not exact because the number of documents considered varies.

Performance is greatly improved for many metrics and on larger datasets this improvement can be immense.
2022-01-27 11:56:19 -05:00
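The skip-based sampling described for `random_sampler` can be sketched with a plain geometric distribution. The number of documents to skip is drawn as floor(ln U / ln(1 - p)), with U uniform in (0, 1]. This is a simplified stand-in for the "optimized" distribution the commit mentions. `GeometricSkipSampler` is hypothetical; it works over doc ids `0..maxDoc-1` on a single shard and uses `java.util.Random`, which may not be the generator the aggregation uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical skip-based sampler over doc ids [0, maxDoc) for a single shard.
final class GeometricSkipSampler {
    private final double probability;
    private final Random random;

    GeometricSkipSampler(double probability, long seed) {
        if (!(probability > 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException("probability must be in (0, 1]");
        }
        this.probability = probability;
        this.random = new Random(seed);
    }

    // Documents to skip before the next included one: a geometric draw
    // (failures before the first success) via inverse-transform sampling.
    long nextSkip() {
        if (probability == 1.0) {
            return 0;
        }
        double u = 1.0 - random.nextDouble(); // uniform in (0, 1], so log(u) is finite
        return (long) Math.floor(Math.log(u) / Math.log1p(-probability));
    }

    List<Integer> sample(int maxDoc) {
        List<Integer> included = new ArrayList<>();
        long doc = nextSkip();
        while (doc < maxDoc) {
            included.add((int) doc);
            long skip = nextSkip();
            if (skip >= maxDoc - doc - 1) {
                break; // next included doc would fall past the end (also avoids overflow)
            }
            doc += skip + 1;
        }
        return included;
    }
}
```

Because the geometric distribution is memoryless, each document is still included independently with probability `p`. The sampler, however, draws one random number per included document rather than one per document.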