elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-23 14:47:31 -04:00

Author	SHA1	Message	Date
Chris Hegarty	45a08b94b3	Upgrade to Lucene 9.12.0 (#113333 ) (#113835 ) This commit upgrades to Lucene 9.12.0. Co-authored-by: Adrien Grand <jpountz@gmail.com> Co-authored-by: Armin Braun <me@obrown.io> Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: John Wagster <john.wagster@elastic.co> Co-authored-by: Luca Cavanna <javanna@apache.org> Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2024-10-01 13:55:02 +01:00
Ryan Ernst	8b795d4048	Remove plugin classloader indirection (#113154 ) (#113273 ) Extensible plugins use a custom classloader for other plugin jars. When extensible plugins were first added, the transport client still existed, and elasticsearch plugins did not exist in the transport client (at least not the ones that create classloaders). Yet the transport client still created a PluginsService. An indirection was used to avoid creating separate classloaders when the transport client had created the PluginsService. The transport client was removed in 8.0, but the indirection still exists. This commit removes that indirection layer. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-09-27 03:55:44 +10:00
Mark Vieira	0279c0a909	Add AGPLv3 as a supported license	2024-09-13 14:30:33 -07:00
Patrick Doyle	35a375329a	Move Guice to org.elasticsearch.injection.guice (#111723 ) * Move files and fix imports & module exports * Other consequences of moving Guice	2024-08-12 10:47:46 -04:00
Ryan Ernst	e6713a5c0a	Remove JNA from server dependencies (#110809 ) All native methods are now bound through NativeAccess. This commit removes the jna dependency from server. relates #104876	2024-07-12 19:49:13 -07:00
Ryan Ernst	8417d3f141	Move preallocate functionality to native access (#110678 ) This commit moves the file preallocation functionality into NativeAccess. The code is basically the same. One small tweak is that instead of breaking Java access boundaries in order to get an open file handle, the new code uses posix open directly. relates #104876	2024-07-11 09:42:44 -07:00
Volodymyr Krasnikov	6dbf8d59e5	Avoid possible flaky builds (#110301 ) * Segregate sys prop dependent tests by gradle tasks * Add dependency to gradle check task + style * Update server/src/test/java/org/elasticsearch/index/IndexSettingsOverrideTests.java Co-authored-by: Yang Wang <ywangd@gmail.com> --------- Co-authored-by: Yang Wang <ywangd@gmail.com>	2024-07-02 10:00:03 -07:00
Carlos Delgado	d332ed7d16	Enforce synonyms limit on APIs (#109981 )	2024-06-21 18:16:16 +02:00
Chris Hegarty	fa364bfcaf	Rename the vec module to better reflect that it provides SIMD optimized vector scorers (#109661 ) This commit renames the vector module to better reflect its intent - to provide SIMD optimized vector scorer implementations.	2024-06-17 11:10:02 +01:00
Benjamin Trent	cf84416fc5	Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11	2024-06-04 12:50:52 -04:00
Rene Groeschke	8ac3e3dd90	Update Gradle wrapper to 8.8 (#108021 ) Fix incompatibility with 8.8 and our internal api usages - Update ospackage to a version that contains a fix we provided - Tweak build logic to avoid deprecation warnings - Use newer permission api - Use custom shadowplugin - Rework ElasticsearchDistribution dependencies resolution - Update Gradle wrapper to 8.8	2024-06-04 12:43:02 +02:00
ChrisHegarty	cd834e325c	Fix lucene_snapshot build	2024-05-27 14:52:26 +01:00
Chris Hegarty	6b52d7837b	Add an optimised int8 vector distance function for aarch64. (#106133 ) This commit adds an optimised int8 vector distance implementation for aarch64. Additional platforms like, say, x64, will be added as a follow-up. The vector distance implementation outperforms Lucene's Panama Vector implementation for binary comparisons by approx 5x (depending on the number of dimensions). It does so by means of compiler intrinsics built into a separate native library and linked by Panama's FFI. Comparisons are performed on off-heap mmap'ed vector data. The implementation is currently only used during merging of scalar quantized segments, through a custom format ES814HnswScalarQuantizedVectorsFormat, but its usage will likely be expanded over time. Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co> Co-authored-by: Mark Vieira <portugee@gmail.com> Co-authored-by: Ryan Ernst <ryan@iernst.net>	2024-04-12 08:44:21 +01:00
Moritz Mack	6b50b6ddf9	Block updates to log level for restricted loggers if less specific than INFO (#105020 ) To prevent leaking sensitive information such as credentials and keys in logs, this commit prevents configuring some restricted loggers (currently `org.apache.http` and `com.amazonaws.request`) at high verbosity unless the NetworkTraceFlag (`es.insecure_network_trace_enabled`) is enabled.	2024-02-21 17:45:51 +01:00
Ryan Ernst	6375e9f443	Add native access library (#105100 ) Elasticsearch requires access to some native functions. Historically this has been achieved with the JNA library. However, JNA is a complicated, magical library, and has caused various problems booting Elasticsearch over the years. The new Java Foreign Function and Memory API allows access to call native functions directly from Java. It also has the advantage of tight integration with hotspot which can improve performance of these functions (though performance of Elasticsearch's native calls has never been much of an issue since they are mostly at boot time). This commit adds a new native lib that is internal to Elasticsearch. It is built to use the foreign function api starting with Java 21, and continue using JNA with Java versions below that. Only one function, checking whether Elasticsearch is running as root, is migrated. Future changes will migrate other native functions.	2024-02-07 18:27:09 -05:00
James Baiera	6fa7f60073	Add ability to create a data stream failure store (#99134 ) Adds the ability to configure a data stream to create a new kind of backing index called a failure store which will eventually be used to store error information when ingest pipelines fail to ingest a document or when a document fails to be parsed correctly by the configured mapping on the data stream.	2023-11-15 15:32:51 -05:00
Andrei Dan	01ed7de99f	GA the data stream lifecycle (#98644 ) This makes the data stream lifecycle generally available. This will allow data streams to take advantage of a native simplified and resilient lifecycle implementation.	2023-08-21 17:28:54 +01:00
Rene Groeschke	b8627079b4	Update Gradle Wrapper to 8.2 (#96686 ) - Convention usage has been deprecated and was fixed in our build files - Fix test dependencies and deprecation	2023-07-04 15:35:15 +02:00
Ryan Ernst	7d8aac3a3e	Implement custom JUL bridge (#96872 ) The log4j JUL bridge turned out to have issues because it relied on java beans. This commit implements a custom bridge between JUL and Log4j. closes #94613	2023-06-20 09:48:25 -07:00
Przemyslaw Gomulka	31e20d9239	Revert "Add JUL bridge (#96683 )" (#96832 ) This reverts commit `2bdf1bc0d6`.	2023-06-14 14:37:53 +02:00
Ryan Ernst	2bdf1bc0d6	Add JUL bridge (#96683 ) This commit adds the Log4j JUL bridge so that messages using JUL are more nicely converted to log4j messages. Currently these messages are captured via the stdout logging stream. This commit also adds a log4j filter to replace the logging stream filtering mechanism used to quiet some Lucene log messages that may be confusing to users. closes #94613	2023-06-13 19:31:05 -04:00
Kostas Krikellas	67211be81d	Fork TDigest library (#96086 ) * Initial import for TDigest forking. * Fix MedianTest. More work needed for TDigestPercentileTests and the TDigestTest (and the rest of the tests) in the tdigest lib to pass. Fix Dist. * Fix AVLTreeDigest.quantile to match Dist for uniform centroids. * Update docs/changelog/96086.yaml * Fix `MergingDigest.quantile` to match `Dist` on uniform distribution. * Add merging to TDigestState.hashCode and .equals. Remove wrong asserts from tests and MergingDigest. * Fix style violations for tdigest library. * Fix typo. * Fix more style violations. * Fix more style violations. * Fix remaining style violations in tdigest library. * Update results in docs based on the forked tdigest. * Fix YAML tests in aggs module. * Fix YAML tests in x-pack/plugin. * Skip failing V7 compat tests in modules/aggregations. * Fix TDigest library unittests. Remove redundant serializing interfaces from the library. * Remove YAML test versions for older releases. These tests don't address compatibility issues in mixed cluster tests as the latter contain a mix of older and newer nodes, so the output depends on which node is picked as a data node since the forked TDigest library is not backwards compatible (produces slightly different results). * Fix test failures in docs and mixed cluster. * Reduce buffer sizes in MergingDigest to avoid oom. * Exclude more failing V7 compatibility tests. * Update results for JdbcCsvSpecIT tests. * Update results for JdbcDocCsvSpecIT tests. * Revert unrelated change. * More test fixes. * Use version skips instead of blacklisting in mixed cluster tests. * Switch TDigestState back to AVLTreeDigest. * Update docs and tests with AVLTreeDigest output. * Update flaky test. * Remove dead code, esp around tracking of incoming data. * Update docs/changelog/96086.yaml * Delete docs/changelog/96086.yaml * Remove explicit compression calls. 
This was added to prevent concurrency tests from failing, but it leads to reduced precision. Submit this to see if the concurrency tests are still failing. * Revert "Remove explicit compression calls." This reverts commit `5352c96f65`. * Remove explicit compression calls to MedianAbsoluteDeviation input. * Add unittests for AVL and merging digest accuracy. * Fix spotless violations. * Delete redundant tests and benchmarks. * Fix spotless violation. * Use the old implementation of AVLTreeDigest. The latest library version is 50% slower and less accurate, as verified by ComparisonTests. * Update docs with latest percentile results. * Update docs with latest percentile results. * Remove repeated compression calls. * Update more percentile results. * Use approximate percentile values in integration tests. This helps with mixed cluster tests, where some of the tests were blocked. * Fix expected percentile value in test. * Revert in-place node updates in AVL tree. Update quantile calculations between centroids and min/max values to match v.3.2. * Add SortingDigest and HybridDigest. The SortingDigest tracks all samples in an ArrayList that gets sorted for quantile calculations. This approach provides perfectly accurate results and is the most efficient implementation for up to millions of samples, at the cost of bloated memory footprint. The HybridDigest uses a SortingDigest for small sample populations, then switches to a MergingDigest. This approach combines the best performance and results for small sample counts with very good performance and acceptable accuracy for effectively unbounded sample counts. * Remove deps to the 3.2 library. * Remove unused licenses for tdigest. * Revert changes for SortingDigest and HybridDigest. These will be submitted in a follow-up PR for enabling MergingDigest. * Remove unused Histogram classes and unit tests. Delete dead and commented out code, make the remaining tests run reasonably fast. Remove unused annotations, esp. 
SuppressWarnings. * Remove Comparison class, not used. * Small fixes. * Add javadoc and tests. * Remove special logic for singletons in the boundaries. While this helps with the case where the digest contains only singletons (perfect accuracy), it has a major problem (non-monotonic quantile function) when the first singleton is followed by a non-singleton centroid. It's preferable to revert to the old version from 3.2; inaccuracies in a singleton-only digest should be mitigated by using a sorted array for small sample counts. * Revert changes to expected values in tests. This is due to restoring quantile functions to match head. * Revert changes to expected values in tests. This is due to restoring quantile functions to match head. * Tentatively restore percentile rank expected results. * Use cdf version from 3.2 Update Dist.cdf to use interpolation, use the same cdf version in AVLTreeDigest and MergingDigest. * Revert "Tentatively restore percentile rank expected results." This reverts commit `7718dbba59`. * Revert remaining changes compared to main. * Revert excluded V7 compat tests. * Exclude V7 compat tests still failing. * Exclude V7 compat tests still failing. * Restore bySize function in TDigest and subclasses.	2023-06-13 11:43:54 +03:00
Simon Cooper	6670b778db	Introduce IndexVersion class (#94827 ) This adds IndexVersion that represents the index data & metadata version, separate to the release version. Similar to TransportVersion, this will eventually be completely separated from release version.	2023-06-01 15:11:08 +01:00
Mark Vieira	f58f0d612b	Remove Version.transportVersion field (#95282 ) This is the final part of separating Version and TransportVersion. There is now no definitive mapping between the two; the two version numbers need to be managed separately.	2023-05-02 11:00:50 -07:00
Mark Vieira	b5af53db4f	Revert "Remove Version.transportVersion field (#95282 )" This reverts commit `2017e76f40`.	2023-05-02 09:37:53 -07:00
Simon Cooper	2017e76f40	Remove Version.transportVersion field (#95282 ) This is the final part of separating Version and TransportVersion. There is now no definitive mapping between the two; the two version numbers need to be managed separately.	2023-04-28 15:50:11 +01:00
Joe Gallo	abc495d355	Move redact ingest processor into x-pack (#95426 )	2023-04-21 15:04:49 -04:00
Ryan Ernst	c619be4b5e	Move preallocate module to libs (#94884 ) The preallocate module needs access to java.io internals. However, in order to open java.io to a specific module, rather than the unnamed module as was previously done, the said module must be in the boot layer. This commit moves the preallocate module to libs. It adds it to the main lib dir, though it does not add it as a compile dependency of server.	2023-04-10 13:05:43 -07:00
Mark Vieira	c5c8543b24	Publish test artifact from server project (#94906 ) This allows other projects to extend tests from server. This supports running some of these unit tests in different configurations.	2023-03-30 11:12:18 -04:00
Mary Gouseti	d38b8fc3b6	Enable dlm flag on non-snapshot builds tests (#94639 )	2023-03-22 17:31:22 +01:00
Mark Vieira	cf95c34700	Fix third party audit task when running with Java 20 (#94601 ) The upgrade to Lucene 9.6 snapshot broke third party audit when running against Java 20, presumably because the usage of the since removed MemorySegment API has been removed.	2023-03-21 13:41:02 -04:00
David Turner	421c2d4731	Add request/response body logging to HTTP tracer (#93133 ) Adds another logger, `org.elasticsearch.http.HttpBodyTracer`, which logs the body of every HTTP request and response as well as the usual summaries.	2023-03-15 11:13:36 -04:00
Mark Vieira	915b475fbc	Ignore Version.java file when applying spotless formatting	2023-02-02 10:43:07 -08:00
Mark Vieira	8e44603c06	Fix thirdPartyAudit tasks when running with Java 20 (#93394 )	2023-02-01 09:10:51 -08:00
Rene Groeschke	43a0377735	Update forbiddenapis to 3.4 (#90624 ) Fix breaking changes to source validation after change in default jdk rule set	2022-10-06 16:52:06 +02:00
Mark Vieira	3791d6da99	Silence server third party audit on Java 19 builds The forbidden apis plugin bundles an older version of ASM that doesn't support Java 19. The version of Lucene we use is a MR jar that contains Java 19 classes. Until forbidden apis updates their bundled ASM we'll just mute these checks on Java 19 for now.	2022-09-29 09:37:54 -07:00
Przemyslaw Gomulka	35ea2b13b5	[Stable plugin API] Load plugin named components (#89969 ) Stable plugins use @extensible and @NamedComponents annotations to mark components to be loaded. This commit loads extensible classNames from extensibles.json and named components from named_components.json. The scanning mechanism that can generate these files will be done later in a gradle plugin/plugin installer. relates #88980	2022-09-13 09:05:08 +02:00
Rene Groeschke	98b789c940	Update to Gradle wrapper 7.5 (#85141 ) This updates the Gradle wrapper to 7.5. Fixes #85123	2022-07-19 08:12:19 +02:00
Chris Hegarty	453f12c72d	Upgrade to Log4J 2.18.0 (#88237 )	2022-07-04 11:30:38 +01:00
Rene Groeschke	cdf5bd7ed0	Rework testing conventions gradle plugin (#87213 ) This PR reworks the testing conventions precommit plugin. This plugin now: - is compatible with yaml, java rest tests and internalClusterTest (aka different sourceSets per test type) - enforces test base class and simple naming conventions (as it did before) - adds one check task per test sourceSet - uses the worker api to improve task execution parallelism and encapsulation - is gradle configuration cache compatible This also ports the TestingConventions integration testing to Spock and removes the build-tools-internal/test kit folder that is not required anymore. We also add some common logic for testing java related gradle plugins. We will apply further cleanup on other tests within our test suite in a dedicated follow up cleanup	2022-06-20 16:26:38 +02:00
Przemyslaw Gomulka	0ef15b49e9	Stable logging API - the basic use case (#86612 ) Introducing a stable logging API under libs/logging. This change covers the most common use cases for logging: fetching a logger with LogManager, emitting log messages with Logger and Level. It is influenced by log4j2-api, but does not include Marker and LogBuilder methods. Also, methods using org.apache.logging.log4j.util.Supplier are replaced with java.util.function.Supplier. The basic implementation is present in server and injected statically in LogConfigurator. relates #84478	2022-06-13 10:25:54 +02:00
Ryan Ernst	f5c0be5c89	Move spatial3d dependency to spatial (#87397 ) Server depends on spatial3d, but it is only ever used by the spatial xpack component. This commit moves the dependency there. closes #87026	2022-06-07 12:54:11 -07:00
Chris Hegarty	14fab4e4cd	Fix generated plugins.txt resource dependency (#87107 )	2022-05-26 07:48:27 +01:00
Ryan Ernst	52c52b996d	Migrate all uses of hppc BitMixer to Lucene (#85470 ) Lucene has its own copy of BitMixer. Rather than giving Elasticsearch yet another copy of these functions, this commit converts the uses to Lucene's BitMixer. relates #84735	2022-03-31 20:27:52 -07:00
Ryan Ernst	00bf5dd88f	Restrict hppc to server only (#85041 ) This commit removes the final leakage of hppc from ImmutableOpenMap and then moves hppc to an implementation dependency. Modules and plugins will no longer get hppc on their compile classpath, so new uses should not pop up. relates #84735	2022-03-17 12:34:09 -07:00
Ryan Ernst	070fcaa0ad	Move x-content implementation to a separate classloader (#83705 ) This change isolates the Jackson implementation of x-content parsers and generators to a separate classloader. The code is loaded dynamically upon accessing any x-content functionality. The x-content implementation is embedded inside the x-content jar, as a hidden set of resource files. These are loaded through a special classloader created to initialize the XContentProvider through service loader. One caveat to this approach is that IDEs will no longer trigger building the x-content implementation when it changes. However, running any test from the command line, or running a full Build in IntelliJ will trigger the directory to be built. Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>	2022-03-07 15:44:59 -08:00
Nikola Grcevski	487077cc05	Remove Lucene split packages (#82132 ) This PR fixes the Lucene split package issue in LazySoftDeletesDirectoryReaderWrapper.	2022-03-07 09:22:21 -05:00
Benjamin Trent	b592d2bf01	New random_sampler aggregation for sampling documents in aggregations (#84363 ) This adds a new sampling aggregation that performs a background sampling over all documents in an index. The syntax is as follows: ``` { "aggregations": { "sampling": { "random_sampler": { "probability": 0.1 }, "aggs": { "price_percentiles": { "percentiles": { "field": "taxful_total_price" } } } } } } ``` This aggregation provides fast random sampling over the entire document set in order to speed up costly aggregations. Testing this over a variety of aggregations and data sets, the median speed up when sampling at `0.001` over millions of documents is around 70X speed improvement. Relative error rate does rely on the size of the data and the aggregation kind. Here are some typically expected numbers when sampling over 10s of millions of documents. `p` is the configured probability and `n` is the number of documents matched by your provided filter query.	2022-03-02 14:32:30 -05:00
Mayya Sharipova	26c3dd6857	Upgrade to lucene-9.1.0-snapshot-1336263051c (#83667 ) Lucene issues that resulted in elasticsearch changes: LUCENE-9820 Separate logic for reading the BKD index from logic to intersecting it. LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator() LUCENE-10301: make the test-framework a proper module by moving all test classes to org.apache.lucene.tests LUCENE-10300: rewrite how resources are read in ukrainian morfologik analyzer: LUCENE-10054 Make HnswGraph hierarchical	2022-02-22 09:53:20 +01:00
Benjamin Trent	b610aeeabb	[ML] add new random_sampler aggregation for background sampling documents (#81228 ) This is a reincarnation of #53200 This commit adds a new `random_sampler` aggregation for randomly including documents in the collected result. API format is ```js { "aggs": { "sampler": { "random_sampler": { "probability": 0.001, //the probability that a doc is included "seed": 42 // Optional seed for consistent results }, "aggs": { "mean": { "avg": { "field": "value" } } } } } } ``` The sampling skips `n` documents where `n` is a random sampling from an optimized geometric distribution where the probability of success is the provided `probability`. Additionally, each shard queried will have a separate random stream (even when the seed is provided). One may consider `probability` as "percentage of documents matched", but that comparison is not exact as there is variability in the number of documents considered. Performance is greatly improved for many metrics and on larger datasets this improvement can be immense.	2022-01-27 11:56:19 -05:00
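
The `random_sampler` commit message above describes skipping `n` documents between hits, with `n` drawn from a geometric distribution whose success probability is the configured `probability`. A minimal sketch of that idea follows. It is not the Elasticsearch implementation (which the message calls "optimized"); the function name `sample_doc_ids` and its parameters are hypothetical, and it models a single random stream rather than one stream per shard.

```python
import math
import random


def sample_doc_ids(num_docs, probability, seed=None):
    """Return the doc ids kept by geometric-skip sampling (illustrative only)."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    rng = random.Random(seed)              # seeded stream for repeatable results
    log_miss = math.log1p(-probability)    # log(1 - p), always negative here
    kept = []
    doc = -1                               # position of the last kept doc
    while True:
        u = 1.0 - rng.random()             # uniform in (0, 1], so log(u) is defined
        # Inverse-transform draw of the number of docs skipped before the next hit:
        # P(skipped >= k) = (1 - p) ** k, i.e. a geometric distribution.
        skipped = int(math.log(u) / log_miss)
        doc += skipped + 1
        if doc >= num_docs:
            return kept
        kept.append(doc)
```

Calling `sample_doc_ids(1_000_000, 0.001, seed=42)` visits only the kept documents instead of testing every document against a coin flip, which is where the speed-up of skip-based sampling comes from. The number of ids returned varies around `num_docs * probability`, matching the message's caveat that `probability` is not an exact "percentage of documents matched".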

149 commits