The log4j JUL bridge turned out to have issues because it relied on java
beans. This commit implements a custom bridge between JUL and Log4j.
closes#94613
This commit adds the Log4j JUL bridge so that messages using JUL are
more nicely converted to log4j messages. Currently these messages are
captured via the stdout logging stream. This commit also adds a log4j
filter to replace the logging stream filtering mechanism used to quiet
some Lucene log messages that may be confusing to users.
closes#94613
* Initial import for TDigest forking.
* Fix MedianTest.
More work needed for TDigestPercentile*Tests and the TDigestTest (and
the rest of the tests) in the tdigest lib to pass.
* Fix Dist.
* Fix AVLTreeDigest.quantile to match Dist for uniform centroids.
* Update docs/changelog/96086.yaml
* Fix `MergingDigest.quantile` to match `Dist` on uniform distribution.
* Add merging to TDigestState.hashCode and .equals.
Remove wrong asserts from tests and MergingDigest.
* Fix style violations for tdigest library.
* Fix typo.
* Fix more style violations.
* Fix more style violations.
* Fix remaining style violations in tdigest library.
* Update results in docs based on the forked tdigest.
* Fix YAML tests in aggs module.
* Fix YAML tests in x-pack/plugin.
* Skip failing V7 compat tests in modules/aggregations.
* Fix TDigest library unittests.
Remove redundant serializing interfaces from the library.
* Remove YAML test versions for older releases.
These tests don't address compatibility issues in mixed cluster tests as
the latter contain a mix of older and newer nodes, so the output depends
on which node is picked as a data node since the forked TDigest library
is not backwards compatible (produces slightly different results).
* Fix test failures in docs and mixed cluster.
* Reduce buffer sizes in MergingDigest to avoid oom.
* Exclude more failing V7 compatibility tests.
* Update results for JdbcCsvSpecIT tests.
* Update results for JdbcDocCsvSpecIT tests.
* Revert unrelated change.
* More test fixes.
* Use version skips instead of blacklisting in mixed cluster tests.
* Switch TDigestState back to AVLTreeDigest.
* Update docs and tests with AVLTreeDigest output.
* Update flaky test.
* Remove dead code, esp around tracking of incoming data.
* Update docs/changelog/96086.yaml
* Delete docs/changelog/96086.yaml
* Remove explicit compression calls.
This was added to prevent concurrency tests from failing, but it leads
to reduces precision. Submit this to see if the concurrency tests are
still failing.
* Revert "Remove explicit compression calls."
This reverts commit 5352c96f65.
* Remove explicit compression calls to MedianAbsoluteDeviation input.
* Add unittests for AVL and merging digest accuracy.
* Fix spotless violations.
* Delete redundant tests and benchmarks.
* Fix spotless violation.
* Use the old implementation of AVLTreeDigest.
The latest library version is 50% slower and less accurate, as verified
by ComparisonTests.
* Update docs with latest percentile results.
* Update docs with latest percentile results.
* Remove repeated compression calls.
* Update more percentile results.
* Use approximate percentile values in integration tests.
This helps with mixed cluster tests, where some of the tests where
blocked.
* Fix expected percentile value in test.
* Revert in-place node updates in AVL tree.
Update quantile calculations between centroids and min/max values to
match v.3.2.
* Add SortingDigest and HybridDigest.
The SortingDigest tracks all samples in an ArrayList that
gets sorted for quantile calculations. This approach
provides perfectly accurate results and is the most
efficient implementation for up to millions of samples,
at the cost of bloated memory footprint.
The HybridDigest uses a SortingDigest for small sample
populations, then switches to a MergingDigest. This
approach combines to the best performance and results for
small sample counts with very good performance and
acceptable accuracy for effectively unbounded sample
counts.
* Remove deps to the 3.2 library.
* Remove unused licenses for tdigest.
* Revert changes for SortingDigest and HybridDigest.
These will be submitted in a follow-up PR for enabling MergingDigest.
* Remove unused Histogram classes and unit tests.
Delete dead and commented out code, make the remaining tests run
reasonably fast. Remove unused annotations, esp. SuppressWarnings.
* Remove Comparison class, not used.
* Small fixes.
* Add javadoc and tests.
* Remove special logic for singletons in the boundaries.
While this helps with the case where the digest contains only
singletons (perfect accuracy), it has a major issue problem
(non-monotonic quantile function) when the first singleton is followed
by a non-singleton centroid. It's preferable to revert to the old
version from 3.2; inaccuracies in a singleton-only digest should be
mitigated by using a sorted array for small sample counts.
* Revert changes to expected values in tests.
This is due to restoring quantile functions to match head.
* Revert changes to expected values in tests.
This is due to restoring quantile functions to match head.
* Tentatively restore percentile rank expected results.
* Use cdf version from 3.2
Update Dist.cdf to use interpolation, use the same cdf
version in AVLTreeDigest and MergingDigest.
* Revert "Tentatively restore percentile rank expected results."
This reverts commit 7718dbba59.
* Revert remaining changes compared to main.
* Revert excluded V7 compat tests.
* Exclude V7 compat tests still failing.
* Exclude V7 compat tests still failing.
* Restore bySize function in TDigest and subclasses.
Removing the custom dependency checksum functionality in favor of Gradle build-in dependency verification support.
- Use sha256 in favor of sha1 as sha1 is not considered safe these days.
Closes https://github.com/elastic/elasticsearch/issues/69736
This PR uses Lucene-9.3 snapshot in Elasticsearch 8.4. Noticeable changes in this Lucene snapshot:
- Merge-on-refresh (disabled)
- No more pathological merging
- SortedSetDocValues#count for value_count aggs
Final (hopefully!) snapshot before the 9.2.0 release
* Update test to expect persian tokenfilter - will be exposed later
* Fix KnnVectorQueryBuilderTests::doAssertLuceneQuery
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Notable changes include:
count implementations for MultiRangeQuery and IndexSortedNumericDocValuesRangeQuery, which may speed up certain aggregations
more efficient decoding of docids in BKD reader
This PR upgrades Lucene to a newer snapshot `9.1.0-snapshot-9497524cc2d`.
Changes:
* Adapt to `LeafReader#searchNearestVectors` signature change
* Adapt checks in `GeometryIndexerTests`, `SearchServiceTests`, `FiltersAggregatorTests`, `AggregationProfilerIT`
* Address highlighting failures in `MultiPhrasePrefixQuery` and `HasChildQueryBuilder.LateParsingQuery`
Lucene issues that resulted in elasticsearch changes:
LUCENE-9820 Separate logic for reading the BKD index from logic to intersecting it.
LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator()
LUCENE-10301: make the test-framework a proper module by moving all test
classes to org.apache.lucene.tests
LUCENE-10300: rewrite how resources are read in ukrainian morfologik analyzer:
LUCENE-10054 Make HnswGraph hierarchical
Originally we tried to a log4j update in #47298, but we were unable to
that due to the `DeprecationLoggerTests.testLogPermissions` test
failing. The test relied on mocking and got removed in
https://github.com/elastic/elasticsearch/pull/61474/files#diff-70de5a6ba5c637e7f19c51341417760d6e957beb5a1fa5703049095ea2719ee0L47
Now we should be able to the upgrade and then we can address the Security
Manager permission questions raised in #47298 separately.
* Initialize pattern layout with AccessController.doPrivileged
We need the `getClassLoader` permissions
* Disable the SecurityManager for command testing because of `CommandLoggingConfigurator`
which fails under the `SecurityManager`
This commit upgrades JNA to 5.10.0. The primary reason for this upgrade
is to adopt a newer version of `libffi` which supports the
`LIBFFI_TMPDIR` environment variable so that we can change the location
of temporary executables and resolve#77014.
This commit also switches to the upstream JNA releases rather than using
the specially-repackaged ones we've used in the past.
This includes the following changes:
* LUCENE-10180: Avoid using lambdas in SegmentMerger
* LUCENE-10187: Reduce DirectWriter's padding
* LUCENE-10193: Cut over more array access to VarHandles
* LUCENE-10189: Optimize flush of doc-value fields that are effectively single-valued
* LUCENE-10165: Implement Lucene90DocValuesProducer#getMergeInstance
Includes the following new commits of interest:
* LUCENE-10150: override readLongs() in ByteBuffersDataInput
* LUCENE-10146: Add VectorSimilarityFunction.COSINE
* LUCENE-10140: Correct minimizing iterator sub-matches
* LUCENE-10103 Make QueryCache respect Accountable queries
* LUCENE-10170: Restore compression speed for LZ4.
This commit removes the dependency on the Joda library. It removes
many remaining references to joda, though not all because some comments
are worthwhile for historical reasoning.
Includes the following lucene changes:
* LUCENE-10145: Speed up byte[] comparisons using VarHandles.
* LUCENE-10143: Delegate primitive writes in RateLimitedIndexOutput
* LUCENE-10182: No longer check dvGen.
* LUCENE-10153: Speed up BKDWriter using VarHandles.
* LUCENE-10150: override ByteBuffersDataInput readLong/readInt/readShort
Includes the following changes:
- Updates to prevent package splits in ES: LUCENE-10118, LUCENE-10132
- Speedups in writing doc values: LUCENE-10127, LUCENE-10123, LUCENE-10125
- Speedups in writing primitives: LUCENE-10125
- Sort-after bugfixes: LUCENE-10126
Upgrade to a new Lucene 9 snapshot that includes LUCENE-10119 so we can
re-enable the sort optimization with points for scroll and search_after
requests.
Relates #78230
Java9 added a number of features that are useful to improve compression
and decompression. These include the Arrays#mismatch method and
VarHandles. This commit adds compression tools forked from the java-lz4
library which include these improvements. We hope to contribute these
changes back to the original project, however the project currently
supports Java7 so this is not possible at the moment.
This commit is related to #73497. It adds two new settings. The first setting
is transport.compression_scheme. This setting allows the user to
configure LZ4 or DEFLATE as the transport compression. Additionally, it
modifies transport.compress to support the value indexing_data. When
this setting is set to indexing_data only messages which are primarily
composed of raw source data will be compressed. This is bulk, operations
recovery, and shard changes messages.
Upgrades to Lucene-8.9 snapshot which includes:
- LUCENE-9507: Custom order for leaves (/cc @mayya-sharipova)
- LUCENE-9935: Enable bulk merge for stored fields with index sort
Latest JDKs are shipping with timezone data 2021a which is also included
in latest joda. In order to have the timezone information consistent in
both joda and java.time joda dependnecy has to be updated
closes#72028
This commit upgrades our JNA dependency to 5.7.0-1. The highlight of
this upgrade is that the Darwin native libraries are now universal
including support for both Intel-based and Apple Silicon-based Mac
hardware.
This change upgrades to the latest Lucene 8.8.0 snapshot.
It also restores the compression on binary doc values that was lost in the last snapshot upgrade.
The compression is now configurable on binary doc values but we don't expose this functionality yet so this commit ensures that we pick the same compression mode as previous releases (BEST_COMPRESSION).