`writerWithOffset` uses a lambda to create a RangeMissingHandler; however, the RangeMissingHandler interface has a default implementation for `sharedInputStreamFactory`. This makes `writerWithOffset` delegate to the received writer only for the `fillCacheRange` method, so the writer itself may never have its `sharedInputStreamFactory` method invoked (always invoking `sharedInputStreamFactory` before `fillCacheRange` is part of the contract of the RangeMissingHandler interface).
This PR makes `writerWithOffset` delegate `sharedInputStreamFactory` to the underlying writer.
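The default-method pitfall can be sketched as follows. This is a minimal illustration with hypothetical simplified signatures, not the actual Elasticsearch API: a lambda only implements the single abstract method, so the wrapped writer's override of the default method is silently lost unless the wrapper forwards it explicitly.

```java
// Hypothetical simplification of the RangeMissingHandler pattern; names and
// signatures are assumptions for illustration only.
interface RangeMissingHandler {
    void fillCacheRange(int start, int len);

    // A wrapper built from a lambda inherits this default implementation
    // instead of the wrapped writer's override.
    default String sharedInputStreamFactory() {
        return "default";
    }
}

public class WriterWithOffsetDemo {
    // Before the fix: the lambda only implements fillCacheRange, so the
    // wrapped writer's sharedInputStreamFactory() override is lost.
    static RangeMissingHandler writerWithOffsetBroken(RangeMissingHandler writer, int offset) {
        return (start, len) -> writer.fillCacheRange(start + offset, len);
    }

    // After the fix: delegate sharedInputStreamFactory() to the underlying writer too.
    static RangeMissingHandler writerWithOffsetFixed(RangeMissingHandler writer, int offset) {
        return new RangeMissingHandler() {
            public void fillCacheRange(int start, int len) {
                writer.fillCacheRange(start + offset, len);
            }

            public String sharedInputStreamFactory() {
                return writer.sharedInputStreamFactory();
            }
        };
    }

    public static void main(String[] args) {
        RangeMissingHandler writer = new RangeMissingHandler() {
            public void fillCacheRange(int start, int len) {}

            public String sharedInputStreamFactory() {
                return "custom";
            }
        };
        System.out.println(writerWithOffsetBroken(writer, 8).sharedInputStreamFactory()); // default
        System.out.println(writerWithOffsetFixed(writer, 8).sharedInputStreamFactory());  // custom
    }
}
```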
SharedBlobCacheService keeps track of the free regions in a
ConcurrentLinkedQueue. We use its `size()` method in three places
outside of tests, but unfortunately this is not a constant-time operation
because of the asynchronous nature of this queue. This change removes
two of these uses, where we only check whether the queue is empty, by
calling `isEmpty()` instead.
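The preferred pattern can be sketched as below; the class and method names here are illustrative, not the actual SharedBlobCacheService code. `ConcurrentLinkedQueue.size()` traverses the whole linked structure (O(n)), while `isEmpty()` only inspects the head.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch: check for emptiness with isEmpty() instead of size(),
// since ConcurrentLinkedQueue.size() is O(n) and not even a stable snapshot
// under concurrent modification.
public class FreeRegionsCheck {
    static boolean hasFreeRegion(ConcurrentLinkedQueue<Integer> freeRegions) {
        // Prefer isEmpty() over size() > 0: size() walks the whole queue.
        return !freeRegions.isEmpty();
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<Integer> freeRegions = new ConcurrentLinkedQueue<>();
        System.out.println(hasFreeRegion(freeRegions)); // false
        freeRegions.add(42);
        System.out.println(hasFreeRegion(freeRegions)); // true
    }
}
```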
This adds the file extensions for the blobs we request when populating the
cache.
There are around 50 possible Lucene file extensions, and we use a special
"other" category as a fallback for everything else.
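The fallback categorization could look roughly like this sketch; the class name is hypothetical and the set below is only a small sample of the ~50 known Lucene extensions.

```java
import java.util.Set;

// Sketch of the fallback categorization described above; the real list of
// known Lucene file extensions has around 50 entries, this is a sample.
public class BlobCacheMetricsExtension {
    private static final Set<String> KNOWN = Set.of("fdt", "fdx", "tim", "tip", "doc", "dvd", "kdd");

    static String category(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        return KNOWN.contains(ext) ? ext : "other"; // fallback bucket
    }

    public static void main(String[] args) {
        System.out.println(category("_0.fdt"));     // fdt
        System.out.println(category("_0.unknown")); // other
    }
}
```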
* Exhaustive testParseFractionalNumber
* Refactor: encapsulate ByteSizeUnit constructor
* Refactor: store size in bytes
* Support up to 2 decimals in parsed ByteSizeValue
* Fix test for rounding up with no warnings
* ByteSizeUnit transport changes
* Update docs/changelog/120142.yaml
* Changelog details and impact
* Fix change log breaking.area
* Address PR comments
This updates the Gradle wrapper to 8.12.
We addressed deprecation warnings due to the update, including:
- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
The libs projects are configured to all begin with `elasticsearch-`.
While it is desirable for the artifacts to carry this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.
This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
The most relevant ES changes that upgrading to Lucene 10 requires are:
- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automata are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became records
- adapt to breaking changes in the analysis area
Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
Some small speedups in here from pre-evaluating `isFiltered(properties)`
in lots of spots, and from not creating an unused `SimpleKey` in
`toConcreteKey`, which performs costly string interning.
Other than that, obvious deduplication using existing utilities, and
adding obvious missing overloads for them.
The blob cache has an `io` field per region that is declared volatile, since it is initially null
and only later initialized. However, during `tryRead` we do not need the volatile access.
This commit changes the field to be non-volatile and uses proper volatile accesses only
where needed.
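One common way to express this in Java is a plain field combined with a `VarHandle`, applying volatile semantics only on the paths that need them. This is a sketch of the general pattern with illustrative names, not the actual blob cache code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: the field itself is non-volatile; volatile reads/writes are done
// explicitly through a VarHandle only where visibility is required.
public class RegionIO {
    private static final VarHandle IO;
    static {
        try {
            IO = MethodHandles.lookup().findVarHandle(RegionIO.class, "io", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private Object io; // intentionally non-volatile

    Object ioForTryRead() {
        return io; // plain read: the fast path tolerates a stale null and falls back
    }

    Object ioVolatile() {
        return IO.getVolatile(this); // volatile read where visibility matters
    }

    void setIoVolatile(Object value) {
        IO.setVolatile(this, value); // volatile write on initialization
    }
}
```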
This commit fixes lint warnings that arise when compiling with javac from JDK 21.
This change is in preparation for an eventual bump of Elasticsearch to a minimum of JDK 21, in ES 9.0.
Change `fillCacheRange` method to accept a completion listener that must be called by `RangeMissingHandler` implementations when they finish fetching data. By doing so, we support asynchronously fetching the data from a third party storage. We also support asynchronous `SourceInputStreamFactory` for reading gaps from the storage.
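The shape of the listener-based contract could look like the sketch below. The names and signatures here are assumptions for illustration; the real `RangeMissingHandler` signature in Elasticsearch differs.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: instead of returning when the copy is done,
// fillCacheRange signals completion through a listener/future, so the data
// may be fetched asynchronously from third-party storage.
public class AsyncFillDemo {
    interface RangeHandler {
        void fillCacheRange(long start, long len, CompletableFuture<Long> completion);
    }

    static long fetchBlocking(RangeHandler handler, long start, long len) throws Exception {
        CompletableFuture<Long> completion = new CompletableFuture<>();
        handler.fillCacheRange(start, len, completion);
        return completion.get(); // real callers would chain a listener, not block
    }

    public static void main(String[] args) throws Exception {
        // Handler that "fetches" on another thread, then signals completion.
        RangeHandler asyncHandler = (start, len, completion) ->
            CompletableFuture.runAsync(() -> completion.complete(len));
        System.out.println(fetchBlocking(asyncHandler, 0, 1024)); // 1024
    }
}
```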
LongHistogram's default base-2 exponential aggregation is optimized
for a latency range of 1 ms to 100 s. Hence we should record time metrics in
milliseconds instead of microseconds.
Relates: ES-9065
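A minimal sketch of the unit change, with illustrative names (the actual metric recording code differs): measure in nanoseconds, convert to milliseconds before recording.

```java
import java.util.concurrent.TimeUnit;

// Sketch: convert an elapsed nanosecond measurement to milliseconds so the
// recorded values land in the histogram's optimized 1 ms - 100 s range.
public class TimingMetrics {
    static long elapsedMillis(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    }
}
```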
Change `fillCacheRange` method to accept a completion listener that must be called by `RangeMissingHandler` implementations when they finish fetching data. By doing so, we support asynchronously fetching the data from a third party storage. We also support asynchronous `SourceInputStreamFactory` for reading gaps from the storage.
Depends on #111177
This PR augments the RangeMissingHandler interface to support a shared
input stream which is reused when filling multiple gaps. The shared
input stream is meant to be consumed sequentially to fill the list of
gaps in sequential order. The existing behaviour is preserved when the
shared input stream is not used, i.e. when it is `null`.
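The sequential-consumption idea can be sketched as follows; the names here are assumptions, not the actual API. One shared stream is opened once and consumed in order, skipping the already-cached bytes between gaps, instead of opening a fresh stream per gap.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch: fill a list of gaps (sorted by start offset) from one shared,
// sequentially-consumed input stream.
public class SharedStreamGapFiller {
    record Gap(long start, long length) {}

    static long fillGaps(InputStream shared, Iterable<Gap> gapsInOrder) throws IOException {
        long pos = 0;
        long filled = 0;
        for (Gap gap : gapsInOrder) {
            shared.skipNBytes(gap.start() - pos);                    // skip cached bytes before the gap
            filled += shared.readNBytes((int) gap.length()).length;  // consume the gap's bytes
            pos = gap.start() + gap.length();
        }
        return filled;
    }
}
```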
This utility is almost entirely used in tests where an infinite wait is
inappropriate, and the exception-mangling behaviour is unhelpful, so
this commit replaces it with safer alternatives. There should be no need
for an easy way to block production threads forever.
This commit moves the file preallocation functionality into
NativeAccess. The code is basically the same. One small tweak is that
instead of breaking Java access boundaries in order to get an open file
handle, the new code uses posix open directly.
relates #104876
We need to use `System#nanoTime` here. We use this time source for
measuring the timing of blob IO, and the 200ms default accuracy isn't
enough for that; we need accurate timing.
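A minimal sketch of the timing pattern (illustrative names): `System.nanoTime()` is a monotonic, high-resolution source, suited to short blob IO operations that a coarse clock cannot resolve.

```java
// Sketch: measure an IO operation with the monotonic nanosecond clock rather
// than a coarse-grained cached time source.
public class BlobIoTimer {
    static long measureNanos(Runnable io) {
        long start = System.nanoTime();
        io.run();
        return System.nanoTime() - start; // safe: nanoTime is monotonic
    }
}
```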
Tests required some adjustments now that listeners are
not completed on progress updates if they are waiting
for the exact end of the gap.
Relates #108095
Closes #109237
When the range to read (subRange) is already completed, we can
optimize by returning early, without asking for the range to write
(range) to be fully filled, saving one or more fetch requests.
ProgressListenableActionFuture executes listeners once the
progress is updated to a value the listener is waiting for.
But when a listener waits for the exact end of a range/gap,
there is no need to execute it at the time the progress is
updated to the end of the range. Instead we can leave the
listener and it will be executed when the range/gap is
completed (which should happen just after).
I'd like to propose this change because we can have read
listeners that use CacheFile#tryRead, which relies on
SparseFileTracker#checkAvailable and the volatile
SparseFileTracker#complete field, which is only updated
when the range is completed, just before executing listeners.
If listeners are executed when the progress is updated to
the end of the range, they may call tryRead before the
SparseFileTracker#complete field is updated, making the
fast read path fail.
This change ensures that CacheFileRegion has references at the time
the Gap is completed. It also ensures that a reference on CacheFileRegion
is held until all writes/gaps are completed (i.e. not acquiring a ref before
writing each gap anymore).
Finally, it ensures that the Gap is correctly failed if the write task is rejected.
A common deadlock pattern is waiting and completing a future on the same executor.
This only works until the executor is fully depleted of threads. Now assert that
waiting for a future to be completed and the completion happens on different
executors.
Introduced UnsafePlainActionFuture, used in all offending places, allowing those
to be tackled independently.
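The hazard and the safe arrangement can be sketched with `CompletableFuture` (the Elasticsearch code uses its own future types; names here are illustrative). With a single-threaded executor, a task that blocks waiting for a future occupies the only thread, so a completion task queued behind it on the same executor can never run: a deadlock. Completing on a different executor avoids this.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the rule asserted above: the thread that waits on a future and the
// thread that completes it must come from different executors.
public class FutureDeadlockDemo {
    static String safePattern() throws Exception {
        ExecutorService waiter = Executors.newSingleThreadExecutor();
        ExecutorService completer = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> future = new CompletableFuture<>();
            // Completion happens on a DIFFERENT executor than the wait below.
            completer.submit(() -> future.complete("done"));
            // Waiting here is safe; had both tasks shared one single-threaded
            // executor, the waiting task would block the completing one forever.
            return waiter.submit(() -> future.get()).get();
        } finally {
            waiter.shutdown();
            completer.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(safePattern()); // done
    }
}
```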
Currently in ShardBytes we only support copying from an InputStream
using an intermediate buffer. This commit adds a method to copy from the
buffer directly, without reading from the input stream.
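The two code paths can be sketched as the following overloads; these are illustrative, not the actual ShardBytes API. The existing path drains an InputStream through an intermediate buffer, while the added path writes an already-filled buffer directly, skipping the stream reads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch of the overload pair described above (hypothetical names).
public class CopyOps {
    // Existing path: drain the stream through an intermediate buffer.
    static void copy(InputStream in, byte[] buffer, OutputStream out) throws IOException {
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Added path: the caller already has the bytes, write them directly.
    static void copy(byte[] buffer, int length, OutputStream out) throws IOException {
        out.write(buffer, 0, length); // no InputStream round-trip
    }
}
```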
Like with slicing, we can optimize cloning the same way. This has lots of potential for saving memory
when dealing with off-heap FST stores and the like, where there are a bunch of small slices getting cloned.
This commit adds a maybeFetchRange() method that can be used
to fetch a given range in a specific blob region if at least one free
page is available. The range can be larger than the blob region
and is remapped (mapSubRangeToRegion()) before hitting the
cache. It is basically a variant of maybeFetchRegion() that allows
passing a range instead of the whole region.
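The remapping step can be sketched as the following arithmetic, in the spirit of mapSubRangeToRegion(); the method name, signature, and region size here are assumptions for illustration.

```java
// Hypothetical sketch: intersect an absolute blob range [start, end) with a
// region's span and translate it into region-relative coordinates.
public class RangeMapping {
    static long[] mapSubRangeToRegion(long start, long end, int region, long regionSize) {
        long regionStart = (long) region * regionSize;
        long regionEnd = regionStart + regionSize;
        long mappedStart = Math.max(start, regionStart) - regionStart;
        long mappedEnd = Math.min(end, regionEnd) - regionStart;
        return new long[] { mappedStart, mappedEnd };
    }
}
```

For example, with a 128-byte region size, the absolute range [100, 300) maps into region 1 (which covers [128, 256)) as the region-relative range [0, 128).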