This commit adds an optimised int8 vector distance implementation for aarch64. Additional platforms (e.g. x64) will be added as a follow-up.
The vector distance implementation outperforms Lucene's Panama vector implementation for binary comparisons by approximately 5x (depending on the number of dimensions). It does so by means of compiler intrinsics built into a separate native library and linked via Panama's FFI. Comparisons are performed on off-heap mmap'ed vector data.
The implementation is currently only used during merging of scalar quantized segments, through a custom format ES814HnswScalarQuantizedVectorsFormat, but its usage will likely be expanded over time.
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co>
Co-authored-by: Mark Vieira <portugee@gmail.com>
Co-authored-by: Ryan Ernst <ryan@iernst.net>
This makes a couple of changes to regex processing in the compute
engine:
1. Process utf-8 strings directly. This should save a ton of time.
2. Snip the `toString` output if it is too big - I chose 64kb of
strings.
3. I changed the formatting of the automaton to a slightly customized
`dot` output. Because automata are graphs. Everyone knows it. And
they are a lot easier to read as graphs. `dot` is easy to render
as an image (see the sketch after this list).
4. I implemented `EvaluatorMapper` for regex operations, which is pretty
standard for the rest of our operations.
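For reference, here's roughly how you can walk a Lucene `Automaton` and emit `dot` (a sketch - the real customized output differs in labels and styling):
```
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Transition;

// Sketch: every state becomes a node, every transition an edge labeled
// with its code point range.
static String toDot(Automaton automaton) {
    StringBuilder dot = new StringBuilder("digraph Automaton {\n");
    Transition t = new Transition();
    for (int state = 0; state < automaton.getNumStates(); state++) {
        if (automaton.isAccept(state)) {
            dot.append("  ").append(state).append(" [shape=doublecircle];\n");
        }
        int count = automaton.initTransition(state, t);
        for (int i = 0; i < count; i++) {
            automaton.getNextTransition(t);
            dot.append("  ").append(state).append(" -> ").append(t.dest)
                .append(" [label=\"").append(t.min).append("-").append(t.max).append("\"];\n");
        }
    }
    return dot.append("}\n").toString();
}
```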
We want to report the document parsing observation only upon successful indexing.
To achieve this, we need to perform reporting in only one place (not, as previously, in both IngestService and 'bulk action').
This commit splits the DocumentParsingObserver in two: a DocumentSizeObserver that wraps an XContentParser and returns the observed state, and a DocumentSizeReporter that performs an action when parsing has completed and indexing has succeeded.
To perform reporting in one place we need to pass the state from IngestService to 'bulk action'. The state is currently represented as a long - normalisedBytesParsed.
In TransportShardBulkAction we get the normalisedBytesParsed information, and in the serverless plugin we check whether the value indicates that parsing already happened in IngestService (value != -1); if so, we create a DocumentSizeObserver with the fixed normalisedBytesParsed and won't increment it.
When indexing has completed successfully, we report the observed state for an index with DocumentSizeReporter.
Small nit: by passing the DocumentSizeObserver via SourceToParse we no longer have to inject it via a complex hierarchy for DocumentParser. Hence some constructor changes.
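Roughly the resulting split, as a sketch (method names here are assumptions, not the real signatures):
```
import org.elasticsearch.xcontent.XContentParser;

// Sketch of the two halves described above.
interface DocumentSizeObserver {
    XContentParser wrapParser(XContentParser parser); // observe bytes while the document is parsed

    long normalisedBytesParsed(); // the observed state passed from IngestService to 'bulk action'
}

interface DocumentSizeReporter {
    void onIndexingCompleted(String index, long normalisedBytesParsed); // called only on successful indexing
}
```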
* Add initial structure for ST_CENTROID
* Revert "Revert stab at implementing forStats for doc-values vs source"
This reverts commit cfc4341bf4.
* Refined csv-spec tests with st_centroid
* Spotless disagrees with IntelliJ
* Fixes after reverting fieldmapper code to test GeoPointFieldMapper
* Get GeoPointFieldMapperTests working again after enabling doc-values reading
* Simplify after rebase on main
In particular, field-mappers that do not need to know about fields can have simpler calls.
* Support local physical planning of forStats attributes for spatial aggregations
* Get st_centroid aggregation working on doc-values
We changed it to produce BytesRef, so we don't (yet) need any doc-values types.
* Create both DocValues and SourceValues versions of st_centroid
* Support physical planning of DocValues and SourceValues SpatialCentroid
* Improve test for physical planning of DocValues in SpatialCentroid
* Fixed show functions for st_centroid
* More st_centroid tests with mv_expand
To test single and multi-value centroids
* Fix st_centroid from point literals
The blocks contained BytesRef byte[] with multiple values, and we were ignoring the offsets when decoding, so we decoded the first value over and over instead of decoding the subsequent values.
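The fix, sketched (`decodePoint` is a hypothetical helper): honor each value's offset instead of always decoding from position 0.
```
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.compute.data.BytesRefBlock;

// Sketch: before, decoding always started at position 0 of the backing byte[],
// so every value decoded as the first one. Read from v.offset instead.
static void decodePosition(BytesRefBlock block, int position) {
    BytesRef scratch = new BytesRef();
    int first = block.getFirstValueIndex(position);
    int count = block.getValueCount(position);
    for (int i = first; i < first + count; i++) {
        BytesRef v = block.getBytesRef(i, scratch);
        long encoded = decodePoint(v.bytes, v.offset); // hypothetical decoder honoring the offset
        // ... feed `encoded` into the centroid aggregator ...
    }
}
```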
* Teach CsvTests to handle spatial types alternative loading from doc-values
Spatial GEO_POINT and CARTESIAN_POINT load from doc-values in some cases. If the physical planner has planned for this, we need the CsvTests to also take that into account, changing the type of the point field from BytesRefBlock to LongBlock.
* Fixed failing NodeSubclassTests
Required making the new constructor public and enabling Set as a valid parameter in the test framework.
* More complex st_centroid tests and fixed bug with multiple aggs
When there were multiple aggregations in the same STATS, we were inadvertently re-ordering them, causing the wrong Blocks to be fed to the wrong aggregator in the coordinator node.
* Update docs/changelog/104218.yaml
* Fix automatically generated changelog file
* Fixed failing test
The nodes can now sometimes be a Set, which is also a Collection but not a List, and can therefore never be a subset of the children.
* More tests covering more combinations including MV_EXPAND and grouping
* Added cartesian st_centroid with grouping test
We could not add MV_EXPAND tests since the cartesian data does not have multi-value columns, but the geo_point tests are sufficient for this since they share the same code.
* Reduce flaky tests by sorting results
* Reduce flaky tests by sorting results
* Added tests for stats on stats to ensure planner coped
* Add unit tests to ensure doc-values in query planning complex cases
* Some minor updates from code review
* Fixes after rebase on main
* Get correct error message on unsupported geo_shape for st_centroid
* Refined point vs shape differences after merging main
* Added basic docs
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 4bc596a442.
* Fixed broken docs tag link
* Simplify BlockReaderSupport in MapperTestCase from code review
* Moved spatial aggregations into a sub-package
* Added some more code review updates, including nested tests
* Get nested functions working, if only from source values for now
* Code review update
* Code review update
* Added second location column to airports for wider testing
* Use second location in tests, including nulls
Includes a test fix for loading and converting nulls to encoded longs.
* Fixed bug supporting multi spatial aggregations in the local node
The local physical planner only marked a single field for stats loading, but marked all spatial aggregations for stats loading, which led to only one aggregation getting the right data, while the rest would get the wrong data.
* Renamed forStats to fieldExtractPreference for clarity
Now the planner decides whether to load data from doc-values. To remove the confusion of preferDocValues==false in the non-spatial cases, we use an enum with a default value of NONE, making it clear that we leave the choice up to the field type in all non-spatial cases.
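A minimal sketch of the enum shape (the exact values beyond NONE are assumptions):
```
// Sketch: the planner's preference, replacing a confusing preferDocValues boolean.
public enum FieldExtractPreference {
    NONE,      // no preference - the field type decides (all non-spatial cases)
    DOC_VALUES // the planner wants values encoded from doc-values (spatial aggregations)
}
```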
* EsqlSpecIT was failing on very high precision centroids on different computers
This was not reproducible on the development machine, but CI machines were sufficiently different to lead to very tiny precision changes over very large Kahan summations. We fixed this by reducing the need for precision checks in clustered integration tests.
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 12c6980881.
* Fixed changelog entry
This implements metrics for the threadpools.
The aim is to emit metrics for the various threadpools; the metric callback should be created when the threadpool is created and removed before the threadpool is shut down.
The PR also includes a test for the new metrics and some additions to the metrics test plugin.
Finally, the metric name check has been modified to allow some of the non-compliant threadpool names (too long, or containing `-`).
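Roughly what per-pool registration looks like against the telemetry MeterRegistry (a sketch - the metric name and callback wiring are assumptions):
```
import java.util.Map;
import java.util.concurrent.ThreadPoolExecutor;

import org.elasticsearch.telemetry.metric.LongWithAttributes;
import org.elasticsearch.telemetry.metric.MeterRegistry;

// Sketch: register a gauge when the pool is created; the instrument has to be
// removed again before the pool is shut down.
static void registerQueueGauge(MeterRegistry registry, String poolName, ThreadPoolExecutor executor) {
    registry.registerLongGauge(
        "es.thread_pool." + poolName + ".queue.size", // hypothetical metric name
        "number of tasks queued for " + poolName,
        "count",
        () -> new LongWithAttributes(executor.getQueue().size(), Map.of("name", poolName))
    );
}
```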
Co-authored-by: Przemyslaw Gomulka <przemyslaw.gomulka@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Many places in ESQL receive a `BigArrays` and a `DriverContext`. We only
need one of the two because `DriverContext` *has* `BigArrays`. This stops
passing `BigArrays` in many, many, many places.
Loading fields from `_source` is *super* slow because you have to
decompress the stored fields and then turn the stored field into a
map-of-maps. And then dig through the map-of-maps. This adds "does this
field exist" style checks before most loads from `_source`. Not all
fields can do it, but most fields can.
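The shape of the check, as a conceptual sketch (the real checks live inside the field loading code):
```
import org.apache.lucene.index.LeafReader;

// Sketch: consult the segment's FieldInfos before paying for a _source load.
// If the field was never indexed in this segment, every document gets null
// and _source is never decompressed.
static boolean mayHaveField(LeafReader leafReader, String field) {
    return leafReader.getFieldInfos().fieldInfo(field) != null;
}
```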
This really improves the performance of our
`esql_dissect_duration_and_stats` benchmark, mostly because it is
running `dissect` on a field that has to load from `_source` that isn't
in many of the documents. Here's the performance:
```
| 50th percentile service time | 867.667 | 100.491 | -767.176 | ms | -88.42% |
| 90th percentile service time | 886.042 | 102.434 | -783.608 | ms | -88.44% |
| 100th percentile service time | 893.035 | 104.598 | -788.437 | ms | -88.29% |
```
This optimizes loading fields across many, many indices by resolving the
field loading infrastructure when it's first needed rather than up
front. This speeds things up because, if you are loading from many many
shards, you often don't need to set up the field loading infrastructure
for all shards - often you'll just need to set it up for a couple
of them.
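The idea, as a hedged sketch (generic lazy resolution - the names are made up):
```
import java.util.function.Supplier;

// Sketch: defer the expensive per-shard setup until a value is actually requested.
final class LazyLoader<T> {
    private final Supplier<T> resolve; // the expensive per-shard resolution
    private T loader;

    LazyLoader(Supplier<T> resolve) {
        this.resolve = resolve;
    }

    T get() {
        if (loader == null) {
            loader = resolve.get(); // only shards we actually read from pay this cost
        }
        return loader;
    }
}
```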
Use vectors consistently for storing blocks' values.
This allows for simple implementations of expand and filter by sharing the (refcounted) vector contained in a block.
This PR removes Builder APIs that use the non-breaking block factory.
Similar to the previous PRs, some tests now explicitly use the
non-breaking block factory. The main goal of this PR is to remove the
non-breaking block factory from the production code.
This PR removes APIs in Vectors that use the non-breaking block factory.
Some tests now explicitly use the non-breaking factory. The goal of this
PR, along with some follow-ups, is to phase out the non-breaking block
factory in production. We can gradually remove its usage in tests later.
This adds support for loading a text field from a parent keyword field.
The mapping for that looks like:
```
"properties": {
"foo": {
"type": "keyword",
"fields": {
"text": { "type": "text" }
}
}
}
```
In this case it's safe to load the `text` subfield from the doc values
for the `keyword` field above.
Closes #102473
Here we export both parent and children circuit breaker trip counts as metrics so that we can collect their values using APM. We expose a counter for the trip count of the parent circuit breaker and a counter for the trip count of each child circuit breaker, including:
* field data circuit breakers
* per-request circuit breakers
* in-flight requests circuit breakers
* custom circuit breakers used by plugins (EQL and Machine Learning)
The circuit breaker metrics include:
* es.breaker.parent.trip.total
* es.breaker.field_data.trip.total
* es.breaker.request.trip.total
* es.breaker.in_flight_requests.trip.total
* es.breaker.eql_sequence.trip.total
* es.breaker.in_model_inference.trip.total
Each of the metrics is exposed at node level.
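Roughly how one of these counters is wired up (a sketch - the registration call mirrors the telemetry MeterRegistry, the trip-handler wiring is an assumption):
```
import org.elasticsearch.telemetry.metric.LongCounter;
import org.elasticsearch.telemetry.metric.MeterRegistry;

// Sketch: one counter per breaker, incremented from that breaker's trip path.
static LongCounter registerParentTripCounter(MeterRegistry meterRegistry) {
    return meterRegistry.registerLongCounter(
        "es.breaker.parent.trip.total", // name from the list above
        "total number of parent circuit breaker trips",
        "count"
    );
}
// ... and when the parent breaker trips: parentTripCounter.increment();
```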
This adds support for loading the `_source` field using the syntax:
```
FROM test [METADATA _source]
```
The `_source` field is loaded as a special type - `_source` - which no
functions support (1). We just render it in the output. Which looks
like:
```
$ curl -XDELETE -uelastic:password localhost:9200/test
$ curl -XPOST -HContent-Type:application/json -uelastic:password localhost:9200/test/_doc/1?refresh -d'{
"words": "words",
"other stuff": [
"wow",
"such",
"suff"
]
}'
$ curl -XPOST -HContent-Type:application/json -uelastic:password localhost:9200/_query?pretty -d'{
"query": "FROM test [METADATA _source] | KEEP _source | LIMIT 1"
}'
{
"columns" : [
{
"name" : "_source",
"type" : "_source"
}
],
"values" : [
[
{
"words" : "words",
"other stuff" : [
"wow",
"such",
"suff"
]
}
]
]
}
```
The `_source` is just a json object. We use the same infrastructure to
convert it to json as the `_search` response.
This works for both stored `_source` and synthetic `_source`, but it
runs row-by-row every time. This is *perfect* for stored `_source` but it's
less nice for synthetic `_source`. We'd be better off rebuilding synthetic
`_source` from blocks but that'd require a lot of new infrastructure.
And synthetic `_source` isn't going to be fast anyway.
(1): `IS NULL` and `IS NOT NULL` support `_source` because we get that
for free.
This modifies ESQL to load a list of fields at one time, which is especially
effective when loading from stored fields or _source because it allows
visiting the stored fields just once.
Part of #101322
* Modularize shard availability service
This commit moves the `ShardsAvailabilityHealthIndicatorService` to a package and modularizes it
with exports so that Serverless can make use of it as a superclass.
Relates to #101394
This changes how we load values in ESQL, delegating to the
`MappedFieldType` like we do with doc values and synthetic
source. This allows a much more OO way of getting the loads
working, which makes that path much easier to read. And! It
means those code paths look like doc values. So there's
symmetry. It's like it rhymes.
There are a few side effects here:
1. It's fairly simple to load from ordinals efficiently. I
wrote some block-at-a-time code for resolving ordinals
and it's about twice as fast (see the sketch after this
list). With more work it should be possible to make custom
ordinal-shaped blocks move through the system to save space
and speed things up.
2. Most fields can now be loaded from `_source`. Everything
that can be loaded from `_source` in scripts will load
from `_source` in ESQL.
3. We get a *lot* more tests for loading fields in
different configurations by piggybacking on the synthetic
source testing framework.
4. Loading from `_source` no longer sorts the fields. Same
for stored fields. Now we keep them in whatever order they were
stored in. This is a pretty marginal time save because
loading from `_source` is so much more time consuming
than the sort. But it's something.
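For side effect 1, here's a hedged sketch of block-at-a-time ordinal resolution (`OrdinalBlockBuilder` is hypothetical; the doc-values calls are Lucene's):
```
import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedDocValues;

interface OrdinalBlockBuilder { // hypothetical ordinal-shaped block builder
    void appendOrd(int ord);

    void appendNull();
}

// Sketch: gather ordinals for a whole block of docs, deferring the ord -> term
// lookup so repeated ordinals are converted once.
static void readOrdinals(LeafReader reader, String field, int[] docs, OrdinalBlockBuilder builder)
    throws IOException {
    SortedDocValues ordinals = DocValues.getSorted(reader, field);
    for (int doc : docs) { // docs must be ascending for advanceExact
        if (ordinals.advanceExact(doc)) {
            builder.appendOrd(ordinals.ordValue());
        } else {
            builder.appendNull();
        }
    }
}
```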
This adds comprehensive tests for `ExpressionEvaluator` making sure that it releases `Block`s. It fixes all of the `mv_*` evaluators to make sure they release as well.
This commit updates the hash grouping operator to close input pages, as well as use the block factory for internally created blocks.
Additionally:
* Adds a MockBlockFactory to help with tracking block creation
* Eagerly creates the block view of a vector, which helps with tracking since there can be only one block view instance per vector
* Resolves an issue with Filter Blocks, whereby they previously tried to emit their contents in `toString`.
This creates `Block.Ref`, a reference to a `Block` which may or may not
be part of a `Page`. `Block.Ref` is `Releasable` and closing it is a
noop if the `Block` is part of a `Page`, but if it is "free floating"
then closing the `Block.Ref` will close the block.
It also modified `ExpressionEvaluator` to return a `Block.Ref` instead
of a `Block` - so you tend to work with `ExpressionEvaluator`s like
this:
```
try (Block.Ref ref = eval.eval(page)) {
return ref.block().doStuff();
}
```
This should make it *much* easier to release the memory from `Block`s
built by `ExpressionEvaluator`s.
This change is mostly mechanical, introducing the new signature for
`ExpressionEvaluator`. In a follow up change I'll modify the tests to
make sure we're correctly using it to close pages.
I did think about changing `ExpressionEvaluator` to add a method telling
you if the block that it returns must be closed or not. This would have
been more difficult to work with, and, ultimately, limiting.
Specifically, it is possible for an `ExpressionEvaluator` to *sometimes*
return a free floating block and other times return one that is
contained in a `Page`. Imagine `mv_concat` - it returns the block it
receives if the block doesn't have multivalued fields. Otherwise it
concats things. If that block happens to come directly out of the
`Page`, then `mv_concat` will sometimes produce free floating blocks and
sometimes not.
Today, we have the ability to specify whether multivalued fields are
sorted in ascending order or not. This feature allows operators like
topn to enable optimizations. However, we are currently missing the
deduplicated attribute. If multivalued fields are deduplicated at each
position, we can further optimize operators such as hash and mv_dedup.
In fact, blocks should not have the mv_ascending property alone; it always
goes together with mv_deduplicated. Additionally, mv_dedup or hash
should generate blocks that have only the mv_dedup property.
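A sketch of folding the two properties into one ordering attribute (the names are assumptions):
```
// Sketch: ascending never appears alone, so there is no SORTED_ASCENDING-only value.
public enum MvOrdering {
    UNORDERED,                         // no guarantees about the values at a position
    DEDUPLICATED_UNORDERED,            // what hash and mv_dedup can promise
    DEDUPLICATED_AND_SORTED_ASCENDING  // ascending implies deduplicated
}
```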
This commit adds a BlockFactory - an extra level of indirection when building blocks. The factory couples circuit breaking when building, allowing for incrementing the breaker as blocks and Vectors are built.
This PR adds the infrastructure to allow us to move the operators and implementations over to the factory, rather than actually moving them all over at once.
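Usage, roughly (a sketch - treat the builder method names as assumptions):
```
import org.elasticsearch.compute.data.Block;
import org.elasticsearch.compute.data.BlockFactory;
import org.elasticsearch.compute.data.IntBlock;

// Sketch: builders come from the factory, so each allocation increments the
// breaker and can trip before the node runs out of memory.
static Block buildInts(BlockFactory blockFactory, int[] values) {
    IntBlock.Builder builder = blockFactory.newIntBlockBuilder(values.length);
    for (int v : values) {
        builder.appendInt(v);
    }
    return builder.build(); // the built block carries the breaker accounting
}
```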
This prevents topn operations from using too much memory by hooking them
into the circuit breaking framework. It builds on the work done in
https://github.com/elastic/elasticsearch/pull/99316 that moved all topn
storage to byte arrays, by adding circuit breaking to the process of
growing the underlying byte array.
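The pattern, sketched: grow the storage through `BigArrays` so every resize is accounted against the breaker.
```
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.common.util.ByteArray;

// Sketch: BigArrays.grow consults the circuit breaker, so a topn that needs too
// much row storage trips the breaker instead of OOMing the node.
static ByteArray appendRow(BigArrays bigArrays, ByteArray bytes, long usedBytes, int rowSize) {
    ByteArray grown = bigArrays.grow(bytes, usedBytes + rowSize); // may throw CircuitBreakingException
    // ... write the row at [usedBytes, usedBytes + rowSize) ...
    return grown;
}
```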
This commit adds DriverContext to the construction of Evaluators.
DriverContext is enriched to carry bigArrays, and will eventually carry a BlockFactory - it's the context for code requiring to create instances of blocks and big arrays.
This lowers topn's memory usage somewhat and makes it easier to track
the memory usage. That looks like:
```
"status" : {
"occupied_rows" : 10000,
"ram_bytes_used" : 255392224,
"ram_used" : "243.5mb"
}
```
In some cases the memory usage savings are significant. In an example
with many, many keys the memory usage of each row drops from `58kb` to
`25kb`. This is a little degenerate though and I expect the savings to
normally be on the order of 10%.
The real advantage is memory tracking. It's *easy* to track used memory.
And, in a followup, it should be fairly easy to track circuit break the
used memory.
Mostly this is done by adding new abstractions and moving existing
abstractions to top level classes with tests and stuff.
* `TopNEncoder` is now a top level class. It has grown the ability to *decode* values as well as encode them. And it has grown "unsortable" versions which don't write their values such that sorting the bytes sorts the values. We use the "unsortable" versions when writing values.
* `KeyExtractor` extracts keys from the blocks and writes them to the row's `BytesRefBuilder`. This is basically objects replacing one of the switch statements in `RowFactory`. They are more scattered but easier to test, and hopefully `TopNOperator` is more readable with this behavior factored out. Also! Most implementations are automatically generated.
* `ValueExtractor` extracts values from the blocks and writes them to the row's `BytesRefBuilder`. This replaces the other switch statement in `RowFactory` for the same reasons, except instead of writing to many arrays it writes to a `BytesRefBuilder` just like the key as compactly as it can manage.
The memory savings come from three changes:
1. Lower overhead for storing values by encoding them rather than using many primitive arrays.
2. Encode the value count as a vint rather than a whole int. Usually there are very few rows and vint encodes that quite nicely (sketched below).
3. Don't write values that are in the key for single-valued fields. Instead we read them from the key. That's going to be very very common.
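Change 2 in miniature, as a sketch (a standard varint: 7 bits per byte plus a continuation bit, so the common tiny counts take one byte instead of four):
```
import org.apache.lucene.util.BytesRefBuilder;

// Sketch: vint-encode the value count into the row's bytes.
static void writeVInt(BytesRefBuilder out, int value) {
    while ((value & ~0x7F) != 0) {
        out.append((byte) ((value & 0x7F) | 0x80)); // low 7 bits, more bytes follow
        value >>>= 7;
    }
    out.append((byte) value); // final byte, continuation bit clear
}
```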
This is unlikely to be faster than the old code. I haven't really tried
for speed. Just memory usage and accountability. Once we get good
accounting we can try and make this faster. I expect we'll have to
figure out the megamorphic invocations I've added. But, for now, they
help more than they hurt.
CompatibilityVersions now holds a map of system index names to their
mappings versions, alongside the transport version. We also add mapping
versions to the "minimum version barrier": if a node has a system index
whose version is below the cluster mappings version for that system
index, it is not allowed to join the cluster.
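A sketch of the shape described (the mappings-version type here is a hypothetical stand-in):
```
import java.util.Map;

import org.elasticsearch.TransportVersion;

record MappingsVersion(int version) {} // hypothetical stand-in for the real per-index mappings version

// Sketch: the transport version plus a map of system index name -> its mappings version.
public record CompatibilityVersions(
    TransportVersion transportVersion,
    Map<String, MappingsVersion> systemIndexMappingsVersions
) {}
```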
* ESQL: Disable optimizations with bad null handling
We have optimizations that kick in when aggregating on the following
pairs of field types:
* `long`, `long`
* `keyword`, `long`
* `long`, `keyword`
These optimizations don't have proper support for `null` valued fields
but will grow that after #98749. In the meantime this disables them in
a way that prevents them from bit-rotting.
* Update docs/changelog/99434.yaml
Cluster state currently holds a cluster minimum transport version and a map of nodes to transport versions. However, to determine node compatibility, we will need to account for more types of versions in cluster state than just the transport version (see #99076). Here we introduce a wrapper class to cluster state and update accessors and builders to use it. (I would have liked to re-use org.elasticsearch.cluster.node.VersionInformation, but that one holds IndexVersion rather than TransportVersion.)
* Introduce CompatibilityVersions to cluster state class