Commit graph

14959 commits

Author SHA1 Message Date
Yang Wang
fd9b290190
Use the current term in a logging where it is relevant (#116786) (#116901)
As the title says, this PR logs the current term instead of the last-accepted term
in a log message where the former is expected.

(cherry picked from commit 131d3c1288)
2024-11-18 11:10:02 +11:00
Benjamin Trent
ef5f439a93
[8.x] Add multi_dense_vector value access to scripts (#116610) (#116850)
* Add multi_dense_vector value access to scripts (#116610)

This adds value access to multi_dense_vector values in scripts. The
users will get:

 - Count of vectors per field
 - Magnitudes of all the individual vectors
 - Access to each vector with an iterator

I will happily take design critiques around how these are exposed in
scripting.

I initially thought of just providing direct `float[][]` access, but
this seems to have some unfavorable behavior around creating a TON of
garbage. The reason is that each field could have a different number of
vectors, so allocating a new collection of `float[dim]` for every field
seemed rough.

Generally, when scripting or using the vectors, an iterator should be
enough and I have the iterator backed by a simple buffer to keep garbage
down.
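As a minimal sketch of the iterator-over-a-buffer idea described above, assume the vectors of one field are stored back to back in a single flat `float[]`. The class name `ReusingVectorIterator` and this storage layout are hypothetical, not the actual scripting API. Each call to `next()` copies into one reused `float[dims]` instead of allocating a new array:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: iterates over packed vectors while reusing one buffer. */
final class ReusingVectorIterator implements Iterator<float[]> {
    private final float[] packed; // all vectors of one field, concatenated
    private final int dims;       // dimensions per vector
    private final int count;      // number of vectors in this field
    private final float[] buffer; // reused for every call to next()
    private int index = 0;

    ReusingVectorIterator(float[] packed, int dims) {
        if (dims <= 0 || packed.length % dims != 0) {
            throw new IllegalArgumentException("packed length must be a multiple of dims");
        }
        this.packed = packed;
        this.dims = dims;
        this.count = packed.length / dims;
        this.buffer = new float[dims];
    }

    @Override
    public boolean hasNext() {
        return index < count;
    }

    /** Returns the shared buffer; callers must copy it if they keep it past the next call. */
    @Override
    public float[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        System.arraycopy(packed, index * dims, buffer, 0, dims);
        index++;
        return buffer;
    }

    /** L2 magnitude of every vector, computed in one pass over the packed array. */
    static float[] magnitudes(float[] packed, int dims) {
        float[] out = new float[packed.length / dims];
        for (int v = 0; v < out.length; v++) {
            double sum = 0;
            for (int d = 0; d < dims; d++) {
                float x = packed[v * dims + d];
                sum += x * x;
            }
            out[v] = (float) Math.sqrt(sum);
        }
        return out;
    }
}
```

The trade-off is that callers must copy the returned array if they need it after the next call. In exchange, iterating a field allocates a single buffer regardless of how many vectors the field holds.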

* fixing test
2024-11-16 02:22:27 +11:00
Brendan Cully
4a261ad2a7
Attempt to clean up index before remote transfer (#115142) (#116854)
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to the store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.
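The cleanup decision can be sketched as a pure function, assuming the latest snapshot is exposed as a set of file names. `RecoveryCleanupSketch` and `filesToDelete` are hypothetical names, not the store's `cleanupAndVerify` signature. A failure to load the snapshot falls back to an empty set, so every file becomes a deletion candidate:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;

final class RecoveryCleanupSketch {
    /**
     * Returns the files on disk that the latest snapshot does not reference.
     * If loading the snapshot fails, it is treated as empty, so everything is purged.
     */
    static List<String> filesToDelete(Collection<String> filesOnDisk, Callable<Set<String>> loadLatestSnapshotFiles) {
        Set<String> referenced;
        try {
            referenced = loadLatestSnapshotFiles.call();
        } catch (Exception e) {
            referenced = Set.of(); // failure == empty snapshot -> recover from scratch
        }
        List<String> toDelete = new ArrayList<>();
        for (String file : filesOnDisk) {
            if (!referenced.contains(file)) {
                toDelete.add(file); // e.g. a temp file left behind by a crashed recovery
            }
        }
        return toDelete;
    }
}
```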

Closes #104473
2024-11-15 10:07:16 +11:00
Andrei Dan
28d5ded166
[8.x] Fix testSearchAndRelocateConcurrently (#116806) (#116830)
* Fix testSearchAndRelocateConcurrently (#116806)

This aims to test that we can search through replica shard relocations.
However, as written, the test sometimes also started
another data node. The concurrent search requests would sometimes
hit this new node before its cluster state was RECOVERED.

The search action throws an exception when the cluster state is not
recovered, as it needs to be able to read the cluster state.

This fixes the test to grab a copy of the bootstrapped nodes and use them when calling the _search API
before the cluster (potentially) resizes.

(cherry picked from commit 0be75e1b69)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>

* compile
2024-11-15 07:25:02 +11:00
Mark J. Hoy
2459aa7016
add backport transport versions (#116827) (#116834)
(cherry picked from commit 74e6009bb3)
2024-11-15 05:44:40 +11:00
Aurélien FOUCRET
e4dbf3823a
Add tracking for query rule types (#116357) (#116820)
* Add total rule type counts to list calls and xpack usage

* Add feature

* Update docs/changelog/116357.yaml

* Fix docs test failure & update yaml tests

* remove additional spaces

---------

Co-authored-by: Mark J. Hoy <mark.hoy@elastic.co>
(cherry picked from commit 1b03a96e52)

Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
2024-11-14 18:16:56 +01:00
David Kyle
37ef5f21a6
[8.x] [ML] Pass inference timeout to start deployment (#116725) (#116733)
* [ML] Pass inference timeout to start deployment (#116725)

Default inference endpoints automatically deploy the model on inference.
The inference timeout is now passed to start model deployment so users
can control that timeout.

* handle max time

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-15 03:11:10 +11:00
Luke Whiting
8126bf5a49
[8.x] Introduce Email Address Allow Lists For Watcher (#116672) (#116805)
* Introduce Email Address Allow Lists For Watcher (#116672)

* New setting plus mutual exclusiveness validation

* New domain list checking

* Email service tests

* Documentation updates

* PR Changes

Fix comment

* Backport missing Settings method for default value with validator
2024-11-15 02:15:12 +11:00
Carlos Delgado
809fd9e7d7
Remove unused method introduced in #113194 (#116793) (#116807)
(cherry picked from commit 25223dddae)
2024-11-14 14:15:07 +01:00
Carlos Delgado
161b7ef129
[8.x] Add Search Phase APM metrics (#113194) (#116751) 2024-11-14 13:02:00 +01:00
Craig Taverner
f5246cda55
Use SearchStats instead of field.isAggregatable in data node planning (#115744) (#116800)
Since ES|QL makes use of field-caps and only considers `isAggregatable` during Lucene pushdown, turning off doc-values disables Lucene pushdown. This is incorrect. The physical planning decision for Lucene pushdown is made during local planning on the data node, at which point `SearchStats` are known, and both `isIndexed` and `hasDocValues` are separately knowable. The Lucene pushdown should happen for `isIndexed` and not consider `hasDocValues` at all.

This PR adds `hasDocValues` to `SearchStats` and then uses `isIndexed` and `hasDocValues` separately during local physical planning on the data nodes. This immediately cleared up one issue for spatial data, which could not push down a Lucene query when doc-values were disabled.

Summary of what `isAggregatable` means for different implementations of `MappedFieldType`:
* The default implementation of `isAggregatable` in `MappedFieldType` is `hasDocValues`, and does not consider `isIndexed`
* All classes that extend `AbstractScriptFieldType` (e.g. `LongScriptFieldType`) hard-code `isAggregatable` to `true`. This presumably means Lucene is happy to mimic having doc-values
* `TestFieldType`, and classes that extend it, return the value of `fielddata`, so they consider the field aggregatable if there is field-data.
* `AggregateDoubleMetricFieldType` and `ConstantFieldType` hard-code it to `true`
* `DenseVectorFieldType` hard-codes it to `false`
* `IdFieldType` returns the value of `fieldDataEnabled.getAsBoolean()`

In no case is `isIndexed` used for `isAggregatable`. However, for our Lucene pushdown of filters, `isIndexed` would make a lot more sense. But for pushdown of TopN, `hasDocValues` makes more sense.

Summarising the results of the various options for the various field types, where `?` means configurable:

| Class | isAggregatable | isIndexed | isStored | hasDocValues |
| --- | --- | --- | --- | --- |
| AbstractScriptFieldType | true | false | false | false |
| AggregateDoubleMetricFieldType | true | true | false | false |
| DenseVectorFieldType | false | ? | false | !indexed |
| IdFieldType | fieldData | true | true | false |
| TsidExtractingIdField | false | true | true | false |
| TextFieldType | fieldData | ? | ? | false |
| ? (the rest) | hasDocValues | ? | ? | ? |

It has also been observed that we cannot push filters to source without checking `hasDocValues` when we use the `SingleValueQuery`. So this leads to three groups of conditions:

| Category | require `indexed` | require `docValues` |
| --- | --- | --- |
| Filters(single-value) | true | true |
| Filters(multi-value) | true | false |
| TopN | true | true |

And for all cases we will also consider `isAggregatable` as a disjunction to cover the script field types, leading to two possible combinations:

* `fa.isAggregatable() || searchStats.isIndexed(fa.name()) && searchStats.hasDocValues(fa.name())`
* `fa.isAggregatable() || searchStats.isIndexed(fa.name())`
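The two conditions above can be expressed as predicates. This is a hedged sketch: the `SearchStats` interface here is a hypothetical stand-in that exposes only `isIndexed` and `hasDocValues`, and `isAggregatable` is passed as a plain boolean rather than read from a field attribute.

```java
final class PushdownSketch {
    /** Hypothetical view of the per-field stats known during local planning on a data node. */
    interface SearchStats {
        boolean isIndexed(String field);
        boolean hasDocValues(String field);
    }

    /** Single-value filters and TopN: require both an index and doc-values. */
    static boolean canPushIndexedWithDocValues(boolean isAggregatable, String field, SearchStats stats) {
        return isAggregatable || stats.isIndexed(field) && stats.hasDocValues(field);
    }

    /** Multi-value filters: the index alone is enough. */
    static boolean canPushIndexed(boolean isAggregatable, String field, SearchStats stats) {
        return isAggregatable || stats.isIndexed(field);
    }
}
```

Because `&&` binds tighter than `||` in Java, the first form reads as "aggregatable, or else both indexed and with doc-values", matching the original expression. The `isAggregatable` disjunct is what keeps script field types (hard-coded to `true`) eligible.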
2024-11-14 22:11:01 +11:00
Armin Braun
05f3ba3edd
Add singleton for noop BitSetFilterCache.Listener (#116753) (#116773)
Noticed during a code review that added yet another one of these:
We have quite a few instances of duplicate noop implementations,
let's make tests a little less verbose here.

Technically the constant is test-only but it felt right to just leave it
on the interface.
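A minimal sketch of the pattern is a shared no-op constant declared on the interface itself. `Listener` and its method signatures below are simplified stand-ins, not the real `BitSetFilterCache.Listener` API:

```java
final class NoopListenerSketch {
    /** Simplified stand-in for a cache listener interface (hypothetical signatures). */
    interface Listener {
        void onCache(String shardId, long ramBytes);

        void onRemoval(String shardId, long ramBytes);

        /** Shared no-op instance, so tests don't each declare their own empty implementation. */
        Listener NOOP = new Listener() {
            @Override
            public void onCache(String shardId, long ramBytes) {}

            @Override
            public void onRemoval(String shardId, long ramBytes) {}
        };
    }
}
```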
2024-11-14 09:01:36 +11:00
Panagiotis Bailis
7cae545fed
Adding patch version from 8.16 for skip_inner_hits_search_source (#116741) 2024-11-13 19:04:32 +02:00
Tanguy Leroux
d6b2425771
Fix TranslogDeletionPolicy when assertions are disabled (#116654) (#116714)
The current code causes an NPE when assertions are disabled: the
openTranslogRef is only non-null when assertions are enabled.
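The failure mode can be sketched as follows, assuming the reference is assigned through a side effect inside an `assert`. The class and method names are hypothetical, and this does not reproduce the actual fix. Under `-da` the assignment never runs, so any unconditional dereference throws. A null check keeps the code safe in both modes:

```java
final class TranslogRefSketch {
    // Only assigned inside an assert, so it stays null when the JVM runs with -da.
    private Object openTranslogRef;

    TranslogRefSketch(Object ref) {
        assert setRef(ref); // side effect happens only with -ea
    }

    private boolean setRef(Object ref) {
        openTranslogRef = ref;
        return true;
    }

    /** Fragile shape: dereferences the field unconditionally and throws NPE under -da. */
    int fragileRefHash() {
        return openTranslogRef.hashCode();
    }

    /** Safe shape: tolerates the field being null when assertions are disabled. */
    void release() {
        if (openTranslogRef != null) {
            openTranslogRef = null;
        }
    }

    boolean hasRef() {
        return openTranslogRef != null;
    }
}
```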

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-14 01:31:12 +11:00
Simon Cooper
c441ada314
[8.x] Add a deprecation warning that the JSON format of non-detailed errors is changing in v9 (#116330) (#114739) 2024-11-13 14:17:50 +00:00
Dimitris Rempapis
08f8312457
_validate request does not honour ignore_unavailable (#116656) (#116717)
The IndicesOption handling in ValidateQueryRequest has been updated to encapsulate the following logic.

If we target a closed index and ignore_unavailable=false, we get an IndexClosedException; otherwise,
if the request contains ignore_unavailable=true, we safely skip the closed index.
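The resolution rule can be sketched like this. `resolveTargets` and the nested `IndexClosedException` are hypothetical stand-ins for the real request and exception types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class IgnoreUnavailableSketch {
    /** Hypothetical stand-in for the real IndexClosedException. */
    static final class IndexClosedException extends RuntimeException {
        IndexClosedException(String index) {
            super("closed index [" + index + "]");
        }
    }

    /** Drops closed indices when ignoreUnavailable is true; otherwise fails on the first one. */
    static List<String> resolveTargets(List<String> requested, Set<String> closed, boolean ignoreUnavailable) {
        List<String> targets = new ArrayList<>();
        for (String index : requested) {
            if (closed.contains(index)) {
                if (!ignoreUnavailable) {
                    throw new IndexClosedException(index);
                }
                continue; // ignore_unavailable=true: skip the closed index silently
            }
            targets.add(index);
        }
        return targets;
    }
}
```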
2024-11-13 14:31:18 +02:00
Panagiotis Bailis
7d33c5c597
[8.x] Backporting propagating nested inner_hits to the parent compound retriever (#116707) 2024-11-13 12:33:43 +01:00
Nikolaj Volgushev
7668eee283
Use retry logic and real file system in file settings ITs (#116392) (#116709)
Several file-settings ITs fail (rarely) with exceptions like:

```
java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\14\server\build\testrun\internalClusterTest\temp\org.elasticsearch.reservedstate.service.SnaphotsAndFileSettingsIT_5733F2A737542BE-001\tempFile-001.tmp -> C:\Users\jenkins\workspace\platform-support\14\server\build\testrun\internalClusterTest\temp\org.elasticsearch.reservedstate.service.SnaphotsAndFileSettingsIT_5733F2A737542BE-001\tempDir-002\config\operator\settings.json
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:317)
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:293)
    at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
    at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
    at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
    at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
    at java.nio.file.Files.move(Files.java:1430)
    at org.elasticsearch.reservedstate.service.SnaphotsAndFileSettingsIT.writeJSONFile(SnaphotsAndFileSettingsIT.java:86)
    at org.elasticsearch.reservedstate.service.SnaphotsAndFileSettingsIT.testRestoreWithPersistedFileSettings(SnaphotsAndFileSettingsIT.java:321)
```

This happens on Windows file systems, due to a race condition where the
file settings service is reading the settings file concurrently with the
test trying to modify it (a no-go on Windows). It turns out we have
already addressed this with a retry for one test suite
(https://github.com/elastic/elasticsearch/pull/91863), and also addressed a
related issue around mock Windows file systems misbehaving
(https://github.com/elastic/elasticsearch/pull/92653).

This PR extends the above fixes to all file-settings related ITs.
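The following sketches the retry approach. It assumes that `AccessDeniedException` is the only transient failure worth retrying and that a linear backoff is acceptable. `RetryingMove`, `IoAction`, and the attempt and backoff values are hypothetical, not taken from the linked PRs:

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RetryingMove {
    /** An I/O step that may fail transiently (hypothetical helper type). */
    @FunctionalInterface
    interface IoAction<T> {
        T run() throws IOException;
    }

    /** Runs {@code action}, retrying with linear backoff while Windows reports the file as locked. */
    static <T> T retry(IoAction<T> action, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.run();
            } catch (AccessDeniedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the original failure
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    /** Writes new settings to a temp file, then moves it over the watched file with retries. */
    static Path writeSettings(Path settingsFile, String json) throws IOException, InterruptedException {
        Path tmp = Files.createTempFile(settingsFile.getParent(), "settings", ".tmp");
        Files.writeString(tmp, json);
        // Whether ATOMIC_MOVE replaces an existing target is implementation-specific per the Files.move Javadoc.
        return retry(() -> Files.move(tmp, settingsFile, StandardCopyOption.ATOMIC_MOVE), 5, 100);
    }
}
```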

(cherry picked from commit 91559da015)
2024-11-13 21:30:51 +11:00
Lorenzo Dematté
cb4485e168
[Entitlements] External IT test for checkSystemExit (#116435) (#116705) 2024-11-13 20:41:48 +11:00
Patrick Doyle
37edf70bda
Backport entitlement work up to #116473 to 8.x (#116613)
* Add initial entitlement policy parsing (#114448)

This change adds entitlement policy parsing with the following design:
* YAML file for readability and re-use of our x-content parsers
* hierarchical structure to group entitlements under a single scope
* no general entitlements without a scope or for the entire project

* Avoid double instrumentation via class annotation (#115398)

* Move entitlement jars to libs (#115883)

The distribution tools are meant to be CLIs. This commit moves the
entitlements jar projects to the libs dir, under a single
libs/entitlement root directory to keep the related jars together.

* Entitlement tools: SecurityManager scanner (#116020)

* Dynamic entitlement agent (#116125)

* Refactor: treat "maybe" JVM options uniformly

* WIP

* Get entitlement running with bridge all the way through, with qualified
exports

* Cosmetic changes to SystemJvmOptions

* Disable entitlements by default

* Bridge module comments

* Fixup forbidden APIs

* spotless

* Rename EntitlementChecker

* Fixup InstrumenterTests

* exclude recursive dep

* Fix some compliance stuff

* Rename asm-provider

* Stop using bridge in InstrumenterTests

* Generalize readme for asm-provider

* InstrumenterTests doesn't need EntitlementCheckerHandle

* Better javadoc

* Call parseBoolean

* Add entitlement to internal module list

* Docs as requested by Lorenzo

* Changes from Jack

* Rename ElasticsearchEntitlementChecker

* Remove logging javadoc

* exportInitializationToAgent should reference EntitlementInitialization, not EntitlementBootstrap.

They're currently in the same module, but if that ever changes, this code would become wrong.

* Some suggestions from Mark

---------

Co-authored-by: Ryan Ernst <ryan@iernst.net>

* Remove unused EntitlementInternals (#116473)

* Revert "Entitlement tools: SecurityManager scanner (#116020)"

This reverts commit 023fb663de.

---------

Co-authored-by: Jack Conradson <osjdconrad@gmail.com>
Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co>
Co-authored-by: Ryan Ernst <ryan@iernst.net>
2024-11-13 05:36:55 +11:00
Ying Mao
a49309f9f4
Hides hugging_face_elser service from the GET _inference/_services API (#116664) (#116677)
* Adding hideFromConfigurationApi flag

* Update docs/changelog/116664.yaml
2024-11-13 04:31:01 +11:00
Ying Mao
2ec5299460
Adds support for input_type field to Vertex inference service (#116431) (#116673)
* Adding input type to google vertex ai service

* Update docs/changelog/116431.yaml

* PR feedback - backwards compatibility

* Fix lint error

(cherry picked from commit 7039a1dc8c)
2024-11-13 04:13:04 +11:00
elasticsearchmachine
77881c697d Bump versions after 8.16.0 release 2024-11-12 16:47:20 +00:00
Ignacio Vera
8e35324b8d
Deduplicate DocValueFormat objects from InternalAggregation when deserializing (#116640) (#116670) 2024-11-13 03:06:37 +11:00
elasticsearchmachine
1ce95bbdd9 Bump versions after 8.15.4 release 2024-11-12 12:17:04 +00:00
Kostas Krikellas
de1db9877f
[8.x] Refactor DocumentDimensions to RoutingFields (#116321) (#116604)
* Refactor DocumentDimensions to RoutingFields (#116321)

* Refactor DocumentDimensions to RoutingFields

* update

* add test

* add test

* updates from review

* updates from review

* spotless

* remove final from subclass

* fix final

(cherry picked from commit 2054357902)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java

* fix imports
2024-11-12 21:19:43 +11:00
Ignacio Vera
41e07cad23
Deduplicate non-empty InternalAggregation metadata when deserializing (#116589) (#116635) 2024-11-12 18:49:45 +11:00
Keith Massey
7c3d4027cd
Adding a deprecation info API warning for data streams with old indices (#116447) (#116626)
* Adding a deprecation info API warning for data streams with old indices (#116447)

* removing use of a method not available in 8.x
2024-11-12 11:26:32 +11:00
Lorenzo Dematté
d698e72af3
[8.x] Add a cluster listener to fix missing system index mappings after upgrade (#115771)
This PR modifies `TransportVersionsFixupListener` to include all
compatibility versions (not only TransportVersion) in the fixup.

`TransportVersionsFixupListener` spots the instances when the master has
been upgraded to the most recent code version, along with non-master
nodes, but some nodes are missing a "proper" (non-inferred) Transport
version. This PR adds another check to also ensure that we have real
(non-empty) system index mapping versions.

To do so, it modifies NodeInfo so it carries all of
CompatibilityVersions (TransportVersion +
SystemIndexDescriptor.MappingVersions).

This was initially done via a separate fixup listener plus an ad-hoc transport
action, but the two listeners "raced" to update ClusterState on the same
CompatibilityVersions structure, so it made sense to do both at the same
time.

The fixup is very similar to
https://github.com/elastic/elasticsearch/pull/110710, which does the
same for cluster features; plus, it adds a CI test to cover the bug
raised in https://github.com/elastic/elasticsearch/issues/112694

Closes https://github.com/elastic/elasticsearch/issues/112694
2024-11-12 05:45:10 +11:00
Lorenzo Dematté
67231ab0d8
Adding full CompatibilityVersions to NodeInfo (#116582)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-12 03:05:38 +11:00
Benjamin Trent
f14c8bd306
Add new multi_dense_vector field for brute-force search (#116275) (#116526)
This adds a new `multi_dense_vector` field that focuses on the maxSim
use case provided by Col[BERT|Pali].

Indexing these vectors in HNSW, as it stands, makes no sense, either
performance-wise or cost-wise. However, we should totally support rescoring and
brute-force search over vectors with maxSim.

This is step one of many. Behind a feature flag, this adds support for
indexing any number of vectors of the same dimension.

Supports bit/byte/float.
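For reference, maxSim (ColBERT-style late interaction) scores a document by taking each query vector's best dot product against any of the document's vectors and summing those maxima. The following is a brute-force sketch that assumes float vectors and dot-product similarity. `MaxSimSketch` is a hypothetical name, not the field's implementation:

```java
final class MaxSimSketch {
    /** Sum over query vectors of the maximum dot product against any document vector. */
    static float maxSim(float[][] query, float[][] doc) {
        if (doc.length == 0) {
            throw new IllegalArgumentException("document has no vectors");
        }
        float total = 0f;
        for (float[] q : query) {
            float best = Float.NEGATIVE_INFINITY;
            for (float[] d : doc) { // brute force: no HNSW graph, every pair is compared
                best = Math.max(best, dot(q, d));
            }
            total += best;
        }
        return total;
    }

    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

With two query vectors `{1, 0}` and `{0, 1}` and document vectors `{1, 0}` and `{0.5, 0.5}`, the per-query maxima are 1 and 0.5, so the score is 1.5.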

Scripting support will be a follow up.

Marking as non-issue as it's behind a flag and currently unusable.

(cherry picked from commit 7369c0818d)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-12 01:02:39 +11:00
Armin Braun
2b5faa9499
Two small improvements to IndexNameExpressionResolver (#116552) (#116563)
Not using an iterator loop for the mostly single item list saves
measurable runtime in the benchmarks for the resolver.
Also, cleaned up a redundant method argument.
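The following is a hedged sketch of a single-item fast path that avoids allocating an `Iterator` for the common one-element list. `anyMatch` is a hypothetical helper, and the actual change in `IndexNameExpressionResolver` may be shaped differently:

```java
import java.util.List;
import java.util.function.Predicate;

final class SingleItemFastPath {
    /** Returns true if any item matches, skipping the Iterator for single-item lists. */
    static <T> boolean anyMatch(List<T> items, Predicate<? super T> matcher) {
        if (items.size() == 1) {
            return matcher.test(items.get(0)); // fast path: no Iterator object is created
        }
        for (T item : items) { // general path: enhanced for-loop allocates an Iterator
            if (matcher.test(item)) {
                return true;
            }
        }
        return false;
    }
}
```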
2024-11-11 22:43:12 +11:00
Benjamin Trent
308ad0c05f
[8.x] Add docvalue_fields Support for dense_vector Fields (#114484) (#116491)
* Add `docvalue_fields` Support for `dense_vector` Fields (#114484)

Currently `dense_vector` fields don't support `docvalue_fields`.

This adds that support for debugging purposes. Users can inspect
raw values of their vectors even if the source is disabled.

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
(cherry picked from commit c8a8d4d931)

* fixing for backport

---------

Co-authored-by: Rassyan <yjkhngds@gmail.com>
2024-11-09 08:15:13 +11:00
Jake Landis
8adb2c4043
[8.x] Add a monitor_stats privilege and allow that privilege for remote cluster privileges (#114964) (#116517)
* Add a monitor_stats privilege and allow that privilege for remote cluster privileges (#114964)

This commit does the following:
   * Add a new monitor_stats privilege
   * Ensure that monitor_stats can be set in the remote_cluster privileges
   * Give Kibana the ability to remotely call monitor_stats via RCS 2.0

Since this is the first case where there is more than one remote_cluster privilege,
the following framework concerns have been addressed:
    * Ensure that when sending to older RCS 2.0 clusters we don't send the new privilege
        (previously only all-or-nothing remote_cluster blocks were supported)
    * Ensure that when sending API key role descriptors that contain remote_cluster,
       we don't send the new privileges for RCS 1.0/2.0 if the remote cluster is not new enough
    * Fix and extend the BWC tests for RCS 1.0 and RCS 2.0

(cherry picked from commit af99654dac)

* adjust bwc for 8.x branch
2024-11-09 06:26:14 +11:00
Benjamin Trent
4eb1c00535
Adjust analyze limit exception to be a bad_request (#116325) (#116495)
The exception is due to large input from the user and is resolvable by
either the user adjusting their request or changing their cluster
settings. So a user focused error is preferred. I chose bad_request as
it seemed like the best fit.

closes: https://github.com/elastic/elasticsearch/issues/116323
2024-11-09 03:05:46 +11:00
Ignacio Vera
fc120f7708
Deduplicate the name of the aggregation when deserializing InternalAggregation (#116307) (#116457) 2024-11-08 16:48:45 +01:00
Andrei Dan
d75ed26899
Validate missing shards after the coordinator rewrite (#116382) (#116489)
The coordinator rewrite can skip searching shards when the query filters
on `@timestamp`, `event.ingested` or the `_tier` field.

We currently check for missing shards across all the indices the
query is running against; however, some shards/indices might not
play a role in the query at all after the coordinator rewrite.

This moves the check for missing shards **after** we've run the
coordinator rewrite so we validate only the shards that will be
searched by the query.
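The ordering change can be sketched as two steps. First, filter shards using the rewrite. Then check only the remaining shards for missing copies. `Shard`, `validateAfterRewrite`, and the `canMatchAfterRewrite` predicate are hypothetical simplifications of the real search-phase types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class MissingShardsSketch {
    /** Hypothetical shard view: which index it belongs to and whether a copy is assigned. */
    record Shard(String index, int id, boolean assigned) {}

    static List<Shard> validateAfterRewrite(List<Shard> shards, Predicate<Shard> canMatchAfterRewrite) {
        // Step 1: apply the coordinator rewrite (e.g. an @timestamp range excludes old indices).
        List<Shard> toSearch = new ArrayList<>();
        for (Shard shard : shards) {
            if (canMatchAfterRewrite.test(shard)) {
                toSearch.add(shard);
            }
        }
        // Step 2: only shards that will actually be searched can fail the missing-shard check.
        for (Shard shard : toSearch) {
            if (!shard.assigned()) {
                throw new IllegalStateException("missing shard [" + shard.index() + "][" + shard.id() + "]");
            }
        }
        return toSearch;
    }
}
```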

(cherry picked from commit cd2433d60c)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2024-11-09 02:43:33 +11:00
Aurélien FOUCRET
347b7fe369
[8.x] Add kql query to the DSL (#116262) (#116482)
* Add kql query to the DSL (#116262)

(cherry picked from commit e2c29f5487)

# Conflicts:
#	server/src/main/java/org/elasticsearch/rest/action/search/SearchCapabilities.java

* Fix typo introduced during merge.
2024-11-09 01:50:49 +11:00
Nhat Nguyen
9497410147
Add num docs and size to logsdb telemetry (#116128) (#116270)
Follow-up on #115994 to add telemetry for the total number of documents
and size in bytes of logsdb indices.

Relates #115994
2024-11-07 14:18:33 -08:00
Dimitris Rempapis
bfefe8d789
Fields caps does not honour ignore_unavailable (#116021) (#116430)
The IndicesOption handling in FieldCapabilitiesRequest has been updated to encapsulate the following logic.

If we target a closed index and ignore_unavailable=false, we get an IndexClosedException; otherwise,
if the request contains ignore_unavailable=true, we safely skip the closed index.

(cherry picked from commit 3ae7921fb0)
2024-11-08 05:52:27 +11:00
Nhat Nguyen
c57f4526d4
Fallback to field-caps (#115977) (#116429)
This change falls back to the old field-caps action if the remote
cluster has not been updated to 8.16 or later.
2024-11-08 05:39:22 +11:00
Ignacio Vera
5b6387b7eb
Deduplicate the list of names when deserializing InternalTopMetrics (#116298) (#116417)
Use the deduplication infrastructure to deduplicate the names of metrics in InternalTopMetrics.
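The following is a minimal sketch of the deduplication idea. It assumes a map, scoped to one deserialization pass, that returns a canonical instance for each repeated name. `StringDeduplicator` is hypothetical and is not Elasticsearch's deduplication infrastructure:

```java
import java.util.HashMap;
import java.util.Map;

final class StringDeduplicator {
    private final Map<String, String> seen = new HashMap<>();

    /** Returns a canonical instance equal to {@code s}, so repeated names share one String object. */
    String deduplicate(String s) {
        String existing = seen.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
```

When many shard responses carry the same metric names, each deserialized copy then collapses to a single retained `String` instead of one per response.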
2024-11-08 03:18:25 +11:00
Nikolaj Volgushev
fd97a9b4d2
[8.x] Fix race conditions in file settings service tests (#116309) (#116402)
* Merge

* Fix merge
2024-11-08 03:14:35 +11:00
Iván Cea Fontenla
22c0eab6dc
Aggs: Add real memory CB call when building internal aggregators in buckets (#116329) (#116393)
Related with https://github.com/elastic/elasticsearch/issues/88128

This PR aims to reduce the potential OOMs encountered when building internal aggregations.
2024-11-07 22:29:23 +11:00
Matteo Piergiovanni
94498b4b41
[8.x] Better sizing BytesRef for Strings in Queries (#115655) (#116381)
* Better sizing BytesRef for Strings in Queries (#115655)

* Better sizing BytesRefs for Strings in Queries

* Update docs/changelog/115655.yaml

* iter

* added test

* iter

* extracted method

* iter

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit 9ebe95a8a8)

* iter
2024-11-07 11:56:17 +01:00
Pooya Salehi
69df7fbfe1
Long balance computation should not delay new index primary assignment (#115511) (#116316)
A long desired balance computation could delay a newly created index's shards from being assigned, since the computation has to finish before the assignments are published and the shards assigned. With this change we add a new setting that sets a maximum time for a computation when there are unassigned primary shards. Note that this is similar to how a new cluster state causes early publishing of the desired balance.

Closes ES-9616

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-07 11:48:46 +01:00
Dan Rubinstein
0ac7f65096
Adding inference endpoint validation for AzureAiStudioService (#113713) (#116347)
* Adding inference endpoint validation for AzureAiStudioService

* Run spotlessApple

* Update docs/changelog/113713.yaml

* Remove isInClusterService from InferenceService

* Run spotless apply

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-07 06:09:03 +11:00
Ignacio Vera
2a51685fba
Make InternalCentroid leaner (#116302) (#116334)
We are currently holding on to fields to extract values; this commit makes them abstract methods so
we don't use any heap.
2024-11-07 03:30:11 +11:00
Benjamin Trent
616b3908a0
[8.x] Add support for bitwise inner-product in painless (#116082) (#116285)
* Add support for bitwise inner-product in painless (#116082)

This adds bitwise inner product to painless. 

The idea here is:

 - For two bit arrays, which we determine to be a byte array whose dimensions match `dense_vector.dim/8`, we simply return bitwise `&`
 - For a stored bit array (remember, with `dense_vector.dim/8` bytes), sum up the provided byte or float array using the bit array as a mask.

This effectively supports asymmetric quantization. A prime
example of how this works is:
https://github.com/cohere-ai/BinaryVectorDB

Basically, you do your initial search against the binary space and then
rerank with a differently quantized vector allowing for more information
without additional storage space. 
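The two cases can be sketched as shown below. This assumes bits are packed most-significant-bit first within each byte, and that the bit-vs-bit result is the population count of the AND. `BitInnerProductSketch` and both method names are hypothetical, not the Painless API:

```java
final class BitInnerProductSketch {
    /** Bit vs bit: popcount of the AND of two packed arrays (dims/8 bytes each). */
    static int bitAndCount(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count += Integer.bitCount((a[i] & b[i]) & 0xFF);
        }
        return count;
    }

    /** Stored bits vs float query: sum query[i] wherever bit i of the stored vector is set. */
    static float maskedSum(byte[] bits, float[] query) {
        if (query.length != bits.length * 8) {
            throw new IllegalArgumentException("query must have dims == bits.length * 8");
        }
        float sum = 0f;
        for (int i = 0; i < query.length; i++) {
            if ((bits[i / 8] & (0x80 >>> (i % 8))) != 0) { // assumed MSB-first bit order
                sum += query[i];
            }
        }
        return sum;
    }
}
```

The binary pass (`bitAndCount`) is cheap enough for a first-stage search. The masked sum then reranks candidates using the full-precision query against the same stored bits.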

closes:  https://github.com/elastic/elasticsearch/issues/111232

* removing unnecessary task adjustment

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-11-07 00:35:19 +11:00
Tim Brooks
735e6355a9
Parse bulk lines in individual steps (#114086) (#116210)
Currently our incremental bulk parsing framework only parses once both
the action line and document line are available. In addition, it will
re-search lines for line delimiters as data is received. This commit
ensures that the state is not lost in between parse attempts.
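The following sketch assumes each complete line is handed off as its own step, and that only the unscanned tail is searched for `\n` when the next chunk arrives. `IncrementalLineParser` is hypothetical, and it ignores bulk actions such as `delete` that have no source line:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IncrementalLineParser {
    private byte[] unconsumed = new byte[0]; // tail of an incomplete line, kept between chunks
    private int scanned = 0;                 // prefix of `unconsumed` already known to contain no '\n'
    private boolean nextIsAction = true;     // assumes lines alternate: action, source, action, ...
    final List<String> parsed = new ArrayList<>();

    void onChunk(byte[] chunk) {
        byte[] data = Arrays.copyOf(unconsumed, unconsumed.length + chunk.length);
        System.arraycopy(chunk, 0, data, unconsumed.length, chunk.length);
        int lineStart = 0;
        for (int i = scanned; i < data.length; i++) { // resume where the last scan stopped
            if (data[i] == '\n') {
                String line = new String(data, lineStart, i - lineStart, StandardCharsets.UTF_8);
                parsed.add((nextIsAction ? "action " : "source ") + line); // each line is its own step
                nextIsAction = !nextIsAction;
                lineStart = i + 1;
            }
        }
        unconsumed = Arrays.copyOfRange(data, lineStart, data.length);
        scanned = unconsumed.length; // none of the leftover bytes contains a delimiter
    }
}
```

Because the action line is emitted as soon as its delimiter arrives, it does not have to wait for the matching source line. Because `scanned` is remembered, already-searched bytes are not searched again.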
2024-11-05 12:26:10 -07:00