749 commits

Author SHA1 Message Date
Jonathan Buttner
6bb79f2dac
[ML] Adding placeholder functionality for custom model request logic (#127271)
* Adding placeholder functionality for request logic

* Updating comments

* Fixing tests

* Adding missing licenses

* Fixing String.format
2025-04-24 10:03:38 -04:00
Jonathan Buttner
e840334f8b
Fixing issue (#127074) 2025-04-21 13:14:53 -04:00
Kathleen DeRusso
e280aa5d50
Revert semantic_text model registry changes (#127075) 2025-04-18 18:36:33 -04:00
Jonathan Buttner
3156cc7c0f
[ML] Implement JSONPath replacement for Inference API (#127036)
* Adding initial extractor

* Finishing tests

* Addressing feedback
2025-04-18 14:34:19 -04:00
Pat Whelan
d870f42c90
[ML] Allow InputType for Bedrock Titan (#127021)
Semantic Search can now send InputType as part of the request to
non-Cohere Bedrock models.

Fix #126709
2025-04-18 18:38:59 +02:00
Kathleen DeRusso
a72883e8e3
Default new semantic_text fields to use BBQ when models are compatible (#126629)
* Default new semantic_text fields to use BBQ when models are compatible

* Update docs/changelog/126629.yaml

* Gate default BBQ by IndexVersion

* Cleanup from PR feedback

* PR feedback

* Fix test

* Fix test

* PR feedback

* Update test to test correct options

* Hack alert: Fix issue where mapper service was always being created with current index version
2025-04-17 08:25:10 -04:00
Jonathan Buttner
7a0f63c1a0
[ML] Refactor inference request executor to leverage scheduled execution (#126858)
* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log
2025-04-16 14:14:02 -04:00
Jonathan Buttner
e42c118ec6
[ML] Adding missing onFailure call for Inference API start model request (#126930)
* Adding missing onFailure call

* Update docs/changelog/126930.yaml
2025-04-16 14:07:13 -04:00
Jim Ferenczi
c906cc005c
Expose model registry to SemanticTextFieldMapper (#126635)
This change integrates the new model registry with the `SemanticTextFieldMapper`, allowing inference IDs to be eagerly resolved at parse time.
It also preserves the existing lenient behavior: no error is thrown if the specified inference id does not exist, only a warning is logged.
2025-04-15 09:45:23 +02:00
Jim Ferenczi
46c3657255
Fix and unmute SemanticInferenceMetadataFieldsRecoveryTests (#126784)
Use the TranslogOperationAsserter to compare the raw operations.

Closes #124383
Closes #124384
Closes #124385
2025-04-15 08:36:20 +02:00
Dan Rubinstein
b917d9a1e0
Revert endpoint creation validation for ELSER and E5 (#126792)
* Revert endpoint creation validation for ELSER and E5

* Update docs/changelog/126792.yaml

* Revert start model deployment being in TransportPutInferenceModelAction

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-04-14 17:00:33 -04:00
Mike Pellegrini
85713f78e0
Semantic Text Chunking Indexing Pressure (#125517)
We have observed many OOMs due to the memory required to inject chunked inference results for semantic_text fields. This PR uses coordinating indexing pressure to account for this memory usage. When indexing pressure memory usage exceeds the threshold set by indexing_pressure.memory.limit, chunked inference result injection will be suspended to prevent OOMs.
2025-04-14 15:55:37 -04:00
Jonathan Buttner
31bb3d1619
[ML] Refactoring inference API non-streaming response validation error object check (#126725)
* Refactoring so that non-streaming does not check for error object

* Fixing test
2025-04-14 10:42:40 -04:00
Jonathan Buttner
39e594f9b9
[ML] Exposing OpenAI URL field in services API (#126638)
* Adding url configuration field

* Fixing test
2025-04-11 08:26:35 -04:00
Armin Braun
dd1db5031e
Move calls to FeatureFlag.enabled to class-load time (#125885)
I noticed that we tend to create the flag instance and call this method
everywhere. This doesn't compile the same way as a real boolean constant
unless you're running with `-XX:+TrustFinalNonStaticFields`.
For most of the code spots changed here that's irrelevant but at least
the usage in the mapper parsing code is a little hot and gets a small
speedup from this potentially.
Also we're simply wasting some bytes for the static footprint of ES by
using the `FeatureFlag` indirection instead of just a boolean.
2025-04-11 01:46:28 +02:00
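The idea in the commit above (evaluate the flag once at class-load time and keep the result in a plain boolean) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchFeatureFlag`, `MapperSketch`, and the `sketch.feature.*` system property are hypothetical names invented here, not the Elasticsearch `FeatureFlag` API or the code changed in the PR.

```java
// Hypothetical stand-in for a feature-flag helper; this is NOT Elasticsearch's FeatureFlag class.
final class SketchFeatureFlag {
    // A final instance field: the JIT does not generally treat this as a constant.
    private final boolean enabled;

    SketchFeatureFlag(String name) {
        // Assumption for this sketch: the flag is driven by a JVM system property.
        this.enabled = Boolean.parseBoolean(System.getProperty("sketch.feature." + name, "false"));
    }

    boolean isEnabled() {
        return enabled;
    }
}

// Hypothetical caller standing in for a hot code path such as mapper parsing.
final class MapperSketch {
    // Evaluated exactly once, when MapperSketch is initialized. The flag object
    // is not retained; only the boolean result is kept in a static final field.
    private static final boolean NEW_PATH_ENABLED = new SketchFeatureFlag("new_path").isEnabled();

    private MapperSketch() {}

    static String parse(String value) {
        // Branches on a static final boolean instead of calling through the flag object.
        return NEW_PATH_ENABLED ? "new:" + value : "old:" + value;
    }
}
```

Because the value is captured during class initialization, changing the system property afterwards has no effect on `MapperSketch.parse`; that is the trade-off for avoiding the extra object and the per-call indirection.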
Mike Pellegrini
9956bedb52
Update TestSparseInferenceServiceExtension to not support text embeddings (#126618) 2025-04-10 14:08:17 -04:00
Pat Whelan
6c6500ec3b
[ML] Bedrock Cohere Task Settings Support (#126493)
Add support for Cohere Task Settings and Truncate, through
the Amazon Bedrock provider integration.

Task Settings can now be passed both during Inference endpoint
creation and Inference POST requests.

Close #126156
2025-04-09 21:34:05 +02:00
Dan Rubinstein
44507cce04
Fix ELAND endpoints not updating dimensions (#126537)
* Fix ELAND endpoints not updating dimensions

* Update docs/changelog/126537.yaml
2025-04-09 12:16:06 -04:00
David Kyle
d2be03c946
[ML] Move Inference Service request and response classes into service package (#126482) 2025-04-08 23:26:25 +02:00
Patrick Doyle
728eb7504f
Fix inference plugin name in entitlements warning suppression (#126470) 2025-04-08 13:58:01 -04:00
Dan Rubinstein
20f6a2a76b
Adding endpoint creation validation to ElasticsearchInternalService (#123044)
* Adding validation to ElasticsearchInternalService

* Update docs/changelog/123044.yaml

* [CI] Auto commit changes from spotless

* Removing checkModelConfig

* Fixing IT

* [CI] Auto commit changes from spotless

* Remove DeepSeek checkModelConfig and fix tests

* Cleaning up comments, updating validation input type, and moving model deployment starting to model validator

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-04-08 09:53:31 -04:00
Ryan Ernst
991e80d56e
Remove unnecessary generic params from action classes (#126364)
Transport actions have associated request and response classes. However,
the base type restrictions are not necessary to duplicate when creating
a map of transport actions. Relatedly, the ActionHandler class doesn't
actually need strongly typed action type and classes since they are lost
when shoved into the node client map. This commit removes these type
restrictions and generic parameters.
2025-04-07 16:22:56 -07:00
David Kyle
20eb59080c
[ML] Move request managers into service package (#126114) 2025-04-07 16:01:49 +02:00
Mike Pellegrini
72066ea49f
Update InferenceException to retain top-level message (#126345) 2025-04-07 09:05:00 -04:00
Jan Kuipers
1927c6e91d
Fix deploying custom models with adaptive allocations (#126276) 2025-04-05 01:50:06 +11:00
Aurélien FOUCRET
a4a271415d
Adding ES|QL RERANK command in snapshot builds (#123074) 2025-04-04 15:39:18 +01:00
Kathleen DeRusso
3575bdba08
Ensure sentence overlap is considered in SentenceBoundaryChunkingSettings equals/hashCode (#126250) 2025-04-04 09:29:29 -04:00
Kathleen DeRusso
e7d4a28a87
Support configurable chunking in semantic_text fields (#121041)
* test

* Revert "test"

This reverts commit 9f4e2adba0.

* Refactor InferenceService to allow passing in chunking settings

* Add chunking config to inference field metadata and store in semantic_text field

* Fix test compilation errors

* Hacking around trying to get ingest to work

* Debugging

* [CI] Auto commit changes from spotless

* POC works and update TODO to fix this

* [CI] Auto commit changes from spotless

* Refactor chunking settings from model settings to field inference request

* A bit of cleanup

* Revert a bunch of changes to try to narrow down what broke CI

* test

* Revert "test"

This reverts commit 9f4e2adba0.

* Fix InferenceFieldMetadataTest

* [CI] Auto commit changes from spotless

* Add chunking settings back in

* Update builder to use new map

* Fix compilation errors after merge

* Debugging tests

* debugging

* Cleanup

* Add yaml test

* Update tests

* Add chunking to test inference service

* Trying to get tests to work

* Shard bulk inference test never specifies chunking settings

* Fix test

* Always process batches in order

* Fix chunking in test inference service and yaml tests

* [CI] Auto commit changes from spotless

* Refactor - remove convenience method with default chunking settings

* Fix ShardBulkInferenceActionFilterTests

* Fix ElasticsearchInternalServiceTests

* Fix SemanticTextFieldMapperTests

* [CI] Auto commit changes from spotless

* Fix test data to fit within bounds

* Add additional yaml test cases

* Playing with xcontent parsing

* A little cleanup

* Update docs/changelog/121041.yaml

* Fix failures introduced by merge

* [CI] Auto commit changes from spotless

* Address PR feedback

* [CI] Auto commit changes from spotless

* Fix predicate in updated test

* Better handling of null/empty ChunkingSettings

* Update parsing settings

* Fix errors post merge

* PR feedback

* [CI] Auto commit changes from spotless

* PR feedback and fix Xcontent parsing for SemanticTextField

* Remove chunking settings check to use what's passed in from sender service

* Fix some tests

* Cleanup

* Test failure whack-a-mole

* Cleanup

* Refactor to handle memory optimized bulk shard inference actions - this is ugly but at least it compiles

* [CI] Auto commit changes from spotless

* Minor cleanup

* A bit more cleanup

* Spotless

* Revert change

* Update chunking setting update logic

* Go back to serializing maps

* Revert change to model settings - source still errors on missing model_id

* Fix updating chunking settings

* Look up model if null

* Fix test

* Work around https://github.com/elastic/elasticsearch/issues/125723 in semantic text field serialization

* Add BWC tests

* Add chunking_settings to docs

* Refactor/rename

* Address minor PR feedback

* Add test case for null update

* PR feedback - adjust refactor of chunked inputs

* Refactored AbstractTestInferenceService to return offsets instead of just Strings

* [CI] Auto commit changes from spotless

* Fix tests where chunk output was of size 3

* Update mappings per PR feedback

* PR Feedback

* Fix problems related to merge

* PR optimization

* Fix test

* Delete extra file

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-03 17:45:26 -04:00
Tim Grein
00c051cc38
Only add product use case header if not present (#126212)
Co-authored-by: Brendan Jugan <brendan.jugan@elastic.co>
2025-04-03 17:05:22 -04:00
Pat Whelan
69180eafe1
[ML] Refactor SSE Parsing (#125959)
ServerSentEvent is now a record with `event` and `data`, rather than
it being a record for value with a separate `ServerSentEventField`.

- `value` was renamed to `data`
- `hasValue` was renamed to `hasData`
- Parsing was refactored to read more like its spec: writing to a buffer
  and flushing when we reach a blank newline
- We now support multiline data payloads
2025-04-03 09:42:49 -04:00
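The buffer-and-flush parsing described in the commit above can be sketched as follows. This is a minimal sketch, not the code from the PR: only the record components `event` and `data` and the `hasData` name come from the commit message, while `SseSketch`, `parse`, and the choice to parse a whole payload string at once are assumptions made for illustration. The field handling follows the general Server-Sent Events rules (comment lines start with `:`, one leading space after the colon is dropped, multiple `data` lines are joined with newlines).

```java
import java.util.ArrayList;
import java.util.List;

// Only the component names `event` and `data` are taken from the commit message.
record ServerSentEvent(String event, String data) {
    boolean hasData() {
        return data != null && data.isEmpty() == false;
    }
}

// Hypothetical helper for illustration; not the parser from the PR.
final class SseSketch {
    private SseSketch() {}

    // Parses a complete SSE payload. Field values accumulate in buffers and
    // an event is flushed whenever a blank line is reached.
    static List<ServerSentEvent> parse(String payload) {
        List<ServerSentEvent> events = new ArrayList<>();
        String event = "message"; // default event type when no event field is sent
        StringBuilder data = new StringBuilder();
        boolean sawData = false;

        for (String line : payload.split("\r\n|\n|\r", -1)) {
            if (line.isEmpty()) {
                // Blank line: flush the buffered event, if it carried any data field.
                if (sawData) {
                    events.add(new ServerSentEvent(event, data.toString()));
                }
                event = "message";
                data.setLength(0);
                sawData = false;
            } else if (line.startsWith(":")) {
                // Comment line (often used as a keep-alive); ignored.
            } else {
                int colon = line.indexOf(':');
                String field = colon < 0 ? line : line.substring(0, colon);
                String value = colon < 0 ? "" : line.substring(colon + 1);
                if (value.startsWith(" ")) {
                    value = value.substring(1); // drop one optional leading space
                }
                if (field.equals("event")) {
                    event = value;
                } else if (field.equals("data")) {
                    if (sawData) {
                        data.append('\n'); // multiline payloads are joined with newlines
                    }
                    data.append(value);
                    sawData = true;
                }
                // Other fields (id, retry, unknown names) are skipped in this sketch.
            }
        }
        return events; // an unterminated trailing event is discarded
    }
}
```

For example, parsing `"event: delta\ndata: line one\ndata: line two\n\n"` with this sketch yields a single event whose `event` is `delta` and whose `data` contains both lines separated by a newline.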
Joe Gallo
1306c6ba01
Bump spotless (#126125) 2025-04-02 13:06:42 -04:00
Samiul Monir
d8ae61e91f
[Semantic Text] Integration Test (#125141)
* Initial draft test with index version setup

* Adding test in phases

* [CI] Auto commit changes from spotless

* Adding test for search functionality

* Adding test for highlighting

* Adding randomization during selection process

* Fix code styles by running spotlessApply

* Fix code styles by running spotlessApply

* Fixing forbiddenAPIcall issue

* Decoupled namedWritables to use separate fake plugin and simplified other override methods

* Updating settings string to variable and removed unused code

* Fix SemanticQueryBuilder dependencies

* fix setting maximum number of tests to run

* utilizing semantic_text index version param and removed unwanted override

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-04-01 09:42:50 -06:00
David Kyle
c521264815
[ML] Delay copying chunked input strings (#125837)
The chunked text is only required when the actual inference request is made.
Using a string supplier means the string can be created much closer to where
the request is made, reducing the lifespan of the copied string.
2025-04-01 14:53:54 +01:00
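The supplier approach described in the commit above can be sketched as follows. This is an illustrative sketch only: `LazyChunks`, `chunk`, and the fixed-size chunking are hypothetical and are not taken from the PR. The point it shows is that each chunk holds only offsets into the original input, and the substring copy is made when `get()` is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not the chunking code from the PR.
final class LazyChunks {
    private LazyChunks() {}

    // Splits the input into fixed-size chunks without copying any text up front.
    // Each supplier captures the shared input plus two offsets.
    static List<Supplier<String>> chunk(String input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Supplier<String>> chunks = new ArrayList<>();
        for (int start = 0; start < input.length(); start += chunkSize) {
            final int from = start;
            final int to = Math.min(input.length(), start + chunkSize);
            // The substring copy is created only when the supplier is invoked,
            // for example right before the inference request is built.
            chunks.add(() -> input.substring(from, to));
        }
        return chunks;
    }
}
```

A caller would keep the list of suppliers while batching and call `get()` on each one only when building the request, so the copied strings are short-lived.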
David Kyle
d3a1b21a59
[ML] Move Inference Service account classes into the service package (#125928) 2025-04-01 14:38:24 +01:00
David Kyle
fc933d436a
[ML] Remove InferenceServiceResults#transformToLegacyFormat (#125924) 2025-04-01 13:49:48 +01:00
Jim Ferenczi
55827946a4
[ML] Fix ModelRegistryMetadataTests intermittent failures (#125883)
unmute test
2025-03-31 10:44:35 +01:00
David Kyle
9f4db73435
[ML] Move Inference Service Action classes into the service package (#125567) 2025-03-28 17:56:44 +01:00
David Kyle
67798dd25b
[ML] Fix ModelRegistryMetadataTests (#125756) 2025-03-28 13:40:53 +02:00
Ying Mao
a6f685cc2a
Adding common rerank options to Perform Inference API (#125239)
* wip

* Adding rerank common options

* Linting

* Linting

* [CI] Auto commit changes from spotless

* Update docs/changelog/125239.yaml

* PR feedback

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-03-25 12:32:18 -04:00
Mike Pellegrini
019ce3fb81
Fix Semantic Text 8.x Upgrade Bug (#125446) 2025-03-24 09:15:54 -04:00
Jim Ferenczi
0930a75642
Prevent default inference model to update the cluster state when deleting (#125369)
The Elastic inference service removes the default models at startup if the node cannot access EIS.
Since #125242 we don't store default models in the cluster state but we still try to delete them.
This change ensures that we don't try to update the cluster state when a default model is deleted
since the delete is not performed on the master node and default models are never stored in the cluster state.
2025-03-21 17:27:26 +00:00
Pat Whelan
b8db0eee96
[ML] Refactor stream metrics (#125092)
Remove the use of DelegatingProcessor and replace it with an inline
processor.
2025-03-21 18:51:08 +02:00
Pat Whelan
76260267b0
[ML] Move and rename AmazonBedrockSecretSettings (#125323)
In preparation for integrating with SageMaker, we want to reuse the
existing SecretSettings.

- AmazonBedrockSecretSettings moved from services.amazonbedrock to
  common.amazon.
- AmazonBedrockSecretSettings was renamed to AwsSecretSettings.
- accessKey and secretKey are now encapsulated.
2025-03-21 16:00:58 +02:00
David Kyle
e0d4599dad
[ML] Add logging for ModelRegistry cluster state update failure (#125401) 2025-03-21 15:46:03 +02:00
Jim Ferenczi
a26bbe25bf
Set default similarity for Cohere model to cosine (#125370)
Cohere embeddings are expected to be normalized to unit vectors, but due to floating point precision issues,
our check ({@link DenseVectorFieldMapper#isNotUnitVector(float)}) often fails.
This change fixes this bug by setting the default similarity for newly created Cohere inference endpoints to cosine.

Closes #122878
2025-03-21 12:23:26 +00:00
Jim Ferenczi
0ff526aa84
BWC Handling for ModelRegistryMetadata (#125301)
ModelRegistryMetadata has now been backported to 8.19 via #125150. This update ensures that we properly differentiate between nodes running 8.19.x (which supports the new custom metadata) and 9.0.x (which does not).
To achieve this, this PR introduces a new `supportsVersion(TransportVersion)` method for `NamedWriteable` and `NamedDiff`, allowing subclasses to customize their backward compatibility behavior.
2025-03-21 09:28:11 +00:00
Jonathan Buttner
2fd70d9d69
[ML] Refactoring inference API throttled logging (#124923)
* Refactoring throttled logging

* Renaming setting

* Resetting log counter when attempt to log for repeated messages

* Clean up

* Refactoring locking

* Comments

* Working tests

* Addressing logbuilder issue

* Fixing tests
2025-03-20 14:22:24 -04:00
Dan Rubinstein
52bc96240c
Fix AlibabaCloudSearchCompletionAction not accepting ChatCompletionInputs (#125023)
* Fix AlibabaCloudSearchCompletionAction not accepting ChatCompletionInputs

* Update docs/changelog/125023.yaml

* Fix unit tests
2025-03-20 13:07:58 -04:00
Tim Grein
177da5f270
[Inference API] Move elastic provider service & task settings under their own package similar to authorization and completion. (#125197) 2025-03-20 13:13:29 +01:00
Jim Ferenczi
2f1c8577f9
Exclude Default Inference Endpoints from Cluster State Storage (#125242)
When retrieving a default inference endpoint for the first time, the system automatically creates the endpoint.
However, unlike the `put inference model` action, the `get` action does not redirect the request to the master node.

Since #121106, we rely on the assumption that every model creation (`put model`) must run on the master node, as it modifies the cluster state. However, this assumption led to a bug where the get action tries to store default inference endpoints from a different node.

This change resolves the issue by preventing default inference endpoints from being added to the cluster state. These endpoints are not strictly needed there, as they are already reported by inference services upon startup.

**Note:** This bug did not prevent the default endpoints from being used, but it caused repeated attempts to store them in the index, resulting in logging errors on every usage.
2025-03-19 20:19:08 +00:00