16605 commits

Author SHA1 Message Date
Przemysław Witek
c212abddeb
[Transform] Fix _reset API when called with force=true on a failed transform (#106574) (#106589) 2024-03-21 06:42:13 -04:00
Mark Vieira
d1c98f39fd
Validate that test cluster BWC nodes use the default distribution (#106559) (#106565)
We have instances where BWC tests configure old ES version nodes with
the integTest distribution. This isn't a valid configuration, and while
we in reality resolve the default distribution artifact, we have other
configuration logic that behaves differently based on whether the
integTest distro was _requested_. Specifically, what to set ES_JAVA_HOME
to. This bug resulted in us attempting to run old nodes using the
current bundled JDK version, which may be incompatible with that older
version of Elasticsearch.

Closes #104858
2024-03-20 12:55:47 -04:00
Aurélien FOUCRET
e29365583d
Ensure ILM policy is installed before starting the tests. (#106523) (#106545) 2024-03-20 09:19:11 -04:00
Nhat Nguyen
76aee0b313
Fix testCancelRequestWhenFailingFetchingPages (#106447) (#106515)
If we proceed without waiting for pages, we might cancel the main 
request before starting the data-node request. As a result, the exchange
sinks on data-nodes won't be removed until the inactive_timeout elapses,
which is longer than the assertBusy timeout.

Closes #106443
2024-03-19 18:54:03 -04:00
Nhat Nguyen
9a8864c21e
AwaitsFix #106443 (#106453) 2024-03-18 21:59:24 -04:00
Nhat Nguyen
aa2e022b31
Resume driver when failing to fetch pages (#106392) (#106436)
I investigated a heap attack test failure and found that an ESQL request
was stuck. This occurred in the following sequence:

1. The ExchangeSource on the coordinator was blocked on reading because
there were no available pages.

2. Meanwhile, the ExchangeSink on the data node had pages ready for
fetching.

3. When an exchange request tried to fetch pages, it failed due to a
CircuitBreakingException. Despite the failure, no cancellation was
triggered because the status of the ExchangeSource on the coordinator
remained unchanged.

To fix this issue, this PR introduces two changes:

* Resumes the ExchangeSourceOperator and Driver on the coordinator,
eventually allowing the coordinator to trigger cancellation of the
request when failing to fetch pages.

* Ensures that an exchange sink on the data nodes fails when a data node
request is cancelled. This callback was inadvertently omitted when
introducing the node-level reduction in "Run empty reduction node level
on data nodes" (#106204).

I plan to spend some time to harden the exchange and compute service.

Closes #106262
2024-03-18 12:04:22 -07:00
Craig Taverner
5dd26f762d
Make new spatial sort tests less flaky (#106401) (#106405)
The tests asserting that sorting on spatial types produces consistent error messages were also flaky in the non-error cases, in rare circumstances where the results were returned in a different order. We now sort those on a sortable field for deterministic behaviour.
2024-03-18 07:47:33 -04:00
Athena Brown
97d4a86427
Adjust interception of requests for specific shard IDs (#101656) (#106376)
Some index requests target shard IDs specifically, and those shard IDs may not match the indices that the request targets as given by `IndicesRequest#indices()`. This requires a different interception strategy to make sure those requests are handled correctly in all cases and that any malformed messages are caught early to aid in troubleshooting.

This PR adds an interface allowing requests to report the shard IDs they target as well as the index names, and adjusts the interception of those requests as appropriate to handle those shard IDs in the cases where they are relevant.
2024-03-14 19:52:33 -04:00
Craig Taverner
45576fc0b4
ESQL: Fix error on sorting unsortable geo_point and cartesian_point (#106351) (#106379)
* Fix error on sorting unsortable geo_point and cartesian_point

Without a LIMIT the correct error was reported, but with LIMIT it was not. This fix produces the same error with LIMIT and adds tests for all three scenarios:
* Without limit
* With limit
* From row with limit

* Update docs/changelog/106351.yaml

* Add tests for geo_shape and cartesian_shape also

* Updated changelog

* Separate point and shape error messages

* Move error to later so we get it only if geo field is actually used in sort.

* Implemented planner check in Verifier instead

This is a much better solution.

* Revert previous solution

* Also check non-field attributes so the same error is provided for ROW

* Changed "can't" to "cannot"

* Add unit tests for verifier error

* Added sort limitations to documentation

* Added unit tests for spatial fields in VerifierTests

* Don't run the new yaml tests on older versions

These tests mostly test the validation errors which were changed only in 8.14.0, so should not be tested in earlier versions.

* Simplify check based on code review, skip duplicate forEachDown
2024-03-14 19:08:05 -04:00
Kathleen DeRusso
b835827311
Fix Search Applications bug where deleting an alias before deleting an application intermittently caused errors (#106329) (#106354)
* Update delete object to never fail if alias does not exist

* Update docs/changelog/106329.yaml

* Update changelog

* Fix area in changelog
2024-03-14 09:18:16 -04:00
Lloyd
44e8f72b47
[IdP plugin] Fix exception handling (#106231) (#106336)
* Add regression tests that test ACS and entity id mismatch, causing
  us to go into the initCause branch

* Fix up exception creation: initCause is not
  allowed because ElasticsearchException
  initialises the cause to `null` already if
  it isn't passed as a constructor param.
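
The underlying JDK behaviour can be illustrated with a minimal sketch. `DemoException` is a hypothetical stand-in (it is not `ElasticsearchException`); it assumes only that the exception always forwards a cause, possibly `null`, to the `Throwable(String, Throwable)` constructor, after which `initCause` throws `IllegalStateException`.

```java
public class InitCauseDemo {
    /** Hypothetical exception that always passes a cause (possibly null) to its superclass. */
    static class DemoException extends RuntimeException {
        DemoException(String message) {
            // The cause is now initialised to null, so initCause may no longer be called.
            super(message, (Throwable) null);
        }

        DemoException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Returns true if attaching a cause after construction is rejected by the JDK. */
    static boolean initCauseRejected() {
        try {
            new DemoException("boom").initCause(new IllegalArgumentException("root"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Supplying the cause through the constructor works as expected. */
    static Throwable causeViaConstructor() {
        return new DemoException("boom", new IllegalArgumentException("root")).getCause();
    }

    public static void main(String[] args) {
        System.out.println("initCause rejected: " + initCauseRejected());
        System.out.println("cause via constructor: " + causeViaConstructor().getMessage());
    }
}
```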

Signed-off-by: lloydmeta <lloydmeta@gmail.com>
2024-03-13 20:25:30 -04:00
Youhei Sakurai
5198e7f041
Handling exceptions on watcher reload (#105442) (#106210) 2024-03-13 15:46:24 -05:00
Jonathan Buttner
74dfe58f11
Allowing byte and int8 (#106299) (#106326)
(cherry picked from commit de33a57f55)

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/cohere/CohereService.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/cohere/CohereServiceSettings.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/openai/OpenAiService.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/openai/embeddings/OpenAiEmbeddingsServiceSettings.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/cohere/CohereServiceSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/cohere/embeddings/CohereEmbeddingsServiceSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/openai/embeddings/OpenAiEmbeddingsServiceSettingsTests.java

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-03-13 16:05:57 -04:00
Kostas Krikellas
be29f7fe3b
[TEST] Increase timeout for rollover to exceed look_ahead_time (#106290) (#106291)
`look_ahead_time` is set to 1 minute, so the `assertBusy` loop needs to
wait for longer than that to get a readonly backing index.

Note that this is only relevant when the `UpdateTimeSeriesRangeService`
kicks in to bump the end time of the head index. This is rare (it runs
every 10 minutes) but can happen.

Fixes #101428
2024-03-13 11:20:25 -04:00
Rene Groeschke
aefa784360
Cleanup SamlAuthenticationIT (#106227) (#106287)
Remove comments about awaitsFix
2024-03-13 09:32:55 -04:00
Rene Groeschke
63e4917775
Add Saml test connection timeout debugging output (#104801) (#106226)
Add additional logging to idp test fixture container

(cherry picked from commit 46beceb180)
2024-03-13 13:07:11 +01:00
Kostas Krikellas
3d7122996b
backport pr-106225 (#106278) 2024-03-13 04:27:45 -04:00
David Kyle
f30f6adf98
[ML] Make task settings optional when creating Cohere embedding models (#106241) (#106258)
# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/cohere/CohereServiceTests.java
2024-03-12 14:33:44 -04:00
Aurélien FOUCRET
ad161a1586
Ensure LTR models are cached when used as a rescorer. (#106161) 2024-03-11 14:59:50 +01:00
Mark Vieira
7dcdba029f
AwaitsFix #105577 2024-03-08 11:45:52 -08:00
Luigi Dell'Aquila
724d119ef9
ES|QL: Disable optimizations that rely on Expression.nullable() (#105691) (#106132) 2024-03-08 12:30:22 -05:00
Jan Kuipers
4757b89750
During ML maintenance, reset jobs in the reset state without a corresponding task. (#106062) (#106125)
* During ML maintenance, reset jobs in the reset state without a corresponding task.

* Update docs/changelog/106062.yaml

* Fix race condition in MlDailyMaintenanceServiceTests

* Fix log level
2024-03-08 10:24:23 -05:00
Tim Vernum
321c4e1e6b
Respect --pass option in certutil csr mode (#106105) (#106120)
elasticsearch-certutil csr generates a private key and a certificate
signing request (CSR) file. It has always accepted the "--pass" command
line option, but ignored it and always generated an unencrypted private
key.

This commit fixes the utility so the --pass option is respected and the
private key is encrypted.
2024-03-08 08:20:03 -05:00
Jedr Blaszyk
a4cb440b88
[Connector API] Fix serialisation of script params in connector index service (#106060) (#106072) 2024-03-07 09:08:22 -05:00
Aurélien FOUCRET
05fe2849b8
Fix ILM to DSL migration test for BA. (#106054) (#106064) 2024-03-07 13:58:51 +01:00
David Turner
426201671b
Avoid computing currentInferenceProcessors on every cluster state (#106057)
This computation involves parsing all the pipeline metadata on the
cluster applier thread. It's pretty expensive if there are lots of
pipelines, and seems mostly unnecessary because it's only needed for a
validation check when creating new processors.
2024-03-07 12:41:33 +00:00
Jan Kuipers
5d830b08dc
Backport 106020 (#106058)
* Reset job if existing reset fails (#106020)

* Try again to reset a job if waiting for completion of an existing reset task fails.

* Update docs/changelog/106020.yaml

* Update 106020.yaml

* Update docs/changelog/106020.yaml

* Improve code

* Trigger rebuild
2024-03-07 07:32:46 -05:00
Benjamin Trent
64ee22fb87
Test mute for #105485 (#106028) 2024-03-06 11:21:22 -05:00
David Roberts
e571d609ea
[ML] Fix categorize_text aggregation nested under empty buckets (#105987) (#106012)
Previously the `categorize_text` aggregation could throw an
exception if nested as a sub-aggregation of another aggregation
that produced empty buckets at the end of its results. This
change avoids this possibility.

Fixes #105836
2024-03-06 05:54:55 -05:00
Andrei Dan
88f2881a1c
Make sure we test the listener is called (#105914) (#106010) 2024-03-06 05:16:51 -05:00
Jedr Blaszyk
a91f0020b2
[Connector API] Fix default ordering in SyncJob list endpoint (#105945) (#106009) 2024-03-06 04:26:57 -05:00
Nik Everett
86719a3a38
ESQL: fix single valued query tests (backport of #105986) (#105995)
* ESQL: fix single valued query tests (#105986)

In some cases the tests for our Lucene query that makes sure a field is
single-valued were asserting incorrect things about the stats that come
from the query. That was failing the test from time to time. This fixes
the assertion in those cases.

Closes #105918

* ESQL: Reenable svq tests

We fixed the test failure in #105986 but this snuck in.

Closes #105952
2024-03-05 16:41:32 -05:00
Benjamin Trent
41ec34a968
Test mute for #105952 (#105954)
test mute for https://github.com/elastic/elasticsearch/issues/105952
2024-03-05 07:57:46 -05:00
Benjamin Trent
bc57a519b7
Manually backport changes from #105578 (#105913) 2024-03-05 06:33:50 -05:00
Benjamin Trent
352850bba8
Test mute for #105918 (#105920)
mute for: https://github.com/elastic/elasticsearch/issues/105918
2024-03-04 11:44:33 -05:00
Andrei Dan
2103adc40b
[ILM] Delete step deletes data stream with only one index (#105772) (#105897)
We seem to have a couple of checks to make sure we delete the data
stream when the last index reaches the delete step. However, these
checks seem a bit contradictory.

Namely, the first check makes use of `Index` equality (UUID included)
and the second just checks the index name. So if a data stream with just
one index (the write index) is restored from snapshot (different UUID),
we fail the first index equality check, go through the
second check `dataStream.getWriteIndex().getName().equals(indexName)`,
and fail the delete step (in a non-retryable way :( ) because we don't
want to delete the write index of a data stream (but we really do if the
data stream has only one index).

This PR makes 2 changes:
1. Use the index name equality everywhere in the step (we already looked
   up the index abstraction and the parent data stream, so we know for
   sure the managed index is part of the data stream).
2. Do not throw an exception when we got here via a write index that is
   NOT the last index in the data stream, but report the exception so we
   keep retrying this step (i.e. this enables our users to simply execute
   a manual rollover and the index is deleted by ILM eventually on
   retry).
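
The difference between the two checks can be sketched as follows. This is a simplified illustration, not the actual ILM code: the `Index` record, the index name and the UUID strings below are hypothetical, and the sketch assumes only that full equality compares name and UUID while the relaxed check compares names (records require Java 16+).

```java
public class IndexEqualityDemo {
    /** Hypothetical stand-in for an index identity: a name plus a UUID. */
    record Index(String name, String uuid) {}

    /** Strict check: record equality compares both the name and the UUID. */
    static boolean sameIndex(Index writeIndex, Index managedIndex) {
        return writeIndex.equals(managedIndex);
    }

    /** Relaxed check: only the index name has to match. */
    static boolean sameName(Index writeIndex, String managedIndexName) {
        return writeIndex.name().equals(managedIndexName);
    }

    public static void main(String[] args) {
        // Same index name, but the UUID differs (as assumed after a restore from snapshot).
        Index first = new Index(".ds-logs-000001", "uuid-before-restore");
        Index second = new Index(".ds-logs-000001", "uuid-after-restore");

        System.out.println("strict equality: " + sameIndex(second, first));
        System.out.println("name equality:   " + sameName(second, first.name()));
    }
}
```

With mixed checks, the same index can pass one comparison and fail the other, which is the contradiction described above. Using name equality throughout removes that mismatch.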
2024-03-04 06:54:19 -05:00
Nhat Nguyen
9a474ab282
ProjectOperator should not retain references to released blocks (#105848) (#105883)
The heap attack tests hit OOM where the circuit breaker was
under-accounted. This was because the ProjectOperator retained
references to released blocks. Consequently, the released blocks couldn't
be GCed even though we had already decreased the memory usage tracked by
the circuit breaker.

Relates #10563
2024-03-02 21:42:04 -05:00
Benjamin Trent
9457972d3f
Muting test for issue #105794 (#105795)
related to: https://github.com/elastic/elasticsearch/issues/105794
2024-02-26 17:20:58 +01:00
Ryan Ernst
3695aa113e
Mute EsqlActionBreakerIT
see https://github.com/elastic/elasticsearch/issues/105543
2024-02-26 16:48:55 +01:00
Alexander Spies
a7ef700476
[8.13] ESQL: Fix wrong attribute shadowing in pushdown rules (#105650) (#105808)
* ESQL: Fix wrong attribute shadowing in pushdown rules (#105650)

Fix https://github.com/elastic/elasticsearch/issues/105434

Fixes accidental shadowing when pushing down `GROK`/`DISSECT`, `EVAL` or
`ENRICH` past a `SORT`.

Example for how this works:

```
...
| SORT x
| EVAL x = y
...
```

Pushing this down just like that would be incorrect as `x` is used in the `SORT`, so we turn this essentially into:

```
...
| EVAL $$x = x
| EVAL x = y
| SORT $$x
| DROP $$x
...
```

The same logic is applied to `GROK`/`DISSECT` and `ENRICH`.
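
To see why the temporary attribute is needed, here is a small self-contained simulation. It is only a sketch: rows are modelled as maps, and the helpers `sort`, `eval` and `drop` are hypothetical stand-ins for the ESQL commands, not the optimizer's code. It compares the original plan, a naive pushdown, and the rewritten plan that uses `$$x`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PushdownShadowingDemo {
    /** Deep-copies the rows so every step works on its own data. */
    static List<Map<String, Integer>> copy(List<Map<String, Integer>> rows) {
        List<Map<String, Integer>> out = new ArrayList<>();
        for (Map<String, Integer> row : rows) {
            out.add(new HashMap<>(row));
        }
        return out;
    }

    /** SORT field */
    static List<Map<String, Integer>> sort(List<Map<String, Integer>> rows, String field) {
        List<Map<String, Integer>> out = copy(rows);
        out.sort(Comparator.comparingInt(row -> row.get(field)));
        return out;
    }

    /** EVAL target = source */
    static List<Map<String, Integer>> eval(List<Map<String, Integer>> rows, String target, String source) {
        List<Map<String, Integer>> out = copy(rows);
        for (Map<String, Integer> row : out) {
            row.put(target, row.get(source));
        }
        return out;
    }

    /** DROP field */
    static List<Map<String, Integer>> drop(List<Map<String, Integer>> rows, String field) {
        List<Map<String, Integer>> out = copy(rows);
        for (Map<String, Integer> row : out) {
            row.remove(field);
        }
        return out;
    }

    /** SORT x | EVAL x = y */
    static List<Map<String, Integer>> original(List<Map<String, Integer>> rows) {
        return eval(sort(rows, "x"), "x", "y");
    }

    /** EVAL x = y | SORT x: the SORT now sees the shadowed x, which is wrong. */
    static List<Map<String, Integer>> naivePushdown(List<Map<String, Integer>> rows) {
        return sort(eval(rows, "x", "y"), "x");
    }

    /** EVAL $$x = x | EVAL x = y | SORT $$x | DROP $$x */
    static List<Map<String, Integer>> rewritten(List<Map<String, Integer>> rows) {
        return drop(sort(eval(eval(rows, "$$x", "x"), "x", "y"), "$$x"), "$$x");
    }

    static List<Map<String, Integer>> sampleRows() {
        return List.of(Map.of("x", 1, "y", 30), Map.of("x", 2, "y", 10), Map.of("x", 3, "y", 20));
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> rows = sampleRows();
        System.out.println("rewritten matches original: " + original(rows).equals(rewritten(rows)));
        System.out.println("naive matches original:     " + original(rows).equals(naivePushdown(rows)));
    }
}
```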

This allows re-enabling the dependency checker (after fixing a small
bug in it when handling `ENRICH`).

* Make OptimizerRules compile again
2024-02-26 10:24:42 -05:00
David Kyle
e438c9bdc8
[ML] Rename the internal text embedding service to elasticsearch (#105803) 2024-02-25 07:16:43 -05:00
Andrei Stefan
30e7aa4796
ESQL: push down "[text_field] is not null" and "[text_field] is null"(#105593) (#105800)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-02-25 03:34:17 -05:00
Andrei Dan
8759b31f4b
[TEST] Issue another cluster state change to make sure ILM notices an index is assigned (#105725) (#105793)
ILM transitions to `wait-for-index-color` (a step that needs a cluster
state changed event to evaluate against) but misses the cluster state
event that notifies that `partial-index` is now `GREEN`. The cluster
then goes quiet, no more state changes occur, and we time out. Note
that the test is unblocked by the teardown of the IT, which triggers
some cluster state changes.

This fixes the test by issuing an empty `reroute` request to cause
some cluster state traffic in the cluster, so that ILM notices the
index is assigned.

Note that a production cluster is busy and ILM would eventually notice
the new state and make progress.

```
[2024-02-22T06:33:01,388][INFO ][o.e.x.i.IndexLifecycleTransition] [node_t0] moving index [index] from [{"phase":"frozen","action":"searchable_snapshot","name":"mount-snapshot"}] to [{"phase":"frozen","action":"searchable_snapshot","name":"wait-for-index-color"}] in policy [policy]
[2024-02-22T06:33:01,490][INFO ][o.e.c.r.a.AllocationService] [node_t0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[partial-index][0]]])." previous.health="YELLOW" reason="shards started [[partial-index][0]]"
```

Fixes #102405
2024-02-23 13:18:31 -05:00
Pat Whelan
c96523a140
[Transform] Retry destination index creation (#105759) (#105778)
For Unattended Transforms, if we fail to create the destination index on
the first run, we will retry the transformation iteration, but we will
not retry the destination index creation on that next iteration.

This change stops the Unattended Transform from progressing beyond the
0th checkpoint, so all retries will include the destination index
creation.

Fix #105683
Relate #104146
2024-02-23 10:11:58 -05:00
David Roberts
ba5ad57351
[ML] Unmute KDETests.testCdfAndSf (#105735) (#105741)
This test was supposed to be fixed by #102878, however,
the test was not unmuted in that PR.

Relates #102876
2024-02-22 10:08:33 -05:00
David Roberts
8744ebb246
[ML] Fix AutodetectMemoryLimitIT.testManyDistinctOverFields (#105727) (#105734)
It seems that the changes of https://github.com/elastic/ml-cpp/pull/2585
combined with the randomness of the test could cause it to fail
very occasionally, and by a tiny percentage over the expected
upper bound. This change reenables the test by very slightly
increasing the upper bound.

Fixes #105347
2024-02-22 08:37:09 -05:00
Luigi Dell'Aquila
68ef331daf
ES|QL: Set default query LIMIT to 1000 (#105618) (#105678) 2024-02-21 04:11:38 -05:00
Pat Whelan
032a50e9c2
[Transform] Fix testStopAtCheckpoint (#105664) (#105672)
Currently, there is a small chance that testStopAtCheckpoint will fail
to correctly count the number of times `doSaveState` is invoked:
```
Expected: <5>
     but: was <4>
```

There are two potential issues:
1. The test thread starts the Transform thread, which starts a Search
   thread. If the Search thread starts reading from the
   `saveStateListeners` while the test thread writes to the
   `saveStateListeners`, then there is a chance our testing logic will
   not be able to count the number of times we read from
   `saveStateListeners`.
2. The non-volatile integer may be read as one value and written as
   another value.

Two fixes:
1. The test thread blocks the Transform thread until after the test
   thread writes all the listeners. The subsequent test will
   continue to verify that we can safely interlace reading and
   writing.
2. The counter is now an AtomicInteger to provide thread safety.
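
Both fixes can be sketched in a few lines. This is a generic illustration under assumed names (`SaveStateCounterDemo` and `countInvocations` are hypothetical, not the test's code): a `CountDownLatch` holds the worker threads back until setup is finished, and an `AtomicInteger` makes each increment an atomic read-modify-write.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class SaveStateCounterDemo {
    /** Runs the given number of threads, each incrementing a shared counter, and returns the total. */
    static int countInvocations(int threads, int callsPerThread) {
        AtomicInteger counter = new AtomicInteger();    // thread-safe, unlike a plain int field
        CountDownLatch start = new CountDownLatch(1);   // blocks workers until setup is done
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();                      // wait for the "test thread" to finish setup
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int j = 0; j < callsPerThread; j++) {
                    counter.incrementAndGet();          // atomic increment, no lost updates
                }
            });
            workers[i].start();
        }

        start.countDown();                              // setup complete, release all workers
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countInvocations(4, 1000));
    }
}
```

With a plain `int` and `counter++`, two threads can read the same value and both write back the same result, which is how a count of 4 can appear where 5 was expected.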

Fixes #90549
2024-02-20 15:48:01 -05:00
Jedr Blaszyk
0527fcd701
[Connector API] Bugfix: support list type in filtering advanced snippet value (#105633) (#105645) 2024-02-20 06:05:03 -05:00
Luigi Dell'Aquila
e91ab4c035
Disable insensitive equals operator (#105611) (#105613) 2024-02-19 06:11:27 -05:00