This is a temporary change to avoid doing incremental merges in
cross-cluster async-search when minimize_roundtrips=true.
Currently, Kibana polls for status of the async-search via the
_async_search endpoint, which (without this change) will
do an incremental merge of all search results. Once Kibana
moves to polling the status via `_async_search/status`, we will
undo the change in this commit.
When configuring an OpenAI text embedding service, the `model_id` should
always have been part of the service settings rather than the task settings.
Task settings can be overridden; service settings cannot be changed. If
different models are used, the configured entities are considered
distinct.
`task_settings` is now optional, as it contains a single optional field
(`user`).
```
PUT _inference/text_embedding/openai_embeddings
{
  "service": "openai",
  "service_settings": {
    "api_key": "XXX",
    "model_id": "text-embedding-ada-002"
  }
}
```
Backwards compatibility with previously configured models is maintained
by moving the `model_id` (or `model`) from task settings to service
settings at the first stage of parsing. New configurations are persisted
with `model_id` in service settings, old configurations with `model_id`
in task settings are not modified and will be tolerated by a lenient
parser.
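The first-stage move described above can be sketched as a small map transformation (a simplified illustration with hypothetical method names, not the actual parser code):

```java
import java.util.HashMap;
import java.util.Map;

public class ModelIdCompat {
    /**
     * Moves a legacy "model_id" (or "model") entry from task settings into
     * service settings, mirroring the first stage of parsing described above.
     * Returns the resolved model id, or null if none was configured.
     */
    public static Object moveModelId(Map<String, Object> serviceSettings, Map<String, Object> taskSettings) {
        for (String key : new String[] { "model_id", "model" }) {
            Object modelId = taskSettings.remove(key);
            if (modelId != null && serviceSettings.containsKey("model_id") == false) {
                serviceSettings.put("model_id", modelId);
            }
        }
        return serviceSettings.get("model_id");
    }

    public static void main(String[] args) {
        Map<String, Object> service = new HashMap<>(Map.of("api_key", "XXX"));
        Map<String, Object> task = new HashMap<>(Map.of("model", "text-embedding-ada-002"));
        System.out.println(moveModelId(service, task)); // text-embedding-ada-002
        System.out.println(task.isEmpty());             // true
    }
}
```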
* Add systemAudit logs to process restarts
* Updated error message for system audit / notifications
* Added system audit message for not restarting pytorch process
* Switch to inferenceAuditor and update error message
This change adds additional plumbing to pipe the available cluster features through into
SearchSourceBuilder. A number of different APIs use SearchSourceBuilder, so they also had to make this
available through their parsers, often via ParserContext. This change is largely mechanical,
passing a Predicate into existing REST actions to check for feature availability.
Note that this change was pulled mostly from this PR (#105040).
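The mechanical pattern of threading a cluster-features predicate into parsing code looks roughly like this (illustrative only; the class, method, and feature names below are hypothetical, not the actual SearchSourceBuilder/ParserContext API):

```java
import java.util.Set;
import java.util.function.Predicate;

public class FeatureGatedParsing {
    // A REST action would call a check like this before accepting a
    // feature-gated request parameter.
    public static boolean allowParameter(Predicate<String> clusterSupportsFeature, String requiredFeature) {
        return clusterSupportsFeature.test(requiredFeature);
    }

    public static void main(String[] args) {
        // Hypothetical set of features supported by every node in the cluster.
        Set<String> clusterFeatures = Set.of("sub_searches", "rank_docs");
        Predicate<String> supports = clusterFeatures::contains;
        System.out.println(allowParameter(supports, "sub_searches"));   // true
        System.out.println(allowParameter(supports, "future_feature")); // false
    }
}
```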
In #104846, we see an inconsistency in the state of the cluster between
the test and the clean-up at the end of the test.
The failure is as follows:
- The test using ILM at some point creates a fully mounted searchable snapshot with the name `restore-my-index-xxxx`.
- Then ILM moves to frozen and creates a partially mounted searchable snapshot with the alias `restore-my-index-xxxx`.
- The test confirms that `restore-my-index-xxxx` is an alias and ends.
- During tear-down, it appears that the retrieved cluster state contains `restore-my-index-xxxx` as an index, so a request is issued to delete it.
- The deletion fails because `restore-my-index-xxxx` is an alias.

I do not think that the test has an issue; most of the clues show that
the partial searchable snapshot has been processed correctly. Only this
cluster state retrieval seems a bit off. In order to reduce this
flakiness, we introduce a `GET _cluster/health?wait_for_events=languid`
to ensure we get the latest cluster state.
Fixes #104846
In a scenario where multiple indices are queried and some of them don't
contain the specified stacktrace field, empty stacktrace buckets will be
returned (that's ok and expected), but instead of being of type
`StringTerms` they are of type `UnmappedTerms`. This leads to avoidable
`ClassCastException`s, which we avoid by instead using the common base
interface `Terms` internally to iterate over buckets.
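A minimal stand-alone analogue of the fix (using local stand-in types, not the real aggregation classes):

```java
import java.util.List;

public class TermsCastDemo {
    // Local stand-ins for the aggregation types mentioned above.
    interface Terms { List<String> bucketKeys(); }

    static class StringTerms implements Terms {
        public List<String> bucketKeys() { return List.of("frame-1", "frame-2"); }
    }

    static class UnmappedTerms implements Terms {
        // Indices that don't contain the field yield empty buckets of this type.
        public List<String> bucketKeys() { return List.of(); }
    }

    // Iterating via the common base interface works for both subtypes,
    // whereas casting the result to StringTerms would throw for UnmappedTerms.
    static int countBuckets(Terms terms) {
        return terms.bucketKeys().size();
    }

    public static void main(String[] args) {
        System.out.println(countBuckets(new StringTerms()));   // 2
        System.out.println(countBuckets(new UnmappedTerms())); // 0
    }
}
```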
To catch potential mistakes in the output of a plan, introduce a sanity
check rule for verifying the dependencies between the nodes of a tree after each
optimizer runs.
The goal of the rule is to assert that, after being modified, dependencies
between nodes don't get misplaced or incorrectly replaced.
Today this test suite relies on being able to cancel an in-flight
publication after it's reached a committed state. This is questionable,
and also a little flaky in the presence of the desired balance allocator
which may introduce a short delay before enqueuing the cluster state
update that performs the reconciliation step.
This commit removes the questionable meddling with the internals of
`Coordinator` and instead just blocks the cluster state updates at the
transport layer to achieve the same effect.
Closes #102947
Fix a bug in the Analyzer which always treated `*` (match any string)
in field names as a wildcard, even when escaped (back-quoted). This caused the clause
to be too greedy and either keep or drop too many fields.
Fixes #104955
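The intent can be sketched as follows (a simplified, hypothetical check, not the actual Analyzer code):

```java
public class WildcardEscapeDemo {
    // A back-quoted (escaped) field name is a literal identifier; only an
    // unescaped name containing '*' should be treated as a wildcard pattern.
    static boolean isWildcardPattern(String fieldName) {
        boolean escaped = fieldName.length() >= 2
            && fieldName.startsWith("`")
            && fieldName.endsWith("`");
        return escaped == false && fieldName.contains("*");
    }

    public static void main(String[] args) {
        System.out.println(isWildcardPattern("foo*"));   // true: matches any field starting with foo
        System.out.println(isWildcardPattern("`foo*`")); // false: a literal field named foo*
    }
}
```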
* Use single-char variant of String.indexOf() where possible
indexOf(char) is more efficient than searching for the same one-character String.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
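For example (a trivial illustration):

```java
public class IndexOfDemo {
    public static void main(String[] args) {
        String s = "cluster-state";
        // Both find the same position, but the char overload avoids treating
        // the needle as a one-character String.
        System.out.println(s.indexOf("-")); // 7
        System.out.println(s.indexOf('-')); // 7
    }
}
```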
GenerateUniqueIndexNameStep contained exact copies of the generateValidIndexName() and generateValidIndexSuffix() methods from the IndexNameGenerator utility class.
I removed the duplicates and changed the code to use the utility methods instead.
Also added javadoc and switched to a pre-compiled Pattern.
The test was also broken, as it checked that the suffix consisted only of illegal characters.
Replacing matches() with find() makes it check for the presence of at least one illegal character.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
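The matches() vs find() distinction can be seen in a small example (the illegal-character pattern here is hypothetical, for illustration only):

```java
import java.util.regex.Pattern;

public class SuffixCheckDemo {
    // Hypothetical pattern of characters not allowed in an index-name suffix.
    private static final Pattern ILLEGAL = Pattern.compile("[^a-z0-9]+");

    public static void main(String[] args) {
        String suffix = "abc$def";
        // matches(): true only if the WHOLE suffix consists of illegal characters.
        System.out.println(ILLEGAL.matcher(suffix).matches()); // false
        // find(): true if AT LEAST ONE illegal character is present.
        System.out.println(ILLEGAL.matcher(suffix).find());    // true
    }
}
```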
The index needs to be in tsdb mode. All fields will use the tsdb codec, except fields whose names start with `_` (with the exception of `_tsid`, which does use the codec).
Before this change we relied on MapperService to check whether a field needed to use the tsdb doc values codec, but we missed many field types (ip field type, scaled float field type, unsigned long field type, etc.). Instead we wanted to depend on the doc values type in FieldInfo, but that information is not available in PerFieldMapperCodec.
Borrowed the binary doc values implementation from Lucene90DocValuesFormat. This allows it to be used for any doc values field.
Follow-up to #99747
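The field-selection rule described above boils down to something like this (a simplified sketch, not the actual PerFieldMapperCodec logic):

```java
public class TsdbCodecSelection {
    // All fields get the tsdb doc values codec, except metadata fields
    // starting with '_' -- with _tsid as the exception that still uses it.
    static boolean useTsdbDocValuesCodec(String fieldName) {
        if (fieldName.startsWith("_")) {
            return fieldName.equals("_tsid");
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(useTsdbDocValuesCodec("host.name")); // true
        System.out.println(useTsdbDocValuesCodec("_seq_no"));   // false
        System.out.println(useTsdbDocValuesCodec("_tsid"));     // true
    }
}
```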
* Use String.replace() instead of replaceAll() for non-regexp replacements
When the arguments do not make use of regexp features, replace() is a more efficient option, especially the char variant.
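For example (a trivial illustration):

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String path = "a.b.c";
        // replaceAll() interprets its first argument as a regex on every call
        // (note '.' must be escaped), while replace() treats its arguments
        // literally and also offers a cheap char-to-char overload.
        System.out.println(path.replaceAll("\\.", "/")); // a/b/c
        System.out.println(path.replace('.', '/'));      // a/b/c
    }
}
```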
These statements come across a little too strongly as "don't use data streams if you *ever* have updates", but data streams do support updates when necessary, as long as the backing indices are used.
The grant API key API is disabled in serverless.
Rather than mute the Query API Key yml test (21_query_with_aggs),
which uses the grant API key API, for the serverless build,
I have modified the yml test here to not invoke the grant API key API.
Querying for granted API keys is already covered
(e.g. ApiKeyAggsIT#testFilterAggs).
Relates #104895
We want to report that observation of document parsing has finished only upon successful indexing.
To achieve this, we need to perform the reporting in only one place (not, as previously, in both IngestService and 'bulk action').
This commit splits the DocumentParsingObserver in two: a DocumentSizeObserver for wrapping an XContentParser and returning the observed state, and a DocumentSizeReporter to perform an action when parsing has completed and indexing was successful.
To perform the reporting in one place we need to pass the state from IngestService to 'bulk action'. The state is currently represented as a long: normalisedBytesParsed.
In TransportShardBulkAction we get the normalisedBytesParsed information, and in the serverless plugin we check whether the value indicates that parsing already happened in IngestService (value != -1); if so, we create a DocumentSizeObserver with the fixed normalisedBytesParsed and do not increment it.
When the indexing has completed successfully, we report the observed state for an index with DocumentSizeReporter.
Small nit: by passing the document size observer via SourceToParse, we no longer have to inject it via a complex hierarchy for DocumentParser. Hence some constructor changes.
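The sentinel handling described above can be sketched as follows (hypothetical shapes for illustration, not the actual Elasticsearch interfaces):

```java
public class SizeObserverDemo {
    interface DocumentSizeObserver {
        long normalisedBytesParsed();
    }

    // If parsing already happened in IngestService, the value is != -1 and we
    // reuse it as a fixed observation instead of counting bytes again.
    static DocumentSizeObserver forShardBulkAction(long normalisedBytesParsed) {
        if (normalisedBytesParsed != -1) {
            return () -> normalisedBytesParsed;
        }
        // Otherwise a real observer would wrap the XContentParser and count
        // bytes as the document is parsed; 0 stands in for "not yet counted".
        return () -> 0L;
    }

    public static void main(String[] args) {
        System.out.println(forShardBulkAction(123).normalisedBytesParsed()); // 123
        System.out.println(forShardBulkAction(-1).normalisedBytesParsed());  // 0
    }
}
```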
Adds support for the `aggs` request body parameter to the Query API Key Information API.
This parameter works identically to the well-known eponymous parameter of the `_search` endpoint,
but the set of allowed aggregation types, as well as the allowed field names, is restricted.