We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.
This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.
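For example, a field caps request of roughly this shape is accepted again (index and field names illustrative):
```
POST /my-index/_field_caps
{
  "fields": ["user.id", "http.response.*"]
}
```
Combining a `fields` list in the body with a `fields` query parameter in the URL returns an error.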
Closes #86875
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field
caps API per field.
Especially for metric fields, it must be clear which fields are metrics and whether they belong
to only time-series indexes or to mixed time-series and non-time-series indexes.
Metric fields must be further distinguished when they belong to any of the following index types:
- Standard (non-time-series) indexes
- Time series indexes
- Downsampled time series indexes
This PR modifies the field caps API so that the mapping parameters `time_series_dimension`
and `time_series_metric` are reported only when they are set on fields of time-series indexes.
Those parameters are ignored entirely when they are set on standard (non-time-series) indexes.
This PR revisits some of the conventions adopted by #78790
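For illustration, a field caps response for a metric field of a time-series index could then look like this (index, field, and values hypothetical; response abridged):
```
GET /my-tsdb-index/_field_caps?fields=cpu.usage

{
  "indices": ["my-tsdb-index"],
  "fields": {
    "cpu.usage": {
      "double": {
        "type": "double",
        "searchable": true,
        "aggregatable": true,
        "time_series_metric": "gauge"
      }
    }
  }
}
```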
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns
Relates #77065
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Represent transactions as bitsets for faster lookups when iterating over candidate sets. This PR implements
a lookup table and a bit-based subset check. The lookup table maps transactions to items; this
so-called horizontal representation speeds up the check of whether a transaction contains a
candidate item set.
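A minimal sketch of the bit-based subset check (illustrative only; the actual implementation differs):
```
// Each transaction and candidate item set is a bitset over the top items,
// stored as a long[] of equal length.
class BitSetSubset {
    static boolean isSubset(long[] candidate, long[] transaction) {
        for (int i = 0; i < candidate.length; i++) {
            // every bit set in the candidate must also be set in the transaction
            if ((candidate[i] & transaction[i]) != candidate[i]) {
                return false;
            }
        }
        return true;
    }
}
```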
This change deprecates the kNN search API in favor of the new `knn` option
inside the search API. The `knn` option is now the preferred way of performing
kNN search.
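For reference, the preferred form looks like this (index, field, and vector values illustrative):
```
POST /my-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [0.3, 0.1, 1.2],
    "k": 10,
    "num_candidates": 100
  }
}
```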
Relates to #87625
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.
See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.
The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start an agent from within Elasticsearch
programmatically, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.
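For illustration, the assembled options might look something like this (agent path and file locations purely illustrative):
```
-javaagent:/path/to/elastic-apm-agent.jar
-Delastic.apm.config_file=/path/to/config/elasticapm.properties
```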
To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since the secret would then be readable
by all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.
There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.
I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to set up a complete system so that traces can be captured in
APM server, and the results inspected in Elasticsearch.
As discussed in #73569 the current implementation is too slow in certain scenarios.
The inefficient part of the code can be stated as the following problem:
Given a text (`getText()`) and a position in this text (`offset`), find the sentence
boundaries before and after the offset, in such a way that the after boundary is
maximal but respects `end boundary - start boundary < fragment size`.
If it's impossible to produce an after boundary that respects this
condition, use the nearest boundary following the offset.
The current approach begins by finding the nearest preceding and following boundaries,
and expands the following boundary greedily while it respects the problem restriction. This
is fine asymptotically, but BreakIterator which is used to find each boundary is sometimes
expensive.
This new approach maximizes the after boundary by scanning for the last boundary
preceding the position that would cause the condition to be violated (i.e. knowing start
boundary and offset, how many characters are left before resulting length is fragment size).
If this scan finds the start boundary, it means it's impossible to satisfy the problem
restriction, and we get the first boundary following offset instead (or better, since we
already scanned [offset, targetEndOffset], start from targetEndOffset + 1).
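A minimal sketch of the idea using `java.text.BreakIterator` (names and structure illustrative; the real implementation differs):
```
import java.text.BreakIterator;

class BoundaryScan {
    // Find [start, end) such that end is maximal with end - start < fragmentSize,
    // falling back to the first boundary after the scanned range when impossible.
    static int[] boundaries(String text, int offset, int fragmentSize) {
        BreakIterator bi = BreakIterator.getSentenceInstance();
        bi.setText(text);
        // nearest boundary at or before the offset
        int start = bi.preceding(Math.min(offset + 1, text.length()));
        if (start == BreakIterator.DONE) {
            start = 0;
        }
        int targetEndOffset = start + fragmentSize; // first end that would violate the limit
        if (targetEndOffset >= text.length()) {
            return new int[] { start, text.length() };
        }
        // scan backwards: last boundary strictly before the limit
        int end = bi.preceding(targetEndOffset);
        if (end <= start) {
            // impossible to satisfy the restriction: take the first boundary
            // after the range [offset, targetEndOffset] we already scanned
            end = bi.following(targetEndOffset);
            if (end == BreakIterator.DONE) {
                end = text.length();
            }
        }
        return new int[] { start, end };
    }
}
```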
A transform persists its internal state (e.g. the data cursor) in a state document.
This change improves the error handling and fixes the problem described in #88905. A transform
can now recover from this problem.
Fixes #88905
This change adds a SourceValueFetcherSortedDoubleIndexFieldData class to support double doc value types for source fallback. It also adds support for the double, float and half_float field types.
Introduced in: #88439
* [ML] add text_similarity nlp task documentation
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/ml-shared.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Computing routing nodes and the indices lookup takes considerable time
for large states. Both are needed during cluster state application and,
prior to this change, were computed on the applier thread in all cases.
By running the creation of both objects concurrently with publication, the
many-shards benchmark sees a 10%+ reduction in the time to bootstrap
50k indices.
The parser used to parse Mount API requests is configured to
ignore unknown fields. I suspect we made it this way when it
was created because we were expecting to change the
request's body in the future, but that never happened.
This leniency confuses users (#75982) so we think it is better
to simply reject requests with unknown fields starting v8.5.0.
Because the High Level REST Client has a bug (to be fixed in #79604)
that injects a wrong `ignored_index_settings` parameter, we deliberately
ignore, rather than reject, that one.
Closes #75982
Clean up network setting docs
- Add types for all params
- Remove mention of JDKs before 11
- Clarify some wording
Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
This change adds source fallback support for byte, short, and long fields. These use the already
existing class SourceValueFetcherSortedNumericIndexFieldData.
This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:
```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster": "other-cluster",
  "leader_index_patterns": ["logs-*"],
  "follow_index_pattern": "{{leader_index}}_copy"
}
```
And then the data stream `logs-mysql-error` was created, creating the backing index
`.ds-logs-mysql-error-2022-07-29-000001`.
Prior to this commit, replicating this data stream meant that the backing index would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` while the data stream would *not* be renamed. This
tripped a check in `TransportPutLifecycleAction` asserting that a backing index is not
renamed for a data stream during following.
After this commit, there are a couple of changes:
First, the data stream will also be renamed. This means that the `logs-mysql-error` data stream becomes
`logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: `data_stream_name`. It works like this:
```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```
Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of the data stream backing pattern. Previously the index
`.ds-logs-mysql-error-2022-07-29-000001` would have been renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name, since it doesn't end with the
rollover digits); after this commit it is renamed to
`.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream. This means that for
the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:
| Leader Cluster | Follower Cluster |
|--------------|-----------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001` | `.ds-logs-mysql-error_copy-2022-07-29-000001` |
Internally, this means the auto-follow request is turned into the following create-follower request:
```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```
Relates to https://github.com/elastic/elasticsearch/pull/84940 (cherry-picked the commit for a test)
Relates to https://github.com/elastic/elasticsearch/pull/61993 (where data stream support was first introduced for CCR)
Resolves https://github.com/elastic/elasticsearch/issues/81751
When a model is starting, it has rarely been observed to lock up while trying to restore the model objects to the native process.
This manifests as a trained model being stuck in "starting" while also being assigned to a node: a native process has started and a task is available on the assigned node, but the model state never leaves "starting".
Speed up frequent_items by using bitsets instead of lists of longs. With this, item sets
can be de-duplicated faster. A bit is set according to the order of the top items (by count).
There were some cases where synthetic source wasn't rounding properly in
round trips. `0.15527719259262085` with a scaling factor of
`2.4206374697469164E16` was round tripping to `0.15527719259262088`,
which then round trips up to `0.1552771925926209`, rounding in the wrong
direction! This fixes the round tripping in this case through ever more
paranoid double checking and nudging.
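A sketch of the round-trip property the fix enforces (illustrative; the real logic lives in the `scaled_float` synthetic source support):
```
class ScaledRoundTrip {
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // Decode so that re-encoding yields the same long; if the re-encoded
    // value drifts, nudge the decoded double one ulp in the correcting
    // direction (the real fix iterates this check as needed).
    static double decode(long scaled, double scalingFactor) {
        double decoded = scaled / scalingFactor;
        long reencoded = encode(decoded, scalingFactor);
        if (reencoded > scaled) {
            decoded = Math.nextDown(decoded); // rounds too high: nudge down
        } else if (reencoded < scaled) {
            decoded = Math.nextUp(decoded); // rounds too low: nudge up
        }
        return decoded;
    }
}
```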
Closes #88854
text_similarity is a cross-encoding task that compares two text inputs at inference time.
It can be used for cross-encoding re-ranking:
```
POST _ml/trained_models/cross-encoder__ms-marco-tinybert-l-2-v2/_infer
{
  "docs": [
    { "text_field": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers." },
    { "text_field": "New York City is famous for the Metropolitan Museum of Art." }
  ],
  "inference_config": {
    "text_similarity": {
      "text": "How many people live in Berlin?"
    }
  }
}
```
With results:
```
{
  "inference_results": [
    { "predicted_value": 7.235751628875732 },
    { "predicted_value": -11.562295913696289 }
  ]
}
```
It can also be used for raw text similarity. Here is an example that checks whether pairs of questions are similar:
```
POST _ml/trained_models/cross-encoder__quora-distilroberta-base/_infer
{
  "docs": [
    { "text_field": "what is your quest?" },
    { "text_field": "what is your favorite color?" },
    { "text_field": "is the swallow african or european?" },
    { "text_field": "what is the airspeed velocity of a swallow carrying coconuts?" },
    { "text_field": "how fast is an unladen swallow?" }
  ],
  "inference_config": {
    "text_similarity": {
      "text": "what is the airspeed velocity of an unladen swallow?"
    }
  }
}
```
With results:
```
{
  "inference_results": [
    { "predicted_value": -8.312414169311523 },
    { "predicted_value": -8.239330291748047 },
    { "predicted_value": -8.256011009216309 },
    { "predicted_value": -4.1945390701293945 },
    { "predicted_value": -3.294121742248535 }
  ]
}
```
This PR adds a new API route to support bulk updates of API keys:
`POST _security/api_key/_bulk_update`
The route takes a list of IDs (`ids`) of API keys to update, along
with the same request parameters as the single operation route:
- `role_descriptors` - The list of role descriptors specified for the
key. This is one of the two parts that determines an API key’s
privileges.
- `metadata_flattened` - The searchable metadata associated
with an API key
Analogously to the single operation route, a call to `_bulk_update`
automatically updates the `limited_by_role_descriptors`, `creator`, and
`version` fields for each API key.
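For illustration, a bulk update could look like this (IDs and role descriptor purely hypothetical):
```
POST /_security/api_key/_bulk_update
{
  "ids": ["VuaCfGcBCdbkQm-e5aOx", "H3_AhoIBA9hmeQJdg7ij"],
  "role_descriptors": {
    "role-a": {
      "indices": [
        { "names": ["index-a*"], "privileges": ["read"] }
      ]
    }
  }
}
```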
The implementation ports the single API key update operation to use the
new bulk functionality under the hood, translating as necessary at the
transport layer.
Relates: #88758
Plugin APIs are defined by a set of interfaces from server. Many of
these APIs are actually implementation details of the system. As we move
these implementation details to use different hook mechanisms so that
internals are only implementable by builtin components, the existing
plugin APIs need to be deprecated. Java provides a means to indicate
deprecation: the `@Deprecated` annotation. But that annotation
is only seen when compiling a plugin that implements deprecated hooks, and
even then only if deprecation warnings are not disabled.
This commit adds an introspection step to plugin initialization that
inspects each loaded plugin and looks for any APIs marked with the
`@Deprecated` annotation that are overridden by the plugin. If any are
found, deprecation messages are emitted to the deprecation log.
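A minimal sketch of the introspection idea (reflection-based and purely illustrative; the actual implementation may differ):
```
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class DeprecationIntrospector {
    // Report every @Deprecated interface method that the plugin class overrides.
    static List<String> deprecatedOverrides(Class<?> pluginClass) {
        List<String> found = new ArrayList<>();
        for (Class<?> iface : pluginClass.getInterfaces()) {
            for (Method m : iface.getMethods()) {
                if (m.isAnnotationPresent(Deprecated.class)) {
                    try {
                        // succeeds only if the plugin itself declares the method
                        pluginClass.getDeclaredMethod(m.getName(), m.getParameterTypes());
                        found.add(iface.getSimpleName() + "#" + m.getName());
                    } catch (NoSuchMethodException e) {
                        // not overridden by the plugin; nothing to report
                    }
                }
            }
        }
        return found;
    }
}
```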
DiscoveryPlugin allows extending getJoinValidator and
getElectionStrategies. These are implementation details of the system.
This commit deprecates these methods so that plugin authors are
discouraged from overriding them.
Network plugins provide network implementations. In the past this has
been used for alternatives to netty based networking, using the JDK's
nio. However, nio has now been removed, and it is inadvisable for a
plugin to implement this low level part of the system.
Therefore, this commit marks the NetworkPlugin interface as deprecated.
When handling Unicode accents, BERT tokenization could occasionally remove the wrong characters. This would produce exceptionally strange results and possibly an error.
Closes #88900
Adds metadata classes for Reindex and UpdateByQuery contexts.
For Reindex metadata:
* _index can't be null
* _id, _routing and _version are writable and nullable
* _now is read-only
* op is read-write and must be 'noop', 'index' or 'delete'
Reindex metadata keeps the original values of `_index`, `_id`, `_routing` and `_version`
so that `Reindexer` can see if they've changed.
If `_version` is null in the ctx map, or, equivalently, if the script called the
`setVersionToInternal()` augmentation, `Reindexer` sets document versioning
to internal, and `getVersion` returns `Long.MIN_VALUE`.
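For illustration, a reindex script can drive these metadata fields through the ctx map like this (index and field names hypothetical):
```
POST _reindex
{
  "source": { "index": "src" },
  "dest": { "index": "dst" },
  "script": {
    "source": "if (ctx._source.status == 'obsolete') { ctx.op = 'noop' }"
  }
}
```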
For UpdateByQuery metadata:
* _index, _id, _version, _routing are all read-only
* _routing is also nullable
* _now is read-only
* op is read-write and one of 'index', 'noop', 'delete'
Closes: #86472
This change adds an operation parameter to FieldDataContext that allows us to specialize the field data returned from fielddataBuilder in MappedFieldType. Keyword, integer, and geo point field types now support source fallback, where we build a doc values wrapper using source if doc values don't exist for the field under the SCRIPT operation. This allows us to have source fallback in scripting for the scripting fields API.
Adds some docs giving more detailed background about what data
corruption really means and some suggestions about how to narrow down
the root cause.
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
CtxMap delegates all metadata keys to its `Metadata` container and
all other keys to its source map. In most write contexts (update,
update by query, reindex), the source map should contain only one
key, `_source`, whose value is a `Map<String, Object>`.
This change adds validation of writes to the source map, rejecting
insertion of invalid keys, removal of the `_source` key, and
overwriting of the `_source` mapping with the wrong type.
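A sketch of the kind of check this adds (illustrative; not the actual CtxMap code):
```
import java.util.Map;

class SourceMapValidation {
    // Writes to the source map must keep a single `_source` key whose value is a Map.
    static void validatePut(String key, Object value) {
        if ("_source".equals(key) == false) {
            throw new IllegalArgumentException("unexpected key [" + key + "] in source map");
        }
        if (value instanceof Map == false) {
            throw new IllegalArgumentException("_source must be a Map, not [" + value + "]");
        }
    }
}
```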