Since https://github.com/apache/lucene-solr/pull/620, intervals disjunctions are automatically rewritten to handle cases where minimizations can miss valid matches.
This change updates the documentation to take this behaviour into account (users don't need to manually pull intervals disjunctions to the top anymore).
* Fix NullPointerException when doing knn search on empty index without dims
* Update docs/changelog/111756.yaml
* Fix typo in yaml test
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Gracefully handle invalid synonym rules by setting lenient to true by default when synonyms are updateable
---------
Co-authored-by: carlosdelest <carlos.delgado@elastic.co>
For nested kNN we support not only similarity thresholds, but also
multi-passage search while retrieving more than one nearest passage.
However, the inner_hits retrieved for the kNN search would ignore the
restricted similarity. As a result, the inner hits would return all
passages, not just the ones within the similarity threshold, which is
confusing.
closes: https://github.com/elastic/elasticsearch/issues/111093
The previous fix, which uses the search API, doesn't work with the
indexing tier only. This change uses the routing table from the cluster
state instead. I have tested this change in a serverless environment.
Relates #111211
The `search_shards` API is not available in serverless. This PR replaces
its usage in the newly added test with the `search` API with profiling.
Relates #111123
This change returns the total number of fields at the segment level,
allowing for a more accurate estimate of the memory used by Lucene. The
new estimate is expected to be closer to the actual memory usage than
the current estimate using the index-level field count, due to the
non-trivial overhead incurred by each Lucene segment. Two new fields are
introduced: total_segment_fields, which is the total number of fields at
the segment level, and average_fields_per_segment. The overhead per
field in segments with fewer fields is larger than in segments with many
fields.
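As a rough illustration of the two new fields, the sketch below derives them from per-segment field counts. The helper name `segment_field_stats` is hypothetical, and treating the average as the segment-level total divided by the number of segments is an assumption, not something this change states.

```python
def segment_field_stats(segment_field_counts):
    """Return (total_segment_fields, average_fields_per_segment).

    Assumption: the average is the segment-level total divided by the
    number of segments. The real computation may differ.
    """
    total = sum(segment_field_counts)
    segments = len(segment_field_counts)
    # Avoid dividing by zero when there are no segments.
    average = total / segments if segments else 0.0
    return total, average


# Three segments holding 10, 10 and 4 fields respectively.
total_segment_fields, average_fields_per_segment = segment_field_stats([10, 10, 4])
```

The same index-level field count can therefore map to very different segment-level totals, depending on how many segments exist.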
We do not want to rely on templates or component templates to include
the host.name field in indices using LogsDB. The host.name field is a field
we sort on by default when LogsDB is used. As a result, we just inject it
by default, the same way we do for the @timestamp field. This prevents
sorting errors due to missing host.name field in mappings.
The host.name is a keyword field and depending on the value of subobjects it will
be mapped as a name keyword nested inside a host or as a flat host.name keyword.
We also include ignore_above as we normally do for keywords in observability mappings.
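The two mapping shapes described above might look roughly like this. This is a sketch only: the helper `host_name_mapping` is hypothetical, and the `ignore_above` value of 1024 is an assumption, since the description does not state the value used.

```python
def host_name_mapping(subobjects: bool, ignore_above: int = 1024) -> dict:
    """Build the `properties` fragment for the injected host.name field.

    Assumption: ignore_above=1024 is a placeholder value.
    """
    keyword = {"type": "keyword", "ignore_above": ignore_above}
    if subobjects:
        # Nested form: a `name` keyword inside a `host` object.
        return {"host": {"properties": {"name": keyword}}}
    # Flat form: a single dotted `host.name` keyword.
    return {"host.name": keyword}


nested_form = host_name_mapping(subobjects=True)
flat_form = host_name_mapping(subobjects=False)
```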
If we determine that the searchable term is completely empty, we switch back to a regular term query. This way we return the same docs as expected when we do a case sensitive search.
closes: #108968
Currently fails due to validation that is only performed in serverless:
```
java.lang.AssertionError: Failure at [logsdb/20_mapping:94]:
Expected: "Failed to parse mapping: Indices with with index mode [logs] only support synthetic source"
but: was "Failed to parse mapping: Parameter [mode=disabled] is not allowed in source"
```
This change allows querying the `index.mode` setting via a new
`_index_mode` metadata field, enabling APIs such as `field_caps` or
`resolve_indices` to target indices that are either time_series or logs
only. This approach avoids adding and handling a new parameter for
`index_mode` in these APIs. Both ES|QL and the `_search` API should also
work with this new field.
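A minimal sketch of how a client could target time series indices through the new field. It only builds request bodies; the helper name `index_mode_filter` is hypothetical, and using a `term` query here is an assumption about how the field is typically queried.

```python
def index_mode_filter(mode: str) -> dict:
    """Build a term query on the `_index_mode` metadata field."""
    return {"term": {"_index_mode": mode}}


# Body for `field_caps`, restricting the response to time_series indices.
field_caps_body = {"index_filter": index_mode_filter("time_series")}

# Body for `_search`, matching only documents from time_series indices.
search_body = {"query": index_mode_filter("time_series")}
```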
CCS tests could split the vectors over any number of shards. Through
empirical testing, I determined that this commit's values provide the
expected order, even if they are not all part of the same shard.
Quantization can have weird behaviors when there are uniform values,
as is the case in this test.
closes #109978
This adds the Query Roles API:
```
POST /_security/_query/role
GET /_security/_query/role
```
This is similar to the currently existing:
* [Query API key API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-api-key.html)
* [Query User API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-user.html)
Sample request:
```
POST /_security/_query/role
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "applications.application": ["app-1", "app-2"]
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "description": {
              "query": "test match on role description (which is mapped as a text field)"
            }
          }
        }
      ]
    }
  },
  "sort": [
    "name"
  ],
  "search_after": [
    "role-name-1"
  ]
}
```
The query supports a subset of query types, including match_all, bool,
term, terms, match, ids, prefix, wildcard, exists, range, and simple
query string.
Currently, the supported fields are:
* name
* description
* metadata
* applications.application
* applications.resources
* applications.privileges
The query also supports pagination-related fields (`from`, `size`,
`search_after`), analogous to the generic [Search
API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html).
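A possible client-side pagination loop step, sketched under the assumption that each returned role carries the `_sort` values of a sorted request and that those values can be passed back as `search_after`. The helper `next_page_request` is hypothetical and performs no HTTP calls.

```python
def next_page_request(previous_request: dict, response: dict):
    """Return the request body for the next page, or None when done.

    Assumption: the last role's `_sort` values are valid `search_after` input.
    """
    roles = response.get("roles", [])
    if not roles:
        return None  # An empty page means there is nothing left to fetch.
    last_sort = roles[-1].get("_sort")
    if last_sort is None:
        return None  # Without sort values we cannot continue with search_after.
    request = dict(previous_request)  # Shallow copy; keep the caller's dict intact.
    request["search_after"] = last_sort
    return request


first_request = {"sort": ["name"], "size": 2}
first_response = {
    "roles": [
        {"name": "role-a", "_sort": ["role-a"]},
        {"name": "role-b", "_sort": ["role-b"]},
    ]
}
second_request = next_page_request(first_request, first_response)
```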
The response format is similar to that of the [Query API
key](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-api-key.html)
and [Query
User](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-user.html)
APIs. It contains a **list** of roles, in the sorted order (if
specified). Unlike the [Get Roles
API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-get-role.html),
the role **name** is an attribute of the element in the list of roles
(in the get-roles API case, the role name was the key in the response
map, and the value was the rest of the role descriptor). In addition,
the element in the list of roles also contains the optional `_sort`
field, e.g. (sample response):
```
{
"total": 3,
"count": 3,
"roles": [
{
"name": "LYdz2",
"cluster": [],
"indices": [],
"applications": [
{
"application": "ejYWvGQTF",
"privileges": [
"pRCfBMgOy",
"zDhFtMQfc",
"roudxado"
],
"resources": [
"nWHEpmgxy",
"SOML/hMYrqx",
"YIqP/*",
"ueEomwsA"
]
},
{
"application": "ampUW9",
"privileges": [
"jDvRtp"
],
"resources": [
"99"
]
}
],
"run_as": [],
"metadata": {
"nFKc": [
1,
0
],
"PExF": [],
"qlqY": -433239865,
"IQXm": []
},
"transient_metadata": {
"enabled": true
},
"description": "KoLlsEbq",
"_sort": [
"LYdz2"
]
},
{
"name": "oaxW0",
"cluster": [],
"indices": [],
"applications": [
{
"application": "*",
"privileges": [
"qZYb"
],
"resources": [
"tFrSULaKb"
]
},
{
"application": "aLaEN9",
"privileges": [
"fCOc"
],
"resources": [
"gozqXtSgE",
"UX/JgydeIM",
"sjUp",
"Ivdz/UAmuNrQAG"
]
},
{
"application": "rbxyuKIMPAp",
"privileges": [
"lluqieFRu",
"xKU",
"gHlb"
],
"resources": [
"99"
]
}
],
"run_as": [],
"metadata": {},
"transient_metadata": {
"enabled": true
},
"_sort": [
"oaxW0"
]
},
{
"name": "vWAV1",
"cluster": [],
"indices": [],
"applications": [
{
"application": "*",
"privileges": [
"kWBWjCAc"
],
"resources": [
"hvEtV",
"gZJ"
]
},
{
"application": "avVUV9",
"privileges": [
"newZTa",
"gQpxNm"
],
"resources": [
"99"
]
}
],
"run_as": [],
"metadata": {},
"transient_metadata": {
"enabled": true
},
"_sort": [
"vWAV1"
]
}
]
}
```
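For callers that still expect the name-keyed shape of the get-roles API, the list could be converted client-side along these lines. This is a sketch; `roles_by_name` is a hypothetical helper, and dropping `_sort` from the descriptor is a choice made for this example.

```python
def roles_by_name(query_response: dict) -> dict:
    """Convert the `roles` list into a map keyed by role name."""
    result = {}
    for role in query_response.get("roles", []):
        # Everything except the name and the sort values is the descriptor.
        descriptor = {k: v for k, v in role.items() if k not in ("name", "_sort")}
        result[role["name"]] = descriptor
    return result


sample = {
    "total": 1,
    "count": 1,
    "roles": [
        {"name": "LYdz2", "cluster": [], "run_as": [], "_sort": ["LYdz2"]},
    ],
}
by_name = roles_by_name(sample)
```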
The index.mode setting validates other index settings. When updating the index.time_series.end_time setting and the index.mode setting wasn't defined at index creation time (meaning that the default is active), this validation is skipped, which results in (worse) errors at a later point in time.
This problem is fixed by making the index.mode setting a dependency of the index.time_series.end_time setting.
Note that this problem doesn't exist for the index.time_series.start_time and index.routing_path index settings, because these index settings are final, which means they can only be defined when an index is being created.
Closes #110265
This PR piggy-backs on recent changes in Lucene 9.11.1
(https://github.com/apache/lucene/pull/12829,
https://github.com/apache/lucene/pull/13341/), setting the parent doc
when nested fields are present. This allows moving nested documents
along with parent ones during sorting.
With this change, sorting is now allowed on fields outside nested
objects. Sorting on fields within nested objects is still not supported
(throws an exception).
Fixes #107349
Introduce an optional k param for knn query
If k is not set, knn query has the previous behaviour:
- `num_candidates` docs are collected from each shard. These `num_candidates` docs
are used for combining results with other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results
If k is set, the behaviour is instead as follows:
- `k` docs are collected from each shard. These `k` docs are used for
combining results with other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.
Having a `k` param makes it more intuitive for users to address their needs.
They can also skip the `num_candidates` param for this query,
as it is more of an internal detail for tuning how knn search operates.
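To make the two modes concrete, the sketch below builds a knn query body with or without `k`. The helper `knn_query` and the field name `my_vector` are hypothetical. The body layout follows the knn query as I understand it and should be checked against the reference docs.

```python
def knn_query(field, query_vector, size, k=None, num_candidates=None):
    """Build a search body containing a knn query.

    `k` and `num_candidates` are only emitted when explicitly provided.
    """
    knn = {"field": field, "query_vector": list(query_vector)}
    if k is not None:
        knn["k"] = k  # New behaviour: `k` docs are collected per shard.
    if num_candidates is not None:
        knn["num_candidates"] = num_candidates
    return {"size": size, "query": {"knn": knn}}


# With k: users state how many neighbours they want and skip num_candidates.
with_k = knn_query("my_vector", [0.1, 0.2, 0.3], size=5, k=10)

# Without k: the previous behaviour, driven by num_candidates.
without_k = knn_query("my_vector", [0.1, 0.2, 0.3], size=5, num_candidates=100)
```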
Closes #108473
* Add an override to the aggs tests to override the allow list default setting. This makes it possible to run the scripted metric aggs tests on Serverless, even when we disallow these aggs by default on Serverless.
* Move the allow list tests next to the scripted metric tests since these belong together.
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization-based codec works with this element type, which is
consistent with `byte` vectors.
`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexadecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.
`bit` vectors support script usage and regular query usage. When
indexed, all comparisons done are `xor` and `popcount` summations (aka,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` to be
the similarity.
For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.
Note, the dimensions expected by this element_type must always be
divisible by `8`, and the `byte[]` vectors provided for indexing must
have size `dim/8`, where each byte element represents `8` bits of
the vector.
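The encoding and distance rules above can be sketched in a few lines. The helpers `pack_bits` and `hamming` are hypothetical, and packing the most significant bit first is an assumption of this example, not something stated here. The score normalization applied to indexed vectors is not reproduced.

```python
import math


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 bits per byte (MSB first, assumed)."""
    if len(bits) % 8 != 0:
        raise ValueError("number of dimensions must be divisible by 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        out.append(byte)
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance: xor each byte pair, then sum the popcounts."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# A 16-dimensional bit vector becomes dim/8 = 2 bytes.
packed = pack_bits([1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
hex_form = packed.hex()  # hexadecimal string form of the same vector

other = bytes.fromhex("00ff")
l1norm = hamming(packed, other)  # for scripts, l1norm equals hamming distance
l2norm = math.sqrt(l1norm)       # and l2norm is sqrt(l1norm)
```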
closes: https://github.com/elastic/elasticsearch/issues/48322