Commit graph

3650 commits

Author SHA1 Message Date
Martijn van Groningen
2b1170509b
Change subobjects yaml tests to use composable index templates. (#112129)
The tests currently use legacy templates, which are deprecated.
2024-08-23 17:17:54 +07:00
Quentin Pradet
92d25c157a
Fix id and routing types in indices.split YAML tests (#112059) 2024-08-23 12:46:52 +04:00
Kostas Krikellas
1362d56865
Introduce mode subobjects=auto for objects (#110524)
* Introduce mode `subobjects=auto` for objects

* Update docs/changelog/110524.yaml

* compilation error

* tests and fixes

* refactor

* spotless

* more tests

* fix nested objects

* fix test

* update fetch test

* add QA coverage

* update tests

* update tests

* update tests

* fix nested
2024-08-22 15:13:52 +03:00
Oleksandr Kolomiiets
27721c3c05
Add a test reproducing issue with lookup of parent document in nested field synthetic source (#112043) 2024-08-21 08:36:55 -07:00
Keith Massey
fac9b6a21e
Updating fix version for bulk api took time fix now that it has been backported (#111863) (#111899) (#111906) 2024-08-14 12:59:01 -05:00
Keith Massey
e63225ae32
Fixing incorrect bulk request took time (#111863) 2024-08-14 10:39:45 -05:00
Jim Ferenczi
6ee9801a99
Update the intervals query docs (#111808)
Since https://github.com/apache/lucene-solr/pull/620, intervals disjunctions are automatically rewritten to handle cases where minimizations can miss valid matches.
This change updates the documentation to take this behaviour into account (users don't need to manually pull intervals disjunctions to the top anymore).
2024-08-13 13:39:55 +09:00
Benjamin Trent
d0bd1f2cb1
fixing data setup for knn yaml tests (#111794)
We should do the setup only in the test, as that is the only place that
uses this index. This way we get around any weird bwc checks around
previously required parameters.

Additionally, this adjusts the bwc version skip as the code fix has been
backported.

closes: https://github.com/elastic/elasticsearch/issues/111765
closes: https://github.com/elastic/elasticsearch/issues/111766
closes: https://github.com/elastic/elasticsearch/issues/111767
closes: https://github.com/elastic/elasticsearch/issues/111768
2024-08-13 06:38:14 +10:00
Kathleen DeRusso
4e26114764
Fix NullPointerException when doing knn search on empty index without dims (#111756)
* Fix NullPointerException when doing knn search on empty index without dims

* Update docs/changelog/111756.yaml

* Fix typo in yaml test

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-08-09 13:59:57 -04:00
Simon Cooper
b0c82f4054
Update docs with new behavior on skip conditions (#111640)
#111585 and #111268 change the behavior to skip on any node having the feature/capability, not all nodes
2024-08-07 10:37:59 +01:00
Simon Cooper
5da4f31a4b
Skip on any node capability being present (#111585)
Update capabilities skip behavior to skip on any node having the capability, not all nodes
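
The difference between the two skip rules can be sketched as follows. This is only an illustration of the "any node" versus "all nodes" semantics described above, not the test runner's actual code; `should_skip` and its `require_all` flag are hypothetical names.

```python
from typing import Iterable


def should_skip(node_has_capability: Iterable[bool], require_all: bool = False) -> bool:
    """Decide whether a test section is skipped (hypothetical helper).

    node_has_capability holds one flag per node in the cluster.
    require_all=True models the previous rule (every node must report the
    capability); the default models the new rule (one node is enough).
    """
    flags = list(node_has_capability)
    if require_all:
        # Assumption: an empty cluster never triggers a skip under the old rule.
        return bool(flags) and all(flags)
    return any(flags)


# Mixed cluster: only one of two nodes reports the capability.
mixed = [True, False]
skip_new_rule = should_skip(mixed)
skip_old_rule = should_skip(mixed, require_all=True)
```

In a mixed cluster the new rule skips while the old rule would not.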
2024-08-07 10:36:23 +01:00
Mike Pellegrini
e11fa74333
Gracefully handle invalid synonym rules in updateable synonyms (#110901)
Gracefully handle invalid synonym rules by setting lenient to true by default when synonyms are updateable

---------

Co-authored-by: carlosdelest <carlos.delgado@elastic.co>
2024-08-06 10:44:23 -04:00
David Turner
e3a2ce99de
Fix trappy timeouts in data stream APIs (#111474)
Relying on the default 30s timeout is trappy; we should be explicit
about the timeouts we're using in these requests.

Relates #107984
2024-08-06 21:13:06 +10:00
David Turner
586405d11f
Remove trappy timeout from ClusterSearchShardsRequest (#111442)
Exposes the `?master_timeout` parameter to the REST API and sets it
appropriately on internal/test requests.

Relates #107984
2024-07-31 08:53:24 +01:00
Benjamin Trent
69c96974de
Ensure vector similarity correctly limits inner_hits returned for nested kNN (#111363)
For nested kNN we support not only similarity thresholds, but also
multi-passage search while retrieving more than one nearest passage. 

However, the inner_hits retrieved for the kNN search would ignore the
restricted similarity. As a result, the inner hits would return all
passages, not just the ones within the similarity limit, which is
confusing.

closes: https://github.com/elastic/elasticsearch/issues/111093
2024-07-30 06:01:56 +10:00
Nhat Nguyen
52834fe041
Relax assertions in segment level field stats (#111243)
This PR relaxes the assertions to allow an additional field 
introduced in serverless.
2024-07-24 12:58:11 -07:00
Nhat Nguyen
20094bfd8f
Use routing_table for allocated node in tests (#111217)
The previous fix, which uses the search API, doesn't work with the
indexing tier only. This change uses the routing table from the cluster
state instead. I have tested this change in a serverless environment.

Relates #111211
2024-07-24 10:04:27 +10:00
Oleksandr Kolomiiets
b8da526eda
Change the name of logsdb mapping test file to more specific (#111076) 2024-07-23 15:19:04 -07:00
Nhat Nguyen
8e07c4e572
Replace search_shards with search API in tests (#111211)
The `search_shards` API is not available in serverless. This PR replaces
its usage in the newly added test with the `search` API with profiling.

Relates #111123
2024-07-24 06:14:52 +10:00
Nhat Nguyen
f275dff609
Add Lucene segment-level fields stats (#111123)
This change returns the total number of fields at the segment level, 
allowing for a more accurate estimate of the memory used by Lucene. The
new estimate is expected to be closer to the actual memory usage than
the current estimate using the index-level field count, due to the
non-trivial overhead incurred by each Lucene segment. Two new fields are
introduced: total_segment_fields, which is the total number of fields at
the segment level, and average_fields_per_segment. The overhead per
field in segments with fewer fields is larger than in segments with many
fields.
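
A minimal sketch of how the two new values relate, assuming that average_fields_per_segment is simply total_segment_fields divided by the number of segments (the exact computation is not spelled out here). The function name and the input list are hypothetical.

```python
from typing import Dict, List, Union


def segment_field_stats(fields_per_segment: List[int]) -> Dict[str, Union[int, float]]:
    """Aggregate per-segment field counts into the two reported values.

    fields_per_segment has one entry per Lucene segment (hypothetical input).
    """
    total = sum(fields_per_segment)
    # Assumption: with no segments the average is reported as 0.0.
    average = total / len(fields_per_segment) if fields_per_segment else 0.0
    return {
        "total_segment_fields": total,
        "average_fields_per_segment": average,
    }


stats = segment_field_stats([10, 20, 30])
```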
2024-07-23 08:52:39 -07:00
Oleksandr Kolomiiets
344d846c5b
Fix remaining references to logs index mode (#111164) 2024-07-22 12:28:10 -07:00
Keith Massey
a2814e816b
Adding mapping validation to the simulate ingest API (#110606) 2024-07-19 08:08:21 -05:00
Salvatore Campagna
0f584176ca
Rename logs index mode to logsdb (#111054) 2024-07-19 13:38:58 +02:00
Salvatore Campagna
9332a937e1
test: re-enable test after backport #11031 (#111035) 2024-07-19 10:20:43 +02:00
Enrico Zimuel
39aa832400
Changed security API endpoints to stable (#110862) 2024-07-18 15:24:36 +02:00
Tommaso Teofili
0289ca68b8
Dense vector field types updatable for int4 (#110928) 2024-07-18 13:54:32 +02:00
Salvatore Campagna
ac2afd7633
Inject host.name field without relying on (component) templates (#110938)
We do not want to rely on templates or component templates to include
the host.name field in indices using LogsDB. The host.name field is a field
we sort on by default when LogsDB is used. As a result, we just inject it
by default, the same way we do for the @timestamp field. This prevents
sorting errors due to missing host.name field in mappings.

The host.name is a keyword field and depending on the value of subobjects it will
be mapped as a name keyword nested inside a host or as a flat host.name keyword.
We also include ignore_above as we normally do for keywords in observability mappings.
2024-07-18 12:47:51 +02:00
Joe Gallo
27e7601698
Directly download commercial ip geolocation databases from providers (#110844)
Co-authored-by: Keith Massey <keith.massey@elastic.co>
2024-07-17 20:55:14 -04:00
Benjamin Trent
28c7cbccce
Make empty string searches be consistent with case (in)sensitivity (#110833)
If we determine that the searchable term is completely empty, we switch back to a regular term query. This way we return the same docs as expected when we do a case sensitive search.

closes: #108968
2024-07-17 15:20:57 -04:00
Oleksandr Kolomiiets
ed0f3d0f70
Revert "Fix logsdb mapping rest tests on serverless (#110900)" (#110931)
This reverts commit 1bb58ccff0.
2024-07-16 09:49:52 -07:00
Oleksandr Kolomiiets
1bb58ccff0
Fix logsdb mapping rest tests on serverless (#110900)
Currently fails due to validation that is only performed in serverless:

```
java.lang.AssertionError: Failure at [logsdb/20_mapping:94]: 
Expected: "Failed to parse mapping: Indices with with index mode [logs] only support synthetic source"
     but: was "Failed to parse mapping: Parameter [mode=disabled] is not allowed in source"
```
2024-07-16 08:15:33 +10:00
Oleksandr Kolomiiets
a25ed530e5
Add validation for synthetic source mode in logs mode indices (#110677) 2024-07-15 11:11:57 -07:00
Benjamin Trent
c2e1ab8934
Correct tests, skipping on cluster features in mixed clusters is buggy (#110747)
Cluster feature "skip" just doesn't work as expected in a mixed cluster
scenario. It could be that the request is handled by a new node. I
honestly don't know what's happening there.

This adjusts the tests so that we verify that `allow_unmapped_fields`
modifies the behavior as expected.

closes: https://github.com/elastic/elasticsearch/issues/110720
closes: https://github.com/elastic/elasticsearch/issues/110719
2024-07-12 22:39:14 +10:00
Nhat Nguyen
1964be565c
Allow querying index_mode (#110676)
This change allows querying the `index.mode` setting via a new 
`_index_mode` metadata field, enabling APIs such as `field_caps` or
`resolve_indices` to target indices that are either time_series or logs
only. This approach avoids adding and handling a new parameter for
`index_mode` in these APIs. Both ES|QL and the `_search` API should also
work with this new field.
2024-07-10 16:45:11 -07:00
ghostspiders
3bd192c2e0
KnnVectorQueryBuilder support for allowUnmappedFields (#107047)
* KnnVectorQueryBuilder support for allowUnmappedFields

* Update and rename 106811.yaml to 107047.yaml

* Update 107047.yaml

* buildkite test

* spotless

* spotless

* Apply suggestions from code review

* fixing compilation

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
2024-07-10 09:41:15 -04:00
Moritz Mack
a4b3e6ffb5
Use valid documentation url for capabilities in rest specs (#110657) 2024-07-10 09:30:45 +02:00
Benjamin Trent
9dbe97b2cb
Fix flaky test #109978 (#110245)
CCS tests could split the vectors over any number of shards. Through
empirical testing, I determined that this commit's values work to provide
the expected order, even if they are not all part of the same shard.

Quantization can have weird behaviors when there are uniform values,
just like this test does.

closes #109978
2024-07-09 07:28:31 +10:00
Johannes Fredén
89cd966b24
Add bulk delete roles API (#110383)
* Add bulk delete roles API
2024-07-03 11:04:53 +02:00
Albert Zaharovits
566f5f831a
Query Roles API (#108733)
This adds the Query Roles API:

```
POST /_security/_query/role
GET /_security/_query/role
```

This is similar to the currently existing:
* [Query API key API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-api-key.html)
* [Query User API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-user.html)

Sample request:

```
POST /_security/_query/role
{
  "query": { 
    "bool": {
      "filter": [
        {
          "terms": {
            "applications.application": ["app-1", "app-2" ]
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "description": {
              "query": "test match on role description (which is mapped as a text field)"
            }
          }
        }
      ]
    }
  },
  "sort": [ 
    "name"
  ],
  "search_after": [
    "role-name-1"
  ]
}
```

The query supports a subset of query types, including match_all, bool,
term, terms, match, ids, prefix, wildcard, exists, range, and simple
query string.

Currently, the supported fields are:
* name
* description
* metadata
* applications.application
* applications.resources
* applications.privileges

The query also supports pagination-related fields (`from`, `size`,
`search_after`), analogous to the generic [Search
API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html).

The response format is similar to that of the [Query API
key](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-api-key.html)
and [Query
User](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-query-user.html)
APIs. It contains a **list** of roles, in the sorted order (if
specified). Unlike the [Get Roles
API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-get-role.html),
the role **name** is an attribute of the element in the list of roles
(in the get-roles API case, the role name was the key in the response
map, and the value was the rest of the role descriptor). In addition,
each element in the list of roles also contains the optional `_sort`
field. Sample response:

```
{
  "total": 3,
  "count": 3,
  "roles": [
    {
      "name": "LYdz2",
      "cluster": [],
      "indices": [],
      "applications": [
        {
          "application": "ejYWvGQTF",
          "privileges": [
            "pRCfBMgOy",
            "zDhFtMQfc",
            "roudxado"
          ],
          "resources": [
            "nWHEpmgxy",
            "SOML/hMYrqx",
            "YIqP/*",
            "ueEomwsA"
          ]
        },
        {
          "application": "ampUW9",
          "privileges": [
            "jDvRtp"
          ],
          "resources": [
            "99"
          ]
        }
      ],
      "run_as": [],
      "metadata": {
        "nFKc": [
          1,
          0
        ],
        "PExF": [],
        "qlqY": -433239865,
        "IQXm": []
      },
      "transient_metadata": {
        "enabled": true
      },
      "description": "KoLlsEbq",
      "_sort": [
        "LYdz2"
      ]
    },
    {
      "name": "oaxW0",
      "cluster": [],
      "indices": [],
      "applications": [
        {
          "application": "*",
          "privileges": [
            "qZYb"
          ],
          "resources": [
            "tFrSULaKb"
          ]
        },
        {
          "application": "aLaEN9",
          "privileges": [
            "fCOc"
          ],
          "resources": [
            "gozqXtSgE",
            "UX/JgydeIM",
            "sjUp",
            "Ivdz/UAmuNrQAG"
          ]
        },
        {
          "application": "rbxyuKIMPAp",
          "privileges": [
            "lluqieFRu",
            "xKU",
            "gHlb"
          ],
          "resources": [
            "99"
          ]
        }
      ],
      "run_as": [],
      "metadata": {},
      "transient_metadata": {
        "enabled": true
      },
      "_sort": [
        "oaxW0"
      ]
    },
    {
      "name": "vWAV1",
      "cluster": [],
      "indices": [],
      "applications": [
        {
          "application": "*",
          "privileges": [
            "kWBWjCAc"
          ],
          "resources": [
            "hvEtV",
            "gZJ"
          ]
        },
        {
          "application": "avVUV9",
          "privileges": [
            "newZTa",
            "gQpxNm"
          ],
          "resources": [
            "99"
          ]
        }
      ],
      "run_as": [],
      "metadata": {},
      "transient_metadata": {
        "enabled": true
      },
      "_sort": [
        "vWAV1"
      ]
    }
  ]
}
```
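
Paging through the results with `search_after` could look roughly like the sketch below. It assumes the next page is requested by passing the `_sort` values of the last returned role, as in the generic Search API. `iter_roles` and `fetch_page` are hypothetical names; `fetch_page` stands in for whatever client call sends the body to `POST /_security/_query/role`, and the fake backend exists only to make the example runnable.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

RoleQuery = Dict[str, Any]


def iter_roles(fetch_page: Callable[[RoleQuery], Dict[str, Any]],
               page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every role, one page at a time, sorted by name."""
    search_after: Optional[List[Any]] = None
    while True:
        body: RoleQuery = {"sort": ["name"], "size": page_size}
        if search_after is not None:
            body["search_after"] = search_after
        roles = fetch_page(body).get("roles", [])
        if not roles:
            return  # an empty page means there is nothing left
        yield from roles
        # Continue after the sort values of the last role on this page.
        search_after = roles[-1]["_sort"]


# In-memory stand-in for the endpoint, for illustration only.
_ROLE_NAMES = ["role-a", "role-b", "role-c", "role-d", "role-e"]


def fake_fetch_page(body: RoleQuery) -> Dict[str, Any]:
    after = body.get("search_after")
    remaining = [n for n in _ROLE_NAMES if after is None or n > after[0]]
    page = remaining[: body["size"]]
    return {"roles": [{"name": n, "_sort": [n]} for n in page]}


names = [role["name"] for role in iter_roles(fake_fetch_page, page_size=2)]
```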
2024-07-03 01:59:11 +10:00
Johannes Fredén
55476041d9
Add BulkPutRoles API (#109339)
* Add BulkPutRoles API
2024-07-02 15:45:39 +02:00
Martijn van Groningen
e0d71d660d
Disallow index.time_series.end_time setting from being set or updated in normal indices (#110268)
The index.mode setting validates other index settings. When updating the index.time_series.end_time setting while the index.mode setting wasn't defined at index creation time (meaning that the default is active), this validation is skipped, which results in (worse) errors at a later point in time.

This problem is fixed by making the index.mode setting a dependency of the index.time_series.end_time setting.

Note that this problem doesn't exist for the index.time_series.start_time and index.routing_path index settings, because these index settings are final, which means they can only be defined when an index is being created.

Closes #110265
2024-07-02 12:19:09 +02:00
Kostas Krikellas
e3caeed2b6
Fix sort on nested test (#110331)
* Add test for nested array, fix sort on nested test.

* Fix sort on nested test.
2024-07-01 15:00:15 +03:00
Kostas Krikellas
5fa92812cf
Add test for nested array, fix sort on nested test. (#110325) 2024-07-01 12:08:04 +03:00
Kostas Krikellas
6ae652f90e
Support index sorting with nested fields (#110251)
This PR piggy-backs on recent changes in Lucene 9.11.1
(https://github.com/apache/lucene/pull/12829,
https://github.com/apache/lucene/pull/13341/), setting the parent doc
when nested fields are present. This allows moving nested documents
along with parent ones during sorting.

With this change, sorting is now allowed on fields outside nested
objects. Sorting on fields within nested objects is still not supported
(throws an exception).

Fixes #107349
2024-07-01 17:24:17 +10:00
Mayya Sharipova
405e39660b
Support k parameter for knn query (#110233)
Introduce an optional `k` param for the knn query.

If `k` is not set, the knn query keeps its previous behaviour:
- `num_candidates` docs are collected from each shard. These `num_candidates` docs
are used for combining results with other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results

If `k` is set, the behaviour is instead as follows:
- `k` docs are collected from each shard. These `k` docs are used for
combining results with other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.

Having the `k` param makes it more intuitive for users to address their needs.
They also don't need to care about the `num_candidates` param for this query and can skip it,
as it is more of an internal detail for tuning how knn search operates.
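
The two collection modes can be modelled with a small sketch. This is a simplified illustration of the per-shard collection and global merge described above, not how Elasticsearch implements it: it ignores approximate search and the combination with other queries and aggregations. `knn_top_hits` and the `(doc_id, score)` hit tuples are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

Hit = Tuple[str, float]  # (doc_id, score), higher score is better


def knn_top_hits(shards: List[List[Hit]], size: int,
                 k: Optional[int] = None,
                 num_candidates: Optional[int] = None) -> List[Hit]:
    """Collect hits per shard, then merge them into the global top `size`."""
    # With k set, each shard contributes k docs; otherwise num_candidates docs.
    per_shard = k if k is not None else num_candidates
    if per_shard is None:
        raise ValueError("either k or num_candidates must be given")
    collected: List[Hit] = []
    for shard_hits in shards:
        collected.extend(heapq.nlargest(per_shard, shard_hits, key=lambda h: h[1]))
    return heapq.nlargest(size, collected, key=lambda h: h[1])


shards = [
    [("a", 0.9), ("b", 0.5), ("c", 0.1)],
    [("d", 0.8), ("e", 0.7)],
]
with_k = knn_top_hits(shards, size=2, k=1)
without_k = knn_top_hits(shards, size=3, num_candidates=2)
```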

Closes #108473
2024-06-28 09:59:28 -04:00
Kathleen DeRusso
959d07f5ee
Rename query rules namespace in rest api spec (#110208)
* Rename query rules namespace in rest api spec

* Rename per Specification PR feedback
2024-06-28 08:19:08 -04:00
Alexander Spies
2876e059f3
Aggs: Improve scripted metric agg allow list tests (#110153)
* Add an override to the aggs tests to override the allow list default setting. This makes it possible to run the scripted metric aggs tests on Serverless, even when we disallow these aggs by default on Serverless.
* Move the allow list tests next to the scripted metric tests since these belong together.
2024-06-28 11:47:30 +02:00
Oleksandr Kolomiiets
736357a9fb
Handle ignore_above in synthetic source for flattened fields (#110214) 2024-06-27 10:11:26 -07:00
Benjamin Trent
5add44d7d1
Adds new bit element_type for dense_vectors (#110059)
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization-based codec works with this element type, which is
consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexadecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.

`bit` vectors support script usage and regular query usage. When
indexed, all comparisons done are `xor` and `popcount` summations (aka,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` to be
the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

Note, the dimensions expected by this element_type must always be
divisible by `8`, and the `byte[]` vectors provided for indexing must
have size `dim/8`, where each byte element represents `8` bits of
the vector.
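
As a rough illustration of the encoding and distance rules above, the sketch below decodes a hexadecimal string into `dims/8` bytes and computes the hamming distance with `xor` and a popcount. It is not the Elasticsearch implementation and leaves out the score transformation and normalization; all function names are hypothetical.

```python
import math


def decode_bit_vector(hex_string: str, dims: int) -> bytes:
    """Decode a hex-encoded bit vector and check it against `dims`."""
    if dims % 8 != 0:
        raise ValueError("dims must be divisible by 8")
    raw = bytes.fromhex(hex_string)
    if len(raw) != dims // 8:
        raise ValueError(f"expected {dims // 8} bytes, got {len(raw)}")
    return raw


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits: xor each byte pair, then sum the set bits."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def l2norm_from_l1norm(l1norm: float) -> float:
    """For bit vectors in scripts, l2norm is sqrt(l1norm)."""
    return math.sqrt(l1norm)


a = decode_bit_vector("ff00", dims=16)
b = decode_bit_vector("0f0f", dims=16)
distance = hamming_distance(a, b)        # l1norm equals the hamming distance
l2 = l2norm_from_l1norm(distance)
```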

closes: https://github.com/elastic/elasticsearch/issues/48322
2024-06-27 04:48:41 +10:00
Quentin Pradet
6d98e0d6b9
Fix trailing slash in two rollup specifications (#110176) 2024-06-26 12:29:19 +04:00