When using a pre-filter with nested kNN vectors, it's treated like a
top-level filter; that is, it is applied over parent document fields.
However, there are times when the applied query filter may or may
not match internal nested or non-nested docs. We failed to handle this
case correctly, and users would receive an error.
closes: https://github.com/elastic/elasticsearch/issues/105901
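To make the failure mode concrete, here is a small Python model of the block layout Lucene uses for nested documents, where each nested doc is indexed immediately before its parent. It assumes, as one plausible reading of the fix rather than its actual code, that filter matches landing on nested docs are dropped before the parent-level filter is translated down to the nested vector docs. Both function names are hypothetical.

```python
def children_of(parent: int, parent_ids: list[int]) -> list[int]:
    """Return the nested doc ids indexed just before `parent` in its block."""
    ordered = sorted(parent_ids)
    idx = ordered.index(parent)
    start = ordered[idx - 1] + 1 if idx > 0 else 0
    return list(range(start, parent))


def parent_filter_to_child_filter(filter_matches: set[int], parent_ids: list[int]) -> list[int]:
    """Translate a parent-level pre-filter into the nested docs it selects."""
    parents = set(parent_ids)
    # Keep only matches that are parent docs; a hit on a nested doc is not a
    # valid parent match, so it is dropped here instead of raising an error.
    matched_parents = sorted(doc for doc in filter_matches if doc in parents)
    nested_docs: list[int] = []
    for parent in matched_parents:
        nested_docs.extend(children_of(parent, parent_ids))
    return nested_docs
```

With parents at doc ids 2, 6 and 7, a filter matching docs {1, 2, 7} selects only the nested docs of parent 2 (docs 0 and 1); the stray match on nested doc 1 is ignored, and parent 7 has no nested docs.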
First check whether the full cluster supports a specific indicator (feature) before marking that indicator as "unknown" when its (meta)data is missing from the cluster state.
We are adding a query parameter to the field_caps API in order to filter out
fields with no values. The parameter is called `include_empty_fields` and
defaults to true; if set to false, it filters out from the field_caps
response all the fields that have no value in the index.
We keep track of FieldInfos during refresh in order to know which fields have
values in an index. We also added a system property
`es.field_caps_empty_fields_filter` in order to disable this feature if needed.
---------
Co-authored-by: Matthias Wilhelm <ankertal@gmail.com>
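The filtering rule above can be sketched as a pure function. The dictionaries standing in for the mappings and for the set of fields seen in FieldInfos are assumptions for illustration; the real implementation works on per-shard FieldInfos gathered during refresh.

```python
def field_caps(mappings: dict[str, str], fields_with_values: set[str],
               include_empty_fields: bool = True) -> dict[str, str]:
    """Return field name -> type, optionally dropping fields that never received a value."""
    if include_empty_fields:
        # Default behaviour: every mapped field is reported.
        return dict(mappings)
    # Only fields seen in the index's FieldInfos (at least one value) survive.
    return {name: ftype for name, ftype in mappings.items() if name in fields_with_values}
```

Against a real cluster, the equivalent request would be something like `GET my-index/_field_caps?fields=*&include_empty_fields=false`, where `my-index` is a placeholder.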
To improve the cross-cluster search user experience, Kibana needs an endpoint that is accessible
by arbitrary Kibana dashboard search users and provides:
1. a listing of clusters in scope for a CCS query (based on the index expression and whether
there are any indices on each cluster that the Kibana user has access to query).
2. whether each cluster is currently connected to the querying cluster (will it come back as
skipped or failed in a CCS search?).
3. the skip_unavailable setting for those clusters (so you can know whether it will
return skipped or failed in a CCS search).
4. the ES version of the cluster.
Since no single Elasticsearch endpoint provides all of these features, this PR creates a new endpoint `_resolve/cluster` that works alongside the existing `_resolve/index` endpoint
(and leverages some of its features).
Example usage against a cluster with 2 remote clusters configured:
GET /_resolve/cluster/*,remote*:bl*
Response:
{
  "(local)": {
    "connected": true,
    "skip_unavailable": false,
    "matching_indices": true,
    "version": {
      "number": "8.12.0-SNAPSHOT",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  },
  "remote2": {
    "connected": true,
    "skip_unavailable": true,
    "matching_indices": true,
    "version": {
      "number": "8.12.0-SNAPSHOT",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  },
  "remote1": {
    "connected": true,
    "skip_unavailable": false,
    "matching_indices": false,
    "version": {
      "number": "8.12.0-SNAPSHOT",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  }
}
Almost all errors show up as "error" entries in the response.
Only the local SecurityException returns a 403 since that happens before the ResolveCluster
Transport code kicks in.
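A client such as Kibana could use the response to predict how each cluster will behave in a CCS search. This is a minimal sketch: `ccs_outcome` is a hypothetical helper, the `version` objects are omitted, and it assumes that a per-cluster failure appears as an `"error"` key inside that cluster's entry.

```python
def ccs_outcome(resolved: dict[str, dict]) -> dict[str, str]:
    """Predict how each cluster would behave in a CCS search."""
    outcome: dict[str, str] = {}
    for cluster, info in resolved.items():
        if "error" in info:
            outcome[cluster] = "error"
        elif not info.get("connected", False):
            # Disconnected clusters are skipped or fail depending on skip_unavailable.
            outcome[cluster] = "skipped" if info.get("skip_unavailable", False) else "failed"
        elif not info.get("matching_indices", False):
            outcome[cluster] = "no matching indices"
        else:
            outcome[cluster] = "searchable"
    return outcome
```

Applied to the example response, `(local)` and `remote2` are searchable while `remote1` has no matching indices; a disconnected cluster would be reported as skipped or failed according to its `skip_unavailable` setting.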
Makes the task_type element of the _inference API optional so that
it is possible to GET, DELETE or POST to an inference entity without
providing the task type.
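The two URL shapes this enables can be sketched with a small path builder. The helper name and the inference id are hypothetical; the sketch assumes the path layout `_inference/{task_type}/{inference_id}`, with the task type segment now optional.

```python
from typing import Optional


def inference_path(inference_id: str, task_type: Optional[str] = None) -> str:
    """Build an _inference endpoint path; task_type may be omitted."""
    if task_type is None:
        return f"/_inference/{inference_id}"
    return f"/_inference/{task_type}/{inference_id}"
```

For example, `inference_path("my-elser")` gives `/_inference/my-elser`, while passing a task type keeps the longer form.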
This adds two new vector index types:
- flat
- int8_flat
Both store the vectors in a flat space, and search is a brute-force scan over
the vectors in the index. For the regular `flat` index, this can be
considered syntactic sugar that allows `knn` queries without having to
index the vectors into an HNSW graph.
For `int8_flat`, this allows float vectors to be stored in a flat
manner but also automatically quantized.
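The difference between the two types can be sketched in a few lines: `flat` scores every stored vector exactly, and `int8_flat` does the same scan over int8 codes. The quantizer below is a simplified min/max scheme chosen for illustration; Elasticsearch's actual int8 quantization differs in detail, and all function names are hypothetical.

```python
import math


def l2(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(query, vectors, k):
    """Score every stored vector (no HNSW graph) and return the k nearest ids."""
    order = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return order[:k]


def quantize_int8(vectors):
    """Simplified min/max scalar quantization of float vectors into [-128, 127]."""
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    scale = 255.0 / (hi - lo) if hi > lo else 1.0
    quantized = [[round((x - lo) * scale) - 128 for x in v] for v in vectors]
    return quantized, lo, scale


def quantize_query(query, lo, scale):
    """Map a query with the same parameters, clamping values outside the indexed range."""
    return [max(-128, min(127, round((x - lo) * scale) - 128)) for x in query]
```

Because both variants visit every vector, results are exact with respect to the stored representation; the int8 codes trade some precision for a quarter of the float32 storage.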
Yaml tests executed in mixed clusters need to skip clusters that run 8.12.x or earlier versions. The yaml tests assume hashing-based time series ids, but if a node in the test cluster is on 8.12.x or earlier, pre-hashing time series ids may be used (depending on the version of the elected master node).
Tsdb yaml tests that assert the _id or _tsid should be skipped if there are 8.12.x nodes in the mixed test cluster.
Rolling upgrade or full upgrade tests are better for asserting the _id or _tsid in this case, because tests are set up prior to the upgrade and pre-8.12.x logic can be asserted in a more controlled way.
Closes #105129
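The skip condition boils down to a version comparison over all nodes in the mixed cluster. This hypothetical helper is only a sketch of that rule; the actual skip is expressed in the yaml test framework's own syntax.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    # Strip qualifiers such as "-SNAPSHOT" before comparing numerically.
    major, minor, patch = version.split("-")[0].split(".")
    return int(major), int(minor), int(patch)


def skip_tsid_assertions(node_versions: list[str]) -> bool:
    """True if any node in the mixed cluster runs 8.12.x or earlier."""
    return any(parse_version(v) < (8, 13, 0) for v in node_versions)
```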
A Lucene limitation on doc values for UTF-8 fields does not allow us to
write keyword fields whose size is larger than 32K. This limits our
ability to map more than a certain number of dimension fields for time
series indices. Before this change, the tsid was created as a
concatenation of dimension field names and values into a keyword field.
To overcome this limitation we hash the tsid. This PR is intended to be
used as a draft to test different options.
Note that, as a side effect, this reduces the size of the tsid field as
a result of storing far less data when the tsid is hashed. However, we
expect tsid hashing to affect compression of doc values and result in a
larger storage footprint. The effect on query latency needs to be evaluated
too.
Resolves #93564
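The size argument can be illustrated by comparing a concatenated tsid with a hashed one. SHA-256 and the `name=value` encoding are stand-ins chosen for illustration; the hash function and exact encoding used by Elasticsearch are not described here, and the helper names are hypothetical.

```python
import hashlib

# Assumed limit: Lucene's maximum term length, 32766 bytes.
MAX_TERM_BYTES = 32766


def concatenated_tsid(dimensions: dict[str, str]) -> bytes:
    """Pre-hashing style: all dimension names and values, sorted by name, in one value."""
    return ",".join(f"{k}={v}" for k, v in sorted(dimensions.items())).encode("utf-8")


def hashed_tsid(dimensions: dict[str, str]) -> bytes:
    """Hash the same canonical form into a fixed-size identifier."""
    return hashlib.sha256(concatenated_tsid(dimensions)).digest()
```

With 2,000 dimensions of 20-character values, the concatenated form is roughly 60,000 bytes and exceeds the limit, while the hashed form stays at 32 bytes. Sorting by name first keeps the id stable regardless of field order.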
`name` is de facto required for `collapse.inner_hits`. It always has been, but we have never validated up front. Instead we accidentally try to serialize `null`, which leads to exciting and confusing errors.
closes: https://github.com/elastic/elasticsearch/issues/104647
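Up-front validation amounts to rejecting any `inner_hits` entry without a name before the request is serialized. This sketch works on a plain dict version of the `collapse` section; the function name and error text are hypothetical.

```python
def validate_collapse(collapse: dict) -> None:
    """Fail fast when any collapse.inner_hits entry has no name."""
    inner_hits = collapse.get("inner_hits", [])
    if isinstance(inner_hits, dict):  # accept a single object or an array of objects
        inner_hits = [inner_hits]
    for i, entry in enumerate(inner_hits):
        if not entry.get("name"):
            raise ValueError(f"[collapse.inner_hits][{i}] requires a [name]")
```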
Deprecated the node_version field and made it optional (unused) in the new parser
Added a deprecation warning handler for mixed clusters
Split tests for old vs. current format
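The parser change can be sketched as "accept, warn, ignore". This uses Python's `warnings` module in place of Elasticsearch's deprecation logging, and the body layout and function name are hypothetical.

```python
import warnings


def parse_body(body: dict) -> dict:
    """Accept the deprecated node_version field for compatibility, warn, and drop it."""
    parsed = dict(body)
    if "node_version" in parsed:
        warnings.warn("[node_version] is deprecated and ignored", DeprecationWarning, stacklevel=2)
        del parsed["node_version"]
    return parsed
```

Old-format requests (for example from nodes in a mixed cluster) still parse, and the current format carries no warning at all.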
- Introduce new internal system index called .connector-secrets
- Add GET and POST requests for connector secrets
- Add permission sets for read and write connector secrets
- Create read_connector_secrets and write_connector_secrets role permissions
* Add extract match ranges functionality to Grok.
* TestGrokPatternAction and Request
* TestGrokPattern response
* Update docs/changelog/104394.yaml
* Polish validation error message
* Improve test_grok_pattern API
* Add explicit CharSet
* Add endpoint to operator constants
* Add TransportTestGrokPatternActionTests
* REST API spec
* One more TransportTestGrokPatternActionTest
* Fix API spec
* Refactor REST API spec
* Polish code
* Replace TransportTestGrokPatternActionTests by a YAML REST test
* Add ecs_compatibility
* Always return arrays in the API
* Documentation
* YAML test for ecs_compatibility
* Rename doc file
* serverless scope
* Fix docs (hopefully)
* Update docs/reference/rest-api/index.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Add "text structure APIs" header in docs TOC
* Move file
* Remove test grok from main index
* typo
* Nested APIs underneath text structure
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Closes #97032
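"Extract match ranges" means reporting, for each named capture, the matched text together with its offset and length. This sketch uses Python's `re` named groups in place of a compiled Grok pattern, returns arrays per field in line with the "Always return arrays" bullet, and assumes character offsets; the response keys are assumptions modeled loosely on the API.

```python
import re


def match_ranges(pattern: str, text: str) -> dict:
    """Return each named capture with its offset and length, always as lists."""
    m = re.search(pattern, text)
    if m is None:
        return {"matched": False}
    fields: dict[str, list[dict]] = {}
    for name, value in m.groupdict().items():
        if value is None:  # optional group that did not participate
            continue
        start, end = m.span(name)
        fields.setdefault(name, []).append({"match": value, "offset": start, "length": end - start})
    return {"matched": True, "fields": fields}
```

For `"Dave Roberts"` and a pattern capturing `first_name` and `last_name`, this reports `Dave` at offset 0 (length 4) and `Roberts` at offset 5 (length 7).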
Adding the ability to set `require_data_stream` parameter (boolean) on bulk and indexing APIs.
For document indexing, this flag requires the indexing operation to either be pointed at a data stream, or match a template that will create a data stream.
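The acceptance rule can be sketched as follows. Data streams, concrete indices and data-stream template patterns are modeled as plain sets and glob patterns, which is an assumption; the function name and error text are hypothetical.

```python
from fnmatch import fnmatchcase


def check_index_target(target: str, require_data_stream: bool, data_streams: set[str],
                       indices: set[str], data_stream_patterns: list[str]) -> None:
    """Raise if require_data_stream is set and the write would not go to a data stream."""
    if not require_data_stream or target in data_streams:
        return
    # A concrete index is not a data stream, even if its name matches a template.
    if target not in indices and any(fnmatchcase(target, p) for p in data_stream_patterns):
        return  # a data stream would be created from the matching template
    raise ValueError(f"require_data_stream is true but [{target}] is not a data stream")
```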
This fixes a problem with the async tests for `drop_null_columns`
caused by us not passing the option when fetching from the async index.
This option wasn't actually supported so I had to plumb that through as
well.
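The option itself drops columns that are null in every returned row, and the point of the fix is that fetching from the async index must apply the same rule. The column/row representation here is an assumption for illustration.

```python
def drop_null_columns(columns: list[str], rows: list[list]) -> tuple[list[str], list[list]]:
    """Remove columns whose values are null in every row (with no rows, all columns go)."""
    keep = [i for i in range(len(columns)) if any(row[i] is not None for row in rows)]
    return [columns[i] for i in keep], [[row[i] for i in keep] for row in rows]
```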
The test is unnecessarily complicated with its vectors. This simplifies
the vectors and the test. We mainly care about extreme weirdness &
server level failures.
closes: https://github.com/elastic/elasticsearch/issues/104297
We recently introduced support for index_filter to the open point in time API (see #102388).
Open point in time supports executing against remote indices, in which case it will open a
reader context against the target remote shards. With support for index_filter, shards that
cannot match the filter are not even included in the PIT id that open PIT returns.
When a subsequent search that includes such a PIT id is executed, one search shards call
is performed per cluster, which returns all shards from the targeted indices, including those
that open PIT has filtered out. In that case, we should just ignore those shards instead of
throwing an exception when they are looked up in the search context id map built from the PIT id.
Closes #102596
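The fix amounts to treating a missing lookup as "not part of this PIT" rather than as an error. Shards are modeled as `(cluster, index, shard)` tuples and the PIT id as a dict of context ids; both representations and the function name are hypothetical.

```python
def shards_to_search(search_shards, pit_context_ids):
    """Pair each shard with its PIT reader context, skipping shards the PIT excluded."""
    targets = []
    for shard in search_shards:
        ctx = pit_context_ids.get(shard)
        if ctx is None:
            # Filtered out by index_filter when the PIT was opened: ignore it
            # instead of throwing an exception.
            continue
        targets.append((shard, ctx))
    return targets
```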
This adds tests that run our suite of yaml tests against the ESQL
async endpoint. That's quite nice because the yaml tests are where we
handle lots of fun error cases and this'll make sure async does sensible
things in those cases.
The assertion here:
38867b7fb6/server/src/main/java/org/elasticsearch/search/fetch/FetchSearchResult.java (L64)
Assumes that profiling should never be overridden. However, when using
kNN & inner_hits we execute separate fetch phases for those inner_hits.
To avoid tripping this assertion, I propose we have the inner hits context
return `null` for the profiler. The top-level profiler is all we care
about with `fetch`, and that is what we return to the user.
NOTE: Since this is an assertion, and the top-level fetch is unaffected,
production environments are affected neither by the original bug nor by this
fix.
closes: https://github.com/elastic/elasticsearch/issues/103985
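The proposed behaviour can be modeled with a result that accepts a profile at most once and fetch contexts where only the top level exposes a profiler. This is a simplified sketch: the class and method names are hypothetical, and sharing one result object between the top-level and inner-hits fetches is a modeling shortcut, not how `FetchSearchResult` is wired.

```python
class FetchResult:
    def __init__(self):
        self.profile = None

    def set_profile(self, profile):
        # Mirrors the assertion described above: the profile is set at most once.
        assert self.profile is None, "profile already set"
        self.profile = profile


class FetchContext:
    def __init__(self, profiling: bool, inner_hits: bool = False):
        self.profiling = profiling
        self.inner_hits = inner_hits

    def profiler(self):
        # Inner-hits fetch phases report no profiler; only the top level does.
        if not self.profiling or self.inner_hits:
            return None
        return {"phase": "fetch"}


def run_fetch(result: FetchResult, ctx: FetchContext) -> None:
    profiler = ctx.profiler()
    if profiler is not None:
        result.set_profile(profiler)
```

Running the top-level fetch followed by an inner-hits fetch leaves the top-level profile in place without tripping the assertion.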