elasticsearch/docs/reference
Luca Cavanna f5a2af6c71
Query phase: fold collector wrappers into a single top level collector (#97030)
The query phase uses a number of different collectors and combines them, roughly one per feature that the search API exposes: there is a collector for post_filter, one for min_score, one for terminate_after, one for aggs. While this is very flexible, we always combine these collectors in the same way (e.g. terminate_after must be the first one, post_filter is only applied to top docs collection, min_score is applied to both aggs and top docs). This means that although we could compose collectors flexibly, we need to apply each feature predictably, which makes the composability unnecessary. Furthermore, composability adds complexity.

The terminate_after functionality is a clear example of the complexity introduced by a complex collector tree: it relies on a multi collector, and throws an exception to force terminating the collection for all other collectors in the tree. If there were a single collector aware of post_filter, min_score and terminate_after at the same time, we could simply reuse Lucene's mechanism for early termination of the collection (CollectionTerminatedException) instead of forcing the termination by throwing an exception that Lucene does not handle.

Furthermore, MultiCollector is a complex, generic collector for combining multiple collectors, while we only ever combine at most two collectors with it, and those are more or less fixed (e.g. top docs and aggs).

This PR introduces a new top-level collector that is inspired by MultiCollector in that it holds the top docs collector and the optional aggs collector, and applies post_filter, min_score and terminate_after as part of its execution. This gives us a collector specialized for our needs, with less flexibility and more control. The work surfaced some strange behaviour that we may want to change as a follow-up: terminate_after makes us keep collecting docs even when all possible collections have been early terminated. The goal of this PR, though, is feature parity with the query phase before the refactoring, without any change of behaviour.
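
The idea of a single collector that is aware of all three features can be illustrated with a minimal, self-contained sketch. It does not use Lucene or the classes from this PR: `DocCollector`, `QueryPhaseSketchCollector` and `CollectionTerminated` are hypothetical names invented for illustration (the last one stands in for Lucene's CollectionTerminatedException), and the ordering and counting rules shown here are assumptions that may differ from the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class SingleCollectorSketch {

    /** Hypothetical stand-in for Lucene's CollectionTerminatedException. */
    static class CollectionTerminated extends RuntimeException {}

    /** Hypothetical, simplified collector: receives a doc id and its score. */
    interface DocCollector {
        void collect(int doc, float score);
    }

    /** One collector that applies terminate_after, min_score and post_filter itself. */
    static class QueryPhaseSketchCollector implements DocCollector {
        private final DocCollector topDocs;
        private final DocCollector aggs;        // optional, may be null
        private final IntPredicate postFilter;  // optional, may be null
        private final Float minScore;           // optional, may be null
        private final int terminateAfter;       // 0 means disabled
        private int numCollected;

        QueryPhaseSketchCollector(DocCollector topDocs, DocCollector aggs,
                                  IntPredicate postFilter, Float minScore, int terminateAfter) {
            this.topDocs = topDocs;
            this.aggs = aggs;
            this.postFilter = postFilter;
            this.minScore = minScore;
            this.terminateAfter = terminateAfter;
        }

        @Override
        public void collect(int doc, float score) {
            // Assumption: terminate_after is checked first and signals early
            // termination with an exception the caller is expected to handle.
            if (terminateAfter > 0 && numCollected >= terminateAfter) {
                throw new CollectionTerminated();
            }
            // min_score applies to both top docs and aggs.
            if (minScore != null && score < minScore) {
                return;
            }
            numCollected++;
            // post_filter only restricts top docs collection.
            if (postFilter == null || postFilter.test(doc)) {
                topDocs.collect(doc, score);
            }
            if (aggs != null) {
                aggs.collect(doc, score);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> top = new ArrayList<>();
        List<Integer> agg = new ArrayList<>();
        QueryPhaseSketchCollector collector = new QueryPhaseSketchCollector(
            (d, s) -> top.add(d), (d, s) -> agg.add(d), d -> d % 2 == 0, 1.0f, 3);
        float[] scores = {2f, 0.5f, 2f, 2f, 2f};
        try {
            for (int doc = 0; doc < scores.length; doc++) {
                collector.collect(doc, scores[doc]);
            }
        } catch (CollectionTerminated e) {
            // The caller stops feeding docs, as a search loop would.
        }
        System.out.println("top docs: " + top);
        System.out.println("aggs: " + agg);
    }
}
```

In this sketch the wrapper logic that would otherwise be spread over several composed collectors lives in one `collect` method, so the order in which the features are applied is fixed in a single place.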

A nice benefit of this work is that it allows us to rely on CollectionTerminatedException for the terminate_after functionality. This simplifies the introduction of multi-threaded collector managers when it comes to handling exceptions.
2023-06-30 12:48:13 +02:00
..
aggregations Update histogram-aggregation docs (#96974) 2023-06-22 11:16:39 +02:00
analysis Add trim filter to allowed normalizer filters in docs (#96739) 2023-06-14 15:52:26 +02:00
autoscaling [DOCS] Updates ML decider docs by mentioning CPU as scaling criterion (#92018) 2022-11-30 13:37:20 +01:00
behavioral-analytics/apis Add beta label to Behavioral Analytics API reference (#96657) 2023-06-07 14:45:36 +02:00
cat [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
ccr [DOCS] CCR disaster recovery (#91491) 2023-04-21 10:02:54 +01:00
cluster Fix delete-desired-balance doc (#96978) 2023-06-27 10:12:15 +02:00
commands [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
data-management [DOC] auto migrate only for default template (#82043) 2022-05-10 11:35:19 -04:00
data-streams Start with data stream lifecycle documentation (#95326) 2023-06-28 16:18:05 +03:00
docs [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
eql [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
features/apis Fix typo (#91894) 2022-11-24 14:40:43 +01:00
fleet Fix some typos in plugins & reference docs (#84667) 2022-03-07 12:29:58 -05:00
graph Fix typo in Graph Explore API docs (#95907) 2023-05-08 15:38:35 +02:00
health Document the enhancements to ILM Health Indicator (#96980) 2023-06-27 10:54:36 +02:00
high-availability [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
how-to Add file extensions for vector search for preload (#96955) 2023-06-20 13:52:51 -04:00
ilm Start with data stream lifecycle documentation (#95326) 2023-06-28 16:18:05 +03:00
images Add Geospatial analysis overview documentation (#94486) 2023-03-20 10:01:13 -06:00
index-modules [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
indices Fix format of DiscoveryNode xcontent index version fields (#97223) 2023-06-29 16:53:38 +01:00
ingest Enable analytics geoip in behavioral analytics. (#96624) 2023-06-15 23:42:10 +02:00
licensing [DOCS] Remove testenv annotations from doc snippet tests (#80023) 2021-11-05 18:38:50 -04:00
mapping [DOCS] Make 2028 dims 'experimental' warning inline (#96369) 2023-05-30 10:13:38 +02:00
migration Migrate IndexMetadata.getCreationVersion to IndexVersion (#97139) 2023-06-29 08:38:50 +01:00
ml [DOCS] Adds API docs for bert_ja text embedding tokenizer option (#96873) 2023-06-26 11:36:08 +02:00
modules [DOCS] Note license requirements for CCS (#97252) 2023-06-29 16:55:10 -04:00
monitoring [DOCS] Update default monitoring method on Elastic Cloud (#95662) 2023-05-02 11:31:33 +02:00
query-dsl [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
release-notes Bump to version 8.10.0 2023-06-22 10:35:12 +01:00
repositories-metering-api [DOCS] Remove testenv annotations from doc snippet tests (#80023) 2021-11-05 18:38:50 -04:00
rest-api Start with data stream lifecycle documentation (#95326) 2023-06-28 16:18:05 +03:00
rollup [DOCS] Add downsampling reference to rollup docs (#91295) 2022-11-08 10:02:17 -05:00
scripting [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
search Query phase: fold collector wrappers into a single top level collector (#97030) 2023-06-30 12:48:13 +02:00
search-application/apis Update Search Application API docs to discuss warnings (#97188) 2023-06-29 09:16:07 -04:00
searchable-snapshots Clarify searchable snapshot repository reliability (#93023) 2023-01-19 14:31:01 +02:00
settings Start with data stream lifecycle documentation (#95326) 2023-06-28 16:18:05 +03:00
setup Add transport version to main response (#96900) 2023-06-20 16:36:04 -04:00
shutdown/apis [DOCS] Fix typo in shutdown-put.asciidoc (#94234) 2023-03-01 15:31:23 +01:00
slm/apis [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
snapshot-restore [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
sql [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
tab-widgets Add shards capacity troubleshooting guide (#95208) 2023-04-19 09:24:07 +02:00
text-structure/apis [ML] Unmute text-structure docs test (#92224) 2022-12-08 09:19:41 +00:00
transform [DOCS] Fixes transform scheduled_now documentation (#96766) 2023-06-12 16:07:30 +02:00
troubleshooting Suggest capturing a heap dump to diagnose high heap (#96526) 2023-06-02 09:43:52 -04:00
upgrade Docs for snapshots as simple archives (#86261) 2022-05-30 13:23:53 +02:00
vectors [DOCS] Warn about calling vector functions repeatedly (#91864) 2022-12-12 09:43:46 +01:00
aggregations.asciidoc Convert bucket aggs docs to runtime fields (#71202) 2021-04-02 12:12:06 -04:00
alias.asciidoc [DOCS] Explain how to change aliases in data streams documentation (#94110) 2023-03-21 15:34:00 +01:00
analysis.asciidoc Update Lucene analysis base url (#84094) 2022-02-17 12:44:12 +01:00
api-conventions.asciidoc Fix a typo in api-conventions example (#88056) 2022-06-27 13:58:51 -04:00
cat.asciidoc [DOCS] Add documentation for cat component templates (#95035) 2023-04-05 16:51:11 +02:00
cluster.asciidoc Generalise new cluster info endpoint (#96259) 2023-05-23 16:30:56 +02:00
data-management.asciidoc Start with data stream lifecycle documentation (#95326) 2023-06-28 16:18:05 +03:00
data-rollup-transform.asciidoc [DOCS] Remove ifdefs for rollup refactor 2021-08-05 09:08:04 -04:00
datatiers.asciidoc [+Doc] Troubleshooting / Hot Spotting (#95429) 2023-04-26 12:29:47 -06:00
dependencies-versions.asciidoc [DOCS] Replace dependencies list with a link. Closes #84863 (#90694) 2022-11-09 14:37:55 -08:00
docs.asciidoc [DOCS] Update single index APIs reference (#73103) 2021-05-14 11:53:34 -04:00
geospatial-analysis.asciidoc Add Geospatial analysis overview documentation (#94486) 2023-03-20 10:01:13 -06:00
gs-index.asciidoc
high-availability.asciidoc [DOCS] Overhaul snapshot and restore docs (#79081) 2021-11-15 12:45:07 -05:00
how-to.asciidoc Add guide for tuning kNN search (#89782) 2022-10-12 14:53:53 -07:00
index-custom-title-page.html Add Geospatial analysis overview documentation (#94486) 2023-03-20 10:01:13 -06:00
index-modules.asciidoc Trigger refresh when shard becomes search active (#96321) 2023-06-15 07:25:37 +02:00
index.asciidoc [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
index.x.asciidoc
indices.asciidoc [DOCS] Add Downsampling docs (#88571) 2022-10-12 12:10:16 -04:00
ingest.asciidoc [DOCS] Remove redirect pages (#88738) 2023-05-24 12:32:46 +01:00
intro.asciidoc [DOCS] Update ES intro for stretched clusters (#77651) 2021-09-13 16:50:08 -04:00
links.asciidoc [DOCS] Rename ES Reference to ES Guide (#71198) 2021-04-01 15:38:41 -04:00
mapping.asciidoc Minor revision missed in merge. (#67282) 2021-01-11 13:50:06 -05:00
query-dsl.asciidoc [DOCS] Adds reference documentation to the text expansion query (#96151) 2023-05-17 09:39:23 +02:00
redirects.asciidoc Start with data stream lifecycle documentation (#95326) 2023-06-28 16:18:05 +03:00
release-notes.asciidoc Bump to version 8.10.0 2023-06-22 10:35:12 +01:00
scripting.asciidoc [DOCS] Add documentation for Painless field API (#83388) 2022-02-03 15:15:38 -05:00
search.asciidoc Add support for Reciprocal Rank Fusion to the search API (#93396) 2023-04-24 15:07:34 -07:00
setup.asciidoc Start with data stream lifecycle documentation (#95326) 2023-06-28 16:18:05 +03:00
troubleshooting.asciidoc Add note on jstack frequency for troubleshooting (#95764) 2023-05-03 10:04:13 +01:00
upgrade.asciidoc Reinstate prerelease upgrade warning (#90093) 2022-09-16 00:06:08 +09:30