[[query-dsl-knn-query]] === Knn query ++++ Knn ++++ Finds the _k_ nearest vectors to a query vector, as measured by a similarity metric. _knn_ query finds nearest vectors through approximate search on indexed dense_vectors. The preferred way to do approximate kNN search is through the <> of a search request. _knn_ query is reserved for expert cases, where there is a need to combine this query with other queries. [[knn-query-ex-request]] ==== Example request [source,console] ---- PUT my-image-index { "mappings": { "properties": { "image-vector": { "type": "dense_vector", "dims": 3, "index": true, "similarity": "l2_norm" }, "file-type": { "type": "keyword" } } } } ---- . Index your data. + [source,console] ---- POST my-image-index/_bulk?refresh=true { "index": { "_id": "1" } } { "image-vector": [1, 5, -20], "file-type": "jpg" } { "index": { "_id": "2" } } { "image-vector": [42, 8, -15], "file-type": "png" } { "index": { "_id": "3" } } { "image-vector": [15, 11, 23], "file-type": "jpg" } ---- //TEST[continued] . Run the search using the `knn` query, asking for the top 3 nearest vectors. + [source,console] ---- POST my-image-index/_search { "size" : 3, "query" : { "knn": { "field": "image-vector", "query_vector": [-5, 9, -12], "num_candidates": 10 } } } ---- //TEST[continued] NOTE: `knn` query doesn't have a separate `k` parameter. `k` is defined by `size` parameter of a search request similar to other queries. `knn` query collects `num_candidates` results from each shard, then merges them to get the top `size` results. [[knn-query-top-level-parameters]] ==== Top-level parameters for `knn` `field`:: + -- (Required, string) The name of the vector field to search against. Must be a <>. -- `query_vector`:: + -- (Required, array of floats) Query vector. Must have the same number of dimensions as the vector field you are searching against. -- `num_candidates`:: + -- (Required, integer) The number of nearest neighbor candidates to consider per shard. Cannot exceed 10,000. {es} collects `num_candidates` results from each shard, then merges them to find the top results. Increasing `num_candidates` tends to improve the accuracy of the final results. -- `filter`:: + -- (Optional, query object) Query to filter the documents that can match. The kNN search will return the top documents that also match this filter. The value can be a single query or a list of queries. If `filter` is not provided, all documents are allowed to match. The filter is a pre-filter, meaning that it is applied **during** the approximate kNN search to ensure that `num_candidates` matching documents are returned. -- `similarity`:: + -- (Optional, float) The minimum similarity required for a document to be considered a match. The similarity value calculated relates to the raw <> used. Not the document score. The matched documents are then scored according to <> and the provided `boost` is applied. -- `boost`:: + -- (Optional, float) Floating point number used to multiply the scores of matched documents. This value cannot be negative. Defaults to `1.0`. -- `_name`:: + -- (Optional, string) Name field to identify the query -- [[knn-query-filtering]] ==== Pre-filters and post-filters in knn query There are two ways to filter documents that match a kNN query: . **pre-filtering** – filter is applied during the approximate kNN search to ensure that `k` matching documents are returned. . **post-filtering** – filter is applied after the approximate kNN search completes, which results in fewer than k results, even when there are enough matching documents. Pre-filtering is supported through the `filter` parameter of the `knn` query. Also filters from <> are applied as pre-filters. All other filters found in the Query DSL tree are applied as post-filters. For example, `knn` query finds the top 3 documents with the nearest vectors (num_candidates=3), which are combined with `term` filter, that is post-filtered. The final set of documents will contain only a single document that passes the post-filter. [source,console] ---- POST my-image-index/_search { "size" : 10, "query" : { "bool" : { "must" : { "knn": { "field": "image-vector", "query_vector": [-5, 9, -12], "num_candidates": 3 } }, "filter" : { "term" : { "file-type" : "png" } } } } } ---- //TEST[continued] [[knn-query-with-nested-query]] ==== Knn query inside a nested query `knn` query can be used inside a nested query. The behaviour here is similar to <>: * kNN search over nested dense_vectors diversifies the top results over the top-level document * `filter` over the top-level document metadata is supported and acts as a post-filter * `filter` over `nested` field metadata is not supported A sample query can look like below: [source,js] ---- { "query" : { "nested" : { "path" : "paragraph", "query" : { "knn": { "query_vector": [ 0.45, 45 ], "field": "paragraph.vector", "num_candidates": 2 } } } } } ---- // NOTCONSOLE [[knn-query-aggregations]] ==== Knn query with aggregations `knn` query calculates aggregations on `num_candidates` from each shard. Thus, the final results from aggregations contain `num_candidates * number_of_shards` documents. This is different from the <> where aggregations are calculated on the global top k nearest documents.