* Term Stats documentation
* Update docs/reference/reranking/learning-to-rank-model-training.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Fix query example.
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
(cherry picked from commit 0416812456)
Co-authored-by: Aurélien FOUCRET <aurelien.foucret@gmail.com>
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated idem
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
Before boost in script_score query was wrongly applied only to the subquery.
This commit makes sure that the boost is applied to the whole score
that comes out of script.
Closes#48465
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
While function scores using scripts do allow explanations, they are only
creatable with an expert plugin. This commit improves the situation for
the newer script score query by adding the ability to set the
explanation from the script itself.
To set the explanation, a user would check for `explanation != null` to
indicate an explanation is needed, and then call
`explanation.set("some description")`.
* Add l1norm and l2norm distances for vectors
Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947
* Address Christoph's feedback
- organize vector functions as a separate doc
- increase precision in tests calculations
- add a separate test when sparse doc dims
are bigger and less than query vector dims
* Made examples more realistic
Currently when a document misses a vector value, vector function
returns 0 as a score for this document. We think this is incorrect
behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.
Typically, dense vectors of both documents and queries must have the same
number of dimensions. Different number of dimensions among documents
or query vector indicate an error. This PR enforces that all vectors
for the same field have the same number of dimensions. It also enforces
that query vectors have the same number of dimensions.
Negative scores are no longer allowed, but the cosine similarity between two
vectors lies in the range [-1, 1], and dot products can also be negative. This commit
updates the documentation with an example of how to avoid negative scores.
To make script_score query to have the same features
as function_score query, we need to add randomScore
function.
This function produces different
random scores on different index shards.
It is also able to produce random scores
based on the internal Lucene Document Ids.
Ranaming as follows:
feature -> rank_feature
feature_vector -> rank_features
feature query -> rank_feature query
Ranaming is done to distinguish from other vector types.
Closes#36723