Introduce an optional `k` param for the `knn` query
If `k` is not set, the knn query keeps its previous behaviour:
- `num_candidates` docs are collected from each shard. These `num_candidates` docs
are used for combining with the results of other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results
If `k` is set, the behaviour is instead the following:
- `k` docs are collected from each shard. These `k` docs are used for
combining results with other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.
Having a `k` param makes it more intuitive for users to express their needs.
They also don't need to care about, and can skip, the `num_candidates` param
for this query, as it is more of an internal detail that tunes how knn search operates.
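As a sketch (index and field names here are hypothetical), a `knn` query that relies on `k` and simply omits `num_candidates` might look like:

```
POST my-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "field": "image_vector",
      "query_vector": [0.12, -0.45, 0.91],
      "k": 10
    }
  }
}
```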
Closes #108473
- Added a new `AbstractAggregationTestCase` base class for tests that shares most of the code of function tests, adapted for aggregations, including both testing and docs generation.
- Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable
- Added a `TopListTests` example
- This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_
- Adapted Kibana docs to use `type: "agg"` (@drewdaemon)
The current tests are very basic: consume a page and generate an output,
all in single-aggregation mode (no intermediates, no grouping). More
complex testing will be added in future PRs.
Initial PR of https://github.com/elastic/elasticsearch/issues/109917
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization-based codec works with this element type; this is
consistent with `byte` vectors.
`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexadecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.
`bit` vectors support script usage and regular query usage. When
indexed, all comparisons done are `xor` and `popcount` summations (aka,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` to be
the similarity.
For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.
Note, the dimensions expected by this element_type must always be
divisible by `8`, and the `byte[]` vectors provided at index time must
have size `dim/8`, where each byte element represents `8` bits of
the vector.
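As a sketch (index and field names hypothetical): with `dims: 40`, each vector is `40/8 = 5` bytes, so a document can carry either a 10-character hexadecimal string or a 5-element `byte[]` array:

```
PUT my-bit-index
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "element_type": "bit",
        "dims": 40
      }
    }
  }
}

PUT my-bit-index/_doc/1
{
  "embedding": "127e8a9b4f"
}

PUT my-bit-index/_doc/2
{
  "embedding": [18, 126, -118, -101, 79]
}
```

Both documents above encode the same 40 bits.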
closes: https://github.com/elastic/elasticsearch/issues/48322
PR #99445 introduced automatic normalization of dense vectors with
cosine similarity. This adds a note about this in the documentation.
Relates to #99445
* WIP Started refactoring in preparation for ST_DISTANCE
* Initial evaluators for ST_DISTANCE
* Update docs/changelog/108764.yaml
* Fix invalid changelog generated by CI
* Register function and get unit tests working
* Fixed failing meta function description tests, and refined descriptions
* Added initial CsvTests and calculate Geo differently to Cartesian
* Added more csv-spec tests and changed to arcDistance for accuracy
* Added generated docs files
* Link to generated docs
* Fix examples tag for linking from generated docs
* Skip wrapper function
And note that we might instead want to include some of the related intelligence from the Circle2D::HaversineDistance class
* Added ST_DWITHIN and more tests for ST_DISTANCE and ST_DWITHIN
* Code style
* Added more tests, this time for sorting on distance
* Fixes after rebase on main
* The ST_DWITHIN cannot use BinarySpatialFunction because it is ternary
So we moved the common code to a separate SpatialTypeResolver, and made a simpler TernarySpatialFunction based on a simple TernaryScalarFunction. This had additional consequences, simplifying the points-only cases.
The main reason for this change was to support StDWithinTests which need to test a lot of things that involve varying all three input types, generating expected error strings, etc. The original hack of just adding to BinarySpatialFunction worked for the actual integration tests, but clearly did not satisfy all the use cases tested by the unit tests.
We also restricted ST_DWITHIN to take only a double as the third argument, because otherwise the number of evaluators would explode, since we need a separate evaluator for each Block type, and Integer and Double use different block types.
* Fixed function count after rebasing on main
* Update docs/changelog/108764.yaml
* Added generated docs for ST_DWITHIN
* Connect docs for ST_DWITHIN
* Add back issue link
* Remove support for ST_DWITHIN
* Update docs/changelog/108764.yaml
* Bring back link to issue in changelog
* Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/spatial/StDistance.java
Co-authored-by: Ignacio Vera <iverase@gmail.com>
* Revert reformatting of function descriptions
We should put this into a separate PR
* Github merged commit with incorrectly formatted whitespace
---------
Co-authored-by: Ignacio Vera <iverase@gmail.com>
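As a sketch of the resulting functionality (index and field names hypothetical, assuming a `geo_point` field named `location`), ST_DISTANCE can now be evaluated and sorted on in ES|QL:

```
POST /_query
{
  "query": "FROM airports | EVAL distance = ST_DISTANCE(location, TO_GEOPOINT(\"POINT(12.568 55.676)\")) | SORT distance ASC | LIMIT 5"
}
```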
Wholesale fix of every `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in
`o.e.snapshots` and `o.e.repositories`, just pulling them up to the REST
layer (where they become API params), the test suite (where they become
`TEST_REQUEST_TIMEOUT`), or some other place where an explicit value is
available.
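For example (repository name hypothetical), a snapshot repository API call now takes the timeout as an explicit `?master_timeout` param instead of relying on the trappy implicit default:

```
PUT _snapshot/my_repository?master_timeout=30s
{
  "type": "fs",
  "settings": {
    "location": "backups"
  }
}
```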
Relates #107984
Rather than initializing the failure store right away when a new
data stream is created, we leave it empty and mark it for lazy
rollover. This results in the failure store only being initialized
(i.e. an index created) when a failure has actually occurred.
The exception to the rule is when a failure occurs while the data
stream is being auto-created. In that case, we do want to initialize
the failure store right away.
The cluster-level dense vector stats returns the total number of dense vector values globally, including replicas.
This commit fixes the total to only include the value count of the primary indices.
This change aligns with the docs stats, which also report the number of primary documents when used in cluster stats.
The indices stats API still reports granular results for replicas and primaries, so the information is not lost.
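For example, the corrected primary-only total can be inspected under the `indices.dense_vector` section of the cluster stats response (the `filter_path` here just trims the output):

```
GET /_cluster/stats?filter_path=indices.dense_vector
```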
This adds a test that generates
`docs/reference/esql/functions/kibana/inline_cast.json`, which is a JSON
object whose keys are the names of valid inline casts and whose values
are the resulting data types.
I also moved one of the maps we use to make the inline casts to
`DataType`, which is where we want it.
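For reference, an inline cast in ES|QL looks like the following sketch; the generated JSON maps each valid cast name (e.g. `long`) to the data type it produces:

```
POST /_query
{
  "query": "ROW s = \"42\" | EVAL i = s::long"
}
```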
Handle the "output memory allocator bytes" field if and only if it is present in the model size stats, as reported by the C++ backend.
This PR _must_ be merged prior to the corresponding `ml-cpp` one, to keep CI tests happy.
This adds `hamming` distance, the pop-count of `xor`'ed byte vectors, as a
first-class citizen in Painless.
For byte vectors, this means that we can compute hamming distances via
script_score (aka, brute-force).
The implementation of `hamming` is the same as the one available in
Lucene, and when Lucene 9.11 is merged, we should update our logic where
applicable to utilize it.
NOTE: this does not yet add hamming distance as a metric for indexed
vectors. This will be a future PR after the Lucene 9.11 upgrade.
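A minimal `script_score` sketch (index, field, and vectors hypothetical, assuming a `dense_vector` field with `element_type: byte`); since a smaller hamming distance means more similar vectors, the distance is inverted so that closer vectors score higher:

```
POST my-byte-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "1.0 / (1.0 + hamming(params.query_vector, 'embedding'))",
        "params": {
          "query_vector": [12, -34, 56]
        }
      }
    }
  }
}
```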
* Add SparseVectorStats
* Update to use mappings in engine
* Update to be unique to primary shards
* Fix doc
* Fix null error in test
* Cleanup
* fix yaml
* remove comment
* add version to yaml
* Revert whitespace changes to stats doc
* fix yml test
* Checkstyle
* Fix NPE in test
* Update docs/changelog/108793.yaml
* Add link to sparse_vector field type in docs
* PR feedback
* Flesh out test a bit more
* PR feedback - alphabetize placement in docs
* Fix doc change
This adds a new quantization mechanism for HNSW and flat indices. Here
we add `int4` quantization via the `int4_hnsw` and `int4_flat` index
types. This quantization methodology further reduces the memory required
for fast HNSW: the footprint is 8x smaller than with regular float32
values.
An 8x reduction means that 1M 1024-dimension vectors go from requiring
3.8GB to 477MB.
Recall continues to stay steady; there is some reduction, but it is
recoverable via slight oversampling and reranking. For example, over
500k CohereV3 vectors, only 5 extra vectors need to be gathered to
achieve over 0.98 recall in a brute-force scenario.
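Opting in is a mapping choice; a sketch (index and field names hypothetical):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "index_options": {
          "type": "int4_hnsw"
        }
      }
    }
  }
}
```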

When you divide two integers or two longs, we round towards 0, like
Postgres, Java, Rust, or C. Other systems, like MySQL, SPL, Javascript,
or Python, always produce a floating-point number. We should warn folks
about this. It's genuinely unexpected for some folks. OTOH, converting
into a floating-point number would be unexpected for other folks. Oh
well, let's document what we've got.
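A quick ES|QL illustration of the difference:

```
POST /_query
{
  "query": "ROW a = 5, b = -5 | EVAL int_div = a / 2, neg_div = b / 2, float_div = a / 2.0"
}
```

Here `int_div` is `2`, `neg_div` is `-2` (rounded towards 0, not floored to `-3`), and `float_div` is `2.5`.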
This is a refactoring of the internal logic that's used to translate
query-level field names into index-level ones for the query APIs of
security entities (i.e. users, API keys and, soon, roles).
The objective here is to have and reuse a single class that handles
all the translations for the different security query APIs.
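For context, these are the APIs behind endpoints such as the following sketch, where a query-level field like `name` must be translated to whatever field actually backs it in the security index:

```
GET /_security/_query/api_key
{
  "query": {
    "term": { "name": "my-api-key" }
  }
}
```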
Fixing MvAppendTests circuit breaker (CB) exceptions by generating smaller geometries: the
test generates a lot of documents, and the CB is too small for multiple
big shapes.
Fixes https://github.com/elastic/elasticsearch/issues/109409