* Add NamedWriteable for QueryRule rank doc
* Update test
* Update docs/changelog/128153.yaml
* Add multi cluster test for query rules
* Commenting out code - explicitly trying to spur a test failure
* [CI] Auto commit changes from spotless
* Streamline test for multi cluster
* Revert changes to try to break test
* Fix compile error
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
When the Transform System Index has been reindexed and aliased, we
should check the Transform Update index against the alias when updating
the Transform Config.
* Disallow removal of regex extracted fields
---------
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Replaces the use of a SingleResultDeduplicator by refactoring the cache as a
subclass of CancellableSingleObjectCache. Refactors the AllocationStatsService
and NodeAllocationStatsAndWeightsCalculator to accept the Runnable used to test
for cancellation.
Closes #123248
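For illustration, a minimal sketch of the cancellation hook described above (class and method names here are assumptions, not the actual service API):
```
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a long-running per-node calculation accepts a Runnable that
// throws if the owning task has been cancelled, so the loop can bail out early
// instead of computing stats for every node.
class NodeStatsSketch {
    List<Double> calculateWeights(List<String> nodeIds, Runnable ensureNotCancelled) {
        List<Double> weights = new ArrayList<>();
        for (String nodeId : nodeIds) {
            ensureNotCancelled.run(); // e.g. throws a cancellation exception when the task is cancelled
            weights.add((double) nodeId.hashCode()); // stand-in for the real weight calculation
        }
        return weights;
    }
}
```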
Today Elasticsearch will record the purpose for each request to S3 using
a custom query parameter[^1]. This isn't believed to be necessary
outside of the ECH/ECE/ECK/... managed services, and it adds rather a
lot to the request logs, so with this commit we make the feature
optional and disabled by default.
[^1]:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html#LogFormatCustom
Currently, union types in CCS are broken. For example, FROM
*:remote-indices | EVAL port = TO_INT(port) returns all nulls if the
types of the port field conflict. This happens because converters are
kept in a map keyed by the fully qualified cluster:index name (defined
in MultiTypeEsField), but we look up the converter using only the index
name, which leads to a wrong or missing converter on remote clusters.
Our tests didn't catch this because MultiClusterSpecIT generates the
same index for both clusters, so the local converter could be used for
remote indices.
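A hedged sketch of the lookup described above (the map shape and names are assumptions, not the real MultiTypeEsField internals):
```
import java.util.Map;
import java.util.function.Function;

// Illustrative only: converters are keyed by the fully qualified "cluster:index"
// name, so lookups for remote indices must include the cluster alias rather than
// the bare index name, which only works for the local cluster.
class ConverterLookupSketch {
    static Function<Object, Object> find(
        Map<String, Function<Object, Object>> convertersByQualifiedIndex,
        String clusterAlias,
        String indexName
    ) {
        String key = clusterAlias.isEmpty() ? indexName : clusterAlias + ":" + indexName;
        return convertersByQualifiedIndex.get(key);
    }
}
```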
No need to do this via an allocation-heavy `Stream`: we can just put the
objects straight into an array, sort them in place, and keep hold of the
array to avoid having to allocate anything on the next iteration.
Also slims down `BY_DESCENDING_SHARD_ID`: it's always sorting the same
index, so we don't need to look at `ShardId#index` in the comparison,
nor do we really need multiple layers of vtable lookups; we can just
compare the shard IDs directly.
Relates #128021
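A minimal sketch of the comparator simplification (the types here are illustrative stand-ins for the routing classes, not the actual code): since every entry belongs to the same index, the comparison can look at the numeric shard id alone.
```
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: sort a reusable array in place, descending by the numeric
// shard id, without consulting the index in the comparator or allocating a
// stream on each iteration.
class ShardSortSketch {
    record Shard(int id) {}

    static void sortDescending(Shard[] shards) {
        Arrays.sort(shards, Comparator.comparingInt((Shard s) -> s.id()).reversed());
    }
}
```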
* Add Hugging Face Chat Completion support to Inference Plugin
* Add support for streaming chat completion task for HuggingFace
* [CI] Auto commit changes from spotless
* Add support for non-streaming completion task for HuggingFace
* Remove RequestManager for HF Chat Completion Task
* Refactored Hugging Face Completion Service Settings, removed Request Manager, added Unit Tests
* Refactored Hugging Face Action Creator, added Unit Tests
* Add Hugging Face Server Test
* [CI] Auto commit changes from spotless
* Removed parameters from media type for Chat Completion Request and unit tests
* Removed OpenAI default URL in HuggingFaceService's configuration, fixed formatting in InferenceGetServicesIT
* Refactor error message handling in HuggingFaceActionCreator and HuggingFaceService
* Update minimal supported version and add Hugging Face transport version constants
* Made modelId field optional in HuggingFaceChatCompletionModel, updated unit tests
* Removed max input tokens field from HuggingFaceChatCompletionServiceSettings, fixed unit tests
* Removed if statement checking TransportVersion for HuggingFaceChatCompletionServiceSettings constructor with StreamInput param
* Removed getFirst() method calls for backport compatibility
* Made HuggingFaceChatCompletionServiceSettingsTests extend AbstractBWCWireSerializationTestCase for future serialization testing
* Refactored tests to use stripWhitespace method for readability
* Refactored javadoc for HuggingFaceService
* Renamed HF chat completion TransportVersion constant names
* Added random string generation in unit test
* Refactored javadocs for HuggingFace requests
* Refactored tests to reduce duplication
* Added changelog file
* Add HuggingFaceChatCompletionResponseHandler and associated tests
* Refactor error handling in HuggingFaceServiceTests to standardize error response codes and types
* Refactor HuggingFace error handling to improve response structure and add streaming support
* Allowing null function name for hugging face models
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Painless does not support accessing nested docs (except through
_source). Yet the Painless execute API indexes any nested docs that are
found when parsing the sample document. This commit changes the RAM
indexing to only index the root document, ignoring any nested docs.
fixes #41004
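A hedged sketch of the behaviour change (the Parsed interface stands in for the real parsed-document type; this is not a verbatim diff):
```
import java.io.IOException;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// Illustrative only: index just the root document from the parsed sample,
// skipping any nested documents produced by the parser, since Painless scripts
// cannot access nested docs anyway (except via _source).
class SampleIndexerSketch {
    interface Parsed {
        Document rootDoc();    // the root document
        List<Document> docs(); // root plus any nested documents
    }

    static void indexSample(IndexWriter writer, Parsed parsed) throws IOException {
        writer.addDocument(parsed.rootDoc()); // previously the equivalent of addDocuments(parsed.docs())
    }
}
```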
This entitlement is required, but only if validating the metadata
endpoint against `https://login.microsoft.com/` which isn't something we
can do in a test. Kind of an SDK bug: we should be using an existing
event loop rather than spawning threads randomly like this.
The hash aggregation operator may take time to emit the output pages,
including keys and aggregated values. This change adds an emit_time
field to the status. While I considered including this in hash_nanos and
aggregation_nanos, having a separate section feels more natural. I am
open to suggestions.
Types that parse arrays directly should not need to store values in _ignored_source when synthetic_source_keep=arrays. Since they have custom handling of arrays, storing multiple values of the type in _ignored_source provides no benefit.
Fix a bug in the `significant_terms` agg where the "subsetSize" array is
too small because we never collect the ordinal for the agg "above" it.
This mostly hits when you do a `range` agg containing a
`significant_terms` AND you only collect the first few ranges. `range`
isn't particularly popular, but `date_histogram` is super popular and it
rewrites into a `range` pretty commonly - so that's likely what's really
hitting this - a `date_histogram` followed by a `significant_text` where
the matches are all early in the date range held by the shard.
Currently, if a field has high cardinality, we may mistakenly disable
emitting ordinal blocks. For example, with 10,000 `tsid` values, we
never emit ordinal blocks during reads, even though we could emit blocks
for 10 `tsid` values across 1,000 positions. This bug disables
optimizations for value aggregation and block hashing.
This change tracks the minimum and maximum seen ordinals and uses them
as an estimate for the number of ordinals. However, if a page contains
`ord=1` and `ord=9999`, ordinal blocks still won't be emitted.
Allocating a bitset or an array for `value_count` could track this more
accurately but would require additional memory. I need to think about
this trade-off more before opening another PR to fix this issue
completely.
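A rough sketch of the estimate described above (field and method names are assumptions, and the threshold is illustrative): tracking only the min and max ordinal over-estimates the distinct count, which is exactly why a page containing both ord=1 and ord=9999 still falls back to non-ordinal blocks.
```
// Illustrative only: estimate the number of distinct ordinals from the tracked
// range and emit an ordinal block only when that estimate is small relative to
// the number of positions.
class OrdinalEstimateSketch {
    static boolean shouldEmitOrdinalBlock(int minSeenOrd, int maxSeenOrd, int positionCount) {
        int estimatedDistinctOrds = maxSeenOrd - minSeenOrd + 1; // over-estimates when ords are sparse
        return estimatedDistinctOrds * 2L < positionCount;       // threshold chosen for illustration
    }
}
```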
This is a quick, contained fix that significantly speeds up time-series
aggregation (and other queries too).
The execution time of this query is reduced from 3.4s to 1.9s with 11M documents.
```
POST /_query
{
"profile": true,
"query": "TS metrics-hostmetricsreceiver.otel-default
| STATS cpu = avg(avg_over_time(`metrics.system.cpu.load_average.1m`)) BY host.name, BUCKET(@timestamp, 5 minute)"
}
```
Before:
```
"took": 3475,
"is_partial": false,
"documents_found": 11368089,
"values_loaded": 34248167
```
After:
```
"took": 1965,
"is_partial": false,
"documents_found": 11368089,
"values_loaded": 34248167
```
9.x port of: Revert "Enable madvise by default for all builds (#110159)" #126308
This change did not apply cleanly. In fact this is not strictly a revert, since the change was never actually in 9.x after the Lucene 10 upgrade. However, the semantics of the change still apply: avoid RANDOM everywhere. Even though in 9.x we do set -Dorg.apache.lucene.store.defaultReadAdvice=normal, that is not enough to avoid RANDOM when RANDOM is explicitly requested by code.
Entitlements do a stack walk to find the calling class. When method
references are used in a lambda, the frame ends up hidden in the stack
walk. In the case of using a method reference with
AccessController.doPrivileged, the call looks like it comes from the JDK
itself, so the call is trivially allowed. This commit adds hidden frames
to the stack walk so that the lambda frame created for the method
reference is included. Several internal packages then need to be
filtered out of the stack.
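A minimal sketch using the JDK StackWalker API (this is not the actual entitlements code; the class and the skip depth are illustrative): including hidden frames is what keeps the synthetic frame generated for a method reference visible to the walk.
```
import java.util.Set;

// Illustrative only: a walker configured with SHOW_HIDDEN_FRAMES retains the
// synthetic lambda frame created for a method reference, so the caller lookup
// does not mistakenly resolve to a JDK frame such as AccessController.
class CallerLookupSketch {
    private static final StackWalker WALKER = StackWalker.getInstance(
        Set.of(StackWalker.Option.RETAIN_CLASS_REFERENCE, StackWalker.Option.SHOW_HIDDEN_FRAMES)
    );

    static Class<?> callingClass() {
        return WALKER.walk(frames -> frames
            .map(StackWalker.StackFrame::getDeclaringClass)
            .skip(2) // skip this method and its immediate caller
            .findFirst()
            .orElseThrow());
    }
}
```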
Reenables `text ==` pushdown and adds support for `text !=` pushdown.
It does so by making `TranslationAware#translatable` return something
we can turn into a tri-valued function. It has these values:
* `YES`
* `NO`
* `RECHECK`
`YES` means the `Expression` is entirely pushable into Lucene. They will
be pushed into Lucene and removed from the plan.
`NO` means the `Expression` can't be pushed to Lucene at all and will stay
in the plan.
`RECHECK` means the `Expression` can push a query that makes *candidate*
matches but must be rechecked. Documents that don't match the query won't
match the expression, but documents that match the query might not match
the expression. These are pushed to Lucene *and* left in the plan.
This is required because `txt != "b"` can build a *candidate* query
against the `txt.keyword` subfield but it can't be sure of the match
without loading the `_source` - which we do in the compute engine.
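A sketch of the tri-valued result (the enum mirrors the description above and should be read as illustrative rather than the precise ES|QL API):
```
// Illustrative enum for the three pushdown outcomes described above.
enum Translatable {
    YES,     // fully pushable: push to Lucene and remove from the plan
    NO,      // not pushable: keep in the plan and evaluate in the compute engine
    RECHECK  // push a candidate query AND keep the expression in the plan for rechecking
}
```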
I haven't plugged Rally into this, but here are some basic
performance tests:
```
Before:
not text eq {"took":460,"documents_found":1000000}
text eq {"took":432,"documents_found":1000000}
After:
text eq {"took":5,"documents_found":1}
not text eq {"took":351,"documents_found":800000}
```
This comes from:
```
rm -f /tmp/bulk*
for a in {1..1000}; do
echo '{"index":{}}' >> /tmp/bulk
echo '{"text":"text '$(printf $(($a % 5)))'"}' >> /tmp/bulk
done
ls -l /tmp/bulk*
passwd="redacted"
curl -sk -uelastic:$passwd -HContent-Type:application/json -XDELETE https://localhost:9200/test
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPUT https://localhost:9200/test -d'{
"settings": {
"index.codec": "best_compression",
"index.refresh_interval": -1
},
"mappings": {
"properties": {
"many": {
"enabled": false
}
}
}
}'
for a in {1..1000}; do
printf %04d: $a
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST https://localhost:9200/test/_bulk?pretty --data-binary @/tmp/bulk | grep errors
done
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST https://localhost:9200/test/_forcemerge?max_num_segments=1
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST https://localhost:9200/test/_refresh
echo
curl -sk -uelastic:$passwd https://localhost:9200/_cat/indices?v
text_eq() {
echo -n " text eq "
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'https://localhost:9200/_query?pretty' -d'{
"query": "FROM test | WHERE text == \"text 1\" | STATS COUNT(*)",
"pragma": {
"data_partitioning": "shard"
}
}' | jq -c '{took, documents_found}'
}
not_text_eq() {
echo -n "not text eq "
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'https://localhost:9200/_query?pretty' -d'{
"query": "FROM test | WHERE NOT text == \"text 1\" | STATS COUNT(*)",
"pragma": {
"data_partitioning": "shard"
}
}' | jq -c '{took, documents_found}'
}
for a in {1..100}; do
text_eq
not_text_eq
done
```
Initial Kibana definition files for commands, currently providing only license information. We leave the license field out if the command works with BASIC, so the only two files that actually have a license line are:
* CHANGE_POINT: PLATINUM
* RRF: ENTERPRISE
Re-applying #126441 (cf. #127259) with:
- the extra `FlowControlHandler` needed to ensure one-chunk-per-read
semantics (also present in #127259).
- no extra `read()` after exhausting a `Netty4HttpRequestBodyStream`
(the bug behind #127391 and #127391).
See #127111 for related tests.
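For reference, a minimal sketch of where the extra handler sits (everything except Netty's own FlowControlHandler is illustrative):
```
import io.netty.channel.ChannelPipeline;
import io.netty.handler.flow.FlowControlHandler;

// Illustrative only: FlowControlHandler buffers inbound messages so that each
// explicit read() releases at most one chunk downstream, giving the
// one-chunk-per-read semantics described above.
class PipelineSketch {
    static void configure(ChannelPipeline pipeline) {
        pipeline.addLast(new FlowControlHandler());
        // ... followed by the HTTP request body streaming handler (omitted)
    }
}
```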
During reindexing we retrieve the index mode from the template settings. However, we do not fully resolve the settings as we do when validating a template or when creating a data stream. This results in the error reported in #125607.
I do not see a reason not to fix this as suggested in #125607 (comment).
Fixes: #125607
When downsampling an index whose mapping has passthrough dimensions, the downsampling process identifies the wrapper object as a dimension and fails when it tries to retrieve its type.
We did some prework to establish a shared framework in the internalClusterTest. For now it only includes helpers for setting up time series data streams and a limited assertion helper for dimensions and metrics. This allows us to set up an internalClusterTest that captures this issue during downsampling in #125156.
To fix this we refine the check that determines whether a field is a dimension, so that it skips the wrapper field.
Fixes #125156.
Currently, time-series aggregations use the `values` aggregation to
collect dimension values. While we might introduce a specialized
aggregation for this in the future, for now, we are using `values`, and
the inputs are likely ordinal blocks. This change speeds up the `values`
aggregation when the inputs are ordinal-based.
Execution time reduced from 461ms to 192ms for 1000 groups.
```
ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 461.938 ± 6.089 ms/op
ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 192.898 ± 1.781 ms/op
```
The audit event for a successfully-authenticated REST request occurs
when we start to process the request. For APIs that accept a streaming
request body this means we have received the request headers, but not
its body, at the time of the audit event. Today such requests will fail
with a `ClassCastException` if the `emit_request_body` flag is set. This
change fixes the handling of streaming requests in the audit log to now
report that the request body was not available when writing the audit
entry.
* Specialize block parameters on AddInput
(cherry picked from commit a5855c1664)
* Call the specific add() methods for each block type
(cherry picked from commit 5176663f43)
* Implement custom add in HashAggregationOperator
(cherry picked from commit fb670bdbbc)
* Migrated everything to the new add() calls
* Update docs/changelog/127582.yaml
* Spotless format
* Remove unused ClassName for IntVectorBlock
* Fixed tests
* Randomize groupIds block types to check most AddInput cases
* Minor fix and added some docs
* Renamed BlockHashWrapper
The `s3.client.CLIENT_NAME.protocol` setting became unused in #126843 as
it is inapplicable in the v2 SDK. However, the v2 SDK requires the
`s3.client.CLIENT_NAME.endpoint` setting to be a URL that includes a
scheme, so in #127489 we prepend a `https://` to the endpoint if needed.
This commit generalizes this slightly so that we prepend `http://` if
the endpoint has no scheme and the `.protocol` setting is set to `http`.
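A hedged sketch of the scheme-prepending behaviour described above (not the exact implementation):
```
// Illustrative only: prepend a scheme to a scheme-less endpoint, using http://
// only when the legacy .protocol setting explicitly says "http".
class EndpointSchemeSketch {
    static String withScheme(String endpoint, String protocolSetting) {
        if (endpoint.contains("://")) {
            return endpoint; // already has an explicit scheme
        }
        return ("http".equals(protocolSetting) ? "http://" : "https://") + endpoint;
    }
}
```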
Output function signature license requirements to Kibana definition files, and also test that this matches the actual licensing behaviour of the functions.
ES|QL functions that enforce license checks do so with the `LicenseAware` interface. This does not expose what the function's license level is, only whether the current active license is sufficient for that function and its current signature (the data types passed in as fields). Rather than add to this interface, we've made the license level information test-only. This means that if a function implements LicenseAware, it also needs to add a method to its test class specifying the license level for the signature being called. All functions are tested for compliance, so failing to add this will result in a test failure. If the test license level does not match the enforced license, that will also cause a failure.
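A hedged sketch of that test-side contract (the method name and types are assumptions, not the real test infrastructure):
```
import java.util.List;

// Illustrative only: a test class for a LicenseAware function declares the
// license level it expects for a given signature (the field data types), so
// shared test infrastructure can check this against the function's actual
// enforcement behaviour.
class LicensedFunctionTestsSketch {
    enum LicenseLevel { BASIC, PLATINUM, ENTERPRISE }

    // hypothetical hook; the real test classes expose something equivalent
    static LicenseLevel expectedLicense(List<String> fieldDataTypes) {
        return LicenseLevel.PLATINUM;
    }
}
```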