elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-29 01:44:36 -04:00

Author	SHA1	Message	Date
Rene Groeschke	ae569def9c	[Build] Require reason for usesDefaultDistribution (#124707 ) This makes using usesDefaultDistribution in our test setup more explicit by requiring a reason why it's needed. This is helpful as part of revisiting the need for all those usages in our code base.	2025-03-17 08:25:39 +01:00
Moritz Mack	a608f0626e	Added query param `?include_source_on_error` for ingest requests (#120725 ) A new query parameter `?include_source_on_error` was added for create / index, update and bulk REST APIs to control whether to include the document source in the error response in case of parsing errors. The default value is `true`.	2025-01-28 09:33:22 +01:00
Martijn van Groningen	e833e7b6c4	Add feature flag for subobjects auto (#114616 )	2024-10-12 18:55:27 +02:00
Chris Hegarty	32dde26e49	Upgrade to Lucene 9.12.0 (#113333 ) This commit upgrades to Lucene 9.12.0. Co-authored-by: Adrien Grand <jpountz@gmail.com> Co-authored-by: Armin Braun <me@obrown.io> Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: Chris Hegarty <chegar999@gmail.com> Co-authored-by: John Wagster <john.wagster@elastic.co> Co-authored-by: Luca Cavanna <javanna@apache.org> Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2024-10-01 08:39:27 +01:00
Salvatore Campagna	208a1fe571	Introduce an `ignore_above` index-level setting (#113121 ) Here we introduce a new index-level setting, `ignore_above`, similar to what we have for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened` fields. Each field mapping will still be allowed to override the index-level setting using a mapping-level `ignore_above` value.	2024-09-23 18:05:02 +02:00
Mark Vieira	a59c182f9f	Add AGPLv3 as a supported license	2024-09-13 15:29:46 -07:00
Simon Cooper	a36d90cf34	Use CLDR locale provider on JDK 23+ (#110222 ) JDK 23 removes the COMPAT locale provider, leaving CLDR as the only option. This commit configures Elasticsearch to use the CLDR provider when on JDK 23, but still use the existing COMPAT provider when on JDK 22 and below. This causes some differences in locale behaviour; this also adapts various tests to still work whether run on COMPAT or CLDR.	2024-09-04 13:42:40 +01:00
Mikhail Berezovskiy	1163d2e4f9	Rename streamContent/Separator to bulkContent/Separator (#111716 ) Rename `xContent.streamSeparator()` and `RestHandler.supportsStreamContent()` to `xContent.bulkSeparator()` and `RestHandler.supportsBulkContent()`. I want to reserve use of "supportsStreamContent" for current work in HTTP layer to [support incremental content handling](https://github.com/elastic/elasticsearch/pull/111438) besides fully aggregated byte buffers. `supportsStreamContent` would indicate that handler can parse chunks of http content as they arrive.	2024-08-09 06:32:20 +10:00
Alexander Spies	e540732e39	Aggs: Scripted metric allow list (#109444 ) Introduces new cluster settings that allow only a certain set of scripts in scripted metrics aggregations: - search.aggs.only_allowed_metric_scripts, defaults to false - search.aggs.allowed_inline_metric_scripts, defaults to empty list - search.aggs.allowed_stored_metric_scripts, defaults to empty list	2024-06-12 14:23:03 +02:00
Jim Ferenczi	4380cd1bd5	Allow rescorer with field collapsing (#107779 ) This change adds the support for rescoring collapsed documents. The rescoring is applied on the top document per group on each shard. Closes #27243	2024-04-29 08:48:12 +01:00
eyalkoren	6f4e293d29	Add `require_data_stream` feature (#101872 ) Closes #97032 Adding the ability to set `require_data_stream` parameter (boolean) on bulk and indexing APIs. For document indexing, this flag requires the indexing operation to either be pointed at a data stream, or match a template that will create a data stream.	2024-01-18 09:15:48 -07:00
Lorenzo Dematté	2b175653d9	YAML test framework: separate `skip` and `requires` sections (#104140 ) * Introduce Prerequisites criteria (Predicate + factory) for modular skip decisions - Removed accessors to specific criteria from SkipSection (used only on tests), adjusted test assertions - Moved Features check (YAML test runner features) to SkipSection build time * Separated check for xpack/no_xpack Check for xpack is cluster-configuration (modules installed) dependent, while Features are meant to be "static" test-runner capabilities. We separate them so checks on one (test-runner features) can be run before and separately from the other. * Consolidate skip() methods - Divide require and skip predicates - Divide requires and skip parsing (distinct sections) - Renaming SkipSection to PrerequisiteSection and related methods/fields (e.g. skip -> evaluate) * Refactoring tests - moving and adding VersionRange tests - adding specific version and os skip tests - modified parse/validate/build to make SkipSection more unit-testable * Adding cluster feature-based skip criteria * Updated javadoc + renaming + better skip reason message	2024-01-15 14:48:36 +01:00
Luca Cavanna	9cd96df179	Add support for index_filter to open pit (#102388 ) The open point in time API accepts a list of indices and opens a point in time view against those indices. Like we do already for field caps, this commit allows users to provide an index_filter parameter as part of the request body, that will be used to execute the can match phase and exclude the indices that can't possibly match such filter. Closes #99740	2023-11-21 15:35:49 +01:00
Keith Massey	92ec9d6605	Add executed pipelines to bulk api response (#100031 ) This change allows users to pass a new list_executed_pipelines parameter to the bulk API, which results in an executed_pipelines list being returned.	2023-10-17 09:39:09 -05:00
Albert Zaharovits	d6df838307	Refactor REST tests to the new internal cluster rule orchestration (#100399 ) This PR is migrating some of the ITs that use either the `elasticsearch.legacy-java-rest-test` or the `elasticsearch.legacy-yaml-rest-test` gradle test plugins to the new `elasticsearch.internal-java-rest-test` and `elasticsearch.internal-yaml-rest-test` equivalents. This is the list of the affected ITs: * SamlAuthenticationIT * OperatorPrivilegesIT * ProfileIT * SetSecurityUserProcessorWithWithSecurityDisabledIT * AsyncSearchSecurityIT * SecurityRealmSmokeTestCase * KibanaSystemIndexIT * KerberosAuthenticationIT * ReindexWithSecurityIT and ReindexWithSecurityClientYamlTestSuiteIT * ReloadSecureSettingsWithPasswordProtectedKeystoreRestIT * PermissionsIT from slm:qa:with-security * Permissions IT from runtime-fields:with-security * Permissions IT from ilm:qa:with-securiy * GraphWithSecurityIT and GraphWithSecurityInsufficientRoleIT Related: ES-6751	2023-10-17 07:42:43 -04:00
Armin Braun	b7eafce32c	Make some practically static methods static (#97565 ) Another round of automated fixes to this, marking things that can be made static as static. Saves some JIT cycles but also turns some lambdas from capturing to non-capturing and makes the "utilityness" of some classes visible.	2023-10-06 23:37:07 +02:00
Mayya Sharipova	f8c626f792	Track max_score in collapse when requested (#97703 ) Before we used to track max_score in collapse when requested (track_scores=true) or when there is no sort in collapse (see PR#27122). But this feature was lost through refactoring and changes. This PR restores this feature. Closes #97653	2023-07-17 06:48:00 -04:00
eyalkoren	3d36b08d28	Fix `fields` API with `subobjects: false` (#97092 )	2023-07-12 11:35:18 +03:00
Martijn van Groningen	b11cbd43dd	Move matrix stats to aggregations module (#92435 ) Running the `matrix_stats_multi_value_field.yaml` test in multi node test cluster showed a bug, see: `88758ab577` Also removes `MatrixStats` interface, removed usage of deprecated ValueType enum and removed unused generic usage. Relates to #90283	2022-12-22 03:15:05 -05:00
Mark Vieira	c2eda511de	Add JUnit rule based integration test cluster orchestration framework (#92379 ) This commit adds a new test framework for configuring and orchestrating test clusters for both Java and YAML REST testing. This will eventually replace the existing "test-clusters" Gradle plugin and the build-time cluster orchestration.	2022-12-21 15:33:46 -08:00
Iraklis Psaroudakis	756fcc212d	Log YAML test file on failure (#91349 ) Relates #91081	2022-11-09 18:35:36 +02:00
Slobodan Adamović	6f4cee4737	Remove HLRC from security integration tests (#91088 ) Removed remaining usages of HLRC from security integration tests. Relates to #83423	2022-11-01 09:20:56 +01:00
Nik Everett	092f370c10	Remove numbers from aggs tests (#90983 ) The line number style numbers prefixing the names of the aggregation tests don't buy us anything. Worse, they've obfuscated that I forgot to delete two files after merging their contents into the aggregations module. I've deleted those as part of this PR.	2022-10-18 14:39:17 -04:00
Nik Everett	a544b127be	Fix runtime field build The runtime field build now relies on the painless execute API transitively by virtue of using the aggs yaml tests. This pulls them in.	2022-10-18 09:52:21 -04:00
Nik Everett	71b5cad4eb	Move aggregations tests to module (#90953 ) We're going to move all aggregations to the module soon and this saves a little time in the build by only running the tests one time - in the aggregations module.	2022-10-18 07:38:10 -04:00
Nik Everett	f2a1ee9995	Synthetic _source: test `top_hits` (#90137 ) This adds a test for the `top_hits` aggregation using synthetic `_source`. It works but let's be a bit paranoid here because it's a whole new fetch phase.....	2022-09-24 05:55:23 +09:30
Yang Wang	adf8e01286	Add info of effective roles in denial messages (#89680 ) When an action is denied due to authorization error, the list of assigned roles is shown in the error message. However, it is possible that the effective roles are fewer or more than the assigned list: * Fewer roles can happen when the role is not defined or the license does not permit it * More roles can happen when anonymous access is enabled This PR changes the error message to show the effective roles instead of the assigned roles (whenever possible) to help troubleshooting. In addition, it also reports any missing roles, i.e. roles that are assigned but cannot be found.	2022-09-01 11:30:01 +09:30
Mark Tozzi	9ee6a19187	Add ability to select execution mode for cardinality aggregation (#87704 ) Plumbs through a new parameter for the cardinality aggregation, to allow configuring the execution mode. This can have significant impacts on speed and memory usage. This PR exposes three collection modes and two heuristics that we can tune going forward. All of these are treated as hints and can be silently ignored, e.g. if not applicable to the given field type. I've changed the default behavior to optimize for time, which potentially uses more memory. Users can override this for the old behavior if needed.	2022-07-05 09:11:22 -04:00
Nik Everett	8ebf39b7e1	Fixup highlighting with synthetic source (#87667 ) Synthetic source has a habit of reordering text fields. This frustrates highlighting because it often wants to use index structures to find the offsets to values in the field. This disables the FVH highlighter for multi-valued text fields when synthetic source is enabled and runs the unified highlighter in "analyze" mode when synthetic source is enabled. That's enough to stop them from spitting out wrong answers. We might be leaving some performance on the table when the unified highlighter works on a single valued text field that is indexed with offsets or term vectors. We don't really expect that to be common at all though because generally folks will enable synthetic source to save space and adding offsets or term vectors is quite space inefficient. If it comes up, we might be able to improve here.	2022-06-15 14:49:06 -04:00
Rene Groeschke	95d56cc262	Fix configuration cache compatibility issues in gradle plugins (#87567 ) This fixes references to project that make the plugin incompatible with the Gradle configuration cache. We also remove the custom xpackProject utility: * using xpackProject in certain situations can break configuration cache compatibility, as it uses a mutable project object under the hood that is discouraged in some use cases (e.g. at execution time) * it always breaks compatibility with --configure-on-demand, as xpackProject uses the project of the :x-pack project. Referencing project objects from other subprojects should be avoided where possible to decouple subproject configurations. There's a good explanation of why we want to decouple our project configurations as much as possible here: https://docs.gradle.org/current/userguide/multi_project_configuration_and_execution.html#sec:decoupled_projects * it adds little value over the default out-of-the-box Gradle API (just use project(':x-pack:someProject') instead of xpackProject('someProject')). On some occasions it's even shorter, e.g. when this is used as xpackProject('someProject').path instead of just passing :x-pack:someProject. I'll try to put a bit more context in the PR description in the future to make the motivation behind these kinds of changes clearer upfront. Related to #57918	2022-06-13 13:20:18 +02:00
Luca Cavanna	a48965decf	Authorize painless execute as index action when an index is specified (#85512 ) Painless execute allows users to validate their scripts. Some of the supported script contexts support providing a sample document as well as an index to pull the mappings from. The painless execute API requires cluster admin privileges today and while that's ok for the contexts that don't support providing an index, it is not ideal when an index is provided. In fact users can run scripts as part of the search API, which requires only the indices/read privilege on the indices that the user is reading from. This commit maps the painless execute action to an indices/read action when an index is specified, so that in that case the same privileges as a search action will be requested to run painless execute. Relates to #48856 Closes #86428	2022-05-17 17:22:42 +02:00
Nik Everett	a589456b81	Synthetic source (#85649 ) This attempts to shrink the index by implementing a "synthetic _source" field. You configure it in the mapping: ``` { "mappings": { "_source": { "synthetic": true } } } ``` And we just stop storing the `_source` field - kind of. When you go to access the `_source` we regenerate it on the fly by loading doc values. Doc values don't preserve the original structure of the source you sent so we have to make some educated guesses. And we have a rule: the source we generate would result in the same index if you sent it back to us. That way you can use it for things like `_reindex`. Fetching the `_source` from doc values does slow down loading somewhat. See numbers further down. ## Supported fields This only works for the following fields: * `boolean` * `byte` * `date` * `double` * `float` * `geo_point` (with precision loss) * `half_float` * `integer` * `ip` * `keyword` * `long` * `scaled_float` * `short` * `text` (when there is a `keyword` sub-field that is compatible with this feature) ## Educated guesses The synthetic source generator makes `_source` fields that are: * sorted alphabetically * as "objecty" as possible * pushes all arrays to the "leaf" fields * sorts most array values * removes duplicate text and keyword values These are mostly artifacts of how doc values are stored.
### sorted alphabetically ``` { "b": 1, "c": 2, "a": 3 } ``` becomes ``` { "a": 3, "b": 1, "c": 2 } ``` ### as "objecty" as possible ``` { "a.b": "foo" } ``` becomes ``` { "a": { "b": "foo" } } ``` ### pushes all arrays to the "leaf" fields ``` { "a": [ { "b": "foo", "c": "bar" }, { "c": "bort" }, { "b": "snort" } ] } ``` becomes ``` { "a": { "b": ["foo", "snort"], "c": ["bar", "bort"] } } ``` ### sorts most array values ``` { "a": [2, 3, 1] } ``` becomes ``` { "a": [1, 2, 3] } ``` ### removes duplicate text and keyword values ``` { "a": ["bar", "baz", "baz", "baz", "foo", "foo"] } ``` becomes ``` { "a": ["bar", "baz", "foo"] } ``` ## `_recovery_source` Elasticsearch's shard "recovery" process needs `_source` sometimes. So does cross cluster replication. If you disable source or filter it somehow we store a `_recovery_source` field for as long as the recovery process might need it. When everything is running smoothly that's generally a few seconds or minutes. Then the field is removed on merge. This synthetic source feature continues to produce `_recovery_source` and relies on it for recovery. It's possible to synthesize `_source` during recovery but we don't do it. That means that synthetic source doesn't speed up writing the index. But in the future we might be able to turn this on to trade writing less data at index time for slower recovery and cross cluster replication. That's an area of future improvement. ## perf numbers I loaded the entire tsdb data set with this change and the size: ``` standard -> synthetic store size 31.0 GB -> 7.0 GB (77.5% reduction) _source 24695.7 MB -> 47.6 MB (99.8% reduction - synthetic is in _recovery_source) ``` A second _forcemerge a few minutes after rally finishes should remove the remaining 47.6MB of _recovery_source. With this fetching source for 1,000 documents seems to take about 500ms. I spot checked a lot of different areas and haven't seen any different hit.
I expect this performance impact is based on the number of doc values fields in the index and how sparse they are.	2022-05-10 07:46:58 -04:00
Salvatore Campagna	08141cf875	fix: nested top metrics sort on keyword field (#85058 ) Using a double as a return value works only if the field we are sorting on is a number. If the field is not a value we can convert to a double, like a non-numeric keyword, converting it to a number returns `NaN`. Without this patch, sorting takes place on the bucket key, if the order field points to a non-numeric value. The additional bucket key comparator is implicitly added as a tie breaker to avoid non-deterministic sorting of buckets. With this change we support sorting using any subclass of SortValue. This means the bucket key will be used just in case of equal values on the order field. Issue: #78506	2022-03-22 00:38:51 +01:00
Mark Vieira	2cbc7cda49	Remove build plugin from additional QA projects (#84960 )	2022-03-14 14:11:05 -07:00
Ryan Ernst	0ec229050e	Move yaml rest test case to separate test lib (#84835 ) The ESClientYamlSuiteTestCase is used to run yaml tests throughout Elasticsearch. It utilizes the low level rest client in sniffing for nodes, but the sniffer is not needed anywhere else in the test framework. This commit creates a new project, `:test:rest-runner` which is meant to house the rest test running infrastructure. This has two purposes. First is to remove the sniffer from the test framework dependencies, because it transitively depends on Jackson. Second is to setup the runner for future refactorings where it could be made to not depend on the entire test framework, though how that could work is left for the future.	2022-03-11 10:51:11 -05:00
Alan Woodward	5ebcf60fbc	Don't run the runtime field YAML tests over TSDB aggs (#84791 ) Runtime fields don't support dimension parameters so we can't shadow keyword fields in the normal way for TSDB yaml tests.	2022-03-09 13:32:47 +00:00
Benjamin Trent	b592d2bf01	New random_sampler aggregation for sampling documents in aggregations (#84363 ) This adds a new sampling aggregation that performs a background sampling over all documents in an index. The syntax is as follows: ``` { "aggregations": { "sampling": { "random_sampler": { "probability": 0.1 }, "aggs": { "price_percentiles": { "percentiles": { "field": "taxful_total_price" } } } } } } ``` This aggregation provides fast random sampling over the entire document set in order to speed up costly aggregations. Testing this over a variety of aggregations and data sets, the median speed up when sampling at `0.001` over millions of documents is around 70X speed improvement. Relative error rate does rely on the size of the data and the aggregation kind. Here are some typically expected numbers when sampling over 10s of millions of documents. `p` is the configured probability and `n` is the number of documents matched by your provided filter query.	2022-03-02 14:32:30 -05:00
Nhat Nguyen	31d703f24c	Introduce lookup runtime fields (#82385 ) This PR introduces the lookup runtime fields which are used to retrieve data from the related indices. The below search request enriches its search hits with the location of each IP address from the `ip_location` index. ``` POST logs/_search { "runtime_mappings": { "location": { "type": "lookup", "lookup_index": "ip_location", "query_type": "term", "query_input_field": "ip", "query_target_field": "_id", "fetch_fields": [ "country", "city" ] } }, "fields": [ "timestamp", "message", "location" ] } ``` Response: ``` { "hits": { "hits": [ { "_index": "logs", "_id": "1", "fields": { "location": [ { "city": [ "Montreal" ], "country": [ "Canada" ] } ], "message": [ "the first message" ] } } ] } } ```	2022-02-22 21:36:19 -05:00
Mark Vieira	64929dc5df	Introduce explicit API for configure test cluster feature flags (#83876 )	2022-02-14 15:22:33 -08:00
Alan Woodward	8bc46ad959	Add filtering to fieldcaps endpoint (#83636 ) Many consumers of the field caps API need to do some post-processing of the results before they can use them; for instance, Kibana would like to exclude multifields from certain field selections, or would like to display only geo_point fields in Maps. ML and QL consumers exclude nested fields in certain circumstances. This post-processing is possible at the moment, but can be hacky; and in all cases it involves sending the whole (possibly very large) field caps response over the wire and then whittling it down in the client. It is also not guaranteed to be accurate - runtime fields may be incorrectly classified as multifields, for example. This commit pushes filtering into elasticsearch itself, reducing the amount of data that needs to be transported and ensuring better accuracy. The field caps API gets two new parameters: * filters - a comma-delimited list that may contain any combination of: `+metadata`, `-metadata`, `-nested`, `-parent`, `-multifield` * types - a comma-delimited list of field types; only fields that have a type in this set will be returned The API will make best-effort attempts to apply the filters post-hoc to responses from older nodes, so this should still work in a mixed-cluster or cross-cluster situation. Fixes #82966, #72174	2022-02-10 14:06:26 +00:00
Martijn van Groningen	0ddfad4cd7	Fix release build (#83720 ) - Add `es.index_mode_feature_flag_registered` feature flag to data-streams module's internalClusterTest task. - Add `es.random_sampler_feature_flag_registered` feature flag to xpack rest tests with security qa module. Closes #83722	2022-02-09 18:30:15 -05:00
weizijun	9503e9f4e2	Runtime fields core-with-mapped tests support tsdb (#83577 ) As runtime fields do not support `time_series_dimension` and `time_series_metric`, shadowing such fields leads to failures in the tsdb test cases. Also, tsdb indices require the @timestamp field. So I improved the `runtimeifyMappingProperties` method logic and added some skip rules: - skip `time_series_dimension` field. - skip `time_series_metric` field. - skip `@timestamp` field. The PR also fixes the failed test in https://github.com/elastic/elasticsearch/issues/83431	2022-02-08 14:21:23 -05:00
Mark Vieira	1c95dfc94e	Mute failing runtime fields test	2022-02-02 13:55:03 -08:00
Tim Vernum	d61dda2c01	Remove system-index write-access from superuser role (#81400 ) This commit changes the superuser role (as used by the "elastic" builtin user) so that it no longer has any sort of write access to restricted indices (system indices). This improves the safety and security of the cluster, as it means that there are no out-of-the-box users or roles that can write to, delete or close the security index. Superusers can still read from (and monitor) system indices. Other roles (and users) can still access system indices as specified in their descriptor. These can be custom such as the "_es_test_root" role used in the integration test suite, or builtin roles such as kibana_system.	2022-01-17 12:00:38 +11:00
Artem Prigoda	0699c9351f	Use Java 14 switch expressions (#82178 ) JEP 361[https://openjdk.java.net/jeps/361] added support for switch expressions which can be much more terse and less error-prone than switch statements. Another useful feature of switch expressions is exhaustiveness: we can make sure that an enum switch expression covers all the cases at compile time.	2022-01-10 09:53:35 +01:00
Artem Prigoda	763d6d510f	Use Java 15 text blocks for JSON and multiline strings (#80751 ) The ES code base is quite JSON heavy. It uses a lot of multi-line JSON requests in tests which need to be escaped and concatenated which in turn makes them hard to read. Let's try to leverage Java 15 text blocks for representing them.	2021-12-15 18:01:28 +01:00
Mark Vieira	12ad399c48	Reformat Elasticsearch source	2021-10-27 08:19:51 -07:00
Lee Hinman	c017e1acdb	Add deprecation headers to HLRC classes (#79754 ) This commit adds the @Deprecated annotation and Javadoc to HLRC classes.	2021-10-25 16:11:16 -06:00
Igor Motov	f6034e643a	TSDB: Add time series information to field caps (#78790 ) Exposes information about dimensions and metrics via field caps. This information will be needed for PromQL support. Relates to #74660	2021-10-13 11:03:38 -10:00
Chris Hegarty	20c9f756d2	Fix split package org.elasticsearch.common.xcontent (#78831 ) Fix the split package org.elasticsearch.common.xcontent, between server and the x-content lib. Move the x-content lib exported package from org.elasticsearch.common.xcontent to org.elasticsearch.xcontent ( following the naming convention of similar libraries ). Removing split packages is a prerequisite to modularization.	2021-10-08 17:14:26 +01:00

71 commits