elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-23 14:47:31 -04:00

Author	SHA1	Message	Date
Joe Gallo	370fb79471	DateProcessor refactoring (#124349) (#124411)	2025-03-09 05:00:39 +11:00
Joe Gallo	533d0a8750	Refactor RegisteredDomainProcessorTests (#124175) (#124245)	2025-03-07 04:15:31 +11:00
Joe Gallo	126388cc0d	Cleanup RegisteredDomainProcessorTests (#124118) (#124173)	2025-03-06 14:44:49 +11:00
Joe Gallo	aced4fc4d4	Cleanup RegisteredDomainProcessor (#124123) (#124155)	2025-03-06 10:23:49 +11:00
Joe Gallo	0f46b562e6	Optimize IngestCtxMap construction (#120833) (#120926)	2025-01-28 04:32:07 +11:00
Joe Gallo	a491383940	Optimize IngestDocMetadata isAvailable (#120753) (#120801)	2025-01-25 02:51:42 +11:00
Rene Groeschke	6b7cd0339e	Update Gradle wrapper to 8.12 (#118683) (#119363) This updates the Gradle wrapper to 8.12. We addressed deprecation warnings due to the update, including: - Fix change in TestOutputEvent api - Fix deprecation in groovy syntax - Use latest ospackage plugin containing our fix - Remove project usages at execution time - Fix deprecated project references in repository-old-versions (cherry picked from commit `ba61f8c7f7`)	2024-12-31 08:36:31 +01:00
Parker Timmins	6ebee669c1	[8.x] Resolve pipelines from template if lazy rollover write (#116031 ) (#116132 ) * Resolve pipelines from template if lazy rollover write (#116031) If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index. Fixes: #112781 * Remute tests block merge * Remute tests block merge	2024-11-03 04:25:33 +11:00
Ryan Ernst	dedf9fd6d7	Use directory name as project name for libs (#115720) (#115984) * Use directory name as project name for libs (#115720) The libs projects are configured to all begin with `elasticsearch-`. While this is desirable for the artifacts to contain this consistent prefix, it means the project names don't match up with their directories. Additionally, it creates complexities for subproject naming that must be manually adjusted. This commit adjusts the project names for those under libs to be their directory names. The resulting artifacts for these libs are kept the same, all beginning with `elasticsearch-`. * fixes	2024-10-31 07:52:10 +11:00
Simon Cooper	2d538c7022	Backport transport changes from #114895 to 8.x (#115909)	2024-10-30 14:17:17 +00:00
Pete Gillin	6ec7a3439d	Add a `terminate` ingest processor (#114157 ) (#114343 ) This processor simply causes any remaining processors in the pipeline to be skipped. It will normally be executed conditionally using the `if` option. (If this pipeline is being called from another pipeline, the calling pipeline is not terminated.) For example, this: ``` POST /_ingest/pipeline/_simulate { "pipeline": { "description": "Appends just 'before' to the steps field if the number field is present, or both 'before' and 'after' if not", "processors": [ { "append": { "field": "steps", "value": "before" } }, { "terminate": { "if": "ctx.error != null" } }, { "append": { "field": "steps", "value": "after" } } ] }, "docs": [ { "_index": "index", "_id": "doc1", "_source": { "name": "okay", "steps": [] } }, { "_index": "index", "_id": "doc2", "_source": { "name": "bad", "error": "oh no", "steps": [] } } ] } ``` returns something like this: ``` { "docs": [ { "doc": { "_index": "index", "_version": "-3", "_id": "doc1", "_source": { "name": "okay", "steps": [ "before", "after" ] }, "_ingest": { "timestamp": "2024-10-04T16:25:20.448881Z" } } }, { "doc": { "_index": "index", "_version": "-3", "_id": "doc2", "_source": { "name": "bad", "error": "oh no", "steps": [ "before" ] }, "_ingest": { "timestamp": "2024-10-04T16:25:20.448932Z" } } } ] } ```	2024-10-09 16:44:57 +01:00
Simon Cooper	a5c05afe70	Explicitly use ISO weekfields for built-in weekyear date formats (#113787) This is so it doesn't change when changing JDK version and locale database	2024-10-01 14:10:15 +01:00
Simon Cooper	40f1e5057e	Add blog links to locale deprecation warnings (#113474)	2024-09-25 14:24:05 +01:00
Simon Cooper	8c81222b66	Change default locale of date processors to ENGLISH (#112796 ) (#113438 ) It is English in the docs, so this fixes the code to match the docs. Note that this really impacts Elasticsearch when run on JDK 23 with the CLDR locale database, as in the COMPAT database pre-23, root and en are essentially the same.	2024-09-24 11:04:08 +01:00
Simon Cooper	7a81384974	Add deprecation warnings for week-date specifiers (#113247 ) Week dates also change on JDK 23, so add a deprecation warning if they are used on COMPAT	2024-09-20 16:49:47 +01:00
Simon Cooper	31d5967d35	Remove use of SPI locale for JDK 23+ (#113182 ) On JDK 23 we're just going with what CLDR specifies for week-date calculations - the built-in locales are available for ISO weekdate uses.	2024-09-20 16:48:17 +01:00
Simon Cooper	ceb9deff89	Use deprecation logger for CLDR date format specifiers (#112917) The addition of the logger requires several updates to tests to deal with the possible warning, or muting if there is no way to specify an allowed (but not mandatory) warning	2024-09-19 15:50:37 +01:00
Mark Vieira	0279c0a909	Add AGPLv3 as a supported license	2024-09-13 14:30:33 -07:00
Mark Vieira	24f33e95e8	Ensure rest compatibility tests are run when appropriate (#112526 )	2024-09-05 08:22:48 -07:00
Panos Koutsovasilis	29453cb2ce	fix: support all allowed protocol numbers (#111528 ) * fix(CommunityIdProcessor): support all allowed protocol numbers * fix(CommunityIdProcessor): update documentation	2024-08-26 08:37:40 +03:00
Patrick Doyle	35a375329a	Move Guice to org.elasticsearch.injection.guice (#111723 ) * Move files and fix imports & module exports * Other consequences of moving Guice	2024-08-12 10:47:46 -04:00
Moritz Mack	6ca3ac253a	Track raw ingest and storage size separately to support updates by doc (#111179 ) This PR starts tracking raw ingest and storage size separately for updates by document. This is done capturing the ingest size when initially parsing the update, and storage size when parsing the final, merged document. Additionally this renames DocumentSizeObserver to XContentParserDecorator / XContentMeteringParserDecorator for better reasoning about the code. More renaming will have to follow. --------- Co-authored-by: Przemyslaw Gomulka <przemyslaw.gomulka@elastic.co>	2024-08-02 09:26:37 +02:00
David Turner	b8af2a066e	Remove usages of more test-only request builders (#111400 ) Deprecates for removal the following methods from `ClusterAdminClient`: - `prepareSearchShards` - `preparePutStoredScript` - `prepareDeleteStoredScript` - `prepareGetStoredScript` Also replaces all usages of these methods with more suitable test utilities. This will permit their removal, and the removal of the corresponding `RequestBuilder` objects, in a followup. Relates #107984	2024-07-30 07:33:19 +01:00
Ankita Kumar	5761c4afb5	Reconstruct set of indices in BulkRequest (#110672 ) Reconstruct indices set in BulkRequest constructor so that the correct thread pool can be used for forwarded bulk requests. Before this fix, forwarded bulk requests were always using the system_write thread pool because the indices set was empty. Fixes issue https://github.com/elastic/elasticsearch/issues/102792	2024-07-25 20:30:55 -04:00
kanoshiou	9fbdfcf650	Fix unnecessary mustache template evaluation (#110986 ) Addresses the performance issue in the date ingest processor where Mustache template evaluation is unnecessarily applied inside a loop. The timezone and locale templates are now evaluated once before the loop, improving efficiency. closes #110191 --------- Co-authored-by: Joe Gallo <joegallo@gmail.com>	2024-07-22 15:42:58 -05:00
Przemyslaw Gomulka	cf03c66c1f	Infrastructure to meter updates by script for ra-s nontimeseries (#108910) This commit refactors the metering for billing api so that we can hide the implementation details of DocumentSizeObserver creation and adds an additional field `originatesFromScript` on IndexRequest. There will no longer need to be code checking if the request was already parsed in ingest service or updatehelper. This logic will be hidden in the implementation.	2024-07-11 10:49:32 +02:00
Przemyslaw Gomulka	b80b739993	Provide document size reporter with MapperService (#109794) Instead of indexMode a mapper service is necessary to reliably determine if an index is a timeseries datastream	2024-06-18 11:40:56 +02:00
Przemyslaw Gomulka	44ae540fd7	Provide the DocumentSizeReporter with index mode (#108947) In order to decide what logic to apply when reporting a document size, we need to know if an index is in time_series mode. This information is in indexSettings.mode.	2024-06-10 11:48:22 +02:00
Parker Timmins	3662d12c9f	Return ingest byte stats even when 0-valued (#108796 ) Change the ingest byte stats to always be returned whether or not they have a value of 0. Add human readable form of byte stats. Update docs to reflect changes.	2024-05-20 10:52:16 -05:00
Parker Timmins	c5a3342449	Test pipeline run after reroute (#108693) Add test confirming that pipelines are run after a reroute. Fix test of two stage reroute. Delete pipelines during teardown so as to not break other tests using the same pipeline name. Co-authored-by: Joe Gallo <joegallo@gmail.com>	2024-05-20 10:02:04 -05:00
Parker Timmins	298c6492a5	Make ingest byte stat names more descriptive (#108786 ) Current ingest byte stat fields could easily be confused. Add more descriptive name to make it clear that they do not count all docs processed by the pipeline.	2024-05-17 12:03:42 -05:00
Larisa Motova	a01baa3d79	Include doc size info in ingest stats (#107240 ) Add ingested_in_bytes and produced_in_bytes stats to pipeline ingest stats. These track how many bytes are ingested and produced by a given pipeline. For efficiency, these stats are recorded for the first pipeline to process a document. Thus, if a pipeline is called as a final pipeline after a default pipeline, as a pipeline processor, and after a reroute request, a document will not contribute to the stats for that pipeline. If a given pipeline has 0 bytes recorded for both of these stats, due to not being the first pipeline to run any doc, these stats will not appear in the pipeline's entry in ingest stats.	2024-05-17 08:53:24 -05:00
Przemyslaw Gomulka	1803320db5	Allow RA metrics to be reported upon parsing completed or accumulated (#108726) RA metrics can be implemented so that they can be reported before they are indexed (like with a new field being added), or they can be accumulated and reported upon shard commit as additional metadata. This commit adds a new method to DocumentSizeReporter#onParsingComplted. DocumentSizeAccumulator is used to accumulate the size in between the commits. DocumentSizeReporter can be parametrised with a DocumentSizeAccumulator. Based on #108449	2024-05-17 12:54:18 +02:00
Przemyslaw Gomulka	437e7db499	Refactor reporting of RA metrics to not be done in TransportShardBulkAction (#108449) Previously DocumentSizeReporter was reporting upon indexing being completed in TransportShardBulkAction#onComplete. This commit renames the method to onIndexingCompleted and moves that reporting to IndexEngine in serverless plugin. This will be followed up in a separate PR that will be reporting in an Engine#index subclass (serverless)	2024-05-16 13:57:06 +02:00
Moritz Mack	b71fc0c561	Migrate remaining usage of skip version in YAML specs to cluster_features (#108055)	2024-05-07 09:42:17 +02:00
Parker Timmins	796b0deeec	Simulate should succeed if ignore_missing_pipeline (#108106) PipelineProcessors with non-existing pipelines should succeed (as noop) if ignore_missing_pipeline=true. Currently, this does not work when pipelines are simulated with verbose=true. In this case, an error is returned and no results are shown for subsequent processors. This change allows following processors to run, and changes the status from error to error_ignored.	2024-05-02 08:35:20 -05:00
Keith Massey	f21bba6ce5	Making test document larger to reliably force StackOverflowError in GsubProcessorTests (#107724 ) The size of the document used to trigger a StackOverflowError in GsubProcessorTests.testStackOverflow() was just large enough to cause it on a mac. On the linux CI boxes, occasionally it does not cause a StackOverflowError, and as a result the test fails. This change makes the document more than 3x larger, making a StackOverflowError guaranteed. Closes #107416	2024-04-22 17:06:43 -04:00
Jonathan Buttner	a0693a59fb	Muting testStackOverflow (#107465 ) Muting https://github.com/elastic/elasticsearch/issues/107416	2024-04-15 09:14:39 -04:00
Keith Massey	ef16be9303	Catching StackOverflowErrors from bad regexes in GsubProcessor and rethrowing as an Exception (#106851)	2024-04-11 15:59:53 -05:00
Moritz Mack	1f5e04b721	Migrate YAML REST tests to synthetic cluster feature check (#107068 ) To simplify the migration away from version based skip checks in YAML specs, this PR adds a synthetic version feature `gte_vX.Y.Z` for any version at or before 8.14.0. New test specs for 8.14 or later are expected to use respective new cluster features, or a test-only feature supplied via ESRestTestCase#createAdditionalFeatureSpecifications if sufficient.	2024-04-11 18:22:38 +02:00
Przemyslaw Gomulka	84d61579c1	Do not report document metering on system indices (#107041 ) For system indices we don't want to emit metrics. DocumentSizeReporter will be created given an index. It will internally contain a SystemIndices instance that will verify the indexName with isSystemName	2024-04-10 13:04:40 +02:00
Keith Massey	c6a0d4f0d7	Pulling KeyValueProcessor.logAndBuildException() into AbstractProcessor (#106931)	2024-03-29 16:29:16 -05:00
Armin Braun	fc8e2b7897	Introduce Predicate Utilities for always true/false use-cases (#105881) Just a suggestion. I think this would save us a bit of memory here and there. We have loads of places where the always true lambdas are used with `Predicate.or/and`. Found this initially when looking into field caps performance where we used to heavily compose these but many spots in security and index name resolution gain from these predicates. The better toString also helps in some cases at least when debugging.	2024-03-04 14:01:21 +01:00
Simon Cooper	bc47d18599	Convert uses of map/set creation using a subclass to static creation methods (#105767)	2024-02-23 16:43:26 +00:00
Niels Bauman	b1fcedd7ae	Fix `uri_parts` processor behaviour for missing extensions (#105689 ) The `uri_parts` processor was behaving incorrectly for URI's that included a dot in the path but did not have an extension. Also includes YAML REST tests for the same.	2024-02-22 09:41:11 +01:00
Przemyslaw Gomulka	a103e3c7a4	Infrastructure for metering the update requests (#105063) Update requests that send a document (or part of it) should allow for metering the size of that doc. Update requests that use a script should not be metered (reported size 0). This commit follows up on #104859. The parsing of the update's document is done in UpdateHelper, the same pattern we use to meter parsing in IngestService. If the script is being used, the size observed will be 0. The value observed is then reported in the TransportShardBulkAction and thanks to the value being 0 or positive it will not be metering the modified document again. This commit also renames the getDocumentParsingSupplier to getDocumentParsingProvider (this was accidentally omitted in #104859)	2024-02-19 12:05:51 +01:00
Keith Massey	f0ec294382	Limiting the number of nested pipelines that can be executed (#105428) Limiting the number of nested pipelines that can be executed within a single pipeline to 100	2024-02-13 16:28:31 -06:00
Keith Massey	c884945a93	Adding executedPipelines to the IngestDocument copy constructor (#105427)	2024-02-13 15:11:47 -06:00
Keith Massey	e2b2232569	Improving the performance of the ingest simulate verbose API (#105265 ) This updates the simulate verbose API to run in O(N) (for number of pipelines) time and memory like the simulate and ingest APIs rather than O(N^2).	2024-02-12 16:04:21 -06:00
Dmitry Cherniachenko	a50e58d99a	Use single-char variant of String.indexOf() where possible (#105205 ) * Use single-char variant of String.indexOf() where possible indexOf(char) is more efficient than searching for the same one-character String. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-02-12 14:14:32 -05:00

500 commits