Removing the custom dependency checksum functionality in favor of Gradle's built-in dependency verification support.
- Use SHA-256 instead of SHA-1, as SHA-1 is no longer considered secure.
Closes https://github.com/elastic/elasticsearch/issues/69736
This change adds an operation parameter to FieldDataContext that allows us to specialize the field data returned from fielddataBuilder in MappedFieldType. Keyword, integer, and geo point field types now support source fallback, where we build a doc values wrapper using source if doc values don't exist for the field under the SCRIPT operation. This allows us to have source fallback in scripting for the scripting fields API.
MappedFieldType#fieldDataBuilder() currently takes two parameters, a fully qualified
index name and a supplier for a SearchLookup. We expect to add more parameters here
as we add support for loading fielddata from source. Rather than telescoping the
parameter list, this commit instead introduces a new FieldDataContext carrier object
which will allow us to add to these context parameters more easily.
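A minimal sketch of the carrier-object shape, with a stubbed `SearchLookup` and hypothetical field names (not the actual Elasticsearch types):
```
import java.util.function.Supplier;

// Stand-in for org.elasticsearch.search.lookup.SearchLookup.
interface SearchLookup {}

// Bundling the arguments in one carrier means later changes (such as the
// SCRIPT operation used for source fallback above) can add context without
// telescoping every fielddataBuilder signature.
record FieldDataContext(String fullyQualifiedIndexName, Supplier<SearchLookup> lookupSupplier) {}
```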
This changes the LoggedExec task to be configuration cache compatible. We changed the implementation
to use `ExecOperations` instead of extending the `Exec` task. As double-checked with the Gradle team, the `Exec` task
is not planned to be made configuration cache compatible out of the box anytime soon.
This is part of the effort on https://github.com/elastic/elasticsearch/issues/57918
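A minimal sketch of the injected-service pattern in a Java Gradle task (hypothetical class, not the real LoggedExec):
```
import javax.inject.Inject;
import org.gradle.api.DefaultTask;
import org.gradle.api.tasks.TaskAction;
import org.gradle.process.ExecOperations;

// Composing ExecOperations instead of extending Exec keeps the task free of
// project state at execution time, which the configuration cache requires.
public abstract class LoggedExecSketch extends DefaultTask {

    @Inject
    protected abstract ExecOperations getExecOperations();

    @TaskAction
    void run() {
        // hypothetical command; the real task also captures and logs output
        getExecOperations().exec(spec -> spec.commandLine("echo", "hello"));
    }
}
```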
This adds the generation and upload logic of Gradle dependency graphs to Snyk.
We implemented a REST-API-based Snyk plugin directly because the existing Snyk Gradle plugin
delegates to the Snyk command line tool, and that tool
uses custom Gradle logic by injecting an init file that:
a) uses deprecated build logic, which we definitely want to avoid
b) uses Gradle APIs we avoid, like eager task creation.
Shipping this as an internal Gradle plugin gives us the most flexibility. As we only want to monitor
production code for now, we apply this plugin as part of the elasticsearch.build plugin;
that usage has so far been the de-facto indicator of whether a project is considered a "production" project
that ends up in our distribution or public Maven repositories. This isn't ideal yet, and we will revisit
the distinction between production and non-production code / projects in a separate effort.
As part of this effort we added the elasticsearch.build plugin to more projects that actually end up
in the distribution. To unblock us, we have for now disabled a few check tasks that started failing once elasticsearch.build was applied.
Addresses #87620
No need for this extension; we no longer make use of the settings or deprecation logger
in production. Also, this slows down cluster state operations that require a
temporary index service, which builds quite a bit more slowly when the loggers
need to be set up via reflective calls.
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
This PR uses a Lucene-9.3 snapshot in Elasticsearch 8.4. Notable changes in this Lucene snapshot:
- Merge-on-refresh (disabled)
- No more pathological merging
- SortedSetDocValues#count for value_count aggs
The APIs that we use in azure-svc-mgmt-compute use the Apache HTTP client and the built-in Java XML parser, so they don't require Jersey JAXB bindings for databinding JSON/XML data to Java objects via old Jackson dependencies.
This PR reworks the testing conventions precommit plugin. This plugin now:
- is compatible with YAML and Java REST tests and internalClusterTest (i.e. different sourceSets per test type)
- enforces test base classes and simple naming conventions (as it did before)
- adds one check task per test sourceSet
- uses the Worker API to improve task execution parallelism and encapsulation (see the sketch below)
- is Gradle configuration cache compatible
This also ports the TestingConventions integration testing to Spock and removes the build-tools-internal/test kit folder that is not required anymore. We also add some common logic for testing Java-related Gradle plugins.
We will apply further cleanup on other tests within our test suite in a dedicated follow-up.
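A minimal sketch of a Worker-API-based check task (hypothetical names, not the actual plugin code):
```
import javax.inject.Inject;
import org.gradle.api.DefaultTask;
import org.gradle.api.tasks.TaskAction;
import org.gradle.workers.WorkAction;
import org.gradle.workers.WorkParameters;
import org.gradle.workers.WorkerExecutor;

// The check runs inside a work action, so Gradle can schedule it in parallel
// with other work and keep it encapsulated from task state.
public abstract class ConventionsCheckTask extends DefaultTask {

    @Inject
    protected abstract WorkerExecutor getWorkerExecutor();

    @TaskAction
    void check() {
        getWorkerExecutor().noIsolation().submit(CheckAction.class, params -> {});
    }

    public abstract static class CheckAction implements WorkAction<WorkParameters.None> {
        @Override
        public void execute() {
            // placeholder for base-class and naming validation
        }
    }
}
```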
This is a result of a structural search/replace in IntelliJ. This only affects log methods with the signatures
`logger.info((Supplier) () -> ParameterizedMessage)` and `logger.info((Supplier) () -> ParameterizedMessage, Throwable)`.
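For illustration, the affected shape with a hypothetical message (the exact replacement isn't shown here; one equivalent form passes the format string and argument to the logger directly, which Log4j also formats lazily):
```
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.message.ParameterizedMessage;
import org.apache.logging.log4j.util.Supplier;

class LoggingShapes {
    private static final Logger logger = LogManager.getLogger(LoggingShapes.class);

    void example(String snapshot) {
        // the affected shape: a Supplier wrapping a single-argument message
        logger.info((Supplier<?>) () -> new ParameterizedMessage("snapshot [{}] completed", snapshot));
        // one equivalent replacement: let the logger format lazily itself
        logger.info("snapshot [{}] completed", snapshot);
    }
}
```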
relates #86549
This commit fixes the license/notice files for the ingest attachment
dependencies to account for both tika-langdetect and tika-langdetect-tika
jars when building dependency info.
Tika 1.x reaches end of life later this year. This change updates the
AttachmentProcessor to use Tika 2. The goal was to keep the
functionality as close as possible, just with upgraded Tika. The tests
have been slightly modified because of a small change in Tika
functionality: as of 2.4.0 it adds an extra newline to the output
for every embedded attachment in a document. Also, as part of this I have
broken apart the tika-parsers artifact into individual dependencies. The reason
is that we are considering breaking this plugin apart, and want to know
exactly which parsers we pull in.
Boolean-only privilege checks, i.e. the ones currently used in the
"profile has privilege" API, now benefit from a performance improvement,
because the check will now stop upon first encountering a privilege that
is NOT granted over a resource (and return `false` overall). Previously,
all the privileges were always checked over all the resources in order
to assemble a comprehensive response with all the privileges that are
not granted.
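A minimal sketch of the short-circuit (hypothetical names, not the actual authorization code):
```
import java.util.List;
import java.util.function.BiPredicate;

final class BooleanPrivilegeCheck {

    // Stop and answer false at the first privilege that is not granted over a
    // resource, instead of assembling a full per-privilege response.
    static boolean hasAll(List<String> privileges, List<String> resources,
                          BiPredicate<String, String> granted) {
        for (String privilege : privileges) {
            for (String resource : resources) {
                if (granted.test(privilege, resource) == false) {
                    return false; // overall answer is already known
                }
            }
        }
        return true;
    }
}
```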
Speeding this up some more as it's now 50% of the bootstrap time of the many shards benchmarks.
Iterating an array here in all cases is quite a bit faster than iterating various kinds of lists
and doesn't complicate the code. Also removes a redundant call to `getValue()` for each parameter
during serialization.
This PR represents the initial phase of Modularizing Elasticsearch (with
Java Modules).
This initial phase modularizes the core of the Elasticsearch server
with Java Modules, which is then used to load and configure extension
components atop the server. Only a subset of extension components are
modularized at this stage (other components come in a later phase).
Components are loaded dynamically at runtime with custom class loaders
(same as is currently done). Components with a module-info.class are
defined to a module layer.
This architecture is somewhat akin to the Modular JDK, where
applications run on the classpath. In the analogy, the Elasticsearch
server modules are the platform (thus are always resolved and present),
while components without a module-info.class are non-modular code
running atop the Elasticsearch server modules. The extension components
cannot access types from non-exported packages of the server modules, in
the same way that classpath applications cannot access types from
non-exported packages of modules from the JDK. Broadly, the core
Elasticsearch Java modules simply "wrap" the existing packages and export
them. There are opportunities to export less, which is best done in more
narrowly focused follow-up PRs.
The Elasticsearch distribution startup scripts are updated to put jars
on the module path (the class path is empty), so the distribution will
run the core of the server as java modules. A number of key components
have been retrofitted with module-info.java's too, and the remaining
components can follow later. Unit and functional tests run as
non-modular (since they commonly require package-private access), while
higher-level integration tests, that run the distribution, run as
modular.
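For illustration, a hypothetical module descriptor in the spirit of the "wrap and export" approach (invented names, not the actual server descriptor):
```
// module-info.java: the module wraps existing packages and exports them;
// packages left unexported stay inaccessible to extension components.
module org.elasticsearch.example.server {
    requires org.apache.lucene.core;

    exports org.elasticsearch.example.action;
    exports org.elasticsearch.example.cluster;
    // internal packages are deliberately not exported
}
```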
Co-authored-by: Chris Hegarty <christopher.hegarty@elastic.co>
Co-authored-by: Ryan Ernst <ryan@iernst.net>
Co-authored-by: Rene Groeschke <rene@elastic.co>
Adds support for "text" fields in archive indices, with the goal of adding simple filtering support on text fields when
querying archive indices.
There are some differences to regular text fields:
- no global statistics: queries on text fields return constant scores (similar to match_only_text)
- analyzers can be updated
- if the configured analyzer is not available, the field falls back to the default analyzer
- no guarantees that analyzers are BWC
The above limitations also give us the flexibility to eventually swap out the implementation with a "runtime-text field"
variant, and hence only provide those capabilities that can be emulated via a runtime field.
Relates #81210
Final (hopefully!) snapshot before the 9.2.0 release
* Update test to expect Persian token filter - will be exposed later
* Fix KnnVectorQueryBuilderTests::doAssertLuceneQuery
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
In the many-shards benchmarks the singleton maps storing just a single
analyzer for each keyword field mapper cost around 5% of the total heap
usage on data nodes (700MB for ~15k indices, which translates into ~16M instances
of keyword field mapper for Beats mappings).
Creating specific implementations for the zero, one or many analyzers
use cases that already have their own specialized constructors eliminates this
overhead completely.
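A sketch of the idea with hypothetical types (not the actual mapper code): the common zero- and one-analyzer cases avoid allocating a Map per mapper instance.
```
import java.util.Map;

interface FieldAnalyzers<A> {
    A get(String field);

    // zero analyzers: a shared constant would do
    static <A> FieldAnalyzers<A> none() {
        return field -> null;
    }

    // one analyzer: two references instead of a singleton Map
    static <A> FieldAnalyzers<A> single(String name, A analyzer) {
        return field -> name.equals(field) ? analyzer : null;
    }

    // many analyzers: fall back to a real Map
    static <A> FieldAnalyzers<A> many(Map<String, A> analyzers) {
        return analyzers::get;
    }
}
```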
relates #77466
This commit removes the remaining ParameterizedMessages that take a
single argument, this time where the argument contains method calls.
This was again done almost entirely through find/replace with regex in
IntelliJ.
relates #86549
This attempts to shrink the index by implementing a "synthetic _source" field.
You configure it in the mapping:
```
{
  "mappings": {
    "_source": {
      "synthetic": true
    }
  }
}
```
And we just stop storing the `_source` field - kind of. When you go to access
the `_source` we regenerate it on the fly by loading doc values. Doc values
don't preserve the original structure of the source you sent so we have to
make some educated guesses. And we have a rule: the source we generate would
result in the same index if you sent it back to us. That way you can use it
for things like `_reindex`.
Fetching the `_source` from doc values does slow down loading somewhat. See
numbers further down.
## Supported fields
This only works for the following fields:
* `boolean`
* `byte`
* `date`
* `double`
* `float`
* `geo_point` (with precision loss)
* `half_float`
* `integer`
* `ip`
* `keyword`
* `long`
* `scaled_float`
* `short`
* `text` (when there is a `keyword` sub-field that is compatible with this feature)
## Educated guesses
The synthetic source generator makes `_source` fields that are:
* sorted alphabetically
* as "objecty" as possible
* pushes all arrays to the "leaf" fields
* sorts most array values
* removes duplicate text and keyword values
These are mostly artifacts of how doc values are stored.
### sorted alphabetically
```
{
  "b": 1,
  "c": 2,
  "a": 3
}
```
becomes
```
{
  "a": 3,
  "b": 1,
  "c": 2
}
```
### as "objecty" as possible
```
{
  "a.b": "foo"
}
```
becomes
```
{
  "a": {
    "b": "foo"
  }
}
```
### pushes all arrays to the "leaf" fields
```
{
  "a": [
    {
      "b": "foo",
      "c": "bar"
    },
    {
      "c": "bort"
    },
    {
      "b": "snort"
    }
  ]
}
```
becomes
```
{
  "a": {
    "b": ["foo", "snort"],
    "c": ["bar", "bort"]
  }
}
```
### sorts most array values
```
{
  "a": [2, 3, 1]
}
```
becomes
```
{
  "a": [1, 2, 3]
}
```
### removes duplicate text and keyword values
```
{
  "a": ["bar", "baz", "baz", "baz", "foo", "foo"]
}
```
becomes
```
{
  "a": ["bar", "baz", "foo"]
}
```
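Taken together, a toy sketch of why rebuilt leaf values come back this way (illustrative only, not the actual implementation):
```
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SyntheticSourceToy {

    // Toy model: rebuilding an object from per-field doc values naturally
    // yields alphabetical keys and sorted, de-duplicated leaf arrays.
    static Map<String, List<String>> synthesize(Map<String, List<String>> docValues) {
        Map<String, List<String>> source = new TreeMap<>(); // keys sort alphabetically
        docValues.forEach((field, values) ->
            source.put(field, values.stream().sorted().distinct().toList()));
        return source;
    }
}
```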
## `_recovery_source`
Elasticsearch's shard "recovery" process needs `_source` *sometimes*. So does
cross cluster replication. If you disable source or filter it somehow we store
a `_recovery_source` field for as long as the recovery process might need it.
When everything is running smoothly that's generally a few seconds or minutes.
Then the field is removed on merge. This synthetic source feature continues
to produce `_recovery_source` and relies on it for recovery. It's *possible*
to synthesize `_source` during recovery but we don't do it.
That means that synthetic source doesn't speed up writing the index. But in the
future we might be able to turn this on to trade writing less data at index
time for slower recovery and cross cluster replication. That's an area of
future improvement.
## Perf numbers
I loaded the entire tsdb data set with this change and measured the size:
```
standard -> synthetic
store size 31.0 GB -> 7.0 GB (77.5% reduction)
_source 24695.7 MB -> 47.6 MB (99.8% reduction - synthetic is in _recovery_source)
```
A second _forcemerge a few minutes after rally finishes should remove the
remaining 47.6MB of _recovery_source.
With this, fetching source for 1,000 documents seems to take about 500ms. I
spot checked a lot of different areas and haven't seen any other performance hit. I
*expect* this performance impact depends on the number of doc values fields
in the index and how sparse they are.
Remove the inheritance here to make instances smaller and speed up many-shards benchmarks a little.
Did not remove the dead arguments from the constructors in this PR as that would have been a
very noisy change.
This introduces a new Security API `_security/profile/_has_privileges`
that can be used to verify which Users have the requested privileges,
given their associated User Profiles. Multiple profile uids can be specified
in a single has privileges request.
This is analogous to the existing Has Privileges API. It also uses the same
format for specifying the privileges to be checked, and should be used in
the same situations (i.e. to run an authorization preflight check or to verify
privileges over application resources). However, unlike the existing
Has Privileges API, this one can be used to check the privileges of multiple
users (not only of the currently authenticated one), but the users must
have an existing profile, and the response is binary only (either a user has or
does not have the requested privileges).
Calling this API requires the `manage_user_profile` cluster privilege.
Notable changes include:
- count implementations for MultiRangeQuery and IndexSortedNumericDocValuesRangeQuery, which may speed up certain aggregations
- more efficient decoding of docids in the BKD reader
The default field type is incredibly common, and with 16 fields its instances are not
trivial in size. Heap dumps from larger data nodes holding many
keyword fields with the default field type can contain hundreds of MB
of heap used for these.
The same reasoning applies to the `TextSearchInfo` deduplication.
`TextSearchInfo` was turned into a record to give us an `equals` implementation.
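A minimal sketch of the deduplication idea (hypothetical helper, not the actual mapper code); turning the value type into a record supplies the equals/hashCode this relies on:
```
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Canonicalize equal value objects through a shared map so the many mappers
// holding the default configuration all share one instance.
final class Deduplicator<T> {
    private final Map<T, T> canonical = new ConcurrentHashMap<>();

    T deduplicate(T value) {
        T existing = canonical.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }
}
```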
In #86206, we closed down Authentication constructors to favour
dedicated convenient methods for instantiation. The constructor usages
in the example project were however left out (another refactor fallout).
Relates: #86206
Resolves: #86378
Most of the Jackson uses, e.g. in x-content and azure, have already been
upgraded. This commit upgrades the rest of the uses. Note that it does
not yet upgrade the aws sdk, this should also be done on its own.
Use the public static final constant org.apache.lucene.analysis.icu.NORMALIZER, rather than poking around inside Lucene resources - the Normalizer2 instance is equivalent. It would appear that this code, doing the resource lookup, predates the Lucene public field.
Most classes under elasticsearch-core had been moved to the o.e.core
package. However, a couple io related classes remained in an "internal"
package. This commit moves Streams and IOUtils to the core package, as
they are no more "internal" than the rest of the classes in core.