elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-25 07:37:19 -04:00

Author	SHA1	Message	Date
Simon Cooper	e7350dce29	Add a capabilities API to check node and cluster capabilities (#106820 ) This adds a /_capabilities rest endpoint for checking the capabilities of a cluster - what endpoints, parameters, and endpoint capabilities the cluster supports	2024-05-08 14:44:26 +01:00
Kostas Krikellas	3183e6d6c9	Add ignored field values to synthetic source (#107567 ) * Add ignored field values to synthetic source * Update docs/changelog/107567.yaml * initialize map * yaml fix * add node feature * add comments * small fixes * missing cluster feature in yaml * constants for chars, stored fields * remove duplicate method * throw exception on parse failure * remove Base64 encoding * add assert on IgnoredValuesFieldMapper::write * changes from review * simplify logic * add comment * rename classes * rename _ignored_values to _ignored_source * rename _ignored_values to _ignored_source	2024-04-26 15:35:31 +03:00
Panagiotis Bailis	d029d40cea	Adding new RankContext classes per different search phase/node type (#107093 )	2024-04-24 14:58:16 +03:00
Mary Gouseti	119f6e71ce	[Data stream lifecycle] Introduce factory retention settings (#107741 ) We introduce the plumbing so that a plugin can provide factory retention. This retention will take effect if there is no global retention provided by the user. Without a plugin defining the factory retention, elasticsearch will have no factory retention.	2024-04-24 11:52:24 +03:00
Howard	fdbb21bba4	Support effective watermark thresholds in node stats API (#107244 ) Adds to the `fs` component of the node stats API some additional values indicating the disk watermarks that are currently in effect. Relates #106676	2024-04-18 09:57:28 -04:00
Chris Hegarty	6b52d7837b	Add an optimised int8 vector distance function for aarch64. (#106133 ) This commit adds an optimised int8 vector distance implementation for aarch64. Additional platforms like, say, x64, will be added as a follow-up. The vector distance implementation outperforms Lucene's Panama Vector implementation for binary comparisons by approx 5x (depending on the number of dimensions). It does so by means of compiler intrinsics built into a separate native library and linked by Panama's FFI. Comparisons are performed on off-heap mmap'ed vector data. The implementation is currently only used during merging of scalar quantized segments, through a custom format ES814HnswScalarQuantizedVectorsFormat, but its usage will likely be expanded over time. Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co> Co-authored-by: Mark Vieira <portugee@gmail.com> Co-authored-by: Ryan Ernst <ryan@iernst.net>	2024-04-12 08:44:21 +01:00
Tim Vernum	36d5282907	Allow additional JSON log fields via SPI (#106980 ) This adds a new SPI based `LoggingDataProvider` service that can be implemented in order to add new fields to the main JSON log	2024-04-10 22:14:00 -04:00
Adrien Grand	49ffa045a6	Cut over stored fields to ZSTD for compression. (#103374 ) This cuts over stored fields with `index.codec: best_speed` (default) to ZSTD with level 0 and blocks of at most 128 documents or 14kB, and `index.codec: best_compression` to ZSTD with level 3 and blocks of at most 2,048 documents or 240kB. Compared with the current codecs, this would yield similar indexing speed, much better space efficiency and similar retrieval speed. Benchmarks on the `elastic/logs` track suggest 10% better storage efficiency and slightly faster ingestion. The Lucene codec infrastructure records the codec on a per-segment basis and ensures that this change is backward-compatible. Segments will get progressively migrated to ZSTD as they get merged in the background. Bindings for ZSTD are provided by the Panama FFI API on JDK21+ and JNA on older JDKs. ZSTD support is currently behind a feature flag, so it won't be enabled immediately when this feature gets merged, this will need a follow-up change. Co-authored-by: Mark Vieira <portugee@gmail.com> Co-authored-by: Ryan Ernst <ryan@iernst.net>	2024-04-09 09:18:58 +02:00
Jack Conradson	68b0acac8f	Add retrievers using the parser-only approach (#105470 ) This enhancement adds a new abstraction to the _search API called "retriever." A retriever is something that returns top hits. This adds three initial retrievers called "standard", "knn", and "rrf". The retrievers use a parser-only approach where they are parsed and then translated into a SearchSourceBuilder to execute the actual search. --------- Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2024-03-12 10:11:55 -07:00
Andrei Dan	882b92ab60	Add service for computing the optimal number of shards for data streams (#105498 ) This adds the `DataStreamAutoShardingService` that will compute the optimal number of shards for a data stream and return a recommendation as to when to apply it (a time interval we call cool down which is 0 when the auto sharding recommendation can be applied immediately). This also introduces a `DataStreamAutoShardingEvent` object that will be stored in the data stream metadata to indicate the last auto sharding event that was applied to a data stream and its cluster state representation looks like so: ``` "auto_sharding": { "trigger_index_name": ".ds-logs-nginx-2024.02.12-000002", "target_number_of_shards": 3, "event_timestamp": 1707739707954 } ``` The auto sharding service is not used in this PR, so the auto sharding event will not be stored in the data stream metadata, but the required infrastructure to configure it is in place.	2024-03-06 05:12:08 -05:00
Ryan Ernst	6375e9f443	Add native access library (#105100 ) Elasticsearch requires access to some native functions. Historically this has been achieved with the JNA library. However, JNA is a complicated, magical library, and has caused various problems booting Elasticsearch over the years. The new Java Foreign Function and Memory API allows access to call native functions directly from Java. It also has the advantage of tight integration with hotspot which can improve performance of these functions (though performance of Elasticsearch's native calls has never been much of an issue since they are mostly at boot time). This commit adds a new native lib that is internal to Elasticsearch. It is built to use the foreign function api starting with Java 21, and continue using JNA with Java versions below that. Only one function, checking whether Elasticsearch is running as root, is migrated. Future changes will migrate other native functions.	2024-02-07 18:27:09 -05:00
Niels Bauman	64891011d3	Extend `repository_integrity` health indicator for unknown and invalid repos (#104614 ) This PR extends the repository integrity health indicator to cover also unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report the changes in their health regarding the unknown or invalid status. To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks. Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.	2024-02-07 15:18:55 +01:00
Craig Taverner	a58b2c2b05	Move doc-values classes needed by ST_INTERSECTS to server (#104980 ) * Move doc-values classes needed by ST_INTERSECTS to server These classes are needed by ESQL spatial queries, and are not licensed in a way that prevents this move. Since they depend on lucene it is not possible to move them to a library. Instead they are moved to be co-located with the GeoPoint doc-values classes that already exist in server. * Moved to lucene package org.elasticsearch.lucene.spatial * Moved Geo/ShapeDocValuesQuery to server because it is Lucene specific And this gives us access to these classes from ESQL for lucene-pushdown of spatial queries.	2024-02-07 15:00:38 +01:00
Benjamin Trent	43362d5de5	Add new int8_flat and flat vector index types (#104872 ) This adds two new vector index types: - flat - int8_flat Both store the vectors in a flat space and search is brute-force over the vectors in the index. For the regular `flat` index, this can be considered syntactic sugar that allows `knn` queries without having to put indices within HNSW. For `int8_flat`, this allows float vectors to be stored in a flat manner, but also automatically quantized.	2024-02-05 12:56:13 -05:00
Daniel Mitterdorfer	6e15229f6e	Make counted terms agg visible to profiling (#105049 ) The counted-terms aggregation is defined in its own plugin. When other plugins (such as the profiling plugin) want to use this aggregation, this leads to class loader issues, such as that the aggregation class is not recognized. By moving just the aggregation code itself to the server module but keeping everything else (including registration) in the `mapper-counted-keyword` module, we can use the counted-terms aggregation also from other plugins.	2024-02-02 15:56:07 +01:00
David Roberts	4e91d690e5	Export random sampler agg from server (#104747 ) The server module exports the classes needed to use most aggregations, but the random sampler aggregation was missed. (I think it's because the PRs to add random sampler and to add modularization in general were both long-running and were in flight around the same time.) This PR adds an export for the random sampler agg, so that it can be used from plugins.	2024-01-25 13:01:35 +00:00
Ignacio Vera	4e7a0dae19	Introduce Elasticsearch PostingFormat based on Lucene 90 posting format using PFOR (#103601 ) Lucene 9.9 has introduced a new posting format that uses FOR instead of PFOR. Elasticsearch prefers the previous PFOR-based format, therefore we introduce it as our own posting format here.	2023-12-20 15:09:24 +01:00
Mary Gouseti	9e3d0dbaf8	[Health API] Abstract data tier diagnoses as node roles (#102466 ) We generalise the code that is diagnosing the shard availability when it comes to data tier issues. We make it more extensible, so in serverless we can introduce new roles. For this reason, we consider a tier as a more specific kind of a role. Then we expose some methods and some diagnosis definitions in the ShardsAvailabilityHealthIndicatorService so they can be extended.	2023-11-22 17:20:10 +02:00
Simon Cooper	4c98fd9c5c	Add a historical feature for transport version fixups (#102211 ) Make sure logging is configured in the historical versions task Co-authored-by: Mark Vieira <portugee@gmail.com>	2023-11-16 10:02:12 +00:00
Simon Cooper	0c18798d59	Add feature for index mapping auto-put (#101668 )	2023-11-13 13:43:56 +00:00
Lee Hinman	4952f986ce	Modularize shard availability service (#101796 ) * Modularize shard availability service This commit moves the `ShardsAvailabilityHealthIndicatorService` to a package and modularizes it with exports so that Serverless can make use of it as a superclass. Relates to #101394	2023-11-03 15:59:09 -06:00
Simon Cooper	e851b303d0	Migrate desirednode processors version checks to features (#101706 )	2023-11-03 13:57:47 +00:00
Simon Cooper	580283025e	Unify naming of feature spec implementations (#101704 ) Use a naming scheme of <area>Features	2023-11-03 09:24:50 +00:00
Simon Cooper	1bb1c7be04	Create a historical feature for the get settings rest action (#101684 )	2023-11-02 08:51:39 +00:00
Simon Cooper	f6a211225a	Add historical feature for cluster health checks (#101538 )	2023-10-31 10:06:14 +00:00
Simon Cooper	bfad5e5b13	Create new feature API for querying features present on a cluster (#100974 ) This adds an internal API and service to manage & get information on features that are present on nodes in a cluster. New features can be declared as supported, and historical features can be added to previous node versions to eventually replace node version comparisons	2023-10-30 14:38:30 +00:00
Stuart Tettemer	f8d09e9c6c	APM Metering API (#99832 ) Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types: * Gauge - a single number that can go up or down * Histogram - bucketed samples * Counter - monotonically increasing summed value * UpDownCounter - summed value that may decrease Supports both Long* and Double* versions of the instruments. Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider. The metering provider starts as the open telemetry noop provider. `telemetry.metrics.enabled` turns on metering.	2023-09-28 19:35:46 -05:00
David Kyle	096cf81670	[ML] Make Inference Services pluggable (#99886 ) Creates an InferenceServicePlugins interface for inference services to implement and adds a test implementation to mock an inference service.	2023-09-27 13:35:45 +01:00
Przemyslaw Gomulka	0efa67821d	Rename TracerPlugin to TelemetryPlugin (#99735 ) with the support of metrics the TracerPlugin name is no longer adequate. Renaming this to TelemetryPlugin. Also introducing TelemetryProvider interface. While it is only used in Node.java at the moment to fetch Tracer instance, it is intended to be used in Plugin::createComponents (to be done in separate commit due to the broad scope of this method) This will allow for plugins to get access to both Tracer and Metric interfaces without the need to add yet another argument to createComponents Also adding internal subpackage in module/apm so that it is more obvious which packages are not exported	2023-09-22 13:35:36 +02:00
Simon Cooper	3f32affbb6	Add component info versions to node info in a pluggable way (#99631 ) This adds a `ComponentVersionNumber` service interface for modules to provide version numbers for individual components to be reported inside node info. Initial implementations for `MlConfigVersion` and `TransformConfigVersion` are provided.	2023-09-21 17:08:43 +01:00
Przemyslaw Gomulka	b6747b48ba	Rename tracing to telemetry package (#99710 ) This commit renames the tracing to telemetry.tracing in both xpack/APM and elasticsearch's org.elasticsearch.tracing.Tracer (the api) the xpack/APM is renamed as follows: org.elasticsearch.telemetry.apm - the only exported package org.elasticsearch.telemetry.apm.settings - APMSettings org.elasticsearch.telemetry.apm.tracing - APMTracer org.elasticsearch.tracing.Tracer is moved to org.elasticsearch.telemetry.tracing.Tracer (responsible for majority of the changes in this PR)	2023-09-20 16:58:02 +02:00
Ryan Ernst	a5d07ee51f	Make transport and index versions easier to extend (#99688 ) When overriding transport and index versions, it is difficult to decide whether a version constant in serverless should be returned, or the latest version constant from serverless should be used, since the latest from serverless is not available. This commit adjusts the version extension methods to pass in the latest serverless version constants. It also tweaks module and method visibility for a helper method needed.	2023-09-20 06:15:04 -07:00
Ryan Ernst	b2df3313fc	Make cat actions list extensible (#99504 ) This commit adds an internal extension for controlling which cat actions are returned by /_cat.	2023-09-13 20:48:09 -07:00
William Brafford	d32902cf45	Wrap transport version in cluster state (#99114 ) Cluster state currently holds a cluster minimum transport version and a map of nodes to transport versions. However, to determine node compatibility, we will need to account for more types of versions in cluster state than just the transport version (see #99076). Here we introduce a wrapper class to cluster state and update accessors and builders to use the new method. (I would have liked to re-use org.elasticsearch.cluster.node.VersionInformation, but that one holds IndexVersion rather than TransportVersion.) * Introduce CompatibilityVersions to cluster state class	2023-09-06 09:52:42 -04:00
Ryan Ernst	47c1d99ae0	Add settings registration for Java modules through SPI (#98857 ) Currently plugins register settings through Plugin.getSettings. For easier breakdown of the codebase, it would be nice to allow arbitrary Java modules to register settings. This commit adds an internal SettingsExtension SPI which acts just like Plugin.getSettings but from a purely static context.	2023-08-25 07:00:49 -07:00
Tim Vernum	3093c40b8b	Make RestController pluggable (#98187 ) This commit changes the ActionModules to allow the RestController to be provided by an internal plugin. It renames `RestInterceptorActionPlugin` to `RestServerActionPlugin` and adds a new `getRestController` method to it. There may be multiple RestServerActionPlugins installed on a node, but only 1 may provide a Rest Wrapper (getRestHandlerInterceptor) and only 1 may provide a RestController (getRestController).	2023-08-08 01:29:24 -04:00
Przemyslaw Gomulka	0aed016215	Fix qualified export for serverless metering (#98106 ) the module name of serverless metering is org.elasticsearch.metering previously an incorrect name co.elastic.metering was used	2023-08-01 17:33:29 +02:00
Przemyslaw Gomulka	999489ce04	Infrastructure to report upon document parsing (#97961 ) In serverless we would like to report (meter and bill) upon a document ingestion. The metering should be agnostic to a document format (document structure should be normalised), hence we should allow creating XContentParsers which will keep track of parsed fields and values. There are 2 places where the parsing of the ingested document happens: 1. upon the 'raw bulk' when a request is sent without the pipelines 2. upon the 'ingest service' when a request is sent with pipelines (parsing can occur twice when dynamic mappings are calculated; this PR takes this into account and prevents double billing) We also want to make sure that the metering logic is not unnecessarily executed when a document was already reported. That is, if a document was reported in IngestService, there is no point wrapping the XContentParser again. This commit introduces a `DocumentReporterPlugin`, an internal plugin that will be implemented in serverless. This plugin should return a `DocumentParsingObserver` supplier which will create a `DocumentParsingObserver`. A DocumentParsingObserver is used to wrap an `XContentParser` with an implementation that keeps track of parsed fields and values (performs a metering) and allows sending that information along with an index name to a MeteringReporter.	2023-08-01 13:55:18 +02:00
Ryan Ernst	cc1904add6	Expose build flavor again in nodes info (#98021 ) The nodes info returns some information from the build. However, the flavor is still hardcoded to default, even though flavor was added back to Build. This commit exposes the build flavor again in the nodes info response. It also fixes the build extension to be accessible in serverless.	2023-07-28 06:08:23 -07:00
Ryan Ernst	57d5fbd639	Make build info pluggable internally (#97768 ) This commit makes the Build.current() pluggable. This is only available for internal builds.	2023-07-19 06:02:57 -07:00
Mary Gouseti	a432313ff3	Data stream lifecycle class names (#97381 )	2023-07-05 12:28:32 +03:00
Mary Gouseti	f87c2c7758	Introduce downsampling configuration for data stream lifecycle (#97041 ) This PR introduces downsampling configuration to the data stream lifecycle. Keep in mind downsampling implementation will come in a follow up PR. Configuration looks like this: ``` { "lifecycle": { "data_retention": "90d", "downsampling": [ { "after": "1d", "fixed_interval": "2h" }, { "after": "15d", "fixed_interval": "1d" }, { "after": "30d", "fixed_interval": "1w" } ] } } ``` We will also support using `null` to unset downsampling configuration during template composition: ``` { "lifecycle": { "data_retention": "90d", "downsampling": null } } ```	2023-06-29 16:41:17 +03:00
Przemyslaw Gomulka	31e20d9239	Revert "Add JUL bridge (#96683 )" (#96832 ) This reverts commit `2bdf1bc0d6`.	2023-06-14 14:37:53 +02:00
Ryan Ernst	2bdf1bc0d6	Add JUL bridge (#96683 ) This commit adds the Log4j JUL bridge so that messages using JUL are more nicely converted to log4j messages. Currently these messages are captured via the stdout logging stream. This commit also adds a log4j filter to replace the logging stream filtering mechanism used to quiet some Lucene log messages that may be confusing to users. closes #94613	2023-06-13 19:31:05 -04:00
Kostas Krikellas	67211be81d	Fork TDigest library (#96086 ) * Initial import for TDigest forking. * Fix MedianTest. More work needed for TDigestPercentileTests and the TDigestTest (and the rest of the tests) in the tdigest lib to pass. Fix Dist. * Fix AVLTreeDigest.quantile to match Dist for uniform centroids. * Update docs/changelog/96086.yaml * Fix `MergingDigest.quantile` to match `Dist` on uniform distribution. * Add merging to TDigestState.hashCode and .equals. Remove wrong asserts from tests and MergingDigest. * Fix style violations for tdigest library. * Fix typo. * Fix more style violations. * Fix more style violations. * Fix remaining style violations in tdigest library. * Update results in docs based on the forked tdigest. * Fix YAML tests in aggs module. * Fix YAML tests in x-pack/plugin. * Skip failing V7 compat tests in modules/aggregations. * Fix TDigest library unittests. Remove redundant serializing interfaces from the library. * Remove YAML test versions for older releases. These tests don't address compatibility issues in mixed cluster tests as the latter contain a mix of older and newer nodes, so the output depends on which node is picked as a data node since the forked TDigest library is not backwards compatible (produces slightly different results). * Fix test failures in docs and mixed cluster. * Reduce buffer sizes in MergingDigest to avoid oom. * Exclude more failing V7 compatibility tests. * Update results for JdbcCsvSpecIT tests. * Update results for JdbcDocCsvSpecIT tests. * Revert unrelated change. * More test fixes. * Use version skips instead of blacklisting in mixed cluster tests. * Switch TDigestState back to AVLTreeDigest. * Update docs and tests with AVLTreeDigest output. * Update flaky test. * Remove dead code, esp around tracking of incoming data. * Update docs/changelog/96086.yaml * Delete docs/changelog/96086.yaml * Remove explicit compression calls. 
This was added to prevent concurrency tests from failing, but it leads to reduced precision. Submit this to see if the concurrency tests are still failing. * Revert "Remove explicit compression calls." This reverts commit `5352c96f65`. * Remove explicit compression calls to MedianAbsoluteDeviation input. * Add unittests for AVL and merging digest accuracy. * Fix spotless violations. * Delete redundant tests and benchmarks. * Fix spotless violation. * Use the old implementation of AVLTreeDigest. The latest library version is 50% slower and less accurate, as verified by ComparisonTests. * Update docs with latest percentile results. * Update docs with latest percentile results. * Remove repeated compression calls. * Update more percentile results. * Use approximate percentile values in integration tests. This helps with mixed cluster tests, where some of the tests were blocked. * Fix expected percentile value in test. * Revert in-place node updates in AVL tree. Update quantile calculations between centroids and min/max values to match v.3.2. * Add SortingDigest and HybridDigest. The SortingDigest tracks all samples in an ArrayList that gets sorted for quantile calculations. This approach provides perfectly accurate results and is the most efficient implementation for up to millions of samples, at the cost of bloated memory footprint. The HybridDigest uses a SortingDigest for small sample populations, then switches to a MergingDigest. This approach combines the best performance and results for small sample counts with very good performance and acceptable accuracy for effectively unbounded sample counts. * Remove deps to the 3.2 library. * Remove unused licenses for tdigest. * Revert changes for SortingDigest and HybridDigest. These will be submitted in a follow-up PR for enabling MergingDigest. * Remove unused Histogram classes and unit tests. Delete dead and commented out code, make the remaining tests run reasonably fast. Remove unused annotations, esp. 
SuppressWarnings. * Remove Comparison class, not used. * Small fixes. * Add javadoc and tests. * Remove special logic for singletons in the boundaries. While this helps with the case where the digest contains only singletons (perfect accuracy), it has a major problem (non-monotonic quantile function) when the first singleton is followed by a non-singleton centroid. It's preferable to revert to the old version from 3.2; inaccuracies in a singleton-only digest should be mitigated by using a sorted array for small sample counts. * Revert changes to expected values in tests. This is due to restoring quantile functions to match head. * Revert changes to expected values in tests. This is due to restoring quantile functions to match head. * Tentatively restore percentile rank expected results. * Use cdf version from 3.2 Update Dist.cdf to use interpolation, use the same cdf version in AVLTreeDigest and MergingDigest. * Revert "Tentatively restore percentile rank expected results." This reverts commit `7718dbba59`. * Revert remaining changes compared to main. * Revert excluded V7 compat tests. * Exclude V7 compat tests still failing. * Exclude V7 compat tests still failing. * Restore bySize function in TDigest and subclasses.	2023-06-13 11:43:54 +03:00
Ryan Ernst	f086ef1990	Make current Version pluggable for serverless (#96539 ) Version.CURRENT is statically loaded as a constant early during startup. Yet serverless needs to be able to override the current Version so it can add additional versions. This commit makes Version.CURRENT pluggable via SPI. Note that the only way for this to be plugged in is via an additional jar on the boot layer.	2023-06-06 11:32:01 -04:00
Carlos Delgado	39b7b5eb56	Synonym Mgmnt API: PUT request (#95895 )	2023-05-31 10:48:56 +02:00
Athena Brown	d423f40037	Add mechanism to react to termination signals (#95850 ) This commit adds a mechanism to be exposed internally to allow plugins/modules to react to termination signals via SPI. I'm not entirely sure how to test this - normally I'd use an `ESIntegTestCase` and supply a stubbed version of the plugin but that's more difficult with SPI. Supersedes https://github.com/elastic/elasticsearch/pull/95518	2023-05-17 17:18:37 -04:00
Przemyslaw Gomulka	dc03c47ada	Refactor RestMainAction into separate module (#95881 ) We want to allow overriding the info (GET /) API in serverless, therefore this commit moves the RestMainAction and its transport classes into a module that has a rest plugin. The main endpoint is often used in testing to verify that a cluster is ready, hence this commit also has to add a testing dependency on main to a lot of modules. Relates #95422	2023-05-10 14:39:00 +02:00
William Brafford	a8f6205084	Add ReloadingPlugin type (#95743 ) * Add a ReloadAwarePlugin interface with a qualified export	2023-05-02 21:27:49 -04:00

88 commits
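
The downsampling configuration quoted in commit f87c2c7758 (#97041) lists rounds, each with an `after` age and a `fixed_interval`. The Python sketch below is an illustration only and is not Elasticsearch code: `parse_duration` and `active_round` are hypothetical helpers, the `w` unit is accepted only because the quoted example uses `1w`, and the rule that the latest round already reached is the one in effect is an assumption that the commit message does not state.

```python
import re

# Hypothetical helper table: seconds per unit suffix. The "w" suffix is
# included only because the example configuration in the commit uses "1w".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text):
    """Convert a string such as "90d" or "2h" to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def active_round(lifecycle, age_seconds):
    """Return the downsampling round in effect for data of the given age.

    Assumption (not stated in the commit message): the round with the
    largest "after" value that the data has already passed applies.
    Returns None when downsampling is unset (null) or no round is reached.
    """
    rounds = lifecycle.get("downsampling") or []
    reached = [r for r in rounds if parse_duration(r["after"]) <= age_seconds]
    if not reached:
        return None
    return max(reached, key=lambda r: parse_duration(r["after"]))


# The lifecycle object quoted in the commit message, without its outer wrapper.
lifecycle = {
    "data_retention": "90d",
    "downsampling": [
        {"after": "1d", "fixed_interval": "2h"},
        {"after": "15d", "fixed_interval": "1d"},
        {"after": "30d", "fixed_interval": "1w"},
    ],
}
```

Under these assumptions, `active_round(lifecycle, parse_duration("20d"))` selects the `15d` round, and a lifecycle whose `downsampling` is `null` (Python `None`) yields no round at all.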