elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-25 07:37:19 -04:00

Author	SHA1	Message	Date
Benjamin Trent	b90b3450a2	[ML] [Transforms] prefer secondary auth headers for transforms (#86757 ) When creating and updating transforms, it is possible for clients to provide secondary headers. When PUT, _preview, or _update is called with secondary authorization headers, those headers are then used or stored with the transform. closes: https://github.com/elastic/elasticsearch/issues/86731	2022-05-16 10:13:21 -04:00
Przemysław Witek	70e37ae7c6	[Transform] Support `range` aggregation in transform (#86501 )	2022-05-16 15:21:00 +02:00
Benjamin Trent	88a5da9560	[ML] [Transforms] fix transform _start permissions to use stored headers in the config (#86802 ) Previously, the _start API caller required the same roles as the create API caller. This does not make sense, because when the transform is actually running (after _start) we rely solely on the roles of the caller who created the transform. Consequently, this commit does the permission validations and various checks with the roles of the user who created the transform, not the one calling _start	2022-05-16 09:10:01 -04:00
Sohail Mirza	9117f0e42a	Docs: Remove extraneous backtick (#86750 )	2022-05-16 10:49:22 +02:00
István Zoltán Szabó	95ef40656f	[DOCS] Adds more details to the frequent items agg documentation (#86661 ) Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>	2022-05-16 10:24:14 +02:00
Tim Vernum	6e32fed6e5	[DOCS] Fix name of OIDC JWT sig algorithm setting (#86561 ) The `client_auth_jwt_signature_algorithm` was incorrectly documented.	2022-05-12 12:09:01 -04:00
Adam Locke	7db1c807f2	Fix a linebreak (#86739 ) (#86742 ) (cherry picked from commit `5ee3bbaa79`) Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>	2022-05-12 11:04:57 -04:00
Adam Locke	53b55a711b	Rectified the "Add lifecycle policy" hyperlink. (#86717 ) (#86735 ) "Add" was outside the hyperlink, which I have fixed. Previously, line 71 read: * Add <<set-up-lifecycle-policy,lifecycle policy>> After the fix, line 71 reads: * <<set-up-lifecycle-policy,Add lifecycle policy>> (cherry picked from commit `3b8d51c696`) Co-authored-by: Tapomoy Bhowmik <99604828+TapomoyBhowmik@users.noreply.github.com>	2022-05-12 10:22:40 -04:00
vincetrumental	05b7664272	correct way of getting node heap size (#85045 ) * correct way of getting node heap size In [[shard-count-recommendation]], we explain that the number of shards should be at most 20 shards per GB of heap, but the command to get the relevant heap size should be _cat/nodes?v=true&h=heap.max and not _cat/nodes?v=true&h=heap.current. The latter gives the current memory consumption, which is always changing. Here we need to consider the max allocated heap size (-Xmx). * Adds heap.max to valid columns Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-05-11 09:59:34 -04:00
David Turner	6f0cee0fae	Add master_timeout support to voting config exclusions APIs (#86670 ) Today the add/clear voting config exclusions APIs route a request to the master node but do not expose the usual `?master_timeout` parameter, which allows changing the timeout for this phase of execution. This commit adds the missing parameter.	2022-05-11 13:56:50 +01:00
Rene Groeschke	62d5aa986c	Port gradle docs test plugin to use internal yaml rest test plugin (#86598 ) Remove usage of deprecated elasticsearch.rest-test in DocsTestPlugin. We keep some files in src/test in docs projects, as moving them would require more changes in the build-docs project outside this repository.	2022-05-11 12:01:23 +02:00
Luca Belluccini	1c52081b1f	[DOC] Air gapped environments and GEOIP (#85637 ) * [DOC] Air gapped environments and GEOIP Closing https://github.com/elastic/elasticsearch/issues/85542 * Use variable name for Elasticsearch Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-05-10 16:34:28 -04:00
Antonio Bonuccelli	b556b4b5d0	[Docs] Document apiKey usage in remote reindex (#85209 ) * document cloud_id usage * actually no cloud id used * [source,console] * suggested change * Mark example as NOTCONSOLE * Add tests * Add comma * Fix comma (for real this time) Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-05-10 14:31:42 -04:00
Lisa Cawley	a9c8c12814	[DOCS] Removes infer trained model deployment API (#86497 )	2022-05-10 09:56:36 -07:00
Stef Nestor	5aa174e6d3	[DOC] auto migrate only for default template (#82043 ) * [DOC] auto migrate only for default template @DanRoscigno I'm not replicating [this](https://www.elastic.co/guide/en/elasticsearch/reference/7.16/migrate-index-allocation-filters.html#on-prem-migrate-to-node-roles) in ECE, is it only for the default template? * Update docs/reference/data-management/migrate-index-allocation-filters.asciidoc Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>	2022-05-10 11:35:19 -04:00
Adam Locke	4bf3f0beca	[DOCS] Fix CA cert directory for client connections (#86613 )	2022-05-10 11:10:17 -04:00
Dimitris Athanasiou	68c51f3ada	[ML] Rename threading params in _start trained model deployment API (#86597 ) When starting a trained model deployment the user can tweak performance by setting the `model_threads` and `inference_threads` parameters. These parameters are hard to understand and cause confusion. This commit renames these as well as the fields where their values are reported in the stats API. - `model_threads` => `number_of_allocations` - `inference_threads` => `threads_per_allocation` Now the terminology is as follows. A model deployment starts with a requested `number_of_allocations`. Each allocation means the model gets another thread for executing parallel inference requests. Thus, more allocations should increase throughput. In turn, each allocation may use a number of threads to parallelize each individual inference request. This is the `threads_per_allocation` setting, and it increases inference speed (which might also result in improved throughput).	2022-05-10 17:41:00 +03:00
Nik Everett	a589456b81	Synthetic source (#85649 )

This attempts to shrink the index by implementing a "synthetic _source" field. You configure it in the mapping:

```
{
  "mappings": {
    "_source": {
      "synthetic": true
    }
  }
}
```

And we just stop storing the `_source` field - kind of. When you go to access the `_source` we regenerate it on the fly by loading doc values. Doc values don't preserve the original structure of the source you sent so we have to make some educated guesses. And we have a rule: the source we generate would result in the same index if you sent it back to us. That way you can use it for things like `_reindex`.

Fetching the `_source` from doc values does slow down loading somewhat. See numbers further down.

## Supported fields

This only works for the following fields:
* `boolean`
* `byte`
* `date`
* `double`
* `float`
* `geo_point` (with precision loss)
* `half_float`
* `integer`
* `ip`
* `keyword`
* `long`
* `scaled_float`
* `short`
* `text` (when there is a `keyword` sub-field that is compatible with this feature)

## Educated guesses

The synthetic source generator makes `_source` fields that are:
* sorted alphabetically
* as "objecty" as possible
* pushes all arrays to the "leaf" fields
* sorts most array values
* removes duplicate text and keyword values

These are mostly artifacts of how doc values are stored.

### sorted alphabetically

```
{ "b": 1, "c": 2, "a": 3 }
```
becomes
```
{ "a": 3, "b": 1, "c": 2 }
```

### as "objecty" as possible

```
{ "a.b": "foo" }
```
becomes
```
{ "a": { "b": "foo" } }
```

### pushes all arrays to the "leaf" fields

```
{ "a": [ { "b": "foo", "c": "bar" }, { "c": "bort" }, { "b": "snort" } ] }
```
becomes
```
{ "a": { "b": ["foo", "snort"], "c": ["bar", "bort"] } }
```

### sorts most array values

```
{ "a": [2, 3, 1] }
```
becomes
```
{ "a": [1, 2, 3] }
```

### removes duplicate text and keyword values

```
{ "a": ["bar", "baz", "baz", "baz", "foo", "foo"] }
```
becomes
```
{ "a": ["bar", "baz", "foo"] }
```

## `_recovery_source`

Elasticsearch's shard "recovery" process needs `_source` sometimes. So does cross cluster replication. If you disable source or filter it somehow we store a `_recovery_source` field for as long as the recovery process might need it. When everything is running smoothly that's generally a few seconds or minutes. Then the field is removed on merge.

This synthetic source feature continues to produce `_recovery_source` and relies on it for recovery. It's possible to synthesize `_source` during recovery but we don't do it.

That means that synthetic source doesn't speed up writing the index. But in the future we might be able to turn this on to trade writing less data at index time for slower recovery and cross cluster replication. That's an area of future improvement.

## perf numbers

I loaded the entire tsdb data set with this change and the size:

```
             standard -> synthetic
store size    31.0 GB -> 7.0 GB  (77.5% reduction)
_source    24695.7 MB -> 47.6 MB (99.8% reduction - synthetic is in _recovery_source)
```

A second _forcemerge a few minutes after rally finishes should remove the remaining 47.6MB of _recovery_source.

With this, fetching source for 1,000 documents seems to take about 500ms. I spot checked a lot of different areas and haven't seen any different hit. I expect this performance impact is based on the number of doc values fields in the index and how sparse they are.	2022-05-10 07:46:58 -04:00
Andrei Dan	21785c9a77	How-to docs for increasing the total number of shards per node (#86214 ) Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>	2022-05-10 09:13:27 +01:00
Keith Massey	6d975a6419	Health API explain query param (#86410 ) The health API has a notion of details within each health indicator that is returned. These details can sometimes be expensive to compute or transfer. This change allows a user to specify whether the details are generated and returned. By default now all details are generated and returned (previously this was only the case if a component was specified in the request). This behavior can be changed with the explain query param. Closes #86215	2022-05-09 08:46:02 -05:00
István Zoltán Szabó	15ea957df6	[DOCS] Expands transform setup page with info on spaces. (#86479 )	2022-05-09 11:14:24 +02:00
Przemyslaw Gomulka	a1d2981fc1	[doc] Explicitly mention about node shutdown remove for cluster shrink (#86173 )	2022-05-09 10:24:54 +02:00
David Turner	c4532504be	Small additions to the register-repo docs (#86122 ) 1. Adds a note that you can restore older snapshots (to recover from a failed upgrade) even after newer snapshots were taken. 2. Copies the note about incompatible S3 repo implementations to the top level to avoid misunderstandings.	2022-05-09 07:37:20 +01:00
Joe Gallo	6aaf0972a3	Make the ILM and SLM history_index_enabled settings dynamic (#86493 )	2022-05-06 13:07:54 -04:00
Tanguy Leroux	6491ae0dae	Add note that searchable snapshots indices cannot be snapshotted into source-only repositories (#86208 ) Relates #86207	2022-05-06 11:33:50 +02:00
Lisa Cawley	89a3e18e10	[DOCS] Add preview admonition to infer API (#86486 )	2022-05-05 13:49:02 -07:00
Benjamin Trent	a907f0bb6f	[ML] add new trained_models/{model_id}/_infer endpoint for all supervised models and deprecate deployment infer api (#86361 ) This commit adds a new `_ml/trained_models/{model_id}/_infer` API. This API works for both native NLP models and supervised models trained via Data Frame analytics. The format of the API is the same as the old `_ml/trained_models/{model_id}/deployment/_infer`, taking a `docs` and an `inference_config` parameter. This PR also deprecates the old experimental `_ml/trained_models/{model_id}/deployment/_infer` API. The biggest difference is that the response now nests all results under an "inference_results" object. closes: https://github.com/elastic/elasticsearch/issues/86032	2022-05-05 14:58:59 -04:00
Nicole Albee	2af0126949	[doc] Update to include API for checking JVM pointers (#86360 ) * [doc] Add information for how to find if compressed ordinary object pointers is in use using the REST APIs. * Update docs/reference/setup/advanced-configuration.asciidoc Co-authored-by: Nikola Grcevski <6207777+grcevski@users.noreply.github.com> Co-authored-by: Nikola Grcevski <6207777+grcevski@users.noreply.github.com>	2022-05-05 13:44:38 -05:00
István Zoltán Szabó	e590e900a4	[DOCS] Adds frequent items agg docs (#86037 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2022-05-05 16:07:24 +02:00
István Zoltán Szabó	0c0c97070d	[DOCS] Adds limitation on deprecated painless scripts and edits limitation regarding spaces in Transform docs (#86397 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2022-05-05 13:31:16 +02:00
Benjamin Trent	237e345d71	[ML][Docs] fix minimum buckets for change_point agg (#86396 )	2022-05-04 09:37:46 -04:00
Adam Locke	6ebe562c5b	[DOCS] Fix config directory for installation (#86390 )	2022-05-03 09:06:25 -04:00
Francisco Fernández Castaño	ce9819fa6c	Keep track of desired nodes cluster membership (#84165 ) This commit adds tracking for desired nodes cluster membership. When desired nodes are updated they are matched against the current cluster members. Additionally when a node joins the cluster the desired nodes cluster membership is updated.	2022-05-03 14:06:48 +02:00
Yang Wang	5f82235180	[Docs] Fix url for feature migration APIs (#86330 ) Both GET and POST version of the API have a leading underscore.	2022-05-03 10:05:02 +10:00
Tim Vernum	4d7a516dac	Correct docs on DLS bitset cache default values (#86282 ) In #50535 (ES v7.6) the default values for the `DocumentSubsetBitsetCache` settings were changed. However, the docs were not updated at that time, and still reflect the old values for these settings	2022-05-01 22:40:03 -04:00
Armin Braun	b323e8e1db	Add parameter to exclude indices in a snapshot from response (#86269 ) Adds a parameter `index_names` to the get snapshots API so that users may exclude the potentially very long index name lists when listing out snapshots. closes #82937	2022-04-29 15:04:43 +02:00
Benjamin Trent	c49b92e425	Allow bucket paths to specify _count within a bucket (#85720 ) Users should be able to specify specific metrics/keys within a specific bucket key. An example is `agg["bucket_foo"]._count`. This change now allows that. closes: https://github.com/elastic/elasticsearch/issues/76320	2022-04-29 08:42:46 -04:00
Justin Cranford	d4c1c2efbd	Add missing settings hmac_jwkset and hmac_key for JWT realm to security-settings.asciidoc (#86085 )	2022-04-28 16:09:53 -04:00
Craig Taverner	68f432275d	Added documentation on GeoJSON format for points and geo-points (#86066 ) * Added documentation on GeoJSON format for points and geo-points. * Fixed some small mistakes in painless geo-point	2022-04-28 10:41:07 +02:00
debadair	3d4dea8336	[DOCS] Remove unnecessary link. (#86205 )	2022-04-27 15:44:43 -07:00
Adam Locke	336965201b	[DOCS] Update quickstart curl commands (#86145 ) * [DOCS] Update quickstart curl commands * Change Kibana container name	2022-04-27 08:43:00 -04:00
Salvatore Campagna	42e8bd847b	Update 8.2.0 BC4 docs (#86084 ) (#86213 )	2022-04-27 14:09:40 +02:00
Lisa Cawley	dd885e28fa	[DOCS] Add transform limitation for underscore field names (#86195 )	2022-04-26 12:44:14 -07:00
David Turner	79f181d208	Reduce resource needs of join validation (#85380 ) Fixes a few scalability issues around join validation: - compresses the cluster state sent over the wire - shares the serialized cluster state across multiple nodes - forks the decompression/deserialization work off the transport thread Relates #77466 Closes #83204	2022-04-26 12:15:54 +01:00
Rune Antonsen	ce4c00f898	fix(docker-compose.yml): check correct filepath (#85602 ) Add `config/` to the path in the `if`-checks in `docker-compose` documentation.	2022-04-26 11:44:22 +01:00
David Turner	e70dc48220	Permit removal of archived index settings (#86107 ) Today you cannot remove archived index settings by applying a setting update `{"archived.*":null}` because `IndexSettings#same` incorrectly treats such an update as a no-op. This commit fixes that.	2022-04-26 08:27:16 +01:00
James Garside	fca3487395	Updated format parameter description to reference Java decimal format (#86163 )	2022-04-25 20:52:44 +01:00
Gabi Davar	43ab984639	Add documentation for "io_time_in_millis" (#84911 ) Add documentation for "io_time_in_millis" Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-04-25 16:43:19 +01:00
Bogdan Pintea	d8e6cc2096	SQL: Allow partial results in SQL queries (#85897 ) This adds support for partial results to SQL. The lenient mode is controlled by a new query parameter, `allow_partial_search_results`, false by default. On shard failures, the errors are added as Warning headers to the response. Only a first set of failures is sent to the client; the last header reports the number of remaining suppressed ones.	2022-04-21 18:12:50 +02:00
Justin Cranford	94b45585a1	[DOCS] Add documentation for JWT realm (#85189 )	2022-04-21 11:23:12 -04:00
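
The "educated guesses" listed in commit a589456b81 (Synthetic source) can be illustrated with a small sketch. This is not Elasticsearch code: `synthesize` and its helpers are hypothetical names, and the function only mimics the five rules from the commit message on plain Python dictionaries. It assumes all values under one field share a type and that no field is both a leaf and an object.

```python
def synthesize(source):
    """Hypothetical sketch of the rules in the commit message, not the real generator."""
    leaves = {}

    def collect(path, value):
        # Flatten objects and arrays so every scalar ends up under its dotted leaf path.
        if isinstance(value, dict):
            for key, child in value.items():
                collect(f"{path}.{key}" if path else key, child)
        elif isinstance(value, list):
            for item in value:
                collect(path, item)
        else:
            leaves.setdefault(path, []).append(value)

    collect("", source)

    result = {}
    for path, values in leaves.items():
        if all(isinstance(v, str) for v in values):
            # Text and keyword values: sorted and de-duplicated.
            values = sorted(set(values))
        else:
            # Other values: sorted, duplicates kept.
            values = sorted(values)
        # Rebuild nested objects from the dotted path ("as objecty as possible").
        *parents, leaf = path.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = values[0] if len(values) == 1 else values

    def sort_keys(node):
        # Emit object keys in alphabetical order at every level.
        if isinstance(node, dict):
            return {key: sort_keys(node[key]) for key in sorted(node)}
        return node

    return sort_keys(result)


if __name__ == "__main__":
    print(synthesize({"a": [{"b": "foo", "c": "bar"}, {"c": "bort"}, {"b": "snort"}]}))
```

Running the examples from the commit message through this sketch gives the shapes shown there, for instance arrays of objects collapse into per-field arrays at the leaves.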

9702 commits