**Problem:**
For historical reasons, source files for the Elasticsearch Guide's security, watcher, and Logstash API docs are housed in the `x-pack/docs` directory. This can confuse new contributors who expect Elasticsearch Guide docs to be located in `docs/reference`.
**Solution:**
- Move the security, watcher, and Logstash API doc source files to the `docs/reference` directory
- Update doc snippet tests to use security
Relates: https://github.com/elastic/platform-docs-team/issues/208
This makes the data stream lifecycle generally available, allowing
data streams to take advantage of a native, simplified, and resilient
lifecycle implementation.
With this PR we introduce CRUD endpoints that update/delete the data lifecycle at the data stream level. When the lifecycle is updated, it will apply at the next DLM run to all the backing indices that are managed by DLM.
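For illustration, a minimal sketch of the new lifecycle CRUD calls (the data stream name `my-data-stream` and the `data_retention` value are assumptions for the example):
```js
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "7d"
}

GET _data_stream/my-data-stream/_lifecycle

DELETE _data_stream/my-data-stream/_lifecycle
```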
This PR adds a new remote_clusters section to the xpack usage response
to report stats of remote cluster connections including total number,
mode and security model.
It also adds a new remote_cluster_server sub-section under the existing
security section.
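A hedged sketch of what the new section could look like in a `GET _xpack/usage` response (field names and values are illustrative, inferred from the description above):
```json
"remote_clusters": {
  "size": 3,
  "mode": {
    "proxy": 1,
    "sniff": 2
  },
  "security": {
    "cert": 2,
    "api_key": 1
  }
}
```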
Relates: #94817
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
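For example, once bundled, a pipeline can use the processor directly without installing anything (a minimal sketch following the attachment processor docs; the pipeline name and source field are assumptions):
```js
PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    }
  ]
}
```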
* Revert "Revert "[DOCS] Add TSDS docs (#86905)" (#87702)"
This reverts commit 0c86d7b9b2.
* First fix to tests
* Add data_stream object to index template
* small rewording
* Add enable data stream object in gradle example setup
* Add bullet about data stream must be enabled in template
* [DOCS] Add TSDB docs
* Update docs/build.gradle
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Address Nik's comments, part 1
* Address Nik's comments, part deux
* Reword write index
* Add feature flags
* Wrap one more section in feature flag
* Small fixes
* set index.routing_path to optional
* Update storage reduction value
* Update create index template code example
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
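As a companion to these docs changes, a hedged sketch of a time series index template with the data stream object enabled (names and mappings are illustrative; `index.routing_path` is optional, as noted above):
```js
PUT _index_template/my-weather-sensor-template
{
  "index_patterns": ["metrics-weather_sensors-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["sensor_id", "location"]
    },
    "mappings": {
      "properties": {
        "sensor_id": { "type": "keyword", "time_series_dimension": true },
        "location": { "type": "keyword", "time_series_dimension": true },
        "temperature": { "type": "half_float", "time_series_metric": "gauge" },
        "@timestamp": { "type": "date" }
      }
    }
  }
}
```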
Synthetic source is the first thing we've documented behind the tsdb
feature flag. This adds the feature flag to the docs sub-project for the
release build so the tests will pass.
Closes #87592
Remove usage of the deprecated elasticsearch.rest-test plugin in DocsTestPlugin.
We keep some files in src/test in docs projects, as moving them would require more changes
in the build-docs project outside this repository.
When SnippetsTask looks for doc snippets, the list of files it
checks includes roughly 350 files that aren't asciidoc files: image
files (both png and jpg), yaml files, and so on. Set an explicit
include pattern so that Gradle skips these files instead of trying
to read them.
* Adjusted integration tests to use the geoip test fixture or test databases provided via config dirs (for the qa module / docs).
* Kept the geolite2-databases dependency for most of the unit tests only.
* Made the fallback_to_default_databases parameter on the geoip processor a no-op that emits a deprecation warning when used.
* If no geoip databases are available yet on a node, the geoip processor factory returns a processor implementation that flags documents to indicate the databases are unavailable. This allows these documents to be reindexed later with a pipeline. These documents will have a tags string array field, which contains a string _geoip_database_unavailable_{database_name} for each missing database in a pipeline (see the sketch after this list).
* Added pipeline reload capabilities to IngestService, so that when databases become available again on a node, pipelines with a geoip processor definition can be reloaded.
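A sketch of how such a flagged document might look (the database file name is illustrative):
```json
{
  "_source": {
    "ip": "89.160.20.128",
    "tags": ["_geoip_database_unavailable_GeoLite2-City.mmdb"]
  }
}
```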
Relates to #68920
This commit adds a new multi-bucket aggregation: `categorize_text`.
The aggregation follows a similar design to significant text in that it reads from `_source`
and re-analyzes the text as it is read.
The key difference is that it does not use the indexed field's analyzer, but instead relies on
the `ml_standard` tokenizer with specialized ML token filters. The tokenizer + filters are the
same ones that machine learning categorization anomaly jobs utilize.
The high level logical flow is as follows:
- at each shard, read in the text field with a custom analyzer using `ml_standard` tokenizer
- Read in the particular tokens from the analyzer
- Feed these tokens to a token tree algorithm (an adaptation of the drain categorization algorithm)
- Gather the individual log categories (the leaf nodes), sort them by doc_count, ship those buckets to be merged
- Merge all buckets that have the EXACT same key
- Once all buckets are merged, pass those keys + counts to a new token tree for additional merging
- That tree builds the final buckets and that is returned to the user
Algorithm explanation:
- Each log is parsed with the ml-standard tokenizer
- each token is passed into a token tree
- For the first `max_match_token` tokens, each token is stored in the tree; at `max_match_token+1` (or at `len(tokens)` if the log is shorter) a log group is created
- If another log group exists at that leaf, merge it if they have `similarity_threshold` percentage of tokens in common
- merging simply replaces tokens that differ within the group with `*`
- If a layer in the tree reaches `max_unique_tokens`, we add a `*` child and any new tokens are passed through there. The catch here is that on the final merge, we first attempt to merge together the subtrees with the smallest number of documents, especially if the new subtree has a higher document count.
## Aggregation configuration.
Here is an example on some OpenStack logs:
```js
POST openstack/_search?size=0
{
  "aggs": {
    "categories": {
      "categorize_text": {
        "field": "message",            // the field to categorize
        "similarity_threshold": 20,    // merge log groups if they are this similar
        "max_unique_tokens": 20,       // max number of children per token position
        "max_match_token": 4,          // maximum tokens used to build prefix trees
        "size": 1
      }
    }
  }
}
```
This will return buckets like
```json
"aggregations" : {
"categories" : {
"buckets" : [
{
"doc_count" : 806,
"key" : "nova-api.log.1.2017-05-16_13 INFO nova.osapi_compute.wsgi.server * HTTP/1.1 status len time"
}
]
}
}
```
* Flip node shutdown feature flag to default to true on snapshot builds
It previously defaulted to false. The setting can still only be set to 'true' on a
non-release (snapshot) build of Elasticsearch.
Relates to #70338
* Handle case where operator privileges are enabled
This converts the system property feature flag 'es.shutdown_feature_flag_enabled' to a regular
non-dynamic node setting. This setting can only be set to 'true' on a snapshot build of
Elasticsearch (not a release build).
Relates to #70338
The enroll node API can be used by new nodes to join an
existing cluster that has security features enabled. The response
of a call to this API contains all the necessary information that
the new node requires in order to configure itself and bootstrap
trust with the existing cluster.
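For reference, the request itself is a simple GET (per the security enrollment API; the response, which carries the keys, certificates, and addresses mentioned above, is omitted here):
```js
GET /_security/enroll/node
```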
This commit adds a new pipeline aggregation that allows correlation of bucketed values within the aggregation framework.
The initial function is a `count_correlation` function. Its purpose is to correlate the count in a consistent number of buckets with a pre-calculated indicator. The indicator and the aggregated buckets should relate to the same metrics within the documents.
Example for correlating terms within `service.version.keyword` with latency percentiles. The percentiles and the provided correlation indicator both refer to the same source data where the indicator was previously calculated:
```
GET apm-7.12.0-transaction-generated/_search
{
  "size": 0,
  "aggs": {
    "field_terms": {
      "terms": {
        "field": "service.version.keyword",
        "size": 20
      },
      "aggs": {
        "latency_range": {
          "range": {
            "field": "transaction.duration.us",
            "ranges": [<snip>],
            "keyed": true
          }
        },
        "correlation": {
          "bucket_correlation": {
            "buckets_path": "latency_range>_count",
            "count_correlation": {
              "indicator": {
                "expectations": [<snip>],
                "doc_count": 20000
              }
            }
          }
        }
      }
    }
  }
}
```
Related to #71593, we move all build logic that is for the Elasticsearch build only into
the org.elasticsearch.gradle.internal* packages.
This makes it clearer whether build logic is considered to be used by external projects.
Ultimately we want to expose only TestCluster and PluginBuildPlugin logic
to third-party plugin authors.
This is a very first step in that direction.
Changes:
* Refactors the "Getting Started" content down to one page.
* Refactors the README to reduce duplicated content and better mirror
Kibana's.
* Focuses the quick start on time series data, including data streams
and runtime fields.
* Streamlines self-managed install instructions to Docker.
Co-authored-by: debadair <debadair@elastic.co>
This PR adds documentation for the GeoIPv2 auto-update feature.
It also renames the related settings from geoip.downloader.* to ingest.geoip.downloader.* to follow the same convention as the current settings.
Relates to #68920
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
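For example, with the renamed namespace, the downloader would be toggled via a cluster setting along these lines (a sketch; `ingest.geoip.downloader.enabled` is the setting name this renaming implies):
```js
PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": true
  }
}
```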
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it can also lead to unintentionally
insecure clusters.
This change introduces clear warnings when security features are
implicitly disabled:
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
- Update gradle wrapper to gradle 7.0
- Remove deprecated usages to make build 7.0 compatible
- Fix excludes in docs snippet tasks (See https://github.com/gradle/gradle/issues/16160 for details)
- Fix deprecation warnings in 7.0
- Add explicit dependencies that have been missed
- Make extract native licenses tasks output dir more explicit
- Use a snapshot of the ospackage plugin that includes a fix for 7.0 already
- fix test runtime classpath setup in repository-hdfs
- Make task dependency explicit to fix further deprecation warnings
- Remove manual check for http repo usages that has been deprecated in gradle 7.0
- Update spock to latest 2.0 milestone required for groovy 3
This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because they don't need this sort of `instanceof` tree:
a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)
Relates to #69291
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
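A minimal sketch of the pattern the updated docs favor, using a search-time runtime field in place of an aggregation `script` (the index, field, and script body are assumptions for the example):
```js
GET my-index/_search?size=0
{
  "runtime_mappings": {
    "message.length": {
      "type": "long",
      "script": "emit(doc['message.keyword'].value.length())"
    }
  },
  "aggs": {
    "message_lengths": {
      "histogram": {
        "field": "message.length",
        "interval": 10
      }
    }
  }
}
```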
This adds named `teardown` support for doc tests, similar to the existing
support for named `setup` sections. This is useful when many doc files want to
share a similar `setup` AND `teardown`. I've introduced an example of
this in the CCR docs just to prove it works. We expect we'll use it for
data streams as well.
Closes #70830
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the
`es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing.
Currently these APIs do not do anything, returning immediately. We plan to implement them for real
in subsequent work.
Relates to #70338
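A hedged sketch of the endpoint shapes (the node ID, body fields, and values are illustrative; per the text above, the APIs are stubs at this stage):
```js
PUT _nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown
{
  "type": "restart",
  "reason": "Demonstrating how the node shutdown API works"
}

GET _nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown

DELETE _nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown
```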
* Fixing Painless tests.
* Update runtime field context to fix test cases.
* Remove watcher logging from usage API and replace test.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
* Adds datetime as a date, which is necessary in setup.
* Updating field context example.
* Fixing sample data, updating context example, and updating runtime example.
* Updating field context and changing runtime field to use seats data.
* Update filter context to use the seats data.
* Updating min-should-match context to use seats data.
* Replacing last mentions of TEST[skip].
* Update usage with watcher response for build error.
* Updating usage API again for watcher.
* Third time's a charm for fixing test cases.
* Adding specific test replacement for watcher logging total.
* Change actors to keyword based on review feedback.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Initial updates to the seats data.
* Enhance seats test in gradle.build.
* Updating bulk ingest example to use test data.
* Updating examples and context example intro.
We were depending on BouncyCastle FIPS's own mechanics to set
itself in approved-only mode, since we run with the Security
Manager enabled. However, the check during startup seems to happen
before we set our restrictive SecurityManager in
org.elasticsearch.bootstrap.Elasticsearch, which means that
BCFIPS would not be in approved-only mode unless explicitly
configured so.
This commit sets the appropriate JVM property to explicitly set
BCFIPS in approved-only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved-only mode when we expect to.
It also sets xpack.security.fips_mode.enabled to true for all test clusters
used in fips mode and sets the distribution to the default one. It adds a
password to the elasticsearch keystore for all test clusters that run in fips
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140 approved
algorithms instead for better coverage.
It also addresses a number of tests that would fail in approved-only mode.
Mainly:
Tests that use PBKDF2 with a password shorter than 112 bits (14 chars). We
elected to change the passwords used everywhere to be at least 14
characters long, instead of mandating the use of pbkdf2_stretch, because
both pbkdf2 and pbkdf2_stretch are supported and allowed in FIPS mode
and it makes sense to test with both. We could possibly figure out the
password algorithm used for each test and adjust password length
accordingly only for pbkdf2, but there is little value in that. It's good
practice to use strong passwords, so if our docs and tests use longer
passwords, then it's for the best. The approach is brittle, as there is no
guarantee that the next test to be added won't use a short password, so
we add some testing documentation too.
This leaves us with a possible coverage gap, since we support passwords
as short as 6 characters but only test with more than 14 chars; the
validation itself was not tested even before. Tests can be added in a
follow-up, outside of the FIPS-related context.
Tests that use a PKCS12 keystore and were not already muted.
Tests that depend on running test clusters with a basic license or
using the OSS distribution, as FIPS 140 support is not available in
either of these.
Finally, it adds some information around FIPS 140 testing to our testing
documentation reference, so that developers can hopefully keep
FIPS 140-related intricacies in mind when writing or changing docs.
Removed the autoscaling feature flags; autoscaling is now on by default
(though it requires an external system to handle the autoscaling
events). Added an experimental notice to all autoscaling-related
documentation pages.
Relates #51191
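Now that the flags are gone, a policy can be created directly (a minimal sketch following the autoscaling API docs; the policy name, role, and decider are assumptions for the example):
```js
PUT /_autoscaling/policy/my_autoscaling_policy
{
  "roles": ["data_hot"],
  "deciders": {
    "fixed": {}
  }
}
```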