This change will add logic to the put rollup api that fails if no rollup job is active and no rollup index exists in the cluster.
The logic first checks whether there is an active rollup persistent task. If there are no active rollup persistent tasks, it then checks whether any rollup index exists. The latter check is expensive, but since it only runs as part of the put rollup job api and only when there are no rollup jobs, this should be ok.
All tests that invoke the put rollup job api will need to be adjusted to create a dummy index that has rollup mapping metadata. Otherwise, tests can't create a rollup job.
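A minimal sketch of the two-step check described above, written in Python for illustration only. The function names and the `_meta._rollup` mapping layout are assumptions, not the actual implementation.

```python
def has_rollup_metadata(mapping):
    # Assumed layout: rollup metadata lives under mapping["_meta"]["_rollup"].
    return "_rollup" in mapping.get("_meta", {})


def can_put_rollup_job(active_rollup_tasks, index_mappings):
    """Return True if creating a rollup job should be allowed."""
    # Cheap check first: an active rollup persistent task means rollup is in use.
    if active_rollup_tasks:
        return True
    # Expensive fallback: scan every index mapping for rollup metadata.
    return any(has_rollup_metadata(m) for m in index_mappings.values())
```

This also shows why tests need a dummy index with rollup mapping metadata: with no tasks and no such index, the check returns False.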
Closes #108381
Adding support for MDX files in our :docs project. We parse those *.mdx files
like we do for asciidoc files for code snippets and generate yaml specs from them that
we test as part of our integration tests.
By default:
When searching for doc sources in the docs folder we fail the build if we detect multiple files of
the same name but different extension. E.g. having painless-field-context.mdx
and painless-field-context.asciidoc in the same source folder will fail the build.
Migration Mode:
To allow easier migration from asciidoc to mdx the build supports a kind of migration mode.
When running the build with -Dgradle.docs.migration=true (e.g. ./gradlew buildRestTests -Dgradle.docs.migration=true):
- Duplicate doc source files (asciidoc and mdx) are allowed
- The generated yaml rest specs for duplicates will have the extension *.mdx.yml or *asciidoc.yml
- The generated yaml rest specs for duplicates are compared to each other to ensure they produce the same yml output
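The duplicate detection can be sketched roughly as follows. This is an illustrative Python sketch, not the Gradle build code; `check_doc_sources` and its `migration_mode` parameter are hypothetical names.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def check_doc_sources(paths, migration_mode=False):
    """Group doc sources by path without extension and report duplicates."""
    by_stem = defaultdict(list)
    for p in map(PurePosixPath, paths):
        if p.suffix in (".asciidoc", ".mdx"):
            by_stem[p.with_suffix("")].append(p)
    duplicates = {
        str(stem): sorted(str(f) for f in files)
        for stem, files in by_stem.items()
        if len(files) > 1
    }
    # Default mode: the same name with different extensions fails the build.
    if duplicates and not migration_mode:
        raise ValueError(f"duplicate doc sources: {duplicates}")
    # Migration mode: duplicates are allowed and returned for later comparison.
    return duplicates
```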
**Problem:**
For historical reasons, source files for the Elasticsearch Guide's security, watcher, and Logstash API docs are housed in the `x-pack/docs` directory. This can confuse new contributors who expect Elasticsearch Guide docs to be located in `docs/reference`.
**Solution:**
- Move the security, watcher, and Logstash API doc source files to the `docs/reference` directory
- Update doc snippet tests to use security
Rel: https://github.com/elastic/platform-docs-team/issues/208
This makes the data stream lifecycle generally available. This will allow
data streams to take advantage of a native simplified and resilient
lifecycle implementation.
With this PR we introduce CRUD endpoints which update/delete the data lifecycle on the data stream level. When the lifecycle is updated, it will apply at the next DLM run to all the backing indices that are managed by DLM.
This PR adds a new remote_clusters section to the xpack usage response
to report stats of remote cluster connections including total number,
mode and security model.
It also adds a new remote_cluster_server sub-section under the existing
security section.
Relates: #94817
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
* Revert "Revert "[DOCS] Add TSDS docs (#86905)" (#87702)"
This reverts commit 0c86d7b9b2.
* First fix to tests
* Add data_stream object to index template
* small rewording
* Add enable data stream object in gradle example setup
* Add bullet about data stream must be enabled in template
* [DOCS] Add TSDB docs
* Update docs/build.gradle
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Address Nik's comments, part 1
* Address Nik's comments, part deux
* Reword write index
* Add feature flags
* Wrap one more section in feature flag
* Small fixes
* set index.routing_path to optional
* Update storage reduction value
* Update create index template code example
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Synthetic source is the first thing we've documented behind the tsdb
feature flag. This adds the feature flag to the docs sub-project for the
release build so the tests will pass.
Closes #87592
Remove usage of deprecated elasticsearch.rest-test in DocsTestPlugin.
We keep some files in src/test in docs projects, as moving them would require more changes
in the build-docs project outside this repository.
When SnippetsTask looks for doc snippets, the list of files it
checks includes roughly 350 files that aren't asciidoc files. Image
files (both png and jpg), yaml files, and so on. Set an explicit
include pattern so that Gradle skips these files instead of trying
to read them.
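The effect of an explicit include pattern can be illustrated like this. It is a Python sketch of the filtering idea only; the pattern value `*.asciidoc` is an assumption and the real change is made in the Gradle task configuration.

```python
from fnmatch import fnmatchcase

# Assumed include pattern; the actual pattern is set on the Gradle task.
INCLUDE_PATTERN = "*.asciidoc"


def snippet_candidates(files):
    """Keep only files matching the include pattern, skipping images, yaml, etc."""
    return [f for f in files if fnmatchcase(f, INCLUDE_PATTERN)]
```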
* Adjusted integration tests to use geoip test fixture or to use test databases provided via config dirs (for qa module / docs).
* Kept the geolite2-databases dependency for most of the unit tests only.
* Made fallback_to_default_databases parameter on geoip processor a noop and emit deprecation warning upon using it.
* If no geoip databases are available yet to a node then the geoip processor factory returns a processor implementation that flags documents to indicate that databases are unavailable. This allows these documents to be reindexed later with a pipeline. These documents will have a tag string array field, which contains a string _geoip_database_unavailable_{database_name} for each missing database in a pipeline.
* Added reload pipeline capabilities in IngestService, so that when databases are available again on a node then pipelines with geoip processor definition can be reloaded.
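The tagging behaviour for missing databases might look roughly like the following Python sketch. The field name `tags` and the function name are assumptions for illustration; only the `_geoip_database_unavailable_{database_name}` format comes from the description above.

```python
def flag_unavailable_databases(doc, pipeline_databases, available_databases):
    """Add one tag per database the pipeline needs but the node does not have."""
    missing = [db for db in pipeline_databases if db not in available_databases]
    if missing:
        tags = doc.setdefault("tags", [])  # assumed name of the tag array field
        for db in missing:
            tag = f"_geoip_database_unavailable_{db}"
            if tag not in tags:
                tags.append(tag)
    return doc
```

Documents carrying such a tag can later be found and reindexed through the pipeline once the databases are available.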
Relates to #68920
This commit adds a new multi-bucket aggregation: `categorize_text`
The aggregation follows a similar design to significant text in that it reads from `_source`
and re-analyzes the text as it is read.
Key difference is that it does not use the indexed field's analyzer, but instead relies on
the `ml_standard` tokenizer with specialized ML token filters. The tokenizer + filters are the
same that machine learning categorization anomaly jobs utilize.
The high level logical flow is as follows:
- at each shard, read in the text field with a custom analyzer using `ml_standard` tokenizer
- Read in the particular tokens from the analyzer
- Feed these tokens to a token tree algorithm (an adaptation of the drain categorization algorithm)
- Gather the individual log categories (the leaf nodes), sort them by doc_count, ship those buckets to be merged
- Merge all buckets that have the EXACT same key
- Once all buckets are merged, pass those keys + counts to a new token tree for additional merging
- That tree builds the final buckets and that is returned to the user
Algorithm explanation:
- Each log is parsed with the ml-standard tokenizer
- each token is passed into a token tree
- For `max_match_token` each token is stored in the tree and at `max_match_token+1` (or `len(tokens)`) a log group is created
- If another log group exists at that leaf, merge it if they have `similarity_threshold` percentage of tokens in common
- merging simply replaces tokens that are different in the group with `*`
- If a layer in the tree has `max_unique_tokens` we add a `*` child and any new tokens are passed through there. The catch is that on the final merge, we first attempt to merge together subtrees with the smallest number of documents, especially if the new subtree has more documents counted.
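The merge step of the algorithm above can be sketched in Python. This is a deliberately simplified illustration: it skips the prefix tree, `max_match_token` and `max_unique_tokens`, only compares groups with the same token count, and all function names are hypothetical.

```python
WILDCARD = "*"


def similarity_percent(group, tokens):
    """Percentage of token positions at which the group and the new log agree."""
    if not group or len(group) != len(tokens):
        return 0.0
    same = sum(1 for g, t in zip(group, tokens) if g == t)
    return 100.0 * same / len(group)


def merge_tokens(group, tokens):
    """Replace tokens that differ between the two with the wildcard."""
    return [g if g == t else WILDCARD for g, t in zip(group, tokens)]


def add_log(groups, tokens, similarity_threshold=50):
    """Merge tokens into the first sufficiently similar group, else start a new one."""
    for i, group in enumerate(groups):
        if similarity_percent(group, tokens) >= similarity_threshold:
            groups[i] = merge_tokens(group, tokens)
            return groups
    groups.append(list(tokens))
    return groups
```

For example, adding `GET /a HTTP/1.1 200` and then `GET /b HTTP/1.1 200` leaves a single group whose second token is `*`.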
## Aggregation configuration.
Here is an example on some openstack logs
```js
POST openstack/_search?size=0
{
  "aggs": {
    "categories": {
      "categorize_text": {
        "field": "message", // The field to categorize
        "similarity_threshold": 20, // merge log groups if they are this similar
        "max_unique_tokens": 20, // Max Number of children per token position
        "max_match_token": 4, // Maximum tokens to build prefix trees
        "size": 1
      }
    }
  }
}
```
This will return buckets like
```json
"aggregations" : {
  "categories" : {
    "buckets" : [
      {
        "doc_count" : 806,
        "key" : "nova-api.log.1.2017-05-16_13 INFO nova.osapi_compute.wsgi.server * HTTP/1.1 status len time"
      }
    ]
  }
}
```
* Flip node shutdown feature flag to default to true on snapshot builds
It previously defaulted to false. The setting can still only be set to 'true' on a
non-release (snapshot) build of Elasticsearch.
Relates to #70338
* Handle case where operator privileges are enabled
This converts the system property feature flag 'es.shutdown_feature_flag_enabled' to a regular
non-dynamic node setting. This setting can only be set to 'true' on a snapshot build of
Elasticsearch (not a release build).
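A rough Python sketch of how such a setting could be resolved and validated. The function name is hypothetical, and defaulting to false on release builds is an assumption not stated above.

```python
def resolve_shutdown_flag(configured, is_snapshot_build):
    """Resolve the node shutdown feature flag for this build."""
    if configured is None:
        # Defaults to true on snapshot builds (assumed false on release builds).
        return is_snapshot_build
    if configured and not is_snapshot_build:
        # The setting may only be 'true' on a non-release (snapshot) build.
        raise ValueError("node shutdown can only be enabled on a snapshot build")
    return configured
```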
Relates to #70338
Enroll node API can be used by new nodes in order to join an
existing cluster that has security features enabled. The response
of a call to this API contains all the necessary information that
the new node requires in order to configure itself and bootstrap
trust with the existing cluster.
This commit adds a new pipeline aggregation that allows correlation within the aggregation framework on bucketed values.
The initial function is `count_correlation`. Its purpose is to correlate the count in a consistent number of buckets with a pre-calculated indicator. The indicator and the aggregated buckets should relate to the same metrics within documents.
Example for correlating terms within `service.version.keyword` with latency percentiles. The percentiles and the provided correlation indicator both refer to the same source data, where the indicator was previously calculated:
```
GET apm-7.12.0-transaction-generated/_search
{
  "size": 0,
  "aggs": {
    "field_terms": {
      "terms": {
        "field": "service.version.keyword",
        "size": 20
      },
      "aggs": {
        "latency_range": {
          "range": {
            "field": "transaction.duration.us",
            "ranges": [<snip>],
            "keyed": true
          }
        },
        "correlation": {
          "bucket_correlation": {
            "buckets_path": "latency_range>_count",
            "count_correlation": {
              "indicator": {
                "expectations": [<snip>],
                "doc_count": 20000
              }
            }
          }
        }
      }
    }
  }
}
```
Related to #71593 we move all build logic that is for elasticsearch build only into
the org.elasticsearch.gradle.internal* packages
This makes it clearer if build logic is considered to be used by external projects
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third party plugin authors.
This is a very first step towards that direction.
Changes:
* Refactors the "Getting Started" content down to one page.
* Refactors the README to reduce duplicated content and better mirror
Kibana's.
* Focuses the quick start on time series data, including data streams
and runtime fields.
* Streamlines self-managed install instructions to Docker.
Co-authored-by: debadair <debadair@elastic.co>
This PR adds documentation for GeoIPv2 auto-update feature.
It also renames the related settings from geoip.downloader.* to ingest.geoip.downloader.* to follow the same convention as the current setting.
Relates to #68920
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it can also lead to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
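The condition for emitting the warning could be sketched as below. This is a Python illustration only: the function names, the `Warning` header name and the placeholder message are assumptions, while `xpack.security.enabled` is the setting that would need to be set explicitly.

```python
# Hypothetical placeholder; the real warning text is not given here.
WARNING_TEXT = "security features are implicitly disabled"


def security_implicitly_disabled(license_type, settings):
    """True when security is off only because it was never configured."""
    explicitly_set = "xpack.security.enabled" in settings
    return license_type in ("basic", "trial") and not explicitly_set


def with_security_warning(headers, license_type, settings):
    """Return a copy of the response headers, adding a warning when needed."""
    result = dict(headers)
    if security_implicitly_disabled(license_type, settings):
        result["Warning"] = WARNING_TEXT  # header name is an assumption
    return result
```

Explicitly setting `xpack.security.enabled` to either value suppresses the warning in this sketch, since the choice is then no longer implicit.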
- Update gradle wrapper to gradle 7.0
- Remove deprecated usages to make build 7.0 compatible
- Fix excludes in docs snippet tasks (See https://github.com/gradle/gradle/issues/16160 for details)
- Fix deprecation warnings in 7.0
- Add explicit dependencies that have been missed
- Make extract native licenses tasks output dir more explicit
- Use a snapshot of the ospackage plugin that includes a fix for 7.0 already
- fix test runtime classpath setup in repository-hdfs
- Make task dependency explicit to fix further deprecation warnings
- Remove manual check for http repo usages that has been deprecated in gradle 7.0
- Update spock to latest 2.0 milestone required for groovy 3
This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because they don't need this sort of `instanceof` tree:
a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)
Relates to #69291
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This adds named `teardown` support for doc tests, similar to the existing
support for named `setup` sections. This is useful when many doc files want
to share a similar `setup` AND `teardown`. I've introduced an example of
this in the CCR docs just to prove it works. We expect we'll use it for
data streams as well.
Closes #70830
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the
`es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing.
Currently these APIs do not do anything, returning immediately. We plan to implement them for real
in subsequent work.
Relates to #70338
* Fixing Painless tests.
* Update runtime field context to fix test cases.
* Remove watcher logging from usage API and replace test.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
* Adds datetime as a date, which is necessary in setup.
* Updating field context example.
* Fixing sample data, updating context example, and updating runtime example.
* Updating field context and changing runtime field to use seats data.
* Update filter context to use the seats data.
* Updating min-should-match context to use seats data.
* Replacing last mentions of TEST[skip].
* Update usage with watcher response for build error.
* Updating usage API again for watcher.
* Third time's a charm for fixing test cases.
* Adding specific test replacement for watcher logging total.
* Change actors to keyword based on review feedback.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Initial updates to the seats data.
* Enhance seats test in gradle.build.
* Updating bulk ingest example to use test data.
* Updating examples and context example intro.