[role="xpack"]
|
|
[[transform-limitations]]
|
|
= {transform-cap} limitations
|
|
[subs="attributes"]
|
|
++++
|
|
<titleabbrev>Limitations</titleabbrev>
|
|
++++
|
|
|
|
The following limitations and known problems apply to the {version} release of
the Elastic {transform} feature. The limitations are grouped into the following
categories:

* <<transform-config-limitations>> apply to the configuration process of the
{transforms}.
* <<transform-operational-limitations>> affect the behavior of the {transforms}
that are running.
* <<transform-ui-limitations>> only apply to {transforms} managed via the user
interface.

[discrete]
[[transform-config-limitations]]
== Configuration limitations

[discrete]
[[transforms-ccs-limitation]]
=== {transforms-cap} support {ccs} if the remote cluster is configured properly

If you use <<modules-cross-cluster-search,{ccs}>>, the remote cluster must
support the search and aggregations you use in your {transforms}.
{transforms-cap} validate their configuration; if you use {ccs} and the
validation fails, make sure that the remote cluster supports the query and
aggregations you use.

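For example, a {transform} can read from a remote cluster by prefixing the
source index with the cluster alias. The cluster alias, indices, and field
names below are hypothetical:

[source,console]
----
PUT _transform/remote-cluster-transform
{
  "source": {
    "index": "remote_cluster:my-source-index"
  },
  "dest": {
    "index": "my-dest-index"
  },
  "pivot": {
    "group_by": {
      "user.id": { "terms": { "field": "user.id" } }
    },
    "aggregations": {
      "event.count": { "value_count": { "field": "event.id" } }
    }
  }
}
----
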
[discrete]
[[transform-painless-limitation]]
=== Using scripts in {transforms}

{transforms-cap} support scripting wherever aggregations support it. However,
there are certain factors you might want to consider when using scripts in
{transforms}:

* {transforms-cap} cannot deduce index mappings for output fields when the
fields are created by a script. In this case, you might want to create the
mappings of the destination index yourself prior to creating the {transform},
as shown in the sketch after this list.
* Scripted fields may increase the runtime of the {transform}.
* {transforms-cap} cannot optimize queries when you use scripts for all the
groupings defined in `group_by`. You will receive a warning message when you
use scripts this way.

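For example, a minimal sketch of creating the destination index with explicit
mappings before the {transform} is created; the index and field names are
hypothetical:

[source,console]
----
PUT my-transform-dest-index
{
  "mappings": {
    "properties": {
      "customer_id": { "type": "keyword" },
      "scripted_total": { "type": "double" }
    }
  }
}
----
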
[discrete]
[[transform-runtime-field-limitation]]
=== {transforms-cap} perform better on indexed fields

{transforms-cap} sort data by a user-defined time field, which is frequently
accessed. If the time field is a {ref}/runtime.html[runtime field], the
performance impact of calculating field values at query time can significantly
slow the {transform}. Use an indexed field as a time field when using
{transforms}.

[discrete]
[[transform-scheduling-limitations]]
=== {ctransform-cap} scheduling limitations

A {ctransform} periodically checks for changes to source data. The functionality
of the scheduler is currently limited to a basic periodic timer whose
`frequency` can be set between 1s and 1h. The default is 1m. This is designed
to run little and often. When choosing a `frequency` for this timer, consider
your ingest rate along with the impact that the {transform} search/index
operations have on other users in your cluster. Also note that retries occur at
the `frequency` interval.

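For example, a sketch of a {ctransform} that checks for changes every five
minutes; the transform ID, index, and field names are hypothetical:

[source,console]
----
PUT _transform/ecommerce-customer-totals
{
  "source": { "index": "ecommerce-orders" },
  "dest": { "index": "ecommerce-customer-totals" },
  "frequency": "5m",
  "sync": {
    "time": { "field": "order_date", "delay": "60s" }
  },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "total_quantity": { "sum": { "field": "total_quantity" } }
    }
  }
}
----
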
[discrete]
[[transform-operational-limitations]]
== Operational limitations

[discrete]
[[transform-rolling-upgrade-limitation]]
=== {transforms-cap} reassignment suspended during a rolling upgrade from 7.2 and 7.3

If your cluster contains mixed version nodes, for example during a rolling
upgrade from 7.2 or 7.3 to a newer version, {transforms} whose nodes are stopped
will not be reassigned until the upgrade is complete. After the upgrade is done,
{transforms} resume automatically; no action is required.

[discrete]
[[transform-aggresponse-limitations]]
=== Aggregation responses may be incompatible with destination index mappings

When a {transform} is first started, it will deduce the mappings required for
the destination index. This process is based on the field types of the source
index and the aggregations used. If the fields are derived from
<<search-aggregations-metrics-scripted-metric-aggregation,`scripted_metrics`>>
or <<search-aggregations-pipeline-bucket-script-aggregation,`bucket_scripts`>>,
<<dynamic-mapping,dynamic mappings>> will be used. In some instances the
deduced mappings may be incompatible with the actual data. For example, numeric
overflows might occur or dynamically mapped fields might contain both numbers
and strings. Please check {es} logs if you think this may have occurred.

You can view the deduced mappings by using the
https://www.elastic.co/guide/en/elasticsearch/reference/current/preview-transform.html[Preview transform API].
See the `generated_dest_index` object in the API response.

If required, you can define custom mappings prior to starting the {transform}
by creating a custom destination index using the
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html[Create index API].
As deduced mappings cannot be overwritten by an index template, use the Create
index API to define custom mappings. Index templates only apply to fields that
are derived from scripts and therefore use dynamic mappings.

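For example, a minimal preview request; the source index, field, and
aggregation names are hypothetical. The `generated_dest_index` object in the
response contains the deduced mappings:

[source,console]
----
POST _transform/_preview
{
  "source": { "index": "ecommerce-orders" },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "max_price": { "max": { "field": "taxful_total_price" } }
    }
  }
}
----
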
[discrete]
[[transform-batch-limitations]]
=== Batch {transforms} may not account for changed documents

A batch {transform} uses a
<<search-aggregations-bucket-composite-aggregation,composite aggregation>>
which allows efficient pagination through all buckets. Composite aggregations
do not yet support a search context; therefore, if the source data is changed
(deleted, updated, added) while the batch {transform} is in progress, the
results may not include these changes.

[discrete]
[[transform-consistency-limitations]]
=== {ctransform-cap} consistency does not account for deleted or updated documents

While the process for {transforms} allows the continual recalculation of the
{transform} as new data is being ingested, it also has some limitations.

Changed entities will only be identified if their time field has also been
updated and falls within the time range that is checked for changes. This has
been designed in principle for, and is suited to, the use case where new data is
given a timestamp for the time of ingest.

If the indices that fall within the scope of the source index pattern are
removed, for example when deleting historical time-based indices, then the
composite aggregation performed in consecutive checkpoint processing will search
over different source data, and entities that only existed in the deleted index
will not be removed from the {transform} destination index.

Depending on your use case, you may wish to recreate the {transform} entirely
after deletions. Alternatively, if your use case is tolerant to historical
archiving, you may wish to include a max ingest timestamp in your aggregation,
as shown in the sketch below. This allows you to exclude results that have not
been recently updated when viewing the destination index.

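A minimal sketch of such an aggregation, assuming your documents carry an
ingest timestamp in a hypothetical `event.ingested` field:

[source,js]
----
"aggregations": {
  "last_updated": {
    "max": { "field": "event.ingested" }
  }
}
----
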
[discrete]
[[transform-deletion-limitations]]
=== Deleting a {transform} does not delete the destination index or {kib} index pattern

When deleting a {transform} using `DELETE _transform/<transform_id>`, neither
the destination index nor the {kib} index pattern, should one have been
created, is deleted. These objects must be deleted separately.

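For example, assuming a hypothetical transform ID and destination index name,
each object requires its own delete request; the {kib} index pattern, if one
was created, must also be removed in {kib}:

[source,console]
----
DELETE _transform/ecommerce-customer-totals

DELETE /ecommerce-customer-totals
----
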
[discrete]
[[transform-aggregation-page-limitations]]
=== Handling dynamic adjustment of aggregation page size

During the development of {transforms}, control was favoured over performance.
The design prefers a {transform} that takes longer to complete quietly in the
background over one that finishes quickly but takes precedence in resource
consumption.

Composite aggregations are well suited for high cardinality data and enable
pagination through the results. If a <<circuit-breaker,circuit breaker>> memory
exception occurs while performing the composite aggregation search, the search
is retried with a reduced number of buckets requested. This circuit breaker is
calculated based upon all activity within the cluster, not just activity from
{transforms}, so it may be only a temporary resource availability issue.

For a batch {transform}, the number of buckets requested is only ever adjusted
downwards. Lowering this value may result in a longer duration for the
{transform} checkpoint to complete. For {ctransforms}, the number of buckets
requested is reset back to its default at the start of every checkpoint and it
is possible for circuit breaker exceptions to occur repeatedly in the {es} logs.

The {transform} retrieves data in batches, which means it calculates several
buckets at once. By default, this is 500 buckets per search/index operation. The
default can be changed using `max_page_search_size` and the minimum value is 10.
If failures still occur once the number of buckets requested has been reduced to
its minimum, then the {transform} will be set to a failed state.

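For example, a sketch that lowers the page size on an existing {transform}
through the update {transform} API; the transform ID and value are
hypothetical, and in recent versions `max_page_search_size` is set in the
`settings` object:

[source,console]
----
POST _transform/ecommerce-customer-totals/_update
{
  "settings": {
    "max_page_search_size": 200
  }
}
----
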
[discrete]
[[transform-dynamic-adjustments-limitations]]
=== Handling dynamic adjustments for many terms

For each checkpoint, entities are identified that have changed since the last
time the check was performed. This list of changed entities is supplied as a
<<query-dsl-terms-query,terms query>> to the {transform} composite aggregation,
one page at a time. Then updates are applied to the destination index for each
page of entities.

The page `size` is defined by `max_page_search_size`, which is also used to
define the number of buckets returned by the composite aggregation search. The
default value is 500; the minimum is 10.

The index setting <<dynamic-index-settings,`index.max_terms_count`>> defines
the maximum number of terms that can be used in a terms query. The default value
is 65536. If `max_page_search_size` exceeds `index.max_terms_count`, the
{transform} will fail.

Using smaller values for `max_page_search_size` may result in a longer duration
for the {transform} checkpoint to complete.

[discrete]
[[transform-failed-limitations]]
=== Handling of failed {transforms}

A failed {transform} remains as a persistent task and should be handled
appropriately, either by deleting it or by resolving the root cause of the
failure and restarting it.

When using the API to delete a failed {transform}, first stop it using
`_stop?force=true`, then delete it.

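For example, with a hypothetical transform ID:

[source,console]
----
POST _transform/ecommerce-customer-totals/_stop?force=true

DELETE _transform/ecommerce-customer-totals
----
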
[discrete]
[[transform-availability-limitations]]
=== {ctransforms-cap} may give incorrect results if documents are not yet available to search

After a document is indexed, there is a very small delay until it is available
to search.

A {ctransform} periodically checks for changed entities between the time it
last checked and `now` minus `sync.time.delay`. This time window moves without
overlapping. If the timestamp of a recently indexed document falls within this
time window but the document is not yet available to search, then this entity
will not be updated.

If you use a `sync.time.field` that represents the data ingest time and a zero
second or very small `sync.time.delay`, this issue is more likely to occur.

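A sketch of a `sync` configuration that leaves a generous delay; the field name
is a hypothetical ingest timestamp:

[source,js]
----
"sync": {
  "time": {
    "field": "event.ingested",
    "delay": "120s"
  }
}
----
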
[discrete]
[[transform-date-nanos]]
=== Support for date nanoseconds data type

If your data uses the <<date_nanos,date nanosecond data type>>, aggregations
are nonetheless performed with millisecond resolution. This limitation also
affects the aggregations in your {transforms}.

[discrete]
[[transform-data-streams-destination]]
=== Data streams as destination indices are not supported

{transforms-cap} update data in the destination index, which requires writing
into the destination. <<data-streams>> are designed to be append-only, which
means you cannot send update or delete requests directly to a data stream. For
this reason, data streams are not supported as destination indices for
{transforms}.

[discrete]
[[transform-ilm-destination]]
=== ILM as destination index may cause duplicated documents

Using an <<index-lifecycle-management,ILM>> index as a {transform} destination
index is not recommended. {transforms-cap} update documents in the current
destination, and cannot delete documents in the indices previously used by ILM.
This may lead to duplicated documents when you use {transforms} combined with
ILM and a rollover occurs.

If you use ILM to obtain time-based indices, please consider using the
<<date-index-name-processor>> instead. The processor works without duplicated
documents if your {transform} contains a `group_by` based on `date_histogram`.

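A sketch of such an ingest pipeline; the pipeline ID, date field, and index
name prefix are hypothetical. The {transform} would then reference this
pipeline in its `dest.pipeline` setting:

[source,console]
----
PUT _ingest/pipeline/transform-date-index-name
{
  "processors": [
    {
      "date_index_name": {
        "field": "order_date",
        "index_name_prefix": "ecommerce-customer-totals-",
        "date_rounding": "d"
      }
    }
  ]
}
----
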
[discrete]
[[transform-ui-limitations]]
== Limitations in {kib}

[discrete]
[[transform-space-limitations]]
=== {transforms-cap} are visible in all {kib} spaces

{kibana-ref}/xpack-spaces.html[Spaces] enable you to organize your source and
destination indices and other saved objects in {kib} and to see only the objects
that belong to your space. However, this limited scope does not apply to
{transforms}; they are visible in all spaces.

[discrete]
[[transform-rolling-upgrade-ui-limitation]]
=== {transforms-cap} UI will not work during a rolling upgrade from 7.2

If your cluster contains mixed version nodes, for example during a rolling
upgrade from 7.2 to a newer version, and {transforms} have been created in 7.2,
the {transforms} UI (earlier {dataframe} UI) will not work. Please wait until
all nodes have been upgraded to the newer version before using the
{transforms} UI.

[discrete]
[[transform-kibana-limitations]]
=== Up to 1,000 {transforms} are listed in {kib}

The {transforms} management page in {kib} lists up to 1,000 {transforms}.