[DOCS] updates transforms at scale doc with date rounding. (#109073)

István Zoltán Szabó 2024-05-27 16:34:01 +02:00 committed by GitHub
parent 415b68f159
commit 2b0d2c9c23


@ -15,7 +15,7 @@ relevant considerations in this guide to improve performance. It also helps to
understand how {transforms} work as different considerations apply depending on
whether or not your transform is running in continuous mode or in batch.
In this guide, youll learn how to:
In this guide, you'll learn how to:
* Understand the impact of configuration options on the performance of
{transforms}.
@ -111,10 +111,17 @@ group of IPs, in order to calculate the total `bytes_sent`. If this second
search matches many shards, then this could be resource intensive. Consider
limiting the scope that the source index pattern and query will match.
Use an absolute time value as a date range filter in your source query (for
example, greater than `2020-01-01T00:00:00`) to limit which historical indices
are accessed. If you use a relative time value (for example, `now-30d`) then
this date range is re-evaluated at the point of each checkpoint execution.
To limit which historical indices are accessed, exclude certain tiers (for
example, `"must_not": { "terms": { "_tier": [ "data_frozen", "data_cold" ] } }`)
and/or use an absolute time value as a date range filter in your source query
(for example, greater than `2024-01-01T00:00:00`). If you use a relative time
value (for example, `gte` `now-30d/d`), apply date rounding to take advantage
of query caching, and ensure that the relative time span is much larger than
the largest of `frequency`, `sync.time.delay`, or the date histogram bucket
interval; otherwise data may be missed. Do not use date filters that select
values less than a given date (for example, `lt`: less than, or `lte`: less
than or equal to), as these conflict with the logic applied at each checkpoint
execution and data may be missed.
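
As a rough illustration of the guidance above, the following Python sketch
assembles a transform `source` section that excludes the frozen and cold tiers
and applies a rounded, lower-bound-only relative date filter. The helper name
`build_source_query`, the `@timestamp` field, and the `my-logs-*` index pattern
are assumptions made for this example only; adapt them to your own data.

```python
import json


def build_source_query(timestamp_field="@timestamp",
                       lookback="now-30d/d",
                       excluded_tiers=("data_frozen", "data_cold")):
    """Build a bool query for a transform source (hypothetical helper).

    - must_not/terms on `_tier` skips shards on the listed data tiers.
    - The range filter has only a lower bound (`gte`); `lt`/`lte` are
      deliberately omitted, as the guidance above advises against them.
    - `lookback` uses date rounding (`/d`) so that the resolved value stays
      stable within a day, which helps query caching.
    """
    return {
        "bool": {
            "must_not": [{"terms": {"_tier": list(excluded_tiers)}}],
            "filter": [{"range": {timestamp_field: {"gte": lookback}}}],
        }
    }


# Assumed index pattern, for illustration only.
source = {"index": ["my-logs-*"], "query": build_source_query()}
print(json.dumps(source, indent=2))
```

The printed JSON can be used as the `source` object of a transform
configuration. Choose a lookback window that comfortably exceeds the transform's
`frequency`, its sync delay, and the date histogram bucket interval.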
Consider using <<api-date-math-index-names,date math>> in your index names to
reduce the number of indices to resolve in your queries. Add a date pattern