[[point-in-time-api]]
=== Point in time API
++++
<titleabbrev>Point in time</titleabbrev>
++++

A search request by default executes against the most recent visible data of
the target indices, which is called point in time. Elasticsearch pit (point in time)
is a lightweight view into the state of the data as it existed when initiated.
In some cases, it's preferred to perform multiple search requests using
the same point in time. For example, if <<indices-refresh,refreshes>> happen between
`search_after` requests, then the results of those requests might not be consistent as
changes happening between searches are only visible to the more recent point in time.

[[point-in-time-api-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have the `read`
<<privileges-list-indices,index privilege>> for the target data stream, index,
or alias.
+
To search a <<point-in-time-api,point in time (PIT)>> for an alias, you
must have the `read` index privilege for the alias's data streams or indices.

[[point-in-time-api-request-body]]
==== {api-request-body-title}

`index_filter`::
(Optional, <<query-dsl,query object>>) Allows you to filter indices if the provided
query rewrites to `match_none` on every shard.

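For example, assuming the target index has a `@timestamp` field (an assumption
made for this sketch), a PIT restricted to indices containing recent data might
be opened like this:

[source,console]
--------------------------------------------------
POST /my-index-000001/_pit?keep_alive=1m
{
  "index_filter": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d"
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:sketch that assumes a @timestamp field in the target index]

Any index whose `index_filter` query rewrites to `match_none` on every shard is
left out of the resulting point in time.
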
[[point-in-time-api-example]]
==== {api-examples-title}

A point in time must be opened explicitly before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a point in time alive,
e.g. `?keep_alive=5m`.

[source,console]
--------------------------------------------------
POST /my-index-000001/_pit?keep_alive=1m
--------------------------------------------------
// TEST[setup:my_index]

The result from the above request includes an `id`, which should
be passed to the `id` of the `pit` parameter of a search request.

[source,console]
--------------------------------------------------
POST /_search <1>
{
  "size": 100, <2>
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", <3>
    "keep_alive": "1m" <4>
  }
}
--------------------------------------------------
// TEST[catch:unavailable]

<1> A search request with the `pit` parameter must not specify `index`, `routing`,
or <<search-preference,`preference`>>,
as these parameters are copied from the point in time.
<2> Just like regular searches, you can <<paginate-search-results,use `from` and
`size` to page through search results>>, up to the first 10,000 hits. If you
want to retrieve more hits, use PIT with <<search-after,`search_after`>>, as
sketched after this list.
<3> The `id` parameter tells Elasticsearch to execute the request using contexts
from this point in time.
<4> The `keep_alive` parameter tells Elasticsearch how long it should extend
the time to live of the point in time.

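For example, a next page of hits might be requested as follows. This is a
sketch: the sort field `@timestamp` and the `search_after` values are
hypothetical, and the `search_after` array mirrors the sort values of the last
hit from the previous page (a PIT search implicitly adds a `_shard_doc`
tiebreaker, which accounts for the second value):

[source,console]
--------------------------------------------------
POST /_search
{
  "size": 100,
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  },
  "sort": [
    { "@timestamp": { "order": "asc" } }
  ],
  "search_after": [ "2021-05-20T05:30:04.832Z", 4294967298 ]
}
--------------------------------------------------
// TEST[skip:sketch with a hypothetical sort field and search_after values]
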
IMPORTANT: The open point in time request and each subsequent search request can
return different `id` values; thus always use the most recently received `id` for the
next search request.

In addition to the `keep_alive` parameter, the `allow_partial_search_results` parameter
can also be defined.
This parameter determines whether the <<point-in-time-api,point in time (PIT)>>
should tolerate unavailable shards or <<shard-failures,shard failures>> when
initially creating the PIT.
If set to `true`, the PIT will be created with the available shards, along with a
reference to any missing ones.
If set to `false`, the operation will fail if any shard is unavailable.
The default value is `false`.

The PIT response includes a summary of the total number of shards, as well as the number
of successful shards when creating the PIT.

[source,console]
--------------------------------------------------
POST /my-index-000001/_pit?keep_alive=1m&allow_partial_search_results=true
--------------------------------------------------
// TEST[setup:my_index]

[source,js]
--------------------------------------------------
{
  "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=",
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  }
}
--------------------------------------------------
// NOTCONSOLE

When a PIT that contains shard failures is used in a search request, the missing shards are
always reported in the search response as a `NoShardAvailableActionException` exception.
To get rid of these exceptions, a new PIT needs to be created so that shards missing
from the previous PIT can be handled, assuming they become available in the meantime.

[[point-in-time-keep-alive]]
==== Keeping point in time alive

The `keep_alive` parameter, which is passed to an open point in time request and
search request, extends the time to live of the corresponding point in time.
The value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
process all data -- it just needs to be long enough for the next request.

Normally, the background merge process optimizes the index by merging together
smaller segments to create new, bigger segments. Once the smaller segments are
no longer needed they are deleted. However, open point-in-times prevent the
old segments from being deleted since they are still in use.

TIP: Keeping older segments alive means that more disk space and file handles
are needed. Ensure that you have configured your nodes to have ample free file
handles. See <<file-descriptors>>.

Additionally, if a segment contains deleted or updated documents then the
point in time must keep track of whether each document in the segment was live at
the time of the initial search request. Ensure that your nodes have sufficient heap
space if you have many open point-in-times on an index that is subject to ongoing
deletes or updates. Note that a point-in-time doesn't prevent its associated indices
from being deleted.

You can check how many point-in-times (i.e., search contexts) are open with the
<<cluster-nodes-stats,nodes stats API>>:

[source,console]
---------------------------------------
GET /_nodes/stats/indices/search
---------------------------------------

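In the response, the number of open contexts is reported per node under the
`search` stats. A trimmed sketch of the relevant fragment (the node ID and
value shown are hypothetical):

[source,js]
--------------------------------------------------
{
  "nodes": {
    "node_id_1": {
      "indices": {
        "search": {
          "open_contexts": 4
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
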
[[close-point-in-time-api]]
==== Close point in time API

Point-in-time is automatically closed when its `keep_alive` has
elapsed. However, keeping point-in-times open has a cost, as discussed in the
<<point-in-time-keep-alive,previous section>>. Point-in-times should be closed
as soon as they are no longer used in search requests.

[source,console]
---------------------------------------
DELETE /_pit
{
  "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
}
---------------------------------------
// TEST[catch:missing]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "succeeded": true, <1>
  "num_freed": 3 <2>
}
--------------------------------------------------
// TESTRESPONSE[s/"succeeded": true/"succeeded": $body.succeeded/]
// TESTRESPONSE[s/"num_freed": 3/"num_freed": $body.num_freed/]

<1> If true, all search contexts associated with the point-in-time id are successfully closed
<2> The number of search contexts that have been successfully closed

[discrete]
[[search-slicing]]
=== Search slicing

When paging through a large number of documents, it can be helpful to split the search into multiple slices
to consume them independently:

[source,console]
--------------------------------------------------
GET /_search
{
  "slice": {
    "id": 0, <1>
    "max": 2 <2>
  },
  "query": {
    "match": {
      "message": "foo"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
  }
}

GET /_search
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
  },
  "query": {
    "match": {
      "message": "foo"
    }
  }
}
--------------------------------------------------
// TEST[skip:both calls will throw errors]

<1> The id of the slice
<2> The maximum number of slices

The result from the first request returns documents belonging to the first slice (id: 0) and the
result from the second request returns documents in the second slice. Since the maximum number of
slices is set to 2, the union of the results of the two requests is equivalent to the results of a
point-in-time search without slicing. By default the splitting is done first on the shards, then
locally on each shard. The local splitting partitions the shard into contiguous ranges based on
Lucene document IDs.

For instance, if the number of shards is equal to 2 and the user requested 4 slices, then slices
0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard.

IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used,
then slices can overlap and miss documents. This is because the splitting criterion is based on
Lucene document IDs, which are not stable across changes to the index.