HLRC API for _termvectors (#33447)

* HLRC API for _termvectors relates to #27205
2025-04-24 23:27:25 -04:00 · 2018-10-24 14:27:22 -04:00 · 2018-10-24 14:27:22 -04:00 · bf4d90a5dc
commit bf4d90a5dc
parent f19565c3e0
12 changed files with 1347 additions and 4 deletions
--- a/docs/java-rest/high-level/document/term-vectors.asciidoc
+++ b/docs/java-rest/high-level/document/term-vectors.asciidoc
@ -0,0 +1,134 @@
+--
+:api: term-vectors
+:request: TermVectorsRequest
+:response: TermVectorsResponse
+--
+
+[id="{upid}-{api}"]
+=== Term Vectors API
+
+The Term Vectors API returns information and statistics on terms in the fields
+of a particular document. The document could be stored in the index or
+artificially provided by the user.
+
+
+[id="{upid}-{api}-request"]
+==== Term Vectors Request
+
+A +{request}+ expects an `index`, a `type` and an `id` to identify
+a document, and the fields for which the information is retrieved.
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-request]
+--------------------------------------------------
+
+Term vectors can also be generated for artificial documents, that is, for
+documents not present in the index:
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-request-artificial]
+--------------------------------------------------
+<1> An artificial document is provided as an `XContentBuilder` object,
+the Elasticsearch built-in helper to generate JSON content.
+
+===== Optional arguments
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-request-optional-arguments]
+--------------------------------------------------
+<1> Set `fieldStatistics` to `false` (default is `true`) to omit document count,
+sum of document frequencies, sum of total term frequencies.
+<2> Set `termStatistics` to `true` (default is `false`) to display
+total term frequency and document frequency.
+<3> Set `positions` to `false` (default is `true`) to omit the output of
+positions.
+<4> Set `offsets` to `false` (default is `true`) to omit the output of
+offsets.
+<5> Set `payloads` to `false` (default is `true`) to omit the output of
+payloads.
+<6> Set `filterSettings` to filter the terms that can be returned based
+on their tf-idf scores.
+<7> Set `perFieldAnalyzer` to specify a different analyzer than
+the one that the field has.
+<8> Set `realtime` to `false` (default is `true`) to retrieve term vectors
+in near real time rather than in real time.
+<9> Set a routing parameter.
+
+
+include::../execution.asciidoc[]
+
+
+[id="{upid}-{api}-response"]
+==== TermVectorsResponse
+
+The `TermVectorsResponse` contains the following information:
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-response]
+--------------------------------------------------
+<1> The index name of the document.
+<2> The type name of the document.
+<3> The id of the document.
+<4> Indicates whether or not the document was found.
+
+
+===== Inspecting Term Vectors
+If `TermVectorsResponse` contains a non-null list of term vectors,
+more information about them can be obtained using the following:
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-term-vectors]
+--------------------------------------------------
+<1> The list of `TermVector` for the document
+<2> The name of the current field
+<3> Field statistics for the current field - document count
+<4> Field statistics for the current field - sum of total term frequencies
+<5> Field statistics for the current field - sum of document frequencies
+<6> Terms for the current field
+<7> The name of the term
+<8> Term frequency of the term
+<9> Document frequency of the term
+<10> Total term frequency of the term
+<11> Score of the term
+<12> Tokens of the term
+<13> Position of the token
+<14> Start offset of the token
+<15> End offset of the token
+<16> Payload of the token
+
+
+[id="{upid}-{api}-response"]
+==== TermVectorsResponse
+
+The `TermVectorsResponse` contains the following information:
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-response]
+--------------------------------------------------
+<1> The index name of the document.
+<2> The type name of the document.
+<3> The id of the document.
+<4> Indicates whether or not the document was found.
+<5> Indicates whether or not there are term vectors for this document.
+<6> The list of `TermVector` for the document
+<7> The name of the current field
+<8> Field statistics for the current field - document count
+<9> Field statistics for the current field - sum of total term frequencies
+<10> Field statistics for the current field - sum of document frequencies
+<11> Terms for the current field
+<12> The name of the term
+<13> Term frequency of the term
+<14> Document frequency of the term
+<15> Total term frequency of the term
+<16> Score of the term
+<17> Tokens of the term
+<18> Position of the token
+<19> Start offset of the token
+<20> End offset of the token
+<21> Payload of the token
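For orientation, the optional arguments documented in term-vectors.asciidoc correspond to query parameters of the underlying `_termvectors` REST endpoint. The following is a minimal, standard-library-only sketch of how such a request URL could be assembled. It is not the client's implementation: the class `TermVectorsEndpointSketch` and its `endpoint` method are hypothetical, the values `authors`, `_doc`, `1` and `user` are illustrative assumptions, and no URL encoding is performed. `filterSettings` and `perFieldAnalyzer` are left out because they belong in the request body rather than the query string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of the Elasticsearch client.
class TermVectorsEndpointSketch {

    // Builds "/{index}/{type}/{id}/_termvectors", followed by a query string
    // when params is non-empty. Values are concatenated as-is (no URL encoding).
    static String endpoint(String index, String type, String id, Map<String, String> params) {
        String path = "/" + index + "/" + type + "/" + id + "/_termvectors";
        if (params.isEmpty()) {
            return path;
        }
        StringJoiner query = new StringJoiner("&");
        params.forEach((name, value) -> query.add(name + "=" + value));
        return path + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the query string is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fields", "user");              // assumed field name
        params.put("field_statistics", "false");   // default is true
        params.put("term_statistics", "true");     // default is false
        params.put("positions", "false");          // default is true
        params.put("offsets", "false");            // default is true
        params.put("payloads", "false");           // default is true
        params.put("realtime", "false");           // default is true
        params.put("routing", "routing");          // assumed routing value
        System.out.println(endpoint("authors", "_doc", "1", params));
    }
}
```

Only parameters that differ from their defaults need to be sent; an empty map yields the bare endpoint path.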
--- a/docs/java-rest/high-level/supported-apis.asciidoc
+++ b/docs/java-rest/high-level/supported-apis.asciidoc
@ -14,6 +14,7 @@ Single document APIs::
 * <<{upid}-exists>>
 * <<{upid}-delete>>
 * <<{upid}-update>>
+* <<{upid}-term-vectors>>

 [[multi-doc]]
 Multi-document APIs::
@ -29,6 +30,7 @@ include::document/get.asciidoc[]
 include::document/exists.asciidoc[]
 include::document/delete.asciidoc[]
 include::document/update.asciidoc[]
+include::document/term-vectors.asciidoc[]
 include::document/bulk.asciidoc[]
 include::document/multi-get.asciidoc[]
 include::document/reindex.asciidoc[]
@ -372,4 +374,4 @@ don't leak into the rest of the documentation.
 :response!:
 :doc-tests-file!:
 :upid!:
--
+--