[DOCS] Focus scripting docs on Painless (#69748)

* Initial changes for scripting.

* Shorten script examples.

* Expanding types docs.

* Updating types.

* Fixing broken cross-link.

* Fixing map error.

* Incorporating review feedback.

* Fixing broken table.

* Adding more info about reference types.

* Fixing broken path.

* Adding more info an examples for def type.

* Adding more info on operators.

* Incorporating review feedback.

* Adding notconsole for example.

* Removing comments in example.

* More review feedback.

* Editorial changes.

* Incorporating more reviewer feedback.

* Rewrites based on review feedback.

* Adding new sections for storing scripts and shortening scripts.

* Adding redirect for stored scripts.

* Adding DELETE for stored script plus link.

* Adding section for updating docs with scripts.

* Incorporating final feedback from reviews.

* Tightening up a few areas.

* Minor change around other languages.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Adam Locke 2021-03-18 15:58:33 -04:00 committed by GitHub
parent 4bb3d21d60
commit aba4422606
11 changed files with 432 additions and 324 deletions

@ -1,32 +1,34 @@
[[modules-scripting-painless]]
== Painless scripting language
_Painless_ is a simple, secure scripting language designed specifically for use
with Elasticsearch. It is the default scripting language for Elasticsearch and
can safely be used for inline and stored scripts. To get started with
Painless, see the {painless}/painless-guide.html[Painless Guide]. For a
detailed description of the Painless syntax and language features, see the
{painless}/painless-lang-spec.html[Painless Language Specification].
_Painless_ is a performant, secure scripting language designed specifically for
{es}. You can use Painless to safely write inline and stored scripts anywhere
scripts are supported in {es}.
[[painless-features]]
You can use Painless anywhere scripts can be used in Elasticsearch. Painless
provides:
Painless provides numerous capabilities that center around the following
core principles:
* Fast performance: Painless scripts https://benchmarks.elastic.co/index.html#search_qps_scripts[
run several times faster] than the alternatives.
* **Safety**: Ensuring the security of your cluster is of utmost importance. To
that end, Painless uses a fine-grained allowlist with a granularity down to the
members of a class. Anything that is not part of the allowlist results in a
compilation error. See the
{painless}/painless-api-reference.html[Painless API Reference]
for a complete list of available classes, methods, and fields per script
context.
* **Performance**: Painless compiles directly into JVM bytecode to take
advantage of all possible optimizations that the JVM provides. Also, Painless
typically avoids features that require additional slower checks at runtime.
* **Simplicity**: Painless implements a syntax with a natural familiarity to
anyone with some basic coding experience. Painless uses a subset of Java syntax
with some additional improvements to enhance readability and remove
boilerplate.
* Safety: Fine-grained allowlists with method call/field granularity. See the
{painless}/painless-api-reference.html[Painless API Reference] for a
complete list of available classes and methods.
[discrete]
=== Start scripting
Ready to start scripting with Painless? Learn how to
<<modules-scripting-using,write your first script>>.
* Optional typing: Variables and parameters can use explicit types or the
dynamic `def` type.
* Syntax: Extends a subset of Java's syntax to provide additional scripting
language features.
* Optimizations: Designed specifically for Elasticsearch scripting.
Ready to start scripting with Painless? See the
{painless}/painless-guide.html[Painless Guide] for the
{painless}/index.html[Painless Scripting Language].
If you're already familiar with Painless, see the
{painless}/painless-lang-spec.html[Painless Language Specification] for a
detailed description of the Painless syntax and language features.

@ -79,12 +79,12 @@ security of the Elasticsearch deployment.
=== Allowed script types setting
Elasticsearch supports two script types: `inline` and `stored` (<<modules-scripting-using>>).
By default, {es} is configured to run both types of scripts.
To limit what type of scripts are run, set `script.allowed_types` to `inline` or `stored`.
By default, {es} is configured to run both types of scripts.
To limit what type of scripts are run, set `script.allowed_types` to `inline` or `stored`.
To prevent any scripts from running, set `script.allowed_types` to `none`.
IMPORTANT: If you use {kib}, set `script.allowed_types` to `both` or `inline`.
Some {kib} features rely on inline scripts and do not function as expected
IMPORTANT: If you use {kib}, set `script.allowed_types` to `both` or `inline`.
Some {kib} features rely on inline scripts and do not function as expected
if {es} does not allow inline scripts.
For example, to run `inline` scripts but not `stored` scripts, specify:

@ -1,39 +1,74 @@
[[modules-scripting-using]]
== How to use scripts
== How to write scripts
Wherever scripting is supported in the Elasticsearch API, the syntax follows
the same pattern:
Wherever scripting is supported in the {es} APIs, the syntax follows the same
pattern; you specify the language of your script, provide the script logic (or
source), and add parameters that are passed into the script:
[source,js]
-------------------------------------
"script": {
"lang": "...", <1>
"source" | "id": "...", <2>
"params": { ... } <3>
"lang": "...",
"source" | "id": "...",
"params": { ... }
}
-------------------------------------
// NOTCONSOLE
<1> The language the script is written in, which defaults to `painless`.
<2> The script itself which may be specified as `source` for an inline script or `id` for a stored script.
<3> Any named parameters that should be passed into the script.
For example, the following script is used in a search request to return a
<<script-fields, scripted field>>:
`lang`::
Specifies the language the script is written in. Defaults to `painless`.
`source`, `id`::
The script itself, which you specify as `source` for an inline script or `id` for a stored script. Use the `_scripts` endpoint to manage stored scripts. For example, you can create or delete a <<script-stored-scripts,stored script>> by calling `POST _scripts/{id}` and `DELETE _scripts/{id}`.
`params`::
Specifies any named parameters that are passed into the script as
variables. <<prefer-params,Use parameters>> instead of hard-coded values to decrease compile time.
[discrete]
[[hello-world-script]]
=== Write your first script
<<modules-scripting-painless,Painless>> is the default scripting language
for {es}. It is secure, performant, and provides a natural syntax for anyone
with a little coding experience.
A Painless script is structured as one or more statements and optionally
has one or more user-defined functions at the beginning. A script must always
have at least one statement.
The {painless}/painless-execute-api.html[Painless execute API] provides the ability to
test a script with simple user-defined parameters and receive a result. Let's
start with a complete script and review its constituent parts.
First, index a document with a single field so that we have some data to work
with:
[source,console]
-------------------------------------
----
PUT my-index-000001/_doc/1
{
"my_field": 5
}
----
We can then construct a script that operates on that field and evaluate the
script as part of a query. The following query uses the
<<script-fields,`script_fields`>> parameter of the search API to retrieve a
script valuation. There's a lot happening here, but we'll break down the
components to understand them individually. For now, you only need to
understand that this script takes `my_field` and operates on it.
[source,console]
----
GET my-index-000001/_search
{
"script_fields": {
"my_doubled_field": {
"script": {
"lang": "expression",
"source": "doc['my_field'] * multiplier",
"script": { <1>
"source": "doc['my_field'].value * params['multiplier']", <2>
"params": {
"multiplier": 2
}
@ -41,117 +76,141 @@ GET my-index-000001/_search
}
}
}
-------------------------------------
----
// TEST[continued]
<1> `script` object
<2> `script` source
The `script` is a standard JSON object that defines scripts under most APIs
in {es}. This object requires `source` to define the script itself. The
script doesn't specify a language, so it defaults to Painless.
[discrete]
=== Script parameters
`lang`::
Specifies the language the script is written in. Defaults to `painless`.
`source`, `id`::
Specifies the source of the script. An `inline` script is specified
`source` as in the example above. A `stored` script is specified `id`
and is retrieved from the cluster state (see <<modules-scripting-stored-scripts,Stored Scripts>>).
`params`::
Specifies any named parameters that are passed into the script as
variables.
[IMPORTANT]
[[prefer-params]]
.Prefer parameters
========================================
=== Use parameters in your script
The first time Elasticsearch sees a new script, it compiles it and stores the
compiled version in a cache. Compilation can be a heavy process.
The first time {es} sees a new script, it compiles the script and stores the
compiled version in a cache. Compilation can be a heavy process. Rather than
hard-coding values in your script, pass them as named `params` instead.
If you need to pass variables into the script, you should pass them in as
named `params` instead of hard-coding values into the script itself. For
example, if you want to be able to multiply a field value by different
multipliers, don't hard-code the multiplier into the script:
For example, in the previous script, we could have just hard-coded values and
written a script that is seemingly less complex. We could just retrieve the
first value for `my_field` and then multiply it by `2`:
[source,painless]
----
"source": "return doc['my_field'].value * 2"
----
Though it works, this solution is pretty inflexible. We have to modify the
script source to change the multiplier, and {es} has to recompile the script
every time that the multiplier changes.
Instead of hard-coding values, use named `params` to make scripts flexible, and
also reduce compilation time when the script runs. You can now make changes to
the `multiplier` parameter without {es} recompiling the script.
[source,painless]
----
"source": "doc['my_field'].value * params['multiplier']",
"params": {
"multiplier": 2
}
----
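The effect of parameters on compilation can be sketched with a toy model. The Python below is only an illustration, not how {es} implements its script cache: it assumes a hypothetical cache key built from the script language and source text, with `params` excluded. Under that assumption, three hard-coded multipliers produce three distinct scripts to compile, while the parameterized form produces one.

```python
def script_cache_key(script):
    """Hypothetical cache key: language plus source text; params are ignored."""
    return (script.get("lang", "painless"), script["source"])


multipliers = (2, 3, 4)

# Hard-coded values change the source text for every multiplier.
hard_coded = [{"source": f"doc['my_field'].value * {m}"} for m in multipliers]

# Named params keep the source text identical across requests.
parameterized = [
    {
        "source": "doc['my_field'].value * params['multiplier']",
        "params": {"multiplier": m},
    }
    for m in multipliers
]

distinct_hard_coded = {script_cache_key(s) for s in hard_coded}
distinct_parameterized = {script_cache_key(s) for s in parameterized}

print(len(distinct_hard_coded), "compilations with hard-coded values")
print(len(distinct_parameterized), "compilation with params")
```

In this model, every new hard-coded multiplier counts as a new script, and each new script counts against the compilation rate limit.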
For most contexts, you can compile up to 75 scripts per 5 minutes by default.
For ingest contexts, the default script compilation rate is unlimited. You
can change these settings dynamically by setting
`script.context.$CONTEXT.max_compilations_rate`. For example, the following
setting limits script compilation to 100 scripts every 10 minutes for the
{painless}/painless-field-context.html[field context]:
[source,js]
----------------------
"source": "doc['my_field'] * 2"
----------------------
----
script.context.field.max_compilations_rate=100/10m
----
// NOTCONSOLE
Instead, pass it in as a named parameter:
IMPORTANT: If you compile too many unique scripts within a short time, {es}
rejects the new dynamic scripts with a `circuit_breaking_exception` error.
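To make the `count/interval` notation concrete, here is a small Python sketch that splits a rate value such as `75/5m` into a count and an interval in seconds. The helper name `parse_compilation_rate` is hypothetical, and the sketch assumes only the `s`, `m`, and `h` suffixes; it does not reproduce how {es} parses the setting.

```python
import re

# Assumption: only second, minute, and hour suffixes are handled in this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_compilation_rate(value):
    """Split a 'count/interval' string into (count, interval_in_seconds)."""
    match = re.fullmatch(r"(\d+)/(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unrecognized rate: {value!r}")
    count, amount, unit = match.groups()
    return int(count), int(amount) * UNIT_SECONDS[unit]


default_rate = parse_compilation_rate("75/5m")
field_rate = parse_compilation_rate("100/10m")

print(default_rate)
print(field_rate)
```

Reading the values this way shows that `100/10m` allows 100 compilations in each 600-second window, compared with 75 in each 300-second window for the default.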
[source,js]
----------------------
"source": "doc['my_field'] * multiplier",
"params": {
"multiplier": 2
[discrete]
[[script-shorten-syntax]]
=== Shorten your script
Using syntactic abilities that are native to Painless, you can reduce verbosity
in your scripts and make them shorter. Here's a simple script that we can make
shorter:
[source,console]
----
GET my-index-000001/_search
{
"script_fields": {
"my_doubled_field": {
"script": {
"lang": "painless",
"source": "return doc['my_field'].value * params.get('multiplier');",
"params": {
"multiplier": 2
}
}
}
}
----------------------
// NOTCONSOLE
}
----
// TEST[s/^/PUT my-index-000001\n/]
The first version has to be recompiled every time the multiplier changes. The
second version is only compiled once.
Let's look at a shortened version of the script to see what improvements it
includes over the previous iteration:
If you compile too many unique scripts within a small amount of time,
Elasticsearch will reject the new dynamic scripts with a
`circuit_breaking_exception` error. For most contexts, you can compile up to 75
scripts per 5 minutes by default. For ingest contexts, the default script
compilation rate is unlimited. You can change these settings dynamically by
setting `script.context.$CONTEXT.max_compilations_rate` eg.
`script.context.field.max_compilations_rate=100/10m`.
========================================
[discrete]
[[modules-scripting-short-script-form]]
=== Short script form
A short script form can be used for brevity. In the short form, `script` is represented
by a string instead of an object. This string contains the source of the script.
Short form:
[source,js]
----------------------
"script": "ctx._source.my-int++"
----------------------
// NOTCONSOLE
The same script in the normal form:
[source,js]
----------------------
"script": {
"source": "ctx._source.my-int++"
[source,console]
----
GET my-index-000001/_search
{
"script_fields": {
"my_doubled_field": {
"script": {
"source": "doc['my_field'].value * params['multiplier']",
"params": {
"multiplier": 2
}
}
}
}
----------------------
// NOTCONSOLE
}
----
// TEST[s/^/PUT my-index-000001\n/]
This version of the script removes several components and simplifies the syntax
significantly:
* The `lang` declaration. Because Painless is the default language, you don't
need to specify the language if you're writing a Painless script.
* The `return` keyword. Painless automatically uses the final statement in a
script (when possible) to produce a return value in a script context that
requires one.
* The `get` method, which is replaced with brackets `[]`. Painless
uses a shortcut specifically for the `Map` type that allows us to use brackets
instead of the lengthier `get` method.
* The semicolon at the end of the `source` statement. Painless does not
require semicolons for the final statement of a block. However, it does require
them in other cases to remove ambiguity.
Use this abbreviated syntax anywhere that {es} supports scripts.
[discrete]
[[modules-scripting-stored-scripts]]
=== Stored scripts
[[script-stored-scripts]]
=== Store and retrieve scripts
You can store and retrieve scripts from the cluster state using the `_scripts`
endpoint. Using stored scripts can help to reduce compilation time and make
searches faster. Use the `{id}` path element in `_scripts/{id}` to refer to a
stored script.
Scripts may be stored in and retrieved from the cluster state using the
`_scripts` end-point.
NOTE: Unlike regular scripts, stored scripts require that you specify a script
language using the `lang` parameter.
If the {es} {security-features} are enabled, you must have the following
privileges to create, retrieve, and delete stored scripts:
* cluster: `all` or `manage`
For more information, see <<security-privileges>>.
[discrete]
==== Request examples
The following are examples of using a stored script that lives at
`/_scripts/{id}`.
First, create the script called `calculate-score` in the cluster state:
For example, let's create a stored script in the cluster named
`calculate-score`:
[source,console]
-----------------------------------
@ -159,29 +218,13 @@ POST _scripts/calculate-score
{
"script": {
"lang": "painless",
"source": "Math.log(_score * 2) + params.my_modifier"
"source": "Math.log(_score * 2) + params['my_modifier']"
}
}
-----------------------------------
// TEST[setup:my_index]
You may also specify a context as part of the url path to compile a
stored script against that specific context in the form of
`/_scripts/{id}/{context}`:
[source,console]
-----------------------------------
POST _scripts/calculate-score/score
{
"script": {
"lang": "painless",
"source": "Math.log(_score * 2) + params.my_modifier"
}
}
-----------------------------------
// TEST[setup:my_index]
This same script can be retrieved with:
You can retrieve that script by using the `_scripts` endpoint:
[source,console]
-----------------------------------
@ -189,10 +232,11 @@ GET _scripts/calculate-score
-----------------------------------
// TEST[continued]
Stored scripts can be used by specifying the `id` parameters as follows:
To use the stored script in a query, include the script `id` in the `script`
declaration:
[source,console]
--------------------------------------------------
----
GET my-index-000001/_search
{
"query": {
@ -203,7 +247,7 @@ GET my-index-000001/_search
}
},
"script": {
"id": "calculate-score",
"id": "calculate-score", <1>
"params": {
"my_modifier": 2
}
@ -211,68 +255,182 @@ GET my-index-000001/_search
}
}
}
--------------------------------------------------
----
// TEST[continued]
<1> `id` of the stored script
And deleted with:
To delete a stored script, submit a delete request to the `_scripts` endpoint
and specify the stored script `id`:
[source,console]
-----------------------------------
----
DELETE _scripts/calculate-score
-----------------------------------
----
// TEST[continued]
[discrete]
[[modules-scripting-search-templates]]
=== Search templates
You can also use the `_scripts` API to store **search templates**. Search
templates save specific <<search-search,search requests>> with placeholder
values, called template parameters.
[[scripts-update-scripts]]
=== Update documents with scripts
You can use the <<docs-update,update API>> to update documents with a specified
script. The script can update, delete, or skip modifying the document. The
update API also supports passing a partial document, which is merged into the
existing document.
You can use stored search templates to run searches without writing out the
entire query. Just provide the stored template's ID and the template parameters.
This is useful when you want to run a commonly used query quickly and without
mistakes.
First, let's index a simple document:
Search templates use the https://mustache.github.io/mustache.5.html[mustache
templating language]. See <<search-template>> for more information and examples.
[source,console]
----
PUT my-index-000001/_doc/1
{
"counter" : 1,
"tags" : ["red"]
}
----
[discrete]
[[modules-scripting-using-caching]]
=== Script caching
To increment the counter, you can submit an update request with the following
script:
All scripts are cached by default so that they only need to be recompiled
when updates occur. By default, scripts do not have a time-based expiration, but
you can change this behavior by using the `script.cache.expire` setting.
You can configure the size of this cache by using the `script.cache.max_size` setting.
For most contexts, the default cache size is `100`. For ingest contexts, the
default cache size is `200`.
[source,console]
----
POST my-index-000001/_update/1
{
"script" : {
"source": "ctx._source.counter += params.count",
"lang": "painless",
"params" : {
"count" : 4
}
}
}
----
// TEST[continued]
NOTE: The size of scripts is limited to 65,535 bytes. This can be
changed by setting `script.max_size_in_bytes` setting to increase that soft
limit, but if scripts are really large then a
<<modules-scripting-engine,native script engine>> should be considered.
Similarly, you can use an update script to add a tag to the list of tags.
Because this is just a list, the tag is added even if it exists:
[source,console]
----
POST my-index-000001/_update/1
{
"script": {
"source": "ctx._source.tags.add(params['tag'])",
"lang": "painless",
"params": {
"tag": "blue"
}
}
}
----
// TEST[continued]
You can also remove a tag from the list of tags. The `remove` method of a Java
`List` is available in Painless. It takes the index of the element you
want to remove. To avoid a possible runtime error, you first need to make sure
the tag exists. If the list contains duplicates of the tag, this script just
removes one occurrence.
[source,console]
----
POST my-index-000001/_update/1
{
"script": {
"source": "if (ctx._source.tags.contains(params['tag'])) { ctx._source.tags.remove(ctx._source.tags.indexOf(params['tag'])) }",
"lang": "painless",
"params": {
"tag": "blue"
}
}
}
----
// TEST[continued]
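The add and remove behavior described above can be modeled outside of {es}. The following Python sketch is only an analogy: a Python list stands in for the Java `List`, and the hypothetical helpers `add_tag` and `remove_tag` mirror the logic of the two update scripts rather than run them.

```python
def add_tag(source, tag):
    """Append the tag without checking for duplicates, like the add script."""
    source["tags"].append(tag)


def remove_tag(source, tag):
    """Remove one occurrence of the tag by index, only if it is present."""
    tags = source["tags"]
    if tag in tags:
        tags.pop(tags.index(tag))


doc = {"counter": 1, "tags": ["red"]}

add_tag(doc, "blue")
add_tag(doc, "blue")      # the duplicate is added as well
remove_tag(doc, "blue")   # removes only the first "blue"
remove_tag(doc, "green")  # tag is absent, so nothing happens

print(doc["tags"])
```

The guard in `remove_tag` plays the same role as the `contains` check in the script: without it, looking up the index of a missing tag would fail.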
You can also add and remove fields from a document. For example, this script
adds the field `new_field`:
[source,console]
----
POST my-index-000001/_update/1
{
"script" : "ctx._source.new_field = 'value_of_new_field'"
}
----
// TEST[continued]
Conversely, this script removes the field `new_field`:
[source,console]
----
POST my-index-000001/_update/1
{
"script" : "ctx._source.remove('new_field')"
}
----
// TEST[continued]
Instead of updating the document, you can also change the operation that is
executed from within the script. For example, this request deletes the document
if the `tags` field contains `green`. Otherwise it does nothing (`noop`):
[source,console]
----
POST my-index-000001/_update/1
{
"script": {
"source": "if (ctx._source.tags.contains(params['tag'])) { ctx.op = 'delete' } else { ctx.op = 'none' }",
"lang": "painless",
"params": {
"tag": "green"
}
}
}
----
// TEST[continued]
[[scripts-and-search-speed]]
=== Scripts and search speed
=== Scripts, caching, and search speed
{es} performs a number of optimizations to make using scripts as fast as
possible. One important optimization is a script cache. The compiled script is
placed in a cache so that requests that reference the script do not incur a
compilation penalty.
Scripts can't make use of {es}'s index structures or related optimizations. This
can sometimes result in slower search speeds.
Cache sizing is important. Your script cache should be large enough to hold all
of the scripts that users need to access concurrently.
If you often use scripts to transform indexed data, you can speed up search by
making these changes during ingest instead. However, that often means slower
index speeds.
If you see a large number of script cache evictions and a rising number of
compilations in <<cluster-nodes-stats,node stats>>, your cache might be too
small.
.*Example*
[%collapsible]
=====
An index, `my_test_scores`, contains two `long` fields:
All scripts are cached by default so that they only need to be recompiled
when updates occur. By default, scripts do not have a time-based expiration.
You can change this behavior by using the `script.cache.expire` setting.
Use the `script.cache.max_size` setting to configure the size of the cache.
NOTE: The size of scripts is limited to 65,535 bytes. Set the value of `script.max_size_in_bytes` to increase that soft limit. If your scripts are
really large, then consider using a
<<modules-scripting-engine,native script engine>>.
[discrete]
==== Improving search speed
Scripts are incredibly useful, but can't use {es}'s index structures or related
optimizations. This limitation can sometimes result in slower search speeds.
If you often use scripts to transform indexed data, you can make search faster
by transforming data during ingest instead. However, that often means slower
index speeds. Let's look at a practical example to illustrate how you can
increase search speed.
When running searches, it's common to sort results by the sum of two values.
For example, consider an index named `my_test_scores` that contains test score
data. This index includes two fields of type `long`:
* `math_score`
* `verbal_score`
When running searches, users often use a script to sort results by the sum of
these two field's values.
You can run a query with a script that adds these values together. There's
nothing wrong with this approach, but the query will be slower because the
script valuation occurs as part of the request. The following request returns
documents where `grad_year` equals `2099`, and sorts the results by the
valuation of the script.
[source,console]
----
@ -298,12 +456,12 @@ GET /my_test_scores/_search
----
// TEST[s/^/PUT my_test_scores\n/]
To speed up search, you can perform this calculation during ingest and index the
sum to a field instead.
If you're searching a small index, then including the script as part of your
search query can be a good solution. If you want to make search faster, you can
perform this calculation during ingest and index the sum to a field instead.
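The trade-off can be sketched in plain Python. This is a conceptual model only, with made-up sample documents and hypothetical helper names: `sort_with_script` recomputes the sum for every document on each search, while `ingest` computes it once when the document is indexed so that the search only reads a stored field.

```python
# Made-up sample data for illustration.
docs = [
    {"student": "a", "math_score": 700, "verbal_score": 650},
    {"student": "b", "math_score": 800, "verbal_score": 790},
    {"student": "c", "math_score": 600, "verbal_score": 610},
]


def sort_with_script(documents):
    """Compute the sum at search time, once per document per search."""
    return sorted(
        documents,
        key=lambda d: d["math_score"] + d["verbal_score"],
        reverse=True,
    )


def ingest(document):
    """Compute the sum once at index time and store it in total_score."""
    return {
        **document,
        "total_score": document["math_score"] + document["verbal_score"],
    }


indexed = [ingest(d) for d in docs]
by_field = sorted(indexed, key=lambda d: d["total_score"], reverse=True)

script_order = [d["student"] for d in sort_with_script(docs)]
field_order = [d["student"] for d in by_field]
print(script_order, field_order)
```

Both approaches return the same order; the difference is when the addition happens and how often it is repeated.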
First, <<indices-put-mapping,add a new field>>, `total_score`, to the index. The
`total_score` field will contain sum of the `math_score` and `verbal_score`
field values.
First, we'll add a new field to the index named `total_score`, which will
contain the sum of the `math_score` and `verbal_score` field values.
[source,console]
----
@ -319,7 +477,7 @@ PUT /my_test_scores/_mapping
// TEST[continued]
Next, use an <<ingest,ingest pipeline>> containing the
<<script-processor,`script`>> processor to calculate the sum of `math_score` and
<<script-processor,script processor>> to calculate the sum of `math_score` and
`verbal_score` and index it in the `total_score` field.
[source,console]
@ -339,7 +497,7 @@ PUT _ingest/pipeline/my_test_scores_pipeline
// TEST[continued]
To update existing data, use this pipeline to <<docs-reindex,reindex>> any
documents from `my_test_scores` to a new index, `my_test_scores_2`.
documents from `my_test_scores` to a new index named `my_test_scores_2`.
[source,console]
----
@ -364,15 +522,16 @@ POST /my_test_scores_2/_doc/?pipeline=my_test_scores_pipeline
{
"student": "kimchy",
"grad_year": "2099",
"math_score": 800,
"math_score": 1200,
"verbal_score": 800
}
----
// TEST[continued]
These changes may slow indexing but allow for faster searches. Users can now
sort searches made on `my_test_scores_2` using the `total_score` field instead
of using a script.
These changes slow the indexing process, but allow for faster searches. Instead
of using a script, you can sort searches made on `my_test_scores_2` using the
`total_score` field. The response is near real-time! Though this process slows
ingest time, it greatly increases query speed at search time.
[source,console]
----
@ -401,25 +560,4 @@ DELETE /_ingest/pipeline/my_test_scores_pipeline
----
// TEST[continued]
[source,console-result]
----
{
"acknowledged": true
}
----
////
=====
We recommend testing and benchmarking any indexing changes before deploying them
in production.
[discrete]
[[modules-scripting-errors]]
=== Script errors
Elasticsearch returns error details when there is a compilation or runtime
exception. The contents of this response are useful for tracking down the
problem.
experimental[]
The contents of `position` are experimental and subject to change.