[[esql-getting-started]]
== Getting started with {esql} queries
++++
<titleabbrev>Getting started</titleabbrev>
++++

This guide shows how you can use {esql} to query and aggregate your data.

[TIP]
====
This getting started guide is also available as an https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/esql/esql-getting-started.ipynb[interactive Python notebook] in the `elasticsearch-labs` GitHub repository.
====

[discrete]
[[esql-getting-started-prerequisites]]
=== Prerequisites

To follow along with the queries in this guide, you can either set up your own
deployment or use Elastic's public {esql} demo environment.

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-widget-sample-data.asciidoc[]

[discrete]
[[esql-getting-started-running-queries]]
=== Run an {esql} query

In {kib}, you can use Console or Discover to run {esql} queries:

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-widget-discover-console.asciidoc[]

[discrete]
[[esql-getting-started-first-query]]
=== Your first {esql} query

Each {esql} query starts with a <<esql-source-commands,source command>>. A
source command produces a table, typically with data from {es}.

image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]

The <<esql-from>> source command returns a table with documents from a data
stream, index, or alias. Each row in the resulting table represents a document.
This query returns up to 1000 documents from the `sample_data` index:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-from]
----
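
In its simplest form, such a query is just the source command followed by the
index name. A minimal example against the sample data looks like this:

[source,esql]
----
FROM sample_data
----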

Each column corresponds to a field, and can be accessed by the name of that
field.

[TIP]
====
{esql} keywords are case-insensitive. The following query is identical to the
previous one:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-from-lowercase]
----
====

[discrete]
[[esql-getting-started-limit]]
=== Processing commands

A source command can be followed by one or more
<<esql-processing-commands,processing commands>>, separated by a pipe character:
`|`. Processing commands change an input table by adding, removing, or changing
rows and columns. Processing commands can perform filtering, projection,
aggregation, and more.

image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]

For example, you can use the <<esql-limit>> command to limit the number of rows
that are returned, up to a maximum of 10,000 rows:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-limit]
----
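
For instance, a query along these lines returns at most three rows (the limit
value shown here is only an illustration):

[source,esql]
----
FROM sample_data
| LIMIT 3
----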

[TIP]
====
For readability, you can put each command on a separate line. However, you don't
have to. The following query is identical to the previous one:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-limit-one-line]
----
====

[discrete]
[[esql-getting-started-sort]]
==== Sort a table

image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]

Another processing command is the <<esql-sort>> command. By default, the rows
returned by `FROM` don't have a defined sort order. Use the `SORT` command to
sort rows on one or more columns:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-sort]
----
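
For example, to sort the rows by timestamp with the most recent events first, a
query can look like this:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
----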

[discrete]
[[esql-getting-started-where]]
==== Query the data

Use the <<esql-where>> command to query the data. For example, to find all
events with a duration longer than 5ms:

[source,esql]
----
include::{esql-specs}/where.csv-spec[tag=gs-where]
----
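
Because `event_duration` is stored in nanoseconds (see
<<esql-getting-started-eval>>), 5ms corresponds to a value of 5,000,000, so such
a filter takes roughly this form:

[source,esql]
----
FROM sample_data
| WHERE event_duration > 5000000
----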

`WHERE` supports several <<esql-operators,operators>>. For example, you can use
<<esql-like-operator>> to run a wildcard query against the `message` column:

[source,esql]
----
include::{esql-specs}/where-like.csv-spec[tag=gs-like]
----
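
For instance, to match the "Connected to ..." messages used later in this
guide, a wildcard filter could look like this:

[source,esql]
----
FROM sample_data
| WHERE message LIKE "Connected*"
----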

[discrete]
[[esql-getting-started-more-commands]]
==== More processing commands

There are many other processing commands, like <<esql-keep>> and <<esql-drop>>
to keep or drop columns, <<esql-enrich>> to enrich a table with data from
indices in {es}, and <<esql-dissect>> and <<esql-grok>> to process data. Refer
to <<esql-processing-commands>> for an overview of all processing commands.

[discrete]
[[esql-getting-started-chaining]]
=== Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each
processing command works on the output table of the previous command. The result
of a query is the table produced by the final processing command.

image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]

The following example first sorts the table on `@timestamp`, and then limits the
result set to 3 rows:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-chaining]
----
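
Spelled out, that chain takes roughly this shape (the sort direction here is
only an illustration):

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
| LIMIT 3
----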

NOTE: The order of processing commands is important. Limiting the result set to
3 rows before sorting them would most likely return a result that is different
from this example, where the sort comes before the limit.

[discrete]
[[esql-getting-started-eval]]
=== Compute values

Use the <<esql-eval>> command to append columns with calculated values to a
table. For example, the following query appends a `duration_ms` column. The
values in the column are computed by dividing `event_duration` by 1,000,000. In
other words: `event_duration` converted from nanoseconds to milliseconds.

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval]
----
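
Expressed as a query, the appended column is computed like this:

[source,esql]
----
FROM sample_data
| EVAL duration_ms = event_duration/1000000.0
----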

`EVAL` supports several <<esql-functions,functions>>. For example, to round a
number to a specified number of digits, use the <<esql-round>> function:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-round]
----
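
For instance, rounding the computed duration to one decimal place (the number of
digits shown here is only an illustration) looks like this:

[source,esql]
----
FROM sample_data
| EVAL duration_ms = ROUND(event_duration/1000000.0, 1)
----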

[discrete]
[[esql-getting-started-stats]]
=== Calculate statistics

You can use {esql} not only to query your data, but also to aggregate it. Use
the <<esql-stats-by>> command to calculate statistics. For example, to
calculate the median duration:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-stats]
----
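
A sketch of such an aggregation, using the `MEDIAN` function on
`event_duration`:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event_duration)
----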

You can calculate multiple stats with one command:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-two-stats]
----
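
For example (the choice of functions here is only an illustration), the median
and the maximum duration can be calculated in a single `STATS` command:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event_duration), max_duration = MAX(event_duration)
----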

Use `BY` to group calculated stats by one or more columns. For example, to
calculate the median duration per client IP:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-stats-by]
----
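
Assuming the client IP is stored in a `client_ip` column, the grouped version
looks like this:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event_duration) BY client_ip
----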

[discrete]
[[esql-getting-started-access-columns]]
=== Access columns

You can access columns by their name. If a name contains special characters,
<<esql-identifiers,it needs to be quoted>> with backticks (+{backtick}+).

Assigning an explicit name to a column created by `EVAL` or `STATS` is optional.
If you don't provide a name, the new column name is equal to the function
expression. For example:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval-no-column-name]
----

In this query, `EVAL` adds a new column named `event_duration/1000000.0`.
Because its name contains special characters, you need to quote it with
backticks to access it:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval-stats-backticks]
----
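
Putting both steps together, a query that references the automatically named
column could look like this (the `MEDIAN` aggregation is only an illustration):

[source,esql]
----
FROM sample_data
| EVAL event_duration/1000000.0
| STATS MEDIAN(`event_duration/1000000.0`)
----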

[discrete]
[[esql-getting-started-histogram]]
=== Create a histogram

To track statistics over time, {esql} enables you to create histograms using the
<<esql-bucket>> function. `BUCKET` creates human-friendly bucket sizes and
returns a value for each row that corresponds to the bucket the row falls into.

Combine `BUCKET` with <<esql-stats-by>> to create a histogram. For example, to
count the number of events per hour:

[source,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by]
----
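
One way to write this, assuming an {esql} version where `BUCKET` accepts a fixed
time span, is:

[source,esql]
----
FROM sample_data
| STATS event_count = COUNT(*) BY hour = BUCKET(@timestamp, 1 hour)
----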

Or the median duration per hour:

[source,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by-median]
----
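
The median variant is similar, swapping the aggregation function (same
assumptions as above):

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event_duration) BY hour = BUCKET(@timestamp, 1 hour)
----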

[discrete]
[[esql-getting-started-enrich]]
=== Enrich data

{esql} enables you to <<esql-enrich-data,enrich>> a table with data from indices
in {es}, using the <<esql-enrich>> command.

image::images/esql/esql-enrich.png[align="center"]

Before you can use `ENRICH`, you first need to
<<esql-create-enrich-policy,create>> and <<esql-execute-enrich-policy,execute>>
an <<esql-enrich-policy,enrich policy>>.

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-widget-enrich-policy.asciidoc[]

After creating and executing a policy, you can use it with the `ENRICH`
command:

[source,esql]
----
include::{esql-specs}/enrich.csv-spec[tag=gs-enrich]
----
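
The general shape of such a query is sketched below; the policy name
`clientip_policy` and the match field `client_ip` are placeholders for whatever
names you used when creating the policy above:

[source,esql]
----
FROM sample_data
| ENRICH clientip_policy ON client_ip WITH env
----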

You can use the new `env` column that's added by the `ENRICH` command in
subsequent commands. For example, to calculate the median duration per
environment:

[source,esql]
----
include::{esql-specs}/enrich.csv-spec[tag=gs-enrich-stats-by]
----
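
Continuing the sketch above (with the same placeholder policy name), the
per-environment median looks like this:

[source,esql]
----
FROM sample_data
| ENRICH clientip_policy ON client_ip WITH env
| STATS median_duration = MEDIAN(event_duration) BY env
----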

For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.

[discrete]
[[esql-getting-started-process-data]]
=== Process data

Your data may contain unstructured strings that you want to
<<esql-process-data-with-dissect-and-grok,structure>> to make them easier to
analyze. For example, the sample data contains log messages like:

[source,txt]
----
"Connected to 10.1.0.3"
----

By extracting the IP address from these messages, you can determine which IP has
accepted the most client connections.

To structure unstructured strings at query time, you can use the {esql}
<<esql-dissect>> and <<esql-grok>> commands. `DISSECT` works by breaking up a
string using a delimiter-based pattern. `GROK` works similarly, but uses regular
expressions. This makes `GROK` more powerful, but generally also slower.

In this case, no regular expressions are needed, as the `message` is
straightforward: "Connected to ", followed by the server IP. To match this
string, you can use the following `DISSECT` command:

[source,esql]
----
include::{esql-specs}/dissect.csv-spec[tag=gs-dissect]
----
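
Concretely, the dissect pattern spells out the literal prefix and captures the
remainder of the message into a `server_ip` column:

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server_ip}"
----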

This adds a `server_ip` column to those rows that have a `message` that matches
this pattern. For other rows, the value of `server_ip` is `null`.

You can use the new `server_ip` column that's added by the `DISSECT` command in
subsequent commands. For example, to determine how many connections each server
has accepted:

[source,esql]
----
include::{esql-specs}/dissect.csv-spec[tag=gs-dissect-stats-by]
----
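
For example, counting rows per extracted server IP might look like this (the
`WHERE` clause drops rows where no IP was extracted, and the column names are
illustrative):

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server_ip}"
| WHERE server_ip IS NOT NULL
| STATS connections = COUNT(*) BY server_ip
| SORT connections DESC
----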

For more about data processing with {esql}, refer to
<<esql-process-data-with-dissect-and-grok>>.

[discrete]
[[esql-getting-learn-more]]
=== Learn more

To learn more about {esql}, refer to <<esql-language>> and <<esql-using>>.