Refactors to address feedback, new screenshots.

This commit is contained in:
Paul Echeverri 2015-05-19 18:49:07 -07:00
parent 659729a638
commit 006ef675c8
15 changed files with 169 additions and 36 deletions

View file

@ -19,9 +19,18 @@ The material in this section assumes you have a working Kibana install connected
The tutorials in this section rely on the following data sets:
* The complete works of William Shakespeare, suitably parsed into fields. Download this data set by clicking here:
https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json[shakespeare.json].
https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json[shakespeare.json.gz].
* A set of fictitious accounts with randomly generated data. Download this data set by clicking here:
https://github.com/bly2k/files/blob/master/accounts.zip?raw=true[accounts.json]
https://github.com/bly2k/files/blob/master/accounts.zip?raw=true[accounts.json.gz]
* A set of randomly generated log files. Download this data set by clicking here: [logstash.json.gz]
The data sets are compressed with the `gzip` utility. After downloading the files, unzip them with the following
commands:
[source,shell]
gunzip shakespeare.json.gz
gunzip accounts.json.gz
gunzip logstash.json.gz
The Shakespeare data set is organized in the following schema:
@ -52,11 +61,51 @@ The accounts data set is organized in the following schema:
"state": "String"
}
After downloading the data sets, load them into Elasticsearch with the following commands:
The schema for the logs data set has 114 different fields, but the notable ones used in this tutorial are:
[source,json]
{
"memory": INT,
"geo.coordinates": "geo_point",
"@timestamp": "date"
}
Before we load the Shakespeare data set, we need to set up a {ref}/mapping.html[_mapping_] for the fields. Mapping
divides the documents in the index into logical groups and specifies a field's characteristics, such as the field's
searchability or whether or not it's _tokenized_, or broken up into separate words.
Use the following command to set up a mapping for the Shakespeare data set:
[source,shell]
$ curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
$ curl -XPOST 'localhost:9200/play/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPUT http://localhost:9200/shakespeare -d '
{
"mappings" : {
"_default_" : {
"properties" : {
"speaker" : {"type": "string", "index" : "not_analyzed" },
"play_name" : {"type": "string", "index" : "not_analyzed" },
"line_id" : { "type" : "integer" },
"speech_number" : { "type" : "integer" }
}
}
}
}
';
This mapping specifies the following qualities for the data set:
* The _speaker_ field is a string that isn't analyzed. The string in this field is treated as a single unit, even if
there are multiple words in the field.
* The same applies to the _play_name_ field.
* The _line_id_ and _speech_number_ fields are integers.
The accounts and logstash data sets don't require any mappings, so at this point we're ready to load the data sets into
Elasticsearch with the following commands:
[source,shell]
curl -XPOST 'localhost:9200/bank/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logstash.json
These commands may take some time to execute, depending on the computing resources available.
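The files passed to `_bulk` above use the Elasticsearch bulk format: each document takes two lines, an action line followed by the document source, and the file ends with a newline. As a minimal sketch, the following writes a two-document file in that shape. The file name `sample-accounts.json`, the helper `make_sample_bulk`, and the field values are hypothetical and only illustrate the layout:

```shell
# Write a tiny two-document bulk file (hypothetical sample data).
# Each document takes two lines: an action line, then the document source.
make_sample_bulk() {
  cat > "$1" <<'EOF'
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"state":"IL"}
{"index":{"_id":"2"}}
{"account_number":2,"balance":28838,"state":"PA"}
EOF
}

make_sample_bulk sample-accounts.json
# Count the lines: two action lines plus two source lines.
wc -l < sample-accounts.json
```

A file like this could be loaded the same way as the tutorial data sets, with `--data-binary` so that curl preserves the newlines.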
@ -71,15 +120,18 @@ You should see output similar to the following:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank 5 1 1000 0 418.2kb 418.2kb
yellow open shakespeare 5 1 111396 0 17.6mb 17.6mb
yellow open logstash-2015.05.18 5 1 4631 0 15.6mb 15.6mb
yellow open logstash-2015.05.19 5 1 4624 0 15.7mb 15.7mb
yellow open logstash-2015.05.20 5 1 4750 0 16.4mb 16.4mb
[float]
[[tutorial-define-index]]
=== Defining Your Index Patterns
Each set of data loaded to Elasticsearch has an https://www.elastic.co/guide/en/kibana/current/settings.html#settings-create-pattern[index pattern]. In the previous section, the Shakespeare data set has an index named `shakespeare`, and the accounts
data set has an index named `bank`. An _index pattern_ is a regular expression that can
match multiple indices. For example, in the common logging use case, a typical index name contains the date in MM-DD-YYYY
format, and an index pattern for May would look something like `logstash-05-*`.
data set has an index named `bank`. An _index pattern_ is a string with optional wildcards that can match multiple
indices. For example, in the common logging use case, a typical index name contains the date in YYYY.MM.DD
format, and an index pattern for May would look something like `logstash-2015.05*`.
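To get a feel for how a wildcard pattern selects indices, the following sketch uses shell globbing as a stand-in. This is only an analogy: Kibana and Elasticsearch resolve index patterns themselves, and the helper name `match_indices` is hypothetical.

```shell
# Print the index names (one per line on stdin) that match a wildcard pattern.
match_indices() {
  pattern=$1
  while IFS= read -r index; do
    case $index in
      # The unquoted variable is treated as a glob pattern here.
      $pattern) printf '%s\n' "$index" ;;
    esac
  done
}

# Index names taken from the tutorial's sample output.
printf '%s\n' bank shakespeare \
  logstash-2015.05.18 logstash-2015.05.19 logstash-2015.05.20 |
  match_indices 'logstash-2015.05*'
```

With the names above, only the three `logstash-2015.05.*` indices are printed, while a pattern such as `ba*` selects only `bank`.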
For this tutorial, any pattern that matches one of the indices we've loaded will work. Open a browser and
navigate to `localhost:5601`. Click the *Settings* tab, then the *Indices* tab. Click *Add New* to define a new index
@ -102,11 +154,17 @@ which you can save and load by clicking the buttons to the right of the search b
Beneath the search box, the current index pattern is displayed in a drop-down. You can change the index pattern by
selecting a different pattern from the drop-down selector.
You can construct searches by using the field names and the values you're interested in. With numeric fields you can
use comparison operators such as greater than (>), less than (<), or equals (=). You can link elements with the
logical operators AND, OR, and NOT, all in uppercase.
Try selecting the `ba*` index pattern and putting the following search into the search box:
[source,text]
account_number:<100 AND balance:>47500
This search returns all account numbers between zero and 99 with balances in excess of 47,500.
If you're using the linked sample data set, this search returns 5 results: account numbers 8, 32, 78, 85, and 97.
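The same search can be tried outside Kibana. As a rough sketch, the function below prints a request body that wraps the search text in an Elasticsearch `query_string` query. The helper name `discover_query` is hypothetical, and the body is an approximation of what the search box sends, not a copy of it:

```shell
# Print a search request body that wraps the Discover search text
# in a query_string query.
discover_query() {
  cat <<'EOF'
{
  "query": {
    "query_string": {
      "query": "account_number:<100 AND balance:>47500"
    }
  }
}
EOF
}

discover_query

# To run it against a local node (assumed to listen on localhost:9200):
#   discover_query | curl -XPOST 'localhost:9200/bank/_search?pretty' \
#     -H 'Content-Type: application/json' --data-binary @-
```

The `Content-Type` header is required by newer Elasticsearch releases and is harmless on older ones.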
image::images/tutorial-discover-2.png[]
@ -122,17 +180,20 @@ image::images/tutorial-discover-3.png[]
=== Data Visualization: Beyond Discovery
The visualization tools available on the *Visualize* tab enable you to display aspects of your data sets in several
different ways. Visualizations depend on Elasticsearch {ref}/search-aggregations.html[aggregations] in two different
types: _bucket_ aggregations and _metric_ aggregations. A bucket aggregation sorts your data according to criteria you
specify. For example, in our accounts data set, we can establish a range of account balances, then display what
proportions of the total fall into which range of balances.
different ways.
Click on the *Visualize* tab to start:
image::images/tutorial-visualize.png[]
Click on *Pie chart*, then *From a new search*. Select the `ba*` index pattern. The whole pie displays, since we
haven't specified any buckets yet.
Click on *Pie chart*, then *From a new search*. Select the `ba*` index pattern.
Visualizations rely on two types of Elasticsearch {ref}/search-aggregations.html[aggregations]: _bucket_ aggregations
and _metric_ aggregations. A bucket aggregation sorts your data according to criteria you specify. For example, in our
accounts data set, we can establish a set of account balance ranges, then display the proportion of accounts that
falls into each range.
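As an illustration of a bucket aggregation, the sketch below prints a request body with a `range` aggregation on the `balance` field. The aggregation name `balance_ranges`, the helper name `balance_ranges_query`, and the range boundaries are assumptions chosen for the example:

```shell
# Print a request body that buckets accounts by balance range.
# The boundaries are illustrative; "size": 0 suppresses the matching
# documents so that only the bucket counts come back.
balance_ranges_query() {
  cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "balance_ranges": {
      "range": {
        "field": "balance",
        "ranges": [
          { "from": 0, "to": 10000 },
          { "from": 10000, "to": 30000 },
          { "from": 30000 }
        ]
      }
    }
  }
}
EOF
}

balance_ranges_query

# To run it against a local node (assumed to listen on localhost:9200):
#   balance_ranges_query | curl -XPOST 'localhost:9200/bank/_search?pretty' \
#     -H 'Content-Type: application/json' --data-binary @-
```

Each entry in `ranges` becomes one bucket, and the response reports a document count per bucket.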
The whole pie displays, since we haven't specified any buckets yet.
image::images/tutorial-visualize-pie-1.png[]
@ -165,23 +226,67 @@ Save this chart by clicking the *Save Visualization* button to the right of the
_Pie Example_.
Next, we're going to make a bar chart. Click on *New Visualization*, then *Vertical bar chart*. Select *From a new
search* and the `ba*` index pattern, just as you did for the pie chart. You'll see a single big bar, since we haven't
defined any buckets yet:
search* and the `shakes*` index pattern. You'll see a single big bar, since we haven't defined any buckets yet:
image::images/tutorial-visualize-bar-1.png[]
For the Y-axis metrics aggregation, select *Average*, with *age* as the field. For the X-Axis buckets, select the
*Range* aggregation and define the same ranges as you did for the pie chart.
For the Y-axis metrics aggregation, select *Unique Count*, with *speaker* as the field. For Shakespeare plays, it might
be useful to know which plays have the lowest number of distinct speaking parts, if your theater company is short on
actors. For the X-Axis buckets, select the *Terms* aggregation with the *play_name* field. For the *Order*, select
*Bottom*, leaving the *Size* at 5.
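The chart settings above can be approximated with a `terms` aggregation on `play_name` that is ordered, ascending, by a `cardinality` sub-aggregation on `speaker`. The sketch below prints such a request body; the names `plays`, `distinct_speakers`, and `smallest_casts_query` are hypothetical, and the body is not necessarily identical to the request Kibana generates:

```shell
# Print a request body for the five plays with the fewest distinct speakers.
# "order" refers to the sub-aggregation by name, and "asc" corresponds
# to the Bottom ordering in the chart settings.
smallest_casts_query() {
  cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "plays": {
      "terms": {
        "field": "play_name",
        "size": 5,
        "order": { "distinct_speakers": "asc" }
      },
      "aggs": {
        "distinct_speakers": {
          "cardinality": { "field": "speaker" }
        }
      }
    }
  }
}
EOF
}

smallest_casts_query

# To run it against a local node (assumed to listen on localhost:9200):
#   smallest_casts_query | curl -XPOST 'localhost:9200/shakespeare/_search?pretty' \
#     -H 'Content-Type: application/json' --data-binary @-
```

Note that `cardinality` returns an approximate count, which is usually close enough for a chart like this one.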
Now, click *Add sub-buckets* and *Split Bars* to refine our data. In addition to listing the average age of the
accounts in each balance range, we're going to split the bars by the top five states with the highest average ages.
Select *Terms* as the sub-aggregation, with *state* as the field. Leave the other elements at their default values and
click the green *Apply changes* button. Your chart should now look like this:
Leave the other elements at their default values and click the green *Apply changes* button. Your chart should now look
like this:
image::images/tutorial-visualize-bar-2.png[]
Notice how the individual play names show up as whole phrases, instead of being broken down into individual words. This
is the result of the mapping we did at the beginning of the tutorial, when we marked the *play_name* field as
`not_analyzed`.
Hovering on each bar shows you the number of speaking parts for each play as a tooltip. You can turn this behavior off,
as well as change many other options for your visualizations, by clicking the *Options* tab in the top left.
Now that you have a list of the smallest casts for Shakespeare plays, you might also be curious to see which of these
plays makes the greatest demands on an individual actor by showing the maximum number of speeches for a given part. Add
a Y-axis aggregation with the *Add metrics* button, then choose the *Max* aggregation for the *speech_number* field. In
the *Options* tab, change the *Bar Mode* drop-down to *grouped*, then click the green *Apply changes* button. Your
chart should now look like this:
image::images/tutorial-visualize-bar-3.png[]
As you can see, _Love's Labours Lost_ has an unusually high maximum speech number, compared to the other plays, and
might therefore make more demands on an actor's memory.
Save this chart with the name _Bar Example_.
Next, we're going to make a tile map chart to visualize some geographic data. Click on *New Visualization*, then
*Tile map*. Select *From a new search* and the `logstash-*` index pattern. Define the time window for the events we're
exploring by clicking the time selector at the top right of the Kibana interface. Click on *Absolute*, then set the
end time for the range to May 20, 2015 and the start time to May 18, 2015:
image::images/tutorial-timepicker.png[]
Once you've got the time range set up, click the *Go* button, then close the time picker by clicking the small up arrow
at the bottom. You'll see a map of the world, since we haven't defined any buckets yet:
image::images/tutorial-visualize-map-1.png[]
Select *Geo Coordinates* as the bucket, then click the green *Apply changes* button. Your chart should now look like
this:
image::images/tutorial-visualize-map-2.png[]
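The Geo Coordinates bucket corresponds to a geohash grid aggregation on a `geo_point` field. As a sketch, the function below prints a request body for that aggregation on the `geo.coordinates` field. The aggregation name `grid`, the helper name `geohash_grid_query`, and the precision of 3 are assumptions for the example:

```shell
# Print a request body that groups log events into geohash cells.
# Higher precision values produce smaller cells and more buckets.
geohash_grid_query() {
  cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "grid": {
      "geohash_grid": {
        "field": "geo.coordinates",
        "precision": 3
      }
    }
  }
}
EOF
}

geohash_grid_query

# To run it against a local node (assumed to listen on localhost:9200):
#   geohash_grid_query | curl -XPOST 'localhost:9200/logstash-*/_search?pretty' \
#     -H 'Content-Type: application/json' --data-binary @-
```

Each bucket in the response is keyed by a geohash string and carries the number of events that fall inside that cell.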
You can navigate the map by clicking and dragging, zoom with the *+/-* buttons, or hit the *Fit Data Bounds* button to
zoom to the lowest level that includes all the points. You can also create a filter to define a rectangle on the map,
either to include or exclude, by clicking the *Latitude/Longitude Filter* button and drawing a bounding box on the map.
A green oval with the filter definition displays right under the query box:
image::images/tutorial-visualize-map-3.png[]
Hover on the filter to display the controls to toggle, pin, invert, or delete the filter. Save this chart with the name
_Map Example_.
Finally, we're going to define a sample Markdown widget to display on our dashboard. Click on *New Visualization*, then
*Markdown widget*, to display a very simple Markdown entry field:
@ -206,10 +311,10 @@ Save this visualization with the name _Markdown Example_.
A Kibana dashboard is a collection of visualizations that you can arrange and share. To get started, click the
*Dashboard* tab, then the *Add Visualization* button at the far right of the search box to display the list of saved
visualizations. Select _Markdown Example_, _Pie Example_, and _Bar Example_, then close the list of visualizations by
clicking the small up-arrow at the bottom of the list. You can move the containers for each visualization by
clicking and dragging the title bar. Resize the containers by dragging the lower right corner of a visualization's
container. Your sample dashboard should end up looking roughly like this:
visualizations. Select _Markdown Example_, _Pie Example_, _Bar Example_, and _Map Example_, then close the list of
visualizations by clicking the small up-arrow at the bottom of the list. You can move the containers for each
visualization by clicking and dragging the title bar. Resize the containers by dragging the lower right corner of a
visualization's container. Your sample dashboard should end up looking roughly like this:
image::images/tutorial-dashboard.png[]

View file

@ -30,10 +30,19 @@ The availability of these options varies depending on the aggregation you choose
Select the *Options* tab to change the following aspects of the chart:
*Y-Axis Scale*:: You can select *linear*, *log*, or *square root* scales for the chart's Y axis.
*Smooth Lines*:: Check this box to curve the line from point to point.
*Y-Axis Scale*:: You can select *linear*, *log*, or *square root* scales for the chart's Y axis. You can use a log
scale to display data that varies exponentially, such as a compounding interest chart, or a square root scale to
regularize the display of data sets with variabilities that are themselves highly variable. This kind of data, where
the variability is itself variable over the domain being examined, is known as _heteroscedastic_ data. For example, if
a data set of height versus weight has a relatively narrow range of variability at the short end of height, but a wider
range at the taller end, the data set is heteroscedastic.
*Smooth Lines*:: Check this box to curve the line from point to point. Bear in mind that smoothed lines necessarily
affect the representation of your data and create a potential for ambiguity.
*Show Connecting Lines*:: Check this box to draw lines between the points on the chart.
*Show Circles*:: Check this box to draw each data point on the chart as a small circle.
*Current time marker*:: For charts of time-series data, check this box to draw a red line at the current time.
*Set Y-Axis Extents*:: Check this box and enter values in the *y-max* and *y-min* fields to set the Y axis to specific
values.
*Show Tooltip*:: Check this box to enable the display of tooltips.
*Show Legend*:: Check this box to enable the display of a legend next to the chart.
*Scale Y-Axis to Data Bounds*:: The default Y-axis bounds are zero and the maximum value returned in the data. Check

View file

@ -30,6 +30,9 @@ for the {ref}/search-aggregations-bucket-geohashgrid-aggregation.html#_cell_dime
aggregation for details on the area specified by each precision level. As of the 4.1 release, Kibana supports a maximum
geohash length of 7.
NOTE: Higher precisions increase memory usage for the browser displaying Kibana as well as for the underlying
Elasticsearch cluster.
Once you've specified a buckets aggregation, you can define sub-aggregations to refine the visualization. Tile maps
only support sub-aggregations as split charts. Click *+ Add Sub Aggregation*, then *Split Chart* to select a
sub-aggregation from the list of types:
@ -63,6 +66,8 @@ add another filter.
*Geohash*:: The {ref}/search-aggregations-bucket-geohashgrid-aggregation.html[_geohash_] aggregation displays points
based on the geohash coordinates.
NOTE: By default, the *Change precision on map zoom* box is checked. Uncheck the box to disable this behavior.
You can click the *Advanced* link to display more customization options for your metrics or bucket aggregation:
*Exclude Pattern*:: Specify a pattern in this field to exclude from the results.
@ -82,10 +87,21 @@ The availability of these options varies depending on the aggregation you choose
Select the *Options* tab to change the following aspects of the chart:
*Shaded Circle Markers*:: Displays the markers with different shades based on the metric aggregation's value.
*Scaled Circle Markers*:: Scale the size of the markers based on the metric aggregation's value.
*Shaded Geohash Grid*:: Displays the rectangular cells of the geohash grid instead of circular markers, with different
*Map type*:: Select one of the following options from the drop-down.
*_Scaled Circle Markers_*:: Scale the size of the markers based on the metric aggregation's value.
*_Shaded Circle Markers_*:: Displays the markers with different shades based on the metric aggregation's value.
*_Shaded Geohash Grid_*:: Displays the rectangular cells of the geohash grid instead of circular markers, with different
shades based on the metric aggregation's value.
*_Heatmap_*:: A heat map applies blurring to the circle markers and applies shading based on the amount of overlap.
Heatmaps have the following options:
* *Radius*: Sets the size of the individual heatmap dots.
* *Blur*: Sets the amount of blurring for the heatmap dots.
* *Maximum zoom*: Tilemaps in Kibana support 18 zoom levels. This slider defines the maximum zoom level at which the
heatmap dots appear at full intensity.
* *Minimum opacity*: Sets the opacity cutoff for the dots.
* *Show Tooltip*: Check this box to display a tooltip with the values for a dot when the cursor hovers over that dot.
*Desaturate map tiles*:: Desaturate the map's color in order to make the markers stand out more clearly.
After changing options, click the green *Apply changes* button to update your visualization, or the grey *Discard
@ -94,7 +110,10 @@ changes* button to keep your visualization in its current state.
[float]
[[navigating-map]]
==== Navigating the Map
Once your tilemap visualization is ready, you can explore the map in several ways. Click and hold anywhere on the map
and move the cursor to move the map center. Hold Shift and drag a bounding box across the map to zoom in on the
selection. Click the *Fit Data Bounds* button to automatically crop the map boundaries to the geohash buckets that have
at least one result.
Once your tilemap visualization is ready, you can explore the map in several ways:
* Click and hold anywhere on the map and move the cursor to move the map center. Hold Shift and drag a bounding box
across the map to zoom in on the selection.
* Click the *Fit Data Bounds* button to automatically crop the map boundaries to the geohash buckets that have at least
one result.
* Click the *Latitude/Longitude Filter* button, then drag a bounding box across the map, to create a filter for the box
coordinates.

View file

@ -91,7 +91,7 @@ Use the aggregation builder on the left of the page to configure the {ref}/searc
visualization. Buckets are analogous to SQL `GROUP BY` statements. For more information on aggregations, see the main
{ref}/search-aggregations.html[Elasticsearch aggregations reference].
Bar or line chart visualizations use _metrics_ for the y-axis and _buckets_ are used for the x-axis, segment bar
Bar, line, or area chart visualizations use _metrics_ for the y-axis and _buckets_ for the x-axis, segment bar
colors, and row/column splits. For pie charts, use the metric for the slice size and the bucket for the number of
slices.
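To make the bucket and metric roles concrete, the sketch below groups a few hypothetical rows by state (the bucket) and computes the average balance per state (the metric), much as a SQL `GROUP BY` with `AVG` would. The helper name `bucket_average` and the sample rows are made up for the example:

```shell
# Read "state balance" rows on stdin and print the average balance per state.
# The state plays the role of the bucket; the average is the metric.
bucket_average() {
  awk '{ sum[$1] += $2; count[$1]++ }
       END { for (s in sum) printf "%s %.1f\n", s, sum[s] / count[s] }' |
    sort
}

# Hypothetical sample rows.
printf '%s\n' 'IL 100' 'PA 300' 'IL 200' | bucket_average
```

In a bar chart built from these rows, each state would be one bar on the x-axis and the average balance would set the bar's height.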