[[tutorial-load-dataset]]
== Loading Sample Data

The tutorials in this section rely on the following data sets, which you can download from the links below or from the command line as shown after this list:

* The complete works of William Shakespeare, suitably parsed into fields. Download this data set by clicking here:
  https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json[shakespeare.json].
* A set of fictitious accounts with randomly generated data. Download this data set by clicking here:
  https://github.com/bly2k/files/blob/master/accounts.zip?raw=true[accounts.zip].
* A set of randomly generated log files. Download this data set by clicking here:
  https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz[logs.jsonl.gz].

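If you prefer to fetch the files from a terminal, one possible way is to download all three data sets with `curl` (a sketch, assuming `curl` is available and the URLs above are still reachable):

[source,shell]
curl -O https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json
curl -L -o accounts.zip "https://github.com/bly2k/files/blob/master/accounts.zip?raw=true"
curl -O https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz
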
Two of the data sets are compressed. Use the following commands to extract the files:

[source,shell]
unzip accounts.zip
gunzip logs.jsonl.gz

The Shakespeare data set is organized in the following schema:

[source,json]
{
    "line_id": INT,
    "play_name": "String",
    "speech_number": INT,
    "line_number": "String",
    "speaker": "String",
    "text_entry": "String"
}

The accounts data set is organized in the following schema:

[source,json]
{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}

The schema for the logs data set has dozens of different fields, but the notable ones used in this tutorial are:

[source,json]
{
    "memory": INT,
    "geo.coordinates": "geo_point",
    "@timestamp": "date"
}

Before we load the Shakespeare and logs data sets, we need to set up {es-ref}mapping.html[_mappings_] for the fields.
Mapping divides the documents in the index into logical groups and specifies a field's characteristics, such as the
field's searchability or whether it's _tokenized_, or broken up into separate words.

Use the following command to set up a mapping for the Shakespeare data set:

[source,shell]
curl -XPUT http://localhost:9200/shakespeare -d '
{
    "mappings": {
        "_default_": {
            "properties": {
                "speaker": { "type": "string", "index": "not_analyzed" },
                "play_name": { "type": "string", "index": "not_analyzed" },
                "line_id": { "type": "integer" },
                "speech_number": { "type": "integer" }
            }
        }
    }
}
';

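If you want to confirm that the mapping took effect, one way is to retrieve it with Elasticsearch's get-mapping API:

[source,shell]
curl -XGET 'http://localhost:9200/shakespeare/_mapping?pretty'
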
This mapping specifies the following qualities for the data set:

* The _speaker_ field is a string that isn't analyzed. The string in this field is treated as a single unit, even if
  there are multiple words in the field (see the example query after this list).
* The same applies to the _play_name_ field.
* The _line_id_ and _speech_number_ fields are integers.

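For example, because _speaker_ isn't analyzed, a term query has to match the stored value as a whole. Once the data is loaded later in this section, a query along these lines (a sketch; speaker names appear in uppercase in this data set, for example `HAMLET`) returns that character's lines, while a partial or lower-case value would not match:

[source,shell]
curl -XGET 'http://localhost:9200/shakespeare/_search?pretty' -d '
{
    "query": {
        "term": { "speaker": "HAMLET" }
    }
}'
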
The logs data set requires a mapping to label the latitude/longitude pairs in the logs as geographic locations by
applying the `geo_point` type to those fields.

Use the following commands to establish `geo_point` mapping for the logs:

[source,shell]
curl -XPUT http://localhost:9200/logstash-2015.05.18 -d '
{
    "mappings": {
        "log": {
            "properties": {
                "geo": {
                    "properties": {
                        "coordinates": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        }
    }
}
';

[source,shell]
curl -XPUT http://localhost:9200/logstash-2015.05.19 -d '
{
    "mappings": {
        "log": {
            "properties": {
                "geo": {
                    "properties": {
                        "coordinates": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        }
    }
}
';

[source,shell]
curl -XPUT http://localhost:9200/logstash-2015.05.20 -d '
{
    "mappings": {
        "log": {
            "properties": {
                "geo": {
                    "properties": {
                        "coordinates": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        }
    }
}
';

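The three commands above differ only in the index name, so if you prefer, a small shell loop produces the same three mappings (an equivalent sketch of the commands above):

[source,shell]
for day in 18 19 20; do
  curl -XPUT "http://localhost:9200/logstash-2015.05.$day" -d '
  {
      "mappings": {
          "log": {
              "properties": {
                  "geo": {
                      "properties": {
                          "coordinates": { "type": "geo_point" }
                      }
                  }
              }
          }
      }
  }';
done
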
The accounts data set doesn't require any mappings, so at this point we're ready to use the Elasticsearch
{es-ref}docs-bulk.html[`bulk`] API to load the data sets with the following commands:

[source,shell]
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl

These commands may take some time to execute, depending on the computing resources available.

Verify successful loading with the following command:

[source,shell]
curl 'localhost:9200/_cat/indices?v'

You should see output similar to the following:

[source,shell]
health status index               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank                  5   1       1000            0    418.2kb        418.2kb
yellow open   shakespeare           5   1     111396            0     17.6mb         17.6mb
yellow open   logstash-2015.05.18   5   1       4631            0     15.6mb         15.6mb
yellow open   logstash-2015.05.19   5   1       4624            0     15.7mb         15.7mb
yellow open   logstash-2015.05.20   5   1       4750            0     16.4mb         16.4mb