[[tutorial-load-dataset]]
=== Loading sample data

This tutorial requires three data sets:

* The complete works of William Shakespeare, suitably parsed into fields
* A set of fictitious accounts with randomly generated data
* A set of randomly generated log files

Create a new working directory where you want to download the files. From that directory, run the following commands:

[source,shell]
curl -O https://download.elastic.co/demos/kibana/gettingstarted/7.x/shakespeare.json
curl -O https://download.elastic.co/demos/kibana/gettingstarted/7.x/accounts.zip
curl -O https://download.elastic.co/demos/kibana/gettingstarted/7.x/logs.jsonl.gz

Two of the data sets are compressed. To extract the files, use these commands:

[source,shell]
unzip accounts.zip
gunzip logs.jsonl.gz
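
To double-check the extraction, list the working directory; you should see the
three uncompressed files alongside the original archive (`accounts.json` is the
name the archive unpacks to, which the load commands later in this tutorial assume):

[source,shell]
ls
# expect: accounts.json  accounts.zip  logs.jsonl  shakespeare.json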

==== Structure of the data sets

The Shakespeare data set has this structure:

[source,json]
{
  "line_id": INT,
  "play_name": "String",
  "speech_number": INT,
  "line_number": "String",
  "speaker": "String",
  "text_entry": "String"
}

The accounts data set is structured as follows:

[source,json]
{
  "account_number": INT,
  "balance": INT,
  "firstname": "String",
  "lastname": "String",
  "age": INT,
  "gender": "M or F",
  "address": "String",
  "employer": "String",
  "email": "String",
  "city": "String",
  "state": "String"
}

The logs data set has dozens of different fields. Here are the notable fields for this tutorial:

[source,json]
{
  "memory": INT,
  "geo.coordinates": "geo_point",
  "@timestamp": "date"
}

==== Set up mappings

Before you load the Shakespeare and logs data sets, you must set up
{ref}/mapping.html[_mappings_] for the fields. Mappings divide the documents in
the index into logical groups and specify the characteristics of the fields.
These characteristics include the searchability of the field and whether it's
_tokenized_, or broken up into separate words.

NOTE: If security is enabled, you must have the `all` Kibana privilege to run this
tutorial. You must also have the `create`, `manage`, `read`, `write`, and `delete`
index privileges. See {xpack-ref}/security-privileges.html[Security Privileges]
for more information.

In Kibana *Dev Tools > Console*, set up a mapping for the Shakespeare data set:

[source,js]
PUT /shakespeare
{
  "mappings": {
    "properties": {
      "speaker": {"type": "keyword"},
      "play_name": {"type": "keyword"},
      "line_id": {"type": "integer"},
      "speech_number": {"type": "integer"}
    }
  }
}

//CONSOLE

This mapping specifies field characteristics for the data set:

* The `speaker` and `play_name` fields are keyword fields. These fields are not analyzed.
The strings are treated as a single unit even if they contain multiple words.
* The `line_id` and `speech_number` fields are integers.

The logs data set requires a mapping to label the latitude and longitude pairs
as geographic locations by applying the `geo_point` type. Because the logs span
three days, the same mapping must be applied to each of the three daily indices:

[source,js]
PUT /logstash-2015.05.18
{
  "mappings": {
    "properties": {
      "geo": {
        "properties": {
          "coordinates": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

//CONSOLE

[source,js]
PUT /logstash-2015.05.19
{
  "mappings": {
    "properties": {
      "geo": {
        "properties": {
          "coordinates": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

//CONSOLE

[source,js]
PUT /logstash-2015.05.20
{
  "mappings": {
    "properties": {
      "geo": {
        "properties": {
          "coordinates": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

//CONSOLE

The accounts data set doesn't require any mappings; Elasticsearch infers
suitable field types through dynamic mapping when the documents are indexed.

==== Load the data sets

At this point, you're ready to use the Elasticsearch {ref}/docs-bulk.html[bulk]
API to load the data sets:

[source,shell]
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/_bulk?pretty' --data-binary @accounts.json
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl

Or for Windows users, in PowerShell:

[source,shell]
Invoke-RestMethod "http://localhost:9200/bank/_bulk?pretty" -Method Post -ContentType 'application/x-ndjson' -InFile "accounts.json"
Invoke-RestMethod "http://localhost:9200/shakespeare/_bulk?pretty" -Method Post -ContentType 'application/x-ndjson' -InFile "shakespeare.json"
Invoke-RestMethod "http://localhost:9200/_bulk?pretty" -Method Post -ContentType 'application/x-ndjson' -InFile "logs.jsonl"

These commands might take some time to execute, depending on the available computing resources.

Verify successful loading:

[source,js]
GET /_cat/indices?v

//CONSOLE

Your output should look similar to this:

[source,shell]
health status index               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank                  1   1       1000            0    418.2kb        418.2kb
yellow open   shakespeare           1   1     111396            0     17.6mb         17.6mb
yellow open   logstash-2015.05.18   1   1       4631            0     15.6mb         15.6mb
yellow open   logstash-2015.05.19   1   1       4624            0     15.7mb         15.7mb
yellow open   logstash-2015.05.20   1   1       4750            0     16.4mb         16.4mb

The `yellow` health status is expected on a single-node setup: each index has one
replica configured, and there is no second node to host it.