Migrated documentation into the main repo

commit 822043347e (parent b9558edeff)
316 changed files with 23987 additions and 0 deletions
.gitignore (vendored, 3 additions)

@@ -8,6 +8,8 @@ logs/
 build/
 target/
 .local-execution-hints.log
+docs/html/
+docs/build.log

 ## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects)
 ## The only configuration files which are not ignored are .settings since
@@ -19,6 +21,7 @@ target/
 */.project
 */.classpath
 */eclipse-build
+.settings/

 ## netbeans ignores
 nb-configuration.xml
docs/community/clients.asciidoc (new file, 152 lines)

== Clients

[float]
=== Perl

* http://github.com/clintongormley/ElasticSearch.pm[ElasticSearch.pm]:
  Perl client.

[float]
=== Python

* http://github.com/aparo/pyes[pyes]:
  Python client.

* http://github.com/rhec/pyelasticsearch[pyelasticsearch]:
  Python client.

* https://github.com/eriky/ESClient[ESClient]:
  A lightweight and easy to use Python client for ElasticSearch.

* https://github.com/humangeo/rawes[rawes]:
  Python low level client.

* https://github.com/mozilla/elasticutils/[elasticutils]:
  A friendly chainable ElasticSearch interface for Python.

* http://intridea.github.io/surfiki-refine-elasticsearch/[Surfiki Refine]:
  Python Map-Reduce engine targeting Elasticsearch indices.

[float]
=== Ruby

* http://github.com/karmi/tire[Tire]:
  Ruby API & DSL, with ActiveRecord/ActiveModel integration.

* http://github.com/grantr/rubberband[rubberband]:
  Ruby client.

* https://github.com/PoseBiz/stretcher[stretcher]:
  Ruby client.

* https://github.com/wireframe/elastic_searchable/[elastic_searchable]:
  Ruby client + Rails integration.

[float]
=== PHP

* http://github.com/ruflin/Elastica[Elastica]:
  PHP client.

* http://github.com/nervetattoo/elasticsearch[elasticsearch]:
  PHP client.

* http://github.com/polyfractal/Sherlock[Sherlock]:
  PHP client, one-to-one mapping with the query DSL, fluid interface.

[float]
=== Java

* https://github.com/searchbox-io/Jest[Jest]:
  Java REST client.

[float]
=== JavaScript

* https://github.com/fullscale/elastic.js[Elastic.js]:
  A JavaScript implementation of the ElasticSearch Query DSL and Core API.

* https://github.com/phillro/node-elasticsearch-client[node-elasticsearch-client]:
  A Node.js client for Elasticsearch.

* https://github.com/ramv/node-elastical[node-elastical]:
  Node.js client for the ElasticSearch REST API.

[float]
=== .NET

* https://github.com/Yegoroff/PlainElastic.Net[PlainElastic.Net]:
  .NET client.

* https://github.com/Mpdreamz/NEST[NEST]:
  .NET client.

* https://github.com/medcl/ElasticSearch.Net[ElasticSearch.NET]:
  .NET client.

[float]
=== Scala

* https://github.com/sksamuel/elastic4s[elastic4s]:
  Scala DSL.

* https://github.com/scalastuff/esclient[esclient]:
  Thin Scala client.

* https://github.com/bsadeh/scalastic[scalastic]:
  Scala client.

[float]
=== Clojure

* http://github.com/clojurewerkz/elastisch[Elastisch]:
  Clojure client.

[float]
=== Go

* https://github.com/mattbaird/elastigo[elastigo]:
  Go client.

* https://github.com/belogik/goes[goes]:
  Go lib.

[float]
=== Erlang

* http://github.com/tsloughter/erlastic_search[erlastic_search]:
  Erlang client using HTTP.

* https://github.com/dieswaytoofast/erlasticsearch[erlasticsearch]:
  Erlang client using Thrift.

* https://github.com/datahogs/tirexs[Tirexs]:
  An https://github.com/elixir-lang/elixir[Elixir]-based API/DSL, inspired by
  http://github.com/karmi/tire[Tire]. Ready to use in a pure Erlang
  environment.

[float]
=== EventMachine

* http://github.com/vangberg/em-elasticsearch[em-elasticsearch]:
  elasticsearch library for EventMachine.

[float]
=== Command Line

* https://github.com/elasticsearch/es2unix[es2unix]:
  Elasticsearch API consumable by the Linux command line.

* https://github.com/javanna/elasticshell[elasticshell]:
  Command line shell for elasticsearch.

[float]
=== OCaml

* https://github.com/tovbinm/ocaml-elasticsearch[ocaml-elasticsearch]:
  OCaml client for Elasticsearch.

[float]
=== Smalltalk

* http://ss3.gemstone.com/ss/Elasticsearch.html[Elasticsearch] -
  Smalltalk client for Elasticsearch.
docs/community/frontends.asciidoc (new file, 16 lines)

== Front Ends

* https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo[Sense]:
  A Chrome curl-like plugin for running requests against an Elasticsearch node.

* https://github.com/mobz/elasticsearch-head[elasticsearch-head]:
  A web front end for an Elasticsearch cluster.

* https://github.com/OlegKunitsyn/elasticsearch-browser[browser]:
  Web front-end over Elasticsearch data.

* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor]:
  Front-end to help debug/diagnose queries and analyzers.

* http://elastichammer.exploringelasticsearch.com/[Hammer]:
  Web front-end for Elasticsearch.
docs/community/github.asciidoc (new file, 5 lines)

== GitHub

GitHub is a place where a lot of development around *elasticsearch* is
done. Here is a simple search for
https://github.com/search?q=elasticsearch&type=Repositories[repositories].
docs/community/index.asciidoc (new file, 15 lines)

= Community Supported Clients

include::clients.asciidoc[]

include::frontends.asciidoc[]

include::integrations.asciidoc[]

include::misc.asciidoc[]

include::monitoring.asciidoc[]

include::github.asciidoc[]
docs/community/integrations.asciidoc (new file, 71 lines)

== Integrations

* http://grails.org/plugin/elasticsearch[Grails]:
  ElasticSearch Grails plugin.

* https://github.com/carrot2/elasticsearch-carrot2[carrot2]:
  Results clustering with carrot2.

* https://github.com/angelf/escargot[escargot]:
  ElasticSearch connector for Rails (WIP).

* https://metacpan.org/module/Catalyst::Model::Search::ElasticSearch[Catalyst]:
  ElasticSearch and Catalyst integration.

* http://github.com/aparo/django-elasticsearch[django-elasticsearch]:
  Django ElasticSearch backend.

* http://github.com/Aconex/elasticflume[elasticflume]:
  http://github.com/cloudera/flume[Flume] sink implementation.

* http://code.google.com/p/terrastore/wiki/Search_Integration[Terrastore Search]:
  http://code.google.com/p/terrastore/[Terrastore] integration module with elasticsearch.

* https://github.com/infochimps/wonderdog[Wonderdog]:
  Hadoop bulk loader into elasticsearch.

* http://geeks.aretotally.in/play-framework-module-elastic-search-distributed-searching-with-json-http-rest-or-java[Play!Framework]:
  Integrate with Play! Framework applications.

* https://github.com/Exercise/FOQElasticaBundle[ElasticaBundle]:
  Symfony2 bundle wrapping Elastica.

* http://drupal.org/project/elasticsearch[Drupal]:
  Drupal ElasticSearch integration.

* https://github.com/refuge/couch_es[couch_es]:
  elasticsearch helper for CouchDB based products (Apache CouchDB, BigCouch & Refuge).

* https://github.com/sonian/elasticsearch-jetty[Jetty]:
  Jetty HTTP transport.

* https://github.com/dadoonet/spring-elasticsearch[Spring Elasticsearch]:
  Spring factory for Elasticsearch.

* https://camel.apache.org/elasticsearch.html[Apache Camel Integration]:
  An Apache Camel component to integrate elasticsearch.

* https://github.com/tlrx/elasticsearch-test[elasticsearch-test]:
  Elasticsearch Java annotations for unit testing with
  http://www.junit.org/[JUnit].

* http://searchbox-io.github.com/wp-elasticsearch/[Wp-ElasticSearch]:
  ElasticSearch WordPress plugin.

* https://github.com/OlegKunitsyn/eslogd[eslogd]:
  Linux daemon that replicates events to a central ElasticSearch server in real time.

* https://github.com/drewr/elasticsearch-clojure-repl[elasticsearch-clojure-repl]:
  Plugin that embeds nREPL for run-time introspective adventure! Also
  serves as an nREPL transport.

* http://haystacksearch.org/[Haystack]:
  Modular search for Django.

* https://github.com/cleverage/play2-elasticsearch[play2-elasticsearch]:
  ElasticSearch module for Play Framework 2.x.

* https://github.com/fullscale/dangle[dangle]:
  A set of AngularJS directives that provide common visualizations for elasticsearch based on D3.
docs/community/misc.asciidoc (new file, 17 lines)

== Misc

* https://github.com/electrical/puppet-elasticsearch[Puppet]:
  Elasticsearch Puppet module.

* http://github.com/elasticsearch/cookbook-elasticsearch[Chef]:
  Chef cookbook for Elasticsearch.

* https://github.com/tavisto/elasticsearch-rpms[elasticsearch-rpms]:
  RPMs for elasticsearch.

* http://www.github.com/neogenix/daikon[daikon]:
  Daikon ElasticSearch CLI.

* https://github.com/Aconex/scrutineer[Scrutineer]:
  A high performance consistency checker to compare what you've indexed
  with your source of truth content (e.g. a DB).
docs/community/monitoring.asciidoc (new file, 27 lines)

== Health and Performance Monitoring

* https://github.com/lukas-vlcek/bigdesk[bigdesk]:
  Live charts and statistics for an elasticsearch cluster.

* https://github.com/karmi/elasticsearch-paramedic[paramedic]:
  Live charts with cluster stats and indices/shards information.

* http://www.elastichq.org/[ElasticSearchHQ]:
  Free cluster health monitoring tool.

* http://sematext.com/spm/index.html[SPM for ElasticSearch]:
  Performance monitoring with live charts showing cluster and node stats, integrated
  alerts, email reports, etc.

* https://github.com/radu-gheorghe/check-es[check-es]:
  Nagios/Shinken plugins for checking on elasticsearch.

* https://github.com/anchor/nagios-plugin-elasticsearch[check_elasticsearch]:
  An ElasticSearch availability and performance monitoring plugin for
  Nagios.

* https://github.com/rbramley/Opsview-elasticsearch[opsview-elasticsearch]:
  Opsview plugin written in Perl for monitoring ElasticSearch.

* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy]:
  Plugin to watch Lucene segment merges across your cluster.
docs/groovy-api/anatomy.asciidoc (new file, 99 lines)

[[anatomy]]
== API Anatomy

Once a <<client,GClient>> has been
obtained, all of the ElasticSearch APIs can be executed on it. Each Groovy
API is exposed using three different mechanisms.

[float]
=== Closure Request

The first type is to simply provide the request as a Closure, which
automatically gets resolved into the respective request instance (for
the index API, it is the `IndexRequest` class). The API returns a special
future, called `GActionFuture`. This is a groovier version of the
elasticsearch Java `ActionFuture` (in turn a nicer extension to Java's own
`Future`) which allows registering listeners (closures) on it for
success and failures, as well as blocking for the response. For example:

[source,js]
--------------------------------------------------
def indexR = client.index {
    index "test"
    type "type1"
    id "1"
    source {
        test = "value"
        complex {
            value1 = "value1"
            value2 = "value2"
        }
    }
}

println "Indexed $indexR.response.id into $indexR.response.index/$indexR.response.type"
--------------------------------------------------

In the above example, calling `indexR.response` will simply block for
the response. We can also block for the response for a specific timeout:

[source,js]
--------------------------------------------------
IndexResponse response = indexR.response "5s" // block for 5 seconds, same as:
response = indexR.response 5, TimeValue.SECONDS //
--------------------------------------------------

We can also register closures that will be called on success and on
failure:

[source,js]
--------------------------------------------------
indexR.success = {IndexResponse response ->
    println "Indexed $response.id into $response.index/$response.type"
}
indexR.failure = {Throwable t ->
    println "Failed to index: $t.message"
}
--------------------------------------------------

[float]
=== Request

This option allows passing the actual instance of the request (instead
of a closure) as a parameter. The rest is similar to the closure-as-a-parameter
option (the `GActionFuture` handling). For example:

[source,js]
--------------------------------------------------
def indexR = client.index (new IndexRequest(
        index: "test",
        type: "type1",
        id: "1",
        source: {
            test = "value"
            complex {
                value1 = "value1"
                value2 = "value2"
            }
        }))

println "Indexed $indexR.response.id into $indexR.response.index/$indexR.response.type"
--------------------------------------------------

[float]
=== Java Like

The last option is to provide an actual instance of the API request, and
an `ActionListener` for the callback. This is exactly like the Java API,
with the added `gexecute` which returns the `GActionFuture`:

[source,js]
--------------------------------------------------
def indexR = node.client.prepareIndex("test", "type1", "1").setSource({
    test = "value"
    complex {
        value1 = "value1"
        value2 = "value2"
    }
}).gexecute()
--------------------------------------------------
docs/groovy-api/client.asciidoc (new file, 58 lines)

[[client]]
== Client

Obtaining an elasticsearch Groovy `GClient` (a `GClient` is a simple
wrapper on top of the Java `Client`) is simple. The most common way to
get a client is by starting an embedded `Node` which acts as a node
within the cluster.

[float]
=== Node Client

A Node based client is the simplest way to get a `GClient` to start
executing operations against elasticsearch.

[source,js]
--------------------------------------------------
import org.elasticsearch.groovy.client.GClient
import org.elasticsearch.groovy.node.GNode
import static org.elasticsearch.groovy.node.GNodeBuilder.nodeBuilder

// on startup

GNode node = nodeBuilder().node();
GClient client = node.client();

// on shutdown

node.close();
--------------------------------------------------

Since elasticsearch allows configuration through JSON-based settings,
the configuration itself can be done using a closure that represents the
JSON:

[source,js]
--------------------------------------------------
import org.elasticsearch.groovy.node.GNode
import org.elasticsearch.groovy.node.GNodeBuilder
import static org.elasticsearch.groovy.node.GNodeBuilder.*

// on startup

GNodeBuilder nodeBuilder = nodeBuilder();
nodeBuilder.settings {
    node {
        client = true
    }
    cluster {
        name = "test"
    }
}

GNode node = nodeBuilder.node()

// on shutdown

node.stop().close()
--------------------------------------------------
docs/groovy-api/count.asciidoc (new file, 22 lines)

[[count]]
== Count API

The count API is very similar to the
link:{java}/count.html[Java count API]. The Groovy
extension allows providing the query to execute as a `Closure` (similar
to the GORM criteria builder):

[source,js]
--------------------------------------------------
def count = client.count {
    indices "test"
    types "type1"
    query {
        term {
            test = "value"
        }
    }
}
--------------------------------------------------

The query follows the same link:{ref}/query-dsl.html[Query DSL].
docs/groovy-api/delete.asciidoc (new file, 15 lines)

[[delete]]
== Delete API

The delete API is very similar to the
link:{java}/delete.html[Java delete API]. Here is an
example:

[source,js]
--------------------------------------------------
def deleteF = node.client.delete {
    index "test"
    type "type1"
    id "1"
}
--------------------------------------------------
docs/groovy-api/get.asciidoc (new file, 18 lines)

[[get]]
== Get API

The get API is very similar to the
link:{java}/get.html[Java get API]. The main benefit
of using Groovy is handling the source content. It can be automatically
converted to a `Map`, which means using Groovy to navigate it is simple:

[source,js]
--------------------------------------------------
def getF = node.client.get {
    index "test"
    type "type1"
    id "1"
}

println "Result of field2: $getF.response.source.complex.field2"
--------------------------------------------------
docs/groovy-api/index.asciidoc (new file, 50 lines)

= Groovy API
:ref: http://www.elasticsearch.org/guide/elasticsearch/reference/current
:java: http://www.elasticsearch.org/guide/elasticsearch/client/java-api/current

[preface]
== Preface

This section describes the http://groovy.codehaus.org/[Groovy] API
elasticsearch provides. All elasticsearch APIs are executed using a
<<client,GClient>>, and are completely
asynchronous in nature (they either accept a listener, or return a
future).

The Groovy API is a wrapper on top of the
link:{java}[Java API], exposing it in a groovier
manner. The execution options for each API follow a similar manner and
are covered in <<anatomy>>.

[float]
==== Maven Repository

The Groovy API is hosted on
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22elasticsearch-client-groovy%22[Maven
Central].

For example, you can define the latest version in your `pom.xml` file:

[source,xml]
--------------------------------------------------
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-client-groovy</artifactId>
    <version>${es.version}</version>
</dependency>
--------------------------------------------------

include::anatomy.asciidoc[]

include::client.asciidoc[]

include::index_.asciidoc[]

include::get.asciidoc[]

include::delete.asciidoc[]

include::search.asciidoc[]

include::count.asciidoc[]
docs/groovy-api/index_.asciidoc (new file, 31 lines)

[[index_]]
== Index API

The index API is very similar to the
link:{java}/index_.html[Java index API]. The Groovy
extension to it is the ability to provide the indexed source using a
closure. For example:

[source,js]
--------------------------------------------------
def indexR = client.index {
    index "test"
    type "type1"
    id "1"
    source {
        test = "value"
        complex {
            value1 = "value1"
            value2 = "value2"
        }
    }
}
--------------------------------------------------

In the above example, the source closure itself gets transformed into an
XContent (defaults to JSON). In order to change how the source closure
is serialized, a global (static) setting can be set on the `GClient` by
changing the `indexContentType` field.

Note also that the `source` can be set using the typical Java based
APIs; the `Closure` option is a Groovy extension.
docs/groovy-api/search.asciidoc (new file, 114 lines)

[[search]]
== Search API

The search API is very similar to the
link:{java}/search.html[Java search API]. The Groovy
extension allows providing the search source to execute as a `Closure`,
including the query itself (similar to the GORM criteria builder):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            term(test: "value")
        }
    }
}

search.response.hits.each {SearchHit hit ->
    println "Got hit $hit.id from $hit.index/$hit.type"
}
--------------------------------------------------

It can also be executed using the "Java API" while still using a closure
for the query:

[source,js]
--------------------------------------------------
def search = node.client.prepareSearch("test").setQuery({
        term(test: "value")
}).gexecute();

search.response.hits.each {SearchHit hit ->
    println "Got hit $hit.id from $hit.index/$hit.type"
}
--------------------------------------------------

The format of the search `Closure` follows the same JSON syntax as the
link:{ref}/search-search.html[Search API] request.

[float]
=== More examples

Term query where multiple values are provided (see
link:{ref}/query-dsl-terms-query.html[terms]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            terms(test: ["value1", "value2"])
        }
    }
}
--------------------------------------------------

Query string (see
link:{ref}/query-dsl-query-string-query.html[query string]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            query_string(
                fields: ["test"],
                query: "value1 value2")
        }
    }
}
--------------------------------------------------

Pagination (see
link:{ref}/search-request-from-size.html[from/size]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        from = 0
        size = 10
        query {
            term(test: "value")
        }
    }
}
--------------------------------------------------

Sorting (see link:{ref}/search-request-sort.html[sort]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            term(test: "value")
        }
        sort = [
            date : [ order: "desc"]
        ]
    }
}
--------------------------------------------------
docs/java-api/bulk.asciidoc (new file, 38 lines)

[[bulk]]
== Bulk API

The bulk API allows one to index and delete several documents in a
single request. Here is a sample usage:

[source,java]
--------------------------------------------------
import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elastic Search")
                    .endObject()
                  )
        );

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "another post")
                    .endObject()
                  )
        );

BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
--------------------------------------------------
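The comment in the example above only hints at what to do with failures. Below
is a minimal, hedged sketch of walking the individual bulk items; it assumes
that in this version `BulkResponse` is iterable and that `BulkItemResponse`
exposes `isFailed()`, `getId()` and `getFailureMessage()` accessors.

[source,java]
--------------------------------------------------
// Hedged sketch: assumes BulkResponse is Iterable<BulkItemResponse> and that
// isFailed()/getId()/getFailureMessage() exist on BulkItemResponse in this release.
import org.elasticsearch.action.bulk.BulkItemResponse;

for (BulkItemResponse item : bulkResponse) {
    if (item.isFailed()) {
        // log (or retry) the individual operation that failed
        System.err.println("Bulk item " + item.getId() + " failed: " + item.getFailureMessage());
    }
}
--------------------------------------------------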
docs/java-api/client.asciidoc (new file, 185 lines)

[[client]]
== Client

You can use the *Java client* in multiple ways:

* Perform standard <<index_,index>>, <<get,get>>,
  <<delete,delete>> and <<search,search>> operations on an
  existing cluster
* Perform administrative tasks on a running cluster
* Start full nodes when you want to run Elasticsearch embedded in your
  own application or when you want to launch unit or integration tests

Obtaining an elasticsearch `Client` is simple. The most common way to
get a client is by:

1. creating an embedded link:#nodeclient[`Node`] that acts as a node
   within a cluster
2. requesting a `Client` from your embedded `Node`.

Another manner is by creating a link:#transportclient[`TransportClient`]
that connects to a cluster.

*Important:*

______________________________________________________________________________________________________________________________________________________________
Please note that you are encouraged to use the same version on the client
and cluster sides. You may hit some incompatibility issues when mixing
major versions.
______________________________________________________________________________________________________________________________________________________________

[float]
=== Node Client

Instantiating a node based client is the simplest way to get a `Client`
that can execute operations against elasticsearch.

[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;

// on startup

Node node = nodeBuilder().node();
Client client = node.client();

// on shutdown

node.close();
--------------------------------------------------

When you start a `Node`, it joins an elasticsearch cluster. You can have
different clusters by simply setting the `cluster.name` setting, or
explicitly using the `clusterName` method on the builder.

You can define `cluster.name` in the `/src/main/resources/elasticsearch.yml`
file in your project. As long as `elasticsearch.yml` is present on the
classpath, it will be used when you start your node.

[source,java]
--------------------------------------------------
cluster.name=yourclustername
--------------------------------------------------

Or in Java:

[source,java]
--------------------------------------------------
Node node = nodeBuilder().clusterName("yourclustername").node();
Client client = node.client();
--------------------------------------------------

The benefit of using the `Client` is the fact that operations are
automatically routed to the node(s) the operations need to be executed
on, without performing a "double hop". For example, the index operation
will automatically be executed on the shard that it will end up existing
at.

When you start a `Node`, the most important decision is whether it
should hold data or not. In other words, should indices and shards be
allocated to it. Many times we would like to have the clients just be
clients, without shards being allocated to them. This is simple to
configure by setting either the `node.data` setting to `false` or
`node.client` to `true` (or the respective `NodeBuilder` helper methods):

[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;

// on startup

Node node = nodeBuilder().client(true).node();
Client client = node.client();

// on shutdown

node.close();
--------------------------------------------------

Another common usage is to start the `Node` and use the `Client` in
unit/integration tests. In such a case, we would like to start a "local"
`Node` (with a "local" discovery and transport). Again, this is just a
matter of a simple setting when starting the `Node`. Note, "local" here
means local on the JVM (well, actually class loader) level, meaning that
two *local* servers started within the same JVM will discover themselves
and form a cluster.

[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;

// on startup

Node node = nodeBuilder().local(true).node();
Client client = node.client();

// on shutdown

node.close();
--------------------------------------------------

[float]
=== Transport Client

The `TransportClient` connects remotely to an elasticsearch cluster
using the transport module. It does not join the cluster, but simply
gets one or more initial transport addresses and communicates with them
in round robin fashion on each action (though most actions will probably
be "two hop" operations).

[source,java]
--------------------------------------------------
// on startup

Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));

// on shutdown

client.close();
--------------------------------------------------

Note that you have to set the cluster name if you use one different from
"elasticsearch":

[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
//Add transport addresses and do something with the client...
--------------------------------------------------

Or use the `elasticsearch.yml` file as shown in the link:#nodeclient[Node
Client section].

The client allows sniffing the rest of the cluster and adding those nodes into
its list of machines to use. In this case, note that the IP addresses
used will be the ones that the other nodes were started with (the
"publish" address). In order to enable it, set
`client.transport.sniff` to `true`:

[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);
--------------------------------------------------

Other transport client level settings include:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`client.transport.ignore_cluster_name` |Set to `true` to ignore cluster
name validation of connected nodes. (since 0.19.4)

|`client.transport.ping_timeout` |The time to wait for a ping response
from a node. Defaults to `5s`.

|`client.transport.nodes_sampler_interval` |How often to sample / ping
the nodes listed and connected. Defaults to `5s`.
|=======================================================================
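As a usage sketch (not part of the original table), the keys above are applied
through the same `ImmutableSettings` builder used earlier on this page; the
timeout and sampler values below are arbitrary examples.

[source,java]
--------------------------------------------------
// Sketch only: combines the table's keys with the settings builder shown above.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName")
        .put("client.transport.sniff", true)
        .put("client.transport.ping_timeout", "10s")            // wait longer for ping responses
        .put("client.transport.nodes_sampler_interval", "30s")  // sample connected nodes less often
        .build();
TransportClient client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300));
--------------------------------------------------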
docs/java-api/count.asciidoc (new file, 38 lines)

[[count]]
== Count API

The count API allows you to easily execute a query and get the number of
matches for that query. It can be executed across one or more indices
and across one or more types. The query can be provided using the
link:{ref}/query-dsl.html[Query DSL].

[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.xcontent.FilterBuilders.*;
import static org.elasticsearch.index.query.xcontent.QueryBuilders.*;

CountResponse response = client.prepareCount("test")
        .setQuery(termQuery("_type", "type1"))
        .execute()
        .actionGet();
--------------------------------------------------

For more information on the count operation, check out the REST
link:{ref}/search-count.html[count] docs.

[float]
=== Operation Threading

The count API allows setting the threading model the operation will be
performed with when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).

There are three threading modes. The `NO_THREADS` mode means that the
count operation will be executed on the calling thread. The
`SINGLE_THREAD` mode means that the count operation will be executed on
a single different thread for all local shards. The `THREAD_PER_SHARD`
mode means that the count operation will be executed on a different
thread for each local shard.

The default mode is `SINGLE_THREAD`.
--------------------------------------------------
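The text above does not show how a mode is selected. The following is only an
illustrative sketch; it assumes the count request builder of this release
exposes a `setOperationThreading` method taking the `BroadcastOperationThreading`
enum that defines the three modes.

[source,java]
--------------------------------------------------
// Hedged sketch: assumes CountRequestBuilder#setOperationThreading(BroadcastOperationThreading)
// exists in this release and accepts the three modes described above.
import org.elasticsearch.action.support.broadcast.BroadcastOperationThreading;

CountResponse response = client.prepareCount("test")
        .setQuery(termQuery("_type", "type1"))
        .setOperationThreading(BroadcastOperationThreading.THREAD_PER_SHARD) // one thread per local shard
        .execute()
        .actionGet();
--------------------------------------------------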
docs/java-api/delete-by-query.asciidoc (new file, 21 lines)

[[delete-by-query]]
== Delete By Query API

The delete by query API allows deleting documents from one or more
indices and one or more types based on a <<query-dsl-queries,query>>. Here
is an example:

[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

DeleteByQueryResponse response = client.prepareDeleteByQuery("test")
        .setQuery(termQuery("_type", "type1"))
        .execute()
        .actionGet();
--------------------------------------------------

For more information on the delete by query operation, check out the
link:{ref}/docs-delete-by-query.html[delete_by_query API]
docs.
docs/java-api/delete.asciidoc (new file, 39 lines)

[[delete]]
== Delete API

The delete API allows deleting a typed JSON document from a specific
index based on its id. The following example deletes the JSON document
from an index called twitter, under a type called tweet, with id 1:

[source,java]
--------------------------------------------------
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .execute()
        .actionGet();
--------------------------------------------------

For more information on the delete operation, check out the
link:{ref}/docs-delete.html[delete API] docs.

[float]
=== Operation Threading

The delete API allows setting the threading model the operation will be
performed with when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).

The options are to execute the operation on a different thread, or to
execute it on the calling thread (note that the API is still async). By
default, `operationThreaded` is set to `true`, which means the operation
is executed on a different thread. Here is an example that sets it to
`false`:

[source,java]
--------------------------------------------------
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();
--------------------------------------------------
docs/java-api/facets.asciidoc (new file, 483 lines)

[[facets]]
== Facets

Elasticsearch provides a full Java API to play with facets. See the
link:{ref}/search-facets.html[Facets guide].

Use the factory for facet builders (`FacetBuilders`) to create each facet
you want to compute when querying, and add it to your search request:

[source,java]
--------------------------------------------------
SearchResponse sr = node.client().prepareSearch()
        .setQuery( /* your query */ )
        .addFacet( /* add a facet */ )
        .execute().actionGet();
--------------------------------------------------

Note that you can add more than one facet. See
link:{ref}/search-search.html[Search Java API] for details.

To build facet requests, use the `FacetBuilders` helpers. Just import them
in your class:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.FacetBuilders;
--------------------------------------------------
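For readers who want something more concrete than the placeholders above, here
is a small illustrative sketch combining a query with two facets that are
described later on this page; the index and field names (`products`, `brand`,
`price`) are made up for the example.

[source,java]
--------------------------------------------------
// Illustrative sketch only: index and field names are hypothetical; the
// builders used are the ones documented further down this page.
SearchResponse sr = node.client().prepareSearch("products")
        .setQuery(QueryBuilders.matchAllQuery())
        .addFacet(FacetBuilders.termsFacet("brands").field("brand").size(10))
        .addFacet(FacetBuilders.statisticalFacet("prices").field("price"))
        .execute().actionGet();
--------------------------------------------------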
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Facets
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Terms Facet
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-terms-facet.html[Terms Facet]
|
||||||
|
with Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example on how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.termsFacet("f")
|
||||||
|
.field("brand")
|
||||||
|
.size(10);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.terms.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
TermsFacet f = (TermsFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
f.getTotalCount(); // Total terms doc count
|
||||||
|
f.getOtherCount(); // Not shown terms doc count
|
||||||
|
f.getMissingCount(); // Without term doc count
|
||||||
|
|
||||||
|
// For each entry
|
||||||
|
for (TermsFacet.Entry entry : f) {
|
||||||
|
entry.getTerm(); // Term
|
||||||
|
entry.getCount(); // Doc count
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Range Facet
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-range-facet.html[Range Facet]
|
||||||
|
with Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example on how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.rangeFacet("f")
|
||||||
|
.field("price") // Field to compute on
|
||||||
|
.addUnboundedFrom(3) // from -infinity to 3 (excluded)
|
||||||
|
.addRange(3, 6) // from 3 to 6 (excluded)
|
||||||
|
.addUnboundedTo(6); // from 6 to +infinity
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.range.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
RangeFacet f = (RangeFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
// For each entry
|
||||||
|
for (RangeFacet.Entry entry : f) {
|
||||||
|
entry.getFrom(); // Range from requested
|
||||||
|
entry.getTo(); // Range to requested
|
||||||
|
entry.getCount(); // Doc count
|
||||||
|
entry.getMin(); // Min value
|
||||||
|
entry.getMax(); // Max value
|
||||||
|
entry.getMean(); // Mean
|
||||||
|
entry.getTotal(); // Sum of values
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Histogram Facet
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-histogram-facet.html[Histogram
|
||||||
|
Facet] with Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example on how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
HistogramFacetBuilder facet = FacetBuilders.histogramFacet("f")
|
||||||
|
.field("price")
|
||||||
|
.interval(1);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.histogram.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
HistogramFacet f = (HistogramFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
// For each entry
|
||||||
|
for (HistogramFacet.Entry entry : f) {
|
||||||
|
entry.getKey(); // Key (X-Axis)
|
||||||
|
entry.getCount(); // Doc count (Y-Axis)
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Date Histogram Facet
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-date-histogram-facet.html[Date
|
||||||
|
Histogram Facet] with Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example on how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.dateHistogramFacet("f")
|
||||||
|
.field("date") // Your date field
|
||||||
|
.interval("year"); // You can also use "quarter", "month", "week", "day",
|
||||||
|
// "hour" and "minute" or notation like "1.5h" or "2w"
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.datehistogram.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
DateHistogramFacet f = (DateHistogramFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
// For each entry
|
||||||
|
for (DateHistogramFacet.Entry entry : f) {
|
||||||
|
entry.getTime(); // Date in ms since epoch (X-Axis)
|
||||||
|
entry.getCount(); // Doc count (Y-Axis)
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Filter Facet (not facet filter)
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-filter-facet.html[Filter Facet]
|
||||||
|
with Java API.
|
||||||
|
|
||||||
|
If you are looking on how to apply a filter to a facet, have a look at
|
||||||
|
link:#facet-filter[facet filter] using Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example on how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.filterFacet("f",
|
||||||
|
FilterBuilders.termFilter("brand", "heineken")); // Your Filter here
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
See <<query-dsl-filters,Filters>> to
|
||||||
|
learn how to build filters using Java.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.filter.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
FilterFacet f = (FilterFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
f.getCount(); // Number of docs that matched
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Query Facet
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-query-facet.html[Query Facet]
|
||||||
|
with Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example on how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.queryFacet("f",
|
||||||
|
QueryBuilders.matchQuery("brand", "heineken"));
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.query.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
QueryFacet f = (QueryFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
f.getCount(); // Number of docs that matched
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
See <<query-dsl-queries,Queries>> to
|
||||||
|
learn how to build queries using Java.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Statistical
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-statistical-facet.html[Statistical
|
||||||
|
Facet] with Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example on how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.statisticalFacet("f")
|
||||||
|
.field("price");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.statistical.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
StatisticalFacet f = (StatisticalFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
f.getCount(); // Doc count
|
||||||
|
f.getMin(); // Min value
|
||||||
|
f.getMax(); // Max value
|
||||||
|
f.getMean(); // Mean
|
||||||
|
f.getTotal(); // Sum of values
|
||||||
|
f.getStdDeviation(); // Standard Deviation
|
||||||
|
f.getSumOfSquares(); // Sum of Squares
|
||||||
|
f.getVariance(); // Variance
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Terms Stats Facet
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-terms-stats-facet.html[Terms
|
||||||
|
Stats Facet] with the Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example of how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.termsStatsFacet("f")
|
||||||
|
.keyField("brand")
|
||||||
|
.valueField("price");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.termsstats.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
TermsStatsFacet f = (TermsStatsFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
f.getTotalCount(); // Total terms doc count
|
||||||
|
f.getOtherCount(); // Not shown terms doc count
|
||||||
|
f.getMissingCount(); // Without term doc count
|
||||||
|
|
||||||
|
// For each entry
|
||||||
|
for (TermsStatsFacet.Entry entry : f) {
|
||||||
|
entry.getTerm(); // Term
|
||||||
|
entry.getCount(); // Doc count
|
||||||
|
entry.getMin(); // Min value
|
||||||
|
entry.getMax(); // Max value
|
||||||
|
entry.getMean(); // Mean
|
||||||
|
entry.getTotal(); // Sum of values
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Geo Distance Facet
|
||||||
|
|
||||||
|
Here is how you can use
|
||||||
|
link:{ref}/search-facets-geo-distance-facet.html[Geo
|
||||||
|
Distance Facet] with the Java API.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Prepare facet request
|
||||||
|
|
||||||
|
Here is an example of how to create the facet request:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders.geoDistanceFacet("f")
|
||||||
|
.field("pin.location") // Field containing coordinates we want to compare with
|
||||||
|
.point(40, -70) // Point from where we start (0)
|
||||||
|
.addUnboundedFrom(10) // 0 to 10 km (excluded)
|
||||||
|
.addRange(10, 20) // 10 to 20 km (excluded)
|
||||||
|
.addRange(20, 100) // 20 to 100 km (excluded)
|
||||||
|
.addUnboundedTo(100) // from 100 km to infinity (and beyond ;-) )
|
||||||
|
.unit(DistanceUnit.KILOMETERS); // All distances are in kilometers. Can be MILES
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
===== Use facet response
|
||||||
|
|
||||||
|
Import Facet definition classes:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import org.elasticsearch.search.facet.geodistance.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// sr is here your SearchResponse object
|
||||||
|
GeoDistanceFacet f = (GeoDistanceFacet) sr.facets().facetsAsMap().get("f");
|
||||||
|
|
||||||
|
// For each entry
|
||||||
|
for (GeoDistanceFacet.Entry entry : f) {
|
||||||
|
entry.getFrom(); // Distance from requested
|
||||||
|
entry.getTo(); // Distance to requested
|
||||||
|
entry.getCount(); // Doc count
|
||||||
|
entry.getMin(); // Min value
|
||||||
|
entry.getMax(); // Max value
|
||||||
|
entry.getTotal(); // Sum of values
|
||||||
|
entry.getMean(); // Mean
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Facet filters (not Filter Facet)
|
||||||
|
|
||||||
|
By default, facets are applied to the query result set, whatever filters
|
||||||
|
exist.
|
||||||
|
|
||||||
|
If you need to compute facets with the same filters or even with other
|
||||||
|
filters, you can add the filter to any facet using
|
||||||
|
`AbstractFacetBuilder#facetFilter(FilterBuilder)` method:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FacetBuilders
|
||||||
|
.termsFacet("f").field("brand") // Your facet
|
||||||
|
.facetFilter( // Your filter here
|
||||||
|
FilterBuilders.termFilter("colour", "pale")
|
||||||
|
);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
For example, you can reuse the same filter you created for your query:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// A common filter
|
||||||
|
FilterBuilder filter = FilterBuilders.termFilter("colour", "pale");
|
||||||
|
|
||||||
|
TermsFacetBuilder facet = FacetBuilders.termsFacet("f")
|
||||||
|
.field("brand")
|
||||||
|
.facetFilter(filter); // We apply it to the facet
|
||||||
|
|
||||||
|
SearchResponse sr = node.client().prepareSearch()
|
||||||
|
.setQuery(QueryBuilders.matchAllQuery())
|
||||||
|
.setFilter(filter) // We apply it to the query
|
||||||
|
.addFacet(facet)
|
||||||
|
.execute().actionGet();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
See documentation on how to build
|
||||||
|
<<query-dsl-filters,Filters>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Scope
|
||||||
|
|
||||||
|
By default, facets are computed within the query resultset. But, you can
|
||||||
|
compute facets from all documents in the index whatever the query is,
|
||||||
|
using the `global` parameter:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
TermsFacetBuilder facet = FacetBuilders.termsFacet("f")
|
||||||
|
.field("brand")
|
||||||
|
.global(true);
|
||||||
|
--------------------------------------------------
|
38
docs/java-api/get.asciidoc
Normal file
|
@ -0,0 +1,38 @@
|
||||||
|
[[get]]
|
||||||
|
== Get API
|
||||||
|
|
||||||
|
The get API allows you to get a typed JSON document from the index based on
|
||||||
|
its id. The following example gets a JSON document from an index called
|
||||||
|
twitter, under a type called tweet, with id 1:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
GetResponse response = client.prepareGet("twitter", "tweet", "1")
|
||||||
|
.execute()
|
||||||
|
.actionGet();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
For more information on the get operation, check out the REST
|
||||||
|
link:{ref}/docs-get.html[get] docs.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Operation Threading
|
||||||
|
|
||||||
|
The get API allows you to set the threading model that the operation will
|
||||||
|
use when the actual execution of the API is performed on the same
|
||||||
|
node (the API is executed on a shard that is allocated on the same
|
||||||
|
server).
|
||||||
|
|
||||||
|
The options are to execute the operation on a different thread, or to
|
||||||
|
execute it on the calling thread (note that the API is still async). By
|
||||||
|
default, `operationThreaded` is set to `true` which means the operation
|
||||||
|
is executed on a different thread. Here is an example that sets it to
|
||||||
|
`false`:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
GetResponse response = client.prepareGet("twitter", "tweet", "1")
|
||||||
|
.setOperationThreaded(false)
|
||||||
|
.execute()
|
||||||
|
.actionGet();
|
||||||
|
--------------------------------------------------
|
61
docs/java-api/index.asciidoc
Normal file
|
@ -0,0 +1,61 @@
|
||||||
|
[[java-api]]
|
||||||
|
= Java API
|
||||||
|
:ref: http://www.elasticsearch.org/guide/elasticsearch/reference/current
|
||||||
|
|
||||||
|
[preface]
|
||||||
|
== Preface
|
||||||
|
This section describes the Java API that elasticsearch provides. All
|
||||||
|
elasticsearch operations are executed using a
|
||||||
|
<<client,Client>> object. All
|
||||||
|
operations are completely asynchronous in nature (they either accept a
|
||||||
|
listener or return a future).
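
As a minimal sketch, the same get request can either block on the returned
future or take an `ActionListener` (from `org.elasticsearch.action.ActionListener`):

[source,java]
--------------------------------------------------
// Blocking on the future returned by execute()
GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .execute()
        .actionGet();

// Passing a listener instead: the call returns immediately
client.prepareGet("twitter", "tweet", "1").execute(new ActionListener<GetResponse>() {
    @Override
    public void onResponse(GetResponse response) {
        // handle the response
    }

    @Override
    public void onFailure(Throwable e) {
        // handle the failure
    }
});
--------------------------------------------------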
|
||||||
|
|
||||||
|
Additionally, operations on a client may be accumulated and executed in
|
||||||
|
<<bulk,Bulk>>.
|
||||||
|
|
||||||
|
Note, all the APIs are exposed through the
|
||||||
|
Java API (actually, the Java API is used internally to execute them).
|
||||||
|
|
||||||
|
[float]
|
||||||
|
== Maven Repository
|
||||||
|
|
||||||
|
Elasticsearch is hosted on
|
||||||
|
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22elasticsearch%22[Maven
|
||||||
|
Central].
|
||||||
|
|
||||||
|
For example, you can define the latest version in your `pom.xml` file:
|
||||||
|
|
||||||
|
[source,xml]
|
||||||
|
--------------------------------------------------
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.elasticsearch</groupId>
|
||||||
|
<artifactId>elasticsearch</artifactId>
|
||||||
|
<version>${es.version}</version>
|
||||||
|
</dependency>
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
include::client.asciidoc[]
|
||||||
|
|
||||||
|
include::index_.asciidoc[]
|
||||||
|
|
||||||
|
include::get.asciidoc[]
|
||||||
|
|
||||||
|
include::delete.asciidoc[]
|
||||||
|
|
||||||
|
include::bulk.asciidoc[]
|
||||||
|
|
||||||
|
include::search.asciidoc[]
|
||||||
|
|
||||||
|
include::count.asciidoc[]
|
||||||
|
|
||||||
|
include::delete-by-query.asciidoc[]
|
||||||
|
|
||||||
|
include::facets.asciidoc[]
|
||||||
|
|
||||||
|
include::percolate.asciidoc[]
|
||||||
|
|
||||||
|
include::query-dsl-queries.asciidoc[]
|
||||||
|
|
||||||
|
include::query-dsl-filters.asciidoc[]
|
||||||
|
|
201
docs/java-api/index_.asciidoc
Normal file
|
@ -0,0 +1,201 @@
|
||||||
|
[[index_]]
|
||||||
|
== Index API
|
||||||
|
|
||||||
|
The index API allows one to index a typed JSON document into a specific
|
||||||
|
index and make it searchable.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Generate JSON document
|
||||||
|
|
||||||
|
There are different ways of generating a JSON document:
|
||||||
|
|
||||||
|
* Manually (aka do it yourself) using native `byte[]` or as a `String`
|
||||||
|
|
||||||
|
* Using `Map` that will be automatically converted to its JSON
|
||||||
|
equivalent
|
||||||
|
|
||||||
|
* Using a third party library to serialize your beans such as
|
||||||
|
http://wiki.fasterxml.com/JacksonHome[Jackson]
|
||||||
|
|
||||||
|
* Using the built-in helper `XContentFactory.jsonBuilder()`
|
||||||
|
|
||||||
|
Internally, each type is converted to `byte[]` (so a String is converted
|
||||||
|
to a `byte[]`). Therefore, if the object is in this form already, then
|
||||||
|
use it. The `jsonBuilder` is a highly optimized JSON generator that
|
||||||
|
directly constructs a `byte[]`.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Do It Yourself
|
||||||
|
|
||||||
|
Nothing really difficult here but note that you will have to encode
|
||||||
|
dates according to the
|
||||||
|
link:{ref}/mapping-date-format.html[Date Format].
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
String json = "{" +
|
||||||
|
"\"user\":\"kimchy\"," +
|
||||||
|
"\"postDate\":\"2013-01-30\"," +
|
||||||
|
"\"message\":\"trying out Elastic Search\"," +
|
||||||
|
"}";
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Using Map
|
||||||
|
|
||||||
|
A `Map` is a key/value pair collection. It represents a JSON
|
||||||
|
structure very well:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
Map<String, Object> json = new HashMap<String, Object>();
|
||||||
|
json.put("user","kimchy");
|
||||||
|
json.put("postDate",new Date());
|
||||||
|
json.put("message","trying out Elastic Search");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Serialize your beans
|
||||||
|
|
||||||
|
Elasticsearch already uses Jackson but shades it under
|
||||||
|
the `org.elasticsearch.common.jackson` package. +
|
||||||
|
So, you can add your own Jackson version in your `pom.xml` file or in
|
||||||
|
your classpath. See http://wiki.fasterxml.com/JacksonDownload[Jackson
|
||||||
|
Download Page].
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[source,xml]
|
||||||
|
--------------------------------------------------
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.fasterxml.jackson.core</groupId>
|
||||||
|
<artifactId>jackson-databind</artifactId>
|
||||||
|
<version>2.1.3</version>
|
||||||
|
</dependency>
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Then, you can start serializing your beans to JSON:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import com.fasterxml.jackson.databind.*;
|
||||||
|
|
||||||
|
// instance a json mapper
|
||||||
|
ObjectMapper mapper = new ObjectMapper(); // create once, reuse
|
||||||
|
|
||||||
|
// generate json
|
||||||
|
String json = mapper.writeValueAsString(yourbeaninstance);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Use Elasticsearch helpers
|
||||||
|
|
||||||
|
Elasticsearch provides built-in helpers to generate JSON content.
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import static org.elasticsearch.common.xcontent.XContentFactory.*;
|
||||||
|
|
||||||
|
XContentBuilder builder = jsonBuilder()
|
||||||
|
.startObject()
|
||||||
|
.field("user", "kimchy")
|
||||||
|
.field("postDate", new Date())
|
||||||
|
.field("message", "trying out Elastic Search")
|
||||||
|
    .endObject();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can also add arrays with `startArray(String)` and
|
||||||
|
`endArray()` methods. By the way, the `field` method +
|
||||||
|
accepts many object types. You can directly pass numbers, dates and even
|
||||||
|
other XContentBuilder objects.
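
For instance, a short sketch of adding an array (the `tags` field is made up
for illustration):

[source,java]
--------------------------------------------------
XContentBuilder builder = jsonBuilder()
    .startObject()
        .field("user", "kimchy")
        .startArray("tags")       // open the array field
            .value("elasticsearch")
            .value("search")
        .endArray()               // close it again
    .endObject();
--------------------------------------------------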
|
||||||
|
|
||||||
|
If you need to see the generated JSON content, you can use the
|
||||||
|
`string()` method.
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
String json = builder.string();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Index document
|
||||||
|
|
||||||
|
The following example indexes a JSON document into an index called
|
||||||
|
twitter, under a type called tweet, with id 1:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import static org.elasticsearch.common.xcontent.XContentFactory.*;
|
||||||
|
|
||||||
|
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
|
||||||
|
.setSource(jsonBuilder()
|
||||||
|
.startObject()
|
||||||
|
.field("user", "kimchy")
|
||||||
|
.field("postDate", new Date())
|
||||||
|
.field("message", "trying out Elastic Search")
|
||||||
|
.endObject()
|
||||||
|
)
|
||||||
|
.execute()
|
||||||
|
.actionGet();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can also index your documents as JSON String and that you
|
||||||
|
don't have to give an ID:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
String json = "{" +
|
||||||
|
"\"user\":\"kimchy\"," +
|
||||||
|
"\"postDate\":\"2013-01-30\"," +
|
||||||
|
"\"message\":\"trying out Elastic Search\"," +
|
||||||
|
"}";
|
||||||
|
|
||||||
|
IndexResponse response = client.prepareIndex("twitter", "tweet")
|
||||||
|
.setSource(json)
|
||||||
|
.execute()
|
||||||
|
.actionGet();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The `IndexResponse` object will give you a report:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Index name
|
||||||
|
String _index = response.index();
|
||||||
|
// Type name
|
||||||
|
String _type = response.type();
|
||||||
|
// Document ID (generated or not)
|
||||||
|
String _id = response.id();
|
||||||
|
// Version (if it's the first time you index this document, you will get: 1)
|
||||||
|
long _version = response.version();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
If you use percolation while indexing, the `IndexResponse` object will give
|
||||||
|
you the percolators that have matched:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
|
||||||
|
.setSource(json)
|
||||||
|
.setPercolate("*")
|
||||||
|
.execute()
|
||||||
|
.actionGet();
|
||||||
|
|
||||||
|
List<String> matches = response.matches();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
For more information on the index operation, check out the REST
|
||||||
|
link:{ref}/docs-index_.html[index] docs.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Operation Threading
|
||||||
|
|
||||||
|
The index API allows you to set the threading model that the operation will
|
||||||
|
use when the actual execution of the API is performed on the same
|
||||||
|
node (the API is executed on a shard that is allocated on the same
|
||||||
|
server).
|
||||||
|
|
||||||
|
The options are to execute the operation on a different thread, or to
|
||||||
|
execute it on the calling thread (note that the API is still async). By
|
||||||
|
default, `operationThreaded` is set to `true` which means the operation
|
||||||
|
is executed on a different thread.
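
As with the get API, a sketch of switching it off could look like this
(assuming the index request builder exposes `setOperationThreaded` in the same
way as the get request builder):

[source,java]
--------------------------------------------------
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
        .setSource(json)
        .setOperationThreaded(false) // assumed to be available, as on the get builder
        .execute()
        .actionGet();
--------------------------------------------------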
|
48
docs/java-api/percolate.asciidoc
Normal file
|
@ -0,0 +1,48 @@
|
||||||
|
[[percolate]]
|
||||||
|
== Percolate API
|
||||||
|
|
||||||
|
The percolator allows you to register queries against an index, and then
|
||||||
|
send `percolate` requests which include a doc, and get back the
|
||||||
|
queries that match on that doc out of the set of registered queries.
|
||||||
|
|
||||||
|
Read the main {ref}/search-percolate.html[percolate]
|
||||||
|
documentation before reading this guide.
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
//This is the query we're registering in the percolator
|
||||||
|
QueryBuilder qb = termQuery("content", "amazing");
|
||||||
|
|
||||||
|
//Index the query = register it in the percolator
|
||||||
|
client.prepareIndex("_percolator", "myIndexName", "myDesignatedQueryName")
|
||||||
|
.setSource(jsonBuilder()
|
||||||
|
.startObject()
|
||||||
|
.field("query", qb) // Register the query
|
||||||
|
.endObject())
|
||||||
|
.setRefresh(true) // Needed when the query shall be available immediately
|
||||||
|
.execute().actionGet();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
This indexes the above term query under the name
|
||||||
|
*myDesignatedQueryName*.
|
||||||
|
|
||||||
|
In order to check a document against the registered queries, use this
|
||||||
|
code:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
//Build a document to check against the percolator
|
||||||
|
XContentBuilder docBuilder = XContentFactory.jsonBuilder().startObject();
|
||||||
|
docBuilder.field("doc").startObject(); //This is needed to designate the document
|
||||||
|
docBuilder.field("content", "This is amazing!");
|
||||||
|
docBuilder.endObject(); //End of the doc field
|
||||||
|
docBuilder.endObject(); //End of the JSON root object
|
||||||
|
//Percolate
|
||||||
|
PercolateResponse response =
|
||||||
|
client.preparePercolate("myIndexName", "myDocumentType").setSource(docBuilder).execute().actionGet();
|
||||||
|
//Iterate over the results
|
||||||
|
for(String result : response) {
|
||||||
|
//Handle the result which is the name of
|
||||||
|
//the query in the percolator
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
459
docs/java-api/query-dsl-filters.asciidoc
Normal file
|
@ -0,0 +1,459 @@
|
||||||
|
[[query-dsl-filters]]
|
||||||
|
== Query DSL - Filters
|
||||||
|
|
||||||
|
elasticsearch provides a full Java query DSL in a similar manner to the
|
||||||
|
REST link:{ref}/query-dsl.html[Query DSL]. The factory for filter
|
||||||
|
builders is `FilterBuilders`.
|
||||||
|
|
||||||
|
Once your query is ready, you can use the <<search,Search API>>.
|
||||||
|
|
||||||
|
See also how to build <<query-dsl-queries,Queries>>.
|
||||||
|
|
||||||
|
To use `FilterBuilders` just import them in your class:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import static org.elasticsearch.index.query.FilterBuilders.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can easily print (aka debug) JSON generated queries using
|
||||||
|
the `toString()` method on the `FilterBuilder` object.
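
For example, a quick way to check what a builder will generate:

[source,java]
--------------------------------------------------
FilterBuilder filter = FilterBuilders.termFilter("user", "kimchy");
System.out.println(filter.toString()); // prints the generated JSON
--------------------------------------------------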
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== And Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-and-filter.html[And Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.andFilter(
|
||||||
|
FilterBuilders.rangeFilter("postDate").from("2010-03-01").to("2010-04-01"),
|
||||||
|
FilterBuilders.prefixFilter("name.second", "ba")
|
||||||
|
);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`AndFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Bool Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-bool-filter.html[Bool Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.boolFilter()
|
||||||
|
.must(FilterBuilders.termFilter("tag", "wow"))
|
||||||
|
.mustNot(FilterBuilders.rangeFilter("age").from("10").to("20"))
|
||||||
|
.should(FilterBuilders.termFilter("tag", "sometag"))
|
||||||
|
.should(FilterBuilders.termFilter("tag", "sometagtag"));
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`BoolFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Exists Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-exists-filter.html[Exists Filter].
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.existsFilter("user");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Ids Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-ids-filter.html[IDs Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.idsFilter("my_type", "type2").addIds("1", "4", "100");
|
||||||
|
|
||||||
|
// Type is optional
|
||||||
|
FilterBuilders.idsFilter().addIds("1", "4", "100");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Limit Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-limit-filter.html[Limit Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.limitFilter(100);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Type Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-type-filter.html[Type Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.typeFilter("my_type");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Geo Bounding Box Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-geo-bounding-box-filter.html[Geo
|
||||||
|
Bounding Box Filter]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.geoBoundingBoxFilter("pin.location")
|
||||||
|
.topLeft(40.73, -74.1)
|
||||||
|
.bottomRight(40.717, -73.99);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`GeoBoundingBoxFilterBuilder#cache(boolean)` method. See
|
||||||
|
<<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== GeoDistance Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-geo-distance-filter.html[Geo
|
||||||
|
Distance Filter]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.geoDistanceFilter("pin.location")
|
||||||
|
.point(40, -70)
|
||||||
|
.distance(200, DistanceUnit.KILOMETERS)
|
||||||
|
.optimizeBbox("memory") // Can be also "indexed" or "none"
|
||||||
|
.geoDistance(GeoDistance.ARC); // Or GeoDistance.PLANE
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`GeoDistanceFilterBuilder#cache(boolean)` method. See
|
||||||
|
<<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Geo Distance Range Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-geo-distance-range-filter.html[Geo
|
||||||
|
Distance Range Filter]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.geoDistanceRangeFilter("pin.location")
|
||||||
|
.point(40, -70)
|
||||||
|
.from("200km")
|
||||||
|
.to("400km")
|
||||||
|
.includeLower(true)
|
||||||
|
.includeUpper(false)
|
||||||
|
.optimizeBbox("memory") // Can be also "indexed" or "none"
|
||||||
|
.geoDistance(GeoDistance.ARC); // Or GeoDistance.PLANE
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`GeoDistanceRangeFilterBuilder#cache(boolean)` method. See
|
||||||
|
<<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Geo Polygon Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-geo-polygon-filter.html[Geo Polygon
|
||||||
|
Filter]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.geoPolygonFilter("pin.location")
|
||||||
|
.addPoint(40, -70)
|
||||||
|
.addPoint(30, -80)
|
||||||
|
.addPoint(20, -90);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`GeoPolygonFilterBuilder#cache(boolean)` method. See
|
||||||
|
<<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Geo Shape Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-geo-shape-filter.html[Geo Shape
|
||||||
|
Filter]
|
||||||
|
|
||||||
|
Note: the `geo_shape` type uses `Spatial4J` and `JTS`, both of which are
|
||||||
|
optional dependencies. Consequently you must add `Spatial4J` and `JTS`
|
||||||
|
to your classpath in order to use this type:
|
||||||
|
|
||||||
|
[source,xml]
|
||||||
|
-----------------------------------------------
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.spatial4j</groupId>
|
||||||
|
<artifactId>spatial4j</artifactId>
|
||||||
|
<version>0.3</version>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.vividsolutions</groupId>
|
||||||
|
<artifactId>jts</artifactId>
|
||||||
|
<version>1.12</version>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<groupId>xerces</groupId>
|
||||||
|
<artifactId>xercesImpl</artifactId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
</dependency>
|
||||||
|
-----------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Import Spatial4J shapes
|
||||||
|
import com.spatial4j.core.context.SpatialContext;
|
||||||
|
import com.spatial4j.core.shape.Shape;
|
||||||
|
import com.spatial4j.core.shape.impl.RectangleImpl;
|
||||||
|
|
||||||
|
// Also import ShapeRelation
|
||||||
|
import org.elasticsearch.common.geo.ShapeRelation;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Shape within another
|
||||||
|
filter = FilterBuilders.geoShapeFilter("location",
|
||||||
|
new RectangleImpl(0,10,0,10,SpatialContext.GEO))
|
||||||
|
.relation(ShapeRelation.WITHIN);
|
||||||
|
|
||||||
|
// Intersect shapes
|
||||||
|
filter = FilterBuilders.geoShapeFilter("location",
|
||||||
|
new PointImpl(0, 0, SpatialContext.GEO))
|
||||||
|
.relation(ShapeRelation.INTERSECTS);
|
||||||
|
|
||||||
|
// Using pre-indexed shapes
|
||||||
|
filter = FilterBuilders.geoShapeFilter("location", "New Zealand", "countries")
|
||||||
|
.relation(ShapeRelation.DISJOINT);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Has Child / Has Parent Filters
|
||||||
|
|
||||||
|
See:
|
||||||
|
* link:{ref}/query-dsl-has-child-filter.html[Has Child Filter]
|
||||||
|
* link:{ref}/query-dsl-has-parent-filter.html[Has Parent Filter]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Has Child
|
||||||
|
QFilterBuilders.hasChildFilter("blog_tag",
|
||||||
|
QueryBuilders.termQuery("tag", "something"));
|
||||||
|
|
||||||
|
// Has Parent
|
||||||
|
QFilterBuilders.hasParentFilter("blog",
|
||||||
|
QueryBuilders.termQuery("tag", "something"));
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Match All Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-match-all-filter.html[Match All Filter]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.matchAllFilter();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Missing Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-missing-filter.html[Missing Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.missingFilter("user")
|
||||||
|
.existence(true)
|
||||||
|
.nullValue(true);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Not Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-not-filter.html[Not Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.notFilter(
|
||||||
|
FilterBuilders.rangeFilter("price").from("1").to("2"));
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Numeric Range Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-numeric-range-filter.html[Numeric
|
||||||
|
Range Filter]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.numericRangeFilter("age")
|
||||||
|
.from(10)
|
||||||
|
.to(20)
|
||||||
|
.includeLower(true)
|
||||||
|
.includeUpper(false);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`NumericRangeFilterBuilder#cache(boolean)` method. See
|
||||||
|
<<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Or Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-or-filter.html[Or Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.orFilter(
|
||||||
|
FilterBuilders.termFilter("name.second", "banon"),
|
||||||
|
FilterBuilders.termFilter("name.nick", "kimchy")
|
||||||
|
);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`OrFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Prefix Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-prefix-filter.html[Prefix Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.prefixFilter("user", "ki");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`PrefixFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Query Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-query-filter.html[Query Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.queryFilter(
|
||||||
|
QueryBuilders.queryString("this AND that OR thus")
|
||||||
|
);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`QueryFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Range Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-range-filter.html[Range Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.rangeFilter("age")
|
||||||
|
.from("10")
|
||||||
|
.to("20")
|
||||||
|
.includeLower(true)
|
||||||
|
.includeUpper(false);
|
||||||
|
|
||||||
|
// A simplified form using gte, gt, lt or lte
|
||||||
|
FilterBuilders.rangeFilter("age")
|
||||||
|
.gte("10")
|
||||||
|
.lt("20");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can ask not to cache the result using
|
||||||
|
`RangeFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Script Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-script-filter.html[Script Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilder filter = FilterBuilders.scriptFilter(
|
||||||
|
"doc['age'].value > param1"
|
||||||
|
).addParam("param1", 10);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can cache the result using
|
||||||
|
`ScriptFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Term Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-term-filter.html[Term Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.termFilter("user", "kimchy");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can ask not to cache the result using
|
||||||
|
`TermFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Terms Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-terms-filter.html[Terms Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.termsFilter("user", "kimchy", "elasticsearch")
|
||||||
|
.execution("plain"); // Optional, can be also "bool", "and" or "or"
|
||||||
|
// or "bool_nocache", "and_nocache" or "or_nocache"
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can ask not to cache the result using
|
||||||
|
`TermsFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Nested Filter
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-nested-filter.html[Nested Filter]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilders.nestedFilter("obj1",
|
||||||
|
QueryBuilders.boolQuery()
|
||||||
|
.must(QueryBuilders.matchQuery("obj1.name", "blue"))
|
||||||
|
.must(QueryBuilders.rangeQuery("obj1.count").gt(5))
|
||||||
|
);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can ask not to cache the result using
|
||||||
|
`NestedFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||||
|
|
||||||
|
[[query-dsl-filters-caching]]
|
||||||
|
[float]
|
||||||
|
=== Caching
|
||||||
|
|
||||||
|
By default, some filters are cached and others are not. You can have
|
||||||
|
fine-grained control using the `cache(boolean)` method where it exists. For example:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
FilterBuilder filter = FilterBuilders.andFilter(
|
||||||
|
FilterBuilders.rangeFilter("postDate").from("2010-03-01").to("2010-04-01"),
|
||||||
|
FilterBuilders.prefixFilter("name.second", "ba")
|
||||||
|
)
|
||||||
|
.cache(true);
|
||||||
|
--------------------------------------------------
|
489
docs/java-api/query-dsl-queries.asciidoc
Normal file
|
@ -0,0 +1,489 @@
|
||||||
|
[[query-dsl-queries]]
|
||||||
|
== Query DSL - Queries
|
||||||
|
|
||||||
|
elasticsearch provides a full Java query DSL in a similar manner to the
|
||||||
|
REST link:{ref}/query-dsl.html[Query DSL]. The factory for query
|
||||||
|
builders is `QueryBuilders`. Once your query is ready, you can use the
|
||||||
|
<<search,Search API>>.
|
||||||
|
|
||||||
|
See also how to build <<query-dsl-filters,Filters>>
|
||||||
|
|
||||||
|
To use `QueryBuilders` just import them in your class:
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
import static org.elasticsearch.index.query.QueryBuilders.*;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Note that you can easily print (aka debug) JSON generated queries using
|
||||||
|
the `toString()` method on the `QueryBuilder` object.
|
||||||
|
|
||||||
|
The `QueryBuilder` can then be used with any API that accepts a query,
|
||||||
|
such as `count` and `search`.
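
For instance, here is a rough sketch that reuses a single `QueryBuilder` for
both a count and a search request (the index name is just an example):

[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.termQuery("user", "kimchy");

// Count the matching documents
CountResponse countResponse = client.prepareCount("twitter")
        .setQuery(qb)
        .execute().actionGet();

// Retrieve the matching documents
SearchResponse searchResponse = client.prepareSearch("twitter")
        .setQuery(qb)
        .execute().actionGet();
--------------------------------------------------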
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Match Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-match-query.html[Match Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders.matchQuery("name", "kimchy elasticsearch");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== MultiMatch Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-multi-match-query.html[MultiMatch
|
||||||
|
Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders.multiMatchQuery(
|
||||||
|
"kimchy elasticsearch", // Text you are looking for
|
||||||
|
"user", "message" // Fields you query on
|
||||||
|
);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Boolean Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-bool-query.html[Boolean Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders
|
||||||
|
.boolQuery()
|
||||||
|
.must(termQuery("content", "test1"))
|
||||||
|
.must(termQuery("content", "test4"))
|
||||||
|
.mustNot(termQuery("content", "test2"))
|
||||||
|
.should(termQuery("content", "test3"));
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Boosting Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-boosting-query.html[Boosting Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.boostingQuery()
|
||||||
|
.positive(QueryBuilders.termQuery("name","kimchy"))
|
||||||
|
.negative(QueryBuilders.termQuery("name","dadoonet"))
|
||||||
|
.negativeBoost(0.2f);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== IDs Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-ids-query.html[IDs Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.idsQuery().ids("1", "2");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Custom Score Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-custom-score-query.html[Custom Score
|
||||||
|
Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.customScoreQuery(QueryBuilders.matchAllQuery()) // Your query here
|
||||||
|
.script("_score * doc['price'].value"); // Your script here
|
||||||
|
|
||||||
|
// If the script has parameters, use the same script and provide the parameters to it.
|
||||||
|
QueryBuilders.customScoreQuery(QueryBuilders.matchAllQuery())
|
||||||
|
.script("_score * doc['price'].value / pow(param1, param2)")
|
||||||
|
.param("param1", 2)
|
||||||
|
.param("param2", 3.1);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Custom Boost Factor Query
|
||||||
|
|
||||||
|
See
|
||||||
|
link:{ref}/query-dsl-custom-boost-factor-query.html[Custom
|
||||||
|
Boost Factor Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.customBoostFactorQuery(QueryBuilders.matchAllQuery()) // Your query
|
||||||
|
.boostFactor(3.1f);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Constant Score Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-constant-score-query.html[Constant
|
||||||
|
Score Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Using with Filters
|
||||||
|
QueryBuilders.constantScoreQuery(FilterBuilders.termFilter("name","kimchy"))
|
||||||
|
.boost(2.0f);
|
||||||
|
|
||||||
|
// With Queries
|
||||||
|
QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("name","kimchy"))
|
||||||
|
.boost(2.0f);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Disjunction Max Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-dis-max-query.html[Disjunction Max
|
||||||
|
Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.disMaxQuery()
|
||||||
|
.add(QueryBuilders.termQuery("name","kimchy")) // Your queries
|
||||||
|
.add(QueryBuilders.termQuery("name","elasticsearch")) // Your queries
|
||||||
|
.boost(1.2f)
|
||||||
|
.tieBreaker(0.7f);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Field Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-field-query.html[Field Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.fieldQuery("name", "+kimchy -dadoonet");
|
||||||
|
|
||||||
|
// Note that you can write the same query using queryString query.
|
||||||
|
QueryBuilders.queryString("+kimchy -dadoonet").field("name");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Fuzzy Like This (Field) Query (flt and flt_field)
|
||||||
|
|
||||||
|
See:
|
||||||
|
* link:{ref}/query-dsl-flt-query.html[Fuzzy Like This Query]
|
||||||
|
* link:{ref}/query-dsl-flt-field-query.html[Fuzzy Like This Field Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// flt Query
|
||||||
|
QueryBuilders.fuzzyLikeThisQuery("name.first", "name.last") // Fields
|
||||||
|
.likeText("text like this one") // Text
|
||||||
|
.maxQueryTerms(12); // Max num of Terms
|
||||||
|
// in generated queries
|
||||||
|
|
||||||
|
// flt_field Query
|
||||||
|
QueryBuilders.fuzzyLikeThisFieldQuery("name.first") // Only on single field
|
||||||
|
.likeText("text like this one")
|
||||||
|
.maxQueryTerms(12);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Fuzzy Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-fuzzy-query.html[Fuzzy Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders.fuzzyQuery("name", "kimzhy");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Has Child / Has Parent
|
||||||
|
|
||||||
|
See:
|
||||||
|
* link:{ref}/query-dsl-has-child-query.html[Has Child Query]
|
||||||
|
* link:{ref}/query-dsl-has-parent-query.html[Has Parent]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Has Child
|
||||||
|
QueryBuilders.hasChildQuery("blog_tag",
|
||||||
|
QueryBuilders.termQuery("tag","something"))
|
||||||
|
|
||||||
|
// Has Parent
|
||||||
|
QueryBuilders.hasParentQuery("blog",
|
||||||
|
QueryBuilders.termQuery("tag","something"));
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== MatchAll Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-match-all-query.html[Match All
|
||||||
|
Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders.matchAllQuery();
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== More Like This (Field) Query (mlt and mlt_field)
|
||||||
|
|
||||||
|
See:
|
||||||
|
* link:{ref}/query-dsl-mlt-query.html[More Like This Query]
|
||||||
|
* link:{ref}/query-dsl-mlt-field-query.html[More Like This Field Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// mlt Query
|
||||||
|
QueryBuilders.moreLikeThisQuery("name.first", "name.last") // Fields
|
||||||
|
.likeText("text like this one") // Text
|
||||||
|
.minTermFreq(1) // Ignore Threshold
|
||||||
|
.maxQueryTerms(12); // Max num of Terms
|
||||||
|
// in generated queries
|
||||||
|
|
||||||
|
// mlt_field Query
|
||||||
|
QueryBuilders.moreLikeThisFieldQuery("name.first") // Only on single field
|
||||||
|
.likeText("text like this one")
|
||||||
|
.minTermFreq(1)
|
||||||
|
.maxQueryTerms(12);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Prefix Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-prefix-query.html[Prefix Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.prefixQuery("brand", "heine");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== QueryString Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-query-string-query.html[QueryString Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders.queryString("+kimchy -elasticsearch");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Range Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-range-query.html[Range Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders
|
||||||
|
.rangeQuery("price")
|
||||||
|
.from(5)
|
||||||
|
.to(10)
|
||||||
|
.includeLower(true)
|
||||||
|
.includeUpper(false);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Span Queries (first, near, not, or, term)
|
||||||
|
|
||||||
|
See:
|
||||||
|
* link:{ref}/query-dsl-span-first-query.html[Span First Query]
|
||||||
|
* link:{ref}/query-dsl-span-near-query.html[Span Near Query]
|
||||||
|
* link:{ref}/query-dsl-span-not-query.html[Span Not Query]
|
||||||
|
* link:{ref}/query-dsl-span-or-query.html[Span Or Query]
|
||||||
|
* link:{ref}/query-dsl-span-term-query.html[Span Term Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Span First
|
||||||
|
QueryBuilders.spanFirstQuery(
|
||||||
|
QueryBuilders.spanTermQuery("user", "kimchy"), // Query
|
||||||
|
3 // Max End position
|
||||||
|
);
|
||||||
|
|
||||||
|
// Span Near
|
||||||
|
QueryBuilders.spanNearQuery()
|
||||||
|
.clause(QueryBuilders.spanTermQuery("field","value1")) // Span Term Queries
|
||||||
|
.clause(QueryBuilders.spanTermQuery("field","value2"))
|
||||||
|
.clause(QueryBuilders.spanTermQuery("field","value3"))
|
||||||
|
.slop(12) // Slop factor
|
||||||
|
.inOrder(false)
|
||||||
|
.collectPayloads(false);
|
||||||
|
|
||||||
|
// Span Not
|
||||||
|
QueryBuilders.spanNotQuery()
|
||||||
|
.include(QueryBuilders.spanTermQuery("field","value1"))
|
||||||
|
.exclude(QueryBuilders.spanTermQuery("field","value2"));
|
||||||
|
|
||||||
|
// Span Or
|
||||||
|
QueryBuilders.spanOrQuery()
|
||||||
|
.clause(QueryBuilders.spanTermQuery("field","value1"))
|
||||||
|
.clause(QueryBuilders.spanTermQuery("field","value2"))
|
||||||
|
.clause(QueryBuilders.spanTermQuery("field","value3"));
|
||||||
|
|
||||||
|
// Span Term
|
||||||
|
QueryBuilders.spanTermQuery("user","kimchy");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Term Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-term-query.html[Term Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilder qb = QueryBuilders.termQuery("name", "kimchy");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Terms Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-terms-query.html[Terms Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.termsQuery("tags", // field
|
||||||
|
"blue", "pill") // values
|
||||||
|
.minimumMatch(1); // How many terms must match
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Top Children Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-top-children-query.html[Top Children Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.topChildrenQuery(
|
||||||
|
"blog_tag", // field
|
||||||
|
QueryBuilders.termQuery("tag", "something") // Query
|
||||||
|
)
|
||||||
|
.score("max") // max, sum or avg
|
||||||
|
.factor(5)
|
||||||
|
.incrementalFactor(2);
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Wildcard Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-wildcard-query.html[Wildcard Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.wildcardQuery("user", "k?mc*");
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Nested Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-nested-query.html[Nested Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.nestedQuery("obj1", // Path
|
||||||
|
QueryBuilders.boolQuery() // Your query
|
||||||
|
.must(QueryBuilders.matchQuery("obj1.name", "blue"))
|
||||||
|
.must(QueryBuilders.rangeQuery("obj1.count").gt(5))
|
||||||
|
)
|
||||||
|
.scoreMode("avg"); // max, total, avg or none
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Custom Filters Score Query
|
||||||
|
|
||||||
|
See
|
||||||
|
link:{ref}/query-dsl-custom-filters-score-query.html[Custom Filters Score Query]
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
QueryBuilders.customFiltersScoreQuery(
|
||||||
|
QueryBuilders.matchAllQuery()) // Query
|
||||||
|
// Filters with their boost factors
|
||||||
|
.add(FilterBuilders.rangeFilter("age").from(0).to(10), 3)
|
||||||
|
.add(FilterBuilders.rangeFilter("age").from(10).to(20), 2)
|
||||||
|
.scoreMode("first"); // first, min, max, total, avg or multiply
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Indices Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-indices-query.html[Indices Query]
|
||||||
|
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Using another query when no match for the main one
|
||||||
|
QueryBuilders.indicesQuery(
|
||||||
|
QueryBuilders.termQuery("tag", "wow"),
|
||||||
|
"index1", "index2"
|
||||||
|
)
|
||||||
|
.noMatchQuery(QueryBuilders.termQuery("tag", "kow"));
|
||||||
|
|
||||||
|
// Using all (match all) or none (match no documents)
|
||||||
|
QueryBuilders.indicesQuery(
|
||||||
|
QueryBuilders.termQuery("tag", "wow"),
|
||||||
|
"index1", "index2"
|
||||||
|
)
|
||||||
|
.noMatchQuery("all"); // all or none
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== GeoShape Query
|
||||||
|
|
||||||
|
See link:{ref}/query-dsl-geo-shape-query.html[GeoShape Query]
|
||||||
|
|
||||||
|
|
||||||
|
Note: the `geo_shape` type uses `Spatial4J` and `JTS`, both of which are
|
||||||
|
optional dependencies. Consequently you must add `Spatial4J` and `JTS`
|
||||||
|
to your classpath in order to use this type:
|
||||||
|
|
||||||
|
[source,xml]
|
||||||
|
--------------------------------------------------
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.spatial4j</groupId>
|
||||||
|
<artifactId>spatial4j</artifactId>
|
||||||
|
<version>0.3</version>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.vividsolutions</groupId>
|
||||||
|
<artifactId>jts</artifactId>
|
||||||
|
<version>1.12</version>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<groupId>xerces</groupId>
|
||||||
|
<artifactId>xercesImpl</artifactId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
</dependency>
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Import Spatial4J shapes
|
||||||
|
import com.spatial4j.core.context.SpatialContext;
|
||||||
|
import com.spatial4j.core.shape.Shape;
|
||||||
|
import com.spatial4j.core.shape.impl.RectangleImpl;
|
||||||
|
|
||||||
|
// Also import ShapeRelation
|
||||||
|
import org.elasticsearch.common.geo.ShapeRelation;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[source,java]
|
||||||
|
--------------------------------------------------
|
||||||
|
// Shape within another
|
||||||
|
QueryBuilders.geoShapeQuery("location",
|
||||||
|
new RectangleImpl(0,10,0,10,SpatialContext.GEO))
|
||||||
|
.relation(ShapeRelation.WITHIN);
|
||||||
|
|
||||||
|
// Intersect shapes
|
||||||
|
QueryBuilders.geoShapeQuery("location",
|
||||||
|
new PointImpl(0, 0, SpatialContext.GEO))
|
||||||
|
.relation(ShapeRelation.INTERSECTS);
|
||||||
|
|
||||||
|
// Using pre-indexed shapes
|
||||||
|
QueryBuilders.geoShapeQuery("location", "New Zealand", "countries")
|
||||||
|
.relation(ShapeRelation.DISJOINT);
|
||||||
|
--------------------------------------------------
|
137
docs/java-api/search.asciidoc
Normal file

@ -0,0 +1,137 @@
[[search]]
== Search API

The search API allows you to execute a search query and get back search hits
that match the query. It can be executed across one or more indices and
across one or more types. The query can either be provided using the
<<query-dsl-queries,query Java API>> or
the <<query-dsl-filters,filter Java API>>.
The body of the search request is built using the
`SearchSourceBuilder`. Here is an example:

[source,java]
--------------------------------------------------
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;

import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("multi", "test"))           // Query
        .setFilter(FilterBuilders.rangeFilter("age").from(12).to(18)) // Filter
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();
--------------------------------------------------

Note that all parameters are optional. Here is the smallest search call
you can write:

[source,java]
--------------------------------------------------
// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();
--------------------------------------------------

For more information on the search operation, check out the REST
link:{ref}/search.html[search] docs.

[float]
=== Using scrolls in Java

Read the link:{ref}/search-request-scroll.html[scroll documentation]
first!

[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch("test")
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); // 100 hits per shard will be returned for each scroll
// Scroll until no hits are returned
while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    for (SearchHit hit : scrollResp.getHits()) {
        // Handle the hit...
    }
    // Break condition: no hits are returned
    if (scrollResp.hits().hits().length == 0) {
        break;
    }
}
--------------------------------------------------

[float]
=== Operation Threading

The search API allows one to set the threading model that the operation will
use when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).

There are three threading modes. The `NO_THREADS` mode means that the
search operation will be executed on the calling thread. The
`SINGLE_THREAD` mode means that the search operation will be executed on
a single different thread for all local shards. The `THREAD_PER_SHARD`
mode means that the search operation will be executed on a different
thread for each local shard.

The default mode is `SINGLE_THREAD`.

[float]
=== MultiSearch API

See link:{ref}/search-multi-search.html[MultiSearch API Query]
documentation.

[source,java]
--------------------------------------------------
SearchRequestBuilder srb1 = node.client()
    .prepareSearch().setQuery(QueryBuilders.queryString("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = node.client()
    .prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);

MultiSearchResponse sr = node.client().prepareMultiSearch()
        .add(srb1)
        .add(srb2)
        .execute().actionGet();

// You will get all individual responses from MultiSearchResponse#responses()
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.responses()) {
    SearchResponse response = item.response();
    nbHits += response.hits().totalHits();
}
--------------------------------------------------

[float]
=== Using Facets

The following code shows how to add two facets within your search:

[source,java]
--------------------------------------------------
SearchResponse sr = node.client().prepareSearch()
    .setQuery(QueryBuilders.matchAllQuery())
    .addFacet(FacetBuilders.termsFacet("f1").field("field"))
    .addFacet(FacetBuilders.dateHistogramFacet("f2").field("birth").interval("year"))
    .execute().actionGet();

// Get your facet results
TermsFacet f1 = (TermsFacet) sr.facets().facetsAsMap().get("f1");
DateHistogramFacet f2 = (DateHistogramFacet) sr.facets().facetsAsMap().get("f2");
--------------------------------------------------

See <<facets,Facets Java API>>
documentation for details.
76
docs/reference/analysis.asciidoc
Normal file

@ -0,0 +1,76 @@
[[analysis]]
= Analysis

[partintro]
--
The index analysis module acts as a configurable registry of Analyzers
that can be used both to break down indexed (analyzed) fields when a
document is indexed and to process query strings. It maps to the Lucene
`Analyzer`.

Analyzers are composed of a single <<analysis-tokenizers,Tokenizer>>
and zero or more <<analysis-tokenfilters,TokenFilters>>. The tokenizer may
be preceded by one or more <<analysis-charfilters,CharFilters>>. The
analysis module allows one to register `TokenFilters`, `Tokenizers` and
`Analyzers` under logical names that can then be referenced either in
mapping definitions or in certain APIs. The Analysis module
automatically registers (*if not explicitly defined*) built in
analyzers, token filters, and tokenizers.

Here is a sample configuration:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            standard :
                type : standard
                stopwords : [stop1, stop2]
            myAnalyzer1 :
                type : standard
                stopwords : [stop1, stop2, stop3]
                max_token_length : 500
            # configure a custom analyzer which is
            # exactly like the default standard analyzer
            myAnalyzer2 :
                tokenizer : standard
                filter : [standard, lowercase, stop]
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
            myTokenizer2 :
                type : keyword
                buffer_size : 512
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myTokenFilter2 :
                type : length
                min : 0
                max : 2000
--------------------------------------------------

[float]
=== Backwards compatibility

All analyzers, tokenizers, and token filters can be configured with a
`version` parameter to control which Lucene version behavior they should
use. Possible values are: `3.0` - `3.6`, `4.0` - `4.3` (the highest
version number is the default option).

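As a minimal sketch of the `version` parameter described above (the analyzer
name and stopword list here are only placeholders), an analyzer could be
pinned to the `4.3` Lucene behavior like this:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_pinned_analyzer :
                type : standard
                version : 4.3
                stopwords : [stop1, stop2]
--------------------------------------------------
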
--

include::analysis/analyzers.asciidoc[]

include::analysis/tokenizers.asciidoc[]

include::analysis/tokenfilters.asciidoc[]

include::analysis/charfilters.asciidoc[]

include::analysis/icu-plugin.asciidoc[]
69
docs/reference/analysis/analyzers.asciidoc
Normal file

@ -0,0 +1,69 @@
[[analysis-analyzers]]
== Analyzers

Analyzers are composed of a single <<analysis-tokenizers,Tokenizer>>
and zero or more <<analysis-tokenfilters,TokenFilters>>. The tokenizer may
be preceded by one or more <<analysis-charfilters,CharFilters>>.
The analysis module allows you to register `Analyzers` under logical
names which can then be referenced either in mapping definitions or in
certain APIs.

Elasticsearch comes with a number of prebuilt analyzers which are
ready to use. Alternatively, you can combine the built in
character filters, tokenizers and token filters to create
<<analysis-custom-analyzer,custom analyzers>>.

[float]
=== Default Analyzers

An analyzer is registered under a logical name. It can then be
referenced from mapping definitions or certain APIs. When none are
defined, defaults are used. There is an option to define which analyzers
will be used by default when none can be derived.

The `default` logical name allows one to configure an analyzer that will
be used both for indexing and for searching APIs. The `default_index`
logical name can be used to configure a default analyzer that will be
used just when indexing, and the `default_search` can be used to
configure a default analyzer that will be used just when searching.

|
||||||
|
=== Aliasing Analyzers
|
||||||
|
|
||||||
|
Analyzers can be aliased to have several registered lookup names
|
||||||
|
associated with them. For example, the following will allow
|
||||||
|
the `standard` analyzer to also be referenced with `alias1`
|
||||||
|
and `alias2` values.
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
index :
|
||||||
|
analysis :
|
||||||
|
analyzer :
|
||||||
|
standard :
|
||||||
|
alias: [alias1, alias2]
|
||||||
|
type : standard
|
||||||
|
stopwords : [test1, test2, test3]
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Below is a list of the built in analyzers.
|
||||||
|
|
||||||
|
include::analyzers/standard-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/simple-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/whitespace-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/stop-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/keyword-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/pattern-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/lang-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/snowball-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::analyzers/custom-analyzer.asciidoc[]
|
||||||
|
|
52
docs/reference/analysis/analyzers/custom-analyzer.asciidoc
Normal file

@ -0,0 +1,52 @@
[[analysis-custom-analyzer]]
=== Custom Analyzer

An analyzer of type `custom` that allows to combine a `Tokenizer` with
zero or more `Token Filters`, and zero or more `Char Filters`. The
custom analyzer accepts a logical/registered name of the tokenizer to
use, and a list of logical/registered names of token filters.

The following are settings that can be set for a `custom` analyzer type:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`tokenizer` |The logical / registered name of the tokenizer to use.

|`filter` |An optional list of logical / registered name of token
filters.

|`char_filter` |An optional list of logical / registered name of char
filters.
|=======================================================================

Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : myTokenizer1
                filter : [myTokenFilter1, myTokenFilter2]
                char_filter : [my_html]
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myTokenFilter2 :
                type : length
                min : 0
                max : 2000
        char_filter :
            my_html :
                type : html_strip
                escaped_tags : [xxx, yyy]
                read_ahead : 1024
--------------------------------------------------
@ -0,0 +1,7 @@
[[analysis-keyword-analyzer]]
=== Keyword Analyzer

An analyzer of type `keyword` that "tokenizes" an entire stream as a
single token. This is useful for data like zip codes, ids and so on.
Note, when using mapping definitions, it might make more sense to simply
mark the field as `not_analyzed`.
20
docs/reference/analysis/analyzers/lang-analyzer.asciidoc
Normal file

@ -0,0 +1,20 @@
[[analysis-lang-analyzer]]
=== Language Analyzers

A set of analyzers aimed at analyzing specific language text. The
following types are supported: `arabic`, `armenian`, `basque`,
`brazilian`, `bulgarian`, `catalan`, `chinese`, `cjk`, `czech`,
`danish`, `dutch`, `english`, `finnish`, `french`, `galician`, `german`,
`greek`, `hindi`, `hungarian`, `indonesian`, `italian`, `norwegian`,
`persian`, `portuguese`, `romanian`, `russian`, `spanish`, `swedish`,
`turkish`, `thai`.

All analyzers support setting custom `stopwords` either internally in
the config, or by using an external stopwords file by setting
`stopwords_path`.

The following analyzers support setting a custom `stem_exclusion` list:
`arabic`, `armenian`, `basque`, `brazilian`, `bulgarian`, `catalan`,
`czech`, `danish`, `dutch`, `english`, `finnish`, `french`, `galician`,
`german`, `hindi`, `hungarian`, `indonesian`, `italian`, `norwegian`,
`portuguese`, `romanian`, `russian`, `spanish`, `swedish`, `turkish`.
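As a minimal sketch combining the parameters above (the stopword and
exclusion lists here are only placeholders), an English analyzer with
custom stopwords and a stem exclusion list could be registered like this:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_english :
                type : english
                stopwords : [the, a, an]
                stem_exclusion : [skies, feeds]
--------------------------------------------------
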
126
docs/reference/analysis/analyzers/pattern-analyzer.asciidoc
Normal file

@ -0,0 +1,126 @@
[[analysis-pattern-analyzer]]
=== Pattern Analyzer

An analyzer of type `pattern` that can flexibly separate text into terms
via a regular expression.

The following are settings that can be set for a `pattern` analyzer
type:

[cols="<,<",options="header",]
|===================================================================
|Setting |Description
|`lowercase` |Should terms be lowercased or not. Defaults to `true`.
|`pattern` |The regular expression pattern, defaults to `\W+`.
|`flags` |The regular expression flags.
|===================================================================

*IMPORTANT*: The regular expression should match the *token separators*,
not the tokens themselves.

Flags should be pipe-separated, eg `"CASE_INSENSITIVE|COMMENTS"`. Check
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#field_summary[Java
Pattern API] for more details about `flags` options.

[float]
==== Pattern Analyzer Examples

In order to try out these examples, you should delete the `test` index
before running each example:

[source,js]
--------------------------------------------------
curl -XDELETE localhost:9200/test
--------------------------------------------------

[float]
===== Whitespace tokenizer

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test' -d '
{
    "settings":{
        "analysis": {
            "analyzer": {
                "whitespace":{
                    "type": "pattern",
                    "pattern":"\\\\s+"
                }
            }
        }
    }
}'

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo,bar baz'
# "foo,bar", "baz"
--------------------------------------------------

[float]
===== Non-word character tokenizer

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test' -d '
{
    "settings":{
        "analysis": {
            "analyzer": {
                "nonword":{
                    "type": "pattern",
                    "pattern":"[^\\\\w]+"
                }
            }
        }
    }
}'

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'foo,bar baz'
# "foo,bar baz" becomes "foo", "bar", "baz"

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'type_1-type_4'
# "type_1","type_4"
--------------------------------------------------

[float]
===== CamelCase tokenizer

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test?pretty=1' -d '
{
    "settings":{
        "analysis": {
            "analyzer": {
                "camel":{
                    "type": "pattern",
                    "pattern":"([^\\\\p{L}\\\\d]+)|(?<=\\\\D)(?=\\\\d)|(?<=\\\\d)(?=\\\\D)|(?<=[\\\\p{L}&&[^\\\\p{Lu}]])(?=\\\\p{Lu})|(?<=\\\\p{Lu})(?=\\\\p{Lu}[\\\\p{L}&&[^\\\\p{Lu}]])"
                }
            }
        }
    }
}'

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=camel' -d '
MooseX::FTPClass2_beta
'
# "moose","x","ftp","class","2","beta"
--------------------------------------------------

The regex above is easier to understand as:

[source,js]
--------------------------------------------------
  ([^\\p{L}\\d]+)                 # swallow non letters and numbers,
| (?<=\\D)(?=\\d)                 # or non-number followed by number,
| (?<=\\d)(?=\\D)                 # or number followed by non-number,
| (?<=[ \\p{L} && [^\\p{Lu}]])    # or lower case
  (?=\\p{Lu})                     #   followed by upper case,
| (?<=\\p{Lu})                    # or upper case
  (?=\\p{Lu}                      #   followed by upper case
    [\\p{L}&&[^\\p{Lu}]]          #   then lower case
  )
--------------------------------------------------
@ -0,0 +1,6 @@
[[analysis-simple-analyzer]]
=== Simple Analyzer

An analyzer of type `simple` that is built using a
<<analysis-lowercase-tokenizer,Lower
Case Tokenizer>>.
63
docs/reference/analysis/analyzers/snowball-analyzer.asciidoc
Normal file

@ -0,0 +1,63 @@
[[analysis-snowball-analyzer]]
=== Snowball Analyzer

An analyzer of type `snowball` that uses the
<<analysis-standard-tokenizer,standard
tokenizer>>, with
<<analysis-standard-tokenfilter,standard
filter>>,
<<analysis-lowercase-tokenfilter,lowercase
filter>>,
<<analysis-stop-tokenfilter,stop
filter>>, and
<<analysis-snowball-tokenfilter,snowball
filter>>.

The Snowball Analyzer is a stemming analyzer from Lucene that is
originally based on the snowball project from
http://snowball.tartarus.org[snowball.tartarus.org].

Sample usage:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "type" : "snowball",
                    "language" : "English"
                }
            }
        }
    }
}
--------------------------------------------------

The `language` parameter can have the same values as the
<<analysis-snowball-tokenfilter,snowball
filter>> and defaults to `English`. Note that not all the language
analyzers have a default set of stopwords provided.

The `stopwords` parameter can be used to provide stopwords for the
languages that have no defaults, or to simply replace the default set
with your custom list. A default set of stopwords for many of these
languages is available from, for instance,
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/[here]
and
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball[here.]

A sample configuration (in YAML format) specifying Swedish with
stopwords:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_analyzer:
                type: snowball
                language: Swedish
                stopwords: "och,det,att,i,en,jag,hon,som,han,på,den,med,var,sig,för,så,till,är,men,ett,om,hade,de,av,icke,mig,du,henne,då,sin,nu,har,inte,hans,honom,skulle,hennes,där,min,man,ej,vid,kunde,något,från,ut,när,efter,upp,vi,dem,vara,vad,över,än,dig,kan,sina,här,ha,mot,alla,under,någon,allt,mycket,sedan,ju,denna,själv,detta,åt,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,några,deras,blir,mina,samma,vilken,er,sådan,vår,blivit,dess,inom,mellan,sådant,varför,varje,vilka,ditt,vem,vilket,sitta,sådana,vart,dina,vars,vårt,våra,ert,era,vilkas"
--------------------------------------------------
26
docs/reference/analysis/analyzers/standard-analyzer.asciidoc
Normal file

@ -0,0 +1,26 @@
[[analysis-standard-analyzer]]
=== Standard Analyzer

An analyzer of type `standard` that is built using a
<<analysis-standard-tokenizer,Standard
Tokenizer>>, with
<<analysis-standard-tokenfilter,Standard
Token Filter>>,
<<analysis-lowercase-tokenfilter,Lower
Case Token Filter>>, and
<<analysis-stop-tokenfilter,Stop
Token Filter>>.

The following are settings that can be set for a `standard` analyzer
type:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`stopwords` |A list of stopwords to initialize the stop filter with.
Defaults to the english stop words.

|`max_token_length` |The maximum token length. If a token is seen that
exceeds this length then it is discarded. Defaults to `255`.
|=======================================================================

docs/reference/analysis/analyzers/stop-analyzer.asciidoc
Normal file
21
docs/reference/analysis/analyzers/stop-analyzer.asciidoc
Normal file
|
@ -0,0 +1,21 @@
|
||||||
|
[[analysis-stop-analyzer]]
|
||||||
|
=== Stop Analyzer
|
||||||
|
|
||||||
|
An analyzer of type `stop` that is built using a
|
||||||
|
<<analysis-lowercase-tokenizer,Lower
|
||||||
|
Case Tokenizer>>, with
|
||||||
|
<<analysis-stop-tokenfilter,Stop
|
||||||
|
Token Filter>>.
|
||||||
|
|
||||||
|
The following are settings that can be set for a `stop` analyzer type:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`stopwords` |A list of stopword to initialize the stop filter with.
|
||||||
|
Defaults to the english stop words.
|
||||||
|
|
||||||
|
|`stopwords_path` |A path (either relative to `config` location, or
|
||||||
|
absolute) to a stopwords file configuration.
|
||||||
|
|=======================================================================
|
||||||
|
|
|
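For example, a `stop` analyzer with its own stopword list could be
registered like this (a minimal sketch; the name and list are only
placeholders):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_stop_analyzer :
                type : stop
                stopwords : [stop1, stop2]
--------------------------------------------------
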
@ -0,0 +1,6 @@
[[analysis-whitespace-analyzer]]
=== Whitespace Analyzer

An analyzer of type `whitespace` that is built using a
<<analysis-whitespace-tokenizer,Whitespace
Tokenizer>>.
16
docs/reference/analysis/charfilters.asciidoc
Normal file

@ -0,0 +1,16 @@
[[analysis-charfilters]]
== Character Filters

Character filters are used to preprocess the string of
characters before it is passed to the <<analysis-tokenizers,tokenizer>>.
A character filter may be used to strip out HTML markup, or to convert
`"&"` characters to the word `"and"`.

Elasticsearch has built in character filters which can be
used to build <<analysis-custom-analyzer,custom analyzers>>.

include::charfilters/mapping-charfilter.asciidoc[]

include::charfilters/htmlstrip-charfilter.asciidoc[]

include::charfilters/pattern-replace-charfilter.asciidoc[]
@ -0,0 +1,5 @@
[[analysis-htmlstrip-charfilter]]
=== HTML Strip Char Filter

A char filter of type `html_strip` stripping out HTML elements from an
analyzed text.
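A minimal sketch of wiring this char filter into a custom analyzer (the
analyzer name is only a placeholder):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_html_analyzer :
                type : custom
                tokenizer : standard
                char_filter : [html_strip]
--------------------------------------------------
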
@ -0,0 +1,38 @@
[[analysis-mapping-charfilter]]
=== Mapping Char Filter

A char filter of type `mapping` replacing characters of an analyzed text
with given mapping.

Here is a sample configuration:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "char_filter" : {
                "my_mapping" : {
                    "type" : "mapping",
                    "mappings" : ["ph=>f", "qu=>q"]
                }
            },
            "analyzer" : {
                "custom_with_char_filter" : {
                    "tokenizer" : "standard",
                    "char_filter" : ["my_mapping"]
                }
            }
        }
    }
}
--------------------------------------------------

Otherwise the setting `mappings_path` can specify a file where you can
put the list of char mappings:

[source,js]
--------------------------------------------------
ph => f
qu => k
--------------------------------------------------
@ -0,0 +1,37 @@
[[analysis-pattern-replace-charfilter]]
=== Pattern Replace Char Filter

The `pattern_replace` char filter allows the use of a regex to
manipulate the characters in a string before analysis. The regular
expression is defined using the `pattern` parameter, and the replacement
string can be provided using the `replacement` parameter (supporting
referencing the original text, as explained
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
For more information check the
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilter.html[lucene
documentation].

Here is a sample configuration:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "char_filter" : {
                "my_pattern":{
                    "type":"pattern_replace",
                    "pattern":"sample(.*)",
                    "replacement":"replacedSample $1"
                }
            },
            "analyzer" : {
                "custom_with_char_filter" : {
                    "tokenizer" : "standard",
                    "char_filter" : ["my_pattern"]
                }
            }
        }
    }
}
--------------------------------------------------
148
docs/reference/analysis/icu-plugin.asciidoc
Normal file

@ -0,0 +1,148 @@
[[analysis-icu-plugin]]
== ICU Analysis Plugin

The http://icu-project.org/[ICU] analysis plugin allows for unicode
normalization, collation and folding. The plugin is called
https://github.com/elasticsearch/elasticsearch-analysis-icu[elasticsearch-analysis-icu].

The plugin includes the following analysis components:

[float]
=== ICU Normalization

Normalizes characters as explained
http://userguide.icu-project.org/transforms/normalization[here]. It
registers itself by default under `icu_normalizer` or `icuNormalizer`
using the default settings. Allows for the `name` parameter to be provided,
which can be one of the following values: `nfc`, `nfkc`, and `nfkc_cf`.
Here are sample settings:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "normalization" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_normalizer"]
                }
            }
        }
    }
}
--------------------------------------------------

[float]
=== ICU Folding

Folding of unicode characters based on `UTR#30`. It registers itself
under `icu_folding` and `icuFolding` names.
The filter also does lowercasing, which means the lowercase filter can
normally be left out. Sample setting:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folding" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_folding"]
                }
            }
        }
    }
}
--------------------------------------------------

[float]
==== Filtering

The folding can be filtered by a set of unicode characters with the
parameter `unicodeSetFilter`. This is useful for a non-internationalized
search engine where retaining a set of national characters which are
primary letters in a specific language is wanted. See syntax for the
UnicodeSet
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html[here].

The following example exempts Swedish characters from the folding. Note
that the filtered characters are NOT lowercased, which is why we add the
lowercase filter below.

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folding" : {
                    "tokenizer" : "standard",
                    "filter" : ["my_icu_folding", "lowercase"]
                }
            },
            "filter" : {
                "my_icu_folding" : {
                    "type" : "icu_folding",
                    "unicodeSetFilter" : "[^åäöÅÄÖ]"
                }
            }
        }
    }
}
--------------------------------------------------

[float]
=== ICU Collation

Uses collation token filter. Allows to either specify the rules for
collation (defined
http://www.icu-project.org/userguide/Collate_Customization.html[here])
using the `rules` parameter (can point to a location or expressed in the
settings, location can be relative to config location), or using the
`language` parameter (further specialized by country and variant). By
default registers under `icu_collation` or `icuCollation` and uses the
default locale.

Here are sample settings:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_collation"]
                }
            }
        }
    }
}
--------------------------------------------------

And here is a sample of custom collation:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myCollator"]
                }
            },
            "filter" : {
                "myCollator" : {
                    "type" : "icu_collation",
                    "language" : "en"
                }
            }
        }
    }
}
--------------------------------------------------
71
docs/reference/analysis/tokenfilters.asciidoc
Normal file

@ -0,0 +1,71 @@
[[analysis-tokenfilters]]
== Token Filters

Token filters accept a stream of tokens from a
<<analysis-tokenizers,tokenizer>> and can modify tokens
(eg lowercasing), delete tokens (eg remove stopwords)
or add tokens (eg synonyms).

Elasticsearch has a number of built in token filters which can be
used to build <<analysis-custom-analyzer,custom analyzers>>.

include::tokenfilters/standard-tokenfilter.asciidoc[]

include::tokenfilters/asciifolding-tokenfilter.asciidoc[]

include::tokenfilters/length-tokenfilter.asciidoc[]

include::tokenfilters/lowercase-tokenfilter.asciidoc[]

include::tokenfilters/ngram-tokenfilter.asciidoc[]

include::tokenfilters/edgengram-tokenfilter.asciidoc[]

include::tokenfilters/porterstem-tokenfilter.asciidoc[]

include::tokenfilters/shingle-tokenfilter.asciidoc[]

include::tokenfilters/stop-tokenfilter.asciidoc[]

include::tokenfilters/word-delimiter-tokenfilter.asciidoc[]

include::tokenfilters/stemmer-tokenfilter.asciidoc[]

include::tokenfilters/stemmer-override-tokenfilter.asciidoc[]

include::tokenfilters/keyword-marker-tokenfilter.asciidoc[]

include::tokenfilters/keyword-repeat-tokenfilter.asciidoc[]

include::tokenfilters/kstem-tokenfilter.asciidoc[]

include::tokenfilters/snowball-tokenfilter.asciidoc[]

include::tokenfilters/phonetic-tokenfilter.asciidoc[]

include::tokenfilters/synonym-tokenfilter.asciidoc[]

include::tokenfilters/compound-word-tokenfilter.asciidoc[]

include::tokenfilters/reverse-tokenfilter.asciidoc[]

include::tokenfilters/elision-tokenfilter.asciidoc[]

include::tokenfilters/truncate-tokenfilter.asciidoc[]

include::tokenfilters/unique-tokenfilter.asciidoc[]

include::tokenfilters/pattern-capture-tokenfilter.asciidoc[]

include::tokenfilters/pattern_replace-tokenfilter.asciidoc[]

include::tokenfilters/trim-tokenfilter.asciidoc[]

include::tokenfilters/limit-token-count-tokenfilter.asciidoc[]

include::tokenfilters/hunspell-tokenfilter.asciidoc[]

include::tokenfilters/common-grams-tokenfilter.asciidoc[]

include::tokenfilters/normalization-tokenfilter.asciidoc[]
@ -0,0 +1,7 @@
[[analysis-asciifolding-tokenfilter]]
=== ASCII Folding Token Filter

A token filter of type `asciifolding` that converts alphabetic, numeric,
and symbolic Unicode characters which are not in the first 127 ASCII
characters (the "Basic Latin" Unicode block) into their ASCII
equivalents, if one exists.
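A minimal sketch of using the built in `asciifolding` filter in a custom
analyzer (the analyzer name is only a placeholder):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            folded :
                type : custom
                tokenizer : standard
                filter : [lowercase, asciifolding]
--------------------------------------------------
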
@ -0,0 +1,61 @@
[[analysis-common-grams-tokenfilter]]
=== Common Grams Token Filter

Token filter that generates bigrams for frequently occurring terms.
Single terms are still indexed. It can be used as an alternative to the
<<analysis-stop-tokenfilter,Stop
Token Filter>> when we don't want to completely ignore common terms.

For example, the text "the quick brown is a fox" will be tokenized as
"the", "the_quick", "quick", "brown", "brown_is", "is_a", "a_fox",
"fox", assuming "the", "is" and "a" are common words.

When `query_mode` is enabled, the token filter removes common words and
single terms followed by a common word. This parameter should be enabled
in the search analyzer.

For example, the query "the quick brown is a fox" will be tokenized as
"the_quick", "quick", "brown_is", "is_a", "a_fox", "fox".

The following are settings that can be set:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`common_words` |A list of common words to use.

|`common_words_path` |A path (either relative to `config` location, or
absolute) to a list of common words. Each word should be in its own
"line" (separated by a line break). The file must be UTF-8 encoded.

|`ignore_case` |If true, common words matching will be case insensitive
(defaults to `false`).

|`query_mode` |Generates bigrams then removes common words and single
terms followed by a common word (defaults to `false`).
|=======================================================================

Note, either the `common_words` or the `common_words_path` field is required.

Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            index_grams :
                tokenizer : whitespace
                filter : [common_grams]
            search_grams :
                tokenizer : whitespace
                filter : [common_grams_query]
        filter :
            common_grams :
                type : common_grams
                common_words: [a, an, the]
            common_grams_query :
                type : common_grams
                query_mode: true
                common_words: [a, an, the]
--------------------------------------------------
@ -0,0 +1,48 @@
[[analysis-compound-word-tokenfilter]]
=== Compound Word Token Filter

Token filters that allow to decompose compound words. There are two
types available: `dictionary_decompounder` and
`hyphenation_decompounder`.

The following are settings that can be set for a compound word token
filter type:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`word_list` |A list of words to use.

|`word_list_path` |A path (either relative to `config` location, or
absolute) to a list of words.

|`min_word_size` |Minimum word size (Integer). Defaults to 5.

|`min_subword_size` |Minimum subword size (Integer). Defaults to 2.

|`max_subword_size` |Maximum subword size (Integer). Defaults to 15.

|`only_longest_match` |Whether to only keep the longest match (Boolean).
Defaults to `false`.
|=======================================================================

Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : standard
                filter : [myTokenFilter1, myTokenFilter2]
        filter :
            myTokenFilter1 :
                type : dictionary_decompounder
                word_list: [one, two, three]
            myTokenFilter2 :
                type : hyphenation_decompounder
                word_list_path: path/to/words.txt
                max_subword_size : 22
--------------------------------------------------
@ -0,0 +1,16 @@
[[analysis-edgengram-tokenfilter]]
=== Edge NGram Token Filter

A token filter of type `edgeNGram`.

The following are settings that can be set for an `edgeNGram` token
filter type:

[cols="<,<",options="header",]
|======================================================
|Setting |Description
|`min_gram` |Defaults to `1`.
|`max_gram` |Defaults to `2`.
|`side` |Either `front` or `back`. Defaults to `front`.
|======================================================

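For example, an `edgeNGram` filter configured from the settings above might
be wired into a custom analyzer like this (a minimal sketch; the analyzer
and filter names are only placeholders):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            autocomplete :
                type : custom
                tokenizer : standard
                filter : [lowercase, my_edge_ngram]
        filter :
            my_edge_ngram :
                type : edgeNGram
                min_gram : 1
                max_gram : 5
                side : front
--------------------------------------------------
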
@ -0,0 +1,28 @@
[[analysis-elision-tokenfilter]]
=== Elision Token Filter

A token filter which removes elisions. For example, "l'avion" (the
plane) will be tokenized as "avion" (plane).

Accepts an `articles` setting which is a set of stop word articles. For
example:

[source,js]
--------------------------------------------------
"index" : {
    "analysis" : {
        "analyzer" : {
            "default" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "elision"]
            }
        },
        "filter" : {
            "elision" : {
                "type" : "elision",
                "articles" : ["l", "m", "t", "qu", "n", "s", "j"]
            }
        }
    }
}
--------------------------------------------------
@ -0,0 +1,116 @@
[[analysis-hunspell-tokenfilter]]
=== Hunspell Token Filter

Basic support for hunspell stemming. Hunspell dictionaries will be
picked up from a dedicated hunspell directory on the filesystem
(defaults to `<path.conf>/hunspell`). Each dictionary is expected to
have its own directory named after its associated locale (language).
This dictionary directory is expected to hold both the \*.aff and \*.dic
files (all of which will automatically be picked up). For example,
assuming the default hunspell location is used, the following directory
layout will define the `en_US` dictionary:

[source,js]
--------------------------------------------------
- conf
    |-- hunspell
    |    |-- en_US
    |    |    |-- en_US.dic
    |    |    |-- en_US.aff
--------------------------------------------------

The location of the hunspell directory can be configured using the
`indices.analysis.hunspell.dictionary.location` setting in
_elasticsearch.yml_.

Each dictionary can be configured with two settings:

`ignore_case`::
    If true, dictionary matching will be case insensitive
    (defaults to `false`)

`strict_affix_parsing`::
    Determines whether errors while reading an
    affix rules file will cause an exception or simply be ignored (defaults to
    `true`)

These settings can be configured globally in `elasticsearch.yml` using

* `indices.analysis.hunspell.dictionary.ignore_case` and
* `indices.analysis.hunspell.dictionary.strict_affix_parsing`

or for specific dictionaries:

* `indices.analysis.hunspell.dictionary.en_US.ignore_case` and
* `indices.analysis.hunspell.dictionary.en_US.strict_affix_parsing`.

It is also possible to add a `settings.yml` file under the dictionary
directory which holds these settings (this will override any other
settings defined in `elasticsearch.yml`).

One can use the hunspell stem filter by configuring it in the analysis
settings:

[source,js]
--------------------------------------------------
{
    "analysis" : {
        "analyzer" : {
            "en" : {
                "tokenizer" : "standard",
                "filter" : [ "lowercase", "en_US" ]
            }
        },
        "filter" : {
            "en_US" : {
                "type" : "hunspell",
                "locale" : "en_US",
                "dedup" : true
            }
        }
    }
}
--------------------------------------------------

The hunspell token filter accepts four options:

`locale`::
    A locale for this filter. If this is unset, the `lang` or
    `language` are used instead - so one of these has to be set.

`dictionary`::
    The name of a dictionary. The path to your hunspell
    dictionaries should be configured via
    `indices.analysis.hunspell.dictionary.location` before.

`dedup`::
    If only unique terms should be returned, this needs to be
    set to `true`. Defaults to `true`.

`recursion_level`::
    Configures the recursion level a
    stemmer can go into. Defaults to `2`. Some languages (for example czech)
    give better results when set to `1` or `0`, so you should test it out.
    (since 0.90.3)

NOTE: As opposed to the snowball stemmers (which are algorithm based),
this is a dictionary lookup based stemmer and therefore the quality of
the stemming is determined by the quality of the dictionary.

[float]
==== References

Hunspell is a spell checker and morphological analyzer designed for
languages with rich morphology and complex word compounding and
character encoding.

1. Wikipedia, http://en.wikipedia.org/wiki/Hunspell

2. Source code, http://hunspell.sourceforge.net/

3. Open Office Hunspell dictionaries, http://wiki.openoffice.org/wiki/Dictionaries

4. Mozilla Hunspell dictionaries, https://addons.mozilla.org/en-US/firefox/language-tools/

5. Chromium Hunspell dictionaries,
http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/
@ -0,0 +1,34 @@
[[analysis-keyword-marker-tokenfilter]]
=== Keyword Marker Token Filter

Protects words from being modified by stemmers. Must be placed before
any stemming filters.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`keywords` |A list of words to use.

|`keywords_path` |A path (either relative to `config` location, or
absolute) to a list of words.

|`ignore_case` |Set to `true` to lower case all words first. Defaults to
`false`.
|=======================================================================

Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer :
                type : custom
                tokenizer : standard
                filter : [lowercase, protwords, porterStem]
        filter :
            protwords :
                type : keyword_marker
                keywords_path : analysis/protwords.txt
--------------------------------------------------
@ -0,0 +1,28 @@
[[analysis-keyword-repeat-tokenfilter]]
=== Keyword Repeat Token Filter

The `keyword_repeat` token filter emits each incoming token twice, once
as a keyword and once as a non-keyword, to allow an un-stemmed version of a
term to be indexed side by side with the stemmed version of the term.
Given the nature of this filter, each token that isn't transformed by a
subsequent stemmer will be indexed twice. Therefore, consider adding a
`unique` filter with `only_on_same_position` set to `true` to drop
unnecessary duplicates.

Note: this is available from `0.90.0.Beta2` on.

Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer :
                type : custom
                tokenizer : standard
                filter : [lowercase, keyword_repeat, porterStem, unique_stem]
        filter :
            unique_stem:
                type: unique
                only_on_same_position : true
--------------------------------------------------
@ -0,0 +1,6 @@
[[analysis-kstem-tokenfilter]]
=== KStem Token Filter

The `kstem` token filter is a high performance filter for English. All
terms must already be lowercased (use the `lowercase` filter) for this
filter to work correctly.
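A minimal sketch of an analyzer that lowercases terms before `kstem`, as
required above (the analyzer name is only a placeholder):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_english :
                type : custom
                tokenizer : standard
                filter : [lowercase, kstem]
--------------------------------------------------
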
@ -0,0 +1,16 @@
[[analysis-length-tokenfilter]]
=== Length Token Filter

A token filter of type `length` that removes words that are too long or
too short for the stream.

The following are settings that can be set for a `length` token filter
type:

[cols="<,<",options="header",]
|===========================================================
|Setting |Description
|`min` |The minimum number. Defaults to `0`.
|`max` |The maximum number. Defaults to `Integer.MAX_VALUE`.
|===========================================================

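For example, a `length` filter that keeps only terms between 2 and 20
characters could be configured like this (a minimal sketch; the analyzer
and filter names are only placeholders):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_analyzer :
                type : custom
                tokenizer : standard
                filter : [lowercase, my_length]
        filter :
            my_length :
                type : length
                min : 2
                max : 20
--------------------------------------------------
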
@ -0,0 +1,32 @@
[[analysis-limit-token-count-tokenfilter]]
=== Limit Token Count Token Filter

Limits the number of tokens that are indexed per document and field.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_token_count` |The maximum number of tokens that should be indexed
per document and field. The default is `1`.

|`consume_all_tokens` |If set to `true` the filter exhausts the stream
even if `max_token_count` tokens have been consumed already. The default
is `false`.
|=======================================================================

Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer :
                type : custom
                tokenizer : standard
                filter : [lowercase, five_token_limit]
        filter :
            five_token_limit :
                type : limit
                max_token_count : 5
--------------------------------------------------
@ -0,0 +1,37 @@
[[analysis-lowercase-tokenfilter]]
=== Lowercase Token Filter

A token filter of type `lowercase` that normalizes token text to lower
case.

The lowercase token filter supports Greek and Turkish lowercasing
through the `language` parameter. Below is a usage example in a
custom analyzer:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : myTokenizer1
                filter : [myTokenFilter1, myGreekLowerCaseFilter]
                char_filter : [my_html]
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myGreekLowerCaseFilter :
                type : lowercase
                language : greek
        char_filter :
            my_html :
                type : html_strip
                escaped_tags : [xxx, yyy]
                read_ahead : 1024
--------------------------------------------------
@ -0,0 +1,15 @@
|
||||||
|
[[analysis-ngram-tokenfilter]]
|
||||||
|
=== NGram Token Filter
|
||||||
|
|
||||||
|
A token filter of type `nGram`.
|
||||||
|
|
||||||
|
The following are settings that can be set for a `nGram` token filter
|
||||||
|
type:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|============================
|
||||||
|
|Setting |Description
|
||||||
|
|`min_gram` |The minimum n-gram length. Defaults to `1`.
|
||||||
|
|`max_gram` |The maximum n-gram length. Defaults to `2`.
|
||||||
|
|============================
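
As a sketch (analyzer and filter names are illustrative), an analyzer
that emits 2- and 3-character grams from lowercased tokens could look
like:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_ngram"]
                }
            },
            "filter" : {
                "my_ngram" : {
                    "type" : "nGram",
                    "min_gram" : 2,
                    "max_gram" : 3
                }
            }
        }
    }
}
--------------------------------------------------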
|
||||||
|
|
|
@ -0,0 +1,15 @@
|
||||||
|
[[analysis-normalization-tokenfilter]]
|
||||||
|
=== Normalization Token Filter
|
||||||
|
|
||||||
|
There are several token filters available which try to normalize special
|
||||||
|
characters of a certain language.
|
||||||
|
|
||||||
|
You can currently choose between `arabic_normalization` and
|
||||||
|
`persian_normalization` normalization in your token filter
|
||||||
|
configuration. For more information check the
|
||||||
|
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizer.html[ArabicNormalizer]
|
||||||
|
or the
|
||||||
|
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizer.html[PersianNormalizer]
|
||||||
|
documentation.
|
||||||
|
|
||||||
|
*Note:* These filters are available since `0.90.2`.
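
As a minimal sketch (the analyzer name is illustrative), one of these
filters is simply listed in a custom analyzer's filter chain:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_arabic_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "arabic_normalization"]
                }
            }
        }
    }
}
--------------------------------------------------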
|
|
@ -0,0 +1,134 @@
|
||||||
|
[[analysis-pattern-capture-tokenfilter]]
|
||||||
|
=== Pattern Capture Token Filter
|
||||||
|
|
||||||
|
The `pattern_capture` token filter, unlike the `pattern` tokenizer,
|
||||||
|
emits a token for every capture group in the regular expression.
|
||||||
|
Patterns are not anchored to the beginning and end of the string, so
|
||||||
|
each pattern can match multiple times, and matches are allowed to
|
||||||
|
overlap.
|
||||||
|
|
||||||
|
For instance, a pattern like:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
"(([a-z]+)(\d*))"
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
when matched against:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
"abc123def456"
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
would produce the tokens: [ `abc123`, `abc`, `123`, `def456`, `def`,
|
||||||
|
`456` ]
|
||||||
|
|
||||||
|
If `preserve_original` is set to `true` (the default) then it would also
|
||||||
|
emit the original token: `abc123def456`.
|
||||||
|
|
||||||
|
This is particularly useful for indexing text like camel-case code, e.g.
|
||||||
|
`stripHTML` where a user may search for `"strip html"` or `"striphtml"`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
curl -XPUT localhost:9200/test/ -d '
|
||||||
|
{
|
||||||
|
"settings" : {
|
||||||
|
"analysis" : {
|
||||||
|
"filter" : {
|
||||||
|
"code" : {
|
||||||
|
"type" : "pattern_capture",
|
||||||
|
"preserve_original" : 1,
|
||||||
|
"patterns" : [
|
||||||
|
"(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)",
|
||||||
|
"(\\d+)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"analyzer" : {
|
||||||
|
"code" : {
|
||||||
|
"tokenizer" : "pattern",
|
||||||
|
"filter" : [ "code", "lowercase" ]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
When used to analyze the text
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
this emits the tokens: [ `import`, `static`, `org`, `apache`, `commons`,
|
||||||
|
`lang`, `stringescapeutils`, `string`, `escape`, `utils`, `escapehtml`,
|
||||||
|
`escape`, `html` ]
|
||||||
|
|
||||||
|
Another example is analyzing email addresses:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
curl -XPUT localhost:9200/test/ -d '
|
||||||
|
{
|
||||||
|
"settings" : {
|
||||||
|
"analysis" : {
|
||||||
|
"filter" : {
|
||||||
|
"email" : {
|
||||||
|
"type" : "pattern_capture",
|
||||||
|
"preserve_original" : 1,
|
||||||
|
"patterns" : [
|
||||||
|
"(\\w+)",
|
||||||
|
"(\\p{L}+)",
|
||||||
|
"(\\d+)",
|
||||||
|
"@(.+)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"analyzer" : {
|
||||||
|
"email" : {
|
||||||
|
"tokenizer" : "uax_url_email",
|
||||||
|
"filter" : [ "email", "lowercase", "unique" ]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
When the above analyzer is used on an email address like:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
john-smith_123@foo-bar.com
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
it would produce the following tokens: [ `john-smith_123`,
|
||||||
|
`foo-bar.com`, `john`, `smith_123`, `smith`, `123`, `foo`,
|
||||||
|
`foo-bar.com`, `bar`, `com` ]
|
||||||
|
|
||||||
|
Multiple patterns are required to allow overlapping captures, but also
|
||||||
|
means that patterns are less dense and easier to understand.
|
||||||
|
|
||||||
|
*Note:* All tokens are emitted in the same position, and with the same
|
||||||
|
character offsets, so when combined with highlighting, the whole
|
||||||
|
original token will be highlighted, not just the matching subset. For
|
||||||
|
instance, querying the above email address for `"smith"` would
|
||||||
|
highlight:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
<em>john-smith_123@foo-bar.com</em>
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
not:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
john-<em>smith</em>_123@foo-bar.com
|
||||||
|
--------------------------------------------------
|
|
@ -0,0 +1,9 @@
|
||||||
|
[[analysis-pattern_replace-tokenfilter]]
|
||||||
|
=== Pattern Replace Token Filter
|
||||||
|
|
||||||
|
The `pattern_replace` token filter allows to easily handle string
|
||||||
|
replacements based on a regular expression. The regular expression is
|
||||||
|
defined using the `pattern` parameter, and the replacement string can be
|
||||||
|
provided using the `replacement` parameter (supporting referencing the
|
||||||
|
original text, as explained
|
||||||
|
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
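
For example, the following sketch (filter and analyzer names are
illustrative) replaces every run of digits inside a token with a single
`#` character:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["lowercase", "my_digit_mask"]
                }
            },
            "filter" : {
                "my_digit_mask" : {
                    "type" : "pattern_replace",
                    "pattern" : "\\d+",
                    "replacement" : "#"
                }
            }
        }
    }
}
--------------------------------------------------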
|
|
@ -0,0 +1,5 @@
|
||||||
|
[[analysis-phonetic-tokenfilter]]
|
||||||
|
=== Phonetic Token Filter
|
||||||
|
|
||||||
|
The `phonetic` token filter is provided as a plugin and located
|
||||||
|
https://github.com/elasticsearch/elasticsearch-analysis-phonetic[here].
|
|
@ -0,0 +1,15 @@
|
||||||
|
[[analysis-porterstem-tokenfilter]]
|
||||||
|
=== Porter Stem Token Filter
|
||||||
|
|
||||||
|
A token filter of type `porterStem` that transforms the token stream as
|
||||||
|
per the Porter stemming algorithm.
|
||||||
|
|
||||||
|
Note, the input to the stemming filter must already be in lower case, so
|
||||||
|
you will need to use
|
||||||
|
<<analysis-lowercase-tokenfilter,Lower
|
||||||
|
Case Token Filter>> or
|
||||||
|
<<analysis-lowercase-tokenizer,Lower
|
||||||
|
Case Tokenizer>> farther down the Tokenizer chain in order for this to
|
||||||
|
work properly. For example, when using a custom analyzer, make sure the
|
||||||
|
`lowercase` filter comes before the `porterStem` filter in the list of
|
||||||
|
filters.
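
A minimal sketch of such an analyzer (the analyzer name is illustrative)
could be:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "porterStem"]
                }
            }
        }
    }
}
--------------------------------------------------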
|
|
@ -0,0 +1,4 @@
|
||||||
|
[[analysis-reverse-tokenfilter]]
|
||||||
|
=== Reverse Token Filter
|
||||||
|
|
||||||
|
A token filter of type `reverse` that simply reverses each token.
|
|
@ -0,0 +1,36 @@
|
||||||
|
[[analysis-shingle-tokenfilter]]
|
||||||
|
=== Shingle Token Filter
|
||||||
|
|
||||||
|
A token filter of type `shingle` that constructs shingles (token
|
||||||
|
n-grams) from a token stream. In other words, it creates combinations of
|
||||||
|
tokens as a single token. For example, the sentence "please divide this
|
||||||
|
sentence into shingles" might be tokenized into shingles "please
|
||||||
|
divide", "divide this", "this sentence", "sentence into", and "into
|
||||||
|
shingles".
|
||||||
|
|
||||||
|
This filter handles position increments > 1 by inserting filler tokens
|
||||||
|
(tokens with termtext "_"). It does not handle a position increment of
|
||||||
|
0.
|
||||||
|
|
||||||
|
The following are settings that can be set for a `shingle` token filter
|
||||||
|
type:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`max_shingle_size` |The maximum shingle size. Defaults to `2`.
|
||||||
|
|
||||||
|
|`min_shingle_size` |The minimum shingle size. Defaults to `2`.
|
||||||
|
|
||||||
|
|`output_unigrams` |If `true` the output will contain the input tokens
|
||||||
|
(unigrams) as well as the shingles. Defaults to `true`.
|
||||||
|
|
||||||
|
|`output_unigrams_if_no_shingles` |If `output_unigrams` is `false` the
|
||||||
|
output will contain the input tokens (unigrams) if no shingles are
|
||||||
|
available. Note if `output_unigrams` is set to `true` this setting has
|
||||||
|
no effect. Defaults to `false`.
|
||||||
|
|
||||||
|
|`token_separator` |The string to use when joining adjacent tokens to
|
||||||
|
form a shingle. Defaults to `" "`.
|
||||||
|
|=======================================================================
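
For example, an analyzer that emits unigrams together with two- and
three-word shingles could be sketched as follows (analyzer and filter
names are illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_shingle_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_shingles"]
                }
            },
            "filter" : {
                "my_shingles" : {
                    "type" : "shingle",
                    "min_shingle_size" : 2,
                    "max_shingle_size" : 3,
                    "output_unigrams" : true
                }
            }
        }
    }
}
--------------------------------------------------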
|
||||||
|
|
|
@ -0,0 +1,33 @@
|
||||||
|
[[analysis-snowball-tokenfilter]]
|
||||||
|
=== Snowball Token Filter
|
||||||
|
|
||||||
|
A filter that stems words using a Snowball-generated stemmer. The
|
||||||
|
`language` parameter controls the stemmer with the following available
|
||||||
|
values: `Armenian`, `Basque`, `Catalan`, `Danish`, `Dutch`, `English`,
|
||||||
|
`Finnish`, `French`, `German`, `German2`, `Hungarian`, `Italian`, `Kp`,
|
||||||
|
`Lovins`, `Norwegian`, `Porter`, `Portuguese`, `Romanian`, `Russian`,
|
||||||
|
`Spanish`, `Swedish`, `Turkish`.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"index" : {
|
||||||
|
"analysis" : {
|
||||||
|
"analyzer" : {
|
||||||
|
"my_analyzer" : {
|
||||||
|
"tokenizer" : "standard",
|
||||||
|
"filter" : ["standard", "lowercase", "my_snow"]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"filter" : {
|
||||||
|
"my_snow" : {
|
||||||
|
"type" : "snowball",
|
||||||
|
"language" : "Lovins"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
|
@ -0,0 +1,7 @@
|
||||||
|
[[analysis-standard-tokenfilter]]
|
||||||
|
=== Standard Token Filter
|
||||||
|
|
||||||
|
A token filter of type `standard` that normalizes tokens extracted with
|
||||||
|
the
|
||||||
|
<<analysis-standard-tokenizer,Standard
|
||||||
|
Tokenizer>>.
|
|
@ -0,0 +1,34 @@
|
||||||
|
[[analysis-stemmer-override-tokenfilter]]
|
||||||
|
=== Stemmer Override Token Filter
|
||||||
|
|
||||||
|
Overrides stemming algorithms by applying a custom mapping, then
|
||||||
|
protecting these terms from being modified by stemmers. Must be placed
|
||||||
|
before any stemming filters.
|
||||||
|
|
||||||
|
Rules are separated by "=>"
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`rules` |A list of mapping rules to use.
|
||||||
|
|
||||||
|
|`rules_path` |A path (either relative to `config` location, or
|
||||||
|
absolute) to a list of mappings.
|
||||||
|
|=======================================================================
|
||||||
|
|
||||||
|
Here is an example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
index :
|
||||||
|
analysis :
|
||||||
|
analyzer :
|
||||||
|
myAnalyzer :
|
||||||
|
type : custom
|
||||||
|
tokenizer : standard
|
||||||
|
filter : [lowercase, custom_stems, porterStem]
|
||||||
|
filter:
|
||||||
|
custom_stems:
|
||||||
|
type: stemmer_override
|
||||||
|
rules_path : analysis/custom_stems.txt
|
||||||
|
--------------------------------------------------
|
|
@ -0,0 +1,78 @@
|
||||||
|
[[analysis-stemmer-tokenfilter]]
|
||||||
|
=== Stemmer Token Filter
|
||||||
|
|
||||||
|
A filter that stems words (similar to `snowball`, but with more
|
||||||
|
options). The `language`/`name` parameter controls the stemmer with the
|
||||||
|
following available values:
|
||||||
|
|
||||||
|
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Far%2FArabicStemmer.html[arabic],
|
||||||
|
http://snowball.tartarus.org/algorithms/armenian/stemmer.html[armenian],
|
||||||
|
http://snowball.tartarus.org/algorithms/basque/stemmer.html[basque],
|
||||||
|
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fbr%2FBrazilianStemmer.html[brazilian],
|
||||||
|
http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf[bulgarian],
|
||||||
|
http://snowball.tartarus.org/algorithms/catalan/stemmer.html[catalan],
|
||||||
|
http://portal.acm.org/citation.cfm?id=1598600[czech],
|
||||||
|
http://snowball.tartarus.org/algorithms/danish/stemmer.html[danish],
|
||||||
|
http://snowball.tartarus.org/algorithms/dutch/stemmer.html[dutch],
|
||||||
|
http://snowball.tartarus.org/algorithms/english/stemmer.html[english],
|
||||||
|
http://snowball.tartarus.org/algorithms/finnish/stemmer.html[finnish],
|
||||||
|
http://snowball.tartarus.org/algorithms/french/stemmer.html[french],
|
||||||
|
http://snowball.tartarus.org/algorithms/german/stemmer.html[german],
|
||||||
|
http://snowball.tartarus.org/algorithms/german2/stemmer.html[german2],
|
||||||
|
http://sais.se/mthprize/2007/ntais2007.pdf[greek],
|
||||||
|
http://snowball.tartarus.org/algorithms/hungarian/stemmer.html[hungarian],
|
||||||
|
http://snowball.tartarus.org/algorithms/italian/stemmer.html[italian],
|
||||||
|
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html[kp],
|
||||||
|
http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[kstem],
|
||||||
|
http://snowball.tartarus.org/algorithms/lovins/stemmer.html[lovins],
|
||||||
|
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Flv%2FLatvianStemmer.html[latvian],
|
||||||
|
http://snowball.tartarus.org/algorithms/norwegian/stemmer.html[norwegian],
|
||||||
|
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fno%2FNorwegianMinimalStemFilter.html[minimal_norwegian],
|
||||||
|
http://snowball.tartarus.org/algorithms/porter/stemmer.html[porter],
|
||||||
|
http://snowball.tartarus.org/algorithms/portuguese/stemmer.html[portuguese],
|
||||||
|
http://snowball.tartarus.org/algorithms/romanian/stemmer.html[romanian],
|
||||||
|
http://snowball.tartarus.org/algorithms/russian/stemmer.html[russian],
|
||||||
|
http://snowball.tartarus.org/algorithms/spanish/stemmer.html[spanish],
|
||||||
|
http://snowball.tartarus.org/algorithms/swedish/stemmer.html[swedish],
|
||||||
|
http://snowball.tartarus.org/algorithms/turkish/stemmer.html[turkish],
|
||||||
|
http://www.medialab.tfe.umu.se/courses/mdm0506a/material/fulltext_ID%3D10049387%26PLACEBO%3DIE.pdf[minimal_english],
|
||||||
|
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fen%2FEnglishPossessiveFilter.html[possessive_english],
|
||||||
|
http://clef.isti.cnr.it/2003/WN_web/22.pdf[light_finish],
|
||||||
|
http://dl.acm.org/citation.cfm?id=1141523[light_french],
|
||||||
|
http://dl.acm.org/citation.cfm?id=318984[minimal_french],
|
||||||
|
http://dl.acm.org/citation.cfm?id=1141523[light_german],
|
||||||
|
http://members.unine.ch/jacques.savoy/clef/morpho.pdf[minimal_german],
|
||||||
|
http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf[hindi],
|
||||||
|
http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[light_hungarian],
|
||||||
|
http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf[indonesian],
|
||||||
|
http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[light_italian],
|
||||||
|
http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[light_portuguese],
|
||||||
|
http://www.inf.ufrgs.br/\~buriol/papers/Orengo_CLEF07.pdf[minimal_portuguese],
|
||||||
|
http://www.inf.ufrgs.br/\~viviane/rslp/index.htm[portuguese],
|
||||||
|
http://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf[light_russian],
|
||||||
|
http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[light_spanish],
|
||||||
|
http://clef.isti.cnr.it/2003/WN_web/22.pdf[light_swedish].
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"index" : {
|
||||||
|
"analysis" : {
|
||||||
|
"analyzer" : {
|
||||||
|
"my_analyzer" : {
|
||||||
|
"tokenizer" : "standard",
|
||||||
|
"filter" : ["standard", "lowercase", "my_stemmer"]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"filter" : {
|
||||||
|
"my_stemmer" : {
|
||||||
|
"type" : "stemmer",
|
||||||
|
"name" : "light_german"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
|
@ -0,0 +1,33 @@
|
||||||
|
[[analysis-stop-tokenfilter]]
|
||||||
|
=== Stop Token Filter
|
||||||
|
|
||||||
|
A token filter of type `stop` that removes stop words from token
|
||||||
|
streams.
|
||||||
|
|
||||||
|
The following are settings that can be set for a `stop` token filter
|
||||||
|
type:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`stopwords` |A list of stop words to use. Defaults to English stop
|
||||||
|
words.
|
||||||
|
|
||||||
|
|`stopwords_path` |A path (either relative to `config` location, or
|
||||||
|
absolute) to a stopwords file configuration. Each stop word should be in
|
||||||
|
its own "line" (separated by a line break). The file must be UTF-8
|
||||||
|
encoded.
|
||||||
|
|
||||||
|
|`enable_position_increments` |Set to `true` if token positions should
|
||||||
|
record the removed stop words, `false` otherwise. Defaults to `true`.
|
||||||
|
|
||||||
|
|`ignore_case` |Set to `true` to lower case all words first. Defaults to
|
||||||
|
`false`.
|
||||||
|
|=======================================================================
|
||||||
|
|
||||||
|
The `stopwords` setting also allows for language-specific expansion of default
|
||||||
|
stopwords. It follows the `_lang_` notation and supports: arabic,
|
||||||
|
armenian, basque, brazilian, bulgarian, catalan, czech, danish, dutch,
|
||||||
|
english, finnish, french, galician, german, greek, hindi, hungarian,
|
||||||
|
indonesian, italian, norwegian, persian, portuguese, romanian, russian,
|
||||||
|
spanish, swedish, turkish.
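
For example, a `stop` filter with a small custom stop word list could be
sketched like this (analyzer and filter names are illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_stop"]
                }
            },
            "filter" : {
                "my_stop" : {
                    "type" : "stop",
                    "stopwords" : ["the", "a", "an"],
                    "ignore_case" : true
                }
            }
        }
    }
}
--------------------------------------------------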
|
|
@ -0,0 +1,124 @@
|
||||||
|
[[analysis-synonym-tokenfilter]]
|
||||||
|
=== Synonym Token Filter
|
||||||
|
|
||||||
|
The `synonym` token filter allows to easily handle synonyms during the
|
||||||
|
analysis process. Synonyms are configured using a configuration file.
|
||||||
|
Here is an example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"index" : {
|
||||||
|
"analysis" : {
|
||||||
|
"analyzer" : {
|
||||||
|
"synonym" : {
|
||||||
|
"tokenizer" : "whitespace",
|
||||||
|
"filter" : ["synonym"]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"filter" : {
|
||||||
|
"synonym" : {
|
||||||
|
"type" : "synonym",
|
||||||
|
"synonyms_path" : "analysis/synonym.txt"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The above configures a `synonym` filter, with a path of
|
||||||
|
`analysis/synonym.txt` (relative to the `config` location). The
|
||||||
|
`synonym` analyzer is then configured with the filter. Additional
|
||||||
|
settings are: `ignore_case` (defaults to `false`), and `expand`
|
||||||
|
(defaults to `true`).
|
||||||
|
|
||||||
|
The `tokenizer` parameter controls the tokenizer that will be used to
|
||||||
|
tokenize the synonym, and defaults to the `whitespace` tokenizer.
|
||||||
|
|
||||||
|
As of Elasticsearch 0.17.9, two synonym formats are supported: Solr and
|
||||||
|
WordNet.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Solr synonyms
|
||||||
|
|
||||||
|
The following is a sample format of the file:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
# blank lines and lines starting with pound are comments.
|
||||||
|
|
||||||
|
#Explicit mappings match any token sequence on the LHS of "=>"
|
||||||
|
#and replace with all alternatives on the RHS. These types of mappings
|
||||||
|
#ignore the expand parameter in the schema.
|
||||||
|
#Examples:
|
||||||
|
i-pod, i pod => ipod,
|
||||||
|
sea biscuit, sea biscit => seabiscuit
|
||||||
|
|
||||||
|
#Equivalent synonyms may be separated with commas and give
|
||||||
|
#no explicit mapping. In this case the mapping behavior will
|
||||||
|
#be taken from the expand parameter in the schema. This allows
|
||||||
|
#the same synonym file to be used in different synonym handling strategies.
|
||||||
|
#Examples:
|
||||||
|
ipod, i-pod, i pod
|
||||||
|
foozball , foosball
|
||||||
|
universe , cosmos
|
||||||
|
|
||||||
|
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
|
||||||
|
ipod, i-pod, i pod => ipod, i-pod, i pod
|
||||||
|
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
|
||||||
|
ipod, i-pod, i pod => ipod
|
||||||
|
|
||||||
|
#multiple synonym mapping entries are merged.
|
||||||
|
foo => foo bar
|
||||||
|
foo => baz
|
||||||
|
#is equivalent to
|
||||||
|
foo => foo bar, baz
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
You can also define synonyms for the filter directly in the
|
||||||
|
configuration file (note use of `synonyms` instead of `synonyms_path`):
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"filter" : {
|
||||||
|
"synonym" : {
|
||||||
|
"type" : "synonym",
|
||||||
|
"synonyms" : [
|
||||||
|
"i-pod, i pod => ipod",
|
||||||
|
"universe, cosmos"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
However, it is recommended to define large synonym sets in a file using
|
||||||
|
`synonyms_path`.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== WordNet synonyms
|
||||||
|
|
||||||
|
Synonyms based on http://wordnet.princeton.edu/[WordNet] format can be
|
||||||
|
declared using `format`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"filter" : {
|
||||||
|
"synonym" : {
|
||||||
|
"type" : "synonym",
|
||||||
|
"format" : "wordnet",
|
||||||
|
"synonyms" : [
|
||||||
|
"s(100000001,1,'abstain',v,1,0).",
|
||||||
|
"s(100000001,2,'refrain',v,1,0).",
|
||||||
|
"s(100000001,3,'desist',v,1,0)."
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Using `synonyms_path` to define WordNet synonyms in a file is supported
|
||||||
|
as well.
|
|
@ -0,0 +1,4 @@
|
||||||
|
[[analysis-trim-tokenfilter]]
|
||||||
|
=== Trim Token Filter
|
||||||
|
|
||||||
|
The `trim` token filter trims the whitespace surrounding a token.
|
|
@ -0,0 +1,10 @@
|
||||||
|
[[analysis-truncate-tokenfilter]]
|
||||||
|
=== Truncate Token Filter
|
||||||
|
|
||||||
|
The `truncate` token filter can be used to truncate tokens into a
|
||||||
|
specific length. This can come in handy with keyword (single token)
|
||||||
|
based mapped fields that are used for sorting in order to reduce memory
|
||||||
|
usage.
|
||||||
|
|
||||||
|
It accepts a `length` parameter which controls the number of characters
|
||||||
|
to truncate to. It defaults to `10`.
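
A minimal sketch (analyzer and filter names are illustrative) truncating
keyword tokens to 5 characters:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "keyword",
                    "filter" : ["lowercase", "my_truncate"]
                }
            },
            "filter" : {
                "my_truncate" : {
                    "type" : "truncate",
                    "length" : 5
                }
            }
        }
    }
}
--------------------------------------------------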
|
|
@ -0,0 +1,7 @@
|
||||||
|
[[analysis-unique-tokenfilter]]
|
||||||
|
=== Unique Token Filter
|
||||||
|
|
||||||
|
The `unique` token filter can be used to only index unique tokens during
|
||||||
|
analysis. By default it is applied to the whole token stream. If
|
||||||
|
`only_on_same_position` is set to `true`, it will only remove duplicate
|
||||||
|
tokens on the same position.
|
|
@ -0,0 +1,80 @@
|
||||||
|
[[analysis-word-delimiter-tokenfilter]]
|
||||||
|
=== Word Delimiter Token Filter
|
||||||
|
|
||||||
|
Named `word_delimiter`, it splits words into subwords and performs
|
||||||
|
optional transformations on subword groups. Words are split into
|
||||||
|
subwords with the following rules:
|
||||||
|
|
||||||
|
* split on intra-word delimiters (by default, all non alpha-numeric
|
||||||
|
characters).
|
||||||
|
* "Wi-Fi" -> "Wi", "Fi"
|
||||||
|
* split on case transitions: "PowerShot" -> "Power", "Shot"
|
||||||
|
* split on letter-number transitions: "SD500" -> "SD", "500"
|
||||||
|
* leading and trailing intra-word delimiters on each subword are
|
||||||
|
ignored: "//hello---there, 'dude'" -> "hello", "there", "dude"
|
||||||
|
* trailing "'s" are removed for each subword: "O'Neil's" -> "O", "Neil"
|
||||||
|
|
||||||
|
Parameters include:
|
||||||
|
|
||||||
|
`generate_word_parts`::
|
||||||
|
If `true` causes parts of words to be
|
||||||
|
generated: "PowerShot" => "Power" "Shot". Defaults to `true`.
|
||||||
|
|
||||||
|
`generate_number_parts`::
|
||||||
|
If `true` causes number subwords to be
|
||||||
|
generated: "500-42" => "500" "42". Defaults to `true`.
|
||||||
|
|
||||||
|
`catenate_words`::
|
||||||
|
If `true` causes maximum runs of word parts to be
|
||||||
|
catenated: "wi-fi" => "wifi". Defaults to `false`.
|
||||||
|
|
||||||
|
`catenate_numbers`::
|
||||||
|
If `true` causes maximum runs of number parts to
|
||||||
|
be catenated: "500-42" => "50042". Defaults to `false`.
|
||||||
|
|
||||||
|
`catenate_all`::
|
||||||
|
If `true` causes all subword parts to be catenated:
|
||||||
|
"wi-fi-4000" => "wifi4000". Defaults to `false`.
|
||||||
|
|
||||||
|
`split_on_case_change`::
|
||||||
|
If `true` causes "PowerShot" to be two tokens;
|
||||||
|
("Power-Shot" remains two parts regards). Defaults to `true`.
|
||||||
|
|
||||||
|
`preserve_original`::
|
||||||
|
If `true` includes original words in subwords:
|
||||||
|
"500-42" => "500-42" "500" "42". Defaults to `false`.
|
||||||
|
|
||||||
|
`split_on_numerics`::
|
||||||
|
If `true` causes "j2se" to be three tokens; "j"
|
||||||
|
"2" "se". Defaults to `true`.
|
||||||
|
|
||||||
|
`stem_english_possessive`::
|
||||||
|
If `true` causes trailing "'s" to be
|
||||||
|
removed for each subword: "O'Neil's" => "O", "Neil". Defaults to `true`.
|
||||||
|
|
||||||
|
Advanced settings include:
|
||||||
|
|
||||||
|
`protected_words`::
|
||||||
|
A list of words protected from being delimited.
|
||||||
|
Either an array, or also can set `protected_words_path` which resolved
|
||||||
|
to a file configured with protected words (one on each line).
|
||||||
|
Automatically resolves to `config/` based location if exists.
|
||||||
|
|
||||||
|
`type_table`::
|
||||||
|
A custom type mapping table, for example (when configured
|
||||||
|
using `type_table_path`):
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
# Map the $, %, '.', and ',' characters to DIGIT
|
||||||
|
# This might be useful for financial data.
|
||||||
|
$ => DIGIT
|
||||||
|
% => DIGIT
|
||||||
|
. => DIGIT
|
||||||
|
\\u002C => DIGIT
|
||||||
|
|
||||||
|
# in some cases you might not want to split on ZWJ
|
||||||
|
# this also tests the case where we need a bigger byte[]
|
||||||
|
# see http://en.wikipedia.org/wiki/Zero-width_joiner
|
||||||
|
\\u200D => ALPHANUM
|
||||||
|
--------------------------------------------------
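
Putting the pieces together, a custom analyzer that both splits subwords
and catenates word runs might be sketched as follows (analyzer and
filter names are illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["my_word_delimiter", "lowercase"]
                }
            },
            "filter" : {
                "my_word_delimiter" : {
                    "type" : "word_delimiter",
                    "generate_word_parts" : true,
                    "catenate_words" : true,
                    "preserve_original" : true
                }
            }
        }
    }
}
--------------------------------------------------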
|
30
docs/reference/analysis/tokenizers.asciidoc
Normal file
|
@ -0,0 +1,30 @@
|
||||||
|
[[analysis-tokenizers]]
|
||||||
|
== Tokenizers
|
||||||
|
|
||||||
|
Tokenizers are used to break a string down into a stream of terms
|
||||||
|
or tokens. A simple tokenizer might split the string up into terms
|
||||||
|
wherever it encounters whitespace or punctuation.
|
||||||
|
|
||||||
|
Elasticsearch has a number of built in tokenizers which can be
|
||||||
|
used to build <<analysis-custom-analyzer,custom analyzers>>.
|
||||||
|
|
||||||
|
include::tokenizers/standard-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/edgengram-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/keyword-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/letter-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/lowercase-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/ngram-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/whitespace-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/pattern-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/uaxurlemail-tokenizer.asciidoc[]
|
||||||
|
|
||||||
|
include::tokenizers/pathhierarchy-tokenizer.asciidoc[]
|
||||||
|
|
|
@ -0,0 +1,80 @@
|
||||||
|
[[analysis-edgengram-tokenizer]]
|
||||||
|
=== Edge NGram Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `edgeNGram`.
|
||||||
|
|
||||||
|
This tokenizer is very similar to `nGram` but only keeps n-grams which
|
||||||
|
start at the beginning of a token.
|
||||||
|
|
||||||
|
The following are settings that can be set for an `edgeNGram` tokenizer
|
||||||
|
type:
|
||||||
|
|
||||||
|
[cols="<,<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description |Default value
|
||||||
|
|`min_gram` |Minimum size in codepoints of a single n-gram |`1`.
|
||||||
|
|
||||||
|
|`max_gram` |Maximum size in codepoints of a single n-gram |`2`.
|
||||||
|
|
||||||
|
|`token_chars` |(Since `0.90.2`) Character classes to keep in the
|
||||||
|
tokens. Elasticsearch will split on characters that don't belong to any
|
||||||
|
of these classes. |`[]` (Keep all characters)
|
||||||
|
|=======================================================================
|
||||||
|
|
||||||
|
|
||||||
|
`token_chars` accepts the following character classes:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`letter`:: for example `a`, `b`, `ï` or `京`
|
||||||
|
`digit`:: for example `3` or `7`
|
||||||
|
`whitespace`:: for example `" "` or `"\n"`
|
||||||
|
`punctuation`:: for example `!` or `"`
|
||||||
|
`symbol`:: for example `$` or `â`
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Example
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
curl -XPUT 'localhost:9200/test' -d '
|
||||||
|
{
|
||||||
|
"settings" : {
|
||||||
|
"analysis" : {
|
||||||
|
"analyzer" : {
|
||||||
|
"my_edge_ngram_analyzer" : {
|
||||||
|
"tokenizer" : "my_edge_ngram_tokenizer"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"tokenizer" : {
|
||||||
|
"my_edge_ngram_tokenizer" : {
|
||||||
|
"type" : "edgeNGram",
|
||||||
|
"min_gram" : "2",
|
||||||
|
"max_gram" : "5",
|
||||||
|
"token_chars": [ "letter", "digit" ]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
|
||||||
|
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'FC Schalke 04'
|
||||||
|
# FC, Sc, Sch, Scha, Schal, 04
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== `side` deprecated
|
||||||
|
|
||||||
|
There used to be a `side` parameter up to `0.90.1` but it is now deprecated. In
|
||||||
|
order to emulate the behavior of `"side" : "BACK"` a
|
||||||
|
<<analysis-reverse-tokenfilter,`reverse` token filter>> should be used together
|
||||||
|
with the <<analysis-edgengram-tokenfilter,`edgeNGram` token filter>>. The
|
||||||
|
`edgeNGram` filter must be enclosed in `reverse` filters like this:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
"filter" : ["reverse", "edgeNGram", "reverse"]
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
which essentially reverses the token, builds front `EdgeNGrams` and reverses
|
||||||
|
the ngram again. This has the same effect as the previous `"side" : "BACK"` setting.
|
||||||
|
|
|
@ -0,0 +1,15 @@
|
||||||
|
[[analysis-keyword-tokenizer]]
|
||||||
|
=== Keyword Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `keyword` that emits the entire input as a single
|
||||||
|
token.
|
||||||
|
|
||||||
|
The following are settings that can be set for a `keyword` tokenizer
|
||||||
|
type:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`buffer_size` |The term buffer size. Defaults to `256`.
|
||||||
|
|=======================================================
|
||||||
|
|
|
@ -0,0 +1,7 @@
|
||||||
|
[[analysis-letter-tokenizer]]
|
||||||
|
=== Letter Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `letter` that divides text at non-letters. That's to
|
||||||
|
say, it defines tokens as maximal strings of adjacent letters. Note,
|
||||||
|
this does a decent job for most European languages, but does a terrible
|
||||||
|
job for some Asian languages, where words are not separated by spaces.
|
|
@ -0,0 +1,15 @@
|
||||||
|
[[analysis-lowercase-tokenizer]]
|
||||||
|
=== Lowercase Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `lowercase` that performs the function of
|
||||||
|
<<analysis-letter-tokenizer,Letter
|
||||||
|
Tokenizer>> and
|
||||||
|
<<analysis-lowercase-tokenfilter,Lower
|
||||||
|
Case Token Filter>> together. It divides text at non-letters and converts
|
||||||
|
them to lower case. While it is functionally equivalent to the
|
||||||
|
combination of
|
||||||
|
<<analysis-letter-tokenizer,Letter
|
||||||
|
Tokenizer>> and
|
||||||
|
<<analysis-lowercase-tokenfilter,Lower
|
||||||
|
Case Token Filter>>, there is a performance advantage to doing the two
|
||||||
|
tasks at once, hence this (redundant) implementation.
|
57
docs/reference/analysis/tokenizers/ngram-tokenizer.asciidoc
Normal file
|
@ -0,0 +1,57 @@
|
||||||
|
[[analysis-ngram-tokenizer]]
|
||||||
|
=== NGram Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `nGram`.
|
||||||
|
|
||||||
|
The following are settings that can be set for a `nGram` tokenizer type:
|
||||||
|
|
||||||
|
[cols="<,<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description |Default value
|
||||||
|
|`min_gram` |Minimum size in codepoints of a single n-gram |`1`.
|
||||||
|
|
||||||
|
|`max_gram` |Maximum size in codepoints of a single n-gram |`2`.
|
||||||
|
|
||||||
|
|`token_chars` |(Since `0.90.2`) Character classes to keep in the
|
||||||
|
tokens. Elasticsearch will split on characters that don't belong to any
|
||||||
|
of these classes. |`[]` (Keep all characters)
|
||||||
|
|=======================================================================
|
||||||
|
|
||||||
|
`token_chars` accepts the following character classes:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`letter`:: for example `a`, `b`, `ï` or `京`
|
||||||
|
`digit`:: for example `3` or `7`
|
||||||
|
`whitespace`:: for example `" "` or `"\n"`
|
||||||
|
`punctuation`:: for example `!` or `"`
|
||||||
|
`symbol`:: for example `$` or `â`
|
||||||
|
|
||||||
|
[float]
|
||||||
|
==== Example
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
curl -XPUT 'localhost:9200/test' -d '
|
||||||
|
{
|
||||||
|
"settings" : {
|
||||||
|
"analysis" : {
|
||||||
|
"analyzer" : {
|
||||||
|
"my_ngram_analyzer" : {
|
||||||
|
"tokenizer" : "my_ngram_tokenizer"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"tokenizer" : {
|
||||||
|
"my_ngram_tokenizer" : {
|
||||||
|
"type" : "nGram",
|
||||||
|
"min_gram" : "2",
|
||||||
|
"max_gram" : "3",
|
||||||
|
"token_chars": [ "letter", "digit" ]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
|
||||||
|
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
|
||||||
|
# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
|
||||||
|
--------------------------------------------------
|
|
@ -0,0 +1,32 @@
|
||||||
|
[[analysis-pathhierarchy-tokenizer]]
|
||||||
|
=== Path Hierarchy Tokenizer
|
||||||
|
|
||||||
|
The `path_hierarchy` tokenizer takes something like this:
|
||||||
|
|
||||||
|
-------------------------
|
||||||
|
/something/something/else
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
And produces tokens:
|
||||||
|
|
||||||
|
-------------------------
|
||||||
|
/something
|
||||||
|
/something/something
|
||||||
|
/something/something/else
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`delimiter` |The character delimiter to use, defaults to `/`.
|
||||||
|
|
||||||
|
|`replacement` |An optional replacement character to use. Defaults to
|
||||||
|
the `delimiter`.
|
||||||
|
|
||||||
|
|`buffer_size` |The buffer size to use, defaults to `1024`.
|
||||||
|
|
||||||
|
|`reverse` |Generates tokens in reverse order, defaults to `false`.
|
||||||
|
|
||||||
|
|`skip` |Controls initial tokens to skip, defaults to `0`.
|
||||||
|
|=======================================================================
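
As a sketch (analyzer and tokenizer names are illustrative), a tokenizer
that splits on `-` but joins the emitted path tokens with `/` could be
configured as:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_path_analyzer" : {
                    "tokenizer" : "my_path_tokenizer"
                }
            },
            "tokenizer" : {
                "my_path_tokenizer" : {
                    "type" : "path_hierarchy",
                    "delimiter" : "-",
                    "replacement" : "/"
                }
            }
        }
    }
}
--------------------------------------------------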
|
||||||
|
|
|
@ -0,0 +1,29 @@
|
||||||
|
[[analysis-pattern-tokenizer]]
|
||||||
|
=== Pattern Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `pattern` that can flexibly separate text into terms
|
||||||
|
via a regular expression. Accepts the following settings:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`pattern` |The regular expression pattern, defaults to `\\W+`.
|
||||||
|
|`flags` |The regular expression flags.
|
||||||
|
|`group` |Which group to extract into tokens. Defaults to `-1` (split).
|
||||||
|
|======================================================================
|
||||||
|
|
||||||
|
*IMPORTANT*: The regular expression should match the *token separators*,
|
||||||
|
not the tokens themselves.
|
||||||
|
|
||||||
|
`group` set to `-1` (the default) is equivalent to "split". Using group
|
||||||
|
>= 0 selects the matching group as the token. For example, if you have:
|
||||||
|
|
||||||
|
------------------------
|
||||||
|
pattern = \\'([^\']+)\\'
|
||||||
|
group = 0
|
||||||
|
input = aaa 'bbb' 'ccc'
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks).
|
||||||
|
With the same input but using group=1, the output would be: bbb and ccc
|
||||||
|
(no ' marks).
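
For instance, a comma-splitting tokenizer could be configured as in the
following sketch (analyzer and tokenizer names are illustrative); the
`trim` filter then removes any surrounding whitespace from the resulting
tokens:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_csv_analyzer" : {
                    "tokenizer" : "my_csv_tokenizer",
                    "filter" : ["trim", "lowercase"]
                }
            },
            "tokenizer" : {
                "my_csv_tokenizer" : {
                    "type" : "pattern",
                    "pattern" : ","
                }
            }
        }
    }
}
--------------------------------------------------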
|
|
@ -0,0 +1,18 @@
|
||||||
|
[[analysis-standard-tokenizer]]
|
||||||
|
=== Standard Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `standard` providing grammar based tokenizer that is
|
||||||
|
a good tokenizer for most European language documents. The tokenizer
|
||||||
|
implements the Unicode Text Segmentation algorithm, as specified in
|
||||||
|
http://unicode.org/reports/tr29/[Unicode Standard Annex #29].
|
||||||
|
|
||||||
|
The following are settings that can be set for a `standard` tokenizer
|
||||||
|
type:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`max_token_length` |The maximum token length. If a token is seen that
|
||||||
|
exceeds this length then it is discarded. Defaults to `255`.
|
||||||
|
|=======================================================================
|
||||||
|
|
|
@ -0,0 +1,16 @@
|
||||||
|
[[analysis-uaxurlemail-tokenizer]]
|
||||||
|
=== UAX Email URL Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `uax_url_email` which works exactly like the
|
||||||
|
`standard` tokenizer, but tokenizes emails and URLs as single tokens.
|
||||||
|
|
||||||
|
The following are settings that can be set for a `uax_url_email`
|
||||||
|
tokenizer type:
|
||||||
|
|
||||||
|
[cols="<,<",options="header",]
|
||||||
|
|=======================================================================
|
||||||
|
|Setting |Description
|
||||||
|
|`max_token_length` |The maximum token length. If a token is seen that
|
||||||
|
exceeds this length then it is discarded. Defaults to `255`.
|
||||||
|
|=======================================================================
|
||||||
|
|
|
@ -0,0 +1,4 @@
|
||||||
|
[[analysis-whitespace-tokenizer]]
|
||||||
|
=== Whitespace Tokenizer
|
||||||
|
|
||||||
|
A tokenizer of type `whitespace` that divides text at whitespace.
|
46
docs/reference/cluster.asciidoc
Normal file
|
@ -0,0 +1,46 @@
|
||||||
|
[[cluster]]
|
||||||
|
= Cluster APIs
|
||||||
|
|
||||||
|
[partintro]
|
||||||
|
--
|
||||||
|
["float",id="cluster-nodes"]
|
||||||
|
== Nodes
|
||||||
|
|
||||||
|
Most cluster level APIs allow to specify which nodes to execute on (for
|
||||||
|
example, getting the node stats for a node). Nodes can be identified in
|
||||||
|
the APIs either using their internal node id, the node name, address,
|
||||||
|
custom attributes, or just the `_local` node receiving the request. For
|
||||||
|
example, here are some sample executions of nodes info:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
# Local
|
||||||
|
curl localhost:9200/_cluster/nodes/_local
|
||||||
|
# Address
|
||||||
|
curl localhost:9200/_cluster/nodes/10.0.0.3,10.0.0.4
|
||||||
|
curl localhost:9200/_cluster/nodes/10.0.0.*
|
||||||
|
# Names
|
||||||
|
curl localhost:9200/_cluster/nodes/node_name_goes_here
|
||||||
|
curl localhost:9200/_cluster/nodes/node_name_goes_*
|
||||||
|
# Attributes (set something like node.rack: 2 in the config)
|
||||||
|
curl localhost:9200/_cluster/nodes/rack:2
|
||||||
|
curl localhost:9200/_cluster/nodes/ra*:2
|
||||||
|
curl localhost:9200/_cluster/nodes/ra*:2*
|
||||||
|
--------------------------------------------------
|
||||||
|
--
|
||||||
|
|
||||||
|
include::cluster/health.asciidoc[]
|
||||||
|
|
||||||
|
include::cluster/state.asciidoc[]
|
||||||
|
|
||||||
|
include::cluster/reroute.asciidoc[]
|
||||||
|
|
||||||
|
include::cluster/update-settings.asciidoc[]
|
||||||
|
|
||||||
|
include::cluster/nodes-stats.asciidoc[]
|
||||||
|
|
||||||
|
include::cluster/nodes-info.asciidoc[]
|
||||||
|
|
||||||
|
include::cluster/nodes-hot-threads.asciidoc[]
|
||||||
|
|
||||||
|
include::cluster/nodes-shutdown.asciidoc[]
|
86
docs/reference/cluster/health.asciidoc
Normal file
|
@ -0,0 +1,86 @@
|
||||||
|
[[cluster-health]]
|
||||||
|
== Cluster Health
|
||||||
|
|
||||||
|
The cluster health API allows to get a very simple status on the health
|
||||||
|
of the cluster.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
|
||||||
|
{
|
||||||
|
"cluster_name" : "testcluster",
|
||||||
|
"status" : "green",
|
||||||
|
"timed_out" : false,
|
||||||
|
"number_of_nodes" : 2,
|
||||||
|
"number_of_data_nodes" : 2,
|
||||||
|
"active_primary_shards" : 5,
|
||||||
|
"active_shards" : 10,
|
||||||
|
"relocating_shards" : 0,
|
||||||
|
"initializing_shards" : 0,
|
||||||
|
"unassigned_shards" : 0
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The API can also be executed against one or more indices to get just the
|
||||||
|
specified indices health:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XGET 'http://localhost:9200/_cluster/health/test1,test2'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The cluster health status is: `green`, `yellow` or `red`. On the shard
|
||||||
|
level, a `red` status indicates that the specific shard is not allocated
|
||||||
|
in the cluster, `yellow` means that the primary shard is allocated but
|
||||||
|
replicas are not, and `green` means that all shards are allocated. The
|
||||||
|
index level status is controlled by the worst shard status. The cluster
|
||||||
|
status is controlled by the worst index status.
|
||||||
|
|
||||||
|
One of the main benefits of the API is the ability to wait until the
|
||||||
|
cluster reaches a certain high water-mark health level. For example, the
|
||||||
|
following will wait till the cluster reaches the `yellow` level for 50
|
||||||
|
seconds (if it reaches the `green` or `yellow` status beforehand, it
|
||||||
|
will return):
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Request Parameters
|
||||||
|
|
||||||
|
The cluster health API accepts the following request parameters:
|
||||||
|
|
||||||
|
`level`::
|
||||||
|
Can be one of `cluster`, `indices` or `shards`. Controls the
|
||||||
|
details level of the health information returned. Defaults to `cluster`.
|
||||||
|
|
||||||
|
`wait_for_status`::
|
||||||
|
One of `green`, `yellow` or `red`. Will wait (until
|
||||||
|
the timeout provided) until the status of the cluster changes to the one
|
||||||
|
provided. By default, will not wait for any status.
|
||||||
|
|
||||||
|
`wait_for_relocating_shards`::
|
||||||
|
A number controlling how many relocating
|
||||||
|
shards to wait for. Usually this will be `0` to indicate waiting until all
|
||||||
|
relocations have completed. Defaults to not waiting.
|
||||||
|
|
||||||
|
`wait_for_nodes`::
|
||||||
|
The request waits until the specified number `N` of
|
||||||
|
nodes is available. It also accepts `>=N`, `<=N`, `>N` and `<N`.
|
||||||
|
Alternatively, it is possible to use `ge(N)`, `le(N)`, `gt(N)` and
|
||||||
|
`lt(N)` notation.
|
||||||
|
|
||||||
|
`timeout`::
|
||||||
|
A time-based parameter controlling how long to wait when one of
|
||||||
|
the `wait_for_*` parameters is provided. Defaults to `30s`.
|
||||||
|
|
||||||
|
|
||||||
|
The following is an example of getting the cluster health at the
|
||||||
|
`shards` level:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XGET 'http://localhost:9200/_cluster/health/twitter?level=shards'
|
||||||
|
--------------------------------------------------
|
16
docs/reference/cluster/nodes-hot-threads.asciidoc
Normal file
|
@ -0,0 +1,16 @@
|
||||||
|
[[cluster-nodes-hot-threads]]
|
||||||
|
== Nodes hot_threads
|
||||||
|
|
||||||
|
An API allowing to get the current hot threads on each node in the
|
||||||
|
cluster. Endpoints are `/_nodes/hot_threads`, and
|
||||||
|
`/_nodes/{nodesIds}/hot_threads`. This API is experimental.
|
||||||
|
|
||||||
|
The output is plain text with a breakdown of each node's top hot
|
||||||
|
threads. Parameters allowed are:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`threads`:: number of hot threads to provide, defaults to 3.
|
||||||
|
`interval`:: the interval to wait between the first and second sampling of threads.
|
||||||
|
Defaults to 500ms.
|
||||||
|
`type`:: The type to sample, defaults to cpu, but supports wait and
|
||||||
|
block to see hot threads that are in wait or block state.
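
For example (node ids and values are illustrative):

[source,js]
--------------------------------------------------
# top 3 cpu-hot threads on every node
curl -XGET 'http://localhost:9200/_nodes/hot_threads'

# top 5 threads in wait state on one node
curl -XGET 'http://localhost:9200/_nodes/nodeId1/hot_threads?threads=5&type=wait'
--------------------------------------------------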
|
98
docs/reference/cluster/nodes-info.asciidoc
Normal file
|
@ -0,0 +1,98 @@
|
||||||
|
[[cluster-nodes-info]]
|
||||||
|
== Nodes Info
|
||||||
|
|
||||||
|
The cluster nodes info API allows to retrieve one or more (or all) of
|
||||||
|
the cluster nodes information.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
curl -XGET 'http://localhost:9200/_cluster/nodes'
|
||||||
|
curl -XGET 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2'
|
||||||
|
|
||||||
|
# Shorter Format
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes'
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The first command retrieves information of all the nodes in the cluster.
|
||||||
|
The second command selectively retrieves nodes information of only
|
||||||
|
`nodeId1` and `nodeId2`. All the nodes selective options are explained
|
||||||
|
<<cluster-nodes,here>>.
|
||||||
|
|
||||||
|
By default, it just returns the attributes and core settings for a node.
|
||||||
|
It also allows to get information on `settings`, `os`, `process`, `jvm`,
|
||||||
|
`thread_pool`, `network`, `transport`, `http` and `plugin`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes?os=true&process=true'
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/?os=true&process=true'
|
||||||
|
|
||||||
|
# Or, specific type endpoint:
|
||||||
|
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/process'
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/process'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The `all` flag can be set to return all the information.
|
||||||
|
|
||||||
|
`plugin` - if set, the result will contain details about the loaded
|
||||||
|
plugins per node:
|
||||||
|
|
||||||
|
* `name`: plugin name
|
||||||
|
* `description`: plugin description if any
|
||||||
|
* `site`: `true` if the plugin is a site plugin
|
||||||
|
* `jvm`: `true` if the plugin is a plugin running in the JVM
|
||||||
|
* `url`: URL if the plugin is a site plugin
|
||||||
|
|
||||||
|
The result will look similar to:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"ok" : true,
|
||||||
|
"cluster_name" : "test-cluster-MacBook-Air-de-David.local",
|
||||||
|
"nodes" : {
|
||||||
|
"hJLXmY_NTrCytiIMbX4_1g" : {
|
||||||
|
"name" : "node4",
|
||||||
|
"transport_address" : "inet[/172.18.58.139:9303]",
|
||||||
|
"hostname" : "MacBook-Air-de-David.local",
|
||||||
|
"version" : "0.90.0.Beta2-SNAPSHOT",
|
||||||
|
"http_address" : "inet[/172.18.58.139:9203]",
|
||||||
|
"plugins" : [ {
|
||||||
|
"name" : "test-plugin",
|
||||||
|
"description" : "test-plugin description",
|
||||||
|
"site" : true,
|
||||||
|
"jvm" : false
|
||||||
|
}, {
|
||||||
|
"name" : "test-no-version-plugin",
|
||||||
|
"description" : "test-no-version-plugin description",
|
||||||
|
"site" : true,
|
||||||
|
"jvm" : false
|
||||||
|
}, {
|
||||||
|
"name" : "dummy",
|
||||||
|
"description" : "No description found for dummy.",
|
||||||
|
"url" : "/_plugin/dummy/",
|
||||||
|
"site" : false,
|
||||||
|
"jvm" : true
|
||||||
|
} ]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
If your `plugin` data is subject to change, use
|
||||||
|
`plugins.info_refresh_interval` to change or disable the caching
|
||||||
|
interval:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
# Change cache to 20 seconds
|
||||||
|
plugins.info_refresh_interval: 20s
|
||||||
|
|
||||||
|
# Infinite cache
|
||||||
|
plugins.info_refresh_interval: -1
|
||||||
|
|
||||||
|
# Disable cache
|
||||||
|
plugins.info_refresh_interval: 0
|
||||||
|
--------------------------------------------------
|
56
docs/reference/cluster/nodes-shutdown.asciidoc
Normal file
|
@ -0,0 +1,56 @@
|
||||||
|
[[cluster-nodes-shutdown]]
|
||||||
|
== Nodes Shutdown
|
||||||
|
|
||||||
|
The nodes shutdown API allows to shutdown one or more (or all) nodes in
|
||||||
|
the cluster. Here is an example of shutting the `_local` node the
|
||||||
|
request is directed to:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Specific node(s) can be shutdown as well using their respective node ids
|
||||||
|
(or other selective options as explained
|
||||||
|
<<cluster-nodes,here>> .):
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The master (of the cluster) can also be shutdown using:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Finally, all nodes can be shutdown using one of the options below:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XPOST 'http://localhost:9200/_shutdown'
|
||||||
|
|
||||||
|
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'
|
||||||
|
|
||||||
|
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_all/_shutdown'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Delay
|
||||||
|
|
||||||
|
By default, the shutdown will be executed after a 1 second delay (`1s`).
|
||||||
|
The delay can be customized by setting the `delay` parameter in a time
|
||||||
|
value format. For example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown?delay=10s'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Disable Shutdown
|
||||||
|
|
||||||
|
The shutdown API can be disabled by setting `action.disable_shutdown` in
|
||||||
|
the node configuration.
|
100
docs/reference/cluster/nodes-stats.asciidoc
Normal file
|
@ -0,0 +1,100 @@
|
||||||
|
[[cluster-nodes-stats]]
|
||||||
|
== Nodes Stats
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Nodes statistics
|
||||||
|
|
||||||
|
The cluster nodes stats API allows to retrieve one or more (or all) of
|
||||||
|
the cluster nodes statistics.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
curl -XGET 'http://localhost:9200/_cluster/nodes/stats'
|
||||||
|
curl -XGET 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/stats'
|
||||||
|
|
||||||
|
# simplified
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/stats'
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/stats'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The first command retrieves stats of all the nodes in the cluster. The
|
||||||
|
second command selectively retrieves nodes stats of only `nodeId1` and
|
||||||
|
`nodeId2`. All the nodes selective options are explained
|
||||||
|
<<cluster-nodes,here>>.
|
||||||
|
|
||||||
|
By default, `indices` stats are returned. With options for `indices`,
|
||||||
|
`os`, `process`, `jvm`, `network`, `transport`, `http`, `fs`, and
|
||||||
|
`thread_pool`. For example:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`indices`::
|
||||||
|
Indices stats about size, document count, indexing and
|
||||||
|
deletion times, search times, field cache size , merges and flushes
|
||||||
|
|
||||||
|
`fs`::
|
||||||
|
File system information, data path, free disk space, read/write
|
||||||
|
stats
|
||||||
|
|
||||||
|
`http`::
|
||||||
|
HTTP connection information
|
||||||
|
|
||||||
|
`jvm`::
|
||||||
|
JVM stats, memory pool information, garbage collection, buffer
|
||||||
|
pools
|
||||||
|
|
||||||
|
`network`::
|
||||||
|
TCP information
|
||||||
|
|
||||||
|
`os`::
|
||||||
|
Operating system stats, load average, cpu, mem, swap
|
||||||
|
|
||||||
|
`process`::
|
||||||
|
Process statistics, memory consumption, cpu usage, open
|
||||||
|
file descriptors
|
||||||
|
|
||||||
|
`thread_pool`::
|
||||||
|
Statistics about each thread pool, including current
|
||||||
|
size, queue and rejected tasks
|
||||||
|
|
||||||
|
`transport`::
|
||||||
|
Transport statistics about sent and received bytes in
|
||||||
|
cluster communication
|
||||||
|
|
||||||
|
`clear`::
|
||||||
|
Clears all the flags (first). Useful, if you only want to
|
||||||
|
retrieve specific stats.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
# return indices and os
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/stats?os=true'
|
||||||
|
# return just os and process
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&os=true&process=true'
|
||||||
|
# specific type endpoint
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/process/stats'
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/process/stats'
|
||||||
|
# or, if you like the other way
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/stats/process'
|
||||||
|
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/stats/process'
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The `all` flag can be set to return all the stats.
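
As a sketch, the same flag pattern as above can be used to request every
section at once:

[source,js]
--------------------------------------------------
# return all stats sections for all nodes
curl -XGET 'http://localhost:9200/_nodes/stats?all=true'
--------------------------------------------------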

[float]
=== Field data statistics

From 0.90, you can get information about field data memory usage at the
node level or at the index level.

[source,js]
--------------------------------------------------
# Node Stats
curl localhost:9200/_nodes/stats/indices/fielddata/field1,field2?pretty

# Indices Stats
curl localhost:9200/_stats/fielddata/field1,field2?pretty

# You can use wildcards for field names
curl localhost:9200/_stats/fielddata/field*?pretty
curl localhost:9200/_nodes/stats/indices/fielddata/field*?pretty
--------------------------------------------------

68
docs/reference/cluster/reroute.asciidoc
Normal file
@@ -0,0 +1,68 @@

[[cluster-reroute]]
== Cluster Reroute

The reroute command allows explicitly executing a cluster reroute
allocation command, including specific commands. For example, a shard can
be moved from one node to another explicitly, an allocation can be
canceled, or an unassigned shard can be explicitly allocated on a
specific node.

Here is a short example of a simple reroute API call:

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [ {
        "move" :
            {
              "index" : "test", "shard" : 0,
              "from_node" : "node1", "to_node" : "node2"
            }
        },
        {
          "allocate" : {
              "index" : "test", "shard" : 1, "node" : "node3"
          }
        }
    ]
}'
--------------------------------------------------

An important aspect to remember is the fact that once an allocation
occurs, the cluster will aim at re-balancing its state back to an even
state. For example, if the allocation includes moving a shard from
`node1` to `node2`, in an `even` state, then another shard will be moved
from `node2` to `node1` to even things out.

The cluster can be set to disable allocations, which means that only the
explicit allocations will be performed. Obviously, only once all
commands have been applied will the cluster aim to re-balance its
state.

Another option is to run the commands in `dry_run` mode (as a URI flag, or in
the request body). This will cause the commands to be applied to the current
cluster state, and return the resulting cluster state after the commands (and
re-balancing) have been applied.
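
For illustration, a hedged sketch of a dry run of the `move` command from the
example above (the node, index and shard values are placeholders):

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_cluster/reroute?dry_run=true' -d '{
    "commands" : [ {
        "move" : {
            "index" : "test", "shard" : 0,
            "from_node" : "node1", "to_node" : "node2"
        }
    } ]
}'
--------------------------------------------------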

The commands supported are:

`move`::
Move a started shard from one node to another node. Accepts
`index` and `shard` for index name and shard number, `from_node` for the
node to move the shard from, and `to_node` for the node to move the
shard to.

`cancel`::
Cancel allocation of a shard (or recovery). Accepts `index`
and `shard` for index name and shard number, and `node` for the node to
cancel the shard allocation on. It also accepts an `allow_primary` flag to
explicitly specify that it is allowed to cancel allocation for a primary
shard.

`allocate`::
Allocate an unassigned shard to a node. Accepts the
`index` and `shard` for index name and shard number, and `node` to
allocate the shard to. It also accepts an `allow_primary` flag to
explicitly specify that it is allowed to explicitly allocate a primary
shard (might result in data loss).
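
As an illustration, a hedged sketch that combines a `cancel` and an `allocate`
command (the index, shard and node names are placeholders):

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [
        { "cancel" : { "index" : "test", "shard" : 0, "node" : "node1" } },
        { "allocate" : { "index" : "test", "shard" : 0, "node" : "node2", "allow_primary" : false } }
    ]
}'
--------------------------------------------------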

48
docs/reference/cluster/state.asciidoc
Normal file
@@ -0,0 +1,48 @@

[[cluster-state]]
== Cluster State

The cluster state API allows getting comprehensive state information about
the whole cluster.

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/state'
--------------------------------------------------

By default, the cluster state request is routed to the master node, to
ensure that the latest cluster state is returned.
For debugging purposes, you can retrieve the cluster state local to a
particular node by adding `local=true` to the query string.

[float]
=== Response Filters

It is possible to filter the cluster state response using the following
REST parameters:

`filter_nodes`::
Set to `true` to filter out the `nodes` part of the
response.

`filter_routing_table`::
Set to `true` to filter out the `routing_table`
part of the response.

`filter_metadata`::
Set to `true` to filter out the `metadata` part of the
response.

`filter_blocks`::
Set to `true` to filter out the `blocks` part of the
response.

`filter_indices`::
When not filtering metadata, a comma separated list of
indices to include in the response.

An example follows:

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/state?filter_nodes=true'
--------------------------------------------------
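
And a hedged sketch of combining several filters to trim the response down to
the metadata of a few indices (the index names are placeholders):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/state?filter_nodes=true&filter_routing_table=true&filter_blocks=true&filter_indices=index1,index2'
--------------------------------------------------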

198
docs/reference/cluster/update-settings.asciidoc
Normal file
@@ -0,0 +1,198 @@

[[cluster-update-settings]]
== Cluster Update Settings

Allows updating specific cluster-wide settings. Updated settings can
either be persistent (applied across restarts) or transient (will not
survive a full cluster restart). Here is an example:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}'
--------------------------------------------------

Or:

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}'
--------------------------------------------------

The cluster responds with the settings that were updated. So the response
for the last example will be:

[source,js]
--------------------------------------------------
{
    "persistent" : {},
    "transient" : {
        "discovery.zen.minimum_master_nodes" : "2"
    }
}
--------------------------------------------------

Cluster-wide settings can be returned using:

[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_cluster/settings
--------------------------------------------------

There is a specific list of settings that can be updated, which includes:

[float]
=== Cluster settings

[float]
==== Routing allocation

[float]
===== Awareness

`cluster.routing.allocation.awareness.attributes`::
See <<modules-cluster>>.

`cluster.routing.allocation.awareness.force.*`::
See <<modules-cluster>>.

[float]
===== Balanced Shards

`cluster.routing.allocation.balance.shard`::
Defines the weight factor for shards allocated on a node
(float). Defaults to `0.45f`.

`cluster.routing.allocation.balance.index`::
Defines a weight factor for the number of shards per index allocated
on a specific node (float). Defaults to `0.5f`.

`cluster.routing.allocation.balance.primary`::
Defines a weight factor for the number of primaries of a specific index
allocated on a node (float). Defaults to `0.05f`.

`cluster.routing.allocation.balance.threshold`::
Minimal optimization value of operations that should be performed (non
negative float). Defaults to `1.0f`.

[float]
===== Concurrent Rebalance

`cluster.routing.allocation.cluster_concurrent_rebalance`::
Controls how many concurrent shard rebalances are allowed cluster wide
(integer). Defaults to `2`; `-1` for unlimited. See also
<<modules-cluster>>.

[float]
===== Disable allocation

`cluster.routing.allocation.disable_allocation`::
See <<modules-cluster>>.

`cluster.routing.allocation.disable_replica_allocation`::
See <<modules-cluster>>.

`cluster.routing.allocation.disable_new_allocation`::
See <<modules-cluster>>.

[float]
===== Throttling allocation

`cluster.routing.allocation.node_initial_primaries_recoveries`::
See <<modules-cluster>>.

`cluster.routing.allocation.node_concurrent_recoveries`::
See <<modules-cluster>>.

[float]
===== Filter allocation

`cluster.routing.allocation.include.*`::
See <<modules-cluster>>.

`cluster.routing.allocation.exclude.*`::
See <<modules-cluster>>.

`cluster.routing.allocation.require.*` (from 0.90)::
See <<modules-cluster>>.

[float]
==== Metadata

`cluster.blocks.read_only`::
Make the whole cluster read only (indices do not accept write
operations) and do not allow the metadata to be modified (no creating or
deleting indices).

[float]
==== Discovery

`discovery.zen.minimum_master_nodes`::
See <<modules-discovery-zen>>

[float]
==== Threadpools

`threadpool.*`::
See <<modules-threadpool>>

[float]
=== Index settings

[float]
==== Index filter cache

`indices.cache.filter.size`::
See <<index-modules-cache>>

`indices.cache.filter.expire` (time)::
See <<index-modules-cache>>

[float]
==== TTL interval

`indices.ttl.interval` (time)::
See <<mapping-ttl-field>>

[float]
==== Recovery

`indices.recovery.concurrent_streams`::
See <<modules-indices>>

`indices.recovery.file_chunk_size`::
See <<modules-indices>>

`indices.recovery.translog_ops`::
See <<modules-indices>>

`indices.recovery.translog_size`::
See <<modules-indices>>

`indices.recovery.compress`::
See <<modules-indices>>

`indices.recovery.max_bytes_per_sec`::
Since 0.90.1. See <<modules-indices>>

`indices.recovery.max_size_per_sec`::
Deprecated since 0.90.1. See `max_bytes_per_sec` instead.

[float]
==== Store level throttling

`indices.store.throttle.type`::
See <<index-modules-store>>

`indices.store.throttle.max_bytes_per_sec`::
See <<index-modules-store>>

[float]
=== Logger

Logger values can also be updated by using the `logger.` prefix. More
settings will be allowed to be updated.
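
For instance, a hedged sketch of raising the log level of one logger at
runtime (the logger name and level shown are illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "logger.discovery" : "DEBUG"
    }
}'
--------------------------------------------------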

45
docs/reference/common-options.asciidoc
Normal file
@@ -0,0 +1,45 @@

[[search-common-options]]
== Common Options

=== Pretty Results

When appending `?pretty=true` to any request made, the JSON returned
will be pretty formatted (use it for debugging only!). Another option is
to set `format=yaml`, which will cause the result to be returned in the
(sometimes) more readable YAML format.
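
For example (a minimal sketch; any endpoint works the same way):

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
curl -XGET 'http://localhost:9200/_cluster/health?format=yaml'
--------------------------------------------------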

=== Parameters

REST parameters (when using HTTP, these map to HTTP URL parameters) follow
the convention of using underscore casing.

=== Boolean Values

All REST API parameters (both request parameters and JSON body) accept
the following values as boolean "false": `false`, `0`, `no` and `off`.
All other values are considered "true". Note that this is not related to
how boolean fields within an indexed document are treated.

=== Number Values

All REST APIs support providing numeric parameters as `string` values, on
top of supporting the native JSON number types.

=== Result Casing

All REST APIs accept the `case` parameter. When set to `camelCase`, all
field names in the result will be returned in camel casing; otherwise,
underscore casing will be used. Note that this does not apply to the
source document indexed.
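
A hedged sketch of requesting camel-cased results (the endpoint is just an
example):

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_cluster/health?case=camelCase&pretty=true'
--------------------------------------------------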

=== JSONP

All REST APIs accept a `callback` parameter, resulting in a
http://en.wikipedia.org/wiki/JSONP[JSONP] result.
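
For example (a sketch; `myCallback` is a placeholder function name):

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/?callback=myCallback'
--------------------------------------------------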

=== Request body in query string

For libraries that don't accept a request body for non-POST requests,
you can pass the request body as the `source` query string parameter
instead.
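
A hedged sketch of the idea (the query shown is illustrative, and may need to
be URL-encoded depending on the client):

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_search?source={"query":{"match_all":{}}}'
--------------------------------------------------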

31
docs/reference/docs.asciidoc
Normal file
@@ -0,0 +1,31 @@

[[docs]]
= Document APIs

[partintro]
--

This section describes the REST APIs *elasticsearch* provides (mainly)
using JSON. The API is exposed using
<<modules-http,HTTP>>,
<<modules-thrift,thrift>> and
<<modules-memcached,memcached>>.

--

include::docs/index_.asciidoc[]

include::docs/get.asciidoc[]

include::docs/delete.asciidoc[]

include::docs/update.asciidoc[]

include::docs/multi-get.asciidoc[]

include::docs/bulk.asciidoc[]

include::docs/delete-by-query.asciidoc[]

include::docs/bulk-udp.asciidoc[]

57
docs/reference/docs/bulk-udp.asciidoc
Normal file
@@ -0,0 +1,57 @@

[[docs-bulk-udp]]
== Bulk UDP API

A Bulk UDP service is a service listening over UDP for bulk format
requests. The idea is to provide a low latency UDP service that allows
easily indexing data that is not of a critical nature.

The Bulk UDP service is disabled by default, but can be enabled by
setting `bulk.udp.enabled` to `true`.

The bulk UDP service performs internal bulk aggregation of the data and
then flushes it based on several parameters:

`bulk.udp.bulk_actions`::
The number of actions after which a bulk is flushed,
defaults to `1000`.

`bulk.udp.bulk_size`::
The size at which the current bulk request is flushed
once exceeded, defaults to `5mb`.

`bulk.udp.flush_interval`::
An interval after which the current
request is flushed, regardless of the above limits. Defaults to `5s`.

`bulk.udp.concurrent_requests`::
The maximum number of in-flight bulk
requests allowed. Defaults to `4`.

The allowed network settings are:

`bulk.udp.host`::
The host to bind to, defaults to `network.host`,
which defaults to any.

`bulk.udp.port`::
The port to use, defaults to `9700-9800`.

`bulk.udp.receive_buffer_size`::
The receive buffer size, defaults to `10mb`.
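
Putting the settings together, a hedged sketch of enabling and tuning the
service in the node configuration (assuming the standard `elasticsearch.yml`
file; the values are purely illustrative):

[source,yaml]
--------------------------------------------------
bulk.udp.enabled: true
bulk.udp.bulk_actions: 5000
bulk.udp.bulk_size: 10mb
bulk.udp.flush_interval: 5s
bulk.udp.port: 9700-9800
--------------------------------------------------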

Here is an example of how it can be used:

[source,js]
--------------------------------------------------
> cat bulk.txt
{ "index" : { "_index" : "test", "_type" : "type1" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "test", "_type" : "type1" } }
{ "field1" : "value1" }
--------------------------------------------------

[source,js]
--------------------------------------------------
> cat bulk.txt | nc -w 0 -u localhost 9700
--------------------------------------------------

174
docs/reference/docs/bulk.asciidoc
Normal file
@@ -0,0 +1,174 @@

[[docs-bulk]]
== Bulk API

The bulk API makes it possible to perform many index/delete operations
in a single API call. This can greatly increase the indexing speed. The
REST API endpoint is `/_bulk`, and it expects the following JSON
structure:

[source,js]
--------------------------------------------------
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
--------------------------------------------------

*NOTE*: the final line of data must end with a newline character `\n`.

The possible actions are `index`, `create`, `delete` and, since version
`0.90.1`, also `update`. `index` and `create` expect a source on the next
line, and have the same semantics as the `op_type` parameter to the
standard index API (i.e. create will fail if a document with the same
index and type exists already, whereas index will add or replace a
document as necessary). `delete` does not expect a source on the
following line, and has the same semantics as the standard delete API.
`update` expects that the partial doc, upsert and script and its options
are specified on the next line.

If you're providing text file input to `curl`, you *must* use the
`--data-binary` flag instead of plain `-d`. The latter doesn't preserve
newlines. Example:

[source,js]
--------------------------------------------------
$ cat requests
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
{"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1,"ok":true}}]}
--------------------------------------------------

Because this format uses literal `\n`'s as delimiters, please be sure
that the JSON actions and sources are not pretty printed. Here is an
example of a correct sequence of bulk commands:

[source,js]
--------------------------------------------------
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
--------------------------------------------------

In the above example, `doc` for the `update` action is a partial
document that will be merged with the already stored document.

The endpoints are `/_bulk`, `/{index}/_bulk`, and `/{index}/{type}/_bulk`.
When the index or the index/type are provided, they will be used by
default on bulk items that don't provide them explicitly.

A note on the format: the idea here is to make processing of this as
fast as possible. As some of the actions will be redirected to other
shards on other nodes, only `action_and_meta_data` is parsed on the
receiving node side.

Client libraries using this protocol should try and strive to do
something similar on the client side, and reduce buffering as much as
possible.

The response to a bulk action is a large JSON structure with the
individual results of each action that was performed. The failure of a
single action does not affect the remaining actions.

There is no "correct" number of actions to perform in a single bulk
call. You should experiment with different settings to find the optimum
size for your particular workload.

If using the HTTP API, make sure that the client does not send HTTP
chunks, as this will slow things down.

[float]
=== Versioning

Each bulk item can include the version value using the
`_version`/`version` field. It automatically follows the behavior of the
index / delete operation based on the `_version` mapping. It also
supports the `version_type`/`_version_type` when using `external`
versioning.

[float]
=== Routing

Each bulk item can include the routing value using the
`_routing`/`routing` field. It automatically follows the behavior of the
index / delete operation based on the `_routing` mapping.
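
For illustration, a hedged sketch of an action line carrying a routing value
(the value itself is a placeholder):

[source,js]
--------------------------------------------------
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1", "_routing" : "user1" } }
{ "field1" : "value1" }
--------------------------------------------------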

[float]
=== Percolator

Each bulk index action can include a percolate value using the
`_percolate`/`percolate` field.

[float]
=== Parent

Each bulk item can include the parent value using the `_parent`/`parent`
field. It automatically follows the behavior of the index / delete
operation based on the `_parent` / `_routing` mapping.

[float]
=== Timestamp

Each bulk item can include the timestamp value using the
`_timestamp`/`timestamp` field. It automatically follows the behavior of
the index operation based on the `_timestamp` mapping.

[float]
=== TTL

Each bulk item can include the ttl value using the `_ttl`/`ttl` field.
It automatically follows the behavior of the index operation based on
the `_ttl` mapping.

[float]
=== Write Consistency

When making bulk calls, you can require a minimum number of active
shards in the partition through the `consistency` parameter. The values
allowed are `one`, `quorum`, and `all`. It defaults to the node level
setting of `action.write_consistency`, which in turn defaults to
`quorum`.

For example, in an index with N shards and 2 replicas, there will have to
be at least 2 active shards within the relevant partition (`quorum`) for
the operation to succeed. With N shards and 1 replica, a single active
shard is enough (in this case, `one` and `quorum` are the same).

[float]
=== Refresh

The `refresh` parameter can be set to `true` in order to refresh the
relevant shards immediately after the bulk operation has occurred and
make the documents searchable, instead of waiting for the normal refresh
interval to expire. Setting it to `true` can trigger additional load, and
may slow down indexing.
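
As an illustration, a hedged sketch that combines both parameters on a bulk
request (reusing the `requests` file from the earlier example):

[source,js]
--------------------------------------------------
$ curl -s -XPOST 'localhost:9200/_bulk?consistency=quorum&refresh=true' --data-binary @requests; echo
--------------------------------------------------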

[float]
=== Update

When using the `update` action, `_retry_on_conflict` can be used as a field
in the action itself (not in the extra payload line) to specify how many
times an update should be retried in the case of a version conflict.

The `update` action payload supports the following options: `doc`
(partial document), `upsert`, `doc_as_upsert`, `script`, `params` (for
script) and `lang` (for script). See the update documentation for details
on the options. A curl example with update actions:

[source,js]
--------------------------------------------------
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : "ctx._source.counter += param1", "lang" : "js", "params" : {"param1" : 1}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
--------------------------------------------------