elasticsearch/client/benchmark
Rene Groeschke 8ccae4da71
Setup elasticsearch dependency monitoring with Snyk for production code (#88036)
This adds the generation and upload logic of Gradle dependency graphs to snyk

We directly implemented a rest api based snyk plugin as:

the existing snyk gradle plugin delegates to the snyk command line tool the command line tool 
uses custom gradle logic by injecting a init file that is 

a) using deprecated build logic which we definitely want to avoid
b) uses gradle api we avoid like eager task creation.

Shipping this as a internal gradle plugin gives us the most flexibility as we only want to monitor 
production code for now we apply this plugin as part of the elasticsearch.build plugin, 
that usage has been for now the de-facto indicator if a project is considered a "production" project 
that ends up in our distribution or public maven repositories. This isnt yet ideal and we will revisit 
the distinction between production and non production code / projects in a separate effort.

As part of this effort we added the elasticsearch.build plugin to more projects that actually end up 
in the distribution. To unblock us on this we for now disabled a few check tasks that started failing by applying elasticsearch.build. 

Addresses  #87620
2022-06-29 13:29:14 +02:00
..
src/main Migrate to Java 16 Records (part 1) (#82338) 2022-01-18 17:53:06 +01:00
build.gradle Setup elasticsearch dependency monitoring with Snyk for production code (#88036) 2022-06-29 13:29:14 +02:00
README.md Remove the transport client (#42538) 2019-06-05 17:22:37 -07:00

Steps to execute the benchmark

  1. Build client-benchmark-noop-api-plugin with ./gradlew :client:client-benchmark-noop-api-plugin:assemble
  2. Install it on the target host with bin/elasticsearch-plugin install file:///full/path/to/client-benchmark-noop-api-plugin.zip.
  3. Start Elasticsearch on the target host (ideally not on the machine that runs the benchmarks)
  4. Run the benchmark with
./gradlew -p client/benchmark run --args ' params go here'

Everything in the ' gets sent on the command line to JMH. The leading inside the 's is important. Without it parameters are sometimes sent to gradle.

See below for some example invocations.

Example benchmark

In general, you should define a few GC-related settings -Xms8192M -Xmx8192M -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails and keep an eye on GC activity. You can also define -XX:+PrintCompilation to see JIT activity.

Bulk indexing

Download benchmark data from http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames and decompress them.

Example invocation:

wget http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames/documents-2.json.bz2
bzip2 -d documents-2.json.bz2
mv documents-2.json client/benchmark/build
gradlew -p client/benchmark run --args ' rest bulk localhost build/documents-2.json geonames type 8647880 5000'

The parameters are all in the 's and are in order:

  • Client type: Use either "rest" or "transport"
  • Benchmark type: Use either "bulk" or "search"
  • Benchmark target host IP (the host where Elasticsearch is running)
  • full path to the file that should be bulk indexed
  • name of the index
  • name of the (sole) type in the index
  • number of documents in the file
  • bulk size

Example invocation:

./gradlew -p client/benchmark run --args ' rest search localhost geonames {"query":{"match_phrase":{"name":"Sankt Georgen"}}} 500,1000,1100,1200'

The parameters are in order:

  • Client type: Always "rest"
  • Benchmark type: Use either "bulk" or "search"
  • Benchmark target host IP (the host where Elasticsearch is running)
  • name of the index
  • a search request body (remember to escape double quotes).
  • A comma-separated list of target throughput rates