Commit graph

12 commits

Nik Everett
7ebb09b9f3
Speed getIntLE from BytesReference (#90147)
This speeds up `getIntLE` from `BytesReference` which we'll be using in
the upcoming dense representations for aggregations. Here's the
performance:

```
           (type)  Mode  Cnt   Before  Error  After    Error  Units
            array  avgt    7   1.036 ± 0.062   0.261 ± 0.022  ns/op
paged_bytes_array  avgt    7   5.189 ± 0.172   5.317 ± 0.196  ns/op
  composite_256kb  avgt    7  30.792 ± 0.834  11.240 ± 0.387  ns/op
composite_262344b  avgt    7  32.503 ± 1.017  11.155 ± 0.358  ns/op
    composite_1mb  avgt    7  25.189 ± 0.449   8.379 ± 0.193  ns/op
```

The `array` method is how we'll use slices that don't span the edges of
a netty buffer. The `paged_bytes_array` method doesn't really change and
represents the default for internal stuff. I'll bet we could make it
faster too, but I don't know that we use it in the hot path. The
`composite_<size>` method is how we'll be reading large slabs from the
netty byte buffer. We could probably do better if we relied on the sizes
of the buffers being even, but we don't presently do that in the
composite bytes array. The different sizes following `composite` show
that the performance is dominated by the number of slabs in the
composite buffer. `1mb` looks like the largest buffer netty uses.
`256kb` is the smallest. The odd byte count (`262344b`) intentionally
doesn't line the ints up on sensible boundaries. I don't think we'll use
sizes like that, but it looks like alignment doesn't make a huge difference.
We're dominated by the buffer choice.
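For reference, the kind of change that makes a plain-array `getIntLE` fast is reading all four bytes in one bounds-checked access instead of byte by byte. This is a hedged sketch under that assumption, not the actual PR diff; `GetIntLEDemo` and both method names are hypothetical:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class GetIntLEDemo {
    // A VarHandle byte-array view reads four bytes with a single bounds
    // check; on little-endian hardware the JIT turns it into one load.
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int getIntLE(byte[] bytes, int offset) {
        return (int) INT_LE.get(bytes, offset);
    }

    // The straightforward byte-by-byte version, for comparison: four
    // reads, four bounds checks, three shifts, three ors.
    static int getIntLESlow(byte[] bytes, int offset) {
        return (bytes[offset] & 0xFF)
            | (bytes[offset + 1] & 0xFF) << 8
            | (bytes[offset + 2] & 0xFF) << 16
            | (bytes[offset + 3] & 0xFF) << 24;
    }
}
```

Both variants decode the same little-endian layout; the difference the benchmark above measures is purely in how the bytes are fetched.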
2022-09-22 00:47:11 +09:30
Abele Mălan
22c4a10c63
Update README.md (#84153)
- delete/update a few misplaced words;
- add some extra commas;
- fix capitalization of "Mac".
2022-03-10 14:03:07 -05:00
Quentin Pradet
5d8421744a
Fix link to benchmark page (#83887) 2022-02-15 13:00:52 +04:00
Nik Everett
fad5e44b99
update benchmark readme (#72620)
Documents that version 2.0 of the async profiler doesn't seem to work
with jmh. Fixes some syntax in another profiling example.
2021-05-03 11:30:50 -04:00
Nik Everett
a5f3787be4
It's flame graph time! (#68312)
Upgrade JMH to latest (1.26) to pick up its async profiler integration
and update the documentation to include instructions for running the
async profiler and making pretty pretty flame graphs.
2021-02-02 11:11:16 -05:00
Nik Everett
dfc45396e7
Speed up writeVInt (#62345)
This speeds up `StreamOutput#writeVInt` quite a bit which is nice
because it is *very* commonly called when serializing aggregations. Well,
when serializing anything. All "collections" serialize their size as a
vint. Anyway, I was examining the serialization speeds of `StringTerms`
and this saves about 30% of the write time for that. I expect it'll be
useful other places.
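For context, the vint wire format packs 7 payload bits per byte, with the high bit flagging that more bytes follow. This sketch shows the format being written, not the PR's actual speedup (which changes how these bytes are produced); `VIntDemo` is a hypothetical name:

```java
import java.io.ByteArrayOutputStream;

class VIntDemo {
    // Writes i as a variable-length int: low 7 bits per byte, high bit
    // set on every byte except the last. Small values (the common case
    // for collection sizes) take a single byte.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }
}
```

For example, 300 encodes to the two bytes `0xAC 0x02`, while anything up to 127 fits in one byte, which is why vints pay off for sizes.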
2020-09-15 14:20:53 -04:00
Nik Everett
1af8d9f228
Rework checking if a year is a leap year (#60585)
This way is faster, saving about 8% on the microbenchmark that rounds to
the nearest month. That is in the hot path for `date_histogram` which is
a very popular aggregation so it seems worth it to at least try and
speed it up a little.
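One common way to rework the leap-year check is to replace the divisions by 100 and 400 with a cheaper division and a mask, which is valid once divisibility by 4 is established. A hedged sketch of that idea (the commit's exact rework may differ; `LeapYearDemo` is a hypothetical name):

```java
class LeapYearDemo {
    // Given year % 4 == 0, year % 100 == 0 is equivalent to
    // year % 25 == 0, and year % 400 == 0 reduces to year % 16 == 0
    // (since 400 = 16 * 25). The & 3 test also lets the common case,
    // a non-multiple of 4, exit after a single cheap mask.
    static boolean isLeapYear(int year) {
        return (year & 3) == 0 && (year % 25 != 0 || (year & 15) == 0);
    }
}
```

It agrees with the textbook rule on the usual tricky cases: 1900 is not a leap year, 2000 is.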
2020-08-05 16:09:51 -04:00
Nik Everett
0097a86d53
Optimize date_histograms across daylight savings time (#55559)
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.

Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)

I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster than rounding to individual hours or minutes.

This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.

This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.

Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
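The fixed-offset fast path this optimization builds on can be sketched as follows; this is a hedged illustration under the assumption that the shard's date range is known to fall under a single fixed offset, and `FixedOffsetRounding` is a hypothetical name, not the actual `Rounding` implementation:

```java
class FixedOffsetRounding {
    private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    // Once the zone is effectively a fixed offset over the shard's
    // min/max dates, rounding down to the nearest day is plain integer
    // arithmetic: shift into local time, floor to a day boundary, shift
    // back. No per-value time zone lookups and no garbage.
    static long roundToDay(long utcMillis, long fixedOffsetMillis) {
        long localMillis = utcMillis + fixedOffsetMillis;
        // floorDiv (not /) so dates before the epoch round down, not up.
        return Math.floorDiv(localMillis, DAY_MILLIS) * DAY_MILLIS - fixedOffsetMillis;
    }
}
```

For example, with a fixed UTC-5 offset, 2020-01-01T03:00Z rounds down to 2019-12-31T05:00Z, i.e. local midnight. When a DST transition sits inside the shard's range, this shortcut is unavailable, which is where the transition-aware implementation and its ~30% overhead come in.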
2020-05-07 07:22:32 -04:00
Nik Everett
21eb9695af
Build: Remove shadowing from benchmarks (#32475)
Removes shadowing from the benchmarks. It isn't *strictly* needed. We do
have to rework the documentation on how to run the benchmark, but it
still seems to work if you run everything through gradle.
2018-07-31 17:31:13 -04:00
Daniel Mitterdorfer
889d802115 Refine wording in benchmark README and correct typos 2016-06-15 23:01:56 +02:00
Daniel Mitterdorfer
32dd813436 Fix typo in benchmark README 2016-06-15 22:45:47 +02:00
Daniel Mitterdorfer
2c467fd9c2 Add microbenchmarking infrastructure (#18891)
With this commit we add a benchmarks project that contains the necessary build
infrastructure and an example benchmark. It is added as a separate project to avoid
interfering with the regular build too much (especially sanity checks) and to keep
the microbenchmarks isolated.

Microbenchmarks are generated with `gradle :benchmarks:jmhJar` and can be run with
`gradle :benchmarks:jmh`.

We intentionally do not use the
[jmh-gradle-plugin](https://github.com/melix/jmh-gradle-plugin) as it causes all
sorts of problems (dependencies are not properly excluded, not all JMH parameters
can be set) and it adds another abstraction layer that is not needed.

Closes #18242
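A minimal JMH benchmark in this style looks roughly like the sketch below. This is not the example benchmark the commit actually adds, just an illustration of the shape; it assumes the `jmh-core` and annotation-processor dependencies that the benchmarks project's build wires up:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@Fork(1)
@State(Scope.Benchmark)
public class ExampleBenchmark {
    // State fields keep the JIT from constant-folding the measured work.
    private long value = 123456789L;

    @Benchmark
    public String longToString() {
        return Long.toString(value);
    }
}
```

JMH's annotation processor generates the harness from these annotations at build time, which is what the `jmhJar` task packages.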
2016-06-15 16:48:02 +02:00