From 433598296a7e154436eabd613968d7f1ea7cd18d Mon Sep 17 00:00:00 2001
From: Daan
Date: Wed, 22 Jan 2020 15:21:54 -0800
Subject: [PATCH] Fix benchmark chart display

---
 readme.md | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/readme.md b/readme.md
index 388e6470..db58df30 100644
--- a/readme.md
+++ b/readme.md
@@ -56,8 +56,8 @@ Enjoy!
 
 ### Releases
 
-* 2020-01-XX, `v1.4.0`: stable release 1.4: delayed OS page reset for (much) better performance
-  with page reset enabled, more eager concurrent free, addition of STL allocator.
+* 2020-01-22, `v1.4.0`: stable release 1.4: delayed OS page reset with (much) better performance
+  (when page reset is enabled), more eager concurrent free, addition of STL allocator, fixed potential memory leak.
 * 2020-01-15, `v1.3.0`: stable release 1.3: bug fixes, improved randomness and [stronger
   free list encoding](https://github.com/microsoft/mimalloc/blob/783e3377f79ee82af43a0793910a9f2d01ac7863/include/mimalloc-internal.h#L396) in secure mode.
 * 2019-12-22, `v1.2.2`: stable release 1.2: minor updates.
@@ -208,14 +208,17 @@ or via environment variables.
   to explicitly allow large OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]). However, sometimes
   the OS is very slow to reserve contiguous physical memory for large OS pages so use with care on systems that
   can have fragmented memory (for that reason, we generally recommend to use `MIMALLOC_RESERVE_HUGE_OS_PAGES` instead when possible).
-- `MIMALLOC_EAGER_REGION_COMMIT=1`: on Windows, commit large (256MiB) regions eagerly. On Windows, these regions
+
 - `MIMALLOC_RESERVE_HUGE_OS_PAGES=N`: where N is the number of 1GiB huge OS pages. This reserves the huge pages at
   startup and can give quite a performance improvement on long running workloads. Usually it is better to not use
   `MIMALLOC_LARGE_OS_PAGES` in combination with this setting. Just like large OS pages, use with care as reserving
-  contiguous physical memory can take a long time when memory is fragmented.
+  contiguous physical memory can take a long time when memory is fragmented (but reserving the huge pages is done at
+  startup only once).
+  Note that we usually need to explicitly enable huge OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]). With
+  huge OS pages, it may be beneficial to set the option `MIMALLOC_EAGER_COMMIT_DELAY=N` (with usually `N` as 1)
+  to delay the initial `N` segments of a thread to not allocate in the huge OS pages; this prevents threads that are
+  short lived and allocate just a little to take up space in the huge OS page area (which cannot be reset).
@@ -358,8 +361,8 @@ the memory compacting [_Mesh_](https://github.com/plasma-umass/Mesh) (git:51222e7) by
 Bobby Powers _et al_ \[8],
 and finally the default system allocator (glibc, 2.7.0) (based on _PtMalloc2_).
 
-![bench-c5-18xlarge-a](doc/bench-c5-18xlarge-2020-01-20-a.svg)
-![bench-c5-18xlarge-b](doc/bench-c5-18xlarge-2020-01-20-b.svg)
+<img width="90%" src="doc/bench-c5-18xlarge-2020-01-20-a.svg"/>
+<img width="90%" src="doc/bench-c5-18xlarge-2020-01-20-b.svg"/>
 
 Any benchmarks ending in `N` run on all processors in parallel.
 Results are averaged over 10 runs and reported relative
@@ -450,8 +453,8 @@ having a 48 processor AMD Epyc 7000 at 2.5GHz
 with 384GiB of memory. The results are similar to the Intel results
 but it is interesting to
 see the differences in the _larsonN_, _mstressN_, and _xmalloc-testN_ benchmarks.
 
-![bench-r5a-12xlarge-a](doc/bench-r5a-12xlarge-2020-01-16-a.svg)
-![bench-r5a-12xlarge-b](doc/bench-r5a-12xlarge-2020-01-16-b.svg)
+<img width="90%" src="doc/bench-r5a-12xlarge-2020-01-16-a.svg"/>
+<img width="90%" src="doc/bench-r5a-12xlarge-2020-01-16-b.svg"/>
 
 ## Peak Working Set
 
@@ -459,8 +462,8 @@ see the differences in the _larsonN_, _mstressN_, and _xmalloc-testN_ benchmarks
 The following figure shows the peak working set (rss) of the allocators
 on the benchmarks (on the c5.18xlarge instance).
 
-![bench-c5-18xlarge-rss-a](doc/bench-c5-18xlarge-2020-01-20-rss-a.svg)
-![bench-c5-18xlarge-rss-b](doc/bench-c5-18xlarge-2020-01-20-rss-b.svg)
+<img width="90%" src="doc/bench-c5-18xlarge-2020-01-20-rss-a.svg"/>
+<img width="90%" src="doc/bench-c5-18xlarge-2020-01-20-rss-b.svg"/>
 
 Note that the _xmalloc-testN_ memory usage should be disregarded as it
 allocates more the faster the program runs. Similarly, memory usage of
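
As a companion to the options and the v1.4 STL allocator this patch mentions, below is a minimal usage sketch (not part of the patch). It assumes mimalloc's public API from `mimalloc.h` (`mi_option_set`, the `mi_option_eager_commit_delay` option, and `mi_stl_allocator`); the file name `demo.cpp` and build line are illustrative only. Setting the environment variables described above before program start remains the simplest route.

```cpp
// Illustrative only, not part of the patch. Build (assuming an installed
// mimalloc): g++ -std=c++11 demo.cpp -lmimalloc
// Without code changes, the same options can be set via the environment:
//   MIMALLOC_RESERVE_HUGE_OS_PAGES=4 MIMALLOC_EAGER_COMMIT_DELAY=1 ./demo
#include <mimalloc.h>  // mi_option_set, mi_option_t, mi_stl_allocator (C++)
#include <cstdio>
#include <vector>

int main() {
  // Programmatic equivalent of MIMALLOC_EAGER_COMMIT_DELAY=1: delay the
  // initial segment of each thread so short-lived threads stay out of the
  // huge OS page area. Set options early, before the first allocation.
  mi_option_set(mi_option_eager_commit_delay, 1);

  // The STL allocator added in v1.4: route this vector's storage through
  // mimalloc explicitly (useful when mimalloc is not the default allocator).
  std::vector<int, mi_stl_allocator<int>> v;
  for (int i = 0; i < 1000; i++) v.push_back(i);
  std::printf("allocated %zu ints via mi_stl_allocator\n", v.size());
  return 0;
}
```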