mirror of
https://github.com/microsoft/mimalloc.git
synced 2025-07-01 17:24:38 +03:00
Update documentation generation
parent
c91bed99e1
commit
644f59fad7
3 changed files with 80 additions and 613 deletions
337
.gitignore
vendored
|
@ -1,330 +1,7 @@
|
||||||
## Ignore Visual Studio temporary files, build results, and
|
ide/vs2017/*.db
|
||||||
## files generated by popular Visual Studio add-ons.
|
ide/vs2017/*.opendb
|
||||||
##
|
ide/vs2017/*.user
|
||||||
## Get latest from https://github.com/github/gitignore/blob/master/VisualStudio.gitignore
|
ide/vs2017/.vs
|
||||||
|
out/
|
||||||
# User-specific files
|
docs/
|
||||||
*.suo
|
*.zip
|
||||||
*.user
|
|
||||||
*.userosscache
|
|
||||||
*.sln.docstates
|
|
||||||
|
|
||||||
# User-specific files (MonoDevelop/Xamarin Studio)
|
|
||||||
*.userprefs
|
|
||||||
|
|
||||||
# Build results
|
|
||||||
[Dd]ebug/
|
|
||||||
[Dd]ebugPublic/
|
|
||||||
[Rr]elease/
|
|
||||||
[Rr]eleases/
|
|
||||||
x64/
|
|
||||||
x86/
|
|
||||||
bld/
|
|
||||||
[Bb]in/
|
|
||||||
[Oo]bj/
|
|
||||||
[Ll]og/
|
|
||||||
|
|
||||||
# Visual Studio 2015/2017 cache/options directory
|
|
||||||
.vs/
|
|
||||||
# Uncomment if you have tasks that create the project's static files in wwwroot
|
|
||||||
#wwwroot/
|
|
||||||
|
|
||||||
# Visual Studio 2017 auto generated files
|
|
||||||
Generated\ Files/
|
|
||||||
|
|
||||||
# MSTest test Results
|
|
||||||
[Tt]est[Rr]esult*/
|
|
||||||
[Bb]uild[Ll]og.*
|
|
||||||
|
|
||||||
# NUNIT
|
|
||||||
*.VisualState.xml
|
|
||||||
TestResult.xml
|
|
||||||
|
|
||||||
# Build Results of an ATL Project
|
|
||||||
[Dd]ebugPS/
|
|
||||||
[Rr]eleasePS/
|
|
||||||
dlldata.c
|
|
||||||
|
|
||||||
# Benchmark Results
|
|
||||||
BenchmarkDotNet.Artifacts/
|
|
||||||
|
|
||||||
# .NET Core
|
|
||||||
project.lock.json
|
|
||||||
project.fragment.lock.json
|
|
||||||
artifacts/
|
|
||||||
**/Properties/launchSettings.json
|
|
||||||
|
|
||||||
# StyleCop
|
|
||||||
StyleCopReport.xml
|
|
||||||
|
|
||||||
# Files built by Visual Studio
|
|
||||||
*_i.c
|
|
||||||
*_p.c
|
|
||||||
*_i.h
|
|
||||||
*.ilk
|
|
||||||
*.meta
|
|
||||||
*.obj
|
|
||||||
*.iobj
|
|
||||||
*.pch
|
|
||||||
*.pdb
|
|
||||||
*.ipdb
|
|
||||||
*.pgc
|
|
||||||
*.pgd
|
|
||||||
*.rsp
|
|
||||||
*.sbr
|
|
||||||
*.tlb
|
|
||||||
*.tli
|
|
||||||
*.tlh
|
|
||||||
*.tmp
|
|
||||||
*.tmp_proj
|
|
||||||
*.log
|
|
||||||
*.vspscc
|
|
||||||
*.vssscc
|
|
||||||
.builds
|
|
||||||
*.pidb
|
|
||||||
*.svclog
|
|
||||||
*.scc
|
|
||||||
|
|
||||||
# Chutzpah Test files
|
|
||||||
_Chutzpah*
|
|
||||||
|
|
||||||
# Visual C++ cache files
|
|
||||||
ipch/
|
|
||||||
*.aps
|
|
||||||
*.ncb
|
|
||||||
*.opendb
|
|
||||||
*.opensdf
|
|
||||||
*.sdf
|
|
||||||
*.cachefile
|
|
||||||
*.VC.db
|
|
||||||
*.VC.VC.opendb
|
|
||||||
|
|
||||||
# Visual Studio profiler
|
|
||||||
*.psess
|
|
||||||
*.vsp
|
|
||||||
*.vspx
|
|
||||||
*.sap
|
|
||||||
|
|
||||||
# Visual Studio Trace Files
|
|
||||||
*.e2e
|
|
||||||
|
|
||||||
# TFS 2012 Local Workspace
|
|
||||||
$tf/
|
|
||||||
|
|
||||||
# Guidance Automation Toolkit
|
|
||||||
*.gpState
|
|
||||||
|
|
||||||
# ReSharper is a .NET coding add-in
|
|
||||||
_ReSharper*/
|
|
||||||
*.[Rr]e[Ss]harper
|
|
||||||
*.DotSettings.user
|
|
||||||
|
|
||||||
# JustCode is a .NET coding add-in
|
|
||||||
.JustCode
|
|
||||||
|
|
||||||
# TeamCity is a build add-in
|
|
||||||
_TeamCity*
|
|
||||||
|
|
||||||
# DotCover is a Code Coverage Tool
|
|
||||||
*.dotCover
|
|
||||||
|
|
||||||
# AxoCover is a Code Coverage Tool
|
|
||||||
.axoCover/*
|
|
||||||
!.axoCover/settings.json
|
|
||||||
|
|
||||||
# Visual Studio code coverage results
|
|
||||||
*.coverage
|
|
||||||
*.coveragexml
|
|
||||||
|
|
||||||
# NCrunch
|
|
||||||
_NCrunch_*
|
|
||||||
.*crunch*.local.xml
|
|
||||||
nCrunchTemp_*
|
|
||||||
|
|
||||||
# MightyMoose
|
|
||||||
*.mm.*
|
|
||||||
AutoTest.Net/
|
|
||||||
|
|
||||||
# Web workbench (sass)
|
|
||||||
.sass-cache/
|
|
||||||
|
|
||||||
# Installshield output folder
|
|
||||||
[Ee]xpress/
|
|
||||||
|
|
||||||
# DocProject is a documentation generator add-in
|
|
||||||
DocProject/buildhelp/
|
|
||||||
DocProject/Help/*.HxT
|
|
||||||
DocProject/Help/*.HxC
|
|
||||||
DocProject/Help/*.hhc
|
|
||||||
DocProject/Help/*.hhk
|
|
||||||
DocProject/Help/*.hhp
|
|
||||||
DocProject/Help/Html2
|
|
||||||
DocProject/Help/html
|
|
||||||
|
|
||||||
# Click-Once directory
|
|
||||||
publish/
|
|
||||||
|
|
||||||
# Publish Web Output
|
|
||||||
*.[Pp]ublish.xml
|
|
||||||
*.azurePubxml
|
|
||||||
# Note: Comment the next line if you want to checkin your web deploy settings,
|
|
||||||
# but database connection strings (with potential passwords) will be unencrypted
|
|
||||||
*.pubxml
|
|
||||||
*.publishproj
|
|
||||||
|
|
||||||
# Microsoft Azure Web App publish settings. Comment the next line if you want to
|
|
||||||
# checkin your Azure Web App publish settings, but sensitive information contained
|
|
||||||
# in these scripts will be unencrypted
|
|
||||||
PublishScripts/
|
|
||||||
|
|
||||||
# NuGet Packages
|
|
||||||
*.nupkg
|
|
||||||
# The packages folder can be ignored because of Package Restore
|
|
||||||
**/[Pp]ackages/*
|
|
||||||
# except build/, which is used as an MSBuild target.
|
|
||||||
!**/[Pp]ackages/build/
|
|
||||||
# Uncomment if necessary however generally it will be regenerated when needed
|
|
||||||
#!**/[Pp]ackages/repositories.config
|
|
||||||
# NuGet v3's project.json files produces more ignorable files
|
|
||||||
*.nuget.props
|
|
||||||
*.nuget.targets
|
|
||||||
|
|
||||||
# Microsoft Azure Build Output
|
|
||||||
csx/
|
|
||||||
*.build.csdef
|
|
||||||
|
|
||||||
# Microsoft Azure Emulator
|
|
||||||
ecf/
|
|
||||||
rcf/
|
|
||||||
|
|
||||||
# Windows Store app package directories and files
|
|
||||||
AppPackages/
|
|
||||||
BundleArtifacts/
|
|
||||||
Package.StoreAssociation.xml
|
|
||||||
_pkginfo.txt
|
|
||||||
*.appx
|
|
||||||
|
|
||||||
# Visual Studio cache files
|
|
||||||
# files ending in .cache can be ignored
|
|
||||||
*.[Cc]ache
|
|
||||||
# but keep track of directories ending in .cache
|
|
||||||
!*.[Cc]ache/
|
|
||||||
|
|
||||||
# Others
|
|
||||||
ClientBin/
|
|
||||||
~$*
|
|
||||||
*~
|
|
||||||
*.dbmdl
|
|
||||||
*.dbproj.schemaview
|
|
||||||
*.jfm
|
|
||||||
*.pfx
|
|
||||||
*.publishsettings
|
|
||||||
orleans.codegen.cs
|
|
||||||
|
|
||||||
# Including strong name files can present a security risk
|
|
||||||
# (https://github.com/github/gitignore/pull/2483#issue-259490424)
|
|
||||||
#*.snk
|
|
||||||
|
|
||||||
# Since there are multiple workflows, uncomment next line to ignore bower_components
|
|
||||||
# (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
|
|
||||||
#bower_components/
|
|
||||||
|
|
||||||
# RIA/Silverlight projects
|
|
||||||
Generated_Code/
|
|
||||||
|
|
||||||
# Backup & report files from converting an old project file
|
|
||||||
# to a newer Visual Studio version. Backup files are not needed,
|
|
||||||
# because we have git ;-)
|
|
||||||
_UpgradeReport_Files/
|
|
||||||
Backup*/
|
|
||||||
UpgradeLog*.XML
|
|
||||||
UpgradeLog*.htm
|
|
||||||
ServiceFabricBackup/
|
|
||||||
*.rptproj.bak
|
|
||||||
|
|
||||||
# SQL Server files
|
|
||||||
*.mdf
|
|
||||||
*.ldf
|
|
||||||
*.ndf
|
|
||||||
|
|
||||||
# Business Intelligence projects
|
|
||||||
*.rdl.data
|
|
||||||
*.bim.layout
|
|
||||||
*.bim_*.settings
|
|
||||||
*.rptproj.rsuser
|
|
||||||
|
|
||||||
# Microsoft Fakes
|
|
||||||
FakesAssemblies/
|
|
||||||
|
|
||||||
# GhostDoc plugin setting file
|
|
||||||
*.GhostDoc.xml
|
|
||||||
|
|
||||||
# Node.js Tools for Visual Studio
|
|
||||||
.ntvs_analysis.dat
|
|
||||||
node_modules/
|
|
||||||
|
|
||||||
# Visual Studio 6 build log
|
|
||||||
*.plg
|
|
||||||
|
|
||||||
# Visual Studio 6 workspace options file
|
|
||||||
*.opt
|
|
||||||
|
|
||||||
# Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
|
|
||||||
*.vbw
|
|
||||||
|
|
||||||
# Visual Studio LightSwitch build output
|
|
||||||
**/*.HTMLClient/GeneratedArtifacts
|
|
||||||
**/*.DesktopClient/GeneratedArtifacts
|
|
||||||
**/*.DesktopClient/ModelManifest.xml
|
|
||||||
**/*.Server/GeneratedArtifacts
|
|
||||||
**/*.Server/ModelManifest.xml
|
|
||||||
_Pvt_Extensions
|
|
||||||
|
|
||||||
# Paket dependency manager
|
|
||||||
.paket/paket.exe
|
|
||||||
paket-files/
|
|
||||||
|
|
||||||
# FAKE - F# Make
|
|
||||||
.fake/
|
|
||||||
|
|
||||||
# JetBrains Rider
|
|
||||||
.idea/
|
|
||||||
*.sln.iml
|
|
||||||
|
|
||||||
# CodeRush
|
|
||||||
.cr/
|
|
||||||
|
|
||||||
# Python Tools for Visual Studio (PTVS)
|
|
||||||
__pycache__/
|
|
||||||
*.pyc
|
|
||||||
|
|
||||||
# Cake - Uncomment if you are using it
|
|
||||||
# tools/**
|
|
||||||
# !tools/packages.config
|
|
||||||
|
|
||||||
# Tabs Studio
|
|
||||||
*.tss
|
|
||||||
|
|
||||||
# Telerik's JustMock configuration file
|
|
||||||
*.jmconfig
|
|
||||||
|
|
||||||
# BizTalk build output
|
|
||||||
*.btp.cs
|
|
||||||
*.btm.cs
|
|
||||||
*.odx.cs
|
|
||||||
*.xsd.cs
|
|
||||||
|
|
||||||
# OpenCover UI analysis results
|
|
||||||
OpenCover/
|
|
||||||
|
|
||||||
# Azure Stream Analytics local run output
|
|
||||||
ASALocalRun/
|
|
||||||
|
|
||||||
# MSBuild Binary and Structured Log
|
|
||||||
*.binlog
|
|
||||||
|
|
||||||
# NVidia Nsight GPU debugger configuration file
|
|
||||||
*.nvuser
|
|
||||||
|
|
||||||
# MFractors (Xamarin productivity tool) working folder
|
|
||||||
.mfractor/
|
|
||||||
|
|
13
doc/doxyfile
|
@ -1235,18 +1235,7 @@ HTML_EXTRA_STYLESHEET = mimalloc-doxygen.css
|
||||||
# files will be copied as-is; there are no commands or markers available.
|
# files will be copied as-is; there are no commands or markers available.
|
||||||
# This tag requires that the tag GENERATE_HTML is set to YES.
|
# This tag requires that the tag GENERATE_HTML is set to YES.
|
||||||
|
|
||||||
HTML_EXTRA_FILES = bench-r5a-4xlarge-t1.png \
|
HTML_EXTRA_FILES =
|
||||||
bench-r5a-4xlarge-t2.png \
|
|
||||||
bench-r5a-4xlarge-m1.png \
|
|
||||||
bench-r5a-4xlarge-m2.png \
|
|
||||||
bench-c5d-2xlarge-t1.png \
|
|
||||||
bench-c5d-2xlarge-t2.png \
|
|
||||||
bench-c5d-2xlarge-m1.png \
|
|
||||||
bench-c5d-2xlarge-m2.png \
|
|
||||||
bench-z4-win-t1.png \
|
|
||||||
bench-z4-win-t2.png \
|
|
||||||
bench-z4-win-m1.png \
|
|
||||||
bench-z4-win-m2.png
|
|
||||||
|
|
||||||
# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen
|
# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen
|
||||||
# will adjust the colors in the style sheet and background images according to
|
# will adjust the colors in the style sheet and background images according to
|
||||||
|
|
|
@ -11,13 +11,19 @@ terms of the MIT license. A copy of the license can be found in the file
|
||||||
/*! \mainpage
|
/*! \mainpage
|
||||||
|
|
||||||
This is the API documentation of the
|
This is the API documentation of the
|
||||||
[mimalloc](https://github.com/koka-lang/mimalloc) allocator
|
[mimalloc](https://github.com/microsoft/mimalloc) allocator
|
||||||
(pronounced "me-malloc") -- a
|
(pronounced "me-malloc") -- a
|
||||||
general purpose allocator with excellent [performance](bench.html)
|
general purpose allocator with excellent [performance](bench.html)
|
||||||
characteristics. Initially
|
characteristics. Initially
|
||||||
developed by Daan Leijen for the run-time systems of the
|
developed by Daan Leijen for the run-time systems of the
|
||||||
[Koka](https://github.com/koka-lang/koka) and [Lean](https://github.com/leanprover/lean) languages.
|
[Koka](https://github.com/koka-lang/koka) and [Lean](https://github.com/leanprover/lean) languages.
|
||||||
|
|
||||||
|
It is a drop-in replacement for `malloc` and can be used in other programs
|
||||||
|
without code changes, for example, on Unix you can use it as:
|
||||||
|
```
|
||||||
|
> LD_PRELOAD=/usr/bin/libmimalloc.so myprogram
|
||||||
|
```
|
||||||
|
|
||||||
Notable aspects of the design include:
|
Notable aspects of the design include:
|
||||||
|
|
||||||
- __small and consistent__: the library is less than 3500 LOC using simple and
|
- __small and consistent__: the library is less than 3500 LOC using simple and
|
||||||
|
@ -25,23 +31,32 @@ Notable aspects of the design include:
|
||||||
to integrate and adapt in other projects. For runtime systems it
|
to integrate and adapt in other projects. For runtime systems it
|
||||||
provides hooks for a monotonic _heartbeat_ and deferred freeing (for
|
provides hooks for a monotonic _heartbeat_ and deferred freeing (for
|
||||||
bounded worst-case times with reference counting).
|
bounded worst-case times with reference counting).
|
||||||
- __free list sharding__: "the big idea": instead of one big free list (per size class) we have
|
- __free list sharding__: the big idea: instead of one big free list (per size class) we have
|
||||||
many smaller lists per memory "page" which both reduces fragmentation
|
many smaller lists per memory "page" which both reduces fragmentation
|
||||||
and increases locality --
|
and increases locality --
|
||||||
things that are allocated close in time get allocated close in memory.
|
things that are allocated close in time get allocated close in memory.
|
||||||
(A memory "page" in mimalloc contains blocks of one size class and is
|
(A memory "page" in _mimalloc_ contains blocks of one size class and is
|
||||||
usually 64KB on a 64-bit system).
|
usually 64KiB on a 64-bit system).
|
||||||
- __eager page reset__: when a "page" becomes empty (with increased chance
|
- __eager page reset__: when a "page" becomes empty (with increased chance
|
||||||
due to free list sharding) the memory is marked to the OS as unused ("reset" or "purged")
|
due to free list sharding) the memory is marked to the OS as unused ("reset" or "purged")
|
||||||
reducing (real) memory pressure and fragmentation, especially in long running
|
reducing (real) memory pressure and fragmentation, especially in long running
|
||||||
programs.
|
programs.
|
||||||
- __lazy initialization__: pages in a segment are lazily initialized so
|
- __secure__: _mimalloc_ can be build in secure mode, adding guard pages,
|
||||||
no memory is touched until it becomes allocated, reducing the resident
|
randomized allocation, encrypted free lists, etc. to protect against various
|
||||||
memory and potential page faults.
|
heap vulnerabilities. The performance penalty is only around 3% on average
|
||||||
|
over our benchmarks.
|
||||||
|
- __first-class heaps__: efficiently create and use multiple heaps to allocate across different regions.
|
||||||
|
A heap can be destroyed at once instead of deallocating each object separately.
|
||||||
- __bounded__: it does not suffer from _blowup_ \[1\], has bounded worst-case allocation
|
- __bounded__: it does not suffer from _blowup_ \[1\], has bounded worst-case allocation
|
||||||
times (_wcat_), bounded space overhead (~0.2% meta-data, with at most 16.7% waste in allocation sizes),
|
times (_wcat_), bounded space overhead (~0.2% meta-data, with at most 16.7% waste in allocation sizes),
|
||||||
and has no internal points of contention using atomic operations almost
|
and has no internal points of contention using only atomic operations.
|
||||||
everywhere.
|
- __fast__: In our benchmarks (see [below](#performance)),
|
||||||
|
_mimalloc_ always outperforms all other leading allocators (_jemalloc_, _tcmalloc_, _Hoard_, etc),
|
||||||
|
and usually uses less memory (up to 25% more in the worst case). A nice property
|
||||||
|
is that it does consistently well over a wide range of benchmarks.
|
||||||
|
|
||||||
|
You can read more on the design of _mimalloc_ in the upcoming technical report
|
||||||
|
which also has detailed benchmark results.
|
||||||
|
|
||||||
Further information:
|
Further information:
|
||||||
|
|
||||||
|
@ -623,13 +638,13 @@ void mi_option_set_default(mi_option_t option, long value);
|
||||||
|
|
||||||
Checkout the sources from Github:
|
Checkout the sources from Github:
|
||||||
```
|
```
|
||||||
git clone https://github.com/koka-lang/mimalloc.git
|
git clone https://github.com/microsoft/mimalloc
|
||||||
```
|
```
|
||||||
|
|
||||||
## Windows
|
## Windows
|
||||||
|
|
||||||
Open `ide/vs2017/mimalloc.sln` in Visual Studio 2017 and build.
|
Open `ide/vs2017/mimalloc.sln` in Visual Studio 2017 and build.
|
||||||
The `mimalloc` project builds a static library, while the
|
The `mimalloc` project builds a static library (in `out/msvc-x64`), while the
|
||||||
`mimalloc-override` project builds a DLL for overriding malloc
|
`mimalloc-override` project builds a DLL for overriding malloc
|
||||||
in the entire program.
|
in the entire program.
|
||||||
|
|
||||||
|
@ -637,44 +652,50 @@ in the entire program.
|
||||||
|
|
||||||
We use [`cmake`](https://cmake.org)<sup>1</sup> as the build system:
|
We use [`cmake`](https://cmake.org)<sup>1</sup> as the build system:
|
||||||
|
|
||||||
- `mkdir -p out/release` (create a build directory)
|
```
|
||||||
- `cd out/release` (go to it)
|
> mkdir -p out/release
|
||||||
- `cmake ../..` (generate the make file)
|
> cd out/release
|
||||||
- `make` (and build)
|
> cmake ../..
|
||||||
|
> make
|
||||||
This will build the library as a shared (dynamic)
|
```
|
||||||
|
This builds the library as a shared (dynamic)
|
||||||
library (`.so` or `.dylib`), a static library (`.a`), and
|
library (`.so` or `.dylib`), a static library (`.a`), and
|
||||||
as a single object file (`.o`).
|
as a single object file (`.o`).
|
||||||
|
|
||||||
- `sudo make install` (install the library and header files in `/usr/lib` and `/usr/include`)
|
`> sudo make install` (install the library and header files in `/usr/local/lib` and `/usr/local/include`)
|
||||||
|
|
||||||
Use the option `-DCMAKE_INSTALL_PREFIX=../local` (for example) to the `ccmake`
|
|
||||||
command to install to a local directory to see what gets installed.
|
|
||||||
|
|
||||||
You can build the debug version which does many internal checks and
|
You can build the debug version which does many internal checks and
|
||||||
maintains detailed statistics as:
|
maintains detailed statistics as:
|
||||||
|
|
||||||
- `mkdir -p out/debug`
|
```
|
||||||
- `cd out/debug`
|
> mkdir -p out/debug
|
||||||
- `cmake -DCMAKE_BUILD_TYPE=Debug ../..`
|
> cd out/debug
|
||||||
- `make`
|
> cmake -DCMAKE_BUILD_TYPE=Debug ../..
|
||||||
|
> make
|
||||||
|
```
|
||||||
This will name the shared library as `libmimalloc-debug.so`.
|
This will name the shared library as `libmimalloc-debug.so`.
|
||||||
|
|
||||||
Or build with `clang`:
|
Finally, you can build a _secure_ version that uses guard pages, encrypted
|
||||||
|
free lists, etc, as:
|
||||||
- `CC=clang cmake ../..`
|
```
|
||||||
|
> mkdir -p out/secure
|
||||||
|
> cd out/secure
|
||||||
|
> cmake -DSECURE=ON ../..
|
||||||
|
> make
|
||||||
|
```
|
||||||
|
This will name the shared library as `libmimalloc-secure.so`.
|
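The release, debug, and secure command sequences in this hunk differ only in the build directory and the flag passed to `cmake`. A minimal sketch of a helper that prints the commands for a given variant follows. The function name `mimalloc_build_steps` is hypothetical and not part of the repository, and the helper only echoes the commands instead of running them:

```shell
# Hypothetical helper (not part of mimalloc): prints the build commands
# for one of the documented variants without executing them.
mimalloc_build_steps() {
  variant="$1"
  case "$variant" in
    release) flag="" ;;                           # plain `cmake ../..`
    debug)   flag="-DCMAKE_BUILD_TYPE=Debug " ;;  # flag as given in the docs
    secure)  flag="-DSECURE=ON " ;;               # flag as given in the docs
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  echo "mkdir -p out/$variant"
  echo "cd out/$variant"
  echo "cmake ${flag}../.."
  echo "make"
}
```

For example, `mimalloc_build_steps debug | sh`, run from the repository root, would execute the debug sequence, assuming `cmake` and `make` are installed.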
||||||
Use `ccmake`<sup>2</sup> instead of `cmake`
|
Use `ccmake`<sup>2</sup> instead of `cmake`
|
||||||
to see and customize all the available build options.
|
to see and customize all the available build options.
|
||||||
|
|
||||||
Notes:
|
Notes:
|
||||||
1. Install CMake: `sudo apt-get install cmake`
|
1. Install CMake: `sudo apt-get install cmake`
|
||||||
2. Install CCMake: `sudo apt-get install cmake-curses-gui`
|
2. Install CCMake: `sudo apt-get install cmake-curses-gui`
|
||||||
|
|
||||||
*/
|
*/
|
||||||
|
|
||||||
/*! \page using Using the library
|
/*! \page using Using the library
|
||||||
|
|
||||||
|
|
||||||
The preferred usage is including `<mimalloc.h>`, linking with
|
The preferred usage is including `<mimalloc.h>`, linking with
|
||||||
the shared- or static library, and using the `mi_malloc` API exclusively for allocation. For example,
|
the shared- or static library, and using the `mi_malloc` API exclusively for allocation. For example,
|
||||||
```
|
```
|
||||||
|
@ -745,7 +766,7 @@ See \ref overrides for more info.
|
||||||
|
|
||||||
/*! \page overrides Overriding Malloc
|
/*! \page overrides Overriding Malloc
|
||||||
|
|
||||||
Overriding standard malloc can be done either _dynamically_ or _statically_.
|
Overriding the standard `malloc` can be done either _dynamically_ or _statically_.
|
||||||
|
|
||||||
## Dynamic override
|
## Dynamic override
|
||||||
|
|
||||||
|
@ -753,7 +774,7 @@ This is the recommended way to override the standard malloc interface.
|
||||||
|
|
||||||
### Unix, BSD, MacOSX
|
### Unix, BSD, MacOSX
|
||||||
|
|
||||||
On these system we preload the mimalloc shared
|
On these systems we preload the mimalloc shared
|
||||||
library so all calls to the standard `malloc` interface are
|
library so all calls to the standard `malloc` interface are
|
||||||
resolved to the _mimalloc_ library.
|
resolved to the _mimalloc_ library.
|
||||||
|
|
||||||
|
@ -770,7 +791,7 @@ env MIMALLOC_VERBOSE=1 LD_PRELOAD=/usr/lib/libmimalloc.so myprogram
|
||||||
```
|
```
|
||||||
or run with the debug version to get detailed statistics:
|
or run with the debug version to get detailed statistics:
|
||||||
```
|
```
|
||||||
env MIMALLOC_STATS=1 LD_PRELOAD=/usr/lib/libmimallocd.so myprogram
|
env MIMALLOC_STATS=1 LD_PRELOAD=/usr/lib/libmimalloc-debug.so myprogram
|
||||||
```
|
```
|
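The preload commands in this hunk assume the shared library exists at the given path. A minimal sketch of a wrapper that makes the fallback explicit when it does not is shown here. The function name `run_with_mimalloc` and the `MIMALLOC_LIB` variable are hypothetical (the documentation itself only uses `LD_PRELOAD`, `MIMALLOC_VERBOSE`, and `MIMALLOC_STATS`), and the default path is copied from the example in the hunk:

```shell
# Hypothetical wrapper (not part of mimalloc): preload the library only
# if the file exists, otherwise run the program with the system allocator.
run_with_mimalloc() {
  # MIMALLOC_LIB is an assumed variable name used only in this sketch.
  lib="${MIMALLOC_LIB:-/usr/lib/libmimalloc.so}"
  if [ -f "$lib" ]; then
    env LD_PRELOAD="$lib" "$@"
  else
    echo "warning: $lib not found, running without mimalloc" >&2
    "$@"
  fi
}
```

Usage would look like `run_with_mimalloc myprogram`, or `MIMALLOC_LIB=/usr/lib/libmimalloc-debug.so run_with_mimalloc myprogram` to select the debug build.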
||||||
|
|
||||||
### Windows
|
### Windows
|
||||||
|
@ -780,7 +801,7 @@ DLL, and use the C-runtime library as a DLL (the `/MD` or `/MDd` switch).
|
||||||
To ensure the mimalloc DLL gets loaded it is easiest to insert some
|
To ensure the mimalloc DLL gets loaded it is easiest to insert some
|
||||||
call to the mimalloc API in the `main` function, like `mi_version()`.
|
call to the mimalloc API in the `main` function, like `mi_version()`.
|
||||||
|
|
||||||
Due to the way mimalloc overrides the standard malloc at runtime, it is best
|
Due to the way mimalloc intercepts the standard malloc at runtime, it is best
|
||||||
to link to the mimalloc import library first on the command line so it gets
|
to link to the mimalloc import library first on the command line so it gets
|
||||||
loaded right after the universal C runtime DLL (`ucrtbase`). See
|
loaded right after the universal C runtime DLL (`ucrtbase`). See
|
||||||
the `mimalloc-override-test` project for an example.
|
the `mimalloc-override-test` project for an example.
|
||||||
|
@ -788,9 +809,9 @@ the `mimalloc-override-test` project for an example.
|
||||||
|
|
||||||
## Static override
|
## Static override
|
||||||
|
|
||||||
You can also statically link with _mimalloc_ to override the standard
|
On Unix systems, you can also statically link with _mimalloc_ to override the standard
|
||||||
malloc interface. The recommended way is to link the final program with the
|
malloc interface. The recommended way is to link the final program with the
|
||||||
_mimalloc_ single object file (`mimalloc-override.o` (or `.obj`)). We use
|
_mimalloc_ single object file (`mimalloc-override.o`). We use
|
||||||
an object file instead of a library file as linkers give preference to
|
an object file instead of a library file as linkers give preference to
|
||||||
that over archives to resolve symbols. To ensure that the standard
|
that over archives to resolve symbols. To ensure that the standard
|
||||||
malloc interface resolves to the _mimalloc_ library, link it as the first
|
malloc interface resolves to the _mimalloc_ library, link it as the first
|
||||||
|
@ -858,239 +879,19 @@ void _free_dbg(void* p, int block_type);
|
||||||
|
|
||||||
/*! \page bench Performance
|
/*! \page bench Performance
|
||||||
|
|
||||||
|
We tested _mimalloc_ against many other top allocators over a wide
|
||||||
|
range of benchmarks, ranging from various real world programs to
|
||||||
|
synthetic benchmarks that see how the allocator behaves under more
|
||||||
|
extreme circumstances.
|
||||||
|
|
||||||
tldr: In our benchmarks, mimalloc always outperforms
|
In our benchmarks, _mimalloc_ always outperforms all other leading
|
||||||
all other leading allocators (jemalloc, tcmalloc, hoard, and glibc), and usually
|
allocators (_jemalloc_, _tcmalloc_, _Hoard_, etc) (Apr 2019),
|
||||||
uses less memory (with less then 25% more in the worst case) (as of Jan 2019).
|
and usually uses less memory (up to 25% more in the worst case).
|
||||||
A nice property is that it does consistently well over a wide range of benchmarks.
|
A nice property is that it does *consistently* well over the wide
|
||||||
|
range of benchmarks.
|
||||||
|
|
||||||
Disclaimer: allocators are interesting as there is no optimal algorithm -- for
|
See the [Performance](https://github.com/microsoft/mimalloc#Performance)
|
||||||
a given allocator one can always construct a workload where it does not do so well.
|
section in the _mimalloc_ repository for benchmark results,
|
||||||
The goal is thus to find an allocation strategy that performs well over a wide
|
or the the technical report for detailed benchmark results.
|
||||||
range of benchmarks without suffering from underperformance in less
|
|
||||||
common situations (which is what our second benchmark set tests for).
|
|
||||||
|
|
||||||
|
|
||||||
## Benchmarking
|
|
||||||
|
|
||||||
We tested _mimalloc_ with 5 other allocators over 11 benchmarks.
|
|
||||||
The tested allocators are:
|
|
||||||
|
|
||||||
- **mi**: The mimalloc allocator (version tag `v1.0.0`).
|
|
||||||
- **je**: [jemalloc](https://github.com/jemalloc/jemalloc), by [Jason Evans](https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919) (Facebook);
|
|
||||||
currently (2018) one of the leading allocators and is widely used, for example
|
|
||||||
in BSD, Firefox, and at Facebook. Installed as package `libjemalloc-dev:amd64/bionic 3.6.0-11`.
|
|
||||||
- **tc**: [tcmalloc](https://github.com/gperftools/gperftools), by Google as part of the performance tools.
|
|
||||||
Highly performant and used in the Chrome browser. Installed as package `libgoogle-perftools-dev:amd64/bionic 2.5-2.2ubuntu3`.
|
|
||||||
- **jx**: A compiled version of a more recent instance of [jemalloc](https://github.com/jemalloc/jemalloc).
|
|
||||||
Using commit ` 7a815c1b` ([dev](https://github.com/jemalloc/jemalloc/tree/dev), 2019-01-15).
|
|
||||||
- **hd**: [Hoard](https://github.com/emeryberger/Hoard), by Emery Berger \[1].
|
|
||||||
One of the first multi-thread scalable allocators.
|
|
||||||
([master](https://github.com/emeryberger/Hoard), 2019-01-01, version tag `3.13`)
|
|
||||||
- **mc**: The system allocator. Here we use the LibC allocator (which is originally based on
|
|
||||||
PtMalloc). Using version 2.27. (Note that version 2.26 significantly improved scalability over
|
|
||||||
earlier versions).
|
|
||||||
|
|
||||||
All allocators run exactly the same benchmark programs and use `LD_PRELOAD` to override the system allocator.
|
|
||||||
The wall-clock elapsed time and peak resident memory (_rss_) are
|
|
||||||
measured with the `time` program. The best scores over 5 runs are used.
|
|
||||||
Performance is reported relative to mimalloc, e.g. a time of 66% means that
|
|
||||||
mimalloc ran 1.5× faster (i.e. that mimalloc finished in 66% of the time
|
|
||||||
that the other allocator needed).
|
|
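The relative-time convention in the removed paragraph can be checked with a short calculation: a relative time of 66% corresponds to a speedup of 100/66. A minimal sketch, where the helper name `speedup_from_relative_time` is hypothetical and not taken from the benchmark scripts:

```shell
# Hypothetical helper (not from the benchmark scripts): converts a relative
# time in percent into a speedup factor, rounded to two decimals.
speedup_from_relative_time() {
  awk -v pct="$1" 'BEGIN { printf "%.2f\n", 100 / pct }'
}

speedup_from_relative_time 66
```

The call prints `1.52`, which the text rounds to 1.5×.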
||||||
|
|
||||||
## On a 16-core AMD EPYC running Linux
|
|
||||||
|
|
||||||
Testing on a big Amazon EC2 instance ([r5a.4xlarge](https://aws.amazon.com/ec2/instance-types/))
|
|
||||||
consisting of a 16-core AMD EPYC 7000 at 2.5GHz
|
|
||||||
with 128GB ECC memory, running Ubuntu 18.04.1 with LibC 2.27 and GCC 7.3.0.
|
|
||||||
|
|
||||||
|
|
||||||
The first benchmark set consists of programs that allocate a lot. Relative
|
|
||||||
elapsed time:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
and memory usage:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
The benchmarks above are (with N=16 in our case):
|
|
||||||
|
|
||||||
- __cfrac__: by Dave Barrett, implementation of continued fraction factorization:
|
|
||||||
uses many small short-lived allocations. Factorizes as `./cfrac 175451865205073170563711388363274837927895`.
|
|
||||||
- __espresso__: a programmable logic array analyzer \[3].
|
|
||||||
- __barnes__: a hierarchical n-body particle solver \[4]. Simulates 163840 particles.
|
|
||||||
- __leanN__: by Leonardo de Moura _et al_, the [lean](https://github.com/leanprover/lean)
|
|
||||||
compiler, version 3.4.1, compiling its own standard library concurrently using N cores (`./lean --make -j N`).
|
|
||||||
Big real-world workload with intensive allocation, takes about 1:40s when running on a
|
|
||||||
single high-end core.
|
|
||||||
- __redis__: running the [redis](https://redis.io/) 5.0.3 server on
|
|
||||||
1 million requests pushing 10 new list elements and then requesting the
|
|
||||||
head 10 elements. Measures the requests handled per second.
|
|
||||||
- __alloc-test__: a modern [allocator test](http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/)
|
|
||||||
developed by by OLogN Technologies AG at [ITHare.com](http://ithare.com). Simulates intensive allocation workloads with a Pareto
|
|
||||||
size distribution. The `alloc-testN` benchmark runs on N cores doing 100×10<sup>6</sup>
|
|
||||||
allocations per thread with objects up to 1KB in size.
|
|
||||||
Using commit `94f6cb` ([master](https://github.com/node-dot-cpp/alloc-test), 2018-07-04)
|
|
||||||
|
|
||||||
We can see mimalloc outperforms the other allocators moderately but all
|
|
||||||
these modern allocators perform well.
|
|
||||||
In `cfrac`, mimalloc is about 13%
|
|
||||||
faster than jemalloc for many small and short-lived allocations.
|
|
||||||
The `cfrac` and `espresso` programs do not use much
|
|
||||||
memory (~1.5MB) so it does not matter too much, but still mimalloc uses about half the resident
|
|
||||||
memory of tcmalloc (and almost 5× less than Hoard on `espresso`).
|
|
||||||
|
|
||||||
_The `leanN` program is most interesting as a large realistic and concurrent
|
|
||||||
workload and there is a 6% speedup over both tcmalloc and jemalloc. This is
|
|
||||||
quite significant: if Lean spends (optimistically) 20% of its time in the allocator
|
|
||||||
that means that mimalloc is 1.5× faster than the others._
|
|
||||||
|
|
||||||
The `alloc-test` is very allocation intensive and we see the larger
|
|
||||||
diffrerences here. Since all allocators perform almost identical on `alloc-test1`
|
|
||||||
as `alloc-testN`, we can see that they are all excellent and scale (almost) linearly.
|
|
||||||
|
|
||||||
The second benchmark set test specific aspects of the allocators and
|
|
||||||
shows more extreme differences between allocators:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
The benchmarks in the second set are (again with N=16):

- __larson__: by Larson and Krishnan \[2]. Simulates a server workload using 100
  separate threads that allocate and free many objects but leave some objects to
  be freed by other threads. Larson and Krishnan observe this behavior
  (which they call _bleeding_) in actual server applications, and the
  benchmark simulates this.
- __sh6bench__: by [MicroQuill](http://www.microquill.com) as part of SmartHeap. Stress test for
  single-threaded allocation where some of the objects are freed
  in the usual last-allocated, first-freed (LIFO) order, but others
  are freed in reverse order. Using the public [source](http://www.microquill.com/smartheap/shbench/bench.zip) (retrieved 2019-01-02).
- __sh8bench__: by [MicroQuill](http://www.microquill.com) as part of SmartHeap. Stress test for
  multithreaded allocation (with N threads) where, just as in `larson`, some objects are freed
  by other threads, and some objects are freed in reverse order (as in `sh6bench`).
  Using the public [source](http://www.microquill.com/smartheap/SH8BENCH.zip) (retrieved 2019-01-02).
- __cache-scratch__: by Emery Berger _et al_ \[1]. Introduced with the Hoard
  allocator to test for _passive-false_ sharing of cache lines: first some
  small objects are allocated and one is given to each thread; each thread then frees its
  object, allocates another one, and accesses that repeatedly. If an allocator
  places objects belonging to different threads close to each other this will
  lead to cache-line contention.

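The access pattern described for `cache-scratch` can be sketched in C as follows. This is not the benchmark's source: `same_cache_line`, `scratch_worker`, and the 64-byte line size are assumptions made for illustration, and the real benchmark runs one such worker per thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64  /* assumption: a common line size on x86-64 */

/* Returns 1 if both addresses fall on the same cache line, 0 otherwise. */
static int same_cache_line(const void* a, const void* b) {
  return ((uintptr_t)a / CACHE_LINE_SIZE) == ((uintptr_t)b / CACHE_LINE_SIZE);
}

/* One worker's loop: free the object handed in by the main thread, then
   repeatedly allocate a fresh small object, write to it many times, and free
   it again. Returns the number of bytes written, or 0 if malloc fails. */
static size_t scratch_worker(char* handed_in, size_t obj_size, int iterations, int writes) {
  size_t written = 0;
  assert(obj_size > 0);
  free(handed_in);
  for (int i = 0; i < iterations; i++) {
    volatile char* obj = malloc(obj_size);
    if (obj == NULL) return 0;
    for (int w = 0; w < writes; w++) {
      for (size_t k = 0; k < obj_size; k++) { obj[k] = (char)w; written++; }
    }
    free((void*)obj);
  }
  return written;
}
```

If the objects that two workers on different threads end up writing to satisfy `same_cache_line`, each write by one thread invalidates the line in the other core's cache, which is the contention this benchmark is meant to expose.
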
In the `larson` server workload mimalloc is 2.5× faster than
tcmalloc and jemalloc, which is quite surprising and probably due to the object
migration between different threads. Also in `sh6bench` mimalloc does much
better than the others (more than 4× faster than jemalloc).
We cannot explain this well but believe it may be
caused in part by the "reverse" free-ing in `sh6bench`. Again in `sh8bench`
the mimalloc allocator handles object migration between threads much better.

The `cache-scratch` benchmark also demonstrates the different architectures
of the allocators nicely. With a single thread they all perform the same, but when
running with multiple threads the allocator-induced false sharing of the
cache lines causes large run-time differences, where mimalloc is up to
20× faster than tcmalloc here. Only the original jemalloc does almost
as well (but the most recent version, jxmalloc, regresses). The
Hoard allocator is specifically designed to avoid this false sharing and we
are not sure why it is not doing well here (although it still runs 5× as
fast as tcmalloc and jxmalloc).

## On an 8-core Intel Xeon running Linux

Testing on a compute-optimized Amazon EC2 instance ([c5d.2xlarge](https://aws.amazon.com/ec2/instance-types/))
consisting of an 8-core Intel Xeon Platinum at 3GHz (up to 3.5GHz turbo boost)
with 16GB ECC memory, running Ubuntu 18.04.1 with LibC 2.27 and GCC 7.3.0.

First the regular workload benchmarks (with N=8):

![bench-c5-18xlarge-1](bench-c5-18xlarge-1.svg)

![bench-c5-18xlarge-rss-1](bench-c5-18xlarge-rss-1.svg)

Most results are quite similar to the 16-core AMD machine, except that
the differences are less pronounced, with all allocators a bit closer to mimalloc's performance.

This is shown too in the second set of benchmarks:

![bench-c5-18xlarge-2](bench-c5-18xlarge-2.svg)

![bench-c5-18xlarge-rss-2](bench-c5-18xlarge-rss-2.svg)

On the server workload of `larson` every allocator does a bit better on 8 cores
than on 16. On the other benchmarks, though, performance does not improve.

## On Windows (4-core Intel Xeon)

Testing on a HP Z4 G4 Workstation with a 4-core Intel® Xeon® W2123 at 3.6 GHz
with 16GB ECC memory, running Windows 10 Pro (version 10.0.17134 Build 17134)
with Visual Studio 2017 (version 15.8.9).

Since we cannot use `LD_PRELOAD` on Windows we compiled a subset of our
allocators and benchmarks and linked them statically. The **je** benchmark
is therefore equivalent to the **jx** benchmark in the previous graphs.
The **mc** allocator now refers to the standard Microsoft allocator.
Unfortunately we could not get Hoard to work on Windows at this time.

We used the Windows call `QueryPerformanceCounter` to measure elapsed wall-clock
times, and `GetProcessMemoryInfo` to measure the peak working set (rss).

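A measurement helper along these lines might look as follows. This is a sketch rather than the code used for the benchmarks: the function names `now_seconds` and `peak_rss_bytes` are hypothetical, and the non-Windows branch (using `timespec_get` and `getrusage`) is an addition so that the example also builds elsewhere.

```c
#include <stddef.h>

#if defined(_WIN32)
#include <windows.h>
#include <psapi.h>  /* GetProcessMemoryInfo; may require linking psapi.lib */

/* Wall-clock seconds since an arbitrary fixed point. */
static double now_seconds(void) {
  LARGE_INTEGER freq, count;
  QueryPerformanceFrequency(&freq);
  QueryPerformanceCounter(&count);
  return (double)count.QuadPart / (double)freq.QuadPart;
}

/* Peak working set of the current process in bytes (0 on failure). */
static size_t peak_rss_bytes(void) {
  PROCESS_MEMORY_COUNTERS pmc;
  if (!GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) return 0;
  return (size_t)pmc.PeakWorkingSetSize;
}
#else
#include <time.h>
#include <sys/resource.h>

/* Wall-clock seconds; C11 timespec_get is not guaranteed to be monotonic. */
static double now_seconds(void) {
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Peak resident set size in bytes (0 on failure).
   Assumes ru_maxrss is reported in KiB, as on Linux. */
static size_t peak_rss_bytes(void) {
  struct rusage ru;
  if (getrusage(RUSAGE_SELF, &ru) != 0) return 0;
  return (size_t)ru.ru_maxrss * 1024;
}
#endif
```

A benchmark driver would call `now_seconds()` before and after the workload and subtract the two values, then read `peak_rss_bytes()` once at the end.
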
First the regular workload benchmarks:

![bench-z4-1](bench-z4-1.svg)

![bench-z4-rss-1](bench-z4-rss-1.svg)

Here mimalloc and tcmalloc perform very similarly, and outperform the system
allocator by a significant margin. Somehow jemalloc does much worse than
when running on Linux. It is not yet clear why, but it might be a compilation issue:
when running through the profiler the `__chkstk` routine takes
quite some time. This is a compiler-inserted runtime function to check for enough
stack space if there are many local variables or when the compiler cannot make
a static estimate. Perhaps this is the culprit but it needs more investigation.

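To make the `__chkstk` remark concrete, the following sketch shows the kind of function for which a compiler such as MSVC typically emits a stack probe: one whose local variables exceed a page. The function `sum_large_frame` and the 16 KiB size are invented for illustration; this is not code from jemalloc, and whether a probe is actually emitted depends on the compiler and its options.

```c
#include <stddef.h>
#include <string.h>

#define LARGE_FRAME_BYTES (16 * 1024)  /* assumption: well above one page */

/* Fills a large stack buffer and sums its bytes. The 16 KiB local array makes
   the stack frame larger than a page, which is the situation in which a stack
   probe such as __chkstk is usually inserted into the function prologue. */
static size_t sum_large_frame(unsigned char value) {
  unsigned char buf[LARGE_FRAME_BYTES];
  size_t sum = 0;
  memset(buf, value, sizeof(buf));
  for (size_t i = 0; i < sizeof(buf); i++) sum += buf[i];
  return sum;
}
```

If such a function sits on a hot allocation path, the probe runs on every call, which would be consistent with it showing up prominently in a profile.
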
The second set of benchmarks shows again more pronounced differences:

![bench-z4-2](bench-z4-2.svg)

![bench-z4-rss-2](bench-z4-rss-2.svg)

In the `larson` server workload mimalloc is 25% faster than
tcmalloc, and both significantly outperform the system allocator
(again probably due to the object
migration between different threads).
Also in `sh6bench` and `sh8bench`, mimalloc scales much
better than the others.

## References

- \[1] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson.
   _Hoard: A Scalable Memory Allocator for Multithreaded Applications_.
   In the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), Cambridge, MA, November 2000.
   [pdf](http://www.cs.utexas.edu/users/mckinley/papers/asplos-2000.pdf)

- \[2] P. Larson and M. Krishnan. _Memory allocation for long-running server applications_. In ISMM, Vancouver, B.C., Canada, 1998.
   [pdf](http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5F0BFB4F57832AEB6C11BF8257271088?doi=10.1.1.45.1947&rep=rep1&type=pdf)

- \[3] D. Grunwald, B. Zorn, and R. Henderson.
   _Improving the cache locality of memory allocation_. In R. Cartwright, editor,
   Proceedings of the Conference on Programming Language Design and Implementation, pages 177–186, New York, NY, USA, June 1993.
   [pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.6621&rep=rep1&type=pdf)

- \[4] J. Barnes and P. Hut. _A hierarchical O(n*log(n)) force-calculation algorithm_. Nature, 324:446-449, 1986.

*/
|
*/