merge from dev

This commit is contained in:
daanx 2024-06-17 16:18:03 -07:00
commit 3726cf94ba
106 changed files with 6623 additions and 5540 deletions


@@ -25,12 +25,15 @@ without code changes, for example, on Unix you can use it as:
```
Notable aspects of the design include:
- __small and consistent__: the library is about 8k LOC using simple and
consistent data structures. This makes it very suitable
to integrate and adapt in other projects. For runtime systems it
provides hooks for a monotonic _heartbeat_ and deferred freeing (for
bounded worst-case times with reference counting).
Partly due to its simplicity, mimalloc has been ported to many systems (Windows, macOS,
Linux, WASM, various BSD's, Haiku, MUSL, etc) and has excellent support for dynamic overriding.
At the same time, it is an industrial strength allocator that runs (very) large scale
distributed services on thousands of machines with excellent worst case latencies.
- __free list sharding__: instead of one big free list (per size class) we have
many smaller lists per "mimalloc page" which reduces fragmentation and
increases locality --
@@ -45,23 +48,23 @@ Notable aspects of the design include:
and the chance of contending on a single location will be low -- this is quite
similar to randomized algorithms like skip lists where adding
a random oracle removes the need for a more complex algorithm.
- __eager page purging__: when a "page" becomes empty (with increased chance
due to free list sharding) the memory is marked to the OS as unused (reset or decommitted)
reducing (real) memory pressure and fragmentation, especially in long running
programs.
- __secure__: _mimalloc_ can be built in secure mode, adding guard pages,
randomized allocation, encrypted free lists, etc. to protect against various
heap vulnerabilities. The performance penalty is usually around 10% on average
over our benchmarks.
- __first-class heaps__: efficiently create and use multiple heaps to allocate across different regions.
A heap can be destroyed at once instead of deallocating each object separately.
- __bounded__: it does not suffer from _blowup_ \[1\], has bounded worst-case allocation
times (_wcat_) (up to OS primitives), bounded space overhead (~0.2% meta-data, with low
internal fragmentation), and has no internal points of contention using only atomic operations.
- __fast__: In our benchmarks (see [below](#bench)),
_mimalloc_ outperforms other leading allocators (_jemalloc_, _tcmalloc_, _Hoard_, etc),
and often uses less memory. A nice property is that it does consistently well over a wide range
of benchmarks. There is also good huge OS page support for larger server programs.
You can read more on the design of _mimalloc_ in the
[technical report](https://www.microsoft.com/en-us/research/publication/mimalloc-free-list-sharding-in-action)
@@ -278,8 +281,7 @@ void* mi_zalloc_small(size_t size);
/// The returned size can be
/// used to call \a mi_expand successfully.
/// The returned size is always at least equal to the
/// allocated size of \a p.
///
/// @see [_msize](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/msize?view=vs-2017) (Windows)
/// @see [malloc_usable_size](http://man7.org/linux/man-pages/man3/malloc_usable_size.3.html) (Linux)
@@ -304,7 +306,7 @@ size_t mi_good_size(size_t size);
/// in very narrow circumstances; in particular, when a long running thread
/// allocates a lot of blocks that are freed by other threads it may improve
/// resource usage by calling this every once in a while.
void mi_collect(bool force);
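For example, a minimal sketch (the worker loop and collection interval are illustrative, not part of the API):

```
#include <mimalloc.h>

// Illustrative long-running loop: this thread allocates many blocks that
// are freed by other threads, so reclaim deferred frees once in a while.
void worker_loop(void) {
  for (long i = 0; ; i++) {
    // ... process work items, allocating and freeing blocks ...
    if (i % 100000 == 0) {
      mi_collect(false);  // not forced: only do inexpensive, incremental collection
    }
  }
}
```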
/// Deprecated
/// @param out Ignored, outputs to the registered output function or stderr by default.
@@ -428,7 +430,7 @@ int mi_reserve_os_memory(size_t size, bool commit, bool allow_large);
/// allocated in some manner and available for use by mimalloc.
/// @param start Start of the memory area
/// @param size The size of the memory area.
/// @param is_committed Is the area already committed?
/// @param is_large Does it consist of large OS pages? Set this to \a true as well for memory
/// that should not be decommitted or protected (like rdma etc.)
/// @param is_zero Does the area consist of zeros?
@@ -453,7 +455,7 @@ int mi_reserve_huge_os_pages_interleave(size_t pages, size_t numa_nodes, size_t
/// Reserve \a pages of huge OS pages (1GiB) at a specific \a numa_node,
/// but stops after at most `timeout_msecs` seconds.
/// @param pages The number of 1GiB pages to reserve.
/// @param numa_node The NUMA node where the memory is reserved (start at 0). Use -1 for no affinity.
/// @param timeout_msecs Maximum number of milli-seconds to try reserving, or 0 for no timeout.
/// @returns 0 if successful, \a ENOMEM if running out of memory, or \a ETIMEDOUT if timed out.
///
@@ -486,6 +488,91 @@ bool mi_is_redirected();
/// on other systems as the amount of read/write accessible memory reserved by mimalloc.
void mi_process_info(size_t* elapsed_msecs, size_t* user_msecs, size_t* system_msecs, size_t* current_rss, size_t* peak_rss, size_t* current_commit, size_t* peak_commit, size_t* page_faults);
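For instance, a small sketch that prints a few of these statistics:

```
#include <mimalloc.h>
#include <stdio.h>

// Sketch: print elapsed time, peak resident set size, and page faults.
void print_process_info(void) {
  size_t elapsed, user, sys, rss, peak_rss, commit, peak_commit, faults;
  mi_process_info(&elapsed, &user, &sys, &rss, &peak_rss,
                  &commit, &peak_commit, &faults);
  printf("elapsed: %zu ms, peak rss: %zu, page faults: %zu\n",
         elapsed, peak_rss, faults);
}
```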
/// @brief Show all current arenas.
/// @param show_inuse Show the arena blocks that are in use.
/// @param show_abandoned Show the abandoned arena blocks.
/// @param show_purge Show arena blocks scheduled for purging.
void mi_debug_show_arenas(bool show_inuse, bool show_abandoned, bool show_purge);
/// Mimalloc uses large (virtual) memory areas, called "arena"s, from the OS to manage its memory.
/// Each arena has an associated identifier.
typedef int mi_arena_id_t;
/// @brief Return the size of an arena.
/// @param arena_id The arena identifier.
/// @param size Returned size in bytes of the (virtual) arena area.
/// @return base address of the arena.
void* mi_arena_area(mi_arena_id_t arena_id, size_t* size);
/// @brief Reserve huge OS pages (1GiB) into a single arena.
/// @param pages Number of 1GiB pages to reserve.
/// @param numa_node The associated NUMA node, or -1 for no NUMA preference.
/// @param timeout_msecs Max amount of milli-seconds this operation is allowed to take. (0 is infinite)
/// @param exclusive If exclusive, only a heap associated with this arena can allocate in it.
/// @param arena_id The arena identifier.
/// @return 0 if successful, \a ENOMEM if running out of memory, or \a ETIMEDOUT if timed out.
int mi_reserve_huge_os_pages_at_ex(size_t pages, int numa_node, size_t timeout_msecs, bool exclusive, mi_arena_id_t* arena_id);
/// @brief Reserve OS memory to be managed in an arena.
/// @param size Size to reserve.
/// @param commit Should the memory be initially committed?
/// @param allow_large Allow the use of large OS pages?
/// @param exclusive Is the returned arena exclusive?
/// @param arena_id The new arena identifier.
/// @return Zero on success, an error code otherwise.
int mi_reserve_os_memory_ex(size_t size, bool commit, bool allow_large, bool exclusive, mi_arena_id_t* arena_id);
/// @brief Manage externally allocated memory as a mimalloc arena. This memory will not be freed by mimalloc.
/// @param start Start address of the area.
/// @param size Size in bytes of the area.
/// @param is_committed Is the memory already committed?
/// @param is_large Does it consist of (pinned) large OS pages?
/// @param is_zero Is the memory zero-initialized?
/// @param numa_node Associated NUMA node, or -1 to have no NUMA preference.
/// @param exclusive Is the arena exclusive (where only heaps associated with the arena can allocate in it)?
/// @param arena_id The new arena identifier.
/// @return `true` if successful.
bool mi_manage_os_memory_ex(void* start, size_t size, bool is_committed, bool is_large, bool is_zero, int numa_node, bool exclusive, mi_arena_id_t* arena_id);
/// @brief Create a new heap that only allocates in the specified arena.
/// @param arena_id The arena identifier.
/// @return The new heap or `NULL`.
mi_heap_t* mi_heap_new_in_arena(mi_arena_id_t arena_id);
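A minimal sketch of how these calls compose (error handling elided; the 64 MiB size is illustrative):

```
#include <mimalloc.h>

// Sketch: reserve an exclusive 64 MiB arena and allocate from it via a heap.
void arena_heap_example(void) {
  mi_arena_id_t arena_id;
  if (mi_reserve_os_memory_ex(64 * 1024 * 1024, true /*commit*/,
                              false /*allow_large*/, true /*exclusive*/,
                              &arena_id) != 0) return;
  mi_heap_t* heap = mi_heap_new_in_arena(arena_id);
  if (heap == NULL) return;
  void* p = mi_heap_malloc(heap, 1024);   // allocated inside the arena
  mi_free(p);
  mi_heap_delete(heap);                   // the arena memory itself stays reserved
}
```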
/// @brief Create a new heap
/// @param heap_tag The heap tag associated with this heap; heaps only reclaim memory between heaps with the same tag.
/// @param allow_destroy Is \a mi_heap_destroy allowed? Not allowing this allows the heap to reclaim memory from terminated threads.
/// @param arena_id If not 0, the heap will only allocate from the specified arena.
/// @return A new heap or `NULL` on failure.
///
/// The \a arena_id can be used by runtimes to allocate only in a specified pre-reserved arena.
/// This is used for example for a compressed pointer heap in Koka.
/// The \a heap_tag enables heaps to keep objects of a certain type isolated to heaps with that tag.
/// This is used for example in the CPython integration.
mi_heap_t* mi_heap_new_ex(int heap_tag, bool allow_destroy, mi_arena_id_t arena_id);
/// A process can associate threads with sub-processes.
/// A sub-process will not reclaim memory from abandoned heaps/threads
/// in other sub-processes.
typedef void* mi_subproc_id_t;
/// @brief Get the main sub-process identifier.
mi_subproc_id_t mi_subproc_main(void);
/// @brief Create a fresh sub-process (with no associated threads yet).
/// @return The new sub-process identifier.
mi_subproc_id_t mi_subproc_new(void);
/// @brief Delete a previously created sub-process.
/// @param subproc The sub-process identifier.
/// Only delete sub-processes if all associated threads have terminated.
void mi_subproc_delete(mi_subproc_id_t subproc);
/// Add the current thread to the given sub-process.
/// This should be called right after a thread is created (and no allocation has taken place yet).
void mi_subproc_add_current_thread(mi_subproc_id_t subproc);
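A hypothetical sketch using pthreads (the worker function and sizes are illustrative):

```
#include <mimalloc.h>
#include <pthread.h>

static mi_subproc_id_t g_subproc;

// Register with the sub-process first, before any allocation in this thread.
static void* worker(void* arg) {
  (void)arg;
  mi_subproc_add_current_thread(g_subproc);
  void* p = mi_malloc(256);  // owned by the worker's sub-process
  mi_free(p);
  return NULL;
}

// Sketch: isolate worker threads in their own sub-process so their abandoned
// memory is not reclaimed by threads outside this group.
void subproc_example(void) {
  g_subproc = mi_subproc_new();
  pthread_t t;
  pthread_create(&t, NULL, &worker, NULL);
  pthread_join(t, NULL);
  mi_subproc_delete(g_subproc);  // safe: the associated thread has terminated
}
```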
/// \}
// ------------------------------------------------------
@@ -495,20 +582,24 @@ void mi_process_info(size_t* elapsed_msecs, size_t* user_msecs, size_t* system_m
/// \defgroup aligned Aligned Allocation
///
/// Allocating aligned memory blocks.
///
/// \{
/// The maximum supported alignment size (currently 1MiB).
#define MI_BLOCK_ALIGNMENT_MAX (1024*1024UL)
/// Allocate \a size bytes aligned by \a alignment.
/// @param size number of bytes to allocate.
/// @param alignment the minimal alignment of the allocated memory.
/// @returns pointer to the allocated memory or \a NULL if out of memory,
/// or if the alignment is not a power of 2 (including 0). The \a size is unrestricted
/// (and does not have to be an integral multiple of the \a alignment).
/// The returned pointer is aligned by \a alignment, i.e. `(uintptr_t)p % alignment == 0`.
/// Returns a unique pointer if called with \a size 0.
///
/// Note that `alignment` always follows `size` for consistency with the unaligned
/// allocation API, but unfortunately this differs from `posix_memalign` and `aligned_alloc` in the C library.
///
/// @see [aligned_alloc](https://en.cppreference.com/w/c/memory/aligned_alloc) (in the standard C11 library, with switched arguments!)
/// @see [_aligned_malloc](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-malloc?view=vs-2017) (on Windows)
/// @see [aligned_alloc](http://man.openbsd.org/aligned_alloc) (on BSD, with switched arguments!)
/// @see [posix_memalign](https://linux.die.net/man/3/posix_memalign) (on Posix, with switched arguments!)
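For example (a minimal sketch; the size and alignment are illustrative):

```
#include <mimalloc.h>
#include <assert.h>
#include <stdint.h>

// Sketch: allocate 1000 bytes on a 64-byte (cache line) boundary.
// Note the argument order: size first, then alignment.
void aligned_example(void) {
  void* p = mi_malloc_aligned(1000, 64);
  assert(p != NULL && ((uintptr_t)p % 64) == 0);
  mi_free(p);
}
```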
@@ -522,11 +613,12 @@ void* mi_realloc_aligned(void* p, size_t newsize, size_t alignment);
/// @param size number of bytes to allocate.
/// @param alignment the minimal alignment of the allocated memory at \a offset.
/// @param offset the offset that should be aligned.
/// @returns pointer to the allocated memory or \a NULL if out of memory,
/// or if the alignment is not a power of 2 (including 0). The \a size is unrestricted
/// (and does not have to be an integral multiple of the \a alignment).
/// The returned pointer is aligned by \a alignment at \a offset, i.e. `((uintptr_t)p + offset) % alignment == 0`.
/// Returns a unique pointer if called with \a size 0.
///
/// @see [_aligned_offset_malloc](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-offset-malloc?view=vs-2017) (on Windows)
void* mi_malloc_aligned_at(size_t size, size_t alignment, size_t offset);
void* mi_zalloc_aligned_at(size_t size, size_t alignment, size_t offset);
@@ -574,12 +666,12 @@ void mi_heap_delete(mi_heap_t* heap);
/// heap is set to the backing heap.
void mi_heap_destroy(mi_heap_t* heap);
/// Set the default heap to use in the current thread for mi_malloc() et al.
/// @param heap The new default heap.
/// @returns The previous default heap.
mi_heap_t* mi_heap_set_default(mi_heap_t* heap);
/// Get the default heap that is used for mi_malloc() et al. (for the current thread).
/// @returns The current default heap.
mi_heap_t* mi_heap_get_default();
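A minimal sketch that routes a phase of allocations through a temporary heap (names and sizes are illustrative):

```
#include <mimalloc.h>

// Sketch: allocate a phase of objects in a scratch heap, then free them all
// at once by destroying the heap.
void phase_example(void) {
  mi_heap_t* scratch = mi_heap_new();
  mi_heap_t* prev = mi_heap_set_default(scratch);  // this thread's mi_malloc now uses scratch
  void* a = mi_malloc(100);
  void* b = mi_malloc(200);
  (void)a; (void)b;                                // ... use the phase data ...
  mi_heap_set_default(prev);                       // restore the previous default heap
  mi_heap_destroy(scratch);                        // frees a and b in one operation
}
```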
@@ -764,6 +856,8 @@ typedef struct mi_heap_area_s {
size_t committed; ///< current committed bytes of this area
size_t used; ///< bytes in use by allocated blocks
size_t block_size; ///< size in bytes of one block
size_t full_block_size; ///< size in bytes of a full block including padding and metadata.
int heap_tag; ///< heap tag associated with this area (see \a mi_heap_new_ex)
} mi_heap_area_t;
/// Visitor function passed to mi_heap_visit_blocks()
@@ -788,6 +882,23 @@ typedef bool (mi_block_visit_fun)(const mi_heap_t* heap, const mi_heap_area_t* a
/// @returns \a true if all areas and blocks were visited.
bool mi_heap_visit_blocks(const mi_heap_t* heap, bool visit_all_blocks, mi_block_visit_fun* visitor, void* arg);
/// @brief Visit all areas and blocks in abandoned heaps.
/// @param subproc_id The sub-process id associated with the abandoned heaps.
/// @param heap_tag Visit only abandoned memory with the specified heap tag, use -1 to visit all abandoned memory.
/// @param visit_blocks If \a true visits all allocated blocks, otherwise
/// \a visitor is only called for every heap area.
/// @param visitor This function is called for every area in the heap
/// (with \a block as \a NULL). If \a visit_all_blocks is
/// \a true, \a visitor is also called for every allocated
/// block in every area (with `block!=NULL`).
/// Return \a false from this function to stop visiting early.
/// @param arg extra argument passed to the \a visitor.
/// @return \a true if all areas and blocks were visited.
///
/// Note: requires the option `mi_option_visit_abandoned` to be set
/// at the start of the program.
bool mi_abandoned_visit_blocks(mi_subproc_id_t subproc_id, int heap_tag, bool visit_blocks, mi_block_visit_fun* visitor, void* arg);
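For example, a minimal sketch of a visitor that sums the allocated bytes of a heap:

```
#include <mimalloc.h>
#include <stdio.h>

// Visitor: called once per area (block == NULL), and once per allocated
// block when visit_all_blocks is true.
static bool count_blocks(const mi_heap_t* heap, const mi_heap_area_t* area,
                         void* block, size_t block_size, void* arg) {
  (void)heap; (void)area;
  if (block != NULL) *((size_t*)arg) += block_size;
  return true;  // continue visiting
}

void show_heap_usage(mi_heap_t* heap) {
  size_t total = 0;
  mi_heap_visit_blocks(heap, true /*visit_all_blocks*/, &count_blocks, &total);
  printf("heap uses %zu bytes in allocated blocks\n", total);
}
```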
/// \}
/// \defgroup options Runtime Options
@@ -799,34 +910,38 @@ bool mi_heap_visit_blocks(const mi_heap_t* heap, bool visit_all_blocks, mi_block
/// Runtime options.
typedef enum mi_option_e {
// stable options
mi_option_show_errors, ///< Print error messages.
mi_option_show_stats, ///< Print statistics on termination.
mi_option_verbose, ///< Print verbose messages.
mi_option_max_errors, ///< issue at most N error messages
mi_option_max_warnings, ///< issue at most N warning messages
// advanced options
mi_option_reserve_huge_os_pages, ///< reserve N huge OS pages (1GiB pages) at startup
mi_option_reserve_huge_os_pages_at, ///< Reserve N huge OS pages at a specific NUMA node N.
mi_option_reserve_os_memory, ///< reserve specified amount of OS memory in an arena at startup (internally, this value is in KiB; use `mi_option_get_size`)
mi_option_allow_large_os_pages, ///< allow large (2 or 4 MiB) OS pages, implies eager commit. If false, also disables THP for the process.
mi_option_purge_decommits, ///< should a memory purge decommit? (=1). Set to 0 to use memory reset on a purge (instead of decommit)
mi_option_arena_reserve, ///< initial memory size for arena reservation (= 1 GiB on 64-bit) (internally, this value is in KiB; use `mi_option_get_size`)
mi_option_os_tag, ///< tag used for OS logging (macOS only for now) (=100)
mi_option_retry_on_oom, ///< retry on out-of-memory for N milli-seconds (=400), set to 0 to disable retries. (only on Windows)
// experimental options
mi_option_eager_commit, ///< eager commit segments? (after `eager_commit_delay` segments) (enabled by default).
mi_option_eager_commit_delay, ///< the first N segments per thread are not eagerly committed (but per page in the segment on demand)
mi_option_arena_eager_commit, ///< eager commit arenas? Use 2 to enable just on overcommit systems (=2)
mi_option_abandoned_page_purge, ///< immediately purge delayed purges on thread termination
mi_option_purge_delay, ///< memory purging is delayed by N milli-seconds; use 0 for immediate purging or -1 for no purging at all. (=10)
mi_option_use_numa_nodes, ///< 0 = use all available numa nodes, otherwise use at most N nodes.
mi_option_disallow_os_alloc, ///< 1 = do not use OS memory for allocation (but only programmatically reserved arenas)
mi_option_max_segment_reclaim, ///< max. percentage of the abandoned segments can be reclaimed per try (=10%)
mi_option_destroy_on_exit, ///< if set, release all memory on exit; sometimes used for dynamic unloading but can be unsafe
mi_option_arena_purge_mult, ///< multiplier for `purge_delay` for the purging delay for arenas (=10)
mi_option_abandoned_reclaim_on_free, ///< allow to reclaim an abandoned segment on a free (=1)
mi_option_purge_extend_delay, ///< extend purge delay on each subsequent delay (=1)
mi_option_disallow_arena_alloc, ///< 1 = do not use arenas for allocation (except when using specific arena ids)
mi_option_visit_abandoned, ///< allow visiting heap blocks from abandoned threads (=0)
_mi_option_last
} mi_option_t;
@@ -838,7 +953,10 @@ void mi_option_disable(mi_option_t option);
void mi_option_set_enabled(mi_option_t option, bool enable);
void mi_option_set_enabled_default(mi_option_t option, bool enable);
long mi_option_get(mi_option_t option);
long mi_option_get_clamp(mi_option_t option, long min, long max);
size_t mi_option_get_size(mi_option_t option);
void mi_option_set(mi_option_t option, long value);
void mi_option_set_default(mi_option_t option, long value);
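For example, a sketch of tuning options programmatically; this is best done at startup, before the first allocation, since many options are latched when the allocator initializes:

```
#include <mimalloc.h>

// Sketch: adjust a few options at program startup.
void configure_mimalloc(void) {
  mi_option_set(mi_option_purge_delay, 100);   // purge unused pages after 100 ms
  mi_option_enable(mi_option_show_stats);      // print statistics on exit
  size_t arena_reserve = mi_option_get_size(mi_option_arena_reserve);  // size-valued option
  (void)arena_reserve;
}
```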
@@ -852,21 +970,27 @@ void mi_option_set_default(mi_option_t option, long value);
///
/// \{
/// Just as `free` but also checks if the pointer `p` belongs to our heap.
void mi_cfree(void* p);
void* mi__expand(void* p, size_t newsize);
void* mi_recalloc(void* p, size_t count, size_t size);
size_t mi_malloc_size(const void* p);
size_t mi_malloc_good_size(size_t size);
size_t mi_malloc_usable_size(const void *p);
int mi_posix_memalign(void** p, size_t alignment, size_t size);
int mi__posix_memalign(void** p, size_t alignment, size_t size);
void* mi_memalign(size_t alignment, size_t size);
void* mi_valloc(size_t size);
void* mi_pvalloc(size_t size);
void* mi_aligned_alloc(size_t alignment, size_t size);
unsigned short* mi_wcsdup(const unsigned short* s);
unsigned char* mi_mbsdup(const unsigned char* s);
int mi_dupenv_s(char** buf, size_t* size, const char* name);
int mi_wdupenv_s(unsigned short** buf, size_t* size, const unsigned short* name);
/// Corresponds to [reallocarray](https://www.freebsd.org/cgi/man.cgi?query=reallocarray&sektion=3&manpath=freebsd-release-ports)
/// in FreeBSD.
void* mi_reallocarray(void* p, size_t count, size_t size);
@@ -874,6 +998,9 @@ void* mi_reallocarray(void* p, size_t count, size_t size);
/// Corresponds to [reallocarr](https://man.netbsd.org/reallocarr.3) in NetBSD.
int mi_reallocarr(void* p, size_t count, size_t size);
void* mi_aligned_recalloc(void* p, size_t newcount, size_t size, size_t alignment);
void* mi_aligned_offset_recalloc(void* p, size_t newcount, size_t size, size_t alignment, size_t offset);
void mi_free_size(void* p, size_t size);
void mi_free_size_aligned(void* p, size_t size, size_t alignment);
void mi_free_aligned(void* p, size_t alignment);
@@ -998,7 +1125,7 @@ mimalloc uses only safe OS calls (`mmap` and `VirtualAlloc`) and can co-exist
with other allocators linked to the same program.
If you use `cmake`, you can simply use:
```
find_package(mimalloc 2.1 REQUIRED)
```
in your `CMakeLists.txt` to find a locally installed mimalloc. Then use either:
```
@@ -1071,38 +1198,63 @@ See \ref overrides for more info.
/*! \page environment Environment Options
You can set further options either programmatically (using [`mi_option_set`](https://microsoft.github.io/mimalloc/group__options.html)), or via environment variables:
- `MIMALLOC_SHOW_STATS=1`: show statistics when the program terminates.
- `MIMALLOC_VERBOSE=1`: show verbose messages.
- `MIMALLOC_SHOW_ERRORS=1`: show error and warning messages.
Advanced options:
- `MIMALLOC_ARENA_EAGER_COMMIT=2`: turns on eager commit for the large arenas (usually 1GiB) from which mimalloc
allocates segments and pages. Set this to 2 (default) to
only enable this on overcommit systems (e.g. Linux). Set this to 1 to enable explicitly on other systems
as well (like Windows or macOS) which may improve performance (as the whole arena is committed at once).
Note that eager commit only increases the commit but not the actual peak resident set
(rss) so it is generally ok to enable this.
- `MIMALLOC_PURGE_DELAY=N`: the delay in `N` milli-seconds (by default `10`) after which mimalloc will purge
OS pages that are not in use. This signals to the OS that the underlying physical memory can be reused which
can reduce memory fragmentation especially in long running (server) programs. Setting `N` to `0` purges immediately when
a page becomes unused which can improve memory usage but also decreases performance. Setting `N` to a higher
value like `100` can improve performance (sometimes by a lot) at the cost of potentially using more memory at times.
Setting it to `-1` disables purging completely.
- `MIMALLOC_PURGE_DECOMMITS=1`: By default "purging" memory means unused memory is decommitted (`MEM_DECOMMIT` on Windows,
`MADV_DONTNEED` (which decreases rss immediately) on `mmap` systems). Set this to 0 to instead "reset" unused
memory on a purge (`MEM_RESET` on Windows, generally `MADV_FREE` (which does not decrease rss immediately) on `mmap` systems).
Mimalloc generally does not "free" OS memory but only "purges" OS memory, in other words, it tries to keep virtual
address ranges and decommits within those ranges (to make the underlying physical memory available to other processes).
Further options for large workloads and services:
- `MIMALLOC_USE_NUMA_NODES=N`: pretend there are at most `N` NUMA nodes. If not set, the actual NUMA nodes are detected
at runtime. Setting `N` to 1 may avoid problems in some virtual environments. Also, setting it to a lower number than
the actual NUMA nodes is fine and will only cause threads to potentially allocate more memory across actual NUMA
nodes (but this can happen in any case as NUMA local allocation is always a best effort but not guaranteed).
- `MIMALLOC_ALLOW_LARGE_OS_PAGES=1`: use large OS pages (2 or 4MiB) when available; for some workloads this can significantly
improve performance. When this option is disabled, it also disables transparent huge pages (THP) for the process
(on Linux and Android). Use `MIMALLOC_VERBOSE` to check if the large OS pages are enabled -- usually one needs
to explicitly give permissions for large OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]). However, sometimes
the OS is very slow to reserve contiguous physical memory for large OS pages so use with care on systems that
can have fragmented memory (for that reason, we generally recommend to use `MIMALLOC_RESERVE_HUGE_OS_PAGES` instead whenever possible).
- `MIMALLOC_RESERVE_HUGE_OS_PAGES=N`: where `N` is the number of 1GiB _huge_ OS pages. This reserves the huge pages at
startup and sometimes this can give a large (latency) performance improvement on big workloads.
Usually it is better to not use `MIMALLOC_ALLOW_LARGE_OS_PAGES=1` in combination with this setting. Just like large
OS pages, use with care as reserving
contiguous physical memory can take a long time when memory is fragmented (but reserving the huge pages is done at
startup only once).
Note that we usually need to explicitly give permission for huge OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]).
With huge OS pages, it may be beneficial to set the setting
`MIMALLOC_EAGER_COMMIT_DELAY=N` (`N` is 1 by default) to delay the initial `N` segments (of 4MiB)
of a thread to not allocate in the huge OS pages; this prevents threads that are short lived
and allocate just a little to take up space in the huge OS page area (which cannot be purged as huge OS pages are pinned
to physical memory).
The huge pages are usually allocated evenly among NUMA nodes.
We can use `MIMALLOC_RESERVE_HUGE_OS_PAGES_AT=N` where `N` is the numa node (starting at 0) to allocate all
the huge pages at a specific numa node instead.
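The same reservation can also be done programmatically at startup with `mi_reserve_huge_os_pages_interleave` (a sketch; the page count and timeout are illustrative):

```
#include <mimalloc.h>
#include <stdio.h>

int main(void) {
  // Equivalent of MIMALLOC_RESERVE_HUGE_OS_PAGES=4: reserve 4 huge (1GiB)
  // OS pages, interleaved over the detected NUMA nodes (0 = use all),
  // waiting at most 10 seconds in total for the reservation.
  int err = mi_reserve_huge_os_pages_interleave(4, 0, 10000);
  if (err != 0) fprintf(stderr, "failed to reserve huge OS pages: %d\n", err);
  // ... the rest of the program allocates as usual ...
  return 0;
}
```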
Use caution when using `fork` in combination with either large or huge OS pages: on a fork, the OS uses copy-on-write
for all pages in the original process including the huge OS pages. When any memory is now written in that area, the
OS will copy the entire 1GiB huge page (or 2MiB large page) which can cause the memory usage to grow in large increments.
[linux-huge]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/tuning_and_optimizing_red_hat_enterprise_linux_for_oracle_9i_and_10g_databases/sect-oracle_9i_and_10g_tuning_guide-large_memory_optimization_big_pages_and_huge_pages-configuring_huge_pages_in_red_hat_enterprise_linux_4_or_5
[windows-huge]: https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/enable-the-lock-pages-in-memory-option-windows?view=sql-server-2017
@@ -1111,87 +1263,100 @@ OS will copy the entire 1GiB huge page (or 2MiB large page) which can cause the
/*! \page overrides Overriding Malloc
Overriding the standard `malloc` (and `new`) can be done either _dynamically_ or _statically_.
## Dynamic override
This is the recommended way to override the standard malloc interface.
### Dynamic Override on Linux, BSD
On these ELF-based systems we preload the mimalloc shared
library so all calls to the standard `malloc` interface are
resolved to the _mimalloc_ library.
```
> env LD_PRELOAD=/usr/lib/libmimalloc.so myprogram
```
You can set extra environment variables to check that mimalloc is running,
like:
```
> env MIMALLOC_VERBOSE=1 LD_PRELOAD=/usr/lib/libmimalloc.so myprogram
```
or run with the debug version to get detailed statistics:
```
> env MIMALLOC_SHOW_STATS=1 LD_PRELOAD=/usr/lib/libmimalloc-debug.so myprogram
```
### Dynamic Override on MacOS
On macOS we can also preload the mimalloc shared
library so all calls to the standard `malloc` interface are
resolved to the _mimalloc_ library.
```
> env DYLD_INSERT_LIBRARIES=/usr/lib/libmimalloc.dylib myprogram
```
Note that certain security restrictions may apply when doing this from
the [shell](https://stackoverflow.com/questions/43941322/dyld-insert-libraries-ignored-when-calling-application-through-bash).
(Note: macOS support for dynamic overriding is recent, please report any issues.)
### Dynamic Override on Windows
<span id="override_on_windows">Dynamically overriding with mimalloc on Windows</span>
is robust and has the particular advantage to be able to redirect all malloc/free calls that go through
the (dynamic) C runtime allocator, including those from other DLL's or libraries.
As it intercepts all allocation calls on a low level, it can be used reliably
on large programs that include other 3rd party components.
There are four requirements to make the overriding work robustly:
1. Use the C-runtime library as a DLL (using the `/MD` or `/MDd` switch).
2. Link your program explicitly with `mimalloc-override.dll` library.
To ensure the `mimalloc-override.dll` is loaded at run-time it is easiest to insert some
call to the mimalloc API in the `main` function, like `mi_version()`
(or use the `/INCLUDE:mi_version` switch on the linker). See the `mimalloc-override-test` project
for an example on how to use this.
3. The [`mimalloc-redirect.dll`](bin) (or `mimalloc-redirect32.dll`) must be put
in the same folder as the main `mimalloc-override.dll` at runtime (as it is a dependency of that DLL).
The redirection DLL ensures that all calls to the C runtime malloc API get redirected to
mimalloc functions (which reside in `mimalloc-override.dll`).
4. Ensure the `mimalloc-override.dll` comes as early as possible in the import
list of the final executable (so it can intercept all potential allocations).
For best performance on Windows with C++, it
is also recommended to override the `new`/`delete` operations (by including
[`mimalloc-new-delete.h`](include/mimalloc-new-delete.h)
in a single(!) source file in your project).
The environment variable `MIMALLOC_DISABLE_REDIRECT=1` can be used to disable dynamic
overriding at run-time. Use `MIMALLOC_VERBOSE=1` to check if mimalloc was successfully redirected.
We cannot always re-link an executable with `mimalloc-override.dll`, and similarly, we cannot always
ensure that the DLL comes first in the import table of the final executable.
In many cases though we can patch existing executables without any recompilation
if they are linked with the dynamic C runtime (`ucrtbase.dll`) -- just put the `mimalloc-override.dll`
into the import table (and put `mimalloc-redirect.dll` in the same folder).
Such patching can be done for example with [CFF Explorer](https://ntcore.com/?page_id=388) or
the [`minject`](bin) program.
## Static override
On Unix-like systems, you can also statically link with _mimalloc_ to override the standard
malloc interface. The recommended way is to link the final program with the
_mimalloc_ single object file (`mimalloc.o`). We use
an object file instead of a library file as linkers give preference to
that over archives to resolve symbols. To ensure that the standard
malloc interface resolves to the _mimalloc_ library, link it as the first
object file. For example:
```
> gcc -o myprogram mimalloc.o myfile1.c ...
```
Another way to override statically that works on all platforms, is to
link statically to mimalloc (as shown in the introduction) and include a
header file in each source file that re-defines `malloc` etc. to `mi_malloc`.
This is provided by [`mimalloc-override.h`](https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-override.h). This only works reliably though if all sources are
under your control or otherwise mixing of pointers from different heaps may occur!
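For example, a minimal sketch of a source file using this header (assuming mimalloc's include directory is on the include path):

```
#include <stdio.h>
#include <stdlib.h>
#include <mimalloc-override.h>  // from here on, malloc/free/realloc map to mi_malloc etc.

int main(void) {
  char* buf = malloc(64);  // actually calls mi_malloc
  snprintf(buf, 64, "hello");
  puts(buf);
  free(buf);               // actually calls mi_free
  return 0;
}
```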
## List of Overrides:


@@ -47,3 +47,14 @@ div.fragment {
#nav-sync img {
display: none;
}
h1,h2,h3,h4,h5,h6 {
transition:none;
}
.memtitle {
background-image: none;
background-color: #EEE;
}
table.memproto, .memproto {
text-shadow: none;
font-size: 110%;
}