merge from dev

This commit is contained in:
daanx 2024-06-17 16:18:03 -07:00
commit 3726cf94ba
106 changed files with 6623 additions and 5540 deletions


@@ -25,12 +25,15 @@ without code changes, for example, on Unix you can use it as:
```
Notable aspects of the design include:
- __small and consistent__: the library is about 8k LOC using simple and
consistent data structures. This makes it very suitable
to integrate and adapt in other projects. For runtime systems it
provides hooks for a monotonic _heartbeat_ and deferred freeing (for
bounded worst-case times with reference counting).
Partly due to its simplicity, mimalloc has been ported to many systems (Windows, macOS,
Linux, WASM, various BSD's, Haiku, MUSL, etc) and has excellent support for dynamic overriding.
At the same time, it is an industrial strength allocator that runs (very) large scale
distributed services on thousands of machines with excellent worst case latencies.
- __free list sharding__: instead of one big free list (per size class) we have
many smaller lists per "mimalloc page" which reduces fragmentation and
increases locality --
@@ -45,23 +48,23 @@ Notable aspects of the design include:
and the chance of contending on a single location will be low -- this is quite
similar to randomized algorithms like skip lists where adding
a random oracle removes the need for a more complex algorithm.
- __eager page purging__: when a "page" becomes empty (with increased chance
due to free list sharding) the memory is marked to the OS as unused (reset or decommitted)
reducing (real) memory pressure and fragmentation, especially in long running
programs.
- __secure__: _mimalloc_ can be built in secure mode, adding guard pages,
randomized allocation, encrypted free lists, etc. to protect against various
heap vulnerabilities. The performance penalty is usually around 10% on average
over our benchmarks.
- __first-class heaps__: efficiently create and use multiple heaps to allocate across different regions.
A heap can be destroyed at once instead of deallocating each object separately.
- __bounded__: it does not suffer from _blowup_ \[1\], has bounded worst-case allocation
times (_wcat_) (up to OS primitives), bounded space overhead (~0.2% meta-data, with low
internal fragmentation), and has no internal points of contention using only atomic operations.
- __fast__: In our benchmarks (see [below](#bench)),
_mimalloc_ outperforms other leading allocators (_jemalloc_, _tcmalloc_, _Hoard_, etc),
and often uses less memory. A nice property is that it does consistently well over a wide range
of benchmarks. There is also good huge OS page support for larger server programs.
You can read more on the design of _mimalloc_ in the
[technical report](https://www.microsoft.com/en-us/research/publication/mimalloc-free-list-sharding-in-action)
@@ -278,8 +281,7 @@ void* mi_zalloc_small(size_t size);
/// The returned size can be
/// used to call \a mi_expand successfully.
/// The returned size is always at least equal to the
/// allocated size of \a p.
///
/// @see [_msize](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/msize?view=vs-2017) (Windows)
/// @see [malloc_usable_size](http://man7.org/linux/man-pages/man3/malloc_usable_size.3.html) (Linux)
@@ -304,7 +306,7 @@ size_t mi_good_size(size_t size);
/// in very narrow circumstances; in particular, when a long running thread
/// allocates a lot of blocks that are freed by other threads it may improve
/// resource usage by calling this every once in a while.
void mi_collect(bool force);
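For example, a minimal sketch (the worker loop and collection interval are illustrative, not part of the API):

```
#include <mimalloc.h>

// Illustrative long-running loop: this thread allocates many blocks that
// are freed by other threads, so reclaim deferred frees once in a while.
void worker_loop(void) {
  for (long i = 0; ; i++) {
    // ... process work items, allocating and freeing blocks ...
    if (i % 100000 == 0) {
      mi_collect(false);  // not forced: only do inexpensive, incremental collection
    }
  }
}
```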
/// Deprecated
/// @param out Ignored, outputs to the registered output function or stderr by default.
@@ -428,7 +430,7 @@ int mi_reserve_os_memory(size_t size, bool commit, bool allow_large);
/// allocated in some manner and available for use by mimalloc.
/// @param start Start of the memory area
/// @param size The size of the memory area.
/// @param is_committed Is the area already committed?
/// @param is_large Does it consist of large OS pages? Set this to \a true as well for memory
/// that should not be decommitted or protected (like rdma etc.)
/// @param is_zero Does the area consist of zeros?
@@ -453,7 +455,7 @@ int mi_reserve_huge_os_pages_interleave(size_t pages, size_t numa_nodes, size_t
/// Reserve \a pages of huge OS pages (1GiB) at a specific \a numa_node,
/// but stops after at most `timeout_msecs` seconds.
/// @param pages The number of 1GiB pages to reserve.
/// @param numa_node The NUMA node where the memory is reserved (start at 0). Use -1 for no affinity.
/// @param timeout_msecs Maximum number of milli-seconds to try reserving, or 0 for no timeout.
/// @returns 0 if successful, \a ENOMEM if running out of memory, or \a ETIMEDOUT if timed out.
///
@@ -486,6 +488,91 @@ bool mi_is_redirected();
/// on other systems as the amount of read/write accessible memory reserved by mimalloc.
void mi_process_info(size_t* elapsed_msecs, size_t* user_msecs, size_t* system_msecs, size_t* current_rss, size_t* peak_rss, size_t* current_commit, size_t* peak_commit, size_t* page_faults);
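For instance, a small sketch that prints a few of these statistics:

```
#include <mimalloc.h>
#include <stdio.h>

// Sketch: print elapsed time, peak resident set size, and page faults.
void print_process_info(void) {
  size_t elapsed, user, sys, rss, peak_rss, commit, peak_commit, faults;
  mi_process_info(&elapsed, &user, &sys, &rss, &peak_rss,
                  &commit, &peak_commit, &faults);
  printf("elapsed: %zu ms, peak rss: %zu, page faults: %zu\n",
         elapsed, peak_rss, faults);
}
```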
/// @brief Show all current arenas.
/// @param show_inuse Show the arena blocks that are in use.
/// @param show_abandoned Show the abandoned arena blocks.
/// @param show_purge Show arena blocks scheduled for purging.
void mi_debug_show_arenas(bool show_inuse, bool show_abandoned, bool show_purge);
/// Mimalloc uses large (virtual) memory areas, called "arena"s, from the OS to manage its memory.
/// Each arena has an associated identifier.
typedef int mi_arena_id_t;
/// @brief Return the size of an arena.
/// @param arena_id The arena identifier.
/// @param size Returned size in bytes of the (virtual) arena area.
/// @return base address of the arena.
void* mi_arena_area(mi_arena_id_t arena_id, size_t* size);
/// @brief Reserve huge OS pages (1GiB) into a single arena.
/// @param pages Number of 1GiB pages to reserve.
/// @param numa_node The associated NUMA node, or -1 for no NUMA preference.
/// @param timeout_msecs Max amount of milli-seconds this operation is allowed to take. (0 is infinite)
/// @param exclusive If exclusive, only a heap associated with this arena can allocate in it.
/// @param arena_id The arena identifier.
/// @return 0 if successful, \a ENOMEM if running out of memory, or \a ETIMEDOUT if timed out.
int mi_reserve_huge_os_pages_at_ex(size_t pages, int numa_node, size_t timeout_msecs, bool exclusive, mi_arena_id_t* arena_id);
/// @brief Reserve OS memory to be managed in an arena.
/// @param size Size to reserve.
/// @param commit Should the memory be initially committed?
/// @param allow_large Allow the use of large OS pages?
/// @param exclusive Is the returned arena exclusive?
/// @param arena_id The new arena identifier.
/// @return Zero on success, an error code otherwise.
int mi_reserve_os_memory_ex(size_t size, bool commit, bool allow_large, bool exclusive, mi_arena_id_t* arena_id);
/// @brief Manage externally allocated memory as a mimalloc arena. This memory will not be freed by mimalloc.
/// @param start Start address of the area.
/// @param size Size in bytes of the area.
/// @param is_committed Is the memory already committed?
/// @param is_large Does it consist of (pinned) large OS pages?
/// @param is_zero Is the memory zero-initialized?
/// @param numa_node Associated NUMA node, or -1 to have no NUMA preference.
/// @param exclusive Is the arena exclusive (where only heaps associated with the arena can allocate in it)?
/// @param arena_id The new arena identifier.
/// @return `true` if successful.
bool mi_manage_os_memory_ex(void* start, size_t size, bool is_committed, bool is_large, bool is_zero, int numa_node, bool exclusive, mi_arena_id_t* arena_id);
/// @brief Create a new heap that only allocates in the specified arena.
/// @param arena_id The arena identifier.
/// @return The new heap or `NULL`.
mi_heap_t* mi_heap_new_in_arena(mi_arena_id_t arena_id);
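A minimal sketch of how these calls compose (error handling elided; the 64 MiB size is illustrative):

```
#include <mimalloc.h>

// Sketch: reserve an exclusive 64 MiB arena and allocate from it via a heap.
void arena_heap_example(void) {
  mi_arena_id_t arena_id;
  if (mi_reserve_os_memory_ex(64 * 1024 * 1024, true /*commit*/,
                              false /*allow_large*/, true /*exclusive*/,
                              &arena_id) != 0) return;
  mi_heap_t* heap = mi_heap_new_in_arena(arena_id);
  if (heap == NULL) return;
  void* p = mi_heap_malloc(heap, 1024);   // allocated inside the arena
  mi_free(p);
  mi_heap_delete(heap);                   // the arena memory itself stays reserved
}
```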
/// @brief Create a new heap
/// @param heap_tag The heap tag associated with this heap; heaps only reclaim memory between heaps with the same tag.
/// @param allow_destroy Is \a mi_heap_destroy allowed? Not allowing this allows the heap to reclaim memory from terminated threads.
/// @param arena_id If not 0, the heap will only allocate from the specified arena.
/// @return A new heap or `NULL` on failure.
///
/// The \a arena_id can be used by runtimes to allocate only in a specified pre-reserved arena.
/// This is used for example for a compressed pointer heap in Koka.
/// The \a heap_tag enables heaps to keep objects of a certain type isolated to heaps with that tag.
/// This is used for example in the CPython integration.
mi_heap_t* mi_heap_new_ex(int heap_tag, bool allow_destroy, mi_arena_id_t arena_id);
/// A process can associate threads with sub-processes.
/// A sub-process will not reclaim memory from abandoned heaps/threads
/// in other sub-processes.
typedef void* mi_subproc_id_t;
/// @brief Get the main sub-process identifier.
mi_subproc_id_t mi_subproc_main(void);
/// @brief Create a fresh sub-process (with no associated threads yet).
/// @return The new sub-process identifier.
mi_subproc_id_t mi_subproc_new(void);
/// @brief Delete a previously created sub-process.
/// @param subproc The sub-process identifier.
/// Only delete sub-processes if all associated threads have terminated.
void mi_subproc_delete(mi_subproc_id_t subproc);
/// Add the current thread to the given sub-process.
/// This should be called right after a thread is created (and no allocation has taken place yet).
void mi_subproc_add_current_thread(mi_subproc_id_t subproc);
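A hypothetical sketch using pthreads (the worker function and sizes are illustrative):

```
#include <mimalloc.h>
#include <pthread.h>

static mi_subproc_id_t g_subproc;

// Register with the sub-process first, before any allocation in this thread.
static void* worker(void* arg) {
  (void)arg;
  mi_subproc_add_current_thread(g_subproc);
  void* p = mi_malloc(256);  // owned by the worker's sub-process
  mi_free(p);
  return NULL;
}

// Sketch: isolate worker threads in their own sub-process so their abandoned
// memory is not reclaimed by threads outside this group.
void subproc_example(void) {
  g_subproc = mi_subproc_new();
  pthread_t t;
  pthread_create(&t, NULL, &worker, NULL);
  pthread_join(t, NULL);
  mi_subproc_delete(g_subproc);  // safe: the associated thread has terminated
}
```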
/// \}
// ------------------------------------------------------
@@ -495,20 +582,24 @@ void mi_process_info(size_t* elapsed_msecs, size_t* user_msecs, size_t* system_m
/// \defgroup aligned Aligned Allocation
///
/// Allocating aligned memory blocks.
///
/// \{
/// The maximum supported alignment size (currently 1MiB).
#define MI_BLOCK_ALIGNMENT_MAX (1024*1024UL)
/// Allocate \a size bytes aligned by \a alignment.
/// @param size number of bytes to allocate.
/// @param alignment the minimal alignment of the allocated memory.
/// @returns pointer to the allocated memory or \a NULL if out of memory,
/// or if the alignment is not a power of 2 (including 0). The \a size is unrestricted
/// (and does not have to be an integral multiple of the \a alignment).
/// The returned pointer is aligned by \a alignment, i.e. `(uintptr_t)p % alignment == 0`.
/// Returns a unique pointer if called with \a size 0.
///
/// Note that `alignment` always follows `size` for consistency with the unaligned
/// allocation API, but unfortunately this differs from `posix_memalign` and `aligned_alloc` in the C library.
///
/// @see [aligned_alloc](https://en.cppreference.com/w/c/memory/aligned_alloc) (in the standard C11 library, with switched arguments!)
/// @see [_aligned_malloc](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-malloc?view=vs-2017) (on Windows)
/// @see [aligned_alloc](http://man.openbsd.org/aligned_alloc) (on BSD, with switched arguments!)
/// @see [posix_memalign](https://linux.die.net/man/3/posix_memalign) (on Posix, with switched arguments!)
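For example (a minimal sketch; the size and alignment are illustrative):

```
#include <mimalloc.h>
#include <assert.h>
#include <stdint.h>

// Sketch: allocate 1000 bytes on a 64-byte (cache line) boundary.
// Note the argument order: size first, then alignment.
void aligned_example(void) {
  void* p = mi_malloc_aligned(1000, 64);
  assert(p != NULL && ((uintptr_t)p % 64) == 0);
  mi_free(p);
}
```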
@@ -522,11 +613,12 @@ void* mi_realloc_aligned(void* p, size_t newsize, size_t alignment);
/// @param size number of bytes to allocate.
/// @param alignment the minimal alignment of the allocated memory at \a offset.
/// @param offset the offset that should be aligned.
/// @returns pointer to the allocated memory or \a NULL if out of memory,
/// or if the alignment is not a power of 2 (including 0). The \a size is unrestricted
/// (and does not have to be an integral multiple of the \a alignment).
/// The returned pointer is aligned by \a alignment at \a offset, i.e. `((uintptr_t)p + offset) % alignment == 0`.
/// Returns a unique pointer if called with \a size 0.
///
/// @see [_aligned_offset_malloc](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-offset-malloc?view=vs-2017) (on Windows)
void* mi_malloc_aligned_at(size_t size, size_t alignment, size_t offset);
void* mi_zalloc_aligned_at(size_t size, size_t alignment, size_t offset);
@@ -574,12 +666,12 @@ void mi_heap_delete(mi_heap_t* heap);
/// heap is set to the backing heap.
void mi_heap_destroy(mi_heap_t* heap);
/// Set the default heap to use in the current thread for mi_malloc() et al.
/// @param heap The new default heap.
/// @returns The previous default heap.
mi_heap_t* mi_heap_set_default(mi_heap_t* heap);
/// Get the default heap that is used for mi_malloc() et al. (for the current thread).
/// @returns The current default heap.
mi_heap_t* mi_heap_get_default();
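A minimal sketch that routes a phase of allocations through a temporary heap (names and sizes are illustrative):

```
#include <mimalloc.h>

// Sketch: allocate a phase of objects in a scratch heap, then free them all
// at once by destroying the heap.
void phase_example(void) {
  mi_heap_t* scratch = mi_heap_new();
  mi_heap_t* prev = mi_heap_set_default(scratch);  // this thread's mi_malloc now uses scratch
  void* a = mi_malloc(100);
  void* b = mi_malloc(200);
  (void)a; (void)b;                                // ... use the phase data ...
  mi_heap_set_default(prev);                       // restore the previous default heap
  mi_heap_destroy(scratch);                        // frees a and b in one operation
}
```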
@@ -764,6 +856,8 @@ typedef struct mi_heap_area_s {
size_t committed; ///< current committed bytes of this area
size_t used; ///< bytes in use by allocated blocks
size_t block_size; ///< size in bytes of one block
size_t full_block_size; ///< size in bytes of a full block including padding and metadata.
int heap_tag; ///< heap tag associated with this area (see \a mi_heap_new_ex)
} mi_heap_area_t;
/// Visitor function passed to mi_heap_visit_blocks()
@@ -788,6 +882,23 @@ typedef bool (mi_block_visit_fun)(const mi_heap_t* heap, const mi_heap_area_t* a
/// @returns \a true if all areas and blocks were visited.
bool mi_heap_visit_blocks(const mi_heap_t* heap, bool visit_all_blocks, mi_block_visit_fun* visitor, void* arg);
/// @brief Visit all areas and blocks in abandoned heaps.
/// @param subproc_id The sub-process id associated with the abandoned heaps.
/// @param heap_tag Visit only abandoned memory with the specified heap tag, use -1 to visit all abandoned memory.
/// @param visit_blocks If \a true visits all allocated blocks, otherwise
/// \a visitor is only called for every heap area.
/// @param visitor This function is called for every area in the heap
/// (with \a block as \a NULL). If \a visit_all_blocks is
/// \a true, \a visitor is also called for every allocated
/// block in every area (with `block!=NULL`).
/// Return \a false from this function to stop visiting early.
/// @param arg extra argument passed to the \a visitor.
/// @return \a true if all areas and blocks were visited.
///
/// Note: requires the option `mi_option_visit_abandoned` to be set
/// at the start of the program.
bool mi_abandoned_visit_blocks(mi_subproc_id_t subproc_id, int heap_tag, bool visit_blocks, mi_block_visit_fun* visitor, void* arg);
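For example, a minimal sketch of a visitor that sums the allocated bytes of a heap:

```
#include <mimalloc.h>
#include <stdio.h>

// Visitor: called once per area (block == NULL), and once per allocated
// block when visit_all_blocks is true.
static bool count_blocks(const mi_heap_t* heap, const mi_heap_area_t* area,
                         void* block, size_t block_size, void* arg) {
  (void)heap; (void)area;
  if (block != NULL) *((size_t*)arg) += block_size;
  return true;  // continue visiting
}

void show_heap_usage(mi_heap_t* heap) {
  size_t total = 0;
  mi_heap_visit_blocks(heap, true /*visit_all_blocks*/, &count_blocks, &total);
  printf("heap uses %zu bytes in allocated blocks\n", total);
}
```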
/// \}
/// \defgroup options Runtime Options
@@ -799,34 +910,38 @@ bool mi_heap_visit_blocks(const mi_heap_t* heap, bool visit_all_blocks, mi_block
/// Runtime options.
typedef enum mi_option_e {
// stable options
mi_option_show_errors, ///< Print error messages.
mi_option_show_stats, ///< Print statistics on termination.
mi_option_verbose, ///< Print verbose messages.
mi_option_max_errors, ///< issue at most N error messages
mi_option_max_warnings, ///< issue at most N warning messages
// advanced options
mi_option_reserve_huge_os_pages, ///< reserve N huge OS pages (1GiB pages) at startup
mi_option_reserve_huge_os_pages_at, ///< Reserve N huge OS pages at a specific NUMA node N.
mi_option_reserve_os_memory, ///< reserve specified amount of OS memory in an arena at startup (internally, this value is in KiB; use `mi_option_get_size`)
mi_option_allow_large_os_pages, ///< allow large (2 or 4 MiB) OS pages, implies eager commit. If false, also disables THP for the process.
mi_option_purge_decommits, ///< should a memory purge decommit? (=1). Set to 0 to use memory reset on a purge (instead of decommit)
mi_option_arena_reserve, ///< initial memory size for arena reservation (= 1 GiB on 64-bit) (internally, this value is in KiB; use `mi_option_get_size`)
mi_option_os_tag, ///< tag used for OS logging (macOS only for now) (=100)
mi_option_retry_on_oom, ///< retry on out-of-memory for N milli-seconds (=400), set to 0 to disable retries. (only on Windows)
// experimental options
mi_option_eager_commit, ///< eager commit segments? (after `eager_commit_delay` segments) (enabled by default).
mi_option_eager_commit_delay, ///< the first N segments per thread are not eagerly committed (but per page in the segment on demand)
mi_option_arena_eager_commit, ///< eager commit arenas? Use 2 to enable just on overcommit systems (=2)
mi_option_abandoned_page_purge, ///< immediately purge delayed purges on thread termination
mi_option_purge_delay, ///< memory purging is delayed by N milli-seconds; use 0 for immediate purging or -1 for no purging at all. (=10)
mi_option_use_numa_nodes, ///< 0 = use all available numa nodes, otherwise use at most N nodes.
mi_option_disallow_os_alloc, ///< 1 = do not use OS memory for allocation (but only programmatically reserved arenas)
mi_option_max_segment_reclaim, ///< max. percentage of the abandoned segments can be reclaimed per try (=10%)
mi_option_destroy_on_exit, ///< if set, release all memory on exit; sometimes used for dynamic unloading but can be unsafe
mi_option_arena_purge_mult, ///< multiplier for `purge_delay` for the purging delay for arenas (=10)
mi_option_abandoned_reclaim_on_free, ///< allow to reclaim an abandoned segment on a free (=1)
mi_option_purge_extend_delay, ///< extend purge delay on each subsequent delay (=1)
mi_option_disallow_arena_alloc, ///< 1 = do not use arenas for allocation (except when using specific arena ids)
mi_option_visit_abandoned, ///< allow visiting heap blocks from abandoned threads (=0)
_mi_option_last
} mi_option_t;
@@ -838,7 +953,10 @@ void mi_option_disable(mi_option_t option);
void mi_option_set_enabled(mi_option_t option, bool enable);
void mi_option_set_enabled_default(mi_option_t option, bool enable);
long mi_option_get(mi_option_t option);
long mi_option_get_clamp(mi_option_t option, long min, long max);
size_t mi_option_get_size(mi_option_t option);
void mi_option_set(mi_option_t option, long value);
void mi_option_set_default(mi_option_t option, long value);
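For example, a sketch of tuning options programmatically; this is best done at startup, before the first allocation, since many options are latched when the allocator initializes:

```
#include <mimalloc.h>

// Sketch: adjust a few options at program startup.
void configure_mimalloc(void) {
  mi_option_set(mi_option_purge_delay, 100);   // purge unused pages after 100 ms
  mi_option_enable(mi_option_show_stats);      // print statistics on exit
  size_t arena_reserve = mi_option_get_size(mi_option_arena_reserve);  // size-valued option
  (void)arena_reserve;
}
```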
@@ -852,21 +970,27 @@ void mi_option_set_default(mi_option_t option, long value);
///
/// \{
/// Just as `free` but also checks if the pointer `p` belongs to our heap.
void mi_cfree(void* p);
void* mi__expand(void* p, size_t newsize);
void* mi_recalloc(void* p, size_t count, size_t size);
size_t mi_malloc_size(const void* p);
size_t mi_malloc_good_size(size_t size);
size_t mi_malloc_usable_size(const void *p);
int mi_posix_memalign(void** p, size_t alignment, size_t size);
int mi__posix_memalign(void** p, size_t alignment, size_t size);
void* mi_memalign(size_t alignment, size_t size);
void* mi_valloc(size_t size);
void* mi_pvalloc(size_t size);
void* mi_aligned_alloc(size_t alignment, size_t size);
unsigned short* mi_wcsdup(const unsigned short* s);
unsigned char* mi_mbsdup(const unsigned char* s);
int mi_dupenv_s(char** buf, size_t* size, const char* name);
int mi_wdupenv_s(unsigned short** buf, size_t* size, const unsigned short* name);
/// Corresponds to [reallocarray](https://www.freebsd.org/cgi/man.cgi?query=reallocarray&sektion=3&manpath=freebsd-release-ports)
/// in FreeBSD.
void* mi_reallocarray(void* p, size_t count, size_t size);
@@ -874,6 +998,9 @@ void* mi_reallocarray(void* p, size_t count, size_t size);
/// Corresponds to [reallocarr](https://man.netbsd.org/reallocarr.3) in NetBSD.
int mi_reallocarr(void* p, size_t count, size_t size);
void* mi_aligned_recalloc(void* p, size_t newcount, size_t size, size_t alignment);
void* mi_aligned_offset_recalloc(void* p, size_t newcount, size_t size, size_t alignment, size_t offset);
void mi_free_size(void* p, size_t size);
void mi_free_size_aligned(void* p, size_t size, size_t alignment);
void mi_free_aligned(void* p, size_t alignment);
@@ -998,7 +1125,7 @@ mimalloc uses only safe OS calls (`mmap` and `VirtualAlloc`) and can co-exist
with other allocators linked to the same program.
If you use `cmake`, you can simply use:
```
find_package(mimalloc 2.1 REQUIRED)
```
in your `CMakeLists.txt` to find a locally installed mimalloc. Then use either:
```
@@ -1071,38 +1198,63 @@ See \ref overrides for more info.
/*! \page environment Environment Options
You can set further options either programmatically (using [`mi_option_set`](https://microsoft.github.io/mimalloc/group__options.html)), or via environment variables:
- `MIMALLOC_SHOW_STATS=1`: show statistics when the program terminates.
- `MIMALLOC_VERBOSE=1`: show verbose messages.
- `MIMALLOC_SHOW_ERRORS=1`: show error and warning messages.
Advanced options:
- `MIMALLOC_ARENA_EAGER_COMMIT=2`: turns on eager commit for the large arenas (usually 1GiB) from which mimalloc
allocates segments and pages. Set this to 2 (default) to
only enable this on overcommit systems (e.g. Linux). Set this to 1 to enable explicitly on other systems
as well (like Windows or macOS) which may improve performance (as the whole arena is committed at once).
Note that eager commit only increases the commit but not the actual peak resident set
(rss) so it is generally ok to enable this.
- `MIMALLOC_PURGE_DELAY=N`: the delay in `N` milli-seconds (by default `10`) after which mimalloc will purge
OS pages that are not in use. This signals to the OS that the underlying physical memory can be reused which
can reduce memory fragmentation especially in long running (server) programs. Setting `N` to `0` purges immediately when
a page becomes unused which can improve memory usage but also decreases performance. Setting `N` to a higher
value like `100` can improve performance (sometimes by a lot) at the cost of potentially using more memory at times.
Setting it to `-1` disables purging completely.
- `MIMALLOC_PURGE_DECOMMITS=1`: By default "purging" memory means unused memory is decommitted (`MEM_DECOMMIT` on Windows,
`MADV_DONTNEED` (which decreases rss immediately) on `mmap` systems). Set this to 0 to instead "reset" unused
memory on a purge (`MEM_RESET` on Windows, generally `MADV_FREE` (which does not decrease rss immediately) on `mmap` systems).
Mimalloc generally does not "free" OS memory but only "purges" OS memory, in other words, it tries to keep virtual
address ranges and decommits within those ranges (to make the underlying physical memory available to other processes).
Further options for large workloads and services:
- `MIMALLOC_USE_NUMA_NODES=N`: pretend there are at most `N` NUMA nodes. If not set, the actual NUMA nodes are detected
at runtime. Setting `N` to 1 may avoid problems in some virtual environments. Also, setting it to a lower number than
the actual NUMA nodes is fine and will only cause threads to potentially allocate more memory across actual NUMA
nodes (but this can happen in any case as NUMA local allocation is always a best effort but not guaranteed).
- `MIMALLOC_ALLOW_LARGE_OS_PAGES=1`: use large OS pages (2 or 4MiB) when available; for some workloads this can significantly
improve performance. When this option is disabled, it also disables transparent huge pages (THP) for the process
(on Linux and Android). Use `MIMALLOC_VERBOSE` to check if the large OS pages are enabled -- usually one needs
to explicitly give permissions for large OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]). However, sometimes
the OS is very slow to reserve contiguous physical memory for large OS pages so use with care on systems that
can have fragmented memory (for that reason, we generally recommend to use `MIMALLOC_RESERVE_HUGE_OS_PAGES` instead whenever possible).
- `MIMALLOC_RESERVE_HUGE_OS_PAGES=N`: where `N` is the number of 1GiB _huge_ OS pages. This reserves the huge pages at
startup and sometimes this can give a large (latency) performance improvement on big workloads.
Usually it is better to not use `MIMALLOC_ALLOW_LARGE_OS_PAGES=1` in combination with this setting. Just like large
OS pages, use with care as reserving
contiguous physical memory can take a long time when memory is fragmented (but reserving the huge pages is done at
startup only once).
Note that we usually need to explicitly give permission for huge OS pages (as on [Windows][windows-huge] and [Linux][linux-huge]).
With huge OS pages, it may be beneficial to set the setting
`MIMALLOC_EAGER_COMMIT_DELAY=N` (`N` is 1 by default) to delay the initial `N` segments (of 4MiB)
of a thread to not allocate in the huge OS pages; this prevents threads that are short lived
and allocate just a little to take up space in the huge OS page area (which cannot be purged as huge OS pages are pinned
to physical memory).
The huge pages are usually allocated evenly among NUMA nodes.
We can use `MIMALLOC_RESERVE_HUGE_OS_PAGES_AT=N` where `N` is the numa node (starting at 0) to allocate all
the huge pages at a specific numa node instead.
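The same reservation can also be done programmatically at startup with `mi_reserve_huge_os_pages_interleave` (a sketch; the page count and timeout are illustrative):

```
#include <mimalloc.h>
#include <stdio.h>

int main(void) {
  // Equivalent of MIMALLOC_RESERVE_HUGE_OS_PAGES=4: reserve 4 huge (1GiB)
  // OS pages, interleaved over the detected NUMA nodes (0 = use all),
  // waiting at most 10 seconds in total for the reservation.
  int err = mi_reserve_huge_os_pages_interleave(4, 0, 10000);
  if (err != 0) fprintf(stderr, "failed to reserve huge OS pages: %d\n", err);
  // ... the rest of the program allocates as usual ...
  return 0;
}
```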
Use caution when using `fork` in combination with either large or huge OS pages: on a fork, the OS uses copy-on-write
for all pages in the original process including the huge OS pages. When any memory is now written in that area, the
OS will copy the entire 1GiB huge page (or 2MiB large page) which can cause the memory usage to grow in large increments.
[linux-huge]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/tuning_and_optimizing_red_hat_enterprise_linux_for_oracle_9i_and_10g_databases/sect-oracle_9i_and_10g_tuning_guide-large_memory_optimization_big_pages_and_huge_pages-configuring_huge_pages_in_red_hat_enterprise_linux_4_or_5
[windows-huge]: https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/enable-the-lock-pages-in-memory-option-windows?view=sql-server-2017
@@ -1111,87 +1263,100 @@ OS will copy the entire 1GiB huge page (or 2MiB large page) which can cause the
/*! \page overrides Overriding Malloc
Overriding the standard `malloc` (and `new`) can be done either _dynamically_ or _statically_.
## Dynamic override
This is the recommended way to override the standard malloc interface.
### Dynamic Override on Linux, BSD
On these ELF-based systems we preload the mimalloc shared
library so all calls to the standard `malloc` interface are
resolved to the _mimalloc_ library.
```
> env LD_PRELOAD=/usr/lib/libmimalloc.so myprogram
```
You can set extra environment variables to check that mimalloc is running,
like:
```
> env MIMALLOC_VERBOSE=1 LD_PRELOAD=/usr/lib/libmimalloc.so myprogram
```
or run with the debug version to get detailed statistics:
```
> env MIMALLOC_SHOW_STATS=1 LD_PRELOAD=/usr/lib/libmimalloc-debug.so myprogram
```
### Dynamic Override on MacOS
On macOS we can also preload the mimalloc shared
library so all calls to the standard `malloc` interface are
resolved to the _mimalloc_ library.
```
> env DYLD_INSERT_LIBRARIES=/usr/lib/libmimalloc.dylib myprogram
```
Note that certain security restrictions may apply when doing this from
the [shell](https://stackoverflow.com/questions/43941322/dyld-insert-libraries-ignored-when-calling-application-through-bash).
(Note: macOS support for dynamic overriding is recent, please report any issues.)
### Dynamic Override on Windows
<span id="override_on_windows">Dynamically overriding with mimalloc on Windows</span>
is robust and has the particular advantage to be able to redirect all malloc/free calls that go through
the (dynamic) C runtime allocator, including those from other DLL's or libraries.
As it intercepts all allocation calls on a low level, it can be used reliably
on large programs that include other 3rd party components.
There are four requirements to make the overriding work robustly:
1. Use the C-runtime library as a DLL (using the `/MD` or `/MDd` switch).
2. Link your program explicitly with `mimalloc-override.dll` library.
To ensure the `mimalloc-override.dll` is loaded at run-time it is easiest to insert some
call to the mimalloc API in the `main` function, like `mi_version()`
(or use the `/INCLUDE:mi_version` switch on the linker). See the `mimalloc-override-test` project
for an example on how to use this.
3. The [`mimalloc-redirect.dll`](bin) (or `mimalloc-redirect32.dll`) must be put
in the same folder as the main `mimalloc-override.dll` at runtime (as it is a dependency of that DLL).
The redirection DLL ensures that all calls to the C runtime malloc API get redirected to
mimalloc functions (which reside in `mimalloc-override.dll`).
4. Ensure the `mimalloc-override.dll` comes as early as possible in the import
list of the final executable (so it can intercept all potential allocations).
For best performance on Windows with C++, it
is also recommended to override the `new`/`delete` operations (by including
[`mimalloc-new-delete.h`](include/mimalloc-new-delete.h)
in a single(!) source file in your project).
The environment variable `MIMALLOC_DISABLE_REDIRECT=1` can be used to disable dynamic
overriding at run-time. Use `MIMALLOC_VERBOSE=1` to check if mimalloc was successfully redirected.
We cannot always re-link an executable with `mimalloc-override.dll`, and similarly, we cannot always
ensure that the DLL comes first in the import table of the final executable.
In many cases though we can patch existing executables without any recompilation
if they are linked with the dynamic C runtime (`ucrtbase.dll`) -- just put the `mimalloc-override.dll`
into the import table (and put `mimalloc-redirect.dll` in the same folder).
Such patching can be done for example with [CFF Explorer](https://ntcore.com/?page_id=388) or
the [`minject`](bin) program.
## Static override
On Unix-like systems, you can also statically link with _mimalloc_ to override the standard
malloc interface. The recommended way is to link the final program with the
_mimalloc_ single object file (`mimalloc.o`). We use
an object file instead of a library file as linkers give preference to
that over archives to resolve symbols. To ensure that the standard
malloc interface resolves to the _mimalloc_ library, link it as the first
object file. For example:
```
> gcc -o myprogram mimalloc.o myfile1.c ...
```
Another way to override statically that works on all platforms, is to
link statically to mimalloc (as shown in the introduction) and include a
header file in each source file that re-defines `malloc` etc. to `mi_malloc`.
This is provided by [`mimalloc-override.h`](https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-override.h). This only works reliably though if all sources are
under your control or otherwise mixing of pointers from different heaps may occur!
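For example, a minimal sketch of a source file using this header (assuming mimalloc's include directory is on the include path):

```
#include <stdio.h>
#include <stdlib.h>
#include <mimalloc-override.h>  // from here on, malloc/free/realloc map to mi_malloc etc.

int main(void) {
  char* buf = malloc(64);  // actually calls mi_malloc
  snprintf(buf, 64, "hello");
  puts(buf);
  free(buf);               // actually calls mi_free
  return 0;
}
```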
## List of Overrides:


@@ -47,3 +47,14 @@ div.fragment {
#nav-sync img {
display: none;
}
h1,h2,h3,h4,h5,h6 {
transition:none;
}
.memtitle {
background-image: none;
background-color: #EEE;
}
table.memproto, .memproto {
text-shadow: none;
font-size: 110%;
}