Commit graph

167 commits

Author SHA1 Message Date
Jim Huang
f58d34fe28 Detect L1 cache size at compile time
Common cache line sizes are 32, 64 and 128 bytes. On x86_64 the standard
cache line size is 64B. Even though this is not architecturally required,
all the x86_64 implementations stick to it. Some AArch64 processors also
follow the x86_64 style with 64B cachelines. However, on Apple M1
devices, the underlying hardware is using a 128B cache line size. Quote
from Apple Developer documentation [1]:
  "Some features of Apple silicon are decidedly different than those of
   Intel-based Mac computers, and may impact your code if you don't fetch
   them dynamically. These features include:
   * Cache line sizes are different. Fetch the hw.cachelinesize setting
     using sysctl."

M1 cache lines are twice the size commonly used by x86_64 and other
Arm implementations. Cache line sizes on Arm depend on the implementation,
not the architecture. For example, TI AM57x (Cortex-A15) uses a 64B cache
line while TI AM437x (Cortex-A9) uses a 32B cache line. There are
even Arm implementations with cache line sizes configurable at boot time.

This patch attempts to detect the L1 cache line size at compile time. For
AArch64 hosts, the build process collects system information and determines
the L1 cache line size. At present, both macOS and Linux are supported. For
Arm targets, the software packages are usually cross-compiled, and
developers should specify the appropriate MI_CACHE_LINE setting in
advance.

64B is the default cache line size if none of the above sets it.

[1] https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code
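
As a rough illustration of the fallback described above, the following sketch shows how a compile-time MI_CACHE_LINE setting with a 64B default might be consumed. Only the MI_CACHE_LINE name comes from this commit message; the padded_counter_t type is hypothetical, and the actual patch may be organized differently.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* MI_CACHE_LINE is the setting named in the commit message. A build system
   could pass it on the command line (e.g. -DMI_CACHE_LINE=128); when nothing
   sets it, fall back to the 64B default. */
#ifndef MI_CACHE_LINE
#define MI_CACHE_LINE 64
#endif

/* Hypothetical example type (not from the patch): one counter per cache
   line, so counters updated by different threads do not share a line. */
typedef struct padded_counter_s {
  alignas(MI_CACHE_LINE) unsigned long value;
} padded_counter_t;

/* Compile-time sanity checks on the chosen value. */
static_assert((MI_CACHE_LINE & (MI_CACHE_LINE - 1)) == 0,
              "cache line size must be a power of two");
static_assert(sizeof(padded_counter_t) % MI_CACHE_LINE == 0,
              "padded type must occupy whole cache lines");

/* Bytes needed for n padded counters. */
static size_t counters_footprint(size_t n) {
  return n * sizeof(padded_counter_t);
}
```

A build script could obtain the host value with, for instance, `sysctl -n hw.cachelinesize` on macOS or `getconf LEVEL1_DCACHE_LINESIZE` on Linux; which mechanism the patch itself uses is not stated here.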
2021-06-20 22:05:16 +08:00
Daan Leijen
728be93977 fix for #414 making numa node count atomic 2021-06-17 19:38:51 -07:00
Daan Leijen
a83bca72b3 fixes for M1; disable interpose use zones; fix pedantic warnings 2021-06-17 19:15:09 -07:00
Jim Huang
5940d3bcce Bump copyright date
Each source file has been changed according to relevant Git activities.
2021-04-24 16:35:11 +00:00
Daan
f941015928
Merge pull request #384 from kdrag0n/fix-android-thread-id
Fix thread ID getter on Android ARM/AArch64
2021-04-22 10:33:53 -07:00
Jim Huang
3402c6cc3f Revise the use of macOS predefined macro
Quoted from "Porting UNIX/Linux Applications to OS X,"[1]
* macro __MACH__ is defined if Mach system calls are supported;
* macro __APPLE__ is defined in any Apple computer.

__MACH__ is not specific to macOS, since GNU/Hurd also runs on a Mach-based
microkernel (gnumach) [2]. Because __MACH__ is defined by the compiler,
checking it can lead to confusion. The fix is to check __APPLE__
instead, so that the affected code is really used only on
macOS.

[1] https://developer.apple.com/library/archive/documentation/Porting/Conceptual/PortingUnix/compiling/compiling.html
[2] https://www.gnu.org/software/hurd/microkernel/mach/gnumach.html
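
The distinction can be sketched with the two predefined macros named above. The helper function names are hypothetical and are not part of the patch; the sketch only assumes that the compiler defines __APPLE__ and __MACH__ as described in the quoted document.

```c
#include <assert.h>

/* Returns 1 only when compiling for an Apple platform. */
static int is_apple_platform(void) {
#if defined(__APPLE__)
  return 1;
#else
  return 0;
#endif
}

/* Returns 1 on any Mach-based system, which may include GNU/Hurd. */
static int is_mach_based(void) {
#if defined(__MACH__)
  return 1;
#else
  return 0;
#endif
}

/* Apple platforms are Mach-based, but the reverse does not hold, so
   macOS-only code paths should be guarded by __APPLE__. */
static void check_macro_relationship(void) {
  if (is_apple_platform()) {
    assert(is_mach_based());
  }
}
```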
2021-04-21 15:24:02 +08:00
Danny Lin
ad2fa2bf6f
Fix thread ID getter on Android ARM/AArch64
Android's Bionic libc stores the thread ID in TLS slot 1 instead of 0
on 32-bit ARM and AArch64. Slot 0 contains a pointer to the ELF DTV
(Dynamic Thread Vector) instead, which is constant for each loaded DSO.

Because mimalloc uses the thread ID to determine whether operations are
thread-local or cross-thread (atomic), all threads having the same ID
causes internal data structures to get corrupted quickly when multiple
threads are using the allocator:

mimalloc: assertion failed: at "external/mimalloc/src/page.c":563, mi_page_extend_free
  assertion: "page->local_free == NULL"
mimalloc: assertion failed: at "external/mimalloc/src/page.c":74, mi_page_is_valid_init
  assertion: "page->used <= page->capacity"
mimalloc: assertion failed: at "external/mimalloc/src/page.c":100, mi_page_is_valid_init
  assertion: "page->used + free_count == page->capacity"
mimalloc: assertion failed: at "external/mimalloc/src/page.c":74, mi_page_is_valid_init
  assertion: "page->used <= page->capacity"

Add support for Android's alternate TLS layout to fix the crashes in
multi-threaded use cases.

Fixes #376.
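
To illustrate the property the fix restores (every live thread must see its own ID), here is a portable sketch that uses the address of a thread-local variable as the ID. This is a stand-in for illustration only: it does not read Bionic's TLS slots, and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Each thread gets its own copy of this variable, so its address is unique
   among threads that are alive at the same time. */
static _Thread_local int tls_marker;

/* Portable stand-in for a thread ID (not the TLS-slot read from the patch). */
static uintptr_t portable_thread_id(void) {
  return (uintptr_t)&tls_marker;
}

/* Thread entry point: store this thread's ID where the caller can read it. */
static void *record_id(void *out) {
  assert(out != NULL);
  *(uintptr_t *)out = portable_thread_id();
  return NULL;
}

/* Returns 1 if a worker thread observed a different ID than the caller,
   0 if the IDs collided, and -1 if the thread could not be run. */
static int worker_id_differs_from_caller(void) {
  pthread_t t;
  uintptr_t worker_id = 0;
  if (pthread_create(&t, NULL, record_id, &worker_id) != 0) return -1;
  if (pthread_join(t, NULL) != 0) return -1;
  return worker_id != portable_thread_id();
}
```

If a getter returned the same value on every thread, as described above for TLS slot 0 on Android, the comparison in the last function would report a collision. Older toolchains may need `-pthread` to build this.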
2021-04-07 01:59:47 -07:00
Daan Leijen
331491e1e8 build fix for Apple M1 (issue #354 and pr #356) 2021-02-02 10:46:30 -08:00
Daan Leijen
a7c33a3b0e fix getting the unique thread id on the Apple M1, see issue #354. 2021-02-01 15:47:22 -08:00
Daan Leijen
35c1fc2be9 limit memcpy as rep stosb to windows where the cpu supports FSRM; add mi_memcpy_aligned for machine-word aligned copy. see issue #201 and pr #253 2021-01-30 14:33:46 -08:00
Daan Leijen
92ec493a5d possible fix for alignment warning (issue #341) 2021-01-29 16:21:50 -08:00
Daan Leijen
0a06884732 ensure memcpy with rep stosb is only used on windows 2021-01-29 16:09:09 -08:00
Daan
9b966c3492
Merge pull request #253 from haneefmubarak/memcpy-rep-movsb-windows-201
resolve #201 with a platform-selective REP MOVSB implementation
2021-01-29 16:00:00 -08:00
Daan Leijen
78ce716e2d add comment on use of tpidrro_el0 on macOS 2021-01-28 17:36:56 -08:00
Uwe L. Korn
a753084f74 Use APPLE instead of MACH 2021-01-28 11:38:38 +01:00
Uwe L. Korn
88330cfc9f Use __APPLE__ instead of __MACH__ 2021-01-22 17:06:43 +01:00
Uwe L. Korn
ab3dac04c2 Use tpidrro_el0 for thread local storage in macOS-arm64
Fixes #343
2020-12-30 21:49:41 +01:00
Daan Leijen
bb386025b5 update override on macOS with interpose of malloc_default_zone (issues #313) 2020-12-15 16:03:54 -08:00
Daan
5bbe1c0216
Merge pull request #323 from devnexen/dfbsd_build_fix
DragonFly support fix (for 5.8.x and forward).
2020-12-10 10:19:05 -08:00
Daan
8b8011b4f0
Merge pull request #322 from Kokokokoka/x32_patch
fix for x32 builds
2020-12-10 10:14:04 -08:00
David Carlier
e6c2fd44fc DragonFly support fix (for 5.8.x and forward).
The pthread slot approach is somewhat buggy (clearly visible
 with the stress unit test, which segfaults more or less randomly,
 but the stats never show up).
Using the default approach instead, the test passes even though
 it is relatively slow (e.g. 1.5 sec on FreeBSD vs 4.5 on DragonFly on the same
 machine).
2020-10-22 11:15:37 +01:00
Vasya B
cb45e3c6b1 fix for x32 builds 2020-10-19 21:00:16 +00:00
daan
14b8d27386 track pinned memory separately from large os pages 2020-09-08 16:46:03 -07:00
daan
c86459afef split bitmap code into separate header and source file 2020-09-08 10:14:13 -07:00
daan
30b993ecf3 consolidate bit scan operations 2020-09-08 09:27:57 -07:00
daan
900c97664a merge from dev-atomic 2020-09-03 09:47:01 -07:00
daan
5805c39916 enable --std=c99 compilation; fix mingw compilation 2020-08-09 17:55:17 -07:00
daan
ef8e5d18a6 replace atomics with C11/C++ atomics with explicit memory order; passes tsan. Issue #130 2020-07-26 18:01:33 -07:00
daan
afe29cb8f5 fix ub on shift, issue #279 2020-07-25 19:33:02 -07:00
daan
c5406f327e move include 'limits.h' outside of definition 2020-07-21 18:51:25 -07:00
David Carlier
0c550d1626 illumos support/build fix and large page support 2020-07-10 03:26:14 +01:00
Haneef Mubarak
4c45793ec1
fix __movsb typecast error MSVC 2020-05-26 16:16:19 -07:00
Haneef Mubarak
6c92690914
fix REP MOVSB doc comment typo 2020-05-26 16:08:33 -07:00
Haneef Mubarak
429025634e
resolve #201 with a platform-selective REP MOVSB implementation 2020-05-26 16:04:28 -07:00
daan
a7d2bc8ad6 edit warning messages to be more consistent 2020-05-19 10:16:28 -07:00
daan
f2ac272baa strengthen alignment check for memalign and aligned_alloc 2020-02-17 09:59:11 -08:00
daan
a96e94f940 change TLS slot on OpenBSD 2020-02-02 22:46:38 -08:00
daan
3560e0a867 fix TLS slot number on OSX 2020-02-02 22:15:09 -08:00
daan
f3c47c7c91 improved malloc zone handling on macOSX (not working yet) 2020-02-02 21:03:09 -08:00
daan
757dcc8411 extend interpose for macOSX 2020-02-02 19:07:26 -08:00
daan
12c4108abe update comments 2020-02-02 16:09:09 -08:00
daan
07fbe4f80f fixes for dragonfly 2020-02-02 14:31:28 -08:00
daan
8bc20631e4 fixes for freeBSD 2020-02-02 13:25:26 -08:00
daan
d2db9f1fc2 update thread local storage 2020-02-02 13:12:22 -08:00
daan
0989562c2d add initial fast tls for macOSX 2020-02-01 16:57:00 -08:00
daan
fea903900d use __thread locals on linux 2020-02-01 14:33:34 -08:00
daan
a169cf0e3f merge dev-exp; add pthread TLS support for macOSX 2020-02-01 14:10:10 -08:00
daan
edff9d4fbb merge from dev-win (padding) 2020-02-01 12:32:59 -08:00
daan
aa68b8cbc7 improve encoding of padding canary and buffer overflow detection 2020-02-01 12:16:37 -08:00
daan
40f1e1e07b byte-precise heap block overflow checking with encoded padding 2020-01-31 23:39:51 -08:00