An application could use large pages as a general-purpose heap, but it should avoid fork(). There are currently two ways to allocate large pages: using mmap(2) to map a file opened in hugetlbfs (Linux only), or calling shmget(2) with a special flag (SHM_HUGETLB on Linux, SHM_LGPAGE on AIX; note that a large page on AIX is 16MB).
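
As a rough illustration of the shmget(2) route on Linux, the sketch below requests a segment backed by large pages and attaches it. The helper name `alloc_large_shm` is hypothetical, and the code assumes that `size` is a multiple of the system's large page size. The call typically fails unless large pages have been reserved in the hugetlb pool and the caller is permitted to use them, so the helper simply returns NULL in that case.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000 /* Linux value, for headers that do not expose it */
#endif

/* Hypothetical helper: attach a private System V segment backed by large
 * pages. Returns the mapped address, or NULL if the request fails (for
 * example when no large pages are reserved). Assumes `size` is a multiple
 * of the large page size. */
void *alloc_large_shm(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id == -1)
        return NULL;

    void *addr = shmat(id, NULL, 0);

    /* Mark the segment for removal now; it is destroyed once the last
     * process detaches, so it cannot leak if the program crashes. */
    shmctl(id, IPC_RMID, NULL);

    if (addr == (void *)-1)
        return NULL;
    return addr;
}
```

A caller would treat NULL as "large pages unavailable" and fall back to an ordinary allocation; memory obtained this way is released with shmdt(2).
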
- With shmget(2), shared memory backed by large pages cannot be made copy-on-write, so after fork() the parent and the child share the same heap. Unless the two processes synchronize their accesses, this leads to unexpected race conditions. Furthermore, either process could create a new heap space, which will be private to that process, so references into the new heap will cause memory errors in the other process. Likewise, references to memory newly allocated from any other private heap, such as the one used by malloc(3), will be invalid in the other process.
- While mmap(2) allows you to set MAP_PRIVATE to trigger copy-on-write, the copying is going to be expensive for two reasons. The most obvious one is the cost of copying a whole 4MB large page even if the child only modifies one word of it and then proceeds with an exec(2). With 4KB memory pages, copying a page on demand is much less expensive and can be easily amortized across memory accesses over the process's run time. The other reason is that Linux might run out of hugetlb pool space and have to assemble physically contiguous blocks of memory in order to back the new large page mapping. This is due to the way Page Size Extension works: a large page must map a single physically contiguous region. Furthermore, the OS might also require the swap space backing a large page to be contiguous, which means moving blocks around on disk and causes a lot of thrashing. This is much more expensive than simple copy-on-write.
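
The difference between the two cases can be observed with a small experiment. The sketch below is only an approximation: it uses an ordinary anonymous mapping of one regular page as a stand-in for a large page heap, so that it runs on systems without a configured hugetlb pool, and the helper name `child_write_visible` is hypothetical. A MAP_SHARED mapping behaves like the shmget(2) segment (the child's write reaches the parent), while MAP_PRIVATE gives the copy-on-write behavior (the parent keeps its own copy).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. `sharing` is MAP_SHARED or MAP_PRIVATE.
 * Returns 1 if a write made by the child after fork() is visible to the
 * parent, 0 if it is not, and -1 on error. */
int child_write_visible(int sharing)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* One regular page stands in for the heap (assumption: no large pages). */
    int *heap = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     sharing | MAP_ANONYMOUS, -1, 0);
    if (heap == MAP_FAILED)
        return -1;

    heap[0] = 1; /* value written before the fork */

    pid_t pid = fork();
    if (pid == -1) {
        munmap(heap, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        heap[0] = 2; /* child modifies a single word */
        _exit(0);
    }

    int result = -1;
    if (waitpid(pid, NULL, 0) == pid)
        result = (heap[0] == 2); /* did the child's write reach the parent? */

    munmap(heap, (size_t)page);
    return result;
}
```

Calling the helper with MAP_SHARED should report the child's write as visible, and with MAP_PRIVATE as not visible. With a real large page mapping, the MAP_PRIVATE case would additionally have to copy the entire large page for that single-word write.
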