linux memory maps mmap - ghdrako/doc

Memory maps are a modern Unix mechanism where you can take a file and make it part of the virtual memory. In Unix context, modern means that it was introduced in the 1980s or later. You have a file, containing data, you mmap it and you'll get a pointer to where this resides. Now, instead of seeking and reading, you just read from this pointer, adjusting the offset to get to the right data.
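As a minimal sketch of that difference, the Python snippet below reads the same bytes twice: once with seek() and read(), and once by slicing a memory map at an offset. The temporary file and the helper names `read_with_seek` and `read_with_mmap` are assumptions made for illustration only.

```python
import mmap
import os
import tempfile


def read_with_seek(path, offset, size):
    """Classic approach: position the file cursor, then copy bytes out."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def read_with_mmap(path, offset, size):
    """Memory-map approach: index into the mapping like a bytes object."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + size]


# Build a small throwaway file so the example is self-contained.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(256)) * 64)  # 16 KiB of known content

via_seek = read_with_seek(demo_path, 5000, 8)
via_mmap = read_with_mmap(demo_path, 5000, 8)
os.remove(demo_path)
print(via_seek == via_mmap)
```

Both helpers return the same bytes; the mapped version simply replaces the seek/read pair with an offset into the mapping.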

mmap() maps a file (or part of it) directly into the process's address space. This lets you treat the file's contents like an ordinary in-memory buffer, without calling read() by hand.

void *data = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);

Linux:

* creates a virtual mapping of the file in the process's address space;
* does not load any data from disk right away;
* only on the first access to a given memory page (e.g. data[i]) does the following happen:
  * a page fault occurs,
  * the kernel reads the corresponding page (usually 4 KB) from the file into RAM,
  * and makes it available to the process.

This is called "demand paging".
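Because loading happens per page, it can help to work out which pages a given byte range falls into. The following is a small sketch of that arithmetic; the helper name `pages_touched` is hypothetical, and the page size comes from `mmap.PAGESIZE` rather than being assumed to be 4 KB.

```python
import mmap

PAGE = mmap.PAGESIZE  # often 4096, but this is platform dependent


def pages_touched(offset, size, page_size=PAGE):
    """Return the page indices covered by bytes [offset, offset + size)."""
    if size <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return range(first, last + 1)


# Ten bytes that straddle a page boundary fall into two pages, so serving
# them needs at most two page faults.
straddling = pages_touched(PAGE - 5, 10)
print(list(straddling))
```

In practice the kernel may bring in neighbouring pages as well (read-ahead), so the number of faults can be lower than the number of pages touched.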

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return 1;
    }

    const char *filename = argv[1];
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        close(fd);
        return 1;
    }

    size_t length = (size_t)sb.st_size;
    if (length == 0) {
        // mmap() rejects a zero-length mapping with EINVAL
        fprintf(stderr, "%s is empty\n", filename);
        close(fd);
        return 1;
    }

    char *data = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    // Example use: print the first 100 bytes
    fwrite(data, 1, (length < 100 ? length : 100), stdout);

    munmap(data, length);
    close(fd);
    return 0;
}

open() – opens the file.
mmap() – maps the file into memory.
data – a pointer to memory holding the file's contents.
munmap() – releases the mapping.

There is no read() call: the system itself takes care of loading memory pages when the process touches them. This lets the OS cache the data very efficiently.

import mmap

# Open in binary mode: the mapping exposes raw bytes, not decoded text.
with open("plik.txt", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(mm[:100])  # first 100 bytes

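Information about loaded pages

cat /proc/<PID>/smaps

or

grep -A20 "plik" /proc/<PID>/smaps

Here <PID> is the process that holds the mapping. /proc/self/smaps always describes the process reading it, so from a shell it would show the maps of cat or grep rather than those of your program.

There you will see, among other things:
* Rss: – the amount of RAM actually occupied by the mapping,
* Referenced: – the amount of memory currently marked as referenced or accessed.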
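A process can also inspect its own mappings, which is the one case where /proc/self/smaps is the right file. The sketch below is Linux-specific and rests on assumptions: the helper name `rss_kb` is hypothetical, and it simply sums the `Rss:` fields of every mapping whose header line ends with the given path.

```python
import mmap
import os
import re
import tempfile

# A mapping header starts with an address range such as "7f12a000-7f12b000 ".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+ ")


def rss_kb(path, smaps="/proc/self/smaps"):
    """Sum the Rss: values (in kB) of all mappings of `path`.

    Returns None when the smaps file is unavailable (e.g. not on Linux).
    """
    if not os.path.exists(smaps):
        return None
    total, in_target = 0, False
    with open(smaps) as f:
        for line in f:
            if HEADER.match(line):
                # Header: address range, perms, offset, dev, inode, path.
                in_target = line.rstrip("\n").endswith(path)
            elif in_target and line.startswith("Rss:"):
                total += int(line.split()[1])
    return total


fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * (64 * mmap.PAGESIZE))
demo_path = os.path.realpath(demo_path)

with open(demo_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        before = rss_kb(demo_path)
        _ = mm[0]             # touch the first page
        _ = mm[len(mm) - 1]   # touch the last page
        after = rss_kb(demo_path)
os.remove(demo_path)
print("Rss before:", before, "kB; after:", after, "kB")
```

On a typical Linux system the second figure should be larger than the first, since only touched pages (plus whatever the kernel chooses to map around them) count towards Rss.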

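If you want to ask the kernel to load everything up front

```c
madvise(data, length, MADV_WILLNEED);
/* or */
posix_madvise(data, length, POSIX_MADV_WILLNEED);
```

Both calls are hints: the kernel may start reading the pages ahead of time, but it does not guarantee that they are resident when the call returns.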
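The same hint is available from Python through `mmap.madvise()` (Python 3.8 and later). The sketch below is one possible way to use it; the helper name `map_and_prefetch` is hypothetical, and the `MADV_WILLNEED` constant is looked up defensively because it only exists on platforms that define it.

```python
import mmap
import os
import tempfile


def map_and_prefetch(path):
    """Map `path` read-only and hint that the whole range is needed soon."""
    with open(path, "rb") as f:
        # The mapping remains usable after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    advice = getattr(mmap, "MADV_WILLNEED", None)
    if advice is not None and hasattr(mm, "madvise"):
        try:
            mm.madvise(advice)  # advisory only: the kernel may read ahead
        except OSError:
            pass  # a rejected hint is not fatal; pages still load on demand
    return mm


fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"abcd" * 4096)

mm = map_and_prefetch(demo_path)
head = bytes(mm[:4])
size = len(mm)
mm.close()
os.remove(demo_path)
print(head, size)
```

Whether the hint helps depends on the access pattern and on how much of the file is already cached, so it is worth measuring before relying on it.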

Drawbacks: the downside of memory maps is that writing through them is expensive. (A mapping created with PROT_READ, like the ones above, cannot be written to at all; writing needs PROT_WRITE.) The reason lies in the way virtual memory works. When you write to a part of virtual memory that isn't mapped to physical memory, the CPU generates a page fault. On a modern computer, the CPU's memory management unit translates virtual pages to physical pages using page tables maintained by the OS. Since you're writing to a page that isn't mapped, the CPU needs help from the OS.

So, when the page fault occurs, the OS will 1) allocate a new memory page, 2) read the contents of the file at the correct offset, 3) write this to the new memory page. Then control is returned to the application, which now overwrites the virtual memory page with new data. If the application replaces the whole page, the read in step 2 was wasted work.
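To make the write path concrete, here is a minimal sketch that overwrites one full page through a writable, shared mapping and then checks the file. It uses Python's `mmap.ACCESS_WRITE`; the temporary file and its contents are assumptions for the demonstration.

```python
import mmap
import os
import tempfile

fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a" * (2 * mmap.PAGESIZE))

with open(demo_path, "r+b") as f:
    # ACCESS_WRITE creates a shared writable mapping: stores reach the file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        # The first store to this page faults. The kernel fills the page
        # with the file's current contents before the store proceeds, even
        # though every byte of the page is about to be replaced.
        mm[0:mmap.PAGESIZE] = b"b" * mmap.PAGESIZE
        mm.flush()  # ask for the dirty page to be written back

with open(demo_path, "rb") as f:
    on_disk = f.read()
os.remove(demo_path)
print(on_disk[:1], on_disk[-1:])
```

The first page now holds the new data and the second page is untouched. An explicit write() of the same page would not need the old contents brought in first.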

Conclusions:

If the file is mostly read, with random access, indexes, and so on, mmap can bring a large benefit.
But if the file is processed mostly sequentially or written to heavily, the benefit may be smaller, or mmap may even be slower.
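The read-mostly, random-access case can be sketched as a lookup in a file of fixed-size records. The record layout (`<IQ`: a 4-byte id followed by an 8-byte value) and the helper name `read_record` are hypothetical, chosen only for the illustration.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<IQ")  # hypothetical layout: 4-byte id, 8-byte value


def read_record(mm, index):
    """Decode record number `index` directly from the mapping."""
    return RECORD.unpack_from(mm, index * RECORD.size)


fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for i in range(10_000):
        f.write(RECORD.pack(i, i * i))

with open(demo_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Jump around the file; only the pages holding these records
        # have to be resident.
        picked = [read_record(mm, i) for i in (9_999, 0, 1234)]
os.remove(demo_path)
print(picked)
```

Each lookup is an offset computation plus a decode, with no seek() or read() call per record.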
