Today, I am going to talk about memory mapping, a common topic in operating systems design. I will give a short, easy-to-follow summary of memory mapped files. Before we start, I recommend that you get a basic idea of some relevant concepts such as multitasking and paging.
Definition - what is a memory mapped file?
Memory mapping refers to a process's ability to access files on disk the same way it accesses dynamic memory. Accessing RAM is much faster than accessing the disk through read and write system calls. This technique saves user applications IO overhead and buffering, but it also has its own drawbacks, as we will see later.
How does a memory mapped file work?
Behind the scenes, the operating system uses virtual memory techniques to do the trick. The OS splits the memory mapped file into pages (similar to process pages) and loads the requested pages into physical memory on demand. If a process references an address (i.e. a location within the file) whose page is not yet in physical memory, a page fault occurs and the operating system brings the missing page into memory.
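The on-demand loading described above can be observed from Python. Here is a minimal sketch, assuming a scratch file created just for the demo. The helper names `make_demo_file` and `read_byte_at` are made up for illustration and are not part of any library. The point is that reading one byte deep inside the file only needs the page that holds it.

```python
import mmap
import os
import tempfile


def make_demo_file(num_pages: int = 8) -> str:
    """Create a scratch file where every page is filled with its page number."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for page in range(num_pages):
            f.write(bytes([page]) * mmap.PAGESIZE)
    return path


def read_byte_at(path: str, offset: int) -> int:
    """Map the file read-only and return the byte stored at `offset`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Indexing the map touches only the page holding `offset`;
            # the OS loads that page on demand if it is not resident.
            return m[offset]


if __name__ == "__main__":
    path = make_demo_file()
    try:
        # Jump straight into the sixth page without reading the earlier ones.
        print(read_byte_at(path, 5 * mmap.PAGESIZE + 10))
    finally:
        os.remove(path)
```

Whether a given access actually triggers a page fault depends on what the OS already has cached, so the sketch shows the access pattern rather than measuring faults.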
When to use a memory mapped file?
Memory mapped files sound like an efficient method to access files on disk. Is it always a good option? Not necessarily. Here are a few scenarios where memory mapping is appealing.
- Randomly accessing a huge file once (or a couple of times).
- Loading a small file once then randomly accessing the file frequently.
- Sharing a file or a portion of a file between multiple applications.
- When the file contains data of great importance to the application.
There is a rationale behind each example in the list above. If you are curious to know why, please leave a comment in the comments section at the end of this post.
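To make the random access scenario concrete, here is a small sketch. It assumes a file of fixed-size binary records; the record layout and the helper names `write_records` and `lookup` are invented for this example. Each lookup is a simple offset calculation into the map.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout for this sketch: little-endian 4-byte id, 8-byte value.
RECORD = struct.Struct("<IQ")


def write_records(path: str, count: int) -> None:
    """Write `count` fixed-size records (id, id squared) to a new file."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i, i * i))


def lookup(path: str, indices) -> list:
    """Return the records at the given indices, in the order requested."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Each lookup is plain offset arithmetic; no seek() or read() calls.
            return [RECORD.unpack_from(m, i * RECORD.size) for i in indices]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, 1000)
        print(lookup(path, [999, 3, 512]))
    finally:
        os.remove(path)
```

With ordinary file IO, the same lookups would need a seek() and a read() per record, plus a buffer to hold each result.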
Memory mapping is an excellent technique with various benefits. Here are some examples.
- Efficiency: when dealing with large files, there is no need to read the entire file into memory first.
- Speed: once a page is in memory, accessing it is much faster than going to disk through system calls.
- Sharing: it facilitates data sharing and interprocess communication.
- Simplicity: you deal with memory directly, as opposed to allocating space, copying data and deallocating space.
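The efficiency and simplicity points can be illustrated with a short sketch. It assumes a non-empty file, and the helper name `find_offset` is made up for this example. The search runs directly over the mapping, so the program never allocates a buffer or calls read().

```python
import mmap
import os
import tempfile


def find_offset(path: str, needle: bytes) -> int:
    """Return the byte offset of the first `needle` in the file, or -1."""
    with open(path, "rb") as f:
        # Note: mapping an empty file raises ValueError, so this sketch
        # assumes the file has at least one byte.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # find() scans the mapping in place; the program never calls read().
            return m.find(needle)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * 1_000_000 + b"NEEDLE" + b"y" * 1_000_000)
    try:
        print(find_offset(path, b"NEEDLE"))
    finally:
        os.remove(path)
```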
Just like any other technique, memory mapping has some drawbacks.
- Memory mapping is generally good for binary files; however, reading formatted binary file types with custom headers, such as TIFF, can be problematic.
- Memory mapping text files is less appealing, as it may require proper text handling and conversion.
- The notion that a memory mapped file always performs better should not be taken for granted. Recall that accessing a file in memory may generate a lot of page faults, which hurts performance.
- The memory footprint can be larger than that of traditional file IO, and user applications have little control over how that memory is allocated.
- Expanding the file size is not easy to implement because a memory mapped file is assumed to be fixed in size.
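The last drawback is worth a sketch. One common workaround, shown here as an assumption rather than the only approach, is to grow the file first and then map it again at its new size. The helper name `append_via_remap` is made up, and the code assumes that extending a file with truncate() pads it with zero bytes, which holds on common platforms.

```python
import mmap
import os
import tempfile


def append_via_remap(path: str, extra: bytes) -> None:
    """Append `extra` to the file by growing it and mapping it again."""
    with open(path, "r+b") as f:
        old_size = os.fstat(f.fileno()).st_size
        # Grow the file first: a mapping cannot extend past the end of the file.
        f.truncate(old_size + len(extra))
        # Map the file at its new size and write into the new region.
        with mmap.mmap(f.fileno(), 0) as m:
            m[old_size:] = extra
            m.flush()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"hello")
    try:
        append_via_remap(path, b" world")
        with open(path, "rb") as f:
            print(f.read())
    finally:
        os.remove(path)
```

Python's mmap objects also have a resize() method, but it is not available on every platform, which is why this sketch maps the file again instead.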
Difference between a memory mapped file and shared memory
- Shared memory is a RAM-only form of interprocess communication (IPC) that does not require disk operations.
- IPC can also be implemented using the memory mapped file technique; however, it is not as fast as pure memory-only IPC.
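The two options can be compared side by side in a small sketch. It makes a few assumptions: an anonymous mapping (created with a file number of -1) stands in for RAM-only shared memory, and two separate mappings of the same file inside one process stand in for two cooperating processes. The function names are made up for this example.

```python
import mmap
import os
import tempfile


def anonymous_roundtrip(payload: bytes) -> bytes:
    """Write and read back through a mapping that has no file behind it."""
    # A file number of -1 asks for an anonymous mapping (memory only).
    # The payload must be non-empty because a mapping cannot have length 0.
    with mmap.mmap(-1, len(payload)) as shm:
        shm[:] = payload
        return shm[:]


def file_backed_roundtrip(path: str, payload: bytes) -> bytes:
    """Write through one mapping of a file and read through another."""
    # The file must already be at least len(payload) bytes long.
    with open(path, "r+b") as fw, open(path, "r+b") as fr:
        with mmap.mmap(fw.fileno(), 0) as writer, mmap.mmap(fr.fileno(), 0) as reader:
            writer[:len(payload)] = payload
            # Both mappings share the same underlying pages of the file.
            return reader[:len(payload)]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * 64)
    try:
        print(anonymous_roundtrip(b"pong"))
        print(file_backed_roundtrip(path, b"ping"))
    finally:
        os.remove(path)
```

Real IPC would also need some form of synchronization between the processes, which this sketch leaves out.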
Memory mapped file vs named pipe
- Named pipes allow one process to communicate with another process in real time on the same computer or through a network. They are based on a client-server communication model with a sender and a listener.
- Behind the scenes, named pipes may implement an IPC shared memory.
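For comparison, here is what the sender and listener model looks like with a named pipe. This is a sketch under a few assumptions: it uses `os.mkfifo`, which is only available on POSIX systems (not Windows), and a thread plays the role of the sending process. The function name `send_through_fifo` is made up for this example.

```python
import os
import tempfile
import threading


def send_through_fifo(message: bytes) -> bytes:
    """Send `message` through a named pipe and return what the listener got."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo_path = os.path.join(tmp, "demo_fifo")
        os.mkfifo(fifo_path)  # POSIX only

        def sender() -> None:
            # Opening for writing blocks until a listener opens the pipe.
            with open(fifo_path, "wb") as w:
                w.write(message)

        t = threading.Thread(target=sender)
        t.start()
        # Opening for reading blocks until the sender opens the pipe.
        with open(fifo_path, "rb") as r:
            # read() returns once the sender closes its end of the pipe.
            received = r.read()
        t.join()
        return received


if __name__ == "__main__":
    print(send_through_fifo(b"hello from the sender"))
```

Unlike a memory mapped file, the data here is a stream: the listener consumes bytes in order and cannot jump to an arbitrary offset.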
Python memory mapped file example
Here is a Python code snippet that demonstrates the use of memory mapped files. From the programmer's perspective, it is not much different from using standard file access calls.
    import mmap

    # Open file.txt for reading and writing in binary mode
    with open("file.txt", "r+b") as f:
        # Memory map the entire file (a length of 0 means the whole file)
        m = mmap.mmap(f.fileno(), 0)
        # Read a line just like reading a standard file
        # You can use other file operations like seek()
        line = m.readline()
        # Use slice notation
        first_bytes = m[:5]
        # Close the map
        m.close()
That is it for today. I hope this post explained the concept of memory mapped files. If you have questions or comments, please use the comments section below.