The purpose of this short post is to explain operating system's page table vs inverted page table, mention their advantages and disadvantages in an easy to follow manner. Many students and beginners do not get the point behind these two concepts so my intention here is to clear any possible confusion rather than dive into academic and implementation details. Sounds good ? let us get started.
It does not make sense to talk about page tables without having a basic idea about operating systems virtual memory and multitasking. Please check the following posts if you think you need to:
Borrowing some points from the previous articles, we can summarize it as follows:
Physical memory space is limited so virtual memory comes to the rescue.
Large processes are split into smaller units called pages.
The operating system keeps track of each process active pages in memory.
To do that, the operating system maintains a per process data structure called the page table.
Given a logical (virtual) address in the form (page number, offset).
Use the page number as an index to the page table as if you are indexing an array.
Get the entry at that particular location in the page table.
This entry points to the physical memory frame in which the page is loaded.
Do you think there is a problem using the previous approach to translate logical addresses into physical addresses ? Yes indeed. Page tables occupy space and specially when the logical address space is large. Also, each process has its own page table which makes the space problem even worst. Let us take an example to demonstrate that page tables do eat space. We are going to use tiny numbers so that it is easy to follow.
Logical address translation example
Assume an 8 bit logical address space.
This means we can address up to 256 locations or instructions (i.e. 0 to 255).
Assume the page size is 8 bytes.
This means a process using the entire address space need to be split into 32 pages 8 bytes each (32 x 8 = 256).
In other words, the logical address [page, offset] is [5 bits to index a page, 3 bits for the offset within the page).
So the operating system has to maintain a per process page table of size 32 entries (regardless of entry size in bytes).
What if the logical address is 16 bits with the same page size ? How many entries we need ? (256 x 256)/8 = 8192 entries. What if the address is 32 bits ? It is 2 to the power of 32 divided by 8 which is 512M entries (M = 1024 x 1024).
Do you see the pattern here ? Page table size is dramatically increasing as the address spaces becomes wider.
Modern microprocessors can easily support 64 bit addresses (You can use the steps above to compute the page table size)
That is only for one application, imagine we have 100 application!
It is obvious that this is a serious space issue and requires a proper solution. One way to reduce page table size is to use multilevel page tables. Let us talk about that.
Two level page table example
If you recall, paging is an operating system trick to allow multiple processes to run simultaneously while having limited physical memory space. We can utilize the same concept to fix the page table space problem. Think of the big page table as a big process and divide the page table itself into pages. In this case, there is no need to keep the entire table in memory. We can only include the needed portions. It is clear that having only portions in memory would reduce space consumed by page tables. As in the saying, there is no such a free lunch, everything has a price. We save on space but pay the price in access time. Let us demonstrate two level page tables using the 8 bits example mentioned earlier.
8 bits logical address space.
We can address up to 256 addresses.
Page size is 8 bytes.
Entire address space need to be split into 32 pages 8 bytes each (32 x 8 = 256).
Split the logical address into 3 parts [Level 1, Level 2, Offset] for example 2 bits for the first level, 3 bits for the second level and as before 3 bits for the offset [2, 3, 3].
First level table uses 2 bits so it has 4 entries.
Second level table uses 3 bits so it has 8 entries.
As you can see 4 x 8 = 32 which is the entire space in our case.
If the total is 32 as in single level page table so how did we save space ? We save space by not having all these tables in memory at the same time.
Only the first level table and at least one second level table need to be present at memory. The first level table is 4 entries and one second level table is 8 entries. The total is 12 entries which is less than 32. This is just a trivial example, if you follow the same steps using a 32 or 64 you gonna see that a large amount of space is saved.
To translate a virtual or logical address into a physical address, use the first 2 bits as index to the first level page table. Get that entry which points to a second level page table. Use the next 3 bits to index the second level page table. Get that entry to point to a physical address frame. Use the offset to locate a specific location within the page frame.
Note that if a requested page is not in memory (page fault) the corresponding second level page table need to be loaded from disk. This will add to space usage but still way much better than having an entire single level page table.
As you can see, multilevel page tables reduce space but still it can not help with wide address spaces. Also, a multilevel page table has the disadvantage of long address translation time. Is there any other way to solve this problem ? Yes. Inverted page tables.
Inverted page table
Before we explain how inverted page table works, let us first mention why is it called inverted. Normal page tables map virtual pages into physical memory frames. In an inverted page table, for each occupied physical memory frame there is an associated virtual page. It is inverted in the sense that we look at mapping starting from a physical memory frame back to a virtual page though the actual address translation starts with a virtual page all the way to a physical memory frame just like a normal page table. I know, sometimes terminology is confusing. Anyway, what is an inverted page table ?
The idea behind inverted page tables is to have a single page table on the operating system level that is not tied to any specific process. It is based on the observation that the CPU only references entries in those pages that are already present in memory. The number of pages that are present in physical memory frames are far less than the total number of virtual pages that are on disk. Having a single table that maps occupied memory frames to virtual pages consumes far less space than having a page table for each process that is big enough to fit all virtual pages.
Inverted page table address translation
Given a process ID and a virtual page number, we need to search the inverted page table for a match. Linear lookup takes time so using an efficient data structure like a hash table is appealing. Here is how the translation takes place:
Hashed inverted page table
Hash the (process ID and virtual page number).
The inverted page table hash function gives us an index to the hash table.
Use that index to get the physical frame number if the stored process ID and virtual page number at that index location match the input process ID and virtual page number.
If they do not match (different process ID or different virtual page number) then that is a collision.
In this case, follow the pointer to the next link in the chain until you get a match.
If all entries in the chain are checked without a match. This means a miss or a page fault.
In practice, the number of entries in the hash table should be larger than the number of physical pages in order to reduce hash table collisions.
A page table is a per process data structure used by the operating system to map process pages to physical memory frames.
Page number is used as an index to the page table to get the corresponding physical frame number.
Page tables consume a lot of memory space specially when many applications are running and even more when using a wide address space.
Space issue can be reduced by using multilevel page tables. It is sort of applying paging to the page table itself with the expense of long access time.
Even with multilevel page tables, still the space issue can persist with large address spaces.
The number of active pages for all processes is limited so an inverted page table consumes less space as it only maps physical frames to virtual addresses.
Inverted page tables can be implemented using a hash table data structure for faster lookup.
Collision may happen but other techniques can be used to reduce lookup time such as having a hash table with more entries than physical memory frames.
Thanks for reading. Please use the comments section below for feedback and questions.