mm,thp,rmap: simplify compound page mapcount handling

Compound page (folio) mapcount calculations have been different for anon
and file (or shmem) THPs, and involved the obscure PageDoubleMap flag. 
And each huge mapping and unmapping of a file (or shmem) THP involved
atomically incrementing and decrementing the mapcount of every subpage of
that huge page, dirtying many struct page cachelines.

Add subpages_mapcount field to the struct folio and first tail page, so
that the total of subpage mapcounts is available in one place near the
head: then page_mapcount() and total_mapcount() and page_mapped(), and
their folio equivalents, are so quick that anon and file and hugetlb don't
need to be optimized differently.  Delete the unloved PageDoubleMap.
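
In outline, the read side goes from a walk over every subpage to two atomic
reads near the head page.  The sketch below is illustrative only:
head_compound_mapcount() and head_subpages_mapcount() are the helpers added
in the diff further down, while the "before" body is a simplification (the
old code also corrected for PageDoubleMap and for the anon/file difference):

/* Before (sketch): summing a THP's mapcount read every subpage. */
static int total_mapcount_before(struct page *head, int nr_subpages)
{
	int i, total = atomic_read(compound_mapcount_ptr(head)) + 1;

	for (i = 0; i < nr_subpages; i++)
		total += atomic_read(&head[i]._mapcount) + 1;
	return total;
}

/* After (sketch): one counter near the head keeps the subpage total. */
static int total_mapcount_after(struct page *head)
{
	return head_compound_mapcount(head) + head_subpages_mapcount(head);
}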

page_add and page_remove rmap functions must now maintain the
subpages_mapcount as well as the subpage _mapcount, when dealing with pte
mappings of huge pages; and correct maintenance of NR_ANON_MAPPED and
NR_FILE_MAPPED statistics still needs reading through the subpages, using
nr_subpages_unmapped() - but only when first or last pmd mapping finds
subpages_mapcount raised (double-map case, not the common case).
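
A minimal sketch of that bookkeeping, with assumed wrapper names (the real
page_add_*_rmap()/page_remove_rmap() code in mm/rmap.c also serializes these
updates and adjusts the NR_ANON_MAPPED/NR_FILE_MAPPED statistics):

/* Sketch: pte-mapping one subpage of a THP. */
static void sketch_add_pte_rmap(struct page *page)
{
	atomic_inc(&page->_mapcount);		/* this subpage's pte mapcount */
	if (PageCompound(page))			/* keep the total near the head */
		atomic_inc(subpages_mapcount_ptr(compound_head(page)));
}

/* Sketch: pmd-mapping the whole THP. */
static void sketch_add_pmd_rmap(struct page *head)
{
	bool first = atomic_inc_and_test(compound_mapcount_ptr(head));

	if (first && atomic_read(subpages_mapcount_ptr(head))) {
		/*
		 * Double-map case only: some subpages are already pte-mapped,
		 * so the statistics need a read through the subpages, in the
		 * style of nr_subpages_unmapped().
		 */
	}
}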

But are those counts (used to decide when to split an anon THP, and in
vmscan's pagecache_reclaimable heuristic) correctly maintained?  Not
quite: since page_remove_rmap() (and also split_huge_pmd()) is often
called without page lock, there can be races when a subpage's pte mapcount
goes 0<->1 while a compound pmd mapcount 0<->1 transition is scanning the
subpages - races which the previous implementation had prevented.  The
statistics might become
inaccurate, and even drift down until they underflow through 0.  That is
not good enough, but is better dealt with in a followup patch.

Update a few comments on first and second tail page overlaid fields. 
hugepage_add_new_anon_rmap() has to "increment" compound_mapcount, but
subpages_mapcount and compound_pincount are already correctly at 0, so
delete its reinitialization of compound_pincount.
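
Since compound_mapcount, like _mapcount, starts at -1, that "increment" on a
brand-new page amounts to an atomic_set to 0; roughly (a sketch, not the
exact mm/rmap.c lines):

static void sketch_hugepage_add_new_anon_rmap(struct page *page)
{
	/* first (compound) mapping of a freshly allocated page: -1 -> 0 */
	atomic_set(compound_mapcount_ptr(page), 0);
	/* subpages_mapcount and compound_pincount are already 0 from prep */
}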

A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
18 seconds on small pages, and used to take 1 second on huge pages, but
now takes 119 milliseconds on huge pages.  Mapping by pmds a second time
used to take 860ms and now takes 92ms; mapping by pmds after mapping by
ptes (when the scan is needed) used to take 870ms and now takes 495ms. 
But there might be some benchmarks which would show a slowdown, because
tail struct pages now fall out of cache until final freeing checks them.
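
For reference, the timing test is of this shape (a hypothetical
reconstruction, not the original program; the file path is illustrative, and
the huge-page numbers assume tmpfs mounted with huge=always):

#include <stdio.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>

#define SIZE	(2UL << 30)	/* 2GB */

int main(void)
{
	int fd = open("/dev/shm/thp-test", O_RDWR | O_CREAT, 0600);
	struct timeval t0, t1;
	int i;

	if (fd < 0 || ftruncate(fd, SIZE))
		return 1;

	gettimeofday(&t0, NULL);
	for (i = 0; i < 100; i++) {
		void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			       MAP_SHARED | MAP_POPULATE, fd, 0);
		if (p == MAP_FAILED)
			return 1;
		munmap(p, SIZE);
	}
	gettimeofday(&t1, NULL);

	printf("100 x map+unmap: %ld ms\n",
	       (t1.tv_sec - t0.tv_sec) * 1000 +
	       (t1.tv_usec - t0.tv_usec) / 1000);
	close(fd);
	return 0;
}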

Link: https://lkml.kernel.org/r/47ad693-717-79c8-e1ba-46c3a6602e48@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit cb67f4282b (parent dad6a5eb55)
Author: Hugh Dickins <hughd@google.com>, 2022-11-02 18:51:38 -07:00
Committed by: Andrew Morton
13 changed files with 194 additions and 261 deletions

@@ -818,8 +818,8 @@ static inline int is_vmalloc_or_module_addr(const void *x)
 /*
  * How many times the entire folio is mapped as a single unit (eg by a
  * PMD or PUD entry). This is probably not what you want, except for
- * debugging purposes; look at folio_mapcount() or page_mapcount()
- * instead.
+ * debugging purposes - it does not include PTE-mapped sub-pages; look
+ * at folio_mapcount() or page_mapcount() or total_mapcount() instead.
  */
 static inline int folio_entire_mapcount(struct folio *folio)
 {
@@ -829,12 +829,20 @@ static inline int folio_entire_mapcount(struct folio *folio)
 /*
  * Mapcount of compound page as a whole, does not include mapped sub-pages.
  *
- * Must be called only for compound pages.
+ * Must be called only on head of compound page.
  */
-static inline int compound_mapcount(struct page *page)
+static inline int head_compound_mapcount(struct page *head)
 {
-	return folio_entire_mapcount(page_folio(page));
+	return atomic_read(compound_mapcount_ptr(head)) + 1;
 }
 
+/*
+ * Sum of mapcounts of sub-pages, does not include compound mapcount.
+ * Must be called only on head of compound page.
+ */
+static inline int head_subpages_mapcount(struct page *head)
+{
+	return atomic_read(subpages_mapcount_ptr(head));
+}
+
 /*
@@ -847,11 +855,9 @@ static inline void page_mapcount_reset(struct page *page)
 	atomic_set(&(page)->_mapcount, -1);
 }
 
-int __page_mapcount(struct page *page);
-
 /*
  * Mapcount of 0-order page; when compound sub-page, includes
- * compound_mapcount().
+ * compound_mapcount of compound_head of page.
  *
  * Result is undefined for pages which cannot be mapped into userspace.
  * For example SLAB or special types of pages. See function page_has_type().
@@ -859,25 +865,61 @@ int __page_mapcount(struct page *page);
  */
 static inline int page_mapcount(struct page *page)
 {
-	if (unlikely(PageCompound(page)))
-		return __page_mapcount(page);
-
-	return atomic_read(&page->_mapcount) + 1;
+	int mapcount = atomic_read(&page->_mapcount) + 1;
+
+	if (likely(!PageCompound(page)))
+		return mapcount;
+	page = compound_head(page);
+	return head_compound_mapcount(page) + mapcount;
 }
 
-int folio_mapcount(struct folio *folio);
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int total_mapcount(struct page *page)
 {
-	return folio_mapcount(page_folio(page));
+	if (likely(!PageCompound(page)))
+		return atomic_read(&page->_mapcount) + 1;
+	page = compound_head(page);
+	return head_compound_mapcount(page) + head_subpages_mapcount(page);
 }
-#else
-static inline int total_mapcount(struct page *page)
+
+/*
+ * Return true if this page is mapped into pagetables.
+ * For compound page it returns true if any subpage of compound page is mapped,
+ * even if this particular subpage is not itself mapped by any PTE or PMD.
+ */
+static inline bool page_mapped(struct page *page)
 {
-	return page_mapcount(page);
+	return total_mapcount(page) > 0;
 }
-#endif
+
+/**
+ * folio_mapcount() - Calculate the number of mappings of this folio.
+ * @folio: The folio.
+ *
+ * A large folio tracks both how many times the entire folio is mapped,
+ * and how many times each individual page in the folio is mapped.
+ * This function calculates the total number of times the folio is
+ * mapped.
+ *
+ * Return: The number of times this folio is mapped.
+ */
+static inline int folio_mapcount(struct folio *folio)
+{
+	if (likely(!folio_test_large(folio)))
+		return atomic_read(&folio->_mapcount) + 1;
+	return atomic_read(folio_mapcount_ptr(folio)) + 1 +
+		atomic_read(folio_subpages_mapcount_ptr(folio));
+}
+
+/**
+ * folio_mapped - Is this folio mapped into userspace?
+ * @folio: The folio.
+ *
+ * Return: True if any page in this folio is referenced by user page tables.
+ */
+static inline bool folio_mapped(struct folio *folio)
+{
+	return folio_mapcount(folio) > 0;
+}
 
 static inline struct page *virt_to_head_page(const void *x)
 {
@@ -1800,9 +1842,6 @@ static inline pgoff_t page_index(struct page *page)
 	return page->index;
 }
 
-bool page_mapped(struct page *page);
-bool folio_mapped(struct folio *folio);
-
 /*
  * Return true only if the page has been allocated with
  * ALLOC_NO_WATERMARKS and the low watermark was not