mm/hugetlb: introduce hugetlb_walk()

huge_pte_offset() is the main walker function for hugetlb pgtables.  The
name is not really representing what it does, though.

Instead of renaming it, introduce a wrapper function called hugetlb_walk()
which will use huge_pte_offset() inside.  Assert on the locks when walking
the pgtable.

Note, the vma lock assertion will be a no-op for private mappings.

Document the last special case in the page_vma_mapped_walk() path where we
don't need any more lock to call hugetlb_walk().

Taking vma lock there is not needed because either: (1) potential callers
of hugetlb pvmw holds i_mmap_rwsem already (from one rmap_walk()), or (2)
the caller will not walk a hugetlb vma at all so the hugetlb code path not
reachable (e.g.  in ksm or uprobe paths).

It's slightly implicit for future page_vma_mapped_walk() callers on that
lock requirement.  But anyway, when one day this rule breaks, one will get
a straightforward warning in hugetlb_walk() with lockdep, then there'll be
a way out.

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20221216155229.2043750-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit is contained in:
Peter Xu 2022-12-16 10:52:29 -05:00 committed by Andrew Morton
parent dd361e5033
commit 9c67a20704
6 changed files with 60 additions and 31 deletions

View file

@ -2,6 +2,7 @@
#ifndef _LINUX_HUGETLB_H
#define _LINUX_HUGETLB_H
#include <linux/mm.h>
#include <linux/mm_types.h>
#include <linux/mmdebug.h>
#include <linux/fs.h>
@ -196,6 +197,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
* huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
* Returns the pte_t* if found, or NULL if the address is not mapped.
*
* IMPORTANT: we should normally not directly call this function, instead
* this is only a common interface to implement arch-specific
* walker. Please use hugetlb_walk() instead, because that will attempt to
* verify the locking for you.
*
* Since this function will walk all the pgtable pages (including not only
* high-level pgtable page, but also PUD entry that can be unshared
* concurrently for VM_SHARED), the caller of this function should be
@ -1229,4 +1235,35 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end)
#endif
static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
{
return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
}
/*
* Safe version of huge_pte_offset() to check the locks. See comments
* above huge_pte_offset().
*/
static inline pte_t *
hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
{
#if defined(CONFIG_HUGETLB_PAGE) && \
defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP)
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
/*
* If pmd sharing possible, locking needed to safely walk the
* hugetlb pgtables. More information can be found at the comment
* above huge_pte_offset() in the same file.
*
* NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP.
*/
if (__vma_shareable_lock(vma))
WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) &&
!lockdep_is_held(
&vma->vm_file->f_mapping->i_mmap_rwsem));
#endif
return huge_pte_offset(vma->vm_mm, addr, sz);
}
#endif /* _LINUX_HUGETLB_H */