Mirror of https://gitee.com/bianbu-linux/linux-6.6 (synced 2025-04-24 14:07:52 -04:00)
-----BEGIN PGP SIGNATURE-----

iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
=J+Dj
-----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the performance of
   switching from a user process to a kernel thread.
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 - zsmalloc performance improvements from Sergey Senozhatsky.
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd,
   permitting userspace to write-protect unpopulated ptes of anonymous
   memory.
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.
 - Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 - Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 - Lorenzo has also converted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 - Lorenzo has also contributed further cleanups of vma_merge().
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 - Yosry Ahmed has fixed an issue around memcg's page reclaim
   accounting.
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
This commit is contained in:
commit 7fa8a8ee94
306 changed files with 11567 additions and 7985 deletions
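Among the changes summarized above is the per-process KSM interface ("mm: add new api to enable ksm per process"). The following is a hedged userspace sketch of how a process might opt in; it is not code from this commit, and it assumes the prctl pair PR_SET_MEMORY_MERGE / PR_GET_MEMORY_MERGE described in the pull message (verify the constants against your kernel's linux/prctl.h):

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67   /* assumed value; check <linux/prctl.h> */
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68   /* assumed value; check <linux/prctl.h> */
#endif

int main(void)
{
        /* Opt the whole process into KSM instead of madvise()ing ranges.
         * Early kernels may require CAP_SYS_RESOURCE for this to succeed. */
        if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
                perror("PR_SET_MEMORY_MERGE");

        printf("ksm enabled for this process: %ld\n",
               (long)prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0));
        return 0;
}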
@@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes.
 		When it is set to 0 only pages from the same node are merged,
 		otherwise pages from all nodes can be merged together (default).
 
+What:		/sys/kernel/mm/ksm/general_profit
+Date:		April 2023
+KernelVersion:	6.4
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	Measure how effective KSM is.
+
+		general_profit: how effective is KSM. The formula for the
+		calculation is in Documentation/admin-guide/mm/ksm.rst.
@@ -172,7 +172,7 @@ variables.
 Offset of the free_list's member. This value is used to compute the number
 of free pages.
 
-Each zone has a free_area structure array called free_area[MAX_ORDER].
+Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
 The free_list represents a linked list of free page blocks.
 
 (list_head, next|prev)
@@ -189,8 +189,8 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
 information. Makedumpfile gets the start address of the vmalloc region
 from this.
 
-(zone.free_area, MAX_ORDER)
----------------------------
+(zone.free_area, MAX_ORDER + 1)
+-------------------------------
 
 Free areas descriptor. User-space tools use this value to iterate the
 free_area ranges. MAX_ORDER is used by the zone buddy allocator.
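The two hunks above reflect the MAX_ORDER semantics change called out in the pull message: free_area[] now has MAX_ORDER + 1 entries and MAX_ORDER itself is a valid order. A minimal kernel-style sketch of what iteration code looks like after the change (illustrative only, not a quote from this series):

#include <linux/mmzone.h>

/* After this series MAX_ORDER is inclusive: valid orders are 0..MAX_ORDER. */
static unsigned long count_free_pages(struct zone *zone)
{
        unsigned long free = 0;
        unsigned int order;

        for (order = 0; order <= MAX_ORDER; order++)    /* previously "< MAX_ORDER" */
                free += zone->free_area[order].nr_free << order;

        return free;
}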
@@ -4012,7 +4012,7 @@
			[KNL] Minimal page reporting order
			Format: <integer>
			Adjust the minimal page reporting order. The page
-			reporting is disabled when it exceeds (MAX_ORDER-1).
+			reporting is disabled when it exceeds MAX_ORDER.
 
	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
			timeout > 0: seconds before rebooting
@@ -157,6 +157,8 @@ stable_node_chains_prune_millisecs
 
 The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
 
+general_profit
+        how effective is KSM. The calculation is explained below.
 pages_shared
         how many shared pages are being used
 pages_sharing
@@ -207,7 +209,8 @@ several times, which are unprofitable memory consumed.
	      ksm_rmap_items * sizeof(rmap_item).
 
    where ksm_merging_pages is shown under the directory ``/proc/<pid>/``,
-   and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``.
+   and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``. The process profit
+   is also shown in ``/proc/<pid>/ksm_stat`` as ksm_process_profit.
 
 From the perspective of application, a high ratio of ``ksm_rmap_items`` to
 ``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
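The ksm.rst hunks above document the system-wide general_profit file and the per-process ksm_process_profit value. A small userspace sketch that simply reads the documented files (the parsing and output format are illustrative assumptions):

#include <stdio.h>

int main(int argc, char **argv)
{
        char path[64];
        char line[256];
        FILE *f;

        /* System-wide profit estimate, documented in the ABI hunk above. */
        f = fopen("/sys/kernel/mm/ksm/general_profit", "r");
        if (f && fgets(line, sizeof(line), f))
                printf("general_profit: %s", line);
        if (f)
                fclose(f);

        /* /proc/<pid>/ksm_stat carries ksm_rmap_items and ksm_process_profit. */
        snprintf(path, sizeof(path), "/proc/%s/ksm_stat",
                 argc > 1 ? argv[1] : "self");
        f = fopen(path, "r");
        while (f && fgets(line, sizeof(line), f))
                fputs(line, stdout);
        if (f)
                fclose(f);
        return 0;
}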
@@ -219,6 +219,31 @@ former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
 you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
 used.
 
+Userfaultfd write-protect mode currently behave differently on none ptes
+(when e.g. page is missing) over different types of memories.
+
+For anonymous memory, ``ioctl(UFFDIO_WRITEPROTECT)`` will ignore none ptes
+(e.g. when pages are missing and not populated). For file-backed memories
+like shmem and hugetlbfs, none ptes will be write protected just like a
+present pte. In other words, there will be a userfaultfd write fault
+message generated when writing to a missing page on file typed memories,
+as long as the page range was write-protected before. Such a message will
+not be generated on anonymous memories by default.
+
+If the application wants to be able to write protect none ptes on anonymous
+memory, one can pre-populate the memory with e.g. MADV_POPULATE_READ. On
+newer kernels, one can also detect the feature UFFD_FEATURE_WP_UNPOPULATED
+and set the feature bit in advance to make sure none ptes will also be
+write protected even upon anonymous memory.
+
+When using ``UFFDIO_REGISTER_MODE_WP`` in combination with either
+``UFFDIO_REGISTER_MODE_MISSING`` or ``UFFDIO_REGISTER_MODE_MINOR``, when
+resolving missing / minor faults with ``UFFDIO_COPY`` or ``UFFDIO_CONTINUE``
+respectively, it may be desirable for the new page / mapping to be
+write-protected (so future writes will also result in a WP fault). These ioctls
+support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
+respectively) to configure the mapping this way.
+
 QEMU/KVM
 ========
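The documentation added above describes UFFD_FEATURE_WP_UNPOPULATED and write-protection of not-yet-populated anonymous memory. Below is a hedged userspace sketch of the register/write-protect sequence it describes; it is not code from this commit and error handling is minimal:

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        size_t len = 16 * page;
        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        struct uffdio_api api = { .api = UFFD_API };
        struct uffdio_register reg;
        struct uffdio_writeprotect wp;
        char *area;

        if (uffd < 0)
                return 1;

        /* Ask for WP on unpopulated (none) ptes, per the text above (6.4+). */
        api.features = UFFD_FEATURE_WP_UNPOPULATED;
        if (ioctl(uffd, UFFDIO_API, &api))
                return 1;

        area = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (area == MAP_FAILED)
                return 1;

        reg.range.start = (unsigned long)area;
        reg.range.len = len;
        reg.mode = UFFDIO_REGISTER_MODE_WP;
        if (ioctl(uffd, UFFDIO_REGISTER, &reg))
                return 1;

        /* Write-protect the whole range, including ptes not populated yet. */
        memset(&wp, 0, sizeof(wp));
        wp.range.start = (unsigned long)area;
        wp.range.len = len;
        wp.mode = UFFDIO_WRITEPROTECT_MODE_WP;
        if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
                return 1;

        puts("range registered and write-protected");
        return 0;
}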
@@ -575,20 +575,26 @@ The field width is passed by value, the bitmap is passed by reference.
 Helper macros cpumask_pr_args() and nodemask_pr_args() are available to ease
 printing cpumask and nodemask.
 
-Flags bitfields such as page flags, gfp_flags
----------------------------------------------
+Flags bitfields such as page flags, page_type, gfp_flags
+--------------------------------------------------------
 
 ::
 
	%pGp	0x17ffffc0002036(referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff)
+	%pGt	0xffffff7f(buddy)
	%pGg	GFP_USER|GFP_DMA32|GFP_NOWARN
	%pGv	read|exec|mayread|maywrite|mayexec|denywrite
 
 For printing flags bitfields as a collection of symbolic constants that
 would construct the value. The type of flags is given by the third
-character. Currently supported are [p]age flags, [v]ma_flags (both
-expect ``unsigned long *``) and [g]fp_flags (expects ``gfp_t *``). The flag
-names and print order depends on the particular type.
+character. Currently supported are:
+
+        - p - [p]age flags, expects value of type (``unsigned long *``)
+        - t - page [t]ype, expects value of type (``unsigned int *``)
+        - v - [v]ma_flags, expects value of type (``unsigned long *``)
+        - g - [g]fp_flags, expects value of type (``gfp_t *``)
+
+The flag names and print order depends on the particular type.
 
 Note that this format should not be used directly in the
 :c:func:`TP_printk()` part of a tracepoint. Instead, use the show_*_flags()
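The printk-formats change above adds %pGt for the page type bitfield. A tiny kernel-side sketch of how the flag specifiers might be used from a debug path (illustrative only; the helper name, message text and the folio argument are assumptions, not code from this commit):

#include <linux/mm.h>
#include <linux/printk.h>

/* Dump a folio's flags, its page type and a gfp mask symbolically. */
static void dump_folio_bits(struct folio *folio, gfp_t gfp)
{
        pr_info("flags: %pGp\n", &folio->flags);
        pr_info("type:  %pGt\n", &folio->page.page_type);
        pr_info("gfp:   %pGg\n", &gfp);
}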
@@ -645,7 +645,7 @@ ops		mmap_lock	PageLocked(page)
 open:		yes
 close:		yes
 fault:		yes		can return with page locked
-map_pages:	yes
+map_pages:	read
 page_mkwrite:	yes		can return with page locked
 pfn_mkwrite:	yes
 access:		yes
@@ -661,7 +661,7 @@ locked. The VM will unlock the page.
 
 ->map_pages() is called when VM asks to map easy accessible pages.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
-till "end_pgoff". ->map_pages() is called with page table locked and must
+till "end_pgoff". ->map_pages() is called with the RCU lock held and must
 not block. If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use do_set_pte() to setup
 page table entry. Pointer to entry associated with the page is passed in
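The locking.rst change above records that ->map_pages() now runs under the RCU read lock and must not block. For most filesystems that constraint is satisfied by pointing the op at filemap_map_pages(), e.g. (a sketch of the common pattern; the struct name is hypothetical and this is not code from this commit):

#include <linux/fs.h>
#include <linux/mm.h>

/*
 * ->map_pages() is called with the RCU read lock held and must not sleep;
 * filemap_map_pages() only maps folios it can reach without blocking.
 */
static const struct vm_operations_struct example_file_vm_ops = {
        .fault          = filemap_fault,
        .map_pages      = filemap_map_pages,
        .page_mkwrite   = filemap_page_mkwrite,
};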
@@ -996,6 +996,7 @@ Example output. You may not have all of these fields.
     VmallocUsed:       40444 kB
     VmallocChunk:          0 kB
     Percpu:            29312 kB
+    EarlyMemtestBad:       0 kB
     HardwareCorrupted:     0 kB
     AnonHugePages:   4149248 kB
     ShmemHugePages:        0 kB
@@ -1146,6 +1147,13 @@ VmallocChunk
 Percpu
        Memory allocated to the percpu allocator used to back percpu
        allocations. This stat excludes the cost of metadata.
+EarlyMemtestBad
+       The amount of RAM/memory in kB, that was identified as corrupted
+       by early memtest. If memtest was not run, this field will not
+       be displayed at all. Size is never rounded down to 0 kB.
+       That means if 0 kB is reported, you can safely assume
+       there was at least one pass of memtest and none of the passes
+       found a single faulty byte of RAM.
 HardwareCorrupted
        The amount of RAM/memory in KB, the kernel identifies as
        corrupted.
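The proc.rst hunks above add the EarlyMemtestBad field. A small userspace sketch that looks for it and distinguishes "absent" (memtest not run) from "0 kB" (memtest ran, nothing bad found), per the semantics documented above:

#include <stdio.h>

int main(void)
{
        char line[256];
        unsigned long kb;
        int seen = 0;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return 1;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "EarlyMemtestBad: %lu kB", &kb) == 1) {
                        seen = 1;
                        printf("early memtest found %lu kB of bad RAM\n", kb);
                }
        }
        fclose(f);
        if (!seen)
                printf("EarlyMemtestBad not present: early memtest did not run\n");
        return 0;
}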
@@ -13,17 +13,29 @@ everything stored therein is lost.
 
 tmpfs puts everything into the kernel internal caches and grows and
 shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space. It has maximum size limits which can
-be adjusted on the fly via 'mount -o remount ...'
-
-If you compare it to ramfs (which was the template to create tmpfs)
-you gain swapping and limit checking. Another similar thing is the RAM
-disk (/dev/ram*), which simulates a fixed size hard disk in physical
-RAM, where you have to create an ordinary filesystem on top. Ramdisks
-cannot swap and you do not have the possibility to resize them.
-
-Since tmpfs lives completely in the page cache and on swap, all tmpfs
-pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
+unneeded pages out to swap space, if swap was enabled for the tmpfs
+mount. tmpfs also supports THP.
+
+tmpfs extends ramfs with a few userspace configurable options listed and
+explained further below, some of which can be reconfigured dynamically on the
+fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
+filesystem can be resized but it cannot be resized to a size below its current
+usage. tmpfs also supports POSIX ACLs, and extended attributes for the
+trusted.* and security.* namespaces. ramfs does not use swap and you cannot
+modify any parameter for a ramfs filesystem. The size limit of a ramfs
+filesystem is how much memory you have available, and so care must be taken if
+used so to not run out of memory.
+
+An alternative to tmpfs and ramfs is to use brd to create RAM disks
+(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
+To write data you would just then need to create an regular filesystem on top
+this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
+configured in size at initialization and you cannot dynamically resize them.
+Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
+block layer at all.
+
+Since tmpfs lives completely in the page cache and optionally on swap,
+all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
 free(1). Notice that these counters also include shared memory
 (shmem, see ipcs(1)). The most reliable way to get the count is
 using df(1) and du(1).
@@ -72,6 +84,8 @@ nr_inodes  The maximum number of inodes for this instance. The default
            is half of the number of your physical RAM pages, or (on a
            machine with highmem) the number of lowmem RAM pages,
            whichever is the lower.
+noswap     Disables swap. Remounts must respect the original settings.
+           By default swap is enabled.
 =========  ============================================================
 
 These parameters accept a suffix k, m or g for kilo, mega and giga and
@@ -85,6 +99,36 @@ mount with such options, since it allows any user with write access to
 use up all the memory on the machine; but enhances the scalability of
 that instance in a system with many CPUs making intensive use of it.
 
+tmpfs also supports Transparent Huge Pages which requires a kernel
+configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
+your system (has_transparent_hugepage(), which is architecture specific).
+The mount options for this are:
+
+======  ============================================================
+huge=0  never: disables huge pages for the mount
+huge=1  always: enables huge pages for the mount
+huge=2  within_size: only allocate huge pages if the page will be
+        fully within i_size, also respect fadvise()/madvise() hints.
+huge=3  advise: only allocate huge pages if requested with
+        fadvise()/madvise()
+======  ============================================================
+
+There is a sysfs file which you can also use to control system wide THP
+configuration for all tmpfs mounts, the file is:
+
+/sys/kernel/mm/transparent_hugepage/shmem_enabled
+
+This sysfs file is placed on top of THP sysfs directory and so is registered
+by THP code. It is however only used to control all tmpfs mounts with one
+single knob. Since it controls all tmpfs mounts it should only be used either
+for emergency or testing purposes. The values you can set for shmem_enabled are:
+
+==  ============================================================
+-1  deny: disables huge on shm_mnt and all mounts, for
+    emergency use
+-2  force: enables huge on shm_mnt and all mounts, w/o needing
+    option, for testing
+==  ============================================================
+
 tmpfs has a mount option to set the NUMA memory allocation policy for
 all files in that instance (if CONFIG_NUMA is enabled) - which can be
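The tmpfs.rst hunks above document the new noswap option and the huge= mount options. A hedged C sketch of mounting such an instance with mount(2); the target directory and the size are hypothetical, and the string form "huge=within_size" is assumed to be accepted alongside the numeric form shown in the table above:

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* size=/huge=/noswap are parsed by tmpfs itself, not by mount(2). */
        const char *opts = "size=256m,huge=within_size,noswap";

        if (mount("tmpfs", "/mnt/scratch", "tmpfs", 0, opts)) {
                perror("mount");
                return 1;
        }
        printf("tmpfs mounted with: %s\n", opts);
        return 0;
}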
@@ -2,6 +2,12 @@
 Active MM
 =========
 
+Note, the mm_count refcount may no longer include the "lazy" users
+(running tasks with ->active_mm == mm && ->mm == NULL) on kernels
+with CONFIG_MMU_LAZY_TLB_REFCOUNT=n. Taking and releasing these lazy
+references must be done with mmgrab_lazy_tlb() and mmdrop_lazy_tlb()
+helpers, which abstract this config option.
+
 ::
 
  List:       linux-kernel
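The active_mm.rst note above names the new lazy-TLB reference helpers. A minimal kernel-style sketch of the substitution it describes (the wrapper names here are invented; the real conversion for one caller appears in the ecard.c hunk further down):

#include <linux/sched/mm.h>

/* Take and later release a lazy-TLB reference on an mm (sketch only). */
static void begin_lazy_use(struct mm_struct *mm)
{
        mmgrab_lazy_tlb(mm);            /* instead of mmgrab() */
}

static void end_lazy_use(struct mm_struct *mm)
{
        mmdrop_lazy_tlb(mm);            /* instead of mmdrop() */
}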
@@ -214,7 +214,7 @@ HugeTLB Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pte_huge                  | Tests a HugeTLB                                  |
 +---------------------------+--------------------------------------------------+
-| pte_mkhuge                | Creates a HugeTLB                                |
+| arch_make_huge_pte        | Creates a HugeTLB                                |
 +---------------------------+--------------------------------------------------+
 | huge_pte_dirty            | Tests a dirty HugeTLB                            |
 +---------------------------+--------------------------------------------------+
@@ -103,7 +103,8 @@ moving across tiers only involves atomic operations on
 ``folio->flags`` and therefore has a negligible cost. A feedback loop
 modeled after the PID controller monitors refaults over all the tiers
 from anon and file types and decides which tiers from which types to
-evict or protect.
+evict or protect. The desired effect is to balance refault percentages
+between anon and file types proportional to the swappiness level.
 
 There are two conceptually independent procedures: the aging and the
 eviction. They form a closed-loop system, i.e., the page reclaim.
@@ -156,6 +157,27 @@ This time-based approach has the following advantages:
    and memory sizes.
 2. It is more reliable because it is directly wired to the OOM killer.
 
+``mm_struct`` list
+------------------
+An ``mm_struct`` list is maintained for each memcg, and an
+``mm_struct`` follows its owner task to the new memcg when this task
+is migrated.
+
+A page table walker iterates ``lruvec_memcg()->mm_list`` and calls
+``walk_page_range()`` with each ``mm_struct`` on this list to scan
+PTEs. When multiple page table walkers iterate the same list, each of
+them gets a unique ``mm_struct``, and therefore they can run in
+parallel.
+
+Page table walkers ignore any misplaced pages, e.g., if an
+``mm_struct`` was migrated, pages left in the previous memcg will be
+ignored when the current memcg is under reclaim. Similarly, page table
+walkers will ignore pages from nodes other than the one under reclaim.
+
+This infrastructure also tracks the usage of ``mm_struct`` between
+context switches so that page table walkers can skip processes that
+have been sleeping since the last iteration.
+
 Rmap/PT walk feedback
 ---------------------
 Searching the rmap for PTEs mapping each page on an LRU list (to test
@@ -170,7 +192,7 @@ promotes hot pages. If the scan was done cacheline efficiently, it
 adds the PMD entry pointing to the PTE table to the Bloom filter. This
 forms a feedback loop between the eviction and the aging.
 
-Bloom Filters
+Bloom filters
 -------------
 Bloom filters are a space and memory efficient data structure for set
 membership test, i.e., test if an element is not in the set or may be
@@ -186,6 +208,18 @@ is false positive, the cost is an additional scan of a range of PTEs,
 which may yield hot pages anyway. Parameters of the filter itself can
 control the false positive rate in the limit.
 
+PID controller
+--------------
+A feedback loop modeled after the Proportional-Integral-Derivative
+(PID) controller monitors refaults over anon and file types and
+decides which type to evict when both types are available from the
+same generation.
+
+The PID controller uses generations rather than the wall clock as the
+time domain because a CPU can scan pages at different rates under
+varying memory pressure. It calculates a moving average for each new
+generation to avoid being permanently locked in a suboptimal state.
+
 Memcg LRU
 ---------
 An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
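The new "PID controller" section above describes a refault-driven feedback loop. The fragment below is a purely conceptual illustration of a proportional decision based on smoothed per-type refault rates; it is not the kernel's implementation and every name in it is invented:

/* Conceptual only: choose which LRU type to evict by comparing moving
 * averages of refault rates, renewing the average each generation. */
enum lru_type { LRU_ANON, LRU_FILE };

struct refault_avg {
        double avg[2];          /* smoothed refaults per scanned page, per type */
};

static enum lru_type pick_eviction_type(struct refault_avg *ra,
                                        const unsigned long refaulted[2],
                                        const unsigned long scanned[2],
                                        double swappiness /* 0.0 .. 2.0 */)
{
        int i;

        for (i = 0; i < 2; i++) {
                double rate = scanned[i] ?
                              (double)refaulted[i] / scanned[i] : 0.0;

                /* New generation: blend the fresh sample into the average. */
                ra->avg[i] = 0.5 * ra->avg[i] + 0.5 * rate;
        }

        /* Higher swappiness biases the choice toward evicting anon. */
        return (ra->avg[LRU_ANON] * (2.0 - swappiness) <
                ra->avg[LRU_FILE] * swappiness) ? LRU_ANON : LRU_FILE;
}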
@@ -223,9 +257,9 @@ parts:
 
 * Generations
 * Rmap walks
-* Page table walks
-* Bloom filters
-* PID controller
+* Page table walks via ``mm_struct`` list
+* Bloom filters for rmap/PT walk feedback
+* PID controller for refault feedback
 
 The aging and the eviction form a producer-consumer model;
 specifically, the latter drives the former by the sliding window over
@@ -42,6 +42,8 @@ The unevictable list addresses the following classes of unevictable pages:
 
  * Those owned by ramfs.
 
+ * Those owned by tmpfs with the noswap mount option.
+
  * Those mapped into SHM_LOCK'd shared memory regions.
 
  * Those mapped into VM_LOCKED [mlock()ed] VMAs.
@@ -13457,13 +13457,14 @@ F:	arch/powerpc/include/asm/membarrier.h
 F:	include/uapi/linux/membarrier.h
 F:	kernel/sched/membarrier.c
 
-MEMBLOCK
+MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
 M:	Mike Rapoport <rppt@kernel.org>
 L:	linux-mm@kvack.org
 S:	Maintained
 F:	Documentation/core-api/boot-time-mm.rst
 F:	include/linux/memblock.h
 F:	mm/memblock.c
+F:	mm/mm_init.c
 F:	tools/testing/memblock/
 
 MEMORY CONTROLLER DRIVERS
@@ -13498,6 +13499,7 @@ F:	include/linux/memory_hotplug.h
 F:	include/linux/mm.h
 F:	include/linux/mmzone.h
 F:	include/linux/pagewalk.h
+F:	include/trace/events/ksm.h
 F:	mm/
 F:	tools/mm/
 F:	tools/testing/selftests/mm/
@@ -13506,6 +13508,7 @@ VMALLOC
 M:	Andrew Morton <akpm@linux-foundation.org>
 R:	Uladzislau Rezki <urezki@gmail.com>
 R:	Christoph Hellwig <hch@infradead.org>
+R:	Lorenzo Stoakes <lstoakes@gmail.com>
 L:	linux-mm@kvack.org
 S:	Maintained
 W:	http://www.linux-mm.org

arch/Kconfig | 32
@@ -465,6 +465,38 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
	  irqs disabled over activate_mm. Architectures that do IPI based TLB
	  shootdowns should enable this.
 
+# Use normal mm refcounting for MMU_LAZY_TLB kernel thread references.
+# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching
+# to/from kernel threads when the same mm is running on a lot of CPUs (a large
+# multi-threaded application), by reducing contention on the mm refcount.
+#
+# This can be disabled if the architecture ensures no CPUs are using an mm as a
+# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm
+# or its kernel page tables). This could be arranged by arch_exit_mmap(), or
+# final exit(2) TLB flush, for example.
+#
+# To implement this, an arch *must*:
+# Ensure the _lazy_tlb variants of mmgrab/mmdrop are used when manipulating
+# the lazy tlb reference of a kthread's ->active_mm (non-arch code has been
+# converted already).
+config MMU_LAZY_TLB_REFCOUNT
+	def_bool y
+	depends on !MMU_LAZY_TLB_SHOOTDOWN
+
+# This option allows MMU_LAZY_TLB_REFCOUNT=n. It ensures no CPUs are using an
+# mm as a lazy tlb beyond its last reference count, by shooting down these
+# users before the mm is deallocated. __mmdrop() first IPIs all CPUs that may
+# be using the mm as a lazy tlb, so that they may switch themselves to using
+# init_mm for their active mm. mm_cpumask(mm) is used to determine which CPUs
+# may be using mm as a lazy tlb mm.
+#
+# To implement this, an arch *must*:
+# - At the time of the final mmdrop of the mm, ensure mm_cpumask(mm) contains
+#   at least all possible CPUs in which the mm is lazy.
+# - It must meet the requirements for MMU_LAZY_TLB_REFCOUNT=n (see above).
+config MMU_LAZY_TLB_SHOOTDOWN
+	bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
	bool
@@ -556,7 +556,7 @@ endmenu	 # "ARC Architecture Configuration"
 
 config ARCH_FORCE_MAX_ORDER
	int "Maximum zone order"
-	default "12" if ARC_HUGEPAGE_16M
-	default "11"
+	default "11" if ARC_HUGEPAGE_16M
+	default "10"
 
 source "kernel/power/Kconfig"
@@ -74,11 +74,6 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
		    base, TO_MB(size), !in_use ? "Not used":"");
 }
 
-bool arch_has_descending_max_zone_pfns(void)
-{
-	return !IS_ENABLED(CONFIG_ARC_HAS_PAE40);
-}
-
 /*
  * First memory setup routine called from setup_arch()
  * 1. setup swapper's mm @init_mm
@@ -1352,20 +1352,19 @@ config ARM_MODULE_PLTS
	  configurations. If unsure, say y.
 
 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order"
-	default "12" if SOC_AM33XX
-	default "9" if SA1111
-	default "11"
+	int "Order of maximal physically contiguous allocations"
+	default "11" if SOC_AM33XX
+	default "8" if SA1111
+	default "10"
	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.
+
+	  Don't change if unsure.
 
 config ALIGNMENT_TRAP
	def_bool CPU_CP15_MMU
@@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
 CONFIG_SMP=y
 CONFIG_ARM_PSCI=y
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=14
+CONFIG_ARCH_FORCE_MAX_ORDER=13
 CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
 # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
 CONFIG_EFI=y
@@ -20,7 +20,7 @@ CONFIG_PXA_SHARPSL=y
 CONFIG_MACH_AKITA=y
 CONFIG_MACH_BORZOI=y
 CONFIG_AEABI=y
-CONFIG_ARCH_FORCE_MAX_ORDER=9
+CONFIG_ARCH_FORCE_MAX_ORDER=8
 CONFIG_CMDLINE="root=/dev/ram0 ro"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
@@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
 # CONFIG_CACHE_L2X0 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 # CONFIG_CPU_SW_DOMAIN_PAN is not set
-CONFIG_ARCH_FORCE_MAX_ORDER=15
+CONFIG_ARCH_FORCE_MAX_ORDER=14
 CONFIG_UACCESS_WITH_MEMCPY=y
 # CONFIG_ATAGS is not set
 CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
@@ -17,7 +17,7 @@ CONFIG_ARCH_SUNPLUS=y
 # CONFIG_VDSO is not set
 CONFIG_SMP=y
 CONFIG_THUMB2_KERNEL=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_VFP=y
 CONFIG_NEON=y
 CONFIG_MODULES=y
@@ -253,7 +253,7 @@ static int ecard_init_mm(void)
	current->mm = mm;
	current->active_mm = mm;
	activate_mm(active_mm, mm);
-	mmdrop(active_mm);
+	mmdrop_lazy_tlb(active_mm);
	ecard_init_pgtables(mm);
	return 0;
 }
@@ -95,6 +95,7 @@ config ARM64
	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
	select ARCH_SUPPORTS_NUMA_BALANCING
	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
+	select ARCH_SUPPORTS_PER_VMA_LOCK
	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
	select ARCH_WANT_DEFAULT_BPF_JIT
	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
@@ -1505,39 +1506,34 @@ config XEN
 
 # include/linux/mmzone.h requires the following to be true:
 #
-#   MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS
+#   MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
 #
-# so the maximum value of MAX_ORDER is SECTION_SIZE_BITS + 1 - PAGE_SHIFT:
+# so the maximum value of MAX_ORDER is SECTION_SIZE_BITS - PAGE_SHIFT:
 #
 #     | SECTION_SIZE_BITS |  PAGE_SHIFT  |  max MAX_ORDER  |  default MAX_ORDER |
 # ----+-------------------+--------------+-----------------+--------------------+
-# 4K  |       27          |      12      |       16        |         11         |
-# 16K |       27          |      14      |       14        |         12         |
-# 64K |       29          |      16      |       14        |         14         |
+# 4K  |       27          |      12      |       15        |         10         |
+# 16K |       27          |      14      |       13        |         11         |
+# 64K |       29          |      16      |       13        |         13         |
 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
-	default "14" if ARM64_64K_PAGES
-	range 12 14 if ARM64_16K_PAGES
-	default "12" if ARM64_16K_PAGES
-	range 11 16 if ARM64_4K_PAGES
-	default "11"
+	int "Order of maximal physically contiguous allocations" if EXPERT && (ARM64_4K_PAGES || ARM64_16K_PAGES)
+	default "13" if ARM64_64K_PAGES
+	default "11" if ARM64_16K_PAGES
+	default "10"
	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
-
-	  We make sure that we can allocate up to a HugePage size for each configuration.
-	  Hence we have :
-		MAX_ORDER = (PMD_SHIFT - PAGE_SHIFT) + 1 => PAGE_SHIFT - 2
-
-	  However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
-	  4M allocations matching the default size used by generic code.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.
+
+	  The maximal size of allocation cannot exceed the size of the
+	  section, so the value of MAX_ORDER should satisfy
+
+	    MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
+
+	  Don't change if unsure.
 
 config UNMAP_KERNEL_AT_EL0
	bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
@@ -261,9 +261,11 @@ static inline const void *__tag_set(const void *addr, u8 tag)
 }
 
 #ifdef CONFIG_KASAN_HW_TAGS
-#define arch_enable_tagging_sync()		mte_enable_kernel_sync()
-#define arch_enable_tagging_async()		mte_enable_kernel_async()
-#define arch_enable_tagging_asymm()		mte_enable_kernel_asymm()
+#define arch_enable_tag_checks_sync()		mte_enable_kernel_sync()
+#define arch_enable_tag_checks_async()		mte_enable_kernel_async()
+#define arch_enable_tag_checks_asymm()		mte_enable_kernel_asymm()
+#define arch_suppress_tag_checks_start()	mte_enable_tco()
+#define arch_suppress_tag_checks_stop()		mte_disable_tco()
 #define arch_force_async_tag_fault()		mte_check_tfsr_exit()
 #define arch_get_random_tag()			mte_get_random_tag()
 #define arch_get_mem_tag(addr)			mte_get_mem_tag(addr)
@@ -13,8 +13,73 @@
 
 #include <linux/types.h>
 
+#ifdef CONFIG_KASAN_HW_TAGS
+
+/* Whether the MTE asynchronous mode is enabled. */
+DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+	return static_branch_unlikely(&mte_async_or_asymm_mode);
+}
+
+#else /* CONFIG_KASAN_HW_TAGS */
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+	return false;
+}
+
+#endif /* CONFIG_KASAN_HW_TAGS */
+
 #ifdef CONFIG_ARM64_MTE
 
+/*
+ * The Tag Check Flag (TCF) mode for MTE is per EL, hence TCF0
+ * affects EL0 and TCF affects EL1 irrespective of which TTBR is
+ * used.
+ * The kernel accesses TTBR0 usually with LDTR/STTR instructions
+ * when UAO is available, so these would act as EL0 accesses using
+ * TCF0.
+ * However futex.h code uses exclusives which would be executed as
+ * EL1, this can potentially cause a tag check fault even if the
+ * user disables TCF0.
+ *
+ * To address the problem we set the PSTATE.TCO bit in uaccess_enable()
+ * and reset it in uaccess_disable().
+ *
+ * The Tag check override (TCO) bit disables temporarily the tag checking
+ * preventing the issue.
+ */
+static inline void mte_disable_tco(void)
+{
+	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(0),
+				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+static inline void mte_enable_tco(void)
+{
+	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(1),
+				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+/*
+ * These functions disable tag checking only if in MTE async mode
+ * since the sync mode generates exceptions synchronously and the
+ * nofault or load_unaligned_zeropad can handle them.
+ */
+static inline void __mte_disable_tco_async(void)
+{
+	if (system_uses_mte_async_or_asymm_mode())
+		mte_disable_tco();
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+	if (system_uses_mte_async_or_asymm_mode())
+		mte_enable_tco();
+}
+
 /*
  * These functions are meant to be only used from KASAN runtime through
  * the arch_*() interface defined in asm/memory.h.
@@ -138,6 +203,22 @@ void mte_enable_kernel_asymm(void);
 
 #else /* CONFIG_ARM64_MTE */
 
+static inline void mte_disable_tco(void)
+{
+}
+
+static inline void mte_enable_tco(void)
+{
+}
+
+static inline void __mte_disable_tco_async(void)
+{
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+}
+
 static inline u8 mte_get_ptr_tag(void *ptr)
 {
	return 0xFF;
@@ -178,14 +178,6 @@ static inline void mte_disable_tco_entry(struct task_struct *task)
 }
 
 #ifdef CONFIG_KASAN_HW_TAGS
-/* Whether the MTE asynchronous mode is enabled. */
-DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
-
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
-	return static_branch_unlikely(&mte_async_or_asymm_mode);
-}
-
 void mte_check_tfsr_el1(void);
 
 static inline void mte_check_tfsr_entry(void)
@@ -212,10 +204,6 @@ static inline void mte_check_tfsr_exit(void)
	mte_check_tfsr_el1();
 }
 #else
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
-	return false;
-}
 static inline void mte_check_tfsr_el1(void)
 {
 }
@@ -57,7 +57,7 @@ static inline bool arch_thp_swp_supported(void)
  * fault on one CPU which has been handled concurrently by another CPU
  * does not need to perform additional invalidation.
  */
-#define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
+#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
 
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
@@ -10,7 +10,7 @@
 /*
  * Section size must be at least 512MB for 64K base
  * page size config. Otherwise it will be less than
- * (MAX_ORDER - 1) and the build process will fail.
+ * MAX_ORDER and the build process will fail.
  */
 #ifdef CONFIG_ARM64_64K_PAGES
 #define SECTION_SIZE_BITS 29
@@ -136,55 +136,9 @@ static inline void __uaccess_enable_hw_pan(void)
				CONFIG_ARM64_PAN));
 }
 
-/*
- * The Tag Check Flag (TCF) mode for MTE is per EL, hence TCF0
- * affects EL0 and TCF affects EL1 irrespective of which TTBR is
- * used.
- * The kernel accesses TTBR0 usually with LDTR/STTR instructions
- * when UAO is available, so these would act as EL0 accesses using
- * TCF0.
- * However futex.h code uses exclusives which would be executed as
- * EL1, this can potentially cause a tag check fault even if the
- * user disables TCF0.
- *
- * To address the problem we set the PSTATE.TCO bit in uaccess_enable()
- * and reset it in uaccess_disable().
- *
- * The Tag check override (TCO) bit disables temporarily the tag checking
- * preventing the issue.
- */
-static inline void __uaccess_disable_tco(void)
-{
-	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(0),
-				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
-}
-
-static inline void __uaccess_enable_tco(void)
-{
-	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(1),
-				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
-}
-
-/*
- * These functions disable tag checking only if in MTE async mode
- * since the sync mode generates exceptions synchronously and the
- * nofault or load_unaligned_zeropad can handle them.
- */
-static inline void __uaccess_disable_tco_async(void)
-{
-	if (system_uses_mte_async_or_asymm_mode())
-		__uaccess_disable_tco();
-}
-
-static inline void __uaccess_enable_tco_async(void)
-{
-	if (system_uses_mte_async_or_asymm_mode())
-		__uaccess_enable_tco();
-}
-
 static inline void uaccess_disable_privileged(void)
 {
-	__uaccess_disable_tco();
+	mte_disable_tco();
 
	if (uaccess_ttbr0_disable())
		return;
@@ -194,7 +148,7 @@ static inline void uaccess_disable_privileged(void)
 
 static inline void uaccess_enable_privileged(void)
 {
-	__uaccess_enable_tco();
+	mte_enable_tco();
 
	if (uaccess_ttbr0_enable())
		return;
@@ -302,8 +256,8 @@ do { \
 #define get_user	__get_user
 
 /*
- * We must not call into the scheduler between __uaccess_enable_tco_async() and
- * __uaccess_disable_tco_async(). As `dst` and `src` may contain blocking
+ * We must not call into the scheduler between __mte_enable_tco_async() and
+ * __mte_disable_tco_async(). As `dst` and `src` may contain blocking
  * functions, we must evaluate these outside of the critical section.
  */
 #define __get_kernel_nofault(dst, src, type, err_label) \
@ -312,10 +266,10 @@ do { \
|
||||||
__typeof__(src) __gkn_src = (src); \
|
__typeof__(src) __gkn_src = (src); \
|
||||||
int __gkn_err = 0; \
|
int __gkn_err = 0; \
|
||||||
\
|
\
|
||||||
__uaccess_enable_tco_async(); \
|
__mte_enable_tco_async(); \
|
||||||
__raw_get_mem("ldr", *((type *)(__gkn_dst)), \
|
__raw_get_mem("ldr", *((type *)(__gkn_dst)), \
|
||||||
(__force type *)(__gkn_src), __gkn_err, K); \
|
(__force type *)(__gkn_src), __gkn_err, K); \
|
||||||
__uaccess_disable_tco_async(); \
|
__mte_disable_tco_async(); \
|
||||||
\
|
\
|
||||||
if (unlikely(__gkn_err)) \
|
if (unlikely(__gkn_err)) \
|
||||||
goto err_label; \
|
goto err_label; \
|
||||||
|
@ -388,8 +342,8 @@ do { \
|
||||||
#define put_user __put_user
|
#define put_user __put_user
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* We must not call into the scheduler between __uaccess_enable_tco_async() and
|
* We must not call into the scheduler between __mte_enable_tco_async() and
|
||||||
* __uaccess_disable_tco_async(). As `dst` and `src` may contain blocking
|
* __mte_disable_tco_async(). As `dst` and `src` may contain blocking
|
||||||
* functions, we must evaluate these outside of the critical section.
|
* functions, we must evaluate these outside of the critical section.
|
||||||
*/
|
*/
|
||||||
#define __put_kernel_nofault(dst, src, type, err_label) \
|
#define __put_kernel_nofault(dst, src, type, err_label) \
|
||||||
|
@ -398,10 +352,10 @@ do { \
|
||||||
__typeof__(src) __pkn_src = (src); \
|
__typeof__(src) __pkn_src = (src); \
|
||||||
int __pkn_err = 0; \
|
int __pkn_err = 0; \
|
||||||
\
|
\
|
||||||
__uaccess_enable_tco_async(); \
|
__mte_enable_tco_async(); \
|
||||||
__raw_put_mem("str", *((type *)(__pkn_src)), \
|
__raw_put_mem("str", *((type *)(__pkn_src)), \
|
||||||
(__force type *)(__pkn_dst), __pkn_err, K); \
|
(__force type *)(__pkn_dst), __pkn_err, K); \
|
||||||
__uaccess_disable_tco_async(); \
|
__mte_disable_tco_async(); \
|
||||||
\
|
\
|
||||||
if (unlikely(__pkn_err)) \
|
if (unlikely(__pkn_err)) \
|
||||||
goto err_label; \
|
goto err_label; \
|
||||||
|
|
|
@@ -55,7 +55,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 {
 	unsigned long ret;

-	__uaccess_enable_tco_async();
+	__mte_enable_tco_async();

 	/* Load word from unaligned pointer addr */
 	asm(
@@ -65,7 +65,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 	: "=&r" (ret)
 	: "r" (addr), "Q" (*(unsigned long *)addr));

-	__uaccess_disable_tco_async();
+	__mte_disable_tco_async();

 	return ret;
 }

@@ -16,7 +16,7 @@ struct hyp_pool {
 	 * API at EL2.
 	 */
 	hyp_spinlock_t lock;
-	struct list_head free_area[MAX_ORDER];
+	struct list_head free_area[MAX_ORDER + 1];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
 	unsigned short max_order;

@@ -110,7 +110,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 	 * after coalescing, so make sure to mark it HYP_NO_ORDER proactively.
 	 */
 	p->order = HYP_NO_ORDER;
-	for (; (order + 1) < pool->max_order; order++) {
+	for (; (order + 1) <= pool->max_order; order++) {
 		buddy = __find_buddy_avail(pool, p, order);
 		if (!buddy)
 			break;
@@ -203,9 +203,9 @@ void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
 	hyp_spin_lock(&pool->lock);

 	/* Look for a high-enough-order page */
-	while (i < pool->max_order && list_empty(&pool->free_area[i]))
+	while (i <= pool->max_order && list_empty(&pool->free_area[i]))
 		i++;
-	if (i >= pool->max_order) {
+	if (i > pool->max_order) {
 		hyp_spin_unlock(&pool->lock);
 		return NULL;
 	}
@@ -228,8 +228,8 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 	int i;

 	hyp_spin_lock_init(&pool->lock);
-	pool->max_order = min(MAX_ORDER, get_order((nr_pages + 1) << PAGE_SHIFT));
+	pool->max_order = min(MAX_ORDER, get_order(nr_pages << PAGE_SHIFT));
-	for (i = 0; i < pool->max_order; i++)
+	for (i = 0; i <= pool->max_order; i++)
 		INIT_LIST_HEAD(&pool->free_area[i]);
 	pool->range_start = phys;
 	pool->range_end = phys + (nr_pages << PAGE_SHIFT);

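The hunks above track the MAX_ORDER meaning change from exclusive to inclusive: the free-list array gains one slot and the scan loops switch from < to <=. The following is a standalone, hedged illustration of that pattern only; it is not kernel code, and the names and values (EXAMPLE_MAX_ORDER, struct free_area) are invented for the example.

#include <stdio.h>

#define EXAMPLE_MAX_ORDER 10   /* assumption: largest order is 10, i.e. 1024 pages */

struct free_area {
	int nr_free;
};

int main(void)
{
	/* With an inclusive maximum order, orders 0..EXAMPLE_MAX_ORDER all need a slot. */
	struct free_area area[EXAMPLE_MAX_ORDER + 1] = { { 0 } };
	int order;

	/* Pretend only the largest order currently has a free block. */
	area[EXAMPLE_MAX_ORDER].nr_free = 1;

	/* Scan up to and including the maximum order, mirroring "i <= max_order". */
	for (order = 0; order <= EXAMPLE_MAX_ORDER; order++) {
		if (area[order].nr_free) {
			printf("first free block at order %d (%d pages)\n",
			       order, 1 << order);
			break;
		}
	}
	return 0;
}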
@@ -535,6 +535,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	unsigned long vm_flags;
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 	unsigned long addr = untagged_addr(far);
+#ifdef CONFIG_PER_VMA_LOCK
+	struct vm_area_struct *vma;
+#endif

 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -585,6 +588,36 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,

 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);

+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(mm_flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, addr);
+	if (!vma)
+		goto lock_mmap;
+
+	if (!(vma->vm_flags & vm_flags)) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, addr & PAGE_MASK,
+				mm_flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
+		return 0;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
@@ -628,6 +661,9 @@ retry:
 	}
 	mmap_read_unlock(mm);

+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	/*
 	 * Handle the "normal" (no error) case first.
 	 */

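The arm64 hunk above (and the matching powerpc, s390 and x86 hunks later in this diff) all add the same per-VMA-lock fast path: try the fault under a per-VMA read lock and only fall back to mmap_lock when that fails or retries. Below is a standalone control-flow sketch of that idea; every function in it is a stand-in written for this example, not the kernel's API.

#include <stdbool.h>
#include <stdio.h>

static bool try_lock_vma_lockless(void)       { return true;  }  /* assumption: per-VMA lock taken */
static bool fault_needs_retry(void)           { return false; }  /* assumption: fault completes */
static void handle_fault_under_vma_lock(void) { puts("handled under per-VMA lock"); }
static void handle_fault_under_mmap_lock(void){ puts("handled under mmap_lock"); }

int main(void)
{
	/* Fast path: take only the per-VMA read lock. */
	if (try_lock_vma_lockless()) {
		handle_fault_under_vma_lock();
		if (!fault_needs_retry())
			return 0;   /* done without ever touching the heavier lock */
	}
	/* Slow path: fall back to the traditional lock-protected walk. */
	handle_fault_under_mmap_lock();
	return 0;
}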
@@ -332,10 +332,6 @@ config HIGHMEM
 	select KMAP_LOCAL
 	default y

-config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order"
-	default "11"
-
 config DRAM_BASE
 	hex "DRAM start addr (the same with memory-section in dts)"
 	default 0x0

@@ -203,10 +203,9 @@ config IA64_CYCLONE
 	  If you're unsure, answer N.

 config ARCH_FORCE_MAX_ORDER
-	int "MAX_ORDER (11 - 17)" if !HUGETLB_PAGE
-	range 11 17 if !HUGETLB_PAGE
-	default "17" if HUGETLB_PAGE
-	default "11"
+	int
+	default "16" if HUGETLB_PAGE
+	default "10"

 config SMP
 	bool "Symmetric multi-processing support"

@@ -12,9 +12,9 @@
 #define SECTION_SIZE_BITS	(30)
 #define MAX_PHYSMEM_BITS	(50)
 #ifdef CONFIG_ARCH_FORCE_MAX_ORDER
-#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
+#if (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT > SECTION_SIZE_BITS)
 #undef SECTION_SIZE_BITS
-#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
+#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT)
 #endif
 #endif

@@ -170,7 +170,7 @@ static int __init hugetlb_setup_sz(char *str)
 	size = memparse(str, &str);
 	if (*str || !is_power_of_2(size) || !(tr_pages & size) ||
 			size <= PAGE_SIZE ||
-			size >= (1UL << PAGE_SHIFT << MAX_ORDER)) {
+			size > (1UL << PAGE_SHIFT << MAX_ORDER)) {
 		printk(KERN_WARNING "Invalid huge page size specified\n");
 		return 1;
 	}

@@ -53,8 +53,8 @@ config LOONGARCH
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
-	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANT_OPTIMIZE_VMEMMAP
 	select ARCH_WANTS_NO_INSTR
 	select BUILDTIME_TABLE_SORT
 	select COMMON_CLK
@@ -421,12 +421,9 @@ config NODES_SHIFT

 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 14 64 if PAGE_SIZE_64KB
-	default "14" if PAGE_SIZE_64KB
-	range 12 64 if PAGE_SIZE_16KB
-	default "12" if PAGE_SIZE_16KB
-	range 11 64
-	default "11"
+	default "13" if PAGE_SIZE_64KB
+	default "11" if PAGE_SIZE_16KB
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
@@ -435,9 +432,6 @@ config ARCH_FORCE_MAX_ORDER
 	  blocks of physically contiguous memory, then you may need to
 	  increase this value.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
-
 	  The page size is not necessarily 4KB. Keep this in mind
 	  when choosing a value for this option.

@@ -397,23 +397,22 @@ config SINGLE_MEMORY_CHUNK
 	  Say N if not sure.

 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order" if ADVANCED
+	int "Order of maximal physically contiguous allocations" if ADVANCED
 	depends on !SINGLE_MEMORY_CHUNK
-	default "11"
+	default "10"
 	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.

 	  For systems that have holes in their physical address space this
 	  value also defines the minimal size of the hole that allows
 	  freeing unused memory map.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  Don't change if unsure.

 config 060_WRITETHROUGH
 	bool "Use write-through caching for 68060 supervisor accesses"

@@ -46,7 +46,7 @@
 #define _CACHEMASK040	(~0x060)
 #define _PAGE_GLOBAL040	0x400   /* 68040 global bit, used for kva descs */

-/* We borrow bit 24 to store the exclusive marker in swap PTEs. */
+/* We borrow bit 7 to store the exclusive marker in swap PTEs. */
 #define _PAGE_SWP_EXCLUSIVE	CF_PAGE_NOCACHE

 /*

@@ -2099,14 +2099,10 @@ endchoice

 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
-	default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
-	range 13 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
-	default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
-	range 12 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
-	default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
-	range 0 64
-	default "11"
+	default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
+	default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
+	default "11" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
@@ -2115,9 +2111,6 @@ config ARCH_FORCE_MAX_ORDER
 	  blocks of physically contiguous memory, then you may need to
 	  increase this value.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
-
 	  The page size is not necessarily 4KB. Keep this in mind
 	  when choosing a value for this option.

@@ -70,7 +70,7 @@ enum fixed_addresses {
 #include <asm-generic/fixmap.h>

 /*
- * Called from pgtable_init()
+ * Called from pagetable_init()
 */
 extern void fixrange_init(unsigned long start, unsigned long end,
 	pgd_t *pgd_base);

@@ -469,7 +469,8 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
 }

 static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
-						unsigned long address)
+						unsigned long address,
+						pte_t *ptep)
 {
 }

@@ -45,19 +45,17 @@ menu "Kernel features"
 source "kernel/Kconfig.hz"

 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order"
-	range 9 20
-	default "11"
+	int "Order of maximal physically contiguous allocations"
+	default "10"
 	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  Don't change if unsure.

 endmenu

@@ -267,6 +267,7 @@ config PPC
 	select MMU_GATHER_PAGE_SIZE
 	select MMU_GATHER_RCU_TABLE_FREE
 	select MMU_GATHER_MERGE_VMAS
+	select MMU_LAZY_TLB_SHOOTDOWN		if PPC_BOOK3S_64
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE		if PPC64 || NOT_COHERENT_CACHE
 	select NEED_PER_CPU_EMBED_FIRST_CHUNK	if PPC64
@@ -896,34 +897,27 @@ config DATA_SHIFT
 	  8M pages will be pinned.

 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order"
-	range 8 9 if PPC64 && PPC_64K_PAGES
-	default "9" if PPC64 && PPC_64K_PAGES
-	range 13 13 if PPC64 && !PPC_64K_PAGES
-	default "13" if PPC64 && !PPC_64K_PAGES
-	range 9 64 if PPC32 && PPC_16K_PAGES
-	default "9" if PPC32 && PPC_16K_PAGES
-	range 7 64 if PPC32 && PPC_64K_PAGES
-	default "7" if PPC32 && PPC_64K_PAGES
-	range 5 64 if PPC32 && PPC_256K_PAGES
-	default "5" if PPC32 && PPC_256K_PAGES
-	range 11 64
-	default "11"
+	int "Order of maximal physically contiguous allocations"
+	default "8" if PPC64 && PPC_64K_PAGES
+	default "12" if PPC64 && !PPC_64K_PAGES
+	default "8" if PPC32 && PPC_16K_PAGES
+	default "6" if PPC32 && PPC_64K_PAGES
+	default "4" if PPC32 && PPC_256K_PAGES
+	default "10"
 	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
-
 	  The page size is not necessarily 4KB. For example, on 64-bit
 	  systems, 64KB pages can be enabled via CONFIG_PPC_64K_PAGES. Keep
 	  this in mind when choosing a value for this option.

+	  Don't change if unsure.
+
 config PPC_SUBPAGE_PROT
 	bool "Support setting protections for 4k subpages (subpage_prot syscall)"
 	default n

@@ -30,7 +30,7 @@ CONFIG_PREEMPT=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
-CONFIG_ARCH_FORCE_MAX_ORDER=17
+CONFIG_ARCH_FORCE_MAX_ORDER=16
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCI_MSI=y

@@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y
 CONFIG_FONT_8x16=y
 CONFIG_FONT_8x8=y
 CONFIG_FONTS=y
-CONFIG_ARCH_FORCE_MAX_ORDER=13
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_FRAMEBUFFER_CONSOLE=y
 CONFIG_FRAME_WARN=1024
 CONFIG_FTL=y

@@ -121,7 +121,8 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,

 #define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
 static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
-						unsigned long address)
+						unsigned long address,
+						pte_t *ptep)
 {
 	/*
 	 * Book3S 64 does not require spurious fault flushes because the PTE

@@ -1611,7 +1611,7 @@ void start_secondary(void *unused)
 	if (IS_ENABLED(CONFIG_PPC32))
 		setup_kup();

-	mmgrab(&init_mm);
+	mmgrab_lazy_tlb(&init_mm);
 	current->active_mm = &init_mm;

 	smp_store_cpu_info(cpu);

@@ -97,7 +97,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 	}

 	mmap_read_lock(mm);
-	chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) /
+	chunk = (1UL << (PAGE_SHIFT + MAX_ORDER)) /
 			sizeof(struct vm_area_struct *);
 	chunk = min(chunk, entries);
 	for (entry = 0; entry < entries; entry += chunk) {

@@ -797,10 +797,10 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
 	if (current->active_mm == mm) {
 		WARN_ON_ONCE(current->mm != NULL);
 		/* Is a kernel thread and is using mm as the lazy tlb */
-		mmgrab(&init_mm);
+		mmgrab_lazy_tlb(&init_mm);
 		current->active_mm = &init_mm;
 		switch_mm_irqs_off(mm, &init_mm, current);
-		mmdrop(mm);
+		mmdrop_lazy_tlb(mm);
 	}

 	/*

@@ -474,6 +474,40 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (is_exec)
 		flags |= FAULT_FLAG_INSTRUCTION;

+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+
+	if (unlikely(access_pkey_error(is_write, is_exec,
+				       (error_code & DSISR_KEYFAULT), vma))) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+
+	if (unlikely(access_error(is_write, is_exec, vma))) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	if (fault_signal_pending(fault, regs))
+		return user_mode(regs) ? 0 : SIGBUS;
+
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space. All other faults represent errors in the
 	 * kernel and should generate an OOPS. Unfortunately, in the case of an
@@ -550,6 +584,9 @@ retry:

 	mmap_read_unlock(current->mm);

+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	if (unlikely(fault & VM_FAULT_ERROR))
 		return mm_fault_error(regs, address, fault);

@@ -615,7 +615,7 @@ void __init gigantic_hugetlb_cma_reserve(void)
 		order = mmu_psize_to_shift(MMU_PAGE_16G) - PAGE_SHIFT;

 	if (order) {
-		VM_WARN_ON(order < MAX_ORDER);
+		VM_WARN_ON(order <= MAX_ORDER);
 		hugetlb_cma_reserve(order);
 	}
 }

@@ -16,6 +16,7 @@ config PPC_POWERNV
 	select PPC_DOORBELL
 	select MMU_NOTIFIER
 	select FORCE_SMP
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	default y

 config OPAL_PRD

@@ -1740,7 +1740,7 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 * DMA window can be larger than available memory, which will
 	 * cause errors later.
 	 */
-	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
+	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER);

 	/*
 	 * We create the default window as big as we can. The constraint is

@@ -22,6 +22,7 @@ config PPC_PSERIES
 	select HOTPLUG_CPU
 	select FORCE_SMP
 	select SWIOTLB
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	default y

 config PARAVIRT

@@ -120,13 +120,14 @@ config S390
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_NUMA_BALANCING
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_WANTS_DYNAMIC_TASK_STRUCT
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANT_DEFAULT_BPF_JIT
-	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_WANT_OPTIMIZE_VMEMMAP
 	select BUILDTIME_TABLE_SORT
 	select CLONE_BACKWARDS2
 	select DMA_OPS if PCI

@@ -1239,7 +1239,8 @@ static inline int pte_allow_rdp(pte_t old, pte_t new)
 }

 static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
-						unsigned long address)
+						unsigned long address,
+						pte_t *ptep)
 {
 	/*
 	 * RDP might not have propagated the PTE protection reset to all CPUs,
@@ -1247,11 +1248,12 @@ static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
 	 * NOTE: This will also be called when a racing pagetable update on
 	 * another thread already installed the correct PTE. Both cases cannot
 	 * really be distinguished.
-	 * Therefore, only do the local TLB flush when RDP can be used, to avoid
-	 * unnecessary overhead.
+	 * Therefore, only do the local TLB flush when RDP can be used, and the
+	 * PTE does not have _PAGE_PROTECT set, to avoid unnecessary overhead.
+	 * A local RDP can be used to do the flush.
 	 */
-	if (MACHINE_HAS_RDP)
-		asm volatile("ptlb" : : : "memory");
+	if (MACHINE_HAS_RDP && !(pte_val(*ptep) & _PAGE_PROTECT))
+		__ptep_rdp(address, ptep, 0, 0, 1);
 }
 #define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault

@@ -407,6 +407,30 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 		access = VM_WRITE;
 	if (access == VM_WRITE)
 		flags |= FAULT_FLAG_WRITE;
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+	if (!(vma->vm_flags & access)) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto out;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		fault = VM_FAULT_SIGNAL;
+		goto out;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
 	mmap_read_lock(mm);

 	gmap = NULL;

@@ -2591,6 +2591,13 @@ int gmap_mark_unmergeable(void)
 	int ret;
 	VMA_ITERATOR(vmi, mm, 0);

+	/*
+	 * Make sure to disable KSM (if enabled for the whole process or
+	 * individual VMAs). Note that nothing currently hinders user space
+	 * from re-enabling it.
+	 */
+	clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
+
 	for_each_vma(vmi, vma) {
 		/* Copy vm_flags to avoid partial modifications in ksm_madvise */
 		vm_flags = vma->vm_flags;

@@ -273,7 +273,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,

 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
-	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.low_limit = PAGE_SIZE;
 	info.high_limit = current->mm->mmap_base;
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;

@@ -136,7 +136,7 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long ad

 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
-	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.low_limit = PAGE_SIZE;
 	info.high_limit = mm->mmap_base;
 	if (filp || (flags & MAP_SHARED))
 		info.align_mask = MMAP_ALIGN_MASK << PAGE_SHIFT;

@@ -8,7 +8,7 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_CPU_SUBTYPE_SH7724=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_MEMORY_SIZE=0x10000000
 CONFIG_FLATMEM_MANUAL=y
 CONFIG_SH_ECOVEC=y

@@ -19,28 +19,24 @@ config PAGE_OFFSET
 	default "0x00000000"

 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order"
-	range 9 64 if PAGE_SIZE_16KB
-	default "9" if PAGE_SIZE_16KB
-	range 7 64 if PAGE_SIZE_64KB
-	default "7" if PAGE_SIZE_64KB
-	range 11 64
-	default "14" if !MMU
-	default "11"
+	int "Order of maximal physically contiguous allocations"
+	default "8" if PAGE_SIZE_16KB
+	default "6" if PAGE_SIZE_64KB
+	default "13" if !MMU
+	default "10"
 	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
-
 	  The page size is not necessarily 4KB. Keep this in mind when
 	  choosing a value for this option.

+	  Don't change if unsure.
+
 config MEMORY_START
 	hex "Physical memory start address"
 	default "0x08000000"

@@ -271,18 +271,17 @@ config ARCH_SPARSEMEM_DEFAULT
 	def_bool y if SPARC64

 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order"
-	default "13"
+	int "Order of maximal physically contiguous allocations"
+	default "12"
 	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 13 means that the largest free memory block is 2^12 pages.
+	  Don't change if unsure.

 if SPARC64 || COMPILE_TEST
 source "kernel/power/Kconfig"

@@ -357,6 +357,42 @@ static inline pgprot_t pgprot_noncached(pgprot_t prot)
  */
 #define pgprot_noncached pgprot_noncached

+static inline unsigned long pte_dirty(pte_t pte)
+{
+	unsigned long mask;
+
+	__asm__ __volatile__(
+	"\n661:	mov		%1, %0\n"
+	"	nop\n"
+	"	.section	.sun4v_2insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
+	"	.previous\n"
+	: "=r" (mask)
+	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
+
+	return (pte_val(pte) & mask);
+}
+
+static inline unsigned long pte_write(pte_t pte)
+{
+	unsigned long mask;
+
+	__asm__ __volatile__(
+	"\n661:	mov		%1, %0\n"
+	"	nop\n"
+	"	.section	.sun4v_2insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
+	"	.previous\n"
+	: "=r" (mask)
+	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
+
+	return (pte_val(pte) & mask);
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
@@ -418,28 +454,43 @@ static inline bool is_hugetlb_pte(pte_t pte)
 }
 #endif

+static inline pte_t __pte_mkhwwrite(pte_t pte)
+{
+	unsigned long val = pte_val(pte);
+
+	/*
+	 * Note: we only want to set the HW writable bit if the SW writable bit
+	 * and the SW dirty bit are set.
+	 */
+	__asm__ __volatile__(
+	"\n661:	or		%0, %2, %0\n"
+	"	.section	.sun4v_1insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	or		%0, %3, %0\n"
+	"	.previous\n"
+	: "=r" (val)
+	: "0" (val), "i" (_PAGE_W_4U), "i" (_PAGE_W_4V));
+
+	return __pte(val);
+}
+
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	unsigned long val = pte_val(pte), tmp;
+	unsigned long val = pte_val(pte), mask;

 	__asm__ __volatile__(
-	"\n661:	or		%0, %3, %0\n"
-	"	nop\n"
-	"\n662:	nop\n"
+	"\n661:	mov		%1, %0\n"
 	"	nop\n"
 	"	.section	.sun4v_2insn_patch, \"ax\"\n"
 	"	.word		661b\n"
-	"	sethi		%%uhi(%4), %1\n"
-	"	sllx		%1, 32, %1\n"
-	"	.word		662b\n"
-	"	or		%1, %%lo(%4), %1\n"
-	"	or		%0, %1, %0\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
 	"	.previous\n"
-	: "=r" (val), "=r" (tmp)
-	: "0" (val), "i" (_PAGE_MODIFIED_4U | _PAGE_W_4U),
-	  "i" (_PAGE_MODIFIED_4V | _PAGE_W_4V));
+	: "=r" (mask)
+	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));

-	return __pte(val);
+	pte = __pte(val | mask);
+	return pte_write(pte) ? __pte_mkhwwrite(pte) : pte;
 }

 static inline pte_t pte_mkclean(pte_t pte)
@@ -481,7 +532,8 @@ static inline pte_t pte_mkwrite(pte_t pte)
 	: "=r" (mask)
 	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));

-	return __pte(val | mask);
+	pte = __pte(val | mask);
+	return pte_dirty(pte) ? __pte_mkhwwrite(pte) : pte;
 }

 static inline pte_t pte_wrprotect(pte_t pte)
@@ -584,42 +636,6 @@ static inline unsigned long pte_young(pte_t pte)
 	return (pte_val(pte) & mask);
 }

-static inline unsigned long pte_dirty(pte_t pte)
-{
-	unsigned long mask;
-
-	__asm__ __volatile__(
-	"\n661:	mov		%1, %0\n"
-	"	nop\n"
-	"	.section	.sun4v_2insn_patch, \"ax\"\n"
-	"	.word		661b\n"
-	"	sethi		%%uhi(%2), %0\n"
-	"	sllx		%0, 32, %0\n"
-	"	.previous\n"
-	: "=r" (mask)
-	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
-
-	return (pte_val(pte) & mask);
-}
-
-static inline unsigned long pte_write(pte_t pte)
-{
-	unsigned long mask;
-
-	__asm__ __volatile__(
-	"\n661:	mov		%1, %0\n"
-	"	nop\n"
-	"	.section	.sun4v_2insn_patch, \"ax\"\n"
-	"	.word		661b\n"
-	"	sethi		%%uhi(%2), %0\n"
-	"	sllx		%0, 32, %0\n"
-	"	.previous\n"
-	: "=r" (mask)
-	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
-
-	return (pte_val(pte) & mask);
-}
-
 static inline unsigned long pte_exec(pte_t pte)
 {
 	unsigned long mask;

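The sparc64 hunk above defers the hardware-writable bit until both the software dirty and software write bits are present, setting it from pte_mkdirty() and pte_mkwrite() via __pte_mkhwwrite(). Below is a standalone bit-flag illustration of that rule only; the flag values and helper names are invented for the example and do not correspond to real sparc64 PTE layouts.

#include <stdio.h>

#define SW_DIRTY 0x1UL
#define SW_WRITE 0x2UL
#define HW_WRITE 0x4UL

static unsigned long set_hw_write_if_allowed(unsigned long pte)
{
	/* Only grant the hardware-writable bit once both software bits are set. */
	if ((pte & SW_DIRTY) && (pte & SW_WRITE))
		pte |= HW_WRITE;
	return pte;
}

static unsigned long example_mkdirty(unsigned long pte)
{
	return set_hw_write_if_allowed(pte | SW_DIRTY);
}

static unsigned long example_mkwrite(unsigned long pte)
{
	return set_hw_write_if_allowed(pte | SW_WRITE);
}

int main(void)
{
	unsigned long pte = 0;

	pte = example_mkwrite(pte);   /* write-enabled but still clean: no HW_WRITE yet */
	printf("after mkwrite: %#lx\n", pte);
	pte = example_mkdirty(pte);   /* both bits now set: HW_WRITE appears */
	printf("after mkdirty: %#lx\n", pte);
	return 0;
}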
@@ -193,7 +193,7 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size,

 	size = IO_PAGE_ALIGN(size);
 	order = get_order(size);
-	if (unlikely(order >= MAX_ORDER))
+	if (unlikely(order > MAX_ORDER))
 		return NULL;

 	npages = size >> IO_PAGE_SHIFT;

@@ -897,7 +897,7 @@ void __init cheetah_ecache_flush_init(void)

 	/* Now allocate error trap reporting scoreboard. */
 	sz = NR_CPUS * (2 * sizeof(struct cheetah_err_info));
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		if ((PAGE_SIZE << order) >= sz)
 			break;
 	}

@@ -402,8 +402,8 @@ void tsb_grow(struct mm_struct *mm, unsigned long tsb_index, unsigned long rss)
 	unsigned long new_rss_limit;
 	gfp_t gfp_flags;

-	if (max_tsb_size > (PAGE_SIZE << MAX_ORDER))
-		max_tsb_size = (PAGE_SIZE << MAX_ORDER);
+	if (max_tsb_size > PAGE_SIZE << MAX_ORDER)
+		max_tsb_size = PAGE_SIZE << MAX_ORDER;

 	new_cache_index = 0;
 	for (new_size = 8192; new_size < max_tsb_size; new_size <<= 1UL) {

@@ -27,6 +27,7 @@ config X86_64
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
@@ -125,8 +126,8 @@ config X86
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANT_GENERAL_HUGETLB
 	select ARCH_WANT_HUGE_PMD_SHARE
-	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP	if X86_64
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANT_OPTIMIZE_VMEMMAP	if X86_64
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
 	select BUILDTIME_TABLE_SORT

@@ -1097,7 +1097,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }

-#define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
+#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)

 #define mk_pmd(page, pgprot)   pfn_pmd(page_to_pfn(page), (pgprot))

@@ -15,24 +15,18 @@
 #endif

 #define __HAVE_ARCH_MEMCPY 1
-#if defined(__SANITIZE_MEMORY__) && defined(__NO_FORTIFY)
-#undef memcpy
-#define memcpy __msan_memcpy
-#else
 extern void *memcpy(void *to, const void *from, size_t len);
-#endif
 extern void *__memcpy(void *to, const void *from, size_t len);

 #define __HAVE_ARCH_MEMSET
-#if defined(__SANITIZE_MEMORY__) && defined(__NO_FORTIFY)
-extern void *__msan_memset(void *s, int c, size_t n);
-#undef memset
-#define memset __msan_memset
-#else
 void *memset(void *s, int c, size_t n);
-#endif
 void *__memset(void *s, int c, size_t n);

+/*
+ * KMSAN needs to instrument as much code as possible. Use C versions of
+ * memsetXX() from lib/string.c under KMSAN.
+ */
+#if !defined(CONFIG_KMSAN)
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -68,15 +62,10 @@ static inline void *memset64(uint64_t *s, uint64_t v, size_t n)
 		: "memory");
 	return s;
 }
+#endif

 #define __HAVE_ARCH_MEMMOVE
-#if defined(__SANITIZE_MEMORY__) && defined(__NO_FORTIFY)
-#undef memmove
-void *__msan_memmove(void *dest, const void *src, size_t len);
-#define memmove __msan_memmove
-#else
 void *memmove(void *dest, const void *src, size_t count);
-#endif
 void *__memmove(void *dest, const void *src, size_t count);

 int memcmp(const void *cs, const void *ct, size_t count);

@@ -19,6 +19,7 @@
 #include <linux/uaccess.h>		/* faulthandler_disabled() */
 #include <linux/efi.h>			/* efi_crash_gracefully_on_page_fault()*/
 #include <linux/mm_types.h>
+#include <linux/mm.h>			/* find_and_lock_vma() */

 #include <asm/cpufeature.h>		/* boot_cpu_has, ... */
 #include <asm/traps.h>			/* dotraplinkage, ... */
@@ -1333,6 +1334,38 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif

+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+
+	if (unlikely(access_error(error_code, vma))) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			kernelmode_fixup_or_oops(regs, error_code, address,
+						 SIGBUS, BUS_ADRERR,
+						 ARCH_DEFAULT_PKEY);
+		return;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
@@ -1433,6 +1466,9 @@ good_area:
 	}

 	mmap_read_unlock(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	if (likely(!(fault & VM_FAULT_ERROR)))
 		return;

@@ -1073,11 +1073,15 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 }

 /*
- * untrack_pfn_moved is called, while mremapping a pfnmap for a new region,
- * with the old vma after its pfnmap page table has been removed.  The new
- * vma has a new pfnmap to the same pfn & cache type with VM_PAT set.
+ * untrack_pfn_clear is called if the following situation fits:
+ *
+ * 1) while mremapping a pfnmap for a new region,  with the old vma after
+ *    its pfnmap page table has been removed.  The new vma has a new pfnmap
+ *    to the same pfn & cache type with VM_PAT set.
+ * 2) while duplicating vm area, the new vma fails to copy the pgtable from
+ *    old vma.
 */
-void untrack_pfn_moved(struct vm_area_struct *vma)
+void untrack_pfn_clear(struct vm_area_struct *vma)
 {
 	vm_flags_clear(vma, VM_PAT);
 }

@@ -772,18 +772,17 @@ config HIGHMEM
 	  If unsure, say Y.

 config ARCH_FORCE_MAX_ORDER
-	int "Maximum zone order"
-	default "11"
+	int "Order of maximal physically contiguous allocations"
+	default "10"
 	help
-	  The kernel memory allocator divides physically contiguous memory
-	  blocks into "zones", where each zone is a power of two number of
-	  pages. This option selects the largest power of two that the kernel
-	  keeps in the memory allocator. If you need to allocate very large
-	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  The kernel page allocator limits the size of maximal physically
+	  contiguous allocations. The limit is called MAX_ORDER and it
+	  defines the maximal power of two of number of pages that can be
+	  allocated as a single contiguous block. This option allows
+	  overriding the default setting when ability to allocate very
+	  large blocks of physically contiguous memory is required.

-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  Don't change if unsure.

 endmenu

@@ -226,8 +226,8 @@ static ssize_t regmap_read_debugfs(struct regmap *map, unsigned int from,
 	if (*ppos < 0 || !count)
 		return -EINVAL;

-	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
-		count = PAGE_SIZE << (MAX_ORDER - 1);
+	if (count > (PAGE_SIZE << MAX_ORDER))
+		count = PAGE_SIZE << MAX_ORDER;

 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)
@@ -373,8 +373,8 @@ static ssize_t regmap_reg_ranges_read_file(struct file *file,
 	if (*ppos < 0 || !count)
 		return -EINVAL;

-	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
-		count = PAGE_SIZE << (MAX_ORDER - 1);
+	if (count > (PAGE_SIZE << MAX_ORDER))
+		count = PAGE_SIZE << MAX_ORDER;

 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)

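With the inclusive MAX_ORDER, the largest single page-block allocation is PAGE_SIZE << MAX_ORDER, so clamps like the regmap one above drop the "- 1". A standalone arithmetic check follows; the PAGE_SIZE and MAX_ORDER values are assumptions picked for the example (4 KiB pages, maximum order 10), not taken from any particular config.

#include <stdio.h>

#define EXAMPLE_PAGE_SIZE 4096UL
#define EXAMPLE_MAX_ORDER 10

int main(void)
{
	unsigned long limit = EXAMPLE_PAGE_SIZE << EXAMPLE_MAX_ORDER;
	unsigned long count = 8UL * 1024 * 1024;   /* a request larger than the limit */

	if (count > limit)
		count = limit;                     /* clamp, mirroring the new bound */

	printf("largest single allocation: %lu bytes, clamped count: %lu\n",
	       limit, count);
	return 0;
}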
@@ -3108,7 +3108,7 @@ loop:
 	ptr->resultcode = 0;

 	if (ptr->flags & (FD_RAW_READ | FD_RAW_WRITE)) {
-		if (ptr->length <= 0 || ptr->length >= MAX_LEN)
+		if (ptr->length <= 0 || ptr->length > MAX_LEN)
 			return -EINVAL;
 		ptr->kernel_data = (char *)fd_dma_mem_alloc(ptr->length);
 		fallback_on_nodma_alloc(&ptr->kernel_data, ptr->length);

@@ -54,9 +54,8 @@ static size_t huge_class_size;
 static const struct block_device_operations zram_devops;
 
 static void zram_free_page(struct zram *zram, size_t index);
-static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
-              u32 index, int offset, struct bio *bio);
-
+static int zram_read_page(struct zram *zram, struct page *page, u32 index,
+              struct bio *parent);
 
 static int zram_slot_trylock(struct zram *zram, u32 index)
 {
@@ -148,6 +147,7 @@ static inline bool is_partial_io(struct bio_vec *bvec)
 {
     return bvec->bv_len != PAGE_SIZE;
 }
+#define ZRAM_PARTIAL_IO 1
 #else
 static inline bool is_partial_io(struct bio_vec *bvec)
 {
@@ -174,36 +174,6 @@ static inline u32 zram_get_priority(struct zram *zram, u32 index)
     return prio & ZRAM_COMP_PRIORITY_MASK;
 }
 
-/*
- * Check if request is within bounds and aligned on zram logical blocks.
- */
-static inline bool valid_io_request(struct zram *zram,
-        sector_t start, unsigned int size)
-{
-    u64 end, bound;
-
-    /* unaligned request */
-    if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
-        return false;
-    if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
-        return false;
-
-    end = start + (size >> SECTOR_SHIFT);
-    bound = zram->disksize >> SECTOR_SHIFT;
-    /* out of range */
-    if (unlikely(start >= bound || end > bound || start > end))
-        return false;
-
-    /* I/O request is valid */
-    return true;
-}
-
-static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
-{
-    *index += (*offset + bvec->bv_len) / PAGE_SIZE;
-    *offset = (*offset + bvec->bv_len) % PAGE_SIZE;
-}
-
 static inline void update_used_max(struct zram *zram,
                     const unsigned long pages)
 {
@@ -606,41 +576,16 @@ static void free_block_bdev(struct zram *zram, unsigned long blk_idx)
     atomic64_dec(&zram->stats.bd_count);
 }
 
-static void zram_page_end_io(struct bio *bio)
-{
-    struct page *page = bio_first_page_all(bio);
-
-    page_endio(page, op_is_write(bio_op(bio)),
-            blk_status_to_errno(bio->bi_status));
-    bio_put(bio);
-}
-
-/*
- * Returns 1 if the submission is successful.
- */
-static int read_from_bdev_async(struct zram *zram, struct bio_vec *bvec,
+static void read_from_bdev_async(struct zram *zram, struct page *page,
             unsigned long entry, struct bio *parent)
 {
     struct bio *bio;
 
-    bio = bio_alloc(zram->bdev, 1, parent ? parent->bi_opf : REQ_OP_READ,
-            GFP_NOIO);
-    if (!bio)
-        return -ENOMEM;
-
+    bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
     bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
-    if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len, bvec->bv_offset)) {
-        bio_put(bio);
-        return -EIO;
-    }
-
-    if (!parent)
-        bio->bi_end_io = zram_page_end_io;
-    else
-        bio_chain(bio, parent);
-
+    __bio_add_page(bio, page, PAGE_SIZE, 0);
+    bio_chain(bio, parent);
     submit_bio(bio);
-    return 1;
 }
 
 #define PAGE_WB_SIG "page_index="
@@ -701,10 +646,6 @@ static ssize_t writeback_store(struct device *dev,
     }
 
     for (; nr_pages != 0; index++, nr_pages--) {
-        struct bio_vec bvec;
-
-        bvec_set_page(&bvec, page, PAGE_SIZE, 0);
-
         spin_lock(&zram->wb_limit_lock);
         if (zram->wb_limit_enable && !zram->bd_wb_limit) {
             spin_unlock(&zram->wb_limit_lock);
@@ -748,7 +689,7 @@ static ssize_t writeback_store(struct device *dev,
         /* Need for hugepage writeback racing */
         zram_set_flag(zram, index, ZRAM_IDLE);
         zram_slot_unlock(zram, index);
-        if (zram_bvec_read(zram, &bvec, index, 0, NULL)) {
+        if (zram_read_page(zram, page, index, NULL)) {
             zram_slot_lock(zram, index);
             zram_clear_flag(zram, index, ZRAM_UNDER_WB);
             zram_clear_flag(zram, index, ZRAM_IDLE);
@@ -759,9 +700,8 @@ static ssize_t writeback_store(struct device *dev,
         bio_init(&bio, zram->bdev, &bio_vec, 1,
              REQ_OP_WRITE | REQ_SYNC);
         bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
+        bio_add_page(&bio, page, PAGE_SIZE, 0);
 
-        bio_add_page(&bio, bvec.bv_page, bvec.bv_len,
-                bvec.bv_offset);
         /*
          * XXX: A single page IO would be inefficient for write
          * but it would be not bad as starter.
@@ -829,19 +769,20 @@ struct zram_work {
     struct work_struct work;
     struct zram *zram;
     unsigned long entry;
-    struct bio *bio;
-    struct bio_vec bvec;
+    struct page *page;
+    int error;
 };
 
-#if PAGE_SIZE != 4096
 static void zram_sync_read(struct work_struct *work)
 {
     struct zram_work *zw = container_of(work, struct zram_work, work);
-    struct zram *zram = zw->zram;
-    unsigned long entry = zw->entry;
-    struct bio *bio = zw->bio;
+    struct bio_vec bv;
+    struct bio bio;
 
-    read_from_bdev_async(zram, &zw->bvec, entry, bio);
+    bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
+    bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
+    __bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
+    zw->error = submit_bio_wait(&bio);
 }
 
 /*
@@ -849,45 +790,39 @@ static void zram_sync_read(struct work_struct *work)
  * chained IO with parent IO in same context, it's a deadlock. To avoid that,
  * use a worker thread context.
  */
-static int read_from_bdev_sync(struct zram *zram, struct bio_vec *bvec,
-                unsigned long entry, struct bio *bio)
+static int read_from_bdev_sync(struct zram *zram, struct page *page,
+                unsigned long entry)
 {
     struct zram_work work;
 
-    work.bvec = *bvec;
+    work.page = page;
     work.zram = zram;
     work.entry = entry;
-    work.bio = bio;
 
     INIT_WORK_ONSTACK(&work.work, zram_sync_read);
     queue_work(system_unbound_wq, &work.work);
     flush_work(&work.work);
     destroy_work_on_stack(&work.work);
 
-    return 1;
+    return work.error;
 }
-#else
-static int read_from_bdev_sync(struct zram *zram, struct bio_vec *bvec,
-                unsigned long entry, struct bio *bio)
-{
-    WARN_ON(1);
-    return -EIO;
-}
-#endif
 
-static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
-            unsigned long entry, struct bio *parent, bool sync)
+static int read_from_bdev(struct zram *zram, struct page *page,
+            unsigned long entry, struct bio *parent)
 {
     atomic64_inc(&zram->stats.bd_reads);
-    if (sync)
-        return read_from_bdev_sync(zram, bvec, entry, parent);
-    else
-        return read_from_bdev_async(zram, bvec, entry, parent);
+    if (!parent) {
+        if (WARN_ON_ONCE(!IS_ENABLED(ZRAM_PARTIAL_IO)))
+            return -EIO;
+        return read_from_bdev_sync(zram, page, entry);
+    }
+    read_from_bdev_async(zram, page, entry, parent);
+    return 0;
 }
 #else
 static inline void reset_bdev(struct zram *zram) {};
-static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
-            unsigned long entry, struct bio *parent, bool sync)
+static int read_from_bdev(struct zram *zram, struct page *page,
+            unsigned long entry, struct bio *parent)
 {
     return -EIO;
 }
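Aside (not from the patch): the deadlock-avoidance pattern used above — run the blocking read from an on-stack work item on system_unbound_wq and wait for it to finish — shown as a generic, self-contained sketch. The names here are illustrative, not zram API.

#include <linux/errno.h>
#include <linux/workqueue.h>

struct sync_ctx {
    struct work_struct work;
    int error;
};

static void sync_ctx_fn(struct work_struct *work)
{
    struct sync_ctx *ctx = container_of(work, struct sync_ctx, work);

    /* do the blocking I/O here, in worker-thread context */
    ctx->error = 0;
}

static int run_in_worker_and_wait(void)
{
    struct sync_ctx ctx = { .error = -EIO };

    INIT_WORK_ONSTACK(&ctx.work, sync_ctx_fn);
    queue_work(system_unbound_wq, &ctx.work);
    flush_work(&ctx.work);        /* safe: caller is not on that workqueue */
    destroy_work_on_stack(&ctx.work);
    return ctx.error;
}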
@@ -1190,10 +1125,9 @@ static ssize_t io_stat_show(struct device *dev,
 
     down_read(&zram->init_lock);
     ret = scnprintf(buf, PAGE_SIZE,
-            "%8llu %8llu %8llu %8llu\n",
+            "%8llu %8llu 0 %8llu\n",
             (u64)atomic64_read(&zram->stats.failed_reads),
             (u64)atomic64_read(&zram->stats.failed_writes),
-            (u64)atomic64_read(&zram->stats.invalid_io),
             (u64)atomic64_read(&zram->stats.notify_free));
     up_read(&zram->init_lock);
 
@@ -1371,20 +1305,6 @@ out:
         ~(1UL << ZRAM_LOCK | 1UL << ZRAM_UNDER_WB));
 }
 
-/*
- * Reads a page from the writeback devices. Corresponding ZRAM slot
- * should be unlocked.
- */
-static int zram_bvec_read_from_bdev(struct zram *zram, struct page *page,
-                u32 index, struct bio *bio, bool partial_io)
-{
-    struct bio_vec bvec;
-
-    bvec_set_page(&bvec, page, PAGE_SIZE, 0);
-    return read_from_bdev(zram, &bvec, zram_get_element(zram, index), bio,
-                partial_io);
-}
-
 /*
  * Reads (decompresses if needed) a page from zspool (zsmalloc).
  * Corresponding ZRAM slot should be locked.
@@ -1434,8 +1354,8 @@ static int zram_read_from_zspool(struct zram *zram, struct page *page,
     return ret;
 }
 
-static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
-                struct bio *bio, bool partial_io)
+static int zram_read_page(struct zram *zram, struct page *page, u32 index,
+                struct bio *parent)
 {
     int ret;
 
@@ -1445,11 +1365,14 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
         ret = zram_read_from_zspool(zram, page, index);
         zram_slot_unlock(zram, index);
     } else {
-        /* Slot should be unlocked before the function call */
+        /*
+         * The slot should be unlocked before reading from the backing
+         * device.
+         */
         zram_slot_unlock(zram, index);
 
-        ret = zram_bvec_read_from_bdev(zram, page, index, bio,
-                        partial_io);
+        ret = read_from_bdev(zram, page, zram_get_element(zram, index),
+                    parent);
     }
 
     /* Should NEVER happen. Return bio error if it does. */
@@ -1459,39 +1382,34 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
     return ret;
 }
 
-static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
-              u32 index, int offset, struct bio *bio)
+/*
+ * Use a temporary buffer to decompress the page, as the decompressor
+ * always expects a full page for the output.
+ */
+static int zram_bvec_read_partial(struct zram *zram, struct bio_vec *bvec,
+                  u32 index, int offset)
 {
+    struct page *page = alloc_page(GFP_NOIO);
     int ret;
-    struct page *page;
 
-    page = bvec->bv_page;
-    if (is_partial_io(bvec)) {
-        /* Use a temporary buffer to decompress the page */
-        page = alloc_page(GFP_NOIO|__GFP_HIGHMEM);
-        if (!page)
-            return -ENOMEM;
-    }
-
-    ret = __zram_bvec_read(zram, page, index, bio, is_partial_io(bvec));
-    if (unlikely(ret))
-        goto out;
-
-    if (is_partial_io(bvec)) {
-        void *src = kmap_atomic(page);
-
-        memcpy_to_bvec(bvec, src + offset);
-        kunmap_atomic(src);
-    }
-out:
-    if (is_partial_io(bvec))
-        __free_page(page);
-
+    if (!page)
+        return -ENOMEM;
+    ret = zram_read_page(zram, page, index, NULL);
+    if (likely(!ret))
+        memcpy_to_bvec(bvec, page_address(page) + offset);
+    __free_page(page);
     return ret;
 }
 
-static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
-                u32 index, struct bio *bio)
+static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
+              u32 index, int offset, struct bio *bio)
+{
+    if (is_partial_io(bvec))
+        return zram_bvec_read_partial(zram, bvec, index, offset);
+    return zram_read_page(zram, bvec->bv_page, index, bio);
+}
+
+static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 {
     int ret = 0;
     unsigned long alloced_pages;
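Worked example (hypothetical request values, not from the patch) of why the partial-read path above needs a bounce page: a sub-page bvec cannot receive a whole decompressed page, so the full page is decompressed into a scratch page and only the requested slice is copied out.

/* Hypothetical partial read, PAGE_SIZE == 4096:
 *   bvec->bv_len = 512, offset = 1024
 *   is_partial_io(bvec)                -> true (512 != 4096)
 *   zram_read_page(zram, page, ...)    -> decompress the full 4 KiB into
 *                                         the scratch page
 *   memcpy_to_bvec(bvec,
 *          page_address(page) + 1024)  -> copy only bytes 1024..1535
 */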
@@ -1499,7 +1417,6 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
     unsigned int comp_len = 0;
     void *src, *dst, *mem;
     struct zcomp_strm *zstrm;
-    struct page *page = bvec->bv_page;
     unsigned long element = 0;
     enum zram_pageflags flags = 0;
 
@@ -1617,42 +1534,35 @@ out:
     return ret;
 }
 
-static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
+/*
+ * This is a partial IO. Read the full page before writing the changes.
+ */
+static int zram_bvec_write_partial(struct zram *zram, struct bio_vec *bvec,
                 u32 index, int offset, struct bio *bio)
 {
+    struct page *page = alloc_page(GFP_NOIO);
     int ret;
-    struct page *page = NULL;
-    struct bio_vec vec;
 
-    vec = *bvec;
-    if (is_partial_io(bvec)) {
-        void *dst;
-        /*
-         * This is a partial IO. We need to read the full page
-         * before to write the changes.
-         */
-        page = alloc_page(GFP_NOIO|__GFP_HIGHMEM);
-        if (!page)
-            return -ENOMEM;
+    if (!page)
+        return -ENOMEM;
 
-        ret = __zram_bvec_read(zram, page, index, bio, true);
-        if (ret)
-            goto out;
+    ret = zram_read_page(zram, page, index, bio);
+    if (!ret) {
+        memcpy_from_bvec(page_address(page) + offset, bvec);
+        ret = zram_write_page(zram, page, index);
+    }
+    __free_page(page);
+    return ret;
+}
 
-        dst = kmap_atomic(page);
-        memcpy_from_bvec(dst + offset, bvec);
-        kunmap_atomic(dst);
-
-        bvec_set_page(&vec, page, PAGE_SIZE, 0);
-    }
-
-    ret = __zram_bvec_write(zram, &vec, index, bio);
-out:
-    if (is_partial_io(bvec))
-        __free_page(page);
-    return ret;
+static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
+               u32 index, int offset, struct bio *bio)
+{
+    if (is_partial_io(bvec))
+        return zram_bvec_write_partial(zram, bvec, index, offset, bio);
+    return zram_write_page(zram, bvec->bv_page, index);
 }
 
 #ifdef CONFIG_ZRAM_MULTI_COMP
 /*
  * This function will decompress (unless it's ZRAM_HUGE) the page and then
@@ -1761,7 +1671,7 @@ static int zram_recompress(struct zram *zram, u32 index, struct page *page,
 
     /*
      * No direct reclaim (slow path) for handle allocation and no
-     * re-compression attempt (unlike in __zram_bvec_write()) since
+     * re-compression attempt (unlike in zram_write_bvec()) since
      * we already have stored that object in zsmalloc. If we cannot
      * alloc memory for recompressed object then we bail out and
     * simply keep the old (existing) object in zsmalloc.
@@ -1921,15 +1831,12 @@ release_init_lock:
 }
 #endif
 
-/*
- * zram_bio_discard - handler on discard request
- * @index: physical block index in PAGE_SIZE units
- * @offset: byte offset within physical block
- */
-static void zram_bio_discard(struct zram *zram, u32 index,
-                 int offset, struct bio *bio)
+static void zram_bio_discard(struct zram *zram, struct bio *bio)
 {
     size_t n = bio->bi_iter.bi_size;
+    u32 index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
+    u32 offset = (bio->bi_iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
+            SECTOR_SHIFT;
 
     /*
      * zram manages data in physical block size units. Because logical block
@@ -1957,80 +1864,58 @@ static void zram_bio_discard(struct zram *zram, u32 index,
         index++;
         n -= PAGE_SIZE;
     }
+
+    bio_endio(bio);
 }
 
-/*
- * Returns errno if it has some problem. Otherwise return 0 or 1.
- * Returns 0 if IO request was done synchronously
- * Returns 1 if IO request was successfully submitted.
- */
-static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
-            int offset, enum req_op op, struct bio *bio)
+static void zram_bio_read(struct zram *zram, struct bio *bio)
 {
-    int ret;
+    struct bvec_iter iter;
+    struct bio_vec bv;
+    unsigned long start_time;
 
-    if (!op_is_write(op)) {
-        ret = zram_bvec_read(zram, bvec, index, offset, bio);
-        flush_dcache_page(bvec->bv_page);
-    } else {
-        ret = zram_bvec_write(zram, bvec, index, offset, bio);
-    }
+    start_time = bio_start_io_acct(bio);
+    bio_for_each_segment(bv, bio, iter) {
+        u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
+        u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
+                SECTOR_SHIFT;
 
-    if (unlikely(ret < 0)) {
-        if (!op_is_write(op))
+        if (zram_bvec_read(zram, &bv, index, offset, bio) < 0) {
             atomic64_inc(&zram->stats.failed_reads);
-        else
-            atomic64_inc(&zram->stats.failed_writes);
-    }
+            bio->bi_status = BLK_STS_IOERR;
+            break;
+        }
+        flush_dcache_page(bv.bv_page);
 
-    return ret;
+        zram_slot_lock(zram, index);
+        zram_accessed(zram, index);
+        zram_slot_unlock(zram, index);
+    }
+    bio_end_io_acct(bio, start_time);
+    bio_endio(bio);
 }
 
-static void __zram_make_request(struct zram *zram, struct bio *bio)
+static void zram_bio_write(struct zram *zram, struct bio *bio)
 {
-    int offset;
-    u32 index;
-    struct bio_vec bvec;
     struct bvec_iter iter;
+    struct bio_vec bv;
     unsigned long start_time;
 
-    index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
-    offset = (bio->bi_iter.bi_sector &
-          (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
-
-    switch (bio_op(bio)) {
-    case REQ_OP_DISCARD:
-    case REQ_OP_WRITE_ZEROES:
-        zram_bio_discard(zram, index, offset, bio);
-        bio_endio(bio);
-        return;
-    default:
-        break;
-    }
-
     start_time = bio_start_io_acct(bio);
-    bio_for_each_segment(bvec, bio, iter) {
-        struct bio_vec bv = bvec;
-        unsigned int unwritten = bvec.bv_len;
-
-        do {
-            bv.bv_len = min_t(unsigned int, PAGE_SIZE - offset,
-                    unwritten);
-            if (zram_bvec_rw(zram, &bv, index, offset,
-                    bio_op(bio), bio) < 0) {
-                bio->bi_status = BLK_STS_IOERR;
-                break;
-            }
+    bio_for_each_segment(bv, bio, iter) {
+        u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
+        u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
+                SECTOR_SHIFT;
 
-            bv.bv_offset += bv.bv_len;
-            unwritten -= bv.bv_len;
+        if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
+            atomic64_inc(&zram->stats.failed_writes);
+            bio->bi_status = BLK_STS_IOERR;
+            break;
+        }
 
-            update_position(&index, &offset, &bv);
-        } while (unwritten);
+        zram_slot_lock(zram, index);
+        zram_accessed(zram, index);
+        zram_slot_unlock(zram, index);
     }
     bio_end_io_acct(bio, start_time);
     bio_endio(bio);
@@ -2043,14 +1928,21 @@ static void zram_submit_bio(struct bio *bio)
 {
     struct zram *zram = bio->bi_bdev->bd_disk->private_data;
 
-    if (!valid_io_request(zram, bio->bi_iter.bi_sector,
-                    bio->bi_iter.bi_size)) {
-        atomic64_inc(&zram->stats.invalid_io);
-        bio_io_error(bio);
-        return;
+    switch (bio_op(bio)) {
+    case REQ_OP_READ:
+        zram_bio_read(zram, bio);
+        break;
+    case REQ_OP_WRITE:
+        zram_bio_write(zram, bio);
+        break;
+    case REQ_OP_DISCARD:
+    case REQ_OP_WRITE_ZEROES:
+        zram_bio_discard(zram, bio);
+        break;
+    default:
+        WARN_ON_ONCE(1);
+        bio_endio(bio);
     }
-
-    __zram_make_request(zram, bio);
 }
 
 static void zram_slot_free_notify(struct block_device *bdev,
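As a reading aid (not part of the diff): the index/offset math used by zram_bio_read() and zram_bio_write() above, worked through for 4 KiB pages (SECTOR_SHIFT = 9, so SECTORS_PER_PAGE = 8 and SECTORS_PER_PAGE_SHIFT = 3). The sector value is an illustrative example.

/* Example: bi_sector = 21
 *   index  = 21 >> 3       = 2              (third 4 KiB page on the device)
 *   offset = (21 & 7) << 9 = 5 * 512 = 2560 (byte offset within that page)
 * So a 512-byte segment at sector 21 is a partial IO of page 2 at byte 2560.
 */
u32 index  = sector >> SECTORS_PER_PAGE_SHIFT;
u32 offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;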
@@ -78,7 +78,6 @@ struct zram_stats {
     atomic64_t compr_data_size;    /* compressed size of pages stored */
     atomic64_t failed_reads;    /* can happen when memory is too low */
     atomic64_t failed_writes;    /* can happen when memory is too low */
-    atomic64_t invalid_io;    /* non-page-aligned I/O requests */
     atomic64_t notify_free;    /* no. of swap slot free notifications */
     atomic64_t same_pages;        /* no. of same element filled pages */
     atomic64_t huge_pages;        /* no. of huge pages */
@@ -892,7 +892,7 @@ static int sev_ioctl_do_get_id2(struct sev_issue_cmd *argp)
     /*
      * The length of the ID shouldn't be assumed by software since
      * it may change in the future.  The allocation size is limited
-     * to 1 << (PAGE_SHIFT + MAX_ORDER - 1) by the page allocator.
+     * to 1 << (PAGE_SHIFT + MAX_ORDER) by the page allocator.
      * If the allocation fails, simply return ENOMEM rather than
      * warning in the kernel log.
      */
@@ -70,11 +70,11 @@ struct hisi_acc_sgl_pool *hisi_acc_create_sgl_pool(struct device *dev,
                HISI_ACC_SGL_ALIGN_SIZE);
 
     /*
-     * the pool may allocate a block of memory of size PAGE_SIZE * 2^(MAX_ORDER - 1),
+     * the pool may allocate a block of memory of size PAGE_SIZE * 2^MAX_ORDER,
      * block size may exceed 2^31 on ia64, so the max of block size is 2^31
      */
-    block_size = 1 << (PAGE_SHIFT + MAX_ORDER <= 32 ?
-               PAGE_SHIFT + MAX_ORDER - 1 : 31);
+    block_size = 1 << (PAGE_SHIFT + MAX_ORDER < 32 ?
+               PAGE_SHIFT + MAX_ORDER : 31);
     sgl_num_per_block = block_size / sgl_size;
     block_num = count / sgl_num_per_block;
     remain_sgl = count % sgl_num_per_block;
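Worked example (not from the patch): assuming PAGE_SHIFT = 12 and the new inclusive MAX_ORDER = 10, PAGE_SHIFT + MAX_ORDER = 22 < 32, so the pool block size evaluates to 1 << 22 = 4 MiB — the same size the old `PAGE_SHIFT + MAX_ORDER - 1` expression produced before the meaning of MAX_ORDER changed.

/* block_size selection, assuming PAGE_SHIFT = 12 and MAX_ORDER = 10 */
u32 block_size = 1 << (PAGE_SHIFT + MAX_ORDER < 32 ?
               PAGE_SHIFT + MAX_ORDER : 31);    /* = 1 << 22 = 4 MiB */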
@@ -41,12 +41,11 @@ struct dma_heap_attachment {
     bool mapped;
 };
 
-#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
-#define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
+#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
 #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
                 | __GFP_NORETRY) & ~__GFP_RECLAIM) \
                 | __GFP_COMP)
-static gfp_t order_flags[] = {HIGH_ORDER_GFP, MID_ORDER_GFP, LOW_ORDER_GFP};
+static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
 /*
  * The selection of the orders used for allocation (1MB, 64K, 4K) is designed
  * to match with the sizes often found in IOMMUs. Using order 4 pages instead
@@ -115,7 +115,7 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
     do {
         struct page *page;
 
-        GEM_BUG_ON(order >= MAX_ORDER);
+        GEM_BUG_ON(order > MAX_ORDER);
         page = alloc_pages(GFP | __GFP_ZERO, order);
         if (!page)
             goto err;
@@ -261,7 +261,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
          * encryption bits. This is because the exact location of the
          * data may not be known at mmap() time and may also change
          * at arbitrary times while the data is mmap'ed.
-         * See vmf_insert_mixed_prot() for a discussion.
+         * See vmf_insert_pfn_prot() for a discussion.
          */
         ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
 
@@ -65,11 +65,11 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[MAX_ORDER + 1];
+static struct ttm_pool_type global_uncached[MAX_ORDER + 1];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER + 1];
+static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
@@ -444,7 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
     else
         gfp_flags |= GFP_HIGHUSER;
 
-    for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
+    for (order = min_t(unsigned int, MAX_ORDER, __fls(num_pages));
          num_pages;
          order = min_t(unsigned int, order, __fls(num_pages))) {
         struct ttm_pool_type *pt;
@@ -563,7 +563,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 
     if (use_dma_alloc) {
         for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-            for (j = 0; j < MAX_ORDER; ++j)
+            for (j = 0; j <= MAX_ORDER; ++j)
                 ttm_pool_type_init(&pool->caching[i].orders[j],
                            pool, i, j);
     }
@@ -583,7 +583,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
 
     if (pool->use_dma_alloc) {
         for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-            for (j = 0; j < MAX_ORDER; ++j)
+            for (j = 0; j <= MAX_ORDER; ++j)
                 ttm_pool_type_fini(&pool->caching[i].orders[j]);
     }
 
@@ -637,7 +637,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
     unsigned int i;
 
     seq_puts(m, "\t ");
-    for (i = 0; i < MAX_ORDER; ++i)
+    for (i = 0; i <= MAX_ORDER; ++i)
         seq_printf(m, " ---%2u---", i);
     seq_puts(m, "\n");
 }
@@ -648,7 +648,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 {
     unsigned int i;
 
-    for (i = 0; i < MAX_ORDER; ++i)
+    for (i = 0; i <= MAX_ORDER; ++i)
         seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
     seq_puts(m, "\n");
 }
@@ -757,7 +757,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
     spin_lock_init(&shrinker_lock);
     INIT_LIST_HEAD(&shrinker_list);
 
-    for (i = 0; i < MAX_ORDER; ++i) {
+    for (i = 0; i <= MAX_ORDER; ++i) {
         ttm_pool_type_init(&global_write_combined[i], NULL,
                    ttm_write_combined, i);
         ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
@@ -790,7 +790,7 @@ void ttm_pool_mgr_fini(void)
 {
     unsigned int i;
 
-    for (i = 0; i < MAX_ORDER; ++i) {
+    for (i = 0; i <= MAX_ORDER; ++i) {
         ttm_pool_type_fini(&global_write_combined[i]);
         ttm_pool_type_fini(&global_uncached[i]);
 
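Aside (not part of the diff): because orders 0..MAX_ORDER are all valid under the inclusive definition, any per-order table needs MAX_ORDER + 1 slots and its loops must use `<=`, which is the pattern the pool changes above follow. A generic sketch with illustrative names:

/* One counter per allocation order, including order == MAX_ORDER. */
static unsigned long per_order_count[MAX_ORDER + 1];

static void count_order(unsigned int order)
{
    if (order <= MAX_ORDER)        /* inclusive upper bound */
        per_order_count[order]++;
}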
@@ -182,7 +182,7 @@
 #ifdef CONFIG_CMA_ALIGNMENT
 #define Q_MAX_SZ_SHIFT            (PAGE_SHIFT + CONFIG_CMA_ALIGNMENT)
 #else
-#define Q_MAX_SZ_SHIFT            (PAGE_SHIFT + MAX_ORDER - 1)
+#define Q_MAX_SZ_SHIFT            (PAGE_SHIFT + MAX_ORDER)
 #endif
 
 /*
@@ -736,7 +736,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
     struct page **pages;
     unsigned int i = 0, nid = dev_to_node(dev);
 
-    order_mask &= (2U << MAX_ORDER) - 1;
+    order_mask &= GENMASK(MAX_ORDER, 0);
     if (!order_mask)
         return NULL;
 
@@ -756,7 +756,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
          * than a necessity, hence using __GFP_NORETRY until
          * falling back to minimum-order allocations.
          */
-        for (order_mask &= (2U << __fls(count)) - 1;
+        for (order_mask &= GENMASK(__fls(count), 0);
              order_mask; order_mask &= ~order_size) {
             unsigned int order = __fls(order_mask);
             gfp_t alloc_flags = gfp;
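As a reading aid (not from the patch): for any order limit N, `(2U << N) - 1` and `GENMASK(N, 0)` both produce a mask with bits 0..N set, so the GENMASK conversion above is a readability cleanup of the same expression.

/* Worked example for N = 10:
 *   (2U << 10) - 1 = 0x800 - 1 = 0x7ff   (bits 0..10 set)
 *   GENMASK(10, 0) =             0x7ff   (bits 0..10 set)
 */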
@@ -2445,8 +2445,8 @@ static bool its_parse_indirect_baser(struct its_node *its,
      * feature is not supported by hardware.
      */
     new_order = max_t(u32, get_order(esz << ids), new_order);
-    if (new_order >= MAX_ORDER) {
-        new_order = MAX_ORDER - 1;
+    if (new_order > MAX_ORDER) {
+        new_order = MAX_ORDER;
         ids = ilog2(PAGE_ORDER_TO_SIZE(new_order) / (int)esz);
         pr_warn("ITS@%pa: %s Table too large, reduce ids %llu->%u\n",
             &its->phys_base, its_base_type_string[type],
@@ -1134,7 +1134,7 @@ static void __cache_size_refresh(void)
  * If the allocation may fail we use __get_free_pages. Memory fragmentation
  * won't have a fatal effect here, but it just causes flushes of some other
  * buffers and more I/O will be performed. Don't use __get_free_pages if it
- * always fails (i.e. order >= MAX_ORDER).
+ * always fails (i.e. order > MAX_ORDER).
  *
  * If the allocation shouldn't fail we use __vmalloc. This is only for the
  * initial reserve allocation, so there's no risk of wasting all vmalloc
@@ -1828,7 +1828,7 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
      * Replacement block manager (new_bm) is created and old_bm destroyed outside of
      * cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
      * shrinker associated with the block manager's bufio client vs cmd root_lock).
-     * - must take shrinker_rwsem without holding cmd->root_lock
+     * - must take shrinker_mutex without holding cmd->root_lock
      */
     new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
                      CACHE_MAX_CONCURRENT_LOCKS);
@@ -1887,7 +1887,7 @@ int dm_pool_abort_metadata(struct dm_pool_metadata *pmd)
      * Replacement block manager (new_bm) is created and old_bm destroyed outside of
      * pmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
      * shrinker associated with the block manager's bufio client vs pmd root_lock).
-     * - must take shrinker_rwsem without holding pmd->root_lock
+     * - must take shrinker_mutex without holding pmd->root_lock
      */
     new_bm = dm_block_manager_create(pmd->bdev, THIN_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
                      THIN_MAX_CONCURRENT_LOCKS);
@@ -210,7 +210,7 @@ u32 genwqe_crc32(u8 *buff, size_t len, u32 init)
 void *__genwqe_alloc_consistent(struct genwqe_dev *cd, size_t size,
                dma_addr_t *dma_handle)
 {
-    if (get_order(size) >= MAX_ORDER)
+    if (get_order(size) > MAX_ORDER)
         return NULL;
 
     return dma_alloc_coherent(&cd->pci_dev->dev, size, dma_handle,
@@ -1040,7 +1040,7 @@ static void hns3_init_tx_spare_buffer(struct hns3_enet_ring *ring)
         return;
 
     order = get_order(alloc_size);
-    if (order >= MAX_ORDER) {
+    if (order > MAX_ORDER) {
         if (net_ratelimit())
             dev_warn(ring_to_dev(ring), "failed to allocate tx spare buffer, exceed to max order\n");
         return;
@@ -75,7 +75,7 @@
  * pool for the 4MB. Thus the 16 Rx and Tx queues require 32 * 5 = 160
  * plus 16 for the TSO pools for a total of 176 LTB mappings per VNIC.
  */
-#define IBMVNIC_ONE_LTB_MAX    ((u32)((1 << (MAX_ORDER - 1)) * PAGE_SIZE))
+#define IBMVNIC_ONE_LTB_MAX    ((u32)((1 << MAX_ORDER) * PAGE_SIZE))
 #define IBMVNIC_ONE_LTB_SIZE    min((u32)(8 << 20), IBMVNIC_ONE_LTB_MAX)
 #define IBMVNIC_LTB_SET_SIZE    (38 << 20)
 
@@ -946,7 +946,7 @@ static phys_addr_t hvfb_get_phymem(struct hv_device *hdev,
     if (request_size == 0)
         return -1;
 
-    if (order < MAX_ORDER) {
+    if (order <= MAX_ORDER) {
         /* Call alloc_pages if the size is less than 2^MAX_ORDER */
         page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
         if (!page)
@@ -977,7 +977,7 @@ static void hvfb_release_phymem(struct hv_device *hdev,
 {
     unsigned int order = get_order(size);
 
-    if (order < MAX_ORDER)
+    if (order <= MAX_ORDER)
         __free_pages(pfn_to_page(paddr >> PAGE_SHIFT), order);
     else
         dma_free_coherent(&hdev->device,
@@ -197,7 +197,7 @@ static int vmlfb_alloc_vram(struct vml_info *vinfo,
         va = &vinfo->vram[i];
         order = 0;
 
-        while (requested > (PAGE_SIZE << order) && order < MAX_ORDER)
+        while (requested > (PAGE_SIZE << order) && order <= MAX_ORDER)
             order++;
 
         err = vmlfb_alloc_vram_area(va, order, 0);
@@ -33,7 +33,7 @@
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
                          __GFP_NOMEMALLOC)
 /* The order of free page blocks to report to host */
-#define VIRTIO_BALLOON_HINT_BLOCK_ORDER (MAX_ORDER - 1)
+#define VIRTIO_BALLOON_HINT_BLOCK_ORDER MAX_ORDER
 /* The size of a free page block in bytes */
 #define VIRTIO_BALLOON_HINT_BLOCK_BYTES \
     (1 << (VIRTIO_BALLOON_HINT_BLOCK_ORDER + PAGE_SHIFT))
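Worked example (not from the patch): assuming 4 KiB pages (PAGE_SHIFT = 12) and the default MAX_ORDER of 10, free pages are reported to the host in maximal page-allocator blocks.

/* Assuming PAGE_SHIFT = 12 and MAX_ORDER = 10:
 *   VIRTIO_BALLOON_HINT_BLOCK_ORDER = 10
 *   VIRTIO_BALLOON_HINT_BLOCK_BYTES = 1 << (10 + 12) = 4 MiB
 */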
@@ -1120,13 +1120,13 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
  */
 static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
 {
-    unsigned long order = MAX_ORDER - 1;
+    unsigned long order = MAX_ORDER;
     unsigned long i;
 
     /*
      * We might get called for ranges that don't cover properly aligned
-     * MAX_ORDER - 1 pages; however, we can only online properly aligned
-     * pages with an order of MAX_ORDER - 1 at maximum.
+     * MAX_ORDER pages; however, we can only online properly aligned
+     * pages with an order of MAX_ORDER at maximum.
      */
     while (!IS_ALIGNED(pfn | nr_pages, 1 << order))
         order--;
@@ -1237,9 +1237,9 @@ static void virtio_mem_online_page(struct virtio_mem *vm,
     bool do_online;
 
     /*
-     * We can get called with any order up to MAX_ORDER - 1. If our
-     * subblock size is smaller than that and we have a mixture of plugged
-     * and unplugged subblocks within such a page, we have to process in
+     * We can get called with any order up to MAX_ORDER. If our subblock
+     * size is smaller than that and we have a mixture of plugged and
+     * unplugged subblocks within such a page, we have to process in
      * smaller granularity. In that case we'll adjust the order exactly once
      * within the loop.
      */
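Aside (not part of the diff): the alignment loop referenced in the comment above lowers the order until both the start pfn and the range length are multiples of 1 << order. A self-contained sketch with hypothetical input values:

#include <linux/align.h>

/* Find the largest order (<= MAX_ORDER) such that both pfn and nr_pages
 * are multiples of 1 << order, mirroring the while loop in the hunk above.
 */
static unsigned int example_max_online_order(unsigned long pfn,
                         unsigned long nr_pages)
{
    unsigned int order = MAX_ORDER;

    while (!IS_ALIGNED(pfn | nr_pages, 1UL << order))
        order--;
    return order;    /* e.g. pfn = 0x400, nr_pages = 0x400 -> order = 10 */
}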
Some files were not shown because too many files have changed in this diff.