Commit graph

1743 commits

Author SHA1 Message Date
Kees Cook
06e67b849a fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum
The "FIRMWARE_EFI_EMBEDDED" enum is a "where", not a "what". It
should not be distinguished separately from just "FIRMWARE", as this
confuses the LSMs about what is being loaded. Additionally, there was
no actual validation of the firmware contents happening.

Fixes: e4c2c0ff00 ("firmware: Add new platform fallback mechanism and firmware_request_platform()")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201002173828.2099543-3-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:34:18 +02:00
Kees Cook
c307459b9d fs/kernel_read_file: Remove FIRMWARE_PREALLOC_BUFFER enum
FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs
that are interested in filtering between types of things. The "how"
should be an internal detail made uninteresting to the LSMs.

Fixes: a098ecd2fa ("firmware: support loading into a pre-allocated buffer")
Fixes: fd90bc559b ("ima: based on policy verify firmware signatures (pre-allocated buffer)")
Fixes: 4f0496d8ff ("ima: based on policy warn about loading firmware (pre-allocated buffer)")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201002173828.2099543-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:34:18 +02:00
Christoph Hellwig
bfdc59701d iov_iter: refactor rw_copy_check_uvector and import_iovec
Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

 - iovec_from_user copies a iovec from userspace either into the provided
   stack buffer if it fits, or allocates a new buffer for it.  Returns
   the actually used iovec.  It also verifies that iov_len does fit a
   signed type, and handles compat iovecs if the compat flag is set.
 - __import_iovec consolidates the native and compat versions of
   import_iovec. It calls iovec_from_user, then validates each iovec
   actually points to user addresses, and ensures the total length
   doesn't overflow.

This has two major implications:

 - the access_process_vm case loses the total lenght checking, which
   wasn't required anyway, given that each call receives two iovecs
   for the local and remote side of the operation, and it verifies
   the total length on the local side already.
 - instead of a single loop there now are two loops over the iovecs.
   Given that the iovecs are cache hot this doesn't make a major
   difference

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-03 00:01:56 -04:00
Jens Axboe
ce71bfea20 fs: align IOCB_* flags with RWF_* flags
We have a set of flags that are shared between the two and inherired
in kiocb_set_rw_flags(), but we check and set these individually.
Reorder the IOCB flags so that the bottom part of the space is synced
with the RWF flag space, and then we can do them all in one mask and
set operation.

The only exception is RWF_SYNC, which needs to mark IOCB_SYNC and
IOCB_DSYNC. Do that one separately.

This shaves 15 bytes of text from kiocb_set_rw_flags() for me.

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30 20:32:33 -06:00
Jens Axboe
a3ec600540 io_uring: move io_uring_get_socket() into io_uring.h
Now we have a io_uring kernel header, move this definition out of fs.h
and into io_uring.h where it belongs.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30 20:32:33 -06:00
Christoph Hellwig
09f1bde401 fs: move vfs_fstatat out of line
This allows to keep vfs_statx static in fs/stat.c to prepare for the following
changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-26 22:55:05 -04:00
Christoph Hellwig
0b2c6693b4 fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
Go through vfs_fstatat instead of duplicating the *stat to statx mapping
three times.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-26 22:55:05 -04:00
Christoph Hellwig
da9aa5d96b fs: remove vfs_statx_fd
vfs_statx_fd is only used to implement vfs_fstat.  Remove vfs_statx_fd
and just implement vfs_fstat directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-26 22:55:05 -04:00
Christoph Hellwig
1cb039f3dc bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
also is writable for easier testing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
402dd2cf46 fs: remove the unused SB_I_MULTIROOT flag
The last user of SB_I_MULTIROOT is disappeared with commit f2aedb713c
("NFS: Add fs_context support.")

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:38 -06:00
Daniel Rosenberg
c843843e71 fs: Add standard casefolding support
This adds general supporting functions for filesystems that use
utf8 casefolding. It provides standard dentry_operations and adds the
necessary structures in struct super_block to allow this standardization.

The new dentry operations are functionally equivalent to the existing
operations in ext4 and f2fs, apart from the use of utf8_casefold_hash to
avoid an allocation.

By providing a common implementation, all users can benefit from any
optimizations without needing to port over improvements.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10 14:03:31 -07:00
Christoph Hellwig
36e2c7421f fs: don't allow splice read/write without explicit ops
default_file_splice_write is the last piece of generic code that uses
set_fs to make the uaccess routines operate on kernel pointers.  It
implements a "fallback loop" for splicing from files that do not actually
provide a proper splice_read method.  The usual file systems and other
high bandwidth instances all provide a ->splice_read, so this just removes
support for various device drivers and procfs/debugfs files.  If splice
support for any of those turns out to be important it can be added back
by switching them to the iter ops and using generic_file_splice_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-08 22:21:32 -04:00
Linus Torvalds
e309428590 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl9JG9wACgkQnJ2qBz9k
 QNlp3ggA3B/Xopb2X3cCpf2fFw63YGJU4i0XJxi+3fC/v6m8U+D4XbqJUjaM5TZz
 +4XABQf7OHvSwDezc3n6KXXD/zbkZCeVm9aohEXvfMYLyKbs+S7QNQALHEtpfBUU
 3IY2pQ90K7JT9cD9pJls/Y/EaA1ObWP7+3F1zpw8OutGchKcE8SvVjzL3SSJaj7k
 d8OTtMosAFuTe4saFWfsf9CmZzbx4sZw3VAzXEXAArrxsmqFKIcY8dI8TQ0WaYNh
 C3wQFvW+n9wHapylyi7RhGl2QH9Tj8POfnCTahNFFJbsmJBx0Z3r42mCBAk4janG
 FW+uDdH5V780bTNNVUKz0v4C/YDiKg==
 =jQnW
 -----END PGP SIGNATURE-----

Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull writeback fixes from Jan Kara:
 "Fixes for writeback code occasionally skipping writeback of some
  inodes or livelocking sync(2)"

* tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  writeback: Drop I_DIRTY_TIME_EXPIRE
  writeback: Fix sync livelock due to b_dirty_time processing
  writeback: Avoid skipping inode writeback
  writeback: Protect inode->i_io_list with inode->i_lock
2020-08-28 10:57:14 -07:00
Linus Torvalds
2cc3c4b3c2 io_uring-5.9-2020-08-15
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl84aLkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgps72D/96/HCiUArhxmRltiwqB9KjemEPaY7nWDkz
 0hrUtTCr/MjqFIzgPwx6pIjdiSP4GbmcMCrBO67E+mLbOdO1hKte2ElysRAsTlLE
 fGrcdrs3is5+QK8aqJJks74NzM7XG6jfrR8ewV9cz6aJRWbgRjCWvSxx/03Iha2B
 t897xuwJg4K30Z83IxPfnu+Xp0dIfmFLUuXQApsZ3bNTuQW3sR4CC/v418+1Wmk4
 kXGbQtxcEsrhCy9OxnNyU6GPEJ3b99ANPbRE8OUNQwaHiejvMOCWpcmaoT6TwKeU
 aku+XcVyoMjxJQk2k0uzr8Ecj5G1FJv4fUHhTZBxcGqkrxhkTjQ520HtrqPlc7uV
 BjyXutZ8yjmeCbvGrsXu8f8ktjHHkntkRDA8hgzW1OpmWYuWKF/2OIjNpmcmtvbj
 XqwBDEKdQW9X4dHoQKsVExtzeT6nNP0dxaeZX8OeB2GGkitP7rCm8k/SOuDPTCLi
 MX/qWpERo4hRfCLjY+4nezxkFMLIF7Jej3tzwuVRshFYsRVQzTPQpbnkmkuwibhi
 ObEwVI+lLkbatnR2wmJwoVKcywH13U68VNJXACyw1GZnPlp2lYylT/MV80y/iELE
 mj4zDklqwfIrnoHEuCkERwgXrYsffhUrFvajmAMnJncUFOI4khYJ+dWBVVlBkyn0
 e7UK1Sd7Jw==
 =T+fx
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few differerent things in here.

  Seems like syzbot got some more io_uring bits wired up, and we got a
  handful of reports and the associated fixes are in here.

  General fixes too, and a lot of them marked for stable.

  Lastly, a bit of fallout from the async buffered reads, where we now
  more easily trigger short reads. Some applications don't really like
  that, so the io_read() code now handles short reads internally, and
  got a cleanup along the way so that it's now easier to read (and
  documented). We're now passing tests that failed before"

* tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
  io_uring: short circuit -EAGAIN for blocking read attempt
  io_uring: sanitize double poll handling
  io_uring: internally retry short reads
  io_uring: retain iov_iter state over io_read/io_write calls
  task_work: only grab task signal lock when needed
  io_uring: enable lookup of links holding inflight files
  io_uring: fail poll arm on queue proc failure
  io_uring: hold 'ctx' reference around task_work queue + execute
  fs: RWF_NOWAIT should imply IOCB_NOIO
  io_uring: defer file table grabbing request cleanup for locked requests
  io_uring: add missing REQ_F_COMP_LOCKED for nested requests
  io_uring: fix recursive completion locking on oveflow flush
  io_uring: use TWA_SIGNAL for task_work uncondtionally
  io_uring: account locked memory before potential error case
  io_uring: set ctx sq/cq entry count earlier
  io_uring: Fix NULL pointer dereference in loop_rw_iter()
  io_uring: add comments on how the async buffered read retry works
  io_uring: io_async_buf_func() need not test page bit
2020-08-16 10:55:12 -07:00
Mike Kravetz
34ae204f18 hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem
Commit c0d0381ade ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
in at least read mode.  This is because the explicit locking in
huge_pmd_share (called by huge_pte_alloc) was removed.  When restructuring
the code, the call to huge_pte_alloc in the else block at the beginning of
hugetlb_fault was missed.

Unfortunately, that else clause is exercised when there is no page table
entry.  This will likely lead to a call to huge_pmd_share.  If
huge_pmd_share thinks pmd sharing is possible, it will traverse the
mapping tree (i_mmap) without holding i_mmap_rwsem.  If someone else is
modifying the tree, bad things such as addressing exceptions or worse
could happen.

Simply remove the else clause.  It should have been removed previously.
The code following the else will call huge_pte_alloc with the appropriate
locking.

To prevent this type of issue in the future, add routines to assert that
i_mmap_rwsem is held, and call these routines in huge pmd sharing
routines.

Fixes: c0d0381ade ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A.Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/e670f327-5cf9-1959-96e4-6dc7cc30d3d5@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:57:56 -07:00
Jens Axboe
efa8480a83 fs: RWF_NOWAIT should imply IOCB_NOIO
With the change allowing read-ahead for IOCB_NOWAIT, we changed the
RWF_NOWAIT semantics of only doing cached reads. Since we know have
IOCB_NOIO to manage that specific side of it, just make RWF_NOWAIT
imply IOCB_NOIO as well to restore the previous behavior.

Fixes: 2e85abf053 ("mm: allow read-ahead with IOCB_NOWAIT set")
Reported-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-11 08:09:01 -06:00
Linus Torvalds
8c2618a6d0 Changes in gfs2:
- Make sure transactions won't be started recursively in gfs2_block_zero_range.
   (Bug introduced in 5.4 when switching to iomap_zero_range.)
 - Fix a glock holder refcount leak introduced in the iopen glock locking
   scheme rework merged in 5.8.
 - A few other small improvements (debugging, stack usage, comment fixes).
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAl8xkmAUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZToB0w/9FANsxraTsNxxZUQuP2uyPiB/TRxY
 Vutp63Pe+eE0GNxgpfLWT+QgO99BQlGZ8WmDJ49ex0dZXZY4lv4L7JoGTV5VwAyh
 CFqRcWQkAb7zVvOxAeKfyXT0CwhS7O0avvlubSpEHfQgPkGQa3q3vu18vg1jHfB3
 OsFjZGCoyiJKmNFX005euhebnStabaGeP0+8uSz/5xHS+NQUbB3jPlgycUMvTVBe
 Lhl052rVQzlULNUCkehKnaBxzN0/8K55iFOaOSd2kzZ5BMfRabvskZEyJFf4T75p
 VJyElekk5mGV0k/FpGL03Me1oqMddDLpAFCGh9Hw51o02i8wZ3RderfQHfUmojku
 5/lLEcEJV+oC7gJR7IsGRme71De9y+uLnKvod+ayBw+9us1ZbEm4zJY7hLLGyq03
 piuFEAo0UGUmxGn8/s1RBT0lMYKjEDGIGjockaz/XzEQMSip27JZq4ATnA5CHaSy
 8q4PFflxEaXWJtPEjiY7DW1xQhYW/3cEDfd7kjSBw/GUxk1ILXRMnzCOsjrYuWfH
 ykH0gzVq7/uJln/otYa39RBdt/GQXJCzhvODeizF6B36seVX9oBBEc6TtohxREwt
 aIpupz6iGQ6GXddg1Sq6p2w3xyW8e8bG4YislEyiCyxpdOjvcYdM6cVxF7UBgtgu
 fwEbjgGO34pAO1E=
 =+Nig
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Make sure transactions won't be started recursively in
   gfs2_block_zero_range (bug introduced in 5.4 when switching to
   iomap_zero_range)

 - Fix a glock holder refcount leak introduced in the iopen glock
   locking scheme rework merged in 5.8.

 - A few other small improvements (debugging, stack usage, comment
   fixes).

* tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: When gfs2_dirty_inode gets a glock error, dump the glock
  gfs2: Never call gfs2_block_zero_range with an open transaction
  gfs2: print details on transactions that aren't properly ended
  gfs2: Fix inaccurate comment
  fs: Fix typo in comment
  gfs2: Fix refcount leak in gfs2_glock_poke
  gfs2: Pass glock holder to gfs2_file_direct_{read,write}
  gfs2: Add some flags missing from glock output
2020-08-10 18:22:43 -07:00
Linus Torvalds
b79675e15a Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "No common topic whatsoever in those, sorry"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: define inode flags using bit numbers
  iov_iter: Move unnecessary inclusion of crypto/hash.h
  dlmfs: clean up dlmfs_file_{read,write}() a bit
2020-08-07 21:14:30 -07:00
Linus Torvalds
81e11336d9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few MM hotfixes

 - kthread, tools, scripts, ntfs and ocfs2

 - some of MM

Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan,
debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  mm: vmscan: consistent update to pgrefill
  mm/vmscan.c: fix typo
  khugepaged: khugepaged_test_exit() check mmget_still_valid()
  khugepaged: retract_page_tables() remember to test exit
  khugepaged: collapse_pte_mapped_thp() protect the pmd lock
  khugepaged: collapse_pte_mapped_thp() flush the right range
  mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
  mm: thp: replace HTTP links with HTTPS ones
  mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
  mm/page_alloc.c: skip setting nodemask when we are in interrupt
  mm/page_alloc: fallbacks at most has 3 elements
  mm/page_alloc: silence a KASAN false positive
  mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
  mm/page_alloc.c: simplify pageblock bitmap access
  mm/page_alloc.c: extract the common part in pfn_to_bitidx()
  mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
  mm/shuffle: remove dynamic reconfiguration
  mm/memory_hotplug: document why shuffle_zone() is relevant
  mm/page_alloc: remove nr_free_pagecache_pages()
  mm: remove vm_total_pages
  ...
2020-08-07 11:39:33 -07:00
Peter Collingbourne
45e55300f1 mm: remove unnecessary wrapper function do_mmap_pgoff()
The current split between do_mmap() and do_mmap_pgoff() was introduced in
commit 1fcfd8db7f ("mm, mpx: add "vm_flags_t vm_flags" arg to
do_mmap_pgoff()") to support MPX.

The wrapper function do_mmap_pgoff() always passed 0 as the value of the
vm_flags argument to do_mmap().  However, MPX support has subsequently
been removed from the kernel and there were no more direct callers of
do_mmap(); all calls were going via do_mmap_pgoff().

Simplify the code by removing do_mmap_pgoff() and changing all callers to
directly call do_mmap(), which now no longer takes a vm_flags argument.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Chris Down
e809d5f0b5 tmpfs: per-superblock i_ino support
Patch series "tmpfs: inode: Reduce risk of inum overflow", v7.

In Facebook production we are seeing heavy i_ino wraparounds on tmpfs.  On
affected tiers, in excess of 10% of hosts show multiple files with
different content and the same inode number, with some servers even having
as many as 150 duplicated inode numbers with differing file content.

This causes actual, tangible problems in production.  For example, we have
complaints from those working on remote caches that their application is
reporting cache corruptions because it uses (device, inodenum) to
establish the identity of a particular cache object, but because it's not
unique any more, the application refuses to continue and reports cache
corruption.  Even worse, sometimes applications may not even detect the
corruption but may continue anyway, causing phantom and hard to debug
behaviour.

In general, userspace applications expect that (device, inodenum) should
be enough to be uniquely point to one inode, which seems fair enough.  One
might also need to check the generation, but in this case:

1. That's not currently exposed to userspace
   (ioctl(...FS_IOC_GETVERSION...) returns ENOTTY on tmpfs);
2. Even with generation, there shouldn't be two live inodes with the
   same inode number on one device.

In order to mitigate this, we take a two-pronged approach:

1. Moving inum generation from being global to per-sb for tmpfs. This
   itself allows some reduction in i_ino churn. This works on both 64-
   and 32- bit machines.
2. Adding inode{64,32} for tmpfs. This fix is supported on machines with
   64-bit ino_t only: we allow users to mount tmpfs with a new inode64
   option that uses the full width of ino_t, or CONFIG_TMPFS_INODE64.

You can see how this compares to previous related patches which didn't
implement this per-superblock:

- https://patchwork.kernel.org/patch/11254001/
- https://patchwork.kernel.org/patch/11023915/

This patch (of 2):

get_next_ino has a number of problems:

- It uses and returns a uint, which is susceptible to become overflowed
  if a lot of volatile inodes that use get_next_ino are created.
- It's global, with no specificity per-sb or even per-filesystem. This
  means it's not that difficult to cause inode number wraparounds on a
  single device, which can result in having multiple distinct inodes
  with the same inode number.

This patch adds a per-superblock counter that mitigates the second case.
This design also allows us to later have a specific i_ino size per-device,
for example, allowing users to choose whether to use 32- or 64-bit inodes
for each tmpfs mount.  This is implemented in the next commit.

For internal shmem mounts which may be less tolerant to spinlock delays,
we implement a percpu batching scheme which only takes the stat_lock at
each batch boundary.

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/cover.1594661218.git.chris@chrisdown.name
Link: http://lkml.kernel.org/r/1986b9d63b986f08ec07a4aa4b2275e718e47d8a.1594661218.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Linus Torvalds
e1ec517e18 Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull init and set_fs() cleanups from Al Viro:
 "Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series"

* 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits)
  init: add an init_dup helper
  init: add an init_utimes helper
  init: add an init_stat helper
  init: add an init_mknod helper
  init: add an init_mkdir helper
  init: add an init_symlink helper
  init: add an init_link helper
  init: add an init_eaccess helper
  init: add an init_chmod helper
  init: add an init_chown helper
  init: add an init_chroot helper
  init: add an init_chdir helper
  init: add an init_rmdir helper
  init: add an init_unlink helper
  init: add an init_umount helper
  init: add an init_mount helper
  init: mark create_dev as __init
  init: mark console_on_rootfs as __init
  init: initialize ramdisk_execute_command at compile time
  devtmpfs: refactor devtmpfsd()
  ...
2020-08-07 09:40:34 -07:00
Linus Torvalds
2324d50d05 It's been a busy cycle for documentation - hopefully the busiest for a
while to come.  Changes include:
 
  - Some new Chinese translations
 
  - Progress on the battle against double words words and non-HTTPS URLs
 
  - Some block-mq documentation
 
  - More RST conversions from Mauro.  At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again for a
    while.  Unless we decide to switch to asciidoc or something...:)
 
  - Lots of typo fixes, warning fixes, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
 cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
 BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
 =JQLZ
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-08-04 22:47:54 -07:00
Linus Torvalds
cdc8fcb499 for-5.9/io_uring-20200802
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl8m7asQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplrCD/0S17kio+k4cOJDGwl88WoJw+QiYmM5019k
 decZ1JymQvV1HXRmlcZiEAu0hHDD0FoovSRrw7II3gw3GouETmYQM62f6ZTpDeMD
 CED/fidnfULAkPaI6h+bj3jyI0cEuujG/R47rGSQEkIIr3RttqKZUzVkB9KN+KMw
 +OBuXZtMIoFFEVJ91qwC2dm2qHLqOn1/5MlT59knso/xbPOYOXsFQpGiACJqF97x
 6qSSI8uGE+HZqvL2OLWPDBbLEJhrq+dzCgxln5VlvLele4UcRhOdonUb7nUwEKCe
 zwvtXzz16u1D1b8bJL4Kg5bGqyUAQUCSShsfBJJxh6vTTULiHyCX5sQaai1OEB16
 4dpBL9E+nOUUix4wo9XBY0/KIYaPWg5L1CoEwkAXqkXPhFvNUucsC0u6KvmzZR3V
 1OogVTjl6GhS8uEVQjTKNshkTIC9QHEMXDUOHtINDCb/sLU+ANXU5UpvsuzZ9+kt
 KGc4mdyCwaKBq4YW9sVwhhq/RHLD4AUtWZiUVfOE+0cltCLJUNMbQsJ+XrcYaQnm
 W4zz22Rep+SJuQNVcCW/w7N2zN3yB6gC1qeroSLvzw4b5el2TdFp+BcgVlLHK+uh
 xjsGNCq++fyzNk7vvMZ5hVq4JGXYjza7AiP5HlQ8nqdiPUKUPatWCBqUm9i9Cz/B
 n+0dlYbRwQ==
 =2vmy
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
 "Lots of cleanups in here, hardening the code and/or making it easier
  to read and fixing bugs, but a core feature/change too adding support
  for real async buffered reads. With the latter in place, we just need
  buffered write async support and we're done relying on kthreads for
  the fast path. In detail:

   - Cleanup how memory accounting is done on ring setup/free (Bijan)

   - sq array offset calculation fixup (Dmitry)

   - Consistently handle blocking off O_DIRECT submission path (me)

   - Support proper async buffered reads, instead of relying on kthread
     offload for that. This uses the page waitqueue to drive retries
     from task_work, like we handle poll based retry. (me)

   - IO completion optimizations (me)

   - Fix race with accounting and ring fd install (me)

   - Support EPOLLEXCLUSIVE (Jiufei)

   - Get rid of the io_kiocb unionizing, made possible by shrinking
     other bits (Pavel)

   - Completion side cleanups (Pavel)

   - Cleanup REQ_F_ flags handling, and kill off many of them (Pavel)

   - Request environment grabbing cleanups (Pavel)

   - File and socket read/write cleanups (Pavel)

   - Improve kiocb_set_rw_flags() (Pavel)

   - Tons of fixes and cleanups (Pavel)

   - IORING_SQ_NEED_WAKEUP clear fix (Xiaoguang)"

* tag 'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-block: (127 commits)
  io_uring: flip if handling after io_setup_async_rw
  fs: optimise kiocb_set_rw_flags()
  io_uring: don't touch 'ctx' after installing file descriptor
  io_uring: get rid of atomic FAA for cq_timeouts
  io_uring: consolidate *_check_overflow accounting
  io_uring: fix stalled deferred requests
  io_uring: fix racy overflow count reporting
  io_uring: deduplicate __io_complete_rw()
  io_uring: de-unionise io_kiocb
  io-wq: update hash bits
  io_uring: fix missing io_queue_linked_timeout()
  io_uring: mark ->work uninitialised after cleanup
  io_uring: deduplicate io_grab_files() calls
  io_uring: don't do opcode prep twice
  io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works
  io_uring: batch put_task_struct()
  tasks: add put_task_struct_many()
  io_uring: return locked and pinned page accounting
  io_uring: don't miscount pinned memory
  io_uring: don't open-code recv kbuf managment
  ...
2020-08-03 13:01:22 -07:00
Linus Torvalds
382625d0d4 for-5.9/block-20200802
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl8m7YwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpt+dEAC7a0HYuX2OrkyawBnsgd1QQR/soC7surec
 yDDa7SMM8cOq3935bfzcYHV9FWJszEGIknchiGb9R3/T+vmSohbvDsM5zgwya9u/
 FHUIuTq324I6JWXKl30k4rwjiX9wQeMt+WZ5gC8KJYCWA296i2IpJwd0A45aaKuS
 x4bTjxqknE+fD4gQiMUSt+bmuOUAp81fEku3EPapCRYDPAj8f5uoY7R2arT/POwB
 b+s+AtXqzBymIqx1z0sZ/XcdZKmDuhdurGCWu7BfJFIzw5kQ2Qe3W8rUmrQ3pGut
 8a21YfilhUFiBv+B4wptfrzJuzU6Ps0BXHCnBsQjzvXwq5uFcZH495mM/4E4OJvh
 SbjL2K4iFj+O1ngFkukG/F8tdEM1zKBYy2ZEkGoWKUpyQanbAaGI6QKKJA+DCdBi
 yPEb7yRAa5KfLqMiocm1qCEO1I56HRiNHaJVMqCPOZxLmpXj19Fs71yIRplP1Trv
 GGXdWZsccjuY6OljoXWdEfnxAr5zBsO3Yf2yFT95AD+egtGsU1oOzlqAaU1mtflw
 ABo452pvh6FFpxGXqz6oK4VqY4Et7WgXOiljA4yIGoPpG/08L1Yle4eVc2EE01Jb
 +BL49xNJVeUhGFrvUjPGl9kVMeLmubPFbmgrtipW+VRg9W8+Yirw7DPP6K+gbPAR
 RzAUdZFbWw==
 =abJG
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
 "Good amount of cleanups and tech debt removals in here, and as a
  result, the diffstat shows a nice net reduction in code.

   - Softirq completion cleanups (Christoph)

   - Stop using ->queuedata (Christoph)

   - Cleanup bd claiming (Christoph)

   - Use check_events, moving away from the legacy media change
     (Christoph)

   - Use inode i_blkbits consistently (Christoph)

   - Remove old unused writeback congestion bits (Christoph)

   - Cleanup/unify submission path (Christoph)

   - Use bio_uninit consistently, instead of bio_disassociate_blkg
     (Christoph)

   - sbitmap cleared bits handling (John)

   - Request merging blktrace event addition (Jan)

   - sysfs add/remove race fixes (Luis)

   - blk-mq tag fixes/optimizations (Ming)

   - Duplicate words in comments (Randy)

   - Flush deferral cleanup (Yufen)

   - IO context locking/retry fixes (John)

   - struct_size() usage (Gustavo)

   - blk-iocost fixes (Chengming)

   - blk-cgroup IO stats fixes (Boris)

   - Various little fixes"

* tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
  block: blk-timeout: delete duplicated word
  block: blk-mq-sched: delete duplicated word
  block: blk-mq: delete duplicated word
  block: genhd: delete duplicated words
  block: elevator: delete duplicated word and fix typos
  block: bio: delete duplicated words
  block: bfq-iosched: fix duplicated word
  iocost_monitor: start from the oldest usage index
  iocost: Fix check condition of iocg abs_vdebt
  block: Remove callback typedefs for blk_mq_ops
  block: Use non _rcu version of list functions for tag_set_list
  blk-cgroup: show global disk stats in root cgroup io.stat
  blk-cgroup: make iostat functions visible to stat printing
  block: improve discard bio alignment in __blkdev_issue_discard()
  block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
  block: defer flush request no matter whether we have elevator
  block: make blk_timeout_init() static
  block: remove retry loop in ioc_release_fn()
  block: remove unnecessary ioc nested locking
  block: integrate bd_start_claiming into __blkdev_get
  ...
2020-08-03 11:57:03 -07:00
Linus Torvalds
690b25675f fscrypt updates for 5.9
This release, we add support for inline encryption via the blk-crypto
 framework which was added in 5.8.  Now when an ext4 or f2fs filesystem
 is mounted with '-o inlinecrypt', the contents of encrypted files will
 be encrypted/decrypted via blk-crypto, instead of directly using the
 crypto API.  This model allows taking advantage of the inline encryption
 hardware that is integrated into the UFS or eMMC host controllers on
 most mobile SoCs.  Note that this is just an alternate implementation;
 the ciphertext written to disk stays the same.
 
 (This pull request does *not* include support for direct I/O on
 encrypted files, which blk-crypto makes possible, since that part is
 still being discussed.)
 
 Besides the above feature update, there are also a few fixes and
 cleanups, e.g. strengthening some memory barriers that may be too weak.
 
 All these patches have been in linux-next with no reported issues.  I've
 also tested them with the fscrypt xfstests, as usual.  It's also been
 tested that the inline encryption support works with the support for
 Qualcomm and Mediatek inline encryption hardware that will be in the
 scsi pull request for 5.9.  Also, several SoC vendors are already using
 a previous, functionally equivalent version of these patches.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXye2EBQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK0veAQCKEnwvy+M6s2/QWhC9vo01rABMtt7h
 VRAAKPiFzLNH3AD/dCnZNsFUzk3x0ZyiU1YRW3FvlxFOaEO7Ea0Pt/pyyQ0=
 =g9FK
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt updates from Eric Biggers:
 "This release, we add support for inline encryption via the blk-crypto
  framework which was added in 5.8.

  Now when an ext4 or f2fs filesystem is mounted with '-o inlinecrypt',
  the contents of encrypted files will be encrypted/decrypted via
  blk-crypto, instead of directly using the crypto API. This model
  allows taking advantage of the inline encryption hardware that is
  integrated into the UFS or eMMC host controllers on most mobile SoCs.

  Note that this is just an alternate implementation; the ciphertext
  written to disk stays the same.

  (This pull request does *not* include support for direct I/O on
  encrypted files, which blk-crypto makes possible, since that part is
  still being discussed.)

  Besides the above feature update, there are also a few fixes and
  cleanups, e.g. strengthening some memory barriers that may be too
  weak.

  All these patches have been in linux-next with no reported issues.
  I've also tested them with the fscrypt xfstests, as usual. It's also
  been tested that the inline encryption support works with the support
  for Qualcomm and Mediatek inline encryption hardware that will be in
  the scsi pull request for 5.9. Also, several SoC vendors are already
  using a previous, functionally equivalent version of these patches"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: don't load ->i_crypt_info before it's known to be valid
  fscrypt: document inline encryption support
  fscrypt: use smp_load_acquire() for ->i_crypt_info
  fscrypt: use smp_load_acquire() for ->s_master_keys
  fscrypt: use smp_load_acquire() for fscrypt_prepared_key
  fscrypt: switch fscrypt_do_sha256() to use the SHA-256 library
  fscrypt: restrict IV_INO_LBLK_* to AES-256-XTS
  fscrypt: rename FS_KEY_DERIVATION_NONCE_SIZE
  fscrypt: add comments that describe the HKDF info strings
  ext4: add inline encryption support
  f2fs: add inline encryption support
  fscrypt: add inline encryption support
  fs: introduce SB_INLINECRYPT
2020-08-03 10:09:59 -07:00
Andreas Gruenbacher
c9dff08485 fs: Fix typo in comment
The comment for function filemap_check_wb_err accidentally refers to
it as filemap_check_wb_error.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-03 14:11:26 +02:00
Pavel Begunkov
1752f0adea fs: optimise kiocb_set_rw_flags()
Use a local var to collect flags in kiocb_set_rw_flags(). That spares
some memory writes and allows to replace most of the jumps with MOVEcc.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-01 11:02:00 -06:00
Christoph Hellwig
fd5ad30c78 fs: expose utimes_common
Rename utimes_common to vfs_utimes and make it available outside of
utimes.c.  This will be used by the initramfs unpacking code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-31 08:16:01 +02:00
Eric Biggers
6414e9b09f fs: define inode flags using bit numbers
Define the VFS inode flags using bit numbers instead of hardcoding
powers of 2, which has become unwieldy now that we're up to 65536.

No change in the actual values.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27 14:39:53 -04:00
Christoph Hellwig
9e96c8c0e9 fs: add a vfs_fchmod helper
Add a helper for struct file based chmode operations.  To be used by
the initramfs code soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-16 15:33:04 +02:00
Christoph Hellwig
c04011fe8c fs: add a vfs_fchown helper
Add a helper for struct file based chown operations.  To be used by
the initramfs code soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-16 15:33:00 +02:00
Linus Torvalds
b1b11d0063 cleanup in-kernel read and write operations
Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure
 all users of in-kernel file I/O use them if they don't use iov_iter
 based methods already.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl8Ij8gLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOcpBAAn157ooLqRrqQisEA6j59rTgkHUuqZMUx+8XjiivX
 baHQPmgctza1Xzjc4PjJ1owtLpt4ywcTpY8IDj3vZF1PpffeeuWVzxMTk/aIvhNN
 zPK2SJpRlDQHErKEhkTTOfOYoFTgc7vPa5Hvm6AEMaJs8oPtGZ2rnQHzPXENl/TY
 TgcLd1ou3iuw19UIAfB+EfuC9uhq7pCPu9+tryNyT2IfM7fqdsIhRESpcodg1ve+
 1k6leFIBrXa3MWiBGVUGCrSmlpP9xd22Zl8D/w60WeYWeg7szZoUK2bjhbdIEDZI
 tTwkdZ73IKpcxOyzUVbfr2hqNa94zrXCKQGfEGVS/7arV7QH4yvhg9NU9lqVXZKV
 ruPoyjsmJkHW52FfEEv1Gfrd6v4H6qZ6iyJEm3ZYNGul85O97t1xA/kKxAIwMuPa
 nFhhxHIooT/We3Ao77FROhIob4D5AOfOI4gvkTE15YMzsNxT/yjilQjdDFR5An6A
 ckzqb+VyDvcTx2gxR/qaol7b4lzmri4S/8Jt7WXjHOtNe9eXC4kl44leitK5j31H
 fHZNyMLJ2+/JF5pGB2rNRNnTeQ7lXKob4Y+qAjRThddDxtdsf5COdZAiIiZbRurR
 Ogl2k3sMDdHgNfycK2Bg5Fab9OIWePQlpcGU14afUSPviuNkIYKLGrx92ZWef53j
 loI=
 =eYsI
 -----END PGP SIGNATURE-----

Merge tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc

Pull in-kernel read and write op cleanups from Christoph Hellwig:
 "Cleanup in-kernel read and write operations

  Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure
  all users of in-kernel file I/O use them if they don't use iov_iter
  based methods already.

  The new WARN_ONs in combination with syzcaller already found a missing
  input validation in 9p. The fix should be on your way through the
  maintainer ASAP".

[ This is prep-work for the real changes coming 5.9 ]

* tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc:
  fs: remove __vfs_read
  fs: implement kernel_read using __kernel_read
  integrity/ima: switch to using __kernel_read
  fs: add a __kernel_read helper
  fs: remove __vfs_write
  fs: implement kernel_write using __kernel_write
  fs: check FMODE_WRITE in __kernel_write
  fs: unexport __kernel_write
  bpfilter: switch to kernel_write
  autofs: switch to kernel_write
  cachefiles: switch to kernel_write
2020-07-10 09:45:15 -07:00
Satya Tangirala
457e7a135c fs: introduce SB_INLINECRYPT
Introduce SB_INLINECRYPT, which is set by filesystems that wish to use
blk-crypto for file content en/decryption. This flag maps to the
'-o inlinecrypt' mount option which multiple filesystems will implement,
and code in fs/crypto/ needs to be able to check for this mount option
in a filesystem-independent way.

Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200702015607.1215430-2-satyat@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-08 10:29:14 -07:00
Christoph Hellwig
775802c057 fs: remove __vfs_read
Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:57 +02:00
Christoph Hellwig
61a707c543 fs: add a __kernel_read helper
This is the counterpart to __kernel_write, and skip the rw_verify_area
call compared to kernel_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00
Andreas Gruenbacher
41da51bce3 fs: Add IOCB_NOIO flag for generic_file_read_iter
Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead.  This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-07-07 23:40:08 +02:00
Mauro Carvalho Chehab
21b9cb3438 fs: fs.h: fix a kernel-doc parameter description
Changeset 3b0311e7ca71 ("vfs: track per-sb writeback errors and report them to syncfs")
added a variant of filemap_sample_wb_err(), but it forgot to
rename the arguments at the kernel-doc markup. Fix it.

Fix those warnings:
	./include/linux/fs.h:2845: warning: Function parameter or member 'file' not described in 'file_sample_sb_err'
	./include/linux/fs.h:2845: warning: Excess function parameter 'mapping' description in 'file_sample_sb_err'

Fixes: 3b0311e7ca71 ("vfs: track per-sb writeback errors and report them to syncfs")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/7b33bbceb29ac80874622a2bc84127bb10103245.1592895969.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-06-26 10:01:05 -06:00
Christoph Hellwig
621c1f4294 block: move struct block_device to blk_types.h
Move the struct block_device definition together with most of the
block layer definitions, as it has nothing to do with the rest of fs.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Christoph Hellwig
3f1266f1f8 block: move block-related definitions out of fs.h
Move most of the block related definition out of fs.h into more suitable
headers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Christoph Hellwig
dd0dca223e block: simplify sb_is_blkdev_sb
Just use IS_ENABLED instead of providing a stub for !CONFIG_BLOCK.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Christoph Hellwig
75362a1792 fs: remove the mount_bdev and kill_block_super stubs
No one calls these functions without CONFIG_BLOCK, so don't bother
stubbing them out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Christoph Hellwig
4e24566a13 fs: remove the HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL defines
These are not defined anywhere, and contrary to the comments we really
do not care about out of tree code at all.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Christoph Hellwig
7dbac5baa8 fs: remove an unused block_device_operations forward declaration
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Christoph Hellwig
764b23bd9a block: mark bd_finish_claiming static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Christoph Hellwig
b818f09e46 tty/sysrq: emergency_thaw_all does not depend on CONFIG_BLOCK
We can also thaw non-block file systems.  Remove the CONFIG_BLOCK in
sysrq.c after making the prototype available unconditionally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24 09:16:02 -06:00
Jens Axboe
c2a25ec0f1 fs: add FMODE_BUF_RASYNC
If set, this indicates that the file system supports IOCB_WAITQ for
buffered reads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
dd3e6d5039 mm: add support for async page locking
Normally waiting for a page to become unlocked, or locking the page,
requires waiting for IO to complete. Add support for lock_page_async()
and wait_on_page_locked_async(), which are callback based instead. This
allows a caller to get notified when a page becomes unlocked, rather
than wait for it.

We add a new iocb field, ki_waitq, to pass in the necessary data for this
to happen. We can unionize this with ki_cookie, since that is only used
for polled IO. Polled IO can never co-exist with async callbacks, as it is
(by definition) polled completions. struct wait_page_key is made public,
and we define struct wait_page_async as the interface between the caller
and the core.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Zheng Bin
3373a3461a block: make function 'kill_bdev' static
kill_bdev does not have any external user, so make it static.

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-18 09:24:35 -06:00
Jan Kara
5fcd57505c writeback: Drop I_DIRTY_TIME_EXPIRE
The only use of I_DIRTY_TIME_EXPIRE is to detect in
__writeback_single_inode() that inode got there because flush worker
decided it's time to writeback the dirty inode time stamps (either
because we are syncing or because of age). However we can detect this
directly in __writeback_single_inode() and there's no need for the
strange propagation with I_DIRTY_TIME_EXPIRE flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-06-15 09:18:46 +02:00