Commit graph

56547 commits

Author SHA1 Message Date
Darrick J. Wong
2587b1f1fa ocfs2: truncate page cache for clone destination file before remapping
When cloning blocks into another file, truncate the page cache before we
start remapping blocks so that concurrent reads wait for us to finish.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:42:56 +11:00
Darrick J. Wong
8c5c836bd6 vfs: clean up generic_remap_file_range_prep return value
Since the remap prep function can update the length of the remap
request, we can change this function to return the usual return status
instead of the odd behavior it has now.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:42:24 +11:00
Darrick J. Wong
c32e5f3995 vfs: hide file range comparison function
There are no callers of vfs_dedupe_file_range_compare, so we might as
well make it a static helper and remove the export.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:42:17 +11:00
Darrick J. Wong
eca3654e3c vfs: enable remap callers that can handle short operations
Plumb in a remap flag that enables the filesystem remap handler to
shorten remapping requests for callers that can handle it.  Now
copy_file_range can report partial success (in case we run up against
alignment problems, resource limits, etc.).

We also enable CAN_SHORTEN for fideduperange to maintain existing
userspace-visible behavior where xfs/btrfs shorten the dedupe range to
avoid stale post-eof data exposure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:42:10 +11:00
Darrick J. Wong
df36583619 vfs: plumb remap flags through the vfs dedupe functions
Plumb a remap_flags argument through the vfs_dedupe_file_range_one
functions so that dedupe can take advantage of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:42:03 +11:00
Darrick J. Wong
452ce65951 vfs: plumb remap flags through the vfs clone functions
Plumb a remap_flags argument through the {do,vfs}_clone_file_range
functions so that clone can take advantage of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:56 +11:00
Darrick J. Wong
42ec3d4c02 vfs: make remap_file_range functions take and return bytes completed
Change the remap_file_range functions to take a number of bytes to
operate upon and return the number of bytes they operated on.  This is a
requirement for allowing fs implementations to return short clone/dedupe
results to the user, which will enable us to obey resource limits in a
graceful manner.

A subsequent patch will enable copy_file_range to signal to the
->clone_file_range implementation that it can handle a short length,
which will be returned in the function's return value.  For now the
short return is not implemented anywhere so the behavior won't change --
either copy_file_range manages to clone the entire range or it tries an
alternative.

Neither clone ioctl can take advantage of this, alas.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:49 +11:00
Darrick J. Wong
8dde90bca6 vfs: remap helper should update destination inode metadata
Extend generic_remap_file_range_prep to handle inode metadata updates
when remapping into a file.  If the operation can possibly alter the
file contents, we must update the ctime and mtime and remove security
privileges, just like we do for regular file writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:41 +11:00
Darrick J. Wong
3d28193e1d vfs: pass remap flags to generic_remap_checks
Pass the same remap flags to generic_remap_checks for consistency.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:34 +11:00
Darrick J. Wong
a91ae49bba vfs: pass remap flags to generic_remap_file_range_prep
Plumb the remap flags through the filesystem from the vfs function
dispatcher all the way to the prep function to prepare for behavior
changes in subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:28 +11:00
Darrick J. Wong
2e5dfc99f2 vfs: combine the clone and dedupe into a single remap_file_range
Combine the clone_file_range and dedupe_file_range operations into a
single remap_file_range file operation dispatch since they're
fundamentally the same operation.  The differences between the two can
be made in the prep functions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:21 +11:00
Darrick J. Wong
6095028b45 vfs: rename clone_verify_area to remap_verify_area
Since we use clone_verify_area for both clone and dedupe range checks,
rename the function to make it clear that it's for both.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:14 +11:00
Darrick J. Wong
a83ab01a62 vfs: rename vfs_clone_file_prep to be more descriptive
The vfs_clone_file_prep is a generic function to be called by filesystem
implementations only.  Rename the prefix to generic_ and make it more
clear that it applies to remap operations, not just clones.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:08 +11:00
Darrick J. Wong
9aae20500d vfs: skip zero-length dedupe requests
Don't bother calling the filesystem for a zero-length dedupe request;
we can return zero and exit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:41:01 +11:00
Darrick J. Wong
07d19dc9fb vfs: avoid problematic remapping requests into partial EOF block
A deduplication data corruption is exposed in XFS and btrfs. It is
caused by extending the block match range to include the partial EOF
block, but then allowing unknown data beyond EOF to be considered a
"match" to data in the destination file because the comparison is only
made to the end of the source file. This corrupts the destination file
when the source extent is shared with it.

The VFS remapping prep functions  only support whole block dedupe, but
we still need to appear to support whole file dedupe correctly.  Hence
if the dedupe request includes the last block of the souce file, don't
include it in the actual dedupe operation. If the rest of the range
dedupes successfully, then reject the entire request.  A subsequent
patch will enable us to shorten dedupe requests correctly.

When reflinking sub-file ranges, a data corruption can occur when the
source file range includes a partial EOF block. This shares the unknown
data beyond EOF into the second file at a position inside EOF, exposing
stale data in the second file.

If the reflink request includes the last block of the souce file, only
proceed with the reflink operation if it lands at or past the
destination file's current EOF. If it lands within the destination file
EOF, reject the entire request with -EINVAL and make the caller go the
hard way.  A subsequent patch will enable us to shorten reflink requests
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:40:55 +11:00
Darrick J. Wong
2c5773f102 vfs: exit early from zero length remap operations
If a remap caller asks us to remap to the source file's EOF and the
source file length leaves us with a zero byte request, exit early.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:40:39 +11:00
Darrick J. Wong
1383a7ed67 vfs: check file ranges before cloning files
Move the file range checks from vfs_clone_file_prep into a separate
generic_remap_checks function so that all the checks are collected in a
central location.  This forms the basis for adding more checks from
generic_write_checks that will make cloning's input checking more
consistent with write input checking.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:40:31 +11:00
Darrick J. Wong
5b49f64db2 vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOF
vfs_clone_file_prep_inodes cannot return 0 if it is asked to remap from
a zero byte file because that's what btrfs does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 10:40:22 +11:00
Linus Torvalds
134bf98c55 media updates for v4.20-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb12wyAAoJEAhfPr2O5OEVQG4P/3QlXjec6qlhbo6UPs54E2sC
 bVdZfp3mobo8NmLRt791Yh9cc0rN45Tlf2BT8XEmCyI6+NB++obU/j0LW5XT7sp7
 oE8IgeRraVFWH/Xl9lTgP15Cs6v43eyvP12xgRWBmr+TYugLHDVTheGBvU/COb3d
 yaykUULezuOMLA3HsPbz5EJOmU5rZ/Wa1w1sAiNJY/cRohfVb3kO4593enwUTMSx
 yHJ+AVjl/Dn3RV4yLwoybpxPH6XIb3KoLg/6Fx8bOlKy1sg0mcWpzQ1CvMUNpXTF
 kdwTw3ri1bfYnjChZewuKoJU8Wcw0Gt7pkqAhULN1ieo84MNA3bNor56pdRPaOZW
 KxzlXZRS6xgYW8bzZ51N0Ku6fwSt3AWRE7TeKcrHF84Yb8vOtPS15sp3qc+9o9rb
 EDV/lJLcz4bbi3W28di5WMFaN7LHxCHnRV7GvrcNQm6Im62CBFZHiI7jKjMv3tXp
 Taes0utMPGfWuY6fv4LmuBzFG4nGB6/H4RiVvL1cLkjnx/FJtWGH+1uOcKDraKeI
 ENBrK0VYrNH7nCDGNehiamStcVK+27tS+xsuqoZkGz6RA8vAxYBXTIZULXA98BPA
 f6NC32ZNJaruxh4qh5tUy+LKPGzXs0sWa9kfgKmFfaOndFLMjGTXHpAT5AYJMbNe
 iqKi/4aXD4aKAWTA7PPg
 =Cc6D
 -----END PGP SIGNATURE-----

Merge tag 'media/v4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - new dvb frontend driver: lnbh29

 - new sensor drivers: imx319 and imx 355

 - some old soc_camera driver renames to avoid conflict with new
   drivers

 - new i.MX Pixel Pipeline (PXP) mem-to-mem platform driver

 - a new V4L2 frontend for the FWHT codec

 - several other improvements, bug fixes, code cleanups, etc

* tag 'media/v4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (289 commits)
  media: rename soc_camera I2C drivers
  media: cec: forgot to cancel delayed work
  media: vivid: Support 480p for webcam capture
  media: v4l2-tpg: fix kernel oops when enabling HFLIP and OSD
  media: vivid: Add 16-bit bayer to format list
  media: v4l2-tpg-core: Add 16-bit bayer
  media: pvrusb2: replace `printk` with `pr_*`
  media: venus: vdec: fix decoded data size
  media: cx231xx: fix potential sign-extension overflow on large shift
  media: dt-bindings: media: rcar_vin: add device tree support for r8a7744
  media: isif: fix a NULL pointer dereference bug
  media: exynos4-is: make const array config_ids static
  media: cx23885: make const array addr_list static
  media: ivtv: make const array addr_list static
  media: bttv-input: make const array addr_list static
  media: cx18: Don't check for address of video_dev
  media: dw9807-vcm: Fix probe error handling
  media: dw9714: Remove useless error message
  media: dw9714: Fix error handling in probe function
  media: cec: name for RC passthrough device does not need 'RC for'
  ...
2018-10-29 14:29:58 -07:00
Amir Goldstein
93f38b6fae lockd: fix access beyond unterminated strings in prints
printk format used %*s instead of %.*s, so hostname_len does not limit
the number of bytes accessed from hostname.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Andrew Elble
bd8d725078 nfsd: correctly decrement odstate refcount in error path
alloc_init_deleg() both allocates an nfs4_delegation, and
bumps the refcount on odstate. So after this point, we need to
put_clnt_odstate() and nfs4_put_stid() to not leave the odstate
refcount inappropriately bumped.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Gustavo A. R. Silva
0ac203cb1f nfsd: fix fall-through annotations
Replace "fallthru" with a proper "fall through" annotation.

Also, add an annotation were it is expected to fall through.

These fixes are part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
736c6625de knfsd: Improve lookup performance in the duplicate reply cache using an rbtree
Use an rbtree to ensure the lookup/insert of an entry in a DRC bucket is
O(log(N)).

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
ed00c2f652 knfsd: Further simplify the cache lookup
Order the structure so that the key can be compared using memcmp().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
76ecec2119 knfsd: Simplify NFS duplicate replay cache
Simplify the duplicate replay cache by initialising the preallocated
cache entry, so that we can use it as a key for the cache lookup.

Note that the 99.999% case we want to optimise for is still the one
where the lookup fails, and we have to add this entry to the cache,
so preinitialising should not cause a performance penalty.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
3e87da5145 knfsd: Remove dead code from nfsd_cache_lookup
The preallocated cache entry is always set to type RC_NOCACHE, and that
type isn't changed until we later call nfsd_cache_update().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
a6482733bc NFS: Fix up a typo in nfs_dns_ent_put
call_rcu() needs to take a first argument of type (struct rcu_head *).

Fixes: fd497f1e40d9 ("NFS: Lockless DNS lookups")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
437f914513 NFS: Lockless DNS lookups
Enable RCU protected lookup in the legacy DNS resolver.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
9d5afd9491 knfsd: Lockless lookup of NFSv4 identities.
Enable RCU protected lookups of the NFSv4 idmap.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
9ceddd9da1 knfsd: Allow lockless lookups of the exports
Convert structs svc_expkey and svc_export to allow RCU protected lookups.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Linus Torvalds
e64433d587 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlvW43UACgkQnJ2qBz9k
 QNnqtQgA2uzRlz6U7UayQl9JiYKd2XbuojmAE+irdSL5+4OzpqkOsRfLzGSKAfvs
 ekv1eVv+4+PS90FUNbvwmX/OzZ9wi3e5d3/qfnJ7l2ZsMfKuc9aW/9I8EXPXkpAB
 O7NgoTtOZMXTJXMhseMmha2JfpbQZZ566NzCDfhOfPKqylbjTEM58pcY382VGRDX
 Iv1DNwrzPw7PaOOYO3P/vWLeb4GGpMkdG61eoTBMi6SKb/5QMc6MS+WqYzmdLZWE
 aP4tK8VhC0L47i0myXzWOHMrjQysq+E24CuQ6zG2O4bFRZj1fT+hiST9SwyUim2+
 Ne8P5gnHJiBSZgYtKBoETNI0jORzCA==
 =9DMi
 -----END PGP SIGNATURE-----

Merge tag 'filesystems_for_v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext2 and udf updates from Jan Kara:
 "Small ext2 cleanups and a couple of udf fixes"

* tag 'filesystems_for_v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: remove redundant building macro check
  udf: Drop pack pragma from udf_sb.h
  udf: Drop freed bitmap / table support
  udf: Fix crash during mount
  udf: Prevent write-unsupported filesystem to be remounted read-write
  ext2: cache NULL when both default_acl and acl are NULL
  udf: remove unused variables group_start and nr_groups
2018-10-29 10:23:36 -07:00
Linus Torvalds
79257514f5 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlvWyDMACgkQnJ2qBz9k
 QNnifgf+PXybPXX3KxtRUmK4u2zX2JMTwzuE0wmLxM6I08tf7rzLrBIbOY7iXka/
 nzW6IK+KnA5HtPTEUbxqNBAvWpUAvPLZ/v20d0t/QTMJcz8yfhpvM9O2mjQAGMH8
 EBmjjEhZaso8uOIAPhUg9um1QdQoYWa329fsoQuHor9kjKmDg+3RmtdH0jbRzQ6B
 RNAY1WNFbm+7MH7Fu3AB/jLqqkwZhoPcu7TwXP6m+va6xAvzEYUOQQB9rPEIaY2Z
 +q0B9LhwFIAnWPCI7dxw3CBTndoR2u1vkpnGw5FFhJgnMG4L1QMPoCCYPIZEIXg/
 VuGZQ0/mayCtO+JWw+VDJF3jQFrHxA==
 =J6tx
 -----END PGP SIGNATURE-----

Merge tag 'for_v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "Amir's patches to implement superblock fanotify watches, Xiaoming's
  patch to enable reporting of thread IDs in fanotify events instead of
  TGIDs (sadly the patch got mis-attributed to Amir and I've noticed
  only now), and a fix of possible oops on umount caused by fsnotify
  infrastructure"

* tag 'for_v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Fix busy inodes during unmount
  fs: group frequently accessed fields of struct super_block together
  fanotify: support reporting thread id instead of process id
  fanotify: add BUILD_BUG_ON() to count the bits of fanotify constants
  fsnotify: convert runtime BUG_ON() to BUILD_BUG_ON()
  fanotify: deprecate uapi FAN_ALL_* constants
  fanotify: simplify handling of FAN_ONDIR
  fsnotify: generalize handling of extra event flags
  fanotify: fix collision of internal and uapi mark flags
  fanotify: store fanotify_init() flags in group's fanotify_data
  fanotify: add API to attach/detach super block mark
  fsnotify: send path type events to group with super block marks
  fsnotify: add super block object type
2018-10-29 09:19:53 -07:00
Linus Torvalds
7da4221b53 Pull request for inclusion in 4.20
* Finish removing the custom 9p request cache mechanism
  * Embed part of the fcall in the request to have better slab
 performance (msize usually is power of two aligned)
  * syzkaller fixes:
   - add a refcount to 9p requests to avoid use after free
   - a few double free issues
  * A few coverity fixes
  * Some old patches that were in the bugzilla:
   - do not trust pdu content for size header
   - mount option for lock retry interval
 
 ----------------------------------------------------------------
 Dan Carpenter (1):
       9p: potential NULL dereference
 
 Dinu-Razvan Chis-Serban (1):
       9p locks: add mount option for lock retry interval
 
 Dominique Martinet (12):
       9p/xen: fix check for xenbus_read error in front_probe
       v9fs_dir_readdir: fix double-free on p9stat_read error
       9p: clear dangling pointers in p9stat_free
       9p: embed fcall in req to round down buffer allocs
       9p: add a per-client fcall kmem_cache
       9p/rdma: do not disconnect on down_interruptible EAGAIN
       9p: acl: fix uninitialized iattr access
       9p/rdma: remove useless check in cm_event_handler
       9p: p9dirent_read: check network-provided name length
       9p locks: fix glock.client_id leak in do_lock
       9p/trans_fd: abort p9_read_work if req status changed
       9p/trans_fd: put worker reqs on destroy
 
 Gertjan Halkes (1):
       9p: do not trust pdu content for stat item size
 
 Gustavo A. R. Silva (1):
       9p: fix spelling mistake in fall-through annotation
 
 Matthew Wilcox (2):
       9p: Use a slab for allocating requests
       9p: Remove p9_idpool
 
 Tomas Bortoli (3):
       9p: rename p9_free_req() function
       9p: Add refcount to p9_req_t
       9p: Rename req to rreq in trans_fd
 
  fs/9p/acl.c             |   2 +-
  fs/9p/v9fs.c            |  21 +++++
  fs/9p/v9fs.h            |   1 +
  fs/9p/vfs_dir.c         |  19 +---
  fs/9p/vfs_file.c        |  24 +++++-
  include/net/9p/9p.h     |  12 +--
  include/net/9p/client.h |  71 ++++++---------
  net/9p/Makefile         |   1 -
  net/9p/client.c         | 551 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------
  net/9p/mod.c            |   9 +-
  net/9p/protocol.c       |  20 ++++-
  net/9p/trans_fd.c       |  64 +++++++++-----
  net/9p/trans_rdma.c     |  37 ++++----
  net/9p/trans_virtio.c   |  44 +++++++---
  net/9p/trans_xen.c      |  17 ++--
  net/9p/util.c           | 140 ------------------------------
  16 files changed, 482 insertions(+), 551 deletions(-)
  delete mode 100644 net/9p/util.c
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAlvWYE0ACgkQq06b7GqY
 5nBBDxAAkz4rGRsnRpx9D1Bt+81eARTqsWcoPM138zEbwiJehe5aSF8x/CGyirDW
 +PoES/efho50sDfaUL9eBMgyy95UMN7LepMTcjNE1tywA+tRsH3dOL4757+lBwZc
 FGsHvfse4b5ZzmMxSo2L5KbzpNiUEXCJGEvNlBSbeWcSSUtCfOShMVx9IC6r7JhX
 pbsoczQ6W+384V4WgBxBZZf99EryxSeYJ9KMU+9b13ndzKICJqYeaEdItt28uYUC
 d05iXDEDQZD3B8R/1Y08ktMqmHLXP9kZdJ/w8HhCMRAxWKUD1pfDwJ45gIwhCb5R
 KxdxZ2tPvc8ihGngNJ4VaJoQQFTVzDgbyWXjwAFYkFUuPFgmuAgx3qApOhUFHCdM
 8vgJd2u0attNwfOM3VsJApLQu09wmRw1LcmJ9re5OJ/YLiAUF3vw2oTpGhGlKRRI
 bMBI9rwXHzeoVI7ank5mh1NMVRpMbzL9A8uqux85gcpXL36V3XmZtKRDe1AAa5jB
 JR6dciAUABm9m+Ilt1Pl/faSjmysUOb1wZ0XU/cvVy5EBMUjbHv3x0TuDmrVwDas
 puUXEM8soVf9e/NUYqTOozwFIle6KFcyPmae/OopATLgkGES4cjDy9cLh5SgnERM
 vMcN/Z+DQO98zM/fPRy2B4qklflgTKI6VI/gSLRCWGpn49Iwhhg=
 =g1OF
 -----END PGP SIGNATURE-----

Merge tag '9p-for-4.20' of git://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:
 "Highlights this time around are the end of Matthew's work to remove
  the custom 9p request cache and use a slab directly for requests, with
  some extra patches on my end to not degrade performance, but it's a
  very good cleanup.

  Tomas and I fixed a few more syzkaller bugs (refcount is the big one),
  and I had a go at the coverity bugs and at some of the bugzilla
  reports we had open for a while.

  I'm a bit disappointed that I couldn't get much reviews for a few of
  my own patches, but the big ones got some and it's all been soaking in
  linux-next for quite a while so I think it should be OK.

  Summary:

   - Finish removing the custom 9p request cache mechanism

   - Embed part of the fcall in the request to have better slab
     performance (msize usually is power of two aligned)

   - syzkaller fixes:
      * add a refcount to 9p requests to avoid use after free
      * a few double free issues

   - A few coverity fixes

   - Some old patches that were in the bugzilla:
      * do not trust pdu content for size header
      * mount option for lock retry interval"

* tag '9p-for-4.20' of git://github.com/martinetd/linux: (21 commits)
  9p/trans_fd: put worker reqs on destroy
  9p/trans_fd: abort p9_read_work if req status changed
  9p: potential NULL dereference
  9p locks: fix glock.client_id leak in do_lock
  9p: p9dirent_read: check network-provided name length
  9p/rdma: remove useless check in cm_event_handler
  9p: acl: fix uninitialized iattr access
  9p locks: add mount option for lock retry interval
  9p: do not trust pdu content for stat item size
  9p: Rename req to rreq in trans_fd
  9p: fix spelling mistake in fall-through annotation
  9p/rdma: do not disconnect on down_interruptible EAGAIN
  9p: Add refcount to p9_req_t
  9p: rename p9_free_req() function
  9p: add a per-client fcall kmem_cache
  9p: embed fcall in req to round down buffer allocs
  9p: Remove p9_idpool
  9p: Use a slab for allocating requests
  9p: clear dangling pointers in p9stat_free
  v9fs_dir_readdir: fix double-free on p9stat_read error
  ...
2018-10-29 09:09:47 -07:00
Linus Torvalds
dad4f140ed Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax
Pull XArray conversion from Matthew Wilcox:
 "The XArray provides an improved interface to the radix tree data
  structure, providing locking as part of the API, specifying GFP flags
  at allocation time, eliminating preloading, less re-walking the tree,
  more efficient iterations and not exposing RCU-protected pointers to
  its users.

  This patch set

   1. Introduces the XArray implementation

   2. Converts the pagecache to use it

   3. Converts memremap to use it

  The page cache is the most complex and important user of the radix
  tree, so converting it was most important. Converting the memremap
  code removes the only other user of the multiorder code, which allows
  us to remove the radix tree code that supported it.

  I have 40+ followup patches to convert many other users of the radix
  tree over to the XArray, but I'd like to get this part in first. The
  other conversions haven't been in linux-next and aren't suitable for
  applying yet, but you can see them in the xarray-conv branch if you're
  interested"

* 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
  radix tree: Remove multiorder support
  radix tree test: Convert multiorder tests to XArray
  radix tree tests: Convert item_delete_rcu to XArray
  radix tree tests: Convert item_kill_tree to XArray
  radix tree tests: Move item_insert_order
  radix tree test suite: Remove multiorder benchmarking
  radix tree test suite: Remove __item_insert
  memremap: Convert to XArray
  xarray: Add range store functionality
  xarray: Move multiorder_check to in-kernel tests
  xarray: Move multiorder_shrink to kernel tests
  xarray: Move multiorder account test in-kernel
  radix tree test suite: Convert iteration test to XArray
  radix tree test suite: Convert tag_tagged_items to XArray
  radix tree: Remove radix_tree_clear_tags
  radix tree: Remove radix_tree_maybe_preload_order
  radix tree: Remove split/join code
  radix tree: Remove radix_tree_update_node_t
  page cache: Finish XArray conversion
  dax: Convert page fault handlers to XArray
  ...
2018-10-28 11:35:40 -07:00
Linus Torvalds
345671ea0f Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc things

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
  hugetlbfs: dirty pages as they are added to pagecache
  mm: export add_swap_extent()
  mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
  tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE
  mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
  mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
  mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
  mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
  mm/gup: cache dev_pagemap while pinning pages
  Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"
  mm: return zero_resv_unavail optimization
  mm: zero remaining unavailable struct pages
  tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option
  tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option
  tools/testing/selftests/vm/gup_benchmark.c: allow user specified file
  tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage
  mm/gup_benchmark.c: add additional pinning methods
  mm/gup_benchmark.c: time put_page()
  mm: don't raise MEMCG_OOM event due to failed high-order allocation
  mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
  ...
2018-10-26 19:33:41 -07:00
Johannes Weiner
4b85afbdac mm: zero-seek shrinkers
The page cache and most shrinkable slab caches hold data that has been
read from disk, but there are some caches that only cache CPU work, such
as the dentry and inode caches of procfs and sysfs, as well as the subset
of radix tree nodes that track non-resident page cache.

Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS for
the shrinker's seeks setting tells the reclaim algorithm that for every
two page cache pages scanned it should scan one slab object.

This is a bogus setting.  A virtual inode that required no IO to create is
not twice as valuable as a page cache page; shadow cache entries with
eviction distances beyond the size of memory aren't either.

In most cases, the behavior in practice is still fine.  Such virtual
caches don't tend to grow and assert themselves aggressively, and usually
get picked up before they cause problems.  But there are scenarios where
that's not true.

Our database workloads suffer from two of those.  For one, their file
workingset is several times bigger than available memory, which has the
kernel aggressively create shadow page cache entries for the non-resident
parts of it.  The workingset code does tell the VM that most of these are
expendable, but the VM ends up balancing them 2:1 to cache pages as per
the seeks setting.  This is a huge waste of memory.

These workloads also deal with tens of thousands of open files and use
/proc for introspection, which ends up growing the proc_inode_cache to
absurdly large sizes - again at the cost of valuable cache space, which
isn't a reasonable trade-off, given that proc inodes can be re-created
without involving the disk.

This patch implements a "zero-seek" setting for shrinkers that results in
a target ratio of 0:1 between their objects and IO-backed caches.  This
allows such virtual caches to grow when memory is available (they do
cache/avoid CPU work after all), but effectively disables them as soon as
IO-backed objects are under pressure.

It then switches the shrinkers for procfs and sysfs metadata, as well as
excess page cache shadow nodes, to the new zero-seek setting.

Link: http://lkml.kernel.org/r/20181009184732.762-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Domas Mituzas <dmituzas@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:33 -07:00
Johannes Weiner
8508cf3ffa sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:32 -07:00
Vlastimil Babka
61f94e18de mm, proc: add KReclaimable to /proc/meminfo
The vmstat NR_KERNEL_MISC_RECLAIMABLE counter is for kernel non-slab
allocations that can be reclaimed via shrinker.  In /proc/meminfo, we can
show the sum of all reclaimable kernel allocations (including slab) as
"KReclaimable".  Add the same counter also to per-node meminfo under /sys

With this counter, users will have more complete information about kernel
memory usage.  Non-slab reclaimable pages (currently just the ION
allocator) will not be missing from /proc/meminfo, making users wonder
where part of their memory went.  More precisely, they already appear in
MemAvailable, but without the new counter, it's not obvious why the value
in MemAvailable doesn't fully correspond with the sum of other counters
participating in it.

Link: http://lkml.kernel.org/r/20180731090649.16028-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:32 -07:00
Vlastimil Babka
2e03b4bc4a dcache: allocate external names from reclaimable kmalloc caches
We can use the newly introduced kmalloc-reclaimable-X caches, to allocate
external names in dcache, which will take care of the proper accounting
automatically, and also improve anti-fragmentation page grouping.

This effectively reverts commit f1782c9bc5 ("dcache: account external
names as indirectly reclaimable memory") and instead passes
__GFP_RECLAIMABLE to kmalloc().  The accounting thus moves from
NR_INDIRECTLY_RECLAIMABLE_BYTES to NR_SLAB_RECLAIMABLE, which is also
considered in MemAvailable calculation and overcommit decisions.

Link: http://lkml.kernel.org/r/20180731090649.16028-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:32 -07:00
Nicolas Pitre
7f2764cfbd cramfs: convert to use vmf_insert_mixed
cramfs is the only remaining user of vm_insert_mixed() and should be
converted to vmf_insert_mixed().

Based on a previous patch from Matthew Wilcox.

Link: http://lkml.kernel.org/r/nycvar.YSQ.7.76.1808290945450.10215@knanqh.ubzr
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>a
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:19 -07:00
Souptick Joarder
5780a02fd1 fs/iomap.c: change return type to vm_fault_t
Change iomap_page_mkwrite() return type to vm_fault_t.

see commit 1c8f422059 ("mm: change return type to vm_fault_t") for
reference.

Link: http://lkml.kernel.org/r/20180827172050.GA18673@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
YueHaibing
867632d6a6 ocfs2: remove set but not used variable 'rb'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/ocfs2/refcounttree.c: In function 'ocfs2_create_reflink_node':
fs/ocfs2/refcounttree.c:4138:31: warning:
 variable 'rb' set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1536198443-113047-1-git-send-email-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Jia-Ju Bai
999865764f fs/ocfs2/dlm/dlmdebug.c: fix a sleep-in-atomic-context bug in dlm_print_one_mle()
The kernel module may sleep with holding a spinlock.

The function call paths (from bottom to top) in Linux-4.16 are:

[FUNC] get_zeroed_page(GFP_NOFS)
fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle
fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle
fs/ocfs2/dlm/dlmmaster.c, 255: __dlm_put_mle in dlm_put_mle
fs/ocfs2/dlm/dlmmaster.c, 254: spin_lock in dlm_put_ml

[FUNC] get_zeroed_page(GFP_NOFS)
fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle
fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle
fs/ocfs2/dlm/dlmmaster.c, 222: __dlm_put_mle in dlm_put_mle_inuse
fs/ocfs2/dlm/dlmmaster.c, 219: spin_lock in dlm_put_mle_inuse

To fix this bug, GFP_NOFS is replaced with GFP_ATOMIC.

This bug is found by my static analysis tool DSAC.

Link: http://lkml.kernel.org/r/20180901112528.27025-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Ding Xiang
0ae1c2dbdc ocfs2: remove unneeded null check
Null check for kfree is unnecessary, so remove it.

Link: http://lkml.kernel.org/r/1535704514-26559-1-git-send-email-dingxiang@cmss.chinamobile.com
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Colin Ian King
2de24cb742 ocfs2: remove unused pointer 'eb'
Pointer 'eb' is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
warning: variable 'eb' set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/20180828141907.10826-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Nathan Chancellor
32c1b90dcd ocfs2/dlm: remove unnecessary parentheses
Clang warns when more than one set of parentheses is used for a
single conditional statement:

fs/ocfs2/dlm/dlmthread.c:534:18: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
        if ((res->owner == dlm->node_num)) {
             ~~~~~~~~~~~^~~~~~~~~~~~~~~~
fs/ocfs2/dlm/dlmthread.c:534:18: note: remove extraneous parentheses around the
      comparison to silence this warning
        if ((res->owner == dlm->node_num)) {
            ~           ^               ~

Link: http://lkml.kernel.org/r/20180924181929.6853-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Christoph Hellwig
ae62c16e10 userfaultfd: disable irqs when taking the waitqueue lock
userfaultfd contains howe-grown locking of the waitqueue lock, and does
not disable interrupts.  This relies on the fact that no one else takes it
from interrupt context and violates an invariat of the normal waitqueue
locking scheme.  With aio poll it is easy to trigger other locks that
disable interrupts (or are called from interrupt context).

Link: http://lkml.kernel.org/r/20181018154101.18750-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[4.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Vlastimil Babka
fa76da461b mm: /proc/pid/smaps_rollup: fix NULL pointer deref in smaps_pte_range()
Leonardo reports an apparent regression in 4.19-rc7:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 6032 Comm: python Not tainted 4.19.0-041900rc7-lowlatency #201810071631
 Hardware name: LENOVO 80UG/Toronto 4A2, BIOS 0XCN45WW 08/09/2018
 RIP: 0010:smaps_pte_range+0x32d/0x540
 Code: 80 00 00 00 00 74 a9 48 89 de 41 f6 40 52 40 0f 85 04 02 00 00 49 2b 30 48 c1 ee 0c 49 03 b0 98 00 00 00 49 8b 80 a0 00 00 00 <48> 8b b8 f0 00 00 00 e8 b7 ef ec ff 48 85 c0 0f 84 71 ff ff ff a8
 RSP: 0018:ffffb0cbc484fb88 EFLAGS: 00010202
 RAX: 0000000000000000 RBX: 0000560ddb9e9000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000560ddb9e9 RDI: 0000000000000001
 RBP: ffffb0cbc484fbc0 R08: ffff94a5a227a578 R09: ffff94a5a227a578
 R10: 0000000000000000 R11: 0000560ddbbe7000 R12: ffffe903098ba728
 R13: ffffb0cbc484fc78 R14: ffffb0cbc484fcf8 R15: ffff94a5a2e9cf48
 FS:  00007f6dfb683740(0000) GS:ffff94a5aaf80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000000f0 CR3: 000000011c118001 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  __walk_page_range+0x3c2/0x6f0
  walk_page_vma+0x42/0x60
  smap_gather_stats+0x79/0xe0
  ? gather_pte_stats+0x320/0x320
  ? gather_hugetlb_stats+0x70/0x70
  show_smaps_rollup+0xcd/0x1c0
  seq_read+0x157/0x400
  __vfs_read+0x3a/0x180
  ? security_file_permission+0x93/0xc0
  ? security_file_permission+0x93/0xc0
  vfs_read+0x8f/0x140
  ksys_read+0x55/0xc0
  __x64_sys_read+0x1a/0x20
  do_syscall_64+0x5a/0x110
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Decoded code matched to local compilation+disassembly points to
smaps_pte_entry():

        } else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
                                                        && pte_none(*pte))) {
                page = find_get_entry(vma->vm_file->f_mapping,
                                                linear_page_index(vma, addr));

Here, vma->vm_file is NULL.  mss->check_shmem_swap should be false in that
case, however for smaps_rollup, smap_gather_stats() can set the flag true
for one vma and leave it true for subsequent vma's where it should be
false.

To fix, reset the check_shmem_swap flag to false.  There's also related
bug which sets mss->swap to shmem_swapped, which in the context of
smaps_rollup overwrites any value accumulated from previous vma's.  Fix
that as well.

Note that the report suggests a regression between 4.17.19 and 4.19-rc7,
which makes the 4.19 series ending with commit 258f669e7e ("mm:
/proc/pid/smaps_rollup: convert to single value seq_file") suspicious.
But the mss was reused for rollup since 493b0e9d94 ("mm: add
/proc/pid/smaps_rollup") so let's play it safe with the stable backport.

Link: http://lkml.kernel.org/r/555fbd1f-4ac9-0b58-dcd4-5dc4380ff7ca@suse.cz
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201377
Fixes: 493b0e9d94 ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Leonardo Soares Müller <leozinho29_eu@hotmail.com>
Tested-by: Leonardo Soares Müller <leozinho29_eu@hotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Chengguang Xu
14fa085640 ovl: using posix_acl_xattr_size() to get size instead of posix_acl_to_xattr()
There is no functional change but it seems better to get size by calling
posix_acl_xattr_size() instead of calling posix_acl_to_xattr() with
NULL buffer argument. Additionally, remove unnecessary assignments.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-26 23:34:40 +02:00
Amir Goldstein
1e92e3072c ovl: abstract ovl_inode lock with a helper
The abstraction improves code readabilty (to some).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-26 23:34:40 +02:00