Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The first two changes involve files outside of fs/ext4:

   - submit_bh() can never return an error, so change it to return void,
     and remove the unused checks from its callers

   - fix I_DIRTY_TIME handling so it will be set even if the inode
     already has I_DIRTY_INODE

  Performance:

   - Always enable the i_version counter (as btrfs and xfs already do).
     Remove some unneeded i_version bumps to avoid unnecessary NFS cache
     invalidations

   - Wake up journal waiters in FIFO order, so that journal users are not
     left waiting an unfairly long time for a journal handle

   - In ext4_write_begin() allocate any necessary buffer heads before
     starting the journal handle

   - Don't try to prefetch the block allocation bitmaps for a read-only
     file system

  Bug Fixes:

   - Fix a number of fast commit bugs, including resource leaks and
     out-of-bounds references in various error handling paths and/or if
     the fast commit log is corrupted

   - Avoid stopping the online resize early when expanding a file system
     which is less than 16TiB to a size greater than 16TiB

   - Fix apparent metadata corruption caused by a race with a metadata
     buffer head getting migrated while it was being read

   - Mark the lazy initialization thread freezable to prevent suspend
     failures

   - Other miscellaneous bug fixes

  Cleanups:

   - Break up the incredibly long ext4_fill_super() function by
     refactoring to move code into more understandable, smaller
     functions

   - Remove the deprecated (and ignored) noacl and nouser_xattr mount
     options

   - Factor out some common code in fast commit handling

   - Other miscellaneous cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (53 commits)
  ext4: fix potential out of bound read in ext4_fc_replay_scan()
  ext4: factor out ext4_fc_get_tl()
  ext4: introduce EXT4_FC_TAG_BASE_LEN helper
  ext4: factor out ext4_free_ext_path()
  ext4: remove unnecessary drop path references in mext_check_coverage()
  ext4: update 'state->fc_regions_size' after successful memory allocation
  ext4: fix potential memory leak in ext4_fc_record_regions()
  ext4: fix potential memory leak in ext4_fc_record_modified_inode()
  ext4: remove redundant checking in ext4_ioctl_checkpoint
  jbd2: add miss release buffer head in fc_do_one_pass()
  ext4: move DIOREAD_NOLOCK setting to ext4_set_def_opts()
  ext4: remove useless local variable 'blocksize'
  ext4: unify the ext4 super block loading operation
  ext4: factor out ext4_journal_data_mode_check()
  ext4: factor out ext4_load_and_init_journal()
  ext4: factor out ext4_group_desc_init() and ext4_group_desc_free()
  ext4: factor out ext4_geometry_check()
  ext4: factor out ext4_check_feature_compatibility()
  ext4: factor out ext4_init_metadata_csum()
  ext4: factor out ext4_encoding_init()
  ...
Linus Torvalds 2022-10-06 17:45:53 -07:00
commit bc32a6330f
27 changed files with 993 additions and 806 deletions


@ -274,6 +274,9 @@ or bottom half).
This is specifically for the inode itself being marked dirty, This is specifically for the inode itself being marked dirty,
not its data. If the update needs to be persisted by fdatasync(), not its data. If the update needs to be persisted by fdatasync(),
then I_DIRTY_DATASYNC will be set in the flags argument. then I_DIRTY_DATASYNC will be set in the flags argument.
I_DIRTY_TIME will be set in the flags in case lazytime is enabled
and struct inode has times updated since the last ->dirty_inode
call.
``write_inode`` ``write_inode``
this method is called when the VFS needs to write an inode to this method is called when the VFS needs to write an inode to
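
The Documentation hunk above describes the updated ->dirty_inode contract: besides I_DIRTY_SYNC / I_DIRTY_DATASYNC, the flags argument can now carry I_DIRTY_TIME when lazytime timestamps need to be persisted. As a minimal, hypothetical sketch (the function and helper names below are made up and are not part of this series), a filesystem handler could react like this:

	/* Hypothetical ->dirty_inode handler, illustrative only. */
	static void example_dirty_inode(struct inode *inode, int flags)
	{
		/* Lazytime timestamps changed since the last ->dirty_inode call. */
		if (flags & I_DIRTY_TIME)
			example_sync_timestamps(inode);		/* hypothetical helper */

		/* The inode itself is dirty (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC). */
		if (flags & I_DIRTY_INODE)
			example_journal_inode_update(inode);	/* hypothetical helper */
	}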


@ -52,8 +52,8 @@
#include "internal.h" #include "internal.h"
static int fsync_buffers_list(spinlock_t *lock, struct list_head *list); static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static int submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh, static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
struct writeback_control *wbc); struct writeback_control *wbc);
#define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers) #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
@ -2673,8 +2673,8 @@ static void end_bio_bh_io_sync(struct bio *bio)
bio_put(bio); bio_put(bio);
} }
static int submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh, static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
struct writeback_control *wbc) struct writeback_control *wbc)
{ {
const enum req_op op = opf & REQ_OP_MASK; const enum req_op op = opf & REQ_OP_MASK;
struct bio *bio; struct bio *bio;
@ -2717,12 +2717,11 @@ static int submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
} }
submit_bio(bio); submit_bio(bio);
return 0;
} }
int submit_bh(blk_opf_t opf, struct buffer_head *bh) void submit_bh(blk_opf_t opf, struct buffer_head *bh)
{ {
return submit_bh_wbc(opf, bh, NULL); submit_bh_wbc(opf, bh, NULL);
} }
EXPORT_SYMBOL(submit_bh); EXPORT_SYMBOL(submit_bh);
@ -2801,8 +2800,6 @@ EXPORT_SYMBOL(write_dirty_buffer);
*/ */
int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags) int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags)
{ {
int ret = 0;
WARN_ON(atomic_read(&bh->b_count) < 1); WARN_ON(atomic_read(&bh->b_count) < 1);
lock_buffer(bh); lock_buffer(bh);
if (test_clear_buffer_dirty(bh)) { if (test_clear_buffer_dirty(bh)) {
@ -2817,14 +2814,14 @@ int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags)
get_bh(bh); get_bh(bh);
bh->b_end_io = end_buffer_write_sync; bh->b_end_io = end_buffer_write_sync;
ret = submit_bh(REQ_OP_WRITE | op_flags, bh); submit_bh(REQ_OP_WRITE | op_flags, bh);
wait_on_buffer(bh); wait_on_buffer(bh);
if (!ret && !buffer_uptodate(bh)) if (!buffer_uptodate(bh))
ret = -EIO; return -EIO;
} else { } else {
unlock_buffer(bh); unlock_buffer(bh);
} }
return ret; return 0;
} }
EXPORT_SYMBOL(__sync_dirty_buffer); EXPORT_SYMBOL(__sync_dirty_buffer);
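
The fs/buffer.c change above is what the first bullet of the pull message refers to: submit_bh_wbc() can only return 0, so submit_bh() now returns void and its callers drop their dead error checks. A purely illustrative before/after for a typical caller (not a hunk from this series):

	/* Before: the return value was checked although it was always 0. */
	err = submit_bh(REQ_OP_READ, bh);
	if (err)
		goto out;

	/*
	 * After: submit_bh() returns void; callers that care wait for the I/O
	 * and inspect the buffer state instead, as __sync_dirty_buffer() does
	 * in the hunk above.
	 */
	submit_bh(REQ_OP_READ, bh);
	wait_on_buffer(bh);
	if (!buffer_uptodate(bh))
		err = -EIO;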


@ -3592,9 +3592,6 @@ extern bool empty_inline_dir(struct inode *dir, int *has_inline_data);
extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode, extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
struct ext4_dir_entry_2 **parent_de, struct ext4_dir_entry_2 **parent_de,
int *retval); int *retval);
extern int ext4_inline_data_fiemap(struct inode *inode,
struct fiemap_extent_info *fieinfo,
int *has_inline, __u64 start, __u64 len);
extern void *ext4_read_inline_link(struct inode *inode); extern void *ext4_read_inline_link(struct inode *inode);
struct iomap; struct iomap;
@ -3713,7 +3710,7 @@ extern int ext4_ext_insert_extent(handle_t *, struct inode *,
extern struct ext4_ext_path *ext4_find_extent(struct inode *, ext4_lblk_t, extern struct ext4_ext_path *ext4_find_extent(struct inode *, ext4_lblk_t,
struct ext4_ext_path **, struct ext4_ext_path **,
int flags); int flags);
extern void ext4_ext_drop_refs(struct ext4_ext_path *); extern void ext4_free_ext_path(struct ext4_ext_path *);
extern int ext4_ext_check_inode(struct inode *inode); extern int ext4_ext_check_inode(struct inode *inode);
extern ext4_lblk_t ext4_ext_next_allocated_block(struct ext4_ext_path *path); extern ext4_lblk_t ext4_ext_next_allocated_block(struct ext4_ext_path *path);
extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,


@ -106,6 +106,25 @@ static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped)
return 0; return 0;
} }
static void ext4_ext_drop_refs(struct ext4_ext_path *path)
{
int depth, i;
if (!path)
return;
depth = path->p_depth;
for (i = 0; i <= depth; i++, path++) {
brelse(path->p_bh);
path->p_bh = NULL;
}
}
void ext4_free_ext_path(struct ext4_ext_path *path)
{
ext4_ext_drop_refs(path);
kfree(path);
}
/* /*
* Make sure 'handle' has at least 'check_cred' credits. If not, restart * Make sure 'handle' has at least 'check_cred' credits. If not, restart
* transaction with 'restart_cred' credits. The function drops i_data_sem * transaction with 'restart_cred' credits. The function drops i_data_sem
@ -636,8 +655,7 @@ int ext4_ext_precache(struct inode *inode)
ext4_set_inode_state(inode, EXT4_STATE_EXT_PRECACHED); ext4_set_inode_state(inode, EXT4_STATE_EXT_PRECACHED);
out: out:
up_read(&ei->i_data_sem); up_read(&ei->i_data_sem);
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
return ret; return ret;
} }
@ -724,19 +742,6 @@ static void ext4_ext_show_move(struct inode *inode, struct ext4_ext_path *path,
#define ext4_ext_show_move(inode, path, newblock, level) #define ext4_ext_show_move(inode, path, newblock, level)
#endif #endif
void ext4_ext_drop_refs(struct ext4_ext_path *path)
{
int depth, i;
if (!path)
return;
depth = path->p_depth;
for (i = 0; i <= depth; i++, path++) {
brelse(path->p_bh);
path->p_bh = NULL;
}
}
/* /*
* ext4_ext_binsearch_idx: * ext4_ext_binsearch_idx:
* binary search for the closest index of the given block * binary search for the closest index of the given block
@ -955,8 +960,7 @@ ext4_find_extent(struct inode *inode, ext4_lblk_t block,
return path; return path;
err: err:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
if (orig_path) if (orig_path)
*orig_path = NULL; *orig_path = NULL;
return ERR_PTR(ret); return ERR_PTR(ret);
@ -2174,8 +2178,7 @@ merge:
err = ext4_ext_dirty(handle, inode, path + path->p_depth); err = ext4_ext_dirty(handle, inode, path + path->p_depth);
cleanup: cleanup:
ext4_ext_drop_refs(npath); ext4_free_ext_path(npath);
kfree(npath);
return err; return err;
} }
@ -3061,8 +3064,7 @@ again:
} }
} }
out: out:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
path = NULL; path = NULL;
if (err == -EAGAIN) if (err == -EAGAIN)
goto again; goto again;
@ -4375,8 +4377,7 @@ got_allocated_blocks:
allocated = map->m_len; allocated = map->m_len;
ext4_ext_show_leaf(inode, path); ext4_ext_show_leaf(inode, path);
out: out:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
trace_ext4_ext_map_blocks_exit(inode, flags, map, trace_ext4_ext_map_blocks_exit(inode, flags, map,
err ? err : allocated); err ? err : allocated);
@ -5245,8 +5246,7 @@ again:
break; break;
} }
out: out:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
return ret; return ret;
} }
@ -5538,15 +5538,13 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
EXT4_GET_BLOCKS_METADATA_NOFAIL); EXT4_GET_BLOCKS_METADATA_NOFAIL);
} }
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
if (ret < 0) { if (ret < 0) {
up_write(&EXT4_I(inode)->i_data_sem); up_write(&EXT4_I(inode)->i_data_sem);
goto out_stop; goto out_stop;
} }
} else { } else {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
} }
ret = ext4_es_remove_extent(inode, offset_lblk, ret = ext4_es_remove_extent(inode, offset_lblk,
@ -5766,10 +5764,8 @@ ext4_swap_extents(handle_t *handle, struct inode *inode1,
count -= len; count -= len;
repeat: repeat:
ext4_ext_drop_refs(path1); ext4_free_ext_path(path1);
kfree(path1); ext4_free_ext_path(path2);
ext4_ext_drop_refs(path2);
kfree(path2);
path1 = path2 = NULL; path1 = path2 = NULL;
} }
return replaced_count; return replaced_count;
@ -5848,8 +5844,7 @@ int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu)
} }
out: out:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
return err ? err : mapped; return err ? err : mapped;
} }
@ -5916,8 +5911,7 @@ int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
ret = ext4_ext_dirty(NULL, inode, &path[path->p_depth]); ret = ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
up_write(&EXT4_I(inode)->i_data_sem); up_write(&EXT4_I(inode)->i_data_sem);
out: out:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
ext4_mark_inode_dirty(NULL, inode); ext4_mark_inode_dirty(NULL, inode);
return ret; return ret;
} }
@ -5935,8 +5929,7 @@ void ext4_ext_replay_shrink_inode(struct inode *inode, ext4_lblk_t end)
return; return;
ex = path[path->p_depth].p_ext; ex = path[path->p_depth].p_ext;
if (!ex) { if (!ex) {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
ext4_mark_inode_dirty(NULL, inode); ext4_mark_inode_dirty(NULL, inode);
return; return;
} }
@ -5949,8 +5942,7 @@ void ext4_ext_replay_shrink_inode(struct inode *inode, ext4_lblk_t end)
ext4_ext_dirty(NULL, inode, &path[path->p_depth]); ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
up_write(&EXT4_I(inode)->i_data_sem); up_write(&EXT4_I(inode)->i_data_sem);
ext4_mark_inode_dirty(NULL, inode); ext4_mark_inode_dirty(NULL, inode);
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
} }
} }
@ -5989,13 +5981,11 @@ int ext4_ext_replay_set_iblocks(struct inode *inode)
return PTR_ERR(path); return PTR_ERR(path);
ex = path[path->p_depth].p_ext; ex = path[path->p_depth].p_ext;
if (!ex) { if (!ex) {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
goto out; goto out;
} }
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex); end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
/* Count the number of data blocks */ /* Count the number of data blocks */
cur = 0; cur = 0;
@ -6025,30 +6015,26 @@ int ext4_ext_replay_set_iblocks(struct inode *inode)
if (IS_ERR(path)) if (IS_ERR(path))
goto out; goto out;
numblks += path->p_depth; numblks += path->p_depth;
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
while (cur < end) { while (cur < end) {
path = ext4_find_extent(inode, cur, NULL, 0); path = ext4_find_extent(inode, cur, NULL, 0);
if (IS_ERR(path)) if (IS_ERR(path))
break; break;
ex = path[path->p_depth].p_ext; ex = path[path->p_depth].p_ext;
if (!ex) { if (!ex) {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
return 0; return 0;
} }
cur = max(cur + 1, le32_to_cpu(ex->ee_block) + cur = max(cur + 1, le32_to_cpu(ex->ee_block) +
ext4_ext_get_actual_len(ex)); ext4_ext_get_actual_len(ex));
ret = skip_hole(inode, &cur); ret = skip_hole(inode, &cur);
if (ret < 0) { if (ret < 0) {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
break; break;
} }
path2 = ext4_find_extent(inode, cur, NULL, 0); path2 = ext4_find_extent(inode, cur, NULL, 0);
if (IS_ERR(path2)) { if (IS_ERR(path2)) {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
break; break;
} }
for (i = 0; i <= max(path->p_depth, path2->p_depth); i++) { for (i = 0; i <= max(path->p_depth, path2->p_depth); i++) {
@ -6062,10 +6048,8 @@ int ext4_ext_replay_set_iblocks(struct inode *inode)
if (cmp1 != cmp2 && cmp2 != 0) if (cmp1 != cmp2 && cmp2 != 0)
numblks++; numblks++;
} }
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
ext4_ext_drop_refs(path2); ext4_free_ext_path(path2);
kfree(path);
kfree(path2);
} }
out: out:
@ -6092,13 +6076,11 @@ int ext4_ext_clear_bb(struct inode *inode)
return PTR_ERR(path); return PTR_ERR(path);
ex = path[path->p_depth].p_ext; ex = path[path->p_depth].p_ext;
if (!ex) { if (!ex) {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
return 0; return 0;
} }
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex); end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
cur = 0; cur = 0;
while (cur < end) { while (cur < end) {
@ -6117,8 +6099,7 @@ int ext4_ext_clear_bb(struct inode *inode)
ext4_fc_record_regions(inode->i_sb, inode->i_ino, ext4_fc_record_regions(inode->i_sb, inode->i_ino,
0, path[j].p_block, 1, 1); 0, path[j].p_block, 1, 1);
} }
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
} }
ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0); ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
ext4_fc_record_regions(inode->i_sb, inode->i_ino, ext4_fc_record_regions(inode->i_sb, inode->i_ino,


@ -667,8 +667,7 @@ static void ext4_es_insert_extent_ext_check(struct inode *inode,
} }
} }
out: out:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
} }
static void ext4_es_insert_extent_ind_check(struct inode *inode, static void ext4_es_insert_extent_ind_check(struct inode *inode,


@ -229,6 +229,12 @@ __releases(&EXT4_SB(inode->i_sb)->s_fc_lock)
finish_wait(wq, &wait.wq_entry); finish_wait(wq, &wait.wq_entry);
} }
static bool ext4_fc_disabled(struct super_block *sb)
{
return (!test_opt2(sb, JOURNAL_FAST_COMMIT) ||
(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY));
}
/* /*
* Inform Ext4's fast about start of an inode update * Inform Ext4's fast about start of an inode update
* *
@ -240,8 +246,7 @@ void ext4_fc_start_update(struct inode *inode)
{ {
struct ext4_inode_info *ei = EXT4_I(inode); struct ext4_inode_info *ei = EXT4_I(inode);
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(inode->i_sb))
(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
return; return;
restart: restart:
@ -265,8 +270,7 @@ void ext4_fc_stop_update(struct inode *inode)
{ {
struct ext4_inode_info *ei = EXT4_I(inode); struct ext4_inode_info *ei = EXT4_I(inode);
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(inode->i_sb))
(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
return; return;
if (atomic_dec_and_test(&ei->i_fc_updates)) if (atomic_dec_and_test(&ei->i_fc_updates))
@ -283,8 +287,7 @@ void ext4_fc_del(struct inode *inode)
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct ext4_fc_dentry_update *fc_dentry; struct ext4_fc_dentry_update *fc_dentry;
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(inode->i_sb))
(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
return; return;
restart: restart:
@ -337,8 +340,7 @@ void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handl
struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_sb_info *sbi = EXT4_SB(sb);
tid_t tid; tid_t tid;
if (!test_opt2(sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(sb))
(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
return; return;
ext4_set_mount_flag(sb, EXT4_MF_FC_INELIGIBLE); ext4_set_mount_flag(sb, EXT4_MF_FC_INELIGIBLE);
@ -493,10 +495,8 @@ void __ext4_fc_track_unlink(handle_t *handle,
void ext4_fc_track_unlink(handle_t *handle, struct dentry *dentry) void ext4_fc_track_unlink(handle_t *handle, struct dentry *dentry)
{ {
struct inode *inode = d_inode(dentry); struct inode *inode = d_inode(dentry);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(inode->i_sb))
(sbi->s_mount_state & EXT4_FC_REPLAY))
return; return;
if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE)) if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
@ -522,10 +522,8 @@ void __ext4_fc_track_link(handle_t *handle,
void ext4_fc_track_link(handle_t *handle, struct dentry *dentry) void ext4_fc_track_link(handle_t *handle, struct dentry *dentry)
{ {
struct inode *inode = d_inode(dentry); struct inode *inode = d_inode(dentry);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(inode->i_sb))
(sbi->s_mount_state & EXT4_FC_REPLAY))
return; return;
if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE)) if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
@ -551,10 +549,8 @@ void __ext4_fc_track_create(handle_t *handle, struct inode *inode,
void ext4_fc_track_create(handle_t *handle, struct dentry *dentry) void ext4_fc_track_create(handle_t *handle, struct dentry *dentry)
{ {
struct inode *inode = d_inode(dentry); struct inode *inode = d_inode(dentry);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(inode->i_sb))
(sbi->s_mount_state & EXT4_FC_REPLAY))
return; return;
if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE)) if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
@ -576,22 +572,20 @@ static int __track_inode(struct inode *inode, void *arg, bool update)
void ext4_fc_track_inode(handle_t *handle, struct inode *inode) void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
{ {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int ret; int ret;
if (S_ISDIR(inode->i_mode)) if (S_ISDIR(inode->i_mode))
return; return;
if (ext4_fc_disabled(inode->i_sb))
return;
if (ext4_should_journal_data(inode)) { if (ext4_should_journal_data(inode)) {
ext4_fc_mark_ineligible(inode->i_sb, ext4_fc_mark_ineligible(inode->i_sb,
EXT4_FC_REASON_INODE_JOURNAL_DATA, handle); EXT4_FC_REASON_INODE_JOURNAL_DATA, handle);
return; return;
} }
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
(sbi->s_mount_state & EXT4_FC_REPLAY))
return;
if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE)) if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
return; return;
@ -634,15 +628,13 @@ static int __track_range(struct inode *inode, void *arg, bool update)
void ext4_fc_track_range(handle_t *handle, struct inode *inode, ext4_lblk_t start, void ext4_fc_track_range(handle_t *handle, struct inode *inode, ext4_lblk_t start,
ext4_lblk_t end) ext4_lblk_t end)
{ {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct __track_range_args args; struct __track_range_args args;
int ret; int ret;
if (S_ISDIR(inode->i_mode)) if (S_ISDIR(inode->i_mode))
return; return;
if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) || if (ext4_fc_disabled(inode->i_sb))
(sbi->s_mount_state & EXT4_FC_REPLAY))
return; return;
if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE)) if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
@ -710,10 +702,10 @@ static u8 *ext4_fc_reserve_space(struct super_block *sb, int len, u32 *crc)
* After allocating len, we should have space at least for a 0 byte * After allocating len, we should have space at least for a 0 byte
* padding. * padding.
*/ */
if (len + sizeof(struct ext4_fc_tl) > bsize) if (len + EXT4_FC_TAG_BASE_LEN > bsize)
return NULL; return NULL;
if (bsize - off - 1 > len + sizeof(struct ext4_fc_tl)) { if (bsize - off - 1 > len + EXT4_FC_TAG_BASE_LEN) {
/* /*
* Only allocate from current buffer if we have enough space for * Only allocate from current buffer if we have enough space for
* this request AND we have space to add a zero byte padding. * this request AND we have space to add a zero byte padding.
@ -730,10 +722,10 @@ static u8 *ext4_fc_reserve_space(struct super_block *sb, int len, u32 *crc)
/* Need to add PAD tag */ /* Need to add PAD tag */
tl = (struct ext4_fc_tl *)(sbi->s_fc_bh->b_data + off); tl = (struct ext4_fc_tl *)(sbi->s_fc_bh->b_data + off);
tl->fc_tag = cpu_to_le16(EXT4_FC_TAG_PAD); tl->fc_tag = cpu_to_le16(EXT4_FC_TAG_PAD);
pad_len = bsize - off - 1 - sizeof(struct ext4_fc_tl); pad_len = bsize - off - 1 - EXT4_FC_TAG_BASE_LEN;
tl->fc_len = cpu_to_le16(pad_len); tl->fc_len = cpu_to_le16(pad_len);
if (crc) if (crc)
*crc = ext4_chksum(sbi, *crc, tl, sizeof(*tl)); *crc = ext4_chksum(sbi, *crc, tl, EXT4_FC_TAG_BASE_LEN);
if (pad_len > 0) if (pad_len > 0)
ext4_fc_memzero(sb, tl + 1, pad_len, crc); ext4_fc_memzero(sb, tl + 1, pad_len, crc);
ext4_fc_submit_bh(sb, false); ext4_fc_submit_bh(sb, false);
@ -775,7 +767,7 @@ static int ext4_fc_write_tail(struct super_block *sb, u32 crc)
* ext4_fc_reserve_space takes care of allocating an extra block if * ext4_fc_reserve_space takes care of allocating an extra block if
* there's no enough space on this block for accommodating this tail. * there's no enough space on this block for accommodating this tail.
*/ */
dst = ext4_fc_reserve_space(sb, sizeof(tl) + sizeof(tail), &crc); dst = ext4_fc_reserve_space(sb, EXT4_FC_TAG_BASE_LEN + sizeof(tail), &crc);
if (!dst) if (!dst)
return -ENOSPC; return -ENOSPC;
@ -785,8 +777,8 @@ static int ext4_fc_write_tail(struct super_block *sb, u32 crc)
tl.fc_len = cpu_to_le16(bsize - off - 1 + sizeof(struct ext4_fc_tail)); tl.fc_len = cpu_to_le16(bsize - off - 1 + sizeof(struct ext4_fc_tail));
sbi->s_fc_bytes = round_up(sbi->s_fc_bytes, bsize); sbi->s_fc_bytes = round_up(sbi->s_fc_bytes, bsize);
ext4_fc_memcpy(sb, dst, &tl, sizeof(tl), &crc); ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, &crc);
dst += sizeof(tl); dst += EXT4_FC_TAG_BASE_LEN;
tail.fc_tid = cpu_to_le32(sbi->s_journal->j_running_transaction->t_tid); tail.fc_tid = cpu_to_le32(sbi->s_journal->j_running_transaction->t_tid);
ext4_fc_memcpy(sb, dst, &tail.fc_tid, sizeof(tail.fc_tid), &crc); ext4_fc_memcpy(sb, dst, &tail.fc_tid, sizeof(tail.fc_tid), &crc);
dst += sizeof(tail.fc_tid); dst += sizeof(tail.fc_tid);
@ -808,15 +800,15 @@ static bool ext4_fc_add_tlv(struct super_block *sb, u16 tag, u16 len, u8 *val,
struct ext4_fc_tl tl; struct ext4_fc_tl tl;
u8 *dst; u8 *dst;
dst = ext4_fc_reserve_space(sb, sizeof(tl) + len, crc); dst = ext4_fc_reserve_space(sb, EXT4_FC_TAG_BASE_LEN + len, crc);
if (!dst) if (!dst)
return false; return false;
tl.fc_tag = cpu_to_le16(tag); tl.fc_tag = cpu_to_le16(tag);
tl.fc_len = cpu_to_le16(len); tl.fc_len = cpu_to_le16(len);
ext4_fc_memcpy(sb, dst, &tl, sizeof(tl), crc); ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc);
ext4_fc_memcpy(sb, dst + sizeof(tl), val, len, crc); ext4_fc_memcpy(sb, dst + EXT4_FC_TAG_BASE_LEN, val, len, crc);
return true; return true;
} }
@ -828,8 +820,8 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
struct ext4_fc_dentry_info fcd; struct ext4_fc_dentry_info fcd;
struct ext4_fc_tl tl; struct ext4_fc_tl tl;
int dlen = fc_dentry->fcd_name.len; int dlen = fc_dentry->fcd_name.len;
u8 *dst = ext4_fc_reserve_space(sb, sizeof(tl) + sizeof(fcd) + dlen, u8 *dst = ext4_fc_reserve_space(sb,
crc); EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
if (!dst) if (!dst)
return false; return false;
@ -838,8 +830,8 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
fcd.fc_ino = cpu_to_le32(fc_dentry->fcd_ino); fcd.fc_ino = cpu_to_le32(fc_dentry->fcd_ino);
tl.fc_tag = cpu_to_le16(fc_dentry->fcd_op); tl.fc_tag = cpu_to_le16(fc_dentry->fcd_op);
tl.fc_len = cpu_to_le16(sizeof(fcd) + dlen); tl.fc_len = cpu_to_le16(sizeof(fcd) + dlen);
ext4_fc_memcpy(sb, dst, &tl, sizeof(tl), crc); ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc);
dst += sizeof(tl); dst += EXT4_FC_TAG_BASE_LEN;
ext4_fc_memcpy(sb, dst, &fcd, sizeof(fcd), crc); ext4_fc_memcpy(sb, dst, &fcd, sizeof(fcd), crc);
dst += sizeof(fcd); dst += sizeof(fcd);
ext4_fc_memcpy(sb, dst, fc_dentry->fcd_name.name, dlen, crc); ext4_fc_memcpy(sb, dst, fc_dentry->fcd_name.name, dlen, crc);
@ -874,22 +866,25 @@ static int ext4_fc_write_inode(struct inode *inode, u32 *crc)
tl.fc_tag = cpu_to_le16(EXT4_FC_TAG_INODE); tl.fc_tag = cpu_to_le16(EXT4_FC_TAG_INODE);
tl.fc_len = cpu_to_le16(inode_len + sizeof(fc_inode.fc_ino)); tl.fc_len = cpu_to_le16(inode_len + sizeof(fc_inode.fc_ino));
ret = -ECANCELED;
dst = ext4_fc_reserve_space(inode->i_sb, dst = ext4_fc_reserve_space(inode->i_sb,
sizeof(tl) + inode_len + sizeof(fc_inode.fc_ino), crc); EXT4_FC_TAG_BASE_LEN + inode_len + sizeof(fc_inode.fc_ino), crc);
if (!dst) if (!dst)
return -ECANCELED; goto err;
if (!ext4_fc_memcpy(inode->i_sb, dst, &tl, sizeof(tl), crc)) if (!ext4_fc_memcpy(inode->i_sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc))
return -ECANCELED; goto err;
dst += sizeof(tl); dst += EXT4_FC_TAG_BASE_LEN;
if (!ext4_fc_memcpy(inode->i_sb, dst, &fc_inode, sizeof(fc_inode), crc)) if (!ext4_fc_memcpy(inode->i_sb, dst, &fc_inode, sizeof(fc_inode), crc))
return -ECANCELED; goto err;
dst += sizeof(fc_inode); dst += sizeof(fc_inode);
if (!ext4_fc_memcpy(inode->i_sb, dst, (u8 *)ext4_raw_inode(&iloc), if (!ext4_fc_memcpy(inode->i_sb, dst, (u8 *)ext4_raw_inode(&iloc),
inode_len, crc)) inode_len, crc))
return -ECANCELED; goto err;
ret = 0;
return 0; err:
brelse(iloc.bh);
return ret;
} }
/* /*
@ -1343,7 +1338,7 @@ struct dentry_info_args {
}; };
static inline void tl_to_darg(struct dentry_info_args *darg, static inline void tl_to_darg(struct dentry_info_args *darg,
struct ext4_fc_tl *tl, u8 *val) struct ext4_fc_tl *tl, u8 *val)
{ {
struct ext4_fc_dentry_info fcd; struct ext4_fc_dentry_info fcd;
@ -1352,8 +1347,14 @@ static inline void tl_to_darg(struct dentry_info_args *darg,
darg->parent_ino = le32_to_cpu(fcd.fc_parent_ino); darg->parent_ino = le32_to_cpu(fcd.fc_parent_ino);
darg->ino = le32_to_cpu(fcd.fc_ino); darg->ino = le32_to_cpu(fcd.fc_ino);
darg->dname = val + offsetof(struct ext4_fc_dentry_info, fc_dname); darg->dname = val + offsetof(struct ext4_fc_dentry_info, fc_dname);
darg->dname_len = le16_to_cpu(tl->fc_len) - darg->dname_len = tl->fc_len - sizeof(struct ext4_fc_dentry_info);
sizeof(struct ext4_fc_dentry_info); }
static inline void ext4_fc_get_tl(struct ext4_fc_tl *tl, u8 *val)
{
memcpy(tl, val, EXT4_FC_TAG_BASE_LEN);
tl->fc_len = le16_to_cpu(tl->fc_len);
tl->fc_tag = le16_to_cpu(tl->fc_tag);
} }
/* Unlink replay function */ /* Unlink replay function */
@ -1491,13 +1492,15 @@ static int ext4_fc_record_modified_inode(struct super_block *sb, int ino)
if (state->fc_modified_inodes[i] == ino) if (state->fc_modified_inodes[i] == ino)
return 0; return 0;
if (state->fc_modified_inodes_used == state->fc_modified_inodes_size) { if (state->fc_modified_inodes_used == state->fc_modified_inodes_size) {
state->fc_modified_inodes = krealloc( int *fc_modified_inodes;
state->fc_modified_inodes,
fc_modified_inodes = krealloc(state->fc_modified_inodes,
sizeof(int) * (state->fc_modified_inodes_size + sizeof(int) * (state->fc_modified_inodes_size +
EXT4_FC_REPLAY_REALLOC_INCREMENT), EXT4_FC_REPLAY_REALLOC_INCREMENT),
GFP_KERNEL); GFP_KERNEL);
if (!state->fc_modified_inodes) if (!fc_modified_inodes)
return -ENOMEM; return -ENOMEM;
state->fc_modified_inodes = fc_modified_inodes;
state->fc_modified_inodes_size += state->fc_modified_inodes_size +=
EXT4_FC_REPLAY_REALLOC_INCREMENT; EXT4_FC_REPLAY_REALLOC_INCREMENT;
} }
@ -1516,7 +1519,7 @@ static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl,
struct ext4_inode *raw_fc_inode; struct ext4_inode *raw_fc_inode;
struct inode *inode = NULL; struct inode *inode = NULL;
struct ext4_iloc iloc; struct ext4_iloc iloc;
int inode_len, ino, ret, tag = le16_to_cpu(tl->fc_tag); int inode_len, ino, ret, tag = tl->fc_tag;
struct ext4_extent_header *eh; struct ext4_extent_header *eh;
memcpy(&fc_inode, val, sizeof(fc_inode)); memcpy(&fc_inode, val, sizeof(fc_inode));
@ -1541,7 +1544,7 @@ static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl,
if (ret) if (ret)
goto out; goto out;
inode_len = le16_to_cpu(tl->fc_len) - sizeof(struct ext4_fc_inode); inode_len = tl->fc_len - sizeof(struct ext4_fc_inode);
raw_inode = ext4_raw_inode(&iloc); raw_inode = ext4_raw_inode(&iloc);
memcpy(raw_inode, raw_fc_inode, offsetof(struct ext4_inode, i_block)); memcpy(raw_inode, raw_fc_inode, offsetof(struct ext4_inode, i_block));
@ -1682,15 +1685,18 @@ int ext4_fc_record_regions(struct super_block *sb, int ino,
if (replay && state->fc_regions_used != state->fc_regions_valid) if (replay && state->fc_regions_used != state->fc_regions_valid)
state->fc_regions_used = state->fc_regions_valid; state->fc_regions_used = state->fc_regions_valid;
if (state->fc_regions_used == state->fc_regions_size) { if (state->fc_regions_used == state->fc_regions_size) {
struct ext4_fc_alloc_region *fc_regions;
fc_regions = krealloc(state->fc_regions,
sizeof(struct ext4_fc_alloc_region) *
(state->fc_regions_size +
EXT4_FC_REPLAY_REALLOC_INCREMENT),
GFP_KERNEL);
if (!fc_regions)
return -ENOMEM;
state->fc_regions_size += state->fc_regions_size +=
EXT4_FC_REPLAY_REALLOC_INCREMENT; EXT4_FC_REPLAY_REALLOC_INCREMENT;
state->fc_regions = krealloc( state->fc_regions = fc_regions;
state->fc_regions,
state->fc_regions_size *
sizeof(struct ext4_fc_alloc_region),
GFP_KERNEL);
if (!state->fc_regions)
return -ENOMEM;
} }
region = &state->fc_regions[state->fc_regions_used++]; region = &state->fc_regions[state->fc_regions_used++];
region->ino = ino; region->ino = ino;
@ -1770,8 +1776,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
ret = ext4_ext_insert_extent( ret = ext4_ext_insert_extent(
NULL, inode, &path, &newex, 0); NULL, inode, &path, &newex, 0);
up_write((&EXT4_I(inode)->i_data_sem)); up_write((&EXT4_I(inode)->i_data_sem));
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
if (ret) if (ret)
goto out; goto out;
goto next; goto next;
@ -1926,8 +1931,7 @@ static void ext4_fc_set_bitmaps_and_counters(struct super_block *sb)
for (j = 0; j < path->p_depth; j++) for (j = 0; j < path->p_depth; j++)
ext4_mb_mark_bb(inode->i_sb, ext4_mb_mark_bb(inode->i_sb,
path[j].p_block, 1, 1); path[j].p_block, 1, 1);
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
} }
cur += ret; cur += ret;
ext4_mb_mark_bb(inode->i_sb, map.m_pblk, ext4_mb_mark_bb(inode->i_sb, map.m_pblk,
@ -1972,6 +1976,34 @@ void ext4_fc_replay_cleanup(struct super_block *sb)
kfree(sbi->s_fc_replay_state.fc_modified_inodes); kfree(sbi->s_fc_replay_state.fc_modified_inodes);
} }
static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl,
u8 *val, u8 *end)
{
if (val + tl->fc_len > end)
return false;
/* Here only check ADD_RANGE/TAIL/HEAD which will read data when do
* journal rescan before do CRC check. Other tags length check will
* rely on CRC check.
*/
switch (tl->fc_tag) {
case EXT4_FC_TAG_ADD_RANGE:
return (sizeof(struct ext4_fc_add_range) == tl->fc_len);
case EXT4_FC_TAG_TAIL:
return (sizeof(struct ext4_fc_tail) <= tl->fc_len);
case EXT4_FC_TAG_HEAD:
return (sizeof(struct ext4_fc_head) == tl->fc_len);
case EXT4_FC_TAG_DEL_RANGE:
case EXT4_FC_TAG_LINK:
case EXT4_FC_TAG_UNLINK:
case EXT4_FC_TAG_CREAT:
case EXT4_FC_TAG_INODE:
case EXT4_FC_TAG_PAD:
default:
return true;
}
}
/* /*
* Recovery Scan phase handler * Recovery Scan phase handler
* *
@ -2028,12 +2060,18 @@ static int ext4_fc_replay_scan(journal_t *journal,
} }
state->fc_replay_expected_off++; state->fc_replay_expected_off++;
for (cur = start; cur < end; cur = cur + sizeof(tl) + le16_to_cpu(tl.fc_len)) { for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
memcpy(&tl, cur, sizeof(tl)); cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
val = cur + sizeof(tl); ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
if (!ext4_fc_tag_len_isvalid(&tl, val, end)) {
ret = state->fc_replay_num_tags ?
JBD2_FC_REPLAY_STOP : -ECANCELED;
goto out_err;
}
ext4_debug("Scan phase, tag:%s, blk %lld\n", ext4_debug("Scan phase, tag:%s, blk %lld\n",
tag2str(le16_to_cpu(tl.fc_tag)), bh->b_blocknr); tag2str(tl.fc_tag), bh->b_blocknr);
switch (le16_to_cpu(tl.fc_tag)) { switch (tl.fc_tag) {
case EXT4_FC_TAG_ADD_RANGE: case EXT4_FC_TAG_ADD_RANGE:
memcpy(&ext, val, sizeof(ext)); memcpy(&ext, val, sizeof(ext));
ex = (struct ext4_extent *)&ext.fc_ex; ex = (struct ext4_extent *)&ext.fc_ex;
@ -2053,13 +2091,13 @@ static int ext4_fc_replay_scan(journal_t *journal,
case EXT4_FC_TAG_PAD: case EXT4_FC_TAG_PAD:
state->fc_cur_tag++; state->fc_cur_tag++;
state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur,
sizeof(tl) + le16_to_cpu(tl.fc_len)); EXT4_FC_TAG_BASE_LEN + tl.fc_len);
break; break;
case EXT4_FC_TAG_TAIL: case EXT4_FC_TAG_TAIL:
state->fc_cur_tag++; state->fc_cur_tag++;
memcpy(&tail, val, sizeof(tail)); memcpy(&tail, val, sizeof(tail));
state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur,
sizeof(tl) + EXT4_FC_TAG_BASE_LEN +
offsetof(struct ext4_fc_tail, offsetof(struct ext4_fc_tail,
fc_crc)); fc_crc));
if (le32_to_cpu(tail.fc_tid) == expected_tid && if (le32_to_cpu(tail.fc_tid) == expected_tid &&
@ -2086,7 +2124,7 @@ static int ext4_fc_replay_scan(journal_t *journal,
} }
state->fc_cur_tag++; state->fc_cur_tag++;
state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur,
sizeof(tl) + le16_to_cpu(tl.fc_len)); EXT4_FC_TAG_BASE_LEN + tl.fc_len);
break; break;
default: default:
ret = state->fc_replay_num_tags ? ret = state->fc_replay_num_tags ?
@ -2141,19 +2179,20 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
start = (u8 *)bh->b_data; start = (u8 *)bh->b_data;
end = (__u8 *)bh->b_data + journal->j_blocksize - 1; end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
for (cur = start; cur < end; cur = cur + sizeof(tl) + le16_to_cpu(tl.fc_len)) { for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
memcpy(&tl, cur, sizeof(tl)); cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
val = cur + sizeof(tl); ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
if (state->fc_replay_num_tags == 0) { if (state->fc_replay_num_tags == 0) {
ret = JBD2_FC_REPLAY_STOP; ret = JBD2_FC_REPLAY_STOP;
ext4_fc_set_bitmaps_and_counters(sb); ext4_fc_set_bitmaps_and_counters(sb);
break; break;
} }
ext4_debug("Replay phase, tag:%s\n",
tag2str(le16_to_cpu(tl.fc_tag))); ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag));
state->fc_replay_num_tags--; state->fc_replay_num_tags--;
switch (le16_to_cpu(tl.fc_tag)) { switch (tl.fc_tag) {
case EXT4_FC_TAG_LINK: case EXT4_FC_TAG_LINK:
ret = ext4_fc_replay_link(sb, &tl, val); ret = ext4_fc_replay_link(sb, &tl, val);
break; break;
@ -2174,19 +2213,18 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
break; break;
case EXT4_FC_TAG_PAD: case EXT4_FC_TAG_PAD:
trace_ext4_fc_replay(sb, EXT4_FC_TAG_PAD, 0, trace_ext4_fc_replay(sb, EXT4_FC_TAG_PAD, 0,
le16_to_cpu(tl.fc_len), 0); tl.fc_len, 0);
break; break;
case EXT4_FC_TAG_TAIL: case EXT4_FC_TAG_TAIL:
trace_ext4_fc_replay(sb, EXT4_FC_TAG_TAIL, 0, trace_ext4_fc_replay(sb, EXT4_FC_TAG_TAIL,
le16_to_cpu(tl.fc_len), 0); 0, tl.fc_len, 0);
memcpy(&tail, val, sizeof(tail)); memcpy(&tail, val, sizeof(tail));
WARN_ON(le32_to_cpu(tail.fc_tid) != expected_tid); WARN_ON(le32_to_cpu(tail.fc_tid) != expected_tid);
break; break;
case EXT4_FC_TAG_HEAD: case EXT4_FC_TAG_HEAD:
break; break;
default: default:
trace_ext4_fc_replay(sb, le16_to_cpu(tl.fc_tag), 0, trace_ext4_fc_replay(sb, tl.fc_tag, 0, tl.fc_len, 0);
le16_to_cpu(tl.fc_len), 0);
ret = -ECANCELED; ret = -ECANCELED;
break; break;
} }


@ -70,6 +70,9 @@ struct ext4_fc_tail {
__le32 fc_crc; __le32 fc_crc;
}; };
/* Tag base length */
#define EXT4_FC_TAG_BASE_LEN (sizeof(struct ext4_fc_tl))
/* /*
* Fast commit status codes * Fast commit status codes
*/ */


@ -543,6 +543,12 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
ret = -EAGAIN; ret = -EAGAIN;
goto out; goto out;
} }
/*
* Make sure inline data cannot be created anymore since we are going
* to allocate blocks for DIO. We know the inode does not have any
* inline data now because ext4_dio_supported() checked for that.
*/
ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
offset = iocb->ki_pos; offset = iocb->ki_pos;
count = ret; count = ret;


@ -1188,6 +1188,13 @@ retry_grab:
page = grab_cache_page_write_begin(mapping, index); page = grab_cache_page_write_begin(mapping, index);
if (!page) if (!page)
return -ENOMEM; return -ENOMEM;
/*
* The same as page allocation, we prealloc buffer heads before
* starting the handle.
*/
if (!page_has_buffers(page))
create_empty_buffers(page, inode->i_sb->s_blocksize, 0);
unlock_page(page); unlock_page(page);
retry_journal: retry_journal:
@ -5342,6 +5349,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
int error, rc = 0; int error, rc = 0;
int orphan = 0; int orphan = 0;
const unsigned int ia_valid = attr->ia_valid; const unsigned int ia_valid = attr->ia_valid;
bool inc_ivers = true;
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO; return -EIO;
@ -5425,8 +5433,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL; return -EINVAL;
} }
if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size) if (attr->ia_size == inode->i_size)
inode_inc_iversion(inode); inc_ivers = false;
if (shrink) { if (shrink) {
if (ext4_should_order_data(inode)) { if (ext4_should_order_data(inode)) {
@ -5528,6 +5536,8 @@ out_mmap_sem:
} }
if (!error) { if (!error) {
if (inc_ivers)
inode_inc_iversion(inode);
setattr_copy(mnt_userns, inode, attr); setattr_copy(mnt_userns, inode, attr);
mark_inode_dirty(inode); mark_inode_dirty(inode);
} }
@ -5768,9 +5778,6 @@ int ext4_mark_iloc_dirty(handle_t *handle,
} }
ext4_fc_track_inode(handle, inode); ext4_fc_track_inode(handle, inode);
if (IS_I_VERSION(inode))
inode_inc_iversion(inode);
/* the do_update_inode consumes one bh->b_count */ /* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh); get_bh(iloc->bh);


@ -452,6 +452,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
swap_inode_data(inode, inode_bl); swap_inode_data(inode, inode_bl);
inode->i_ctime = inode_bl->i_ctime = current_time(inode); inode->i_ctime = inode_bl->i_ctime = current_time(inode);
inode_inc_iversion(inode);
inode->i_generation = prandom_u32(); inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32(); inode_bl->i_generation = prandom_u32();
@ -665,6 +666,7 @@ static int ext4_ioctl_setflags(struct inode *inode,
ext4_set_inode_flags(inode, false); ext4_set_inode_flags(inode, false);
inode->i_ctime = current_time(inode); inode->i_ctime = current_time(inode);
inode_inc_iversion(inode);
err = ext4_mark_iloc_dirty(handle, inode, &iloc); err = ext4_mark_iloc_dirty(handle, inode, &iloc);
flags_err: flags_err:
@ -775,6 +777,7 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
EXT4_I(inode)->i_projid = kprojid; EXT4_I(inode)->i_projid = kprojid;
inode->i_ctime = current_time(inode); inode->i_ctime = current_time(inode);
inode_inc_iversion(inode);
out_dirty: out_dirty:
rc = ext4_mark_iloc_dirty(handle, inode, &iloc); rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
if (!err) if (!err)
@ -1060,9 +1063,6 @@ static int ext4_ioctl_checkpoint(struct file *filp, unsigned long arg)
if (!EXT4_SB(sb)->s_journal) if (!EXT4_SB(sb)->s_journal)
return -ENODEV; return -ENODEV;
if (flags & ~EXT4_IOC_CHECKPOINT_FLAG_VALID)
return -EINVAL;
if ((flags & JBD2_JOURNAL_FLUSH_DISCARD) && if ((flags & JBD2_JOURNAL_FLUSH_DISCARD) &&
!bdev_max_discard_sectors(EXT4_SB(sb)->s_journal->j_dev)) !bdev_max_discard_sectors(EXT4_SB(sb)->s_journal->j_dev))
return -EOPNOTSUPP; return -EOPNOTSUPP;
@ -1257,6 +1257,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
err = ext4_reserve_inode_write(handle, inode, &iloc); err = ext4_reserve_inode_write(handle, inode, &iloc);
if (err == 0) { if (err == 0) {
inode->i_ctime = current_time(inode); inode->i_ctime = current_time(inode);
inode_inc_iversion(inode);
inode->i_generation = generation; inode->i_generation = generation;
err = ext4_mark_iloc_dirty(handle, inode, &iloc); err = ext4_mark_iloc_dirty(handle, inode, &iloc);
} }


@ -56,8 +56,7 @@ static int finish_range(handle_t *handle, struct inode *inode,
retval = ext4_ext_insert_extent(handle, inode, &path, &newext, 0); retval = ext4_ext_insert_extent(handle, inode, &path, &newext, 0);
err_out: err_out:
up_write((&EXT4_I(inode)->i_data_sem)); up_write((&EXT4_I(inode)->i_data_sem));
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
lb->first_pblock = 0; lb->first_pblock = 0;
return retval; return retval;
} }


@ -32,8 +32,7 @@ get_ext_path(struct inode *inode, ext4_lblk_t lblock,
if (IS_ERR(path)) if (IS_ERR(path))
return PTR_ERR(path); return PTR_ERR(path);
if (path[ext_depth(inode)].p_ext == NULL) { if (path[ext_depth(inode)].p_ext == NULL) {
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
*ppath = NULL; *ppath = NULL;
return -ENODATA; return -ENODATA;
} }
@ -103,12 +102,10 @@ mext_check_coverage(struct inode *inode, ext4_lblk_t from, ext4_lblk_t count,
if (unwritten != ext4_ext_is_unwritten(ext)) if (unwritten != ext4_ext_is_unwritten(ext))
goto out; goto out;
from += ext4_ext_get_actual_len(ext); from += ext4_ext_get_actual_len(ext);
ext4_ext_drop_refs(path);
} }
ret = 1; ret = 1;
out: out:
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
return ret; return ret;
} }
@ -472,19 +469,17 @@ mext_check_arguments(struct inode *orig_inode,
if (IS_IMMUTABLE(donor_inode) || IS_APPEND(donor_inode)) if (IS_IMMUTABLE(donor_inode) || IS_APPEND(donor_inode))
return -EPERM; return -EPERM;
/* Ext4 move extent does not support swapfile */ /* Ext4 move extent does not support swap files */
if (IS_SWAPFILE(orig_inode) || IS_SWAPFILE(donor_inode)) { if (IS_SWAPFILE(orig_inode) || IS_SWAPFILE(donor_inode)) {
ext4_debug("ext4 move extent: The argument files should " ext4_debug("ext4 move extent: The argument files should not be swap files [ino:orig %lu, donor %lu]\n",
"not be swapfile [ino:orig %lu, donor %lu]\n",
orig_inode->i_ino, donor_inode->i_ino); orig_inode->i_ino, donor_inode->i_ino);
return -EBUSY; return -ETXTBSY;
} }
if (ext4_is_quota_file(orig_inode) && ext4_is_quota_file(donor_inode)) { if (ext4_is_quota_file(orig_inode) && ext4_is_quota_file(donor_inode)) {
ext4_debug("ext4 move extent: The argument files should " ext4_debug("ext4 move extent: The argument files should not be quota files [ino:orig %lu, donor %lu]\n",
"not be quota files [ino:orig %lu, donor %lu]\n",
orig_inode->i_ino, donor_inode->i_ino); orig_inode->i_ino, donor_inode->i_ino);
return -EBUSY; return -EOPNOTSUPP;
} }
/* Ext4 move extent supports only extent based file */ /* Ext4 move extent supports only extent based file */
@ -631,11 +626,11 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
if (ret) if (ret)
goto out; goto out;
ex = path[path->p_depth].p_ext; ex = path[path->p_depth].p_ext;
next_blk = ext4_ext_next_allocated_block(path);
cur_blk = le32_to_cpu(ex->ee_block); cur_blk = le32_to_cpu(ex->ee_block);
cur_len = ext4_ext_get_actual_len(ex); cur_len = ext4_ext_get_actual_len(ex);
/* Check hole before the start pos */ /* Check hole before the start pos */
if (cur_blk + cur_len - 1 < o_start) { if (cur_blk + cur_len - 1 < o_start) {
next_blk = ext4_ext_next_allocated_block(path);
if (next_blk == EXT_MAX_BLOCKS) { if (next_blk == EXT_MAX_BLOCKS) {
ret = -ENODATA; ret = -ENODATA;
goto out; goto out;
@ -663,7 +658,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
donor_page_index = d_start >> (PAGE_SHIFT - donor_page_index = d_start >> (PAGE_SHIFT -
donor_inode->i_blkbits); donor_inode->i_blkbits);
offset_in_page = o_start % blocks_per_page; offset_in_page = o_start % blocks_per_page;
if (cur_len > blocks_per_page- offset_in_page) if (cur_len > blocks_per_page - offset_in_page)
cur_len = blocks_per_page - offset_in_page; cur_len = blocks_per_page - offset_in_page;
/* /*
* Up semaphore to avoid following problems: * Up semaphore to avoid following problems:
@ -694,8 +689,7 @@ out:
ext4_discard_preallocations(donor_inode, 0); ext4_discard_preallocations(donor_inode, 0);
} }
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
ext4_double_up_write_data_sem(orig_inode, donor_inode); ext4_double_up_write_data_sem(orig_inode, donor_inode);
unlock_two_nondirectories(orig_inode, donor_inode); unlock_two_nondirectories(orig_inode, donor_inode);


@ -85,15 +85,20 @@ static struct buffer_head *ext4_append(handle_t *handle,
return bh; return bh;
inode->i_size += inode->i_sb->s_blocksize; inode->i_size += inode->i_sb->s_blocksize;
EXT4_I(inode)->i_disksize = inode->i_size; EXT4_I(inode)->i_disksize = inode->i_size;
err = ext4_mark_inode_dirty(handle, inode);
if (err)
goto out;
BUFFER_TRACE(bh, "get_write_access"); BUFFER_TRACE(bh, "get_write_access");
err = ext4_journal_get_write_access(handle, inode->i_sb, bh, err = ext4_journal_get_write_access(handle, inode->i_sb, bh,
EXT4_JTR_NONE); EXT4_JTR_NONE);
if (err) { if (err)
brelse(bh); goto out;
ext4_std_error(inode->i_sb, err);
return ERR_PTR(err);
}
return bh; return bh;
out:
brelse(bh);
ext4_std_error(inode->i_sb, err);
return ERR_PTR(err);
} }
static int ext4_dx_csum_verify(struct inode *inode, static int ext4_dx_csum_verify(struct inode *inode,
@ -126,7 +131,7 @@ static struct buffer_head *__ext4_read_dirblock(struct inode *inode,
struct ext4_dir_entry *dirent; struct ext4_dir_entry *dirent;
int is_dx_block = 0; int is_dx_block = 0;
if (block >= inode->i_size) { if (block >= inode->i_size >> inode->i_blkbits) {
ext4_error_inode(inode, func, line, block, ext4_error_inode(inode, func, line, block,
"Attempting to read directory block (%u) that is past i_size (%llu)", "Attempting to read directory block (%u) that is past i_size (%llu)",
block, inode->i_size); block, inode->i_size);


@ -2122,7 +2122,7 @@ retry:
goto out; goto out;
} }
if (ext4_blocks_count(es) == n_blocks_count) if (ext4_blocks_count(es) == n_blocks_count && n_blocks_count_retry == 0)
goto out; goto out;
err = ext4_alloc_flex_bg_array(sb, n_group + 1); err = ext4_alloc_flex_bg_array(sb, n_group + 1);

 File diff suppressed because it is too large.


@ -298,16 +298,14 @@ static int ext4_get_verity_descriptor_location(struct inode *inode,
last_extent = path[path->p_depth].p_ext; last_extent = path[path->p_depth].p_ext;
if (!last_extent) { if (!last_extent) {
EXT4_ERROR_INODE(inode, "verity file has no extents"); EXT4_ERROR_INODE(inode, "verity file has no extents");
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
return -EFSCORRUPTED; return -EFSCORRUPTED;
} }
end_lblk = le32_to_cpu(last_extent->ee_block) + end_lblk = le32_to_cpu(last_extent->ee_block) +
ext4_ext_get_actual_len(last_extent); ext4_ext_get_actual_len(last_extent);
desc_size_pos = (u64)end_lblk << inode->i_blkbits; desc_size_pos = (u64)end_lblk << inode->i_blkbits;
ext4_ext_drop_refs(path); ext4_free_ext_path(path);
kfree(path);
if (desc_size_pos < sizeof(desc_size_disk)) if (desc_size_pos < sizeof(desc_size_disk))
goto bad; goto bad;


@ -2412,6 +2412,7 @@ retry_inode:
if (!error) { if (!error) {
ext4_xattr_update_super_block(handle, inode->i_sb); ext4_xattr_update_super_block(handle, inode->i_sb);
inode->i_ctime = current_time(inode); inode->i_ctime = current_time(inode);
inode_inc_iversion(inode);
if (!value) if (!value)
no_expand = 0; no_expand = 0;
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc); error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);


@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
*/ */
if (!(inode->i_state & I_DIRTY_ALL)) if (!(inode->i_state & I_DIRTY_ALL))
inode_cgwb_move_to_attached(inode, wb); inode_cgwb_move_to_attached(inode, wb);
else if (!(inode->i_state & I_SYNC_QUEUED) && else if (!(inode->i_state & I_SYNC_QUEUED)) {
(inode->i_state & I_DIRTY)) if ((inode->i_state & I_DIRTY))
redirty_tail_locked(inode, wb); redirty_tail_locked(inode, wb);
else if (inode->i_state & I_DIRTY_TIME) {
inode->dirtied_when = jiffies;
inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
}
}
spin_unlock(&wb->list_lock); spin_unlock(&wb->list_lock);
inode_sync_complete(inode); inode_sync_complete(inode);
@ -2369,6 +2374,20 @@ void __mark_inode_dirty(struct inode *inode, int flags)
trace_writeback_mark_inode_dirty(inode, flags); trace_writeback_mark_inode_dirty(inode, flags);
if (flags & I_DIRTY_INODE) { if (flags & I_DIRTY_INODE) {
/*
* Inode timestamp update will piggback on this dirtying.
* We tell ->dirty_inode callback that timestamps need to
* be updated by setting I_DIRTY_TIME in flags.
*/
if (inode->i_state & I_DIRTY_TIME) {
spin_lock(&inode->i_lock);
if (inode->i_state & I_DIRTY_TIME) {
inode->i_state &= ~I_DIRTY_TIME;
flags |= I_DIRTY_TIME;
}
spin_unlock(&inode->i_lock);
}
/* /*
* Notify the filesystem about the inode being dirtied, so that * Notify the filesystem about the inode being dirtied, so that
* (if needed) it can update on-disk fields and journal the * (if needed) it can update on-disk fields and journal the
@ -2378,7 +2397,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/ */
trace_writeback_dirty_inode_start(inode, flags); trace_writeback_dirty_inode_start(inode, flags);
if (sb->s_op->dirty_inode) if (sb->s_op->dirty_inode)
sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE); sb->s_op->dirty_inode(inode,
flags & (I_DIRTY_INODE | I_DIRTY_TIME));
trace_writeback_dirty_inode(inode, flags); trace_writeback_dirty_inode(inode, flags);
/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */ /* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@ -2399,21 +2419,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/ */
smp_mb(); smp_mb();
if (((inode->i_state & flags) == flags) || if ((inode->i_state & flags) == flags)
(dirtytime && (inode->i_state & I_DIRTY_INODE)))
return; return;
spin_lock(&inode->i_lock); spin_lock(&inode->i_lock);
if (dirtytime && (inode->i_state & I_DIRTY_INODE))
goto out_unlock_inode;
if ((inode->i_state & flags) != flags) { if ((inode->i_state & flags) != flags) {
const int was_dirty = inode->i_state & I_DIRTY; const int was_dirty = inode->i_state & I_DIRTY;
inode_attach_wb(inode, NULL); inode_attach_wb(inode, NULL);
/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
if (flags & I_DIRTY_INODE)
inode->i_state &= ~I_DIRTY_TIME;
inode->i_state |= flags; inode->i_state |= flags;
/* /*
@ -2486,7 +2500,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
out_unlock: out_unlock:
if (wb) if (wb)
spin_unlock(&wb->list_lock); spin_unlock(&wb->list_lock);
out_unlock_inode:
spin_unlock(&inode->i_lock); spin_unlock(&inode->i_lock);
} }
EXPORT_SYMBOL(__mark_inode_dirty); EXPORT_SYMBOL(__mark_inode_dirty);
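
The fs/fs-writeback.c hunks above implement the second bullet of the pull message. A simplified, hypothetical call sequence showing why it matters under lazytime (not code from the patch):

	/* 1. A lazytime timestamp update only records I_DIRTY_TIME. */
	__mark_inode_dirty(inode, I_DIRTY_TIME);

	/* 2. Later, a real inode change dirties the inode itself. */
	__mark_inode_dirty(inode, I_DIRTY_SYNC);

	/*
	 * Previously the pending I_DIRTY_TIME state was simply cleared at this
	 * point; with this change it is folded into the flags passed to
	 * ->dirty_inode(), so the filesystem also learns that the timestamps
	 * still need to be persisted.
	 */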


@@ -122,8 +122,8 @@ static int journal_submit_commit_record(journal_t *journal,
 {
 	struct commit_header *tmp;
 	struct buffer_head *bh;
-	int ret;
 	struct timespec64 now;
+	blk_opf_t write_flags = REQ_OP_WRITE | REQ_SYNC;

 	*cbh = NULL;

@@ -155,13 +155,11 @@ static int journal_submit_commit_record(journal_t *journal,
 	if (journal->j_flags & JBD2_BARRIER &&
 	    !jbd2_has_feature_async_commit(journal))
-		ret = submit_bh(REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH |
-				REQ_FUA, bh);
-	else
-		ret = submit_bh(REQ_OP_WRITE | REQ_SYNC, bh);
+		write_flags |= REQ_PREFLUSH | REQ_FUA;
+
+	submit_bh(write_flags, bh);
 	*cbh = bh;
-	return ret;
+	return 0;
 }

 /*
@@ -570,7 +568,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	journal->j_running_transaction = NULL;
 	start_time = ktime_get();
 	commit_transaction->t_log_start = journal->j_head;
-	wake_up(&journal->j_wait_transaction_locked);
+	wake_up_all(&journal->j_wait_transaction_locked);
 	write_unlock(&journal->j_state_lock);

 	jbd2_debug(3, "JBD2: commit phase 2a\n");


@@ -923,10 +923,16 @@ int jbd2_fc_wait_bufs(journal_t *journal, int num_blks)
 	for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) {
 		bh = journal->j_fc_wbuf[i];
 		wait_on_buffer(bh);
+		/*
+		 * Update j_fc_off so jbd2_fc_release_bufs can release the
+		 * remaining buffer heads.
+		 */
+		if (unlikely(!buffer_uptodate(bh))) {
+			journal->j_fc_off = i + 1;
+			return -EIO;
+		}
 		put_bh(bh);
 		journal->j_fc_wbuf[i] = NULL;
-		if (unlikely(!buffer_uptodate(bh)))
-			return -EIO;
 	}

 	return 0;
@@ -1606,7 +1612,7 @@ static int jbd2_write_superblock(journal_t *journal, blk_opf_t write_flags)
 {
 	struct buffer_head *bh = journal->j_sb_buffer;
 	journal_superblock_t *sb = journal->j_superblock;
-	int ret;
+	int ret = 0;

 	/* Buffer got discarded which means block device got invalidated */
 	if (!buffer_mapped(bh)) {
@@ -1636,7 +1642,7 @@ static int jbd2_write_superblock(journal_t *journal, blk_opf_t write_flags)
 	sb->s_checksum = jbd2_superblock_csum(journal, sb);
 	get_bh(bh);
 	bh->b_end_io = end_buffer_write_sync;
-	ret = submit_bh(REQ_OP_WRITE | write_flags, bh);
+	submit_bh(REQ_OP_WRITE | write_flags, bh);
 	wait_on_buffer(bh);
 	if (buffer_write_io_error(bh)) {
 		clear_buffer_write_io_error(bh);
@@ -1644,9 +1650,8 @@ static int jbd2_write_superblock(journal_t *journal, blk_opf_t write_flags)
 		ret = -EIO;
 	}
 	if (ret) {
-		printk(KERN_ERR "JBD2: Error %d detected when updating "
-		       "journal superblock for %s.\n", ret,
-		       journal->j_devname);
+		printk(KERN_ERR "JBD2: I/O error when updating journal superblock for %s.\n",
+		       journal->j_devname);
 		if (!is_journal_aborted(journal))
 			jbd2_journal_abort(journal, ret);
 	}
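
The point of recording journal->j_fc_off before bailing out with -EIO above is that a later cleanup pass can still find and drop the buffer heads the wait loop never got to. A rough sketch of such a cleanup loop is below; it mirrors what a release helper like jbd2_fc_release_bufs() is expected to do, but it is written from the surrounding context rather than copied from the kernel, so treat the details as an assumption.

    #include <linux/jbd2.h>

    /* Rough sketch of a fast-commit buffer cleanup pass (not kernel code):
     * walk j_fc_wbuf[0 .. j_fc_off - 1] and drop every buffer head that the
     * wait loop in jbd2_fc_wait_bufs() did not release itself. */
    static void example_fc_release_bufs(journal_t *journal)
    {
    	struct buffer_head *bh;
    	int i;

    	for (i = journal->j_fc_off - 1; i >= 0; i--) {
    		bh = journal->j_fc_wbuf[i];
    		if (!bh)
    			break;		/* already released by the wait loop */
    		put_bh(bh);
    		journal->j_fc_wbuf[i] = NULL;
    	}
    }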


@@ -256,6 +256,7 @@ static int fc_do_one_pass(journal_t *journal,
 		err = journal->j_fc_replay_callback(journal, bh, pass,
 					next_fc_block - journal->j_fc_first,
 					expected_commit_id);
+		brelse(bh);
 		next_fc_block++;
 		if (err < 0 || err == JBD2_FC_REPLAY_STOP)
 			break;


@@ -168,7 +168,7 @@ static void wait_transaction_locked(journal_t *journal)
 	int need_to_start;
 	tid_t tid = journal->j_running_transaction->t_tid;

-	prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
+	prepare_to_wait_exclusive(&journal->j_wait_transaction_locked, &wait,
			TASK_UNINTERRUPTIBLE);
 	need_to_start = !tid_geq(journal->j_commit_request, tid);
 	read_unlock(&journal->j_state_lock);
@@ -194,7 +194,7 @@ static void wait_transaction_switching(journal_t *journal)
 		read_unlock(&journal->j_state_lock);
 		return;
 	}
-	prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
+	prepare_to_wait_exclusive(&journal->j_wait_transaction_locked, &wait,
			TASK_UNINTERRUPTIBLE);
 	read_unlock(&journal->j_state_lock);
 	/*
@@ -920,7 +920,7 @@ void jbd2_journal_unlock_updates (journal_t *journal)
 	write_lock(&journal->j_state_lock);
 	--journal->j_barrier_count;
 	write_unlock(&journal->j_state_lock);
-	wake_up(&journal->j_wait_transaction_locked);
+	wake_up_all(&journal->j_wait_transaction_locked);
 }

 static void warn_dirty_buffer(struct buffer_head *bh)
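
The two changes above work as a pair: prepare_to_wait_exclusive() appends each waiter to the tail of j_wait_transaction_locked, so journal handles are granted in FIFO order, and because a plain wake_up() wakes only one exclusive waiter, the wake-up sites switch to wake_up_all() so that every task blocked on the locked transaction is still released. A minimal sketch of that pattern, with made-up names, for illustration only:

    #include <linux/wait.h>
    #include <linux/sched.h>

    /* Hypothetical example of the exclusive-wait / wake_up_all() pairing. */
    static DECLARE_WAIT_QUEUE_HEAD(example_wq);
    static bool example_unblocked;

    static void example_wait(void)
    {
    	DEFINE_WAIT(wait);

    	/* Exclusive waiters are queued at the tail, giving FIFO wake order. */
    	prepare_to_wait_exclusive(&example_wq, &wait, TASK_UNINTERRUPTIBLE);
    	if (!example_unblocked)
    		schedule();
    	finish_wait(&example_wq, &wait);
    }

    static void example_unblock_everyone(void)
    {
    	example_unblocked = true;
    	/* wake_up() would stop after one exclusive waiter; wake them all. */
    	wake_up_all(&example_wq);
    }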


@ -90,8 +90,14 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
return -ENOMEM; return -ENOMEM;
INIT_LIST_HEAD(&entry->e_list); INIT_LIST_HEAD(&entry->e_list);
/* Initial hash reference */ /*
atomic_set(&entry->e_refcnt, 1); * We create entry with two references. One reference is kept by the
* hash table, the other reference is used to protect us from
* mb_cache_entry_delete_or_get() until the entry is fully setup. This
* avoids nesting of cache->c_list_lock into hash table bit locks which
* is problematic for RT.
*/
atomic_set(&entry->e_refcnt, 2);
entry->e_key = key; entry->e_key = key;
entry->e_value = value; entry->e_value = value;
entry->e_reusable = reusable; entry->e_reusable = reusable;
@ -106,15 +112,12 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
} }
} }
hlist_bl_add_head(&entry->e_hash_list, head); hlist_bl_add_head(&entry->e_hash_list, head);
/* hlist_bl_unlock(head);
* Add entry to LRU list before it can be found by
* mb_cache_entry_delete() to avoid races
*/
spin_lock(&cache->c_list_lock); spin_lock(&cache->c_list_lock);
list_add_tail(&entry->e_list, &cache->c_list); list_add_tail(&entry->e_list, &cache->c_list);
cache->c_entry_count++; cache->c_entry_count++;
spin_unlock(&cache->c_list_lock); spin_unlock(&cache->c_list_lock);
hlist_bl_unlock(head); mb_cache_entry_put(cache, entry);
return 0; return 0;
} }
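
The comment in the new code explains the trick: the entry now starts with a reference count of two, one owned by the hash table and one temporary setup reference that keeps a concurrent mb_cache_entry_delete_or_get() from freeing the entry while it is added to the LRU list outside the hash bit lock; the final mb_cache_entry_put() drops that setup reference and publishes the entry. Below is a generic sketch of this create-with-an-extra-reference pattern; the example_* type and helpers are hypothetical and only show the ordering, not mbcache internals.

    #include <linux/atomic.h>

    struct example_entry {
    	atomic_t	refcnt;
    	/* ... payload ... */
    };

    static void example_hash_insert(struct example_entry *e);	/* hypothetical */
    static void example_lru_add(struct example_entry *e);	/* hypothetical */
    static void example_free(struct example_entry *e);		/* hypothetical */

    /* Hypothetical publish path using the two-reference trick. */
    static void example_publish(struct example_entry *e)
    {
    	/* One reference for the lookup structure, one to keep the entry
    	 * alive while we finish setting it up outside its lock. */
    	atomic_set(&e->refcnt, 2);

    	example_hash_insert(e);		/* entry becomes findable here */
    	example_lru_add(e);		/* done without the hash lock held */

    	/* Drop the setup reference; a concurrent deleter holding the last
    	 * reference may free the entry from this point on. */
    	if (atomic_dec_and_test(&e->refcnt))
    		example_free(e);
    }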


@@ -527,12 +527,12 @@ err_out:
 	goto out;
 }

-static inline int ntfs_submit_bh_for_read(struct buffer_head *bh)
+static inline void ntfs_submit_bh_for_read(struct buffer_head *bh)
 {
 	lock_buffer(bh);
 	get_bh(bh);
 	bh->b_end_io = end_buffer_read_sync;
-	return submit_bh(REQ_OP_READ, bh);
+	submit_bh(REQ_OP_READ, bh);
 }

 /**


@@ -653,7 +653,7 @@ xfs_fs_destroy_inode(
 static void
 xfs_fs_dirty_inode(
 	struct inode			*inode,
-	int				flag)
+	int				flags)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -661,7 +661,13 @@ xfs_fs_dirty_inode(
 	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
 		return;

-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+	/*
+	 * Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC)
+	 * and has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be passed
+	 * in flags possibly together with I_DIRTY_SYNC.
+	 */
+	if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC || !(flags & I_DIRTY_TIME))
 		return;

 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
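
Since xfs_fs_dirty_inode() now looks only at the flags passed in (rather than inode->i_state), it helps to spell out which combinations the new predicate lets through. The table below is simply a worked evaluation of the expression above, not part of the patch:

    /*
     * (flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC || !(flags & I_DIRTY_TIME)
     *
     *   flags passed in                       result
     *   I_DIRTY_SYNC                          skip: no pending timestamp update
     *   I_DIRTY_TIME                          skip: inode itself not dirtied
     *   I_DIRTY_SYNC | I_DIRTY_TIME           log the timestamp update
     *   I_DIRTY_DATASYNC | I_DIRTY_TIME       skip: fdatasync-style dirtying
     */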


@@ -240,7 +240,7 @@ void ll_rw_block(blk_opf_t, int, struct buffer_head * bh[]);
 int sync_dirty_buffer(struct buffer_head *bh);
 int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
 void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
-int submit_bh(blk_opf_t, struct buffer_head *);
+void submit_bh(blk_opf_t, struct buffer_head *);
 void write_boundary_block(struct block_device *bdev,
			sector_t bblock, unsigned blocksize);
 int bh_uptodate_or_lock(struct buffer_head *bh);
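
With submit_bh() returning void, callers that care about the outcome detect errors through the buffer state after wait_on_buffer(), as jbd2_write_superblock() and ntfs_submit_bh_for_read() do above. A minimal synchronous-read sketch of that calling convention follows; the caller and its error handling are illustrative, not taken from the series.

    #include <linux/buffer_head.h>

    /* Hypothetical caller: read one buffer head synchronously and report
     * I/O failure via the buffer state rather than a submit_bh() return. */
    static int example_read_bh(struct buffer_head *bh)
    {
    	if (buffer_uptodate(bh))
    		return 0;

    	lock_buffer(bh);
    	if (buffer_uptodate(bh)) {	/* raced with another reader */
    		unlock_buffer(bh);
    		return 0;
    	}

    	get_bh(bh);
    	bh->b_end_io = end_buffer_read_sync;
    	submit_bh(REQ_OP_READ, bh);	/* submission itself cannot fail */

    	wait_on_buffer(bh);
    	return buffer_uptodate(bh) ? 0 : -EIO;
    }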


@@ -2372,13 +2372,14 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *			don't have to write inode on fdatasync() when only
  *			e.g. the timestamps have changed.
  * I_DIRTY_PAGES	Inode has dirty pages.  Inode itself may be clean.
- * I_DIRTY_TIME		The inode itself only has dirty timestamps, and the
+ * I_DIRTY_TIME		The inode itself has dirty timestamps, and the
  *			lazytime mount option is enabled.  We keep track of this
  *			separately from I_DIRTY_SYNC in order to implement
  *			lazytime.  This gets cleared if I_DIRTY_INODE
- *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
- *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- *			i_state, but not both.  I_DIRTY_PAGES may still be set.
+ *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  But
+ *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ *			in place because writeback might already be in progress
+ *			and we don't want to lose the time update
  * I_NEW		Serves as both a mutex and completion notification.
  *			New inodes set I_NEW.  If two processes both create
  *			the same inode, one of them will release its inode and