Make the anon_inodes facility unconditional so that it can be used by core
VFS code.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit dadd2299ab61fc2b55b95b7b3a8f674cdd3b69c9)
Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I2f97bda4f360d8d05bbb603de839717b3d8067ae
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxbDFsACgkQONu9yGCS
aT4TmQ/+PCgz7TKTu1/Sh7ScmbSjlYFfaQ25K9NzqWDuaTOXCsvrWY4IDdbk7fxZ
WnM8WPpTUfKaRwUVL3T32uwv/uMM8TnXN2r4dB9TH+WF/UK1ovLHQGO5zGWS9erY
dvckRjkRuPe27KYd/sPSA7PRBiWuw1EFqcBIdAuC8OJK1knm3W95C8XVeIkVfey2
/NFIvZELYEOCMzXwPqXbeFG553VtHVGB2GRJqEqZzqZ+JGnUCyBpCgvuwbrQ706l
/Smvz23mmElUGzdTXr4uRPmf2g8bA3a+08/gQM4YPpZYRUMbcCVE3+KaYf3jjJPR
Inj+cKBhF8R6JwkJayzdBJ1KbG6e6DaL2yANqNvoeYCkyP+4GNmoWKebFYklrXr/
jn/btnJtHwSBDVufdgW6uQ039UE66Wfvvi7QrkCgTjebEyf6jU3SnB8Z7HpZZMCg
dwgO6rHKklm6cSJTBZsIDyBd4oj19bTTvOiUcaayQXfeyjlrbWXhH6UOEBze54uT
Ho8yjCq9r2TiO6g9mJOfkkvep1hgNVgabnOyA/papUx3my9sGkUn1Rqg1Pbm9OEF
PV4p853t4+Qe5iBq+yXgacrCg+21q9Xg3ct6QW/c/Quz4UirR9Ayw93C4ej4rJYZ
79/mADmkpH+A1ufP2F80UoWo0IN9oizecfYHOZ+6useUEVgIRik=
=QoXP
-----END PGP SIGNATURE-----
Merge 4.9.155 into android-4.9
Changes in 4.9.155
Fix "net: ipv4: do not handle duplicate fragments as overlapping"
fs: add the fsnotify call to vfs_iter_write
ipv6: Consider sk_bound_dev_if when binding a socket to an address
l2tp: copy 4 more bytes to linear part if necessary
net/mlx4_core: Add masking for a few queries on HCA caps
netrom: switch to sock timer API
net/rose: fix NULL ax25_cb kernel panic
ucc_geth: Reset BQL queue when stopping device
net/mlx5e: Allow MAC invalidation while spoofchk is ON
l2tp: remove l2specific_len dependency in l2tp_core
l2tp: fix reading optional fields of L2TPv3
ipvlan, l3mdev: fix broken l3s mode wrt local routes
CIFS: Do not count -ENODATA as failure for query directory
fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
arm64: hyp-stub: Forbid kprobing of the hyp-stub
arm64: hibernate: Clean the __hyp_text to PoC after resume
gfs2: Revert "Fix loop in gfs2_rbm_find"
platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
mm, oom: fix use-after-free in oom_kill_process
mm: hwpoison: use do_send_sig_info() instead of force_sig()
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
cifs: Always resolve hostname before reconnecting
drivers: core: Remove glue dirs from sysfs earlier
fs: don't scan the inode cache before SB_BORN is set
fanotify: fix handling of events on child sub-directory
Linux 4.9.155
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit b469e7e47c8a075cc08bcd1e85d4365134bdcdd5 upstream.
When an event is reported on a sub-directory and the parent inode has
a mark mask with FS_EVENT_ON_CHILD|FS_ISDIR, the event will be sent to
fsnotify() even if the event type is not in the parent mark mask
(e.g. FS_OPEN).
Further more, if that event happened on a mount or a filesystem with
a mount/sb mark that does have that event type in their mask, the "on
child" event will be reported on the mount/sb mark. That is not
desired, because user will get a duplicate event for the same action.
Note that the event reported on the victim inode is never merged with
the event reported on the parent inode, because of the check in
should_merge(): old_fsn->inode == new_fsn->inode.
Fix this by looking for a match of an actual event type (i.e. not just
FS_ISDIR) in parent's inode mark mask and by not reporting an "on child"
event to group if event type is only found on mount/sb marks.
[backport hint: The bug seems to have always been in fanotify, but this
patch will only apply cleanly to v4.19.y]
Cc: <stable@vger.kernel.org> # v4.19
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[amir: backport to v4.9]
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlre3foACgkQONu9yGCS
aT4ygw/7BxK85/7my26TCcF3NlNfw8x9B1AuEHcjQdUFmSFip9fbolC1VVFQyzWV
1Z6sK5LKrFU/234A4LVW+6oFVnUi8VvX5RnWNqRLj/xpPyv/L5PsG4A7VwQLZDCj
yn6OrIkQ2czv3oio9R8bz1DFxrPsxoCdrMzV1xR/tbWQJscSRSomBSNstZ63lGQP
FleS//hG21092lM0EDPQRfc9/xnwJFMKJoCwBrf6SUR7CK2ZdYp+w+HoeP5BSA49
BCDdMU6wi63gTdfGFJa76VcKuFxRRic3F9VVJwOuYPCfCpgNz7GiUaGXirNCRSxD
F0VveFMx9lOrxMCyDDEoj3lRaZ4FrhwFOF+Q0RK4/p8XH8Pr9emvt4beOeYNk3ev
dFvCI/zA3zmDj8ppB+ZnKq9GTRSuNZ5sw4q7AyFu6fm0mSMV84ZRHgCrVbJkis3i
VDnu6V+Nd9ZYP6rR6AegtQwFM3OReVVJG4ZpB17PZOebLdM+n4DsmqDCgvhFboVP
WowtlR13GIn6ER7hiHfOlwv3CtTUspRrAZi73F8aEA7gdoYzGc8/9Q1fA738WiI4
46utL18qyfjzwbwmI9dtFx1SLkrobSJ+X88PLquLYNUleoxbLx/2u4+v+gm0P4xo
yQGPjIv+kms+rbwPR3ogMKqC2GrYYJs4uHBEPJboJWzYVoPAaEs=
=yoJG
-----END PGP SIGNATURE-----
Merge 4.9.96 into android-4.9
Changes in 4.9.96
tty: make n_tty_read() always abort if hangup is in progress
ubifs: Check ubifs_wbuf_sync() return code
ubi: fastmap: Don't flush fastmap work on detach
ubi: Fix error for write access
ubi: Reject MLC NAND
fs/reiserfs/journal.c: add missing resierfs_warning() arg
resource: fix integer overflow at reallocation
ipc/shm: fix use-after-free of shm file via remap_file_pages()
mm, slab: reschedule cache_reap() on the same CPU
usb: musb: gadget: misplaced out of bounds check
usb: gadget: udc: core: update usb_ep_queue() documentation
ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
ARM: dts: exynos: Fix IOMMU support for GScaler devices on Exynos5250
ARM: dts: at91: sama5d4: fix pinctrl compatible string
spi: Fix scatterlist elements size in spi_map_buf
xen-netfront: Fix hang on device removal
regmap: Fix reversed bounds check in regmap_raw_write()
ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
USB: gadget: f_midi: fixing a possible double-free in f_midi
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
usb: dwc3: pci: Properly cleanup resource
smb3: Fix root directory when server returns inode number of zero
HID: i2c-hid: fix size check and type usage
powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
HID: Fix hid_report_len usage
HID: core: Fix size as type u32
ASoC: ssm2602: Replace reg_default_raw with reg_default
thunderbolt: Resume control channel after hibernation image is created
irqchip/gic: Take lock when updating irq type
random: use a tighter cap in credit_entropy_bits_safe()
jbd2: if the journal is aborted then don't allow update of the log tail
ext4: don't update checksum of new initialized bitmaps
ext4: protect i_disksize update by i_data_sem in direct write path
ext4: fail ext4_iget for root directory if unallocated
RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
RDMA/rxe: Fix an out-of-bounds read
ALSA: pcm: Fix UAF at PCM release via PCM timer access
IB/srp: Fix srp_abort()
IB/srp: Fix completion vector assignment algorithm
dmaengine: at_xdmac: fix rare residue corruption
libnvdimm, namespace: use a safe lookup for dimm device name
nfit, address-range-scrub: fix scrub in-progress reporting
um: Compile with modern headers
um: Use POSIX ucontext_t instead of struct ucontext
iommu/vt-d: Fix a potential memory leak
mmc: jz4740: Fix race condition in IRQ mask update
clk: mvebu: armada-38x: add support for 1866MHz variants
clk: mvebu: armada-38x: add support for missing clocks
clk: fix false-positive Wmaybe-uninitialized warning
clk: bcm2835: De-assert/assert PLL reset signal when appropriate
pwm: rcar: Fix a condition to prevent mismatch value setting to duty
thermal: imx: Fix race condition in imx_thermal_probe()
dt-bindings: clock: mediatek: add binding for fixed-factor clock axisel_d4
watchdog: f71808e_wdt: Fix WD_EN register read
vfio/pci: Virtualize Maximum Read Request Size
ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
drm/amdgpu: Add an ATPX quirk for hybrid laptop
drm/amdgpu: Fix always_valid bos multiple LRU insertions.
drm/amdgpu: Fix PCIe lane width calculation
drm/rockchip: Clear all interrupts before requesting the IRQ
drm/radeon: Fix PCIe lane width calculation
ALSA: line6: Use correct endpoint type for midi output
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: hda - New VIA controller suppor no-snoop path
random: fix crng_ready() test
random: crng_reseed() should lock the crng instance that it is modifying
random: add new ioctl RNDRESEEDCRNG
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
MIPS: uaccess: Add micromips clobbers to bzero invocation
MIPS: memset.S: EVA & fault support for small_memset
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: Fix clobber of v1 in last_fixup
powerpc/eeh: Fix enabling bridge MMIO windows
powerpc/lib: Fix off-by-one in alternate feature patching
udf: Fix leak of UTF-16 surrogates into encoded strings
jffs2_kill_sb(): deal with failed allocations
hypfs_kill_super(): deal with failed allocations
orangefs_kill_sb(): deal with allocation failures
rpc_pipefs: fix double-dput()
Don't leak MNT_INTERNAL away from internal mounts
autofs: mount point create should honour passed in mode
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
fanotify: fix logic of events on child
writeback: safer lock nesting
block/mq: fix potential deadlock during cpu hotplug
Linux 4.9.96
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 54a307ba8d3cd00a3902337ffaae28f436eeb1a4 upstream.
When event on child inodes are sent to the parent inode mark and
parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
will not be delivered to the listener process. However, if the same
process also has a mount mark, the event to the parent inode will be
delivered regadless of the mount mark mask.
This behavior is incorrect in the case where the mount mark mask does
not contain the specific event type. For example, the process adds
a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).
A modify event on a file inside that directory (and inside that mount)
should not create a FAN_MODIFY event, because neither of the marks
requested to get that event on the file.
Fixes: 1968f5eed5 ("fanotify: use both marks when possible")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[natechancellor: Fix small conflict due to lack of 3cd5eca8d7a2f]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmHya0ACgkQONu9yGCS
aT7J4BAArYY7/NDbB+rIFtIdqdk3d6PiOyfQjQB2fSrhdR/+38a3Z+AtfNoPkppd
DQsEnXuZyJeUP5GCNAue1urp0nUuuw/Kr/7DUZvfO8wnjFjS/hfKxnRE8o/S+acN
y/TBFO3LuWSp8ROouAnAr7EEDrLtses6/boQHiioGcb/uqI9JVutkdnOTnm5xYJQ
bAo7dEOlK4trRRGbbMju2FHTFHO/NStos3GD2ORcqS3ibj2FyHvGEmbyyBfmmi+2
Jf5cOvn99lszHweUTxnRw93q6RZcGtmhYoPoANzYdtoZiXkV++EaEAfQ6KPVqeno
mLNPqmWWvgJUsd1uCaR89GXwczVJtqThvGuJMroau+ShLJWuI8mL8N/N+g9wThJr
ormq8PTP3oea9u63hfae1hfr4KsPqQ6UTQcSY7OuT4mgchU5T/DuSyYShy3kdtx4
bT1hzpXwD9y0JCgPVfH1o7ad996D9/MVXpELNREcvrqEGsvRbxBx7gPc4FUzMqaz
vft6z0MX8AvgtNLm9EY5a956ixRG27vGYA7drsLPFXKxIuHQ5oh2nBCupxkrDtGK
0FEWeV6p+DJZm4Gl+0pCYt6AqxzszNW0eVM7sKngbAIYQ77m2h50TOb0HIA0MrKu
Nrjs68hKc/louKc1eYQZnt6bD4EVHVjbpgVP+l1PGpKo/OSnxes=
=tb62
-----END PGP SIGNATURE-----
Merge 4.9.41 into android-4.9
Changes in 4.9.41
af_key: Add lock to key dump
pstore: Make spinlock per zone instead of global
net: reduce skb_warn_bad_offload() noise
jfs: Don't clear SGID when inheriting ACLs
ALSA: fm801: Initialize chip after IRQ handler is registered
ALSA: hda - Add missing NVIDIA GPU codec IDs to patch table
parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases
parisc: Extend disabled preemption in copy_user_page
parisc: Suspend lockup detectors before system halt
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
NFS: invalidate file size when taking a lock.
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
crypto: authencesn - Fix digest_null crash
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
md/raid5: add thread_group worker async_tx_issue_pending_all
drm/vmwgfx: Fix gcc-7.1.1 warning
drm/nouveau/disp/nv50-: bump max chans to 21
drm/nouveau/bar/gf100: fix access to upper half of BAR2
KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
KVM: PPC: Book3S HV: Save/restore host values of debug registers
Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
Staging: comedi: comedi_fops: Avoid orphaned proc entry
drm: rcar-du: Simplify and fix probe error handling
smp/hotplug: Move unparking of percpu threads to the control CPU
smp/hotplug: Replace BUG_ON and react useful
nfc: Fix hangup of RC-S380* in port100_send_ack()
nfc: fdp: fix NULL pointer dereference
net: phy: Do not perform software reset for Generic PHY
isdn: Fix a sleep-in-atomic bug
isdn/i4l: fix buffer overflow
ath10k: fix null deref on wmi-tlv when trying spectral scan
wil6210: fix deadlock when using fw_no_recovery option
mailbox: always wait in mbox_send_message for blocking Tx mode
mailbox: skip complete wait event if timer expired
mailbox: handle empty message in tx_tick
sched/cgroup: Move sched_online_group() back into css_online() to fix crash
RDMA/uverbs: Fix the check for port number
ipmi/watchdog: fix watchdog timeout set on reboot
dentry name snapshots
v4l: s5c73m3: fix negation operator
pstore: Allow prz to control need for locking
pstore: Correctly initialize spinlock and flags
pstore: Use dynamic spinlock initializer
net: skb_needs_check() accepts CHECKSUM_NONE for tx
device-dax: fix sysfs duplicate warnings
x86/mce/AMD: Make the init code more robust
r8169: add support for RTL8168 series add-on card.
ARM: omap2+: fixing wrong strcat for Non-NULL terminated string
dt-bindings: power/supply: Update TPS65217 properties
dt-bindings: input: Specify the interrupt number of TPS65217 power button
ARM: dts: am57xx-idk: Put USB2 port in peripheral mode
ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
net/mlx5: Disable RoCE on the e-switch management port under switchdev mode
ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
net/mlx4_core: Use-after-free causes a resource leak in flow-steering detach
net/mlx4: Remove BUG_ON from ICM allocation routine
net/mlx4_core: Fix raw qp flow steering rules under SRIOV
drm/msm: Ensure that the hardware write pointer is valid
drm/msm: Put back the vaddr in submit_reloc()
drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
vfio-pci: use 32-bit comparisons for register address for gcc-4.5
irqchip/keystone: Fix "scheduling while atomic" on rt
ASoC: tlv320aic3x: Mark the RESET register as volatile
spi: dw: Make debugfs name unique between instances
ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
openrisc: Add _text symbol to fix ksym build error
dmaengine: ioatdma: Add Skylake PCI Dev ID
dmaengine: ioatdma: workaround SKX ioatdma version
l2tp: consider '::' as wildcard address in l2tp_ip6 socket lookup
dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
usb: dwc3: omap: fix race of pm runtime with irq handler in probe
ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
ARM64: zynqmp: Fix i2c node's compatible string
perf probe: Fix to get correct modname from elf header
ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
usb: gadget: Fix copy/pasted error message
Btrfs: use down_read_nested to make lockdep silent
Btrfs: fix lockdep warning about log_mutex
benet: stricter vxlan offloading check in be_features_check
Btrfs: adjust outstanding_extents counter properly when dio write is split
Xen: ARM: Zero reserved fields of xatp before making hypervisor call
tools lib traceevent: Fix prev/next_prio for deadline tasks
xfrm: Don't use sk_family for socket policy lookups
perf tools: Install tools/lib/traceevent plugins with install-bin
perf symbols: Robustify reading of build-id from sysfs
video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
vfio-pci: Handle error from pci_iomap
arm64: mm: fix show_pte KERN_CONT fallout
nvmem: imx-ocotp: Fix wrong register size
net: usb: asix_devices: add .reset_resume for USB PM
ASoC: fsl_ssi: set fifo watermark to more reliable value
sh_eth: enable RX descriptor word 0 shift on SH7734
ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP
ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
perf/x86: Set pmu->module in Intel PMU modules
ASoC: Intel: bytcr-rt5640: fix settings in internal clock mode
HID: ignore Petzl USB headlamp
scsi: fnic: Avoid sending reset to firmware when another reset is in progress
scsi: snic: Return error code on memory allocation failure
scsi: bfa: Increase requested firmware version to 3.2.5.1
ASoC: Intel: Skylake: Release FW ctx in cleanup
ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
Linux 4.9.41
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream.
take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified). In either case the pointer to stable
string is stored into the same structure.
dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().
Intended use:
struct name_snapshot s;
take_dentry_name_snapshot(&s, dentry);
...
access s.name
...
release_dentry_name_snapshot(&s);
Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkm3+kACgkQONu9yGCS
aT72UhAAuNgHUXQWWaTy7Fl0bsIn5TDB5I9hrVll1EDehdqyrPt3bHec9/kClDOs
eBov++TfYFkCbHMRYSzcxX36AXhD52LDeJ7t3D8L59YaASJv/mjTAqhXUvFlsxX+
qNOxFJtfB7eURRMd8Onn/04F1I7h7wNh2Vu49NYZ3NYbCrMVmdUSZOL686lm20gT
sceXEvawCtrJN3+c3g45Kbm4wrB+ymJDtUApRGbzrWm94hbjkD67wn5T0YRDHpfV
oYjIwlZBCjW2yjSfkuLD+1I9ApmbMRUR/WYl1Zthh3FVh16ZCXAGDDhoBb+SOoor
AFkgiikz2DcNoqmWBzQ8W7h46nZbO7Fm6Ejg/F2kSvUY/4C6S5W+T4rsHHCHhcI3
rNGw6j+lPemlNtcIDA7JjfPjbgdtvHRAUHTMBrUpktxpMJU1L2gcbSJnNjgWTVP7
yvnqp8nf2bOKGvv/yKNLAbt1XHOo+TCBERbGA6bXy/5QDMseMB4QnIhEjrUWt1MG
QxgKxdFiSeONnkaAHecB3uaS7CiWSTWMCljESK6rcomXLBAmjVfjzTgrixBLuSzh
Qi/xZTdLcMXYb0U0fK7vohWryxuN43o5Q3m93mAi1Sqm2uTAHE4ZTGm7BKh5O2uZ
tXZ5r9+GkOqvwk6ssfOKDDMprPiMnBxZ/OxvMLjFdA4N5QTPYJE=
=kXgS
-----END PGP SIGNATURE-----
Merge 4.9.30 into android-4.9
Changes in 4.9.30
usb: misc: legousbtower: Fix buffers on stack
usb: misc: legousbtower: Fix memory leak
USB: ene_usb6250: fix DMA to the stack
watchdog: pcwd_usb: fix NULL-deref at probe
char: lp: fix possible integer overflow in lp_setup()
USB: core: replace %p with %pK
tpm_tis_core: Choose appropriate timeout for reading burstcount
ALSA: hda: Fix cpu lockup when stopping the cmd dmas
ARM: tegra: paz00: Mark panel regulator as enabled on boot
fanotify: don't expose EOPENSTALE to userspace
tpm_tis_spi: Use single function to transfer data
tpm_tis_spi: Abort transfer when too many wait states are signaled
tpm_tis_spi: Check correct byte for wait state indicator
tpm_tis_spi: Remove limitation of transfers to MAX_SPI_FRAMESIZE bytes
tpm_tis_spi: Add small delay after last transfer
tpm: msleep() delays - replace with usleep_range() in i2c nuvoton driver
tpm: add sleep only for retry in i2c_nuvoton_write_status()
tpm_crb: check for bad response size
ASoC: cs4271: configure reset GPIO as output
mlx5: Fix mlx5_ib_map_mr_sg mr length
infiniband: call ipv6 route lookup via the stub interface
dm btree: fix for dm_btree_find_lowest_key()
dm raid: select the Kconfig option CONFIG_MD_RAID0
dm bufio: avoid a possible ABBA deadlock
dm bufio: check new buffer allocation watermark every 30 seconds
dm mpath: split and rename activate_path() to prepare for its expanded use
dm cache metadata: fail operations if fail_io mode has been established
dm bufio: make the parameter "retain_bytes" unsigned long
dm thin metadata: call precommit before saving the roots
dm space map disk: fix some book keeping in the disk space map
md: update slab_cache before releasing new stripes when stripes resizing
md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop
rtlwifi: rtl8821ae: setup 8812ae RFE according to device type
mwifiex: MAC randomization should not be persistent
mwifiex: pcie: fix cmd_buf use-after-free in remove/reset
ima: accept previously set IMA_NEW_FILE
KVM: x86: Fix load damaged SSEx MXCSR register
KVM: x86: Fix potential preemption when get the current kvmclock timestamp
KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
x86: fix 32-bit case of __get_user_asm_u64()
regulator: rk808: Fix RK818 LDO2
regulator: tps65023: Fix inverted core enable logic.
s390/kdump: Add final note
s390/cputime: fix incorrect system time
ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device
ath9k_htc: fix NULL-deref at probe
drm/amdgpu: Make display watermark calculations more accurate
drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations.
drm/amdgpu: Add missing lb_vblank_lead_lines setup to DCE-6 path.
drm/nouveau/therm: remove ineffective workarounds for alarm bugs
drm/nouveau/tmr: ack interrupt before processing alarms
drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
drm/nouveau/tmr: avoid processing completed alarms when adding a new one
drm/nouveau/tmr: handle races with hw when updating the next alarm time
gpio: omap: return error if requested debounce time is not possible
cdc-acm: fix possible invalid access when processing notification
ohci-pci: add qemu quirk
cxl: Force context lock during EEH flow
cxl: Route eeh events to all drivers in cxl_pci_error_detected()
proc: Fix unbalanced hard link numbers
of: fix sparse warning in of_pci_range_parser_one
of: fix "/cpus" reference leak in of_numa_parse_cpu_nodes()
of: fdt: add missing allocation-failure check
ibmvscsis: Do not send aborted task response
iio: dac: ad7303: fix channel description
IIO: bmp280-core.c: fix error in humidity calculation
IB/hfi1: Return an error on memory allocation failure
IB/hfi1: Fix a subcontext memory leak
pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
USB: serial: ftdi_sio: fix setting latency for unprivileged users
USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs
USB: chaoskey: fix Alea quirk on big-endian hosts
f2fs: check entire encrypted bigname when finding a dentry
fscrypt: avoid collisions when presenting long encrypted filenames
libnvdimm: fix clear length of nvdimm_forget_poison()
xhci: remove GFP_DMA flag from allocation
usb: host: xhci-plat: propagate return value of platform_get_irq()
xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
net: irda: irda-usb: fix firmware name on big-endian hosts
usbvision: fix NULL-deref at probe
mceusb: fix NULL-deref at probe
ttusb2: limit messages to buffer size
dvb-usb-dibusb-mc-common: Add MODULE_LICENSE
usb: dwc3: gadget: Prevent losing events in event cache
usb: musb: tusb6010_omap: Do not reset the other direction's packet size
usb: musb: Fix trying to suspend while active for OTG configurations
USB: iowarrior: fix info ioctl on big-endian hosts
usb: serial: option: add Telit ME910 support
USB: serial: qcserial: add more Lenovo EM74xx device IDs
USB: serial: mct_u232: fix big-endian baud-rate handling
USB: serial: io_ti: fix div-by-zero in set_termios
USB: hub: fix SS hub-descriptor handling
USB: hub: fix non-SS hub-descriptor handling
ipx: call ipxitf_put() in ioctl error path
iio: proximity: as3935: fix as3935_write
iio: hid-sensor: Store restore poll and hysteresis on S3
s5p-mfc: Fix race between interrupt routine and device functions
gspca: konica: add missing endpoint sanity check
s5p-mfc: Fix unbalanced call to clock management
dib0700: fix NULL-deref at probe
zr364xx: enforce minimum size when reading header
dvb-frontends/cxd2841er: define symbol_rate_min/max in T/C fe-ops
digitv: limit messages to buffer size
dw2102: limit messages to buffer size
cx231xx-audio: fix init error path
cx231xx-audio: fix NULL-deref at probe
cx231xx-cards: fix NULL-deref at probe
powerpc/mm: Ensure IRQs are off in switch_mm()
powerpc/eeh: Avoid use after free in eeh_handle_special_event()
powerpc/book3s/mce: Move add_taint() later in virtual mode
powerpc/pseries: Fix of_node_put() underflow during DLPAR remove
powerpc/iommu: Do not call PageTransHuge() on tail pages
powerpc/64e: Fix hang when debugging programs with relocated kernel
powerpc/tm: Fix FP and VMX register corruption
arm64: KVM: Do not use stack-protector to compile EL2 code
arm: KVM: Do not use stack-protector to compile HYP code
KVM: arm: plug potential guest hardware debug leakage
ARM: 8662/1: module: split core and init PLT sections
ARM: 8670/1: V7M: Do not corrupt vector table around v7m_invalidate_l1 call
ARM: dts: at91: sama5d3_xplained: fix ADC vref
ARM: dts: at91: sama5d3_xplained: not all ADC channels are available
ARM: dts: imx6sx-sdb: Remove OPP override
arm64: dts: hi6220: Reset the mmc hosts
arm64: xchg: hazard against entire exchange variable
arm64: ensure extension of smp_store_release value
arm64: armv8_deprecated: ensure extension of addr
arm64: uaccess: ensure extension of access_ok() addr
arm64: documentation: document tagged pointer stack constraints
staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
staging: rtl8192e: GetTs Fix invalid TID 7 warning.
iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings
metag/uaccess: Fix access_ok()
metag/uaccess: Check access_ok in strncpy_from_user
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
uwb: fix device quirk on big-endian hosts
genirq: Fix chained interrupt data ordering
nvme: unmap CMB and remove sysfs file in reset path
MIPS: Loongson-3: Select MIPS_L1_CACHE_SHIFT_6
osf_wait4(): fix infoleak
um: Fix to call read_initrd after init_bootmem
tracing/kprobes: Enforce kprobes teardown after testing
PCI: hv: Allocate interrupt descriptors with GFP_ATOMIC
PCI: hv: Specify CPU_AFFINITY_ALL for MSI affinity when >= 32 CPUs
PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
PCI: Fix another sanity check bug in /proc/pci mmap
PCI: Only allow WC mmap on prefetchable resources
PCI: Freeze PME scan before suspending devices
mtd: nand: orion: fix clk handling
mtd: nand: omap2: Fix partition creation via cmdline mtdparts
mtd: nand: add ooblayout for old hamming layout
drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2
NFSv4: Fix a hang in OPEN related to server reboot
NFS: Fix use after free in write error path
NFS: Use GFP_NOIO for two allocations in writeback
nfsd: fix undefined behavior in nfsd4_layout_verify
nfsd: encoders mustn't use unitialized values in error cases
drivers: char: mem: Check for address space wraparound with mmap()
drm/i915/gvt: Disable access to stolen memory as a guest
Linux 4.9.30
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 4ff33aafd32e084f5ee7faa54ba06e95f8b1b8af upstream.
When delivering an event to userspace for a file on an NFS share,
if the file is deleted on server side before user reads the event,
user will not get the event.
If the event queue contained several events, the stale event is
quietly dropped and read() returns to user with events read so far
in the buffer.
If the event queue contains a single stale event or if the stale
event is a permission event, read() returns to user with the kernel
internal error code 518 (EOPENSTALE), which is not a POSIX error code.
Check the internal return value -EOPENSTALE in fanotify_read(), just
the same as it is checked in path_openat() and drop the event in the
cases that it is not already dropped.
This is a reproducer from Marko Rauhamaa:
Just take the example program listed under "man fanotify" ("fantest")
and follow these steps:
==============================================================
NFS Server NFS Client(1) NFS Client(2)
==============================================================
# echo foo >/nfsshare/bar.txt
# cat /nfsshare/bar.txt
foo
# ./fantest /nfsshare
Press enter key to terminate.
Listening for events.
# rm -f /nfsshare/bar.txt
# cat /nfsshare/bar.txt
read: Unknown error 518
cat: /nfsshare/bar.txt: Operation not permitted
==============================================================
where NFS Client (1) and (2) are two terminal sessions on a single NFS
Client machine.
Reported-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Tested-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.
Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Patch "vfs: add d_canonical_path for stacked filesystem support"
erroneously updated the ALL_INOTIFY_BITS count. This changes it back
Change-Id: Idb04edc736da276159d30f04c40cff9d6b1e070f
Inotify does not currently know when a filesystem
is acting as a wrapper around another fs. This means
that inotify watchers will miss any modifications to
the base file, as well as any made in a separate
stacked fs that points to the same file.
d_canonical_path solves this problem by allowing the fs
to map a dentry to a path in the lower fs. Inotify
can use it to find the appropriate place to watch to
be informed of all changes to a file.
Change-Id: I09563baffad1711a045e45c1bd0bd8713c2cc0b6
Signed-off-by: Daniel Rosenberg <drosen@google.com>
commit 5716863e0f8251d3360d4cbfc0e44e08007075df upstream.
fsnotify_unmount_inodes() plays complex tricks to pin next inode in the
sb->s_inodes list when iterating over all inodes. Furthermore the code has a
bug that if the current inode is the last on i_sb_list that does not have e.g.
I_FREEING set, then we leave next_i pointing to inode which may get removed
from the i_sb_list once we drop s_inode_list_lock thus resulting in
use-after-free issues (usually manifesting as infinite looping in
fsnotify_unmount_inodes()).
Fix the problem by keeping current inode pinned somewhat longer. Then we can
make the code much simpler and standard.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use assert_spin_locked() macro instead of hand-made BUG_ON statements.
Link: http://lkml.kernel.org/r/1474537439-18919-1-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When freeing permission events by fsnotify_destroy_event(), the warning
WARN_ON(!list_empty(&event->list)); may falsely hit.
This is because although fanotify_get_response() saw event->response
set, there is nothing to make sure the current CPU also sees the removal
of the event from the list. Add proper locking around the WARN_ON() to
avoid the false warning.
Link: http://lkml.kernel.org/r/1473797711-14111-7-git-send-email-jack@suse.cz
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fanotify code has its own lock (access_lock) to protect a list of events
waiting for a response from userspace.
However this is somewhat awkward as the same list_head in the event is
protected by notification_lock if it is part of the notification queue
and by access_lock if it is part of the fanotify private queue which
makes it difficult for any reliable checks in the generic code. So make
fanotify use the same lock - notification_lock - for protecting its
private event list.
Link: http://lkml.kernel.org/r/1473797711-14111-6-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
notification_mutex is used to protect the list of pending events. As such
there's no reason to use a sleeping lock for it. Convert it to a
spinlock.
[jack@suse.cz: fixed version]
Link: http://lkml.kernel.org/r/1474031567-1831-1-git-send-email-jack@suse.cz
Link: http://lkml.kernel.org/r/1473797711-14111-5-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fsnotify_flush_notify() and fanotify_release() destroy notification
event while holding notification_mutex.
The destruction of fanotify event includes a path_put() call which may
end up calling into a filesystem to delete an inode if we happen to be
the last holders of dentry reference which happens to be the last holder
of inode reference.
That in turn may violate lock ordering for some filesystems since
notification_mutex is also acquired e. g. during write when generating
fanotify event.
Also this is the only thing that forces notification_mutex to be a
sleeping lock. So drop notification_mutex before destroying a
notification event.
Link: http://lkml.kernel.org/r/1473797711-14111-4-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).
However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock. Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.
Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().
Fixes: 5838d4442b ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement a function that can be called when a group is being shutdown
to stop queueing new events to the group. Fanotify will use this.
Fixes: 5838d4442b ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inotify instance is destroyed when all references to it are dropped.
That not only means that the corresponding file descriptor needs to be
closed but also that all corresponding instance marks are freed (as each
mark holds a reference to the inotify instance). However marks are
freed only after SRCU period ends which can take some time and thus if
user rapidly creates and frees inotify instances, number of existing
inotify instances can exceed max_user_instances limit although from user
point of view there is always at most one existing instance. Thus
inotify_init() returns EMFILE error which is hard to justify from user
point of view. This problem is exposed by LTP inotify06 testcase on
some machines.
We fix the problem by making sure all group marks are properly freed
while destroying inotify instance. We wait for SRCU period to end in
that path anyway since we have to make sure there is no event being
added to the instance while we are tearing down the instance. So it
takes only some plumbing to allow for marks to be destroyed in that path
as well and not from a dedicated work item.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't require a dedicated thread for fsnotify cleanup. Switch it
over to a workqueue job instead that runs on the system_unbound_wq.
In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit c510eff6be ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").
Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.
The above commit changed the code to use call_srcu to clean up the
marks. While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.
There's also another potential problem with using call_srcu here. While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.
It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys. While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.
This patch reverts the above patch. A later patch will take a different
approach to eliminated the dedicated thread here.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At the time that this code was originally written, call_srcu didn't
exist, so this thread was required to ensure that we waited for that
SRCU grace period to settle before finally freeing the object.
It does exist now however and we can much more efficiently use call_srcu
to handle this. That also allows us to potentially use srcu_barrier to
ensure that they are all of the callbacks have run before proceeding.
In order to conserve space, we union the rcu_head with the g_list.
This will be necessary for nfsd which will allocate marks from a
dedicated slabcache. We have to be able to ensure that all of the
objects are destroyed before destroying the cache. That's fairly
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: Eric Paris <eparis@parisplace.org>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To make the intention clearer, use list_next_entry instead of
list_entry.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comment here says that it is checking for invalid bits. But, the mask
is *actually* checking to ensure that _any_ valid bit is set, which is
quite different.
Without this check, an unexpected bit could get set on an inotify object.
Since these bits are also interpreted by the fsnotify/dnotify code, there
is the potential for an object to be mishandled inside the kernel. For
instance, can we be sure that setting the dnotify flag FS_DN_RENAME on an
inotify watch is harmless?
Add the actual check which was intended. Retain the existing inotify bits
are being added to the watch. Plus, this is existing behavior which would
be nice to preserve.
I did a quick sniff test that inotify functions and that my
'inotify-tools' package passes 'make check'.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was a report that my patch:
inotify: actually check for invalid bits in sys_inotify_add_watch()
broke CRIU.
The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/* to
figure out how to rebuild inotify watches and then passes those flags
directly back in to the inotify API. One of those flags
(FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the inotify
API. It is used inside the kernel to _implement_ inotify but it is not
and has never been part of the API.
My patch above ensured that we only allow bits which are part of the API
(IN_ALL_EVENTS). This broke CRIU.
FS_EVENT_ON_CHILD is really internal to the kernel. It is set _anyway_ on
all inotify marks. So, CRIU was really just trying to set a bit that was
already set.
This patch hides that bit from fdinfo. CRIU will not see the bit, not try
to set it, and should work as before. We should not have been exposing
this bit in the first place, so this is a good patch independent of the
CRIU problem.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Andrey Wagin <avagin@gmail.com>
Acked-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
"In this one:
- d_move fixes (Eric Biederman)
- UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
handling ought to be fixed)
- switch of sb_writers to percpu rwsem (Oleg Nesterov)
- superblock scalability (Josef Bacik and Dave Chinner)
- swapon(2) race fix (Hugh Dickins)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
vfs: Test for and handle paths that are unreachable from their mnt_root
dcache: Reduce the scope of i_lock in d_splice_alias
dcache: Handle escaped paths in prepend_path
mm: fix potential data race in SyS_swapon
inode: don't softlockup when evicting inodes
inode: rename i_wb_list to i_io_list
sync: serialise per-superblock sync operations
inode: convert inode_sb_list_lock to per-sb
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
change sb_writers to use percpu_rw_semaphore
shift percpu_counter_destroy() into destroy_super_work()
percpu-rwsem: kill CONFIG_PERCPU_RWSEM
percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
percpu-rwsem: introduce percpu_down_read_trylock()
document rwsem_release() in sb_wait_write()
fix the broken lockdep logic in __sb_start_write()
introduce __sb_writers_{acquired,release}() helpers
ufs_inode_get{frag,block}(): get rid of 'phys' argument
ufs_getfrag_block(): tidy up a bit
...
fsnotify_destroy_mark_locked() is subtle to use because it temporarily
releases group->mark_mutex. To avoid future problems with this
function, split it into two.
fsnotify_detach_mark() is the part that needs group->mark_mutex and
fsnotify_free_mark() is the part that must be called outside of
group->mark_mutex. This way it's much clearer what's going on and we
also avoid some pointless acquisitions of group->mark_mutex.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Free list is used when all marks on given inode / mount should be
destroyed when inode / mount is going away. However we can free all of
the marks without using a special list with some care.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A check in inotify_fdinfo() checking whether mark is valid was always
true due to a bug. Luckily we can never get to invalidated marks since
we hold mark_mutex and invalidated marks get removed from the group list
when they are invalidated under that mutex.
Anyway fix the check to make code more future proof.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have a _tiny_ microbenchmark that sits in a loop and writes single
bytes to a file. Writing one byte to a tmpfs file is around 2x slower
than reading one byte from a file, which is a _bit_ more than I expecte.
This is a dumb benchmark, but I think it's hard to deny that write() is
a hot path and we should avoid unnecessary overhead there.
I did a 'perf record' of 30-second samples of read and write. The top
item in a diffprofile is srcu_read_lock() from fsnotify(). There are
active inotify fd's from systemd, but nothing is actually listening to
the file or its part of the filesystem.
I *think* we can avoid taking the srcu_read_lock() for the common case
where there are no actual marks on the file. This means that there will
both be nothing to notify for *and* implies that there is no need for
clearing the ignore mask.
This patch gave a 13.1% speedup in writes/second on my test, which is an
improvement from the 10.8% that I saw with the last version.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.
Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list. This method
is safe even when entries from the list can disappear once we drop the
lock.
Signed-off-by: Jan Kara <jack@suse.com>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit a2673b6e04.
Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:
"Thanks for report! You are right that my patch introduces a race
between fsnotify kthread and fsnotify_destroy_group() which can result
in leaking inotify event on group destruction.
I haven't yet decided whether the right fix is not to queue events for
dying notification group (as that is pointless anyway) or whether we
should just fix the original problem differently... Whenever I look
at fsnotify code mark handling I get lost in the maze of locks, lists,
and subtle differences between how different notification systems
handle notification marks :( I'll think about it over night"
and after thinking about it, Jan says:
"OK, I have looked into the code some more and I found another
relatively simple way of fixing the original oops. It will be IMHO
better than trying to fixup this issue which has more potential for
breakage. I'll ask Linus to revert the fsnotify fix he already merged
and send a new fix"
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Requested-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and we dereference free
memory in the loop there.
Fix the problem by keeping mark_mutex held in
fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that
we need to call a ->freeing_mark() callback which may acquire mark_mutex
again. To avoid this and similar lock inversion issues, we move the call
to ->freeing_mark() callback to the kthread destroying the mark.
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The INOTIFY_USER option is bool, and hence this code is either
present or absent. It will never be modular, so using
module_init as an alias for __initcall is rather misleading.
Fix this up now, so that we can relocate module_init from
init.h into module.h in the future. If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.
Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups. As __initcall gets
mapped onto device_initcall, our use of fs_initcall (which
makes sense for fs code) will thus change this registration
from level 6-device to level 5-fs (i.e. slightly earlier).
However no observable impact of that small difference has
been observed during testing, or is expected.
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
With FAN_ONDIR set, the user can end up getting events, which it hasn't
marked. This was revealed with fanotify04 testcase failure on
Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0d7
("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").
# /opt/ltp/testcases/bin/fanotify04
[ ... ]
fanotify04 7 TPASS : event generated properly for type 100000
fanotify04 8 TFAIL : fanotify04.c:147: got unexpected event 30
fanotify04 9 TPASS : No event as expected
The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
a fanotify on a dir. Then does an open(), followed by close() of the
directory and expects to see an event FAN_OPEN(0x20). However, the
fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)). This happens due to
the flaw in the check for event_mask in fanotify_should_send_event() which
does:
if (event_mask & marks_mask & ~marks_ignored_mask)
return true;
where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
marks_mask == (FAN_ONDIR | FAN_OPEN),
marks_ignored_mask == 0
Fix this by masking the outgoing events to the user, as we already take
care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
rather than d_is_dir() when checking a dir watch and give an error on fake
directories.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently FAN_ONDIR is always set on a mark's ignored mask when the
event mask is extended without FAN_MARK_ONDIR being set. This may
result in events for directories being ignored unexpectedly for call
sequences like
fanotify_mark(fd, FAN_MARK_ADD, FAN_OPEN | FAN_ONDIR , AT_FDCWD, "dir");
fanotify_mark(fd, FAN_MARK_ADD, FAN_CLOSE, AT_FDCWD, "dir");
Also FAN_MARK_ONDIR is only honored when adding events to a mark's mask,
but not for event removal. Fix both issues by not setting FAN_ONDIR
implicitly on the ignore mask any more. Instead treat FAN_ONDIR as any
other event flag and require FAN_MARK_ONDIR to be set by the user for
both event mask and ignore mask. Furthermore take FAN_MARK_ONDIR into
account when set for event removal.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If removing bits from a mark's ignored mask, the concerning
inodes/vfsmounts mask is not affected. So don't recalculate it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In fanotify_mark_remove_from_mask() a mark is destroyed if only one of
both bitmasks (mask or ignored_mask) of a mark is cleared. However the
other mask may still be set and contain information that should not be
lost. So only destroy a mark if both masks are cleared.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As per e23738a730 ("sched, inotify: Deal with nested sleeps").
fanotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).
Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.
The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.
If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
text data bss dec hex filename
2007 0 0 2007 7d7 kernel/rcu/srcu.o
Size of arch/powerpc/boot/zImage changes from
text data bss dec hex filename
831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before
829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after
so the savings are about ~2000 bytes.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
destroy_list is used to track marks which still need waiting for srcu
period end before they can be freed. However by the time mark is added to
destroy_list it isn't in group's list of marks anymore and thus we can
reuse fsnotify_mark->g_list for queueing into destroy_list. This saves
two pointers for each fsnotify_mark.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>