Commit graph

628 commits

Author SHA1 Message Date
Michal Hocko
f372ac0415
BACKPORT: mm: introduce kv[mz]alloc helpers
Patch series "kvmalloc", v5.

There are many open coded kmalloc with vmalloc fallback instances in the
tree.  Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.

As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.

Most callers, I could find, have been converted to use the helper
instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.

[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

This patch (of 9):

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code.  Yet we do not have any common helper
for that and so users have invented their own helpers.  Some of them are
really creative when doing so.  Let's just add kv[mz]alloc and make sure
it is implemented properly.  This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures.  This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them.  In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL).  Those need to be
fixed separately.

While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers.  Existing ones would have to die
slowly.

[sfr@canb.auug.org.au: f2fs fixup]
  Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>	[ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: DennySPB <dennyspb@gmail.com>

,
2023-02-21 00:19:39 +03:00
Greg Kroah-Hartman
60fbe7034f This is the 4.9.215 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5ZJvIACgkQONu9yGCS
 aT4YXxAAkGwu5Kr2xdjQVfkG/P/v0DBf1QJ1xxUGlOt0eql0Kl9aiDSOoHDwLBpX
 L+nCaXdH5miLR/U3bhV95UW2uR9Ma/Q4hSyWv8ESzMvv3ov8zauCaCjRpIY00+0X
 2HjEl50SV1ST5fb8U55JIaYp2kNGXLsrnL2/J7NiYq/gv2+2qAQVup3rnTWk4YMB
 Zz3XCoyVw5m1Wn3D5sPe3S6z0TQq5wAqzH2hFgHIkOddIEbD+ZBtWWcWVSuVfBu6
 eyJBSNz75SV7eLK6MUVQyfKMIA0rQpidGHPO9R9QspS9wUT/PG4veCGsrvIXOQKG
 wD/PbVlvGCI0aj87dbvJjd4N24HEFPM85Kq5KcJdFYwSTgpf9h/R/Y3amZ1EzS1Y
 +7IzDWAdw2jqWo7Ys78bBoT2BGAHTkRbj93jTVv8cjow88URHQeakIyFPh2+nQg3
 +ky04NGy/eWzNRNwBB6BdpukZZ5WVwcmYq7VhDkHLOvqhH5Q2eom+pUKgLszvZCY
 U/Nv2wCQB+gVXEAZ1LNqG8uePBkJXZSwCEAoMTQMAC0riiJSAtS1BZPnr+x2ZdJD
 OrN7DsFvTeJLsh/SDg+s9mVvO4YotQ5JcBbmGdbXDjuEIaxpncDIo5MM9y5CE/Ej
 hW6n8tNPbIZfkzEDsBMkhHbUoDtKLgdZql7dMmeVWIsRi3UfrVM=
 =EsN0
 -----END PGP SIGNATURE-----

Merge 4.9.215 into android-4.9-q

Changes in 4.9.215
	x86/vdso: Use RDPID in preference to LSL when available
	KVM: x86: emulate RDPID
	ALSA: hda: Use scnprintf() for printing texts for sysfs/procfs
	ecryptfs: fix a memory leak bug in parse_tag_1_packet()
	ecryptfs: fix a memory leak bug in ecryptfs_init_messaging()
	ALSA: usb-audio: Apply sample rate quirk for Audioengine D1
	ext4: don't assume that mmp_nodename/bdevname have NUL
	ext4: fix checksum errors with indexed dirs
	ext4: improve explanation of a mount failure caused by a misconfigured kernel
	Btrfs: fix race between using extent maps and merging them
	btrfs: log message when rw remount is attempted with unclean tree-log
	perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's event map
	padata: Remove broken queue flushing
	s390/time: Fix clk type in get_tod_clock
	perf/x86/intel: Fix inaccurate period in context switch for auto-reload
	hwmon: (pmbus/ltc2978) Fix PMBus polling of MFR_COMMON definitions.
	jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
	jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
	btrfs: print message when tree-log replay starts
	scsi: qla2xxx: fix a potential NULL pointer dereference
	Revert "KVM: VMX: Add non-canonical check on writes to RTIT address MSRs"
	drm/gma500: Fixup fbdev stolen size usage evaluation
	cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
	brcmfmac: Fix use after free in brcmf_sdio_readframes()
	gianfar: Fix TX timestamping with a stacked DSA driver
	pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs
	pxa168fb: Fix the function used to release some memory in an error handling path
	media: i2c: mt9v032: fix enum mbus codes and frame sizes
	powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
	gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap()
	media: sti: bdisp: fix a possible sleep-in-atomic-context bug in bdisp_device_run()
	pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
	efi/x86: Map the entire EFI vendor string before copying it
	MIPS: Loongson: Fix potential NULL dereference in loongson3_platform_init()
	sparc: Add .exit.data section.
	uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()
	usb: gadget: udc: fix possible sleep-in-atomic-context bugs in gr_probe()
	jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
	x86/sysfb: Fix check for bad VRAM size
	tracing: Fix tracing_stat return values in error handling paths
	tracing: Fix very unlikely race of registering two stat tracers
	ext4, jbd2: ensure panic when aborting with zero errno
	kconfig: fix broken dependency in randconfig-generated .config
	clk: qcom: rcg2: Don't crash if our parent can't be found; return an error
	drm/amdgpu: remove 4 set but not used variable in amdgpu_atombios_get_connector_info_from_object_table
	regulator: rk808: Lower log level on optional GPIOs being not available
	net/wan/fsl_ucc_hdlc: reject muram offsets above 64K
	PCI/IOV: Fix memory leak in pci_iov_add_virtfn()
	NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu().
	media: v4l2-device.h: Explicitly compare grp{id,mask} to zero in v4l2_device macros
	reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
	ALSA: usx2y: Adjust indentation in snd_usX2Y_hwdep_dsp_status
	b43legacy: Fix -Wcast-function-type
	ipw2x00: Fix -Wcast-function-type
	iwlegacy: Fix -Wcast-function-type
	rtlwifi: rtl_pci: Fix -Wcast-function-type
	orinoco: avoid assertion in case of NULL pointer
	ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
	scsi: aic7xxx: Adjust indentation in ahc_find_syncrate
	drm/mediatek: handle events when enabling/disabling crtc
	ARM: dts: r8a7779: Add device node for ARM global timer
	x86/vdso: Provide missing include file
	PM / devfreq: rk3399_dmc: Add COMPILE_TEST and HAVE_ARM_SMCCC dependency
	pinctrl: sh-pfc: sh7269: Fix CAN function GPIOs
	RDMA/rxe: Fix error type of mmap_offset
	ALSA: sh: Fix compile warning wrt const
	tools lib api fs: Fix gcc9 stringop-truncation compilation error
	usbip: Fix unsafe unaligned pointer usage
	udf: Fix free space reporting for metadata and virtual partitions
	soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
	rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls
	Input: edt-ft5x06 - work around first register access error
	wan: ixp4xx_hss: fix compile-testing on 64-bit
	ASoC: atmel: fix build error with CONFIG_SND_ATMEL_SOC_DMA=m
	tty: synclinkmp: Adjust indentation in several functions
	tty: synclink_gt: Adjust indentation in several functions
	driver core: platform: Prevent resouce overflow from causing infinite loops
	driver core: Print device when resources present in really_probe()
	vme: bridges: reduce stack usage
	drm/nouveau/gr/gk20a,gm200-: add terminators to method lists read from fw
	drm/nouveau: Fix copy-paste error in nouveau_fence_wait_uevent_handler
	drm/vmwgfx: prevent memory leak in vmw_cmdbuf_res_add
	usb: musb: omap2430: Get rid of musb .set_vbus for omap2430 glue
	iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
	scsi: iscsi: Don't destroy session if there are outstanding connections
	arm64: fix alternatives with LLVM's integrated assembler
	pwm: omap-dmtimer: Remove PWM chip in .remove before making it unfunctional
	cmd64x: potential buffer overflow in cmd64x_program_timings()
	ide: serverworks: potential overflow in svwks_set_pio_mode()
	remoteproc: Initialize rproc_class before use
	x86/decoder: Add TEST opcode to Group3-2
	s390/ftrace: generate traced function stack frame
	driver core: platform: fix u32 greater or equal to zero comparison
	ALSA: hda - Add docking station support for Lenovo Thinkpad T420s
	powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
	jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
	ARM: 8951/1: Fix Kexec compilation issue.
	hostap: Adjust indentation in prism2_hostapd_add_sta
	iwlegacy: ensure loop counter addr does not wrap and cause an infinite loop
	cifs: fix NULL dereference in match_prepath
	irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
	drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
	ftrace: fpid_next() should increase position index
	trigger_next should increase position index
	radeon: insert 10ms sleep in dce5_crtc_load_lut
	ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
	lib/scatterlist.c: adjust indentation in __sg_alloc_table
	reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
	bcache: explicity type cast in bset_bkey_last()
	irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
	iwlwifi: mvm: Fix thermal zone registration
	microblaze: Prevent the overflow of the start
	brd: check and limit max_part par
	help_next should increase position index
	selinux: ensure we cleanup the internal AVC counters on error in avc_update()
	enic: prevent waking up stopped tx queues over watchdog reset
	net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS
	net/sched: flower: add missing validation of TCA_FLOWER_FLAGS
	floppy: check FDC index for errors before assigning it
	vt: selection, handle pending signals in paste_selection
	staging: android: ashmem: Disallow ashmem memory from being remapped
	staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
	xhci: Force Maximum Packet size for Full-speed bulk devices to valid range.
	usb: uas: fix a plug & unplug racing
	USB: Fix novation SourceControl XL after suspend
	USB: hub: Don't record a connect-change event during reset-resume
	staging: rtl8188eu: Fix potential security hole
	staging: rtl8188eu: Fix potential overuse of kernel memory
	x86/mce/amd: Publish the bank pointer only after setup has succeeded
	x86/mce/amd: Fix kobject lifetime
	tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode
	tty: serial: imx: setup the correct sg entry for tx dma
	Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
	xhci: apply XHCI_PME_STUCK_QUIRK to Intel Comet Lake platforms
	KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI
	VT_RESIZEX: get rid of field-by-field copyin
	vt: vt_ioctl: fix race in VT_RESIZEX
	lib/stackdepot.c: fix global out-of-bounds in stack_slabs
	KVM: nVMX: Don't emulate instructions in guest mode
	netfilter: xt_bpf: add overflow checks
	ext4: fix a data race in EXT4_I(inode)->i_disksize
	ext4: add cond_resched() to __ext4_find_entry()
	ext4: fix mount failure with quota configured as module
	ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
	ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
	KVM: nVMX: Refactor IO bitmap checks into helper function
	KVM: nVMX: Check IO instruction VM-exit conditions
	KVM: apic: avoid calculating pending eoi from an uninitialized val
	Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
	scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
	scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
	usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
	staging: greybus: use after free in gb_audio_manager_remove_all()
	ecryptfs: replace BUG_ON with error handling code
	ALSA: rawmidi: Avoid bit fields for state flags
	ALSA: seq: Avoid concurrent access to queue flags
	ALSA: seq: Fix concurrent access to queue current tick/time
	netfilter: xt_hashlimit: limit the max size of hashtable
	ata: ahci: Add shutdown to freeze hardware resources of ahci
	xen: Enable interrupts when calling _cond_resched()
	s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range
	Linux 4.9.215

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4c663321dde48cd2a324e59acb70c99f75f9344e
2020-02-28 16:59:01 +01:00
Ioanna Alifieraki
cc79150ad3 Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
commit edf28f4061afe4c2d9eb1c3323d90e882c1d6800 upstream.

This reverts commit a979558448.

Commit a979558448 ("ipc,sem: remove uneeded sem_undo_list lock usage
in exit_sem()") removes a lock that is needed.  This leads to a process
looping infinitely in exit_sem() and can also lead to a crash.  There is
a reproducer available in [1] and with the commit reverted the issue
does not reproduce anymore.

Using the reproducer found in [1] is fairly easy to reach a point where
one of the child processes is looping infinitely in exit_sem between
for(;;) and if (semid == -1) block, while it's trying to free its last
sem_undo structure which has already been freed by freeary().

Each sem_undo struct is on two lists: one per semaphore set (list_id)
and one per process (list_proc).  The list_id list tracks undos by
semaphore set, and the list_proc by process.

Undo structures are removed either by freeary() or by exit_sem().  The
freeary function is invoked when the user invokes a syscall to remove a
semaphore set.  During this operation freeary() traverses the list_id
associated with the semaphore set and removes the undo structures from
both the list_id and list_proc lists.

For this case, exit_sem() is called at process exit.  Each process
contains a struct sem_undo_list (referred to as "ulp") which contains
the head for the list_proc list.  When the process exits, exit_sem()
traverses this list to remove each sem_undo struct.  As in freeary(),
whenever a sem_undo struct is removed from list_proc, it is also removed
from the list_id list.

Removing elements from list_id is safe for both exit_sem() and freeary()
due to sem_lock().  Removing elements from list_proc is not safe;
freeary() locks &un->ulp->lock when it performs
list_del_rcu(&un->list_proc) but exit_sem() does not (locking was
removed by commit a979558448 ("ipc,sem: remove uneeded sem_undo_list
lock usage in exit_sem()").

This can result in the following situation while executing the
reproducer [1] : Consider a child process in exit_sem() and the parent
in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)).

 - The list_proc for the child contains the last two undo structs A and
   B (the rest have been removed either by exit_sem() or freeary()).

 - The semid for A is 1 and semid for B is 2.

 - exit_sem() removes A and at the same time freeary() removes B.

 - Since A and B have different semid sem_lock() will acquire different
   locks for each process and both can proceed.

The bug is that they remove A and B from the same list_proc at the same
time because only freeary() acquires the ulp lock. When exit_sem()
removes A it makes ulp->list_proc.next to point at B and at the same
time freeary() removes B setting B->semid=-1.

At the next iteration of for(;;) loop exit_sem() will try to remove B.

The only way to break from for(;;) is for (&un->list_proc ==
&ulp->list_proc) to be true which is not. Then exit_sem() will check if
B->semid=-1 which is and will continue looping in for(;;) until the
memory for B is reallocated and the value at B->semid is changed.

At that point, exit_sem() will crash attempting to unlink B from the
lists (this can be easily triggered by running the reproducer [1] a
second time).

To prove this scenario instrumentation was added to keep information
about each sem_undo (un) struct that is removed per process and per
semaphore set (sma).

          CPU0                                CPU1
  [caller holds sem_lock(sma for A)]      ...
  freeary()                               exit_sem()
  ...                                     ...
  ...                                     sem_lock(sma for B)
  spin_lock(A->ulp->lock)                 ...
  list_del_rcu(un_A->list_proc)           list_del_rcu(un_B->list_proc)

Undo structures A and B have different semid and sem_lock() operations
proceed.  However they belong to the same list_proc list and they are
removed at the same time.  This results into ulp->list_proc.next
pointing to the address of B which is already removed.

After reverting commit a979558448 ("ipc,sem: remove uneeded
sem_undo_list lock usage in exit_sem()") the issue was no longer
reproducible.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779

Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
Fixes: a979558448 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Herton R. Krzesinski <herton@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <malat@debian.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28 15:42:49 +01:00
Greg Kroah-Hartman
4ebd29edaf This is the 4.9.188 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1JqvYACgkQONu9yGCS
 aT78HA//TVAfvVC0NqHbLFIIgkkZ1ygQc9A7K0hhkdCvZXcwSQgaxWO3uhHOeVNz
 yK2RS5GqxsNQaSnsmO66cfb6+B1YzmnRuXbZsKU9AcA+rs5rhRGZx1Ax+nMxbpeC
 zQ6Yi9aATa+R5O/yt3emtxL/Ml7W82sb0kE/o2YGJBDamiOmWEAnf6ROTz9euzZ3
 bm2ifHNDiN5EG49J6PgQLXGNca9YqxFmpH6bts+mdNIBmv1M46QyD4nMtke3q4fz
 XDLy9RoPXvbUt9i+xRf18Z5lHMSMqdS9+rU7iOK5E3mir9slWqTNLHN6B7dObvH3
 lmICvdYUo+CucVHMHlihUNtPWpw4q1Ml5f8gY9p9qNE7WNErbt8Do9hCi7RKEpwL
 o1jIVoJcWdeElbTsMMOnqnypOX54JsE4NrValuuuRuC8LqqsFe6AxbOoqouQHzqW
 TOuJFw6thJlYXhsZCNciNLSlyD7PbobeBuZ+6DSfeGhqmGR0/5+Aq9bbZmxsLd0p
 OMsnwkiboYBhTn/IaC0jpIVwqcI+NvKQG0fwlAR0KqILwLrvbe62WMQS7OxoKxix
 4fj5B1tZMHKWKMdE0GPFE0f2bfVMJMMnF+ra8ilgLLI1cuOPGwjXHpT0Nio9Uqtb
 UQ2Q9R8MpxUCYkOLy/22U0cNrc29ND9dzp9mkaGpDH0PJUwVg2s=
 =i+NY
 -----END PGP SIGNATURE-----

Merge 4.9.188 into android-4.9-q

Changes in 4.9.188
	ARM: riscpc: fix DMA
	ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200
	ARM: dts: rockchip: Make rk3288-veyron-mickey's emmc work again
	ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend
	ftrace: Enable trampoline when rec count returns back to one
	kernel/module.c: Only return -EEXIST for modules that have finished loading
	MIPS: lantiq: Fix bitfield masking
	dmaengine: rcar-dmac: Reject zero-length slave DMA requests
	fs/adfs: super: fix use-after-free bug
	btrfs: fix minimum number of chunk errors for DUP
	ceph: fix improper use of smp_mb__before_atomic()
	ceph: return -ERANGE if virtual xattr value didn't fit in buffer
	scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
	ACPI: fix false-positive -Wuninitialized warning
	be2net: Signal that the device cannot transmit during reconfiguration
	x86/apic: Silence -Wtype-limits compiler warnings
	x86: math-emu: Hide clang warnings for 16-bit overflow
	mm/cma.c: fail if fixed declaration can't be honored
	coda: add error handling for fget
	coda: fix build using bare-metal toolchain
	uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers
	drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
	ipc/mqueue.c: only perform resource calculation if user valid
	x86/kvm: Don't call kvm_spurious_fault() from .fixup
	x86, boot: Remove multiple copy of static function sanitize_boot_params()
	kbuild: initialize CLANG_FLAGS correctly in the top Makefile
	Btrfs: fix incremental send failure after deduplication
	mmc: dw_mmc: Fix occasional hang after tuning on eMMC
	gpiolib: fix incorrect IRQ requesting of an active-low lineevent
	selinux: fix memory leak in policydb_init()
	s390/dasd: fix endless loop after read unit address configuration
	drivers/perf: arm_pmu: Fix failure path in PM notifier
	xen/swiotlb: fix condition for calling xen_destroy_contiguous_region()
	IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification
	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
	infiniband: fix race condition between infiniband mlx4, mlx5 driver and core dumping
	coredump: fix race condition between collapse_huge_page() and core dumping
	eeprom: at24: make spd world-readable again
	Backport minimal compiler_attributes.h to support GCC 9
	include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
	objtool: Support GCC 9 cold subfunction naming scheme
	x86, mm, gup: prevent get_page() race with munmap in paravirt guest
	Linux 4.9.188

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-08-06 18:45:16 +02:00
Kees Cook
62369d5c01 ipc/mqueue.c: only perform resource calculation if user valid
[ Upstream commit a318f12ed8843cfac53198390c74a565c632f417 ]

Andreas Christoforou reported:

  UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow:
  9 * 2305843009213693951 cannot be represented in type 'long int'
  ...
  Call Trace:
    mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414
    evict+0x472/0x8c0 fs/inode.c:558
    iput_final fs/inode.c:1547 [inline]
    iput+0x51d/0x8c0 fs/inode.c:1573
    mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320
    mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459
    vfs_mkobj+0x39e/0x580 fs/namei.c:2892
    prepare_open ipc/mqueue.c:731 [inline]
    do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771

Which could be triggered by:

        struct mq_attr attr = {
                .mq_flags = 0,
                .mq_maxmsg = 9,
                .mq_msgsize = 0x1fffffffffffffff,
                .mq_curmsgs = 0,
        };

        if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
                perror("mq_open");

mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
preparing to return -EINVAL.  During the cleanup, it calls
mqueue_evict_inode() which performed resource usage tracking math for
updating "user", before checking if there was a valid "user" at all
(which would indicate that the calculations would be sane).  Instead,
delay this check to after seeing a valid "user".

The overflow was real, but the results went unused, so while the flaw is
harmless, it's noisy for kernel fuzzers, so just fix it by moving the
calculation under the non-NULL "user" where it actually gets used.

Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Andreas Christoforou <andreaschristofo@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-06 18:29:39 +02:00
Greg Kroah-Hartman
a0b21f86b2 This is the 4.9.183 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0Nx/UACgkQONu9yGCS
 aT78vQ/9FibC80cVhM69HzZko3evlbEQmPZkmgjwkEqJTZ+2qczRZR6sM2Oexx9q
 tLBeRU1+3cPe9Pl/SsEZVM/rTqSrxcRXnGgyG6RMq8wBYtBWyiwRFLRJ2j5zw//s
 Wg0LXJhaqLV9FfP34PaPkXmRdBOec2nw5N0U1uoA/0VPwLFRfPTMn8lsfsmbtLWw
 JlXMlpEYosGQJArYMadMOntnCwaPtUJtd4EfLTqb+Yc23LM0RTYhtwn+XKE8cJzj
 1WMWHhTRkd+RtZIlsgw+FW1iX8pAHvrAPq0b2xOpmq5CCmErHfqjmeZ7VY6SUdAf
 RqnTDFb/yNAdhbjklzB5+o/jAl/yv1nbfOGmbJrZNVxzZ3RqKfUJNDpVQPwJlgAc
 frCJ2+ZuBVnJWcvbyaR35DgJlAxEZ26JVP17D8oywXnQLpemKW++bvMB6J3kL4qD
 nN5bmPd5jCe89Aawo66akIVN1UOnbBlzkdiXEXEzxwvKMraYcoSnoL6AmFxzjcBp
 HV/mUy0L7umBbtz1l9aXiACoGtUwZbbsfNTDeiXadbnZSSAGVYpmtf9UGhSSUrA0
 dqkWos5SWXTYWSO4+4FkuIaGIZF+LPE6Esm+PxyzXmbm5rBiVnxxcxT1X4Pd5O+C
 0y+wng0TSKHMPbwUyoGktuR7ec9P+/WhcyBwjS0EnMnBeBL+Bdw=
 =o76k
 -----END PGP SIGNATURE-----

Merge 4.9.183 into android-4.9-q

Changes in 4.9.183
	rapidio: fix a NULL pointer dereference when create_workqueue() fails
	fs/fat/file.c: issue flush after the writeback of FAT
	sysctl: return -EINVAL if val violates minmax
	ipc: prevent lockup on alloc_msg and free_msg
	ARM: prevent tracing IPI_CPU_BACKTRACE
	hugetlbfs: on restore reserve error path retain subpool reservation
	mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE
	mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
	mm/slab.c: fix an infinite loop in leaks_show()
	kernel/sys.c: prctl: fix false positive in validate_prctl_map()
	drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER
	mfd: tps65912-spi: Add missing of table registration
	mfd: intel-lpss: Set the device in reset state when init
	mfd: twl6040: Fix device init errors for ACCCTL register
	perf/x86/intel: Allow PEBS multi-entry in watermark mode
	drm/bridge: adv7511: Fix low refresh rate selection
	objtool: Don't use ignore flag for fake jumps
	pwm: meson: Use the spin-lock only to protect register modifications
	ntp: Allow TAI-UTC offset to be set to zero
	f2fs: fix to avoid panic in do_recover_data()
	f2fs: fix to clear dirty inode in error path of f2fs_iget()
	f2fs: fix to do sanity check on valid block count of segment
	configfs: fix possible use-after-free in configfs_register_group
	uml: fix a boot splat wrt use of cpu_all_mask
	watchdog: imx2_wdt: Fix set_timeout for big timeout values
	watchdog: fix compile time error of pretimeout governors
	iommu/vt-d: Set intel_iommu_gfx_mapped correctly
	ALSA: hda - Register irq handler after the chip initialization
	nvmem: core: fix read buffer in place
	fuse: retrieve: cap requested size to negotiated max_write
	nfsd: allow fh_want_write to be called twice
	x86/PCI: Fix PCI IRQ routing table memory leak
	platform/chrome: cros_ec_proto: check for NULL transfer function
	soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
	clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx7d: Specify IMX7D_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6ul: Specify IMX6UL_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
	PCI: rpadlpar: Fix leaked device_node references in add/remove paths
	platform/x86: intel_pmc_ipc: adding error handling
	PCI: rcar: Fix a potential NULL pointer dereference
	PCI: rcar: Fix 64bit MSI message address handling
	video: hgafb: fix potential NULL pointer dereference
	video: imsttfb: fix potential NULL pointer dereferences
	PCI: xilinx: Check for __get_free_pages() failure
	gpio: gpio-omap: add check for off wake capable gpios
	dmaengine: idma64: Use actual device for DMA transfers
	pwm: tiehrpwm: Update shadow register for disabling PWMs
	ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa
	pwm: Fix deadlock warning when removing PWM device
	ARM: exynos: Fix undefined instruction during Exynos5422 resume
	Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections"
	ALSA: seq: Cover unsubscribe_port() in list_mutex
	ALSA: oxfw: allow PCM capture for Stanton SCS.1m
	libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
	mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
	fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
	signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO
	ptrace: restore smp_rmb() in __ptrace_may_access()
	media: v4l2-ioctl: clear fields in s_parm
	i2c: acorn: fix i2c warning
	bcache: fix stack corruption by PRECEDING_KEY()
	cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
	ASoC: cs42xx8: Add regcache mask dirty
	ASoC: fsl_asrc: Fix the issue about unsupported rate
	x86/uaccess, kcov: Disable stack protector
	ALSA: seq: Protect in-kernel ioctl calls with mutex
	ALSA: seq: Fix race of get-subscription call vs port-delete ioctls
	Revert "ALSA: seq: Protect in-kernel ioctl calls with mutex"
	Drivers: misc: fix out-of-bounds access in function param_set_kgdbts_var
	scsi: lpfc: add check for loss of ndlp when sending RRQ
	arm64/mm: Inhibit huge-vmap with ptdump
	scsi: bnx2fc: fix incorrect cast to u64 on shift operation
	selftests/timers: Add missing fflush(stdout) calls
	usbnet: ipheth: fix racing condition
	KVM: x86/pmu: do not mask the value that is written to fixed PMUs
	KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
	drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read
	drm/vmwgfx: NULL pointer dereference from vmw_cmd_dx_view_define()
	usb: dwc2: Fix DMA cache alignment issues
	USB: Fix chipmunk-like voice when using Logitech C270 for recording audio.
	USB: usb-storage: Add new ID to ums-realtek
	USB: serial: pl2303: add Allied Telesis VT-Kit3
	USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
	USB: serial: option: add Telit 0x1260 and 0x1261 compositions
	rtc: pcf8523: don't return invalid date when battery is low
	ax25: fix inconsistent lock state in ax25_destroy_timer
	be2net: Fix number of Rx queues used for flow hashing
	ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero
	lapb: fixed leak of control-blocks.
	neigh: fix use-after-free read in pneigh_get_next
	sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg
	Revert "staging: vc04_services: prevent integer overflow in create_pagelist()"
	perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
	selftests: netfilter: missing error check when setting up veth interface
	mISDN: make sure device name is NUL terminated
	x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
	perf/ring_buffer: Fix exposing a temporarily decreased data_head
	perf/ring_buffer: Add ordering to rb->nest increment
	gpio: fix gpio-adp5588 build errors
	net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE()
	i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
	configfs: Fix use-after-free when accessing sd->s_dentry
	perf data: Fix 'strncat may truncate' build failure with recent gcc
	perf record: Fix s390 missing module symbol and warning for non-root users
	ia64: fix build errors by exporting paddr_to_nid()
	KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
	KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
	net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
	scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
	scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
	scsi: libsas: delete sas port if expander discover failed
	mlxsw: spectrum: Prevent force of 56G
	Abort file_remove_privs() for non-reg. files
	Linux 4.9.183

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-22 08:56:18 +02:00
Li Rongqing
7b5598c8ad ipc: prevent lockup on alloc_msg and free_msg
[ Upstream commit d6a2946a88f524a47cc9b79279667137899db807 ]

msgctl10 of ltp triggers the following lockup When CONFIG_KASAN is
enabled on large memory SMP systems, the pages initialization can take a
long time, if msgctl10 requests a huge block memory, and it will block
rcu scheduler, so release cpu actively.

After adding schedule() in free_msg, free_msg can not be called when
holding spinlock, so adding msg to a tmp list, and free it out of
spinlock

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
  rcu:     Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
  rcu:     (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
  msgctl10        R  running task    21608 32505   2794 0x00000082
  Call Trace:
   preempt_schedule_irq+0x4c/0xb0
   retint_kernel+0x1b/0x2d
  RIP: 0010:__is_insn_slot_addr+0xfb/0x250
  Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff <41> c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
  RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
  RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
  RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
  RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
  R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
  R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
   kernel_text_address+0xc1/0x100
   __kernel_text_address+0xe/0x30
   unwind_get_return_address+0x2f/0x50
   __save_stack_trace+0x92/0x100
   create_object+0x380/0x650
   __kmalloc+0x14c/0x2b0
   load_msg+0x38/0x1a0
   do_msgsnd+0x19e/0xcf0
   do_syscall_64+0x117/0x400
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
  rcu:     (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
  msgctl10        R  running task    21608 32170  32155 0x00000082
  Call Trace:
   preempt_schedule_irq+0x4c/0xb0
   retint_kernel+0x1b/0x2d
  RIP: 0010:lock_acquire+0x4d/0x340
  Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 <48> c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
  RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
  RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
  RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
  R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
   is_bpf_text_address+0x32/0xe0
   kernel_text_address+0xec/0x100
   __kernel_text_address+0xe/0x30
   unwind_get_return_address+0x2f/0x50
   __save_stack_trace+0x92/0x100
   save_stack+0x32/0xb0
   __kasan_slab_free+0x130/0x180
   kfree+0xfa/0x2d0
   free_msg+0x24/0x50
   do_msgrcv+0x508/0xe60
   do_syscall_64+0x117/0x400
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Davidlohr said:
 "So after releasing the lock, the msg rbtree/list is empty and new
  calls will not see those in the newly populated tmp_msg list, and
  therefore they cannot access the delayed msg freeing pointers, which
  is good. Also the fact that the node_cache is now freed before the
  actual messages seems to be harmless as this is wanted for
  msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
  info->lock the thing is freed anyway so it should not change things"

Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:17:12 +02:00
Greg Kroah-Hartman
9797dcb8c7 This is the 4.9.104 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlsOO78ACgkQONu9yGCS
 aT4OfBAAk3FbbltZaC/dvdJalMB1D95dL2qQcQkY45eVKpeVzmiSboEdFdqNt2oN
 PtOs6tD/ao7y25CWCvYGoX3jrHUZflDv9akTDPsFyyxkbSpEZ36jTc0W6SpwZ9ez
 IOtRTaUk6oXhkCr8KQRgjFQy0Ip+QRdpMucMl+LSz+HEoXgMUDqHFouznJhsjbd9
 g5uVHWMvbeEWVEGlT3ch2fuO6lRoNxdO9V/DhQxviiIdGhFF+T7nZuAbH/OgXFr9
 eX6UiWVGDLsjP6iSw16pVZT0Lf2RbG5oWlmGtoaj2fPf8eWeF2aJcCT3JtqTN7Bj
 Bd4//X+c+GgYMZM/411+lSgPLQBhiLqPYEzubabxEm17KCi2hmiac6FN6TEVv8ZB
 2I3N2APG94qExPZmYdHTgIM7SDP+vJP7hpJ63Ez2AAjgNHJVvNaLZS6ZznILtTxu
 6It0PfewHgt6NWqSWutc/+gNIbvjyztYgdxRGSClYd4HwezNA8e6nrnpNoS8Pfn1
 RAMg4ercWMm0hKt0ELr+pCOv8Izlo+tMWIKba/GYcX8+ab3XlTH1hWHw/gLV5ikO
 HAv3CRdwr8NU3mkuNwQHJNcEXvThmsOhldH5AU1+503r7Qsjb14wI2ASF4SDYtRt
 TdbolMWoVElSqgC+4AKo+d5BrC30vAmQ90bAgLF5P6/vNPSNMZc=
 =nLEw
 -----END PGP SIGNATURE-----

Merge 4.9.104 into android-4.9

Changes in 4.9.104
	MIPS: c-r4k: Fix data corruption related to cache coherence
	MIPS: ptrace: Expose FIR register through FP regset
	MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
	KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
	affs_lookup(): close a race with affs_remove_link()
	aio: fix io_destroy(2) vs. lookup_ioctx() race
	ALSA: timer: Fix pause event notification
	do d_instantiate/unlock_new_inode combinations safely
	mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
	mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
	libata: Blacklist some Sandisk SSDs for NCQ
	libata: blacklist Micron 500IT SSD with MU01 firmware
	xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
	drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
	IB/hfi1: Use after free race condition in send context error path
	Revert "ipc/shm: Fix shmat mmap nil-page protection"
	ipc/shm: fix shmat() nil address after round-down when remapping
	kasan: fix memory hotplug during boot
	kernel/sys.c: fix potential Spectre v1 issue
	kernel/signal.c: avoid undefined behaviour in kill_something_info
	KVM/VMX: Expose SSBD properly to guests
	KVM: s390: vsie: fix < 8k check for the itdba
	KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
	kvm: x86: IA32_ARCH_CAPABILITIES is always supported
	firewire-ohci: work around oversized DMA reads on JMicron controllers
	x86/tsc: Allow TSC calibration without PIT
	NFSv4: always set NFS_LOCK_LOST when a lock is lost.
	ALSA: hda - Use IS_REACHABLE() for dependency on input
	kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
	netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460
	tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
	PCI: Add function 1 DMA alias quirk for Marvell 9128
	Input: psmouse - fix Synaptics detection when protocol is disabled
	i40iw: Zero-out consumer key on allocate stag for FMR
	tools lib traceevent: Simplify pointer print logic and fix %pF
	perf callchain: Fix attr.sample_max_stack setting
	tools lib traceevent: Fix get_field_str() for dynamic strings
	perf record: Fix failed memory allocation for get_cpuid_str
	iommu/vt-d: Use domain instead of cache fetching
	dm thin: fix documentation relative to low water mark threshold
	net: stmmac: dwmac-meson8b: fix setting the RGMII TX clock on Meson8b
	net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock
	nfs: Do not convert nfs_idmap_cache_timeout to jiffies
	watchdog: sp5100_tco: Fix watchdog disable bit
	kconfig: Don't leak main menus during parsing
	kconfig: Fix automatic menu creation mem leak
	kconfig: Fix expr_free() E_NOT leak
	mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
	ipmi/powernv: Fix error return code in ipmi_powernv_probe()
	Btrfs: set plug for fsync
	btrfs: Fix out of bounds access in btrfs_search_slot
	Btrfs: fix scrub to repair raid6 corruption
	btrfs: fail mount when sb flag is not in BTRFS_SUPER_FLAG_SUPP
	HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
	fm10k: fix "failed to kill vid" message for VF
	device property: Define type of PROPERTY_ENRTY_*() macros
	jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
	powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
	powerpc/numa: Ensure nodes initialized for hotplug
	RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
	ntb_transport: Fix bug with max_mw_size parameter
	gianfar: prevent integer wrapping in the rx handler
	tcp_nv: fix potential integer overflow in tcpnv_acked
	kvm: Map PFN-type memory regions as writable (if possible)
	ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
	ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
	ocfs2: return error when we attempt to access a dirty bh in jbd2
	mm/mempolicy: fix the check of nodemask from user
	mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
	asm-generic: provide generic_pmdp_establish()
	sparc64: update pmdp_invalidate() to return old pmd value
	mm: thp: use down_read_trylock() in khugepaged to avoid long block
	mm: pin address_space before dereferencing it while isolating an LRU page
	mm/fadvise: discard partial page if endbyte is also EOF
	openvswitch: Remove padding from packet before L3+ conntrack processing
	IB/ipoib: Fix for potential no-carrier state
	drm/nouveau/pmu/fuc: don't use movw directly anymore
	netfilter: ipv6: nf_defrag: Kill frag queue on RFC2460 failure
	x86/power: Fix swsusp_arch_resume prototype
	firmware: dmi_scan: Fix handling of empty DMI strings
	ACPI: processor_perflib: Do not send _PPC change notification if not ready
	ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs
	bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
	MIPS: generic: Fix machine compatible matching
	MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
	xen-netfront: Fix race between device setup and open
	xen/grant-table: Use put_page instead of free_page
	RDS: IB: Fix null pointer issue
	arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
	proc: fix /proc/*/map_files lookup
	cifs: silence compiler warnings showing up with gcc-8.0.0
	bcache: properly set task state in bch_writeback_thread()
	bcache: fix for allocator and register thread race
	bcache: fix for data collapse after re-attaching an attached device
	bcache: return attach error when no cache set exist
	tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
	bpf: fix rlimit in reuseport net selftest
	vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
	locking/qspinlock: Ensure node->count is updated before initialising node
	irqchip/gic-v3: Ignore disabled ITS nodes
	cpumask: Make for_each_cpu_wrap() available on UP as well
	irqchip/gic-v3: Change pr_debug message to pr_devel
	ARC: Fix malformed ARC_EMUL_UNALIGNED default
	ptr_ring: prevent integer overflow when calculating size
	libata: Fix compile warning with ATA_DEBUG enabled
	selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
	selftests: memfd: add config fragment for fuse
	ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
	ARM: OMAP3: Fix prm wake interrupt for resume
	ARM: OMAP1: clock: Fix debugfs_create_*() usage
	ibmvnic: Free RX socket buffer in case of adapter error
	iwlwifi: mvm: fix security bug in PN checking
	iwlwifi: mvm: always init rs with 20mhz bandwidth rates
	NFC: llcp: Limit size of SDP URI
	rxrpc: Work around usercopy check
	mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
	mac80211: fix a possible leak of station stats
	mac80211: fix calling sleeping function in atomic context
	mac80211: Do not disconnect on invalid operating class
	md raid10: fix NULL deference in handle_write_completed()
	drm/exynos: g2d: use monotonic timestamps
	drm/exynos: fix comparison to bitshift when dealing with a mask
	locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
	md: raid5: avoid string overflow warning
	kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
	powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
	s390/cio: fix ccw_device_start_timeout API
	s390/cio: fix return code after missing interrupt
	s390/cio: clear timer when terminating driver I/O
	PKCS#7: fix direct verification of SignerInfo signature
	ARM: OMAP: Fix dmtimer init for omap1
	smsc75xx: fix smsc75xx_set_features()
	regulatory: add NUL to request alpha2
	integrity/security: fix digsig.c build error with header file
	locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
	x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
	mac80211: drop frames with unexpected DS bits from fast-rx to slow path
	arm64: fix unwind_frame() for filtered out fn for function graph tracing
	macvlan: fix use-after-free in macvlan_common_newlink()
	kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
	fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
	fs: dcache: Use READ_ONCE when accessing i_dir_seq
	md: fix a potential deadlock of raid5/raid10 reshape
	md/raid1: fix NULL pointer dereference
	batman-adv: fix packet checksum in receive path
	batman-adv: invalidate checksum on fragment reassembly
	netfilter: ebtables: convert BUG_ONs to WARN_ONs
	batman-adv: Ignore invalid batadv_iv_gw during netlink send
	batman-adv: Ignore invalid batadv_v_gw during netlink send
	batman-adv: Fix netlink dumping of BLA claims
	batman-adv: Fix netlink dumping of BLA backbones
	nvme-pci: Fix nvme queue cleanup if IRQ setup fails
	clocksource/drivers/fsl_ftm_timer: Fix error return checking
	ceph: fix dentry leak when failing to init debugfs
	ARM: orion5x: Revert commit 4904dbda41.
	qrtr: add MODULE_ALIAS macro to smd
	r8152: fix tx packets accounting
	virtio-gpu: fix ioctl and expose the fixed status to userspace.
	dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
	bcache: fix kcrashes with fio in RAID5 backend dev
	ip6_tunnel: fix IFLA_MTU ignored on NEWLINK
	sit: fix IFLA_MTU ignored on NEWLINK
	ARM: dts: NSP: Fix amount of RAM on BCM958625HR
	powerpc/boot: Fix random libfdt related build errors
	gianfar: Fix Rx byte accounting for ndev stats
	net/tcp/illinois: replace broken algorithm reference link
	nvmet: fix PSDT field check in command format
	xen/pirq: fix error path cleanup when binding MSIs
	drm/sun4i: Fix dclk_set_phase
	Btrfs: send, fix issuing write op when processing hole in no data mode
	selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
	KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
	iwlwifi: mvm: fix TX of CCMP 256
	watchdog: f71808e_wdt: Fix magic close handling
	watchdog: sbsa: use 32-bit read for WCV
	batman-adv: Fix multicast packet loss with a single WANT_ALL_IPV4/6 flag
	e1000e: Fix check_for_link return value with autoneg off
	e1000e: allocate ring descriptors with dma_zalloc_coherent
	ia64/err-inject: Use get_user_pages_fast()
	RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
	RDMA/qedr: Fix iWARP write and send with immediate
	IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
	IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
	IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
	fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
	fsl/fman: avoid sleeping in atomic context while adding an address
	net: qcom/emac: Use proper free methods during TX
	net: smsc911x: Fix unload crash when link is up
	IB/core: Fix possible crash to access NULL netdev
	xen: xenbus: use put_device() instead of kfree()
	arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
	dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
	netfilter: ebtables: fix erroneous reject of last rule
	bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
	workqueue: use put_device() instead of kfree()
	ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
	sunvnet: does not support GSO for sctp
	drm/imx: move arming of the vblank event to atomic_flush
	net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
	batman-adv: fix header size check in batadv_dbg_arp()
	batman-adv: Fix skbuff rcsum on packet reroute
	vti4: Don't count header length twice on tunnel setup
	vti4: Don't override MTU passed on link creation via IFLA_MTU
	perf/cgroup: Fix child event counting bug
	brcmfmac: Fix check for ISO3166 code
	kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
	RDMA/ucma: Correct option size check using optlen
	RDMA/qedr: fix QP's ack timeout configuration
	RDMA/qedr: Fix rc initialization on CNQ allocation failure
	mm/mempolicy.c: avoid use uninitialized preferred_node
	mm, thp: do not cause memcg oom for thp
	selftests: ftrace: Add probe event argument syntax testcase
	selftests: ftrace: Add a testcase for string type with kprobe_event
	selftests: ftrace: Add a testcase for probepoint
	batman-adv: fix multicast-via-unicast transmission with AP isolation
	batman-adv: fix packet loss for broadcasted DHCP packets to a server
	ARM: 8748/1: mm: Define vdso_start, vdso_end as array
	net: qmi_wwan: add BroadMobi BM806U 2020:2033
	perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
	llc: properly handle dev_queue_xmit() return value
	builddeb: Fix header package regarding dtc source links
	mm/kmemleak.c: wait for scan completion before disabling free
	net: Fix untag for vlan packets without ethernet header
	net: mvneta: fix enable of all initialized RXQs
	sh: fix debug trap failure to process signals before return to user
	nvme: don't send keep-alives to the discovery controller
	x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
	x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
	fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
	swap: divide-by-zero when zero length swap file on ssd
	sr: get/drop reference to device in revalidate and check_events
	Force log to disk before reading the AGF during a fstrim
	cpufreq: CPPC: Initialize shared perf capabilities of CPUs
	dp83640: Ensure against premature access to PHY registers after reset
	mm/ksm: fix interaction with THP
	mm: fix races between address_space dereference and free in page_evicatable
	Btrfs: bail out on error during replay_dir_deletes
	Btrfs: fix NULL pointer dereference in log_dir_items
	btrfs: Fix possible softlock on single core machines
	ocfs2/dlm: don't handle migrate lockres if already in shutdown
	sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
	KVM: VMX: raise internal error for exception during invalid protected mode state
	fscache: Fix hanging wait on page discarded by writeback
	sparc64: Make atomic_xchg() an inline function rather than a macro.
	net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
	btrfs: tests/qgroup: Fix wrong tree backref level
	Btrfs: fix copy_items() return value when logging an inode
	btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
	rxrpc: Fix Tx ring annotation after initial Tx failure
	rxrpc: Don't treat call aborts as conn aborts
	xen/acpi: off by one in read_acpi_id()
	drivers: macintosh: rack-meter: really fix bogus memsets
	ACPI: acpi_pad: Fix memory leak in power saving threads
	powerpc/mpic: Check if cpu_possible() in mpic_physmask()
	m68k: set dma and coherent masks for platform FEC ethernets
	parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
	hwmon: (nct6775) Fix writing pwmX_mode
	powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
	powerpc/perf: Fix kernel address leak via sampling registers
	tools/thermal: tmon: fix for segfault
	selftests: Print the test we're running to /dev/kmsg
	net/mlx5: Protect from command bit overflow
	ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
	cxgb4: Setup FW queues before registering netdev
	ima: Fallback to the builtin hash algorithm
	virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
	arm: dts: socfpga: fix GIC PPI warning
	cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
	zorro: Set up z->dev.dma_mask for the DMA API
	bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
	ACPICA: Events: add a return on failure from acpi_hw_register_read
	ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
	cxgb4: Fix queue free path of ULD drivers
	i2c: mv64xxx: Apply errata delay only in standard mode
	KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
	perf top: Fix top.call-graph config option reading
	perf stat: Fix core dump when flag T is used
	IB/core: Honor port_num while resolving GID for IB link layer
	regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'
	spi: bcm-qspi: fIX some error handling paths
	MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
	PCI: Restore config space on runtime resume despite being unbound
	ipmi_ssif: Fix kernel panic at msg_done_handler
	powerpc: Add missing prototype for arch_irq_work_raise()
	f2fs: fix to check extent cache in f2fs_drop_extent_tree
	perf/core: Fix perf_output_read_group()
	drm/panel: simple: Fix the bus format for the Ontat panel
	hwmon: (pmbus/max8688) Accept negative page register values
	hwmon: (pmbus/adm1275) Accept negative page register values
	perf/x86/intel: Properly save/restore the PMU state in the NMI handler
	cdrom: do not call check_disk_change() inside cdrom_open()
	perf/x86/intel: Fix large period handling on Broadwell CPUs
	perf/x86/intel: Fix event update for auto-reload
	arm64: dts: qcom: Fix SPI5 config on MSM8996
	soc: qcom: wcnss_ctrl: Fix increment in NV upload
	gfs2: Fix fallocate chunk size
	x86/devicetree: Initialize device tree before using it
	x86/devicetree: Fix device IRQ settings in DT
	ALSA: vmaster: Propagate slave error
	dmaengine: pl330: fix a race condition in case of threaded irqs
	dmaengine: rcar-dmac: Check the done lists in rcar_dmac_chan_get_residue()
	enic: enable rq before updating rq descriptors
	hwrng: stm32 - add reset during probe
	dmaengine: qcom: bam_dma: get num-channels and num-ees from dt
	net: stmmac: ensure that the device has released ownership before reading data
	net: stmmac: ensure that the MSS desc is the last desc to set the own bit
	cpufreq: Reorder cpufreq_online() error code path
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
	udf: Provide saner default for invalid uid / gid
	ARM: dts: bcm283x: Fix probing of bcm2835-i2s
	audit: return on memory error to avoid null pointer dereference
	rcu: Call touch_nmi_watchdog() while printing stall warnings
	pinctrl: sh-pfc: r8a7796: Fix MOD_SEL register pin assignment for SSI pins group
	MIPS: Octeon: Fix logging messages with spurious periods after newlines
	drm/rockchip: Respect page offset for PRIME mmap calls
	x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
	perf tests: Use arch__compare_symbol_names to compare symbols
	perf report: Fix memory corruption in --branch-history mode --branch-history
	selftests/net: fixes psock_fanout eBPF test case
	netlabel: If PF_INET6, check sk_buff ip header version
	regmap: Correct comparison in regmap_cached
	ARM: dts: imx7d: cl-som-imx7: fix pinctrl_enet
	ARM: dts: porter: Fix HDMI output routing
	regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
	pinctrl: msm: Use dynamic GPIO numbering
	kdb: make "mdr" command repeat
	Linux 4.9.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-30 13:19:56 +02:00
Davidlohr Bueso
9c798bc19e ipc/shm: fix shmat() nil address after round-down when remapping
commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.

shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for.  Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil.  As
of this patch, such cases will return -EINVAL.

Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:17 +02:00
Davidlohr Bueso
2ef44a3c1a Revert "ipc/shm: Fix shmat mmap nil-page protection"
commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.

Patch series "ipc/shm: shmat() fixes around nil-page".

These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.

The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED.  This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.

I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour.  Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).

[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

This patch (of 2):

Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED.  However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.

For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].

[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:17 +02:00
Greg Kroah-Hartman
320d53a9d0 This is the 4.9.96 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlre3foACgkQONu9yGCS
 aT4ygw/7BxK85/7my26TCcF3NlNfw8x9B1AuEHcjQdUFmSFip9fbolC1VVFQyzWV
 1Z6sK5LKrFU/234A4LVW+6oFVnUi8VvX5RnWNqRLj/xpPyv/L5PsG4A7VwQLZDCj
 yn6OrIkQ2czv3oio9R8bz1DFxrPsxoCdrMzV1xR/tbWQJscSRSomBSNstZ63lGQP
 FleS//hG21092lM0EDPQRfc9/xnwJFMKJoCwBrf6SUR7CK2ZdYp+w+HoeP5BSA49
 BCDdMU6wi63gTdfGFJa76VcKuFxRRic3F9VVJwOuYPCfCpgNz7GiUaGXirNCRSxD
 F0VveFMx9lOrxMCyDDEoj3lRaZ4FrhwFOF+Q0RK4/p8XH8Pr9emvt4beOeYNk3ev
 dFvCI/zA3zmDj8ppB+ZnKq9GTRSuNZ5sw4q7AyFu6fm0mSMV84ZRHgCrVbJkis3i
 VDnu6V+Nd9ZYP6rR6AegtQwFM3OReVVJG4ZpB17PZOebLdM+n4DsmqDCgvhFboVP
 WowtlR13GIn6ER7hiHfOlwv3CtTUspRrAZi73F8aEA7gdoYzGc8/9Q1fA738WiI4
 46utL18qyfjzwbwmI9dtFx1SLkrobSJ+X88PLquLYNUleoxbLx/2u4+v+gm0P4xo
 yQGPjIv+kms+rbwPR3ogMKqC2GrYYJs4uHBEPJboJWzYVoPAaEs=
 =yoJG
 -----END PGP SIGNATURE-----

Merge 4.9.96 into android-4.9

Changes in 4.9.96
	tty: make n_tty_read() always abort if hangup is in progress
	ubifs: Check ubifs_wbuf_sync() return code
	ubi: fastmap: Don't flush fastmap work on detach
	ubi: Fix error for write access
	ubi: Reject MLC NAND
	fs/reiserfs/journal.c: add missing resierfs_warning() arg
	resource: fix integer overflow at reallocation
	ipc/shm: fix use-after-free of shm file via remap_file_pages()
	mm, slab: reschedule cache_reap() on the same CPU
	usb: musb: gadget: misplaced out of bounds check
	usb: gadget: udc: core: update usb_ep_queue() documentation
	ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
	ARM: dts: exynos: Fix IOMMU support for GScaler devices on Exynos5250
	ARM: dts: at91: sama5d4: fix pinctrl compatible string
	spi: Fix scatterlist elements size in spi_map_buf
	xen-netfront: Fix hang on device removal
	regmap: Fix reversed bounds check in regmap_raw_write()
	ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
	ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
	USB: gadget: f_midi: fixing a possible double-free in f_midi
	USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
	usb: dwc3: pci: Properly cleanup resource
	smb3: Fix root directory when server returns inode number of zero
	HID: i2c-hid: fix size check and type usage
	powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
	powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
	powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
	powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
	HID: Fix hid_report_len usage
	HID: core: Fix size as type u32
	ASoC: ssm2602: Replace reg_default_raw with reg_default
	thunderbolt: Resume control channel after hibernation image is created
	irqchip/gic: Take lock when updating irq type
	random: use a tighter cap in credit_entropy_bits_safe()
	jbd2: if the journal is aborted then don't allow update of the log tail
	ext4: don't update checksum of new initialized bitmaps
	ext4: protect i_disksize update by i_data_sem in direct write path
	ext4: fail ext4_iget for root directory if unallocated
	RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
	RDMA/rxe: Fix an out-of-bounds read
	ALSA: pcm: Fix UAF at PCM release via PCM timer access
	IB/srp: Fix srp_abort()
	IB/srp: Fix completion vector assignment algorithm
	dmaengine: at_xdmac: fix rare residue corruption
	libnvdimm, namespace: use a safe lookup for dimm device name
	nfit, address-range-scrub: fix scrub in-progress reporting
	um: Compile with modern headers
	um: Use POSIX ucontext_t instead of struct ucontext
	iommu/vt-d: Fix a potential memory leak
	mmc: jz4740: Fix race condition in IRQ mask update
	clk: mvebu: armada-38x: add support for 1866MHz variants
	clk: mvebu: armada-38x: add support for missing clocks
	clk: fix false-positive Wmaybe-uninitialized warning
	clk: bcm2835: De-assert/assert PLL reset signal when appropriate
	pwm: rcar: Fix a condition to prevent mismatch value setting to duty
	thermal: imx: Fix race condition in imx_thermal_probe()
	dt-bindings: clock: mediatek: add binding for fixed-factor clock axisel_d4
	watchdog: f71808e_wdt: Fix WD_EN register read
	vfio/pci: Virtualize Maximum Read Request Size
	ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
	ALSA: pcm: Avoid potential races between OSS ioctls and read/write
	ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
	ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
	ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
	ext4: don't allow r/w mounts if metadata blocks overlap the superblock
	drm/amdgpu: Add an ATPX quirk for hybrid laptop
	drm/amdgpu: Fix always_valid bos multiple LRU insertions.
	drm/amdgpu: Fix PCIe lane width calculation
	drm/rockchip: Clear all interrupts before requesting the IRQ
	drm/radeon: Fix PCIe lane width calculation
	ALSA: line6: Use correct endpoint type for midi output
	ALSA: rawmidi: Fix missing input substream checks in compat ioctls
	ALSA: hda - New VIA controller suppor no-snoop path
	random: fix crng_ready() test
	random: crng_reseed() should lock the crng instance that it is modifying
	random: add new ioctl RNDRESEEDCRNG
	HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
	MIPS: uaccess: Add micromips clobbers to bzero invocation
	MIPS: memset.S: EVA & fault support for small_memset
	MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
	MIPS: memset.S: Fix clobber of v1 in last_fixup
	powerpc/eeh: Fix enabling bridge MMIO windows
	powerpc/lib: Fix off-by-one in alternate feature patching
	udf: Fix leak of UTF-16 surrogates into encoded strings
	jffs2_kill_sb(): deal with failed allocations
	hypfs_kill_super(): deal with failed allocations
	orangefs_kill_sb(): deal with allocation failures
	rpc_pipefs: fix double-dput()
	Don't leak MNT_INTERNAL away from internal mounts
	autofs: mount point create should honour passed in mode
	mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
	fanotify: fix logic of events on child
	writeback: safer lock nesting
	block/mq: fix potential deadlock during cpu hotplug
	Linux 4.9.96

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-04-24 11:26:46 +02:00
Eric Biggers
570ef10de6 ipc/shm: fix use-after-free of shm file via remap_file_pages()
commit 3f05317d9889ab75c7190dcd39491d2a97921984 upstream.

syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages().

Unfortunately it couldn't generate a reproducer, but I found a bug which
I think caused it.  When remap_file_pages() is passed a full System V
shared memory segment, the memory is first unmapped, then a new map is
created using the ->vm_file.  Between these steps, the shm ID can be
removed and reused for a new shm segment.  But, shm_mmap() only checks
whether the ID is currently valid before calling the underlying file's
->mmap(); it doesn't check whether it was reused.  Thus it can use the
wrong underlying file, one that was already freed.

Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches
the one associated with the "outer" file.

Taking the reference to the real shm file is needed to fully solve the
problem, since otherwise sfd->file could point to a freed file, which
then could be reallocated for the reused shm ID, causing the wrong shm
segment to be mapped (and without the required permission checks).

Commit 1ac0b6dec6 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because
it didn't consider the case where the shm ID is reused.

The following program usually reproduces this bug:

	#include <stdlib.h>
	#include <sys/shm.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main()
	{
		int is_parent = (fork() != 0);
		srand(getpid());
		for (;;) {
			int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
			if (is_parent) {
				void *addr = shmat(id, NULL, 0);
				usleep(rand() % 50);
				while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
			} else {
				usleep(rand() % 50);
				shmctl(id, IPC_RMID, NULL);
			}
		}
	}

It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed.  (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report.  But I think it's
possible with this bug; it would just take a more extraordinary race...)

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
	PGD 0 P4D 0
	Oops: 0000 [#1] SMP NOPTI
	CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
	RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
	RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
	[...]
	Call Trace:
	 file_accessed include/linux/fs.h:2063 [inline]
	 shmem_mmap+0x25/0x40 mm/shmem.c:2149
	 call_mmap include/linux/fs.h:1789 [inline]
	 shm_mmap+0x34/0x80 ipc/shm.c:465
	 call_mmap include/linux/fs.h:1789 [inline]
	 mmap_region+0x309/0x5b0 mm/mmap.c:1712
	 do_mmap+0x294/0x4a0 mm/mmap.c:1483
	 do_mmap_pgoff include/linux/mm.h:2235 [inline]
	 SYSC_remap_file_pages mm/mmap.c:2853 [inline]
	 SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
	 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
	 entry_SYSCALL_64_after_hwframe+0x42/0xb7

[ebiggers@google.com: add comment]
  Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
Fixes: c8d78c1823 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:34:09 +02:00
Greg Hackmann
05baf14727 This is the 4.9.93 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlrJ66oACgkQONu9yGCS
 aT5IxA//RiDXkiiGC8T/3YiDvaxfgo2bIRVmN9FzrWE3FiWF6Zkeyw9gEROSI+mq
 V2puTIUaK+BY6M5105HNIn3V903HfYvAeVowAPIwfNqEYFaXNgZANlcdsOGPhetu
 8P41+RXd09cwEBzIiZnn1S4A5VdWr5pXnkUvvhnZargSanyEKmF9aqyJo7huPkBr
 eABky4/cQqUx5GwVSus/N5pkemn+H8wHRhbnbZZY+1qOVQwmrh7Uxl1usiyR+LIb
 CnEQQyJnUeyHHgrMTleGOQdRZ67HbiYlFYKVc0RHp63VHaLIuYx+E6oyPVLbG1/P
 HpXW6IixJf34W3cSMBILroPX6yAsPq1w892pbtj6KM7rpqBApwxECUi3pfUsbtdE
 dzs9/BarDKNKpk//AWXqIIiGl+qAl4zy7vMCERGA7Fhdm0wYK7vbtJSgI1pzrHTD
 V817QTogJ2i46PfcrB7DCr7W4p+vGhyV25+rwHBuNBimpEyXUAMH/sgdLBSlLWxV
 DqbMmsCbLWdwBZnO6v2rNZ8DeHlT99hpOPVsSd5LliiS4o87Ii/xMkyD0F2Xf/rx
 Gwq9B33FHLINWqFs25WZukYYvf+oe5X4tK+kNpsH1dTZrcVf+T3EtpeeVPyBrB9R
 qkFHg5SvJYsRKSR6Zt537wToOg9/WiH/VHQv3+eeEZYo0Sr8vG4=
 =J/va
 -----END PGP SIGNATURE-----

Merge tag 'v4.9.93' into android-4.9

This is the 4.9.93 stable release

Change-Id: I4293d83f45982c6fd479bddbf9b0f811248ddc30
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2018-04-09 11:39:17 -07:00
Mike Kravetz
86c8c892d3 ipc/shm.c: add split function to shm_vm_ops
commit 3d942ee079b917b24e2a0c5f18d35ac8ec9fee48 upstream.

If System V shmget/shmat operations are used to create a hugetlbfs
backed mapping, it is possible to munmap part of the mapping and split
the underlying vma such that it is not huge page aligned.  This will
untimately result in the following BUG:

  kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
  CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E 4.15.0-10-generic #11-Ubuntu
  NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
  REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E (4.15.0-10-generic)
  MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER: 20040000
  CFAR: c00000000036ee44 SOFTE: 1
  NIP __unmap_hugepage_range+0xa4/0x760
  LR __unmap_hugepage_range_final+0x28/0x50
  Call Trace:
    0x7115e4e00000 (unreliable)
    __unmap_hugepage_range_final+0x28/0x50
    unmap_single_vma+0x11c/0x190
    unmap_vmas+0x94/0x140
    exit_mmap+0x9c/0x1d0
    mmput+0xa8/0x1d0
    do_exit+0x360/0xc80
    do_group_exit+0x60/0x100
    SyS_exit_group+0x24/0x30
    system_call+0x58/0x6c
  ---[ end trace ee88f958a1c62605 ]---

This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
introduce ->split() to vm_operations_struct").  A split function was
added to vm_operations_struct to determine if a mapping can be split.
This was mostly for device-dax and hugetlbfs mappings which have
specific alignment constraints.

Mappings initiated via shmget/shmat have their original vm_ops
overwritten with shm_vm_ops.  shm_vm_ops functions will call back to the
original vm_ops if needed.  Add such a split function to shm_vm_ops.

Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com
Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08 12:12:42 +02:00
Greg Kroah-Hartman
71f1469722 This is the 4.9.79 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpxrs0ACgkQONu9yGCS
 aT7Yxw/+NmM+Yh70QOpW02RCFHCB+F9tnuQXNlLfEoDqlujMS/UNuVMx39gQXDaU
 7T/JOmnVtp9WQL9RLgAegSc3ayIQELzvtKjDLSo/hzxYsOmr0WlN2CVTGT7hn9JH
 IQdf8cR2r4FZ/XcxQLpSsRabwhqfeoND1TTm5LUNB1Ii05hUU6/s0k1rQguabuo5
 vi0BzSh7v/URxlLyL0m4ZVqovWOASS5/qSv7wazd4i/bSqH3g7VXLNu93iyOB8ih
 XXpeTjtfAwJ5kUXBWZPNazUzpQ7b56sQPtsvN6CrvTv8jKJ+FH+7S4d50Vgbu51X
 YBC36yypYPXunMXB9iiLYkyb8jraKr12BRLXQyl3TlNANoYjBiT/a2XmHDMA1VbL
 +ydbswbmcAvZ1fuAekVY+HIogEroWzN7FbhdUgV12nm7/4WfxpBTZW+M8Es/Stuh
 2ACT9TWopbhwRFUhFT5kyDTTnK++NsshGzUXbR9qPQzhdaqe76RPfJ6uHV69MXxP
 gE9o3NQ3fUieJO5nQj54atErX+sJ4987DnGoWrg+Ye9Svsq1oVw0K1e44VLBp08v
 iZk2lvNjUWnkDGQOhsPEYCLq6KPjXkaqV4OZVS6tGxGEZ4QQJjbnYk+kPeKjrKIA
 iP3nfaLJ4HQc2kvwEI41HEJGWyGUlhdrnDqfpxpWgGXOStGJrq0=
 =RJ2h
 -----END PGP SIGNATURE-----

Merge 4.9.79 into android-4.9

Changes in 4.9.79
	x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
	orangefs: use list_for_each_entry_safe in purge_waiting_ops
	orangefs: initialize op on loop restart in orangefs_devreq_read
	usbip: prevent vhci_hcd driver from leaking a socket pointer address
	usbip: Fix implicit fallthrough warning
	usbip: Fix potential format overflow in userspace tools
	can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
	can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
	KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
	Prevent timer value 0 for MWAITX
	drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
	drivers: base: cacheinfo: fix boot error message when acpi is enabled
	mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
	hwpoison, memcg: forcibly uncharge LRU pages
	cma: fix calculation of aligned offset
	mm, page_alloc: fix potential false positive in __zone_watermark_ok
	ipc: msg, make msgrcv work with LONG_MIN
	ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
	ACPICA: Namespace: fix operand cache leak
	netfilter: nfnetlink_cthelper: Add missing permission checks
	netfilter: xt_osf: Add missing permission checks
	reiserfs: fix race in prealloc discard
	reiserfs: don't preallocate blocks for extended attributes
	fs/fcntl: f_setown, avoid undefined behaviour
	scsi: libiscsi: fix shifting of DID_REQUEUE host byte
	Revert "module: Add retpoline tag to VERMAGIC"
	mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
	Input: trackpoint - force 3 buttons if 0 button is reported
	orangefs: fix deadlock; do not write i_size in read_iter
	um: link vmlinux with -no-pie
	vsyscall: Fix permissions for emulate mode with KAISER/PTI
	eventpoll.h: add missing epoll event masks
	dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
	ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
	ipv6: fix udpv6 sendmsg crash caused by too small MTU
	ipv6: ip6_make_skb() needs to clear cork.base.dst
	lan78xx: Fix failure in USB Full Speed
	net: igmp: fix source address check for IGMPv3 reports
	net: qdisc_pkt_len_init() should be more robust
	net: tcp: close sock if net namespace is exiting
	pppoe: take ->needed_headroom of lower device into account on xmit
	r8169: fix memory corruption on retrieval of hardware statistics.
	sctp: do not allow the v4 socket to bind a v4mapped v6 address
	sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
	tipc: fix a memory leak in tipc_nl_node_get_link()
	vmxnet3: repair memory leak
	net: Allow neigh contructor functions ability to modify the primary_key
	ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
	ppp: unlock all_ppp_mutex before registering device
	be2net: restore properly promisc mode after queues reconfiguration
	ip6_gre: init dev->mtu and dev->hard_header_len correctly
	gso: validate gso_type in GSO handlers
	mlxsw: spectrum_router: Don't log an error on missing neighbor
	tun: fix a memory leak for tfile->tx_array
	flow_dissector: properly cap thoff field
	perf/x86/amd/power: Do not load AMD power module on !AMD platforms
	x86/microcode/intel: Extend BDW late-loading further with LLC size check
	hrtimer: Reset hrtimer cpu base proper on CPU hotplug
	x86: bpf_jit: small optimization in emit_bpf_tail_call()
	bpf: fix bpf_tail_call() x64 JIT
	bpf: introduce BPF_JIT_ALWAYS_ON config
	bpf: arsh is not supported in 32 bit alu thus reject it
	bpf: avoid false sharing of map refcount with max_entries
	bpf: fix divides by zero
	bpf: fix 32-bit divide by zero
	bpf: reject stores into ctx via st and xadd
	nfsd: auth: Fix gid sorting when rootsquash enabled
	Linux 4.9.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-31 14:13:00 +01:00
Jiri Slaby
542cde0e3c ipc: msg, make msgrcv work with LONG_MIN
commit 999898355e08ae3b92dfd0a08db706e0c6703d30 upstream.

When LONG_MIN is passed to msgrcv, one would expect to recieve any
message.  But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is
undefined.  In particular, with my gcc -LONG_MIN produces -LONG_MIN
again.

So handle this case properly by assigning LONG_MAX to *msgtyp if
LONG_MIN was specified as msgtyp to msgrcv.

This code:
  long msg[] = { 100, 200 };
  int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
  msgsnd(m, &msg, sizeof(msg), 0);
  msgrcv(m, &msg, sizeof(msg), LONG_MIN, 0);

produces currently nothing:

  msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 65538
  msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
  msgrcv(65538, ...

Except a UBSAN warning:

  UBSAN: Undefined behaviour in ipc/msg.c:745:13
  negation of -9223372036854775808 cannot be represented in type 'long int':

With the patch, I see what I expect:

  msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 0
  msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
  msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16

Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 12:55:51 +01:00
Greg Kroah-Hartman
3f353c3ed4 This is the 4.9.38 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllp69gACgkQONu9yGCS
 aT6WLQ/8CVbMTorw1Xb4x9vsIP7v/ozsPYDN41EbpsH19/3xxEyhR0LsN056GxA4
 HPweVL7ridF6/IMzGg+RoBJsjsXR/+bxkhgS/7dcT37BKqAlUOSNV+AdVz7z+NgM
 UOm+s6OjRSRz2A01jqd/k0emGWx0+3Cd2C/637RBgv+vo4D5lDTLhNTFbEM/45oL
 QG0fz/Ba0C9dvrPMwMvMwLzS/Bi+h9PeUazjVgAQHCXUkeSvJYJTk8TmYVUGIJIc
 E4CNaKl3qUEMph/7YujPpdXYaWR/SYnoBv0UYZovksxdmU0DegF4V3BVO8VKt0ma
 K0gf8k3YRAXiEOW9D7yfffnZxbqA0Cwe4hau1gY+IVQYxwP6YaygBMHLbssTKbY1
 WJ36WzQMmqiZ7YD7Yf7cHjDG+d1bl9UYACr0BE8rlWpHa1bJme6H8BdLVqnP7OM8
 KpBgmRAZbQ5TBX/fCrd+PzgfS66393qMPiXzPrpTMyo4HfBAOs8FVCeyxzX+3Qwm
 Qd35CH0wIcHSJ48d+3FCuxYQ654IoLIZieStnFYyXKGhFe5Cgd+6F91oJiRAvY59
 BssBfl2RqnZB9l+tBMliYbCHTOzZKZv70Z9/+vw9ogNqiv7ZlTW7maobLzk7AQOf
 HiCC6GpnxM2FzDSjD/vfQZM9LTSUhXw5owebA+vwCc3KZgpNrn0=
 =zHg+
 -----END PGP SIGNATURE-----

Merge 4.9.38 into android-4.9

Changes in 4.9.38
	mqueue: fix a use-after-free in sys_mq_notify()
	Add "shutdown" to "struct class".
	tpm: Issue a TPM2_Shutdown for TPM2 devices.
	tools include: Add a __fallthrough statement
	tools string: Use __fallthrough in perf_atoll()
	tools strfilter: Use __fallthrough
	perf top: Use __fallthrough
	perf thread_map: Correctly size buffer used with dirent->dt_name
	perf intel-pt: Use __fallthrough
	perf tests: Avoid possible truncation with dirent->d_name + snprintf
	perf bench numa: Avoid possible truncation when using snprintf()
	perf header: Fix handling of PERF_EVENT_UPDATE__SCALE
	perf scripting perl: Fix compile error with some perl5 versions
	perf probe: Fix to probe on gcc generated symbols for offline kernel
	perf probe: Add error checks to offline probe post-processing
	md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
	md: fix super_offset endianness in super_1_rdev_size_change
	locking/rwsem-spinlock: Fix EINTR branch in __down_write_common()
	staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
	staging: comedi: fix clean-up of comedi_class in comedi_init()
	crypto: caam - fix gfp allocation flags (part I)
	crypto: rsa-pkcs1pad - use constant time memory comparison for MACs
	ext4: check return value of kstrtoull correctly in reserved_clusters_store
	x86/mm/pat: Don't report PAT on CPUs that don't support it
	saa7134: fix warm Medion 7134 EEPROM read
	Linux 4.9.38

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-15 13:31:27 +02:00
Cong Wang
e6952841ad mqueue: fix a use-after-free in sys_mq_notify()
commit f991af3daabaecff34684fd51fac80319d1baad1 upstream.

The retry logic for netlink_attachskb() inside sys_mq_notify()
is nasty and vulnerable:

1) The sock refcnt is already released when retry is needed
2) The fd is controllable by user-space because we already
   release the file refcnt

so we when retry but the fd has been just closed by user-space
during this small window, we end up calling netlink_detachskb()
on the error path which releases the sock again, later when
the user-space closes this socket a use-after-free could be
triggered.

Setting 'sock' to NULL here should be sufficient to fix it.

Reported-by: GeneBlue <geneblue.mail@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 12:16:10 +02:00
Davidlohr Bueso
78b2dcc0df ipc/shm: Fix shmat mmap nil-page protection
am: 270e84a1e6

Change-Id: Ibcc5e2d46f533a60c0d1cabb24c42e2c72e5a24a
2017-03-12 08:18:26 +00:00
Davidlohr Bueso
270e84a1e6 ipc/shm: Fix shmat mmap nil-page protection
commit 95e91b831f87ac8e1f8ed50c14d709089b4e01b8 upstream.

The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0.  For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address.  So by the time we do security_mmap_addr(0) things
get funky for shmat().

The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page.  There are two possible fixes
to this.  The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(...  |MAP_FIXED).  While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags.  This makes the
behavior of shmat() identical to the mmap() case.  The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.

Passes shm related ltp tests.

Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:44 +01:00
Daniel Rosenberg
bbcd0ffae5 ANDROID: vfs: Add permission2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.

Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-01-31 10:47:22 -08:00
Aristeu Rozanski
8c8d4d4520 ipc: account for kmem usage on mqueue and msg
When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.

The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster.  This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e.  accounted
for.

Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Reported-by: Stefan Schimanski <sttts@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Stefan Schimanski <sttts@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Nikolay Borisov
2a1613a586 ipc/sem.c: add cond_resched in exit_sme
In CONFIG_PREEMPT=n kernel a softlockup was observed while the for loop in
exit_sem.  Apparently it's possible for the loop to take quite a long time
and it doesn't have a scheduling point in it.  Since the codes is
executing under an rcu read section this may also cause rcu stalls, which
in turn block synchronize_rcu operations, which more or less de-stabilises
the whole system.

Fix this by introducing a cond_resched() at the beginning of the loop.

So this patch fixes the following:

  NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
  CPU: 10 PID: 18119 Comm: httpd Tainted: G           O    4.4.20-clouder2 #6
  Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
  task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
  RIP: 0010:[<ffffffff81614bc7>]  [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30
  RSP: 0018:ffff881c95553e40  EFLAGS: 00000246
  RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
  RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
  RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
  R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
  R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
  FS:  0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
  Call Trace:
    ? exit_sem+0x7c/0x280
    do_exit+0x338/0xb40
    do_group_exit+0x43/0xd0
    SyS_exit_group+0x14/0x20
    entry_SYSCALL_64_fastpath+0x16/0x6e

Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Cc: Herton R. Krzesinski <herton@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Davidlohr Bueso
ed27f9122c ipc/msg: avoid waking sender upon full queue
Blocked tasks queued in q_senders waiting for their message to fit in the
queue are blindly awoken every time we think there's a remote chance this
might happen.  This could cause numerous (and expensive -- thundering
herd-ish) bogus wakeups if the queue is still really full.  Adding to the
scheduling cost/overhead, there's also the fact that we need to take the
ipc object lock and requeue ourselves in the q_senders list.

By keeping track of the blocked sender's message size, we can know
previously if the wakeup ought to occur or not.  Otherwise, to maintain
the current wakeup order we just move it to the tail.  This is exactly
what occurs right now if the sender needs to go back to sleep.

The case of EIDRM is left completely untouched, as we need to wakeup all
the tasks, and shouldn't be playing games in the first place.

This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
switches (avg out of ten runs).  Although these tests are really about
functionality (as opposed to performance), is does show the direct
benefits of the optimization.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Davidlohr Bueso
d0d6a2a95e ipc/msg: make ss_wakeup() kill arg boolean
... 'tis annoying.

Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Davidlohr Bueso
e3658538bf ipc/msg: batch queue sender wakeups
Currently the use of wake_qs in sysv msg queues are only for the receiver
tasks that are blocked on the queue.  But blocked sender tasks (due to
queue size constraints) still are awoken with the ipc object lock held,
which can be a problem particularly for small sized queues and far from
gracious for -rt (just like it was for the receiver side).

The paths that actually wakeup a sender are obviously related to when we
are either getting rid of the queue or after (some) space is freed-up
after a receiver takes the msg (msgrcv).  Furthermore, with the exception
of msgrcv, we can always piggy-back on expunge_all that has its own tasks
lined-up for waking.  Finally, upon unlinking the message, it should be no
problem delaying the wakeups a bit until after we've released the lock.

Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Sebastian Andrzej Siewior
ee51636ca5 ipc/msg: implement lockless pipelined wakeups
This patch moves the wakeup_process() invocation so it is not done under
the ipc global lock by making use of a lockless wake_q.  With this change,
the waiter is woken up once the message has been assigned and it does not
need to loop on SMP if the message points to NULL.  In the signal case we
still need to check the pointer under the lock to verify the state.

This change should also avoid the introduction of preempt_disable() in -RT
which avoids a busy-loop which pools for the NULL -> !NULL change if the
waiter has a higher priority compared to the waker.

By making use of wake_qs, the logic of sysv msg queues is greatly
simplified (and very well suited as we can batch lockless wakeups),
particularly around the lockless receive algorithm.

This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800
Radeon R7, 12 Compute Cores 4C+8G":

test             |   before   |   after    | diff
-----------------|------------|------------|----------
pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
pmsg-shared 2 60 | 22,884,224 | 24,278,200 | +  ~6.09 %

Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Manfred Spraul
5864a2fd30 ipc/sem.c: fix complex_count vs. simple op race
Commit 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
 - a non-simple operations is ongoing (sma->sem_perm.lock held)
 - a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions.  See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma->complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Fixes: 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
Reported-by: <felixh@informatik.uni-bremen.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <1vier1@web.de>
Cc: <stable@vger.kernel.org>	[3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Linus Torvalds
101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Deepa Dinamani
078cd8279e fs: Replace CURRENT_TIME with current_time() for inode timestamps
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:21 -04:00
Eric W. Biederman
7872559664 Merge branch 'nsfs-ioctls' into HEAD
From: Andrey Vagin <avagin@openvz.org>

Each namespace has an owning user namespace and now there is not way
to discover these relationships.

Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships too.

Why we may want to know relationships between namespaces?

One use would be visualization, in order to understand the running
system.  Another would be to answer the question: what capability does
process X have to perform operations on a resource governed by namespace
Y?

One more use-case (which usually called abnormal) is checkpoint/restart.
In CRIU we are going to dump and restore nested namespaces.

There [1] was a discussion about which interface to choose to determing
relationships between namespaces.

Eric suggested to add two ioctl-s [2]:
> Grumble, Grumble.  I think this may actually a case for creating ioctls
> for these two cases.  Now that random nsfs file descriptors are bind
> mountable the original reason for using proc files is not as pressing.
>
> One ioctl for the user namespace that owns a file descriptor.
> One ioctl for the parent namespace of a namespace file descriptor.

Here is an implementaions of these ioctl-s.

$ man man7/namespaces.7
...
Since  Linux  4.X,  the  following  ioctl(2)  calls are supported for
namespace file descriptors.  The correct syntax is:

      fd = ioctl(ns_fd, ioctl_type);

where ioctl_type is one of the following:

NS_GET_USERNS
      Returns a file descriptor that refers to an owning user names‐
      pace.

NS_GET_PARENT
      Returns  a  file descriptor that refers to a parent namespace.
      This ioctl(2) can be used for pid  and  user  namespaces.  For
      user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
      meaning.

In addition to generic ioctl(2) errors, the following  specific  ones
can occur:

EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.

EPERM  The  requested  namespace  is outside of the current namespace
      scope.

[1] https://lkml.org/lkml/2016/7/6/158
[2] https://lkml.org/lkml/2016/7/9/101

Changes for v2:
* don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
  outside of the init namespace, so we can return EPERM in this case too.
  > The fewer special cases the easier the code is to get
  > correct, and the easier it is to read. // Eric

Changes for v3:
* rename ns->get_owner() to ns->owner(). get_* usually means that it
  grabs a reference.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: "W. Trevor King" <wking@tremily.us>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
2016-09-22 20:00:36 -05:00
Andrey Vagin
bcac25a58b kernel: add a helper to get an owning user namespace for a namespace
Return -EPERM if an owning user namespace is outside of a process
current user namespace.

v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
    This special cases was removed from this version. There is nothing
    outside of init_user_ns, so we can return EPERM.
v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.

Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22 19:59:39 -05:00
Eric W. Biederman
df75e7748b userns: When the per user per user namespace limit is reached return ENOSPC
The current error codes returned when a the per user per user
namespace limit are hit (EINVAL, EUSERS, and ENFILE) are wrong.  I
asked for advice on linux-api and it we made clear that those were
the wrong error code, but a correct effor code was not suggested.

The best general error code I have found for hitting a resource limit
is ENOSPC.  It is not perfect but as it is unambiguous it will serve
until someone comes up with a better error code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-09-22 13:25:56 -05:00
Eric W. Biederman
aba3566163 ipcns: Add a limit on the number of ipc namespaces
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-08-08 14:42:03 -05:00
Alexey Dobriyan
3bd080e4d8 ipc: delete "nr_ipc_ns"
Write-only variable.

Link: http://lkml.kernel.org/r/20160708214356.GA6785@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:44 -04:00
Fabian Frederick
9b24fef9f0 sysv, ipc: fix security-layer leaking
Commit 53dad6d3a8 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.

Running LTP msgsnd06 with kmemleak gives the following:

  cat /sys/kernel/debug/kmemleak

  unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
      1b 00 00 00 01 00 00 00                          ........
    backtrace:
      kmemleak_alloc+0x23/0x40
      kmem_cache_alloc_trace+0xe1/0x180
      selinux_msg_queue_alloc_security+0x3f/0xd0
      security_msg_queue_alloc+0x2e/0x40
      newque+0x4e/0x150
      ipcget+0x159/0x1b0
      SyS_msgget+0x39/0x40
      entry_SYSCALL_64_fastpath+0x13/0x8f

Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()

Fixes: 53dad6d3a8 ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 17:31:41 -04:00
Linus Torvalds
a867d7349e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns vfs updates from Eric Biederman:
 "This tree contains some very long awaited work on generalizing the
  user namespace support for mounting filesystems to include filesystems
  with a backing store.  The real world target is fuse but the goal is
  to update the vfs to allow any filesystem to be supported.  This
  patchset is based on a lot of code review and testing to approach that
  goal.

  While looking at what is needed to support the fuse filesystem it
  became clear that there were things like xattrs for security modules
  that needed special treatment.  That the resolution of those concerns
  would not be fuse specific.  That sorting out these general issues
  made most sense at the generic level, where the right people could be
  drawn into the conversation, and the issues could be solved for
  everyone.

  At a high level what this patchset does a couple of simple things:

   - Add a user namespace owner (s_user_ns) to struct super_block.

   - Teach the vfs to handle filesystem uids and gids not mapping into
     to kuids and kgids and being reported as INVALID_UID and
     INVALID_GID in vfs data structures.

  By assigning a user namespace owner filesystems that are mounted with
  only user namespace privilege can be detected.  This allows security
  modules and the like to know which mounts may not be trusted.  This
  also allows the set of uids and gids that are communicated to the
  filesystem to be capped at the set of kuids and kgids that are in the
  owning user namespace of the filesystem.

  One of the crazier corner casees this handles is the case of inodes
  whose i_uid or i_gid are not mapped into the vfs.  Most of the code
  simply doesn't care but it is easy to confuse the inode writeback path
  so no operation that could cause an inode write-back is permitted for
  such inodes (aka only reads are allowed).

  This set of changes starts out by cleaning up the code paths involved
  in user namespace permirted mounts.  Then when things are clean enough
  adds code that cleanly sets s_user_ns.  Then additional restrictions
  are added that are possible now that the filesystem superblock
  contains owner information.

  These changes should not affect anyone in practice, but there are some
  parts of these restrictions that are changes in behavior.

   - Andy's restriction on suid executables that does not honor the
     suid bit when the path is from another mount namespace (think
     /proc/[pid]/fd/) or when the filesystem was mounted by a less
     privileged user.

   - The replacement of the user namespace implicit setting of MNT_NODEV
     with implicitly setting SB_I_NODEV on the filesystem superblock
     instead.

     Using SB_I_NODEV is a stronger form that happens to make this state
     user invisible.  The user visibility can be managed but it caused
     problems when it was introduced from applications reasonably
     expecting mount flags to be what they were set to.

  There is a little bit of work remaining before it is safe to support
  mounting filesystems with backing store in user namespaces, beyond
  what is in this set of changes.

   - Verifying the mounter has permission to read/write the block device
     during mount.

   - Teaching the integrity modules IMA and EVM to handle filesystems
     mounted with only user namespace root and to reduce trust in their
     security xattrs accordingly.

   - Capturing the mounters credentials and using that for permission
     checks in d_automount and the like.  (Given that overlayfs already
     does this, and we need the work in d_automount it make sense to
     generalize this case).

  Furthermore there are a few changes that are on the wishlist:

   - Get all filesystems supporting posix acls using the generic posix
     acls so that posix_acl_fix_xattr_from_user and
     posix_acl_fix_xattr_to_user may be removed.  [Maintainability]

   - Reducing the permission checks in places such as remount to allow
     the superblock owner to perform them.

   - Allowing the superblock owner to chown files with unmapped uids and
     gids to something that is mapped so the files may be treated
     normally.

  I am not considering even obvious relaxations of permission checks
  until it is clear there are no more corner cases that need to be
  locked down and handled generically.

  Many thanks to Seth Forshee who kept this code alive, and putting up
  with me rewriting substantial portions of what he did to handle more
  corner cases, and for his diligent testing and reviewing of my
  changes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
  fs: Call d_automount with the filesystems creds
  fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
  evm: Translate user/group ids relative to s_user_ns when computing HMAC
  dquot: For now explicitly don't support filesystems outside of init_user_ns
  quota: Handle quota data stored in s_user_ns in quota_setxquota
  quota: Ensure qids map to the filesystem
  vfs: Don't create inodes with a uid or gid unknown to the vfs
  vfs: Don't modify inodes with a uid or gid unknown to the vfs
  cred: Reject inodes with invalid ids in set_create_file_as()
  fs: Check for invalid i_uid in may_follow_link()
  vfs: Verify acls are valid within superblock's s_user_ns.
  userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
  fs: Refuse uid/gid changes which don't map into s_user_ns
  selinux: Add support for unprivileged mounts from user namespaces
  Smack: Handle labels consistently in untrusted mounts
  Smack: Add support for unprivileged mounts from user namespaces
  fs: Treat foreign mounts as nosuid
  fs: Limit file caps to the user namespace of the super block
  userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
  userns: Remove implicit MNT_NODEV fragility.
  ...
2016-07-29 15:54:19 -07:00
Kirill A. Shutemov
4595ef88d1 shmem: make shmem_inode_info::lock irq-safe
We are going to need to call shmem_charge() under tree_lock to get
accoutning right on collapse of small tmpfs pages into a huge one.

The problem is that tree_lock is irq-safe and lockdep is not happy, that
we take irq-unsafe lock under irq-safe[1].

Let's convert the lock to irq-safe.

[1] https://gist.github.com/kiryl/80c0149e03ed35dfaf26628b8e03cdbc

Link: http://lkml.kernel.org/r/1466021202-61880-34-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Hugh Dickins
c01d5b3007 shmem: get_unmapped_area align huge page
Provide a shmem_get_unmapped_area method in file_operations, called at
mmap time to decide the mapping address.  It could be conditional on
CONFIG_TRANSPARENT_HUGEPAGE, but save #ifdefs in other places by making
it unconditional.

shmem_get_unmapped_area() first calls the usual mm->get_unmapped_area
(which we treat as a black box, highly dependent on architecture and
config and executable layout).  Lots of conditions, and in most cases it
just goes with the address that chose; but when our huge stars are
rightly aligned, yet that did not provide a suitable address, go back to
ask for a larger arena, within which to align the mapping suitably.

There have to be some direct calls to shmem_get_unmapped_area(), not via
the file_operations: because of the way shmem_zero_setup() is called to
create a shmem object late in the mmap sequence, when MAP_SHARED is
requested with MAP_ANONYMOUS or /dev/zero.  Though this only matters
when /proc/sys/vm/shmem_huge has been set.

Link: http://lkml.kernel.org/r/1466021202-61880-29-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Eric W. Biederman
a2982cc922 vfs: Generalize filesystem nodev handling.
Introduce a function may_open_dev that tests MNT_NODEV and a new
superblock flab SB_I_NODEV.  Use this new function in all of the
places where MNT_NODEV was previously tested.

Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those
filesystems should never support device nodes, and a simple superblock
flags makes that very hard to get wrong.  With SB_I_NODEV set if any
device nodes somehow manage to show up on on a filesystem those
device nodes will be unopenable.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:57 -05:00
Eric W. Biederman
3ee690143c ipc/mqueue: The mqueue filesystem should never contain executables
Set SB_I_NOEXEC on mqueuefs to ensure small implementation mistakes
do not result in executable on mqueuefs by accident.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:56 -05:00
Eric W. Biederman
d91ee87d8d vfs: Pass data, ns, and ns->userns to mount_ns
Today what is normally called data (the mount options) is not passed
to fill_super through mount_ns.

Pass the mount options and the namespace separately to mount_ns so
that filesystems such as proc that have mount options, can use
mount_ns.

Pass the user namespace to mount_ns so that the standard permission
check that verifies the mounter has permissions over the namespace can
be performed in mount_ns instead of in each filesystems .mount method.
Thus removing the duplication between mqueuefs and proc in terms of
permission checks.  The extra permission check does not currently
affect the rpc_pipefs filesystem and the nfsd filesystem as those
filesystems do not currently allow unprivileged mounts.  Without
unpvileged mounts it is guaranteed that the caller has already passed
capable(CAP_SYS_ADMIN) which guarantees extra permission check will
pass.

Update rpc_pipefs and the nfsd filesystem to ensure that the network
namespace reference is always taken in fill_super and always put in kill_sb
so that the logic is simpler and so that errors originating inside of
fill_super do not cause a network namespace leak.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:53 -05:00
Eric W. Biederman
b236017acf ipc: Initialize ipc_namespace->user_ns early.
Allow the ipc namespace initialization code to depend on ns->user_ns
being set during initialization.

In particular this allows mq_init_ns to use ns->user_ns for permission
checks and initializating s_user_ns while the the mq filesystem is
being mounted.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Suggested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-06-23 15:41:53 -05:00
Peter Zijlstra
be3e784498 locking/spinlock: Update spin_unlock_wait() users
With the modified semantics of spin_unlock_wait() a number of
explicit barriers can be removed. Also update the comment for the
do_exit() usecase, as that was somewhat stale/obscure.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14 11:55:15 +02:00
Peter Zijlstra
33ac279677 locking/barriers: Introduce smp_acquire__after_ctrl_dep()
Introduce smp_acquire__after_ctrl_dep(), this construct is not
uncommon, but the lack of this barrier is.

Use it to better express smp_rmb() uses in WRITE_ONCE(), the IPC
semaphore code and the qspinlock code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14 11:55:14 +02:00
Michal Hocko
91f4f94ea3 ipc, shm: make shmem attach/detach wait for mmap_sem killable
shmat and shmdt rely on mmap_sem for write.  If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Davidlohr Bueso
a5f4db8771 ipc/sem: make semctl setting sempid consistent
As indicated by bug#112271, Linux sets the sempid value upon semctl, and
not only for semop calls.  However, within semctl we only do this for
SETVAL, leaving SETALL without updating the field, and therefore rather
inconsistent behavior when compared to other Unices.

There is really no documentation regarding this and therefore users
should not make assumptions.  With this patch, along with updating
semctl.2 manpages, this scenario should become less ambiguous As such,
set sempid on SETALL cmd.

Also update some in-code documentation, specifying where the sempid is
set.

Passes ltp and custom testcase where a child (fork) does SETALL to the
set.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Philip Semanchuk <linux_kernel.20.ick@spamgourmet.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Kirill A. Shutemov
1ac0b6dec6 ipc/shm: handle removed segments gracefully in shm_mmap()
remap_file_pages(2) emulation can reach file which represents removed
IPC ID as long as a memory segment is mapped.  It breaks expectations of
IPC subsystem.

Test case (rewritten to be more human readable, originally autogenerated
by syzkaller[1]):

	#define _GNU_SOURCE
	#include <stdlib.h>
	#include <sys/ipc.h>
	#include <sys/mman.h>
	#include <sys/shm.h>

	#define PAGE_SIZE 4096

	int main()
	{
		int id;
		void *p;

		id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
		p = shmat(id, NULL, 0);
		shmctl(id, IPC_RMID, NULL);
		remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);

	        return 0;
	}

The patch changes shm_mmap() and code around shm_lock() to propagate
locking error back to caller of shm_mmap().

[1] http://github.com/google/syzkaller

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Linus Torvalds
cc673757e2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull final vfs updates from Al Viro:

 - The ->i_mutex wrappers (with small prereq in lustre)

 - a fix for too early freeing of symlink bodies on shmem (they need to
   be RCU-delayed) (-stable fodder)

 - followup to dedupe stuff merged this cycle

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: abort dedupe loop if fatal signals are pending
  make sure that freeing shmem fast symlinks is RCU-delayed
  wrappers for ->i_mutex access
  lustre: remove unused declaration
2016-01-23 12:24:56 -08:00