wonderful-mobile/exynos-linux-stable

Author	SHA1	Message	Date
xxmustafacooTR	71e430b7c9	gaming_control, mali: more controls, optimizations	2024-09-27 17:20:00 +03:00
Diep Quynh	609163c324	drivers: Introduce brand new kernel gaming mode How to trigger gaming mode? Just open a game that is supported in games list How to exit? Kill the game from recents, or simply back to homescreen What does this gaming mode do? - It limits big cluster maximum frequency to 2,0GHz, and little cluster to values matching GPU frequencies as below: + 338MHz: 455MHz + 385MHz and above: 1053MHz - As for the cluster freq limits, it overcomes heating issue while playing heavy games, as well as saves battery juice Big thanks to [kerneltoast] for the following commits on his wahoo kernel `5ac1e81d3d` `e13e2c4554` Gaming control's idea was based on these Signed-off-by: Diep Quynh <remilia.1505@gmail.com>	2024-09-27 17:19:55 +03:00
Diep Quynh	665c080acf	sched: ems: Update EMS to latest beyond2lte source Signed-off-by: Diep Quynh <remilia.1505@gmail.com>	2024-09-27 17:17:49 +03:00
Diep Quynh	13a21ca507	universal9810: Reset scheduler to android-4.9-q state We will adapt new scheduler mechanism after this Signed-off-by: Diep Quynh <remilia.1505@gmail.com>	2024-09-27 17:17:26 +03:00
FAROVITUS	af1d3ae977	Merge 4.9.212 branch 'android-4.9-q' into tw10-android-4.9-q Documentation/filesystems/fscrypt.rst arch/arm/common/Kconfig arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi arch/arm64/boot/dts/amd/amd-seattle-soc.dtsi arch/arm64/boot/dts/arm/juno-clocks.dtsi arch/arm64/boot/dts/broadcom/ns2.dtsi arch/arm64/boot/dts/lg/lg1312.dtsi arch/arm64/boot/dts/lg/lg1313.dtsi arch/arm64/boot/dts/marvell/armada-37xx.dtsi arch/arm64/boot/dts/nvidia/tegra210-p2180.dtsi arch/arm64/boot/dts/nvidia/tegra210-p2597.dtsi arch/arm64/boot/dts/nvidia/tegra210.dtsi arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi arch/arm64/boot/dts/qcom/msm8996.dtsi arch/arm64/configs/ranchu64_defconfig arch/arm64/include/asm/cpucaps.h arch/arm64/kernel/cpufeature.c arch/arm64/kernel/traps.c arch/arm64/mm/mmu.c crypto/Makefile crypto/ablkcipher.c crypto/blkcipher.c crypto/testmgr.h crypto/zstd.c drivers/android/binder.c drivers/android/binder_alloc.c drivers/char/random.c drivers/clocksource/exynos_mct.c drivers/dma/pl330.c drivers/hid/hid-sony.c drivers/hid/uhid.c drivers/hid/usbhid/hiddev.c drivers/i2c/i2c-core.c drivers/md/dm-crypt.c drivers/media/v4l2-core/videobuf2-v4l2.c drivers/mmc/host/dw_mmc.c drivers/net/ethernet/broadcom/tg3.c drivers/net/usb/r8152.c drivers/scsi/scsi_logging.c drivers/scsi/sd.c drivers/scsi/ufs/ufshcd-pci.c drivers/scsi/ufs/ufshcd-pltfrm.c drivers/staging/android/Kconfig drivers/staging/android/ion/ion.c drivers/staging/android/ion/ion_priv.h drivers/staging/android/ion/ion_system_heap.c drivers/staging/android/lowmemorykiller.c drivers/tty/serial/samsung.c drivers/usb/dwc3/core.c drivers/usb/dwc3/gadget.c drivers/usb/host/xhci-hub.c drivers/video/fbdev/core/fbmon.c drivers/video/fbdev/core/modedb.c fs/crypto/fname.c fs/crypto/fscrypt_private.h fs/crypto/keyinfo.c fs/ext4/ialloc.c fs/ext4/namei.c fs/ext4/xattr.c fs/f2fs/checkpoint.c fs/f2fs/data.c fs/f2fs/debug.c fs/f2fs/dir.c fs/f2fs/f2fs.h fs/f2fs/file.c fs/f2fs/gc.c fs/f2fs/inline.c fs/f2fs/inode.c fs/f2fs/namei.c fs/f2fs/node.c fs/f2fs/recovery.c fs/f2fs/segment.c fs/f2fs/segment.h fs/f2fs/super.c fs/f2fs/sysfs.c fs/fat/dir.c fs/fat/fatent.c fs/file.c fs/namespace.c fs/pnode.c fs/proc/inode.c fs/proc/root.c fs/proc/task_mmu.c fs/sdcardfs/dentry.c fs/sdcardfs/derived_perm.c fs/sdcardfs/file.c fs/sdcardfs/inode.c fs/sdcardfs/lookup.c fs/sdcardfs/main.c fs/sdcardfs/sdcardfs.h fs/sdcardfs/super.c include/linux/blk_types.h include/linux/cpuhotplug.h include/linux/cred.h include/linux/fb.h include/linux/power_supply.h include/linux/sched.h include/linux/zstd.h include/trace/events/sched.h include/uapi/linux/android/binder.h init/Kconfig init/main.c kernel/bpf/hashtab.c kernel/cpu.c kernel/cred.c kernel/fork.c kernel/locking/spinlock_debug.c kernel/panic.c kernel/printk/printk.c kernel/sched/Makefile kernel/sched/core.c kernel/sched/fair.c kernel/sched/rt.c kernel/sched/walt.c kernel/sched/walt.h kernel/trace/trace.c lib/bug.c lib/list_debug.c lib/vsprintf.c lib/zstd/bitstream.h lib/zstd/compress.c lib/zstd/decompress.c lib/zstd/fse.h lib/zstd/fse_compress.c lib/zstd/fse_decompress.c lib/zstd/huf_compress.c lib/zstd/huf_decompress.c lib/zstd/zstd_internal.h mm/debug.c mm/filemap.c mm/rmap.c net/core/filter.c net/ipv4/sysctl_net_ipv4.c net/ipv4/sysfs_net_ipv4.c net/ipv4/tcp_input.c net/ipv4/tcp_output.c net/ipv4/udp.c net/ipv6/netfilter/nf_conntrack_reasm.c net/netfilter/Kconfig net/netfilter/Makefile net/netfilter/xt_qtaguid.c net/netfilter/xt_qtaguid_internal.h net/xfrm/xfrm_policy.c net/xfrm/xfrm_state.c scripts/checkpatch.pl security/selinux/hooks.c sound/core/compress_offload.c	2020-02-12 12:32:38 +02:00
FAROVITUS	2b92eefa41	import G965FXXU7DTAA OSRC *First release for Android (Q). Signed-off-by: FAROVITUS <farovitus@gmail.com>	2020-02-04 13:50:09 +02:00
Greg Kroah-Hartman	e9766ef8f1	This is the 4.9.197 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2o0kcACgkQONu9yGCS aT4nAQ//UOJDaAeVD3HlVI52sv8ZS7lCck8MCrVXeOWfgP5BPgVTcQairhBxtCiz eiIfQclHWTqoZygCHAf2mGNgSymYT13w0EJ/pnrGIBs66+mWTJcZ/U31iD8ftGo+ LBC8nUL4ocmLN+eNFRhDpFjRksFZnYAcE5L3A3Y+oMat/QplYT/R+T6dgnN08VHI oBDNen1y8Q+4PG3LnBBtS1PjvnSJMVPxX4MkRfBD+wjSNZdPCDqQ/piuZCmeBSsy nkkW/fEsWAKbsXT3sybYTmjJpaJZvrhTGi/E/LCFAsAE1iQTawovC4xdT4LwGbID 49qg/Fg9l8485zv7sTJOxjJBQKeePKVWuU0iSuOg9d1lc62ilO0A1NQ61Ys8wqwn DulsHART0d47sQzTwitMBJxX45iW+z+EUSi4WXTZSi1VpAvi8pIYHsnsPLaAOJXR k/vJDQJv0zCZD488pHpMIg3CQHCgavYh7V3bpcpkUcjI/YnnBVGjwn1X7ZT88FTh q9KK3mhz5ZfpNu7ZCrmn7uXdMOa8SL7r6NBhNjUAAoU4bOIg782AxIMOzBYJ9p0w BqcmgKcIaClBMW+q1zYuDh1l221G2VAPvwjaTr9EZ2jEjr5EyqWwZU+icOvZG+vl ETfH4NbRy5Zi5BinHkgYu2GRhk4MmJm/9QNpd/fu14sdnOwQUZs= =rLBB -----END PGP SIGNATURE----- Merge 4.9.197 into android-4.9-q Changes in 4.9.197 KVM: s390: Test for bad access register and size at the start of S390_MEM_OP s390/topology: avoid firing events before kobjs are created s390/cio: avoid calling strlen on null pointer s390/cio: exclude subchannels with no parent from pseudo check KVM: nVMX: handle page fault in vmread fix ASoC: Define a set of DAPM pre/post-up events powerpc/powernv: Restrict OPAL symbol map to only be readable by root can: mcp251x: mcp251x_hw_reset(): allow more time after a reset crypto: qat - Silence smp_processor_id() warning usercopy: Avoid HIGHMEM pfn warning timer: Read jiffies once when forwarding base clk watchdog: imx2_wdt: fix min() calculation in imx2_wdt_set_timeout ieee802154: atusb: fix use-after-free at disconnect cfg80211: initialize on-stack chandefs ima: always return negative code for error fs: nfs: Fix possible null-pointer dereferences in encode_attrs() 9p: avoid attaching writeback_fid on mmap with type PRIVATE xen/pci: reserve MCFG areas earlier ceph: fix directories inode i_blkbits initialization ceph: reconnect connection if session hang in opening state drm/amdgpu: Check for valid number of registers to read thermal: Fix use-after-free when unregistering thermal zone device fuse: fix memleak in cuse_channel_open sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() kernel/elfcore.c: include proper prototypes tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure perf tools: Fix segfault in cpu_cache_level__read() perf stat: Fix a segmentation fault when using repeat forever perf stat: Reset previous counts on repeat with interval crypto: caam - fix concurrency issue in givencrypt descriptor coresight: etm4x: Use explicit barriers on enable/disable cfg80211: add and use strongly typed element iteration macros cfg80211: Use const more consistently in for_each_element macros nl80211: validate beacon head ASoC: sgtl5000: Improve VAG power and mute control panic: ensure preemption is disabled during panic() USB: rio500: Remove Rio 500 kernel driver USB: yurex: Don't retry on unexpected errors USB: yurex: fix NULL-derefs on disconnect USB: usb-skeleton: fix runtime PM after driver unbind USB: usb-skeleton: fix NULL-deref on disconnect xhci: Fix false warning message about wrong bounce buffer write length xhci: Prevent device initiated U1/U2 link pm if exit latency is too long xhci: Check all endpoints for LPM timeout usb: xhci: wait for CNR controller not ready bit in xhci resume xhci: Increase STS_SAVE timeout in xhci_suspend() USB: adutux: remove redundant variable minor USB: adutux: fix use-after-free on disconnect USB: adutux: fix NULL-derefs on disconnect USB: adutux: fix use-after-free on release USB: iowarrior: fix use-after-free on disconnect USB: iowarrior: fix use-after-free on release USB: iowarrior: fix use-after-free after driver unbind USB: usblp: fix runtime PM after driver unbind USB: chaoskey: fix use-after-free on release USB: ldusb: fix NULL-derefs on driver unbind serial: uartlite: fix exit path null pointer USB: serial: keyspan: fix NULL-derefs on open() and write() USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20 USB: serial: option: add Telit FN980 compositions USB: serial: option: add support for Cinterion CLS8 devices USB: serial: fix runtime PM after driver unbind USB: usblcd: fix I/O after disconnect USB: microtek: fix info-leak at probe USB: dummy-hcd: fix power budget for SuperSpeed mode usb: renesas_usbhs: gadget: Do not discard queues in usb_ep_set_{halt,wedge}() usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior USB: legousbtower: fix slab info leak at probe USB: legousbtower: fix deadlock on disconnect USB: legousbtower: fix potential NULL-deref on disconnect USB: legousbtower: fix open after failed reset request USB: legousbtower: fix use-after-free on release staging: vt6655: Fix memory leak in vt6655_probe iio: adc: ad799x: fix probe error handling iio: light: opt3001: fix mutex unlock race efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified perf llvm: Don't access out-of-scope array perf inject jit: Fix JIT_CODE_MOVE filename CIFS: Gracefully handle QueryInfo errors during open CIFS: Force revalidate inode when dentry is stale CIFS: Force reval dentry if LOOKUP_REVAL flag is set kernel/sysctl.c: do not override max_threads provided by userspace staging: fbtft: Stop using BL_CORE_DRIVER1 Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc MIPS: Disable Loongson MMI instructions for kernel build Fix the locking in dcache_readdir() and friends media: stkwebcam: fix runtime PM after driver unbind tracing/hwlat: Report total time spent in all NMIs during the sample tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency tracing: Get trace_array reference for available_tracers files x86/asm: Fix MWAITX C-state hint value xfs: clear sb->s_fs_info on mount failure Linux 4.9.197 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-10-17 13:54:42 -07:00
Michal Hocko	5a4a1217c0	kernel/sysctl.c: do not override max_threads provided by userspace commit b0f53dbc4bc4c371f38b14c391095a3bb8a0bb40 upstream. Partially revert `16db3d3f11` ("kernel/sysctl.c: threads-max observe limits") because the patch is causing a regression to any workload which needs to override the auto-tuning of the limit provided by kernel. set_max_threads is implementing a boot time guesstimate to provide a sensible limit of the concurrently running threads so that runaways will not deplete all the memory. This is a good thing in general but there are workloads which might need to increase this limit for an application to run (reportedly WebSpher MQ is affected) and that is simply not possible after the mentioned change. It is also very dubious to override an admin decision by an estimation that doesn't have any direct relation to correctness of the kernel operation. Fix this by dropping set_max_threads from sysctl_max_threads so any value is accepted as long as it fits into MAX_THREADS which is important to check because allowing more threads could break internal robust futex restriction. While at it, do not use MIN_THREADS as the lower boundary because it is also only a heuristic for automatic estimation and admin might have a good reason to stop new threads to be created even when below this limit. This became more severe when we switched x86 from 4k to 8k kernel stacks. Starting since `6538b8ea88` ("x86_64: expand kernel stack to 16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned value. In the particular case 3.12 kernel.threads-max = 515561 4.4 kernel.threads-max = 200000 Neither of the two values is really insane on 32GB machine. I am not sure we want/need to tune the max_thread value further. If anything the tuning should be removed altogether if proven not useful in general. But we definitely need a way to override this auto-tuning. Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz Fixes: `16db3d3f11` ("kernel/sysctl.c: threads-max observe limits") Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-10-17 13:42:44 -07:00
Joel Fernandes (Google)	af1070fbf2	UPSTREAM: pidfd: add polling support This patch adds polling support to pidfd. Android low memory killer (LMK) needs to know when a process dies once it is sent the kill signal. It does so by checking for the existence of /proc/pid which is both racy and slow. For example, if a PID is reused between when LMK sends a kill signal and checks for existence of the PID, since the wrong PID is now possibly checked for existence. Using the polling support, LMK will be able to get notified when a process exists in race-free and fast way, and allows the LMK to do other things (such as by polling on other fds) while awaiting the process being killed to die. For notification to polling processes, we follow the same existing mechanism in the kernel used when the parent of the task group is to be notified of a child's death (do_notify_parent). This is precisely when the tasks waiting on a poll of pidfd are also awakened in this patch. We have decided to include the waitqueue in struct pid for the following reasons: 1. The wait queue has to survive for the lifetime of the poll. Including it in task_struct would not be option in this case because the task can be reaped and destroyed before the poll returns. 2. By including the struct pid for the waitqueue means that during de_thread(), the new thread group leader automatically gets the new waitqueue/pid even though its task_struct is different. Appropriate test cases are added in the second patch to provide coverage of all the cases the patch is handling. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Jonathan Kowalski <bl0pbl33p@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: kernel-team@android.com Reviewed-by: Oleg Nesterov <oleg@redhat.com> Co-developed-by: Daniel Colascione <dancol@google.com> Signed-off-by: Daniel Colascione <dancol@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Christian Brauner <christian@brauner.io> (cherry picked from commit b53b0b9d9a613c418057f6cb921c2f40a6f78c24) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: I02f259d2875bec46b198d580edfbb067f077084e Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-09-03 13:45:29 -07:00
Christian Brauner	3941f126e3	BACKPORT: fork: do not release lock that wasn't taken Avoid calling cgroup_threadgroup_change_end() without having called cgroup_threadgroup_change_begin() first. During process creation we need to check whether the cgroup we are in allows us to fork. To perform this check the cgroup needs to guard itself against threadgroup changes and takes a lock. Prior to CLONE_PIDFD the cleanup target "bad_fork_free_pid" would also need to call cgroup_threadgroup_change_end() because said lock had already been taken. However, this is not the case anymore with the addition of CLONE_PIDFD. We are now allocating a pidfd before we check whether the cgroup we're in can fork and thus prior to taking the lock. So when copy_process() fails at the right step it would release a lock we haven't taken. This bug is not even very subtle to be honest. It's just not very clear from the naming of cgroup_threadgroup_change_{begin,end}() that a lock is taken. Here's the relevant splat: entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7fec849 Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078 RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000 RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(depth <= 0) WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 __lock_release kernel/locking/lockdep.c:4052 [inline] WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 7744 Comm: syz-executor007 Not tainted 5.1.0+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 panic+0x2cb/0x65c kernel/panic.c:214 __warn.cold+0x20/0x45 kernel/panic.c:566 report_bug+0x263/0x2b0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:179 [inline] fixup_bug arch/x86/kernel/traps.c:174 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972 RIP: 0010:__lock_release kernel/locking/lockdep.c:4052 [inline] RIP: 0010:lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321 Code: 0f 85 a0 03 00 00 8b 35 77 66 08 08 85 f6 75 23 48 c7 c6 a0 55 6b 87 48 c7 c7 40 25 6b 87 4c 89 85 70 ff ff ff e8 b7 a9 eb ff <0f> 0b 4c 8b 85 70 ff ff ff 4c 89 ea 4c 89 e6 4c 89 c7 e8 52 63 ff RSP: 0018:ffff888094117b48 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 1ffff11012822f6f RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815af236 RDI: ffffed1012822f5b RBP: ffff888094117c00 R08: ffff888092bfc400 R09: fffffbfff113301d R10: fffffbfff113301c R11: ffffffff889980e3 R12: ffffffff8a451df8 R13: ffffffff8142e71f R14: ffffffff8a44cc80 R15: ffff888094117bd8 percpu_up_read.constprop.0+0xcb/0x110 include/linux/percpu-rwsem.h:92 cgroup_threadgroup_change_end include/linux/cgroup-defs.h:712 [inline] copy_process.part.0+0x47ff/0x6710 kernel/fork.c:2222 copy_process kernel/fork.c:1772 [inline] _do_fork+0x25d/0xfd0 kernel/fork.c:2338 __do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline] __se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline] __ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236 do_syscall_32_irqs_on arch/x86/entry/common.c:334 [inline] do_fast_syscall_32+0x281/0xd54 arch/x86/entry/common.c:405 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7fec849 Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078 RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000 RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. Reported-and-tested-by: syzbot+3286e58549edc479faae@syzkaller.appspotmail.com Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Christian Brauner <christian@brauner.io> (cherry picked from commit c3b7112df86b769927a60a6d7175988ca3d60f09) Conflicts: kernel/fork.c (1. Replaced cgroup_threadgroup_change_end with threadgroup_change_end) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: Ib9ecb1e5c0c6e2d062b89c25109ec571570eb497 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-09-03 13:45:09 -07:00
Christian Brauner	0e020c19bb	BACKPORT: clone: add CLONE_PIDFD This patchset makes it possible to retrieve pid file descriptors at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. As spotted by Linus, there is exactly one bit for clone() left. CLONE_PIDFD creates file descriptors based on the anonymous inode implementation in the kernel that will also be used to implement the new mount api. They serve as a simple opaque handle on pids. Logically, this makes it possible to interpret a pidfd differently, narrowing or widening the scope of various operations (e.g. signal sending). Thus, a pidfd cannot just refer to a tgid, but also a tid, or in theory - given appropriate flag arguments in relevant syscalls - a process group or session. A pidfd does not represent a privilege. This does not imply it cannot ever be that way but for now this is not the case. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the parent_tidptr argument of clone. This has the advantage that we can give back the associated pid and the pidfd at the same time. To remove worries about missing metadata access this patchset comes with a sample program that illustrates how a combination of CLONE_PIDFD, and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. The sample program can easily be translated into a helper that would be suitable for inclusion in libc so that users don't have to worry about writing it themselves. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit b3e5838252665ee4cfa76b82bdf1198dca81e5be) Conflicts: kernel/fork.c (1. Replaced proc_pid_ns() with its direct implementation.) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: I3c804a92faea686e5bf7f99df893fe3a5d87ddf7 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-09-03 13:44:48 -07:00
Greg Kroah-Hartman	0eb90dd8f7	This is the 4.9.187 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GilkACgkQONu9yGCS aT6ibw//QcoRk+6jjWIzW2R83tvcSpDREIYKF5xDPULqmaUKv7qcbGqdCT4OI0Vf CySao6jvmN+1V+5OlQHHw1qmithHF/Y4yEVi+eL0CcMgB6qA2tQxeA0ffFu0Fzxe oKrAIDANAJ2FKymqbIzTJU8AChNuy+Rc+C60L9O0ZhBHykLYvO7/VepTxdO2aMGs a9tdZyXnOAr/ZgKx+uN1F8Rx2ZorNfFmP2rDxEZ/B8pVrXsOjMAcxWRsZBv+w9mc zWaMPEL1vIR/kG35L18l2EFwZ++uIGvfGR2HxhNRUlTqN4m/3trATIc0eRn5PyAC qBlVdcwUeJKavBqK3cgHvK9CzWQdSMxNefk9A0H66ZGpfSuNCiFjw54AXEiRzx3t OvzUvDjfa36jsKW7yiYXdLbEa52gT14UDzmSIzlAYQLBplEadHikJ0FxCFzAWpFt C13gG1e4G0ZKEHH8wZMuRdanHbN87c/0Sm4rgTLMwCO6I1PeILcUwIq9El9M4RVL MmHSXLKduaHbPJWx+Qopl6NxjqTe/VR8paT6q1QnU00/4kP15aIVE08vZSiGSNP+ Gp6xbv12skAx1m1zP7K82oGdwnCVCJUqlC9wafcjsaxcoCVBWvIgPkyMWAzUeFzc Ub2yIS9iulUeYOxPXBZKWCqgwpl7kovGztf4gwX2Kuy50mXSZQg= =ozfF -----END PGP SIGNATURE----- Merge 4.9.187 into android-4.9-q Changes in 4.9.187 MIPS: ath79: fix ar933x uart parity mode MIPS: fix build on non-linux hosts arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly dmaengine: imx-sdma: fix use-after-free on probe error path ath10k: Do not send probe response template for mesh ath9k: Check for errors when reading SREV register ath6kl: add some bounds checking ath: DFS JP domain W56 fixed pulse type 3 RADAR detection batman-adv: fix for leaked TVLV handler. media: dvb: usb: fix use after free in dvb_usb_device_exit crypto: talitos - fix skcipher failure due to wrong output IV media: marvell-ccic: fix DMA s/g desc number calculation media: vpss: fix a potential NULL pointer dereference media: media_device_enum_links32: clean a reserved field net: stmmac: dwmac1000: Clear unused address entries net: stmmac: dwmac4/5: Clear unused address entries signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig af_key: fix leaks in key_pol_get_resp and dump_sp. xfrm: Fix xfrm sel prefix length validation media: mc-device.c: don't memset __user pointer contents media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails. net: phy: Check against net_device being NULL crypto: talitos - properly handle split ICV. crypto: talitos - Align SEC1 accesses to 32 bits boundaries. tua6100: Avoid build warnings. locking/lockdep: Fix merging of hlocks with non-zero references media: wl128x: Fix some error handling in fm_v4l2_init_video_device() cpupower : frequency-set -r option misses the last cpu in related cpu list net: fec: Do not use netdev messages too early net: axienet: Fix race condition causing TX hang s390/qdio: handle PENDING state for QEBSM devices perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode perf test 6: Fix missing kvm module load for s390 gpio: omap: fix lack of irqstatus_raw0 for OMAP4 gpio: omap: ensure irq is enabled before wakeup regmap: fix bulk writes on paged registers bpf: silence warning messages in core rcu: Force inlining of rcu_read_lock() blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration xfrm: fix sa selector validation perf evsel: Make perf_evsel__name() accept a NULL argument vhost_net: disable zerocopy by default ipoib: correcly show a VF hardware address EDAC/sysfs: Fix memory leak when creating a csrow object ipsec: select crypto ciphers for xfrm_algo media: i2c: fix warning same module names ntp: Limit TAI-UTC offset timer_list: Guard procfs specific code acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 media: coda: fix mpeg2 sequence number handling media: coda: increment sequence offset for the last returned frame mt7601u: do not schedule rx_tasklet when the device has been disconnected x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c mt7601u: fix possible memory leak when the device is disconnected ath10k: fix PCIE device wake up failed perf tools: Increase MAX_NR_CPUS and MAX_CACHES libata: don't request sense data on !ZAC ATA devices clocksource/drivers/exynos_mct: Increase priority over ARM arch timer rslib: Fix decoding of shortened codes rslib: Fix handling of of caller provided syndrome ixgbe: Check DDM existence in transceiver before access crypto: asymmetric_keys - select CRYPTO_HASH where needed EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() iwlwifi: mvm: Drop large non sta frames net: usb: asix: init MAC address buffers gpiolib: Fix references to gpiod_[gs]et_value_cansleep() variants Bluetooth: hci_bcsp: Fix memory leak in rx_skb Bluetooth: 6lowpan: search for destination address in all peers Bluetooth: Check state in l2cap_disconnect_rsp Bluetooth: validate BLE connection interval updates gtp: fix Illegal context switch in RCU read-side critical section. gtp: fix use-after-free in gtp_newlink() xen: let alloc_xenballooned_pages() fail if not enough memory free scsi: NCR5380: Reduce goto statements in NCR5380_select() scsi: NCR5380: Always re-enable reselection interrupt scsi: mac_scsi: Increase PIO/PDMA transfer length threshold crypto: ghash - fix unaligned memory access in ghash_setkey() crypto: arm64/sha1-ce - correct digest for empty data in finup crypto: arm64/sha2-ce - correct digest for empty data in finup crypto: chacha20poly1305 - fix atomic sleep when using async algorithm crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe Input: gtco - bounds check collection indent level regulator: s2mps11: Fix buck7 and buck8 wrong voltages arm64: tegra: Update Jetson TX1 GPU regulator timings iwlwifi: pcie: don't service an interrupt that was masked tracing/snapshot: Resize spare buffer if size changed NFSv4: Handle the special Linux file open access mode lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE ALSA: seq: Break too long mutex context in the write loop ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom() media: coda: Remove unbalanced and unneeded mutex unlock KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed arm64: tegra: Fix AGIC register range fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes. drm/nouveau/i2c: Enable i2c pads & busses during preinit padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 9p/virtio: Add cleanup path in p9_virtio_init PCI: Do not poll for PME if the device is in D3cold Btrfs: add missing inode version, ctime and mtime updates when punching hole libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields take floppy compat ioctls to sodding floppy.c floppy: fix div-by-zero in setup_format_params floppy: fix out-of-bounds read in next_valid_format floppy: fix invalid pointer dereference in drive_name floppy: fix out-of-bounds read in copy_buffer coda: pass the host file in vma->vm_file on mmap gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM crypto: ccp - Validate the the error value used to index error messages PCI: hv: Delete the device earlier from hbus->children for hot-remove PCI: hv: Fix a use-after-free bug in hv_eject_device_work() crypto: caam - limit output IV to CBC to work around CTR mode DMA issue um: Allow building and running on older hosts um: Fix FP register size for XSTATE/XSAVE parisc: Ensure userspace privilege for ptraced processes in regset functions parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1 powerpc/32s: fix suspend/resume when IBATs 4-7 are used powerpc/watchpoint: Restore NV GPRs while returning from exception eCryptfs: fix a couple type promotion bugs intel_th: msu: Fix single mode with disabled IOMMU Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug usb: Handle USB3 remote wakeup for LPM enabled devices correctly dm bufio: fix deadlock with loop device compiler.h, kasan: Avoid duplicating __read_once_size_nocheck() compiler.h: Add read_word_at_a_time() function. lib/strscpy: Shut up KASAN false-positives in strscpy() ext4: allow directory holes bnx2x: Prevent load reordering in tx completion processing bnx2x: Prevent ptp_task to be rescheduled indefinitely caif-hsi: fix possible deadlock in cfhsi_exit_module() igmp: fix memory leak in igmpv3_del_delrec() ipv4: don't set IPv6 only flags to IPv4 addresses net: bcmgenet: use promisc for unsupported filters net: dsa: mv88e6xxx: wait after reset deactivation net: neigh: fix multiple neigh timer scheduling net: openvswitch: fix csum updates for MPLS actions nfc: fix potential illegal memory access rxrpc: Fix send on a connected, but unbound socket sky2: Disable MSI on ASUS P6T vrf: make sure skb->data contains ip header to make routing macsec: fix use-after-free of skb during RX macsec: fix checksumming after decryption netrom: fix a memory leak in nr_rx_frame() netrom: hold sock when setting skb->destructor bonding: validate ip header before check IPPROTO_IGMP tcp: Reset bytes_acked and bytes_received when disconnecting net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query net: bridge: stp: don't cache eth dest pointer before skb pull perf/x86/amd/uncore: Rename 'L2' to 'LLC' perf/x86/amd/uncore: Get correct number of cores sharing last level cache perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id NFSv4: Fix open create exclusive when the server reboots nfsd: increase DRC cache limit nfsd: give out fewer session slots as limit approaches nfsd: fix performance-limiting session calculation nfsd: Fix overflow causing non-working mounts on 1 TB machines drm/panel: simple: Fix panel_simple_dsi_probe usb: core: hub: Disable hub-initiated U1/U2 tty: max310x: Fix invalid baudrate divisors calculator pinctrl: rockchip: fix leaked of_node references tty: serial: cpm_uart - fix init when SMC is relocated drm/bridge: tc358767: read display_props in get_modes() drm/bridge: sii902x: pixel clock unit is 10kHz instead of 1kHz memstick: Fix error cleanup path of memstick_init tty/serial: digicolor: Fix digicolor-usart already registered warning tty: serial: msm_serial: avoid system lockup condition serial: 8250: Fix TX interrupt handling condition drm/virtio: Add memory barriers for capset cache. phy: renesas: rcar-gen2: Fix memory leak at error paths drm/rockchip: Properly adjust to a true clock in adjusted_mode tty: serial_core: Set port active bit in uart_port_activate usb: gadget: Zero ffs_io_data powerpc/pci/of: Fix OF flags parsing for 64bit BARs PCI: sysfs: Ignore lockdep for remove attribute kbuild: Add -Werror=unknown-warning-option to CLANG_FLAGS PCI: xilinx-nwl: Fix Multi MSI data programming iio: iio-utils: Fix possible incorrect mask calculation recordmcount: Fix spurious mcount entries on powerpc mfd: core: Set fwnode for created devices mfd: arizona: Fix undefined behavior mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk um: Silence lockdep complaint about mmap_sem powerpc/4xx/uic: clear pending interrupt after irq type/pol change RDMA/i40iw: Set queue pair state when being queried serial: sh-sci: Terminate TX DMA during buffer flushing serial: sh-sci: Fix TX DMA buffer flushing and workqueue races kallsyms: exclude kasan local symbols on s390 perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h f2fs: avoid out-of-range memory access mailbox: handle failed named mailbox channel request powerpc/eeh: Handle hugepages in ioremap space sh: prevent warnings when using iounmap mm/kmemleak.c: fix check for softirq context 9p: pass the correct prototype to read_cache_page mm/mmu_notifier: use hlist_add_head_rcu() locking/lockdep: Fix lock used or unused stats error locking/lockdep: Hide unused 'class' variable usb: wusbcore: fix unbalanced get/put cluster_id usb: pci-quirks: Correct AMD PLL quirk detection x86/sysfb_efi: Add quirks for some devices with swapped width and height x86/speculation/mds: Apply more accurate check on hypervisor platform hpet: Fix division by zero in hpet_time_div() ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1 ALSA: hda - Add a conexant codec entry to let mute led work powerpc/tm: Fix oops on sigreturn on systems without TM access: avoid the RCU grace period for the temporary subjective credentials ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt tcp: reset sk_send_head in tcp_write_queue_purge arm64: dts: marvell: Fix A37xx UART0 register size i2c: qup: fixed releasing dma without flush operation completion arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ ISDN: hfcsusb: checking idx of ep configuration media: au0828: fix null dereference in error path media: cpia2_usb: first wake up, then free in disconnect media: radio-raremono: change devm_kalloc to k*alloc Bluetooth: hci_uart: check for missing tty operations sched/fair: Don't free p->numa_faults with concurrent readers drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl ceph: hold i_ceph_lock when removing caps for freeing inode Linux 4.9.187 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-08-04 09:50:32 +02:00
Jann Horn	837ffc9723	sched/fair: Don't free p->numa_faults with concurrent readers commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream. When going through execve(), zero out the NUMA fault statistics instead of freeing them. During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU. Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: `82727018b0` ("sched/numa: Call task_numa_free() from do_execve()") Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-04 09:33:45 +02:00
Johannes Weiner	3df0e59afa	UPSTREAM: psi: pressure stall information for CPU, memory, and IO When systems are overcommitted and resources become contended, it's hard to tell exactly the impact this has on workload productivity, or how close the system is to lockups and OOM kills. In particular, when machines work multiple jobs concurrently, the impact of overcommit in terms of latency and throughput on the individual job can be enormous. In order to maximize hardware utilization without sacrificing individual job health or risk complete machine lockups, this patch implements a way to quantify resource pressure in the system. A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that expose the percentage of time the system is stalled on CPU, memory, or IO, respectively. Stall states are aggregate versions of the per-task delay accounting delays: cpu: some tasks are runnable but not executing on a CPU memory: tasks are reclaiming, or waiting for swapin or thrashing cache io: tasks are waiting for io completions These percentages of walltime can be thought of as pressure percentages, and they give a general sense of system health and productivity loss incurred by resource overcommit. They can also indicate when the system is approaching lockup scenarios and OOMs. To do this, psi keeps track of the task states associated with each CPU and samples the time they spend in stall states. Every 2 seconds, the samples are averaged across CPUs - weighted by the CPUs' non-idle time to eliminate artifacts from unused CPUs - and translated into percentages of walltime. A running average of those percentages is maintained over 10s, 1m, and 5m periods (similar to the loadaverage). [hannes@cmpxchg.org: doc fixlet, per Randy] Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org [hannes@cmpxchg.org: code optimization] Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org [hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter] Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org [hannes@cmpxchg.org: fix build] Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2) Bug: 111308141 Test: modified lmkd to use PSI and tested using lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I54a65620b3ed6f8172fdec789a237a99f8c82156	2019-03-22 14:10:35 -07:00
Greg Kroah-Hartman	c7b283dd04	This is the 4.9.150 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlw6/vsACgkQONu9yGCS aT7Qmw//aTFzpEg2elxZ2o2U+363AGxtHcA8o/4pInzdfeKnnUL3fcdFd2EyGcuW 33w9mCRqSTgy5xln7E0PFX6s2NkfDTlMIrnywu0Bp6zhT2mkljueYK3FhboiJb3b mZazz+PoOAQQ6OtRiGtJnjuM0YCrnwZKkHZgM0IQ1rctl5f9h6eukqEL/Hr2lNgd /AmSwP+AgGNSStPMsyNWkbXGsWNfOxT/VIksBqAZ4giaIAgi89OHlDHhAe49W/Am Ak/R7Rd+ny7jb0NGTONd1J/B67sO0FZ7InayfzyMeJ4l9q/TimKdcXHg2nUoh1h/ dqIE1GmqVhIPsKAbNYmMgFrdBgmQY3LGL2nB5h5V77u86fX70jsXqwKJMJsQcy+F Uiqx6C3TU/yFW4Rg4mZfO6vliP97H6H3ohUQUetu1Pt5Hx4hxSiGc40mZPr80pBf 3d2ntaRdh7tSM8ypyO3Cp8CHG3Lzy1/T5x5tFYi7kMdZ5P86jCaZupOrCIjje6fz pGwsvJA/9pNr/DK2KbaFpYbzVNXqRCHgUsLbsbrJ4u+Cwzca5+khTQnGWaUy1exY O6ulxr6wNyvDPxZHnbbZF7Ikra+6i0KoPWAv9NeNJdH9/zldpKAK6RRhFg4EikTW 5VuXm+805MyPc+UI63MU5Hk83MDl7hYRd9WhKWdbpfgbEYpfzac= =rFBu -----END PGP SIGNATURE----- Merge 4.9.150 into android-4.9 Changes in 4.9.150 pinctrl: meson: fix pull enable register calculation powerpc: Fix COFF zImage booting on old powermacs ARM: imx: update the cpu power up timing setting on i.mx6sx ARM: dts: imx7d-nitrogen7: Fix the description of the Wifi clock Input: restore EV_ABS ABS_RESERVED checkstack.pl: fix for aarch64 xfrm: Fix bucket count reported to userspace netfilter: seqadj: re-load tcp header pointer after possible head reallocation scsi: bnx2fc: Fix NULL dereference in error handling Input: omap-keypad - fix idle configuration to not block SoC idle states netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel bnx2x: Clear fip MAC when fcoe offload support is disabled bnx2x: Remove configured vlans as part of unload sequence. bnx2x: Send update-svid ramrod with retry/poll flags enabled scsi: target: iscsi: cxgbit: fix csk leak scsi: target: iscsi: cxgbit: add missing spin_lock_init() drivers: net: xgene: Remove unnecessary forward declarations w90p910_ether: remove incorrect __init annotation net: hns: Incorrect offset address used for some registers. net: hns: All ports can not work when insmod hns ko after rmmod. net: hns: Some registers use wrong address according to the datasheet. net: hns: Fixed bug that netdev was opened twice net: hns: Clean rx fbd when ae stopped. net: hns: Free irq when exit from abnormal branch net: hns: Avoid net reset caused by pause frames storm net: hns: Fix ntuple-filters status error. net: hns: Add mac pcs config when enable\|disable mac SUNRPC: Fix a race with XPRT_CONNECTING lan78xx: Resolve issue with changing MAC address vxge: ensure data0 is initialized in when fetching firmware version information net: netxen: fix a missing check and an uninitialized use serial/sunsu: fix refcount leak scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown libceph: fix CEPH_FEATURE_CEPHX_V2 check in calc_signature() fork: record start_time late hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL mm, devm_memremap_pages: kill mapping "System RAM" support sunrpc: fix cache_head leak due to queued request sunrpc: use SVC_NET() in svcauth_gss_* functions MIPS: math-emu: Write-protect delay slot emulation pages crypto: x86/chacha20 - avoid sleeping with preemption disabled vhost/vsock: fix uninitialized vhost_vsock->guest_cid IB/hfi1: Incorrect sizing of sge for PIO will OOPs ALSA: cs46xx: Potential NULL dereference in probe ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit() ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks dlm: fixed memory leaks after failed ls_remove_names allocation dlm: possible memory leak on error path in create_lkb() dlm: lost put_lkb on error path in receive_convert() and receive_unlock() dlm: memory leaks on error path in dlm_user_request() gfs2: Get rid of potential double-freeing in gfs2_create_inode gfs2: Fix loop in gfs2_rbm_find b43: Fix error in cordic routine powerpc/tm: Set MSR[TS] just prior to recheckpoint 9p/net: put a lower bound on msize rxe: fix error completion wr_id and qp_num iommu/vt-d: Handle domain agaw being less than iommu agaw ceph: don't update importing cap's mseq when handing cap export genwqe: Fix size check intel_th: msu: Fix an off-by-one in attribute store power: supply: olpc_battery: correct the temperature units drm/vc4: Set ->is_yuv to false when num_planes == 1 bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw Linux 4.9.150 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-01-13 10:34:18 +01:00
David Herrmann	0ea6030b55	fork: record start_time late commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream. This changes the fork(2) syscall to record the process start_time after initializing the basic task structure but still before making the new process visible to user-space. Technically, we could record the start_time anytime during fork(2). But this might lead to scenarios where a start_time is recorded long before a process becomes visible to user-space. For instance, with userfaultfd(2) and TLS, user-space can delay the execution of fork(2) for an indefinite amount of time (and will, if this causes network access, or similar). By recording the start_time late, it much closer reflects the point in time where the process becomes live and can be observed by other processes. Lastly, this makes it much harder for user-space to predict and control the start_time they get assigned. Previously, user-space could fork a process and stall it in copy_thread_tls() before its pid is allocated, but after its start_time is recorded. This can be misused to later-on cycle through PIDs and resume the stalled fork(2) yielding a process that has the same pid and start_time as a process that existed before. This can be used to circumvent security systems that identify processes by their pid+start_time combination. Even though user-space was always aware that start_time recording is flaky (but several projects are known to still rely on start_time-based identification), changing the start_time to be recorded late will help mitigate existing attacks and make it much harder for user-space to control the start_time a process gets assigned. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:03:51 +01:00
Greg Kroah-Hartman	ba01a4255d	This is the 4.9.128 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAluitdUACgkQONu9yGCS aT6k7g//fxvMjnxYwVZ6LQjFkBakhP4gU/unxKppIFb0Iti+LBpx3Fg5JbtpRM8y oCAAMb9F5Dxl38691BqBpCdihmrGurh9EDn9ZZDsoUtipQ7UQozmGDbbc4fV0cQW /2Fs7GjFp1XopOAvFp7z/Xn5Xi8fSxgWg6l6aQ5VzsUtK69FM4TmNPJ2x1adow+N UsewbUkLkD9dMIrrgIajeCsErRYnJHy6FuY/pWSXVuY1gyTlCW/lsxKU2OoS5l5Z u/ZxprWl7z6FjSfzHc4AYrBR6t2zCAV2DNZhX5lDU/EHzTAttq0sSaeEfWzqYrJM q/bStS8eEWj+MkqfrppWRc1I/ZwUrjMB1P8OJEW8UUFTjAcRvlBz/Z1poOCRkBRp flsohLfao2/dbPxT0oLQMidBRX9t90TQLxwGY2SJ2/dL6uU/sapQDo8/iwyCZvC6 u57DLf82P9tUvZgnUOoNuCPSqiZQG/tGPGFRGU0EgHVtoE7S285FvRJZoNlRH+oT EyAHtavzbgsoJb0oP12eOBfq0dwvmXt6i7ypzPoy/Fh+zQTb5Fk1fH+4GF+gFqWY //CpKO+zIdeFzWZ/r8eAkS+uLgvgywNbZV6N8EJYqviajKfU5A/RX5GsQ4oaFfk6 ++Anyn58UYYSkw7ov5ynLNmNr+tlMjKUuRXw0u9rtAhcTQcnHYc= =YQcR -----END PGP SIGNATURE----- Merge 4.9.128 into android-4.9 Changes in 4.9.128 i2c: xiic: Make the start and the byte count write atomic i2c: i801: fix DNV's SMBCTRL register offset KVM: s390: vsie: copy wrapping keys to right place ALSA: hda - Fix cancel_work_sync() stall from jackpoll work cfq: Give a chance for arming slice idle timer in case of group_idle kthread: Fix use-after-free if kthread fork fails kthread: fix boot hang (regression) on MIPS/OpenRISC staging: rt5208: Fix a sleep-in-atomic bug in xd_copy_page staging/rts5208: Fix read overflow in memcpy IB/rxe: do not copy extra stack memory to skb block,blkcg: use __GFP_NOWARN for best-effort allocations in blkcg nl80211: fix null-ptr dereference on invalid mesh configuration locking/rwsem-xadd: Fix missed wakeup due to reordering of load selinux: use GFP_NOWAIT in the AVC kmem_caches locking/osq_lock: Fix osq_lock queue corruption mm, vmscan: clear PGDAT_WRITEBACK when zone is balanced mm: remove seemingly spurious reclaimability check from laptop_mode gating ARC: [plat-axs*]: Enable SWAP misc: mic: SCIF Fix scif_get_new_port() error handling ethtool: Remove trailing semicolon for static inline Bluetooth: h5: Fix missing dependency on BT_HCIUART_SERDEV gpio: tegra: Move driver registration to subsys_init level net: phy: Fix the register offsets in Broadcom iProc mdio mux driver scsi: target: fix __transport_register_session locking md/raid5: fix data corruption of replacements after originals dropped timers: Clear timer_base::must_forward_clk with timer_base::lock held misc: ti-st: Fix memory leak in the error path of probe() uio: potential double frees if __uio_register_device() fails tty: rocket: Fix possible buffer overwrite on register_PCI f2fs: do not set free of current section perf tools: Allow overriding MAX_NR_CPUS at compile time NFSv4.0 fix client reference leak in callback macintosh/via-pmu: Add missing mmio accessors ath9k: report tx status on EOSP ath9k_hw: fix channel maximum power level test ath10k: prevent active scans on potential unusable channels wlcore: Set rx_status boottime_ns field on rx MIPS: Fix ISA virt/bus conversion for non-zero PHYS_OFFSET ata: libahci: Correct setting of DEVSLP register scsi: 3ware: fix return 0 on the error path of probe ath10k: disable bundle mgmt tx completion event support Bluetooth: hidp: Fix handling of strncpy for hid->name information x86/mm: Remove in_nmi() warning from vmalloc_fault() gpio: ml-ioh: Fix buffer underwrite on probe error path net: mvneta: fix mtu change on port without link f2fs: try grabbing node page lock aggressively in sync scenario f2fs: fix to skip GC if type in SSA and SIT is inconsistent tpm_tis_spi: Pass the SPI IRQ down to the driver tpm/tpm_i2c_infineon: switch to i2c_lock_bus(..., I2C_LOCK_SEGMENT) f2fs: fix to do sanity check with reserved blkaddr of inline inode MIPS: Octeon: add missing of_node_put() MIPS: generic: fix missing of_node_put() net: dcb: For wild-card lookups, use priority -1, not 0 Input: atmel_mxt_ts - only use first T9 instance media: s5p-mfc: Fix buffer look up in s5p_mfc_handle_frame_{new, copy_time} functions partitions/aix: append null character to print data from disk partitions/aix: fix usage of uninitialized lv_info and lvname structures media: helene: fix xtal frequency setting at power on f2fs: Fix uninitialized return in f2fs_ioc_shutdown() iommu/ipmmu-vmsa: Fix allocation in atomic context mfd: ti_am335x_tscadc: Fix struct clk memory leak f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize NFSv4.1: Fix a potential layoutget/layoutrecall deadlock MIPS: WARN_ON invalid DMA cache maintenance, not BUG_ON RDMA/cma: Do not ignore net namespace for unbound cm_id xhci: Fix use-after-free in xhci_free_virt_device netfilter: x_tables: avoid stack-out-of-bounds read in xt_copy_counters_from_user mtd: ubi: wl: Fix error return code in ubi_wl_init() autofs: fix autofs_sbi() does not check super block type mm: get rid of vmacache_flush_all() entirely Linux 4.9.128 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-09-20 11:14:01 +02:00
Vegard Nossum	a70e46bcea	kthread: Fix use-after-free if kthread fork fails commit 4d6501dce079c1eb6bf0b1d8f528a5e81770109e upstream. If a kthread forks (e.g. usermodehelper since commit 1da5c46fa965) but fails in copy_process() between calling dup_task_struct() and setting p->set_child_tid, then the value of p->set_child_tid will be inherited from the parent and get prematurely freed by free_kthread_struct(). kthread() - worker_thread() - process_one_work() \| - call_usermodehelper_exec_work() \| - kernel_thread() \| - _do_fork() \| - copy_process() \| - dup_task_struct() \| - arch_dup_task_struct() \| - tsk->set_child_tid = current->set_child_tid // implied \| - ... \| - goto bad_fork_* \| - ... \| - free_task(tsk) \| - free_kthread_struct(tsk) \| - kfree(tsk->set_child_tid) - ... - schedule() - __schedule() - wq_worker_sleeping() - kthread_data(task)->flags // UAF The problem started showing up with commit 1da5c46fa965 since it reused ->set_child_tid for the kthread worker data. A better long-term solution might be to get rid of the ->set_child_tid abuse. The comment in set_kthread_struct() also looks slightly wrong. Debugged-by: Jamie Iles <jamie.iles@oracle.com> Fixes: 1da5c46fa965 ("kthread: Make struct kthread kmalloc'ed") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jamie Iles <jamie.iles@oracle.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-09-19 22:47:10 +02:00
Greg Kroah-Hartman	be4935d541	This is the 4.9.127 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlucuBEACgkQONu9yGCS aT4HlBAAnXzLh/c8dKPdWzU3Bx6L9If7syeW0Gl06AtGsUrVqrggOnumdXRgPV3y /4Br6piwrpjHP9P7m/pS03Ol3+DvD4JKhMRFr0VzGwk4e32wZ8cNd7qx5RF/4z9s tpebYJVJoRRmefi1EJyj+NAGgDCk+7A1pzW3W2NK/mirQdkHywfLJHhQDyHgmWQy jyDpz2ihNdorKHN8ogK7FYQ7JZC9tnooB4GZFObaB8nkaeMaRGD2V2jrHRi8a4tv 5Bm9HmlGJOqtsstsr1Em6W+UNRKfztfDmYuk+H9xCo8odCSHeyx30nUDxjp6Bo2f OvC/x1x6JIVNsoAgzlRZKMJR1Lv0IK3SEVPrecOOSWeardtt92JUQjoLaLmW03xR dlnXZyDeO75Qbdgo/5b/d6RiVNGSv0CPDOQSREDMX6Rvf9P8ezceF/iDz0SrH9Zm GkjQCQ2yyqrrSID5baB562AHszBa+rdqgp4RBheE+M5s7UALL09tFJquaijZ8jvO dMs6f847MeMARFIGLNKff9cvWI2zanTwOPX+E3rcj88rN79jvIsyD/hDzkqHoR7J Rhb2K7478gC1N2XTOu97hNmB9qTNIn9U1hvS/f2d7SGyjOU9WR+XgCgDUWZsZnPP H4eeblr9iaCUHI/4Q+nWUuQPKnCdGb9Ay9lyq9uqHIQBz+uVZ4I= =hV8i -----END PGP SIGNATURE----- Merge 4.9.127 into android-4.9 Changes in 4.9.127 x86/speculation/l1tf: Fix up pte->pfn conversion for PAE act_ife: fix a potential use-after-free ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state net: bcmgenet: use MAC link status for fixed phy net: sched: Fix memory exposure from short TCA_U32_SEL qlge: Fix netdev features configuration. r8169: add support for NCube 8168 network card tcp: do not restart timewait timer on rst reception vti6: remove !skb->ignore_df check from vti6_xmit() sctp: hold transport before accessing its asoc in sctp_transport_get_next vhost: correctly check the iova range when waking virtqueue hv_netvsc: ignore devices that are not PCI act_ife: move tcfa_lock down to where necessary act_ife: fix a potential deadlock net: sched: action_ife: take reference to meta module cifs: check if SMB2 PDU size has been padded and suppress the warning hfsplus: don't return 0 when fill_super() failed hfs: prevent crash on exit from failed search sunrpc: Don't use stack buffer with scatterlist fork: don't copy inconsistent signal handler state to child reiserfs: change j_timestamp type to time64_t hfsplus: fix NULL dereference in hfsplus_lookup() fat: validate ->i_start before using scripts: modpost: check memory allocation results virtio: pci-legacy: Validate queue pfn mm/fadvise.c: fix signed overflow UBSAN complaint fs/dcache.c: fix kmemcheck splat at take_dentry_name_snapshot() platform/x86: intel_punit_ipc: fix build errors s390/kdump: Fix memleak in nt_vmcoreinfo ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest() mfd: sm501: Set coherent_dma_mask when creating subdevices platform/x86: asus-nb-wmi: Add keymap entry for lid flip action on UX360 RDMA/hns: Fix usage of bitmap allocation functions return values irqchip/bcm7038-l1: Hide cpu offline callback when building for !SMP net/9p/trans_fd.c: fix race by holding the lock net/9p: fix error path of p9_virtio_probe powerpc: Fix size calculation using resource_size() perf probe powerpc: Fix trace event post-processing block: bvec_nr_vecs() returns value for wrong slab s390/dasd: fix hanging offline processing due to canceled worker s390/dasd: fix panic for failed online processing ACPI / scan: Initialize status to ACPI_STA_DEFAULT scsi: aic94xx: fix an error code in aic94xx_init() PCI: mvebu: Fix I/O space end address calculation dm kcopyd: avoid softlockup in run_complete_job staging: comedi: ni_mio_common: fix subdevice flags for PFI subdevice selftests/powerpc: Kill child processes on SIGINT RDS: IB: fix 'passing zero to ERR_PTR()' warning smb3: fix reset of bytes read and written stats SMB3: Number of requests sent should be displayed for SMB3 not just CIFS powerpc/pseries: Avoid using the size greater than RTAS_ERROR_LOG_MAX. clk: rockchip: Add pclk_rkpwm_pmu to PMU critical clocks in rk3399 btrfs: replace: Reset on-disk dev stats value after replace btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized btrfs: Don't remove block group that still has pinned down bytes arm64: rockchip: Force CONFIG_PM on Rockchip systems ARM: rockchip: Force CONFIG_PM on Rockchip systems drm/edid: Add 6 bpc quirk for SDC panel in Lenovo B50-80 tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" debugobjects: Make stack check warning more informative x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear kbuild: make missing $DEPMOD a Warning instead of an Error irda: Fix memory leak caused by repeated binds of irda socket irda: Only insert new objects into the global database via setsockopt Revert "ARM: imx_v6_v7_defconfig: Select ULPI support" enic: do not call enic_change_mtu in enic_probe Fixes: Commit `2aa6d036b7` ("mm: numa: avoid waiting on freed migrated pages") sch_htb: fix crash on init failure sch_multiq: fix double free on init failure sch_hhf: fix null pointer dereference on init failure sch_netem: avoid null pointer deref on init failure sch_tbf: fix two null pointer dereferences on init failure mei: me: allow runtime pm for platform with D0i3 s390/lib: use expoline for all bcr instructions ASoC: wm8994: Fix missing break in switch btrfs: use correct compare function of dirty_metadata_bytes arm64: Fix mismatched cache line size detection arm64: Handle mismatched cache type Linux 4.9.127 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-09-15 12:17:13 +02:00
Jann Horn	015fd7e0a6	fork: don't copy inconsistent signal handler state to child [ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ] Before this change, if a multithreaded process forks while one of its threads is changing a signal handler using sigaction(), the memcpy() in copy_sighand() can race with the struct assignment in do_sigaction(). It isn't clear whether this can cause corruption of the userspace signal handler pointer, but it definitely can cause inconsistency between different fields of struct sigaction. Take the appropriate spinlock to avoid this. I have tested that this patch prevents inconsistency between sa_sigaction and sa_flags, which is possible before this patch. Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-09-15 09:42:57 +02:00
Greg Kroah-Hartman	92e87041ed	This is the 4.9.119 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAltsFNgACgkQONu9yGCS aT6ZSBAAsAf1VBFKXLJDnCfMdft7Ecrvks3Lb+55qeoPbHgCPPR7ci9R0afaifUn 46O2RFCO5PJAPSGKnLzT6vRUozNIwXbR5zsu/KA9+4zEiej/4PFlF8Ty8yh7LTRI q0OgyRylbxeUGRjFeTJtaiMG5NIJQD6TEgMhEZoCOSPHVwWjrEyC4ka2RUuXetTw eXXbm8c+949flMAaxz8plbTUupOQaPiNAaTHzFJR73y5jiRfNTRejgOrawB97BrF N8uM73sNp87bQeCxQUd00SDrzhHEfCrYueeMhO/qfaduMSx/iUNPpK0KmzF6RxDy MsmlAFjs9AnPiM7/PH6pdAgwSqsEoIzlRF/oEMOJtONLwHitwGLna8x4NP49eiaQ 7Np6UJNSUJKTyZVar2SRuzKLAsOyoeZOraey9P4OQCKDVdvlLITLSQ2jbqgEn81K 8U92W0r5E/DI6oE/wubMRgus1xPh8yrlynoaa9+wFtSt7QHvBWEpmhe8RRXzAUNv rSJ4Ovj98IS096f2x1y4zvFfzT3k9soC+5Gxb00JUrWcSbDaffCWTCq4WhrmeasB I8YeP3lVDSRkkkqwxy1IY5otjYmqwUqhE1wXiai14I+XZ19pIfyRxooFwlCL5wwi vBm4Lu+cI8a6lUsM1mFk3sxYpekwY6xJt62W75RorCz36MesbLE= =daBd -----END PGP SIGNATURE----- Merge 4.9.119 into android-4.9 Changes in 4.9.119 scsi: qla2xxx: Fix ISP recovery on unload scsi: qla2xxx: Return error when TMF returns genirq: Make force irq threading setup more robust nohz: Fix local_timer_softirq_pending() netlink: Do not subscribe to non-existent groups netlink: Don't shift with UB on nlk->ngroups netlink: Don't shift on 64 for ngroups ext4: fix false negatives and false positives in ext4_check_descriptors() ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle ring_buffer: tracing: Inherit the tracing setting to next ring buffer i2c: imx: Fix reinit_completion() use Btrfs: fix file data corruption after cloning a range and fsync tcp: add tcp_ooo_try_coalesce() helper kmemleak: clear stale pointers from task stacks fork: unconditionally clear stack on fork IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values jfs: Fix inconsistency between memory allocation and ea_buf->max_size Linux 4.9.119 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-08-09 16:01:36 +02:00
Kees Cook	6a19e26f11	fork: unconditionally clear stack on fork commit e01e80634ecdde1dd113ac43b3adad21b47f3957 upstream. One of the classes of kernel stack content leaks[1] is exposing the contents of prior heap or stack contents when a new process stack is allocated. Normally, those stacks are not zeroed, and the old contents remain in place. In the face of stack content exposure flaws, those contents can leak to userspace. Fixing this will make the kernel no longer vulnerable to these flaws, as the stack will be wiped each time a stack is assigned to a new process. There's not a meaningful change in runtime performance; it almost looks like it provides a benefit. Performing back-to-back kernel builds before: Run times: 157.86 157.09 158.90 160.94 160.80 Mean: 159.12 Std Dev: 1.54 and after: Run times: 159.31 157.34 156.71 158.15 160.81 Mean: 158.46 Std Dev: 1.46 Instead of making this a build or runtime config, Andy Lutomirski recommended this just be enabled by default. [1] A noisy search for many kinds of stack content leaks can be seen here: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak I did some more with perf and cycle counts on running 100,000 execs of /bin/true. before: Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841 Mean: 221015379122.60 Std Dev: 4662486552.47 after: Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348 Mean: 217745009865.40 Std Dev: 5935559279.99 It continues to look like it's faster, though the deviation is rather wide, but I'm not sure what I could do that would be less noisy. I'm open to ideas! Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ Srivatsa: Backported to 4.9.y ] Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Srinidhi Rao <srinidhir@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-08-09 12:18:00 +02:00
Konstantin Khlebnikov	885b49b4f3	kmemleak: clear stale pointers from task stacks commit ca182551857cc2c1e6a2b7f1e72090a137a15008 upstream. Kmemleak considers any pointers on task stacks as references. This patch clears newly allocated and reused vmap stacks. Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ Srivatsa: Backported to 4.9.y ] Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-08-09 12:18:00 +02:00
Sultan Alsawaf	47bbcd6bf8	ANDROID: Fix massive cpufreq_times memory leaks Every time _cpu_up() is called for a CPU, idle_thread_get() is called which then re-initializes a CPU's idle thread that was already previously created and cached in a global variable in smpboot.c. idle_thread_get() calls init_idle() which then calls __sched_fork(). __sched_fork() is where cpufreq_task_times_init() is, and cpufreq_task_times_init() allocates memory for the task struct's time_in_state array. Since idle_thread_get() reuses a task struct instance that was already previously created, this means that every time it calls init_idle(), cpufreq_task_times_init() allocates this array again and overwrites the existing allocation that the idle thread already had. This causes memory to be leaked every time a CPU is onlined. In order to fix this, move allocation of time_in_state into _do_fork to avoid allocating it at all for idle threads. The cpufreq times interface is intended to be used for tracking userspace tasks, so we can safely remove it from the kernel's idle threads without killing any functionality. But that's not all! Task structs can be freed outside of release_task(), which creates another memory leak because a task struct can be freed without having its cpufreq times allocation freed. To fix this, free the cpufreq times allocation at the same time that task struct allocations are freed, in free_task(). Since free_task() can also be called in error paths of copy_process() after dup_task_struct(), set time_in_state to NULL immediately after calling dup_task_struct() to avoid possible double free. Bug description and fix adapted from patch submitted by Sultan Alsawaf <sultanxda@gmail.com> at https://android-review.googlesource.com/c/kernel/msm/+/700134 Bug: 110044919 Test: Hikey960 builds, boots & reports /proc/<pid>/time_in_state correctly Change-Id: I12fe7611fc88eb7f6c39f8f7629ad27b6ec4722c Signed-off-by: Connor O'Brien <connoro@google.com>	2018-07-18 13:22:08 +00:00
Hugh Dickins	0994a2cf8f	kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE Kaiser only needs to map one page of the stack; and kernel/fork.c did not build on powerpc (no __PAGE_KERNEL). It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack() and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's kaiser_add_mapping() and kaiser_remove_mapping(). And use linux/kaiser.h in init/main.c to avoid the #ifdefs there. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-05 15:46:32 +01:00
Dave Hansen	8f0baadf2b	kaiser: merged update Merged fixes and cleanups, rebased to 4.9.51 tree (no 5-level paging). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-05 15:46:32 +01:00
Richard Fellner	13be4483bb	KAISER: Kernel Address Isolation This patch introduces our implementation of KAISER (Kernel Address Isolation to have Side-channels Efficiently Removed), a kernel isolation technique to close hardware side channels on kernel address information. More information about the patch can be found on: https://github.com/IAIK/KAISER From: Richard Fellner <richard.fellner@student.tugraz.at> From: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode Date: Thu, 4 May 2017 14:26:50 +0200 Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2 Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5 To: <linux-kernel@vger.kernel.org> To: <kernel-hardening@lists.openwall.com> Cc: <clementine.maurice@iaik.tugraz.at> Cc: <moritz.lipp@iaik.tugraz.at> Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Cc: Richard Fellner <richard.fellner@student.tugraz.at> Cc: Ingo Molnar <mingo@kernel.org> Cc: <kirill.shutemov@linux.intel.com> Cc: <anders.fogh@gdata-adan.de> After several recent works [1,2,3] KASLR on x86_64 was basically considered dead by many researchers. We have been working on an efficient but effective fix for this problem and found that not mapping the kernel space when running in user mode is the solution to this problem [4] (the corresponding paper [5] will be presented at ESSoS17). With this RFC patch we allow anybody to configure their kernel with the flag CONFIG_KAISER to add our defense mechanism. If there are any questions we would love to answer them. We also appreciate any comments! Cheers, Daniel (+ the KAISER team from Graz University of Technology) [1] `4977a191`.pdf [2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf [3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf [4] https://github.com/IAIK/KAISER [5] https://gruss.cc/files/kaiser.pdf [patch based also on https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch] Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at> Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at> Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-05 15:46:32 +01:00
Eric Biggers	17c564f629	mm, uprobes: fix multiple free of ->uprobes_state.xol_area commit 355627f518978b5167256d27492fe0b343aaf2f2 upstream. Commit `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") made it possible to kill a forking task while it is waiting to acquire its ->mmap_sem for write, in dup_mmap(). However, it was overlooked that this introduced an new error path before the new mm_struct's ->uprobes_state.xol_area has been set to NULL after being copied from the old mm_struct by the memcpy in dup_mm(). For a task that has previously hit a uprobe tracepoint, this resulted in the 'struct xol_area' being freed multiple times if the task was killed at just the right time while forking. Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather than in uprobe_dup_mmap(). With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C program given by commit 2b7e8665b4ff ("fork: fix incorrect fput of ->exe_file causing use-after-free"), provided that a uprobe tracepoint has been set on the fork_thread() function. For example: $ gcc reproducer.c -o reproducer -lpthread $ nm reproducer \| grep fork_thread 0000000000400719 t fork_thread $ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events $ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable $ ./reproducer Here is the use-after-free reported by KASAN: BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200 Read of size 8 at addr ffff8800320a8b88 by task reproducer/198 CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 Call Trace: dump_stack+0xdb/0x185 print_address_description+0x7e/0x290 kasan_report+0x23b/0x350 __asan_report_load8_noabort+0x19/0x20 uprobe_clear_state+0x1c4/0x200 mmput+0xd6/0x360 do_exit+0x740/0x1670 do_group_exit+0x13f/0x380 get_signal+0x597/0x17d0 do_signal+0x99/0x1df0 exit_to_usermode_loop+0x166/0x1e0 syscall_return_slowpath+0x258/0x2c0 entry_SYSCALL_64_fastpath+0xbc/0xbe ... Allocated by task 199: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xfc/0x180 kmem_cache_alloc_trace+0xf3/0x330 __create_xol_area+0x10f/0x780 uprobe_notify_resume+0x1674/0x2210 exit_to_usermode_loop+0x150/0x1e0 prepare_exit_to_usermode+0x14b/0x180 retint_user+0x8/0x20 Freed by task 199: save_stack_trace+0x1b/0x20 kasan_slab_free+0xa8/0x1a0 kfree+0xba/0x210 uprobe_clear_state+0x151/0x200 mmput+0xd6/0x360 copy_process.part.8+0x605f/0x65d0 _do_fork+0x1a5/0xbd0 SyS_clone+0x19/0x20 do_syscall_64+0x22f/0x660 return_from_SYSCALL_64+0x0/0x7a Note: without KASAN, you may instead see a "Bad page state" message, or simply a general protection fault. Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com Fixes: `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-07 08:35:39 +02:00
Eric Biggers	b65b6ac52e	fork: fix incorrect fput of ->exe_file causing use-after-free commit 2b7e8665b4ff51c034c55df3cff76518d1a9ee3a upstream. Commit `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") made it possible to kill a forking task while it is waiting to acquire its ->mmap_sem for write, in dup_mmap(). However, it was overlooked that this introduced an new error path before a reference is taken on the mm_struct's ->exe_file. Since the ->exe_file of the new mm_struct was already set to the old ->exe_file by the memcpy() in dup_mm(), it was possible for the mmput() in the error path of dup_mm() to drop a reference to ->exe_file which was never taken. This caused the struct file to later be freed prematurely. Fix it by updating mm_init() to NULL out the ->exe_file, in the same place it clears other things like the list of mmaps. This bug was found by syzkaller. It can be reproduced using the following C program: #define _GNU_SOURCE #include <pthread.h> #include <stdlib.h> #include <sys/mman.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void mmap_thread(void _arg) { for (;;) { mmap(NULL, 0x1000000, PROT_READ, MAP_POPULATE\|MAP_ANONYMOUS\|MAP_PRIVATE, -1, 0); } } static void fork_thread(void _arg) { usleep(rand() % 10000); fork(); } int main(void) { fork(); fork(); fork(); for (;;) { if (fork() == 0) { pthread_t t; pthread_create(&t, NULL, mmap_thread, NULL); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } No special kernel config options are needed. It usually causes a NULL pointer dereference in __remove_shared_vm_struct() during exit, or in dup_mmap() (which is usually inlined into copy_process()) during fork. Both are due to a vm_area_struct's ->vm_file being used after it's already been freed. Google Bug Id: 64772007 Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com Fixes: `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:47 +02:00
Daniel Micay	f157261b55	stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream. The stack canary is an 'unsigned long' and should be fully initialized to random data rather than only 32 bits of random data. Signed-off-by: Daniel Micay <danielmicay@gmail.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Arjan van Ven <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-25 15:44:46 +02:00
Kirill Tkhai	2ea2f891fa	pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() commit 3fd37226216620c1a468afa999739d5016fbc349 upstream. Imagine we have a pid namespace and a task from its parent's pid_ns, which made setns() to the pid namespace. The task is doing fork(), while the pid namespace's child reaper is dying. We have the race between them: Task from parent pid_ns Child reaper copy_process() .. alloc_pid() .. .. zap_pid_ns_processes() .. disable_pid_allocation() .. read_lock(&tasklist_lock) .. iterate over pids in pid_ns .. kill tasks linked to pids .. read_unlock(&tasklist_lock) write_lock_irq(&tasklist_lock); .. attach_pid(p, PIDTYPE_PID); .. .. .. So, just created task p won't receive SIGKILL signal, and the pid namespace will be in contradictory state. Only manual kill will help there, but does the userspace care about this? I suppose, the most users just inject a task into a pid namespace and wait a SIGCHLD from it. The patch fixes the problem. It simply checks for (pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process(). We do it under the tasklist_lock, and can't skip PIDNS_HASH_ADDING as noted by Oleg: "zap_pid_ns_processes() does disable_pid_allocation() and then takes tasklist_lock to kill the whole namespace. Given that copy_process() checks PIDNS_HASH_ADDING under write_lock(tasklist) they can't race; if copy_process() takes this lock first, the new child will be killed, otherwise copy_process() can't miss the change in ->nr_hashed." If allocation is disabled, we just return -ENOMEM like it's made for such cases in alloc_pid(). v2: Do not move disable_pid_allocation(), do not introduce a new variable in copy_process() and simplify the patch as suggested by Oleg Nesterov. Account the problem with double irq enabling found by Eric W. Biederman. Fixes: `c876ad7682` ("pidns: Stop pid allocation when init dies") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Ingo Molnar <mingo@kernel.org> CC: Peter Zijlstra <peterz@infradead.org> CC: Oleg Nesterov <oleg@redhat.com> CC: Mike Rapoport <rppt@linux.vnet.ibm.com> CC: Michal Hocko <mhocko@suse.com> CC: Andy Lutomirski <luto@kernel.org> CC: "Eric W. Biederman" <ebiederm@xmission.com> CC: Andrei Vagin <avagin@openvz.org> CC: Cyrill Gorcunov <gorcunov@openvz.org> CC: Serge Hallyn <serge@hallyn.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-25 15:44:38 +02:00
Eric W. Biederman	694a95fa6d	mm: Add a user_ns owner to mm_struct and fix ptrace permission checks commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream. During exec dumpable is cleared if the file that is being executed is not readable by the user executing the file. A bug in ptrace_may_access allows reading the file if the executable happens to enter into a subordinate user namespace (aka clone(CLONE_NEWUSER), unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER). This problem is fixed with only necessary userspace breakage by adding a user namespace owner to mm_struct, captured at the time of exec, so it is clear in which user namespace CAP_SYS_PTRACE must be present in to be able to safely give read permission to the executable. The function ptrace_may_access is modified to verify that the ptracer has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns. This ensures that if the task changes it's cred into a subordinate user namespace it does not become ptraceable. The function ptrace_attach is modified to only set PT_PTRACE_CAP when CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of PT_PTRACE_CAP is to be a flag to note that whatever permission changes the task might go through the tracer has sufficient permissions for it not to be an issue. task->cred->user_ns is always the same as or descendent of mm->user_ns. Which guarantees that having CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks credentials. To prevent regressions mm->dumpable and mm->user_ns are not considered when a task has no mm. As simply failing ptrace_may_attach causes regressions in privileged applications attempting to read things such as /proc/<pid>/stat Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Fixes: `8409cca705` ("userns: allow ptrace from non-init user namespaces") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-01-06 10:40:13 +01:00
Andy Lutomirski	405c075971	fork: Add task stack refcounting sanity check and prevent premature task stack freeing If something goes wrong with task stack refcounting and a stack refcount hits zero too early, warn and leak it rather than potentially freeing it early (and silently). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f29119c783a9680a4b4656e751b6123917ace94b.1477926663.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:39:17 +01:00
Linus Torvalds	9ffc66941d	This adds a new gcc plugin named "latent_entropy". It is designed to extract as much possible uncertainty from a running system at boot time as possible, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). At the very least, this plugin is a much more comprehensive example for how to manipulate kernel code using the gcc plugin internals. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Kees Cook <kees@outflux.net> iQIcBAABCgAGBQJX/BAFAAoJEIly9N/cbcAmzW8QALFbCs7EFFkML+M/M/9d8zEk 1QbUs/z8covJTTT1PjSdw7JUrAMulI3S00owpcQVd/PcWjRPU80QwfsXBgIB0tvC Kub2qxn6Oaf+kTB646zwjFgjdCecw/USJP+90nfcu2+LCnE8ReclKd1aUee+Bnhm iDEUyH2ONIoWq6ta2Z9sA7+E4y2ZgOlmW0iga3Mnf+OcPtLE70fWPoe5E4g9DpYk B+kiPDrD9ql5zsHaEnKG1ldjiAZ1L6Grk8rGgLEXmbOWtTOFmnUhR+raK5NA/RCw MXNuyPay5aYPpqDHFm+OuaWQAiPWfPNWM3Ett4k0d9ZWLixTcD1z68AciExwk7aW SEA8b1Jwbg05ZNYM7NJB6t6suKC4dGPxWzKFOhmBicsh2Ni5f+Az0BQL6q8/V8/4 8UEqDLuFlPJBB50A3z5ngCVeYJKZe8Bg/Swb4zXl6mIzZ9darLzXDEV6ystfPXxJ e1AdBb41WC+O2SAI4l64yyeswkGo3Iw2oMbXG5jmFl6wY/xGp7dWxw7gfnhC6oOh afOT54p2OUDfSAbJaO0IHliWoIdmE5ZYdVYVU9Ek+uWyaIwcXhNmqRg+Uqmo32jf cP5J9x2kF3RdOcbSHXmFp++fU+wkhBtEcjkNpvkjpi4xyA47IWS7lrVBBebrCq9R pa/A7CNQwibIV6YD8+/p =1dUK -----END PGP SIGNATURE----- Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc plugins update from Kees Cook: "This adds a new gcc plugin named "latent_entropy". It is designed to extract as much possible uncertainty from a running system at boot time as possible, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). At the very least, this plugin is a much more comprehensive example for how to manipulate kernel code using the gcc plugin internals" * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: latent_entropy: Mark functions with __latent_entropy gcc-plugins: Add latent_entropy plugin	2016-10-15 10:03:15 -07:00
Emese Revfy	0766f788eb	latent_entropy: Mark functions with __latent_entropy The __latent_entropy gcc attribute can be used only on functions and variables. If it is on a function then the plugin will instrument it for gathering control-flow entropy. If the attribute is on a variable then the plugin will initialize it with random contents. The variable must be an integer, an integer array type or a structure with integer fields. These specific functions have been selected because they are init functions (to help gather boot-time entropy), are called at unpredictable times, or they have variable loops, each of which provide some level of latent entropy. Signed-off-by: Emese Revfy <re.emese@gmail.com> [kees: expanded commit message] Signed-off-by: Kees Cook <keescook@chromium.org>	2016-10-10 14:51:45 -07:00
Emese Revfy	38addce8b6	gcc-plugins: Add latent_entropy plugin This adds a new gcc plugin named "latent_entropy". It is designed to extract as much possible uncertainty from a running system at boot time as possible, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). At the very least, this plugin is a much more comprehensive example for how to manipulate kernel code using the gcc plugin internals. The need for very-early boot entropy tends to be very architecture or system design specific, so this plugin is more suited for those sorts of special cases. The existing kernel RNG already attempts to extract entropy from reliable runtime variation, but this plugin takes the idea to a logical extreme by permuting a global variable based on any variation in code execution (e.g. a different value (and permutation function) is used to permute the global based on loop count, case statement, if/then/else branching, etc). To do this, the plugin starts by inserting a local variable in every marked function. The plugin then adds logic so that the value of this variable is modified by randomly chosen operations (add, xor and rol) and random values (gcc generates separate static values for each location at compile time and also injects the stack pointer at runtime). The resulting value depends on the control flow path (e.g., loops and branches taken). Before the function returns, the plugin mixes this local variable into the latent_entropy global variable. The value of this global variable is added to the kernel entropy pool in do_one_initcall() and _do_fork(), though it does not credit any bytes of entropy to the pool; the contents of the global are just used to mix the pool. Additionally, the plugin can pre-initialize arrays with build-time random contents, so that two different kernel builds running on identical hardware will not have the same starting values. Signed-off-by: Emese Revfy <re.emese@gmail.com> [kees: expanded commit message and code comments] Signed-off-by: Kees Cook <keescook@chromium.org>	2016-10-10 14:51:44 -07:00
Aaron Lu	6fcb52a56f	thp: reduce usage of huge zero page's atomic counter The global zero page is used to satisfy an anonymous read fault. If THP(Transparent HugePage) is enabled then the global huge zero page is used. The global huge zero page uses an atomic counter for reference counting and is allocated/freed dynamically according to its counter value. CPU time spent on that counter will greatly increase if there are a lot of processes doing anonymous read faults. This patch proposes a way to reduce the access to the global counter so that the CPU load can be reduced accordingly. To do this, a new flag of the mm_struct is introduced: MMF_USED_HUGE_ZERO_PAGE. With this flag, the process only need to touch the global counter in two cases: 1 The first time it uses the global huge zero page; 2 The time when mm_user of its mm_struct reaches zero. Note that right now, the huge zero page is eligible to be freed as soon as its last use goes away. With this patch, the page will not be eligible to be freed until the exit of the last process from which it was ever used. And with the use of mm_user, the kthread is not eligible to use huge zero page either. Since no kthread is using huge zero page today, there is no difference after applying this patch. But if that is not desired, I can change it to when mm_count reaches zero. Case used for test on Haswell EP: usemem -n 72 --readonly -j 0x200000 100G Which spawns 72 processes and each will mmap 100G anonymous space and then do read only access to that space sequentially with a step of 2MB. CPU cycles from perf report for base commit: 54.03% usemem [kernel.kallsyms] [k] get_huge_zero_page CPU cycles from perf report for this commit: 0.11% usemem [kernel.kallsyms] [k] mm_get_huge_zero_page Performance(throughput) of the workload for base commit: 1784430792 Performance(throughput) of the workload for this commit: 4726928591 164% increase. Runtime of the workload for base commit: 707592 us Runtime of the workload for this commit: 303970 us 50% drop. Link: http://lkml.kernel.org/r/fe51a88f-446a-4622-1363-ad1282d71385@intel.com Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-07 18:46:28 -07:00
Michal Hocko	862e3073b3	mm, oom: get rid of signal_struct::oom_victims After "oom: keep mm of the killed task available" we can safely detect an oom victim by checking task->signal->oom_mm so we do not need the signal_struct counter anymore so let's get rid of it. This alone wouldn't be sufficient for nommu archs because exit_oom_victim doesn't hide the process from the oom killer anymore. We can, however, mark the mm with a MMF flag in __mmput. We can reuse MMF_OOM_REAPED and rename it to a more generic MMF_OOM_SKIP. Link: http://lkml.kernel.org/r/1472119394-11342-6-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-07 18:46:27 -07:00
Michal Hocko	7283094ec3	kernel, oom: fix potential pgd_lock deadlock from __mmdrop Lockdep complains that __mmdrop is not safe from the softirq context: ================================= [ INFO: inconsistent lock state ] 4.6.0-oomfortification2-00011-geeb3eadeab96-dirty #949 Tainted: G W --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (pgd_lock){+.?...}, at: pgd_free+0x19/0x6b {SOFTIRQ-ON-W} state was registered at: __lock_acquire+0xa06/0x196e lock_acquire+0x139/0x1e1 _raw_spin_lock+0x32/0x41 __change_page_attr_set_clr+0x2a5/0xacd change_page_attr_set_clr+0x16f/0x32c set_memory_nx+0x37/0x3a free_init_pages+0x9e/0xc7 alternative_instructions+0xa2/0xb3 check_bugs+0xe/0x2d start_kernel+0x3ce/0x3ea x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x17a/0x18d irq event stamp: 105916 hardirqs last enabled at (105916): free_hot_cold_page+0x37e/0x390 hardirqs last disabled at (105915): free_hot_cold_page+0x2c1/0x390 softirqs last enabled at (105878): _local_bh_enable+0x42/0x44 softirqs last disabled at (105879): irq_exit+0x6f/0xd1 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(pgd_lock); <Interrupt> lock(pgd_lock); * DEADLOCK * 1 lock held by swapper/1/0: #0: (rcu_callback){......}, at: rcu_process_callbacks+0x390/0x800 stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.6.0-oomfortification2-00011-geeb3eadeab96-dirty #949 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 Call Trace: <IRQ> print_usage_bug.part.25+0x259/0x268 mark_lock+0x381/0x567 __lock_acquire+0x993/0x196e lock_acquire+0x139/0x1e1 _raw_spin_lock+0x32/0x41 pgd_free+0x19/0x6b __mmdrop+0x25/0xb9 __put_task_struct+0x103/0x11e delayed_put_task_struct+0x157/0x15e rcu_process_callbacks+0x660/0x800 __do_softirq+0x1ec/0x4d5 irq_exit+0x6f/0xd1 smp_apic_timer_interrupt+0x42/0x4d apic_timer_interrupt+0x8e/0xa0 <EOI> arch_cpu_idle+0xf/0x11 default_idle_call+0x32/0x34 cpu_startup_entry+0x20c/0x399 start_secondary+0xfe/0x101 More over commit `a79e53d856` ("x86/mm: Fix pgd_lock deadlock") was explicit about pgd_lock not to be called from the irq context. This means that __mmdrop called from free_signal_struct has to be postponed to a user context. We already have a similar mechanism for mmput_async so we can use it here as well. This is safe because mm_count is pinned by mm_users. This fixes bug introduced by "oom: keep mm of the killed task available" Link: http://lkml.kernel.org/r/1472119394-11342-5-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-07 18:46:27 -07:00
Michal Hocko	26db62f179	oom: keep mm of the killed task available oom_reap_task has to call exit_oom_victim in order to make sure that the oom vicim will not block the oom killer for ever. This is, however, opening new problems (e.g oom_killer_disable exclusion - see commit `7407054209` ("oom, suspend: fix oom_reaper vs. oom_killer_disable race")). exit_oom_victim should be only called from the victim's context ideally. One way to achieve this would be to rely on per mm_struct flags. We already have MMF_OOM_REAPED to hide a task from the oom killer since "mm, oom: hide mm which is shared with kthread or global init". The problem is that the exit path: do_exit exit_mm tsk->mm = NULL; mmput __mmput exit_oom_victim doesn't guarantee that exit_oom_victim will get called in a bounded amount of time. At least exit_aio depends on IO which might get blocked due to lack of memory and who knows what else is lurking there. This patch takes a different approach. We remember tsk->mm into the signal_struct and bind it to the signal struct life time for all oom victims. __oom_reap_task_mm as well as oom_scan_process_thread do not have to rely on find_lock_task_mm anymore and they will have a reliable reference to the mm struct. As a result all the oom specific communication inside the OOM killer can be done via tsk->signal->oom_mm. Increasing the signal_struct for something as unlikely as the oom killer is far from ideal but this approach will make the code much more reasonable and long term we even might want to move task->mm into the signal_struct anyway. In the next step we might want to make the oom killer exclusion and access to memory reserves completely independent which would be also nice. Link: http://lkml.kernel.org/r/1472119394-11342-4-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-07 18:46:27 -07:00
Linus Torvalds	14986a34e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This set of changes is a number of smaller things that have been overlooked in other development cycles focused on more fundamental change. The devpts changes are small things that were a distraction until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an trivial regression fix to autofs for the unprivileged mount changes that went in last cycle. A pair of ioctls has been added by Andrey Vagin making it is possible to discover the relationships between namespaces when referring to them through file descriptors. The big user visible change is starting to add simple resource limits to catch programs that misbehave. With namespaces in general and user namespaces in particular allowing users to use more kinds of resources, it has become important to have something to limit errant programs. Because the purpose of these limits is to catch errant programs the code needs to be inexpensive to use as it always on, and the default limits need to be high enough that well behaved programs on well behaved systems don't encounter them. To this end, after some review I have implemented per user per user namespace limits, and use them to limit the number of namespaces. The limits being per user mean that one user can not exhause the limits of another user. The limits being per user namespace allow contexts where the limit is 0 and security conscious folks can remove from their threat anlysis the code used to manage namespaces (as they have historically done as it root only). At the same time the limits being per user namespace allow other parts of the system to use namespaces. Namespaces are increasingly being used in application sand boxing scenarios so an all or nothing disable for the entire system for the security conscious folks makes increasing use of these sandboxes impossible. There is also added a limit on the maximum number of mounts present in a single mount namespace. It is nontrivial to guess what a reasonable system wide limit on the number of mount structure in the kernel would be, especially as it various based on how a system is using containers. A limit on the number of mounts in a mount namespace however is much easier to understand and set. In most cases in practice only about 1000 mounts are used. Given that some autofs scenarious have the potential to be 30,000 to 50,000 mounts I have set the default limit for the number of mounts at 100,000 which is well above every known set of users but low enough that the mount hash tables don't degrade unreaonsably. These limits are a start. I expect this estabilishes a pattern that other limits for resources that namespaces use will follow. There has been interest in making inotify event limits per user per user namespace as well as interest expressed in making details about what is going on in the kernel more visible" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits) autofs: Fix automounts by using current_real_cred()->uid mnt: Add a per mount namespace limit on the number of mounts netns: move {inc,dec}_net_namespaces into #ifdef nsfs: Simplify __ns_get_path tools/testing: add a test to check nsfs ioctl-s nsfs: add ioctl to get a parent namespace nsfs: add ioctl to get an owning user namespace for ns file descriptor kernel: add a helper to get an owning user namespace for a namespace devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts devpts: Remove sync_filesystems devpts: Make devpts_kill_sb safe if fsi is NULL devpts: Simplify devpts_mount by using mount_nodev devpts: Move the creation of /dev/pts/ptmx into fill_super devpts: Move parse_mount_options into fill_super userns: When the per user per user namespace limit is reached return ENOSPC userns; Document per user per user namespace limits. mntns: Add a limit on the number of mount namespaces. netns: Add a limit on the number of net namespaces cgroupns: Add a limit on the number of cgroup namespaces ipcns: Add a limit on the number of ipc namespaces ...	2016-10-06 09:52:23 -07:00
Andy Lutomirski	ac496bf48d	fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y vmalloc() is a bit slow, and pounding vmalloc()/vfree() will eventually force a global TLB flush. To reduce pressure on them, if CONFIG_VMAP_STACK=y, cache two thread stacks per CPU. This will let us quickly allocate a hopefully cache-hot, TLB-hot stack under heavy forking workloads (shell script style). On my silly pthread_create() benchmark, it saves about 2 µs per pthread_create()+join() with CONFIG_VMAP_STACK=y. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/94811d8e3994b2e962f88866290017d498eb069c.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-16 09:18:54 +02:00
Andy Lutomirski	68f24b08ee	sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK We currently keep every task's stack around until the task_struct itself is freed. This means that we keep the stack allocation alive for longer than necessary and that, under load, we free stacks in big batches whenever RCU drops the last task reference. Neither of these is good for reuse of cache-hot memory, and freeing in batches prevents us from usefully caching small numbers of vmalloced stacks. On architectures that have thread_info on the stack, we can't easily change this, but on architectures that set THREAD_INFO_IN_TASK, we can free it as soon as the task is dead. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/08ca06cde00ebed0046c5d26cbbf3fbb7ef5b812.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-16 09:18:54 +02:00
Ingo Molnar	d4b80afbba	Merge branch 'linus' into x86/asm, to pick up recent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:24:53 +02:00
Linus Torvalds	b9677faf45	Merge branch 'akpm' (patches from Andrew) Merge fixes from Andrew Morton: "14 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: rapidio/tsi721: fix incorrect detection of address translation condition rapidio/documentation/mport_cdev: add missing parameter description kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd MAINTAINERS: Vladimir has moved mm, mempolicy: task->mempolicy must be NULL before dropping final reference printk/nmi: avoid direct printk()-s from __printk_nmi_flush() treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator lib/test_hash.c: fix warning in preprocessor symbol evaluation lib/test_hash.c: fix warning in two-dimensional array init kconfig: tinyconfig: provide whole choice blocks to avoid warnings kexec: fix double-free when failing to relocate the purgatory mm, oom: prevent premature OOM killer invocation for high order request	2016-09-01 18:23:22 -07:00
Michal Hocko	735f2770a7	kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Commit `fec1d01152` ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the shared databases, so that the clients are notified when nscd is restarted. Now, when nscd uses a non-persistent database, clients that have it mapped keep thinking the database is being updated by nscd, when in fact nscd has created a new (anonymous) one (for non-persistent databases it uses an unlinked file as backend). The original proposal for the CLONE_CHILD_CLEARTID change claimed (https://lkml.org/lkml/2006/10/25/233): : The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls : on behalf of pthread_create() library calls. This feature is used to : request that the kernel clear the thread-id in user space (at an address : provided in the syscall) when the thread disassociates itself from the : address space, which is done in mm_release(). : : Unfortunately, when a multi-threaded process incurs a core dump (such as : from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of : the other threads, which then proceed to clear their user-space tids : before synchronizing in exit_mm() with the start of core dumping. This : misrepresents the state of process's address space at the time of the : SIGSEGV and makes it more difficult for someone to debug NPTL and glibc : problems (misleading him/her to conclude that the threads had gone away : before the fault). : : The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a : core dump has been initiated. The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269) seems to have a larger scope than the original patch asked for. It seems that limitting the scope of the check to core dumping should work for SIGSEGV issue describe above. [Changelog partly based on Andreas' description] Fixes: `fec1d01152` ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Tested-by: William Preston <wpreston@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Andreas Schwab <schwab@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-09-01 17:52:02 -07:00
Linus Torvalds	511a8cdb65	Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit Pull audit fixes from Paul Moore: "Two small patches to fix some bugs with the audit-by-executable functionality we introduced back in v4.3 (both patches are marked for the stable folks)" * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit: audit: fix exe_file access in audit_exe_compare mm: introduce get_task_exe_file	2016-09-01 15:55:56 -07:00
Mateusz Guzik	cd81a9170e	mm: introduce get_task_exe_file For more convenient access if one has a pointer to the task. As a minor nit take advantage of the fact that only task lock + rcu are needed to safely grab ->exe_file. This saves mm refcount dance. Use the helper in proc_exe_link. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Richard Guy Briggs <rgb@redhat.com> Cc: <stable@vger.kernel.org> # 4.3.x Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-08-31 16:11:20 -04:00
Andy Lutomirski	ba14a194a4	fork: Add generic vmalloced stack support If CONFIG_VMAP_STACK=y is selected, kernel stacks are allocated with __vmalloc_node_range(). Grsecurity has had a similar feature (called GRKERNSEC_KSTACKOVERFLOW=y) for a long time. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/14c07d4fd173a5b117f51e8b939f9f4323e39899.1470907718.git.luto@kernel.org [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:11:41 +02:00
Balbir Singh	568ac88821	cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork cgroup_threadgroup_rwsem is acquired in read mode during process exit and fork. It is also grabbed in write mode during __cgroups_proc_write(). I've recently run into a scenario with lots of memory pressure and OOM and I am beginning to see systemd __switch_to+0x1f8/0x350 __schedule+0x30c/0x990 schedule+0x48/0xc0 percpu_down_write+0x114/0x170 __cgroup_procs_write.isra.12+0xb8/0x3c0 cgroup_file_write+0x74/0x1a0 kernfs_fop_write+0x188/0x200 __vfs_write+0x6c/0xe0 vfs_write+0xc0/0x230 SyS_write+0x6c/0x110 system_call+0x38/0xb4 This thread is waiting on the reader of cgroup_threadgroup_rwsem to exit. The reader itself is under memory pressure and has gone into reclaim after fork. There are times the reader also ends up waiting on oom_lock as well. __switch_to+0x1f8/0x350 __schedule+0x30c/0x990 schedule+0x48/0xc0 jbd2_log_wait_commit+0xd4/0x180 ext4_evict_inode+0x88/0x5c0 evict+0xf8/0x2a0 dispose_list+0x50/0x80 prune_icache_sb+0x6c/0x90 super_cache_scan+0x190/0x210 shrink_slab.part.15+0x22c/0x4c0 shrink_zone+0x288/0x3c0 do_try_to_free_pages+0x1dc/0x590 try_to_free_pages+0xdc/0x260 __alloc_pages_nodemask+0x72c/0xc90 alloc_pages_current+0xb4/0x1a0 page_table_alloc+0xc0/0x170 __pte_alloc+0x58/0x1f0 copy_page_range+0x4ec/0x950 copy_process.isra.5+0x15a0/0x1870 _do_fork+0xa8/0x4b0 ppc_clone+0x8/0xc In the meanwhile, all processes exiting/forking are blocked almost stalling the system. This patch moves the threadgroup_change_begin from before cgroup_fork() to just before cgroup_canfork(). There is no nee to worry about threadgroup changes till the task is actually added to the threadgroup. This avoids having to call reclaim with cgroup_threadgroup_rwsem held. tj: Subject and description edits. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Zefan Li <lizefan@huawei.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Tejun Heo <tj@kernel.org>	2016-08-17 09:54:52 -04:00

1 2 3 4 5 ...

762 commits