wonderful-mobile/exynos-linux-stable

Author	SHA1	Message	Date
xxmustafacooTR	71e430b7c9	gaming_control, mali: more controls, optimizations	2024-09-27 17:20:00 +03:00
Diep Quynh	609163c324	drivers: Introduce brand new kernel gaming mode How to trigger gaming mode? Just open a game that is supported in games list How to exit? Kill the game from recents, or simply back to homescreen What does this gaming mode do? - It limits big cluster maximum frequency to 2,0GHz, and little cluster to values matching GPU frequencies as below: + 338MHz: 455MHz + 385MHz and above: 1053MHz - As for the cluster freq limits, it overcomes heating issue while playing heavy games, as well as saves battery juice Big thanks to [kerneltoast] for the following commits on his wahoo kernel `5ac1e81d3d` `e13e2c4554` Gaming control's idea was based on these Signed-off-by: Diep Quynh <remilia.1505@gmail.com>	2024-09-27 17:19:55 +03:00
Alexei Starovoitov	28ebd88f5c	BACKPORT: bpf: introduce BPF_PROG_QUERY command introduce BPF_PROG_QUERY command to retrieve a set of either attached programs to given cgroup or a set of effective programs that will execute for events within a cgroup Change-Id: I05e0ed5f6eddc30f4a18216d4541448816fd1ae5 Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> for cgroup bits Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-09-25 16:54:46 +03:00
Alexei Starovoitov	1437a0899f	UPSTREAM: bpf: multi program support for cgroup+bpf introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple bpf programs to a cgroup. The difference between three possible flags for BPF_PROG_ATTACH command: - NONE(default): No further bpf programs allowed in the subtree. - BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program, the program in this cgroup yields to sub-cgroup program. - BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program, that cgroup program gets run in addition to the program in this cgroup. NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't change their behavior. It only clarifies the semantics in relation to new flag. Only one program is allowed to be attached to a cgroup with NONE or BPF_F_ALLOW_OVERRIDE flag. Multiple programs are allowed to be attached to a cgroup with BPF_F_ALLOW_MULTI flag. They are executed in FIFO order (those that were attached first, run first) The programs of sub-cgroup are executed first, then programs of this cgroup and then programs of parent cgroup. All eligible programs are executed regardless of return code from earlier programs. To allow efficient execution of multiple programs attached to a cgroup and to avoid penalizing cgroups without any programs attached introduce 'struct bpf_prog_array' which is RCU protected array of pointers to bpf programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> for cgroup bits Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0) Signed-off-by: Connor O'Brien <connoro@google.com> Bug: 121213201 Bug: 138317270 Test: build & boot cuttlefish Change-Id: If17b11a773f73d45ea565a947fc1bf7e158db98d	2024-09-25 16:54:37 +03:00
Mustafa Gökmen	4fce632291	Revert "BACKPORT: bpf: multi program support for cgroup+bpf" This reverts commit `148f111e98`.	2024-09-25 16:54:36 +03:00
FAROVITUS	eb6fae6224	Merge 4.9.217 branch 'android-4.9-q' into tw10-android-4.9-q	2020-03-23 16:23:34 +02:00
Greg Kroah-Hartman	d2adefff96	This is the 4.9.217 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl50ed8ACgkQONu9yGCS aT7A8A//WF8aq34T1LLR70ESa6fJVj2dEzCQA94qvUCPHDJfBj1x00R8OGvZfKEc E7YPI8MPrvMmXnyxImge2UcqCJhmnis+gk1fLn+N3Yi5nfZWrxC1dnShCnCTeye1 tA+BuhyzOXHDmLqrktBcT3TCTrIUWK8lehR8m7jlkAh7wG6Z2QmgKlaiq5fRAVE9 Z0je4fw0HxdVrCM84BJiM1M6w3pa+AYUkSBfqvU+fgm8xfOWGUPZFEitx9O1PwaH m0UOS51gM5ZSzpAMDBlXYnmUgPxUTlpwqIL77zgMFzYcpKbf2rVphFRxmacWWqQ2 MWnt6vn/eatDeJ6TFAF08APy63SUAR7BvssSYoySz8Y4m6LbL4yUfCWYkL9lhLFH dJL6TTjUwFw1Rd7UVzHongEr24NASCS37MiR8V/3RUiDqIYtrlaJKxhhVdTKg7mm xRKA+QBXkGUfwFJm2mL+3E3D7/Xgl4kNo22lp1YgamTgc50/UMB5JEf8Z3aqa/qh ie/4oKnmaa+SSgOxQmvqS3lWF7RYU+axvc9K0FK5CYSXnc6Jfgtq0yDB5twslj05 HdCwNJX9KOL0NZAzQKs+FLjnhOtnRigANMG7KMm5nmS3vwsxQqIkm0b6tPhgBcd/ k4QQm8h6C5Tdi+R39fGQOgtBK0gVVJl+n3XfA3gbmunvXhSo6IQ= =fFH1 -----END PGP SIGNATURE----- Merge 4.9.217 into android-4.9-q Changes in 4.9.217 NFS: Remove superfluous kmap in nfs_readdir_xdr_to_array phy: Revert toggling reset changes. net: phy: Avoid multiple suspends cgroup, netclassid: periodically release file_lock on classid updating gre: fix uninit-value in __iptunnel_pull_header ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface net: macsec: update SCI upon MAC address change. 
net: nfc: fix bounds checking bugs on "pipe" r8152: check disconnect status after long sleep bnxt_en: reinitialize IRQs when MTU is modified fib: add missing attribute validation for tun_id nl802154: add missing attribute validation nl802154: add missing attribute validation for dev_type macsec: add missing attribute validation for port net: fq: add missing attribute validation for orphan mask team: add missing attribute validation for port ifindex team: add missing attribute validation for array index nfc: add missing attribute validation for SE API nfc: add missing attribute validation for vendor subcommand ipvlan: add cond_resched_rcu() while processing muticast backlog ipvlan: do not add hardware address of master to its unicast filter list ipvlan: egress mcast packets are not exceptional ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast() ipvlan: don't deref eth hdr before checking it's set macvlan: add cond_resched() during multicast processing net: fec: validate the new settings in fec_enet_set_coalesce() slip: make slhc_compress() more robust against malicious packets bonding/alb: make sure arp header is pulled before accessing it cgroup: memcg: net: do not associate sock with unrelated cgroup net: phy: fix MDIO bus PM PHY resuming virtio-blk: fix hw_queue stopped on arbitrary error iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint workqueue: don't use wq_select_unbound_cpu() for bound works drm/amd/display: remove duplicated assignment to grph_obj_type cifs_atomic_open(): fix double-put on late allocation failure gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache KVM: x86: clear stale x86_emulate_ctxt->intercept value ARC: define __ALIGN_STR and __ALIGN symbols for ARC efi: Fix a race and a buffer overflow while reading efivars via sysfs iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page nl80211: add missing attribute 
validation for critical protocol indication nl80211: add missing attribute validation for beacon report scanning nl80211: add missing attribute validation for channel switch netfilter: cthelper: add missing attribute validation for cthelper mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame() iommu/vt-d: Fix the wrong printing in RHSA parsing iommu/vt-d: Ignore devices with out-of-spec domain number ipv6: restrict IPV6_ADDRFORM operation efi: Add a sanity check to efivar_store_raw() batman-adv: Fix double free during fragment merge error batman-adv: Fix transmission of final, 16th fragment batman-adv: Initialize gw sel_class via batadv_algo batman-adv: Fix rx packet/bytes stats on local ARP reply batman-adv: Use default throughput value on cfg80211 error batman-adv: Accept only filled wifi station info batman-adv: fix TT sync flag inconsistencies batman-adv: Avoid spurious warnings from bat_v neigh_cmp implementation batman-adv: Always initialize fragment header priority batman-adv: Fix check of retrieved orig_gw in batadv_v_gw_is_eligible batman-adv: Fix lock for ogm cnt access in batadv_iv_ogm_calc_tq batman-adv: Fix internal interface indices types batman-adv: Avoid race in TT TVLV allocator helper batman-adv: Fix TT sync flags for intermediate TT responses batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs batman-adv: Fix debugfs path for renamed hardif batman-adv: Fix debugfs path for renamed softif batman-adv: Avoid storing non-TT-sync flags on singular entries too batman-adv: Fix multicast TT issues with bogus ROAM flags batman-adv: Prevent duplicated gateway_node entry batman-adv: Fix duplicated OGMs on NETDEV_UP batman-adv: Avoid free/alloc race when handling OGM2 buffer batman-adv: Avoid free/alloc race when handling OGM buffer batman-adv: Don't schedule OGM for disabled interface batman-adv: update data pointers after skb_cow() batman-adv: Avoid probe ELP information leak batman-adv: Use explicit tvlv padding for ELP 
packets perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag ACPI: watchdog: Allow disabling WDAT at boot HID: apple: Add support for recent firmware on Magic Keyboards HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override cfg80211: check reg_rule for NULL in handle_channel_custom() net: ks8851-ml: Fix IRQ handling and locking mac80211: rx: avoid RCU list traversal under mutex signal: avoid double atomic counter increments for user accounting jbd2: fix data races at struct journal_head ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional() ARM: 8958/1: rename missed uaccess .fixup section mm: slub: add missing TID bump in kmem_cache_alloc_bulk() ipv4: ensure rcu_read_lock() in cipso_v4_error() Linux 4.9.217 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia7aeed273cd7548dc8d0dfaaad8b96bedfe499b1	2020-03-20 11:01:08 +01:00
Shakeel Butt	529f4b7ad3	cgroup: memcg: net: do not associate sock with unrelated cgroup [ Upstream commit e876ecc67db80dfdb8e237f71e5b43bb88ae549c ] We are testing network memory accounting in our setup and noticed inconsistent network memory usage and often unrelated cgroups network usage correlates with testing workload. On further inspection, it seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in irq context specially for cgroup v1. mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context and kind of assumes that this can only happen from sk_clone_lock() and the source sock object has already associated cgroup. However in cgroup v1, where network memory accounting is opt-in, the source sock can be unassociated with any cgroup and the new cloned sock can get associated with unrelated interrupted cgroup. Cgroup v2 can also suffer if the source sock object was created by process in the root cgroup or if sk_alloc() is called in irq context. The fix is to just do nothing in interrupt. WARNING: Please note that about half of the TCP sockets are allocated from the IRQ context, so, memory used by such sockets will not be accouted by the memcg. The stack trace of mem_cgroup_sk_alloc() from IRQ-context: CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1 Hardware name: ... Call Trace: <IRQ> dump_stack+0x57/0x75 mem_cgroup_sk_alloc+0xe9/0xf0 sk_clone_lock+0x2a7/0x420 inet_csk_clone_lock+0x1b/0x110 tcp_create_openreq_child+0x23/0x3b0 tcp_v6_syn_recv_sock+0x88/0x730 tcp_check_req+0x429/0x560 tcp_v6_rcv+0x72d/0xa40 ip6_protocol_deliver_rcu+0xc9/0x400 ip6_input+0x44/0xd0 ? ip6_protocol_deliver_rcu+0x400/0x400 ip6_rcv_finish+0x71/0x80 ipv6_rcv+0x5b/0xe0 ? ip6_sublist_rcv+0x2e0/0x2e0 process_backlog+0x108/0x1e0 net_rx_action+0x26b/0x460 __do_softirq+0x104/0x2a6 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq.part.19+0x40/0x50 __local_bh_enable_ip+0x51/0x60 ip6_finish_output2+0x23d/0x520 ? 
ip6table_mangle_hook+0x55/0x160 __ip6_finish_output+0xa1/0x100 ip6_finish_output+0x30/0xd0 ip6_output+0x73/0x120 ? __ip6_finish_output+0x100/0x100 ip6_xmit+0x2e3/0x600 ? ipv6_anycast_cleanup+0x50/0x50 ? inet6_csk_route_socket+0x136/0x1e0 ? skb_free_head+0x1e/0x30 inet6_csk_xmit+0x95/0xf0 __tcp_transmit_skb+0x5b4/0xb20 __tcp_send_ack.part.60+0xa3/0x110 tcp_send_ack+0x1d/0x20 tcp_rcv_state_process+0xe64/0xe80 ? tcp_v6_connect+0x5d1/0x5f0 tcp_v6_do_rcv+0x1b1/0x3f0 ? tcp_v6_do_rcv+0x1b1/0x3f0 __release_sock+0x7f/0xd0 release_sock+0x30/0xa0 __inet_stream_connect+0x1c3/0x3b0 ? prepare_to_wait+0xb0/0xb0 inet_stream_connect+0x3b/0x60 __sys_connect+0x101/0x120 ? __sys_getsockopt+0x11b/0x140 __x64_sys_connect+0x1a/0x20 do_syscall_64+0x51/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The stack trace of mem_cgroup_sk_alloc() from IRQ-context: Fixes: `2d75807383` ("mm: memcontrol: consolidate cgroup socket tracking") Fixes: `d979a39d72` ("cgroup: duplicate cgroup reference when cloning sockets") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-03-20 09:07:43 +01:00
FAROVITUS	af1d3ae977	Merge 4.9.212 branch 'android-4.9-q' into tw10-android-4.9-q Documentation/filesystems/fscrypt.rst arch/arm/common/Kconfig arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi arch/arm64/boot/dts/amd/amd-seattle-soc.dtsi arch/arm64/boot/dts/arm/juno-clocks.dtsi arch/arm64/boot/dts/broadcom/ns2.dtsi arch/arm64/boot/dts/lg/lg1312.dtsi arch/arm64/boot/dts/lg/lg1313.dtsi arch/arm64/boot/dts/marvell/armada-37xx.dtsi arch/arm64/boot/dts/nvidia/tegra210-p2180.dtsi arch/arm64/boot/dts/nvidia/tegra210-p2597.dtsi arch/arm64/boot/dts/nvidia/tegra210.dtsi arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi arch/arm64/boot/dts/qcom/msm8996.dtsi arch/arm64/configs/ranchu64_defconfig arch/arm64/include/asm/cpucaps.h arch/arm64/kernel/cpufeature.c arch/arm64/kernel/traps.c arch/arm64/mm/mmu.c crypto/Makefile crypto/ablkcipher.c crypto/blkcipher.c crypto/testmgr.h crypto/zstd.c drivers/android/binder.c drivers/android/binder_alloc.c drivers/char/random.c drivers/clocksource/exynos_mct.c drivers/dma/pl330.c drivers/hid/hid-sony.c drivers/hid/uhid.c drivers/hid/usbhid/hiddev.c drivers/i2c/i2c-core.c drivers/md/dm-crypt.c drivers/media/v4l2-core/videobuf2-v4l2.c drivers/mmc/host/dw_mmc.c drivers/net/ethernet/broadcom/tg3.c drivers/net/usb/r8152.c drivers/scsi/scsi_logging.c drivers/scsi/sd.c drivers/scsi/ufs/ufshcd-pci.c drivers/scsi/ufs/ufshcd-pltfrm.c drivers/staging/android/Kconfig drivers/staging/android/ion/ion.c drivers/staging/android/ion/ion_priv.h drivers/staging/android/ion/ion_system_heap.c drivers/staging/android/lowmemorykiller.c drivers/tty/serial/samsung.c drivers/usb/dwc3/core.c drivers/usb/dwc3/gadget.c drivers/usb/host/xhci-hub.c drivers/video/fbdev/core/fbmon.c drivers/video/fbdev/core/modedb.c fs/crypto/fname.c fs/crypto/fscrypt_private.h fs/crypto/keyinfo.c fs/ext4/ialloc.c fs/ext4/namei.c fs/ext4/xattr.c fs/f2fs/checkpoint.c fs/f2fs/data.c fs/f2fs/debug.c fs/f2fs/dir.c fs/f2fs/f2fs.h fs/f2fs/file.c fs/f2fs/gc.c fs/f2fs/inline.c fs/f2fs/inode.c 
fs/f2fs/namei.c fs/f2fs/node.c fs/f2fs/recovery.c fs/f2fs/segment.c fs/f2fs/segment.h fs/f2fs/super.c fs/f2fs/sysfs.c fs/fat/dir.c fs/fat/fatent.c fs/file.c fs/namespace.c fs/pnode.c fs/proc/inode.c fs/proc/root.c fs/proc/task_mmu.c fs/sdcardfs/dentry.c fs/sdcardfs/derived_perm.c fs/sdcardfs/file.c fs/sdcardfs/inode.c fs/sdcardfs/lookup.c fs/sdcardfs/main.c fs/sdcardfs/sdcardfs.h fs/sdcardfs/super.c include/linux/blk_types.h include/linux/cpuhotplug.h include/linux/cred.h include/linux/fb.h include/linux/power_supply.h include/linux/sched.h include/linux/zstd.h include/trace/events/sched.h include/uapi/linux/android/binder.h init/Kconfig init/main.c kernel/bpf/hashtab.c kernel/cpu.c kernel/cred.c kernel/fork.c kernel/locking/spinlock_debug.c kernel/panic.c kernel/printk/printk.c kernel/sched/Makefile kernel/sched/core.c kernel/sched/fair.c kernel/sched/rt.c kernel/sched/walt.c kernel/sched/walt.h kernel/trace/trace.c lib/bug.c lib/list_debug.c lib/vsprintf.c lib/zstd/bitstream.h lib/zstd/compress.c lib/zstd/decompress.c lib/zstd/fse.h lib/zstd/fse_compress.c lib/zstd/fse_decompress.c lib/zstd/huf_compress.c lib/zstd/huf_decompress.c lib/zstd/zstd_internal.h mm/debug.c mm/filemap.c mm/rmap.c net/core/filter.c net/ipv4/sysctl_net_ipv4.c net/ipv4/sysfs_net_ipv4.c net/ipv4/tcp_input.c net/ipv4/tcp_output.c net/ipv4/udp.c net/ipv6/netfilter/nf_conntrack_reasm.c net/netfilter/Kconfig net/netfilter/Makefile net/netfilter/xt_qtaguid.c net/netfilter/xt_qtaguid_internal.h net/xfrm/xfrm_policy.c net/xfrm/xfrm_state.c scripts/checkpatch.pl security/selinux/hooks.c sound/core/compress_offload.c	2020-02-12 12:32:38 +02:00
FAROVITUS	2b92eefa41	import G965FXXU7DTAA OSRC *First release for Android (Q). Signed-off-by: FAROVITUS <farovitus@gmail.com>	2020-02-04 13:50:09 +02:00
Alexei Starovoitov	148f111e98	BACKPORT: bpf: multi program support for cgroup+bpf introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple bpf programs to a cgroup. The difference between three possible flags for BPF_PROG_ATTACH command: - NONE(default): No further bpf programs allowed in the subtree. - BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program, the program in this cgroup yields to sub-cgroup program. - BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program, that cgroup program gets run in addition to the program in this cgroup. NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't change their behavior. It only clarifies the semantics in relation to new flag. Only one program is allowed to be attached to a cgroup with NONE or BPF_F_ALLOW_OVERRIDE flag. Multiple programs are allowed to be attached to a cgroup with BPF_F_ALLOW_MULTI flag. They are executed in FIFO order (those that were attached first, run first) The programs of sub-cgroup are executed first, then programs of this cgroup and then programs of parent cgroup. All eligible programs are executed regardless of return code from earlier programs. To allow efficient execution of multiple programs attached to a cgroup and to avoid penalizing cgroups without any programs attached introduce 'struct bpf_prog_array' which is RCU protected array of pointers to bpf programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> for cgroup bits Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0) Signed-off-by: Connor O'Brien <connoro@google.com> Bug: 121213201 Bug: 138317270 Test: build & boot cuttlefish Change-Id: I06b71c850b9f3e052b106abab7a4a3add012a3f8	2019-12-12 15:48:18 -08:00
Suren Baghdasaryan	a163d3fb8a	FROMLIST: psi: introduce psi monitor Psi monitor aims to provide a low-latency short-term pressure detection mechanism configurable by users. It allows users to monitor psi metrics growth and trigger events whenever a metric raises above user-defined threshold within user-defined time window. Time window and threshold are both expressed in usecs. Multiple psi resources with different thresholds and window sizes can be monitored concurrently. Psi monitors activate when system enters stall state for the monitored psi metric and deactivate upon exit from the stall state. While system is in the stall state psi signal growth is monitored at a rate of 10 times per tracking window. Min window size is 500ms, therefore the min monitoring interval is 50ms. Max window size is 10s with monitoring interval of 1s. When activated psi monitor stays active for at least the duration of one tracking window to avoid repeated activations/deactivations when psi signal is bouncing. Notifications to the users are rate-limited to one per tracking window. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> (not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052418/) Conflicts: include/linux/psi.h kernel/cgroup.c kernel/sched/psi.c (1. replaced __poll_t with unsigned int 2. replaced EPOLLERR/EPOLLPRI with POLLERR/POLLPRI (values are the same) 3. include <linux/cgroup-defs.h> in include/linux/psi.h) Bug: 127712811 Bug: 129157727 Test: lmkd in PSI mode Change-Id: I1688f047e98e1f109627dad72a33d2f70e575268 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-03-23 00:29:15 +00:00
Johannes Weiner	0ee3cb39dc	BACKPORT: kernel: cgroup: add poll file operation Cgroup has a standardized poll/notification mechanism for waking all pollers on all fds when a filesystem node changes. To allow polling for custom events, add a .poll callback that can override the default. This is in preparation for pollable cgroup pressure files which have per-fd trigger configurations. Link: http://lkml.kernel.org/r/20190124211518.244221-3-surenb@google.com Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit: dc50537bdd1a0804fa2cbc990565ee9a944e66fa) Conflicts: include/linux/cgroup-defs.h kernel/cgroup.c 1. made changes in kernel/cgroup.c instead of kernel/cgroup/cgroup.c 2. replaced __poll_t with unsigned int Bug: 111308141 Test: modified lmkd to use PSI and tested using lmkd_unit_test Change-Id: Ie3d914197d1f150e1d83c6206865566a7cbff1b4 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-03-22 14:15:40 -07:00
Tejun Heo	59735bfc7e	UPSTREAM: cgroup add cftype->open/release() callbacks Pipe the newly added kernfs->open/release() callbacks through cftype. While at it, as cleanup operations now can be performed from ->release() instead of ->seq_stop(), make the latter optional. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Acked-by: Zefan Li <lizefan@huawei.com> (cherry picked from commit e90cbebc3fa5caea4c8bfeb0d0157a0cee53efc7) Bug: 111308141 Test: modified lmkd to use PSI and tested using lmkd_unit_test Change-Id: Iff9794cbbc2c7067c24cb2f767bbdeffa26b5180 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-03-22 14:14:49 -07:00
Johannes Weiner	e868a99c44	BACKPORT: psi: cgroup support On a system that executes multiple cgrouped jobs and independent workloads, we don't just care about the health of the overall system, but also that of individual jobs, so that we can ensure individual job health, fairness between jobs, or prioritize some jobs over others. This patch implements pressure stall tracking for cgroups. In kernels with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure, and io.pressure files that track aggregate pressure stall times for only the tasks inside the cgroup. Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 2ce7135adc9ad081aa3c49744144376ac74fea60) Conflicts: Documentation/cgroup-v2.txt include/linux/cgroup.h kernel/cgroup/cgroup.c (1. manual merge from Documentation/admin-guide/cgroup-v2.rst 2. manually merged changes from kernel/cgroup/cgroup.c into kernel/cgroup.c 3. manual merge in css_free_work_fn to allow psi support only for cgroup v2 4. manual merge in cgroup_create to allow psi support only for cgroup v2) Bug: 111308141 Test: modified lmkd to use PSI and tested using lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I2ffb9d50ba87f8b7655bed215a625784e098879c	2019-03-22 14:10:44 -07:00
Tejun Heo	86c5c11aa0	BACKPORT: cgroup: misc changes Misc trivial changes to prepare for future changes. No functional difference. * Expose cgroup_get(), cgroup_tryget() and cgroup_parent(). * Implement task_dfl_cgroup() which dereferences css_set->dfl_cgrp. * Rename cgroup_stats_show() to cgroup_stat_show() for consistency with the file name. Signed-off-by: Tejun Heo <tj@kernel.org> (cherry picked from commit 3e48930cc74f0c212ee1838f89ad0ca7fcf2fea1) Conflicts: kernel/cgroup/cgroup.c (1. manual merge because kernel/cgroup/cgroup.c is under kernel/cgroup.c 2. cgroup_stats_show change is skipped because the function dos not exist) Bug: 111308141 Test: modified lmkd to use PSI and tested using lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I756ee3dcf0d0f3da69cd1b58e644271625053538	2019-03-22 14:08:16 -07:00
Greg Kroah-Hartman	d589c0d406	This is the 4.9.133 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlvBnGMACgkQONu9yGCS aT6KZA//cXQIPaITOcikXVN9LTbfNI32k0803qJi2PPr0lajb6RETnfYbQH+h+Ta d77uEJ7e15U7bxaoLDKJhYe8SvtqMHq6hhqNXwg+woFgAkoMU4pGmAocywLoMpS/ INIc4KwEPKD4JkQYQF1sk3nB22K9DiXn/OjoXTlxyAYuCnvdu99xwIpvaI8GN1Yo SWiMDvmhu1Nj+vD79VDV5UhigI/+lHIaCasJ3fSVyFx6kE2XYBz7qcNGULGzI+pl c6qQpds4UIdVp04d/JQNOurANA6oHYWexrV1Q8zENMwpAOEoDVMDjC/Fi8iIxfCK EfgNABr+5INg2gtPZ1YFBad96vgrxWo1gS+Rdaq71zO5luoWktqG/Kv99WnMIOZf TiS0+6GRKfL06PyfFu/Bsx8Dzp/GVCsYWQEEPNhp5PYUOoIRzI8XiJcofRNlxL0p IiQ0RUansCaN7VN8Z3GzBqDOGikuhxAPrTTAhTAXABrXcrSoAEVhUpmf5UMItnIj cp8AoYqqQxA81PMTLLK0K78B3Mkkdk9BIB82y39WV28j6dB6ZzfPFg4ikdIwS+hm ZNck/2IwhuuPGugDVVX206zH5ATQaS0Vg+BQNqw/Y03/U6SKSlUs6HB4c0X1quy6 jwskBvuADXXlWFuzMlenlAGg1Shk4lwsCGN3F9ziSkrSI6azBRs= =aHnt -----END PGP SIGNATURE----- Merge 4.9.133 into android-4.9 Changes in 4.9.133 mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly fbdev/omapfb: fix omapfb_memory_read infoleak xen-netback: fix input validation in xenvif_set_hash_mapping() x86/vdso: Fix asm constraints on vDSO syscall fallbacks x86/vdso: Fix vDSO syscall fallback asm constraint regression PCI: Reprogram bridge prefetch registers on resume mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys PM / core: Clear the direct_complete flag on errors dm cache metadata: ignore hints array being too small during resize dm cache: fix resize crash if user doesn't reload cache table xhci: Add missing CAS workaround for Intel Sunrise Point xHCI usb: xhci-mtk: resume USB3 roothub first USB: serial: simple: add Motorola Tetra MTP6550 id tty: Drop tty->count on tty_reopen() failure of: unittest: Disable interrupt node tests for old world MAC systems ext4: add corruption check in ext4_xattr_set_entry() ext4: always verify the magic number in xattr blocks cgroup: Fix deadlock in cpu hotplug path ath10k: fix use-after-free in ath10k_wmi_cmd_send_nowait 
ath10k: fix kernel panic issue during pci probe powerpc/fadump: Return error when fadump registration fails ARC: clone syscall to setp r25 as thread pointer x86/mm: Expand static page table for fixmap space f2fs: fix invalid memory access ucma: fix a use-after-free in ucma_resolve_ip() ubifs: Check for name being NULL while mounting ath10k: fix scan crash due to incorrect length calculation ebtables: arpreply: Add the standard target sanity check x86/fpu: Remove use_eager_fpu() x86/fpu: Remove struct fpu::counter Revert "perf: sync up x86/.../cpufeatures.h" x86/fpu: Finish excising 'eagerfpu' Linux 4.9.133 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-10-13 10:42:50 +02:00
Prateek Sood	35b80e75a6	cgroup: Fix deadlock in cpu hotplug path commit 116d2f7496c51b2e02e8e4ecdd2bdf5fb9d5a641 upstream. Deadlock during cgroup migration from cpu hotplug path when a task T is being moved from source to destination cgroup. kworker/0:0 cpuset_hotplug_workfn() cpuset_hotplug_update_tasks() hotplug_update_tasks_legacy() remove_tasks_in_empty_cpuset() cgroup_transfer_tasks() // stuck in iterator loop cgroup_migrate() cgroup_migrate_add_task() In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T. Task T will not migrate to destination cgroup. css_task_iter_start() will keep pointing to task T in loop waiting for task T cg_list node to be removed. Task T do_exit() exit_signals() // sets PF_EXITING exit_task_namespaces() switch_task_namespaces() free_nsproxy() put_mnt_ns() drop_collected_mounts() namespace_unlock() synchronize_rcu() _synchronize_rcu_expedited() schedule_work() // on cpu0 low priority worker pool wait_event() // waiting for work item to execute Task T inserted a work item in the worklist of cpu0 low priority worker pool. It is waiting for expedited grace period work item to execute. This work item will only be executed once kworker/0:0 complete execution of cpuset_hotplug_workfn(). kworker/0:0 ==> Task T ==>kworker/0:0 In case of PF_EXITING task being migrated from source to destination cgroup, migrate next available task in source cgroup. Signed-off-by: Prateek Sood <prsood@codeaurora.org> Signed-off-by: Tejun Heo <tj@kernel.org> [AmitP: Upstream commit cherry-pick failed, so I picked the backported changes from CAF/msm-4.9 tree instead: https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?id=49b74f1696417b270c89cd893ca9f37088928078] Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-10-13 09:18:56 +02:00
Greg Kroah-Hartman	02f29ab1b9	This is the 4.9.42 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmN0iUACgkQONu9yGCS aT6gZRAAhYbpsz2XRFSQ5H/Sk8xwJuwtLB2at3Y1CLCb+lLhlRsV3l4pD+4KRoEl fU01l5s5ZalZJekGfEfEOQOfoJHsCxzFSzKzP06/GA5u6DbwtHUE2SjNWe84j6Ct Hx0jN90yj7S8vy2umROux+fVvZQ4Xay4TDCWhBeXgOFXevwC/G9D2LWE2NIYwbDH Ighahrhs21FZc9wbah0L04bRBAR7+ALLq1sO8ebKwl8eFzAkcEwI/yS48cnjGlgW 9HW5MmY1BYTnRCrXaw5L0Vf5zH6obT7amrLNljNYN6vN62DRoOfwQh4QcblnIAoi L+HdZilifZ970RwQ2As3vy63/Kk3b207ht4mriTCyGXM9MY6bRovYv1wDAUlv7aD GlA8Q7xwsiJ4sG4i5LABjly+QeWymZ2b0kVWYpneJuBuj/gWVDhh1lfT+nOCAcJ6 ROUY6d64ghKPBomkqlMSC+7sH7QKa0/W9WDQCLxtnmjcAkeElpGNGu/m/Thhvi2I NDq2sbMAeGJquXBXIN8W4NPy0puOn0wjqFI7LE61ujSiAxT8973uDtNlrmB/eCAf zD9yJsKELS20PKToren4hYYuRM2XlKh9gVIOWB2pShfzvSO7807ZyVcoI8/bVgZe I2BH6Dt3t+qqWR7B5/qvxxmNCv3HNMpNUzy/z+fXEf8/U3zqiTM= =n7pX -----END PGP SIGNATURE----- Merge 4.9.42 into android-4.9 Changes in 4.9.42 parisc: Handle vma's whose context is not current in flush_cache_range cgroup: create dfl_root files on subsys registration cgroup: fix error return value from cgroup_subtree_control() libata: array underflow in ata_find_dev() workqueue: restore WQ_UNBOUND/max_active==1 to be ordered iwlwifi: dvm: prevent an out of bounds access brcmfmac: fix memleak due to calling brcmf_sdiod_sgtable_alloc() twice NFSv4: Fix EXCHANGE_ID corrupt verifier issue mmc: sdhci-of-at91: force card detect value for non removable devices device property: Make dev_fwnode() public mmc: core: Fix access to HS400-ES devices mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries cpuset: fix a deadlock due to incomplete patching of cpusets_enabled() ALSA: hda - Fix speaker output from VAIO VPCL14M1R drm/amdgpu: Fix undue fallthroughs in golden registers initialization ASoC: do not close shared backend dailink KVM: async_pf: make rcu irq exit if not triggered from idle task mm/page_alloc: Remove kernel address exposure in 
free_reserved_area() timers: Fix overflow in get_next_timer_interrupt powerpc/tm: Fix saving of TM SPRs in core dump powerpc/64: Fix __check_irq_replay missing decrementer interrupt iommu/amd: Enable ga_log_intr when enabling guest_mode gpiolib: skip unwanted events, don't convert them to opposite edge ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize ext4: fix overflow caused by missing cast in ext4_resize_fs() ARM: dts: armada-38x: Fix irq type for pca955 ARM: dts: tango4: Request RGMII RX and TX clock delays media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl iscsi-target: Fix initial login PDU asynchronous socket close OOPs mmc: dw_mmc: Use device_property_read instead of of_property_read mmc: core: Use device_property_read instead of of_property_read media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds f2fs: sanity check checkpoint segno and blkoff Btrfs: fix early ENOSPC due to delalloc saa7164: fix double fetch PCIe access condition tcp_bbr: cut pacing rate only if filled pipe tcp_bbr: introduce bbr_bw_to_pacing_rate() helper tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helper tcp_bbr: remove sk_pacing_rate=0 transient during init tcp_bbr: init pacing rate on first RTT sample ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() net: Zero terminate ifr_name in dev_ifname(). ipv6: avoid overflow of offset in ip6_find_1stfragopt net: dsa: b53: Add missing ARL entries for BCM53125 ipv4: initialize fib_trie prior to register_netdev_notifier call. 
rtnetlink: allocate more memory for dev_set_mac_address() mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled openvswitch: fix potential out of bound access in parse_ct packet: fix use-after-free in prb_retire_rx_blk_timer_expired() ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment() net: ethernet: nb8800: Handle all 4 RGMII modes identically dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly dccp: fix a memleak for dccp_feat_init err process sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}() sctp: fix the check for _sctp_walk_params and _sctp_walk_errors net/mlx5: Consider tx_enabled in all modes on remap net/mlx5: Fix command bad flow on command entry allocation failure net/mlx5e: Fix outer_header_zero() check size net/mlx5e: Fix wrong delay calculation for overflow check scheduling net/mlx5e: Schedule overflow check work to mlx5e workqueue net: phy: Correctly process PHY_HALTED in phy_stop_machine() xen-netback: correctly schedule rate-limited queues sparc64: Measure receiver forward progress to avoid send mondo timeout sparc64: Fix exception handling in UltraSPARC-III memcpy. wext: handle NULL extra data in iwe_stream_add_point better sh_eth: fix EESIPR values for SH77{34|63} sh_eth: R8A7740 supports packet shecksumming net: phy: dp83867: fix irq generation tg3: Fix race condition in tg3_get_stats64(). 
x86/boot: Add missing declaration of string functions spi: spi-axi: Free resources on error path ASoC: rt5645: set sel_i2s_pre_div1 to 2 netfilter: use fwmark_reflect in nf_send_reset phy state machine: failsafe leave invalid RUNNING state ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int clk/samsung: exynos542x: mark some clocks as critical scsi: qla2xxx: Get mutex lock before checking optrom_state drm/virtio: fix framebuffer sparse warning ARM: dts: sun8i: Support DTB build for NanoPi M1 ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort nbd: blk_mq_init_queue returns an error code on failure, not NULL virtio_blk: fix panic in initialization error path ARM: 8632/1: ftrace: fix syscall name matching mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER lib/Kconfig.debug: fix frv build failure signal: protect SIGNAL_UNKILLABLE from unintentional clearing. mm: don't dereference struct page fields of invalid pages net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroy ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output net: account for current skb length when deciding about UFO net: phy: Fix PHY unbind crash workqueue: implicit ordered attribute should be overridable Linux 4.9.42 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-08-11 13:55:02 -07:00
Tejun Heo	445ee6cdd9	cgroup: fix error return value from cgroup_subtree_control() commit 3c74541777302eec43a0d1327c4d58b8659a776b upstream. While refactoring, `f7b2814bb9` ("cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()") broke error return value from the function. The return value from the last operation is always overridden to zero. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-11 08:49:28 -07:00
Tejun Heo	4a99eac8d2	cgroup: create dfl_root files on subsys registration commit 7af608e4f9530372aec6e940552bf76595f2e265 upstream. On subsystem registration, css_populate_dir() is not called on the new root css, so the interface files for the subsystem on cgrp_dfl_root aren't created on registration. This is a residue from the days when cgrp_dfl_root was used only as the parking spot for unused subsystems, which no longer is true as it's used as the root for cgroup2. This is often fine as later operations tend to create them as a part of mount (cgroup1) or subtree_control operations (cgroup2); however, it's not difficult to mount cgroup2 with the controller interface files missing as Waiman found out. Fix it by invoking css_populate_dir() on the root css on subsys registration. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-11 08:49:28 -07:00
Greg Kroah-Hartman	da3493c028	This is the 4.9.32 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllBNNMACgkQONu9yGCS aT4rJA//XOtxp2nr01OqffSItCX0vq7vMofbjo7spI2LF3JyxKkx0hje0im82XoD cYhZHyeK9lCHRZt2O541xOp3ITconecRvlt6uq49zKNvlFAelnp0rRTOktluJK/n Xm7EP2KLHeHXv+rvCfzU0hDC9fvoZe82NIF7kddAWHX4D/K3C6Jw6FVc4/QPl2MG 11R6pylPtMGObgPEmfDlohBitcC3KawXAhIcyTTAr3rcuO2Wm00H+uF0VOpyD9hZ S5A8ecuyZVQw3mIKAJ9vOwkVoln1E+/P/OltWEXElkBZONHpI4LB2IE7Nmwnx+pE 5oO3afNZIYOejX+GQvK2Apc8VONDV2VZRoIstzl+uaKDNGFaSLIu8AQ/UQfmoqTE Pwrp/FxgDGjq6NOf/+MqNhiqS9433Xgmt4HT5xctzS0zFNGzLIokdjRH3x7n6cez 2JvdtRvJfdNq31kh6xQE/8oXuZktRSw3GxC5K4Dw3NRh02/9KNAgEjNPEMNXziGw RhLhyXl8i5eGAu1gxruojzRyCvaMS6S8ZcAvSSoJze0kj8jfkZtQbBswvS1LF9EI FFemshd1sQxUeZiXGYPuUUfyDvxvs7KRIqAYux8lsIhSQTu16REqsZRrzidpDP+K xtH61YBWhwDHZKIoT+Jchmn8QCytI77MvFdGuTl+KRPV4OyyeJs= =NE4C -----END PGP SIGNATURE----- Merge 4.9.32 into android-4.9 Changes in 4.9.32 bnx2x: Fix Multi-Cos vxlan: eliminate cached dst leak ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() cxgb4: avoid enabling napi twice to the same queue tcp: disallow cwnd undo when switching congestion control vxlan: fix use-after-free on deletion ipv6: Fix leak in ipv6_gso_segment(). net: ping: do not abuse udp_poll() net/ipv6: Fix CALIPSO causing GPF with datagram support net: ethoc: enable NAPI before poll may be scheduled net: stmmac: fix completely hung TX when using TSO net: bridge: start hello timer only if device is up sparc64: Add __multi3 for gcc 7.x and later. 
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs sparc: Machine description indices can vary sparc64: reset mm cpumask after wrap sparc64: combine activate_mm and switch_mm sparc64: redefine first version sparc64: add per-cpu mm of secondary contexts sparc64: new context wrap sparc64: delete old wrap code arch/sparc: support NR_CPUS = 4096 serial: ifx6x60: fix use-after-free on module unload ptrace: Properly initialize ptracer_cred on fork crypto: asymmetric_keys - handle EBUSY due to backlog correctly KEYS: fix dereferencing NULL payload with nonzero length KEYS: fix freeing uninitialized memory in key_update() KEYS: encrypted: avoid encrypting/decrypting stack buffers crypto: drbg - wait for crypto op not signal safe crypto: gcm - wait for crypto op not signal safe drm/amdgpu/ci: disable mclk switching for high refresh rates (v2) nfsd4: fix null dereference on replay nfsd: Fix up the "supattr_exclcreat" attributes efi: Don't issue error message when booted under Xen kvm: async_pf: fix rcu_irq_enter() with irqs enabled KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation arm64: KVM: Preserve RES1 bits in SCTLR_EL2 arm64: KVM: Allow unaligned accesses at EL2 arm: KVM: Allow unaligned accesses at HYP KVM: async_pf: avoid async pf injection when in guest mode KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt KVM: arm/arm64: vgic-v2: Do not use Active+Pending state for a HW interrupt dmaengine: usb-dmac: Fix DMAOR AE bit definition dmaengine: ep93xx: Always start from BASE0 dmaengine: ep93xx: Don't drain the transfers in terminate_all() dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx dmaengine: mv_xor_v2: enable XOR engine after its configuration dmaengine: mv_xor_v2: fix tx_submit() implementation dmaengine: mv_xor_v2: remove interrupt coalescing 
dmaengine: mv_xor_v2: set DMA mask to 40 bits cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode xen/privcmd: Support correctly 64KB page granularity when mapping memory ext4: fix SEEK_HOLE ext4: keep existing extra fields when inode expands ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO ext4: fix fdatasync(2) after extent manipulation operations drm: Fix oops + Xserver hang when unplugging USB drm devices usb: gadget: f_mass_storage: Serialize wake and sleep execution usb: chipidea: udc: fix NULL pointer dereference if udc_start failed usb: chipidea: debug: check before accessing ci_role staging/lustre/lov: remove set_fs() call from lov_getstripe() iio: adc: bcm_iproc_adc: swap primary and secondary isr handler's iio: light: ltr501 Fix interchanged als/ps register field iio: proximity: as3935: fix AS3935_INT mask iio: proximity: as3935: fix iio_trigger_poll issue mei: make sysfs modalias format similar as uevent modalias cpufreq: cpufreq_register_driver() should return -ENODEV if init fails target: Re-add check to reject control WRITEs with overflow data drm/msm: Expose our reservation object when exporting a dmabuf. 
ahci: Acer SA5-271 SSD Not Detected Fix cgroup: Prevent kill_css() from being called more than once Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled cpuset: consider dying css as offline fs: add i_blocksize() ufs: restore proper tail allocation fix ufs_isblockset() ufs: restore maintaining ->i_blocks ufs: set correct ->s_maxsize ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments() ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path cxl: Fix error path on bad ioctl cxl: Avoid double free_irq() for psl,slice interrupts btrfs: use correct types for page indices in btrfs_page_exists_in_range btrfs: fix memory leak in update_space_info failure path KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages scsi: qla2xxx: don't disable a not previously enabled PCI device scsi: qla2xxx: Modify T262 FW dump template to specify same start/end to debug customer issues scsi: qla2xxx: Set bit 15 for DIAG_ECHO_TEST MBC scsi: qla2xxx: Fix mailbox pointer error in fwdump capture powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function powerpc/numa: Fix percpu allocations to be NUMA aware powerpc/hotplug-mem: Fix missing endian conversion of aa_index powerpc/kernel: Fix FP and vector register restoration powerpc/kernel: Initialize load_tm on task creation perf/core: Drop kernel samples even though :u is specified drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl() drm/vmwgfx: Make sure backup_handle is always valid drm/nouveau/tmr: fully separate alarm execution/pending lists ALSA: timer: Fix race between read and ioctl ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT ASoC: Fix use-after-free at card unregistration cpu/hotplug: Drop the device lock on error drivers: char: mem: Fix wraparound check to allow mappings up to the end serial: sh-sci: Fix panic when serial console and DMA are 
enabled arm64: traps: fix userspace cache maintenance emulation on a tagged pointer arm64: hw_breakpoint: fix watchpoint matching for tagged pointers arm64: entry: improve data abort handling of tagged pointers ARM: 8636/1: Cleanup sanity_check_meminfo ARM: 8637/1: Adjust memory boundaries after reservations usercopy: Adjust tests to deal with SMAP/PAN drm/i915/vbt: don't propagate errors from intel_bios_init() drm/i915/vbt: split out defaults that are set when there is no VBT cpufreq: schedutil: move cached_raw_freq to struct sugov_policy cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() netfilter: nft_set_rbtree: handle element re-addition after deletion Linux 4.9.32 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-06-14 16:42:56 +02:00
Waiman Long	dff4c8bb13	cgroup: Prevent kill_css() from being called more than once commit 33c35aa4817864e056fd772230b0c6b552e36ea2 upstream. The kill_css() function may be called more than once under the condition that the css was killed but not physically removed yet followed by the removal of the cgroup that is hosting the css. This patch prevents any harm from being done when that happens. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-06-14 15:06:00 +02:00
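The guard described in this commit can be modeled in userspace as an idempotent "kill" that records its first invocation in a flag. This is a sketch only: the upstream fix appears to test and set a CSS_DYING bit in css->flags while holding cgroup_mutex, but the struct, the flag value and the counter below are hypothetical stand-ins, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's CSS_DYING flag bit. */
#define MODEL_CSS_DYING 0x1u

/* Hypothetical, heavily simplified model of a cgroup_subsys_state. */
struct model_css {
	unsigned int flags;
	int kill_count; /* how many times the teardown actually ran */
};

/* Idempotent kill: only the first call performs the teardown. */
static void model_kill_css(struct model_css *css)
{
	assert(css != NULL);
	if (css->flags & MODEL_CSS_DYING)
		return; /* already killed but not yet freed: do nothing */
	css->flags |= MODEL_CSS_DYING;
	css->kill_count++; /* stands in for the real teardown work */
}
```

Killing a controller's css and then removing the hosting cgroup calls model_kill_css() twice on the same object, but the teardown counter only advances once.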
Alexei Starovoitov	1ee2b4b803	BACKPORT: bpf: introduce BPF_F_ALLOW_OVERRIDE flag If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command to the given cgroup the descendent cgroup will be able to override effective bpf program that was inherited from this cgroup. By default it's not passed, therefore override is disallowed. Examples: 1. prog X attached to /A with default prog Y fails to attach to /A/B and /A/B/C Everything under /A runs prog X 2. prog X attached to /A with allow_override. prog Y fails to attach to /A/B with default (non-override) prog M attached to /A/B with allow_override. Everything under /A/B runs prog M only. 3. prog X attached to /A with allow_override. prog Y fails to attach to /A with default. The user has to detach first to switch the mode. In the future this behavior may be extended with a chain of non-overridable programs. Also fix the bug where detach from cgroup where nothing is attached was not throwing error. Return ENOENT in such case. Add several testcases and adjust libbpf. Fixes: 3007098494be ("cgroup: add support for eBPF programs") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Daniel Mack <daniel@zonque.org> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28 ("cgroup: add support for eBPF programs") [AmitP: Refactored original patch for android-4.9 where libbpf sources are in samples/bpf/ and test_cgrp2_attach2, test_cgrp2_sock, and test_cgrp2_sock2 sample tests do not exist.] (cherry picked from commit 7f677633379b4abb3281cdbe7e7006f049305c03) Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-05-30 17:27:28 -07:00
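The attach rules in the three examples of this commit can be condensed into a small userspace model. It is a sketch under simplifying assumptions: one attach type, integer program IDs instead of struct bpf_prog, and an ancestor walk instead of the per-cgroup effective pointers the kernel maintains. model_attach() and model_detach() are hypothetical names; returning -EPERM for a refused attach is an assumption based on the upstream code, while -ENOENT for detaching from an empty cgroup is stated in the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one attach type's state in a cgroup. */
struct model_cg {
	struct model_cg *parent;
	int prog;               /* 0 means "nothing attached here" */
	bool disallow_override; /* meaningful only when prog != 0 */
};

/* Nearest strict ancestor of c that has a program attached, or NULL. */
static struct model_cg *effective_owner(const struct model_cg *c)
{
	for (struct model_cg *p = c->parent; p; p = p->parent)
		if (p->prog)
			return p;
	return NULL;
}

/* Attach prog to c following the commit's rules; returns 0 or -EPERM. */
static int model_attach(struct model_cg *c, int prog, bool allow_override)
{
	struct model_cg *owner = effective_owner(c);

	assert(prog != 0);
	if (owner && owner->disallow_override)
		return -EPERM; /* inherited program is not overridable */
	if (owner && !allow_override)
		return -EPERM; /* below an overridable prog, only overridable ones */
	if (c->prog && c->disallow_override == allow_override)
		return -EPERM; /* mode switch in place: detach first */
	c->prog = prog;
	c->disallow_override = !allow_override;
	return 0;
}

/* Detach from c; -ENOENT when nothing is attached there. */
static int model_detach(struct model_cg *c)
{
	if (!c->prog)
		return -ENOENT;
	c->prog = 0;
	c->disallow_override = false;
	return 0;
}
```

With /A, /A/B and /A/B/C as three linked nodes, example 1 refuses any attach below a default-mode program, example 2 refuses a default-mode program under an overridable one but accepts an overridable one, and example 3 refuses switching /A's mode without a detach.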
Daniel Mack	f791c42b63	UPSTREAM: cgroup: add support for eBPF programs Cherry-pick from commit 3007098494bec614fb55dee7bc0410bb7db5ad18 This patch adds two sets of eBPF program pointers to struct cgroup. One for such that are directly pinned to a cgroup, and one for such that are effective for it. To illustrate the logic behind that, assume the following example cgroup hierarchy. A - B - C \ D - E If only B has a program attached, it will be effective for B, C, D and E. If D then attaches a program itself, that will be effective for both D and E, and the program in B will only affect B and C. Only one program of a given type is effective for a cgroup. Attaching and detaching programs will be done through the bpf(2) syscall. For now, ingress and egress inet socket filtering are the only supported use-cases. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 30950746 Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28	2017-05-22 15:30:56 -07:00
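The "effective program" rule from the A - B - C / D - E example (D and E hanging off B) can be expressed as a lookup of the closest ancestor-or-self that has a program attached. This is a hypothetical model: model_node and model_effective() are illustrative names, and the commit describes the kernel keeping a separate effective pointer per cgroup rather than walking the tree for every event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node in a cgroup hierarchy (one program type only). */
struct model_node {
	struct model_node *parent;
	int prog; /* program directly attached here; 0 means none */
};

/* Effective program: the one attached to the closest ancestor-or-self. */
static int model_effective(const struct model_node *n)
{
	for (; n; n = n->parent)
		if (n->prog)
			return n->prog;
	return 0; /* nothing attached anywhere up the tree */
}
```

Attaching program 1 to B makes it effective for B, C, D and E; attaching program 2 to D then shadows it for D and E only.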
Dmitry Shmidt	5b9202d62b	Revert "ANDROID: [RFC]cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions" This reverts commit `8cc698d951`. Change-Id: Iad523b2f7fa83c461a5e965272319fd8f65ef10b Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>	2017-04-25 12:42:05 -07:00
Greg Kroah-Hartman	a2659b2b78	This is the 4.9.24 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlj5tWEACgkQONu9yGCS aT4noA//ZKztJudzNoXOaYThKQrbKTJeB+cL4PusLqd7gv9yKt7IJD+5+V/+jYRS N5GlwNOS02r6tduhCz9I/XjRKzilvC2WHt3Akf8nPC2dadhzcOW3UPvT0IEoIrHy FkTnUn/1TvBA2I1uB8k93nk/jfx7RYrO+4axiyZ0XQCMJTc7M4MmPTGw2jrzxPug Ogy7taNpY+By7PxJyU4rH8Lh2jiIYGYDrRWExZIcL7A7IlWBPbVkagzn2PPgQbg4 XX250JZ7bQyCGcrVxlTqaBScEkdICZTUqI3d3HQUwF39QBq9z+VlqP13T5j7XMea d7jHZswWClQj4xeGWq+iiLR0VF5OVCTkbhl61YTI9rPFEaewSwMiGrpHiidKTaOK KAxa34dwrz1IVI/p/a9DbZBj7IpwiH3jF3rEvx1ieEbTGHxLBi/COGK3aCES5TtX mjz/WjReo4aR3fudgNFJqNsmJmgEJ3yAO7Lk3tglBmYOgUpQhbRwClPjYvXVphzH ClYE1KCMMV4BTcDUqEfoBL46Qd2141E9Wxgmh+KYB+14X0QqQg+qRyZYyOvUGIuD A7J7MbXyBuM0dXS2t9olLbHb+tllGz6lBHANf8g141ot8jjlCBoP5yjBP6W0gZP8 UsKSH/sLEPGMO3KD4D2lLixkoENilS+3DhJwmrrqZAhhaqlMPEI= =/rwk -----END PGP SIGNATURE----- Merge 4.9.24 into android-4.9 Changes in 4.9.24: cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups tcmu: Fix possible overwrite of t_data_sg's last iov[] tcmu: Fix wrongly calculating of the base_command_size tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case thp: fix MADV_DONTNEED vs. 
MADV_FREE race thp: fix MADV_DONTNEED vs clear soft dirty race zsmalloc: expand class bit orangefs: free superblock when mount fails drm/nouveau/mpeg: mthd returns true on success now drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one drm/etnaviv: fix missing unlock on error in etnaviv_gpu_submit() CIFS: reconnect thread reschedule itself CIFS: store results of cifs_reopen_file to avoid infinite wait Input: xpad - add support for Razer Wildcat gamepad perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32() x86/efi: Don't try to reserve runtime regions x86/signals: Fix lower/upper bound reporting in compat siginfo x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions x86/vdso: Ensure vdso32_enabled gets set to valid values only x86/vdso: Plug race between mapping and ELF header setup acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison) ACPI / scan: Set the visited flag for all enumerated devices parisc: fix bugs in pa_memcpy efi/libstub: Skip GOP with PIXEL_BLT_ONLY format efi/fb: Avoid reconfiguration of BAR that covers the framebuffer iscsi-target: Fix TMR reference leak during session shutdown iscsi-target: Drop work-around for legacy GlobalSAN initiator scsi: sr: Sanity check returned mode data scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable scsi: qla2xxx: Add fix to read correct register value for ISP82xx. scsi: sd: Fix capacity calculation with 32-bit sector_t target: Avoid mappedlun symlink creation during lun shutdown xen, fbfront: fix connecting to backend new privimitive: iov_iter_revert() make skb_copy_datagram_msg() et.al. 
preserve ->msg_iter on error libnvdimm: fix blk free space accounting libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat can: ifi: use correct register to read rx status pwm: rockchip: State of PWM clock should synchronize with PWM enabled state cpufreq: Bring CPUs up even if cpufreq_online() failed irqchip/irq-imx-gpcv2: Fix spinlock initialization ftrace: Fix removing of second function probe char: lack of bool string made CONFIG_DEVPORT always on Revert "MIPS: Lantiq: Fix cascaded IRQ setup" kvm: fix page struct leak in handle_vmon zram: do not use copy_page with non-page aligned address ftrace: Fix function pid filter on instances crypto: algif_aead - Fix bogus request dereference in completion function crypto: ahash - Fix EINPROGRESS notification callback parisc: Fix get_user() for 64-bit value on 32-bit kernel ath9k: fix NULL pointer dereference dvb-usb-v2: avoid use-after-free ext4: fix inode checksum calculation problem if i_extra_size is small mm: memcontrol: use special workqueue for creating per-memcg caches drm/nouveau/disp/mcp7x: disable dptmds workaround nbd: use loff_t for blocksize and nbd_set_size args nbd: fix 64-bit division ASoC: Intel: select DW_DMAC_CORE since it's mandatory platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event x86/xen: Fix APIC id mismatch warning on Intel ACPI / EC: Use busy polling mode when GPE is not enabled rtc: tegra: Implement clock handling mm: Tighten x86 /dev/mem with zeroing reads dvb-usb: don't use stack for firmware load dvb-usb-firmware: don't do DMA on stack cxusb: Use a dma capable buffer also for reading virtio-console: avoid DMA from stack net: ipv6: check route protocol when deleting routes sctp: deny peeloff operation on asocs with threads sleeping on it Linux 4.9.24 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-04-21 09:48:33 +02:00
Tejun Heo	f44236a1b0	cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream. Creation of a kthread goes through a couple of interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrect CPU affinity, breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. 
It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-21 09:31:18 +02:00
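The protocol described in this commit can be sketched as a userspace state model: a new kthread starts with no_cgroup_migration set (inherited from kthreadd), userland migration is refused while that flag or PF_NO_SETAFFINITY is set, and the flag is cleared only after initialization. Only the no_cgroup_migration name comes from the commit; model_task, the helper names, the bool standing in for PF_NO_SETAFFINITY, and the -EINVAL return are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified task model. */
struct model_task {
	bool no_cgroup_migration; /* set while a new kthread is initializing */
	bool no_setaffinity;      /* stands in for PF_NO_SETAFFINITY */
	int cgroup;               /* 0 = root cgroup */
};

/* Userland-initiated migration request, e.g. a write to cgroup.procs. */
static int model_migrate(struct model_task *t, int dst)
{
	if (t->no_cgroup_migration || t->no_setaffinity)
		return -EINVAL;
	t->cgroup = dst;
	return 0;
}

/* A new kthread inherits the flag from kthreadd and starts in root. */
static struct model_task model_kthread_create(void)
{
	struct model_task t = { .no_cgroup_migration = true, .cgroup = 0 };
	return t;
}

/* Cleared only after init; a bound kthread has PF_NO_SETAFFINITY by then. */
static void model_kthread_ready(struct model_task *t, bool bound)
{
	if (bound)
		t->no_setaffinity = true;
	t->no_cgroup_migration = false;
}
```

A migration attempt that lands in the window between creation and kthread_bind() is now refused instead of silently moving the worker out of the root cgroup.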
Dmitry Shmidt	dcb6110067	This is the 4.9.9 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlicFX0ACgkQONu9yGCS aT7a2A//UGRG6hYhQwzHp25vRQmfXnHcet1cB9D2ADEY3EqrfWgcZcHWF4IbGtJ2 OWGtsgp3eiOlgYZiQBj1VG90M+8RixM06utcjlQzISsFaFGDPIeAX+IDY3rG51qR Wpo0p9ujV5HMsUJEXZ1OPbqrsgKqKOQpecq9nSZJR5bhdb3SkvgQK8VFbavko60J 5oUWM7n0CRy7VMNjv5UekUss7cNdc4OyqUbCcekYlmGtJ3y5EC9O0k07t7jMjKu8 49MgoXKVEk/7nYapGV3AJISuZxHXgirM9HSHsKXDZoFG94hjSlVHrDBDIykTQQ4t 7XKnCHveB+tGrE+pdDkFtYrN7k6pMdRMxq1NPA2eMpWzsvH4aQWaWXVMESkMIwLP x5C29xkOa9MmOTM4oqCrJ66CGar8Kcz156TEn4MR8Q4TSKoA4AHO85C8jCYOB+ID 18KJVrNLOgybYWr/Ci2GM/lx4A5mD5yTDm++AZRt4hBBo5K9Ns/d9OatlVtTmzgT ollAUkl593/nJoIqA4o0mbaHrwawI9Gn3fgJQyEkiy0IksMkTzj2XcmwAbzOGWaz mIEWLPg9OYZXHQxqNTAGdHPYl2NEWqv5mE0A7B+3qhUsu8VoU3LWET6wTEzl5bHC obaygJ+r/K1pvey+P9BH35ModRSy/qA3YmENEQmuF3//BUGAP4g= =9AU7 -----END PGP SIGNATURE----- Merge tag 'v4.9.9' into android-4.9-aosp This is the 4.9.9 stable release	2017-02-09 10:49:40 -08:00
Tejun Heo	1d88791d5e	cgroup: don't online subsystems before cgroup_name/path() are operational commit 07cd12945551b63ecb1a349d50a6d69d1d6feb4a upstream. While refactoring cgroup creation, `a5bca21520` ("cgroup: factor out cgroup_create() out of cgroup_mkdir()") incorrectly onlined subsystems before the new cgroup is associated with its kernfs_node. This is fine for cgroup proper but cgroup_name/path() depend on the associated kernfs_node and if a subsystem makes the new cgroup_subsys_state visible, which they're allowed to after onlining, it can lead to NULL dereference. The current code performs cgroup creation and subsystem onlining in cgroup_create() and cgroup_mkdir() makes the cgroup and subsystems visible afterwards. There's no reason to online the subsystems early and we can simply drop cgroup_apply_control_enable() call from cgroup_create() so that the subsystems are onlined and made visible at the same time. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: `a5bca21520` ("cgroup: factor out cgroup_create() out of cgroup_mkdir()") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-02-09 08:08:28 +01:00
John Stultz	8cc698d951	ANDROID: [RFC]cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions Try to better match what we're pushing upstream, use CAP_SYS_RESOURCE instead of CAP_SYS_NICE, which shouldn't affect Android as Zygote and system_server already use CAP_SYS_RESOURCE. Change-Id: I9b7ba2d9be1a469c9636497a6287f840891a91a8 Signed-off-by: John Stultz <john.stultz@linaro.org>	2017-01-27 13:55:31 -08:00
Dmitry Torokhov	579a63bf28	CHROMIUM: cgroups: relax permissions on moving tasks between cgroups Android expects system_server to be able to move tasks between different cgroups/cpusets, but does not want to be running as root. Let's relax permission check so that processes can move other tasks if they have CAP_SYS_NICE in the affected task's user namespace. BUG=b:31790445,chromium:647994 TEST=Boot android container, examine logcat Change-Id: Ia919c66ab6ed6a6daf7c4cf67feb38b13b1ad09b Signed-off-by: Dmitry Torokhov <dtor@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/394927 Reviewed-by: Ricky Zhou <rickyz@chromium.org>	2017-01-27 13:55:30 -08:00
Linus Torvalds	f34d3606f7	Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - tracepoints for basic cgroup management operations added - kernfs and cgroup path formatting functions updated to behave in the style of strlcpy() - non-critical bug fixes * 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent() cpuset: fix error handling regression in proc_cpuset_show() cgroup: add tracepoints for basic operations cgroup: make cgroup_path() and friends behave in the style of strlcpy() kernfs: remove kernfs_path_len() kernfs: make kernfs_path*() behave in the style of strlcpy() kernfs: add dummy implementation of kernfs_path_from_node()	2016-10-14 12:18:50 -07:00
Linus Torvalds	14986a34e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This set of changes is a number of smaller things that have been overlooked in other development cycles focused on more fundamental change. The devpts changes are small things that were a distraction until we managed to kill off DEVPTS_MULTIPLE_INSTANCES. There is a trivial regression fix to autofs for the unprivileged mount changes that went in last cycle. A pair of ioctls has been added by Andrey Vagin making it possible to discover the relationships between namespaces when referring to them through file descriptors. The big user visible change is starting to add simple resource limits to catch programs that misbehave. With namespaces in general and user namespaces in particular allowing users to use more kinds of resources, it has become important to have something to limit errant programs. Because the purpose of these limits is to catch errant programs the code needs to be inexpensive to use as it is always on, and the default limits need to be high enough that well behaved programs on well behaved systems don't encounter them. To this end, after some review I have implemented per user per user namespace limits, and use them to limit the number of namespaces. The limits being per user mean that one user cannot exhaust the limits of another user. The limits being per user namespace allow contexts where the limit is 0 and security conscious folks can remove from their threat analysis the code used to manage namespaces (as they have historically done, as it was root only). At the same time the limits being per user namespace allow other parts of the system to use namespaces. Namespaces are increasingly being used in application sandboxing scenarios so an all or nothing disable for the entire system for the security conscious folks makes increasing use of these sandboxes impossible. 
There is also added a limit on the maximum number of mounts present in a single mount namespace. It is nontrivial to guess what a reasonable system wide limit on the number of mount structures in the kernel would be, especially as it varies based on how a system is using containers. A limit on the number of mounts in a mount namespace however is much easier to understand and set. In most cases in practice only about 1000 mounts are used. Given that some autofs scenarios have the potential to be 30,000 to 50,000 mounts I have set the default limit for the number of mounts at 100,000 which is well above every known set of users but low enough that the mount hash tables don't degrade unreasonably. These limits are a start. I expect this establishes a pattern that other limits for resources that namespaces use will follow. There has been interest in making inotify event limits per user per user namespace as well as interest expressed in making details about what is going on in the kernel more visible" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits) autofs: Fix automounts by using current_real_cred()->uid mnt: Add a per mount namespace limit on the number of mounts netns: move {inc,dec}_net_namespaces into #ifdef nsfs: Simplify __ns_get_path tools/testing: add a test to check nsfs ioctl-s nsfs: add ioctl to get a parent namespace nsfs: add ioctl to get an owning user namespace for ns file descriptor kernel: add a helper to get an owning user namespace for a namespace devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts devpts: Remove sync_filesystems devpts: Make devpts_kill_sb safe if fsi is NULL devpts: Simplify devpts_mount by using mount_nodev devpts: Move the creation of /dev/pts/ptmx into fill_super devpts: Move parse_mount_options into fill_super userns: When the per user per user namespace limit is reached return ENOSPC userns; Document per user per user namespace limits. 
mntns: Add a limit on the number of mount namespaces. netns: Add a limit on the number of net namespaces cgroupns: Add a limit on the number of cgroup namespaces ipcns: Add a limit on the number of ipc namespaces ...	2016-10-06 09:52:23 -07:00
Ingo Molnar	0b429e18c2	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-30 10:54:46 +02:00
Tejun Heo	e0223003e6	cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent() `4c737b41de` ("cgroup: make cgroup_path() and friends behave in the style of strlcpy()") broke error handling in proc_cgroup_show() and cgroup_release_agent() by not handling negative return values from cgroup_path_ns_locked(). Fix it. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-09-29 15:55:16 +02:00
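The caller-side handling this fix restores can be sketched as follows: with strlcpy()-style formatters, a negative return is an error and must be checked before the value is treated as a length, and a value at or above the buffer size means the output was truncated. model_path() and model_show() are hypothetical stand-ins for cgroup_path_ns_locked() and proc_cgroup_show(); mapping truncation to -ENAMETOOLONG is an assumption about the fixed code, not something stated in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical path formatter in the style of strlcpy(): returns the full
 * length it wanted to write (possibly >= buflen), or a negative errno. */
static int model_path(const char *name, char *buf, size_t buflen)
{
	if (!name)
		return -ENOENT; /* stand-in for a lookup failure */
	return snprintf(buf, buflen, "/%s", name);
}

/* Caller pattern: handle negative values before using the length. */
static int model_show(const char *name, char *buf, size_t buflen)
{
	int ret = model_path(name, buf, buflen);

	if (ret < 0)
		return ret;           /* propagate the error */
	if ((size_t)ret >= buflen)
		return -ENAMETOOLONG; /* output was truncated */
	return 0;
}
```

The regression pattern was the first check missing: a negative value flowing into code that expects a length.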
Linus Torvalds	8ab293e3a1	Merge branch 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Three late fixes for cgroup: Two cpuset ones, one trivial and the other pretty obscure, and a cgroup core fix for a bug which impacts cgroup v2 namespace users" * 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix invalid controller enable rejections with cgroup namespace cpuset: fix non static symbol warning cpuset: handle race between CPU hotplug and cpuset_hotplug_work	2016-09-27 16:43:11 -07:00
Tejun Heo	9157056da8	cgroup: fix invalid controller enable rejections with cgroup namespace On the v2 hierarchy, "cgroup.subtree_control" rejects controller enables if the cgroup has processes in it. The enforcement of this logic assumes that the cgroup wouldn't have any css_sets associated with it if there are no tasks in the cgroup, which is no longer true since `a79a908fd2` ("cgroup: introduce cgroup namespaces"). When a cgroup namespace is created, it pins the css_set of the creating task to use it as the root css_set of the namespace. This extra reference stays as long as the namespace is around and makes "cgroup.subtree_control" think that the namespace root cgroup is not empty even when it is and thus reject controller enables. Fix it by making cgroup_subtree_control() walk and test emptiness of each css_set instead of testing whether the list_head is empty. While at it, update the comment of cgroup_task_count() to indicate that the returned value may be higher than the number of tasks, which has always been true due to temporary references and doesn't break anything. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Evgeny Vereshchagin <evvers@ya.ru> Cc: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Aditya Kali <adityakali@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: stable@vger.kernel.org # v4.6+ Fixes: `a79a908fd2` ("cgroup: introduce cgroup namespaces") Link: https://github.com/systemd/systemd/pull/3589#issuecomment-249089541	2016-09-23 16:55:49 -04:00
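The fix swaps a "list is empty" test for a walk that checks whether any linked css_set is actually in use. Here is a simplified, hypothetical model of that idea. The struct and function names are illustrative only, and a plain task count stands in for the kernel's per-css_set populated check. It shows why a css_set pinned by a cgroup namespace made the old test report a non-empty cgroup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a cgroup links to css_sets, each with a task count. */
struct css_set_model {
	int nr_tasks;
	struct css_set_model *next;
};

struct cgroup_model {
	struct css_set_model *cset_links;	/* may be non-NULL with zero tasks */
};

/* Old assumption: "some css_set is linked" means "has tasks". A cgroup
 * namespace pinning its root css_set breaks this. */
static bool has_tasks_by_list(const struct cgroup_model *cgrp)
{
	return cgrp->cset_links != NULL;
}

/* Fixed approach: walk every linked css_set and test it individually. */
static bool has_tasks_by_walk(const struct cgroup_model *cgrp)
{
	for (const struct css_set_model *c = cgrp->cset_links; c; c = c->next)
		if (c->nr_tasks > 0)
			return true;
	return false;
}
```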
Eric W. Biederman	7872559664	Merge branch 'nsfs-ioctls' into HEAD From: Andrey Vagin <avagin@openvz.org> Each namespace has an owning user namespace and now there is no way to discover these relationships. Pid and user namespaces are hierarchical. There is no way to discover parent-child relationships either. Why might we want to know relationships between namespaces? One use would be visualization, in order to understand the running system. Another would be to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? One more use-case (which is usually called abnormal) is checkpoint/restart. In CRIU we are going to dump and restore nested namespaces. There [1] was a discussion about which interface to choose to determine relationships between namespaces. Eric suggested adding two ioctl-s [2]: > Grumble, Grumble. I think this may actually a case for creating ioctls > for these two cases. Now that random nsfs file descriptors are bind > mountable the original reason for using proc files is not as pressing. > > One ioctl for the user namespace that owns a file descriptor. > One ioctl for the parent namespace of a namespace file descriptor. Here is an implementation of these ioctl-s. $ man man7/namespaces.7 ... Since Linux 4.X, the following ioctl(2) calls are supported for namespace file descriptors. The correct syntax is: fd = ioctl(ns_fd, ioctl_type); where ioctl_type is one of the following: NS_GET_USERNS Returns a file descriptor that refers to an owning user namespace. NS_GET_PARENT Returns a file descriptor that refers to a parent namespace. This ioctl(2) can be used for pid and user namespaces. For user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same meaning. In addition to generic ioctl(2) errors, the following specific ones can occur: EINVAL NS_GET_PARENT was called for a nonhierarchical namespace. EPERM The requested namespace is outside of the current namespace scope. 
[1] https://lkml.org/lkml/2016/7/6/158 [2] https://lkml.org/lkml/2016/7/9/101 Changes for v2: * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing outside of the init namespace, so we can return EPERM in this case too. > The fewer special cases the easier the code is to get > correct, and the easier it is to read. // Eric Changes for v3: * rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: "W. Trevor King" <wking@tremily.us> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@canonical.com>	2016-09-22 20:00:36 -05:00
Andrey Vagin	bcac25a58b	kernel: add a helper to get an owning user namespace for a namespace Return -EPERM if an owning user namespace is outside of the process's current user namespace. v2: In a first version ns_get_owner returned ENOENT for init_user_ns. This special case was removed from this version. There is nothing outside of init_user_ns, so we can return EPERM. v3: rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2016-09-22 19:59:39 -05:00
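The "outside of the current user namespace" rule amounts to an ancestor walk. The owner is visible only if the caller's user namespace is the owner itself or one of the owner's ancestors. The sketch below models this with a hypothetical `struct userns_model` that carries only a parent pointer; it is not the kernel's `struct user_namespace`. It also shows the v2 simplification: asking for the owner of `init_user_ns` yields a NULL owner, which falls through to `-EPERM`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user namespace model: only the parent link matters. */
struct userns_model {
	const struct userns_model *parent;	/* NULL for the initial namespace */
};

/* Returns 0 if `owner` is `mine` or a descendant of `mine`, else -EPERM.
 * A NULL owner (the "owner" of the initial namespace) also gives -EPERM. */
static int check_owner_visible(const struct userns_model *owner,
			       const struct userns_model *mine)
{
	for (const struct userns_model *p = owner; p; p = p->parent)
		if (p == mine)
			return 0;
	return -EPERM;
}
```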
Eric W. Biederman	df75e7748b	userns: When the per user per user namespace limit is reached return ENOSPC The current error codes returned when the per user per user namespace limits are hit (EINVAL, EUSERS, and ENFILE) are wrong. I asked for advice on linux-api and it was made clear that those were the wrong error codes, but a correct error code was not suggested. The best general error code I have found for hitting a resource limit is ENOSPC. It is not perfect but as it is unambiguous it will serve until someone comes up with a better error code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-09-22 13:25:56 -05:00
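The behavior change is easiest to picture as a per-user counter that refuses to go past its limit and reports `ENOSPC` rather than `EINVAL`, `EUSERS`, or `ENFILE`. The sketch below is a hypothetical, single-threaded model. `struct ns_counter`, `charge_namespace()`, and `uncharge_namespace()` are illustrative names and not the kernel's ucounts implementation, which uses atomics and a per-user hierarchy.

```c
#include <errno.h>

/* Hypothetical per-user counter for one namespace type. */
struct ns_counter {
	int count;	/* namespaces of this type the user currently owns */
	int max;	/* per user per user namespace limit */
};

/* Charges one namespace, or fails with -ENOSPC once the limit is hit. */
static int charge_namespace(struct ns_counter *c)
{
	if (c->count >= c->max)
		return -ENOSPC;
	c->count++;
	return 0;
}

/* Releases one namespace when it is destroyed. */
static void uncharge_namespace(struct ns_counter *c)
{
	if (c->count > 0)
		c->count--;
}
```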
Ingo Molnar	7cf0f1426a	Merge branch 'locking/urgent' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-22 15:21:48 +02:00
Johannes Weiner	d979a39d72	cgroup: duplicate cgroup reference when cloning sockets When a socket is cloned, the associated sock_cgroup_data is duplicated but not its reference on the cgroup. As a result, the cgroup reference count will underflow when both sockets are destroyed later on. Fixes: `bd1060a1d6` ("sock, cgroup: add sock->sk_cgroup") Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-09-19 15:36:17 -07:00
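The underflow described above comes from copying a pointer without copying the reference that goes with it. Below is a hypothetical refcount model of the fix, in which a cloned socket takes its own reference. The names are illustrative and not the kernel's `sock_cgroup_data` API. Without the `cgroup_get_model()` call in `sock_clone_model()`, destroying both sockets would drop one reference too many.

```c
#include <string.h>

/* Hypothetical refcounted cgroup and a socket that points at it. */
struct cgroup_ref {
	int refcnt;
};

struct sock_model {
	struct cgroup_ref *cgrp;
};

static void cgroup_get_model(struct cgroup_ref *c) { c->refcnt++; }
static void cgroup_put_model(struct cgroup_ref *c) { c->refcnt--; }

/* Cloning copies the data byte-for-byte, so the clone must also take
 * its own reference on the cgroup it now points to. */
static void sock_clone_model(struct sock_model *dst, const struct sock_model *src)
{
	memcpy(dst, src, sizeof(*dst));
	cgroup_get_model(dst->cgrp);
}

/* Each socket drops exactly the one reference it holds. */
static void sock_destroy_model(struct sock_model *sk)
{
	cgroup_put_model(sk->cgrp);
	sk->cgrp = NULL;
}
```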
Peter Zijlstra	3942a9bd7b	locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write() The current percpu-rwsem read side is entirely free of serializing insns at the cost of having a synchronize_sched() in the write path. The latency of the synchronize_sched() is too high for cgroups. The commit `1ed1328792` talks about the write path being a fairly cold path but this is not the case for Android which moves tasks to the foreground cgroup and back around binder IPC calls from foreground processes to background processes, so it is significantly hotter than human initiated operations. Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the problem; hopefully it should not be that slow after another commit: `80127a3968` ("locking/percpu-rwsem: Optimize readers and reduce global impact"). We could just add rcu_sync_enter() into cgroup_init() but we do not want another synchronize_sched() at boot time, so this patch adds the new helper which doesn't block but currently can only be called before the first use. Reported-by: John Stultz <john.stultz@linaro.org> Reported-by: Dmitry Shmidt <dimitrysh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Colin Cross <ccross@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rom Lemarchand <romlem@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Link: http://lkml.kernel.org/r/20160811165413.GA22807@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-18 15:36:59 +02:00
Tejun Heo	ed1777de25	cgroup: add tracepoints for basic operations Debugging what goes wrong with cgroup setup can get hairy. Add tracepoints for cgroup hierarchy mount, cgroup creation/destruction and task migration operations for better visibility. Signed-off-by: Tejun Heo <tj@kernel.org>	2016-08-10 11:23:44 -04:00
Tejun Heo	4c737b41de	cgroup: make cgroup_path() and friends behave in the style of strlcpy() cgroup_path() and friends used to format the path from the end and thus the resulting path usually didn't start at the start of the passed in buffer. Also, when the buffer was too small, the partial result was truncated from the head rather than tail and there was no way to tell how long the full path would be. These make the functions less robust and more awkward to use. With recent updates to kernfs_path(), cgroup_path() and friends can be made to behave in strlcpy() style. * cgroup_path(), cgroup_path_ns[_locked]() and task_cgroup_path() now always return the length of the full path. If buffer is too small, it contains nul terminated truncated output. * All users updated accordingly. v2: cgroup_path() usage in kernel/sched/debug.c converted. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Peter Zijlstra <peterz@infradead.org>	2016-08-10 11:23:44 -04:00
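The strlcpy() style contract described above has three parts. Output is written from the start of the buffer, truncation drops the tail rather than the head, and the return value is always the full length, so callers can detect truncation by comparing it with the buffer size. Here is a hypothetical path joiner that follows that contract. `join_path()` and `put_str()` are illustrative names, not cgroup functions.

```c
#include <stddef.h>

/* Appends s at *len, writing only while room remains for the NUL;
 * *len always advances by the full length of s. */
static void put_str(char *buf, size_t buflen, size_t *len, const char *s)
{
	for (; *s; s++, (*len)++)
		if (*len + 1 < buflen)
			buf[*len] = *s;
}

/* Joins components as "/c1/c2/..." ("/" when there are none), always
 * NUL-terminates when buflen > 0, and returns the full path length. */
static size_t join_path(char *buf, size_t buflen,
			const char *const *parts, size_t nparts)
{
	size_t len = 0;

	if (nparts == 0)
		put_str(buf, buflen, &len, "/");
	for (size_t i = 0; i < nparts; i++) {
		put_str(buf, buflen, &len, "/");
		put_str(buf, buflen, &len, parts[i]);
	}
	if (buflen > 0)
		buf[len < buflen ? len : buflen - 1] = '\0';
	return len;
}
```

A return value greater than or equal to `buflen` signals truncation. The caller then knows exactly how large a buffer the full path needs (the return value plus one for the NUL).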
Eric W. Biederman	d08311dd6f	cgroupns: Add a limit on the number of cgroup namespaces Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-08-08 14:42:03 -05:00
Linus Torvalds	574c7e2333	Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull more cgroup updates from Tejun Heo: "I forgot to include the patches which got applied to for-4.7-fixes late during last cycle. Eric's three patches fix bugs introduced with the namespace support" * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroupns: Only allow creation of hierarchies in the initial cgroup namespace cgroupns: Close race between cgroup_post_fork and copy_cgroup_ns cgroupns: Fix the locking in copy_cgroup_ns	2016-07-29 14:29:04 -07:00
Linus Torvalds	468fc7ed55	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Unified UDP encapsulation offload methods for drivers, from Alexander Duyck. 2) Make DSA binding more sane, from Andrew Lunn. 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli. 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar. 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX packets as soon as the device sees them, with the option to mirror the packet on TX via the same interface. From Brenden Blanco and others. 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet. 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli. 8) Simplify netlink conntrack entry layout, from Florian Westphal. 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido Schimmel, Yotam Gigi, and Jiri Pirko. 10) Add SKB array infrastructure and convert tun and macvtap over to it. From Michael S Tsirkin and Jason Wang. 11) Support qdisc packet injection in pktgen, from John Fastabend. 12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy. 13) Add NV congestion control support to TCP, from Lawrence Brakmo. 14) Add GSO support to SCTP, from Marcelo Ricardo Leitner. 15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni. 16) Support MPLS over IPV4, from Simon Horman. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) xgene: Fix build warning with ACPI disabled. be2net: perform temperature query in adapter regardless of its interface state l2tp: Correctly return -EBADF from pppol2tp_getname. 
net/mlx5_core/health: Remove deprecated create_singlethread_workqueue net: ipmr/ip6mr: update lastuse on entry change macsec: ensure rx_sa is set when validation is disabled tipc: dump monitor attributes tipc: add a function to get the bearer name tipc: get monitor threshold for the cluster tipc: make cluster size threshold for monitoring configurable tipc: introduce constants for tipc address validation net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() MAINTAINERS: xgene: Add driver and documentation path Documentation: dtb: xgene: Add MDIO node dtb: xgene: Add MDIO node drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset drivers: net: xgene: Use exported functions drivers: net: xgene: Enable MDIO driver drivers: net: xgene: Add backward compatibility drivers: net: phy: xgene: Add MDIO driver ...	2016-07-27 12:03:20 -07:00
Linus Torvalds	b55b048718	Merge branch 'for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too exciting. - updates to the pids controller so that pid limit breaches can be noticed and monitored from userland. - cleanups and non-critical bug fixes" * 'for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: remove duplicated include from cgroup.c cgroup: Use lld instead of ld when printing pids controller events_limit cgroup: Add pids controller event when fork fails because of pid limit cgroup: allow NULL return from ss->css_alloc() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root	2016-07-26 14:34:17 -07:00
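The pids controller change makes pid limit breaches visible from userland through an events counter that is printed with `%lld`. The sketch below assumes the counter appears in a `pids.events` file as a single `max <count>` line, where the count is the number of forks rejected because of the limit. `parse_pids_events_max()` is a hypothetical helper that extracts that count from the file's contents.

```c
#include <stdio.h>

/* Parses text assumed to look like "max <count>\n" and returns <count>,
 * or -1 if the text does not match that format. */
static long long parse_pids_events_max(const char *text)
{
	long long count;

	if (sscanf(text, "max %lld", &count) != 1)
		return -1;
	return count;
}
```

A monitoring tool could read the file periodically and alert when the returned value increases, which indicates that new forks have been refused.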


931 commits