How to trigger gaming mode? Just open a game that is supported in games list
How to exit? Kill the game from recents, or simply back to homescreen
What does this gaming mode do?
- It limits big cluster maximum frequency to 2,0GHz, and little cluster
to values matching GPU frequencies as below:
+ 338MHz: 455MHz
+ 385MHz and above: 1053MHz
- As for the cluster freq limits, it overcomes heating issue while playing
heavy games, as well as saves battery juice
Big thanks to [kerneltoast] for the following commits on his wahoo kernel
5ac1e81d3de13e2c4554
Gaming control's idea was based on these
Signed-off-by: Diep Quynh <remilia.1505@gmail.com>
introduce BPF_PROG_QUERY command to retrieve a set of either
attached programs to given cgroup or a set of effective programs
that will execute for events within a cgroup
Change-Id: I05e0ed5f6eddc30f4a18216d4541448816fd1ae5
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.
The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
that cgroup program gets run in addition to the program in this cgroup.
NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.
Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.
To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0)
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 121213201
Bug: 138317270
Test: build & boot cuttlefish
Change-Id: If17b11a773f73d45ea565a947fc1bf7e158db98d
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl50ed8ACgkQONu9yGCS
aT7A8A//WF8aq34T1LLR70ESa6fJVj2dEzCQA94qvUCPHDJfBj1x00R8OGvZfKEc
E7YPI8MPrvMmXnyxImge2UcqCJhmnis+gk1fLn+N3Yi5nfZWrxC1dnShCnCTeye1
tA+BuhyzOXHDmLqrktBcT3TCTrIUWK8lehR8m7jlkAh7wG6Z2QmgKlaiq5fRAVE9
Z0je4fw0HxdVrCM84BJiM1M6w3pa+AYUkSBfqvU+fgm8xfOWGUPZFEitx9O1PwaH
m0UOS51gM5ZSzpAMDBlXYnmUgPxUTlpwqIL77zgMFzYcpKbf2rVphFRxmacWWqQ2
MWnt6vn/eatDeJ6TFAF08APy63SUAR7BvssSYoySz8Y4m6LbL4yUfCWYkL9lhLFH
dJL6TTjUwFw1Rd7UVzHongEr24NASCS37MiR8V/3RUiDqIYtrlaJKxhhVdTKg7mm
xRKA+QBXkGUfwFJm2mL+3E3D7/Xgl4kNo22lp1YgamTgc50/UMB5JEf8Z3aqa/qh
ie/4oKnmaa+SSgOxQmvqS3lWF7RYU+axvc9K0FK5CYSXnc6Jfgtq0yDB5twslj05
HdCwNJX9KOL0NZAzQKs+FLjnhOtnRigANMG7KMm5nmS3vwsxQqIkm0b6tPhgBcd/
k4QQm8h6C5Tdi+R39fGQOgtBK0gVVJl+n3XfA3gbmunvXhSo6IQ=
=fFH1
-----END PGP SIGNATURE-----
Merge 4.9.217 into android-4.9-q
Changes in 4.9.217
NFS: Remove superfluous kmap in nfs_readdir_xdr_to_array
phy: Revert toggling reset changes.
net: phy: Avoid multiple suspends
cgroup, netclassid: periodically release file_lock on classid updating
gre: fix uninit-value in __iptunnel_pull_header
ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface
net: macsec: update SCI upon MAC address change.
net: nfc: fix bounds checking bugs on "pipe"
r8152: check disconnect status after long sleep
bnxt_en: reinitialize IRQs when MTU is modified
fib: add missing attribute validation for tun_id
nl802154: add missing attribute validation
nl802154: add missing attribute validation for dev_type
macsec: add missing attribute validation for port
net: fq: add missing attribute validation for orphan mask
team: add missing attribute validation for port ifindex
team: add missing attribute validation for array index
nfc: add missing attribute validation for SE API
nfc: add missing attribute validation for vendor subcommand
ipvlan: add cond_resched_rcu() while processing muticast backlog
ipvlan: do not add hardware address of master to its unicast filter list
ipvlan: egress mcast packets are not exceptional
ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast()
ipvlan: don't deref eth hdr before checking it's set
macvlan: add cond_resched() during multicast processing
net: fec: validate the new settings in fec_enet_set_coalesce()
slip: make slhc_compress() more robust against malicious packets
bonding/alb: make sure arp header is pulled before accessing it
cgroup: memcg: net: do not associate sock with unrelated cgroup
net: phy: fix MDIO bus PM PHY resuming
virtio-blk: fix hw_queue stopped on arbitrary error
iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
workqueue: don't use wq_select_unbound_cpu() for bound works
drm/amd/display: remove duplicated assignment to grph_obj_type
cifs_atomic_open(): fix double-put on late allocation failure
gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
KVM: x86: clear stale x86_emulate_ctxt->intercept value
ARC: define __ALIGN_STR and __ALIGN symbols for ARC
efi: Fix a race and a buffer overflow while reading efivars via sysfs
iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
nl80211: add missing attribute validation for critical protocol indication
nl80211: add missing attribute validation for beacon report scanning
nl80211: add missing attribute validation for channel switch
netfilter: cthelper: add missing attribute validation for cthelper
mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()
iommu/vt-d: Fix the wrong printing in RHSA parsing
iommu/vt-d: Ignore devices with out-of-spec domain number
ipv6: restrict IPV6_ADDRFORM operation
efi: Add a sanity check to efivar_store_raw()
batman-adv: Fix double free during fragment merge error
batman-adv: Fix transmission of final, 16th fragment
batman-adv: Initialize gw sel_class via batadv_algo
batman-adv: Fix rx packet/bytes stats on local ARP reply
batman-adv: Use default throughput value on cfg80211 error
batman-adv: Accept only filled wifi station info
batman-adv: fix TT sync flag inconsistencies
batman-adv: Avoid spurious warnings from bat_v neigh_cmp implementation
batman-adv: Always initialize fragment header priority
batman-adv: Fix check of retrieved orig_gw in batadv_v_gw_is_eligible
batman-adv: Fix lock for ogm cnt access in batadv_iv_ogm_calc_tq
batman-adv: Fix internal interface indices types
batman-adv: Avoid race in TT TVLV allocator helper
batman-adv: Fix TT sync flags for intermediate TT responses
batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
batman-adv: Fix debugfs path for renamed hardif
batman-adv: Fix debugfs path for renamed softif
batman-adv: Avoid storing non-TT-sync flags on singular entries too
batman-adv: Fix multicast TT issues with bogus ROAM flags
batman-adv: Prevent duplicated gateway_node entry
batman-adv: Fix duplicated OGMs on NETDEV_UP
batman-adv: Avoid free/alloc race when handling OGM2 buffer
batman-adv: Avoid free/alloc race when handling OGM buffer
batman-adv: Don't schedule OGM for disabled interface
batman-adv: update data pointers after skb_cow()
batman-adv: Avoid probe ELP information leak
batman-adv: Use explicit tvlv padding for ELP packets
perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
ACPI: watchdog: Allow disabling WDAT at boot
HID: apple: Add support for recent firmware on Magic Keyboards
HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
cfg80211: check reg_rule for NULL in handle_channel_custom()
net: ks8851-ml: Fix IRQ handling and locking
mac80211: rx: avoid RCU list traversal under mutex
signal: avoid double atomic counter increments for user accounting
jbd2: fix data races at struct journal_head
ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional()
ARM: 8958/1: rename missed uaccess .fixup section
mm: slub: add missing TID bump in kmem_cache_alloc_bulk()
ipv4: ensure rcu_read_lock() in cipso_v4_error()
Linux 4.9.217
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia7aeed273cd7548dc8d0dfaaad8b96bedfe499b1
[ Upstream commit e876ecc67db80dfdb8e237f71e5b43bb88ae549c ]
We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.
mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.
Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.
WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.
The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
<IRQ>
dump_stack+0x57/0x75
mem_cgroup_sk_alloc+0xe9/0xf0
sk_clone_lock+0x2a7/0x420
inet_csk_clone_lock+0x1b/0x110
tcp_create_openreq_child+0x23/0x3b0
tcp_v6_syn_recv_sock+0x88/0x730
tcp_check_req+0x429/0x560
tcp_v6_rcv+0x72d/0xa40
ip6_protocol_deliver_rcu+0xc9/0x400
ip6_input+0x44/0xd0
? ip6_protocol_deliver_rcu+0x400/0x400
ip6_rcv_finish+0x71/0x80
ipv6_rcv+0x5b/0xe0
? ip6_sublist_rcv+0x2e0/0x2e0
process_backlog+0x108/0x1e0
net_rx_action+0x26b/0x460
__do_softirq+0x104/0x2a6
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq.part.19+0x40/0x50
__local_bh_enable_ip+0x51/0x60
ip6_finish_output2+0x23d/0x520
? ip6table_mangle_hook+0x55/0x160
__ip6_finish_output+0xa1/0x100
ip6_finish_output+0x30/0xd0
ip6_output+0x73/0x120
? __ip6_finish_output+0x100/0x100
ip6_xmit+0x2e3/0x600
? ipv6_anycast_cleanup+0x50/0x50
? inet6_csk_route_socket+0x136/0x1e0
? skb_free_head+0x1e/0x30
inet6_csk_xmit+0x95/0xf0
__tcp_transmit_skb+0x5b4/0xb20
__tcp_send_ack.part.60+0xa3/0x110
tcp_send_ack+0x1d/0x20
tcp_rcv_state_process+0xe64/0xe80
? tcp_v6_connect+0x5d1/0x5f0
tcp_v6_do_rcv+0x1b1/0x3f0
? tcp_v6_do_rcv+0x1b1/0x3f0
__release_sock+0x7f/0xd0
release_sock+0x30/0xa0
__inet_stream_connect+0x1c3/0x3b0
? prepare_to_wait+0xb0/0xb0
inet_stream_connect+0x3b/0x60
__sys_connect+0x101/0x120
? __sys_getsockopt+0x11b/0x140
__x64_sys_connect+0x1a/0x20
do_syscall_64+0x51/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d75807383 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.
The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
that cgroup program gets run in addition to the program in this cgroup.
NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.
Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.
To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0)
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 121213201
Bug: 138317270
Test: build & boot cuttlefish
Change-Id: I06b71c850b9f3e052b106abab7a4a3add012a3f8
Psi monitor aims to provide a low-latency short-term pressure
detection mechanism configurable by users. It allows users to
monitor psi metrics growth and trigger events whenever a metric
raises above user-defined threshold within user-defined time window.
Time window and threshold are both expressed in usecs. Multiple psi
resources with different thresholds and window sizes can be monitored
concurrently.
Psi monitors activate when system enters stall state for the monitored
psi metric and deactivate upon exit from the stall state. While system
is in the stall state psi signal growth is monitored at a rate of 10 times
per tracking window. Min window size is 500ms, therefore the min monitoring
interval is 50ms. Max window size is 10s with monitoring interval of 1s.
When activated psi monitor stays active for at least the duration of one
tracking window to avoid repeated activations/deactivations when psi
signal is bouncing.
Notifications to the users are rate-limited to one per tracking window.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052418/)
Conflicts:
include/linux/psi.h
kernel/cgroup.c
kernel/sched/psi.c
(1. replaced __poll_t with unsigned int
2. replaced EPOLLERR/EPOLLPRI with POLLERR/POLLPRI (values are the same)
3. include <linux/cgroup-defs.h> in include/linux/psi.h)
Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: I1688f047e98e1f109627dad72a33d2f70e575268
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cgroup has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes. To allow polling for
custom events, add a .poll callback that can override the default.
This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.
Link: http://lkml.kernel.org/r/20190124211518.244221-3-surenb@google.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit: dc50537bdd1a0804fa2cbc990565ee9a944e66fa)
Conflicts:
include/linux/cgroup-defs.h
kernel/cgroup.c
1. made changes in kernel/cgroup.c instead of kernel/cgroup/cgroup.c
2. replaced __poll_t with unsigned int
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Change-Id: Ie3d914197d1f150e1d83c6206865566a7cbff1b4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Pipe the newly added kernfs->open/release() callbacks through cftype.
While at it, as cleanup operations now can be performed from
->release() instead of ->seq_stop(), make the latter optional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>
(cherry picked from commit e90cbebc3fa5caea4c8bfeb0d0157a0cee53efc7)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Change-Id: Iff9794cbbc2c7067c24cb2f767bbdeffa26b5180
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
On a system that executes multiple cgrouped jobs and independent
workloads, we don't just care about the health of the overall system, but
also that of individual jobs, so that we can ensure individual job health,
fairness between jobs, or prioritize some jobs over others.
This patch implements pressure stall tracking for cgroups. In kernels
with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure,
and io.pressure files that track aggregate pressure stall times for only
the tasks inside the cgroup.
Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2ce7135adc9ad081aa3c49744144376ac74fea60)
Conflicts:
Documentation/cgroup-v2.txt
include/linux/cgroup.h
kernel/cgroup/cgroup.c
(1. manual merge from Documentation/admin-guide/cgroup-v2.rst
2. manually merged changes from kernel/cgroup/cgroup.c into kernel/cgroup.c
3. manual merge in css_free_work_fn to allow psi support only for cgroup v2
4. manual merge in cgroup_create to allow psi support only for cgroup v2)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I2ffb9d50ba87f8b7655bed215a625784e098879c
Misc trivial changes to prepare for future changes. No functional
difference.
* Expose cgroup_get(), cgroup_tryget() and cgroup_parent().
* Implement task_dfl_cgroup() which dereferences css_set->dfl_cgrp.
* Rename cgroup_stats_show() to cgroup_stat_show() for consistency
with the file name.
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 3e48930cc74f0c212ee1838f89ad0ca7fcf2fea1)
Conflicts:
kernel/cgroup/cgroup.c
(1. manual merge because kernel/cgroup/cgroup.c is under kernel/cgroup.c
2. cgroup_stats_show change is skipped because the function dos not exist)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I756ee3dcf0d0f3da69cd1b58e644271625053538
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlvBnGMACgkQONu9yGCS
aT6KZA//cXQIPaITOcikXVN9LTbfNI32k0803qJi2PPr0lajb6RETnfYbQH+h+Ta
d77uEJ7e15U7bxaoLDKJhYe8SvtqMHq6hhqNXwg+woFgAkoMU4pGmAocywLoMpS/
INIc4KwEPKD4JkQYQF1sk3nB22K9DiXn/OjoXTlxyAYuCnvdu99xwIpvaI8GN1Yo
SWiMDvmhu1Nj+vD79VDV5UhigI/+lHIaCasJ3fSVyFx6kE2XYBz7qcNGULGzI+pl
c6qQpds4UIdVp04d/JQNOurANA6oHYWexrV1Q8zENMwpAOEoDVMDjC/Fi8iIxfCK
EfgNABr+5INg2gtPZ1YFBad96vgrxWo1gS+Rdaq71zO5luoWktqG/Kv99WnMIOZf
TiS0+6GRKfL06PyfFu/Bsx8Dzp/GVCsYWQEEPNhp5PYUOoIRzI8XiJcofRNlxL0p
IiQ0RUansCaN7VN8Z3GzBqDOGikuhxAPrTTAhTAXABrXcrSoAEVhUpmf5UMItnIj
cp8AoYqqQxA81PMTLLK0K78B3Mkkdk9BIB82y39WV28j6dB6ZzfPFg4ikdIwS+hm
ZNck/2IwhuuPGugDVVX206zH5ATQaS0Vg+BQNqw/Y03/U6SKSlUs6HB4c0X1quy6
jwskBvuADXXlWFuzMlenlAGg1Shk4lwsCGN3F9ziSkrSI6azBRs=
=aHnt
-----END PGP SIGNATURE-----
Merge 4.9.133 into android-4.9
Changes in 4.9.133
mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
fbdev/omapfb: fix omapfb_memory_read infoleak
xen-netback: fix input validation in xenvif_set_hash_mapping()
x86/vdso: Fix asm constraints on vDSO syscall fallbacks
x86/vdso: Fix vDSO syscall fallback asm constraint regression
PCI: Reprogram bridge prefetch registers on resume
mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys
PM / core: Clear the direct_complete flag on errors
dm cache metadata: ignore hints array being too small during resize
dm cache: fix resize crash if user doesn't reload cache table
xhci: Add missing CAS workaround for Intel Sunrise Point xHCI
usb: xhci-mtk: resume USB3 roothub first
USB: serial: simple: add Motorola Tetra MTP6550 id
tty: Drop tty->count on tty_reopen() failure
of: unittest: Disable interrupt node tests for old world MAC systems
ext4: add corruption check in ext4_xattr_set_entry()
ext4: always verify the magic number in xattr blocks
cgroup: Fix deadlock in cpu hotplug path
ath10k: fix use-after-free in ath10k_wmi_cmd_send_nowait
ath10k: fix kernel panic issue during pci probe
powerpc/fadump: Return error when fadump registration fails
ARC: clone syscall to setp r25 as thread pointer
x86/mm: Expand static page table for fixmap space
f2fs: fix invalid memory access
ucma: fix a use-after-free in ucma_resolve_ip()
ubifs: Check for name being NULL while mounting
ath10k: fix scan crash due to incorrect length calculation
ebtables: arpreply: Add the standard target sanity check
x86/fpu: Remove use_eager_fpu()
x86/fpu: Remove struct fpu::counter
Revert "perf: sync up x86/.../cpufeatures.h"
x86/fpu: Finish excising 'eagerfpu'
Linux 4.9.133
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 116d2f7496c51b2e02e8e4ecdd2bdf5fb9d5a641 upstream.
Deadlock during cgroup migration from cpu hotplug path when a task T is
being moved from source to destination cgroup.
kworker/0:0
cpuset_hotplug_workfn()
cpuset_hotplug_update_tasks()
hotplug_update_tasks_legacy()
remove_tasks_in_empty_cpuset()
cgroup_transfer_tasks() // stuck in iterator loop
cgroup_migrate()
cgroup_migrate_add_task()
In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T.
Task T will not migrate to destination cgroup. css_task_iter_start()
will keep pointing to task T in loop waiting for task T cg_list node
to be removed.
Task T
do_exit()
exit_signals() // sets PF_EXITING
exit_task_namespaces()
switch_task_namespaces()
free_nsproxy()
put_mnt_ns()
drop_collected_mounts()
namespace_unlock()
synchronize_rcu()
_synchronize_rcu_expedited()
schedule_work() // on cpu0 low priority worker pool
wait_event() // waiting for work item to execute
Task T inserted a work item in the worklist of cpu0 low priority
worker pool. It is waiting for expedited grace period work item
to execute. This work item will only be executed once kworker/0:0
complete execution of cpuset_hotplug_workfn().
kworker/0:0 ==> Task T ==>kworker/0:0
In case of PF_EXITING task being migrated from source to destination
cgroup, migrate next available task in source cgroup.
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
[AmitP: Upstream commit cherry-pick failed, so I picked the
backported changes from CAF/msm-4.9 tree instead:
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?id=49b74f1696417b270c89cd893ca9f37088928078]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmN0iUACgkQONu9yGCS
aT6gZRAAhYbpsz2XRFSQ5H/Sk8xwJuwtLB2at3Y1CLCb+lLhlRsV3l4pD+4KRoEl
fU01l5s5ZalZJekGfEfEOQOfoJHsCxzFSzKzP06/GA5u6DbwtHUE2SjNWe84j6Ct
Hx0jN90yj7S8vy2umROux+fVvZQ4Xay4TDCWhBeXgOFXevwC/G9D2LWE2NIYwbDH
Ighahrhs21FZc9wbah0L04bRBAR7+ALLq1sO8ebKwl8eFzAkcEwI/yS48cnjGlgW
9HW5MmY1BYTnRCrXaw5L0Vf5zH6obT7amrLNljNYN6vN62DRoOfwQh4QcblnIAoi
L+HdZilifZ970RwQ2As3vy63/Kk3b207ht4mriTCyGXM9MY6bRovYv1wDAUlv7aD
GlA8Q7xwsiJ4sG4i5LABjly+QeWymZ2b0kVWYpneJuBuj/gWVDhh1lfT+nOCAcJ6
ROUY6d64ghKPBomkqlMSC+7sH7QKa0/W9WDQCLxtnmjcAkeElpGNGu/m/Thhvi2I
NDq2sbMAeGJquXBXIN8W4NPy0puOn0wjqFI7LE61ujSiAxT8973uDtNlrmB/eCAf
zD9yJsKELS20PKToren4hYYuRM2XlKh9gVIOWB2pShfzvSO7807ZyVcoI8/bVgZe
I2BH6Dt3t+qqWR7B5/qvxxmNCv3HNMpNUzy/z+fXEf8/U3zqiTM=
=n7pX
-----END PGP SIGNATURE-----
Merge 4.9.42 into android-4.9
Changes in 4.9.42
parisc: Handle vma's whose context is not current in flush_cache_range
cgroup: create dfl_root files on subsys registration
cgroup: fix error return value from cgroup_subtree_control()
libata: array underflow in ata_find_dev()
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
iwlwifi: dvm: prevent an out of bounds access
brcmfmac: fix memleak due to calling brcmf_sdiod_sgtable_alloc() twice
NFSv4: Fix EXCHANGE_ID corrupt verifier issue
mmc: sdhci-of-at91: force card detect value for non removable devices
device property: Make dev_fwnode() public
mmc: core: Fix access to HS400-ES devices
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
ALSA: hda - Fix speaker output from VAIO VPCL14M1R
drm/amdgpu: Fix undue fallthroughs in golden registers initialization
ASoC: do not close shared backend dailink
KVM: async_pf: make rcu irq exit if not triggered from idle task
mm/page_alloc: Remove kernel address exposure in free_reserved_area()
timers: Fix overflow in get_next_timer_interrupt
powerpc/tm: Fix saving of TM SPRs in core dump
powerpc/64: Fix __check_irq_replay missing decrementer interrupt
iommu/amd: Enable ga_log_intr when enabling guest_mode
gpiolib: skip unwanted events, don't convert them to opposite edge
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4: fix overflow caused by missing cast in ext4_resize_fs()
ARM: dts: armada-38x: Fix irq type for pca955
ARM: dts: tango4: Request RGMII RX and TX clock delays
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
iscsi-target: Fix initial login PDU asynchronous socket close OOPs
mmc: dw_mmc: Use device_property_read instead of of_property_read
mmc: core: Use device_property_read instead of of_property_read
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
f2fs: sanity check checkpoint segno and blkoff
Btrfs: fix early ENOSPC due to delalloc
saa7164: fix double fetch PCIe access condition
tcp_bbr: cut pacing rate only if filled pipe
tcp_bbr: introduce bbr_bw_to_pacing_rate() helper
tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helper
tcp_bbr: remove sk_pacing_rate=0 transient during init
tcp_bbr: init pacing rate on first RTT sample
ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
net: Zero terminate ifr_name in dev_ifname().
ipv6: avoid overflow of offset in ip6_find_1stfragopt
net: dsa: b53: Add missing ARL entries for BCM53125
ipv4: initialize fib_trie prior to register_netdev_notifier call.
rtnetlink: allocate more memory for dev_set_mac_address()
mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled
openvswitch: fix potential out of bound access in parse_ct
packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
net: ethernet: nb8800: Handle all 4 RGMII modes identically
dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
dccp: fix a memleak for dccp_feat_init err process
sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
net/mlx5: Consider tx_enabled in all modes on remap
net/mlx5: Fix command bad flow on command entry allocation failure
net/mlx5e: Fix outer_header_zero() check size
net/mlx5e: Fix wrong delay calculation for overflow check scheduling
net/mlx5e: Schedule overflow check work to mlx5e workqueue
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
xen-netback: correctly schedule rate-limited queues
sparc64: Measure receiver forward progress to avoid send mondo timeout
sparc64: Fix exception handling in UltraSPARC-III memcpy.
wext: handle NULL extra data in iwe_stream_add_point better
sh_eth: fix EESIPR values for SH77{34|63}
sh_eth: R8A7740 supports packet shecksumming
net: phy: dp83867: fix irq generation
tg3: Fix race condition in tg3_get_stats64().
x86/boot: Add missing declaration of string functions
spi: spi-axi: Free resources on error path
ASoC: rt5645: set sel_i2s_pre_div1 to 2
netfilter: use fwmark_reflect in nf_send_reset
phy state machine: failsafe leave invalid RUNNING state
ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
clk/samsung: exynos542x: mark some clocks as critical
scsi: qla2xxx: Get mutex lock before checking optrom_state
drm/virtio: fix framebuffer sparse warning
ARM: dts: sun8i: Support DTB build for NanoPi M1
ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc
iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort
nbd: blk_mq_init_queue returns an error code on failure, not NULL
virtio_blk: fix panic in initialization error path
ARM: 8632/1: ftrace: fix syscall name matching
mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
lib/Kconfig.debug: fix frv build failure
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
mm: don't dereference struct page fields of invalid pages
net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroy
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
net: account for current skb length when deciding about UFO
net: phy: Fix PHY unbind crash
workqueue: implicit ordered attribute should be overridable
Linux 4.9.42
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 3c74541777302eec43a0d1327c4d58b8659a776b upstream.
While refactoring, f7b2814bb9 ("cgroup: factor out
cgroup_{apply|finalize}_control() from
cgroup_subtree_control_write()") broke error return value from the
function. The return value from the last operation is always
overridden to zero. Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7af608e4f9530372aec6e940552bf76595f2e265 upstream.
On subsystem registration, css_populate_dir() is not called on the new
root css, so the interface files for the subsystem on cgrp_dfl_root
aren't created on registration. This is a residue from the days when
cgrp_dfl_root was used only as the parking spot for unused subsystems,
which no longer is true as it's used as the root for cgroup2.
This is often fine as later operations tend to create them as a part
of mount (cgroup1) or subtree_control operations (cgroup2); however,
it's not difficult to mount cgroup2 with the controller interface
files missing as Waiman found out.
Fix it by invoking css_populate_dir() on the root css on subsys
registration.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllBNNMACgkQONu9yGCS
aT4rJA//XOtxp2nr01OqffSItCX0vq7vMofbjo7spI2LF3JyxKkx0hje0im82XoD
cYhZHyeK9lCHRZt2O541xOp3ITconecRvlt6uq49zKNvlFAelnp0rRTOktluJK/n
Xm7EP2KLHeHXv+rvCfzU0hDC9fvoZe82NIF7kddAWHX4D/K3C6Jw6FVc4/QPl2MG
11R6pylPtMGObgPEmfDlohBitcC3KawXAhIcyTTAr3rcuO2Wm00H+uF0VOpyD9hZ
S5A8ecuyZVQw3mIKAJ9vOwkVoln1E+/P/OltWEXElkBZONHpI4LB2IE7Nmwnx+pE
5oO3afNZIYOejX+GQvK2Apc8VONDV2VZRoIstzl+uaKDNGFaSLIu8AQ/UQfmoqTE
Pwrp/FxgDGjq6NOf/+MqNhiqS9433Xgmt4HT5xctzS0zFNGzLIokdjRH3x7n6cez
2JvdtRvJfdNq31kh6xQE/8oXuZktRSw3GxC5K4Dw3NRh02/9KNAgEjNPEMNXziGw
RhLhyXl8i5eGAu1gxruojzRyCvaMS6S8ZcAvSSoJze0kj8jfkZtQbBswvS1LF9EI
FFemshd1sQxUeZiXGYPuUUfyDvxvs7KRIqAYux8lsIhSQTu16REqsZRrzidpDP+K
xtH61YBWhwDHZKIoT+Jchmn8QCytI77MvFdGuTl+KRPV4OyyeJs=
=NE4C
-----END PGP SIGNATURE-----
Merge 4.9.32 into android-4.9
Changes in 4.9.32
bnx2x: Fix Multi-Cos
vxlan: eliminate cached dst leak
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
cxgb4: avoid enabling napi twice to the same queue
tcp: disallow cwnd undo when switching congestion control
vxlan: fix use-after-free on deletion
ipv6: Fix leak in ipv6_gso_segment().
net: ping: do not abuse udp_poll()
net/ipv6: Fix CALIPSO causing GPF with datagram support
net: ethoc: enable NAPI before poll may be scheduled
net: stmmac: fix completely hung TX when using TSO
net: bridge: start hello timer only if device is up
sparc64: Add __multi3 for gcc 7.x and later.
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
sparc: Machine description indices can vary
sparc64: reset mm cpumask after wrap
sparc64: combine activate_mm and switch_mm
sparc64: redefine first version
sparc64: add per-cpu mm of secondary contexts
sparc64: new context wrap
sparc64: delete old wrap code
arch/sparc: support NR_CPUS = 4096
serial: ifx6x60: fix use-after-free on module unload
ptrace: Properly initialize ptracer_cred on fork
crypto: asymmetric_keys - handle EBUSY due to backlog correctly
KEYS: fix dereferencing NULL payload with nonzero length
KEYS: fix freeing uninitialized memory in key_update()
KEYS: encrypted: avoid encrypting/decrypting stack buffers
crypto: drbg - wait for crypto op not signal safe
crypto: gcm - wait for crypto op not signal safe
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
nfsd4: fix null dereference on replay
nfsd: Fix up the "supattr_exclcreat" attributes
efi: Don't issue error message when booted under Xen
kvm: async_pf: fix rcu_irq_enter() with irqs enabled
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
arm64: KVM: Preserve RES1 bits in SCTLR_EL2
arm64: KVM: Allow unaligned accesses at EL2
arm: KVM: Allow unaligned accesses at HYP
KVM: async_pf: avoid async pf injection when in guest mode
KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt
KVM: arm/arm64: vgic-v2: Do not use Active+Pending state for a HW interrupt
dmaengine: usb-dmac: Fix DMAOR AE bit definition
dmaengine: ep93xx: Always start from BASE0
dmaengine: ep93xx: Don't drain the transfers in terminate_all()
dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly
dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
dmaengine: mv_xor_v2: enable XOR engine after its configuration
dmaengine: mv_xor_v2: fix tx_submit() implementation
dmaengine: mv_xor_v2: remove interrupt coalescing
dmaengine: mv_xor_v2: set DMA mask to 40 bits
cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
xen/privcmd: Support correctly 64KB page granularity when mapping memory
ext4: fix SEEK_HOLE
ext4: keep existing extra fields when inode expands
ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO
ext4: fix fdatasync(2) after extent manipulation operations
drm: Fix oops + Xserver hang when unplugging USB drm devices
usb: gadget: f_mass_storage: Serialize wake and sleep execution
usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
usb: chipidea: debug: check before accessing ci_role
staging/lustre/lov: remove set_fs() call from lov_getstripe()
iio: adc: bcm_iproc_adc: swap primary and secondary isr handler's
iio: light: ltr501 Fix interchanged als/ps register field
iio: proximity: as3935: fix AS3935_INT mask
iio: proximity: as3935: fix iio_trigger_poll issue
mei: make sysfs modalias format similar as uevent modalias
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
target: Re-add check to reject control WRITEs with overflow data
drm/msm: Expose our reservation object when exporting a dmabuf.
ahci: Acer SA5-271 SSD Not Detected Fix
cgroup: Prevent kill_css() from being called more than once
Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
cpuset: consider dying css as offline
fs: add i_blocksize()
ufs: restore proper tail allocation
fix ufs_isblockset()
ufs: restore maintaining ->i_blocks
ufs: set correct ->s_maxsize
ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
cxl: Fix error path on bad ioctl
cxl: Avoid double free_irq() for psl,slice interrupts
btrfs: use correct types for page indices in btrfs_page_exists_in_range
btrfs: fix memory leak in update_space_info failure path
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
scsi: qla2xxx: don't disable a not previously enabled PCI device
scsi: qla2xxx: Modify T262 FW dump template to specify same start/end to debug customer issues
scsi: qla2xxx: Set bit 15 for DIAG_ECHO_TEST MBC
scsi: qla2xxx: Fix mailbox pointer error in fwdump capture
powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
powerpc/numa: Fix percpu allocations to be NUMA aware
powerpc/hotplug-mem: Fix missing endian conversion of aa_index
powerpc/kernel: Fix FP and vector register restoration
powerpc/kernel: Initialize load_tm on task creation
perf/core: Drop kernel samples even though :u is specified
drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
drm/vmwgfx: Make sure backup_handle is always valid
drm/nouveau/tmr: fully separate alarm execution/pending lists
ALSA: timer: Fix race between read and ioctl
ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
ASoC: Fix use-after-free at card unregistration
cpu/hotplug: Drop the device lock on error
drivers: char: mem: Fix wraparound check to allow mappings up to the end
serial: sh-sci: Fix panic when serial console and DMA are enabled
arm64: traps: fix userspace cache maintenance emulation on a tagged pointer
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
arm64: entry: improve data abort handling of tagged pointers
ARM: 8636/1: Cleanup sanity_check_meminfo
ARM: 8637/1: Adjust memory boundaries after reservations
usercopy: Adjust tests to deal with SMAP/PAN
drm/i915/vbt: don't propagate errors from intel_bios_init()
drm/i915/vbt: split out defaults that are set when there is no VBT
cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
netfilter: nft_set_rbtree: handle element re-addition after deletion
Linux 4.9.32
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 33c35aa4817864e056fd772230b0c6b552e36ea2 upstream.
The kill_css() function may be called more than once under the condition
that the css was killed but not physically removed yet followed by the
removal of the cgroup that is hosting the css. This patch prevents any
harmm from being done when that happens.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.
Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X
2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.
3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.
In the future this behavior may be extended with a chain of
non-overridable programs.
Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.
Add several testcases and adjust libbpf.
Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28
("cgroup: add support for eBPF programs")
[AmitP: Refactored original patch for android-4.9 where libbpf sources
are in samples/bpf/ and test_cgrp2_attach2, test_cgrp2_sock,
and test_cgrp2_sock2 sample tests do not exist.]
(cherry picked from commit 7f677633379b4abb3281cdbe7e7006f049305c03)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Cherry-pick from commit 3007098494bec614fb55dee7bc0410bb7db5ad18
This patch adds two sets of eBPF program pointers to struct cgroup.
One for such that are directly pinned to a cgroup, and one for such
that are effective for it.
To illustrate the logic behind that, assume the following example
cgroup hierarchy.
A - B - C
\ D - E
If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.
Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlj5tWEACgkQONu9yGCS
aT4noA//ZKztJudzNoXOaYThKQrbKTJeB+cL4PusLqd7gv9yKt7IJD+5+V/+jYRS
N5GlwNOS02r6tduhCz9I/XjRKzilvC2WHt3Akf8nPC2dadhzcOW3UPvT0IEoIrHy
FkTnUn/1TvBA2I1uB8k93nk/jfx7RYrO+4axiyZ0XQCMJTc7M4MmPTGw2jrzxPug
Ogy7taNpY+By7PxJyU4rH8Lh2jiIYGYDrRWExZIcL7A7IlWBPbVkagzn2PPgQbg4
XX250JZ7bQyCGcrVxlTqaBScEkdICZTUqI3d3HQUwF39QBq9z+VlqP13T5j7XMea
d7jHZswWClQj4xeGWq+iiLR0VF5OVCTkbhl61YTI9rPFEaewSwMiGrpHiidKTaOK
KAxa34dwrz1IVI/p/a9DbZBj7IpwiH3jF3rEvx1ieEbTGHxLBi/COGK3aCES5TtX
mjz/WjReo4aR3fudgNFJqNsmJmgEJ3yAO7Lk3tglBmYOgUpQhbRwClPjYvXVphzH
ClYE1KCMMV4BTcDUqEfoBL46Qd2141E9Wxgmh+KYB+14X0QqQg+qRyZYyOvUGIuD
A7J7MbXyBuM0dXS2t9olLbHb+tllGz6lBHANf8g141ot8jjlCBoP5yjBP6W0gZP8
UsKSH/sLEPGMO3KD4D2lLixkoENilS+3DhJwmrrqZAhhaqlMPEI=
=/rwk
-----END PGP SIGNATURE-----
Merge 4.9.24 into android-4.9
Changes in 4.9.24:
cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
tcmu: Fix possible overwrite of t_data_sg's last iov[]
tcmu: Fix wrongly calculating of the base_command_size
tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
thp: fix MADV_DONTNEED vs. MADV_FREE race
thp: fix MADV_DONTNEED vs clear soft dirty race
zsmalloc: expand class bit
orangefs: free superblock when mount fails
drm/nouveau/mpeg: mthd returns true on success now
drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one
drm/etnaviv: fix missing unlock on error in etnaviv_gpu_submit()
CIFS: reconnect thread reschedule itself
CIFS: store results of cifs_reopen_file to avoid infinite wait
Input: xpad - add support for Razer Wildcat gamepad
perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
x86/efi: Don't try to reserve runtime regions
x86/signals: Fix lower/upper bound reporting in compat siginfo
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
x86/vdso: Ensure vdso32_enabled gets set to valid values only
x86/vdso: Plug race between mapping and ELF header setup
acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
ACPI / scan: Set the visited flag for all enumerated devices
parisc: fix bugs in pa_memcpy
efi/libstub: Skip GOP with PIXEL_BLT_ONLY format
efi/fb: Avoid reconfiguration of BAR that covers the framebuffer
iscsi-target: Fix TMR reference leak during session shutdown
iscsi-target: Drop work-around for legacy GlobalSAN initiator
scsi: sr: Sanity check returned mode data
scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
scsi: qla2xxx: Add fix to read correct register value for ISP82xx.
scsi: sd: Fix capacity calculation with 32-bit sector_t
target: Avoid mappedlun symlink creation during lun shutdown
xen, fbfront: fix connecting to backend
new privimitive: iov_iter_revert()
make skb_copy_datagram_msg() et.al. preserve ->msg_iter on error
libnvdimm: fix blk free space accounting
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
can: ifi: use correct register to read rx status
pwm: rockchip: State of PWM clock should synchronize with PWM enabled state
cpufreq: Bring CPUs up even if cpufreq_online() failed
irqchip/irq-imx-gpcv2: Fix spinlock initialization
ftrace: Fix removing of second function probe
char: lack of bool string made CONFIG_DEVPORT always on
Revert "MIPS: Lantiq: Fix cascaded IRQ setup"
kvm: fix page struct leak in handle_vmon
zram: do not use copy_page with non-page aligned address
ftrace: Fix function pid filter on instances
crypto: algif_aead - Fix bogus request dereference in completion function
crypto: ahash - Fix EINPROGRESS notification callback
parisc: Fix get_user() for 64-bit value on 32-bit kernel
ath9k: fix NULL pointer dereference
dvb-usb-v2: avoid use-after-free
ext4: fix inode checksum calculation problem if i_extra_size is small
mm: memcontrol: use special workqueue for creating per-memcg caches
drm/nouveau/disp/mcp7x: disable dptmds workaround
nbd: use loff_t for blocksize and nbd_set_size args
nbd: fix 64-bit division
ASoC: Intel: select DW_DMAC_CORE since it's mandatory
platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event
x86/xen: Fix APIC id mismatch warning on Intel
ACPI / EC: Use busy polling mode when GPE is not enabled
rtc: tegra: Implement clock handling
mm: Tighten x86 /dev/mem with zeroing reads
dvb-usb: don't use stack for firmware load
dvb-usb-firmware: don't do DMA on stack
cxusb: Use a dma capable buffer also for reading
virtio-console: avoid DMA from stack
net: ipv6: check route protocol when deleting routes
sctp: deny peeloff operation on asocs with threads sleeping on it
Linux 4.9.24
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream.
Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator. Once the new kthread starts
running, it initializes itself and wakes up the creator. The creator
then can further configure the kthread and then let it start doing its
job by waking it up.
In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland. When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity. This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.
Unfortunately, the cgroup side of protection is racy. While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.
This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one. Per-cpu
workqueue workers got caught while being created and ended up with
incorrected CPU affinity breaking concurrency management and sometimes
stalling workqueue execution.
This patch adds task->no_cgroup_migration which disallows the task to
be migrated by userland. kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed. The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.
It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side. Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.
v2: Switch to a simpler implementation using a new task_struct bit
field suggested by Oleg.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-and-debugged-by: Chris Mason <clm@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07cd12945551b63ecb1a349d50a6d69d1d6feb4a upstream.
While refactoring cgroup creation, a5bca21520 ("cgroup: factor out
cgroup_create() out of cgroup_mkdir()") incorrectly onlined subsystems
before the new cgroup is associated with it kernfs_node. This is fine
for cgroup proper but cgroup_name/path() depend on the associated
kernfs_node and if a subsystem makes the new cgroup_subsys_state
visible, which they're allowed to after onlining, it can lead to NULL
dereference.
The current code performs cgroup creation and subsystem onlining in
cgroup_create() and cgroup_mkdir() makes the cgroup and subsystems
visible afterwards. There's no reason to online the subsystems early
and we can simply drop cgroup_apply_control_enable() call from
cgroup_create() so that the subsystems are onlined and made visible at
the same time.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: a5bca21520 ("cgroup: factor out cgroup_create() out of cgroup_mkdir()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Try to better match what we're pushing upstream, use CAP_SYS_RESOURCE
instead of CAP_SYS_NICE, which shoudln't affect Android as Zygote and
system_server already use CAP_SYS_RESOURCE.
Change-Id: I9b7ba2d9be1a469c9636497a6287f840891a91a8
Signed-off-by: John Stultz <john.stultz@linaro.org>
Android expects system_server to be able to move tasks between different
cgroups/cpusets, but does not want to be running as root. Let's relax
permission check so that processes can move other tasks if they have
CAP_SYS_NICE in the affected task's user namespace.
BUG=b:31790445,chromium:647994
TEST=Boot android container, examine logcat
Change-Id: Ia919c66ab6ed6a6daf7c4cf67feb38b13b1ad09b
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/394927
Reviewed-by: Ricky Zhou <rickyz@chromium.org>
Pull cgroup updates from Tejun Heo:
- tracepoints for basic cgroup management operations added
- kernfs and cgroup path formatting functions updated to behave in the
style of strlcpy()
- non-critical bug fixes
* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
cpuset: fix error handling regression in proc_cpuset_show()
cgroup: add tracepoints for basic operations
cgroup: make cgroup_path() and friends behave in the style of strlcpy()
kernfs: remove kernfs_path_len()
kernfs: make kernfs_path*() behave in the style of strlcpy()
kernfs: add dummy implementation of kernfs_path_from_node()
Pull namespace updates from Eric Biederman:
"This set of changes is a number of smaller things that have been
overlooked in other development cycles focused on more fundamental
change. The devpts changes are small things that were a distraction
until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an
trivial regression fix to autofs for the unprivileged mount changes
that went in last cycle. A pair of ioctls has been added by Andrey
Vagin making it is possible to discover the relationships between
namespaces when referring to them through file descriptors.
The big user visible change is starting to add simple resource limits
to catch programs that misbehave. With namespaces in general and user
namespaces in particular allowing users to use more kinds of
resources, it has become important to have something to limit errant
programs. Because the purpose of these limits is to catch errant
programs the code needs to be inexpensive to use as it always on, and
the default limits need to be high enough that well behaved programs
on well behaved systems don't encounter them.
To this end, after some review I have implemented per user per user
namespace limits, and use them to limit the number of namespaces. The
limits being per user mean that one user can not exhause the limits of
another user. The limits being per user namespace allow contexts where
the limit is 0 and security conscious folks can remove from their
threat anlysis the code used to manage namespaces (as they have
historically done as it root only). At the same time the limits being
per user namespace allow other parts of the system to use namespaces.
Namespaces are increasingly being used in application sand boxing
scenarios so an all or nothing disable for the entire system for the
security conscious folks makes increasing use of these sandboxes
impossible.
There is also added a limit on the maximum number of mounts present in
a single mount namespace. It is nontrivial to guess what a reasonable
system wide limit on the number of mount structure in the kernel would
be, especially as it various based on how a system is using
containers. A limit on the number of mounts in a mount namespace
however is much easier to understand and set. In most cases in
practice only about 1000 mounts are used. Given that some autofs
scenarious have the potential to be 30,000 to 50,000 mounts I have set
the default limit for the number of mounts at 100,000 which is well
above every known set of users but low enough that the mount hash
tables don't degrade unreaonsably.
These limits are a start. I expect this estabilishes a pattern that
other limits for resources that namespaces use will follow. There has
been interest in making inotify event limits per user per user
namespace as well as interest expressed in making details about what
is going on in the kernel more visible"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits)
autofs: Fix automounts by using current_real_cred()->uid
mnt: Add a per mount namespace limit on the number of mounts
netns: move {inc,dec}_net_namespaces into #ifdef
nsfs: Simplify __ns_get_path
tools/testing: add a test to check nsfs ioctl-s
nsfs: add ioctl to get a parent namespace
nsfs: add ioctl to get an owning user namespace for ns file descriptor
kernel: add a helper to get an owning user namespace for a namespace
devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts
devpts: Remove sync_filesystems
devpts: Make devpts_kill_sb safe if fsi is NULL
devpts: Simplify devpts_mount by using mount_nodev
devpts: Move the creation of /dev/pts/ptmx into fill_super
devpts: Move parse_mount_options into fill_super
userns: When the per user per user namespace limit is reached return ENOSPC
userns; Document per user per user namespace limits.
mntns: Add a limit on the number of mount namespaces.
netns: Add a limit on the number of net namespaces
cgroupns: Add a limit on the number of cgroup namespaces
ipcns: Add a limit on the number of ipc namespaces
...
4c737b41de ("cgroup: make cgroup_path() and friends behave in the
style of strlcpy()") broke error handling in proc_cgroup_show() and
cgroup_release_agent() by not handling negative return values from
cgroup_path_ns_locked(). Fix it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull cgroup fixes from Tejun Heo:
"Three late fixes for cgroup: Two cpuset ones, one trivial and the
other pretty obscure, and a cgroup core fix for a bug which impacts
cgroup v2 namespace users"
* 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix invalid controller enable rejections with cgroup namespace
cpuset: fix non static symbol warning
cpuset: handle race between CPU hotplug and cpuset_hotplug_work
On the v2 hierarchy, "cgroup.subtree_control" rejects controller
enables if the cgroup has processes in it. The enforcement of this
logic assumes that the cgroup wouldn't have any css_sets associated
with it if there are no tasks in the cgroup, which is no longer true
since a79a908fd2 ("cgroup: introduce cgroup namespaces").
When a cgroup namespace is created, it pins the css_set of the
creating task to use it as the root css_set of the namespace. This
extra reference stays as long as the namespace is around and makes
"cgroup.subtree_control" think that the namespace root cgroup is not
empty even when it is and thus reject controller enables.
Fix it by making cgroup_subtree_control() walk and test emptiness of
each css_set instead of testing whether the list_head is empty.
While at it, update the comment of cgroup_task_count() to indicate
that the returned value may be higher than the number of tasks, which
has always been true due to temporary references and doesn't break
anything.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Aditya Kali <adityakali@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: stable@vger.kernel.org # v4.6+
Fixes: a79a908fd2 ("cgroup: introduce cgroup namespaces")
Link: https://github.com/systemd/systemd/pull/3589#issuecomment-249089541
From: Andrey Vagin <avagin@openvz.org>
Each namespace has an owning user namespace and now there is not way
to discover these relationships.
Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships too.
Why we may want to know relationships between namespaces?
One use would be visualization, in order to understand the running
system. Another would be to answer the question: what capability does
process X have to perform operations on a resource governed by namespace
Y?
One more use-case (which usually called abnormal) is checkpoint/restart.
In CRIU we are going to dump and restore nested namespaces.
There [1] was a discussion about which interface to choose to determing
relationships between namespaces.
Eric suggested to add two ioctl-s [2]:
> Grumble, Grumble. I think this may actually a case for creating ioctls
> for these two cases. Now that random nsfs file descriptors are bind
> mountable the original reason for using proc files is not as pressing.
>
> One ioctl for the user namespace that owns a file descriptor.
> One ioctl for the parent namespace of a namespace file descriptor.
Here is an implementaions of these ioctl-s.
$ man man7/namespaces.7
...
Since Linux 4.X, the following ioctl(2) calls are supported for
namespace file descriptors. The correct syntax is:
fd = ioctl(ns_fd, ioctl_type);
where ioctl_type is one of the following:
NS_GET_USERNS
Returns a file descriptor that refers to an owning user names‐
pace.
NS_GET_PARENT
Returns a file descriptor that refers to a parent namespace.
This ioctl(2) can be used for pid and user namespaces. For
user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
meaning.
In addition to generic ioctl(2) errors, the following specific ones
can occur:
EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
EPERM The requested namespace is outside of the current namespace
scope.
[1] https://lkml.org/lkml/2016/7/6/158
[2] https://lkml.org/lkml/2016/7/9/101
Changes for v2:
* don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
outside of the init namespace, so we can return EPERM in this case too.
> The fewer special cases the easier the code is to get
> correct, and the easier it is to read. // Eric
Changes for v3:
* rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: "W. Trevor King" <wking@tremily.us>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Return -EPERM if an owning user namespace is outside of a process
current user namespace.
v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
This special cases was removed from this version. There is nothing
outside of init_user_ns, so we can return EPERM.
v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The current error codes returned when a the per user per user
namespace limit are hit (EINVAL, EUSERS, and ENFILE) are wrong. I
asked for advice on linux-api and it we made clear that those were
the wrong error code, but a correct effor code was not suggested.
The best general error code I have found for hitting a resource limit
is ENOSPC. It is not perfect but as it is unambiguous it will serve
until someone comes up with a better error code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.
Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.
The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.
Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit:
80127a3968 ("locking/percpu-rwsem: Optimize readers and reduce global impact").
We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Colin Cross <ccross@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Link: http://lkml.kernel.org/r/20160811165413.GA22807@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Debugging what goes wrong with cgroup setup can get hairy. Add
tracepoints for cgroup hierarchy mount, cgroup creation/destruction
and task migration operations for better visibility.
Signed-off-by: Tejun Heo <tj@kernel.org>
cgroup_path() and friends used to format the path from the end and
thus the resulting path usually didn't start at the start of the
passed in buffer. Also, when the buffer was too small, the partial
result was truncated from the head rather than tail and there was no
way to tell how long the full path would be. These make the functions
less robust and more awkward to use.
With recent updates to kernfs_path(), cgroup_path() and friends can be
made to behave in strlcpy() style.
* cgroup_path(), cgroup_path_ns[_locked]() and task_cgroup_path() now
always return the length of the full path. If buffer is too small,
it contains nul terminated truncated output.
* All users updated accordingly.
v2: cgroup_path() usage in kernel/sched/debug.c converted.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Pull more cgroup updates from Tejun Heo:
"I forgot to include the patches which got applied to for-4.7-fixes
late during last cycle.
Eric's three patches fix bugs introduced with the namespace support"
* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroupns: Only allow creation of hierarchies in the initial cgroup namespace
cgroupns: Close race between cgroup_post_fork and copy_cgroup_ns
cgroupns: Fix the locking in copy_cgroup_ns
Pull networking updates from David Miller:
1) Unified UDP encapsulation offload methods for drivers, from
Alexander Duyck.
2) Make DSA binding more sane, from Andrew Lunn.
3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.
4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.
5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
packets as soon as the device sees them, with the option to mirror
the packet on TX via the same interface. From Brenden Blanco and
others.
6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.
7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.
8) Simplify netlink conntrack entry layout, from Florian Westphal.
9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
Schimmel, Yotam Gigi, and Jiri Pirko.
10) Add SKB array infrastructure and convert tun and macvtap over to it.
From Michael S Tsirkin and Jason Wang.
11) Support qdisc packet injection in pktgen, from John Fastabend.
12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.
13) Add NV congestion control support to TCP, from Lawrence Brakmo.
14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.
15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.
16) Support MPLS over IPV4, from Simon Horman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
xgene: Fix build warning with ACPI disabled.
be2net: perform temperature query in adapter regardless of its interface state
l2tp: Correctly return -EBADF from pppol2tp_getname.
net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
net: ipmr/ip6mr: update lastuse on entry change
macsec: ensure rx_sa is set when validation is disabled
tipc: dump monitor attributes
tipc: add a function to get the bearer name
tipc: get monitor threshold for the cluster
tipc: make cluster size threshold for monitoring configurable
tipc: introduce constants for tipc address validation
net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
MAINTAINERS: xgene: Add driver and documentation path
Documentation: dtb: xgene: Add MDIO node
dtb: xgene: Add MDIO node
drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
drivers: net: xgene: Use exported functions
drivers: net: xgene: Enable MDIO driver
drivers: net: xgene: Add backward compatibility
drivers: net: phy: xgene: Add MDIO driver
...
Pull cgroup updates from Tejun Heo:
"Nothing too exciting.
- updates to the pids controller so that pid limit breaches can be
noticed and monitored from userland.
- cleanups and non-critical bug fixes"
* 'for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: remove duplicated include from cgroup.c
cgroup: Use lld instead of ld when printing pids controller events_limit
cgroup: Add pids controller event when fork fails because of pid limit
cgroup: allow NULL return from ss->css_alloc()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root