Add the SIMPLIFIED_ENERGY_TABLE config to prevent conflicts with the
existing energy model. If this config is enabled, the code for the
legacy energy model is disabled.
Change-Id: I9b6d48334b32a50fd4cc8029d8eab85adb623a9d
Signed-off-by: Park Bumgyu <bumgyu.park@samsung.com>
Rename the Exynos scheduler. The existing EHMP (Exynos HMP) scheduler
considered only HMP chipsets. EMS (Exynos Mobile Scheduler) will support
all chipsets regardless of cluster and core configuration.
Change-Id: I2802ddcd9e401a0d92f9c98656b5e591d429d6ce
Signed-off-by: Park Bumgyu <bumgyu.park@samsung.com>
Change the CPU selection logic for fluid scheduling.
The Fluid RT scheduler considers task load when selecting a core,
and it can select a CPU running a higher-priority RT task as the
victim, because the Fluid RT scheduler makes its decision based on
load weighted by priority.
Change-Id: I40c3c93d4fcf985b0002796748037cacd04e813e
Signed-off-by: Johnlay Park <jonglae.park@samsung.com>
RELR is a relocation packing format for relative relocations.
The format is described in a generic-abi proposal:
https://groups.google.com/d/topic/generic-abi/bX460iggiKg/discussion
The LLD linker can be instructed to pack relocations in the RELR
format by passing the flag --pack-dyn-relocs=relr.
This patch adds a new config option, CONFIG_RELR. Enabling this option
instructs the linker to pack vmlinux's relative relocations in the RELR
format, and causes the kernel to apply the relocations at startup along
with the RELA relocations. RELA relocations still need to be applied
because the linker will emit RELA relative relocations if they are
unrepresentable in the RELR format (i.e. address not a multiple of 2).
Enabling CONFIG_RELR reduces the size of a defconfig kernel image
with CONFIG_RANDOMIZE_BASE by 3.5MB/16% uncompressed, or 550KB/5%
compressed (lz4).
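As a rough illustration of why the format is compact, the following Python sketch decodes a RELR table under the encoding described in the linked generic-abi proposal: an even entry is an address to relocate, and an odd entry is a bitmap covering the words that follow the last address (63 of them on a 64-bit target). The function name and the sample table are made up for illustration; this is not the kernel's implementation.

```python
def decode_relr(entries, word_size=8):
    """Expand RELR entries into the list of addresses to relocate.

    Hypothetical helper: models the encoding only, not the kernel code.
    """
    bits = word_size * 8
    addresses = []
    where = 0  # next address that a bitmap entry would describe
    for entry in entries:
        if entry & 1 == 0:
            # Even entry: a plain address; it is itself relocated.
            addresses.append(entry)
            where = entry + word_size
        else:
            # Odd entry: bit 0 is the marker, bits 1.. select words at `where`.
            bitmap = entry >> 1
            index = 0
            while bitmap:
                if bitmap & 1:
                    addresses.append(where + index * word_size)
                bitmap >>= 1
                index += 1
            # A bitmap always advances by (bits - 1) words.
            where += (bits - 1) * word_size
    return addresses


# Three consecutive relocations take two table entries instead of three.
print([hex(a) for a in decode_relr([0x1000, 0b111])])
```

Because an address entry must be even to be distinguishable from a bitmap, a relocation at an odd address cannot be expressed this way, which is why the RELA fallback is still needed.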
Signed-off-by: Peter Collingbourne <pcc@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 5cf896fb6be3effd9aea455b22213e27be8bdb1d)
Bug: 137200966
Test: booted defconfig + CONFIG_RELR kernel on qemu
Change-Id: I4c55bf5b10bc6c934543c651eca9fc8e260ffc6d
[pcc: 4.14 lacks support for running shell commands in Kconfig, so
CONFIG_TOOLS_SUPPORT_RELR is made user-configurable and users must
self-declare tool support]
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Rapherion Rollerscaperers <rapherion@raphielgang.org>
Signed-off-by: Twisted <36546624+TwistedPrime@users.noreply.github.com>
Signed-off-by: MadeOfGreat <ravenklawasd@gmail.com>
Signed-off-by: Jprimero15 <jprimero155@gmail.com>
When doing Clang builds of the kernel, it is possible to link with
either ld.bfd (binutils) or ld.lld (LLVM), but it is not possible to
discover this from a running kernel. Add the "$LD -v" output to
/proc/version.
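A minimal sketch of how a script might use the new information. It assumes that the "ld.lld -v" output contains "LLD" and that the "ld.bfd -v" output contains "GNU ld"; the sample version strings below are invented for illustration and the exact layout of /proc/version may differ.

```python
def linker_from_version(version_line):
    """Guess the linker from a /proc/version style string.

    Hypothetical helper; the substrings checked are assumptions about
    what "$LD -v" prints for each linker.
    """
    if "LLD" in version_line:
        return "ld.lld"
    if "GNU ld" in version_line:
        return "ld.bfd"
    return None


# Invented sample lines, not captured from a real system.
sample_lld = "Linux version 4.14.0 (user@host) (clang version 10.0.0, LLD 10.0.0) #1 SMP"
sample_bfd = "Linux version 4.14.0 (user@host) (clang version 10.0.0, GNU ld (GNU Binutils) 2.34) #1 SMP"

print(linker_from_version(sample_lld))
print(linker_from_version(sample_bfd))
```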
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Bug: 153484457
(cherry picked from commit 6f04f056df3c
https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
for-next)
[nd: commit 4b950bb9ac0c ("Kbuild: Handle PREEMPT_RT for version string and magic")
missing in 4.14, first landed in 5.4-rc1.
commit b79c6aa6a1f1 ("kbuild: remove unnecessary in-subshell execution")
missing in 4.14, first landed in 5.1-rc1.
]
Change-Id: Ifa5a98fe159392862e8d07a733c0f141fa9c7715
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Differentiate the MS_* flags passed to mount(2) from the internal flags set
in the super_block's s_flags. s_flags are now called SB_*, with the names
and the values for the moment mirroring the MS_* flags that they're
equivalent to.
In this patch, just the headers are altered and some kernel code where
blind automated conversion isn't necessarily correct.
Note that this shows up some interesting issues:
 (1) Some MS_* flags get translated to MNT_* flags (such as MS_NODEV ->
     MNT_NODEV) without passing this on to the filesystem, but some
     filesystems set such flags anyway.
 (2) The ->remount_fs() methods of some filesystems adjust the *flags
     argument by setting MS_* flags in it, such as MS_NOATIME - but these
     flags are then scrubbed by do_remount_sb() (only the occupants of
     MS_RMT_MASK are permitted: MS_RDONLY, MS_SYNCHRONOUS, MS_MANDLOCK,
     MS_I_VERSION and MS_LAZYTIME)
I'm not sure what's the best way to solve all these cases.
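Issue (2) can be modeled in a few lines. The sketch below is Python rather than kernel C; the numeric values are copied from the MS_* definitions in include/uapi/linux/fs.h and should be checked against the tree in use, and scrub_remount_flags() is a hypothetical stand-in for the masking done by do_remount_sb().

```python
# MS_* values as defined in include/uapi/linux/fs.h (verify for your tree).
MS_RDONLY = 1
MS_SYNCHRONOUS = 16
MS_MANDLOCK = 64
MS_NOATIME = 1024
MS_I_VERSION = 1 << 23
MS_LAZYTIME = 1 << 25

# Only these flags survive a remount, per the commit message.
MS_RMT_MASK = (MS_RDONLY | MS_SYNCHRONOUS | MS_MANDLOCK |
               MS_I_VERSION | MS_LAZYTIME)


def scrub_remount_flags(flags):
    """Hypothetical model: drop every flag outside MS_RMT_MASK."""
    return flags & MS_RMT_MASK


# A filesystem's ->remount_fs() sets MS_NOATIME, but the flag is lost.
requested = MS_RDONLY | MS_NOATIME
print(scrub_remount_flags(requested) == MS_RDONLY)
```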
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Firstly by applying the following with coccinelle's spatch:
    @@ expression SB; @@
    -SB->s_flags & MS_RDONLY
    +sb_rdonly(SB)
to effect the conversion to sb_rdonly(sb), then by applying:
    @@ expression A, SB; @@
    (
    -(!sb_rdonly(SB)) && A
    +!sb_rdonly(SB) && A
    |
    -A != (sb_rdonly(SB))
    +A != sb_rdonly(SB)
    |
    -A == (sb_rdonly(SB))
    +A == sb_rdonly(SB)
    |
    -!(sb_rdonly(SB))
    +!sb_rdonly(SB)
    |
    -A && (sb_rdonly(SB))
    +A && sb_rdonly(SB)
    |
    -A || (sb_rdonly(SB))
    +A || sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) != A
    +sb_rdonly(SB) != A
    |
    -(sb_rdonly(SB)) == A
    +sb_rdonly(SB) == A
    |
    -(sb_rdonly(SB)) && A
    +sb_rdonly(SB) && A
    |
    -(sb_rdonly(SB)) || A
    +sb_rdonly(SB) || A
    )
    @@ expression A, B, SB; @@
    (
    -(sb_rdonly(SB)) ? 1 : 0
    +sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) ? A : B
    +sb_rdonly(SB) ? A : B
    )
to remove left over excess bracketage and finally by applying:
    @@ expression A, SB; @@
    (
    -(A & MS_RDONLY) != sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) != sb_rdonly(SB)
    |
    -(A & MS_RDONLY) == sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) == sb_rdonly(SB)
    )
to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.
Signed-off-by: David Howells <dhowells@redhat.com>
arch/arc/Makefile overrides -O2 with -O3. This is the only user of
ARCH_CFLAGS. There is no user of ARCH_CPPFLAGS or ARCH_AFLAGS.
My plan is to remove ARCH_{CPP,A,C}FLAGS after refactoring the ARC
Makefile.
Currently, ARC has no way to enable -Wmaybe-uninitialized because both
-O3 and -Os disable it. Enabling it will be useful for compile-testing.
This commit allows allmodconfig (which defaults to -O2) to enable it.
Add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y to all the defconfig files
in arch/arc/configs/ in order to keep the current config settings.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
This reverts commit d342ee64906fe8cb0a36ec8063ae4118aeb57620.
The WALT code has been updated to handle throttling and unthrottling
of CFS tasks, so remove the restriction that WALT must be disabled in
order to enable the CFS_BANDWIDTH feature.
Bug: 139071966
Change-Id: I3b3a17cd3e552b6238cfba24c9251b7b83e2bda3
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Todd Kjos <tkjos@google.com>
Make the anon_inodes facility unconditional so that it can be used by core
VFS code.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit dadd2299ab61fc2b55b95b7b3a8f674cdd3b69c9)
Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I2f97bda4f360d8d05bbb603de839717b3d8067ae
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
The WALT time accounting breaks when CFS tasks are throttled by the CPU
bandwidth control mechanism of the CPU cgroups controller. This can
result in a negative cumulative_runnable_avg, which can then lead to a
kernel panic, and the device crashing.
Although the right fix would be to add support for throttled CFS tasks to
WALT, the common kernel is now in stable maintenance mode and will not
get new features which could cause issues for partners downstream.
To work around the issue, make the CFS_BANDWIDTH Kconfig option depend
on SCHED_WALT=n, hence preventing these two things from being enabled
simultaneously. This should not be an issue for most partners (nobody
had noticed the breakage for years), and those who do need the better
fix can apply it in their device kernel.
Bug: 139071966
Bug: 120440300
Change-Id: Ieb3c367ae7893ac93fb5b38c1580dc59151aacce
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Merge 4.9.177 into android-4.9-q
Changes in 4.9.177
netfilter: compat: initialize all fields in xt_init
bpf: fix struct htab_elem layout
bpf: convert htab map to hlist_nulls
platform/x86: sony-laptop: Fix unintentional fall-through
USB: serial: fix unthrottle races
iio: adc: xilinx: fix potential use-after-free on remove
libnvdimm/namespace: Fix a potential NULL pointer dereference
HID: input: add mapping for Expose/Overview key
HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys
HID: input: add mapping for "Toggle Display" key
libnvdimm/btt: Fix a kmemdup failure check
s390/dasd: Fix capacity calculation for large volumes
mac80211: fix unaligned access in mesh table hash function
s390/3270: fix lockdep false positive on view->lock
mISDN: Check address length before reading address family
x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T
KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
tools lib traceevent: Fix missing equality check for strcmp
init: initialize jump labels before command line option parsing
selftests: netfilter: check icmp pkttoobig errors are set as related
ipvs: do not schedule icmp errors from tunnels
MIPS: perf: ath79: Fix perfcount IRQ assignment
s390: ctcm: fix ctcm_new_device error return code
drm/sun4i: Set device driver data at bind time for use in unbind
selftests/net: correct the return value for run_netsocktests
gpu: ipu-v3: dp: fix CSC handling
spi: Micrel eth switch: declare missing of table
spi: ST ST95HF NFC: declare missing of table
Input: synaptics-rmi4 - fix possible double free
cw1200: fix missing unlock on error in cw1200_hw_scan()
ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command
rtlwifi: rtl8723ae: Fix missing break in switch statement
Don't jump to compute_result state from check_result state
Revert "x86/vdso: Drop implicit common-page-size linker flag"
Revert "x86: vdso: Use $LD instead of $CC to link"
x86: vdso: Use $LD instead of $CC to link
x86/vdso: Drop implicit common-page-size linker flag
x86/vdso: Pass --eh-frame-hdr to the linker
powerpc/64s: Include cpu header
bridge: Fix error path for kobject_init_and_add()
fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied
net: ucc_geth - fix Oops when changing number of buffers in the ring
packet: Fix error path in packet_init
vlan: disable SIOCSHWTSTAMP in container
vrf: sit mtu should not be updated when vrf netdev is the link
ipv4: Fix raw socket lookup for local traffic
bonding: fix arp_validate toggling in active-backup mode
drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl
drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
powerpc/lib: fix book3s/32 boot failure due to code patching
powerpc/booke64: set RI in default MSR
Linux 4.9.177
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 6041186a32585fc7a1d0f6cfe2f138b05fdc3c82 ]
When a module option, or core kernel argument, toggles a static-key it
requires jump labels to be initialized early. While x86, PowerPC, and
ARM64 arrange for jump_label_init() to be called before parse_args(),
ARM does not.
Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
page_alloc_shuffle+0x12c/0x1ac
static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
before call to jump_label_init()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
Hardware name: ARM Integrator/CP (Device Tree)
[<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
[<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
[<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
[<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
[<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>]
(page_alloc_shuffle+0x12c/0x1ac)
[<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
[<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
[<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)
Move the fallback call to jump_label_init() to occur before
parse_args().
The redundant calls to jump_label_init() in other archs are left intact
in case they have static key toggling use cases that are even earlier
than option parsing.
Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Guenter Roeck <groeck@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
The current help text caused some confusion in online forums about
whether or not to default-enable or default-disable psi in vendor
kernels. This is because it doesn't communicate the reason for why we
made this setting configurable in the first place: that the overhead is
non-zero in an artificial scheduler stress test.
Since this isn't representative of real workloads, and the effect was
not measurable in scheduler-heavy real world applications such as the
webservers and memcache installations at Facebook, it's fair to point
out that this is a pretty cautious option to select.
Link: http://lkml.kernel.org/r/20190129233617.16767-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7b2489d37e1e355228f7c55724f77580e1dec22a)
Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I5d0cb901562fd74c82d9d211544745b802776d8a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED
help text contradicts the documentation in kernel-parameters.txt, and
the code. Fix that.
Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org
Fixes: e0c274472d ("psi: make disabling/enabling easier for vendor kernels")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 428a1cb4baeb9e5c7feda93af7372ba6d2491558)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Change-Id: I008b79786ccc6798f0a35f53dd29f4ca013c53a5
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Mel Gorman reports a hackbench regression with psi that would prohibit
shipping the suse kernel with it default-enabled, but he'd still like
users to be able to opt in at little to no cost to others.
With the current combination of CONFIG_PSI and the psi_disabled bool set
from the commandline, this is a challenge. Do the following things to
make it easier:
1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
to enable CONFIG_PSI in their kernel but leave the feature disabled
unless a user requests it at boot-time.
To avoid double negatives, rename psi_disabled= to psi=.
2. Make psi_disabled a static branch to eliminate any branch costs
when the feature is disabled.
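The resulting boot-time decision can be sketched as follows. This is a Python model, not the kernel code: psi_enabled() is a hypothetical name, and the accepted spellings of the boolean value are an assumption made for the example.

```python
def psi_enabled(cmdline, default_disabled):
    """Model of the boot-time decision (hypothetical helper).

    default_disabled stands in for CONFIG_PSI_DEFAULT_DISABLED=y.
    """
    enabled = not default_disabled
    for arg in cmdline.split():
        if arg.startswith("psi="):
            value = arg[len("psi="):].lower()
            # Accepted spellings are an assumption for this sketch.
            if value in ("1", "y", "on"):
                enabled = True
            elif value in ("0", "n", "off"):
                enabled = False
    return enabled


# A distro kernel with the default-disabled option: off unless requested.
print(psi_enabled("console=ttyS0", default_disabled=True))
print(psi_enabled("console=ttyS0 psi=1", default_disabled=True))
```

With the positive psi= spelling, a user opting in writes psi=1 rather than negating a "disabled" flag.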
In terms of numbers before and after this patch, Mel says:
: The following is a comparison using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
:                     4.20.0-rc4          4.20.0-rc4          4.20.0-rc4
:            kconfigdisable-v1r1             vanilla     psidisable-v1r1
: Amean 1      1.3100 (   0.00%)   1.3923 (  -6.28%)   1.3427 (  -2.49%)
: Amean 3      3.8860 (   0.00%)   4.1230 *  -6.10%*   3.8860 (  -0.00%)
: Amean 5      6.8847 (   0.00%)   8.0390 * -16.77%*   6.7727 (   1.63%)
: Amean 7      9.9310 (   0.00%)  10.8367 *  -9.12%*   9.9910 (  -0.60%)
: Amean 12    16.6577 (   0.00%)  18.2363 *  -9.48%*  17.1083 (  -2.71%)
: Amean 18    26.5133 (   0.00%)  27.8833 *  -5.17%*  25.7663 (   2.82%)
: Amean 24    34.3003 (   0.00%)  34.6830 (  -1.12%)  32.0450 (   6.58%)
: Amean 30    40.0063 (   0.00%)  40.5800 (  -1.43%)  41.5087 (  -3.76%)
: Amean 32    40.1407 (   0.00%)  41.2273 (  -2.71%)  39.9417 (   0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;
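The percentages in Mel's table appear to be computed relative to the CONFIG_PSI=n baseline column, with negative values meaning a slower run. The short Python sketch below reproduces a few of them under that assumption; pct_vs_baseline() is a hypothetical helper, and the inputs are taken from the "Amean 1" and "Amean 5" rows.

```python
def pct_vs_baseline(baseline, value):
    """Relative difference in percent; negative means slower than baseline."""
    return (baseline - value) / baseline * 100.0


# "Amean 1": baseline 1.3100, vanilla 1.3923
print(round(pct_vs_baseline(1.3100, 1.3923), 2))
# "Amean 5": baseline 6.8847, vanilla 8.0390 and psidisable 6.7727
print(round(pct_vs_baseline(6.8847, 8.0390), 2))
print(round(pct_vs_baseline(6.8847, 6.7727), 2))
```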
Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e0c274472d5d27f277af722e017525e0b33784cd)
Conflicts:
Documentation/admin-guide/kernel-parameters.txt
1. Changes in Documentation/admin-guide/kernel-parameters.txt are moved
into Documentation/kernel-parameters.txt
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Change-Id: I34416d48137f67b23473e0f90a3ebefb66a6b36d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
On a system that executes multiple cgrouped jobs and independent
workloads, we don't just care about the health of the overall system, but
also that of individual jobs, so that we can ensure individual job health,
fairness between jobs, or prioritize some jobs over others.
This patch implements pressure stall tracking for cgroups. In kernels
with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure,
and io.pressure files that track aggregate pressure stall times for only
the tasks inside the cgroup.
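A minimal sketch of how userspace might consume one of these files. It assumes the line layout documented for psi ("some avg10=... avg60=... avg300=... total=..."); the sample text is invented, and parse_pressure() is a hypothetical helper rather than part of any existing tool.

```python
def parse_pressure(text):
    """Parse the contents of a *.pressure file into nested dicts."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # First token is the stall kind ("some" or "full"), the rest key=value.
        kind, *fields = line.split()
        result[kind] = {key: float(value)
                        for key, value in (field.split("=") for field in fields)}
    return result


# Invented sample, shaped like a memory.pressure file.
sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.05 total=65432\n"
)
stats = parse_pressure(sample)
print(stats["some"]["avg10"], stats["full"]["total"])
```

In practice the text would come from a file such as memory.pressure inside the cgroup's directory in the cgroup2 hierarchy.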
Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2ce7135adc9ad081aa3c49744144376ac74fea60)
Conflicts:
Documentation/cgroup-v2.txt
include/linux/cgroup.h
kernel/cgroup/cgroup.c
(1. manual merge from Documentation/admin-guide/cgroup-v2.rst
2. manually merged changes from kernel/cgroup/cgroup.c into kernel/cgroup.c
3. manual merge in css_free_work_fn to allow psi support only for cgroup v2
4. manual merge in cgroup_create to allow psi support only for cgroup v2)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I2ffb9d50ba87f8b7655bed215a625784e098879c
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills. In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.
In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.
A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively. Stall states are aggregate versions of the per-task delay
accounting delays:
cpu: some tasks are runnable but not executing on a CPU
memory: tasks are reclaiming, or waiting for swapin or thrashing cache
io: tasks are waiting for io completions
These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit. They can also indicate when the system
is approaching lockup scenarios and OOMs.
To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states. Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime. A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).
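The aggregation described above can be sketched in floating point as follows. This is a simplified model only: the function names are hypothetical, and the exponential decay used for the running average is an assumption borrowed from how load averages are usually computed, not a transcription of the psi code.

```python
import math


def weighted_stall(stall_times, nonidle_times):
    """Average per-CPU stall time, weighted by each CPU's non-idle time."""
    total_nonidle = sum(nonidle_times)
    if total_nonidle == 0:
        return 0.0  # all CPUs idle: nothing was stalled
    weighted = sum(s * w for s, w in zip(stall_times, nonidle_times))
    return weighted / total_nonidle


def update_average(avg, sample_pct, period=2.0, window=10.0):
    """Fold one period's percentage into a running average (assumed EMA)."""
    decay = math.exp(-period / window)
    return avg * decay + sample_pct * (1.0 - decay)


# CPU0 was busy for the whole 2s period and stalled for 1s; CPU1 was idle,
# so it does not dilute the result.
stall = weighted_stall([1.0, 0.0], [2.0, 0.0])
pct = stall / 2.0 * 100.0
print(pct, update_average(0.0, pct))
```

Weighting by non-idle time is what keeps an unused CPU from halving the reported pressure in this example.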
[hannes@cmpxchg.org: doc fixlet, per Randy]
Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I54a65620b3ed6f8172fdec789a237a99f8c82156
Add a new page flag, PageWaiters, to indicate the page waitqueue has
tasks waiting. This can be tested rather than testing waitqueue_active
which requires another cacheline load.
This bit is always set when the page has tasks on page_waitqueue(page),
and is set and cleared under the waitqueue lock. It may be set when
there are no tasks on the waitqueue, which will cause a harmless extra
wakeup check that will clear the bit.
The generic bit-waitqueue infrastructure is no longer used for pages.
Instead, waitqueues are used directly with a custom key type. The
generic code was not flexible enough to have PageWaiters manipulation
under the waitqueue lock (which simplifies concurrency).
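The intended fast path can be modeled with a toy example. The Python sketch below is not the kernel implementation: the bit positions, the Page class, and the counters are invented for illustration, and all locking is omitted.

```python
PG_LOCKED = 1 << 0    # hypothetical bit positions
PG_WAITERS = 1 << 1


class Page:
    def __init__(self):
        self.flags = 0
        self.waitqueue = []      # tasks sleeping on this page
        self.wakeup_checks = 0   # how often the slow path was taken


def wait_on_page(page, task):
    # In the kernel this happens under the waitqueue lock.
    page.waitqueue.append(task)
    page.flags |= PG_WAITERS


def wake_up_page(page):
    page.wakeup_checks += 1
    woken = list(page.waitqueue)
    page.waitqueue.clear()
    page.flags &= ~PG_WAITERS   # no waiters remain, so clear the bit
    return woken


def unlock_page(page):
    page.flags &= ~PG_LOCKED
    # Fast path: the flag lives in the same word, no waitqueue lookup needed.
    if page.flags & PG_WAITERS:
        return wake_up_page(page)
    return []


page = Page()
page.flags |= PG_LOCKED
print(unlock_page(page), page.wakeup_checks)   # no waiters: slow path skipped
```

A stale PG_WAITERS bit with an empty queue only costs one extra pass through wake_up_page(), which clears it.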
This improves the performance of page lock intensive microbenchmarks by
2-3%.
Putting two bits in the same word opens the opportunity to remove the
memory barrier between clearing the lock bit and testing the waiters
bit, after some work on the arch primitives (e.g., ensuring memory
operand widths match and cover both bits).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 62906027091f1d02de44041524f0769f60bb9cf3)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ibc97545d99846a672ebb4838cdc7dabef4add611
Workqueue is currently initialized in an early init call; however,
there are cases where early boot code has to be split and reordered to
come after workqueue initialization or the same code path which makes
use of workqueues is used both before workqueue initialization and
after. The latter cases have to gate workqueue usages with
keventd_up() tests, which is nasty and easy to get wrong.
Workqueue usages have become widespread and it'd be a lot more
convenient if it can be used very early from boot. This patch splits
workqueue initialization into two steps. workqueue_init_early() which
sets up the basic data structures so that workqueues can be created
and work items queued, and workqueue_init() which actually brings up
workqueues online and starts executing queued work items. The former
step can be done very early during boot once memory allocation,
cpumasks and idr are initialized. The latter right after kthreads
become available.
This allows work item queueing and canceling from very early boot
which is what most of these use cases want.
* As system_wq being initialized doesn't indicate that workqueue is
fully online anymore, update keventd_up() to test wq_online instead.
The follow-up patches will get rid of all its usages and the
function itself.
* Flushing doesn't make sense before workqueue is fully initialized.
The flush functions trigger WARN and return immediately before fully
online.
* Work items are never in-flight before fully online. Canceling can
always succeed by skipping the flush step.
* Some code paths can no longer assume to be called with irq enabled
as irq is disabled during early boot. Use irqsave/restore
operations instead.
v2: Watchdog init, which requires timer to be running, moved from
workqueue_init_early() to workqueue_init().
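The two-step bring-up described above can be modeled with a toy example. The Python sketch below only mirrors the behavior listed in the bullets (queueing and canceling work before the workqueue is online, flushing refused until it is); the class and method names are hypothetical, and nothing here reflects the real data structures.

```python
import warnings


class EarlyWorkqueue:
    def __init__(self):
        # Roughly the workqueue_init_early() stage: queueing is possible.
        self.pending = []
        self.online = False

    def queue_work(self, fn):
        if self.online:
            fn()
        else:
            self.pending.append(fn)   # held until workers exist

    def cancel_work(self, fn):
        # Nothing is in flight before online, so cancel needs no flush.
        if fn in self.pending:
            self.pending.remove(fn)
            return True
        return False

    def flush(self):
        if not self.online:
            warnings.warn("flush before workqueue is online")
            return False
        return True

    def bring_online(self):
        # Roughly the workqueue_init() stage: start executing queued items.
        self.online = True
        while self.pending:
            self.pending.pop(0)()


ran = []
wq = EarlyWorkqueue()
wq.queue_work(lambda: ran.append("early item"))
print(ran)          # still empty: nothing runs before bring_online()
wq.bring_online()
print(ran)
```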
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
(cherry picked from commit 3347fa0928210d96aaa2bd6cd5a8391d5e630873)
Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6348ea0969efd7142d5e61aa350904834a82d229
Merge 4.9.136 into android-4.9
Also revert commit b91d532928df ("ipv6: set rt6i_protocol properly in
the route when it is installed") as it breaks the test systems.
Changes in 4.9.136
xfrm: Validate address prefix lengths in the xfrm selector.
xfrm6: call kfree_skb when skb is toobig
mac80211: Always report TX status
cfg80211: reg: Init wiphy_idx in regulatory_hint_core()
mac80211: fix pending queue hang due to TX_DROP
cfg80211: Address some corner cases in scan result channel updating
mac80211: TDLS: fix skb queue/priority assignment
ARM: 8799/1: mm: fix pci_ioremap_io() offset check
xfrm: validate template mode
ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
net: macb: Clean 64b dma addresses if they are not detected
soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT
mac80211_hwsim: do not omit multicast announce of first added radio
Bluetooth: SMP: fix crash in unpairing
pxa168fb: prepare the clock
qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor
qed: Avoid constant logical operation warning in qed_vf_pf_acquire
asix: Check for supported Wake-on-LAN modes
ax88179_178a: Check for supported Wake-on-LAN modes
lan78xx: Check for supported Wake-on-LAN modes
sr9800: Check for supported Wake-on-LAN modes
r8152: Check for supported Wake-on-LAN Modes
smsc75xx: Check for Wake-on-LAN modes
smsc95xx: Check for Wake-on-LAN modes
perf/ring_buffer: Prevent concurent ring buffer access
perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
net: fec: fix rare tx timeout
declance: Fix continuation with the adapter identification message
net: cxgb3_main: fix a missing-check bug
perf symbols: Fix memory corruption because of zero length symbols
mm/memory_hotplug.c: fix overflow in test_pages_in_a_zone()
MIPS: microMIPS: Fix decoding of swsp16 instruction
MIPS: Handle non word sized instructions when examining frame
scsi: aacraid: Fix typo in blink status
f2fs: fix multiple f2fs_add_link() having same name for inline dentry
igb: Remove superfluous reset to PHY and page 0 selection
ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs
PCI: Disable MSI for HiSilicon Hip06/Hip07 only in Root Port mode
i2c: bcm2835: Avoid possible NULL ptr dereference
efi/fb: Correct PCI_STD_RESOURCE_END usage
ipv6: set rt6i_protocol properly in the route when it is installed
platform/x86: acer-wmi: setup accelerometer when ACPI device was found
IB/ipoib: Do not warn if IPoIB debugfs doesn't exist
IB/core: Fix the validations of a multicast LID in attach or detach operations
orangefs: off by ones in xattr size checks
rxe: Fix a sleep-in-atomic bug in post_one_send
nvme-pci: fix CMB sysfs file removal in reset path
net: phy: marvell: Limit 88m1101 autoneg errata to 88E1145 as well.
net/mlx5: Fix command completion after timeout access invalid structure
tipc: Fix tipc_sk_reinit handling of -EAGAIN
tipc: fix a race condition of releasing subscriber object
bnxt_en: Don't use rtnl lock to protect link change logic in workqueue.
ath10k: fix NAPI enable/disable symmetry for AHB interface
ARM: dts: bcm283x: Reserve first page for firmware
btrfs: fiemap: Cache and merge fiemap extent before submit it to user
ata: sata_rcar: Handle return value of clk_prepare_enable
reset: hi6220: Set module license so that it can be loaded
ASoC: Intel: Skylake: Fix to parse consecutive string tkns in manifest
arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5
mac80211: fix TX aggregation start/stop callback race
libata: fix error checking in in ata_parse_force_one()
net: ethernet: stmmac: Fix altr_tse_pcs SGMII Initialization
qlcnic: Fix tunnel offload for 82xx adapters
x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC
ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
gpu: ipu-v3: Fix CSI selection for VDIC
elevator: fix truncation of icq_cache_name
net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io
ufs: we need to sync inode before freeing it
net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
ip6_tunnel: Correct tos value in collect_md mode
net/mlx5: Fix driver load error flow when firmware is stuck
perf evsel: Fix probing of precise_ip level for default cycles event
perf probe: Fix probe definition for inlined functions
net/mlx5: Fix health work queue spin lock to IRQ safe
usb: renesas_usbhs: gadget: fix spin_lock_init() for &uep->lock
usb: renesas_usbhs: gadget: fix unused-but-set-variable warning
usb: dwc3: omap: remove IRQ_NOAUTOEN used with shared irq
clk: samsung: Fix m2m scaler clock on Exynos542x
ptr_ring: fix up after recent ptr_ring changes
staging: wilc1000: Fix problem with wrong vif index
rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp
iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications"
qed: Warn PTT usage by wrong hw-function
ocfs2: fix deadlock caused by recursive locking in xattr
net: cdc_ncm: GetNtbFormat endian fix
sctp: use right member as the param of list_for_each_entry
ALSA: hda - No loopback on ALC299 codec
ath10k: convert warning about non-existent OTP board id to debug message
ipv6: fix cleanup ordering for ip6_mr failure
IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
IB/rxe: put the pool on allocation failure
nbd: only set MSG_MORE when we have more to send
mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'
IB/mlx5: Avoid passing an invalid QP type to firmware
scsi: qla2xxx: Avoid double completion of abort command
drm: bochs: Don't remove uninitialized fbdev framebuffer
i40e: avoid NVM acquire deadlock during NVM update
Revert "IB/ipoib: Update broadcast object if PKey value was changed in index 0"
Btrfs: incremental send, fix invalid memory access
drm/msm: Fix possible null dereference on failure of get_pages()
module: fix DEBUG_SET_MODULE_RONX typo
iio: pressure: zpa2326: Remove always-true check which confuses gcc
l2tp: remove configurable payload offset
macsec: fix memory leaks when skb_to_sgvec fails
perf/core: Fix locking for children siblings group read
cifs: Use ULL suffix for 64-bit constant
futex: futex_wake_op, do not fail on invalid op
ALSA: hda - Fix incorrect usage of IS_REACHABLE()
test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches
xen-netfront: Update features after registering netdev
sparc64: Fix regression in pmdp_invalidate().
xen-netfront: Fix mismatched rtnl_unlock
enic: do not overwrite error code
bonding: ratelimit failed speed/duplex update warning
nvmet: fix space padding in serial number
iio: buffer: fix the function signature to match implementation
x86/paravirt: Fix some warning messages
IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'
libertas: call into generic suspend code before turning off power
xhci: Fix USB3 NULL pointer dereference at logical disconnect.
perf tests: Fix indexing when invoking subtests
ARM: dts: imx53-qsb: disable 1.2GHz OPP
rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window()
rxrpc: Only take the rwind and mtu values from latest ACK
net: ena: fix NULL dereference due to untimely napi initialization
fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()
mtd: spi-nor: Add support for is25wp series chips
Revert "netfilter: ipv6: nf_defrag: drop skb dst before queueing"
perf tools: Disable parallelism for 'make clean'
bridge: do not add port to router list when receives query with source 0.0.0.0
net: bridge: remove ipv6 zero address check in mcast queries
ipv6: mcast: fix a use-after-free in inet6_mc_check
ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
llc: set SOCK_RCU_FREE in llc_sap_add_socket()
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
net: sched: gred: pass the right attribute to gred_change_table_def()
net: socket: fix a missing-check bug
net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
net: udp: fix handling of CHECKSUM_COMPLETE packets
r8169: fix NAPI handling under high load
sctp: fix race on sctp_id2asoc
vhost: Fix Spectre V1 vulnerability
ethtool: fix a privilege escalation bug
bonding: fix length of actor system
net: drop skb on failure in ip_check_defrag()
net: fix pskb_trim_rcsum_slow() with odd trim offset
rtnetlink: Disallow FDB configuration for non-Ethernet device
ip6_tunnel: Fix encapsulation layout
Revert "x86/mm: Expand static page table for fixmap space"
crypto: shash - Fix a sleep-in-atomic bug in shash_setkey_unaligned
ahci: don't ignore result code of ahci_reset_controller()
gpio: mxs: Get rid of external API call
xfs: truncate transaction does not modify the inobt
cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)
ptp: fix Spectre v1 vulnerability
drm/edid: Add 6 bpc quirk for BOE panel in HP Pavilion 15-n233sl
RDMA/ucma: Fix Spectre v1 vulnerability
IB/ucm: Fix Spectre v1 vulnerability
cdc-acm: correct counting of UART states in serial state notification
usb: gadget: storage: Fix Spectre v1 vulnerability
USB: fix the usbfs flag sanitization for control transfers
Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15IGM
sched/fair: Fix throttle_list starvation with low CFS quota
x86/percpu: Fix this_cpu_read()
x86/time: Correct the attribute on jiffies' definition
net: fs_enet: do not call phy_stop() in interrupts
posix-timers: Sanitize overrun handling
Linux 4.9.136
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 4d217a5adccf5e806790c37c61cc374a08bd7381 ]
The newly added 'rodata_enabled' global variable is protected by
the wrong #ifdef, leading to a link error when CONFIG_DEBUG_SET_MODULE_RONX
is turned on:
kernel/module.o: In function `disable_ro_nx':
module.c:(.text.unlikely.disable_ro_nx+0x88): undefined reference to `rodata_enabled'
kernel/module.o: In function `module_disable_ro':
module.c:(.text.module_disable_ro+0x8c): undefined reference to `rodata_enabled'
kernel/module.o: In function `module_enable_ro':
module.c:(.text.module_enable_ro+0xb0): undefined reference to `rodata_enabled'
CONFIG_SET_MODULE_RONX does not exist, so use the correct one instead.
Fixes: 39290b389ea2 ("module: extend 'rodata=off' boot cmdline parameter to module mappings")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
SLUB creates a per-cache directory under /sys/kernel/slab which hosts a
bunch of debug files. Usually, there aren't that many caches on a
system and this doesn't really matter; however, if memcg is in use, each
cache can have per-cgroup sub-caches. SLUB creates the same directories
for these sub-caches under /sys/kernel/slab/$CACHE/cgroup.
Unfortunately, because there can be a lot of cgroups, active or
draining, the product of the numbers of caches, cgroups and files in
each directory can reach a very high number - hundreds of thousands is
commonplace. Millions and beyond aren't difficult to reach either.
What's under /sys/kernel/slab is primarily for debugging and the
information and control on a root cache already cover its
sub-caches. While having a separate directory for each sub-cache can be
helpful for development, it doesn't make much sense to pay this amount
of overhead by default.
This patch introduces a boot parameter slub_memcg_sysfs which determines
whether to create sysfs directories for per-memcg sub-caches. It also
adds CONFIG_SLUB_MEMCG_SYSFS_ON which determines the boot parameter's
default value and defaults to 0.
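As a rough illustration of the behaviour described above, the following
user-space sketch models a boolean boot parameter whose default comes from a
build-time option. It is not the kernel implementation: SLUB_MEMCG_SYSFS_ON
stands in for CONFIG_SLUB_MEMCG_SYSFS_ON, and parse_cmdline() and
should_create_sysfs_dir() are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for CONFIG_SLUB_MEMCG_SYSFS_ON, which defaults to 0 (assumption:
 * modelled here as a plain macro rather than a Kconfig symbol). */
#ifndef SLUB_MEMCG_SYSFS_ON
#define SLUB_MEMCG_SYSFS_ON 0
#endif

static bool memcg_sysfs_enabled = SLUB_MEMCG_SYSFS_ON;

/* Hypothetical parser: looks for "slub_memcg_sysfs=<int>" in a command line.
 * A plain substring search is enough for a sketch. */
void parse_cmdline(const char *cmdline)
{
    static const char key[] = "slub_memcg_sysfs=";
    const char *p;
    char *end;
    long v;

    assert(cmdline != NULL);
    p = strstr(cmdline, key);
    if (!p)
        return;                 /* parameter absent: keep the default */
    p += sizeof(key) - 1;
    v = strtol(p, &end, 10);
    if (end != p)               /* at least one digit was parsed */
        memcg_sysfs_enabled = (v != 0);
}

/* Root caches always get a directory; per-memcg sub-caches only when the
 * parameter (or its build-time default) enables them. */
bool should_create_sysfs_dir(bool is_root_cache)
{
    return is_root_cache || memcg_sysfs_enabled;
}
```

With the default of 0, should_create_sysfs_dir(false) is false until
parse_cmdline() sees "slub_memcg_sysfs=1".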
(cherry picked from commit 1663f26df3df7df3720306ca67f5ea8296d68fa1)
[akpm@linux-foundation.org: kset_unregister(NULL) is legal]
Link: http://lkml.kernel.org/r/20170204145203.GB26958@mtj.duckdns.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 113130132
Change-Id: I36b3b65ad583527c57518aa9f73a97ebd68d01b2
Signed-off-by: Minchan Kim <minchan@google.com>
Merge 4.9.120 into android-4.9
Changes in 4.9.120
ext4: fix check to prevent initializing reserved inodes
tpm: fix race condition in tpm_common_write()
parisc: Enable CONFIG_MLONGCALLS by default
parisc: Define mb() and add memory barriers to assembler unlock sequences
kasan: add no_sanitize attribute for clang builds
Mark HI and TASKLET softirq synchronous
xen/netfront: don't cache skb_shinfo()
ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices
scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
init: rename and re-order boot_cpu_state_init()
root dentries need RCU-delayed freeing
make sure that __dentry_kill() always invalidates d_seq, unhashed or not
fix mntput/mntput race
fix __legitimize_mnt()/mntput() race
proc/sysctl: prune stale dentries during unregistering
proc/sysctl: Don't grab i_lock under sysctl_lock.
proc: Fix proc_sys_prune_dcache to hold a sb reference
IB/core: Make testing MR flags for writability a static inline function
IB/mlx4: Mark user MR as writable if actual virtual memory is writable
mtd: nand: qcom: Add a NULL check for devm_kasprintf()
IB/ocrdma: fix out of bounds access to local buffer
ARM: dts: imx6sx: fix irq for pcie bridge
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
x86/speculation: Protect against userspace-userspace spectreRSB
kprobes/x86: Fix %p uses in error messages
x86/irqflags: Provide a declaration for native_save_fl
x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
x86/speculation/l1tf: Change order of offset/type in swap entry
x86/speculation/l1tf: Protect swap entries against L1TF
x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
x86/speculation/l1tf: Make sure the first page is always reserved
x86/speculation/l1tf: Add sysfs reporting for l1tf
x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
x86/speculation/l1tf: Limit swap file size to MAX_PA/2
x86/bugs: Move the l1tf function and define pr_fmt properly
x86/smp: Provide topology_is_primary_thread()
x86/topology: Provide topology_smt_supported()
cpu/hotplug: Make bringup/teardown of smp threads symmetric
cpu/hotplug: Split do_cpu_down()
cpu/hotplug: Provide knobs to control SMT
x86/cpu: Remove the pointless CPU printout
x86/cpu/AMD: Remove the pointless detect_ht() call
x86/cpu/common: Provide detect_ht_early()
x86/cpu/topology: Provide detect_extended_topology_early()
x86/cpu/intel: Evaluate smp_num_siblings early
x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
x86/cpu/AMD: Evaluate smp_num_siblings early
x86/apic: Ignore secondary threads if nosmt=force
x86/speculation/l1tf: Extend 64bit swap file size limit
x86/cpufeatures: Add detection of L1D cache flush support.
x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
x86/speculation/l1tf: Protect PAE swap entries against L1TF
x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
Revert "x86/apic: Ignore secondary threads if nosmt=force"
cpu/hotplug: Boot HT siblings at least once
x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present
x86/KVM/VMX: Add module argument for L1TF mitigation
x86/KVM/VMX: Add L1D flush algorithm
x86/KVM/VMX: Add L1D MSR based flush
x86/KVM/VMX: Add L1D flush logic
kvm: nVMX: Update MSR load counts on a VMCS switch
x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers
x86/KVM/VMX: Add find_msr() helper function
x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting
x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs
x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required
cpu/hotplug: Online siblings when SMT control is turned on
x86/litf: Introduce vmx status variable
x86/kvm: Drop L1TF MSR list approach
x86/l1tf: Handle EPT disabled state proper
x86/kvm: Move l1tf setup function
x86/kvm: Add static key for flush always
x86/kvm: Serialize L1D flush parameter setter
x86/kvm: Allow runtime control of L1D flush
cpu/hotplug: Expose SMT control init function
cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
x86/bugs, kvm: Introduce boot-time control of L1TF mitigations
Documentation: Add section about CPU vulnerabilities
x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content
Documentation/l1tf: Fix typos
cpu/hotplug: detect SMT disabled by BIOS
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
x86: Don't include linux/irq.h from asm/hardirq.h
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
Documentation/l1tf: Remove Yonah processors from not vulnerable list
KVM: x86: Add a framework for supporting MSR-based features
KVM: SVM: Add MSR-based feature support for serializing LFENCE
KVM: X86: Introduce kvm_get_msr_feature()
KVM: X86: Allow userspace to define the microcode version
KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
cpu/hotplug: Fix SMT supported evaluation
x86/speculation/l1tf: Invert all not present mappings
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/mm/pat: Make set_memory_np() L1TF safe
x86/mm/kmmio: Make the tracer robust against L1TF
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
x86/microcode: Do not upload microcode if CPUs are offline
x86/microcode: Allow late microcode loading with SMT disabled
x86/smp: fix non-SMP broken build due to redefinition of apic_id_is_primary_thread
cpu/hotplug: Non-SMP machines do not make use of booted_once
x86/init: fix build with CONFIG_SWAP=n
x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures
x86/cpu/amd: Limit cpu_core_id fixup to families older than F17h
x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
Linux 4.9.120
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit b5b1404d0815894de0690de8a1ab58269e56eae6 upstream.
This is purely a preparatory patch for upcoming changes during the 4.19
merge window.
We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).
This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized. It even has a comment to
that effect.
Except it _doesn't_ actually run after the percpu data has been properly
initialized. On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:
- per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
+ this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
which is obviously the right thing to do. Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.
So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.
Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.
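The ordering problem can be pictured with a toy user-space model. This is a
sketch under stated assumptions, not kernel code: every *_model name is
hypothetical, the enum values are illustrative, and the "this CPU" base is
modelled as a pointer that refers to a template copy until the arch hook runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real enum has many more states. */
enum { CPUHP_OFFLINE = 0, CPUHP_ONLINE = 1 };

struct cpuhp_cpu_state { int state; };

#define NR_CPUS_MODEL 2

static struct cpuhp_cpu_state template_copy;              /* not any CPU's area */
static struct cpuhp_cpu_state percpu_area[NR_CPUS_MODEL]; /* real per-CPU areas */
static struct cpuhp_cpu_state *this_cpu_base;             /* arch "this CPU" base */

/* Explicit per-CPU lookup: valid regardless of the arch hook. */
struct cpuhp_cpu_state *per_cpu_ptr_model(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return &percpu_area[cpu];
}

/* Arch hook: only after this does the "this CPU" base point at CPU 0's area. */
void smp_prepare_boot_cpu_model(void)
{
    this_cpu_base = per_cpu_ptr_model(0);
}

/* Hotplug init, written either way shown in the diff above. */
void hotplug_init_model(bool use_this_cpu_write)
{
    if (use_this_cpu_write)
        this_cpu_base->state = CPUHP_ONLINE;
    else
        per_cpu_ptr_model(0)->state = CPUHP_ONLINE;
}

/* Runs the two steps in the chosen order and returns what CPU 0 ends up with. */
int run_boot(bool use_this_cpu_write, bool hook_first)
{
    template_copy.state = CPUHP_OFFLINE;
    percpu_area[0].state = CPUHP_OFFLINE;
    this_cpu_base = &template_copy;

    if (hook_first) {
        smp_prepare_boot_cpu_model();
        hotplug_init_model(use_this_cpu_write);
    } else {
        hotplug_init_model(use_this_cpu_write);
        smp_prepare_boot_cpu_model();
    }
    return percpu_area[0].state;
}
```

In this model, run_boot(true, false) leaves CPU 0 in CPUHP_OFFLINE because the
write lands in the template copy, while run_boot(true, true) reaches
CPUHP_ONLINE, which is the effect of the reordering.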
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39290b389ea upstream.
The current "rodata=off" parameter disables read-only kernel mappings
under CONFIG_DEBUG_RODATA:
commit d2aa1acad2 ("mm/init: Add 'rodata=off' boot cmdline parameter
to disable read-only kernel mappings")
This patch is a logical extension to module mappings, i.e. read-only mappings
at module loading can be disabled even if CONFIG_DEBUG_SET_MODULE_RONX is
enabled (mainly for debug use). Please note, however, that it only affects
RO/RW permissions, keeping NX set.
This is the first step towards making CONFIG_DEBUG_SET_MODULE_RONX mandatory
(always-on) in the future, like CONFIG_DEBUG_RODATA on x86 and arm64.
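A minimal user-space model of the described behaviour might look like the
following. It is only a sketch: the permission bits, struct region and all
*_model functions are hypothetical stand-ins, and only the rodata_enabled name
comes from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PERM_W 0x1u   /* region is writable (model flag) */
#define PERM_X 0x2u   /* region is executable (model flag) */

/* In the kernel this is cleared by "rodata=off"; here it is a plain global. */
static bool rodata_enabled = true;

struct region { unsigned int perms; };

/* NX handling does not consult rodata_enabled: execute permission always goes. */
void module_enable_nx_model(struct region *r)
{
    assert(r != NULL);
    r->perms &= ~PERM_X;
}

/* RO handling is skipped entirely when read-only mappings are disabled. */
void module_enable_ro_model(struct region *r)
{
    assert(r != NULL);
    if (!rodata_enabled)
        return;
    r->perms &= ~PERM_W;
}

/* Applies both steps to a data region that starts writable and executable,
 * and returns the resulting permission bits. */
unsigned int load_module_model(bool rodata_on)
{
    struct region rodata = { PERM_W | PERM_X };

    rodata_enabled = rodata_on;
    module_enable_ro_model(&rodata);
    module_enable_nx_model(&rodata);
    return rodata.perms;
}
```

With rodata_on false the region stays writable but still loses PERM_X, which
matches the note that only RO/RW permissions are affected.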
Suggested-by: and Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/20161114061505.15238-1-takahiro.akashi@linaro.org
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Alex Shi <alex.shi@linaro.org> [v4.9 backport]
Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport]
Tested-by: Will Deacon <will.deacon@arm.com>
Tested-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This change adds the CONFIG_CFI_CLANG option, CFI error handling,
and a faster look-up table for cross module CFI checks.
Bug: 67506682
Change-Id: Ic009f0a629b552a0eb16e6d89808c7029e91447d
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Merge 4.9.79 into android-4.9
Changes in 4.9.79
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
orangefs: use list_for_each_entry_safe in purge_waiting_ops
orangefs: initialize op on loop restart in orangefs_devreq_read
usbip: prevent vhci_hcd driver from leaking a socket pointer address
usbip: Fix implicit fallthrough warning
usbip: Fix potential format overflow in userspace tools
can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
Prevent timer value 0 for MWAITX
drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
drivers: base: cacheinfo: fix boot error message when acpi is enabled
mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
hwpoison, memcg: forcibly uncharge LRU pages
cma: fix calculation of aligned offset
mm, page_alloc: fix potential false positive in __zone_watermark_ok
ipc: msg, make msgrcv work with LONG_MIN
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ACPICA: Namespace: fix operand cache leak
netfilter: nfnetlink_cthelper: Add missing permission checks
netfilter: xt_osf: Add missing permission checks
reiserfs: fix race in prealloc discard
reiserfs: don't preallocate blocks for extended attributes
fs/fcntl: f_setown, avoid undefined behaviour
scsi: libiscsi: fix shifting of DID_REQUEUE host byte
Revert "module: Add retpoline tag to VERMAGIC"
mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
Input: trackpoint - force 3 buttons if 0 button is reported
orangefs: fix deadlock; do not write i_size in read_iter
um: link vmlinux with -no-pie
vsyscall: Fix permissions for emulate mode with KAISER/PTI
eventpoll.h: add missing epoll event masks
dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
ipv6: fix udpv6 sendmsg crash caused by too small MTU
ipv6: ip6_make_skb() needs to clear cork.base.dst
lan78xx: Fix failure in USB Full Speed
net: igmp: fix source address check for IGMPv3 reports
net: qdisc_pkt_len_init() should be more robust
net: tcp: close sock if net namespace is exiting
pppoe: take ->needed_headroom of lower device into account on xmit
r8169: fix memory corruption on retrieval of hardware statistics.
sctp: do not allow the v4 socket to bind a v4mapped v6 address
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
tipc: fix a memory leak in tipc_nl_node_get_link()
vmxnet3: repair memory leak
net: Allow neigh contructor functions ability to modify the primary_key
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
ppp: unlock all_ppp_mutex before registering device
be2net: restore properly promisc mode after queues reconfiguration
ip6_gre: init dev->mtu and dev->hard_header_len correctly
gso: validate gso_type in GSO handlers
mlxsw: spectrum_router: Don't log an error on missing neighbor
tun: fix a memory leak for tfile->tx_array
flow_dissector: properly cap thoff field
perf/x86/amd/power: Do not load AMD power module on !AMD platforms
x86/microcode/intel: Extend BDW late-loading further with LLC size check
hrtimer: Reset hrtimer cpu base proper on CPU hotplug
x86: bpf_jit: small optimization in emit_bpf_tail_call()
bpf: fix bpf_tail_call() x64 JIT
bpf: introduce BPF_JIT_ALWAYS_ON config
bpf: arsh is not supported in 32 bit alu thus reject it
bpf: avoid false sharing of map refcount with max_entries
bpf: fix divides by zero
bpf: fix 32-bit divide by zero
bpf: reject stores into ctx via st and xadd
nfsd: auth: Fix gid sorting when rootsquash enabled
Linux 4.9.79
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]
The BPF interpreter has been used as part of the Spectre variant 2 attack, CVE-2017-5715.
A quote from the Google Project Zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
option, which removes the interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of a JITed program is randomized and the code page is marked as read-only.
In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.
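The JIT-only policy can be sketched in user space as follows. This is a toy
model, not the kernel code path: struct prog_model, select_runtime_model() and
the three *_model program bodies are hypothetical, and -ENOTSUP is used purely
as an illustrative error value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int (*bpf_func_t)(const void *ctx);

struct prog_model {
    bpf_func_t func;   /* entry point that would run the program */
    bool jited;        /* true once a JIT image is installed */
};

/* Stand-ins for the three possible entry points. */
static unsigned int interpreter_model(const void *ctx) { (void)ctx; return 1; }
static unsigned int ret0_model(const void *ctx)        { (void)ctx; return 0; }
static unsigned int jited_model(const void *ctx)       { (void)ctx; return 2; }

/* Picks how a program will run.
 * jit_always_on models the new config option; jit_succeeds models whether
 * the architecture's JIT produced an image. */
int select_runtime_model(struct prog_model *p, bool jit_always_on,
                         bool jit_succeeds)
{
    assert(p != NULL);

    /* With JIT-only mode there is no interpreter to fall back on, so the
     * default entry point is a dummy that just returns 0. */
    p->func = jit_always_on ? ret0_model : interpreter_model;
    p->jited = false;

    if (jit_succeeds) {
        p->func = jited_model;
        p->jited = true;
    }

    /* JIT-only mode refuses programs that could not be JITed. */
    if (jit_always_on && !p->jited)
        return -ENOTSUP;
    return 0;
}
```

The dummy entry point mirrors the "add 'return 0' dummy" item in the v1->v2
notes: even if something invokes the function pointer after a failed JIT, no
interpreter code is reached.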
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.9.75 into android-4.9
Changes in 4.9.75
tcp_bbr: reset full pipe detection on loss recovery undo
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
x86/boot: Add early cmdline parsing for options with arguments
KAISER: Kernel Address Isolation
kaiser: merged update
kaiser: do not set _PAGE_NX on pgd_none
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
kaiser: fix build and FIXME in alloc_ldt_struct()
kaiser: KAISER depends on SMP
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
kaiser: fix perf crashes
kaiser: ENOMEM if kaiser_pagetable_walk() NULL
kaiser: tidied up asm/kaiser.h somewhat
kaiser: tidied up kaiser_add/remove_mapping slightly
kaiser: align addition to x86/mm/Makefile
kaiser: cleanups while trying for gold link
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
kaiser: delete KAISER_REAL_SWITCH option
kaiser: vmstat show NR_KAISERTABLE as nr_overhead
kaiser: enhanced by kernel and user PCIDs
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
kaiser: PCID 0 for kernel and 128 for user
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
kaiser: paranoid_entry pass cr3 need to paranoid_exit
kaiser: kaiser_remove_mapping() move along the pgd
kaiser: fix unlikely error in alloc_ldt_struct()
kaiser: add "nokaiser" boot option, using ALTERNATIVE
x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
x86/kaiser: Check boottime cmdline params
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
kaiser: drop is_atomic arg to kaiser_pagetable_walk()
kaiser: asm/tlbflush.h handle noPGE at lower level
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
x86/paravirt: Dont patch flush_tlb_single
x86/kaiser: Reenable PARAVIRT
kaiser: disabled on Xen PV
x86/kaiser: Move feature detection up
KPTI: Rename to PAGE_TABLE_ISOLATION
KPTI: Report when enabled
kaiser: Set _PAGE_NX only if supported
Linux 4.9.75
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping(). And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
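The address arithmetic behind "map one page of the stack" can be sketched as
below. This is a user-space illustration only: the page and thread sizes are
assumed values, struct mapping and kaiser_stack_mapping_model() are
hypothetical, and the choice of the topmost page follows the
"stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE" entry in the changelog above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for the sketch; real values are architecture dependent. */
#define PAGE_SIZE_MODEL   4096UL
#define THREAD_SIZE_MODEL (4 * PAGE_SIZE_MODEL)

struct mapping {
    uintptr_t start;      /* first byte that would be mapped */
    unsigned long size;   /* length of the mapping in bytes */
};

/* Returns the single page at the top of a thread stack, i.e. the range a
 * kaiser_map_thread_stack()-style wrapper would hand to the mapping helper. */
struct mapping kaiser_stack_mapping_model(uintptr_t stack_base)
{
    struct mapping m;

    /* Assumption: the stack base is page aligned. */
    assert(stack_base % PAGE_SIZE_MODEL == 0);

    m.start = stack_base + THREAD_SIZE_MODEL - PAGE_SIZE_MODEL;
    m.size = PAGE_SIZE_MODEL;
    return m;
}
```

Keeping this arithmetic behind one wrapper is what lets callers such as
kernel/fork.c avoid referring to arch-specific symbols like __PAGE_KERNEL.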
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.
More information about the patch can be found at:
https://github.com/IAIK/KAISER
From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5
To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>
After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).
With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.
If there are any questions we would love to answer them.
We also appreciate any comments!
Cheers,
Daniel (+ the KAISER team from Graz University of Technology)
[1] 4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf
[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]
Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Memory allocated for initrd would not be reclaimed if initializing ramfs
was skipped.
Bug: 69901741
Test: "grep MemTotal /proc/meminfo" increases by a few MB on an Android
device with a/b boot.
Change-Id: Ifbe094d303ed12cfd6de6aa004a8a19137a2f58a
Signed-off-by: Nick Bray <ncbray@google.com>
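The Test: line above compares MemTotal before and after the fix. A minimal
sketch of that comparison follows, assuming the contents of /proc/meminfo
were captured as text from a boot without the patch and a boot with it. The
function names and the sample values are hypothetical and are not part of
the original change.

```python
import re


def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def mem_total_delta_kb(before_text, after_text):
    """Return how many kB MemTotal grew between two meminfo captures."""
    return mem_total_kb(after_text) - mem_total_kb(before_text)


# Hypothetical captures; real values depend on the device and initrd size.
BEFORE = "MemTotal:        3850000 kB\nMemFree:         1200000 kB\n"
AFTER = "MemTotal:        3858192 kB\nMemFree:         1210000 kB\n"

if __name__ == "__main__":
    print("MemTotal grew by", mem_total_delta_kb(BEFORE, AFTER), "kB")
```

On a device, the two captures would come from reading /proc/meminfo after
booting each kernel; a positive delta of a few MB would match the result
described in the Test: line.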
Merge 4.9.58 into android-4.9
Changes in 4.9.58
MIPS: Fix minimum alignment requirement of IRQ stack
Revert "bsg-lib: don't free job in bsg_prepare_job"
xen-netback: Use GFP_ATOMIC to allocate hash
locking/lockdep: Add nest_lock integrity test
watchdog: kempld: fix gcc-4.3 build
irqchip/crossbar: Fix incorrect type of local variables
initramfs: finish fput() before accessing any binary from initramfs
mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
ALSA: hda: Add Geminilake HDMI codec ID
qed: Don't use attention PTT for configuring BW
mac80211: fix power saving clients handling in iwlwifi
net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
staging: vchiq_2835_arm: Make cache-line-size a required DT property
netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
iio: adc: xilinx: Fix error handling
f2fs: do SSR for data when there is enough free space
sched/fair: Update rq clock before changing a task's CPU affinity
Btrfs: send, fix failure to rename top level inode due to name collision
f2fs: do not wait for writeback in write_begin
md/linear: shutup lockdep warnning
sparc64: Migrate hvcons irq to panicked cpu
net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
crypto: xts - Add ECB dependency
mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next
ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
slub: do not merge cache if slub_debug contains a never-merge flag
scsi: scsi_dh_emc: return success in clariion_std_inquiry()
ASoC: mediatek: add I2C dependency for CS42XX8
drm/amdgpu: refuse to reserve io mem for split VRAM buffers
net: mvpp2: release reference to txq_cpu[] entry after unmapping
qede: Prevent index problems in loopback test
qed: Reserve doorbell BAR space for present CPUs
qed: Read queue state before releasing buffer
i2c: at91: ensure state is restored after suspending
ceph: don't update_dentry_lease unless we actually got one
ceph: fix bogus endianness change in ceph_ioctl_set_layout
ceph: clean up unsafe d_parent accesses in build_dentry_path
uapi: fix linux/rds.h userspace compilation errors
uapi: fix linux/mroute6.h userspace compilation errors
IB/hfi1: Use static CTLE with Preset 6 for integrated HFIs
IB/hfi1: Allocate context data on memory node
target/iscsi: Fix unsolicited data seq_end_offset calculation
hrtimer: Catch invalid clockids again
nfsd/callback: Cleanup callback cred on shutdown
powerpc/perf: Add restrictions to PMC5 in power9 DD1
drm/nouveau/gr/gf100-: fix ccache error logging
regulator: core: Resolve supplies before disabling unused regulators
btmrvl: avoid double-disable_irq() race
EDAC, mce_amd: Print IPID and Syndrome on a separate line
cpufreq: CPPC: add ACPI_PROCESSOR dependency
usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets
Linux 4.9.58
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>