/vendor/etc/perf/perfconfigstore.xml enables ro.lmk.enable_userspace_lmk,
and when the lmkd properties are re-read, lmkd tries to honor it and ignores
the presence of in-kernel lmk and exits repeatedly, which causes init to
trigger a reboot.
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GilkACgkQONu9yGCS
aT6ibw//QcoRk+6jjWIzW2R83tvcSpDREIYKF5xDPULqmaUKv7qcbGqdCT4OI0Vf
CySao6jvmN+1V+5OlQHHw1qmithHF/Y4yEVi+eL0CcMgB6qA2tQxeA0ffFu0Fzxe
oKrAIDANAJ2FKymqbIzTJU8AChNuy+Rc+C60L9O0ZhBHykLYvO7/VepTxdO2aMGs
a9tdZyXnOAr/ZgKx+uN1F8Rx2ZorNfFmP2rDxEZ/B8pVrXsOjMAcxWRsZBv+w9mc
zWaMPEL1vIR/kG35L18l2EFwZ++uIGvfGR2HxhNRUlTqN4m/3trATIc0eRn5PyAC
qBlVdcwUeJKavBqK3cgHvK9CzWQdSMxNefk9A0H66ZGpfSuNCiFjw54AXEiRzx3t
OvzUvDjfa36jsKW7yiYXdLbEa52gT14UDzmSIzlAYQLBplEadHikJ0FxCFzAWpFt
C13gG1e4G0ZKEHH8wZMuRdanHbN87c/0Sm4rgTLMwCO6I1PeILcUwIq9El9M4RVL
MmHSXLKduaHbPJWx+Qopl6NxjqTe/VR8paT6q1QnU00/4kP15aIVE08vZSiGSNP+
Gp6xbv12skAx1m1zP7K82oGdwnCVCJUqlC9wafcjsaxcoCVBWvIgPkyMWAzUeFzc
Ub2yIS9iulUeYOxPXBZKWCqgwpl7kovGztf4gwX2Kuy50mXSZQg=
=ozfF
-----END PGP SIGNATURE-----
Merge 4.9.187 into android-4.9-q
Changes in 4.9.187
MIPS: ath79: fix ar933x uart parity mode
MIPS: fix build on non-linux hosts
arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly
dmaengine: imx-sdma: fix use-after-free on probe error path
ath10k: Do not send probe response template for mesh
ath9k: Check for errors when reading SREV register
ath6kl: add some bounds checking
ath: DFS JP domain W56 fixed pulse type 3 RADAR detection
batman-adv: fix for leaked TVLV handler.
media: dvb: usb: fix use after free in dvb_usb_device_exit
crypto: talitos - fix skcipher failure due to wrong output IV
media: marvell-ccic: fix DMA s/g desc number calculation
media: vpss: fix a potential NULL pointer dereference
media: media_device_enum_links32: clean a reserved field
net: stmmac: dwmac1000: Clear unused address entries
net: stmmac: dwmac4/5: Clear unused address entries
signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig
af_key: fix leaks in key_pol_get_resp and dump_sp.
xfrm: Fix xfrm sel prefix length validation
media: mc-device.c: don't memset __user pointer contents
media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails.
net: phy: Check against net_device being NULL
crypto: talitos - properly handle split ICV.
crypto: talitos - Align SEC1 accesses to 32 bits boundaries.
tua6100: Avoid build warnings.
locking/lockdep: Fix merging of hlocks with non-zero references
media: wl128x: Fix some error handling in fm_v4l2_init_video_device()
cpupower : frequency-set -r option misses the last cpu in related cpu list
net: fec: Do not use netdev messages too early
net: axienet: Fix race condition causing TX hang
s390/qdio: handle PENDING state for QEBSM devices
perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode
perf test 6: Fix missing kvm module load for s390
gpio: omap: fix lack of irqstatus_raw0 for OMAP4
gpio: omap: ensure irq is enabled before wakeup
regmap: fix bulk writes on paged registers
bpf: silence warning messages in core
rcu: Force inlining of rcu_read_lock()
blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
xfrm: fix sa selector validation
perf evsel: Make perf_evsel__name() accept a NULL argument
vhost_net: disable zerocopy by default
ipoib: correcly show a VF hardware address
EDAC/sysfs: Fix memory leak when creating a csrow object
ipsec: select crypto ciphers for xfrm_algo
media: i2c: fix warning same module names
ntp: Limit TAI-UTC offset
timer_list: Guard procfs specific code
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
media: coda: fix mpeg2 sequence number handling
media: coda: increment sequence offset for the last returned frame
mt7601u: do not schedule rx_tasklet when the device has been disconnected
x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
mt7601u: fix possible memory leak when the device is disconnected
ath10k: fix PCIE device wake up failed
perf tools: Increase MAX_NR_CPUS and MAX_CACHES
libata: don't request sense data on !ZAC ATA devices
clocksource/drivers/exynos_mct: Increase priority over ARM arch timer
rslib: Fix decoding of shortened codes
rslib: Fix handling of of caller provided syndrome
ixgbe: Check DDM existence in transceiver before access
crypto: asymmetric_keys - select CRYPTO_HASH where needed
EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec
bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush()
iwlwifi: mvm: Drop large non sta frames
net: usb: asix: init MAC address buffers
gpiolib: Fix references to gpiod_[gs]et_*value_cansleep() variants
Bluetooth: hci_bcsp: Fix memory leak in rx_skb
Bluetooth: 6lowpan: search for destination address in all peers
Bluetooth: Check state in l2cap_disconnect_rsp
Bluetooth: validate BLE connection interval updates
gtp: fix Illegal context switch in RCU read-side critical section.
gtp: fix use-after-free in gtp_newlink()
xen: let alloc_xenballooned_pages() fail if not enough memory free
scsi: NCR5380: Reduce goto statements in NCR5380_select()
scsi: NCR5380: Always re-enable reselection interrupt
scsi: mac_scsi: Increase PIO/PDMA transfer length threshold
crypto: ghash - fix unaligned memory access in ghash_setkey()
crypto: arm64/sha1-ce - correct digest for empty data in finup
crypto: arm64/sha2-ce - correct digest for empty data in finup
crypto: chacha20poly1305 - fix atomic sleep when using async algorithm
crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe
Input: gtco - bounds check collection indent level
regulator: s2mps11: Fix buck7 and buck8 wrong voltages
arm64: tegra: Update Jetson TX1 GPU regulator timings
iwlwifi: pcie: don't service an interrupt that was masked
tracing/snapshot: Resize spare buffer if size changed
NFSv4: Handle the special Linux file open access mode
lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
ALSA: seq: Break too long mutex context in the write loop
ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine
media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom()
media: coda: Remove unbalanced and unneeded mutex unlock
KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
arm64: tegra: Fix AGIC register range
fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
drm/nouveau/i2c: Enable i2c pads & busses during preinit
padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
9p/virtio: Add cleanup path in p9_virtio_init
PCI: Do not poll for PME if the device is in D3cold
Btrfs: add missing inode version, ctime and mtime updates when punching hole
libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
take floppy compat ioctls to sodding floppy.c
floppy: fix div-by-zero in setup_format_params
floppy: fix out-of-bounds read in next_valid_format
floppy: fix invalid pointer dereference in drive_name
floppy: fix out-of-bounds read in copy_buffer
coda: pass the host file in vma->vm_file on mmap
gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM
crypto: ccp - Validate the the error value used to index error messages
PCI: hv: Delete the device earlier from hbus->children for hot-remove
PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
crypto: caam - limit output IV to CBC to work around CTR mode DMA issue
um: Allow building and running on older hosts
um: Fix FP register size for XSTATE/XSAVE
parisc: Ensure userspace privilege for ptraced processes in regset functions
parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1
powerpc/32s: fix suspend/resume when IBATs 4-7 are used
powerpc/watchpoint: Restore NV GPRs while returning from exception
eCryptfs: fix a couple type promotion bugs
intel_th: msu: Fix single mode with disabled IOMMU
Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug
usb: Handle USB3 remote wakeup for LPM enabled devices correctly
dm bufio: fix deadlock with loop device
compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
compiler.h: Add read_word_at_a_time() function.
lib/strscpy: Shut up KASAN false-positives in strscpy()
ext4: allow directory holes
bnx2x: Prevent load reordering in tx completion processing
bnx2x: Prevent ptp_task to be rescheduled indefinitely
caif-hsi: fix possible deadlock in cfhsi_exit_module()
igmp: fix memory leak in igmpv3_del_delrec()
ipv4: don't set IPv6 only flags to IPv4 addresses
net: bcmgenet: use promisc for unsupported filters
net: dsa: mv88e6xxx: wait after reset deactivation
net: neigh: fix multiple neigh timer scheduling
net: openvswitch: fix csum updates for MPLS actions
nfc: fix potential illegal memory access
rxrpc: Fix send on a connected, but unbound socket
sky2: Disable MSI on ASUS P6T
vrf: make sure skb->data contains ip header to make routing
macsec: fix use-after-free of skb during RX
macsec: fix checksumming after decryption
netrom: fix a memory leak in nr_rx_frame()
netrom: hold sock when setting skb->destructor
bonding: validate ip header before check IPPROTO_IGMP
tcp: Reset bytes_acked and bytes_received when disconnecting
net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
net: bridge: stp: don't cache eth dest pointer before skb pull
perf/x86/amd/uncore: Rename 'L2' to 'LLC'
perf/x86/amd/uncore: Get correct number of cores sharing last level cache
perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
NFSv4: Fix open create exclusive when the server reboots
nfsd: increase DRC cache limit
nfsd: give out fewer session slots as limit approaches
nfsd: fix performance-limiting session calculation
nfsd: Fix overflow causing non-working mounts on 1 TB machines
drm/panel: simple: Fix panel_simple_dsi_probe
usb: core: hub: Disable hub-initiated U1/U2
tty: max310x: Fix invalid baudrate divisors calculator
pinctrl: rockchip: fix leaked of_node references
tty: serial: cpm_uart - fix init when SMC is relocated
drm/bridge: tc358767: read display_props in get_modes()
drm/bridge: sii902x: pixel clock unit is 10kHz instead of 1kHz
memstick: Fix error cleanup path of memstick_init
tty/serial: digicolor: Fix digicolor-usart already registered warning
tty: serial: msm_serial: avoid system lockup condition
serial: 8250: Fix TX interrupt handling condition
drm/virtio: Add memory barriers for capset cache.
phy: renesas: rcar-gen2: Fix memory leak at error paths
drm/rockchip: Properly adjust to a true clock in adjusted_mode
tty: serial_core: Set port active bit in uart_port_activate
usb: gadget: Zero ffs_io_data
powerpc/pci/of: Fix OF flags parsing for 64bit BARs
PCI: sysfs: Ignore lockdep for remove attribute
kbuild: Add -Werror=unknown-warning-option to CLANG_FLAGS
PCI: xilinx-nwl: Fix Multi MSI data programming
iio: iio-utils: Fix possible incorrect mask calculation
recordmcount: Fix spurious mcount entries on powerpc
mfd: core: Set fwnode for created devices
mfd: arizona: Fix undefined behavior
mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk
um: Silence lockdep complaint about mmap_sem
powerpc/4xx/uic: clear pending interrupt after irq type/pol change
RDMA/i40iw: Set queue pair state when being queried
serial: sh-sci: Terminate TX DMA during buffer flushing
serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
kallsyms: exclude kasan local symbols on s390
perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning
RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
f2fs: avoid out-of-range memory access
mailbox: handle failed named mailbox channel request
powerpc/eeh: Handle hugepages in ioremap space
sh: prevent warnings when using iounmap
mm/kmemleak.c: fix check for softirq context
9p: pass the correct prototype to read_cache_page
mm/mmu_notifier: use hlist_add_head_rcu()
locking/lockdep: Fix lock used or unused stats error
locking/lockdep: Hide unused 'class' variable
usb: wusbcore: fix unbalanced get/put cluster_id
usb: pci-quirks: Correct AMD PLL quirk detection
x86/sysfb_efi: Add quirks for some devices with swapped width and height
x86/speculation/mds: Apply more accurate check on hypervisor platform
hpet: Fix division by zero in hpet_time_div()
ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1
ALSA: hda - Add a conexant codec entry to let mute led work
powerpc/tm: Fix oops on sigreturn on systems without TM
access: avoid the RCU grace period for the temporary subjective credentials
ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt
tcp: reset sk_send_head in tcp_write_queue_purge
arm64: dts: marvell: Fix A37xx UART0 register size
i2c: qup: fixed releasing dma without flush operation completion
arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
ISDN: hfcsusb: checking idx of ep configuration
media: au0828: fix null dereference in error path
media: cpia2_usb: first wake up, then free in disconnect
media: radio-raremono: change devm_k*alloc to k*alloc
Bluetooth: hci_uart: check for missing tty operations
sched/fair: Don't free p->numa_faults with concurrent readers
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
ceph: hold i_ceph_lock when removing caps for freeing inode
Linux 4.9.187
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.
During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlyJcH4ACgkQONu9yGCS
aT6A3A//dVZKmejQ7tUeESZm5e/J+1/qSsF1LXVHoAYBz5aetWr+4/Ya3a3yl/QS
cVypLhSi/pEstd6gxgIyDts9C7JZdnsOrRg+tJ0D6DEUWBJGrWlBj+gSFdKyN5ko
4yFLk66kL8RV5d921q3OfHbhDy0qIso6VaFOKh4kRWTuoATgSAjksA/A//FijWh3
QQGB30IElENFfTYBIxqlzqer3FnarqauaiaJxSMSAhzlWcxcy7gs43RryIX1zX+F
BqhNgDwP4pU7EK128s0Jjx32PwOWb8sV9WB4t9b5NkIj1BRsc2repxorQg3GO+b9
ATaGJHVT87sU8Uvegvq+rFqmNdlg8OhGEg0GgpJOe/jrigx8+iCO1kbPgfXUJmzx
hu0qD+e1bW2hRk11l0kgTR2b1xaTq1Kfrwz8NjIirVLYBxZfLZpnZ9kiYeKRBjZC
lFHNogmLswWBgZfnqJGMWnZsL+lx8eyzB+27ntm9ZHffs+pZCw+njLcOkE7Fv7tb
5WKNLgdcD2mnH0OWKsszHFa7xVQnlhXPrfPC9Qh+e5ds6E5rDRHpr88DDBdAFW9f
ECtdLyh2Wl3fdxTGKhzo8psONigWRnyniqr6ELf3K6gUCuGcyHyYPkTy10OnnYU8
Dq3TtnQyhHeiNvmtGGcDtCJo/oKxZRtt9+q4k6NDBSV5i4ZOtI8=
=ZC4+
-----END PGP SIGNATURE-----
Merge 4.9.163 into android-4.9
Changes in 4.9.163
USB: serial: option: add Telit ME910 ECM composition
USB: serial: cp210x: add ID for Ingenico 3070
USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
cpufreq: Use struct kobj_attribute instead of struct global_attr
ncpfs: fix build warning of strncpy
isdn: isdn_tty: fix build warning of strncpy
staging: comedi: ni_660x: fix missing break in switch statement
staging: wilc1000: fix to set correct value for 'vif_num'
staging: android: ion: fix sys heap pool's gfp_flags
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
net-sysfs: Fix mem leak in netdev_register_kobject
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
team: Free BPF filter when unregistering netdev
bnxt_en: Drop oversize TX packets to prevent errors.
hv_netvsc: Fix IP header checksum for coalesced packets
net: dsa: mv88e6xxx: Fix u64 statistics
netlabel: fix out-of-bounds memory accesses
net: netem: fix skb length BUG_ON in __skb_to_sgvec
net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
net: sit: fix memory leak in sit_init_net()
xen-netback: don't populate the hash cache on XenBus disconnect
xen-netback: fix occasional leak of grant ref mappings under memory pressure
net: Add __icmp_send helper.
net: avoid use IPCB in cipso_v4_error
tun: fix blocking read
tun: remove unnecessary memory barrier
net: phy: Micrel KSZ8061: link failure after cable connect
x86/CPU/AMD: Set the CPB bit unconditionally on F17h
applicom: Fix potential Spectre v1 vulnerabilities
MIPS: irq: Allocate accurate order pages for irq stack
hugetlbfs: fix races and page leaks during migration
exec: Fix mem leak in kernel_read_file
media: uvcvideo: Fix 'type' check leading to overflow
vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
perf core: Fix perf_proc_update_handler() bug
perf tools: Handle TOPOLOGY headers with no CPU
IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
iommu/amd: Call free_iova_fast with pfn in map_sg
iommu/amd: Unmap all mapped pages in error path of map_sg
ipvs: Fix signed integer overflow when setsockopt timeout
iommu/amd: Fix IOMMU page flush when detach device from a domain
xtensa: SMP: fix ccount_timer_shutdown
xtensa: SMP: fix secondary CPU initialization
xtensa: smp_lx200_defconfig: fix vectors clash
xtensa: SMP: mark each possible CPU as present
xtensa: SMP: limit number of possible CPUs by NR_CPUS
net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
net: hns: Fix for missing of_node_put() after of_parse_phandle()
net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
gpio: vf610: Mask all GPIO interrupts
nfs: Fix NULL pointer dereference of dev_name
qed: Fix VF probe failure while FLR
scsi: libfc: free skb when receiving invalid flogi resp
platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
cifs: fix computation for MAX_SMB2_HDR_SIZE
arm64: kprobe: Always blacklist the KVM world-switch code
x86/kexec: Don't setup EFI info if EFI runtime is not enabled
x86_64: increase stack size for KASAN_EXTRA
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
autofs: drop dentry reference only when it is never used
autofs: fix error return in autofs_fill_super()
soc: fsl: qbman: avoid race in clearing QMan interrupt
ARM: pxa: ssp: unneeded to free devm_ allocated data
arm64: dts: add msm8996 compatible to gicv3
usb: phy: fix link errors
irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init
dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
vsock/virtio: fix kernel panic after device hot-unplug
vsock/virtio: reset connected sockets on device removal
dmaengine: dmatest: Abort test in case of mapping error
selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET
selftests: netfilter: add simple masq/redirect test cases
netfilter: nf_nat: skip nat clash resolution for same-origin entries
s390/qeth: fix use-after-free in error path
perf symbols: Filter out hidden symbols from labels
MIPS: Remove function size check in get_frame_info()
fs: ratelimit __find_get_block_slow() failure message.
Input: wacom_serial4 - add support for Wacom ArtPad II tablet
Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
iscsi_ibft: Fix missing break in switch statement
scsi: aacraid: Fix missing break in switch statement
futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
ARM: dts: exynos: Fix pinctrl definition for eMMC RTSN line on Odroid X2/U3
ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
drm: disable uncached DMA optimization for ARM and arm64
ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420
perf/x86/intel: Make cpuc allocations consistent
perf/x86/intel: Generalize dynamic constraint creation
x86: Add TSX Force Abort CPUID/MSR
Linux 4.9.163
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAltcA9oACgkQONu9yGCS
aT4NZg/7Bbf10v5kf18jZeolQkj1GyVv7+tI1fAlM3tT5BGEqAZDUdp383YegQih
YRV5Q7fsdPxqyoXAfQosKdjViBowWglzWJE2YKZHRIOkOBSC0mlhfhNiqwp9owlQ
/JHGwfPhYaJt9Oyuc/OZ3iq/KNe8gm29OuFnQd8pKp8mFakpyiEVcLSeqHUjGQ9P
BBM0H9+F/16iOOVcOqQvbG7rza9AjPXeTLGcMf63Nah6qLSvuH3il/v42N5XXOuJ
iXozco9ifh3BxC/vP3sHrt+BCUeUsNbLUdZO1gZIpybd1byJAbQSPkN8v9jgNZbG
j7xMfMecsUNVsPpv8i8f7Zbh7PDYx+XGk6ufArmYItmp3X65gO+rrxbme+pSvKib
g8x0952+u+ddnyEPH/DcypTI/WU2qeAfXk4HEbeeYiZZxOUmF76XNn55YZW8xpqj
jJi9CaXHiXQpje2a8KGMR3b37T3f5fntOn4rIWT/isaqbqms8j/3b9AYf9yEEGZ1
b05787d6ybHQrMVi9nTXKrRAQqlnKpZZWdsOPvrrV9jO5TnYyDy2RB9/19SEpkdj
kD6lsMlL//o6TRFDIdph9Kg1sm2rFnkT78Hc/RZJ5t27+CM2YfvrLr1+k4G15QqG
N2h+0naYkA6dc052i0kbL0cQGXngeoBeINAKOcXyom99p/rFaKA=
=gXo7
-----END PGP SIGNATURE-----
Merge 4.9.116 into android-4.9
Changes in 4.9.116
MIPS: ath79: fix register address in ath79_ddr_wb_flush()
MIPS: Fix off-by-one in pci_resource_to_user()
ip: hash fragments consistently
ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
net: skb_segment() should not return NULL
net/mlx5: Adjust clock overflow work period
net/mlx5e: Don't allow aRFS for encapsulated packets
net/mlx5e: Fix quota counting in aRFS expire flow
multicast: do not restore deleted record source filter mode to new one
net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv
rtnetlink: add rtnl_link_state check in rtnl_configure_link
tcp: fix dctcp delayed ACK schedule
tcp: helpers to send special DCTCP ack
tcp: do not cancel delay-AcK on DCTCP special ACK
tcp: do not delay ACK in DCTCP upon CE status change
tcp: free batches of packets in tcp_prune_ofo_queue()
tcp: avoid collapses in tcp_prune_queue() if possible
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
tcp: call tcp_drop() from tcp_data_queue_ofo()
usb: cdc_acm: Add quirk for Castles VEGA3000
usb: core: handle hub C_PORT_OVER_CURRENT condition
usb: gadget: f_fs: Only return delayed status when len is 0
driver core: Partially revert "driver core: correct device's shutdown order"
can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
can: xilinx_can: fix power management handling
can: xilinx_can: fix recovery from error states not being propagated
can: xilinx_can: fix device dropping off bus on RX overrun
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
can: xilinx_can: fix incorrect clear of non-processed interrupts
can: xilinx_can: fix RX overflow interrupt not being enabled
turn off -Wattribute-alias
exec: avoid gcc-8 warning for get_task_comm
Linux 4.9.116
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 3756f6401c302617c5e091081ca4d26ab604bec5 upstream.
gcc-8 warns about using strncpy() with the source size as the limit:
fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
This is indeed slightly suspicious, as it protects us from source
arguments without NUL-termination, but does not guarantee that the
destination is terminated.
This keeps the strncpy() to ensure we have properly padded target
buffer, but ensures that we use the correct length, by passing the
actual length of the destination buffer as well as adding a build-time
check to ensure it is exactly TASK_COMM_LEN.
There are only 23 callsites which I all reviewed to ensure this is
currently the case. We could get away with doing only the check or
passing the right length, but it doesn't hurt to do both.
Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllxlFQACgkQONu9yGCS
aT42GBAAlL+49kLHeHQszHbIhCyd3bf59TC6cMZdGCA11X/TxvS4jFWpeopXQ7SR
rngaPiAItWFuakZPAIe+cdIQhA1P8o0pwr5iIj0/bXilKYF9ujNrOXxYGTD6qgeB
1G437oYOcIXZQ1sHzy0eMYxM/s8HvmFvVxCwOaISqB6LIZDtX5lYQdzSk/Iy5zAt
k0YBr+q+c6rm9039UQrvyQ0oTwUh0qSWg4rSW8tCOJcIJZP2BaD4M0BLQbaCOYWg
43Npx4MAHqIzWk38fZ7KxOo9yfNv6vbqW1WpB2WAT6i4zJJ6JtXqr7BpYVrSLDL9
NQWvxpuPbEcACL0Mt/mp2fh3bqfh5Hxl77YijyLCo5yq3XRt4w2kOsLUtEiSomQb
51rik+amM7v+ryJqPHLbSngZWbPoVZmkBO1psm5CaMcA4WnVEYy+ga9tnnXRQOzu
8irWSr4L4AxqBoxR/OjyfBcdIOlvgT9jEgnLnER1/W3vHtooyEkXBDYoZ4MszXWm
79VTDRb8i5VIm97klPUpo61D8kGnZOuXDJS2arKqJqVBjcVHeeOwU0jF/4HVpU4N
HKHS/PQRLbV8Aa1r3uADAZLs01dUV1qbdgd7EZBUWj6Vix80w4e3K+kTDIcCe8zC
1KkvGNWjP/3JxwH6GgHsz8hJ6I4Dmsl9JTRUSAFnnGeSj68UDN4=
=QrTI
-----END PGP SIGNATURE-----
Merge 4.9.39 into android-4.9
Changes in 4.9.39
xen-netfront: Rework the fix for Rx stall during OOM and network stress
net_sched: fix error recovery at qdisc creation
net: sched: Fix one possible panic when no destroy callback
net/phy: micrel: configure intterupts after autoneg workaround
ipv6: avoid unregistering inet6_dev for loopback
net: dp83640: Avoid NULL pointer dereference.
tcp: reset sk_rx_dst in tcp_disconnect()
net: prevent sign extension in dev_get_stats()
bridge: mdb: fix leak on complete_info ptr on fail path
rocker: move dereference before free
bpf: prevent leaking pointer via xadd on unpriviledged
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
net/mlx5: Cancel delayed recovery work when unloading the driver
liquidio: fix bug in soft reset failure detection
net/mlx5e: Fix TX carrier errors report in get stats ndo
ipv6: dad: don't remove dynamic addresses if link is down
vxlan: fix hlist corruption
net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64
net: ipv6: Compare lwstate in detecting duplicate nexthops
vrf: fix bug_on triggered by rx when destroying a vrf
rds: tcp: use sock_create_lite() to create the accept socket
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
brcmfmac: Fix a memory leak in error handling path in 'brcmf_cfg80211_attach'
brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain
sfc: don't read beyond unicast address list
cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
cfg80211: Check if PMKID attribute is of expected size
cfg80211: Check if NAN service ID is of expected size
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
parisc: Report SIGSEGV instead of SIGBUS when running out of stack
parisc: use compat_sys_keyctl()
parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
parisc/mm: Ensure IRQs are off in switch_mm()
tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
thp, mm: fix crash due race in MADV_FREE handling
kernel/extable.c: mark core_kernel_text notrace
mm/list_lru.c: fix list_lru_count_node() to be race free
fs/dcache.c: fix spin lockup issue on nlru->lock
checkpatch: silence perl 5.26.0 unescaped left brace warnings
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
arm: move ELF_ET_DYN_BASE to 4MB
arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
s390: reduce ELF_ET_DYN_BASE
exec: Limit arg stack to at most 75% of _STK_LIM
ARM64: dts: marvell: armada37xx: Fix timer interrupt specifiers
vt: fix unchecked __put_user() in tioclinux ioctls
rcu: Add memory barriers for NOCB leader wakeup
nvmem: core: fix leaks on registration errors
mnt: In umount propagation reparent in a separate pass
mnt: In propgate_umount handle visiting mounts in any order
mnt: Make propagate_umount less slow for overlapping mount propagation trees
selftests/capabilities: Fix the test_execve test
mm: fix overflow check in expand_upwards()
crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
crypto: atmel - only treat EBUSY as transient if backlog
crypto: sha1-ssse3 - Disable avx2
crypto: caam - properly set IV after {en,de}crypt
crypto: caam - fix signals handling
Revert "sched/core: Optimize SCHED_SMT"
sched/fair, cpumask: Export for_each_cpu_wrap()
sched/topology: Fix building of overlapping sched-groups
sched/topology: Optimize build_group_mask()
sched/topology: Fix overlapping sched_group_mask
PM / wakeirq: Convert to SRCU
PM / QoS: return -EINVAL for bogus strings
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
kvm: vmx: Do not disable intercepts for BNDCFGS
kvm: x86: Guest BNDCFGS requires guest MPX support
kvm: vmx: Check value written to IA32_BNDCFGS
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
4.9.39
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit da029c11e6b12f321f36dac8771e833b65cec962 upstream.
To avoid pathological stack usage or the need to special-case setuid
execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllU3ecACgkQONu9yGCS
aT4beg/7BvQP0EyKbL4RollmfE3BiKL5JNfn/Dxl0SeEuL6tcVg6szwPI9gxoTBx
xVuWIdldPJ0Eb82meDjXibUOq1+5rAPbD0FNJ8AU0pKiNwfTXdmI7kkz8821Za1f
qXuBRUCtNDC4Ab7jRwfUapiafR3cWGSwTCkv2cThrX3ERr8Ie2mP/LXuEIdc3CBy
SY6TxpzsNAwZD29PDAo7FAgbzMH3jJ3Lkxhw7/qcJA0ck23eLJaFDOuUISX+2OxS
65yXXC+nCt/HeSxRUbJKqKpvJs34jqcQKOx967GKhqib7ovuFTIQ3x2v6/6EAXJV
lQkJv9MHqYY3ItsAxqr3AD5cNHrDClZ0GQcnwC6DpvEj2VIwszU3Xxesb4a1U5BM
FlxcTlbRbboANy3Y6+Fevkg5G/sk2cDyqOJIorqKF05ey3KXvV2DNJ2whPAG/0q+
QfCBihDoR259tNz0AzGvQf/d/sLDrrZWLMe4ZG0Dow0woY4S8r2+phz3ywvuqpyr
2dhHN0VZqRax+hEcT7+ZPrBt4ThFlBM5/QSScUhszvIWjgRNvlJnlxDp4uUrNUWO
+Nzd67vbwdDAwmZdV73F7jnHIPkgvw+9Xv5rMmkGzoO9h9y0DTf6c3BlY79p6RH4
vumyxVhyrAybmAA1yQWsF4VCxS2VIRiGPovu/vQ8Km+6Bkm3abM=
=ivKZ
-----END PGP SIGNATURE-----
Merge 4.9.35 into android-4.9
Changes in 4.9.35
clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset
xen/blkback: fix disconnect while I/Os in flight
xen-blkback: don't leak stack data via response ring
ALSA: firewire-lib: Fix stall of process context at packet error
ALSA: pcm: Don't treat NULL chmap as a fatal error
fs/exec.c: account for argv/envp pointers
powerpc/perf: Fix oops when kthread execs user process
autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
lib/cmdline.c: fix get_options() overflow while parsing ranges
perf/x86/intel: Add 1G DTLB load/store miss support for SKL
KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows
KVM: PPC: Book3S HV: Preserve userspace HTM state properly
KVM: PPC: Book3S HV: Context-switch EBB registers properly
CIFS: Improve readdir verbosity
cxgb4: notify uP to route ctrlq compl to rdma rspq
HID: Add quirk for Dell PIXART OEM mouse
signal: Only reschedule timers on signals timers have sent
powerpc/kprobes: Pause function_graph tracing during jprobes handling
powerpc/64s: Handle data breakpoints in Radix mode
Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list
brcmfmac: add parameter to pass error code in firmware callback
brcmfmac: use firmware callback upon failure to load
brcmfmac: unbind all devices upon failure in firmware callback
time: Fix clock->read(clock) race around clocksource changes
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
target: Fix kref->refcount underflow in transport_cmd_finish_abort
iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
iscsi-target: Reject immediate data underflow larger than SCSI transfer length
drm/radeon: add a PX quirk for another K53TK variant
drm/radeon: add a quirk for Toshiba Satellite L20-183
drm/amdgpu/atom: fix ps allocation size for EnableDispPowerGating
drm/amdgpu: adjust default display clock
rxrpc: Fix several cases where a padded len isn't checked in ticket decode
of: Add check to of_scan_flat_dt() before accessing initial_boot_params
mtd: spi-nor: fix spansion quad enable
usb: gadget: f_fs: avoid out of bounds access on comp_desc
rt2x00: avoid introducing a USB dependency in the rt2x00lib module
net: phy: Initialize mdio clock at probe function
dmaengine: bcm2835: Fix cyclic DMA period splitting
spi: double time out tolerance
net: phy: fix marvell phy status reading
jump label: fix passing kbuild_cflags when checking for asm goto support
brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2()
Linux 4.9.35
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 98da7d08850fb8bdeb395d6368ed15753304aa0c upstream.
When limiting the argv/envp strings during exec to 1/4 of the stack limit,
the storage of the pointers to the strings was not included. This means
that an exec with huge numbers of tiny strings could eat 1/4 of the stack
limit in strings and then additional space would be later used by the
pointers to the strings.
For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
single-byte strings would consume less than 2MB of stack, the max (8MB /
4) amount allowed, but the pointers to the strings would consume the
remaining additional stack space (1677721 * 4 == 6710884).
The result (1677721 + 6710884 == 8388605) would exhaust stack space
entirely. Controlling this stack exhaustion could result in
pathological behavior in setuid binaries (CVE-2017-1000365).
[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea393 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Qualys Security Advisory <qsa@qualys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.
Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream.
When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked. This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.
Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in. This has already allowed one mistake through
insufficient granulariy.
I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.
This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid. More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.
Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 upstream.
If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
The race in question is after set_dumpable has been (for get_link,
though the trace is basically the same for readlink):
[vfs]
-> proc_pid_link_inode_operations.get_link
-> proc_pid_get_link
-> proc_fd_access_allowed
-> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
Which will return 0, during the race window and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.
This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).
Cc: dev@opencontainers.org
Reported-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f84df2a6f268de584a201e8911384a2d244876e3 upstream.
When the user namespace support was merged the need to prevent
ptrace from revealing the contents of an unreadable executable
was overlooked.
Correct this oversight by ensuring that the executed file
or files are in mm->user_ns, by adjusting mm->user_ns.
Use the new function privileged_wrt_inode_uidgid to see if
the executable is a member of the user namespace, and as such
if having CAP_SYS_PTRACE in the user namespace should allow
tracing the executable. If not update mm->user_ns to
the parent user namespace until an appropriate parent is found.
Reported-by: Jann Horn <jann@thejh.net>
Fixes: 9e4a36ece6 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull m68knommu updates from Greg Ungerer:
"This series is all about Nicolas flat format support for MMU systems.
Traditional m68k no-MMU flat format binaries can now be run on m68k
MMU enabled systems too. The series includes some nice cleanups of
the binfmt_flat code and converts it to using proper user space
accessor functions.
With all this in place you can boot and run a complete no-MMU flat
format based user space on an MMU enabled system"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: enable binfmt_flat on systems with an MMU
binfmt_flat: allow compressed flat binary format to work on MMU systems
binfmt_flat: add MMU-specific support
binfmt_flat: update libraries' data segment pointer with userspace accessors
binfmt_flat: use clear_user() rather than memset() to clear .bss
binfmt_flat: use proper user space accessors with old relocs code
binfmt_flat: use proper user space accessors with relocs processing code
binfmt_flat: clean up create_flat_tables() and stack accesses
binfmt_flat: use generic transfer_args_to_stack()
elf_fdpic_transfer_args_to_stack(): make it generic
binfmt_flat: prevent kernel dammage from corrupted executable headers
binfmt_flat: convert printk invocations to their modern form
binfmt_flat: assorted cleanups
m68k: use same start_thread() on MMU and no-MMU
m68k: fix file path comment
m68k: fix bFLT executable running on MMU enabled systems
Some systems are memory constrained but they need to load very large
firmwares. The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver. This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.
This creates needless memory pressure and delays loading because we have
to copy from kernel memory to somewhere else. Let's add a
request_firmware_into_buf() API that allows drivers to request firmware
be loaded directly into a pre-allocated buffer. This skips the
intermediate step of allocating a buffer in kernel memory to hold the
firmware image while it's read from the filesystem. It also requires
that drivers know how much memory they'll require before requesting the
firmware and negates any benefits of firmware caching because the
firmware layer doesn't manage the buffer lifetime.
For a 16MB buffer, about half the time is spent performing a memcpy from
the buffer to the final resting place. I see loading times go from
0.081171 seconds to 0.047696 seconds after applying this patch. Plus
the vmalloc pressure is reduced.
This is based on a patch from Vikram Mulukutla on codeaurora.org:
https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&id=0a328c5f6cd999f5c591f172216835636f39bcb5
Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.
While at it, improve the code a bit not to copy below the actual
data area.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem. Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.
This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.
This does not remove the need for MNT_LOCK_NOSUID. The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.
As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents. If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.
On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.
As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
setup_arg_pages requires mmap_sem for write. If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting. All the callers are already handling error
path and the fatal signal doesn't need any additional treatment.
The same applies to __bprm_mm_init.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remove_arg_zero() does free_arg_page() for no reason. This was needed
before and only if CONFIG_MMU=y: see commit 4fc75ff481 ("exec: fix
remove_arg_zero"), install_arg_page() was called for every page != NULL
in bprm->page[] array. Today install_arg_page() has already gone and
free_arg_page() is nop after another commit b6a2fea393 ("mm: variable
length argument support").
CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus it doesn't
need remove_arg_zero()->free_arg_page() too; apart from get_arg_page()
it never checks if the page in bprm->page[] was allocated or not, so the
"extra" non-freed page is fine. OTOH, this free_arg_page() can add the
minor pessimization, the caller is going to do copy_strings_kernel()
right after remove_arg_zero() which will likely need to re-allocate the
same page again.
And as Hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong
because we are going to increment bprm->p once again before return, so
CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in this
page.
NOTE: remove_arg_zero() assumes that argv[0] is null-terminated but this
is not necessarily true. copy_strings() does "len = strnlen_user(...)",
then copy_from_user(len) but another thread or debuger can overwrite the
trailing '0' in between. Afaics nothing really bad can happen because
we must always have the null-terminated bprm->filename copied by the 1st
copy_strings_kernel(), but perhaps we should change this code to check
"bprm->p < bprm->exec" anyway, and/or change copy_strings() to ensure
that the last byte in string is always zero.
Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported by: hujunjie <jj.net@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates from James Morris:
"Highlights:
- A new LSM, "LoadPin", from Kees Cook is added, which allows forcing
of modules and firmware to be loaded from a specific device (this
is from ChromeOS, where the device as a whole is verified
cryptographically via dm-verity).
This is disabled by default but can be configured to be enabled by
default (don't do this if you don't know what you're doing).
- Keys: allow authentication data to be stored in an asymmetric key.
Lots of general fixes and updates.
- SELinux: add restrictions for loading of kernel modules via
finit_module(). Distinguish non-init user namespace capability
checks. Apply execstack check on thread stacks"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (48 commits)
LSM: LoadPin: provide enablement CONFIG
Yama: use atomic allocations when reporting
seccomp: Fix comment typo
ima: add support for creating files using the mknodat syscall
ima: fix ima_inode_post_setattr
vfs: forbid write access when reading a file into memory
fs: fix over-zealous use of "const"
selinux: apply execstack check on thread stacks
selinux: distinguish non-init user namespace capability checks
LSM: LoadPin for kernel file loading restrictions
fs: define a string representation of the kernel_read_file_id enumeration
Yama: consolidate error reporting
string_helpers: add kstrdup_quotable_file
string_helpers: add kstrdup_quotable_cmdline
string_helpers: add kstrdup_quotable
selinux: check ss_initialized before revalidating an inode label
selinux: delay inode label lookup as long as possible
selinux: don't revalidate an inode's label when explicitly setting it
selinux: Change bool variable name to index.
KEYS: Add KEYCTL_DH_COMPUTE command
...
This section of code initially looks redundant, but is required. This
improves the comment to explain more clearly why the reset is needed.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is based on top of the "vfs: support for a common kernel file
loader" patch set. In general when the kernel is reading a file into
memory it does not want anything else writing to it.
The kernel currently only forbids write access to a file being executed.
This patch extends this locking to files being read by the kernel.
Changelog:
- moved function to kernel_read_file() - Mimi
- updated patch description - Mimi
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@huawei.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).
There's a background article at LWN.net:
https://lwn.net/Articles/643797/
The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte. So instead of having a
fixed protection mask in the pte (which needs a system call to change
and works on a per page basis), the user can map a (handful of)
protection mask variants and can change the masks runtime relatively
cheaply, without having to change every single page in the affected
virtual memory range.
This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).
This tree adds the MM infrastructure and low level x86 glue needed for
that, plus it adds a high level API to make use of protection keys -
if a user-space application calls:
mmap(..., PROT_EXEC);
or
mprotect(ptr, sz, PROT_EXEC);
(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.
So using protection keys the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.
We know about no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.
There is other pending pkeys work that offers more high level system
call APIs to manage protection keys - but those are not part of this
pull request.
Right now there's a Kconfig that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"
* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...
This patch defines kernel_read_file_from_fd(), a wrapper for the VFS
common kernel_read_file().
Changelog:
- Separated from the kernel modules patch
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The kernel_read_file security hook is called prior to reading the file
into memory.
Changelog v4+:
- export security_kernel_read_file()
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
This patch defines kernel_read_file_from_path(), a wrapper for the VFS
common kernel_read_file().
Changelog:
- revert error msg regression - reported by Sergey Senozhatsky
- Separated from the IMA patch
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
To differentiate between the kernel_read_file() callers, this patch
defines a new enumeration named kernel_read_file_id and includes the
caller identifier as an argument.
Subsequent patches define READING_KEXEC_IMAGE, READING_KEXEC_INITRAMFS,
READING_FIRMWARE, READING_MODULE, and READING_POLICY.
Changelog v3:
- Replace the IMA specific enumeration with a generic one.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
For a while it was looked down upon to directly read files from Linux.
These days there exists a few mechanisms in the kernel that do just
this though to load a file into a local buffer. There are minor but
important checks differences on each. This patch set is the first
attempt at resolving some of these differences.
This patch introduces a common function for reading files from the kernel
with the corresponding security post-read hook and function.
Changelog v4+:
- export security_kernel_post_read_file() - Fengguang Wu
v3:
- additional bounds checking - Luis
v2:
- To simplify patch review, re-ordered patches
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
For protection keys, we need to understand whether protections
should be enforced in software or not. In general, we enforce
protections when working on our own task, but not when on others.
We call these "current" and "remote" operations.
This patch introduces a new get_user_pages() variant:
get_user_pages_remote()
Which is a replacement for when get_user_pages() is called on
non-current tsk/mm.
We also introduce a new gup flag: FOLL_REMOTE which can be used
for the "__" gup variants to get this new behavior.
The uprobes is_trap_at_addr() location holds mmap_sem and
calls get_user_pages(current->mm) on an instruction address. This
makes it a pretty unique gup caller. Being an instruction access
and also really originating from the kernel (vs. the app), I opted
to consider this a 'remote' access where protection keys will not
be enforced.
Without protection keys, this patch should not change any behavior.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Today proc and sysfs do not contain any executable files. Several
applications today mount proc or sysfs without noexec and nosuid and
then depend on there being no exectuables files on proc or sysfs.
Having any executable files show on proc or sysfs would cause
a user space visible regression, and most likely security problems.
Therefore commit to never allowing executables on proc and sysfs by
adding a new flag to mark them as filesystems without executables and
enforce that flag.
Test the flag where MNT_NOEXEC is tested today, so that the only user
visible effect will be that exectuables will be treated as if the
execute bit is cleared.
The filesystems proc and sysfs do not currently incoporate any
executable files so this does not result in any user visible effects.
This makes it unnecessary to vet changes to proc and sysfs tightly for
adding exectuable files or changes to chattr that would modify
existing files, as no matter what the individual file say they will
not be treated as exectuable files by the vfs.
Not having to vet changes to closely is important as without this we
are only one proc_create call (or another goof up in the
implementation of notify_change) from having problematic executables
on proc. Those mistakes are all too easy to make and would create
a situation where there are security issues or the assumptions of
some program having to be broken (and cause userspace regressions).
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
currently parisc and metag only) stack randomization sometimes leads to crashes
when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
by default if not defined in arch-specific headers).
The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
additional space needed for the stack randomization (as defined by the value of
STACK_RND_MASK) was not taken into account yet and as such, when the stack
randomization code added a random offset to the stack start, the stack
effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
which then sometimes leads to out-of-stack situations and crashes.
This patch fixes it by adding the maximum possible amount of memory (based on
STACK_RND_MASK) which theoretically could be added by the stack randomization
code to the initial stack size. That way, the user-defined stack size is always
guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).
This bug is currently not visible on the metag architecture, because on metag
STACK_RND_MASK is defined to 0 which effectively disables stack randomization.
The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
section, so it does not affect other platformws beside those where the
stack grows upwards (parisc and metag).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: stable@vger.kernel.org # v3.16+
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.
This patch was mostly written by Linus Torvalds.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We set sig->notify_count = -1 between RELEASE and ACQUIRE operations:
spin_unlock_irq(lock);
...
if (!thread_group_leader(tsk)) {
...
for (;;) {
sig->notify_count = -1;
write_lock_irq(&tasklist_lock);
There are no restriction on it so other processors may see this STORE
mixed with other STOREs in both areas limited by the spinlocks.
Probably, it may be reordered with the above
sig->group_exit_task = tsk;
sig->notify_count = zap_other_threads(tsk);
in some way.
Set it under tasklist_lock locked to be sure nothing will be reordered.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case. For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe. As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock. Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several areas in the kernel that create temporary filename
objects using the following pattern:
int func(const char *name)
{
struct filename *file = { .name = name };
...
return 0;
}
... which for the most part works okay, but it causes havoc within the
audit subsystem as the filename object does not persist beyond the
lifetime of the function. This patch converts all of these temporary
filename objects into proper filename objects using getname_kernel()
and putname() which ensure that the filename object persists until the
audit subsystem is finished with it.
Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
for helping resolve a difficult kernel panic on boot related to a
use-after-free problem in kern_path_create(); the thread can be seen
at the link below:
* https://lkml.org/lkml/2015/1/20/710
This patch includes code that was either based on, or directly written
by Al in the above thread.
CC: viro@zeniv.linux.org.uk
CC: linux@roeck-us.net
CC: sd@queasysnail.net
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts). The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.
Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.
Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns. The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).
Related history:
- https://lkml.org/lkml/2006/12/27/123 is an example of someone
realizing that fexecve() is likely to fail in a chroot environment.
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
documenting the /proc requirement of fexecve(3) in its manpage, to
"prevent other people from wasting their time".
- https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
problem where a process that did setuid() could not fexecve()
because it no longer had access to /proc/self/fd; this has since
been fixed.
This patch (of 4):
Add a new execveat(2) system call. execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.
In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers. This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
so relies on /proc being mounted).
The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found. This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).
Based on patches by Meredydd Luff.
Signed-off-by: David Drysdale <drysdale@google.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We no longer need mpx.h in exec.c. This will obviously also
break the build for non-x86 builds. We get the MPX includes that
we need from mmu_context.h now.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141118003608.837015B3@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>