struct call_single_data is used in IPIs to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and less than
cache line size. Currently it is not allocated with any explicit alignment
requirements. This makes it possible for allocated call_single_data to
cross two cache lines, which results in double the number of the cache lines
that need to be transferred among CPUs.
This can be fixed by requiring call_single_data to be aligned with the
size of call_single_data. Currently the size of call_single_data is the
power of 2. If we add new fields to call_single_data, we may need to
add padding to make sure the size of new definition is the power of 2
as well.
Fortunately, this is enforced by GCC, which will report bad sizes.
To set alignment requirements of call_single_data to the size of
call_single_data, a struct definition and a typedef is used.
To test the effect of the patch, I used the vm-scalability multiple
thread swap test case (swap-w-seq-mt). The test will create multiple
threads and each thread will eat memory until all RAM and part of swap
is used, so that huge number of IPIs are triggered when unmapping
memory. In the test, the throughput of memory writing improves ~5%
compared with misaligned call_single_data, because of faster IPIs.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang, Ying <ying.huang@intel.com>
[ Add call_single_data_t and align with size of call_single_data. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4pSQ8ACgkQONu9yGCS
aT7byBAA1Sx2dkXj107MHQ/XQUQjn9LeDoUtdB105780XXRJqS1L6Bsm5pvsiOvQ
GvkKpCHCWg12iuEKJCQ1pr88XKQrik68vRUOCgMt4rh0TovM8eUz9fuvFGO+u330
tCyW9zftkKNMYJmPSn2w6hOZcDK4wxgVjP6hkFgQJjyjFy/dbkcwFb6Vg9cfMRKc
kkuacR9Hm7hG4V2RWD1pNsI4Nlly/oPXEmLJMplVGY+YyOAB5ne14JCevVX22bV2
WD9EUihPCsyB41LF2FhX5jzWhyFKgt/9tyrl7VFjsEqmvvdS7S9YMD3RJ2alzQbo
qT4Wph+xVT3JIdXuFZuAaHUfFKwnWR/6cHcMiejsv/A6B72aRaECSMSN8aCJSYit
eV3L/LNoLaKcpdpJLKVAWSny1ZaLnYTxk0E3OilQz+ZzqRk/LDjnxQry5sem6oXt
3HJlo2cuvd2bQ0Jd5RDnGW6N8CLx4HIMwnnxEjJmOqpUog6zSnhSbsvzpkQ2IZVs
3pFj1eYMausbEfdLXrFuky0cLvswjcYKT6W+CcapGba6IaHDhSg5V2WkPOktwxMW
BYnnzJptWXbbCt6de1ZwOyVpKdmmf/9hDG4egPVaCAs7/AOzE7P5+zIheiN9KqRw
3Fz+KNFB85oztp5Ds4gnd9xYa11uEzTSm+vVCKGVfymPEKvCQfU=
=84H5
-----END PGP SIGNATURE-----
Merge 4.9.211 into android-4.9-q
Changes in 4.9.211
hidraw: Return EPOLLOUT from hidraw_poll
HID: hidraw: Fix returning EPOLLOUT from hidraw_poll
HID: hidraw, uhid: Always report EPOLLOUT
ethtool: reduce stack usage with clang
fs/select: avoid clang stack usage warning
rsi: add fix for crash during assertions
arm64: mm: BUG on unsupported manipulations of live kernel mappings
arm64: don't open code page table entry creation
arm64: mm: Change page table pointer name in p[md]_set_huge()
arm64: Enforce BBM for huge IO/VMAP mappings
arm64: Make sure permission updates happen for pmd/pud
cfg80211/mac80211: make ieee80211_send_layer2_update a public function
mac80211: Do not send Layer 2 Update frame before authorization
media: usb:zr364xx:Fix KASAN:null-ptr-deref Read in zr364xx_vidioc_querycap
wimax: i2400: fix memory leak
wimax: i2400: Fix memory leak in i2400m_op_rfkill_sw_toggle
ext4: fix use-after-free race with debug_want_extra_isize
ext4: add more paranoia checking in ext4_expand_extra_isize handling
dccp: Fix memleak in __feat_register_sp
rtc: mt6397: fix alarm register overwrite
iommu: Remove device link to group on failure
gpio: Fix error message on out-of-range GPIO in lookup table
hsr: reset network header when supervision frame is created
cifs: Adjust indentation in smb2_open_file
RDMA/srpt: Report the SCSI residual to the initiator
scsi: enclosure: Fix stale device oops with hot replug
scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI
platform/x86: asus-wmi: Fix keyboard brightness cannot be set to 0
iio: imu: adis16480: assign bias value only if operation succeeded
mei: fix modalias documentation
clk: samsung: exynos5420: Preserve CPU clocks configuration during suspend/resume
compat_ioctl: handle SIOCOUTQNSD
PCI/PTM: Remove spurious "d" from granularity message
powerpc/powernv: Disable native PCIe port management
tty: serial: imx: use the sg count from dma_map_sg
tty: serial: pch_uart: correct usage of dma_unmap_sg
media: exynos4-is: Fix recursive locking in isp_video_release()
mtd: spi-nor: fix silent truncation in spi_nor_read()
spi: atmel: fix handling of cs_change set on non-last xfer
rtlwifi: Remove unnecessary NULL check in rtl_regd_init
f2fs: fix potential overflow
rtc: msm6242: Fix reading of 10-hour digit
gpio: mpc8xxx: Add platform device to gpiochip->parent
scsi: libcxgbi: fix NULL pointer dereference in cxgbi_device_destroy()
rseq/selftests: Turn off timeout setting
MIPS: Prevent link failure with kcov instrumentation
ioat: ioat_alloc_ring() failure handling.
hexagon: parenthesize registers in asm predicates
hexagon: work around compiler crash
ocfs2: call journal flush to mark journal as empty after journal recovery when mount
dt-bindings: reset: meson8b: fix duplicate reset IDs
clk: Don't try to enable critical clocks if prepare failed
ALSA: seq: Fix racy access for queue timer in proc read
Fix built-in early-load Intel microcode alignment
block: fix an integer overflow in logical block size
iio: buffer: align the size of scan bytes to size of the largest element
USB: serial: simple: Add Motorola Solutions TETRA MTP3xxx and MTP85xx
USB: serial: opticon: fix control-message timeouts
USB: serial: suppress driver bind attributes
USB: serial: ch341: handle unbound port at reset_resume
USB: serial: io_edgeport: add missing active-port sanity check
USB: serial: quatech2: handle unbound ports
scsi: mptfusion: Fix double fetch bug in ioctl
usb: core: hub: Improved device recognition on remote wakeup
x86/efistub: Disable paging at mixed mode entry
perf hists: Fix variable name's inconsistency in hists__for_each() macro
perf report: Fix incorrectly added dimensions as switch perf data file
mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
net: stmmac: 16KB buffer must be 16 byte aligned
net: stmmac: Enable 16KB buffer size
USB: serial: io_edgeport: use irqsave() in USB's complete callback
USB: serial: io_edgeport: handle unbound ports on URB completion
USB: serial: keyspan: handle unbound ports
scsi: fnic: use kernel's '%pM' format option to print MAC
scsi: fnic: fix invalid stack access
arm64: dts: agilex/stratix10: fix pmu interrupt numbers
cfg80211: fix page refcount issue in A-MSDU decap
netfilter: fix a use-after-free in mtype_destroy()
netfilter: arp_tables: init netns pointer in xt_tgdtor_param struct
batman-adv: Fix DAT candidate selection on little endian systems
macvlan: use skb_reset_mac_header() in macvlan_queue_xmit()
net: dsa: tag_qca: fix doubled Tx statistics
net/wan/fsl_ucc_hdlc: fix out of bounds write on array utdm_info
r8152: add missing endpoint sanity check
tcp: fix marked lost packets not being retransmitted
net: usb: lan78xx: limit size of local TSO packets
xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk
cw1200: Fix a signedness bug in cw1200_load_firmware()
cfg80211: check for set_wiphy_params
reiserfs: fix handling of -EOPNOTSUPP in reiserfs_for_each_xattr
scsi: esas2r: unlock on error in esas2r_nvram_read_direct()
scsi: qla4xxx: fix double free bug
scsi: bnx2i: fix potential use after free
scsi: target: core: Fix a pr_debug() argument
scsi: core: scsi_trace: Use get_unaligned_be*()
perf probe: Fix wrong address verification
regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_id
Linux 4.9.211
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifc27b3c6afdbd6a39bd7ae4e551d8bed42dc4973
commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.
Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.
For exmaple (run this on an architecture with 64k pages):
Mount will fail with this error because it tries to read the superblock using 2-sector
access:
device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
EXT4-fs (dm-0): unable to read superblock
This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3blr8ACgkQONu9yGCS
aT5BrQ//bhVt4Fy5NPJ/QlzpslU7XrwTYtGH0foNbUVvUeeJjiTYZj4XrRDE1VhC
dQJNYRs8wqxG1EhNV4ydpCyLZgG+iLFW0pJc+k47WL+KJAChN0gQT0lGRkZ90GUJ
1+FAM9/QJ5T5Z5T3sMfigp+nUDDoWgWrXl4RBxxbROmYu+xgOOa5agWxqzJDh3CU
bjMF9heQEbpK26NyCqpaumL/nBaj86c/Qspq9PHeHwQDF3Sv+XEDgx8S94t8Zc1K
DJojA3Nyl87FQOpdR/1UFCjge7GcdMP2xozRYGzMXqUwNCHjm6crKoM2EaGqHf2a
HrFDUKL+tC6pE0YrPnrsaQkEFauZEB4Zm9zegwuIqxEovM3o+VkjiTi4XI4VdrZT
t5VVE2tirlvNEAIN1BSbMqZnlE+c+8PpObABLhQtpmNAJP+PNXVeze600uissLbp
grAU7hVvC89uzhPTbfhUyAginAAlqftDaO/YvdPlHXYCv4bLGcF1PYyEH2WfzAPp
fpFJwyVE5tUev5WT4Xm1tAzK5EsW0aUZ69y0+RqwC1mKXTvcHQy9A0u+5ccmQTl9
KSoR2Z/y2qvzfUAB0umAsXQohLUhZYtE4cCyj1us0WDy/0DI3HiJK5rDmJElM3DQ
4PcGN4QtH2PZUogtPaHdwOnL9WS7kwFv+RKDdV3Mjw8bTZPiZg0=
=TWjb
-----END PGP SIGNATURE-----
Merge 4.9.203 into android-4.9-q
Changes in 4.9.203
ax88172a: fix information leak on short answers
slip: Fix memory leak in slip_open error path
ALSA: usb-audio: Fix missing error check at mixer resolution test
ALSA: usb-audio: not submit urb for stopped endpoint
Input: ff-memless - kill timer in destroy()
Input: synaptics-rmi4 - fix video buffer size
Input: synaptics-rmi4 - clear IRQ enables for F54
Input: synaptics-rmi4 - destroy F54 poller workqueue when removing
IB/hfi1: Ensure full Gen3 speed in a Gen4 system
ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable
ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either
iommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros
mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
mmc: sdhci-of-at91: fix quirk2 overwrite
ath10k: fix kernel panic by moving pci flush after napi_disable
iio: dac: mcp4922: fix error handling in mcp4922_write_raw
ALSA: pcm: signedness bug in snd_pcm_plug_alloc()
arm64: dts: tegra210-p2180: Correct sdmmc4 vqmmc-supply
ARM: dts: at91/trivial: Fix USART1 definition for at91sam9g45
cfg80211: Avoid regulatory restore when COUNTRY_IE_IGNORE is set
ALSA: seq: Do error checks at creating system ports
ath9k: fix tx99 with monitor mode interface
gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updated
ASoC: dpcm: Properly initialise hw->rate_max
MIPS: BCM47XX: Enable USB power on Netgear WNDR3400v3
ARM: dts: exynos: Fix sound in Snow-rev5 Chromebook
ARM: dts: exynos: Fix regulators configuration on Peach Pi/Pit Chromebooks
i40e: use correct length for strncpy
i40e: hold the rtnl lock on clearing interrupt scheme
i40e: Prevent deleting MAC address from VF when set by PF
IB/rxe: fixes for rdma read retry
iwlwifi: mvm: avoid sending too many BARs
ARM: dts: pxa: fix power i2c base address
rtl8187: Fix warning generated when strncpy() destination length matches the sixe argument
net: lan78xx: Bail out if lan78xx_get_endpoints fails
ASoC: sgtl5000: avoid division by zero if lo_vag is zero
ARM: dts: exynos: Disable pull control for S5M8767 PMIC
ath10k: wmi: disable softirq's while calling ieee80211_rx
mips: txx9: fix iounmap related issue
ASoC: Intel: hdac_hdmi: Limit sampling rates at dai creation
of: make PowerMac cache node search conditional on CONFIG_PPC_PMAC
ARM: dts: omap3-gta04: give spi_lcd node a label so that we can overwrite in other DTS files
ARM: dts: omap3-gta04: fixes for tvout / venc
ARM: dts: omap3-gta04: tvout: enable as display1 alias
ARM: dts: omap3-gta04: fix touchscreen tsc2007
ARM: dts: omap3-gta04: make NAND partitions compatible with recent U-Boot
ARM: dts: omap3-gta04: keep vpll2 always on
dmaengine: dma-jz4780: Don't depend on MACH_JZ4780
dmaengine: dma-jz4780: Further residue status fix
ath9k: add back support for using active monitor interfaces for tx99
signal: Always ignore SIGKILL and SIGSTOP sent to the global init
signal: Properly deliver SIGILL from uprobes
signal: Properly deliver SIGSEGV from x86 uprobes
f2fs: fix memory leak of percpu counter in fill_super()
scsi: sym53c8xx: fix NULL pointer dereference panic in sym_int_sir()
ARM: imx6: register pm_power_off handler if "fsl,pmic-stby-poweroff" is set
scsi: pm80xx: Corrected dma_unmap_sg() parameter
scsi: pm80xx: Fixed system hang issue during kexec boot
kprobes: Don't call BUG_ON() if there is a kprobe in use on free list
nvmem: core: return error code instead of NULL from nvmem_device_get
media: fix: media: pci: meye: validate offset to avoid arbitrary access
media: dvb: fix compat ioctl translation
ALSA: intel8x0m: Register irq handler after register initializations
pinctrl: at91-pio4: fix has_config check in atmel_pctl_dt_subnode_to_map()
llc: avoid blocking in llc_sap_close()
ARM: dts: qcom: ipq4019: fix cpu0's qcom,saw2 reg value
powerpc/vdso: Correct call frame information
ARM: dts: socfpga: Fix I2C bus unit-address error
pinctrl: at91: don't use the same irqchip with multiple gpiochips
cxgb4: Fix endianness issue in t4_fwcache()
power: supply: ab8500_fg: silence uninitialized variable warnings
power: reset: at91-poweroff: do not procede if at91_shdwc is allocated
power: supply: max8998-charger: Fix platform data retrieval
component: fix loop condition to call unbind() if bind() fails
kernfs: Fix range checks in kernfs_get_target_path
ip_gre: fix parsing gre header in ipgre_err
ARM: dts: rockchip: Fix erroneous SPI bus dtc warnings on rk3036
ath9k: Fix a locking bug in ath9k_add_interface()
s390/qeth: invoke softirqs after napi_schedule()
PCI/ACPI: Correct error message for ASPM disabling
serial: mxs-auart: Fix potential infinite loop
powerpc/iommu: Avoid derefence before pointer check
powerpc/64s/hash: Fix stab_rr off by one initialization
powerpc/pseries: Disable CPU hotplug across migrations
RDMA/i40iw: Fix incorrect iterator type
libfdt: Ensure INT_MAX is defined in libfdt_env.h
power: supply: twl4030_charger: fix charging current out-of-bounds
power: supply: twl4030_charger: disable eoc interrupt on linear charge
net: toshiba: fix return type of ndo_start_xmit function
net: xilinx: fix return type of ndo_start_xmit function
net: broadcom: fix return type of ndo_start_xmit function
net: amd: fix return type of ndo_start_xmit function
usb: chipidea: imx: enable OTG overcurrent in case USB subsystem is already started
usb: chipidea: Fix otg event handler
mlxsw: spectrum: Init shaper for TCs 8..15
ARM: dts: am335x-evm: fix number of cpsw
f2fs: fix to recover inode's uid/gid during POR
ARM: dts: ux500: Correct SCU unit address
ARM: dts: ux500: Fix LCDA clock line muxing
ARM: dts: ste: Fix SPI controller node names
spi: pic32: Use proper enum in dmaengine_prep_slave_rg
cpufeature: avoid warning when compiling with clang
ARM: dts: marvell: Fix SPI and I2C bus warnings
bnx2x: Ignore bandwidth attention in single function mode
net: micrel: fix return type of ndo_start_xmit function
x86/CPU: Use correct macros for Cyrix calls
MIPS: kexec: Relax memory restriction
media: pci: ivtv: Fix a sleep-in-atomic-context bug in ivtv_yuv_init()
media: au0828: Fix incorrect error messages
media: davinci: Fix implicit enum conversion warning
usb: gadget: uvc: configfs: Drop leaked references to config items
usb: gadget: uvc: configfs: Prevent format changes after linking header
phy: phy-twl4030-usb: fix denied runtime access
usb: gadget: uvc: Factor out video USB request queueing
usb: gadget: uvc: Only halt video streaming endpoint in bulk mode
coresight: Fix handling of sinks
coresight: etm4x: Configure EL2 exception level when kernel is running in HYP
coresight: tmc: Fix byte-address alignment for RRP
misc: kgdbts: Fix restrict error
misc: genwqe: should return proper error value.
vfio/pci: Fix potential memory leak in vfio_msi_cap_len
vfio/pci: Mask buggy SR-IOV VF INTx support
scsi: libsas: always unregister the old device if going to discover new
ARM: dts: tegra30: fix xcvr-setup-use-fuses
ARM: tegra: apalis_t30: fix mmc1 cmd pull-up
ARM: dts: paz00: fix wakeup gpio keycode
net: smsc: fix return type of ndo_start_xmit function
EDAC: Raise the maximum number of memory controllers
ARM: dts: realview: Fix SPI controller node names
Bluetooth: L2CAP: Detect if remote is not able to use the whole MPS
crypto: s5p-sss: Fix Fix argument list alignment
crypto: fix a memory leak in rsa-kcs1pad's encryption mode
scsi: NCR5380: Clear all unissued commands on host reset
scsi: NCR5380: Use DRIVER_SENSE to indicate valid sense data
scsi: NCR5380: Check for invalid reselection target
scsi: NCR5380: Don't clear busy flag when abort fails
scsi: NCR5380: Don't call dsprintk() following reselection interrupt
scsi: NCR5380: Handle BUS FREE during reselection
arm64: dts: amd: Fix SPI bus warnings
arm64: dts: lg: Fix SPI controller node names
ARM: dts: lpc32xx: Fix SPI controller node names
usb: xhci-mtk: fix ISOC error when interval is zero
fuse: use READ_ONCE on congestion_threshold and max_background
IB/iser: Fix possible NULL deref at iser_inv_desc()
memfd: Use radix_tree_deref_slot_protected to avoid the warning.
slcan: Fix memory leak in error path
net: cdc_ncm: Signedness bug in cdc_ncm_set_dgram_size()
x86/atomic: Fix smp_mb__{before,after}_atomic()
kprobes/x86: Prohibit probing on exception masking instructions
uprobes/x86: Prohibit probing on MOV SS instruction
fbdev: Ditch fb_edid_add_monspecs
block: introduce blk_rq_is_passthrough
libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
net: ovs: fix return type of ndo_start_xmit function
net: xen-netback: fix return type of ndo_start_xmit function
ARM: dts: omap5: enable OTG role for DWC3 controller
f2fs: return correct errno in f2fs_gc
SUNRPC: Fix priority queue fairness
kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
arm64/numa: Report correct memblock range for the dummy node
ath10k: fix vdev-start timeout on error
ata: ahci_brcm: Allow using driver or DSL SoCs
ath9k: fix reporting calculated new FFT upper max
usb: gadget: udc: fotg210-udc: Fix a sleep-in-atomic-context bug in fotg210_get_status()
nl80211: Fix a GET_KEY reply attribute
dmaengine: ep93xx: Return proper enum in ep93xx_dma_chan_direction
dmaengine: timb_dma: Use proper enum in td_prep_slave_sg
mei: samples: fix a signedness bug in amt_host_if_call()
cxgb4: Use proper enum in cxgb4_dcb_handle_fw_update
cxgb4: Use proper enum in IEEE_FAUX_SYNC
powerpc/pseries: Fix DTL buffer registration
powerpc/pseries: Fix how we iterate over the DTL entries
mtd: rawnand: sh_flctl: Use proper enum for flctl_dma_fifo0_transfer
ixgbe: Fix crash with VFs and flow director on interface flap
IB/mthca: Fix error return code in __mthca_init_one()
IB/mlx4: Avoid implicit enumerated type conversion
ACPICA: Never run _REG on system_memory and system_IO
ata: ep93xx: Use proper enums for directions
media: pxa_camera: Fix check for pdev->dev.of_node
ALSA: hda/sigmatel - Disable automute for Elo VuPoint
KVM: PPC: Book3S PR: Exiting split hack mode needs to fixup both PC and LR
USB: serial: cypress_m8: fix interrupt-out transfer length
mtd: physmap_of: Release resources on error
cpu/SMT: State SMT is disabled even with nosmt and without "=force"
brcmfmac: reduce timeout for action frame scan
brcmfmac: fix full timeout waiting for action frame on-channel tx
clk: samsung: Use clk_hw API for calling clk framework from clk notifiers
i2c: brcmstb: Allow enabling the driver on DSL SoCs
NFSv4.x: fix lock recovery during delegation recall
dmaengine: ioat: fix prototype of ioat_enumerate_channels
Input: st1232 - set INPUT_PROP_DIRECT property
Input: silead - try firmware reload after unsuccessful resume
x86/olpc: Fix build error with CONFIG_MFD_CS5535=m
crypto: mxs-dcp - Fix SHA null hashes and output length
crypto: mxs-dcp - Fix AES issues
ACPI / SBS: Fix rare oops when removing modules
iwlwifi: mvm: don't send keys when entering D3
fbdev: sbuslib: use checked version of put_user()
fbdev: sbuslib: integer overflow in sbusfb_ioctl_helper()
reset: Fix potential use-after-free in __of_reset_control_get()
bcache: recal cached_dev_sectors on detach
s390/kasan: avoid vdso instrumentation
proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted()
backlight: lm3639: Unconditionally call led_classdev_unregister
mfd: ti_am335x_tscadc: Keep ADC interface on if child is wakeup capable
printk: Give error on attempt to set log buffer length to over 2G
media: isif: fix a NULL pointer dereference bug
GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
media: cx231xx: fix potential sign-extension overflow on large shift
x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
gpio: syscon: Fix possible NULL ptr usage
spi: spidev: Fix OF tree warning logic
ARM: 8802/1: Call syscall_trace_exit even when system call skipped
orangefs: rate limit the client not running info message
hwmon: (pwm-fan) Silence error on probe deferral
hwmon: (ina3221) Fix INA3221_CONFIG_MODE macros
misc: cxl: Fix possible null pointer dereference
mac80211: minstrel: fix CCK rate group streams value
spi: rockchip: initialize dma_slave_config properly
ARM: dts: omap5: Fix dual-role mode on Super-Speed port
arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault
Linux 4.9.203
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 57292b58ddb58689e8c3b4c6eadbef10d9ca44dd upstream.
This can be used to check for fs vs non-fs requests and basically
removes all knowledge of BLOCK_PC specific from the block layer,
as well as preparing for removing the cmd_type field in struct request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
[only take the blkdev.h changes as we only want the function for backported
patches - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAls7QIYACgkQONu9yGCS
aT5uAw/9Fn18nJcN+UdW2eiUWAXEUFSGVSNpN9q9XkRBoVIeTtIRFf+dusvVMhVX
4uVMeVzjzoXSGuWQRUWGE+gfs12U55LK6ZcgNQqCrc0kdlacas7QhYHcAJyUAG95
qNuOYtYA/bcCnEN/0xZKs9e7tCKwblSV1n9J3zTd71f0l87ZfFxi/7kCNabuQCLr
d6onuOS9pDJpb6j0X5ukjHc5LMW4UmcSBdKfwi1p+6pkx0tZ1V6Zx9rDO5zGU+4A
VChyT+Cvpv+oy70LQsPjjYyAksIclDFJdDORrQqx3PWCxFTbirgrqKRehyPOi5T+
afIx1dnxOI3kdpVFZwoa9vjDegX1A6jFNMdS2Gvi14I9OWFKCPNtaPg0XrOmZ2Ck
apxXJx1hD8zUfJW1vxn6LKQ7osP6RfkGsFm0Gh0U/5tPWU+Eqsv/kWoIzQM0q8LM
ksVxu/8IVSvXA1cOc0w2lLjiD9zpG+Dma8avKhSSflK1DfCEDD/lH7OeFsI/bDWY
3Rl6h/e+2YQJI6L4BMdZ1RyaTPeaskkmtXIC4z/f9XTYHEAj/gcpbIhfqcJMgm/F
NDGVdTlcTvCk9EBDAx5sgSBA8KetDtFTUENWswM5fzbMdX/hawcQXRmzkNNaBIF5
R6CoPB05zNbsmgUhuiz5wOxFle6VfAGyhi+kiM4mVx+ywVvGxCg=
=d1H3
-----END PGP SIGNATURE-----
Merge 4.9.111 into android-4.9
Changes in 4.9.111
x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec()
x86/mce: Improve error message when kernel cannot recover
x86/mce: Check for alternate indication of machine check recovery on Skylake
x86/mce: Fix incorrect "Machine check from unknown source" message
x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()
x86: Call fixup_exception() before notify_die() in math_error()
m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap()
serial: sh-sci: Use spin_{try}lock_irqsave instead of open coding version
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
usb: do not reset if a low-speed or full-speed device timed out
1wire: family module autoload fails because of upper/lower case mismatch.
ASoC: dapm: delete dapm_kcontrol_data paths list before freeing it
ASoC: cirrus: i2s: Fix LRCLK configuration
ASoC: cirrus: i2s: Fix {TX|RX}LinCtrlData setup
clk: renesas: cpg-mssr: Stop using printk format %pCr
lib/vsprintf: Remove atomic-unsafe support for %pCr
mips: ftrace: fix static function graph tracing
branch-check: fix long->int truncation when profiling branches
ipmi:bt: Set the timeout before doing a capabilities check
Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader
fuse: atomic_o_trunc should truncate pagecache
fuse: don't keep dead fuse_conn at fuse_fill_super().
fuse: fix control dir setup and teardown
powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch
powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG
powerpc/ptrace: Fix enforcement of DAWR constraints
powerpc/powernv/ioda2: Remove redundant free of TCE pages
cpuidle: powernv: Fix promotion from snooze if next state disabled
powerpc/fadump: Unregister fadump on kexec down path.
ARM: 8764/1: kgdb: fix NUMREGBYTES so that gdb_regs[] is the correct size
arm64: kpti: Use early_param for kpti= command-line option
arm64: mm: Ensure writes to swapper are ordered wrt subsequent cache maintenance
of: unittest: for strings, account for trailing \0 in property length field
IB/qib: Fix DMA api warning with debug kernel
IB/{hfi1, qib}: Add handling of kernel restart
IB/mlx5: Fetch soft WQE's on fatal error state
IB/isert: Fix for lib/dma_debug check_sync warning
IB/isert: fix T10-pi check mask setting
RDMA/mlx4: Discard unknown SQP work requests
mtd: cfi_cmdset_0002: Change write buffer to check correct value
mtd: cfi_cmdset_0002: Use right chip in do_ppb_xxlock()
mtd: cfi_cmdset_0002: fix SEGV unlocking multiple chips
mtd: cfi_cmdset_0002: Fix unlocking requests crossing a chip boudary
mtd: cfi_cmdset_0002: Avoid walking all chips when unlocking.
MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum
PCI: Add ACS quirk for Intel 7th & 8th Gen mobile
PCI: Add ACS quirk for Intel 300 series
PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
printk: fix possible reuse of va_list variable
MIPS: io: Add barrier after register read in inX()
time: Make sure jiffies_to_msecs() preserves non-zero time periods
X.509: unpack RSA signatureValue field from BIT STRING
Btrfs: fix return value on rename exchange failure
Btrfs: fix unexpected cow in run_delalloc_nocow
iio:buffer: make length types match kfifo types
scsi: qla2xxx: Fix setting lower transfer speed if GPSC fails
scsi: zfcp: fix missing SCSI trace for result of eh_host_reset_handler
scsi: zfcp: fix missing SCSI trace for retry of abort / scsi_eh TMF
scsi: zfcp: fix misleading REC trigger trace where erp_action setup failed
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io early return
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io for ERP_FAILED
scsi: zfcp: fix missing REC trigger trace for all objects in ERP_FAILED
scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread
linvdimm, pmem: Preserve read-only setting for pmem devices
clk: at91: PLL recalc_rate() now using cached MUL and DIV values
md: fix two problems with setting the "re-add" device state.
rpmsg: smd: do not use mananged resources for endpoints and channels
ubi: fastmap: Cancel work upon detach
ubi: fastmap: Correctly handle interrupted erasures in EBA
UBIFS: Fix potential integer overflow in allocation
backlight: as3711_bl: Fix Device Tree node lookup
backlight: max8925_bl: Fix Device Tree node lookup
backlight: tps65217_bl: Fix Device Tree node lookup
mfd: intel-lpss: Program REMAP register in PIO mode
perf tools: Fix symbol and object code resolution for vdso32 and vdsox32
perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING
perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP
perf intel-pt: Fix MTC timing after overflow
perf intel-pt: Fix "Unexpected indirect branch" error
perf intel-pt: Fix packet decoding of CYC packets
media: v4l2-compat-ioctl32: prevent go past max size
media: cx231xx: Add support for AverMedia DVD EZMaker 7
media: dvb_frontend: fix locking issues at dvb_frontend_get_event()
nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdir
NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_message
NFSv4: Revert commit 5f83d86cf5 ("NFSv4.x: Fix wraparound issues..")
video: uvesafb: Fix integer overflow in allocation
Input: elan_i2c - add ELAN0618 (Lenovo v330 15IKB) ACPI ID
pwm: lpss: platform: Save/restore the ctrl register over a suspend/resume
rbd: flush rbd_dev->watch_dwork after watch is unregistered
mm: fix devmem_is_allowed() for sub-page System RAM intersections
xen: Remove unnecessary BUG_ON from __unbind_from_irq()
udf: Detect incorrect directory size
Input: elan_i2c_smbus - fix more potential stack buffer overflows
Input: elantech - enable middle button of touchpads on ThinkPad P52
Input: elantech - fix V4 report decoding for module with middle key
ALSA: hda/realtek - Fix pop noise on Lenovo P50 & co
ALSA: hda/realtek - Add a quirk for FSC ESPRIMO U9210
block: Fix transfer when chunk sectors exceeds max
dm thin: handle running out of data space vs concurrent discard
cdc_ncm: avoid padding beyond end of skb
Linux 4.9.111
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 15bfd21fbc5d35834b9ea383dc458a1f0c9e3434 upstream.
A device may have boundary restrictions where the number of sectors
between boundaries exceeds its max transfer size. In this case, we need
to cap the max size to the smaller of the two limits.
Reported-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Tested-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current io_latency_state structure includes entries for read
and write requests. There are special types of write commands
such as sync and discard (trim) commands, and the current
implementation is not general enough if we want to separate
latency histogram for such special commands.
This change makes io_latency_state structure request-type neutral.
It also changes to print the latency average.
Signed-off-by: Hyojun Kim <hyojun@google.com>
This patch adds a new sysfs node (latency_hist) and reports
IO (svc time) latency histograms. Disabled by default, can be
enabled by echoing 0 into latency_hist, stats can be cleared
by writing 2 into latency_hist. This commit fixes the 32 bit
build breakage in the previous commit. Tested on both 32 bit
and 64 bit arm devices. (Merged from android4.4-common).
Bug: 30677035
Change-Id: I9403ea093f4cd54ebae08f12a11d01ef153118d0
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
commit 9a05e7541c39680d28ecf91892338e074738d5fd upstream.
With compilers which follow the C99 standard (like modern versions of
gcc and clang), "extern inline" does the opposite thing from older
versions of gcc (emits code for an externally linkable version of the
inline function).
"static inline" does the intended behavior in all cases instead.
Description taken from commit 6d91857d48 ("staging, rtl8192e,
LLVMLinux: Change extern inline to static inline").
This also fixes the following GCC warning when building with CONFIG_PM
disabled:
./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes]
Fixes: d07ab6d114 ("block: Add blk_set_runtime_active()")
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
blk_mq_delay_kick_requeue_list() provides the ability to kick the
q->requeue_list after a specified time. To do this the request_queue's
'requeue_work' member was changed to a delayed_work.
blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
requests while it doesn't make sense to immediately requeue them
(e.g. when all paths in a DM multipath have failed).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit 288dab8a35 ("block: add a separate operation type for secure
erase") split REQ_OP_SECURE_ERASE from REQ_OP_DISCARD without considering
all the places REQ_OP_DISCARD was being used to mean either. Fix those.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 288dab8a35 ("block: add a separate operation type for secure erase")
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit abf545484d changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.
Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The rw_page users were not converted to use bio/req ops. As a result
bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
be sent down as reads.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Fixes: 4e1b2d52a8 ("block, fs, drivers: remove REQ_OP compat defs and related code")
Modified by me to:
1) Drop op_flags passing into ->rw_page(), as we don't use it.
2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK
Signed-off-by: Jens Axboe <axboe@fb.com>
In include/linux/blkdev.h duplicate declarations of the request
struct exist. Cleaned up by removing the second, unneeded
declaration.
Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
1/ Replace pcommit with ADR / directed-flushing:
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement either
ADR, or provide one or more flush addresses per nvdimm. ADR
(Asynchronous DRAM Refresh) flushes data in posted write buffers to the
memory controller on a power-fail event. Flush addresses are defined in
ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
"Flush Hint Address Structure". A flush hint is an mmio address that
when written and fenced assures that all previous posted writes
targeting a given dimm have been flushed to media.
2/ On-demand ARS (address range scrub):
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the media
to refresh the bad block list, userspace can also request a re-scrub at
any time.
3/ Support for the Microsoft DSM (device specific method) command format.
4/ Support for EDK2/OVMF virtual disk device memory ranges.
5/ Various fixes and cleanups across the subsystem.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ
NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY
ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h
trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG
KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu
qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP
WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7
41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA
DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl
b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC
6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV
cN3edFVIdxvZeMgM5Ubq
=xCBG
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
- Replace pcommit with ADR / directed-flushing.
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement
either ADR, or provide one or more flush addresses per nvdimm.
ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
to the memory controller on a power-fail event.
Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
A flush hint is an mmio address that when written and fenced assures
that all previous posted writes targeting a given dimm have been
flushed to media.
- On-demand ARS (address range scrub).
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the
media to refresh the bad block list, userspace can also request a
re-scrub at any time.
- Support for the Microsoft DSM (device specific method) command
format.
- Support for EDK2/OVMF virtual disk device memory ranges.
- Various fixes and cleanups across the subsystem.
* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
nfit: do an ARS scrub on hitting a latent media error
nfit: move to nfit/ sub-directory
nfit, libnvdimm: allow an ARS scrub to be triggered on demand
libnvdimm: register nvdimm_bus devices with an nd_bus driver
pmem: clarify a debug print in pmem_clear_poison
x86/insn: remove pcommit
Revert "KVM: x86: add pcommit support"
nfit, tools/testing/nvdimm/: unify shutdown paths
libnvdimm: move ->module to struct nvdimm_bus_descriptor
nfit: cleanup acpi_nfit_init calling convention
nfit: fix _FIT evaluation memory leak + use after free
tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
tools/testing/nvdimm: add virtual ramdisk range
acpi, nfit: treat virtual ramdisk SPA as pmem region
pmem: kill __pmem address space
pmem: kill wmb_pmem()
libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
fs/dax: remove wmb_pmem()
libnvdimm, pmem: flush posted-write queues on shutdown
...
Pull block driver updates from Jens Axboe:
"This branch also contains core changes. I've come to the conclusion
that from 4.9 and forward, I'll be doing just a single branch. We
often have dependencies between core and drivers, and it's hard to
always split them up appropriately without pulling core into drivers
when that happens.
That said, this contains:
- separate secure erase type for the core block layer, from
Christoph.
- set of discard fixes, from Christoph.
- bio shrinking fixes from Christoph, as a followup up to the
op/flags change in the core branch.
- map and append request fixes from Christoph.
- NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
exciting!
- nvme-loop fixes from Arnd.
- removal of ->driverfs_dev from Dan, after providing a
device_add_disk() helper.
- bcache fixes from Bhaktipriya and Yijing.
- cdrom subchannel read fix from Vchannaiah.
- set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.
- set of drbd updates and fixes from Fabian, Lars, and Philipp.
- mg_disk error path fix from Bart.
- user notification for failed device add for loop, from Minfei.
- NVMe in general:
+ NVMe delay quirk from Guilherme.
+ SR-IOV support and command retry limits from Keith.
+ fix for memory-less NUMA node from Masayoshi.
+ use UINT_MAX for discard sectors, from Minfei.
+ cancel IO fixes from Ming.
+ don't allocate unused major, from Neil.
+ error code fixup from Dan.
+ use constants for PSDT/FUSE from James.
+ variable init fix from Jay.
+ fabrics fixes from Ming, Sagi, and Wei.
+ various fixes"
* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
nvme/pci: Provide SR-IOV support
nvme: initialize variable before logical OR'ing it
block: unexport various bio mapping helpers
scsi/osd: open code blk_make_request
target: stop using blk_make_request
block: simplify and export blk_rq_append_bio
block: ensure bios return from blk_get_request are properly initialized
virtio_blk: use blk_rq_map_kern
memstick: don't allow REQ_TYPE_BLOCK_PC requests
block: shrink bio size again
block: simplify and cleanup bvec pool handling
block: get rid of bio_rw and READA
block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
NVMe: don't allocate unused nvme_major
nvme: avoid crashes when node 0 is memoryless node.
nvme: Limit command retries
loop: Make user notify for adding loop device failed
nvme-loop: fix nvme-loop Kconfig dependencies
nvmet: fix return value check in nvmet_subsys_alloc()
...
For a front merge, the maximum number of sectors of the
request must be checked against the front merge BIO sector,
not the current sector of the request.
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently, presence of direct_access() in block_device_operations
indicates support of DAX on its block device. Because
block_device_operations is instantiated with 'const', this DAX
capablity may not be enabled conditinally.
In preparation for supporting DAX to device-mapper devices, add
QUEUE_FLAG_DAX to request_queue flags to advertise their DAX
support. This will allow to set the DAX capability based on how
mapped device is composed.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <linux-s390@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
I wish the OSD code could simply use blk_rq_map_* helpers like
everyone else, but the complex nature of deciding if we have
DATA IN and/or DATA OUT buffers might make this impossible
(at least for a mere human like me).
But using blk_rq_append_bio at least allows sharing the setup code
between request with or without dat a buffers, and given that this
is the last user of blk_make_request it allows getting rid of that
somewhat awkward interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The target SCSI passthrough backend is much better served with the low-level
blk_rq_append_bio construct then the helpers built on top of it, so export it.
Also use the opportunity to remove the pointless request_queue argument and
make the code flow a little more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently blkdev_issue_zeroout cascades down from discards (if the driver
guarantees that discards zero data), to WRITE SAME and then to a loop
writing zeroes. Unfortunately we ignore run-time EOPNOTSUPP errors in the
block layer blkdev_issue_discard helper to work around DM volumes that
may have mixed discard support underneath.
This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
blkdev_issue_discard that indicates we are called for zeroing operation.
This allows both to ignore the EOPNOTSUPP hack and actually consolidating
the discard_zeroes_data check into the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem(). Now that
wmb_pmem() is gone, there is little need to keep this annotation.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently rq->fifo_time is unsigned long but CFQ stores nanosecond
timestamp in it which would overflow on 32-bit archs. Convert it to u64
to avoid the overflow. Since the rq->fifo_time is unioned with struct
call_single_data(), this does not change the size of struct request in
any way.
We have to slightly fixup block/deadline-iosched.c so that comparison
happens in the right types.
Fixes: 9a7f38c42c
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Instead of overloading the discard support with the REQ_SECURE flag.
Use the opportunity to rename the queue flag as well, and remove the
dead checks for this flag in the RAID 1 and RAID 10 drivers that don't
claim support for secure erase.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Keep the 32-bit CPU and cmd_type flags together to avoid holes on 64-bit
architectures.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
This adds a REQ_OP_FLUSH operation that is sent to request_fn
based drivers by the block layer's flush code, instead of
sending requests with the request->cmd_flags REQ_FLUSH bit set.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch drops the compat definition of req_op where it matches
the rq_flag_bits definitions, and drops the related old and compat
code that allowed users to set either the op or flags for the operation.
We also then store the operation in the bi_rw/cmd_flags field similar
to how we used to store the bio ioprio where it sat in the upper bits
of the field.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch converts the is_sync helpers to use separate variables
for the operation and flags.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch converts the block layer merging code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This converts the block issue discard helper and users to use
the bio_set_op_attrs accessor and only pass in the operation flags
like REQ_SEQURE.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The following patches separate the operation (WRITE, READ, DISCARD,
etc) from the rq_flag_bits flags. This patch adds definitions for
request/bio operations (REQ_OPs) and adds request/bio accessors to
get/set the op.
In this patch the REQ_OPs match the REQ rq_flag_bits ones
for compat reasons while all the code is converted to use the
op accessors in the set. In the last patches the op will become a
number and the accessors and helpers in this patch will be dropped
or updated.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
- Until now, dax has been disabled if media errors were found on
any device. This enables the use of DAX in the presence of these
errors by making all sector-aligned zeroing go through the driver.
- The driver (already) has the ability to clear errors on writes that
are sent through the block layer using 'DSMs' defined in ACPI 6.1.
Other misc changes:
- When mounting DAX filesystems, check to make sure the partition
is page aligned. This is a requirement for DAX, and previously, we
allowed such unaligned mounts to succeed, but subsequent reads/writes
would fail.
- Misc/cleanup fixes from Jan that remove unused code from DAX related to
zeroing, writeback, and some size checks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXQ4GKAAoJEHr6Yb6juE3/zowP/iclIhgXXXMQJRUHJlePMXC8
15sGZ32JS1ak9g7vrsmNVEDNynfNtiMYdBxtUyRuj6xqgwdZvFk3F55KOCPtaeA1
+yADkgeRkTAcwzmHw9WQVEzBCqyzSisdrwtEfH817qdq9FJdH66x2Kos6i+HeAVr
5Q/e4gs7lKrjf384/QBl+wxNZOndJaQAPd2VRHQqx2A9F33v0ljdwRaUG1r4fjK2
dtmhcZCqdQyuAGXW3piTnZc5ZFc3DPqO4FkEfqkEK3lFOflK0fd8wMsAZRp/Jd0j
GJsgnVSWSqG0Dz476djlG0w8t2p5Jv1g9cKChV+ZZEdFLKWHCOUFqXNj8uI8I4k5
cOEKCHyJ3IwfSHhNQqktEWrQN4T8ZXhWtuc9GuV4UZYuqJqHci6EdR/YsWsJjV+L
lm/qvK4ipDS1pivxOy8KX/iN0z7Io8J9GXpStDx3g8iWjLlh4YYlbJLWeeRepo/z
aPlV/QAKcHiGY6jzLExrZIyCWkzwo6O+0p1Kxerv9/7K/32HWbOodZ+tC8eD+N25
pV69nCGf+u50T2TtIx1+iann4NC1r7zg5yqnT9AgpyZpiwR5joCDzI5sXW+D0rcS
vPtfM84Ccdeq/e6mvfIpZgR0/npQapKnrmUest0J7P2BFPHiFPji1KzZ7M+1aFOo
9R6JdrAj0Sc+FBa+cGzH
=v6Of
-----END PGP SIGNATURE-----
Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull misc DAX updates from Vishal Verma:
"DAX error handling for 4.7
- Until now, dax has been disabled if media errors were found on any
device. This enables the use of DAX in the presence of these
errors by making all sector-aligned zeroing go through the driver.
- The driver (already) has the ability to clear errors on writes that
are sent through the block layer using 'DSMs' defined in ACPI 6.1.
Other misc changes:
- When mounting DAX filesystems, check to make sure the partition is
page aligned. This is a requirement for DAX, and previously, we
allowed such unaligned mounts to succeed, but subsequent
reads/writes would fail.
- Misc/cleanup fixes from Jan that remove unused code from DAX
related to zeroing, writeback, and some size checks"
* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix a comment in dax_zero_page_range and dax_truncate_page
dax: for truncate/hole-punch, do zeroing through the driver if possible
dax: export a low-level __dax_zero_page_range helper
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: enable dax in the presence of known media errors (badblocks)
dax: fallback from pmd to pte on error
block: Update blkdev_dax_capable() for consistency
xfs: Add alignment check for DAX mount
ext2: Add alignment check for DAX mount
ext4: Add alignment check for DAX mount
block: Add bdev_dax_supported() for dax mount checks
block: Add vfs_msg() interface
dax: Remove redundant inode size checks
dax: Remove pointless writeback from dax_do_io()
dax: Remove zeroing from dax_io()
dax: Remove dead zeroing code from fault handlers
ext2: Avoid DAX zeroing to corrupt data
ext2: Fix block zeroing in ext2_get_blocks() for DAX
dax: Remove complete_unwritten argument
DAX: move RADIX_DAX_ definitions to dax.c
1/ If a mapping overlaps a bad sector fail the request.
2/ Do not opportunistically report more dax-capable capacity than is
requested when errors present.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[vishal: fix a conflict with system RAM collision patches]
[vishal: add a 'size' parameter to ->direct_access]
[vishal: fix a conflict with DAX alignment check patches]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Pull block driver updates from Jens Axboe:
"On top of the core pull request, this is the drivers pull request for
this merge window. This contains:
- Switch drivers to the new write back cache API, and kill off the
flush flags. From me.
- Kill the discard support for the STEC pci-e flash driver. It's
trivially broken, and apparently unmaintained, so it's safer to
just remove it. From Jeff Moyer.
- A set of lightnvm updates from the usual suspects (Matias/Javier,
and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei
Tao.
- A set of updates for NVMe:
- Turn the controller state management into a proper state
machine. From Christoph.
- Shuffling of code in preparation for NVMe-over-fabrics, also
from Christoph.
- Cleanup of the command prep part from Ming Lin.
- Rewrite of the discard support from Ming Lin.
- Deadlock fix for namespace removal from Ming Lin.
- Use the now exported blk-mq tag helper for IO termination.
From Sagi.
- Various little fixes from Christoph, Guilherme, Keith, Ming
Lin, Wang Sheng-Hui.
- Convert mtip32xx to use the now exported blk-mq tag iter function,
from Keith"
* 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits)
lightnvm: reserved space calculation incorrect
lightnvm: rename nr_pages to nr_ppas on nvm_rq
lightnvm: add is_cached entry to struct ppa_addr
lightnvm: expose gennvm_mark_blk to targets
lightnvm: remove mgt targets on mgt removal
lightnvm: pass dma address to hardware rather than pointer
lightnvm: do not assume sequential lun alloc.
nvme/lightnvm: Log using the ctrl named device
lightnvm: rename dma helper functions
lightnvm: enable metadata to be sent to device
lightnvm: do not free unused metadata on rrpc
lightnvm: fix out of bound ppa lun id on bb tbl
lightnvm: refactor set_bb_tbl for accepting ppa list
lightnvm: move responsibility for bad blk mgmt to target
lightnvm: make nvm_set_rqd_ppalist() aware of vblks
lightnvm: remove struct factory_blks
lightnvm: refactor device ops->get_bb_tbl()
lightnvm: introduce nvm_for_each_lun_ppa() macro
lightnvm: refactor dev->online_target to global nvm_targets
lightnvm: rename nvm_targets to nvm_tgt_type
...
blkdev_dax_capable() is similar to bdev_dax_supported(), but needs
to remain as a separate interface for checking dax capability of
a raw block device.
Rename and relocate blkdev_dax_capable() to keep them maintained
consistently, and call bdev_direct_access() for the dax capability
check.
There is no change in the behavior.
Link: https://lkml.org/lkml/2016/5/9/950
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@fb.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
DAX imposes additional requirements to a device. Add
bdev_dax_supported() which performs all the precondition checks
necessary for filesystem to mount the device with dax option.
Also add a new check to verify if a partition is aligned by 4KB.
When a partition is unaligned, any dax read/write access fails,
except for metadata update.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@fb.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
In preparation of moving DAX capability checks to the block layer
from filesystem code, add a VFS message interface that aligns with
filesystem's message format.
For instance, a vfs_msg() message followed by XFS messages in case
of a dax mount error may look like:
VFS (pmem0p1): error: unaligned partition for dax
XFS (pmem0p1): DAX unsupported by block device. Turning off DAX.
XFS (pmem0p1): Mounting V5 Filesystem
:
vfs_msg() is largely based on ext4_msg().
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@fb.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
This is a version of blkdev_issue_discard which doesn't wait for
the I/O to complete, but instead allows the caller to submit
the final bio and/or chain it to others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.
Signed-off-by: Jens Axboe <axboe@fb.com>
We don't have any drivers left using it, so kill it off. Update
documentation to use the newer blk_queue_write_cache().
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add an internal helper and flag for setting whether a queue has
write back caching, or write through (or none). Add a sysfs file
to show this as well, and make it changeable from user space.
This will replace the (awkward) blk_queue_flush() interface that
drivers currently use to inform the block layer of write cache state
and capabilities.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
We could kmalloc() the payload, so need the offset in page.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
- Preparations of parallel lookups (the remaining main obstacle is the
need to move security_d_instantiate(); once that becomes safe, the
rest will be a matter of rather short series local to fs/*.c
- preadv2/pwritev2 series from Christoph
- assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits)
splice: handle zero nr_pages in splice_to_pipe()
vfs: show_vfsstat: do not ignore errors from show_devname method
dcache.c: new helper: __d_add()
don't bother with __d_instantiate(dentry, NULL)
untangle fsnotify_d_instantiate() a bit
uninline d_add()
replace d_add_unique() with saner primitive
quota: use lookup_one_len_unlocked()
cifs_get_root(): use lookup_one_len_unlocked()
nfs_lookup: don't bother with d_instantiate(dentry, NULL)
kill dentry_unhash()
ceph_fill_trace(): don't bother with d_instantiate(dn, NULL)
autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup()
configfs: move d_rehash() into configfs_create() for regular files
ceph: don't bother with d_rehash() in splice_dentry()
namei: teach lookup_slow() to skip revalidate
namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()
lookup_one_len_unlocked(): use lookup_dcache()
namei: simplify invalidation logics in lookup_dcache()
namei: change calling conventions for lookup_{fast,slow} and follow_managed()
...
Pull libata updates from Tejun Heo:
- ahci grew runtime power management support so that the controller can
be turned off if no devices are attached.
- sata_via isn't dead yet. It got hotplug support and more refined
workaround for certain WD drives.
- Misc cleanups. There's a merge from for-4.5-fixes to avoid confusing
conflicts in ahci PCI ID table.
* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci_xgene: dereferencing uninitialized pointer in probe
AHCI: Remove obsolete Intel Lewisburg SATA RAID device IDs
ata: sata_rcar: Use ARCH_RENESAS
sata_via: Implement hotplug for VT6421
sata_via: Apply WD workaround only when needed on VT6421
ahci: Add runtime PM support for the host controller
ahci: Add functions to manage runtime PM of AHCI ports
ahci: Convert driver to use modern PM hooks
ahci: Cache host controller version
scsi: Drop runtime PM usage count after host is added
scsi: Set request queue runtime PM status back to active on resume
block: Add blk_set_runtime_active()
ata: ahci_mvebu: add support for Armada 3700 variant
libata: fix unbalanced spin_lock_irqsave/spin_unlock_irq() in ata_scsi_park_show()
libata: support AHCI on OCTEON platform