Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b35
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.
Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.
[1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf
Suggested-by: Eric Dumazet <edumazet@google.com>
Change-Id: I41744d03eb67dcef27b412f4dedba1b9c78c1d17
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4a894ACgkQONu9yGCS
aT5rLRAAu7SpHGDWU7SlH3ncUXauqLyQ/cMSLLyS1HJ7TYvHujJInV5U3yaz4LQH
e5q+k5BgaPKSe3zN44D9hRhEbEysk19ekRQ5tkhRZ7a41hs31kXTz/QoH2D3+Ybg
VCZVqyuthiDmQnVQwBHzUy9T2RhOIOop0y5I3GcaSj6fHw0ZQ4AOJdmxZzafLvpB
obmDuRYNyhpxU1Ds+gYgcDXjPWsDyrNVEVuCeLVRY7wwLd41ZrKj9r655tBpHyJ+
PrlMI6zNaZ4BMWcG/4bpTE5gUWqhHCxYi5kFgAh9yZmoPV9jSt9DZeggOASwSDTd
qJLPpqlxDp4+/Kx3wZz2brKN6ZHxGvkpGOztcMEvUOOqet+l7yD8RRs5HmcYKaSg
qMDNcFIBSN3jcnQdKyqkb+NkJ+6RXMueyk4Mn8f9C1lptDsfODP1hQnJbIPc0NZJ
t7LTDPEVIxuLIAMXQI4Sg2yFEAD+dtaq3XjhMUlCz2SWLAeblvOcrn/Fm7Pb63/s
fj9pq4mNyRmHatmsuL1/R4zLPbz3A1gQ8TbHCNIcxGzPlD5zPlM3/iBXXLKefGrE
BgbFRhdjOCNuHnHVtdzgYH6N/VRF4dtWOT8Qz2btIT9mkm11W5A53BTAtwIYAXgo
GmKOSajhxb25/tUZhc0WVmYf2jdvlM7ygaDfQdYE42G7AxnE67k=
=cc6F
-----END PGP SIGNATURE-----
Merge 4.9.209 into android-4.9-q
Changes in 4.9.209
PM / devfreq: Don't fail devfreq_dev_release if not in list
RDMA/cma: add missed unregister_pernet_subsys in init failure
scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
scsi: qla2xxx: Don't call qlt_async_event twice
scsi: iscsi: qla4xxx: fix double free in probe
scsi: libsas: stop discovering if oob mode is disconnected
usb: gadget: fix wrong endpoint desc
md: raid1: check rdev before reference in raid1_sync_request func
s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
s390/cpum_sf: Avoid SBD overflow condition in irq handler
IB/mlx4: Follow mirror sequence of device add during device removal
xen-blkback: prevent premature module unload
xen/balloon: fix ballooned page accounting without hotplug enabled
PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
xfs: fix mount failure crash on invalid iclog memory access
taskstats: fix data-race
drm: limit to INT_MAX in create_blob ioctl
Revert "perf report: Add warning when libunwind not compiled in"
ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
MIPS: Avoid VDSO ABI breakage due to global register variable
mm/zsmalloc.c: fix the migrated zspage statistics.
memcg: account security cred as well to kmemcg
locks: print unsigned ino in /proc/locks
dmaengine: Fix access to uninitialized dma_slave_caps
compat_ioctl: block: handle Persistent Reservations
ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
ata: ahci_brcm: Allow optional reset controller to be used
ata: ahci_brcm: Fix AHCI resources management
gpiolib: fix up emulated open drain outputs
tracing: Have the histogram compare functions convert to u64 first
ALSA: cs4236: fix error return comparison of an unsigned integer
ftrace: Avoid potential division by zero in function profiler
arm64: Revert support for execute-only user mappings
PM / devfreq: Check NULL governor in available_governors_show
nfsd4: fix up replay_matches_cache()
xfs: don't check for AG deadlock for realtime files in bunmapi
Bluetooth: btusb: fix PM leak in error case of setup
Bluetooth: delete a stray unlock
Bluetooth: Fix memory leak in hci_connect_le_scan
media: flexcop-usb: ensure -EIO is returned on error condition
regulator: ab8500: Remove AB8505 USB regulator
media: usb: fix memory leak in af9005_identify_state
tty: serial: msm_serial: Fix lockup for sysrq and oops
fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
drm/mst: Fix MST sideband up-reply failure handling
powerpc/pseries/hvconsole: Fix stack overread via udbg
rxrpc: Fix possible NULL pointer access in ICMP handling
ath9k_htc: Modify byte order for an error message
ath9k_htc: Discard undersized packets
net: add annotations on hh->hh_len lockless accesses
s390/smp: fix physical to logical CPU map for SMT
xen/blkback: Avoid unmapping unmapped grant pages
locking/x86: Remove the unused atomic_inc_short() methd
pstore/ram: Write new dumps to start of recycled zones
locking/spinlock/debug: Fix various data races
netfilter: ctnetlink: netns exit must wait for callbacks
efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
efi/gop: Return EFI_SUCCESS if a usable GOP was found
efi/gop: Fix memory leak in __gop_query32/64()
ARM: vexpress: Set-up shared OPP table instead of individual for each CPU
netfilter: uapi: Avoid undefined left-shift in xt_sctp.h
spi: spi-cavium-thunderx: Add missing pci_release_regions()
ARM: dts: am437x-gp/epos-evm: fix panel compatible
samples: bpf: Replace symbol compare of trace_event
powerpc: Ensure that swiotlb buffer is allocated from low memory
bnx2x: Do not handle requests from VFs after parity
bnx2x: Fix logic to get total no. of PFs per engine
net: usb: lan78xx: Fix error message format specifier
rfkill: Fix incorrect check to avoid NULL pointer dereference
ASoC: wm8962: fix lambda value
regulator: rn5t618: fix module aliases
kconfig: don't crash on NULL expressions in expr_eq()
perf/x86/intel: Fix PT PMI handling
net: stmmac: RX buffer size must be 16 byte aligned
block: fix memleak when __blk_rq_map_user_iov() is failed
parisc: Fix compiler warnings in debug_core.c
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
macvlan: do not assume mac_header is set in macvlan_broadcast()
net: stmmac: dwmac-sunxi: Allow all RGMII modes
net: usb: lan78xx: fix possible skb leak
pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY
tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
vxlan: fix tos value before xmit
vlan: vlan_changelink() should propagate errors
net: sch_prio: When ungrafting, replace with FIFO
vlan: fix memory leak in vlan_dev_set_egress_priority
USB: core: fix check for duplicate endpoints
USB: serial: option: add Telit ME910G1 0x110a composition
Linux 4.9.209
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I37562d713bddce2ead7934800c3f9e231884c43f
[ Upstream commit bba1b2a890253528c45aa66cf856f289a215bfbc ]
Previously, when this sample is added, commit 1c47910ef8
("samples/bpf: add perf_event+bpf example"), a symbol 'sys_read' and
'sys_write' has been used without no prefixes. But currently there are
no exact symbols with these under kallsyms and this leads to failure.
This commit changes exact compare to substring compare to keep compatible
with exact symbol or prefixed symbol.
Fixes: 1c47910ef8 ("samples/bpf: add perf_event+bpf example")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191205080114.19766-2-danieltimlee@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl00DuEACgkQONu9yGCS
aT5SfA/8Cb+dy+hYcmzuWvhuifNMXgyMVFCoyRbgpM8XC0+kRz713NoJ6qeuCzVy
DeHYL87OaPCdQiLM/G+wXnGeGpr/ZmhKwCJbseayFXlMDX4qrN3PxyYmDdBLIgIz
4entWkXY2ZlwpOMPdLHAKZUUzSeMLmY9yhIjli6myu2cA7AKVqbPmvKf0g8dSOcL
lSTVPWZvA29sCA4yQ3MbdQWK92f9Dhfu/egFC41jHFhp5l8a2r0cIPKJykWI9STn
CQb81SsYaQD/mjsw8hsw9VGDDzKzSsO1xPkAEa3WNRkIOp4ANeiz0cDDA9bW/vKZ
2gbu/VXezjK4EuuCSeWtKf+Wev6RTTVV7SFez7ppE361zYt3EPIudp0WuXSrvXqm
IUlyR30Q/dAov5Y6ZsDYiV5q35IuXxFVeMJto1SDA/5s+u+IxHZbsFREDIDATAMJ
V70slGD9tkavsb+XgppSXhhBYPxCCqlXw8FunJc4xVR9/LEmStcGUFnP9/UUlQog
O+Uvolckn/ljzxYzYZ5bbSRga1CCfAxaCNv8zx6GLPrgE7civ9IaJIFBbjeGR1Yn
zq9k3klDhACmWFxRCr2y/cGSkaR3+QNXH43T7aagj5himYiIdoJMr2+Fx2lPOGwD
fPK/wrSIIXPgQFtqJolGOw5uZQQDv9HKWOdy9P/SVlo4bPsRQH0=
=We9z
-----END PGP SIGNATURE-----
Merge 4.9.186 into android-4.9-q
Changes in 4.9.186
crypto: talitos - rename alternative AEAD algos.
Input: elantech - enable middle button support on 2 ThinkPads
samples, bpf: fix to change the buffer size for read()
staging:iio:ad7150: fix threshold mode config bit
mac80211: mesh: fix RCU warning
mac80211: free peer keys before vif down in mesh
mwifiex: Fix possible buffer overflows at parsing bss descriptor
netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments
netfilter: ipv6: nf_defrag: accept duplicate fragments again
dt-bindings: can: mcp251x: add mcp25625 support
can: mcp251x: add support for mcp25625
Input: imx_keypad - make sure keyboard can always wake up system
KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy
mlxsw: spectrum: Disallow prio-tagged packets when PVID is removed
ARM: davinci: da850-evm: call regulator_has_full_constraints()
ARM: davinci: da8xx: specify dma_coherent_mask for lcdc
mac80211: only warn once on chanctx_conf being NULL
md: fix for divide error in status_resync
bnx2x: Check if transceiver implements DDM before access
ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
net :sunrpc :clnt :Fix xps refcount imbalance on the error path
udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
x86/tls: Fix possible spectre-v1 in do_get_thread_area()
mwifiex: Abort at too short BSS descriptor element
mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies()
fscrypt: don't set policy for a dead directory
mwifiex: Don't abort on small, spec-compliant vendor IEs
USB: serial: ftdi_sio: add ID for isodebug v1
USB: serial: option: add support for GosunCn ME3630 RNDIS mode
Revert "serial: 8250: Don't service RX FIFO if interrupts are disabled"
p54usb: Fix race between disconnect and firmware loading
usb: gadget: ether: Fix race between gether_disconnect and rx_submit
usb: renesas_usbhs: add a workaround for a race condition of workqueue
staging: comedi: dt282x: fix a null pointer deref on interrupt
staging: comedi: amplc_pci230: fix null pointer deref on interrupt
carl9170: fix misuse of device driver API
VMCI: Fix integer overflow in VMCI handle arrays
MIPS: Remove superfluous check for __linux__
Revert "e1000e: fix cyclic resets at link up with active tx"
e1000e: start network tx queue only when link is up
nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
arm64: crypto: remove accidentally backported files
perf/core: Fix perf_sample_regs_user() mm check
ARM: omap2: remove incorrect __init annotation
be2net: fix link failure after ethtool offline test
ppp: mppe: Add softdep to arc4
sis900: fix TX completion
ARM: dts: imx6ul: fix PWM[1-4] interrupts
dm verity: use message limit for data block corruption message
ARC: hide unused function unw_hdr_alloc
s390: fix stfle zero padding
s390/qdio: (re-)initialize tiqdio list entries
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Linux 4.9.186
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit f7c2d64bac1be2ff32f8e4f500c6e5429c1003e0 ]
If the trace for read is larger than 4096, the return
value sz will be 4096. This results in off-by-one error
on buf:
static char buf[4096];
ssize_t sz;
sz = read(trace_fd, buf, sizeof(buf));
if (sz > 0) {
buf[sz] = 0;
puts(buf);
}
Signed-off-by: Chang-Hsien Tsai <luke.tw@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlt/6CsACgkQONu9yGCS
aT7KOw//boywfZ/3pWvs5bp6MsNBrYxa9mzvv7MDQFj10nibTdwY+XFf7O370j3q
Udt3A3V4a57KnploVglMSv4452LETDhMFdlQlxVTMcWAISiZoviVemDTi6GoRCwI
PJg30COP3kE4u51O5i4sGDrff9mRiUPH2XR4JP27CY1aYbjE1IYoOTfB7jUSZhNn
nzghkYLuxIpXc3GlNiWwXSxjRoDz9hw/0jaY4oZC9CRGi/Z/YBqvXLvj9dMO/2R2
0R1ftiI/CEpuQ10XZ9z3h2WlgiWz9Pj183qBMwCUVR0IDVRo+A6lN1Zz7X+eG0no
+CsDAhwfxu2BmDZ4eTDCgcKTk1LDmSVAo3cOoIdW+UkNbeoz99gFivy+jhEyKwQA
KYfMciX31qmrbVfqGT9mZTsDgrYnLCqOmRnhIoAbA3Tb/4gzO6pQXt08eGK4l+LI
u2Yv4L3BiNtZlRPBZYaibtnAAWOPZEuMIf/cRxXJ4//AcWsLirS1Dy3Z70z7J6ew
jkxtDf2F34YYrj1QGRAKfELiZJOJYX7Ig81y80M4tBD2sjRfb29z7QEK5XbDrIID
OJ8Kh57YrdnkGBOFbG8Yvfy4QV1js8VsReasK1pgad8iL8suBYPTPQwmUztGWXz5
AQsU+aQxNYicTtsRIfEN4Tr1Nb/2IqfShC+lwFpcagMvjzjDT/Q=
=GH5x
-----END PGP SIGNATURE-----
Merge 4.9.124 into android-4.9
Changes in 4.9.124
x86/entry/64: Remove %ebx handling from error_entry/exit
ARC: Explicitly add -mmedium-calls to CFLAGS
usb: dwc3: of-simple: fix use-after-free on remove
netfilter: ipv6: nf_defrag: reduce struct net memory waste
selftests: pstore: return Kselftest Skip code for skipped tests
selftests: static_keys: return Kselftest Skip code for skipped tests
selftests: user: return Kselftest Skip code for skipped tests
selftests: zram: return Kselftest Skip code for skipped tests
selftests: sync: add config fragment for testing sync framework
ARM: dts: NSP: Fix i2c controller interrupt type
ARM: dts: NSP: Fix PCIe controllers interrupt types
ARM: dts: Cygnus: Fix I2C controller interrupt type
ARM: dts: Cygnus: Fix PCIe controller interrupt type
arm64: dts: ns2: Fix I2C controller interrupt type
drm: mali-dp: Enable Global SE interrupts mask for DP500
IB/rxe: Fix missing completion for mem_reg work requests
libahci: Fix possible Spectre-v1 pmp indexing in ahci_led_store()
usb: dwc2: fix isoc split in transfer with no data
usb: gadget: composite: fix delayed_status race condition when set_interface
usb: gadget: dwc2: fix memory leak in gadget_init()
xen: add error handling for xenbus_printf
scsi: xen-scsifront: add error handling for xenbus_printf
xen/scsiback: add error handling for xenbus_printf
arm64: make secondary_start_kernel() notrace
qed: Add sanity check for SIMD fastpath handler.
enic: initialize enic->rfs_h.lock in enic_probe
net: hamradio: use eth_broadcast_addr
net: propagate dev_get_valid_name return code
net: stmmac: socfpga: add additional ocp reset line for Stratix10
nvmet: reset keep alive timer in controller enable
ARC: Enable machine_desc->init_per_cpu for !CONFIG_SMP
net: davinci_emac: match the mdio device against its compatible if possible
KVM: arm/arm64: Drop resource size check for GICV window
locking/lockdep: Do not record IRQ state within lockdep code
ipv6: mcast: fix unsolicited report interval after receiving querys
Smack: Mark inode instant in smack_task_to_inode
batman-adv: Fix bat_ogm_iv best gw refcnt after netlink dump
batman-adv: Fix bat_v best gw refcnt after netlink dump
cxgb4: when disabling dcb set txq dcb priority to 0
iio: pressure: bmp280: fix relative humidity unit
brcmfmac: stop watchdog before detach and free everything
ARM: dts: am437x: make edt-ft5x06 a wakeup source
ALSA: seq: Fix UBSAN warning at SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT ioctl
usb: xhci: remove the code build warning
usb: xhci: increase CRS timeout value
NFC: pn533: Fix wrong GFP flag usage
perf test session topology: Fix test on s390
perf report powerpc: Fix crash if callchain is empty
perf bench: Fix numa report output code
netfilter: nf_log: fix uninit read in nf_log_proc_dostring
ceph: fix dentry leak in splice_dentry()
selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
selftests/x86/sigreturn: Do minor cleanups
ARM: dts: da850: Fix interrups property for gpio
dmaengine: pl330: report BURST residue granularity
dmaengine: k3dma: Off by one in k3_of_dma_simple_xlate()
md/raid10: fix that replacement cannot complete recovery after reassemble
nl80211: relax ht operation checks for mesh
drm/exynos: gsc: Fix support for NV16/61, YUV420/YVU420 and YUV422 modes
drm/exynos: decon5433: Fix per-plane global alpha for XRGB modes
drm/exynos: decon5433: Fix WINCONx reset value
bpf, s390: fix potential memleak when later bpf_jit_prog fails
PCI: xilinx: Add missing of_node_put()
PCI: xilinx-nwl: Add missing of_node_put()
bnx2x: Fix receiving tx-timeout in error or recovery state.
acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value
m68k: fix "bad page state" oops on ColdFire boot
objtool: Support GCC 8 '-fnoreorder-functions'
ipvlan: call dev_change_flags when ipvlan mode is reset
HID: wacom: Correct touch maximum XY of 2nd-gen Intuos
ARM: imx_v6_v7_defconfig: Select ULPI support
ARM: imx_v4_v5_defconfig: Select ULPI support
tracing: Use __printf markup to silence compiler
kasan: fix shadow_size calculation error in kasan_module_alloc
smsc75xx: Add workaround for gigabit link up hardware errata.
samples/bpf: add missing <linux/if_vlan.h>
samples/bpf: Check the error of write() and read()
ieee802154: 6lowpan: set IFLA_LINK
netfilter: x_tables: set module owner for icmp(6) matches
ipv6: make ipv6_renew_options() interrupt/kernel safe
net: qrtr: Broadcast messages only from control port
sh_eth: fix invalid context bug while calling auto-negotiation by ethtool
sh_eth: fix invalid context bug while changing link options by ethtool
ravb: fix invalid context bug while calling auto-negotiation by ethtool
ravb: fix invalid context bug while changing link options by ethtool
ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used
ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem
ieee802154: at86rf230: use __func__ macro for debug messages
ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem
drm/armada: fix colorkey mode property
netfilter: nf_conntrack: Fix possible possible crash on module loading.
ARC: Improve cmpxchg syscall implementation
bnxt_en: Always set output parameters in bnxt_get_max_rings().
bnxt_en: Fix for system hang if request_irq fails
perf llvm-utils: Remove bashism from kernel include fetch script
nfit: fix unchecked dereference in acpi_nfit_ctl
RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot
ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores
ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
ixgbe: Be more careful when modifying MAC filters
tools: build: Use HOSTLDFLAGS with fixdep
packet: reset network header if packet shorter than ll reserved space
qlogic: check kstrtoul() for errors
tcp: remove DELAYED ACK events in DCTCP
pinctrl: nsp: off by ones in nsp_pinmux_enable()
pinctrl: nsp: Fix potential NULL dereference
drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply()
net/ethernet/freescale/fman: fix cross-build error
net: usb: rtl8150: demote allmulti message to dev_dbg()
PCI: OF: Fix I/O space page leak
PCI: versatile: Fix I/O space page leak
net: qca_spi: Avoid packet drop during initial sync
net: qca_spi: Make sure the QCA7000 reset is triggered
net: qca_spi: Fix log level if probe fails
tcp: identify cryptic messages as TCP seq # bugs
KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
ext4: fix spectre gadget in ext4_mb_regular_allocator()
parisc: Remove ordered stores from syscall.S
xfrm_user: prevent leaking 2 bytes of kernel memory
netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior state
packet: refine ring v3 block size test to hold one frame
parisc: Remove unnecessary barriers from spinlock.h
PCI: hotplug: Don't leak pci_slot on registration failure
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: pciehp: Fix use-after-free on unplug
PCI: pciehp: Fix unprotected list iteration in IRQ handler
i2c: imx: Fix race condition in dma read
reiserfs: fix broken xattr handling (heap corruption, bad retval)
Linux 4.9.124
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 02a2f000a3629274bfad60bfc4de9edec49e63e7 ]
test_task_rename() and test_urandom_read()
can be failed during write() and read(),
So check the result of them.
Reviewed-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4d5d33a085335ef469c9a87792bcaaaa8e64d8c4 ]
This fixes build error regarding redefinition:
CLANG-bpf samples/bpf/parse_varlen.o
samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr'
struct vlan_hdr {
^
./include/linux/if_vlan.h:38:8: note: previous definition is here
So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.
Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X
2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.
3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.
In the future this behavior may be extended with a chain of
non-overridable programs.
Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.
Add several testcases and adjust libbpf.
Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28
("cgroup: add support for eBPF programs")
[AmitP: Refactored original patch for android-4.9 where libbpf sources
are in samples/bpf/ and test_cgrp2_attach2, test_cgrp2_sock,
and test_cgrp2_sock2 sample tests do not exist.]
(cherry picked from commit 7f677633379b4abb3281cdbe7e7006f049305c03)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Cherry-pick from commit d8c5b17f2bc0de09fbbfa14d90e8168163a579e7
Add a simple userpace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:
* Create arraymap in kernel with 4 byte keys and 8 byte values
* Load eBPF program
The eBPF program accesses the map passed in to store two pieces of
information. The number of invocations of the program, which maps
to the number of packets received, is stored to key 0. Key 1 is
incremented on each iteration by the number of bytes stored in
the skb.
* Detach any eBPF program previously attached to the cgroup
* Attach the new program to the cgroup using BPF_PROG_ATTACH
* Once a second, read map[0] and map[1] to see how many bytes and
packets were seen on any socket of tasks in the given cgroup.
The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.
libbpf gained two new wrappers for the new syscall commands.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: I011436a755abd62050edd22e47995c166a0bd8a2
[ Upstream commit 332270fdc8b6fba07d059a9ad44df9e1a2ad4529 ]
llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.
Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The files "sampleip_kern.c" and "trace_event_kern.c" directly access
"ctx->regs.ip" which is not available on s390x. Fix this and use the
PT_REGS_IP() macro instead.
Also fix the macro for s390x and use "psw.addr" from "pt_regs".
Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test creates two netns, ns1 and ns2. The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.
The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].
At ns1, the VIPs are routed _via_ the host.
At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl. The test is configured in a way so
that both ingress and egress can be tested.
At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified. The return path is routed to the dev ipip/ip6tnl.
During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the sample files are causing issues when they are loaded with tc
and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
file as program/map/etc sections are not present, which can be easily
spotted with readelf(1).
Currently, BPF samples are including some of the kernel headers and mid
term we should change them to refrain from this, really. When dynamic
debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
is easily overlooked in the build as clang spills this along with other
noisy warnings from various header includes, and llc still generates an
ELF file with mentioned characteristics. For just playing around with BPF
examples, this can be a bit of a hurdle to take.
Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
done in xdp*_kern samples already.
Fixes: 65d472fb00 ("samples/bpf: add 'pointer to packet' tests")
Fixes: 6afb1e28b8 ("samples/bpf: Add tunnel set/get tests.")
Fixes: a3f7461734 ("cgroup: bpf: Add an example to do cgroup checking in BPF")
Reported-by: Chandrasekar Kannan <ckannan@console.to>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppose you have a map array value that is something like this
struct foo {
unsigned iter;
int array[SOME_CONSTANT];
};
You can easily insert this into an array, but you cannot modify the contents of
foo->array[] after the fact. This is because we have no way to verify we won't
go off the end of the array at verification time. This patch provides a start
for this work. We accomplish this by keeping track of a minimum and maximum
value a register could be while we're checking the code. Then at the time we
try to do an access into a MAP_VALUE we verify that the maximum offset into that
region is a valid access into that memory region. So in practice, code such as
this
unsigned index = 0;
if (foo->iter >= SOME_CONSTANT)
foo->iter = index;
else
index = foo->iter++;
foo->array[index] = bar;
would be allowed, as we can verify that index will always be between 0 and
SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra
check to make sure the index isn't less than 0, or do something like index %=
SOME_CONSTANT.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
seccomp_phase1() does not exist anymore. Instead, update sample to use
__seccomp_filter(). While at it, set max locked memory to unlimited.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These samples fail to compile as 'struct flow_keys' conflicts with
definition in net/flow_dissector.h. Fix the same by renaming the
structure used in the sample.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add couple of test cases for direct write and the negative size issue, and
also adjust the direct packet access test4 since it asserts that writes are
not possible, but since we've just added support for writes, we need to
invert the verdict to ACCEPT, of course. Summary: 133 PASSED, 0 FAILED.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
the test creates 3 namespaces with veth connected via bridge.
First two namespaces simulate two different hosts with the same
IPv4 and IPv6 addresses configured on the tunnel interface and they
communicate with outside world via standard tunnels.
Third namespace creates collect_md tunnel that is driven by BPF
program which selects different remote host (either first or
second namespace) based on tcp dest port number while tcp dst
ip is the same.
This scenario is rough approximation of load balancer use case.
The tests check both traditional tunnel configuration and collect_md mode.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
extend existing tests for vxlan, geneve, gre to include IPIP tunnel.
It tests both traditional tunnel configuration and
dynamic via bpf helpers.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
LLVM can generate code that tests for direct packet access via
skb->data/data_end in a way that currently gets rejected by the
verifier, example:
[...]
7: (61) r3 = *(u32 *)(r6 +80)
8: (61) r9 = *(u32 *)(r6 +76)
9: (bf) r2 = r9
10: (07) r2 += 54
11: (3d) if r3 >= r2 goto pc+12
R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
R9=pkt(id=0,off=0,r=0) R10=fp
12: (18) r4 = 0xffffff7a
14: (05) goto pc+430
[...]
from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
24: (7b) *(u64 *)(r10 -40) = r1
25: (b7) r1 = 0
26: (63) *(u32 *)(r6 +56) = r1
27: (b7) r2 = 40
28: (71) r8 = *(u8 *)(r9 +20)
invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)
The reason why this gets rejected despite a proper test is that we
currently call find_good_pkt_pointers() only in case where we detect
tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
derived, for example, from a register of type pkt(id=Y,off=0,r=0)
pointing to skb->data. find_good_pkt_pointers() then fills the range
in the current branch to pkt(id=Y,off=0,r=Z) on success.
For above case, we need to extend that to recognize pkt_end >= rX
pattern and mark the other branch that is taken on success with the
appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
only two practical options to test for from what LLVM could have
generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
that we would need to take into account as well.
After the fix:
[...]
7: (61) r3 = *(u32 *)(r6 +80)
8: (61) r9 = *(u32 *)(r6 +76)
9: (bf) r2 = r9
10: (07) r2 += 54
11: (3d) if r3 >= r2 goto pc+12
R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
R9=pkt(id=0,off=0,r=0) R10=fp
12: (18) r4 = 0xffffff7a
14: (05) goto pc+430
[...]
from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
24: (7b) *(u64 *)(r10 -40) = r1
25: (b7) r1 = 0
26: (63) *(u32 *)(r6 +56) = r1
27: (b7) r2 = 40
28: (71) r8 = *(u8 *)(r9 +20)
29: (bf) r1 = r8
30: (25) if r8 > 0x3c goto pc+47
R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
R9=pkt(id=0,off=0,r=54) R10=fp
31: (b7) r1 = 1
[...]
Verifier test cases are also added in this work, one that demonstrates
the mentioned example here and one that tries a bad packet access for
the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
with both test variants (>, >=).
Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sample instruction pointer and frequency count in a BPF map
Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++
It's primary purpose to check that key bpf helpers like map lookup, update,
get_stackid, trace_printk and ctx access are all working.
It checks:
- PERF_COUNT_HW_CPU_CYCLES on all cpus
- PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
- PERF_COUNT_SW_CPU_CLOCK on all cpus
- PERF_COUNT_SW_CPU_CLOCK for current process
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE. A native
tunnel device is created in a namespace to interact with a lwtunnel
device out of the namespace, with metadata enabled. The bpf_skb_set_*
program is attached to tc egress and bpf_skb_get_* is attached to egress
qdisc. A ping between two tunnels is used to verify correctness and
the result of bpf_skb_get_* printed by bpf_trace_printk.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor overlapping changes for both merge conflicts.
Resolution work done by Stephen Rothwell was used
as a reference.
Signed-off-by: David S. Miller <davem@davemloft.net>
test various corner cases of the helper function access to the packet
via crafted XDP programs.
Signed-off-by: Aaron Yue <haoxuany@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.
Tejun says:
So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.
[...]
It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.
Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.
Fixes: 4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
This test has a BPF program which writes the last known pid to call the
sync syscall within a given cgroup to a map.
The user mode program creates its own mount namespace, and mounts the
cgroupsv2 hierarchy in there, as on all current test systems
(Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default.
Once it does this, it proceeds to test.
The test checks for positive and negative condition. It ensures that
when it's part of a given cgroup, its pid is captured in the map,
and that when it leaves the cgroup, this doesn't happen.
It populate a cgroups arraymap prior to execution in userspace. This means
that the program must be run in the same cgroups namespace as the programs
that are being traced.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")
started using 20 of initially reserved upper 32-bits of 'flags' argument
in bpf_perf_event_output(). Adjust corresponding prototype in samples/bpf/bpf_helpers.h
Signed-off-by: Adam Barth <arb@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
increase test coverage to check previously missing 'update when full'
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This example shows using a kprobe to act as a dnat mechanism to divert
traffic for arbitrary endpoints. It rewrite the arguments to a syscall
while they're still in userspace, and before the syscall has a chance
to copy the argument into kernel space.
Although this is an example, it also acts as a test because the mapped
address is 255.255.255.255:555 -> real address, and that's not a legal
address to connect to. If the helper is broken, the example will fail
on the intermediate steps, as well as the final step to verify the
rewrite of userspace memory succeeded.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows user memory to be written to during the course of a kprobe.
It shouldn't be used to implement any kind of security mechanism
because of TOC-TOU attacks, but rather to debug, divert, and
manipulate execution of semi-cooperative processes.
Although it uses probe_kernel_write, we limit the address space
the probe can write into by checking the space with access_ok.
We do this as opposed to calling copy_to_user directly, in order
to avoid sleeping. In addition we ensure the threads's current fs
/ segment is USER_DS and the thread isn't exiting nor a kernel thread.
Given this feature is meant for experiments, and it has a risk of
crashing the system, and running programs, we print a warning on
when a proglet that attempts to use this helper is installed,
along with the pid and process name.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The naming choice of index is not terribly descriptive, and dropcnt is
in fact incorrect for xdp2. Pick better names for these: ipproto and
rxcnt.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.
Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
pouplates/updates it with a cgroup2's backed fd and pins it to a
bpf-fs's file. The pinned file can be loaded by tc and then used
by the bpf prog later. This program can also update an existing pinned
array and it could be useful for debugging/testing purpose.
test_cgrp2_tc_kern.c:
A bpf prog which should be loaded by tc. It is to demonstrate
the usage of bpf_skb_in_cgroup.
test_cgrp2_tc.sh:
A script that glues the test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together. The idea is like:
1. Load the test_cgrp2_tc_kern.o by tc
2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
with a cgroup fd
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
dropped because of a match on the cgroup
Most of the lines in test_cgrp2_tc.sh is the boilerplate
to setup the cgroup/bpf-fs/net-devices/netns...etc. It is
not bulletproof on errors but should work well enough and
give enough debug info if things did not go well.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
add few tests for "pointer to packet" logic of the verifier
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
parse_simple.c - packet parser exapmle with single length check that
filters out udp packets for port 9
parse_varlen.c - variable length parser that understand multiple vlan headers,
ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
The packet is parsed layer by layer with multitple length checks.
parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
Same functionality as parse_simple.
simple = 24.1Mpps per core
varlen = 22.7Mpps
ldabs = 21.4Mpps
Parser with LD_ABS instructions is slower than full direct access parser
which does more packet accesses and checks.
These examples demonstrate the choice bpf program authors can make between
flexibility of the parser vs speed.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv4/ip_gre.c
Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Users are likely to manually compile both LLVM 'llc' and 'clang'
tools. Thus, also allow redefining CLANG and verify command exist.
Makefile implementation wise, the target that verify the command have
been generalized.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not intuitive that 'make' must be run from the top level
directory with argument "samples/bpf/" to compile these eBPF samples.
Introduce a kbuild make file trick that allow make to be run from the
"samples/bpf/" directory itself. It basically change to the top level
directory and call "make samples/bpf/" with the "/" slash after the
directory name.
Also add a clean target that only cleans this directory, by taking
advantage of the kbuild external module setting M=$PWD.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Getting started with using examples in samples/bpf/ is not
straightforward. There are several dependencies, and specific
versions of these dependencies.
Just compiling the example tool is also slightly obscure, e.g. one
need to call make like:
make samples/bpf/
Do notice the "/" slash after the directory name.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make compiling samples/bpf more user friendly, by detecting if LLVM
compiler tool 'llc' is available, and also detect if the 'bpf' target
is available in this version of LLVM.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is practical to be-able-to redefine the location of the LLVM
command 'llc', because not all distros have a LLVM version with bpf
target support. Thus, it is sometimes required to compile LLVM from
source, and sometimes it is not desired to overwrite the distros
default LLVM version.
This feature was removed with 128d1514be ("samples/bpf: Use llc in
PATH, rather than a hardcoded value").
Add this features back. Note that it is possible to redefine the LLC
on the make command like:
make samples/bpf/ LLC=~/git/llvm/build/bin/llc
Fixes: 128d1514be ("samples/bpf: Use llc in PATH, rather than a hardcoded value")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
llvm cannot always recognize memset as builtin function and optimize
it away, so just delete it. It was a leftover from testing
of bpf_perf_event_output() with large data structures.
Fixes: 39111695b1 ("samples: bpf: add bpf_perf_event_output example")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
verifier behaviour.
[...]
#84 raw_stack: no skb_load_bytes OK
#85 raw_stack: skb_load_bytes, no init OK
#86 raw_stack: skb_load_bytes, init OK
#87 raw_stack: skb_load_bytes, spilled regs around bounds OK
#88 raw_stack: skb_load_bytes, spilled regs corruption OK
#89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
#90 raw_stack: skb_load_bytes, spilled regs + data OK
#91 raw_stack: skb_load_bytes, invalid access 1 OK
#92 raw_stack: skb_load_bytes, invalid access 2 OK
#93 raw_stack: skb_load_bytes, invalid access 3 OK
#94 raw_stack: skb_load_bytes, invalid access 4 OK
#95 raw_stack: skb_load_bytes, invalid access 5 OK
#96 raw_stack: skb_load_bytes, invalid access 6 OK
#97 raw_stack: skb_load_bytes, large access OK
Summary: 98 PASSED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the zero initialization in the sample programs where appropriate.
Note that this is an optimization which is now possible, old programs
still doing the zero initialization are just fine as well. Also, make
sure we don't have padding issues when we don't memset() the entire
struct anymore.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>