Commit graph

108 commits

Author SHA1 Message Date
Daniel Borkmann
63ad3e9d66
bpf: add napi_id read access to __sk_buff
Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).

Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b35
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.

Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.

  [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf

Suggested-by: Eric Dumazet <edumazet@google.com>
Change-Id: I41744d03eb67dcef27b412f4dedba1b9c78c1d17
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:41 +03:00
Daniel Borkmann
5c9ec3757c
bpf: allow b/h/w/dw access for bpf's cb in ctx
When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:38 +03:00
Greg Kroah-Hartman
c830063283 This is the 4.9.209 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4a894ACgkQONu9yGCS
 aT5rLRAAu7SpHGDWU7SlH3ncUXauqLyQ/cMSLLyS1HJ7TYvHujJInV5U3yaz4LQH
 e5q+k5BgaPKSe3zN44D9hRhEbEysk19ekRQ5tkhRZ7a41hs31kXTz/QoH2D3+Ybg
 VCZVqyuthiDmQnVQwBHzUy9T2RhOIOop0y5I3GcaSj6fHw0ZQ4AOJdmxZzafLvpB
 obmDuRYNyhpxU1Ds+gYgcDXjPWsDyrNVEVuCeLVRY7wwLd41ZrKj9r655tBpHyJ+
 PrlMI6zNaZ4BMWcG/4bpTE5gUWqhHCxYi5kFgAh9yZmoPV9jSt9DZeggOASwSDTd
 qJLPpqlxDp4+/Kx3wZz2brKN6ZHxGvkpGOztcMEvUOOqet+l7yD8RRs5HmcYKaSg
 qMDNcFIBSN3jcnQdKyqkb+NkJ+6RXMueyk4Mn8f9C1lptDsfODP1hQnJbIPc0NZJ
 t7LTDPEVIxuLIAMXQI4Sg2yFEAD+dtaq3XjhMUlCz2SWLAeblvOcrn/Fm7Pb63/s
 fj9pq4mNyRmHatmsuL1/R4zLPbz3A1gQ8TbHCNIcxGzPlD5zPlM3/iBXXLKefGrE
 BgbFRhdjOCNuHnHVtdzgYH6N/VRF4dtWOT8Qz2btIT9mkm11W5A53BTAtwIYAXgo
 GmKOSajhxb25/tUZhc0WVmYf2jdvlM7ygaDfQdYE42G7AxnE67k=
 =cc6F
 -----END PGP SIGNATURE-----

Merge 4.9.209 into android-4.9-q

Changes in 4.9.209
	PM / devfreq: Don't fail devfreq_dev_release if not in list
	RDMA/cma: add missed unregister_pernet_subsys in init failure
	scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
	scsi: qla2xxx: Don't call qlt_async_event twice
	scsi: iscsi: qla4xxx: fix double free in probe
	scsi: libsas: stop discovering if oob mode is disconnected
	usb: gadget: fix wrong endpoint desc
	md: raid1: check rdev before reference in raid1_sync_request func
	s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
	s390/cpum_sf: Avoid SBD overflow condition in irq handler
	IB/mlx4: Follow mirror sequence of device add during device removal
	xen-blkback: prevent premature module unload
	xen/balloon: fix ballooned page accounting without hotplug enabled
	PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
	xfs: fix mount failure crash on invalid iclog memory access
	taskstats: fix data-race
	drm: limit to INT_MAX in create_blob ioctl
	Revert "perf report: Add warning when libunwind not compiled in"
	ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
	MIPS: Avoid VDSO ABI breakage due to global register variable
	mm/zsmalloc.c: fix the migrated zspage statistics.
	memcg: account security cred as well to kmemcg
	locks: print unsigned ino in /proc/locks
	dmaengine: Fix access to uninitialized dma_slave_caps
	compat_ioctl: block: handle Persistent Reservations
	ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
	ata: ahci_brcm: Allow optional reset controller to be used
	ata: ahci_brcm: Fix AHCI resources management
	gpiolib: fix up emulated open drain outputs
	tracing: Have the histogram compare functions convert to u64 first
	ALSA: cs4236: fix error return comparison of an unsigned integer
	ftrace: Avoid potential division by zero in function profiler
	arm64: Revert support for execute-only user mappings
	PM / devfreq: Check NULL governor in available_governors_show
	nfsd4: fix up replay_matches_cache()
	xfs: don't check for AG deadlock for realtime files in bunmapi
	Bluetooth: btusb: fix PM leak in error case of setup
	Bluetooth: delete a stray unlock
	Bluetooth: Fix memory leak in hci_connect_le_scan
	media: flexcop-usb: ensure -EIO is returned on error condition
	regulator: ab8500: Remove AB8505 USB regulator
	media: usb: fix memory leak in af9005_identify_state
	tty: serial: msm_serial: Fix lockup for sysrq and oops
	fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
	drm/mst: Fix MST sideband up-reply failure handling
	powerpc/pseries/hvconsole: Fix stack overread via udbg
	rxrpc: Fix possible NULL pointer access in ICMP handling
	ath9k_htc: Modify byte order for an error message
	ath9k_htc: Discard undersized packets
	net: add annotations on hh->hh_len lockless accesses
	s390/smp: fix physical to logical CPU map for SMT
	xen/blkback: Avoid unmapping unmapped grant pages
	locking/x86: Remove the unused atomic_inc_short() methd
	pstore/ram: Write new dumps to start of recycled zones
	locking/spinlock/debug: Fix various data races
	netfilter: ctnetlink: netns exit must wait for callbacks
	efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
	efi/gop: Return EFI_SUCCESS if a usable GOP was found
	efi/gop: Fix memory leak in __gop_query32/64()
	ARM: vexpress: Set-up shared OPP table instead of individual for each CPU
	netfilter: uapi: Avoid undefined left-shift in xt_sctp.h
	spi: spi-cavium-thunderx: Add missing pci_release_regions()
	ARM: dts: am437x-gp/epos-evm: fix panel compatible
	samples: bpf: Replace symbol compare of trace_event
	powerpc: Ensure that swiotlb buffer is allocated from low memory
	bnx2x: Do not handle requests from VFs after parity
	bnx2x: Fix logic to get total no. of PFs per engine
	net: usb: lan78xx: Fix error message format specifier
	rfkill: Fix incorrect check to avoid NULL pointer dereference
	ASoC: wm8962: fix lambda value
	regulator: rn5t618: fix module aliases
	kconfig: don't crash on NULL expressions in expr_eq()
	perf/x86/intel: Fix PT PMI handling
	net: stmmac: RX buffer size must be 16 byte aligned
	block: fix memleak when __blk_rq_map_user_iov() is failed
	parisc: Fix compiler warnings in debug_core.c
	llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
	macvlan: do not assume mac_header is set in macvlan_broadcast()
	net: stmmac: dwmac-sunxi: Allow all RGMII modes
	net: usb: lan78xx: fix possible skb leak
	pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
	sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY
	tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
	vxlan: fix tos value before xmit
	vlan: vlan_changelink() should propagate errors
	net: sch_prio: When ungrafting, replace with FIFO
	vlan: fix memory leak in vlan_dev_set_egress_priority
	USB: core: fix check for duplicate endpoints
	USB: serial: option: add Telit ME910G1 0x110a composition
	Linux 4.9.209

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I37562d713bddce2ead7934800c3f9e231884c43f
2020-01-12 12:14:52 +01:00
Daniel T. Lee
666ce1753c samples: bpf: Replace symbol compare of trace_event
[ Upstream commit bba1b2a890253528c45aa66cf856f289a215bfbc ]

Previously, when this sample is added, commit 1c47910ef8
("samples/bpf: add perf_event+bpf example"), a symbol 'sys_read' and
'sys_write' has been used without no prefixes. But currently there are
no exact symbols with these under kallsyms and this leads to failure.

This commit changes exact compare to substring compare to keep compatible
with exact symbol or prefixed symbol.

Fixes: 1c47910ef8 ("samples/bpf: add perf_event+bpf example")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191205080114.19766-2-danieltimlee@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 11:24:22 +01:00
Greg Kroah-Hartman
83cf3cfdb2 This is the 4.9.186 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl00DuEACgkQONu9yGCS
 aT5SfA/8Cb+dy+hYcmzuWvhuifNMXgyMVFCoyRbgpM8XC0+kRz713NoJ6qeuCzVy
 DeHYL87OaPCdQiLM/G+wXnGeGpr/ZmhKwCJbseayFXlMDX4qrN3PxyYmDdBLIgIz
 4entWkXY2ZlwpOMPdLHAKZUUzSeMLmY9yhIjli6myu2cA7AKVqbPmvKf0g8dSOcL
 lSTVPWZvA29sCA4yQ3MbdQWK92f9Dhfu/egFC41jHFhp5l8a2r0cIPKJykWI9STn
 CQb81SsYaQD/mjsw8hsw9VGDDzKzSsO1xPkAEa3WNRkIOp4ANeiz0cDDA9bW/vKZ
 2gbu/VXezjK4EuuCSeWtKf+Wev6RTTVV7SFez7ppE361zYt3EPIudp0WuXSrvXqm
 IUlyR30Q/dAov5Y6ZsDYiV5q35IuXxFVeMJto1SDA/5s+u+IxHZbsFREDIDATAMJ
 V70slGD9tkavsb+XgppSXhhBYPxCCqlXw8FunJc4xVR9/LEmStcGUFnP9/UUlQog
 O+Uvolckn/ljzxYzYZ5bbSRga1CCfAxaCNv8zx6GLPrgE7civ9IaJIFBbjeGR1Yn
 zq9k3klDhACmWFxRCr2y/cGSkaR3+QNXH43T7aagj5himYiIdoJMr2+Fx2lPOGwD
 fPK/wrSIIXPgQFtqJolGOw5uZQQDv9HKWOdy9P/SVlo4bPsRQH0=
 =We9z
 -----END PGP SIGNATURE-----

Merge 4.9.186 into android-4.9-q

Changes in 4.9.186
	crypto: talitos - rename alternative AEAD algos.
	Input: elantech - enable middle button support on 2 ThinkPads
	samples, bpf: fix to change the buffer size for read()
	staging:iio:ad7150: fix threshold mode config bit
	mac80211: mesh: fix RCU warning
	mac80211: free peer keys before vif down in mesh
	mwifiex: Fix possible buffer overflows at parsing bss descriptor
	netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments
	netfilter: ipv6: nf_defrag: accept duplicate fragments again
	dt-bindings: can: mcp251x: add mcp25625 support
	can: mcp251x: add support for mcp25625
	Input: imx_keypad - make sure keyboard can always wake up system
	KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy
	mlxsw: spectrum: Disallow prio-tagged packets when PVID is removed
	ARM: davinci: da850-evm: call regulator_has_full_constraints()
	ARM: davinci: da8xx: specify dma_coherent_mask for lcdc
	mac80211: only warn once on chanctx_conf being NULL
	md: fix for divide error in status_resync
	bnx2x: Check if transceiver implements DDM before access
	ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
	net :sunrpc :clnt :Fix xps refcount imbalance on the error path
	udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
	x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
	x86/tls: Fix possible spectre-v1 in do_get_thread_area()
	mwifiex: Abort at too short BSS descriptor element
	mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies()
	fscrypt: don't set policy for a dead directory
	mwifiex: Don't abort on small, spec-compliant vendor IEs
	USB: serial: ftdi_sio: add ID for isodebug v1
	USB: serial: option: add support for GosunCn ME3630 RNDIS mode
	Revert "serial: 8250: Don't service RX FIFO if interrupts are disabled"
	p54usb: Fix race between disconnect and firmware loading
	usb: gadget: ether: Fix race between gether_disconnect and rx_submit
	usb: renesas_usbhs: add a workaround for a race condition of workqueue
	staging: comedi: dt282x: fix a null pointer deref on interrupt
	staging: comedi: amplc_pci230: fix null pointer deref on interrupt
	carl9170: fix misuse of device driver API
	VMCI: Fix integer overflow in VMCI handle arrays
	MIPS: Remove superfluous check for __linux__
	Revert "e1000e: fix cyclic resets at link up with active tx"
	e1000e: start network tx queue only when link is up
	nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
	arm64: crypto: remove accidentally backported files
	perf/core: Fix perf_sample_regs_user() mm check
	ARM: omap2: remove incorrect __init annotation
	be2net: fix link failure after ethtool offline test
	ppp: mppe: Add softdep to arc4
	sis900: fix TX completion
	ARM: dts: imx6ul: fix PWM[1-4] interrupts
	dm verity: use message limit for data block corruption message
	ARC: hide unused function unw_hdr_alloc
	s390: fix stfle zero padding
	s390/qdio: (re-)initialize tiqdio list entries
	s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
	Linux 4.9.186

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-22 15:07:58 +02:00
Chang-Hsien Tsai
95a21e7829 samples, bpf: fix to change the buffer size for read()
[ Upstream commit f7c2d64bac1be2ff32f8e4f500c6e5429c1003e0 ]

If the trace for read is larger than 4096, the return
value sz will be 4096. This results in off-by-one error
on buf:

    static char buf[4096];
    ssize_t sz;

    sz = read(trace_fd, buf, sizeof(buf));
    if (sz > 0) {
        buf[sz] = 0;
        puts(buf);
    }

Signed-off-by: Chang-Hsien Tsai <luke.tw@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-21 09:05:51 +02:00
Greg Kroah-Hartman
6a1b592354 This is the 4.9.124 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlt/6CsACgkQONu9yGCS
 aT7KOw//boywfZ/3pWvs5bp6MsNBrYxa9mzvv7MDQFj10nibTdwY+XFf7O370j3q
 Udt3A3V4a57KnploVglMSv4452LETDhMFdlQlxVTMcWAISiZoviVemDTi6GoRCwI
 PJg30COP3kE4u51O5i4sGDrff9mRiUPH2XR4JP27CY1aYbjE1IYoOTfB7jUSZhNn
 nzghkYLuxIpXc3GlNiWwXSxjRoDz9hw/0jaY4oZC9CRGi/Z/YBqvXLvj9dMO/2R2
 0R1ftiI/CEpuQ10XZ9z3h2WlgiWz9Pj183qBMwCUVR0IDVRo+A6lN1Zz7X+eG0no
 +CsDAhwfxu2BmDZ4eTDCgcKTk1LDmSVAo3cOoIdW+UkNbeoz99gFivy+jhEyKwQA
 KYfMciX31qmrbVfqGT9mZTsDgrYnLCqOmRnhIoAbA3Tb/4gzO6pQXt08eGK4l+LI
 u2Yv4L3BiNtZlRPBZYaibtnAAWOPZEuMIf/cRxXJ4//AcWsLirS1Dy3Z70z7J6ew
 jkxtDf2F34YYrj1QGRAKfELiZJOJYX7Ig81y80M4tBD2sjRfb29z7QEK5XbDrIID
 OJ8Kh57YrdnkGBOFbG8Yvfy4QV1js8VsReasK1pgad8iL8suBYPTPQwmUztGWXz5
 AQsU+aQxNYicTtsRIfEN4Tr1Nb/2IqfShC+lwFpcagMvjzjDT/Q=
 =GH5x
 -----END PGP SIGNATURE-----

Merge 4.9.124 into android-4.9

Changes in 4.9.124
	x86/entry/64: Remove %ebx handling from error_entry/exit
	ARC: Explicitly add -mmedium-calls to CFLAGS
	usb: dwc3: of-simple: fix use-after-free on remove
	netfilter: ipv6: nf_defrag: reduce struct net memory waste
	selftests: pstore: return Kselftest Skip code for skipped tests
	selftests: static_keys: return Kselftest Skip code for skipped tests
	selftests: user: return Kselftest Skip code for skipped tests
	selftests: zram: return Kselftest Skip code for skipped tests
	selftests: sync: add config fragment for testing sync framework
	ARM: dts: NSP: Fix i2c controller interrupt type
	ARM: dts: NSP: Fix PCIe controllers interrupt types
	ARM: dts: Cygnus: Fix I2C controller interrupt type
	ARM: dts: Cygnus: Fix PCIe controller interrupt type
	arm64: dts: ns2: Fix I2C controller interrupt type
	drm: mali-dp: Enable Global SE interrupts mask for DP500
	IB/rxe: Fix missing completion for mem_reg work requests
	libahci: Fix possible Spectre-v1 pmp indexing in ahci_led_store()
	usb: dwc2: fix isoc split in transfer with no data
	usb: gadget: composite: fix delayed_status race condition when set_interface
	usb: gadget: dwc2: fix memory leak in gadget_init()
	xen: add error handling for xenbus_printf
	scsi: xen-scsifront: add error handling for xenbus_printf
	xen/scsiback: add error handling for xenbus_printf
	arm64: make secondary_start_kernel() notrace
	qed: Add sanity check for SIMD fastpath handler.
	enic: initialize enic->rfs_h.lock in enic_probe
	net: hamradio: use eth_broadcast_addr
	net: propagate dev_get_valid_name return code
	net: stmmac: socfpga: add additional ocp reset line for Stratix10
	nvmet: reset keep alive timer in controller enable
	ARC: Enable machine_desc->init_per_cpu for !CONFIG_SMP
	net: davinci_emac: match the mdio device against its compatible if possible
	KVM: arm/arm64: Drop resource size check for GICV window
	locking/lockdep: Do not record IRQ state within lockdep code
	ipv6: mcast: fix unsolicited report interval after receiving querys
	Smack: Mark inode instant in smack_task_to_inode
	batman-adv: Fix bat_ogm_iv best gw refcnt after netlink dump
	batman-adv: Fix bat_v best gw refcnt after netlink dump
	cxgb4: when disabling dcb set txq dcb priority to 0
	iio: pressure: bmp280: fix relative humidity unit
	brcmfmac: stop watchdog before detach and free everything
	ARM: dts: am437x: make edt-ft5x06 a wakeup source
	ALSA: seq: Fix UBSAN warning at SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT ioctl
	usb: xhci: remove the code build warning
	usb: xhci: increase CRS timeout value
	NFC: pn533: Fix wrong GFP flag usage
	perf test session topology: Fix test on s390
	perf report powerpc: Fix crash if callchain is empty
	perf bench: Fix numa report output code
	netfilter: nf_log: fix uninit read in nf_log_proc_dostring
	ceph: fix dentry leak in splice_dentry()
	selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
	selftests/x86/sigreturn: Do minor cleanups
	ARM: dts: da850: Fix interrups property for gpio
	dmaengine: pl330: report BURST residue granularity
	dmaengine: k3dma: Off by one in k3_of_dma_simple_xlate()
	md/raid10: fix that replacement cannot complete recovery after reassemble
	nl80211: relax ht operation checks for mesh
	drm/exynos: gsc: Fix support for NV16/61, YUV420/YVU420 and YUV422 modes
	drm/exynos: decon5433: Fix per-plane global alpha for XRGB modes
	drm/exynos: decon5433: Fix WINCONx reset value
	bpf, s390: fix potential memleak when later bpf_jit_prog fails
	PCI: xilinx: Add missing of_node_put()
	PCI: xilinx-nwl: Add missing of_node_put()
	bnx2x: Fix receiving tx-timeout in error or recovery state.
	acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value
	m68k: fix "bad page state" oops on ColdFire boot
	objtool: Support GCC 8 '-fnoreorder-functions'
	ipvlan: call dev_change_flags when ipvlan mode is reset
	HID: wacom: Correct touch maximum XY of 2nd-gen Intuos
	ARM: imx_v6_v7_defconfig: Select ULPI support
	ARM: imx_v4_v5_defconfig: Select ULPI support
	tracing: Use __printf markup to silence compiler
	kasan: fix shadow_size calculation error in kasan_module_alloc
	smsc75xx: Add workaround for gigabit link up hardware errata.
	samples/bpf: add missing <linux/if_vlan.h>
	samples/bpf: Check the error of write() and read()
	ieee802154: 6lowpan: set IFLA_LINK
	netfilter: x_tables: set module owner for icmp(6) matches
	ipv6: make ipv6_renew_options() interrupt/kernel safe
	net: qrtr: Broadcast messages only from control port
	sh_eth: fix invalid context bug while calling auto-negotiation by ethtool
	sh_eth: fix invalid context bug while changing link options by ethtool
	ravb: fix invalid context bug while calling auto-negotiation by ethtool
	ravb: fix invalid context bug while changing link options by ethtool
	ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
	net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used
	ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem
	ieee802154: at86rf230: use __func__ macro for debug messages
	ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem
	drm/armada: fix colorkey mode property
	netfilter: nf_conntrack: Fix possible possible crash on module loading.
	ARC: Improve cmpxchg syscall implementation
	bnxt_en: Always set output parameters in bnxt_get_max_rings().
	bnxt_en: Fix for system hang if request_irq fails
	perf llvm-utils: Remove bashism from kernel include fetch script
	nfit: fix unchecked dereference in acpi_nfit_ctl
	RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
	ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot
	ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores
	ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
	ixgbe: Be more careful when modifying MAC filters
	tools: build: Use HOSTLDFLAGS with fixdep
	packet: reset network header if packet shorter than ll reserved space
	qlogic: check kstrtoul() for errors
	tcp: remove DELAYED ACK events in DCTCP
	pinctrl: nsp: off by ones in nsp_pinmux_enable()
	pinctrl: nsp: Fix potential NULL dereference
	drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply()
	net/ethernet/freescale/fman: fix cross-build error
	net: usb: rtl8150: demote allmulti message to dev_dbg()
	PCI: OF: Fix I/O space page leak
	PCI: versatile: Fix I/O space page leak
	net: qca_spi: Avoid packet drop during initial sync
	net: qca_spi: Make sure the QCA7000 reset is triggered
	net: qca_spi: Fix log level if probe fails
	tcp: identify cryptic messages as TCP seq # bugs
	KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
	ext4: fix spectre gadget in ext4_mb_regular_allocator()
	parisc: Remove ordered stores from syscall.S
	xfrm_user: prevent leaking 2 bytes of kernel memory
	netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior state
	packet: refine ring v3 block size test to hold one frame
	parisc: Remove unnecessary barriers from spinlock.h
	PCI: hotplug: Don't leak pci_slot on registration failure
	PCI: Skip MPS logic for Virtual Functions (VFs)
	PCI: pciehp: Fix use-after-free on unplug
	PCI: pciehp: Fix unprotected list iteration in IRQ handler
	i2c: imx: Fix race condition in dma read
	reiserfs: fix broken xattr handling (heap corruption, bad retval)
	Linux 4.9.124

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-08-24 13:23:30 +02:00
Taeung Song
fbfd753aeb samples/bpf: Check the error of write() and read()
[ Upstream commit 02a2f000a3629274bfad60bfc4de9edec49e63e7 ]

test_task_rename() and test_urandom_read()
can be failed during write() and read(),
So check the result of them.

Reviewed-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24 13:12:36 +02:00
Taeung Song
f4bc80f5a5 samples/bpf: add missing <linux/if_vlan.h>
[ Upstream commit 4d5d33a085335ef469c9a87792bcaaaa8e64d8c4 ]

This fixes build error regarding redefinition:

    CLANG-bpf  samples/bpf/parse_varlen.o
  samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr'
  struct vlan_hdr {
         ^
  ./include/linux/if_vlan.h:38:8: note: previous definition is here

So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24 13:12:35 +02:00
Alexei Starovoitov
1ee2b4b803 BACKPORT: bpf: introduce BPF_F_ALLOW_OVERRIDE flag
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Fixes: Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28
       ("cgroup: add support for eBPF programs")
[AmitP: Refactored original patch for android-4.9 where libbpf sources
        are in samples/bpf/ and test_cgrp2_attach2, test_cgrp2_sock,
        and test_cgrp2_sock2 sample tests do not exist.]
(cherry picked from commit 7f677633379b4abb3281cdbe7e7006f049305c03)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-05-30 17:27:28 -07:00
Daniel Mack
c00662a587 UPSTREAM: samples: bpf: add userspace example for attaching eBPF programs to cgroups
Cherry-pick from commit d8c5b17f2bc0de09fbbfa14d90e8168163a579e7

Add a simple userpace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:

 * Create arraymap in kernel with 4 byte keys and 8 byte values

 * Load eBPF program

   The eBPF program accesses the map passed in to store two pieces of
   information. The number of invocations of the program, which maps
   to the number of packets received, is stored to key 0. Key 1 is
   incremented on each iteration by the number of bytes stored in
   the skb.

 * Detach any eBPF program previously attached to the cgroup

 * Attach the new program to the cgroup using BPF_PROG_ATTACH

 * Once a second, read map[0] and map[1] to see how many bytes and
   packets were seen on any socket of tasks in the given cgroup.

The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.

libbpf gained two new wrappers for the new syscall commands.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: I011436a755abd62050edd22e47995c166a0bd8a2
2017-05-22 15:30:58 -07:00
Yonghong Song
7bca0a9702 bpf: enhance verifier to understand stack pointer arithmetic
[ Upstream commit 332270fdc8b6fba07d059a9ad44df9e1a2ad4529 ]

llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.

Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:00:20 +02:00
Michael Holzheu
2dbb4c05d0 bpf/samples: Fix PT_REGS_IP on s390x and use it
The files "sampleip_kern.c" and "trace_event_kern.c" directly access
"ctx->regs.ip" which is not available on s390x. Fix this and use the
PT_REGS_IP() macro instead.

Also fix the macro for s390x and use "psw.addr" from "pt_regs".

Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:26:46 -05:00
Martin KaFai Lau
90e02896f1 bpf: Add test for bpf_redirect to ipip/ip6tnl
The test creates two netns, ns1 and ns2.  The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.

    ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)

The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.

The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].

At ns1, the VIPs are routed _via_ the host.

At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl.  The test is configured in a way so
that both ingress and egress can be tested.

At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified.  The return path is routed to the dev ipip/ip6tnl.

During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).

Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12 23:38:07 -05:00
Daniel Borkmann
96a8eb1eee bpf: fix samples to add fake KBUILD_MODNAME
Some of the sample files are causing issues when they are loaded with tc
and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
file as program/map/etc sections are not present, which can be easily
spotted with readelf(1).

Currently, BPF samples are including some of the kernel headers and mid
term we should change them to refrain from this, really. When dynamic
debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
is easily overlooked in the build as clang spills this along with other
noisy warnings from various header includes, and llc still generates an
ELF file with mentioned characteristics. For just playing around with BPF
examples, this can be a bit of a hurdle to take.

Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
done in xdp*_kern samples already.

Fixes: 65d472fb00 ("samples/bpf: add 'pointer to packet' tests")
Fixes: 6afb1e28b8 ("samples/bpf: Add tunnel set/get tests.")
Fixes: a3f7461734 ("cgroup: bpf: Add an example to do cgroup checking in BPF")
Reported-by: Chandrasekar Kannan <ckannan@console.to>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 14:46:12 -04:00
Josef Bacik
484611357c bpf: allow access into map value arrays
Suppose you have a map array value that is something like this

struct foo {
	unsigned iter;
	int array[SOME_CONSTANT];
};

You can easily insert this into an array, but you cannot modify the contents of
foo->array[] after the fact.  This is because we have no way to verify we won't
go off the end of the array at verification time.  This patch provides a start
for this work.  We accomplish this by keeping track of a minimum and maximum
value a register could be while we're checking the code.  Then at the time we
try to do an access into a MAP_VALUE we verify that the maximum offset into that
region is a valid access into that memory region.  So in practice, code such as
this

unsigned index = 0;

if (foo->iter >= SOME_CONSTANT)
	foo->iter = index;
else
	index = foo->iter++;
foo->array[index] = bar;

would be allowed, as we can verify that index will always be between 0 and
SOME_CONSTANT-1.  If you wish to use signed values you'll have to have an extra
check to make sure the index isn't less than 0, or do something like index %=
SOME_CONSTANT.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-29 01:35:35 -04:00
Naveen N. Rao
973d94d8a8 bpf samples: update tracex5 sample to use __seccomp_filter
seccomp_phase1() does not exist anymore. Instead, update sample to use
__seccomp_filter(). While at it, set max locked memory to unlimited.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 03:48:58 -04:00
Naveen N. Rao
2b064fff85 bpf samples: fix compiler errors with sockex2 and sockex3
These samples fail to compile as 'struct flow_keys' conflicts with
definition in net/flow_dissector.h. Fix the same by renaming the
structure used in the sample.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 03:48:58 -04:00
Daniel Borkmann
7d95b0ab5b bpf: add test cases for direct packet access
Add couple of test cases for direct write and the negative size issue, and
also adjust the direct packet access test4 since it asserts that writes are
not possible, but since we've just added support for writes, we need to
invert the verdict to ACCEPT, of course. Summary: 133 PASSED, 0 FAILED.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 23:32:11 -04:00
Alexei Starovoitov
173ca26e9b samples/bpf: add comprehensive ipip, ipip6, ip6ip6 test
the test creates 3 namespaces with veth connected via bridge.
First two namespaces simulate two different hosts with the same
IPv4 and IPv6 addresses configured on the tunnel interface and they
communicate with outside world via standard tunnels.
Third namespace creates collect_md tunnel that is driven by BPF
program which selects different remote host (either first or
second namespace) based on tcp dest port number while tcp dst
ip is the same.
This scenario is rough approximation of load balancer use case.
The tests check both traditional tunnel configuration and collect_md mode.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 10:13:07 -04:00
Alexei Starovoitov
a1c82704d1 samples/bpf: extend test_tunnel_bpf.sh with IPIP test
extend existing tests for vxlan, geneve, gre to include IPIP tunnel.
It tests both traditional tunnel configuration and
dynamic via bpf helpers.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 10:13:07 -04:00
Daniel Borkmann
2d2be8cab2 bpf: fix range propagation on direct packet access
LLVM can generate code that tests for direct packet access via
skb->data/data_end in a way that currently gets rejected by the
verifier, example:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)

The reason why this gets rejected despite a proper test is that we
currently call find_good_pkt_pointers() only in case where we detect
tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
derived, for example, from a register of type pkt(id=Y,off=0,r=0)
pointing to skb->data. find_good_pkt_pointers() then fills the range
in the current branch to pkt(id=Y,off=0,r=Z) on success.

For above case, we need to extend that to recognize pkt_end >= rX
pattern and mark the other branch that is taken on success with the
appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
only two practical options to test for from what LLVM could have
generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
that we would need to take into account as well.

After the fix:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  29: (bf) r1 = r8
  30: (25) if r8 > 0x3c goto pc+47
   R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
   R9=pkt(id=0,off=0,r=54) R10=fp
  31: (b7) r1 = 1
  [...]

Verifier test cases are also added in this work, one that demonstrates
the mentioned example here and one that tries a bad packet access for
the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
with both test variants (>, >=).

Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:28:37 -07:00
Brendan Gregg
72874418e4 samples/bpf: add sampleip example
sample instruction pointer and frequency count in a BPF map

Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:45 -07:00
Alexei Starovoitov
1c47910ef8 samples/bpf: add perf_event+bpf example
The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++
It's primary purpose to check that key bpf helpers like map lookup, update,
get_stackid, trace_printk and ctx access are all working.
It checks:
- PERF_COUNT_HW_CPU_CYCLES on all cpus
- PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
- PERF_COUNT_SW_CPU_CLOCK on all cpus
- PERF_COUNT_SW_CPU_CLOCK for current process

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:45 -07:00
William Tu
6afb1e28b8 samples/bpf: Add tunnel set/get tests.
The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE.  A native
tunnel device is created in a namespace to interact with a lwtunnel
device out of the namespace, with metadata enabled.  The bpf_skb_set_*
program is attached to tc egress and bpf_skb_get_* is attached to egress
qdisc.  A ping between two tunnels is used to verify correctness and
the result of bpf_skb_get_* printed by bpf_trace_printk.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 22:42:44 -07:00
David S. Miller
60747ef4d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes for both merge conflicts.

Resolution work done by Stephen Rothwell was used
as a reference.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 01:17:32 -04:00
Aaron Yue
1633ac0a2e samples/bpf: add verifier tests for the helper access to the packet
test various corner cases of the helper function access to the packet
via crafted XDP programs.

Signed-off-by: Aaron Yue <haoxuany@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 21:56:18 -07:00
Daniel Borkmann
747ea55e4f bpf: fix bpf_skb_in_cgroup helper naming
While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.

Tejun says:

  So, I think in_cgroup should mean that the object is in that
  particular cgroup while under_cgroup in the subhierarchy of that
  cgroup. Let's rename the other subhierarchy test to under too. I
  think that'd be a lot less confusing going forward.

  [...]

  It's more intuitive and gives us the room to implement the real
  "in" test if ever necessary in the future.

Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.

Fixes: 4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-08-12 21:53:33 -07:00
Sargun Dhillon
9e6e60ecbd samples/bpf: Add test_current_task_under_cgroup test
This test has a BPF program which writes the last known pid to call the
sync syscall within a given cgroup to a map.

The user mode program creates its own mount namespace, and mounts the
cgroupsv2  hierarchy in there, as on all current test systems
(Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default.
Once it does this, it proceeds to test.

The test checks for positive and negative condition. It ensures that
when it's part of a given cgroup, its pid is captured in the map,
and that when it leaves the cgroup, this doesn't happen.

It populate a cgroups arraymap prior to execution in userspace. This means
that the program must be run in the same cgroups namespace as the programs
that are being traced.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 21:49:42 -07:00
Adam Barth
05b8ad25bc samples/bpf: fix bpf_perf_event_output prototype
The commit 555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")
started using 20 of initially reserved upper 32-bits of 'flags' argument
in bpf_perf_event_output(). Adjust corresponding prototype in samples/bpf/bpf_helpers.h

Signed-off-by: Adam Barth <arb@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10 23:12:31 -07:00
Alexei Starovoitov
ba0cc3c153 samples/bpf: add bpf_map_update_elem() tests
increase test coverage to check previously missing 'update when full'

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06 20:49:19 -04:00
Sargun Dhillon
cf9b1199de samples/bpf: Add test/example of using bpf_probe_write_user bpf helper
This example shows using a kprobe to act as a dnat mechanism to divert
traffic for arbitrary endpoints. It rewrite the arguments to a syscall
while they're still in userspace, and before the syscall has a chance
to copy the argument into kernel space.

Although this is an example, it also acts as a test because the mapped
address is 255.255.255.255:555 -> real address, and that's not a legal
address to connect to. If the helper is broken, the example will fail
on the intermediate steps, as well as the final step to verify the
rewrite of userspace memory succeeded.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 18:07:48 -07:00
Sargun Dhillon
96ae522795 bpf: Add bpf_probe_write_user BPF helper to be called in tracers
This allows user memory to be written to during the course of a kprobe.
It shouldn't be used to implement any kind of security mechanism
because of TOC-TOU attacks, but rather to debug, divert, and
manipulate execution of semi-cooperative processes.

Although it uses probe_kernel_write, we limit the address space
the probe can write into by checking the space with access_ok.
We do this as opposed to calling copy_to_user directly, in order
to avoid sleeping. In addition we ensure the threads's current fs
/ segment is USER_DS and the thread isn't exiting nor a kernel thread.

Given this feature is meant for experiments, and it has a risk of
crashing the system, and running programs, we print a warning on
when a proglet that attempts to use this helper is installed,
along with the pid and process name.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 18:07:48 -07:00
Brenden Blanco
d9094bda5c bpf: make xdp sample variable names more meaningful
The naming choice of index is not terribly descriptive, and dropcnt is
in fact incorrect for xdp2. Pick better names for these: ipproto and
rxcnt.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 22:07:24 -07:00
Brenden Blanco
764cbccef8 bpf: add sample for xdp forwarding and rewrite
Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.

Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:33 -07:00
Brenden Blanco
86af8b4191 Add sample for adding simple drop program to link
Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
hook of a link. With the drop-only program, observed single core rate is
~20Mpps.

Other tests were run, for instance without the dropcnt increment or
without reading from the packet header, the packet rate was mostly
unchanged.

$ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
proto 17:   20403027 drops/s

./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
Running... ctrl^C to stop
Device: eth4@0
Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
  5056638pps 2427Mb/sec (2427186240bps) errors: 0
Device: eth4@1
Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
  5133311pps 2463Mb/sec (2463989280bps) errors: 0
Device: eth4@2
Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
  5077431pps 2437Mb/sec (2437166880bps) errors: 0
Device: eth4@3
Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
  5043067pps 2420Mb/sec (2420672160bps) errors: 0

perf report --no-children:
 26.05%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.84%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  5.52%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.90%  swapper      [kernel.vmlinux]  [k] poll_idle
  4.14%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  2.78%  ksoftirqd/0  [kernel.vmlinux]  [k] __free_pages_ok
  2.57%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.51%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
  1.94%  ksoftirqd/0  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
  1.45%  swapper      [mlx4_en]         [k] mlx4_en_alloc_frags
  1.35%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
  1.33%  swapper      [kernel.vmlinux]  [k] intel_idle
  1.04%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5c5
  0.96%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c58d
  0.93%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6ee
  0.92%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6b9
  0.89%  ksoftirqd/0  [kernel.vmlinux]  [k] __alloc_pages_nodemask
  0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c686
  0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5d5
  0.78%  ksoftirqd/0  [mlx4_en]         [k] mlx4_alloc_pages.isra.23
  0.77%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5b4
  0.77%  ksoftirqd/0  [kernel.vmlinux]  [k] net_rx_action

machine specs:
 receiver - Intel E5-1630 v3 @ 3.70GHz
 sender - Intel E5645 @ 2.40GHz
 Mellanox ConnectX-3 @40G

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Martin KaFai Lau
a3f7461734 cgroup: bpf: Add an example to do cgroup checking in BPF
test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
pouplates/updates it with a cgroup2's backed fd and pins it to a
bpf-fs's file.  The pinned file can be loaded by tc and then used
by the bpf prog later.  This program can also update an existing pinned
array and it could be useful for debugging/testing purpose.

test_cgrp2_tc_kern.c:
A bpf prog which should be loaded by tc.  It is to demonstrate
the usage of bpf_skb_in_cgroup.

test_cgrp2_tc.sh:
A script that glues the test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together.  The idea is like:
1. Load the test_cgrp2_tc_kern.o by tc
2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
   with a cgroup fd
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
   dropped because of a match on the cgroup

Most of the lines in test_cgrp2_tc.sh is the boilerplate
to setup the cgroup/bpf-fs/net-devices/netns...etc.  It is
not bulletproof on errors but should work well enough and
give enough debug info if things did not go well.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:32:13 -04:00
William Tu
eb88d58559 samples/bpf: set max locked memory to ulimited
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25 12:03:46 -04:00
Alexei Starovoitov
883e44e4de samples/bpf: add verifier tests
add few tests for "pointer to packet" logic of the verifier

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
65d472fb00 samples/bpf: add 'pointer to packet' tests
parse_simple.c - packet parser exapmle with single length check that
filters out udp packets for port 9

parse_varlen.c - variable length parser that understand multiple vlan headers,
ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
The packet is parsed layer by layer with multitple length checks.

parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
Same functionality as parse_simple.

simple = 24.1Mpps per core
varlen = 22.7Mpps
ldabs  = 21.4Mpps

Parser with LD_ABS instructions is slower than full direct access parser
which does more packet accesses and checks.

These examples demonstrate the choice bpf program authors can make between
flexibility of the parser vs speed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
David S. Miller
cba6532100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv4/ip_gre.c

Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 00:52:29 -04:00
Jesper Dangaard Brouer
bdefbbf2ec samples/bpf: like LLC also verify and allow redefining CLANG command
Users are likely to manually compile both LLVM 'llc' and 'clang'
tools.  Thus, also allow redefining CLANG and verify command exist.

Makefile implementation wise, the target that verify the command have
been generalized.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:26:08 -04:00
Jesper Dangaard Brouer
b62a796c10 samples/bpf: allow make to be run from samples/bpf/ directory
It is not intuitive that 'make' must be run from the top level
directory with argument "samples/bpf/" to compile these eBPF samples.

Introduce a kbuild make file trick that allow make to be run from the
"samples/bpf/" directory itself.  It basically change to the top level
directory and call "make samples/bpf/" with the "/" slash after the
directory name.

Also add a clean target that only cleans this directory, by taking
advantage of the kbuild external module setting M=$PWD.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:33 -04:00
Jesper Dangaard Brouer
1c97566d51 samples/bpf: add a README file to get users started
Getting started with using examples in samples/bpf/ is not
straightforward.  There are several dependencies, and specific
versions of these dependencies.

Just compiling the example tool is also slightly obscure, e.g. one
need to call make like:

 make samples/bpf/

Do notice the "/" slash after the directory name.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:32 -04:00
Jesper Dangaard Brouer
7b01dd5793 samples/bpf: Makefile verify LLVM compiler avail and bpf target is supported
Make compiling samples/bpf more user friendly, by detecting if LLVM
compiler tool 'llc' is available, and also detect if the 'bpf' target
is available in this version of LLVM.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:32 -04:00
Jesper Dangaard Brouer
6ccfba75d3 samples/bpf: add back functionality to redefine LLC command
It is practical to be-able-to redefine the location of the LLVM
command 'llc', because not all distros have a LLVM version with bpf
target support.  Thus, it is sometimes required to compile LLVM from
source, and sometimes it is not desired to overwrite the distros
default LLVM version.

This feature was removed with 128d1514be ("samples/bpf: Use llc in
PATH, rather than a hardcoded value").

Add this features back. Note that it is possible to redefine the LLC
on the make command like:

 make samples/bpf/ LLC=~/git/llvm/build/bin/llc

Fixes: 128d1514be ("samples/bpf: Use llc in PATH, rather than a hardcoded value")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:32 -04:00
Alexei Starovoitov
569cc39d39 samples/bpf: fix trace_output example
llvm cannot always recognize memset as builtin function and optimize
it away, so just delete it. It was a leftover from testing
of bpf_perf_event_output() with large data structures.

Fixes: 39111695b1 ("samples: bpf: add bpf_perf_event_output example")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 17:29:45 -04:00
Daniel Borkmann
3f2050e20e bpf, samples: add test cases for raw stack
This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
verifier behaviour.

  [...]
  #84 raw_stack: no skb_load_bytes OK
  #85 raw_stack: skb_load_bytes, no init OK
  #86 raw_stack: skb_load_bytes, init OK
  #87 raw_stack: skb_load_bytes, spilled regs around bounds OK
  #88 raw_stack: skb_load_bytes, spilled regs corruption OK
  #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
  #90 raw_stack: skb_load_bytes, spilled regs + data OK
  #91 raw_stack: skb_load_bytes, invalid access 1 OK
  #92 raw_stack: skb_load_bytes, invalid access 2 OK
  #93 raw_stack: skb_load_bytes, invalid access 3 OK
  #94 raw_stack: skb_load_bytes, invalid access 4 OK
  #95 raw_stack: skb_load_bytes, invalid access 5 OK
  #96 raw_stack: skb_load_bytes, invalid access 6 OK
  #97 raw_stack: skb_load_bytes, large access OK
  Summary: 98 PASSED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 21:40:42 -04:00
Daniel Borkmann
02413cabd6 bpf, samples: don't zero data when not needed
Remove the zero initialization in the sample programs where appropriate.
Note that this is an optimization which is now possible, old programs
still doing the zero initialization are just fine as well. Also, make
sure we don't have padding issues when we don't memset() the entire
struct anymore.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 21:40:42 -04:00
David S. Miller
ae95d71261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00