Commit graph

86 commits

Author SHA1 Message Date
Al Viro
aa3b6e3d41
bpf: don't open-code memdup_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-09-25 16:54:39 +03:00
Martin KaFai Lau
8f9fb2d745
bpf: Allow selecting numa node during map creation
The current map creation API does not allow to provide the numa-node
preference.  The memory usually comes from where the map-creation-process
is running.  The performance is not ideal if the bpf_prog is known to
always run in a numa node different from the map-creation-process.

One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps).  Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:

[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]

># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec  #<<<

After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<

This patch adds one field, numa_node, to the bpf_attr.  Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added.  The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.

Numa node selection is not supported for percpu map.

This patch does not change all the kmalloc.  F.e.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:39 +03:00
Johannes Berg
331272374b
bpf: remove struct bpf_map_type_list
There's no need to have struct bpf_map_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointer. Also
initialize this array statically to remove code needed
to initialize it.

In order to save duplicating the list, move it to the
types header file added by the previous patch and
include it in the same fashion.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:39 +03:00
Johannes Berg
d4587f5f73
bpf: remove struct bpf_prog_type_list
There's no need to have struct bpf_prog_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointer. Also
initialize this array statically to remove code needed
to initialize it.

In order to save duplicating the list, move it to a new
header file and include it in the places needing it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:39 +03:00
Martin KaFai Lau
0ab0efbf96
bpf: Add hash of maps support
This patch adds hash of maps support (hashmap->bpf_map).
BPF_MAP_TYPE_HASH_OF_MAPS is added.

A map-in-map contains a pointer to another map and lets call
this pointer 'inner_map_ptr'.

Notes on deleting inner_map_ptr from a hash map:

1. For BPF_F_NO_PREALLOC map-in-map, when deleting
   an inner_map_ptr, the htab_elem itself will go through
   a rcu grace period and the inner_map_ptr resides
   in the htab_elem.

2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC),
   when deleting an inner_map_ptr, the htab_elem may
   get reused immediately.  This situation is similar
   to the existing prealloc-ated use cases.

   However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls
   inner_map->ops->map_free(inner_map) which will go
   through a rcu grace period (i.e. all bpf_map's map_free
   currently goes through a rcu grace period).  Hence,
   the inner_map_ptr is still safe for the rcu reader side.

This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the
check_map_prealloc() in the verifier.  preallocation is a
must for BPF_PROG_TYPE_PERF_EVENT.  Hence, even we don't expect
heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map
is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using
map-in-map first.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:38 +03:00
Martin KaFai Lau
3bf58335ca
bpf: Add array of maps support
This patch adds a few helper funcs to enable map-in-map
support (i.e. outer_map->inner_map).  The first outer_map type
BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch.
The next patch will introduce a hash of maps type.

Any bpf map type can be acted as an inner_map.  The exception
is BPF_MAP_TYPE_PROG_ARRAY because the extra level of
indirection makes it harder to verify the owner_prog_type
and owner_jited.

Multi-level map-in-map is not supported (i.e. map->map is ok
but not map->map->map).

When adding an inner_map to an outer_map, it currently checks the
map_type, key_size, value_size, map_flags, max_entries and ops.
The verifier also uses those map's properties to do static analysis.
map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT
is using a preallocated hashtab for the inner_hash also.  ops and
max_entries are needed to generate inlined map-lookup instructions.
For simplicity reason, a simple '==' test is used for both map_flags
and max_entries.  The equality of ops is implied by the equality of
map_type.

During outer_map creation time, an inner_map_fd is needed to create an
outer_map.  However, the inner_map_fd's life time does not depend on the
outer_map.  The inner_map_fd is merely used to initialize
the inner_map_meta of the outer_map.

Also, for the outer_map:

* It allows element update and delete from syscall
* It allows element lookup from bpf_prog

The above is similar to the current fd_array pattern.

Change-Id: I9549de584b197524f58f2edef0d103645ac98d06
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:38 +03:00
Daniel Borkmann
de760e6d03
bpf: rework prog_digest into prog_tag
Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via
fdinfo/netlink") was recently discussed, partially due to
admittedly suboptimal name of "prog_digest" in combination
with sha1 hash usage, thus inevitably and rightfully concerns
about its security in terms of collision resistance were
raised with regards to use-cases.

The intended use cases are for debugging resp. introspection
only for providing a stable "tag" over the instruction sequence
that both kernel and user space can calculate independently.
It's not usable at all for making a security relevant decision.
So collisions where two different instruction sequences generate
the same tag can happen, but ideally at a rather low rate. The
"tag" will be dumped in hex and is short enough to introspect
in tracepoints or kallsyms output along with other data such
as stack trace, etc. Thus, this patch performs a rename into
prog_tag and truncates the tag to a short output (64 bits) to
make it obvious it's not collision-free.

Should in future a hash or facility be needed with a security
relevant focus, then we can think about requirements, constraints,
etc that would fit to that situation. For now, rework the exposed
parts for the current use cases as long as nothing has been
released yet. Tested on x86_64 and s390x.

Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink")
Change-Id: Iea6eb91ea03d6488cc3bbf662f5e301015f3433d
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:38 +03:00
Daniel Borkmann
e98c4a5653
bpf: dynamically allocate digest scratch buffer
Geert rightfully complained that 7bd509e311f4 ("bpf: add prog_digest
and expose it via fdinfo/netlink") added a too large allocation of
variable 'raw' from bss section, and should instead be done dynamically:

  # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2
  add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291)
  function                                     old     new   delta
  raw                                            -   32832  +32832
  [...]

Since this is only relevant during program creation path, which can be
considered slow-path anyway, lets allocate that dynamically and be not
implicitly dependent on verifier mutex. Move bpf_prog_calc_digest() at
the beginning of replace_map_fd_with_map_ptr() and also error handling
stays straight forward.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:37 +03:00
Mickaël Salaün
311e548897
bpf: Use u64_to_user_ptr()
Replace the custom u64_to_ptr() function with the u64_to_user_ptr()
macro.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:37 +03:00
Daniel Borkmann
e6004d7227
bpf: add prog_digest and expose it via fdinfo/netlink
When loading a BPF program via bpf(2), calculate the digest over
the program's instruction stream and store it in struct bpf_prog's
digest member. This is done at a point in time before any instructions
are rewritten by the verifier. Any unstable map file descriptor
number part of the imm field will be zeroed for the hash.

fdinfo example output for progs:

  # cat /proc/1590/fdinfo/5
  pos:          0
  flags:        02000002
  mnt_id:       11
  prog_type:    1
  prog_jited:   1
  prog_digest:  b27e8b06da22707513aa97363dfb11c7c3675d28
  memlock:      4096

When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded. Furthermore, this can also be
exposed through other means such as netlink in case of a tc cls/act
dump (or xdp in future), but also through tracepoints or other
facilities to identify the program. Other than that, the digest can
also serve as a base name for the work in progress kallsyms support
of programs. The digest doesn't depend/select the crypto layer, since
we need to keep dependencies to a minimum. iproute2 will get support
for this facility.

Change-Id: I660d02fb1eac1a8b42e4cd0248018d64ec07c34c
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:37 +03:00
Martin KaFai Lau
cbd6f3b2f1
bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH
Provide a LRU version of the existing BPF_MAP_TYPE_PERCPU_HASH

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:37 +03:00
Nguyễn Long
cb48ccb273
Revert "BACKPORT: bpf: Add file mode configuration into bpf maps"
This reverts commit 21a435494df37fab20c393a0bc6019ca73e10f03.
2024-09-25 16:54:37 +03:00
Daniel Borkmann
878fee3d4f
bpf: fix loading of BPF_MAXINSNS sized programs
General assumption is that single program can hold up to BPF_MAXINSNS,
that is, 4096 number of instructions. It is the case with cBPF and
that limit was carried over to eBPF. When recently testing digest, I
noticed that it's actually not possible to feed 4096 instructions
via bpf(2).

The check for > BPF_MAXINSNS was added back then to bpf_check() in
cbd3570086 ("bpf: verifier (add ability to receive verification log)").
However, 09756af468 ("bpf: expand BPF syscall with program load/unload")
added yet another check that comes before that into bpf_prog_load(),
but this time bails out already in case of >= BPF_MAXINSNS.

Fix it up and perform the check early in bpf_prog_load(), so we can drop
the second one in bpf_check(). It makes sense, because also a 0 insn
program is useless and we don't want to waste any resources doing work
up to bpf_check() point. The existing bpf(2) man page documents E2BIG
as the official error for such cases, so just stick with it as well.

Fixes: 09756af468 ("bpf: expand BPF syscall with program load/unload")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:37 +03:00
Alexei Starovoitov
1437a0899f
UPSTREAM: bpf: multi program support for cgroup+bpf
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.

The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
  the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
  that cgroup program gets run in addition to the program in this cgroup.

NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.

Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.

To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0)
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 121213201
Bug: 138317270
Test: build & boot cuttlefish
Change-Id: If17b11a773f73d45ea565a947fc1bf7e158db98d
2024-09-25 16:54:37 +03:00
David Ahern
aa1f9ab9cd
bpf: Add new cgroup attach type to enable sock modifications
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

Change-Id: I35b0000bd46da69088416ee91c65226c91753ca7
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:37 +03:00
Daniel Borkmann
c9864aa72b
bpf: add owner_prog_type and accounted mem to array map's fdinfo
Allow for checking the owner_prog_type of a program array map. In some
cases bpf(2) can return -EINVAL /after/ the verifier passed and did all
the rewrites of the bpf program.

The reason that lets us fail at this late stage is that program array
maps are incompatible. Allow users to inspect this earlier after they
got the map fd through BPF_OBJ_GET command. tc will get support for this.

Also, display how much we charged the map with regards to RLIMIT_MEMLOCK.

Change-Id: Ifd2de947739563d123c11d21d76fc71f2d8eb522
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:36 +03:00
David Ahern
805be7017d
bpf: Refactor cgroups code in prep for new type
Code move and rename only; no functional change intended.

[Nguyễn Long: Update to match "bpf: introduce BPF_F_ALLOW_OVERRIDE
flag"]

Change-Id: Idcab1159beaa5bbff8b097b0cad2e6edfda1bbb4
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-25 16:54:36 +03:00
Mustafa Gökmen
4fce632291
Revert "BACKPORT: bpf: multi program support for cgroup+bpf"
This reverts commit 148f111e98.
2024-09-25 16:54:36 +03:00
Mustafa Gökmen
7ae7f04d9f
Revert "UPSTREAM: bpf: Explicitly memset the bpf_attr structure"
This reverts commit 2fc19cf960.
2024-09-25 16:54:36 +03:00
Mustafa Gökmen
03c1559487
Revert "bpf: Add BPF_OBJ_GET_INFO_BY_FD"
This reverts commit 053633c9a8.
2024-09-25 16:54:36 +03:00
Mustafa Gökmen
b75007c01b
Revert "bpf: Discard unused bpf_prog_info members code"
This reverts commit 62a45e6fb4.
2024-09-25 16:54:36 +03:00
roynatech2544
62a45e6fb4 bpf: Discard unused bpf_prog_info members code
- Removes the need for a bunch of backports
2024-01-28 03:18:44 +03:00
Martin KaFai Lau
053633c9a8 bpf: Add BPF_OBJ_GET_INFO_BY_FD
A single BPF_OBJ_GET_INFO_BY_FD cmd is used to obtain the info
for both bpf_prog and bpf_map.  The kernel can figure out the
fd is associated with a bpf_prog or bpf_map.

The suggested struct bpf_prog_info and struct bpf_map_info are
not meant to be a complete list and it is not the goal of this patch.
New fields can be added in the future patch.

The focus of this patch is to create the interface,
BPF_OBJ_GET_INFO_BY_FD cmd for exposing the bpf_prog's and
bpf_map's info.

The obj's info, which will be extended (and get bigger) over time, is
separated from the bpf_attr to avoid bloating the bpf_attr.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-28 03:18:44 +03:00
Greg Kroah-Hartman
2fc19cf960 UPSTREAM: bpf: Explicitly memset the bpf_attr structure
For the bpf syscall, we are relying on the compiler to properly zero out
the bpf_attr union that we copy userspace data into. Unfortunately that
doesn't always work properly, padding and other oddities might not be
correctly zeroed, and in some tests odd things have been found when the
stack is pre-initialized to other values.

Fix this by explicitly memsetting the structure to 0 before using it.

Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: 1235490
Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com
(cherry picked from commit 8096f229421f7b22433775e928d506f0342e5907)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2dc28cd45024da5cc6861ff4a9b25fae389cc6d8
2020-03-23 18:59:38 +01:00
Alexei Starovoitov
148f111e98 BACKPORT: bpf: multi program support for cgroup+bpf
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.

The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
  the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
  that cgroup program gets run in addition to the program in this cgroup.

NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.

Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.

To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0)
Signed-off-by: Connor O'Brien <connoro@google.com>
Bug: 121213201
Bug: 138317270
Test: build & boot cuttlefish
Change-Id: I06b71c850b9f3e052b106abab7a4a3add012a3f8
2019-12-12 15:48:18 -08:00
Greg Kroah-Hartman
c462abbf77 This is the 4.9.99 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlryqEsACgkQONu9yGCS
 aT4a2xAAw8RIK5Yp2iWIM+yXGnkvt99eUqE5W4HMwwbbJUr3aeW5Uw4A0JOEA6lJ
 z7J+tu7qOTFj9H37Tk0GlOe0GmbfLsyfcxPWNDyII/0l38sgdkypOw/Um6uMaa0l
 o7yBKXTUb0ZKaUonIy7tz3UVh8h1OTtynUHxDY2zMu3zuZzpAccuQe9pPhP7p8QJ
 f3F8yvRlu69763eGPKC87FeX7rBSoQPAIg1PArR6aFWhQ6+NrE+VMkz0zSYqQuTQ
 5rlgUJxUmTGjeug5lhI1c+JN98NSzcLy6EnlIdaW10e83DsH9+jL9YnevC/RtUXc
 jY3kLGHB/InQLxbuHK+4F6NISGcYUOoyD4LNTyeV/v4H4nL81ftxVSBa6SxdQqiC
 YMrXDiRMm9X2aod0eMWCPERv5NtJbcQWYEKr+C702jWJEvr0ijfa7vamB2ClAD92
 ipvfLYaHpNutBOR8JgClwdBwokQXjznIy/PO32gjGYEBGHpxlHBpyP565umTgZcP
 bkTPBai14aWELrzgW3Xyd5fcgIH0znZ3Xi6Krx7RXBzgJEo7A92G4OeaqO4NZ9/B
 UBCOmdHLfbwQRbeJa3h+CqA6SB3DbJXLquUJuuB0tsFP9yvLDyoZfBx4xJuzTACn
 NdiPu7ik9q94pcT6b2P46VSChaQGmHZjaA1/quIO9iVVryeZNlM=
 =m2YN
 -----END PGP SIGNATURE-----

Merge 4.9.99 into android-4.9

Changes in 4.9.99
	perf/core: Fix the perf_cpu_time_max_percent check
	percpu: include linux/sched.h for cond_resched()
	bpf: map_get_next_key to return first key on NULL
	arm/arm64: KVM: Add PSCI version selection API
	crypto: talitos - fix IPsec cipher in length
	serial: imx: ensure UCR3 and UFCR are setup correctly
	USB: serial: option: Add support for Quectel EP06
	ALSA: pcm: Check PCM state at xfern compat ioctl
	ALSA: seq: Fix races at MIDI encoding in snd_virmidi_output_trigger()
	ALSA: aloop: Mark paused device as inactive
	ALSA: aloop: Add missing cable lock to ctl API callbacks
	tracepoint: Do not warn on ENOMEM
	Input: leds - fix out of bound access
	Input: atmel_mxt_ts - add touchpad button mapping for Samsung Chromebook Pro
	xfs: prevent creating negative-sized file via INSERT_RANGE
	RDMA/cxgb4: release hw resources on device removal
	RDMA/ucma: Allow resolving address w/o specifying source address
	RDMA/mlx5: Protect from shift operand overflow
	NET: usb: qmi_wwan: add support for ublox R410M PID 0x90b2
	IB/mlx5: Use unlimited rate when static rate is not supported
	IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
	drm/vmwgfx: Fix a buffer object leak
	drm/bridge: vga-dac: Fix edid memory leak
	test_firmware: fix setting old custom fw path back on exit, second try
	USB: serial: visor: handle potential invalid device configuration
	USB: Accept bulk endpoints with 1024-byte maxpacket
	USB: serial: option: reimplement interface masking
	USB: serial: option: adding support for ublox R410M
	usb: musb: host: fix potential NULL pointer dereference
	usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
	platform/x86: asus-wireless: Fix NULL pointer dereference
	s390/facilites: use stfle_fac_list array size for MAX_FACILITY_BIT
	Linux 4.9.99

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-09 12:28:25 +02:00
Teng Qin
fcbc8d0e7d bpf: map_get_next_key to return first key on NULL
commit 8fe45924387be6b5c1be59a7eb330790c61d5d10 upstream.

When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chenbo Feng <fengc@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-09 09:50:19 +02:00
Greg Kroah-Hartman
bb94f9d8f5 This is the 4.9.91 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlq7xT4ACgkQONu9yGCS
 aT54KBAAsw3cqSI3soZ1zjsjBlwrbH1pnv1M36yUik1Jllg03PM5cBqzPHb3aDJd
 zIl8p57Q4tPnsOH42S/yv6v9oFMoiOdZs7pOWvGVmb7PMe3waIPEkHhs4H6JEGO5
 6dregIdYm7nowsgZk/f7ZprLJ3YouIgRM+K7CD114Tt1Mnk/jt7s/6aiE2gY1R9m
 RMAH77DYJntFehazGucYdzxKsVJo5kPug+PoM3svR9A4kzl4W5WLQ3j8/Oz/M640
 9CaDUXxeA6FQm+4XCdiqaIj5KcY3+kSy+QEOVy59yOMp8TOY9Yk9L8oDdl5Jgx22
 WeiHnroNTbLGbSzH4tq1/DMBt9wT5FSyHDmuhjrWj5scW+PfCk3ALdFvpZD3E032
 sexY1gBTDPkozwSrDdNYXST/Gi3V+2gwBH94IrDlL93IixRnbsgIt8G6Xg9MZ//X
 DDGeVACKyZkUtRF7HXyHAMZ6vruj3QEUrn+pMflhdbF46jMTx4RimyT5P/NQiQNt
 4L0Z9XZpv6QUAJUmmYH8nexJMOTYXy3lpjpZOrNPw5cc2z4VveFma2kDamNDw8Jk
 sgJnDZOVuR70hcQUp5vmws/itlDLQBLRiwrKV28Dtid9kgW+mWzNpUM8ciUag8rf
 t5KoqQjAuooZH+eISoX4hOpCMNoLHKCCibKauWfDxj9tgyL61ho=
 =kgAh
 -----END PGP SIGNATURE-----

Merge 4.9.91 into android-4.9

Changes in 4.9.91
	MIPS: ralink: Remove ralink_halt()
	iio: st_pressure: st_accel: pass correct platform data to init
	ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit
	ALSA: aloop: Sync stale timer before release
	ALSA: aloop: Fix access to not-yet-ready substream via cable
	ALSA: hda/realtek - Always immediately update mute LED with pin VREF
	mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
	PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L
	ahci: Add PCI-id for the Highpoint Rocketraid 644L card
	clk: bcm2835: Fix ana->maskX definitions
	clk: bcm2835: Protect sections updating shared registers
	clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops
	Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174
	libata: fix length validation of ATAPI-relayed SCSI commands
	libata: remove WARN() for DMA or PIO command without data
	libata: don't try to pass through NCQ commands to non-NCQ devices
	libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
	libata: disable LPM for Crucial BX100 SSD 500GB drive
	libata: Enable queued TRIM for Samsung SSD 860
	libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
	libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
	libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
	nfsd: remove blocked locks on client teardown
	mm/vmalloc: add interfaces to free unmapped page table
	x86/mm: implement free pmd/pte page interfaces
	mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
	mm/thp: do not wait for lock_page() in deferred_split_scan()
	mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
	drm/vmwgfx: Fix a destoy-while-held mutex problem.
	drm/radeon: Don't turn off DP sink when disconnected
	drm: udl: Properly check framebuffer mmap offsets
	acpi, numa: fix pxm to online numa node associations
	ACPI / watchdog: Fix off-by-one error at resource assignment
	libnvdimm, {btt, blk}: do integrity setup before add_disk()
	brcmfmac: fix P2P_DEVICE ethernet address generation
	rtlwifi: rtl8723be: Fix loss of signal
	tracing: probeevent: Fix to support minus offset from symbol
	mtdchar: fix usage of mtd_ooblayout_ecc()
	mtd: nand: fsl_ifc: Fix nand waitfunc return value
	mtd: nand: fsl_ifc: Fix eccstat array overflow for IFC ver >= 2.0.0
	mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0
	staging: ncpfs: memory corruption in ncp_read_kernel()
	can: ifi: Repair the error handling
	can: ifi: Check core revision upon probe
	can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack
	can: cc770: Fix queue stall & dropped RTR reply
	can: cc770: Fix use after free in cc770_tx_interrupt()
	tty: vt: fix up tabstops properly
	selftests/x86/ptrace_syscall: Fix for yet more glibc interference
	kvm/x86: fix icebp instruction handling
	x86/build/64: Force the linker to use 2MB page size
	x86/boot/64: Verify alignment of the LOAD segment
	x86/entry/64: Don't use IST entry for #BP stack
	perf/x86/intel/uncore: Fix Skylake UPI event format
	perf stat: Fix CVS output format for non-supported counters
	perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()
	perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake servers
	iio: ABI: Fix name of timestamp sysfs file
	staging: lustre: ptlrpc: kfree used instead of kvfree
	selftests, x86, protection_keys: fix wrong offset in siginfo
	selftests/x86/protection_keys: Fix syscall NR redefinition warnings
	signal/testing: Don't look for __SI_FAULT in userspace
	x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
	selftests: x86: sysret_ss_attrs doesn't build on a PIE build
	kbuild: disable clang's default use of -fmerge-all-constants
	bpf: skip unnecessary capability check
	bpf, x64: increase number of passes
	Linux 4.9.91

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-29 11:32:39 +02:00
Chenbo Feng
3eb88807b2 bpf: skip unnecessary capability check
commit 0fa4fe85f4724fff89b09741c437cbee9cf8b008 upstream.

The current check statement in BPF syscall will do a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path will trigger unnecessary security hooks on capability checking
and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
access. This can be resolved by simply switch the order of the statement
and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
allowed.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:26 +02:00
Greg Kroah-Hartman
033d019ce2 This is the 4.9.77 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpfC6UACgkQONu9yGCS
 aT7DEg//eQt2Bh3O3UgkO00vmWYCrZVCR8cCYeEhKOiuwtyU6c5zdEybfVTIIsi7
 jm9UypajB6kq50ZXxR1PY1tczfFQpAEZLQBBbaaX7y47HQEYi1FWFRr6ba3VLXnp
 fNYOySWbeyO0XBywyXv5n9a2BE13MX97bW3zP6+Yhf0x7o4wPx5Xx0mDI9etqPnA
 FPItaRVC1Ud+ILQf26PiJPCIOZKXHBYDyqmMlPhwu+Vgn4cbUs9WFeJ0Jq58h5ya
 A/SiknxKH6iCU8us9z4hIXWvyCvoxVLhHm789r4O0L64yAHkc3uHPPax//Xt5psL
 FyTP7e0G+Bj56YpPwx7f6k/+GSrLN52X81cF7oNXy/oJztMuL6oKsfPFxbRxh6wk
 xYgBnie654qbH2zncK1c5/ZgIWSHGlCXPkUzdH5Wky72N6xsvlTmrCQ55TJHLB6t
 UKkHfhKeUSMDjxDEHMhT3pPViKSXZb2zwXBej976OJEVlyeOr65v7rULD+Di7jFT
 dBegeauxWJiVCiOZrM3606C1K6xQk5nnxngg9dAQ8Gj7V2bge3/P414zWEJo0APw
 96EVgGsk0vTcy2irKJmdRWWH1svIflakfvut0PVfbOEhIvsMtcO2cKzEXFmA0vJz
 gcsy+Gsj3sW83BJR5NNiZZ90hF6ht+tNjbp7YNoxkoLHf64SE4A=
 =r9Be
 -----END PGP SIGNATURE-----

Merge 4.9.77 into android-4.9

Changes in 4.9.77
	dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
	mac80211: Add RX flag to indicate ICV stripped
	ath10k: rebuild crypto header in rx data frames
	KVM: Fix stack-out-of-bounds read in write_mmio
	can: gs_usb: fix return value of the "set_bittiming" callback
	IB/srpt: Disable RDMA access by the initiator
	MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
	MIPS: Factor out NT_PRFPREG regset access helpers
	MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
	MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
	MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
	MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
	MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
	kvm: vmx: Scrub hardware GPRs at VM-exit
	platform/x86: wmi: Call acpi_wmi_init() later
	x86/acpi: Handle SCI interrupts above legacy space gracefully
	ALSA: pcm: Remove incorrect snd_BUG_ON() usages
	ALSA: pcm: Add missing error checks in OSS emulation plugin builder
	ALSA: pcm: Abort properly at pending signal in OSS read/write loops
	ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
	ALSA: aloop: Release cable upon open error path
	ALSA: aloop: Fix inconsistent format due to incomplete rule
	ALSA: aloop: Fix racy hw constraints adjustment
	x86/acpi: Reduce code duplication in mp_override_legacy_irq()
	zswap: don't param_set_charp while holding spinlock
	lan78xx: use skb_cow_head() to deal with cloned skbs
	sr9700: use skb_cow_head() to deal with cloned skbs
	smsc75xx: use skb_cow_head() to deal with cloned skbs
	cx82310_eth: use skb_cow_head() to deal with cloned skbs
	xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
	8021q: fix a memory leak for VLAN 0 device
	ip6_tunnel: disable dst caching if tunnel is dual-stack
	net: core: fix module type in sock_diag_bind
	RDS: Heap OOB write in rds_message_alloc_sgs()
	RDS: null pointer dereference in rds_atomic_free_op
	sh_eth: fix TSU resource handling
	sh_eth: fix SH7757 GEther initialization
	net: stmmac: enable EEE in MII, GMII or RGMII only
	ipv6: fix possible mem leaks in ipv6_make_skb()
	ethtool: do not print warning for applications using legacy API
	mlxsw: spectrum_router: Fix NULL pointer deref
	net/sched: Fix update of lastuse in act modules implementing stats_update
	crypto: algapi - fix NULL dereference in crypto_remove_spawns()
	rbd: set max_segments to USHRT_MAX
	x86/microcode/intel: Extend BDW late-loading with a revision check
	KVM: x86: Add memory barrier on vmcs field lookup
	drm/vmwgfx: Potential off by one in vmw_view_add()
	kaiser: Set _PAGE_NX only if supported
	iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
	target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
	bpf: move fixup_bpf_calls() function
	bpf: refactor fixup_bpf_calls()
	bpf: prevent out-of-bounds speculation
	bpf, array: fix overflow in max_entries and undefined behavior in index_mask
	USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
	USB: serial: cp210x: add new device ID ELV ALC 8xxx
	usb: misc: usb3503: make sure reset is low for at least 100us
	USB: fix usbmon BUG trigger
	usbip: remove kernel addresses from usb device and urb debug msgs
	usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
	staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
	Bluetooth: Prevent stack info leak from the EFS element.
	uas: ignore UAS for Norelsys NS1068(X) chips
	e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
	x86/Documentation: Add PTI description
	x86/cpu: Factor out application of forced CPU caps
	x86/cpufeatures: Make CPU bugs sticky
	x86/cpufeatures: Add X86_BUG_CPU_INSECURE
	x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
	x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
	x86/cpu: Merge bugs.c and bugs_64.c
	sysfs/cpu: Add vulnerability folder
	x86/cpu: Implement CPU vulnerabilites sysfs functions
	x86/cpu/AMD: Make LFENCE a serializing instruction
	x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
	sysfs/cpu: Fix typos in vulnerability documentation
	x86/alternatives: Fix optimize_nops() checking
	x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
	x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
	objtool, modules: Discard objtool annotation sections for modules
	objtool: Detect jumps to retpoline thunks
	objtool: Allow alternatives to be ignored
	x86/asm: Use register variable to get stack pointer value
	x86/retpoline: Add initial retpoline support
	x86/spectre: Add boot time option to select Spectre v2 mitigation
	x86/retpoline/crypto: Convert crypto assembler indirect jumps
	x86/retpoline/entry: Convert entry assembler indirect jumps
	x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
	x86/retpoline/hyperv: Convert assembler indirect jumps
	x86/retpoline/xen: Convert Xen hypercall indirect jumps
	x86/retpoline/checksum32: Convert assembler indirect jumps
	x86/retpoline/irq32: Convert assembler indirect jumps
	x86/retpoline: Fill return stack buffer on vmexit
	selftests/x86: Add test_vsyscall
	x86/retpoline: Remove compile time warning
	objtool: Fix retpoline support for pre-ORC objtool
	x86/pti/efi: broken conversion from efi to kernel page table
	Linux 4.9.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-17 10:29:45 +01:00
Alexei Starovoitov
28035366af bpf: move fixup_bpf_calls() function
commit e245c5c6a5656e4d61aa7bb08e9694fd6e5b2b9d upstream.

no functional change.
move fixup_bpf_calls() to verifier.c
it's being refactored in the next patch

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jiri Slaby <jslaby@suse.cz>
[backported to 4.9 - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:38:55 +01:00
Chenbo Feng
0521e0b3fc UPSTREAM: selinux: bpf: Add addtional check for bpf object file receive
Introduce a bpf object related check when sending and receiving files
through unix domain socket as well as binder. It checks if the receiving
process have privilege to read/write the bpf map or use the bpf program.
This check is necessary because the bpf maps and programs are using a
anonymous inode as their shared inode so the normal way of checking the
files and sockets when passing between processes cannot work properly on
eBPF object. This check only works when the BPF_SYSCALL is configured.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

(cherry-pick from net-next: f66e448cfda021b0bcd884f26709796fe19c7cc1)
Bug: 30950746

Change-Id: I5b2cf4ccb4eab7eda91ddd7091d6aa3e7ed9f2cd
2017-11-07 12:59:54 -08:00
Chenbo Feng
f3ad3766a9 BACKPORT: security: bpf: Add LSM hooks for bpf object related syscall
Introduce several LSM hooks for the syscalls that will allow the
userspace to access to eBPF object such as eBPF programs and eBPF maps.
The security check is aimed to enforce a per object security protection
for eBPF object so only processes with the right priviliges can
read/write to a specific map or use a specific eBPF program. Besides
that, a general security hook is added before the multiplexer of bpf
syscall to check the cmd and the attribute used for the command. The
actual security module can decide which command need to be checked and
how the cmd should be checked.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Added the LIST_HEAD_INIT call for security hooks, it nolonger exist in
uptream code.
(cherry-pick from net-next: afdb09c720b62b8090584c11151d856df330e57d)
Bug: 30950746

Change-Id: Ieb3ac74392f531735fc7c949b83346a5f587a77b
2017-11-07 12:59:20 -08:00
Chenbo Feng
4672ded3ec BACKPORT: bpf: Add file mode configuration into bpf maps
Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Deleted the file mode configuration code in unsupported map type and
removed the file mode check in non-existing helper functions.
(cherry-pick from net-next: 6e71b04a82248ccf13a94b85cbc674a9fefe53f5)
Bug: 30950746

Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b
2017-11-07 12:47:56 -08:00
Greg Kroah-Hartman
184ce810ce This is the 4.9.36 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllc3lYACgkQONu9yGCS
 aT7rhBAAoYx4e8Cbp2iUSEOupe4zWD087bWaQELz9EDh5fsr8zgelyzoeK6y0Rxy
 ngwjVViu9gkouqeHY8gLCFcL4Q/2NFMCGmZ+x7rDV1+d7PWrxCtDw9xnlwIzHtlb
 K9KeWs1TTNtmN2ZQMmnnOQZhdwSseHsA+bdn60n/ol3+n5cLcil6B/xOFCLPlynq
 kr3CZg70HV91V1Cy1jj4/QOY+LE19ratIAY3lirLNphrVt/AWFYBUnBwH4Y7Pked
 VrQ5v0UsB7SHzuuWPkoTE2hM103TjVMoA+/SF6zurb0KcxP+uo9o6RPsOzxeBco/
 cfKGI6ZozOFHy/KIEzC+/EDhdlG08t4MMiAS73hYNthMkVSz6AafNrfGQNmgO+lG
 b2jCmzdp3vdAan4JQ2FvGuUYHFQaskJh5FGCSFiWvzOl1fbIBknjBA+Wubj91lKn
 YqC3srZwi5Z+Ol644uEMorBI3mYVPHlkdSvw17dS786x4zL1ATxpRrAGhcbK/jdU
 e8x/S2nr+A3thvLgK75R6nAfr+srnKz4MUaN3vHLnvXYECGNEXddj5FNPO1lSbeK
 28F/5+W8Onm+pBsldLJjzN4v3Fsq+JWtUZGod+aMZJq0l+Mu8bjZ6rtTbepv5YOv
 6RA8MgBt2heLU39thiXXcpk7BPLTHk79cvKV8drQww4cGNuk48k=
 =HbH+
 -----END PGP SIGNATURE-----

Merge 4.9.36 into android-4.9

Changes in 4.9.36
	ipv6: release dst on error in ip6_dst_lookup_tail
	net: don't call strlen on non-terminated string in dev_set_alias()
	decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
	net: Zero ifla_vf_info in rtnl_fill_vfinfo()
	net: vrf: Make add_fib_rules per network namespace flag
	af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
	Fix an intermittent pr_emerg warning about lo becoming free.
	sctp: disable BH in sctp_for_each_endpoint
	net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx
	net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse
	net/mlx5e: Added BW check for DIM decision mechanism
	net/mlx5e: Fix wrong indications in DIM due to counter wraparound
	proc: snmp6: Use correct type in memset
	igmp: acquire pmc lock for ip_mc_clear_src()
	igmp: add a missing spin_lock_init()
	ipv6: fix calling in6_ifa_hold incorrectly for dad work
	sctp: return next obj by passing pos + 1 into sctp_transport_get_idx
	net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it
	net/mlx5: Wait for FW readiness before initializing command interface
	net/mlx5e: Fix timestamping capabilities reporting
	decnet: always not take dst->__refcnt when inserting dst into hash table
	net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
	sfc: provide dummy definitions of vswitch functions
	ipv6: Do not leak throw route references
	rtnetlink: add IFLA_GROUP to ifla_policy
	netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
	netfilter: synproxy: fix conntrackd interaction
	NFSv4: fix a reference leak caused WARNING messages
	NFSv4.x/callback: Create the callback service through svc_create_pooled
	xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
	drm/ast: Handle configuration without P2A bridge
	mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
	MIPS: head: Reorder instructions missing a delay slot
	MIPS: Avoid accidental raw backtrace
	MIPS: pm-cps: Drop manual cache-line alignment of ready_count
	MIPS: Fix IRQ tracing & lockdep when rescheduling
	ALSA: hda - Fix endless loop of codec configure
	ALSA: hda - set input_path bitmap to zero after moving it to new place
	NFSv4.1: Fix a race in nfs4_proc_layoutget
	gpiolib: fix filtering out unwanted events
	drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
	dm thin: do not queue freed thin mapping for next stage processing
	x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds()
	usb: gadget: f_fs: Fix possibe deadlock
	l2tp: fix race in l2tp_recv_common()
	l2tp: ensure session can't get removed during pppol2tp_session_ioctl()
	l2tp: fix duplicate session creation
	l2tp: hold session while sending creation notifications
	l2tp: take a reference on sessions used in genetlink handlers
	mm: numa: avoid waiting on freed migrated pages
	sparc64: Handle PIO & MEM non-resumable errors.
	sparc64: Zero pages on allocation for mondo and error queues.
	net: ethtool: add support for 2500BaseT and 5000BaseT link modes
	net: phy: add an option to disable EEE advertisement
	dt-bindings: net: add EEE capability constants
	net: phy: fix sign type error in genphy_config_eee_advert
	net: phy: use boolean dt properties for eee broken modes
	dt: bindings: net: use boolean dt properties for eee broken modes
	ARM64: dts: meson-gxbb-odroidc2: fix GbE tx link breakage
	xen/blkback: don't free be structure too early
	KVM: x86: fix fixing of hypercalls
	scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
	stmmac: add missing of_node_put
	scsi: lpfc: Set elsiocb contexts to NULL after freeing it
	qla2xxx: Terminate exchange if corrupted
	qla2xxx: Fix erroneous invalid handle message
	drm/amdgpu: fix program vce instance logic error.
	drm/amdgpu: add support for new hainan variants
	net: phy: dp83848: add DP83620 PHY support
	perf/x86/intel: Handle exclusive threadid correctly on CPU hotplug
	net: korina: Fix NAPI versus resources freeing
	powerpc/eeh: Enable IO path on permanent error
	net: ethtool: Initialize buffer when querying device channel settings
	xen-netback: fix memory leaks on XenBus disconnect
	xen-netback: protect resource cleaning on XenBus disconnect
	bnxt_en: Fix "uninitialized variable" bug in TPA code path.
	bpf: don't trigger OOM killer under pressure with map alloc
	objtool: Fix IRET's opcode
	gianfar: Do not reuse pages from emergency reserve
	Btrfs: Fix deadlock between direct IO and fast fsync
	Btrfs: fix truncate down when no_holes feature is enabled
	virtio_console: fix a crash in config_work_handler
	swiotlb-xen: update dev_addr after swapping pages
	xen-netfront: Fix Rx stall during network stress and OOM
	scsi: virtio_scsi: Reject commands when virtqueue is broken
	iwlwifi: fix kernel crash when unregistering thermal zone
	platform/x86: ideapad-laptop: handle ACPI event 1
	amd-xgbe: Check xgbe_init() return code
	net: dsa: Check return value of phy_connect_direct()
	drm/amdgpu: check ring being ready before using
	vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
	mlxsw: spectrum_router: Correctly reallocate adjacency entries
	virtio_net: fix PAGE_SIZE > 64k
	ip6_tunnel: must reload ipv6h in ip6ip6_tnl_xmit()
	vxlan: do not age static remote mac entries
	ibmveth: Add a proper check for the availability of the checksum features
	kernel/panic.c: add missing \n
	Documentation: devicetree: change the mediatek ethernet compatible string
	drm/etnaviv: trick drm_mm into giving out a low IOVA
	perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
	pinctrl: intel: Set pin direction properly
	net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
	mac80211: recalculate min channel width on VHT opmode changes
	perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
	HID: i2c-hid: Add sleep between POWER ON and RESET
	scsi: lpfc: avoid double free of resource identifiers
	spi: davinci: use dma_mapping_error()
	arm64: assembler: make adr_l work in modules under KASLR
	net: thunderx: acpi: fix LMAC initialization
	drm/radeon/si: load special ucode for certain MC configs
	drm/amd/powerplay: fix vce cg logic error on CZ/St.
	drm/amd/powerplay: refine vce dpm update code on Cz.
	pmem: return EIO on read_pmem() failure
	mac80211: initialize SMPS field in HT capabilities
	x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
	x86/mpx: Use compatible types in comparison to fix sparse error
	perf/core: Fix sys_perf_event_open() vs. hotplug
	perf/x86: Reject non sampling events with precise_ip
	aio: fix lock dep warning
	coredump: Ensure proper size of sparse core files
	swiotlb: ensure that page-sized mappings are page-aligned
	s390/ctl_reg: make __ctl_load a full memory barrier
	usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value
	be2net: fix status check in be_cmd_pmac_add()
	be2net: don't delete MAC on close on unprivileged BE3 VFs
	be2net: fix MAC addr setting on privileged BE3 VFs
	perf probe: Fix to show correct locations for events on modules
	net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types
	tipc: allocate user memory with GFP_KERNEL flag
	perf probe: Fix to probe on gcc generated functions in modules
	net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
	sctp: check af before verify address in sctp_addr_id2transport
	ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
	ravb: Fix use-after-free on `ifconfig eth0 down`
	mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings
	xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
	xfrm: NULL dereference on allocation failure
	xfrm: Oops on error in pfkey_msg2xfrm_state()
	netfilter: use skb_to_full_sk in ip_route_me_harder
	watchdog: bcm281xx: Fix use of uninitialized spinlock.
	sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
	spi: When no dma_chan map buffers with spi_master's parent
	spi: fix device-node leaks
	regulator: tps65086: Fix expected switch DT node names
	regulator: tps65086: Fix DT node referencing in of_parse_cb
	ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer
	ARM: dts: OMAP3: Fix MFG ID EEPROM
	ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
	ARM: 8685/1: ensure memblock-limit is pmd-aligned
	tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel
	x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug
	x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
	x86/mm: Fix flush_tlb_page() on Xen
	ocfs2: o2hb: revert hb threshold to keep compatible
	iommu/vt-d: Don't over-free page table directories
	iommu: Handle default domain attach failure
	iommu/dma: Don't reserve PCI I/O windows
	iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
	iommu/amd: Fix interrupt remapping when disable guest_mode
	cpufreq: s3c2416: double free on driver init error path
	clk: scpi: don't add cpufreq device if the scpi dvfs node is disabled
	objtool: Fix another GCC jump table detection issue
	infiniband: hns: avoid gcc-7.0.1 warning for uninitialized data
	brcmfmac: avoid writing channel out of allocated array
	i2c: brcmstb: Fix START and STOP conditions
	mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program
	arm64: fix NULL dereference in have_cpu_die()
	KVM: x86: fix emulation of RSM and IRET instructions
	KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
	KVM: x86: zero base3 of unusable segments
	KVM: nVMX: Fix exception injection
	Linux 4.9.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-05 16:18:14 +02:00
Daniel Borkmann
251d00bf13 bpf: don't trigger OOM killer under pressure with map alloc
[ Upstream commit d407bd25a204bd66b7346dde24bd3d37ef0e0b05 ]

This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
that are to be used for map allocations. Using kmalloc() for very large
allocations can cause excessive work within the page allocator, so i) fall
back earlier to vmalloc() when the attempt is considered costly anyway,
and even more importantly ii) don't trigger OOM killer with any of the
allocators.

Since this is based on a user space request, for example, when creating
maps with element pre-allocation, we really want such requests to fail
instead of killing other user space processes.

Also, don't spam the kernel log with warnings should any of the allocations
fail under pressure. Given that, we can make backend selection in
bpf_map_area_alloc() generic, and convert all maps over to use this API
for spots with potentially large allocation requests.

Note, replacing the one kmalloc_array() is fine as overflow checks happen
earlier in htab_map_alloc(), since it must also protect the multiplication
for vmalloc() should kmalloc_array() fail.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:40:21 +02:00
Chenbo Feng
b9aad97657 FROMLIST: [net-next,v2,2/2] bpf: Remove the capability check for cgroup skb eBPF
program

Currently loading a cgroup skb eBPF program require a CAP_SYS_ADMIN
capability while attaching the program to a cgroup only requires the
user have CAP_NET_ADMIN privilege. We can escape the capability
check when load the program just like socket filter program to make
the capability requirement consistent.

Change since v1:
Change the code style in order to be compliant with checkpatch.pl
preference

(url: http://patchwork.ozlabs.org/patch/769460/)
Signed-off-by: Chenbo Feng <fengc@google.com>
Bug: 30950746
Change-Id: Ibe51235127d6f9349b8f563ad31effc061b278ed
2017-06-06 00:34:00 +00:00
Alexei Starovoitov
1ee2b4b803 BACKPORT: bpf: introduce BPF_F_ALLOW_OVERRIDE flag
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Fixes: Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28
       ("cgroup: add support for eBPF programs")
[AmitP: Refactored original patch for android-4.9 where libbpf sources
        are in samples/bpf/ and test_cgrp2_attach2, test_cgrp2_sock,
        and test_cgrp2_sock2 sample tests do not exist.]
(cherry picked from commit 7f677633379b4abb3281cdbe7e7006f049305c03)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-05-30 17:27:28 -07:00
Daniel Mack
00615dfcd2 UPSTREAM: bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands
Cherry-pick from commit f4324551489e8781d838f941b7aee4208e52e8bf

Extend the bpf(2) syscall by two new commands, BPF_PROG_ATTACH and
BPF_PROG_DETACH which allow attaching and detaching eBPF programs
to a target.

On the API level, the target could be anything that has an fd in
userspace, hence the name of the field in union bpf_attr is called
'target_fd'.

When called with BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS, the target is
expected to be a valid file descriptor of a cgroup v2 directory which
has the bpf controller enabled. These are the only use-cases
implemented by this patch at this point, but more can be added.

If a program of the given type already exists in the given cgroup,
the program is swapped automically, so userspace does not have to drop
an existing program first before installing a new one, which would
otherwise leave a gap in which no program is attached.

For more information on the propagation logic to subcgroups, please
refer to the bpf cgroup controller implementation.

The API is guarded by CAP_NET_ADMIN.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: Iab156859332166835d51e1e6f64e5cb8b81870f2
2017-05-22 15:30:56 -07:00
Daniel Borkmann
20b2b24f91 bpf: fix map not being uncharged during map creation failure
In map_create(), we first find and create the map, then once that
suceeded, we charge it to the user's RLIMIT_MEMLOCK, and then fetch
a new anon fd through anon_inode_getfd(). The problem is, once the
latter fails f.e. due to RLIMIT_NOFILE limit, then we only destruct
the map via map->ops->map_free(), but without uncharging the previously
locked memory first. That means that the user_struct allocation is
leaked as well as the accounted RLIMIT_MEMLOCK memory not released.
Make the label names in the fix consistent with bpf_prog_load().

Fixes: aaac3ba95e ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 13:22:26 -05:00
Brenden Blanco
59d3656d5b bpf: add bpf_prog_add api for bulk prog refcnt
A subsystem may need to store many copies of a bpf program, each
deserving its own reference. Rather than requiring the caller to loop
one by one (with possible mid-loop failure), add a bulk bpf_prog_add
api.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:31 -07:00
Martin KaFai Lau
4ed8ec521e cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:30:38 -04:00
Daniel Borkmann
113214be7f bpf: refactor bpf_prog_get and type check into helper
Since bpf_prog_get() and program type check is used in a couple of places,
refactor this into a small helper function that we can make use of. Since
the non RO prog->aux part is not used in performance critical paths and a
program destruction via RCU is rather very unlikley when doing the put, we
shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
check, but actually not taking the ref at all (due to being in fdget() /
fdput() section of the bpf fd) is even cleaner and makes the diff smaller
as well, so just go for that. Callsites are changed to make use of the new
helper where possible.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:00:47 -04:00
Daniel Borkmann
1aacde3d22 bpf: generally move prog destruction to RCU deferral
Jann Horn reported following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race, to quote his
event timeline:

 - Set up a process with threads T1, T2 and T3
 - Let T1 set up a socket filter F1 that invokes another filter F2
   through a BPF map [tail call]
 - Let T1 trigger the socket filter via a unix domain socket write,
   don't wait for completion
 - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
 - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
 - Let T3 close the file descriptor for F2, dropping the reference
   count of F2 to 2
 - At this point, T1 should have looked up F2 from the map, but not
   finished executing it
 - Let T3 remove F2 from the BPF map, dropping the reference count of
   F2 to 1
 - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
   the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
   via schedule_work()
 - At this point, the BPF program could be freed
 - BPF execution is still running in a freed BPF program

While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
event fd we're doing the syscall on doesn't disappear from underneath us
for whole syscall time, it may not be the case for the bpf fd used as
an argument only after we did the put. It needs to be a valid fd pointing
to a BPF program at the time of the call to make the bpf_prog_get() and
while T2 gets preempted, F2 must have dropped reference to 1 on the other
CPU. The fput() from the close() in T3 should also add additionally delay
to the reference drop via exit_task_work() when bpf_prog_release() gets
called as well as scheduling bpf_prog_free_deferred().

That said, it makes nevertheless sense to move the BPF prog destruction
generally after RCU grace period to guarantee that such scenario above,
but also others as recently fixed in ceb5607035 ("bpf, perf: delay release
of BPF prog after grace period") with regards to tail calls won't happen.
Integrating bpf_prog_free_deferred() directly into the RCU callback is
not allowed since the invocation might happen from either softirq or
process context, so we're not permitted to block. Reviewing all bpf_prog_put()
invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
their destruction) with call_rcu() look good to me.

Since we don't know whether at the time of attaching the program, we're
already part of a tail call map, we need to use RCU variant. However, due
to this, there won't be severely more stress on the RCU callback queue:
situations with above bpf_prog_get() and bpf_prog_put() combo in practice
normally won't lead to releases, but even if they would, enough effort/
cycles have to be put into loading a BPF program into the kernel already.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:00:47 -04:00
Daniel Borkmann
d056a78876 bpf, maps: extend map_fd_get_ptr arguments
This patch extends map_fd_get_ptr() callback that is used by fd array
maps, so that struct file pointer from the related map can be passed
in. It's safe to remove map_update_elem() callback for the two maps since
this is only allowed from syscall side, but not from eBPF programs for these
two map types. Like in per-cpu map case, bpf_fd_array_map_update_elem()
needs to be called directly here due to the extra argument.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 23:42:57 -07:00
Daniel Borkmann
61d1b6a42f bpf, maps: add release callback
Add a release callback for maps that is invoked when the last
reference to its struct file is gone and the struct file about
to be released by vfs. The handler will be used by fd array maps.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 23:42:57 -07:00
Daniel Borkmann
d1c55ab5e4 bpf: prepare bpf_int_jit_compile/bpf_prog_select_runtime apis
Since the blinding is strictly only called from inside eBPF JITs,
we need to change signatures for bpf_int_jit_compile() and
bpf_prog_select_runtime() first in order to prepare that the
eBPF program we're dealing with can change underneath. Hence,
for call sites, we need to return the latest prog. No functional
change in this patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
Alexei Starovoitov
92117d8443 bpf: fix refcnt overflow
On a system with >32Gbyte of phyiscal memory and infinite RLIMIT_MEMLOCK,
the malicious application may overflow 32-bit bpf program refcnt.
It's also possible to overflow map refcnt on 1Tb system.
Impose 32k hard limit which means that the same bpf program or
map cannot be shared by more than 32k processes.

Fixes: 1be7f75d16 ("bpf: enable non-root eBPF programs")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 17:29:45 -04:00
Daniel Borkmann
322cea2f41 bpf: add missing map_flags to bpf_map_show_fdinfo
Add map_flags attribute to bpf_map_show_fdinfo(), so that tools like
tc can check for them when loading objects from a pinned entry, e.g.
if user intent wrt allocation (BPF_F_NO_PREALLOC) is different to the
pinned object, it can bail out. Follow-up to 6c90598174 ("bpf:
pre-allocate hash map elements"), so that tc can still support this
with v4.6.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-25 11:36:41 -04:00
Alexei Starovoitov
b8cdc05173 bpf: bpf_stackmap_copy depends on CONFIG_PERF_EVENTS
0-day bot reported build error:
kernel/built-in.o: In function `map_lookup_elem':
>> kernel/bpf/.tmp_syscall.o:(.text+0x329b3c): undefined reference to `bpf_stackmap_copy'
when CONFIG_BPF_SYSCALL is set and CONFIG_PERF_EVENTS is not.
Add weak definition to resolve it.
This code path in map_lookup_elem() is never taken
when CONFIG_PERF_EVENTS is not set.

Fixes: 557c0c6e7d ("bpf: convert stackmap to pre-allocation")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 23:26:51 -05:00