Commit graph

4229 commits

Author SHA1 Message Date
Phapoom Saksri
b4cf681687 drunk 2025-01-14 01:20:57 +07:00
THEBOSS619
366405377c
exynos9810: schedutil/KAIR: Sync schedutil to latest patches from Samsung
Taken from Note 10 5G & S10 [Exynos 9825/9820] and fully merged with KAIR feature as they quote "AI based Resource Control"
2024-09-27 17:19:06 +03:00
Nick Terrell
4cbf3d88c0 zstd: Improve fast huffman decoding speed in linux kernel
gcc in the linux kernel was not unrolling the inner loops of the Huffman
decoder, which was destroying decoding performance. The compiler was
generating crazy code with all sorts of branches. I suspect because of
Spectre mitigations, but I'm not certain. Once the loops were manually
unrolled, performance was restored.

Additionally, when gcc couldn't prove that the variable left shift in
the 4X2 decode loop wasn't greater than 63, it inserted checks to verify
it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to eliete
this check. This is a no op, because  is guaranteed to be
less than 64.

Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the
fast C loops for Issue https://github.com/facebook/zstd/issues/3762 . So if even after this change, there is a
performance regression, users can opt-out at compile time.
2023-12-17 21:12:40 +03:00
Nick Terrell
d33530b08f zstd: Backport fix for arry-index-out-of-bounds UBSAN warning
Backport the fix already in the kernel, and is fixed in the dev branch
in a different way.

https://lkml.kernel.org/r/20231012213428.1390905-1-nickrterrell@gmail.com
2023-12-17 21:12:40 +03:00
Dave Rodgman
2dde2cda7e
UPSTREAM: lib/lzo: fix ambiguous encoding bug in lzo-rle
In some rare cases, for input data over 32 KB, lzo-rle could encode two
different inputs to the same compressed representation, so that
decompression is then ambiguous (i.e.  data may be corrupted - although
zram is not affected because it operates over 4 KB pages).

This modifies the compressor without changing the decompressor or the
bitstream format, such that:

 - there is no change to how data produced by the old compressor is
   decompressed

 - an old decompressor will correctly decode data from the updated
   compressor

 - performance and compression ratio are not affected

 - we avoid introducing a new bitstream format

In testing over 12.8M real-world files totalling 903 GB, three files
were affected by this bug.  I also constructed 37M semi-random 64 KB
files totalling 2.27 TB, and saw no affected files.  Finally I tested
over files constructed to contain each of the ~1024 possible bad input
sequences; for all of these cases, updated lzo-rle worked correctly.

There is no significant impact to performance or compression ratio.

Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Rodgman <dave.rodgman@arm.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200507100203.29785-1-dave.rodgman@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I918e3f96e4da46c91c04344aaa459f952c47ba1d
2023-06-08 13:36:33 +03:00
Dave Rodgman
d34740f19c
UPSTREAM: lib/lzo/lzo1x_compress.c: fix alignment bug in lzo-rle
Fix an unaligned access which breaks on platforms where this is not
permitted (e.g., Sparc).

Link: http://lkml.kernel.org/r/20190912145502.35229-1-dave.rodgman@arm.com
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Cc: Dave Rodgman <dave.rodgman@arm.com>
Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Icaea4b5e0c3b436e1a0025538482f79b69dd3279
2023-06-08 13:36:32 +03:00
Dave Rodgman
4e533b6870
UPSTREAM: lib/lzo: fix bugs for very short or empty input
For very short input data (0 - 1 bytes), lzo-rle was not behaving
correctly.  Fix this behaviour and update documentation accordingly.

For zero-length input, lzo v0 outputs an end-of-stream marker only,
which was misinterpreted by lzo-rle as a bitstream version number.
Ensure bitstream versions > 0 require a minimum stream length of 5.

Also fixes a bug in handling the tail for very short inputs when a
bitstream version is present.

Link: http://lkml.kernel.org/r/20190326165857.34613-1-dave.rodgman@arm.com
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ifcb1941def32904d1772784af8ca8926526e4e2f
2023-06-08 13:36:30 +03:00
阿菌•未霜
5830a7261e
lib/lz4: Import arm64 V8 ASM lz4 decompression acceleration
Change-Id: I3c8dd91df090bb692784a6b7a61c8877b1e1dfba
2023-06-08 13:36:26 +03:00
xxmustafacooTR
bf704dd850
wireguard: compile siphash 2023-05-12 00:59:53 +03:00
Krzysztof Kozlowski
8b3ed567ea
lib/crc: Use consistent naming for CRC-32 polynomials
Header was defining CRCPOLY_LE/BE and CRC32C_POLY_LE but in fact all of
them are CRC-32 polynomials so use consistent naming.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2023-05-02 20:56:58 +03:00
Krzysztof Kozlowski
e9ad49b430
lib/crc: Move polynomial definition to separate header
Allow other drivers and parts of kernel to use the same define for
CRC32 polynomial, instead of duplicating it in many places.  This code
does not bring any functional changes, except moving existing code.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2023-05-02 20:56:57 +03:00
Alexey Dobriyan
ffa1398967
locking/refcounts: Include fewer headers in <linux/refcount.h>
Debloat <linux/refcount.h>'s dependencies:

- <linux/kernel.h> is not needed, but <linux/compiler.h> is.
- <linux/mutex.h> is not needed, only a forward declaration of "struct mutex".
- <linux/spinlock.h> is not needed, <linux/spinlock_types.h> is enough.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/lkml/20180331220036.GA7676@avx2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-04-30 19:49:36 +03:00
Anna-Maria Gleixner
c16da8a546
locking/refcounts: Implement refcount_dec_and_lock_irqsave()
There are in-tree users of refcount_dec_and_lock() which must acquire the
spin lock with interrupts disabled. To workaround the lack of an irqsave
variant of refcount_dec_and_lock() they use local_irq_save() at the call
site. This causes extra code and creates in some places unneeded long
interrupt disabled times. These places need also extra treatment for
PREEMPT_RT due to the disconnect of the irq disabling and the lock
function.

Implement the missing irqsave variant of the function.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r20180612161621.22645-4-bigeasy@linutronix.de

[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g]
2023-04-30 19:49:36 +03:00
Isaac J. Manjarres
9370578cb2
lib: refcount: Cause kernel panic on refcount error detection
Currently, when using the refcount API functions,
a warning is printed out once to let a user of
the refcount API know that an error case has been
detected. Then the refcount functions will
silently return, without modifying the reference count, which
could be mistaken for a successful modification. This can
allow for improper use of the object associated with that
refcount later. Trigger a kernel panic in case of refcount
error detection to prevent misuse of objects associated with
refcounts.

Change-Id: Ifb6a331d08a7d6c285225bc9667d2f4054db3561
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2023-04-30 19:49:35 +03:00
Greg Kroah-Hartman
a681ba0a3d
refcount: change EXPORT_SYMBOL markings
Now that kref is using the refcount apis, the _GPL markings are getting
exported to places that it previously wasn't.  Now kref.h is GPLv2
licensed, so any non-GPL code using it better be talking to some
lawyers, but changing api markings isn't considered "nice", so let's fix
this up.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-30 19:49:34 +03:00
Peter Zijlstra
587c3afda2
locking/refcounts: Use atomic_try_cmpxchg()
Generates better code (GCC-6.2.1):

  text        filename
  1576        defconfig-build/lib/refcount.o.pre
  1488        defconfig-build/lib/refcount.o.post

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-04-30 19:49:34 +03:00
Ingo Molnar
51e86ead56
locking/refcounts: Change WARN() to WARN_ONCE()
Linus noticed that the new refcount.h APIs used WARN(), which would turn
into a dmesg DoS if it triggers frequently on some buggy driver.

So make sure we only warn once. These warnings are never supposed to happen,
so it's typically not a problem to lose subsequent warnings.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CA+55aFzbYUTZ=oqZ2YgDjY0C2_n6ODhTfqj6V+m5xWmDxsuB0w@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-04-30 19:49:34 +03:00
Kees Cook
80a2ca95d9
locking/refcount: Create unchecked atomic_t implementation
Many subsystems will not use refcount_t unless there is a way to build the
kernel so that there is no regression in speed compared to atomic_t. This
adds CONFIG_REFCOUNT_FULL to enable the full refcount_t implementation
which has the validation but is slightly slower. When not enabled,
refcount_t uses the basic unchecked atomic_t routines, which results in
no code changes compared to just using atomic_t directly.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170621200026.GA115679@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-04-30 19:49:32 +03:00
David Windsor
a6ca06e061
locking/refcount: Add refcount_t API kernel-doc comments
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: elena.reshetova@intel.com
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/1489160052-20293-1-git-send-email-dwindsor@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-04-30 19:49:32 +03:00
Peter Zijlstra
9013c72d49
locking/refcounts: Out-of-line everything
Linus asked to please make this real C code.

And since size then isn't an issue what so ever anymore, remove the
debug knob and make all WARN()s unconditional.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dwindsor@gmail.com
Cc: elena.reshetova@intel.com
Cc: gregkh@linuxfoundation.org
Cc: ishkamiel@gmail.com
Cc: keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-04-30 19:49:31 +03:00
Peter Zijlstra
7f8abe242c
refcount_t: Introduce a special purpose refcount type
Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides saturation semantics such that overflow becomes impossible
and thereby 'spurious' use-after-free is avoided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-04-30 19:49:08 +03:00
Faiz Faadhillah
792a6c3819
zstd: Import upstream v1.5.5
Signed-off-by: Faiz Faadhillah <faiz.faadhillah@gmail.com>
2023-04-11 19:03:42 +03:00
Nick Desaulniers
48d53a72d2
lib/string.c: implement stpcpy
commit 1e1b6d63d6340764e00356873e5794225a2a03ea upstream.

LLVM implemented a recent "libcall optimization" that lowers calls to
`sprintf(dest, "%s", str)` where the return value is used to
`stpcpy(dest, str) - dest`.

This generally avoids the machinery involved in parsing format strings.
`stpcpy` is just like `strcpy` except it returns the pointer to the new
tail of `dest`.  This optimization was introduced into clang-12.

Implement this so that we don't observe linkage failures due to missing
symbol definitions for `stpcpy`.

Similar to last year's fire drill with: commit 5f074f3e192f
("lib/string.c: implement a basic bcmp")

The kernel is somewhere between a "freestanding" environment (no full
libc) and "hosted" environment (many symbols from libc exist with the
same type, function signature, and semantics).

As Peter Anvin notes, there's not really a great way to inform the
compiler that you're targeting a freestanding environment but would like
to opt-in to some libcall optimizations (see pr/47280 below), rather
than opt-out.

Arvind notes, -fno-builtin-* behaves slightly differently between GCC
and Clang, and Clang is missing many __builtin_* definitions, which I
consider a bug in Clang and am working on fixing.

Masahiro summarizes the subtle distinction between compilers justly:
  To prevent transformation from foo() into bar(), there are two ways in
  Clang to do that; -fno-builtin-foo, and -fno-builtin-bar.  There is
  only one in GCC; -fno-buitin-foo.

(Any difference in that behavior in Clang is likely a bug from a missing
__builtin_* definition.)

Masahiro also notes:
  We want to disable optimization from foo() to bar(),
  but we may still benefit from the optimization from
  foo() into something else. If GCC implements the same transform, we
  would run into a problem because it is not -fno-builtin-bar, but
  -fno-builtin-foo that disables that optimization.

  In this regard, -fno-builtin-foo would be more future-proof than
  -fno-built-bar, but -fno-builtin-foo is still potentially overkill. We
  may want to prevent calls from foo() being optimized into calls to
  bar(), but we still may want other optimization on calls to foo().

It seems that compilers today don't quite provide the fine grain control
over which libcall optimizations pseudo-freestanding environments would
prefer.

Finally, Kees notes that this interface is unsafe, so we should not
encourage its use.  As such, I've removed the declaration from any
header, but it still needs to be exported to avoid linkage errors in
modules.

Reported-by: Sami Tolvanen <samitolvanen@google.com>
Suggested-by: Andy Lavr <andy.lavr@gmail.com>
Suggested-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: Joe Perches <joe@perches.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200914161643.938408-1-ndesaulniers@google.com
Link: https://bugs.llvm.org/show_bug.cgi?id=47162
Link: https://bugs.llvm.org/show_bug.cgi?id=47280
Link: https://github.com/ClangBuiltLinux/linux/issues/1126
Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html
Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html
Link: https://reviews.llvm.org/D85963
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-21 00:55:35 +03:00
Mark Salyzyn
4b8f76b16f
ANDROID: clock_gettime(CLOCK_BOOTTIME,) slows down >20x
clock_gettime(CLOCK_BOOTTIME,) slows down after significant
accumulation of suspend time creating a large offset between it and
CLOCK_MONOTONIC time.  The __iter_div_u64_rem() is only for the usage
of adding a few second+nanosecond times and saving cycles on more
expensive remainder and division operations, but iterates one second
at a time which quickly goes out of scale in CLOCK_BOOTTIME's case
since it was specified as nanoseconds only.

The fix is to split off seconds from the boot time and cap the
nanoseconds so that __iter_div_u64_rem does not iterate.

Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 72406285
Change-Id: Ia647ef1e76b7ba3b0c003028d4b3b955635adabb
Signed-off-by: khusika <khusikadhamar@gmail.com>
2023-02-21 00:25:56 +03:00
Mark Salyzyn
7209ae7db1
FROMLIST: lib: vdso: add support for time
(cherry pick from url https://patchwork.kernel.org/patch/10053549/)

Add time() vdso support to match up with existing support in the x86's
vdso.  Currently benefitting arm and arm64 which uses the common
vgettimeofday.c implementation.  On arm provides about a ~14 fold
improvement in speed over the straight syscall, and about a ~5 fold
improvement in speed over an alternate library implementation that
relies on the vdso call to gettimeofday to fulfill the request.

We can provide __vdso_time even if we can not provide a speed
enhanced __vdso_gettimeofday.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Bug: 63737556
Bug: 20045882
Change-Id: I0bb3c6bafe57f9ed69350e2dd54edaae58316e8f
Signed-off-by: khusika <khusikadhamar@gmail.com>
2023-02-21 00:24:50 +03:00
Mark Salyzyn
882f7bf86d
FROMLIST: [PATCH v5 12/12] lib: vdso: do not expose gettimeofday, if no arch supported timer
(cherry pick from url https://patchwork.kernel.org/patch/10044539/)

Take an effort to recode the arm64 vdso code from assembler to C
previously submitted by Andrew Pinski <apinski@cavium.com>, rework
it for use in both arm and arm64, overlapping any optimizations
for each architecture. But instead of landing it in arm64, land the
result into lib/vdso and unify both implementations to simplify
future maintenance.

If ARCH_PROVIDES_TIMER is not defined, do not expose gettimeofday.
libc will default directly to syscall.  Also ifdef clock_gettime
switch cases and stubs if not supported and other unused components.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: James Morse <james.morse@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Andrew Pinski <apinski@cavium.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Bug: 63737556
Bug: 20045882
Change-Id: I362a7114db0aac800e16eb90d14a8739e18f42e4
Signed-off-by: khusika <khusikadhamar@gmail.com>
2023-02-21 00:24:49 +03:00
Mark Salyzyn
4423ee507e
FROMLIST: [PATCH v5 11/12] lib: vdso: Add support for CLOCK_BOOTTIME
(cherry pick from url https://patchwork.kernel.org/patch/10044503/)

Take an effort to recode the arm64 vdso code from assembler to C
previously submitted by Andrew Pinski <apinski@cavium.com>, rework
it for use in both arm and arm64, overlapping any optimizations
for each architecture. But instead of landing it in arm64, land the
result into lib/vdso and unify both implementations to simplify
future maintenance.

Add a case for CLOCK_BOOTTIME as it is popular for measuring
relative time on systems expected to suspend() or hibernate().

Android uses CLOCK_BOOTTIME for all relative time measurements
and timeouts. Switching to vdso reduced CPU utilization and improves
accuracy. There is also a desire by some partners to switch all
logging over to CLOCK_BOOTTIME, and thus this operation alone would
contribute to a near percentile CPU load.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: James Morse <james.morse@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Andrew Pinski <apinski@cavium.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Bug: 63737556
Bug: 20045882
Change-Id: I76c26b054baf7f1100e03c65d6b16fe649b883b1
Signed-off-by: khusika <khusikadhamar@gmail.com>
2023-02-21 00:24:49 +03:00
Mark Salyzyn
96eb5ccd2f
FROMLIST: [PATCH v5 09/12] arm: vdso: move vgettimeofday.c to lib/vdso/
(cherry pick from url https://patchwork.kernel.org/patch/10044497/)

Take an effort to recode the arm64 vdso code from assembler to C
previously submitted by Andrew Pinski <apinski@cavium.com>, rework
it for use in both arm and arm64, overlapping any optimizations
for each architecture. But instead of landing it in arm64, land the
result into lib/vdso and unify both implementations to simplify
future maintenance.

Declare arch/arm/vdso/vgettimeofday.c to be a candidate for a global
implementation of the vdso timer calls.  The hope is that new
architectures can take advantage of the current unification of
arm and arm64 implementations.

We urge future efforts to merge their implementations into the
global vgettimeofday.c file and thus provide functional parity.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: James Morse <james.morse@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Andrew Pinski <apinski@cavium.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Bug: 63737556
Bug: 20045882
Change-Id: If7da1d8144684d52ed9520a581e6023c623df931
Signed-off-by: khusika <khusikadhamar@gmail.com>
2023-02-21 00:24:48 +03:00
Andrzej Perczak
3d0695d4f3
lz4: Update to version 1.9.4
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
2023-02-21 00:24:44 +03:00
Andrzej Perczak
8d1c592571
lib: Update LZ4 module to v1.9.3+
Update lz4 module using official repository from revision [1].

Keep in mind lz4hc wasn't updated thus it is not used. It may not
compile anymore.

[1]: 4ebe313e00
2023-02-21 00:24:43 +03:00
Faiz Faadhillah
15d24546ce
zstd: Import upstream v1.5.4
Signed-off-by: Faiz Faadhillah <faiz.faadhillah@gmail.com>
2023-02-21 00:21:21 +03:00
Nick Terrell
f5bfa71cfe
lib: zstd: Don't add -O3 to cflags
After the update to zstd-1.4.10 passing -O3 is no longer necessary to
get good performance from zstd. Using the default optimization level -O2
is sufficient to get good performance.

I've measured no significant change to compression speed, and a ~1%
decompression speed loss, which is acceptable.

This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
stacks that are ~3KB. With -O2 these same functions generate stacks in
the < 100B, completely fixing the problem. Function size deltas are
listed below:

ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68
ZSTD_compressBlock_fast: 2216 -> 40
ZSTD_compressBlock_fast_dictMatchState: 1848 ->  64
ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
ZSTD_fillDoubleHashTable: 3252 -> 0
ZSTD_compressBlock_doubleFast: 5856 -> 36
ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84
ZSTD_copmressBlock_lazy2: 2420 -> 72

Additionally, this improves the reported code bloat [1]. With gcc-11
bloat-o-meter shows an 80KB code size improvement:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
Total: Before=6418562, After=6336372, chg -1.28%
```

Compared to before the zstd-1.4.10 update we see a total code size
regression of 105KB, down from 374KB at v5.16-rc1:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
Total: Before=6228850, After=6336372, chg +1.73%
```

[0] https://lkml.org/lkml/2021/11/15/710
[1] https://lkml.org/lkml/2021/11/14/189

Link: https://lore.kernel.org/r/20211117014949.1169186-4-nickrterrell@gmail.com/
Link: https://lore.kernel.org/r/20211117201459.1194876-4-nickrterrell@gmail.com/

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:13 +03:00
Cyber Knight
d8ec1bec06
lib: zstd: Fix attribute declaration
fallthrough, __fallthrough__ and [[fallthrough]] are compiler dependent.
Hence let's use a generic fallthrough implementation which is __attribute__((__fallthrough__))

Test: lib/zstd/ compiled successfully with this change using GCC 12.0.0 / Clang 14.0.0 .
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2023-02-21 00:21:08 +03:00
Pzqqt
528ae5ab25
lib: zstd: Add missing header file 2023-02-21 00:21:08 +03:00
Nick Terrell
2a3ee93f73
zstd: import usptream v1.5.2
Updates the kernel's zstd library to v1.5.2, the latest zstd release.
The upstream tag it is updated to is `v1.5.2-kernel`, which contains
several cherry-picked commits on top of the v1.5.2 release which are
required for the kernel update. I will create this tag once the PR is
ready to merge, until then reference the temporary upstream branch
`v1.5.2-kernel-cherrypicks`.

I plan to submit this patch as part of the v6.2 merge window.

I've done basic build testing & testing on x86-64, i386, and aarch64.
I'm merging these patches into my `zstd-next` branch, which is pulled
into `linux-next` for further testing.

I've benchmarked BtrFS with zstd compression on a x86-64 machine, and
saw these results. Decompression speed is a small win across the board.
The lower compression levels 1-4 see both compression speed and
compression ratio wins. The higher compression levels see a small
compression speed loss and about neutral ratio. I expect the lower
compression levels to be used much more heavily than the high
compression levels, so this should be a net win.

Level	CTime	DTime	Ratio
1	-2.95%	-1.1%	-0.7%
3	-3.5%	-1.2%	-0.5%
5	+3.7%	-1.0%	+0.0%
7	+3.2%	-0.9%	+0.0%
9	-4.3%	-0.8%	+0.1%

Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:07 +03:00
Nick Terrell
2179dadbd5
zstd: Move zstd-common module exports to zstd_common_module.c
The zstd codebase is imported from the upstream zstd repo, and is over-written on
every update. Upstream keeps the kernel specific code separate from the main
library. So the module definition is moved into the zstd_common_module.c file.
This matches the pattern followed by the zstd-compress and zstd-decompress files.

I've done build and boot testing on x86-64, i386, and aarch64. I've
verified that zstd built both as modules and built-in build and boot.

Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:07 +03:00
Xin Gao
1009f93304
lib: zstd: Fix comment typo
The double `when' is duplicated in line 999, remove one.

Signed-off-by: Xin Gao <gaoxin@cdjrlc.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:07 +03:00
Jilin Yuan
d193623724
lib: zstd: fix repeated words in comments
Delete the redundant word 'the'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:06 +03:00
Alexey Kardashevskiy
b230c6f769
zstd: Fixing mixed module-builtin objects
With CONFIG_ZSTD_COMPRESS=m and CONFIG_ZSTD_DECOMPRESS=y we end up in
a situation when files from lib/zstd/common/ are compiled once to be
linked later for ZSTD_DECOMPRESS (build-in) and ZSTD_COMPRESS (module)
even though CFLAGS are different for builtins and modules.
So far somehow this was not a problem but enabling LLVM LTO exposes
the problem as:

ld.lld: error: linking module flags 'Code Model': IDs have conflicting values in 'lib/built-in.a(zstd_common.o at 5868)' and 'ld-temp.o'

This particular conflict is caused by KBUILD_CFLAGS=-mcmodel=medium vs.
KBUILD_CFLAGS_MODULE=-mcmodel=large , modules use the large model on
POWERPC as explained at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Makefile?h=v5.18-rc4#n127
but the current use of common files is wrong anyway.

This works around the issue by introducing a zstd_common module with
shared code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-02-21 00:21:06 +03:00
Nick Terrell
9978661337
lib: zstd: Don't inline functions in zstd_opt.c
`zstd_opt.c` contains the match finder for the highest compression
levels. These levels are already very slow, and are unlikely to be used
in the kernel. If they are used, they shouldn't be used in latency
sensitive workloads, so slowing them down shouldn't be a big deal.

This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
I've also opened an issue upstream [1] so that we can properly tackle
the code size issue in `zstd_opt.c` for all users, and can hopefully
remove this hack in the next zstd version we import.

Bloat-o-meter output on x86-64:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
Function                                     old     new   delta
ZSTD_compressBlock_opt_generic.constprop       -    7559   +7559
ZSTD_insertBtAndGetAllMatches                  -    6304   +6304
ZSTD_insertBt1                                 -    1731   +1731
ZSTD_storeSeq                                  -     693    +693
ZSTD_BtGetAllMatches                           -     255    +255
ZSTD_updateRep                                 -     128    +128
ZSTD_updateTree                               96      99      +3
ZSTD_insertAndFindFirstIndexHash3             81       -     -81
ZSTD_setBasePrices.constprop                  98       -     -98
ZSTD_litLengthPrice.constprop                138       -    -138
ZSTD_count                                   362     181    -181
ZSTD_count_2segments                        1407     938    -469
ZSTD_insertBt1.constprop                    2689       -   -2689
ZSTD_compressBlock_btultra2                19990     423  -19567
ZSTD_compressBlock_btultra                 19633      15  -19618
ZSTD_initStats_ultra                       19825       -  -19825
ZSTD_compressBlock_btopt                   20374      12  -20362
ZSTD_compressBlock_btopt_extDict           29984      12  -29972
ZSTD_compressBlock_btultra_extDict         30718      15  -30703
ZSTD_compressBlock_btopt_dictMatchState    32689      12  -32677
ZSTD_compressBlock_btultra_dictMatchState   33574      15  -33559
Total: Before=6611828, After=6418562, chg -2.92%
```

[0] https://lkml.org/lkml/2021/11/14/189
[1] https://github.com/facebook/zstd/issues/2862

Link: https://lore.kernel.org/r/20211117014949.1169186-3-nickrterrell@gmail.com/
Link: https://lore.kernel.org/r/20211117201459.1194876-3-nickrterrell@gmail.com/

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:05 +03:00
Nick Terrell
181c7903ae
lib: zstd: Fix unused variable warning
The variable `litLengthSum` is only used by an `assert()`, so when
asserts are disabled the compiler doesn't see any usage and warns.

This issue is already fixed upstream by PR #2838 [0]. It was reported
by the Kernel test robot in [1].

Another approach would be to change zstd's disabled `assert()`
definition to use the argument in a disabled branch, instead of
ignoring the argument. I've avoided this approach because there are
some small changes necessary to get zstd to build, and I would
want to thoroughly re-test for performance, since that is slightly
changing the code in every function in zstd. It seems like a
trivial change, but some functions are pretty sensitive to small
changes. However, I think it is a valid approach that I would
like to see upstream take, so I've opened Issue #2868 to attempt
this upstream.

Lastly, I've chosen not to use __maybe_unused because all code
in lib/zstd/ must eventually be upstreamed. Upstream zstd can't
use __maybe_unused because it isn't portable across all compilers.

[0] https://github.com/facebook/zstd/pull/2838
[1] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/
[2] https://github.com/facebook/zstd/issues/2868

Link: https://lore.kernel.org/r/20211117014949.1169186-2-nickrterrell@gmail.com/
Link: https://lore.kernel.org/r/20211117201459.1194876-2-nickrterrell@gmail.com/

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:05 +03:00
Nathan Chancellor
ae970d19f0
lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical
A new warning in clang warns that there is an instance where boolean
expressions are being used with bitwise operators instead of logical
ones:

lib/zstd/decompress/huf_decompress.c:890:25: warning: use of bitwise '&' with boolean operands [-Wbitwise-instead-of-logical]
                       (BIT_reloadDStreamFast(&bitD1) == BIT_DStream_unfinished)
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

zstd does this frequently to help with performance, as logical operators
have branches whereas bitwise ones do not.

To fix this warning in other cases, the expressions were placed on
separate lines with the '&=' operator; however, this particular instance
was moved away from that so that it could be surrounded by LIKELY, which
is a macro for __builtin_expect(), to help with a performance
regression, according to upstream zstd pull #1973.

Aside from switching to logical operators, which is likely undesirable
in this instance, or disabling the warning outright, the solution is
casting one of the expressions to an integer type to make it clear to
clang that the author knows what they are doing. Add a cast to U32 to
silence the warning. The first U32 cast is to silence an instance of
-Wshorten-64-to-32 because __builtin_expect() returns long so it cannot
be moved.

Link: https://github.com/ClangBuiltLinux/linux/issues/1486
Link: https://github.com/facebook/zstd/pull/1973
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:04 +03:00
Nick Terrell
6fac2fbada
lib: zstd: Upgrade to latest upstream zstd version 1.4.10
Upgrade to the latest upstream zstd version 1.4.10.

This patch is 100% generated from upstream zstd commit 20821a46f412 [0].

This patch is very large because it is transitioning from the custom
kernel zstd to using upstream directly. The new zstd follows upstreams
file structure which is different. Future update patches will be much
smaller because they will only contain the changes from one upstream
zstd release.

As an aid for review I've created a commit [1] that shows the diff
between upstream zstd as-is (which doesn't compile), and the zstd
code imported in this patch. The verion of zstd in this patch is
generated from upstream with changes applied by automation to replace
upstreams libc dependencies, remove unnecessary portability macros,
replace `/**` comments with `/*` comments, and use the kernel's xxhash
instead of bundling it.

The benefits of this patch are as follows:
1. Using upstream directly with automated script to generate kernel
   code. This allows us to update the kernel every upstream release, so
   the kernel gets the latest bug fixes and performance improvements,
   and doesn't get 3 years out of date again. The automation and the
   translated code are tested every upstream commit to ensure it
   continues to work.
2. Upgrades from a custom zstd based on 1.3.1 to 1.4.10, getting 3 years
   of performance improvements and bug fixes. On x86_64 I've measured
   15% faster BtrFS and SquashFS decompression+read speeds, 35% faster
   kernel decompression, and 30% faster ZRAM decompression+read speeds.
3. Zstd-1.4.10 supports negative compression levels, which allow zstd to
   match or subsume lzo's performance.
4. Maintains the same kernel-specific wrapper API, so no callers have to
   be modified with zstd version updates.

One concern that was brought up was stack usage. Upstream zstd had
already removed most of its heavy stack usage functions, but I just
removed the last functions that allocate arrays on the stack. I've
measured the high water mark for both compression and decompression
before and after this patch. Decompression is approximately neutral,
using about 1.2KB of stack space. Compression levels up to 3 regressed
from 1.4KB -> 1.6KB, and higher compression levels regressed from 1.5KB
-> 2KB. We've added unit tests upstream to prevent further regression.
I believe that this is a reasonable increase, and if it does end up
causing problems, this commit can be cleanly reverted, because it only
touches zstd.

I chose the bulk update instead of replaying upstream commits because
there have been ~3500 upstream commits since the 1.3.1 release, zstd
wasn't ready to be used in the kernel as-is before a month ago, and not
all upstream zstd commits build. The bulk update preserves bisectablity
because bugs can be bisected to the zstd version update. At that point
the update can be reverted, and we can work with upstream to find and
fix the bug.

Note that upstream zstd release 1.4.10 doesn't exist yet. I have cut a
staging branch at 20821a46f412 [0] and will apply any changes requested
to the staging branch. Once we're ready to merge this update I will cut
a zstd release at the commit we merge, so we have a known zstd release
in the kernel.

The implementation of the kernel API is contained in
zstd_compress_module.c and zstd_decompress_module.c.

[0] 20821a46f4
[1] e0fa481d0e

Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
2023-02-21 00:21:04 +03:00
Nick Terrell
c2a42cf6f2
lib: zstd: Add kernel-specific API
This patch:
- Moves `include/linux/zstd.h` -> `include/linux/zstd_lib.h`
- Updates modified zstd headers to yearless copyright
- Adds a new API in `include/linux/zstd.h` that is functionally
  equivalent to the in-use subset of the current API. Functions are
  renamed to avoid symbol collisions with zstd, to make it clear it is
  not the upstream zstd API, and to follow the kernel style guide.
- Updates all callers to use the new API.

There are no functional changes in this patch. Since there are no
functional change, I felt it was okay to update all the callers in a
single patch. Once the API is approved, the callers are mechanically
changed.

This patch is preparing for the 3rd patch in this series, which updates
zstd to version 1.4.10. Since the upstream zstd API is no longer exposed
to callers, the update can happen transparently.

Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
2023-02-21 00:21:04 +03:00
Nick Terrell
33a3edc6c3
lib: zstd: Add decompress_sources.h for decompress_unzstd
Adds decompress_sources.h which includes every .c file necessary for
zstd decompression. This is used in decompress_unzstd.c so the internal
structure of the library isn't exposed.

This allows us to upgrade the zstd library version without modifying any
callers. Instead we just need to update decompress_sources.h.

Signed-off-by: Nick Terrell <terrelln@fb.com>
2023-02-21 00:21:03 +03:00
Zhen Lei
0dc52ba3a5
lib/decompressors: fix spelling mistakes
Fix some spelling mistakes in comments:
sentinal ==> sentinel
compresed ==> compressed
dependeny ==> dependency
immediatelly ==> immediately
dervied ==> derived
splitted ==> split
nore ==> not
independed ==> independent
asumed ==> assumed

Link: https://lkml.kernel.org/r/20210604085656.12257-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-21 00:21:03 +03:00
Paul Cercueil
07f2369f8f
lib: decompress_unzstd: Limit output size
The zstd decompression code, as it is right now, will most likely fail
on 32-bit systems, as the default output buffer size causes the buffer's
end address to overflow.

Address this issue by setting a sane default to the default output size,
with a value that won't overflow the buffer's end address.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-02-21 00:21:02 +03:00
Nick Terrell
758cac1719
lib: Add zstd support to decompress
- Add unzstd() and the zstd decompress interface.

- Add zstd support to decompress_method().

The decompress_method() and unzstd() functions are used to decompress
the initramfs and the initrd. The __decompress() function is used in
the preboot environment to decompress a zstd compressed kernel.

The zstd decompression function allows the input and output buffers to
overlap because that is used by x86 kernel decompression.

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-3-nickrterrell@gmail.com
2023-02-21 00:21:02 +03:00
Nick Terrell
4dff89d500
lib: Prepare zstd for preboot environment, improve performance
These changes are necessary to get the build to work in the preboot
environment, and to get reasonable performance:

- Remove a double definition of the CHECK_F macro when the zstd
  library is amalgamated.

- Switch ZSTD_copy8() to __builtin_memcpy(), because in the preboot
  environment on x86 gcc can't inline `memcpy()` otherwise.

- Limit the gcc hack in ZSTD_wildcopy() to the broken gcc version. See
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388.

ZSTD_copy8() and ZSTD_wildcopy() are in the core of the zstd hot loop.
So outlining these calls to memcpy(), and having an extra branch are very
detrimental to performance.

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-2-nickrterrell@gmail.com
2023-02-21 00:21:02 +03:00
Rasmus Villemoes
b8508112d7
lib/zstd/mem.h: replace __inline by inline
Currently, compiler_types.h #defines __inline as inline (and further
#defines inline to automatically attach some attributes), so this does
not change functionality. It serves as preparation for removing the
#define of __inline.

While at it, also remove the __attribute__((unused)) - it's already
included in the definition of the inline macro, and "open-coded"
__attribute__(()) should be avoided.

Since commit a95b37e20db9 (kbuild: get <linux/compiler_types.h> out of
<linux/kconfig.h>), compiler_types.h is automatically included by all
kernel C code - i.e., the definition of inline including the unused
attribute is guaranteed to be in effect whenever ZSTD_STATIC is
expanded.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2023-02-21 00:21:01 +03:00