Turbo Base64:Fastest Base64 SIMD/Neon
Fastest Base64 SIMD Encoding library
- 100% C (C++ headers), as simple as memcpy.
- No other base64 library encode or decode faster
- :sparkles: Scalar can be faster than other SSE or ARM Neon based base64 libraries
- :new: (2019.12) Turbo Base64 SSE faster than other SSE/AVX/AVX2! base64 library
- :new: (2019.12) Fastest AVX2 implementation, damn near to memcpy
- TurboBase64 AVX2 decoding is ~2x faster than other AVX2 libs.
- :new: (2020.1) For short string, TurboBase64 is 3-4 times faster than other libs.
- :new: (2019.1) Fastest ARM Neon base64
- :+1: Dynamic CPU detection and JIT scalar/sse/avx/avx2 switching
- Base64 robust error checking, optimzed for long+short strings
- Portable library, 32/64 bits, SSE/AVX/AVX2, ARM Neon, Power9 Altivec
- OS:Linux amd64, arm64, Power9, MacOs, s390x. Windows: Mingw, visual c++
- Big+Little endian
-
Ready and simple to use library, no armada of files, no hassles dependencies
Benchmark incl. the best SIMD Base64 libs:
- with TurboBench
- Single thread
- Including base64 error checking
- Small file + realistic and practical (no PURE cache) benchmark with large binary game assets corpus pd3d.tar (20 MB)
- Unlike other benchmarks, the best of the best scalar+simd libraries are included
Benchmark Intel CPU: Skylake i7-6700 3.4GHz gcc 9.2
E Size | ratio% | E MB/s | D MB/s | Name | 1MB binary 2019.12 |
---|---|---|---|---|---|
1333336 | 133.3 | 16329 | 26032 | TB64avx2 | Turbo Base64 avx2 |
1333336 | 133.3 | 9920 | 16207 | TB64avx | Turbo Base64 avx |
1333336 | 133.3 | 8891 | 12132 | TB64sse | Turbo Base64 sse |
1333336 | 133.3 | 12973 | 13986 | fb64avx2 | Fastbase64 avx2 |
1333336 | 133.3 | 12962 | 13970 | b64avx2 | Base64 avx2 |
1333336 | 133.3 | 8392 | 8717 | b64avx | Base64 avx |
1333336 | 133.3 | 6899 | 7477 | b64sse | Base64 sse41 |
1000000 | 100.0 | 29934 | 29992 | memcpy |
E Size | ratio% | E MB/s | D MB/s | Name | 20MB binary 2019.12 |
---|---|---|---|---|---|
26666668 | 133.3 | 8920 | 12706 | TB64avx2 | Turbo Base64 avx2 |
26666668 | 133.3 | 8466 | 12401 | TB64avx | Turbo Base64 avx |
26666668 | 133.3 | 8103 | 11291 | TB64sse | Turbo Base64 sse |
26666668 | 133.3 | 7795 | 10452 | fb64avx2 | Fastbase64 avx2 |
26666668 | 133.3 | 7809 | 10381 | b64avx2 | Base64 avx2 |
26666668 | 133.3 | 7161 | 8172 | b64avx | Base64 avx |
26666668 | 133.3 | 6420 | 7042 | b64sse | Base64 sse41 |
26666668 | 133.3 | 3925 | 4281 | TB64x | Turbo Base64 scalar |
26666668 | 133.3 | 1840 | 3320 | b64plain | Base64 plain |
26666668 | 133.3 | 1908 | 2752 | TB64s | Turbo Base64 scalar |
26666668 | 133.3 | 1522 | 3198 | chrome | Google Chrome base64 |
26666668 | 133.3 | 1871 | 1612 | fb64plain | FastBase64 plain |
26666668 | 133.3 | 1122 | 816 | quicktime | Apple Quicktime base64 |
27083334 | 135.4 | 1100 | 178 | linux | Linux base64 |
20000000 | 100.0 | 14432 | 14464 | memcpy |
TurboBase64 vs. Base64 for short strings (incl. checking) |String length|E MB/s|D MB/s|Name|1MB short strings 2020.01 | |————:|——–:|——–:|—————-|—————-| | 4 - 16 |1682|1843|TB64avx2|Turbo Base64 avx2| | |559|622|b64avx2|Base64 avx2| | 8 - 32 |2835|2965|TB64avx2|Turbo Base64 avx2| | |838|818|b64avx2|Base64 avx2| | 16 - 64 |4623|4893|TB64avx2|Turbo Base64 avx2| | |1555|1229|b64avx2|Base64 avx2| | 32 - 128 |7218|7533|TB64avx2|Turbo Base64 avx2| | |2955|2365|b64avx2|Base64 avx2|
Benchmark ARM Neon: ARMv8 A73-ODROID-N2 1.8GHz (clang 6.0)
E Size | ratio% | E MB/s | D MB/s | Name | 30MB binary 2019.12 |
---|---|---|---|---|---|
40000000 | 133.3 | 2026 | 1650 | TB64neon | Turbo Base64 Neon |
40000000 | 133.3 | 1795 | 1285 | b64neon64 | Base64 Neon |
40000000 | 133.3 | 1270 | 1095 | TB64x | Turbo Base64 scalar |
40000000 | 133.3 | 695 | 965 | TB64s | Turbo Base64 scalar |
40000000 | 133.3 | 512 | 782 | fb64neon | Fastbase64 SIMD Neon |
40000000 | 133.3 | 565 | 460 | Chrome | Google Chrome base64 |
40000000 | 133.3 | 642 | 614 | b64plain | Base64 plain |
40000000 | 133.3 | 506 | 548 | fb64plain | Fastbase64 plain |
40500000 | 135.4 | 314 | 91 | Linux | Linux base64 |
30000000 | 100.0 | 3820 | 3834 | memcpy |
(bold = pareto in category) MB=1.000.000
(E/D) : Encode/Decode
Compile: (Download or clone Turbo Base64 SIMD)
git clone git://github.com/powturbo/TurboBase64.git
make
Usage: (Benchmark App)
./tb64app file
or
./tb64app
Function usage:
static inline unsigned turbob64len(unsigned n)
Base64 output length after encodingunsigned tb64enc(const unsigned char *in, unsigned inlen, unsigned char *out)
Encode binary input ‘in’ buffer into base64 string ‘out’
with automatic cpu detection for simd and switch (sse/avx2/scalar
in : Input buffer to encode
inlen : Length in bytes of input buffer
out : Output buffer
return value: Length of output buffer
Remark : byte ‘zero’ is not written to end of output stream
Caller must add 0 (out[outlen] = 0) for a null terminated stringunsigned tb64dec(const unsigned char *in, unsigned inlen, unsigned char *out)
Decode base64 input ‘in’ buffer into binary buffer ‘out’
in : input buffer to decode
inlen : length in bytes of input buffer
out : output buffer
return value: >0 output buffer length
0 Error (invalid base64 input or input length = 0)
Environment:
OS/Compiler (32 + 64 bits):
- Windows: Visual C++ (2017)
- Windows: MinGW-w64 makefile
- Linux amd/intel: GNU GCC (>=4.6)
- Linux amd/intel: Clang (>=3.2)
- Linux arm: aarch64 ARMv8 Neon: gcc (>=6.3)
- Linux arm: aarch64 ARMv8 Neon: clang (>=6.0)
- MaxOS: XCode (>=9)
- PowerPC ppc64le: gcc (>=8.0) incl. SIMD Altivec
References:
* SIMD Base64 publications:
- :green_book:Faster Base64 Encoding and Decoding Using AVX2 Instructions
- :green_book:RFC 4648:The Base16, Base32, and Base64 Data Encodings
Last update: 03 MAR 2020