UMAC Performance (More)

This page provides further performance data. For more information, see our main performance page.

We report throughput in machine cycles per byte (lower is better) for authenticating and hashing messages of various byte lengths. The Pentium III results were gathered on a 700 MHz Pentium III, and the PowerPC results on a 450 MHz PowerPC 7400.
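To relate a cycles-per-byte figure to wall-clock throughput, divide the clock rate by the cost per byte. The following is a minimal sketch; the function name is hypothetical, and it assumes decimal megabytes and a processor running at its nominal clock rate.

```python
def throughput_mb_per_s(cycles_per_byte: float, clock_mhz: float) -> float:
    """Convert a cycles-per-byte cost into megabytes per second.

    clock_mhz * 1e6 cycles/s divided by cycles/byte gives bytes/s;
    dividing by 1e6 gives decimal MB/s, so the two 1e6 factors cancel.
    """
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    return clock_mhz / cycles_per_byte


if __name__ == "__main__":
    # A cost of 1.0 cycle per byte on the 700 MHz Pentium III.
    print(throughput_mb_per_s(1.0, 700.0))  # 700.0
```

Under these assumptions, halving the cycles-per-byte figure doubles the throughput at a fixed clock rate.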

UMAC and UHASH variants are named UMAC-w/l and UHASH-w/l, where w indicates the "basic" word size used (in bytes) and l indicates the length of the output (in bytes). For example, UMAC-4/8 uses 4-byte words and produces an 8-byte (64-bit) tag. Most architectures manipulate 4-byte words well, while some architectures support SIMD parallelism for 2-byte words.
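The naming scheme can be decoded mechanically. The sketch below is illustrative only: `VariantName` and `parse_variant` are hypothetical helpers, not part of any UMAC distribution.

```python
import re
from typing import NamedTuple


class VariantName(NamedTuple):
    family: str        # "UMAC" or "UHASH"
    word_bytes: int    # w: basic word size in bytes
    output_bytes: int  # l: output length in bytes


# Hypothetical pattern: matches names such as "UMAC-4/8" or "UHASH-2/16".
_NAME_RE = re.compile(r"^(UMAC|UHASH)-(\d+)/(\d+)$")


def parse_variant(name: str) -> VariantName:
    """Split a variant name into its family, word size, and output length."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a UMAC/UHASH variant name: {name!r}")
    family, w, l = match.groups()
    return VariantName(family, int(w), int(l))


if __name__ == "__main__":
    v = parse_variant("UMAC-4/8")
    # Word size and tag length in bits.
    print(v.family, v.word_bytes * 8, v.output_bytes * 8)  # UMAC 32 64
```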

We advocate the use of 64-bit authentication tags for most purposes. UMAC and UHASH, however, provide flexibility in meeting performance and security needs. Longer or shorter outputs can be produced with proportionally longer or shorter computation time and proportionally stronger or weaker security. For a detailed discussion of security considerations, see the UMAC specification.
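The proportionality between output length and computation time can be checked against the measurements on this page. The sketch below copies the Pentium III figures for 256 kb messages from the UHASH-2 rows of the tables and divides each by its output length; the helper name is hypothetical, and the check covers computation time only, not security.

```python
# Pentium III cost at 256 kb, in cycles per byte, copied from the UHASH-2
# rows of the tables on this page. Keys are output lengths in bytes.
UHASH2_CPB = {4: 0.5, 8: 0.9, 12: 1.4, 16: 1.9}


def cost_per_output_byte(cpb_by_output):
    """Return cycles per message byte divided by output length."""
    return {n: cpb / n for n, cpb in cpb_by_output.items()}


if __name__ == "__main__":
    for n, ratio in sorted(cost_per_output_byte(UHASH2_CPB).items()):
        print(f"{n:2d}-byte output: {ratio:.3f} cycles per byte per output byte")
```

If the cost were exactly proportional to output length, every ratio would be identical; for these figures the ratios all fall between 0.11 and 0.13.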

32-Bit "Light" Security: Authentication tags of 32 bits provide a useful security guarantee when a forgery is not catastrophic and in computationally constrained environments. Tags of this length have been standard for retail banking, but longer tags are safer for many applications.
 

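| Algorithm | Type | Output bytes | Pentium III, 43 b | Pentium III, 256 b | Pentium III, 1500 b | Pentium III, 256 kb | PowerPC 7400, 43 b | PowerPC 7400, 256 b | PowerPC 7400, 1500 b | PowerPC 7400, 256 kb |
|---|---|---|---|---|---|---|---|---|---|---|
| UMAC-4/4 | MAC | 4 | 7.5 | 1.6 | 1.1 | 0.9 | | | | |
| UMAC-2/4 | MAC | 4 | 6.8 | 1.3 | 0.7 | 0.5 | | | | |
| UHASH-4/4 | Hash | 4 | 5.2 | 1.3 | 1.0 | 0.9 | 8.3 | 2.7 | 2.3 | 2.1 |
| UHASH-2/4 | Hash | 4 | 4.9 | 1.0 | 0.6 | 0.5 | 4.7 | 0.9 | 0.7 | 0.5 |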

64-Bit "Commercial" Security: Authentication tags of 64 bits are good for most authentication needs.
 

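| Algorithm | Type | Output bytes | Pentium III, 43 b | Pentium III, 256 b | Pentium III, 1500 b | Pentium III, 256 kb | PowerPC 7400, 43 b | PowerPC 7400, 256 b | PowerPC 7400, 1500 b | PowerPC 7400, 256 kb |
|---|---|---|---|---|---|---|---|---|---|---|
| UMAC-4/8 | MAC | 8 | 12.6 | 3.0 | 2.1 | 1.8 | | | | |
| UMAC-2/8 | MAC | 8 | 10.7 | 2.2 | 1.2 | 0.9 | | | | |
| UHASH-4/8 | Hash | 8 | 8.3 | 2.4 | 2.0 | 1.8 | 14.3 | 4.7 | 3.9 | 3.6 |
| UHASH-2/8 | Hash | 8 | 7.7 | 1.7 | 1.1 | 0.9 | 7.9 | 1.7 | 1.1 | 0.9 |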

96-Bit "Extra" Security: Authentication tags of 96 bits provide an extra margin of security that is not generally needed.
 

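| Algorithm | Type | Output bytes | Pentium III, 43 b | Pentium III, 256 b | Pentium III, 1500 b | Pentium III, 256 kb | PowerPC 7400, 43 b | PowerPC 7400, 256 b | PowerPC 7400, 1500 b | PowerPC 7400, 256 kb |
|---|---|---|---|---|---|---|---|---|---|---|
| UMAC-4/12 | MAC | 12 | 18.7 | 4.7 | 3.1 | 2.6 | | | | |
| UMAC-2/12 | MAC | 12 | 15.8 | 3.3 | 1.9 | 1.4 | | | | |
| UHASH-4/12 | Hash | 12 | 12.6 | 3.7 | 3.1 | 2.6 | 23.1 | 7.6 | 6.1 | 5.6 |
| UHASH-2/12 | Hash | 12 | 9.7 | 2.4 | 1.7 | 1.4 | 11.0 | 2.4 | 1.7 | 1.3 |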

128-Bit "Extremist" Security: Authentication tags of 128 bits provide a level of authentication beyond what is generally considered necessary.
 

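| Algorithm | Type | Output bytes | Pentium III, 43 b | Pentium III, 256 b | Pentium III, 1500 b | Pentium III, 256 kb | PowerPC 7400, 43 b | PowerPC 7400, 256 b | PowerPC 7400, 1500 b | PowerPC 7400, 256 kb |
|---|---|---|---|---|---|---|---|---|---|---|
| UMAC-4/16 | MAC | 16 | 21.0 | 5.7 | 4.1 | 3.5 | | | | |
| UMAC-2/16 | MAC | 16 | 17.9 | 4.1 | 2.4 | 1.9 | | | | |
| UHASH-4/16 | Hash | 16 | 15.2 | 4.8 | 3.9 | 3.5 | 26.2 | 9.2 | 7.8 | 7.2 |
| UHASH-2/16 | Hash | 16 | 12.0 | 3.1 | 2.2 | 1.9 | 13.5 | 8.7 | 1.6 | 1.6 |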

Comparison Algorithms: Throughput, in cycles per byte, of several MACs and hash algorithms over a range of message lengths. The MAC "hash127-RC6" uses RC6 to supply random bits to hash127 for tag generation. Results were gathered on a 700 MHz Pentium III.
 

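| Algorithm | Type | Output bytes | Pentium III, 43 b | Pentium III, 256 b | Pentium III, 1500 b | Pentium III, 4096 b | Pentium III, 256 kb |
|---|---|---|---|---|---|---|---|
| CBC-RC6-MAC | MAC | 16 | | 18.0 | 17.7 | 17.7 | 17.7 |
| HMAC-SHA1 | MAC | 20 | 50.3 | 21.5 | 14.5 | 13.8 | 13.3 |
| hash127-RC6 | MAC | 16 | 17.0 | 6.2 | 4.6 | 4.4 | 5.7 |
| SHA1 | Hash | 20 | 25.4 | 17.2 | 13.8 | 13.5 | 13.3 |
| hash127 | Hash | 16 | 10.8 | 5.2 | 4.4 | 4.3 | 5.7 |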

Notes:

The algorithm hash127 is a polynomial-evaluation hash that Daniel Bernstein has shown can be implemented at high speed using IEEE floating point. A paper and implementation can be found at http://cr.yp.to/hash127.html. The hash127 results for 256 kb are slower because the hash127 interface does not allow us to simulate large messages residing in level 1 cache.
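For readers unfamiliar with polynomial-evaluation hashing, the general idea can be sketched with Horner's rule over the prime 2^127 - 1. This is not the hash127 definition or its floating-point implementation: the function name is hypothetical, and the chunking of the message into integers and the handling of the key r are simplifying assumptions.

```python
P = (1 << 127) - 1  # the Mersenne prime 2^127 - 1


def poly_hash(chunks, r):
    """Evaluate chunks[0]*r^n + ... + chunks[n-1]*r modulo P.

    Horner's rule: each step adds the next chunk to the accumulator and
    multiplies by the secret point r, so one modular multiplication is
    spent per chunk. Chunk encoding is an assumption of this sketch.
    """
    acc = 0
    for c in chunks:
        acc = ((acc + c) * r) % P
    return acc


if __name__ == "__main__":
    # 1*5^3 + 2*5^2 + 3*5 = 125 + 50 + 15
    print(poly_hash([1, 2, 3], 5))  # 190
```

Python's arbitrary-precision integers keep the sketch short; a fast implementation would have to carry out the same modular arithmetic with machine-sized operations.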

The Intel Pentium 4 processor provides 128-bit SIMD registers, wider than the 64-bit registers provided by MMX-compatible processors. The Pentium 4 also provides SIMD instructions allowing parallel multiplication of 32-bit operands into 64-bit results. These two additions to the Pentium architecture should speed up our Pentium implementations by at least 50%. Preliminary results indicate a peak throughput of 1.1 cycles per byte for UMAC-4/8 on the Pentium 4.
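As a rough consistency check, the preliminary Pentium 4 figure can be compared with the Pentium III figure for UMAC-4/8 on 256 kb messages (1.8 cycles per byte in the table above). The sketch assumes that a "50% speedup" means 1.5 times the throughput per cycle and that the two figures are directly comparable; both are interpretations, not statements from the measurements, and the function names are hypothetical.

```python
def cycles_after_speedup(cpb: float, speedup_percent: float) -> float:
    """Cycles per byte after per-cycle throughput rises by speedup_percent."""
    return cpb / (1.0 + speedup_percent / 100.0)


def speedup_percent(old_cpb: float, new_cpb: float) -> float:
    """Percentage gain in per-cycle throughput when cost drops to new_cpb."""
    return (old_cpb / new_cpb - 1.0) * 100.0


if __name__ == "__main__":
    P3_UMAC_4_8 = 1.8  # Pentium III, 256 kb, from the 64-bit table
    P4_UMAC_4_8 = 1.1  # preliminary Pentium 4 peak quoted above
    print(round(cycles_after_speedup(P3_UMAC_4_8, 50.0), 2))        # 1.2
    print(round(speedup_percent(P3_UMAC_4_8, P4_UMAC_4_8), 1))      # 63.6
```

Under these assumptions, a 50% speedup corresponds to 1.2 cycles per byte, so the preliminary 1.1 figure is consistent with the "at least 50%" estimate.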

 

Last updated: 2000.08.29