UMAC Performance

We report throughput in machine cycles-per-byte for UMAC and its underlying hash function UHASH when processing messages of various byte lengths. We use cycles-per-byte because it allows efficiency comparisons between processors running at different clock speeds. To convert to bytes-per-second, divide your processor's clock rate in cycles-per-second (Hz) by the reported cycles-per-byte. For example, a 1 GHz processor spending 2.0 cycles-per-byte processes 1e9 / 2.0 = 0.5e9 bytes-per-second (500 MB/sec).
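The conversion can be sketched in a few lines of Python. The function name bytes_per_second is our own choice for illustration and is not part of any UMAC distribution.

```python
def bytes_per_second(clock_hz: float, cycles_per_byte: float) -> float:
    """Convert a cycles-per-byte figure into throughput at a given clock rate."""
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    return clock_hz / cycles_per_byte


# The worked example from the text: a 1 GHz processor at 2.0 cycles-per-byte.
example = bytes_per_second(1e9, 2.0)
print(f"{example / 1e6:.0f} MB/sec")  # prints "500 MB/sec"
```

The same call with your own clock rate and any cycles-per-byte figure from this page gives the corresponding bytes-per-second estimate.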

We report throughput for four message lengths: 43 bytes, 256 bytes, 1500 bytes and 256 kilobytes. Engineers at Cisco Systems tell us that the first three lengths are a good rule of thumb for the most common message lengths seen on Internet backbones. The fourth length represents peak throughput for most algorithms.

UMAC and UHASH are named in the manner UMAC-w/l and UHASH-w/l, where w indicates the "basic" word-size used (in bytes) and l indicates the length of the output (in bytes). Most architectures manipulate 4-byte words well, while some architectures support SIMD parallelism for 2-byte words, so we report only on the cases where w=2 and w=4.
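As an illustration of the naming scheme, the following Python sketch splits a name such as UMAC-4/8 into its parts. The Variant type and parse_variant function are hypothetical helpers written for this page, not part of the UMAC reference code.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    algorithm: str     # "UMAC" or "UHASH"
    word_bytes: int    # w: basic word size, in bytes
    output_bytes: int  # l: output length, in bytes


# Matches names of the form UMAC-w/l or UHASH-w/l.
_NAME = re.compile(r"^(UMAC|UHASH)-(\d+)/(\d+)$")


def parse_variant(name: str) -> Variant:
    """Split a name like 'UMAC-4/8' into algorithm, word size and output length."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a UMAC-w/l or UHASH-w/l name: {name!r}")
    return Variant(match.group(1), int(match.group(2)), int(match.group(3)))


v = parse_variant("UMAC-4/8")
# 4-byte words are 32 bits wide; an 8-byte output is a 64-bit tag.
print(v.algorithm, v.word_bytes * 8, v.output_bytes * 8)  # prints "UMAC 32 64"
```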

64-bit output: We advocate the use of 64-bit authentication tags for most purposes. UMAC and UHASH, however, provide flexibility in meeting performance and security needs. Longer or shorter outputs can be produced with proportionally longer or shorter computation time and proportionally stronger or weaker security. To view performance results for UMAC and UHASH for other output lengths, see our "more performance results" page. For a detailed discussion of security considerations, see the UMAC specification.

64-Bit "Commercial" Security: 64-bit authentication tags are good for most authentication needs. Results are in machine cycles-per-byte of message processed. Pentium III results were gathered on a 700 MHz Pentium III. The PowerPC results were gathered on a 450 MHz PowerPC.

                               Pentium III                       PowerPC 7400
Algorithm  Type  Output bytes    43 b   256 b  1500 b  256 kb      43 b   256 b  1500 b  256 kb
UMAC-4/8   MAC   8               12.6     3.0     2.1     1.8
UMAC-2/8   MAC   8               10.7     2.2     1.2     0.9
UHASH-4/8  Hash  8                8.3     2.4     2.0     1.8      14.3     4.7     3.9     3.6
UHASH-2/8  Hash  8                7.7     1.7     1.1     0.9       7.9     1.7     1.1     0.9

Comparison algorithms: To put UMAC and UHASH performance in context, we give the performance of several other MACs and hash functions here.

Comparison Algorithms: Throughputs for some MACs and hashing algorithms over several message lengths. The MAC "hash127-RC6" uses RC6 to supply random bits to hash127 for tag generation. Results were gathered on a 700 MHz Pentium III.

                                 Pentium III
Algorithm    Type  Output bytes    43 b   256 b  1500 b  4096 b  256 kb
CBC-RC6-MAC  MAC   16                      18.0    17.7    17.7    17.7
HMAC-SHA1    MAC   20              50.3    21.5    14.5    13.8    13.3
hash127-RC6  MAC   16              17.0     6.2     4.6     4.4     5.7
SHA1         Hash  20              25.4    17.2    13.8    13.5    13.3
hash127      Hash  16              10.8     5.2     4.4     4.3     5.7
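
To read the two tables side by side, one can divide the cycles-per-byte of a comparison algorithm by that of a UMAC variant at each message length. The Python sketch below does this for HMAC-SHA1 against UMAC-2/8, using the Pentium III figures copied from the tables above. The function name speed_ratios is our own, and the comparison is only indicative because the two MACs produce tags of different lengths (20 bytes versus 8 bytes).

```python
# Pentium III cycles-per-byte figures copied from the tables on this page.
LENGTHS = ["43 b", "256 b", "1500 b", "256 kb"]
UMAC_2_8 = [10.7, 2.2, 1.2, 0.9]
HMAC_SHA1 = [50.3, 21.5, 14.5, 13.3]  # 4096 b column omitted: no UMAC figure for it


def speed_ratios(baseline, candidate):
    """Return baseline/candidate per message length (higher means candidate is faster)."""
    return [b / c for b, c in zip(baseline, candidate)]


ratios = dict(zip(LENGTHS, speed_ratios(HMAC_SHA1, UMAC_2_8)))
for length, ratio in ratios.items():
    print(f"{length:>7}: HMAC-SHA1 takes {ratio:.1f}x the cycles of UMAC-2/8")
```

On these figures the ratio grows with message length, from roughly 4.7 at 43 bytes to roughly 14.8 at 256 kilobytes.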

Notes:

The algorithm hash127 is a polynomial evaluation hash which Daniel Bernstein has shown can be implemented for high speed using IEEE floating point. A paper and implementation can be found at http://cr.yp.to/hash127.html. The hash127 results for 256 kb are slower because the hash127 interface provided by Bernstein does not allow us to simulate large messages residing in level 1 cache.

The Intel Pentium 4 processor provides 128-bit SIMD registers, twice as wide as the 64-bit registers of MMX-compatible processors. The Pentium 4 also provides SIMD instructions that multiply 32-bit operands in parallel into 64-bit results. These two additions to the Pentium architecture should speed our Pentium implementations by at least 50%. Preliminary results indicate a peak throughput of 1.1 cycles-per-byte for UMAC-4/8 on the Pentium 4.

Blank entries indicate that we have no data for those parameters.

 

Last updated: 2000.08.29