UMAC Performance

We report throughput in machine cycles-per-byte (cpb) for UMAC and its underlying hash function UHASH, processing messages of various byte-lengths. Measuring in cycles-per-byte allows efficiency comparisons between processors running at different clock speeds. To convert to bytes-per-second, divide your processor's clock rate in cycles-per-second (Hz) by the reported cycles-per-byte. For example, a 1 GHz processor spending 2.0 cycles-per-byte processes 1e9 / 2.0 = 0.5e9 bytes-per-second (500 MB/sec).
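
The conversion is simple arithmetic. As a minimal sketch (the function name `bytes_per_second` is our own and hypothetical, not part of any UMAC code), it can be written in Python as:

```python
def bytes_per_second(clock_hz: float, cycles_per_byte: float) -> float:
    """Convert a clock rate in Hz and a cycles-per-byte figure to bytes/sec."""
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    # cycles/sec divided by cycles/byte leaves bytes/sec
    return clock_hz / cycles_per_byte


# The example from the text: a 1 GHz processor spending 2.0 cycles per byte.
rate = bytes_per_second(1e9, 2.0)
print(f"{rate / 1e6:.1f} MB/sec")  # prints "500.0 MB/sec"
```

Applied to the tables on this page, for instance, UMAC-2/8 at 0.9 cycles-per-byte on the 700 MHz Pentium III works out to 700e6 / 0.9, roughly 778 MB/sec.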

We report throughput for four message lengths: 43 bytes, 256 bytes, 1500 bytes and 256 kilobytes. Engineers at Cisco Systems tell us that the first three lengths are a good "rule-of-thumb" for the most common message lengths seen on Internet backbones. The fourth length represents the peak (long-message) throughput of most algorithms.

UMAC and UHASH are named in the form UMAC-w/l and UHASH-w/l, where w is the "basic" word size in bytes and l is the output length in bytes. Most architectures handle 4-byte words well, and some support SIMD parallelism on 2-byte words, so we report only on the cases w=2 and w=4.
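
To make the role of w concrete, here is a minimal sketch of an NH-style inner product, the word-oriented universal hash that, as we understand the UMAC design, forms the first layer of UHASH. It is a simplified illustration under stated assumptions, not the specified UHASH-w/l (which adds further hashing layers and key derivation): message and key words of w bytes are added modulo 2^(8w), adjacent pairs are multiplied, and the products are summed modulo 2^(16w). The function name `nh`, the adjacent-word pairing and the little-endian word order are our own assumptions; actual UMAC versions may order words differently.

```python
def nh(message: bytes, key: bytes, w: int) -> int:
    """NH-style hash: sum of pairwise products of (message + key) words.

    w is the word size in bytes (2 or 4 in the tables on this page).
    Simplified sketch: assumes len(message) is a multiple of 2*w and the
    key is at least as long as the message; words are read little-endian.
    """
    word_bits = 8 * w
    word_mod = 1 << word_bits        # words are added mod 2^(8w)
    sum_mod = 1 << (2 * word_bits)   # products are accumulated mod 2^(16w)
    if len(message) % (2 * w) != 0:
        raise ValueError("message length must be a multiple of 2*w bytes")
    if len(key) < len(message):
        raise ValueError("key must be at least as long as the message")

    def word(buf: bytes, i: int) -> int:
        # i-th w-byte word of buf, interpreted little-endian
        return int.from_bytes(buf[i * w:(i + 1) * w], "little")

    total = 0
    for i in range(0, len(message) // w, 2):
        a = (word(message, i) + word(key, i)) % word_mod
        b = (word(message, i + 1) + word(key, i + 1)) % word_mod
        # one w-bit x w-bit multiply producing a 2w-bit product
        total = (total + a * b) % sum_mod
    return total
```

With w=4 each inner step is a 32-bit by 32-bit multiply with a 64-bit result; with w=2 the operands shrink to 16 bits, which is what lets SIMD units work on several pairs at once.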

64-bit output: We recommend 64-bit authentication tags for most purposes. UMAC and UHASH, however, offer flexibility in meeting performance and security needs. Longer or shorter outputs can be produced with proportionally longer or shorter computation time and proportionally stronger or weaker security. To view performance results for UMAC and UHASH at other output lengths, see our "more performance results" page. For a detailed discussion of security considerations, see the UMAC specification.

**64-Bit "Commercial" Security:** 64-bit authentication tags suffice for most authentication needs. Results are in machine cycles-per-byte of message processed. Pentium III results were gathered on a 700 MHz Pentium III, and PowerPC results on a 450 MHz PowerPC 7400.
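
| Algorithm | Type | Output bytes | Pentium III, 43 B | Pentium III, 256 B | Pentium III, 1500 B | Pentium III, 256 KB | PowerPC 7400, 43 B | PowerPC 7400, 256 B | PowerPC 7400, 1500 B | PowerPC 7400, 256 KB |
|---|---|---|---|---|---|---|---|---|---|---|
| UMAC-4/8 | MAC | 8 | 12.6 | 3.0 | 2.1 | 1.8 | | | | |
| UMAC-2/8 | MAC | 8 | 10.7 | 2.2 | 1.2 | 0.9 | | | | |
| UHASH-4/8 | Hash | 8 | 8.3 | 2.4 | 2.0 | 1.8 | 14.3 | 4.7 | 3.9 | 3.6 |
| UHASH-2/8 | Hash | 8 | 7.7 | 1.7 | 1.1 | 0.9 | 7.9 | 1.7 | 1.1 | 0.9 |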

Comparison algorithms: To put UMAC and UHASH performance in context, we also give the performance of several other MACs and hash functions.

**Comparison Algorithms:** Throughputs for some MACs and hashing algorithms over several message lengths. The MAC "hash127-RC6" uses RC6 to supply random bits to hash127 for tag generation. Results were gathered on a 700 MHz Pentium III.
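
| Algorithm | Type | Output bytes | Pentium III, 43 B | Pentium III, 256 B | Pentium III, 1500 B | Pentium III, 4096 B | Pentium III, 256 KB |
|---|---|---|---|---|---|---|---|
| CBC-RC6-MAC | MAC | 16 | | 18.0 | 17.7 | 17.7 | 17.7 |
| HMAC-SHA1 | MAC | 20 | 50.3 | 21.5 | 14.5 | 13.8 | 13.3 |
| hash127-RC6 | MAC | 16 | 17.0 | 6.2 | 4.6 | 4.4 | 5.7 |
| SHA1 | Hash | 20 | 25.4 | 17.2 | 13.8 | 13.5 | 13.3 |
| hash127 | Hash | 16 | 10.8 | 5.2 | 4.4 | 4.3 | 5.7 |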

Notes:

The algorithm hash127 is a polynomial evaluation hash which Daniel Bernstein has shown can be implemented at high speed using IEEE floating point. A paper and implementation can be found at http://cr.yp.to/hash127.html. The hash127 results for 256 KB are slower than those for 4096 bytes because the hash127 interface provided by Bernstein does not allow us to simulate large messages residing in the level 1 cache.
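
A polynomial evaluation hash treats message words as the coefficients of a polynomial and evaluates it at a secret point r modulo a prime. The sketch below uses Horner's rule with exact Python integers modulo the prime 2^127 - 1 (the prime that gives hash127 its name). It is a generic illustration of the technique, not Bernstein's exact hash127 definition or his floating-point implementation; the function name `poly_hash` and the 32-bit little-endian, zero-padded word split are our own assumptions.

```python
P = (1 << 127) - 1  # the Mersenne prime 2^127 - 1


def poly_hash(message: bytes, r: int) -> int:
    """Evaluate m_1*r^(n-1) + ... + m_(n-1)*r + m_n mod P via Horner's rule.

    The message is split into 32-bit little-endian words m_1..m_n,
    zero-padded to a multiple of 4 bytes. r is the secret evaluation
    point, assumed already reduced modulo P.
    """
    padded = message + bytes(-len(message) % 4)
    acc = 0
    for i in range(0, len(padded), 4):
        m = int.from_bytes(padded[i:i + 4], "little")
        acc = (acc * r + m) % P  # Horner step: acc = acc*r + m_i
    return acc
```

This bare version does not bind the message length, so messages that differ only by trailing zero padding or by leading zero words collide; a real construction such as hash127 has to rule that out.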

The Intel Pentium 4 processor provides wider SIMD registers (128 bits) than MMX-compatible processors (64 bits). The Pentium 4 also provides SIMD instructions that multiply 32-bit operands into 64-bit results in parallel. These two additions to the Pentium architecture should speed up our Pentium implementations by at least 50%. Preliminary results indicate a peak throughput of 1.1 cpb for UMAC-4/8 on the Pentium 4.

Blank entries indicate that we have no data for those parameters.

Last updated: 2000.08.29