We provide throughput rates, measured in machine cycles-per-byte, for the processing of messages of various byte-lengths by UMAC and its underlying hash function UHASH. We use cycles-per-byte because it allows efficiency comparisons between processors running at different clock speeds. To convert to bytes-per-second, divide your processor's clock rate in cycles-per-second (Hz) by the reported cycles-per-byte. For example, a 1 GHz processor spending 2.0 cycles-per-byte processes 1e9 / 2.0 = 0.5e9 bytes-per-second (500 MB/sec).
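The conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names `bytes_per_second` and `seconds_per_message` are our own for this example and are not part of any UMAC implementation, and the per-message estimate assumes the cycles-per-byte figure applies uniformly to the whole message.

```python
def bytes_per_second(clock_hz: float, cycles_per_byte: float) -> float:
    """Convert a cycles-per-byte figure to bytes-per-second for a given clock rate."""
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    return clock_hz / cycles_per_byte


def seconds_per_message(clock_hz: float, cycles_per_byte: float, message_bytes: int) -> float:
    """Estimate the time to process one message (assumes a uniform per-byte cost)."""
    return message_bytes * cycles_per_byte / clock_hz


if __name__ == "__main__":
    # The worked example from the text: a 1 GHz processor at 2.0 cycles-per-byte.
    rate = bytes_per_second(1e9, 2.0)
    print(f"{rate / 1e6:.0f} MB/sec")
    # A hypothetical 1500-byte message at the same rate.
    print(f"{seconds_per_message(1e9, 2.0, 1500) * 1e6:.1f} microseconds")
```

Because the cycles-per-byte figure differs by message length, the rate measured for the length closest to your traffic is the appropriate one to plug in.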
We report throughput for four message lengths: 43 bytes, 256 bytes, 1500 bytes and 256 kilobytes. We are told by engineers at Cisco Systems that the first three message lengths form a good "rule-of-thumb" for the most common message lengths seen on Internet backbones. The fourth length represents peak throughput for most algorithms.
UMAC and UHASH are named in the manner UMAC-w/l and UHASH-w/l, where w indicates the "basic" word-size used (in bytes) and l indicates the length of the output (in bytes). Most architectures manipulate 4-byte words well, while some architectures support SIMD parallelism for 2-byte words, so we report only on the cases where w=2 and w=4.
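As an illustration of the naming scheme, the following Python sketch splits a name such as UMAC-4/8 into its parts. The `Variant` type and `parse_variant` function are hypothetical helpers written for this example only; they are not part of the UMAC specification or of any reference code.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    algorithm: str      # "UMAC" or "UHASH"
    word_bytes: int     # w: basic word size, in bytes
    output_bytes: int   # l: output length, in bytes


# Matches names of the form UMAC-w/l or UHASH-w/l, e.g. "UMAC-4/8".
_NAME = re.compile(r"^(UMAC|UHASH)-(\d+)/(\d+)$")


def parse_variant(name: str) -> Variant:
    """Split a name like 'UMAC-4/8' into algorithm, word size and output length."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a UMAC-w/l or UHASH-w/l name: {name!r}")
    return Variant(match.group(1), int(match.group(2)), int(match.group(3)))


if __name__ == "__main__":
    v = parse_variant("UMAC-4/8")
    # Report both sizes in bits: 4-byte words and an 8-byte output.
    print(v.algorithm, v.word_bytes * 8, v.output_bytes * 8)
```

Under this reading, UMAC-4/8 uses 4-byte (32-bit) words and produces an 8-byte (64-bit) output.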
64-bit output: We advocate the use of 64-bit authentication tags for most purposes. UMAC and UHASH, however, provide flexibility in meeting performance and security needs. Longer or shorter outputs can be produced with proportionally longer or shorter computation time and proportionally stronger or weaker security. To view performance results for UMAC and UHASH for other output lengths, see our "more performance results" page. For a detailed discussion of security considerations, see the UMAC specification.
|Pentium III||PowerPC 7400|
|43 b||256 b||1500 b||256 kb||43 b||256 b||1500 b||256 kb|
Comparison algorithms: To allow comparison of UMAC and UHASH with other algorithms, we give the performance of those algorithms here.
|43 b||256 b||1500 b||4096 b||256 kb|
The algorithm hash127 is a polynomial-evaluation hash that Daniel Bernstein has shown can be implemented at high speed using IEEE floating point. A paper and implementation can be found at http://cr.yp.to/hash127.html. The hash127 results for 256 kb are slower because the hash127 interface provided by Bernstein does not allow us to simulate large messages residing in level 1 cache.
The Intel Pentium 4 processor provides wider SIMD registers (128 bits) than those provided by MMX-compatible processors (64 bits). The Pentium 4 also provides SIMD instructions allowing parallel multiplication of 32-bit operands into 64-bit results. These two additions to the Pentium architecture should speed our Pentium implementations by at least 50%. Preliminary results indicate a peak throughput of 1.1 cycles-per-byte (cpb) for UMAC-4/8 on the Pentium 4.
Blank entries indicate that we have no data for those parameters.
Last updated: 2000.08.29