Modular multiplication is a fundamental and performance
determining operation in various public-key cryptosystems.
High-performance modular multipliers on FPGAs are commonly realized by several small-sized multipliers, an adder tree for summing up the digit-products, and a reduction circuit. While small-sized multipliers are available in pre-fabricated high-speed DSP slices, the adder tree and the reduction circuit are implemented in standard logic. The latter operations represent the performance bottleneck to highperformance
implementations. Previous works attempted to minimize the critical path of the adder tree by rearranging digit-products on digit-level. We report improved performance by regrouping digit-products on bit-level, while incorporating the reduction for Mersenne primes. Our approach leads to very fast modular multipliers, whose latency and throughput characteristics outperform all previous results. We formalize our approach and provide algorithms to automatically generate high-performance modular multipliers for arbitrary Mersenne
primes from any small-sized multipliers.
«
Modular multiplication is a fundamental and performance
determining operation in various public-key cryptosystems.
High-performance modular multipliers on FPGAs are commonly realized by several small-sized multipliers, an adder tree for summing up the digit-products, and a reduction circuit. While small-sized multipliers are available in pre-fabricated high-speed DSP slices, the adder tree and the reduction circuit are implemented in standard logic. The latter operations represent the performa...
»