Low-Compilation-Cost Register Allocation in LLVM-Based Binary Translation

Abstract

Efficiently allocating guest architecture registers to host registers is a crucial technique for improving performance in dynamic binary translation. Translators typically enlarge code regions to reduce context-switching overhead and expose more optimization opportunities. Traditional approaches load guest registers upon entering a code region and save them upon exiting, aiming to keep guest registers in host registers for as long as possible. However, as code regions grow, the increasing number of intermediate variables and the more complex control flow raise the computational cost of register allocation, and with it the compilation overhead. This paper investigates the causes of increased register allocation overhead in LLVM-based binary translators and proposes a low-compilation-cost strategy called LCCRA. LCCRA explicitly restores the semantics of accessing physical guest registers in LLVM IR, facilitating efficient register allocation on mainstream processor architectures. It propagates guest register values using virtual registers within each basic block. For values that need to cross basic block boundaries, virtual registers are introduced as substitutes for load operations where necessary, and redundant store operations are eliminated based on control flow analysis. Since this optimization is performed at the LLVM IR level, it does not interfere with existing LLVM optimizations, making it applicable to various LLVM-based binary translators. We implement our strategy in CrossDBT, an LLVM-based binary translator. Evaluation on the PARSEC benchmark suite, across varying code region sizes and two translation scenarios (x86-64 → x86-64 and x86-64 → AArch64), demonstrates that LCCRA achieves a 5.76% to 7.79% reduction in end-to-end latency and a 69.55% to 74.98% reduction in register allocation time when host registers are limited. In register-abundant scenarios, LCCRA still delivers slightly better end-to-end latency while consistently reducing compilation overhead.
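The within-block part of the idea, propagating guest register values through virtual registers instead of repeated loads and stores, can be illustrated with a small sketch. This is not the LCCRA implementation and does not use LLVM: the tuple-based toy IR, the function name `forward_guest_registers`, and the register names are all hypothetical. It also assumes that nothing inside the block can observe guest state in memory (no calls, faults, or exits mid-block), so every store can safely be deferred to the end of the block. The cross-block handling and control-flow-based store elimination described in the abstract are not modeled.

```python
from typing import Dict, List, Tuple

# Toy IR (hypothetical, for illustration only):
#   ("load",  dst_vreg, guest_reg)      dst_vreg = guest_state[guest_reg]
#   ("store", guest_reg, src_vreg)      guest_state[guest_reg] = src_vreg
#   ("op",    dst_vreg, opcode, *srcs)  any other computation
Instr = Tuple[str, ...]


def forward_guest_registers(block: List[Instr]) -> List[Instr]:
    """Forward guest register values through virtual registers in one block.

    Assumes no instruction in the block can observe guest state in memory,
    so stores are deferred and only the last one per guest register is kept.
    """
    current: Dict[str, str] = {}  # guest reg -> vreg holding its latest value
    alias: Dict[str, str] = {}    # result of a removed load -> forwarded vreg
    dirty: Dict[str, str] = {}    # guest reg -> vreg to write back at block end
    out: List[Instr] = []

    def resolve(name: str) -> str:
        # Map a removed load's result to the vreg that replaced it.
        return alias.get(name, name)

    for ins in block:
        kind = ins[0]
        if kind == "load":
            _, dst, greg = ins
            if greg in current:
                # Value already lives in a vreg: drop the load.
                alias[dst] = current[greg]
            else:
                current[greg] = dst
                out.append(ins)
        elif kind == "store":
            _, greg, src = ins
            src = resolve(src)
            current[greg] = src
            dirty[greg] = src  # overwrites any earlier pending store
        else:
            _, dst, opcode, *srcs = ins
            out.append(("op", dst, opcode, *[resolve(s) for s in srcs]))

    # Write back each modified guest register exactly once.
    for greg, src in dirty.items():
        out.append(("store", greg, src))
    return out


example = [
    ("load", "v0", "rax"),
    ("op", "v1", "add", "v0", "1"),
    ("store", "rax", "v1"),
    ("load", "v2", "rax"),
    ("op", "v3", "mul", "v2", "v2"),
    ("store", "rax", "v3"),
]
optimized = forward_guest_registers(example)
```

In this example the second load of `rax` and the first store to it disappear, leaving one load, two computations, and one store. Fewer memory operations on guest state give the downstream register allocator less to analyze, which is the kind of effect the abstract attributes to LCCRA.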
