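RISCV LLVM Backend Project Report - DailinH/llvm-riscv-backend GitHub Wiki
This project is part of the comprehensive course design project for the class of 2016 in Computer Science and Engineering at Southeast University. @Vigilans and I are responsible for the compiler, which we develop with a separate front end and back end. My part is LLVM IR -> RISCV Assembly, and the whole project is expected to be finished by mid-January 2020. Before starting on the backend itself, I first try cross-compiling to RISC-V instructions with clang/llvm on Linux. Since June 2019, RISC-V has been an official compilation target in LLVM, and the existing library supports RV32I and RV64I. My work is mainly to implement subtarget support for the RISC-V C Standard Extension.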
graph LR
A[code] --> |LLVM Frontend|B[LLVM IR]
B --> |LLVM Backend|D[Target Machine Assembly: RISCV]
Actual structure of the backend pipeline:
graph LR
A[IR] --> B[SelectionDAG]
B --> C[MachineDAG]
C --> D[MachineInst]
D --> E[MCInst]
Operating system: (Amazon AWS) Ubuntu 16.04.6 LTS xenial
$ sudo apt-get update
$ sudo apt-get -y dist-upgrade
$ sudo apt-get -y install \
> binutils build-essential libtool texinfo \
> gzip zip unzip patchutils curl git \
> make cmake ninja-build automake bison flex gperf \
> grep sed gawk python bc \
> zlib1g-dev libexpat1-dev libmpc-dev \
> libglib2.0-dev libfdt-dev libpixman-1-dev
Note that the minimum required cmake version here is 3.4.3 [3].
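When building LLVM/Clang, the LLVM_TARGET_ARCH option defaults to host, i.e. code is generated for the native machine. We first try compiling to native assembly and running the result.
Test code hello.c:
#include <stdio.h>
int main() {
  printf("Hello RISCV!\n");
  return 0;
}
Compile it with clang by running:
$ clang -O3 hello.c -c -S -o hello.bc
The result is shown below (because of -S, the output is textual assembly despite the .bc file name).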
.text
.file "hello.c"
.globl main
.align 16, 0x90
.type main,@function
main: # @main
.cfi_startproc
# BB#0:
pushq %rax
.Ltmp0:
.cfi_def_cfa_offset 16
movl $.Lstr, %edi
callq puts
xorl %eax, %eax
popq %rcx
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
.type .Lstr,@object # @str
.section .rodata.str1.1,"aMS",@progbits,1
.Lstr:
.asciz "Hello RISCV!"
.size .Lstr, 13
.ident "clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)"
.section ".note.GNU-stack","",@progbits
Running clang -S -emit-llvm hello.c generates the file hello.ll with the following content:
; ModuleID = 'hello.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
@.str = private unnamed_addr constant [14 x i8] c"Hello RISCV!\0A\00", align 1
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, i32* %1, align 4
%2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i32 0, i32 0))
ret i32 0
}
declare i32 @printf(i8*, ...) #1
attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = !{!"clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)"}
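The official LLVM documentation provides a tutorial on building a backend that translates LLVM IR to a given assembly language [4]. Examples are also available in llvm/lib/Target.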
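RVC is a RISC-V extension that can be added to any of the RV32, RV64 and RV128 base ISAs; its purpose is to reduce static and dynamic code size. Typically 50%-60% of the RISC-V instructions in a program can be replaced with RVC instructions, which reduces the generated code size by 25%-30%. RVC uses 16-bit instructions and defines eight compressed instruction formats, as shown in the figure.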
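The 25%-30% figure follows directly from the instruction widths. The sketch below is only a back-of-the-envelope check, assuming every uncompressed instruction is 32 bits wide and every compressed one is 16 bits wide; the function name `code_size_reduction` is made up for this illustration.

```python
def code_size_reduction(compressed_fraction: float) -> float:
    """Fraction of code size saved when `compressed_fraction` of the
    32-bit instructions are replaced by 16-bit RVC instructions."""
    if not 0.0 <= compressed_fraction <= 1.0:
        raise ValueError("compressed_fraction must be between 0 and 1")
    full_bits = 32  # width of a standard RISC-V instruction
    rvc_bits = 16   # width of an RVC instruction
    # Average instruction width after compression.
    new_width = (compressed_fraction * rvc_bits
                 + (1.0 - compressed_fraction) * full_bits)
    return 1.0 - new_width / full_bits


for fraction in (0.5, 0.6):
    saved = code_size_reduction(fraction)
    print(f"{fraction:.0%} compressed -> {saved:.0%} smaller")
```

Replacing half of the instructions saves a quarter of the bytes, and replacing 60% saves 30%, which matches the range quoted above.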

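- Create a subclass of the TargetMachine class that describes the characteristics of the target machine
- Describe the register set of the target machine with TableGen
- Describe the instruction set of the target machine
- Describe the selection and conversion from the DAG representation of LLVM IR to the target instruction set
- Write a subclass of AsmPrinter that converts the LLVM code into RISC-V assembly for the GNU assembler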
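The project files live in /llvm/lib/Target/miniRISCV/. While the miniRISCV target is unfinished, build with cmake -G Ninja -DLLVM_ENABLE_PROJECTS=clang -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=miniRISCV ../llvm, so that the target does not have to be added to the regular target list, where it would cause confusion.
First we need to create a TargetMachine subclass in /llvm/lib/Target/miniRISCV/. The target machine exposes target-specific information through get*Info methods, including the instruction set (getInstrInfo), the register set (getRegisterInfo), the stack frame layout (getFrameInfo) and the data layout (getDataLayout).
For now we reuse the TargetMachine settings of the built-in RISCV target. The C extension can in fact be used on top of RV32, RV64 or RV128; in this report we implement the RVC backend on top of RV32.
In the upstream code, the datalayout for RV32 is e-m:e-p:32:32-i64:64-n32-S128 and for RV64 it is e-m:e-p:64:64-i64:64-i128:128-n64-S128, where e denotes little-endian byte order.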
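To make the two datalayout strings easier to read, here is a small sketch that splits such a string into its fields. It is not LLVM's own parser: it only handles the specifiers that occur in the two strings above (endianness, `m:` mangling, `p:` pointer size and ABI alignment, `iN:` integer alignment, `n` native integer widths, `S` stack alignment), and the function name and dictionary keys are assumptions made for this example.

```python
def parse_datalayout(layout: str) -> dict:
    """Decode the subset of LLVM datalayout specifiers used by RV32/RV64."""
    info = {
        "endian": None, "mangling": None, "pointer": None,
        "int_align": {}, "native_int_widths": [], "stack_align": None,
        "unparsed": [],
    }
    for spec in layout.split("-"):
        fields = spec.split(":")
        head = fields[0]
        if spec in ("e", "E"):
            info["endian"] = "little" if spec == "e" else "big"
        elif head == "m" and len(fields) == 2:
            info["mangling"] = fields[1]          # "e" means ELF mangling
        elif head == "p" and len(fields) >= 3:
            # p:<size>:<abi alignment>, both in bits
            info["pointer"] = {"size": int(fields[1]),
                               "abi_align": int(fields[2])}
        elif head.startswith("i") and head[1:].isdigit() and len(fields) >= 2:
            # i<width>:<abi alignment>
            info["int_align"][int(head[1:])] = int(fields[1])
        elif head.startswith("n") and head[1:].isdigit():
            info["native_int_widths"] = ([int(head[1:])]
                                         + [int(f) for f in fields[1:]])
        elif head.startswith("S") and head[1:].isdigit():
            info["stack_align"] = int(head[1:])   # stack alignment in bits
        else:
            info["unparsed"].append(spec)         # outside this sketch's scope
    return info


rv32 = parse_datalayout("e-m:e-p:32:32-i64:64-n32-S128")
rv64 = parse_datalayout("e-m:e-p:64:64-i64:64-i128:128-n64-S128")
print(rv32["pointer"], rv32["native_int_widths"])
print(rv64["pointer"], rv64["int_align"])
```

The two targets differ only in pointer size, native integer width and the extra i128 alignment entry on RV64; both use a 128-bit stack alignment.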
In RISCVTargetMachine.cpp, we register the RV32 and RV64 target machines separately.
extern "C" void LLVMInitializeRISCVTarget() {
  RegisterTargetMachine<RISCVTargetMachine> X(getTheRISCV32Target());
  RegisterTargetMachine<RISCVTargetMachine> Y(getTheRISCV64Target());
  auto PR = PassRegistry::getPassRegistry();
  initializeGlobalISel(*PR);
  initializeRISCVExpandPseudoPass(*PR);
}
In RISCVAsmPrinter.cpp, we register the AsmPrinter. The AsmPrinter performs the conversion from LLVM to asm.
extern "C" void LLVMInitializeRISCVAsmPrinter() {
  RegisterAsmPrinter<RISCVAsmPrinter> X(getTheRISCV32Target());
  RegisterAsmPrinter<RISCVAsmPrinter> Y(getTheRISCV64Target());
}
In RISCVRegisterInfo.h/cpp/td, we define how registers are allocated and how they interact, and we define register classes that group the registers used by the same kinds of operations.
- Define registers and register classes
We define the registers in RISCVRegisterInfo.td. The base class for registers is
class Register<string n> {
  string Namespace = "";
  string AsmName = n;
  string Name = n;
  int SpillSize = 0;
  int SpillAlignment = 0;
  list<Register> Aliases = [];
  list<Register> SubRegs = [];
  list<int> DwarfNumbers = [];
}
Here, RISCV defines a base RISCV register class, a base 32-bit register class and a base 64-bit register class, where the 64-bit registers extend the 32-bit ones. We also define the 32 integer registers. From this information TableGen automatically generates the file RISCVGenRegisterInfo.inc.
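As a rough illustration of what the 32 integer register definitions carry, the sketch below builds the x0-x31 table with the ABI names from the RISC-V specification. It also marks which registers can be named by the 3-bit register fields that several RVC formats use (x8-x15, per the specification). This is plain Python, not TableGen, and the names `ABI_NAMES` and `rvc_reg_field` are invented for this example.

```python
# ABI names of the 32 integer registers x0..x31 (RISC-V calling convention).
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
    + [f"a{i}" for i in range(8)]        # x10..x17
    + [f"s{i}" for i in range(2, 12)]    # x18..x27
    + [f"t{i}" for i in range(3, 7)]     # x28..x31
)


def rvc_reg_field(reg_num: int):
    """Return the 3-bit RVC encoding of a register, or None if the
    register is not reachable through a 3-bit field (only x8..x15 are)."""
    if not 0 <= reg_num < 32:
        raise ValueError("register number must be in 0..31")
    return reg_num - 8 if 8 <= reg_num <= 15 else None


for num, abi in enumerate(ABI_NAMES):
    field = rvc_reg_field(num)
    note = f"  (RVC 3-bit field {field:03b})" if field is not None else ""
    print(f"x{num:<2} {abi}{note}")
```

The restriction to x8-x15 is one reason why only some instructions can be compressed: the operands have to live in the frequently used s0, s1 and a0-a5 registers for those formats.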
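- TableGen [12]
TableGen's syntax is modeled on C++ templates. It consists mainly of classes and definitions, which are collectively called records. See [12] for the details.
- Implement a subclass of TargetRegisterInfo
During compilation, LLVM first converts LLVM IR into a SelectionDAG whose nodes are of type SDNode. Each node has an opcode, operands, type requirements and operation properties; see include/llvm/CodeGen/SelectionDAGNodes.h for details.
TableGen generates the instruction definitions mainly from four files: Target.td, TargetSelectionDAG.td, RISCVInstrFormats.td and RISCVInstrInfo.td. RISCV.td also pulls in all instruction information, but its content matters more for subtargets.
In RISCVInstrInfo.td, we describe the machine instructions supported by the target. RISCVInstrInfo holds an array of RISCVInstrDescriptor entries, each of which describes one instruction. An instruction description must include the opcode mnemonic, the number of operands, the registers defined and used, the target-independent properties and the target-specific properties.
For RVC, being an extension, the target machine, target registration and register set need almost no changes. The files that matter are therefore RISCVInstrInfoC.td and RISCVInstrFormatsC.td.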
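To give a feel for the kind of rule the RVC instruction description has to capture, here is a sketch of the condition under which a plain `add` can be replaced by `c.add`. According to the RISC-V specification, `c.add rd, rs2` computes `rd = rd + rs2` and requires both registers to be different from x0. LLVM expresses such rules declaratively in TableGen; the Python function `compress_add` below is only a hypothetical illustration of the logic.

```python
def compress_add(rd: int, rs1: int, rs2: int):
    """If `add rd, rs1, rs2` can be written as `c.add rd, rs2'`,
    return the pair (rd, rs2'); otherwise return None."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must be in 0..31")
    if rd == 0:
        return None              # c.add cannot target x0
    if rs1 == rd and rs2 != 0:
        return (rd, rs2)         # add rd, rd, rs2
    if rs2 == rd and rs1 != 0:
        return (rd, rs1)         # add rd, rs1, rd (addition commutes)
    return None                  # three distinct registers: not compressible


print(compress_add(10, 10, 11))  # add a0, a0, a1
print(compress_add(10, 11, 10))  # add a0, a1, a0
print(compress_add(10, 11, 12))  # add a0, a1, a2
```

The first two calls yield a `c.add a0, a1`, while the third cannot be compressed because the destination differs from both sources.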
RISCVInstrFormatsC.td must describe the eight instruction formats: CR (Register), CI (Immediate), CSS (Stack-relative Store), CIW (Wide Immediate), CL (Load), CS (Store), CB (Branch) and CJ (Jump).
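As a concrete instance of one of these formats, the sketch below packs the fields of the CR format into a 16-bit word. The field layout (funct4 in bits 15-12, rd/rs1 in bits 11-7, rs2 in bits 6-2, op in bits 1-0) and the funct4/op values for `c.mv` and `c.add` are taken from the RISC-V specification [11]; the function name `encode_cr` is an assumption made for this example and is not part of LLVM.

```python
def encode_cr(funct4: int, rd_rs1: int, rs2: int, op: int) -> int:
    """Pack the fields of a CR-format RVC instruction into 16 bits."""
    if not 0 <= funct4 < 16:
        raise ValueError("funct4 is a 4-bit field")
    if not (0 <= rd_rs1 < 32 and 0 <= rs2 < 32):
        raise ValueError("CR format uses full 5-bit register numbers")
    if not 0 <= op < 4:
        raise ValueError("op is a 2-bit field")
    return (funct4 << 12) | (rd_rs1 << 7) | (rs2 << 2) | op


C_MV_FUNCT4 = 0b1000   # c.mv  rd, rs2
C_ADD_FUNCT4 = 0b1001  # c.add rd, rs2
QUADRANT_2 = 0b10      # both instructions live in opcode quadrant 2

a0, a1 = 10, 11        # x10 and x11
print(f"c.mv  a0, a1 = {encode_cr(C_MV_FUNCT4, a0, a1, QUADRANT_2):#06x}")
print(f"c.add a0, a1 = {encode_cr(C_ADD_FUNCT4, a0, a1, QUADRANT_2):#06x}")
```

This prints `0x852e` for `c.mv a0, a1` and `0x952e` for `c.add a0, a1`. The other seven formats follow the same idea with different field boundaries, and some of them use 3-bit register fields instead of 5-bit ones.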
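In RISCVISelDAGToDAG, we describe how the SelectionDAG is mapped to the target DAG. TableGen reads the target description from two files, RISCVInstrInfo.td and RISCVCallingConv.td. The SelectionDAG can be displayed with llc [10].
- SelectionDAG generation steps
- Calling conventions
The Assembly Printer emits the final asm output.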
$ mkdir build
$ cd build
$ cmake -G Ninja -DLLVM_ENABLE_PROJECTS=clang -DLLVM_TARGET_ARCH=RISCV ../llvm
$ ninja
$ ninja check-all
The output is as follows:
[168/619] cd /home/dae/llvm-riscv-back...b /usr/bin/python -m unittest discover
..............................s......s.s.......s...s...........................................................s..............
----------------------------------------------------------------------
Ran 126 tests in 2.049s
OK (skipped=6)
[618/619] Running all regression tests
llvm-lit: /home/dae/llvm-riscv-backend/llvm/utils/lit/lit/llvm/config.py:342: note: using clang: /home/dae/llvm-riscv-backend/build/bin/clang
llvm-lit: /home/dae/llvm-riscv-backend/build/utils/lit/tests/lit.cfg:80: warning: Setting a timeout per test not supported. Requires the Python psutil module but it could not be found. Try installing it via pip or via your operating system's package manager. Some tests will be skipped and the --timeout command line argument will not work.
-- Testing: 51561 tests, 4 workers --
Testing Time: 1280.57s
Expected Passes : 50173
Expected Failures : 165
Unsupported Tests : 1223
1 warning(s) in tests
Testing compilation with clang:
$ clang -O3 hello.c -c -S -o hello.bc
The output is as follows:
.text
.file "hello.c"
.globl main # -- Begin function main
.p2align 4, 0x90
.type main,@function
main: # @main
.cfi_startproc
# %bb.0:
pushq %rax
.cfi_def_cfa_offset 16
movl $.Lstr, %edi
callq puts
xorl %eax, %eax
popq %rcx
.cfi_def_cfa_offset 8
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
# -- End function
.type .Lstr,@object # @str
.section .rodata.str1.1,"aMS",@progbits,1
.Lstr:
.asciz "Hello RISCV!"
.size .Lstr, 13
.ident "clang version 10.0.0 (https://github.com/DailinH/llvm-riscv-backend.git 4164be7206d740b77b5a7b4b2f859ed122d08c10)"
.section ".note.GNU-stack","",@progbits
.addrsig
The generated intermediate IR is as follows:
; ModuleID = 'hello.c'
source_filename = "hello.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@str = private unnamed_addr constant [13 x i8] c"Hello RISCV!\00", align 1
; Function Attrs: nofree nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
%1 = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str, i64 0, i64 0))
ret i32 0
}
; Function Attrs: nofree nounwind
declare i32 @puts(i8* nocapture readonly) local_unnamed_addr #1
attributes #0 = { nofree nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nofree nounwind }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0 (https://github.com/DailinH/llvm-riscv-backend.git 4164be7206d740b77b5a7b4b2f859ed122d08c10)"}
[1] M. Pandey, S. Sarda. LLVM Cookbook
[2] sifive/riscv-llvm, https://github.com/sifive/riscv-llvm
[3] Building LLVM with CMake, https://llvm.org/docs/CMake.html
[4] Writing an LLVM Backend https://llvm.org/docs/WritingAnLLVMBackend.html
[5] LLVM Language Reference Manual https://llvm.org/docs/LangRef.html
[6] Introduction to QEMU https://www.ibm.com/developerworks/cn/linux/l-qemu/index.html
[7] The LLVM Target-Independent Code Generator https://llvm.org/docs/CodeGenerator.html
[8] TableGen https://llvm.org/docs/TableGen/index.html
[9] Writing an LLVM Pass https://llvm.org/docs/WritingAnLLVMPass.html
[10] SelectionDAG Instruction Selection Process https://llvm.org/docs/CodeGenerator.html#selectiondag-process
[11] The RISC-V Instruction Set Manual, Volume I: User-Level ISA https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
[12] TableGen https://llvm.org/docs/TableGen/index.html