RISC-V Assembly Programmer’s Manual

Copyright and License Information

The RISC-V Assembly Programmer’s Manual is

© 2017 Palmer Dabbelt <palmer@dabbelt.com>
© 2017 Michael Clark <michaeljclark@mac.com>
© 2017 Alex Bradbury <asb@lowrisc.org>

It is licensed under the Creative Commons Attribution 4.0 International License
(CC-BY 4.0). The full license text is available at
https://creativecommons.org/licenses/by/4.0/.

Command-Line Arguments

I think it’s probably better to beef up the binutils documentation rather than
duplicating it here.

Registers

Registers are the most important part of any processor. RISC-V defines several
types of registers, depending on which extensions are included: the general
registers (together with the program counter), control registers, floating-point
registers (F extension), and vector registers (V extension).

General registers

The RV32I base integer ISA includes 32 registers, named x0 to x31. The
program counter PC is separate from these registers, in contrast to other
processors such as the ARM-32. The first register, x0, has a special function:
Reading it always returns 0 and writes to it are ignored. As we will see later,
this allows various tricks and simplifications.

In practice, the programmer doesn’t use this notation for the registers. Though
x1 to x31 are all equally general-use registers as far as the processor is
concerned, by convention certain registers are used for special tasks. In
assembler, they are given standardized names as part of the RISC-V application
binary interface (ABI). This is what you will usually see in code listings. If
you really want to see the numeric register names, the -M argument to objdump
(for example, -M numeric) will provide them.

| Register | ABI | Use by convention | Preserved? |
|---|---|---|---|
| x0 | zero | hardwired to 0, ignores writes | n/a |
| x1 | ra | return address for jumps | no |
| x2 | sp | stack pointer | yes |
| x3 | gp | global pointer | n/a |
| x4 | tp | thread pointer | n/a |
| x5 | t0 | temporary register 0 | no |
| x6 | t1 | temporary register 1 | no |
| x7 | t2 | temporary register 2 | no |
| x8 | s0 or fp | saved register 0 or frame pointer | yes |
| x9 | s1 | saved register 1 | yes |
| x10 | a0 | return value or function argument 0 | no |
| x11 | a1 | return value or function argument 1 | no |
| x12 | a2 | function argument 2 | no |
| x13 | a3 | function argument 3 | no |
| x14 | a4 | function argument 4 | no |
| x15 | a5 | function argument 5 | no |
| x16 | a6 | function argument 6 | no |
| x17 | a7 | function argument 7 | no |
| x18 | s2 | saved register 2 | yes |
| x19 | s3 | saved register 3 | yes |
| x20 | s4 | saved register 4 | yes |
| x21 | s5 | saved register 5 | yes |
| x22 | s6 | saved register 6 | yes |
| x23 | s7 | saved register 7 | yes |
| x24 | s8 | saved register 8 | yes |
| x25 | s9 | saved register 9 | yes |
| x26 | s10 | saved register 10 | yes |
| x27 | s11 | saved register 11 | yes |
| x28 | t3 | temporary register 3 | no |
| x29 | t4 | temporary register 4 | no |
| x30 | t5 | temporary register 5 | no |
| x31 | t6 | temporary register 6 | no |
| pc | (none) | program counter | n/a |

Registers of the RV32I. Based on RISC-V documentation and Patterson and
Waterman “The RISC-V Reader” (2017)

As a general rule, the saved registers s0 to s11 are preserved across
function calls, while the argument registers a0 to a7 and the
temporary registers t0 to t6 are not. The use of the various
specialized registers such as sp by convention will be discussed later in more
detail.

Control registers

(TBA)

Floating Point registers (RV32F)

(TBA)

Vector registers (RV32V)

(TBA)

Addressing

Addressing formats like %pcrel_lo(). We can just link to the RISC-V psABI
document to describe what the relocations actually do.

Instruction Set

Official Specifications webpage:

  • https://riscv.org/specifications/

Latest Specifications draft repository:

  • https://github.com/riscv/riscv-isa-manual

Instructions

RISC-V ISA Specifications

https://riscv.org/specifications/

Instruction Aliases

ALIAS line from opcodes/riscv-opc.c

To better diagnose situations where the program flow reaches an unexpected
location, you might want to emit an instruction there that is known to trap. You
can use the UNIMP pseudo-instruction, which should trap in nearly all systems.
The de facto standard implementations of this instruction are:

  • C.UNIMP: 0000. The all-zeroes pattern is not a valid instruction. Any
    system which traps on invalid instructions will thus trap on this UNIMP
    instruction form. Despite not being a valid instruction, it still fits the
    16-bit (compressed) instruction format, and so 0000 0000 is interpreted as
    two 16-bit UNIMP instructions.

  • UNIMP: C0001073. This is an alias for CSRRW x0, cycle, x0. Since cycle is
    a read-only CSR, an attempt to write to it generates an illegal instruction
    exception, whether or not the CSR exists. This 32-bit form of UNIMP is
    emitted when targeting a system without the C extension, or when the
    .option norvc directive is used.
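
The 32-bit encoding above can be checked by splitting the word into its instruction fields. The following Python sketch is only an illustration (the helper names decode_system and csr_is_read_only are made up for this example); it assumes the standard I-type field layout and the convention that CSR addresses whose top two bits are both set are read-only.

```python
def decode_system(word):
    """Split a 32-bit I-type (SYSTEM) instruction word into its fields."""
    return {
        "opcode": word & 0x7F,          # bits 6:0
        "rd": (word >> 7) & 0x1F,       # bits 11:7
        "funct3": (word >> 12) & 0x7,   # bits 14:12
        "rs1": (word >> 15) & 0x1F,     # bits 19:15
        "csr": (word >> 20) & 0xFFF,    # bits 31:20
    }


def csr_is_read_only(csr):
    """CSR addresses with the top two bits set (0b11) are read-only by convention."""
    return (csr >> 10) == 0b11


fields = decode_system(0xC0001073)
# opcode 0x73 = SYSTEM, funct3 1 = CSRRW, rd = rs1 = x0, csr 0xC00 = cycle
print({name: hex(value) for name, value in fields.items()})
print(csr_is_read_only(fields["csr"]))
```

The decoded fields show a CSRRW with rd = rs1 = x0 targeting CSR address 0xC00 (cycle), which lies in the read-only address range.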

Pseudo Ops

Both the RISC-V-specific and GNU .-prefixed options.

The following table lists assembler directives:

| Directive | Arguments | Description |
|---|---|---|
| .align | integer | align to power of 2 (alias for .p2align) |
| .file | “filename” | emit filename FILE LOCAL symbol table |
| .globl | symbol_name | emit symbol_name to symbol table (scope GLOBAL) |
| .local | symbol_name | emit symbol_name to symbol table (scope LOCAL) |
| .comm | symbol_name,size,align | emit common object to .bss section |
| .common | symbol_name,size,align | emit common object to .bss section |
| .ident | “string” | accepted for source compatibility |
| .section | [{.text,.data,.rodata,.bss}] | emit section (if not present, default .text) and make current |
| .size | symbol, symbol | accepted for source compatibility |
| .text | | emit .text section (if not present) and make current |
| .data | | emit .data section (if not present) and make current |
| .rodata | | emit .rodata section (if not present) and make current |
| .bss | | emit .bss section (if not present) and make current |
| .string | “string” | emit string |
| .asciz | “string” | emit string (alias for .string) |
| .equ | name, value | constant definition |
| .macro | name arg1 [, argn] | begin macro definition \argname to substitute |
| .endm | | end macro definition |
| .type | symbol, @function | accepted for source compatibility |
| .option | {rvc,norvc,pic,nopic,push,pop} | RISC-V options |
| .byte | expression [, expression]* | 8-bit comma separated words |
| .2byte | expression [, expression]* | 16-bit comma separated words |
| .half | expression [, expression]* | 16-bit comma separated words |
| .short | expression [, expression]* | 16-bit comma separated words |
| .4byte | expression [, expression]* | 32-bit comma separated words |
| .word | expression [, expression]* | 32-bit comma separated words |
| .long | expression [, expression]* | 32-bit comma separated words |
| .8byte | expression [, expression]* | 64-bit comma separated words |
| .dword | expression [, expression]* | 64-bit comma separated words |
| .quad | expression [, expression]* | 64-bit comma separated words |
| .dtprelword | expression [, expression]* | 32-bit thread local word |
| .dtpreldword | expression [, expression]* | 64-bit thread local word |
| .sleb128 | expression | signed little endian base 128, DWARF |
| .uleb128 | expression | unsigned little endian base 128, DWARF |
| .p2align | p2,[pad_val=0],max | align to power of 2 |
| .balign | b,[pad_val=0] | byte align |
| .zero | integer | zero bytes |
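
To make the .uleb128 and .sleb128 entries more concrete, here is a minimal Python sketch of the LEB128 encodings used by DWARF. It is an illustration of the byte format only, not the assembler's implementation; the function names are chosen for this example.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("uleb128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F       # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value):
    """Encode a signed integer as signed LEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7               # Python's >> is an arithmetic shift
        sign_bit = bool(byte & 0x40)
        # Stop once the remaining bits are pure sign extension.
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


print(uleb128(624485).hex())   # e58e26
print(sleb128(-123456).hex())  # c0bb78
```

Each output byte carries 7 bits of payload, least significant group first, with the top bit marking that another byte follows.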

Assembler Relocation Functions

The following table lists assembler relocation expansions:

| Assembler Notation | Description | Instruction / Macro |
|---|---|---|
| %hi(symbol) | Absolute (HI20) | lui |
| %lo(symbol) | Absolute (LO12) | load, store, add |
| %pcrel_hi(symbol) | PC-relative (HI20) | auipc |
| %pcrel_lo(label) | PC-relative (LO12) | load, store, add |
| %tprel_hi(symbol) | TLS LE “Local Exec” | lui |
| %tprel_lo(symbol) | TLS LE “Local Exec” | load, store, add |
| %tprel_add(symbol) | TLS LE “Local Exec” | add |
| %tls_ie_pcrel_hi(symbol) * | TLS IE “Initial Exec” (HI20) | auipc |
| %tls_gd_pcrel_hi(symbol) * | TLS GD “Global Dynamic” (HI20) | auipc |
| %got_pcrel_hi(symbol) * | GOT PC-relative (HI20) | auipc |

* These reuse %pcrel_lo(label) for their lower half

Labels

Text labels are used as branch targets, unconditional jump targets, and symbol
offsets. Text labels are added to the symbol table of the compiled module.

loop:
        j loop

Numeric labels are used for local references. References to local labels are
suffixed with ‘f’ for a forward reference or ‘b’ for a backwards reference.

1:
        j 1b
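
The ‘f’ and ‘b’ suffixes can be read as "nearest definition after" and "nearest definition at or before" the reference. The Python sketch below models that lookup by address; it is a simplification (the assembler resolves by source order, and the function name and the sample addresses are hypothetical).

```python
def resolve_local_label(ref, current_addr, definitions):
    """Resolve a reference such as '1b' or '1f'.

    definitions is a list of (label_number, address) pairs; current_addr is
    the address of the instruction containing the reference.
    """
    number, direction = int(ref[:-1]), ref[-1]
    addrs = sorted(addr for num, addr in definitions if num == number)
    if direction == "b":
        # Backward: the most recent definition at or before this instruction.
        candidates = [addr for addr in addrs if addr <= current_addr]
        if candidates:
            return candidates[-1]
    elif direction == "f":
        # Forward: the next definition after this instruction.
        candidates = [addr for addr in addrs if addr > current_addr]
        if candidates:
            return candidates[0]
    raise ValueError(f"cannot resolve {ref!r} at address {current_addr:#x}")


# Label 1 is defined twice (at 0x0 and 0x8), label 2 once (at 0x4).
defs = [(1, 0x0), (2, 0x4), (1, 0x8)]
print(resolve_local_label("1b", 0x4, defs))  # 0
print(resolve_local_label("1f", 0x4, defs))  # 8
```

Because the same number can be defined many times, a reference always binds to the closest definition in the requested direction.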

Absolute addressing

The following example shows how to load an absolute address:

	lui	a0, %hi(msg + 1)
	addi	a0, a0, %lo(msg + 1)

Which generates the following assembler output and relocations
as seen by objdump:

0000000000000000 <.text>:
   0:	00000537          	lui	a0,0x0
			0: R_RISCV_HI20	msg+0x1
   4:	00150513          	addi	a0,a0,1 # 0x1
			4: R_RISCV_LO12_I	msg+0x1

Relative addressing

The following example shows how to load a PC-relative address:

1:
	auipc	a0, %pcrel_hi(msg + 1)
	addi	a0, a0, %pcrel_lo(1b)

Which generates the following assembler output and relocations
as seen by objdump:

0000000000000000 <.text>:
   0:	00000517          	auipc	a0,0x0
			0: R_RISCV_PCREL_HI20	msg+0x1
   4:	00050513          	mv	a0,a0
			4: R_RISCV_PCREL_LO12_I	.L1
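
The low-part relocation refers to the label of the auipc (1b) rather than to msg because the offset is measured from the auipc's own address. The following Python sketch models how a linker could split that offset; the addresses are made up for illustration, and the helper names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pcrel_split(auipc_pc, target):
    """Return (hi20, lo12) for an auipc at auipc_pc reaching target."""
    offset = target - auipc_pc
    lo12 = sign_extend(offset, 12)
    # Rounding with +0x800 compensates for the sign-extended low part.
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF
    return hi20, lo12


def auipc_addi(auipc_pc, hi20, lo12, xlen=64):
    """Model `auipc rd, hi20` followed by `addi rd, rd, lo12`."""
    upper = sign_extend(hi20 << 12, 32)
    return (auipc_pc + upper + lo12) & ((1 << xlen) - 1)


# Hypothetical addresses: auipc at 0x10000, target 0x1801 bytes further on.
hi20, lo12 = pcrel_split(0x10000, 0x11801)
print(hi20, lo12)                            # 2 -2047
print(hex(auipc_addi(0x10000, hi20, lo12)))  # 0x11801
```

Both halves depend on auipc_pc, so the instruction carrying the low 12 bits has to name the auipc it pairs with.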

GOT-indirect addressing

The following example shows how to load an address from the GOT:

1:
	auipc	a0, %got_pcrel_hi(msg + 1)
	ld	a0, %pcrel_lo(1b)(a0)

Which generates the following assembler output and relocations
as seen by objdump:

0000000000000000 <.text>:
   0:	00000517          	auipc	a0,0x0
			0: R_RISCV_GOT_HI20	msg+0x1
   4:	00053503          	ld	a0,0(a0) # 0 <.text>
			4: R_RISCV_PCREL_LO12_I	.L1

Load Immediate

The following example shows the li pseudo instruction which
is used to load immediate values:

	.equ	CONSTANT, 0xdeadbeef

	li	a0, CONSTANT

Which, for RV32I, generates the following assembler output, as seen by objdump:

00000000 <.text>:
   0:	deadc537          	lui	a0,0xdeadc
   4:	eef50513          	addi	a0,a0,-273 # deadbeef <CONSTANT+0x0>
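
The lui immediate in this listing is 0xdeadc rather than 0xdeadb because addi sign-extends its 12-bit immediate. The Python sketch below reproduces the split as an illustration of the arithmetic; the helper names are chosen for this example and do not describe how the assembler is implemented.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_hi_lo(value):
    """Return (hi20, lo12) with (hi20 << 12) + lo12 == value modulo 2**32."""
    lo12 = sign_extend(value, 12)
    hi20 = ((value - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def lui_addi(hi20, lo12):
    """Model `lui rd, hi20` followed by `addi rd, rd, lo12` on RV32."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF


hi20, lo12 = split_hi_lo(0xDEADBEEF)
print(hex(hi20), lo12)            # 0xdeadc -273
print(hex(lui_addi(hi20, lo12)))  # 0xdeadbeef
```

The same split applies to the %hi and %lo relocation functions: when bit 11 of the value is set, the upper part is incremented by one to cancel the negative low part.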

Load Upper Immediate’s Immediate

The immediate argument to lui is an integer in the interval [0x0, 0xfffff].
Its compressed form, c.lui, accepts only those in the subintervals [0x1, 0x1f] and [0xfffe0, 0xfffff].
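
The two c.lui subintervals are the nonzero values that fit in a 6-bit signed immediate once it is sign-extended to 20 bits. A small Python sketch of that check follows; the function names are hypothetical, and the additional c.lui restrictions on the destination register are not modeled.

```python
def lui_imm_ok(imm):
    """lui accepts any 20-bit immediate."""
    return 0 <= imm <= 0xFFFFF


def c_lui_imm_ok(imm):
    """c.lui accepts nonzero immediates that fit in 6 signed bits."""
    if not lui_imm_ok(imm):
        return False
    # Reinterpret the 20-bit field as a signed number.
    signed = imm - (1 << 20) if imm & (1 << 19) else imm
    return signed != 0 and -32 <= signed <= 31


for imm in (0x0, 0x1, 0x1F, 0x20, 0xFFFDF, 0xFFFE0, 0xFFFFF):
    print(hex(imm), c_lui_imm_ok(imm))
```

Values outside those subintervals still assemble with lui, only in the 32-bit form.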

Load Address

The following example shows the la pseudo instruction which
is used to load symbol addresses:

	la	a0, msg + 1

Which generates the following assembler output and relocations
for non-PIC as seen by objdump:

0000000000000000 <.text>:
   0:	00000517          	auipc	a0,0x0
			0: R_RISCV_PCREL_HI20	msg+0x1
   4:	00050513          	mv	a0,a0
			4: R_RISCV_PCREL_LO12_I	.L0

And generates the following assembler output and relocations
for PIC as seen by objdump:

0000000000000000 <.text>:
   0:	00000517          	auipc	a0,0x0
			0: R_RISCV_GOT_HI20	msg+0x1
   4:	00053503          	ld	a0,0(a0) # 0 <.text>
			4: R_RISCV_PCREL_LO12_I	.L0

Load and Store Global

The following pseudo instructions are available to load from and store to
global objects:

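  • l{b|h|w|d} <rd>, <symbol>: load byte, half word, word or double word from global (<rd> is implicitly used as the scratch register that holds the address)
  • s{b|h|w|d} <rd>, <symbol>, <rt>: store byte, half word, word or double word to global (<rt> is used as the scratch register that holds the address)
  • fl{h|w|d|q} <rd>, <symbol>, <rt>: load half, float, double or quad precision from global (<rt> is used as the scratch register that holds the address)
  • fs{h|w|d|q} <rd>, <symbol>, <rt>: store half, float, double or quad precision to global (<rt> is used as the scratch register that holds the address)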

The following example shows how these pseudo instructions are used:

	lw	a0, var1
	fld	fa0, var2, t0
	sw	a0, var3, t0
	fsd	fa0, var4, t0

Which generates the following assembler output and relocations
as seen by objdump:

0000000000000000 <.text>:
   0:	00000517          	auipc	a0,0x0
			0: R_RISCV_PCREL_HI20	var1
   4:	00052503          	lw	a0,0(a0) # 0 <.text>
			4: R_RISCV_PCREL_LO12_I	.L0
   8:	00000297          	auipc	t0,0x0
			8: R_RISCV_PCREL_HI20	var2
   c:	0002b507          	fld	fa0,0(t0) # 8 <.text+0x8>
			c: R_RISCV_PCREL_LO12_I	.L0
  10:	00000297          	auipc	t0,0x0
			10: R_RISCV_PCREL_HI20	var3
  14:	00a2a023          	sw	a0,0(t0) # 10 <.text+0x10>
			14: R_RISCV_PCREL_LO12_S	.L0
  18:	00000297          	auipc	t0,0x0
			18: R_RISCV_PCREL_HI20	var4
  1c:	00a2b027          	fsd	fa0,0(t0) # 18 <.text+0x18>
			1c: R_RISCV_PCREL_LO12_S	.L0

Constants

The following example shows loading a constant using the %hi and
%lo assembler functions.

	.equ	UART_BASE, 0x40003080

	lui	a0, %hi(UART_BASE)
	addi	a0, a0, %lo(UART_BASE)

Which generates the following assembler output
as seen by objdump:

0000000000000000 <.text>:
   0:	40003537          	lui	a0,0x40003
   4:	08050513          	addi	a0,a0,128 # 40003080 <UART_BASE>

Function Calls

The following pseudo instructions are available to call subroutines far from
the current position (the numbered notes are listed at the end of this
document):

  • call <symbol>: call far-away subroutine (note 1)
  • call <rd>, <symbol>: call far-away subroutine (note 2)
  • tail <symbol>: tail call far-away subroutine (note 3)
  • jump <symbol>, <rt>: jump to far-away routine (note 4)

The following example shows how these pseudo instructions are used:

	call	func1
	tail	func2
	jump	func3, t0

Which generates the following assembler output and relocations
as seen by objdump:

0000000000000000 <.text>:
   0:	00000097          	auipc	ra,0x0
			0: R_RISCV_CALL	func1
   4:	000080e7          	jalr	ra # 0x0
   8:	00000317          	auipc	t1,0x0
			8: R_RISCV_CALL	func2
   c:	00030067          	jr	t1 # 0x8
  10:	00000297          	auipc	t0,0x0
			10: R_RISCV_CALL	func3
  14:	00028067          	jr	t0 # 0x10

Floating-point rounding modes

For floating-point instructions with a rounding mode field, the rounding mode
can be specified by adding an additional operand. For example, fcvt.w.s with
round-to-zero can be written as fcvt.w.s a0, fa0, rtz. If unspecified, the
default dyn rounding mode will be used.

Supported rounding modes are as follows (must be specified in lowercase):

  • rne: round to nearest, ties to even
  • rtz: round towards zero
  • rdn: round down
  • rup: round up
  • rmm: round to nearest, ties to max magnitude
  • dyn: dynamic rounding mode (the rounding mode specified in the frm field
    of the fcsr register is used)
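
To illustrate how the static rounding modes differ, the Python sketch below rounds a value to an integer under each mode, loosely modeling a float-to-integer conversion. It is an approximation for illustration only: overflow, NaN and infinity handling are not modeled, dyn is omitted because it depends on the frm field at run time, and the function name is made up.

```python
import math
from fractions import Fraction


def round_to_int(x, mode):
    """Round the float x to an integer using a RISC-V rounding mode name."""
    q = Fraction(x)  # exact rational value of the float (finite inputs only)
    if mode == "rne":
        return round(q)        # Fraction rounds ties to even
    if mode == "rtz":
        return math.trunc(q)   # towards zero
    if mode == "rdn":
        return math.floor(q)   # towards negative infinity
    if mode == "rup":
        return math.ceil(q)    # towards positive infinity
    if mode == "rmm":
        magnitude = math.floor(abs(q) + Fraction(1, 2))  # ties away from zero
        return magnitude if q >= 0 else -magnitude
    raise ValueError(f"unsupported rounding mode: {mode}")


for mode in ("rne", "rtz", "rdn", "rup", "rmm"):
    print(mode, round_to_int(2.5, mode), round_to_int(-2.5, mode))
```

The tie cases 2.5 and -2.5 are where the modes are easiest to tell apart.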

Control and Status Registers

The following code sample shows how to enable timer interrupts,
set a timer, and wait for the timer interrupt to occur:

.equ RTC_BASE,      0x40000000
.equ TIMER_BASE,    0x40004000

# setup machine trap vector
1:      auipc   t0, %pcrel_hi(mtvec)        # load mtvec(hi)
        addi    t0, t0, %pcrel_lo(1b)       # load mtvec(lo)
        csrrw   zero, mtvec, t0

# set mstatus.MIE=1 (enable M mode interrupt)
        li      t0, 8
        csrrs   zero, mstatus, t0

# set mie.MTIE=1 (enable M mode timer interrupts)
        li      t0, 128
        csrrs   zero, mie, t0

# read from mtime
        li      a0, RTC_BASE
        ld      a1, 0(a0)

# write to mtimecmp
        li      a0, TIMER_BASE
        li      t0, 1000000000
        add     a1, a1, t0
        sd      a1, 0(a0)

# loop
loop:
        wfi
        j loop

# break on interrupt
# (the handler address written to mtvec must be 4-byte aligned)
        .balign 4
mtvec:
        csrrc  t0, mcause, zero
        bgez t0, fail       # interrupt causes are less than zero
        slli t0, t0, 1      # shift off high bit
        srli t0, t0, 1
        li t1, 7            # check this is an m_timer interrupt
        bne t0, t1, fail
        j pass

pass:
        la a0, pass_msg
        jal puts
        j shutdown

fail:
        la a0, fail_msg
        jal puts
        j shutdown

.section .rodata

pass_msg:
        .string "PASS\n"

fail_msg:
        .string "FAIL\n"
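
The trap handler above tests the top bit of mcause and then compares the remaining bits against 7. The Python sketch below models that decoding, together with the bit masks the sample loads into mstatus and mie. It assumes XLEN = 64 (the sample uses ld and sd), and the names are chosen for this example.

```python
XLEN = 64                 # assumption: RV64, as implied by ld/sd in the sample
MSTATUS_MIE = 1 << 3      # the value 8 loaded before csrrs mstatus
MIE_MTIE = 1 << 7         # the value 128 loaded before csrrs mie
MACHINE_TIMER = 7         # cause code checked by the handler


def decode_mcause(mcause, xlen=XLEN):
    """Return (is_interrupt, cause_code) for a raw mcause value."""
    is_interrupt = bool((mcause >> (xlen - 1)) & 1)  # top bit marks interrupts
    code = mcause & ((1 << (xlen - 1)) - 1)          # remaining bits: the code
    return is_interrupt, code


def is_machine_timer_interrupt(mcause, xlen=XLEN):
    """Mirror the handler: an interrupt whose cause code is 7."""
    is_interrupt, code = decode_mcause(mcause, xlen)
    return is_interrupt and code == MACHINE_TIMER


print(decode_mcause((1 << 63) | 7))   # (True, 7)
print(is_machine_timer_interrupt(2))  # False
```

In the assembly, bgez plays the role of the is_interrupt test, since a set top bit makes the register negative, and the slli/srli pair clears that bit before the comparison.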

A listing of standard RISC-V pseudoinstructions

| Pseudoinstruction | Base Instruction(s) | Meaning | Comment |
|---|---|---|---|
| la rd, symbol | auipc rd, symbol[31:12]; addi rd, rd, symbol[11:0] | Load address | |
| l{b\|h\|w\|d} rd, symbol | auipc rd, symbol[31:12]; l{b\|h\|w\|d} rd, symbol[11:0](rd) | Load global | |
| s{b\|h\|w\|d} rd, symbol, rt | auipc rt, symbol[31:12]; s{b\|h\|w\|d} rd, symbol[11:0](rt) | Store global | |
| fl{w\|d} rd, symbol, rt | auipc rt, symbol[31:12]; fl{w\|d} rd, symbol[11:0](rt) | Floating-point load global | |
| fs{w\|d} rd, symbol, rt | auipc rt, symbol[31:12]; fs{w\|d} rd, symbol[11:0](rt) | Floating-point store global | |
| nop | addi x0, x0, 0 | No operation | |
| li rd, immediate | Myriad sequences | Load immediate | |
| mv rd, rs | addi rd, rs, 0 | Copy register | |
| not rd, rs | xori rd, rs, -1 | Ones’ complement | |
| neg rd, rs | sub rd, x0, rs | Two’s complement | |
| negw rd, rs | subw rd, x0, rs | Two’s complement word | |
| sext.b rd, rs | slli rd, rs, XLEN - 8; srai rd, rd, XLEN - 8 | Sign extend byte | It will expand to another instruction sequence when B extension is available*[1] |
| sext.h rd, rs | slli rd, rs, XLEN - 16; srai rd, rd, XLEN - 16 | Sign extend half word | It will expand to another instruction sequence when B extension is available*[1] |
| sext.w rd, rs | addiw rd, rs, 0 | Sign extend word | |
| zext.b rd, rs | andi rd, rs, 255 | Zero extend byte | |
| zext.h rd, rs | slli rd, rs, XLEN - 16; srli rd, rd, XLEN - 16 | Zero extend half word | It will expand to another instruction sequence when B extension is available*[1] |
| zext.w rd, rs | slli rd, rs, XLEN - 32; srli rd, rd, XLEN - 32 | Zero extend word | It will expand to another instruction sequence when B extension is available*[1] |
| seqz rd, rs | sltiu rd, rs, 1 | Set if = zero | |
| snez rd, rs | sltu rd, x0, rs | Set if != zero | |
| sltz rd, rs | slt rd, rs, x0 | Set if < zero | |
| sgtz rd, rs | slt rd, x0, rs | Set if > zero | |
| fmv.s rd, rs | fsgnj.s rd, rs, rs | Copy single-precision register | |
| fabs.s rd, rs | fsgnjx.s rd, rs, rs | Single-precision absolute value | |
| fneg.s rd, rs | fsgnjn.s rd, rs, rs | Single-precision negate | |
| fmv.d rd, rs | fsgnj.d rd, rs, rs | Copy double-precision register | |
| fabs.d rd, rs | fsgnjx.d rd, rs, rs | Double-precision absolute value | |
| fneg.d rd, rs | fsgnjn.d rd, rs, rs | Double-precision negate | |
| beqz rs, offset | beq rs, x0, offset | Branch if = zero | |
| bnez rs, offset | bne rs, x0, offset | Branch if != zero | |
| blez rs, offset | bge x0, rs, offset | Branch if ≤ zero | |
| bgez rs, offset | bge rs, x0, offset | Branch if ≥ zero | |
| bltz rs, offset | blt rs, x0, offset | Branch if < zero | |
| bgtz rs, offset | blt x0, rs, offset | Branch if > zero | |
| bgt rs, rt, offset | blt rt, rs, offset | Branch if > | |
| ble rs, rt, offset | bge rt, rs, offset | Branch if ≤ | |
| bgtu rs, rt, offset | bltu rt, rs, offset | Branch if >, unsigned | |
| bleu rs, rt, offset | bgeu rt, rs, offset | Branch if ≤, unsigned | |
| j offset | jal x0, offset | Jump | |
| jal offset | jal x1, offset | Jump and link | |
| jr rs | jalr x0, rs, 0 | Jump register | |
| jalr rs | jalr x1, rs, 0 | Jump and link register | |
| ret | jalr x0, x1, 0 | Return from subroutine | |
| call offset | auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] | Call far-away subroutine | |
| tail offset | auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] | Tail call far-away subroutine | |
| fence | fence iorw, iorw | Fence on all memory and I/O | |

  • [1] We don’t specify the code sequence when the B extension is present, since the B extension is not yet ratified or frozen. We will specify the expansion sequence once it is frozen.
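
The shift-based expansions of sext.b and zext.h in the table can be modeled on plain integers. The Python sketch below is an illustration only: registers are represented as unsigned XLEN-bit values, and the helper names mirror the instructions they stand for.

```python
def slli(x, shamt, xlen):
    """Shift left logical, keeping only XLEN bits."""
    return (x << shamt) & ((1 << xlen) - 1)


def srli(x, shamt, xlen):
    """Shift right logical: zeros are shifted in from the top."""
    return (x & ((1 << xlen) - 1)) >> shamt


def srai(x, shamt, xlen):
    """Shift right arithmetic: copies of the sign bit are shifted in."""
    mask = (1 << xlen) - 1
    x &= mask
    if x >> (xlen - 1):
        x -= 1 << xlen            # reinterpret as a negative number
    return (x >> shamt) & mask


def sext_b(x, xlen=64):
    """slli rd, rs, XLEN - 8; srai rd, rd, XLEN - 8"""
    return srai(slli(x, xlen - 8, xlen), xlen - 8, xlen)


def zext_h(x, xlen=64):
    """slli rd, rs, XLEN - 16; srli rd, rd, XLEN - 16"""
    return srli(slli(x, xlen - 16, xlen), xlen - 16, xlen)


print(hex(sext_b(0x80, xlen=32)))        # 0xffffff80
print(hex(zext_h(0xFFFF1234, xlen=32)))  # 0x1234
```

Shifting left first discards the unwanted upper bits; the kind of right shift then decides whether the vacated bits are filled with the sign bit or with zeros.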

Pseudoinstructions for accessing control and status registers

| Pseudoinstruction | Base Instruction(s) | Meaning |
|---|---|---|
| rdinstret[h] rd | csrrs rd, instret[h], x0 | Read instructions-retired counter |
| rdcycle[h] rd | csrrs rd, cycle[h], x0 | Read cycle counter |
| rdtime[h] rd | csrrs rd, time[h], x0 | Read real-time clock |
| csrr rd, csr | csrrs rd, csr, x0 | Read CSR |
| csrw csr, rs | csrrw x0, csr, rs | Write CSR |
| csrs csr, rs | csrrs x0, csr, rs | Set bits in CSR |
| csrc csr, rs | csrrc x0, csr, rs | Clear bits in CSR |
| csrwi csr, imm | csrrwi x0, csr, imm | Write CSR, immediate |
| csrsi csr, imm | csrrsi x0, csr, imm | Set bits in CSR, immediate |
| csrci csr, imm | csrrci x0, csr, imm | Clear bits in CSR, immediate |
| frcsr rd | csrrs rd, fcsr, x0 | Read FP control/status register |
| fscsr rd, rs | csrrw rd, fcsr, rs | Swap FP control/status register |
| fscsr rs | csrrw x0, fcsr, rs | Write FP control/status register |
| frrm rd | csrrs rd, frm, x0 | Read FP rounding mode |
| fsrm rd, rs | csrrw rd, frm, rs | Swap FP rounding mode |
| fsrm rs | csrrw x0, frm, rs | Write FP rounding mode |
| fsrmi rd, imm | csrrwi rd, frm, imm | Swap FP rounding mode, immediate |
| fsrmi imm | csrrwi x0, frm, imm | Write FP rounding mode, immediate |
| frflags rd | csrrs rd, fflags, x0 | Read FP exception flags |
| fsflags rd, rs | csrrw rd, fflags, rs | Swap FP exception flags |
| fsflags rs | csrrw x0, fflags, rs | Write FP exception flags |
| fsflagsi rd, imm | csrrwi rd, fflags, imm | Swap FP exception flags, immediate |
| fsflagsi imm | csrrwi x0, fflags, imm | Write FP exception flags, immediate |

  1. ra is implicitly used to save the return address.

  2. Similar to call <symbol>, but <rd> is used to save the return address instead.

  3. t1 is implicitly used as a scratch register.

  4. Similar to tail <symbol>, but <rt> is used as the scratch register instead.
