SiFive - September 28, 2018
Last Week in RISC-V: Sept 28, 2018
I spent too much time writing my single entry this week so it's all I have. The issue itself is somewhat complex, so I thought it warranted a deep dive.
Relaxing R_RISCV_PCREL_*
in the Presence of Addends
Jim recently found an issue with our ELF relocation scheme. While technically it's a somewhat wide-ranging ABI bug, the actual set of cases in which the issue will manifest is somewhat restricted -- in other words: while I'm sure the proper long-term fix for this will be quite involved, it probably isn't biting too many people in practice right now. If you're unsure about how ELF relocations and linker relaxation work on RISC-V systems, some of the early entries in my "All Aboard" series go into greater detail. For this week's summary, I'll restrict myself to the specific issues at hand.
The vast majority of RISC-V relocations follow the pattern of having one
relocation to fill out the high 20 bits of a 32-bit address (relocations
that follow the naming pattern R_RISCV_*HI20
), and a second relocation
to fill out the low 12 bits of a 32-bit address (relocations that follow
the naming pattern R_RISCV_*LO12*
). This
R_RISCV*_HI20
/R_RISCV*_LO12_*
pattern is baked into RISC-V's
instruction encodings, with the U format providing the high 20 bits and
the I or S formats proving the low 12 bits (hence the names *_LO12_S
and *_LO12_I
). While the issue at hand applies to all relocations
that follow this pattern, the toolchain can handle this extra complexity
for the vast majority of these relocations -- that's why it took us so
long to notice! It turns out that there is one specific case where the
ABI we've chosen where this is difficult to do: the case where multiple
LO12 relocations share the same HI20 relocation but have different
addends.
Unfortunately this toolchain bug manifests as incorrect code generation from the linker, which is pretty much the worst sort of bug -- nobody understands the linker, so it's a long exercise for users to track down the bug and then a lot of work to fix it. In practice this bug it is somewhat unlikely to actually manifest in real code, but since it's a linker bug it's probably best to understand what is going on with an example before we dig into exactly why this bug is unlikely to manifest.
A Concrete Manifestation of this Bug
Like most of the examples I write it's somewhat contrived, I'll discuss how this applies to more likely user code later. The following assembly program is nonsensical but legal:
.text
.global _start
_start:
.balign 4096
.fill 2048, 1, 0
1: auipc t0, %pcrel_hi(global)
addi t1, t0, %pcrel_lo(1b)+0
addi t0, t0, %pcrel_lo(1b)+8
.data
.balign 4096
global:
.fill 2, 8, 0
it's a fairly simple program: it takes the address of a global symbol
and an offset into that global symbol. When assembled it generates a
single R_RISCV_PCREL_HI20
relocation to fill out the top 20 bits of
the auipc
, and then a pair of R_RISCV_PCREL_LO12_I
relocations to
fill out each addi
. This program assembles and links correctly,
resulting in the pre-linked object file you'd expect
17fe: 00000297 auipc t0,0x0
17fe: R_RISCV_PCREL_HI20 global
17fe: R_RISCV_RELAX *ABS*
1802: 00028313 mv t1,t0
1802: R_RISCV_PCREL_LO12_I .L11
1802: R_RISCV_RELAX *ABS*
1806: 00028293 mv t0,t0
1806: R_RISCV_PCREL_LO12_I .L11+0x8
1806: R_RISCV_RELAX *ABS*+0x8
and a sensible executable
11800: 00003297 auipc t0,0x3
11804: 80028313 addi t1,t0,-2048 # 14000 <global>
11808: 80828293 addi t0,t0,-2040
Things start to get interesting when we slightly perturb this program.
In this case, let's look at what happens when we change the alignment of
the auipc
. At the assembly source level this is a straight-forward
change (because I carefully constructed the example that way):
.global _start
_start:
.balign 4096
.fill 2050, 1, 0
1: auipc t0, %pcrel_hi(global)
addi t1, t0, %pcrel_lo(1b)+0
addi t0, t0, %pcrel_lo(1b)+8
.data
.balign 4096
global:
.fill 2, 8, 0
In this case we generate a pre-linked object file that looks like you'd expect, essentially the same as what's above but with different alignment
1800: 00000297 auipc t0,0x0
1800: R_RISCV_PCREL_HI20 global
1800: R_RISCV_RELAX *ABS*
1804: 00028313 mv t1,t0
1804: R_RISCV_PCREL_LO12_I .L11
1804: R_RISCV_RELAX *ABS*
1808: 00028293 mv t0,t0
1808: R_RISCV_PCREL_LO12_I .L11+0x8
1808: R_RISCV_RELAX *ABS*+0x8
The interesting thing here is that this slight input perturbation results in an incorrect binary after linking
11802: 00002297 auipc t0,0x2
11806: 7fe28313 addi t1,t0,2046 # 14000 <global>
1180a: 80628293 addi t0,t0,-2042
Essentially what's going on here is that our relocations are capable of providing 2^32 distinct bit patterns, but in the presence of pair relocations with distinct addends we need to be capable of generating 2^32+1 bit patterns. Like many bugs of this sort, it's quite obvious once it's been described but fairly difficult to find in practice.
Why this Probably isn't Manifesting in Your Program
If you've managed to make it this far you're probably getting pretty worried: a linker bug that manifests as silent incorrect code generation is a pretty hairy thing to deal with. While I agree, it turns out this bug has lurked for so long because it's somewhat unlikely to manifest in real application code under the set of compiler options that are actually sensible to use. We are of course working on fixing the bug (and there's already patches to at least cause it to manifest as an explicit failure to link), but a fully performant fix is somewhat tricky.
This bug can only manifest under a very specific set of conditions:
- There must be multiple
LO12
relocations that share aHI20
relocation but have different addends. - The result of the
HI20
relocation must be misaligned WRT to final target of theLO12
relocations.
As long as you're writing C code and haven't written your own GCC optimization passes, these conditions are mutually exclusive:
- GCC doesn't know how to share a single
auipc
between multiple accesses, so the only way to end up with a shared relocation for the top 20 bits of an address is vialui
. - GCC always aligns the intermediate results of the
lui
-based relocations. This alignment is implicit: GCC aligns the target symbol, which forlui-
based addressing results in an intermediate result that is always aligned.
Importantly, the second case is not true for our auipc
-based
relocations. Since auipc
includes the current PC as part of the
intermediate result, aligning the target symbol is not sufficient to
align the intermediate result.
What's the Takeaway for a User of the Toolchain?
This article has been somewhat involved. I wrote it because I think
it's an interesting dive into a toolchain bug, but most people reading
this probably aren't interested in that level of detail and just want to
know what to do. My official recommendation is simple: don't use
-mcmodel=medany
on an RV32I target. That's actually exactly the same
recommendation as I would have provided before we found this bug. The
key point here is that -mcmodel=medany
doesn't actually provide any
benefit on RV32I targets: -mcmodel=medlow
can already generate every
possible 32-bit address, so all -mcmodel=medany
does is add a bit of
complexity to the address generation. I used to think that all that did
was generate slightly worse code, but now that we've found a correctness
bug my recommendation is simply a bit stronger.
The only users who are likely to need -mcmodel=medany
are those on
64-bit systems. While you're unlikely to find a bug in -mcmodel=medany -mexplicit-relocs
because GCC can't perform constant subexpression
elimination on auipc
, I'd still recommend avoiding -mcmodel=medany -mexplicit-relocs
. While -mcmodel=medany -mexplicit-relocs
is
necessary to squeeze the last fer percent out of Dhrystone, we've found
it actually generates worse code for real programs which is why it's not
the toolchain default. If you're really that worried about a few
percent performance in Dhrystone then I'd recommend finding a better
benchmark.
I also explicitly recommend that users do not try to backport the various patches to fix this bug themselves and instead wait for us to completely verify these bug fixes and add them to our backport branches. The workarounds above are really the right thing to do even without the bug, and due to the subtlety of what's going on here, you're more likely to end up introducing a bug than fixing one.
Events
- October 18: RISC-V Day Tokyo hosted by Keio University in Tokyo, Japan.
- October 22: RISC-V BoF at Embedded Linux Conference + IoT Summit Europe hosted by The Linux Foundation in Edinburgh, UK.
- November 13: RISC-V Microconference at Linux Plumbers Conference in Vancouver, Canada.
- December 3: RISC-V Summit in Santa Clara, CA.