Memory Access in Assembly

The Memory Hierarchy

So far, we have seen a bunch of RISC-V instructions that access the 32 registers, but we haven’t accessed memory yet. Registers are fine as long as your data fits in 31 64-bit values, but real software needs “bulk” storage, and that’s what memory is for.

In general, computer architects think of these different ways of storing data as tiers in an organization called the memory hierarchy. You can imagine an entire spectrum of different ways of storing data, all of which trade off between different goals:

  • Smaller memories that are closer to the processor and faster to access.
  • Larger memories that are farther from the processor and slower to access.

Registers are toward the first extreme: in 64-bit RISC-V, there is only a total of \(31 \times 8 = 248\) bytes of mutable storage, and it usually takes around 1 cycle (less than a nanosecond) to access a register.

Modern main memory is at the opposite extreme: even cheap phones have several gigabytes of main memory, and it typically takes hundreds of cycles (hundreds of nanoseconds) to access it.

You might reasonably ask: why not make the whole plane out of registers? There are two big answers to this question.

  • In real computers, these different memories are made out of different memory technologies. The physical details of how to construct memories are out of scope for CS 3410, but registers are universally made from transistors (like the flip-flops we built in class) and integrated with the processor, while main memory is made of DRAM, a memory-specific technology that uses tiny little capacitors to store bits. DRAM requires different manufacturing processes than logic and is much cheaper per bit than integrated-with-logic storage, but it is also much slower.
  • There is a fundamental trade-off between capacity and latency. In any memory technology you can think of, building a larger memory makes it take longer to access.

Registers and main memory are two points in the memory-hierarchy spectrum. There are other points too: later in the semester, we will learn much more about caches, which fill in the space in between registers and main memory. You can also think of persistent storage (magnetic hard drives or flash memory SSDs) or even the Internet as further tiers beyond main memory.

Load and Store

RISC-V has special instructions for accessing memory. Let’s write a C program that accesses memory and compile it to assembly to see which instructions those are.

By “a C program that accesses memory,” I mean a C program that uses pointers. Here’s a simple one:

void mystery(int* x, int* y) {
    *x = *y;
}

Let’s reuse that command from last time to produce the assembly:

rv gcc -O1 -S loadstore.c -o loadstore.s

Here are the two instructions in the function’s body:

lw a5, 0(a1)
sw a5, 0(a0)

Those mnemonics lw and sw stand for load word and store word. Perhaps unsurprisingly, a load instruction implements a read from memory (the right-hand side in our C assignment) and a store instruction implements a write to memory (the left-hand side of the equals sign). Let’s look up the general form in our RISC-V reference sheet, linked from our RISC-V assembly page. Here’s what the entries say:

LW rd, offset(rs1)
SW rs2, offset(rs1)

In both cases, the second operand is the address. This operand uses the funky-looking offset(rs1) syntax. This means “get the value from register rs1, and add the constant value offset to it; treat the result as the address.” These instructions have a built-in constant offset because it is so incredibly common for code to need to add a small constant value to an address before doing the access. If you don’t need this offset, you can always use 0 for the offset.

The lw instruction puts the value into rd. The sw instruction takes the value from rs2 and stores it to memory at the computed address.
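To see why small constant offsets come up so often, here is a minimal C sketch. The struct point type and the load_word_at helper are made up for illustration; they are not from the reference sheet. The helper mimics what lw rd, offset(rs1) does: it adds a constant byte offset to a base address and reads 4 bytes from there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical struct: each field lives at a small constant offset from the base.
struct point {
    int32_t x;  // byte offset 0
    int32_t y;  // byte offset 4
    int32_t z;  // byte offset 8
};

// Mimics `lw rd, offset(rs1)`: the address is base + offset, measured in bytes.
int32_t load_word_at(const void *base, size_t offset) {
    assert(base != NULL);
    int32_t value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

If a pointer to a struct point were in a1, reading its y field would plausibly compile to something like lw a5, 4(a1): the base register holds the struct’s address and the offset selects the field.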

Access Sizes

If you look around at the reference card, you’ll notice that there are other instructions that start with l and s. For example, there’s lb, lh, lbu, lhu, lwu, and ld. These are also load and store instructions, but they work on values of different sizes (a.k.a. bit widths).

To understand these differently-sized loads and stores, the most important thing to keep in mind is that, in the 64-bit RISC-V ISA, all registers are always 64 bits. So these instructions work by reading and writing parts of those 64-bit values.

Here’s a full list of the available memory-access instructions:

  • ld and sd: Load or store a double word (64 bits).
  • lw, lwu, and sw: Load or store a word (32 bits).
  • lh, lhu, and sh: Load or store a half word (16 bits).
  • lb, lbu, and sb: Load or store a byte (8 bits).

The double-word instructions, ld and sd, load or store an entire register. Those ones are pretty simple. If you store a register to address \(b\), the computer puts all 8 bytes from the register into memory at addresses \(b\) through \(b+7\) (using its byte order).

The other, smaller loads and stores are a little more complicated. These use the least-significant bytes of each register. For example, lh and sh load and store the lowest 2 bytes of the 8-byte register. But the details require a little more nuance.
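Here is a rough C model of what the stores do to memory, assuming a little-endian byte order and a tiny toy memory. The memory array and the store_double and store_half helpers are inventions for this sketch. It shows sd writing all 8 bytes to addresses \(b\) through \(b+7\), and sh writing only the lowest 2 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 64
static uint8_t memory[MEM_BYTES];  // toy byte-addressed memory (an assumption)

// Mimics `sd`: writes all 8 bytes of `reg` to addresses b through b+7,
// least-significant byte first (little-endian).
void store_double(uint64_t reg, uint64_t b) {
    assert(b + 8 <= MEM_BYTES);
    for (int i = 0; i < 8; i++) {
        memory[b + i] = (uint8_t)(reg >> (8 * i));
    }
}

// Mimics `sh`: writes only the lowest 2 bytes of `reg`.
void store_half(uint64_t reg, uint64_t b) {
    assert(b + 2 <= MEM_BYTES);
    memory[b] = (uint8_t)reg;
    memory[b + 1] = (uint8_t)(reg >> 8);
}
```

Storing 0x1122334455667788 with store_double at address 8 leaves 0x88 at address 8 and 0x11 at address 15. The same register stored with store_half touches only two bytes.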

Extension and Truncation

Because our registers are always 8 bytes, narrower memory instructions essentially need to “convert” values between different bit widths. For example, instructions like lw and sw must convert between 8-byte and 4-byte values. This section is about how to do those conversions.

When you increase the number of bits, that’s called extension, and when you decrease the size, that’s called truncation. The goal in both situations is to avoid losing information whenever possible: that is, to keep the same represented integer value when converting between sizes.

Truncation

Truncation from \(m\) bits to \(n\) bits works by extracting the lowest (least significant) \(n\) bits from the value. There is, sadly, no way to avoid losing information in some cases. Here are some examples:

  • Let’s truncate the 64-bit value 0x00000000000000ab to 32 bits. In decimal, this number has the value 171. Truncating to 32 bits yields 0x000000ab. That’s also 171. Awesome!
  • Let’s truncate 0xffffffffffffffab to 32 bits. That’s the value -85 in two’s complement. Truncating yields 0xffffffab. That’s still -85. Excellent!
  • Now let’s truncate the bits 0x80000000000000ab (note the 8 in the most-significant hex digit). That’s a really big negative value, because the leading bit is 1. Truncating yields 0x000000ab, which represents 171. That’s bad—we now have a different value. But losing some information is inevitable when you lose some bits.
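The three truncation examples can be checked with a small C sketch. The helper names truncate_bits and signed_value are made up here. The sign interpretation is written out by hand so that it does not rely on how a particular compiler converts out-of-range values to signed types.

```c
#include <assert.h>
#include <stdint.h>

// Keep only the lowest n bits of a 64-bit pattern (1 <= n <= 64).
uint64_t truncate_bits(uint64_t bits, unsigned n) {
    assert(n >= 1 && n <= 64);
    if (n == 64) {
        return bits;  // shifting by 64 would be undefined, so handle it here
    }
    return bits & ((UINT64_C(1) << n) - 1);
}

// Interpret the lowest n bits as a two's complement signed integer.
int64_t signed_value(uint64_t bits, unsigned n) {
    uint64_t low = truncate_bits(bits, n);
    uint64_t sign_bit = UINT64_C(1) << (n - 1);
    if ((low & sign_bit) == 0) {
        return (int64_t)low;  // non-negative: the bits are the value
    }
    // Negative: flip the n bits, negate, and subtract 1.
    uint64_t inverted = truncate_bits(~low, n);
    return -(int64_t)inverted - 1;
}
```

For instance, signed_value(0xffffffffffffffab, 64) and signed_value(0xffffffffffffffab, 32) are both -85, while 0x80000000000000ab is negative at 64 bits but 171 at 32 bits.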

Extension

There are two modes for extending from \(m\) bits to \(n\) bits. Both work by putting the value in the \(m\) least-significant bits of the \(n\)-bit output. The difference is in what we do with the extra \(n-m\) bits, which are the most-significant (upper) bits in the output.

  • Zero extension fills the upper bits with zeroes.
  • Sign extension fills them with copies of the most-significant bit in the input. (That is, the sign bit.)

Let’s see some examples.

  • Let’s zero-extend 0xffffffab (remember, that’s -85) to 64 bits. The result is 0x00000000ffffffab, a pretty big positive number (4294967211 in decimal). So we didn’t preserve the value.
  • Now let’s sign-extend the same value. Because the most significant bit in the 32-bit input is 1, we fill in the upper 32 bits with 1s. The output is 0xffffffffffffffab in hex, or -85 in decimal. So we preserved the value!

The moral of the story is: when extending unsigned numbers, use zero extension; when extending signed numbers, use sign extension.
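As a sketch of the two modes, here are hypothetical zero_extend and sign_extend helpers in C (the names and the restriction to inputs of at most 63 bits are choices made for this example). Both keep the lowest \(m\) bits; they differ only in how they fill the upper bits.

```c
#include <assert.h>
#include <stdint.h>

// Zero-extend the lowest m bits of `value` to 64 bits (1 <= m <= 63).
uint64_t zero_extend(uint64_t value, unsigned m) {
    assert(m >= 1 && m <= 63);
    uint64_t low_mask = (UINT64_C(1) << m) - 1;
    return value & low_mask;  // the upper 64 - m bits become 0
}

// Sign-extend the lowest m bits of `value` to 64 bits (1 <= m <= 63).
uint64_t sign_extend(uint64_t value, unsigned m) {
    assert(m >= 1 && m <= 63);
    uint64_t low_mask = (UINT64_C(1) << m) - 1;
    uint64_t sign_bit = UINT64_C(1) << (m - 1);
    uint64_t low = value & low_mask;
    // If the input's top bit is 1, fill the upper bits with 1s.
    return (low & sign_bit) ? (low | ~low_mask) : low;
}
```

With the input 0xffffffab and \(m = 32\), zero_extend gives 0x00000000ffffffab and sign_extend gives 0xffffffffffffffab, matching the two examples.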

Summary: Load and Store Instructions

RISC-V has many different load and store instructions, but they all have the same assembly format. Using 8-byte accesses as an example, here’s what they look like:

ld rd, offset(rs1)
sd rs2, offset(rs1)

The address for both loads and stores is the offset(rs1) part; the address we use is the constant offset plus the value of the register rs1. For loads, we include a destination register rd for the value we get from memory. For stores, we use a source register rs2 for the value we’ll send to memory.

The names of the instructions use this convention:

  • First, l or s indicates a load or store.
  • The next letter is the size. d is 8 bytes, w is 4 bytes, h is 2 bytes, and b is 1 byte.
  • For loads only, the letter u means unsigned load. Omitting the u means signed load. Stores do not have signed and unsigned versions.

Here are the rules for handling widths smaller than 8 bytes:

  • When storing, you truncate (take the lowest \(n\) bits from the register).
  • When loading, you extend. The instruction tells you whether you zero-extend or sign-extend:
    • The instructions with the u suffix are for unsigned numbers, and they zero-extend.
    • The instructions without this suffix are for signed numbers, and they sign-extend.

So, for example, lb loads a single byte and sign-extends it to 64 bits to put it in a register. lbu does the same thing, but it zero-extends instead.
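One way to internalize the naming convention is to write it down as a tiny decoder. This is only a sketch: the mem_op struct and decode_mem_op function are hypothetical, and the decoder assumes the 64-bit instruction list above, where the 8-byte load has no u variant because it already fills the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical description of one load/store mnemonic.
struct mem_op {
    bool is_load;      // true for l*, false for s*
    int size_bytes;    // 1, 2, 4, or 8
    bool zero_extend;  // true only for loads with the u suffix
};

// Decode a mnemonic such as "lbu" or "sd". Returns false if the name
// does not follow the naming convention.
bool decode_mem_op(const char *name, struct mem_op *out) {
    assert(name != NULL && out != NULL);
    size_t len = strlen(name);
    if (len < 2 || len > 3) return false;

    if (name[0] != 'l' && name[0] != 's') return false;
    out->is_load = (name[0] == 'l');

    switch (name[1]) {
        case 'b': out->size_bytes = 1; break;
        case 'h': out->size_bytes = 2; break;
        case 'w': out->size_bytes = 4; break;
        case 'd': out->size_bytes = 8; break;
        default: return false;
    }

    out->zero_extend = false;
    if (len == 3) {
        // Only narrow loads have a u variant.
        if (name[2] != 'u' || !out->is_load || out->size_bytes == 8) return false;
        out->zero_extend = true;
    }
    return true;
}
```

Decoding "lbu" yields a 1-byte zero-extending load, "lb" yields a 1-byte sign-extending load, and a name like "swu" is rejected because stores have no unsigned form.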

Example: Store Word, Load Byte

Consider this short program:

addi x11, x0, 0x49C
sw x11, 0(x5)
lb x12, 0(x5)

What is the value of x12 at the end?

As always, it helps to translate the assembly to pseudocode to understand it. Here’s one attempt:

x11 = 0x49c;
store_word(x11, x5);
x12 = load_byte(x5);

We don’t know what value x5 holds, but whatever it is, we use it as the memory address. We’re storing the value 0x49c as a word (32 bits) to that address, and then loading the byte at that address. Let’s look at the two steps:

  1. First, we store the low 32 bits of x11, which holds 0x49c. Since we use little endian, the least-significant byte goes at the smallest address. Let’s say x5 holds the address \(a\). Then address \(a\) will hold the byte 0x9c, \(a+1\) holds the byte 0x04, and addresses \(a+2\) and \(a+3\) both hold zero.
  2. Next, we load the byte at the same address. The load instruction gets the byte 0x9c, and it sign-extends it to 64 bits, so the final value is 0xffffffffffffff9c, or -100 in decimal if we interpret it as a signed number.
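The two steps above can also be traced with a small C model. This is a sketch under stated assumptions: a 16-byte toy memory named mem, little-endian stores, and helper names (store_word, load_byte, run_example) invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MEM_SIZE 16
static uint8_t mem[TOY_MEM_SIZE];  // toy memory; real address spaces are far larger

// Mimics `sw`: store the lowest 4 bytes of reg, least-significant byte first.
void store_word(uint64_t reg, uint64_t addr) {
    assert(addr + 4 <= TOY_MEM_SIZE);
    for (int i = 0; i < 4; i++) {
        mem[addr + i] = (uint8_t)(reg >> (8 * i));
    }
}

// Mimics `lb`: load one byte and sign-extend it to 64 bits.
uint64_t load_byte(uint64_t addr) {
    assert(addr < TOY_MEM_SIZE);
    uint64_t byte = mem[addr];
    return (byte & 0x80) ? (byte | ~UINT64_C(0xff)) : byte;
}

// The three-instruction program, with x5 holding the address a.
uint64_t run_example(uint64_t a) {
    uint64_t x11 = 0x49c;  // addi x11, x0, 0x49C
    store_word(x11, a);    // sw x11, 0(x5)
    return load_byte(a);   // lb x12, 0(x5)
}
```

Calling run_example with any in-range address returns 0xffffffffffffff9c, the same bit pattern as -100 in 64-bit two’s complement.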

To confirm your answer, try running this program in our in-browser RISC-V simulator.

Example: Translating from C

How would you translate this C program to assembly?

void mystery(int* p) {
    p[1] = p[0] / 2;
}

Assume (as is the case on our RISC-V target) that int is a 32-bit type. Assume also that the pointer p is stored in register x3 initially.

Here’s a reasonable translation:

lw x8, 0(x3)
srai x8, x8, 1
sw x8, 4(x3)

Here are some salient observations about this code:

  • The choice of x8 for the temporary value is arbitrary; any register would work.
  • It makes sense that this is a load instruction followed by a store instruction, because we need to read the value at &p[0] and write it back to address &p[1].
  • It also makes sense that we are using word-sized accesses (lw and sw) because that’s how you access 32 bits.
  • We use the signed version of the load (lw instead of lwu) to get sign-extension, not zero-extension. (If p pointed to unsigned int instead, we would want lwu.)
  • The load uses offset 0 and the store uses offset 4, and they both use the same base address x3. It may help to remember that, for C arrays, p[0] means *(p + 0) and p[1] means *(p + 1). The latter, according to the rules of pointer arithmetic, adds an offset of 4 (the size of int) to the pointer.
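To double-check the offsets, here is a short C sketch. The byte_offset helper is hypothetical, and mystery is restated with int spelled as int32_t to match the 32-bit assumption. It measures how far &p[i] is from p in bytes, which is the constant a load or store would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Byte distance from p to &p[i]: the offset a load or store would use.
size_t byte_offset(const int32_t *p, size_t i) {
    assert(p != NULL);
    return (size_t)((const char *)&p[i] - (const char *)p);
}

// The function being translated, with int written as int32_t.
void mystery(int32_t *p) {
    p[1] = p[0] / 2;
}
```

Here byte_offset(p, 0) is 0 and byte_offset(p, 1) is 4, matching 0(x3) and 4(x3). One caveat about the hand translation: a one-bit right shift rounds down, while C’s / rounds toward zero, so the two agree for non-negative or even inputs but differ for negative odd ones such as -3. A compiler would typically add a small fixup to handle that case.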

To try this one out in the simulator, you may want to add some additional instructions to put some values into memory first. For example, these instructions will populate p[0] and p[1] with some numbers:

addi x5, x0, 0x34
sw x5, 0(x3)
addi x5, x5, 0x10
sw x5, 4(x3)