In this tiny ebook I’m going to show you how to get started writing 6502 assembly language. The 6502 processor was massive in the seventies and eighties, powering famous computers like the BBC Micro, Atari 2600, Commodore 64, Apple II, and the Nintendo Entertainment System. Bender in Futurama has a 6502 processor for a brain. Even the Terminator was programmed in 6502.
So, why would you want to learn 6502? It’s a dead language isn’t it? Well, so’s Latin. And they still teach that. Q.E.D.
Seriously though, I think it’s valuable to have an understanding of assembly language. Assembly language is the lowest level of abstraction in computers - the point at which the code is still readable. Assembly language translates directly to the bytes that are executed by your computer’s processor. If you understand how it works, you’ve basically become a computer magician.
Then why 6502? Why not a useful assembly language, like x86? Well, I don’t think learning x86 is useful. I don’t think you’ll ever have to write assembly language in your day job - this is purely an academic exercise, something to expand your mind and your thinking. 6502 was originally written in a different age, a time when the majority of developers were writing assembly directly, rather than in these new-fangled high-level programming languages. So, it was designed to be written by humans. More modern assembly languages are meant to be written by compilers, so let’s leave it to them. Plus, 6502 is fun. Nobody ever called x86 fun.
Hopefully the black area on the right now has three coloured “pixels” at the top left. (If this doesn’t work, you’ll probably need to upgrade your browser to something more modern, like Chrome or Firefox.)
So, what’s this program actually doing? Let’s step through it with the
debugger. Hit Reset, then check the Debugger checkbox to start the
debugger. Click Step once. If you were watching carefully, you’ll have
noticed that A= changed from $00 to $01, and that PC= moved on by two,
past the two-byte instruction that was just executed.
Any numbers prefixed with
$ in 6502 assembly language (and by extension, in
this book) are in hexadecimal (hex) format. If you’re not familiar with hex
numbers, I recommend you read the Wikipedia
article. Anything prefixed with # is a literal number value.
Any other number refers to a memory location.
Equipped with that knowledge, you should be able to see that the instruction
LDA #$01 loads the hex value
$01 into register
A. I’ll go into more
detail on registers in the next section.
Press Step again to execute the second instruction. The top-left pixel of
the simulator display should now be white. This simulator uses the memory
locations $0200 to $05ff to draw pixels on its display. The values $00 to
$0f represent 16 different colours (
$00 is black and
$01 is white), so
storing the value
$01 at memory location
$0200 draws a white pixel at the
top left corner. This is simpler than how an actual computer would output
video, but it’ll do for now.
So, the instruction
STA $0200 stores the value of the
A register to memory location
$0200. Click Step four more times to execute the rest of the
instructions, keeping an eye on the
A register as it changes.
We’ve already had a little look at the processor status section (the bit with
PC etc.), but what does it all mean?
The first line shows the A, X and
Y registers (
A is often called the
“accumulator”). Each register holds a single byte. Most operations work on the
contents of these registers.
SP is the stack pointer. I won’t get into the stack yet, but basically this
register is decremented every time a byte is pushed onto the stack, and
incremented when a byte is popped off the stack.
PC is the program counter - it’s how the processor knows at what point in the
program it currently is. It’s like the current line number of an executing
script. In this simulator the code is always assembled at the same memory
location, so PC always starts there.
The last section shows the processor flags. Each flag is one bit, so all seven flags live in a single byte. The flags are set by the processor to give information about the previous instruction. More on that later. Read more about the registers and flags here.
Instructions in assembly language are like a small set of predefined functions. All instructions take zero or one arguments. Here’s some annotated source code to introduce a few different instructions:
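As a minimal sketch of such a program, the listing below loads a value, copies it to another register, increments the copy and then performs an addition. The values $c0 and $c4 are chosen so that the sum is $184, the result discussed in the text; the exact layout and comments are assumptions.

```asm
LDA #$c0  ;Load the hex value $c0 into the A register
TAX       ;Transfer the value in the A register to X
INX       ;Increment the value in the X register ($c0 becomes $c1)
ADC #$c4  ;Add the hex value $c4 (plus the carry flag) to the A register
BRK       ;Break - we're done
```

Each instruction here takes either no argument (TAX, INX, BRK) or exactly one (LDA, ADC).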
Assemble the code, then turn on the debugger and step through the code, watching
the A and X registers. Something slightly odd happens on the line
ADC #$c4. You might expect that adding $c4 to
$c0 would give
$184, but this
processor gives the result as
$84. What’s up with that?
The problem is,
$184 is too big to fit in a single byte (the max is $ff),
and the registers can only hold a single byte. It’s OK though; the processor
isn’t actually dumb. If you were looking carefully enough, you’ll have noticed
that the carry flag was set to
1 after this operation. So that’s how you know.
In the simulator below type (don’t paste) the following code:
LDA #$80
STA $01
ADC $01
An important thing to notice here is the distinction between
ADC #$01 and
ADC $01. The first one adds the value
$01 to the
A register, but the
second adds the value stored at memory location
$01 to the A register.
Assemble, check the Monitor checkbox, then step through these three
instructions. The monitor shows a section of memory, and can be helpful to
visualise the execution of programs.
STA $01 stores the value of the A
register at memory location $01, and
ADC $01 adds the value stored at the memory location
$01 to the A register.
$80 + $80 should equal $100, but
because this is bigger than a byte, the
A register is set to
$00 and the
carry flag is set. As well as this though, the zero flag is set. The zero flag
is set by all instructions where the result is zero.
A full list of the 6502 instruction set is available here and here (I usually refer to both pages as they have their strengths and weaknesses). These pages detail the arguments to each instruction, which registers they use, and which flags they set. They are your bible.
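1. You’ve seen TAX. You can probably guess what TAY, TXA and TYA do, but write some code to test your assumptions.
2. Rewrite the first example in this section to use the Y register instead of the X register.
3. The opposite of ADC is SBC (subtract with carry). Write a program that uses this instruction.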
So far we’re only able to write basic programs without any branching logic. Let’s change that.
6502 assembly language has a bunch of branching instructions, all of which
branch based on whether certain flags are set or not. In this example we’ll be
looking at BNE: “Branch on not equal”.
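A minimal sketch of a BNE loop that matches the description in the text: X is loaded with $08, decremented under a label called decrement, written to $0200, and compared with a constant. The comparison value $03 and the final store to $0201 are assumptions made for illustration.

```asm
  LDX #$08       ;X starts at $08
decrement:
  DEX            ;X = X - 1
  STX $0200      ;Show X as a colour in the top-left pixel
  CPX #$03       ;Z flag becomes 1 only when X equals $03 (assumed value)
  BNE decrement  ;Z is 0: the values differ, so go round again
  STX $0201      ;Z is 1: fall through and store X in the second pixel
  BRK
```

Stepping through this in the debugger, you should see the Z flag stay at 0 on every pass until X reaches the comparison value.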
First we load the value
$08 into the
X register. The next line is a label.
Labels just mark certain points in a program so we can return to them later.
After the label we decrement
X, store it to
$0200 (the top-left pixel), and
then compare it to a constant value.
CPX compares the
value in the
X register with another value. If the two values are equal, the
Z flag is set to
1, otherwise it is set to 0.
The next line,
BNE decrement, will shift execution to the decrement label if
the Z flag is set to
0 (meaning that the two values in the CPX comparison
were not equal), otherwise it does nothing and we store X once more, then
finish the program.
In assembly language, you’ll usually use labels with branch instructions. When assembled though, this label is converted to a single-byte relative offset (a signed number of bytes to go backwards or forwards from the next instruction), so branch instructions can only go back up to 128 bytes or forward up to 127 bytes. This means they can only be used to move around local code. For moving further you’ll need to use the jumping instructions.
1. The opposite of BNE is BEQ. Try writing a program that uses BEQ.
2. BCC and BCS (“branch on carry clear” and “branch on carry set”) are used to branch on the carry flag. Write a program that uses one of these two.
The 6502 uses a 16-bit address bus, meaning that there are 65536 bytes of
memory available to the processor. Remember that a byte is represented by two
hex characters, so the memory locations are generally represented as
$0000 to $ffff. There are various ways to refer to these memory locations, as detailed below.
With all these examples you might find it helpful to use the memory monitor to
watch the memory change. The monitor takes a starting memory location and a
number of bytes to display from that location. Both of these are hex values.
For example, to display 16 bytes of memory from $c000, enter c000 and 10
into Start and Length, respectively.
With absolute addressing, the full memory location is used as the argument to the instruction. For example:
STA $c000 ;Store the value in the accumulator at memory location $c000
All instructions that support absolute addressing (with the exception of the jump instructions) also have the option to take a single-byte address. This type of addressing is called “zero page” - only the first page (the first 256 bytes) of memory is accessible. This is faster, as only one byte needs to be looked up, and takes up less space in the assembled code as well.
This is where addressing gets interesting. In this mode, a zero page address is given, and then the value of the
X register is added. Here is an example:
LDX #$01   ;X is $01
LDA #$aa   ;A is $aa
STA $a0,X  ;Store the value of A at memory location $a1
INX        ;Increment X
STA $a0,X  ;Store the value of A at memory location $a2
If the result of the addition is larger than a single byte, the address wraps around. For example:
LDX #$05
STA $ff,X  ;Store the value of A at memory location $04
This is the equivalent of zero page,X, but can only be used with LDX and STX.
These are the absolute addressing versions of zero page,X and zero page,Y. For example:
LDX #$01
STA $0200,X  ;Store the value of A at memory location $0201
Immediate addressing doesn’t strictly deal with memory addresses - this is the
mode where actual values are used. For example,
LDX #$01 loads the value
$01 into the
X register. This is very different to the zero page
LDX $01 which loads the value at memory location
$01 into the X register.
Relative addressing is used for branching instructions. These instructions take a single byte, which is used as an offset from the following instruction.
Assemble the following code, then click the Hexdump button to see the assembled code.
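A minimal sketch of a program that assembles to the bytes discussed in the text. The opcodes in the comments are the standard 6502 encodings; the label name notequal is an assumption.

```asm
  LDA #$01      ;a9 01
  CMP #$02      ;c9 02
  BNE notequal  ;d0 02 - skip the next two bytes if A is not $02
  STA $22       ;85 22
notequal:
  BRK           ;00
```

The branch argument is measured from the instruction after BNE, which is why skipping the two-byte STA gives an offset of 02.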
The hex should look something like this:
a9 01 c9 02 d0 02 85 22 00
a9 and c9 are the processor opcodes for immediate-addressed LDA and CMP respectively. 01 and
02 are the arguments to these instructions. d0 is
the opcode for
BNE, and its argument is
02. This means “skip over the next
two bytes” (
85 22, the assembled version of
STA $22). Try editing the code
so STA takes a two-byte absolute address rather than a single-byte zero page
address (e.g. change
STA $22 to
STA $2222). Reassemble the code and look at
the hexdump again - the argument to
BNE should now be
03, because the
instruction the processor is skipping past is now three bytes long.
Some instructions don’t deal with memory locations (e.g.
INX - increment the
X register). These are said to have implicit addressing - the argument is
implied by the instruction.
Indirect addressing uses an absolute address to look up another address. The first address gives the least significant byte of the address, and the following byte gives the most significant byte. That can be hard to wrap your head around, so here’s an example:
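A minimal sketch of indirect addressing, assuming the two pointer bytes are stored at $f0 and $f1 and that the target address is $cc01:

```asm
LDA #$01
STA $f0    ;Least significant byte of the target address
LDA #$cc
STA $f1    ;Most significant byte of the target address
JMP ($f0)  ;Looks up $f0 and $f1, then jumps to $cc01
```

Note that the low byte is stored first in memory, which is the opposite of the order in which the address is written.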
In this example,
$f0 contains the value $01 and
$f1 contains the value
$cc. The instruction
JMP ($f0) causes the processor to look up the two
bytes at $f0 and $f1 ($01 and $cc) and put them together to form the
address $cc01, which becomes the new program counter. Assemble and step
through the program above to see what happens. I’ll talk more about JMP in
the section on Jumping.
This one’s kinda weird. It’s like a cross between zero page,X and indirect.
Basically, you take the zero page address, add the value of the
X register to
it, then use that to look up a two-byte address. For example:
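A minimal sketch of indexed indirect addressing, assuming X is $01 and that the pointer stored at $01 and $02 refers to $0705. The value $0a stored through Y is arbitrary.

```asm
LDX #$01
LDA #$05
STA $01      ;Least significant byte of the pointer
LDA #$07
STA $02      ;Most significant byte of the pointer
LDY #$0a     ;An arbitrary value to find again later
STY $0705    ;Store Y at the address the pointer refers to
LDA ($00,X)  ;Pointer lives at $00 + X = $01, so this loads A from $0705
```

After the last instruction A and Y should hold the same value.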
Memory locations $01 and $02 contain the values $05 and $07
respectively. Think of ($00,X) as
($00 + X). In this case X is $01, so
this simplifies to
($01). From here things proceed like standard indirect
addressing - the two bytes at $01 and $02 ($05 and
$07) are looked up
to form the address
$0705. This is the address that the
Y register was
stored into in the previous instruction, so the
A register gets the same
value as Y, albeit through a much more circuitous route. You won’t see this
mode used much.
Indirect indexed is like indexed indirect but less insane. Instead of adding
the X register to the address before dereferencing, the zero page address
is dereferenced, and the
Y register is added to the resulting address.
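A minimal sketch of indirect indexed addressing, assuming the pointer at $01 and $02 holds $0703. The offset of $01 in Y and the value $0a are assumptions chosen for illustration.

```asm
LDY #$01     ;Offset to add after dereferencing (assumed)
LDA #$03
STA $01      ;Least significant byte of the pointer
LDA #$07
STA $02      ;Most significant byte: the pointer is now $0703
LDX #$0a     ;An arbitrary value to find again later
STX $0704    ;$0703 + Y
LDA ($01),Y  ;Dereference $01 to get $0703, add Y, load A from $0704
```

The difference from indexed indirect is only in when the register is added: here it is added to the looked-up address rather than to the zero page address.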
In this case,
($01) looks up the two bytes at $01 and $02: $03 and
$07. These form the address
$0703. The value of the
Y register is added
to this address to give the final address ($0703 plus Y).
The stack in a 6502 processor is just like any other stack - values are pushed
onto it and popped (“pulled” in 6502 parlance) off it. The current depth of the
stack is measured by the stack pointer, a special register. The stack lives in
memory between $0100 and $01ff. The stack pointer is initially $ff, which
points to memory location
$01ff. When a byte is pushed onto the stack, the
stack pointer becomes
$fe, or memory location
$01fe, and so on.
Two of the stack instructions are PHA and
PLA, “push accumulator” and “pull
accumulator”. Below is an example of these two in action.
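A minimal sketch of PHA and PLA at work, following the description in the text: X is the colour, Y is the pixel position, and each colour is pushed in the first loop and pulled in the second. The loop bounds ($10 and $20) and the label names are assumptions.

```asm
  LDX #$00
  LDY #$00
firstloop:
  TXA            ;Copy the colour into A
  STA $0200,Y    ;Draw it at the current position
  PHA            ;Push the colour onto the stack
  INX            ;Next colour
  INY            ;Next position
  CPY #$10
  BNE firstloop  ;Loop until Y is $10
secondloop:
  PLA            ;Pull the most recently pushed colour
  STA $0200,Y    ;Draw it at the current position
  INY
  CPY #$20
  BNE secondloop ;Loop until Y is $20
```

Because the stack is last in, first out, the second loop draws the colours in reverse order.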
X holds the pixel colour, and
Y holds the position of the current pixel.
The first loop draws the current colour as a pixel (via the A register),
pushes the colour to the stack, then increments the colour and position. The
second loop pops the stack, draws the popped colour as a pixel, then increments
the position. As should be expected, this creates a mirrored pattern.
Jumping is like branching with two main differences. First, jumps are not conditionally executed, and second, they take a two-byte absolute address. For small programs, this second detail isn’t very important, as you’ll mostly be using labels, and the assembler works out the correct memory location from the label. For larger programs though, jumping is the only way to move from one section of the code to another.
JMP is an unconditional jump. Here’s a really simple example to show it in action:
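A minimal sketch of JMP, assuming a label called there. The BRK instructions in between are never executed, because the jump skips straight over them.

```asm
  LDA #$03
  JMP there   ;Always taken, whatever the flags are
  BRK
  BRK
  BRK
there:
  STA $0200   ;Draw a pixel to show that we arrived
```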
JSR and RTS (“jump to subroutine” and “return from subroutine”) are a
dynamic duo that you’ll usually see used together.
JSR is used to jump from
the current location to another part of the code.
RTS returns to the previous
position. This is basically like calling a function and returning.
The processor knows where to return to because
JSR pushes the address minus
one of the next instruction onto the stack before jumping to the given
location. RTS pops this location, adds one to it, and jumps to that location.
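A minimal sketch of JSR and RTS used together, with the labels init, loop and end taken from the description in the text. The starting value of $00 for X is an assumption.

```asm
  JSR init   ;Call init, then carry on with the next line
  JSR loop   ;Call loop, then carry on with the next line
  JSR end    ;Jump to the end of the program

init:
  LDX #$00   ;Set X (assumed starting value)
  RTS        ;Return to the instruction after JSR init

loop:
  INX
  CPX #$05
  BNE loop   ;Keep incrementing until X is $05
  RTS        ;Return to the instruction after JSR loop

end:
  BRK
```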
The first instruction causes execution to jump to the
init label. This sets
X, then returns to the next instruction,
JSR loop. This jumps to the loop
label, which increments
X until it is equal to
$05. After that we return to
the next instruction,
JSR end, which jumps to the end of the file. This
illustrates how JSR and RTS can be used together to create modular code.
Now, let’s put all this knowledge to good use, and make a game! We’re going to be making a really simple version of the classic game ‘Snake’.
Even though this will be a simple version, the code will be substantially larger than all the previous examples. We will need to keep track of several memory locations together for the various aspects of the game. We can still do the necessary bookkeeping throughout the program ourselves, as before, but on a larger scale that quickly becomes tedious and can also lead to bugs that are difficult to spot. Instead we’ll now let the assembler do some of the mundane work for us.
In this assembler, we can define descriptive constants (or symbols) that represent numbers. The rest of the code can then simply use the constants instead of the literal number, which immediately makes it obvious what we’re dealing with. You can use letters, digits and underscores in a name.
Here’s an example. Note that immediate operands are still prefixed with a #.
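A minimal sketch of constants in use, assuming this assembler's define directive takes a name followed by a value. The names sysRandom and a_dozen are illustrative.

```asm
define  sysRandom  $fe  ;An address
define  a_dozen    $0c  ;A constant

LDA sysRandom   ;Equivalent to "LDA $fe"
LDX #a_dozen    ;Equivalent to "LDX #$0c"
```

The assembler substitutes the value wherever the name appears, so the # is still what decides whether the operand is a literal or a memory location.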
The simulator widget below contains the entire source code of the game. I’ll explain how it works in the following sections.
After the initial block of comments (lines starting with semicolons), the first two lines are:
jsr init
jsr loop
init and loop are both subroutines.
init initializes the game state, and
loop is the main game loop.
The loop subroutine itself just calls a number of subroutines sequentially,
before looping back on itself:
loop:
  jsr readKeys
  jsr checkCollision
  jsr updateSnake
  jsr drawApple
  jsr drawSnake
  jsr spinWheels
  jmp loop

First, readKeys checks to see if one of the direction keys (W, A, S, D) was
pressed, and if so, sets the direction of the snake accordingly. Then,
checkCollision checks to see if the snake collided with itself or the apple.
updateSnake updates the internal representation of the snake, based on its
direction. Next, the apple and snake are drawn. Finally,
spinWheels makes the
processor do some busy work, to stop the game from running too quickly. Think
of it like a sleep command. The game keeps running until the snake collides
with the wall or itself.
The zero page of memory is used to store a number of game state variables, as
noted in the comment block at the top of the game. Everything in $00, $01 and
$10 upwards is a pair of bytes representing a two-byte memory location
that will be looked up using indirect addressing. These memory locations will
all be between $0200 and
$05ff - the section of memory corresponding to the
simulator display. For example, if $00 and
$01 contained the values $01 and
$02, they would be referring to the second pixel of the display (
$0201 - remember, the least significant byte comes first in indirect addressing).
The first two bytes hold the location of the apple. This is updated every time
the snake eats the apple. Byte
$02 contains the current direction. 1 means up, 2 right,
4 down, and
8 left. The reasoning behind these numbers will
become clear later.
Finally, byte $03 contains the current length of the snake, in terms of bytes
in memory (so a length of 4 means 2 pixels).
The init subroutine defers to two subroutines, initSnake and generateApplePosition.
initSnake sets the snake direction, length, and then
loads the initial memory locations of the snake head and body. The byte pair at
$10 contains the screen location of the head, the pair at
$12 contains the
location of the single body segment, and
$14 contains the location of the
tail (the tail is the last segment of the body and is drawn in black to keep
the snake moving). This happens in the following code:
lda #$11
sta $10
lda #$10
sta $12
lda #$0f
sta $14
lda #$04
sta $11
sta $13
sta $15
This loads the value
$11 into the memory location
$10, the value $10 into $12, and $0f into
$14. It then loads the value $04 into $11, $13 and
$15. This leads to memory like this:
0010: 11 04 10 04 0f 04
which represents the indirectly-addressed memory locations $0411, $0410 and
$040f (three pixels in the middle of the display). I’m labouring this point,
but it’s important to fully grok how indirect addressing works.
The next subroutine,
generateApplePosition, sets the apple location to a
random position on the display. First, it loads a random byte into the
accumulator ($fe is a random number generator in this simulator). This is
stored into $00. Next, a different random byte is loaded into the
accumulator, which is then
AND-ed with the value
$03. This part requires a
bit of a detour.
The hex value
$03 is represented in binary as 00000011. The AND opcode
performs a bitwise AND of the argument with the accumulator. For example, if
the accumulator contains the binary value
10101010, then the result of AND with
00000011 will be 00000010.
The effect of this is to keep only the least significant two bits of the accumulator, setting the others to zero. This converts a number in the range of 0–255 to a number in the range of 0–3.
After this, the value
2 is added to the accumulator, to create a final random
number in the range 2–5.
The result of this subroutine is to load a random byte into
$00, and a random
number between 2 and 5 into
$01. Because the least significant byte comes
first with indirect addressing, this translates into a memory address between
$0200 and $05ff: the exact range used to draw the display.
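The steps described above can be sketched as follows. This is a minimal version using literal addresses ($fe, $00, $01) rather than named constants; the surrounding jsr and brk are only there so the snippet runs on its own, and the clc is an assumption to keep the addition predictable.

```asm
  jsr generateApplePosition
  brk

generateApplePosition:
  lda $fe    ;Random byte
  sta $00    ;Least significant byte of the apple location
  lda $fe    ;A different random byte
  and #$03   ;Keep the lowest two bits: 0 to 3
  clc        ;Make sure the carry flag doesn't add an extra 1
  adc #2     ;Now 2 to 5
  sta $01    ;Most significant byte of the apple location
  rts
```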
Nearly all games have at their heart a game loop. All game loops have the same basic form: accept user input, update the game state, and render the game state. This loop is no different.
The first subroutine,
readKeys, takes the job of accepting user input. The memory location
$ff holds the ascii code of the most recent key press in this
simulator. The value is loaded into the accumulator, then compared to $77
(the hex code for W), $64 (D),
$73 (S) and
$61 (A). If any of these
comparisons are successful, the program branches to the appropriate section.
Each section (upKey,
rightKey, etc.) first checks to see if the current
direction is the opposite of the new direction. This requires another little detour.
As stated before, the four directions are represented internally by the numbers
1, 2, 4 and 8. Each of these numbers is a power of 2, thus they are represented
by a binary number with a single 1:
1 => 0001 (up)
2 => 0010 (right)
4 => 0100 (down)
8 => 1000 (left)
The BIT opcode is similar to
AND, but the calculation is only used to set
flags - the actual result is discarded. (BIT also copies bits 7 and 6 of the
argument into the negative and overflow flags, which we can ignore here.) The
zero flag is set only if the
result of AND-ing the accumulator with the argument is zero. When we’re looking at
powers of two, the zero flag will only be set if the two numbers are not the
same. For example,
0001 AND 0001 is not zero, but
0001 AND 0010 is zero.
So, looking at
upKey, if the current direction is down (4), the bit test will
give a non-zero result, leaving the zero flag clear.
BNE means “branch if the zero flag is clear”, so in this case we’ll
branch to illegalMove, which just returns from the subroutine. Otherwise, the
new direction (1 in this case) is stored in the appropriate memory location.
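A minimal sketch of the upKey check described above, using literal values ($02 for the direction byte, 4 for down, 1 for up). The surrounding jsr and brk are only there so the snippet runs on its own.

```asm
  jsr upKey
  brk

upKey:
  lda #$04         ;4 = down, the opposite of up
  bit $02          ;AND with the current direction, setting flags only
  bne illegalMove  ;Non-zero: we are moving down, so ignore the key
  lda #$01         ;1 = up
  sta $02          ;Store the new direction
  rts
illegalMove:
  rts
```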
The next subroutine,
checkCollision, defers to checkAppleCollision and
checkSnakeCollision. checkAppleCollision just checks to see if the two
bytes holding the location of the apple match the two bytes holding the
location of the head. If they do, the length is increased and a new apple
position is generated.
checkSnakeCollision loops through the snake’s body segments, checking each
byte pair against the head pair. If there is a match, then game over.
After collision detection, we update the snake’s location. This is done at a high level like so: First, move each byte pair of the body up one position in memory. Second, update the head according to the current direction. Finally, if the head is out of bounds, handle it as a collision. I’ll illustrate this with some ascii art. Each pair of brackets contains an x,y coordinate rather than a pair of bytes for simplicity.
  0    1    2    3    4
Head                 Tail

[1,5][1,4][1,3][1,2][2,2]    Starting position
[1,5][1,4][1,3][1,2][1,2]    Value of (3) is copied into (4)
[1,5][1,4][1,3][1,3][1,2]    Value of (2) is copied into (3)
[1,5][1,4][1,4][1,3][1,2]    Value of (1) is copied into (2)
[1,5][1,5][1,4][1,3][1,2]    Value of (0) is copied into (1)
[0,5][1,5][1,4][1,3][1,2]    Value of (0) is updated based on direction
At a low level, this subroutine is slightly more complex. First, the length is
loaded into the
X register, which is then decremented. The snippet below
shows the starting memory for the snake.
Memory location: $10 $11 $12 $13 $14 $15
Value:           $11 $04 $10 $04 $0f $04
The length is initialized to 4, so
X starts off as 3.
LDA $10,x loads the value of $13 into A, then
STA $12,x stores this value into $15. X is
decremented, and we loop. Now X is
2, so we load
$12 and store it into
$14. This loops while
X is positive (
BPL means “branch if positive”).
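The shifting loop described above can be sketched like this, using literal addresses ($03 for the length, $10 for the head). The first two lines set the length to 4 so that the snippet runs on its own; the label name updateloop is an assumption.

```asm
  lda #$04
  sta $03        ;Snake length in bytes (4 at the start of the game)

  ldx $03
  dex            ;X = length - 1
updateloop:
  lda $10,x      ;Read a byte of the snake...
  sta $12,x      ;...and write it two bytes (one segment) further along
  dex
  bpl updateloop ;Keep going while X is positive (zero included)
```

Working from the tail towards the head means no byte is overwritten before it has been copied.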
Once the values have been shifted down the snake, we have to work out what to
do with the head. The direction is first loaded into A.
LSR means “logical
shift right”, or “shift all the bits one position to the right”. The least
significant bit is shifted into the carry flag, so if the accumulator is 1,
after LSR it is
0, with the carry flag set.
To test whether the direction is 1, 2, 4 or
8, the code continually
shifts right until the carry is set. One
LSR means “up”, two means “right”,
and so on.
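A minimal sketch of this shift-and-test pattern, assuming the direction lives at $02. The labels up, right, down and left are placeholders that just stop the program.

```asm
  lda $02    ;Current direction: 1, 2, 4 or 8
  lsr
  bcs up     ;Carry set after one shift: the direction was 1
  lsr
  bcs right  ;After two shifts: the direction was 2
  lsr
  bcs down   ;After three shifts: the direction was 4
  lsr
  bcs left   ;After four shifts: the direction was 8
  brk        ;No bit set: not a valid direction

up:
  brk
right:
  brk
down:
  brk
left:
  brk
```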
The next bit updates the head of the snake depending on the direction. This is probably the most complicated part of the code, and it’s all reliant on how memory locations map to the screen, so let’s look at that in more detail.
You can think of the screen as four horizontal strips of 32 × 8 pixels.
These strips map to $0200-$02ff, $0300-$03ff, $0400-$04ff and $0500-$05ff.
The first rows of pixels are $0200-$021f, $0220-$023f, $0240-$025f, etc.
As long as you’re moving within one of these horizontal strips, things are
simple. For example, to move right, just increment the least significant byte
(e.g. $0200 becomes $0201). To go down, add $20 (e.g. $0200 becomes
$0220). Left and up are the reverse.
Going between sections is more complicated, as we have to take into account the
most significant byte as well. For example, going down from
$02e1 should lead
to $0301. Luckily, this is fairly easy to accomplish. Adding $20 to $e1
results in $01 and sets the carry bit. If the carry bit was set, we know we
also need to increment the most significant byte.
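A minimal sketch of the downward move, assuming the head location is stored at $10 (least significant byte) and $11 (most significant byte). The bounds check is left out, and the label downDone is an assumption.

```asm
  jsr down
  brk

down:
  lda $10       ;Least significant byte of the head location
  clc
  adc #$20      ;Move down one row
  sta $10
  bcc downDone  ;No carry: still in the same strip
  inc $11       ;Carry: move on to the next strip
downDone:
  rts
```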
After a move in each direction, we also need to check to see if the head
would become out of bounds. This is handled differently for each direction. For
left and right, we can check to see if the head has effectively “wrapped
around”. Going right from
$021f by incrementing the least significant byte
would lead to
$0220, but this is actually jumping from the last pixel of the
first row to the first pixel of the second row. So, every time we move right,
we need to check if the new least significant byte is a multiple of $20. This
is done using a bit check against the mask
$1f. Hopefully the illustration
below will show you how masking out the lowest 5 bits reveals whether a number
is a multiple of
$20 or not.
$20: 0010 0000
$40: 0100 0000
$60: 0110 0000

$1f: 0001 1111
I won’t explain in depth how each of the directions work, but the above explanation should give you enough to work it out with a bit of study.
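As one worked case, the rightward move can be sketched like this, assuming the head's least significant byte is at $10. The collision label is a placeholder that simply stops the program.

```asm
  jsr right
  brk

right:
  inc $10        ;Move the head one pixel to the right
  lda #$1f
  bit $10        ;Are the lowest five bits of the new position all zero?
  beq collision  ;Yes: it is a multiple of $20, so we wrapped onto the next row
  rts

collision:
  brk            ;Stand-in for the game over handling
```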
Because the game state is stored in terms of pixel locations, rendering the
game is very straightforward. The first subroutine,
drawApple, is extremely
simple. It sets
Y to zero, loads a random colour into the accumulator, then
stores this value into ($00),y.
$00 is where the location of the apple is
stored, so ($00),y dereferences to this memory location. Read the “Indirect
indexed” section in Addressing modes for more details.
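A minimal sketch of drawApple using literal addresses. The first four lines place the apple at $0201 so that the snippet runs on its own; in the game this location would already have been set.

```asm
  lda #$01
  sta $00       ;Least significant byte of the apple location
  lda #$02
  sta $01       ;Most significant byte: the apple is at $0201
  jsr drawApple
  brk

drawApple:
  ldy #0
  lda $fe       ;Random colour
  sta ($00),y   ;Write it to the address stored at $00 and $01
  rts
```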
Next comes drawSnake. This is pretty simple too - we first undraw the tail
and then draw the head.
X is set to the length of the snake, so we can index
to the right pixel, and we set
A to zero then perform the write using the
indexed indirect addressing mode. Then we reload
X to index to the head, set
A to one and store it at ($10,x).
$10 stores the two-byte location of
the head, so this draws a white pixel at the current head position. As only
the head and the tail of the snake move, this is enough to keep the snake
moving.
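A minimal sketch of drawSnake using literal addresses ($03 for the length, $10 for the start of the snake's byte pairs). It is written as a subroutine to be called from the game loop, and it relies on the snake's memory having been initialized already.

```asm
drawSnake:
  ldx $03      ;Snake length, which indexes the tail's byte pair
  lda #0       ;Black
  sta ($10,x)  ;Erase the tail
  ldx #0       ;Index of the head's byte pair
  lda #1       ;White
  sta ($10,x)  ;Draw the head
  rts
```

With a length of 4, the first write goes through the pointer at $14 and $15, which is where the tail's location is kept.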
The last subroutine,
spinWheels, is just there because the game would run too
fast otherwise. All
spinWheels does is count
X down from zero until it hits
zero again. The first
dex wraps, making