OverTheWire Advent Bonanza 2019: day-15 self replicating toy

I took part in the OverTheWire Advent Bonanza 2019. All in all, the CTF was very pleasant because the challenges were interesting to solve, there was quite some time available as well (I spent about two weeks on and off on it) and I also managed to solve a few challenges as well.

The following write-up is about the challenge from day 15: self replicating toy.

Description

Can you design your own self-replicating toy?

Along with an IP address/port there is also the file chal.c given: https://advent2019.s3.amazonaws.com/dc3f15513e6d0ca076135b4a05fa954d62938670ddd7db88168d68c00e488b87-chal.c

The file chal.c contains the implementation of a custom stack-based virtual machine. The machine operates on three main stacks: data, code and output. There are also 32 "scratch stacks" which are called "functions". The VM operates on instructions given on the code stack. Every instruction has one byte.

The goal of the challenge is to construct a quine for the virtual machine. A quine is a program fragment that outputs itself. In case of the VM, this means that the output stack contains the same bytes at the end of the execution as the code stack in the beginning of the execution.

Analysis

The main method of the challenge contains the following code:

puts("Enter your Assemblium sequence:");
unsigned char *user_code = malloc(length);
for (int i = 0; i < length; i++) {
  read(0, user_code + i, 1);
}
vm_state vm = new_vm_state();
for (int i = length - 1; i >= 0; i--) {
  push_stack(vm.code, user_code[i]);
}
while (vm.code->size > 0) {
  execute_one_inst(&vm);
}
if (length != vm.output->size) {
  goto fail;
}
for (int i = 0; i < length; i++) {
  if (vm.output->data[i] != user_code[i]) {
    goto fail;
  }
}
print_flag();

The Assemblium sequence (i. e. instructions for the custom instruction set) are read from stdin and pushed to the VM's code stack. Then the instructions are executed one by one using the execute_one_inst(&vm) function. If the output of the VM, i. e. the output stack, contains the same bytes as the given user code, the flag is printed. This is called a quine (see also https://en.wikipedia.org/wiki/Quine_(computing)):

A quine is a computer program which takes no input and produces a copy of its own source code as its only output. The standard terms for these programs in the computability theory and computer science literature are "self-replicating programs", "self-reproducing programs", and "self-copying programs".

Quines are usually built around the following principle (see also https://cs.lmu.edu/~ray/notes/quineprograms/):

The classic way to produce a self-printing program has two steps:

Initialize a string variable, with a placeholder for interpolation.

Print the string, interpolating the string into itself.

Therefore we have to analyze the instruction set to find out how we can apply these principles.

To get an inspiration, let us look at quines in other assembly languages (see also https://codegolf.stackexchange.com/a/5742). The following quine, for example, is written in gas for x86 Linux:

.globl main
main:movw $34,B+86
push $B+1
call puts
call puts
pop B
ret
.data
B:.ascii"
.globl main
main:movw $34,B+86
push $B+1
call puts
call puts
pop B
ret
.data
B:.ascii"

We can see that the previously mentioned principles are applied: There is a string (B) that contains the program code that prints the string. To interpolate the string into itself, puts() is simply called twice. The idea of this quine to have a string and print the string twice can be applied to our own instruction set.

Basic idea

To abstract the idea of the GAS x86 quine: we have to build a string that contains two print instructions, and then print that string two times.

Let's first imagine an instruction set in which the first two bytes of the given instructions are always interpreted as a literal (i. e. as a string) and the two following instructions are interpreted according to the semantics of the instruction set (i. e. as code). This is a simplification of the given instruction set but we can iterate from there once we understand how to make a quine in this simplified environment. If the print instruction in that instruction set is 0xb0, the quine would be simply 0xb0 0xb0 0xb0 0xb0. The first two bytes are the literal which would be put on the data stack. The next two bytes are interpreted as print what is on the stack (but without destroying it). The result is that 0xb0 0xb0 0xb0 0xb0 is printed (i. e. written to the output stack).

So far so good but our instruction set is far more flexible so we have to be more creative.

First, not the first n bytes are interpreted as a literal but only bytes that are less than 0x80, i. e. where the MSB is not set. This means that the instruction set is split into two halves: The first half (everything below 0x80) is interpreted as a literal byte and copied to the data stack, the second half (everything >= 0x80) is interpreted as code and the semantics depend on the byte as described above. This means that our literal cannot include the print instruction as-is because then it would not be a literal anymore. How can we solve this? We encode bytes which are >= 0x80 so that they are interpreted as data. So instead of using syntax to mark the bytes as data, we encode the bytes. This makes things slightly more complicated. The x86 quine has to deal with quotes being special as well but that's rather easy: movw $34,B+86 inserts a quote at the end of the string. The string is then printed as-is using the puts() function.

We do something similar: We first print the data bytes as-is, then we print the instructions that printed the data bytes.

In an abstract way the quine looks as follows:

<encoded instructions E(I)> || <instructions I: print the data stack, decode and print the data stack>

When the program is executed, the following happens:

The data in the beginning of the program (i. e. the encoded instructions) are copied to the data stack.
The instructions after the data are executed. These instructions do the following:
The content of the data stack is printed.
The content of the data stack is decoded and printed.

This way, we have created a self-replication program: The encoded instructions are printed first, then the encoded instructions are decoded and printed which prints the actually executed instructions. Our program replicated itself.

Note how this is similar to the GAS x86 quine, except that the encoded instructions are at the beginning and not at the end and that we have to encode the instructions instead of simply using syntax (quotes in the case of GAS) to give the instructions as literal and not as code.

Solution

To implement the concepts we have to solve the following problems:

Define a variable-length encoding scheme that takes bytes between 0x00 and 0xff and produces bytes between 0x00 and 0x7f. This encoding scheme has to be reversible with the given instruction set (it is not clear if the instruction set is Turing complete so we might be limited).
Define instructions that print the content of the data stack, decode the data stack using the scheme from (1) and print the decoded result.

Instruction set

First, we have to analyze the instruction set more deeply. Every instruction takes one byte. The instruction byte is interpreted as follows:

< 0x80: pushed as-is to data
0x80: pop from data, XOR 0x80, push to data
0x81: pop from data, if 0 push 0xff, else push 0
0x82: pop a from data, pop b from data, push a & b to data
0x83: pop a from data, pop b from data, push a | b to data
0x84: pop a from data, pop b from data, push a ^ b to data
0x90: pop a from data, pop b from data, push a to to data, push b to data (= swap a and b)
0x91: pop from data, push twice to data
0xa0: pop function index from data, pop from data unless value is 0xa1, push to function index's data
0xb0: pop from data, push to output
=0xc0, <0xe0: push all function index's (index = inst - 0xc0) data to

code (= call a function)
= 0xe0: pop from data, if != 0: push all function index's (index =

inst - 0xe0) data to code (= call a function)

Implementation

Encoder

Our first task is to come up with an encoding scheme. The encoding scheme will encode every byte into three bytes which are guaranteed to be below 0x80. This encoding is not very efficient in general: The encoded data is three times bigger than the original data but it is relatively easy to implement.

The encoding works as follows:

Every byte b that is below 0x80 will be encoded as 0x01 0x10 b.
Every byte b that is 0x80 or above will be XORed with 0xff and encoded as 0x11 0x11 b', where b' is the XORed byte.

Example: The bytes 0x1f 0xf1 are encoded as 0x01 0x10 0x1f 0x11 0x11 0x0e.

The Python code for the encoding looks as follows:

def encode(bytecode):
    result = bytearray()

    for b in bytecode:
        if b == 0:
            raise ValueError("0x00 is not supported!")
        elif b < 0x80:
            result.append(0x01)
            result.append(0x10)
            result.append(b)
        else:
            result.append(0x11)
            result.append(0x11)
            result.append(b ^ 0xff)

    result.append(0x00)  # end of encoded sequence

    return bytes(result)

Note that 0x00 cannot be encoded because it marks the end of the encoded sequence. We need to have such an end mark because of the variable length property. However, this is a viable limitation because it is possible to avoid having to encode null bytes.

This encoding scheme can be quite easily decoded with the following instructions:

0x84: pop a from data, pop b from data, push a ^ b to data
0x81: pop from data, if 0 push 0xff, else push 0

Thus, the instructions 0x84 0x81 0x84 reverse the encoding.

Example:

The bytes 0x01 0x10 0x1f 0x11 0x11 0x0e are given on the data stack.

0x84 pops 0x01 and 0x10 from the stack and pushes 0x01 ^ 0x10 = 0x11 to the stack. 0x81 pops 0x11 from the stack, since it is not 0 it pushes 0 to the stack. 0x84 pops 0 and 0x1f from the stack and pushes 0x1f ^ 0x00 = 0x1f to the stack. 0x1f is now on the stack which was the first byte in the original sequence.
0x84 pops 0x11 and 0x11 from the stack and pushes 0x11 ^ 0x11 = 0x00 to the stack. 0x81 pops 0x00 from the stack, since it is 0 it pushes 0xff to the stack. 0x84 pops 0xff and 0x0e from the stack and pushes 0x0e ^ 0xff = 0xf1 to the stack. 0xf1 is now on the stack which was the second byte in the original sequence.

Jumps / loops

To run the decode instructions 0x84 0x81 0x84 in a loop, we resort to the following trick: The virtual machine supports the notion of "functions", i. e. 32 separate stacks that can hold code which can be executed more than once. To decode all bytes until we reach 0x00 we create two functions which call each other ("ping pong"). There are special instructions which conditionally call a function. I. e. the function is only called if the top of the data stack is not 0x00. We can use this to end the decoding by giving 0x00 on the stack at the end of the encoded sequence. (In x86 assembly, we would use cmp and jnz instead of the functions.)

Decoding and printing

Now that we know how to encode instructions, it is time to define instructions that print the content of the data stack, decode the data stack using the scheme from (1) and print the decoded result.

Let's start with instructions that print everything that is on the data stack. As described in the Jumps / loops section, we will use functions because the language does not support jumps or loops.

First, we define the the ping function. This function executes the following instructions: 0x91 0xb0 0xe3

0x91: Duplicate the value on the data stack
0xb0: Pop value from data stack, push to output stack (this is basically the "puts" in our virtual machine)
0xe3: Pop value from data stack, if value is not 0x00 call function 3 (the pong function)

We duplicate the value on the stack so that we can print it once (which is destructive) and still be able to check if it is not 0x00.

Second, we define the pong function. This function executes the following instructions: 0x91 0xb0 0xe2

This function does basically the same except that it calls function 2 instead of 3, i. e. the ping function.

Now that we know how the function bodies should look like, we need to know how functions are created.

Creating a function works like this: If the 0xa0 instruction is encountered, a value is popped from the data stack. This value is the index of the function (indices 0 to 31 are supported). Then, all values from the data stack are popped as long as 0xa1 is not encountered. Be aware that functions are created in reverse order due to the stack principle. I. e. the instruction that is executed at the end must go first. Further, we have to create the function code on the data stack.

In order to get instructions on the data stack we have to encode them (remember: only values < 0xb0 are copied as-is to the data stack). This is relatively simple: The 0x80 instruction pops the value from the data stack, XORs it with 0x80 and pushes the result.

The ping function looks like this when it is encoded: 0x21 0x80 0x63 0x80 0x30 0x80 0x11 0x80 0x02 0xa0

0x21 ^ 0x80 = 0xa1: marks the end of the function
0x63 ^ 0x80 = 0xe3: call function 3
0x30 ^ 0x80 = 0xb0: print
0x11 ^ 0x80 = 0x91: duplicate
0x02: index of function
0xa0: create function

The pong function can be encoded in a similar fashion.

To call the print ping function, one instruction is enough: 0xc2 calls the function with index 2, i. e. the print ping function.

Encoding and printing works similar. The difference is that we have to include the decoding instructions before we print it.

Putting everything together

Now we create the final payload:

We encode the instructions using the encoding defined earlier and wrap the data into a function so that we can use it twice (once for printing as-is, once for decoding and printing).
We append the instructions for decoding and printing.

The final payload is then (without the newlines):

218001100111115f01102111114f01101101101111117
b11117f11114f11113e01102111117f01106311117f01
103011117f01101111117f01100211115f01102111117
f01106211117f01103011117f01101111117f01100311
115f11113d11113e01102111117f01106511117f01101
111117f01100411115f01102111117f01104411117f01
103011117f01100411117f01100111117f01100411117
f01100511115f11113b0001a021b011118480b0c12180
63803080118002a0218062803080118003a0c2c121806
580118004a021804480308004800180048005a0c4

A commented solution that creates the payload:

import binascii
import re

from pwn import *

context.log_level = "debug"

instructions_before = """
#### create literal as a function

21      # 0x80 ^ 0xa1
80      # 0xa1 is on->data
"""

instructions_to_encode = """
89
"""

#00      # end instructions  # bottom of the stack

instructions_after = """
01      # index of function
a0      # upon detection of a0, the index is popped as well (= 0x01), everything is copied into the function stack until 0xa1 is reached on ->data, then ->data is empty

#### print 0x21, 0x80

21
b0      # print 0x21

# avoid 00 by encoding it as 0x11 0x11 and XORing it
11      # 0x11 is on ->data
11      # 0x11 0x11 is on ->data
84      # 0x00 is on ->data
80      # 0x80 is on ->data
b0      # print 0x80

# copy literal from function stack to ->data (i.e. call the function)
# literal must end with 0x00 for the print ping pong

#### print the encoded instructions with a print ping pong

c1      # the whole literal function is pushed to ->code; if the function only contains instructions < 0x80, they will be copied to ->data

# ping

21
80      # 0xa1 is on ->data (end of function)

63
80      # 0xe3 is on ->data (call function 3)

30
80      # 0xb0 is on ->data (print)

11
80      # 0x91 is on ->data (duplicate)

02
a0      # create function (i.e. print and call 0xe3; if 0x00 was popped [which was duplicated], function is not called)

# pong

21
80      # 0xa1 is on ->data (end of function)

62
80      # 0xe2 is on ->data (call function 2)

30
80      # 0xb0 is on ->data (print)

11
80      # 0x91 is on ->data (duplicate)

03
a0      # create function (i.e. print and call 0xe2; if 0x00 was popped [which was duplicated], function is not called)

# print encoded instructions

c2      # call the print function (prints everything until and including 0x00 in vm->data)

#### decode the instructions and print them with a print ping pong

c1      # the whole literal function is pushed to ->code; if the function only contains instructions < 0x80, they will be copied to ->data

# ping

21
80      # 0xa1 is on ->data (end of function)

65
80      # 0xe5 is on ->data (call function 5)

11
80      # 0x91 is on ->data (duplicate)

04
a0      # create function (i.e. print and call 0xe5; if 0x00 was popped [which was duplicated], function is not called)

# pong

21
80      # 0xa1 is on ->data (end of function)

44
80      # 0xc4 is on ->data (call function 4)

30
80      # 0xb0 is on ->data (print)

04
80      # 0x84 is on ->data

01
80      # 0x81 is on ->data

04
80      # 0x84 is on ->data

05
a0      # create function (i.e. print and call 0xe4; if 0x00 was popped [which was duplicated], function is not called)

# decode and print instructions

c4
"""

def assemble(instructions):
    # print(instructions)
    # remove comments
    instructions = re.sub(r"#.*", "", instructions)
    # remove whitespaces
    instructions = re.sub(r"\s+", "", instructions)
    # print(instructions)
    return binascii.unhexlify(instructions)


def encode(bytecode):
    result = bytearray()

    for b in bytecode:
        if b == 0:
            raise ValueError("0x00 is not supported!")
        elif b < 0x80:
            result.append(0x01)
            result.append(0x10)
            result.append(b)
        else:
            result.append(0x11)
            result.append(0x11)
            result.append(b ^ 0xff)

    result.append(0x00)  # end of encoded sequence

    return bytes(result)


bytecode = (
    assemble(instructions_before)
    + encode(assemble(instructions_to_encode))
    + assemble(instructions_after)
)
import binascii
print(binascii.hexlify(bytecode))

#vm = process("./vm-chal")
vm = remote('3.93.128.89', 1214)
vm.recvuntil("Length of")
vm.sendline(str(len(bytecode)))
vm.recvuntil("Enter your")
vm.write(bytecode)
vm.recvline()
vm.recvline()
vm.recvline()
vm.recvline()
vm.close()

Sending the payload to the server gives us the flag back: AOTW{G0od_job_writing_y0ur_v3ry_0wN_quin3!}