unleash the angr

angr, as described by its creators, is a binary analysis framework that combines static and dynamic symbolic (“concolic”) analysis, making it applicable to a variety of tasks. It is a really ambitious project that aims to replace the best in the industry some day. Although it may not be able to do everything that Hex-Rays’ IDA does with a “pretty” GUI, angr’s strength lies in the fact that it chooses to do things differently, and in doing so it is able to do things that even IDA cannot.

angr for CTFs

As a CTF player, I ventured to learn angr to use it while solving CTF challenges, and it so happens that angr is really good at it, as the examples its authors provide make evident.

Most of the reversing challenges in those examples were part of some great CTFs. They really do illustrate the abilities and strengths of angr, while at the same time making it easy for people to try out angr. I really encourage you to check them out.

jakespringer/angr-ctf

While the official examples are great, they are, to be honest, a bit disorganized. That is how I ended up exploring the depths of angr’s abilities in solving CTF challenges using the angr-ctf repo by jakespringer. It is a little rough around the edges but still manages to be relevant and easy to understand, with a smooth learning curve. While I encourage you to try out the challenges on your own, I will be illustrating what the challenges in angr-ctf are all about with solutions of my own in this article. Hopefully the first couple will be more than enough to get you going.

The solutions below have been aggregated and are available here for you to check out. I will be including most of them in this article though.

the first couple

I am pretty sure this is not the kind of script that is generally shown to someone new to angr. That is usually something with a lot fewer lines, written in such a manner that angr has to figure out everything, just to show off angr’s abilities. I chose to show you this script first because it is the kind of script that I ended up using the most.

First I have assigned two variables, win and lose, containing the addresses of the locations where “Good Job!” and “Try Again!” are printed respectively. Note that providing the lose address is optional, but it does reduce execution time and is recommended. Then we open the binary as the project, following which we declare a “symbolic variable”. By simply looking over the binary we know that the input is 8 bytes long (scanf’s argument), hence the size of 64 bits. The variable is declared using the claripy wrapper function BVS and is a symbolic bitvector (“bitvector symbol”).

I have chosen not to go into the details of what symbolic variables and simulation states are (maybe some other time). For now I highly recommend reading the official documentation to get a high-level understanding.

After declaring the variable we then create a state, which you can consider to be the point from which angr begins execution when we call sm.explore in the lines that follow. After creating a state, we can assign values to memory locations at that state of the program using state.memory.store() (registers are set through state.regs). The first parameter is an address, which can be a hard-coded one from the .data section of the binary or, as in this case, a stack address computed from state.regs.ebp. The second parameter is the value to be stored at that address.
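The setup described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the exact script: the binary path, the start address, the offset from ebp and the frame size are hypothetical placeholders you would read from a disassembler. Copying esp into ebp and then lowering esp is my assumption for giving the blank state a usable stack frame.

```python
INPUT_BYTES = 8                # size taken from scanf's format string
INPUT_BITS = INPUT_BYTES * 8   # claripy sizes are given in bits


def build_state(binary_path, start_addr, ebp_offset, frame_size):
    """Return (project, state, password) with a symbolic buffer on the stack.

    Every argument is a placeholder for a value read from the binary;
    frame_size should be at least ebp_offset.
    """
    # Imported here so the helper can be defined without angr installed.
    import angr
    import claripy

    project = angr.Project(binary_path, auto_load_libs=False)
    password = claripy.BVS("password", INPUT_BITS)

    # Start after the scanf call instead of at the program's entry point.
    state = project.factory.blank_state(addr=start_addr)

    # Assumption: a blank state has no meaningful frame pointer, so build
    # a frame by hand and keep the locals above the stack pointer.
    state.regs.ebp = state.regs.esp
    state.regs.esp = state.regs.esp - frame_size

    # Place the symbolic input where the program expects its buffer.
    state.memory.store(state.regs.ebp - ebp_offset, password)
    return project, state, password
```

Starting after the input routine is what lets the script skip scanf entirely, at the cost of having to know where the buffer lives.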

Now we get to the part where we create a simulation manager and then leave the rest to angr with sm.explore, and as noted before the avoid address is optional here.

If angr is able to figure out an input such that the program reaches our win address, then sm.found[0] should contain our required solution.

In the final line of the script we print out the input as a string, extracting it from the found state.
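The explore-and-extract step can be sketched as a small helper. The function name and its parameters are illustrative; it expects a project, a prepared state and the symbolic variable that was stored into that state.

```python
def find_input(project, state, symbol, win, lose=None):
    """Explore from `state` and return bytes that reach `win`, or None."""
    sm = project.factory.simulation_manager(state)

    # `avoid` is optional; passing the lose address prunes those paths early.
    sm.explore(find=win, avoid=lose)

    if not sm.found:
        return None

    found = sm.found[0]
    # Ask the solver for one concrete value of the symbolic input.
    return found.solver.eval(symbol, cast_to=bytes)
```

If the result is not None and the bytes are printable, `print(result.decode())` shows the input as a string.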

I hope you will eventually be able to see the advantage of using such a script, where we figure out how and where our input is stored and then declare and store a symbolic variable there. Doing so allows us to bypass a lot of complications in most of the binaries encountered while solving CTF challenges. I will make sure to include a few examples in this article itself to illustrate this.

Another advantage of a script like this is that we can use it to figure out what the values at specific locations should be, for segments of a program, so that we are able to reach a specific branch. The only other complication that arises while doing so is that we have to set all the other variables and registers to what they would have been during normal execution, since we are using a blank state here, which is in a very literal sense blank.

The notable difference in the second one is that I have not used a declared symbolic variable but rather sys.stdin.fileno(), and an entry_state instead of a blank state, to reverse the program.
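A minimal sketch of that stdin-based approach, assuming the program reads its input from standard input. The function name, binary path and addresses are hypothetical.

```python
import sys


def solve_via_stdin(binary_path, win, lose=None):
    """Run from the entry point and return the stdin bytes that reach `win`."""
    # Imported here so the helper can be defined without angr installed.
    import angr

    project = angr.Project(binary_path, auto_load_libs=False)

    # entry_state starts at the program's entry point with a symbolic stdin.
    state = project.factory.entry_state()
    sm = project.factory.simulation_manager(state)
    sm.explore(find=win, avoid=lose)

    if not sm.found:
        return None

    # Dump whatever angr decided stdin (file descriptor 0) must contain.
    return sm.found[0].posix.dumps(sys.stdin.fileno())
```

Here angr does all the work of tracking the input through scanf, which is simpler to write but usually slower than placing a symbolic variable by hand.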

the third one

Now here, you would have noticed that there is not much difference between how I solved this one and the ones before. That is one other thing with angr: you can always just reverse engineer a bit more to make the script simpler. I have added the optimization option add_options={angr.options.LAZY_SOLVES} here though.
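Passing that option when creating the state might look like this. The helper name is illustrative, and my understanding of LAZY_SOLVES is that it defers satisfiability checks until they are really needed, which may or may not speed up a given binary.

```python
def lazy_entry_state(project):
    """Create an entry state with the LAZY_SOLVES option enabled."""
    # Imported here so the helper can be defined without angr installed.
    import angr

    # add_options takes a set of option names to enable on the new state.
    return project.factory.entry_state(
        add_options={angr.options.LAZY_SOLVES}
    )
```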

Also, note that this is not the intended solution. :p. You can check out the solutions directory of angr-ctf for the intended ones.

the fourth one

Here notably I have directly stored the symbolic variables into registers.
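Writing symbolic variables straight into registers can be sketched as below. The register names and the 32-bit width are assumptions for a 32-bit x86 binary; the registers that actually hold the input have to be read from the disassembly.

```python
def symbolize_registers(state, names=("eax", "ebx", "edx"), bits=32):
    """Assign a fresh symbolic value to each named register of `state`."""
    # Imported here so the helper can be defined without claripy installed.
    import claripy

    symbols = {}
    for name in names:
        symbol = claripy.BVS(name, bits)
        # Equivalent to writing state.regs.eax = symbol, and so on.
        setattr(state.regs, name, symbol)
        symbols[name] = symbol
    return symbols
```

After exploring, each value can be recovered from the found state with `found.solver.eval(symbols["eax"])`.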

the fifth one

Something to note in this one, and something I had a lot of trouble with, is the endianness. It can be specified explicitly, as seen above.
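To see why byte order matters, here is a small standard-library illustration of the same 32-bit value laid out both ways. As far as I know, state.memory.store writes bitvectors big-endian unless it is given an endness argument, and project.arch.memory_endness holds the architecture’s native order, so on x86 a stored integer or pointer ends up reversed if the order is left unspecified.

```python
import struct

value = 0x41424344

big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first, as on x86

print(big)     # b'ABCD'
print(little)  # b'DCBA'
```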

the sixth one

Here the symbolic variables are stored to hard-coded addresses of the .data section. The challenge in this binary stems from having to figure out where the input is stored.

the seventh one

This one is notably different and important, as the input is stored in a dynamic memory location in this challenge. The problem is that, at the state where angr starts executing, malloc has not been called yet, and thus there is nothing at the addresses the input is taken from.

To take care of this, and to avoid making angr execute malloc, we can store fake addresses in the pointers. I have chosen them in such a manner that they happen to be where the heap would have been in normal execution. I have then stored our symbolic input variables at those fake addresses.
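The fake-pointer trick can be sketched as follows. The helper name is mine, the pointer and heap addresses are placeholders, and the 32-bit pointer width is an assumption for a 32-bit binary.

```python
def plant_fake_buffer(project, state, pointer_addr, fake_heap_addr,
                      size_bytes, name="heap_input"):
    """Point a global pointer at a chosen address and fill it symbolically."""
    # Imported here so the helper can be defined without claripy installed.
    import claripy

    buffer = claripy.BVS(name, size_bytes * 8)

    # Overwrite the pointer that malloc would have filled in. Pointers are
    # read in the architecture's byte order, so store it with that order.
    state.memory.store(
        pointer_addr,
        claripy.BVV(fake_heap_addr, 32),
        endness=project.arch.memory_endness,
    )

    # Put the symbolic input where the pointer now points.
    state.memory.store(fake_heap_addr, buffer)
    return buffer
```

Any unused address works for fake_heap_addr; picking one inside the usual heap range only makes the state look closer to a real run.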

We then simply follow up with the usual routine.

the eighth one

Here, we are simulating a file system in angr. To do so we assign a file memory using angr.state_plugins.SimSymbolicMemory(), then we provide it with a state to work on by using .set_state().

We can then store into this file by using .store(), whose first parameter is the index at which to start storing, followed by the content. We then create a file from the assigned memory using angr.storage.SimFile(), whose parameters I believe are self-explanatory here.

The next few lines of code are the ones that actually create the “file system”. Finally, we set the filesystem of our initial state to the one we just created.
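For comparison, more recent angr releases expose a shorter interface for the same idea. The sketch below uses that interface (angr.SimFile and state.fs.insert) rather than the SimSymbolicMemory-based steps described above, so treat it as an approximation; the function name, file name, size and start address are hypothetical.

```python
def state_with_symbolic_file(project, start_addr, filename, size_bytes):
    """Return (state, content) where `filename` holds symbolic bytes."""
    # Imported here so the helper can be defined without angr installed.
    import angr
    import claripy

    content = claripy.BVS("file_content", size_bytes * 8)

    # A simulated file whose entire content is the symbolic variable.
    sim_file = angr.SimFile(filename, content=content, size=size_bytes)

    state = project.factory.blank_state(addr=start_addr)

    # Register the file so that open/read calls on this name find it.
    state.fs.insert(filename, sim_file)
    return state, content
```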

Then we continue with the usual routine.

Yet again, if we were to delve a little deeper into what the program is doing, for example at the point where the password has been retrieved from the file, then we could in fact use the very first script to get the required input.

the ninth one

In this challenge we reach the point where we begin to compensate for the limitations of what angr can do on its own.

cont …