The “What” and “Why” of Reverse Engineering

The “What” of Reverse Engineering

Reverse engineering in software is usually described as the process of analyzing a binary and understanding how it works, either to audit what it does or to replicate it. This usually involves several tools and techniques for translating machine code into something closer to a high-level programming language. But don’t be fooled: even in CTFs this is often not enough, because the scope and applications of reverse engineering go far beyond decompilation.

The authors of Practical Reverse Engineering define reverse engineering as the process of understanding a system, i.e. a problem-solving process. This is the definition that has made the most sense to me in the practical situations where I have used reverse engineering. The process has a broad meaning and encompasses a lot, even within software, which is why I find the term “problem solving” closer to the real thing.

The “Why” of Reverse Engineering

Learning to reverse engineer, and practicing it, will help you gain a deeper and more thorough understanding of the applications and operating systems you use. Understanding how a particular set of data can make a computer do all kinds of things opens new doors for further learning and application. You are also likely to run into many situations where reverse engineering is a helpful skill to have.

A practical, real-life example of reverse engineering that you will most probably come across is having to work with someone else’s undocumented, badly written code. This can be a painful and troublesome experience for an average coder, but once you have worked with all kinds of poorly decompiled code as a reverse engineer, such situations can turn out to be a walk in the park and even a fun little challenge.
Similar cases arise when you have lost the source code and all you have is the compiled binary, or when you receive suspicious software and are doubtful of its intentions.
In such commonplace situations, mastery of reverse engineering is a great advantage.

Professionally, in the field of cybersecurity, reverse engineering is used by malware analysts to analyze malicious software and develop signatures that help detect it.
It is also used to find vulnerabilities in a system, which can then be used to exploit that system, for example to crack a game or other software. The same knowledge can be used to prevent misuse or unauthorized use of systems. In several cases, reverse engineering malicious software such as ransomware can actually help us beat the bad guys and save the day!

 

unleash the angr

angr, as described by its creators, is a binary analysis framework that combines both static and dynamic symbolic (“concolic”) analysis, making it applicable to a variety of tasks. It is a really ambitious project that aims to rival the best tools in the industry some day. Although it may not be able to do everything that Hex-Rays’ IDA does with a “pretty” GUI, angr’s strength lies in the fact that it chooses to do things differently, and in doing so it is able to do things that even IDA cannot.

angr for CTFs

As a CTF player, I set out to learn angr so I could use it while solving CTF challenges, and it so happens that angr is really good at this, as is clearly evident from the examples its developers provide.

Most of the reversing challenges among those examples come from some great CTFs, and they illustrate the abilities and strengths of angr while also making it easy for people to try angr out. I really encourage you to check them out.

jakespringer/angr-ctf

While the official examples are great, they are, to be honest, a bit disorganized. That is how I ended up exploring the depths of angr’s abilities in solving CTF challenges with the angr-ctf repo by jakespringer. It is a little rough around the edges but still manages to be relevant and easy to understand, with a smooth learning curve. While I encourage you to try the challenges on your own, in this article I will illustrate what the challenges in angr-ctf are all about with solutions of my own. Hopefully the first couple will be more than enough to get you going.

The solutions below have been collected and are available here for you to check out. I will include most of them in this article, though.

the first couple

I am pretty sure this is not the kind of script that is generally shown to someone new to angr. That is usually something with far fewer lines, written so that angr has to figure out everything on its own, just to show off angr’s abilities. I chose to show you this script first because it is the kind of script I ended up using the most.

First, I assigned two variables, win and lose, containing the addresses of the locations where “Good Job!” and “Try Again!” are printed, respectively. Note that providing the lose address is optional, but it does reduce execution time and is recommended. Then we open the binary as the project, after which we declare a “symbolic variable”. By simply looking over the binary, we know that the input is 8 bytes long (from scanf’s argument), which gives us its size. The variable is declared with claripy’s BVS function, which creates a symbolic bit vector.

I have chosen not to go into the details of what symbolic variables and simulation states are (maybe some other time); for now, I highly recommend reading the official documentation to get a high-level understanding.

After declaring the variable, we create a state, which you can think of as the point from which angr begins execution when we call sm.explore in the lines that follow. After creating a state, we can assign values to registers (through state.regs) and to memory locations (using state.memory.store()) at that point in the program. The first parameter of state.memory.store() is an address, which can be a hard-coded one from the .data section of the binary or, as in this case, a stack address, which can be computed from state.regs.ebp. The second parameter is the value to be stored at that address.

Now we get to the part where we create a simulation manager and leave the rest to angr with sm.explore; as noted before, the avoid address is optional here.

If angr is able to find an input that makes the program reach our win address, then sm.found[0] contains the state from which we can extract our solution.

In the final line of the script we print out the input as a string, extracting it from the found state.
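A minimal sketch of a script along these lines, for a 32-bit x86 binary, might look like the following. Every address, offset and frame size in it is a placeholder I made up for illustration, not a value from angr-ctf, so take the real ones from your own disassembly. Rebuilding ebp and esp by hand is also my own assumption about how to give a blank state a usable stack frame; angr and claripy are imported inside solve() only so the file can be loaded without them.

```python
import sys

# Placeholder values (assumptions): read the real ones from your disassembly.
START_ADDR = 0x08048675   # first instruction after the scanf call
WIN_ADDR = 0x080486F8     # block that prints "Good Job!"
LOSE_ADDR = 0x080486E6    # block that prints "Try Again!" (optional)
BUFFER_OFFSET = 0x1D      # input buffer sits at ebp - 0x1d
FRAME_SIZE = 0x28         # size of the function's stack frame
INPUT_LEN = 8             # scanf reads 8 characters


def solve(binary_path):
    # Imported here so the constants above can be used without angr installed.
    import angr
    import claripy

    project = angr.Project(binary_path, auto_load_libs=False)

    # 8 bytes of input -> a 64-bit symbolic bit vector.
    password = claripy.BVS("password", INPUT_LEN * 8)

    # Start right after scanf, so angr never has to model the input call.
    state = project.factory.blank_state(addr=START_ADDR)

    # A blank state has no meaningful frame pointer, so build a plausible
    # frame: ebp at the current stack top and esp below it, so later pushes
    # do not overwrite the buffer we are about to place.
    state.regs.ebp = state.regs.esp
    state.regs.esp = state.regs.ebp - FRAME_SIZE

    # Put the symbolic input exactly where scanf would have written it.
    state.memory.store(state.regs.ebp - BUFFER_OFFSET, password)

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)

    if not sm.found:
        return None
    # Ask the solver for concrete bytes that drive execution to WIN_ADDR.
    return sm.found[0].solver.eval(password, cast_to=bytes)


if __name__ == "__main__" and len(sys.argv) > 1:
    result = solve(sys.argv[1])
    print(result.decode(errors="replace") if result is not None else "no solution")
```

Run it as `python solve.py ./binary`. Checking that the buffer (ebp - BUFFER_OFFSET, INPUT_LEN bytes) fits inside FRAME_SIZE is worth doing whenever you change the placeholders.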

I hope you will eventually see the advantage of such a script, where we figure out how and where our input is stored and then declare and store a symbolic variable there. Doing so lets us bypass a lot of complications in most binaries encountered while solving CTF challenges; I will include a few examples in this article to illustrate this.

Another advantage of a script like this is that we can use it to work out what the values at specific locations must be for a segment of the program to reach a specific branch. The only other complication is that we have to set all the other variables and registers to what they would have been during normal execution, since we are using a blank state here, which is blank in a very literal sense.

The notable difference in the second one is that I have not declared a symbolic variable, but instead recovered the input from standard input (via sys.stdin.fileno()), and I used an entry_state instead of a blank state to reverse the program.
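A minimal sketch of this entry_state variant follows. The two addresses are placeholders (assumptions); the point is that angr models stdin symbolically for an entry state, so the answer is read back from the found state with posix.dumps() instead of from a claripy variable.

```python
import sys

# Placeholder addresses (assumptions): take them from your disassembly.
WIN_ADDR = 0x080486F8    # prints "Good Job!"
LOSE_ADDR = 0x080486E6   # prints "Try Again!"


def solve(binary_path):
    import angr  # imported lazily so the module loads without angr

    project = angr.Project(binary_path, auto_load_libs=False)

    # entry_state starts at the program's entry point; stdin is already a
    # symbolic stream, so no claripy variable needs to be declared.
    state = project.factory.entry_state()

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)

    if not sm.found:
        return None
    # Concretize everything the program read from stdin along the found path.
    return sm.found[0].posix.dumps(sys.stdin.fileno())


if __name__ == "__main__" and len(sys.argv) > 1:
    print(solve(sys.argv[1]))
```

The trade-off against the blank-state script is that angr now executes scanf and everything before it, which is simpler to write but usually slower.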

the third one

Here you will notice that there is not much difference between how I solved this one and the ones before. That is another thing about angr: you can always reverse engineer a bit more to make the script simpler. I have, however, added the optimization option add_options={angr.options.LAZY_SOLVES} here.
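The only new piece is the state option. As I understand it, LAZY_SOLVES tells angr to postpone checking whether a path’s constraints are satisfiable until it really has to, which can save solver time; the trade-off is that angr may spend time on paths that later turn out to be impossible. A sketch, again with placeholder addresses:

```python
import sys

# Placeholder addresses (assumptions).
WIN_ADDR = 0x080486F8
LOSE_ADDR = 0x080486E6


def solve(binary_path):
    import angr

    project = angr.Project(binary_path, auto_load_libs=False)

    # Same entry_state as before, plus LAZY_SOLVES to defer satisfiability checks.
    state = project.factory.entry_state(add_options={angr.options.LAZY_SOLVES})

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)
    return sm.found[0].posix.dumps(sys.stdin.fileno()) if sm.found else None


if __name__ == "__main__" and len(sys.argv) > 1:
    print(solve(sys.argv[1]))
```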

Also, note that this is not the intended solution. :p. You can check out the solutions directory of angr-ctf for the intended ones.

the fourth one

Here notably I have directly stored the symbolic variables into registers.
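A minimal sketch of writing symbolic values straight into registers. The start address and the choice of eax, ebx and edx as the registers holding the input are assumptions for illustration; check which registers your binary actually uses after its input routine returns.

```python
import sys

# Placeholder values (assumptions).
START_ADDR = 0x08048980          # first instruction after the input routine returns
WIN_ADDR = 0x080489E6            # prints "Good Job!"
LOSE_ADDR = 0x080489D4           # prints "Try Again!"
REGISTERS = ("eax", "ebx", "edx")  # registers that hold the parsed input


def solve(binary_path):
    import angr
    import claripy

    project = angr.Project(binary_path, auto_load_libs=False)
    state = project.factory.blank_state(addr=START_ADDR)

    # One 32-bit symbolic value per register, written directly into the
    # register file instead of into memory.
    symbols = {}
    for reg in REGISTERS:
        sym = claripy.BVS(f"input_{reg}", 32)
        setattr(state.regs, reg, sym)
        symbols[reg] = sym

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)
    if not sm.found:
        return None
    found = sm.found[0]
    return [found.solver.eval(symbols[reg]) for reg in REGISTERS]


if __name__ == "__main__" and len(sys.argv) > 1:
    values = solve(sys.argv[1])
    # Printed in hex, assuming the program reads its input with "%x".
    print(" ".join(f"{v:x}" for v in values) if values else "no solution")
```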

the fifth one

Something to note in this one, and something I had a particularly hard time with, is endianness, which can be specified explicitly when storing a value.
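state.memory.store() writes a bit vector most-significant byte first (big-endian) unless you pass endness, while x86 keeps integers little-endian, so an integer stored without endness is read back byte-reversed by the program. A minimal sketch, where the stack offsets, frame size and addresses are placeholders I chose (assumptions); the little_endian_layout helper only shows, in plain Python, the byte order the CPU expects.

```python
import sys

# Placeholder values (assumptions).
START_ADDR = 0x08048697       # first instruction after scanf
WIN_ADDR = 0x080486E4         # prints "Good Job!"
LOSE_ADDR = 0x080486D2        # prints "Try Again!"
LOCAL_OFFSETS = (0x0C, 0x10)  # two 32-bit locals at ebp - 0xc and ebp - 0x10
FRAME_SIZE = 0x18


def little_endian_layout(value):
    """Bytes an x86 CPU keeps in memory for a 32-bit integer."""
    return value.to_bytes(4, "little")


def solve(binary_path):
    import angr
    import claripy

    project = angr.Project(binary_path, auto_load_libs=False)
    state = project.factory.blank_state(addr=START_ADDR)
    state.regs.ebp = state.regs.esp
    state.regs.esp = state.regs.ebp - FRAME_SIZE

    values = []
    for i, offset in enumerate(LOCAL_OFFSETS):
        sym = claripy.BVS(f"value_{i}", 32)
        # Store as an integer in the architecture's byte order (little-endian).
        state.memory.store(state.regs.ebp - offset, sym,
                           endness=project.arch.memory_endness)
        values.append(sym)

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)
    if not sm.found:
        return None
    return [sm.found[0].solver.eval(v) for v in values]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(solve(sys.argv[1]))
```

For example, little_endian_layout(0x11223344) gives b"\x44\x33\x22\x11", the reverse of how the number is written.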

the sixth one

Here the symbolic variables are stored at hard-coded addresses in the .data section. The challenge in this binary lies in figuring out where the input is stored.
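A minimal sketch of storing symbolic input at fixed global addresses. The buffer address and the layout of four 8-character fields are assumptions for illustration. Unlike the integers in the previous challenge, these are character strings kept in reading order, so the default byte order of memory.store() is already correct.

```python
import sys

# Placeholder values (assumptions).
START_ADDR = 0x08048601   # first instruction after scanf
WIN_ADDR = 0x0804866A     # prints "Good Job!"
LOSE_ADDR = 0x08048658    # prints "Try Again!"
BUFFER_ADDR = 0x0A1BA1C0  # hard-coded global input buffer
CHUNK_LEN = 8             # each field is 8 characters
CHUNK_COUNT = 4           # number of fields read


def chunk_addresses(base, count=CHUNK_COUNT, size=CHUNK_LEN):
    """Addresses of consecutive fixed-size fields starting at base."""
    return [base + i * size for i in range(count)]


def solve(binary_path):
    import angr
    import claripy

    project = angr.Project(binary_path, auto_load_libs=False)
    state = project.factory.blank_state(addr=START_ADDR)

    chunks = []
    for i, addr in enumerate(chunk_addresses(BUFFER_ADDR)):
        sym = claripy.BVS(f"chunk_{i}", CHUNK_LEN * 8)
        # String bytes: default (big-endian) order keeps them in reading order.
        state.memory.store(addr, sym)
        chunks.append(sym)

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)
    if not sm.found:
        return None
    found = sm.found[0]
    return [found.solver.eval(c, cast_to=bytes) for c in chunks]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(solve(sys.argv[1]))
```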

the seventh one

This one is notably different and important because the input is stored in dynamically allocated memory. The problem is that, at the state where angr starts executing, malloc has not been called yet, so there is nothing at the addresses the input is read into.

To take care of this without making angr run malloc, we can store fake addresses in the pointer variables. I chose them so that they fall where the heap would have been during normal execution. I then stored our symbolic input variables at those fake addresses.

We then simply follow up with the usual routine.
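A minimal sketch of the fake-heap trick. The pointer addresses and fake chunk addresses are placeholders (assumptions); pick chunk addresses nothing else in the binary uses. Note the two different byte orders: the pointer is an integer, so it goes in the architecture’s order, while the string contents keep the default order.

```python
import sys

# Placeholder values (assumptions).
START_ADDR = 0x08048699
WIN_ADDR = 0x08048759
LOSE_ADDR = 0x08048747
BUFFER_PTRS = (0x0ABCC8A4, 0x0ABCC8AC)  # globals that would hold malloc's results
FAKE_HEAP = (0x04444444, 0x04444454)    # unused addresses standing in for the heap
INPUT_LEN = 8


def solve(binary_path):
    import angr
    import claripy

    project = angr.Project(binary_path, auto_load_libs=False)
    state = project.factory.blank_state(addr=START_ADDR)

    passwords = []
    for i, (ptr, fake) in enumerate(zip(BUFFER_PTRS, FAKE_HEAP)):
        # Make the global pointer point at our fake chunk (integer, so LE on x86).
        state.memory.store(ptr, claripy.BVV(fake, project.arch.bits),
                           endness=project.arch.memory_endness)
        # Place the symbolic string where the program will now look for it.
        sym = claripy.BVS(f"buffer_{i}", INPUT_LEN * 8)
        state.memory.store(fake, sym)
        passwords.append(sym)

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)
    if not sm.found:
        return None
    found = sm.found[0]
    return [found.solver.eval(p, cast_to=bytes) for p in passwords]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(solve(sys.argv[1]))
```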

the eighth one

Here, we are simulating a file system in angr. To do so, we create memory for the file using angr.state_plugins.SimSymbolicMemory() and then give it a state to work on with .set_state().

We can then store into this memory using .store(), whose first parameter is the index at which to start storing, followed by the content. We then create a file backed by that memory using angr.storage.SimFile(), whose parameters here I believe are self-explanatory.

The next few lines of code create the “file system”, and finally we set the file system of our initial state to the one we just created.

Then we continue with the usual routine.
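The SimSymbolicMemory and .set_state() calls described above come from an older angr release. As far as I know, newer versions instead take the symbolic content directly in SimFile and register the file with state.fs.insert(). A minimal sketch of that newer style, where the file name, file size and addresses are all hypothetical placeholders:

```python
import sys

# Placeholder values (assumptions).
START_ADDR = 0x080488D6      # just before the program opens the file
WIN_ADDR = 0x080489B0
LOSE_ADDR = 0x08048996
FILENAME = "OJKSQYDP.txt"    # hypothetical: use the name the binary opens
FILE_SIZE = 64               # bytes of symbolic file content


def solve(binary_path):
    import angr
    import claripy

    project = angr.Project(binary_path, auto_load_libs=False)
    state = project.factory.blank_state(addr=START_ADDR)

    # File contents are one symbolic bit vector.
    password = claripy.BVS("password", FILE_SIZE * 8)
    password_file = angr.SimFile(FILENAME, content=password, size=FILE_SIZE)

    # Register the file in the state's simulated file system under that name.
    state.fs.insert(FILENAME, password_file)

    sm = project.factory.simulation_manager(state)
    sm.explore(find=WIN_ADDR, avoid=LOSE_ADDR)
    if not sm.found:
        return None
    return sm.found[0].solver.eval(password, cast_to=bytes)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(solve(sys.argv[1]))
```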

Yet again, if we were to delve a little deeper into what the program does, for example once the password has been read from the file, we could in fact use the very first script to get the required input.

the ninth one

In this challenge we reach the point where we begin to compensate for the limitations of what angr can do on its own.

cont …