CompTIA CySA+ CS0-002 – Analyzing Application Assessments, Part 1

  1. Software Assessments (OBJ 2.2)

Software assessments. In this lesson, we’re going to talk about the different types of software assessments, because it’s really important for you to have a comprehensive testing program that validates the effectiveness of your protection of confidentiality, integrity, and availability within your software. If you’re doing any kind of web application or software development, it is really crucial that you have a comprehensive testing program in place. Now, the things we’re going to talk about in this lesson include static code analysis, formal verification, user acceptance testing or UAT, and security regression testing. Now, when we talk about static code analysis, this is the process of reviewing uncompiled source code, either manually or using automated tools. So let’s pretend that one of my web developers created a program so you could take quizzes on our website.

Now, that web application is written in some kind of code, and if somebody wanted to look at that code line by line and see how it works, that would be a static code analysis. In addition to doing it manually, though, we could use automated tools. These automated tools can reveal issues ranging from faulty logic to insecure libraries before you even start running that application. Essentially, these tools are software that will scan the source code for signatures of known issues, and a lot of them use different databases of those known issues. For example, we’ve mentioned OWASP before. OWASP is a really popular place to find these coding vulnerabilities, and they publish the OWASP Top 10 list of application security risks, which includes things like injection vulnerabilities.

And you can load those into your automated tools as well. Now, just because you have an automated tool doesn’t mean you can just scan the code and forget about it. No, these tools are not an adequate substitute for human judgment. Instead, we should use these automated tools to take the first pass through the software code, and then we should put a human on it to look through it as well. By doing this one-two punch, you’re going to get a lot better results, because a lot of times the human will catch things the automated tool won’t, and the automated tool will catch things the human won’t. Now, when you do this manually, it’s known as a code review. A code review is a manual process of peer review of uncompiled source code by other developers.

So let’s say one of my web developers wrote a program, he emails it to me, and I read through it. That would be a peer review code review. Now, there are lots of other types you can do. We can do something known as an over-the-shoulder code review. This is where the person who wrote the code will sit at the keyboard and explain exactly what they did, and I might stand behind them and look over their shoulder to see the code as they’re explaining it. Another way you can do it is pair programming. This is a more interactive approach, where they write a section and I write a section, and while I’m writing my section they’re looking over my shoulder, and when they’re writing their section I’m looking over their shoulder. That way, we can try to catch mistakes. That oversight allows us to catch each other’s faulty assumptions.

For instance, I might go, hey, why are you using that variable declaration? That doesn’t make sense; you’re making a faulty assumption. Then I can call them out on that and we can get those things fixed. By doing this, you can bring up the level of knowledge on the team as well, because a more experienced person can help a junior coder, or a junior coder might be able to show a new way of doing things to that senior coder. It does work both ways. And so doing these code reviews and pair programming can really help the security of your programs. Now, another thing we can do is what’s called a formal verification method. This is the process of validating software design through mathematical modeling of expected inputs and outputs. Now, why might we use a formal verification method instead of just doing a static code review? Well, a lot of times there are these things known as corner cases.

Formal verification methods are used in critical software where corner cases have to be eliminated. Now, what do I mean by a corner case? Well, let’s say we had a program for something like a self-driving car. A self-driving car has a lot of different code in it. Most of the time, the self-driving car is going to be driving on a road, and it needs to be able to detect objects like a hole in the road, a person on the side of the road, or a car in front of it, using its sensors and its logic to drive the car. Now, a corner case might be: what does the car do if that person on the side of the road jumps out in front of it? The car only has a few options. It can try to stop, but it won’t be able to in time and it will hit the person, or it can swerve to the left and crash into the other car.

Or it could swerve to the right and hit that other person on the side of the road. All of these are bad options. These would be known as corner cases, because they’re all bad outcomes that we’d want to avoid. But if you have to choose one of them, which one are you going to choose? And so we have to be able to go through and do a formal verification of all the software, especially in something as critical as a car that weighs two or three thousand pounds barreling down the highway at 60 miles an hour. That’s a great place for using formal verification methods. Another thing we might use is what’s called user acceptance testing. Now, user acceptance testing or UAT is beta testing by your end users that proves a program is usable and fit for purpose in real-world conditions.

So back in the late 1990s and early 2000s, I owned a company doing web development. That’s where I started out. And when I was doing web development back then, we did a lot of user acceptance testing. So I would sit down with somebody, and you would tell me exactly what you want this website to do. You tell me all the features you want, what you want the design to look like, all of that stuff. We gather all the requirements, we plan out the design, we code it, and we do our testing. We’re going to do manual code reviews and automated static code analysis. We might even do formal verification methods if we’re doing something for, say, a bank. Once we’ve done all of that, we now need to get into the user acceptance testing.

This is where we’re going to put hands-on end users on the system and see how they react. This allows us to gather feedback from those end users, identify any workflow issues we may have never seen, and ensure that it meets their particular use case. So if I was building a website for a bank, for instance, I might think that I’ve coded it exactly right, and the bank might have thought that I coded it exactly right. But then we put a user on the system and they go, well, this doesn’t make sense. When I try to transfer money from one account to another, I have to go through 17 clicks to do it. That isn’t very good. And so that end user, who’s representing the customer, might go, we’re not going to accept this; go back and recode it. Or, in version two, you need to fix this.

This is the idea of user acceptance testing. It’s more about the end functionality and fit. Does it do what it’s supposed to do, and is it doing it in a well-thought-out way that makes it easier for the user to use? The final thing we’re going to talk about here is security regression testing. This is the process of checking that updates to code do not compromise the existing security functionality or capability of the code itself. Now, why is this important? Well, because every time you install a patch, an update, or some kind of new piece of software, you run the risk of breaking things. So if I take a new security update and install it on my network without testing it first, guess what? I might have introduced new bugs or broken existing features. Instead, we want to do security regression testing.

We’re going to set it up in a lab, test it, and make sure we didn’t break something else when we fixed the original issue. Security regression testing enables the identification of security mechanisms that had worked before but are now broken after the latest changes. This is a common problem, because one of the things we say as programmers is that I can add one piece of code to fix something, but I might end up breaking three or four other things. Anytime you make a change, there’s a possibility something is going to get broken. And so one of the things you always want to go back and check is that you didn’t break any of your security mechanisms, because that would be a bad thing for us, especially since, as cybersecurity analysts, we’re supposed to be the security experts.

  1. Reverse Engineering (OBJ 1.4)

Reverse engineering. In this lesson, we’re going to talk about some different reverse engineering tools and techniques and why they’re important to us. As a cybersecurity analyst, you may be asked to do some reverse engineering, specifically of malware that was collected on your network. Reverse engineering is the process of analyzing the structure of hardware or software to reveal more about how it functions. Now, the challenge is that when you’re doing reverse engineering, you have to start with a binary, and that binary is an executable file. You can’t just look at that binary and figure out what it’s doing. No, you need to take the time to decompose that executable into something that can be read as code. Now, when you decompose the executable, you can do it into one of three things.

You can decompose it into machine code, assembly code, or high-level code. Now, when we talk about machine code, this is software that has been assembled into binary instructions, expressed as hexadecimal digits, that are native to the processor platform. So if you’re using one of the modern Intel-based processors, it’s an x86-64 processor. That’s the native architecture, and it has a certain instruction set. So if you send this set of ones and zeros, it’s going to do this function, and if you send this other set of ones and zeros, it does something else. Those are the binary instructions. Now, to make it a little easier to read, we take eight bits at a time and represent them as two hexadecimal digits.

This way, we can use that byte code to read what was being told to the processor. For example, 48 89 32 is a three-byte instruction written in machine code. Now, can you understand what this thing is doing? No. Well, neither can I just by looking at it, but I actually looked it up. And if we try to reverse engineer machine-level code like this, it can take a lot of time to go through, because we have a series of hexadecimal numbers, and we’d have to look each one up and figure out exactly what it means. Now, in this case, 48 89 32 is telling the system to store a received value into a variable that’s stored at a particular location in memory. Again, not very helpful yet.

Right? Let’s go ahead and take this up one more level. We can use something called a disassembler. A disassembler can take that machine language and reverse engineer it, converting it from machine language code into assembly language code. Assembly code is the human-readable text representation of the binary machine code for a given CPU platform’s instruction set. So instead of using just hexadecimal characters, I can use things that are written in ASCII, something that I can actually read. If you look at some typical assembly instructions, you might see instructions for moving data between registers and memory, such as mov, push, and pop.

We might perform logical bitwise operations like NOT, AND, OR, and XOR. We might perform mathematical operations like adding, subtracting, incrementing, or decrementing. Or we could perform branching with things like a jump, or use test conditions like compare or test. All of these are things we can use inside of assembly. Basically, there are these two-, three-, or four-letter mnemonics that each tell the processor what to do. Again, these aren’t really easy to read, but they’re a lot easier than looking at binary or hexadecimal. For instance, if I have something like movq %rsi, (%rdx), what is this saying? Well, this is that same three-byte instruction I had earlier in machine code. So instead of writing 48 89 32, I can now have it written this way.

Now, this still isn’t really easy to read, but it’s certainly better than the machine code was. Even if you don’t know assembly language, you could probably look at this and figure out it’s doing something with moving, because we have movq there, and this is going to move data from one place to another in memory. The reason it says movq instead of mov is because this is written for a 64-bit processor and it supports the movement of quadwords of memory, so a movq moves a quadword, or eight bytes, at a time. Now, the next thing we want to talk about is a decompiler. We took something from machine code and brought it up to assembly by using the disassembler. If we take that assembly and want to go higher, we would use a decompiler.

A decompiler is a reverse engineering tool that converts machine code or assembly language code into a specific higher-level language or into pseudocode. This makes it much easier for us to read as humans. Now, when we talk about higher-level code, we’re talking about code that is easier for humans to read. They can read it, write it, and understand it much more natively. For example, take some code written in C. C is considered a high-level language, with code that’s fairly easy to read for somebody with some basic training. Even if you haven’t used C before, you can probably look at it and figure out what it’s doing. What do you think it’s doing? It’s assigning the value of t into the variable dest, or in this case, the destination location of a piece of memory.

And that’s all we’re doing. It’s the same code we were using before. I can write it this way, or I can write it with the movq statement, or I can write it as 48 89 32. All three do the exact same thing, but this one is by far the easiest to understand. Now, another way you can do this is by using pseudocode. Pseudocode is not real code; the code I just showed you was actually C, but pseudocode can be a made-up language that your decompiler uses. Pseudocode makes it easier to identify individual functions within the process, track the use of variables, and find branching logic. Now, when you’re doing this, you want to use a tool. So what is the most common tool out there when it comes to decompiling and disassembly? Well, IDA, the Interactive Disassembler.

This is a popular cross-platform disassembler and decompiler that’s often used by reverse engineers. If you happen to take a course in malware reverse engineering, you’re going to get really familiar with how to use IDA. Now, IDA has automated functionality as well that’s able to identify different API calls, function parameters, constraints, and other components of the disassembled code. In this particular example, IDA has taken the machine code from the binary, disassembled it back into assembly language, and then used its decompiler function to make it more human readable by giving us high-level pseudocode that looks something like a C program. You can see all the pseudocode there on the right side, and this would help us decompile this malware and figure out what it’s doing.

Now, if programmers want to make their code more difficult to analyze, they can do this by using an obfuscator. An obfuscator is basically software that randomizes the names of variables, constants, functions, and procedures, and removes the comments and all the white space inside the code. This makes it harder for us as malware analysts to go through that code and figure out what the malware writer is doing, which is their goal, because they want to keep the secret sauce of their malware secret so that we can’t write signatures to block it in the future. But if we can go through and reverse engineer their code, we can understand what it’s doing, and we can block the IP addresses, domain names, or whatever else they’re using for beaconing, callouts, and other exfiltration.

 
