This blog is about my basic ideas about VSA and some questions, whose aim is to try to expose my misunderstanding and throw some problems about the difficulty in applying VSA onto
The remainder of the blog is organized as follows:
My Basic Ideas about Applying VSA onto RadecoIL
Some Obstacle in front of Our Work
By the way, all of these mentioned below will ONLY concentrate on the basic VSA from "Analyzing Memory Accesses in x86 Executables".
will be included after solving all of problems in this blog.
VSA consists of three basic compositions:
a_loc: A_locs corresponds to the variables in source code, which contain local variables (gathered from R2), global (gathered from R2), heap-allocation variables and registers.
Value Set: N-tuple of PIC, for n memory regions, while RIC consist of four numbers (a, b, c, d) as a segment: a * [b, c] + d. In general, pure numbers could be in the GLOBAL region (Like what the original paper did).
Abstract Store: Abstract Stores are a map from a_locs to Value Sets, for every single instruction. In RadecoIL, it could be bounded with every single OpCode/Phi node.
From now on, I have three ideas about the whole data structures, but I'm not sure which to choose, or other better ones.
The algoriths for all data structures are similar, working in as BFS.
The program maintains a heap, which consist of basic blocks. The sort key of the heap is the number of unvisited predecessor, from small to big. The aim of this heap is that we could distinguish whether the block is a loop entry. When Heap pops a basic block, if the number of unvisited predecessor is zero, it means is's not a loop entry block, otherwise, it is.
Push the start node into heap with the initial abstract store and a_locs.
If the heap is empty, go to step 7.
Pop a basic block, if it's a loop entry, use
limited widening for every phi node, otherwise, use
Trave the Opcode node in order within the poped block, and calculate the abstract store at the exit of this block.
Push all the successor of the poped block. Go to step 4.
End the program.
However, the algorithm mentioned above is not complete. The first obstacle is the initial global data. The problem occurs in the step 2 for initial abstract store
We all know that global datas could have initial value. It could be an address, or a pure number, or more common, a zero number in .bss segment. But the point is, we only concentrate on single function, that means we could not know whether other functions have changed their value. We cannot have a accurate initial abstrace store for global.
Only one solution I could have is to assume they are any value.
Call Statements are the most difficult point in the whole alogrithm. We cannot calculate abstract store when we meet an OpCall, because we cannot know what happen will inside the callee, even the BP/SP could be changed in special situation like function
In the original paper, the auther solved this problem by building a supergraph of the executable, which includes all the functions. But we could do that, because it's an amazing consumption on time and memory.
Thus, the only solution is to analyze functions in order, from leaf to the root in a special tree. The tree consists of functions, where the father function node calls the son function node. But, there are still some sub-problems:
Not A Tree, but A Graph
sp_delta, but how to find the sp_delta?
If above problems have been solved (or maybe there are still other problems), we could gather information form callee function and handle
We have mentioned a method to finger the loop entry. By the way, that idea was made by myself, so I'm not sure it's total right, even I have proved it. There're also other problems about the
Affine Relationsof general registers. I think it's a hard work to find them
However, the problem in
Condition and Loop is not so urgent, because we can give up a litte preciseness and replace
Limited Widening with
Ordinary Widening ;P