[关闭]
@SmashStack 2017-08-11T08:56:24.000000Z 字数 5309 阅读 17

Some Ideas and Questions about Value Set Analysis

RSoC


This blog is about my basic ideas about VSA and some questions, whose aim is to try to expose my misunderstanding and throw some problems about the difficulty in applying VSA onto RadecoIL.

The remainder of the blog is organized as follows:

By the way, all of these mentioned below will ONLY concentrate on the basic VSA from "Analyzing Memory Accesses in x86 Executables".

will be included after solving all of problems in this blog.

My Basic Ideas about Applying VSA onto RadecoIL

Data Struct

VSA consists of three basic compositions:

From now on, I have three ideas about the whole data structures, but I'm not sure which to choose, or other better ones.

Algorithm

The algoriths for all data structures are similar, working in as BFS.

  1. The program maintains a heap, which consist of basic blocks. The sort key of the heap is the number of unvisited predecessor, from small to big. The aim of this heap is that we could distinguish whether the block is a loop entry. When Heap pops a basic block, if the number of unvisited predecessor is zero, it means is's not a loop entry block, otherwise, it is.

  2. Push the start node into heap with the initial abstract store and a_locs.

  3. If the heap is empty, go to step 7.

  4. Pop a basic block, if it's a loop entry, use limited widening for every phi node, otherwise, use simple union/join.

  5. Trave the Opcode node in order within the poped block, and calculate the abstract store at the exit of this block.

  6. Push all the successor of the poped block. Go to step 4.

  7. End the program.

Some Obstacles in front of Our Work

Initial Global Data

However, the algorithm mentioned above is not complete. The first obstacle is the initial global data. The problem occurs in the step 2 for initial abstract store

We all know that global datas could have initial value. It could be an address, or a pure number, or more common, a zero number in .bss segment. But the point is, we only concentrate on single function, that means we could not know whether other functions have changed their value. We cannot have a accurate initial abstrace store for global.

Only one solution I could have is to assume they are any value.

Call Site

Call Statements are the most difficult point in the whole alogrithm. We cannot calculate abstract store when we meet an OpCall, because we cannot know what happen will inside the callee, even the BP/SP could be changed in special situation like function alloc.

In the original paper, the auther solved this problem by building a supergraph of the executable, which includes all the functions. But we could do that, because it's an amazing consumption on time and memory.

Thus, the only solution is to analyze functions in order, from leaf to the root in a special tree. The tree consists of functions, where the father function node calls the son function node. But, there are still some sub-problems:

Not A Tree, but A Graph

If above problems have been solved (or maybe there are still other problems), we could gather information form callee function and handle OpCall

Condition and Loop

We have mentioned a method to finger the loop entry. By the way, that idea was made by myself, so I'm not sure it's total right, even I have proved it. There're also other problems about the limited widening:

However, the problem in Condition and Loop is not so urgent, because we can give up a litte preciseness and replace Limited Widening with Ordinary Widening ;P

Some Pre-work should be donw

  1. Make a HashMap from SSA node into it's registers (for calculating abstract store)
  2. Gather global variables and their initial value from r2, especially some readonly data.
添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注