Program Slicing Based Buffer Overflow Detection

doi:10.4236/jsea.2010.310113

Paper Menu >>

Journal Menu >>

J. Software Engi neeri n g & Applications, 2010, 3, 965-971

doi:10.4236/jsea.2010.310113 Published Online October 2010 (http://www.SciRP.org/journal/jsea)

965

Program Slicing Based Buffer Overflow Detection*

Yingzhou Zhang1,2,3, Wei Fu1,2, Xiaofei Qian1, Wei Chen1

1College of Computer, Nanjing University of Posts and Telecommunications, Nanjing, China; 2State Key Laboratory of Novel Soft-

ware Technology, Nanjing University, Nanjing, China; 3State key Laboratory Of Networking & Switching Technology, Beijing

University of Posts & Telecomm., Beijing, China.

Email: zhangyz@njupt.edu.cn

Received July 26th, 2010; revised August 16th, 2010; accepted 26th, 2010.

ABSTRACT

The development of the information technology has brought threats to human society when it has influenced seriously

the global politics, economics and military etc. But among the security of information system, buffer overrun vulner-

ability is undoubtedly one of the most important and common vulnerabilities. This paper describes a new technology,

named program slicing, to detect the buffer overflow leak in security-critical C code. First, we use slicing technolog y to

analyze the variables which may be with vulnerability and extract the expressions which will bring memory overflow.

Secondly, we utilize debug tech nology to get the size of memo ry applied by the va riable and the size of memory used for

these code segments (the slicing result) further. Therefore we can judge whether it will overflow according to the

analysis above. According to the unique excellence of program slicing performing in the large-scale program’s debug-

ging, the method to detect buffer overrun vulnerability described in this paper will reduce the workload greatly and

locate the code sentences affected by corresponding variable set quickly, particularly including the potential vulner-

ability caused by parameter dependence among the subroutines.

Keywords: Program Slicing, Buffer Overflow, Inter-Procedure Slicing, Debug, System Dependence Graph

1. Introduction and Related Work

As early as the beginning of 1970s, buffer overflow has

been wildly believed that it is caused by the defects of C

language’s design model. Array and pointer references

are not automatically bounds-checked, so programmers

must be up to do these checks by themselves. It is more

important that many of the string operations supported by

the standard C library such as strcpy(), strcat(), sprintf(),

gets(), are unsafe actually. They directly copy data with

unknown size to th e fixed buffer (as shown in Figure 1),

causing data overwriting in memory, access violation, or

execution of malicious code designed by hackers. The

data from CERT shows that 55 percent of general injec-

tion attacks are buffer overflow attacks [1], and that

among the thirty new vulnerabilities happened during

April 19 to May 30 in this year, buffer overflow holds 40

percen t [2].

Currently, there are some prob lems to be solv ed. Tools

with static buffer overflow detection based on string

matching algorithms maintain a high rate of false alarms.

If the functions existed in the program match the vulner-

ability database carrying by themselves, they give the

corresponding reports.

Now, the common detection methods check the bound

of arrays. They regard the memory space variable applied

as integer range, for example, char e.g. [10], and its range

is [1,10]. When data are copied into it, we must judge

whether the buffer overflows. But problems are still ex-

isted. First, we are hard to know how many sizes of the

buffer have been used. Because we used to call library

function to operate the string, and almost all of the library

// no buff er over f low

void NoVulFunc(void)

{

char dest[6];

……

strcpy(dest,”Hello”);

……

}

// buffer overflow

voi d VulFunc(v oid)

{

char dest[6];

……

strcpy(dest,”HelloWorld”);

……

}

This work was supported in part by the Natural Science Foundation o

China (60703086, 60873231, 60873049, 60973046), Jiangsu Natural

Science Foundation of China (BK2009426). Figure 1. An example of a buffer overflow.

Program Slicing Based Buffer Overflow Detection

966

function’s source code is private or existed in DLLs.

Second, these tools are hard to deal with a program with

multiple procedures. Lastly, tracking every variable’s

buffer is almost impossible in a large project.

To solve these problems, many researchers have pre-

sented methods and tools for detecting buffer overflows.

1) The static detection methods based on string match-

ing. Through string matching, the UNIX’s tool, grep, can

be used to find the unsafe library function call. The tool

RATS [3,4], developed by Secure Software inc., also

studies the unsafe function call in the source code. It

matches with the vulnerability database, and then gives

the description of the vulnerab ilities. Those tools as men-

tioned above can hardly analyze the semantics and

grammar of a program, thus they have many limitations

and high false positive.

2) The detection method based constraint analysis.

Splint [5], develop ed by university of Virginia, is a static

analysis tool used to detect the vulnerabilities in the C

program. It adds information of constraint to the source

code, and then makes lexical, grammar and semantic

analysis for the source. It judges the leaks that may hap-

pen in a program and gives the related instructions. From

the workflow of the Splint, its limitation is that it re-

quests analysts to be known the source well, and to add

notations to the interesting variables and functions.

3) The dynamic detection methods. The dynamic de-

tection tool of buffer overflows first inserts some detect-

ing codes in the place where it may happen buffer over-

flow, then compiles and executes the codes after insert-

ing, and finally, detects whether overflow happens during

the execution. But the dynamic detection has defects of

efficiency and coverage.

Against the problems described above, this paper pro-

poses a novel method of program analysis, program slic-

ing [6-8], to detect the buffer overrun .

2. Program Slicing

Program slicing [6], orig inally introduced by W eiser, has

been widely used in maintenance of software, program

debugging, software testing, code analysis, reverse engi-

neering and so on. For example, during the program de-

bugging, we hope to allo cate the codes causing the error.

But perhaps the program is too large to find by our hands.

If we apply program slicing, we can exclude the codes

that have nothing to do with the error, then allocate the

error in a smaller range.

In general, program slicing technology has experience

a lot: from static slicing to dynamic, from forward slicing

to backward, from single procedure to multiple, from

non-distributed slicing to distributed, etc.

The slice of a program with respect to program point p

and variable x consists of all statements and predicates of

the program that might affect the value of x at point p

(see Figure 2). This concept, originally discussed by

Mark Weiser, can be used to isolate individual computa-

tion threads within a program. In Weiser’s terminology, a

slicing criterion is a pair <p,v>, where p is a program

point and v is a subset of the program’s variable.

With the expansion of the program scale, it is inevita-

ble that program contains multiple procedures. So it ap-

pears more important to study inter-procedural slicing.

Regarding inter-procedural slicing as a question of

graph’s reachability, S.Horwitz et al. [8] introduce sys-

tem dependence graph (SDG) to represent the program’s

dependence graph (PDG).

PDG is a directed graph connected by different kinds

of vertexes and some edges (e.g. Figure 3). And the ver-

texes include function’s entrance node, declaration node,

assignation node, control predicate node, function’s call

site node, parameter node, the FINAL_USE node of

every variable, etc. The edges include control depend-

ence edge of program’s circuit, parameter-in and pa-

rameter edges generated by call and other data depend-

ence edges. If the definition of variable x in node n is a

reachable definition of node m, node m is data dependent

on n. Control dependents only exist in condition expres-

sion and inside the loop expression.

SDG (e.g. Figure 4) composes of procedure depend-

ence graph, which is connected by edges that represent

direct dependences between a call site and the called

procedure and edges that represent transitive depend-

ences due to calls.

// a si mp le examp l e

1 void EgForSlicing ()

2 {

3 int a = 0;

4 int b = 0;

5 int c = 5;

6 if(c > 10)

7 a++;

8 else

9 b++;

10 }

Slice for variable a in 7

expression

1 viod EgForSlicing ()

2 {

3 int a = 0;

6 if(c > 10)

7 a++;

10 }

Slice for variable c in 6

expression

1 viod EgForSlicing ()

2 {

5 int c = 5;

10 }

Slice for variable b in 9

expression

1 viod EgForSlicing ()

2 {

4 int b = 0;

6 if(c > 10)

8 else

9 b++;

10 }

Figure 2. A sample of program slicing.

Program Slicing Based Buffer Overflow Detection967

//a sample with single

//procedure

void main ()

{ int i = 0;

while (i < 10)

i++;

}

Control dependence

Data dependence

Figure 3. A program and its PDG.

//a sample with two

//procedures

void main()

{

int i = 0;

inc (i);

}

void inc ( int a)

{

a++;

}

Control dependence

Data dependenc e

Figure 4. A sample program and its SDG.

For every call site, we use four sorts of vertices to de-

note the parameter passing: on the calling side, informa-

tion transfer is represented by a set of vertices called ac-

tual-in and actual-out vertices. These vertices, which are

control dependent on the call-site vertex (see Figure 4),

represent assignment statements that copy the values of

the actual parameters to the call te mporaries an d from the

return temporaries, respectively. Similarly, information

transfer in the called procedure is represented by a set of

vertices called formal-in and formal-out vertices. These

vertices, which are control dependent on the procedure’s

entry vertex, represent assignment statements that copy

the values of the formal parameters from the call tempo-

raries and to the return temporaries, respectively.

3. An Algorithm of Buffer Overflow

Detection

In this section, we will show in detail an algorithm of

detecting buffer overflow through program slicing. The

algorithm includes four steps as follows.

Step 1. Constructing PDGs

Through the lexical and syntax analysis (with Flex and

BYACC) of the program to be detected, we first con-

struct the vertices in PDGs. There are usually seven

kinds of vertexes as follows.

1) The beginning vertex of compound statements

which includes function, if-else, switch statements and so

on;

2) The end vertex of compound statements;

3) Call-site vertex;

4) Actual-in and actual-out vertex;

5) Formal-in and formal-out vertex;

6) FinalUse vertex;

7) Others vertex such as ordinary definition statements

(e. g.: int i), predicate vertexes (e. g.: the judgment part

of a if statement) and jump statements (e. g.: break, goto

and so on).

Then we construct the edges of PDGs by analyzing the

program dependences (includes control dependences

and data dependences, see Figure 4).

Step 2. Constructing SDG

According to the PDGs in Step 1, the SDG can be con-

structed by the following substeps:

1) For each call site, a call edge from the call-site ver-

tex to the corresponding procedure-entry vertex, is added

into the PDG related.

2) For each actual-in vertex v at a call site, we add in

PDGs a parameter-out edge from v to the corresponding

formal-in vertex in the call procedure;

3) For each actual-out vertex v at a call site, we add in

PDGs a parameter-out edge to v from the corresponding

formal-out vertex in the called procedure;

4) We finally add in PDGs an edge between actual-out

vertex to actual-in vertex if they are reachable between

the corresponding formal-out and the corresponding for-

mal-in.

Step 3. Computing program slices by traversing SDG

in two phases

Supposing that we want to computer the inter-procedure

Program Slicing Based Buffer Overflow Detection

968

slice of the variables in vertex n.

Phase 1: Starting from the vertex n, we travel re-

versely the SDG through the control dependence edges

and data dependence edges (not including parameter-in

edges), and then mark all of reachable vertexes.

Phase 2: Starting from the vertexes marked in phase 1,

through the control edges (not including call-site edges)

and data dependence edges (not including parameter-out

edges), we mark all of reachable vertexes.

The result of the inter-procedure slice is the union set

of the marked vertexes in above two phases.

Step 4. Detecting Buffer Overflow

For each variable needed to analyze, we can use a data

structure to store the information of usage. The data

structure includes variables, the flags of overflow, the

usage information and the total size of memory applied

by a variable. For example, char p [10], the detection

model about p is as follows (see Figure 5).

From the abstract syntax tree obtained by lexical

analysis and syntax analysis, we can get each call func-

tion. Through the inter-procedure slicing of the parame-

ters of the vulnerable functions, we then can obtain a

piece of executable code segment which may affect these

parameters. At last, we watch the memory of the variable

for judging whether the buffer related is overflow, by

setting breakpoints in a debugger at the beginning of the

main function and the vulnerable function.

4. Implementation

Through the intermediate representation AST of a pro-

gram (from the lexical and syntax analysis of the pro-

gram), we can construct some useful graphs such as

PDGs and SDGs. In PDGs, the data structure of nodes

and edges are as follows.

struct PDGNode {

CFGType type;

//type of node, such as “DEFINE”, “IF”, “WHILE ”. etc

StructId astId;

//node’s ID in AST

PDGId father;

//connect to the control predicate

PDGId tBranch, fBranch;

//the PDGID of true branch or false branch

PDGId ifd;

//post-dominator

ListId prevHead;

//the first node’ID of the father node table

PDGId callLink;

//link to the first fun ction call site

DepEdgeId cdFirst;

//the ID of the node’s first control dependence edge

DepEdgeId ddFirst;

//the ID of the node’s first data dependence edge

noFlow

h el l o \0

hasUsed maxLen

Figure 5. The detection model of buffer overflow.

};

struct DepEdge {

int depSrc;

//the PDG ID of dependence source

int depDest;

//the PDG ID of dependence destination

DepEdgeId srcNext;

//the next dependence edge with the same dependence

//source

DepEdgeId destNext;

//the next dependence edg e with the same dependence

//destination

};

According to the algorithm in above section, we can

construct SDGs from PDGs by adding some edges with

the following fun ctions.

1) Adding control dependence edges from call-site to

the callee’s body, through the function AddEdge (CTRL_

EDGE, callSite, entry).

2) Adding data dependence edges from actual-in to

formal-in, through the function AddEdge (DATA_EDGE,

Actual_in, For m al _i n)

3) Adding data dependence edges from formal-out to

actual-out, through the function AddEdge (DATA_EDGE,

Formal_out , A c t ual _o ut )

The function AddEdge inserts dependence edges to the

dependence table according to the edge’s style. Then we

assign an ID of the new edge to the destNext (a member

variable of DepEdge). The following codes show in de-

tail the implementation of AddEdge.

AddEdge (int kind, int from, int to)

{ int lastEdge;

DepEdge dep (from, to);

depTable. insert (dep);

switch (kind)

{

case CTRL_EDGE:

lastEdge = pdgTable [fro m ]. cdFirst;

break;

case DATA_EDGE:

lastEdge = pdgTable [from] .ddFirst;

break;

}

if (lastEdge < 1)

Program Slicing Based Buffer Overflow Detection969

//if this node has no dependence edge

{

pdgTable [from]. cdFirst = depTable. getC urId();

return;

}

//if the node has dependence edge already

while ((depTable [la st Ed g e]. depDest! = to)&&

depTable [lastEdge]. destNext >= 1)

lastEdge = depTable [lastEdge]. destNext;

depTable [lastEdge]. destNext = depTable.getCurId ();

}

The dependence between the actual parameter can be

formed according to the dependence among the corre-

sponding formal parameters. When we analyze the li-

brary function, we suppose that each actual-out is de-

pendent on the actual-in. So we need not go deep into the

inner of the library function bo dy. The d etail codes of the

dependence among formal parameters are as follows.

void BuildInterActualParameterDep (int fnId)

{ //paramNum is the number of the parameter

for(int i = 0; i < fnTable[fnId]. paramNum; i+ +)

//judge whether the corresponding formal-in can

//reach the corresponding formal-out

if (IsReachable (formal-in, formal-out))

//Add edge between the corresponding actual-in

//and the corresponding actual-out;

AddEdge (DATA_EDGE, actual-in, actual-out);

}

According to the algorithm described in section 3, we

need to traverse the SDG in two phases. Because pa-

rameter -ou t edge s are no t followed , the traversal in phase

1 does not descend into procedures. But the effects of

such procedure are not ignored. The presence of transi-

tive flow dependence edges from actual-in to actual-out

vertices permits the discovery of vertices that can reach

the vertex you want to slice through a procedure call,

although the graph traversal does not actually descend

into the called procedure. In phase 2, because call edges

and parameter in edges are not followed, the traversal

does not ascend into the calling procedure; the transitive

flow dependence edges from actual-in to actual-out ver-

tices make such ascents unnecessary. So we can solve the

call-context problem (as shown in the following codes).

void InterProSlice (SDGs, PDGId vulPdgNode)

{

//phase 1: traverse in calling procedure

ReachingNo de (s, vulPdgNode, {parameter_out});

//phase 2: traverse in called procedure

//vSet is a set marked in phase 1;

ReachingNo de(s, vSet, {parameter_in, call});

}

void ReachingNode(SDG s, vSet, kind)

//vSet is the set of vertex

//kind is type of the vertex

{ stack<PDGId> node St ack;

push the vertices that exist in the vSet to the stack;

while (!nodeStack .IsEmpty())

{ pop a vertex v from the stack;

mark the vertex v;

while (if v’s depTable is not empty)

{

push the vertices existed in the depTable t o the stack;

}

After obtaining the result of the slicing, we set break-

point at the beginning of the main function and at the

place of the vulnerable function, then call the debugger

(for example: the debugger embedded in Microsoft Vis-

ual Studio or the Zeta debugger [9]) to execute by step

until the end. The implementation is as follows:

void BufferOverDetect (int start, int end, int vulPos, char *

fileName) {

char buf [10];

ZD_LoadProgram (fileName);

ZD_SetBreakPoint (start,true);

ZD_SetBreakPoint (vulPos,true);

ZD_RunTo (start);

While (start <= end)

{

ZD_RunTo (++start);

}

/*the function below is used to read a byte of

content starting from the memory address of

add to the buf. The add is passed as follows : if

you want to check whether vul (vul is defined

like this: char vul [10]) will be overflowed, we

just need to watch the content of the vul [10],

and the add is & vul [10].*/

ZD_Read (add, 1, buf);

}

5. A Sample

In this section, we will show based on program slicing a

sample (see Figure 6), where the variable of vulBuf will

be overfl owed.

Due to the restriction of space, parts of the vertices in

Figure 6 have been abbreviated shown as follows.

D:noR = DEFINE:noRelated;

D:noRBuf = DEFINE:noRelatedBu;

D:vulBuf = DEFINE:vulBuf;

A:noR++ = ASSIGN:noRelated++;

Program Slicing Based Buffer Overflow Detection

970

Figure 6. A sample of buffer overflow and its SDG.

C:copy = CALL_SITE:copy;

F:noR = FINALUSE:noRelated;

F:noRBuf = FINALUSE:noRelatedBuf;

F:vulBuf = FIN ALU SE :vulBuf;

A_IN = ACTUAL_IN;

A_OUT = ACTUAL_OUT;

F_IN = FORMAL_IN;

F_OUT = FORMAL_OUT;

A_IN_1 = ACTUAL_IN_1;

A_OUT_1 = ACTUAL_OUT_1;

A_IN_2 = ACTUAL_IN_2;

A_OUT_2 = ACTUAL_OUT_2;

After constructing the SDG, we start to find the vul-

nerable function call with the matching. Then we can

find the strcpy in the copy may cause overflow. So we

make inter-procedure slicing for variable p, and obtain

the result showed in Figure 7.

Then we call the debugger, and set breakpoint at the

beginning of the main function and at the place of the

strcpy, by starting to execute by step. At last, we will

find vulBuf [10] equals character of NULL (see Figure

8). This shows that vulBuf has been overflowed.

After debugging, we will give the corresponding re-

port.

There are some advantages in our detection tool based

on program slicing. First, this tool improves the accuracy

greatly compared with the ITS4 and RATS which use

string matching to detect the buffer overflows. Second,

through program slicing, we can get rid of the useless

codes. Compared with the detecting tool that sets a con-

straint for each variable and watches its value, our tool

can reduce the variables needed to watch, improve the

Program Slicing Based Buffer Overflow Detection971

# include <string.h>

void copy(char *p )

{ strcpy (p,”HelloWorld” ) ;

}

void main()

{ char vulBuf [10];

copy (vulBuf);

}

Figure 7. The slice result of the sample in Figure 6.

Figure 8. The result of a detection debugger.

performance. Third, compared with the Splint, our tool

needs not to insert information of constraint to th e source.

So the analyst needs not learn the program a lot.

6. Conclusions

This paper introduces inter-p rocedure slicing to solve the

problem of detecting buffer overflow. Compared with the

other methods, this method has the high performance and

excellent precision. But our tool is only fit for the pro-

grams coded by C, and it is still blindness in the ob-

ject-oriented langu ages. So eliminating the blindness will

be our further work.

REFERENCES

[1] “CERT/CC.Vulnerability Notes by Metric,” http://www.kb.

cert.org/vuls/bymetric?open&start=1&count=20

[2] “US-CERT Recently Published Vulnerability Notes,”

http://www.kb.cert.org/vuls/atomfeed?OpenView&start

=1&count=30

[3] “Secure Software, Rough Auditing Tool for Security,

(RATS),” Secure Software Inc, http://www.secure soft-

ware.com

[4] J. Viega, J. T. Bloch, T. Kohno and G. McGraw, “ITS4:

A Static Vulnerability Scanner for c and c++ Code,” An-

nual Computer Security Applications Conference, Hawaii,

2000, pp. 257-267.

[5] D. Evans and D. Larochelle, “Improving Security Using

Extensible Lightweight Static Analysis,” IEEE Software,

Vol. 19, No. 1, 2002, pp. 42-51.

[6] M. Weiser, “Program Slicing,” The IEEE Transactions on

Software Engineering, Vol. 10, No. 4, 1984, pp. 352-357.

[7] Y.Z. Zhang and B. W. Xu, “A Novel Formal Approach to

Program Slicing,” Science in China, Series E, Informa-

tion Sciences, Vol. 38, No. 2, 2008, pp. 161-176.

[8] S. Horwitz, T. Reps and D. Binkley, “Interprocedural

Slicing Using Dependence Graphs,” ACM Transaction on

Programming Languages and Systems, Vol. 12, No. 1, pp.

26-60.

[9] “Zeta Debugger,” http://www .fyzor.com/debug ger/index.htm