J. Software Engi neeri n g & Applications, 2010, 3, 965-971
doi:10.4236/jsea.2010.310113 Published Online October 2010 (http://www.SciRP.org/journal/jsea)
Copyright © 2010 SciRes. JSEA
965
Program Slicing Based Buffer Overflow Detection*
Yingzhou Zhang1,2,3, Wei Fu1,2, Xiaofei Qian1, Wei Chen1
1College of Computer, Nanjing University of Posts and Telecommunications, Nanjing, China; 2State Key Laboratory of Novel Soft-
ware Technology, Nanjing University, Nanjing, China; 3State key Laboratory Of Networking & Switching Technology, Beijing
University of Posts & Telecomm., Beijing, China.
Email: zhangyz@njupt.edu.cn
Received July 26th, 2010; revised August 16th, 2010; accepted 26th, 2010.
ABSTRACT
The development of the information technology has brought threats to human society when it has influenced seriously
the global politics, economics and military etc. But among the security of information system, buffer overrun vulner-
ability is undoubtedly one of the most important and common vulnerabilities. This paper describes a new technology,
named program slicing, to detect the buffer overflow leak in security-critical C code. First, we use slicing technolog y to
analyze the variables which may be with vulnerability and extract the expressions which will bring memory overflow.
Secondly, we utilize debug tech nology to get the size of memo ry applied by the va riable and the size of memory used for
these code segments (the slicing result) further. Therefore we can judge whether it will overflow according to the
analysis above. According to the unique excellence of program slicing performing in the large-scale programs debug-
ging, the method to detect buffer overrun vulnerability described in this paper will reduce the workload greatly and
locate the code sentences affected by corresponding variable set quickly, particularly including the potential vulner-
ability caused by parameter dependence among the subroutines.
Keywords: Program Slicing, Buffer Overflow, Inter-Procedure Slicing, Debug, System Dependence Graph
1. Introduction and Related Work
As early as the beginning of 1970s, buffer overflow has
been wildly believed that it is caused by the defects of C
language’s design model. Array and pointer references
are not automatically bounds-checked, so programmers
must be up to do these checks by themselves. It is more
important that many of the string operations supported by
the standard C library such as strcpy(), strcat(), sprintf(),
gets(), are unsafe actually. They directly copy data with
unknown size to th e fixed buffer (as shown in Figure 1),
causing data overwriting in memory, access violation, or
execution of malicious code designed by hackers. The
data from CERT shows that 55 percent of general injec-
tion attacks are buffer overflow attacks [1], and that
among the thirty new vulnerabilities happened during
April 19 to May 30 in this year, buffer overflow holds 40
percen t [2].
Currently, there are some prob lems to be solv ed. Tools
with static buffer overflow detection based on string
matching algorithms maintain a high rate of false alarms.
If the functions existed in the program match the vulner-
ability database carrying by themselves, they give the
corresponding reports.
Now, the common detection methods check the bound
of arrays. They regard the memory space variable applied
as integer range, for example, char e.g. [10], and its range
is [1,10]. When data are copied into it, we must judge
whether the buffer overflows. But problems are still ex-
isted. First, we are hard to know how many sizes of the
buffer have been used. Because we used to call library
function to operate the string, and almost all of the library
// no buff er over f low
void NoVulFunc(void)
{
char dest[6];
……
strcpy(dest,”Hello”);
……
}
// buffer overflow
voi d VulFunc(v oid)
{
char dest[6];
……
strcpy(dest,”HelloWorld”);
……
}
This work was supported in part by the Natural Science Foundation o
f
China (60703086, 60873231, 60873049, 60973046), Jiangsu Natural
Science Foundation of China (BK2009426). Figure 1. An example of a buffer overflow.
Program Slicing Based Buffer Overflow Detection
966
function’s source code is private or existed in DLLs.
Second, these tools are hard to deal with a program with
multiple procedures. Lastly, tracking every variable’s
buffer is almost impossible in a large project.
To solve these problems, many researchers have pre-
sented methods and tools for detecting buffer overflows.
1) The static detection methods based on string match-
ing. Through string matching, the UNIX’s tool, grep, can
be used to find the unsafe library function call. The tool
RATS [3,4], developed by Secure Software inc., also
studies the unsafe function call in the source code. It
matches with the vulnerability database, and then gives
the description of the vulnerab ilities. Those tools as men-
tioned above can hardly analyze the semantics and
grammar of a program, thus they have many limitations
and high false positive.
2) The detection method based constraint analysis.
Splint [5], develop ed by university of Virginia, is a static
analysis tool used to detect the vulnerabilities in the C
program. It adds information of constraint to the source
code, and then makes lexical, grammar and semantic
analysis for the source. It judges the leaks that may hap-
pen in a program and gives the related instructions. From
the workflow of the Splint, its limitation is that it re-
quests analysts to be known the source well, and to add
notations to the interesting variables and functions.
3) The dynamic detection methods. The dynamic de-
tection tool of buffer overflows first inserts some detect-
ing codes in the place where it may happen buffer over-
flow, then compiles and executes the codes after insert-
ing, and finally, detects whether overflow happens during
the execution. But the dynamic detection has defects of
efficiency and coverage.
Against the problems described above, this paper pro-
poses a novel method of program analysis, program slic-
ing [6-8], to detect the buffer overrun .
2. Program Slicing
Program slicing [6], orig inally introduced by W eiser, has
been widely used in maintenance of software, program
debugging, software testing, code analysis, reverse engi-
neering and so on. For example, during the program de-
bugging, we hope to allo cate the codes causing the error.
But perhaps the program is too large to find by our hands.
If we apply program slicing, we can exclude the codes
that have nothing to do with the error, then allocate the
error in a smaller range.
In general, program slicing technology has experience
a lot: from static slicing to dynamic, from forward slicing
to backward, from single procedure to multiple, from
non-distributed slicing to distributed, etc.
The slice of a program with respect to program point p
and variable x consists of all statements and predicates of
the program that might affect the value of x at point p
(see Figure 2). This concept, originally discussed by
Mark Weiser, can be used to isolate individual computa-
tion threads within a program. In Weiser’s terminology, a
slicing criterion is a pair <p,v>, where p is a program
point and v is a subset of the program’s variable.
With the expansion of the program scale, it is inevita-
ble that program contains multiple procedures. So it ap-
pears more important to study inter-procedural slicing.
Regarding inter-procedural slicing as a question of
graph’s reachability, S.Horwitz et al. [8] introduce sys-
tem dependence graph (SDG) to represent the program’s
dependence graph (PDG).
PDG is a directed graph connected by different kinds
of vertexes and some edges (e.g. Figure 3). And the ver-
texes include function’s entrance node, declaration node,
assignation node, control predicate node, function’s call
site node, parameter node, the FINAL_USE node of
every variable, etc. The edges include control depend-
ence edge of program’s circuit, parameter-in and pa-
rameter edges generated by call and other data depend-
ence edges. If the definition of variable x in node n is a
reachable definition of node m, node m is data dependent
on n. Control dependents only exist in condition expres-
sion and inside the loop expression.
SDG (e.g. Figure 4) composes of procedure depend-
ence graph, which is connected by edges that represent
direct dependences between a call site and the called
procedure and edges that represent transitive depend-
ences due to calls.
// a si mp le examp l e
1 void EgForSlicing ()
2 {
3 int a = 0;
4 int b = 0;
5 int c = 5;
6 if(c > 10)
7 a++;
8 else
9 b++;
10 }
Slice for variable a in 7
th
expression
1 viod EgForSlicing ()
2 {
3 int a = 0;
6 if(c > 10)
7 a++;
10 }
Slice for variable c in 6
th
expression
1 viod EgForSlicing ()
2 {
5 int c = 5;
10 }
Slice for variable b in 9
th
expression
1 viod EgForSlicing ()
2 {
4 int b = 0;
6 if(c > 10)
8 else
9 b++;
10 }
Figure 2. A sample of program slicing.
Copyright © 2010 SciRes. JSEA
Program Slicing Based Buffer Overflow Detection967
//a sample with single
//procedure
void main ()
{ int i = 0;
while (i < 10)
i++;
}
Control dependence
Data dependence
Figure 3. A program and its PDG.
//a sample with two
//procedures
void main()
{
int i = 0;
inc (i);
}
void inc ( int a)
{
a++;
}
Control dependence
Data dependenc e
Figure 4. A sample program and its SDG.
For every call site, we use four sorts of vertices to de-
note the parameter passing: on the calling side, informa-
tion transfer is represented by a set of vertices called ac-
tual-in and actual-out vertices. These vertices, which are
control dependent on the call-site vertex (see Figure 4),
represent assignment statements that copy the values of
the actual parameters to the call te mporaries an d from the
return temporaries, respectively. Similarly, information
transfer in the called procedure is represented by a set of
vertices called formal-in and formal-out vertices. These
vertices, which are control dependent on the procedure’s
entry vertex, represent assignment statements that copy
the values of the formal parameters from the call tempo-
raries and to the return temporaries, respectively.
3. An Algorithm of Buffer Overflow
Detection
In this section, we will show in detail an algorithm of
detecting buffer overflow through program slicing. The
algorithm includes four steps as follows.
Step 1. Constructing PDGs
Through the lexical and syntax analysis (with Flex and
BYACC) of the program to be detected, we first con-
struct the vertices in PDGs. There are usually seven
kinds of vertexes as follows.
1) The beginning vertex of compound statements
which includes function, if-else, switch statements and so
on;
2) The end vertex of compound statements;
3) Call-site vertex;
4) Actual-in and actual-out vertex;
5) Formal-in and formal-out vertex;
6) FinalUse vertex;
7) Others vertex such as ordinary definition statements
(e. g.: int i), predicate vertexes (e. g.: the judgment part
of a if statement) and jump statements (e. g.: break, goto
and so on).
Then we construct the edges of PDGs by analyzing the
program dependences (includes control dependences
and data dependences, see Figure 4).
Step 2. Constructing SDG
According to the PDGs in Step 1, the SDG can be con-
structed by the following substeps:
1) For each call site, a call edge from the call-site ver-
tex to the corresponding procedure-entry vertex, is added
into the PDG related.
2) For each actual-in vertex v at a call site, we add in
PDGs a parameter-out edge from v to the corresponding
formal-in vertex in the call procedure;
3) For each actual-out vertex v at a call site, we add in
PDGs a parameter-out edge to v from the corresponding
formal-out vertex in the called procedure;
4) We finally add in PDGs an edge between actual-out
vertex to actual-in vertex if they are reachable between
the corresponding formal-out and the corresponding for-
mal-in.
Step 3. Computing program slices by traversing SDG
in two phases
Supposing that we want to computer the inter-procedure
Copyright © 2010 SciRes. JSEA
Program Slicing Based Buffer Overflow Detection
968
slice of the variables in vertex n.
Phase 1: Starting from the vertex n, we travel re-
versely the SDG through the control dependence edges
and data dependence edges (not including parameter-in
edges), and then mark all of reachable vertexes.
Phase 2: Starting from the vertexes marked in phase 1,
through the control edges (not including call-site edges)
and data dependence edges (not including parameter-out
edges), we mark all of reachable vertexes.
The result of the inter-procedure slice is the union set
of the marked vertexes in above two phases.
Step 4. Detecting Buffer Overflow
For each variable needed to analyze, we can use a data
structure to store the information of usage. The data
structure includes variables, the flags of overflow, the
usage information and the total size of memory applied
by a variable. For example, char p [10], the detection
model about p is as follows (see Figure 5).
From the abstract syntax tree obtained by lexical
analysis and syntax analysis, we can get each call func-
tion. Through the inter-procedure slicing of the parame-
ters of the vulnerable functions, we then can obtain a
piece of executable code segment which may affect these
parameters. At last, we watch the memory of the variable
for judging whether the buffer related is overflow, by
setting breakpoints in a debugger at the beginning of the
main function and the vulnerable function.
4. Implementation
Through the intermediate representation AST of a pro-
gram (from the lexical and syntax analysis of the pro-
gram), we can construct some useful graphs such as
PDGs and SDGs. In PDGs, the data structure of nodes
and edges are as follows.
struct PDGNode {
CFGType type;
//type of node, such as DEFINE”, “IF”, “WHILE . etc
StructId astId;
//nodes ID in AST
PDGId father;
//connect to the control predicate
PDGId tBranch, fBranch;
//the PDGID of true branch or false branch
PDGId ifd;
//post-dominator
ListId prevHead;
//the first nodeID of the father node table
PDGId callLink;
//link to the first fun ction call site
DepEdgeId cdFirst;
//the ID of the nodes first control dependence edge
DepEdgeId ddFirst;
//the ID of the nodes first data dependence edge
P:
noFlow
h el l o \0
hasUsed maxLen
Figure 5. The detection model of buffer overflow.
};
struct DepEdge {
int depSrc;
//the PDG ID of dependence source
int depDest;
//the PDG ID of dependence destination
DepEdgeId srcNext;
//the next dependence edge with the same dependence
//source
DepEdgeId destNext;
//the next dependence edg e with the same dependence
//destination
};
According to the algorithm in above section, we can
construct SDGs from PDGs by adding some edges with
the following fun ctions.
1) Adding control dependence edges from call-site to
the callee’s body, through the function AddEdge (CTRL_
EDGE, callSite, entry).
2) Adding data dependence edges from actual-in to
formal-in, through the function AddEdge (DATA_EDGE,
Actual_in, For m al _i n)
3) Adding data dependence edges from formal-out to
actual-out, through the function AddEdge (DATA_EDGE,
Formal_out , A c t ual _o ut )
The function AddEdge inserts dependence edges to the
dependence table according to the edge’s style. Then we
assign an ID of the new edge to the destNext (a member
variable of DepEdge). The following codes show in de-
tail the implementation of AddEdge.
AddEdge (int kind, int from, int to)
{ int lastEdge;
DepEdge dep (from, to);
depTable. insert (dep);
switch (kind)
{
case CTRL_EDGE:
lastEdge = pdgTable [fro m ]. cdFirst;
break;
case DATA_EDGE:
lastEdge = pdgTable [from] .ddFirst;
break;
}
if (lastEdge < 1)
Copyright © 2010 SciRes. JSEA
Program Slicing Based Buffer Overflow Detection969
//if this node has no dependence edge
{
pdgTable [from]. cdFirst = depTable. getC urId();
return;
}
//if the node has dependence edge already
while ((depTable [la st Ed g e]. depDest! = to)&&
depTable [lastEdge]. destNext >= 1)
lastEdge = depTable [lastEdge]. destNext;
depTable [lastEdge]. destNext = depTable.getCurId ();
}
The dependence between the actual parameter can be
formed according to the dependence among the corre-
sponding formal parameters. When we analyze the li-
brary function, we suppose that each actual-out is de-
pendent on the actual-in. So we need not go deep into the
inner of the library function bo dy. The d etail codes of the
dependence among formal parameters are as follows.
void BuildInterActualParameterDep (int fnId)
{ //paramNum is the number of the parameter
for(int i = 0; i < fnTable[fnId]. paramNum; i+ +)
//judge whether the corresponding formal-in can
//reach the corresponding formal-out
if (IsReachable (formal-in, formal-out))
//Add edge between the corresponding actual-in
//and the corresponding actual-out;
AddEdge (DATA_EDGE, actual-in, actual-out);
}
According to the algorithm described in section 3, we
need to traverse the SDG in two phases. Because pa-
rameter -ou t edge s are no t followed , the traversal in phase
1 does not descend into procedures. But the effects of
such procedure are not ignored. The presence of transi-
tive flow dependence edges from actual-in to actual-out
vertices permits the discovery of vertices that can reach
the vertex you want to slice through a procedure call,
although the graph traversal does not actually descend
into the called procedure. In phase 2, because call edges
and parameter in edges are not followed, the traversal
does not ascend into the calling procedure; the transitive
flow dependence edges from actual-in to actual-out ver-
tices make such ascents unnecessary. So we can solve the
call-context problem (as shown in the following codes).
void InterProSlice (SDGs, PDGId vulPdgNode)
{
//phase 1: traverse in calling procedure
ReachingNo de (s, vulPdgNode, {parameter_out});
//phase 2: traverse in called procedure
//vSet is a set marked in phase 1;
ReachingNo de(s, vSet, {parameter_in, call});
}
void ReachingNode(SDG s, vSet, kind)
//vSet is the set of vertex
//kind is type of the vertex
{ stack<PDGId> node St ack;
push the vertices that exist in the vSet to the stack;
while (!nodeStack .IsEmpty())
{ pop a vertex v from the stack;
mark the vertex v;
while (if vs depTable is not empty)
{
push the vertices existed in the depTable t o the stack;
}
}
}
After obtaining the result of the slicing, we set break-
point at the beginning of the main function and at the
place of the vulnerable function, then call the debugger
(for example: the debugger embedded in Microsoft Vis-
ual Studio or the Zeta debugger [9]) to execute by step
until the end. The implementation is as follows:
void BufferOverDetect (int start, int end, int vulPos, char *
fileName) {
char buf [10];
ZD_LoadProgram (fileName);
ZD_SetBreakPoint (start,true);
ZD_SetBreakPoint (vulPos,true);
ZD_RunTo (start);
While (start <= end)
{
ZD_RunTo (++start);
}
/*the function below is used to read a byte of
content starting from the memory address of
add to the buf. The add is passed as follows : if
you want to check whether vul (vul is defined
like this: char vul [10]) will be overflowed, we
just need to watch the content of the vul [10],
and the add is & vul [10].*/
ZD_Read (add, 1, buf);
}
5. A Sample
In this section, we will show based on program slicing a
sample (see Figure 6), where the variable of vulBuf will
be overfl owed.
Due to the restriction of space, parts of the vertices in
Figure 6 have been abbreviated shown as follows.
D:noR = DEFINE:noRelated;
D:noRBuf = DEFINE:noRelatedBu;
D:vulBuf = DEFINE:vulBuf;
A:noR++ = ASSIGN:noRelated++;
Copyright © 2010 SciRes. JSEA
Program Slicing Based Buffer Overflow Detection
Copyright © 2010 SciRes. JSEA
970
Figure 6. A sample of buffer overflow and its SDG.
C:copy = CALL_SITE:copy;
F:noR = FINALUSE:noRelated;
F:noRBuf = FINALUSE:noRelatedBuf;
F:vulBuf = FIN ALU SE :vulBuf;
A_IN = ACTUAL_IN;
A_OUT = ACTUAL_OUT;
F_IN = FORMAL_IN;
F_OUT = FORMAL_OUT;
A_IN_1 = ACTUAL_IN_1;
A_OUT_1 = ACTUAL_OUT_1;
A_IN_2 = ACTUAL_IN_2;
A_OUT_2 = ACTUAL_OUT_2;
After constructing the SDG, we start to find the vul-
nerable function call with the matching. Then we can
find the strcpy in the copy may cause overflow. So we
make inter-procedure slicing for variable p, and obtain
the result showed in Figure 7.
Then we call the debugger, and set breakpoint at the
beginning of the main function and at the place of the
strcpy, by starting to execute by step. At last, we will
find vulBuf [10] equals character of NULL (see Figure
8). This shows that vulBuf has been overflowed.
After debugging, we will give the corresponding re-
port.
There are some advantages in our detection tool based
on program slicing. First, this tool improves the accuracy
greatly compared with the ITS4 and RATS which use
string matching to detect the buffer overflows. Second,
through program slicing, we can get rid of the useless
codes. Compared with the detecting tool that sets a con-
straint for each variable and watches its value, our tool
can reduce the variables needed to watch, improve the
Program Slicing Based Buffer Overflow Detection971
# include <string.h>
void copy(char *p )
{ strcpy (p,”HelloWorld” ) ;
}
void main()
{ char vulBuf [10];
copy (vulBuf);
}
Figure 7. The slice result of the sample in Figure 6.
Figure 8. The result of a detection debugger.
performance. Third, compared with the Splint, our tool
needs not to insert information of constraint to th e source.
So the analyst needs not learn the program a lot.
6. Conclusions
This paper introduces inter-p rocedure slicing to solve the
problem of detecting buffer overflow. Compared with the
other methods, this method has the high performance and
excellent precision. But our tool is only fit for the pro-
grams coded by C, and it is still blindness in the ob-
ject-oriented langu ages. So eliminating the blindness will
be our further work.
REFERENCES
[1] “CERT/CC.Vulnerability Notes by Metric,” http://www.kb.
cert.org/vuls/bymetric?open&start=1&count=20
[2] “US-CERT Recently Published Vulnerability Notes,”
http://www.kb.cert.org/vuls/atomfeed?OpenView&start
=1&count=30
[3] “Secure Software, Rough Auditing Tool for Security,
(RATS),” Secure Software Inc, http://www.secure soft-
ware.com
[4] J. Viega, J. T. Bloch, T. Kohno and G. McGraw, “ITS4:
A Static Vulnerability Scanner for c and c++ Code,” An-
nual Computer Security Applications Conference, Hawaii,
2000, pp. 257-267.
[5] D. Evans and D. Larochelle, “Improving Security Using
Extensible Lightweight Static Analysis,” IEEE Software,
Vol. 19, No. 1, 2002, pp. 42-51.
[6] M. Weiser, “Program Slicing,” The IEEE Transactions on
Software Engineering, Vol. 10, No. 4, 1984, pp. 352-357.
[7] Y.Z. Zhang and B. W. Xu, “A Novel Formal Approach to
Program Slicing,” Science in China, Series E, Informa-
tion Sciences, Vol. 38, No. 2, 2008, pp. 161-176.
[8] S. Horwitz, T. Reps and D. Binkley, “Interprocedural
Slicing Using Dependence Graphs,” ACM Transaction on
Programming Languages and Systems, Vol. 12, No. 1, pp.
26-60.
[9] “Zeta Debugger,” http://www .fyzor.com/debug ger/index.htm
Copyright © 2010 SciRes. JSEA