Video Based Vehicle Detection and Its Application in Intelligent Transportation Systems

Video based vehicle detection technology is an integral part of Intelligent Transportation Systems (ITS), due to its non-intrusiveness and comprehensive vehicle behavior data collection capabilities. This paper proposes an efficient video based vehicle detection system based on the Harris-Stephens corner detector algorithm. The algorithm was used to develop a standalone vehicle detection and tracking system that determines vehicle counts and speeds on arterial roadways and freeways. The proposed system was developed to eliminate the need for complex calibration, to be robust to contrast variations, and to perform better with low resolution videos. The accuracy of the algorithm in vehicle counts and speed was evaluated. The performance of the proposed system is equivalent to or better than that of a commercial vehicle detection system. Using the developed vehicle detection and tracking system, an advance warning intelligent transportation system was designed and implemented to alert commuters in advance of speed reductions and congestion at work zones and special events. The effectiveness of the advance warning system was evaluated and its impact is discussed.


Introduction
The goals of Intelligent Transportation Systems (ITS) are to enhance public safety, reduce congestion, improve travel and transit information, generate cost savings for motor carriers and emergency operators, reduce detrimental environmental impacts, etc. ITS technologies assist states, cities, and towns nationwide in meeting the increasing demands on the surface transportation system. The efficiency of an ITS is mainly based on the performance and comprehensiveness of the vehicle detection technology. Vehicle detection and tracking are an integral part of any vehicle detection technology, since they gather all or part of the information that is used in an effective ITS.
In transportation, a vehicle detection system may be defined as a system that is capable of detecting vehicles and measuring traffic parameters such as count, speed, incidents, etc. Vehicle detection can also be used for various transportation applications such as autonomous vehicle guidance, vehicle safety, etc. Vehicle detection by video cameras is one of the most promising non-intrusive technologies for large-scale data collection and for the implementation of advanced traffic control and management schemes. Vehicle detection is also the basis for vehicle tracking: correct vehicle detection results in better tracking. Modern computer controlled traffic systems have more complex vehicle detection requirements than those adopted for normal traffic-actuated controllers for traffic signals, for which many off-the-shelf vehicle detectors were designed [1,2]. Many useful and comprehensive parameters such as count, speed, vehicle classification, queue lengths, volume/lane, lane changes, and microscopic and macroscopic behaviors can be evaluated through video based vehicle detection and tracking. Autoscope [1] and Iteris [2] are examples of off-the-shelf commercial video based vehicle detection systems most commonly used in the nation.
This work focuses on developing a real-time vehicle detection system for low-resolution traffic video feeds. The developed system determines the total and lane based vehicle counts and the average speed of the vehicles for a given segment of the roadway. There exist many off-the-shelf commercial video detection systems for vehicles for various applications such as vehicle detection at intersections, vehicle incident detection, etc. However, vehicle data collection has traditionally been approached using one of the following sensor based approaches: radar, lidar, loop detectors, microwave sensors, etc. Lately, video based vehicle data collection systems are being explored [3] by commercial video detection systems. However, these systems require extensive calibration, as well as user knowledge and expertise to configure them. Also, accurate values of parameters that are often unknown (such as the height of the camera) are required to obtain better results. Also, variation in the illumination of the video degrades the efficiency of these systems. In order to address the above drawbacks, we propose a vehicle detection system for ITS applications based on the Harris-Stephens Corner Method (HSCM). The system requires less calibration and is less sensitive to illumination changes because of the point detector and tracking methodologies adopted. The developed system was evaluated and its performance analyzed using a set of video feeds (8 sets) of 1 min. interval. The video feeds vary in illumination (captured during various times of the day), camera mount height, camera view angle and region of view. The system was also implemented on an embedded computer platform. The evaluations are tabulated, and the advantages and accuracy of our implementation are discussed.

Background
Video based object or motion detection and tracking are two tasks that play a fundamental role in video surveillance systems, transportation systems, military applications, gaming systems, etc. This section mainly focuses on the problem of video based vehicle detection and tracking for ITS applications. Vehicle detection is the process of detecting the presence or absence of a vehicle in the video sequence. Vehicle tracking is defined as finding the location of a vehicle in each frame of the video sequence. Typically, the result of detection is used to initialize tracking. Video based vehicle detection and tracking for ITS applications are performed using: 1) static or moving cameras, 2) single or multiple cameras, 3) fixed or Pan-Tilt-Zoom (PTZ) cameras. The efficiency of any vehicle detection system is based on the system's readiness to handle loss of information, noise in video, complexity of vehicle motion, vehicle occlusion, shape complexity, illumination changes and real-time processing.
Vehicle detection and tracking approaches can be broadly classified based on the representation of the object/vehicle, the detection methods and the tracking methods. Representations of vehicles for detection and tracking include points, shapes, silhouettes, contours, and object models [4]. Some of the initial approaches to vehicle detection and tracking involve spatial, temporal or spatio-temporal analysis of video sequences. Vehicle detection and tracking in general has been performed using one of the following methodologies: point detection and tracking [5][6][7][8], edge detection [9][10][11][12], frame differencing [13][14][15][16], thresholding and segmentation followed by feature extraction [17,18] and matching [19][20][21][22] (by correlation, template matching or supervised learning), and optical flow methods [23][24][25][26]. Point detection and tracking methods are fast and provide better results under illumination changes. Edge detection methods employ morphological edge detection schemes to determine the object/vehicle. Edge detection techniques are relatively fast and are less sensitive to illumination variance. However, tracking of the vehicle has to be performed by recursive edge detection on subsequent frames. Also, solutions for fixing disconnects in edges and contour irregularities are time consuming and vulnerable to noise. Frame differencing approaches are relatively fast, but require either a static background or reference image [15,27] or frequent updating of the background image [27,28], making them unsuitable for slow moving vehicles. Feature extraction and matching methods derive dimensions, textures, colors, shapes, etc. of vehicles, then match and validate them using templates or by correlation. Even though these methods are comprehensive and can provide higher accuracy in differentiating objects, they are inflexible and are susceptible to intensity variations, shadows and object occlusions. Supervised learning methods are more suitable for object detection only, are time consuming (training), and are inflexible to location changes. Optical flow methods encode the temporal displacement of the pixels during motion and the variation in the spatial structure of the scene. This approach is computationally expensive, vulnerable to noise, perspective distortion, occlusion and static objects, and sensitive to the camera position. Some of the challenges of effective vehicle detection and tracking compared to general object detection and tracking are: 1) detection and tracking should be fast due to the relatively fast movement of the object/vehicle, 2) they should be independent of the location, camera region of view, camera resolution and mounted height, illumination, shadows and occlusion, 3) they should require less calibration to determine traffic flow data, and 4) they should maximize true detections and minimize false detections. On the other hand, vehicle detection and tracking are less challenging than general object tracking in that: 1) vehicle direction and displacement are common and proximity-uniform, 2) vehicle shapes and features are distinct from the background and from active objects (pedestrians), 3) the motion of vehicles is bound to the roadway area (smaller frame size for processing), and 4) object detection by matching and supervised learning is excessive and is only required for vehicle classification.
The main goal of this work was to develop a real-time embedded vehicle detection and tracking system for low resolution CCTV camera feeds to determine vehicle count and flow parameters that could be used for various ITS applications. Given the above realistic goals, the point detection and tracking methodology was selected for our vehicle detection and tracking system for the following reasons: 1) it is extremely fast and can detect multiple objects in a single frame, 2) it performs better under illumination variance, 3) since vehicle direction and displacement are common and proximity-uniform, iterative point tracking across multiple frames is quick, 4) the camera region of view and camera positioning height are not required, and 5) it can handle partial occlusion.
Point detectors are used to find interest points in images. Interest points have classically been used for motion detection and vehicle tracking. Also, interest point detectors have been shown to be invariant to illumination changes and camera viewpoint. The interest point detectors commonly used in the literature are: Moravec's operator [5], the Harris-Stephens corner interest point detector [6], the KLT detector [7], and the SIFT detector [8]. Moravec's operator computes the image intensity variation in a 4 × 4 patch only for a discrete set of shifts at every 45 degrees. Moravec's operator fails to detect an edge/interest point if the edge is in the direction of its neighbors. Harris and Stephens improved upon Moravec's operator as follows: 1) all possible small shifts are covered by performing an analytic expansion about the shift origin, 2) noise is reduced by considering a smooth circular Gaussian window, 3) first order image derivatives in the x and y directions are computed to highlight the directional intensity variations, followed by a second order moment matrix, which encodes this variation for each pixel in a small neighborhood. The interest points are evaluated using the determinant and trace of this matrix, and are derived by thresholding the interest point confidence value and applying non-maxima suppression.
Point tracking methods determine the correspondence of interest points across frames. Point correspondence methods can be broadly classified as deterministic and statistical methods [4]. Deterministic methods use qualitative motion heuristics to constrain the correspondence problem. Statistical methods take explicit object features, parameters, and uncertainties into consideration to determine correspondence. Deterministic methods define a correspondence cost with a set of motion constraints associated with each object between frames. Some of the motion constraints used in the literature for point tracking include: proximity, maximum velocity, smooth motion, common motion, rigidity and proximal-uniformity [4]. The vehicle detection and tracking system implemented in this work is compared with Autoscope [29], a commercial video based vehicle detection system. The Autoscope vehicle detection system is based on background frame differencing [30] and inter-frame differencing for vehicle detection, and on edge detection and a centroid correspondence method for tracking. The system also has significant built-in heuristics for shadow elimination and for detection under various weather conditions. Also, the height of the camera position significantly influences its detection and tracking accuracy. The developed vehicle detection and tracking system is based on interest point detection and tracking using the Harris-Stephens corner detector and point correspondence to determine the displacement shift in pixels corresponding to the vehicle travel. The developed system was used as the vehicle detector for the projects "Testing and Evaluation of the Effectiveness of Advanced Technologies for Work Zones" and "Test Queue Detection Systems for Preventing Accidents in Nevada", which were sponsored by the Nevada Department of Transportation. The vehicle detector system was developed using the Harris-Stephens corner detection algorithm and the OpenCV library on an Arcom Olympus Windows XP Embedded development kit running the WinXPE operating system. The system developed was used to detect vehicle flow (count and speed) to determine congestion at work zones and special events, and to inform approaching vehicles of congestion and display warning signs to reduce speeds.

Harris-Stephens Corner Detection and Point Tracking
In this section, the Harris-Stephens corner detection algorithm used to determine interest points in the image is discussed. Point tracking using deterministic methods for point correspondence is used to track the vehicles. Also, spatial and temporal characteristics are used to derive vehicle counts. The speed of the vehicle is determined using vector mapping and scaling of interest points at different frames.
The Harris-Stephens corner detection algorithm is based on the auto-correlation function of a signal, where the local auto-correlation function measures the local changes of the signal with patches shifted by a small amount in different directions. The Harris-Stephens corner detection method improves upon Moravec's corner detector. The main drawback of Moravec's corner detector is that it is not isotropic [6]. The Harris-Stephens corner detector considers the differential of the corner score (auto-correlation) with respect to the direction, instead of using shifted patches.
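Let us consider a 2-D image $I$, with an image area $(x, y)$ shifted by $(\Delta x, \Delta y)$. The weighted Sum of Squared Differences (SSD), or auto-correlation, between the two image patches is denoted $C$ and is given as:

$$C(\Delta x, \Delta y) = \sum_{x, y} w(x, y) \left[ I(x + \Delta x, y + \Delta y) - I(x, y) \right]^2$$

$I(x + \Delta x, y + \Delta y)$ can be approximated using a Taylor expansion as follows:

$$I(x + \Delta x, y + \Delta y) \approx I(x, y) + I_x(x, y)\,\Delta x + I_y(x, y)\,\Delta y$$

where $I_x$ and $I_y$ are the partial derivatives of $I$ with respect to $x$ and $y$ respectively. Therefore, the auto-correlation function can be expressed as an equation and as a matrix as follows:

$$C(\Delta x, \Delta y) \approx \sum_{x, y} w(x, y) \left[ I_x(x, y)\,\Delta x + I_y(x, y)\,\Delta y \right]^2 = \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} A \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}$$

The matrix $A$ (Harris matrix) captures the intensity structure of the local neighborhood, and $w(x, y)$ is a smooth circular Gaussian window defined as follows:

$$w(x, y) = e^{-(x^2 + y^2) / (2\sigma^2)}$$

The Harris matrix is expressed as:

$$A = \sum_{x, y} w(x, y) \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$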


A corner point is characterized by a large variation of $C$ in all directions of the vector $(\Delta x, \Delta y)$. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of the matrix $A$. By analyzing the eigenvalues of $A$, the following inferences can be made:
1) If $\lambda_1 \approx 0$ and $\lambda_2 \approx 0$, then the pixel $(x, y)$ has an auto-correlation function that is flat and has no interest point.
2) If $\lambda_1 \approx 0$ and $\lambda_2$ has some large positive value, then the auto-correlation function is ridge shaped and the interest point is an edge.
3) If $\lambda_1$ and $\lambda_2$ are both large positive values, then the auto-correlation function is sharply peaked and the interest point is a corner.
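The three cases above can be illustrated with a minimal Python sketch that computes the eigenvalues of a symmetric 2 × 2 matrix in closed form and labels the neighborhood accordingly. The function names and the cutoff `eps` for "approximately zero" are assumptions made for illustration; they are not taken from the developed system.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius

def classify(a, b, c, eps=1e-3):
    """Label a neighborhood as 'flat', 'edge' or 'corner'.

    eps is a hypothetical cutoff for "approximately zero"; a practical
    system would tune it to the image scale.
    """
    lam1, lam2 = eigenvalues_2x2(a, b, c)
    if lam2 <= eps:
        return "flat"    # both eigenvalues near zero
    if lam1 <= eps:
        return "edge"    # one eigenvalue near zero, one positive
    return "corner"      # both eigenvalues positive

print(classify(0.0, 0.0, 0.0), classify(4.0, 0.0, 0.0), classify(4.0, 0.0, 3.0))
```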
Since the exact computation of the eigenvalues of the matrix is computationally expensive, computation of the function $R$ has been suggested by [6]. $R$ is also referred to as the interest point confidence value.

$$R = \det(A) - \kappa \, \mathrm{trace}^2(A) = \lambda_1 \lambda_2 - \kappa \, (\lambda_1 + \lambda_2)^2$$

The above expression reduces the problem of determining the eigenvalues of the matrix $A$ to evaluating the determinant and trace of $A$ in order to determine the interest points or corner points of the object/vehicle.
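As an illustration of evaluating the confidence value from the determinant and trace rather than from the eigenvalues, the following pure-Python sketch computes R = det(A) − κ·trace(A)² for every interior pixel of a small gray-scale image. It is a simplified sketch under stated assumptions: a uniform 3 × 3 window stands in for the Gaussian window, derivatives are central differences, and the function name and default κ = 0.04 are chosen for illustration only.

```python
def harris_response(img, kappa=0.04):
    """Return R = det(A) - kappa * trace(A)^2 for each pixel.

    img is a list of rows of gray values. A uniform 3x3 window is used in
    place of the Gaussian window (an assumption to keep the sketch short).
    Border pixels keep a response of 0.
    """
    h, w = len(img), len(img[0])
    # Central-difference derivatives I_x and I_y (zero on the border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Accumulate the entries of the 2x2 matrix A over the window.
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - kappa * trace * trace
    return resp

# 10x10 test image: a bright square occupying x >= 5 and y >= 5.
image = [[1.0 if (x >= 5 and y >= 5) else 0.0 for x in range(10)] for y in range(10)]
R = harris_response(image)
print(R[5][5], R[7][5], R[2][2])  # corner, edge and flat locations
```

In this sketch the response is positive at the corner of the square, negative along its straight edge and zero in the flat region, which matches the eigenvalue interpretation given above.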

 
The interest points are marked by thresholding $R$ and applying non-maximal suppression. The value of $\kappa$ has been determined empirically, and in the literature a range of 0.04 to 0.15 has been suggested.
Object or vehicle tracking can be formulated as the correspondence of the interest points across frames. In this work, we employ a deterministic method for the correspondence of interest points. Deterministic methods for interest point correspondence typically define a cost function. The cost function is the cost of associating each object or vehicle in frames $j$ and $j + k$ using a set of motion constraints. Minimization of the correspondence cost is usually modeled as an optimization problem. However, for the vehicle tracking application the correspondence cost is modeled as a combination of proximity and common motion constraints. In our work, the correspondence cost employed involves matching of the object or vehicle centroids within lanes along with spatial proximity and common motion constraints [4]. In other words, based on the vehicle direction, the interest points move along a common direction, and the relative shifts of the interest points are uniform and can be found within a certain proximity. If $C_j(x, y)$ denotes the set of $n$ interest points or corner points $(x_i, y_i)$ determined by the Harris-Stephens corner detection algorithm for the frame $j$, then the centroid of the object is determined as follows:

$$(\bar{x}_j, \bar{y}_j) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i, \; \frac{1}{n} \sum_{i=1}^{n} y_i \right) \quad (1)$$

Considering the proximity and common motion constraint assumptions, the centroid approach is suitable for determining the center of the object or vehicle in the bounding region or vehicle detection zone.
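A minimal Python sketch of the centroid computation and of a deterministic centroid correspondence is given below. It assumes a greedy nearest-neighbor matching restricted to the same lane, with the proximity constraint expressed as a maximum pixel distance and the common motion constraint expressed as a non-negative projection on the traffic flow direction. The function names, the data layout and the greedy strategy are assumptions for illustration and are not claimed to be the exact cost used in the developed system.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y) interest points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def match_centroids(prev, curr, direction, max_dist):
    """Greedily match centroids of frame j (prev) to frame j + k (curr).

    prev, curr: lists of (lane, (x, y)).
    direction: vector of the traffic flow, e.g. (0, 1) for top to bottom.
    max_dist: hypothetical proximity limit in pixels.
    Returns a list of (index_in_prev, index_in_curr) pairs.
    """
    matches = []
    used = set()
    for i, (lane_p, (xp, yp)) in enumerate(prev):
        best, best_cost = None, max_dist
        for j, (lane_c, (xc, yc)) in enumerate(curr):
            if j in used or lane_c != lane_p:
                continue  # centroids are only matched within a lane
            dx, dy = xc - xp, yc - yp
            if dx * direction[0] + dy * direction[1] < 0:
                continue  # moved against the flow: violates common motion
            cost = math.hypot(dx, dy)  # proximity cost
            if cost <= best_cost:
                best, best_cost = j, cost
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches

prev = [(1, (10.0, 50.0)), (2, (30.0, 40.0))]
curr = [(2, (30.0, 48.0)), (1, (10.0, 58.0)), (1, (10.0, 45.0))]
print(match_centroids(prev, curr, (0, 1), 15.0))
```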
Our next formulation explains the approach for determining the speed of the vehicle. Let $N$ be the number of frames/sec captured for video processing. Let $d$ be the total shift in the centroid of the vehicle from frame $j$ to frame $j + k$, expressed in pixels. The centroid points are denoted as $(\bar{x}_j, \bar{y}_j)$ and $(\bar{x}_{j+k}, \bar{y}_{j+k})$ for the frames $j$ and $j + k$ respectively. The centroid displacement in pixels is determined as:

$$d = \sqrt{(\bar{x}_{j+k} - \bar{x}_j)^2 + (\bar{y}_{j+k} - \bar{y}_j)^2} \quad (2)$$

If $D$ (in miles) is the real-world distance between the reference points $(x_{r_1}, y_{r_1})$ and $(x_{r_2}, y_{r_2})$ on the screen, then the corresponding pixel distance $r$ is also evaluated by the above equation. The speed $v$ (in miles/hr) of the vehicle that has a centroid displacement $d$ for the frames $j$ to $j + k$ is determined by the following formula:

$$v = \frac{d \, D}{r} \cdot \frac{N}{k} \cdot 3600 \quad (3)$$

Figure 1 shows the reference points $(x_{r_1}, y_{r_1})$ and $(x_{r_2}, y_{r_2})$, the centroid displacement ($d$) and the vehicle centroid at the frames $j$ and $j + k$.
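The speed computation can be sketched in Python as follows. The sketch assumes a single linear pixel-to-distance scale derived from the calibration reference line (no perspective correction), and all function and parameter names are hypothetical. The centroid displacement in pixels is scaled by the ratio of the real-world reference distance to its pixel length, then divided by the elapsed time of k frames at N frames/sec.

```python
import math

def pixel_distance(p, q):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def speed_mph(c_j, c_jk, ref_a, ref_b, ref_miles, fps, k):
    """Vehicle speed in miles/hr from centroids at frames j and j + k.

    ref_a, ref_b: end points of the calibration reference line (pixels).
    ref_miles: real-world length of that line in miles.
    fps: frames/sec of the captured video; k: frame separation.
    """
    d = pixel_distance(c_j, c_jk)      # centroid displacement in pixels
    r = pixel_distance(ref_a, ref_b)   # reference line length in pixels
    miles = d * ref_miles / r          # displacement in miles
    hours = k / fps / 3600.0           # elapsed time in hours
    return miles / hours

# A 20 pixel shift over 2 frames at 30 frames/sec, where 100 pixels
# correspond to 0.005 miles, gives approximately 54 miles/hr.
print(speed_mph((100, 50), (100, 70), (0, 0), (0, 100), 0.005, 30.0, 2))
```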

Vehicle Detection and Tracking
The process of vehicle detection and tracking in this work is implemented using the Harris-Stephens corner detection algorithm to determine the corner points, and the interest points are tracked between video frames using the deterministic interest point correspondence method.
Based on the location and displacement of the interest points, vehicle counts and vehicle speeds are determined. The tasks employed in the above process are explained as follows:
Capture of live video feed: Live video feeds from CCTV cameras monitoring freeways and arterials are captured for video frame processing by a USB frame capture device. The video feeds are captured at various locations at different times of the day. Also, the CCTV cameras are pan-tilt-zoom cameras with varying camera fields of view. Also, the mounting height of the camera is unknown.
Pre-processing of video frames: Using a GUI tool developed as part of our vehicle detection system, the user can select the region of interest on the captured video frame. The detection and tracking algorithms are only performed on this cropped image region to reduce the processing time of the system. The user is required to specify detection and speed zones using horizontal virtual reference lines. The detection zones are areas where the interest points are evaluated, vehicles are detected and vehicle counts are incremented. The speed zones are adjacent to the detection zones, where the interest points are re-evaluated and vehicles are detected. As a rule of thumb, the detection zone length should be less than the vehicle length as seen in the video feed, and the speed zone length should be just greater than the vehicle length as seen in the video feed. The user specifies the virtual vertical lane reference lines that segment the lanes on the video frame. These vertical lines are used to determine vehicle counts by lane. Also, the user specifies the direction of vehicle motion or traffic flow, the calibration reference line and the corresponding physical distance. This reference distance is used to evaluate the speed of the vehicle.
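The role of the virtual reference lines can be sketched in Python as follows. The sketch assumes exactly vertical lane lines given by their x positions and horizontal zones given by a pair of y positions within the cropped region of interest; the names and the data layout are hypothetical.

```python
from bisect import bisect_right

def lane_index(x, lane_lines):
    """Return the 1-based lane containing x, or None if x is off the roadway.

    lane_lines: sorted x positions of the vertical lane reference lines;
    lane i lies between lane_lines[i - 1] and lane_lines[i].
    """
    i = bisect_right(lane_lines, x)
    if i == 0 or i == len(lane_lines):
        return None
    return i

def in_zone(y, zone):
    """True if y lies inside a horizontal zone given as (top, bottom)."""
    top, bottom = zone
    return top <= y < bottom

lane_lines = [40, 110, 180, 250]   # three lanes in a 270-pixel-wide region
detection_zone = (90, 110)         # hypothetical zone limits in pixels
print(lane_index(75, lane_lines), lane_index(200, lane_lines), in_zone(95, detection_zone))
```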
Smoothing: Due to the low quality of the images captured from CCTV cameras (320 × 240 pixels), smoothing of the image is performed to eliminate noise. Gaussian smoothing is preferred, as the noise or the nature of the object detected can be modeled by a Gaussian probability function. The ROI in each frame is convolved with a discrete approximation of the 2-D circular Gaussian function:

$$G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-(x^2 + y^2) / (2\sigma^2)} \quad (4)$$

Color conversion: This task converts the color image/frame in the Region of Interest (ROI) from color values to gray-scale values. The video frames captured by the frame grabber device are in additive RGB color format. The gray-scale image is derived using the following formula:

$$\text{intensity} = 0.2989 \cdot \text{red} + 0.5870 \cdot \text{green} + 0.1140 \cdot \text{blue}$$
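The smoothing and color conversion tasks can be sketched in pure Python as follows. The kernel size, the value of sigma and the function names are assumptions for illustration; the kernel is normalized so that its entries sum to one, and the smoothing is applied to a single channel with border pixels left unchanged.

```python
import math

def gaussian_kernel(size, sigma):
    """Normalized size x size discrete approximation of a 2-D Gaussian."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def to_gray(r, g, b):
    """Gray-scale intensity from additive RGB values."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def smooth(img, kernel):
    """Convolve the interior of a single-channel image with the kernel.

    The kernel is symmetric, so convolution and correlation coincide.
    Border pixels are copied unchanged.
    """
    n = len(kernel)
    half = n // 2
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(half, h - half):
        for x in range(half, w - half):
            out[y][x] = sum(kernel[j][i] * img[y + j - half][x + i - half]
                            for j in range(n) for i in range(n))
    return out

kernel = gaussian_kernel(3, 1.0)
frame = [[to_gray(200, 180, 160)] * 4 for _ in range(4)]
print(smooth(frame, kernel)[1][1])
```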

Vehicle detection using Harris-Stephens corner interest point detection:
The corner points of the vehicle in the detection zone are determined by the above mentioned Harris-Stephens corner point detector [6]. The corner points are used to detect the vehicles and determine vehicle counts. A thresholding scheme based on proximity is used to determine the interest points belonging to the same vehicle. If the interest points are within the threshold proximity but are in different lanes, then the interest points are considered disjoint. The centroid of these corner interest points is calculated using Equation (1) and marked in the frame. Using the centroid locations, traffic flow parameters such as total vehicle count, vehicle count/lane, total volume and volume/lane can be determined.
Vehicle Tracking and Speed Calculations: Using the HS corner detector and thresholding, the interest points and the centroid of these interest points are determined across the detection and speed zones. The detection and speed zones are placed adjacent to each other. Care should be taken not to place them too far from each other, and the zones are also small compared to the size of the vehicle seen in the image frame. This approach also assumes that there is no large velocity change of the vehicle. The centroids of the interest points between frames are matched using the point correspondence method discussed above. Since the direction of vehicle travel or flow has been specified using the user interface, matching of correspondence points (centroids) is based on proximity and smooth motion constraints. Only the correspondence of centroids in their respective lanes is considered. This approach will result in false detections and speeds if vehicles change lanes in the detection/speed zones. The pixel displacement of the vehicle centroid is determined across frames (Equation (2)) and the speed of the vehicle is determined using the formula in Equation (3). Based on the above process, the system collects the average speed of the vehicles and the average speed of the vehicles per lane.
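The proximity and lane based grouping of interest points can be sketched in Python as follows. This is a simplified single-pass grouping under stated assumptions: each interest point already carries its lane index, two points belong to the same vehicle if they are in the same lane and within `max_gap` pixels of each other in both x and y, and a vehicle is counted only if its group holds at least `min_points` interest points. The names and thresholds are hypothetical, and a single greedy pass may leave two groups unmerged in cases that a full clustering would join.

```python
def group_by_vehicle(points, max_gap):
    """Group interest points (lane, x, y) into per-vehicle groups."""
    groups = []  # each group: {"lane": lane, "points": [(x, y), ...]}
    for lane, x, y in points:
        for g in groups:
            near = any(abs(x - px) <= max_gap and abs(y - py) <= max_gap
                       for px, py in g["points"])
            # Points in different lanes are disjoint even when close.
            if g["lane"] == lane and near:
                g["points"].append((x, y))
                break
        else:
            groups.append({"lane": lane, "points": [(x, y)]})
    return groups

def count_vehicles(points, max_gap, min_points):
    """Vehicle count per lane from grouped interest points."""
    counts = {}
    for g in group_by_vehicle(points, max_gap):
        if len(g["points"]) >= min_points:
            counts[g["lane"]] = counts.get(g["lane"], 0) + 1
    return counts

points = [(1, 50, 100), (1, 54, 103), (1, 58, 98),
          (2, 62, 101), (2, 66, 104), (1, 52, 160)]
print(count_vehicles(points, max_gap=10, min_points=2))
```

In this example the isolated point at (52, 160) forms its own group but is not counted, since it falls below the minimum number of interest points.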

Implementation
This section discusses the implementation of the vehicle detection and tracking algorithm based on the Harris-Stephens corner detector and interest point correspondence discussed in Section 3. The vehicle detection and tracking algorithm was used to determine vehicle counts, speed and volume of vehicles in a roadway segment and by lane. The developed system was used as a vehicle detector for a real-time advance warning ITS system. The video frames from live video feeds from CCTV cameras were captured using the Hauppauge Live USB frame grabber device. The video frames were captured at (N) 29.95 frames/sec. The vehicle detection and tracking algorithms were developed on the VC++ development platform using the OpenCV image processing library. The captured video frame size is 320 × 240 pixels. The calibration process is supported by a user interface that converts the real-world geometry to frame or pixel geometry. The user interface allows the user to specify the units of the distance reference line in feet, the direction of vehicle flow, horizontal lines to specify the detection and speed zones, and vertical lines to specify the lane demarcations. Figure 1 shows a snapshot of the virtual calibration lines and detection zones on the captured frame. In order to offset the varying point of view or angle of view of the CCTV camera, the lane demarcation lines are double lanes for our application. The cropped frame is convolved with the Gaussian function as shown in Equation (4) to minimize the noise in the video frame. Using the HS corner detector method, the corner points are determined (Figure 2(a)). Interest points are further determined using non-maxima suppression and thresholding (Figure 2(b)). The developed vehicle detection system was evaluated on a series of video feeds (8 sets) of 1 min. interval that were recorded at many locations around the Las Vegas valley. The video feeds vary by location, illumination (recorded during different times of the day), road dimension, camera view angle and region of view. The system was evaluated for vehicle count and speed. Vehicle speeds during video recording were determined using radar technology. Vehicle counts were manually verified. Table 1 summarizes the average count of vehicles and the average speed of the vehicles in each lane over all sets of evaluation video.

Vehicle Detection Application for ITS
The developed video based vehicle detection system was employed for advance warning of congestion and queues at work zones and on freeways during special events. The advance warning system consists of a series of video monitoring stations equipped with video recording devices and our video based vehicle detection system. Vehicle queue lengths, speeds and counts were monitored before work zones or special event locations, and real-time information regarding congestion was transmitted using Radio Frequency (RF) modules with directional antennas to a portable variable message sign trailer a few miles downstream. Figure 5 shows the advance warning system implementation at one of our test sites. The evaluations of the system were conducted at various times of the day, and the vehicle speeds were evaluated with and without the advance warning system in operation.
The significance of the advance warning system on vehicle speeds by lane is shown in Figure 6(a).

Results and Discussions
The advance warning system was developed, implemented and evaluated using many off-the-shelf components (cameras, digital video recorders, RF communication modules, etc.) and the developed video based vehicle detection system. The initial video based detection system employed (Autoscope) requires calibration, contrast adjustments, and fine-tuning of configuration parameters (camera height) for accurate results. Therefore, a video based vehicle detection system was developed using the Harris-Stephens Corner detection Method (HSCM) to eliminate the need for complex calibration and contrast modifications. The performances of HSCM and Autoscope are compared for vehicle speed and count. The performance of HSCM is better than that of Autoscope with respect to vehicle speed. The HSCM provides an average speed of 64 mph compared to 62 mph determined by the Autoscope. Earlier speed tests using radar devices indicate that the Autoscope determined speeds 5 mph less than the actual speed. Therefore, HSCM provides better accuracy for speed than Autoscope. The performance of HSCM is better than Autoscope for vehicle counts in Lanes 1 and 2 (the lanes closer to the camera). But for Lanes 3 and 4, the vehicle count accuracy degrades significantly. This is due to the skew of the camera field of view. As the camera is installed on the light pole in the median, there is a considerable skew in the captured video that results in elevated vehicle occlusions. This results in some count errors in vehicle detection on the respective lanes. Also, due to the camera field of view skew, some interest points of certain vehicles in lanes 2 and 3 are detected in both lanes (lanes 2 and 3 and lanes 3 and 4 respectively). This results in a single vehicle being counted in both lanes (lanes 3 and 4). The above problem can be rectified by adjusting the camera field of view to minimize occlusion of vehicles on adjacent lanes. Proper placement of the virtual vertical lane reference lines and employing camera calibration methods or transforms will also reduce the count errors in lanes 3 and 4. Lanes 1 and 2 produce better results as they are less affected by the camera skew. Autoscope performed better for this type of video, as it adopts a background subtraction method rather than an interest point detection method for vehicle detection. Future efforts will focus on improving our current vehicle counts in lanes 3 and 4 using the solutions discussed above. Also, shadow elimination algorithms will be employed for vehicle detection and classification.
Finally, the contributions of this work can be summarized as follows: 1) a vehicle detection and tracking system based on interest point detection and tracking using the Harris-Stephens corner detector and point correspondence was developed, 2) the vehicle detection and tracking system is capable of determining vehicle counts and vehicle speeds, 3) the system can determine vehicle counts and speeds from low resolution video feeds in real-time under various illumination conditions with very little configuration and calibration, and 4) the vehicle detection system was used as part of an advance warning Intelligent Transportation System (ITS).
The interest points are shown by the bounding boxes and the centroid point is shown as a red dot in Figure 2(c). The bounding boxes denote the proximity region used to determine whether these interest points belong to the same vehicle. The centroid of the vehicle is determined by the formula in Equation (1), and a vehicle count is incremented or validated if a set number of interest points are detected in the detection zone, as shown in Figures 3(a)-(c). Vehicle speed is determined from the displacement of the vehicle centroids across three video frames, j − k, j and j + k, as shown in Figures 4(a)-(c). The speed of the vehicle is determined as discussed in Equation (3).
The region of interest selected by the user is by default 270 × 160 pixels.
Table 2 and Figure 6(b) summarize the total average speed of the vehicles for all sets of evaluation videos.

Figure 1. Calibration lines for count and speed evaluation.

Figure 4. Detection of Harris-Stephens corner interest points on vehicles at frames: j − k, j and j + k.

Figure 5. Advance warning system implementation at test site.

Figure 6. (a) Effectiveness of advance warning system on vehicle speed; (b) Graphical comparison of the HSCM method and Autoscope for vehicle speed.

[...] of 5 miles/hr in speeds and less traffic congestion at work zones and special events due to the deployment of the advance warning system. The next section discusses the performance of the detection algorithm and the impact of the advance warning system.