Morphological Background Model for the Analysis of Traffic Flows in Urban Areas
1. Introduction
Real-time traffic monitoring has become essential for modern cities, especially as urban growth creates complex mobility challenges. Intelligent mobility solutions, based on technologies such as the Internet of Things (IoT) and artificial intelligence, enable the collection, analysis, and management of data in real-time, optimizing vehicular flow and improving road safety [1] [2]. These technologies are part of a trend toward developing more sustainable and efficient transportation systems that help alleviate congestion in urban areas [3]. A widely adopted solution in computer vision is background subtraction, a technique that detects moving objects by differentiating between a scene’s dynamic and static elements [4]. However, conventional background subtraction models face significant difficulties under uncontrolled conditions, such as abrupt changes in illumination, the presence of shadows, and the variability of objects in the scene [5]. These problems are particularly critical in outdoor applications, where conditions change unpredictably.
Various statistical approaches, such as Gaussian mixtures [6], and methods based on non-parametric models [7], have been widely investigated to address these limitations. However, many of these approaches require a considerable amount of computational resources due to the need for floating-point calculations and continuous adaptation to variations in environmental conditions [6]. This limits their application in embedded systems or devices with limited resources, such as drones or autonomous surveillance systems [8].
Despite advances in this field, it remains necessary to develop robust and efficient algorithms that can adapt to changing environments without requiring high processing power [9]. In the context of smart cities, as established by the UN’s 2030 agenda, the goal is to implement resilient technologies that can adapt to evolving technological, social, and environmental needs. In particular, cities are expected to efficiently utilize road resources, which requires systems that can measure and analyze variables such as traffic density, vehicular flow, and space occupancy without compromising individual privacy [10].
Automated monitoring systems through computer vision present a viable alternative, as they allow the inference of these variables in a non-intrusive manner by analyzing videos captured at strategic points. Although computer vision is not perfect and presents certain limitations in terms of precision, its ability to provide indirect data about the use of road resources without violating privacy is a considerable advantage [11]. However, current algorithms used in these systems are limited by their reliance on floating-point operations and their lack of adaptability in outdoor environments where lighting conditions and scene objects vary constantly.
Mathematical morphology is a tool in image processing that enables the analysis and processing of geometric structures through operations such as erosion, dilation, opening, and closing [12]. These operations differ from other approaches, such as statistical models or learning-based algorithms, by operating directly on the image structure without requiring prior training or adjustment of complex parameters. Mathematical morphology is based on set operations that highlight specific characteristics of images, such as edges or textures, thereby facilitating object segmentation with low computational complexity [13]. Furthermore, using morphological operations ensures that the structural integrity of objects is preserved, which is especially useful in applications where a detailed analysis of the geometry of objects in the scene is desired.
In urban mobility, drones have revolutionized traffic monitoring by providing an unobstructed aerial view, which allows for the capture of detailed information about vehicular flow and the occupancy of road infrastructures [14]. Drones equipped with computer vision systems can provide real-time data from a top-down perspective, eliminating perspective and distortion issues that typically occur with ground-based cameras [15]. This aerial monitoring capability facilitates the detection of traffic patterns and allows for a better understanding of vehicular dynamics, contributing to the implementation of strategies for mobility optimization and urban congestion reduction [16].
In this work, we propose a background model based entirely on morphological operations to highlight the textures and distinctive features of the scene, thereby enabling a more robust detection of moving objects. Moreover, using the mode as a criterion for updating the background model ensures continuous adaptation to gradual scene variations, such as illumination changes due to passing clouds or the sun’s movement. Operating exclusively with integer values avoids the computational overhead associated with floating-point operations, facilitating implementation in low-cost embedded systems such as FPGAs (Field Programmable Gate Arrays) or microcontrollers.
This approach has practical applications in urban traffic monitoring. By improving the detection of moving objects, it is possible to analyze their dynamic patterns over time and obtain information about traffic flow using drones. This enables the monitoring of urban areas without the need to physically instrument spaces to measure vehicular flow variables, thereby providing a portable monitoring technology adaptable to different urban areas.
2. Theoretical Foundations
2.1. Mathematical Morphology
Mathematical morphology enables the analysis of shapes and structures in images, initially in the realm of binary images and later extended to grayscale images [17] [18]. Its aim is to study and manipulate the geometric structures that form part of images, using operators that rely on the spatial relationships among the image elements [19]. This is fundamental to highlighting specific patterns without depending solely on pixel intensity, allowing morphological operators to extract relevant features for texture and shape analysis [12].
From a mathematical perspective, an image can be considered a graph in which each pixel is connected to its neighbors in a defined manner, generating a connected structure that facilitates manipulation through morphological operators [20]. In this context, the concept of connectivity defines how the elements within the image (such as pixels or groups of pixels) are interrelated to form observable patterns, which enables the use of techniques such as dilation, erosion, opening, and closing [13]. These techniques transform the image in such a way that the spatial properties of the regions of interest are either highlighted or smoothed, which is essential for the identification and analysis of internal structures [12].
Figure 1. Texture Representation. (a) Original grayscale image. (b) Texture representation based on intensities. Source: Own elaboration.
In video image analysis, the structures of interest can be represented by the intensity changes in each frame, as these changes encode the texture and shape information of the objects present in the scene [21]. The texture of the objects, in this context, is manifested in the variations of intensities reflected in the video (see Figure 1). Due to this relationship, the use of morphological operators allows for the implementation of analytical processes capable of detecting the dominant textures of the scene over time [12]. In this way, it becomes possible to identify and catalog elements that alter the structure of the dominant texture, thereby allowing the detection and classification of moving objects within the visual environment.
2.2. Main Morphological Operations
Dilation: Dilation is a basic operation in mathematical morphology that expands the regions of an image by connecting and enlarging the high-intensity areas in grayscale images. This process uses a structuring element—a small matrix or template with a specific shape—to define how dilation is applied to each image pixel. By comparing the local area around each pixel with the structuring element, dilation allows adjacent bright regions to merge and makes certain structural details more prominent, which enhances the visibility of edges or internal features of objects [17] [22].
The dilation of a set $A$ by a structuring element $B$ is defined as:

$A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \emptyset \}$ (1)

In the case of a grayscale image $f$, the dilation is expressed as:

$(f \oplus b)(x, y) = \max_{(s, t) \in D_b} \{ f(x - s, y - t) + b(s, t) \}$ (2)

where $(\hat{B})_z$ is the translated structuring element and $b(s, t)$ is the function that defines the value of the structuring element at position $(s, t)$.
Erosion: Unlike dilation, erosion reduces the high-intensity regions in an image, eliminating small elements and, at times, high-frequency noise in frames. When applying erosion with a structuring element, the operation evaluates each pixel’s neighbors and reduces the size of regions based on the shape and size of the element. This process is beneficial for isolating individual structures and cleaning up small imperfections that might interfere with the morphological analysis of an image [23].
The erosion of a set $A$ by a structuring element $B$ is defined as:

$A \ominus B = \{ z \mid (B)_z \subseteq A \}$ (3)

In a grayscale image, the erosion is expressed as:

$(f \ominus b)(x, y) = \min_{(s, t) \in D_b} \{ f(x + s, y + t) - b(s, t) \}$ (4)
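With a flat structuring element ($b(s, t) = 0$ over its support), Equations (2) and (4) reduce to a local maximum and minimum over the window. The following integer-only numpy sketch illustrates this; it is a minimal illustration, not the implementation used in this work:

```python
import numpy as np

def grey_dilate(f: np.ndarray, k: int = 3) -> np.ndarray:
    """Flat k x k grayscale dilation, Eq. (2): local maximum over the window."""
    r = k // 2
    p = np.pad(f, r, mode="edge")  # replicate borders to avoid edge artifacts
    out = f.copy()
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, p[dy:dy + f.shape[0], dx:dx + f.shape[1]])
    return out

def grey_erode(f: np.ndarray, k: int = 3) -> np.ndarray:
    """Flat k x k grayscale erosion, Eq. (4): local minimum over the window."""
    r = k // 2
    p = np.pad(f, r, mode="edge")
    out = f.copy()
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, p[dy:dy + f.shape[0], dx:dx + f.shape[1]])
    return out

f = np.zeros((5, 5), dtype=np.uint8)
f[2, 2] = 200        # a single bright pixel
d = grey_dilate(f)   # the bright pixel spreads to its 3 x 3 neighbourhood
e = grey_erode(d)    # erosion shrinks the dilated region back to one pixel
```

Note that both operations use only comparisons and integer copies, which is what makes them attractive for the embedded targets mentioned in the introduction.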
Opening: Opening is a composite operation that consists of an erosion followed by a dilation. This process allows the smoothing of irregular contours and the elimination of small elements that are not part of the main structure of an image. It is particularly useful for separating objects in the image that are connected by thin structures or unwanted noise. By applying opening, the general shapes of objects can be maintained without additional elements that distort the analysis, thus facilitating segmentation and the identification of patterns in a complex image [12].
For a set $A$ and a structuring element $B$, it is defined as:

$A \circ B = (A \ominus B) \oplus B$ (5)

In a grayscale image, opening is applied as:

$f \circ b = (f \ominus b) \oplus b$ (6)
Closing: Closing is the inverse process of opening, as it consists of applying a dilation first and then an erosion. Its main purpose is to fill gaps or small spaces in the image structures, connecting nearby regions and smoothing contours. In image processing applications, closing is effective in merging disjoint regions that form part of the same visual structure and eliminating small spaces between areas that should be analyzed as a single entity. This is useful for consolidating objects with irregular contours or removing small holes within solid regions [12] [23].
For a set $A$ and a structuring element $B$, it is defined as:

$A \bullet B = (A \oplus B) \ominus B$ (7)

In a grayscale image, closing is applied as:

$f \bullet b = (f \oplus b) \ominus b$ (8)
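Equations (6) and (8) can be sketched under the same flat-window assumption: opening removes a bright speck smaller than the structuring element, while closing fills a dark one. These are illustrative helpers, not the paper's code:

```python
import numpy as np

def _window(f, k, reduce_fn):
    """Apply reduce_fn (np.minimum or np.maximum) over a flat k x k window."""
    r = k // 2
    p = np.pad(f, r, mode="edge")
    out = f.copy()
    for dy in range(k):
        for dx in range(k):
            out = reduce_fn(out, p[dy:dy + f.shape[0], dx:dx + f.shape[1]])
    return out

def opening(f, k=3):
    """Erosion followed by dilation, Eq. (6)."""
    return _window(_window(f, k, np.minimum), k, np.maximum)

def closing(f, k=3):
    """Dilation followed by erosion, Eq. (8)."""
    return _window(_window(f, k, np.maximum), k, np.minimum)

bright_speck = np.full((5, 5), 100, dtype=np.uint8)
bright_speck[2, 2] = 255   # small bright element on a flat background
dark_hole = np.full((5, 5), 100, dtype=np.uint8)
dark_hole[2, 2] = 0        # small dark hole in a solid region
```

Applying `opening(bright_speck)` flattens the image back to 100 everywhere, and `closing(dark_hole)` does the same for the dark pixel, matching the descriptive behavior above.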
2.3. Top-Hat Morphological Transform
The top-hat transform is defined as the difference between the original image and its morphological opening [17], thereby highlighting objects that are brighter than the background.
$T_{\text{hat}}(f) = f - (f \circ b)$ (9)

where $f$ is the original image and $f \circ b$ is the image after applying the opening operation. This step is crucial to enhance the textures and elements that will later allow for a more precise identification of vehicles and other moving objects. The effect of this operation is illustrated in Figure 2, which shows the result of applying the top-hat morphological transform to highlight brighter structures against the background. The top-hat morphological transform enables the highlighting of high-contrast areas when applied with flat structuring elements [22]. This property is used to detect the texture of the acquired image so that it can be employed to determine the stable intensity without being affected by translational changes along the intensity axis (illumination changes).
(a) Original image.
(b) Image after applying the top-hat morphological transform.
Figure 2. Implementation of the top-hat morphological transform. Source: Own elaboration.
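Because opening is anti-extensive ($f \circ b \le f$ pixel-wise), the subtraction in Equation (9) never underflows even in unsigned integer arithmetic. A minimal sketch under the same flat-window assumption used above (illustrative, not the paper's implementation):

```python
import numpy as np

def _window(f, k, reduce_fn):
    """Apply reduce_fn (np.minimum or np.maximum) over a flat k x k window."""
    r = k // 2
    p = np.pad(f, r, mode="edge")
    out = f.copy()
    for dy in range(k):
        for dx in range(k):
            out = reduce_fn(out, p[dy:dy + f.shape[0], dx:dx + f.shape[1]])
    return out

def top_hat(f, k=3):
    """Eq. (9): original minus its opening. Keeps structures brighter than
    the local background and smaller than the k x k structuring element."""
    opened = _window(_window(f, k, np.minimum), k, np.maximum)
    return f - opened  # safe in uint8: opened <= f at every pixel

f = np.full((5, 5), 50, dtype=np.uint8)
f[2, 2] = 200   # small bright structure on a flat background
t = top_hat(f)  # background maps to 0; the structure keeps its contrast
```

The flat background maps to zero regardless of its absolute intensity, which is exactly the illumination-invariance property exploited in the background model.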
3. Methodology
This work implements a background subtraction algorithm based entirely on morphological operations and integer logic. The scheme of the proposal is illustrated in Figure 3. Taking this diagram as a reference, the presented proposal is described step by step.
Figure 3. Flowchart of the methodology. Source: Own elaboration.
3.1. Video Capture
The process of video capture, preprocessing, and noise reduction involves selecting the urban artery of interest and positioning the drone at a specific altitude with a northward orientation. The drone then records video of the observable vehicle dynamics. The image sequence acquired by the drone contains noise from various sources, such as sensor limitations, compression artifacts, and atmospheric conditions. For simplicity and to reduce computational overhead, this work assumes zero-mean noise ($\mu = 0$) and applies a low-pass filter for basic smoothing. While this assumption simplifies preprocessing, it does not fully represent real-world noise characteristics, which often include non-Gaussian and structured noise patterns. Future improvements will consider more advanced denoising techniques, such as median filtering or wavelet-based denoising, to better address the complexity of noise in drone-based imagery. Additionally, color saturation correction is applied to ensure that the capture spans the entire visible spectrum of the camera. For the development of the proposal, each frame is considered in grayscale.
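The text does not fix a specific low-pass filter or grayscale conversion; as a hedged sketch, an integer Rec.601 luma conversion followed by a k x k box filter keeps the whole preprocessing stage in integer arithmetic, consistent with the paper's goals:

```python
import numpy as np

def to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    """Integer Rec.601 luma: (299 R + 587 G + 114 B) // 1000, no floats.
    The exact conversion used in the paper is not specified."""
    r = frame_rgb[..., 0].astype(np.uint32)
    g = frame_rgb[..., 1].astype(np.uint32)
    b = frame_rgb[..., 2].astype(np.uint32)
    return ((299 * r + 587 * g + 114 * b) // 1000).astype(np.uint8)

def box_smooth(gray: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k box low-pass filter using only integer sums (assumed filter)."""
    r = k // 2
    p = np.pad(gray.astype(np.uint32), r, mode="edge")
    acc = np.zeros(gray.shape, dtype=np.uint32)
    for dy in range(k):
        for dx in range(k):
            acc += p[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return (acc // (k * k)).astype(np.uint8)

white = np.full((4, 4, 3), 255, dtype=np.uint8)
g = box_smooth(to_gray(white))  # constant input stays constant
```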
3.2. Background Model Estimation
The background model is built using the mode resulting from the image after applying the top-hat transform, which is selected pixel by pixel across the video frames. This approach is chosen to avoid the use of floating-point operations, allowing all processing to be carried out with integer values, thereby improving computational efficiency and facilitating implementation in embedded systems.
The mode-based background model has the advantage of continuously adapting to gradual changes in the scene, such as variations in lighting or the passage of clouds, resulting in a more precise and robust background estimation over time.
As the video progresses, the algorithm dynamically updates the background model. When drastic changes in the scene are detected, such as a sudden increase in the number of moving objects, the model is reset, eliminating previously stored values and beginning a new adaptation process. This mechanism allows the algorithm to be resilient to abrupt changes in the scene without losing accuracy. Additionally, the use of a buffer of previous values enables the algorithm to respond quickly to these changes, ensuring a rapid convergence towards a stable background model.
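The pixel-wise mode can be maintained with nothing more than per-pixel integer counters. The following is a simplified sketch; the paper's exact buffer size and reset criterion are not detailed here, so the reset is exposed as an explicit method:

```python
import numpy as np

class ModeBackground:
    """Per-pixel histogram of top-hat intensities; the background estimate
    at each pixel is its most frequent (mode) value seen so far."""

    def __init__(self, shape, levels=256):
        self.hist = np.zeros(shape + (levels,), dtype=np.uint16)
        self._ys, self._xs = np.indices(shape)

    def update(self, frame: np.ndarray) -> None:
        # Increment each pixel's counter for its current intensity.
        self.hist[self._ys, self._xs, frame] += 1

    def background(self) -> np.ndarray:
        # Mode = index of the largest counter, still integer-only.
        return np.argmax(self.hist, axis=-1).astype(np.uint8)

    def reset(self) -> None:
        # Invoked on drastic scene changes to restart adaptation.
        self.hist[:] = 0

model = ModeBackground((4, 4))
for _ in range(3):
    model.update(np.full((4, 4), 10, dtype=np.uint8))  # stable background
model.update(np.full((4, 4), 200, dtype=np.uint8))     # transient object
bg = model.background()                                # mode is still 10
```

A transient vehicle adds only a few votes to its intensity bin, so the mode, and therefore the background estimate, is unaffected until a value persists long enough to dominate.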
3.3. Execution of the Algorithm
In each frame, the difference between the estimated background and the current image is calculated to identify areas with moving objects. This difference is thresholded and subjected to basic morphological operations to reduce noise and connect regions of motion, thereby allowing for precise segmentation of vehicles.
The background is gradually adjusted to reflect changes in the scene, while a buffer of previous values enables the detection and rapid response to abrupt changes, resetting the model where necessary. This process ensures that the background remains stable, effectively differentiating static elements from dynamic ones.
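The detection step described above can be sketched as follows: absolute difference against the estimated background in 16-bit integer arithmetic, the fixed threshold of 40 reported in Section 4, and a binary opening to suppress isolated noise pixels. This is an illustrative reading, not the paper's exact pipeline:

```python
import numpy as np

THRESHOLD = 40  # fixed value used in the experiments (Section 4)

def foreground_mask(frame, background, thr=THRESHOLD):
    """1 where |frame - background| exceeds the threshold, else 0."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > thr).astype(np.uint8)

def _window(m, k, reduce_fn):
    r = k // 2
    p = np.pad(m, r, mode="edge")
    out = m.copy()
    for dy in range(k):
        for dx in range(k):
            out = reduce_fn(out, p[dy:dy + m.shape[0], dx:dx + m.shape[1]])
    return out

def clean(mask, k=3):
    """Binary opening (erode then dilate) removes isolated noise pixels."""
    return _window(_window(mask, k, np.minimum), k, np.maximum)

background = np.full((8, 8), 100, dtype=np.uint8)
frame = background.copy()
frame[1:4, 1:4] = 200   # a 3 x 3 vehicle-like blob
frame[6, 6] = 200       # an isolated noise pixel
mask = clean(foreground_mask(frame, background))
```

The blob survives the opening while the lone pixel is removed, which is the noise-reduction behavior the text describes.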
4. Discussion and Analysis of Results
To validate the proposal, a set of tests was conducted on outdoor videos. The characteristics of the videos are described below:
For each video, the algorithm is applied and the moving areas are detected. The operational conditions of the proposal are presented below:
The fixed threshold value of 40 for motion detection was empirically determined based on tests in controlled lighting conditions and scenes with moderate contrast between moving objects and the background. This choice simplifies the implementation and maintains computational efficiency by avoiding pixel-level adaptivity. However, fixed thresholds can be sensitive to illumination changes and scene variability. As a future improvement, adaptive thresholding strategies—such as local statistical analysis, percentile-based methods, or entropy-based segmentation—will be considered to enhance robustness in more dynamic environments.
4.1. Background and Foreground Estimation
A method based on constructing a data cube was implemented to analyze the evolution of each pixel’s intensity over time. This data cube allows the representation of the histogram of intensities for a specific row of pixels in a video sequence, providing a detailed view of how the intensities change and distribute over time.
The implementation begins with the selection of a row of pixels in the image. For each frame of the video, the intensity values of each pixel in this selected row are extracted. Subsequently, these values are recorded and organized into a histogram, a process repeated for each frame in the video sequence. This process results in a three-dimensional representation (the data cube), in which:
The X-axis represents the position of each pixel in the selected row.
The Y-axis corresponds to the histogram counts of the intensities for each pixel.
The Z (or temporal) axis represents the progression of frames in the video.
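The construction just described can be sketched as follows. This is one illustrative reading of the text, in which the row histogram accumulates frame by frame so that slice t holds the per-pixel counts after t + 1 frames:

```python
import numpy as np

def row_histogram_cube(frames, row, levels=256):
    """Return a (T, W, levels) cube where cube[t, x, v] counts how many of
    the first t + 1 frames had intensity v at pixel x of the selected row."""
    T, W = len(frames), frames[0].shape[1]
    cube = np.zeros((T, W, levels), dtype=np.uint32)
    counts = np.zeros((W, levels), dtype=np.uint32)
    xs = np.arange(W)
    for t, frame in enumerate(frames):
        counts[xs, frame[row]] += 1  # one vote per pixel per frame
        cube[t] = counts
    return cube

# Five frames of a stable background, with a transient object in frame 2.
frames = [np.full((3, 4), 7, dtype=np.uint8) for _ in range(5)]
frames[2] = np.full((3, 4), 90, dtype=np.uint8)
cube = row_histogram_cube(frames, row=1)
```

The dominant background intensity accumulates the tallest peak at every pixel, while the transient object leaves only a shallow secondary bin, which is exactly the pattern of peaks and valleys discussed for Figure 4.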
Figure 4 represents the data cube from a perspective that facilitates observing how each pixel’s intensity changes over time. This representation allows for the analysis of texture patterns and movements throughout the scene, highlighting the fluctuations in histogram counts that indicate variations in illumination or moving objects. In particular, the peaks and valleys in the histogram counts signal significant intensity changes in the pixels, which are often associated with the presence of moving objects or modifications in lighting conditions. This information is valuable for distinguishing between the static background of the image and the dynamic variations appearing in the scene, providing a solid foundation for motion detection and background analysis.
Figure 4. Pixel profile line illustrating the histogram of intensities encoded in pseudocolor. Dark areas represent low frequency and light areas represent high frequency. Source: Own elaboration.
In Figure 5(a), we show the roundabout image in its original state, captured in color and displaying both moving vehicles and road infrastructure. This image provides a visual basis of the environment, with all elements present—including vehicles, traffic signs, and pavement features. The inclusion of color in this image allows for a better interpretation of the different components and highlights the traffic elements in contrast to the static street background. This initial state serves as a visual reference for observing the changes obtained with the background model.
In Figure 5(b), the estimated background after applying the morphological transform and the statistical mode model is presented. As a result, only the road infrastructure is visible without vehicles or other dynamic elements. The application of the morphological transform has allowed for smoothing out irregularities in the image and accentuating the structural characteristics of the street, effectively eliminating the moving parts. This result confirms that the model based on statistical mode is effective in identifying consistent patterns over time, such as the pavement, and in eliminating transient elements, thereby achieving the goal of obtaining a clean background representation.
(a) Original image
(b) Estimated background using statistical mode after the top-hat morphological transform
(c) Estimated foreground (difference between the original image and the estimated background)
Figure 5. Implementation of the proposed model. Source: Own elaboration.
Finally, Figure 5(c) shows the estimated foreground, generated by calculating the difference between the original image and the estimated background. In this image, the moving vehicles clearly stand out against the background, providing an accurate representation of the dynamic elements. The difference in intensities highlights the vehicles, demonstrating that the proposed model is capable of effectively detecting and segmenting moving objects. This approach facilitates traffic analysis, as it allows for the identification of vehicle flow and other dynamic patterns at the roundabout without interference from the static background. Moreover, the clarity with which the vehicles are presented in the foreground underscores the model’s accuracy in preserving the details relevant for the analysis.
4.2. Intensity Behavior
Figure 6(a) represents the intensity of an arbitrary pixel from the video over time. A base intensity is observed, representing the background color corresponding to the surface of the avenue, and generally, it remains stable in intensity values, albeit with some fluctuations. These minor variations reflect changes in illumination conditions throughout the day, possibly due to solar movement or the presence of clouds altering the direct lighting. The vertical lines of higher intensity correspond to the appearance of vehicles passing over the pixel. These significant variations stand out from the background’s base value, which is useful for differentiating vehicle movement from the static background.
Figure 6. (a) Intensity of a pixel over time. (b) Pixel intensity histogram. Source: Own elaboration.
This pattern demonstrates that the employed model can adapt to gradual changes in illumination, allowing the expected background value to adjust to the natural variations of the day without confusing them with vehicle movement. This suggests that the model is robust against light fluctuations—a fundamental characteristic for applications in outdoor environments where lighting conditions may vary considerably.
On the other hand, the histogram in Figure 6(b) shows the intensity distribution over a time interval of the video. A clear concentration is observed around certain intensity values that represent the background (the surface of the avenue). Although these values may exhibit slight variations due to changes in illumination, the distributions maintain a consistent shape in each channel, indicating stability in the background color.
This behavior reinforces the effectiveness of the model in capturing light variability without destabilizing the background representation, as the expected values do not undergo abrupt deviations that could be mistaken for movement. The stable shape of the histogram suggests that, despite intensity changes due to environmental conditions, the chromatic structure of the background is preserved, allowing for easy distinction of movement events when significant peaks appear in the histograms, caused by moving objects such as vehicles.
4.3. Limitations and Future Work
The current evaluation is limited to roundabout scenarios captured using a top-down camera angle. This choice was made to maintain a controlled environment, where the effects of occlusions and perspective distortion are minimized, facilitating the validation of the proposed morphological model. However, such focus restricts the generalizability of the findings to other urban traffic scenarios. Future evaluations will include more diverse scenes such as intersections, multi-lane roads, and environments with varying traffic densities and lighting conditions to comprehensively assess the algorithm's robustness. Testing under different weather conditions and oblique camera perspectives will also provide a broader understanding of the model's adaptability in real-world applications [24].

The algorithm also presents limitations when facing certain real-world challenges. The model assumes a stationary camera, and although drone positioning is generally stable, minor vibrations or shifts can affect background consistency and reduce detection accuracy. Another concern is the presence of shadows, which may be misclassified as moving objects due to their changing shape and intensity, especially in scenes with varying sunlight conditions. Complex background dynamics, such as moving vegetation, reflections, or intermittent occlusions, also pose challenges. Since the model is based on the mode and operates on texture differences, background elements with high temporal variability may be incorrectly classified as foreground. Addressing these issues would require the integration of shadow detection mechanisms, motion compensation techniques, or hybrid models combining morphological features with temporal coherence analysis.
4.4. Applications in Automatic Monitoring Systems
The developed algorithm has applications in automatic monitoring systems, where it is necessary to analyze and track the movement of objects in a scene. In the case of traffic surveillance systems, the algorithm enables the detection and tracking of vehicles based on a background model that isolates the dynamic elements of the scene [25]. This is useful for applications such as traffic monitoring at intersections, urban patrols, and the evaluation of movement patterns [26].
Figure 7 shows how, based on the background model, it is possible to extract the trajectories of moving vehicles. Each video frame is compared with the background model to detect areas of change in pixel intensity, thereby identifying moving vehicles [27]. Once detected, their trajectories can be tracked by superimposing contours or visual markers. Lines connect the positions of the vehicles in each frame, generating a clear representation of their trajectories in real time.
Figure 7. The image shows how, based on the background model, moving vehicles are identified and their trajectories traced in real time. Source: Own elaboration.
This technique allows for the analysis of vehicle flows and the detection of anomalous behaviors, such as lane changes or sudden stops. Additionally, it is useful for implementing vehicle counting systems, measuring average speed, and detecting congestion in critical areas [28].
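The paper does not detail its tracking step; as a hypothetical sketch, one can label 4-connected blobs in the foreground mask, take their centroids, and greedily link nearest centroids between consecutive frames to obtain the trajectory segments drawn in Figure 7:

```python
from collections import deque
import numpy as np

def blob_centroids(mask):
    """Centroids of 4-connected foreground blobs via BFS labelling."""
    visited = np.zeros(mask.shape, dtype=bool)
    H, W = mask.shape
    centroids = []
    for y in range(H):
        for x in range(W):
            if mask[y, x] and not visited[y, x]:
                queue, pts = deque([(y, x)]), []
                visited[y, x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pts.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < H and 0 <= nx < W
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                ys, xs = zip(*pts)
                centroids.append((sum(ys) / len(pts), sum(xs) / len(pts)))
    return centroids

def track(prev, curr, max_dist=20.0):
    """Greedy nearest-neighbour association between consecutive frames,
    yielding (previous, current) centroid pairs as trajectory segments."""
    pairs, used = [], set()
    for p in prev:
        best, best_d = None, max_dist
        for i, c in enumerate(curr):
            d = ((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2) ** 0.5
            if i not in used and d < best_d:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            pairs.append((p, curr[best]))
    return pairs

mask1 = np.zeros((6, 6), dtype=np.uint8)
mask1[1:3, 1:3] = 1                 # a blob in frame t
mask2 = np.zeros((6, 6), dtype=np.uint8)
mask2[2:4, 2:4] = 1                 # the same blob, shifted, in frame t + 1
pairs = track(blob_centroids(mask1), blob_centroids(mask2))
```

The `max_dist` gate is an assumed parameter bounding how far a vehicle may move between frames; in practice it would be tuned to the frame rate and flight altitude.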
5. Conclusions
One of the main achievements of this study is the ability of the mode-based background model to dynamically adapt to changes in the scene, such as variations in illumination due to the passage of clouds or the movement of the sun. This type of adaptability ensures that the algorithm is resilient to gradual changes, maintaining accuracy and reducing the need to constantly restart or recalibrate the system. Additionally, it has been demonstrated that updating the model with a buffer of previous values contributes to rapid and stable convergence, ensuring an efficient response to abrupt changes.
Nonetheless, there are some situations that can be addressed in future work. For example, evaluating criteria for the detection of areas saturated with moving objects, exploring the impact of different types of morphological operations, and developing a comparative analysis of their effectiveness in various contexts. Incorporating continuous adaptation models and long-term machine learning could also be explored to assess the consistency of the method.
In conclusion, the approach based on mathematical morphology and the use of the mode as a criterion for determining the expected value and updating the background model provide an effective and efficient solution for object detection in urban environments, with great applicability in urban traffic monitoring. The combination of computer vision algorithms with the aerial perspective provided by drones has significant potential to revolutionize the way smart cities manage their road resources, contributing to safer and more efficient mobility. Future work should focus on optimizing performance in more complex scenarios and on integrating hybrid approaches that combine mathematical morphology with machine learning-based methods, in order to further enhance the robustness and adaptability of video analysis systems.
Acknowledgements
We wish to thank the CIICCTE (Centro de Investigación e Innovación en Ciencias de la Computación y Tecnología Educativa) laboratory belonging to the FIF-UAQ, which provided technical and infrastructure support.