Indoor human detection based on Thermal Array Sensor data and adaptive background estimation

Low Resolution Thermal Array Sensors are widely used in several applications in indoor environments. In particular, one of these cheap, small and unobtrusive sensors provides a low-resolution thermal image of the environment and, unlike cameras, is capable of detecting human heat emission even in dark rooms. The obtained thermal data can be used to monitor older adults while they perform daily activities at home and to detect critical situations such as falls. Most studies in Activity Recognition using Thermal Array Sensors require Human Detection techniques to recognize humans passing in the sensor field of view. This paper aims to improve the accuracy of the algorithms used so far by taking into account the variation of the environment temperature. The method leverages an adaptive background estimation and a noise removal technique based on the Kalman Filter. In order to properly validate the system, a novel installation of a single sensor has been implemented in a Smart Environment: the obtained results show an improvement in human detection accuracy with respect to the state of the art, especially in disturbed environments.


Introduction
Facilitating the independent living of older adults has become an important topic of current research in the field of Assistive Technologies, fostered by governments worldwide. Indeed, as reported in the technical report of the United Nations [1], the over-60 world population is projected to grow by 56 percent in the next 15 years, from 901 million to more than 1.4 billion. The ageing process is particularly evident in Europe and in Northern America, where in 2015 more than one out of five people was over 60, and it is growing rapidly in other regions as well. Thus, Bierhoff et al. [2], for example, report Health-Care Systems among the products and services based on Smart Home technologies. In particular, since their aim is to detect critical conditions, or predict them at an early stage, and alert caregivers, Emergency Treatment services are crucial for older adults living in smart homes. Systems of this kind usually rely on a network of sensors to unobtrusively monitor the life of a person, providing feedback to his/her loved ones [3]. A concrete example of a feature widely demanded by the families of elderly people who live alone is a Fall Detector: falls are nowadays considered the most frequent hazard for older adults, and they may endanger the physical and psychological health of a person, hindering independent living [4]. For these reasons, monitoring elderly people at home makes them feel safer and helps their relatives to be more confident about the well-being of their loved ones.
Low Resolution Thermal Array Sensors (LR-TASs) are well suited to a home environment for several reasons. Thanks to their low resolution, these sensors provide useful data without invading the privacy of the dweller, as could happen with cameras or microphones. Furthermore, these devices are small, cheap and easy to install in a normal room, and they work even in the absence of light.
LR-TASs are composed of m×n infrared sensing elements, acquiring the temperature of a two-dimensional area. In this paper we refer to experiments conducted using the Grid-Eye [5] sensor developed by Panasonic. This device is an 8-by-8 LR-TAS with a sampling rate of 10 samples/s, a temperature range from -20 °C to 80 °C with 0.25 °C resolution, a field of view of 60°, and a declared maximum distance for detecting humans of 5 m. Moreover, the Grid-Eye sensor comes with an on-board thermistor that provides the environment temperature from -20 °C to 80 °C with 0.0625 °C resolution. It communicates through an I2C interface with a wireless station, which sends all the data to a central processing unit for analysis.

Previous works
Human localization has several applications in Smart Home Environments, e.g., surveillance, health monitoring, and energy management. In particular, LR-TASs have been used to accomplish different tasks.
The goal of the work proposed by Sixsmith et al. [6] is to detect falls of older adults through SIMBAD: Smart Inactivity Monitor using Array-Based Detectors. This system relies on two parallel modules to raise alarms. First, it analyzes target motion to detect the characteristic dynamics of falls. Then, it monitors target inactivity and compares it with a map of acceptable inactivity periods in different locations in the field of view. The system has been tested in the laboratory by simulating predefined fall scenarios, reporting a limited true positive rate without any false positives. This research also includes results from a two-month trial in a single-occupant house. After a training period in which experts tuned the system parameters according to the output, an unacceptably high false-alarm rate still emerged.
Erickson et al. [7] use a Thermal Array Sensor Network to measure the occupancy of a building. The provided information is used to control heating, cooling, ventilation, and lighting of the building in order to optimize energy usage. The method proposed by the authors consists of removing the background to detect the pixels of the matrix that refer to humans, followed by an analysis of the connected components using a K-Nearest Neighbors classifier to estimate the number of people. Unfortunately, no significant results have been reported.
Basu et al. [8] present a method to estimate the number of people and the direction of their motion from LR-TAS data. A Support Vector Machine has been used to classify connected components and local peak counts, estimating the number of persons with 80% accuracy. Finally, they inferred the direction of a subject's motion across a set of scenes using cross-correlation analysis.
Mashiyama et al. [9] report a system for Activity Recognition using LR-TASs. The proposed method aims to detect five activities (No event, Stopping, Walking, Sitting, and Falling) in three steps: human body detection, feature extraction and classification. Given the high performance of this method in classifying activities, as reported by its authors, we decided to analyze it further and reimplement it in order to compare the results obtained in a field trial.
For the sake of clarity and completeness, we report the key steps of the work proposed by Mashiyama et al. using the notation adopted in the rest of this paper. Given an instant of time t, the frame I(t) represents the set of measurements $T_{i,j}$ taken by an LR-TAS at time t, each one related to the corresponding pixel (i, j). Fixing a time window τ, the variance $v_{i,j}(t)$ of each pixel, with t ≥ τ, is computed over the most recent τ frames. If the obtained variance $v_{i,j}(t)$ exceeds a given threshold $V_{th}$, a moving person (Walking, Sitting, or Falling) is detected in the current frame. Conversely, if no movement has been detected, the discrimination between a Stopping person and No event is made according to the difference $T_{diff}$ between the person temperature $T_p$ and the background temperature $T_b$. Given $n_{temp}$ as the number of pixels covered by a standing person, the average of the first $n_{temp}$ pixels of a frame ordered by descending temperature gives $T_p$. Similarly, the average of the remaining pixels gives $T_b$. Finally, a standing person is detected only if $T_{diff}$ exceeds a given threshold $T_{th}$. The authors tested their system in a test-bed experiment, reporting particularly good accuracy in classifying the mentioned activities, especially when considering only the detection phase and excluding the activity classification method.
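The detection rule summarized above can be sketched in a few lines of Python. This is only an illustrative reading of the description, not the original implementation of [9]: the use of the population variance, the "any pixel exceeds the threshold" rule, the function name `detect_baseline` and the example threshold values are all assumptions.

```python
import numpy as np

def detect_baseline(frames, v_th, t_th, n_temp):
    """Classify the newest frame as 'moving', 'stopping' or 'no_event'.

    frames : array of shape (tau, rows, cols), the last tau frames, newest last.
    v_th, t_th, n_temp : variance threshold, temperature threshold and number
    of pixels covered by a standing person (illustrative values only).
    """
    frames = np.asarray(frames, dtype=float)
    # Per-pixel variance over the time window (population variance assumed).
    variance = frames.var(axis=0)
    if np.any(variance > v_th):
        return "moving"
    # No motion: compare the n_temp hottest pixels with the remaining ones.
    ordered = np.sort(frames[-1].ravel())[::-1]
    t_p = ordered[:n_temp].mean()
    t_b = ordered[n_temp:].mean()
    return "stopping" if (t_p - t_b) > t_th else "no_event"
```

For example, a window of identical frames with a 2×2 block of warmer pixels would be classified as "stopping", provided that the block is warmer than the rest of the frame by more than `t_th`.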
Most of the work on activity recognition and human detection using LR-TASs reports experimental data obtained from tests performed in a controlled environment. However, as highlighted by Sixsmith et al. [6], this approach has some limitations that have to be considered when building an effective indoor monitoring system. First of all, the positioning of the sensors must take into account the geometry of the environment and its contents, to ensure that the view of the sensor is not obstructed. Indeed, a system to detect falls would be useless if it could not guarantee its effectiveness over the entire walkable area of the house. Moreover, another factor requiring careful study is noise management: radiators, appliances, heaters or sunlight reflections have to be considered in the model of the system. Our work mainly focuses on improving human detection performance in the presence of noisy data.

Human detection
The following method aims at retrieving a probability estimation of the presence of at least one person in the LR-TAS field of view. The main steps of the algorithm are summarized in the flow presented in Figure 1: noise removal, background estimation and probabilistic foreground detection.

Noise removal
LR-TAS (Low Resolution Thermal Array Sensor) raw temperature data are affected by noise that perturbs the measured signal. This type of sensor usually exhibits low accuracy on a single measurement: the Grid-Eye sensor, for example, reports values within typically 2.5 °C [5]. In order to remove the effect of the noise on the measured signal, we consider the indoor environment as a dynamic system influenced by the external temperature, air conditioning systems, and the presence of people and appliances in general. Reducing the noise components of the measured signal makes the human detection system more robust to external variations. As already mentioned, the Grid-Eye is a matrix of 8 × 8 sensing elements, each measuring the temperature of a certain region of space. To get the temperature distribution in the space, the temperature evolution on a single region is treated as an independent dynamic system and, hence, the measurement made by a single sensing element is filtered independently of the other measurements.

Kalman Filtering
Consider a dynamic system S represented as follows:

$$ S:\quad x(t) = F\,x(t-1) + \xi(t) \tag{1} $$

$$ y(t) = s(t) + \eta(t), \qquad s(t) = H\,x(t) \tag{2} $$

where s(t) is the variable to be estimated, y(t) is the value obtained by measuring s(t), which is affected by the measurement noise term η(t), x(t) is the state variable at time t, ξ models the process noise, F is the system matrix and H is the measurement matrix. The proposed noise removal technique is based on the Kalman Filter (KF) [10] and is composed of two phases: extrapolation and correction. During the extrapolation phase, the filter computes a prediction $\tilde{x}(t)$ of the system state for the current step t using the system state estimate $\hat{x}(t-1)$ made at the previous step. During the correction phase, the state prediction $\tilde{x}(t)$ is adjusted by the current measurement y(t) to obtain the corrected estimate $\hat{x}(t)$. The state prediction is expressed as:

$$ \tilde{x}(t) = F\,\hat{x}(t-1) \tag{3} $$

while the state estimate is represented as:

$$ \hat{x}(t) = \tilde{x}(t) + K(t)\left[\,y(t) - H\,\tilde{x}(t)\,\right] \tag{4} $$

where K(t) is the Kalman Gain [10] at time t.
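For reference, in the standard formulation of the filter [10] the gain K(t) is obtained from a covariance recursion that runs alongside the extrapolation and correction phases. The symbols introduced here are not defined in the text above and are used only for illustration: $\tilde{P}(t)$ and $\hat{P}(t)$ denote the error covariances of the predicted and corrected state, while Q and R denote the covariances of the process noise ξ and of the measurement noise η.

```latex
\begin{aligned}
\tilde{P}(t) &= F\,\hat{P}(t-1)\,F^{\top} + Q \\
K(t)         &= \tilde{P}(t)\,H^{\top}\left(H\,\tilde{P}(t)\,H^{\top} + R\right)^{-1} \\
\hat{P}(t)   &= \left(I - K(t)\,H\right)\tilde{P}(t)
\end{aligned}
```

A larger R (noisier measurements) lowers the gain, so the filter trusts the prediction more. A larger Q raises it, so the filter follows the measurements more closely.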

LR-TAS data filtering
In order to obtain the expected value of the measured temperature, separating it from the noise component, we applied the Kalman-Filtering technique described in the previous paragraph. Let the state variable x(t) be represented by Thus, from Eq.
while the variable to be estimated s(t) can be derived from Eq. (2): The result of applying the KF to the measurements collected by a single sensing element is shown in Figure 2: the data were collected in a perturbed environment, with appliances in the sensor field of view and the air conditioning system switched on. The influence of these factors on the ambient temperature is evident from the room temperature measured by the on-board thermistor (Fig. 3).
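As a minimal sketch of per-pixel filtering, the following Python code applies a scalar Kalman filter independently to each sensing element. It assumes a random-walk temperature model (F = 1, H = 1), which is not necessarily the state definition adopted in this paper. The noise variances `q` and `r`, the initial values and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def kalman_filter_pixel(measurements, q=1e-3, r=0.5, x0=None, p0=1.0):
    """Filter one pixel's temperature series with a scalar Kalman filter.

    Assumes a random-walk model (F = 1, H = 1); q and r are the process- and
    measurement-noise variances, picked here only for illustration.
    """
    y = np.asarray(measurements, dtype=float)
    x_hat = y[0] if x0 is None else float(x0)
    p_hat = float(p0)
    filtered = np.empty_like(y)
    for t, y_t in enumerate(y):
        # Extrapolation: with F = 1 the prediction is the previous estimate.
        x_pred = x_hat
        p_pred = p_hat + q
        # Correction: blend prediction and measurement through the gain.
        k = p_pred / (p_pred + r)
        x_hat = x_pred + k * (y_t - x_pred)
        p_hat = (1.0 - k) * p_pred
        filtered[t] = x_hat
    return filtered

def kalman_filter_frames(frames, **kwargs):
    """Apply the scalar filter independently to each pixel of (T, rows, cols) frames."""
    frames = np.asarray(frames, dtype=float)
    flat = frames.reshape(frames.shape[0], -1)
    out = np.column_stack([kalman_filter_pixel(flat[:, i], **kwargs)
                           for i in range(flat.shape[1])])
    return out.reshape(frames.shape)
```

With a small `q` relative to `r`, the output changes slowly and suppresses frame-to-frame noise, at the cost of a delayed response to real temperature changes.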

Background Estimation
The fundamental assumption for discriminating humans from the background is that the human temperature distribution differs from the ambient temperature distribution. Under this condition, the human recognition task reduces to analyzing the difference between the current measurements of the sensor cells and the corresponding values of the estimated background temperature. Nevertheless, the background temperature estimation should adapt to changes in the environmental conditions, which can be relatively rapid.
Assuming that the thermistor measurement of the ambient temperature is almost unaffected by the presence of humans, this information can be used as a reference to detect changes in the environmental conditions. Thus, if the dependence between the background temperature $T_{b(i,j)}(t)$ of the sensing element in position (i, j) and the ambient temperature $T_a(t)$ is a function $f(T_a(t)) = T_{b(i,j)}(t)$, it is possible to compute $T_{b(i,j)}(t)$ from $T_a(t)$. The analysis of the sensor cell and thermistor measurements shows a linear dependence (Fig. 4): the average correlation coefficient between $T_a(t)$ and $T_{b(i,j)}(t)$ is 0.62, and between $T_a(t)$ and $\hat{T}_{b(i,j)}(t)$ is 0.9. Hence, the function f can be expressed as a linear model in $T_a(t)$ with parameters β. Computing the β that globally minimizes the least-squares error requires a large number of samples. In the final implementation of the proposed method, to shorten the learning period and let the background estimation algorithm work on-line, an approximation $\hat{\beta}$ of β is used [11]: it is computed (Eq. (10)) over a time window τ by means of the pseudo-inverse operation, denoted by ‡. The estimated background temperature is then computed from $\hat{\beta}$ and $T_a(t)$, and the squared residuals measure the deviation of the measurements from this estimate. Finally, since the estimation of $\hat{\beta}$ should involve only background-related measurements, Eq. (10) is extended using the analysis of the squared residuals.
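The windowed estimation can be illustrated with the following Python sketch for a single pixel. It assumes an affine model (intercept plus slope) for the linear dependence, and a simple gating rule that discards samples whose squared residual under the previous model exceeds a threshold. The affine form, the threshold `r_th` and the function name are assumptions made here for illustration and may differ from the formulation used in this work.

```python
import numpy as np

def estimate_background(t_ambient, t_pixel, tau, beta_prev=None, r_th=1.0):
    """Fit T_b ~ beta[0] + beta[1] * T_a on the last tau samples of one pixel.

    The affine form, r_th and the gating rule are assumptions for illustration.
    Returns (beta, background estimate for the newest sample, squared residual).
    """
    t_a = np.asarray(t_ambient, dtype=float)[-tau:]
    t_p = np.asarray(t_pixel, dtype=float)[-tau:]
    X = np.column_stack([np.ones_like(t_a), t_a])
    keep = np.ones(t_a.shape, dtype=bool)
    if beta_prev is not None:
        beta_prev = np.asarray(beta_prev, dtype=float)
        # Drop samples the previous model explains poorly (likely foreground).
        keep = (t_p - X @ beta_prev) ** 2 <= r_th
    if beta_prev is not None and keep.sum() < 2:
        # Too few background samples in the window: keep the previous model.
        beta = beta_prev
    else:
        # Least-squares solution through the pseudo-inverse.
        beta = np.linalg.pinv(X[keep]) @ t_p[keep]
    t_b_hat = X[-1] @ beta
    residual_sq = (t_p[-1] - t_b_hat) ** 2
    return beta, t_b_hat, residual_sq
```

Calling the function at every new sample with the previously returned `beta` as `beta_prev` gives an on-line estimate in which warm samples, such as those caused by a person under the sensor, do not distort the fit.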

Probabilistic Foreground Detection
In order to provide as much information as possible in uncertain situations, the proposed method also computes, for every cell and at every instant, the probability of human detection. To this end, we modeled the probability $q(T_{i,j}(t))$ that a measurement belongs to the background temperature distribution $T_b$ as a logistic function with steepness k. The probability $p(T_{i,j}(t))$ that the measurement does not belong to the background distribution is its complement, $p(T_{i,j}(t)) = 1 - q(T_{i,j}(t))$ (Fig. 5).
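A possible form of the logistic model is sketched below in Python. The argument of the logistic function is assumed to be the absolute deviation of the measurement from the estimated background, shifted by a midpoint `delta0`. Both the use of the absolute deviation and the parameter `delta0` are assumptions introduced for illustration, and the default values are arbitrary.

```python
import numpy as np

def foreground_probability(t_measured, t_background, k=4.0, delta0=1.5):
    """Return p = 1 - q, where q is a logistic background-membership score.

    k is the steepness; delta0 (deg C), the deviation at which q = 0.5, is an
    assumed parameter introduced only for this sketch.
    """
    deviation = np.abs(np.asarray(t_measured, dtype=float)
                       - np.asarray(t_background, dtype=float))
    # q is close to 1 near the background and falls towards 0 far from it.
    q = 1.0 / (1.0 + np.exp(k * (deviation - delta0)))
    return 1.0 - q
```

The function accepts scalars or whole frames, so a full 8 × 8 probability map can be obtained in one call from the filtered frame and the per-pixel background estimates.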

Installation and Results
The proposed method aims to improve the accuracy of Human Detection algorithms using LR-TAS data in noisy environments. For this reason, the environment was perturbed during the experiment by using the air-conditioning system and appliances, and by exposing the sensor to sunlight reflections.

Sensor mount
In the literature, there are two approaches to the placement of LR-TASs for human detection: on the wall [12], [13] and on the ceiling [9]. After carrying out several experiments, we found that installation on the wall has several drawbacks:
• Under real conditions, furniture and other objects can obstruct the view of the sensor.
• The movement of a person, approaching or moving away from the installation point, changes the number of pixels representing him/her.
• Since the sensor tends to average the temperature over the observed space, human movement also affects the temperature distribution.
For these reasons, the sensor was mounted on the ceiling at a height of 2.7 m, with a resulting detection area on the floor of approximately 9 m² (Fig. 6). Streaming data have been transmitted to and stored on a central device. The communication has been implemented using the Thread protocol [14], a solution designed by the Thread Group for IoT applications. This protocol makes it easy to create an IPv6-based, meshed, robust and secure network of sensors.
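The size of the detection area can be checked with a short geometric calculation. The sketch below assumes an idealized square field of view with the sensor pointing straight down and the footprint measured at floor level. The function name is hypothetical.

```python
import math

def footprint_area(height_m, fov_deg):
    """Floor area (m^2) seen by a downward-facing sensor with a square field of view."""
    side = 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)
    return side * side

area = footprint_area(2.7, 60.0)      # about 9.7 m^2 in this idealized model
pixel_side = math.sqrt(area) / 8      # floor patch side per pixel, about 0.39 m
```

Under these assumptions the footprint is of the same order as the approximately 9 m² stated above, and each of the 8 × 8 pixels covers a floor patch of roughly 0.4 m per side.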

Experiments
We have collected several datasets for a total duration of 4 days. During this period, people were asked to perform everyday activities while passing and staying under the sensor. The experiment data have been manually annotated to validate the proposed algorithm: every pixel is labeled as "1" if it represents a human and "0" otherwise.

Results
The proposed method has been tested on the collected datasets. Figure 7(a) shows the performance of the background temperature estimation for a single sensor pixel. Comparing it with Figure 7(b), it can be seen that where the filtered measurements are distant from the estimated background, the probability of human detection is very high.
In order to compare the obtained results with the method that, to our knowledge, reports the best accuracy in the literature, we also implemented the human detection algorithm proposed by Mashiyama et al. [9]: Figure 7(c) shows $T_{diff}$ computed using the method proposed by Mashiyama. Unfortunately, the authors do not suggest any method to compute the threshold $T_{th}$, and it is very difficult to tune it manually.
To compare the results of the two algorithms, a frame is said to represent a human when it contains at least one measurement whose probability p(m(t)) exceeds a threshold $p_{th}$. The parameters used in both algorithms are provided in Table 1, while the obtained results are presented in Table 2.
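The frame-level decision rule, and one way of scoring it against the pixel annotations, can be sketched as follows. The default value of `p_th`, the function names and the choice of marking a frame as positive when any pixel is labeled 1 are assumptions made for illustration. The actual parameters are those listed in Table 1.

```python
import numpy as np

def frame_has_human(pixel_probabilities, p_th=0.5):
    """Flag a frame when at least one pixel probability exceeds p_th."""
    return bool(np.any(np.asarray(pixel_probabilities, dtype=float) > p_th))

def frame_accuracy(prob_frames, label_frames, p_th=0.5):
    """Fraction of frames whose decision matches the annotation.

    A frame is treated as positive when any of its pixels is labelled 1.
    """
    predicted = np.array([frame_has_human(p, p_th) for p in prob_frames])
    actual = np.array([bool(np.any(np.asarray(lab) == 1)) for lab in label_frames])
    return float(np.mean(predicted == actual))
```

The same scoring function can be applied to the output of a baseline detector by passing 0/1 maps in place of probabilities, which keeps the comparison between the two algorithms on the same footing.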

Conclusions
We have presented a novel technique to detect humans in indoor environments using a Low Resolution Thermal Array Sensor. This approach considers the temperature variation in the room due to external dynamics and noise. A Kalman Filter has been used to filter the noise on the temperature measurements, while an adaptive background estimation has been used to follow changes in the environmental conditions. Final results show an improvement in Human Detection accuracy compared to the state of the art when performing a field trial in a real environment, passing from 70% to 97%.
The mentioned results have been collected using a single-sensor installation; however, a multi-sensor system needs to be implemented in order to set up a real scenario in a smart environment. This extension, which requires handling technical and theoretical problems, from placing the sensors to retrieving one overall model with the global state of dwellers and environment, will be part of our future work.