Sound-Environment Monitoring Method Based on Computational Auditory Scene Analysis
1. Introduction
Recently, the analysis of large data sets, so-called "big data," has allowed a variety of information to be extracted, and this information can help create new services. Further, monitoring techniques can be useful for determining the phenomena that generated the recorded data; here, a monitoring technique is regarded as one that identifies the conditions of a monitored environment through analysis of the data observed within that area. For example, in structural monitoring, also known as building health monitoring, deterioration and damage to buildings can be checked using findings obtained through the analysis of sensor data, e.g., data acquired from acceleration sensors and cameras [1]. In this paper, sound environments are the target field of the monitoring problem; that is, sound-environment monitoring is addressed.
Various methods for understanding sound environments have been proposed to date. However, almost all researchers have focused on topics related to environmental sound recognition (ESR) [2]. For example, ESR techniques have been proposed using features such as the zero-crossing rate, cepstral features, MPEG-7-based features, and autoregression-based features extracted from environmental sounds [3]-[9], along with a method of understanding environmental sounds that employs a matching pursuit algorithm [10]. To the best of the author's knowledge, no studies have focused on the determination of sound environments; therefore, such a method is presented here.
This study proposes an unconventional method that allows the analysis of sound environments using color, where the color rules are based on the concept of synesthesia [11]. That is, sound positions are estimated using a sound-position estimation approach, and a color derived from three features extracted from the observed environmental sounds is painted at each estimated position. Hence, painted sound patterns referred to as "painted sound maps" are obtained, from which sound environment scenarios can be recognized. The efficacy of the proposed monitoring method is evaluated using environmental sound data observed at a galleria.
2. Proposed Method
2.1. Overview of Proposed Method
For application of the proposed method, environmental sounds are first collected using a microphone array (Figure 1). Using these sounds, various sound environment conditions can be estimated. These scenarios are then expressed using colors, based on the knowledge of synesthesia.

Figure 1. Microphone array utilized to collect environmental sounds at the galleria.
Synesthesia is a phenomenon in which one kind of sensory stimulation is expressed as another sensation [12] . In the case of synesthesia relating sound and color, Nagata et al. have reported an experimental result in which keys, tones, and pitches were respectively related to hue, saturation, and brightness [13] . Based on this result, this study utilizes information on the keys, tones, and pitches of environmental sounds to draw painted sound maps.
Keys, tones, and pitches are assumed to be detected by the chromagram, sonogram, and sound spectrogram, respectively. Hence, the hue score is calculated using the key histogram yielded by the chromagram. Similarly, the saturation and brightness scores are calculated using the frequency-band histograms produced by the sonogram and sound spectrogram, respectively, with a clustering method then being applied to the environmental-sound spectrogram. Similar frequency components are categorized so that the frequency component dispersion of the environmental sounds is clarified. This dispersion information is then used to calculate the histogram with respect to the spectrogram frequency elements.
Below, the proposed method of sound environment analysis is presented in detail. The sound data can be obtained using the microphone array shown in Figure 1, and a short-term Fourier transform is applied to the observed signals. A signal segment is analyzed when its maximum amplitude exceeds a constant threshold, and the maximum period of a segment is 2 s. The sound position estimation is conducted using multiple signal classification (MUSIC) [14], utilizing the microphone array outputs.
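As a concrete illustration of the position-estimation step, the following sketch runs MUSIC on a simulated uniform linear array with one narrowband source. The array geometry, source angle, snapshot count, and noise level are all hypothetical choices for the demonstration; the actual array of Figure 1 may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 6, 200            # microphones, snapshots (illustrative values)
theta_true = 30.0        # hypothetical source direction in degrees

def steer(theta_deg, M):
    # Steering vector of a uniform linear array with half-wavelength spacing
    phase = np.pi * np.sin(np.deg2rad(theta_deg)) * np.arange(M)
    return np.exp(-1j * phase)

# Simulate one narrowband source plus sensor noise
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
noise = 0.1 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
X = np.outer(steer(theta_true, M), s) + noise

# Spatial covariance and its noise subspace (M - 1 weakest eigenvectors)
R = X @ X.conj().T / T
w, V = np.linalg.eigh(R)       # eigenvalues in ascending order
En = V[:, : M - 1]             # one source -> M - 1 noise eigenvectors

# MUSIC pseudospectrum over a grid of candidate angles; the peak is the estimate
grid = np.arange(-90.0, 90.1, 0.1)
A = np.stack([steer(t, M) for t in grid], axis=1)
P = 1.0 / np.linalg.norm(En.conj().T @ A, axis=0) ** 2
theta_hat = grid[np.argmax(P)]
print(theta_hat)
```

With a single source and high SNR, the pseudospectrum peak lands on the true direction; broadband environmental sounds would require running this per frequency bin of the short-term Fourier transform.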
2.2. Key Information Extraction from Chromagram
The environmental sound chromagram is calculated using the MATLAB chroma toolbox [15]. First, the environmental-sound pitch features are computed using the audio_to_pitch_via_FB function. Figure 2 and Figure 3 show an environmental sound and its pitch features obtained using audio_to_pitch_via_FB, respectively. Next, a chromagram is calculated (Figure 4) using pitch_to_chroma, based on pitch features such as those shown in Figure 3.
Figure 3. Pitch features of environmental sound shown in Figure 2.
Figure 4. Chromagram of environmental sound shown in Figure 2.
Subsequently, a histogram showing the key information indicated in the chromagram is calculated. For example, Figure 5 shows the histogram calculated from the chromagram shown in Figure 4, where the ma_sh function of the MATLAB ma toolbox [16], which can calculate a spectrum histogram from the chromagram, is used. The MATLAB function hist is then applied to the spectrum histogram. Depending on the histogram variability, the histogram data are transformed into an 8-bit binary code as follows: 1) The histogram mean is calculated (dashed line in Figure 5); 2) Values greater than the mean are replaced with "1" and values less than the mean with "0"; 3) An 8-bit binary code is obtained; the code corresponding to Figure 5 is "00000101." Hence, the hue score is determined by converting the binary code to a decimal value. It is apparent that a higher score indicates an environmental sound consisting of some dominant keys.
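This scoring step can be sketched as follows, assuming a hypothetical 8-bin histogram (in the actual method, the histogram comes from the chroma toolbox as described above):

```python
import numpy as np

# Toy 8-bin key histogram standing in for the spectrum histogram
# derived from the chromagram (values made up for illustration)
hist = np.array([3.0, 1.0, 2.0, 9.0, 1.0, 8.0, 1.0, 1.0])

# 1) Histogram mean
mean = hist.mean()                 # 3.25 for this toy histogram

# 2) Bins above the mean become "1", the rest "0"
bits = (hist > mean).astype(int)

# 3) Read the 8-bit code as a decimal hue score
code = "".join(map(str, bits))
hue_score = int(code, 2)
print(code, hue_score)             # -> 00010100 20
```

A histogram dominated by a few keys sets few high-order bits, so the decimal score compactly encodes which keys stand out.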
2.3. Tonal Information Extraction from Sonogram
The sonogram can be calculated using the MATLAB ma toolbox [16] , where the loudness sensation per frequency band is estimated using auditory models and the ma_sone function of the ma toolbox. Figure 6 shows the sonogram of the environmental sound shown in Figure 2.
The frequency-band histogram of the sonogram is computed [16] , and the saturation score is then determined using the same approach as that used for the hue score. Therefore, it is apparent that a higher score indicates an environmental sound with some characteristic components in the loudness sensation per frequency band.
2.4. Pitch Information Extraction from Spectrogram
A spectrogram can also be calculated. Figure 7 shows the spectrogram of the environmental sound shown in Figure 2. An edge-extraction image processing technique is applied to the spectrogram, and the number of pixels in its frequency characteristic areas and their centroid frequencies are then computed. The frequency characteristic areas of the spectrogram detected by the edge-extraction technique are categorized using an improved affinity propagation (IAP) method (see Appendix). For details of affinity propagation, see [17].

Figure 5. Key histogram obtained from the Figure 4 chromagram, with its mean.

Figure 6. Sonogram of environmental sound shown in Figure 2.

Figure 7. Spectrogram of environmental sound shown in Figure 2.
Each exemplar centroid frequency obtained by the IAP is classified into a low-, medium-, or high-frequency group; then, the frequency-group histogram can be acquired. The brightness score is obtained from the histogram in a similar manner to the case of the hue score. However, when the 8-bit binary code is obtained, the threshold determining “1” or “0” values is set to zero. Therefore, a low score indicates that the dominant frequency of the environmental sound is low, while a higher score indicates that the environmental sound consists of various frequencies.
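A minimal sketch of the brightness scoring follows. The band edges for the low/medium/high groups and the padding of the three group bits to an 8-bit code are illustrative assumptions, since these details are not specified above.

```python
import numpy as np

# Hypothetical exemplar centroid frequencies (Hz) returned by the
# clustering step; the band edges below are illustrative choices.
centroids = np.array([120.0, 300.0, 950.0, 1800.0, 5200.0])
edges = [0.0, 500.0, 2000.0, 22050.0]     # low / medium / high bands

# Histogram over the three frequency groups
counts, _ = np.histogram(centroids, bins=edges)   # [2, 2, 1]

# Brightness score: the binarization threshold is zero, so any
# occupied group yields a "1"; pad to 8 bits (assumed layout).
bits = (counts > 0).astype(int)
code = "".join(map(str, bits)).ljust(8, "0")
brightness_score = int(code, 2)
print(code, brightness_score)
```

Under these assumptions, a sound occupying only the low band yields a small score, while a sound spread over all groups yields a large one, matching the interpretation given above.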
2.5. Painted Sound Map from Three Scores
The hue, saturation, and brightness scores are used to draw the painted sound map, where the hue-saturation-brightness color model obtained using these three scores is converted to a red-green-blue (RGB) color model.
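The color conversion can be sketched with Python's standard colorsys module. The three 8-bit scores below are hypothetical, and mapping each score linearly to [0, 1] is an assumption, since the paper does not specify the normalization.

```python
import colorsys

# Hypothetical 8-bit scores from the three histograms (0-255 each)
hue_score, sat_score, bri_score = 20, 128, 200

# Normalize to [0, 1] and convert HSB (HSV) to RGB
h, s, v = hue_score / 255.0, sat_score / 255.0, bri_score / 255.0
r, g, b = colorsys.hsv_to_rgb(h, s, v)

# 8-bit RGB triple used to paint the estimated sound position
rgb = tuple(round(c * 255) for c in (r, g, b))
print(rgb)
```

Each analyzed sound segment thus contributes one RGB dot at its estimated position, and the accumulated dots form the painted sound map.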
3. Experimental Results and Discussion
In this section, the efficacy of the painted sound map method is demonstrated using environmental sounds observed in the sound environment shown in Figure 1. In each demonstration, the painted sound map is drawn using the environmental sounds generated in a single day. Further, in each figure shown below, the position of the microphone array is indicated by a red circle.
3.1. Painted Sound Map of Sound Environment on Typical Day
The sound environment shown in Figure 1 is a shopping-center galleria, the painted sound map of which is shown in Figure 8. A train station is located near the galleria, outside the left side of Figure 8. Therefore, train sounds are generated intermittently. Further, rattling sounds from chairs and desks are generated during the galleria preparation time, along with the voices of children and students visiting the galleria.
Figure 9 shows the painted sound map for a different day. It is notable that the generated sound patterns are similar to those in Figure 8. From these two maps, it can be concluded that painted sound maps can be utilized to determine similarities in the sound patterns of sound environments.
3.2. Painted Sound Map on Windy Day
Figure 10 shows a painted sound map obtained on a windy day. Comparing Figures 8-10, it is apparent that the painted sound map varies with the state of the sound environment; hence, painted sound maps can be utilized to detect variations in a sound environment.
3.3. Mini-Concert Event
Figure 11 shows the painted sound map obtained for the same area during a mini-concert event. Blue tones are emphasized at the event location, which is on the left-hand side of the map. In addition, the painted sound map obtained on the same day is shown in Figure 12. At a glance, this painted sound map is similar to those shown in Figure 8 and Figure 9. This indicates that the snapshot timing of the painted sound maps is important as regards detection of sound environment changes.

Figure 8. Painted sound map of the Figure 1 sound environment.

Figure 9. Painted sound map showing sound patterns similar to those of Figure 8.

Figure 10. Painted sound map obtained on a windy day.

Figure 11. Painted sound map obtained during the mini-concert event.

Figure 12. Painted sound map obtained on the event day.
From all the above results, it can be concluded that the proposed painted sound map drawn using the three scores discussed above is effective for sound environment analysis. In particular, this approach is useful for visually detecting and determining the sound environment conditions and their variations, and it should be noted that the snapshots provided by the painted sound maps work effectively in this regard.
4. Conclusions
This paper has proposed a method of monitoring sound environments based on computational auditory scene analysis. The proposed visualization technique allows sound environment conditions to be determined and represented using colors.
As future work, the proposed monitoring technique will be applied to the monitoring of the structural conditions of aging buildings.
Acknowledgements
The author thanks Dr. Sashima and Dr. Kurumatani for helpful discussions. This work was partly supported by a JSPS KAKENHI Grant (Number 16H02911).
Appendix
A.1. Improved Affinity Propagation
The method proposed in this paper employs a message exchange clustering algorithm based on an affinity propagation (AP) method [17]. The AP performs clustering such that the data can be categorized by modifying the messages $r(i,k)$ and $a(i,k)$ according to

$$r(i,k) = s(i,k) - \max_{k' \neq k}\left\{ a(i,k') + s(i,k') \right\}, \tag{1}$$

$$a(i,k) = \min\left\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \right\}, \quad i \neq k, \tag{2}$$

$$a(k,k) = \sum_{i' \neq k} \max\{0, r(i',k)\}, \tag{3}$$

where $r(i,k)$ is a message being sent from data point $i$ in a cluster to a centroid candidate $k$ (exemplar) in the cluster, indicating the appropriateness of data point $k$ becoming the exemplar of $i$; $a(i,k)$ is a message being sent from an exemplar candidate $k$ to data point $i$, indicating the appropriateness of $i$ becoming a cluster member of $k$; and $s(i,k)$ is the similarity between data points $i$ and $k$. In each iterative step $l$, $r(i,k)$ and $a(i,k)$ are updated with those of the previous iteration, i.e., $r^{(l)}(i,k) = \lambda\, r^{(l-1)}(i,k) + (1-\lambda)\, r(i,k)$ and $a^{(l)}(i,k) = \lambda\, a^{(l-1)}(i,k) + (1-\lambda)\, a(i,k)$. The parameter $\lambda$ denotes a damping factor and is set to $0 < \lambda < 1$.
In the AP method, the exemplar is the data point $k$ satisfying the inequality

$$r(k,k) + a(k,k) > 0. \tag{4}$$

Then, the set of exemplars satisfying condition (4) can be altered by the preference $p(k) = s(k,k)$ [17]. That is, $p(k)$ influences the output clusters and the number of clusters. The $p(k)$ values are set before the modification of $r(i,k)$ and $a(i,k)$. In the original AP method, the $p(k)$ values were set to the median of all $s(i,k)$ values.
Here, it should be noted that the $p(k)$ values can also be modified during the updates of $r(i,k)$ and $a(i,k)$. Wang et al. have proposed an adaptive scanning method of preferences over the $p(k)$ space to determine the optimal clustering solution [18]. They have also proposed a damping-factor adaptive adjustment method to improve the convergence of the AP method. This study proposes a $p(k)$ modification algorithm using the similarity $s(i,k)$ and the condition

$$\mathrm{abs}\!\left( s(X_{k''}) - \mathrm{mean}(s(X)) \right) > \alpha \cdot \mathrm{std}(s(X)). \tag{5}$$

That is, based on the $s$ values with respect to the data point $X_{k''}$ satisfying condition (5), all $p(k)$ values are updated using

$$p(k) \leftarrow \beta \cdot p(k). \tag{6}$$

This means that the $p(k)$ values are updated such that $X_{k''}$ does not become an outlier in its cluster. The operators $\mathrm{abs}(x)$, $\mathrm{mean}(x)$, and $\mathrm{std}(x)$ denote the absolute value, the mean value, and the standard deviation of $x$, respectively.
Table 1. Clustering performance comparison.

Further, $X$ denotes all the data in the cluster containing the data point $X_{k''}$. The parameters $\alpha$ and $\beta$ are positive constants greater than and less than one, respectively. Therefore, the AP algorithm proposed in this study is implemented by adding rules (5) and (6) to the original rules (1)-(4).
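Rules (1)-(4) of the original (unmodified) AP can be sketched compactly in NumPy. The negative-squared-distance similarity, median preference, damping factor, and toy data below are illustrative assumptions, and the preference update of rules (5) and (6) is omitted for brevity.

```python
import numpy as np

def affinity_propagation(X, damping=0.5, n_iter=200):
    """Plain AP per rules (1)-(4): similarity is negative squared
    Euclidean distance, preferences p(k) are set to the median
    similarity, and messages are damped each iteration."""
    n = len(X)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(S, np.median(S))          # p(k) = median similarity
    R = np.zeros((n, n))                       # responsibilities r(i,k)
    A = np.zeros((n, n))                       # availabilities a(i,k)
    for _ in range(n_iter):
        # Rule (1): r(i,k) = s(i,k) - max_{k' != k} {a(i,k') + s(i,k')}
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[np.arange(n), top]
        AS[np.arange(n), top] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), top] = S[np.arange(n), top] - second
        R = damping * R + (1 - damping) * Rnew
        # Rules (2)-(3): availabilities from the positive responsibilities
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew.diagonal().copy()          # a(k,k), rule (3)
        Anew = np.minimum(Anew, 0)             # clamp off-diagonal, rule (2)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    # Rule (4): exemplars are points k with r(k,k) + a(k,k) > 0
    exemplars = np.flatnonzero((R + A).diagonal() > 0)
    return exemplars[S[:, exemplars].argmax(axis=1)]

# Two well-separated point clouds; tiny jitter breaks exact symmetry
rng = np.random.default_rng(1)
base = np.array([[0, 0], [0.2, 0], [0, 0.2], [0.2, 0.2],
                 [5, 5], [5.2, 5], [5, 5.2], [5.2, 5.2]], dtype=float)
X = base + 0.01 * rng.standard_normal(base.shape)
labels = affinity_propagation(X)
print(labels)
```

With the median preference, the two clouds each elect one exemplar; the proposed IAP would additionally rescale the preferences via (5)-(6) whenever a point risks becoming an outlier.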
A.2. Improved Affinity Propagation (IAP) Performance
In this subsection, the proposed IAP method is compared with the original and adaptive AP methods, using randomly generated 2D data points. The number of data points is N = 30. The original and adaptive AP methods are implemented using the MATLAB program obtained from [19]. The performances of the three algorithms are evaluated using the Calinski-Harabasz criterion [20], which is the ratio of the between-cluster variance to the within-cluster variance, defined as

$$\mathrm{VRC} = \frac{\mathrm{SSB}/(k-1)}{\mathrm{SSW}/(N-k)}. \tag{7}$$

Here, $k$ denotes the number of clusters and SSB is the overall between-cluster variance, which is essentially the variance of all the cluster centroids from the grand centroid of the dataset, defined as

$$\mathrm{SSB} = \sum_{i=1}^{k} n_i \left\| m_i - m \right\|^2. \tag{8}$$

Here, $n_i$ indicates the number of elements in cluster $i$, $m_i$ is the centroid of cluster $i$, $m$ is the overall mean of the dataset, and $\|\cdot\|$ is the L2 norm. Further, SSW is the overall within-cluster variance, defined as

$$\mathrm{SSW} = \sum_{i=1}^{k} \sum_{x \in c_i} \left\| x - m_i \right\|^2, \tag{9}$$

where $x$ is a data point, $c_i$ is the $i$th cluster, and $m_i$ is the centroid of $c_i$. A large positive value of VRC indicates superior clustering performance. In this study, $\alpha$ and $\beta$ in (5) and (6) were set to 1.5 and 0.9, respectively.
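Criterion (7) can be computed directly from a labeling, as in this small sketch; the toy points and labels are illustrative.

```python
import numpy as np

def vrc(X, labels):
    """Calinski-Harabasz criterion (7): (SSB/(k-1)) / (SSW/(N-k))."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N = len(X)
    m = X.mean(axis=0)                        # grand centroid
    clusters = np.unique(labels)
    ssb = ssw = 0.0
    for c in clusters:
        pts = X[labels == c]
        mi = pts.mean(axis=0)                 # cluster centroid m_i
        ssb += len(pts) * np.sum((mi - m) ** 2)   # Eq. (8)
        ssw += np.sum((pts - mi) ** 2)            # Eq. (9)
    k = len(clusters)
    return (ssb / (k - 1)) / (ssw / (N - k))

X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
print(vrc(X, labels))   # -> 50.0
```

For these four points, SSB = 100 with k - 1 = 1 and SSW = 4 with N - k = 2, so VRC = 100/2 = 50; tighter, better-separated clusters drive the ratio higher.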
Table 1 shows the performance results, averaged over 30 trials. It is apparent that the proposed IAP method requires a longer computational time and yields more exemplars than the original AP method. However, it exhibits superior clustering performance compared with the two conventional methods. Note that the running time was measured using a PC (CPU: i7-4770 @ 3.4 GHz; RAM: 8.0 GB). Hence, it can be concluded that rules (5) and (6) work effectively when added to the original AP method.