Sound-Environment Monitoring Method Based on Computational Auditory Scene Analysis

Monitoring techniques are a key technology for examining the conditions in various scenarios, e.g., structural conditions, weather conditions, and disasters. To understand such scenarios, it is important to extract their features appropriately from observation data. This paper proposes a monitoring method that allows sound environments to be expressed as a sound pattern. To this end, the concept of synesthesia is exploited. That is, the keys, tones, and pitches of the monitored sound are expressed using the three elements of color, namely, hue, saturation, and brightness, respectively. Based on a previous synesthesia experiment, this paper assumes that the hue, saturation, and brightness can be detected from the chromagram, sonogram, and sound spectrogram, respectively. The sound pattern can then be drawn using color, yielding a "painted sound map." The usefulness of the proposed monitoring technique is verified using environmental sound data observed at a galleria.


Introduction
Recently, the analysis of large data sets, so-called "big data," has allowed a variety of information to be extracted, and this information can help create new services. Monitoring techniques can also be useful for determining the phenomena that generated the recorded data. In this sense, monitoring techniques allow the conditions of a monitored environment to be identified by analyzing the data observed within the area. For example, in structural monitoring, which is known as building health monitoring, deterioration and damage to buildings can be checked using findings obtained by analyzing sensor data, e.g., data acquired from acceleration sensors and cameras [1]. In this paper, sound environments are the target field of the monitoring problem; that is, sound environment monitoring is addressed.
Various methods for understanding sound environments have been proposed to date. However, most researchers have focused on topics related to environmental sound recognition (ESR) [2]. For example, ESR techniques have been proposed that use features extracted from environmental sounds, such as the zero-crossing rate, cepstral features, MPEG-7-based features, and autoregression-based features [3]-[9], along with a method of understanding environmental sounds that employs a matching pursuit algorithm [10].
To the best of the author's knowledge, no studies have focused on determining the conditions of sound environments; therefore, such a method is presented here.
This study proposes an unconventional method for analyzing sound environments using color, where the color rules are based on the concept of synesthesia [11]. That is, sound positions are estimated using a sound position estimation approach, and a color derived from three features extracted from the observed environmental sounds is painted at each estimated position.
Hence, painted sound patterns, referred to as "painted sound maps," are obtained, from which sound environment scenarios can be recognized. The efficacy of the proposed monitoring method is evaluated using environmental sound data observed at a galleria.

Overview of Proposed Method
To apply the proposed method, environmental sounds are first collected using a microphone array (Figure 1). Using these sounds, various sound environment conditions can be estimated. These scenarios are then expressed using colors, based on knowledge of synesthesia. Synesthesia is a phenomenon in which stimulation of one sense evokes a sensation in another [12]. For synesthesia relating sound and color, Nagata et al. have reported an experimental result in which keys, tones, and pitches were related to hue, saturation, and brightness, respectively [13].

Figure 1. Microphone array utilized to collect environmental sounds at galleria.
Based on this result, this study uses information on the keys, tones, and pitches of environmental sounds to draw painted sound maps. Keys, tones, and pitches are assumed to be detectable from the chromagram, sonogram, and sound spectrogram, respectively. Hence, the hue score is calculated using the key histogram yielded by the chromagram. Similarly, the saturation and brightness scores are calculated using the frequency-band histograms produced by the sonogram and sound spectrogram, respectively. For the brightness score, a clustering method is first applied to the environmental-sound spectrogram: similar frequency components are grouped so that the dispersion of the frequency components of the environmental sounds is clarified, and this dispersion information is then used to calculate the histogram with respect to the spectrogram frequency elements.
The proposed method of sound environment analysis is presented in detail below. The sound data $y(t)$ are obtained using the microphone array shown in Figure 1, and a short-term Fourier transform (STFT) of $y(t)$ is computed. Then, the amplitude of the STFT is …, where $a$ is a constant value, and the maximum duration of $y(t)$ is 2 s. Sound positions are estimated using multiple signal classification (MUSIC) [14] applied to the microphone array outputs.
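For readers unfamiliar with MUSIC, the following is a minimal NumPy sketch of a narrowband MUSIC pseudo-spectrum. The geometry of the galleria array in Figure 1 is not described here, so the sketch assumes a uniform linear array, a far-field source, a single STFT frequency bin used as the snapshots, and a known number of sources. The names steering_vector and music_spectrum, the 343 m/s speed of sound, and the demo parameters are all hypothetical illustrations and are not taken from this paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def steering_vector(theta_deg, n_mics, spacing, freq):
    """Far-field narrowband steering vector of a uniform linear array (assumed geometry).

    theta_deg is measured from broadside; spacing is the microphone pitch in metres.
    """
    delays = np.arange(n_mics) * spacing * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays)


def music_spectrum(snapshots, n_sources, spacing, freq, angles_deg):
    """MUSIC pseudo-spectrum for complex snapshots of shape (n_mics, n_frames).

    The snapshots could be, e.g., the STFT values of every microphone at the
    frequency bin `freq` over consecutive frames.
    """
    n_mics, n_frames = snapshots.shape
    # Sample spatial covariance matrix.
    R = snapshots @ snapshots.conj().T / n_frames
    # eigh returns eigenvalues in ascending order, so the first
    # n_mics - n_sources eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(R)
    noise_subspace = eigvecs[:, : n_mics - n_sources]
    spectrum = np.empty(len(angles_deg))
    for i, theta in enumerate(angles_deg):
        a = steering_vector(theta, n_mics, spacing, freq)
        proj = noise_subspace.conj().T @ a
        # Peaks occur where a(theta) is nearly orthogonal to the noise subspace.
        spectrum[i] = 1.0 / np.real(np.vdot(proj, proj))
    return spectrum


if __name__ == "__main__":
    # Simulated source at 30 degrees; spacing is below half a wavelength at 1 kHz.
    rng = np.random.default_rng(0)
    n_mics, n_frames, spacing, freq = 8, 200, 0.05, 1000.0
    source = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
    noise = 0.1 * (rng.standard_normal((n_mics, n_frames))
                   + 1j * rng.standard_normal((n_mics, n_frames)))
    X = np.outer(steering_vector(30.0, n_mics, spacing, freq), source) + noise
    angles = np.arange(-90.0, 90.5, 0.5)
    P = music_spectrum(X, 1, spacing, freq, angles)
    print("estimated direction:", angles[np.argmax(P)], "degrees")
```

A two-dimensional position on the galleria floor, as drawn in the painted sound maps, would require the real array geometry and a 2D search grid in place of the 1D angle scan shown here.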

Key Information Extraction from Chromagram
The environmental sound chromagram is calculated using the MATLAB chroma toolbox [15]. First, the pitch features of the environmental sound are computed using the audio_to_pitch_via_FB function. Figure 2 and Figure 3 show an environmental sound and its pitch features obtained using audio_to_pitch_via_FB, respectively. Next, a chromagram is calculated (Figure 4) using pitch_to_chroma, based on pitch features such as those shown in Figure 3. The hue score is then obtained from the key histogram of the chromagram (Figure 5) as follows:
1) The mean of the key histogram is calculated (dashed line in Figure 5);
2) Values greater than the mean are replaced with "1," and values less than the mean are replaced with "0";
3) An 8-bit binary code is obtained; the code corresponding to Figure 5 is "00000101."
The hue score is determined by converting this binary code to a decimal value (e.g., "00000101" gives 5). A higher score thus indicates an environmental sound consisting of some dominant keys.
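The three-step binary coding above can be sketched in Python as follows. The text specifies an 8-bit code but not how the key histogram is binned, so the sketch assumes an 8-bin histogram with the first bin as the most significant bit. It also treats a bin exactly equal to the mean as "0", which is an assumption because the text does not address ties. The function name binary_code_score and the example histogram are hypothetical.

```python
import numpy as np


def binary_code_score(histogram, threshold=None):
    """Score an 8-bin histogram by binary coding (hypothetical helper).

    Bins above the threshold become "1" and the rest "0" (a bin exactly equal
    to the threshold is treated as "0", an assumption). The first bin is taken
    as the most significant bit. With threshold=None the histogram mean is used.
    """
    hist = np.asarray(histogram, dtype=float)
    if hist.size != 8:
        raise ValueError("an 8-bin histogram is required for an 8-bit code")
    if threshold is None:
        threshold = hist.mean()  # step 1: the mean (dashed line in Figure 5)
    bits = (hist > threshold).astype(int)  # step 2: binarize against the threshold
    code = "".join(str(b) for b in bits)  # step 3: 8-bit binary code
    return code, int(code, 2)  # the decimal value is used as the score


if __name__ == "__main__":
    # Hypothetical histogram whose 6th and 8th bins exceed the mean (2.25).
    print(binary_code_score([1, 1, 1, 1, 1, 6, 1, 6]))
```

With this made-up histogram the mean is 2.25, so only the 6th and 8th bins become "1". The script prints ('00000101', 5), which reproduces the code quoted for Figure 5.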

Tonal Information Extraction from Sonogram
The sonogram can be calculated using the MATLAB ma toolbox [16], where the loudness sensation per frequency band is estimated using auditory models and the ma_sone function of the ma toolbox. Figure 6 shows the sonogram of the environmental sound shown in Figure 2.
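To show what "loudness sensation per frequency band" means, here is a deliberately crude NumPy stand-in for a sonogram. It is not the ma_sone algorithm, which uses a fuller auditory model. The sketch makes several assumptions. FFT bins are grouped into Bark bands using the Zwicker and Terhardt approximation. The band level in dB, measured against an arbitrary reference, is treated directly as loudness level in phon, with no outer-ear weighting, equal-loudness contours, or masking. Phon values are then mapped to sone with Stevens' rule above 40 phon and a commonly used power fit below it. The names bark and crude_sonogram and their parameters are hypothetical.

```python
import numpy as np


def bark(freq_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    f = np.asarray(freq_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def crude_sonogram(x, fs, frame_len=512, hop=256, n_bands=24):
    """Loudness in sone per Bark band and frame, shape (n_bands, n_frames).

    Crude assumptions: the band level in dB (arbitrary reference) is used
    directly as loudness level in phon, with no outer-ear weighting,
    equal-loudness contours, or spectral masking. Requires len(x) >= frame_len.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    band_of_bin = np.minimum(bark(freqs).astype(int), n_bands - 1)
    sonogram = np.zeros((n_bands, n_frames))
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Sum the power of all FFT bins that fall into the same Bark band.
        band_power = np.bincount(band_of_bin, weights=power, minlength=n_bands)
        phon = np.maximum(10.0 * np.log10(band_power + 1e-12), 0.0)
        # Stevens' rule (1 sone at 40 phon, doubling every 10 phon) above
        # 40 phon, and the commonly used (phon / 40) ** 2.642 fit below it.
        sonogram[:, t] = np.where(phon >= 40.0,
                                  2.0 ** ((phon - 40.0) / 10.0),
                                  (phon / 40.0) ** 2.642)
    return sonogram
```

For a 1 kHz tone sampled at 16 kHz, all the energy falls into Bark band 8, so that row dominates the resulting sonogram.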
The frequency-band histogram of the sonogram is computed [16], and the saturation score is then determined using the same approach as that used for the hue score. Accordingly, a higher score indicates an environmental sound with some characteristic components in the loudness sensation per frequency band.

Pitch Information Extraction from Spectrogram
A spectrogram can also be calculated. Figure 7 shows the spectrogram of the environmental sound shown in Figure 2.

M. Kawamoto
The characteristic frequency areas of the spectrogram detected by the edge extraction technique are categorized using an improved affinity propagation (IAP) method (see Appendix). For details of affinity propagation (AP), see [17].
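As background for the IAP, the sketch below implements the original affinity propagation message passing, with responsibility and availability updates and damping as published by Frey and Dueck. It does not implement the IAP. The additional rules (5) and (6) are not reproduced in this section, so they are omitted. The sketch makes two further assumptions, both common defaults rather than choices stated in this paper: the similarity is the negative squared Euclidean distance, and the preference is the median similarity. Both function names are hypothetical.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Original affinity propagation on a similarity matrix S (diagonal = preferences).

    Returns the exemplar indices and, for every point, the index into the
    exemplar array of the cluster it belongs to.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0.0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1.0 - damping) * A_new

        # Points with a(k,k) + r(k,k) > 0 are the current exemplars.
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        key = tuple(exemplars)
        if key == last and exemplars.size > 0:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            last, stable = key, 0

    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)  # exemplars label themselves
    return exemplars, labels


def negative_squared_distances(points):
    """Similarity s(i,k) = -||x_i - x_k||^2, with the median as preference (assumed)."""
    P = np.asarray(points, dtype=float)
    S = -np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    off_diag = S[~np.eye(len(P), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))
    return S
```

The number of clusters is not fixed in advance. It emerges from the preferences on the diagonal of S, and raising them yields more exemplars.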
Each exemplar centroid frequency obtained by the IAP is classified into a low-, medium-, or high-frequency group, and the frequency-group histogram is then obtained. The brightness score is obtained from this histogram in a manner similar to the hue score; however, when the 8-bit binary code is obtained, the threshold determining the "1" or "0" values is set to zero. Therefore, a low score indicates that the dominant frequency of the environmental sound is low, while a higher score indicates that the environmental sound consists of various frequencies.
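The brightness coding can be sketched as follows, but several details are not given in the text and are filled in here as assumptions. The 500 Hz and 2000 Hz group boundaries are hypothetical. The group bits are ordered high, medium, low (most to least significant) and zero-padded to 8 bits. This ordering is chosen so that the stated behavior holds: only low-frequency exemplars give a low score, and exemplars spread over all groups give a higher one. The function name brightness_score is also hypothetical.

```python
import numpy as np

# Hypothetical group boundaries in Hz; the text does not state them.
LOW_MID_EDGE = 500.0
MID_HIGH_EDGE = 2000.0


def brightness_score(centroid_freqs_hz):
    """Binary-code score from exemplar centroid frequencies (sketch).

    Each centroid is assigned to a low, medium, or high group; a group's bit
    is 1 when its count exceeds zero (the threshold stated in the text).
    Bits are ordered high, medium, low and zero-padded to 8 bits (assumed).
    """
    f = np.asarray(centroid_freqs_hz, dtype=float)
    low = np.count_nonzero(f < LOW_MID_EDGE)
    mid = np.count_nonzero((f >= LOW_MID_EDGE) & (f < MID_HIGH_EDGE))
    high = np.count_nonzero(f >= MID_HIGH_EDGE)
    bits = [int(count > 0) for count in (high, mid, low)]
    code = "".join(str(b) for b in bits).zfill(8)
    return code, int(code, 2)
```

Under these assumptions, centroids at 120 Hz and 300 Hz give ("00000001", 1). Centroids at 120 Hz, 800 Hz, and 4000 Hz give ("00000111", 7).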

Painted Sound Map from Three Scores
The hue, saturation, and brightness scores are used to draw the painted sound map: the color specified by these three scores in the hue-saturation-brightness (HSB) color model is converted to the red-green-blue (RGB) color model and painted at the estimated sound position.
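The HSB-to-RGB step can be sketched with Python's standard colorsys module. Its hsv_to_rgb function implements the HSV model, which is the same model as HSB, and expects all three inputs in [0, 1]. The text does not say how the 0-255 scores are scaled. The linear division by 255 below is an assumption, and scores_to_rgb is a hypothetical name.

```python
import colorsys


def scores_to_rgb(hue_score, saturation_score, brightness_score, max_score=255):
    """Map three 8-bit scores to an (R, G, B) triple in 0-255.

    Assumes each score is scaled linearly to [0, 1] before the standard
    HSV (= HSB) to RGB conversion.
    """
    h, s, v = (x / max_score for x in (hue_score, saturation_score, brightness_score))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))
```

For example, scores (0, 255, 255) map to pure red (255, 0, 0). Any score triple with zero saturation maps to a gray level set by the brightness score alone.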

Experimental Results and Discussion
In this section, the efficacy of the painted sound map method is demonstrated using environmental sounds observed in the sound environment shown in Figure 1. In each demonstration, the painted sound map is drawn using the environmental sounds generated in a single day. In each figure shown below, the position of the microphone array is indicated by a red circle.

Painted Sound Map of Sound Environment on Typical Day
The sound environment shown in Figure 1 is a shopping-center galleria, the painted sound map of which is shown in Figure 8. A train station is located near the galleria, outside the left side of Figure 8. Therefore, train sounds are generated intermittently. Further, rattling sounds from chairs and desks are generated during the galleria preparation time, along with the voices of children and students visiting the galleria.
Figure 9 shows the painted sound map for a different day.It is notable that the generated sound patterns are similar to those in Figure 8. From these two maps, it can be concluded that painted sound maps can be utilized to determine similarities in the sound patterns of sound environments.

Painted Sound Map on Windy Day
Figure 10 shows a painted sound map obtained on a windy day. Comparing Figures 8-10, it is apparent that the painted sound map varies with the state of the sound environment; hence, painted sound maps can be utilized to detect variations in a sound environment.

Mini-Concert Event
Figure 11 shows the painted sound map obtained for the same area during a mini-concert event. Blue tones are emphasized at the event location, on the left-hand side of the map. In addition, the painted sound map obtained on the same day is shown in Figure 12. At a glance, the painted sound map is similar …

From all the above results, it can be concluded that the painted sound map drawn using the three scores discussed above is effective for sound environment analysis. In particular, this approach is useful for visually detecting and determining sound environment conditions and their variations, and the snapshots provided by the painted sound maps are effective in this regard.

Conclusions
This paper has proposed a method of monitoring sound environments based on computational auditory scene analysis.The proposed visualization technique allows sound environment conditions to be determined and represented using colors.
As future work, the proposed monitoring technique could be applied to monitoring the structural conditions of aging buildings.

Further, $X$ denotes all data in a cluster consisting of data $X_{k''}$. The parameters α and β are positive constants, with α greater than one and β less than one.
Therefore, the IAP algorithm proposed in this study is implemented by adding rules (5) and (6) to the original AP rules (1)-(4).

Improved Affinity Propagation (IAP) Performance
In this subsection, the proposed IAP method is compared with the original and adaptive AP methods using 2D random points. The number of data points is N = 30. The original and adaptive AP methods are implemented using the MATLAB program obtained from [19]. The performance of the three algorithms is evaluated using the Calinski-Harabasz criterion [20], also called the variance ratio criterion (VRC), which is the ratio of the between-cluster variance to the total within-cluster variance, defined as

$$\mathrm{VRC} = \frac{SS_B}{SS_W} \times \frac{N-k}{k-1}.$$

Here, $k$ denotes the number of clusters and $SS_B$ is the overall between-cluster variance, which is essentially the variance of all the cluster centroids from the grand centroid of the dataset, defined as

$$SS_B = \sum_{i=1}^{k} n_i \lVert \mathbf{x}_i - \mathbf{x}_m \rVert^2,$$

where $n_i$ indicates the number of elements in cluster $i$, $\mathbf{x}_i$ is the centroid of cluster $i$, $\mathbf{x}_m$ is the overall mean of the dataset, and $\lVert \ast \rVert$ is the L2 norm of $\ast$. Further, $SS_W$ is the overall within-cluster variance, defined as

$$SS_W = \sum_{i=1}^{k} \sum_{\mathbf{x}' \in c_i} \lVert \mathbf{x}' - \mathbf{m}_i \rVert^2,$$

where $\mathbf{x}'$ is a data point, $c_i$ is the $i$th cluster, and $\mathbf{m}_i$ is the centroid of $c_i$.

A large positive value of VRC indicates superior clustering performance. In this study, α and β in (5) and (6) were set to 1.5 and 0.9, respectively.
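The criterion above can be computed directly from data and cluster labels. The NumPy sketch below follows $SS_B$, $SS_W$, and the $(N-k)/(k-1)$ factor term by term. scikit-learn's calinski_harabasz_score computes the same index. The function name variance_ratio_criterion is hypothetical.

```python
import numpy as np


def variance_ratio_criterion(X, labels):
    """Calinski-Harabasz index: (SS_B / SS_W) * (N - k) / (k - 1)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = X.shape[0], clusters.size
    if not 1 < k < n:
        raise ValueError("the criterion requires 1 < k < N")
    grand_mean = X.mean(axis=0)  # x_m, the overall mean of the dataset
    ss_b = 0.0
    ss_w = 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)  # cluster centroid (x_i, m_i in the text)
        # Between-cluster term: n_i * ||x_i - x_m||^2
        ss_b += members.shape[0] * np.sum((centroid - grand_mean) ** 2)
        # Within-cluster term: sum of ||x' - m_i||^2 over the cluster
        ss_w += np.sum((members - centroid) ** 2)
    return (ss_b / ss_w) * (n - k) / (k - 1)
```

For the four points (0, 0), (0, 2), (10, 0), and (10, 2), split into two clusters by their first coordinate, $SS_B = 100$ and $SS_W = 4$, so VRC = (100/4) × (2/1) = 50.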
Table 1 shows the performance results, averaged over 30 trials. The proposed IAP method requires a longer computational time and yields more exemplars than the original AP method; however, it exhibits superior clustering performance compared with the two conventional methods. Note that the running time was measured on a PC (CPU: i7-4770 @ 3.4 GHz, RAM: 8.0 GB). Hence, it can be concluded that rules (5) and (6) effectively improve the clustering performance of the original AP method.

Figure 3. Pitch features of environmental sound shown in Figure 2.

Figure 4. Chromagram of environmental sound shown in Figure 2.

Figure 6. Sonogram of environmental sound shown in Figure 2.

Figure 7. Spectrogram of environmental sound shown in Figure 2.

Figure 8. Painted sound map of sound environment shown in Figure 1.

Figure 9. Painted sound map showing similar sound patterns to those of Figure 8.

Figure 10. Painted sound map obtained on windy day.

Figure 11. Painted sound map obtained during mini-concert event.

Figure 12. Painted sound map obtained on event day.