Emotion Detection by Analyzing Voice Signal Using Wavelet

Emotion is a unique faculty of the human mind that plays a vital role in distinguishing human civilization from others. Voice is one of the most important media for expressing emotion: we can identify many types of emotion by talking or by listening to voices, which reach us as voice signals. Just as people's ways of talking differ, so do the ways they express emotions. By watching or hearing a person speak, we can easily guess his or her personality and momentary emotions. People's emotions and feelings are expressed in different ways, and it is through the expression of emotions and feelings that people fully convey their thoughts. Happiness, sadness, and anger are among the principal emotions people express. To express these emotions, people use body postures, facial expressions and vocalizations; although people use a variety of such means, the easiest and most complete channel for emotion and feeling is the voice signal. The subject of our study is whether we can identify the correct human emotion by examining the human voice signal. By analyzing the voice signal through wavelets, we have tried to show whether the mean frequency, maximum frequency and L p norm values conform to a pattern according to the emotion expressed. Moreover, the technique applied here is implemented in MATLAB, and it compares the mean frequency, maximum frequency and L p norm of different voices to find relations and detect emotion.

Fields such as machine learning, robotics and data science have made our lives easier and more enjoyable day by day. Although it is not possible to give a machine emotions and feelings, a machine can often accurately identify the expression of a person's emotions and feelings by analyzing the voice signal. We analyze voice signals through signal processing. When we want to express an emotion or feeling through our throat, through speech, or through spoken or unspoken words, it simply takes the form of a basic signal composed of sines and cosines [1]. These signals vary from person to person and with the character of each person's voice. By processing these voice signals, we can get an idea of an individual's characteristics, identify the speaker, and recognize his or her various emotions. Previously such signals were processed with the Fourier series and Fourier transform, but there is a problem: the Fourier series is not well localized in time [2]. We use the Haar wavelet to process our voice signals because wavelets are well localized in time [3]. Although voice identification and classification began in the early 1960s, it is now a very rich research field, much like image processing [4]. This richness is due to characteristics that determine the character of a voice signal, such as loudness, amplitude, mean frequency, maximum frequency, L p norm and standard deviation, which differ from person to person. We have calculated the unique features of each signal by analyzing the voice signals through a five-level decomposition with the Haar wavelet [5]. Emotion detection, one subfield of speech classification, is a technique of extracting characteristic information from a person's voice, which is then analyzed by machine to identify the uniqueness of the speech.
Speech classification is interconnected with many other fields, such as image processing, robotics, artificial intelligence and machine learning. A great deal of current machine-learning work involves voice classification, voice identification and emotion detection, which has greatly enriched robotics. A large part of robotics today depends on speech commands, through which our daily electronics are moving away from buttons and switches. By adding speech classification to machine learning, we can examine a person's speech record, store the emotion of each utterance as data, and estimate his or her next emotional state. With the addition of voice detection to CCTV cameras, we could identify a culprit through emotion detection by hearing the voice alone, without seeing the face. Emotion detection also plays an important role for people with physical and mental disabilities who communicate mainly through machines. Therefore, emotion detection and voice classification have not only made our daily lives easier; they have become an important part of our lives. There are many ways of classifying voice for emotion detection; the wavelet is one of the best tools among them [6].

Wavelet
The word wavelet means a small wave; in brief, a wavelet is an oscillation that decays quickly. Equivalent mathematical conditions for a wavelet ψ are that it has zero mean and finite energy:

∫ ψ(t) dt = 0 and ∫ |ψ(t)|² dt < ∞, with both integrals taken over the whole real line.

Haar Wavelet
The Hungarian mathematician Alfréd Haar first introduced the Haar function in 1909 in his Ph.D. thesis. The function defined on the real line ℜ by

ψ(t) = 1 for 0 ≤ t < 1/2, ψ(t) = −1 for 1/2 ≤ t < 1, and ψ(t) = 0 otherwise,

is known as the Haar function [7].
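As an illustration (not the paper's code, which was written in MATLAB), the Haar mother wavelet, which equals 1 on [0, 1/2), −1 on [1/2, 1) and 0 elsewhere, and its scaled and translated family can be sketched in Python:

```python
def haar(t):
    """Evaluate the Haar mother wavelet at time t."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def haar_jk(t, j, k):
    """The dilated/translated family psi_{j,k}(t) = 2^(j/2) * psi(2^j * t - k)."""
    return (2.0 ** (j / 2.0)) * haar((2.0 ** j) * t - k)

print(haar(0.25), haar(0.75), haar(1.5))  # 1.0 -1.0 0.0
```

The normalization factor 2^(j/2) keeps every member of the family at unit energy; other texts use slightly different conventions.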

Signal
A signal is defined as a function f(t) of one or more independent variables, such as time, that carries information about the behavior or nature of some phenomenon.

Wavelets and Signal Processing
The mathematical theory of wavelets deals to a great extent with ways of obtaining series expansions of the type

f(t) = Σ_{j,k} c_{j,k} ψ_{j,k}(t)

for certain functions f. Very often, the relevant functions f describe the time dependence of certain signals, e.g. the vibrations in a mechanical system or the current in an electric circuit. We now briefly describe a property of wavelets that distinguishes representations of this type from other series expansions.
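A hypothetical pure-Python sketch of the averaging-and-differencing scheme behind such a Haar expansion follows (the paper performs a five-level decomposition in MATLAB; the sample values below are invented for illustration):

```python
def haar_step(x):
    """One Haar analysis step: pairwise sums (approximation) and
    pairwise differences (detail), each scaled by 1/sqrt(2)."""
    s = 2 ** 0.5
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail

def haar_decompose(x, levels=5):
    """Repeatedly split the approximation, collecting the detail bands."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

signal = [4.0, 2.0, 6.0, 8.0, 5.0, 3.0, 1.0, 7.0]
a, ds = haar_decompose(signal, levels=3)
```

Each step halves the length of the approximation, so a five-level decomposition needs a signal length divisible by 2^5; in practice recorded signals are padded or truncated to such a length.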

Frequency and Maximum Frequency
Frequency is the number of occurrences of a repeating event per unit time. In physics, frequency is the number of waves that pass a fixed point in unit time; equivalently, it is the number of cycles or vibrations undergone in one unit of time by a body in periodic motion. A body in periodic motion is said to have undergone one cycle, or one vibration, after passing through a series of positions and returning to its original state. If the period, or time interval, required to complete one cycle of vibration is 1/2 second, the frequency is two per second [10]. The symbols most often used for frequency are f and the Greek letters ν (nu) and ω (omega). In general, the frequency is the reciprocal of the period, or time interval: if T is the time required to complete one cycle, then the frequency can be written as f = 1/T. The frequency is maximum when the number of cycles in a unit period is greatest.

Mean Frequency
The mean frequency of a spectrum is calculated as the sum of the products of the spectrogram intensity (in dB) and the frequency, divided by the total sum of spectrogram intensity [10]. If n is the number of frequency bins in the spectrum, f_i is the frequency of the spectrum at bin i and I_i is the intensity (in dB) of the spectrum at bin i, then the mean frequency f_mean can be written as

f_mean = (Σ_{i=1}^{n} I_i f_i) / (Σ_{i=1}^{n} I_i).

L p Norm
For a discrete function f, the L_p norm can be defined as

‖f‖_p = (Σ_k |x_k|^p)^{1/p},

where the x_k are the components of f [11].
If we put p = 1 in the above equation, then ‖f‖_1 is called the L 1 Norm.
If we put p = 2 in the above equation, then ‖f‖_2 is called the L 2 Norm.
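As a sketch, the three kinds of feature used in this paper (mean frequency, maximum frequency and the discrete L_p norm) can be computed in Python as follows. The bin values are invented for illustration; the paper computed its features in MATLAB from recorded speech:

```python
def mean_frequency(freqs, intensities):
    """Intensity-weighted mean of the bin frequencies."""
    total = sum(intensities)
    return sum(f * i for f, i in zip(freqs, intensities)) / total

def max_frequency(freqs, intensities, floor=0.0):
    """Highest bin frequency whose intensity exceeds a floor."""
    return max(f for f, i in zip(freqs, intensities) if i > floor)

def lp_norm(x, p):
    """Discrete L_p norm: (sum |x_k|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

freqs = [100.0, 200.0, 300.0, 400.0]   # made-up bin frequencies (Hz)
intens = [1.0, 3.0, 4.0, 2.0]          # made-up bin intensities (dB)
print(mean_frequency(freqs, intens))   # 270.0
print(lp_norm([3.0, -4.0], 2))         # 5.0
```

Note that for p = 1 the norm is the sum of absolute values, and for p = 2 it is the familiar Euclidean length, matching the two special cases above.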

Experimental Sample
Here we take twelve voice samples from four people. From each person, we collected voice samples in three different moods (joy, sorrow and anger).

First Experimental Real Voice of First Person
In the first experimental recording, the first person speaks into a microphone and says, "can you please tell me what is going on" with joy, between 0 and 5 seconds. From statistical analysis of the first experimental speech signal of the first person, we see that the mean frequency of the signal is 347 Hz, the maximum frequency is 453 Hz, the L 1 Norm is 2263 and the L 2 Norm is 23.35.

Second Experimental Real Voice of First Person
In the second experimental recording, the first person speaks into a microphone and says, "can you please tell me what is going on" with sorrow, between 0 and 5 seconds.
From statistical analysis of the second experimental speech signal of the first person, we see that the mean frequency of the signal is 323 Hz, the maximum frequency is 428 Hz, the L 1 Norm is 2311 and the L 2 Norm is 17.26.

Third Experimental Real Voice of First Person
In the third experimental recording, the first person speaks into a microphone and says, "can you please tell me what is going on" with anger, between 0 and 5 seconds. From statistical analysis of the third experimental speech signal of the first person, we see that the mean frequency of the signal is 376 Hz, the maximum frequency is 508 Hz, the L 1 Norm is 2181 and the L 2 Norm is 28.02.

First Experimental Real Voice of Second Person
In the first experimental recording, the second person speaks into a microphone and says, "can you please tell me what is going on" with joy, between 0 and 5 seconds.
From statistical analysis of the first experimental speech signal of the second person, we see that the mean frequency of the signal is 366 Hz, the maximum frequency is 481 Hz, the L 1 Norm is 2206 and the L 2 Norm is 24.32.

Second Experimental Real Voice of Second Person
In the second experimental recording, the second person speaks into a microphone and says, "can you please tell me what is going on" with sorrow, between 0 and 5 seconds.
From statistical analysis of the second experimental speech signal of the second person, we see that the mean frequency of the signal is 313 Hz, the maximum frequency is 423 Hz, the L 1 Norm is 2327 and the L 2 Norm is 15.28.

Third Experimental Real Voice of Second Person
In the third experimental recording, the second person speaks into a microphone and says, "can you please tell me what is going on" with anger, between 0 and 5 seconds. From statistical analysis of the third experimental speech signal of the second person, we see that the mean frequency of the signal is 380 Hz, the maximum frequency is 497 Hz, the L 1 Norm is 2049 and the L 2 Norm is 27.88.

First Experimental Real Voice of Third Person
In the first experimental recording, the third person speaks into a microphone and says, "can you please tell me what is going on" with joy, between 0 and 5 seconds.
From statistical analysis of the first experimental speech signal of the third person, we see that the mean frequency of the signal is 372 Hz, the maximum frequency is 456 Hz, the L 1 Norm is 2218 and the L 2 Norm is 22.09.

Second Experimental Real Voice of Third Person
In the second experimental recording, the third person speaks into a microphone and says, "can you please tell me what is going on" with sorrow, between 0 and 5 seconds.
From statistical analysis of the second experimental speech signal of the third person, we see that the mean frequency of the signal is 319 Hz, the maximum frequency is 433 Hz, the L 1 Norm is 2295 and the L 2 Norm is 16.43.

Third Experimental Real Voice of Third Person
In the third experimental recording, the third person speaks into a microphone and says, "can you please tell me what is going on" with anger, between 0 and 5 seconds.
From statistical analysis of the third experimental speech signal of the third person, we see that the mean frequency of the signal is 396 Hz, the maximum frequency is 514 Hz, the L 1 Norm is 2031 and the L 2 Norm is 28.16.

First Experimental Real Voice of Fourth Person
In the first experimental recording, the fourth person speaks into a microphone and says, "can you please tell me what is going on" with joy, between 0 and 5 seconds.
From statistical analysis of the first experimental speech signal of the fourth person, we see that the mean frequency of the signal is 360 Hz, the maximum frequency is 473 Hz, the L 1 Norm is 2188 and the L 2 Norm is 24.07.

Second Experimental Real Voice of Fourth Person
In the second experimental recording, the fourth person speaks into a microphone and says, "can you please tell me what is going on" with sorrow, between 0 and 5 seconds.
From statistical analysis of the second experimental speech signal of the fourth person, we see that the mean frequency of the signal is 322 Hz, the maximum frequency is 416 Hz, the L 1 Norm is 2235 and the L 2 Norm is 17.28.

Third Experimental Real Voice of Fourth Person
In the third experimental recording, the fourth person speaks into a microphone and says, "can you please tell me what is going on" with anger, between 0 and 5 seconds.
From statistical analysis of the third experimental speech signal of the fourth person, we see that the mean frequency of the signal is 402 Hz, the maximum frequency is 518 Hz, the L 1 Norm is 2083 and the L 2 Norm is 27.12.

Results and Discussions
In the above experimental study, we tried to predict the approximate emotion by analyzing four unique and basic features (mean frequency, maximum frequency, L 1 norm and L 2 norm) of the voice signal. For this purpose, we analyzed the voice signals with the Haar wavelet (shown in Figure 1). The horizontal axis of each signal plot is the sample number; each of our signals contains 40,000 samples.
In our experiment, we took three voices in three different moods (joy, sorrow, anger) from each of four people (Figures 2-13). We calculated the mean frequency, maximum frequency, L 1 norm and L 2 norm for each voice (Tables 1-4).
We can see from the table and chart (shown in Table 5 and Figure 14) that the corresponding values lie in between 25 and 30. It can be observed that the L 1 norm and L 2 norm gradually decrease and increase, respectively, from sorrow through joy to anger (shown in Table 6, Figure 15 and Figure 16).
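Purely as an illustration of how such trends could be turned into a decision rule, the following Python sketch maps an L 2 norm value to a mood label. The cutoffs are read off the sample values reported above and would not generalize beyond these twelve recordings:

```python
def guess_emotion(l2_norm):
    """Map an L2-norm value to a mood label using rough, illustrative
    cutoffs taken from the twelve samples reported in the text."""
    if l2_norm < 20.0:        # observed sorrow samples: about 15.3-17.3
        return "sorrow"
    if l2_norm < 26.0:        # observed joy samples: about 22.1-24.3
        return "joy"
    return "anger"            # observed anger samples: about 27.1-28.2

print(guess_emotion(17.26), guess_emotion(23.35), guess_emotion(28.02))
```

A practical system would combine all four features and learn the boundaries from many more speakers rather than hand-picking them from one table.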

Conclusion
In the above discussion, we have covered many aspects of the voice signal. Our aim was to give a complete picture of how voice signals behave under different human emotions. By processing the voice signal through wavelet analysis, we have seen that the signal changes its structure across three different moods (joy, sorrow and anger) and yields characteristic values of mean frequency, maximum frequency, L 1 norm and L 2 norm. By establishing a relationship among these characteristic values, we can readily estimate the emotion of a voice signal. These results not only help to determine emotion but also give important information about speakers. Emotion detection through voice signal analysis plays an important role in machine learning, robotics, artificial intelligence, data mining, voice biometrics, intelligence work, forensics and related fields. With the help of voice biometrics, forensics and voice identification, intelligence departments will be able to identify many criminals, and as a result cybercrime may be greatly reduced. Nowadays voice automation is a very popular and widely used medium: many things are already done through voice commands, and in the near future this sector will spread further. Voice security is among the most trusted security media. Therefore, in the present world, emotion detection, voice identification and speaker recognition are very important tasks. In this paper, we generated our code with MATLAB programming and recorded our sounds with a regular headset, so some noise may be mixed with the original voices. We will try to improve this issue in future work.