Senior Design Project Report
Students: Tyler Havner, Ismael Perez
Technical Advisors: Dr. Reza Raeisi, Dr. Daniel Bukofzer, Dr. Sean Fulop
FALL 2012
TABLE OF CONTENTS

Course Evaluation Rubric
Definition of Key Terms
1.2 General Statement of the Problem
1.3 Objective Solution
1.4 Scope of the Study
1.5 Project Limitations
2. Background Theory
   2.1 Analog to Digital Converter (ADC)
   2.2 Frequency Spectrum
   2.3 Digital Filters
      2.3.1 Infinite Impulse Response (IIR) Filters
      2.3.2 Finite Impulse Response (FIR) Filters
   2.4 FPGA
   2.5 Summary
3. Monetary Costs of Project
4. Methodology
   4.1 Theoretical Concept
   4.2 Detailed Algorithm and Design Approach
      4.2.1 Data Acquisition and the ADC
      4.2.2 Start of Word Detection
      4.2.3 Frequency Analysis
      4.2.4 Fingerprint Generation
      4.2.5 Comparison Function
      4.2.6 Driving Outputs
      4.2.7 System Architecture
      4.2.8 Training the System
   4.3 Work Breakdown
5. Parts Ordering
6. Findings and Conclusions
   6.1 MatLab Findings
   6.2 Testing Results
   6.3 System Improvements
7. Conclusion
References
Appendix A: MatLab Code
Appendix B: Ordering Receipts
Appendix C: DE2 Board Code
ECE 186B Course Evaluation Rubric

1. How successfully you were able to convert your problem statement or project objectives to your own engineering domain, such as the digital domain, control domain, microcontroller domain, etc., in order to find an approach to a solution.

We were very successful at converting our project objectives into our own engineering domains. We created a MatLab model to observe the signal in both the discrete-time and frequency domains, and we created the equivalent model in the microprocessor domain by programming the DE2 board in C.

2. How successfully you were able to determine the right engineering tools for the purpose of your project.

We were able to determine the right engineering tools through our years of familiarity with lab equipment and software. When it became clear that our project entailed a great deal of digital signal processing (DSP), we quickly realized that MatLab would cut down design time by providing powerful DSP toolboxes to analyze each step of our design. The bench tools also allowed us to test all of our hardware.

3. The effectiveness of using the tools.

We used the engineering tools effectively. We had to research some of them, such as how to sample data from the microphone input channel in MatLab and how to program the DE2 board in C. Because of our sound foundation in those areas we were able to use those tools accurately and effectively.

4. Your experience in developing a prototype and a simulation of it.

We feel confident in our ability to use engineering tools to come up with the best solution within the given constraints. When developing a prototype, many unanticipated issues arise and the design must be adapted. We feel we made good design decisions and delivered a working prototype in a timely manner.

5. Overall correctness of your design.
Our design was correct according to the goals of the method we set out to test. There was some error associated with casting floating-point accumulations into integer values, but it was minor. The design could have been improved with a higher-order filter, as we had originally planned, but that would have required even longer computation times and more memory.
Definition of Key Terms

The following is a list of key terms, along with their definitions from Wikipedia, that will be needed in order to grasp the concept of our proposed system [6].

- Analog Signal: any continuous signal for which the time-varying feature (variable) of the signal is a representation of some other time-varying quantity.
- Digital Signal: a physical signal that is a representation of a sequence of discrete values (a quantized discrete-time signal).
- Pulse-code modulation (PCM): a PCM stream is a digital representation of an analog signal in which the magnitude of the analog signal is sampled regularly at uniform intervals, with each sample quantized to the nearest value within a range of digital steps.
- Frequency Spectrum: a representation of a time-domain signal in the frequency domain. The frequency spectrum can be generated via a Fourier transform of the signal, and the resulting values are usually presented as amplitude and phase, both plotted versus frequency.
- Low Pass Filter: an electronic filter that passes low-frequency signals but attenuates (reduces the amplitude of) signals with frequencies higher than the cutoff frequency.
- Band Pass Filter: an electronic filter that passes frequencies within a certain range and rejects (attenuates) frequencies outside that range.
- Digital Filter: a filter characterized by its transfer function or, equivalently, its difference equation.
- Logarithmic Scale: a scale of measurement using the logarithm of a physical quantity instead of the quantity itself. A simple example is a chart whose vertical axis has equally spaced increments labeled 1, 10, 100, 1000 instead of 1, 2, 3, 4. Each unit increase on the logarithmic scale thus represents an exponential increase in the underlying quantity for the given base (10, in this case).
- Decibel: a logarithmic unit that indicates the ratio of a physical quantity (usually power or intensity) relative to a specified or implied reference level. A ratio in decibels is ten times the logarithm to base 10 of the ratio of two power quantities.
- Accumulator: a register in which intermediate arithmetic and logic results are stored. Without a register like an accumulator, it would be necessary to write the result of each calculation (addition, multiplication, shift, etc.) to main memory.
- Aliasing: an effect that causes different signals to become indistinguishable (or aliases of one another) when sampled.
- FPGA: an integrated circuit designed to be configured by the customer or designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC).
as well as programming in hardware description language (HDL) and C. The hardware portion of the project will require a background in electric motors and the electronics to drive them. The courses we have taken to prepare for this project are as follows:

Tyler:
  ECE 71   Programming in C
  ECE 121  Electromechanical Systems
  ECE 124  Signals and Systems
  ECE 134  Communications
  ECE 138  Electronics II

Ismael:
  ECE 107  Digital Signal Processing
  ECE 124  Signals and Systems
  ECE 176  Verilog Coding
  ECE 178  Embedded Systems
  CSCI 150 Software Engineering
1.5 Project Limitations

Speech recognition is a very difficult problem, and modern speech analysis is accomplished using complex probabilistic characterizations of words and sentence structures known as hidden Markov models [3]. The goal of our project is to create a reliable and accurate system that does not rely on such complex models. The system is intended to be simple in order to lay a basis for future consumer products. One such restriction on our system is speaker dependence. Creating a speaker-independent system is a very complex problem, so our system will have a primary speaker (the homeowner). Also, our system will use single-word recognition, not continuous speech recognition. The speaker will first have to train our system with several versions of the same word, yielding a reference fingerprint. The reference fingerprint is the set of values that results from averaging the three sets of values from the training words. Subsequent words can then be recognized based upon how closely they match the saved reference fingerprint.
The size of the quantization levels is then based upon the amplitude bounds and the number of levels. Equation 2.1.2 shows this relationship, with Δ equal to the size of the quantization levels, A the peak amplitude, and L the number of quantization levels.

Δ = 2A / L    (2.1.2)
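Equation 2.1.2 can be sketched in C as follows. This is a minimal illustration of the step-size relationship and a uniform quantizer built on it; the function names are ours, not part of the project code.

```c
/* Quantization step size from Equation 2.1.2: delta = 2A / L,
 * where A is the peak amplitude and L the number of levels. */
double quant_step(double peak_amplitude, int num_levels) {
    return 2.0 * peak_amplitude / num_levels;
}

/* Map a sample in the range [-A, A] to a level index 0 .. L-1. */
int quantize(double sample, double peak_amplitude, int num_levels) {
    double delta = quant_step(peak_amplitude, num_levels);
    int level = (int)((sample + peak_amplitude) / delta);
    if (level < 0) level = 0;
    if (level > num_levels - 1) level = num_levels - 1;
    return level;
}
```

For the 3-bit PCM example of Figure 2.1.1, L = 8, so a full-scale amplitude of 1 gives a step size of 0.25.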
Figure 2.1.1: 3-bit PCM Encoded Waveform

In Figure 2.1.1, the quantization levels are numbered on the left-hand side of the y-axis while the binary encoded representation of the levels is shown on the right-hand side of the y-axis. The x-axis shows the encoded sampled values of the waveform. The sampling frequency we will use is based on the Nyquist criterion [1], which says that in order to avoid aliasing in the reconstructed waveform the sampling frequency must be at least twice the highest frequency component, or Nyquist frequency, in the waveform. Equation 2.1.3 gives the Nyquist sampling theorem, with f_s equal to the Nyquist sampling rate and f_max equal to the highest frequency component in the sampled waveform.

f_s ≥ 2 f_max    (2.1.3)
2.2 Frequency Spectrum

Speech processing is explored by performing spectral analysis to characterize the time-varying properties of the signal [7]. In other words, speech processing requires a frequency-domain representation of the signal to be analyzed. The Fourier transform, shown in Equation 2.2.1, does exactly that and transforms a time-domain signal x(t) into its equivalent frequency-domain representation, X(f).

X(f) = ∫ x(t) e^(−j2πft) dt    (2.2.1)
The Fourier transform often reveals characteristics of the signal that would not otherwise be readily apparent in the time domain. For example, the Fourier transform of the time-limited rectangle function becomes an infinitely banded sinc function in the frequency domain, as shown in Figure 2.2.1.
Figure 2.2.1: Fourier Transform of rect(t)

In the case of a sampled digital waveform, x[n], the discrete Fourier transform (DFT) is used. It is described by Equation 2.2.2.

X[k] = Σ_{n=0}^{N−1} x[n] e^(−j2πkn/N)    (2.2.2)
The power spectral density (PSD) of a waveform builds on the Fourier transform of a waveform and relates the energy of the signal to frequency [8]. This property is described by Rayleigh's Theorem and is shown in Equation 2.2.3.

E = ∫ |x(t)|² dt = ∫ |X(f)|² df    (2.2.3)

Using Rayleigh's Theorem to obtain the PSD of the waveform, the significant frequency components of the waveform become more apparent [8]. For a discrete-time waveform, Rayleigh's Theorem becomes Equation 2.2.4.

E = Σ_{n=0}^{N−1} |x[n]|² = (1/N) Σ_{k=0}^{N−1} |X[k]|²    (2.2.4)
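The time-domain side of Equation 2.2.4, the energy of a sampled waveform as the sum of its squared samples, can be sketched in C (the frequency-domain side would additionally require a DFT; the function name here is our own illustration):

```c
/* Time-domain side of Equation 2.2.4: the energy of a sampled
 * waveform is the sum of its squared samples. */
double signal_energy(const double *x, int n) {
    double e = 0.0;
    for (int i = 0; i < n; i++)
        e += x[i] * x[i];  /* |x[n]|^2 for a real-valued signal */
    return e;
}
```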
2.3 Digital Filters

There are two types of digital filters available for our purposes: infinite impulse response (IIR) filters and finite impulse response (FIR) filters. We studied the attributes of each and made our decision based on that study. The following discusses some of the major differences between the two.

2.3.1 IIR Filters

IIR filters are generally more difficult to control and can be unstable, since their feedback paths place poles in the transfer function; we want filters that are both controllable and stable. IIR filters generally do not have a linear phase response, and their finite-precision implementations can exhibit limit cycles. Both the poles and the zeros of the transfer function affect these filters. Because IIR filters require fewer coefficients than FIR filters for a comparable cutoff sharpness, they require less memory. Figure 2.3.1 shows the non-linear phase response of an IIR filter.
2.3.2 FIR Filters

FIR filters can always be designed with a linear phase and therefore behave predictably. Unlike IIR filters, FIR filters are inherently stable and do not suffer from limit cycles, because the output depends only on present and past values of the input. Another difference is that FIR filters have no analog heritage, whereas IIR filter designs are typically derived from analog prototypes. We wanted filters that were purely digital, so FIR filters were the logical choice. The tradeoff is that FIR filters must be of higher order than IIR filters for a comparable cutoff, so they require more multiplications, additions, and memory, since they typically need more coefficients to achieve a sharp cutoff. Delays are easy to implement in FIR filters, and an FIR filter depends only on the zeros of its transfer function. Figure 2.3.2 shows the linear phase plot of an FIR filter.
Figure 2.3.2: FIR Phase Plot

2.4 FPGA

An FPGA is an IC that contains an array of identical logic cells, also known as configurable logic blocks (CLBs), with programmable interconnections [9]. The user can program the functions realized by each logic cell and the connections between the cells. A typical CLB contains two or more function generators, often referred to as look-up tables or LUTs, programmable multiplexers, and D-CE flip-flops. A D-CE flip-flop is just a normal D flip-flop with a clock enable (CE) input: as long as CE is asserted, the flip-flop acts like a regular D flip-flop; otherwise it holds its value. Figure 2.4.1 shows a simplified version of a CLB.
The CLB shown in Figure 2.4.1 contains two function generators, two flip-flops, and various multiplexers for routing signals within the CLB. Each function generator has four inputs and can implement any function of up to four variables. The function generators are implemented as lookup tables (LUTs). A four input LUT is essentially a reprogrammable read-only memory (ROM) with 16 1-bit words [9]. This ROM stores the truth table for the function being generated. The array of CLBs is then surrounded by a ring of input-output (I/O) interface blocks. Figure 2.4.2 shows the layout of part of a typical FPGA.
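The LUT-as-ROM idea can be illustrated with a short C sketch of our own, where a 16-bit word stores the truth table and the four inputs form the address. For example, 0x6996 encodes the 4-input XOR (odd parity) function.

```c
#include <stdint.h>

/* A 4-input LUT modeled as a 16-entry, 1-bit-wide ROM: the four
 * inputs are concatenated into an address into the stored table. */
int lut4(uint16_t truth_table, int a, int b, int c, int d) {
    int addr = (a << 3) | (b << 2) | (c << 1) | d;
    return (truth_table >> addr) & 1;
}
```

Reprogramming the CLB amounts to loading a different 16-bit truth table, which is why a LUT can implement any function of up to four variables.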
The I/O blocks in Figure 2.4.2 connect the CLB signals directly to the IC pins [9]. Normally an FPGA contains other components such as memory blocks, clock generators, and tri-state buffers, as well as other useful digital components. The user-defined flexibility, coupled with the large amount of memory on an FPGA, makes it a great choice to handle the immense amount of DSP that our project entails. Unlike the custom application-specific integrated circuit (ASIC) approach, FPGA technology lets one start from a simple design and build on it. An ASIC is built for a specific project, while an FPGA is reprogrammable and can be used for various applications. The FPGA is programmed with the use of electrically programmable switches, similar to other programmable logic devices [10].

2.5 Summary

We have briefly covered the theory that will be essential to completing our project. An understanding of ADCs will be needed to gain an accurate representation of the incoming speech signal, knowledge of the frequency spectrum will be needed to characterize its significant components, and digital filtering will be used to accomplish the spectral analysis. Knowledge of the inner workings of the FPGA will also be vital to the implementation of our project. These techniques are demonstrated in detail in our methodology.
Chapter 4: Methodology
Our project will activate a door to open or a fan to turn on using voice recognition. A word will be spoken into the microphone, and once the word has been recognized and compared, a signal will be sent to either the door or the fan to perform the specified operation. Extensive digital signal processing (DSP) will be used to process a word spoken into the microphone, and digital filters will be necessary to accomplish the DSP.

4.1 Theoretical Concept

We need a simple method to extract the significant frequency content of a speech signal. Spectrum analyzers are often used to obtain the frequency content of a waveform, but these devices are bulky, expensive, and far more sophisticated than our project needs. A better, simpler approach to reveal the frequency content of a speech signal is a band pass filter bank. The Fourier transform of a waveform can be thought of as a series of band pass filters with infinitesimally small bandwidths and center frequencies spaced infinitesimally apart, so that the output of each filter would represent one point on the Fourier transform of the waveform [3]. Obviously this is an idealized system that is not realizable, but it does emphasize that a band pass filter bank can be used to expose the frequency spectrum of a waveform. Figure 4.1.1 shows a magnitude plot of a realistic bank of band pass filters [8].
Figure 4.1.1: Band Pass Filter Bank [8]

This array of filters will capture frequencies that fall within their respective bandwidths. Based upon the output of each filter we can make inferences about the frequency content in that frequency band. The PSD gives a better understanding of the significant frequency content in the waveform. To obtain the PSD we can use Rayleigh's Theorem in Equation 2.2.3 and take the time average of the energy to get the power in each filter band [8]. The block diagram of such a system is shown in Figure 4.1.2.
Figure 4.1.2: Power Spectrum using a Filter Bank [8]

In Figure 4.1.2 the signal x(t) is routed through multiple band pass filters. Each filter's response is the part of the signal lying in the frequency range of that filter. The output of each filter is the input to a squarer block, which simply squares the signal. The output of any squarer is the part of the instantaneous signal power of the original x(t) that lies in the passband of the corresponding band pass filter. The time averager then computes the time-averaged signal power. Each output Px(fn) is a measure of the signal power of the original x(t) in a narrow band of frequencies centered at fn. Taken together, the Px(fn) values indicate the variation of signal power with frequency, i.e., the power spectrum.

In the filter bank model in Figure 4.1.1 all the filters are linearly spaced, meaning they have the same bandwidth. This wastes a lot of bandwidth because the human ear does not process all frequencies the same way and actually has unique variations [8]. Figure 4.1.3 shows the average human ear's perception of the loudness of a constant-amplitude audio tone as a function of frequency [8].
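One branch of the squarer-plus-time-averager structure in Figure 4.1.2 can be sketched in C. This is a minimal illustration over an already band-pass-filtered buffer; the function name is our own.

```c
/* One branch of Figure 4.1.2: square the band-pass-filtered signal
 * (the squarer) and average over time (the time averager) to
 * estimate Px(fn), the power in that filter's band. */
double band_power(const double *filtered, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += filtered[i] * filtered[i];  /* squarer */
    return acc / n;                        /* time averager */
}
```

Running this once per filter branch yields the set of Px(fn) values that together approximate the power spectrum.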
Figure 4.1.3: Human Ear Perception of Loudness vs. Frequency [8]

Humans can only produce speech signals up to about 10 kHz [8]. From Figure 4.1.3 it is evident that the human ear has a nonlinear response to frequency and is highly sensitive to frequency changes in the first 4 kHz, with a significant roll-off occurring thereafter. Therefore the filter bank model for speech analysis can be improved by logarithmically spacing the filters. Equations for the spacing of these filters are discussed in further detail in Section 4.2.3.

4.2 Detailed Algorithm and Design Approach

In this section we discuss in detail the steps we have to take to complete our project successfully. At first we were thinking of using the fast Fourier transform as our design approach, but we decided upon the simpler filter bank processing. We initially wanted to use 10 filters spanning about 10 kHz, but due to speed and memory limitations we opted for only 5 filters spanning about 8 kHz.

4.2.1 Data Acquisition and the ADC

We will acquire data by inputting an analog signal from a microphone connected to the mic-in port on the DE2 board. This port is connected to the 24-bit analog to digital converter (ADC) that is embedded on the board. The output of the ADC is the quantized form of the input waveform. This quantized waveform is a long string of ones and zeros, in which every 24 bits represent one point of the waveform. We did not need such high resolution for our project, so we decided to down-convert the 24-bit samples to 12-bit samples; keeping the full 24-bit resolution might cause the output of our filters to overflow. Figure 2.1.1 shows how a 3-bit ADC separates the values on the y-axis; our 12-bit values will range from -2048 to 2047. Figure 4.2.1.1 shows an example of how to down-convert from a 3-bit resolution to a 2-bit resolution.
Figure 4.2.1.1: Signed Down Conversion

Since we are using the DE2 media computer system for our project, the default sampling rate is 48 kHz. For our design this is oversampling, so we need to down-sample. We accomplished this by saving only every third value of the sampled waveform, down-sampling from 48 kHz to 16 kHz. Many projects we came across stated that a 16 kHz sampling rate is ideal for voice recognition. The mic-in ADC on the board saves audio in 2-channel stereo quality, containing a right channel and a left channel. Figure 4.2.1.2 shows the audio register ports onto which the left and right channel data are stored.
Figure 4.2.1.2: Audio port registers [11]

Since voice is only mono quality, we only need to retrieve one channel; the other channel is simply a copy of the data. We could have used either channel, and we decided to use the data from the left channel for our project.

4.2.2 Start of Word Detection

A crucial step in recognizing speech is locating the beginning of the spoken word (if there is one). For our system the ADC samples for 3 seconds after the button has been pressed. We used a windowed approach in which the absolute averages of two adjacent windows of n points each are compared against a predefined threshold. Once the threshold is surpassed, a pointer is set at the start of the previous window, and from this point onward the samples are saved into memory for 8 K samples, or half a second at a 16 kHz sampling rate. The flow chart in Figure 4.2.2.1 shows the design approach for programming the beginning-of-word detection.
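The sign-preserving down-conversion and the keep-every-third down-sampling described above can be sketched in C. This is an illustration rather than the board code; the helper names are ours, and we assume each 24-bit sample has already been sign-extended into a 32-bit integer.

```c
#include <stdint.h>

/* Drop the 12 least significant bits of a sign-extended 24-bit
 * sample, giving a 12-bit value in the range -2048 .. 2047.
 * (Right-shifting a negative value is arithmetic on virtually all
 * compilers, which preserves the sign as in Figure 4.2.1.1.) */
int16_t downconvert_24_to_12(int32_t sample24) {
    return (int16_t)(sample24 >> 12);
}

/* Keep every third sample to go from 48 kHz to 16 kHz.
 * Returns the number of samples written to out. */
int downsample_by_3(const int16_t *in, int n, int16_t *out) {
    int m = 0;
    for (int i = 0; i < n; i += 3)
        out[m++] = in[i];
    return m;
}
```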
Equation 4.2.2.1 shows how to calculate the absolute average A1 of the first window, from its initial sample a to its endpoint b, in the vector of sound samples s.

A1 = (1 / (b − a)) Σ_{i=a}^{b−1} |s[i]|    (4.2.2.1)

The average of the second window, A2, is computed from the sound samples starting at b and ending at c, where the number of points in the window is equal to the difference of b and a, or equivalently c and b. The computation for the second window is shown in Equation 4.2.2.2.

A2 = (1 / (c − b)) Σ_{i=b}^{c−1} |s[i]|    (4.2.2.2)
The difference between A2 and A1 is compared to the threshold value Th. If it is larger, then the spoken word is considered to start at the beginning of the previous window, a. If this is not the case, the average of the oldest window (A1) is discarded and replaced by A2. The algorithm then repeats until a word is detected or it reaches the end of the sound samples, in which case no word was detected. The value for the threshold was determined empirically using MatLab, which can be seen in Appendix A.
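The sliding two-window comparison above can be sketched in C. This is an illustrative sketch, not the board implementation: the function name and the -1 "no word" return convention are our own.

```c
#include <stdint.h>
#include <stdlib.h>

/* Two-window start-of-word detector (Section 4.2.2). Compares the
 * absolute averages of adjacent n-point windows against th and
 * returns the start index of the previous window when the jump
 * exceeds th, or -1 if no word is detected. */
int find_word_start(const int16_t *s, int len, int n, double th) {
    if (len < 2 * n) return -1;
    double a1 = 0.0;
    for (int i = 0; i < n; i++) a1 += abs(s[i]);
    a1 /= n;
    for (int start = n; start + n <= len; start += n) {
        double a2 = 0.0;
        for (int i = start; i < start + n; i++) a2 += abs(s[i]);
        a2 /= n;
        if (a2 - a1 > th) return start - n;  /* start of previous window */
        a1 = a2;  /* discard the older window's average */
    }
    return -1;
}
```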
4.2.3 Frequency Analysis

Before we can pass the voice samples through the band pass filter bank we must first pass them through a pre-emphasis filter. Speech signals normally experience a spectral roll-off of about 6 dB per octave, meaning the amplitude is halved for each doubling of frequency [3]. This phenomenon occurs due to the radiation effects of the sound from the mouth [3]. As a result, the majority of the spectral energy is concentrated in the lower frequencies, which results in an inaccurate estimation of the higher formants. However, the information in the high frequencies is just as important to understanding the speech as that in the low frequencies. To reduce this effect, the speech signal is filtered prior to the filter bank processing. The pre-emphasis filter makes the outputs of the filters nearly uniform across the spectrum at the expense of lowering the amplitudes slightly. Equation 4.2.3.1 shows how to calculate the output of the pre-emphasis filter.

y[n] = x[n] − α x[n−1]    (4.2.3.1)

In Equation 4.2.3.1, α is a coefficient most commonly in the range of 0.95 to 0.98 for speech applications; we opted for 0.97 in our design. The magnitude response of our pre-emphasis filter is shown in Figure 4.2.3.1.
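Equation 4.2.3.1 amounts to a one-tap high-pass difference, which can be sketched in C (the handling of the very first sample, passed through unchanged, is our own assumption):

```c
/* Pre-emphasis filter of Equation 4.2.3.1: y[n] = x[n] - a*x[n-1],
 * with a = 0.97 as chosen in the text. */
void pre_emphasis(const double *x, double *y, int n, double a) {
    y[0] = x[0];  /* no previous sample exists for n = 0 */
    for (int i = 1; i < n; i++)
        y[i] = x[i] - a * x[i - 1];
}
```

A constant (DC, 0 Hz) input is attenuated to about 3% of its amplitude, while rapidly alternating (high-frequency) content passes nearly doubled, which is the boost that compensates the roll-off.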
Figure 4.2.3.1: Pre-emphasis Filter Response

From the magnitude plot it is apparent that this filter attenuates the lower frequencies while amplifying the higher frequencies to compensate for the -6 dB per octave roll-off.

We originally wanted to use 10 filters in the filter bank, covering frequencies from 200 Hz to about 10 kHz. This proved too ambitious, and we had to remove half of our designed filters, leaving 5 filters spanning 300 Hz to about 7 kHz. The filters are logarithmically spaced because of the way the human voice behaves in the frequency domain: for a spoken word, most of the significant components lie in the lower frequencies of the spectrum, which is why we need filters with smaller bandwidths there. A human voice normally falls in a range of about 300 Hz to 14 kHz, but for the most part the significant frequencies range from 300 Hz to 2 kHz [4]. We decided to use FIR filters since they have a linear phase without compromising the ability to approximate the ideal magnitude response, unlike IIR filters. Unfortunately, they are computationally more expensive to implement, as they require more coefficients than the equivalent IIR filter [3].

After deciding what type of filters to use, we needed to calculate the bandwidths and center frequencies of each filter. Equations 4.2.3.2 and 4.2.3.3 were used to calculate the bandwidths, and with those results Equation 4.2.3.4 was used to calculate the center frequencies. In Equation 4.2.3.2, C is the bandwidth of the first filter; we decided on 440 Hz. Then b_i is the bandwidth of a given filter and Q is the total number of filters, which is 5. The α in Equation 4.2.3.3 represents the logarithmic growth factor, which typically falls between 1 and 2; we calculated α = 1.45, which allows 5 filters to fit into the 7 kHz range.

b_1 = C    (4.2.3.2)
b_i = α b_{i−1},    2 ≤ i ≤ Q    (4.2.3.3)

c_i = c_{i−1} + b_{i−1}/2 + b_i/2    (4.2.3.4)
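The spacing equations can be sketched in C with the report's values C = 440 Hz, α = 1.45, and Q = 5. Placing the first center frequency at f_min + b_1/2 (with f_min = 300 Hz) is our own assumption, since the report does not state the starting point of the first band explicitly.

```c
/* Filter-bank spacing from Equations 4.2.3.2-4.2.3.4:
 * b[0] = C, b[i] = alpha * b[i-1], and each center frequency sits
 * half of each adjacent bandwidth past the previous center. */
void filter_bank_spacing(double C, double alpha, double f_min,
                         int Q, double *b, double *c) {
    b[0] = C;
    c[0] = f_min + b[0] / 2.0;  /* assumed placement of first center */
    for (int i = 1; i < Q; i++) {
        b[i] = alpha * b[i - 1];
        c[i] = c[i - 1] + b[i - 1] / 2.0 + b[i] / 2.0;
    }
}
```

With these inputs the bandwidths come out to 440, 638, 925, 1341, and 1945 Hz, showing the logarithmic growth of the bands.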
We obtained the coefficients from MatLab for our filters using the fir1 function which is a Hamming-window based, linear-phase filter. Another critical choice for our filters was the order of the filter which would determine its sharpness or effectiveness at simply passing frequencies within its band. We opted for 50th order filters which would give us a good sharpness. The transfer function for the FIR filter is shown in Equation 4.2.3.5.
H(z) = b_0 + b_1 z^{−1} + b_2 z^{−2} + … + b_M z^{−M}    (4.2.3.5)
The transfer function of an FIR filter possesses only a numerator, which corresponds to an all-zero filter. In this equation the b terms are the filter coefficients, z^{−1} is the delay element, and M is the order of the filter, which in our case is 50. Equation 4.2.3.6 gives the difference equation for the output of the FIR filter.
y[n] = Σ_{k=0}^{M} b_k x[n−k]    (4.2.3.6)
The direct form of the FIR filter structure is shown in Figure 4.2.3.2.
Figure 4.2.3.2: FIR Direct Form Structure

From Equation 4.2.3.6 and Figure 4.2.3.2 it is apparent that the output of the filter is the linear combination of the last M+1 input samples weighted by the b coefficients.

Figure 4.2.3.3 shows 5 ideal filters that were generated using MatLab; Appendix A contains the MatLab code used to obtain the graph. As you can see, these filters have cutoffs that are impossible to implement with real filters: the idealized band pass filters are rectangular functions in the frequency domain, which, from Figure 2.2.1, become infinitely banded sinc functions in the time domain. The sinc function is non-causal and has an infinite delay, so these filters can only be approximated in the time domain. Observing Figure 4.2.3.3, you can see 5 filters logarithmically distributed from about 900 Hz to about 6.5 kHz; that is, each filter's bandwidth keeps increasing from the previous one by the growth factor α shown in Equation 4.2.3.3, and BW in the figure represents bandwidth. We had to limit the number of filters in our band pass filter bank due to speed and memory constraints.
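The direct-form computation of Equation 4.2.3.6 can be sketched in C. This is a generic illustration, not our board code; samples before the start of the input are treated as zero.

```c
/* Direct-form FIR filter of Equation 4.2.3.6:
 * y[n] = sum over k of b[k] * x[n-k], for k = 0 .. order,
 * taking x[n-k] = 0 for samples before the start of the input. */
void fir_filter(const double *b, int order,
                const double *x, double *y, int n) {
    for (int i = 0; i < n; i++) {
        double acc = 0.0;
        for (int k = 0; k <= order && k <= i; k++)
            acc += b[k] * x[i - k];  /* weighted tap of the delay line */
        y[i] = acc;
    }
}
```

For instance, coefficients {0.5, 0.5} (order 1) give a two-point moving average, the simplest FIR low-pass.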
Figure 4.2.3.3: Idealized Logarithmically Spaced BPF Bank

A more realistic version of the filters that we are going to use in our project is shown in Figure 4.2.3.4, which shows the 5 filters with a reasonably sharp cutoff. The MatLab code for graphing Figure 4.2.3.4 is shown in Appendix A.
Figure 4.2.3.4: Realizable FIR BPF Bank

Since the actual filters are not ideal like those in Figure 4.2.3.3, there needs to be some overlap between them so that there is no spectral loss. The overlap causes some frequency smearing, where frequency content appears in neighboring filters, but this does not really affect recognition because the training words saved to memory and the word to be recognized are subjected to the same spectral smearing.

4.2.4 Fingerprint Generation

Once we have obtained all the points from the filters we need to calculate the energy in each. The energy is found using Equation 2.2.4, the cumulative summation of the squared output of each filter. A good rule of thumb is to use window lengths between 10 and 30 milliseconds. Since we are sampling at 16 kHz, a 10 millisecond window means the energy of every 160 accumulated points becomes one data point in the fingerprint representation for that filter; this is essentially the energy windowed over a fixed length. Since all of our keywords are short, we saved one half second of sound (8000 points) after the detection of the beginning of a word. Once this is accomplished we have a 500-point representation (100 points for each of the 5 filters) of the energy in the banded spectrum at discrete points in time throughout the spoken word.

Figure 4.2.4.1 shows a general flow chart of how our system generates the fingerprint for each word. First the speech signal from the microphone passes through the ADC, which digitizes the waveform. The output of the ADC then passes through the pre-emphasis filter before reaching the filter bank. The output of each filter is squared to obtain the instantaneous power, which is added to an accumulator to obtain the energy over 10 millisecond intervals.
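The windowed-energy step for one filter branch can be sketched in C (our own illustration; with window = 160 and 8000 saved samples it produces the 100 fingerprint points per filter described above):

```c
/* Windowed energy of Section 4.2.4: accumulate the squared filter
 * output over fixed-length windows (160 samples = 10 ms at 16 kHz),
 * yielding one fingerprint point per window.
 * Returns the number of points written to out. */
int fingerprint_energies(const double *filtered, int n,
                         int window, double *out) {
    int points = 0;
    for (int start = 0; start + window <= n; start += window) {
        double acc = 0.0;  /* the accumulator of Figure 4.2.4.1 */
        for (int i = start; i < start + window; i++)
            acc += filtered[i] * filtered[i];
        out[points++] = acc;
    }
    return points;
}
```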
Figure 4.2.4.1: Flow Chart for Fingerprint Extraction

We have to make a reference fingerprint for every word that we store in memory. The reference fingerprint is the average of the individual fingerprints from each training trial: for our system the user says the keyword three times, and the three individual fingerprints are averaged together to create the reference fingerprint.

4.2.5 Comparison Function

In recognition mode the incoming fingerprint is compared to each reference fingerprint; the closest match is recognized as the spoken word and displayed on the LCD. Thus we need a formula to calculate the difference between the reference fingerprint data points and those of the spoken word. The Euclidean distance formula, which is derived from the Pythagorean Theorem, gives the straight-line distance between two vectors of n points. Equation 4.2.5.1 shows how to calculate the distance between two vectors p and q.
d(p, q) = sqrt( (q1 - p1)^2 + (q2 - p2)^2 + ... + (qn - pn)^2 )    (4.2.5.1)
From Equation 4.2.5.1 we can see that the distance from p to q accumulates the point-by-point differences between corresponding entries of p and q.
4.2.6 Driving Outputs We plan to use only two outputs. One is a DC motor that powers a fan and the other is a servo motor that controls the opening of a door. The DC motor has only 2 terminals and simply requires a voltage across them. The servo motor has 3 terminals: two are connected to Vcc and ground, while the third is a position signal that requires a pulse-width modulated (PWM) voltage. These outputs are controlled using the expansion header I/O ports on the DE2 board. The DE2 board provides two 40-pin expansion headers that connect directly to 36 pins on the Cyclone II FPGA, and also provides DC +5V (VCC5), DC +3.3V (VCC33), and two GND pins [11]. Each pin on the expansion headers is connected to two diodes and a resistor that provide protection from high and low voltages. Depending on which word is recognized, the corresponding pins are set to either output or input. For example, if the word STOP is recognized, the pin controlling the fan is set to input so that no voltage is supplied by that pin. Figure 4.2.6.1 shows the related schematics for one of the expansion headers (JP1).
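The word-to-pin decision described above can be sketched as a small lookup. This is an illustration only; the word codes and function name are ours, not the exact logic in Appendix C:

```c
/* Word codes follow Table 4.2.1: 0 = GO, 1 = STOP. GO drives the fan pin;
 * STOP reconfigures the pin as an input so that no voltage is supplied and
 * the MOSFET gate is pulled to 0 V by the pull-down resistor. */
enum pin_mode { PIN_INPUT = 0, PIN_OUTPUT = 1 };

enum pin_mode fan_pin_mode(int word)
{
    return (word == 0) ? PIN_OUTPUT : PIN_INPUT;
}
```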
Pins VCC5 and VCC33 are voltage-regulated power supplies that can provide higher currents, but they are always high and cannot be switched to turn the motor on or off. The other header I/O ports can be configured as digital outputs but they are current limited: they can only provide up to 8 mA [11]. This is not nearly enough current to drive the DC motor, so we need a power circuit for it. A simple MOSFET can be used to control the movement
of DC motors or brushless stepper motors directly from computer logic [12]. As the motor load is inductive, a simple flywheel diode is connected across the inductive load to dissipate any back EMF generated by the motor when the MOSFET turns it off [12]. An additional silicon diode D1 can also be placed across the channel of a MOSFET switch when using inductive loads for suppressing overvoltage switching transients and noise giving extra protection to the MOSFET switch if required [12]. Resistor R2 is used as a pull-down resistor to help pull the output voltage down to 0V when the MOSFET is switched off [12]. We will use components from the lab to accomplish this circuit. Figure 4.2.6.2 shows the DC motor control circuit.
Figure 4.2.6.2: DC Motor Control Using MOSFET Referring to Figures 4.2.6.1 and 4.2.6.2, the VCC5 pin from the board is connected to Vdd on the DC motor control circuit. One of the I/O pins, such as I/O A0, is connected to VIN and the circuit is grounded using the GND pin. Similar to the DC motor, the servo motor uses the VCC5 pin and GND to power the motor while an I/O pin controls it. The servo motor relies on PWM to control its position, so the I/O pin has to generate a PWM signal. Generally the minimum pulse width is about 1 millisecond and the maximum pulse width is 2 milliseconds, with a period of about 40 milliseconds; the period is not nearly as critical as the pulse widths. Figure 4.2.6.3 shows the position of the servo motor with respect to the pulse width.
Figure 4.2.6.3: Servo Motor Position vs. Duty Cycle In order to open the door 90 degrees from the neutral position we want either a one or a two millisecond pulse, depending upon which direction we want it to open. This gives us a full rotation of about 180 degrees. The simplest way to meet the PWM requirements for the servo control signal is to use a 555-timer circuit. The circuit in Figure 4.2.6.4 accomplishes the PWM requirements that we need to control the servo motor.
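The position mapping in Figure 4.2.6.3 is linear: 1.5 ms holds the neutral position and 1 ms / 2 ms swing the horn 90 degrees to either side. A sketch of that arithmetic follows (in our build the pulse itself is generated by the 555 circuit, not by software):

```c
/* Map a desired door angle in [-90, +90] degrees to the servo pulse width
 * in milliseconds: 0.5 ms of pulse width per 90 degrees of travel. */
double servo_pulse_ms(double angle_deg)
{
    return 1.5 + (angle_deg / 90.0) * 0.5;
}
```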
Figure 4.2.6.4: Servo Motor Control Circuit Using Equation 4.2.6.1 to solve for the time-low of the waveform, we get a time-low of 40.54 milliseconds.
t_low = 0.693 R C    (4.2.6.1)
25
Using Equation 4.2.6.2 to solve for the minimum time-high of the waveform, which occurs when one of the timing resistors is shorted by the N-Ch FET, produces a time-high of 1.039 milliseconds.
t_high = 0.693 (R1 + R2) C    (4.2.6.2)
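The time-high arithmetic can be checked numerically. The 10 kΩ resistor values follow the text; the 0.15 µF timing capacitor is our assumption, chosen because it reproduces the quoted 1.039 ms minimum time-high:

```c
/* Standard 555 astable time-high: t = 0.693 * (R1 + R2) * C.
 * Component values used in the checks below are assumptions. */
double t555_high_s(double r1_ohm, double r2_ohm, double c_farad)
{
    return 0.693 * (r1_ohm + r2_ohm) * c_farad;
}
```

With one resistor shorted by the FET this gives 0.693 x 10 kΩ x 0.15 µF ≈ 1.04 ms; with both resistors at 10 kΩ the time-high doubles to about 2.08 ms.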
Using Equation 4.2.6.2 to solve for the maximum time-high of the waveform, in which both timing resistors are equal to 10 kΩ, we get a time-high of about 2.08 milliseconds (twice the minimum, since the effective resistance doubles). Thus this circuit provides the PWM signal needed to position our servo. 4.2.7 System Architecture The main component of our system architecture is the FPGA board. Connected to the input of the FPGA board is the microphone, and connected to the output is the circuit that controls the DC motor. The architecture of the FPGA board is too complicated to explain in full detail, but we will mention some of the components that we use. The core of an FPGA consists of adaptive logic modules (ALMs). Figure 4.2.7.1 shows the structure of an ALM and its corresponding adders and registers.
The ALM is key to the speed of FPGA technology and to the efficiency of its architecture. An ALM can implement many functions because it has 8 inputs to its logic block, and it can also be separated into smaller LUTs. The components implemented on the FPGA board that we use are the ADC, the audio-in port, and the memory storage, which includes SRAM, SDRAM, and FLASH. The DE2 board has a pre-configured system which we ended up using since it has the ADC already configured: the pre-configured Media Computer system that is available for the board. The original bit resolution of the Media Computer audio was 24-bit, but we needed 12-bit resolution. This was accomplished by shifting every 24-bit value 12 bits to the right. Figure 4.2.7.2 shows a top view of the DE2 board with the components labeled.
Figure 4.2.7.2: DE2 FPGA Board [11] Figure 4.2.7.2 labels all of the components on the DE2 board that we use, including the SDRAM, Mic-in, LCD module, and the toggle switches. The SRAM and FLASH memory locations can also be seen in the figure.
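The 24-bit-to-12-bit reduction mentioned above amounts to a single arithmetic shift. A minimal sketch (the function name is ours):

```c
#include <stdint.h>

/* Reduce a 24-bit audio sample (held in a 32-bit word) to 12-bit resolution
 * by shifting it 12 bits to the right, keeping the most significant bits. */
int32_t to_12bit(int32_t sample24)
{
    return sample24 >> 12;
}
```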
4.2.8 Training the System We utilize a total of 12 slide switches embedded on the DE2 FPGA board, SW0 through SW11. Only one switch may be high at a time, except in recognizing mode; if an undesired switch is high, the LCD displays an error message. Recognizing mode is active when both SW0 and SW1 are high at the time the record button is pressed. Three switches are used for every word that is stored in memory. The reason for using three switches per word is that we record each word three separate times and then average the recordings to obtain the reference fingerprint for that word. This procedure is done once for each word, so it is repeated four times because we have four words. The average for each word is saved at an independent address in the SDRAM. We save the words in SDRAM because the large quantity of values might overflow the SRAM. SDRAM may be a bit slower than SRAM, but due to the processor speed the delay is not significant. Aside from the 12 switches we also use two pushbuttons: KEY1 is used for recording and KEY2 for playing back the previously recorded word. We included a playback function so that we can listen to a recording and make sure it is a fair sample of the word. Whenever we record we must push KEY1, and depending on which switch is high the appropriate target address is obtained. When training the system to store the fingerprint of a word we run the word three times and then take the average banded energy to acquire the fingerprint. The user selects which word to train using the slide switches on the board. Once all the words have been trained the system can run in recognition mode.
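The switch-decoding rule described above can be sketched as follows. The encoding (one trial index per switch, recognizing mode on SW0+SW1) follows the text, but the return-value convention is ours:

```c
/* SW0 and SW1 both high selects recognizing mode; otherwise exactly one of
 * SW0-SW11 may be high, selecting one of three training trials for one of
 * the four words. Any other pattern is an error (the LCD shows a message). */
enum sw_mode { MODE_ERROR = -1, MODE_RECOGNIZE = -2 };

/* Returns a trial index 0-11 (word = index / 3, trial = index % 3),
 * MODE_RECOGNIZE, or MODE_ERROR. */
int decode_switches(unsigned sw)
{
    if (sw == 0x3)
        return MODE_RECOGNIZE;
    if (sw == 0 || (sw & (sw - 1)) != 0 || sw >= (1u << 12))
        return MODE_ERROR;       /* zero switches high, or more than one */
    int idx = 0;
    while (!(sw & 1u)) { sw >>= 1; ++idx; }
    return idx;
}
```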
For every word we have 100 points for each of the 5 filters. Once we run the word three times we have the sum of the energy at 500 points; dividing by 3 gives the average energy, which is our reference fingerprint. After we have our reference fingerprints, when a word is spoken into the microphone we use the distance formula mentioned before to measure the difference between the spoken word and the words stored in memory. If the spoken word matches one of the stored words within the distance threshold, the board outputs a signal corresponding to the command that the word represents. However, if the spoken word
does not match any of the stored words, the board outputs a message indicating that the word has no match. Table 4.2.1 shows the words that we use as inputs and the corresponding outputs when the words are recognized. When there is no match, the message WORD NOT RECOGNIZED is shown on the LCD display embedded on the DE2 board.

Table 4.2.1: Inputs and Outputs

    Word Match    Output
    GO            Turn fan on
    STOP          Turn fan off
    OPEN          Open door
    CLOSE         Close door
    NO MATCH      WORD NOT RECOGNIZED
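The averaging step described in the training procedure (three trials per word combined point-by-point into one 500-point reference) can be sketched as follows; the names and the float type are our assumptions:

```c
#define FP_LEN 500   /* 100 windows x 5 filters */

/* Average the three training fingerprints into the reference fingerprint. */
void make_reference(const float t1[FP_LEN], const float t2[FP_LEN],
                    const float t3[FP_LEN], float ref[FP_LEN])
{
    for (int i = 0; i < FP_LEN; ++i)
        ref[i] = (t1[i] + t2[i] + t3[i]) / 3.0f;
}
```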
A system diagram is shown in Figure 4.2.8.1, where the first image represents the microphone, which is connected to the audio-in port on the FPGA board. The second image, labeled FPGA board, represents the whole FPGA board, which contains the ADC and the audio-in port and is where we implemented our filtering design. The SRAM and SDRAM modules can be seen in the DE2 board system diagram; they are controlled by the SRAM and SDRAM controllers respectively, which also appear in the diagram. The 16x2 LCD display is controlled by the LCD port; we use it to show the messages for our program.
Figure 4.2.8.1: System Diagram [11] At first we attempted to build our own system for the FPGA board using the Quartus II program, so that we could choose which components from the system diagram to use for our project. We gave up on building our own system because we were getting many errors that we could not fix when running it. After we switched to the Media Computer system from the Altera website, we started by testing the switches, pushbuttons, and the LCD display, and we successfully verified the ports we planned to use before running our program on the DE2 board. The Altera Monitor Program was very useful for finding which memory locations to use for our values: using its memory tab we could see where the buffer stored all the values for the spoken words, and those were the values we needed to store in memory for later use. 4.3 Work Breakdown Table 4.3.1 shows the division of work for our project. Each task shows its corresponding start and completion dates along with the team member(s) who participated in accomplishing it. It is followed by the Gantt chart for our project.
[Table 4.3.1: Work breakdown. The table lists each task (e.g. Euclidean Distance Function, Word Matching Function, Threshold Calculation, Signal to Outputs, DC Motor, Configure GPIO Pins, Opamp Buffer, High Side MOSFET Switch, Servo Motor, 555 Timer for PWM Signal, Project Testing / Refinement, Debugging & Refinement) with start and completion dates spanning 8/27/2012 to 12/7/2012 and the assigned team member(s) (Tyler, Ismael, or both). The column alignment was lost in extraction.]
6.1 MatLab Findings
Figure 6.1.1: Go in Time-Domain From the plot of Go it is evident that the word begins with the humming of the g sound followed by the much more powerful o vowel. The output of the filters for the word Go, shown in Figure 6.1.2, reveals that significant signal content occurs in the fifth filter (spanning about 4.8 kHz to 7 kHz) at the beginning of the word, consistent with the high frequency g consonant. Also, significant content occurs in the second filter (spanning about 1.2 kHz to 2 kHz) at about 250 ms into the sample; this lower frequency is consistent with the o vowel.
Figure 6.1.2: Output of Filters vs. Time for Go Figure 6.1.3 is the time-domain plot of the word Stop, which will be used to turn off the DC motor. The high frequency hissing s sound is clearly visible at the start of the word, followed by a hard t and then the op phoneme.
Figure 6.1.3: Stop in Time-Domain The output of the filters for the word Stop, shown in Figure 6.1.4, reveals that significant signal content occurs in the fifth filter (spanning about 4.8 kHz to 7 kHz) at the beginning of the word, consistent with the high frequency s consonant. Also, the t-o-p sound is present in an appreciable amount across all of the filters at about 270 milliseconds.
Figure 6.1.4: Output of Filters vs. Time for Stop Figure 6.1.5 is the time-domain plot of the word Open, which will be used to initiate the servo motor to signify the opening of a door. Interestingly, the same op phoneme is present as at the end of the keyword Stop. The word then ends with the low frequency n consonant.
Figure 6.1.5: Open in Time-Domain The output of the filters for the word Open is shown in Figure 6.1.6. The beginning op sound is present across all of the filters in an appreciable amount, just as it was in the word Stop. The
n sound does not even appear in our filter outputs. This is most likely because n is one of the lowest-frequency sounds and is attenuated by the pre-emphasis filter.
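The pre-emphasis stage referred to here is the first-difference filter used in our MatLab and C code, s[n] = x[n] - 0.97 x[n-1]. At DC its gain is only 1 - 0.97 = 0.03, which is why the lowest-frequency sounds nearly vanish:

```c
/* First-difference pre-emphasis filter: s[n] = x[n] - 0.97*x[n-1].
 * Passes high frequencies; attenuates slowly varying (low) content. */
void pre_emphasis(const float *x, float *s, int len)
{
    s[0] = x[0];
    for (int n = 1; n < len; ++n)
        s[n] = x[n] - 0.97f * x[n - 1];
}
```

For a constant (DC-like) input, the output settles near 3% of the input amplitude.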
Figure 6.1.6: Output of Filters vs. Time for Open Figure 6.1.7 is the time-domain plot of the word Close, which will be used to return the servo motor to its original position, signifying the closing of a door. There is not much distinct content visible for the word Close, unlike the other keywords, in which the different sounds were clearly visible.
[Figure 6.1.7: Close in Time-Domain]
The output of the filters for the word Close is shown in Figure 6.1.8. The word begins with very low amplitude outputs from the first two filters, followed by significant content in the last three filters.
Figure 6.1.8: Output of Filters vs. Time for Close 6.2 Testing Results When testing our system on the DE2 Altera board, each of us ran three rounds of ten trials for each word. The results of the voice recognition testing are shown in Table 6.2.1. Table 6.2.1: Results of Test
From these tests we can infer that not all speakers are created equal with our system. Tyler finished with an average recognition rate of 62.5% while Ismael finished with an average recognition rate of only 59.25%. Ismael's voice is definitely deeper, so we suspect this was a contributing factor to the lower recognition rate. The word with the highest accuracy, at an average rate of 73.5%, was Stop, while the lowest, at an average rate of 53.5%, was Open. In one of the rounds with Stop, Tyler achieved a recognition rate of 90% (9 out of 10), but there were rates as low as 40% (4 out of 10) for Open. While this accuracy is not practical for a real-world application, we are very pleased with these results given that we did not use any sophisticated pattern-matching algorithm. When testing we could usually tell whether a word was going to be correctly identified just by how loudly we inflected our voice and by how consistently we matched the speed at which we had trained the word. We tested this theory briefly with our smart phones by recording an input word that yielded a correct output; when we played back the recorded message, the recognition rates were repeatedly in the 80% range, so the system was consistent given a consistent input. 6.3 System Improvements Our system could become a practical solution given some improvements. The first area that needs to be addressed is processing speed: in its current state the board takes about one minute and 20 seconds to complete all the necessary computation and make a decision. Using filters in a hardware-designed parallel structure would greatly cut down on the latency that the serial software structure causes. Also, we could convert to IIR filters so that the equivalent filter order would not need to be so high, yielding fewer computations. In addition, using a fixed-point representation would be a great improvement over the floating-point math currently used by our filters.
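The fixed-point improvement could look like Q15 arithmetic (signed 16-bit values with 15 fractional bits). This is a sketch of the idea, not code from our project:

```c
#include <stdint.h>

/* Q15 fixed point: value v in [-1, 1) stored as round(v * 32768).
 * A multiply-accumulate then needs only integer hardware. */
#define Q15_ONE 32768

static inline int16_t float_to_q15(float v)
{
    return (int16_t)(v * Q15_ONE);
}

static inline int32_t q15_mul(int16_t a, int16_t b)
{
    return ((int32_t)a * b) >> 15;   /* product rescaled back to Q15 */
}
```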
A method allowing a variable length for the input sounds, instead of the fixed length we currently have in place, would drastically improve performance on very short or very long words. Another area with much room for improvement is our comparison function. Our system relies on accumulated energy over windowed bands but does not incorporate any type of pattern matching or regression technique to find the best match. The comparison function we have in place is very susceptible to shifts in the timing of the words, so that the peaks and troughs of the fingerprints end up out of place,
introducing a lot of error. Lastly, a normalization technique would help to dampen the variability due to the loudness of the spoken word.
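The normalization improvement could be as simple as dividing a fingerprint by its total energy, so that a louder utterance of the same word produces roughly the same fingerprint. A sketch:

```c
/* Normalize a fingerprint by its total energy so the comparison function
 * sees the distribution of energy, not its absolute level. */
void normalize_fingerprint(float *fp, int len)
{
    float total = 0.0f;
    for (int i = 0; i < len; ++i)
        total += fp[i];
    if (total > 0.0f)
        for (int i = 0; i < len; ++i)
            fp[i] /= total;
}
```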
7. Conclusion
After applying the background theory, analyzing a MatLab prototype, and implementing a prototype on the Altera DE2 board, it is evident that a speech recognition system can indeed be successfully implemented using FPGA technology. We achieved all of our proposed goals and objectives in the time allotted for our project. Improvements to our current system are needed to make it practical for the consumer. We set out to create a simple solution to speech recognition, and our results were modest. Speech recognition is a hard problem, and as is often the case, complex problems invariably demand complex solutions in order to achieve accurate results. Regardless, we have learned a great deal about the density of speech and would like to further our interest in the subject by continuing to improve upon our system.
References
1. Ifeachor, Emmanuel, and Barrie Jervis. Digital Signal Processing: A Practical Approach. Prentice Hall, 2002. Print.
2. Torres, Gabriel. Hardware Secrets, LLC. April 21, 2006. Web. March 22, 2012. http://www.hardwaresecrets.com/article/317
3. Rabiner, Lawrence, and Biing-Hwang Juang. Fundamentals of Speech Recognition. Prentice-Hall International, Inc. Print.
4. EVP Frequency Ranges. Web. March 29, 2012. http://www.paranormalghost.com/evp_frequency_ranges.htm
5. http://www.census.gov/newsroom/releases/archives/facts_for_features_special_editions/cb10-ff13.html
6. Wikipedia. Wikimedia Foundation, Inc. March 12, 2012. Web. March 28, 2012. http://www.wikipedia.org/
7. Yarlagadda, R. K. Rao. Analog and Digital Signals and Systems. New York: Springer, 2010. Print.
8. Roberts, Michael J. Signals and Systems: Analysis Using Transform Methods and MATLAB. New York: McGraw-Hill, 2012. Print.
9. Roth, Charles H., and Larry L. Kinney. Fundamentals of Logic Design. Stamford, CT: Cengage Learning, 2010. Print.
10. Rose, Jonathan. Architecture of Field-Programmable Gate Arrays. Web. April 30, 2012. http://isl.stanford.edu/groups/elgamal/abbas_publications/J029.pdf
11. DE2 Development and Education Board User Manual. Altera Corporation, 2006. PDF.
12. "MOSFET as a Switch." Electronics Tutorials. Web. 28 Mar. 2012. http://www.electronics-tutorials.ws/transistor/tran_7.html
end
end
end
%----------------------
if keywd == 4
    if k == 1
        load('STOP1')
    end
    if k == 2
        load('STOP2')
    end
    if k == 3
        load('STOP3')
    end
    qmic = STOP;
end
ptr = 1;                                 % Initialize pointer.
ave1 = mean(abs(qmic(ptr:ptr+w)));       % Initialization of average windows.
ave2 = ave1;
% Go through the sound until the difference between the average of two
% adjacent windows is significant.
check = 1;
error = 1;
while check
    if abs(ave1-ave2) > thres
        check = 0;
    end
    if (ptr + 2*w > Ts*Fs/ds)
        check = 0;
        disp '[!] Error: No Word Detected.';
        error = 0;
    end
    if check
        ptr = ptr + w;
        ave2 = ave1;
        ave1 = mean(abs(qmic(ptr:ptr+w)));
    end
end
if error
    word = qmic(ptr:((ptr-1)+ l*Fs/ds)); % Store the detected word
    %----------------------------------------------------------------------
    % Perform DSP
    %======================================================================
    Xfft = abs(fft(word));               % Find FFT of data
    X2fft = Xfft(1:end/2).^2 + X2fft;    % Square FFT data to get PSD
end
end
X2fft = X2fft/trials;                    % Average the PSDs over all trials
mag = 20*log10(X2fft);                   % Convert into dB magnitude
if error
ave1 = mean(abs(qmic(ptr:ptr+w)));       % Initialization of average windows.
ave2 = ave1;
% Go through the sound until the difference between the average of two
% adjacent windows is significant.
while check
    if abs(ave1-ave2) > thres
        check = 0;
    end
    if (ptr + 2*w > Ts*Fs/ds)
        check = 0;
        disp '[!] Error: No Word Detected.';
        error = 1;
    end
    if check
        ptr = ptr + w;
        ave2 = ave1;
        ave1 = mean(abs(qmic(ptr:ptr+w)));
    end
end
if ~error
    word = qmic(ptr:((ptr-1)+ l*Fs/ds)); % Store the detected word
    %----------------------------------------------------------------------
    % FIR Filtering & Fingerprint Generation
    %======================================================================
    % Apply Preemphasis Filter to Word
    % Note: Eliminates the -6dB per octave decay of the spectral energy
    for j = 2:rec
        s(j) = word(j) - 0.97*word(j - 1);
    end
    out1 = (filter(B1, 1, s)).^2;
    out2 = (filter(B2, 1, s)).^2;
    out3 = (filter(B3, 1, s)).^2;
    out4 = (filter(B4, 1, s)).^2;
    out5 = (filter(B5, 1, s)).^2;
    %out6 = (filter(B6, 1, s)).^2;
    % Display the reference fingerprint.
    % Note: only half of the fft is displayed since the fft of a real signal
    % is half redundant.
    figure('Name','Reference Fingerprint','NumberTitle','off');
    subplot(n,1,1); plot(out1);
    subplot(n,1,2); plot(out2);
    subplot(n,1,3); ylabel('Amplitude'); plot(out3);
    subplot(n,1,4); plot(out4);
    subplot(n,1,5); plot(out5);
    xlabel('\omega \times N \div 4\pi');
end
%------------------------------------------------------------------------%
Media Interrupt.c
#include "nios2_ctrl_reg_macros.h"

/* these globals are written by interrupt service routines; we have to declare
 * these as volatile to avoid the compiler caching their values in registers */
extern volatile char byte1, byte2, byte3;       /* modified by PS/2 interrupt service routine */
extern volatile int record, play, buffer_index; // used for audio
extern volatile int timeout;                    // used to synchronize with the timer

/* function prototypes */
void LCD_cursor( int, int );
void LCD_text( char * );
void LCD_cursor_off( void );
void VGA_text (int, int, char *);
void VGA_box (int, int, int, int, short);
void HEX_PS2(char, char, char);

/* Start audio saving on SRAM address 08040000 */
/********************************************************************************
 * This program demonstrates use of the media ports in the DE2 Media Computer.
 * It performs the following:
 *   1. records audio for about 10 seconds when an interrupt is generated by
 *      pressing KEY[1]. LEDG[0] is lit while recording. Audio recording is
 *      controlled by using interrupts
 *   2. plays the recorded audio when an interrupt is generated by pressing
 *      KEY[2]. LEDG[1] is lit while playing. Audio playback is controlled by
 *      using interrupts
 *   3. Draws a blue box on the VGA display, and places a text string inside
 *      the box. Also, moves the word ALTERA around the display, "bouncing" off
 *      the blue box and screen edges
 *   4. Shows a text message on the LCD display
 */
/* these variables are used for a blue box and a "bouncing" ALTERA on the VGA screen */
int ALT_x1; int ALT_x2; int ALT_y; int ALT_inc_x; int ALT_inc_y;
int blue_x1; int blue_y1; int blue_x2; int blue_y2;
int screen_x; int screen_y;
int char_buffer_x; int char_buffer_y;
short color;

/* set the interval timer period for scrolling the HEX displays */
int counter = 0x960000;            // 1/(50 MHz) x (0x960000) ~= 200 msec
*(interval_timer_ptr + 0x2) = (counter & 0xFFFF);
*(interval_timer_ptr + 0x3) = (counter >> 16) & 0xFFFF;

/* start interval timer, enable its interrupts */
*(interval_timer_ptr + 1) = 0x7;   // STOP = 0, START = 1, CONT = 1, ITO = 1

/* write to the pushbutton interrupt mask register, and set 3 mask bits to 1
 * (bit 0 is Nios II reset) */
*(KEY_ptr + 2) = 0xE;

*(PS2_ptr) = 0xFF;      /* reset */
*(PS2_ptr + 1) = 0x1;   /* write to the PS/2 Control register to enable interrupts */

/* set interrupt mask bits for levels 0 (interval timer), 1 (pushbuttons),
 * 6 (audio), and 7 (PS/2) */
NIOS2_WRITE_IENABLE( 0xC3 );
/* create messages to be displayed on the VGA and LCD displays */
char text_top_LCD[60]    = "Audio Record \0";
char text_top_VGA[20]    = "Altera DE2\0";
char text_bottom_VGA[20] = "Media Computer\0";
char text_ALTERA[10]     = "ALTERA\0";
char text_erase[10]      = "      \0";

/* output text message to the LCD */
LCD_cursor (0,0);              // set LCD cursor location to top row
LCD_text (text_top_LCD);
LCD_cursor_off ();             // turn off the LCD cursor
*(pin_ptr) = 0xffffffff;

/* the following variables give the size of the pixel buffer */
screen_x = 319;
screen_y = 239;
color = 0x1863;                                // a dark grey color
VGA_box (0, 0, screen_x, screen_y, color);     // fill the screen with grey

// draw a medium-blue box around the above text, based on the character buffer coordinates
blue_x1 = 28; blue_x2 = 52; blue_y1 = 26; blue_y2 = 34;
// character coords * 4 since characters are 4 x 4 pixel buffer coords (8 x 8 VGA coords)
color = 0x187F;                                // a medium blue color
VGA_box (blue_x1 * 4, blue_y1 * 4, blue_x2 * 4, blue_y2 * 4, color);

/* output text message in the middle of the VGA monitor */
VGA_text (blue_x1 + 5, blue_y1 + 3, text_top_VGA);
VGA_text (blue_x1 + 5, blue_y1 + 4, text_bottom_VGA);

char_buffer_x = 79;
char_buffer_y = 59;
ALT_x1 = 0;
ALT_x2 = 5;   /* ALTERA = 6 chars */
ALT_y = 0;
ALT_inc_x = 1;
ALT_inc_y = 1;
VGA_text (ALT_x1, ALT_y, text_ALTERA);

while (1)
{
    while (!timeout)
        ;                                      // wait to synchronize with timer

    /* move the ALTERA text around on the VGA screen */
    VGA_text (ALT_x1, ALT_y, text_erase);      // erase
    ALT_x1 += ALT_inc_x;
    ALT_x2 += ALT_inc_x;
    ALT_y += ALT_inc_y;
    if ( (ALT_y == char_buffer_y) || (ALT_y == 0) )
        ALT_inc_y = -(ALT_inc_y);
    if ( (ALT_x2 == char_buffer_x) || (ALT_x1 == 0) )
        ALT_inc_x = -(ALT_inc_x);
    if ( (ALT_y >= blue_y1 - 1) && (ALT_y <= blue_y2 + 1) )
    {
        if ( ((ALT_x1 >= blue_x1 - 1) && (ALT_x1 <= blue_x2 + 1)) ||
             ((ALT_x2 >= blue_x1 - 1) && (ALT_x2 <= blue_x2 + 1)) )
        {
            if ( (ALT_y == (blue_y1 - 1)) || (ALT_y == (blue_y2 + 1)) )
/****************************************************************************************
 * Subroutine to send a string of text to the LCD
 ****************************************************************************************/
void LCD_text(char * text_ptr)
{
    volatile char * LCD_display_ptr = (char *) 0x10003050;   // 16x2 character display
    while ( *(text_ptr) )
    {
        *(LCD_display_ptr + 1) = *(text_ptr);   // write one character to the display
        ++text_ptr;
    }
}

/****************************************************************************************
 * Subroutine to turn off the LCD cursor
 ****************************************************************************************/
void LCD_cursor_off(void)
{
    volatile char * LCD_display_ptr = (char *) 0x10003050;   // 16x2 character display
    *(LCD_display_ptr) = 0x0C;                  // turn off the LCD cursor
}

/****************************************************************************************
 * Subroutine to send a string of text to the VGA monitor
 ****************************************************************************************/
void VGA_text(int x, int y, char * text_ptr)
{
// compute halfword
/****************************************************************************************
 * Subroutine to show a string of HEX data on the HEX displays
 ****************************************************************************************/
void HEX_PS2(char b1, char b2, char b3)
{
    volatile int * HEX3_HEX0_ptr = (int *) 0x10000020;
    volatile int * HEX7_HEX4_ptr = (int *) 0x10000030;

    /* SEVEN_SEGMENT_DECODE_TABLE gives the on/off settings for all segments in
     * a single 7-seg display in the DE2 Media Computer, for the hex digits 0 - F */
    unsigned char seven_seg_decode_table[] = {
        0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,
        0x7F, 0x67, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71 };
    unsigned char hex_segs[] = { 0, 0, 0, 0, 0, 0, 0, 0 };
    unsigned int shift_buffer, nibble;
    unsigned char code;
    int i;

    shift_buffer = (b1 << 16) | (b2 << 8) | b3;
    for ( i = 0; i < 6; ++i )
    {
Audio.c (Main)
#include "globals.h"
#include <stdio.h>
#include <math.h>

/* globals used for audio record/playback */
extern volatile int record, play, buffer_index;
extern volatile int left_buffer[];
extern volatile int right_buffer[];

/* function prototypes */
void Euclidean_Dist(int i, int f_num, int *w, int *x, int *y);
void PreEmphasis(int p, int *z);
void averaging(int *a, int *b, int *c, int *d);
int best_match(void);
void FIR_Filter(int trial, int samp_length, float B[], int *samp, int *out);

volatile int * d1;
volatile int * d2;
volatile int * d3;
volatile int * d4;
volatile int * d5;
int taps = 50;
int which_word;
int trial;
long int dist[5][2];
long int *d = &dist[0][0];

/* 50-tap FIR band-pass coefficients. B1 is symmetric (linear phase); signs
 * lost at line wraps have been restored from the symmetry. The B2 listing
 * was truncated in this copy. */
float B1[] = {
    -0.001085,-0.000904,-0.000504,-0.000093,-0.000067,-0.000898,-0.002737,-0.004944,
    -0.005919,-0.003580, 0.003497, 0.014749, 0.026864, 0.034309, 0.031310, 0.014672,
    -0.013812,-0.046694,-0.072710,-0.080748,-0.064499,-0.025774, 0.025145, 0.072614,
     0.101135, 0.101135, 0.072614, 0.025145,-0.025774,-0.064499,-0.080748,-0.072710,
    -0.046694,-0.013812, 0.014672, 0.031310, 0.034309, 0.026864, 0.014749, 0.003497,
    -0.003580,-0.005919,-0.004944,-0.002737,-0.000898,-0.000067,-0.000093,-0.000504,
    -0.000904,-0.001085};
float B2[] = {
    -0.001713,-0.001456, 0.000013, 0.002355, 0.004189, 0.003689, 0.000365, 0.003649,
    -0.004955,-0.002540,-0.000016,-0.002727,-0.010978,-0.016151, 0.006105, 0.022034,
     0.052881, 0.059114, 0.022610,-0.045258,-0.103633,-0.107849,-0.044031, 0.055007,
     0.128266, 0.128266, 0.055007,-0.044031,-0.107849,-0.103633,-0.045258, 0.022610,
     0.059114, 0.052881,
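B1 and B2 are the coefficient sets applied by FIR_Filter. As a sketch of what a direct-form FIR filter computes (this `fir_apply` is an illustrative stand-in, not the project's FIR_Filter, whose signature also handles the trial buffers):

```c
/* Direct-form FIR: y[n] = sum over k of B[k] * x[n-k], treating samples
 * before the start of the buffer as zero. Illustrative sketch only; the
 * project's FIR_Filter operates on the recorded int sample buffers. */
void fir_apply(int n_taps, const float B[], int n_samp, const int *x, float *y)
{
    for (int n = 0; n < n_samp; ++n) {
        float acc = 0.0f;
        for (int k = 0; k < n_taps && k <= n; ++k)
            acc += B[k] * (float)x[n - k];
        y[n] = acc;
    }
}
```

Because B1 and B2 are symmetric (B[k] = B[taps-1-k]), both filters are linear-phase: the filtered word is delayed but not phase-distorted, which matters when comparing spectra of different recordings.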
int fifospace, leftdata, rightdata;

SW_value = *(SW_ptr);
if (*(audio_ptr) & 0x100)              // check bit RI of the Control register
{
    int shift;
    int shift2;
    m = 0;
    n = 0;
    Rmode = 0;
    matches = 0;
    P_in = 8000;
    P1 = 0x250000;
    /* (this training-mode branch is fragmented in the printed listing;
     * the per-word LCD strings are lost, and the word-buffer start
     * addresses that survive are: 0x6d600, 0x7d000, 0x8ca00, 0x9c400,
     * 0xabe00, 0xbb800, 0xcb200, 0xdac00 (+ 0x130000), 0xea600) */
    else if(SW_value == 0x3)           // recognizing mode
    {
        l_start_saving = 0x240000;
        temp_saving = 0x240000;
        *(red_LED_ptr) = 0x3;
        char text_top_LCD[60] = "Speak Now \0";
        LCD_cursor (0,0);              // set LCD cursor location to top row
        LCD_text (text_top_LCD);
        LCD_cursor_off ();
        recognize = 0x240000;          // save a pointer for starting address in recognizing mode
        which_word = 5;
    }
    else
    {
    }
    fifospace = *(audio_ptr + 1);      // read the audio port fifospace register
    // store data until the audio-in FIFO is empty or the buffer is full
    while ( (fifospace & 0x000000FF) && (buffer_index < BUF_SIZE) )
    {
        left_buffer[buffer_index] = *(audio_ptr + 2);
        right_buffer[buffer_index] = *(audio_ptr + 3);
        ++buffer_index;
        if (buffer_index == BUF_SIZE)
        {
            // done recording
            record = 0;
            *(green_LED_ptr) = 0x0;    // turn off LEDG
            *(audio_ptr) = 0x0;        // turn off interrupts
            *(red_LED_ptr) = 0x0;      // turn off red led
            buffer_index = 0;
            sum = 0;
            i = 0;                     // start address 0x2120, ending address 0x126f40
            sum2 = 0;
            while (i < 960)
            {
                for(k = 0; k < 100; k++)
                {
                    sum += abs(*signal);
                    signal++;
                    signal++;          // step by two: one channel of the interleaved buffer
                }
                sum = sum/100;
                if(sum > sum2)
                {
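The truncated detection loop above averages the absolute amplitude of 100-sample windows, stepping by two through the interleaved stereo buffer, and keeps the loudest window to locate the spoken word. A self-contained sketch of that search (`find_loudest_window` is an illustrative name, not the project's code):

```c
#include <stdlib.h>

/* Scan n_windows consecutive windows of win_len samples, reading every
 * other entry of the interleaved buffer (one channel), and return the
 * index of the window with the largest total absolute amplitude. */
int find_loudest_window(const int *samples, int n_windows, int win_len)
{
    long best_sum = -1;
    int  best_win = 0;
    for (int w = 0; w < n_windows; ++w) {
        long sum = 0;
        for (int k = 0; k < win_len; ++k)
            sum += labs((long)samples[(w * win_len + k) * 2]);
        if (sum > best_sum) {          /* same test as sum > sum2 above */
            best_sum = sum;
            best_win = w;
        }
    }
    return best_win;
}
```

Dividing each window sum by its length, as the listing's `sum = sum/100` does, scales every window equally and so does not change which window wins; the sketch therefore compares raw sums.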
            matches = best_match();
            if (matches == 0)
            {
                char text_top_LCD[60] = "Detected Fan \0";
                LCD_cursor (0,0);      // set LCD cursor location to top row
                LCD_text (text_top_LCD);
                LCD_cursor_off ();
                *(pin_ptr) = 0xfffffffe;
            }
            if (matches == 1)
            {
                char text_top_LCD[60] = "Detected Stop \0";
                LCD_cursor (0,0);      // set LCD cursor location to top row
                LCD_text (text_top_LCD);
                LCD_cursor_off ();
                *(pin_ptr) = 0xffffffff;
            }
        }
    }
    fifospace = *(audio_ptr + 1);      // read the audio port fifospace register
}
}
if (*(audio_ptr) & 0x200)              // check bit WI of the Control register
{
    if (buffer_index == 0)
        *(green_LED_ptr) = 0x2;        // turn on LEDG_1
    return;
}

/**************************************************************/
/**************************************************************/
int best_match(void)
{
    long int match1 = 0;
    long int match2 = 0;
    int match = 0;

    /******
    match1 = dist[0][0]+dist[1][0]+dist[2][0]+dist[3][0]+dist[4][0];
    match2 = dist[0][1]+dist[1][1]+dist[2][1]+dist[3][1]+dist[4][1];
    if (match1 < match2)
        match = 0;
    else
        match = 1;
    ******/
    if(dist[0][0] < dist[0][1])
    {
        *(d1) = dist[0][0] - dist[0][1];
        match1++;
    }
    else
        match2++;
    if(dist[1][0] < dist[1][1])
    {
        *(d2) = dist[1][0] - dist[1][1];
        match1++;
    }
    else
        match2++;
    if(dist[2][0] < dist[2][1])
    {
        *(d3) = dist[2][0] - dist[2][1];
        match1++;
samp++;
} /*****************************************************************/
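best_match above votes feature by feature: for each of the five Euclidean distances, the stored word whose template is closer earns one vote, and the word with more votes wins (the earlier, commented-out version summed the distances instead). A self-contained sketch of the voting logic (`vote_match` is an illustrative name and omits the d1-d5 debug writes):

```c
/* One vote per feature: the word (column) with the smaller distance on
 * each of the five measures wins that feature; most votes wins overall.
 * Returns 0 ("fan") or 1 ("stop"), matching the codes used in main. */
int vote_match(long dist[5][2])
{
    int votes0 = 0, votes1 = 0;
    for (int i = 0; i < 5; ++i) {
        if (dist[i][0] < dist[i][1])
            ++votes0;
        else
            ++votes1;
    }
    return (votes0 > votes1) ? 0 : 1;
}
```

With five features there is always a strict majority, so no tie-break is needed, which is one reason an odd feature count is convenient here.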
Pushbutton.c
extern volatile int buffer_index;

/***************************************************************************************
 * Pushbutton - Interrupt Service Routine
 *
 * This routine checks which KEY has been pressed. If it is KEY1 or KEY2, it writes this
 * value to the global variable key_pressed. If it is KEY3 then it loads the SW switch
 * values and stores them in the variable pattern
 ****************************************************************************************/
void pushbutton_ISR( void )
{
    volatile int * KEY_ptr = (int *) 0x10000050;        // pushbuttons base address
    volatile int * audio_ptr = (int *) 0x10003040;      // audio port address
    volatile int * green_LED_ptr = (int *) 0x10000010;  // green LED address
    int KEY_value;

    KEY_value = *(KEY_ptr + 3);    // read the pushbutton interrupt register
    *(KEY_ptr + 3) = 0;            // clear the interrupt
    if (KEY_value == 0x2)          // check KEY1
    {
        *(green_LED_ptr) = 0x2;    // turn on LEDG[1]
        // reset the buffer index to record
        buffer_index = 0;
        // clear audio-in FIFO
        *(audio_ptr) = 0x4;
        // turn off clear, and enable audio-in interrupts
        *(audio_ptr) = 0x1;
    }
    else if (KEY_value == 0x4)     // check KEY2
    {
        *(green_LED_ptr) = 0x4;    // turn on LEDG[2]
        // reset buffer index to record
        buffer_index = 0;
        // clear audio-out FIFO
        *(green_LED_ptr) = 0x8;
        // reset buffer index to record
        buffer_index = 0;
        // clear audio-in FIFO
        *(audio_ptr) = 0x4;
        // turn off clear, and enable audio-in interrupts
        *(audio_ptr) = 0x3;
    }
    ****/
    return;
}
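The control-register writes in the ISR above, and the 0x100/0x200 tests in Audio.c, are plain bit operations on the audio core's control word. A sketch with the masks given names (the mask values are taken from this code; the macro and function names are ours, introduced for illustration):

```c
/* Control-word masks as used in this code: writing 0x4 clears the audio
 * FIFOs, writing 0x1 enables audio-in interrupts, and bits 0x100 / 0x200
 * report read / write interrupts pending. */
#define AUDIO_RE 0x001   /* enable audio-in (read) interrupts */
#define AUDIO_CE 0x004   /* clear FIFOs                       */
#define AUDIO_RI 0x100   /* read interrupt pending            */
#define AUDIO_WI 0x200   /* write interrupt pending           */

/* Illustrative helpers for the tests main() performs on the register. */
int audio_in_ready(int control_reg)  { return (control_reg & AUDIO_RI) != 0; }
int audio_out_ready(int control_reg) { return (control_reg & AUDIO_WI) != 0; }
```

Naming the masks this way would let the checks in Audio.c read as `if (audio_in_ready(*audio_ptr))` rather than raw hex tests.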