======Digitizing the sound======
==By Alisha Harrington, Christopher Hernandez, and Aaron Heston==
==in collaboration with Prof. Nicholas Kuzma==

=====Introduction=====
//Feel free to expand and contribute//

Sound is what a living organism can perceive through its sense of hearing.[[digital sound project#references|[1]]] Physically, sound is vibrational mechanical energy that propagates through matter as a wave. For humans, hearing is limited to frequencies between about 20 Hz and 20000 Hz, with the upper limit generally decreasing with age. Other species (e.g., dogs) may have a different range of hearing. As a signal perceived by one of the major senses, sound is used by many species for detecting danger, navigation, predation, and communication. In Earth's atmosphere, water, and soil, virtually any physical phenomenon, such as fire, rain, wind, surf, or an earthquake, produces (and is characterized by) its own unique sounds. Many species, such as frogs, birds, and marine and terrestrial mammals, have also developed special organs to produce sound. In some species these organs have become highly evolved to produce song (e.g., birds and whales) and, in humans, speech. Furthermore, humans have developed culture and technology (such as music, telephony, and radio) that allow them to generate, record, transmit, and broadcast sounds.[[digital sound project#references|[2]]]

====Digital Recording====
[[wp>Sound]] can be [[wp>Digital recording|digitally recorded]] by virtually anyone, as many smartphones and personal computers have this capability. However, most people use these recordings simply to play them back later, and relatively few look at the numerical record itself, analyze it, or edit it. Nonetheless, there are entire scientific and engineering fields of digital [[wp>speech recognition]] and [[wp>speech synthesis]], and most of the progress in these fields stems from advances in the mathematical analysis of digitally recorded sound.

===Recordings by authors===
Using [[wp>IGOR Pro]] software, N.K. recorded the following simple vowels and a syllable: "Aaa", "o", "ee", and "Ok", as shown in Fig. 1. The following recording parameters were used:

| **Table 1.** Recording parameters used in Fig. 1 |||
^ Parameter ^ Units ^ Value ^
| Hardware used | | iMac |
| Recording input | | Built-in microphone |
| Software used | | IGOR Pro 6.22A |
| Number of channels | | 1 |
| Sampling rate | samples per second $\left({\text s}^{-1}\right)$ | 16000 |

| {{ :figs:lec02fg3.jpg?nolink |}} |
^ Figure 1. Various spoken sounds recorded by N.K. ^

====History====
//This timeline is adapted from the [[wp>Digital recording]] article, [[wp>Wikipedia]].//
  * In 1938, British scientist [[wp>Alec Reeves]] files the first patent describing [[wp>Pulse-code modulation]].((Robertson, David. "Alec Reeves 1902-1971". [[| Telephone History]].))
  * In 1943, [[wp>Bell Labs|Bell Telephone Laboratories]] develops the first digital scrambled speech transmission system, [[wp>SIGSALY]].((Boone, J. V.; Peterson, R. R.: [[|"Sigsaly - The Start of the Digital Revolution"]].))
  * In 1957, [[wp>Max Mathews]] of Bell Labs develops the process to digitally [[wp>sound recording|record]] sound via [[wp>computer]].
  * In 1967, the first [[wp>digital audio]] magnetic tape recorder is invented by [[wp>NHK]]'s research facilities in Japan. It is a 12-bit, 30 kHz stereo device that uses a [[wp>compander]] (similar to [[wp>Dbx (noise reduction)|DBX Noise Reduction]]) to extend the dynamic range.
  * In 1970, James Russell patents the first digital-to-optical recording and playback system, which would later lead to the [[wp>Compact Disc]].((Inventor of the Week, [[|Massachusetts Institute of Technology]].))
  * In 1972, [[wp>Denon]] invents the first 8-track [[wp>reel to reel]] digital recorder.
  * In 1975, [[wp>Thomas Stockham]] makes the first digital audio recordings using standard computer equipment and develops a digital audio recorder of his own design, the first of its kind to be offered commercially (through Stockham's [[wp>Soundstream]] company).
  * In 1977, Denon's music company [[wp>Denon Records]], a division of [[wp>Nippon Columbia]], becomes the first record label to record an entirely digital commercial album, using its state-of-the-art "Denon 034 multi-track system". The album, [[wp>Archie Shepp]]'s "On Green Dolphin Street", is the first digitally recorded album in the history of [[wp>Jazz music]], although it does not yet include vocals.(([[]].))
  * In 1978, Sound 80 Records of Minneapolis records "Flim and the BB's" (S80-DLR-102) directly to digital before pressing the vinyl LP. The mastering engineer is Bob Berglund, and the recording system is a 3M Digital Audio Mastering System.
  * In 1979, the first digital [[wp>Compact Disc]] prototype is created as a compromise between sound quality and the size of the medium.
  * In 1979, the first digitally recorded album of [[wp>popular music]] with vocals, "[[wp>Bop 'Til You Drop]]" by guitarist [[wp>Ry Cooder]], is released by [[wp>Warner Bros. Records]]. The album was recorded in [[wp>Los Angeles]] on a 32-track digital machine built by the [[wp>3M]] corporation. Also, [[wp>Stevie Wonder]] digitally records his [[wp>soundtrack album]] "[[wp>Journey Through the Secret Life of Plants]]" three months after Cooder's album is released; it is followed by the Grammy-winning, self-titled [[wp>Christopher Cross (album)|debut album]] of American singer [[wp>Christopher Cross]], which was also digitally recorded on 3M equipment.
  * In 1982, the first digital [[wp>compact disc]]s are marketed by [[wp>Sony]] and [[wp>Philips]],((Encyclopædia Britannica: "Compact Disc". 2003 Deluxe Edition CD-ROM. Encyclopædia Britannica, Inc.)) and [[wp>New England Digital]] offers the [[wp>hard disk recorder]] (Sample-to-Disk) option on the [[wp>Synclavier]], the first commercial [[wp>hard disk]] (HDD) recording system.(([[|Synclavier history]].)) That same year, [[wp>Peter Gabriel]] releases [[wp>Security (album)|''Security'']] and [[wp>Donald Fagen]] releases "[[wp>The Nightfly]]", both early fully digital recordings.

====Cultural and cinematographic references====
Digital sound recording is mentioned in ... In modern culture, it ...
  * ...
  * ...

=====Theory=====
====Assumptions====
Digitally recording sound involves a number of energy transformations, with corresponding transformations of the oscillation amplitude. The analysis is greatly simplified by the following assumptions, which mostly hold (except for very loud, very low-pitched, or very high-pitched sounds):
  - The source of sound (e.g., vocal cords or a piano string) transforms some of its mechanical vibration energy into an outgoing sound wave
    * This transformation is assumed to be linear in amplitude
      * That is, doubling the original oscillation amplitude doubles the amplitude of the pressure deviations in the sound wave
    * Many sources emit different sound intensities in different directions
      * Speaking or singing projects more intensity in front of the speaker/singer than behind the person
      * A tilted open lid on a concert grand piano is designed to project sound toward the audience
  - The transmission of sound through the air usually distributes the same sound energy over a wider area
    * Unless the sound is transmitted through a pipe or an elevator shaft, this leads to attenuation of the intensity with distance
    * The losses of sound energy into heat during transmission through air are usually negligible (in relatively clean, dust-free air)
  - The sound arriving at the [[wp>microphone]] is the linear superposition of the (variously delayed) sound waves emitted by all the sources
    * The superposition is the addition of the instantaneous pressure deviations from atmospheric pressure due to each wave, as a function of time
    * Before the superposition, each wave is
      * individually attenuated (according to the distance and the directionality of the source)
      * individually delayed (by the time it takes to travel from the individual source to the microphone)
  - The pressure transducer in the microphone produces a voltage signal: $V(t)\propto \Delta P(t)$
    * This signal is linearly proportional to the pressure deviations from atmospheric pressure due to all inbound sound waves
  - The [[wp>Analog-to-digital converter|A/D converter]] transforms the continuous voltage signal $V(t)$ into the digital record $V_i(t_i)$
    * The signal is recorded only at discrete, evenly spaced time points $t_i=i\,\Delta t$, with $i=0,$ $1,$ $2,$ $...,$ $N-1$
    * The time spacing $\Delta t$ between successive recorded values is called the **dwell time**.
    * The inverse of the dwell time is called the **sampling rate**: $\;f_\text{samp}=\frac{1}{\Delta t}$
      * Units: the sampling rate is measured in "samples per second" (${\text s}^{-1}$ or, equivalently, Hz)
    * The total number of points recorded (including the one at time $t\!=\!0$) is called the **record length** $N$
    * The [[wp>Nyquist frequency]] criterion states that the sampling rate must be at least twice the highest frequency to be recorded:
      * $f_\text{samp}\geq 2\,f_\text{max}$
      * E.g., in audio CDs the sampling rate is 44100 samples/s, which allows audio frequencies up to 22050 Hz to be recorded
    * The **frequency resolution** of the recording (i.e., the ability to distinguish two very close tones based on their recordings) is
      * $\Delta f=\frac{1}{N\Delta t}\;$, where $N\Delta t$ is the **duration** of the recording
      * For example, to distinguish two tones 1 Hz apart, one must record for at least 1 second.
    * The recorded voltage values are also not continuous but discrete:
      * The possible values are usually of the form $V_i=\Delta V\!\cdot\!v_i$
        * where $\Delta V$ is the overall gain scale (measured and recorded in volts per bit, V/bit)
        * and $v_i\in \big\{-\!2^{\,n-1}\!+\!1,$ $- 2^{\,n-1}\!+\!2,$ $...,$ $-1,$ $0,$ $1,$ $2,$ $...,$ $2^{\,n-1} \big\}$ are integers
          * the $\in$ symbol means "with possible values from the following set"
      * The number $n$ is called the **bit depth**, **number of bits**, or **bit resolution** of the recorder; usually $n=8,$ $12,$ or $16$.
        * For example, in an 8-bit recorder, the integers $v_i$ can range from $-127$ to $128$ (in total, $2^8\!=\!256$ possible levels)
      * The most negative recordable voltage is $V_\text{min}=\left(-2^{\,n-1}\!+1\right)\Delta V$
        * For an 8-bit recorder, this evaluates to $V_\text{min}=-127\!\cdot\!\Delta V$
      * The most positive recordable voltage is $V_\text{max}=2^{\,n-1}\Delta V$
        * For an 8-bit recorder, this evaluates to $V_\text{max}=128\!\cdot\!\Delta V$
      * The smallest voltage difference that can be recorded is $\Delta V$. It is sometimes called the **voltage quantization scale**
      * Substituting the continuous voltage $V(t_i)$ with the nearest possible value $\Delta V\!\cdot\!v_i$ causes [[wp>Quantization (signal processing)|quantization errors]], which matter most for weak sounds
  - Neglecting quantization errors, the digital recording scales linearly with the pressure deviations of the sound at the microphone
    * $V_i(t_i)\propto V(t_i)\propto \Delta P(t_i)$
  - The discrete frequency analysis of the recording fairly accurately represents the frequency content of the inbound sound
    * if the necessary conditions are met:
      * The sampling rate is at least twice the highest possible (or audible) frequency
      * The duration of the recording is longer than the inverse of the finest frequency difference to be resolved
      * The typical recorded signal amplitude is much larger than the voltage quantization scale, but at the same time
        * the maximum (and minimum) recorded signal levels stay within the $V_\text{min}$ and $V_\text{max}$ bounds (the signal is not "clipped")

=====Methods=====
====Computational goals====
The main goal of this project is to characterize sound recordings (of a voice or a musical instrument) and to gain insight into the following:
  - Explore the limitations of digital recordings described in the Theory section above
    - How loud a sound can one record without clipping?
    - How quiet a sound can one record without quantization errors?
      * Simulate quantization errors by rounding the recorded sound to a much coarser voltage grid. How does it sound (describe)?
  - The spectral frequency analysis of sound:
    - What makes different vowels distinguishable (look at harmonics and overtones)?
      - How does pitch (e.g., singing the same vowel at a different musical tone, low vs. high) affect the spectrum of the recording?
      - What about differences between a male and a female voice singing the same vowel?
    - What makes a specific musical instrument sound the way it does (look at harmonics and overtones)?
      - Compare the spectrum of a note played on a musical instrument to that of a computer-generated beep (a single sine wave)
      - Compare the spectra of different notes played on the same instrument, or of the same note played on different instruments
  - //Feel free to suggest more//

====Software====
[[wp>IGOR Pro]] software, version 6 or later, will be used. A free, fully functional 30-day trial of version 6.3 can be downloaded from the [[|Wavemetrics]] website.
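IGOR Pro is the tool used throughout this project, but the relations from the Theory section behind the first computational goal (dwell time, Nyquist limit, frequency resolution, quantization, clipping) can be sanity-checked in any language. The following is a minimal sketch in plain Python 3 (standard library only), offered as an illustration and not as part of the project's IGOR workflow. The function names ''sampling_summary'', ''sine_record'', ''quantize'', ''alias_frequency'', and ''power_spectrum'' are hypothetical. ''quantize'' assumes the integer levels $v_i$ defined in the Theory section; ''alias_frequency'' uses the standard folding rule for tones above $f_\text{samp}/2$, which goes beyond what the Theory section states; ''power_spectrum'' is a slow, direct discrete Fourier transform, only loosely analogous (up to normalization) to IGOR's ''FFT/MAGS''.

<code python>
import math


def sampling_summary(f_samp, n_points):
    """Return (dwell time, Nyquist limit, duration, frequency resolution).

    f_samp   -- sampling rate in samples per second (s^-1)
    n_points -- record length N
    """
    dwell_time = 1.0 / f_samp        # Delta t = 1 / f_samp
    f_nyquist = f_samp / 2.0         # f_samp >= 2 f_max  =>  f_max <= f_samp / 2
    duration = n_points / f_samp     # N * Delta t
    delta_f = f_samp / n_points      # 1 / (N * Delta t)
    return dwell_time, f_nyquist, duration, delta_f


def sine_record(amplitude, freq, f_samp, n_points):
    """Samples of amplitude * sin(2*pi*freq*t) taken at t_i = i / f_samp."""
    return [amplitude * math.sin(2.0 * math.pi * freq * i / f_samp)
            for i in range(n_points)]


def quantize(volts, delta_v, n_bits):
    """Replace each voltage by the nearest recordable level delta_v * v_i.

    v_i runs from -2**(n-1) + 1 to 2**(n-1), as in the Theory section, so
    values beyond V_min or V_max are clipped. Python's round() sends exact
    halves to the nearest even integer.
    """
    v_low = -2 ** (n_bits - 1) + 1
    v_high = 2 ** (n_bits - 1)
    result = []
    for v in volts:
        level = round(v / delta_v)               # nearest integer level
        level = max(v_low, min(v_high, level))   # clip to [V_min, V_max]
        result.append(level * delta_v)
    return result


def alias_frequency(freq, f_samp):
    """Frequency at which a sampled tone appears, folded into 0 ... f_samp/2."""
    return abs(freq - f_samp * round(freq / f_samp))


def power_spectrum(samples, f_samp):
    """Squared DFT magnitude at f_k = k * f_samp / N for k = 0 ... N // 2.

    A direct O(N^2) discrete Fourier transform (not an FFT), adequate for
    short test signals. Neighbouring f_k values differ by delta_f.
    """
    n = len(samples)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n)
                 for i, x in enumerate(samples))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * f_samp / n)
        powers.append(re * re + im * im)
    return freqs, powers


if __name__ == "__main__":
    # Table 1 rate (16000 s^-1) with a hypothetical 20000-point record
    print(sampling_summary(16000, 20000))
    # A 40 Hz tone sampled at only 64 s^-1 violates the Nyquist criterion
    freqs, powers = power_spectrum(sine_record(1.0, 40, 64, 64), 64)
    peak = max(range(len(powers)), key=powers.__getitem__)
    print(freqs[peak], alias_frequency(40, 64))
</code>

For example, ''sampling_summary(16000, 20000)'' (the rate from Table 1 and a 20000-point wave) gives a Nyquist limit of 8000 Hz, a duration of 1.25 s, and a resolution of 0.8 Hz. ''quantize(sine_record(0.5, 1500, 20000, 20000), 0.1, 4)'' mimics rounding a 0.5-amplitude, 1500 Hz test tone to a coarse 0.1 V grid (at most 11 distinct levels), and the 40 Hz tone sampled at 64 ${\text s}^{-1}$ shows up in the spectrum at 24 Hz, an "under-digitized in time" artifact.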
  * Installation is fairly standard on either Mac or Windows (XP, Vista, or 7 work fine; Win8 probably does as well)
  * The following functionality is needed (''Cmd'' denotes the "apple" or "command" key on a Mac, ''Ctrl'' the "control" key on Windows):
    - Accessing the **command window** to type in commands and see the history of past commands (''Cmd-J'' on Mac, ''Ctrl-J'' on Win)
    - Accessing the **procedure window** to paste macros and functions if desired (''Cmd-M'' on Mac, ''Ctrl-M'' on Win)
    - Accessing the **Data Browser** window to see and manipulate the objects (datafolders, waves, variables, and strings) created so far
      * Available via the "Data" menu (at the top of the window), "Data Browser" submenu (see Fig. 2 below)
    - Creating an array of numbers (or zeroes) for recording and playing sounds. Any array is called a **wave** in IGOR Pro.
      * In the command window, type (to execute any command you just typed, don't forget to hit "Enter" on the keyboard): <code>Make/N=20000 wave1</code>
      * This example creates a wave of length $N=20000$, named ''wave1''. It can be seen in the Data Browser in the ''root'' datafolder
      * IGOR code is **not** case sensitive: all commands and names can be entered in any mixture of UPPER or lower CaSeS.
    - Setting up the time scale for a given wave
      * Any wave in IGOR can have a "scaling" associated with it, e.g., the time points at which the data values were recorded
      * The "scaling" consists of equally spaced intervals.
      * Only three parameters are stored in memory:
        - The "timing" of the initial point (in our case, $t\,=\,0$)
        - The timing interval between successive points (i.e., the dwell time), in our case $\Delta t$
        - The units of measurement (in our case, the character string "s" for seconds)
      * To set the "scaling" starting from $t\,=\,0$, with $\Delta t\,=50\,\mu$s, type into the command window: <code>Setscale/P x 0, 50e-6,"s",wave1</code>
        * Here ''/P'' denotes the "start and delta" format
        * ''x'' is the x-scaling (the default; multidimensional waves can also have y, z, etc.). Therefore, in our case time is "x".
        * $50\times 10^{-6}$ can be entered as ''0.00005'', ''5E-5'', or ''50e-6''; all these forms are equivalent.
      * Alternatively (and equivalently), a sampling rate of 20000 samples per second can be set via <code>Setscale/P x 0, 1/20000,"s",wave1</code>
        * The above line works because $\Delta t=\frac{1}{f_\text{samp}}$
      * The scaling can also be checked and set via the "Change Wave Scaling..." submenu in the "Data" menu (Fig. 2)
    - Assigning pure sine-wave data to an IGOR wave
      * In the command window, type <code>wave1=0.5*sin(2*pi*1500*x)</code>
      * This assigns the following data to the wave:
        * $V(t)=0.5\sin\big(2\pi\!\cdot\!(1500\,{\text{Hz}})\!\cdot\!t\big)$ at the time points specified by the scaling
        * Note that ''*'' must always be used for multiplication
        * ''x'' in the expression refers to the x-scaling (default) set by the ''SetScale'' command. In our case, it denotes the time points $t_i$
      * <color red>Warning</color>: This command will erase any data previously stored in ''wave1''
        * Use ''Duplicate wave1 anotherwave'' to store all the ''wave1'' data (including the scaling) in ''anotherwave''
    - Playing the sound recording from a wave:
      * In the command window, type: <code>playsound wave1</code>
      * If no sound is audible, make sure your computer volume is not muted or set to zero (i.e., that you can hear a [[|YouTube]] video)
    - Displaying the wave as a time plot
      * In the command window, type <code>display wave1</code>
      * Or, build a new graph via the "Windows" menu, "New Graph..." submenu.
        * Choose "calculate" for the x-wave (this will use the "scaling")
      * To zoom in on the plot, use the left mouse button to drag a box (called a **marquee**) across the area you want to zoom into
        * Release the mouse button, then click inside the marquee to select the zoom option
      * To zoom out to the original, all-inclusive view, simply press ''Cmd-A'' on Mac, or ''Ctrl-A'' on Windows
      * Change the appearance by right-clicking or double-clicking on various components (margins, axes, traces, labels, grid, etc.)
    - Recording the sound: follow these steps
      - Type in the command window, hitting the "Enter" key after each line: <code>SoundInStatus
edit W_SoundInRates</code>
        * The first command gets IGOR to ask your computer's sound system for the possible sampling rates.
          * The answers are stored in an automatically created ''W_SoundInRates'' wave
        * The second command simply displays this wave as an editable table
          * The first value is the number of available rates (usually 5 or 7), visible in the top left corner of Fig. 2
          * After that, all the possible rates are listed in units of ${\text s}^{-1}$.
      - Create an empty wave (see above) and set the sampling rate to one of the possibilities (using the ''SetScale'' command)
      - In the command window, type <code>SoundInRecord wave1</code>
        * <color red>Warning</color>: This command will erase any data previously stored in ''wave1''
          * Use ''Duplicate wave1 anotherwave'' to store all the ''wave1'' data (including the scaling) in ''anotherwave''
        * If an error comes up, check your sampling rate again. It should match one of the choices exactly.
        * The length of the recording is determined by the number of points in your wave (the record length $N$) and the sampling rate $f_\text{samp}$: its duration is $N/f_\text{samp}$
      - To make sure the recording was successful, play it back (see above).
    - Calculating and displaying the power spectrum of the sound as a function of frequency
      * In the command window, type <code>FFT/MAGS /DEST=wave1_fft wave1; display wave1_fft; ModifyGraph log(left)=1</code>
      * This is an example of executing 3 different IGOR commands on the same line; in this case the commands must be separated by a semicolon '';'':
        - The first command
          * computes the squared magnitude (power) of the [[wp>Discrete Fourier transform|discrete Fourier transform]] of ''wave1'' as a function of frequency
          * saves it as a ''wave1_fft'' wave
          * note that the scaling of the ''wave1_fft'' wave is automatically in units of frequency (Hz), with the correct spacing $\Delta f$
          * <color red>Warning</color>: This command will erase any data previously stored in ''wave1_fft''
            * Use ''Duplicate wave1_fft anotherfft'' to store all the ''wave1_fft'' data (including the scaling) in ''anotherfft''
        - The second command creates a new default-style graph of the power spectrum contained in ''wave1_fft''
        - The third command changes the vertical axis of the graph to a logarithmic scale, so the weaker harmonics and overtones can be seen
    - Saving your work
      * Save the entire IGOR "experiment" (everything: the command history, code, data, plots, tables, variables, and strings)
        * Use the "File" menu, "Save Experiment As..." submenu
        * Make sure to select the "Packed Experiment File" option to save everything as a single .pxp file.
      * Open the saved file to get back to where you finished last time
        * Access your file either from the Mac Finder or Windows Explorer, or from IGOR's "File" menu, "Recent Experiments" submenu

| {{ :projects:sound:databrowser.png?nolink |}} |
^ Figure 2. Illustration of how to access the Data Browser window (shown on the right side) in IGOR Pro. ^

====Coding tasks====
This is the detailed list of tasks to be accomplished:
  - Record, display, and replay either voice or musical instrument sounds (or any sound source you are interested in)
  - Try to simulate recordings that violate some of the assumptions in the Theory section (clipped, over-quantized, under-digitized in time, too short)
    * How does each artifact sound? Can you tell if you hear one in a recording?
  - Compute the power spectrum of each recording, and interpret the various peaks you observe
  - Try to correlate power spectra with the tone of the sound, its volume, possible artifacts, and the type of vowel or musical instrument

=====Data=====
//Feel free to post links to your data here//

=====References=====
  - Strutt ([[wp>John William Strutt, 3rd Baron Rayleigh|Rayleigh]]), J. W.; Lindsay, R. B. (1877). //The Theory of Sound//. Dover Publications. ISBN 0-486-60292-3.
  - From N.K.'s past contributions to the [[wp>Sound]] article on [[wp>Wikipedia]].

digital_sound_project.1400479542.txt.gz · Last modified: 2014/05/19 06:05 by wikimanager