
Modelling of the white noise

By ... , ... and ...
in collaboration with Prof. Nicholas Kuzma

Introduction

Feel free to expand and contribute

General background

Pure white noise can be observed when no signal is fed into a receiver, for example when a radio or a TV set is not tuned to any particular station, or when no microphone is plugged into an audio amplifier. In fact, any recording or measurement process exhibits some degree of white noise superimposed onto the recorded or measured signal. For example, any resistor at a given temperature generates a white-noise voltage, called Johnson-Nyquist noise. In some applications, such as MRI or ultrasound imaging, the persistence of white noise in the images is the dominant factor adversely affecting the scan duration, the image quality, or both. The goal of this project is to gain insight into the statistical properties of white noise, and to distinguish between random and non-random origins of certain types of noise.

Personal observations by authors

 ...
Figure 1. White noise, observed by N.K. in an NMR spectrometer (tuned to the $^{129}$Xe nuclear-precession frequency). The vertical scale is the detector voltage, and the horizontal scale is time in milliseconds.
Figure 2. In contrast to Fig. 1, this recording is dominated by the 60-Hz interference from the power line (N.K.).

History

John B. Johnson, while working at Bell Labs in 1926, was the first to quantify the white noise in resistors. He described his results to Harry Nyquist, a Bell Labs theorist, who was able to come up with the explanation.





Figure 3. John B. Johnson and Harry Nyquist of Bell Labs, who came up with the first quantitative theory of white noise in resistors (Courtesy Wikipedia). Note that John's photo exhibits quite a bit of noise superimposed onto his image.

Cultural and cinematographic references

White noise (a.k.a “static”) is mentioned in …

Theory

Defining features

By definition, white noise is a sequence of statistically independent random measurements drawn from the same distribution, centered on 0. A more specific, though perhaps more commonly encountered, type of white noise is Gaussian white noise, which additionally requires that each sample in the recording be “normally” distributed (i.e. its statistical distribution follows a “bell-shaped” curve,

  • $f(x)$ $=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{x^2}{2\sigma^2}}$, $\;\;\;\;{\text{Eq. }} (1)$

where

  • $f(x)$ is the probability density: the ratio of the probability of finding $x$ in the interval $(x,x\!+\!\Delta x)$ to the interval's width $\Delta x$, and
  • $\sigma$ is the so-called standard deviation (or “width”) of the distribution).
...
Figure 4. Normal (Gaussian) probability distribution given by Eq. (1). About 68% of observations fall between $-\sigma$ and $\sigma$, and about 95% fall between $-2\sigma$ and $2\sigma$.
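
As a quick numerical check of Eq. (1) and of the percentages quoted in the Fig. 4 caption, here is a minimal Python sketch. It is not part of the Excel workflow used in this project, and the function names `gaussian_pdf` and `fraction_within` are illustrative choices. The fractions are obtained from the standard error function, which is one common way to integrate a Gaussian.

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """Probability density of Eq. (1): a zero-centered Gaussian of width sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def fraction_within(k):
    """Fraction of observations expected between -k*sigma and +k*sigma."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2):
    print(f"within +/-{k} sigma: {fraction_within(k):.4f}")
# prints 0.6827 for k = 1 and 0.9545 for k = 2
```
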

The hypothesis of the noise origins

The idea is that, while each individual source of noise might not be random (for example, electromagnetic emissions from nearby microprocessors, electric motors next door, distant lightning, and radio stations all have very specific frequencies and time-domain signatures), when multiple unrelated sources are combined at the receiver input, their sum is random to a much greater degree. Moreover, as the central limit theorem of statistics would suggest, such a sum of many unrelated non-random signals is not just random, but is also normally distributed according to Eq. (1). We shall experimentally test this hypothesis in this project.

Methods

Computational goals

The main goal of this project is to get a feel for how unrelated non-random signals, such as sine waves, combine to produce much more random (and ultimately, normally-distributed) white noise. Specifically, we want to demonstrate

  1. The distribution of readings in a single sine-wave is very different from the Gaussian (the bell-shaped curve of Fig. 4).
  2. As several unrelated sine waves of different frequencies are combined, the distribution of readings becomes lumpier.
  3. Eventually, when many sine waves are combined, the readings of their sum (or average) signal are distributed very closely to a Gaussian.
  4. feel free to suggest more

Software and data analysis

Microsoft Excel software will be used. The following functionality is needed:

  1. Creating a “grid” of numbers, e.g. a row or column of {0, 0.001, 0.002, … 1} for modeling the time points at which recordings take place
    • typing up such an array is tedious, but it's quite easy using the “drag by the corner” trick:
      • type 0 into a cell
      • type 0.001 into the next cell below
      • select both cells with a mouse
      • drag the bottom-right corner of the selection all the way down, generating the desired sequence
  2. Creating a formula in the next column, taking the preceding column as an input, and dragging it all the way to the end of the input column
    • you can create a formula by typing = into a cell, followed by a formula content (e.g. SQRT(2)), then hitting “Enter”:
      • =SQRT(2) will calculate the square root of 2
      • =SQRT(A2) will calculate the square root of the number in the cell A2 (i.e. in the column A, row 2)
    • See Exercise 3 below for more detailed instructions
  3. Generating random numbers, evenly distributed on an interval [0,1]
    • using the =RAND() function
      • Warning: all random numbers, and the numbers that depend on them, will change every time any cell in Excel is modified
    • These can be scaled to any interval [a,b] by simply using
      • =a+(b-a)*RAND()
  4. Generating normally-distributed random numbers
    • use the =sigma*NORM.S.INV(RAND()) function, where sigma is the desired standard deviation (width) of the curve
      • sigma can be just a number (e.g. 1), or a cell someplace else containing the desired value
  5. Exercise 1: generate a column of 200 (and/or, separately, 20000) normally-distributed random numbers with sigma=1
    • Question: How do we plot the input versus the output? What is the input and what is the output?
      • Answer: In this exercise, just create a column of random numbers. No need to plot yet.
  6. Plotting (using scatter chart) the output column versus the input column
    • with main title
    • and with axis labels (called “axis titles” in Excel)
  7. Exercise 2:
    1. create a column containing a grid of x values from $-$5 to 5 with step 0.01
    2. create another column next to it with the f(x) values generated using Eq. (1) with $\sigma\!=\!1$
    3. plot f values versus the x values
      • use “Scatter”, “Smooth-lined scatter” options
    4. add the main title and axis labels (titles)
    5. save that plot for the introduction section of your report
  8. Automatically counting the number of cells in a certain range in an array
    • use a pair of COUNTIF() functions:
      • To count the number of cells in column B of your Excel sheet that are greater than or equal to 3.2 but less than 3.3, use
        • =COUNTIF(B:B,">=3.2")-COUNTIF(B:B,">=3.3")
          • Here, the first COUNTIF counts the number of cells that are greater than or equal to 3.2
          • The second COUNTIF counts the number of cells that are greater than or equal to 3.3
          • The difference between the two counts is the number of cells that belong to the $[3.2,3.3)$ semi-open interval.
          • The number 3.2 (if it occurs in column B) will be counted, but the number 3.3 will not make it into the difference count
      • To count the number of cells in the range C2:C1001 that exceed the number in D4 but are no greater than the number in D5, join the comparison operator to the cell reference with &:
        • =COUNTIF(C2:C1001,">"&D4)-COUNTIF(C2:C1001,">"&D5)
  9. Exercise 3: in the same workbook containing the previous exercise,
    1. Create a column containing the grid of “bin boundaries” from $-$5.1 to 5.1 with step 0.3
      • do not overwrite the results of the previous exercises
    2. In the next column, create a grid of “bin centers”:
      • calculate the first center by typing the =0.5*(E2+E3) formula, assuming the first bin boundary is in E2, the second in E3, etc
      • hit enter, select this first bin center with a mouse, and drag the lower-right corner of this cell all the way down
      • this will automatically generate formulas for all other bin centers, with the inputs automatically shifting down as you drag
        • to avoid such automatic adjustment (later in the project - when it is not desirable) of the formula inputs, you can
          • use E$3 to avoid vertical shifting
          • use $E3 to avoid horizontal shifting when dragging horizontally
          • use $E$3 to avoid any shifting
        • this automatic shifting also happens when copying/pasting cells with formulas, and when shifting cells due to deleting or inserting
    3. In the next column, count how many of the random numbers generated in the first exercise fall within each bin
    4. Plot the number of occurrences of the random numbers versus the bin centers.
      • use “Column”, “Clustered column” options
        • use the found numbers of occurrences as “Y values”, and the bin centers as “Category (X) axis labels”
      • alternatively, use “Scatter”, “Marked scatter” options:
        • select the bin centers as “X values” and the numbers of occurrences as “Y values”
      • Such a plot is called a histogram
    5. Compare the histograms for small (N~200) and large (N~20000) numbers to the Gaussian shape plotted in the previous exercise.
      • for a more quantitative comparison, convert probability density f(x) of Eq. (1) to the observed bin counts in this exercise:
        • $f=\frac{P}{\Delta x}$ $=\frac{\text{probability}}{\text{bin width}}$ $=\frac{n_i/N}{\Delta x}$ $=\frac{n_i}{N\Delta x}$
        • $n_{i\,}(\text{predicted})=f\,N\Delta x$
          • here $\Delta x$ is the difference between successive bin boundaries
          • $n_i$ is the observed (or predicted) number of occurrences in the i th bin
          • $N$ is the total number of observations
    • Question: I have the two histograms plotted. For the quantitative comparison should I calculate the probability density for each bin? for the exercise with $N=200$ would this be done by using the following equation: ${\text{probability density}}=$ $\frac{\text{# observations in bin}}{200\times 0.3}$ ? I am having a hard time understanding if we need to calculate the predicted number in each bin. Should I do this? If so, how would I figure out how to do this?
      • Answer: to compare “experiment” with theory, you need either to convert your bin counts to the probability density (by dividing the counts by the total # of observations and by the bin width), and compare that to the theoretical curve, or, alternatively, convert the theoretical probability density to the predicted bin count, that is by multiplying the theoretical probability density by the total # of observations (200) and by the width of your bins (I guess, 0.3). Then plot the two curves on the same plot, the theoretical curve using lines and the experimental bin counts using dots or other symbols.
  10. Save the exercises above for the “Intro”, “Theory”, and “Methods” sections of your report
    • You can convert any screen content into an image that can be pasted into your report:
      • on a Mac, press Command, Ctrl, Shift, and 4 keys at the same time.
        • Release the keys, select any screen area by dragging a “cross-hair” pointer with a mouse or a track-pad across the image
        • Release mouse or trackpad
        • Switch to the editing software (e.g. Word or Pages), and paste at the desired spot
      • on a PC, press Alt and PrtScn (“Print Screen”) at the same time, then release
        • Switch to the editing software (e.g. Word, Powerpoint, or Paint)
        • Paste at the desired spot
        • Crop the excessive margins as needed
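
For readers who want to cross-check the spreadsheet results, Exercises 1 and 3 above can be sketched in Python. This is only an illustration under stated assumptions, not a replacement for the Excel workflow. The helper names (`normal_sample`, `bin_edges`, `bin_centers`, `count_in_bin`) and the seed are made up for this example. `NormalDist().inv_cdf` from the standard library plays the role of NORM.S.INV, and the bin counting mirrors the pair-of-COUNTIF trick.

```python
import random
from statistics import NormalDist

def normal_sample(sigma=1.0, rng=random):
    """One normally distributed number, analogous to =sigma*NORM.S.INV(RAND())."""
    u = rng.random()
    while u == 0.0:              # the inverse CDF is undefined at exactly 0
        u = rng.random()
    return sigma * NormalDist().inv_cdf(u)

def bin_edges(first=-5.1, step=0.3, n_edges=35):
    """Grid of bin boundaries: -5.1, -4.8, ..., 5.1."""
    return [first + step * i for i in range(n_edges)]

def bin_centers(edges):
    """Midpoint of each pair of neighboring boundaries, like =0.5*(E2+E3)."""
    return [0.5 * (lo + hi) for lo, hi in zip(edges, edges[1:])]

def count_in_bin(values, lo, hi):
    """Count values in [lo, hi) as the difference of two '>=' counts."""
    at_least_lo = sum(1 for v in values if v >= lo)
    at_least_hi = sum(1 for v in values if v >= hi)
    return at_least_lo - at_least_hi

rng = random.Random(2015)        # fixed seed (arbitrary), so reruns repeat exactly
samples = [normal_sample(1.0, rng) for _ in range(20000)]
edges = bin_edges()
centers = bin_centers(edges)
counts = [count_in_bin(samples, lo, hi) for lo, hi in zip(edges, edges[1:])]

for center, count in zip(centers, counts):
    print(f"{center:6.2f}  {count}")
```

Unlike RAND(), the seeded generator returns the same numbers on every run, which makes it easier to compare histograms between sessions.
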
...
Figure 5. Examples of histogram plots in Excel.
In the top figure, the theoretical curve has been scaled (multiplied by $N\Delta x$) to yield the predicted numbers of counts in each bin.
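
The conversion between probability density and bin counts described in Exercise 3 (and used for the scaled curve in Fig. 5) can be sketched in Python as follows. The function names are illustrative. Evaluating $f$ at the bin center is an approximation assumed here, which should be reasonable when the bins are narrow compared to $\sigma$.

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """Probability density of Eq. (1)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def predicted_count(center, n_total, bin_width, sigma=1.0):
    """n_i(predicted) = f * N * dx, with f evaluated at the bin center (assumption)."""
    return gaussian_pdf(center, sigma) * n_total * bin_width

def observed_density(count, n_total, bin_width):
    """f = n_i / (N * dx): converts an observed bin count to a probability density."""
    return count / (n_total * bin_width)

# Example with N = 200 observations and bins of width 0.3:
print(predicted_count(0.15, n_total=200, bin_width=0.3))
print(observed_density(24, n_total=200, bin_width=0.3))
```

Either direction works for the comparison, as long as the theoretical curve and the observed values end up in the same units on the plot.
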

Coding tasks

This is the detailed list of tasks to be accomplished:

  1. Generate a single sine wave of amplitude $A=5$, phase $\phi=1.5\,$rad, and frequency $\omega=2\pi\times 3.5\,$Hz:
    • $V(t)=5\sin\,(2\pi\cdot 3.5\,t+1.5)$
      • First, generate a column containing a grid of time values from 0 to several seconds with a step of 0.001 s
      • In the next column, type a formula containing the above equation and drag it by the corner all the way down
        • Use PI() or 3.1415926 for $\pi$, numbers from the preceding column for $t$
        • You can plot this sine wave (or some part of it) versus time for illustrative purposes for your report
  2. Calculate and plot the histogram of the observed values in this pure sine wave, using Exercise 3 above as a guide
    • comment on the apparent “bimodality” of this histogram in your “Discussion” section
    • what are the most-frequently observed values in a sine wave? Why is that so (explain based on your intuition)?
    • what are the least-often observed values in a sine wave? Why?
  3. Using the same grid of time values, generate several more sine waves with the same $A$ and $\omega$, but different $\phi$
    • Place these new columns next to the original sine-wave column
    • Finally, compute another column corresponding to the average of these waves:
      • For each time point, the wave average = (sum of recorded voltages in all the sine waves at this time point)/(number of waves)
    • Plot the resulting average wave. What properties does it have? Is it a good model for “random” noise?
  4. Using the same grid of time values, now generate 200 sine waves (by dragging formulas horizontally as well!):
    • Set all amplitudes to 1, all phases to 8
    • The frequencies $\omega$ should be on a grid from $2\pi\times 0.3\,$Hz to $2\pi\times 199.3\,$Hz with a step of $2\pi\times 1\,$Hz
  5. Compute the average of these sine waves in another column
    • you can use the =AVERAGE(H3:GY3) formula to average the numbers in the cell range from H3 to GY3 (in this example)
  6. Plot some part of this average of 200 sine waves versus time.
    1. Does it look like “random noise”?
    2. If not, what is the pattern that you observe?
  7. Generate and plot the histogram of the measured values in this average sine wave.
    • Make sure to zoom in on the interesting part of the plot
      • What does the distribution of measured values, represented by this histogram, look like?
      • What are the most frequently observed values?
      • What are the least-frequently observed values?
      • Compare and contrast this to the histogram of a pure sine wave obtained above.
        • Try to offer your intuition as to why adding (averaging) several pure sine waves results in such a drastically different histogram
        • If intrigued, try to experiment with fewer than 200 sine waves. At what point does the dramatic change of the histogram happen?
    • Question: I need to figure out what is wrong with my frequency graph (average of 200 sine waves). I have re-plotted it again, checked my formulae, and I still am unable to find any error.
      • Answer: I looked at your file, and the data is actually correct. The problem is with your plot. Do you know how to change the axis range? As it is now, you are “zoomed in” too much on your figure: the x axis is only from 0.1 to 0.3 s somehow, and the y axis is from $-0.02$ to $+0.02$. Basically it is blowing up a tiny little aspect of the plot, and not showing the whole picture.
        1. Select your plot.
        2. On the “ribbon” in Excel, select “Chart Layout”
        3. Then click the box “Axes”, “Horizontal axis”, “Axis options”
        4. Make sure the first two boxes (“minimum” and “maximum”) are checked, to stretch the scale over the whole range of your data
        5. Click OK
        6. Do the same for the vertical axis.
        7. Your plot will be showing the whole data then, and it will be correct.
  8. Finally, generate a number (you decide how many) of pure sine waves with different amplitudes and frequencies.
    • Try random frequencies and/or random amplitudes/phases (using a scaled version of the RAND() function).
    • Can you “synthesize” a noise trace that looks like Fig. 1?
    • What does a histogram of measured values of such a “noise” signal look like?
    • Can you estimate the standard deviation $\sigma$ of your noise signal?
      • you can either compare to the theoretical curve
      • or use the =STDEV(GZ2:GZ5001) function on the range (e.g. GZ2:GZ5001) of your generated voltages
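
The tasks above can also be prototyped outside Excel. The following Python sketch covers tasks 1-2, 4-5, and 8 under stated assumptions: a 2-second time grid, the same 0.3-wide bins as in Exercise 3 for the single sine wave, and arbitrarily chosen ranges for the random amplitudes, frequencies, and phases in task 8. The function names, the seed, and the narrower bins used for the noise histogram are illustrative choices, not part of the original instructions.

```python
import math
import random
import statistics

def time_grid(duration_s=2.0, dt=0.001):
    """Time points 0, dt, 2*dt, ..., duration_s (the first spreadsheet column)."""
    return [i * dt for i in range(round(duration_s / dt) + 1)]

def sine_wave(times, amplitude, freq_hz, phase):
    """V(t) = A*sin(2*pi*f*t + phi), sampled on the time grid."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * t + phase) for t in times]

def average_wave(waves):
    """Point-by-point average of several waves (like =AVERAGE across a row)."""
    return [sum(column) / len(waves) for column in zip(*waves)]

def histogram(values, first_edge=-5.1, step=0.3, n_bins=34):
    """Counts of values in bins [edge, edge + step); out-of-range values are skipped."""
    counts = [0] * n_bins
    for v in values:
        k = math.floor((v - first_edge) / step)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

times = time_grid()

# Tasks 1-2: a single sine wave (A = 5, 3.5 Hz, phase 1.5 rad) and its histogram.
single = sine_wave(times, amplitude=5.0, freq_hz=3.5, phase=1.5)
single_counts = histogram(single)

# Tasks 4-5: 200 sine waves, amplitude 1, phase 8, frequencies 0.3, 1.3, ..., 199.3 Hz.
regular = [sine_wave(times, 1.0, 0.3 + k, 8.0) for k in range(200)]
regular_avg = average_wave(regular)

# Task 8: random amplitudes, frequencies, and phases (ranges are arbitrary assumptions).
rng = random.Random(1)
noisy = [sine_wave(times, rng.uniform(0.5, 1.5), rng.uniform(0.3, 199.3),
                   rng.uniform(0.0, 2.0 * math.pi)) for _ in range(200)]
noisy_avg = average_wave(noisy)
noisy_counts = histogram(noisy_avg, first_edge=-0.255, step=0.015, n_bins=34)

print("single sine wave, counts per bin:", single_counts)
print("average of random sine waves, counts per bin:", noisy_counts)
print("sample standard deviation of the synthesized noise:", statistics.stdev(noisy_avg))
```

Here `statistics.stdev` computes the sample standard deviation, which is also what Excel's STDEV returns. Changing the number of waves in the two list comprehensions is a quick way to explore the experiment suggested in task 7.
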

References

white_noise_project.txt · Last modified: 2015/03/30 05:18 by wikimanager