Advanced Raster Analysis with Spatial Analyst
and Excel: Multiple Regression
Building on the skills and understanding from the last exercise (!),
we will now move beyond exploratory methods to predictive methods.
We have 6 data elements (fields) in the soilsamp table, plus an elevation
layer (Digital Elevation Model, or DEM); our dependent variable
is Crop Yield. (Please think about how these could be changed into
variables that are relevant to your own research!)
One way to relate these variables to the output variable is Multiple
Linear Regression, a technique for estimating the influence of each
individual variable on the output variable. We will perform a multiple
linear regression using these variables, illustrating how raster
analysis becomes a more powerful tool for natural science applications
when it moves from describing patterns to predicting them.
You are solving an equation that looks like this:
Y = Ax + By + Cz + D
where x, y and z are input variables such as moisture content, pH, etc.;
A, B and C are their coefficients; D is the intercept (a constant term);
and Y is the dependent variable, in this case Yield.
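Excel's Regression tool finds the coefficients and the intercept (a constant term) by ordinary least squares. The Python sketch below is only an illustration of that calculation, not how Excel works internally: it uses NumPy's `lstsq` and adds a column of ones so the intercept is estimated alongside the other coefficients. The function name `fit_multiple_regression` and all sample values are made up.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Ordinary least squares for y = intercept + X @ coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A column of ones lets the intercept be estimated with the other coefficients.
    design = np.column_stack([np.ones(len(y)), X])
    solution, *_ = np.linalg.lstsq(design, y, rcond=None)
    return solution[0], solution[1:]

# Hypothetical samples: columns stand in for moisture, pH and elevation.
X = np.array([[0.20, 6.1, 310.0],
              [0.25, 6.8, 305.0],
              [0.30, 5.9, 298.0],
              [0.22, 7.2, 320.0],
              [0.28, 6.5, 301.0]])
true_coefs = np.array([40.0, 2.0, 0.05])
y = 10.0 + X @ true_coefs  # exact linear relationship, no noise

intercept, coefs = fit_multiple_regression(X, y)
print(intercept, coefs)
```

Because the made-up yields follow the equation exactly, the fit recovers an intercept of about 10 and coefficients of about 40, 2 and 0.05. With real field data the fit will not be exact, and Excel also reports how good it is (for example R Square).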
We will use Excel to calculate the values of the coefficients, then
take these back into ArcView and multiply each raster layer by its
calculated coefficient.
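Once you have the coefficients, the predicted surface is cell-by-cell arithmetic on the raster layers. Here is a minimal NumPy sketch. It assumes every layer has already been turned into an array on the same grid (same extent and cell size). `predict_surface` and the 2 x 2 grids are hypothetical.

```python
import numpy as np

def predict_surface(intercept, coefficients, layers):
    """Cell by cell: intercept + sum(coefficient_i * layer_i)."""
    if len(coefficients) != len(layers):
        raise ValueError("need exactly one coefficient per layer")
    predicted = np.full(np.shape(layers[0]), float(intercept))
    for coef, layer in zip(coefficients, layers):
        layer = np.asarray(layer, dtype=float)
        if layer.shape != predicted.shape:
            raise ValueError("all layers must share the same grid")
        predicted += coef * layer  # weight this layer and add it in
    return predicted

# Hypothetical 2 x 2 grids standing in for moisture and pH surfaces.
moisture = np.array([[0.20, 0.25], [0.30, 0.22]])
ph = np.array([[6.0, 6.5], [7.0, 5.5]])
print(predict_surface(10.0, [40.0, 2.0], [moisture, ph]))
```

For example, the top-left cell becomes 10 + 40 x 0.20 + 2 x 6.0 = 30.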
Do this:
Use the Tutorial Data from exercise 3.
Spatially join the yield data and the soilsamp data.
Convert this to a 3D shapefile, taking the z-values from the DEM.
Add an elevation field and fill it with elevation values using the
Field Calculator and the .getz function.
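Export the table and open it in Excel.
Make sure the Analysis ToolPak add-in is loaded.
Open the Data Analysis dialog and choose Regression.
For the Input Y Range, select the yield column; for the Input X Range,
select all of the other data columns (not lat/long!). The X columns
must sit next to each other, so move or delete the lat/long columns first.
Choose New Worksheet Ply as the output option.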
Copy the coefficients (including the intercept) and go back to ArcView
to start multiplying each of your layers by its coefficient.
Add up all of the weighted layers, plus the intercept, for a final
Predicted Yield map. Compare this to the "real" yield map by
subtracting the predicted map from it; the difference is the residual.
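For the spatial join step, ArcView does the matching for you. The sketch below is an illustration only and makes an assumption: that the join matches each yield point to the nearest soil sample, the usual behaviour for point-to-point spatial joins. It is a brute-force nearest-neighbour join in plain Python. `nearest_join` and the point records are hypothetical. For large yield datasets a spatial index would be much faster.

```python
import math

def nearest_join(targets, sources):
    """Attach to each target point the attributes of its closest source point.
    Points are dicts with 'x' and 'y' keys; all other keys are attributes."""
    joined = []
    for t in targets:
        nearest = min(sources, key=lambda s: math.hypot(s["x"] - t["x"], s["y"] - t["y"]))
        record = dict(t)
        for key, value in nearest.items():
            if key not in ("x", "y"):  # keep the target's own coordinates
                record[key] = value
        joined.append(record)
    return joined

# Hypothetical yield points and soil samples in projected coordinates (metres).
yield_pts = [{"x": 0.0, "y": 0.0, "yield": 7.1}, {"x": 95.0, "y": 10.0, "yield": 6.4}]
soil_pts = [{"x": 5.0, "y": 5.0, "ph": 6.2}, {"x": 100.0, "y": 0.0, "ph": 6.9}]
for row in nearest_join(yield_pts, soil_pts):
    print(row)
```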
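Taking z-values from the DEM (the 3D shapefile and .getz steps) amounts to looking up the elevation under each point. The sketch below uses a nearest-cell lookup and assumes a north-up grid with a known upper-left corner and square cells. ArcView may interpolate between cells instead, so its values can differ slightly. `sample_dem` and the small grid are hypothetical.

```python
def sample_dem(dem, x_origin, y_origin, cell_size, x, y):
    """Return the value of the DEM cell containing point (x, y).
    dem is a list of rows, row 0 along the northern edge;
    (x_origin, y_origin) is the grid's upper-left corner."""
    col = int((x - x_origin) // cell_size)
    row = int((y_origin - y) // cell_size)  # rows count downward from the top
    if not (0 <= row < len(dem) and 0 <= col < len(dem[0])):
        raise ValueError("point falls outside the DEM")
    return dem[row][col]

# Hypothetical 3 x 3 DEM, upper-left corner at (1000, 2000), 30 m cells.
dem = [[310.0, 312.0, 315.0],
       [305.0, 308.0, 311.0],
       [300.0, 303.0, 306.0]]
print(sample_dem(dem, 1000.0, 2000.0, 30.0, 1045.0, 1950.0))
```

The point (1045, 1950) lies 45 m east and 50 m south of the corner, so it falls in column 1, row 1.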
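The residual map is one more cell-by-cell operation. The sketch below takes the residual as observed minus predicted. With that convention, positive cells mark places where the real yield beat the model and negative cells mark places where it fell short. `residual_surface` and the grids are hypothetical and assume both maps share one grid.

```python
import numpy as np

def residual_surface(observed, predicted):
    """Residual = observed yield minus predicted yield, cell by cell."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("grids must have the same shape")
    return observed - predicted

# Hypothetical 2 x 2 observed and predicted yield grids.
observed = np.array([[31.0, 32.0], [36.5, 29.8]])
predicted = np.array([[30.0, 33.0], [36.0, 29.8]])
resid = residual_surface(observed, predicted)
print(resid)
```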
Whew! That was fun, huh?