HARNEY DUNE PROJECT
PHASE 1 EXPLORATORY RESEARCH
OBJECTIVES, PROCEDURES AND PRELIMINARY RESULTS
Cameron M. Smith
15 January 2000

CONTENTS
1.0 INTRODUCTION
    1.1 Data Structure
    1.2 Bootstrap Experiment
2.0 EXPLORATORY DATA ANALYSIS
    2.1 Objectives and Procedures
    2.2 Procedures
3.0 RESULTS
    3.1 Populations, Occupation Areas and Data Collection Units
    3.2 Assemblage
    3.3 Cluster Generation
    3.4 Cluster Discussion
    3.5 Assemblage Structure by Cluster
4.0 CONCLUDING REMARKS
5.0 REFERENCES

1.0 INTRODUCTION
This preliminary report on my examination of the Harney Dune site data (Raymond 1994) assesses the suitability of the data set for future analysis and describes some of the characteristics of some key variables. I first report on the new variable, ARTIFACT, which I used to identify gross clusters in the study area.
1.1 DATA STRUCTURE
The ARTIFACT variable is simply the sum of all artifact counts per collection unit. I did not include summary counts (OBSIDIAN, CHERT, BASALT) which would artificially inflate scores.
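As a minimal sketch of how ARTIFACT could be derived (written in Python rather than SYSTAT), the function below sums the detail counts for one collection unit. The list of component variables is an assumption: it uses the sixteen detail counts named for the cluster runs later in this report. The example unit and its counts are invented for illustration.

```python
# Detail count variables assumed to make up ARTIFACT (taken from the
# variable list given for the cluster runs; an assumption, not confirmed).
DETAIL_VARS = [
    "CHERTFLA", "CHERTCOR", "CHERTTOO", "CHERTRAW",
    "OBSFLAKE", "OBSBLADE", "OBSCORE", "OBSTOOL", "OBSRAW",
    "NETWEIGH", "FCR", "GROUNDST",
    "BASFLAKE", "BASCORE", "BASTOOL", "BASALTRA",
]


def artifact_total(unit):
    """Sum the detail counts for one collection unit.

    Summary counts (CHERT, OBSIDIAN, BASALT) are not in DETAIL_VARS,
    so they are ignored and cannot inflate the total.
    """
    return sum(unit.get(name, 0) for name in DETAIL_VARS)


# Hypothetical collection unit: counts are illustrative, not project data.
example_unit = {
    "CHERTFLA": 12, "CHERTCOR": 1, "OBSFLAKE": 4, "FCR": 3,
    "CHERT": 13, "OBSIDIAN": 4,  # summary counts, deliberately skipped
}
print(artifact_total(example_unit))
```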
In using the variable ARTIFACT to identify clusters, we must first examine its general statistical properties.
A histogram shows that the sample is not uniformly distributed; there is skewness towards lower values, and there is one very high-value outlier.

Skewness of ARTIFACT is 2.74; compare this to the skewness of a roughly symmetric variable such as NORTH (with roughly equal numbers of collection units north and south of the centerline), which is -0.02. A general rule of thumb is that in a uniform distribution, sample error will decrease with an increase in sample size. With this nonuniform distribution, however, we may not make this assumption. There are patches of high and low values, unevenly distributed on the landscape, such that random sampling may increase or decrease error.
To establish this quantitatively, I conducted a 'bootstrap' experiment, in which random samples of a range of sizes, large and small, are taken from the 770 collection unit population. I then examined the errors associated with these samples for correlations (positive or negative) between sample size and error.
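The experiment can be sketched in Python as follows. This is an illustration only: it assumes the samples are drawn without replacement (as the percentage samples below suggest), it uses a moment-based skewness that may differ slightly from SYSTAT's G1, and it runs on synthetic right-skewed counts rather than the real ARTIFACT values. The function names are hypothetical.

```python
import random
import statistics


def describe(sample):
    """Return N, mean, standard deviation, skewness and C.V. for a sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    # Moment-based skewness g1 = m3 / m2**1.5 (assumed definition of G1).
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    cv = sd / mean if mean else float("nan")
    return {"n": n, "mean": mean, "sd": sd, "skew": skew, "cv": cv}


def subsample_trials(population, fractions, seed=0):
    """Draw one random sample (without replacement) per fraction."""
    rng = random.Random(seed)
    results = []
    for fraction in fractions:
        size = max(2, int(len(population) * fraction))
        results.append(describe(rng.sample(population, size)))
    return results


# Synthetic right-skewed counts standing in for the 770 ARTIFACT values.
_rng = random.Random(1)
population = [int(_rng.expovariate(1 / 25)) for _ in range(770)]

for row in subsample_trials(population, [1.0, 0.5, 0.25, 0.10]):
    print(row)
```

Changing the seed shows how much the error measures move between draws of the same size, which is the behaviour the experiment is meant to expose.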
1.2 Bootstrap Experiment
Objective: determine whether sample size affects error measures of the variable ARTIFACT.
1. ARTIFACT
Sample of all collection units.
TOTAL OBSERVATIONS: 770
N OF CASES 770
MINIMUM 0.000
MAXIMUM 296.000
MEAN 25.071429
VARIANCE 866.030
STANDARD DEV 29.428
SKEWNESS(G1) 2.741
C.V. (Coefficient of Variation = (std. deviation/mean)) = 1.174
2. ARTIFACT
50% random sample (385 collection units randomly sampled).
TOTAL OBSERVATIONS: 385
N OF CASES 385
MINIMUM 0.000000
MAXIMUM 158.000000
MEAN 25.157895
VARIANCE 886.579102
STANDARD DEV 29.775478
SKEWNESS(G1) 1.993405
C.V. 1.284756
3. ARTIFACT
25% random sample (192 collection units randomly sampled).
TOTAL OBSERVATIONS: 192
N OF CASES 192
MINIMUM 0.000000
MAXIMUM 158.000000
MEAN 21.937500
VARIANCE 951.629581
STANDARD DEV 30.848494
SKEWNESS(G1) 2.160887
C.V. 1.420499
4. ARTIFACT
10% random sample (77 collection units randomly sampled).
N OF CASES 77
MINIMUM 0.000000
MAXIMUM 164.000000
MEAN 35.454545
VARIANCE 1683.224880
STANDARD DEV 41.027124
SKEWNESS(G1) 1.414111
C.V. 1.406199
Further Bootstrapping via Random Sampling
N OF CASES  MINIMUM   MAXIMUM     MEAN       VARIANCE    STANDARD DEV  SKEWNESS(G1)  C.V.
213         0.000000  144.000000  24.352113  580.059394  24.084422     1.666893      0.989008
26          8.000000  89.000000   37.846154  426.695385  20.656606     0.567251      0.545805
261         0.000000  149.000000  25.375479  572.981550  23.937033     1.821640      0.943314
18          0.000000  94.000000   18.000000  484.705882  22.016037     2.348802      1.223113
12          3.000000  47.000000   24.500000  178.636364  13.365492     0.040445      0.545530
19          0.000000  107.000000  39.631579  796.023392  28.213887     1.270461      0.711904
15          1.000000  78.000000   21.733333  761.638095  27.597791     1.358457      1.269837
70          0.000000  104.000000  27.771429  541.135404  23.262317     1.331047      0.837635
111         0.000000  158.000000  21.495495  955.397707  30.909508     2.455825      1.437953
128         0.000000  121.000000  33.585938  527.000431  22.956490     1.094248      0.683515
185         0.000000  164.000000  29.821622  968.962573  31.128164     1.980534      1.043812
In these 15 bootstrap trials, we see that error measures (standard deviation and Coefficient of Variation) of the variable ARTIFACT do not correlate well with sample size (either positively or negatively).

Thus, random samples of large or small size may generate high or low deviations from the average ARTIFACT score. This is largely due to the non-normal distribution of the variable ARTIFACT. We may conclude that large samples are not necessarily more useful to us than small samples, and that small samples will not necessarily generate high error figures. In fact, what is most clearly pointed out here is the unevenness of the variable ARTIFACT and, perhaps, the importance of careful sample selection (cluster identification), particularly on a scale relevant to human action in the past.
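One way to check this reading is to compute a correlation coefficient between sample size and each error measure. The sketch below does so in Python, using the N, standard deviation and C.V. values as reported for the 15 trials above; the choice of Pearson's r is my assumption here, not a procedure taken from the original analysis.

```python
import math

# (N of cases, standard deviation, C.V.) as reported for the 15 trials.
TRIALS = [
    (770, 29.428, 1.174),
    (385, 29.775478, 1.284756),
    (192, 30.848494, 1.420499),
    (77, 41.027124, 1.406199),
    (213, 24.084422, 0.989008),
    (26, 20.656606, 0.545805),
    (261, 23.937033, 0.943314),
    (18, 22.016037, 1.223113),
    (12, 13.365492, 0.545530),
    (19, 28.213887, 0.711904),
    (15, 27.597791, 1.269837),
    (70, 23.262317, 0.837635),
    (111, 30.909508, 1.437953),
    (128, 22.956490, 0.683515),
    (185, 31.128164, 1.043812),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


sizes = [t[0] for t in TRIALS]
r_sd = pearson_r(sizes, [t[1] for t in TRIALS])
r_cv = pearson_r(sizes, [t[2] for t in TRIALS])
print(f"r(N, SD) = {r_sd:.2f}; r(N, C.V.) = {r_cv:.2f}")
```

Values of r near zero would support the statement that neither error measure tracks sample size closely in these trials.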
2.0 EXPLORATORY DATA ANALYSIS
2.1 Objectives and Procedures
My objectives in this exploratory phase of research are:
1. Identify clusters of artifacts which result largely from human behaviour.
2. Characterize the clusters in terms of artifact assemblage structure.
3. Compare the clusters to identify differences in area use (area is reflected by cluster).
I will proceed by selecting clusters based on the variable ARTIFACT, which simply indicates the number of artifacts per collection unit. For exploratory purposes, this is sufficient, but it must be remembered that some artifact types, such as chert flakes, are produced in higher numbers per use episode than other types, such as net weights. Thus, at a later time it will be necessary to identify clusters with more specific aims. For the moment, the objective is to characterize the most obvious clusters of artifacts.
Later analysis will reduce analytical unit sizes to reflect, for example, occupation episodes (see section 3.1), using 3-collection unit groups as analytical units representing single occupation episodes.
2.2 Procedures
This section indicates my procedures and files created, both for my own reference and to clarify my methods.
1. Creation of SYSTAT data file for 770 collection units. Produced file 'dune770'.
2. Creation of new variable in file 'dune770', ARTIFACT. This figure is the sum of all artifact counts per collection unit. Creation of new variable in same file: CLUSTER (to be filled in as cluster number identifier per collection unit when clusters are identified, in steps explained below).
3. Statistically characterize the entire 770-collection unit assemblage in terms of artifact class per collection unit. Produced text file '770 stats'.
4. Plot of 770 collection units in correct positions, with collection unit symbol color determined by variable ARTIFACT. Colors aid identification of clusters: darker colors = higher density. This file not saved, but used in step 5, below.
5. With plot produced in step 4, selection of analytical units by using 'lasso' tool to select plotted collection units with high ARTIFACT scores. This procedure identified ten clusters.
6. Assignment of cluster numbers to collection unit cases in data file 'dune770' (using new variable CLUSTER). Created graphic file 'clustermap.gif' in directory Graphics for visual identification of which collection units are members of which clusters.

7. Generation of statistical summaries of clusters 1-10; creation of text files 'cluster1stats' through 'cluster8stats'. Clusters 1-10 have roughly 3-6 collection units each; 57 collection units were used to generate these clusters.
8. Creation of new variable RATIO1, which is the ratio of production items to finished / used items. Production items = flakes, raw material and cores. Finished / used items are lithic tools, net weights, ground stone and FCR. Ratio successfully calculated for 43 of 57 cluster members using the following transform formula:
let RATIO1=
(CHERTFLA+CHERTCOR+CHERTRAW+OBSFLAKE+OBSCORE+OBSRAW+BASFLAKE+BASCORE+BASALTRA)
divided by
(CHERTTOO+OBSTOOL+NETWEIGH+GROUNDST+BASTOOL)
9. Created boxplot of RATIO1 values for each cluster, producing graphic 'ratio1.gif' in directory 'Graphics'. I expect that in domestic, living areas, there will be lower ratios of production items to tools and used items. This boxplot clearly shows high production item to tool/used item ratios in the extreme Southern and Northern areas, with lower ratios in the central area, particularly around the Mud Lake Slough. This strongly suggests that the Mud Lake Slough area (represented by clusters 5, 6, 7 and 8) was an occupation area, while other activities were undertaken in the mentioned Northerly and Southerly regions of the study area. We may say that while lithic production took place in all areas, it is concentrated in areas which do not have habitation-related features, and by extension, it occurred mostly outside habitation areas.

10. Creation of box plots for each main artifact category per cluster. Graphics saved as gif files in directory 'Graphics'. These to be printed and compared visually to one another as well as to the summary statistical matter generated for the entire 770-collection unit assemblage.
In the following plots, the variable being illustrated is coded vertically to the left:
chertcor=chert core, and so on. Note I do not have projectile point or feature type data at this time.

11. Cluster generation using JOIN (average linkage, euclidean distance) method to identify multivariate assemblage structure similarities and differences, using following variables as raw data:
CHERTFLA, CHERTCOR, CHERTTOO, CHERTRAW, OBSFLAKE, OBSBLADE, OBSCORE, OBSTOOL, OBSRAW, NETWEIGH, FCR, GROUNDST, BASFLAKE, BASCORE, BASTOOL, BASALTRA
Note that I did not use raw 'CHERT', 'OBSIDIAN' or 'BASALT' counts, as these reiterate what is already reported in CHERTFLAKE and so on.
This generated a text file which is actually a dendrogram. It was necessary to identify clusters by a letter code, the letter referring to the cluster e.g. cluster 1=A, cluster 2=B, and so on (A to J for 10 clusters).
Note that the cluster run was conducted on the 57 units assigned to clusters, but NOT using cluster identification numbers as a variable (otherwise clusters would form pretty much on the basis of the cluster number). This test was to identify whether clusters which I picked cluster together or apart based on multivariate assemblage structure.
Created text file 'cluster 1' to display this cluster run.
12. Cluster generation using JOIN (average linkage, euclidean distance) method to identify significance of variables in contribution to clusters generated in step 11:
CHERTFLA, CHERTCOR, CHERTTOO, CHERTRAW, OBSFLAKE, OBSBLADE, OBSCORE, OBSTOOL, OBSRAW, NETWEIGH, FCR, GROUNDST, BASFLAKE, BASCORE, BASTOOL, BASALTRA
Created text file 'cluster 2' to display this cluster run; this converted to 'dendrogram2.gif' file.

This cluster run identified CHERTFLAKE as the most important ordering variable, followed by OBSIDIANFLAKE and FCR. These drive much of the ordering of units into SYSTAT-generated clusters. In a principal components analysis, then, these variables would be expected to load most heavily on the components accounting for the most variation in the data set. Other variables are much less internally variable and do less to order the clusters generated in step 11. Of these, however, NETWEIGHT, CHERTCORE and GROUNDSTONE are most variable; other variables contribute little to cluster ordering. From this we learn that CHERTFLAKE, OBSIDIANFLAKE, FCR, NETWEIGHT, CHERTCORE and GROUNDSTONE are the most variable of the artifact types observed here, and contribute the most to cluster ordering.
13. Cluster generation as in step 12, but after REMOVAL of a unit in Cluster 1 with an abnormally high value for CHERT (277). On examining the original cluster dendrogram, I decided that this collection unit's very high score (compare with a mean of 8 and a standard deviation of 16) artificially distorts the dendrogram results. The cluster thus generated in Step 13 is much more reflective of assemblage structures, not being so heavily driven to separate that collection unit from all others, whose subsequent variation was considered by the computer to be minimal. This better cluster dendrogram was saved as text file 'cluster 3'. The dendrogram was illustrated and saved as 'dendrogram.gif': see below.
The dendrogram thus created is presented and discussed below.
14. Visual examination of cluster dendrogram, box plots and summary statistics to identify unexpected patterning and for thinking before writing summary comments. Cluster characteristics are described below in RESULTS.
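The RATIO1 transform in step 8 can be sketched in Python as below. The sketch follows the transform formula as written in step 8, whose denominator does not include FCR. The guard for a zero denominator is an assumption on my part about why the ratio was calculated for only 43 of the 57 units; the example counts are invented.

```python
# Numerator and denominator variables, copied from the step 8 formula.
PRODUCTION_VARS = [
    "CHERTFLA", "CHERTCOR", "CHERTRAW",
    "OBSFLAKE", "OBSCORE", "OBSRAW",
    "BASFLAKE", "BASCORE", "BASALTRA",
]
FINISHED_VARS = ["CHERTTOO", "OBSTOOL", "NETWEIGH", "GROUNDST", "BASTOOL"]


def ratio1(unit):
    """Ratio of production items to finished / used items for one unit.

    Returns None when the unit has no finished / used items, because the
    ratio is undefined (assumed handling, not documented in the source).
    """
    production = sum(unit.get(name, 0) for name in PRODUCTION_VARS)
    finished = sum(unit.get(name, 0) for name in FINISHED_VARS)
    if finished == 0:
        return None
    return production / finished


# Hypothetical units: counts are illustrative, not project data.
print(ratio1({"CHERTFLA": 30, "CHERTCOR": 2, "CHERTTOO": 4}))
print(ratio1({"OBSFLAKE": 5}))
```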
3.0 RESULTS
3.1 Populations, Occupation Areas and Data Collection Units
Here I discuss the potential for estimating the duration of use of the Harney Dune study area with a 'landscape approach' as suggested by Dewar and McBride (1992), incorporating a method broadly outlined in Hassan (1981). This is a preliminary examination, with rough estimates, but it seems to provide some interesting insights into possible analyses of the area.
The Harney Dune artifact and feature scatter addressed in this report is arranged over 77,000m^2 (0.077km^2, or 7.7 hectares). Because of relatively low Paiute population densities (REFERENCE), it is certain that this distribution does not represent one occupation, but multiple occupations of the area. One approach to understanding such a palimpsest is to model the most likely redundant process to have generated archaeological traces. In this case, the most likely redundant process is that of occupation of some portion of the study area by a band of Paiute people. Estimating the population and dwelling area used by such bands, per occupational episode, will assist in understanding (a) the formation of the deposits and (b) the relevant scale of observation and analysis of the data, which was collected in uniform grid units.
In 1826, Ogden, travelling past Harney Dune, said that the domestic structures of the many Paiute situated there were:
'...generally made of worm wood or grass and of a size to contain from 6 to eight persons.'
Ogden in Davies 1961:20.
We may estimate the area of such a structure using Cook's general figure of 3m^2 per person for the first six people of a dwelling, adding 9m^2 for each additional member (Cook 1972 in Hassan 1976). This yields figures of 18m^2 for a six-person dwelling, or 45m^2 for a nine-person dwelling. Ethnographic records suggest that Paiute aggregations of more than 50 people were rare (REFERENCE). If Harney Lake was a seasonal population aggregation area, we may estimate such large populations occasionally gathering there, as suggested by Ogden's observation in 1826. Using these data together, we may estimate that 50-person aggregations would demand:
Scenario A
Roughly eight six-person structures (48 people), each with a floor area of 18m^2 for a total 'Dwelling Occupation Area' of 144m^2,
or
Scenario B
Roughly six nine-person structures (54 people), each with a floor area of 45m^2, for a total 'Dwelling Occupation Area' of 270m^2.
'Dwelling Occupation Area' reflects only habitation structure floor area. Extra-dwelling facilities and activity areas, such as middens and hearths, very likely added to the area used by a given 50-person population. We may add such extra-habitation areas to a figure we may call 'Total Occupation Area'. If we estimate a 1m 'buffer zone' around each dwelling, we add roughly 8m^2 to the area of Scenario A, and roughly 6m^2 to the area of Scenario B (for Total Occupation Areas of 152m^2 and 276m^2, respectively). We must also add space representing general living space associated with a hypothetical occupation: area for such activities as cooking, stone- and bone-working, and so on. Ethnographic data suggest roughly 4m^2 of activity area per working person at an open-air site (Hayden 1979). If we estimate that of 50 people, at least 50% may be classified as working people (those requiring work space outside the dwelling), we must add at least 96m^2 of working area to Scenario A (24 working people, for a Total Occupation Area of 248m^2) and 108m^2 to Scenario B (27 working people, for a Total Occupation Area of 384m^2).
To summarize:
Scenario A: 48 people in eight dwellings: Total Occupation Area = 248m^2.
Scenario B: 54 people in six dwellings: Total Occupation Area = 384m^2.
Note that these are minimum figures per scenario.
Because each Harney Dune Archaeological Project collection unit is 10mx10m in size, with an area of 100m^2, Scenario A would be represented by roughly 2.5 collection units, while Scenario B would be represented by roughly 3.8 collection units. A single 10x10m collection unit with an area of 100m^2 may represent roughly half the area required for a large, 50-person aggregation; or it may represent an entire, more typical aggregation of roughly 25 people (roughly four or five extended families living in four or five dwellings).
We must also consider artifact scattering processes which would likely increase the area of an archaeological record of a given occupation and use episode. Low-energy water action (lake level rise and fall, and occasional slough movement) and wildlife and human trampling are the most likely agents to have moved artifacts in this low-energy environment. Middens, hearths, ovens and large camprock features are unlikely to have moved significantly from their point of production or deposition. Therefore, I estimate only a 'scatter factor' of roughly 10m^2 to be added to each Total Occupation Area: for Scenario A this yields an Archaeological Record Area of 258m^2; for Scenario B, an Archaeological Record Area of 394m^2. Neither addition radically alters the number of collection units likely to be sensitive to archaeological correlates of a single occupation by a roughly 50-person population.
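The area model built up over the preceding paragraphs can be sketched in Python by summing its components in the order given: dwelling floor area, buffer zone, working area and scatter factor. The function and parameter names are hypothetical. The buffer is taken as 1m^2 per dwelling, which is the rough figure implied by the 8m^2 and 6m^2 additions above, and the 100m^2 unit size is the 10mx10m collection unit.

```python
def dwelling_floor_area(persons):
    """Floor area in m^2: 3 m^2 for each of the first six people,
    plus 9 m^2 for each additional person (figure quoted in the text)."""
    first = min(persons, 6)
    extra = max(persons - 6, 0)
    return 3 * first + 9 * extra


def occupation_areas(dwellings, persons_per_dwelling,
                     buffer_per_dwelling=1.0,   # m^2, rough figure from text
                     working_fraction=0.5,      # share of people needing work space
                     work_area_per_person=4.0,  # m^2 per working person
                     scatter=10.0,              # m^2 'scatter factor'
                     unit_area=100.0):          # m^2 per collection unit
    """Return the component and total areas for one occupation scenario."""
    population = dwellings * persons_per_dwelling
    dwelling_area = dwellings * dwelling_floor_area(persons_per_dwelling)
    buffer_area = dwellings * buffer_per_dwelling
    work_area = population * working_fraction * work_area_per_person
    total = dwelling_area + buffer_area + work_area
    return {
        "population": population,
        "dwelling_area": dwelling_area,
        "total_occupation_area": total,
        "archaeological_record_area": total + scatter,
        "collection_units": total / unit_area,
    }


print("Scenario A:", occupation_areas(dwellings=8, persons_per_dwelling=6))
print("Scenario B:", occupation_areas(dwellings=6, persons_per_dwelling=9))
```

Keeping each assumption as a named parameter makes it easy to see how sensitive the number of collection units per occupation is to any one of these rough figures.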
These estimates must be recognized as thinking devices, but they may also generally be used to make decisions regarding archaeological sampling and analysis. For example, when we are looking at occupation-debris distributions, it seems clear that while artifact distribution maps display high densities in, say, six or more adjacent collection units, these six units are very unlikely to represent only a single occupation; they more likely represent two to four occupations, depending on population per episode. We must remember, however, that such estimates here are being applied only to the concept of a dwelling episode: other activities, such as large-scale fishing (or small-scale lithic working) may generate Archaeological Record Areas larger or smaller than Total Occupation Areas, depending on the nature of the activity.
What may we say, then, with regard to analytical sampling units (as opposed to data collection units), based on our model of the Total Occupation Area and the associated Archaeological Record Area? With regard to occupations, we may estimate that a given occupation episode might be discernible in a minimum of one, and a maximum of four, data collection units. In seeking discrete occupations, then, I would estimate the use of three data collection units combined as a single analytical unit. I would also say these analytical units should be selected where they occur near middens and/or high concentrations of camprock, the co-occurrence of which I would expect to be more likely near dwelling areas.
A very crude estimate of site use duration might be calculated by dividing the area of land on which we see artifacts and features that are likely the result of occupations by the Archaeological Record Area of a single occupation. For example, the area just North of Mud Lake Slough contains at least eight hearth features and numerous middens. This area is roughly 1,250m^2 in size. If each 50-person occupation was NOT in precisely the same spot as the previous occupation, and moved, say, 10m per occupation, and if each occupation 'consumed' roughly 300m^2 (between Scenario A and Scenario B), this 1,250m^2 area would represent roughly 4.1 occupations (1250/300). Since it is unlikely that this is the case, and people probably re-occupied areas used beforehand (particularly when such areas were invested with facilities such as fire rings and middens), we note that this estimate is probably too low. We must recognize that occupation probably migrated from one camp area to another to some degree; but the degree of spatial redundancy must be recognized as unknown. To make a use duration estimate with this method, all we can do is select whatever degree of spatial redundancy we suspect was employed (and we must further recognize that degree of spatial redundancy could vary through time). In sum, we may consider the following figures, which estimate different numbers of occupation episodes in this 1,250m^2 area depending on degree of spatial redundancy:
0% redundancy (sites never re-occupied) = 4.1 occupations
10% redundancy (sites re-occupied 10% of the time) = 4.62 occupations
20% redundancy (sites re-occupied 20% of the time) = 5.20 occupations
30% redundancy (sites re-occupied 30% of the time) = 5.95 occupations
40% redundancy (sites re-occupied 40% of the time) = 6.94 occupations
50% redundancy (sites re-occupied 50% of the time) = 8.33 occupations
60% redundancy (sites re-occupied 60% of the time) = 10.41 occupations
70% redundancy (sites re-occupied 70% of the time) = 13.88 occupations
80% redundancy (sites re-occupied 80% of the time) = 20.83 occupations
90% redundancy (sites re-occupied 90% of the time) = 41.66 occupations
100% redundancy (sites re-occupied 100% of the time) = cannot be calculated
We must also consider that this crude reckoning does not allow for re-occupation of an area skipped in the previous occupation episode: the estimate above simply calculates the 'consumption' of 1,250m^2 by different increments (camp areas of 300m^2 for zero redundancy, 270m^2 (300m^2-10%) for 10% redundancy, and so on).
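The redundancy figures listed above follow from dividing the 1,250m^2 area by a camp area reduced in proportion to the redundancy. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical. The listed figures appear to be truncated rather than rounded, so the final digit printed here may differ from the list.

```python
def occupation_episodes(area_m2, camp_area_m2, redundancy):
    """Estimated number of occupation episodes.

    Each new occupation 'consumes' camp_area_m2 * (1 - redundancy) of
    previously unused ground. At 100% redundancy no new ground is ever
    consumed, so the estimate is undefined.
    """
    if not 0 <= redundancy < 1:
        raise ValueError("redundancy must be at least 0 and less than 1")
    return area_m2 / (camp_area_m2 * (1 - redundancy))


for percent in range(0, 100, 10):
    episodes = occupation_episodes(1250, 300, percent / 100)
    print(f"{percent}% redundancy: {episodes:.2f} occupations")
```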
It is very likely that not all concentrations of artifacts at Harney Dune represent discrete occupations. Some, such as the high concentrations of chert debitage towards the South end of the study area, probably represent different activities as well as different occupations.
3.3 Cluster Generation
After reviewing the data, I decided that K-means clustering was too insensitive to differences between the observed clusters: the K-means-derived clusters were driven almost entirely by the frequency of chert artifacts. I decided that it was in fact better to identify a number of clusters visually, using the variable ARTIFACT (discussed above) in rather small sampling units of 4-9 collection grid units each (the rationale for such small sampling groups is also discussed above). Despite decades of research into the quantitative identification of spatial patterning, some authors recognize that the human mind is an excellent tool for spatial pattern recognition, and some prominent authors have rejected a quantitative cluster-identification approach (cf. Keeley 1991), selecting obvious clusters by eye, and saving the statistical procedures for characterizing these clusters.
In short, in our case, I feel the obvious clusters seen in the ARTIFACT distribution were of a relevant scale, and I expected they would reveal more about human behaviour than larger cluster units. After selecting the clusters (see figure 'clustermap.jpg') I characterized each cluster in terms of statistical properties (seen in the boxplots) and in terms of assemblage structure percent composition. Results are discussed below. In general, I may say that the data show clear distinctions between these clusters, and that such distinctions are of the types we may expect to be related to human behaviour. I can conclude at this point that the data set is of sufficient quality for further investigation of the specific hypotheses.
The cluster run I did conduct was a JOIN clustering method often used appropriately for the comparison of multivariate assemblage structures (e.g. Shennan 1988; for an example see Smith 1996). Methods are described above. This cluster run generated a dendrogram seen in figure 'dendrogram1.gif'.
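For readers unfamiliar with the method, average-linkage clustering on Euclidean distance can be sketched in a few lines of Python. This is a naive stand-in for illustration, not SYSTAT's JOIN implementation. The unit labels and the three-variable count vectors (read as CHERTFLA, OBSFLAKE, FCR) are invented.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows):
    """Agglomerative clustering with average linkage.

    rows maps a unit label to its list of counts. Returns the merge
    history as (members_a, members_b, distance) tuples in merge order,
    which is the information a dendrogram displays.
    """
    clusters = [(label,) for label in rows]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean of all pairwise distances between members of a and b.
            total = sum(euclidean(rows[i], rows[j]) for i in a for j in b)
            distance = total / (len(a) * len(b))
            if best is None or distance < best[0]:
                best = (distance, a, b)
        distance, a, b = best
        merges.append((a, b, distance))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical units labelled by visual cluster letter; counts are invented.
units = {
    "A1": [40, 2, 0],
    "A2": [38, 3, 1],
    "E1": [5, 10, 12],
    "E2": [6, 9, 15],
}
for members_a, members_b, distance in average_linkage(units):
    print(members_a, "+", members_b, f"at distance {distance:.2f}")
```

Units whose labels share a letter joining early in the merge history is the pattern that would indicate agreement between the visually selected clusters and assemblage structure.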
3.4 Cluster Discussion
The file 'dendrogram1.gif' displays the dendrogram generated by using artifact assemblage data for each of 56 collection units; these 56 units represent the 10 clusters identified visually and noted above. Note that cluster identifier number was NOT used as raw data, as this would improperly cluster units together simply because they are of the same cluster. The objective here was to see what clustering would result from artifact assemblage structure alone. In such cases, when this type of investigation shows that members of clusters identified visually tend to stick together in the dendrogram, one is reassured that the visually-identified clusters are real and that their member assemblages are to a degree similar.

The cluster dendrogram is labeled with 13 letters, A through M. These identify subclusters.
Subcluster A. These units, all from the southern third of the study area, are characterized by high frequencies of chert flakes and low frequencies of FCR. They occasionally also have low frequencies of chert cores.
Subcluster B. These two units, one in the far south and one in the Central habitation area, share the characteristic of low frequencies of obsidian blades; otherwise they are rather different in all respects.
Subcluster C. These units, all from either the Southern or Northern extremes of the study area, are characterized by low FCR scores.
Subcluster D. All three of these units, from the same cluster (#5, a proposed habitation area just south of the Mud Lake Slough), are characterized by relatively high frequencies of chert cores, obsidian flakes, FCR and ground stone. In assemblage composition, this cluster has a relatively high diversity of artifacts.
Subcluster E. This subcluster pairs two members of Central habitation area clusters 5 and 7. Each has an unusually high frequency of basalt flakes (a relatively rare artifact type overall); each also has a relatively high frequency of obsidian blades and flakes.
Subcluster F. Here are grouped seven units from a variety of clusters; the group in sum is characterized by high frequencies of chert flakes (particularly two members from the most southerly Cluster #10 and two members from the also Southerly Cluster 3), basalt flakes (particularly the immediately-linked members of habitation area Clusters 5 and 7) and obsidian blades.
Subcluster G. Here we see a subcluster formed from rather different cluster members (of Cluster 2, to the South, and #5, a habitation area cluster just South of Mud Lake Slough). Though these areas are expected to have different functions, these two units are paired because of their relatively high frequencies of chert cores and ground stone. Because ground stone is a relatively rare category of artifact, it is likely that this is the factor pairing these otherwise rather different assemblages in this case.
Subcluster H. All members of this subcluster are from the Central habitation area, or northward of this area: these units are characterized by low frequencies of chert flakes, moderate to high frequencies of basalt flakes, and low frequencies of groundstone and netweights. This subgroup is clearly distinct from other subclusters in many respects, and distinguishes these Central-Northerly units as distinctly different from most of those to the South.
Subcluster I. This subcluster groups units from towards the Central-Northern area, characterized by moderate frequencies of groundstone and conspicuously low frequencies of basalt flakes.
Subcluster J. This rather large subcluster contains several members of cluster #4, a non-habitation area just South of Mud Lake Slough, and cluster numbers 10 and 11, both non-habitation areas far to the North. These unit assemblages are all characterized by low frequencies of FCR, basalt cores, netweights and obsidian blades. None have particularly high frequencies of any artifact class.
Subcluster K. This subcluster contains three members of the same non-habitation cluster (#4, just South of Mud Lake Slough) and one member of the habitation area cluster #7. These units have in common a low frequency of chert flakes and low chert raw material.
Subcluster L. This subcluster contains only units from the central habitation area clusters #6 and #8. These units are characterized by high frequencies of netweights.
Subcluster M. Here we find, as in subcluster L, an aggregation of units with high frequencies of netweights, but combined with the common feature of low frequencies of basalt cores and obsidian blades. It is important to note that all four of these subcluster member units are from habitation area clusters #6 and #8.
Finally, we see a unit entirely by itself, forming a 'cluster of one': this is a single habitation-area cluster member unit (from cluster #5, just South of Mud Lake Slough) with a singularly high frequency of basalt flakes and chert tools.
On reviewing the cluster analysis, we may state the following:
1. Assemblage composition is variable enough to be an effective ordering device in this analysis.
2. Assemblage composition orders the collection units into clusters with recognizable characteristics, such as high chert frequencies in the South and high FCR and Netweight frequencies in the North.
3. The sorting effected by assemblage compositions generally reflects the characteristics revealed in examination of assemblage composition in terms of cumulative percent contribution per collection unit -- this is to be expected and is reassuring that the cluster analysis is functioning properly.
4. Clear distinctions between areas may be characterized with this method, and finer distinctions will probably be possible in further analyses using more (and more specific) artifact classes.
3.5 Assemblage Structure by Cluster
The file '10clusterassbstruct.gif' displays the assemblage structure of the 10 clusters in terms of cumulative percent composition of the following artifact types: CHERT, OBSIDIAN, BASALT, NETWEIGHT, FCR and GROUNDSTONE.

Although not all variation is seen using these macro-categories, two things are immediately clear: the Southerly clusters 1, 2 and 3 are dominated by a high percentage of chert artifacts, and the Central clusters 5, 6, 7 and 8 are rather similar in that they are each less dominated by any one assemblage component and in sum they are less dominated in particular by chert than the Southerly clusters. It is very likely that these clusters have higher artifact class diversity than the Southerly (or Northerly) clusters, which, combined with the more even distribution of artifact classes here, further supports the hypothesis that these central units represent dwelling areas where a wide variety of activities took place. Somewhat less obvious is the domination of obsidian artifacts in the assemblages of clusters 4, 7, 9 and 10; it is notable that three of these four clusters occur outside the hypothesized habitation area. Again, obsidian was used and worked in the habitation areas, but, as with chert working, it appears to have been an activity spatially segregated from the habitation areas.
This discussion assumes a general but unknown degree of contemporaneity between all these clusters.
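The cumulative percent composition used in this section can be sketched in Python as below. The function names are hypothetical and the example counts are invented for illustration; only the six macro-category names come from the text above.

```python
# Macro-categories in the order listed in the text.
CATEGORIES = ["CHERT", "OBSIDIAN", "BASALT", "NETWEIGHT", "FCR", "GROUNDSTONE"]


def percent_composition(counts):
    """Percent contribution of each artifact category to the assemblage."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


def cumulative_percent(counts, order=CATEGORIES):
    """Running total of percent composition, in the given category order."""
    percents = percent_composition(counts)
    running = 0.0
    result = []
    for name in order:
        running += percents.get(name, 0.0)
        result.append((name, running))
    return result


# Hypothetical cluster: counts are illustrative, not project data.
example_cluster = {
    "CHERT": 60, "OBSIDIAN": 20, "BASALT": 5,
    "NETWEIGHT": 5, "FCR": 8, "GROUNDSTONE": 2,
}
for name, value in cumulative_percent(example_cluster):
    print(f"{name}: {value:.1f}%")
```

A cluster dominated by one category climbs steeply at that category and then flattens, while a more even assemblage rises in similar steps across all six.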
4.0 CONCLUDING REMARKS
I can conclude the following based on this initial examination of the dataset.
1. The data are of sufficient quality to be used to identify activity per sub-area of the study area; that is, there is nonrandom variation in the data set that appears to be attributable to human behaviour rather than to (solely) natural formation processes.
2. The data should be of sufficient quality to test specific hypotheses regarding site function.
3. Initial examination of visually obvious cluster artifact assemblage compositions clearly orders clusters into groups which represent different activities and/or area use durations. Further investigation should be very fruitful.
4. An interesting exploration will be of the use of site area as a surrogate of site occupation duration. I feel that we may investigate further into this question by refining the models of Total Occupation Area, Archaeological Record Area, and so on.
5.0 REFERENCES
Ames, K.M. 1988. --- ? -- Paper on cluster analysis and assemblage composition comparisons...Cameron look this up
Davies, K.G. 1961. Peter Skene Ogden's Snake Country Journal: 1826-27.
(Volume 23).
Hudson's Bay Record Society, London.
Dewar, R.E. and K.A. McBride. 1992. Remnant Settlement Patterns.
in Rossignol, J. and L. Wandsnider (eds) 1992. Space, Time and Archaeological Landscapes:227-255. Plenum Press, New York.
Hassan, Fekri. 1981. Demographic Archaeology.
Academic Press, London.
Hayden, B. 1979. Palaeolithic Reflections: lithic technology and ethnographic excavation among Australian Aborigines.
Australian Institute of Aboriginal Studies, Canberra.
Keeley, L.H. 1991. Tool Use and Spatial Patterning; Complications and Solution.
in
Kroll, E. and T.D. Price (eds) 1991. The Interpretation of Archaeological Spatial Patterning:257-268.
Plenum Press, New York.
Raymond, A.W. 1994. The Surface Archaeology of Harney Dune (35HA718), Malheur National Wildlife Refuge, Oregon.
US Department of the Interior, Fish and Wildlife Service, Region 1. Cultural Resource Series #9.
Shennan, S. 1988. Quantitative Archaeology.
University of Southampton Press, England.
Smith, C.M. 1996. Social Stratification Within a Protohistoric Plankhouse of the Pacific Northwest Coast: Usewear and Spatial Distribution Analysis of Chipped Lithic Artifacts.
MA Thesis, Department of Anthropology, Portland State University, USA.

Cameron M. Smith
Department of Archaeology
Simon Fraser University
January 15-17, 2000.