Click on PC or MAC to download the Data Set for this lab. PC (264 K) MAC (300 K)

Spatial Analysis I: Classification
Note: This laboratory covers material in chapters 6 and 7. It requires the ArcView Spatial Analyst extension. If you do not have access to Spatial Analyst, skip this lab.

Cartographic classification. Cartographic features are frequently classified so that patterns in the data can be visualized. Classification methods are statistical techniques for placing individual cases into groups called classes. Maps of classified polygons are called choropleth maps. Choropleth maps are frequently found in atlases and newspapers, where they are used to portray information about areas and their subsets, e.g., countries and states. You will explore several automated classification techniques in ArcView. During this lab, we will consider ways to select the best classification variable for identifying spatial patterns. We will also look at some data error problems.

Census data and spatial analysis. The U.S. Census is a major source of demographic data in the United States. Demographic data are data about the distribution of people. The census has created nested spatial units which contain increasingly smaller areas. The data that we will work with are organized by spatial units called census tracts, which are low-resolution expressions of demographic information. The highest-resolution units publicly available are block groups, which nest hierarchically inside the tracts. Tract areas are defined by population numbers: areas of high population density have small tracts, while areas of low population density tend to have large tracts. You will see these patterns as you look at the tracts surrounding the Atlanta metropolitan area.
Task Set 1: Data Requirements for Classification

In this laboratory we will explore the data requirements for classification. Because we are classifying polygons of varying size based on their data values, we must factor out the effect of size on the classified variable. For example, a very large polygon can be expected to have many more people living in it than a very small polygon. To examine the differences in population structure, we look at population density, a variable which is the ratio of population to area. It is very important to remember that only area-normalized data or other ratio data can be used in a choropleth map.

Start ArcView. Copy the Atlanta data directory into your own working directory. Add the Attract.shp theme from your working directory. This theme is composed of demographic data collected during the 1990 census. Open the table for Attract.shp and examine the fields that are available in this data.

Question: List all of the fields (attributes) that are available in the Attract.shp table.

Look at the fields labeled Pop_90, Pop_93, Pop_98, and Pop_growth.

Question: Is the Pop_growth field related to Pop_90, Pop_93, and Pop_98? What do the values in the Pop_growth field mean? How were they calculated? If you were told that there is a companion handbook to the U.S. Census that explains these data, would you want to use it now? What would you want to know?

Make your view active and select the Legend Editor.
Change the Legend Type to Graduated Color. More selection fields will appear.
Set the Classification Field to Pop_90 and Normalize by Sq_miles. If you do not like red, reset the Color Ramps to a color of your choice. Please do not use the Classify button yet - we will accept the default classification.
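The effect of normalizing by area can also be seen outside ArcView. The following is a minimal Python sketch, not part of the lab data: the two tracts and their numbers are invented for illustration, and only the field names Pop_90 and Sq_miles come from the Attract.shp table.

```python
# Hypothetical tracts: names and numbers are invented for illustration.
tracts = [
    {"name": "large rural tract", "Pop_90": 6000, "Sq_miles": 60.0},
    {"name": "small urban tract", "Pop_90": 4000, "Sq_miles": 0.5},
]


def density(tract):
    """People per square mile: population divided by area."""
    return tract["Pop_90"] / tract["Sq_miles"]


# Rank the same tracts two ways: by raw count and by density.
by_count = max(tracts, key=lambda t: t["Pop_90"])
by_density = max(tracts, key=density)

for t in tracts:
    print(f'{t["name"]}: {t["Pop_90"]} people, {density(t):.1f} per sq. mile')
print("Largest raw population:", by_count["name"])
print("Highest density:", by_density["name"])
```

In this made-up case the two rankings disagree, which is the kind of difference to look for when you compare a normalized map with one that is not normalized.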
Question: What process are you mapping when you normalize population with area? What do you think might be wrong about mapping raw population values without the normalization?

Question: Why do you suppose the lowest class has a negative value for Population/Area?

Since there appears to be a negative value somewhere in the Pop_90 data field which is skewing the classification, we will need to do some data manipulations. Return to your view. Make sure that Attract.shp is active. Select the Query Builder.
A query window will pop up for Attract.shp. The query builder allows you to write logical queries in order to select particular records from your data set. Because these are logical expressions it is very important to have all parentheses in the correct locations. Incorrect expressions cannot be processed. Notice that there is a pair of parentheses in the window. If you use the automated selection tools, parentheses will be placed properly for you. In general, all opening parentheses must be matched by closing parentheses. The query builder window contains a scrolling list of Fields on the left and a scrolling list of Values on the right. In between the lists are the automatic logical operators that are available to you.
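To see why the placement of parentheses matters in a logical expression, consider the following Python sketch. It is only an analogy: the records and thresholds are invented, the expressions are Python rather than ArcView query syntax, and ArcView's own evaluation rules may differ. The point is simply that the same three conditions select different records depending on how they are grouped.

```python
# Hypothetical records; only the field names come from the lab table.
records = [
    {"id": 1, "Pop_90": 500, "Sq_miles": 2.0},
    {"id": 2, "Pop_90": 9000, "Sq_miles": 40.0},
    {"id": 3, "Pop_90": 5000, "Sq_miles": 40.0},
]


def select(rows, predicate):
    """Return the ids of the rows for which the logical expression is true."""
    return [r["id"] for r in rows if predicate(r)]


# No explicit grouping: Python evaluates "and" before "or", so this reads as
# (Pop_90 < 1000) or ((Pop_90 > 8000) and (Sq_miles > 10)).
ungrouped = select(
    records,
    lambda r: r["Pop_90"] < 1000 or r["Pop_90"] > 8000 and r["Sq_miles"] > 10,
)

# Explicit parentheses change which records are selected.
grouped = select(
    records,
    lambda r: (r["Pop_90"] < 1000 or r["Pop_90"] > 8000) and r["Sq_miles"] > 10,
)

print("ungrouped:", ungrouped)
print("grouped:  ", grouped)
```

Record 1 satisfies the first expression but not the second, so an expression with misplaced parentheses can run without complaint and still select the wrong set.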
To write a query you will double click on a field, select a logical operator and then double click on a value. Your query expression will look like the following.
You have selected all of the records where Pop_90 is not equal to -99. The value -99 is not a data error in this case, but is rather a placeholder for no data.

Question: What does the logical operator <> mean? Describe in your own words what effect this query will have on your data set.

Select New Set. Most of the polygons in your view should be selected. Now select from the menu bar Theme->Convert to Shapefile. A new theme should appear which does not contain the polygon whose attributes contain negative values. Give this theme a new name. Make your new theme active. Now classify the polygons according to the steps in Task Set 1 starting from the sentence after Question 2. Print this view and turn it in with the lab. Next classify the polygons without any normalization. Print this view and turn it in with the lab.

Question: Describe the similarities and differences between the two themes and their legends. Do the classes from Attract.shp have any interpretive value?

Question: Compare your classified population map, normalized by area, to your map which is not normalized. Which is more meaningful? Write a short analysis of the population patterns in the Atlanta urban area.

The variable that you created through classification was a population density variable. Another way to make such a variable is to create a new field and calculate the density value as a ratio of population to area. In order to do this you will need to edit the data. First make certain that you have write permissions for the data. Navigate to your own Atlanta directory. Open the directory and select View->Details.
In this example the Attributes field has an R and an A for each file. This means that the files are read-only. In order to edit them you must change the attributes. If there is no R in the attributes field, go on to Task Set 3. If you do have read-only files you must change them to writeable files. Select all of the files - there should be five of them. Right click and look at the pull-down menu. Select Properties. Under the General Properties tab look at the attributes. If Read-only is checked, click in the box and turn it off.
Select Apply. The Atlanta directory listing should now look like this.
Check your new theme for write privileges as well. If it is read only, reset the file attribute properties. Now we will edit the table for your new theme to create a ratio variable of population density. Make your theme active and select the Open Theme Table button.
With the table active, select Table->Start Editing. Select Edit->Add Field. Define Field Name as "popden", Field Type as "Number", and the Field Width as 16. Select OK.
The field popden should appear as the last field in your table. Now select Field->Calculate.
The Field Calculator will pop up. Here you will write a mathematical expression in which attribute names stand for the field values. Double click on the Field name Pop_90. Double click on the Request / (divide by). Double click on the Field name Sq_miles. Your Field Calculator should look like this.
Select OK. Select Table->Stop Editing and save your edits when queried. Now classify your data using the Legend Editor. Let the Classification Field be popden and select <None> for Normalize by. Select Apply. Question: Is there a difference between the two approaches to normalizing population data by area? What would have happened if you had used a Pop_98/Sq_miles ratio?
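The Field Calculator step, and the reason the no-data records had to be removed first, can be mimicked in a few lines of Python. This is a sketch under stated assumptions rather than what ArcView does internally: the three tracts are invented, and only the field names (Pop_90, Sq_miles, popden) and the -99 no-data placeholder come from the lab.

```python
NO_DATA = -99  # placeholder the lab describes for "no data" in Pop_90

# Hypothetical tracts; tract B has no population value recorded.
tracts = [
    {"id": "A", "Pop_90": 4000, "Sq_miles": 2.0},
    {"id": "B", "Pop_90": NO_DATA, "Sq_miles": 4.0},
    {"id": "C", "Pop_90": 750, "Sq_miles": 25.0},
]

# Dividing without checking treats the placeholder as a real population
# and produces a meaningless negative density.
naive = [t["Pop_90"] / t["Sq_miles"] for t in tracts]

# Equivalent of the query Pop_90 <> -99 followed by popden = Pop_90 / Sq_miles.
valid = [dict(t) for t in tracts if t["Pop_90"] != NO_DATA]
for t in valid:
    t["popden"] = t["Pop_90"] / t["Sq_miles"]

print("naive densities:", naive)
for t in valid:
    print(t["id"], t["popden"])
```

The negative value in the naive list is the same kind of artifact that appeared as a negative lowest class in the first legend.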
Task Set 2: Cartographic Classification

We will now explore different types of classification available in ArcView. ArcView provides five approaches to classification: Natural Breaks (which is the default), Equal Area, Equal Interval, Quantile, and Standard Deviation. All of these classification approaches are based on the structure of a data variable. A Natural Breaks approach is based upon the histogram of a data variable, with counts on the y-axis and values on the x-axis. A simple histogram appears below.
In the histogram we can see a Natural Break about halfway along the x-axis. This classification approach assumes that humans can intuitively find such breaks in a histogram, and design classes accordingly. ArcView uses an algorithm to statistically optimize the Natural Breaks approach. This is a nice approach because it is intuitively easy to understand. It can be inappropriate when some classes in a histogram have many counts and others have few. For explanations of the other 4 classification approaches use ArcView's help index and type in "natural breaks". The page on classification methods is brief, but should be sufficient.

At the beginning of Exercise 1 we said that choropleth data must be ratio data. Look now at your data table for Atlanta (you have assigned this data a new name).

Question: List all of the attributes for your data table. State whether you think each attribute is a ratio, is not a ratio, or you are not sure.

Decide on a ratio variable field from your table which you will use in this exercise. You may not use the popden field that you created.

Question: What is the name of the attribute you have chosen? How do you know that it is a ratio variable?

Now you will classify your variable using each of the classification methods. You will print each map to turn in with your lab. Make certain that you use a layout, or add text to your view, so that your TA or instructor can tell which classification method you used for each map. With your theme active, open the Legend Editor. Set Classification Field to your selected variable. Set Normalize by to None. Click on the Classify button. A Classification window will pop up.
Pull down the Type menu, and set the Type to Natural Breaks. Set the number of classes to 5. Round values to d.d. Select OK. In your Legend Editor, select Apply. You should see the classes on your map change. Label this map either in the View or in Layout. Be sure to include the variable and the classification method used. Print your map. Repeat this technique for the remaining 4 classification techniques, changing the classification Type each time so that you make a map of your variable for each classification method.

Question: Write a discussion of the differences and similarities between your maps. Do different approaches create different patterns? Do you see a spatial pattern in your variable? If so, what does the pattern mean? Which classification method do you like best for your variable?

Note: Turn in 7 printed maps with this lab.
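Two of the methods from Task Set 2, Equal Interval and Quantile, are simple enough to sketch by hand. The Python below is a simplified illustration under stated assumptions, not ArcView's implementation: the function names and the ten sample values are invented, and ArcView may place and round its class breaks differently. Natural Breaks, Equal Area, and Standard Deviation are not covered.

```python
import bisect


def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Use the maximum itself as the last bound to avoid rounding drift.
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold about the same number of cases."""
    ordered = sorted(values)
    n = len(ordered)
    # Ceiling of n * i / k gives the rank of the last case in class i.
    return [ordered[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]


def class_counts(values, breaks):
    """Count how many values fall at or below each successive upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect.bisect_left(breaks, v)] += 1
    return counts


# Hypothetical, strongly skewed variable: nine small values and one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 101]

ei = equal_interval_breaks(values, 5)
qu = quantile_breaks(values, 5)
print("equal interval breaks:", ei, "counts:", class_counts(values, ei))
print("quantile breaks:      ", qu, "counts:", class_counts(values, qu))
```

With a skewed variable like this one, Equal Interval leaves most classes empty while Quantile spreads the cases evenly, which is one reason the same data can produce very different map patterns under different methods.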
In this lab we have explored some of the database issues related to spatial classification. We have seen that when small errors or irregularities occur in a data table, extreme errors can be produced through mathematical manipulations. In order to use mathematical or statistical methods it is very important to develop the ability to spot problems in the data, diagnose, and solve them. In addition we experimented with creating some simple choropleth maps. Classification is a major area of cartography, and has been covered here very briefly. Further experimentation with the classification methods in ArcView can help to develop a deeper understanding of the effects of classification.