What is Where?
ArcView 3.1 Lab

Click on PC or MAC to download the Data Set for this lab. PC (8 K) MAC (8 K)

Geocoding: Matching Addresses with Locations

Before proceeding with the lab, read about Address Geocoding in ArcView's Help Topics.

Geocoding Data. Geocoding data involves conflating tabular data with spatial data to establish the geographic locations of records in the table. As you will see in this lab, geocoding algorithms have their limitations. You will have an opportunity to refine your data and set certain preferences, but not others.

Address Matching. Address matching is a special type of geocoding that pinpoints an address on a street network. The geocoding algorithm finds an address based on certain assumptions. Typical assumptions, and those used in this lab, are that all blocks are numbered 1 to 100, that even numbers fall on the right side of the street and odd numbers on the left, and that parcels are evenly spaced. We will consider situations in which these assumptions lead to good matches and situations in which they lead to poor matches.
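The assumptions above amount to linear interpolation along a street segment. The following Python sketch is only an illustration of that idea, not ArcView's actual implementation. The Segment class, its field names, and the coordinates are hypothetical; the field names loosely mirror the LeftFrom, LeftTo, RightFrom, and RightTo attributes used later in this lab.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One street segment with an address range for each side (hypothetical layout)."""
    name: str
    left_from: int
    left_to: int
    right_from: int
    right_to: int
    start: tuple  # (x, y) where the address range begins
    end: tuple    # (x, y) where the address range ends


def locate(house_number, seg):
    """Return (x, y, side) for a house number, or None if it is out of range."""
    # Choose the side whose range has the same parity as the house number.
    if house_number % 2 == seg.left_from % 2:
        lo, hi, side = seg.left_from, seg.left_to, "left"
    else:
        lo, hi, side = seg.right_from, seg.right_to, "right"
    if not min(lo, hi) <= house_number <= max(lo, hi):
        return None
    # Evenly spaced parcels: the position is a linear fraction of the range.
    fraction = 0.0 if hi == lo else (house_number - lo) / (hi - lo)
    x = seg.start[0] + fraction * (seg.end[0] - seg.start[0])
    y = seg.start[1] + fraction * (seg.end[1] - seg.start[1])
    return x, y, side


# Assumed example block: odd numbers 101-199 on the left, even 100-198 on the right.
segment = Segment("Valerio", 101, 199, 100, 198, (0.0, 0.0), (100.0, 0.0))
print(locate(150, segment))
```

If the real address range on the block differs from the range stored in the table, the computed fraction, and therefore the plotted point, shifts along the street accordingly.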

Batch versus Interactive. Addresses are listed in many ways, and even the same address may be listed differently in two different databases. In this lab, you will set geocoding preferences to give some guidance in how the address is parsed. After the preferences are set up, geocoding is performed either automatically (batch mode) or interactively. As you will see, matching is not particularly good in many areas, and interactive matching is almost always necessary. In the best of circumstances only about 75% of addresses will match in batch mode.

Task Set 1
Task Set 2
Recap

 

 

Task Set 1: Geocoding I

In this exercise, we will match a small number of addresses to a very small section of the street network of Santa Barbara, California. The streets in Santa Barbara are difficult to address match because naming conventions are not standard. Street names such as Calle Laureles or Alameda Padre Serra are hard for a geocoding algorithm to parse. In this exercise we will work only with streets that have single-word names. You will find that directional prefixes (such as N., W., E., and S.) also cause problems. In this task we will match five addresses to a small street grid. A journalist tracking the addresses related to a mysterious event might have created a file like this one. You need to create an address database file, mystery.dbf. It should look exactly like the following table. If you are not certain how to create this dbf file, refer to previous labs. Save this file.

Copy Eastside.shp from the data directory to your work directory. Open ArcView, and select Add Theme. Add Eastside.shp to your view. Make it viewable and active. You should see a simple grid of streets. Use Theme->Auto-label to see the names of the streets (Hint: Allow overlapping labels). Select Theme->Properties. The left-hand column lists the property options. Click on Geocoding.

Address Style should be set to US Streets. In the scrolling window on the right, scroll down to Street Name, and make sure that Name is selected. Select OK. Select View->Geocode Addresses.

Make certain that the Address Style is US Streets, the Address Table is mystery.dbf, and the Address Field is address. This is the mystery.dbf file you created earlier; use the directory browser to navigate to its location.

Select Batch Match.

Question: When the Re-match Addresses window pops up, what are the results of your matching? Is this what you expected?

From the Re-match Addresses window, select Geocoding Preferences.

The Geocoding Preferences window should pop up. This window allows you to set address matching parameters.

Question: By looking at the Re-match Address window you should be able to speculate on what the geocoding algorithm does. What do you think the scores mean? How is the algorithm treating the street network data?

Check all four boxes in the upper left corner of the window. On the right side of the window, reset Spelling Sensitivity to 40, Minimum match score to 30 and Minimum candidate score to 11. It may not be possible to set values exactly to 40, 30, and 11. Approximate values are fine. Select OK. When the Re-match Address window pops up, set the Re-match frame to All Records. Select Batch Rematch.

Question: What happened when you ran Batch Rematch?

Experiment with tuning the three algorithm parameters until you get some partial or good matches. Do not select the Done button.
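One way to picture how the two score thresholds interact is the Python sketch below. It is an assumption-laden illustration, not ArcView's scoring method: the score function uses Python's difflib.SequenceMatcher as a stand-in similarity measure, Spelling Sensitivity is not modeled at all, and the street names in the example are hypothetical.

```python
from difflib import SequenceMatcher


def score(query, candidate):
    """Similarity of two street strings on a 0-100 scale (stand-in scorer)."""
    ratio = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    return round(100 * ratio)


def batch_match(query, street_names, min_candidate_score, min_match_score):
    """Return (best_match_or_None, candidates) for one address string."""
    scored = [(score(query, name), name) for name in street_names]
    # Streets below the candidate threshold are dropped entirely.
    candidates = sorted(
        (pair for pair in scored if pair[0] >= min_candidate_score),
        reverse=True,
    )
    # Batch mode accepts the top candidate only if it clears the match threshold.
    if candidates and candidates[0][0] >= min_match_score:
        return candidates[0][1], candidates
    return None, candidates


streets = ["E Valerio St", "E Islay St"]  # hypothetical table values
print(batch_match("Valerio", streets, 10, 30))
print(batch_match("Valerio", streets, 10, 90))
```

In this sketch, lowering the minimum match score lets a weaker top candidate be accepted automatically, while lowering the minimum candidate score only lengthens the list offered for interactive review.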

Question: List three sets of parameters that you tried, as well as the Match rate and Match quality, e.g., Spelling Sensitivity, Minimum match score, Minimum candidate score, Match rate, Match quality.

Question: Did you get any Good Matches?

Now set the three parameters to 18, 14, and 11 respectively. Try Batch Re-match.

Question: How do you interpret the fact that you had to reduce the parameters to such low levels to get even partial matches?

You should now have 5 partial matches. Do not select the Done button. Now that you have match candidates, you will use Interactive Re-match to select the correct streets and complete the address matching. Select Interactive Re-match. The Geocoding Editor will pop up.

Each address will appear in the address frame, and the candidates for matching will appear in the large scrolling table at the bottom left. Examine the candidates, then select one by clicking on it. It will be highlighted in yellow. After you select a candidate, click on the Match button. After you have interactively re-matched all of the records, select Done. The Re-match Address window will return. Select Done. Your view will return with a new theme, Geocd1.shp. Make this theme viewable and active, and use Theme->Auto-label to place the addresses in the View window.

Your view should now look like this:

Print a copy of your view and turn it in to your TA with your lab.

Question: Do you think the locations of the addresses look correct? How would you test the accuracy of this algorithm? If you had matched 10,000 addresses, how would you test the accuracy?

Question: If you were told that street addresses in Santa Barbara are numbered from 1 to 56 rather than 1 to 100, what do you think would be the effect on the matching algorithm?


 

 

Task Set 2: Geocoding II

Now we will look at the data table for the Eastside data set. The data set that you were given to work with was not perfectly designed for the address matching algorithm in ArcView. We will now fix the data table for Eastside.shp. Make Eastside.shp active. Use Open Theme Table to open the data table. Now we will do some editing.


Note: Once this editing is saved, your data table is permanently changed. Before you save your edits, be sure that they are correct.

With the Attributes of Eastside.shp table active, select Table->Start Editing. Select Edit->Add Field. Define the field as below:

Select OK. Add another field. Call it Sur_type. It should be a string and have a width of 5. When you have added these two fields, select Table->Save Edits. Do not stop editing. Now select the Dir field and choose the edit tool to fill in each record with the street direction. For example, if the street segment name for a record is E Valerio St., fill in the Dir attribute for that record with E. Sur_type can be edited in the same way for each record. For the above example, E Valerio St., fill in the Sur_type with St.

Now you will edit the Name field. Remove the direction prefixes (E, N, S, and W) and the type suffixes (St, Av, etc.). The Name field should now contain only the name of the street. For the example above, the field should contain Valerio. Examples of the field names and attributes are shown below.

When you are certain that your edits are correct, select Table->Stop Editing, making certain to save your edits.
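The manual edit described above amounts to splitting each street string into a direction prefix, a name, and a type suffix. The Python sketch below illustrates that split outside ArcView. The function name split_street and the list of street types are hypothetical and deliberately incomplete; multi-word names such as Calle Laureles are left whole.

```python
DIRECTIONS = {"N", "S", "E", "W"}
# Illustrative list only; a real table may use other abbreviations.
STREET_TYPES = {"St", "Av", "Ave", "Rd", "Dr", "Ln", "Blvd"}


def split_street(full_name):
    """Split 'E Valerio St.' into (Dir, Name, Sur_type); missing parts are ''."""
    tokens = [token.rstrip(".") for token in full_name.split()]
    direction = ""
    street_type = ""
    # A leading single-letter direction becomes the Dir value.
    if len(tokens) > 1 and tokens[0].upper() in DIRECTIONS:
        direction = tokens.pop(0).upper()
    # A trailing recognized abbreviation becomes the Sur_type value.
    if len(tokens) > 1 and tokens[-1].capitalize() in STREET_TYPES:
        street_type = tokens.pop().capitalize()
    return direction, " ".join(tokens), street_type


print(split_street("E Valerio St."))
print(split_street("Calle Laureles"))
```

Whatever remains after the prefix and suffix are removed is what the Name field should hold.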

Question: Given your earlier experience with address matching, do you think that this will improve your Batch Match rate? Do you think that you will get any good matches, or only partial matches?

Now select Theme->Properties and with the Geocoding option active, edit the Address Matching Frame. The checked attributes are those attributes in the street data table that will be searched for matching. The more matchable attributes you have in your table, the more precise your matches will be. You should have read the documentation that explains these attributes and the information they contain in ArcView Help. If you have not, you need to do so now. Do not change the matching attributes for LeftFrom, LeftTo, RightFrom, or RightTo. PreDir (this means the direction is a prefix to the street name) should be set to Dir.

There should be no PreType because the type of street follows the street name on most Santa Barbara streets. StreetName should remain Name, but recall that we made major edits to the records in that field. StreetType should be set to Sur_type, and SufDir should be set to None. Select OK.

Now select View->Geocode Addresses. Be sure that the Address Table is mystery.dbf. Select Batch Match.

Print a copy of this new view and turn it in with your lab.

Question: What happened when you used Batch Match this time? Do you need to do interactive re-matching? Do you think that it is worthwhile to fully edit and optimize your street data table prior to address matching?

Question: It is, in fact, true that the streets in Santa Barbara are numbered from 1 to 56. Do you believe that your final map reflects the true locations of your 5 addresses? Is there a way to fix this? How would you do it?

The previous question deals with the problem of variation from the standard 1 to 100 numbering of street addresses on blocks in the United States. In Santa Barbara, street blocks are numbered from 1 to 56. Using your solution to the question, make the changes needed to the database, geocode the new data, and print the new view.
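To see why the numbering range matters, the Python sketch below compares where a house number falls along a block under the two numbering schemes. It is a simplified illustration under stated assumptions: the last two digits of the address give the position within the block, the block length of 150 meters is hypothetical, and the function name is not part of ArcView.

```python
def fraction_along_block(house_number, block_size):
    """Fraction of the block travelled before reaching the house number.

    block_size is the highest number used within one block (100 or 56 here).
    """
    offset = house_number % 100  # assumed position within the hundred-block
    return min(offset / block_size, 1.0)


BLOCK_LENGTH_M = 150.0  # hypothetical block length in meters

for number in (114, 128, 156):
    assumed = fraction_along_block(number, 100) * BLOCK_LENGTH_M
    actual = fraction_along_block(number, 56) * BLOCK_LENGTH_M
    print(number, round(assumed, 1), round(actual, 1), round(actual - assumed, 1))
```

Under these assumptions the gap between the two positions grows with the house number, so addresses near the far end of a block are displaced the most.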

Turn this new view in with your lab.

Question: Do you think that making this change is important? Can you think of a real-world problem that could be caused by poor address matching?

 

 

Recap

In this lab you have explored issues of data accuracy and optimization for use in a geocoding algorithm. It should be clear that although the algorithm is an automated procedure, the GIS user must still control many parts of the process in order to produce reliable analytic results. Understanding the software documentation, and referring to it when needed, is an important part of that control.




Copyright © 1995-2009, Pearson Education, Inc., publishing as Pearson Prentice Hall