Whence Data  |  <Previous  |  Next>

Gather some data from a Web Page


Entering data from a printed table of values or a table presented as a FAX or picture on the web can certainly be done. Often data on the web are found as genuine tables of text values. In this situation it is often possible to get the data directly from the web page and massage it into a useful form without the need to re-enter the data.

The National Oceanic and Atmospheric Administration has tabulated what are called "Global Surface Mean Temperature Anomalies":
" The average annual temperature of the globe is about 59 deg. F (15 deg. C). That value can be added to global anomalies to approximate absolute temperatures. Anomalies (also called departures from average) are used because they describe more accurately climatic variability over large areas than the absolute temperatures do and they give a frame of reference that allows for easier interpretation of the numbers. For example, a summer month over a large area may be cooler than average, both at a mountain top and in a nearby valley, but the absolute temperatures may be quite different at the two locations. The use of anomalies in this case will show that temperatures for both locations were below average. For these reasons, it is the anomalies that are computed for large-area summaries (like a hemisphere or the globe), not the temperature itself."

A page which lists departures from the 120 year means for land temperature for every month from January 1880 to December 2000, some 1440 data values is copied locally here. You will want to use this local copy to follow along in the examples below. You will find a set of links to the most recent updates to these historical data at the bottom of this page:
http://www.ncdc.noaa.gov/ol/climate/research/1998/anomalies/anomalies.html
the particular page copied locally came from:
http://www.ncdc.noaa.gov/ol/climate/research/1998/anomalies/land_F.all
Go to the local page here, select all the text on the page and copy it to the clipboard. This is done most easily by choosing Edit -> Select All and then Edit -> Copy.

Transfer the data to an Excel spreadsheet


Once on the clipboard, open Excel to a new worksheet and paste it in. Depending on your version of Excel, you may be able to simply paste, or you may need to perform some variation on Edit -> Paste Special... Text. The data should land in separate rows, not all in the first cell of the spreadsheet. The data are all now in the spreadsheet, but they are not nicely spread out across the columns of the spreadsheet. We will now massage the data into a still more useable form.

afterpaste2excel

Massage the data into clean columns


Select the first row by clicking on the number 1 to the left in the spreadsheet and then Edit -> Delete thereby deleting the row which has the descriptive text. We now have precisely the data that we want, but it remains to be distributed across the columns.

about2delete

Now select the first column and Data -> Text to Columns... These data are pretty clean so Excel is able to do a nice job of splitting the data apart as you can see from the preview. You can step through the Wizard confirming the way it has interpreted the columns of information.

text2cols

The data are now spread across the columns. You can scroll around in the spreadsheet and see all the values nicely arrayed in the rows and columns of the sheet.

Click on the empty space above row 1 and to the left of column A to select all the cells in the spreadsheet. Then double-click on the divider between column A and column B. This will cause Excel to change the width of all the columns of data so they are only just wide enough to show the data they contain and no wider, thereby pulling all the columns tightly together.

Now select the data from the upper left corner (above the column of years) to the lower right corner (December 2000) and then Edit -> Copy the data from the Excel spreadsheet.

copy2clip

Create a text file of the data

Now open your favorite TEXT editor and paste in the data you had copied from Excel. Save the file as noaa.txt and close your editor.

Don't have a favorite TEXT editor? Here are some suggestions about text editors provided with various operating systems.
Once you have performed the paste operation, you will see nice columns of data. While we had nice columns of data before, they were lined up by spaces in the original text from the web. Now each column is set apart by a single TAB character. Notice the way the cursor jumps from one number to the next as you move about with the cursor control keys. The TAB character plays the same role in the file that pressing the TAB key did when you entered the Physicians data earlier by hand. Save the file as noaa.txt and close NotePad.

saveas

Import the data into DataDesk


Return to DataDesk and File -> Import... noaa.txt. You will be prompted to use the first row (the month names) as variable names. Do so and you will have a nice folder of icons for each month, each with the 120 data values.

Other links with data you might like to investigate


Population
http://www.census.gov/ipc/www/world.html has links to various estimates of World Population.

http://www.census.gov/ipc/www/worldpop.html has data from 1950 to 2000 (local copy here).
http://www.census.gov/ipc/www/worldhis.html has estimates going back to prehistoric times (local copy here). This table would present a particular challenge to import since it has many blank areas. Probably only the first two columns would be easily used. A careful reading of the annotations would suggest ways to fill out the table to make it more appropriate to import into DataDesk.

Currency Exchange
http://www.oanda.com/convert/fxhistory has a tool for calling up a table of historical currency exchange values. There are several options for the format of the resulting table. We have been using ASCII in this exercise, though you may find that you can also readily work with any of the other options without too much difficulty.

Search for data on other topics of interest
http://www.google.com is an excellent search engine designed to locate "authoritative" web sites. Websites that rank high on a google search are ones which meet your search criteria and are most often referred to by other web sites.



Whence Data  |  <Previous  |  Next>

Last Updated: Thursday, March 17, 2005 3:04:17 PM