First Processing Steps

Now that I’ve got the data into R, lets see what it looks like.

First, lets look at which sites have been up the longest:

107  NC_Asheville_13_S           35.42    -82.56  11
108   NC_Asheville_8_SSW       35.49    -82.61   11
88   KY_Versailles_3_NNW      38.09    -84.75  10
105  MT_Wolf_Point_29_ENE    48.31  -105.10  10
106  MT_Wolf_Point_34_NE     48.49   -105.21  10
117  NH_Durham_2_N             43.17      -70.93   10
118   NH_Durham_2_SSW       43.11     -70.95  10
137  NV_Mercury_3_SSW         36.62   -116.02  10
150  RI_Kingston_1_NW          41.49     -71.54   10
151  RI_Kingston_1_W            41.48     -71.54    10

The “hump” in the data would be what we would expect over the seasons we need to “anomalize” the data to remove the seasonal effect.

This is done by applying the as.anomaly function:

as.anomaly<-function(index, values){

averages = tapply(values, INDEX = index,FUN= mean, na.rm = "T")
anom <-mapply(function(x, y) y - averages[as.numeric(x)] ,index, values)

Which give us this “bow tie” plot

Creating the CRN Data Set

Now that we’ve got the data downloaded and a set of functions to process the files, let’s create the data frame for the data and a meta data frame for general site data (name, lat , long ect). I also wanted to keep track of how many years of data there were for each site and to order them by the longest number of years (the order column). This was so I could look at the largest data sets first.

#directory where the function code was stored
dir <- "~/Documents/CRN-R code"

#directory where the data is stored
dir <- "~/Documents/CRN"

# don't automatically make strings into factors
options( stringsAsFactors = F)

# extract a list of all the sites
crnsites <- getsites()

#make a list of the filepaths
crnfilepaths <- getfilepaths()

# for each site make a list of the files for that site
crnfiles <- lapply(crnsites ,FUN = getsitefilenames,filepaths = crnfilepaths)
#now build up a list of each sites data
crnlist <- lapply(crnfiles,getsitefiles)

site.years <- as.numeric(lapply(crnfiles, function(x) length(x)))

#create a data frame of the meta data for the sites
crnmeta <- NULL
crnmeta <- sapply(crnlist, function(x) x$meta)

crnmeta <- data.frame(t(crnmeta), site.years, row.names = NULL)

#get the order for largest number of years to smallest
crnorder <- order(site.years, decreasing = T)
crnmeta = cbind(crnmeta, crnorder)
names(crnmeta) <- c("name", "lat", "lon", "years", "order")
crnmeta$lat <- as.double(crnmeta$lat)
crnmeta$lon <- as.double(crnmeta$lon)
# order the meta data to get a list of the longest sites
crn.longest <- crnmeta[crnmeta$order,] <- which(crnmeta$name %in% crn.longest$name[1:10])

crndata <- lapply(crnlist, function(x) x$data)

crndata.longest <- crndata[]

CRN Data input functions

First off I wanted a list of sites. Since each site has multiple entries that are in the yearly subdirectories I wrote a function to cycle through the directories and make a list of sites from the file names.

# cycles through the parent directory and retieves the filenames
# of the files in the sub directories extracting the name of the site
#from the file path and creating a list of unique site names
getsites<- function(){
crnpath = "~/Documents/CRN/"
crndata crndir
list.files(path = crndir)->crnfiles
list.files(path = crndir,full.names = TRUE)->crnfilepaths
crnsites crnsites


This function is similar but returns all of the file paths

getfilepaths <- function(){
crnpath = "~/Documents/CRN/"
crndata crndir
list.files(path = crndir)->crnfiles
crnfilepaths <- list.files(path = crndir,full.names = TRUE)

This function organizes the file paths into a list of lists:

getsitefilenames <-function(sitename, filepaths){
crnfilepaths[grep(sitename, filepaths)]

And this takes this information and gets the data for each site
creating a list of the data and a list of the sites meta data.

getsitefiles <-function(sitepaths){

sitedata <- lapply(sitepaths, read.table, na.strings = "-9999.0")
sitedatamerged 1){
for (i in 2: length(sitedata)){
sitedatamerged <- rbind(sitedatamerged, sitedata[[i]])
# name of the site from the file path
crnsitename <- substr(sitepaths[[1]], regexpr("CRNDAILY01", sitepaths[[1]])+16, regexpr("txt", sitepaths[[1]])-2)
crnlat <- as.double(sitedatamerged $V6[1])
crnlon <- as.double(sitedatamerged$V5[1])
crnsite <- c(name = crnsitename, lat = crnlat, lon = crnlon)
datestr <- sitedatamerged$V3
yearstr <- substring(datestr,1,4)
monstr <- substring(datestr,5,6)
daystr <- substring(datestr,7,8)
dateval <- strptime(datestr,format = "%Y%m%d")
juldate <- as.numeric(julian(dateval))
dayofyr <- as.character(dateval, format = "%j")

crndata = data.frame( crnsitename , as.factor(sitedatamerged$V1), yearstr , monstr ,daystr, juldate, dayofyr,sitedatamerged$V7, sitedatamerged$V8, sitedatamerged$V9, sitedatamerged$V10)
names(crndata)<-c("name" ,"wban", "year", "month", "day","julian", "dayofyr", "tmax", "tmin", "tmean", "tavg")

return(list(meta = crnsite, data = crndata))

Getting the CRN data

Many questions have been raised over the quality and reliability of the data that is being used to determine the global temperature. Issues such as microsite biases, UHI and TOBS have been raised. A number of sites such as Climate Audit, Surface and others, have made strides in determining problems with the temperature stations. In this decade an attempt to provide reliable sites has been implemented by NOAA
<blockquote>The U.S. Climate Reference Network (USCRN) consists of 114 stations developed, deployed, managed, and maintained by the National Oceanic and Atmospheric Administration (NOAA) in the continental United States for the express purpose of detecting the national signal of climate change. The vision of the USCRN program is to maintain a sustainable high-quality climate observation network that 50 years from now can with the highest degree of confidence answer the question: How has the climate of the nation changed over the past 50 years? These stations were designed with climate science in mind.

Given the availability of the CRN data some analysis can be undertaken to see if these sites show the same trends as the sites that are used to create the US portion of the temperature data for the global temps and to make some determination if corrections that are in use are actually doing what their implementors claim. Continue reading


I’ve created this blog to hold code and explanations for any climate related analysis that I do in the future.  My main interests at this time are sea ice and CRN analysis and my code choice is mainly R and JAVA if R doesn’t fill all my needs.  I decided to do this after seeing Steven Mosher’s blog where he is doing MoshTemp.