R tip: Sorting and keeping track of many response variables

January 23, 2012 – 5:03 pm

When running the same analyses on many different response variables, it can be hard to keep track of them and to get R to print graphs and other output for the variables of interest in the desired order. Here are the column names from a multivariate data set I'm working with:

> names(nts.summ)
[1] "date" "Season" "Plot" "DOC" "MBC" "DON" "MB.N"
[8] "NH4" "NO3" "moisture" "NAG" "CBH" "AG" "BG"
[15] "LAP" "BXYL" "PHOS" "PHENOX" "NetPerox" "UREASE" "logNH4"
[22] "logNO3" "logLAP" "logPHENOX" "logNetPerox" "microbialCN" "extractCN" "tempmax"
[29] "tempmin" "tempmean"

To make these easier to work with, I keep a character vector (variables) with the response variable names in the desired order, then build a reference vector (y) holding their column numbers with a short for loop:

variables <- c("DOC", "DON", "extractCN", "MBC", "MB.N", "microbialCN", "NH4", "NO3", "NAG", "CBH", "AG", "BG", "BXYL", "PHOS", "LAP", "PHENOX", "NetPerox", "UREASE", "moisture", "tempmean")

# y[x] holds the column number in nts.summ of the x-th name in variables
y <- integer(length(variables))
for (x in seq_along(variables)) {
  y[x] <- which(names(nts.summ) == variables[x])
}

> y
[1] 4 6 27 5 7 26 8 9 11 12 13 14 16 17 15 18 19 20 10 30
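The loop can also be replaced by a single call to base R's match(), which returns the position of each element of its first argument within its second. The sketch below is a minimal illustration on a small made-up data frame, nts.demo (hypothetical values standing in for nts.summ). One difference worth knowing: in the which() loop a misspelled name stops R with "replacement has length zero", whereas match() silently returns NA, so the sketch checks for that explicitly.

```r
# Hypothetical stand-in for nts.summ
nts.demo <- data.frame(date = 1:3, DOC = c(10, 12, 11), MBC = c(200, 210, 190),
                       DON = c(1.1, 1.3, 1.2), NH4 = c(0.5, 0.7, 0.6))

variables <- c("DOC", "DON", "MBC", "NH4")

# Position of each requested name among the column names, in the order of variables
y <- match(variables, names(nts.demo))

# match() gives NA for a name it cannot find, so fail loudly on typos
if (anyNA(y)) stop("Not found: ", paste(variables[is.na(y)], collapse = ", "))

y  # 2 4 3 5
```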

Now if I want a scatterplot matrix or, say, the median of each variable, I can index the data with the reference vector y:

library(lattice)  # splom() comes from the lattice package
splom(nts.summ[y])
for (x in y) print(median(nts.summ[, x], na.rm = TRUE))
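Rather than printing each median separately, sapply() can collect them into one vector labelled with the column names and kept in the order given by y. A minimal sketch, again on a hypothetical nts.demo data frame that includes a missing value:

```r
# Hypothetical stand-in for nts.summ, with one missing value in DOC
nts.demo <- data.frame(DOC = c(10, 12, 11, NA), DON = c(1.1, 1.3, 1.2, 1.4),
                       MBC = c(200, 210, 190, 205))
variables <- c("DOC", "DON", "MBC")
y <- match(variables, names(nts.demo))

# sapply() applies median() to each selected column; na.rm = TRUE is passed through
medians <- sapply(nts.demo[y], median, na.rm = TRUE)
medians  # DOC = 11, DON = 1.25, MBC = 202.5
```

Because the result is a named vector, it can be passed straight to write.csv() or barplot() without losing track of which median belongs to which variable.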

The reference vector y is easy to update as the analyses evolve. For example, if I decide to switch to log-transformed versions of a few variables, I just change their names in variables, rebuild y, and re-run.
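When the transformed columns follow a consistent naming pattern, as logNH4, logNO3, logLAP and logPHENOX do in the listing above, the swap can be done in code instead of by hand. A minimal sketch, assuming the "log" prefix convention holds; col.names is a shortened, hypothetical copy of the column names and to.log is an illustrative choice:

```r
# Shortened, hypothetical copy of names(nts.summ)
col.names <- c("DOC", "DON", "NH4", "NO3", "LAP", "PHENOX",
               "logNH4", "logNO3", "logLAP", "logPHENOX")

variables <- c("DOC", "DON", "NH4", "NO3", "LAP", "PHENOX")
to.log <- c("NH4", "NO3", "LAP")  # hypothetical set of variables to log-transform

# Replace the chosen names with their "log" versions, keeping their positions
swap <- variables %in% to.log
variables[swap] <- paste0("log", variables[swap])

# Rebuild the reference vector against the column names
y <- match(variables, col.names)
y  # 1 2 7 8 9 6
```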
