R tip: Sorting and keeping track of many response variables
January 23, 2012 – 5:03 pm
When doing the same analyses on many different response variables, it can be hard to keep track of the variables and to have R print graphs and other output for the chosen response variables in the desired order. Here’s an example of the column names from a multivariate data set I’m working with:
> names(nts.summ)
[1] "date" "Season" "Plot" "DOC" "MBC" "DON" "MB.N"
[8] "NH4" "NO3" "moisture" "NAG" "CBH" "AG" "BG"
[15] "LAP" "BXYL" "PHOS" "PHENOX" "NetPerox" "UREASE" "logNH4"
[22] "logNO3" "logLAP" "logPHENOX" "logNetPerox" "microbialCN" "extractCN" "tempmax"
[29] "tempmin" "tempmean"
To make these easier to work with, I keep a character vector with the response variable names in the desired order (variables) and then create a reference vector (y) of their column positions with a short for loop:
variables <- c("DOC", "DON", "extractCN", "MBC", "MB.N", "microbialCN", "NH4", "NO3", "NAG", "CBH", "AG", "BG", "BXYL", "PHOS", "LAP", "PHENOX", "NetPerox", "UREASE", "moisture", "tempmean")
# Look up the column index of each response variable, in the order given above
y <- integer(length(variables))
for (x in seq_along(variables)) {
  y[x] <- which(names(nts.summ) == variables[x])
}
> y
[1] 4 6 27 5 7 26 8 9 11 12 13 14 16 17 15 18 19 20 10 30
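An alternative to the loop is base R's match(), which looks up all the names in one vectorized call and returns NA (rather than stopping with an error) for any name it cannot find. Here is a minimal sketch on a small made-up data frame standing in for nts.summ; the toy columns and values are hypothetical:

```r
# Hypothetical toy data standing in for nts.summ
nts.toy <- data.frame(date = 1:3, DOC = c(5, 7, 6), MBC = c(1, 2, 3),
                      DON = c(0.4, 0.5, 0.6), NH4 = c(2, 4, 8))
variables <- c("DOC", "DON", "MBC", "NH4")

# Vectorized lookup: one column index per name, in the order of `variables`
y <- match(variables, names(nts.toy))
print(y)  # [1] 2 4 3 5

# Misspelled or absent names would come back as NA (none in this example)
not.found <- variables[is.na(y)]
if (length(not.found) > 0) warning("Not found: ", paste(not.found, collapse = ", "))
```

The NA check is a cheap safeguard against a typo in variables silently dropping a response from later output.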
Now if I want to do a scatterplot matrix or, say, print the median for each variable, I can use the reference vector y.
library(lattice)  # splom() comes from the lattice package
splom(nts.summ[y])
for (x in y) print(median(nts.summ[, x], na.rm = TRUE))
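If a table of medians is more useful than a series of printed lines, sapply() over the selected columns returns a named vector in the same order as y. A sketch, again on hypothetical toy data rather than the real nts.summ:

```r
# Hypothetical toy data standing in for nts.summ (note the NA in DOC)
nts.toy <- data.frame(date = 1:4, DOC = c(5, 7, NA, 6), MBC = c(1, 2, 3, 10),
                      DON = c(0.4, 0.5, 0.6, 0.7))
variables <- c("DOC", "DON", "MBC")
y <- match(variables, names(nts.toy))

# Median of each selected column, skipping NAs; names come from the columns
medians <- sapply(nts.toy[y], median, na.rm = TRUE)
print(medians)
```

Because the result is named, it can be written out or combined with other summaries without losing track of which median belongs to which variable.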
The reference vector y can easily be updated as the analyses evolve. For example, if I decide to switch to log-transformed versions of a few variables, I just change their names in variables, recreate y, and re-run.
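The swap itself can also be scripted: prefix the chosen names with "log" (matching this data set's naming, e.g. logNH4) and rebuild y. A sketch on toy data; which variables get swapped is an illustrative assumption:

```r
# Hypothetical toy data standing in for nts.summ, with raw and log columns
nts.toy <- data.frame(DOC = c(5, 7, 6), NH4 = c(2, 4, 8), NO3 = c(1, 3, 9),
                      logNH4 = log(c(2, 4, 8)), logNO3 = log(c(1, 3, 9)))
variables <- c("DOC", "NH4", "NO3")

# Replace the chosen variables by their log-transformed versions, keeping order
to.log <- c("NH4", "NO3")
swap <- variables %in% to.log
variables[swap] <- paste0("log", variables[swap])

# Rebuild the reference vector and re-run the analyses with it
y <- match(variables, names(nts.toy))
print(variables)  # [1] "DOC"    "logNH4" "logNO3"
print(y)          # [1] 1 4 5
```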