Word of the day: munging

September 15, 2011 – 9:54 pm

Sounds kind of disturbing, but apparently ‘data munging’ is the name for a task I do all the time: taking raw data and converting it into a form that is usable for analysis. I usually call this data processing, but now that I know ‘munging’…

For my munging, I tend to use a combination of Excel and R. Given a reasonably small data set, Excel is good at getting observations in rows, variables in columns, and sorting them. But all further subsetting, aggregating, dividing, etc. is better done in R using packages like plyr and functions like reshape, aggregate, and the incredibly useful summarySE.
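As a rough sketch of what that R step can look like, here is a base-R-only example. The data frame `growth`, its column names, and its values are all made up for illustration. The last step builds a per-group table of n, mean, standard deviation, and standard error, which is loosely the kind of summary summarySE is used for; it is a hand-rolled stand-in, not that function's actual code.

```r
# Hypothetical toy data: one row per observation, one column per variable
growth <- data.frame(
  site   = c("A", "A", "A", "B", "B", "B"),
  year   = c(2010, 2010, 2011, 2010, 2011, 2011),
  height = c(10, 12, 15, 8, 9, 11)
)

# Subsetting: keep only the 2011 observations
growth_2011 <- subset(growth, year == 2011)

# Aggregating: mean height for each site
site_means <- aggregate(height ~ site, data = growth, FUN = mean)

# Hand-rolled per-site summary: n, mean, sd, and standard error of the mean
site_summary <- do.call(rbind, lapply(split(growth, growth$site), function(d) {
  data.frame(
    site = d$site[1],
    n    = nrow(d),
    mean = mean(d$height),
    sd   = sd(d$height),
    se   = sd(d$height) / sqrt(nrow(d))
  )
}))

print(site_means)
print(site_summary)
```

The same split-summarise-combine pattern is what plyr is built around, so with real data the last step would likely be shorter using that package.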

3 Responses to “Word of the day: munging”

  1. summarySE looks cool! I saw that the page that it is on is a summary page for ggplot2. Do you use ggplot2 ever? I have yet to move beyond the main plot interface, but it seems like packages like ggplot2 and lattice could be good formats to learn.

    By Aaron Berdanier on Sep 16, 2011

  2. I haven’t tried ggplot2, but it seems like a good package. I do use lattice a lot though. I think they are relatively comparable in capabilities although ggplot2 seems to be the hot new thing these days, so if I was starting over I might at least check it out. Its system of adding graphical elements (points, lines, bars, etc) in a modular way is perhaps a little more intuitive than the panel system in lattice. Here’s a nice comparison of code from both packages: http://bit.ly/p0fhFt

    By Anthony on Sep 16, 2011

  3. what a great combination of minging and dung.

    By Christopher on Nov 1, 2011
