Monday, January 2, 2012

R resources

Earlier we spoke about PITCHfx resources, and now we will learn about R resources. Well, what is R? Straight from Wikipedia:

"R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software, and R is widely used for statistical software development and data analysis."

If you are familiar with proprietary statistical packages like SPSS, SAS, and MATLAB, then R is like an open-source variant. It's particularly similar to MATLAB. And it's nothing like Excel. In Excel, you have a nicely designed graphical user interface (GUI) with buttons that you click on to do analysis. Yes, you can type in macros and things of that nature, but it's really quite different from R, and much less powerful.
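
To give a rough idea of what typing commands instead of clicking buttons looks like, here's a minimal sketch in base R. The pitch speeds are made-up numbers for illustration, not real PITCHfx data, and the variable names are just ones I picked.

```r
# Made-up pitch speeds in mph (illustration only, not real PITCHfx data)
speeds <- c(91.2, 93.5, 88.7, 94.1, 90.3)

# Each summary statistic is a single function call
avg_speed <- mean(speeds)
fastest <- max(speeds)

print(avg_speed)
print(fastest)

# A basic histogram takes one more line
hist(speeds, main = "Pitch speeds", xlab = "Speed (mph)")
```

Everything here is typed at the console (or saved in a script), so the whole analysis can be re-run later just by running the same lines again.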

While most of the work done in R is through a command console, there are GUIs for it that can be useful. I use two.

RStudio

RStudio is a nice integrated development environment (IDE) and it's light on memory usage. I use it whenever I'm working with massive sets of data, like 700,000 pitches. You don't need to fumble around with multiple windows because everything is in one window. Download here: http://www.rstudio.org/

Deducer


Deducer is another GUI, and it's special because it gives you the power of the ggplot2 package in a GUI format. This means that for graphs, you don't need the command console. It's still better to know the code, since the GUI is somewhat limiting, but you can get by without it here. Download here: http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual
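
For a sense of what a ggplot2 graph looks like when you do write the code yourself, here's a small sketch of a scatterplot of pitch locations. It assumes the ggplot2 package is installed. The data frame and its values are made up, the column names (px, pz, pitch_type) are only my assumption about how your pitch data might be laid out, and this isn't necessarily the exact code Deducer builds for you.

```r
library(ggplot2)

# Made-up pitch locations (illustration only); column names are assumptions
pitches <- data.frame(
  px = c(-0.5, 0.2, 0.8, -1.1, 0.0, 0.4),
  pz = c(2.1, 3.0, 1.6, 2.7, 2.4, 3.3),
  pitch_type = c("FF", "FF", "SL", "CU", "FF", "SL")
)

# Map horizontal and vertical location to the axes, pitch type to colour
p <- ggplot(pitches, aes(x = px, y = pz, colour = pitch_type)) +
  geom_point(size = 3) +
  labs(x = "Horizontal location", y = "Vertical location", colour = "Pitch type")

print(p)
```

Once you're comfortable with code like this, adding or swapping layers by hand is easy, which is where the GUI tends to run out of options.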

When you're first starting off, a nice friendly GUI can be very helpful, so I'd highly recommend downloading one of these (or both).

Now that you have it downloaded, you need to know what you're doing. A fantastic resource for learning R is The Art of R Programming, available by PDF here. Not only will this tell you a lot about R, but also about the best ways to use R. You can view other tutorials here: http://www.statmethods.net/about/books.html. While we don't need to know all of the capabilities of R for baseball analysis, it's best to be familiar with the program before you start trying to make graphs and do analysis.

You also need to read Brian Mills' website. He had a great sab-R-metrics series going earlier this year, most of which can be found here. Brian's website also combines R knowledge with baseball data, so it is similar to this website. All of his posts are tremendous resources, and Brian really knows his stuff. I would definitely recommend reading through all of his R-related posts.

If you read through all of these resources and the PITCHfx resources, you'll be a PITCHfx and R master in a reasonable amount of time.


 
