pitchR/x, Josh Weinstock: comments (updated 2012-01-17T22:02:07.974-08:00)

[2012-01-17T22:02:07.974-08:00]
Thanks for pointing me to your post on data frames. I understand that better now, and I figured out that the data were screwed up in R. When I used head(pitcher), the column titles were separated by periods and each row of data had a "\t" between entries, which led me to believe the file was tab delimited. I fixed it using

pitcher=read.csv("halladay.csv", sep="\t")

Anyways, thanks again for this site and for your help tonight. I look forward to learning more as your site grows.
MrBennettar

[2012-01-17T20:54:36.436-08:00]
That's strange. Are you sure the data is not screwed up? To take a peek at the data, try

head(pitcher)

which will show you a few rows. There are also a few other ways to perform this step. Reading my post on data frames should help.
Basically,

pitcher = subset(pitcher, !(pitch_type %in% c("IN", "")))

is equivalent (as long as pitch_type has no missing values) to

pitcher = pitcher[pitcher$pitch_type!='IN' & pitcher$pitch_type!='', ]
Josh Weinstock

[2012-01-17T20:43:27.257-08:00]
I am now attempting to continue with the example, and I am getting the following error message:

Error in match(x, table, nomatch = 0L) : object 'pitch_type' not found

after attempting this step:

pitcher = subset(pitcher, !(pitch_type %in% c("IN", "")))

I have done a little searching and reading but haven't found an answer (at least not one that a beginner like me can understand). It looks to me like it is not recognizing the variable "pitch_type" from the halladay.csv file. Am I doing something wrong?
MrBennettar

[2012-01-17T20:23:08.924-08:00]
You only need to install it once, but you do need to load it every session. Thanks for visiting the site!
Josh Weinstock

[2012-01-17T20:20:30.704-08:00]
I am an R beginner so this question might be simple.
Do I only need to load the ggplot2 package once, or do I have to load it each time I open R?
MrBennettar

[2012-01-11T12:44:20.409-08:00]
Thanks for the info/comments, Millsy. And about the ddply function and plyr package, this pdf shows some ways to make use of it: http://www.jstatsoft.org/v40/i01/paper

After getting comfortable with the package, the number of for loops and lapply/split calls that I use has declined a lot.

Another way to generate the plots would be

pdf("filename.pdf")

d_ply(pitcher, .(ump_id), function (x) {code to make plot})

dev.off()

And the way you are finding area with the splancs library seems better than what I'm doing, so I will have to check that out.
Josh Weinstock

[2012-01-11T12:27:17.571-08:00]
Also, sorry for the crappy code format through the comments. Ick!
Millsy

[2012-01-11T12:26:41.238-08:00]
Hmm. Interesting. Here are a few suggestions:

1. I have found that a joint smooth is more appropriate than additive smooths for px and pz. The UBRE score is lower for the former model than the latter (and the zone ends up being more interesting as well).

2. If you're low on RAM, try using the "bam" version of "gam" in the mgcv package. This saves a lot of memory.

3. I actually created a "for loop" for my plots. You can do it for lots of functions...prolly inefficient, but it helps with making lots of similar pictures saved as files. Just do:

library(mgcv)  # provides gam()

# 100 x 100 prediction grid: px in [-2, 2], pz in [0, 5]
myx <- matrix(data=seq(from=-2, to=2, length.out=100), nrow=100, ncol=100)

myz <- t(matrix(data=seq(from=0, to=5, length.out=100), nrow=100, ncol=100))

fitdata <- data.frame(px=as.vector(myx), pz=as.vector(myz))

for(i in unique(data$umpire_id)) {

  # pitches called by umpire i
  d <- subset(data, umpire_id==i)

  # plot title: take the name from this umpire's rows only
  umpname <- paste(i, d$ump_FN[1], d$umpLN[1], sep="")

  filename <- paste(i, "Heat.png", sep="")

  png(file=filename, height=850, width=750)

  fit <- gam(strike_call ~ s(px, pz, k=51), method="GCV.Cp",
    family=binomial(link="logit"), data=d)

  mypredict <- predict(fit, fitdata, type="response")

  mypredict <- matrix(mypredict, nrow=100, ncol=100)

  # x and y must be the grid used in fitdata; xlim/ylim crop the view
  contour(x=seq(from=-2, to=2, length.out=100),
    y=seq(from=0, to=5, length.out=100), z=mypredict,
    xlim=c(-1.5, 1.5), ylim=c(1, 4), lwd=2,
    lty="dashed", axes=TRUE, levels=.5,
    labels="", labcex=1.3, col="darkred",
    xlab="Horizontal Location (ft., Umpire's View)",
    ylab="Vertical Location (ft.)", main=umpname)

  dev.off()
}

4. 
Once I have my model for an umpire, I could also add the simple code to calculate the area in square inches, or just do it manually:

library(splancs)

###Area Contour
# x and y are the grid that mypredict was computed on
clines <- contourLines(x=seq(from=-2, to=2, length.out=100),
  y=seq(from=0, to=5, length.out=100), mypredict)

# clines[[5]] is assumed to be the 0.5 contour; check clines[[5]]$level
# px and pz are in feet, so multiply by 144 to convert square feet to square inches
ump_area <- round(with(clines[[5]], areapl(cbind(x,y)))*144, 3)

***
Anyway, good stuff. I'll have to play with "ddply". My code when doing crosstabs and stuff like that is really awful and inefficient. I have reached the limits of "tapply".

:-)
Millsy
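
For anyone who wants to see what the area step is doing, here is a minimal sketch in base R that does not need splancs. It is not Millsy's code. The helper name polygon_area and the bell-shaped surface are assumptions made up for illustration, and the surface only stands in for a matrix of predicted strike probabilities. The sketch assumes the 0.5 contour is a single closed loop and that the grid is in feet.

```r
# Shoelace formula: area enclosed by the polygon with vertices (x, y).
# polygon_area is a hypothetical helper, not part of splancs.
polygon_area <- function(x, y) {
  n <- length(x)
  nxt <- c(2:n, 1)  # index of the next vertex, wrapping back to the first
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Synthetic surface on a 100 x 100 grid (NOT real umpire data):
# a smooth bump centred at px = 0, pz = 2.5.
gx <- seq(-2, 2, length.out = 100)
gz <- seq(0, 5, length.out = 100)
z <- outer(gx, gz, function(px, pz) exp(-(px^2 + (pz - 2.5)^2)))

# Ask for the 0.5 level explicitly instead of indexing into the default levels.
cl <- contourLines(x = gx, y = gz, z = z, levels = 0.5)

area_sqft <- polygon_area(cl[[1]]$x, cl[[1]]$y)  # grid units are feet
area_sqin <- area_sqft * 144                     # 1 sq ft = 144 sq in
```

Requesting levels = 0.5 directly avoids guessing which list element holds the 50% contour. If a real fit produces more than one closed loop at that level, cl will have several elements and each would need to be inspected.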