Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
Or am I missing something?
Going back and gathering the data after the fact is definitely a pain. I did that to get Andy Sonnanstine's L/R splits for '05, and it took a lot of time.
I plan on doing something like this for Rays prospects in 2006, though I was only intending to get data for specific players. I may be willing to expand that to every player on the Rays affiliates; it probably wouldn't take much more time than getting specific players, since I'll be going through the whole log anyway.
No, I didn't - I included hits. I also counted popups, and counted bunts separately (there were only a couple of those). Here's what I have for Hughes's outings:
Charleston:
4/8: 8 FB, 3 GB, 0 LD, 6 K, 1 W
4/13: 7 FB, 7 GB, 3 LD, 3 K, 2 W
4/18: 3 FB, 9 GB, 0 LD, 1 bunt, 2 K, 1 W
4/26: 3 FB, 5 GB, 0 LD, 1 bunt, 8 K, 2 W
5/1: 3 FB, 7 GB, 0 LD, 8 K, 3 W
5/7: 3 FB, 8 GB, 1 LD, 7 K, 2 W
5/15: 5 FB, 7 GB, 3 LD, 7 K, 1 W
5/20: 5 FB, 10 GB, 1 LD, 7 K, 1 W
5/25: 6 FB, 7 GB, 3 LD, 7 K, 1 W, 1 HB
6/1: 8 FB, 7 GB, 4 LD, 4 K, 1 W, 1 HB
6/7: 9 FB, 7 GB, 4 LD, 5 K, 0 W
6/12: 6 FB, 7 GB, 3 LD, 8 K, 1 W, 1 HB
Totals: 66 FB, 84 GB, 22 LD, 2 bunts, 70 K, 16 W, 3 HB
Tampa:
7/7: 5 FB, 2 GB, 0 LD, 1 bunt, 2 K, 1 W
7/13 (relief): 3 FB, 2 GB, 0 LD, 6 K, 1 W, 1 HB
7/20: 1 FB, 4 GB, 0 LD, 1 K, 0 W
7/25: 3 FB, 3 GB, 1 LD, 7 K, 1 W
7/31: 3 FB, 4 GB, 4 LD, 1 bunt, 5 K, 1 W, 2 HB
Totals: 15 FB, 15 GB, 5 LD, 2 bunts, 21 K, 4 W, 3 HB
--- MWE
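For anyone who wants to see what those totals imply, here is a minimal Python sketch that turns the two Totals lines above into ground-ball rates. Leaving bunts out of the denominator is an assumption made for this illustration, not something stated in the post, and the function name is made up.

```python
def batted_ball_rates(fb, gb, ld):
    """Return (GB/FB ratio, GB share of non-bunt batted balls)."""
    total = fb + gb + ld  # bunts left out: an assumption for this sketch
    return gb / fb, gb / total

# Totals lines copied from the post above.
charleston = batted_ball_rates(fb=66, gb=84, ld=22)
tampa = batted_ball_rates(fb=15, gb=15, ld=5)

print(f"Charleston: GB/FB {charleston[0]:.2f}, GB% {charleston[1]:.1%}")
print(f"Tampa:      GB/FB {tampa[0]:.2f}, GB% {tampa[1]:.1%}")
```

The Tampa sample is only five outings, so the difference between the two levels should be read loosely.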
It further has scripts to convert this data to the Retrosheet format and load it into a MySQL database. The code is in Perl and, from a very cursory inspection, seems to be well written and well commented.
-- MWE
Meh was supposed to get a set of keys as well, but I don't think Dan's done anything about it yet.
-- MWE
Baseball America's 2006 almanac contains L/R splits for all hitters. This is, I believe, the first year they did this, so if anyone else is looking for that data for a 2005 player, I can probably look it up much faster than you can gather the data.
If you're looking for 2005 data, that is. However, if you're gathering 2006 data in real time, you'll have L/R splits for 2006 hitters long before BA publishes its 2007 almanac.
-- MWE
So is the TSN guide (with the exception of 2004, when everyone got hosed because of the problems with the minor leagues' official stat compiler).
-- MWE
You obviously have never seen me try to compile data manually.
I'm probably missing the point here, but I'm not sure why they'd do this, as they're providing the game logs free. I doubt they see much money (by MLB standards) in minor league stats, or they could be doing a lot more with the minor league web site, like providing splits themselves.
Mike,
I looked at a few Sox prospects' game logs last year, and it seemed like they recorded a lot of popups. Given that popups have some skill component for MLB pitchers, it made me wonder whether good minor league pitching prospects tend to have higher than normal popup rates.
If this type of a project gets off the ground it would be great to keep popups separate in case that is true.
Anyone know if there is any source for SB/CS for minor league catchers?
good thing i'm getting my bonus soon - what with tango/mgl's book, the o'reilly book, the sickels book, the ba 2006 book, and maybe even bpro's annual, i'm going to spend a lot of money on baseball books in the next month.
I can likely make a small contribution, presuming spidering doesn't work.
And thanks for the email back. I'll let you know if I find it to be useful at all.
If some technological solution isn't forthcoming, I'd be willing to try to compile some data for the Pirates' farm system. I couldn't do it all, although I could try to find somebody to help.
Yes, it would.
-- MWE
However, since one can easily cut-and-paste the game log into a text file, I think there's a compromise to be reached between spidering and doing it all by hand. I'm working on a program that will take said text file and convert it as far as possible into a standardized pbp format. There are problems with that, too, though, because the game logs don't have all the necessary information--for instance, if a defensive player never makes a play and doesn't get replaced, his name never shows up in conjunction with his position. If a starting pitcher never makes a defensive play and isn't replaced, his name never shows up in the game log at all!
Thus, the box scores would appear to come in handy...but in the box scores (unlike the game logs) complete player names aren't given. So when Travis Smith throws a complete game, going to the box score to find "Smith, P" is less than ideal.
In other words, I don't think there's a 100% technological solution to this.
I do think, however, that I can reduce the amount of work in "logging" each game into a database down to 120 seconds or less, maybe much less. For the ~10,000 minor league games of 2005, I don't think I'm going to do all those myself unless I can get that number down to 15 seconds or so. I may be able to create some sort of web interface where volunteers can enter the game logs and then, upon prompting, enter things like starting pitcher names and player positions that aren't given by the game log. Or it might be that the time it takes to make that sufficiently user-friendly is greater than the time it takes me to do it all myself or work with a very small group of volunteers. We'll see.
Obviously I'm thinking out loud here. Perhaps less obviously, I'm a total amateur at this stuff; I wouldn't be surprised if somebody came along and said they could easily do all the tasks I'm saying aren't technologically doable.
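The "prompt the volunteer for whatever the game log leaves out" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the position codes, the `seen_in_log` dictionary, the placeholder player names, and both function names are hypothetical and do not come from any actual parser.

```python
# Standard defensive positions (DH ignored for simplicity).
POSITIONS = ["P", "C", "1B", "2B", "3B", "SS", "LF", "CF", "RF"]

def missing_positions(seen):
    """seen maps position -> full player name found while scanning a game log.
    Returns the positions whose player never appeared in any play."""
    return [pos for pos in POSITIONS if pos not in seen]

def prompts_for(seen):
    """Build the questions a data-entry volunteer would be asked."""
    return [f"Enter full name for {pos}:" for pos in missing_positions(seen)]

# Hypothetical scan result: the starting pitcher and right fielder
# never made a defensive play and were never replaced.
seen_in_log = {
    "C": "Player A", "1B": "Player B", "2B": "Player C", "3B": "Player D",
    "SS": "Player E", "LF": "Player F", "CF": "Player G",
}

for line in prompts_for(seen_in_log):
    print(line)
```

With this shape, the box score only has to resolve the two or three names that the log cannot supply, rather than the whole lineup.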
Are you deriving that data from the pbp logs, or from team stats already compiled on the miLB site?
I'm 90-95% done with a parser to convert every play in an miLB pbp log into something standardized and more or less like retrosheet event codes. What I've got so far deals with upwards of 99% of plays...but as anybody who's spent time with retrosheet event codes knows, the last 1%, even the last 0.1% of plays can get tricky, what with stuff like "1(B)16(2)63(1)/LTP/L1"
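To give a sense of what sits on the Retrosheet side of such a conversion, here is a small Python sketch that splits an event string into its basic play, its slash-separated modifiers, and the runner advances that follow the period. It is not the parser described above, the function name is invented for illustration, and it deliberately ignores corner cases such as slashes nested inside parentheses.

```python
def split_event(event):
    """Split a Retrosheet-style event string into (play, modifiers, advances)."""
    # Everything after the first "." describes runner advances.
    head, _, adv = event.partition(".")
    # The first "/"-separated piece is the basic play; the rest are modifiers.
    play, *modifiers = head.split("/")
    advances = adv.split(";") if adv else []
    return play, modifiers, advances

# The lined-into-triple-play example from the post, plus an ordinary single.
print(split_event("1(B)16(2)63(1)/LTP/L1"))
print(split_event("S7/L7.2-H;1-3"))
```

Splitting is the easy part; generating the right codes from free-text play descriptions is where the tricky last 1% lives.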
The bigger challenge I'm looking at now is how to make the results optimally usable. Let's assume for a second I can turn all the available pbp logs into a database that more or less mirrors what miLB has on their servers, providing us with the trickle of data they let out. What do you want from that, and how do you want it?
Should my end goal be something that has syntax virtually identical to retrosheet, so that you can use RS's parsers to come up with csv files and the like in a format you're familiar with? I'm not sure I could deliver that, as I don't know how easily one could expand the RS syntax to include all minor leagues...and I don't really want to embark on yet ANOTHER system of labelling players so that we can tell one Angel Garcia (garca006) from another (garca007).
Would you rather I aim toward something in the vein of David Pinto's day-by-day database? I'd like to come up with that sort of thing no matter what ...if I do, what functionality do you want from it?
I see no reason why you could not use MiLB's ID numbers for players as the ID keys. And they are retrievable; E-mail me off-list if you don't know how to get them.
-- MWE
Nope.
-- MWE
I'm extracting it all from PBP logs and lots of Excel formulas. Like you said, 99.9% of the data can be extracted from the game logs, and the final 0.1% is human work for things such as complete games. However, I have gotten it down to just copying and pasting: a whole season takes about an hour, and 99.9% of the data gets spit out. As for the defensive stats, they seem out of my league right now, and I don't see how to collect them at the moment.
For the past year, I have been collecting splits for the Red Sox minors on the soxprospects message boards, but I've had to insert things manually, which can take a lot of time. Realistically, I could only do one team without burning out, and would do the others after the season. However, I'm working on formulas to set up the lefty-righty and men-on-base splits, but they are still a work in progress. I'm hoping to get this set up by the start of the season.
Understanding that we don't have great bip-location/type data from those pbp logs, I'm pretty sure I've got my program set up to collect any defensive stats one can glean from retrosheet logs. (That is, I'm tracking who's on the field at all times, and as much of the bip-location/type as the logs give us.) Speaking of which, I'm still testing, but I'm at the point where my program handles 99.5% or so of all plays, and outputs something virtually identical to a retrosheet log. So I guess it's doable. I wouldn't necessarily have charged ahead over the last few days if I had seen your post, but BTF has been inaccessible to me (and I've been sick) for the last three days or so. I'll shoot you an email so I can clarify what you've done and maybe we can help each other out a bit.
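Once each plate appearance has been extracted from the logs, a lefty-righty split is just a grouped tally. The Python sketch below shows one way it might look; the record fields (`pitcher_hand`, `is_ab`, `is_hit`), the function name, and the sample rows are all made up for illustration and are not taken from the spreadsheet described above.

```python
from collections import defaultdict

def split_by_pitcher_hand(plate_appearances):
    """Tally AB and H by pitcher handedness; returns {hand: (AB, H, AVG)}."""
    totals = defaultdict(lambda: [0, 0])  # hand -> [at-bats, hits]
    for pa in plate_appearances:
        if pa["is_ab"]:  # walks, HBP, etc. are not at-bats
            totals[pa["pitcher_hand"]][0] += 1
            totals[pa["pitcher_hand"]][1] += int(pa["is_hit"])
    return {hand: (ab, h, h / ab) for hand, (ab, h) in totals.items()}

# Invented sample rows, one per plate appearance.
sample = [
    {"pitcher_hand": "L", "is_ab": True, "is_hit": True},
    {"pitcher_hand": "L", "is_ab": True, "is_hit": False},
    {"pitcher_hand": "R", "is_ab": True, "is_hit": False},
    {"pitcher_hand": "R", "is_ab": False, "is_hit": False},  # a walk
    {"pitcher_hand": "R", "is_ab": True, "is_hit": True},
    {"pitcher_hand": "R", "is_ab": True, "is_hit": False},
]

splits = split_by_pitcher_hand(sample)
for hand, (ab, h, avg) in sorted(splits.items()):
    print(f"vs {hand}HP: {h}-for-{ab}, {avg:.3f}")
```

A men-on-base split would work the same way with a different grouping key, assuming the base state is tracked for each plate appearance.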
Not of which I am aware.
-- MWE