Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. Harold Posted: March 21, 2007 at 06:35 PM (#2315398) I haven't digested it yet, but RB did a great job of using Google spreadsheets to present the data.
The heck? . . . . OK, I think I got it fixed now. (checks). It's coming up 2007 now, at least on my computer. I think I know what I did wrong.
Looking it over, the standard deviation is almost always 12-13 wins. Exceptions: the A's at 14 in PECOTA, and the Yanks, Toronto, the Indians, and LAA at 14 in ZiPS. So I guess ZiPS has the highest standard deviation. The NL looks wide open all the way around.
Other than that, don't tell Verducci that the Yanks do this well. He may have an apoplexy.
ZIPS seems to produce the most clustered mean results, however, which probably means either that it regresses more toward the mean than the other projections or that it uses quite different playing-time distributions (I know Dan doesn't project playing time for his published projections here, but I assume he must for the DMB build).
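To see why heavier regression would produce more clustered team means, here is a toy illustration. It assumes a single made-up shrinkage weight applied directly to team win totals; real systems such as ZiPS regress individual player statistics, so `regress_toward_mean` and the weights are hypothetical.

```python
from statistics import pstdev

LEAGUE_MEAN_WINS = 81.0  # a .500 record over 162 games

def regress_toward_mean(projected_wins: float, weight: float) -> float:
    """Hypothetical shrinkage: weight=1.0 keeps the projection, 0.0 returns 81."""
    return LEAGUE_MEAN_WINS + weight * (projected_wins - LEAGUE_MEAN_WINS)

raw = [97.0, 89.0, 81.0, 73.0, 65.0]  # made-up team projections
light = [regress_toward_mean(w, 0.9) for w in raw]
heavy = [regress_toward_mean(w, 0.7) for w in raw]

# The spread of team means shrinks in direct proportion to the weight.
print(round(pstdev(raw), 2), round(pstdev(light), 2), round(pstdev(heavy), 2))
```

Because every team is pulled toward 81 by the same factor, the standard deviation of the team means scales by exactly that factor, so a system that regresses harder shows tighter clustering even if its rank order of teams is identical.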
I'm surprised the White Sox do so poorly in all the systems, though I guess that "validates" the surprising PECOTA projection. Does anybody know what's going on there? Massive injury projections? Bullpen disaster? Assuming Erstad, Pod, and Uribe will each get 700 PA?
AZD: 11.0 win difference between PECOTA (89.0) and DMB (78.0). PECOTA is (by far) the outlier. Both CHONE and ZIPS are at 82.9.
STL: 10.7 win difference between ZIPS (90.4) and PECOTA (79.7). ZIPS is the big outlier.
TBD: 9.9 win difference between PECOTA (77.5) and ZIPS (67.6). PECOTA's the outlier of the four.
DCN: 9.1 win difference between DMB (75.4) and PECOTA (66.3).
Tor: 8.8 win difference between DMB (89.1) and PECOTA (80.3). I should note that CHONE is in virtual agreement with PECOTA, and ZIPS is only a little ahead. DMB's the real outlier.
Bal: 6.8 win difference between ZIPS (78.7) and CHONE (71.9). I think it's funny that Szym's system is as optimistic as it gets toward Baltimore.
OAK: 6.5 win difference between ZIPS (86.5) and PECOTA (80.0). ZIPS is the outlier.
HOU: 5.8 win difference between CHONE (82.0) and ZIPS (76.2). ZIPS is the outlier.
SFG: 5.7 win difference between CHONE (82.4) and DMB (76.7). CHONE's the outlier.
MIL: 5.6 win difference between PECOTA (84.4) and ZIPS (78.8). PECOTA's the outlier.
BoX: 5.6 win difference between PECOTA (91.2) and DMB (86.5).
ATL: 5.6 win difference between ZIPS (87.3) and PECOTA (81.7).
NYY: 5.3 win difference between DMB (97.4) and ZiPS (92.1).
DET: 5.3 win difference between ZIPS (89.6) and PECOTA (84.3).
FLO: 5.1 win difference between PECOTA (77.3) and ZIPS (72.1).
Cle: 4.7 win difference between DMB (91.5) and CHONE (86.8).
LAD: 4.6 win difference between ZIPS (85.6) and PECOTA (81.0). ZIPS is the outlier.
LAA: 4.6 win difference between CHONE (87.5) and ZIPS (82.9).
NYM: 4.6 win difference between ZIPS (87.1) and DMB (82.5).
SEA: 4.6 win difference between DMB (77.8) and PECOTA (73.2).
CIN: 3.9 win difference between DMB (75.8) and ZIPS (71.9).
MIN: 3.8 win difference between CHONE (91.5) and ZIPS (87.7).
PIT: 3.8 win difference between PECOTA (75.8) and ZIPS (72.0).
CWS: 3.8 win difference between DMB (77.4) and PECOTA (73.6).
TEX: 3.6 win difference between PECOTA (80.6) and DMB (77.0).
COL: 3.6 win difference between PECOTA (80.2) and DMB (76.6).
SDP: 3.3 win difference between DMB (88.0) and ZIPS (84.7).
CHC: 3.0 win difference between ZIPS (85.5) and DMB (82.5).
PHI: 2.9 win difference between PECOTA (87.6) and DMB (84.7).
KCR: 1.8 win difference between PECOTA (65.5) and CHONE (63.7). Incredibly, CHONE's actually an outlier on that one.
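The list above calls one system "the outlier" for several teams without defining the term. One simple rule, which is an assumption here and not necessarily what the poster used, is to flag the system whose projection sits farthest from the average of the other three. The function names and the team numbers below are hypothetical.

```python
from statistics import mean

def projection_spread(projections: dict[str, float]) -> float:
    """Largest minus smallest projected win total across systems."""
    return max(projections.values()) - min(projections.values())

def leave_one_out_outlier(projections: dict[str, float]) -> tuple[str, float]:
    """Return the system farthest from the mean of the OTHER systems, and that gap in wins."""
    best_name, best_gap = "", -1.0
    for name, wins in projections.items():
        others = [w for n, w in projections.items() if n != name]
        gap = abs(wins - mean(others))
        if gap > best_gap:
            best_name, best_gap = name, gap
    return best_name, best_gap

# Made-up projections for one team, just to show the calculation.
hypothetical_team = {"CHONE": 84.0, "DMB": 85.0, "PECOTA": 76.0, "ZIPS": 83.0}
print(projection_spread(hypothetical_team), leave_one_out_outlier(hypothetical_team))
```

Comparing each system against the others' average (rather than against all four) keeps the outlier itself from dragging the reference point toward it.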
The latter two, along with assuming that the rotation will be mediocre at best.
Most systems predict very few injuries, which is one place they tend not to work, and DMB's injury simulator never has a player out for more than a few games.
Offense:
CHONE says tied with Detroit for 10th in runs scored with 782.
DMB says 6th in runs scored with 829.
PECOTA says 9th in runs scored with 776. Only 11 runs out of 12th place (Oak, LAA, and, gack, KC).
ZIPS says 8th in runs scored with 787.
Defense:
CHONE says 10th in runs allowed with 840.
DMB says 12th in runs allowed with 877.
PECOTA says 13th in runs allowed with 867. (Thank God for KC!)
ZIPS says 10th in runs allowed with 837.
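As a rough cross-check (not something the simulations themselves use), the White Sox run totals above can be turned into expected wins with Bill James's Pythagorean expectation, W% = RS^2 / (RS^2 + RA^2). The exponent of 2 is the classic choice; 1.83 is a commonly used refinement. The function name is illustrative.

```python
def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# White Sox (runs scored, runs allowed) by system, as quoted above.
white_sox = {
    "CHONE": (782, 840),
    "DMB": (829, 877),
    "PECOTA": (776, 867),
    "ZIPS": (787, 837),
}

for system, (rs, ra) in white_sox.items():
    print(f"{system}: {pythagorean_wins(rs, ra):.1f} expected wins")
```

All four pairs allow more runs than they score, so every system lands below 81 expected wins, which matches the pessimism noted in the thread.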
Here's ZIPS White Sox projections. The first couple of comments in the thread mention how pessimistic the system is on the Sox pitchers.
I independently found that to be very amusing as well! 79 wins would be depressing as hell because it would convince the team they're on the right track. Of course, 60 wins would also do that because the front office is retarded.
I think the right number of iterations is "as many as you are willing to wait for." This is pretty much a Monte Carlo simulation, and I've been doing a few of those with 10,000 iterations; however, it depends on how much of a computer hog DMB is. I would guess that 1,000 could be pretty useful, though.
Too low to be definitive? Of course. Too low to predict the AL Central? Yeah. Too low to be useful? No way. Most simulation studies in stat journals that I've referred to in my work use 500-1000 cases per condition. That's plenty to get typical behavior. The real problem is the projections ... the players are not going to display the same true talent that they are projected for, even for the best projection systems, and that will cause some additional disturbance from the projected standings that no number of iterations will account for.
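The point about 500-1000 cases can be made concrete with the Monte Carlo standard error of a mean, SD / sqrt(n). Using the 12-13 win standard deviation mentioned earlier in the thread (12.5 below is an assumed midpoint), 1,000 iterations pin a team's average win total down to within roughly 0.8 wins at 95% confidence. This measures simulation noise only; it says nothing about projection error.

```python
import math

def standard_error_of_mean(sd: float, iterations: int) -> float:
    """Monte Carlo standard error of a simulated mean: sd / sqrt(n)."""
    return sd / math.sqrt(iterations)

# Assumed season-to-season SD of about 12.5 wins.
for n in (500, 1000, 4000, 10000):
    print(n, round(standard_error_of_mean(12.5, n), 2))
```

Because the error shrinks with the square root of n, going from 1,000 to 4,000 iterations only halves it.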
My machine is from 2000, so it takes me a while to run a sim, but newer machines are a lot quicker. I don't do much multimedia, so I never felt the need to upgrade.
I conduct survey research for a living. I am constantly asked, "How can you talk with 500 or 1,000 people about something, such as what they think about the President or the war, and have it be representative of the population as a whole?" While there is a margin of error, the margin of error for 1,000 completed surveys is actually quite small (3.1% at the 95% confidence level). The real problems to watch out for are issues such as question wording, order effects, the quality of the sample, etc.
Same thing holds true here. If the assumptions are valid, then 1,000 is more than enough to get a sense of what is going on. If there are problems with the assumptions, such as injury rates, you can do a million iterations and it won't be very useful.
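The 3.1% figure follows from the standard normal-approximation formula for a proportion, z * sqrt(p(1 - p) / n), assuming simple random sampling and the worst case p = 0.5. A minimal sketch (the function name is illustrative):

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    p=0.5 gives the largest (worst-case) margin; z=1.96 corresponds to 95%.
    """
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1000):.1%}")  # → 3.1%
```

As with simulation iterations, the margin shrinks with the square root of the sample size, and none of it accounts for wording or sampling-frame problems.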
Geez... no SUPREME COMMANDER for you!
The combined dataset has the A's at 83.6 wins, so it doesn't seem like either is an outlier. But then I averaged the A's win totals from the four 1000-sets (80.8, 83.8, 80, 86.5) and got 82.8. What am I missing?
It's an interesting project and I enjoy looking at it every year.
That said, as noted above, this is only as good as the assumptions going into it, too. So what it really means is "given the playing time assumptions and the error bars in the underlying projections, here's what it looks like when you play 1000 seasons." That means there's a huge amount of variation: not just the spread in wins listed above, but also the additional layers of projection error and PT (playing time) estimation error.
Not that I have a better way, mind you, just noting the issues.
I never got into games more complex than Minesweeper.
Well, the number of iterations you run depends on the model (some require a set number of iterations plus a "burn in" period; some require running until a diagnostic parameter gets to a certain value, etc.). Specifically, I'd be interested to know if these were deterministic models that just ran based on initial data or if they (preferably) were some sort of stochastic/Bayesian models that "learned" based on simulated data.
But, yeah, I've never seen iterations below, say, one million (and that's for simple data sets, i.e., the evolution of a single gene based on population frequency, not something as complex as 30-odd teams w/players and stats etc. etc. etc.)
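The "run until a diagnostic parameter gets to a certain value" approach can be sketched as a sequential stopping rule: keep drawing simulated seasons until the standard error of the running mean falls below a target. This is an illustration only. A straight season replay has no burn-in period, and the normal draw below is a hypothetical stand-in for one simulated season, not how DMB works.

```python
import math
import random

def run_until_precise(draw, target_se: float, min_runs: int = 100,
                      max_runs: int = 1_000_000) -> tuple[int, float, float]:
    """Call draw() until the standard error of the running mean is below target_se.

    Returns (runs, mean, standard_error). The running variance is updated one
    draw at a time with Welford's online algorithm.
    """
    if min_runs < 2:
        raise ValueError("min_runs must be at least 2")
    n, mean, m2 = 0, 0.0, 0.0
    se = math.inf
    while n < max_runs:
        x = draw()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_runs:
            se = math.sqrt(m2 / (n - 1) / n)
            if se < target_se:
                break
    return n, mean, se

# Hypothetical stand-in for one simulated season: wins ~ Normal(82, 12.5).
rng = random.Random(2007)
runs, avg, se = run_until_precise(lambda: rng.gauss(82, 12.5), target_se=0.4)
print(runs, round(avg, 1), round(se, 3))
```

With a spread of about 12.5 wins, a 0.4-win target stops after roughly a thousand draws, which is in the same range as the run counts discussed in this thread.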
DMB is deterministic, right?
Sloppy math on my part. Division titles and wild card counts are correct, but I had the standard deviation and frequency calculations in the same columns I'm averaging the wins from. That's fixed now if you refresh.
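A quick sanity check for a combined tabulation: when every system gets the same number of runs, the combined mean must equal the plain average of the per-system means. Weighting by run count only matters if the sets differ in size. A minimal sketch using the four A's averages quoted in the thread (the function name is illustrative):

```python
def pooled_mean(groups: list[tuple[float, int]]) -> float:
    """Combine per-group means into one overall mean, weighting by group size."""
    total_runs = sum(n for _, n in groups)
    return sum(m * n for m, n in groups) / total_runs

# A's average wins from the four 1,000-season sets, as quoted above.
athletics = [(80.8, 1000), (83.8, 1000), (80.0, 1000), (86.5, 1000)]
print(f"{pooled_mean(athletics):.1f}")  # → 82.8
```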
That said, as noted above, this is only as good as the assumptions going into it, too.
Hence the six different disclaimers I wrote at the top.
The one thing I did differently this year is to use Baseball Prospectus's depth charts for playing times, which account for projected injuries and playing time for bench and rotation scrubs. I kept random injuries in there as well. Playing time should be just about the same across CHONE, PECOTA, and ZiPS. I didn't mess with Diamond Mind's depth charts.
As far as running more than 1000 iterations, it's just not viable. It takes my computer, which is a beast, about an hour to get through 25 simulations, and that's automated. The 4000 runs took about a week. Given the short window between when the projection disks become available (or when I finish building them) and when the season starts, this is probably the practical limit.
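The "about a week" figure checks out from the stated rate of roughly 25 simulated seasons per hour. A small sketch of the arithmetic (the function name is illustrative):

```python
def wall_clock_days(total_runs: int, runs_per_hour: float) -> float:
    """Days of continuous computing needed at a fixed simulation rate."""
    return total_runs / runs_per_hour / 24

# 4 systems x 1,000 runs each at about 25 runs per hour.
print(f"{wall_clock_days(4 * 1000, 25):.1f} days")  # → 6.7 days
```

Doubling to 2,000 runs per system would take roughly 13 days at the same rate, which is the trade-off against the preseason deadline.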
No. If it were, you'd have a standard deviation of 0. You wouldn't see roughly two-thirds of a team's simulated win totals falling within 12-14 wins of its average.
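The distinction can be shown with a toy model. A deterministic model returns the same win total every time, so its standard deviation is exactly zero. A stochastic one, here each game treated as an independent coin flip (a hypothetical simplification, not DMB's engine), produces a spread, with roughly two-thirds of seasons inside one standard deviation of the mean. This toy model only demonstrates the qualitative point; it is not meant to reproduce the 12-14 win figure.

```python
import random
from statistics import mean, pstdev

def deterministic_season(win_prob: float, games: int = 162) -> float:
    """Same inputs always give the same win total."""
    return win_prob * games

def stochastic_season(rng: random.Random, win_prob: float, games: int = 162) -> int:
    """Each game is an independent random trial."""
    return sum(rng.random() < win_prob for _ in range(games))

rng = random.Random(42)
det = [deterministic_season(0.55) for _ in range(1000)]
sto = [stochastic_season(rng, 0.55) for _ in range(1000)]

sd = pstdev(sto)
avg = mean(sto)
share_within_1sd = sum(abs(w - avg) <= sd for w in sto) / len(sto)
print(pstdev(det), round(sd, 2), round(share_within_1sd, 2))
```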
What's also funny is that CHONE has the best results for Chone Figgins's team. We try to build objective systems, but maybe our biases can't be completely erased. Or maybe it's just a funny coincidence. Anyone know what Nate and Tom's favorite teams are?
And SG, awesome. That's an incredible effort.
I have a question: do you have a good way of running so many simulations and tabulating the results without a ton of manual legwork?
I wrote a program that does most of the work for me, and some spreadsheets with macros that pull everything together when it's done. It's a little clunky, but it works OK. Check this post out for more details on how to run DMB in batch.
http://DodgerSims.blogspot.com/
vr, Xeifrank
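The program and macros described above aren't shown, and DMB's batch output format isn't covered here, so the following sketches only the tabulation half. It assumes each simulated season's standings for one division have already been parsed into a dict of team → wins; `tabulate` and that input shape are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

def tabulate(seasons: list[dict[str, int]]) -> dict[str, dict[str, float]]:
    """Summarize simulated seasons for ONE division.

    Returns mean wins, SD of wins, and division-title count per team.
    """
    wins_by_team: dict[str, list[int]] = defaultdict(list)
    titles: Counter = Counter()
    for standings in seasons:
        for team, wins in standings.items():
            wins_by_team[team].append(wins)
        # Winner = most wins; ties go to the first team max() encounters
        # (real tiebreakers and one-game playoffs are ignored).
        titles[max(standings, key=standings.get)] += 1
    return {
        team: {
            "mean_wins": mean(w),
            "sd_wins": pstdev(w),
            "division_titles": titles[team],
        }
        for team, w in wins_by_team.items()
    }

# Toy input with made-up numbers, just to show the shape of the output.
toy = [
    {"CWS": 78, "MIN": 90, "DET": 88},
    {"CWS": 85, "MIN": 84, "DET": 80},
    {"CWS": 74, "MIN": 88, "DET": 92},
]
for team, stats in tabulate(toy).items():
    print(team, stats)
```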
If this turns out to be the final outcome of the 2007 season, the bolded part may constitute the understatement of the year.
Here's hoping ...
The blowout has a RS/RA of 782/772, which is 59 runs worse than last year (820/751). The 38-run drop in offense I can see. But it seems the pitching staff has greater depth and upside than in '06, so I don't get the +21 there.
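The arithmetic in this post decomposes cleanly: the 59-run drop in run differential is the 38-run offensive drop plus 21 more runs allowed. A small sketch (the function name is illustrative):

```python
def run_diff_change(prev: tuple[int, int], proj: tuple[int, int]) -> dict[str, int]:
    """Split the change in run differential into offense and run-prevention parts.

    Each argument is (runs scored, runs allowed); positive numbers are improvements.
    """
    (rs0, ra0), (rs1, ra1) = prev, proj
    offense = rs1 - rs0     # more runs scored is better
    prevention = ra0 - ra1  # fewer runs allowed is better
    return {"offense": offense, "run_prevention": prevention,
            "total": offense + prevention}

print(run_diff_change(prev=(820, 751), proj=(782, 772)))
# → {'offense': -38, 'run_prevention': -21, 'total': -59}
```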