Baseball Primer Newsblog
The Best News Links from the Baseball Newsstand
Saturday, January 21, 2006
McCoy: Introducing The Fiato-Souders Intrinsic Analysis Matrix
As I’m still bogged down with the equally arduous task of gravelling my way through the Thunderegg 213 Song CD Box Set... seansatt gives this a go… “reworked metric and gives us a first look by using it to rank the best and worst teams of all time, plus he explains in a general sort of way how his metric works.”
One of the first missions of almost every sabermetrician is to determine a preferred strategy for rating the performance of baseball teams and players while keeping in mind the many complicating factors that distort statistics like wins and losses and run differentials. There is a host of available data today that makes analysis of teams possible, but some understanding of the dynamic way in which those statistics combine to produce wins and losses is required, and this is not a simple matter. Empirical analysis has for years centered on the idea that averages tell enough of the story to be used as the backbone of any system designed to adjust raw statistics to account for the context in which they occurred. This document will explore the problems with empirical sabermetrics and introduce a new tool designed to bridge the gap between the intrinsic skill of the players and the real-world statistics that define them.
Reader Comments and Retorts
1. Best Regards, President of Comfort, Esq. Posted: January 21, 2006 at 04:26 PM (#1831415)
Rob Base is gonna be pissed.
I thought this was pretty interesting. I’m sure the name “Fiato-Souders Intrinsic Analysis Matrix” will be very appealing to a lot of sportswriters and we will see some variations on it:
“Rosenthal-Gammons Rumor Verification Matrix”
“Rogers-Jenkins Clubhouse Chemistry Quantification Matrix”
“Lupica-Mariotti Character Measurement Matrix.”
This doesn’t pass the smell test.
1. Like the Baseball Almanac study, there seems to be something basically wrong with the methodology that skews the extremes towards older teams. Or are you going to tell me there were as many great teams in the 1900’s as in the past 36 years? As many terrible teams in the 00’s as there have been in the past 46 years? Especially since they brag “This represents the first complete effort to separate the intrinsic abilities of teams from their contexts.” The ‘76 Reds (let alone ‘75) couldn’t beat the ‘02 Pirates straight up?
2. They seem to be quantifying defense somehow. They make the statement “The 2001 Mariners filled their outfield with defensively gifted players and loaded up on flyball pitchers to take advantage of the dead air in center” when talking about building a team to the park, yet how do you properly quantify that (defense is in their equation) enough to be “able to reproduce real-world statistics with a high degree of accuracy”?
3. They include scoring since 1874, but only rank teams since 1900 because “(p)rior to 1900, the intrinsic un (sic) differentials start to take off in magnitude owing largely to the wildly oneven (sic) distribution of talent, unstable franchises, shorter schedules, and higher run scoring environments…” Wouldn’t it be more correct to only use the data from the years ranked? I would also say that as 29 of the top teams (58%) came in the first 40 years of those ranked, a case could be made that there was still a wildly uneven distribution of talent.
At the bottom I believe he states that he did not do a league quality adjustment so that teams from the early era will do well because they are racking up runs and preventing runs against bad teams. He hasn’t figured out a way to adjust for league quality that is acceptable to him.
I thought this was pretty interesting. I’m sure the name “Fiato-Souders Intrinsic Analysis Matrix” will be very appealing to a lot of sportswriters and we will see some variations on it:
That’s hilarious. That’s exactly what I thought.
At the bottom I believe he states that he did not do a league quality adjustment so that teams from the early era will do well because they are racking up runs and preventing runs against bad teams. He hasn’t figured out a way to adjust for league quality that is acceptable to him.
Which is exactly why I did not think this study was all that helpful. To me, it’s all about quality of competition, not level of dominance within a weaker league.
That’s why, to me, the 98 Yankees seem to stand out head and shoulders above all the other teams. They rank near the top in both studies, but IMO, played in a far more competitive league than the earlier era teams, and of course closed the deal with their playoff dominance.
And the 98 team did not even have the highest payroll in their division either.
I’m a little concerned that this method has too many free variables to hold up well.
In this method, each individual team can be characterized by Offense Intrinsic runs above average per game and by Defense Intrinsic runs above average per game. That’s eminently fair and reasonable. But each team also has associated Offensive park reactions and defensive park reactions, which are presumably one free variable apiece per team per park per year. Or do I have this wrong?
He says that in the modern era there are over 1000 unique variables per year, which works out to ~33 variables per team per year. With interleague play, each team will play in about 15-20 parks per year, so one offensive reaction and one defensive reaction variable per park puts us right in his range of free variables.
But this is madness! For a given year, you’re solving a linear system with 1000 free variables and only 60 constraints (runs scored+allowed for each team). You could get millions of answers which are all equally valid.
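To make the worry about too few constraints concrete, here is a minimal sketch in Python with NumPy. The 2-constraint, 4-variable system is a made-up toy (the matrix `A` and vector `b` are illustrative assumptions, not FSIA data); it only shows that an underdetermined linear system admits different solutions that satisfy every constraint exactly.

```python
import numpy as np

# Hypothetical toy system: 2 constraints, 4 free variables.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 1.0]])
b = np.array([0.5, -0.3])

# For an underdetermined system, lstsq returns the minimum-norm solution.
x_min, *_ = np.linalg.lstsq(A, b, rcond=None)

# Rows of Vt beyond rank(A) span the null space of A, so adding any
# multiple of them leaves A @ x unchanged.
_, _, Vt = np.linalg.svd(A)
rank = np.linalg.matrix_rank(A)
null_vec = Vt[rank]
x_other = x_min + 10.0 * null_vec

print("x_min satisfies constraints:  ", np.allclose(A @ x_min, b))
print("x_other satisfies constraints:", np.allclose(A @ x_other, b))
print("solutions are identical:      ", np.allclose(x_min, x_other))
```

Both vectors reproduce the constraints, so the data alone cannot choose between them; some extra information (more equations or a prior) is needed to pick one.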
Of course, he hedges a little bit by explaining that the offensive and defensive reaction variables are complicated, so I could just be completely missing the boat here. But it seems to me that a much better linear system would be to find linear least-squares values for the Park Intrinsic Runs above average per game for each park in the history of the major leagues and the League Intrinsic Runs above average per game for each year, and start the analysis over fresh from there.
If you start from that point, you’ve got a much more constrained system, and you should be able to trust the results much more. In fact, if you inserted constraints like
League runs scored = avg runs scored + Sum_{parks} Park runs above average + Sum_{teams} Team Offensive runs above average - Sum_{teams} Team Defensive runs above average
you could reduce the free variables to the individual park factors. And if you took the trouble of measuring how many runs were scored in each park each year, you could even constrain those.
The idea of approaching baseball history as one big linear system is a good one and has real value, but the offensive and defensive reaction variables are a very bad idea, unless they’ve got some huge number of constraint equations that they didn’t mention in the article.
Oops. The Constraint for Leagues should have the League Intrinsic runs included.
League runs scored = avg runs scored + League Intrinsic runs scored + Sum_{parks} Park runs above average + Sum_{teams} Team Offensive runs above average - Sum_{teams} Team Defensive runs above average
I think they may be on to something interesting, but I have concerns similar to Zach about all the variables. What we need to see are some DETAILS on the actual calculations for at least one team.
The 2002 Angels were good - the 31st best team ever? I thought they were just lucky.
The 2001 Mariners were the greatest chokejob in history? Okay.
1962 NYN -1.783
2002 DET -1.784
1936 PHA -1.812
1916 PHA -1.834
1939 SLA -1.849
1938 PHI -1.873
1911 BSN -1.887
1903 SLN -1.901
1996 DET -1.925
1932 BOS -1.930
2004 ARI -1.940
1954 PHA -1.974
1939 PHA -2.009
1915 PHA -2.024
2003 DET -2.112
So my 2003 Tigers were indeed the Worst Team Ever (OK, since 1900). And the team the year before that was 14th worst. And the ‘96 squad was 7th worst. (And all those teams were worse than the infamous ‘62 Mets.) Kinda explains my bad moods, eh?
Ah, well. I could’ve grown up in Philadelphia…
So my 2003 Tigers were indeed the Worst Team Ever (OK, since 1900). And the team the year before that was 14th worst. And the ‘96 squad was 7th worst. (And all those teams were worse than the infamous ‘62 Mets.)
You must have some thick callouses on your wrists….
From the Fiato
“FSIA basically boils down to the equation that, for any game,
OIRAAPG - DIRAAPG + LIRAAPG + PIRAAPG + OPR - DPR = Actual runs above all-time average
That is, run scoring is a combination of:
- The all-time average (currently 4.53-ish)
- One team’s offensive strength
- The other team’s defensive weakness
- The home team’s league conditions
- The park conditions
- The first team’s offensive reaction to the particular park
- The other team’s defensive reaction to the particular park
Each of these, except for all-time average, is a variable. So, in the modern league, there are 30 team offensive strengths, 30 team defensive strengths, 2 league variables, 30 park variables, a maximum of 900 offensive park reactions (but about 1/3 of these are 0 and are consequently dropped altogether), and a maximum of 900 defensive park reactions.
So, we’ve got, in practice, about 1,180 (or so) variables for a single year. So FSIA constructs a matrix of linear equations, one for each variable. A different variable is made dependent for each equation, and the other variables become weighted sums of the other variables that affected this variable over the course of the season.
Now, the only problem is that, if two teams have the same schedule (which they did up until a few years ago), you won’t be able to solve the system of equations because some equations will look exactly the same. This condition is known in linear algebra as singularity, and I’ll give you a small example to illustrate:
2x + 2y + z = 1
2x + 2y + 3z = 3
Here, we can find out that z = 1. But then we’re left with:
2x + 2y = 0
2x + 2y = 0
That’s essentially one equation with two degrees of freedom; you don’t know what x and y are, nor do you have any way of finding out, except that one is the negative of the other. That’s obviously not helpful.
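The small singularity example can be checked numerically. The sketch below uses Python with NumPy (a choice made for illustration; the variable names are not from the original) and shows that once z is eliminated, the remaining 2x2 system has rank 1, so x and y are only determined up to x = -y.

```python
import numpy as np

# The two equations from the example:
#   2x + 2y +  z = 1
#   2x + 2y + 3z = 3
A = np.array([[2.0, 2.0, 1.0],
              [2.0, 2.0, 3.0]])
b = np.array([1.0, 3.0])

# Subtracting the first equation from the second isolates z.
z = (b[1] - b[0]) / (A[1, 2] - A[0, 2])

# What is left for x and y is the same equation twice: 2x + 2y = 0.
R = A[:, :2]
rhs = b - A[:, 2] * z

print("z =", z)
print("remaining right-hand side:", rhs)
print("rank of remaining system:", np.linalg.matrix_rank(R))

# Any pair (t, -t) satisfies the remaining equations.
for t in (0.0, 1.0, -7.5):
    assert np.allclose(R @ np.array([t, -t]), rhs)
```

A rank of 1 for two unknowns is the numerical signature of the problem described above: a direct solver has no unique answer to return.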
As such, this is where Bayesian probability comes in. This was one of the key points in Dr. Colley’s paper describing his matrix that is now part of the BCS (the paper is available in its entirety from http://colleyrankings.com/matrate.pdf ). The rule of succession in essence concedes that there is a possibility that we have not seen a sample representative of the entire distribution of teams/parks/etc.—past, present, and future.
In the case of teams, before we have seen a team play, and without any information about that team, we can only assume a priori that the team is somewhere between .000 and 1.000 with equal probability for any point between 0 and 1; the average of this uniform distribution is .500, and this is what we mean when we say that we are assuming the average a priori. As we process games, we realize a posteriori that the team is better or worse than .500, but the possibility still remains, albeit decreasingly so, that we have seen something not representative of the team’s strength.
To cut a long story short, the gist is that, for all variables, we “add in” a single game where run-scoring/allowing is equal to the all-time average. This represents our a priori assumption of the average until data proves otherwise. Note that this is a fundamental difference between Bayesian probabilistic statistics and frequentist statistics; the latter would attempt to process only the actual data; in the latter, you might use something like a t-stat to account for small sample sizes, but Bayesian probability addresses this by incorporating the initial fact (that all run scoring has averaged out to ~4.76 R/G) into the data itself.
This extra game is added ONLY to the dependent variable in each equation. Since we are assuming the average, the right-hand side, which is runs above average, remains the same. So the effect is to pull all teams closer to 0. Obviously, for teams that play a full schedule of 154 or 162 games, the pull will be small. But for teams that only play 3 or 4 games in a park, their park reactions will experience a significant center pull because there just simply isn’t enough data to deviate too far from 0, without extreme cases like 22-0 blowouts (and even then, those tend to be discounted heavily).
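The size of that pull can be illustrated for a single variable taken in isolation. This is a simplification assumed only for the sketch: in the full system every variable is coupled to the others. With all other variables held at zero, adding one extra game to the dependent variable while leaving the right-hand side alone turns the equation into (games + 1) * v = total runs above average. The function name `shrunk_estimate` is hypothetical.

```python
def shrunk_estimate(total_runs_above_avg: float, games: int) -> float:
    """Solve (games + 1) * v = total_runs_above_avg for v.

    The "+ 1" is the single added game at the all-time average, which
    contributes nothing to the right-hand side.
    """
    return total_runs_above_avg / (games + 1)


# Same raw rate (+1.0 runs per game above average), different sample sizes.
for games in (3, 162):
    raw_rate = 1.0
    estimate = shrunk_estimate(raw_rate * games, games)
    print(f"{games:3d} games: raw {raw_rate:+.3f}, shrunk {estimate:+.3f}")
```

Under these assumptions a 3-game sample is pulled from +1.000 to +0.750, while a 162-game sample barely moves (about +0.994).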
Note that, in my original matrix, I have the signs of defensive variables reversed, so that positive always points toward increasing run scoring, not increasing strength.
Once you have this system of linear equations set up, you can just solve it with a good linear system solver. I use an LU solver in C++, although theoretically the matrix is symmetric positive definite, so you could use Cholesky factorization if you wanted. Don’t attempt to do this by hand; even the smallest league (8 teams) will have on the order of 89 independent equations.
I do realize that, unfortunately, very few people actually have at their disposal an equation solver capable of handling 1,000+ variables efficiently. However, let’s take a VERY simple example. Suppose that team A plays team B twice, beating them by a combined score of 12-6. Suppose that the all-time average is 5 R/G (just to make the math a bit easier). Let’s also ignore park reactions for now, since those don’t make sense for a single matchup. Let AO = A’s offense, AD = A’s defense, BO = B’s offense, BD = B’s defense, P = park, L = league.
We have that:
3AO + 2BD + 2P + 2L = (12 - 2 * 5)
3BO + 2AD + 2P + 2L = (6 - 2 * 5)
3AD + 2BO + 2P + 2L = (6 - 2 * 5)
3BD + 2AO + 2P + 2L = (12 - 2 * 5)
2AO + 2BO + 2AD + 2BD + 5P + 4L = (18 - 4 * 5)
2AO + 2BO + 2AD + 2BD + 4P + 5L = (18 - 4 * 5)
As you can see, a simple two-team matchup requires 6 equations, 10 if you were to try to do park reactions. Anyway, you should notice that the park and league have 4 games, not 2. That’s because each offense-defense pairing is considered a “game”, and what A’s offense does against B’s defense is completely independent of what B’s offense does against A’s defense.
In traditional analysis, we would say that team A has a +6 run differential, while team B has a -6 differential. But solving the FSIA system above, we find that:
AO = +.523
BO = -.677
AD = -.677 (or +.677 in terms of goodness and not run allowing)
BD = +.523
P = -.154
L = -.154
What this says is that A’s offense would probably score, on average, 5.523 R/G against an average team in an average park under average league conditions, while B’s offense would score 4.323 R/G in that same environment, etc. Note that the system can’t differentiate between the park and league because, given the information that we have, both contributed to the exact same set of results under the exact same conditions, so it assigns the same number to both.”
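The six-equation example can be reproduced with a few lines of Python and NumPy. This is a sketch, not the author's C++ code. It relies on one observation about this toy case: the six equations have the form (XᵀX + I)v = Xᵀy, where X has one row per offense-defense pairing and y holds runs above average. The per-game split of the 12-6 combined score is an assumption (the post gives only the totals); only the sums enter the equations, so any split gives the same answer.

```python
import numpy as np

AVG = 5.0  # all-time average R/G assumed in the example
names = ["AO", "BO", "AD", "BD", "P", "L"]

# One row per offense-defense pairing; columns follow `names`.
X = np.array([
    [1, 0, 0, 1, 1, 1],  # A's offense vs B's defense, game 1
    [1, 0, 0, 1, 1, 1],  # A's offense vs B's defense, game 2
    [0, 1, 1, 0, 1, 1],  # B's offense vs A's defense, game 1
    [0, 1, 1, 0, 1, 1],  # B's offense vs A's defense, game 2
], dtype=float)

# Assumed split of the combined 12-6 score (A: 7 + 5, B: 2 + 4).
runs = np.array([7.0, 5.0, 2.0, 4.0])
y = runs - AVG

# The extra "average game" adds 1 to each diagonal entry and nothing
# to the right-hand side.
M = X.T @ X + np.eye(len(names))
rhs = X.T @ y

solution = np.linalg.solve(M, rhs)

# Cholesky succeeds only for symmetric positive definite matrices,
# so this call doubles as a check of that property.
np.linalg.cholesky(M)

for name, value in zip(names, solution):
    print(f"{name} = {value:+.3f}")
```

The printed values agree with the figures quoted in the post (+.523, -.677, -.154), with P and L coming out equal because their columns in X are identical.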
Thanks for the clarification. I hadn’t realized that you looked at every matchup of every two teams, home and away (presumably home/away scores give the park reactions)
I’m still not convinced that the park reactions belong in the system, though. For the away team at least, you’re introducing an extra variable to explain the residual of run scoring in a foreign park after you’ve subtracted average offense, league offense, park factor and both teams’ intrinsic offensive and defensive strengths. For a sample size of something like 10 games. I have trouble believing that the figure it gives for foreign park reactions will be anything close to reliable.
It would be very interesting to see a graph of a team’s park reactions as a function of time, both for their home park and some away park that they commonly play at. Say, the Yankees at home and at Fenway park? Both the Yankees and the Red Sox have tended to have pretty stable rosters, so if the park reactions were measuring something real, you’d expect to see a pretty continuous line.
I can see why you might want to avoid something like least squares optimization for systems of equations of the size you’re considering.
You probably shouldn’t be optimizing here, you should be generating a sample from the posterior (assuming that your posteriors have finite variance—which they probably do, but it’s always a concern with Bayesian problems). In the messed up world of Bayesian statistics, it might be *easier* to generate samples from the posterior distribution than to optimize your linear system, as you can do them block-wise rather quickly and the space of the effects that you’re looking at is likely to be very constrained (especially if you would commit to parametrized distributions for your effects, which would smooth out the geometry of the mean surface).
And my biggest concern is that there is no way you should be using all 1000 variables in the regression. Given the inherent variance due to pure randomness, you’re going to be overexplaining effects, leading to very wide credible intervals for the various coefficients. You would see that lack of precision if you looked at the expectation surface that you’re maximizing. I would guess that sucker is very, very flat with that many parameters. You would actually be better served by using as few covariates as you can get away with…
Email me if you’re interested in following this up and I can contain my Bayesian geekdom to a less public forum.
...you should be generating a sample from the posterior
Why, I do that all the time! Or did someone think of that joke already?