Primate Studies: Where BTF's Members Investigate the Grand Old Game
Monday, August 19, 2002
Makin' Money
Estimating teams' revenue using a few simple numbers.
These days everybody seems to be talking about the money end of baseball: how much the players make, how much the owners make, how they should split things up, etc. So I thought it would be interesting to see if there was a relatively simple way to estimate how much an MLB team could expect to make given a few variables (like the size of its market or how much it wins).
Now I'd like to claim that this was an idea I produced from whole cloth, but it isn't. Ron Johnson was doing this sort of thing a few years back, and when last year's numbers came out from MLB, I decided it might be neat to try to redo from scratch what he had done. I backburnered it for a while, and after some math work, I came up with something decent enough to put out there for public consumption.
WARNING: The following piece contains numbers, mathematics and other things some consider hazardous to their health. Proceed with caution.
One slight difference between my system and Ron's (if I remember his correctly) is that I'm not trying to estimate a particular team's revenue to the most exact dollar possible, but rather to have a system built on general principles that apply to all teams.
After doing a lot of linear regression and other math-related work, the following are the variables the system uses: the population of the team's metropolitan area (as defined by the 2000 U.S. and Canadian Census Bureaus), the number of teams in that metropolitan area, the per capita income of that metropolitan area (a must-have figure that Ron also included in his system), the team's winning percentage in the previous year and in the four years before that, the number of home playoff games played this year, and a yes-or-no variable for whether the team's home park opened less than two years ago. The system is essentially a plain linear regression. It needs to stay fairly simple so we can get a clear view of how each of these variables works, since we're interested in more than just the final number.
The first thing I did to make the system more usable was to combine all of those winning percentages into a single number. This doesn't change the results at all, and it matters because a single number is much easier to work with when determining various figures (e.g., how much more revenue a single win produces). Anyway, with the 2001 revenues as our example year, the single-number winning percentage is calculated as:
W% = (2000 W% * .25) + (1999 W% * .225) + (1998 W% * .2) + (1997 W% * .175) + (1996 W% * .15)
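As a minimal sketch of the weighting above (the function name and the sample winning percentages are made up for illustration, not taken from the article's data), the blended figure could be computed like this:

```python
# Weights from the formula above, most recent season first
# (2000 back to 1996 when 2001 is the revenue year). They sum to 1.0.
WEIGHTS = (0.25, 0.225, 0.2, 0.175, 0.15)


def blended_win_pct(prior_win_pcts):
    """Combine the previous five seasons' winning percentages into one number.

    prior_win_pcts: five winning percentages, most recent season first.
    """
    if len(prior_win_pcts) != len(WEIGHTS):
        raise ValueError("expected exactly five seasons of winning percentages")
    return sum(w * pct for w, pct in zip(WEIGHTS, prior_win_pcts))


# Hypothetical team: invented winning percentages for 2000, 1999, ..., 1996.
example = [0.540, 0.500, 0.480, 0.520, 0.460]
print(round(blended_win_pct(example), 4))
```

Because the weights sum to 1, a team that played .500 ball in all five seasons comes out at exactly .500.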
You will notice that the 2001 winning percentage is not listed. According to the numbers I have, for the current year only the number of home playoff games the team plays makes a significant impact on revenue. However, the 2001 winning percentage does affect the revenue estimates for the following years, so the effect isn't gone, just delayed. This makes sense and, of course, follows what most people have said on the subject: when a team sees sudden improvement, the biggest gains in attendance come the following season.
Once we have that, we can now use this formula for the revenue estimate:
Team Revenue = (W% * $430,169,580) + (Metro Population * $3.46) - (Teams in Metro Area * $27,962,685) + (Home Playoff Games * $2,446,043) + (Per Capita Income of Area * $2,655.60) - $160,287,379 + ($22,906,159 if the team's stadium is less than two years old).
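For anyone who wants to plug numbers in, here is a rough sketch of the formula as a function. It takes the teams-in-metro term and the constant as subtracted, which matches the roughly $28 million cost of a second team mentioned later in the article. The function name, parameter names, and sample inputs are all made up for illustration.

```python
def estimate_revenue(blended_wpct, metro_population, teams_in_metro,
                     home_playoff_games, per_capita_income, new_stadium=False):
    """Estimated team revenue in dollars, using the coefficients quoted above."""
    revenue = (
        blended_wpct * 430_169_580
        + metro_population * 3.46
        - teams_in_metro * 27_962_685      # assumed sign: more teams, less revenue
        + home_playoff_games * 2_446_043
        + per_capita_income * 2_655.60
        - 160_287_379                      # assumed sign on the constant
    )
    if new_stadium:
        revenue += 22_906_159              # park opened less than two years ago
    return revenue


# Hypothetical team: a .500 club alone in a metro area of 5 million people
# with a $30,000 per capita income, no playoff games, no new park.
sample = estimate_revenue(0.500, 5_000_000, 1, 0, 30_000)
print(f"${sample:,.0f}")
```

The inputs here are invented, so the result only shows the formula lands in a believable range for a 2001 club; it is not an estimate for any real team.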
The first interesting thing to note is that if I were to father a child tomorrow, this would mean that the Diamondbacks could expect to earn an extra $3.46 next year.
The correlation coefficient between these revenues and those estimated by Forbes over the last three years is .89, with an adjusted r-squared (for all the math geeks out there) of .78. The standard error of the estimate is about $18.3 million.
I know you're saying, "So what? What does this accomplish?" Well, a couple of things. First off, we can use this to estimate how much money an extra win generates for a baseball franchise. Put simply, we can take 1 divided by 162, multiply that by $430,169,580, and get about $2.655 million per additional win. Before we settle on this number, though, there are a few questions to ask.
The first question is the big one. I tried a series of functions to see whether the amount of revenue per win varies across markets (i.e., is a win more valuable in New York than in Kansas City?). I tried as hard as I could, but I kept coming back to the same conclusion: if the value of a win changes at all across markets, a win is worth slightly more in the smaller market, but the difference is too small to matter and is not statistically significant. In layman's terms, using the same revenue-per-win figure for all teams will give you as good an estimate as any other method I could come up with.
The second question is easier to handle. Though a win this year only goes toward a quarter of the W% figure for next year, it also goes into the following year's figure, and so on. Still, compared to players' salaries, it's money that is spent now and paid back later, meaning its value is diminished a little. The reduction really is minimal, and it is likely at least offset by the fact that generating a win now might have some effect on whether you host playoff games.
Without the current revenue sharing system, it looks as if the revenue-per-win number would be up around $3.5 million. It's a good number to know, because without revenue sharing a player worth 5 extra wins would be worth $17.5 million, and with the current revenue sharing he's only worth around $13.3 million.
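To put a rough number on "diminished a little," here is a sketch that spreads one extra win's revenue over the following five seasons using the W% weights and then discounts it. The 5% discount rate is purely an assumption for illustration, as are the function names; neither comes from the article.

```python
WEIGHTS = (0.25, 0.225, 0.2, 0.175, 0.15)  # share of a win counted 1..5 years later
PER_WIN = 430_169_580 / 162                 # about $2.655 million in total


def payout_schedule(per_win=PER_WIN):
    """Revenue credited in each of the five seasons after an extra win."""
    return [w * per_win for w in WEIGHTS]


def present_value(cash_flows, annual_rate):
    """Discount cash flows that arrive 1, 2, ... years from now."""
    return sum(cf / (1 + annual_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


ASSUMED_RATE = 0.05  # hypothetical discount rate, not from the article
schedule = payout_schedule()
print(f"undiscounted: ${sum(schedule):,.0f}")
print(f"discounted at {ASSUMED_RATE:.0%}: ${present_value(schedule, ASSUMED_RATE):,.0f}")
```

A higher or lower assumed rate moves the discounted figure, but the general point holds: the haircut from waiting is a modest fraction of the total.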
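The arithmetic behind those per-win and per-player figures can be checked with a few lines. This is just a sketch: the variable and function names are made up, and the $3.5 million figure is the article's own rough estimate rather than something derived here.

```python
WPCT_COEFFICIENT = 430_169_580   # dollars per 1.000 of blended winning percentage
GAMES_PER_SEASON = 162

# One extra win raises winning percentage by 1/162.
revenue_per_win = WPCT_COEFFICIENT / GAMES_PER_SEASON

# The article's rough per-win figure if there were no revenue sharing.
NO_SHARING_PER_WIN = 3_500_000


def player_value(extra_wins, per_win):
    """Revenue attributable to a player worth `extra_wins` additional wins."""
    return extra_wins * per_win


print(f"per win, current system: ${revenue_per_win:,.0f}")
print(f"5-win player, current system: ${player_value(5, revenue_per_win):,.0f}")
print(f"5-win player, no sharing: ${player_value(5, NO_SHARING_PER_WIN):,.0f}")
```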
Another interesting use for the formula would be to figure out a "base" revenue rate. What I mean by that is to figure out the estimated revenue for each team based only on those factors the team has absolutely no control over. For this system, that would be the size of the team's metropolitan area, the per capita income in the team's area, and how many teams are currently in the area. By doing this, we can estimate what each team's revenues would be if they were all exactly equal in terms of producing on the field, marketing, new stadiums, and other areas in which teams can directly affect how much they rake in. A standard complaint from the stathead community is that a lot of teams don't do nearly enough to generate their own revenue and instead rely on handouts from everybody else. This method would tell us how much each team would make if they all made exactly the same efforts in this regard.
Below is a chart of all the Major League teams and some revenue numbers and estimates for the 2001 season. The actual revenue numbers, in the first column, are from Forbes' yearly estimates of Major League Baseball revenues. The next column is the revenue that the above formula estimated the team would bring in based on all of the factors listed. The third column lists the team's "base revenues," which I explained above as the estimated revenue if each team performed equally on the field. The final column lists the team's actual 2001 revenues minus its "base revenues." This works as a measure of how well the team is currently drawing from its market (a concept Derek Zumsteg wrote about in an interesting Baseball Prospectus piece the other day). The table:
As you can see, the highest "base revenue" figures belong to the two New York teams, and the lowest belongs to the Expos (because the per capita income in their area is easily the lowest of the 30 teams). You'll immediately notice that teams from the same market have identical "base revenue" figures. This is because the system does not currently distinguish between the Yankees and Mets, or the Cubs and White Sox, as to how large their market is. Why?
There is some question as to whether the seeming "dominance" of one team in every two-team market is due to inherent advantages or to one team simply doing a better job of promoting itself than the other. Let's use attendance as a proxy for a team's popularity within a market and see how the two-team markets shake out. Since the A's moved to Oakland in 1968, we'll use that as the back end of this discussion. The question is, of the 34 years from 1968-2001, how often has the "dominant" team in the market outdrawn the other team?
Yankees 16 - Mets 18
Now, I'll elaborate further, but obviously the team that immediately sticks out is the Angels, who have outdrawn the Dodgers only four times, the last time all the way back in 1974. It seems to me the Angels are the only team that might have a complaint that they inherently have significantly less of the Los Angeles market than the system credits them with. You could still make arguments for the A's and White Sox being inherently disadvantaged, and there are reasons to believe this might be so, but I think the extent to which this might be true is usually way overblown, and any "dominance" appears to be mostly due to one team providing a better product in some way.
As such, I think I might consider putting in an adjustment for the Angels (after all, Orange County isn't quite the same thing as Los Angeles), since there's hard data to support it, but the others look to me like they have at least almost as good a deal as their counterparts.
Speaking of two-team markets, Baltimore and Washington D.C. are considered the same market by the U.S. Census, and so they are considered the same by this system. As you can see above, the system estimates that a second team moving into the area would cost the Orioles around $28 million a year in revenues. This would likely mean that the value of the franchise would take a bit of a hit if those numbers did come to pass. Now you might then conclude that Angelos has a right to squawk (a right he exercises with alarming frequency), but in reality you'll see that the "base revenue" estimates think that the Orioles have the best non-New York situation in the league. Of course you could say that's only because D.C. is included in the market, but Angelos can't have it both ways. Either D.C. is his market or it isn't, but either way you slice it, the Orioles would still have a decent situation if a team moved into the District. It would be roughly equivalent to that of the Chicago franchises.
One final note is to talk about how all of this stuff ties in with revenue sharing proposals. Under the current system, the "inherent" advantage for the two New York teams over the "average" team according to the "base revenue" estimates is around $37 million a year. Now one could argue that figure should be the additional amount of revenue the Yankees should have to share, but there are problems with this line of thought. First of all, a New York team incurs more costs than the average MLB team (rent, non-player employees, taxes and so forth). Making them equal on the revenue side would make them disadvantaged on the profit side. So that has to be taken into account. Also to be taken into account is the fact that buying a percentage of the Yankees is a hell of a lot more expensive than buying that same percentage of, say, the Pirates. Doesn't the person who made that investment have a right to expect an advantage in revenues based on the fact that he paid more to get a share of them? I think in those terms, that $37 million advantage of the Yankees doesn't seem that overwhelming. Furthermore, if you're going to institute more revenue sharing, it would be wise to add a mechanism to strengthen the relationship between winning and revenue. It's that relationship that does the most by itself to level the playing field, and it also encourages teams to put out better products. Again, I don't see where these numbers indicate that increased revenue sharing is all that great of an idea. To me at least, the Yankees' big revenues look to be half inherent advantage and half excellent exploitation of their market.
In short, I hope this little formula gives some insight into exactly what factors cause teams to make money. I haven't seen Ron Johnson around lately, but if he pops by, I'd love to hear how my formula compares with his. I'd also like to hear any suggestions on additional factors I could include in the formula. Fire away!
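A "base revenue" along these lines could be sketched as below. The article does not spell out what values the controllable factors are held at, so this sketch assumes every team is treated as a .500 club with no home playoff games and no new park; that assumption, the function name, and the two sample markets are all made up for illustration.

```python
ASSUMED_BASELINE_WPCT = 0.500  # assumption: every team treated as a .500 club


def base_revenue(metro_population, teams_in_metro, per_capita_income,
                 baseline_wpct=ASSUMED_BASELINE_WPCT):
    """Estimated revenue from market factors alone (no playoffs, no new park)."""
    return (
        baseline_wpct * 430_169_580
        + metro_population * 3.46
        - teams_in_metro * 27_962_685
        + per_capita_income * 2_655.60
        - 160_287_379
    )


# Two invented markets, purely for illustration.
big_market = base_revenue(metro_population=20_000_000, teams_in_metro=2,
                          per_capita_income=38_000)
small_market = base_revenue(metro_population=2_000_000, teams_in_metro=1,
                            per_capita_income=30_000)
print(f"gap between the two markets: ${big_market - small_market:,.0f}")
```

Whatever baseline is chosen for winning percentage, it cancels out when two markets are compared, so the gap between markets depends only on population, income, and the number of teams.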

Reader Comments and Retorts
1. Ken Arneson Posted: August 19, 2002 at 12:41 AM (#605886)
I wonder if this is due to other entertainment avenues that baseball competes with. There's a lot more to do in NYC than in KC. If one market is 5 times as large, but there are 5 times as many things to do, the "marginal market" that each city is after is the same total number.
Voros: maybe getting data from the various Tourism boards might allow us to account for this. Just a thought...
Interesting that the two teams that replaced classic old parks with shiny new ones, Detroit and the White Sox, are on the negative side.
Perhaps the Red Sox should think more than twice about replacing Fenway.
Most of the other negative teams play on fake grass or in football stadiums. The anomalies are Anaheim and Baltimore, which Voros addresses above.
As to the national revenue, I would rather just see it subtracted out of the left-hand side. A team's share of the national revenue isn't a function of its population size, per capita income, etc. The left-hand side should really just be locally generated revenue.
Remember that by last year's numbers, the Montreal Expos brought in over $52 million in revenues from the league alone, so that one could assume that if a team were allowed to exist while generating _no_ local revenue, they would probably take in somewhere around the $55 million mark.
The second comment is that there are a few problems with the "revenue per win is not linear" argument. One is that representing it as linear makes analysis before the season or at the very start of the season possible. You might figure a team is an 84-win team before the year starts, but that really means that the team might have a chance to win anywhere from 69 to 94 games, with the chances of any one win total occurring increasing as it converges on 84. Therefore getting at that "curve" precisely can't be done, because you don't know whether two extra wins get you to 90 wins or to 78 wins. So the curve is "flattened" out considerably when you take this into account. Another point is that the win% numbers only affect revenue in subsequent years in the model. The only aspect of current-year success that affects the model is the number of home playoff games played. Finally, and most importantly, curves that were bent, logarithmic, hyperbolic, exponential, and all sorts of other weirdness were tried, and none of them improved the accuracy of the formula. I'm assuming that because there are all sorts of different kinds of 84-win seasons, the effects of these 84-win seasons differ greatly as well, meaning that to get at the "average" effect from an 84-win season, the various different effects get averaged out to where we once again get a flat and average-looking rate.
Finally, in order to use more complex functions, we'd probably need a much higher sample size than I have available (90 teams) in order to get at them. With only the 90 teams, all we're going to get are the simple trends and the important variables. If we try and get complex, the system doesn't become more accurate and becomes harder to work with.
A few things that should push up the accuracy of your model.
There's a strong upward trend in revenue. You can take this into consideration by including the year in the regression (well, I actually used year - 1994).
I can't overstate the importance of either having made the playoffs the previous season or, in particular, having won the World Series in the previous season.
Once you include these in the regression, the marginal value of a random win in previous seasons goes down a fair bit.
Likewise, making the playoffs and winning the World Series in the season under consideration is very important. And including this in the regression lowers the value of a random win by quite a bit. (I know why you didn't include this in your model. In looking at what to bid on a player, his impact on your potential playoff chances is very difficult to assess, particularly years down the line. And that's before getting into how long he rates to sustain that rate of play. Way easier to work with in a marginal revenue produced versus value over replacement study than what I have.)
I'm surprised that you found the marginal value of a win to be more or less constant across markets. I found winning percentage * market size to be both positive and significant. Makes for a right messy equation, though. If you opted for a simplifying assumption, I can see why.
I don't know much about Canada, partly due to living in Florida. I'm curious whether Montreal is impoverished or something. However, any extra social services provided would increase the portion of people's money that they could spend on entertainment. If I make less money, but don't have to buy health insurance...
I did do this, in a more simplified way. I simply scaled up the revenue numbers to 2001 levels for each year.
As far as making the playoffs and world series, it is more or less in the model under the "home playoff games" heading. If you make the playoffs you get at least one home playoff game. If you make the World Series you could get as many as 10 or 11 playoff games.
I decided to use games instead of artificial levels, since that's more or less how the revenue is generated for the individual teams.
As far as the revenue per win/market size relationship, I was equally surprised, but I'm just not finding it.
It _could_ be that I'm estimating total revenue instead of local revenue, and that revenue sharing and the error rate of the regression combined to reduce the difference to insignificant levels. I'm going to work on trying to compartmentalize the numbers a little more (breaking down how each revenue source is generated individually and then putting it back together), and maybe try to do a little bit more work on the right way to split two-team markets (there could be significantly different revenue-generating functions for a two-team market than for a one-team market).
Well, obviously, with last year's winning percentage being a key variable, making the playoffs is going to be contained within that. The problem I had with trying to add more playoff and WS data was simply that it didn't make the model more accurate, only more complex. If I do a few things differently with the model than Ron, then whether a certain variable shows up as significant could be affected.
As for revenues, I simply scaled up the revenues in 1999 and 2000 to where the total league revenues were the same for all three seasons. This is a bit of a kludge for some questions such a model could work with (say overall revenue growth or something) but for my main purposes I figured it was the best option. YMMV.
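For what it's worth, that kind of rescaling can be sketched in a few lines. The team names and revenue figures below are invented, and the function name is made up; the only idea taken from the comment above is that each season gets multiplied by whatever factor makes its league total match the reference season.

```python
def scale_to_reference(revenues_by_year, reference_year):
    """Rescale each season so its league total matches the reference season."""
    target = sum(revenues_by_year[reference_year].values())
    scaled = {}
    for year, team_revenues in revenues_by_year.items():
        factor = target / sum(team_revenues.values())
        scaled[year] = {team: rev * factor for team, rev in team_revenues.items()}
    return scaled


# Invented revenues in millions of dollars for a two-team "league".
sample = {
    1999: {"Team A": 80.0, "Team B": 120.0},
    2000: {"Team A": 90.0, "Team B": 135.0},
    2001: {"Team A": 100.0, "Team B": 150.0},
}
scaled = scale_to_reference(sample, reference_year=2001)
for year in sorted(scaled):
    print(year, {team: round(rev, 1) for team, rev in scaled[year].items()})
```

This keeps each team's share of league revenue within a season intact while removing league-wide growth, which is why it says nothing about overall revenue growth itself.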