Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Tuesday, June 02, 2009

RotoSynthesis: Liss: Sample Size and Statistical Significance

That off across the ocean there’s a million guys named Joe…Mauer?

I’m not going to mention a certain Twins catcher by name, but I do want to make a distinction between the two concepts above because it seems in our discussion of said catcher, they have been confused.

...The question is - at what point do we conclude that the coin is *not* evenly weighted? Well, it’s not simply a matter of how many times you flip the coin, i.e., sample size. It’s *also* a matter of the *magnitude of the deviation*. If you get 16 heads and four tails, I wouldn’t be so sure the coin is unevenly weighted. But if you get 20 heads and no tails, it’s almost certainly so. (Less than 1 in a million odds of that). In both cases, the sample size is quite small, but in the latter, it’s more than sufficient. When the coin lands at an 80-percent heads clip, you need a larger sample to determine the coin is rigged because the magnitude of the deviation from the baseline (50/50) is less. And if the coin lands heads at a 55-percent clip, you need an even larger number of flips to determine whether it’s rigged.
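To put rough numbers on the coin example, here is a minimal sketch that computes exact binomial tail probabilities for a fair coin. The function name `tail_prob_at_least` is made up for illustration, and the 55-of-100 and 550-of-1000 cases are assumed examples of a "55-percent clip", not figures from the article.

```python
from math import comb

def tail_prob_at_least(heads, flips, p=0.5):
    """Exact P(X >= heads) for X ~ Binomial(flips, p)."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# The two cases from the article: 20 flips of a supposedly fair coin.
print(tail_prob_at_least(20, 20))  # 20 heads, 0 tails
print(tail_prob_at_least(16, 20))  # 16 or more heads

# Assumed illustrations of a 55-percent clip at two sample sizes.
print(tail_prob_at_least(55, 100))
print(tail_prob_at_least(550, 1000))
```

Twenty straight heads works out to 1 in 2^20 (1,048,576), which matches the "less than 1 in a million" remark. Sixteen or more heads in 20 comes out around 0.6 percent, so how suspicious that looks depends on the threshold you choose.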

So understand that the sample size is only one of two factors in determining the significance of the outlier. The other is the magnitude.

That’s why when you see Verlander strike out 60 batters in 44 IP or Joe Mauer (f*** it, I’ll mention him) hit 11 home runs, you cannot simply say, “it’s only one month, I’m not a believer” without also considering the magnitude of the deviation.

Is 60K in 44 IP a big enough magnitude to make one month significant? Is 11 home runs? In my opinion, yes. But whatever your opinion, you must address both factors if you’re going to get a good gauge of whether it’s dumb luck or a new baseline.

Repoz Posted: June 02, 2009 at 08:49 PM | 14 comment(s)
  Tags: sabermetrics, twins

Reader Comments and Retorts

   1. bpasinko Posted: June 02, 2009 at 09:05 PM (#3203748)
Is he trying to say Mauer is for real in the most subtle way so when he hits 30 homers he can say I told ya so?
   2. Walt Davis Posted: June 03, 2009 at 12:58 AM (#3204237)
opinion? opinion?

You know, there is the whole discipline known as statistics and, you know, it might have occurred to one or two statisticians that this is a question worth answering.

He's stumbled into "the difference between two proportions." For quick and dirty, we'll assume they're independent -- you could certainly argue they're not since it's Mauer in both samples but we're only assuming that the old PAs are independent of the new PAs conditional on it being Mauer. You could also argue neither of these is a sample since we're looking at the entire population of Mauer PAs but then I'd have to do some hand-waving about super-populations and nobody wants that.

The difference is p1-p2 and its standard error is:

sqrt [ p1(1-p1)/n1 + p2(1-p2)/n2]

and here we get a z over 5 so yes it's "significant" except ....
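A rough sketch of that calculation, for anyone who wants to plug in numbers. The inputs are assumptions: 12 HR in 126 PA for the new sample, and 44 HR in 2388 PA for the old one, obtained by subtracting this year from the career line quoted in comment 14 (56 HR in 2514 PA), on the guess that the career line includes this year. The `pooled` option is the textbook alternative that estimates one common rate under the null hypothesis. The thread does not say which form or which inputs were actually used.

```python
import math

def two_proportion_z(x1, n1, x2, n2, pooled=False):
    """z statistic for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        # Single common rate under the null of "no change".
        p = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # Unpooled form: sqrt[p1(1-p1)/n1 + p2(1-p2)/n2]
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Assumed inputs (see the note above): new sample vs. old sample.
print(round(two_proportion_z(12, 126, 44, 2388), 2))
print(round(two_proportion_z(12, 126, 44, 2388, pooled=True), 2))
```

With these assumed inputs the unpooled form gives a z near 2.9 and the pooled form a z near 5.7, so a figure "over 5" is consistent with the pooled version or with somewhat different inputs.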

The issue here is that this is a post-hoc test. If you'd hypothesized beforehand that Mauer would come back with a higher HR rate and now wanted to test your conclusion, this would be a legit test. But the chance that at least one player in MLB would see a huge spike in HR rate in May is a lot higher than the chance that any one player picked in advance would.
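The post-hoc point can be sketched with one line of arithmetic: if each of N players independently has a small chance p of a spike this large, the chance that somebody shows one is 1 - (1 - p)^N. The values below (p = 0.001, N = 300 regulars) are purely hypothetical, and real players are not identical or fully independent, so treat this as an illustration only.

```python
def prob_at_least_one(p_single, n_players):
    """Chance that at least one of n independent players shows the spike."""
    return 1 - (1 - p_single) ** n_players

# Hypothetical numbers: a 1-in-1000 spike for any given player, 300 regulars.
print(round(prob_at_least_one(0.001, 300), 3))
```

Under those made-up numbers, a 1-in-1000 event for a named player becomes roughly a 1-in-4 event for "some player in the league".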

Still, it's over a 5-sigma variation which does suggest that this is a "process out of control" but all that tells us is that the "failure" rate (where each HR is a "failure") is higher than it used to be. The question then becomes what's the new rate. It will be several hundred PA before we'll have a good estimate of that but I'll just point out that Adrian Gonzalez had 11 HR in May in 5 fewer PA so there's no reason to think Mauer's new level of talent is any higher than Gonzalez's (who, to this point, has been a 25-35 HR guy though obviously he's on a torrid pace right now).

And as always, I point folks towards Steve Dillard, Aug 2-17 1979 -- this sort of thing can happen "randomly."
   3. Jose Can Still Seabiscuit Posted: June 03, 2009 at 02:28 AM (#3204470)
Why do you have a random two week stretch of Steve Dillard's career committed to memory?
   4. Shock Posted: June 03, 2009 at 03:08 AM (#3204502)
Why do you have a random two week stretch of Steve Dillard's career committed to memory?


This made me laugh pretty hard.

Really Walt, who the Christ is Steve Dillard? You're showing your age here, you should have used Shane Spencer :-P
   5. Jolly Old St. Nick Done Jumped The Ship Posted: June 03, 2009 at 03:16 AM (#3204509)
Personally, I am glad to be sharing the planet both with Steve Dillard** and those who remember what he did in what must have been the two greatest weeks of his life. Knowledge like this is probably what separates us from the fetuses.

**whoever he is, or was
   6. Shock Posted: June 03, 2009 at 03:27 AM (#3204512)
Agreed.

I absolutely love that I can go to BBRef, use the "Random Page" function, pull up a name like "Ed Sixsmith," post it here, and within a few hours Steve Treder or someone will give me his entire life story. (And it will be fascinating.)
   7. Crispix Attacks 2: Swag Airlines Posted: June 03, 2009 at 03:47 AM (#3204525)
Agreed. Which brings me to my next topic, what was the deal with...um...Rick Schu's...1990? Yeah, explain that one!~
   8. Misirlou is bad, he's nationwide Posted: June 03, 2009 at 04:00 AM (#3204533)
Agreed. Which brings me to my next topic, what was the deal with...um...Rick Schu's...1990? Yeah, explain that one!~


I was going to tell a story about Rick Schu moving Mike Schmidt off third base, but I suspect you already know, ...um, "Kevin".
   9. Boots Day Posted: June 03, 2009 at 04:00 AM (#3204535)
Steve Dillard was a backup second baseman for the Red Sox. Wasn't he? Am I some sort of monster for knowing that?
   10. Chris Dial Posted: June 03, 2009 at 04:14 AM (#3204541)
Still, it's over a 5-sigma variation which does suggest that this is a "process out of control" but all that tells us is that the "failure" rate (where each HR is a "failure") is higher than it used to be.

Dammit, Walt, where were you a few weeks ago?

I was asking this question wrt pitcher-batter matchups. What's the tipping point?

5 sigma. In 20 PAs what is that? 15 hits?
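A back-of-the-envelope check on that guess, assuming a hypothetical baseline of .270 hits per PA (the thread gives no baseline) and using the binomial mean and standard deviation. The normal-style "sigma" framing is rough at n = 20, so this is a sketch rather than a proper test.

```python
import math

def sigma_threshold(n, p, sigmas):
    """Count of successes sitting `sigmas` binomial SDs above the mean."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean + sigmas * sd

# Hypothetical baseline: .270 hits per PA over 20 PA.
print(round(sigma_threshold(20, 0.270, 5), 2))
```

With that assumed baseline the 5-sigma line lands a little above 15 hits, so the guess of 15 is in the right neighborhood.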
   11. Obama Bomaye Posted: June 03, 2009 at 04:20 AM (#3204546)
Am I some sort of monster for knowing that?

Perhaps not a monster, but you do have the aroma of peanuts about you.
   12. Walt Davis Posted: June 03, 2009 at 04:35 AM (#3204555)
Why do you have a random two week stretch of Steve Dillard's career committed to memory?

I'm a Cubs fan. When a guy, a complete nobody, hits like .600 for 2 weeks with half-a-dozen HRs or whatever, in an otherwise completely pointless season, it sticks sometimes. And it comes in handy when somebody says "[actual good player] has been on fire, this can't be random."

Granted, I don't have the year and dates committed to memory; I used to go to Retrosheet and now use B-R's game logs to check.

Now do you wanna hear about the month or so when I was convinced that Scot Thompson was gonna be the next George Brett?
   13. Crispix Attacks 2: Swag Airlines Posted: June 03, 2009 at 04:38 AM (#3204557)
Can you give us a brief, 1,000 to 2,000-word encapsulation of Todd Haney's career and your thoughts on his place in history?
   14. Tango Posted: June 03, 2009 at 03:24 PM (#3204878)
All Walt is suggesting is that the "out of control" performance is no longer from the same "true talent" as from the previous performance. BUT, the new estimate of the updated true talent will not be anywhere close to the "out of control" performance. That's why he says he needs several hundred more PA.

Mauer in his career has 56 HR in 2514 PA, or 13 per 600 PA.

This year, he has 12 HR in 126 PA, or 57 per 600 PA.

Absent other information, from today to the end of the season, his HR rate will be far closer to 13 per 600 than 57 per 600.

How much though? This study from five years ago (post 2) suggests adding 131 PA of league average HR rate to your known information. So, if all we knew was the Mauer info of this year, his "true" HR rate would be about halfway between what he has hit (57 per 600) and what the league average is (16? per 600), or 36 or so per 600 PA.
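The arithmetic in the last few paragraphs can be sketched as follows. The function names are made up for illustration, the 131 PA of ballast is the figure cited above, and the league rate of 16 per 600 is the comment's own uncertain guess.

```python
def per_600(hr, pa):
    """Home runs scaled to a 600-PA season."""
    return 600 * hr / pa

def regressed_per_600(hr, pa, league_per_600, ballast_pa=131):
    """Add `ballast_pa` PA of league-average HR rate, then rescale to 600 PA."""
    league_hr = ballast_pa * league_per_600 / 600
    return 600 * (hr + league_hr) / (pa + ballast_pa)

print(round(per_600(56, 2514), 1))                # career rate
print(round(per_600(12, 126), 1))                 # this year's rate
print(round(regressed_per_600(12, 126, 16), 1))   # this year, regressed
```

These come out to about 13.4, 57.1, and 36.2 per 600 PA. Because 126 observed PA and 131 ballast PA are nearly equal, the regressed figure sits close to the midpoint of the observed rate and the league rate.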

However, for Mauer, we also have his past career, which shows him to not be a HR hitter. We can't simply discard that information (unless of course we have some reason to believe that the Mauer of old approaches batting differently than the Mauer of new).

Just taking a WAG, if we consider the Mauer of old providing a good prior, that the Mauer of new is showing a strong indicator, and we always have our trusty friend regression toward the mean, my WAG is that Mauer will hit 25 per 600 PA from now to the end of the season.

Of course, being a binomial means that I'm 95% sure that it's 25 per 600 PA, plus/minus 12 HR. Which basically means that whatever I say, or anyone says, insofar as Mauer is concerned, will be practically untestable.
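For a sense of where a margin like that comes from, here is a minimal sketch using the normal approximation to the binomial. The number of remaining PA is not stated in the comment; 400 is an assumption chosen for illustration, and the function name is hypothetical.

```python
import math

def rate_margin_per_600(rate_per_600, remaining_pa, z=1.96):
    """Approximate 95% margin on an observed HR rate, expressed per 600 PA."""
    p = rate_per_600 / 600
    se = math.sqrt(p * (1 - p) / remaining_pa)
    return 600 * z * se

print(round(rate_margin_per_600(25, 400), 1))  # assumed 400 PA remaining
print(round(rate_margin_per_600(25, 600), 1))  # a full 600 PA, for comparison
```

With 400 PA assumed the margin is just under 12 HR per 600 PA, and with a full 600 PA it is a bit under 10. Either way the interval is wide enough to cover most plausible forecasts.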
