Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Monday, April 07, 2008

Bialik: A Numbers Guy Quiz on Probability

Attention all Mlodinowsters! (ahem…I would knock off this quiz in a minute, if I wasn’t busy tracking down that ‘Arnold Stang vs Super-Mechagodzilla’ video!)

Mr. Mlodinow, a visiting lecturer at Cal Tech and co-author of “A Brief History of Time” with Stephen Hawking, peppers “The Drunkard’s Walk,” set to be published next month, with dozens of examples, ranging from historical to personal to newsy. They serve to sketch an engaging history of probability and statistics, and to bolster his underlying thesis that the randomness that afflicts a sot’s amblings is pervasive in our lives. Furthermore, Mr. Mlodinow argues, we often act as if under the influence, failing to recognize the randomness in patterns and the patterns in randomness.

1. Suppose 1,000 athletes are tested for drugs. One in 10 have used the drugs, and the test has a 1% false-positive rate (and the false-negative rate is negligible). If an athlete from this group tests positive, what is the probability that she has used the drugs, to the nearest percentage point?

4. In baseball, suppose the American League champion is better than the National League champion, such that it has a 55% probability of winning each game against the NL champ. Then the NL champ nonetheless will win a best-of-seven-games series four in 10 times. What is the smallest odd number, X, for which a World Series between these two league champs that is best-of-X will ensure that there’s a 95% probability of a just result — the superior AL champ winning?

Repoz Posted: April 07, 2008 at 06:49 PM | 100 comment(s)
  Tags: sabermetrics

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Matt Clement of Alexandria Posted: April 07, 2008 at 07:21 PM (#2733733)
(1) is a good use of statistics, applying statistics to a statistical problem.

(4) is not a bad use of statistics, but part of a larger problematic with the use of statistics. The idea that teams possess a single factor of quality which can thus be universally and decontextually compared has not been shown to be the case, and is highly unlikely to be the case. It is true that lots of things happen in baseball that we don't expect, and some of them do not happen because one team expressed their real superiority over the other team, but the exact measure of random variation in baseball is not something we can learn from this sort of methodology. The presumption that this is at least a good estimate has also not been shown to be the case, and I would dispute that it is the case in many specific situations.
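For anyone who wants to check their arithmetic on question (1), here is a minimal sketch of the Bayes calculation. It assumes exactly the numbers in the question (one in 10 users, a 1% false-positive rate, and a false-negative rate treated as zero); the function name and parameters are made up for illustration.

```python
from fractions import Fraction

def prob_user_given_positive(prevalence, false_positive_rate,
                             false_negative_rate=Fraction(0)):
    """P(used drugs | tested positive), by Bayes' rule."""
    # Users who test positive.
    true_pos = prevalence * (1 - false_negative_rate)
    # Non-users who test positive anyway.
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Numbers taken from question 1; the negligible false-negative rate is set to 0.
posterior = prob_user_given_positive(Fraction(1, 10), Fraction(1, 100))
print(posterior, round(float(posterior) * 100))
```

In head-count terms: of 1,000 athletes, 100 users all test positive and 9 of the 900 non-users test positive, so 100 of the 109 positives are real.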
   2. Walt Davis Posted: April 07, 2008 at 08:07 PM (#2733803)
Assuming he's going for the typical answer, #5 has a hidden assumption -- that the host always opens a door with a cow behind it. If the host randomly chooses which door to open, you get a different answer. That is, what matters isn't the host's knowledge but the host's method of selection of which door to open.
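The difference between the two host behaviors in #2 can be checked by listing every case. This is only a sketch under stated assumptions: three doors, the car placed uniformly at random, the contestant's pick labeled door 0, and a host who either always opens a cow door (choosing at random when both other doors hide cows) or opens one of the other two doors blindly. The names are hypothetical.

```python
from fractions import Fraction

DOORS = (0, 1, 2)
PICK = 0  # contestant's first choice; by symmetry the label does not matter

def switch_win_probability(host_knows):
    """Exact P(switching wins | the opened door showed a cow)."""
    cow_revealed = Fraction(0)  # probability the opened door shows a cow
    switch_wins = Fraction(0)   # ...and the other closed door hides the car
    others = [d for d in DOORS if d != PICK]
    for car in DOORS:
        # A knowing host never opens the car; a random host may.
        options = [d for d in others if d != car] if host_knows else others
        for opened in options:
            p = Fraction(1, 3) * Fraction(1, len(options))
            if opened == car:
                continue  # car exposed: not the situation the question describes
            cow_revealed += p
            remaining = next(d for d in others if d != opened)
            if remaining == car:
                switch_wins += p
    return switch_wins / cow_revealed

print(switch_win_probability(host_knows=True))   # host always shows a cow
print(switch_win_probability(host_knows=False))  # host opens a door at random
```

Under these assumptions the knowing host gives 2/3 for switching, while the blind host who happens to show a cow gives 1/2, which is the different answer #2 is pointing at.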
   3. poludamas Posted: April 07, 2008 at 08:09 PM (#2733812)
There are additional minor wrinkles in the child gender questions relating to the unequal frequencies with which males and females are born.
   4. Moe Greene Posted: April 07, 2008 at 08:16 PM (#2733836)
Re Question #5: Actually, this "Let's Make a Deal" question comes up at my work from time to time, because one of my colleagues likes to use it in interviews for job candidates.

For some people, applying Bayes to this question is easy to understand. For others, they can't figure out how the new probability is NOT 50-50, which I guess is what Walt's referring to in post #2 above.
   5. Danny Posted: April 07, 2008 at 08:19 PM (#2733842)
I first heard--and learned the answers to--2, 3, and 5 in a long thread here about 4 years ago.
   6. BDC Posted: April 07, 2008 at 08:28 PM (#2733868)
The Monty Hall problem was baffling to me until somebody posed it this way: if there are a hundred doors, you choose one, and Monty opens 98 prizeless doors, should you switch?
   7. snapper (history's 42nd greatest monster) Posted: April 07, 2008 at 08:35 PM (#2733887)
#8 has nothing to do with statistics, it's just a fact. Letters in words are not distributed by any sort of stochastic distribution, thus, can not be estimated probabilistically.
   8. Mirabelli Dictu (Chris McClinch) Posted: April 07, 2008 at 08:44 PM (#2733905)
#8 has nothing to do with statistics, it's just a fact. Letters in words are not distributed by any sort of stochastic distribution, thus, can not be estimated probabilistically.

Yes and no. Knowing nothing of the English language, it would still be probable that there is at least one word of the form ????N? that is not of the form ???ING. All words of the form ???ING would necessarily be of the form ????N?, however.
   9. snapper (history's 42nd greatest monster) Posted: April 07, 2008 at 08:52 PM (#2733911)
All words of the form ???ING would necessarily be of the form ????N?, however.

Correct. That's the fact I'm referring to. I guess it's more of a logic exercise, but still, not probability.
   10. Padraic Posted: April 07, 2008 at 09:11 PM (#2733958)
Are we allowed to talk about the answer to the cow thing? I think I understand it, but I also think I have a problem with what I think is the answer.

I guess I can put it this way. Does the answer change if instead of it being phrased as "changing", you are simply asked to pick door 1 or 2? Meaning after Monty shows you the cow in door three, he doesn't say "do you want to change" but instead says "you can now pick between 1 or 2."

If this is the case, does it matter if you pick 1 or 2?
   11. Never Give an Inge (Dave) Posted: April 07, 2008 at 09:12 PM (#2733963)
Yes and no. Knowing nothing of the English language, it would still be probable that there is at least one word of the form ????N? that is not of the form ???ING.

I noticed at least one in the article.
   12. Kiko Sakata Posted: April 07, 2008 at 09:18 PM (#2733983)
Meaning after Monty shows you the cow in door three, he doesn't say "do you want to change" but instead says "you can now pick between 1 or 2."

If this is the case, does it matter if you pick 1 or 2?


I think the answer would be the same.

[warning: I'm going to give away the answer here]

Assuming Walt's assumption from #2, let's say you originally picked door #1, and Monty Hall revealed a cow in door #3. When you picked door #1, the odds of a car being there were 1 in 3. Hall revealing a cow does nothing to change those odds, because you already knew that either door 2 or door 3 (or both) had a cow behind it. But having revealed a cow behind door #3 does change your odds with respect to door #2 - there's now one cow and one car, so the odds of a car behind door #2 are 50/50. So you should pick door #2 regardless of how the question is worded.
   13. Padraic Posted: April 07, 2008 at 09:27 PM (#2734017)
Thanks kiko, that's how I looked at the problem too. When you pick door #1 you have a 33% chance of being right, but if you switch, it's 50%.

But, and this is where I'm confused, if you select door #1 anew as in being unrelated to your first choice, then I don't see how this selection isn't 50%. I'm saying that this selection is a new event (even if it's the same door) and so doesn't have the same "old" 33% chance, but a 50 percent chance.

I'm sure I am wrong and people love to explain this stuff, so shoot.
   14. Shock Posted: April 07, 2008 at 09:29 PM (#2734021)
Another way of explaining:

If you play the game 100 times and every time you stick with your original guess (or "re-pick" your original guess) you would expect to win about 33 times, as that is the probability.

But if you play the game 100 times and switch every time, you would expect to win 50 times, as half the time you will switch to the car and half the time you will switch to the cow.
   15. Hal Chase Headley Lamarr Hoyt Wilhelm (ACE1242) Posted: April 07, 2008 at 09:32 PM (#2734038)
Enough of you probabilisticians and your fancy numbers. Where can I go on the web to get PBP data for every contestant on every show of LMAD?
   16. Kiko Sakata Posted: April 07, 2008 at 09:35 PM (#2734047)
if you select door #1 anew as in being unrelated to your first choice, then I don't see how this selection isn't 50%


Because there was no chance that Hall would reveal a cow behind door #1 under the rules of the game. Once you make your initial choice, he's going to open one of the other two doors and reveal a cow (this is Walt's assumption, which is required to make this work). But, as I said, we already knew there was a cow behind at least one of the other two doors, so we haven't learned anything new with respect to door #1. The conditional probability is the same as the unconditional probability here. But for door #2, we've learned something new - there was definitely a cow behind door #3, which leaves only two equally likely possibilities (with respect to door #2).

If it's not intuitive, I think the only other way to show it is to list out all of the possibilities and just crunch out the numbers.
   17. Never Give an Inge (Dave) Posted: April 07, 2008 at 09:36 PM (#2734051)
If you switch, you have a 2/3 chance of being right. The probabilities have to sum to 1 at each stage of the game.
   18. DKDC Posted: April 07, 2008 at 09:54 PM (#2734085)
I think the alternate version in #6 is the easiest way to visualize it.

If there are 100 doors, you have a 1 in 100 chance of randomly picking the door with the car behind it. There's a 99% chance that the car is somewhere else. This fact won't change no matter what Monty does.

After Monty opens every door except yours and one other, the other unopened door has a 99% chance of having the car behind it. The probability of having the car collapses from those 99 doors onto that one door. Your door is still stuck at 1%.
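The 100-door version is easy to try by simulation. A rough sketch, assuming the host knows where the car is and always leaves closed only the contestant's door and one other (the car's door if the contestant missed, otherwise a random cow door); the function names, trial count, and seed are arbitrary choices for illustration.

```python
import random

def play(n_doors, switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    if pick == car:
        # Every other door hides a cow, so the host leaves a random one closed.
        left_closed = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        # The host cannot open the car's door, so that is the one left closed.
        left_closed = car
    final = left_closed if switch else pick
    return final == car

def win_rate(n_doors, switch, trials=100_000, seed=0):
    """Fraction of simulated games won with a fixed stay-or-switch strategy."""
    rng = random.Random(seed)
    wins = sum(play(n_doors, switch, rng) for _ in range(trials))
    return wins / trials

print("100 doors, switch:", win_rate(100, switch=True))
print("100 doors, stay:  ", win_rate(100, switch=False))
print("3 doors, switch:  ", win_rate(3, switch=True))
```

With 100 doors the switching rate should hover near 99% and the staying rate near 1%, and the same code with 3 doors gives roughly 2/3 for switching.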
   19. Padraic Posted: April 07, 2008 at 09:55 PM (#2734091)
I do get it on a statistical level (the 33 vs. 50 was my first thought), but there is something of a continuity between the events in that I have knowledge of the first event.

If Monty had sent me home to think about it and I told my wife in the farmer's daughter's outfit to go in the next day with the instructions, "pick one a 'dem doors honey" then would the odds be 50/50 whether she picked 1 or 2?
   20. unemployed Jeff Posted: April 07, 2008 at 09:56 PM (#2734093)
The surgeon is his mother!

wait, what?
   21. Boots Day Posted: April 07, 2008 at 09:59 PM (#2734102)
I can't get into the article for some reason, but that second question in the excerpt reminded me of "The Black Swan," which I am reading right now. In that book, Taleb makes the point that using casino odds as a proxy or example of probabilities in the real world is totally bogus, because we know the exact odds in the casino, but we never, ever know the exact odds of anything in real life.

We never, ever know that one team has a 55 percent probability of beating another team. If nothing else, the fact that both teams will use three or four different starting pitchers in a series means that the probabilities will be different in every game. Then there are a thousand or so other variables you have to account for before you could know a true probability.
   22. Hal Chase Headley Lamarr Hoyt Wilhelm (ACE1242) Posted: April 07, 2008 at 10:05 PM (#2734118)
We never, ever know that one team has a 55 percent probability of beating another team.

True enough. One of the points of the exercise, though, is that even if you do know, the knowledge isn't worth a lot.
   23. KingKaufman Posted: April 07, 2008 at 10:21 PM (#2734168)
After Monty opens every door except yours and one other, the other unopened door has a 99% chance of having the car behind it. The probability of having the car collapses from those 99 doors onto that one door. Your door is still stuck at 1%.

Why? Monty has revealed two facts. There are 98 open doors with cows behind them. And there are two unopened doors, one with a cow behind it and one with a car behind it. Why is there a 99 percent chance that the door I didn't pick has the car?
   24. Voros McCracken of Pinkus Posted: April 07, 2008 at 10:22 PM (#2734172)
In that book, Taleb makes the point that using casino odds as a proxy or example of probabilities in the real world is totally bogus, because we know the exact odds in the casino

But we don't, actually. Probability models model reality, but the model never represents the "true" probability because even in roulette, dice and cards there are various effects that can alter the probabilities in small but real ways. There's the Spanish guy who got thrown out of all the casinos in Europe for tracking the biases on roulette wheels and using it to make money.

The effects that different pitchers, different playing conditions, etc. have on the outcome of baseball games may be more obvious effects that change that probability, but those effects exist in all sorts of seemingly "random" probabilities. After all if you could throw a pair of dice exactly the same under the exact same conditions, you'd get the same result every time. The probability model is simply a mathematical approximation of all of the various factors that come into play that would be difficult to impossible to model with any real accuracy.

So the real key, once we have the probabilistic model, is to estimate how well that model approaches reality. With "fair" dice it seems to work well. Probably not as well with the baseball example he gives, but I bet it's not a terrible approximation either.
   25. Eric J can SABER all he wants to Posted: April 07, 2008 at 10:33 PM (#2734193)
23: Basically, because the door you originally chose could not be opened regardless of whether or not there is a car behind it, while there is an excellent chance that the other door would have been opened if there was a cow behind it.
   26. minsc Posted: April 07, 2008 at 10:37 PM (#2734199)
So what's the deal with problems 2 and 3? How does knowing the girl's name change anything?
   27. Jimmy P Posted: April 07, 2008 at 10:38 PM (#2734200)
I'm missing how question 2 and 3 are different. Did he forget a word or a phrase?
   28. Never Give an Inge (Dave) Posted: April 07, 2008 at 10:42 PM (#2734205)
Why? Monty has revealed two facts. There are 98 open doors with cows behind them. And there are two unopened doors, one with a cow behind it and one with a car behind it. Why is there a 99 percent chance that the door I didn't pick has the car?

I'm not sure what the 100-door version of the problem adds, so I'm going back to the 3-door example.

The key point is that Monty couldn't open your door. He didn't have three doors to open and left two of them closed. He had two doors to open and left one of them closed. You told him to keep door #1 closed, so he didn't provide you with any new information about what was behind door #1.

When you chose door #1 the first time, there was 1/3 chance the car was behind door #1 and a 2/3 chance the car was behind doors #2 or #3.

Now, he has shown you it's not behind #3. The odds of it being behind #1 are still 1/3, because he has given you no new information about door #1. No matter where the car is, he's always going to be able to open another door that isn't #1 and show you a cow behind it.

If the odds of the car being behind door #1 are still 1/3, then the odds of it being behind #2 must be 2/3. So you should switch to door #2.

Another way to look at it is that he has told you, "If the car is behind either doors 2 or 3, it's NOT behind door 3." That is valuable information about door #2.

In the 100-door example, he's basically saying, "Of these 99 doors, I can show you there's no car behind 98 of them. But I CAN'T show you what's behind this last one." Because he knows where the car is, that's pretty valuable information.
   29. Rich Rifkin I Posted: April 07, 2008 at 10:44 PM (#2734207)
"The idea that teams possess a single factor of quality which can thus be universally and decontextually compared has not been shown to be the case, and is highly unlikely to be the case."

Decontextual is not a word. I looked for some time to try to find a synonym for "out of context" but couldn't find one. Nevertheless, if you want a neologism, I would go with excontextually. Ex- means "out of," while de- means "removal," so either could work. But excontextual sounds right to my ear.
   30. villageidiom Posted: April 07, 2008 at 10:46 PM (#2734211)
We never, ever know that one team has a 55 percent probability of beating another team. If nothing else, the fact that both teams will use three or four different starting pitchers in a series means that the probabilities will be different in every game. Then there are a thousand or so other variables you have to account for before you could know a true probability.

I was going to post this in response to MCA, but figured it was nitpicking and didn't bother. But since it continues...

The question reads: "In baseball, suppose..." Does it really matter if what he's asking you to suppose is unlikely or is simpler than really happens? He starts question 5 with "Suppose you're on a game show..." I'm surprised nobody has said "Well, this question is flawed because I'm never on a game show, and Let's Make A Deal is off the air, so who does he think he's fooling?"
   31. Eric J can SABER all he wants to Posted: April 07, 2008 at 10:52 PM (#2734222)
Is there a shortcut to doing #4 that I'm not catching? I think I've gotten to the answer by the brute force approach, but that doesn't seem to mesh well with the rest of the questions posed.
   32. Boots Day Posted: April 07, 2008 at 11:01 PM (#2734232)
The question reads: "In baseball, suppose..." Does it really matter if what he's asking you to suppose is unlikely or is simpler than really happens?

I posted something that I thought was germane to the topic. If you don't think it is, which is certainly within your rights, feel free not to comment on it.
   33. Hal Chase Headley Lamarr Hoyt Wilhelm (ACE1242) Posted: April 07, 2008 at 11:03 PM (#2734235)
I took time off from watching a game and stuck my nose back into a spreadsheet to tackle #4. I anticipated that the answer was going to be big, but not that it would be quite that big.
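The brute-force or spreadsheet approach mentioned in #31 and #33 might look something like the sketch below. It takes the question's premise at face value (a fixed 55% chance in every game, games independent), which is exactly the simplification debated upthread; the function names are invented for this example.

```python
from math import comb

def series_win_probability(p, games):
    """P(a team with per-game win probability p wins a best-of-`games` series).

    Treats the series as if all `games` games were played; winning a majority
    of them is equivalent to clinching first.
    """
    needed = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(needed, games + 1))

def shortest_fair_series(p, confidence):
    """Smallest odd series length at which the better team wins with P >= confidence."""
    games = 1
    while series_win_probability(p, games) < confidence:
        games += 2  # only odd lengths, so a series cannot end tied
    return games

print(series_win_probability(0.55, 7))   # chance the 55% team takes a best-of-seven
print(shortest_fair_series(0.55, 0.95))  # question 4's X under these assumptions
```

The best-of-seven figure comes out near 0.61, matching the "four in 10" for the underdog in the question, and the loop should stop at a series a few hundred games long.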
   34. mchengcit Posted: April 07, 2008 at 11:04 PM (#2734237)
Another explanation to the Monty Hall problem:

Let's call the door which hides the car Door #1. Of course, you don't know this, so let's say you pick randomly:

You pick each door with 1/3 probability. Going on a case-by-case basis:

A) (1/3 probability) You pick Door #1. Monty now has to show you what's behind one of the other two doors. Both have cows, so he opens either with equal probability (1/2):
A1) (1/6 probability) Monty shows you a cow behind Door #2
A2) (1/6 probability) Monty shows you a cow behind Door #3.

B) (1/3 probability) You pick Door #2. Monty now has to open a door, but he can't open Door #1, because that has a car. Therefore, he is forced to open Door #3.

C) (1/3 probability) You pick Door #3. Monty now has to open Door #2, to show you a cow.

So, if Monty opens Door #3 to show you a cow, it's twice as likely that Door #2 hides the car (1/3 vs. 1/6). Same thing if Monty opens Door #2. As Dave says, the fact that Monty shows you that a car is NOT behind the door he opens is valuable information.

The problem is ambiguously worded. If Monty opens doors randomly, then all of the above do not apply. You don't gain any information about what's behind the third door because there's some probability that the door he opened could have revealed a car.

Gender ratio problem:

The answers to the two questions are different.

In Question #2, the conditions apply equally well to boy/girl families as well as to girl/girl families.
In Question #3, by assigning a name to one of the girls, it is twice as likely that the condition applies to a girl/girl family than to a boy/girl family. The fact that the name is very rare doesn't matter, I think. Try replacing the girl's rare name with "The eldest child is a girl", and you get the same result.
   35. DKDC Posted: April 07, 2008 at 11:07 PM (#2734244)
I like the 100 door example because it highlights how unlikely it is for you to pick the right door at the beginning. You'd have to be one lucky SOB to pick the 1 in 100 doors with the car behind it.

Once I've picked it, I'm not going to gain any additional information about my door. No matter which door I choose, Monty's move is going to look the same to me: he'll reveal 98 losers.

So my 1% chance that I picked the right door from the outset doesn't change.
   36. Padraic Posted: April 07, 2008 at 11:12 PM (#2734251)
the fact that Monty shows you that a car is NOT behind the door he opens is valuable information.


I guess this is why I made a distinction earlier between "keeping" your choice and "picking" door #1 a second time. If you pick #1 a second time, this time you are doing so with the knowledge that door #3 has a cow, and that, therefore only one cow remains.

When you picked it the first time of course you didn't have that information but the second time you do.
   37. mchengcit Posted: April 07, 2008 at 11:16 PM (#2734253)

I guess this is why I made a distinction earlier between "keeping" your choice and "picking" door #1 a second time. If you pick #1 a second time, this time you are doing so with the knowledge that door #3 has a cow, and that, therefore only one cow remains.

When you picked it the first time of course you didn't have that information but the second time you do.


However, you also have information about door #2 by virtue of the fact that Monty DIDN'T open it, so door #1 and door #2 are no longer on equal footing.
   38. Padraic Posted: April 07, 2008 at 11:26 PM (#2734272)
you also have information about door #2 by virtue of the fact that Monty DIDN'T open it

True, I guess, but it still seems odd that if another contestant were brought on at that point with no knowledge of what had just transpired, he would have a 50% chance of being right if he guessed #1 while I have a 33% chance if I picked 1 again.

Or maybe it's not so odd since the whole question is about knowledge anyway...
   39. Never Give an Inge (Dave) Posted: April 07, 2008 at 11:28 PM (#2734277)
In Question #2, the conditions apply equally well to boy/girl families as well as to girl/girl families.
In Question #3, by assigning a name to one of the girls, it is twice as likely that the condition applies to a girl/girl family than to a boy/girl family. The fact that the name is very rare doesn't matter, I think. Try replacing the girl's rare name with "The eldest child is a girl", and you get the same result.


I agree that this is probably what he's getting at with the question. But I don't think knowing the name actually helps you, because once you know that there's at least one girl, you also know that one of the children has a girl's name. It's not true that "In Question #3, by assigning a name to one of the girls, it is twice as likely that the condition applies to a girl/girl family than to a boy/girl family."

Knowing that the eldest child is a girl, however, does change the question.
   40. Never Give an Inge (Dave) Posted: April 07, 2008 at 11:34 PM (#2734283)
True, I guess, but it still seems odd that if another contestant were brought on at that point with no knowledge of what had just transpired, he would have a 50% chance of being right if he guessed #1 while I have a 33% chance if I picked 1 again.

You'd both have the same odds of being right by picking #1 (33%). The difference is that you'd know your odds were 33%, while the other contestant would think his odds were 50%. That's the value of the added information.
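One way to see the point about the uninformed second contestant is to simulate three players on the same games: one who stays, one who switches, and a newcomer who sees only two closed doors and guesses between them at random. This is a sketch under the usual assumptions (car placed at random, host knowingly opens a cow door other than the pick); the strategy labels, trial count, and seed are illustrative.

```python
import random

def simulate(trials=100_000, seed=1):
    """Return win rates for stay, switch, and a coin-flip newcomer."""
    rng = random.Random(seed)
    wins = {"stay": 0, "switch": 0, "coin_flip": 0}
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host knowingly opens a cow door that is not the contestant's pick.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        other = next(d for d in range(3) if d != pick and d != opened)
        wins["stay"] += (pick == car)
        wins["switch"] += (other == car)
        # Newcomer with no history: picks one of the two closed doors at random.
        wins["coin_flip"] += (rng.choice([pick, other]) == car)
    return {name: count / trials for name, count in wins.items()}

rates = simulate()
print(rates)
```

The newcomer wins about half the time overall only because the random guess lands on the 1/3 door half the time and the 2/3 door the other half; any guess that lands on the originally picked door still wins about a third of the time.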
   41. mchengcit Posted: April 07, 2008 at 11:45 PM (#2734306)
I agree that this is probably what he's getting at with the question. But I don't think knowing the name actually helps you, because once you know that there's at least one girl, you also know that one of the children has a girl's name. It's not true that "In Question #3, by assigning a name to one of the girls, it is twice as likely that the condition applies to a girl/girl family than to a boy/girl family."


Knowing the girl's name does make it twice as likely, because either of the two girls may have that name. In the question, it states that the name is one that one in a million females share. The probability that a girl/girl family has one girl with that name is actually 2 in a million (assuming of course the parents select that name with the same probability as the general population). A girl/girl family is twice as likely to meet the condition as a boy/girl family, where only one girl is available.
   42. Never Give an Inge (Dave) Posted: April 07, 2008 at 11:57 PM (#2734319)
(assuming of course the parents select that name with the same probability as the general population).

That's a pretty big assumption, though, isn't it?
   43. villageidiom Posted: April 08, 2008 at 12:02 AM (#2734325)
On question #5? There's no advantage to switch, given door 3 is revealed. Had he said that one of the other doors is revealed to be a cow, instead of door 3 specifically, then it would be to your advantage to adopt a switching strategy. But since he said door 3, that changes things.

Before a door is revealed, the probability of the other (not yours, not revealed) door having the car is 2/3. If in general you decide to switch, regardless of which door is revealed, you'll get the car 2/3 of the time. One-third of the time your first choice was correct (and switching was the wrong thing to do), while two-thirds of the time you'll switch to the correct door.

But that's not what was presented in the question. The question deals with the decision to switch if door 3 is revealed. That indeed changes the probability. If you know door 3 has a cow, the probability of a car doesn't shift just to door 2. It shifts to both doors equally.

This is very similar to the distinction between his questions 2 and 3 (or at least if you use older/younger instead of specific name, as someone suggested above). If you know general information, the probabilities are different from when you know specific info. If you always pick door 1 to start, you should always switch. If you always pick door 1 and he always reveals door 3, it doesn't matter whether you switch or not.

So, whoever said the question was poorly worded, I agree. He's asking about the switching strategy when you pick door 1 and are shown door 3. If he generalizes the problem, it's a different answer.


For those who like formulae:

Pr(car=1 | car<>2 and/or car<>3) = Pr(car<>2 and/or car<>3 | car=1) * Pr(car=1) / Pr(car<>2 and/or car<>3) = 100% * 33% / 100% = 33%.

Pr(car=1 | car<>3) = Pr(car<>3 | car=1) * Pr(car=1) / Pr(car<>3) = 100% * 33% / 67% = 50%.
   44. mchengcit Posted: April 08, 2008 at 12:13 AM (#2734341)
That's a pretty big assumption, though, isn't it?


For any one set of parents, yes. I guess what I am trying to say is that in the entire population, specifying that a girl is named X will select out girl/girl families with twice the frequency as boy/girl families, just because either of the girls can be named X. That of course is not true if you say, "The girl has a name", because in that case, all families will be counted equally whether they are boy/girl or girl/girl.

If you happen to know Bayes' Theorem, then it's easy:

P(girl/girl family | a girl in family has rare name) = P(a girl in family has rare name | girl/girl family) P(girl/girl family)/P(a girl in family has rare name)

P(a girl in family has rare name) = 1/1,000,000 (specified in question)
P(girl/girl family) = 1/4
P(a girl in family has rare name | girl/girl family) = 1 - (1 - 1/1000000)^2 = 2/1,000,000

P(girl/girl family | a girl in family has rare name) = (2/1,000,000 * 1/4)/(1/1,000,000) = 1/2

I guess this is where the rare name comes in because, if the name were common, then the probability of a girl in a girl/girl family having that name is not quite twice the probability of any girl having that name.
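The Bayes calculation in #44 can be done exactly by enumerating the four equally likely two-child families. A sketch only, resting on the assumptions argued over above: boys and girls are equally likely, and each girl independently gets the rare name at the stated rate (so both girls could share it). The function and variable names are hypothetical.

```python
from fractions import Fraction

def prob_two_girls_given_named_girl(name_rate):
    """P(girl/girl | at least one girl in the family has the given name)."""
    total = Fraction(0)      # P(some girl in the family has the name)
    girl_girl = Fraction(0)  # ...and the family is girl/girl
    for first in "BG":
        for second in "BG":
            p_family = Fraction(1, 4)  # each birth order is equally likely
            girls = (first + second).count("G")
            # Chance that at least one of `girls` independent girls has the name.
            p_name = 1 - (1 - name_rate) ** girls
            total += p_family * p_name
            if girls == 2:
                girl_girl += p_family * p_name
    return girl_girl / total

rare = prob_two_girls_given_named_girl(Fraction(1, 1_000_000))
print(rare, float(rare))
# A rate of 1 reduces the condition to "at least one girl".
print(prob_two_girls_given_named_girl(Fraction(1)))
```

The result works out to (2 - r)/(4 - r) for name rate r, which sits just under 1/2 for a one-in-a-million name and falls to 1/3 as the name becomes universal, in line with the closing remark in #44 about common names.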
   45. Never Give an Inge (Dave) Posted: April 08, 2008 at 12:39 AM (#2734370)
If you always pick door 1 to start, you should always switch. If you always pick door 1 and he always reveals door 3, it doesn't matter whether you switch or not.

I disagree. First of all, you are being asked about a specific instance of a game. Nowhere does he say that this is a repeated game in which only door #3 is opened.

But even if that were the case, it doesn't really matter. What matters is that the car is always randomly placed behind one of the 3 doors such that the probability is 1/3 that it will be behind any of the doors at the beginning of the game, and that Monty can only open a door that has a cow behind it.
   46. Matt Clement of Alexandria Posted: April 08, 2008 at 01:24 AM (#2734427)
The question reads: "In baseball, suppose..." Does it really matter if what he's asking you to suppose is unlikely or is simpler than really happens? He starts question 5 with "Suppose you're on a game show..." I'm surprised nobody has said "Well, this question is flawed because I'm never on a game show, and Let's Make A Deal is off the air, so who does he think he's fooling?"

I disagree.

It is a constitutive fact of baseball that the probabilities are not known beforehand. This is along the lines of saying, "In baseball, suppose that your wide receiver wins the opening faceoff..." You can only suppose things that are possible in the game.
   47. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 01:45 AM (#2734442)
So, whoever said the question was poorly worded, I agree. He's asking about the switching strategy when you pick door 1 and are shown door 3. If he generalizes the problem, it's a different answer.

He has generalized the problem. He says:

opens another door, say No. 3, to reveal a cow


Anyway, the easiest way to think about it is this:

Do you want the door you originally picked, or all of the other doors? Because you know that Monty will open a door with a cow, and you know that one of the doors you didn't pick contains a cow. So you don't really gain anything by him opening the door. You're choosing between getting door 1 and getting both door 2 and door 3.

I usually point out the expansion to 100 in explaining the problem, so that might have been me in the earlier thread. If you pick one door, then get the option of keeping your door, or getting the other 99, which would you take? It's obvious that you take the other 99, right? Just because Monty opens 98 of them to show cows doesn't change that choice. It may make you nervous, but you still got the 99 doors.
   48. Robinson Cano Plate Like Home Posted: April 08, 2008 at 01:45 AM (#2734443)
I was introduced to the Monty Hall problem in college, logic class, and I was completely amazed. I loved it. What finally made it "click" for me was to think about it this way:

Suppose you know beforehand that, no matter what, Monty is going to reveal a cow. Then you know all you have to do in order to win the car is to pick one of the cows (and switch). You clearly have a 2 in 3 chance of picking a cow (and only a 1 in 3 chance of picking the car). In other words, you know from the start that switching is a better strategy than sticking. You know this is true even if Monty doesn't offer the switch. (In fact, the "proper" state of mind to be in--having chosen but before getting the option to switch--is to hope Monty offers a switch.)

Caveat: This applies only as long as Monty's choice to reveal a cow didn't depend on whether you picked a cow. You might think that Monty knows you're a clever guy, and would only show you one of the cows if you got lucky with your first pick....

I think one of the reasons that our intuition turns against the "correct" answer is that we properly guess that Monty has more information than we do (where the car is), and that Monty has interests that don't coincide with ours (e.g., preserve the property of "Let's Make a Deal" and make good television programming, among others). Thus, the "real life" implications of this problem are somewhat less impressive than the "laboratory" results. In truth, the Monty Hall problem shows us less that people are "irrational" than it shows that we have ingrained skepticism of the motivations of people like Monty Hall--which isn't really a flaw, IMO.

What this means is the fact that the problem includes the fact that Monty knows where the car is makes it a worse choice to switch. You'd prefer to have an MC who truly doesn't know (and thus couldn't be trying to trick you). So I disagree with (2.) above--you're better off switching if Monty is opening doors randomly. If he randomly opened the car, you wouldn't have to wonder whether you should switch. You'd have already lost.
   49. Voros McCracken of Pinkus Posted: April 08, 2008 at 01:51 AM (#2734452)
It is a constitutive fact of baseball that the probabilities are not known beforehand.

But they're not known beforehand in any real life problem that probability theory models. Yes, the difference between the model and the reality might be much smaller in dice than it is in baseball, but there are any number of factors that could affect the dice roll in a way such that the results would differ slightly from the model.

Take a deck of cards for example. Suppose you shuffle a deck of cards and place them on the table. You may think of the known probability of the eight of clubs being the first card in the deck as 1 in 52, but in actuality the probability of that card being the eight of clubs is either 0% or 100%; it has already been determined. The 1 in 52 is simply a model to approximate the imperfect knowledge we have of that top card, not the real identity of that top card.

In poker hands where the eight of clubs has been dealt, is the probability that the first card is the eight of clubs in the next hand necessarily equal to 1 in 52? Couldn't it possibly be that imperfect shuffling might make it 1 in 51.6 or 1 in 52.4? Wouldn't that make whether a computer program shuffled virtual cards or a human shuffled real cards a factor in these probabilities?

Probability theory is not now, nor ever has been a reality in and of itself. It is our attempt to model our own human uncertainties. With perfect knowledge, most games of chance would be no such thing.

Whether talking cards, dice, baseball or politics, the applicability of a probability model depends entirely on its ability to approximate the real world reality. Saying that it only applies to questions with "known probabilities" is a false dichotomy because such a thing does not exist.
   50. snapper (history's 42nd greatest monster) Posted: April 08, 2008 at 01:57 AM (#2734459)
Saying that it only applies to questions with "known probabilities" is a false dichotomy because such a thing does not exist.

I think fair dice have a true, known probability.
   51. Kiko Sakata Posted: April 08, 2008 at 01:58 AM (#2734460)
You'd prefer to have an MC who truly doesn't know (and thus couldn't be trying to trick you).


But if the rules of the game are (a) Monty Hall must open a door with a cow, and (b) Monty Hall cannot open the door you've chosen, then there's no way he can trick you. If you did not pick the car at first, he's only got one choice - to reveal the one remaining cow, and if you did pick the car at first, he can open either door but the result's the same for you - you see a cow.

On the other hand, if Monty doesn't know where the car is and can hence reveal the car when he opens a door (thereby guaranteeing that you lose), then your odds are stuck on 1/3 no matter what you do.
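The "Monty doesn't know" variant in this comment can be checked with a quick simulation. This is only a sketch under stated assumptions: three doors, the host opens one of the two unpicked doors uniformly at random, and a revealed car counts as a loss for either strategy. The function name `random_host` and its result keys are made up for illustration.

```python
import random

def random_host(trials=200_000, seed=1):
    """Host opens one of the two unpicked doors at random, car or cow."""
    rng = random.Random(seed)
    cow_shown = stay_wins = switch_wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        rng.shuffle(others)
        opened, remaining = others      # host opens one; the other stays closed
        if opened == car:
            continue                    # car revealed: a loss either way
        cow_shown += 1
        stay_wins += (pick == car)
        switch_wins += (remaining == car)
    return {
        "stay_overall": stay_wins / trials,
        "switch_overall": switch_wins / trials,
        "stay_given_cow": stay_wins / cow_shown,
        "switch_given_cow": switch_wins / cow_shown,
    }

print(random_host())
```

Under these assumptions both overall rates should hover near 1/3, and once you condition on a cow having been shown, staying and switching should each come out near 1/2.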
   52. Shock Posted: April 08, 2008 at 02:01 AM (#2734464)
And I'll be the 18th person to try and explain the LMAD thing:

Suppose you play the game 1000 times, and choose to "switch" every single time. The only time you would ever lose is if you picked the car from the outset, and that would probably only happen about 333 times.

The 100-doors example makes it even more clear. If you switch every time, you would only lose if you happened to pick the car immediately from the get-go, and the odds of that happening are very low (1%.)
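The "play it 1000 times" argument is easy to try in code. A minimal sketch, assuming the conventional rules (the host knows where the car is and opens every unpicked door but one, all showing cows); `play` and `win_rate` are illustrative names, not anything from the quiz.

```python
import random

def play(n_doors, switch, rng):
    """One game under the conventional rules; returns True if the player wins."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # The host leaves exactly one other door closed and never opens the car.
    if pick == car:
        left_closed = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        left_closed = car
    final = left_closed if switch else pick
    return final == car

def win_rate(n_doors, switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(n_doors, switch, rng) for _ in range(trials)) / trials

for doors in (3, 100):
    print(doors, "stay:", win_rate(doors, False), "switch:", win_rate(doors, True))
```

With three doors the switch rate should sit near 2/3, and with 100 doors near 99%, since switching only loses when the first pick was the car.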
   53. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 02:21 AM (#2734475)
...choose to "switch" every single time. The only time you would ever lose is if you picked the car from the outset

This is an excellent and concise way to put it.
   54. Der Komminsk-sar Posted: April 08, 2008 at 02:23 AM (#2734479)
I think fair dice have a true, known probability.
By definition.

#4: I took time off from watching a game and stuck my nose back into a spreadsheet to tackle #4. I anticipated that the answer was going to be big, but not that it would be quite that big.
I was calculating this earlier today and my spreadsheet blew up (calculating the factorials) after game 171 - IIRC, I'd only reached the 80% threshold or so.

#2: P(girl/girl family) = 1/4
Strictly speaking, here it's 1 in 3 - you can eliminate boy/boy as a possibility.

Presuming I understand #7 correctly, I don't like this question, though I guess saying nearest 5 percentage points is their way of weaseling out of its asymptotic nature.
   55. Der Komminsk-sar Posted: April 08, 2008 at 02:24 AM (#2734480)
This is an excellent and concise way to put it.
Agreed - I've not done a good job of explaining this problem to others.
   56. Voros McCracken of Pinkus Posted: April 08, 2008 at 02:55 AM (#2734495)
I think fair dice have a true, known probability.

No sir. The level of divergence from a simple 1 in 6 probability for so-called "precision dice" in casinos may be very small, but it is not non-existent and is not required to be by law. It is doubtful there is a die in the world that has an exact 1 in 6 probability for every number. And if it did exist, it would almost certainly deviate from that after several uses, making a real-world confirmation of its initial probability impossible.

This is my point. The difference between the identity of a starting pitcher and the balance of a particular die (or the shuffle of a deck of cards) are differences in degree, not in kind. Therefore the applicability of a probability model should be due to how well the model conforms with reality in either case. That one is dice and the other baseball is irrelevant.
   57. Der Komminsk-sar Posted: April 08, 2008 at 03:18 AM (#2734507)
Voros,
I was about to argue that we shouldn't interpret "fair dice" in the sense of physical dice, but as a necessary abstraction - 'til I reread your posts and realized we were saying the same thing. So ignore my blurb in #56 (done and done, I'm sure).
   58. Robinson Cano Plate Like Home Posted: April 08, 2008 at 04:06 AM (#2734526)
But if the rules of the game are (a) Monty Hall must open a door with a cow,

You're adding rules. Monty *did* open a door with a cow. You don't know he *had to*.
   59. villageidiom Posted: April 08, 2008 at 04:13 AM (#2734530)
I disagree.
You disagree that I'm surprised?

He has generalized the problem. He says:

opens another door, say No. 3, to reveal a cow
Fair enough. I was looking at Monty asking me if I want to switch to door number 2, not, "say", door number 2. But now I see that he does generalize it enough prior to that.
   60. Kiko Sakata Posted: April 08, 2008 at 04:43 AM (#2734540)
Monty *did* open a door with a cow. You don't know he *had to*.


See Walt's comment #2 in this thread. The conventional version of this game assumes that Monty always reveals a cow. If Monty can open a door with either a cow or a car, then the odds of winning are exactly 1/3 no matter what you do.
   61. Never Give an Inge (Dave) Posted: April 08, 2008 at 02:48 PM (#2734715)
See Walt's comment #2 in this thread. The conventional version of this game assumes that Monty always reveals a cow. If Monty can open a door with either a cow or a car, then the odds of winning are exactly 1/3 no matter what you do.

I guess it depends on the rules. If Monty showed me a car, I'd choose to switch to the door with the car behind it. Nothing in the rules says you can't switch to the door Monty just opened for you.
   62. Padraic Posted: April 08, 2008 at 04:09 PM (#2734775)
Hey, check this out! I'm convinced now. There is also a story related to Monty in the Times here.

Edit - 20 switches, 13 cars. 20 "stand Pat" Gillick's, 6 cars.
   63. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 04:14 PM (#2734778)
I guess this is where the rare name comes in because, if the name were common, then the probability of a girl in a girl/girl family having that name is not quite twice the probability of any girl having that name.

I'm really struggling with this question and explanation. Running through your equation with a common name (say, 1/100), still gives me .4975, which means that the rare part doesn't play a huge part in it.

Wait, what's the answer to #2? Is it 1/2 (assuming as above that the birth rate is 50%)? So the rare name knocks it down to 0.49999975 and a common name knocks it down to .4975.

Note: Probability was never my strong suit, primarily because I had a hard time with what Voros mentioned. That top card in the deck is what it is, no matter what I know or what the guy who looked at the deck knows. It took me a long time to internalize probability.
   64. DKDC Posted: April 08, 2008 at 04:29 PM (#2734790)
Hey, check this out! I'm convinced now.

It wasn't all that convincing to me.

I did 30 "switches" and won 17 cars, and 30 "stand pats" and won 16 cars.
   65. Pasta-diving Jeter (jmac66) Posted: April 08, 2008 at 04:51 PM (#2734811)
I always liked this one for a non-intuitive probability problem
   66. Francoeur Sans Gages (AlouGoodbye) Posted: April 08, 2008 at 05:05 PM (#2734833)
I'm really struggling with this question and explanation. Running through your equation with a common name (say, 1/100), still gives me .4975, which means that the rare part doesn't play a huge part in it.

Wait, what's the answer to #2? Is it 1/2 (assuming as above that the birth rate is 50%)? So the rare name knocks it down to 0.49999975 and a common name knocks it down to .4975.
What's throwing you off is that the answer to question #2 is NOT 1/2.

Assume an equal birth rate. There are the following gender possibilities for a family of two children:

Elder child is a boy, younger child is a boy (probability 1/4)
Elder child is a boy, younger child is a girl (probability 1/4)
Elder child is a girl, younger child is a boy (probability 1/4)
Elder child is a girl, younger child is a girl (probability 1/4)

So if we know that the family has one girl, the probability that both are girls is

(1/4)/(1/2 + 1/4) = 1/3.

The reason to make the name "rare" in question #3 is to make that answer vanishingly close to 1/2.
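The four-case table above can be enumerated directly. A small sketch, assuming (as the comment does) that each child is independently a boy or a girl with probability 1/2:

```python
from fractions import Fraction
from itertools import product

# (elder, younger); all four combinations are equally likely
families = list(product("BG", repeat=2))
with_girl = [f for f in families if "G" in f]
two_girls = [f for f in with_girl if f == ("G", "G")]

p_two_girls = Fraction(len(two_girls), len(with_girl))
print(p_two_girls)  # 1/3
```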
   67. Jimmy P Posted: April 08, 2008 at 05:11 PM (#2734847)
What's throwing you off is that the answer to question #2 is NOT 1/2.

What's throwing me off is how the question is stated.

2. You know that a certain family has two children, and that at least one is a girl.


3. You know that a certain family has two children, and you remember that at least one is a girl with a very unusual name (that, say, one in a million females share),


I'm missing how the name helps me. In both, I know at least one is a girl. In the latter, I know her first name. How does knowing the known girl's first name help me with the unknown child?
   68. Kiko Sakata Posted: April 08, 2008 at 05:31 PM (#2734897)
I'm missing how the name helps me. In both, I know at least one is a girl. In the latter, I know her first name. How does knowing the known girl's first name help me with the unknown child?


I think comment #42 in this thread has the best answer for you. Basically, it's twice as likely that a family with two girls will have a girl named Rhiannon (the most unusual name of a girl that I actually know that I could think of) than that a family with one girl will have a girl named Rhiannon. As others have suggested, this is generally true regardless of the unusualness of the name, although the more unusual the name, the closer the true probability will approach exactly 50%.
   69. Francoeur Sans Gages (AlouGoodbye) Posted: April 08, 2008 at 05:34 PM (#2734901)
Well, #68 shows you the maths for question #2, and #45 shows you the maths for question #3. So you see how doing the calculations you get different answers. If you want an explanation as to how it makes "intuitive" sense...

In two-children families, half of all girls live in GG families. So there's a 50% chance that a rare occurrence (like a rare name) happens to a girl in a GG family.

But only one-third of two children families with a girl in them are GG. So if we happen on a family with a girl in it, only a 1/3 chance that it's GG.
   70. Jimmy P Posted: April 08, 2008 at 05:40 PM (#2734925)
Thanks guys, I'm just trying to wrap my head around it now. It doesn't seem intuitive to me at all. I've always struggled at probability (but did well in probability classes, maybe because I worked harder at them).
   71. Never Give an Inge (Dave) Posted: April 08, 2008 at 05:55 PM (#2734962)
In two-children families, half of all girls live in GG families. So there's a 50% chance that a rare occurrence (like a rare name) happens to a girl in a GG family.

I think that is the unstated assumption on which the problem depends. However, names are not really randomly assigned the way that gender is. Maybe the family had a recently deceased grandmother named Rhiannon and they had determined to name one of their daughters Rhiannon no matter how many daughters they had. Maybe the family comes from a small tribe where the name Rhiannon is very common and everyone has a daughter by that name. I agree 50% is a reasonable best guess, but it's not necessarily the *right* answer without the key assumption.
   72. BDC Posted: April 08, 2008 at 06:03 PM (#2734975)
Maybe the family comes from a small tribe where the name Rhiannon is very common and everyone has a daughter by that name

An impenetrable enclave of Fleetwood Mac fans.
   73. Francoeur Sans Gages (AlouGoodbye) Posted: April 08, 2008 at 06:04 PM (#2734976)
Dave, the hypotheticals you make do not affect the problem. We only need the following assumptions:

1. That any given baby has a 1/2 probability of being a girl, independent of all other births.
2. That the probability of the rare name being given is independent of the number of girls born in the family.

That's it.
   74. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 08:12 PM (#2735396)
1. That any given baby has a 1/2 probability of being a girl, independent of all other births.
2. That the probability of the rare name being given is independent of the number of girls born in the family.

That's it.


My problem is that it doesn't actually matter what the name is or how rare it is. Just by knowing the name the probability jumps from 33% to near 50%. Even in the case of a very common name (1/10), you get .475. If half of the girls were named Josephine and you got a Josephine you'd be at .375. So knowing the name increases your odds of having GG in all cases, no matter how rare the name, according to the math.

Which means that if you had 9,999 families and pulled all of the ones who had at least one girl, you'd get 3,333 that had two girls. But if you had 9,999 families and pulled out all of the ones with at least one girl, then told the name of that girl, you'd get more than 3,333 that had two girls.

How is that possible?

(I understand you wouldn't get the exact distribution and all.)
   75. Francoeur Sans Gages (AlouGoodbye) Posted: April 08, 2008 at 08:56 PM (#2735552)
Which means that if you had 9,999 families and pulled all of the ones who had at least one girl, you'd get 3,333 that had two girls. But if you had 9,999 families and pulled out all of the ones with at least one girl, then told the name of that girl, you'd get more than 3,333 that had two girls.

How is that possible?
But those girls all have different names! I now see that you misunderstand the problem, which is why the answer makes no sense to you.

Suppose our chosen name is incredibly common: it's Mary, and 1/3 of girls are called Mary. We have 9,999 families, all with at least one daughter, as in your example.

So we expect there to be 6,666 one-daughter families. 1/3 girls are called Mary, so we expect there to be 2,222 one-daughter families with a girl called Mary.

We expect there to be 3,333 two-daughter families, 1/3 girls are called Mary. How many of these families contain a girl called Mary? Clearly we expect more than 1/3 of them (1,111) to contain a girl called Mary, because there are two girls in each family. In fact we expect there to be 1852* two-daughter families with a girl called Mary.

What we are interested in is not this 1852 as a percentage of all 9,999 families, but this 1852 as a percentage of just those families with a girl called Mary. So the percentage is 1852/(1852+2222), which is about 45.5%.

*Rounded to 0 decimal places, that carried through.
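The Mary arithmetic can be redone with exact fractions. This is a sketch under the same assumptions the example uses: genders are 50/50 and independent, and each girl gets the name independently with probability p (so two sisters could in principle both be Mary). The function name is made up for illustration.

```python
from fractions import Fraction

def p_two_girls_given_name(p):
    """P(two girls | at least one girl in the family has the name)."""
    p = Fraction(p)
    one_girl = Fraction(1, 2) * p                     # BG or GB, and the girl has the name
    two_girls = Fraction(1, 4) * (1 - (1 - p) ** 2)   # GG, and at least one has the name
    return two_girls / (one_girl + two_girls)

print(p_two_girls_given_name(Fraction(1, 3)))              # 5/11, about 45.5%
print(float(p_two_girls_given_name(Fraction(1, 10**6))))   # just under 0.5
```

Simplifying the ratio gives (2 - p)/(4 - p), which tends to 1/2 as the name gets rarer.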
   76. The Polish Sausage Racer Posted: April 08, 2008 at 09:05 PM (#2735573)
Hey, check this out! I'm convinced now.

It wasn't all that convincing to me.

I did 30 "switches" and won 17 cars, and 30 "stand pats" and won 16 cars.


I did 20 switches and won 15 cars, and 20 stand pats and won 10 cars.

Oddly enough, both times I tried it, the first half dozen times standing pat was always the right answer and then it normalized more.

Doesn't the game really depend on whether Monty is obligated to offer the switch at all? If he can choose to not give you the opportunity to switch, have you really learned any new information? Why would Monty in reality ever give you the opportunity to switch if you had a goat?
   77. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 09:54 PM (#2735668)
OK, so I need to think about it not as "selecting all the families with two kids", but "Selecting all of the families with at least one Mary", how many will have two girls as opposed to a girl and a boy. It's the same thing, but I can see it differently. I'm still a little fuzzy on how it all comes together.

There are only three girl names, and they are Mary, Kate, and Ashley and they're randomly distributed.

So pick all of the families that have an Ashley and there's a 45% that they have 2 girls. Same for Mary and same for Kate. But pick all of the families that have at least one girl and there's a 33% chance that they have 2 girls.

What's happened? There's overlap between your selections, I guess, when you select individually. Because the two girl families will have one Mary and one Ashley, etc. or two of the same.

OK, I need to think about that some more. I have no problem visualizing the Monty Hall or the birthday problems, but this one has me stumped.
   78. Eric J can SABER all he wants to Posted: April 08, 2008 at 10:52 PM (#2735719)
78: I suppose it might depend on whether the game is seen by future players; if the player is never given the opportunity to switch upon picking a goat, any player given the chance to switch will know they've selected the correct door. But apart from that, of course he'll give the player a goat at every opportunity.
   79. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 11:05 PM (#2735729)
Ah, I've got it, I think. In the simple example, it's not really knowing the names, it's knowing the percentage. So once you know the name, you can eliminate those families with none of that name. If you know the percentages, then you know the changed odds. In the simple 3-name example, knowing one name eliminates more of the single-girl families than double-girl families. Of course, with an even distribution you knew it had to be one of them, so I'm a little unclear on that.

However, I think you need to know the name in order to solve the problem. You can't narrow it down to 1 in a million without knowing the name. Because there might be 10,000 names that are 1 in a million, so just by knowing it's rare, you really don't know to use 1/1,000,000. If there are 10,000 other names that are 1 in a million, then you have to use 1/100, don't you? In the same way that knowing that the name is 1/3 in the simple example doesn't help you. You need to know which of the 3 it is.
   80. mchengcit Posted: April 08, 2008 at 11:17 PM (#2735737)
However, I think you need to know the name in order to solve the problem. You can't narrow it down to 1 in a million without knowing the name. Because there might be 10,000 names that are 1 in a million, so just by knowing it's rare, you really don't know to use 1/1,000,000. If there are 10,000 other names that are 1 in a million, then you have to use 1/100, don't you? In the same way that knowing that the name is 1/3 in the simple example doesn't help you. You need to know which of the 3 it is.


Yes, I think this is correct, although I guess it depends on the exact phrasing of the question.

If it's something like, "The girl has a very rare name (one in a million), what is the probability that all families with a girl of THAT NAME have two girls?", then you should use a probability of one in a million

If it's something like, "The girl has a very rare name (one in a million), what is the probability that all families with girls of names of that rarity have two girls?", then you'd have to figure out exactly how many names there are that have rarity one in a million, and add them all up. Of course, it doesn't change the final answer that much until you begin including a significant percentage of the population (like 1/3 in your example).
   81. gator92 Posted: April 08, 2008 at 11:22 PM (#2735743)
I'm struggling with how the name can have an impact. Willing to be convinced, but so far not believing 50%. Maybe it's the wording of the problem, though...

Bialik's quiz says "you know that a certain family...", which I take to mean that it is a particular family you are talking about. Not one that will be selected randomly once you decide what name you remember the girl has, but a particular family that is already chosen.

So, suppose you have misremembered her name, and it is a very common one instead of very rare. Or you remember that "Rhiannon" is really her nickname, and her real name is Jennifer or something. Or that she lied to you, and gave a phony name. Does your faulty recollection or her dishonesty change the probability of the gender of her sibling? Remember, we're (supposedly) talking about a "certain family", not a portion of a distribution cloud...

Edit: Or what if all names are equally unusual? Then knowing the name is irrelevant, right, because you have to assume she has a name...
   82. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 11:32 PM (#2735750)
If it's something like, "The girl has a very rare name (one in a million), what is the probability that all families with a girl of THAT NAME have two girls?", then you should use a probability of one in a million

Yes, exactly. It doesn't have to be a name, it can be any qualifier that gets to one in a million. If you just remember that she has a name that's rare, that's not enough. You need to know how uncommon her specific name is in order to use 1/1,000,000. But you're right, it's just significant digits until you get a large proportion.
   83. Greg Pope thinks the Cubs are reeking havoc Posted: April 08, 2008 at 11:34 PM (#2735752)
Bialik's quiz says "you know that a certain family...", which I take to mean that it is a particular family you are talking about. Not one that will be selected randomly once you decide what name you remember the girl has, but a particular family that is already chosen.

I think this is also significant. You're not using the name to select the family, you've already picked them, then remember the fact. So if that's actually true then the proportion didn't help in the selection. Does this change the calculation?
   84. mchengcit Posted: April 08, 2008 at 11:48 PM (#2735774)

Bialik's quiz says "you know that a certain family...", which I take to mean that it is a particular family you are talking about. Not one that will be selected randomly once you decide what name you remember the girl has, but a particular family that is already chosen.

Well, if we are talking about any particular family, then the probability is either 1 or 0. Either that family has two girls or it doesn't. The presumption in the problem is that you can't remember either way, so you are forced to fall back to the statistics and look at how likely families with a girl of that given name are likely to have 2 girls, which you deduce to be 2/3. Of course, for any single family, the probability is either 1 or 0, but overall, about 2 out of every 3 families with a girl of that name have two girls.

So, suppose you have misremembered her name, and it is a very common one instead of very rare. Or you remember that "Rhiannon" is really her nickname, and her real name is Jennifer or something. Or that she lied to you, and gave a phony name. Does your faulty recollection or her dishonesty change the probability of the gender of her sibling? Remember, we're (supposedly) talking about a "certain family", not a portion of a distribution cloud...

Your faulty memory, of course, doesn't change the probability that that particular family has two girls. It's still either 1 or 0. However, it does change the probabilities that you use in order to deduce the likelihood from the general population.

For example, if the name is one in a million, the probability is actually 1.999999/3, not 2/3. If the name is one in ten, then the probability is 1.9/3 instead of 2/3.


Edit: Or what if all names are equally unusual? Then knowing the name is irrelevant, right, because you have to assume she has a name...

Again, it depends on how the question is phrased. If we live in a universe where all names are rare, and you only know that she has a rare name, then you're right, it doesn't help you at all. However, if you can narrow it down to some subset of names, then things change...
   85. gator92 Posted: April 08, 2008 at 11:59 PM (#2735786)
I am onboard now, just cooked up a little simulation (should have done that before!) Here's what I did:

Make a field of around 200 cells in Excel, in groups of two, set them up as random numbers ("=rand()"). Then conditionally format each cell to red if the random number is less than 0.01, yellow if between 0.01 and 0.50, and green if >0.50. These correspond to "rare girl name", "non-rare girl name", "boy". Then hit F9 and count the red cells, and count how many times they occur next to a yellow vs. how many times next to a green. Do this for a while, and in no time you will be bouncing around 50% or so.

The problem I had was picturing the rare name as belonging to a single girl, when really it could belong to either of the two kids...
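For anyone without Excel handy, roughly the same experiment can be sketched in Python. It keeps the thresholds described above (below 0.01 is a rare-named girl, 0.01 to 0.50 any other girl, above 0.50 a boy) but counts families containing a rare-named girl instead of individual red cells; `classify` and `simulate` are illustrative names.

```python
import random

def classify(u):
    """Map a uniform draw to a child type using the thresholds from the comment."""
    if u < 0.01:
        return "rare_girl"
    if u < 0.50:
        return "girl"
    return "boy"

def simulate(families=500_000, seed=2):
    rng = random.Random(seed)
    with_rare = two_girls = 0
    for _ in range(families):
        kids = (classify(rng.random()), classify(rng.random()))
        if "rare_girl" in kids:
            with_rare += 1
            if "boy" not in kids:
                two_girls += 1
    return two_girls / with_rare

print(simulate())
```

With these thresholds the result should bounce around a value a little under 50%.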

Edit: spelling
   86. mchengcit Posted: April 09, 2008 at 12:06 AM (#2735798)
For example, if the name is one in a million, the probability is actually 1.999999/3, not 2/3. If the name is one in ten, then the probability is 1.9/3 instead of 2/3

Should be 1.999999/4 and 1.9/4, sorry.
   87. Never Give an Inge (Dave) Posted: April 09, 2008 at 01:52 AM (#2735889)
The key to the rarity of the name is that it *theoretically* makes it exactly twice as likely that a family with two daughters will have a girl by that name as it is that a family with one daughter will have a girl by that name.

If the name is more popular than that, the problem gets more difficult, and all you people trying to do the math for more common names are oversimplifying things. This is because, as I said earlier, names are not randomly assigned like genders are. At the very least, the names of children within the same family are dependent on each other. For example, no family will give both of their girls the same name.

Consider the example in post 77. In a 2-girl family, there is virtually 0 probability of both girls being named Mary. But AlouGoodbye's probabilities imply that there is a 1/9 chance of that happening, because it assumes that the names of the two girls in the family are independent of each other.

Taking it a step further: Once the first girl in a family is named Mary, the probability of the second one being named Mary is 0. Therefore, for the probability in the overall population to be 1/3, the probability of 1-girl families having a Mary must be more than 1/3. And the probability of a 2-girl family having a Mary must be less than 2/3.

So that's why the author thinks the rarity of the name is relevant. He assumes that by choosing a really rare name, he makes it very close to twice as likely that a 2-girl family will have a Rhiannon than that a 1-girl family will have a Rhiannon. But that's an assumption.

Going back to Alou's post #75:

I think that is the unstated assumption on which the problem depends. However, names are not really randomly assigned the way that gender is. Maybe the family had a recently deceased grandmother named Rhiannon and they had determined to name one of their daughters Rhiannon no matter how many daughters they had. Maybe the family comes from a small tribe where the name Rhiannon is very common and everyone has a daughter by that name. I agree 50% is a reasonable best guess, but it's not necessarily the *right* answer without the key assumption.

Dave, the hypotheticals you make do not affect the problem. We only need the following assumptions:

1. That any given baby has a 1/2 probability of being a girl, independent of all other births.
2. That the probability of the rare name being given is independent of the number of girls born in the family.

That's it.


#2 may not be true, though, which is what I was trying to show with my examples. Consider the example where a name is rare in the overall population but common in a sub-population (like a small town or tribe). In fact, consider a very small tribe where everyone names their first daughter Rhiannon. The odds of a two-daughter family having a Rhiannon are exactly the same as the odds of a one-daughter family having a Rhiannon. Within that tribe, the odds are 100% in both cases. Within the overall population, the odds of being named Rhiannon may still be 1/1,000,000.

Or consider an extremely rare first name that, in some families, is passed down to the first-born female in each generation. In fact, this is the only way in which the name survives in the population today. Having two girls doesn't increase the odds that a family will have a girl by this name.

These examples may be farfetched -- my only point is that there is an implicit assumption underlying question #3.
   88. Der Komminsk-sar Posted: April 09, 2008 at 01:55 AM (#2735892)
You guys are making this way more complicated than it needs to be.
...
...
...
...
...
Carry on!
   89. Roy Hobbs of WIFFLE Ball Posted: April 09, 2008 at 02:30 AM (#2735931)
I guess I'm just stupid, but somebody help me here.

You know that a certain family has two children, and that at least one is a girl. But you can’t recall whether both are girls. What is the probability that the family has two girls — to the nearest percentage point?


If I already KNOW one child is a girl, isn't the probability of it being two girls 51/49 (or whatever nature states is the probability of any single child being female)? Unless, of course, the chromosomal "track record" of having a child of a particular gender matters at all when predicting the next one (father of two girls, with a third on the way, speaking). Ah well, this thread is making my head hurt. :-)
   90. Never Give an Inge (Dave) Posted: April 09, 2008 at 05:57 AM (#2736040)
Roy Hobbs, one way to think about it is that you don't know *which* child is a girl, only that one of them is.

Think of it like flipping a coin two times. If I told you that at least one of the times I flipped a head, what are the odds of having flipped heads twice? Well, there are four possible outcomes, each with equal probability -- HH, HT, TH, TT. The only piece of information that you have is that the outcome wasn't TT. The possible outcomes are HH, HT, TH. Each has equal probability, so the probability of HH is 1/3.

Another way to think about it is that of all the two-child families in the country, 25% will have two boys, 25% will have two girls, and 50% will have one of each. If you eliminate the families that have only boys, then 25/75 = 1/3 have two girls.
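The counting argument above can be checked by brute-force enumeration. This is a minimal sketch assuming boys and girls are equally likely and the two births are independent; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely two-child families, ordered by birth: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Keep only the families consistent with "at least one is a girl".
at_least_one_girl = [f for f in families if "G" in f]

# Among those, count the families where both children are girls.
two_girls = [f for f in at_least_one_girl if f == ("G", "G")]

prob_two_girls = Fraction(len(two_girls), len(at_least_one_girl))
print(prob_two_girls)
```

Three families survive the filter and only one of them is girl/girl, which is the 1/3 described above.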
   91. Greg Pope thinks the Cubs are reeking havoc Posted: April 09, 2008 at 12:27 PM (#2736104)
These examples may be farfetched -- my only point is that there is an implicit assumption underlying question #3.

All that you've said is true, but my problem has been trying to understand the logic behind the answer of the simple problem, not the real-world problem. In other words, to get a real-world answer you'd need the tweaks that you suggested, but I can't get there until I understand the basics. And to be honest, I don't really care about the real-world problem anyway.

You're trying to teach me calculus when I'm struggling with algebra.
   92. TomH Posted: April 09, 2008 at 01:11 PM (#2736126)
So are the answers 92%, 33%, 50%, xx, 67%, 35%, 30%, and 'n'? I don't know a direct way to solve Q#4, although I could simulate it in Excel or MATLAB easily enough.
   93. Hal Chase Headley Lamarr Hoyt Wilhelm (ACE1242) Posted: April 09, 2008 at 01:29 PM (#2736136)
Q4: grinding it out in Excel with BINOMDIST leads to 269.
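The same grind can be sketched outside a spreadsheet. The code below assumes each game is independent with a fixed 55% chance for the AL champ, and treats a best-of-n series as equivalent to playing all n games and winning a majority. The function names are hypothetical, and this is not necessarily the exact spreadsheet layout used above.

```python
from math import comb


def series_win_prob(p, n):
    """Probability that a team with per-game win probability p wins a best-of-n series (n odd)."""
    need = n // 2 + 1
    # Winning the series is the same event as winning at least `need` of n games.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def smallest_series_length(p, target):
    """Smallest odd n for which the better team wins the series with probability >= target."""
    n = 1
    while series_win_prob(p, n) < target:
        n += 2
    return n


best_of_7 = series_win_prob(0.55, 7)
shortest = smallest_series_length(0.55, 0.95)
print(round(best_of_7, 4), shortest)
```

The best-of-seven figure comes out near 0.61, consistent with the quiz's statement that the NL champ wins about four times in 10, and the search result can be compared with the 269 quoted above.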

Isn't Q1 91%, not 92%?

I put 30% for Q7 too. How did you arrive at that number?
   94. shazbaat Posted: April 09, 2008 at 08:21 PM (#2736593)
For problem 3, it seems to me that to get the 50% answer you are assuming that because the name is a 1 in a million name, it is either more notable, or memorable. Otherwise you have:

1/2 the families with rare name X are boy/girl.
1/2 the families with rare name X are girl/girl.

But now, if you don't assume either that you were given the rare name because it is rare, or that you remembered it because it is rare, you have to split the second case, so you have:

1/2 the families with rare name X are boy/girl.
1/4 the families with rare name X are girl/girl, and you remember the rare name X.
1/4 the families with rare name X are girl/girl, and you remember the other name.

Now the third case only matters because it changes the distribution from 1/2:1/2 to 1/4:1/2, or a 1/3 probability that the girl with the rare name came from a girl/girl family.
   95. Der Komminsk-sar Posted: April 10, 2008 at 03:38 AM (#2737569)
Q1: It's 92% (100 true positives, 9 false positives).
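The arithmetic for Q1 can be laid out as a short calculation. This sketch uses the quiz's numbers (1,000 athletes, one in 10 users, 1% false-positive rate) and assumes a false-negative rate of exactly zero; the function and parameter names are illustrative.

```python
def prob_user_given_positive(n, prevalence, false_positive_rate, false_negative_rate=0.0):
    """Share of positive tests that belong to actual users."""
    users = n * prevalence
    non_users = n - users
    true_positives = users * (1 - false_negative_rate)
    false_positives = non_users * false_positive_rate
    return true_positives / (true_positives + false_positives)


ppv = prob_user_given_positive(1000, 0.10, 0.01)
print(f"{ppv:.4f}")
```

With 100 users all testing positive and 9 of the 900 non-users also testing positive, the ratio is 100/109, about 91.7%, which rounds to 92% rather than 91%.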

Q4: grinding it out in Excel with BINOMDIST leads to 269.
Ah - I'd been using the POWER, FACT, and SUM functions over a range of cells.

Q7: 30%? I'd say 10% (it approaches 1/9 over large ranges of large numbers, which I presume is what they mean to do here). While these aren't random numbers and there are no doubt pressures that cause revenue figures to more often begin with 1 (followed by x digits), rather than 9 (followed by x-1 digits), it wouldn't be enough to bump our estimate upwards. Am I missing something?
   96. Lassus Posted: April 10, 2008 at 03:59 AM (#2737575)
Monty can only open a door that has a cow behind it.

Then how will I get my car? This game is broken!


That cow might be worth more than that Chevy Celebrity.
   97. Der Komminsk-sar Posted: April 16, 2008 at 03:02 AM (#2745849)
The answers aren't up yet, but I now think Q7 is 30%.

From Wikipedia:
...Benford's law can be explained if one assumes that the logarithms of the numbers are uniformly distributed; this means that a number is for instance just as likely to be between 100 and 1000 (logarithm between 2 and 3) as it is between 10,000 and 100,000 (logarithm between 4 and 5). For many sets of numbers, especially ones that grow exponentially such as incomes and stock prices, this is a reasonable assumption.

More precisely, Benford's law states that the leading digit d (d ∈ {1, …, b − 1}) in base b (b ≥ 2) occurs with probability proportional to log_b(d + 1) − log_b(d) = log_b((d + 1)/d). This quantity is exactly the space between d and d + 1 in a log scale.

In base 10, the leading digits have the following distribution by Benford's law, where d is the leading digit and p the probability:
d    p
1    30.1%
2    17.6%
3    12.5%
4    9.7%
5    7.9%
6    6.7%
7    5.8%
8    5.1%
9    4.6%
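The base-10 table follows directly from the quoted formula. The sketch below evaluates log10((d + 1)/d) for each leading digit; the function name is illustrative.

```python
from math import log10


def benford_probability(d):
    """Benford's-law probability that the leading base-10 digit is d (1 through 9)."""
    return log10((d + 1) / d)


table = {d: benford_probability(d) for d in range(1, 10)}
for d, p in table.items():
    print(d, f"{100 * p:.1f}%")
```

The nine probabilities sum to 1 because the logarithms telescope to log10(10), so in base 10 the constant of proportionality is 1.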
