Baseball Primer Newsblog— The Best News Links from the Baseball Newsstand
Monday, April 07, 2008
Attention all Mlodinowsters! (ahem...I would knock off this quiz in a minute, if I wasn’t busy tracking down that ‘Arnold Stang vs Super-Mechagodzilla’ video!)
Mr. Mlodinow, a visiting lecturer at Cal Tech and co-author of “A Briefer History of Time” with Stephen Hawking, peppers “The Drunkard’s Walk,” set to be published next month, with dozens of examples, ranging from historical to personal to newsy. They serve to sketch an engaging history of probability and statistics, and to bolster his underlying thesis that the randomness that afflicts a sot’s amblings is pervasive in our lives. Furthermore, Mr. Mlodinow argues, we often act as if under the influence, failing to recognize the randomness in patterns and the patterns in randomness.
1. Suppose 1,000 athletes are tested for drugs. One in 10 have used the drugs, and the test has a 1% false-positive rate (and the false-negative rate is negligible). If an athlete from this group tests positive, what is the probability that she has used the drugs, to the nearest percentage point?
4. In baseball, suppose the American League champion is better than the National League champion, such that it has a 55% probability of winning each game against the NL champ. Then the NL champ nonetheless will win a best-of-seven-games series four in 10 times. What is the smallest odd number, X, for which a World Series between these two league champs that is best-of-X will ensure that there’s a 95% probability of a just result — the superior AL champ winning?
Repoz
Posted: April 07, 2008 at 02:49 PM | 100 comment(s)
Related News: General, Sabermetrics
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
(4) is not a bad use of statistics, but part of a larger problematic with the use of statistics. The idea that teams possess a single factor of quality which can thus be universally and decontextually compared has not been shown to be the case, and is highly unlikely to be the case. It is true that lots of things happen in baseball that we don't expect, and some of them do not happen because one team expressed their real superiority over the other team, but the exact measure of random variation in baseball is not something we can learn from this sort of methodology. The presumption that this is at least a good estimate has also not been shown to be the case, and I would dispute that it is the case in many specific situations.
For some people, applying Bayes to this question is easy to understand. For others, they can't figure out how the new probability is NOT 50-50, which I guess is what Walt's referring to in post #2 above.
Yes and no. Knowing nothing of the English language, it would still be probable that there is at least one word of the form ????N? that is not of the form ???ING. All words of the form ???ING would necessarily be of the form ????N?, however.
Correct. That's the fact I'm referring to. I guess it's more of a logic exercise, but still, not probability.
I guess I can put it this way. Does the answer change if instead of it being phrased as "changing", you are simply asked to pick door 1 or 2? Meaning after Monty shows you the cow in door three, he doesn't say "do you want to change" but instead says "you can now pick between 1 or 2."
If this is the case, does it matter if you pick 1 or 2?
I noticed at least one in the article.
I think the answer would be the same.
[warning: I'm going to give away the answer here]
Assuming Walt's assumption from #2, let's say you originally picked door #1, and Monty Hall revealed a cow in door #3. When you picked door #1, the odds of a car being there was 1 in 3. Hall revealing a cow does nothing to change those odds, because you already knew that either door 2 or door 3 (or both) had a cow behind it. But having revealed a cow behind door #3 does change your odds with respect to door #2 - there's now one cow and one car, so the odds of a car behind door #2 are 50/50. So you should pick door #2 regardless of how the question is worded.
But, and this is where I'm confused, if you select door #1 anew as in being unrelated to your first choice, then I don't see how this selection isn't 50%. I'm saying that this selection is a new event (even if it's the same door) and so doesn't have the same "old" 33% chance, but a 50 percent chance.
I'm sure I am wrong and people love to explain this stuff, so shoot.
If you play the game 100 times and every time you stick with your original guess (or "re-pick" your original guess) you would expect to win about 33 times, as that is the probability.
But if you play the game 100 times and switch every time, you would expect to win 50 times, as half the time you will switch to the car and half the time you will switch to the cow.
Because there was no chance that Hall would reveal a cow behind door #1 under the rules of the game. Once you make your initial choice, he's going to open one of the other two doors and reveal a cow (this is Walt's assumption, which is required to make this work). But, as I said, we already knew there was a cow behind at least one of the other two doors, so we haven't learned anything new with respect to door #1. The conditional probability is the same as the unconditional probability here. But for door #2, we've learned something new - there was definitely a cow behind door #3, which leaves only two equally likely possibilities (with respect to door #2).
If it's not intuitive, I think the only other way to show it is to list out all of the possibilities and just crunch out the numbers.
If there are 100 doors, you have a 1 in 100 chance of randomly picking the door with the car behind it. There's a 99% chance that the car is somewhere else. This fact won't change no matter what Monty does.
After Monty opens every door except yours and one other, the other unopened door has a 99% chance of having the car behind it. The probability of having the car collapses from those 99 doors onto that one door. Your door is still stuck at 1%.
If Monty had sent me home to think about it and I told my wife in the farmer's daughter's outfit to go in the next day with the instructions, "pick one a 'dem doors honey" then would the odds be 50/50 whether she picked 1 or 2?
wait, what?
We never, ever know that one team has a 55 percent probability of beating another team. If nothing else, the fact that both teams will use three or four different starting pitchers in a series means that the probabilities will be different in every game. Then there are a thousand or so other variables you have to account for before you could know a true probability.
True enough. One of the points of the exercise, though, is that even if you do know, the knowledge isn't worth a lot.
Why? Monty has revealed two facts. There are 98 open doors with cows behind them. And there are two unopened doors, one with a cow behind it and one with a car behind it. Why is there a 99 percent chance that the door I didn't pick has the car?
But we don't, actually. Probability models model reality, but the model never represents the "true" probability because even in roulette, dice and cards there are various effects that can alter the probabilities in small but real ways. There's the Spanish guy who got thrown out of all the casinos in Europe for tracking the biases on roulette wheels and using them to make money.
The effects that different pitchers, different playing conditions, etc. have on the outcome of baseball games may be more obvious effects that change that probability, but those effects exist in all sorts of seemingly "random" probabilities. After all, if you could throw a pair of dice exactly the same way under the exact same conditions, you'd get the same result every time. The probability model is simply a mathematical approximation of all of the various factors that come into play that would be difficult to impossible to model with any real accuracy.
So the real key, once we have the probabilistic model, is to estimate how well that model approaches reality. With "fair" dice it seems to work well. Probably not as well with the baseball example he gives, but I bet it's not a terrible approximation either.
I'm not sure what the 100-door version of the problem adds, so I'm going back to the 3-door example.
The key point is that Monty couldn't open your door. He didn't have three doors to open and left two of them closed. He had two doors to open and left one of them closed. You told him to keep door #1 closed, so he didn't provide you with any new information about what was behind door #1.
When you chose door #1 the first time, there was 1/3 chance the car was behind door #1 and a 2/3 chance the car was behind doors #2 or #3.
Now, he has shown you it's not behind #3. The odds of it being behind #1 are still 1/3, because he has given you no new information about door #1. No matter where the car is, he's always going to be able to open another door that isn't #1 and show you a cow behind it.
If the odds of the car being behind door #1 are still 1/3, then the odds of it being behind #2 must be 2/3. So you should switch to door #2.
Another way to look at it is that he has told you, "If the car is behind either doors 2 or 3, it's NOT behind door 3." That is valuable information about door #2.
In the 100-door example, he's basically saying, "Of these 99 doors, I can show you there's no car behind 98 of them. But I CAN'T show you what's behind this last one." Because he knows where the car is, that's pretty valuable information.
Decontextual is not a word. I looked for some time to try to find a synonym for "out of context" but couldn't find one. Nevertheless, if you want a neologism, I would go with excontextually. Ex- means "out of," while de- means "removal," so either could work. But excontextual sounds right to my ear.
I was going to post this in response to MCA, but figured it was nitpicking and didn't bother. But since it continues...
The question reads: "In baseball, suppose..." Does it really matter if what he's asking you to suppose is unlikely or is simpler than really happens? He starts question 5 with "Suppose you're on a game show..." I'm surprised nobody has said "Well, this question is flawed because I'm never on a game show, and Let's Make A Deal is off the air, so who does he think he's fooling?"
I posted something that I thought was germane to the topic. If you don't think it is, which is certainly within your rights, feel free not to comment on it.
Let's call the door which hides the car Door #1. Of course, you don't know this, so let's say you pick randomly:
You pick each door with 1/3 probability. Going on a case-by-case basis:
A) (1/3 probability) You pick Door #1. Monty now has to show you what's behind one of the other two doors. Both have cows, so he opens either with equal probability (1/2):
A1) (1/6 probability) Monty shows you a cow behind Door #2
A2) (1/6 probability) Monty shows you a cow behind Door #3.
B) (1/3 probability) You pick Door #2. Monty now has to open a door, but he can't open Door #1, because that has a car. Therefore, he is forced to open Door #3.
C) (1/3 probability) You pick Door #3. Monty now has to open Door #2, to show you a cow.
So, if Monty opens Door #3 to show you a cow, it's twice as likely that Door #2 hides the car (1/3 vs. 1/6). Same thing if Monty opens Door #2. As Dave says, the fact that Monty shows you that a car is NOT behind the door he opens is valuable information.
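For anyone who wants to check the case-by-case tally above mechanically, here is a minimal sketch. It assumes the rules used in that breakdown: the car is labeled Door #1, the contestant picks uniformly at random, and Monty opens a cow door other than the pick, choosing at random when he has two options. The function names are hypothetical, made up for this illustration.

```python
from fractions import Fraction

CAR = 1            # label the car's door as #1, as in the breakdown above
DOORS = (1, 2, 3)


def outcomes():
    """Yield (pick, opened, probability) for every branch of the game."""
    for pick in DOORS:
        p_pick = Fraction(1, 3)
        # Monty may open any door that is neither the pick nor the car.
        options = [d for d in DOORS if d != pick and d != CAR]
        for opened in options:
            yield pick, opened, p_pick / len(options)


def switch_win_given_opened(opened_door):
    """P(switching wins | Monty opened `opened_door`)."""
    total = Fraction(0)
    switch_wins = Fraction(0)
    for pick, opened, prob in outcomes():
        if opened != opened_door:
            continue
        total += prob
        if pick != CAR:  # the one remaining closed door must hide the car
            switch_wins += prob
    return switch_wins / total


print(switch_win_given_opened(3))  # 2/3
print(switch_win_given_opened(2))  # 2/3
```

The 1/6 and 1/3 branches from cases A2 and B are exactly the two terms that end up in `total` when Monty opens Door #3, which is where the "twice as likely" comes from.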
The problem is ambiguously worded. If Monty opens doors randomly, then all of the above do not apply. You don't gain any information about what's behind the third door because there's some probability that the door he opened could have revealed a car.
Gender ratio problem:
The answers to the two questions are different.
In Question #2, the conditions apply equally well to boy/girl families as well as to girl/girl families.
In Question #3, by assigning a name to one of the girls, it is twice as likely that the condition applies to a girl/girl family than to a boy/girl family. The fact that the name is very rare doesn't matter, I think. Try replacing the girl's rare name with "The eldest child is a girl", and you get the same result.
Once I've picked it, I'm not going to gain any additional information about my door. No matter which door I choose, Monty's move is going to look the same to me: he'll reveal 98 losers.
So my 1% chance that I picked the right door from the outset doesn't change.
I guess this is why I made a distinction earlier between "keeping" your choice and "picking" door #1 a second time. If you pick #1 a second time, this time you are doing so with the knowledge that door #3 has a cow, and that, therefore only one cow remains.
When you picked it the first time of course you didn't have that information but the second time you do.
However, you also have information about door #2 by virtue of the fact that Monty DIDN'T open it, so door #1 and door #2 are no longer on equal footing.
True, I guess, but it still seems odd that if another contestant were brought on at that point with no knowledge of what had just transpired, he would have a 50% chance of being right if he guessed #1 while I have a 33% chance if I picked #1 again.
Or maybe it's not so odd since the whole question is about knowledge anyway...
In Question #3, by assigning a name to one of the girls, it is twice as likely that the condition applies to a girl/girl family than to a boy/girl family. The fact that the name is very rare doesn't matter, I think. Try replacing the girl's rare name with "The eldest child is a girl", and you get the same result.
I agree that this is probably what he's getting at with the question. But I don't think knowing the name actually helps you, because once you know that there's at least one girl, you also know that one of the children has a girl's name. It's not true that "In Question #3, by assigning a name to one of the girls, it is twice as likely that the condition applies to a girl/girl family than to a boy/girl family."
Knowing that the eldest child is a girl, however, does change the question.
You'd both have the same odds of being right by picking #1 (33%). The difference is that you'd know your odds were 33%, while the other contestant would think his odds were 50%. That's the value of the added information.
Knowing the girl's name does make it twice as likely, because either of the two girls may have that name. In the question, it states that the name is one that one in a million females share. The probability that a girl/girl family has one girl with that name is actually 2 in a million (assuming of course the parents select that name with the same probability as the general population). A girl/girl family is twice as likely to meet the condition than a boy/girl family, where only one girl is available.
That's a pretty big assumption, though, isn't it?
Before a door is revealed, the probability of the other (not yours, not revealed) door having the car is 2/3. If in general you decide to switch, regardless of which door is revealed, you'll get the car 2/3 of the time. One-third of the time your first choice was correct (and switching was the wrong thing to do), while two-thirds of the time you'll switch to the correct door.
But that's not what was presented in the question. The question deals with the decision to switch if door 3 is revealed. That indeed changes the probability. If you know door 3 has a cow, the probability of a car doesn't shift just to door 2. It shifts to both doors equally.
This is very similar to the distinction between his questions 2 and 3 (or at least if you use older/younger instead of specific name, as someone suggested above). If you know general information, the probabilities are different from when you know specific info. If you always pick door 1 to start, you should always switch. If you always pick door 1 and he always reveals door 3, it doesn't matter whether you switch or not.
So, whoever said the question was poorly worded, I agree. He's asking about the switching strategy when you pick door 1 and are shown door 3. If he generalizes the problem, it's a different answer.
For those who like formulae:
Pr(car=1 | car<>2 and/or car<>3) = Pr(car<>2 and/or car<>3 | car=1) * Pr(car=1) / Pr(car<>2 and/or car<>3) = 100% * 33% / 100% = 33%.
Pr(car=1 | car<>3) = Pr(car<>3 | car=1) * Pr(car=1) / Pr(car<>3) = 100% * 33% / 67% = 50%.
For any one set of parents, yes. I guess what I am trying to say is that in the entire population, specifying that a girl is named X will select out girl/girl families with twice the frequency as boy/girl families, just because either of the girls can be named X. That of course is not true if you say, "The girl has a name", because in that case, all families will be counted equally whether they are boy/girl or girl/girl.
If you happen to know Bayes' Theorem, then it's easy:
P(girl/girl family | a girl in family has rare name) = P(a girl in family has rare name | girl/girl family) P(girl/girl family)/P(a girl in family has rare name)
P(a girl in family has rare name) = 1/1,000,000 (specified in question)
P(girl/girl family) = 1/4
P(a girl in family has rare name | girl/girl family) = 1 - (1 - 1/1000000)^2 ≈ 2/1,000,000
P(girl/girl family | a girl in family has rare name) = (2/1,000,000 * 1/4)/(1/1,000,000) = 1/2
I guess this is where the rare name comes in because, if the name were common, then the probability of a girl in a girl/girl family having that name is not quite twice the probability of any girl having that name.
I disagree. First of all, you are being asked about a specific instance of a game. Nowhere does he say that this is a repeated game in which only door #3 is opened.
But even if that were the case, it doesn't really matter. What matters is that the car is always randomly placed behind one of the 3 doors such that the probability is 1/3 that it will be behind any of the doors at the beginning of the game, and that Monty can only open a door that has a cow behind it.
Then how will I get my car? This game is broken!
It is a constitutive fact of baseball that the probabilities are not known beforehand. This is along the lines of saying, "In baseball, suppose that your wide receiver wins the opening faceoff..." You can only suppose things that are possible in the game.
He has generalized the problem. He says:
Anyway, the easiest way to think about it is this:
Do you want the door you originally picked, or all of the other doors? Because you know that Monty will open a door with a cow, and you know that one of the doors you didn't pick contains a cow. So you don't really gain anything by him opening the door. You're choosing between getting door 1 and getting both door 2 and door 3.
I usually point out the expansion to 100 in explaining the problem, so that might have been me in the earlier thread. If you pick one door, then get the option of keeping your door, or getting the other 99, which would you take? It's obvious that you take the other 99, right? Just because Monty opens 98 of them to show cows doesn't change that choice. It may make you nervous, but you still got the 99 doors.
Suppose you know beforehand that, no matter what, Monty is going to reveal a cow. Then you know all you have to do in order to win the car is to pick one of the cows (and switch). Clearly you have a 2 in 3 chance of picking a cow (and only a 1 in 3 chance of picking the car). In other words, you know from the start that switching is a better strategy than sticking. You know this is true even if Monty doesn't offer the switch. (In fact, the "proper" state of mind to be in--having chosen but before getting the option to switch--is to hope Monty offers a switch.)
Caveat: This applies only as long as Monty's choice to reveal a cow didn't depend on whether you picked a cow. You might think that Monty knows you're a clever guy, and would only show you one of the cows if you got lucky with your first pick....
I think one of the reasons that our intuition turns against the "correct" answer is that we properly guess that Monty has more information than we do (where the car is), and that Monty has interests that don't coincide with ours (e.g., preserve the property of "Let's Make a Deal" and make good television programming, among others). Thus, the "real life" implications of this problem are somewhat less impressive than the "laboratory" results. In truth, the Monty Hall problem shows us less that people are "irrational" than it shows that we have ingrained skepticism of the motivations of people like Monty Hall--which isn't really a flaw, IMO.
What this means is the fact that the problem includes the fact that Monty knows where the car is makes it a worse choice to switch. You'd prefer to have an MC who truly doesn't know (and thus couldn't be trying to trick you). So I disagree with (2.) above--you're better off switching if Monty is opening doors randomly. If he randomly opened the car, you wouldn't have to wonder whether you should switch. You'd have already lost.
But they're not known before hand in any real life problem that probability theory models. Yes the difference between the model and the reality might be much smaller in dice than it is in baseball, but there's any number of factors that could affect the dice roll in a way such that the results would differ slightly from the model.
Take a deck of cards for example. Suppose you shuffle a deck of cards and place them on the table. You may think the known probability of the eight of clubs being the first card in the deck is 1 in 52, but in actuality the probability of that card being the eight of clubs is either 0% or 100%; it has already been determined. The 1 in 52 is simply a model to approximate the imperfect knowledge we have of that top card, not the real identity of that top card.
In poker hands where the eight of clubs has been dealt, is the probability that the first card is the eight of clubs in the next hand necessarily equal to 1 in 52? Couldn't it possibly be that imperfect shuffling might make it 1 in 51.6 or 1 in 52.4? Wouldn't that make whether a computer program shuffled virtual cards or a human shuffled real cards a factor in these probabilities?
Probability theory is not now, nor ever has been a reality in and of itself. It is our attempt to model our own human uncertainties. With perfect knowledge, most games of chance would be no such thing.
Whether talking cards, dice, baseball or politics, the applicability of a probability model depends entirely on its ability to approximate the real world reality. Saying that it only applies to questions with "known probabilities" is a false dichotomy because such a thing does not exist.
I think fair dice have a true, known probability.
But if the rules of the game are (a) Monty Hall must open a door with a cow, and (b) Monty Hall cannot open the door you've chosen, then there's no way he can trick you. If you did not pick the car at first, he's only got one choice - to reveal the one remaining cow, and if you did pick the car at first, he can open either door but the result's the same for you - you see a cow.
On the other hand, if Monty doesn't know where the car is and can hence reveal the car when he opens a door (thereby guaranteeing that you lose), then your odds are stuck on 1/3 no matter what you do.
Suppose you play the game 1000 times, and choose to "switch" every single time. The only time you would ever lose is if you picked the car from the outset, and that would probably only happen about 333 times.
The 100-doors example makes it even more clear. If you switch every time, you would only lose if you happened to pick the car immediately from the get-go, and the odds of that happening are very low (1%).
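The "play it 1000 times" argument can be tried directly. Below is a rough simulation sketch, assuming the conventional rules (Monty knows where the car is and always leaves exactly one other door closed, never revealing the car). The helper names `play` and `win_rate`, the trial count, and the seed are all arbitrary choices for this illustration.

```python
import random


def play(n_doors, switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # Monty opens every door except the pick and one other, never the car.
    if pick == car:
        # Any other door can be the one left closed; it hides a cow.
        other = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        # The car's door is the only one Monty is unable to open.
        other = car
    final = other if switch else pick
    return final == car


def win_rate(n_doors, switch, trials=100_000, seed=0):
    """Fraction of games won over `trials` plays with a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(n_doors, switch, rng) for _ in range(trials))
    return wins / trials


print("3 doors, switch:  ", win_rate(3, True))
print("3 doors, stay:    ", win_rate(3, False))
print("100 doors, switch:", win_rate(100, True))
```

With this many trials the three figures should land close to 2/3, 1/3 and 99/100 respectively. Runs of 20 or 30 hand-played games are far noisier, so a stay-pat streak early on is not surprising.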
This is an excellent and concise way to put it.
By definition.
#4: I took time off from watching a game and stuck my nose back into a spreadsheet to tackle #4. I anticipated that the answer was going to be big, but not that it would be quite that big.
I was calculating this earlier today and my spreadsheet blew up (calculating the factorials) after game 171 - IIRC, I'd only reached the 80% threshold or so.
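For anyone whose spreadsheet chokes on the factorials, here is a small sketch of one way to run the #4 calculation. It takes the question's premise at face value: a fixed 55% chance per game and independent games, which several posts above dispute as a model of real baseball. The function names are hypothetical. Python's `math.comb` works with exact integers, so the binomial coefficients don't overflow the way spreadsheet factorials do.

```python
from math import comb


def series_win_prob(p, games):
    """Chance the better team wins a best-of-`games` series (`games` odd).

    Treats the series as if all `games` were played; the team that takes
    the majority is the same team that would have clinched first.
    """
    need = games // 2 + 1
    q = 1.0 - p
    return sum(
        comb(games, k) * p**k * q ** (games - k)
        for k in range(need, games + 1)
    )


def smallest_series(p, target):
    """Smallest odd series length whose win probability reaches `target`."""
    games = 1
    while series_win_prob(p, games) < target:
        games += 2
    return games


print(round(series_win_prob(0.55, 7), 3))  # 0.608, i.e. the NL wins about 4 in 10
print(smallest_series(0.55, 0.95))
```

The first line reproduces the "four in 10" figure from the question; the second prints the series length the quiz is asking for.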
#2: P(girl/girl family) = 1/4
Strictly speaking, here it's 1 in 3 - you can eliminate boy/boy as a possibility.
Presuming I understand #7 correctly, I don't like this question, though I guess saying nearest 5 percentage points is their way of weaseling out of its asymptotic nature.
Agreed - I've not done a good job of explaining this problem to others.
No sir. The level of divergence from a simple 1 in 6 probability for so-called "precision dice" in casinos may be very small, but it is not non-existent and is not required to be by law. It is doubtful there is a die in the world that has an exact 1 in 6 probability for every number. And if it did exist, it would almost certainly deviate from that after several uses, making a real world confirmation of its initial probability impossible.
This is my point. The difference between the identity of a starting pitcher and the balance of a particular die (or the shuffle of a deck of cards) are differences in degree, not in kind. Therefore the applicability of a probability model should be due to how well the model conforms with reality in either case. That one is dice and the other baseball is irrelevant.
I was about to argue that we shouldn't interpret "fair dice" in the sense of physical dice, but as a necessary abstraction - 'til I reread your posts and realized we were saying the same thing. So ignore my blurb in #56 (done and done, I'm sure).
You're adding rules. Monty *did* open a door with a cow. You don't know he *had to*.
Fair enough. I was looking at Monty asking me if I want to switch to door number 2, not, "say", door number 2. But now I see that he does generalize it enough prior to that.
See Walt's comment #2 in this thread. The conventional version of this game assumes that Monty always reveals a cow. If Monty can open a door with either a cow or a car, then the odds of winning are exactly 1/3 no matter what you do.
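The random-Monty variant is small enough to enumerate exactly. The sketch below assumes Monty opens one of the two unpicked doors at random, and that a revealed car counts as a loss for both strategies (no switching to the open door). The function name and the dictionary keys are made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def random_monty_odds():
    """Exact odds when Monty opens one of the two unpicked doors at random."""
    doors = range(3)
    # Each (car, pick, opened) branch: 1/3 for the car's location,
    # 1/3 for the contestant's pick, 1/2 for Monty's random choice.
    branch = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, 2)
    stay_wins = switch_wins = cow_shown = Fraction(0)
    for car, pick, opened in product(doors, repeat=3):
        if opened == pick:
            continue  # Monty never opens the contestant's door
        if pick == car:
            stay_wins += branch
        if opened != car:
            cow_shown += branch
            if pick != car:
                switch_wins += branch  # the remaining closed door hides the car
    return {
        "stay": stay_wins,
        "switch": switch_wins,
        "switch_given_cow": switch_wins / cow_shown,
    }


print(random_monty_odds())
```

Under these assumptions both strategies win 1/3 of all games, and in the games where a cow happens to be shown, switching wins exactly 1/2. That is the 50/50 answer many people expect, and it only holds when Monty does not know where the car is.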
I guess it depends on the rules. If Monty showed me a car, I'd choose to switch to the door with the car behind it. Nothing in the rules says you can't switch to the door Monty just opened for you.
Edit - 20 switches, 13 cars. 20 "stand Pat" Gillick's, 6 cars.
I'm really struggling with this question and explanation. Running through your equation with a common name (say, 1/100), still gives me .4975, which means that the rare part doesn't play a huge part in it.
Wait, what's the answer to #2? Is it 1/2 (assuming as above that the birth rate is 50%)? So the rare name knocks it down to 0.49999975 and a common name knocks it down to .4975.
Note: Probability was never my strong suit, primarily because I had a hard time with what Voros mentioned. That top card in the deck is what it is, no matter what I know or what the guy who looked at the deck knows. It took me a long time to internalize probability.
It wasn't all that convincing to me.
I did 30 "switches" and won 17 cars, and 30 "stand pats" and won 16 cars.
Assume an equal birth rate. There are the following gender possibilities for a family of two children:
Elder child is a boy, younger child is a boy (probability 1/4)
Elder child is a boy, younger child is a girl (probability 1/4)
Elder child is a girl, younger child is a boy (probability 1/4)
Elder child is a girl, younger child is a girl (probability 1/4)
So if we know that the family has one girl, the probability that both are girls is
(1/4)/(1/2 + 1/4) = 1/3.
The reason to make the name "rare" in question #3 is to make that answer vanishingly close to 1/2.
What's throwing me off is how the question is stated.
I'm missing how the name helps me. In both, I know at least one is a girl. In the latter, I know her first name. How does knowing the known girl's first name help me with the unknown child?
I think comment #42 in this thread has the best answer for you. Basically, it's twice as likely that a family with two girls will have a girl named Rhiannon (the most unusual name of a girl that I actually know that I could think of) than that a family with one girl will have a girl named Rhiannon. As others have suggested, this is generally true regardless of the unusualness of the name, although the more unusual the name, the closer the true probability will approach exactly 50%.
In two-children families, half of all girls live in GG families. So there's a 50% chance that a rare occurrence (like a rare name) happens to a girl in a GG family.
But only one-third of two children families with a girl in them are GG. So if we happen on a family with a girl in it, only a 1/3 chance that it's GG.
I think that is the unstated assumption on which the problem depends. However, names are not really randomly assigned the way that gender is. Maybe the family had a recently deceased grandmother named Rhiannon and they had determined to name one of their daughters Rhiannon no matter how many daughters they had. Maybe the family comes from a small tribe where the name Rhiannon is very common and everyone has a daughter by that name. I agree 50% is a reasonable best guess, but it's not necessarily the *right* answer without the key assumption.
An impenetrable enclave of Fleetwood Mac fans.
1. That any given baby has a 1/2 probability of being a girl, independent of all other births.
2. That the probability of the rare name being given is independent of the number of girls born in the family.
That's it.
My problem is that it doesn't actually matter what the name is or how rare it is. Just by knowing the name the probability jumps from 33% to near 50%. Even in the case of a very common name (1/10), you get .475. If half of the girls were named Josephine and you got a Josephine you'd be at .375. So knowing the name increases your odds of having GG in all cases, no matter how rare the name, according to the math.
Which means that if you had 9,999 families and pulled all of the ones who had at least one girl, you'd get 3,333 that had two girls. But if you had 9,999 families and pulled out all of the ones with at least one girl, then told the name of that girl, you'd get more than 3,333 that had two girls.
How is that possible?
(I understand you wouldn't get the exact distribution and all.)
Suppose our chosen name is incredibly common, it's Mary, and 1/3 girls are called Mary. We have 9,999 families, all with at least one daughter, as in your example.
So we expect there to be 6,666 one-daughter families. 1/3 girls are called Mary, so we expect there to be 2,222 one-daughter families with a girl called Mary.
We expect there to be 3,333 two-daughter families, 1/3 girls are called Mary. How many of these families contain a girl called Mary? Clearly we expect more than 1/3 of them (1,111) to contain a girl called Mary, because there are two girls in each family. In fact we expect there to be 1852* two-daughter families with a girl called Mary.
What we are interested in is not this 1852 as a percentage of all 9,999 families, but this 1852 as a percentage of just those families with a girl called Mary. So the percentage is 1852/(1852+2222), which is about 45.5%.
*Rounded to 0 decimal places, that carried through.
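The Mary count above generalizes to any name frequency. Here is a minimal sketch, assuming the two conditions listed earlier in the thread (each child is a girl with probability 1/2, and each girl independently gets the name with the same probability `r`), plus the assumption that two sisters may share the name. The function name is hypothetical.

```python
from fractions import Fraction


def p_two_girls_given_named_girl(r):
    """P(both children are girls | at least one girl has the given name).

    `r` is the chance that any one girl carries the name.
    """
    r = Fraction(r)
    # Girl/girl families (1/4 of all): at least one of two girls has the name.
    gg_with_name = Fraction(1, 4) * (1 - (1 - r) ** 2)
    # One-girl families (1/2 of all): the single girl has the name.
    one_girl_with_name = Fraction(1, 2) * r
    return gg_with_name / (gg_with_name + one_girl_with_name)


for r in (Fraction(1, 3), Fraction(1, 100), Fraction(1, 1_000_000), 1):
    answer = p_two_girls_given_named_girl(r)
    print(r, answer, float(answer))
```

Simplifying the ratio gives (2 - r)/(4 - r). With r = 1/3 that is 5/11, matching the 45.5% Mary figure; as r shrinks toward zero it approaches 1/2, and with r = 1 (every girl "has the name") it falls back to the 1/3 answer for question #2. Note that this differs slightly from the shortcut used earlier in the thread, which divides by r alone and gives (2 - r)/4.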
I did 20 switches and won 15 cars, and 20 stand pats and won 10 cars.
Oddly enough, both times I tried it, the first half dozen times standing pat was always the right answer and then it normalized more.
Doesn't the game really depend on whether Monty is obligated to offer the switch at all? If he can choose to not give you the opportunity to switch, have you really learned any new information? Why would Monty in reality ever give you the opportunity to switch if you had a goat?
There are only three girl names, and they are Mary, Kate, and Ashley and they're randomly distributed.
So pick all of the families that have an Ashley and there's a 45% chance that they have 2 girls. Same for Mary and same for Kate. But pick all of the families that have at least one girl and there's a 33% chance that they have 2 girls.
What's happened? There's overlap between your selections, I guess, when you select individually. Because the two girl families will have one Mary and one Ashley, etc. or two of the same.
OK, I need to think about that some more. I have no problem visualizing the Monty Hall or the birthday problems, but this one has me stumped.
However, I think you need to know the name in order to solve the problem. You can't narrow it down to 1 in a million without knowing the name. Because there might be 10,000 names that are 1 in a million, so just by knowing it's rare, you really don't know to use 1/1,000,000. If there are 10,000 other names that are 1 in a million, then you have to use 1/100, don't you? In the same way that knowing that the name is 1/3 in the simple example doesn't help you. You need to know which of the 3 it is.
Yes, I think this is correct, although I guess it depends on the exact phrasing of the question.
If it's something like, "The girl has a very rare name (one in a million), what is the probability that all families with a girl of THAT NAME have two girls?", then you should use a probability of one in a million.
If it's something like, "The girl has a very rare name (one in a million), what is the probability that all families with girls of names of that rarity have two girls?", then you'd have to figure out exactly how many names there are that have rarity one in a million, and add them all up. Of course, it doesn't change the final answer that much until you begin including a significant percentage of the population (like 1/3 in your example).
Bialik's quiz says "you know that a certain family...", which I take to mean that it is a particular family you are talking about. Not one that will be selected randomly once you decide what name you remember the girl has, but a particular family that is already chosen.
So, suppose you have misremembered her name, and it is a very common one instead of very rare. Or you remember that "Rhiannon" is really her nickname, and her real name is Jennifer or something. Or that she lied to you, and gave a phony name. Does your faulty recollection or her dishonesty change the probability of the gender of her sibling? Remember, we're (supposedly) talking about a "certain family", not a portion of a distribution cloud...
Edit: Or what if all names are equally unusual? Then knowing the name is irrelevant, right, because you have to assume she has a name...
Yes, exactly. It doesn't have to be a name, it can be any qualifier that gets to one in a million. If you just remember that she has a name that's rare, that's not enough. You need to know how uncommon her specific name is in order to use 1/1,000,000. But you're right, it's just significant digits until you get a large proportion.
I think this is also significant. You're not using the name to select the family, you've already picked them, then remember the fact. So if that's actually true then the proportion didn't help in the selection. Does this change the calculation?
Well, if we are talking about any particular family, then the probability is either 1 or 0. Either that family has two girls or it doesn't. The presumption in the problem is that you can't remember either way, so you are forced to fall back to the statistics and look at how likely families with a girl of that given name are to have 2 girls, which you deduce to be 2/3. Of course, for any single family, the probability is either 1 or 0, but overall, about 2 out of every 3 families with a girl of that name have two girls.
Your faulty memory, of course, doesn't change the probability that that particular family has two girls. It's still either 1 or 0. However, it does change the probabilities that you use in order to deduce the likelihood from the general population.
For example, if the name is one in a million, the probability is actually 1.999999/3, not 2/3. If the name is one in ten, then the probability is 1.9/3 instead of 2/3.
Again, it depends on how the question is phrased. If we live in a universe where all names are rare, and you only know that she has a rare name, then you're right, it doesn't help you at all. However, if you can narrow it down to some subset of names, then things change...
Make a field of around 200 cells in Excel, in groups of two, set them up as random numbers ("=rand()"). Then conditionally format each cell to red if the random number is less than 0.01, yellow if between 0.01 and 0.50, and green if >0.50. These correspond to "rare girl name", "non-rare girl name", "boy". Then hit F9 and count the red cells, and count how many times they occur next to a yellow vs. how many times next to a green. Do this for a while, and in no time you will be bouncing around 50% or so.
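The same spreadsheet experiment can be sketched in Python for anyone without Excel handy. The cutoffs (0.01 and 0.50) are the ones from the post; the function name sister_share, the seed, and the number of pairs are arbitrary choices. One difference from the hand count: this tallies two-cell groups that contain at least one red cell, so a red-red pair counts once, as a two-girl family.

```python
import random

def sister_share(pairs=200_000, rare=0.01, seed=0):
    """Among two-cell groups with a red cell, return the share where both cells are girls."""
    rng = random.Random(seed)

    def cell():
        r = rng.random()
        if r < rare:
            return "red"      # girl with the rare name
        if r < 0.5:
            return "yellow"   # girl with a non-rare name
        return "green"        # boy

    with_sister = total = 0
    for _ in range(pairs):
        a, b = cell(), cell()
        if "red" not in (a, b):
            continue          # no rare-named girl in this family
        total += 1
        if "green" not in (a, b):
            with_sister += 1  # both children are girls
    return with_sister / total

print(sister_share())
```

With these cutoffs the exact value works out to 0.0099/0.0199, or about 49.7%, so the simulated share should hover just under 50%.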
The problem I had was picturing the rare name as belonging to a single girl, when really it could belong to either of the two kids...
Edit: spelling
Should be 1.999999/4 and 1.9/4, sorry.
Now we're using my last name too? Man, this is a confusing thread for me to read.
If the name is more popular than that, the problem gets more difficult, and all you people trying to do the math for more common names are oversimplifying things. This is because, as I said earlier, names are not randomly assigned like genders are. At the very least, the names of children within the same family are dependent on each other. For example, no family will give both of their girls the same name.
Consider the example in post 77. In a 2-girl family, there is virtually 0 probability of both girls being named Mary. But AlouGoodbye's probabilities imply that there is a 1/9 chance of that happening, because it assumes that the names of the two girls in the family are independent of each other.
Taking it a step further: Once the first girl in a family is named Mary, the probability of the second one being named Mary is 0. Therefore, for the probability in the overall population to be 1/3, the probability of 1-girl families having a Mary must be more than 1/3. And the probability of a 2-girl family having a Mary must be less than 2/3.
So that's why the author thinks the rarity of the name is relevant. He assumes that by choosing a really rare name, he makes it very close to twice as likely that a 2-girl family will have a Rhiannon than that a 1-girl family will have a Rhiannon. But that's an assumption.
Going back to Alou's post #75:
I think that is the unstated assumption on which the problem depends. However, names are not really randomly assigned the way that gender is. Maybe the family had a recently deceased grandmother named Rhiannon and they had determined to name one of their daughters Rhiannon no matter how many daughters they had. Maybe the family comes from a small tribe where the name Rhiannon is very common and everyone has a daughter by that name. I agree 50% is a reasonable best guess, but it's not necessarily the *right* answer without the key assumption.
Dave, the hypotheticals you make do not affect the problem. We only need the following assumptions:
1. That any given baby has a 1/2 probability of being a girl, independent of all other births.
2. That the probability of the rare name being given is independent of the number of girls born in the family.
That's it.
#2 may not be true, though, which is what I was trying to show with my examples. Consider the example where a name is rare in the overall population but common in a sub-population (like a small town or tribe). In fact, consider a very small tribe where everyone names their first daughter Rhiannon. The odds of a two-daughter family having a Rhiannon are exactly the same as the odds of a one-daughter family having a Rhiannon. Within that tribe, the odds are 100% in both cases. Within the overall population, the odds of being named Rhiannon may still be 1/1,000,000.
Or consider an extremely rare first name that, in some families, is passed down to the first-born female in each generation. In fact, this is the only way in which the name survives in the population today. Having two girls doesn't increase the odds that a family will have a girl by this name.
These examples may be farfetched -- my only point is that there is an implicit assumption underlying question #3.
...
...
...
...
...
Carry on!
If I already KNOW one child is a girl, isn't the probability of it being two girls 51/49 (or whatever nature states is the probability of any single child being female)? Unless, of course, the chromosomal "track record" of having a child of a particular gender matters at all when predicting the next one (father of two girls, with a third on the way, speaking). Ah well, this thread is making my head hurt. :-)
Think of it like flipping a coin two times. If I told you that at least one of the times I flipped a head, what are the odds of having flipped heads twice? Well, there are four possible outcomes, each with equal probability -- HH, HT, TH, TT. The only piece of information that you have is that the outcome wasn't TT. The possible outcomes are HH, HT, TH. Each has equal probability, so the probability of HH is 1/3.
Another way to think about it is that of all the two-child families in the country, 25% will have two boys, 25% will have two girls, and 50% will have one of each. If you eliminate the families that have only boys, then 25/75 = 1/3 have two girls.
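The counting argument above is small enough to enumerate directly. This Python sketch assumes boys and girls are equally likely and births are independent; it also shows the contrasting case where you know a specific child (say the elder) is a girl, which is where the 50% intuition comes from.

```python
from fractions import Fraction
from itertools import product

# All equally likely two-child families, as (elder, younger).
families = list(product("BG", repeat=2))

# Case 1: we only know that at least one child is a girl.
with_girl = [f for f in families if "G" in f]
p_at_least_one = Fraction(sum(f == ("G", "G") for f in with_girl), len(with_girl))

# Case 2: we know that one specific child (the elder) is a girl.
elder_girl = [f for f in families if f[0] == "G"]
p_elder = Fraction(sum(f == ("G", "G") for f in elder_girl), len(elder_girl))

print(p_at_least_one)  # 1/3
print(p_elder)         # 1/2
```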
All that you've said is true, but my problem has been trying to understand the logic behind the answer of the simple problem, not the real-world problem. In other words, to get a real-world answer you'd need the tweaks that you suggested, but I can't get there until I understand the basics. And to be honest, I don't really care about the real-world problem anyway.
You're trying to teach me calculus when I'm struggling with algebra.
Isn't Q1 91%, not 92%?
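For Q1, the counts can be laid out in a few lines of Python. This is only a sketch using the figures stated in the quiz (1,000 athletes, one in 10 users, 1% false positives, negligible false negatives); the variable names are made up.

```python
from fractions import Fraction

athletes = 1000
users = athletes // 10                     # one in 10 have used the drugs
clean = athletes - users

true_positives = users                     # false-negative rate treated as zero
false_positives = clean * Fraction(1, 100) # 1% of clean athletes test positive

p_user_given_positive = Fraction(true_positives) / (true_positives + false_positives)
print(p_user_given_positive, float(p_user_given_positive))
```

That is 100 out of 109 positives, roughly 0.917, so to the nearest percentage point it rounds to 92%.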
I put 30% for Q7 too. How did you arrive at that number?
1/2 the families with rare name X are boy/girl.
1/2 the families with rare name X are girl/girl.
but now, if you don't assume that either you were given the rare name, because it is rare, or you remembered it because it is rare, you have to split the second case, so you have
1/2 the families with rare name X are boy/girl.
1/4 the families with rare name X are girl/girl, and you remember the rare name X.
1/4 the families with rare name X are girl/girl, and you remember the other name.
now the third case only matters because it changes the distribution from 1/2:1/2 to 1/4:1/2, or a 1/3 probability that the girl with the rare name came from a girl/girl family.
Q4: grinding it out in Excel with BINOMDIST leads to 269.
Ah - I'd been using the POWER, FACT, and SUM functions over a range of cells.
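For Q4, the same grind can be done without a spreadsheet. This Python sketch assumes games are independent with a fixed 55% chance for the AL champ, and uses the fact that winning a best-of-n series has the same probability as winning a majority of n games if all were played. The function names are made up for illustration.

```python
from math import comb

def series_win_prob(n_games, p=0.55):
    """Probability that the better team wins a majority of n_games (n_games odd)."""
    need = n_games // 2 + 1
    return sum(
        comb(n_games, k) * p**k * (1 - p) ** (n_games - k)
        for k in range(need, n_games + 1)
    )

def smallest_fair_series(p=0.55, target=0.95):
    """Smallest odd series length at which the better team wins with probability >= target."""
    n = 1
    while series_win_prob(n, p) < target:
        n += 2
    return n

print(1 - series_win_prob(7))   # NL champ's chance in a best-of-seven
print(smallest_fair_series())   # compare with the BINOMDIST result above
```

The best-of-seven line comes out near 0.39, which matches the quiz's "four in 10 times".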
Q7: 30%? I'd say 10% (it approaches 1/9 over large ranges of large numbers, which I presume is what they mean to do here). While these aren't random numbers and there are no doubt pressures that cause revenue figures to more often begin with 1 (followed by x digits), rather than 9 (followed by x-1 digits), it wouldn't be enough to bump our estimate upwards. Am I missing something?
Then how will I get my car? This game is broken!
That cow might be worth more than that Chevy Celebrity.
From Wikipedia:
...Benford's law can be explained if one assumes that the logarithms of the numbers are uniformly distributed; this means that a number is for instance just as likely to be between 100 and 1000 (logarithm between 2 and 3) as it is between 10,000 and 100,000 (logarithm between 4 and 5). For many sets of numbers, especially ones that grow exponentially such as incomes and stock prices, this is a reasonable assumption.
More precisely, Benford's law states that the leading digit d (d ∈ {1, …, b − 1}) in base b (b ≥ 2) occurs with probability proportional to log_b(d + 1) − log_b(d) = log_b((d + 1)/d). This quantity is exactly the space between d and d + 1 in a log scale.
In base 10, the leading digits have the following distribution by Benford's law, where d is the leading digit and p the probability:
d p
1 30.1%
2 17.6%
3 12.5%
4 9.7%
5 7.9%
6 6.7%
7 5.8%
8 5.1%
9 4.6%
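The table above can be reproduced from the formula in a few lines of Python. The second half is an illustration added here, not part of the Wikipedia excerpt: it tallies the leading digits of the first 1,000 powers of 2, as one example of an exponentially growing sequence.

```python
from collections import Counter
from math import log10

# Benford probabilities in base 10: P(d) = log10((d + 1) / d).
benford = {d: log10((d + 1) / d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"{d}  {p:.1%}")

# Illustration: leading digits of 2**1 .. 2**1000.
leading = Counter(int(str(2**n)[0]) for n in range(1, 1001))
share_of_ones = leading[1] / 1000
print("powers of 2 starting with 1:", share_of_ones)
```

The share of powers of 2 that start with 1 lands close to 30%, well above the 1/9 you would expect if leading digits were uniform.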