A quirk of the mighty t-test

Posted by piantado on March 12, 2009

This situation came up in Rebecca’s class last Wednesday. Some students had gathered data that looked something like this made-up version:

This shows, for each subject, their RTs in the two conditions of the experiment (Green/Blue). If you throw this sucker into the paired t-test, you find that the difference between conditions is, in fact, not significant (t = -1.8994, df = 10, p > 0.08), even though the difference in means is large (-1.28) and every subject shows the correct trend. But the signed-rank test comes out significant (p < 0.003).

As Talia pointed out, there’s one subject, the first one, who has a much bigger effect size than the others, and the t-test takes this into account. And, if you remove this first subject, the t-test comes out significant (t = -5.8054, df = 9, p < 0.0005). This is because the t statistic cares about the variance in the differences (for each subject) between conditions. If you leave in the first subject, this variance is large and the t statistic decreases in magnitude.

So adding a data point with the correct trend can actually shrink your t statistic and make the difference nonsignificant. Why is this counterintuitive? Because the first point is a much bigger effect than the others, in the right direction! And removing it actually shrinks the mean difference between conditions (to -0.62).
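Here is a minimal Python sketch of the same effect. The differences below are hypothetical numbers invented for illustration (one big effect plus ten small ones, all in the same direction), not the class data, and the helper name paired_t is made up. The cutoffs 2.228 and 2.262 are the standard two-sided 5% critical values of t for 10 and 9 degrees of freedom.

```python
import math
import statistics

def paired_t(diffs):
    """Return (mean difference, t statistic, df) for a list of paired differences."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    # Standard error uses the sample SD (n - 1 in the denominator).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean, mean / se, n - 1

# Hypothetical per-subject differences (Green minus Blue); NOT the class data.
# The first subject has a much larger effect than the rest.
diffs = [-7.9, -0.3, -0.5, -0.6, -0.4, -0.9, -0.7, -1.0, -0.5, -0.8, -0.6]

mean_all, t_all, df_all = paired_t(diffs)        # all 11 subjects
mean_rest, t_rest, df_rest = paired_t(diffs[1:])  # first subject dropped

print(f"all subjects:  mean={mean_all:.2f}, t={t_all:.2f}, df={df_all}")
print(f"without first: mean={mean_rest:.2f}, t={t_rest:.2f}, df={df_rest}")
```

With these made-up numbers, the full set has the larger mean difference but a t statistic inside the critical value, while the set without the first subject has a smaller mean difference and a t statistic well outside it.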

Beware!

How long till I take over?

Posted by piantado on January 20, 2009

I played around with a fun problem this weekend with Celeste. We started to talk about descendants and ancestors. What are the odds that someone descended from me will be alive T years in the future? What are the odds I will have no descendants in 10000 years? What is the relationship between the number of kids I have and the odds my descendants will all die out?

There are some funny phenomena relevant to this. Mitochondrial Eve is one. The number of people descended from Genghis Khan is another. In Knuth’s Volume 1, he talks about a “proof” by H. W. Watson that under certain assumptions, infinitely many people will be born in the future, but each family line will die with probability 1 (pg 383). This is of course logically inconsistent, and Knuth proves as much: in an infinitely tall tree, you can find an infinite path of descent. If people live forever, someone’s family will live forever. Which should be obvious.

Here’s a simple model for thinking about infinite family trees. Suppose that there is some bounded population of size P. To create people for the next generation, the following is repeated: two people are chosen at random, bred, and their (one) child inhabits the next generation. That child is a descendant of its two parents. This is repeated until the next generation is filled up with people. And so on. This neglects a lot of important things–like gender, natural selection, sexual selection, and the grossness of incest. But maybe it’s not such a bad start…

First, two observations. If the population size is bounded, then it is eventually going to be the case that everyone is a descendant of me, or nobody is. This is because everyone being my descendant and nobody being my descendant are fixed points: once either is true, it is true for all of time. And each generation there is some probability of either of these happening, so given enough time, one must come true. The second thing to notice is that the expected proportion of the next generation descended from me is 1-(1-p)^2 = p(2-p), where p is the proportion of people who are my descendants at the current time step. This is because the probability of getting someone in the next generation who is not a descendant of me is the probability that neither parent was a descendant of me, or (1-p)(1-p). (This also shows that p=0 and p=1 are the only fixed points.) Thus, at each generation the number of my descendants should grow, in expectation, by a factor of (2-p). Things are looking good.

But of course there is a lot of variability, especially when you only have a few kids initially. So how many kids should you have in order to be pretty sure your genes will take over? Since I know more about Perl than stochastic processes, I wrote a little script to figure this out. If anyone wants to solve this analytically, I’d like to know, because I can’t run the Perl script very quickly on a population of size 6 billion.
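For anyone who wants to play along, here is a rough Python sketch of this kind of simulation. It is not the original Perl script. It assumes that "number of kids" means the number of my descendants in the first simulated generation, and that the two parents of each child are drawn independently (so a child is my descendant with probability 1-(1-p)^2). The function names are made up for illustration.

```python
import random

def takes_over(pop_size, kids, rng):
    """Simulate one lineage; True if everyone ends up a descendant, False if none do."""
    descendants = kids
    # 0 and pop_size are the two fixed points, so stop as soon as we hit either.
    while 0 < descendants < pop_size:
        p = descendants / pop_size
        p_child = 1 - (1 - p) ** 2  # at least one of the two parents is a descendant
        # Fill the next generation one child at a time.
        descendants = sum(rng.random() < p_child for _ in range(pop_size))
    return descendants == pop_size

def takeover_probability(pop_size, kids, runs=100, seed=0):
    """Fraction of simulated runs in which the lineage takes over the population."""
    rng = random.Random(seed)
    return sum(takes_over(pop_size, kids, rng) for _ in range(runs)) / runs

if __name__ == "__main__":
    for kids in (1, 2, 3, 4):
        print(kids, "kids:", takeover_probability(1000, kids))
```

Only the count of descendants is tracked, not the family tree itself, which is enough here because a child's status depends only on whether either parent is a descendant.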

Here are some results, showing the probability that everyone is eventually a descendant of me for various population sizes and number of kids I have, averaged over 100 runs:

          size=1000   size=10000   size=100000
1 Kids    0.8         0.78         0.79
2 Kids    0.95        0.96         0.94
3 Kids    1.0         0.99         0.99
4 Kids    1.0         1.0          1.0
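A rough sanity check on numbers like these, under an approximation that is mine and not part of the simulation: in a large population, while my descendants are still rare, they almost never pair with each other, so each one leaves roughly a Poisson number of children with mean 2 (every child picks two parents). That is a standard branching process, whose extinction probability q is the smallest solution of q = exp(2(q-1)). A lineage started by k kids then dies out with probability about q^k. The sketch below solves for q by fixed-point iteration; the function name is hypothetical.

```python
import math

def extinction_probability(mean_children=2.0, iterations=200):
    """Smallest root of q = exp(mean_children * (q - 1)), iterating from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = math.exp(mean_children * (q - 1.0))
    return q

q = extinction_probability()
for kids in (1, 2, 3, 4):
    # Each kid's line dies out independently (approximately), so all k fail with q**k.
    print(f"{kids} kids: takeover probability ~ {1 - q ** kids:.2f}")
```

Under these assumptions q comes out near 0.2, which gives takeover probabilities of about 0.80, 0.96, 0.99, and 1.00 for one to four kids, regardless of population size.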

So, you don’t need many kids to have good odds of eventually taking over. At least when the population is relatively small. But in a larger population, most people’s kids probably reproduce more readily than in this model, so you still don’t need many kids to eventually take over. I wonder if anyone has attempted to model the distribution of family names along these lines…

JFK’s clubs

Posted by piantado on November 23, 2008

Just read an interesting letter from TICS by Paul Bloom and Susan Gelman: Psychological essentialism in selecting the 14th Dalai Lama. In it, they discuss the fact that humans tend to act as though physical objects can acquire some kind of nonphysical essence. For example, John F Kennedy’s golf clubs sold at auction for $772,500. That’s pretty silly.

It’s especially silly when you consider that there is no physical characteristic of the golf clubs that links them to JFK. They are just golf clubs! They are worth that much not because they are made of gold, or because the objects are specially different from any others. They are worth that much because of their history. Yet, nothing about their current state contains any real information about that history (or maybe they have JFK’s DNA on them?). You could swap in some other golf clubs from the same manufacturer and nobody would know! So what’s so special about them?

On the one hand, I find this intuition very compelling–that there is something essentially different about those golf clubs as compared to any others. Their history is different. They’ve been through different things. But on the other hand, I don’t believe that they are interestingly different as physical objects from other golf clubs. You could never tell by looking at them who’s owned them! What’s a hardcore materialist to do?

I wonder about psychological essentialism poking its head into other social and political domains. Americans seem to care less about the number of Iraqi deaths than American deaths. We also care less about the rights of immigrants and noncitizens. How much of our national and social in-group mentality is driven by this idea that somehow this group of people is different from this other group of people? That differences of origin or culture are somehow relevant to what rights a group should have?

Said another way, how could a country seriously believe that its citizens were so essentially different from any other group, that rights should only be guaranteed to citizens? How can anyone seriously believe that those golf clubs are so essentially different as to be worth three quarters of a million dollars? Is thinking that American citizens deserve things other groups don’t any different from thinking JFK’s clubs have intrinsic value?

“just” theories and the science of the ordinary

Posted by piantado on August 31, 2008

It’s often hard to explain to people what I do. If I tell them I study how kids learn language, the almost universal response I get is “don’t parents just teach their kids?” Or if I say I study language processing, trying to understand how people convert a sound signal into a meaning and how they deal with things like ambiguity, the typical response is “don’t people just figure it out?” Yes and yes. Parents just teach kids language and people just figure out what sounds mean what. Can I have my PhD now?

I often wonder if physicists and biologists and chemists get these kinds of responses when they try to describe their work. Has there been some physicist somewhere who tried to explain his latest theory of quantum gravity, but was cut off by someone saying “well don’t things just fall?” Or a biologist who was explaining the developmental dynamics of the patterns on butterflies’ wings, but was interrupted by “well don’t they just grow like that?”

Probably there has. But I think it’s worse for us cognitive scientists because we study things which people find easy and effortless. People feel like they understand them. Almost everything we study is done without serious conscious effort, and so it feels to us like we are “just” solving the problem. I remember reading the PDP book and seeing an example in the first few pages about reaching for a cup of coffee. How do people reach for their cup of coffee on their desk? Well you just reach for it.

But when you do, you face all kinds of computational complexities that we are only beginning to be able to solve. You must recognize your coffee cup, perhaps from a view you’ve never seen before. You must recognize other things on the desk, and move your muscles in such a way as to reach for the cup and not knock anything else over. You have to shape your hand to the handle and figure out how much force to exert to pick it up without dropping it, but not crush it. You have to work out how to support its weight, how to move it without spilling it, how far to move it, how to translate your three-dimensional image of the world into motor actions, etc. You do all this effortlessly–you just pick it up.

The fact that you “just” do it is the remarkable part. It’s not the deeper explanatory theory we want to find; it is a statement of the problem we are trying to solve. How do you reach for the cup? How do kids “just” learn language? How do adults “just” understand it?

The hard part is understanding the computational processes that explain the how and the why. If you’ve programmed a lot, you know the painful process of making everything correct and explicit in a program. You know it’s very hard to take high-level things which are intuitive and make them explicit. “Lift the cup so that you don’t spill it and avoid putting it on your keyboard” are fine, intuitive instructions for another human. But it’s remarkably hard to implement them in a program, which requires explicitness about everything–how much force to send to which motors, etc. If I had to describe what cognitive science is, I would say that it’s pretty much that: making explicit the computational processes that underlie what we do. The fact that we, as humans, find it easy to execute those computational processes is mostly irrelevant, except that it prevents people from seeing how remarkable we are as biological-computational machines.

But we aren’t just remarkable; we are also very weird. Take language: you sit on one side of the room and I on the other, and I get some idea in my head. We don’t know what it really means for me to have the idea, but I get one, and decide to convey it to you. To do this, I shake the air a little bit, bumping the molecules back and forth, and they ricochet and bounce between us, eventually bumping your eardrums. You interpret these molecular bumpings in some way, using a code that you inferred as a child from listening to the bumpings around you. And the code is not easy to learn–this is why you can’t understand a language you’ve never heard. For one, it is richly structured in such a way as to let us communicate arbitrarily complex ideas–I can talk about John, or John’s grocer, or John’s grocer’s brother’s uncle, or the uncle of John’s grocer’s brother who once flipped off the pope–and (intrusively) construct some new mental object in your brain. Also, the mapping between the bumpings and what the bumpings mean is somewhat arbitrary: not much would change if “dog” meant cat and “cat” meant dog. The code is hard enough to learn that a very smart monkey can’t do it, but easy enough that every human can–without explicit instruction. After feeling some of these bumpings, you get a new idea in your head. The idea you construct has the power to influence your actions and beliefs impressively: it might lead you to cry, laugh, reconsider your politics, or blow up a federal building. Isn’t that strange? Remarkable? What would a Martian think of our quaint communication system?

I think seeing and appreciating the weirdness and complexity of the world is why some people–myself included–do basic science. Often, the most fundamental progress in science comes from recognizing that something we thought was intuitive really isn’t. Did you know that hot water can freeze faster than cold water, that you can only insert “fuckin” in certain places in English words (“stu-fuckin-pendous” sounds okay but “stupen-fuckin-dous” doesn’t), or that the world is crazily different for tiny organisms (or other low-Reynolds number life)?

What the cognitive scientist has is awe for the ordinary; an appreciation of how hard things really are to do, even though we find them easy. Fortunately, once you see the complexity in one cognitive act, it is easy to find it in everything you do. Give me an algorithm for how you move your fingers to pull your housekey out on your key chain. Or how you catch yourself when tripping. Or how you can decide if an animal off in the distance is a donkey. How can you recognize someone by how they walk? How do you reason about nonexistent or abstract things (can a unicorn trip? sneeze? laugh?)? How do you decide if it’s wrong to sleep with your stepsister? Why can’t John McCain smile? And after the mundanely easy, things get even harder–how do the Car Talk guys diagnose a Ford? How does a musician improvise? How does an expert play chess, or a writer choose sentences?

I think this is why I do basic science–there is so much to be in awe of when you really look. Anyone can appreciate the concert, but only a lucky few get to appreciate the act of reaching for the instrument.

security is not free

Posted by piantado on August 24, 2008

I was standing in airport security a while ago, wondering about how much all of this security costs society. Unfortunately, this is a hard thing to quantify: how much is our time worth? How much does it cost us to stand a few minutes in an airport security line every time we fly? I could imagine quantifying it in terms of lost wages or the cost to pay TSA workers, but what is best?

One good way is to figure out how much time is lost per year–how many hours does it take away from Americans’ lives every year? The TSA reports that there were 708,400,522 people screened in 2006, with an average wait time of 3.79 minutes. This comes out to 5108 person-years of lost time due to airport security screenings. Or, if you assume that people live to be 80 years old, airport security costs the equivalent of 63 person-lifetimes every year. 63 lifetimes. Americans lost 63 lifetimes’ worth of time due to airport security. How many lifetimes’ worth of time were lost due to hijackings? None.
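The arithmetic is easy to check. The sketch below uses only the figures quoted above, plus the assumption of a 365-day year; the variable names are mine.

```python
passengers = 708_400_522   # people screened in 2006 (figure quoted in the post)
wait_minutes = 3.79        # average wait per screening (figure quoted in the post)
lifespan_years = 80        # assumed lifespan (from the post)

minutes_per_year = 60 * 24 * 365  # assumes a 365-day year

# Total waiting time, converted from minutes to person-years.
person_years = passengers * wait_minutes / minutes_per_year
# Person-years expressed as 80-year lifetimes.
lifetimes = person_years / lifespan_years

print(f"{person_years:.0f} person-years, about {lifetimes:.1f} lifetimes")
```

This gives about 5108 person-years, or just under 64 lifetimes, so the figure of 63 is that number rounded down.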

It’s worth thinking about what an optimal airport security policy would be. It should be clear that if security were too lax, we would be losing more people to terrorism than airport security wait lines. Conversely, if security were too tight, we would be losing more people due to security wait times. Picture a system in which you were screened for a few weeks before flying. This might totally eliminate terrorism on aircraft (but probably not), yet it would cost us a substantial amount of time. The way to determine if it is too much time is to see if the medicine costs more American lifetimes than the disease. Which, at 63-0, it seems to.

Said another way, what matters is the total number of lives lost–lives lost to security policies plus lives lost to terrorism. If no lives are lost to terrorism, then we could reduce the total number of lives lost by decreasing the security until the number of lives worth of lost time due to security is equal to the number lost due to terrorism. This increases the lives lost to terrorism, but decreases the total lives lost overall.

This means it is not rational to have a “zero tolerance” policy for airline terrorism. The best policy allows as many deaths, on average, from airline terrorism as it does from waiting in security lines. Unfortunately, it’s not clear how much to adjust it–maybe a small change would lead to a huge number of terrorist deaths.

Of course, people may have a different definition of “best” than the one here. After all, waiting in a security line is a guaranteed loss of 3.79 minutes, but potentially dying from terrorism is a small probability of a much bigger loss of minutes. Prospect theory tells us that people are risk averse for small probabilities of losses. So, people might rather pay the 3.79 minutes than have a very small chance of being killed by a terrorist. Even if that’s what people want, though, anything other than an equal number of deaths due to terrorism and security wait times costs society more human lifetimes.

What’s the point? The point is that security is not free. It is not free in terms of our rights and civil liberties. And it is demonstrably nonoptimal in terms of the number of American lifetimes’ worth of time it costs. The only thing worse than terrorism costing Americans part of their lives is the government doing it more so.