visualizing nonindependence

Posted by piantado on October 28, 2008

Here’s a problem I recently had: suppose you are given a sample from some joint distribution on X and Y. That is, you are given a bunch of (X,Y) pairs. One thing you may want to know is whether the distribution P( Y | X ) depends on X. You can’t always test this by doing a regression: sometimes the expected value of Y may not depend on X, but the distribution of Ys might. Sometimes P(Y | X) may even be an entirely different type of distribution for different values of X.
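To make the regression point concrete, here is a minimal sketch on made-up data (the distribution, sample size and seed are my own assumptions, chosen only for illustration). Y has mean 0 for every X, so the fitted slope should come out near zero, but the spread of Y clearly changes with X.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: X is uniform on [0, 1]; Y has mean 0 for every X,
# but its standard deviation grows with X.
n = 2000
xs = [random.random() for _ in range(n)]
ys = [random.gauss(0.0, 0.1 + 2.0 * x) for x in xs]

# Ordinary least-squares slope of Y on X, computed by hand.
mean_x = statistics.fmean(xs)
mean_y = statistics.fmean(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))

# Spread of Y in the lower and upper halves of the X range.
sd_low = statistics.stdev([y for x, y in zip(xs, ys) if x < 0.5])
sd_high = statistics.stdev([y for x, y in zip(xs, ys) if x >= 0.5])

print(f"regression slope:     {slope:.3f}")
print(f"sd of Y for X < 0.5:  {sd_low:.3f}")
print(f"sd of Y for X >= 0.5: {sd_high:.3f}")
```

A regression on these data would suggest that nothing is going on, even though Y is far from independent of X.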

But here’s one fun technique: it is easy to compute the marginal densities P(Y) and P(X) using, say, density estimation. Then we can do a 2D density estimation where we weight each data point (x,y) by 1 / ( P(x) P(y) ).

Weighting the density estimation like this doesn’t recover the actual density. Instead, it gives you the ratio P(Y | X) / P(Y). To see this, note that the function we get out should approximate

P(X,Y) / ( P(X) P(Y) ) = P(X) P(Y | X) / ( P(X) P(Y) ) = P(Y | X) / P(Y)

Thus, if P(Y | X) does not depend on X (and is therefore equal to P(Y)), you will recover a constant function. So if the function looks non-constant, you know that there is some kind of dependence. This quantity P(Y | X) / P(Y) can be interpreted as how much the probability changes by conditioning on X. That is, how much the probability of that specific Y value depends on whether or not you know X. If you take logs, you get how much the surprisal of Y changes by conditioning on X.
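A tiny discrete example may help here. The two joint tables below are invented for illustration, and `ratio_table` is just a helper name I made up. For the independent table the ratio is 1 everywhere (so its log is 0); for the dependent one it is not.

```python
import math

def ratio_table(joint):
    """Map each (x, y) to P(y | x) / P(y), given a joint table {(x, y): P(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal P(x)
        py[y] = py.get(y, 0.0) + p  # marginal P(y)
    # P(y | x) / P(y) is the same thing as P(x, y) / (P(x) P(y)).
    return {(x, y): p / (px[x] * py[y]) for (x, y), p in joint.items()}

# Made-up joint distributions over binary X and Y.
independent = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
dependent = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

for name, joint in [("independent", independent), ("dependent", dependent)]:
    print(name)
    for (x, y), r in sorted(ratio_table(joint).items()):
        # log2 of the ratio is the change in surprisal, in bits:
        # positive means knowing X makes this Y less surprising.
        print(f"  x={x} y={y}  ratio={r:.2f}  log2 ratio={math.log2(r):+.2f}")
```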

An example is shown below:

This was generated by taking X to be distributed uniformly in [0,1]. Y is then drawn from a normal distribution with SD 1, centered at 10*X. I ran this density estimation technique and plotted the log densities. The more red a region is, the more negative its log density. A script to generate it (which can be modified to play with other data) can be found here.

If you squint a bit, you can see how this represents that distribution’s nonindependence: for example, if you don’t know X, a high Y is very possible and maybe as likely as a small Y. But if you know X is small, a high Y is very unlikely. So, when X is small and Y is large, you get red, meaning that by conditioning on X, the probability in that region goes down a lot. Hopefully this can help visualize how they are nonindependent.
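For readers who want to try this without the original script, here is a rough pure-Python sketch of the same idea. It is not the script linked above. The Gaussian kernel, the hand-picked bandwidths `hx` and `hy`, the sample size, the seed, and the helper names are all my own assumptions. It prints the log ratio at a few points instead of drawing the plot.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)

def gauss(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / SQRT_2PI

def kde_1d(points, h):
    """Return a function t -> Gaussian kernel density estimate at t."""
    n = len(points)
    return lambda t: sum(gauss((t - p) / h) for p in points) / (n * h)

def dependence_ratio(xs, ys, hx, hy):
    """Return a function (x, y) -> estimate of P(y | x) / P(y)."""
    n = len(xs)
    p_x = kde_1d(xs, hx)
    p_y = kde_1d(ys, hy)
    # Weight each data point (x, y) by 1 / (P(x) P(y)).
    weights = [1.0 / (p_x(x) * p_y(y)) for x, y in zip(xs, ys)]

    def ratio(x, y):
        total = sum(w * gauss((x - xi) / hx) * gauss((y - yi) / hy)
                    for w, xi, yi in zip(weights, xs, ys))
        # Divide by n, not by the sum of the weights, so the result
        # approximates the ratio itself instead of a renormalized density.
        return total / (n * hx * hy)

    return ratio

random.seed(1)
n = 500
xs = [random.random() for _ in range(n)]        # X uniform on [0, 1]
ys = [random.gauss(10.0 * x, 1.0) for x in xs]  # Y normal, SD 1, mean 10*X

# Bandwidths are guesses, not tuned values.
ratio = dependence_ratio(xs, ys, hx=0.1, hy=0.5)

for x, y in [(0.1, 1.0), (0.1, 9.0), (0.5, 5.0), (0.9, 9.0), (0.9, 1.0)]:
    print(f"x={x:.1f}  y={y:.1f}  log ratio = {math.log(ratio(x, y)):+.2f}")
```

The log ratio should come out strongly negative where X is small and Y is large (and vice versa), and positive along the line Y = 10*X. A library such as SciPy would be much faster; as far as I know `scipy.stats.gaussian_kde` accepts a `weights` argument but normalizes the weights, so its output would be proportional to this ratio, not equal to it.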