[Gentoo] Why fitting a line to points is weird
Yesterday was an enlightening day. At work, I’m trying to compare two protein structures — one is a higher-resolution version of the other. I’m plotting a certain characteristic of each against the other and finding the slope of the line. Great news! There’s a really cool trend between pairs of structures. My boss asked me to switch the X and Y axes around, since one typically puts the newer data on the Y, and the data to which it’s being compared on the X.
So I did, and guess what? The slopes don’t match! That’s right, switching the axes around doesn’t necessarily give you the reciprocal slope when you’re doing a linear regression. Why, you ask? Because ordinary least squares only minimizes the residuals in the Y direction, so regressing Y on X and regressing X on Y fit two different lines unless the points are perfectly correlated. Suddenly my really cool trend isn’t a trend at all, and all the slopes are equal within error.
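To see the asymmetry concretely, here is a minimal sketch in Python with NumPy. The data is made up for illustration (it is not my protein data), and `ols_slope` is a hypothetical helper name. It fits the same points both ways and shows that the flipped slope is not the reciprocal of the original; instead, the product of the two slopes equals the squared correlation coefficient.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x (vertical residuals only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sum(dx * dx)

# Made-up noisy data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(scale=3.0, size=x.size)

slope_yx = ols_slope(x, y)  # regress Y on X
slope_xy = ols_slope(y, x)  # regress X on Y (axes flipped)
r = np.corrcoef(x, y)[0, 1]

print(slope_yx, 1.0 / slope_xy)   # these differ unless |r| == 1
print(slope_yx * slope_xy, r**2)  # these agree
```

The noisier the data (the smaller r² is), the further apart the two slopes end up.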
What we’ve decided to do is put the more accurate data on the X axis, since linear regression treats X as exact and attributes all of the error to Y, which suits the larger error in our Y data.
If any of you know much statistics, I’d like to hear a more accurate, better way to come up with a slope that’s robust to flipping the axes (perhaps by minimizing both X and Y distances or residuals?). This method needs to already be implemented in some open-source program and fairly trivial to learn.
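For what it’s worth, one candidate for the "minimize both X and Y" idea is usually called total least squares or orthogonal regression: it minimizes the perpendicular distance from each point to the line. The sketch below is only an illustration under assumptions I have not checked against my data, namely that both axes have errors of similar size and are in comparable units. `orthogonal_slope` is a hypothetical helper name and the data is made up.

```python
import numpy as np

def orthogonal_slope(x, y):
    """Slope of the line minimizing perpendicular distances to the points."""
    pts = np.column_stack([x, y]).astype(float)
    pts -= pts.mean(axis=0)  # center the cloud on its centroid
    # The first right-singular vector points along the direction of
    # greatest spread, which is the direction of the fitted line.
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]
    return dy / dx  # assumes the fitted line is not vertical (dx != 0)

# Made-up noisy data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(scale=3.0, size=x.size)

slope_a = orthogonal_slope(x, y)
slope_b = orthogonal_slope(y, x)  # axes flipped

print(slope_a, 1.0 / slope_b)  # reciprocal up to rounding
```

Unlike ordinary least squares, flipping the axes here gives exactly the reciprocal slope. The catch is that the answer changes if one axis is rescaled, so it only makes sense when the two errors are comparable.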