I screwed up. Sorry.
Yesterday I ran the standard deviations of 1790 census numbers. I did not compute them wrongly or interpret them wrongly, but I did use the wrong measure entirely given the nature of the data. The standard deviation works best for data that fit a bell curve. In such a curve, about 34 percent of the data falls within one standard deviation on either side of the peak, or roughly 68 percent in all. The census refused to give me data that cooperated with that. I still think that North and South had significant overlap, if not as much as I thought when I wrote yesterday’s post. None of the state-level data changes. But because the census data have, in effect, two peaks (one for the North and one for the South), the standard deviation is unhelpfully large. It expresses a mathematical fact with limited real-world utility.
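To make the two-peak problem concrete, here is a minimal Python sketch. The numbers are invented for illustration and are not census figures; the names `low_cluster` and `high_cluster` are hypothetical stand-ins for the two peaks. It compares the standard deviation inside each cluster with the standard deviation of the two clusters lumped together.

```python
from statistics import mean, pstdev

# Hypothetical percentages, NOT census data: two tight clusters
# standing in for the two peaks described above.
low_cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
high_cluster = [31.0, 32.0, 33.0, 34.0, 35.0]
combined = low_cluster + high_cluster

# Population standard deviation of each cluster on its own.
sd_low = pstdev(low_cluster)
sd_high = pstdev(high_cluster)

# Population standard deviation of everything lumped together.
sd_combined = pstdev(combined)

print(f"mean of combined data: {mean(combined):.2f}")
print(f"SD within low cluster:  {sd_low:.2f}")
print(f"SD within high cluster: {sd_high:.2f}")
print(f"SD of combined data:    {sd_combined:.2f}")
```

With numbers like these, each cluster is tight on its own, but the combined standard deviation is dominated by the gap between the clusters rather than by the spread inside either one. The mean also lands in the empty space between the peaks, where no actual value sits.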
I noticed the large range within one standard deviation and a few other oddities when writing the post, but chalked them up to my general ignorance of statistics. When working over the 1800 statistics, I saw the same pattern developing. I reached out to a friend who did chemistry and physics in college and now teaches junior high science. He put in a fair amount of time finding out where I went wrong and getting the right tool for the data. Unlike me, he knows his math.
For heterogeneous data like the censuses, average deviation fits better than standard deviation. Luckily, my spreadsheet can do that too. I’ll put the new data up in their own post shortly.
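For anyone who wants to see the arithmetic, here is a rough Python sketch of average deviation, taken here to mean the average absolute distance of each value from the mean (an assumption about the definition, though it matches what the `AVEDEV` function in common spreadsheets computes). The helper name `average_deviation` and the sample values are hypothetical and are not census figures.

```python
from statistics import mean, pstdev

def average_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

# Hypothetical percentages, NOT census data.
sample = [0.5, 1.0, 2.0, 4.0, 30.0, 33.0, 38.0, 45.0]

avg_dev = average_deviation(sample)
sd = pstdev(sample)  # population standard deviation, for comparison

print(f"average deviation:  {avg_dev:.2f}")
print(f"standard deviation: {sd:.2f}")
```

The standard deviation squares each distance before averaging, so far-off values count for more; the average deviation weighs every distance equally. For any data set, the average deviation comes out no larger than the standard deviation.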