Yesterday I heard that the county I live in, Harris County, is the 3rd largest in the United States. (In population. It’s nowhere near the largest in area.) Somehow I’ve lived here a couple decades without knowing that. Houston is the 4th largest city in the US, so it’s no shock that Harris County is the 3rd largest county, but I hadn’t thought about it.
I knew that city populations followed a power law, so I wanted to see if county populations do too. I thought they would, but counties are a little different than cities. For example, cities grow and shrink over time but counties typically do not.
To cut to the chase, county populations do indeed follow a power law. They have the telltale straight line graph on a log-log plot. That is for the largest counties. The line starts to curve down at some point and later drops precipitously. That’s typical. When you hear that something “follows a power law” that means it approximately has a power law distribution over some range. Nothing has exactly a power law distribution, or any other ideal distribution for that matter. But some things follow a power law distribution more closely and over a longer range than others.
Even though Los Angeles County (10.1 million) is the largest by far, it doesn’t stick out on a log scale. Its population compared to Cook County (5.2 million) and Harris County (4.6 million) is unremarkable for a power law.
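One way to check for the straight line described above is to regress log population on log rank for the largest entries. The following is a minimal sketch, not the code behind this post: the function name `loglog_slope` and the synthetic sizes are assumptions made for illustration, and the real county figures are not reproduced here.

```python
import numpy as np

def loglog_slope(sizes, top_n=None):
    """Least-squares slope of log(size) against log(rank), largest first."""
    ordered = np.sort(np.asarray(sizes, dtype=float))[::-1]
    if top_n is not None:
        ordered = ordered[:top_n]  # keep only the largest entries
    ranks = np.arange(1, len(ordered) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(ordered), 1)
    return slope

# Synthetic stand-in data (NOT census figures): an exact power law in rank.
ranks = np.arange(1, 501, dtype=float)
synthetic_sizes = 1e7 * ranks ** -0.8

slope = loglog_slope(synthetic_sizes, top_n=100)
print(f"slope over the top 100: {slope:.3f}")
```

On real data the fit would only be meaningful over the range where the plot is straight, which is why `top_n` restricts it to the largest entries.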
Is this a typical Pareto coupled with a Lognormal at the low end?
I’m curious about the counties at the low end of the list… Where’s the source dataset, is it public?
@John: It’s typical to see a concave curve on the low end with a sharp turn down. Not sure just what kind of distribution fits that. There have been a lot of proposed generalizations of the Pareto distribution that attempt to fit more of the data with one distribution family.
A lot of times you’ll see things fall apart on the low end because the numbers are getting so small that your implicit continuity assumptions break down and things get noticeably discrete. That’s not quite what’s happening here. The smallest county is tiny (88 people!) but it’s not like we’re getting down to 1’s and 2’s, like you might with web traffic data.
@Mike: You can find the data here, and they got the data from the US Census Bureau.
A lognormal distribution seems to fit very well (mu=10.27, sigma=1.491).
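For anyone who wants to try a fit like this one: the maximum likelihood estimates for a lognormal are just the mean and standard deviation of the log populations. The sketch below assumes that is how the parameters were obtained, which the comment does not say. `fit_lognormal` is a hypothetical helper, and the sample is simulated from the quoted parameters rather than taken from the census data.

```python
import numpy as np

def fit_lognormal(populations):
    """Return (mu, sigma): mean and standard deviation of log(population)."""
    logs = np.log(np.asarray(populations, dtype=float))
    return logs.mean(), logs.std()  # std with ddof=0 is the MLE

# Simulated stand-in sample (NOT census figures), drawn from the
# parameters quoted in the comment above.
rng = np.random.default_rng(0)
simulated = rng.lognormal(mean=10.27, sigma=1.491, size=3000)

mu_hat, sigma_hat = fit_lognormal(simulated)
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.3f}")
print(f"implied median population: {np.exp(10.27):,.0f}")
```

The median of a lognormal is exp(mu), so mu = 10.27 corresponds to a median county of roughly 29,000 people.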
For what it is worth, it fits an Inverse Weibull
F(x) = alpha*beta*(1/((x-min)*beta))^(alpha+1) *
exp(-(1/((x-min)*beta))^alpha)
Notice how this reduces to power law when x moves very far from min. [I would have put equation in image but can’t seem to paste]
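The tail claim can be checked numerically. The sketch below reads the formula above as a probability density with balanced parentheses, that is alpha*beta*u^(alpha+1)*exp(-u^alpha) with u = 1/((x-min)*beta). That reading is an interpretation, since the parentheses as typed do not balance. The parameter values are arbitrary, chosen only for illustration, and are not fitted to the county data.

```python
import numpy as np

def inverse_weibull_pdf(x, alpha, beta, x_min=0.0):
    """Inverse Weibull density with u = 1 / ((x - x_min) * beta)."""
    u = 1.0 / ((np.asarray(x, dtype=float) - x_min) * beta)
    return alpha * beta * u ** (alpha + 1) * np.exp(-(u ** alpha))

def local_loglog_slope(x_lo, x_hi, alpha, beta, x_min=0.0):
    """Slope of log(density) against log(x) between two points."""
    f_lo = inverse_weibull_pdf(x_lo, alpha, beta, x_min)
    f_hi = inverse_weibull_pdf(x_hi, alpha, beta, x_min)
    return (np.log(f_hi) - np.log(f_lo)) / (np.log(x_hi) - np.log(x_lo))

alpha, beta = 1.2, 1e-4  # arbitrary illustrative values, not fitted

# Far from x_min the exponential factor tends to 1, so the slope
# should settle near -(alpha + 1).
for x_lo in (1e4, 1e6, 1e8):
    s = local_loglog_slope(x_lo, 10 * x_lo, alpha, beta)
    print(f"x from {x_lo:.0e} to {10 * x_lo:.0e}: slope = {s:.4f}")
print(f"power law exponent -(alpha + 1) = {-(alpha + 1):.4f}")
```

As x moves away from min, u shrinks and exp(-u^alpha) approaches 1, leaving a density proportional to (x-min)^-(alpha+1), which is the power law tail.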
Thanks for the data source