That’s not as silly a question as it sounds. Defining the size of a city is a tricky task that has major economic implications: how much should you invest in a city if you don’t know how many people live and work there?
The standard definition is the Metropolitan Statistical Area, which attempts to capture the notion of a city as a functional economic region and requires a detailed subjective knowledge of the area before it can be calculated. The US Census Bureau has an ongoing project dedicated to keeping abreast of the way this one metric changes for cities across the continent.
Clearly that’s far from ideal. So our old friend Eugene Stanley from Boston University and a few pals have come up with a better measure called the City Clustering Algorithm. This divides an area up into a grid of a specific resolution, counts the number of people within each square and looks for clusters of populations within the grid. This allows a city to be defined in a way that does not depend on its administrative boundaries.
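The gridding step can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the function name `count_per_cell` and the input format (a list of x, y positions in kilometres, one per person) are assumptions made for the example.

```python
from collections import Counter


def count_per_cell(points, cell_size):
    """Count how many people fall inside each grid square.

    points    -- iterable of (x, y) positions in kilometres
                 (hypothetical input format, chosen for this sketch)
    cell_size -- side length of one grid square in kilometres
    Returns a Counter mapping (column, row) -> number of people.
    """
    counts = Counter()
    for x, y in points:
        # Floor division assigns each position to exactly one square.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


people = [(0.5, 0.5), (0.9, 0.1), (1.5, 0.5), (5.2, 5.7)]
fine_grid = count_per_cell(people, 1.0)    # 1 km squares
coarse_grid = count_per_cell(people, 4.0)  # 4 km squares
```

The same four people occupy three squares on the 1 km grid but only two on the 4 km grid, which is why the choice of resolution matters for what comes next.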
That has significant implications because clusters depend on the scale on which you view them. For example, a 1 kilometre grid sees New York City’s population as a cluster of 7 million, a 4 kilometre grid makes it 17 million, and the cluster identified with an 8 kilometre grid, which encompasses Boston and Philadelphia, has a population of 42 million. Take your pick.
The advantage is that this gives a more or less objective way to define a city. It also means we’ll need to reanalyse some of the fundamental properties that we ascribe to city growth. For example, the group has studied only a limited number of cities in the US, UK and Africa but already says we’ll need to rethink Gibrat’s law, which states that a city’s growth rate is independent of its size.
Come to think of it, Gibrat’s is a kind of weird law anyway. Which means there may be some low-hanging fruit for anybody else who wants to re-examine the nature of cities.
Ref: arxiv.org/abs/0808.2202: Laws of Population Growth
That sounds amazingly naive, and doesn’t in any way get close to doing what it sets out to do (coming up with a more objective way of measuring the size of a city).
There’s the one obvious objection that is mentioned here, namely that the size of the cells is arbitrary, and the larger the cell you choose, the larger the city becomes.
There’s another: the *placement* of the cell boundaries has similar effects. Place the cells so that they meet downtown, and any city magically splits into four smaller cities.
There’s a third: if you uniformly use the same cell size, then either New York breaks up into dozens of smaller cities, or really small cities, separated from each other by miles of farmland, join into one.
Then again, this paper isn’t even about the clustering algorithm, but rather gives some results on city growth; the clustering is just used to obtain raw data.
Shouldn’t some kind of 2D density estimation work better?
Dear Eivind,
First, I would like to thank you for your comment. The points you raised are very important since they are key to the clustering algorithm.
We have tested our results when cells are defined in different ways (larger cells, changing the *placement*, and others). Note that cells and clusters are not the same objects. The cells are defined with a grid that is initially placed *on the map*. Once cells are defined, one runs the clustering algorithm to obtain the clusters by recursively joining populated neighboring cells. The algorithm stops when the cluster boundary has no new neighboring populated cells.
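For readers who prefer code, the joining step described above might look roughly like the following. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the name `find_clusters` is hypothetical, only the four side-sharing squares are treated as neighbors (whether diagonal squares also count is not specified here), and an explicit stack replaces recursion.

```python
def find_clusters(population):
    """Group populated grid cells into clusters of touching cells.

    population -- dict mapping (column, row) -> number of people
    Returns a list of sets, each set holding the cells of one cluster.
    Assumption for this sketch: cells are neighbors only if they share a side.
    """
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    populated = {cell for cell, n in population.items() if n > 0}
    seen = set()
    clusters = []
    for start in sorted(populated):
        if start in seen:
            continue
        cluster = {start}
        frontier = [start]  # cells whose neighbors still need checking
        seen.add(start)
        while frontier:
            col, row = frontier.pop()
            for dc, dr in steps:
                neighbor = (col + dc, row + dr)
                if neighbor in populated and neighbor not in seen:
                    seen.add(neighbor)
                    cluster.add(neighbor)
                    frontier.append(neighbor)
        # The frontier is empty: no populated neighbor is left to join.
        clusters.append(cluster)
    return clusters


grid = {(0, 0): 5, (1, 0): 3, (1, 1): 2, (2, 2): 0, (4, 4): 7}
clusters = find_clusters(grid)
sizes = [sum(grid[cell] for cell in cluster) for cluster in clusters]
```

In this toy grid the three touching populated cells form one cluster of 10 people, the isolated cell at (4, 4) forms a second cluster of 7, and the empty cell at (2, 2) belongs to neither.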
About your first objection, larger cells can be understood as a larger level of coarse-graining. This should be seen as a feature of the algorithm and not as a weakness, since it allows for studying population dynamics at different length scales. In addition, in the paper we study the effect of modifying the level of coarse-graining and find no statistical difference (unless the coarse-graining is extremely large, of the order of the largest cluster).
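As an illustration of what changing the level of coarse-graining means in practice, the sketch below merges square blocks of fine cells into larger cells. The function name `coarse_grain` and the integer `factor` parameter are assumptions made for this example, not names taken from the paper.

```python
from collections import Counter


def coarse_grain(population, factor):
    """Merge factor-by-factor blocks of fine cells into single coarse cells.

    population -- dict mapping (column, row) -> number of people
    factor     -- how many fine cells make up one side of a coarse cell
    Returns a Counter mapping coarse (column, row) -> number of people.
    """
    coarse = Counter()
    for (col, row), n in population.items():
        # Every fine cell contributes its whole count to one coarse cell.
        coarse[(col // factor, row // factor)] += n
    return coarse


fine = {(0, 0): 1, (1, 1): 2, (2, 0): 3, (3, 3): 4}
coarser = coarse_grain(fine, 2)
```

The total population is unchanged by the merge; only the resolution at which neighboring populated cells are detected differs.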
About your second objection, the *placement* of the cells is irrelevant. If you have 4 populated neighboring cells, the clustering algorithm will make them part of the same cluster, regardless of where the four cells are located.
About your third objection, this is related to the previous one: two neighboring populated cells are part of the same cluster, so New York is always one cluster. It is never broken up into many smaller cities, because all the cells composing it are populated and the clustering algorithm joins them into the same cluster.
This paper presents an unbiased way to define population clusters through an algorithm with a useful feature: one can modify the level of coarse-graining depending on what one is studying.
Thanks a lot again for your comment, and please let me know if you have any questions.