Replicating González et al. (2008) with Santiago de Chile Mobile Phone Data

In 2008, González, Hidalgo, and Barabási published a landmark study in Nature titled “Understanding individual human mobility patterns”. Using mobile phone records from a European country, they argued that human movement follows a truncated power law: a Lévy flight-like pattern where most trips are short, but occasional long-range jumps occur with non-negligible probability. The paper shaped more than a decade of mobility research and influenced models of epidemic spreading, urban planning, transportation research, and even disaster prevention.

I’ve been meaning for a while to replicate their core findings using a different dataset: anonymized XDR (data session) records from Santiago de Chile. I used a short period I had handy, covering 17 days in early 2020 (February 27 to March 15). Despite the short window, the dataset is substantially larger than the original: 3.6 million unique users generating 464 million displacement observations. It also comes from a different continent and a different era of mobile technology. The question is simple: do González et al.’s scaling laws hold up?

The aggregate displacement distribution

The central empirical claim of the paper is that the probability of observing a displacement of length \(\Delta r\) follows a truncated power law:

$$
P(\Delta r) \sim \Delta r^{-\beta} \exp(-\Delta r / \kappa)
$$

where \(\beta \approx 1.75\) and \(\kappa\) is a cutoff distance. This heavy-tailed distribution is the statistical signature of Lévy flight-like mobility: displacements span several orders of magnitude with no characteristic scale (up to the cutoff).
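Because displacements span several orders of magnitude, the empirical \(P(\Delta r)\) is usually estimated with logarithmically spaced bins. The following is a minimal sketch of that step, not the exact pipeline used for this post: the function name `log_binned_density`, the bin range, and the toy values are all hypothetical.

```python
import math


def log_binned_density(values, vmin, vmax, n_bins):
    """Estimate a probability density on log-spaced bins over [vmin, vmax).

    Returns (centers, edges, densities). Values outside the range are dropped,
    and densities are normalized over the values that were kept.
    """
    log_min = math.log(vmin)
    step = (math.log(vmax) - log_min) / n_bins
    edges = [math.exp(log_min + i * step) for i in range(n_bins + 1)]

    counts = [0] * n_bins
    kept = 0
    for v in values:
        if vmin <= v < vmax:
            # Clamp the index to guard against floating-point rounding at the top edge.
            idx = min(int((math.log(v) - log_min) / step), n_bins - 1)
            counts[idx] += 1
            kept += 1

    # Geometric mean of the edges is the natural bin center on a log axis.
    centers = [math.sqrt(edges[i] * edges[i + 1]) for i in range(n_bins)]
    # Divide by bin width so wide bins in the tail are not over-weighted.
    densities = [
        counts[i] / (kept * (edges[i + 1] - edges[i])) if kept else 0.0
        for i in range(n_bins)
    ]
    return centers, edges, densities


# Toy displacements in km (made up for illustration only).
toy_displacements_km = [0.5, 1.5, 2.5, 15.0, 150.0]
centers, edges, densities = log_binned_density(toy_displacements_km, 1.0, 1000.0, 3)
```

Dividing each count by its bin width is the important detail: without it, the widening bins would make the tail look heavier than it is.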

We computed \(P(\Delta r)\) from the full set of 464 million displacements and fitted a truncated power law using maximum likelihood estimation. The result is shown below.

The pure power-law exponent is \(\alpha = 1.74\) (yes, I called it \(\alpha\) not \(\beta\) 😩), remarkably close to the \(\beta \approx 1.75\) reported by González et al. The truncated power-law fit, which jointly estimates the cutoff, yields a lower exponent of 1.67 (shown in the figure), as the exponential decay absorbs some of the tail weight. This is a clean replication of their Figure 1a: the heavy-tailed character of human mobility is robust across datasets, geographies, and time periods.
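For reference, the pure power-law exponent has a closed-form maximum likelihood estimator for continuous data above a lower bound. The sketch below shows that estimator on synthetic data. It is not the fitting code used for this post: `xmin`, the sample size, and the function names are assumptions, and the truncated power-law fit would additionally need the exponential cutoff term and a numerical optimizer.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples from p(x) ~ x^(-alpha), x >= xmin, by inverse transform."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_mle(values, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin)).

    Returns (alpha, standard_error, n_tail). Only values >= xmin are used.
    """
    tail = [v for v in values if v >= xmin]
    n = len(tail)
    log_sum = sum(math.log(v / xmin) for v in tail)
    if n == 0 or log_sum <= 0.0:
        raise ValueError("need at least one value strictly above xmin")
    alpha = 1.0 + n / log_sum
    standard_error = (alpha - 1.0) / math.sqrt(n)
    return alpha, standard_error, n


# Synthetic check: recover a known exponent from simulated displacements.
rng = random.Random(0)
synthetic = sample_power_law(50_000, alpha=1.74, xmin=1.0, rng=rng)
alpha_hat, alpha_se, n_tail = fit_power_law_mle(synthetic, xmin=1.0)
```

The estimate is sensitive to the choice of `xmin`, so in practice that lower bound is usually selected by a goodness-of-fit criterion rather than fixed by hand.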

The radius of gyration

González et al. also characterized mobility through the radius of gyration \(r_g\), a measure of how far each individual typically ranges from their center of mass. They found that \(r_g\) itself follows a truncated power law:

$$P(r_g) \sim r_g^{-\beta_r} \exp(-r_g / \kappa_r)$$

with \(\beta_r \approx 1.65\). This means that most people are relatively sedentary, but a fat tail of highly mobile individuals exists.

We computed \(r_g\) for each of the 3.6 million users and fitted the distribution.
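As a rough illustration of the per-user computation, here is a sketch of \(r_g\) as the root-mean-square distance of a user's observed positions from their center of mass. It assumes positions are first projected to local kilometres with a simple equirectangular approximation, which is reasonable at city scale but is my assumption here. The function names and the Earth radius constant are hypothetical choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def project_km(lat, lon, lat0, lon0):
    """Equirectangular projection of (lat, lon) to local (x, y) in km around (lat0, lon0)."""
    x = EARTH_RADIUS_KM * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_KM * math.radians(lat - lat0)
    return x, y


def radius_of_gyration(points_km):
    """RMS distance of planar points (x, y) in km from their center of mass."""
    n = len(points_km)
    if n == 0:
        raise ValueError("need at least one position")
    cx = sum(x for x, _ in points_km) / n
    cy = sum(y for _, y in points_km) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points_km) / n
    return math.sqrt(mean_sq)


# Toy user with four observations at the corners of a 2 km square.
toy_positions_km = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
toy_rg = radius_of_gyration(toy_positions_km)
```

Each observation counts once here, so frequently visited towers pull the center of mass toward them. Weighting by time spent instead would be a different, equally defensible choice.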

The pure power-law exponent is 1.41, somewhat lower than the \(\beta_r \approx 1.65\) reported in the original paper. The truncated power-law fit yields 1.08, a larger gap that likely reflects real differences between the datasets: Santiago’s urban geography, the density of cell towers, and the observation period (17 days vs. 6 months) all affect the estimated range of individual mobility. Shorter observation windows mechanically compress \(r_g\) estimates because infrequent long-range trips may not be captured.

The rescaling collapse

Perhaps the most elegant result in González et al. is the rescaling collapse. They stratified users by their radius of gyration, computed the conditional displacement distribution \(P(\Delta r \mid r_g)\) within each bin, and then showed that plotting \(r_g \cdot P(\Delta r \mid r_g)\) against \(\Delta r / r_g\) causes all curves to collapse onto a single universal function. This implies that individual mobility patterns share a common shape, merely rescaled by each person’s characteristic travel distance.

We reproduced this analysis using seven \(r_g\) bins spanning 1 to 300 km. The rescaled curves do show meaningful overlap, but the collapse is not as tight as reported in the original paper. We quantified this using the coefficient of variation across curves at shared rescaled-\(x\) bins and obtained a mean CV of 0.55, indicating moderate but “imperfect” collapse. The 17-day observation window may again play a role: with limited data per user, the conditional distributions are noisier, and the \(r_g\) estimates themselves carry more uncertainty.
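To make the collapse metric concrete, here is a sketch of the rescaling and of one way to compute a mean coefficient of variation across curves. The exact CV definition (population standard deviation over the mean, averaged over shared bins with at least two curves) is an assumption for illustration, as are the function names and the toy exponential curves.

```python
import math
import statistics


def rescale_curve(delta_r, density, rg):
    """Map (delta_r, P(delta_r | rg)) to (delta_r / rg, rg * P(delta_r | rg))."""
    return [(d / rg, rg * p) for d, p in zip(delta_r, density)]


def mean_cv(curves):
    """Mean coefficient of variation across curves sampled on a shared x grid.

    `curves` is a list of equal-length lists of y-values; None marks a bin
    where a curve has no data. Bins with fewer than two curves are skipped.
    """
    cvs = []
    for ys in zip(*curves):
        present = [y for y in ys if y is not None]
        if len(present) < 2:
            continue
        mean = statistics.fmean(present)
        if mean > 0:
            cvs.append(statistics.pstdev(present) / mean)
    return statistics.fmean(cvs) if cvs else float("nan")


# Toy check: curves of the form P(dr | rg) = exp(-dr / rg) / rg collapse exactly.
x_grid = [0.1, 0.5, 1.0, 2.0]
toy_curves = []
for rg in (2.0, 20.0, 200.0):
    delta_r = [x * rg for x in x_grid]
    density = [math.exp(-d / rg) / rg for d in delta_r]
    toy_curves.append([y for _, y in rescale_curve(delta_r, density, rg)])
toy_mean_cv = mean_cv(toy_curves)  # essentially zero for a perfect collapse
```

A perfect collapse gives a mean CV of zero, so the 0.55 reported above sits well away from that ideal while still indicating that the curves share a common shape.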

The exponent identity

González et al. derived a theoretical relationship linking the aggregate and individual exponents:

$$\beta = \alpha_{\text{ind}} + \beta_r - 1$$

where \(\alpha_{\text{ind}}\) is the typical individual-level power-law exponent. Using our estimates (\(\alpha_{\text{ind}} = 1.62\), \(\beta_r = 1.41\)), this predicts an aggregate exponent of \(\beta = 2.03\), compared to our observed value of 1.74, a discrepancy of about 0.29. In the original paper, the relationship holds more tightly, likely because the longer observation window yields more stable exponent estimates at both the individual and aggregate levels.
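For readers who want the reasoning step behind the identity, it can be recovered with a short scaling argument. This is a sketch under assumptions not spelled out above: that the conditional distribution takes the scaling form \(P(\Delta r \mid r_g) \sim r_g^{-\alpha_{\text{ind}}} F(\Delta r / r_g)\) with \(F\) independent of \(r_g\), and that the exponential cutoffs can be ignored at leading order. Averaging over the population gives

$$
P(\Delta r) = \int P(\Delta r \mid r_g)\, P(r_g)\, dr_g \sim \int r_g^{-\alpha_{\text{ind}}} F(\Delta r / r_g)\, r_g^{-\beta_r}\, dr_g
$$

Substituting \(u = \Delta r / r_g\), so that \(r_g = \Delta r / u\) and \(|dr_g| = \Delta r\, u^{-2}\, du\), pulls every power of \(\Delta r\) out of the integral:

$$
P(\Delta r) \sim \Delta r^{-(\alpha_{\text{ind}} + \beta_r - 1)} \int u^{\alpha_{\text{ind}} + \beta_r - 2} F(u)\, du
$$

If the remaining integral converges, it is a constant, and comparing with \(P(\Delta r) \sim \Delta r^{-\beta}\) gives \(\beta = \alpha_{\text{ind}} + \beta_r - 1\).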

So?

The core finding of González et al. replicates cleanly in our Santiago de Chile data: aggregate human mobility follows a truncated power law with exponent near 1.75. This is the most robust result in the paper and the one with the largest practical consequences for mobility modeling.

The secondary results (the \(r_g\) distribution, the rescaling collapse, and the exponent identity) show the right qualitative behavior but with quantitative differences. These are best read not as failures of the original analysis but as sensitivity to dataset characteristics. A 17-day window of XDR records from a South American metropolis in 2020 is a meaningfully different empirical setting from six months of call records from a European country, as analyzed in the 2008 paper. The broad patterns survive, though!
