The raw trips map is now corrected! Every arc in this map of Santiago is coloured by how much the raw mobile network data was wrong. Red means under-counted. Blue means over-represented. Width is corrected trip volume.

The striking thing is that the map is overwhelmingly red. Not just at the periphery, where sparse tower coverage is expected to introduce bias. Even the thick, dominant corridors are red.

The reason is likely geometry. The flows that rank highest visually are the ones that combine high volume with long distance, and long-distance commuting in Santiago almost always has at least one endpoint in a less-covered area outside the dense urban core. That is exactly where tower density drops, detection probability falls, and the network starts to miss trips.

This is the corrected origin-destination matrix for the Santiago metropolitan region (the raw one is here), derived from mobile phone (XDR) data and reweighted using inverse probability weighting (IPW) to account for heterogeneous tower density. The colour of each arc is the log-ratio of corrected to raw trips for that specific corridor, *not the trip count itself*.
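To make the colour scale concrete, here is a minimal sketch of the per-corridor log-ratio, together with the textbook form of inverse probability weighting. The function names are hypothetical, the natural log is an assumption (the base is not stated above), and the actual weights in the paper are tower-density-aware and more involved than a single division.

```python
import math

def ipw_corrected(raw_trips: float, inclusion_prob: float) -> float:
    """Textbook IPW: divide the observed count by the probability of being observed.

    Assumption: one inclusion probability per corridor; the paper's weights
    are more elaborate than this.
    """
    if not 0.0 < inclusion_prob <= 1.0:
        raise ValueError("inclusion_prob must be in (0, 1]")
    return raw_trips / inclusion_prob

def log_ratio(corrected: float, raw: float) -> float:
    """log(corrected / raw): positive means the raw data under-counted (red),
    negative means it over-represented the corridor (blue)."""
    if corrected <= 0 or raw <= 0:
        raise ValueError("trip counts must be positive")
    return math.log(corrected / raw)

# A corridor observed with probability 0.5 doubles after correction.
corrected = ipw_corrected(100.0, 0.5)
print(corrected, log_ratio(corrected, 100.0))
```

A corridor whose corrected and raw volumes agree gets a log-ratio of zero, so it sits at the neutral midpoint of the colour scale.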

We do this here:
– Ferres, L., & Elejalde, E. (2026). Systematic biases in mobile phone mobility data from heterogeneous tower density. Zenodo. https://doi.org/10.5281/zenodo.19484460

I’ll write more about the statistical framework we used, but the details are there and in the associated GitHub repository.

The literature has treated cell towers as isotropic light bulbs since 2008. They’re really not (that’s why sometimes you don’t have a mobile phone signal!).


Left: Voronoi tessellation of 1,292 BTS across Santiago (R13), ~6 km window over the downtown core. Each tower owns every point closer to it than any other mast, with 360° coverage assumed.

Right: the same towers, drawn as directional sector wedges built from the azimuth field that has always been in the catalog. Most masts carry three antennas radiating ~120° beams; a few carry six.
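As a rough illustration of the right panel, a sector wedge can be built from nothing more than a mast position, an azimuth, and a beam width. This is only a sketch under stated assumptions: coordinates are planar metres, azimuth is measured clockwise from north, and `radius_m`, `beam_deg` and `steps` are hypothetical parameters, not fields of the catalog or the method used in the paper.

```python
import math

def sector_wedge(x, y, azimuth_deg, radius_m, beam_deg=120.0, steps=16):
    """Return the vertices of a wedge polygon centred on the antenna azimuth.

    The polygon starts and ends at the mast (x, y) and follows an arc of
    `beam_deg` degrees sampled at `steps` intervals.
    """
    start = azimuth_deg - beam_deg / 2.0
    vertices = [(x, y)]
    for i in range(steps + 1):
        bearing = math.radians(start + beam_deg * i / steps)
        # Bearing is clockwise from north, so east uses sin and north uses cos.
        vertices.append((x + radius_m * math.sin(bearing),
                         y + radius_m * math.cos(bearing)))
    vertices.append((x, y))  # close the ring
    return vertices

# Three antennas on one mast, 120 degrees apart.
wedges = [sector_wedge(0.0, 0.0, az, 375.0) for az in (0.0, 120.0, 240.0)]
print(len(wedges), len(wedges[0]))
```

With three 120° wedges the mast is fully covered; the black gaps appear only where the azimuths and beam widths leave a direction unserved.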

Dropping the isotropic assumption pulls the median effective radius from 504 m to 375 m region-wide, and from 245 m to 181 m inside this window. Sub-kilometre spatial resolution from a column already in the data.

The black gaps in the right panel are not missing values. They are the map being honest about where no antenna is aiming.

I’m working on a series of blog posts and “spinoffs” of this paper:

Ferres, L., & Elejalde, E. (2026). Systematic biases in mobile phone mobility data from heterogeneous tower density. Zenodo. https://zenodo.org/records/19484460

Next, I’ll try to explain the statistical methods we used, and how we can correct the values.


This is one of the most complete pictures of a city’s daily movement that money can buy. Eight hundred and ninety million trips.

Origin-destination matrix from mobile phone data (XDR records) in Santiago, Chile. 658k H3 hex-to-hex pairs, 6.5 weeks of observations, top 2,000 flows ranked by log(volume) x sqrt(distance).
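The ranking can be sketched in a few lines. The tuple layout and function names are hypothetical, and the natural log is an assumption since the base is not stated; any base gives the same ordering.

```python
import math

def flow_score(volume, distance_km):
    """log(volume) x sqrt(distance): rewards flows that are both busy and long."""
    return math.log(volume) * math.sqrt(distance_km)

def top_flows(flows, k):
    """flows: list of (origin, destination, volume, distance_km) tuples."""
    return sorted(flows, key=lambda f: flow_score(f[2], f[3]), reverse=True)[:k]

flows = [
    ("a", "b", 1000, 1.0),   # very busy but short
    ("a", "c", 100, 25.0),   # less busy but long
    ("b", "c", 10, 4.0),
]
for origin, dest, volume, km in top_flows(flows, 2):
    print(origin, dest, round(flow_score(volume, km), 2))
```

In this toy example the long corridor outranks the busier short one, which is why the arcs that dominate the picture tend to be long-distance flows.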

Each arc is a quadratic Bezier between H3 resolution-8 cell centroids (~460 m edges). Hexagons colored by log10 of total trip throughput. Dark background, no basemap, just the data.
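For reference, a quadratic Bezier between two centroids needs only one control point. The sketch below places it at the midpoint, pushed sideways by a hypothetical `curvature` factor; how the control point was actually chosen for these maps is not described above, so treat that part as an assumption.

```python
def quadratic_bezier(p0, p2, curvature=0.2, n=20):
    """Sample n + 1 points on B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2.

    P1 is the midpoint of P0-P2 offset perpendicular to the chord by
    `curvature` times the chord length (an illustrative choice).
    """
    (x0, y0), (x2, y2) = p0, p2
    dx, dy = x2 - x0, y2 - y0
    cx = (x0 + x2) / 2.0 - curvature * dy
    cy = (y0 + y2) / 2.0 + curvature * dx
    points = []
    for i in range(n + 1):
        t = i / n
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * x0 + b * cx + c * x2, a * y0 + b * cy + c * y2))
    return points

arc = quadratic_bezier((0.0, 0.0), (10.0, 0.0))
print(arc[0], arc[10], arc[-1])
```

The curve passes through both endpoints exactly and bulges halfway toward the control point, which keeps overlapping corridors visually separable.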

Tower density in Santiago varies by two orders of magnitude between urban core and rural periphery, which means short-distance and rural trips are systematically invisible. These are three stylistic variations (amber, electric, and magenta/pink) of the uncorrected picture.

– Code: https://github.com/leoferres/mobilens
– Preprint: https://zenodo.org/records/19484460


New preprint and accompanying software release: “Systematic biases in mobile phone mobility data from heterogeneous tower density,” with Erick Elejalde (L3S, Leibniz Universität Hannover).    

Mobile phone records (CDRs and the higher-resolution XDRs) are now standard inputs for human mobility, epidemic modelling, and disaster response, but the spatial distribution of cell towers introduces measurement biases that are rarely quantified. Towers cluster in cities and thin out in rural areas. The result is a spatially structured detection floor: short rural trips never cross a sector boundary and are invisible, rural users get misattributed to oversized Voronoi cells, and origin-destination matrices end up artificially urban-centric. The biases are correlated with the very variable (urbanicity) that researchers most often want to study.

We characterise the problem and propose a six-step correction pipeline:

1. Sector polygons inferred from antenna azimuth and height, replacing the standard tower-point Voronoi tessellation
2. Detection-floor modelling at the per-site level
3. Dasymetric redistribution of census population onto an H3 hexagonal grid
4. OD construction with intra-site sector-crossing recovery
5. Inverse probability weighting with a tower-density-aware inclusion probability          
6. Fay-Herriot small area smoothing toward a gravity prior
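Step 6 can be illustrated with the standard Fay-Herriot composite estimator, which blends a direct estimate with a model prediction according to their relative variances. This is a generic sketch, not the mobilens implementation: the function names, the gravity exponent `gamma` and the constant `k` are all hypothetical.

```python
def gravity_prior(pop_origin, pop_dest, distance_km, k=1.0, gamma=2.0):
    """Classic gravity form: flow proportional to P_i * P_j / d^gamma (assumed)."""
    return k * pop_origin * pop_dest / distance_km ** gamma

def fay_herriot_weight(model_var, sampling_var):
    """Weight on the direct estimate: close to 1 when the data are precise,
    close to 0 when the sampling variance dominates."""
    return model_var / (model_var + sampling_var)

def shrink(direct, prior, model_var, sampling_var):
    """Composite estimate: weight * direct + (1 - weight) * prior."""
    w = fay_herriot_weight(model_var, sampling_var)
    return w * direct + (1.0 - w) * prior

prior = gravity_prior(10.0, 20.0, 2.0)
print(prior, shrink(100.0, prior, model_var=1.0, sampling_var=1.0))
```

Read this way, a large weight means the observed flow is trusted, while a small weight means the estimate leans mostly on the gravity prior.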

Applied to the Región Metropolitana de Santiago using a 63,832 antenna catalog, the 2024 Chilean census, and 6.5 weeks of XDR data:

– Sector polygons give a 3.0x gain in effective spatial resolution over tower-point Voronoi (median 299 m versus 894 m)
– The 50% detection threshold ranges from 16 m in the urban core to 2,542 m at the most isolated site
– Intra-site sector crossings recover roughly 100 million short-distance trips (median displacement 429 m) that are invisible at the tower level
– IPW uplifts rural comuna flows by 50 to 73%, while the urban core is slightly downweighted
– Fay-Herriot shrinkage weights vary from about 0.7 in the urban core to under 0.1 at the periphery, mirroring the tower-density gradient

The pipeline is implemented in mobilens, an MIT-licensed Python library that is operator- and country-agnostic. The minimum inputs are a tower catalog with azimuth and height, a census population layer at any administrative level, and a study area boundary polygon. Steps 1 to 3 (the spatial characterisation of the bias) can be carried out without any XDR data, which makes the library useful even where records are unavailable.

– Code: https://github.com/leoferres/mobilens
– Preprint: https://zenodo.org/records/19484460

Substantive feedback before journal submission is welcome!!


In 2008, González, Hidalgo, and Barabási published a landmark study in Nature titled “Understanding individual human mobility patterns”. Using mobile phone records from a European country (?), they argued that human movement follows a truncated power law, a Lévy flight-like pattern where most trips are short, but occasional long-range jumps occur with non-negligible probability. The paper shaped a (two?) decade(s) of mobility research and influenced models of epidemic spreading, urban planning, transportation research and even disaster prevention.

I’ve been meaning for a while to replicate their core findings using a different dataset: anonymized XDR (data session) records from Santiago de Chile. I used a short period I had handy, covering 17 days in early 2020 (February 27 to March 15). The dataset, however, is substantially larger than the original: 3.6 million unique users generating 464 million displacement observations. Also, obviously, it comes from a different continent and a different era of mobile technology. The question is simple: do González et al.’s scaling laws hold up?

The aggregate displacement distribution

The central empirical claim of the paper is that the probability of observing a displacement of length \(\Delta r\) follows a truncated power law:

$$
P(\Delta r) \sim \Delta r^{-\beta} \exp(-\Delta r / \kappa)
$$

where \(\beta \approx 1.75\) and \(\kappa\) is a cutoff distance. This heavy-tailed distribution is the statistical signature of Lévy flight-like mobility: displacements span several orders of magnitude with no characteristic scale (up to the cutoff).

We computed \(P(\Delta r)\) from the full set of 464 million displacements and fitted a truncated power law using maximum likelihood estimation. The result is shown below.

The pure power-law exponent is \(\alpha = 1.74\) (yes, I called it \(\alpha\) not \(\beta\) 😩), remarkably close to the \(\beta \approx 1.75\) reported by González et al. The truncated power-law fit, which jointly estimates the cutoff, yields a lower exponent of 1.67 (shown in the figure), as the exponential decay absorbs some of the tail weight. This is a clean replication of their Figure 1a: the heavy-tailed character of human mobility is robust across datasets, geographies, and time periods.
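For the pure power law, the maximum likelihood estimate has a closed form for continuous data above a lower cutoff: one plus the sample size divided by the sum of log(x / x_min). The sketch below checks it on synthetic data; `x_min` and the sampler are illustrative assumptions, not the values or code behind the figure, and the truncated fit needs numerical optimisation that is not shown here.

```python
import math
import random

def sample_power_law(alpha, x_min, n, rng):
    """Inverse-transform samples from p(x) ~ x^-alpha for x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

def power_law_mle(xs, x_min):
    """Closed-form MLE of the exponent for a continuous power law."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

rng = random.Random(0)  # fixed seed so the check is repeatable
displacements = sample_power_law(alpha=1.75, x_min=1.0, n=20000, rng=rng)
print(round(power_law_mle(displacements, x_min=1.0), 2))
```

The estimate is sensitive to the choice of `x_min`, so any comparison of exponents across datasets should state the cutoff used.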

The radius of gyration

González et al. also characterized mobility through the radius of gyration \(r_g\), a measure of how far each individual typically ranges from their center of mass. They found that \(r_g\) itself follows a truncated power law:

$$P(r_g) \sim r_g^{-\beta_r} \exp(-r_g / \kappa_r)$$

with \(\beta_r \approx 1.65\). This means that most people are relatively sedentary, but a fat tail of highly mobile individuals exists.

We computed \(r_g\) for each of the 3.6 million users and fitted the distribution.
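As a reminder of what is being computed, the radius of gyration is the root mean square distance of a user’s observed positions from their centre of mass. A minimal sketch, assuming positions are already projected to planar kilometres (the function name is hypothetical):

```python
import math

def radius_of_gyration(points):
    """RMS distance of a user's positions from their centre of mass.

    points: list of (x_km, y_km), one entry per observation, so places
    visited more often pull the centre of mass toward them.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

# Someone who only moves between two places 2 km apart has r_g = 1 km.
print(radius_of_gyration([(0.0, 0.0), (2.0, 0.0)]))
```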

The pure power-law exponent is 1.41, somewhat lower than the \(\beta_r \approx 1.65\) reported in the original paper. The truncated power-law fit yields 1.08, a larger gap that likely reflects real differences between the datasets: Santiago’s urban geography, the density of cell towers, and the observation period (17 days vs. 6 months) all affect the estimated range of individual mobility. Shorter observation windows mechanically compress \(r_g\) estimates because infrequent long-range trips may not be captured.

The rescaling collapse

Perhaps the most elegant result in González et al. is the rescaling collapse. They stratified users by their radius of gyration, computed the conditional displacement distribution \(P(\Delta r \mid r_g)\) within each bin, and then showed that plotting \(r_g \cdot P(\Delta r \mid r_g)\) against \(\Delta r / r_g\) causes all curves to collapse onto a single universal function. This implies that individual mobility patterns share a common shape, merely rescaled by each person’s characteristic travel distance.

We reproduced this analysis using seven \(r_g\) bins spanning 1 to 300 km. The rescaled curves do show meaningful overlap, but the collapse is not as tight as reported in the original paper. We quantified this using the coefficient of variation across curves at shared rescaled-\(x\) bins and obtained a mean CV of 0.55, indicating moderate but “imperfect” collapse. The 17-day observation window may again play a role: with limited data per user, the conditional distributions are noisier, and the \(r_g\) estimates themselves carry more uncertainty.
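The rescaling and the collapse metric can be sketched as follows. The data layout is hypothetical, and using the population standard deviation for the CV is an assumption, since the exact estimator is not specified above.

```python
import statistics

def rescale(curve, rg):
    """Map (dr, P(dr | rg)) points to (dr / rg, rg * P(dr | rg))."""
    return [(dr / rg, rg * p) for dr, p in curve]

def mean_cv(values_per_bin):
    """Mean coefficient of variation across shared rescaled-x bins.

    values_per_bin: one list per bin, holding the rescaled y-values of the
    different r_g curves that fall in that bin.
    """
    cvs = [statistics.pstdev(v) / statistics.mean(v)
           for v in values_per_bin if len(v) > 1]
    return statistics.mean(cvs)

# Synthetic curves that collapse perfectly: P(dr | rg) = F(dr / rg) / rg.
F = lambda u: u ** -1.2
us = (0.5, 1.0, 2.0)
rescaled = [rescale([(u * rg, F(u) / rg) for u in us], rg)
            for rg in (2.0, 10.0, 50.0)]
bins = [[curve[i][1] for curve in rescaled] for i in range(len(us))]
print(mean_cv(bins))
```

A perfect collapse gives a mean CV of essentially zero, so the 0.55 reported above measures how far the Santiago curves are from that ideal.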

The exponent identity

González et al. derived a theoretical relationship linking the aggregate and individual exponents:

$$\beta = \alpha_{\text{ind}} + \beta_r - 1$$

where \(\alpha_{\text{ind}}\) is the typical individual-level power-law exponent. Using our estimates (\(\alpha_{\text{ind}} = 1.62\), \(\beta_r = 1.41\)), this predicts an aggregate exponent of \(\beta = 2.03\), compared to our observed value of 1.74, a discrepancy of about 0.29. In the original paper, the relationship holds more tightly, likely because the longer observation window yields more stable exponent estimates at both the individual and aggregate levels.

So?

The core finding of González et al. replicates cleanly in our Santiago de Chile data: aggregate human mobility follows a truncated power law with exponent near 1.75. This is the most robust result in the paper and the one with the largest practical consequences for mobility modeling.

The secondary results (the \(r_g\) distribution, the rescaling collapse, and the exponent identity) show the right qualitative behavior but with quantitative differences. These are best read not as failures of the original analysis but as sensitivity to dataset characteristics. A 17-day window of XDR records from a South American metropolis in 2020 is a meaningfully different empirical setting from 6 months of call records in a European country in 2008 (2007?). The broad patterns survive, though!


As a simple exercise, I wanted to compare what Google knows about my trip, versus what the phone company knows. Here’s a reconstruction of a day, Oct 19, 2020, going from home to my university and then back. I’ve chosen expeditiousness over beauty, though.

The blue point to the right is my university, the one to the left is home. The blue line is the GPS trace from my Google Maps timeline. The red dots are the latitude and longitude of the antennas I connected to during the day, and the red line is the trajectory I followed (reconstructed from those antennas using MovingPandas).

All in all, it’s not a bad fit, but there are indeed a few kilometers of difference between GPS and cell towers. This is somewhat worse for shorter trips:

(I’ll fix the x-axis at some point 🙂)

Even though the trip is reconstructed (more or less in the same way as before), I sometimes connect to a tower far away (>2.7 km) from where I physically am. This means we need to be very careful and “clean up” trajectories when working with mobile phone data!
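The gap between a GPS fix and the tower serving it is just a great-circle distance, which the haversine formula gives directly. The sketch below also flags tower hits beyond a threshold; the 2.7 km default only echoes the figure above, and the helper names and data layout are hypothetical, not the cleaning procedure used for the plots.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_far_towers(pairs, max_km=2.7):
    """pairs: list of ((gps_lat, gps_lon), (tower_lat, tower_lon)).

    Returns True for each pair whose tower is farther than max_km from the fix.
    """
    return [haversine_km(*gps, *tower) > max_km for gps, tower in pairs]

pairs = [((0.0, 0.0), (0.0, 0.01)), ((0.0, 0.0), (0.0, 0.05))]
print(flag_far_towers(pairs))
```

Without a GPS reference, which is the usual situation, the same distance function can still be applied between consecutive tower hits to spot implausible jumps.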
