In 2002, about 2% of Chile’s residents were foreign-born. In 2024, it was over 8%. That is one of the fastest immigration transitions in Latin America, and it motivates the question driving the 2026 edition of IELE756 at UDD: where do immigrants settle, what work do they find, and do their health outcomes diverge from those of Chilean-born residents?

The data

We use the 2024 Census microdata: 19M+ persona records linked to households and dwellings, parquet, three relational tables joined via id_vivienda / id_hogar / id_persona. The migration variables are right there: place of residence 5 years ago (p24_lug_resid5), place of birth (p25_lug_nacimiento), arrival period (p26_llegada_periodo), nationality (p27_nacionalidad). Then two health datasets:

  • ENO (Enfermedades de Notificación Obligatoria, MINSAL DEIS), 2007 to 2024, ~333k records, semicolon-delimited CSV. The Desconocido nationality category has to be reported and excluded from rate denominators, which is the kind of data hygiene students do not learn from textbook examples.
  • GRD (Grupos Relacionados de Diagnóstico, MINSAL DEIS), 2019 to 2024, ~5M hospital discharges, pipe-delimited yearly zips, ICD-10 coded against CIE-10.xlsx for diagnostic chapter grouping.
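As a minimal sketch of that hygiene step, the snippet below reads a semicolon-delimited extract, counts the Desconocido records so they can be reported, and sets them aside before anything is tabulated by nationality. The column name `nacionalidad` and the sample rows are made up for illustration; the real ENO file has its own header.

```python
import csv
import io
from collections import Counter

# Hypothetical three-column extract; the real ENO header will differ.
SAMPLE = """comuna;enfermedad;nacionalidad
Maipú;Tuberculosis;Chilena
Maipú;Tuberculosis;Desconocido
Puente Alto;Hepatitis A;Extranjera
Puente Alto;Tuberculosis;Chilena
"""

def split_unknown(rows, column="nacionalidad", unknown="Desconocido"):
    """Return (known_rows, n_unknown) so the unknown share can be reported."""
    known = [r for r in rows if r[column] != unknown]
    return known, len(rows) - len(known)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
known, n_unknown = split_unknown(rows)
by_nationality = Counter(r["nacionalidad"] for r in known)

print(f"Desconocido: {n_unknown} of {len(rows)} records ({n_unknown / len(rows):.0%})")
print(dict(by_nationality))
```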

The three datasets contain different people. A census respondent is not the same individual as an ENO notification or a hospital discharge. Linking them is therefore necessarily ecological: aggregate to the comuna level, join on codigo_comuna, fit Poisson or Negative Binomial regression with a population offset, interpret incidence rate ratios, and then sit with the ecological fallacy explicitly.
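To make the offset idea concrete, here is a rough sketch of a Poisson regression with a population offset, fitted by Newton-Raphson for a single comuna-level covariate. The function name, the 0/1 covariate, and the four comunas are invented for illustration; in coursework a statistics library would do the fitting and also handle the Negative Binomial case.

```python
import math

def fit_poisson_offset(counts, x, population, iters=50, tol=1e-12):
    """Fit log E[y] = log(population) + b0 + b1 * x by Newton-Raphson."""
    b0 = math.log(sum(counts) / sum(population))  # start at the pooled crude rate
    b1 = 0.0
    for _ in range(iters):
        mu = [p * math.exp(b0 + b1 * xi) for p, xi in zip(population, x)]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Fisher information matrix (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up comuna-level data: notified cases, residents, and a 0/1 flag
# for "high share of foreign-born residents" (hypothetical covariate).
cases = [10, 20, 30, 60]
residents = [100_000, 200_000, 100_000, 200_000]
high_migration = [0, 0, 1, 1]

b0, b1 = fit_poisson_offset(cases, high_migration, residents)
# Pooled rates are 1 and 3 per 10,000, so the IRR should come out at about 3.
print(f"baseline rate per 100k: {1e5 * math.exp(b0):.2f}")
print(f"incidence rate ratio:   {math.exp(b1):.2f}")
```

The rate ratio describes comunas, not people: a comuna-level IRR above 1 says nothing about which residents were notified, which is the ecological fallacy in a single number.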

Class as bin-packing

There are 21 pairs. Each is assigned 1 to 3 comunas (mostly RM) totalling ~330k residents. The exact range is 324,087 to 568,106 with a mean of 356,479; allocation is a small Python script that sorts comunas largest-first and packs them greedily into the bins that currently have the smallest total. Puente Alto (568,106) and Maipú (521,627) end up alone; the long tail (Alhué, 5,765; San Pedro, 9,522; María Pinto, 12,572) gets folded into mixed groups.
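The greedy rule described above (largest comuna first, always into the lightest bin) can be sketched in a few lines. This is not the course script itself: the function name is made up, the constraint of 1 to 3 comunas per pair is not enforced, and three bins is an arbitrary choice for the demo. The populations are the ones quoted above.

```python
import heapq

def pack_comunas(populations, n_bins):
    """Largest-first greedy packing: each comuna goes to the currently lightest bin."""
    bins = [{"total": 0, "comunas": []} for _ in range(n_bins)]
    heap = [(0, i) for i in range(n_bins)]  # (current total, bin index)
    heapq.heapify(heap)
    ordered = sorted(populations.items(), key=lambda kv: kv[1], reverse=True)
    for name, pop in ordered:
        total, i = heapq.heappop(heap)  # lightest bin so far
        bins[i]["comunas"].append(name)
        bins[i]["total"] = total + pop
        heapq.heappush(heap, (total + pop, i))
    return bins

populations = {
    "Puente Alto": 568_106,
    "Maipú": 521_627,
    "María Pinto": 12_572,
    "San Pedro": 9_522,
    "Alhué": 5_765,
}

bins = pack_comunas(populations, n_bins=3)
for b in bins:
    print(b["total"], b["comunas"])
```

Even in this toy run the two big comunas end up alone and the small ones share a bin, which is the behaviour described above.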

Pipeline shape over the trimester: Tarea 0 (shallow contact, all three datasets) -> Tarea 1 (deep on Census, demographic + migration profile) -> Tarea 2 (deep on ENO + GRD, rates by nationality) -> Tarea 3 (merged at the comuna level, count regression fitted, ecological fallacy critiqued explicitly).

Natural vs. artificial intelligence

The final project is 25 of 70 points, split as: GitHub repo (10), 8 to 10 minute video (7), and a handwritten in-person memo (8 points, individual, 45 minutes, no AI / laptops / notes). Tareas and the async final components allow AI with a disclosure paragraph in the README. The memo does not, and is graded per student even though the project is paired.

The point of that split: if you cannot defend your repo on paper, the repo’s score does not save you. It is the only integrity lever in a course where everything else is plausibly AI-augmented, and it changes how teams work weeks before the memo even happens. Pairs end up quizzing each other from the prompt bank, on paper, with no tools open.

The final asks for one anomaly, defended. Things that qualify: a coefficient that flips sign between Poisson and Negative Binomial with material consequences for interpretation; a comuna whose ENO rate is more than two SDs from the regional mean for a specific disease, adjusted for population; a choropleth whose spatial pattern contradicts the underlying demographic correlation. Things that do not: “foreign-born residents have higher TB rates” (documented for years; not a surprise).

Materials

  • Syllabus, week-by-week schedule, dataset notes, group/comuna assignments: leoferres.blog/teaching/iele756/2026-1/.
  • Walkthrough scripts (load and explore each of the three datasets), week-0 slide deck, and the bin-packing comuna assigner are linked from the Resources page.
  • Previous editions: Spring 2025 (environmental exposures + NCD burden).

Important note on interpretation

This is an important and sensitive topic. A central part of the course is making clear to students that, although the datasets are real, the work is an academic and pedagogical exercise. The results should not be interpreted as definitive empirical claims about immigrant health, disease burden, or public policy.

The analyses are designed to teach data integration, ecological inference, count regression, uncertainty, and the limits of aggregate data. Any substantive conclusions should be treated as provisional and illustrative. Robust empirical claims in this area require dedicated study designs, domain expertise, institutional review, and interpretation by public health and migration specialists.

If you teach a similar course and want to compare notes on how ecological-inference pedagogy plays out in practice, write to me.


The raw trips map is now corrected! Every arc in this map of Santiago is coloured by how much the raw mobile network data was wrong. Red means under-counted. Blue means over-represented. Width is corrected trip volume.

The striking thing is that the map is overwhelmingly red. Not just at the periphery, where sparse tower coverage is expected to introduce bias. Even the thick, dominant corridors are red.

The reason is likely geometry. The flows that rank highest visually are the ones that combine high volume with long distance, and long-distance commuting in Santiago almost always has at least one endpoint in a less-covered area outside the dense urban core. That is exactly where tower density drops, detection probability falls, and the network starts to miss trips.

This is the corrected origin-destination matrix for the Santiago metropolitan region (the raw one is here), derived from mobile phone (XDR) data and reweighted using inverse probability weighting (IPW) to account for heterogeneous tower density. The colour of each arc is the log-ratio of corrected to raw trips for that specific corridor, *not the trip count itself*.
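The colour encoding can be sketched as follows. The natural logarithm, the zero threshold, and the function name are assumptions for illustration; the corrected counts themselves come out of the IPW step, which is not reproduced here.

```python
import math

def arc_colour(raw_trips, corrected_trips):
    """Return (colour, log_ratio) for one corridor.

    A positive log-ratio means the raw data under-counted the corridor (red);
    a negative one means it over-represented it (blue).
    """
    log_ratio = math.log(corrected_trips / raw_trips)
    if log_ratio > 0:
        return "red", log_ratio
    if log_ratio < 0:
        return "blue", log_ratio
    return "neutral", log_ratio

# Made-up corridor counts, for illustration only.
print(arc_colour(100, 150))
print(arc_colour(100, 80))
```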

We do this here:
– Ferres, L., & Elejalde, E. (2026). Systematic biases in mobile phone mobility data from heterogeneous tower density. Zenodo. https://doi.org/10.5281/zenodo.19484460

I’ll write more about the statistical framework we used, but the details are there and in the associated GitHub repository.

The literature has treated cell towers as isotropic light bulbs since 2008. They’re really not (that’s why sometimes you don’t have a mobile phone signal!).


Left: Voronoi tessellation of 1,292 BTS across Santiago (R13), ~6 km window over the downtown core. Each tower owns every point closer to it than any other mast, with 360° coverage assumed.

Right: the same towers, drawn as directional sector wedges built from the azimuth field that has always been in the catalog. Most masts carry three antennas radiating ~120° beams; a few carry six.

Dropping the isotropic assumption pulls the median effective radius from 504 m to 375 m region-wide, and from 245 m to 181 m inside this window. Sub-kilometre spatial resolution from a column already in the data.

The black gaps in the right panel are not missing values. They are the map being honest about where no antenna is aiming.
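A small sketch of why the gaps appear: a point is served only if its bearing from the mast falls inside some antenna's wedge. The function names and the flat east/north offsets are assumptions for illustration, and range is ignored entirely; only direction is tested.

```python
import math

def in_sector(dx, dy, azimuth_deg, beamwidth_deg=120.0):
    """True if the offset (dx east, dy north) from the mast lies inside the wedge."""
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # clockwise from north
    diff = abs((bearing - azimuth_deg + 180.0) % 360.0 - 180.0)  # smallest angle
    return diff <= beamwidth_deg / 2.0

def covered(dx, dy, azimuths, beamwidth_deg=120.0):
    """True if any antenna on the mast points towards the offset."""
    return any(in_sector(dx, dy, a, beamwidth_deg) for a in azimuths)

# A point to the south-west of the mast (bearing 240 degrees).
dx, dy = math.sin(math.radians(240)), math.cos(math.radians(240))

print(covered(dx, dy, [0, 120, 240]))  # three antennas: someone is aiming there
print(covered(dx, dy, [0, 120]))       # two antennas: this direction is a gap
```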

I’m working on a series of blog posts and “spinoffs” of this paper:

Ferres, L., & Elejalde, E. (2026). Systematic biases in mobile phone mobility data from heterogeneous tower density. Zenodo. https://zenodo.org/records/19484460

Next, I’ll try to explain the statistical methods we used, and how we can correct the values.


This is one of the most complete pictures of a city’s daily movement that money can buy. Eight hundred and ninety million trips.

Origin-destination matrix from mobile phone data (XDR records) in Santiago, Chile. 658k H3 hex-to-hex pairs, 6.5 weeks of observations, top 2,000 flows ranked by log(volume) x sqrt(distance).
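The ranking rule is simple enough to sketch. The natural logarithm, kilometres as the distance unit, and the tuple layout are assumptions; the flows are made up.

```python
import math

def flow_score(volume, distance_km):
    """Visual rank score: log(volume) x sqrt(distance)."""
    return math.log(volume) * math.sqrt(distance_km)

def top_flows(flows, k):
    """flows: iterable of (origin, destination, volume, distance_km)."""
    return sorted(flows, key=lambda f: flow_score(f[2], f[3]), reverse=True)[:k]

# Made-up hex-to-hex flows.
flows = [
    ("a", "b", 1000, 1.0),   # heavy but very short
    ("c", "d", 100, 25.0),   # lighter but long
    ("e", "f", 10, 4.0),
]

for origin, dest, volume, dist in top_flows(flows, k=2):
    print(origin, dest, round(flow_score(volume, dist), 2))
```

The square root of distance is what lets a long, moderately busy corridor outrank a short, very busy one.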

Each arc is a quadratic Bezier between H3 resolution-8 cell centroids (~460 m edges). Hexagons colored by log10 of total trip throughput. Dark background, no basemap, just the data.
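For readers who want to draw similar arcs, a quadratic Bezier needs only two endpoints and one control point. How the control point is chosen here (the chord midpoint pushed sideways by a `bend` fraction of the chord length) is an assumption for illustration, not necessarily what produced the maps.

```python
def quadratic_bezier(p0, p1, p2, n=20):
    """Sample B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 at n + 1 points."""
    points = []
    for i in range(n + 1):
        t = i / n
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * p0[0] + b * p1[0] + c * p2[0],
                       a * p0[1] + b * p1[1] + c * p2[1]))
    return points

def arc(p0, p2, bend=0.2, n=20):
    """Arc between two centroids; the control point sits off the chord midpoint."""
    mx, my = (p0[0] + p2[0]) / 2, (p0[1] + p2[1]) / 2
    dx, dy = p2[0] - p0[0], p2[1] - p0[1]
    control = (mx - bend * dy, my + bend * dx)  # perpendicular offset
    return quadratic_bezier(p0, control, p2, n)

points = arc((0.0, 0.0), (10.0, 0.0), bend=0.2, n=2)
print(points)
```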

Tower density in Santiago varies by two orders of magnitude between urban core and rural periphery, which means short-distance and rural trips are systematically invisible. These are three stylistic variations (amber, electric, and magenta/pink) of the uncorrected picture.

– Code: https://github.com/leoferres/mobilens
– Preprint: https://zenodo.org/records/19484460


New preprint and accompanying software release: “Systematic biases in mobile phone mobility data from heterogeneous tower density,” with Erick Elejalde (L3S, Leibniz Universität Hannover).    

Mobile phone records (CDRs and the higher-resolution XDRs) are now standard inputs for human mobility, epidemic modelling, and disaster response, but the spatial distribution of cell towers introduces measurement biases that are rarely quantified. Towers cluster in cities and thin out in rural areas. The result is a spatially structured detection floor: short rural trips never cross a sector boundary and are invisible, rural users get misattributed to oversized Voronoi cells, and origin-destination matrices end up artificially urban-centric. The biases are correlated with the very variable (urbanicity) that researchers most often want to study.

We characterise the problem and propose a six-step correction pipeline:

1. Sector polygons inferred from antenna azimuth and height, replacing the standard tower-point Voronoi tessellation
2. Detection-floor modelling at the per-site level
3. Dasymetric redistribution of census population onto an H3 hexagonal grid
4. OD construction with intra-site sector-crossing recovery
5. Inverse probability weighting with a tower-density-aware inclusion probability          
6. Fay-Herriot small area smoothing toward a gravity prior

Applied to the Región Metropolitana de Santiago using a 63,832 antenna catalog, the 2024 Chilean census, and 6.5 weeks of XDR data:

– Sector polygons give a 3.0x gain in effective spatial resolution over tower-point Voronoi (median 299 m versus 894 m)
– The 50% detection threshold ranges from 16 m in the urban core to 2,542 m at the most isolated site
– Intra-site sector crossings recover roughly 100 million short-distance trips (median displacement 429 m) that are invisible at the tower level
– IPW uplifts rural comuna flows by 50 to 73%, while the urban core is slightly downweighted
– Fay-Herriot shrinkage weights vary from about 0.7 in the urban core to under 0.1 at the periphery, mirroring the tower-density gradient
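To give a feel for what a shrinkage weight of 0.7 versus 0.1 means, here is the textbook Fay-Herriot combination of a direct estimate with a model-based prior. This is a sketch of the general form only; the function names and variances are hypothetical, and how mobilens estimates the variances is not shown.

```python
def shrinkage_weight(model_var, sampling_var):
    """gamma = model variance / (model variance + sampling variance)."""
    return model_var / (model_var + sampling_var)

def fay_herriot(direct, prior, model_var, sampling_var):
    """Pull a noisy direct estimate toward the prior (e.g. a gravity-model flow)."""
    gamma = shrinkage_weight(model_var, sampling_var)
    return gamma * direct + (1 - gamma) * prior

# Made-up flows: a well-observed urban corridor and a poorly observed rural one.
urban = fay_herriot(direct=1000.0, prior=800.0, model_var=7.0, sampling_var=3.0)
rural = fay_herriot(direct=1000.0, prior=800.0, model_var=1.0, sampling_var=9.0)
print(urban, rural)
```

With gamma near 0.7 the estimate stays close to what was observed; with gamma near 0.1 it is mostly the prior, which is what one wants where the data are thin.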

The pipeline is implemented in mobilens, an MIT-licensed Python library that is operator- and country-agnostic. The minimum inputs are a tower catalog with azimuth and height, a census population layer at any administrative level, and a study area boundary polygon. Steps 1 to 3 (the spatial characterisation of the bias) can be carried out without any XDR data, which makes the library useful even where records are unavailable.

– Code: https://github.com/leoferres/mobilens
– Preprint: https://zenodo.org/records/19484460

Substantive feedback before journal submission is welcome!!


In 2008, González, Hidalgo, and Barabási published a landmark study in Nature titled “Understanding individual human mobility patterns”. Using mobile phone records from a European country (?), they argued that human movement follows a truncated power law; a Lévy flight-like pattern where most trips are short, but occasional long-range jumps occur with non-negligible probability. The paper shaped a (two?) decade(s) of mobility research and influenced models of epidemic spreading, urban planning, transportation research and even disaster prevention.

I’ve been meaning for a while to replicate their core findings using a different dataset: anonymized XDR (data session) records from Santiago de Chile. I used a short period I had handy covering 17 days in early 2020 (February 27 to March 15). The dataset, however, is substantially larger than the original: 3.6 million unique users generating 464 million displacement observations. Also, obviously, it comes from a different continent and a different era of mobile technology. The question is simple: do González et al.’s scaling laws hold up?

The aggregate displacement distribution

The central empirical claim of the paper is that the probability of observing a displacement of length \(\Delta r\) follows a truncated power law:

$$
P(\Delta r) \sim \Delta r^{-\beta} \exp(-\Delta r / \kappa)
$$

where \(\beta \approx 1.75\) and \(\kappa\) is a cutoff distance. This heavy-tailed distribution is the statistical signature of Lévy flight-like mobility: displacements span several orders of magnitude with no characteristic scale (up to the cutoff).

We computed \(P(\Delta r)\) from the full set of 464 million displacements and fitted a truncated power law using maximum likelihood estimation. The result is shown below.

The pure power-law exponent is \(\alpha = 1.74\) (yes, I called it \(\alpha\) not \(\beta\) 😩), remarkably close to the \(\beta \approx 1.75\) reported by González et al. The truncated power-law fit, which jointly estimates the cutoff, yields a lower exponent of 1.67 (shown in the figure), as the exponential decay absorbs some of the tail weight. This is a clean replication of their Figure 1a: the heavy-tailed character of human mobility is robust across datasets, geographies, and time periods.
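For the pure power-law case, the maximum likelihood estimate has a closed form for continuous data, \(\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})\). The sketch below checks it on synthetic draws; it assumes \(x_{\min}\) is known and does not cover the truncated fit, which needs numerical optimisation. Function names are made up.

```python
import math
import random

def powerlaw_mle(samples, xmin):
    """Closed-form MLE of the exponent of a continuous power law above xmin."""
    tail = [x for x in samples if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def sample_powerlaw(alpha, xmin, n, seed=0):
    """Draw n synthetic displacements by inverse-transform sampling."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Synthetic data with a known exponent, to check the estimator recovers it.
synthetic = sample_powerlaw(alpha=1.75, xmin=1.0, n=20_000)
estimate = powerlaw_mle(synthetic, xmin=1.0)
print(round(estimate, 3))
```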

The radius of gyration

González et al. also characterized mobility through the radius of gyration \(r_g\), a measure of how far each individual typically ranges from their center of mass. They found that \(r_g\) itself follows a truncated power law:

$$P(r_g) \sim r_g^{-\beta_r} \exp(-r_g / \kappa_r)$$

with \(\beta_r \approx 1.65\). This means that most people are relatively sedentary, but a fat tail of highly mobile individuals exists.

We computed \(r_g\) for each of the 3.6 million users and fitted the distribution.
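For reference, the radius of gyration is the root-mean-square distance of a user's observed positions from their centre of mass. A minimal sketch, assuming positions are already projected to planar coordinates in kilometres (the function name and sample points are made up):

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of (x, y) positions from their centre of mass."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

# A user seen mostly near home, with one long trip.
homebody = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
traveller = homebody + [(40.0, 30.0)]
print(radius_of_gyration(homebody), radius_of_gyration(traveller))
```

A single long trip moves \(r_g\) a lot, which is why a short observation window that misses such trips compresses the estimate.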

The pure power-law exponent is 1.41, somewhat lower than the \(\beta_r \approx 1.65\) reported in the original paper. The truncated power-law fit yields 1.08, a larger gap that likely reflects real differences between the datasets: Santiago’s urban geography, the density of cell towers, and the observation period (17 days vs. 6 months) all affect the estimated range of individual mobility. Shorter observation windows mechanically compress \(r_g\) estimates because infrequent long-range trips may not be captured.

The rescaling collapse

Perhaps the most elegant result in González et al. is the rescaling collapse. They stratified users by their radius of gyration, computed the conditional displacement distribution \(P(\Delta r \mid r_g)\) within each bin, and then showed that plotting \(r_g \cdot P(\Delta r \mid r_g)\) against \(\Delta r / r_g\) causes all curves to collapse onto a single universal function. This implies that individual mobility patterns share a common shape, merely rescaled by each person’s characteristic travel distance.

We reproduced this analysis using seven \(r_g\) bins spanning 1 to 300 km. The rescaled curves do show meaningful overlap, but the collapse is not as tight as reported in the original paper. We quantified this using the coefficient of variation across curves at shared rescaled-\(x\) bins and obtained a mean CV of 0.55, indicating moderate but “imperfect” collapse. The 17-day observation window may again play a role: with limited data per user, the conditional distributions are noisier, and the \(r_g\) estimates themselves carry more uncertainty.
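The collapse metric can be sketched as follows: at each shared rescaled bin, take the spread of the curves relative to their mean, then average over bins. Using the population standard deviation and skipping bins with zero mean are assumptions, and the curves are made up.

```python
import statistics

def mean_cv(curves):
    """Mean coefficient of variation across curves, bin by bin.

    curves: list of equal-length lists, one rescaled curve per r_g bin,
    all evaluated at the same rescaled-x positions.
    """
    cvs = []
    for values in zip(*curves):
        mean = statistics.fmean(values)
        if mean > 0:
            cvs.append(statistics.pstdev(values) / mean)
    return statistics.fmean(cvs)

perfect = [[1.0, 0.5, 0.1], [1.0, 0.5, 0.1]]   # identical curves
loose = [[1.0, 1.0], [3.0, 3.0]]               # same shape, different level
print(mean_cv(perfect), mean_cv(loose))
```

A perfect collapse gives 0, so a mean CV of 0.55 says the curves share a shape but still differ by roughly half their mean level at a typical bin.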

The exponent identity

González et al. derived a theoretical relationship linking the aggregate and individual exponents:

$$\beta = \alpha_{\text{ind}} + \beta_r - 1$$

where \(\alpha_{\text{ind}}\) is the typical individual-level power-law exponent. Using our estimates (\(\alpha_{\text{ind}} = 1.62\), \(\beta_r = 1.41\)), this predicts an aggregate exponent of \(\beta = 2.03\), compared to our observed value of 1.74, a discrepancy of about 0.29. In the original paper, the relationship holds more tightly, likely because the longer observation window yields more stable exponent estimates at both the individual and aggregate levels.

So?

The core finding of González et al., that aggregate human mobility follows a truncated power law with exponent near 1.75, replicates cleanly in our Santiago de Chile data. This is the most robust result in the paper and the one with the largest practical consequences for mobility modeling.

The secondary results (the \(r_g\) distribution, the rescaling collapse, and the exponent identity) show the right qualitative behavior but with quantitative differences. Thus, these are best taken not as failures of the original analysis but as sensitivity to dataset characteristics. A 2020 17-day window with XDR records in a South American metropolis is a meaningfully different empirical setting than 6 months in 2008 (2007?) of call records in a European country. The broad patterns survive, though!


I love the simplicity of content-driven themes like Tarski by Ben Eastaugh. The design stays out of the way and lets the writing do the talking (see how Terence Tao uses this theme). But Tarski was hopelessly out of date (last commits were 15 years ago), so I used Codex to refresh the codebase and make it work with newer versions of PHP and WordPress.

Here’s what I changed:

  • Added proper WordPress title-tag support and moved title generation to modern filters (document_title_separator and pre_get_document_title).
  • Removed the hardcoded <title> tag from header.php.
  • Added wp_body_open() to improve compatibility with newer plugins.
  • Rebuilt the document title logic to match modern WordPress contexts (front page, home, archives, taxonomy, etc.).
  • Updated deprecated tarski_doctitle() to delegate to the new title builder.
  • Fixed PHP 8.x constructor deprecations in TarskiOptions and the recent posts widget.
  • Updated the widget to use posts_per_page, wp_reset_postdata(), and a safe offset.
  • Fixed PHP 8.x warnings around dynamic properties and header option checks.
  • Matched the Walker_Comment method signatures to avoid fatal errors in WordPress 6.9 / PHP 8.3.
  • Hardened option handling to avoid undefined index notices.

In addition, hosting on Bluehost required an explicit configuration change for PHP session garbage collection. I created or edited a .user.ini file with the following settings:

session.gc_probability = 1
session.gc_divisor = 100

I am deliberately conservative with respect to security, as PHP is not my primary language. If you notice any remaining issues or outdated patterns, pull requests are very welcome. I’ve put the refreshed code in a GitHub repo here: https://github.com/leoferres/tarski.git. I’m not really planning to maintain this code except for fixing small security vulnerabilities like I said above.


A Father's Love

A Mother’s Love, attributed to a little-known Swedish painter Carl Olof Petersen (1828–1881), was said to have been painted in 1867 after he heard stories of a grieving mother who begged for her lost child’s return and instead embraced something monstrous. Petersen, remembered in small circles for his dark style and interest in themes of motherhood and the supernatural, supposedly captured this disturbing vision in a work that shocked audiences of the time, with rumors of fainting and even a curse before it disappeared into a private collection. The painting shows an old woman in pale robes, eyes closed in gentle devotion, holding a demonic figure with glowing red eyes and sharp teeth, its dark, clawed body a harsh contrast to her calm but determined expression.

Except it isn’t. It’s AI generated (the painting, not the story). The image first surfaced on Instagram (though the original post has since been removed) under the title “A Mother’s Love”, credited to Carl Olof Petersen. It spread quickly, gaining traction before disappearing, only to reappear in new uploads; one of which has already reached over 16K likes. Commenters marveled at what they thought was a forgotten masterpiece, speculating on its symbolism and supposed place in art history, though some have since noted it was AI-made. In fact, the “painting” was created with AI image generation tools, the false backstory crafted to make it seem real. The conversation then moved to Reddit, where users began to question its origins (see this thread), and soon after the truth came out.

But to me, it felt not only real, it also moved me deeply because of my own experience: I could relate to the mother’s feelings. And I know I’m not the only one. On Reddit, in r/WhatIsThisPainting, one user wrote, “It looks real to me, too,” echoing the certainty many expressed when first seeing the image. The same response appeared on Instagram, where the reposted version and other shares drew comments about the powerful emotions it stirred. One caption read, “AI can generate art… it depicts the all-encompassing nature of a mother’s love and the intense connection I felt.” Taken together, these reactions show how strongly people connect to the image, even after learning it isn’t a “real” painting.

What does this mean for art? To me, it shows that what matters most is not who made it or what tools were used, but its message and how it makes us feel. An image created by AI can still hit just as hard as a painting on canvas, because art is really about the meaning we take from it (not the medium used to express it). This one touched me in a very personal way, to the point of strong emotion. If something moves us, whether joy, sadness, or wonder, then it works as art. That means AI isn’t just copying art; it’s stepping into the same space where art has always lived: in the feelings and reactions it brings out in us.

But there’s one more thing. I can’t paint, and I know I never will. To learn oil painting at even a basic level would take me years, maybe a lifetime, and even then there’s no promise that I could ever reach the skill needed to show what I truly mean. That gap, the distance between what I feel inside and what my hands could put on a canvas, has always been impossible to cross. Yet with AI, I can create something that gets closer to the picture in my head. It gives me a way to share emotions and ideas that would otherwise stay locked away, unexpressed.

A Father's Love

I’ve lived through what almost every father goes through with their children: the tension between love and discipline, the struggle to guide while also letting go, the challenge of sharing hard-won experience. So I began to wonder: what would a “father’s love” look like if it were a painting? I asked an AI to create one in a style similar to “A Mother’s Love”. And here it is: “A Father’s Love”:

I loved it, more than I can really explain. What struck me was the strength and the rough violence in the picture. The father looks like he’s been stabbed, maybe dying, yet he’s embracing the child just like the mother was. The “child” is holding on to him as well, almost as if something important has been understood in that moment. Being a father often has that mix of toughness and care. There’s the need to protect, even if it costs you, and the need to be firm, even if it causes pain. Fathers can show love through hard lessons, through conflict, and sometimes even through anger. That edge of violence, whether in struggle, discipline, or sacrifice, can still carry a kind of love inside it.

Even if I’m the only one who feels this, that’s okay. What amazes me is that I was able to express something in a “painting” that I never could have made on my own. And if one day there’s a machine that can take this image and translate it into oil on canvas, with all the richness and weight that medium deserves, I would be glad.

There’s a lot more I could say, but I’ll end with what matters most to me: painting, like language, music, or any art, is only a tool for sharing ideas. If you don’t have an idea, the tool won’t help you; you’ll still have nothing to say. But AI has given me a way to express something I wanted to show without words, without needing years of training in the medium. We’ve entered, in a very real sense, the era of ideas. I’m still the one responsible for the message, for what it means. Tools like language or painting are now cheap. What counts, more than ever, are the ideas themselves.

This post also appears on Substack.


As a simple exercise, I wanted to compare what Google knows about my trip, versus what the phone company knows. Here’s a reconstruction of a day, Oct 19, 2020, going from home to my university and then back. I’ve chosen expeditiousness over beauty, though.

The blue point to the right is my University, the one to the left is home. The blue line is the GPS of my Google maps timeline. The red dots are the latitude and longitude of the antennas I’ve connected to during the day, and the red line is the trajectory I followed (given the antennas I’ve connected to, using MovingPandas).

All in all, it’s not a bad fit, but there are indeed a few kilometers of difference between GPS and cell towers. This is somewhat worse for shorter trips:

(I’ll fix the x-axis at some point 🙂)

Even though the trip is reconstructed (more or less in the same way as before), I do end up connecting to a tower that is far away (>2.7 km) from where I physically am. This means we need to be very careful and “clean up” trajectories when working with mobile phone data!
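One simple way to start that clean-up is to measure the great-circle distance between each GPS fix and the tower connected at the same moment, and flag the implausible ones. This is a sketch only: the pairing of fixes with towers is assumed to be done already, the coordinates are made up, and the 2.7 km cutoff simply borrows the distance observed above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def flag_far_towers(pairs, max_km=2.7):
    """pairs: list of ((gps_lat, gps_lon), (tower_lat, tower_lon)) matched in time."""
    return [haversine_km(*gps, *tower) > max_km for gps, tower in pairs]

# Made-up fixes: one tower about 1 km away, one about 5.5 km away.
pairs = [
    ((0.0, 0.0), (0.0, 0.01)),
    ((0.0, 0.0), (0.0, 0.05)),
]
print(flag_far_towers(pairs))
```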


This is my original ZX80 computer (by the way, the TV is also from the 1980s, a Sanyo, Portable Deluxe… I like the orange color very much):

I bought this a few years ago on eBay, and it’s probably one of my most prized possessions. It featured a whopping 1KB of RAM, no sound, and it was black and white.

The original ZX80 came with a single “demo” cassette, produced by Lamo-Lem Laboratories in La Jolla, California, and an accompanying manual. The combo looked like this:

I have scanned the booklet and all the inserts in high resolution and made it available on the web in case someone else, like me, is interested in these technologies and the “early history” of home computing, Sinclair in particular.

I have uploaded five PDF files to my github account:

  1. The Lamo-Lem manual covers (1 page): This is a high-definition (1200 dpi, 17,347,913 bytes or about 17MB) photographic scan of both sides of the manual’s (very) soft cover (like the one in the picture above). You might have to rotate the pdf.
  2. The Lamo-Lem manual’s content pages (7 pages): This is a high-definition (1200 dpi, 10,833,875 bytes or about 10MB) text scan of the contents of the manual. The darker pages in the scan are in fact a darker beige in the original. This content explains how to use the different programs in the package.
  3. The Lamo-Lem inserts (1 page): This is a high-definition (1200 dpi, 9,279,479 bytes or about 9MB) photographic scan of the keyboard overlay that came with the manual. The yellow one (on top) is for the Composer, the blue one (bottom) is for the Checkbook Balancer. They may have been attached, given that the right margin shows a “Contents (r) by Lamo-Lem” message. Looks like it may just be a simple careless cutting by the original owner, but then again, maybe not, we know that these things were really very cheap!
  4. The Lamo-Lem reference cards (1 page): This is a high-definition (1200 dpi, 2,276,034 bytes or about 2MB) text scan of the reference cards that came with the manual. The one on the left is the one for the Composer program and the one on the right is for the Etch-A-Screen program. There’s some annotation by the previous owner, I assume. Interesting that the character encoding was not ASCII.
  5. The Lamo-Lem ZX80 program design pages (2 pages): This is a high-definition (1200 dpi, 3,433,445 bytes or about 3MB) text scan of a helper for program design, archiving and maybe debugging. This is my favorite. It has sections for describing “named” variables, “one-letter” variables, arrays (with their parentheses, A(), B(), …, Z()) and strings (A$, B$, …, Z$) and the coding itself. It’s like a paper comment section of a program. Notice that this was also intended for the ZX81.

I hope this is helpful to someone else.

DISCLAIMER: I’m not sure who owns the copyright for this, and I’d be grateful if anyone could tell me whether I’m violating any copyright laws. If so, I will take this information down asap. Still, I’d very much appreciate it if the copyright holders would grant me their permission to post this online. This belongs in a museum! I’ve done due diligence in finding people related to Lamo-Lem Labs in La Jolla, California but have so far had no success.

