Overview
How did Chile go from roughly 2% foreign-born in 2002 to over 8% in 2024? Where do immigrants settle, what work do they find, and do their health outcomes differ from those of Chilean-born residents?
The Census 2024 microdata give us the most detailed snapshot yet of Chile’s population: over 19 million individual records linked to their households and dwellings. Two complementary health datasets enrich the picture:
- ENO (Enfermedades de Notificación Obligatoria), the national mandatory notifiable-disease surveillance registry, with about 333,000 records from 2007 to 2024.
- GRD (Grupos Relacionados de Diagnóstico), hospital discharge records covering about 5 million episodes from 2019 to 2024, coded with ICD-10 diagnoses.
In this project-driven course, students work in pairs to weave all three datasets into a coherent narrative: Migration, Health, and Socioeconomic Integration in Chile. Each team is assigned one to three comunas (mostly Región Metropolitana) chosen so that each team covers a similar total population. Each team builds a complete analytical pipeline: from raw parquet/CSV/pipe-delimited files through demographic profiling, disease surveillance analysis, hospital discharge analysis, spatial mapping, and finally cross-dataset ecological modeling at the comuna level.
Because the three datasets contain different people (a census respondent is not the same individual as an ENO notification or a hospital discharge), cross-dataset analysis is necessarily ecological: each dataset is aggregated to the comuna level and linked by codigo_comuna. Understanding when ecological inference is valid, and when it is not, is an explicit learning objective.
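The aggregate-then-link step described above can be sketched in pandas. This is a minimal illustration on toy data, not the course's actual pipeline: only codigo_comuna comes from the course description, while the foreign_born flag, the row counts, and the derived column names are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the three datasets. Every value here is made up, and
# every column name except codigo_comuna is hypothetical.
census = pd.DataFrame({
    "codigo_comuna": [13101, 13101, 13101, 13201, 13201],
    "foreign_born": [1, 0, 1, 0, 0],
})
eno = pd.DataFrame({"codigo_comuna": [13101, 13101, 13201]})
grd = pd.DataFrame({"codigo_comuna": [13101, 13201, 13201, 13201]})

# Step 1: aggregate each dataset to one row per comuna.
census_agg = census.groupby("codigo_comuna").agg(
    population=("foreign_born", "size"),
    pct_foreign_born=("foreign_born", "mean"),
).reset_index()
eno_agg = eno.groupby("codigo_comuna").size().rename("eno_cases").reset_index()
grd_agg = grd.groupby("codigo_comuna").size().rename("grd_discharges").reset_index()

# Step 2: link the aggregates on the shared comuna code. Left joins keep
# every census comuna; a comuna with no health records gets a count of 0.
comuna = (
    census_agg
    .merge(eno_agg, on="codigo_comuna", how="left")
    .merge(grd_agg, on="codigo_comuna", how="left")
    .fillna({"eno_cases": 0, "grd_discharges": 0})
)

# Step 3: rates are defined per comuna, never per person.
comuna["eno_rate_per_1000"] = 1000 * comuna["eno_cases"] / comuna["population"]
print(comuna)
```

No individual is ever matched across tables, so any association found in the resulting table describes comunas, not people.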
Inverted lectures
This is not a typical lecture course, by design. Each week students receive an assignment that introduces a specific skill or analytical technique. The structure is intentionally inverted: the instructor points to high-quality tutorials, documentation, and video content to study independently before class. Class time is spent diving deeper, debugging, and resolving roadblocks together. Engagement is essential: come to class having reviewed the material, having tried the assignment, and with real questions ready.
Topics
The course covers, roughly in order:
- Tools. Jupyter (Colab / VS Code / JupyterLab), GitHub, Markdown; Python with pandas, pyarrow, geopandas, matplotlib, seaborn or plotly; statsmodels for regression.
- Census 2024 microdata. Vivienda / hogar / persona tables, linking keys (id_vivienda, id_hogar, id_persona), parquet, hierarchical joins, missing-value conventions (-99, NA).
- ENO. Semicolon-delimited CSV, 2007 to 2024; key fields: disease code, notification date, comuna, nationality, education; cleaning the “Desconocido” nationality category; rates over time.
- GRD. Pipe-delimited zipped files (one per year), 2019 to 2024; key fields: DIAGNOSTICO1, COMUNA, nationality, length of stay, severity; ICD-10 lookup via CIE-10.xlsx with diagnostic chapter grouping.
- Demographic analysis. Age pyramids by sex, dependency ratios, household typology.
- Migration variables. p24_lug_resid5 (residence 5 years ago), p25_lug_nacimiento (place of birth), p26_llegada_periodo (arrival period), p27_nacionalidad.
- Labor force, education, housing. Labor force status, occupation (cod_ciuo), economic activity (cod_caenes), commute mode; years of schooling (escolaridad), education level (cine11); housing quality, overcrowding, internet access, tenure.
- Spatial analysis. Comuna-level mapping with geopandas + shapefiles; choropleths.
- Modeling. Logistic regression for binary census outcomes; Poisson and Negative Binomial for count outcomes (with population offset); ecological regression linking aggregate census predictors to health outcomes; coefficient interpretation (odds ratios, incidence rate ratios); predicted-rate maps; residual maps.
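A few of the data-handling topics above can be sketched on miniature inputs. The snippet assumes hypothetical two-column extracts and a hypothetical edad (age) column. The real files have many more fields, and their exact column names and code formats should be checked against the data dictionaries. It shows delimiter handling for ENO and GRD, recoding the “Desconocido” and -99 conventions as missing, and the standard total dependency ratio (ages 0 to 14 plus 65 and over, per 100 people aged 15 to 64).

```python
import io
import pandas as pd

# Hypothetical miniature extracts standing in for the real files.
eno_text = "comuna;nacionalidad\n13101;Chile\n13101;Desconocido\n13201;Chile\n"
grd_text = "COMUNA|DIAGNOSTICO1\n13101|J18.9\n13201|A09.9\n"

# ENO is semicolon-delimited; GRD is pipe-delimited.
eno = pd.read_csv(io.StringIO(eno_text), sep=";")
grd = pd.read_csv(io.StringIO(grd_text), sep="|")

# Treat the "Desconocido" nationality category as missing, not as a group.
eno["nacionalidad"] = eno["nacionalidad"].mask(eno["nacionalidad"] == "Desconocido")

# Census convention: -99 marks a missing value, so drop it before any math.
# "edad" is an assumed column name used only for this illustration.
persons = pd.DataFrame({"edad": [3, 10, 25, 40, 52, 70, -99]})
age = persons["edad"].mask(persons["edad"] == -99).dropna()

# Total dependency ratio: dependents per 100 working-age people.
young = (age < 15).sum()
old = (age >= 65).sum()
working = age.between(15, 64).sum()
dependency_ratio = 100 * (young + old) / working
print(f"Dependency ratio: {dependency_ratio:.1f}")
```

Leaving -99 in place would silently count it as an age under 15, which is why the recode comes before the ratio.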
Prerequisites
- Taller de Programación en Python.
- IIP225A (Probability and Statistics).
- Or instructor permission.
Assignments (summary)
All work is done in pairs, with each pair assigned one to three comunas of roughly similar total population. The three datasets are introduced progressively: Tarea 0 gives a shallow first contact with all three; Tarea 1 goes deep on Census; Tarea 2 goes deep on ENO + GRD; Tarea 3 merges everything at the comuna level.
- Tarea 0 (5 pts): Setup and first contact with all three datasets. Due Mar 12.
- Tarea 1 (10 pts): Demographic profile and migration landscape. Due Mar 26.
- Tarea 2 (10 pts): Health landscape (ENO + GRD). Due Apr 16.
- Quiz 1 (2 pts, bonus): Share comuna-level summary tables in a class-wide format. Due Apr 20.
- Tarea 3 (10 pts): Cross-dataset ecological modeling. Due Apr 30.
- Final project (25 pts): Pick one anomaly from your pipeline and defend it: GitHub repo + 8 to 10 minute video + handwritten in-person memo. See the Final project page.
Full assignment briefs (PDF and Markdown) are on the Assignments page.
Grading
| Component | Points |
|---|---|
| Participation (engagement, questions, collaboration) | 10 |
| Tarea 0 (setup and first contact) | 5 |
| Tarea 1 (demographics and migration) | 10 |
| Tarea 2 (health landscape: ENO + GRD) | 10 |
| Tarea 3 (ecological modeling) | 10 |
| Final project: GitHub repo | 10 |
| Final project: video | 7 |
| Final project: handwritten memo (individual) | 8 |
| Total | 70 |
You cannot pass the class without completing most of the core assignments and participating actively, as defined above.
Policies
- Late policy. A 2-day extension is available if you email staff before the deadline.
- Collaboration. Work is done in pairs. Both members must understand all code submitted.
- AI policy.
- Tareas 0 to 3 and the final project’s async components (repo + video): AI tools are allowed. Disclose them in the README of your final repo.
- Final project’s in-person memo: no AI, no laptops, no notes. This is the contract that gives the async work its meaning.
- Submission. Canvas (PDF of notebook + repo link for the final project; video link; the memo happens live).
Past instantiations
A similar course is offered often. See Fall 2025 and Spring 2025, or the parent course page.
