Syllabus

Overview

How did Chile go from roughly 2% foreign-born in 2002 to over 8% in 2024? Where do immigrants settle, what work do they find, and do their health outcomes differ from those of Chilean-born residents?

The Census 2024 microdata give us the most detailed snapshot ever of Chile’s population: over 19 million individual records linked to their households and dwellings. Two complementary health datasets enrich the picture:

  • ENO (Enfermedades de Notificación Obligatoria), the national mandatory notifiable-disease surveillance registry, with about 333,000 records from 2007 to 2024.
  • GRD (Grupos Relacionados de Diagnóstico), hospital discharge records covering about 5 million episodes from 2019 to 2024, coded with ICD-10 diagnoses.

In this project-driven course, students work in pairs to weave all three datasets into a coherent narrative: Migration, Health, and Socioeconomic Integration in Chile. Each team is assigned one to three comunas (mostly Región Metropolitana) chosen so that each team covers a similar total population. Each team builds a complete analytical pipeline: from raw parquet/CSV/pipe-delimited files through demographic profiling, disease surveillance analysis, hospital discharge analysis, spatial mapping, and finally cross-dataset ecological modeling at the comuna level.

Because the three datasets contain different people (a census respondent is not the same individual as an ENO notification or a hospital discharge), cross-dataset analysis is necessarily ecological: each dataset is aggregated to the comuna level and linked by codigo_comuna. Understanding when ecological inference is valid, and when it is not, is an explicit learning objective.
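
The comuna-level linkage described here can be sketched as follows. This is a minimal illustration assuming pandas, not the course's reference pipeline: the tables are tiny and invented, the column is_foreign_born and the one-row-per-event layout are hypothetical, and only the key name codigo_comuna comes from the text above (the raw GRD files may name the field differently, so a rename step is assumed to have happened already).

```python
import pandas as pd

# Hypothetical person-level census extract; all values are invented.
census = pd.DataFrame({
    "codigo_comuna": [13101, 13101, 13101, 13101, 13201, 13201],
    "is_foreign_born": [1, 0, 0, 1, 0, 0],  # hypothetical derived flag
})

# Hypothetical ENO notifications: one row per notified case.
eno = pd.DataFrame({"codigo_comuna": [13101, 13101, 13201]})

# Hypothetical GRD discharges: one row per hospital episode,
# with the comuna field already renamed to codigo_comuna.
grd = pd.DataFrame({"codigo_comuna": [13101, 13201, 13201, 13201]})

# Step 1: aggregate each dataset separately to one row per comuna.
census_agg = census.groupby("codigo_comuna").agg(
    population=("is_foreign_born", "size"),
    pct_foreign_born=("is_foreign_born", "mean"),
).reset_index()
eno_agg = eno.groupby("codigo_comuna").size().rename("eno_cases").reset_index()
grd_agg = grd.groupby("codigo_comuna").size().rename("grd_episodes").reset_index()

# Step 2: link the aggregates by codigo_comuna. Left joins keep every
# census comuna; a comuna with no events gets a count of zero.
comuna = (
    census_agg
    .merge(eno_agg, on="codigo_comuna", how="left")
    .merge(grd_agg, on="codigo_comuna", how="left")
    .fillna({"eno_cases": 0, "grd_episodes": 0})
)

# Step 3: a crude rate per 100,000 residents, using the census count
# as the denominator.
comuna["eno_rate_per_100k"] = comuna["eno_cases"] / comuna["population"] * 100_000
print(comuna)
```

Note that no individual is ever matched across datasets in this sketch: every statement the resulting table supports is about comunas, not about people.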

Inverted lectures

This is not a typical lecture course, by design. Each week students receive an assignment that introduces a specific skill or analytical technique. The structure is intentionally inverted: the instructor points to high-quality tutorials, documentation, and video content to study independently before class. Class time is spent diving deeper, debugging, and resolving roadblocks together. Engagement is essential: come to class having reviewed the material, having tried the assignment, and with real questions ready.

Topics

The course covers, roughly in order:

  • Tools. Jupyter (Colab / VS Code / JupyterLab), GitHub, Markdown; Python with pandas, pyarrow, geopandas, matplotlib, seaborn or plotly; statsmodels for regression.
  • Census 2024 microdata. Vivienda / hogar / persona tables, linking keys (id_vivienda, id_hogar, id_persona), parquet, hierarchical joins, missing-value conventions (-99, NA).
  • ENO. Semicolon-delimited CSV, 2007 to 2024; key fields: disease code, notification date, comuna, nationality, education; cleaning the “Desconocido” nationality category; rates over time.
  • GRD. Pipe-delimited zipped files (one per year), 2019 to 2024; key fields DIAGNOSTICO1, COMUNA, nationality, length of stay, severity; ICD-10 lookup via CIE-10.xlsx with diagnostic chapter grouping.
  • Demographic analysis. Age pyramids by sex, dependency ratios, household typology.
  • Migration variables. p24_lug_resid5 (residence 5 years ago), p25_lug_nacimiento (place of birth), p26_llegada_periodo (arrival period), p27_nacionalidad.
  • Labor force, education, housing. Labor force status, occupation (cod_ciuo), economic activity (cod_caenes), commute mode; years of schooling (escolaridad), education level (cine11); housing quality, overcrowding, internet access, tenure.
  • Spatial analysis. Comuna-level mapping with geopandas + shapefiles; choropleths.
  • Modeling. Logistic regression for binary census outcomes; Poisson and Negative Binomial for count outcomes (with population offset); ecological regression linking aggregate census predictors to health outcomes; coefficient interpretation (odds ratios, incidence rate ratios); predicted-rate maps; residual maps.
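
As a rough illustration of the census items in the list above (linking keys, hierarchical joins, and the -99 missing-value convention), the following sketch assumes pandas and uses invented tables. The columns tipo_vivienda and n_dormitorios are hypothetical, as is the assumption that id_hogar is unique only within a dwelling; the key names, escolaridad, and the -99 sentinel are the only parts taken from the list.

```python
import numpy as np
import pandas as pd

# Invented miniature versions of the three census tables.
vivienda = pd.DataFrame({
    "id_vivienda": [1, 2],
    "tipo_vivienda": ["casa", "departamento"],  # hypothetical column
})
hogar = pd.DataFrame({
    "id_vivienda": [1, 1, 2],
    "id_hogar": [1, 2, 1],          # assumed unique only within a dwelling
    "n_dormitorios": [2, 1, 3],     # hypothetical column
})
persona = pd.DataFrame({
    "id_vivienda": [1, 1, 1, 2],
    "id_hogar": [1, 1, 2, 1],
    "id_persona": [1, 2, 1, 1],
    "escolaridad": [12, -99, 8, 16],  # -99 marks a missing value
})

# Convert the sentinel to a real missing value before any arithmetic,
# otherwise -99 would silently drag averages down.
persona["escolaridad"] = persona["escolaridad"].replace(-99, np.nan)

# Join upward through the hierarchy: persona -> hogar -> vivienda.
# validate="many_to_one" raises an error if a key is duplicated on the
# right-hand side, which would otherwise multiply person rows.
linked = (
    persona
    .merge(hogar, on=["id_vivienda", "id_hogar"], how="left", validate="many_to_one")
    .merge(vivienda, on="id_vivienda", how="left", validate="many_to_one")
)

# Mean years of schooling by dwelling type; NaN values are skipped.
mean_schooling = linked.groupby("tipo_vivienda")["escolaridad"].mean()
print(mean_schooling)
```

Checking that the joined table still has one row per person (here, four) is a cheap guard against a join that silently duplicated or dropped records.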

Prerequisites

  • Taller de Programación en Python.
  • IIP225A (Probability and Statistics).
  • Alternatively, instructor permission.

Assignments (summary)

All work is done in pairs, with each pair assigned one to three comunas of roughly similar total population. The three datasets are introduced progressively: Tarea 0 gives a shallow first contact with all three; Tarea 1 goes deep on Census; Tarea 2 goes deep on ENO + GRD; Tarea 3 merges everything at the comuna level.

  • Tarea 0 (5 pts): Setup and first contact with all three datasets. Due Mar 12.
  • Tarea 1 (10 pts): Demographic profile and migration landscape. Due Mar 26.
  • Tarea 2 (10 pts): Health landscape (ENO + GRD). Due Apr 16.
  • Quiz 1 (2 pts, bonus): Share comuna-level summary tables in a class-wide format. Due Apr 20.
  • Tarea 3 (10 pts): Cross-dataset ecological modeling. Due Apr 30.
  • Final project (25 pts): Pick one anomaly from your pipeline and defend it: GitHub repo + 8 to 10 minute video + handwritten in-person memo. See the Final project page.

Full assignment briefs (PDF and Markdown) are on the Assignments page.

Grading

Component                                             Points
Participation (engagement, questions, collaboration)  10
Tarea 0 (setup and first contact)                     5
Tarea 1 (demographics and migration)                  10
Tarea 2 (health landscape: ENO + GRD)                 10
Tarea 3 (ecological modeling)                         10
Final project: GitHub repo                            10
Final project: video                                  7
Final project: handwritten memo (individual)          8
Total                                                 70

You cannot pass the class without completing most of the core assignments and participating actively, as defined above.

Policies

  • Late policy. A 2-day extension is available if you email the staff before the deadline.
  • Collaboration. Work is done in pairs. Both members must understand all code submitted.
  • AI policy.
    • Tareas 0 to 3 and the final project’s async components (repo + video): AI tools are allowed. Disclose them in the README of your final repo.
    • Final project’s in-person memo: no AI, no laptops, no notes. This is the contract that gives the async work its meaning.
  • Submission. Canvas (PDF of notebook + repo link for the final project; video link; the memo happens live).

Past instantiations

A similar course is offered often. See Fall 2025 and Spring 2025, or the parent course page.