Data

Three official Chilean datasets anchor the course. Bulk data is not mirrored on this website; each team downloads from the official source listed below. Variable dictionaries and methodology documents are also fetched directly from the official portals.

1. Census 2024 microdata (INE)

  • What. Microdata from the 2024 Chilean census: about 19 million individual records, linked to their households (hogares) and dwellings (viviendas). Three relational tables: persona, hogar, vivienda; linking keys id_persona, id_hogar, id_vivienda.
  • Format. Parquet (read with pandas via pyarrow). Missing values are coded as -99 or NA.
  • Key variables for this course.
    • Migration: p24_lug_resid5, p25_lug_nacimiento, p26_llegada_periodo, p27_nacionalidad.
    • Labor: sit_fuerza_trabajo, cod_ciuo, cod_caenes, p45_medio_transporte.
    • Education: escolaridad, cine11, asistencia_*.
    • Housing: materialidad, hacinamiento, tenencia, internet access.
    • Geography: codigo_comuna.
  • Download. INE Censo 2024 results portal.
  • Documentation. Manual de uso de microdatos, variable dictionary, and variable glosses are available on the same INE portal (search for microdatos).
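
A minimal sketch of how the three tables might be cleaned and linked once loaded. It assumes that persona carries id_hogar and hogar carries id_vivienda as foreign keys (the list above names the keys but not which table holds which), that codigo_comuna lives in vivienda, and that -99 and the string "NA" mark missing values in every column. The tiny in-memory frames and their values are toy stand-ins for the real Parquet files, and the file name in the comment is hypothetical; check the variable dictionary before relying on any of this.

```python
import pandas as pd

# Missing-value conventions listed above; assumed here to apply to all columns.
MISSING_CODES = [-99, "NA"]


def clean_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which the missing-value codes are replaced by NaN."""
    return df.mask(df.isin(MISSING_CODES))


def link_tables(persona: pd.DataFrame, hogar: pd.DataFrame, vivienda: pd.DataFrame) -> pd.DataFrame:
    """Attach household and dwelling attributes to each person record."""
    # validate= raises if a household or dwelling key is duplicated on the right side.
    linked = persona.merge(hogar, on="id_hogar", how="left", validate="many_to_one")
    return linked.merge(vivienda, on="id_vivienda", how="left", validate="many_to_one")


# Toy stand-ins; real tables would come from e.g. pd.read_parquet("persona.parquet").
persona = pd.DataFrame(
    {"id_persona": [1, 2, 3], "id_hogar": [10, 10, 11], "escolaridad": [12, -99, 8]}
)
hogar = pd.DataFrame(
    {"id_hogar": [10, 11], "id_vivienda": [100, 101], "hacinamiento": ["NA", "sin"]}
)
vivienda = pd.DataFrame({"id_vivienda": [100, 101], "codigo_comuna": ["13101", "13120"]})

personas = link_tables(clean_missing(persona), clean_missing(hogar), clean_missing(vivienda))
```

Left joins keep every person record even when a household or dwelling attribute is missing, which makes gaps visible instead of silently dropping rows.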

2. ENO: Notifiable disease surveillance (MINSAL)

  • What. Enfermedades de Notificación Obligatoria, the national mandatory disease-surveillance registry. About 333,000 records covering 2007 to 2024.
  • Format. Semicolon-delimited CSV.
  • Key variables. Disease code, notification date, comuna of residence, nationality, education. Note: nationality includes a “Desconocido” category. Report how many records fall into it, and exclude those records when computing nationality-specific rates.
  • Download. MINSAL DEIS public datasets portal: repositoriodeis.minsal.cl. Look for ENO (base ENO, validada y anonimizada).
  • Documentation. Variable dictionary (diccionario_de_variables_eno_*.xlsx) and methodology PDF (Metodología de validación, anonimización y publicación de bases de datos ENO) are on the same portal.
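
The handling of “Desconocido” described above can be sketched as follows. The column names and the category labels other than “Desconocido” are hypothetical placeholders; the real names are in the ENO variable dictionary. The inline text stands in for the downloaded semicolon-delimited file.

```python
import io

import pandas as pd

# Toy semicolon-delimited extract; column names and values are illustrative only.
raw = io.StringIO(
    "codigo_enfermedad;comuna_residencia;nacionalidad\n"
    "X1;13101;Chilena\n"
    "X1;13101;Chilena\n"
    "X2;13120;Extranjera\n"
    "X2;13120;Desconocido\n"
    "X1;13101;Chilena\n"
)
# dtype=str keeps codes as text instead of letting pandas guess numeric types.
eno = pd.read_csv(raw, sep=";", dtype=str)

unknown = eno["nacionalidad"].eq("Desconocido")

# Report this share alongside any nationality-specific result.
share_unknown = unknown.mean()

# Nationality-specific counts are computed on the known records only.
counts_by_nationality = eno.loc[~unknown, "nacionalidad"].value_counts()
```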

3. GRD: Hospital discharge records (MINSAL)

  • What. Grupos Relacionados de Diagnóstico, hospital discharge records, about 5 million episodes covering 2019 to 2024, ICD-10 coded.
  • Format. Pipe-delimited files, one per year, distributed as .zip / .rar.
  • Key variables. DIAGNOSTICO1 (primary ICD-10 diagnosis), COMUNA (residence), nationality, length of stay, severity, age, sex.
  • Download. MINSAL DEIS: repositoriodeis.minsal.cl. Look for GRD Público (one zip per year).
  • Documentation. An ICD-10 lookup (CIE-10.xlsx), an ICD-9 lookup (CIE-9.xlsx), and master tables (TablasMaestrasBasesGRD.xlsx) are bundled with the GRD download. WHO ICD-10 browser (English): icd.who.int/browse10/2019/en.
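
A sketch of stacking the yearly pipe-delimited files into one table, under stated assumptions: the in-memory text stands in for files already extracted from the yearly archives, the anio and cie10_categoria columns are names introduced here for illustration, and the exact formatting of DIAGNOSTICO1 and COMUNA values (with or without a dot, zero-padded or not) should be checked against the master tables.

```python
import io

import pandas as pd

# Toy stand-ins for two extracted yearly files; only DIAGNOSTICO1 and COMUNA are shown.
files = {
    2019: io.StringIO("DIAGNOSTICO1|COMUNA\nJ18.9|13101\nA09.0|05101\n"),
    2020: io.StringIO("DIAGNOSTICO1|COMUNA\nJ18.9|13120\n"),
}

frames = []
for year, handle in files.items():
    # dtype=str prevents codes such as 05101 from losing a leading zero.
    df = pd.read_csv(handle, sep="|", dtype=str)
    df["anio"] = year  # hypothetical column recording the source year
    frames.append(df)

grd = pd.concat(frames, ignore_index=True)

# The first three characters of an ICD-10 code identify its category (for example J18).
grd["cie10_categoria"] = grd["DIAGNOSTICO1"].str[:3]
```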

Cartography (INE)

  • What. Comuna and regional boundaries for choropleth mapping. The course primarily uses Región Metropolitana (R13).
  • Format. ESRI shapefile and file geodatabase (.gdb).
  • Download. INE geodatos abiertos; the Cartografía Censo 2024 bundle includes regional and communal layers.
  • Library. geopandas. Linking key with the other datasets: codigo_comuna.

Important notes on cross-dataset analysis

The three datasets describe different sets of records: a census respondent cannot be matched to an ENO notification or a hospital discharge. Linking them is therefore necessarily ecological: aggregate each dataset to the comuna level and join on codigo_comuna. Tarea 3 asks you to do this explicitly and to discuss when ecological inference is valid and when it is not (the ecological fallacy).
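
A minimal sketch of such a comuna-level join, using toy data. The name comuna_residencia for the ENO residence column and the output columns poblacion, casos, and tasa_100k are assumptions made for illustration. The resulting rates describe comunas, not individuals.

```python
import pandas as pd

# Toy inputs: one row per census person and one row per ENO notification.
census = pd.DataFrame(
    {"codigo_comuna": ["13101", "13101", "13101", "13101", "13120", "13120", "13130"]}
)
eno = pd.DataFrame({"comuna_residencia": ["13101", "13101", "13120"]})

# Aggregate each dataset to one row per comuna before joining.
population = census.groupby("codigo_comuna").size().reset_index(name="poblacion")
cases = (
    eno.groupby("comuna_residencia")
    .size()
    .reset_index(name="casos")
    .rename(columns={"comuna_residencia": "codigo_comuna"})
)

# Left join from the census side keeps comunas with no notifications.
comunas = population.merge(cases, on="codigo_comuna", how="left", validate="one_to_one")
comunas["casos"] = comunas["casos"].fillna(0)

# Notifications per 100,000 residents, at the comuna level.
comunas["tasa_100k"] = comunas["casos"] / comunas["poblacion"] * 100_000
```

Joining from the census side means a comuna with zero notifications appears with a rate of zero instead of disappearing from the table.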

Privacy and reuse

ENO and GRD are published by MINSAL DEIS as anonymised research datasets. Re-identification attempts are forbidden. Census microdata are individual records linked to households and do not include direct identifiers, but standard care still applies: do not publish individual-level slices that could re-identify people in small comuna populations. When in doubt, aggregate.