Final Project

Key dates

  • Released: Friday, May 9, 2026.
  • Anomaly proposal due (gate, ungraded): Friday, May 15, 2026.
  • Repo + video due: Friday, May 29, 2026 (Canvas).
  • In-person handwritten memo: Friday, June 5, 2026.

Why this format

By the end of Tarea 3 your pipeline has produced thousands of numbers. Most are uninteresting; a few are not. The final project asks you to pick one finding from your pipeline that genuinely surprised you and explain it well enough that an outsider believes you understand it. Shrink the scope and grow the depth.

This is also an exercise in critical thinking without leaning on AI. The repo and the video are async and may be produced with any tools you like, including AI assistants, with appropriate disclosure in the README. The third component is taken in person, on paper, and it will tell us whether you actually own what your repo claims.

Components at a glance

Component                        Mode                        Pts
1. GitHub repo                   async, pair                 10
2. Video                         async, pair (8 to 10 min)   7
3. Handwritten in-person memo    sync, individual (45 min)   8
Total                                                        25

What is an “anomaly”?

A single, concrete, defensible finding from your pipeline that you did not expect. Examples that qualify:

  • A comuna whose ENO notification rate is more than two standard deviations from the regional mean for a specific disease, after adjusting for population.
  • A coefficient whose sign flips between Poisson and Negative Binomial in your Tarea 3 model, with material consequences for interpretation.
  • A subgroup (a nationality, an age band, an occupation) whose hospitalization length-of-stay distribution has a heavy right tail not visible at the comuna level.
  • An apparent migration corridor (residence 5 years ago vs. current comuna) that is much larger or smaller than population alone would predict.
  • A choropleth where the spatial pattern is the opposite of what the demographic correlation predicts.
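The first qualifying example can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the comuna labels, counts, and populations are invented, the function names are hypothetical, and the "regional mean" is taken here as the unweighted mean of comuna rates, which is only one reasonable choice. It uses the Python standard library so it runs without extra dependencies.

```python
from statistics import mean, stdev

# Invented counts and populations for illustration only;
# replace with the output of your own pipeline.
comunas = {
    "A": {"cases": 20, "population": 200_000},
    "B": {"cases": 11, "population": 100_000},
    "C": {"cases": 27, "population": 300_000},
    "D": {"cases": 5, "population": 50_000},
    "E": {"cases": 12, "population": 100_000},
    "F": {"cases": 12, "population": 150_000},
    "G": {"cases": 50, "population": 500_000},
    "H": {"cases": 16, "population": 40_000},
}


def rates_per_100k(data):
    """Convert raw counts to rates per 100,000 residents."""
    return {
        name: 1e5 * row["cases"] / row["population"]
        for name, row in data.items()
    }


def flag_outliers(rates, threshold=2.0):
    """Return z-scores of comunas more than `threshold` SDs from the mean rate.

    Assumption: the regional mean is the unweighted mean of comuna rates.
    """
    mu = mean(rates.values())
    sd = stdev(rates.values())
    return {
        name: (rate - mu) / sd
        for name, rate in rates.items()
        if abs(rate - mu) > threshold * sd
    }


rates = rates_per_100k(comunas)
flagged = flag_outliers(rates)
print(flagged)
```

In this toy data, comuna G has the most cases but an ordinary rate, while H has few cases and is the only one flagged. That is the difference between population scale and an anomaly. With only a handful of comunas, a single extreme value inflates the standard deviation, so treat a two-standard-deviation rule as a first screen rather than a verdict.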

Examples that do not qualify:

  • “Foreign-born residents have a higher TB notification rate.” Already documented; not a surprise.
  • “Maipú has more discharges than San Ramón.” Population scale, not an anomaly.
  • A finding fully explained by the ecological-fallacy discussion you already wrote in Tarea 3. Pick something new.

Step 0: Anomaly proposal (Fri May 15, ungraded gate)

Submit on Canvas, one paragraph (~150 words):

  1. What is the anomaly, in one sentence?
  2. Where does it live: comuna(s), variable(s), figure or table cell from Tarea 1, 2, or 3?
  3. Why is it surprising: what would you have predicted, and how far off is the observed value?
  4. First-pass alternative explanations you already suspect (no need to commit to one yet).

You will hear back by Mon May 18 with one of: approved, pick another, or narrow this. No grade attaches, but the gate is required: a missed proposal blocks the rest of the project. Template: anomaly-proposal-template.md.

Component 1: GitHub repo (10 pts, async)

Extend your existing team repo with a focused notebook and a polished README built around your chosen anomaly.

your-repo/
  README.md
  requirements.txt              (or environment.yml)
  notebooks/
    tarea0.ipynb                (carry forward, no edits required)
    tarea1.ipynb
    tarea2.ipynb
    tarea3.ipynb
    final_anomaly.ipynb         (NEW)
  data/                         (paths or download script; do
                                 NOT commit large parquet/zip)
  figs/
    headline.png                (the figure used in your video)
    ...                         (diagnostic / supporting figures)

final_anomaly.ipynb must contain:

  • a section header stating the anomaly in one paragraph (identical wording to your video opener);
  • the headline figure;
  • the minimal isolation code (filters, joins, aggregations);
  • at least two alternative-explanation checks, each as a small code cell with a short Markdown comment;
  • a closing Markdown cell with the three or four sentences that match the conclusion of your video.
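As an illustration of what one alternative-explanation check might look like, the sketch below tests whether a high count could be small-number noise. It is an example, not a required check: the function name and all numbers are hypothetical, and it assumes counts follow a Poisson distribution at the regional rate. If your counts are overdispersed, the true tail probability will be larger than this suggests.

```python
from math import exp


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Intended for small counts; exp(-expected) underflows for very
    large expected values.
    """
    if observed <= 0:
        return 1.0
    term = exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical inputs: a regional rate of 10 per 100,000 applied
# to a comuna of 40,000 residents gives 4 expected cases.
regional_rate_per_100k = 10.0
population = 40_000
expected_cases = regional_rate_per_100k * population / 1e5

p_extreme = poisson_upper_tail(16, expected_cases)  # 16 observed cases
p_ordinary = poisson_upper_tail(5, expected_cases)  # 5 observed cases
print(p_extreme, p_ordinary)
```

A very small tail probability means chance alone at the regional rate is an unlikely explanation. A probability near 0.4, as for five observed cases here, means the "anomaly" is indistinguishable from noise under this assumption. The accompanying Markdown comment should state which of the two you found.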

README must answer, in under two minutes for a TA who has never seen your repo:

  • which comunas;
  • what the anomaly is;
  • which notebook produces the headline figure and how long it takes;
  • install and run instructions;
  • an AI-use disclosure (which tools, for what; honest, specific).

Undisclosed AI use is what we penalise; honest disclosure is not.

Repo rubric                              Pts   Check
Reproducibility                          3     Clone + follow README; headline figure regenerates without manual fixes.
Pipeline correctness                     3     Joins, offsets, units verified by spot-check; no silent imputation.
final_anomaly.ipynb quality              2     Anomaly clearly isolated; alternatives explored in code, not just in prose.
README + organization + AI disclosure    2     Two-minute read; present and complete.
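The "no silent imputation" item in the rubric above can be made concrete with a join that fails loudly instead of filling gaps. This is a minimal sketch, assuming case counts and populations are keyed by comuna code. The function name and the codes are hypothetical placeholders, and plain dictionaries stand in for whatever table library your pipeline uses.

```python
def attach_population(cases_by_code, population_by_code):
    """Pair each comuna's case count with its population.

    Raises KeyError if any comuna with cases has no population row,
    rather than dropping it or filling in a default value.
    """
    unmatched = sorted(
        set(cases_by_code) - set(population_by_code), key=str
    )
    if unmatched:
        raise KeyError(f"no population for comuna codes: {unmatched}")
    return {
        code: {"cases": cases, "population": population_by_code[code]}
        for code, cases in cases_by_code.items()
    }


# Placeholder codes and numbers, for illustration only.
cases = {"00001": 12, "00002": 7}
population = {"00001": 90_000, "00002": 45_000, "00003": 120_000}
joined = attach_population(cases, population)

# The same codes stored as integers in one table: nothing matches,
# and the check reports it instead of producing an empty join.
try:
    attach_population({1: 12, 2: 7}, population)
except KeyError as err:
    print(err)
```

A mismatch in key type (string codes in one file, integers in another) is a common way for rows to vanish from a join without an error, which is why the check compares key sets before joining.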

Component 2: Video (7 pts, async, 8 to 10 minutes)

A focused defense of one finding, not a tour of three datasets. Suggested structure:

Section                    Time            Content
1. The anomaly             60 to 90 sec    One sentence. One figure. No throat-clearing.
2. Demographic context     60 to 90 sec    Minimum from Tareas 1 and 2 needed to make the anomaly legible.
3. Depth dive              3 to 5 min      Alternatives. Evidence for each. Model checks. Ecological fallacy framing.
4. So what                 60 sec          What this suggests for public health or migration policy AND what it cannot say.
5. Limits and next steps   30 to 60 sec    What you would do with one more week.

Both partners must speak. A laptop microphone in a quiet room is fine; record at 1080p so the figures are legible.

Video rubric                Pts   Check
Anomaly framing             2     Specific, falsifiable, grounded in numbers, not vibes.
Depth of explanation        3     Alternatives ruled out with evidence, not asserted away.
Visual + delivery quality   2     Figures legible at 1080p; both partners speak; paced for an outside viewer.

Component 3: Handwritten in-person memo (8 pts, individual, 45 min)

This is the only integrity lever in the project, so it carries real weight. The session is in person. No laptops, no phones, no notes, no AI. You write by hand on paper provided by the staff. Each member of a pair writes their own memo; the two memos are graded independently.

On the day, the staff picks three prompts from the bank below; you answer all three in roughly 15 minutes each. Bring a pen.

  1. State your team’s anomaly in one paragraph. Reference the specific comuna(s), variable(s), and approximate magnitude of the effect.
  2. Which notebook and which cell produces the headline number? Sketch the data path: file → filter → join → aggregation.
  3. Name two alternative explanations. For each, the quickest check that would have falsified it, and what your check actually showed.
  4. Why does (or does not) your anomaly survive an ecological fallacy critique? Use your own data.
  5. If you had one more week, what is the single check you would run, and what would you expect to find?
  6. A skeptical reader claims your anomaly is an artefact of anonymisation in the ENO data. Respond.
  7. Pick one comuna not assigned to your team and predict, with reasoning, whether your anomaly should appear there too.

Memo rubric (per student)         Pts   Check
Specificity                       3     Cites real numbers, real comunas, real code paths from memory.
Critical reasoning                3     Alternatives, falsification, limits; not assertion.
Coherence with team artefacts     2     What you wrote matches what your repo and video say.

If one partner produces a clearly thin memo while the other does not, only the partner with the thin memo loses points. The grade is individual.

AI policy

  • Async (repo + video). AI tools allowed. Disclose them in the README. You remain accountable for every line and every claim.
  • In-person (memo). No AI, no laptops, no notes. The async work leans on tools; the in-person work checks that you understood it.

Deliverables checklist

  • Fri May 15: anomaly proposal (one paragraph) on Canvas.
  • Fri May 29: PDF of final_anomaly.ipynb on Canvas; link to GitHub repo (with README, AI-use disclosure, requirements.txt, four Tarea notebooks, and final_anomaly.ipynb); link to the video (YouTube unlisted, Vimeo, or Drive; must play without login).
  • Fri Jun 5: bring a pen. We provide paper. Bring nothing else.

Tips and common pitfalls

  • Don’t pick something you already explained in Tarea 3. That work has already been graded; pick something new.
  • Don’t over-claim. An anomaly you think is real but cannot rule out as an artefact is fine to present, as long as you say so.
  • Pre-mortem the memo. Sit across from your partner and quiz each other with the prompt bank, on paper, with no tools open. Surprises mean your README is missing something.
  • Treat the AI disclosure as a feature, not a confession. Specific is fine; vague is not.
  • Headline figure first. Start by building the one figure you would defend; everything else is downstream of that figure.