Source: R for Data Science with additions from The Art of Statistics: How to Learn from Data.
Source: R for Data Science
What does it mean for an analysis to be reproducible?
Near term goals:
✔️ Can the tables and figures be exactly reproduced from the code and data?
✔️ Does the code actually do what you think it does?
✔️ In addition to what was done, is it clear why it was done?
Long term goals:
✔️ Can the code be used for other data?
✔️ Can you extend the code to do other things?
Results produced are more reliable and trustworthy (Ostblom and Timbers 2022)
Facilitates more effective collaboration (Ostblom and Timbers 2022)
Contributes to science, which builds and organizes knowledge in terms of testable hypotheses (Alexander 2023)
Makes it possible to identify and correct errors or biases in the analysis process (Alexander 2023)
| Reproducibility error | Consequence | Source(s) |
|---|---|---|
| Limitations in Excel data formats | Loss of 16,000 COVID case records in the UK | (Kelion 2020) |
| Automatic formatting in Excel | Important genes disregarded in scientific studies | (Ziemann, Eren, and El-Osta 2016) |
| Deletion of a cell caused rows to shift | Mix-up of which patient group received the treatment | (Wallensteen et al. 2018) |
| Using binary instead of explanatory labels | Mix-up of the intervention with the control group | (Aboumatar and Wise 2019) |
| Using the same notation for missing data and zero values | Paper retraction | (Whitehouse et al. 2021) |
| Incorrectly copying data in a spreadsheet | Delay in the opening of a hospital | (Picken 2020) |
Source: Ostblom and Timbers (2022)
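Two of the rows above (one notation for both missing data and zero, and binary instead of explanatory labels) can be illustrated with a minimal R sketch. All values and names here (`readings_ambiguous`, `readings_explicit`, `group_coded`, `group_labeled`) are made up for illustration and are not taken from the cited studies.

```r
# Hypothetical measurements: the third reading was never taken.
readings_ambiguous <- c(4, 0, 0, 7)    # missing value recorded as 0
readings_explicit  <- c(4, 0, NA, 7)   # missing value recorded as NA

# The gap is silently treated as a real zero.
mean_ambiguous <- mean(readings_ambiguous)

# With NA, R refuses to guess: the plain mean is NA until we decide
# explicitly how missing values should be handled.
mean_unhandled <- mean(readings_explicit)
mean_observed  <- mean(readings_explicit, na.rm = TRUE)

# Binary codes do not say which group is which ...
group_coded <- c(0, 1, 0, 1)

# ... whereas explanatory labels document themselves.
group_labeled <- factor(
  c("control", "intervention", "control", "intervention"),
  levels = c("control", "intervention")
)

print(mean_ambiguous)
print(mean_unhandled)
print(mean_observed)
print(table(group_labeled))
```

With these made-up numbers the two means differ (2.75 versus roughly 3.67), and nothing in `readings_ambiguous` reveals that one of the zeros was never measured.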
Scriptability \(\rightarrow\) R
Literate programming (code, narrative, output in one place) \(\rightarrow\) Jupyter notebooks and Deepnote
Version control \(\rightarrow\) Git / GitHub (beyond the scope of this course)
R is a statistical programming language
Jupyter notebooks are a convenient interface for R
Let’s all create a Deepnote account using our CofI email addresses.
Fully reproducible reports – the analysis is rerun from the beginning each time you run the full notebook
Code blocks for writing code and markdown blocks for writing prose
A visual editor that makes editing the document feel similar to using a word processor (Google Docs, Word, Pages, etc.)
Every application exercise and assignment is written in a Jupyter notebook
You’ll have a template notebook to start with
The amount of scaffolding in the template will decrease over the semester
Any time we are working on application exercises (AEs), I will randomly assign you to groups of two or three. Each person will have a role:
Complete the activity.