AE 02: Bike rentals in Washington, DC

Simple linear regression

Important
  • Open RStudio and create a subfolder in your AE folder called “AE-02”

  • Go to the Canvas and locate the AE-2 assignment. That link should take you directly there.

  • Upload the ae-02.qmd and dcbikeshare.csv files into the folder you just created.

library(tidyverse)
library(broom)
library(ggformula)
library(patchwork)

Data

Our data set contains daily rentals from the Capital Bikeshare in Washington, DC in 2011 and 2012. It was obtained from the dcbikeshare data set in the dsbox R package.

We will focus on the following variables in the analysis:

  • count: total bike rentals
  • temp_orig: Temperature in degrees Celsius
  • season: 1 - winter, 2 - spring, 3 - summer, 4 - fall

Click here for the full list of variables and definitions.

Exercises

Exercise 0

The following code will load the data. Note that you will likely need to change the file path. You can do this easily by deleting the existing path, pressing the TAB button while your cursor is inbetween the quotes, and then selecting the file you want to load.

bikeshare <- read_csv("../data/dcbikeshare.csv")

Exercise 1

Below is code to create visualizations of the distributions of daily bike rentals and temperature. Create a scatter plot of the number of rentals vs the temperature and store it in a variable called p3. Make sure to include good appropriate labels. Finally, uncomment the last line of code in this chunk. What do you think this last line does? Comment on what you see.

p1 <- gf_histogram(~ count, data = bikeshare, binwidth = 250) |> 
  gf_labs(x = "Daily bike rentals")

p2 <- gf_histogram(~temp_orig, data = bikeshare) |>  
  gf_labs(x = "Temperature (Celsius)")

# (p1 | p2) / p3

[Add your answer here]

Exercise 2

In the raw data, seasons are coded as 1, 2, 3, 4 as numerical values, corresponding to winter, spring, summer, and fall respectively. Recode the season variable to make it a categorical variable (a factor) with levels corresponding to season names, making sure that the levels appear in a reasonable order in the variable (i.e., not alphabetical).

# add code developed during livecoding here

Exercise 3

We want to evaluate whether the relationship between temperature and daily bike rentals is the same for each season. To answer this question, first create a scatter plot of daily bike rentals vs. temperature faceted by season.

# add code developed during livecoding here

Exercise 4

  • Which season appears to have the strongest relationship between temperature and daily bike rentals? Why do you think the relationship is strongest in this season?

  • Which season appears to have the weakest relationship between temperature and daily bike rentals? Why do you think the relationship is weakest in this season?

[Add your answer here]

Exercise 5

Filter your data for the season with the strongest apparent relationship between temperature and daily bike rentals. Give the resulting data set a new name.

# add code developed during livecoding here

Exercise 6

Going back to Exercise 1, there appears to be one day with a very small number of bike rentals. Use filter to figure out what that day was? Why were the number of bike rentals so low on that day? Hint: You can Google the date to figure out what was going on that day.

Exercise 7

Using the subset of the data from Exercise 5, fit a linear model for predicting daily bike rentals from temperature for this season.

# add code developed during livecoding here

Exercise 8

Use the output to write out the estimated regression equation.

[Add your answer here]

Exercise 9

Interpret the slope in the context of the data.

Exercise 10

Interpret the intercept in the context of the data.

Exercise 11

Create a visualization of the model we just created.

# add code developed during livecoding here

Exercise 12

Using only addition, subtraction, multiplication, and division. Compute the number of rentals our model would predict if it were 10 degrees Celcius outside.

Exercise 13

Verify your answer using the predict function.

# add code developed during livecoding here
Important

To submit the AE:

  • Render the document to produce the HTML with all of your work from today’s class.
  • The driver for your group should upload your .qmd and .html files to the Canvas assignment.