AE 04: Mathematical Models

Songs on Spotify

Important
  • Open RStudio and create a subfolder in your AE folder called “AE-04”

  • Go to Canvas and locate your AE 04 assignment to get started.

  • Upload the ae-04.qmd and spotify-popular.csv files into the folder you just created.

library(tidyverse)
library(ggformula)
library(broom)
library(knitr)
library(patchwork) #arrange plots in a grid

Data

The data set for this assignment is a subset from the Spotify Songs Tidy Tuesday data set. The data were originally obtained from Spotify using the spotifyr R package.

It contains numerous characteristics for each song. You can see the full list of variables and definitions here. This analysis will focus specifically on the following variables:

| variable | class | description |
|---|---|---|
| track_id | character | Song unique ID |
| track_name | character | Song name |
| track_artist | character | Song artist |
| track_popularity | double | Song popularity (0-100), where higher is better |
| danceability | double | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
| energy | double | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| loudness | double | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
| valence | double | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
| tempo | double | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. |
| duration_ms | double | Duration of song in milliseconds |

spotify <- read_csv("../data/spotify-popular.csv")

What makes a song danceable? To answer this question, we’ll analyze data on some of the most popular songs on Spotify, i.e. those with track_popularity >= 80. We’ll use linear regression to fit models that predict a song’s danceability. Each group will be assigned a different predictor variable.

Predictor Assignment:

Below is your assigned predictor. Look at the table above for a definition.

  • Group 1: energy
  • Group 2: loudness
  • Group 3: valence
  • Group 4: tempo

Exercise 0

Below are plots as part of the exploratory data analysis. Change duration_ms and the axis labels to match your explanatory variable.

p1 <- gf_histogram(~duration_ms, data = spotify) |>  
  gf_labs(title = "Distribution of Song Duration", 
       subtitle = "for Popular songs on Spotify", 
       x = "Duration (ms)")

p2 <- gf_histogram(~danceability, data = spotify) |> 
  gf_labs(title = "Distribution of Danceability", 
       subtitle = "for Popular songs on Spotify", 
       x = "Danceability")
p1 + p2 # The patchwork package will arrange your plots for you

gf_point(danceability ~ duration_ms, data = spotify) |> 
  gf_labs(title = "Danceability vs. Duration", 
       subtitle = "Popular songs on Spotify", 
       x = "Duration (ms)", 
       y = "Danceability")

Exercise 1

Fit a model using your assigned explanatory variable to predict a song’s danceability. Use tidy and kable to neatly display your model, and have your reporter write your model on the white board. Be prepared to verbally interpret the slope.

## add code
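As a rough guide to the shape of this code, here is a minimal sketch using duration_ms (the predictor from the Exercise 0 plots) rather than any group's assigned variable. The data frame toy_songs is simulated stand-in data, not the Spotify data, and the names toy_songs and toy_fit are illustrative assumptions; in your own document you would use spotify and your assigned predictor.

```r
library(broom)
library(knitr)

# Simulated stand-in data (NOT the Spotify data), used only so the sketch runs on its own
set.seed(4)
toy_songs <- data.frame(duration_ms = runif(50, 150000, 300000))
toy_songs$danceability <- 0.8 - 5e-07 * toy_songs$duration_ms + rnorm(50, sd = 0.1)

# Fit the simple linear regression of danceability on duration_ms
toy_fit <- lm(danceability ~ duration_ms, data = toy_songs)

# tidy() returns one row per coefficient; kable() formats that table for the rendered document
toy_fit |>
  tidy() |>
  kable(digits = 7)
```

The digits argument is set high here only because a slope per millisecond is a very small number; a predictor on a 0 to 1 scale would likely need fewer digits.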

Exercise 2

Write down the null and alternative hypotheses to test whether your explanatory variable is a useful predictor.

Exercise 3

Identify the standard error of \(\hat{\beta}_1\) and your T-statistic from the output above. Interpret your test statistic. Do you think this provides evidence that your explanatory variable is a useful predictor of danceability?
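To see where the test statistic in the model output comes from, the sketch below recomputes it by hand as the slope estimate (minus the hypothesized value of 0) divided by its standard error. It again uses simulated stand-in data and duration_ms as the predictor; toy_songs, toy_fit, and the other object names are assumptions for illustration only.

```r
# Simulated stand-in data (NOT the Spotify data)
set.seed(4)
toy_songs <- data.frame(duration_ms = runif(50, 150000, 300000))
toy_songs$danceability <- 0.8 - 5e-07 * toy_songs$duration_ms + rnorm(50, sd = 0.1)
toy_fit <- lm(danceability ~ duration_ms, data = toy_songs)

# Coefficient table from base R: estimate, standard error, t value, p-value
coef_table <- summary(toy_fit)$coefficients

slope_est <- coef_table["duration_ms", "Estimate"]
slope_se  <- coef_table["duration_ms", "Std. Error"]

# Test statistic under the null hypothesis that the slope is 0
t_stat <- (slope_est - 0) / slope_se
t_stat
```

The value of t_stat matches the "t value" column of the coefficient table (the statistic column in tidy output).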

Exercise 4

Identify and interpret the p-value in the output from Exercise 1. On the white board, draw a sketch of your p-value.
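The p-value in the output is the area in both tails of a t distribution beyond the observed test statistic. A minimal sketch of that calculation, assuming the same simulated stand-in data as before (toy_songs and toy_fit are hypothetical names, and duration_ms is not an assigned predictor):

```r
# Simulated stand-in data (NOT the Spotify data)
set.seed(4)
toy_songs <- data.frame(duration_ms = runif(50, 150000, 300000))
toy_songs$danceability <- 0.8 - 5e-07 * toy_songs$duration_ms + rnorm(50, sd = 0.1)
toy_fit <- lm(danceability ~ duration_ms, data = toy_songs)

coef_table <- summary(toy_fit)$coefficients
t_stat <- coef_table["duration_ms", "t value"]

# Residual degrees of freedom: n - 2 for simple linear regression
df_resid <- df.residual(toy_fit)

# Two-sided p-value: area beyond |t| in both tails of the t distribution
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)
p_value
```

The two shaded tails in a sketch of the p-value correspond to the two halves of this calculation.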

Exercise 5

Based on your p-value, how strong is the evidence that your assigned explanatory variable is a useful predictor of danceability?

Exercise 6

Write a conclusion for your test in the context of the problem.

Exercise 7

Use tidy to compute a (95 - your group number)% confidence interval for your slope.
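A minimal sketch of how tidy can report a confidence interval at a chosen level, using its conf.int and conf.level arguments. The group number below is a hypothetical example value, and the data are simulated stand-ins with duration_ms as the predictor; toy_songs, toy_fit, group_number, and slope_ci are illustrative names.

```r
library(broom)

# Simulated stand-in data (NOT the Spotify data)
set.seed(4)
toy_songs <- data.frame(duration_ms = runif(50, 150000, 300000))
toy_songs$danceability <- 0.8 - 5e-07 * toy_songs$duration_ms + rnorm(50, sd = 0.1)
toy_fit <- lm(danceability ~ duration_ms, data = toy_songs)

# Hypothetical group number; replace with your own
group_number <- 1
conf_level <- (95 - group_number) / 100

# conf.int = TRUE adds conf.low and conf.high columns at the requested level
slope_ci <- tidy(toy_fit, conf.int = TRUE, conf.level = conf_level)
slope_ci
```

The interval for the slope is in the row whose term is the predictor name, not the (Intercept) row.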

Exercise 8

On the white board, draw a picture representing the critical value for your confidence interval.
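The critical value is the quantile of the t distribution that leaves half of the leftover probability in each tail. The sketch below finds it with qt and rebuilds the slope interval by hand. It assumes a 94% level (a hypothetical example) and simulated stand-in data; all object names are illustrative.

```r
# Simulated stand-in data (NOT the Spotify data)
set.seed(4)
toy_songs <- data.frame(duration_ms = runif(50, 150000, 300000))
toy_songs$danceability <- 0.8 - 5e-07 * toy_songs$duration_ms + rnorm(50, sd = 0.1)
toy_fit <- lm(danceability ~ duration_ms, data = toy_songs)

# Hypothetical confidence level; replace with your group's level
conf_level <- 0.94
alpha <- 1 - conf_level

# Critical value: t quantile with alpha / 2 in the upper tail
t_star <- qt(1 - alpha / 2, df = df.residual(toy_fit))

# Interval by hand: estimate +/- critical value * standard error
coef_table <- summary(toy_fit)$coefficients
slope_est <- coef_table["duration_ms", "Estimate"]
slope_se  <- coef_table["duration_ms", "Std. Error"]
manual_ci <- c(lower = slope_est - t_star * slope_se,
               upper = slope_est + t_star * slope_se)
manual_ci
```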

Exercise 9

Interpret your confidence interval in the context of the problem.

Exercise 10

Predict the average danceability for songs with the value below that matches your assigned predictor:

  • an energy score of 0.859
  • a loudness score of -5.03
  • a valence of 0.52
  • a tempo of 125

Report and interpret a 90% confidence interval for the average danceability. Have your reporter write your interval and estimate on the board.
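One way to get both the estimate and the interval for the mean response is predict with interval = "confidence". This is a sketch under stated assumptions: the data are simulated stand-ins, duration_ms is the predictor, and the value 200000 ms in new_song is a hypothetical input rather than one of the values listed above.

```r
# Simulated stand-in data (NOT the Spotify data)
set.seed(4)
toy_songs <- data.frame(duration_ms = runif(50, 150000, 300000))
toy_songs$danceability <- 0.8 - 5e-07 * toy_songs$duration_ms + rnorm(50, sd = 0.1)
toy_fit <- lm(danceability ~ duration_ms, data = toy_songs)

# New data must use the same column name as the predictor in the model
new_song <- data.frame(duration_ms = 200000)  # hypothetical value

# fit = estimated mean danceability; lwr and upr = 90% confidence limits for that mean
mean_ci <- predict(toy_fit, newdata = new_song,
                   interval = "confidence", level = 0.90)
mean_ci
```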

Exercise 11

Report and interpret a 90% prediction interval for the danceability of a single song with the value below that matches your assigned predictor:

  • an energy score of 0.859
  • a loudness score of -5.03
  • a valence of 0.52
  • a tempo of 125
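For an individual song, predict can return an interval for a single new observation by setting interval = "prediction". As before, this is only a sketch: the data are simulated stand-ins, duration_ms is the predictor, and the new_song value is hypothetical.

```r
# Simulated stand-in data (NOT the Spotify data)
set.seed(4)
toy_songs <- data.frame(duration_ms = runif(50, 150000, 300000))
toy_songs$danceability <- 0.8 - 5e-07 * toy_songs$duration_ms + rnorm(50, sd = 0.1)
toy_fit <- lm(danceability ~ duration_ms, data = toy_songs)

new_song <- data.frame(duration_ms = 200000)  # hypothetical value

# interval = "prediction" accounts for the variability of one song around the mean
song_pi <- predict(toy_fit, newdata = new_song,
                   interval = "prediction", level = 0.90)
song_pi
```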

Exercise 12

Would you expect your interval to get wider or narrower if you were making a prediction for all songs with the above characteristics?

Important

To submit the AE:

  • Render the document to produce the HTML file with all of your work from today’s class.
  • Upload your QMD and HTML files to the Canvas assignment.