library(tidyverse)
library(ggformula)
library(broom)
library(knitr)
library(patchwork) #arrange plots in a grid
AE 04: Mathematical Models
Songs on Spotify
Data
The data set for this assignment is a subset from the Spotify Songs Tidy Tuesday data set. The data were originally obtained from Spotify using the spotifyr R package.
It contains numerous characteristics for each song. You can see the full list of variables and definitions here. This analysis will focus specifically on the following variables:
variable | class | description |
---|---|---|
track_id | character | Song unique ID |
track_name | character | Song Name |
track_artist | character | Song Artist |
track_popularity | double | Song Popularity (0-100) where higher is better |
danceability | double | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
energy | double | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
loudness | double | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typical range between -60 and 0 db. |
valence | double | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
tempo | double | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. |
duration_ms | double | Duration of song in milliseconds |
<- read_csv("../data/spotify-popular.csv") spotify
What makes a song danceable? To answer this question, we’ll analyze data on some of the most popular songs on Spotify, i.e. those with track_popularity >= 80
. We’ll use linear regression to fit some models to predict a song’s dancebility
. Each group will be assigned a different predictor variable.
Predictor Assignment:
Below is your assigned predictor. Look at the table above for a definition.
- Group 1:
energy
- Group 2:
loudness
- Group 3:
valence
- Group 4:
tempo
Exercise 0
Below are plots as part of the exploratory data analysis. Change duration_ms
and the axis labels to match your explanatory variable.
<- gf_histogram(~duration_ms, data = spotify) |>
p1 gf_labs(title = "Distribution of Song Duration",
subtitle = " for Popular songs on Spotify",
x = "Duration (ms)")
<- gf_histogram(~danceability, data = spotify) |>
p2 gf_labs(title = "Distribution of Danceability",
subtitle = "for Popular songs on Spotify",
x = "Danceability")
+ p2 # The patchwork package will arrange your plots for you p1
gf_point(danceability ~ duration_ms, data = spotify) |>
gf_labs(title = "Danceability vs. Duration",
subtitle = "Popular songs on Spotify",
x = "Duration (ms)",
y = "Danceability")
Exercise 1
Fit a model using your assigned explanatory variable to predict a songs danceability
. Use tidy
and kable
to neatly display your model and have your reporter write you model on the white board. Be prepared to verbally interpret the slope.
## add code
Exercise 2
Write down the the null and alternative hypotheses to test whether your explanatory variable is a useful predictor.
Exercise 3
Identify the standard error of \(\hat{\beta}_1\) and your T-statistic from the output above. Interpret your test statistic. Do you think this provides evidence that your explanatory variable is a useful predictor of danceability
?
Exercise 4
Identify and interpret the p-value in the output from Exercise 1. On the white board, draw a sketch of your p-value.
Exercise 5
Based on your p-value, how strong is the evidence that your assigned explanatory variable is a useful predictor of danceability
?
Exercise 6
Write a conclusion for your test in the context of the problem.
Exercise 7
Use tidy
to compute a (95 - your group number)% confidence interval for your slope.
Exercise 8
On the white board, draw a picture representing the critical value for you confidence interval.
Exercise 9
Interpret your confidence interval in the context of the problem.
Exercise 10
Predict the average danceability for a song with:
- an energy score of 0.859
- a loudness score of -5.03
- a valence of 0.52
- a tempo of 125
Report and interpret a 90% confidence interval for the average danceability. Have your reporter write you interval and estimate on the board.
Exercise 11
Report and interpret a 90% confidence interval for a single song with:
- an energy score of 0.859
- a loudness score of -5.03
- a valence of 0.52
- a tempo of 125
Exercise 12
Would you expect you interval to get wider or narrower if you were making a prediction for all songs with the above characteristics?
To submit the AE:
- Render the document to produce the HTML file with all of your work from today’s class.
- Upload your QMD and HTML files to the Canvas assignment.