library(tidyverse)
library(ggformula)
library(broom)
library(knitr)
library(patchwork) # arrange plots in a grid

AE 04: Mathematical Models
Songs on Spotify
Data
The data set for this assignment is a subset from the Spotify Songs Tidy Tuesday data set. The data were originally obtained from Spotify using the spotifyr R package.
It contains numerous characteristics for each song. You can see the full list of variables and definitions here. This analysis will focus specifically on the following variables:
| variable | class | description |
|---|---|---|
| track_id | character | Song unique ID |
| track_name | character | Song Name |
| track_artist | character | Song Artist |
| track_popularity | double | Song Popularity (0-100) where higher is better |
| danceability | double | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
| energy | double | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| loudness | double | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
| valence | double | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
| tempo | double | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. |
| duration_ms | double | Duration of song in milliseconds |
spotify <- read_csv("../data/spotify-popular.csv")

What makes a song danceable? To answer this question, we’ll analyze data on some of the most popular songs on Spotify, i.e., those with track_popularity >= 80. We’ll use linear regression to fit models that predict a song’s danceability. Each group will be assigned a different predictor variable.
Predictor Assignment:
Below is your assigned predictor. Look at the table above for a definition.
- Group 1: energy
- Group 2: loudness
- Group 3: valence
- Group 4: tempo
Exercise 0
Below are plots as part of the exploratory data analysis. Change duration_ms and the axis labels to match your explanatory variable.
p1 <- gf_histogram(~duration_ms, data = spotify) |>
  gf_labs(title = "Distribution of Song Duration",
          subtitle = "for Popular songs on Spotify",
          x = "Duration (ms)")
p2 <- gf_histogram(~danceability, data = spotify) |>
  gf_labs(title = "Distribution of Danceability",
          subtitle = "for Popular songs on Spotify",
          x = "Danceability")
p1 + p2 # The patchwork package will arrange your plots for you

`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
gf_point(danceability ~ duration_ms, data = spotify) |>
  gf_labs(title = "Danceability vs. Duration",
          subtitle = "Popular songs on Spotify",
          x = "Duration (ms)",
          y = "Danceability")

Exercise 1
Fit a model using your assigned explanatory variable to predict a song’s danceability. Use tidy and kable to neatly display your model, and have your reporter write your model on the white board. Be prepared to verbally interpret the slope.
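As a rough sketch of the workflow this exercise describes, the chunk below fits a simple linear regression and displays it with tidy and kable. It runs on simulated stand-in data, not the Spotify file: `songs_demo`, `duration_min` (song duration in minutes), `demo_fit`, and `demo_table` are hypothetical names introduced only for illustration. In your own work, replace them with the `spotify` data frame and your assigned predictor.

```r
library(broom)  # tidy()
library(knitr)  # kable()

# Simulated stand-in data (hypothetical); use `spotify` in the actual exercise
set.seed(4)
songs_demo <- data.frame(duration_min = runif(100, min = 2.5, max = 5))
songs_demo$danceability <- 0.75 - 0.03 * songs_demo$duration_min +
  rnorm(100, sd = 0.1)

# Fit the model: danceability = beta_0 + beta_1 * duration_min + error
demo_fit <- lm(danceability ~ duration_min, data = songs_demo)

# tidy() returns one row per coefficient; kable() prints it as a table
demo_table <- tidy(demo_fit)
kable(demo_table, digits = 3)
```

The `estimate` column holds the fitted intercept and slope; the `std.error`, `statistic`, and `p.value` columns are the quantities the later exercises refer to.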
## add code

Exercise 2
Write down the null and alternative hypotheses to test whether your explanatory variable is a useful predictor. Don’t compute your p-value, but write down all the factors you can think of that will affect the p-value.
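For reference, here is the generic notation such a test is usually written in, assuming the simple linear regression model \(\text{danceability} = \beta_0 + \beta_1 x + \epsilon\), where \(x\) stands for your assigned predictor (the symbol \(x\) is a placeholder, not a variable in the data):

\[
H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0
\]

You still need to state what each hypothesis means in terms of your predictor and danceability.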
Exercise 3
Identify the standard error of \(\hat{\beta}_1\) and your T-statistic from the output above. Interpret your test statistic. Do you think this provides evidence that your explanatory variable is a useful predictor of danceability?
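As a reminder of how the two quantities relate, assuming the null hypothesis sets the slope to 0, the test statistic reported in the regression output is the estimated slope divided by its standard error:

\[
T = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}
\]

Under the null hypothesis and the usual model assumptions, \(T\) follows a \(t\) distribution with \(n - 2\) degrees of freedom, where \(n\) is the number of songs.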
Exercise 4
Identify and interpret the p-value in the output from Exercise 1. On the white board, draw a sketch of your p-value.
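To connect the sketch on the white board to the number in the output, the chunk below recomputes a two-sided p-value by hand as the area in both tails of the \(t\) distribution beyond the observed test statistic. It is only a sketch on simulated stand-in data; `songs_demo`, `duration_min`, and `demo_fit` are hypothetical names, not part of the Spotify data.

```r
# Simulated stand-in data (hypothetical); use `spotify` in the actual exercise
set.seed(4)
songs_demo <- data.frame(duration_min = runif(100, min = 2.5, max = 5))
songs_demo$danceability <- 0.75 - 0.03 * songs_demo$duration_min +
  rnorm(100, sd = 0.1)
demo_fit <- lm(danceability ~ duration_min, data = songs_demo)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(demo_fit)$coefficients
t_stat <- coefs["duration_min", "t value"]

# Residual degrees of freedom: n - 2 for one predictor plus an intercept
df_resid <- df.residual(demo_fit)

# Two-sided p-value: area in both tails beyond |t_stat|
p_manual <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

# Compare with the p-value reported by summary()
all.equal(p_manual, coefs["duration_min", "Pr(>|t|)"])
```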
Exercise 5
Based on your p-value, how strong is the evidence that your assigned explanatory variable is a useful predictor of danceability?
Exercise 6
Write a conclusion for your test in the context of the problem.
Exercise 7
Use tidy to compute a 95% confidence interval for your slope.
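A minimal sketch of this step, again on simulated stand-in data: `tidy()` for an `lm` fit accepts `conf.int = TRUE` and `conf.level`, which add `conf.low` and `conf.high` columns. The names `songs_demo`, `duration_min`, `demo_fit`, and `demo_ci` are hypothetical; substitute your own model object.

```r
library(broom)  # tidy()

# Simulated stand-in data (hypothetical); use `spotify` in the actual exercise
set.seed(4)
songs_demo <- data.frame(duration_min = runif(100, min = 2.5, max = 5))
songs_demo$danceability <- 0.75 - 0.03 * songs_demo$duration_min +
  rnorm(100, sd = 0.1)
demo_fit <- lm(danceability ~ duration_min, data = songs_demo)

# Request 95% confidence intervals alongside the coefficient estimates
demo_ci <- tidy(demo_fit, conf.int = TRUE, conf.level = 0.95)

# Keep only the columns needed to read off the interval for the slope
demo_ci[, c("term", "estimate", "conf.low", "conf.high")]
```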
Exercise 8
On the white board, draw a picture representing the critical value for your confidence interval.
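To see where the critical value comes from, the sketch below computes it with `qt()` and rebuilds the 95% interval as estimate ± critical value × standard error. It uses simulated stand-in data; `songs_demo`, `duration_min`, `demo_fit`, `t_star`, and `ci_manual` are hypothetical names chosen for illustration.

```r
# Simulated stand-in data (hypothetical); use `spotify` in the actual exercise
set.seed(4)
songs_demo <- data.frame(duration_min = runif(100, min = 2.5, max = 5))
songs_demo$danceability <- 0.75 - 0.03 * songs_demo$duration_min +
  rnorm(100, sd = 0.1)
demo_fit <- lm(danceability ~ duration_min, data = songs_demo)

# Critical value: 97.5th percentile of the t distribution with n - 2 df,
# leaving 2.5% in each tail for a 95% interval
t_star <- qt(0.975, df = df.residual(demo_fit))

# Slope estimate and its standard error
coefs <- summary(demo_fit)$coefficients
est <- coefs["duration_min", "Estimate"]
se <- coefs["duration_min", "Std. Error"]

# Interval by hand: estimate -/+ critical value * standard error
ci_manual <- est + c(-1, 1) * t_star * se

# Compare with R's built-in interval for the slope
all.equal(ci_manual, unname(confint(demo_fit)["duration_min", ]))
```

In the picture, `t_star` and `-t_star` mark the cutoffs on the \(t\) distribution that enclose the middle 95% of the area.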
Exercise 9
Interpret your confidence interval in the context of the problem.
To submit the AE:
- Render the document to produce the HTML file with all of your work from today’s class.
- Upload your QMD and HTML files to the Canvas assignment.