AE 04: Mathematical Models

Songs on Spotify

Important

Open RStudio and create a subfolder in your AE folder called “AE-04”
Go to the Canvas and locate your AE 04 assignment to get started.
Upload the ae-04.qmd and spotify-popular.csv files into the folder you just created.

library(tidyverse)
library(ggformula)
library(broom)
library(knitr)
library(patchwork) #arrange plots in a grid

Data

The data set for this assignment is a subset from the Spotify Songs Tidy Tuesday data set. The data were originally obtained from Spotify using the spotifyr R package.

It contains numerous characteristics for each song. You can see the full list of variables and definitions here. This analysis will focus specifically on the following variables:

variable	class	description
track_id	character	Song unique ID
track_name	character	Song Name
track_artist	character	Song Artist
track_popularity	double	Song Popularity (0-100) where higher is better
danceability	double	Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
energy	double	Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
loudness	double	The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typical range between -60 and 0 db.
valence	double	A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
tempo	double	The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
duration_ms	double	Duration of song in milliseconds

spotify <- read_csv("../data/spotify-popular.csv")

What makes a song danceable? To answer this question, we’ll analyze data on some of the most popular songs on Spotify, i.e. those with track_popularity >= 80. We’ll use linear regression to fit some models to predict a song’s dancebility. Each group will be assigned a different predictor variable.

Predictor Assignment:

Below is your assigned predictor. Look at the table above for a definition.

Group 1: energy
Group 2: loudness
Group 3: valence
Group 4: tempo

Exercise 0

Below are plots as part of the exploratory data analysis. Change duration_ms and the axis labels to match your explanatory variable.

p1 <- gf_histogram(~duration_ms, data = spotify) |>  
  gf_labs(title = "Distribution of Song Duration", 
       subtitle = " for Popular songs on Spotify", 
       x = "Duration (ms)")

p2 <- gf_histogram(~danceability, data = spotify) |> 
  gf_labs(title = "Distribution of Danceability", 
       subtitle = "for Popular songs on Spotify", 
       x = "Danceability")
p1 + p2 # The patchwork package will arrange your plots for you

`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

gf_point(danceability ~ duration_ms, data = spotify) |> 
  gf_labs(title = "Danceability vs. Duration", 
       subtitle = "Popular songs on Spotify", 
       x = "Duration (ms)", 
       y = "Danceability")

Exercise 1

Fit a model using your assigned explanatory variable to predict a songs danceability. Use tidy and kable to neatly display your model and have your reporter write you model on the white board. Be prepared to verbally interpret the slope.

## add code

Exercise 2

Write down the the null and alternative hypotheses to test whether your explanatory variable is a useful predictor. Don’t computer your p-vaue, but write down some all the factors you can think of that will impact the p-value.

Exercise 3

Identify the standard error of \(\hat{\beta}_1\) and your T-statistic from the output above. Interpret your test statistic. Do you think this provides evidence that your explanatory variable is a useful predictor of danceability?

Exercise 4

Identify and interpret the p-value in the output from Exercise 1. On the white board, draw a sketch of your p-value.

Exercise 5

Based on your p-value, how strong is the evidence that your assigned explanatory variable is a useful predictor of danceability?

Exercise 6

Write a conclusion for your test in the context of the problem.

Exercise 7

Use tidy to compute a 95% confidence interval for your slope.

Exercise 8

On the white board, draw a picture representing the critical value for you confidence interval.

Exercise 9

Interpret your confidence interval in the context of the problem.

Important

To submit the AE:

Render the document to produce the HTML file with all of your work from today’s class.
Upload your QMD and HTML files to the Canvas assignment.