[SongPlz-Bot] 2. Serverless & Data Ingestion & Recommender System
There are two basic types of recommender systems: (1) Collaborative Filtering and (2) Content-Based Filtering. They differ in the kind of data they work with. Collaborative Filtering works with user-item interaction data, such as ratings or buying behavior. Content-Based Filtering, on the other hand, works with attribute information about users and items, such as textual profiles or relevant keywords.
In this post, I am going to build an effective song recommendation system that combines two pieces of user information: mood and favorite artist.
Recommender system architecture
First, each user is asked questions like the ones below about their mood (expressed as a color) and their favorite singer.
- PHASE 1 : React to commands in a Slack channel and perform basic operations such as retrieving the global top 10 songs.
- PHASE 2 :
Mood
For the mood-based song recommendation algorithm, I took some references from ResearchGate.
So, a few colors are listed for users to choose from depending on their mood (red 🔴, yellow 🟡, navy blue 🔵, purple 🟣, white ⚪). Spotify also provides audio features for each song, including Danceability, Energy, Instrumentalness, Liveness, Loudness, Speechiness, Tempo, and Valence. Most of these range from 0.0 to 1.0 (Loudness is reported in decibels and Tempo in beats per minute). In this post, two audio features, energy and valence, are mainly used.
Energy is a measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. Valence is also a measure from 0.0 to 1.0 and describes the musical positiveness a track conveys. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry) (Bo Plantinga).
After collecting the information relating mood to color and researching what data is available from Spotify, we came up with the final approach for recommending songs based on the user’s information.
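As a rough illustration of how a chosen color could be turned into a recommendation using energy and valence, here is a minimal sketch. The `Track` class, the `recommend_by_mood` function, and especially the `MOOD_TARGETS` values are hypothetical placeholders invented for this example. They are not the color-to-mood mapping from the referenced research, and the real values would have to come from that source.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    name: str
    energy: float   # Spotify audio feature, 0.0 to 1.0
    valence: float  # Spotify audio feature, 0.0 to 1.0


# HYPOTHETICAL (energy, valence) target per color. These numbers are
# illustrative placeholders only, not values taken from the post.
MOOD_TARGETS = {
    "red": (0.9, 0.3),
    "yellow": (0.8, 0.9),
    "navy blue": (0.2, 0.2),
    "purple": (0.5, 0.5),
    "white": (0.3, 0.7),
}


def recommend_by_mood(color, tracks, k=3):
    """Return the k tracks closest to the color's (energy, valence) target."""
    target_energy, target_valence = MOOD_TARGETS[color]

    def distance(track):
        # Straight-line distance in the 2D energy/valence plane.
        return math.hypot(track.energy - target_energy,
                          track.valence - target_valence)

    return sorted(tracks, key=distance)[:k]
```

With this shape, changing the mood model only means editing the target table, while the ranking logic stays the same.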
Singer
An ETL pipeline for artist and track data is required.
We average the audio features of each artist’s top 10 songs, then measure how close each artist is to the artist the user likes to listen to, using either Euclidean distance or cosine distance. The top 3 most similar artists are saved to PostgreSQL or MySQL.
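The similarity step described above can be sketched as follows. This is a minimal example under stated assumptions, not the bot's actual code: the function names, the `FEATURES` subset, and the dictionary-based track format are all hypothetical, and the database write is left out.

```python
import math

# Assumed subset of audio features; the post does not say which ones are used.
FEATURES = ("energy", "valence", "danceability")


def average_features(tracks):
    """Average each audio feature over an artist's tracks (e.g. top 10)."""
    return [sum(t[f] for t in tracks) / len(tracks) for f in FEATURES]


def euclidean_distance(a, b):
    return math.dist(a, b)  # available in Python 3.8+


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def most_similar_artists(favorite, profiles, k=3, distance=euclidean_distance):
    """Rank all other artists by distance to the favorite artist's profile.

    profiles maps artist name -> averaged feature vector.
    Returns a list of (artist name, distance) pairs, closest first.
    """
    target = profiles[favorite]
    scored = [(name, distance(target, vec))
              for name, vec in profiles.items() if name != favorite]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

Passing `distance=cosine_distance` switches the metric without touching the ranking code. Cosine distance ignores vector magnitude, so it compares the shape of the feature profile rather than its absolute levels.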
Data Modelling
There are many more response status codes, which you can check HERE.
- Artist
Column | Data Type |
---|---|
Artist Id | VARCHAR(256) |
Artist Name | VARCHAR(256) |
Artist Genre | VARCHAR(256) |
Followers | INT(11) |
Popularity | INT(11) |
Artist Uri | VARCHAR(256) |
Artist Info | VARCHAR(256) |
- Artist Genre
- TRACK
Column | Data Type |
---|---|
URI | VARCHAR(256) |
Track Name | VARCHAR(256) |
Artist Uri | VARCHAR(256) |
- GENRE
- MOOD
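To make the Artist and TRACK tables concrete, here is a small sketch that creates them. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere; the post targets PostgreSQL or MySQL. The snake_case column names, the primary keys, and the `VARCHAR(256)` type for the track's artist URI are assumptions. The Artist Genre, GENRE, and MOOD tables are omitted because their columns are not listed.

```python
import sqlite3

# Column types follow the tables above; names and keys are assumed.
SCHEMA = """
CREATE TABLE artist (
    artist_id     VARCHAR(256) PRIMARY KEY,
    artist_name   VARCHAR(256),
    artist_genre  VARCHAR(256),
    followers     INT(11),
    popularity    INT(11),
    artist_uri    VARCHAR(256),
    artist_info   VARCHAR(256)
);
CREATE TABLE track (
    uri         VARCHAR(256) PRIMARY KEY,
    track_name  VARCHAR(256),
    artist_uri  VARCHAR(256)  -- assumed type, links a track to its artist
);
"""


def create_schema(conn):
    """Create the artist and track tables on the given connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    create_schema(conn)
    print([row[1] for row in conn.execute("PRAGMA table_info(track)")])
```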