Youhee Kil

[Machine Learning] Basic Math for ML with Python (Part 1)

Youhee published on 2022-07-26 included in Machine Learning

Machine learning is nothing but a geometry problem! All data is the same. Understanding geometry is very important for solving machine learning problems. [Numpy] Basic Linear Algebra: "Linear algebra, mathematical discipline that deals with vectors and matrices and, more generally, with vector spaces and linear transformations" (Britannica). In this lecture, you will learn Scalar, Vector, Array, Tensor; Dot Product & Norm; Multiplication, Transpose & Invertible Matrix; Linear Transformation; Eigenvalue & Eigenvector; and Cosine Similarity, and perform these with NumPy.
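As a rough illustration of the topics listed in this excerpt, the following is a minimal sketch of a few of the operations in NumPy. The vectors and the matrix are made-up values chosen only for the example, not data from the lecture.

```python
import numpy as np

# Two example vectors (illustrative values, not from the post).
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

dot = np.dot(a, b)                  # dot product: 3*4 + 4*3
norm_a = np.linalg.norm(a)          # Euclidean (L2) norm of a
norm_b = np.linalg.norm(b)          # Euclidean (L2) norm of b
cos_sim = dot / (norm_a * norm_b)   # cosine similarity of a and b

# A small invertible matrix (illustrative values).
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

M_T = M.T                           # transpose
M_inv = np.linalg.inv(M)            # inverse (exists because det(M) != 0)
transformed = M @ a                 # linear transformation of a by M

# Eigenvalues and eigenvectors; columns of `eigenvectors` pair with `eigenvalues`.
eigenvalues, eigenvectors = np.linalg.eig(M)

print("dot:", dot)
print("cosine similarity:", cos_sim)
print("eigenvalues:", eigenvalues)
```

Cosine similarity depends only on the angle between the two vectors, so rescaling either vector leaves it unchanged.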

[SQL] RFM Analysis with SQL

Youhee published on 2022-07-15 included in Data Engineer

What is RFM? RFM is a customer segmentation technique based on customers' buying behavior. Companies use RFM metrics as a behavioral segmentation indicator to improve their marketing strategies and increase revenue by reactivating customers and making them more loyal. R (Recency): The last time the customer made a purchase. The smaller the number, the better. F (Frequency): Number of transactions. The bigger the number, the better
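The two metrics described in this excerpt can be sketched as follows. The post itself works in SQL; this is a plain Python sketch under stated assumptions: the transaction list, the snapshot date, and the function name `recency_frequency` are all hypothetical, and recency is measured as days since the last purchase.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date).
transactions = [
    ("alice", date(2022, 7, 1)),
    ("alice", date(2022, 7, 10)),
    ("bob", date(2022, 5, 20)),
]

# Hypothetical reference date against which recency is measured.
snapshot = date(2022, 7, 15)


def recency_frequency(transactions, snapshot):
    """Return {customer_id: (recency_in_days, frequency)}."""
    last_purchase = {}
    frequency = {}
    for customer, day in transactions:
        # Track the most recent purchase date per customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        # Count transactions per customer.
        frequency[customer] = frequency.get(customer, 0) + 1
    return {
        customer: ((snapshot - last_purchase[customer]).days, frequency[customer])
        for customer in last_purchase
    }


rf = recency_frequency(transactions, snapshot)
for customer, (recency, freq) in rf.items():
    print(customer, "recency (days):", recency, "frequency:", freq)
```

In this toy data, a smaller recency and a larger frequency both point to a more active customer, matching the "smaller is better" and "bigger is better" readings above.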

[Starbucks Twitter Sentiment Analysis] Instructions and Spark NLP

Youhee published on 2022-06-08 included in Project

Setup with Confluent Kafka, Spark, Delta Lake with Databricks and AWS. Project Final Diagram. Instruction: In this post, we will set up the environment to perform Starbucks Twitter Sentiment Analysis with Confluent Kafka, Spark, Delta Lake with Databricks and AWS. Step 1. Twitter API Credentials: As we did in the previous post, we need to get Twitter API credentials. After getting them, we save the credential information in .

[Starbucks Twitter Sentiment Analysis] Architecture Planning

Youhee published on 2022-06-06 included in Project Sentiment-Analysis

Architecture Planning. Image Source: From Kafka to Delta Lake using Apache Spark Structured Streaming. 1. Aim: The aim of the Starbucks Twitter Sentiment Analysis project is to build an end-to-end Twitter data streaming pipeline to analyze brand sentiment. Brand sentiment analysis is, to put it simply, a way of determining the general attitude toward your brand, product, or service.

Sentiment Analysis with NLTK, TextBlob, Spark Streaming

Youhee published on 2022-06-05 included in Sentiment-Analysis

TextBlob: The TextBlob method produces a polarity score and a subjectivity score. The polarity score, which falls in the range [-1.0, 1.0], indicates the sentiment of the sentence. If the score is below zero (0.0), the sentiment of the sentence is negative; if the score is above zero (0.0), the sentiment is positive. The subjectivity score, which falls in the range [0.0, 1.0], identifies whether the sentence is objective or subjective. If the score is close to 0.
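The polarity rule described in this excerpt can be sketched as a small helper. This sketch does not call TextBlob itself; it takes an already computed polarity score as input. The function name `label_polarity` is hypothetical, and treating a score of exactly 0.0 as "neutral" is an assumption, since the excerpt only covers scores below and above zero.

```python
def label_polarity(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be in the range [-1.0, 1.0]")
    if polarity < 0.0:
        return "negative"
    if polarity > 0.0:
        return "positive"
    # Assumption: a score of exactly 0.0 is labeled neutral.
    return "neutral"


# Illustrative scores, not produced by any real tweet.
for score in (-0.6, 0.0, 0.8):
    print(score, "->", label_polarity(score))
```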

[SongPlz-Bot] 2. Serverless & Data Ingestion & Recommender System

Youhee published on 2022-05-23 included in Project

There are two basic types of recommender system: (1) Collaborative Filtering and (2) Content-Based Filtering. They differ in the kind of data they work with. The Collaborative Filtering approach works with user-item interaction data, such as ratings or buying behavior. The Content-Based Filtering approach, on the other hand, works with attribute information about the users and items, such as textual profiles or relevant keywords. In this post, I am going to build an effective song recommendation system that combines two pieces of user information: mood and favorite artist.