Data Story: Recommendation Systems

Recommendation Systems require three components to provide recommendations:

Background data (information that the system has before the recommendation process begins)
Input data (provided by user)
Recommendation algorithm

Non-Personalized Recommendation: Based on the population's average opinions

-> Lacks context, and runs into problems when your taste diverges from the average
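
A minimal sketch of a non-personalized recommender, assuming ratings arrive as (user, item, rating) tuples. The function name `top_items_by_mean_rating` and the toy data are made up for illustration. It shows the weakness noted above: every user gets the same list, whatever their own taste.

```python
from collections import defaultdict


def top_items_by_mean_rating(ratings, n=2):
    """Rank items by their average rating over the whole population."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    # Highest mean first; ties are broken alphabetically for determinism.
    return sorted(means, key=lambda item: (-means[item], item))[:n]


# Hypothetical toy data: (user, item, rating)
ratings = [
    ("alice", "matrix", 5), ("bob", "matrix", 4),
    ("alice", "titanic", 2), ("bob", "titanic", 5), ("carol", "titanic", 5),
    ("carol", "up", 3),
]

# The same list is returned no matter which user is asking.
print(top_items_by_mean_rating(ratings))
```

Alice rated "titanic" a 2, yet it would still be recommended to her because the population average is high.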

Source: https://www.youtube.com/watch?v=JEYLfIVvR9I

* Collaborative Filtering (CF):
Assumption: If a person A has the same opinion as a person B on an issue I1, A is more likely to adopt B's opinion on a different issue I2 than a randomly chosen person.
1. Aggregate ratings for items from different users (Rating records of users)
2. User profile including ratings for one or more items
3. Use the background data to calculate pairwise similarities between items
- Challenges
1). New items or new users (together known as the ramp-up or cold-start problem (Schein et al. 2002))
2). Sparsity of data will affect recommendations (Balabanovic and Shoham 1997): the number of ratings is low compared to the number of items
- Two approaches for CF
1). Model-based approach (matrix factorization, e.g. SVD)
2). Memory-based approach (item-item or user-user similarity)
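
The memory-based, item-item variant can be sketched as follows. This is only an illustration under stated assumptions: it uses adjusted cosine similarity (ratings centered on each user's mean), which is one common choice and not the only one, and the helper names and toy ratings are hypothetical.

```python
import math


def center_by_user(ratings):
    """Subtract each user's mean rating from that user's ratings."""
    centered = {}
    for user, items in ratings.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {item: r - mean for item, r in items.items()}
    return centered


def item_similarity(centered, item_a, item_b):
    """Adjusted cosine similarity over users who rated both items."""
    users = [u for u, items in centered.items()
             if item_a in items and item_b in items]
    dot = sum(centered[u][item_a] * centered[u][item_b] for u in users)
    norm_a = math.sqrt(sum(centered[u][item_a] ** 2 for u in users))
    norm_b = math.sqrt(sum(centered[u][item_b] ** 2 for u in users))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, target):
    """Predict a rating as the user's mean plus a similarity-weighted offset."""
    centered = center_by_user(ratings)
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for item, offset in centered[user].items():
        sim = item_similarity(centered, target, item)
        num += sim * offset
        den += abs(sim)
    return user_mean + num / den if den else user_mean


# Hypothetical background data: user -> {item: rating}
ratings = {
    "alice": {"matrix": 5, "inception": 5, "titanic": 1},
    "bob": {"matrix": 4, "inception": 5, "titanic": 2},
    "carol": {"matrix": 5, "titanic": 1},
}

# Carol has not rated "inception"; estimate it from the items she did rate.
print(predict(ratings, "carol", "inception"))
```

Because "inception" is rated like "matrix" and unlike "titanic" by the other users, Carol's high rating for "matrix" and low rating for "titanic" both push the prediction up. With a brand-new item or user there are no co-ratings to compare, which is the cold-start problem listed above.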

* Content-based recommendation:
The content-based approach (Mooney and Roy 2002) recommends to a user those items whose content is similar to content that the user has previously viewed or selected.
1. Use the features of the items
2. User preference in terms of content features
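
A rough sketch of the two ingredients above, assuming item features are simple genre tags and the user profile is the tag counts of the items the user liked. The function names, tags, and titles are illustrative only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two feature-count mappings."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(liked_items, item_features):
    """User preference expressed as counts of content features."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_features[item])
    return profile


def recommend(liked_items, item_features, n=1):
    """Rank unseen items by similarity between their features and the profile."""
    profile = build_profile(liked_items, item_features)
    candidates = [i for i in item_features if i not in liked_items]
    return sorted(
        candidates,
        key=lambda i: (-cosine(profile, Counter(item_features[i])), i),
    )[:n]


# Hypothetical item features (genre tags)
item_features = {
    "matrix": ["sci-fi", "action"],
    "inception": ["sci-fi", "thriller"],
    "titanic": ["romance", "drama"],
    "interstellar": ["sci-fi", "drama"],
}

print(recommend(["matrix", "inception"], item_features))
```

No ratings from other users are needed here, so a new item can be recommended as soon as its features are known.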

* Knowledge-based recommendation:

-------------------------------------------------------------------------------------------------------------------------

Prediction Accuracy:

* Measuring Ratings Prediction Accuracy
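1. Root Mean Squared Error (RMSE): square root of the mean of the squared errors, where each error is (Predicted - Actual Rating)
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where y_i is the actual rating and ŷ_i is the predicted rating.
2. Mean Absolute Error (MAE): mean of the absolute errors |Predicted - Actual Rating|
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$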
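
Both error measures are short to compute. The sketch below assumes two equal-length lists of actual and predicted ratings; the numbers are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: every unit of error counts the same."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Hypothetical ratings on a 1-5 scale
actual = [4, 3, 5, 2]
predicted = [3, 3, 4, 4]

print(rmse(actual, predicted), mae(actual, predicted))
```

Here the single two-point miss raises RMSE above MAE, which is why RMSE is preferred when large errors are especially costly.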

* Measuring Usage Prediction (the system predicts which items a user will use rather than the ratings they will give)
1. Precision (P@N)
2. Recall
3. F-Measure
4. MAP (Mean Average Precision)
5. MRR (Mean Reciprocal Rank)
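
A sketch of the per-user building blocks behind the metrics listed above, assuming a ranked recommendation list and a set of items the user actually used. MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over all users. Dividing average precision by the number of relevant items is one common convention, assumed here.

```python
def precision_at_n(recommended, relevant, n):
    """Fraction of the top-n recommendations that are relevant."""
    return sum(1 for item in recommended[:n] if item in relevant) / n


def recall_at_n(recommended, relevant, n):
    """Fraction of the relevant items that appear in the top n."""
    return sum(1 for item in recommended[:n] if item in relevant) / len(relevant)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item, or 0 if none is recommended."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(recommended, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranked list for one user and the items that user actually used
recommended = ["a", "b", "c", "d"]
relevant = {"b", "d", "e"}

p = precision_at_n(recommended, relevant, 2)
r = recall_at_n(recommended, relevant, 2)
print(p, r, f_measure(p, r))
print(reciprocal_rank(recommended, relevant), average_precision(recommended, relevant))
```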

When the number of recommendations is fixed in advance, use precision-recall curves;
when the number of recommendations is not fixed in advance, use Receiver Operating Characteristic (ROC) curves.

Precision-recall curves emphasize the proportion of recommended items that are preferred while ROC curves emphasize the proportion of items that are not preferred that end up being recommended.

-------------------------------------------------------------------------------------------------------------------------
Some metrics other than accuracy: Sometimes you don't want things similar to what you have already bought, but would rather see something fresh or even surprising!

Diversity: Recommending things that are not all in the same category, etc.

Serendipity: Recommending things that you didn't expect, etc. Business Goal: get people to consume less popular items.

-------------------------------------------------------------------------------------------------------------------------

Algorithms:

Useful Tools

LensKit (Java)
Apache Mahout (Java, uses Hadoop for scalability, general ML capabilities, several recommender algorithms)
MyMediaLite (C#, .NET)
GraphLab (C++ and Java, high-performance matrix factorization, efficient on large datasets on a single machine)
RankLib (Learning To Rank)

Useful Books and Courses

Recommender Systems Handbook
Introduction to Recommender Systems (Coursera)

-------------------------------------------------------------------------------------------------------------------------

LinkedIn Recommender System
