Monday, January 22, 2018

User Modeling in Telecommunications and Internet Industry (KDD 2015)

& another video:

  • User Modeling is about:
    • who U R?
    • where R U?
    • what's next?

  • User Modeling is for machines, not for human decision makers
    • you cannot put human bias into what data to collect; collect as much as possible (AMAP)
    • humans design how to collect AMAP
    • requires a good business model (the Nokia competition data does not reflect industry reality)

  • User Modeling
    • collaborative, crowd intelligence
    • continuous incremental (life long) learning (closed loop)
    • feature engineering is important
    • privacy centric (not afterthought)

Wednesday, January 10, 2018

About rejection, rejected papers, projects, etc.

A popular tweet I saw today reminded me of the video I watched several days ago. And: "Don't let temporary failures discourage you!!!"

Sunday, December 31, 2017

Good bye 2017

Time flies when you are living abroad. It has been over three years since I arrived in Ireland to study. This year has been a busy one for me, with a great surprise in our newborn baby, and it is time to finish up my PhD thesis as soon as possible. As always, some of the things I planned went well and some did not happen; even so, it is always motivating to have my bucket list for the new year. I missed the opportunity to participate in some conferences I had planned to attend, and made it to others, and, as always, this will keep happening with acceptances and rejections from conference venues. Even so, it was a great experience to submit to top-tier conferences and get feedback from many experts.

I also applied for many internship opportunities related to machine learning, RecSys, etc., and had one phone interview with IBM. The industry's needs lie at the overlap between machine learning and media/text analytics, which also reflects the popularity of deep learning and its applications in computer vision, NLP, etc. Although I did not make it, I had another opportunity: attending the Lisbon Machine Learning Summer School this year, one of the largest ML summer schools, where I started to learn many interesting and motivating techniques which I now aim to apply to my research.

And it is time for me to re-think the future opportunities I'd like to take after graduation. Different from the last time, in South Korea after my master's degree, things have changed from two family members to three. On one hand, it is always a little frightening to prepare for the next step when graduation is approaching. On the other hand, it is also very exciting to face new challenges and take new journeys - "A man is not old as long as he is seeking something."

So make your resolutions well and try your best to achieve as many of them as possible in 2018!

Thursday, August 10, 2017

Mac 15-inch (2010) GPU panic: what I tried so far...

  • Stop automatically switching in system preferences
  • Use gfxCardStatus 2.1, use integrated GPU only
  • Chrome: chrome://settings/system, disable "Use hardware acceleration..."
  • Stop using Google Drive (unless syncing)
  • In ~/Library/Preferences/ByHost, remove the files whose names contain "windowserver", then reboot (re-do this if you connect an external monitor)
    Monday, July 31, 2017

    7th Lisbon Machine Learning Summer School Report

    I had a really great chance to attend the 7th Lisbon Machine Learning Summer School (LxMLS2017) in Lisbon, Portugal from the 19th to the 27th. In its 7th edition, LxMLS2017 received several hundred applications, which resulted in a selective admission process (41%) to limit the school to 200+ participants. LxMLS2017 has many sponsors, including Google, and there are also many other machine/deep learning summer school options, such as the Deep Learning Summer School in Bilbao, Spain. One of the reasons for preferring LxMLS2017 might be the practical lab sessions together with both basic and in-depth talks, which made it feel more like a school :) Also, if you missed it for any reason, the slides and lab guide are available at the following links so you can catch up.

    Day-1: Probability & Python

    The first day of the school was more of a warm-up session, which included (1) an overview of probability (by Mario A. T. Figueiredo), and (2) an introduction to Python (by Luis Pedro Coelho) to bring everyone to the same starting point. The Python tutorial is very compact but informative, and can be found in the speaker's GitHub repo.

    Day-2: Linear Learners

    The morning session introduced linear learners, presented by STEFAN RIEZLER, including:

    • Naive Bayes
    • Perceptron
    • Logistic Regression
    • Support Vector Machines
    Furthermore, the speaker talked about convex optimization for learning the parameters of these models, especially how to use Gradient Descent / Stochastic Gradient Descent to reach the minimum.

    Figure 1

    The bottom line on the difference between batch (offline) and stochastic (online) learning is that online learning performs each update based on a single random (stochastic) sample from the training dataset, while batch learning performs each update based on all of the samples in the training dataset. For this reason, stochastic (online) learning is used for the majority of problems in the current big data era.
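    The difference between the two update schemes can be sketched in a few lines of Python. This is a toy one-dimensional least-squares example; the function names, data, and learning rate are made up for illustration:

```python
import random

def batch_gd_step(w, data, lr):
    """One batch (offline) update: gradient averaged over ALL samples."""
    grad = sum((w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def sgd_step(w, data, lr):
    """One stochastic (online) update: gradient from ONE random sample."""
    x, y = random.choice(data)
    return w - lr * (w * x - y) * x

# Fit y = 2x with both schemes; both converge to w = 2.
random.seed(0)
data = [(x, 2.0 * x) for x in range(1, 6)]
w_batch = 0.0
w_sgd = 0.0
for _ in range(200):
    w_batch = batch_gd_step(w_batch, data, lr=0.05)
    w_sgd = sgd_step(w_sgd, data, lr=0.05)
```

    Note that one batch step touches the whole dataset while one stochastic step touches a single sample, which is why SGD scales much better when the dataset is huge.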

    The evening talk was given by FERNANDO PEREIRA, a Distinguished Scientist at Google, where he leads projects in natural-language understanding and machine learning. The talk, "LEARNING AND REPRESENTATION IN LANGUAGE UNDERSTANDING", presented some of the work at Google using deep learning as well as Knowledge Graphs for learning and representing language for various applications in their products.

    Day-3: Sequence Models

    Noah Smith from the University of Washington provided a great tutorial about sequence models, including:

    • Markov Models
    • Hidden Markov Models (HMMs)
    • Viterbi Algorithm
    • Learning Algorithms for HMMs

    A basic model with a strong independence assumption between words is the Bag of Words model, i.e., every word is independent of every other word. Figure 2 shows a nice representation of the Bag of Words model, where the words on the ground can be unimportant words, depending on your task (e.g., stopwords for search).

    Figure 2
    Obviously, as the strong assumption (independence between words) does not usually hold in NLP, this simple model performs poorly at modeling language.

    To make the model better, a simple improvement is based on the idea that each word depends on its previous word, which gives a 1st-order Markov Model. In the same way, we can extend the model so that each word depends on its m previous words, which gives an m-th order Markov Model.

    m-th Order Markov Model
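    As a minimal sketch, a 1st-order Markov model (a bigram language model) can be estimated by simple counting; the toy corpus and helper names below are made up for illustration:

```python
from collections import Counter

def train_bigram(sentences):
    """MLE estimate of P(current word | previous word), with <s> as the start symbol."""
    bigrams, context = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent
        for prev, cur in zip(words, words[1:]):
            bigrams[(prev, cur)] += 1
            context[prev] += 1
    return lambda prev, cur: bigrams[(prev, cur)] / context[prev]

corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
p = train_bigram(corpus)
# P(dog | the) = count(the dog) / count(the .) = 1 / 2
```

    An m-th order model would condition on the tuple of the m previous words instead of a single word; the counting itself stays the same.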

    A Hidden Markov Model (HMM) is a model over a sequence of symbols, but there is missing information associated with each symbol - its "state".

    In other words, HMM is a joint model over observable symbols and their hidden/latent/unknown classes. 

    For instance, in PoS tagging, the PoS tags are the states (unknown classes) of the words.

    Then we can move on to the decoding problem: given the learned parameters and a new observation sequence, find the "best" sequence of hidden states. With different definitions of "best", we get different approaches, such as (1) posterior decoding and (2) Viterbi decoding.

    For example, the "best" is different in each of the following two problems:
    1. Pay 1EUR if we get the sequence wrong
    2. Pay 0.1EUR for every wrong label/class for each word

    Viterbi decoding is for the first problem, as it aims at finding the most probable sequence of hidden states, while posterior decoding is for the second problem. The Viterbi algorithm can be explained by the matrix below, where rows denote all the states and columns denote the sequence. The algorithm then proceeds from left to right:
    • compute the maximum probability of transitioning into each new state given the previous states
    • find the most probable state at the end
    • backtrack to get the final sequence of states

    Viterbi Algorithm
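    The three steps above translate into code almost directly. Below is a minimal sketch using the classic toy weather HMM; the states and probabilities are made up for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable sequence of hidden states for `obs`."""
    # V[t][s] = (max prob of any path ending in state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        col = {}
        for s in states:
            # maximum probability of transitioning into s from any previous state
            prob, prev = max((V[t - 1][ps][0] * trans_p[ps][s] * emit_p[s][obs[t]], ps)
                             for ps in states)
            col[s] = (prob, prev)
        V.append(col)
    # most probable state at the end, then backtrack the stored predecessors
    path = [max(states, key=lambda s: V[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
best = viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
```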

    The evening talk was given by ALEXANDRA BIRCH from the University of Edinburgh on Machine Translation (MT), with the title "SMALLER, FASTER, DEEPER: UNIVERSITY OF EDINBURGH MT SUBMISSION TO WMT 2017", which described the work done by their group on MT and its comparative performance against other WMT submissions. Importantly, the speaker talked about models that are smaller, faster, and deeper, and which can be trained in a typical academic environment (with limited resources). slides

    Day-4: Learning Structured Predictors

    XAVIER CARRERAS from XEROX, which is now Naver Labs Europe (Naver being the "Google of South Korea"), gave the lecture on learning structured predictors using Named Entity Recognition (NER) as an example.

    A simple model decomposes the prediction of the sequence of labels into predicting each label at each position; these are called local classifiers. In the following, f(x, i, l) denotes manually created features based on the input x, position i, and label l.

    A direct comparison between local classifiers and HMMs is shown below.

    Q: How can we incorporate both rich features and label interactions?

    Log-linear models 
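    As a rough sketch of the idea: a log-linear model scores a (sentence, label sequence) pair by summing the weights of arbitrary features, which may look at the rich input as well as adjacent label pairs. The feature templates and weights below are hypothetical, just to show the shape of the model:

```python
def score(weights, feats, x, y):
    """Unnormalized log-linear score: sum of feature weights over all positions."""
    total, prev = 0.0, "<s>"
    for i, label in enumerate(y):
        for name in feats(x, i, prev, label):
            total += weights.get(name, 0.0)
        prev = label
    return total

def feats(x, i, prev, label):
    # two hypothetical feature templates: word identity and label bigram
    return [f"word={x[i]}^label={label}", f"prev={prev}^label={label}"]

weights = {"word=Paris^label=LOC": 2.0, "prev=<s>^label=LOC": 0.5}
s = score(weights, feats, ["Paris"], ["LOC"])  # 2.0 + 0.5 = 2.5
```

    Exponentiating and normalizing such scores over all candidate label sequences gives the probabilistic (CRF-style) version of the model.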

    Day-5: Syntax and Parsing

    Yoav Goldberg from Bar Ilan University gave the lecture on syntax and parsing.

    • What is parsing?
    • Phrase-based (constituency) trees (PCFG, CKY)
    • Dependency trees (Graph parsers, transition parsers)
    Parsing deals with the problem of recovering the structure of natural language (e.g., linguists create linguistic theories to define this structure). Understanding the structure is helpful for other NLP tasks such as sentiment analysis, machine translation, etc. And different from the structure covered the day before, the structure in day-5 is a hierarchical one.

    CFG (Context-Free Grammar) is an important concept for parsing, which is presented on the left.

    PCFG (Probabilistic CFG) is like a CFG, but each rule has an associated probability, and our goal is then to find the tree with maximum probability.

    Parsing with a PCFG means finding the most probable derivation of a given sentence. The CKY algorithm is a dynamic programming algorithm for doing exactly that.
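    A compact sketch of CKY for a PCFG in Chomsky Normal Form (binary rules plus lexical rules) might look as follows; the toy grammar and names are made up for illustration, not the lecture's code:

```python
def cky(words, lexicon, rules):
    """best[i][j][A] = max probability that nonterminal A derives words[i:j].
    lexicon: {(A, word): prob} for A -> word; rules: {(A, B, C): prob} for A -> B C."""
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):  # fill length-1 spans from the lexicon
        for (A, word), p in lexicon.items():
            if word == w:
                best[i][i + 1][A] = max(best[i][i + 1].get(A, 0.0), p)
    for span in range(2, n + 1):  # combine smaller spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (A, B, C), p in rules.items():
                    if B in best[i][k] and C in best[k][j]:
                        cand = p * best[i][k][B] * best[k][j][C]
                        if cand > best[i][j].get(A, 0.0):
                            best[i][j][A] = cand
    return best[0][n]

lexicon = {("N", "dogs"): 1.0, ("V", "bark"): 1.0}
rules = {("S", "N", "V"): 1.0}
chart = cky(["dogs", "bark"], lexicon, rules)  # best probability per root symbol
```

    Storing backpointers alongside the probabilities would recover the tree itself rather than just its probability.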

    Dependency trees capture the dependencies between words in a sentence. Three main approaches to dependency parsing were introduced. The first approach parses the sentence into a constituency structure, and then extracts dependencies from the trees. The second, graph-based approach (global optimization) defines a scoring function over (sentence, tree) pairs, and then searches for the best-scoring structure. Finally, the transition-based approach starts with an unparsed sentence and applies locally-optimal actions until the sentence is parsed.

    In the evening, there was a demo session by dozens of companies working on ML/DL in various areas. It was interesting to see how ML/DL is transforming the world in so many domains, such as medicine, search, energy, government, etc.

    The last two days of the summer school covered deep learning, which has been such a hot topic in recent years, especially with its successful applications in areas such as speech recognition, computer vision, and NLP, thanks to big data & advanced computing power.

    slide from the course "Deep Learning" Udacity

    Day-6: Introduction to Neural Networks

    Day-6 is about neural networks from BHIKSHA RAJ (CMU).

    • Neural Networks (NN) and what can they model
    • Issues about learning
    NNs have established the state of the art in many problems such as speech recognition and Go. NNs began as computational models of the brain. NN models have evolved from the earliest model of cognition (associationism), through a more recent model (connectionism), to current NN models (connectionist machines).

    BHIKSHA RAJ also showed how NNs can model different functions, from boolean functions to functions with complex decision boundaries, using Multi-Layer Perceptrons (MLPs). An interesting part was the analysis of the weights in the perceptron. He explained that a neuron fires if the correlation between the weight pattern and the inputs exceeds a threshold, i.e., a perceptron is actually a correlation filter!
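    This view is easy to demonstrate: a perceptron computes the dot product (an unnormalized correlation) between its weight vector and the input, and fires when it exceeds the threshold. A minimal sketch with a made-up weight template:

```python
def perceptron_fires(weights, inputs, threshold):
    """Fire iff the correlation (dot product) between the weight pattern
    and the inputs exceeds the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return activation > threshold

# the weight vector acts as a template: matching inputs fire, mismatched ones do not
template = [1.0, -1.0, 1.0]
fires = perceptron_fires(template, [1.0, -1.0, 1.0], threshold=2.0)    # strong match
silent = perceptron_fires(template, [-1.0, 1.0, -1.0], threshold=2.0)  # anti-match
```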

    Then BHIKSHA RAJ explained why deep matters...

    Q: When should we call a network "deep"? Usually we call it a deep network when it has more than 2 hidden layers.

    Deeper networks may require exponentially fewer neurons than shallower networks to express the same function.

    The second topic of this lecture was learning NN parameters, including how to define inputs/outputs, error/cost functions, backpropagation, and the convergence of learning. The final slide of the lecture showed how different approaches for optimizing gradient descent converge over time, by Sebastian.

    In the evening, GRAHAM NEUBIG from CMU gave an introduction to "SIMPLE AND EFFICIENT LEARNING WITH DYNAMIC NEURAL NETWORKS" using DyNet, a framework for the other paradigm - dynamic graphs - as opposed to the static graphs used in TensorFlow and Theano.

    Static Graphs (TensorFlow, Theano)

    Dynamic Graphs (Chainer, DyNet, PyTorch)

    Day-7: Modeling Sequential Data with Recurrent Networks

    The final lecture of the summer school was given by CHRIS DYER from CMU & DeepMind.

    • Recurrent Neural Networks (RNN, in the context of language models)
    • Learning parameters, LSTM
    • Conditional Sequence Models
    • Machine Translation with Attention
    The main difference between a feed-forward NN and an RNN is that the latter incorporates the history into the current step.

    The problem with RNNs is vanishing gradients, i.e., we cannot adjust the weights of h1 based on the error that occurs at the end.
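    A toy illustration of why this happens, assuming (for simplicity) a linear recurrence with a single scalar recurrent weight: backpropagating through T steps multiplies the gradient by that weight T times, so with |w| < 1 the signal reaching h1 shrinks exponentially.

```python
def gradient_at_h1(w_recurrent, steps, grad_at_end=1.0):
    """Gradient that survives after flowing backwards through `steps` recurrences."""
    grad = grad_at_end
    for _ in range(steps):
        grad *= w_recurrent  # one backward step through the (linear) recurrence
    return grad

grad_short = gradient_at_h1(0.5, steps=5)   # 0.5**5  = 0.03125
grad_long = gradient_at_h1(0.5, steps=50)   # 0.5**50 ~ 8.9e-16: effectively zero
```

    With |w| > 1 the same product explodes instead; gating architectures such as the LSTM were designed to keep this product close to 1.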

    Visualization of LSTM: Christopher Olah

    Then, the speaker talked about conditional language models, which assign probabilities to sequences of words given some context x (e.g., the author, an image, etc.). An important part is then how to encode the context into a fixed-size vector, for which different approaches have been proposed, such as conventional sequence models, LSTM encoders, etc.

    The next part of the lecture was about Machine Translation with Attention. In translation, each sequence is represented as a matrix where each column is a vector for the corresponding word in the sequence. Attention gives a signal about which column we should pay more attention to at the current translation step. Afterwards, the lecture discussed different approaches for calculating attention.
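    As a minimal sketch of one common scoring scheme (dot-product attention; the vectors below are made up for illustration), the weights over the source columns are just a softmax of query-column similarities:

```python
import math

def attention_weights(query, columns):
    """Score each source column against the decoder query (dot product),
    then softmax the scores into weights that sum to 1."""
    scores = [sum(q * c for q, c in zip(query, col)) for col in columns]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# columns most similar to the query receive the most attention
cols = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = attention_weights([1.0, 0.0], cols)
```

    The attended context is then the weighted sum of the columns, which the decoder uses at the current step.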

    In the end, some tricks, e.g., model depth and mini-batching, were introduced. An interesting observation is that depth seems less important for text compared to audio/visual processing. One possible hypothesis is that more transformation of the input is required for ASR, image recognition, etc.

    Evening talk: KYUNGHYUN CHO from New York University & Facebook AI Research gave a practical talk on "NEURAL MACHINE TRANSLATION AND BEYOND", which included the latest (only a few weeks old...) progress in neural machine translation. The talk first presented a very neat history of machine translation, and then showed how neural machine translation models have taken over and become the state of the art for translation between different languages.

    About the summer school:

    It was a really great summer school, with lectures by experts (and, of course, a big effort by the organizers), and I would highly recommend it to anyone who is interested in machine learning and deep learning. If you are already familiar with the topics covered by the summer school, I expect you will get many fresh views on what you already know. If you are not familiar with those topics, like me, you can get a pretty good overview and starting points for adopting these techniques for your own problems.

    Other reports on the summer school:

    Friday, July 7, 2017

    Hypertext2017 Travel Report

    I participated in the 28th ACM Conference on Hypertext and Social Media (HT), which was held in Prague, Czech Republic from the 4th to the 7th of July. HT is a top-tier ACM conference in the areas of hypertext and social media. This was the first time I attended HT, and it was interesting to learn that TBL demonstrated the WWW at the 1991 Hypertext conference. This year, HT had 69 regular paper submissions with a 27% acceptance rate, and 12 short presentations. As I had been to the UMAP conference twice before, and HT has been held in close proximity to UMAP with similar program committees, I wondered what the difference between the two conferences is. After attending, I guess the key difference is that while UMAP is more focused on contexts such as e-learning, e.g., user modeling and RecSys in educational systems, HT is more focused on linking data & resources and on social media. Although HT's acceptance rate has varied over a wide range, overall it has a good average citation count according to the ACM DL.


    Keynote: Peter Mika, SCHIBSTED (formerly at Yahoo)

    It was interesting to see a keynote on the Semantic Web at HT. In this talk, we looked back at the history of the Semantic Web. The speaker discussed what the original aspirations of its creators were and what has been achieved in practice over these two decades, including some achievements, especially in terms of search engines, as well as some parts of the original vision which have not been achieved.

    What happened to the Semantic Web? from Peter Mika

    Most of the presentations today were related to studying problems on social media, such as hate speech:

    • Mainack Mondal, Leandro Augusto de Araújo Silva and Fabrício Benevenuto: A Measurement Study of Hate Speech in Social Media
    • Stringhini and Athena Vakali: Hate is not binary: Studying abusive behavior of #GamerGate on Twitter
    These talks were interesting, as I was interested in computational social science when I first started my PhD. For example, the first paper above discussed "how can we measure hate speech?", "does anonymity play a role in it?", and how these phenomena differ across countries. The results, based on a Twitter dataset, were interesting: the authors found that there are more anonymous accounts among hate-speech posters compared to a random baseline, i.e., anonymous users post more hate speech.



    Kristina Lerman is a Research Team Lead at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. She talked about position bias in social media: posts are less likely to be seen at lower positions, and as newer tweets arrive, older tweets move down and become even less likely to be seen... and the phenomenon is more serious for well-connected users. It is also interesting that well-connected hubs are less likely to retweet older posts; retweet probability decreases with connectivity - highly connected people are less susceptible to "infection", due to their increased cognitive load.

    The presentations on day-2 were diverse, consisting of linking content, crowdsourcing, storytelling... The following paper, which tackles the problem of understanding task clarity in crowdsourcing platforms (especially CrowdFlower) and how to measure it, won the best paper award at HT2017.

    • Ujwal Gadiraju, Jie Yang and Alessandro Bozzon: Clarity is a Worthwhile Quality - On the Role of Task Clarity in Microtask Crowdsourcing


    The presentations on day-3 were about location-based social networks, user modeling, ratings/reviews, and visualizations. One of the interesting papers was the following one, as I had read about the previous "happy map" work done by Daniele Quercia (Bell Labs Cambridge). This paper talked about various elements which might affect people's perceptions of places (such as safety).

    • David Candeia, Flávio Figueiredo, Nazareno Andrade and Daniele Quercia: Multiple Images of the City: Unveiling Group-Specific Urban Perceptions through a Crowdsourcing Game

    My presentation was about "Leveraging Followee List Memberships for Inferring User Interests for Passive Users on Twitter", which is an extended work built upon our previous work at ECIR2017.

    Leveraging Followee List Memberships for Inferring User Interests for Passive Users on Twitter from GUANGYUAN PIAO

    Overall, the conference had around 70+ participants. What was impressive, however, was that the audience actively asked questions and participated in discussions. In addition, the organizers made the proceedings available before the conference, along with the conference navigator developed by the University of Pittsburgh:


    Next year, HT2018 will be held in Baltimore, USA. It is a good conference, and I hope I will have the chance to attend it again in the future.

    Friday, March 17, 2017

    RecSys Related Libraries List

    fastFM (Factorization Machines, Python)
    lodreclib (LODRecSys, SPrank etc., Java)

    To co-curate the list of recommender system libraries, I created a GitHub repository which contains the list of libraries. Please feel free to send pull requests to add/update the information.

    Saturday, March 4, 2017

    IBM phone interview for Research Intern

    It was the first time I had done a phone interview, and also the first time I had interviewed for an internship at an industry lab. It took around 30 minutes, and the overall process was smooth.

    The overall process was as below:

    1. An introduction to the research internship (e.g., duration, starting date, main to-do list, etc.)
    2. The other attendees (research scientists in the same team) asked several questions about my research:
      • What is your research question, and how do you approach and evaluate it?
      • What kind of skills (NLP, which is related to the position) have you used?
      • What kind of machine learning techniques are you familiar with?
      • How can your research methodology be generalized to other corpora?
      • What is your future direction for extending your current work?
    3. The last step was for me to ask some questions about the internship.

    They kindly informed me after two days that they had found a better match for the position.