Thursday, August 10, 2017

Mac 15-inch (2010) GPU panic: what I tried so far...






  • Turn off automatic graphics switching in System Preferences
  • Use gfxCardStatus 2.1 and force the integrated GPU only
  • Chrome: chrome://settings/system, disable "Use hardware acceleration when available"
  • Stop using Google Drive (except when syncing)
  • In ~/Library/Preferences/ByHost, remove the files whose names contain "windowserver", then reboot (redo this if you connect to an external monitor); a small sketch of this step follows the list
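
A cautious sketch of that last step (the backup folder is my own choice; it only moves files, so the change is reversible):

import glob, os, shutil

prefs = os.path.expanduser("~/Library/Preferences/ByHost")
backup = os.path.expanduser("~/Desktop/ByHost-backup")
os.makedirs(backup, exist_ok=True)

for path in glob.glob(os.path.join(prefs, "*windowserver*")):
    print("moving", path)
    shutil.move(path, backup)   # move instead of delete, then reboot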
    Monday, July 31, 2017

    7th Lisbon Machine Learning Summer School Report

    I had a really great chance to attend the 7th Lisbon Machine Learning Summer School (LxMLS2017) in Lisbon, Portugal from the 19th to the 27th. In its 7th edition, LxMLS2017 received several hundred applications, which led to a selective admission process (41%) to keep the school at 200+ participants. LxMLS2017 has many sponsors including Google, and there are also many other machine/deep learning summer school options, such as the Deep Learning Summer School in Bilbao, Spain. One of the reasons for preferring LxMLS2017 might be the practical lab sessions together with both introductory and in-depth talks, which made it feel more like an actual school :) Also, if you missed it for any reason, the slides and lab guide are available at the following links so you can catch up.

    http://lxmls.it.pt/2017/?page_id=65
    http://lxmls.it.pt/2017/LxMLS2017.pdf

    Day-1: Probability & Python


    The first day of the school was more of a warm-up session, which included (1) an overview of probability (by Mario A. T. Figueiredo) and (2) an introduction to Python (by Luis Pedro Coelho), so that everyone could start from the same point. The Python tutorial is very compact but informative, and can be found in the speaker's GitHub repo.

    https://github.com/luispedro/talk-python-intro

    Day-2: Linear Learners


    The morning session on linear learners was given by STEFAN RIEZLER, covering:

    • Naive Bayes
    • Perceptron
    • Logistic Regression
    • Support Vector Machines
    Furthermore, the speaker talked about convex optimization for learning the parameters of these models, especially how to use Gradient Descent / Stochastic Gradient Descent to reach the minimum.

    Figure 1

    The bottom line on the difference between batch (offline) and stochastic (online) learning is that online learning makes each update based on a single random (stochastic) sample from the training dataset, while batch learning makes each update based on all of the samples in the training dataset. For this reason, stochastic (online) learning is the more common choice for the majority of problems in the current big data era.
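
    To make the two update rules concrete, here is a minimal sketch of batch gradient descent versus SGD on a toy least-squares problem (the data, step sizes, and iteration counts are arbitrary choices of mine, not from the lecture).

    import numpy as np

    # Toy regression data (made up for illustration).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=100)

    def batch_gd(X, y, lr=0.1, epochs=100):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            grad = X.T @ (X @ w - y) / len(y)   # gradient over the whole training set
            w -= lr * grad
        return w

    def sgd(X, y, lr=0.01, epochs=10):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in rng.permutation(len(y)):   # one update per random training sample
                grad = (X[i] @ w - y[i]) * X[i]
                w -= lr * grad
        return w

    print(batch_gd(X, y))
    print(sgd(X, y))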


    The evening talk was given by FERNANDO PEREIRA, a Distinguished Scientist at Google, where he leads projects in natural-language understanding and machine learning. The talk, "LEARNING AND REPRESENTATION IN LANGUAGE UNDERSTANDING", presented some of the work at Google that uses deep learning as well as Knowledge Graphs for learning and representing language in various applications across their products.

    Day-3: Sequence Models


    Noah Smith from Uni. of Washington provided a great tutorial about sequence models, including:

    • Markov Models
    • Hidden Markov Models (HMMs)
    • Viterbi Algorithm
    • Learning Algorithms for HMMs

    A basic model with a strong independence assumption for each word is the Bag of Words model, i.e., every word is independent of every other word. Figure 2 shows a nice illustration of the Bag of Words model, where the words on the ground can be words that are unimportant for your task (e.g., stopwords for search).

    Figure 2
    Obviously, since the strong assumption (independence of each word) rarely holds in natural language, this simple model performs poorly at modeling language.

    To make the model better, a simple improvement is based on the idea that each word depends on its previous word, which gives a 1st-order Markov Model. In the same way, we can extend the model so that each word depends on its m previous words, which gives an m-th-order Markov Model.

    m-th Order Markov Model
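
    As a concrete (toy) illustration of the 1st-order case, the following sketch estimates bigram probabilities by relative frequency from a made-up three-sentence corpus; the corpus and the lack of smoothing are my own simplifications.

    from collections import Counter

    # Toy corpus (made up); "<s>" marks the start of a sentence.
    corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]

    bigrams, prev_counts = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent
        prev_counts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))

    def p(word, prev):
        # P(word | prev) estimated by relative frequency (no smoothing)
        return bigrams[(prev, word)] / prev_counts[prev]

    print(p("dog", "the"))   # 2/3 in this toy corpus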



    A Hidden Markov Model (HMM) is a model over a sequence of symbols, but there is missing information associated with each symbol: its "state".


    In other words, an HMM is a joint model over the observable symbols and their hidden/latent/unknown classes.


    For instance, in PoS tagging, the PoS tags are the states (unknown classes) of the words.


    Then we can move on to the decoding problem: given the learned parameters and a new observation sequence, find the "best" sequence of hidden states. With different definitions of "best", we get different approaches, such as (1) posterior decoding and (2) Viterbi decoding.

    For example, "best" means something different in each of the following two problems:
    1. Pay 1EUR if we get the sequence wrong
    2. Pay 0.1EUR for every wrong label/class for each word

    Viterbi decoding addresses the first problem, aiming to find the single most probable sequence of hidden states, while posterior decoding addresses the second. The Viterbi algorithm can be explained with the matrix below, where the rows denote all of the states and the columns denote the positions in the sequence. The algorithm then proceeds from left to right (a small sketch follows this list):
    • compute the maximum probability of transitioning into every new state given the previous states
    • find the most probable state at the end
    • backtrack to get the final sequence of states
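
    The sketch below implements that left-to-right procedure for a toy HMM; the two states, the three-symbol vocabulary, and all probabilities are invented purely for illustration.

    import numpy as np

    pi = np.array([0.7, 0.3])                    # initial state probabilities (toy)
    A  = np.array([[0.6, 0.4],                   # transition probs A[s, s'] (toy)
                   [0.8, 0.2]])
    B  = np.array([[0.5, 0.4, 0.1],              # emission probs B[s, symbol] (toy)
                   [0.1, 0.3, 0.6]])

    def viterbi(obs):
        T, S = len(obs), len(pi)
        delta = np.zeros((T, S))                 # best log-prob of any path ending in state s at time t
        psi = np.zeros((T, S), dtype=int)        # backpointers
        delta[0] = np.log(pi) + np.log(B[:, obs[0]])
        for t in range(1, T):
            scores = delta[t - 1][:, None] + np.log(A)   # scores[s, s'] of moving s -> s'
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
        path = [delta[-1].argmax()]              # most probable final state
        for t in range(T - 1, 0, -1):            # backtrack
            path.append(psi[t, path[-1]])
        return path[::-1]

    print(viterbi([0, 2, 1]))                    # most probable state sequence for a toy observation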


    Viterbi Algorithm

    The evening talk was given by ALEXANDRA BIRCH from Uni. of Edinburgh on Machine Translation (MT), titled "SMALLER, FASTER, DEEPER: UNIVERSITY OF EDINBURGH MT SUBMISSION TO WMT 2017". It described the work done by her group on MT and its comparative performance against the other WMT submissions. Importantly, the speaker talked about models that are smaller, faster, and deeper, and that can be trained in a typical academic environment (with limited resources). Slides: http://lxmls.it.pt/2017/birchNMT.pdf

    Day-4: Learning Structured Predictors


    XAVIER CARRERAS from XEROX, whose research lab is now Naver Labs Europe (Naver is often described as the Google of South Korea), gave the lecture on learning structured predictors, using Named Entity Recognition (NER) as an example.

    A simple model decomposes the prediction of the sequence of labels into predicting one label at each position; these are called local classifiers. In the following, f(x, i, l) denotes manually created features based on the input x, the position i, and the label l.


    The direct comparison between local classifiers and HMMs is shown below.


    Q: How can we incorporate rich features and label interactions together?

    Log-linear models 
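
    As a rough sketch of the log-linear idea for a single position, the snippet below scores each label with w · f(x, i, l) and normalizes with a softmax; the feature function, label set, and weights are all hypothetical toy choices of mine, not from the lecture.

    import numpy as np

    LABELS = ["O", "PER", "LOC"]

    def f(x, i, label):
        # Toy hand-crafted features: (current word is capitalized, label) and
        # (previous word is "in", label), encoded as one block per label.
        feats = np.zeros(2 * len(LABELS))
        j = LABELS.index(label)
        if x[i][0].isupper():
            feats[j] = 1.0
        if i > 0 and x[i - 1].lower() == "in":
            feats[len(LABELS) + j] = 1.0
        return feats

    def p_label(x, i, w):
        # P(l | x, i) = exp(w . f(x, i, l)) / sum_l' exp(w . f(x, i, l'))
        scores = np.array([w @ f(x, i, l) for l in LABELS])
        scores -= scores.max()                       # numerical stability
        probs = np.exp(scores)
        return probs / probs.sum()

    w = np.array([0.1, 1.5, 1.0, 0.0, 0.5, 2.0])     # weights would normally be learned
    print(dict(zip(LABELS, p_label("He lives in Lisbon".split(), 3, w))))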


    Day-5: Syntax and Parsing


    Yoav Goldberg from Bar Ilan University gave the lecture on syntax and parsing.
    Parsing

    • What is parsing?
    • Phrase-based (constituency) trees (PCFG, CKY)
    • Dependency trees (Graph parsers, transition parsers)
    Parsing deals with the problem of recovering the structure of natural language (e.g., linguists create linguistic theories to define this structure). Understanding the structure is helpful for other NLP tasks such as sentiment analysis, machine translation, etc. Unlike the sequential structure of the previous day, the structure in day-5 is a hierarchical one.


    A CFG (Context-Free Grammar), presented on the left in the slides, is an important concept for parsing.

    A PCFG (Probabilistic CFG) is like a CFG, but each rule has an associated probability, and our goal is then to get the tree with the maximum probability.





    Parsing with a PCFG means finding the most probable derivation for a given sentence, and the CKY algorithm is one way of doing that.
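
    To make the idea concrete, here is a minimal probabilistic CKY sketch over a tiny hand-written PCFG in Chomsky normal form; the grammar, its probabilities, and the example sentence are all made up for illustration.

    import math
    from collections import defaultdict

    lexical = {                         # A -> word rules with probabilities (toy)
        ("Det", "the"): 1.0,
        ("N", "dog"): 0.5, ("N", "cat"): 0.5,
        ("V", "saw"): 1.0,
    }
    binary = {                          # A -> B C rules with probabilities (toy)
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("Det", "N")): 1.0,
        ("VP", ("V", "NP")): 1.0,
    }

    def cky(words):
        n = len(words)
        # best[i][j][A] = max log-probability of nonterminal A spanning words[i:j]
        best = [[defaultdict(lambda: float("-inf")) for _ in range(n + 1)] for _ in range(n + 1)]
        back = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            for (A, word), p in lexical.items():
                if word == w:
                    best[i][i + 1][A] = math.log(p)
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):                    # split point
                    for (A, (B, C)), p in binary.items():
                        score = math.log(p) + best[i][k][B] + best[k][j][C]
                        if score > best[i][j][A]:
                            best[i][j][A] = score
                            back[i][j][A] = (k, B, C)
        return best, back

    best, back = cky("the dog saw the cat".split())
    print(best[0][5]["S"])   # log-probability of the most probable parse rooted in S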


    Dependency trees capture the dependencies between the words in a sentence. Three main approaches to dependency parsing were introduced. The first approach parses the sentence into a constituency structure and then extracts the dependencies from the trees. The second, graph-based approach (global optimization) defines a scoring function over (sentence, tree) pairs and then searches for the best-scoring structure. Finally, the transition-based approach starts with an unparsed sentence and applies locally-optimal actions until the sentence is parsed.

    In the evening, there was a demo session by dozens of companies applying ML/DL in various areas. It was interesting to see how ML/DL is transforming the world in so many domains, such as medicine, search, energy, government, etc.

    The last two days of the summer school covered deep learning, which has been so hot in recent years, especially with its successful applications in areas such as speech recognition, computer vision, and NLP, thanks to big data and advances in computing power.

    Slide from the Udacity course "Deep Learning"

    Day-6: Introduction to Neural Networks


    Day-6 was about neural networks, presented by BHIKSHA RAJ (CMU):

    • Neural Networks (NN) and what they can model
    • Issues in learning
    NNs have established the state of the art in many problems, such as speech recognition and Go. NNs began as computational models of the brain, and the models have evolved from the earliest model of cognition (associationism), through more recent connectionist models, to current NN models (connectionist machines).

    BHIKSHA RAJ also showed how NNs can model different functions, from Boolean functions to functions with complex decision boundaries, using Multi-Layer Perceptrons (MLPs). An interesting part was the analysis of the weights in a perceptron: he explained that a neuron fires if the correlation between the weight pattern and the input exceeds a threshold, i.e., the perceptron is actually a correlation filter!
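
    A tiny sketch of that remark (the weights, inputs, and threshold below are arbitrary): the unit fires exactly when the dot product between the weight pattern and the input exceeds the threshold.

    import numpy as np

    def perceptron_fires(w, x, threshold):
        # Fires iff the correlation (dot product) between weights and input exceeds the threshold.
        return float(np.dot(w, x) >= threshold)

    w = np.array([1.0, -1.0, 0.5])                                  # the weight "pattern"
    print(perceptron_fires(w, np.array([1.0, -1.0, 1.0]), 2.0))     # 1.0: input matches the pattern
    print(perceptron_fires(w, np.array([-1.0, 1.0, 0.0]), 2.0))     # 0.0: input anti-correlates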


    Then BHIKSHA RAJ explained why depth matters...

    Q: When should we call a network deep? Usually we call it a deep network when it has more than 2 hidden layers.

    Deeper networks may require exponentially fewer neurons than shallower networks to express the same function.

    The second topic of the lecture was learning NN parameters, which includes how to define the inputs/outputs, error/cost functions, backpropagation, and the convergence of learning. The final slide of the lecture showed how different approaches for optimizing gradient descent converge over time, based on Sebastian Ruder's overview.

    http://ruder.io/optimizing-gradient-descent/index.html
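
    As a minimal illustration of the forward/backward passes discussed here, the sketch below trains a two-layer MLP with plain gradient descent on XOR; the architecture, learning rate, and iteration count are arbitrary choices of mine.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
    W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for _ in range(5000):
        # forward pass
        h = np.tanh(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass (squared-error loss)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * (1 - h ** 2)
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

    print(out.round(2).ravel())   # should approach [0, 1, 1, 0]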



    In the evening, GRAHAM NEUBIG from CMU gave an introduction, "SIMPLE AND EFFICIENT LEARNING WITH DYNAMIC NEURAL NETWORKS", using DyNet, a framework built around the other paradigm: dynamic graphs, as opposed to the static graphs used in TensorFlow and Theano.


    Static Graphs (TensorFlow, Theano)

    Dynamic Graphs (Chainer, DyNet, PyTorch)
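
    A rough sketch of the dynamic-graph idea in PyTorch-style code (the toy "encoder" below is mine, not from the talk): the graph is rebuilt for every input, so ordinary Python control flow such as a loop over a variable-length sequence just works.

    import torch

    W = torch.randn(8, 8, requires_grad=True)

    def encode(tokens):
        h = torch.zeros(8)
        for tok in tokens:             # the loop length (and thus the graph) depends on the input
            h = torch.tanh(W @ h + tok)
        return h

    loss = encode([torch.randn(8) for _ in range(5)]).sum()
    loss.backward()                    # gradients flow through the graph built for this particular input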

    Day-7: Modeling Sequential Data with Recurrent Networks


    The final lecture of the summer school was given by CHRIS DYER from CMU & DeepMind.

    • Recurrent Neural Networks (RNN, in the context of language models)
    • Learning parameters, LSTM
    • Conditional Sequence Models
    • Machine Translation with Attention
    The main difference between a feed-forward NN and an RNN is that the latter incorporates the history into the current step.


    A problem with RNNs is vanishing gradients, i.e., we cannot adjust the weights at h1 based on the error that occurred at the end of the sequence.
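
    A small numeric sketch of the effect (the recurrent weights and dimensions are arbitrary): the gradient reaching an early hidden state is a product of per-step Jacobians, and its norm typically shrinks toward zero as the sequence gets longer.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(16, 16))   # small recurrent weights (toy choice)
    h = rng.normal(size=16)

    grad = np.eye(16)                          # d h_t / d h_t at the final step
    for t in range(1, 51):
        h = np.tanh(W @ h)
        J = np.diag(1 - h ** 2) @ W            # Jacobian d h_{t+1} / d h_t
        grad = J @ grad                        # accumulate the product of Jacobians
        if t % 10 == 0:
            print(t, np.linalg.norm(grad))     # the norm decays rapidly toward zero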

    Visualization of LSTM: Christopher Olah


    Then the speaker talked about conditional language models, which assign probabilities to sequences of words given some context x (e.g., the author, an image, etc.). An important part is then how to encode the context into a fixed-size vector, for which different approaches have been proposed, such as conventional sequence models, LSTM encoders, etc.

    The next part of the lecture was about Machine Translation with Attention. In translation, each sequence is represented as a matrix where each column is the vector for the corresponding word in the sequence. Attention signals which columns we should pay more attention to at the current translation step. Afterwards, the lecture discussed different approaches for calculating attention.
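
    A minimal dot-product attention sketch over such a matrix (random toy vectors and plain dot-product scoring, rather than any particular model from the lecture):

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def attend(source, query):
        # source: (d, n) matrix with one column per source word; query: (d,) decoder state
        scores = source.T @ query           # one similarity score per source word
        weights = softmax(scores)           # attention distribution over source words
        context = source @ weights          # weighted sum of the source columns
        return weights, context

    rng = np.random.default_rng(0)
    src = rng.normal(size=(4, 6))           # 6 toy source words with 4-dimensional vectors
    q = rng.normal(size=4)
    weights, context = attend(src, q)
    print(weights.round(2), context.round(2))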

    In the end, some tricks were introduced, e.g., the depth of the models and mini-batching. An interesting observation is that depth seems less important for text than for audio/visual processing. One possible hypothesis is that more transformation of the input is required for ASR, image recognition, etc.



    Evening talk: KYUNGHYUN CHO from New York Uni. & Facebook AI Research gave a practical talk on "NEURAL MACHINE TRANSLATION AND BEYOND", which included the latest progress (from just a few weeks ago...) in neural machine translation. The talk first gave a very neat history of machine translation, and then showed how neural machine translation models have taken over and become the state of the art for translation between different languages.


    About the summer school:


    It was a really great summer school with lectures given by experts (and of course a big effort by the organizers), and I'd like to highly recommend it to anyone who is interested in machine learning and deep learning. If you're already familiar with the topics covered by the summer school, I expect you will get many fresh views on what you already know. If you are not familiar with those topics, like me, you can also get a pretty good overview and starting points for adopting these techniques for your own problems.

    Other reports on the summer school:





    Friday, July 7, 2017

    Hypertext2017 Travel Report


    I participated in the 28th ACM Conference on Hypertext and Social Media (HT), which was held in Prague, Czech Republic from the 4th to the 7th of July. HT is a top-tier ACM conference in the areas of hypertext and social media. This was my first time attending HT, and it was interesting to learn that TBL (Tim Berners-Lee) demonstrated the WWW at the 1991 Hypertext conference: https://home.cern/images/2014/01/tim-berners-lee-demonstrates-world-wide-web, https://www.quora.com/Why-is-Sir-Tim-Berners-Lee-unnoticed-when-his-contribution-is-comparable-to-Jobs-and-Gates. This year, HT had 69 regular paper submissions with a 27% acceptance rate, plus 12 short presentations. As I had been to the UMAP conference twice before, and HT is held in close proximity to UMAP with similar program committees, I was wondering what the difference between the two conferences is. After attending the conference, I would say the key difference is that while UMAP focuses more on the context of e-learning, such as user modeling and RecSys in educational systems, HT focuses more on linking data & resources and on social media. Although HT's acceptance rate has varied over a wide range, overall it has a good average citation count according to the ACM DL.




    Day-1:

    Keynote: Peter Mika, SCHIBSTED (formerly at Yahoo)

    It was interesting to see a keynote on the Semantic Web at HT. In this talk, we looked back at the history of the Semantic Web. The speaker discussed what the original aspirations of its creators were and what has been achieved in practice over these two decades, including some achievements, especially in terms of search engines, as well as some parts of the original vision that have not been achieved.


    What happened to the Semantic Web? from Peter Mika

    Most of the presentations today were related to studying problems on social media, such as hate speech:

    • Mainack Mondal, Leandro Augusto de Araújo Silva and Fabrício Benevenuto: A Measurement Study of Hate Speech in Social Media
    • Stringhini and Athena Vakali: Hate is not binary: Studying abusive behavior of #GamerGate on Twitter
    These talks were interesting to me, as I was interested in computational social science when I first started my PhD. For example, the first paper above discussed "how to measure hate speech?", "does anonymity play a role in it?", and how these phenomena differ across countries. The results, based on a Twitter dataset, were interesting: the authors found that there are more anonymous accounts among those posting hate speech compared to a random baseline, i.e., anonymous users post more hate speech.

    Day-2:

    Keynote: "A MEME IS NOT A VIRUS: THE ROLE OF COGNITIVE HEURISTICS IN INFORMATION DIFFUSION" by Kristina Lerman

    Kristina Lerman is a Research Team Lead at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. She talked about position bias in social media: posts are less likely to be seen the lower their position, so as newer tweets arrive, older tweets become less likely to be seen as their positions move down... and the phenomenon is more serious for well-connected users. It is also interesting that well-connected hubs are less likely to retweet older posts; retweet probability decreases with connectivity, i.e., highly connected people are less susceptible to "infection" by a meme, due to their increased cognitive load.

    The presentations on day-2 were diverse, covering linking content, crowdsourcing, storytelling, and more. The following paper, which tackles the problem of understanding task clarity in crowdsourcing platforms (especially CrowdFlower) and how to measure it, won the best paper award at HT2017.

    • Ujwal Gadiraju, Jie Yang and Alessandro Bozzon: Clarity is a Worthwhile Quality - On the Role of Task Clarity in Microtask Crowdsourcing


    Day-3:

    The presentations on day-3 were about location-based social networks, user modeling, ratings/reviews, and visualizations. One of the interesting papers was the following one; I had previously read about the related "happy maps" work done by Daniele Quercia (Bell Labs Cambridge). This paper talked about various elements which might affect people's perceptions of places (such as safety, etc.).

    • David Candeia, Flávio Figueiredo, Nazareno Andrade and Daniele Quercia: Multiple Images of the City: Unveiling Group-Specific Urban Perceptions through a Crowdsourcing Game

    My presentation was about "Leveraging Followee List Memberships for Inferring User Interests for Passive Users on Twitter", which extends our previous work from ECIR2017.



    Leveraging Followee List Memberships for Inferring User Interests for Passive Users on Twitter from GUANGYUAN PIAO


    Overall, the conference had around 70+ participants. What was impressive, however, is that the audience actively asked questions and participated in discussions. In addition, the organizers made the proceedings available before the conference, along with the Conference Navigator developed by Uni. Pittsburgh: http://halley.exp.sis.pitt.edu/cn3/portalindex.php

    Proceedings: http://dl.acm.org/citation.cfm?id=3078714&picked=prox&cfid=782021270&cftoken=32813465

    Next year, HT2018 will be in Baltimore, USA. It is a good conference, and I hope I will have the chance to attend it again in the future.

    Friday, March 17, 2017

    RecSys Related Libraries List

    fastFM (Factorization Machines, Python)
    lodreclib (LODRecSys, SPrank etc., Java)

    Saturday, March 4, 2017

    IBM phone interview for Research Intern

    It was the first time I had done a phone interview, and also the first time I had interviewed for an internship at an industry lab. It took around 30 minutes, and the overall process was smooth.

    The overall process was as below:

    1. An introduction to the research internship (e.g., duration, starting date, main to-do list, etc.).
    2. The other attendees (research scientists on the same team) asked several questions about my research.
      • What is the research question, and how do you approach and evaluate it?
      • What kind of skills (NLP, which is related to the position) have you used?
      • What kind of machine learning techniques are you familiar with?
      • How can your research methodology be generalized to other corpora?
      • What is your future direction for extending your current work?
    3. The last step was for me to ask some questions about the internship.

    They kindly informed me after two days that they had found a better match for the position.


    Sunday, February 26, 2017

    Putty SSH remote server with password in Windows

    putty.exe -ssh [username]@[host] -pw [yourpassword]

    You can also run some commands immediately after logging into the server.

    putty.exe -ssh [username]@[host] -pw [yourpassword] -m [yourfilename]

    You can edit the file [yourfilename] with the commands you need to execute, e.g., a file that contains a command for ending a screen session:

    screen -S [screenname] -X quit

    Tuesday, February 14, 2017

    IT research labs in Ireland

    Nokia Bell Lab, Ireland

    Dublin-based scientists at Bell Labs were instrumental in the development of the award-winning lightRadio® cube. Scientists here focus on algorithms, RF hardware, and system designs for the next generation of small cells. Data analytics, network systems, and cloud computing are also part of the research program in Dublin.

    IBM Research, Ireland

    At IBM Research - Ireland our scientists and engineers are helping clients and partners make better decisions using an array of cognitive IoT technologies and expertise. Together we are testing new technologies on real business problems and discovering new growth opportunities.
    We are focused on Cognitive IoT, Cognitive Integrated Healthcare, Interactive Reasoning, Data Centric Computing, Cloud and Privacy. Our research teams are collaborating with academic and industrial partners on several projects including research programs established by the European Union Horizon 2020 as well as pioneering collaborative projects developed side by side with University College Dublin scientists in our collaboratory.

    Intel Labs Europe Open Lab – Ireland

    The ILE Open Lab - Ireland, located on the Intel Ireland campus in Leixlip, Co. Kildare, is home to two research labs; the IoT (Internet of Things) Systems Research Lab and the Cloud Services Lab.  The Open Lab’s facilities include physical laboratories, offices, meeting and demo spaces as well as innovation/collaboration studio spaces.

    The IoT Systems Research Lab conducts research focused on the Internet of Things with a particular emphasis on distributed edge computing, Machine to Machine (M2M) communications, IoT Applications, and data analytics.  The Lab also manages a Connected Cities research portfolio which includes the Intel Collaborative Research Institute (ICRI) for Sustainable Connected Cities located in London - www.cities.io. The IoT Systems Research Lab applies a ‘Living Lab’ approach for elements of its research by developing open-innovation ecosystems and partnerships to validate research through real world deployments and test beds. The Living Lab concept has been used in a number of cities including in Dublin, London and San Jose, California.

    The Cloud Services Lab’s research agenda is focused on interoperability, dependability and platform/service differentiation for Cloud Computing across compute, storage, and network technologies.  Specific research topics include instrumentation and manageability of Cloud and Network infrastructures, open interface specifications, service level agreements, context awareness, and digital preservation.  The Lab’s research output includes usage models, prototypes, technologies, and open standards which enable dependable cloud environments and support automatic discovery and selection of optimal platform services.  The Lab is also actively advancing management framework processes for technology adoption.


    Accenture, Ireland

    Microsoft, Ireland

    SAP Business Objects (Predictive Analytics)

    Academic Research Centres:

    ADAPT, Insight Centre

    Tuesday, February 7, 2017

    Creating another environment on Anaconda (e.g., python 2)



    Create another environment on Anaconda, e.g., if your default installation is Python 3 and you want to use Python 2:

    - conda create -n python2 python=2.7 anaconda

    which will create a Python 2.7 environment with the name "python2".


    Activate and deactivate the environment (env):
    - source activate python2
    - source deactivate

    You can install packages for the Python 2.* environment while the python2 env is activated.


    Use Spyder with Python 2.7

    You can run the Spyder app with Python 2.7 by activating the python2 env and typing spyder on the command line.

    - source activate python2
    - spyder


    In case Spyder is not automatically installed when creating the env, you should install it first:

    - conda install -n python2 spyder