Thursday, July 14, 2016

UMAP2016 Travel Report

This week, I attended the 24th Conference on User Modeling, Adaptation and Personalization (UMAP 2016). This year, it was held in conjunction with the Hypertext Conference, and the two shared some sessions (e.g., the Doctoral Consortium and the keynote speakers). Around 130 people attended. The conference received 123 submissions, with a 28% acceptance rate.




A major change this year was the presentation format. Unlike previous years, long papers were presented in 13 minutes and short papers in 8 minutes, each followed by a poster session to reach a larger audience and encourage more discussion.
  • Keynote Speakers:

The first keynote speaker was Hossein Derakhshan, who gave a talk titled "Killing the Hyperlink, Killing the Web: the Shift from Library-Internet to Television-Internet".


The speaker is an Iranian-Canadian blogger who was imprisoned in Tehran from November 2008 to November 2014. He is credited with starting the blogging revolution in Iran and is called the father of Persian blogging by many journalists.



Some memorable lines from the talk:

- Many internet users in Brazil and India think Facebook is the Internet
- With 150 "likes", Facebook can know you better than your parents do; with 300 likes, it can know you better than your spouse does

The second speaker was Lada Adamic, who leads the Product Science group within Facebook's Data Science Team.

The speaker described three large-scale analyses of re-share cascades on Facebook, which were performed in aggregate using de-identified data.








Summary of the talk:



- Cascades grow

- Cascades recur

- Cascades evolve


The third speaker was Sandra Carberry, one of the founders of the User Modeling research area at the first workshop in Maria Laach in 1986. She gave a talk on "User Modeling: the Past, the Present and the Future".

--------------------------------------------------------------------------------------------------------------------------

I was there to present a short paper, a doctoral consortium paper and an extended abstract.


  • Short Paper






  • Doctoral Consortium 

In the Doctoral Consortium, each student was assigned a mentor who is an expert in their topic. Tsvika Kuflik, who is on the editorial board of UMUAI, was my mentor during the conference and gave me a lot of constructive feedback on my thesis.

  •  Extended Abstract
This preliminary work describes a first step toward user modeling with LinkedIn profiles: we investigate which profile fields are helpful for user modeling in the context of MOOC recommendations.



Many attendees asked about data collection. We used Google Custom Search Engine to search the LinkedIn website with a specific keyword such as "coursera" to find LinkedIn profiles that list Coursera courses. For details about the dataset, you can check the post here.

-----------------------------------------------------------------------------------------------------------------------

Impressively, the UMAP 2016 proceedings were already available during the conference.




Thursday, June 9, 2016

MySQL - innodb_buffer_pool_size

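As my table grew enormously, it became a bottleneck for my experiments, and after some searching it seems innodb_buffer_pool_size is the culprit. Even after creating an index, queries were still very slow; it turns out that innodb_buffer_pool_size is the setting that controls how much index data is cached. The server has plenty of memory, and apparently the larger you set it, the more the database behaves a bit like an in-memory DB...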

The size in bytes of the buffer pool, the memory area where InnoDB caches table and index data. The default value is 128MB. 
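One detail worth knowing before picking a large value: according to the MySQL 5.7 reference manual, the effective buffer pool size must be equal to or a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances, and a value that is not is rounded up to the next such multiple. The sketch below only reproduces that rounding rule so you can predict the effective size before editing the config; it does not talk to a server, and the class and method names are hypothetical.

```java
// Hypothetical helper reproducing the MySQL 5.7+ rounding rule for innodb_buffer_pool_size:
// the effective size is the smallest multiple of (chunk size * instances)
// that is not less than the requested size.
public class BufferPoolSizing {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Rounds a requested buffer pool size up to a multiple of chunkSize * instances. */
    static long effectiveBufferPoolSize(long requested, long chunkSize, int instances) {
        long unit = chunkSize * instances;
        long units = (requested + unit - 1) / unit; // ceiling division
        return units * unit;
    }

    public static void main(String[] args) {
        // 9G requested with 128M chunks and 16 instances: unit is 2G, so 9G rounds up to 10G
        long size = effectiveBufferPoolSize(9 * GB, 128 * MB, 16);
        System.out.println(size / GB + "G"); // 10G

        // The 128MB default with one 128M chunk is already a valid size
        System.out.println(effectiveBufferPoolSize(128 * MB, 128 * MB, 1) / MB + "M"); // 128M
    }
}
```

The actual values on your server can be checked with SHOW VARIABLES LIKE 'innodb_buffer_pool%'.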

Wednesday, June 8, 2016

MySQL - Index Cardinality

Even after creating an index, the query was still too slow, so I searched a bit more and found that cardinality was the cause. Cardinality plays a very important role in how useful an index is.

For example, suppose a column named Gender holds the values male and female, split 50% male and 50% female. Its cardinality is then 2, which means that searching with where gender = "male" only narrows the scan from 100% of the rows to 50%. In other words, you still have to scan a huge number of rows...

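I had created an index on a varchar column truncated to a 255-byte prefix, but as the values in newer rows grew longer, the cardinality of the index dropped to a tiny number (presumably because many of the longer values shared the same first 255 bytes, so the prefix index could no longer tell them apart), and lookups on those rows became much slower...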
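To make the idea concrete, here is a toy sketch that counts distinct index keys. Assumptions: it models a prefix index by keeping only the first N characters of each value (MySQL measures prefix lengths in characters for nonbinary string columns, so this matches a byte limit only for single-byte data), and the names cardinality and rowsScannedFraction are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of index cardinality: the number of distinct keys stored in an index.
public class CardinalityDemo {

    /** Number of distinct keys when only the first prefixLength characters are indexed. */
    static int cardinality(List<String> values, int prefixLength) {
        Set<String> keys = new HashSet<>();
        for (String v : values) {
            keys.add(v.length() > prefixLength ? v.substring(0, prefixLength) : v);
        }
        return keys.size();
    }

    /** Rough fraction of rows an equality lookup still reads, assuming evenly distributed keys. */
    static double rowsScannedFraction(List<String> values, int prefixLength) {
        return 1.0 / cardinality(values, prefixLength);
    }

    public static void main(String[] args) {
        List<String> gender = Arrays.asList("male", "female", "male", "female");
        System.out.println(cardinality(gender, 255));         // 2
        System.out.println(rowsScannedFraction(gender, 255)); // 0.5

        // Long values that differ only after the indexed prefix collapse into one key
        String commonPrefix = String.join("", Collections.nCopies(255, "x"));
        List<String> longValues = Arrays.asList(commonPrefix + "a", commonPrefix + "b", commonPrefix + "c");
        System.out.println(cardinality(longValues, 255));     // 1
    }
}
```

In MySQL itself, SHOW INDEX FROM your_table reports the (estimated) Cardinality column for each index.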


Source: https://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/

Wednesday, June 1, 2016

Applying for a Canadian Visa in Ireland

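This was my first time applying for a Canadian visa, and I didn't expect that after applying online I would still have to mail the original documents to London. Unlike the Schengen visa, Canada offers an online application, so at first I applied directly on the following website.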

http://www.cic.gc.ca/english/index.asp

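As for the documents: after you register, the website asks multiple-choice questions based on the visa type you are applying for, and once you answer them it lists the documents you need to submit for your situation. Just scan and upload all of them; check the Instructions before uploading.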


Fee: 100 CAD

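I thought I would then take my passport to Dublin to get the visa, but later I received an email saying I had to send my passport to the nearest Visa Application Center (VAC). The link in the email tells you which VAC your country should send to; from Ireland, passports go to the VAC in London.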

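The VAC charges an additional passport transmission fee, and they also have to mail the passport back, so I paid an extra 121 euros in total:
Passport transmission: about 15 GBP
Mailing to London: 7 EUR
VAC mailing the passport back: about 60 GBP
Bank transfer in GBP: 15 EUR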

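After transferring the money, send them the receipt as proof of payment; once they confirm it, they will give you a tracking number. After that, you can check the status directly on the website: http://www.vfsglobal.ca/canada/UnitedKingdom/track_your_application.html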

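Once the original passport arrives at the VAC, processing seems to be fast: my application was updated within 2-3 days. You can log in at http://www.cic.gc.ca/english/index.asp, where you applied, to check; it will show "application has been approved".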

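Notes:
The CIC website usually shows roughly how many working days your visa type takes to process. If your application still hasn't been updated after that, you can contact them via [contact us]->[IRCC Web form] on the CIC website.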

Friday, May 27, 2016

Spring JDBC: using DBCP to get rid of "no route" and "last packet was sent 1ms ago" exceptions

When Spring JDBC gets its connections from a non-pooling datasource, it opens a new connection for each call. Each closed connection keeps its socket in the TIME_WAIT state for about 60 seconds before the port is released, so under a heavy load you eventually run out of local ports. You can count the sockets in TIME_WAIT with: netstat -nat | grep TIME_WAIT | wc -l

To get rid of this, I replaced the Spring datasource with the BasicDataSource from Apache Commons DBCP, which pools and reuses connections.

Download: the DBCP jar from https://commons.apache.org/ (DBCP 2 also needs the Commons Pool 2 jar on the classpath).

<bean id="dataSource" destroy-method="close"
  class="org.apache.commons.dbcp2.BasicDataSource">
      <!-- Pooled datasource: connections are reused instead of opened per call; close() shuts the pool down with the context -->
      <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
      <property name="url" value="jdbc:mysql://localhost:3306/RecSys2016"/>
      <!-- Driver properties in DBCP's "name=value;" format -->
      <property name="connectionProperties" value="useUnicode=yes;characterEncoding=utf8;"/>
      <property name="username" value="root"/>
      <property name="password" value="root"/>
</bean>
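Why pooling helps can be illustrated with a toy object pool. This is a hypothetical sketch (ToyPool, borrow and giveBack are illustrative names), not how Commons DBCP is implemented, and plain Objects stand in for JDBC connections. It only counts how many "connections" get created when each one is returned and reused, which is what keeps sockets from piling up in TIME_WAIT.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy object pool (NOT Commons DBCP): returned objects are handed out again
// instead of a new one being created for every call.
public class ToyPool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final AtomicInteger created = new AtomicInteger();

    public ToyPool(int maxIdle, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxIdle);
        this.factory = factory;
    }

    /** Reuses an idle object if one is available, otherwise creates a new one. */
    public T borrow() {
        T obj = idle.poll();
        if (obj == null) {
            created.incrementAndGet();
            obj = factory.get();
        }
        return obj;
    }

    /** Returns an object to the pool; if the pool is full it is dropped (a real pool would close it). */
    public void giveBack(T obj) {
        idle.offer(obj);
    }

    public int createdCount() {
        return created.get();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a JDBC connection: just an object identity
        ToyPool<Object> pool = new ToyPool<>(8, Object::new);
        for (int i = 0; i < 1000; i++) {
            Object conn = pool.borrow(); // a query would run here
            pool.giveBack(conn);
        }
        System.out.println(pool.createdCount()); // 1
    }
}
```

Without giveBack, every borrow would create a new object, which mirrors the one-connection-per-call behavior described above.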


Monday, May 23, 2016

Java: Date from unix timestamp

Multiply by 1000, since Java expects milliseconds:
java.util.Date time = new java.util.Date((long) timeStamp * 1000);
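The cast to long matters: it happens before the multiplication, so the product is computed in long arithmetic and does not overflow an int. The sketch below wraps the one-liner in a hypothetical helper and, as an alternative on Java 8 or later, uses java.time.Instant, which takes seconds directly.

```java
import java.time.Instant;
import java.util.Date;

// Converting a Unix timestamp (seconds since 1970-01-01T00:00:00Z) to Java date types.
public class UnixTimestamp {

    /** java.util.Date expects milliseconds; widen to long BEFORE multiplying to avoid int overflow. */
    static Date toDate(int timeStamp) {
        return new Date((long) timeStamp * 1000);
    }

    /** Java 8+ alternative: Instant accepts epoch seconds directly. */
    static Instant toInstant(long timeStamp) {
        return Instant.ofEpochSecond(timeStamp);
    }

    public static void main(String[] args) {
        int ts = 1463961600; // 2016-05-23T00:00:00Z
        System.out.println(toDate(ts).getTime());         // 1463961600000
        System.out.println(toInstant(ts));                // 2016-05-23T00:00:00Z
        // Pitfall: without the cast, ts * 1000 is computed in int arithmetic and overflows
        System.out.println(ts * 1000 == 1463961600000L); // false
    }
}
```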

Wednesday, May 4, 2016

DBpedia difference between dbo and dbp predicates/properties

From the DBpedia-discussion mailing list: https://groups.google.com/forum/#!topic/thosch/CN1kBh3auCk

DBpedia distinguishes between information extracted from the Wikipedia dumps without any alignment to the DBpedia ontology (raw extraction) and the mapping-based extraction (based on mappings between Wikipedia infoboxes and the ontology).

Mapping-based Properties

High-quality data extracted from Infoboxes using the mapping-based extraction. The predicates in this dataset are in the /ontology/ namespace.
Note that this data is of much higher quality than the Raw Infobox Properties in the /property/ namespace. For example, there are three different raw Wikipedia infobox properties for the birth date of a person. In the /ontology/ namespace, they are all mapped onto one relation http://dbpedia.org/ontology/birthDate. It is a strong point of DBpedia to unify these relations.

Filtering dbo properties using SPARQL:

SELECT ?s ?o WHERE
{
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o .
  FILTER regex(str(?s), "^http://dbpedia.org/ontology") .
}
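The same namespace test can be done on the client side when you already have predicate URIs in hand, for example from a query result. This is a hypothetical helper (the names DbpediaNamespace, Kind and classify are illustrative); it assumes the usual prefixes dbo = http://dbpedia.org/ontology/ and dbp = http://dbpedia.org/property/, and it performs the same prefix check as the regex FILTER above.

```java
// Hypothetical helper that separates mapping-based (dbo) predicates from raw infobox (dbp) ones
// by checking the namespace prefix of the predicate URI.
public class DbpediaNamespace {

    enum Kind { ONTOLOGY_DBO, RAW_PROPERTY_DBP, OTHER }

    static final String DBO = "http://dbpedia.org/ontology/";
    static final String DBP = "http://dbpedia.org/property/";

    static Kind classify(String predicateUri) {
        if (predicateUri.startsWith(DBO)) return Kind.ONTOLOGY_DBO;
        if (predicateUri.startsWith(DBP)) return Kind.RAW_PROPERTY_DBP;
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("http://dbpedia.org/ontology/birthDate"));      // ONTOLOGY_DBO
        System.out.println(classify("http://dbpedia.org/property/birthDate"));      // RAW_PROPERTY_DBP
        System.out.println(classify("http://www.w3.org/2000/01/rdf-schema#label")); // OTHER
    }
}
```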

Saturday, April 30, 2016

A Great Restaurant near the Central Market in Florence, Italy - osteria pepo

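I went to the Central Market in Florence and planned to have lunch at the food court inside the market...

but there were way too many people, so I started hunting for a good restaurant outside...

At first I tried to go to a restaurant called Trattoria Mario, but the line was terrifying, so I went next door...

osteria pepo, a cozy little restaurant... We ordered lasagna and a green pepper steak,

and, as expected in Italy, everything was delicious...

When we came out after eating, I thought it was a good thing we hadn't gone to Trattoria Mario...
The line had grown even longer in the meantime... If you have time to spare, give it a try...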