Tuesday, September 20, 2016

Applying for a Schengen Visa in Ireland - German Embassy

Applying for a Schengen visa at the German embassy is quite similar to applying at the Italian and French embassies.

Applying for a Schengen Visa in Ireland - Italian Embassy

Applying for a Schengen Visa in Ireland - French Embassy


The biggest difference is that both the invitation letter and the school letter must be originals with handwritten signatures; photocopies are not accepted.

  • Invitation letter (original signature)
  • School letter (original signature)

The invitation letter from Germany still had not arrived by the time I went for the visa appointment; post from Leipzig, Germany to Galway took 7 days. I printed out the email from the German side confirming that the letter had been sent, showed it to the visa officer, and finally got the visa before leaving. When the original arrived later, I mailed it on to them.

Posted from Galway at 16:30, the letter arrived in Dublin the next morning.

Wibitaxonomy to HDT file

http://wibitaxonomy.org/ is a unified, 3-phase approach to the automatic creation of a bitaxonomy for Wikipedia, that is, an integrated taxonomy of Wikipedia pages and categories.

Download RDF file from http://wibitaxonomy.org/download.jsp

http://www.rdfhdt.org/ provides faster access to the triples. I tried to create an .hdt file from the WiBi .ttl file but found that it was not possible due to memory limits on my personal computer, so I ended up parsing it on a server.
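Before falling back to a server, one generic mitigation for such memory limits is streaming the dump instead of loading it whole. Turtle statements can span multiple lines, but an N-Triples serialization of the same data puts exactly one statement per line, so it can be scanned with constant memory. A minimal Java sketch of this idea (the inline sample stands in for the real multi-gigabyte dump; this is not part of the HDT tooling itself):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class NtStream {
    // Count statements in an N-Triples stream with constant memory:
    // each non-empty, non-comment line holds exactly one triple.
    static long countTriples(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Tiny inline sample instead of a real dump file
        String sample = "<http://a> <http://b> <http://c> .\n"
                      + "# a comment line\n"
                      + "<http://a> <http://b> \"some literal\" .\n";
        System.out.println(countTriples(new StringReader(sample))); // prints 2
    }
}
```

The same pattern (read a line, process, discard) keeps the heap footprint flat no matter how large the input file is.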

Tuesday, September 13, 2016

SEMANTiCS 2016 Travel Report

Day-1: Tutorials & Workshops

I attended the afternoon session of the Knowledge Engineering track, which used PoolParty from the Semantic Web Company. As a researcher working closely with Semantic Technologies, I'm interested in how they are being used in different enterprises and what kinds of problems companies need solved. There were many industrial participants from across Europe, including Springer. Some said they were already using PoolParty, while others attended to better understand how to apply Semantic Technologies in enterprise scenarios; most were curious about integrating heterogeneous data sources, taxonomies, and ontologies.

Day-2: Main conference

Stats: This year's conference received 85 submissions and accepted 18 full papers (21.2%) and 8 short papers.

The first keynote: "Linked data experience at Springer Nature" by Michele Pasin

Dr. Pasin summarized Springer's experience with Linked Data & Semantic Technologies for large-scale enterprise metadata management. He also introduced scigraph.com, an upcoming Linked Data platform: one place for all of their Linked Data efforts toward linked science data.





The second keynote: "The semantics of human network" by Marie Wallace, IBM

Marie from IBM shared their experience of using the human networks generated by their enterprise social network (IBM Connections) for different applications and services. She stressed that capturing human context at a global level, now possible thanks to social networks and an IoT-enabled world, is really important for improving the human digital experience.



These social dashboards show each employee different factors of their personal social status, such as activity and reactions, and can also provide recommendations for improvement in different aspects.

I presented my full paper, "Exploring Dynamics and Semantics of User Interests for User Modeling on Twitter for Link Recommendations", in the Knowledge Discovery session. It was impressive to see the room full, and I had interesting discussions with several audience members. This work also won the best paper award at #semanticsconf.





Day-3: Main conference

The first keynote: "Learning with Memory Embeddings and its Application in the Digitalization of Healthcare" by Volker Tresp from SIEMENS

He talked about mapping a knowledge graph to a tensor representation whose entries are predicted by models using latent representations of generalized entities, and about extending this approach to medical decision processes.







The second keynote: "Enriching Content with User Data and Semantic Information" by Cathy Dolbear from Oxford Press

She talked about combining human-authored semantic information with semantic tags and taxonomy classifications automatically extracted from their content. She also introduced the Oxford Global Languages project, which links lexical information from multiple global and digitally under-represented languages, such as isiZulu and Urdu, in a triple store.









It was a wonderful event where you can meet industry people tackling real-world problems with Semantic Technologies, as well as academic researchers. I hope to attend the conference again in the future :)


Thursday, July 14, 2016

UMAP2016 Travel Report

This week, I attended the 24th Conference on User Modeling, Adaptation and Personalization (UMAP 2016). This year it was held in conjunction with the Hypertext conference, sharing some sessions (e.g., the Doctoral Consortium and keynote speakers). Overall, there were around 130 participants. The conference received 123 submissions, with a 28% acceptance rate.




A major change this year was the presentation format. Unlike previous years, talks were 13 minutes for long papers and 8 minutes for short papers, followed by a poster session to draw larger audiences and more discussion.
  • Keynote Speakers:

The first keynote speaker was Hossein Derakhshan: Killing the Hyperlink, Killing the Web: the Shift from Library-Internet to Television-Internet.


The speaker is an Iranian-Canadian blogger who was imprisoned in Tehran from November 2008 to November 2014. He is credited with starting the blogging revolution in Iran and is called the father of Persian blogging by many journalists.



Some impressive phrases during the speech:

- Many internet users in Brazil and India think Facebook is the Internet
- With 150 "likes", Facebook can know you better than your parents do; with 300 likes, the service can know you better than your spouse

The second speaker was Lada Adamic, who leads the Product Science group within Facebook's Data Science Team.

The speaker described three large-scale analyses of re-share cascades on Facebook, which were performed in aggregate using de-identified data.








Summary of the talk:



- Cascades grow

- Cascades recur

- Cascades evolve


The third speaker, Sandra Carberry, one of the founders of the User Modeling research area at the first workshop in Maria Laach in 1986, gave a talk on "User Modeling: the Past, the Present and the Future".

--------------------------------------------------------------------------------------------------------------------------

I was there to present a short paper, a doctoral consortium paper and an extended abstract.


  • Short Paper






  • Doctoral Consortium 

In the Doctoral Consortium, each student was assigned an expert in their topic. Tsvika Kuflik, who is on the editorial board of UMUAI, was my mentor during the conference and offered much constructive feedback on my thesis.

  •  Extended Abstract
This preliminary work describes a first step in user modeling with LinkedIn, investigating which fields of a LinkedIn profile can be helpful for user modeling in the context of MOOC recommendations.



Many in the audience asked about data collection. We used Google Custom Search Engine to search the LinkedIn website with a specific keyword such as "coursera" to filter out LinkedIn profiles containing Coursera courses. For details about the dataset, see the post here.

-----------------------------------------------------------------------------------------------------------------------

Impressively, the proceedings of UMAP 2016 were already available during the conference.




Thursday, June 9, 2016

MySQL - innodb_buffer_pool_size

As my tables grew huge they became a bottleneck in my experiments, and after some searching the culprit appeared to be innodb_buffer_pool_size. Queries were still far too slow even though I had built indexes; it turns out innodb_buffer_pool_size controls how much table and index data InnoDB caches. Since the server has plenty of memory, the larger you set it, the more InnoDB behaves like an in-memory database.

From the MySQL documentation: "The size in bytes of the buffer pool, the memory area where InnoDB caches table and index data. The default value is 128MB."
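As a sketch, on a dedicated MySQL server the pool is typically set to a large fraction of RAM in my.cnf; the 8G below is purely an illustrative value, not a recommendation for any particular machine:

```ini
[mysqld]
# Cache for InnoDB table and index data; the default is only 128MB
innodb_buffer_pool_size = 8G
```

You can check the current value with `SHOW VARIABLES LIKE 'innodb_buffer_pool_size';` (reported in bytes).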

Wednesday, June 8, 2016

MySQL - Index Cardinality

Even after creating an index, queries were still too slow; digging further, the cause was cardinality. Cardinality plays a very important role in how effective an index is.

For example, suppose a Gender column holds the values male and female, split 50/50. Its cardinality is 2, which means a query with WHERE gender = "male" scans 50% of the rows instead of 100%; in other words, an enormous number of rows still has to be scanned.

I had built an index on a varchar column using a 255-byte prefix. As rows with longer values arrived later, the cardinality of that index dropped to a tiny number, and queries on those rows slowed down enormously.
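This prefix-index pitfall can be checked directly, since MySQL reports an estimated cardinality per index. A hedged SQL sketch (the table and column names are made up for illustration):

```sql
-- Inspect the estimated number of distinct values per index
SHOW INDEX FROM users;

-- A 255-byte prefix index: if most values share their first 255 bytes,
-- its cardinality collapses and the index stops narrowing scans
ALTER TABLE users ADD INDEX idx_url (url(255));

-- Refresh statistics after bulk loads so the cardinality estimates are current
ANALYZE TABLE users;
```

If SHOW INDEX reports a cardinality that is tiny relative to the row count, the index is doing little to narrow the scan.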


Source: https://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/

Wednesday, June 1, 2016

Applying for a Canadian Visa in Ireland

This was my first time applying for a Canadian visa, and I didn't expect that after applying online I would still have to mail the originals to London for processing. Unlike the Schengen visa, Canada has an online application, so I applied directly at the site below.

http://www.cic.gc.ca/english/index.asp

As for documents: after registering, the website presents multiple-choice questions based on the visa type you are applying for, and once you have answered them it lists the documents you need to submit for your situation. Scan and upload everything, checking the instructions before uploading.


Fee: 100 CAD

I had assumed I would then take my passport to Dublin to collect the visa, but later an email said the passport had to be sent to the nearest Visa Application Centre (VAC). The link in the email shows which VAC serves your country; from Ireland it has to be mailed to the VAC in London.

The VAC charges a separate passport transmission fee, and with the return postage for the passport I paid about €121 extra:
- Passport transmission: about £15
- Postage to London: €7
- VAC return postage for the passport: about £60
- Bank transfer in pounds: €15

After wiring the money, send them the receipt as proof of payment; once they confirm it, they will give you a tracking number, which you can then check on their website: http://www.vfsglobal.ca/canada/UnitedKingdom/track_your_application.html

Once the original passport reached the VAC, things moved quickly; the status was updated within 2-3 days. You can log in at http://www.cic.gc.ca/english/index.asp, where you applied, and it will show that the application has been approved.

Notes:
The CIC website usually shows how many business days your visa type takes. If several extra days pass with no update, you can ask through [Contact us] -> [IRCC Web form] on the CIC site.

Friday, May 27, 2016

Spring JDBC with DBCP to get rid of "no route to host" and "last packet was sent 1 ms ago" exceptions

When using Spring JDBC with a non-pooled connection, a new connection is created for each call, and you can end up running out of local ports, since each closed connection is only released after about 60 seconds (until then it sits in TIME_WAIT; check with: netstat -nat | grep TIME_WAIT | wc -l).

To get rid of this, I switched from the plain Spring data source to the pooled DBCP BasicDataSource.

Download: the DBCP jar from https://commons.apache.org/

<bean id="dataSource" destroy-method="close"
  class="org.apache.commons.dbcp2.BasicDataSource">
      <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
      <property name="url" value="jdbc:mysql://localhost:3306/RecSys2016"/> 
      <property name="connectionProperties" value="useUnicode=yes;characterEncoding=utf8;"/>
      <property name="username" value="root"/>
      <property name="password" value="root"/>

</bean>


Monday, May 23, 2016

Java: Date from unix timestamp

Multiply by 1000, since Java expects milliseconds:
java.util.Date time=new java.util.Date((long)timeStamp*1000);
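On Java 8 and later, the same conversion is cleaner with java.time, which makes the seconds-vs-milliseconds distinction explicit (the timestamp value below is just an example):

```java
import java.time.Instant;
import java.util.Date;

public class UnixTime {
    public static void main(String[] args) {
        long timeStamp = 1463961600L; // seconds since the epoch

        // java.time takes epoch seconds directly, no manual *1000 needed
        Instant instant = Instant.ofEpochSecond(timeStamp);
        System.out.println(instant); // 2016-05-23T00:00:00Z

        // Bridge to the legacy java.util.Date when an old API requires it
        Date legacy = Date.from(instant);
        System.out.println(legacy.getTime()); // 1463961600000
    }
}
```

Instant also avoids the overflow risk of `timeStamp*1000` when the timestamp variable is an int: the multiplication never happens in int arithmetic.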