International research planning meeting at Columbia University, New York



The main purpose of this research planning meeting is to build on established collaborative relationships between CBS, Copenhagen University, and the Institute for Data Sciences and Engineering at Columbia University, as well as with US researchers from other institutions, including UC Santa Cruz and Cornell University. The collaboration lies in the area of Big Data and Language Technology; in particular, the group will focus on Topic Classification and Sentiment Analysis.

We have established a strong relationship with the media monitoring company Infomedia, which holds over 50 million Danish newspaper articles, many of them classified by topic and sentiment. In this research we seek to develop automatic approaches that are as good as or better than the manual processes Infomedia currently employs. This is an unusual opportunity: it involves a massive collection of relevant text data, and it is especially interesting because the data is in Danish, a language that generally lacks large data collections. The meetings will also identify similarly important and challenging English datasets.

The two primary technologies are Topic Classification and Sentiment Analysis, both of which rely on Machine Learning techniques such as Naive Bayes, Support Vector Machines, and Deep Learning. The participants in this research group include some of the world’s leading experts in applying these techniques to Language Technology, including Owen Rambow at Columbia and Claire Cardie at Cornell University.

The primary goal of this meeting is to produce concrete initial results in Topic Classification and Sentiment Analysis on extremely large and important datasets. Such results would open new frontiers in Language Technology and are especially important in the Danish context.


