How can Twitter data help us understand how societies become polarized? How might we extract emotion dynamics from individual time series data? To answer research questions like these, the social and behavioral sciences increasingly rely on large and diverse datasets. This course gives students the practical skills needed to take advantage of these novel data sources through an introduction to data wrangling, data visualization and machine learning using the programming language Python.
|Academic dates:||19 July - 6 August 2020|
|Housing dates:||18 July - 7 August 2020|
|Academic fee:||€1600|
|Credits:||6 ECTS credits|
|Who is this programme for?||
For current university students (Bachelor's and Master's) who want to acquire skills in machine learning methods and have an interest in the social and behavioral sciences. PhD candidates who wish to learn about machine learning are also welcome to apply. Participants should have completed at least one statistics course. See the How to apply page for more information.
|Academic directors:||Javier Garcia-Bernardo & Jonas Haslbeck|
|Early application deadline:||1 February 2020|
|Regular application deadline:||1 April 2020|
This three-week programme gives students a solid introduction to data analysis using Python, a beginner-friendly and widely used programming language. Students learn a new method or skill in a morning lecture and then immediately apply it in an afternoon practical session, so that they gain hands-on knowledge they can use for their own research questions. The practical sessions consist of data analysis problems based on real data from the social and behavioral sciences, such as the European Social Survey, social media data (Twitter and LinkedIn), questionnaire data, and time series collected with the Experience Sampling Method (ESM).
After a short introduction to programming with Python, we first focus on data preparation: cleaning and merging data and handling missing values. Next, students learn how to explore data with descriptive statistics and meaningful data visualizations. The largest part of the course is devoted to learning and applying statistical and machine learning methods. Starting with linear and logistic regression, we gradually introduce more advanced prediction models such as random forests and support vector machines. In addition to prediction models, we cover clustering methods such as k-means and hierarchical clustering, as well as the dimensionality-reduction technique t-SNE. Although some advanced methods are covered, the focus of the course is on a conceptual understanding of each method and on knowing how to apply it in practice.
By the end of the programme, students will be able to use large datasets to answer research questions with machine learning methods. Using Python, they will be able to clean and combine datasets, create meaningful and attractive visualizations, and carry out statistical analyses and draw conclusions from them. Participants will also understand when and how to use machine learning tools in the social and behavioral sciences.
|Credits||6 ECTS, 3 weeks|
|Language of instruction||English|