Detailed course information


Automated web data collection


26-27 March 2024

Workshop language: English

Elisa Volpi, UNIGE


Prof. Dominic Nyhuis (Leibniz University Hannover)



The comprehensive digitalization of information constitutes an enormous opportunity for the social sciences. Contemporary social scientists have access to an abundance of data that enables research on questions that would have been well beyond empirical study a mere two decades ago. What is more, even individual researchers can amass enormous datasets with the right tools at almost no cost. This course aims to give students an overview of these opportunities and equip them with the basic tools to conduct their own data collection projects. To this end, the course will cover the main web technologies: communication standards, such as URL and HTTP; standards for structuring information, such as HTML/XML; languages for querying information, such as XPath, CSS selectors, and regular expressions; and application programming interfaces (APIs). Relying on these tools, students will conduct their own data collection projects. Basic knowledge of the programming language R is expected.



Tentative workshop schedule


March 26, Morning session (starting at 10h)


- Introduction

- HTML/XML


March 26, Afternoon session


- XPath/CSS selectors

- Regular expressions


March 27, Morning session (starting at 10h)


- HTTP/URLs

- APIs


March 27, Afternoon session


- Application

- Wrap-up





Registration deadline: 19.03.2024