Detailed course information

Title

Automated web data collection

Dates

26-27 March 2024

Workshop language is English
Organizer(s)

Elisa Volpi, UNIGE

Speaker(s)

Prof. Dominic Nyhuis (Leibniz University Hannover)

Description

The comprehensive digitalization of information presents an enormous opportunity for the social sciences. Contemporary social scientists have access to an abundance of data that enables research on questions that would have been well beyond empirical study a mere two decades ago. What is more, with the right tools even a single researcher can amass enormous datasets at almost no cost. This course aims to give students an overview of these opportunities and equip them with the basic tools to conduct their own data collection projects. To this end, the course covers the main web technologies: communication standards, such as URLs and HTTP; standards for structuring information, such as HTML and XML; languages for querying information, such as XPath, CSS selectors, and regular expressions; and application programming interfaces (APIs). Relying on these tools, students will conduct their own data collection projects. Basic knowledge of the programming language R is expected.
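
As a rough illustration of how the querying tools named above fit together, the sketch below selects elements from a small HTML fragment with an XPath expression and then pulls structured fields out of their text with a regular expression. It is written in Python purely for illustration (the workshop itself works in R) and is not taken from the course materials. The HTML fragment, the names in it, and the `extract_speakers` helper are all hypothetical, and Python's standard library supports only a limited subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <ul id="speakers">
      <li class="speaker">Ada Example (2021)</li>
      <li class="speaker">Ben Sample (2023)</li>
      <li class="note">Schedule subject to change</li>
    </ul>
  </body>
</html>
"""


def extract_speakers(page: str) -> list[tuple[str, int]]:
    """Return (name, year) pairs for every <li class="speaker"> element."""
    root = ET.fromstring(page.strip())
    # XPath (ElementTree subset): any <li> whose class attribute is "speaker".
    nodes = root.findall(".//li[@class='speaker']")
    # Regular expression: a name followed by a four-digit year in parentheses.
    pattern = re.compile(r"^(?P<name>.+?)\s*\((?P<year>\d{4})\)$")
    results = []
    for node in nodes:
        match = pattern.match((node.text or "").strip())
        if match:
            results.append((match.group("name"), int(match.group("year"))))
    return results


speakers = extract_speakers(PAGE)
print(speakers)
```

Real web pages are rarely well-formed XML, so in practice a dedicated HTML parser would replace `ET.fromstring`; the division of labour between XPath (locating nodes) and regular expressions (parsing their text) stays the same.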

 

Programme

Tentative workshop schedule

March 26, morning session (starting at 10:00)

- Introduction
- HTML/XML

March 26, afternoon session

- XPath/CSS selectors
- Regular expressions

March 27, morning session (starting at 10:00)

- HTTP/URLs
- APIs

March 27, afternoon session

- Application
- Wrap-up

Location

UNIBE

Information
Places available

15

Registration deadline: 19 March 2024