Quantitative Analysis of Linguistic Data

In July 2014, ZüKL and VariaForMea are jointly organising the doctoral seminar "Quantitative Analysis of Linguistic Data" (QUALD). Four blocks will offer an inspiring, application-oriented introduction to collecting and managing quantitative data, analysing it with statistical methods, and interpreting the results.

The seminar is open to all Swiss PhD students. Members of the PhD Programme Linguistics at UZH are admitted free of charge as long as places are available. External students must submit an application (see below). Students affiliated with VariaForMea (universities of Freiburg, Geneva, Lugano, Neuchâtel, Zurich) can have their travel and accommodation expenses reimbursed by VariaForMea (see the VariaForMea website for details). The seminar is not open to students from outside Switzerland.

General information

Time and place

22 - 25 July 2014, University of Zurich, room KO2-F-172


The seminar consists of four interrelated blocks that can also be attended separately. Each block spans two half-days.


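The titles of the blocks are (teachers in brackets):

  • Practical introduction to statistics (Gerold Schneider)
  • Database design (Steven Moran)
  • From theory to data and back (Tanja Samardžić)
  • Data transformation (Taras Zakharko)

More detailed descriptions are given below.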


Students who have participated in a block and successfully completed its exercises are granted 1 ECTS point per block.


If you want to participate, please send us the following information:

  • Full name
  • Affiliation (university, department, PhD programme if applicable)
  • Blocks you would like to participate in
  • ECTS points needed? (yes/no)
  • Any topics within the blocks you would be especially interested in

If you are not a member of the PhD Programme Linguistics of UZH, please also attach your CV and a cover letter.

The information and documents should be sent to:

You will be notified of your acceptance by 30 April (VariaForMea) / 15 May (PhD Programme Linguistics and other applicants).

Please note that the number of participants is limited to 20 per block. 10 places each are reserved for members of the PhD Programme Linguistics and for external participants. If fewer than 10 applications have been submitted by 15 April in either of the two groups, applicants on the waiting list of the other group may be assigned one of the reserved places.

Detailed course descriptions

① Practical introduction to statistics

  • Time: 22 + 23 July, 9:00-13:00
  • Teacher: Gerold Schneider
  • Description: Statistical methods have become part and parcel of modern linguistics and are used in fields as diverse as phonetics, typology, historical linguistics and corpus linguistics. They are the primary means of analysing quantitative data and are indispensable for judging whether observed effects are significant. This course offers a short, practical introduction to statistical methods and their use in linguistics. We will first look at measures for describing quantitative data, such as the mean and the standard deviation. Next, some statistical significance tests (t-test, χ²-test) will be introduced together with their theoretical background. Finally, we will look at the concept of statistical models.
  • Requirements: Exercises will be done in each participant's spreadsheet software of choice and in the programming language R. Both should therefore be preinstalled. Common spreadsheet programmes include Excel (Windows, commercial) and OpenOffice (platform-independent, free). R can be downloaded for free from www.r-project.org. No previous programming experience is necessary.
  • Language: German or English (depending on participants)
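To make the descriptive measures and tests named in this block concrete, here is a minimal sketch in Python (the course itself uses a spreadsheet and R). It computes the mean and sample standard deviation, Welch's two-sample t statistic with its approximate degrees of freedom, and Pearson's χ² statistic for a contingency table, with an exact p-value for the one-degree-of-freedom (2×2) case via the complementary error function. The function names and all data are hypothetical illustrations, not course material.

```python
import math
from statistics import mean, stdev  # stdev = sample standard deviation (n - 1)

def welch_t(a, b):
    """Welch's two-sample t statistic and its Welch-Satterthwaite degrees of freedom."""
    se2_a = stdev(a) ** 2 / len(a)  # squared standard error of each group mean
    se2_b = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

def chi_square(table):
    """Pearson's chi-square statistic (no continuity correction) and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_df1(stat):
    """Upper-tail p-value for 1 degree of freedom: P(chi2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical data: vowel durations (ms) for two speaker groups
group_a = [110, 120, 130]
group_b = [90, 100, 110]
t, t_df = welch_t(group_a, group_b)

# Hypothetical 2x2 table: rows = two corpora, columns = construction X vs. construction Y
table = [[30, 10],
         [20, 20]]
chi2, chi2_df = chi_square(table)
p = chi_square_p_df1(chi2)

print(f"mean A = {mean(group_a)}, sd A = {stdev(group_a):.1f}")
print(f"t = {t:.3f}, df = {t_df:.1f}")
print(f"chi2 = {chi2:.3f}, df = {chi2_df}, p = {p:.4f}")
```

The closed-form p-value only covers the 1-degree-of-freedom case; larger tables, and turning t into a p-value, need the corresponding distribution functions (for example R's `pt()` or SciPy's `scipy.stats.chi2.sf`). Note also that R's `t.test()` uses the Welch variant by default, while `chisq.test()` applies Yates' continuity correction to 2×2 tables by default, so its χ² value will be smaller than the uncorrected statistic computed here.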

② Database design

  • Time: 22 + 23 July, 14:30-18:30
  • Teacher: Steven Moran
  • Description: Databases are structures for storing data that are designed to make entering, searching, analysing and updating data as easy and efficient as possible. They are becoming increasingly important in linguistics as datasets grow larger and the relations between data become more complex. This course starts with an introduction to the basics of database theory and then focuses on practical aspects: Which questions have to be answered before creating a database? Which database management systems (DBMS) exist, and for which applications are they suitable? Which techniques are important for collecting data efficiently? How can a DBMS help with analysing data, and which interfaces to other analysis tools exist?
  • Requirements: The following programmes must be preinstalled:
    • any spreadsheet software (e.g. MS Excel, OpenOffice Calc)
    • MAMP (Mac) or XAMPP (Windows)
    • Navicat for MySQL (LOCALHOST copy, may require MAMP/XAMPP). A free trial version is available.
    Additional programmes that will be used in examples (installation optional):
  • Language: English
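The course works with MySQL via MAMP/XAMPP and Navicat. As a self-contained stand-in, the sketch below uses SQLite through Python's built-in `sqlite3` module to illustrate two of the design questions raised above: splitting data into related tables (languages and their coded feature values) and letting the DBMS enforce constraints instead of checking them by hand. The table and column names and the feature codings are illustrative assumptions, not a schema from the course; the SQL is close to what MySQL accepts, but details such as `PRAGMA foreign_keys` are SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on

# Hypothetical schema: one table of languages, one table of feature codings per language
conn.executescript("""
CREATE TABLE language (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    family TEXT
);
CREATE TABLE feature_value (
    language_id INTEGER NOT NULL REFERENCES language(id),
    feature     TEXT NOT NULL,
    value       TEXT NOT NULL,
    PRIMARY KEY (language_id, feature)  -- at most one coding per language and feature
);
""")

languages = [(1, "English", "Indo-European"), (2, "Japanese", "Japonic"),
             (3, "Turkish", "Turkic"), (4, "Welsh", "Indo-European")]
codings = [(1, "word_order", "SVO"), (2, "word_order", "SOV"),
           (3, "word_order", "SOV"), (4, "word_order", "VSO")]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO language VALUES (?, ?, ?)", languages)
    conn.executemany("INSERT INTO feature_value VALUES (?, ?, ?)", codings)

# Analysis inside the DBMS: how many languages have each word order?
order_counts = conn.execute("""
    SELECT value, COUNT(*) AS n
    FROM feature_value
    WHERE feature = 'word_order'
    GROUP BY value
    ORDER BY n DESC, value
""").fetchall()

# Combining the two tables with a join, filtered by a query parameter
indo_european = conn.execute("""
    SELECT l.name, fv.value
    FROM language AS l
    JOIN feature_value AS fv ON fv.language_id = l.id
    WHERE fv.feature = 'word_order' AND l.family = ?
    ORDER BY l.name
""", ("Indo-European",)).fetchall()

def insert_rejected(sql):
    """Return True if the DBMS refuses the insert because it violates a constraint."""
    try:
        with conn:
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A second coding for the same language/feature, and a coding for a non-existent language
duplicate_rejected = insert_rejected("INSERT INTO feature_value VALUES (1, 'word_order', 'SOV')")
unknown_language_rejected = insert_rejected("INSERT INTO feature_value VALUES (99, 'word_order', 'SOV')")

print(order_counts)
print(indo_european)
print(duplicate_rejected, unknown_language_rejected)
```

Keeping codings in a separate table means new features can be added without changing the schema, and the primary and foreign keys catch double entries and typos in language IDs at data-entry time.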

③ From theory to data and back

  • Time: 24 + 25 July, 9:00-13:00
  • Teacher: Tanja Samardžić
  • Description: Using quantitative data in linguistics has many advantages: they can be analysed with established statistical techniques, they are easier to process objectively than qualitative data, and they are easier to relate to other data. However, any type of data can only be used scientifically once it is given an interpretation within a theoretical context. This course therefore focuses on how quantitative data can be linked to linguistic theories and concrete problems. Based on two worked examples, we will see how to formulate plausible linguistic hypotheses that can be tested in a quantitative framework. In particular, we will discuss the following questions: What kind of data do I need to prove or corroborate my theory? How do I look at my data? How do I draw sound conclusions from my dataset?
  • Requirements: Participants should download the preparation sheet, fill it in and bring it to the course. The sheet also contains recommended literature with links. Two texts are not freely available elsewhere online but can be downloaded from this page: Croft and Poole (2008) and Samardžić (2014).
  • Language: English

④ Data transformation

  • Time: 24 + 25 July, 14:30-18:30
  • Teacher: Taras Zakharko
  • Description: Statistical analysis usually requires data to be available in a particular format. However, real-world data may look quite different: a text corpus, a typological database, a collection of questionnaire answers. Before such data can be analysed, they must first be aggregated, reshaped and restructured into a suitable form, a procedure commonly known as data transformation. This course is a practical introduction to data transformation using R, a programming environment for statistical computing and data mining. Being highly flexible and easy to learn and use, R is becoming increasingly popular among linguists. Besides this, we will also touch on data visualisation.
  • Requirements: R (downloadable from www.r-project.org) should be preinstalled on the participants' computers.
  • Language: German or English (depending on participants)
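The aggregation and reshaping steps described in this block can be sketched in a few lines. The course uses R, where functions such as `table()`, `aggregate()` and `reshape()` cover similar ground; the sketch below uses plain Python instead. It counts token frequencies in a text, converts hypothetical questionnaire answers from "long" format (one row per speaker and item) to "wide" format (one row per speaker), and tallies the answers per item. All data and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

def token_frequencies(text):
    """Lower-case the text, split it into word tokens and count them."""
    return Counter(re.findall(r"\w+", text.lower()))

def long_to_wide(rows, missing=None):
    """Reshape (speaker, item, answer) rows into one answer list per speaker.

    Returns the sorted item names (the column order) and a dict mapping each
    speaker to its answers in that order; unanswered items get `missing`.
    """
    items = sorted({item for _, item, _ in rows})
    answers_by_speaker = defaultdict(dict)
    for speaker, item, answer in rows:
        answers_by_speaker[speaker][item] = answer
    wide = {
        speaker: [answers.get(item, missing) for item in items]
        for speaker, answers in sorted(answers_by_speaker.items())
    }
    return items, wide

def count_answers(rows):
    """Aggregate: how often each answer was given for each item."""
    counts = defaultdict(Counter)
    for _, item, answer in rows:
        counts[item][answer] += 1
    return {item: dict(counter) for item, counter in sorted(counts.items())}

# Hypothetical input: a tiny "corpus" and questionnaire answers in long format
freq = token_frequencies("The cat saw the dog. The dog ran.")

answers_long = [
    ("s1", "item1", "yes"),
    ("s1", "item2", "no"),
    ("s2", "item1", "no"),
    ("s2", "item2", "no"),
    ("s3", "item1", "yes"),  # speaker s3 skipped item2
]
items, answers_wide = long_to_wide(answers_long)
per_item = count_answers(answers_long)

print(freq.most_common(2))
print(items, answers_wide)
print(per_item)
```

Filling gaps with an explicit `missing` value (here `None`, comparable to `NA` in R) keeps every speaker's row the same length, which is what most statistical functions expect from wide-format data.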