Corpus archive

ZüKL's main goal is to connect the linguistics departments of UZH. As part of this, we try to improve the visibility of resources (especially corpora) that are available or under construction at the different departments. Against this background, ZüKL offers to archive completed corpora and make them accessible to the public.

To archive a corpus on the ZüKL server, you must make sure that it fulfills the following requirements:

Format and documentation

  • Data must be provided as plain text (e.g. verticalised, CSV/tab-delimited) or XML (with a DTD or XML schema), preferably in a standard format such as TEI, XCES, PAULA, EXMARaLDA, or the Linguistic Annotation Framework (LAF, ISO 24612).
  • Documentation of the format used, the annotation schema, and the annotation guidelines is required.
  • Binary files that can only be opened with dedicated software will not be accepted.
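Before submitting, it may help to check files against the format requirements above. The following is a minimal sketch, not an official ZüKL check: the function name classify_corpus_file is hypothetical, and it assumes that a NUL byte or invalid UTF-8 marks a file as binary. It only tests whether XML is well-formed; it does not validate against a DTD or XML schema.

```python
import xml.etree.ElementTree as ET


def classify_corpus_file(data: bytes) -> str:
    """Return 'xml', 'text', or 'binary' for the raw bytes of one file."""
    # Assumption: NUL bytes or undecodable UTF-8 indicate a binary file.
    if b"\x00" in data:
        return "binary"
    try:
        # 'utf-8-sig' accepts UTF-8 with or without a byte order mark.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return "binary"
    # Content that starts with a tag is XML only if it is well-formed;
    # anything else (e.g. verticalised or tab-delimited data) is plain text.
    if text.lstrip().startswith("<"):
        try:
            ET.fromstring(data)
            return "xml"
        except ET.ParseError:
            return "text"
    return "text"
```

To check a file on disk, read it in binary mode and pass the bytes to the function, e.g. classify_corpus_file(open(path, "rb").read()), where path is the file to be checked.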

Publication

ZüKL only archives corpora that are free to use. The minimal requirement is that UZH researchers get access to the corpus, but we prefer open access to the public for non-commercial use (provided that copyrights, licences and protection of privacy are respected). This requires that a licence is chosen for every corpus.

Information on licences:

Every corpus needs an official citation form. In addition, a contact person and a responsible UZH department must be named for every corpus.

Publication options

If your corpus meets the requirements, you may choose between the following options for its publication.

Download vs web access

You can choose whether your corpus should be downloadable directly from the ZüKL website or be accessible via an online access tool such as CQPweb or ANNIS. Online access tools have the advantage of offering easy and platform-independent access.

Direct vs controlled access

In either case you may further decide whether people should access the corpus directly or whether they should send a request through a contact form first. The recipient may either be the ZüKL technical coordinator or an administrator from your own team.

The technical coordinator usually grants access to corpora automatically after receiving a request. The registration process still gives you an overview of all access permissions granted. If you want more control, including the option to deny requests, we recommend naming a contact person from your own team. In that case, make sure that the e-mail address in question is checked regularly. After you accept a request, you must inform the technical coordinator so that they can create a new account.

Any questions?

If you would like to archive your corpus on the ZüKL server, please contact the ZüKL technical coordinator.

Also note our other services.