OpenSoNaR is an online corpus retrieval system for searching and analyzing the SoNaR and CGN corpora.

The current application contains two corpora:

- The SoNaR corpus contains more than 500 million words of text from various domains and genres. All texts were tokenised, POS tagged and lemmatised. The named entities were also labelled. All annotations of SoNaR were produced automatically. The corpus data are available for researchers, cf. More information about the corpus can be found in the corpus documentation at

- The Corpus of Spoken Dutch (Corpus Gesproken Nederlands, CGN) is a collection of 900 hours (almost 9 million words) of contemporary Dutch speech, originating from Flemish and Dutch speakers. The speech fragments (spontaneous and prepared) are aligned with various transcriptions (including orthographic and phonetic) and annotations (lemma, POS tags). All annotations have been verified manually, except for the phonetic transcription, of which only 11.3% was verified. The corpus data are available for researchers, cf

Because of copyright restrictions, this application is accessible only with a username and password. Employees of universities or research institutes can log in with the user ID and password of their own organization. Click on the login button, select your organization from the list, and log in with your academic account.

If your organization is not in the list, or if you do not have an account at an academic institution, you can apply for an account at CLARIN.EU.