You can query the corpus for many kinds of information: the messages written in the chats, part-of-speech annotations, demographic information such as the age of the informant, or statistical information such as the number of messages in a chat.
Please keep in mind that all the fields in the corpus are text fields.
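To illustrate why this matters, here is a minimal Python sketch of how values that look like numbers behave when they are handled as text. The age values are hypothetical, and the sketch only shows general text comparison; how the corpus interface itself compares field values is not specified here.

```python
# Hypothetical values of an age field, held as text rather than as numbers.
ages = ["9", "25", "100"]

# Sorted as text, the values are compared character by character,
# so "100" comes before "25", which comes before "9".
as_text = sorted(ages)

# Converted to integers first, the values are ordered numerically.
as_numbers = sorted(ages, key=int)

print(as_text)
print(as_numbers)
```

In other words, a condition that is natural for numbers (for example a range of ages) may need to be expressed as a pattern over text instead.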
The following three options for querying the corpus are described in more detail in the sub-sections of this document:
Always keep in mind the unit that you are querying. If you query individual tokens, you do not have to consider separators such as spaces, punctuation, or tabs. If, on the other hand, you query whole messages, you have to take these separators into account. Queries over whole messages are also very slow and can lead to timeouts.
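The difference between the two units can be sketched in Python with regular expressions. The message, its tokenization, and the search word are hypothetical, and Python's `re` module stands in for whatever pattern syntax the corpus interface uses; this is an illustration of the principle, not of the corpus's own query language.

```python
import re

# Hypothetical message and a hypothetical tokenization of it.
message = "Show me, how are you?"
tokens = ["Show", "me", ",", "how", "are", "you", "?"]

# Token level: each token is matched on its own, so the pattern
# does not need to mention spaces or punctuation.
token_hits = [t for t in tokens if re.fullmatch(r"how", t)]

# Message level: the same pattern does not match the whole message ...
whole_message = re.fullmatch(r"how", message)

# ... and a plain substring search also finds "how" inside "Show".
substring_hits = [m.start() for m in re.finditer(r"how", message)]

# The pattern has to account for word boundaries explicitly.
boundary_hits = [m.start() for m in re.finditer(r"\bhow\b", message)]

print(token_hits)
print(whole_message)
print(substring_hits)
print(boundary_hits)
```

At the token level the plain pattern is enough, while at the message level the pattern has to describe what may surround the word.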