Rhythmic Patterns, Parameter Setting & Language Change

July, 2001

Syntax and the "Tycho Brahe" Corpus

The Tycho Brahe Corpus is part of the project Rhythmic Patterns, Parameter Setting and Language Change, whose primary goal is to model the relationship between prosody and syntax in the process of language change that led from Classical Portuguese to Modern European Portuguese. The corpus consists of texts written by Portuguese authors born between 1550 and 1850. This electronic corpus, developed along the lines of the Penn-Helsinki Parsed Corpus of Middle English, is hosted on the network of the Institute of Mathematics and Statistics of the University of São Paulo and is available to scholars for educational and research purposes. The methodology followed in building the corpus is presented at http://www.ime.usp.br/~tycho/corpus/manual/.

The basic hypotheses of the project are the following:

  1. the syntactic change that occurred in Portuguese at the beginning of the 19th century was driven by an earlier prosodic change, which took place during the 18th century and affected the rhythmic pattern of the spoken language;
  2. for the purposes of this research, the prosody of Classical Portuguese is identical to the prosody of Brazilian Portuguese;
  3. written texts reflect their authors' rhythmic patterns, through lexical and syntactic choices driven by prosodic considerations that are not affected by the norm.

This project adopts the Principles and Parameters approach to syntactic theory developed by N. Chomsky and collaborators. The relation between syntax and prosody at the interface between grammar and the Articulatory-Perceptual performance system is modeled by the Thermodynamic Formalism. The statistical analysis of historical texts is based on the theoretical tools developed by A. Kroch and collaborators. Besides the regression models used in the statistical analysis of historical texts, the analysis of the phonetic data requires statistical modeling and inference of the underlying stochastic processes. The organization of the Tycho Brahe Corpus follows the steps taken for the Penn-Helsinki Parsed Corpus of Middle English. In particular, automatic morphological and syntactic parsers are being developed for Portuguese.

This is a multi-disciplinary project, involving several scientific domains. Consequently, the team of researchers working on this project includes syntacticians, phonologists, phoneticians, specialists in the history of Portuguese, statistical physicists, probabilists, statisticians and computer scientists.

To carry out this research, we are providing a detailed account of the changes in clitic placement in texts written by Portuguese authors born between 1550 and 1850, describing the grammars in use. The first results of this research are available in "First results from the Tycho Brahe Corpus".

This project also aims to provide an account of how the rhythmic patterns of Portuguese evolved over time, as detected in historical written texts. This will make it possible to test the hypothesis that the syntactic change from Classical to Modern European Portuguese was the result of an earlier prosodic change, and to date both changes. Furthermore, apart from providing a better understanding of the linguistic phenomena, the use of mathematical formalism in modeling them may also lead to new results in stochastic processes and statistics.
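
As a rough illustration of what dating a change can mean in practice, the sketch below fits a logistic curve to the proportion of an innovative variant per author, ordered by year of birth, and reports the year at which the fitted proportion crosses 50%. This is only a minimal sketch under stated assumptions: the logistic form is a common choice in quantitative work on syntactic change, not necessarily the model used in this project; the function name fit_logistic is hypothetical; and the counts are synthetic numbers invented for the example, not corpus data.

```python
import math


def fit_logistic(years, new_counts, totals, iterations=50):
    """Fit logit(p) = a + b * x by Newton's method on binomial counts.

    x is the year of birth, centred on the mean year and expressed in
    centuries to keep the numbers well scaled.
    Returns (slope per year, year at which the fitted proportion is 0.5).
    """
    mean_year = sum(years) / len(years)
    xs = [(y - mean_year) / 100.0 for y in years]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, k, n in zip(xs, new_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n * p * (1.0 - p)   # binomial variance weight
            r = k - n * p           # observed minus expected count
            g_a += r
            g_b += r * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Solve the 2x2 Newton system H * delta = g.
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    slope_per_year = b / 100.0
    midpoint_year = mean_year - 100.0 * a / b
    return slope_per_year, midpoint_year


# SYNTHETIC counts, invented for illustration only (not corpus data):
# for each birth year, how many of the sampled clauses show the new variant.
birth_years = [1550, 1600, 1650, 1700, 1750, 1800, 1850]
new_variant = [2, 5, 12, 40, 98, 160, 190]
clauses = [200, 200, 200, 200, 200, 200, 200]

slope, midpoint = fit_logistic(birth_years, new_variant, clauses)
print(f"slope per year: {slope:.4f}")
print(f"estimated midpoint of the change: {midpoint:.0f}")
```

Fitting the same kind of curve to a prosodic indicator and to a syntactic one, and comparing the two estimated midpoints, would be one simple way to check whether the prosodic change precedes the syntactic one.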


