Our starting point was the original intuition of the linguists of the project that a change in prosody was responsible for the syntactic change leading from Classical to Modern European Portuguese. Language change is related to language acquisition. Therefore, the question was how to formalize the relationship between prosody and syntax in the acquisition process.
The project is based on a new model of language acquisition in which prosody plays a leading role in the child's selection of a grammar. This model concerns the interface between Internal and External Language, more precisely the Articulatory-Perceptual interface, to use Chomsky's terminology in the Minimalist Program. However, the Principles and Parameters approach to language developed by Chomsky accounts only for linguistic competence and says nothing about performance. As a result, linguistic theory offered no model that could address the type of questions we were interested in.
It turns out that the Thermodynamic Formalism of Statistical Physics suggests a good mathematical model in which these questions can be addressed in a rigorous way. The model is a family of probability measures, the Gibbs states. In this framework, prosody is represented by a thermodynamic potential, and syntax is represented by a set of constraints defining the set of possible productions (cf. Fernández and Galves 2001, Galves and Galves 1995, and Cassandro, Collet, Galves and Galves 1999).
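To make the idea of a Gibbs state concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the model of the cited papers: productions are taken to be short binary sequences, the "syntactic" constraint (no two adjacent 1s) and the "prosodic" potential (each 1 has a fixed cost) are both hypothetical, and the names `allowed`, `potential` and `gibbs_state` are ours.

```python
import itertools
import math


def allowed(seq):
    """Hypothetical 'syntactic' constraint: no two adjacent 1s."""
    return all(not (a == 1 and b == 1) for a, b in zip(seq, seq[1:]))


def potential(seq, weight=1.0):
    """Hypothetical 'prosodic' potential: every 1 in the sequence costs `weight`."""
    return weight * sum(seq)


def gibbs_state(length, weight=1.0):
    """Probability of each allowed production, proportional to exp(-potential)."""
    productions = [
        s for s in itertools.product((0, 1), repeat=length) if allowed(s)
    ]
    weights = [math.exp(-potential(s, weight)) for s in productions]
    z = sum(weights)  # normalizing constant (partition function)
    return {s: w / z for s, w in zip(productions, weights)}


if __name__ == "__main__":
    for production, prob in sorted(gibbs_state(3).items()):
        print(production, round(prob, 3))
```

The constraint decides which productions exist at all, while the potential only reweights the productions that survive; changing `weight` changes the measure without changing its support.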
It is therefore fair to say that the interaction between linguists and mathematicians was a sine qua non for the very existence of the project. The mathematical model of the interface between prosody and syntax produced by this interaction has been the main source of intuitions and the guiding thread of the project.
Traditionally in linguistic theory, researchers either work with the notion of grammar or study linguistic performance. Those in the first group work with the abstract notion of an "ideal native speaker" able to decide categorically whether a given production is allowed by his grammar; they do not use statistics at all, even when they consider historical linguistics. Those in the second group use statistical methods to describe quantitatively the language produced by a given society, and they ignore the notion of grammar. Until recently, the only exception to this general picture was the remarkable work of Tony Kroch and collaborators.
The reason for this dichotomy may lie in the difficulty of bringing grammatical competence and linguistic performance together. The Thermodynamic Formalism, again, provides a suitable framework in which a model for this interface can be worked out.
To give an example, the notion of a cost function can be used to describe how different prosodic patterns affect the choice of utterances in the history of Portuguese. The definition of the cost function takes both syntactic and prosodic constraints into account. The parameters defining the cost function can then be inferred through the statistical analysis of historical data. This establishes the link between the statistical analysis of historical data and the theoretical linguistic discussion of the grammars involved in the change from Classical to Modern European Portuguese.
The work of Tony Kroch and collaborators already argues in favor of linking statistical analysis and linguistic theory. The formalism developed in the project gives an explicit and formal description of how the different components of the language faculty interact and how this interaction can be retrieved from the statistical analysis of historical data. This is the main insight produced by the dialogue between mathematicians, statistical physicists and statisticians on the one hand, and linguists from different areas (phonology, phonetics, syntax, psycholinguistics, historical linguistics) on the other, all working in the project.
The project applies the paradigm of Statistical Physics to Linguistics. It is probably too early to see the significance of our results for other fields. Perhaps the common characteristic of the different activities of the project is that they all address the question of feature (or pattern) identification in the presence of competing evidence. This could be the source of new insights and tools in other fields.
To give an example, consider the maturation model of language acquisition presented in Cassandro, Collet, Galves and Galves 1999. The model is a non-homogeneous Markov chain taking values in the set of all grammars. At each step, the chain either jumps to a new grammar or stays at its current position. The decision is taken under the influence of a new utterance produced by the parental grammar. More precisely, the decision is taken on a probabilistic basis and aims to minimize a suitable cost function associated with the utterance. The difficulty is that different utterances are associated with different and possibly competing cost functions. Linguistically speaking, it is reasonable to look for situations in which the chain always converges in law to a measure supported by a unique grammar. This should take place for almost all choices of utterances, and the limiting grammar should not depend on the specific choice of utterances. Linguistic intuition suggests that certain conditions on the structure of the cost function should hold. It turns out that these conditions are sufficient to ensure convergence to a unique and well-specified grammar. Mutatis mutandis, this type of condition could be useful in other situations in which a decision must be taken in the presence of competing evidence.
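The flavor of such a chain can be conveyed by a toy simulation. The Python sketch below is not the model of Cassandro, Collet, Galves and Galves 1999; it is a simplified illustration in which everything specific is an assumption: two grammars `G1` (parental) and `G2`, three utterance types with a made-up cost table, and an annealing-style jump rule whose inverse temperature `log(n + 1)` grows with the step number `n`, which is what makes the chain non-homogeneous.

```python
import math
import random

GRAMMARS = ("G1", "G2")

# Hypothetical cost of each utterance type under each grammar.
# The parental grammar G1 produces all of its utterances at zero cost.
COST = {
    "G1": {"u1": 0.0, "u2": 0.0, "u3": 0.0},
    "G2": {"u1": 1.0, "u2": 0.5, "u3": 0.0},
}

# Hypothetical frequencies of the utterances produced by the parental grammar.
UTTERANCES = ("u1", "u2", "u3")
FREQUENCIES = (0.6, 0.3, 0.1)


def step(current, n, rng):
    """One transition of the chain at step n (n >= 1)."""
    utterance = rng.choices(UTTERANCES, weights=FREQUENCIES, k=1)[0]
    candidate = rng.choice([g for g in GRAMMARS if g != current])
    delta = COST[candidate][utterance] - COST[current][utterance]
    if delta < 0:
        return candidate  # the candidate is strictly cheaper: jump
    if delta == 0:
        return current    # the utterance is uninformative: stay
    beta = math.log(n + 1)  # inverse temperature, increasing with n
    return candidate if rng.random() < math.exp(-beta * delta) else current


def simulate(steps, start="G2", seed=0):
    """Return the trajectory of grammars visited, including the start."""
    rng = random.Random(seed)
    trajectory = [start]
    for n in range(1, steps + 1):
        trajectory.append(step(trajectory[-1], n, rng))
    return trajectory


if __name__ == "__main__":
    path = simulate(2000)
    late = path[len(path) // 2:]
    print("share of G1 in the second half:", late.count("G1") / len(late))
```

In this toy setting the probability of jumping to a costlier grammar shrinks as `n` grows, so the chain spends more and more of its time in the parental grammar. With a cost table in which some utterances strictly favor `G2`, the same rule would keep oscillating, which illustrates why conditions on the structure of the cost function matter.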