
3rd Young Researchers' Workshop (Nachwuchsworkshop) of the ZeSt

On 5 May 2017, the 3rd Young Researchers' Workshop of the Zentrum für Statistik will take place at Bielefeld University.

At the workshop, doctoral students from the units involved in the Zentrum für Statistik present their research fields to one another and discuss them. The workshop takes place from 9:30 to 17:00 in room W9-109.

If you are interested, please register by e-mail with Nina Westerheide.

Programme of the Young Researchers' Workshop


Talks by the participating doctoral students (in alphabetical order)

Timo Adam
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Gradient boosting in Markov-switching distributional regression models

We propose a novel class of flexible latent-state time series regression models which we call Markov-switching generalized additive models for location, scale and shape. In contrast to conventional Markov-switching regression models, the presented methodology allows latent state-dependent distribution parameters beyond the mean - including variance, skewness and kurtosis parameters - to be modeled as potentially smooth functions of a given set of explanatory variables. We derive a novel EM algorithm and demonstrate how the recently introduced gradient boosting framework can be exploited to prevent overfitting while simultaneously performing variable selection. The suggested approach is illustrated in a real-data example, where we model the conditional distribution of the daily average price of energy in Spain over time.
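
The likelihood evaluation at the core of such latent-state models can be sketched in a few lines. The following minimal numpy example evaluates a plain 2-state Gaussian hidden Markov model with made-up parameters via the scaled forward algorithm; the GAMLSS extension described in the abstract would replace the fixed means and standard deviations with covariate-dependent functions:

```python
import numpy as np

def hmm_loglik(y, Gamma, delta, means, sds):
    """Log-likelihood of a Gaussian HMM via the scaled forward algorithm."""
    # state-dependent Gaussian densities, one column per state
    dens = (np.exp(-0.5 * ((y[:, None] - means) / sds) ** 2)
            / (sds * np.sqrt(2.0 * np.pi)))
    phi = delta * dens[0]
    c = phi.sum()
    loglik = np.log(c)
    phi /= c
    for t in range(1, len(y)):
        phi = (phi @ Gamma) * dens[t]   # one forward recursion step
        c = phi.sum()
        loglik += np.log(c)             # accumulate log of scaling constants
        phi /= c
    return loglik

# illustrative 2-state model: calm vs. volatile regime
Gamma = np.array([[0.95, 0.05], [0.10, 0.90]])  # transition probabilities
delta = np.array([2 / 3, 1 / 3])                # stationary initial distribution
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 200)                   # toy observations
ll = hmm_loglik(y, Gamma, delta,
                means=np.array([0.0, 0.0]), sds=np.array([1.0, 2.0]))
```

The scaling of `phi` at each step keeps the recursion numerically stable for long series.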


Manuel Batram
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Model selection and model averaging in MACML-estimated MNP models

This paper provides a review of model selection and model averaging methods for multinomial probit models estimated using the MACML approach. The proposed approaches are partitioned into test-based methods (mostly derived from the likelihood ratio paradigm), methods based on information criteria, and model averaging methods. Many of the approaches were first derived for models estimated using maximum likelihood and later adapted to the composite marginal likelihood framework. In this paper, all approaches are applied to the MACML approach for estimation. The investigation lists advantages and disadvantages of the various methods in terms of asymptotic properties as well as computational aspects. We find that likelihood-ratio-type tests and information criteria have a spotty performance when applied to MACML models and instead propose the use of an empirical likelihood test. Furthermore, we show that model averaging is easily adaptable to CML estimation and has promising performance with respect to parameter recovery. Finally, model averaging is applied to a real-world example in order to demonstrate the feasibility of the method in problems of realistic size.
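
As a toy illustration of the model-averaging idea (not of the MACML machinery itself), coefficient estimates from competing models can be combined using smoothed information-criterion weights. All numbers below are invented:

```python
import numpy as np

def akaike_weights(aic):
    """Smoothed-AIC weights: proportional to exp(-0.5 * (AIC_i - AIC_min))."""
    d = np.asarray(aic, dtype=float)
    d -= d.min()
    w = np.exp(-0.5 * d)
    return w / w.sum()

# invented AIC values and coefficient estimates from three candidate models
aic = [100.0, 101.2, 106.5]
betas = np.array([[0.50, 1.10],
                  [0.48, 1.05],
                  [0.61, 0.90]])
w = akaike_weights(aic)
beta_avg = w @ betas            # model-averaged coefficient vector
```

Models with smaller AIC receive exponentially larger weight, so the averaged estimate leans towards the best-supported specifications without discarding the others entirely.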


Rainer Buschmeier
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

On the Specification and Estimation Performance of the CCA Subspace Algorithm in Multiple Frequency I(1) Data

The subspace algorithm in the canonical correlation variant due to Larimore (1983) has recently been shown to be a strongly consistent estimator of the system matrices when the underlying dgp is a VARMA process possessing seasonal unit roots (Bauer and Buschmeier, 2016). Tests for the number of stochastic trends have also been proposed therein. A simulation exercise in that study comes to the conclusion that determining the number of stochastic trends based on the subspace algorithm can be preferable to likelihood ratio testing when the dgp is more general than a finite-order vector autoregression and contains a large number of variables. In this paper, this finding is investigated further. To this end, multivariate dgps with an increasing dimension and an increasing number of unit roots at the seasonal frequencies are used to generate the data. Based on these, the accuracy of the estimates of the cointegrating relationships is also compared between the subspace algorithm and the likelihood-based procedure of Johansen and Schaumburg (1999).
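
The canonical-correlation computation underlying the CCA subspace algorithm can be sketched with plain numpy. The stacking of past and future observations and the weighting choices of the actual algorithm are omitted here; the toy data simply share one common stochastic factor:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between two data matrices
    (observations in rows, variables in columns)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Lx = np.linalg.cholesky(Sxx)              # Sxx = Lx Lx'
    Ly = np.linalg.cholesky(Syy)
    # singular values of Lx^{-1} Sxy Ly^{-T} are the canonical correlations
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))                 # common factor
X = z + 0.3 * rng.normal(size=(500, 3))       # "past" block
Y = z + 0.3 * rng.normal(size=(500, 3))       # "future" block
cc = canonical_correlations(X, Y)             # descending values in [0, 1]
```

One large leading canonical correlation, as here, is the kind of signal the algorithm exploits when estimating the dominant (co)trending directions.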


Hendrik ter Horst
Universität Bielefeld, Cognitive Interaction Technology (CITEC)

Joint Entity Recognition and Linking in Technical Domains Using Undirected Probabilistic Graphical Models

The problems of recognizing mentions of entities in texts and linking them to unique knowledge base identifiers have received considerable attention in recent years. In this paper we present a probabilistic system based on undirected graphical models that jointly addresses both the entity recognition and the linking task. Our framework considers the span of mentions of entities as well as the corresponding knowledge base identifier as random variables and models the joint assignment using a factorized distribution. We show that our approach can be easily applied to different technical domains by merely exchanging the underlying ontology. On the task of recognizing and linking disease names, we show that our approach outperforms the state-of-the-art systems DNorm and TaggerOne, as well as two strong lexicon-based baselines. On the task of recognizing and linking chemical names, our system achieves comparable performance to the state-of-the-art.
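
The joint-assignment idea can be illustrated with a toy factor model: the mention span and the knowledge-base identifier are both random variables, and a product of factor scores is maximised over their joint domain. All spans, identifiers (`KB:0001` is a placeholder) and factor values below are invented, and real inference in such systems is sampling-based rather than exhaustive:

```python
import itertools

# hypothetical candidate mention spans and knowledge-base identifiers
spans = [(0, 2), (0, 1)]
kb_ids = ["KB:0001", "NONE"]

def factor_score(span, kb_id):
    """Toy factorized score: one factor per variable plus one joint factor.
    All factor values are made up for illustration."""
    span_factor = 1.0 if span == (0, 2) else 0.3
    link_factor = 0.8 if kb_id != "NONE" else 0.2
    joint_factor = 0.9 if (span == (0, 2) and kb_id == "KB:0001") else 0.4
    return span_factor * link_factor * joint_factor

# exhaustive maximisation over the tiny joint domain
best = max(itertools.product(spans, kb_ids), key=lambda a: factor_score(*a))
```

The joint factor is what distinguishes this from solving recognition and linking in a pipeline: a span can gain or lose support depending on which identifier it is paired with.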


Denise Kerkhoff
Universität Bielefeld, Fakultät für Psychologie und Sportwissenschaft, Abteilung Psychologie, Arbeitseinheit 06 - Methodenlehre

Influence of Sample Size on Parameter Estimates in Three-Level Random Effects Models

In psychological research, observational units are often nested within superordinate groups. Whenever data is clustered, researchers need to account for the hierarchy in the data by means of multilevel modeling, as differential effects on higher-level observations impact analysis results. Especially in three-level longitudinal models, it is often unclear which overall sample size is necessary and how many units per level need to be measured for reliable parameter estimates and sufficient statistical power for hypothesis testing. This research project aims at developing a guide to deciding on optimal sample sizes for various psychological fields of research where three-level data structures are common. In a first step, typical observational units, sample sizes, dropout rates, and methods for data collection are gathered for different fields of psychological research. Based on this information, samples of different sizes with characteristics typical for each field are then simulated and analyzed using – if possible – the MCMC method, in order to assess the soundness of the estimation results. By analyzing and evaluating various data structures, our goal is to provide comprehensive sample size recommendations for different research questions. Currently, we are collecting and discussing possible data characteristics and software programs for conducting our simulations.
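
The data-generating step of such a simulation can be sketched as follows: three-level data (measurement occasions within persons within groups) are drawn from a random-intercept model. The design dimensions and variance components below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical design: 20 groups, 5 persons per group, 10 occasions each
n_groups, n_persons, n_occ = 20, 5, 10
sd_group, sd_person, sd_resid = 1.0, 0.8, 0.5   # made-up variance components

group_eff = rng.normal(0.0, sd_group, n_groups)
person_eff = rng.normal(0.0, sd_person, (n_groups, n_persons))
y = (group_eff[:, None, None]                    # level-3 random intercepts
     + person_eff[:, :, None]                    # level-2 random intercepts
     + rng.normal(0.0, sd_resid, (n_groups, n_persons, n_occ)))
# population variance: sd_group**2 + sd_person**2 + sd_resid**2 = 1.89
```

In a sample-size study, arrays like `y` would be generated repeatedly for varying `n_groups`, `n_persons` and `n_occ`, refitted, and the stability of the variance-component estimates compared across designs.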


Jan Klostermann
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

A method to analyze brand perceptions using user-generated content

The increasing amount of time that consumers have spent on social networks in recent years lays the foundation for extracting brand-related content to analyze consumers' brand perceptions. This talk presents a versatile approach to harvesting the rich informational value of Instagram's social network. Using the human associative memory model as the theoretical framework, we generate associative brand maps based on the sentiment that users express and the co-occurrence of the tags they use. Tags are a common way of labelling online content, and previous research has shown their convenience in social media mining. A multifaceted empirical study of the approach is presented to illustrate the theoretical implications and managerial usefulness.
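
The tag co-occurrence counting behind such brand maps is straightforward. The sketch below builds weighted edges from a handful of invented tag sets (the sentiment component described in the abstract is omitted):

```python
from collections import Counter
from itertools import combinations

# hypothetical tag sets from five posts mentioning a brand
posts = [
    {"bmw", "car", "speed"},
    {"bmw", "car", "luxury"},
    {"bmw", "luxury", "design"},
    {"bmw", "speed", "car"},
    {"audi", "car"},
]

cooc = Counter()
for tags in posts:
    # count every unordered tag pair within a post once
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1

edges = cooc.most_common()   # edges of the associative map, strongest first
```

The resulting weighted edge list can be fed directly into any graph-layout tool to visualise which associations dominate a brand's image.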


Marius Ötting
Universität Bielefeld, Fakultät für Psychologie und Sportwissenschaft, Abteilung Sportwissenschaft, Arbeitsbereich V - Sport & Fakultät für Wirtschaftswissenschaften

Match-Fixing in the Italian Serie B – An Empirical Approach Using Flexible Regression

Between 2009 and 2015, several match-fixing occurrences were detected in Italian soccer, especially in the second division (Serie B). This resulted in forced relegation and point deductions for various teams, potentially endangering the integrity of this league. Recent literature suggests modeling betting odds using game and team characteristics, in order to identify deviations between actual and fair betting odds. In this work, both betting odds and the total volume of bets placed are modeled, instead of considering only the former. In this regard, the Serie B is a useful case study, since several matches in this league have effectively been proven to be fixed. In order to find matches with a strong indication of match-fixing activities, pre-game data from the betting exchange platform Betfair are analysed. In addition to betting odds and volumes, several explanatory variables are included in the data, such as home and away team, the type of bet and the matchday. Visual inspection of the data shows that betting volumes vary substantially across the 42 matchdays, rendering it difficult to accommodate the matchday effect within a linear model. In order to account for this empirical pattern, the very flexible GAMLSS model framework is introduced, which allows several parameters of the response distribution to be modeled simultaneously and smooth functional effects of noncategorical covariates to be estimated. Finally, outliers are detected via the standardized residuals of the estimated model. These suspicious matches are presented together with a confusion matrix.
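
The final screening step can be sketched independently of the GAMLSS fit: given fitted values from some model of betting volume, matches with extreme standardized residuals are flagged. The data below are entirely simulated, with one inflated volume injected by hand:

```python
import numpy as np

rng = np.random.default_rng(1)
# simulated fitted values and observed betting volumes for 200 matches
fitted = rng.normal(1000.0, 50.0, 200)
observed = fitted + rng.normal(0.0, 50.0, 200)
observed[17] += 400.0                     # inject one suspiciously high volume

resid = observed - fitted
z = (resid - resid.mean()) / resid.std()  # standardized residuals
suspicious = np.where(np.abs(z) > 3)[0]   # flag matches with |z| > 3
```

The threshold of 3 is an arbitrary illustrative choice; in practice it would be calibrated against the residual distribution implied by the fitted model.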


Benjamin Paaßen
Universität Bielefeld, Cognitive Interaction Technology (CITEC)

Transfer Learning for robust prostheses control

Although modern bionic prostheses are mechanically flexible and precise enough to control even individual fingers, controlling many degrees of freedom at once in real time requires a rapid and intuitive user interface. This is offered by myoelectric controllers, which record the residual muscle activity of the amputee and infer the intended motion via machine learning models. Critically, such machine learning models need to be robust with respect to everyday disturbances, such as electrode shift, sweat, fatigue, or posture changes in order to be applicable – but all machine learning models to date fail in this regard. We propose transfer learning as an approach to improve robustness with respect to electrode shift. This framework allows the model to be adjusted to the disturbance without retraining, using as few new training data points as possible.
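
A very simplified version of this transfer idea can be sketched as follows: a nearest-class-mean classifier is trained on undisturbed (synthetic) signals, an artificial linear electrode shift is applied, and a linear back-mapping into the original feature space is estimated from just ten labelled calibration points. Everything below is synthetic, and the actual method presented in the talk may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic 8-channel EMG-like features for two motion classes
n = 100
X = np.vstack([rng.normal(0.0, 1.0, (n, 8)),
               rng.normal(2.0, 1.0, (n, 8))])
y = np.repeat([0, 1], n)

# nearest-class-mean classifier trained on undisturbed signals
means = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
predict = lambda Z: ((Z[:, None, :] - means) ** 2).sum(-1).argmin(1)

# electrode shift modelled as an unknown linear distortion of the channels
shift = np.eye(8) + 0.3 * rng.normal(size=(8, 8))
X_shifted = X @ shift.T

# transfer step: estimate a linear back-mapping from only 10 labelled points
idx = rng.choice(len(X), 10, replace=False)
W, *_ = np.linalg.lstsq(X_shifted[idx], X[idx], rcond=None)
acc = (predict(X_shifted @ W) == y).mean()
```

The point is that only the cheap mapping `W` is re-estimated from a few calibration samples, while the original classifier stays untouched.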


Jennifer Pohle
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Pragmatic order selection in Hidden Markov Models

We discuss the notorious problem of order selection in hidden Markov models, i.e. of selecting an adequate number of states, highlighting typical pitfalls and practical challenges arising when analyzing real data. Extensive simulations are used to demonstrate why well-established formal procedures for model selection, such as those based on standard information criteria, tend to favor models with numbers of states that are undesirably large in situations where the states are meant to be meaningful entities. We also offer a pragmatic step-by-step approach together with comprehensive advice on how practitioners can implement order selection. Our proposed strategy is illustrated with a real-data case study on muskox movement.
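
For concreteness, the standard information-criterion comparison the talk warns about looks as follows. The maximised log-likelihoods are invented; the talk's point is precisely that on real data such criteria can keep rewarding additional states well past the number that is interpretable:

```python
import numpy as np

def n_params(n_states):
    """Parameter count of a basic Gaussian HMM: n(n-1) transition
    probabilities plus a mean and an sd per state (initial distribution
    taken as stationary)."""
    return n_states * (n_states - 1) + 2 * n_states

# invented maximised log-likelihoods for 1-5 state models on T = 500 points
T = 500
loglik = {1: -1450.0, 2: -1310.0, 3: -1295.0, 4: -1290.0, 5: -1288.0}
bic = {N: -2.0 * ll + n_params(N) * np.log(T) for N, ll in loglik.items()}
best = min(bic, key=bic.get)   # number of states with the smallest BIC
```

With these made-up numbers BIC selects two states; a pragmatic order-selection strategy would weigh such criteria against the interpretability of the fitted states rather than follow the minimum blindly.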


Lena Verneuer
Universität Bielefeld, Fakultät für Soziologie

Aspects of Validity: Scenario-Technique, Self-Report Social Desirability

The panel study 'Crime in the modern City' (CRIMOC) focusses on the emergence and development of deviant and delinquent behaviour of adolescents and young adults. Self-report data has been collected with a standardised paper-and-pencil questionnaire since 2002 and carries all characteristics of a typical quantitative survey. In order to overcome the limitations of retrospective self-report data and to obtain a more situational view on the topic of '(violent) reactions to experienced deviant behaviour', an additional instrument was conceptualised: since 2013, a verbal scenario has been included in the questionnaire. It serves as a measurement of (hypothetical) reactions to deviant behaviour in a specific conflict situation. The scenario makes it possible to combine experimental methods with survey data and thereby expands the possibilities for different analyses. In this context, the linkage between the self-report measures and the scenario is of special interest. The given scenario is primarily expected to trigger certain scripts for suitable reactions to experienced deviant behaviour – at least from the respondents' point of view. Due to the sensitive topic of the scenario, an evocation of external social norms (judgement by the social environment) carrying aspects of social desirability is possible, too. With the application of latent class analysis (LCA), this potential response bias can be anticipated. Based on the suggestions of Eifler, Pollich and Reinecke (2015), patterns of answers to the scenario and the self-reports can be classified. As a result, this method gives the opportunity to separate honest from desirable answering patterns for further analyses.

Eifler, Stefanie / Pollich, Daniela / Reinecke, Jost (2015): Die Identifikation sozialer Erwünschtheit bei der Anwendung von Vignetten mit Mischverteilungsmodellen. In: Eifler, Stefanie / Pollich, Daniela (eds.): Empirische Forschung über Kriminalität. Methodologische und methodische Grundlagen. Wiesbaden: Springer VS, 217-247.


