
13th Nachwuchsworkshop

On 16 February 2024, the 13th Nachwuchsworkshop of the Zentrum für Statistik will take place at Universität Bielefeld.

As part of the workshop, doctoral students from the areas participating in the Zentrum für Statistik present their fields of research to one another and discuss them.

If you are interested, please register by e-mail with Dr. Nina Westerheide.

Programme of the Nachwuchsworkshop

 

Talks by the participating doctoral students
(in alphabetical order)

 

Jonas Bauer
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

(Dis-)similarities of finite and infinite mixtures in clustering

Mixture models are widely used for clustering in many application areas. While infinite mixture models (IMMs) consist of an unbounded number of components, finite mixture models (FMMs) require the modeller to set the number of components K in advance. As a result, IMMs have gained attention as an attractive alternative that avoids setting K, and much research has since been devoted to investigating the consistency of IMMs, particularly in the context of model misspecification and kernel misspecification. To date, the question of when to use IMMs and when to avoid them has not been fully answered. In this talk, we discuss the fundamental differences between the two modelling approaches, parameterisations that make them similar, and the problems that can arise under misspecification. In doing so, we derive probability distributions for the number of observed clusters under both models. These distributions can be used for a data-driven decision on which model to prefer.
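
As a toy illustration of how the number of observed clusters behaves under an infinite mixture (a generic sketch, not the derivations from the talk): under a Dirichlet process prior with concentration alpha, cluster assignments follow the Chinese restaurant process, and the number of occupied clusters grows roughly like alpha * log(1 + n/alpha) rather than being fixed in advance.

```python
import numpy as np

rng = np.random.default_rng(1)

def crp_num_clusters(n, alpha, rng):
    """Number of occupied clusters after assigning n observations
    under a Chinese restaurant process with concentration alpha."""
    counts = []                                  # observations per cluster
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                     # open a new cluster
        else:
            counts[k] += 1
    return len(counts)

n, alpha = 500, 1.0
draws = [crp_num_clusters(n, alpha, rng) for _ in range(200)]
mean_k = np.mean(draws)
approx = alpha * np.log(1 + n / alpha)           # rough growth of E[number of clusters]
print(mean_k, approx)
```

Comparing the simulated mean with the logarithmic growth rate makes the contrast with an FMM, where K is a fixed tuning choice, concrete.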

 

Sebastian Büscher
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

A new Lagrange multiplier-type test for CML models

Lagrange multiplier tests (LM tests), Wald tests, and likelihood-ratio tests form the backbone of statistical testing within the maximum likelihood framework. Whilst the Wald and likelihood-ratio tests require estimation of the unrestricted model to discriminate between two nested models, the LM test relies on the gradient information of the unrestricted model at the maximum likelihood estimate of the restricted model. This method is particularly useful when testing for pooling within a population. By calculating the gradient contribution of each individual, an LM test can be performed to test for differences between groups of individuals without the need to estimate additional models or to add extra dummy variables for the different groups to the model. Multiple testing for different groupings of individuals is, therefore, computationally efficient. In this talk, it will be demonstrated how this concept can be extended to Composite Marginal Likelihood (CML) estimation, enabling LM testing not only on an individual level but also on the finer level of pairs of observations. This allows for testing of heteroscedasticity within the observations of individuals, such as temporal effects or unaccounted-for auto-regressive error structures, without the need to model them explicitly. Finite sample simulation studies support the theoretical foundations of this new Lagrange multiplier-type test by confirming the theoretical distribution of the proposed test statistics under the null hypothesis, as well as by evaluating the statistical power of the test under different types and severities of violations of the null hypothesis.
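
A minimal numerical sketch of the score-test logic described above, in a deliberately simple setting (a normal location model with an invented group split; none of this is taken from the talk): only the restricted model with a common mean is fitted, and per-individual score contributions yield a test for a group offset.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x_i ~ N(mu + delta * 1{group i = 1}, 1); pooling hypothesis H0: delta = 0.
n = 200
x = rng.normal(loc=0.0, scale=1.0, size=n)     # data generated under H0
group = np.repeat([0, 1], n // 2)

mu_hat = x.mean()                              # restricted MLE of the common mean
scores = x - mu_hat                            # individual score contributions

# Score of the group-1 offset delta, evaluated at the restricted estimate
g = scores[group == 1].sum()
# Efficient Fisher information for delta, accounting for the estimated common mean
n1 = int((group == 1).sum())
info = n1 * (1 - n1 / n)
lm_stat = g**2 / info                          # asymptotically chi-square(1) under H0
p_value = math.erfc(math.sqrt(lm_stat / 2))    # chi-square(1) survival function
print(lm_stat, p_value)
```

No unrestricted model and no dummy variable were estimated; re-testing a different grouping only requires re-summing the same score contributions, which is the computational advantage the abstract refers to.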

 

Carlina Feldmann
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Mixtures of hidden Markov models for joint inference from time series showing distinct behavioural patterns

Statistical analyses in ecology are often concerned with individual variation, a crucial indicator for species fitness amid environmental changes such as climate change. To gain insight into variation in animals' behaviour from time series data, the widely used hidden Markov models (HMMs) can be extended to include random effects. However, the resulting mixed HMMs may lack flexibility, as they usually allow only some individual parameters to vary. To extend the flexibility offered by (discrete) random effects, a mixture of HMMs is proposed that allows all model parameters, and even the model structure, to vary across subgroups. The proposed model will be employed to investigate the intraspecific variability in foraging strategies of Galápagos sea lions. While a previous analysis of the data (Schwarz et al., 2021) employed a two-step approach, using a cluster analysis to categorize Galápagos sea lions into three subgroups and subsequent HMMs to learn about their unique foraging strategies, the proposed mixture of HMMs integrates both steps into one model to jointly infer group membership and characteristics. Preliminary results and the challenge of choosing starting values for this model will be discussed.
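
A schematic sketch of the mixture-of-HMMs likelihood (invented toy components, not the sea-lion models): each component contributes a forward-algorithm likelihood, and the mixture weights then give both the marginal likelihood and the posterior group membership in one pass.

```python
import numpy as np

rng = np.random.default_rng(4)

def hmm_loglik(x, Gamma, delta, mus, sigma=1.0):
    """Log-likelihood of a Gaussian HMM via the scaled forward algorithm."""
    dens = np.exp(-0.5 * ((x[:, None] - mus) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    phi = delta * dens[0]
    ll = np.log(phi.sum())
    phi /= phi.sum()
    for t in range(1, len(x)):
        phi = (phi @ Gamma) * dens[t]
        ll += np.log(phi.sum())
        phi /= phi.sum()
    return ll

def mixture_hmm(x, pis, components):
    """Marginal log-likelihood and posterior group membership of a mixture of HMMs."""
    lls = np.array([hmm_loglik(x, *c) for c in components])
    m = lls.max()
    w = pis * np.exp(lls - m)                    # stabilised component weights
    return m + np.log(w.sum()), w / w.sum()

# Two toy components sharing the state dynamics but differing in one state mean
Gamma0 = np.array([[0.9, 0.1], [0.2, 0.8]])
delta0 = np.array([0.5, 0.5])
comp0 = (Gamma0, delta0, np.array([0.0, 5.0]))
comp1 = (Gamma0, delta0, np.array([0.0, 2.0]))

# Simulate one track from the first component ...
T = 200
s = np.empty(T, dtype=int)
s[0] = rng.choice(2, p=delta0)
for t in range(1, T):
    s[t] = rng.choice(2, p=Gamma0[s[t - 1]])
x = rng.normal(loc=comp0[2][s], scale=1.0)

# ... and jointly evaluate likelihood and group membership
ll, post = mixture_hmm(x, np.array([0.5, 0.5]), [comp0, comp1])
print(post)
```

The posterior weights replace the separate clustering step of a two-step approach: group membership falls out of the same likelihood that describes the within-group dynamics.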

 

Kurtulus Kidik
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Autoregressive models for matrix-valued time series with multiple terms

Matrix-valued (variate) time series (MaTS) data are becoming increasingly available in a variety of fields such as economics, finance, psychology, biology, social science, environmental science, and functional Magnetic Resonance Imaging (fMRI) studies. Many matrix sequences are observed over time, displaying serial dependence that contains valuable insights for modelling and predicting future outcomes. Bilinear autoregressive models are one approach to analyzing matrix time series. In this context, autoregressive models for such data typically involve only one term per time lag, obtained by pre- and post-multiplying the matrix observations at lag j with square matrices. The corresponding vectorized time series then possesses lag matrices which are Kronecker products of the two square matrices. However, this imposes limitations on flexibility. Introducing multiple terms for each lag enhances modelling flexibility, but concurrently exacerbates identifiability issues. To address this issue, a novel identification procedure is proposed and examined for its properties. This identification scheme is integrated into an alternating optimization method, leading to consistent estimates and specifying integer-valued parameters such as the lag length and the number of terms. The model is illustrated using economic data, with a focus on both stationary and integrated cases, which is particularly pertinent for economic applications and various other areas.
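
The Kronecker structure mentioned above can be checked numerically. A small sketch (with arbitrary random matrices, not data from the talk) verifies the identity vec(AXB) = (Bᵀ ⊗ A) vec(X), which links the bilinear one-term form to the lag matrix of the vectorized series.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.normal(size=(m, m))            # pre-multiplying coefficient matrix
B = rng.normal(size=(n, n))            # post-multiplying coefficient matrix
X = rng.normal(size=(m, n))            # matrix observation at some lag

vec = lambda M: M.flatten(order="F")   # column-stacking vec operator

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)         # Kronecker-structured lag matrix
print(np.allclose(lhs, rhs))
```

With several terms per lag, the lag matrix becomes a sum of such Kronecker products, which is exactly where the identifiability issues discussed in the talk arise.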

 

Jan-Ole Koslik
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Inference on the state process of periodically inhomogeneous hidden Markov models

Over the last decade, hidden Markov models (HMMs) have become increasingly popular in statistical ecology, where they constitute natural tools for studying animal behavior based on complex sensor data. Corresponding analyses sometimes explicitly focus on, and in any case need to take into account, periodic variation, for example by quantifying the activity distribution over the daily cycle or seasonal variation such as migratory behavior. For HMMs including periodic components, we discuss important mathematical properties that allow for comprehensive statistical inference related to periodic variation, thereby also providing guidance for model building and model checking. Specifically, we derive the periodically varying unconditional state distribution as well as the time-varying and overall state dwell-time distributions, all of which are of key interest when the inferential focus lies on the dynamics of the state process. We will find that the dwell-time distributions of periodically inhomogeneous HMMs can deviate substantially from a geometric distribution, thus compensating for the biologically unrealistic consequences of the Markov property.
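
As a toy illustration of the periodically varying unconditional state distribution (my own sketch with invented 2-state transition probabilities, not the models from the talk): for cycle length L, the distribution at time t can be computed as the stationary distribution of the product of the L transition matrices starting at t.

```python
import numpy as np

L = 24  # cycle length, e.g. hours of the day

def gamma(t, L=L):
    """Toy 2-state transition matrix varying periodically over the cycle."""
    p = 0.7 + 0.25 * np.sin(2 * np.pi * t / L)   # staying probability, state 1
    q = 0.8 - 0.15 * np.cos(2 * np.pi * t / L)   # staying probability, state 2
    return np.array([[p, 1 - p], [1 - q, q]])

def stationary(G):
    """Stationary distribution of a transition matrix G (left Perron eigenvector)."""
    vals, vecs = np.linalg.eig(G.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def periodic_stationary(t):
    """delta_t solving delta_t = delta_t @ (Gamma_t @ ... @ Gamma_{t+L-1})."""
    P = np.eye(2)
    for s in range(t, t + L):
        P = P @ gamma(s % L)
    return stationary(P)

deltas = np.array([periodic_stationary(t) for t in range(L)])
print(deltas.round(3))   # unconditional state distribution over the daily cycle
```

A quick consistency check is that propagating delta_t one step with Gamma_t reproduces delta_{t+1}, so the whole cycle of distributions follows from a single eigen-decomposition per time point.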

 

Rouven Michels
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Exploring team-level momentum effects: a study of offensive and defensive performances in the NBA

The phenomenon of momentum has been extensively explored in various studies. However, these are mostly restricted to offensive performances of teams, whereas there is limited evidence concerning the interaction between the offense and defense of teams. Using play-by-play data from the NBA seasons 2015/16 to 2018/19, our objective is to investigate potential team-level momentum effects. Employing a state-space model, we initially analyse offensive and defensive performances independently, and subsequently integrate them within a joint framework. Our findings reveal the absence of significant momentum effects in both individual state processes. Conversely, a positive and significant momentum effect is observed in the combined model. This indicates that a successful defensive performance influences the offense and vice versa. Additionally, it underlines the necessity of incorporating the interaction between offense and defense for an accurate analysis of momentum in team sports.

 

Lennart Oelschläger
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Modeling unobserved choice behavior heterogeneity: comparing methods in terms of support recovery and estimation speed

In many studies on decision-making, it is crucial to model variations in interpersonal preferences, as this plays a vital role in making accurate policy recommendations. These variations are often not directly observable and are only partially explained by exogenous regressors. Researchers commonly address this by modeling unobserved heterogeneity as random variation in parameters, where parameters differ across decision-makers according to an unknown distribution estimated from available data. Recent advancements have introduced various tools for this purpose, including parametric and non-parametric approaches to model the mixing distribution, as well as frequentist and Bayesian methods for estimating these models. However, the trade-off between reliable recovery of the true distribution's support and numerical feasibility remains a relatively unexplored aspect. In this talk, we compare Bayesian and frequentist estimation methods, specifically focusing on a parametric normal mixing distribution within multivariate probit models. Our analysis aims to answer three key questions: Which method is faster and more stable? What are the implications for out-of-sample predicted choice probabilities? How do these outcomes depend on data characteristics, such as dimensionality and correlations?

 

Katrin Rickmeier
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Exploring the Impact of AI on Skill Demand: A Spatial Analysis of German Job Vacancies

Although artificial intelligence (AI) is becoming increasingly popular among the general public, there is little evidence of its impact on the workplace. This study aims to investigate the effects of AI on labour demand in Germany by analysing changes in skill requirements using online job vacancy data. We employ quantitative methods to analyse job requirements, relying on job vacancy data sourced from online platforms covering the near-universe of posted vacancies. To provide a comprehensive understanding, we use a disaggregated approach to examine industry-specific trends and regional variations. As the labour market continues to evolve, our findings contribute to the ongoing dialogue on the societal implications of technological advancements, particularly within the context of Germany's distinct regional and industrial dynamics. First analyses of the period around the introduction of ChatGPT indicate a shift in skill demand patterns. The initial results suggest a decrease in the need for IT skills, which is in line with prior research, implying that employers are adjusting their skill requirements in response to the widespread adoption of AI. Examining the regional disparities in the effect of the recent rise of AI in the workplace on current skill demand offers a crucial viewpoint on the changing dynamics of the labour market. By centring our research on job vacancy data, this study aims to contribute to a more nuanced comprehension of the regional implications of technological adoption in Germany. Furthermore, this study has the potential to aid in the formulation of effective policies that promote balanced economic growth and opportunities across diverse regions.

 

Ferdinand Stoye
Universität Bielefeld, Medizinische Fakultät OWL, Arbeitsgruppe 12 - Biostatistik und Medizinische Biometrie

A discrete time-to-event model for the meta-analysis of full ROC curves

The development of new statistical models for the meta-analysis of diagnostic test accuracy studies is still an ongoing field of research, especially with respect to summary receiver operating characteristic (ROC) curves. In the recently published updated version of the "Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy", the authors point to the challenges of this kind of meta-analysis and propose two approaches. However, both of them come with some disadvantages, such as the non-straightforward choice of priors in Bayesian models or the requirement of a two-step approach in which parameters are estimated for the individual studies and the results are then summarized. As an alternative, we propose a novel model by applying methods from time-to-event analysis. To this end, we use the discrete proportional hazards approach, treating the different diagnostic thresholds, which are reported by the individual studies and provide the means to estimate sensitivity and specificity, as categorical variables in a generalized linear mixed model, using both the logit and the asymmetric cloglog link. This leads to a model specification with threshold-specific discrete hazards, avoiding a linear dependency between thresholds, discrete hazard, and sensitivity/specificity and thus increasing model flexibility. We compare the resulting models to approaches from the literature in a simulation study. While the area under the summary ROC curve is estimated comparably well by most approaches, the results depict substantial differences in the estimated sensitivities and specificities. We also show the practical applicability of the models using data from a meta-analysis for the screening of type 2 diabetes.

 

Dora Tinhof
Universität Bielefeld, Fakultät für Psychologie und Sportwissenschaft, Abteilung Psychologie, Arbeitseinheit 6 - Psychologische Methodenlehre und Evaluation

Evaluating Construct Measurement and Validity with Complex Study Designs: An Illustration of Challenges and Benefits

Recent discussions surrounding the replication crisis in psychological research stress the importance of thoroughly evaluating the steps preceding hypothesis testing to ensure meaningful and replicable inferences. Such steps entail formalizing psychological theories, assessing the fulfillment of underlying assumptions, refining and extending study designs, and validating construct measurement. The latter two aspects comprise the core topics of this talk. Both benefits and challenges of evaluating construct measurement and validity within complex study designs are illustrated based on an application of the German Big Five Inventory-2 (Danner et al., 2019) in a longitudinal multi-rater, multi-situation study. Measurement invariance (MI) testing and reliability assessment are essential for establishing the validity and accuracy of measurement instruments and consequently for the interpretability of results. Accordingly, the current study evaluates MI and reliabilities across all measurement points and combinations of raters and situations, facilitating an in-depth assessment of construct reliability, stability, and validity. Challenges regarding the definition and handling of partial measurement invariance are addressed, alongside necessary deviations from this study's preregistration, underscoring the importance of a systematic approach to pre-hypothesis testing steps. Two key messages are conveyed in this talk. Firstly, some of the methodological obstacles encountered in striving for a more open scientific process are highlighted. Secondly, the replicability benefits gained by implementing comprehensive study designs are emphasized and insights into overcoming associated challenges are provided.

 

Julian Wäsche
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Optimization approaches for the experimental design of leukaemia treatment studies

Leukaemia is the most frequent type of cancer in paediatric patients. In pre-clinical trials, genetically modified cancer cells are studied in mice to provide a basis for possible new treatments. In this kind of experiment, only one measurement of tumour load can be taken per animal, as the animal has to be sacrificed. This talk presents optimization approaches for selecting measurement time points efficiently. The focus is on methods based on profile likelihoods and the Fisher information matrix. Both approaches aim to use as few animals as possible while still enabling reliable parameter estimation for an underlying logistic growth model.
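
A rough sketch of the Fisher-information side of such a design problem (all parameter values and candidate designs below are invented for illustration): with one measurement per animal, each set of time points yields a Fisher information matrix for the logistic growth parameters, and candidate designs can be ranked by its determinant (D-optimality).

```python
import numpy as np

def logistic(t, r, K, n0=1.0):
    """Logistic growth curve: tumour load at time t."""
    return K / (1 + (K / n0 - 1) * np.exp(-r * t))

def fim(times, r, K, sigma=1.0, eps=1e-6):
    """Fisher information for (r, K) with one noisy measurement per animal,
    using finite-difference sensitivities of the growth curve."""
    times = np.asarray(times, dtype=float)
    d_r = (logistic(times, r + eps, K) - logistic(times, r - eps, K)) / (2 * eps)
    d_K = (logistic(times, r, K + eps) - logistic(times, r, K - eps)) / (2 * eps)
    J = np.column_stack([d_r, d_K])        # sensitivity matrix
    return J.T @ J / sigma**2

r, K = 0.5, 100.0
design_a = [1, 2, 3, 4]                    # all measurements early
design_b = [2, 6, 10, 14]                  # spread over the growth phase
# D-optimality: prefer the design with the larger determinant of the FIM
det_a = np.linalg.det(fim(design_a, r, K))
det_b = np.linalg.det(fim(design_b, r, K))
print(det_a, det_b)
```

Measuring only early, before the curve starts to saturate, leaves the carrying capacity K nearly unidentified, which is why the spread-out design achieves the larger determinant here.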

 

Houda Yaqine
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Capturing Household-level Dynamics in Stochastic Differential Equation Models of Infectious Disease Transmission

In this study, we introduce a new Susceptible-Infectious-Recovered (SIR) compartmental model that integrates household structures within the population framework, through a set of coupled stochastic differential equations (SDEs). These equations are designed to systematically quantify the temporal evolution of susceptible, infected, and recovered individuals, factoring in the stochastic nature of disease transmission dynamics. Our SIR model focuses on the concept of heterogeneous mixing, characterized by average contact rates among distinct population subgroups. We specifically delineate two primary mixing patterns, public and within-household interactions, through the development of detailed contact matrices. To assess the model's responsiveness and robustness, we conducted a comprehensive global sensitivity analysis, focusing particularly on parameters that influence the social behavior within sub-populations. By comparing our model with a deterministic equivalent, we underscore the critical role of stochastic elements in capturing aspects of disease spread dynamics that deterministic models may overlook, thereby highlighting the nuanced understanding offered by incorporating stochastic processes.
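
A minimal Euler-Maruyama sketch of a stochastic SIR model without household structure (parameter values invented; a generic illustration, not the model from the talk): the transmission and recovery flows carry demographic-style noise proportional to the square root of the corresponding rates.

```python
import numpy as np

rng = np.random.default_rng(3)

def sir_sde(beta, gamma, N, I0, T=100.0, dt=0.01, rng=rng):
    """Euler-Maruyama simulation of one path of a stochastic SIR model."""
    steps = int(T / dt)
    S, I, R = N - I0, float(I0), 0.0
    out = np.empty((steps + 1, 3))
    out[0] = S, I, R
    for k in range(steps):
        inf_rate = beta * S * I / N          # new infections per unit time
        rec_rate = gamma * I                 # recoveries per unit time
        dW1, dW2 = rng.normal(scale=np.sqrt(dt), size=2)
        d_inf = inf_rate * dt + np.sqrt(max(inf_rate, 0.0)) * dW1
        d_rec = rec_rate * dt + np.sqrt(max(rec_rate, 0.0)) * dW2
        S = max(S - d_inf, 0.0)
        I = max(I + d_inf - d_rec, 0.0)
        R = N - S - I                        # population size is conserved
        out[k + 1] = S, I, R
    return out

path = sir_sde(beta=0.3, gamma=0.1, N=1000, I0=10)
print(path[-1])   # final (S, I, R) for one realisation
```

Averaging many such paths and comparing them with the deterministic ODE solution shows the kind of variability that a deterministic model suppresses; household structure would enter through subgroup-specific contact rates in place of the single beta used here.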
