workshop12 - Universität Bielefeld


12th Young Researchers' Workshop

On 7 September 2023, the 12th Young Researchers' Workshop of the Zentrum für Statistik took place at Universität Bielefeld.

At the workshop, doctoral students from the departments participating in the Zentrum für Statistik presented their research fields to one another and discussed them.

If you are interested, please contact Dr. Nina Westerheide by e-mail.

Workshop program

Talks by the participating doctoral students
(in alphabetical order)

Matthieu Bulte
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Autoregressive Models for Time Series of Random Objects

Random variables in metric spaces indexed by time and observed at equally spaced intervals are receiving increased attention due to their broad applicability. However, the absence of inherent structure in metric spaces has resulted in a literature that is predominantly non-parametric and model-free. To address this gap in models for time series of random objects, we introduce an adaptation of the classical autoregressive model tailored for data lying in a Hadamard space. The parameters of interest in this model are the Fréchet mean and a correlation parameter, both of which we prove can be consistently estimated from data. Additionally, we propose a test statistic and establish its asymptotic normality, thereby enabling hypothesis testing for the absence of autocorrelation. Finally, we introduce a bootstrap procedure to obtain critical values for the test statistic under the null hypothesis. Our theoretical findings are illustrated by extensive numerical studies.
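
As a rough illustration of the kind of model this abstract describes, the sketch below works on the real line, the simplest Hadamard space. There the Fréchet mean reduces to the ordinary mean and a geodesic is a straight segment. The recursion, the function names and the moment-based estimator of the correlation parameter are illustrative assumptions, not the construction or the estimators from the talk.

```python
import random

def simulate_ar1(n, mu, phi, sigma, seed=0):
    """Simulate x[t+1] = mu + phi * (x[t] - mu) + noise on the real line."""
    rng = random.Random(seed)
    x = [mu]
    for _ in range(n - 1):
        # Move a fraction phi along the geodesic (here a line segment)
        # from the mean mu towards the previous value, then add noise.
        x.append(mu + phi * (x[-1] - mu) + rng.gauss(0.0, sigma))
    return x

def frechet_mean_real_line(x):
    """With the usual distance on the real line, the Fréchet mean is the sample mean."""
    return sum(x) / len(x)

def estimate_phi(x, mu_hat):
    """Lag-1 regression estimate of phi around the estimated mean (assumed estimator)."""
    num = sum((a - mu_hat) * (b - mu_hat) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mu_hat) ** 2 for a in x[:-1])
    return num / den

series = simulate_ar1(n=5000, mu=2.0, phi=0.6, sigma=1.0, seed=42)
mu_hat = frechet_mean_real_line(series)
phi_hat = estimate_phi(series, mu_hat)
```

In a general Hadamard space, the sample mean would be replaced by a minimizer of the summed squared distances, and the straight-line step by a move along the geodesic from the mean to the previous observation.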

Sebastian Büscher
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

How to test for intra-personal heterogeneity and temporal effects in discrete choice models in an economic manner

Testing for temporal effects or intra-personal heterogeneity in discrete choice models typically involves the estimation of complex, dynamic discrete choice models incorporating all effects that need to be tested for. Not only is this computationally expensive, but it can also be difficult to implement in existing software, depending on the effects and dynamics required. Inspired by the Lagrange Multiplier test, we present a new way to test for temporal effects and intra-personal heterogeneity in discrete choice models by exploiting the structure of Composite Marginal Likelihood (CML) estimation. In contrast to standard maximum likelihood estimation, CML estimators use pairwise marginal likelihoods rather than the joint likelihood over all observations of an individual. We use the gradient contributions of the individual pairwise CML margins to test for distributional differences between groups of these margins, indicating temporal effects or intra-personal heterogeneity not accounted for in the model. The effectiveness of the test is demonstrated using synthetic data sets with different temporal effects built into the data generating process.
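
The pairwise construction mentioned in this abstract can be written down as follows. The notation is assumed for illustration: $y_{n,t}$ is the choice of individual $n$ at occasion $t$, $T$ is the number of occasions per individual, and $\theta$ collects the model parameters. The abstract does not specify how the margins are grouped for the test, so only the objective and the per-pair gradient contributions are shown.

```latex
\ell_{\mathrm{CML}}(\theta)
  = \sum_{n=1}^{N} \sum_{1 \le s < t \le T}
    \log P\bigl(y_{n,s},\, y_{n,t};\ \theta\bigr),
\qquad
g_{n,s,t}(\theta)
  = \nabla_{\theta} \log P\bigl(y_{n,s},\, y_{n,t};\ \theta\bigr).
```

Each pair $(s, t)$ contributes its own gradient term $g_{n,s,t}$, so the contributions can be compared across groups of pairs, for example pairs that are close in time versus pairs that are far apart.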

Kurtulus Kidik
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Exploring High-Dimensional Matrix-Valued Time Series: Methods and Applications

Matrix-valued or matrix-variate time series (MaTS) data consist of observations over time that each take the form of a matrix. In other words, MaTS data can be represented as a sequence of matrices in which each matrix corresponds to a time point and captures the relationships between multiple variables. Such data are useful for capturing complex and dynamic phenomena in various fields, such as economics, finance, meteorology, signal processing, biology, neuroscience, medical imaging, social networks, IT communication, geostatistics and many others. They are more complex than traditional univariate or multivariate time series data because their analysis involves the dynamics and dependencies within and between matrices. Interest in MaTS and their modelling is growing in time series econometrics, and modelling matrix-valued time series has become an important research topic in recent years. For example, the increasing level of economic and financial integration has made it crucial to view economies as a unified, extensive and interconnected entity. Within this context, the primary difficulties revolve around the analysis and interpretation of large panels in which the entities are, for instance, countries observed through multiple indicators over time. MaTS models offer a promising approach to tackling these challenges. However, modelling and analysing matrix-valued time series poses significant challenges due to their high dimensionality and intricate dependence structure. This talk briefly introduces matrix-valued time series and discusses recent developments in MaTS, related models, applications, extensions and limitations in the literature. Additionally, we propose a modelling framework and an estimation procedure for MaTS that allow us to exploit the matrix structure and achieve dimension reduction and interpretability. Furthermore, we address computational challenges by proposing efficient algorithms for more accurate analysis, enabling practitioners to efficiently process massive amounts of data.
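
To make the dimension-reduction argument concrete, here is one common bilinear autoregressive form for an $m \times n$ matrix series $X_t$. It is given purely as an assumed example; the abstract does not state which model the talk proposes.

```latex
X_t = A\, X_{t-1}\, B^{\top} + E_t,
\qquad A \in \mathbb{R}^{m \times m},\quad B \in \mathbb{R}^{n \times n},
\\[4pt]
\operatorname{vec}(X_t) = (B \otimes A)\, \operatorname{vec}(X_{t-1}) + \operatorname{vec}(E_t).
```

The vectorized form is a vector autoregression whose coefficient matrix is constrained to the Kronecker product $B \otimes A$. It therefore has $m^2 + n^2$ coefficient entries instead of the $(mn)^2$ of an unrestricted model, and $A$ and $B$ can be read as row-wise and column-wise dynamics respectively.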

Simon Lütkewitte
Universität Bielefeld, Fakultät für Soziologie, Bielefeld Graduate School for History and Sociology (BGHS)

Sports Participation and Gender Homophily: Longitudinal Evidence for Male Adolescents based on German Friendship Network Data

Using longitudinal friendship network data from Germany, and differentiating between team sport participation and participation in sports in other organized settings, I investigate the causal relationship between sports participation and gender homophily among male teenagers. In my analysis, I use the share of male friends and the number of male/female friends as gender-homophily-related measures. Using logistic regression models, I investigate whether having a high share of male friends (or a high number of male/female friends) explains selection into or dropout from these sports. Additionally, fixed effects regression models are estimated to see whether within-individual change in sports participation predicts change in gender homophily among these male teenagers. Preliminary findings show that boys with many male friends are more likely to select themselves into team sports and less likely to drop out. However, when looking at the share of male friends, this finding is not significant. This can be explained by the fact that male teenagers with few friends also have, on average, a high share of male friends. In the fixed effects models, we do not observe significant effects (neither for team sports nor for non-team sports). The results speak against the hypothesis that team sports participation fosters gender homophily in male teenagers during adolescence. Rather, the findings suggest that the absolute number of male friends in the friendship network is a better predictor of selection and dropout effects concerning team sport participation.

Lennart Oelschläger
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Initialization of Numerical Optimization in R

When involved in maximum likelihood estimation, statisticians inevitably encounter numerical optimization, where both the outcome and computation time are greatly influenced by the optimization method and initial values. The {ino} R package facilitates easy comparisons of optimization methods and initialization strategies. This will be illustrated in the talk through three specific applications: 1) the contrasting performance between the Expectation-Maximization algorithm and gradient-based methods for mixture models, 2) strategies for circumventing local likelihood optima in hidden Markov models, and 3) the acceleration of probit model estimation. Additionally, a novel initial estimator for probit model estimation is introduced. This estimator leverages the constant utility direction in the latent utility space for alternative-varying coefficients, thus providing statistically consistent initial estimates that can be computed very quickly.
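
The influence of initial values described in this abstract can be illustrated with a small self-contained sketch. It does not use the {ino} package or its API. The toy objective (standing in for a negative log-likelihood with two local optima), the plain gradient descent and all names are assumptions made for illustration.

```python
import random

def objective(x):
    """Toy objective with a global minimum near -1.30 and a local one near 1.13."""
    return x**4 - 3 * x**2 + x

def gradient(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, step=0.01, iterations=2000):
    """Plain fixed-step gradient descent started at x0."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

def multi_start(n_starts, low=-2.0, high=2.0, seed=0):
    """Run gradient descent from random starting points and keep the best run."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_starts):
        x0 = rng.uniform(low, high)
        x_opt = gradient_descent(x0)
        runs.append((objective(x_opt), x_opt, x0))
    return min(runs)  # the tuple with the smallest objective value

from_right = gradient_descent(2.0)    # ends in the local minimum
from_left = gradient_descent(-2.0)    # ends in the global minimum
best_value, best_x, best_start = multi_start(n_starts=20, seed=1)
```

The two single runs differ only in their starting point, yet they end in different optima. The random multi-start strategy is one simple way of reducing the risk of reporting a local optimum, at the cost of extra computation time.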

Predrag Pilipovic
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Addressing Hypoellipticity and Partial Observation Issues in Second-Order SDEs using the Strang Estimator

Stochastic differential equations (SDEs) are powerful tools for modeling complex dynamical systems. We propose a parameter estimation framework based on the Strang splitting scheme for second-order nonlinear SDEs. The Strang splitting scheme is a numerical approximation of the solution to the SDE that allows us to construct a pseudo-likelihood function for maximum likelihood estimation (MLE) of the parameters. We start by transforming the second-order SDE into a system of first-order SDEs by introducing an auxiliary velocity variable. This transformation gives rise to the challenges of hypoellipticity and partial observation. If we assume that we observe the auxiliary velocity, the resulting Strang estimator is consistent and efficient, unlike the Euler-Maruyama-based estimator, which does not exist due to hypoellipticity. However, in practical scenarios where the velocity variable is unobserved, we approximate it using finite difference methods. This approximation leads to a loss of efficiency, resulting in higher asymptotic variance for the Strang estimator. This is, however, an expected and common property of all discrete MLEs in the partial observation case.
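
The transformation described in this abstract can be sketched as follows. The symbols are assumed for illustration: $F$ is the drift, $\Sigma$ the noise coefficient, $\xi_t$ white noise, $W_t$ a Wiener process and $h$ the sampling step. The forward difference is only one possible finite difference choice; the abstract does not say which one is used.

```latex
\ddot{X}_t = F(X_t, \dot{X}_t;\ \theta) + \Sigma\, \xi_t
\quad\Longrightarrow\quad
\begin{cases}
  \mathrm{d}X_t = V_t\, \mathrm{d}t, \\
  \mathrm{d}V_t = F(X_t, V_t;\ \theta)\, \mathrm{d}t + \Sigma\, \mathrm{d}W_t,
\end{cases}
\qquad
\widehat{V}_{t_k} = \frac{X_{t_{k+1}} - X_{t_k}}{h}.
```

Noise enters only the velocity equation, so the diffusion matrix of the joint system $(X_t, V_t)$ is degenerate. This is the hypoelliptic structure the abstract refers to, and replacing $V_{t_k}$ by $\widehat{V}_{t_k}$ is where partial observation comes in.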

Sophie Schmiegel
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

CRESCENT: Stratification of Chronic Pain Patients using Machine Learning

Medical secondary care units often struggle with long waiting times for patient appointments. In rheumatology, a waiting time of several months until a patient is first seen can lead to impairments in well-being and health: a delayed initiation of an effective anti-inflammatory disease-modifying treatment may have serious health consequences, e.g. irreparable joint damage in patients with rheumatoid arthritis. In addition, patients suffer from persistent pain while not receiving adequate treatment. Musculoskeletal pain may originate broadly from three prevalent conditions: an inflammatory rheumatic disease, osteoarthritis and chronic pain syndromes. Long waiting times occur when primary care units experience difficulties in attributing pain to a specific disease and thus refer a large number of patients to rheumatology, many of them actually suffering from chronic pain conditions such as fibromyalgia. The aim of this study is to improve the stratification of inflammatory rheumatic diseases, chronic pain disorders and osteoarthritis, being well aware of the fact that these conditions are not mutually exclusive, but may occur simultaneously. To address this research question, we administered a simple questionnaire targeting patients with fibromyalgia to all patients presenting to rheumatology and measured C-reactive protein, a laboratory blood marker for inflammation. Based on these two easily measurable variables, we attempt to improve the disease stratification. Furthermore, we use machine learning methods such as binary or multi-label classification and analyse patient data under special consideration of vital signs, laboratory values and questionnaire information. Improving the stratification of pain patients can help to correctly identify patients in need of anti-inflammatory therapy, which will increase the patients’ quality of life.

Qingchuan Sun
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Stochastic Differential Equations: Elevating Parameter Estimation Robustness in Dynamic Modeling

Dynamic modeling using Ordinary Differential Equation (ODE) and Stochastic Differential Equation (SDE) models plays a pivotal role in various fields of science. In this paper, we delve into the robustness of parameter estimation in these two models. We compare the robustness of parameter estimation using widely used methods, i.e., least squares estimation for ODE models and maximum likelihood estimation for SDE models. Using two examples - one-dimensional linear ODE and Ornstein–Uhlenbeck processes, and deterministic and stochastic SIR models - this study scrutinizes the robustness of parameter estimation under different scenarios of model misspecification. Using simulated data, we find that the SDE model provides stable parameter estimates that are similar to the true parameter values, even when the data come from different models. In addition, we conducted the test using COVID-19 data from Denmark. It is noteworthy that the SDE model consistently produces more robust parameter estimates for the SIR and SEIR models for datasets of different quality. This study highlights the potential of the SDE model in capturing the features of a dynamic system, demonstrating its ability to produce robust and accurate parameter estimates under different scenarios.
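
The two estimation approaches compared in this abstract can be sketched for the one-dimensional linear case. The parameter values, the grid search and all function names are assumptions chosen for illustration; the sketch does not reproduce the study's setup or results.

```python
import math
import random

def simulate_ou(n, theta, sigma, x0, h, seed=0):
    """Simulate dX = -theta * X dt + sigma dW exactly on a grid with step h."""
    rng = random.Random(seed)
    decay = math.exp(-theta * h)
    noise_sd = sigma * math.sqrt((1 - decay**2) / (2 * theta))
    x = [x0]
    for _ in range(n):
        x.append(decay * x[-1] + rng.gauss(0.0, noise_sd))
    return x

def ou_mle_theta(x, h):
    """Conditional MLE of theta from the exact Gaussian transitions (sigma also unknown)."""
    num = sum(a * b for a, b in zip(x[:-1], x[1:]))
    den = sum(a * a for a in x[:-1])
    return -math.log(num / den) / h

def ode_least_squares_theta(x, h, grid):
    """Least-squares fit of the ODE solution x(t) = x[0] * exp(-theta * t) over a grid."""
    def sse(theta):
        return sum((xk - x[0] * math.exp(-theta * k * h)) ** 2
                   for k, xk in enumerate(x))
    return min(grid, key=sse)

true_theta = 1.0
path = simulate_ou(n=10000, theta=true_theta, sigma=0.5, x0=2.0, h=0.05, seed=7)
theta_sde = ou_mle_theta(path, h=0.05)
theta_grid = [0.05 * i for i in range(1, 61)]  # candidate values 0.05, ..., 3.0
theta_ode = ode_least_squares_theta(path, h=0.05, grid=theta_grid)
```

Here the data come from the SDE, so the ODE fit is the misspecified one. Swapping the data-generating model, or adding measurement noise, gives the other misspecification scenarios of the kind the abstract mentions.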

David Winkelmann
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Momentum effects in team sports: Analysing the interaction of offence and defence in the NBA

In sports, momentum is a crucial factor that can swing games in favour of one team. This effect is related to the so-called hot hand, referring to a state of exceptional abilities compared to the average performance of an athlete. Translated to team sports, momentum effects refer to the tendency for a team instead of a single player to maintain a positive or negative streak during a game. In previous literature, momentum effects have only been investigated for offensive performances, despite the fact that in most team sports defensive actions take place between two successive offensive attacks. To overcome this limitation, we jointly model offensive and defensive performances in basketball and investigate whether the interaction leads to a leveraged team-level momentum. We embed play-by-play data from the NBA in a state-space framework and address the complexity of the data structure in the following manner: Firstly, we analyse momentum effects separately for the offence and defence with team-fixed effects. Secondly, in the joint framework, we still model the offensive and defensive performance individually but rely on a single latent process impacting both models. This allows us to analyse the interaction between the offence and defence when considering momentum.