
9. Nachwuchsworkshop

On 16 July 2021, the 9th Nachwuchsworkshop of the Zentrum für Statistik will take place at Universität Bielefeld.

As part of the Nachwuchsworkshop, doctoral researchers from the fields involved in the Zentrum für Statistik present their research areas to one another and discuss them.

If you are interested in attending, please contact Dr. Nina Westerheide by e-mail.

Programme of the Nachwuchsworkshop

 

Talks by the participating doctoral researchers
(in alphabetical order)

Marvin Bürmann
Universität Bielefeld, Fakultät für Soziologie

Social Capital or Transmission of Skills? - The Effect of Social Origin on Undereducation

While much research on educational mismatches has focused on overeducation – i.e. having more education than required for a job – Wiedner and Schaeffer (2020) recently showed that undereducation – i.e. having less education than required – plays a more important role than the avoidance of overeducation in explaining the effect of social origin on wages. Hence, it is important to understand why employees with high-status parents are more likely to access higher positions despite lacking the corresponding educational credentials. Although access to vacancies via the parental social network seems an obvious explanation, Wiedner and Schaeffer do not find any evidence for such an effect. While they were able to explain part of the social origin effect by cognitive abilities, other underlying mechanisms remain unclear. I analyse one further possible explanation: the intergenerational transmission of occupational skills. As they are very specific and difficult to measure, occupational skills are usually unobserved in cognitive ability tests. However, if the effect of social origin can additionally be explained by the fact that undereducated employees with high-status parents often work in the same occupation as their parents, I assume this to be due to the intergenerational transmission of occupational skills. Analyses are based on the German Socio-Economic Panel Study (SOEP) from 2000 to 2018. Germany is chosen because of its highly occupationally stratified labor market, which makes it the most likely case in which to observe an effect of intergenerational transmission of occupational skills on undereducation, if there is any.

 

Sebastian Büscher
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Econometrics vs. Machine Learning: Who is better at predicting human mobility patterns?

Discrete choice models and the random utility framework are regularly used in econometrics to explain choices between a fixed set of options. They allow for relatively easy interpretation of the effects that different variables have on the choice model, helping policymakers in their decision process. When it comes to prediction, however, machine learning methods such as Random Forests, Deep Learning, and Gradient Boosting are increasingly preferred due to their high flexibility and predictive accuracy. Parts of the econometrics community, however, remain sceptical of ML methods because of their mostly black-box nature. We compare the predictive performance of several machine learning methods with that of econometric methods on a data set of human mobility pattern choices, using socio-economic variables as regressors. This is done by measuring and comparing the out-of-sample scaled log-likelihood, the Brier score and the percentage of correctly predicted choices.
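The three evaluation criteria are straightforward to compute once a model outputs choice probabilities. Below is a minimal, self-contained sketch on simulated data (all names and numbers are invented for illustration): a sharp and a uniform predictor are scored with the multiclass Brier score, the mean predictive log-likelihood and the share of correctly predicted choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 1000, 3                       # observations, choice alternatives
y = rng.integers(0, k, size=n)       # observed choices
onehot = np.eye(k)[y]

# Two hypothetical predictors: a sharp one and an uninformative baseline
p_sharp = np.full((n, k), 0.1)
p_sharp[np.arange(n), y] = 0.8       # puts probability 0.8 on the true choice
p_flat = np.full((n, k), 1.0 / k)    # uniform over all alternatives

def brier(p, onehot):
    """Mean multiclass Brier score (lower is better)."""
    return np.mean(np.sum((p - onehot) ** 2, axis=1))

def mean_loglik(p, y):
    """Mean predictive log-likelihood (higher is better)."""
    return np.mean(np.log(p[np.arange(len(y)), y]))

def accuracy(p, y):
    """Share of correctly predicted choices (modal prediction)."""
    return np.mean(p.argmax(axis=1) == y)

print(brier(p_sharp, onehot), brier(p_flat, onehot))
print(mean_loglik(p_sharp, y), mean_loglik(p_flat, y))
print(accuracy(p_sharp, y))
```

The sharper predictor should win on all three criteria; for the uniform predictor the Brier score is exactly 2/3 regardless of the observed choices.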

 

André Hottung
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Learning a Latent Search Space for Routing Problems using Variational Autoencoders

Methods for automatically learning to solve routing problems are rapidly improving in performance. While most of these methods excel at generating solutions quickly, they are unable to effectively utilize longer run times because they lack a sophisticated search component. We present a learning-based optimization approach that allows a guided search in the distribution of high-quality solutions for a problem instance. More precisely, our method uses a conditional variational autoencoder that learns to map points in a continuous (latent) search space to high-quality, instance-specific routing problem solutions. The learned space can then be searched by any unconstrained continuous optimization method. We show that, even using a standard differential evolution search strategy, our approach is able to outperform existing purely machine-learning-based approaches.
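The core idea, searching a continuous space whose points decode to discrete routing solutions, can be illustrated without any trained model. In the toy sketch below the "decoder" is just an argsort that turns a latent vector into a city permutation (a stand-in for the conditional variational autoencoder of the talk), and a plain evolutionary loop replaces differential evolution; the instance and all settings are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
cities = rng.random((8, 2))          # random TSP instance in the unit square

def decode(z):
    """Map a latent point to a tour: visit cities in the order of z's ranks."""
    return np.argsort(z)

def tour_length(perm):
    pts = cities[perm]
    return float(np.sum(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1)))

def evolve(pop_size=40, gens=60, sigma=0.3):
    """Plain evolutionary search in the continuous latent space."""
    dims = len(cities)
    pop = rng.normal(size=(pop_size, dims))
    best_z, best_len, history = None, np.inf, []
    for _ in range(gens):
        scores = np.array([tour_length(decode(z)) for z in pop])
        i = int(np.argmin(scores))
        if scores[i] < best_len:                 # track the incumbent
            best_z, best_len = pop[i].copy(), scores[i]
        history.append(best_len)
        elite = pop[np.argsort(scores)[: pop_size // 4]]   # best quarter
        parents = elite[rng.integers(0, len(elite), size=pop_size)]
        pop = parents + sigma * rng.normal(size=pop.shape) # mutate offspring
    return decode(best_z), best_len, history

best_tour, best_len, history = evolve()
print(best_len)
```

Because the best solution found so far is tracked explicitly, the recorded best tour length can only improve over the generations.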

 

Paulo Emilio Isenberg Lima
Universität Bielefeld, Fakultät für Soziologie

From generational conflict to generational link? The environmental issue in families 1984-2019

More than almost any other topic, the environmental issue, which has experienced a renewed surge since the success of the "Fridays for Future" protests, is characterized by a narrative of generational conflict, analogous to the German environmental movement of the 1970s. It describes a generational conflict of changing values, from the conservative, growth- and physical-integrity-oriented values of the parents' generation to an emancipatory, ecologically oriented critique of economic and energy policies by the children's generation. The renaissance of this narrative is accompanied by indicators suggesting that history does not necessarily have to repeat itself. Rather, results from socialization research suggest that the intergenerational transmission of certain values causes value stability between generations. In addition, an intra-individual stability of values internalized in childhood and adolescence has been described over the individual life course. The aim of this contribution is therefore to empirically investigate intergenerational relations and a possible influence of value change with the help of the German Socio-Economic Panel (SOEP). The SOEP has been surveying households annually since 1984 on, among other things, attitudes toward environmental protection and environmental commitment. Specifically, the analysis is intended to map a line of conflict between opponents and proponents of environmental protection, between generations within families, or between families with a tradition of environmental protection. It will also examine the historical development of this line of conflict since 1984, as the narrative of generational conflict in environmental movements has been challenged in the past.

 

Benedikt Langenberg
Fakultät für Psychologie und Sportwissenschaft, Abteilung Psychologie, Arbeitseinheit 6 - Psychologische Methodenlehre und Evaluation

Estimating and Testing Causal Mediation Effects in Single-Case Experimental Designs

In this talk, we present single-case causal mediation analysis as the application of causal mediation analysis to data collected within a single-case experiment. This method combines the focus on the individual with the focus on mechanisms of change, rendering it a promising approach for both mediation and single-case researchers. For this purpose, we propose a new method based on time-discrete longitudinal structural equation modeling to estimate the direct and indirect treatment effects. We demonstrate how to estimate the model for a single-case experiment on stress and craving in a routine alcohol consumer before and after an imposed period of abstinence. Furthermore, we present a simulation study that examines the estimation and testing of the standardized indirect effect. We use maximum likelihood and permutation procedures to calculate p-values and standard errors of the parameter estimates. The new method is promising for testing mediated effects in single-case experimental designs. We further discuss limitations of the new method with respect to causal inference, as well as more technical concerns, such as the choice of the time lags between the measurements.

 

Sina Mews
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Latent-state modelling of patients’ disease progression in continuous time based on medical claims data

Claims data are routinely collected for billing and reimbursement purposes by statutory health insurance companies and include resource use, costs, and diagnoses on the basis of real-life healthcare provision. These large databases are increasingly used in health research, for example on treatment patterns, in epidemiological studies and, recently, to investigate patients’ disease severity and its progression. Regarding the latter, multi-state models offer an appealing framework for analysing the evolution of disease activity through discrete (disease) states. As those disease states are often not directly observed or may contain classification errors, hidden Markov models (HMMs) provide a flexible approach to infer properties of the disease process underlying the observed data, such as the number of drugs prescribed or patients’ consultations. These models, however, necessitate the a priori specification of a finite number of disease states through which patients progress. While the number of states is often chosen according to clinically defined disease stages, the model’s states are derived in a data-driven way and thus do not necessarily correspond to any pre-defined disease stages. Furthermore, selecting the number of states based on information criteria such as AIC or BIC is hardly feasible, as those criteria usually point to a large number of states, hence impeding the model’s interpretability. Therefore, choosing an adequate number of (disease) states remains a challenging task. To overcome this limitation, we propose to use state-space models (SSMs) comprising continuous (disease) states instead. These models not only avoid the (rather restrictive) assumption of a discrete number of states, but also allow changes in patients’ health to be modelled gradually, which could be more suitable for some health conditions. In this talk, I will compare HMMs and SSMs for analysing disease progression using claims data from patients diagnosed with Multiple Sclerosis.
As claims data are collected at irregular points in time, i.e. only when a patient interacts with the healthcare system, both models are formulated in continuous time. I will present some preliminary results of the analysis and, in particular, focus on possible strengths and weaknesses of both model formulations.
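For intuition on the discrete-state side of this comparison, the sketch below evaluates the likelihood of a 2-state HMM with Poisson observations (think of counts of consultations) via the forward algorithm, and checks it against brute-force summation over all state paths. The model, parameters and data are invented, and the sketch is in discrete time, whereas the models in the talk are formulated in continuous time.

```python
import numpy as np
from itertools import product
from math import exp, factorial

def pois(x, lam):
    """Poisson pmf for a single count x."""
    return lam ** x * exp(-lam) / factorial(x)

delta = np.array([0.7, 0.3])                  # initial state distribution
Gamma = np.array([[0.9, 0.1], [0.2, 0.8]])    # transition probability matrix
lams = np.array([1.0, 4.0])                   # state-dependent Poisson means
obs = [0, 2, 5, 3]                            # observed counts

def forward_likelihood(obs):
    """HMM likelihood via the forward recursion."""
    alpha = delta * np.array([pois(obs[0], l) for l in lams])
    for x in obs[1:]:
        alpha = (alpha @ Gamma) * np.array([pois(x, l) for l in lams])
    return float(alpha.sum())

def brute_force(obs):
    """Same likelihood by summing over every possible state path."""
    total = 0.0
    for path in product(range(2), repeat=len(obs)):
        p = delta[path[0]] * pois(obs[0], lams[path[0]])
        for t in range(1, len(obs)):
            p *= Gamma[path[t - 1], path[t]] * pois(obs[t], lams[path[t]])
        total += p
    return total

print(forward_likelihood(obs))
```

The forward recursion computes in linear time what the brute-force sum computes exponentially, which is what makes HMMs practical for long claims histories.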

 

Rouven Michels
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Using State Space Models for investigating time-varying team strengths in a live betting market

In a football betting market, it seems clear that customers prefer placing bets on stronger teams rather than on weaker ones. Within a game, however, and thus in live betting markets, the term ‘strength’ can be measured in different ways. We want to examine whether stakes are driven by initial pre-game strengths or by in-game strengths, where the latter are derived from actions taking place on the pitch, such as shots and passes. To tackle this question and to account for a team’s latent ‘momentum’, we consider state space models (SSMs). Furthermore, to model the potentially time-varying effects of both variables related to teams’ strengths, we make use of penalized B-splines.

 

Julia Norget
Fakultät für Psychologie und Sportwissenschaft, Abteilung Psychologie, Arbeitseinheit 6 - Psychologische Methodenlehre und Evaluation

Latent state-trait models for experience sampling data: Introducing the R package lsttheory

Latent State-Trait (LST) models are a useful tool to distinguish between measurement error and situational and stable influences. With the increasing availability, and thus rising use, of experience sampling methodology, LST models have been extended to accommodate large numbers of measurement occasions. In this talk, we provide an overview of different LST models and discuss the different kinds of latent traits that can be included in the models (i.e. single trait, day-specific traits, indicator-specific traits), the inclusion or non-inclusion of autoregression, and the different equivalence assumptions that can be imposed on these models. We present the R package lsttheory together with a shiny app, which enables researchers to easily apply LST models to their data. The package and app also allow covariates to be integrated into these models. We further illustrate the software with an empirical data example from the Interdependence in Daily Life Study (Columbus, Molho, Righetti & Balliet, 2020). For one week, seven times a day, 284 participants rated the situational interdependence in their last social encounter on five dimensions, measured with two items per dimension. The conflict-of-interests dimension is used for illustration. We demonstrate how to test the assumption that parameters are measurement invariant over time and show how to assess psychometric properties of the items with the LST variance components, for a model with and without an additional covariate. The presentation concludes with a short discussion of the conditions (e.g. number of indicators, number of measurement occasions, reliabilities of the indicators, sample size) under which the presented LST models still converge, and we offer advice and alternatives in case of non-convergence.

 

Lennart Oelschläger
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

On the initialization of multinomial probit models

The multinomial probit model is a widely used and flexible tool to explain the choices that individuals make among a discrete set of alternatives. Understanding the driving factors behind these choices is of central interest in many scientific areas, for example in transportation and marketing. Fitting the model to observed data is traditionally achieved by numerically searching for the parameters that maximize the model's likelihood function. However, since the number of model parameters rises quadratically with the number of choice alternatives and covariates, practitioners are quickly faced with numerical challenges such as non-concavity, the existence of local optima and the curse of dimensionality. Furthermore, the choice of an initial parameter vector, from which the numerical optimization is started, highly influences the speed of convergence and the rate of finding the global optimum. This talk presents alternatives to the naive random initialization approach that are based on alternating optimization and subsample estimation and share the same ambitions: to be easy to apply and to substantially improve computation time.
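The subsample idea can be sketched on the simpler binary probit model (the talk concerns the multinomial case): fit cheaply on a small subsample to obtain a starting value, then start the full optimization from there. The optimizer below is a deliberately simple finite-difference descent with backtracking, not the estimation method of the talk, and all settings are invented.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(2)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.25])
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / np.sqrt(2))))  # normal cdf
y = (rng.random(n) < Phi(X @ beta_true)).astype(float)

def nll(beta, X, y):
    """Negative log-likelihood of the binary probit model."""
    p1 = np.clip(Phi(X @ beta), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))

def descend(beta, X, y, iters=50, h=1e-6):
    """Finite-difference gradient descent with backtracking (monotone in nll)."""
    for _ in range(iters):
        g = np.array([(nll(beta + h * e, X, y) - nll(beta - h * e, X, y)) / (2 * h)
                      for e in np.eye(len(beta))])
        step = 1.0
        while nll(beta - step * g, X, y) > nll(beta, X, y) and step > 1e-12:
            step /= 2                      # only accept improving steps
        if step <= 1e-12:
            break
        beta = beta - step * g
    return beta

# Naive initialization at zero vs. a warm start from a 20% subsample
beta_zero = np.zeros(p)
sub = rng.choice(n, size=n // 5, replace=False)
beta_sub = descend(np.zeros(p), X[sub], y[sub], iters=20)  # cheap pre-fit

fit_naive = descend(beta_zero, X, y)
fit_warm = descend(beta_sub, X, y)
print(nll(fit_naive, X, y), nll(fit_warm, X, y))
```

The backtracking line search guarantees that each full-data fit can only improve on its starting value; the hope behind subsample initialization is that the warm start lands close enough to the optimum to save full-data iterations.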

 

Nele Stadtbäumer
Fakultät für Psychologie und Sportwissenschaft, Abteilung Psychologie, Arbeitseinheit 6 - Psychologische Methodenlehre und Evaluation

Predicting cancer patients' quality of life: Comparing the performance of machine learning methods using Monte Carlo simulation

Supervised machine learning (ML) has become a popular tool for covariate selection in large datasets. Many different approaches have been developed, including regression, shrinkage, subset selection and tree-based methods. In this presentation, we demonstrate how to choose the right tool, using data from a longitudinal study of cancer patients as an empirical example. Our goal was to predict patients’ quality of life during after-care from information available at the time of diagnosis. We conducted a large simulation study, comparing and evaluating the performance of twelve different supervised ML approaches (e.g. ordinary least squares regression, ridge regression, the lasso, regression trees, random forests, bagged and boosted trees, stepwise regression). We varied the effect sizes and the correlation structure of the predictors and tested lower- and higher-order interactions and different sample sizes. As the evaluation of performance and prediction accuracy showed, some methods reacted more sensitively to sample size, interaction order, and the effect sizes and correlation structure of the predictors than others. Forward stepwise regression, the lasso and the all-pairs lasso outperformed the other ML methods in lower-order interaction settings. When the data only included higher-order interactions, boosted and bagged trees performed best at detecting relevant predictors.
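As a toy illustration of why the interaction structure of the data matters for method performance, the small Monte Carlo simulation below generates data with a true interaction effect and compares a main-effects-only linear model against one that includes the interaction term. This is a minimal invented example, not the study's twelve-method comparison; because the two models are nested, the richer model can never fit worse in sample.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n=200):
    """Invented data-generating process with a genuine interaction effect."""
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    y = 0.5 * x1 - 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)
    return x1, x2, y

def sse(Xd, y):
    """In-sample sum of squared errors of the least-squares fit."""
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid)

reps, wins = 50, 0
for _ in range(reps):
    x1, x2, y = simulate()
    ones = np.ones_like(x1)
    main = np.column_stack([ones, x1, x2])            # main effects only
    full = np.column_stack([ones, x1, x2, x1 * x2])   # plus interaction
    if sse(full, y) <= sse(main, y):
        wins += 1
print(wins, "of", reps)
```

Out of sample the picture can reverse when the extra term only fits noise, which is exactly the kind of behaviour a simulation study of this sort is designed to map.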

 

Dorian Tsolak & Anna Karmann (joint project)
Universität Bielefeld, Fakultät für Soziologie

Utilizing Pose Estimation To Detect Gender Stereotypes In Images - A Case Study Using Instagram Data

In the past decade, social media has been identified as an important source of digital trace data, reflecting real-world behaviour in an online environment. Many researchers have analyzed social media data, often text messages, to make inferences about people’s attitudes and opinions. Yet many such opinions and attitudes are not saliently expressed, but remain implicit. One example is gender role attitudes, which are hard to measure using textual data. In this regard, images posted on social media such as Instagram may be better suited to analyze the phenomenon. Existing research has shown that men and women differ in how they portray themselves when being photographed (Goffman, 1979; Götz & Becker, 2019; Tortajada et al., 2013). Our study is concerned with the question of how images from social media containing gender self-portrayal can be harnessed as a measure of gender role attitudes. We rely on a subset of a data set consisting of about 800,000 images collected from Instagram in 2018. We present a new approach to quantify gender portrayal using automated image processing. We use a body pose detection algorithm to identify the 2-dimensional skeletons of persons within images. We then cluster these skeletons based on the similarity of their body pose. As a result, we obtain a number of clusters which can be identified as gender-typical poses. Examples of typical female body poses include S-shaped body poses reflecting sexual appeal, the feminine touch (touching one’s own body or hair) implying insecurity, or an asymmetric body posture representing fragility. Typical male body poses include the upper body facing the camera squarely to show strength, or a view aimed into the distance signifying pensiveness. The (self-)portrayal of women and men has been an active field of research across various disciplines, including sociology, psychology and media studies, but has usually been analyzed by qualitative means using small, manually labeled data sets.
We provide an automated approach that allows for a quantitative measurement of gender role attitudes within pictures by examining gender portrayal via body poses. Our results contribute to a better understanding of gender reproduction mechanisms on social media.
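The clustering step can be sketched with synthetic data: each "skeleton" is a flattened vector of keypoint coordinates, and similar poses are grouped with a minimal k-means. The two pose prototypes below are invented stand-ins for real pose-estimation output.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two invented pose prototypes as flattened (x, y) keypoint vectors,
# placed far apart so the clusters are clearly separable
pose_a = np.array([0.0, 0.0, 1.0, 0.0, 0.5, 1.0])
pose_b = pose_a + 10.0
data = np.vstack([pose_a + 0.05 * rng.normal(size=(30, 6)),
                  pose_b + 0.05 * rng.normal(size=(30, 6))])

def kmeans(data, k=2, iters=20):
    """Minimal k-means with deterministic init from the data itself."""
    centers = data[[0, -1]].astype(float).copy()   # one seed per group
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        # assign each skeleton to its nearest cluster centre
        d = np.linalg.norm(data[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned skeletons
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels

labels = kmeans(data)
print(labels)
```

In the actual pipeline the cluster centres would then be inspected and interpreted as gender-typical poses; here the two synthetic groups simply recover the two prototypes.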

 

David Winkelmann
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Dynamic stochastic inventory management in e-grocery retailing

Inventory management optimisation in a multi-period setting with dependent demand periods requires the determination of replenishment order quantities in a dynamic stochastic environment. Retailers are faced with uncertainty in demand and supply for each demand period. In grocery retailing, perishable goods evoke stochastic spoilage and further amplify uncertainty. Assuming a lead time of multiple days, realisations of these stochastic determinants lead to a stochastic inventory at the beginning of the period in which the order is supplied. While existing contributions in the literature focus on the effect of only one of these stochastic components, we propose to integrate all of them into a joint framework, explicitly modelling demand, supply shortages, and spoilage using suitable probability distributions learned from data. As the resulting optimisation problem is analytically intractable in general, we use dynamic optimisation methods incorporating Monte Carlo techniques to fully propagate the associated uncertainties and derive replenishment order quantities. We develop a general inventory management framework and investigate the importance of modelling each source of uncertainty with an appropriate probability distribution.
Additionally, we conduct sensitivity analyses with respect to the different model determinants. Using data from a European e-grocery retailer, we apply our model to a business case in which we combine parameter estimation for the underlying probability distributions with our optimisation approach to verify the practical applicability of our results. Our findings underline the importance of properly modelling stochastic variables with suitable probability distributions for a cost-effective dynamic inventory management process.
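The Monte Carlo approach to choosing an order quantity can be sketched for a single period: simulate demand and spoilage, estimate the expected cost for each candidate quantity, and pick the minimiser. The distributions and cost parameters below are invented, and the sketch omits supply shortages and multi-day lead times, which the full framework models as well.

```python
import numpy as np

rng = np.random.default_rng(5)

n_sim = 10_000
demand = rng.poisson(20, size=n_sim)        # stochastic demand per period
spoil_frac = rng.beta(1, 9, size=n_sim)     # stochastic share that spoils

shortage_cost, holding_cost = 2.0, 1.0      # invented cost parameters

def expected_cost(q):
    """Monte Carlo estimate of expected cost for order quantity q."""
    usable = q * (1.0 - spoil_frac)         # stock surviving spoilage
    short = np.maximum(demand - usable, 0.0)
    left = np.maximum(usable - demand, 0.0)
    return float(np.mean(shortage_cost * short + holding_cost * left))

candidates = np.arange(0, 61)
costs = np.array([expected_cost(q) for q in candidates])
q_opt = int(candidates[np.argmin(costs)])
print(q_opt)
```

Because shortages are penalised twice as heavily as leftovers and roughly a tenth of the stock spoils, the cost-minimising quantity lands somewhat above the mean demand of 20.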

 

Houda Yaquine
Universität Bielefeld, Fakultät für Wirtschaftswissenschaften

Mixture of finite Polya trees model for serology tests. Case study: Covid-19 tests

We use a Bayesian mixture of finite Polya trees model to diagnose disease status and to quantify the discriminatory ability of the continuous tests used for Covid-19 in the absence of gold-standard data. We obtain estimates of the ROC curve associated with each test, the optimal thresholds, and the predictive probabilities of disease.
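For reference, the frequentist building blocks mentioned here, the empirical ROC curve (summarised by its AUC) and an optimal threshold chosen by the Youden index, can be computed directly on a tiny invented sample; the actual work estimates these quantities within a Bayesian mixture-of-finite-Polya-trees model without gold-standard labels.

```python
import numpy as np

diseased = np.array([3.0, 4.0, 5.0])    # test values, disease present (invented)
healthy = np.array([1.0, 2.0, 3.5])     # test values, disease absent (invented)

def sens_spec(t):
    """Sensitivity and specificity when classifying positive at value >= t."""
    sens = np.mean(diseased >= t)       # true positive rate at cut-off t
    spec = np.mean(healthy < t)         # true negative rate at cut-off t
    return sens, spec

# AUC via the Mann-Whitney statistic: P(diseased score > healthy score),
# with ties counted as one half
pairs = diseased[:, None] - healthy[None, :]
auc = float(np.mean(pairs > 0) + 0.5 * np.mean(pairs == 0))

# Optimal threshold by the Youden index J = sensitivity + specificity - 1
thresholds = np.unique(np.r_[diseased, healthy])
J = [sum(sens_spec(t)) - 1 for t in thresholds]
t_opt = float(thresholds[int(np.argmax(J))])
print(auc, t_opt)
```

On this sample, 8 of the 9 diseased/healthy pairs are correctly ordered, so the AUC is 8/9, and the Youden-optimal cut-off sits at 3.0.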

 
