# UU statistics research seminar series

The Department of Statistics organises weekly seminars during the working periods of every semester. The seminars take place in seminar room H317 on Wednesdays, starting at 10:15. Interested participants are welcome.

The seminars cover all research areas actively pursued at the department, in both theory and applications. For the major part of the seminar series, national and international speakers are invited with the aim of connecting the department with colleagues across Sweden and abroad who work in areas of mutual interest. The rest of the series in each semester is reserved for doctoral students, who present the latest findings of their research.

For questions and more information about the UU Statistics seminars, please contact Rauf Ahmad.

## Statistics seminars autumn 2023

## Statistics seminars spring 2023

### 2023-05-24 (review seminar): Posterior rate of convergence for composite quantile regression

Speaker: Lukas Arnroth, Department of Statistics, Uppsala University. Time and place: 2023-05-24 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

Composite quantile regression is based on a convex combination of single-quantile loss functions and enjoys many advantages over single quantile regression. The Bayesian extension is based on a finite mixture of asymmetric Laplace densities. This paper mainly aims to contribute to the theoretical justification of Bayesian composite quantile regression from the perspective of Bayesian density estimation. As such, we further show that the asymmetric Laplace distribution can be used for Bayesian density estimation in general. We obtain upper bounds on rates of convergence for mixtures of asymmetric Laplace densities. For finite mixtures we obtain the parametric rate up to a logarithmic factor, and a slower rate for infinite mixtures.
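As an illustration of the loss described in the abstract, the following is a minimal sketch of a composite quantile loss built from the standard check (pinball) loss. It assumes the common formulation with a shared linear predictor and one intercept per quantile level; the function names and arguments are hypothetical and are not taken from the paper.

```python
def check_loss(u, tau):
    """Single-quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def composite_quantile_loss(y, fitted, intercepts, taus, weights):
    """Convex combination of single-quantile losses.

    Assumed formulation (illustrative only): a shared linear predictor
    `fitted` and one intercept per quantile level in `taus`.
    `weights` must be non-negative and sum to one.
    """
    if abs(sum(weights) - 1.0) > 1e-12 or min(weights) < 0:
        raise ValueError("weights must form a convex combination")
    total = 0.0
    for b_k, tau_k, w_k in zip(intercepts, taus, weights):
        # Sum the check loss over all observations for this quantile level.
        loss_k = sum(check_loss(y_i - b_k - f_i, tau_k)
                     for y_i, f_i in zip(y, fitted))
        total += w_k * loss_k
    return total
```

With equal weights over a grid of quantile levels, this reduces to the average of the individual quantile losses.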

### 2023-05-17 (review seminar): Can Model Averaging Improve Propensity Score Based Estimation of Average Treatment Effects?

Speaker: Valentin Zulj, Department of Statistics, Uppsala University. Time and place: 2023-05-17 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

In drawing causal inferences from observational data, researchers often need to model the propensity score. To date, the literature on the estimation of propensity scores is vast, and includes covariate selection algorithms as well as super learners and model averaging procedures. The latter often tune the estimated scores either to be very accurate or to provide the best possible result in terms of covariate balance.

This paper focuses on using inverse probability weighting to estimate average treatment effects, and makes the assertion that the context requires both accuracy and balance to yield suitable propensity scores. Using Monte Carlo simulation, the paper studies whether frequentist model averaging can be used to simultaneously account for both balance and accuracy in order to remove some bias from estimated treatment effects. The simulations suggest that the combined procedure does not result in a consistent or substantial reduction of bias.
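For readers unfamiliar with inverse probability weighting, a minimal sketch of the textbook (Horvitz-Thompson type) estimator of the average treatment effect is given below. It assumes the propensity scores have already been estimated by some model; the function name is hypothetical, and the sketch does not implement the model averaging procedure studied in the paper.

```python
def ipw_ate(outcomes, treatments, propensity_scores):
    """Inverse probability weighting estimate of the average treatment effect.

    outcomes: observed outcomes y_i
    treatments: binary treatment indicators t_i (1 = treated, 0 = control)
    propensity_scores: estimated P(T = 1 | X = x_i), strictly between 0 and 1
    """
    n = len(outcomes)
    treated_mean = 0.0
    control_mean = 0.0
    for y, t, e in zip(outcomes, treatments, propensity_scores):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        treated_mean += t * y / e              # reweighted treated outcomes
        control_mean += (1 - t) * y / (1 - e)  # reweighted control outcomes
    return treated_mean / n - control_mean / n
```

Scores close to 0 or 1 produce very large weights, which is one reason the quality of the estimated propensity scores matters for this estimator.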

### 2023-05-03: Penalized QMLE with parameters on the boundary and an application to the class of ARCH(Q) for large Q

Speaker: Anders Rahbek, University of Copenhagen, Denmark. Time and place: 2023-05-03 at 10:15–11:30, Ekonomikum room H317.

Abstract to be announced.

### 2023-04-19: Relative survival and other summary measures of survival useful for population-based cancer data

Speaker: Therese Andersson, Karolinska Institutet (KI), Stockholm. Time and place: 2023-04-19 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

I will introduce the field of population-based cancer survival analysis and its role in cancer control. I will especially cover the concept of relative survival and why it is often preferred over cause-specific survival for the study of cancer patient survival using data collected by population-based cancer registers. I will also present different summary measures of cancer patient survival, such as the loss in life expectancy due to cancer and crude vs net probabilities of death. Each of these measures shows different aspects of cancer patient survival, and examples from published population-based studies will be presented and discussed.

### 2023-04-12: Cross-Lingual Dependency Parsing

Speaker: Sara Stymne, Department of Linguistics and Philology, Uppsala University. Time and place: 2023-04-12 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

Lately, there has been an increasing amount of work on cross-lingual learning, and how models for a target language can be improved using data from other languages. In this talk, I will focus on dependency parsing, the task of constructing a syntactic tree for an input sentence.

The resource Universal Dependencies, with harmonised treebanks for more than 100 languages, serves as a great test bed for cross-lingual learning. I will show how we can improve results for a specific language, by including training data from one or more other languages. This is useful particularly for low-resource languages, but I will show that it can be useful also for high-resource languages, especially when there is no in-language data for the target domain. I will also discuss how to choose suitable transfer languages for a given target language.

### 2023-03-22: Variational inference of dynamic factor models with arbitrary missing data

Speaker: Erik Spånberg, Department of Statistics, Stockholm University. Time and place: 2023-03-22 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

Many forecasting institutions deal with large multifaceted data sets, with time series of different frequencies, sample sizes, publication dates and general availability patterns. The same institutions are often under crucial time constraints, limiting their analytical capabilities. Dynamic factor models (DFMs) are popular tools for analysing large data sets; however, they are often estimated by point-estimate methods, disregarding parameter uncertainty. Parameter uncertainty can be addressed by Bayesian inference, but that may be too computationally costly. Variational inference is a method that approximates Bayesian inference. I show that it can be applied to DFMs, including arbitrary availability patterns in the data, with large computational gains. This allows for deeper and more versatile analysis for the most fast-paced forecasting institutions.

### 2023-03-08: Distances for gene-expression transcriptomics: Metric clustering of mRNA transcripts

Speaker: Jim Blevins, Department of Statistics, Uppsala University. Time and place: 2023-03-08 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

In molecular biology, gene expression is measured for hundreds or thousands of genes for each observational unit (tissue or microbes). Genes with similar vectors of gene expression (mRNA transcripts or transcriptomics) are paired; gene-pair similarity is quantified using metrics on mRNA vectors. Using such distances on paired genes, biostatisticians identify gene sets (and gene modules) for further investigation. We develop metrics for the identification of interesting gene sets. We relate our metric approach to statistical methods, e.g., the least-squares methods of Cramér, Wold, and Whittle.

### 2023-02-08: Modelling brain connectivity using hierarchical VAR models

Speaker: Anders Lundquist, Umeå University. Time and place: 2023-02-08 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

Analysis of brain connectivity is important for understanding how information is processed by the brain. We propose a hierarchical vector autoregression (VAR) model for analysing brain connectivity, which models so-called functional and effective connectivity simultaneously and allows for both group- and single-subject inference as well as group comparisons. We illustrate our approach in a resting-state fMRI data set with autism spectrum disorder (ASD) patients and healthy controls, and compare with similar models used in existing connectivity literature.

In the talk, I will spend a fair amount of time introducing the application before diving into the statistical issues. This is joint work with Bertil Wegmann (LiU), Anders Eklund (LiU), and Mattias Villani (LiU/SU).

## Statistics seminars autumn 2022

### 2022-12-14 (PhD review seminar): Objective Causal Inferences based on Real World Data: A comparative effectiveness evaluation of abiraterone acetate against enzalutamide

Speaker: Paulina Joneus, Department of Statistics, Uppsala University. Time and place: 2022-12-14 at 10:15–12:00, Ekonomikum room H317.

**Abstract**

Regulatory authorities are recognizing the need for real-world evidence (RWE) as a complement to randomized controlled trials (RCTs) in the approval of drugs. However, RWE needs to be fit for regulatory purposes. There is an ongoing discussion regarding whether the pre-publication of a protocol on appropriate repositories, e.g., ClinicalTrials.gov, would increase the quality and objectivity of RWE, as is the case for RCTs. This paper illustrates that an observational study based on a pre-published protocol can entail the same level of detail as a protocol for a randomized experiment.

The strategy is exemplified by designing a comparative effectiveness evaluation of abiraterone acetate (AA) against enzalutamide (ENZ) in clinical practice. These two cancer drugs are prescribed to patients with advanced prostate cancer. Two complementary designs, including pre-analysis plans, were published before data on outcomes and proxy-outcomes were obtained. The underlying assumptions are assessed using the proxy-outcomes, and both analyses show an increased mortality risk from being prescribed AA compared to ENZ.

### 2022-11-30 (PhD review seminar): Selection bias: An R package for bounding the selection bias

Speaker: Stina Zetterström, Department of Statistics, Uppsala University. Time and place: 2022-11-30 at 10:15–12:00, Ekonomikum room H317.

**Abstract**

Selection bias is a systematic error that can occur when subjects are included or excluded in the analysis based upon some selection criteria for the study population. This bias can threaten the validity of the study, and methods for estimating the effect of selection bias are desired. One method of estimating the effect of selection bias is through sensitivity analysis, and one type of sensitivity analysis is bounding the bias.

In this work, we present an R package that can be used to calculate two such previously proposed bounds for selection bias. One bound is based on assumptions of values of sensitivity parameters, and this bound is referred to as the SV bound. The other bound is based solely on the observed data, and is therefore referred to as the assumption-free (AF) bound. Furthermore, we derive feasible regions for the sensitivity parameters as well as conditions for the SV bound to be sharp, where sharp means that the bias can be as large as the bound. We illustrate both the R package and the sharpness of the bound with a simulated dataset that emulates a study where the effect of Zika virus on microcephaly in Brazil is investigated.

### 2022-11-23: Modelling consensus emergence with nonlinear dynamics

Speaker: Yvette Baurne, Department of Statistics, Lund University. Time and place: 2022-11-23 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

The study of emergent, bottom-up processes has long been of interest within organisational and group research. Emergent processes refer to how dynamic interactions among lower-level units (e.g. individuals) over time form a new, shared construct or phenomenon at a higher level (e.g. work group). To properly study the emergence of shared constructs, one needs models and data that take into account both variability across individuals and groups (multilevel) and variability over time (longitudinal).

We make three contributions to the modelling of the emergent process of consensus. First, we propose a formal definition of consensus emergence. Second, we identify two separate patterns of consensus emergence and introduce two models to account for these patterns: the Homogeneous Consensus Emergence Model (HomCEM) and the Heterogeneous Consensus Emergence Model (HetCEM). Third, we show how Gaussian processes can be used to further extend the consensus emergence models, allowing them to capture nonlinear dynamics, at both the individual and the group level, in emergent processes.

### 2022-11-09: Identification of (seasonal) ARMA models revisited

Speaker: Johan Lyhagen, Department of Statistics, Uppsala University. Time and place: 2022-11-09 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

The standard approach nowadays when identifying a stationary ARMA model is to analyse the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The model is then estimated, and the residuals are checked for remaining autocorrelation, significance of parameters, etc. If the model doesn’t pass the check, it is revised. One problem with this approach is that the ACF/PACF pattern in the residuals does not trivially carry over to the ACF/PACF implied by the model. In this paper, a graphical tool is derived that aims to help with the identification of ARMA models.

### 2022-10-19: Variational inference for max-stable processes

Speaker: Alexander Engberg, Department of Statistics, Uppsala University. Time and place: 2022-10-19 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

Max-stable processes provide natural models for spatial extreme values observed at a set of spatial sites. Full likelihood inference for max-stable data is, however, complicated by the form of the likelihood function, as it contains a sum over all partitions of sites. As such, the number of terms to sum over grows rapidly with the number of sites and quickly becomes prohibitively burdensome to compute.

We propose a variational inference approach to full likelihood inference that circumvents the problematic sum. To achieve this, we first posit a parametric family of partition distributions from which partitions can be sampled. Second, we optimise the parameters of that family in conjunction with the max-stable model to find the partition distribution best supported by data, and to estimate the max-stable model parameters. In a simulation study we show that our method enables full likelihood inference in higher dimensions than previous methods, and is readily applicable to data sets with a large number of observations. Furthermore, our method can easily be used in a Bayesian framework.

### 2022-10-05: A Bayesian semi-parametric approach for inference on the population partly conditional mean from longitudinal data with dropout

Speaker: Maria Josefsson, Department of Statistics, Uppsala University. Time and place: 2022-10-05 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

Studies of memory trajectories using longitudinal data often result in highly non-representative samples due to selective study enrolment and attrition. An additional bias comes from practice effects that result in improved or maintained performance due to familiarity with test content or context. These challenges may bias study findings and severely distort the ability to generalise to the target population.

In this study we propose an approach for estimating the finite population mean of a longitudinal outcome conditioning on being alive at a specific time point. We develop a flexible Bayesian semi-parametric predictive estimator for population inference when longitudinal auxiliary information is known for the target population. We evaluate sensitivity of the results to untestable assumptions and further compare our approach to other methods used for population inference in a simulation study. The proposed approach is motivated by 15-year longitudinal data from the Betula longitudinal cohort study. We apply our approach to estimate lifespan trajectories in episodic memory, with the aim to generalize findings to a target population.

### 2022-09-28: Diffusion Index forecast models with smooth transitions

Speaker: Ingrid Mattsson, Department of Statistics, Uppsala University. Time and place: 2022-09-28 at 10:15–11:30, Ekonomikum room H317.

**Abstract**

In this paper we extend the Diffusion Index (DI) forecast model, introduced by Stock & Watson (2002), by allowing for smooth transition type nonlinearity (DIST). This is achieved by incorporating a logistic transition function in a factor augmented forecast model, where the factors are estimated using principal components.
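To make the role of the logistic transition function concrete, the sketch below shows a generic smooth transition forecast built from estimated factors. The parameter names (`gamma` for the smoothness, `c` for the location, `s` for the transition variable) and the exact functional form are assumptions chosen for illustration; the specification used in the paper may differ.

```python
import math


def logistic_transition(s, gamma, c):
    """Logistic transition function G(s) = 1 / (1 + exp(-gamma * (s - c)))."""
    return 1.0 / (1.0 + math.exp(-gamma * (s - c)))


def smooth_transition_forecast(factors, linear_coefs, nonlinear_coefs,
                               s, gamma, c):
    """Forecast = linear part + G(s) * nonlinear part (illustrative form).

    `factors` stands in for estimated principal-component factors.
    When G(s) is near 0 the forecast follows the linear regime only;
    when G(s) is near 1 both coefficient vectors contribute fully.
    """
    linear_part = sum(a * f for a, f in zip(linear_coefs, factors))
    nonlinear_part = sum(b * f for b, f in zip(nonlinear_coefs, factors))
    return linear_part + logistic_transition(s, gamma, c) * nonlinear_part
```

Setting all entries of `nonlinear_coefs` to zero recovers a linear factor-augmented forecast, which is the restriction a linearity test would examine.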

Our main contribution is to theoretically justify bootstrap tests for linearity and parameter constancy, based on the wild bootstrap algorithm for linear factor augmented models, developed by Gonçalves & Perron (2014). A Monte Carlo experiment is performed, and it is shown that the wild bootstrap test has desirable small sample properties even in the most general case, where the test based on the regular OLS estimator has considerable size distortions. An empirical example is further included to demonstrate how the DIST model can outperform its linear counterpart in a forecasting situation.

### 2022-09-14: Flexible Latent Variable Model Framework for Latent DIF Detection

Speaker: Gabriel Wallin, Umeå University and London School of Economics. Time and place: 2022-09-14 at 10:15–12:00, Ekonomikum room H317.

**Abstract**

In psychometrics, a field concerned with theory and techniques for psychological and educational measurement, it is standard procedure to assess the presence of differential item functioning (DIF). DIF means that questionnaire/test items function differently for different groups of respondents, after controlling for the latent construct that is intended to be measured. For example, it occurs in educational testing when groups defined by, e.g., gender or ethnicity have different probabilities of answering a given item correctly, after controlling for the latent ability that the exam is intended to measure. As such, it relates to fairness in educational testing.

When DIF detection is not based on known groups such as gender or ethnicity but on unknown, homogeneous subgroups, the problem is typically referred to as latent DIF detection, which will be the focus of this talk. To that end, I will present a flexible modelling framework that combines a general latent factor model with a latent class model to capture both normal response behaviour for non-DIF items and deviant behaviour for DIF items. In the proposed model, a sparse DIF effect parameter is introduced that is allowed to vary between the latent classes identified by the model.

Our main contributions are twofold. Firstly, unlike previous research on DIF detection, no prior knowledge of DIF-free items is required. Instead, they are identified through an L1 penalty on the DIF effect parameter in the marginal likelihood function of the model. Secondly, the proposed model considers a multiple latent group setting, whereas only two groups (a so-called manifest and a focal group) are typically facilitated in current DIF detection methods. We propose an EM algorithm for model estimation, where the maximization step is carried out using a quasi-Newton proximal algorithm. Results based on both simulated and empirical data together with theoretical results will be presented.