mODa14

Ahipasaoglu Selin

University of Southampton

TITLE: Optimal experimental design under linear constraints: a column generation framework

We present a generalized column generation algorithm for solving large-scale optimal experimental design problems under linear constraints. The approach exploits the sparsity of optimal solutions and applies to D-optimal and A-optimal criteria, as well as their weighted mixtures.

Bailey Rosemary

University of St Andrews

TITLE: Semi-Latin squares and their extensions

A semi-Latin square is an n × n square array of cells, with each cell containing k plots. Treatments are allocated to the plots in such a way that each treatment occurs once in each row and once in each column. The cells then form another system of blocks. Previous work has addressed how to make the treatment allocation optimal. Extended semi-Latin squares incorporate another blocking factor: each column is subdivided into k columns, called lines, so that cells and lines both form systems of incomplete blocks. The problem of creating optimal (or, at least, near-optimal) designs then becomes much harder.
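The definition above can be checked mechanically. The following is a minimal sketch, not taken from the talk: the function name, the nested-list representation and the 2 × 2 example are illustrative assumptions. It uses the fact that a row holds nk plots, so there must be nk distinct treatments.

```python
from collections import Counter


def is_semi_latin_square(square, k):
    """Check the semi-Latin square property for an n x n array of cells.

    `square` is a nested list (hypothetical representation): square[i][j]
    is a collection of the k treatment labels in the cell at row i, column j.
    Each of the n*k treatments must occur exactly once in every row and
    exactly once in every column.
    """
    n = len(square)
    if any(len(row) != n for row in square):
        return False
    if any(len(cell) != k for row in square for cell in row):
        return False
    treatments = {t for row in square for cell in row for t in cell}
    if len(treatments) != n * k:
        return False
    expected = Counter({t: 1 for t in treatments})
    for i in range(n):
        row_counts = Counter(t for cell in square[i] for t in cell)
        col_counts = Counter(t for row in square for t in row[i])
        if row_counts != expected or col_counts != expected:
            return False
    return True


# Hypothetical 2 x 2 example with k = 2 plots per cell (treatments 0..3).
example = [
    [(0, 1), (2, 3)],
    [(2, 3), (0, 1)],
]
# Invalid variant: treatments 0 and 1 appear twice in the first column.
counterexample = [
    [(0, 1), (2, 3)],
    [(0, 1), (2, 3)],
]
```

The check covers only the row and column conditions; it says nothing about the optimality of the allocation with respect to the cell blocks.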

Baur Pauline

TU Dortmund University

TITLE: Optimal Designs for Nonlinear Regression with Dependent Errors using Matrix Norms

(joint work with Kirsten Schorning and Holger Dette) The focus is on the design of experiments for nonlinear parametric regression curves, where the data may be dependent. We are particularly interested in determining sets of optimal time points where the corresponding observations result in a precise estimation of the unknown parameter. For that purpose, we use the transition between discrete-time and continuous-time data structures, assuming that the continuous model provides maximal information and serves as the benchmark for approximation. We derive the likelihood function and Fisher information matrices for both frameworks using results from stochastic analysis. We then determine the optimal time points for the discrete case such that a matrix-norm-based distance between the Fisher information matrices of the discrete and continuous models is minimal. A simulation study illustrates the advantages of the new approach, showing that the use of optimal designs yields substantially better parameter estimates than the use of uniform designs in terms of MSE and prediction. Starting with simple nonlinear regression models that include a parametric nonlinear regression function and a Brownian motion error process, we extend the methodology to more complex cases with time-dependent volatility functions.

Bect Julien

L2S / CentraleSupélec

TITLE: Uncertainty functionals revisited: Concavity and Jensen’s inequality (DPP)

This article presents a theoretical study of uncertainty functionals on general measurable spaces. These functionals are fundamental in experimental design and global sensitivity analysis, where they are used to quantify variability and information content in probabilistic models. As first articulated in DeGroot’s seminal 1962 article, a natural requirement is that uncertainty should decrease on average when additional information is obtained. This requirement is equivalent to the probabilistic form of Jensen’s inequality on the space of probability measures. Our main results show that concavity is necessary but not sufficient for Jensen’s inequality to hold whenever the underlying measurable space is infinite. We also provide practicable sufficient conditions under which the desired property holds. These results contribute to a clearer mathematical foundation for uncertainty quantification. Several open questions are formulated.

Bertoglio Cristóbal

University of Groningen

TITLE: OED with k-space data: application to inverse hemodynamics

Subject-specific cardiovascular models rely on parameter estimation using measurements such as 4D Flow MRI data. However, acquiring high-resolution, high-fidelity functional flow data is costly and taxing for the patient. As a result, there is growing interest in using highly undersampled MRI data to reduce acquisition time and thus the cost, while maximizing the information gain from the data. Examples of such recent work include inverse problems to estimate boundary conditions of aortic blood flow from highly undersampled k-space data. The undersampled data is selected based on a predefined sampling mask, which can significantly influence the performance and the quality of the solution of the inverse problem. While there are many established sampling patterns to collect undersampled data, it remains unclear how to select the best sampling pattern for a given set of inference parameters. In this paper we propose an Optimal Experimental Design (OED) framework for MRI measurements in k-space, aiming to find optimal masks for estimating specific parameters directly from k-space. As OED is typically applied to sensor placement problems in spatial locations, this is, to our knowledge, the first time the technique is used in this context. We demonstrate that the masks optimized by employing OED consistently outperform conventional sampling patterns in terms of parameter estimation accuracy and variance, facilitating a 10x speed-up of the acquisition time while maintaining accuracy.

Chazal Clémentine

ENSAE, Paris

TITLE: A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives

Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called gradient discrepancy, and in particular a kernel gradient discrepancy (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point several novel algorithms are proposed and studied, including a natural generalisation of Stein variational gradient descent, with applications to mean-field neural networks and predictively oriented posteriors presented. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control.

de la Calle Carlos

University of Oviedo

TITLE: Optimal subsampling in presence of model misspecification

In Big Data settings, the covariates of a large population of N observations are often available at low cost, while obtaining the response variable might be expensive or time-consuming. In such settings, one must carefully select a subsample of size n for which the response will be measured and a supervised model subsequently fitted. Classical subsampling strategies, both random and deterministic, are typically designed under the assumption of a correctly specified model, and may therefore produce unreliable predictions when the working model is misspecified. In this work, we address the problem of selecting an optimal subsample of size n from a large population of N observations, with the goal of minimizing the Mean Squared Prediction Error (MSPE) at a set of target prediction points, when a linear working model is fitted to data generated by a true model that includes an unknown misspecification term. The MSPE decomposes into two additive components: the classical I-criterion, which controls prediction variance, and a bias component due to model misspecification. Since the latter depends on an unknown function, we derive two Upper-Bias criteria — upper bounds for the bias component that can be minimized independently of the misspecification. The tighter bound requires no assumption on the misspecification function; a coarser bound connects our framework to the low-condition-number approach of Meng et al. (2020). We apply a memetic algorithm to construct the set of non-dominated solutions with respect to the I-criterion and an Upper-Bias criterion, and identify three representative designs: the I-optimal subsample, the Upper-Bias optimal subsample, and a compromise optimal subsample corresponding to the elbow of the Pareto front. Preliminary simulation results, obtained under several misspecification structures and strengths, are promising, and a more thorough exploration of simulation scenarios is currently ongoing.

de Vries Klazien

University of Groningen

TITLE: Bayesian D-optimal designs for GAMLSS in psychological test norming

We develop Bayesian D-optimal designs for GAMLSS models commonly used in continuous norming of psychological tests, namely heteroscedastic normal and beta-binomial distributions. By applying Bayesian optimal design methodology to GAMLSS models, we demonstrate how prior normative data can be used to inform sampling designs for recalculating outdated normed scores. In a case study for a word development test, we derive exact Bayesian D-optimal designs and evaluate their sensitivity to prior specification. We find that the resulting designs are remarkably robust under moderately informative priors, closely resembling local D-optimal designs, while near-uninformative priors result in more space-filling designs.

Deldossi Laura

Università Cattolica del Sacro Cuore - Milan

TITLE: Random-X prediction error under model misspecification

(joint work with Alvaro Cia-Mina, Jesus Lopez-Fidalgo and Chiara Tommasi) In recent years, advances in computing, particularly the internet, have enabled the collection of Big Data. This has been the key to the current boom of Artificial Intelligence, fed with huge amounts of data, not necessarily of good quality. Processing them with traditional analyses may be unaffordable and new techniques are continuously being developed. Optimal subsampling addresses this challenge by selecting a representative subset that preserves predictive accuracy while reducing computational and labeling costs. When the sampling strategy depends on an assumed model, a critical issue is model misspecification. We develop a theoretical framework for subsampling under misspecification with the goal of minimizing a prediction error metric obtained by integrating with respect to the marginal distribution of explanatory variables. A key contribution is a theorem that provides an analytical expression for “the best” inclusion probabilities in a Poisson sampling scheme, by connecting sampling theory and optimal experimental design for probability sub-measures. Theoretical results are validated through simulations, focusing on simple linear regression with explicit solutions. A real data application is also considered.

Eichhorn Julia

TU Dortmund University

TITLE: Detecting structural variations between different concentration-response experiments


Eidsvik Jo

NTNU, Norway

TITLE: Adaptive spatio-temporal sampling with marine robots

The advent of drones with computing units provides new opportunities and challenges for statistical sampling. In this presentation, we focus on marine robots and in particular so-called autonomous underwater vehicles (AUVs), which can be deployed for in-situ ocean sampling useful for surveying, monitoring or mapping purposes. Equipped with a computer unit, an AUV can sense the ocean environment, update its onboard model based on data, plan where and when to navigate for efficient spatio-temporal sampling efforts, and act using its engines and navigation systems. We demonstrate methods for combining spatio-temporal statistical modeling and autonomous robotic systems for efficient experimental designs. Building on Gaussian random field models, we present an approach for real-time AUV operation and planning. This relies on fast computation of acquisition functions like maximum expected improvement, minimum expected Bernoulli variance or entropy, which have closed-form solutions for the Gaussian model and enable adaptive sampling. In several applications with ocean front mapping, mine tailings pollution monitoring or chlorophyll hotspot sampling, we develop, test and deploy algorithms for efficient AUV sampling.
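As an illustration of why Gaussian models make such acquisition functions cheap to evaluate, the sketch below gives the standard closed form of expected improvement for a Gaussian predictive distribution, together with the pointwise Bernoulli variance of an excursion event. It is not the authors' implementation: the function names are hypothetical, and the Bernoulli variance shown is the current pointwise value p(1 - p), not its expectation over prospective data as used in the talk.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, sd, best):
    """E[max(Y - best, 0)] for Y ~ N(mean, sd^2) (maximisation convention)."""
    if sd <= 0.0:
        # Degenerate predictive distribution: the improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / sd
    return (mean - best) * normal_cdf(z) + sd * normal_pdf(z)


def excursion_bernoulli_variance(mean, sd, threshold):
    """p * (1 - p), where p = P(Y > threshold) for Y ~ N(mean, sd^2)."""
    p = 1.0 - normal_cdf((threshold - mean) / sd)
    return p * (1.0 - p)
```

Both quantities need only the predictive mean and standard deviation at a candidate location, which is what makes evaluation over many candidate waypoints feasible on board.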

Filova Lenka

Comenius University Bratislava

TITLE: Optimal Exact Designs of Multiresponse Experiments under Linear and Sparsity Constraints


Flournoy Nancy

University of Missouri

TITLE: Results and challenges in estimation for dose-finding designs


Friedli Lea

Technical University of Munich

TITLE: CRPS-Based Targeted Sequential Design with Application in Chemical Space

Gaussian processes (GPs) have become a widely used tool for modeling unknown functions across various domains. In many applications, particular interest lies in a specific range of the response, with the goal of identifying inputs that lead to desired outputs. To enhance GP model performance in this setting, we employ weighted scoring rules to develop sequential design strategies that selectively augment the training dataset. Specifically, we study pointwise and integral criteria based on the threshold-weighted Continuous Ranked Probability Score (CRPS), using two different weighting measures. We showcase an application in synthetic chemistry, where the objective is to identify molecules with specific properties. However, the presented acquisition strategies are applicable to a wide range of fields and pave the way to further developing sequential design relying on scoring rules.
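For readers unfamiliar with the threshold-weighted CRPS, the sketch below evaluates twCRPS(F, y) = ∫ w(z) (F(z) - 1{y ≤ z})² dz for a Gaussian predictive distribution by the trapezoidal rule. The indicator weight w(z) = 1{z ≥ threshold} is an assumption chosen for illustration; the abstract does not specify its two weighting measures. The function names and grid settings are likewise hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def crps_gaussian(mean, sd, y):
    """Closed-form (unweighted) CRPS of N(mean, sd^2) at observation y."""
    z = (y - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sd * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def tw_crps_gaussian(mean, sd, y, threshold, n_grid=20001, tail=10.0):
    """Numerical twCRPS with the assumed weight w(z) = 1{z >= threshold}.

    Integrates (F(z) - 1{y <= z})^2 over [threshold, upper] with the
    trapezoidal rule; `upper` lies far enough in the right tail that the
    remaining integrand is negligible.
    """
    lower = threshold
    upper = max(mean, y, threshold) + tail * sd
    h = (upper - lower) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = lower + i * h
        indicator = 1.0 if y <= z else 0.0
        value = (normal_cdf(z, mean, sd) - indicator) ** 2
        end_weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += end_weight * value
    return total * h
```

With a threshold far in the left tail the weighted score approaches the ordinary CRPS, while a high threshold focuses the score on the upper range of the response, which is the region of interest in targeted design.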

Frieri Rosamarie

University of Bologna

TITLE: Combining information from adaptive experiments (DPP)

Combining evidence across experiments is a fundamental problem for both statistical theory and practice. Yet a complete framework of guidelines discussing how aggregation should be approached across differing design settings remains underdeveloped. The problem becomes especially complex under adaptive randomization, in particular when experiments interact through their designs. In such settings, the allocation mechanism could be ancillary or not for the likelihood and may have an impact on the appropriate inference to carry out. This paper addresses the problem of combining data from multiple (adaptive) experiments and discusses how to effectively aggregate information. We address allocation-adaptive and covariate-adjusted procedures, as well as response-adaptive and covariate-adjusted response-adaptive designs, showing how the nature of these different procedures impacts on the information sharing from an inferential perspective. We further examine independent and interacting experiments, including concurrent studies in stratified populations and characterize the asymptotic behaviour of estimators. The results provide a useful perspective on when information borrowing across adaptive experiments preserves statistical validity and when additional work is required.

Gautier Athénaïs

ONERA

TITLE: SLGP-based surrogates for spatially dependent discrete outputs: modelling, uncertainty quantification, and data acquisition (DPP)

Discrete and categorical responses arise in many modern spatial experiments, but most design methodologies focus on mean behaviour or binary classification. We propose Spatial Logistic Gaussian Processes (SLGPs) as general-purpose surrogates for spatially varying discrete distributions. The SLGP representation yields a full posterior on the probability mass function at any location, allowing principled uncertainty quantification and distribution-aware loss functions. We derive acquisition strategies that select new samples based on the current uncertainty.

Hainy Markus

Johannes Kepler University Linz

TITLE: Sequential Bayesian Experimental Design for Prediction in Physical Experiments Informed by Computer Models

In many scientific and engineering domains, physical experiments are often costly, non-replicable, or time-consuming. The Kennedy & O’Hagan (KOH) model framework has become a widely used approach for combining simulator runs with limited experimental observations. Under a Bayesian implementation, the simulator output, model discrepancy, and observation noise are jointly modeled by coupled Gaussian processes, followed by coherent posterior inference and uncertainty quantification. This work presents a genuinely sequential Bayesian experimental design (BED) framework explicitly aimed at improving the predictive performance of the KOH model. We employ a mutual information (MI)–based criterion and develop a hybrid variant that integrates with measures of local model complexity, leading to significantly more efficient design decisions. We further theoretically establish asymptotic links between the MI-based criterion and the classical integrated mean squared prediction error (IMSPE) minimization criterion. In practice, we find the MI-based criterion is more comprehensive and robust than the IMSPE minimization criterion, especially when the model is highly uncertain in the early stages of the experiment. We demonstrate the effectiveness of the proposed methods through both a synthetic example and a real biochemical case study, and compare the MI- and IMSPE-based criteria against several other classical design criteria under sequential (offline) and adaptive (online) BED settings.

Hooker Andrew

Uppsala University

TITLE: Optimizing clinical study designs for CAR-T cell therapy: Development of an efficient sampling strategy through optimal experimental design

Designing efficient studies for emerging therapies poses recurring challenges for optimal experimental design: uncertain prior parameter values, substantial between-subject variability, and practical constraints that preclude fixed sampling schedules. We developed a robust optimal design framework addressing these challenges and applied it to a clinical study of CAR-T cell therapy, a cancer immunotherapy with complex nonlinear dose-exposure and exposure-response dynamics. Building on a published mechanistic nonlinear mixed-effects model of CAR-T cell kinetics and tumor dynamics, we (1) determined the minimal population size required to identify model parameters, (2) compared alternative sampling strategies, and (3) derived flexible sampling windows in place of fixed timepoints, accounting for between-population variability and prior parameter uncertainty. The resulting design specified 60 patients, three fixed tumor-burden assessments (days 0, 30, 90), and three CAR-T cell sampling windows (days 2-4, 12-18, 32-47 post-infusion). In stochastic simulation and estimation, the windowed design outperformed fixed-timepoint designs, confirming its efficiency and robustness. The work illustrates how robust and flexible optimal design can deliver implementable schedules for complex nonlinear mixed-effects models in settings where rigid sampling is infeasible.

Jones Bradley

Retired

TITLE: Unrestricted Bridge (U-Bridge) Designs

We propose a simple method to generate designs for continuous factors when the response data are noisy. Our “unrestricted bridge (U-Bridge) designs” are closely related to the bridge designs of Jones et al., but are much easier to construct. U-Bridge designs provide a middle ground between optimal designs and traditional space-filling designs. They support fitting both polynomial models and more flexible models such as Gaussian process models. This allows for consideration of both variance and bias when the data are noisy.

Kang Lulu

University of Massachusetts Amherst

TITLE: Robust and Sparse Generalized Linear Models for High-Dimensional Data via Maximum Mean Discrepancy (DPP)

High-dimensional datasets are frequently subject to contamination by outliers and heavy-tailed noise, which can severely bias standard regularized estimators like the Lasso. While Maximum Mean Discrepancy (MMD) has recently been introduced as a “universal” framework for robust regression, its application to high-dimensional Generalized Linear Models (GLMs) remains largely unexplored, particularly regarding variable selection. In this paper, we propose a penalized MMD framework for robust estimation and feature selection in GLMs. We introduce an l_1-penalized MMD objective and develop two versions of the estimator: a full O(n^2) version and a computationally efficient O(n) approximation. To solve the resulting non-convex optimization problem, we employ an algorithm based on the Alternating Direction Method of Multipliers (ADMM) combined with AdaGrad. Through extensive simulation studies involving Gaussian linear regression and binary logistic regression, we demonstrate that our proposed methods significantly outperform classical penalized GLMs and existing robust benchmarks. Our approach shows particular strength in handling high-leverage points and heavy-tailed error distributions, where traditional methods often fail.

Karvonen Toni

Lappeenranta–Lahti University of Technology LUT

TITLE: Good and bad kernels

I discuss which covariance kernels one should use and which to avoid when constructing a design with a Gaussian random field. If the kernel is real analytic (e.g., the Gaussian kernel), the MSE tends to zero as the number of design points increases regardless of where these points are placed. For such kernels it is difficult to distinguish a good design from a bad one. If the kernel is rougher, the design must be space-filling if the MSE is to tend to zero.

Koutra Vasiliki

King’s College London

TITLE: Designing experiments on networks

In many modern applications, experiments are conducted on units connected through a network structure, such as social, biological, or technological systems. In these settings, classical assumptions are violated because outcomes may be influenced not only by a unit’s own treatment, but also by treatments on connected units, leading to interference and spillover effects. In this talk, I will present recent work on experimental design in networked settings, aiming to make better use of the network structure to improve the estimation of treatment effects in large or complex systems. I will illustrate the approach through examples, highlighting improvements over standard methods and discussing some of the challenges.

Koval Karina

Heidelberg University

TITLE: Measure transport for Bayesian optimal experimental design

Bayesian optimal experimental design provides a framework for selecting informative experiments in inverse problems, but is often computationally challenging due to the need to approximate complex, high-dimensional posterior distributions. These challenges are particularly pronounced in problems governed by partial differential equations (PDEs), where the high cost of forward solves and the large-dimensional parameter spaces compound the computational burden. This talk explores how measure transport methods can help address these challenges. By constructing deterministic mappings between measures, transport methods enable efficient approximation of posteriors and related quantities arising in design objectives. In particular, such mappings can be reused across multiple data realizations, enabling amortized inference in Bayesian settings. We discuss how such approaches can be used to improve the scalability of experimental design in Bayesian inverse problems and illustrate their potential through several examples.

Lamberti Justin

TU Dortmund University

TITLE: Optimal Designs for Large Time-Dependent Networks (DPP)

We consider the problem of optimally allocating measurement devices in complex time-dependent networks so that precise state estimation of the network at a future time-point is possible. For that purpose, we formulate a linear random-effects model in which the network structure and time-dependence can be separated. We then focus on the problem of optimally allocating measurement devices of different precision by reformulating the discrete optimization problem as a continuous one. Using the A-optimality criterion, we formulate an optimality criterion for the optimal state estimation of the network at a future time-point and we provide an analytical solution of the corresponding A-optimal designs. However, calculating the A-optimal design becomes computationally demanding when the network structure is large and complex. In this situation, the network has to be reduced before calculating the A-optimal design. For that purpose, we propose two methods: one using the quotient graph and the other cutting the network into subgraphs that are then treated separately. We show in a simulation study that the designs based on the cutting approach perform well compared to the analytically determined designs in terms of A-efficiencies.

Lane Adam

Cincinnati Children’s Hospital

TITLE: Improving Efficiency with Relevant Subset Designs

The majority of historical designs are fixed in nature, where fixed indicates a design is specified in advance of the experiment. The conventional wisdom is that the set of fixed designs is sufficient to produce efficient experiments. This work challenges this convention and shows how adaptation can improve the efficiency of an a priori design. Specifically, in the context of a sequential experiment, where observations are accrued in a series of runs, an adaptive design is proposed that is more efficient than the corresponding fixed design.

Leonov Sergei

CSL Behring

TITLE: Integrating Bayesian and Promising Zone Approaches in a Phase 3 Clinical Trial (DPP)

We describe a Phase 3 clinical trial design planned in a rare-disease population to compare a new treatment with a control. The primary efficacy endpoint is time to a predefined event within a fixed follow-up period. The design incorporates two interim analyses: the first is based on Bayesian posterior probabilities, while the second evaluates conditional power and applies a promising zone framework. We present results of simulations that explore the operating characteristics of the proposed design.

Malevich Nadja

Universität Münster

TITLE: Paired Comparisons Designed for Personnel Recruiting (DPP)

Paired comparison experiments are often used to obtain more reliable responses than single-stimulus evaluations. In contrast to standard paired comparisons of categorical factors, in the present setup, the explanatory variables are quantitative and their settings must be chosen from a highly constrained design region. Under these restrictions, optimal designs differ substantially from commonly used standard solutions. Based on the structure of the design region, we derive D- and A-optimal approximate designs that can be readily implemented in practice.

Marshall Daniel

Brunel University of London

TITLE: Experimental Design on Networks: Exploring Nonlinear Social Influence

Experiments on interconnected units, such as users in online social networks, pose distinctive challenges for both inference and design due to network interference and parameter‑dependent information structures. We study the POW-DEG nonlinear social influence model on a retweet network of X/Twitter users and evaluate how designs optimised under different criteria perform under both prior misspecification and network misspecification. The poster presents pairwise profile‑likelihood confidence regions for designs obtained under different optimality criteria and compares pseudo‑Bayesian and point‑prior approaches for the POW-DEG model. Our results show that explicitly optimising designs, rather than relying on randomised or balanced treatment allocations, improves efficiency in nonlinear networked experiments, even when assumptions about the network topology or parameter priors are incorrect.

May Caterina

Università del Piemonte Orientale

TITLE: Optimal design of experiments for dynamic factors and functional responses

We study the problem of optimal experimental designs for the estimation of a non-concurrent linear regression model where both the response and the factors are continuous functions. After obtaining the variance-covariance matrix of the estimator of the functional coefficient which minimizes the integrated sum of squared errors, we extend the definition of optimal design to this estimator, and we provide the expression of the A-optimal and of the D-optimal designs. In order to compute these optimal designs through a suitable algorithm, an experimenter should set in advance the choice of basis functions (e.g., type and dimension) to represent the predictors and to expand the functional coefficients in the direction of the predictors. As we show, the functional form of the optimal designs is affected by this choice. Simulations show that functional A-optimal designs provide substantial gains over naïve, discretization-based strategies by yielding more informative response trajectories and, hence, more accurate estimates of the coefficient surface.

Mutny Mojmir

ETH Zurich, Broad Institute

TITLE: Active Exploration via Experiment Design in Markov Chains

Optimal Experiment Design is a classic field in statistics, closely related to Active Learning in Machine Learning. It assumes that through a series of system interactions, typically queries, we can estimate an unknown quantity. The goal is to develop an algorithmic strategy that optimally gathers information in a budget-constrained scenario. Traditionally, it is assumed that any query can be selected at any time or interaction round. However, in this talk, I will discuss more complex scenarios where interactions change the state of the experimenter, thereby restricting the possible queries. These state transitions are modeled using a Markov chain, and the overall process can be described as a Markov Decision Process (MDP) with a non-linear reward function. I will examine different problems such as exploration, preference learning and maximum identification, with applications in spatial surveillance, personalization and chemical reactor optimization. Additionally, I will link this exposition to the optimal control literature.

Müller Christine H.

TU Dortmund

TITLE: Optimal designs for the Kalman filter applied to electrical power distribution grids (DPP)

We consider dynamic electrical power distribution grids where unknown voltage states are strongly time dependent. These unknown states are estimated via an extended Kalman filter based on power measurements at the nodes of the grid. The design problem here is to determine the nodes of the grid where the measurements should be taken with high precision and those where lower precision is possible. Since the covariance matrices of state estimation and state prediction are time dependent and are additionally of complicated form, we propose simplified design criteria based on a simplified matrix. Although this matrix is still time dependent, it is much easier to treat and methods developed for other networks can be applied. We show this for a star network using the trace and the determinant. A simulation study demonstrates that optimal designs for these simplified criteria are also very good for minimizing the trace and the determinant of the covariance matrices of state estimation and state prediction. Besides a star network, the simulation also treats a network with series connection.

Muñoz del Río Alba

Universidad a Distancia de Madrid

TITLE: Efficient Isothermal Experimental Designs for Parameter Estimation in Predictive Microbiology

Predictive microbiology relies on mathematical models to describe microbial growth under controlled environmental conditions and to support food-safety decision-making. A common experimental strategy consists of performing several isothermal experiments, fitting a primary growth model at each temperature, and then using the estimated maximum growth rates as input data for a secondary temperature-dependent model. Although widely used, this two-step procedure may overlook the uncertainty associated with the intermediate estimates, which can affect the precision of the final model predictions. In this work, Optimal Experimental Design is used to improve the planning of isothermal growth experiments for parameter estimation in predictive microbiology. The Baranyi model is considered as the primary model and the Ratkowsky square-root model as the secondary model, using as a case study the growth of Pseudomonas spp. on button mushrooms. Several experimental scenarios are analysed: fixed-temperature designs for estimating the primary model parameters, c-optimal designs focused on the maximum growth rate, and joint designs in which both temperature and time are treated as controllable variables. The results show that standard isothermal designs may lead to a relevant loss of efficiency compared with optimal designs. In particular, D-optimal designs improve the joint estimation of primary model parameters, while c-optimal designs substantially reduce the uncertainty associated with the estimation of the maximum growth rate. Moreover, designing jointly over temperature and time provides a more efficient strategy for estimating both primary and secondary model parameters, reducing the number of required observation points and allowing relevant savings in experimental duration with only minor losses in efficiency.
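The second step of the conventional two-step procedure described above can be sketched as follows, assuming the usual form of the Ratkowsky square-root model, sqrt(mu_max) = b (T - T_min). The function name and the synthetic, noise-free values are hypothetical and are not results from the study. The sketch treats the first-step growth-rate estimates as exact, which is precisely the limitation the abstract points out.

```python
import math


def fit_ratkowsky(temperatures, growth_rates):
    """Least-squares fit of sqrt(mu_max) = b * (T - T_min).

    The model is linear in T after the square-root transform, so b is the
    slope of sqrt(mu_max) on T and T_min is minus intercept over slope.
    """
    y = [math.sqrt(mu) for mu in growth_rates]
    n = len(temperatures)
    t_bar = sum(temperatures) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in temperatures)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(temperatures, y))
    b = sxy / sxx
    intercept = y_bar - b * t_bar
    t_min = -intercept / b
    return b, t_min


# Synthetic illustration only: hypothetical parameter values and
# hypothetical isothermal temperatures (degrees Celsius).
true_b, true_t_min = 0.03, -5.0
temps = [4.0, 10.0, 15.0, 20.0]
rates = [(true_b * (t - true_t_min)) ** 2 for t in temps]
b_hat, t_min_hat = fit_ratkowsky(temps, rates)
```

On noise-free input the fit recovers the generating parameters; with real first-step estimates, their uncertainty would need to be propagated into this regression.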

Overstall Antony

University of Southampton, UK

TITLE: General Bayesian optimal design of experiments

Bayesian optimal design is a well-established approach to planning experiments. A distribution for the responses, i.e. a statistical model, is assumed which is dependent on unknown parameters. A utility function is then specified giving the gain in information in estimating the true values of the parameters, using the Bayesian posterior distribution. A Bayesian optimal design is given by maximising the expectation of the utility with respect to the distribution implied by the statistical model and the prior distribution for the true parameter values. The approach accounts for the experimental aim, via specification of the utility, and for assumed sources of uncertainty. However, it is predicated on the statistical model being correct. Recently, a new type of statistical inference, known as General Bayesian inference, has been proposed. This is Bayesian-like, i.e. uncertainty for unknown quantities is represented by a posterior distribution, but does not necessarily require specification of a statistical model. The resulting inference is less sensitive to misspecification of the statistical model. This talk introduces General Bayesian optimal design: a framework for optimal design of experiments under General Bayesian inference. A computational approach to find designs in practice is outlined and the framework is demonstrated on exemplars including linear models, and experiments with count and time-to-event responses.

Parker Ben

Brunel, University of London

TITLE: Design of Experiments on Networks for Autoregressive Models (DPP)

We investigate how to design experiments under an autoregressive model, where the response of an experimental unit is affected by the responses of the other units it is connected to, according to an autoregressive parameter as well as the strength of connections between the experimental units, specified by a known adjacency matrix. We derive the Fisher information for this model, and as the optimal design depends on the unknown parameters, we present (pseudo-)Bayesian optimal designs for point priors (locally optimal designs) and simple priors. We extend the model to include blocking and indirect treatment effects, and present optimal designs for a wide range of models useful for experiments on networks, including spatial models. We give examples in crop experiments, marketing experiments on social networks, crossover trials, and spatial experiments, but the ideas and methods can readily be generalised to many designed experiments on networks.

Patan Maciej

University of Zielona Góra

TITLE: Maximin efficient sensor location for estimation of subsets of parameters

Optimal sensor location for parameter estimation of spatiotemporal systems is usually focused on maximizing an optimality criterion defined on the Fisher information matrix (FIM) associated with the estimated parameters. But different optimality criteria may yield different optimal locations. Therefore, strong interest is generated by compromise locations which would produce decent values for the broadest possible class of design criteria. Here a method is proposed to compute sensor locations which maximize the minimal efficiency with respect to the class of orthogonally invariant information criteria. This class is broad enough to include all optimum design criteria encountered in practice. It turns out that the minimal efficiency with respect to this class equals that with respect to a finite set of criteria generalizing the well-known E-optimum design criterion. A major contribution of the approach proposed here is the focus on estimating only a subset of parameters as accurately as possible. The other parameters, called nuisance parameters, must also be estimated although we are interested in neither their values nor their accuracies. In order to make selection of a best subset of gauged sites from a possibly very large set of candidate sites computationally tractable, its convex relaxation is introduced. Two major problems to be tackled are the potential singularity of the optimal information matrix associated with all unknown parameters and the nondifferentiability of the optimality criterion. The former is settled by imposing a constraint on the minimal allowable value of the determinant of the information matrix. The latter is resolved by reformulating the problem as a convex semi-infinite programming problem whose solution is sought by solving a sequence of finite low-dimensional min-max problems using extremely efficient generalized simplicial decomposition.
The excellent performance of the proposed technique is illustrated by an example involving an optimal sensor-node activation problem for a process described by a system of partial differential equations.

Pronzato Luc

CNRS

TITLE: Random and Quasi-Random Designs for Quantisation in High Dimension

Prus Maryna

University of Hohenheim, Germany

TITLE: Optimal Design for Multi-Environment Crop Variety Testing

Reckermann Katharina

Roche

TITLE: Design of Experiments (DoE) in Pharmaceutical Technical Development

Design of Experiments (DoE) has become an indispensable, systematic framework in pharmaceutical technical development for efficiently navigating complex, multivariable systems. In process development, DoE is utilized to identify critical process parameters (CPPs) to optimize yield and product quality, while simultaneously establishing a robust design space resilient to inherent manufacturing variability. Furthermore, applying DoE to analytical development ensures the optimization of method accuracy and precision.

Riccomagno Eva

University of Genova

TITLE: Data-Driven Strategies for Detecting and Sampling Misrepresented Subgroups (DPP)

Economic policy research often investigates population well-being, with particular attention to the links between unequal living conditions, low education, and social exclusion. Sample surveys such as European Union Statistics on Income and Living Conditions (EU-SILC) are widely used for this purpose and inform public policy, yet their sampling designs may fail to adequately represent rare, hard-to-sample, or under-covered subgroups. We propose a generalisable data-driven integrated workflow to detect outliers in survey data that may signal under-represented subgroups. Identified groups can then be characterised to inform targeted resampling strategies that improve survey inclusiveness. An empirical application using the 2019 EU-SILC data for the Italian region of Liguria shows that citizenship, material deprivation, large household size, and economic vulnerability are indicators of under-representation.

Rodriguez Diaz Juan M.

University of Salamanca

TITLE: Optimal designs for drug detection in the human body and further applications

A review study of models for detecting the presence and evaluating the amount of different kinds of drugs in the human body will be performed from the perspective of designing the best experimental framework for estimating model parameters. Some variations of these models, usually derived from the compartmental-model approach, such as scenarios involving several incorporations at different times, will be analyzed and compared with the existing literature. New models for single and multiple intake scenarios will be introduced; in the case of alcohol consumption, these may be justified according to the actual behavior of the incorporation and elimination phases of this substance in humans. Optimal and quasi-optimal designs will be proposed. Finally, a procedure for estimating the quantity of alcohol in an individual at a past time, based on one or more observations taken at the present, will be proposed and discussed using these models. In cases where the estimation could lead to a punitive situation for the subject, a conservative procedure for deciding whether or not the individual should be penalized will be proposed, always aiming for a balance between the need to monitor alcohol consumption (for instance, when driving) and the presumption of innocence.

Ronco Davide

Politecnico di Torino

TITLE: How Deep Learning can support the Steel wheel design: a practical case study

In the automotive industry, vehicle wheels play a crucial role in ensuring overall vehicle safety. A failure or an inadequate wheel design can compromise not only the vehicle’s stability and handling, but also the safety of its occupants, potentially leading to severe consequences. One of the key aspects in wheel design is fatigue strength, namely the ability of the wheel to withstand repeated cyclic loads without developing damage. Ensuring adequate fatigue resistance is essential to guarantee that the wheel maintains its safety performance throughout the entire service life of the vehicle. However, wheel design is not limited to safety requirements alone. Automotive manufacturers also impose a wide range of additional constraints that must be satisfied, such as weight reduction, vehicle handling performance, compatibility with braking systems, wheel covers and valves, aesthetic appearance, and, increasingly important, environmental sustainability, including the reduction of CO₂ emissions throughout the product life cycle. As a result, wheel design becomes a complex optimization process in which multiple competing factors must be carefully balanced. In many cases, achieving the optimal solution requires a large number of design iterations, often exceeding fifty. Each iteration is typically supported by advanced structural analyses based on the Finite Element Method (FEM), making the overall optimization process computationally intensive and potentially requiring several weeks to complete. The objective of this work is to accelerate this development process by drastically reducing design times and enabling optimal solutions to be achieved within minutes through the application of Deep Learning technologies. 
The proposed methodology aims to optimize the wheel design process by leveraging the capabilities of neural networks and Deep Learning techniques to significantly reduce computational time while maintaining a good level of accuracy and guiding designers toward the most promising design direction. The approach is structured into two main phases: the training phase and the validation phase.
1. Training phase. The first phase involves training a neural network using data obtained from FEM structural analyses. In this stage, the wheel dimensional parameters (namely the geometrical dimensions defining the wheel shape) are used as input variables for the neural network. The outputs consist of the stress results in the most critical areas obtained from the FEM simulations to which the wheel is subjected, the weight, and the raw material needed for its construction. During the training process, the model learns the complex relationships between the geometric parameters and the mechanical responses (stress distributions) of the wheel by processing large amounts of data. This learning process enables the neural network to develop an understanding of the wheel’s structural behavior, allowing it to generate reliable predictions without the need to perform the numerous iterative FEM calculations traditionally required during the design process.
2. Validation phase. Once trained, the neural network is tested and validated through a verification process in which the predicted results are compared with those obtained using traditional FEM methods. The outputs generated by the neural network are evaluated to assess their consistency and reliability with respect to the FEM data, showing a strong correlation between the two approaches. A major advantage of the proposed methodology is that, once the model has been trained, it is capable of generating predictions within minutes rather than requiring weeks to complete FEM iterations across dozens or even hundreds of design configurations.
The neural network can rapidly explore the design space by generating and evaluating tens of thousands of input–output combinations in a very short time. This enables a far more efficient search for the optimal solution, allowing the design team to focus on more creative and innovative aspects of the development process instead of repetitive computational iterations. The present work is not limited to the description of the proposed methodology, but also includes exploratory results that demonstrate the effectiveness of the developed model. In particular, a practical case study is presented in which the neural network is used to explore different wheel geometric configurations while simultaneously optimizing structural performance and reducing computational time. The speed and reliability offered by Deep Learning therefore not only accelerate the design process, but also provide a significant competitive advantage. The ability to rapidly evaluate multiple design configurations and identify optimal solutions much more efficiently than traditional approaches enables companies to respond more quickly to market demands and reduce the time-to-market of the final product.

Rosa Samuel

Comenius University Bratislava

TITLE: Removing nonoptimal points for D-optimal exact designs

One of the most common problems in optimal design of experiments is computing D-optimal exact designs for a large finite design space. In this talk, which is based on [Harman, Rosa (2026): “Removal of redundant candidate points for the exact D-optimal design problem”, Statistics and Computing], we give necessary conditions that must be satisfied by any support point of a D-optimal exact design. These conditions enable rapid elimination of redundant candidate points without loss of optimality, thereby reducing memory requirements and runtime of subsequent exact-design algorithms. We also prove that, for a sufficiently large number of trials, the support of every D-optimal exact design is contained in a set that typically coincides with the support of a D-optimal approximate design. We demonstrate the approach on problems with candidate sets of up to 100 million points. The proposed approach reduced the initial candidate sets by several orders of magnitude, thereby making it possible to compute D-optimal exact designs for these problems via mixed-integer second-order cone programming.

Rosenberger William

George Mason University

TITLE: A Framework for Two-Stage Enrichment Trial Designs (DPP)

We develop a framework for a two-stage enrichment design to investigate a treatment effect in a clinical trial. A learning stage identifies a potentially benefitting subgroup, and that information is carried forward to a confirmatory stage with adapted enrollment criteria. Final inference combines data from both stages, and subgroup performance is evaluated using false inclusion and false exclusion rates. In stage 1, to estimate a benefitting subgroup, we propose a nonparametric regression level-set approach that relaxes monotonicity assumptions among continuous biomarkers and the treatment effect. The resulting optimal subgroup is designed to balance statistical validity with ethical considerations in patient enrollment. We describe potential methods to conduct the second stage and relevant inference, topics that should be considered research in progress.

Roth Katrin

Bayer AG

TITLE: Design challenges for dose-finding studies in drug development - Part 1: Estimands and Intercurrent Events

Dose-finding studies play a critical role in drug development, aiming to identify doses that provide an optimal balance between efficacy and safety. While substantial methodological progress has been made over the past decades, recent developments in regulatory guidance and increasing therapeutic complexity introduce new challenges for the design of such studies. In this first part of the presentation, we discuss challenges arising from the implementation of the estimands framework, with a particular focus on intercurrent events. Intercurrent events such as treatment discontinuation, use of rescue medication, or dose modifications are common in dose-finding studies and can fundamentally affect the interpretation of dose–response relationships. We illustrate how different strategies for handling intercurrent events translate into different estimands and how these choices impact optimal design considerations and modeling assumptions. The implications for commonly used dose-finding approaches are discussed, highlighting potential mismatches between design objectives and the targeted estimand. Overall, this presentation aims to raise awareness of some challenges in applying optimal design theory in pharmaceutical dose-finding studies.

Sada Allo Maider

Universidad Pública de Navarra

TITLE: Joint Estimation of Target Dose and Slope in Binary Response Models: A Geometric Approach (DPP)

A common goal in many fields is to identify the level of a stimulus (dose) that has a prescribed percentage of positive binary responses. We also consider the estimation of the slope of the response function at the dose of interest to calibrate the importance of error in the dose estimate. We use a parameterization whose information matrix depends directly on the parameters of interest. By mapping the problem to an induced dose space, the optimization process becomes independent of the unknown parameters. This dependency, however, is reintroduced when transforming the optimal doses from the induced space back into the real dose space. Unfortunately, the symmetrical properties that appear in the geometrical reasoning of previous works no longer hold. In contrast, the geometry presented here provides a mechanistic understanding that goes beyond that of two-dimensional graphic methods. For example, one can explicitly observe how designs are constructed from convex combinations of elementary designs, revealing the specific contribution of each dose to the total information and providing a comprehensive view of the matrices of all possible designs. For a wide class of optimality criteria and response models, we find that the optimal design for the joint estimation of the dose and the slope is a two-point design symmetric about the median. This discovery reduces the computational complexity from a five-parameter to a single-parameter problem, as we show for finding the standardized A- and c-optimal designs.

Schmelter Thomas

Bayer AG

TITLE: Design challenges for dose-finding studies in drug development - Part 2: Finding more than the right dose

Dose-finding studies play a critical role in drug development, aiming to identify doses that provide an optimal balance between efficacy and safety. While substantial methodological progress has been made over the past decades, recent developments in regulatory guidance and increasing therapeutic complexity introduce new challenges for the design of such studies. This second part of the presentation addresses emerging challenges when the objective of a dose-finding study extends beyond identifying a single dose to selecting an appropriate treatment regimen. This is, e.g., the case in settings with titration schemes or flexible regimens adapting to patient response, or in settings where an optimal interval between treatment applications also needs to be identified. We discuss how this added complexity affects both design and analysis, and the increased dimensionality of the design space. Methodological and practical considerations are illustrated using motivating examples. Overall, this presentation aims to raise awareness of some challenges in applying optimal design theory in pharmaceutical dose-finding studies.

Schorning Kirsten

TU Dortmund University

TITLE: Optimal Designs for Drug Combination Studies

In toxicology, drug combination studies become increasingly important as they investigate potential interaction effects rather than the individual impacts of toxic substances. We first investigate potential interactions between two different toxic substances. In particular, we define an extension of effective doses to the bivariate case. Using nonlinear surface models to describe the dose-dose-response relationship, this definition yields the concept of effective dose combinations, which are contour lines of the fitted surface. We then propose an optimality criterion that aims to estimate these effective dose combinations precisely. In particular, an optimal design minimizes the width of the confidence band of the contour lines of interest. Optimal design theory is developed for this problem, including equivalence theorems and efficiency bounds. The performance of the optimal design is illustrated through examples that model dose-combination data using various nonlinear surface models. It is demonstrated that the optimal design for identifying effective dose combinations yields a more precise estimation of the effective dose combinations than the classical ray or factorial designs. Finally, we present different approaches to extend the results to combination studies involving a large number of toxic substances.

Singh Arpan

CTRU, University of Leeds

TITLE: Non-Orthogonal Factorial Designs for Trials of Therapist-Delivered Interventions: Allowing for Therapist Capacity, Availability and Turnover

Disentangling therapist effects from intervention effects has long been considered a challenge in randomised trials of therapist-delivered interventions. When separating these effects is of interest, factorial designs have been proposed in which therapists and interventions are regarded as two potentially interacting components of a complex intervention, represented by two treatment factors (one random, one fixed). In these trial designs, each therapist delivers each intervention (therapists and interventions are crossed), and the resulting therapist-intervention combinations are randomised to patients. It is assumed that each therapist is allocated the same number of patients (equal capacity), that all therapists within a centre are available at the point of randomisation (equal availability), and that therapists remain for the duration of the trial (no turnover). In practice, these are strong assumptions that if enforced may lead to delayed or suspended randomisation, prolonged recruitment and increased trial costs. We propose a set of novel randomisation strategies that explicitly account for therapist capacity, availability, and attrition while maintaining desirable statistical properties. Using extensive simulation studies, we compare the proposed and existing methods across a wide range of realistic trial scenarios, assessing trade‑offs between recruitment efficiency and statistical performance. The resulting guidance provides a practical framework for trialists designing trials of therapist-delivered interventions under real‑world constraints.

Solinus Robin

Ruhr-Universität Bochum

TITLE: Optimal Design for Spatio-Temporal Stochastic Processes

Spatially and temporally varying data can often be understood as a realisation of a spatio-temporal stochastic process to take the dependence structure into account. This general point of view includes different models as special cases. The common problem is to choose an optimal spatio-temporal product design, meaning that one measures at each location at all chosen time points. A natural approach is to globally minimise the mean squared prediction error (MSPE) of the best linear predictor, where the latter only depends on the mean and covariance structure of the process and the measurement error. We discuss two ways to achieve this, first by minimising the integrated MSPE and second by minimising the maximum MSPE, which in other words optimises the worst case. Naturally, this is more complicated when the process’s mean and covariance function are unknown. In this case, a parametric model is assumed and different approaches to account for the uncertainty introduced by the parameter estimation are discussed.

Somogyi Pál

Department of Applied Mathematics and Statistics, Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia

TITLE: Computing optimal designs of experiments for multivariate-response models

Despite the increasing prevalence of vector observations, computation of optimal experimental design for multi-response models has received limited attention. To address this problem within the framework of approximate designs, we introduce mREX (Metrika 89:2, 217:242, 2025), an algorithm that generalizes the randomized exchange algorithm REX (J Am Stat Assoc 115:529, 348-361, 2020), originally specialized for single-response models. The mREX algorithm incorporates several improvements: a novel method for computing efficient sparse initial designs, an extension to all differentiable Kiefer’s optimality criteria, and an efficient method for performing optimal exchanges of weights. For the most commonly used D-optimality criterion, we propose a technique for optimal weight exchanges based on the characteristic polynomial of a matrix. The mREX algorithm typically converges to optimal designs faster than available alternative methods, although it does not require advanced mathematical programming solvers.

Spangl Bernhard

BOKU University

TITLE: Stratified sampling optimum allocation for fixed precision

In stratified sampling, the optimal sample size allocation for fixed total sample size is determined by the Neyman-Tschuprov allocation formula. We derive a “reversed Neyman-Tschuprov formula” to determine the optimal sample size allocation for fixed precision. The motivation is applications in which the total sample size is not specified but the desired precision is. This presentation is based on ongoing joint work with Norbert Kaiblinger.
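The two allocation problems can be illustrated with a minimal sketch. It uses the textbook form of the allocation, ignores the finite-population correction and integer rounding, and is not the authors' derivation. The function names, stratum weights and standard deviations are hypothetical.

```python
def neyman_allocation(n_total, weights, sds):
    """Classical allocation for a fixed total sample size:
    n_h is proportional to W_h * S_h."""
    products = [w * s for w, s in zip(weights, sds)]
    total = sum(products)
    return [n_total * p / total for p in products]


def allocation_for_fixed_variance(target_var, weights, sds):
    """Smallest total sample size (no finite-population correction) whose
    stratified-mean variance sum_h W_h^2 S_h^2 / n_h equals target_var."""
    products = [w * s for w, s in zip(weights, sds)]
    total = sum(products)
    n_total = total ** 2 / target_var
    return n_total, [n_total * p / total for p in products]


def stratified_variance(weights, sds, sizes):
    """Variance of the stratified mean for given stratum sample sizes."""
    return sum((w * s) ** 2 / n for w, s, n in zip(weights, sds, sizes))


# Hypothetical strata: population shares W_h and standard deviations S_h.
weights = [0.5, 0.3, 0.2]
sds = [10.0, 20.0, 40.0]

n_total, sizes = allocation_for_fixed_variance(4.0, weights, sds)
achieved = stratified_variance(weights, sds, sizes)
```

In this simplified setting the fixed-precision problem keeps the same proportions as the fixed-size problem, and only the total sample size is solved for. In practice the stratum sizes would then be rounded up to integers.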

Stadler Alexandra

Johannes Kepler University Linz

TITLE: Improving AI Explainability through Design of Experiments

In artificial intelligence (AI), the complexity of many models and processes surpasses human understanding, making it challenging to determine why a specific prediction is made. This lack of transparency is particularly problematic in critical fields like healthcare, where trust in a model’s predictions is paramount. As a result, the explainability of machine learning (ML) and other complex models has become a key area of focus. Efforts to improve model explainability often involve experimenting with AI systems and approximating their behavior through interpretable surrogate mechanisms. However, these procedures can be resource-intensive. Optimal design of experiments, which seeks to maximize the information obtained from a limited number of observations, offers promising methods for improving the efficiency of these explainability techniques. To demonstrate this potential, we explore Local Interpretable Model-agnostic Explanations (LIME), a widely used method introduced by Ribeiro et al. (2016). LIME provides explanations by generating new data points near the instance of interest and passing them through the model. While effective, this process can be computationally expensive, especially when predictions are costly or require many samples. LIME is highly versatile and can be applied to a wide range of models and datasets. In this work, we focus on models involving tabular data, regression tasks, and linear models as interpretable local approximations. By utilizing techniques from optimal design of experiments, we reduce the number of function evaluations of the complex model, thereby significantly reducing the computational effort of LIME. We consider this modified version of LIME to be energy-efficient, and thus preferable.
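The general idea can be sketched as follows, under strong simplifying assumptions. A local linear surrogate is fitted by ordinary least squares, once on randomly perturbed points and once on a two-level factorial arrangement of the same number of points. This is neither the authors' method nor the API of the lime package. The function black_box, the instance x0 and the perturbation radius delta are hypothetical, and the kernel weighting used by LIME is omitted.

```python
import itertools

import numpy as np


def black_box(x):
    # Hypothetical stand-in for an expensive model to be explained.
    return np.sin(x[:, 0]) + x[:, 1] ** 2 + 0.5 * x[:, 0] * x[:, 2]


def local_linear_surrogate(x0, offsets):
    """Fit y ~ b0 + b^T (x - x0) by least squares on the perturbed points.

    Returns the coefficients and det(X^T X), the D-criterion value of the
    perturbation design for the first-order model.
    """
    X = np.column_stack([np.ones(len(offsets)), offsets])
    y = black_box(x0 + offsets)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, np.linalg.det(X.T @ X)


x0 = np.array([0.3, -0.2, 1.0])   # instance to be explained (hypothetical)
delta = 0.1                       # perturbation radius (hypothetical)

# Designed perturbations: the 2^3 full factorial at +/- delta (8 model calls).
factorial = delta * np.array(list(itertools.product((-1.0, 1.0), repeat=3)))

# Random perturbations in the same cube, with the same number of model calls.
rng = np.random.default_rng(0)
random_offsets = rng.uniform(-delta, delta, size=factorial.shape)

coef_design, det_design = local_linear_surrogate(x0, factorial)
coef_random, det_random = local_linear_surrogate(x0, random_offsets)
```

For a first-order surrogate on a cube, the factorial points attain the largest possible value of det(X^T X) for this run size. The same number of model evaluations therefore yields more precise local coefficients than random perturbations within the cube.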

Stallrich Jonathan

North Carolina State University

TITLE: An Optimal Design Framework for Lasso Sign Recovery

Supersaturated designs investigate more factors than there are runs, and are often constructed under a criterion measuring a design’s proximity to an unattainable orthogonal design. The most popular analysis identifies active factors by inspecting the solution path of a penalized estimator, such as the lasso. Recent criteria encouraging positive correlations between factors have been shown to produce designs with more definitive solution paths so long as the active factors have positive effects. Two open problems affecting the understanding and practicality of supersaturated designs are: (1) do optimal designs under existing criteria maximize support recovery probability across an estimator’s solution path, and (2) why do designs with positively correlated columns produce more definitive solution paths when the active factors have positive sign effects? To answer these questions, we develop criteria maximizing the lasso’s sign recovery probability. We prove that an orthogonal design is an ideal structure when the signs of the active factors are unknown, and that a design with constant, small positive correlations is ideal when the signs are assumed known. A computationally efficient design search algorithm is proposed that first filters through optimal designs under new heuristic criteria to select the one that maximizes the lasso sign recovery probability.

Strouwen Arno

Strouwen Statistics/PumasAI/KULeuven

TITLE: From Optimal Designs to Optimal Policies

At mODa13, several speakers argued that adaptive experimental design should be framed as learning a policy rather than solving a sequence of static optimization problems. In the years since, the machine learning community has made rapid progress on exactly this vision. Deep Adaptive Design, normalizing flows, variational approaches to expected information gain, amortized inference networks, and diffusion-based posterior estimators now enable fully adaptive Bayesian experimental design in real time. This talk surveys these developments and connects them to concepts familiar to the mODa community. We illustrate these ideas through applications to nonlinear dynamical systems, building on our earlier mODa contributions on optimal experimental design for parameter estimation in biological systems. We show how amortized policy-based design can be combined with mechanistic ODE models to tackle design under model uncertainty, and discuss how model discrimination techniques can guide the identification of unknown model components.

Stufken John

George Mason University

TITLE: Data subsampling strategies

Subsampling from large datasets has attracted growing attention over the past decade, leading to the development of a wide range of methods. After a brief high-level overview, this presentation will examine two representative approaches in greater detail: one model-based and one model-free. The model-based method is designed to achieve efficient estimation of model parameters, whereas the model-free approach emphasizes accurate prediction of future observations.

Suzuki Yuya

Aalto University

TITLE: Approximation of differential entropy in Bayesian optimal experimental design

Sverdlov Alex

Novartis

TITLE: Optimal Experimental Designs for Wiener Process Models in Clinical Research Studies (DPP)

We consider disease progression models formulated as Wiener processes and develop new approaches for optimizing design of clinical studies that employ such models. We focus on a model that jointly characterizes the mean drift, within-process variance, and between-subject (population) variance. Covariate effects on these parameters are modeled through a generalized regression framework, yielding a flexible class of parametric models. Within this framework, various optimal design problems can be formulated and solved using established theoretical tools. The proposed methodology is illustrated through optimal design problems motivated by randomized controlled trials for disease-modifying therapies.

Tarima Sergey

University of Kentucky

TITLE: The effect of informative adaptation on properties of Bayes estimators

(joint work with Nancy Flournoy) This work formalizes the concept of “channeled dependence” for sequential sampling that was introduced by Flournoy and Tarima (2023). A posterior density only depends on the interim decision (e.g., a dose change or a stopping decision) through induced random sample sizes, while the impact of the interim decision rule on the distribution of the observable random variable is not fully captured by the interim decision. To address the invariance of the posterior to the interim decision rule, we consider a channeled posterior, which uses the conditional-on-interim-decision density while the traditional posterior uses the unconditional density. Compared to the traditional posterior mean, the channeled posterior mean showed smaller mean squared error (MSE) integrated over the prior density across the majority of Monte-Carlo simulation studies. The performance of post-testing estimators is also compared using frequentist and Bayes MSEs.

Vazquez Alan

Tecnologico de Monterrey

TITLE: Can Large Language Models Generate Two-Level Fractional Factorial Designs?

Two-level fractional factorial designs permit the study of multiple factors using a limited number of runs. Traditionally, these designs are obtained from catalogs available in standard textbooks or statistical software. Modern Large Language Models (LLMs) can now produce two-level fractional factorial designs, but the quality of these designs has not previously been assessed. In this presentation, we evaluate how well two popular classes of LLMs, namely GPT and Gemini models, construct two-level fractional factorial designs of various sizes. To this end, we use prompting techniques to develop a high-quality set of design construction tasks for the LLMs. We compare the designs obtained by the LLMs with the optimal designs and provide recommendations.
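
For readers unfamiliar with the object being generated, the following is a minimal sketch of the textbook construction of a regular half fraction: a full factorial in k-1 factors, with the last factor set to the product of the others (for k = 4, the generator D = ABC). The function names `half_fraction` and `is_orthogonal` are illustrative assumptions and are not taken from the talk; the check shown is only one basic quality property that a generated design could be tested against.

```python
from itertools import combinations, product


def half_fraction(k):
    """Regular 2^(k-1) design: full factorial in k-1 factors, last factor = product of the rest."""
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


def is_orthogonal(design):
    """True if every pair of columns has zero inner product (main effects are uncorrelated)."""
    k = len(design[0])
    return all(
        sum(run[i] * run[j] for run in design) == 0
        for i, j in combinations(range(k), 2)
    )


design = half_fraction(4)  # 8 runs for 4 factors, generator D = ABC
for run in design:
    print(run)
print("orthogonal main effects:", is_orthogonal(design))
```

The same orthogonality and balance checks can be applied to any candidate design coded as rows of -1 and +1, regardless of how it was produced.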

Wang HaiYing

University of Connecticut

TITLE: Antithetic Designs and Stratification for Subsampling

Massive datasets present both opportunities and challenges for statistical estimation, with subsampling emerging as an effective strategy to balance statistical efficiency against computational cost. Existing approaches typically draw independent or conditionally independent subsamples, focusing on selecting individually informative observations. We propose an antithetic subsampling framework that intentionally induces negative dependence among sampled observations to reduce the variance of the resulting estimator. Our method selects groups of observations with designed negative correlations, yielding improved estimation efficiency relative to independent subsampling. Additionally, we will present a Maximum-Variance-Reduction Stratification (MVRS) method that partitions the data to reduce the variance of subsampling estimators. MVRS incurs only a linear additional computational cost and can be seamlessly combined with existing random subsampling designs to further boost efficiency.
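
The variance argument behind negative dependence can be seen in the simplest two-observation case. This is a standard identity, not the authors' estimator; the symbols \(Y_1\), \(Y_2\), \(\sigma^2\) and \(\rho\) are introduced here for illustration only.

```latex
% Two sampled terms with common variance \sigma^2 and correlation \rho
\operatorname{Var}\!\left(\frac{Y_1 + Y_2}{2}\right)
  = \frac{\sigma^2}{2}\,(1 + \rho)
% Independent draws give \rho = 0 and variance \sigma^2/2;
% any design that induces \rho < 0 gives a strictly smaller variance.
```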

WANG Yaping

East China Normal University

TITLE: Uniform projection designs under the stratified L2-discrepancy (DPP)

This paper studies a uniform projection criterion for space-filling designs under the stratified L2-discrepancy. The criterion, denoted by \(\Phi_{SD}\), is the average squared stratified L2-discrepancy over all two-dimensional projections. For U-type \((n, m, s^p)\) designs, we derive an explicit formula for \(\Phi_{SD}\) in terms of row-pairwise weighted hierarchical distances, and we establish sharp lower and upper bounds with equality conditions. We further show that many known optimal constructions attain the lower bound of \(\Phi_{SD}\), and that designs attaining the lower bound of the full stratified L2-discrepancy also attain the lower bound of \(\Phi_{SD}\). The criterion can be evaluated in \(O(n^2 m)\) time, with a modest reduction in arithmetic operations compared with direct projection-wise evaluation. Numerical studies illustrate the theoretical results and show that \(\Phi_{SD}\) is effective for assessing low-dimensional projection uniformity.
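
Written out, the verbal definition above corresponds to the following average. The notation is an assumption made here for illustration: \(D\) is a design with \(m\) columns, \(D_{jk}\) is its projection onto columns \(j\) and \(k\), and \(\mathrm{SD}(\cdot)\) denotes the stratified L2-discrepancy, whose explicit form is not given in the abstract.

```latex
\Phi_{SD}(D)
  = \frac{2}{m(m-1)} \sum_{1 \le j < k \le m} \mathrm{SD}^2\!\left(D_{jk}\right)
% The factor 2/(m(m-1)) is one over the number of two-dimensional projections, \binom{m}{2}.
```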

Xiao Qian

Shanghai Jiao Tong University

TITLE: Batch-Sequential Maximum One-Factor-At-A-Time Designs

Accurate prediction of satellite trajectories in low Earth orbit requires reliable quantification of atmospheric drag, yet high-fidelity drag simulators are too computationally expensive for routine large-scale uncertainty quantification. In the GRACE satellite-drag application considered here, each simulator run is costly, and reliable screening of physical inputs is needed before constructing surrogate models or performing broader uncertainty propagation. This case study exposes a practical limitation of the original Maximum One-Factor-At-A-Time (MOFAT) design: although MOFAT is economical for one-shot factor screening, it does not provide a principled way to augment the design sequentially when the initial run budget is insufficient. We address this problem by developing two batch-sequential extensions of MOFAT: the Interpolation MOFAT (I-MOFAT) and Sliced MOFAT (S-MOFAT) designs. Both preserve the structure required for estimating total Sobol’ indices while allowing the design to grow in stages. I-MOFAT supports local refinement through intermediate levels, whereas S-MOFAT promotes broader exploration through a sliced construction. Numerical studies and the GRACE case study show that these designs identify influential inputs more reliably than competing screening methods under constrained computational budgets.
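
To make the target quantity concrete, the sketch below estimates total Sobol' indices with the standard Jansen estimator, which, like one-factor-at-a-time designs, uses pairs of runs that differ in a single input. It is not the MOFAT, I-MOFAT, or S-MOFAT construction: it uses plain random sampling on the unit cube, and the names `total_sobol_indices` and `toy_model` are hypothetical.

```python
import random


def total_sobol_indices(f, dim, n, seed=0):
    """Jansen estimator: T_i = E[(f(x) - f(x with x_i resampled))^2] / (2 Var f)."""
    rng = random.Random(seed)
    base = [[rng.random() for _ in range(dim)] for _ in range(n)]
    alt = [[rng.random() for _ in range(dim)] for _ in range(n)]
    y = [f(x) for x in base]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    totals = []
    for i in range(dim):
        acc = 0.0
        for x, x_alt, v in zip(base, alt, y):
            moved = list(x)
            moved[i] = x_alt[i]  # change only input i: a one-factor-at-a-time pair
            acc += (v - f(moved)) ** 2
        totals.append(acc / (2.0 * n * var))
    return totals


def toy_model(x):
    """Additive test function on [0, 1]^3; the third input is inert."""
    return x[0] + 2.0 * x[1]


indices = total_sobol_indices(toy_model, dim=3, n=20000)
print([round(t, 3) for t in indices])
```

For this toy function the exact total indices are 0.2, 0.8 and 0, so the inert third input is screened out. The estimator needs n(dim + 1) model runs, which is the cost that economical screening designs aim to reduce.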

Yang Min

University of Illinois at Chicago

TITLE: Subdata Selection for Measurement-Constrained Linear Regression under Model Misspecification

In many modern applications, obtaining response variables is costly, time-consuming, or technically challenging, making it impractical to measure outcomes for all available units. In such settings, covariates may be available for a large pool of candidates, but responses can only be collected for a small, carefully selected subset, making the choice of which units to measure a consequential decision. Most existing subsampling methods are developed for massive datasets, emphasizing the trade-off between statistical accuracy and computational cost, and typically relying on the assumption of a correctly specified model. These features limit their suitability in measurement-constrained settings, where statistical efficiency is paramount and model misspecification is often unavoidable. To address these challenges, we develop a new subset selection framework for linear regression that accommodates potential model misspecification. The proposed approach targets predictive expected loss and yields an \(A\)-optimality–type criterion that depends only on covariate information, requiring no response observations at the selection stage. We refer to this method as Expected Loss Optimal Selection (ELOS). By focusing on predictive risk rather than model-based optimality, ELOS remains well-defined and robust under misspecification. Simulation studies and real data applications demonstrate that ELOS achieves strong predictive performance and compares favorably with existing methods across a range of scenarios.
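
The point that an A-optimality-type criterion needs no responses at the selection stage can be illustrated with the classical A-criterion for simple linear regression, trace((X'X)^{-1}), which is a function of the covariates alone. This sketch is not the ELOS criterion, whose exact form is not given in the abstract; the pool values and the names `a_criterion` and `best` are assumptions for illustration, and exhaustive search is used only because the pool is tiny.

```python
from itertools import combinations


def a_criterion(xs):
    """trace((X'X)^{-1}) for the model y = b0 + b1 * x, using the 2x2 closed form."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx  # determinant of X'X = [[n, sx], [sx, sxx]]
    if det <= 0:
        return float("inf")  # slope is not estimable from this subset
    return (n + sxx) / det


# Hypothetical covariate pool; no response values are used anywhere.
pool = [-2.0, -1.5, -0.3, 0.0, 0.2, 0.4, 1.1, 1.9]
best = min(combinations(pool, 4), key=a_criterion)
print(best, a_criterion(best))
```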

Youssef Noha

The American University in Cairo

TITLE: Maximum Entropy Sequential Design with ML-II, INLA, and MCMC Updating: A Comparative Study (DPP)

This paper investigates maximum entropy sequential design for deterministic computer experiments using Gaussian process surrogates under a fixed simulation budget. We compare three hyperparameter updating strategies within a common entropy-driven framework: (i) Type-II maximum likelihood (ML-II/empirical Bayes), (ii) INLA-based approximate Bayesian updating, and (iii) MAP with full Bayesian propagation via MCMC. Performance is evaluated using pointwise root mean squared error (RMSE), posterior predictive RMSE, integrated posterior variance (IV), an entropy proxy, and computational cost across benchmarks of increasing dimensionality, including the one-dimensional Forrester function, the two-dimensional Branin function, a three-dimensional nonlinear simulator, and the six-dimensional Hartmann function. Across all settings, ML-II and MAP+FullBayes achieve comparable predictive accuracy, while differences in uncertainty quantification are more pronounced. ML-II contracts predictive uncertainty more aggressively, whereas MAP+FullBayes maintains more conservative uncertainty estimates due to integration over hyperparameters. INLA exhibits higher computational cost, and its uncertainty behavior is configuration-dependent: it retains larger uncertainty in one- and two-dimensional settings but contracts uncertainty most aggressively in the three-dimensional case, suggesting that its behavior is not uniformly conservative across benchmarks. These results demonstrate that entropy-based sampling effectively identifies informative regions of the input space, while the treatment of hyperparameter uncertainty governs the trade-off between computational efficiency and uncertainty quantification. The findings highlight the importance of selecting an appropriate updating strategy depending on whether accurate point prediction or reliable uncertainty estimates are the primary objective.
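
The entropy-driven step shared by the three strategies can be sketched for the simplest case of fixed hyperparameters. For a noise-free Gaussian process, the joint entropy of the design decomposes into a sum of log conditional variances, so the greedy rule picks the candidate with the largest current predictive variance. The kernel, length scale, candidate grid, and function names below are assumptions for illustration; the compared methods differ precisely in how the hyperparameters would be updated between steps, which this sketch omits.

```python
import math


def sq_exp_kernel(x, y, length_scale=0.3):
    """Squared-exponential covariance with unit prior variance (fixed hyperparameters)."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def greedy_max_entropy(candidates, n_points, kernel=sq_exp_kernel):
    """Repeatedly pick the candidate with the largest conditional variance."""
    m = len(candidates)
    # Covariance among candidates, conditioned on the points chosen so far.
    cov = [[kernel(a, b) for b in candidates] for a in candidates]
    chosen = []
    for _ in range(n_points):
        j = max(range(m), key=lambda i: cov[i][i])
        chosen.append(candidates[j])
        pivot = cov[j][j]
        col = [cov[i][j] for i in range(m)]
        # Schur-complement update: condition every candidate on observing point j.
        cov = [
            [cov[r][c] - col[r] * col[c] / pivot for c in range(m)]
            for r in range(m)
        ]
    return chosen


grid = [i / 20 for i in range(21)]
design = greedy_max_entropy(grid, 4)
print(design)
```

Because the conditional variance of a noise-free Gaussian process does not depend on the observed responses, this fixed-hyperparameter version can be computed entirely in advance; the sequential character of the design arises once hyperparameters are re-estimated after each run.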

Yu Jun

Beijing Institute of Technology

TITLE: Weighted Conformal Field Design for Sensitivity Experiments (DPP)

Multi-factor sensitivity experiments are fundamental for estimating the probability surface of a binary response and identifying critical reliability boundaries under strict budget constraints. To address the limitations of existing pointwise methods in high-dimensional settings, we propose the Weighted Conformal Field Design. First, we introduce a novel asymmetric design criterion based on filling distance that explicitly targets the manifold of interest, ensuring dense geometric coverage while strictly penalizing failure regions. Second, we develop an active learning algorithm integrating weighted conformal prediction to optimize this objective. By replacing model-dependent variance estimates with distribution-free prediction sets, our approach provides robust uncertainty quantification that remains statistically valid even in data-scarce regimes plagued by model misspecification. Supported by rigorous theoretical guarantees on validity and convergence, extensive experiments demonstrate that our algorithm significantly outperforms state-of-the-art baselines in estimation accuracy and sampling efficiency.

Zhou Yongdao

Nankai University, China

TITLE: Sequential Experimental Designs for Kriging Model (DPP)

Computer experiments have become an indispensable alternative to complex physical and engineering experiments. The Kriging model is the most widely used surrogate model, whose core goal is to minimize the discrepancy between the surrogate model and the true response model over the entire experimental domain. However, existing sequential design methods may have certain limitations. Observation-based batch sequential designs have not been sufficiently explored in existing studies, while pointwise sequential designs tend to be less efficient in terms of information utilization and resource allocation. Such methods typically require numerous iterations to accumulate sufficient information, which may lead to prolonged experimental cycles. To bridge these research gaps, this paper proposes two novel pointwise sequential design criteria and a general batch sequential design framework. Furthermore, the proposed batch sequential design framework addresses the inherent point clustering problem in naive batch selection, thereby enabling the efficient extension of any sequential criterion to batch scenarios. The proposed designs exhibit favorable robustness. Simulations on some test functions demonstrate that the proposed methods outperform existing approaches in terms of fitting accuracy in most cases, verifying the robustness of the developed approaches.

Zhu Hao

Johannes Kepler University Linz

TITLE: Amortized Predictive Expected Information Gain for Adaptive Path-Constrained Experimental Design

Adaptive experimental design for spatial fields becomes particularly challenging when measurements must follow a feasible path, as in mobile sensing, environmental monitoring, or robotic exploration. In this setting, greedy one-step design can be short-sighted, while direct evaluation of long-horizon expected information gain is usually computationally prohibitive. We propose an amortized Bayesian design framework that combines predictive expected information gain with sequential decision-making methods for path-constrained experiments. Our utility targets information about future predictive quantities, rather than only model parameters. Under a fully Bayesian Gaussian process model, teacher utilities are obtained using nested Monte Carlo estimates of predictive expected information gain. A history-conditioned surrogate is then trained to map observed trajectories, candidate future locations, and responses to fast estimates of terminal and incremental information gain. This provides a cheap reward model for long-horizon planning. Building on this surrogate, we investigate two complementary planning mechanisms. First, a soft actor-critic policy is trained to make non-myopic path decisions under movement constraints. Second, a short-horizon model predictive control or reranking layer can refine the policy’s proposed actions using local surrogate scores, including predictive information gain and a complexity-aware term based on local curvature of the posterior predictive mean. This hybrid structure is intended to combine global long-horizon planning with local adaptive correction. The planned study compares the proposed SAC/MPC design strategy with greedy expected-information-gain policies and amortized design approaches inspired by Deep Adaptive Design. Preliminary synthetic examples are used to examine when non-greedy path planning provides advantages over one-step acquisition, especially in spatial fields with localized complex structure.
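
The nested Monte Carlo structure used for the teacher utilities can be illustrated on a toy problem where the answer is known in closed form. The sketch below estimates the ordinary parameter expected information gain for the model y = theta * d + noise with theta ~ N(0, 1), not the predictive expected information gain under a Gaussian process used in the talk. All names and default sample sizes are assumptions for illustration.

```python
import math
import random


def log_normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def nested_mc_eig(d, noise_sd=1.0, n_outer=2000, n_inner=100, seed=0):
    """Nested Monte Carlo estimate of E[log p(y | theta, d) - log p(y | d)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        # Outer loop: simulate a parameter and an outcome from the joint model.
        theta = rng.gauss(0.0, 1.0)
        y = theta * d + rng.gauss(0.0, noise_sd)
        log_lik = log_normal_pdf(y, theta * d, noise_sd)
        # Inner loop: estimate the evidence p(y | d) with fresh prior draws.
        inner = [
            math.exp(log_normal_pdf(y, rng.gauss(0.0, 1.0) * d, noise_sd))
            for _ in range(n_inner)
        ]
        total += log_lik - math.log(sum(inner) / n_inner)
    return total / n_outer


def exact_eig(d, noise_sd=1.0):
    """Closed form for the linear-Gaussian toy model."""
    return 0.5 * math.log(1.0 + (d / noise_sd) ** 2)


print(nested_mc_eig(1.0), exact_eig(1.0))
```

Each utility evaluation costs n_outer * n_inner likelihood evaluations, which gives a sense of why a fast surrogate for the information gain is attractive for long-horizon planning.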