2023 IMS International Conference on Statistics and Data Science (ICSDS)
We are interested in evaluating the efficacy of Benralizumab, a medication to treat asthma, using computed tomography (CT) scans captured during expiration and inspiration, before and after one year of treatment. The medical working hypothesis is that patients whose condition has improved will exhibit enhanced expiration scans after treatment, manifested by higher Hounsfield unit values. This results in a shift to the right of the histogram built from the post-treatment image compared with the pre-treatment one.
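To make the link between a histogram shift and quantile functions concrete, here is a minimal sketch using only the Python standard library. The voxel intensities are hypothetical draws (not patient data), and the 30 HU shift is an arbitrary illustrative value. It shows that a pure rightward shift of the histogram translates into a quantile function that is larger by the same amount at every level $t$.

```python
import random
from statistics import quantiles

rng = random.Random(1)

# Hypothetical voxel intensities (in Hounsfield units) for one patient; not real data.
pre = [rng.gauss(-850.0, 60.0) for _ in range(5000)]
# A pure rightward shift of the histogram by 30 HU (illustrative value).
post = [x + 30.0 for x in pre]

# Empirical quantile functions evaluated at t = 0.01, 0.02, ..., 0.99.
q_pre = quantiles(pre, n=100, method="inclusive")
q_post = quantiles(post, n=100, method="inclusive")

# Pointwise difference between post- and pre-treatment quantile functions.
shift = [b - a for a, b in zip(q_pre, q_post)]
print(min(shift), max(shift))
```

Working with quantile functions rather than raw histograms avoids having to choose bins, which is one reason the regression below is formulated on quantile functions.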
Irpino and Verde^{1} adapted the classical linear regression method so that it can be applied to quantile functions instead of real-valued observations. We generalize their approach and obtain confidence intervals and the laws of the estimators in the model via a maximum likelihood approach.
Within the space $S_Q$ of quantile functions, we define the subspace $S_{Q_\mathcal{N}}$ of quantile functions of normally distributed random variables.
We consider a probability space $(\Omega, \mathcal{B}, \mathbf{P})$, where $\Omega$ is the set of all possible outcomes of the random experiment of drawing a patient. Let $\mu\in\mathbf{R}$ and $\sigma^2,\beta>0$ be unknown parameters, and define the random element $\textrm{Q}$: \(\begin{array}{rcl} \textrm{Q}:=\textrm{Q}_{\mu,\sigma^2,\beta}: \Omega & \rightarrow & S_{Q_\mathcal{N}} \\ \omega & \mapsto & q_X:=\textrm{Q}(\omega), \textrm{ the quantile function of } X \sim \mathcal{N}(u, v^2), \end{array}\) where $u=U(\omega)$ and $v=V(\omega)$ are realizations of the independent real-valued random variables $U \sim \mathcal{N}\bigl(\mu, \sigma^2\bigr)$ and $V \sim \mathcal{E}xp\left(\beta^{-1}\right)$. We write $\textrm{Q}\sim Q\mathcal{N}(\mu,\sigma^2,\beta^{-1})$.
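The construction of $\textrm{Q}$ can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in this work: the function name `sample_quantile_normal` is hypothetical, and $\mathcal{E}xp(\beta^{-1})$ is read as an exponential law with rate $\beta^{-1}$ (so that $V$ has mean $\beta$). The sketch uses the fact that the quantile function of $\mathcal{N}(u, v^2)$ is $t \mapsto u + v\,\Phi^{-1}(t)$, where $\Phi^{-1}$ is the standard normal quantile function.

```python
import random
from statistics import NormalDist


def sample_quantile_normal(mu, sigma2, beta, rng):
    """Draw one realization of Q ~ QN(mu, sigma2, 1/beta).

    Returns (u, v, q), where q is the quantile function of N(u, v^2).
    Assumption: Exp(1/beta) is parameterized by its rate, so E[V] = beta.
    """
    u = rng.gauss(mu, sigma2 ** 0.5)   # U ~ N(mu, sigma^2)
    v = rng.expovariate(1.0 / beta)    # V ~ Exp with rate 1/beta (assumed)
    std_normal = NormalDist()          # standard normal, provides Phi^{-1}

    def q(t):
        # Quantile function of N(u, v^2), defined for t in ]0, 1[.
        return u + v * std_normal.inv_cdf(t)

    return u, v, q


rng = random.Random(0)
u, v, q = sample_quantile_normal(mu=0.0, sigma2=1.0, beta=2.0, rng=rng)
grid = [0.1, 0.25, 0.5, 0.75, 0.9]
values = [q(t) for t in grid]
print(u, v, values)
```

Each call returns one element of $S_{Q_\mathcal{N}}$, fully described by the pair $(u, v)$; the median $q(0.5)$ equals $u$.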
Let $Q_1^X, Q_2^X, \ldots, Q_n^X$ be deterministic predictor quantile functions in $S_{Q_{\mathcal{N}}}$, and let $\epsilon_1, \ldots, \epsilon_n$ be quantile normal random elements distributed as $Q\mathcal{N}(0,\sigma^2,\beta^{-1})$.
Let $\textrm{Q}_1^Y, \ldots, \textrm{Q}_n^Y$ be i.i.d. copies of a quantile-valued random element $\textrm{Q}^Y:\Omega \rightarrow S_{Q}$. For individual $i$, we denote by $\bar{x}_{Q_i}$ the mean of the quantile function $Q_i^X$, and for all $t\in]0,1[$ we define the centered quantile function $Q^{cX}_i(t) = Q^{X}_i(t) - \bar{x}_{Q_i}$.
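The mean and the centered version of a quantile function can be illustrated numerically. The sketch below assumes that the mean of a quantile function is its integral over $]0,1[$, approximated here with a midpoint rule; the helper names `quantile_mean` and `center` and the numerical values are hypothetical. For a normal quantile function $t \mapsto m + s\,\Phi^{-1}(t)$, the mean is $m$ and the centered function is $t \mapsto s\,\Phi^{-1}(t)$.

```python
from statistics import NormalDist


def quantile_mean(q, n_grid=2000):
    """Approximate the integral of q over ]0, 1[ with the midpoint rule."""
    return sum(q((k + 0.5) / n_grid) for k in range(n_grid)) / n_grid


def center(q, n_grid=2000):
    """Return the mean of q and the centered quantile function t -> q(t) - mean."""
    m = quantile_mean(q, n_grid)
    return m, (lambda t: q(t) - m)


# Hypothetical predictor: quantile function of N(-850, 60^2); not real data.
dist = NormalDist(mu=-850.0, sigma=60.0)
x_bar, q_c = center(dist.inv_cdf)
print(x_bar, q_c(0.5), q_c(0.9))
```

The midpoint rule never evaluates the quantile function at 0 or 1, where it is unbounded for a normal distribution.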
The statistical model is defined as follows: $\forall i \in \{1, 2, \ldots, n\}, \forall t \in ]0,1[, \quad \textrm{Q}_i^Y(t) = \alpha + b_1 \bar{x}_{Q_i} + b_2 Q^{cX}_i(t) + \epsilon_i(t)$, where the unknown real-valued coefficients $\alpha$, $b_1$ and $b_2$ will be estimated via a likelihood approach.
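One way to read this model, offered as a sketch rather than as the estimation procedure used in this work: if $Q_i^X(t) = m_i + s_i\,\Phi^{-1}(t)$ and $\epsilon_i(t) = U_i + V_i\,\Phi^{-1}(t)$, then the response is again a normal quantile function, with mean $\alpha + b_1 m_i + U_i$ and standard deviation $b_2 s_i + V_i$ (positive when $b_2 \ge 0$). The Python code below simulates this reduction and maximizes the resulting likelihood in closed form: ordinary least squares for the Gaussian part, and the largest feasible slope for the exponential part. The function names, the parameter values, the rate parameterization of the exponential law, and the independence of the error terms across patients are all assumptions made for illustration.

```python
import random


def simulate(n, alpha, b1, b2, sigma2, beta, rng):
    """Return rows (m, s, mean_y, sd_y) for n hypothetical patients."""
    rows = []
    for _ in range(n):
        m = rng.uniform(-900.0, -700.0)    # hypothetical predictor mean
        s = rng.uniform(40.0, 80.0)        # hypothetical predictor standard deviation
        u = rng.gauss(0.0, sigma2 ** 0.5)  # error location, U ~ N(0, sigma^2)
        v = rng.expovariate(1.0 / beta)    # error scale, rate 1/beta (assumed)
        rows.append((m, s, alpha + b1 * m + u, b2 * s + v))
    return rows


def fit(rows):
    """Closed-form likelihood maximizers under the reduction described above."""
    n = len(rows)
    m_bar = sum(m for m, _, _, _ in rows) / n
    y_bar = sum(my for _, _, my, _ in rows) / n
    sxy = sum((m - m_bar) * (my - y_bar) for m, _, my, _ in rows)
    sxx = sum((m - m_bar) ** 2 for m, _, _, _ in rows)
    # Gaussian part: the maximizer coincides with ordinary least squares.
    b1_hat = sxy / sxx
    alpha_hat = y_bar - b1_hat * m_bar
    sigma2_hat = sum((my - alpha_hat - b1_hat * m) ** 2 for m, _, my, _ in rows) / n
    # Exponential part: the likelihood increases with b2 as long as every
    # residual sd_y - b2 * s stays non-negative, hence the minimum ratio.
    b2_hat = min(sy / s for _, s, _, sy in rows)
    beta_hat = sum(sy - b2_hat * s for _, s, _, sy in rows) / n
    return alpha_hat, b1_hat, b2_hat, sigma2_hat, beta_hat


rng = random.Random(42)
rows = simulate(400, alpha=10.0, b1=0.9, b2=1.1, sigma2=1.0, beta=5.0, rng=rng)
alpha_hat, b1_hat, b2_hat, sigma2_hat, beta_hat = fit(rows)
print(alpha_hat, b1_hat, b2_hat, sigma2_hat, beta_hat)
```

Under these assumptions the slope estimate for the scale part can never fall below the true $b_2$, since every simulated ratio equals $b_2$ plus a non-negative term; this one-sided behavior is typical of boundary-type estimators and differs from the symmetric Gaussian behavior of the estimators of $\alpha$ and $b_1$.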
The model was implemented in Python and applied to a real data set of 40 patients treated with Benralizumab.
The approach described above has some limitations, including the loss of spatial information and the assumption of linear relationships between voxel distributions. Further investigation is needed to develop a more general distribution-on-distribution regression method, along the lines of the works by Chen, Lin and Müller^{2} and Ghodrati and Panaretos^{3}. However, our approach has the advantage of being simple, easy to use, and readily understood by practitioners. Ongoing research aims to predict post-treatment 2D histograms from CT scans taken during inspiration and expiration after registration, using the corresponding pre-treatment histograms and scalar covariates as predictors.
References
1. Irpino and Verde (2015). Linear regression for numeric symbolic variables: a least squares approach based on Wasserstein Distance. Advances in Data Analysis and Classification, v.9, n.1, p.81–106.
2. Chen, Lin and Müller (2023). Wasserstein Regression. Journal of the American Statistical Association, v.118, n.542, p.869–882.
3. Ghodrati and Panaretos (2022). Distribution-on-distribution regression via optimal transport maps. Biometrika, v.109, p.957–974.