SCI Publications
2024
A.M. Chalifoux, L. Gibb, K.N. Wurth, T. Tenner, T. Tasdizen, L. MacDonald.
Morphology of uranium oxides reduced from magnesium and sodium diuranate, In Radiochimica Acta, Vol. 112, No. 2, pp. 73-84. 2024.
Morphological analysis of uranium materials has proven to be a key signature for nuclear forensic purposes. This study examines the morphological changes to magnesium diuranate (MDU) and sodium diuranate (SDU) during reduction in a 10% hydrogen atmosphere with and without steam present. Impurity concentrations of the materials were also examined pre- and post-reduction using energy dispersive X-ray spectroscopy combined with scanning electron microscopy (SEM-EDX). The structures of the MDU, SDU, and UOx samples were analyzed using powder X-ray diffraction (p-XRD). Using this method, UOx from MDU was found to be a mixture of UO2, U4O9, and MgU2O6, while UOx from SDU was a combination of UO2, U4O9, U3O8, and UO3. By SEM, the MDU and UOx from MDU had identical morphologies comprising large agglomerates of rounded particles in an irregular pattern. SEM-EDX revealed pockets of high U and high Mg content distributed throughout the materials. The SDU and UOx from SDU had slightly different morphologies. The SDU consisted of massive agglomerates of platy sheets with rough surfaces. The UOx from SDU comprised massive agglomerates of acicular and sub-rounded particles that appeared slightly sintered. Backscatter images of SDU and related UOx materials showed sub-rounded dark spots indicating areas of high Na content, especially in UOx materials created in the presence of steam. SEM-EDX confirmed the presence of high sodium concentration spots in the SDU and UOx from SDU. Elemental compositions were found not to change between pre- and post-reduction of MDU and SDU, indicating that reduction with or without steam does not affect Mg or Na concentrations. The identification of Mg and Na impurities using SEM analysis presents a readily accessible tool in nuclear material analysis, with high Mg and Na impurities likely indicating processing via MDU or SDU, respectively.
Machine learning using convolutional neural networks (CNNs) found that the MDU and SDU had unique morphologies compared to previous publications and that there are distinguishing features between materials created with and without steam.
N. Cheng, O.A. Malik, Y. Xu, S. Becker, A. Doostan, A. Narayan.
Subsampling of Parametric Models with Bifidelity Boosting, In Journal on Uncertainty Quantification, ACM, 2024.
Least squares regression is a ubiquitous tool for building emulators (a.k.a. surrogate models) of problems across science and engineering for purposes such as design space exploration and uncertainty quantification. When the regression data are generated using an experimental design process (e.g., a quadrature grid) involving computationally expensive models, or when the data size is large, sketching techniques have shown promise at reducing the cost of the construction of the regression model while ensuring accuracy comparable to that of the full data. However, random sketching strategies, such as those based on leverage scores, lead to regression errors that are random and may exhibit large variability. To mitigate this issue, we present a novel boosting approach that leverages cheaper, lower-fidelity data of the problem at hand to identify the best sketch among a set of candidate sketches. This in turn specifies the sketch of the intended high-fidelity model and the associated data. We provide theoretical analyses of this bifidelity boosting (BFB) approach and discuss the conditions the low- and high-fidelity data must satisfy for a successful boosting. In doing so, we derive a bound on the residual norm of the BFB sketched solution relating it to its ideal, but computationally expensive, high-fidelity boosted counterpart. Empirical results on both manufactured and PDE data corroborate the theoretical analyses and illustrate the efficacy of the BFB solution in reducing the regression error, as compared to the nonboosted solution.
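The sketch-selection idea summarized in this abstract can be illustrated on a toy least-squares problem. The following is a schematic numpy sketch with invented data, not the authors' implementation: several candidate row subsamples ("sketches") are scored against cheap low-fidelity data, and the winning subsample is then reused on the expensive high-fidelity data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: fit y ~ A @ x by least squares.
n, d, m = 200, 5, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
y_hi = A @ x_true + 0.01 * rng.standard_normal(n)  # expensive high-fidelity data
y_lo = y_hi + 0.1 * rng.standard_normal(n)         # cheap, correlated low-fidelity data

def full_residual(rows, y):
    """Solve the row-subsampled least-squares problem; return the residual on ALL rows."""
    coef, *_ = np.linalg.lstsq(A[rows], y[rows], rcond=None)
    return np.linalg.norm(A @ coef - y)

# Score several candidate sketches (row subsamples) on the low-fidelity data only.
candidates = [rng.choice(n, size=m, replace=False) for _ in range(10)]
scores = [full_residual(rows, y_lo) for rows in candidates]
best_rows = candidates[int(np.argmin(scores))]

# "Boosted" sketch: reuse the winning subsample on the high-fidelity data,
# so only m expensive high-fidelity samples are ultimately needed.
res_boosted = full_residual(best_rows, y_hi)
print(f"boosted residual: {res_boosted:.4f}")
```

The key point mirrored from the paper is that the expensive model is only evaluated on the m rows of the chosen sketch; the candidate comparison happens entirely on the low-fidelity surrogate.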
Y. Chen, Y. Ji, A. Narayan, Z. Xu.
TGPT-PINN: Nonlinear model reduction with transformed GPT-PINNs, Subtitled arXiv preprint arXiv:2403.03459, 2024.
We introduce the Transformed Generative Pre-Trained Physics-Informed Neural Networks (TGPT-PINN) for accomplishing nonlinear model order reduction (MOR) of transport-dominated partial differential equations in an MOR-integrating PINNs framework. Building on the recent development of the GPT-PINN that is a network-of-networks design achieving snapshot-based model reduction, we design and test a novel paradigm for nonlinear model reduction that can effectively tackle problems with parameter-dependent discontinuities. Through incorporation of a shock-capturing loss function component as well as a parameter-dependent transform layer, the TGPT-PINN overcomes the limitations of linear model reduction in the transport-dominated regime. We demonstrate this new capability for nonlinear model reduction in the PINNs framework by several nontrivial parametric partial differential equations.
M. Cooley, S. Zhe, R.M. Kirby, V. Shankar.
Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation, Subtitled arXiv preprint arXiv:2406.02336, 2024.
We present polynomial-augmented neural networks (PANNs), a novel machine learning architecture that combines deep neural networks (DNNs) with a polynomial approximant. PANNs combine the strengths of DNNs (flexibility and efficiency in higher-dimensional approximation) with those of polynomial approximation (rapid convergence rates for smooth functions). To aid in both stable training and enhanced accuracy over a variety of problems, we present (1) a family of orthogonality constraints that impose mutual orthogonality between the polynomial and the DNN within a PANN; (2) a simple basis pruning approach to combat the curse of dimensionality introduced by the polynomial component; and (3) an adaptation of a polynomial preconditioning strategy to both DNNs and polynomials. We test the resulting architecture for its polynomial reproduction properties, ability to approximate both smooth functions and functions of limited smoothness, and as a method for the solution of partial differential equations (PDEs). Through these experiments, we demonstrate that PANNs offer superior approximation properties to DNNs for both regression and the numerical solution of PDEs, while also offering enhanced accuracy over both polynomial and DNN-based regression (each) when regressing functions with limited smoothness.
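As a loose illustration of the augmentation idea in this abstract (with a fixed random-feature tanh layer standing in for the trained DNN, and omitting the paper's orthogonality constraints and preconditioning), one can verify that adding network features to a polynomial basis can never increase the least-squares residual:

```python
import numpy as np

rng = np.random.default_rng(3)

x = np.linspace(-1.0, 1.0, 200)
y = np.cos(3 * x) + 0.1 * np.abs(x)  # smooth part plus a kink of limited smoothness

# Polynomial component: monomials up to degree 5.
poly = np.vander(x, 6, increasing=True)

# Stand-in "network" component: a fixed random-feature tanh layer.
w, b = rng.standard_normal(20), rng.standard_normal(20)
net = np.tanh(np.outer(x, w) + b)

# Fit the augmented basis [poly | net] jointly by least squares.
features = np.hstack([poly, net])
coef, *_ = np.linalg.lstsq(features, y, rcond=None)
pred = features @ coef

coef_poly, *_ = np.linalg.lstsq(poly, y, rcond=None)
res_pann = np.linalg.norm(pred - y)
res_poly = np.linalg.norm(poly @ coef_poly - y)
print(f"poly-only residual {res_poly:.4f} vs augmented {res_pann:.4f}")
```

Because the augmented column space contains the polynomial space, the augmented residual is bounded by the polynomial-only residual; the paper's contribution is making this combination stable and accurate when the network part is actually trained.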
M. Cooley, R.M. Kirby, S. Zhe, V. Shankar.
HyResPINNs: Adaptive Hybrid Residual Networks for Learning Optimal Combinations of Neural and RBF Components for Physics-Informed Modeling, Subtitled arXiv:2410.03573, 2024.
Physics-informed neural networks (PINNs) are an increasingly popular class of techniques for the numerical solution of partial differential equations (PDEs), where neural networks are trained using loss functions regularized by relevant PDE terms to enforce physical constraints. We present a new class of PINNs called HyResPINNs, which augment traditional PINNs with adaptive hybrid residual blocks that combine the outputs of a standard neural network and a radial basis function (RBF) network. A key feature of our method is the inclusion of adaptive combination parameters within each residual block, which dynamically learn to weigh the contributions of the neural network and RBF network outputs. Additionally, adaptive connections between residual blocks allow for flexible information flow throughout the network. We show that HyResPINNs are more robust to training point locations and neural network architectures than traditional PINNs. Moreover, HyResPINNs offer orders of magnitude greater accuracy than competing methods on certain problems, with only modest increases in training costs. We demonstrate the strengths of our approach on challenging PDEs, including the Allen-Cahn equation and the Darcy-Flow equation. Our results suggest that HyResPINNs effectively bridge the gap between traditional numerical methods and modern machine learning-based solvers.
H. Csala, A. Mohan, D. Livescu, A. Arzani.
Physics-constrained coupled neural differential equations for one dimensional blood flow modeling, Subtitled arXiv:2411.05631, 2024.
Computational cardiovascular flow modeling plays a crucial role in understanding blood flow dynamics. While 3D models provide accurate detail, they are computationally expensive, especially with fluid-structure interaction (FSI) simulations. 1D models offer a computationally efficient alternative by simplifying the 3D Navier-Stokes equations through an axisymmetric flow assumption and cross-sectional averaging. However, traditional 1D models based on finite element methods (FEM) often lack accuracy compared to 3D averaged solutions. This study introduces a novel physics-constrained machine learning technique that enhances the accuracy of 1D cardiovascular flow models while maintaining computational efficiency. Our approach, utilizing a physics-constrained coupled neural differential equation (PCNDE) framework, demonstrates superior performance compared to conventional FEM-based 1D models across a wide range of inlet boundary condition waveforms and stenosis blockage ratios. A key innovation lies in the spatial formulation of the momentum conservation equation, departing from the traditional temporal approach and capitalizing on the inherent temporal periodicity of blood flow. This spatial neural differential equation formulation switches space and time and overcomes issues related to coupling stability and smoothness, while simplifying boundary condition implementation. The model accurately captures flow rate, area, and pressure variations for unseen waveforms and geometries. We evaluate the model’s robustness to input noise and explore the loss landscapes associated with the inclusion of different physics terms. This advanced 1D modeling technique offers promising potential for rapid cardiovascular simulations, achieving computational efficiency and accuracy. By combining the strengths of physics-based and data-driven modeling, this approach enables fast and accurate cardiovascular simulations.
H. Csala, O. Amili, R.M. D'Souza, A. Arzani.
A comparison of machine learning methods for recovering noisy and missing 4D flow MRI data, In Int J Numer Method Biomed Eng, Vol. 40, No. 11, 2024.
Experimental blood flow measurement techniques are invaluable for a better understanding of cardiovascular disease formation, progression, and treatment. One of the emerging methods is time-resolved three-dimensional phase-contrast magnetic resonance imaging (4D flow MRI), which enables noninvasive time-dependent velocity measurements within large vessels. However, several limitations hinder the usability of 4D flow MRI and other experimental methods for quantitative hemodynamics analysis. These mainly include measurement noise, corrupt or missing data, low spatiotemporal resolution, and other artifacts. Traditional filtering is routinely applied for denoising experimental blood flow data without any detailed discussion on why it is preferred over other methods. In this study, filtering is compared to different singular value decomposition (SVD)-based machine learning and autoencoder-type deep learning methods for denoising and filling in missing data (imputation). An artificially corrupted and voxelized computational fluid dynamics (CFD) simulation as well as in vitro 4D flow MRI data are used to test the methods. SVD-based algorithms achieve excellent results for the idealized case but severely struggle when applied to in vitro data. The autoencoders are shown to be versatile and applicable to all investigated cases. For denoising the in vitro 4D flow MRI data, the denoising autoencoder (DAE) and the Noise2Noise (N2N) autoencoder produced better reconstructions than filtering, both qualitatively and quantitatively. Deep learning methods such as N2N can result in noise-free velocity fields even though they did not use clean data during training. This work presents one of the first comprehensive assessments and comparisons of various classical and modern machine-learning methods for enhancing corrupt cardiovascular flow data in diseased arteries for both synthetic and experimental test cases.
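The SVD-based denoising compared in this abstract can be sketched in a few lines. The following toy example (synthetic data, not the study's pipeline) builds a low-rank space-by-time "velocity" matrix, adds noise, and shows that truncating the SVD to the dominant modes reduces the error relative to the clean field:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for flattened 4D flow data: a space-by-time velocity matrix that
# is low-rank (a few dominant flow modes) plus measurement noise.
n_space, n_time, rank = 300, 50, 3
clean = rng.standard_normal((n_space, rank)) @ rng.standard_normal((rank, n_time))
noisy = clean + 0.5 * rng.standard_normal((n_space, n_time))

# Truncated-SVD denoising: keep only the leading singular modes.
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
denoised = U[:, :rank] * s[:rank] @ Vt[:rank]

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(f"error before: {err_noisy:.1f}, after truncated SVD: {err_denoised:.1f}")
```

This works well precisely when the data are close to low-rank, which matches the abstract's finding that SVD methods excel on the idealized CFD case but struggle on noisier in vitro measurements.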
D. Dade, J.A. Bergquist, R.S. MacLeod, X. Ye, R. Ranjan, B. Steinberg, T. Tasdizen.
A Survey of Augmentation Techniques for Enhancing ECG Representation Through Self-Supervised Contrastive Learning, In Computing in Cardiology 2024, 2024.
The electrocardiogram (ECG) is the most common non-invasive tool to measure the electrical activity of the heart and assess cardiac health. Despite their ubiquity and utility, traditional ECG analysis methods are limited in many impactful diseases. Machine learning tools can be employed to automate task-specific detection of diseases, and to detect patterns that are ignored by traditional ECG analysis. Contemporary machine learning tools are limited by requirements for large labeled datasets, which can be scarce for rare diseases. Self-supervised learning (SSL) can address this data scarcity. We implemented the momentum contrast (MoCo) framework, a form of SSL, using a large clinical ECG dataset. We then assessed the learning using low left ventricular ejection fraction (LVEF) detection as the downstream task. We compared the SSL improvement of LVEF classification across different input augmentations. We observed that optimal augmentation hyperparameters varied substantially based on the training dataset size, indicating that augmentation strategies may need to be tuned based on problem and dataset size.
H. Dai, S. Joshi.
Refining Skewed Perceptions in Vision-Language Models through Visual Representations, Subtitled arXiv preprint arXiv:2405.14030, 2024.
Large vision-language models (VLMs), such as CLIP, have become foundational, demonstrating remarkable success across a variety of downstream tasks. Despite their advantages, these models, akin to other foundational systems, inherit biases from the disproportionate distribution of real-world data, leading to misconceptions about the actual environment. Prevalent datasets like ImageNet are often riddled with non-causal, spurious correlations that can diminish VLM performance in scenarios where these contextual elements are absent. This study presents an investigation into how a simple linear probe can effectively distill task-specific core features from CLIP’s embedding for downstream applications. Our analysis reveals that the CLIP text representations are often tainted by spurious correlations inherited from the biased pre-training dataset. Empirical evidence suggests that relying on visual representations from CLIP, as opposed to text embedding, is more practical to refine the skewed perceptions in VLMs, emphasizing the superior utility of visual representations in overcoming embedded biases.
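The linear-probe approach mentioned in this abstract is simple to sketch. The following toy example uses synthetic stand-in embeddings (32-d instead of CLIP's actual features; everything here is invented for illustration) and fits a logistic-regression probe on the frozen vectors, leaving the encoder untouched:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for frozen visual embeddings: the class signal lies along one
# "core" direction of the embedding space.
n, d = 400, 32
labels = rng.integers(0, 2, n)
core = rng.standard_normal(d)
core /= np.linalg.norm(core)
emb = rng.standard_normal((n, d)) + 3.0 * np.outer(labels - 0.5, core)

# Linear probe: logistic regression trained by gradient descent on the
# frozen embeddings (no encoder weights are updated).
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(emb @ w + b)))
    w -= 0.5 * emb.T @ (p - labels) / n
    b -= 0.5 * np.mean(p - labels)

acc = np.mean(((emb @ w + b) > 0) == (labels == 1))
print(f"probe accuracy: {acc:.2f}")
```

In the paper's setting the probe is fit on real CLIP visual embeddings; the point of the technique is that a single learned hyperplane can isolate task-relevant directions without retraining the foundation model.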
S. Dasetty, T.C. Bidone, A.L. Ferguson.
Data-driven prediction of αIIbβ3 integrin activation pathways using nonlinear manifold learning and deep generative modeling, In Biophysical Journal, Vol. 123, 2024.
The integrin heterodimer is a transmembrane protein critical for driving cellular processes and is a therapeutic target in the treatment of multiple diseases linked to its malfunction. Activation of integrin involves conformational transitions between bent and extended states. Some of the conformations that are intermediate between bent and extended states of the heterodimer have been experimentally characterized, but the full activation pathways remain unresolved both experimentally due to their transient nature and computationally due to the challenges in simulating rare barrier crossing events in these large molecular systems. An understanding of the activation pathways can provide new fundamental understanding of the biophysical processes associated with the dynamic interconversions between bent and extended states and can unveil new putative therapeutic targets. In this work, we apply nonlinear manifold learning to coarse-grained molecular dynamics simulations of bent, extended, and two intermediate states of αIIbβ3 integrin to learn a low-dimensional embedding of the configurational phase space. We then train deep generative models to learn an inverse mapping between the low-dimensional embedding and high-dimensional molecular space and use these models to interpolate the molecular configurations constituting the activation pathways between the experimentally characterized states. This work furnishes plausible predictions of integrin activation pathways and reports a generic and transferable multiscale technique to predict transition pathways for biomolecular systems.
Y.S. Dogrusoz, L. Bear, J.A. Bergquist, A. Rababah, W. Good, J. Stoks, J. Svehikova, E. van Dam, D.H. Brooks, R.S. MacLeod.
Evaluation of five methods for the interpolation of bad leads in the solution of the inverse electrocardiography problem, In Physiological Measurement, Vol. 45, 2024.
DOI: 10.1088/1361-6579/ad74d6
Objective. This study aims to assess the sensitivity of epicardial potential-based electrocardiographic imaging (ECGI) to the removal or interpolation of bad leads. Approach. We utilized experimental data from two distinct centers. Langendorff-perfused pig (n = 2) and dog (n = 2) hearts were suspended in a human torso-shaped tank and paced from the ventricles. Six different bad lead configurations were designed based on clinical experience. Five interpolation methods were applied to estimate the missing data. Zero-order Tikhonov regularization was used to solve the inverse problem for complete data, data with removed bad leads, and interpolated data. We assessed the quality of interpolated ECG signals and ECGI reconstructions using several metrics, comparing the performance of interpolation methods and the impact of bad lead removal versus interpolation on ECGI.
Main results. The performance of ECG interpolation strongly correlated with ECGI reconstruction. The hybrid method exhibited the best performance among interpolation techniques, followed closely by the inverse-forward and Kriging methods. Bad leads located over high amplitude/high gradient areas on the torso significantly impacted ECGI reconstructions, even with minor interpolation errors. The choice between removing or interpolating bad leads depends on the location of missing leads and confidence in interpolation performance. If uncertainty exists, removing bad leads is the safer option, particularly when they are positioned in high amplitude/high gradient regions. In instances where interpolation is necessary, the inverse-forward and Kriging methods, which do not require training, are recommended.
Significance. This study represents the first comprehensive evaluation of the advantages and drawbacks of interpolating versus removing bad leads in the context of ECGI, providing valuable insights into ECGI performance.
J. Dong, E. Kwan, J.A. Bergquist, B.A. Steinberg, D.J. Dosdall, E. DiBella, R.S. MacLeod, T.J. Bunch, R. Ranjan.
Ablation-induced left atrial mechanical dysfunction recovers in weeks after ablation, In Journal of Interventional Cardiac Electrophysiology, Springer, 2024.
Background
The immediate impact of catheter ablation on left atrial mechanical function and the timeline for its recovery in patients undergoing ablation for atrial fibrillation (AF) remain uncertain. The mechanical function response to catheter ablation in patients with different AF types is poorly understood.
Methods
A total of 113 AF patients were included in this retrospective study. Each patient had three magnetic resonance imaging (MRI) studies in sinus rhythm: one pre-ablation, one immediate post-ablation (within 2 days after ablation), and one post-ablation follow-up MRI (≤ 3 months). We used feature tracking in the MRI cine images to determine peak longitudinal atrial strain (PLAS). We evaluated the change in strain from pre-ablation, immediately after ablation to post-ablation follow-up in a short-term study (< 50 days) and a 3-month study (3 months after ablation).
Results
The PLAS exhibited a notable reduction immediately after ablation, compared to both pre-ablation levels and those observed in follow-up studies conducted at short-term (11.1 ± 9.0 days) and 3-month (69.6 ± 39.6 days) intervals. However, there was no difference between follow-up and pre-ablation PLAS. The PLAS returned to 95% of the pre-ablation level within 10 days. Paroxysmal AF patients had significantly higher pre-ablation PLAS than persistent AF patients. Patients with both AF types had significantly lower immediate post-ablation PLAS compared with pre-ablation and post-ablation follow-up PLAS.
Conclusion
The present study suggested a significant drop in PLAS immediately after ablation. Left atrial mechanical function recovered within 10 days after ablation. The drop in PLAS did not show a substantial difference between paroxysmal and persistent AF patients.
J. Dong, E. Kwan, J.A. Bergquist, D.J. Dosdall, E.V. DiBella, R.S. MacLeod, G. Stoddard, K. Konstantidinis, B.A. Steinberg, T.J. Bunch, R. Ranjan.
Left atrial functional changes associated with repeated catheter ablations for atrial fibrillation, In J Cardiovasc Electrophysiol, 2024.
DOI: 10.1111/jce.16484
PubMed ID: 39474660
Introduction: The impact of repeated atrial fibrillation (AF) ablations on left atrial (LA) mechanical function remains uncertain, with limited long-term follow-up data.
Methods: This retrospective study involved 108 AF patients who underwent two catheter ablations with cardiac magnetic resonance imaging (MRI) done before and 3 months after each ablation from 2010 to 2021. The rate of change in peak longitudinal atrial strain (PLAS) assessed LA function. Additionally, a sub-study of 36 patients who underwent an extra MRI before the second ablation gave us an additional time segment to evaluate the basis of change in PLAS.
Results: In sub-study 1 (two ablations, three MRIs), the PLAS percent change rate was similar before and after the first ablation (r11 = -0.9 ± 3.1%/year, p = 0.771). However, the strain change rate from postablation 1 to postablation 2 was significantly worse (r12 = -23.7 ± 4.8%/year, p < 0.001). In sub-study 2 (four MRIs), all three rates were negative, with the reductions from postablation 1 to pre-ablation 2 (r22 = -13.3 ± 2.6%/year, p < 0.001) and from pre-ablation 2 to postablation 2 (r23 = -8.9 ± 3.9%/year, p = 0.028) being significant.
Conclusion: The present study suggests that the more ablations performed, the greater the decrease in the postablation mechanical function of the LA. The natural progression of AF (strain change from postablation 1 to pre-ablation 2) had a greater negative influence on LA mechanical function than the second ablation itself, suggesting that a second ablation in patients with recurrence after the first ablation is an effective strategy even from the LA mechanical function perspective.
S. Dubey, Y. Chong, B. Knudsen, S.Y. Elhabian.
VIMs: Virtual Immunohistochemistry Multiplex staining via Text-to-Stain Diffusion Trained on Uniplex Stains, Subtitled arXiv:2407.19113, 2024.
This paper introduces a Virtual Immunohistochemistry Multiplex staining (VIMs) model designed to generate multiple immunohistochemistry (IHC) stains from a single hematoxylin and eosin (H&E) stained tissue section. IHC stains are crucial in pathology practice for resolving complex diagnostic questions and guiding patient treatment decisions. While commercial laboratories offer a wide array of up to 400 different antibody-based IHC stains, small biopsies often lack sufficient tissue for multiple stains while preserving material for subsequent molecular testing. This highlights the need for virtual IHC staining. Notably, VIMs is the first model to address this need, leveraging a large vision-language single-step diffusion model for virtual IHC multiplexing through text prompts for each IHC marker. VIMs is trained on uniplex paired H&E and IHC images, employing an adversarial training module. Testing of VIMs includes both paired and unpaired image sets. To enhance computational efficiency, VIMs utilizes a pre-trained large latent diffusion model fine-tuned with small, trainable weights through the Low-Rank Adapter (LoRA) approach. Experiments on nuclear and cytoplasmic IHC markers demonstrate that VIMs outperforms the base diffusion model and achieves performance comparable to Pix2Pix, a standard generative model for paired image translation. Multiple evaluation methods, including assessments by two pathologists, are used to determine the performance of VIMs. Additionally, experiments with different prompts highlight the impact of text conditioning. This paper represents the first attempt to accelerate histopathology research by demonstrating the generation of multiple IHC stains from a single H&E input using a single model trained solely on uniplex data. This approach relaxes the traditional need for multiplex training sets, significantly broadening the applicability and accessibility of virtual IHC staining techniques.
K. Eckelt, K. Gadhave, A. Lex, M. Streit.
Loops: Leveraging Provenance and Visualization to Support Exploratory Data Analysis in Notebooks, In Computer Graphics Forum, 2024.
Exploratory data science is an iterative process of obtaining, cleaning, profiling, analyzing, and interpreting data. This cyclical way of working creates challenges within the linear structure of computational notebooks, leading to issues with code quality, recall, and reproducibility. To remedy this, we present Loops, a set of visual support techniques for iterative and exploratory data analysis in computational notebooks. Loops leverages provenance information to visualize the impact of changes made within a notebook. In visualizations of the notebook provenance, we trace the evolution of the notebook over time and highlight differences between versions. Loops visualizes the provenance of code, markdown, tables, visualizations, and images and their respective differences. Analysts can explore these differences in detail in a separate view. Loops not only improves the reproducibility of notebooks but also supports analysts in their data science work by showing the effects of changes and facilitating comparison of multiple versions. We demonstrate our approach’s utility and potential impact in two use cases and feedback from notebook users from various backgrounds.
G. Eisenhauer, N. Podhorszki, A. Gainaru, S. Klasky, M. Parashar, M. Wolf, E. Suchyta, E. Fredj, V. Bolea, F. Poschel, K. Steiniger, M. Bussmann, R. Pausch, S. Chandrasekaran.
Streaming Data in HPC Workflows Using ADIOS, Subtitled arXiv:2410.00178v1, 2024.
The “IO Wall” problem, in which the gap between computation rate and data access rate grows continuously, poses significant problems to scientific workflows, which have traditionally relied on the filesystem for intermediate storage between workflow stages. One way to avoid this problem in scientific workflows is to stream data directly from producers to consumers, avoiding storage entirely. However, the manner in which this is accomplished is key to both performance and usability. This paper presents the Sustainable Staging Transport (SST), an approach which allows direct streaming between traditional file writers and readers with few application changes. SST is an ADIOS “engine”, accessible via standard ADIOS APIs, and because ADIOS allows engines to be chosen at run-time, many existing file-oriented ADIOS workflows can utilize SST for direct application-to-application communication without any source code changes. This paper describes the design of SST and presents performance results from various applications that use SST: feeding model training with simulation data at substantially higher bandwidth than the theoretical limits of Frontier’s file system, strongly coupling separately developed applications for multiphysics multiscale simulation, and performing in situ analysis and visualization of data to complete all data processing shortly after the simulation finishes.
I.J. Eliza, J. Wagoner, J. Wilburn, N. Lanza, D. Hajas, A. Lex.
Accessible Text Descriptions for UpSet Plots, In 2024 1st Workshop on Accessible Data Visualization (AccessViz), pp. 1--4. 2024.
Data visualizations are typically not accessible to blind and low-vision users. The most widely used remedy for making data visualizations accessible is text descriptions. Yet, manually creating useful text descriptions is often omitted by visualization authors, either because of a lack of awareness or a perceived burden. Automatically generated text descriptions are a potential partial remedy. However, with current methods it is unfeasible to create text descriptions for complex scientific charts. In this paper, we describe our methods for generating text descriptions for one complex scientific visualization: the UpSet plot. UpSet is a widely used technique for the visualization and analysis of sets and their intersections. At the same time, UpSet is arguably unfamiliar to novices and used mostly in scientific contexts. Generating text descriptions for UpSet plots is challenging because the patterns observed in UpSet plots have not been studied. We first analyze patterns present in dozens of published UpSet plots. We then introduce software that generates text descriptions for UpSet plots based on the patterns present in the chart. Finally, we introduce a web service that generates text descriptions based on a specification of an UpSet plot, and demonstrate its use in both an interactive web-based implementation and a static Python implementation of UpSet.
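The core of the pipeline described here is a function from an UpSet plot specification to natural-language text. The following toy generator (set names, counts, and the output phrasing are all invented for illustration; this is not the paper's software or web service) shows the shape of such a mapping:

```python
# Hypothetical minimal specification of an UpSet plot: each key is an
# intersection (a tuple of set names) and each value is its element count.
intersections = {
    ("A",): 120,
    ("B",): 80,
    ("C",): 60,
    ("A", "B"): 45,
    ("A", "B", "C"): 12,
}

def describe_upset(data):
    """Generate a short text description from intersection sizes."""
    sets = sorted({name for key in data for name in key})
    (top, top_size), *_ = sorted(data.items(), key=lambda kv: -kv[1])
    exclusive = sum(v for k, v in data.items() if len(k) == 1)
    return (
        f"UpSet plot of {len(sets)} sets ({', '.join(sets)}) "
        f"with {len(data)} visible intersections. "
        f"The largest intersection is {' & '.join(top)} ({top_size} elements). "
        f"{exclusive} elements belong to exactly one set."
    )

print(describe_upset(intersections))
```

A real implementation, as the paper notes, would first detect higher-level patterns (e.g., whether exclusive intersections dominate, or whether sizes decay with intersection degree) and verbalize those rather than just the raw extremes.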
Y. Epshteyn, A. Narayan, Y. Yu.
Energy Stable and Structure-Preserving Algorithms for the Stochastic Galerkin System of 2D Shallow Water Equations, Subtitled arXiv:2412.16353, 2024.
Shallow water equations (SWE) are fundamental nonlinear hyperbolic PDE-based models in fluid dynamics that are essential for studying a wide range of geophysical and engineering phenomena. Therefore, stable and accurate numerical methods for SWE are needed. Although some algorithms are well studied for deterministic SWE, more effort should be devoted to handling the SWE with uncertainty. In this paper, we incorporate uncertainty through a stochastic Galerkin (SG) framework, and building on an existing hyperbolicity-preserving SG formulation for 2D SWE, we construct the corresponding entropy flux pair, and develop structure-preserving, well-balanced, second-order energy conservative and energy stable finite volume schemes for the SG formulation of the two-dimensional shallow water system. We demonstrate the efficacy, applicability, and robustness of these structure-preserving algorithms through several challenging numerical experiments.
A. Ferrero, E. Ghelichkhan, H. Manoochehri, M.M. Ho, D.J. Albertson, B.J. Brintz, T. Tasdizen, R.T. Whitaker, B. Knudsen.
HistoEM: A Pathologist-Guided and Explainable Workflow Using Histogram Embedding for Gland Classification, In Modern Pathology, Vol. 37, No. 4, 2024.
Pathologists have, over several decades, developed criteria for diagnosing and grading prostate cancer. However, this knowledge has not, so far, been included in the design of convolutional neural networks (CNN) for prostate cancer detection and grading. Further, it is not known whether the features learned by machine-learning algorithms coincide with diagnostic features used by pathologists. We propose a framework that enforces algorithms to learn the cellular and subcellular differences between benign and cancerous prostate glands in digital slides from hematoxylin and eosin–stained tissue sections. After accurate gland segmentation and exclusion of the stroma, the central component of the pipeline, named HistoEM, utilizes a histogram embedding of features from the latent space of the CNN encoder. Each gland is represented by 128 feature-wise histograms that provide the input into a second network for benign vs cancer classification of the whole gland. Cancer glands are further processed by a U-Net structured network to separate low-grade from high-grade cancer. Our model demonstrates similar performance compared with other state-of-the-art prostate cancer grading models with gland-level resolution. To understand the features learned by HistoEM, we first rank features based on the distance between benign and cancer histograms and visualize the tissue origins of the 2 most important features. A heatmap of pixel activation by each feature is generated using Grad-CAM and overlaid on nuclear segmentation outlines. We conclude that HistoEM, similar to pathologists, uses nuclear features for the detection of prostate cancer. Altogether, this novel approach can be broadly deployed to visualize computer-learned features in histopathology images.
S. Garg, J. Zhang, R. Pitchumani, M. Parashar, B. Xie, S. Kannan.
CrossPrefetch: Accelerating I/O Prefetching for Modern Storage, In 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ACM, 2024.
We introduce CrossPrefetch, a novel cross-layered I/O prefetching mechanism that operates across the OS and a user-level runtime to achieve optimal performance. Existing OS prefetching mechanisms suffer from rigid interfaces that do not provide information to applications on the prefetch effectiveness, suffer from high concurrency bottlenecks, and are inefficient in utilizing available system memory. CrossPrefetch addresses these limitations by dividing responsibilities between the OS and runtime, minimizing overhead, and achieving low cache misses, lock contentions, and higher I/O performance.
CrossPrefetch tackles the limitations of rigid OS prefetching interfaces by maintaining and exporting cache state and prefetch effectiveness to user-level runtimes. It also addresses scalability and concurrency bottlenecks by distinguishing between regular I/O and prefetch operations paths and introduces fine-grained prefetch indexing for shared files. Finally, CrossPrefetch designs low-interference access pattern prediction combined with support for adaptive and aggressive techniques to exploit memory capacity and storage bandwidth. Our evaluation of CrossPrefetch, encompassing microbenchmarks, macrobenchmarks, and real-world workloads, illustrates performance gains of up to 1.22x-3.7x in I/O throughput. We also evaluate CrossPrefetch across different file systems and local and remote storage configurations.
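The access-pattern prediction mentioned in this abstract can be illustrated with the simplest possible case: detecting a sequential run of block accesses and suggesting the next block. This toy class is an illustrative sketch of that general idea, not CrossPrefetch's actual prediction algorithm (its names and window logic are invented here):

```python
class PrefetchPredictor:
    """Detect sequential block-access runs and suggest the next block to prefetch."""

    def __init__(self, window=3):
        self.window = window
        self.history = []

    def access(self, block):
        """Record a block access; return a block worth prefetching, or None."""
        self.history = (self.history + [block])[-self.window:]
        # A full window of consecutive blocks signals a sequential scan.
        if len(self.history) == self.window and all(
            later - earlier == 1
            for earlier, later in zip(self.history, self.history[1:])
        ):
            return block + 1
        return None

p = PrefetchPredictor()
print(p.access(7), p.access(8), p.access(9))  # third access completes a sequential run
```

In CrossPrefetch, prediction like this runs with low interference alongside regular I/O, and its effectiveness is exported back to the user-level runtime so the application can adapt.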
