SCI Publications
2026
S.F. Ahmed, G. Rasineni, F. Koehler, A.Z.B. Aziz, M. Wang, A. Gyulassy, B. Summa, J. Q. Brown, V. Pascucci, S. Y. Elhabian.
PC-MIL: Decoupling Feature Resolution from Supervision Scale in Whole-Slide Learning, Subtitled arXiv:2604.12100v1, 2026.
Whole-slide image (WSI) classification in computational pathology is commonly formulated as slide-level Multiple Instance Learning (MIL) with a single global bag representation. However, slide-level MIL is fundamentally underconstrained: optimizing only global labels encourages models to aggregate features without learning anatomically meaningful localization. This creates a mismatch between the scale of supervision and the scale of clinical reasoning. Clinicians assess tumor burden, focal lesions, and architectural patterns within millimeter-scale regions, whereas standard MIL is trained only to predict whether "somewhere in the slide there is cancer." As a result, the model's inductive bias effectively erases anatomical structure. We propose Progressive-Context MIL (PC-MIL), a framework that treats the spatial extent of supervision as a first-class design dimension. Rather than altering magnification, patch size, or introducing pixel-level segmentation, we decouple feature resolution from supervision scale. Using fixed 20x features, we vary MIL bag extent in millimeter units and anchor supervision at a clinically motivated 2mm scale to preserve comparable tumor burden and avoid confounding scale with lesion density. PC-MIL progressively mixes slide- and region-level supervision in controlled proportions, enabling explicit train-context x test-context analysis. On 1,476 prostate WSIs from five public datasets for binary cancer detection, we show that anatomical context is an independent axis of generalization in MIL, orthogonal to feature resolution: modest regional supervision improves cross-context performance, and balanced multi-context training stabilizes accuracy across slide and regional evaluation without sacrificing global performance. These results demonstrate that supervision extent shapes MIL inductive bias and support anatomically grounded WSI generalization.
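As a rough illustration of mixing supervision extents in MIL (the module names, feature dimensions, and mixing scheme below are illustrative assumptions, not the PC-MIL implementation), a single attention-based aggregator can be trained on both whole-slide bags and smaller region-level bags, with a weight controlling the proportion of slide- versus region-level supervision:

```python
# Hypothetical sketch of mixed-extent MIL supervision (not the authors' code).
# One attention aggregator pools fixed patch features into a bag embedding; the
# loss mixes slide-level bags with smaller region-level bags in a chosen proportion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, feats):                        # feats: (num_patches, feat_dim)
        w = torch.softmax(self.attn(feats), dim=0)   # attention weights over patches
        bag = (w * feats).sum(dim=0)                 # pooled bag embedding
        return self.head(bag)                        # single cancer/no-cancer logit

def mixed_context_loss(model, slide_feats, slide_label, region_feats, region_labels, alpha=0.5):
    """Mix slide-level and region-level supervision; alpha weights the slide term."""
    slide_loss = F.binary_cross_entropy_with_logits(
        model(slide_feats).squeeze(), slide_label)
    region_loss = torch.stack([
        F.binary_cross_entropy_with_logits(model(f).squeeze(), y)
        for f, y in zip(region_feats, region_labels)]).mean()
    return alpha * slide_loss + (1 - alpha) * region_loss

# toy usage with random stand-ins for fixed 20x patch features
model = AttentionMIL()
slide = torch.randn(300, 512)                        # all patches in a slide
regions = [torch.randn(40, 512) for _ in range(4)]   # patches grouped into small regions
loss = mixed_context_loss(model, slide, torch.tensor(1.0),
                          regions, [torch.tensor(l) for l in (1.0, 0.0, 1.0, 0.0)])
loss.backward()
```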
T. M. Athawale, K. Moreland, D. Pugmire, C. R. Johnson, P. Rosen, M. Norman, A. Georgiadou, A. Entezari.
MAGIC: Marching Cubes Isosurface Uncertainty Visualization for Gaussian Uncertain Data with Spatial Correlation, In TVCG, IEEE, 2026.
In this paper, we study the propagation of data uncertainty through the marching cubes algorithm for isosurface visualization of correlated uncertain data. Consideration of correlation has been shown paramount for avoiding errors in uncertainty quantification and visualization in multiple prior studies. Although the problem of isosurface uncertainty with spatial data correlation has been previously addressed, there are two major limitations to prior treatments. First, there are no analytical formulations for uncertainty quantification of isosurfaces when the data uncertainty is characterized by a Gaussian distribution with spatial correlation. Second, as a consequence of the lack of analytical formulations, existing techniques resort to a Monte Carlo sampling approach, which is expensive and difficult to integrate into visualization tools. To address these limitations, we present a closed-form framework to efficiently derive uncertainty in marching cubes level-sets for Gaussian uncertain data with spatial correlation (MAGIC). To derive closed-form solutions, we leverage Hinkley's derivation on the ratio of Gaussian distributions. With our analytical framework, we achieve a significant speed-up and enhanced accuracy of uncertainty quantification over classical Monte Carlo methods. We further accelerate our analytical solutions using many-core processors to achieve speed-ups up to 585× and integrability with production visualization tools for broader impact. We demonstrate the effectiveness of our correlation-aware uncertainty framework through experiments on meteorology, urban flow, and astrophysics simulation datasets.
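The gap the paper targets can be illustrated on a single cell edge: when the two endpoint values are jointly Gaussian with spatial correlation, the probability that a level set crosses the edge has a closed form in terms of the bivariate normal CDF, whereas Monte Carlo needs many samples to match it. The sketch below is a simplified stand-in for that setting, not the MAGIC/Hinkley level-crossing derivation itself:

```python
# Hedged illustration (not the paper's derivation): probability that an isocontour
# crosses one cell edge when the two endpoint values are jointly Gaussian with
# spatial correlation, compared against brute-force Monte Carlo sampling.
import numpy as np
from scipy.stats import norm, multivariate_normal

def edge_crossing_probability(mu, cov, isovalue):
    """P[(A - k)(B - k) < 0] for (A, B) ~ N(mu, cov): exactly one endpoint above k."""
    sd = np.sqrt(np.diag(cov))
    p_a_below = norm.cdf(isovalue, mu[0], sd[0])
    p_b_below = norm.cdf(isovalue, mu[1], sd[1])
    p_both_below = multivariate_normal(mean=mu, cov=cov).cdf([isovalue, isovalue])
    return p_a_below + p_b_below - 2.0 * p_both_below

mu = np.array([0.2, 0.9])
cov = np.array([[0.30, 0.18],          # positive spatial correlation between endpoints
                [0.18, 0.25]])
k = 0.5

analytic = edge_crossing_probability(mu, cov, k)
samples = np.random.default_rng(0).multivariate_normal(mu, cov, size=200_000)
mc = np.mean((samples[:, 0] - k) * (samples[:, 1] - k) < 0)
print(f"closed form: {analytic:.4f}   Monte Carlo: {mc:.4f}")
```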
A.Z.B. Aziz, S.F. Ahmed, G. Rasineni, M. Wang, O. Hatipoglu, M. Ricci, M. Shaw, G. Li, J. Q. Brown, V. Pascucci, S. Elhabian.
SIMPLER: H&E-Informed Representation Learning for Structured Illumination Microscopy, Subtitled arXiv:2604.10334v1, 2026.
Structured Illumination Microscopy (SIM) enables rapid, high-contrast optical sectioning of fresh tissue without staining or physical sectioning, making it promising for intraoperative and point-of-care diagnostics. Recent foundation and large-scale self-supervised models in digital pathology have demonstrated strong performance on section-based modalities such as Hematoxylin and Eosin (H&E) and immunohistochemistry (IHC). However, these approaches are predominantly trained on thin tissue sections and do not explicitly address thick-tissue fluorescence modalities such as SIM. When transferred directly to SIM, performance is constrained by substantial modality shift, and naive fine-tuning often overfits to modality-specific appearance rather than underlying histological structure. We introduce SIMPLER (Structured Illumination Microscopy-Powered Learning for Embedding Representations), a cross-modality self-supervised pretraining framework that leverages H&E as a semantic anchor to learn reusable SIM representations. H&E encodes rich cellular and glandular structure aligned with established clinical annotations, while SIM provides rapid, nondestructive imaging of fresh tissue. During pretraining, SIM and H&E are progressively aligned through adversarial, contrastive, and reconstruction-based objectives, encouraging SIM embeddings to internalize histological structure from H&E without collapsing modality-specific characteristics. A single pretrained SIMPLER encoder transfers across multiple downstream tasks, including multiple instance learning and morphological clustering, consistently outperforming SIM models trained from scratch or H&E-only pretraining. Importantly, joint alignment enhances SIM performance without degrading H&E representations, demonstrating asymmetric enrichment rather
W. Bangerth, C. R. Johnson, D. K. Njeru, B. van Bloemen Waanders.
Estimating and using information in inverse problems, In Inverse Problems and Imaging, Vol. 24, pp. 1--33. 2026.
ISSN: 1930-8337
DOI: 10.3934/ipi.2026003
In inverse problems, one attempts to infer spatially variable functions from indirect measurements of a system. To practitioners of inverse problems, the concept of "information" is familiar when discussing key questions such as which parts of the function can be inferred accurately and which cannot. For example, it is generally understood that we can identify system parameters accurately only close to detectors, or along ray paths between sources and detectors, because we have "the most information" for these places.
Although referenced in many publications, the "information" that is invoked in such contexts is not a well understood and clearly defined quantity. Herein, we present a definition of information density that is based on the variance of coefficients as derived from a Bayesian reformulation of the inverse problem. We then discuss three areas in which this information density can be useful in practical algorithms for the solution of inverse problems, and illustrate the usefulness in one of these areas – how to choose the discretization mesh for the function to be reconstructed – using numerical experiments.
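A minimal linear-Gaussian example makes the idea concrete: with a Gaussian prior and noise model, the posterior covariance of the coefficients is available in closed form, and coefficients sensed by nearby detectors end up with low posterior variance. Treating the drop in variance relative to the prior as an information measure is an assumption made here for illustration, not the paper's exact definition:

```python
# Minimal sketch (assumes a linear forward map with Gaussian noise and prior),
# illustrating posterior variance of coefficients as an information measure:
# coefficients near the detectors end up with low variance, i.e. "most information".
import numpy as np

n_param, n_obs = 50, 12

# Forward operator: each detector mostly senses nearby coefficients.
x = np.linspace(0, 1, n_param)
detectors = np.linspace(0.1, 0.5, n_obs)          # detectors only on the left half
G = np.exp(-((x[None, :] - detectors[:, None]) / 0.05) ** 2)

sigma_noise, sigma_prior = 0.05, 1.0
# Gaussian posterior covariance: (G^T G / s_n^2 + I / s_p^2)^{-1}
post_cov = np.linalg.inv(G.T @ G / sigma_noise**2 + np.eye(n_param) / sigma_prior**2)
post_var = np.diag(post_cov)

# High information where the posterior variance drops well below the prior variance.
information_density = 1.0 / post_var - 1.0 / sigma_prior**2
print("most informed coefficients:", np.argsort(information_density)[-5:])
```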
T. Bidone.
Rethinking Contractility in Active Cytoskeletal Matter, In Biophysical Journal, 2026.
R.T. Black, S.A. Maas, W. Wu, J. Maheshwari, T. Kolev, J.A. Weiss, M.A. Jolley.
An open-source computational framework for immersed fluid-structure interaction modeling using FEBio and MFEM, Subtitled arXiv:2601.08266v1, 2026.
Fluid-structure interaction (FSI) simulation of biological systems presents significant computational challenges, particularly for applications involving large structural deformations and contact mechanics, such as heart valve dynamics. Traditional ALE methods encounter fundamental difficulties with such problems due to mesh distortion, motivating immersed techniques. This work presents a novel open-source immersed FSI framework that strategically couples two mature finite element libraries: MFEM, a GPU-ready and scalable library with state-of-the-art parallel performance developed at Lawrence Livermore National Laboratory, and FEBio, a nonlinear finite element solver with sophisticated solid mechanics capabilities designed for biomechanics applications developed at the University of Utah. This coupling creates a unique synergy wherein the fluid solver leverages MFEM's distributed-memory parallelization and pathway to GPU acceleration, while the immersed solid exploits FEBio's comprehensive suite of hyperelastic and viscoelastic constitutive models and advanced solid mechanics modeling targeted for biomechanics applications. FSI coupling is achieved using a fictitious domain methodology with variational multiscale stabilization for enhanced accuracy on under-resolved grids expected with unfitted meshes used in immersed FSI. A fully implicit, monolithic scheme provides robust coupling for strongly coupled FSI characteristic of cardiovascular applications. The framework's modular architecture facilitates straightforward extension to additional physics and element technologies. Several test problems are considered to demonstrate the capabilities of the proposed framework, including a 3D semilunar heart valve simulation. This platform addresses a critical need for open-source immersed FSI software combining advanced biomechanics modeling with high-performance computing infrastructure.
J. Bond, J. Pake, C. David, A. McNutt, T.S. Muwonge, D. Orchard, R. Perera.
Literate Execution, Subtitled arXiv:2604.26967v2, 2026.
Literate programming, introduced by Knuth, interleaves code and prose so that a program can be read as both executable and explanatory text. We propose literate execution, which inverts this relationship: rather than embedding code within a static narrative, we treat documentation -- and other expository elements such as visualisations -- as first-class artefacts that can be computed alongside a running program and then integrated into a view of its execution. We explore this idea through Fluid, a programming language with a provenance-tracking runtime that records fine-grained dependencies between inputs and outputs. These provenance relationships can be surfaced as interactions that allow readers to explore how intermediate values contribute to a result. By integrating visualisation, provenance, and exposition, literate execution aims to make programs more explorable and self-explanatory, and explorable explanations easier to program.
A. Busatto, L.C.R. Tanner, J.A. Bergquist, G. Plank, K. Gillette, A. Narayan, R.S. MacLeod.
Uncertainty quantification of conduction velocity in models of cardiac spread of activation, In Med Biol Eng Comput, Springer Nature, 2026.
This study quantified the effect of conduction velocity (CV) variability on cardiac electrical activation patterns, a key factor for cardiac digital twins. We examined how myocardial and endocardial longitudinal, transverse, and sheet CVs influence ventricular activation across multiple pacing sites. Three porcine biventricular heart models, each including a fast-conducting endocardial layer, were used to simulate electrical activation with an eikonal approach. Uncertainty quantification with polynomial chaos expansion systematically varied six CV parameters within physiological ranges. In total, 1,868 simulations from eight ventricular pacing sites were analyzed for activation time, variability, and global sensitivities. Myocardial longitudinal CV showed the greatest influence on activation timing (global sensitivity up to 0.98). Endocardial-layer longitudinal CV was similarly important for endocardial stimuli, while transverse and sheet CVs had minimal effects. Activation-time variability reached 15 ms, increasing with distance from the pacing origin. Longitudinal CVs, particularly myocardial and endocardial-layer, dominate ventricular activation dynamics and should be prioritized when personalizing cardiac digital twins. Accounting for CV uncertainty is essential for accurate prediction and therapy optimization.
A. Busatto, J.A. Bergquist, T. Tasdizen, B.A. Steinberg, R. Ranjan, R.S. MacLeod.
Predicting Ventricular Arrhythmia in Myocardial Ischemia Using Deep Learning, In Heart Rhythm O2, Elsevier, 2026.
Background: Myocardial ischemia can trigger ventricular arrhythmias with life-threatening consequences. Current monitoring is largely reactive, limiting opportunities for preventive intervention. Objective: To determine whether high-resolution epicardial electrograms contain predictive signatures that enable forecasting the timing of premature ventricular contractions (PVCs) during acute ischemia, and to quantify subject-specific data requirements for effective personalization. Methods: We analyzed epicardial sock electrograms (247 electrodes, 1 kHz) from 21 porcine acute ischemia experiments comprising 2,252 spontaneous PVCs. Signals were segmented into overlapping sequences of 3, 5, or 7 consecutive non-PVC beats with a continuous target of time-to-next PVC. A 6-layer Long Short-Term Memory (LSTM) network (hidden size 128) with temporal attention was trained using mean absolute error (MAE). Performance was evaluated in (A) pooled 80/10/10 cross-validation and (B) leave-one-experiment-out testing with subject-specific fine-tuning using 10% or 15% of held-out data. Results: In Paradigm A, MAE decreased with longer context (6.50 s for 3 beats, 5.97 s for 5 beats, 4.73 s for 7 beats) with excellent calibration (R2>0.996). In Paradigm B, increasing fine-tuning from 10% to 15% reduced mean MAE by 9.6–14.6 s and flattened error growth with prediction horizon, improving the fraction of predictions within 30–60 s windows. Conclusion: Epicardial electrograms support accurate PVC time-to-event forecasting during acute ischemia, and modest subject-specific adaptation substantially improves generalization, motivating development of real-time predictive monitoring tools.
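A minimal sketch of the kind of model the abstract describes follows: a 6-layer LSTM with hidden size 128 over 247-electrode inputs, with temporal attention and an MAE loss for time-to-next-PVC regression. The input shapes, beat segmentation, and training loop are illustrative placeholders, not the authors' pipeline:

```python
# Hedged sketch of an LSTM-with-temporal-attention regressor for time-to-next-PVC.
# Layer count, hidden size, and electrode count follow the abstract; everything
# else (sequence length, preprocessing, training) is an illustrative assumption.
import torch
import torch.nn as nn

class BeatSequenceRegressor(nn.Module):
    def __init__(self, n_electrodes=247, hidden=128, layers=6):
        super().__init__()
        self.lstm = nn.LSTM(n_electrodes, hidden, num_layers=layers, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                          # x: (batch, time, electrodes)
        h, _ = self.lstm(x)                        # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)     # temporal attention weights
        context = (w * h).sum(dim=1)               # attention-weighted summary
        return self.out(context).squeeze(-1)       # predicted seconds to next PVC

model = BeatSequenceRegressor()
x = torch.randn(4, 300, 247)                       # 4 sequences of downsampled non-PVC beats
y = torch.rand(4) * 60.0                           # time-to-next-PVC targets in seconds
loss = nn.functional.l1_loss(model(x), y)          # MAE objective, as in the abstract
loss.backward()
```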
A. Cattaneo, M.K. Ballard, R.M. Kirby, V. Shankar.
JetSCI: A Hybrid JAX-PETSc Framework for Scalable Differentiable Simulation, Subtitled arXiv:2604.22087v1, 2026.
The rapid rise of scientific machine learning (SciML) has expanded the role of differentiable modeling, surrogate modeling, and data-driven constitutive laws in large-scale simulation. The JAX framework provides an attractive environment for these workflows through automatically differentiable programs, vectorization, and GPU acceleration, while enabling seamless learning of surrogate models. However, large-scale simulation still relies on mature HPC infrastructure. Libraries such as PETSc provide scalable MPI-based parallelism, robust linear and nonlinear solvers, and advanced preconditioning capabilities that remain difficult to reproduce in JAX-only workflows. We present JetSCI, a hybrid JAX-PETSc framework that unifies these complementary strengths. JetSCI uses JAX for GPU-parallel differentiable discretizations and PETSc for robust, scalable solution of the resulting systems on distributed-memory architectures, exposing multilevel parallelism through GPU acceleration within nodes and MPI parallelism across nodes. For finite element discretizations of heterogeneous micromechanics problems, JetSCI outperforms JAX-only implementations in efficiency and accuracy.
L. Chenarides, R. Ladislau, M. Parashar, S. Porter, J. Lane.
Data-usage descriptors as search metadata: the case of food security data and the National Data Platform (2015-2025), Subtitled Research Square Preprint, 2026.
DOI: https://doi.org/10.21203/rs.3.rs-8569040/v1
Scientific data is a critical input into scientific research. Yet the research data landscape is constantly changing as new datasets emerge, others are retired, or some disappear altogether. Data-usage descriptors can substantially advance research productivity by reducing the time that researchers spend finding new and relevant datasets in their research field. This paper describes how to generate data usage descriptors by finding how datasets are used in publications and then linking the dataset information to the publication metadata. It also shows how usage descriptors can be used to find other related datasets and their usage. It concludes by arguing that the approach represents a critical piece of foundational infrastructure that could be deployed in repositories as part of a referenceable, navigable, and contextual data framework. This article contains a reproducible workflow for constructing data-usage descriptors, based on analyzing the full text of publications in the Dimensions database. The illustrative use case is research on food security. The illustrative repository is the National Data Platform.
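A toy sketch of the linking step is shown below; the dataset aliases, record fields, and example publication are hypothetical, and the real workflow operates on Dimensions full-text metadata rather than hand-built dictionaries. The idea is simply to detect dataset mentions in publication full text and attach each matching publication's metadata to the dataset, yielding usage-descriptor records:

```python
# Illustrative sketch (names and fields are hypothetical): scan publication full
# text for dataset mentions and build simple data-usage descriptor records that
# link each dataset to the publications (and their metadata) that used it.
import re
from collections import defaultdict

DATASET_ALIASES = {
    "Current Population Survey": ["current population survey", "cps"],
    "Household Pulse Survey": ["household pulse survey"],
}

def extract_usage(publications):
    """publications: list of dicts with 'id', 'year', 'field', 'fulltext'."""
    usage = defaultdict(list)
    for pub in publications:
        text = pub["fulltext"].lower()
        for dataset, aliases in DATASET_ALIASES.items():
            if any(re.search(r"\b" + re.escape(a) + r"\b", text) for a in aliases):
                usage[dataset].append({"publication": pub["id"],
                                       "year": pub["year"],
                                       "field": pub["field"]})
    return usage

pubs = [{"id": "pub-001", "year": 2021, "field": "food security",
         "fulltext": "We merge the Current Population Survey with county-level data."}]
for dataset, records in extract_usage(pubs).items():
    print(dataset, "->", records)
```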
L. Cicci, S. Qian, C. Rodero, M. Strocchi, C. Corrado, F. Campos, S. Malik, A. Lee, A. Qayyum, K. Gillette, J. Isbister, R. Sy, M. Lee, M. Noseda, R. Wilkinson, G. Plank, M. Bishop, S. Niederer.
Personalising cardiac electrophysiology models from CT and ECG for 3D activation imaging and tissue characterisation, Subtitled Research Square Preprint, 2026.
Background: Electrocardiographic imaging maps cardiac electrical activity non-invasively but is restricted to the epicardium. Computational electrophysiology models can predict 3D activation and tissue properties but require extensive parameter calibration.
Methods: We introduce an unbiased workflow combining sensitivity analysis with emulator-based Bayesian history matching to calibrate over 100 organ- and tissue-scale parameters. The framework incorporates CT-scan images and 12-lead ECGs with a multi-scale electrophysiology model to generate personalised ventricular simulations.
Results: The framework was tested on seven subjects (four with synthetic and three with clinical ECGs), with validation performed using high-density body surface potentials from a 252-electrode vest for the clinical cases. Calibrated models reproduced individual ECG morphologies and showed strong agreement with independent measurements (Pearson's correlation coefficient: 0.80 ± 0.04).
Conclusions: The study links non-invasive data with high-fidelity simulations to estimate spatially-varying properties, supporting personalised cardiac modelling for clinical use.
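For readers unfamiliar with emulator-based Bayesian history matching, the core rule-out test can be sketched in a few lines. The quantities below (a toy QRS-duration emulator and a single conduction-velocity scaling) are illustrative assumptions, not the paper's calibration setup:

```python
# Minimal sketch of the implausibility test used in history matching (a generic
# formulation, not this paper's pipeline): a parameter setting is ruled out when
# the emulator prediction sits too many standard deviations from the observed target.
import numpy as np

def implausibility(target, emu_mean, emu_var, obs_var, model_disc_var=0.0):
    """I(theta) = |z - E[f(theta)]| / sqrt(Var_emu + Var_obs + Var_discrepancy)."""
    return np.abs(target - emu_mean) / np.sqrt(emu_var + obs_var + model_disc_var)

rng = np.random.default_rng(0)
theta = rng.uniform(0.3, 1.2, size=5000)          # e.g. a conduction-velocity scaling
emu_mean = 100.0 / theta                           # toy emulator of QRS duration (ms)
emu_var = np.full_like(theta, 4.0)                 # emulator (code) uncertainty
observed_qrs, obs_var = 95.0, 9.0                  # ECG-derived target and its variance

I = implausibility(observed_qrs, emu_mean, emu_var, obs_var)
not_ruled_out = theta[I < 3.0]                     # conventional 3-sigma cutoff
print(f"{not_ruled_out.size} of {theta.size} samples kept, "
      f"range [{not_ruled_out.min():.2f}, {not_ruled_out.max():.2f}]")
```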
H. Csala, A. Arzani.
Decomposed sparse modal optimization: Interpretable reduced-order modeling of unsteady flows, In International Journal of Heat and Fluid Flow, Vol. 117, Elsevier, pp. 110124. 2026.
ISSN: 0142-727X
DOI: https://doi.org/10.1016/j.ijheatfluidflow.2025.110124
Modal analysis plays a crucial role in fluid dynamics, offering a powerful tool for reducing the complexity of high-dimensional fluid flow data while extracting meaningful insights into flow physics. This is particularly important in the study of cardiovascular flows, where modal techniques help characterize unsteady flow structures, improve reduced-order modeling, and inform disease diagnosis and rapid medical device design. The most commonly used method, proper orthogonal decomposition (POD), is highly interpretable but suffers from its linearity, which limits its ability to capture nonlinear interactions. In this work, we introduce decomposed sparse modal optimization (DESMO), a nonlinear, adaptive extension of POD that improves the accuracy of flow field reconstruction while requiring fewer modes. We use modern gradient descent-based optimization tools to optimize the spatial modes and temporal coefficients concurrently, with a sparsity-promoting loss term. We demonstrate the method on a canonical fluid flow benchmark (flow over a cylinder), a real-world example (blood flow inside a brain aneurysm), and a turbulent channel flow. DESMO can identify spatial modes that resemble higher-order POD modes while uncovering entirely new spatial structures in some cases. Different versions of DESMO can leverage Fourier series for modeling temporal coefficients, an autoencoder for spatial mode optimization, and symbolic regression for discovering differential equations for temporal evolution. Our results demonstrate that DESMO not only provides a more accurate representation of fluid flows but also preserves the interpretability of classical POD by having an analytical modal decomposition equation, offering a promising approach for reduced-order modeling across engineering applications.
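A stripped-down sketch of the general idea, not the DESMO implementation: compare a truncated-SVD (POD) reconstruction against spatial modes and temporal coefficients fitted jointly by gradient descent under an L1 sparsity penalty, on synthetic snapshot data. The field, mode count, and penalty weight are arbitrary choices for illustration:

```python
# Hedged sketch: classical POD via truncated SVD versus a small set of spatial
# modes and temporal coefficients optimized jointly with a sparsity-promoting term.
import torch

torch.manual_seed(0)
n_space, n_time, r = 400, 120, 3
t = torch.linspace(0, 4 * torch.pi, n_time)
x = torch.linspace(0, 1, n_space)
# synthetic unsteady field: two oscillatory structures plus noise
data = (torch.outer(torch.sin(4 * x), torch.cos(3 * t))
        + 0.5 * torch.outer(torch.cos(7 * x), torch.sin(5 * t))
        + 0.01 * torch.randn(n_space, n_time))

# classical POD baseline via truncated SVD
U, S, Vh = torch.linalg.svd(data, full_matrices=False)
pod_recon = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r]

# jointly optimized spatial modes and temporal coefficients with sparsity on modes
phi = torch.randn(n_space, r, requires_grad=True)     # spatial modes
a = torch.randn(r, n_time, requires_grad=True)        # temporal coefficients
opt = torch.optim.Adam([phi, a], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = torch.mean((phi @ a - data) ** 2) + 1e-4 * phi.abs().mean()
    loss.backward()
    opt.step()

print("POD rel. error   :", (torch.norm(pod_recon - data) / torch.norm(data)).item())
print("sparse-opt error :", (torch.norm(phi @ a - data) / torch.norm(data)).item())
```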
Z. Cutler, J. Wilburn, H. Shrestha, Y. Ding, B. Bollen, K. Abrar Nadib, T. He, A. McNutt, L. Harrison, A. Lex.
ReVISit 2: A Full Experiment Life Cycle User Study Framework, In IEEE Transactions on Visualization and Computer Graphics, Vol. 32, IEEE, 2026.
Online user studies of visualizations, visual encodings, and interaction techniques are ubiquitous in visualization research. Yet, designing, conducting, and analyzing studies effectively is still a major burden. Although various packages support such user studies, most solutions address only facets of the experiment life cycle, make reproducibility difficult, or do not cater to nuanced study designs or interactions. We introduce reVISit 2, a software framework that supports visualization researchers at all stages of designing and conducting browser-based user studies. ReVISit supports researchers in the design, debug & pilot, data collection, analysis, and dissemination experiment phases by providing both technical affordances (such as replay of participant interactions) and sociotechnical aids (such as a mindfully maintained community of support). It is a proven system that can be (and has been) used in publication-quality studies, which we demonstrate through a series of experimental replications. We reflect on the design of the system via interviews and an analysis of its technical dimensions. Through this work, we seek to elevate the ease with which studies are conducted, improve the reproducibility of studies within our community, and support the construction of advanced interactive studies.
D. Dade, J.A. Bergquist, R.S. MacLeod, B.A. Steinberg, T. Tasdizen.
Self-Supervised Contrastive Learning Enables Robust ECG-Based Cardiac Classification, In Heart Rhythm O2, Elsevier, 2026.
T. Duan, Z. Wang, L. Shen, S. Niu, G. Doretto, D.A. Adjeroh, C. Tao.
Data Distribution Evolution for Robust EEG Emotion Recognition with Limited Data Resource, In IEEE Transactions on Affective Computing, IEEE, pp. 1--14. 2026.
DOI: 10.1109/TAFFC.2026.3684827
Proper decoding of human emotions based on physiological electroencephalography (EEG) signals significantly contributes to the development of human-computer interface related applications. Current major challenges hindering the recognition performance include the following: 1) the high variance and unknown noise that exist in the EEG recordings; 2) the sizes of EEG datasets are relatively small given the acquisition effort and annotation cost. It is worthwhile to explore approaches to improve decoding robustness under low data resource scenarios. Previous works utilized data augmentation techniques to tackle this problem using manually designed augmentation operations, leading to sub-optimal performance. In this study, we propose a principled framework to perform dynamic evolution on signal data and improve robustness in the presence of unknown corruptions or variances. The framework is formulated as bi-level distributionally robust optimization (DRO), and improves robustness by simultaneously optimizing over a family of evolved distributions instead of the single training data distribution. We transform the resulting gradient flow system into different types of concrete evolution instantiations based on Langevin dynamics and Hamiltonian dynamics, with tailored divergence measures serving as distance constraints. We performed extensive evaluation of the proposed approach on datasets covering different types of affective states, with model robustness tested on different types of corruptions and adversarial examples. The model outperforms competitive baselines by a significant margin on these challenging emotion recognition benchmarks, especially for low data resource scenarios.
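The bi-level structure can be sketched generically; the following code follows the common pattern of distributionally robust training with a Langevin-style inner step, not the paper's exact dynamics or divergence constraints, and the toy EEG shapes, model, and hyperparameters are assumptions:

```python
# Hedged sketch of a bi-level robust training step: an inner loop evolves each
# EEG batch to increase the loss under a proximity penalty (with a noise term,
# Langevin-style), and the outer loop trains the classifier on the evolved batch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 128, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def evolve_batch(x, y, steps=5, step_size=0.01, penalty=10.0, noise=0.001):
    z = x.clone()
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        obj = loss_fn(model(z), y) - penalty * torch.mean((z - x) ** 2)
        grad, = torch.autograd.grad(obj, z)
        z = z + step_size * grad + noise * torch.randn_like(z)   # Langevin-style step
    return z.detach()

x = torch.randn(16, 32, 128)      # toy EEG batch: 32 channels x 128 samples
y = torch.randint(0, 3, (16,))    # three affective-state classes
x_evolved = evolve_batch(x, y)    # inner maximization over the data distribution
opt.zero_grad()
loss_fn(model(x_evolved), y).backward()   # outer minimization over model weights
opt.step()
```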
Z.J. Eatough, R.J. Lisonbee, A.C. Peterson, S.Y. Elhabian, M. K. Mills, N. Krähenbühl, A. L. Lenz.
Morphologic assessment of peritalar compensation in patients with advanced varus ankle osteoarthritis, In Skeletal Radiology, Springer Nature, 2026.
Objective: Varus ankle malalignment is observed in most ankle osteoarthritis patients with approximately half of these patients presenting with peritalar compensation, where the subtalar joint is aligned valgus to compensate for a varus tibiotalar joint. This study developed a 3D weight-bearing computed tomography–based multi-bone statistical shape model to quantify morphologic and alignment differences between compensated and non-compensated presentations of advanced varus ankle osteoarthritis. Materials and methods: Our assessment included 70 individuals, 44 diagnosed with advanced varus ankle osteoarthritis, and 26 asymptomatic controls. Each participant underwent weight-bearing computed tomography. Semi-automatic segmentations produced patient-specific 3D bone reconstructions of the distal tibia, distal fibula, talus, calcaneus, navicular, and cuboid. A multi-bone statistical shape model was created using each of the 3D bone reconstructions. Joint space distance, coverage area, and congruence index were measured at equivalent anatomic locations within articular coverage obtained from the statistical shape model. Results: Eleven principal component analysis modes retained 85.8% variance. Significant differences existed in mode 1 (medial malleolus and talar dome morphology, fibular positioning; 26.6% variance, p
M. Elhadidy, R.M. D'Souza, A. Arzani.
SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators, Subtitled arXiv:2603.20410, 2026.
Scientific machine learning is increasingly used to build surrogate models, yet most models are trained under a restrictive assumption in which future data follow the same distribution as the training set. In practice, new experimental conditions or simulation regimes may differ significantly, requiring extrapolation and model updates without re-access to prior data. This creates a need for continual learning (CL) frameworks that can adapt to distribution shifts while preventing catastrophic forgetting. Such challenges are pronounced in fluid dynamics, where changes in geometry, boundary conditions, or flow regimes induce non-trivial changes to the solution. Here, we introduce a new architecture-based approach (SLE-FNO) combining a Single-Layer Extension (SLE) with the Fourier Neural Operator (FNO) to support efficient CL. SLE-FNO was compared with a range of established CL methods, including Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), replay-based approaches, Orthogonal Gradient Descent (OGD), Gradient Episodic Memory (GEM), PiggyBack, and Low-Rank Approximation (LoRA), within an image-to-image regression setting. The models were trained to map transient concentration fields to time-averaged wall shear stress (TAWSS) in pulsatile aneurysmal blood flow. Tasks were derived from 230 computational fluid dynamics simulations grouped into four sequential and out-of-distribution configurations. Results show that replay-based methods and architecture-based approaches (PiggyBack, LoRA, and SLE-FNO) achieve the best retention, with SLE-FNO providing the strongest overall balance between plasticity and stability, achieving accuracy with zero forgetting and minimal additional parameters. Our findings highlight key differences between CL algorithms and introduce SLE-FNO as a promising strategy for adapting baseline models when extrapolation is required.
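The architecture-based idea behind a single-layer extension can be sketched generically. The backbone below is a plain MLP stand-in rather than a Fourier Neural Operator, and the freezing scheme is a simplification of SLE-FNO: previously trained weights stay fixed (so earlier tasks are unaffected) and only a small appended extension is trained for the new task:

```python
# Hedged sketch of architecture-based continual learning via a frozen backbone
# plus a small trainable extension; a stand-in for the SLE idea, not the FNO code.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))
# ... assume `backbone` was already trained on the first task ...
for p in backbone.parameters():
    p.requires_grad_(False)                  # old weights stay fixed: zero forgetting

extension = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

def forward_new_task(x):
    return extension(backbone(x))            # new task uses backbone + extension

def forward_old_task(x):
    return backbone(x)                        # old-task path is completely untouched

opt = torch.optim.Adam(extension.parameters(), lr=1e-3)   # only the extension trains
x_new, y_new = torch.randn(32, 64), torch.randn(32, 64)
loss = nn.functional.mse_loss(forward_new_task(x_new), y_new)
opt.zero_grad(); loss.backward(); opt.step()
```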
I.J. Eliza, X. Huang, A. Panta, A. Sahistan, Z. Li, A.A. Gooch, V. Pascucci.
Animating Petascale Time-varying Data on Commodity Hardware with LLM-assisted Scripting, Subtitled arXiv:2603.07053v1, 2026.
Scientists face significant visualization challenges as time-varying datasets grow in speed and volume, often requiring specialized infrastructure and expertise to handle massive datasets. Petascale climate models generated in NASA laboratories require a dedicated group of graphics and media experts and access to high-performance computing resources. Scientists may need to share scientific results with the community iteratively and quickly. However, the time-consuming trial-and-error process incurs significant data transfer overhead and far exceeds the time and resources allocated for typical post-analysis visualization tasks, disrupting the production workflow. Our paper introduces a user-friendly framework for creating 3D animations of petascale, time-varying data on a commodity workstation. Our contributions: (i) Generalized Animation Descriptor (GAD) with a keyframe-based adaptable abstraction for animation, (ii) efficient data access from cloud-hosted repositories to reduce data management overhead, (iii) tailored rendering system, and (iv) an LLM-assisted conversational interface as a scripting module to allow domain scientists with no visualization expertise to create animations of their region of interest. We demonstrate the framework's effectiveness with two case studies: first, by generating animations in which sampling criteria are specified based on prior knowledge, and second, by generating AI-assisted animations in which sampling parameters are derived from natural-language user prompts. In all cases, we use large-scale NASA climate-oceanographic datasets that exceed 1PB in size yet achieve a fast turnaround time of 1 minute to 2 hours. Users can generate a rough draft of the animation within minutes, then seamlessly incorporate as much high-resolution data as needed for the final version.
S.A. Faroughi, F. Mostajeran, A. Arzani, S. Faroughi.
Symbolic--KAN: Kolmogorov-Arnold Networks with Discrete Symbolic Structure for Interpretable Learning, Subtitled arXiv:2603.23854, 2026.
Symbolic discovery of governing equations is a long-standing goal in scientific machine learning, yet a fundamental trade-off persists between interpretability and scalable learning. Classical symbolic regression methods yield explicit analytic expressions but rely on combinatorial search, whereas neural networks scale efficiently with data and dimensionality but produce opaque representations. In this work, we introduce Symbolic Kolmogorov-Arnold Networks (Symbolic-KANs), a neural architecture that bridges this gap by embedding discrete symbolic structure directly within a trainable deep network. Symbolic-KANs represent multivariate functions as compositions of learned univariate primitives applied to learned scalar projections, guided by a library of analytic primitives, hierarchical gating, and symbolic regularization that progressively sharpens continuous mixtures into one-hot selections. After gated training and discretization, each active unit selects a single primitive and projection direction, yielding compact closed-form expressions without post-hoc symbolic fitting. Symbolic-KANs further act as scalable primitive discovery mechanisms, identifying the most relevant analytic components that can subsequently inform candidate libraries for sparse equation-learning methods. We demonstrate that Symbolic-KAN reliably recovers correct primitive terms and governing structures in data-driven regression and inverse dynamical systems. Moreover, the framework extends to forward and inverse physics-informed learning of partial differential equations, producing accurate solutions directly from governing constraints while constructing compact symbolic representations whose selected primitives reflect the true analytical structure of the underlying equations. These results position Symbolic-KAN as a step toward scalable, interpretable, and mechanistically grounded learning of governing laws.
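A toy version of the core mechanism (a learned scalar projection fed through a softmax-gated library of analytic primitives) is sketched below. It is far simpler than the Symbolic-KAN architecture, and the sharpening penalty is a crude stand-in for the paper's symbolic regularization; after training, the argmax over the gate names the selected primitive:

```python
# Hedged sketch of a gated primitive unit: a learned projection is passed through
# a soft mixture of analytic primitives, and a small penalty nudges the mixture
# toward a one-hot selection so the unit can be read off as a symbolic expression.
import torch
import torch.nn as nn

PRIMS = [torch.sin, torch.exp, lambda z: z, lambda z: z ** 2]
PRIM_NAMES = ["sin", "exp", "identity", "square"]

class GatedPrimitiveUnit(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, 1)                   # learned scalar projection
        self.gate = nn.Parameter(torch.zeros(len(PRIMS)))  # mixture over primitives

    def forward(self, x):
        z = self.proj(x)                                   # (batch, 1)
        w = torch.softmax(self.gate, dim=0)
        return sum(wi * f(z) for wi, f in zip(w, PRIMS))

    def selected_primitive(self):
        return PRIM_NAMES[int(self.gate.argmax())]

torch.manual_seed(0)
x = torch.rand(512, 2) * 2 - 1
y = torch.sin(2.0 * x[:, :1] + 0.5)                        # target: a sine of a projection

unit = GatedPrimitiveUnit(in_dim=2)
opt = torch.optim.Adam(unit.parameters(), lr=5e-3)
for _ in range(3000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(unit(x), y)
    loss = loss + 1e-3 * torch.softmax(unit.gate, dim=0).prod()   # crude sharpening term
    loss.backward()
    opt.step()
print("selected primitive:", unit.selected_primitive())
```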
