Exploratory Data Analysis

Ariel Exoplanet Atmosphere Data Analysis

Understanding exoplanet atmospheric spectra from the NeurIPS Ariel Data Challenge 2024

673 Exoplanets | 283 Wavelengths | 170 GB Raw Data | Ariel Launch 2029
📦 Data Source: Kaggle Ariel Data Challenge 2024 (simulated data for the ESA Ariel Space Telescope, launching 2029)
🔭 What Is This Data?
The Ariel Space Telescope will study 1,000+ exoplanet atmospheres starting in 2029. This dataset simulates what Ariel will observe: light passing through alien atmospheres, revealing which molecules are present (water, CO₂, methane, etc.).
673 simulated exoplanets | 283 wavelength bins (1.1–7.8 μm) | 170 GB raw data
📊 Understanding the Data

Raw Telescope Data (11,250 frames per planet, containing noise & jitter) → Atmospheric Spectrum (283 wavelength measurements, revealing molecular composition)

Each spectrum tells us what molecules are in the exoplanet's atmosphere. Water shows absorption at certain wavelengths, CO₂ at others, etc.
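The frame-collapsing step can be sketched with synthetic numbers. The array shapes below mirror the figures quoted above (11,250 frames, 283 wavelengths), but the noise level and the "true" spectrum are invented for illustration, and the real pipeline also applies dark/flat calibration and jitter correction before collapsing the time axis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes mirroring the numbers above; the real pipeline
# also calibrates (darks, flats, dead pixels) before this step.
n_frames, n_wavelengths = 11_250, 283

# Made-up "true" transit spectrum plus per-frame Gaussian noise.
true_spectrum = 1.0 + 0.02 * np.sin(np.linspace(0.0, 4.0 * np.pi, n_wavelengths))
frames = true_spectrum + rng.normal(scale=0.05, size=(n_frames, n_wavelengths))

# Collapse the time axis: averaging beats per-frame noise down by ~sqrt(n_frames).
estimated = frames.mean(axis=0)
print(estimated.shape)                              # (283,)
print(float(np.abs(estimated - true_spectrum).max()))
```

With 11,250 frames, averaging reduces the per-frame noise by roughly a factor of √11,250 ≈ 106, which is why the recovered 283-point spectrum is far cleaner than any single frame.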

🧪 Appendix: ML Architecture Experiments

We also experimented with various ML models on a simplified spectral regression task. Note: This is not the same as solving the full competition problem.

| Rank | Model | Architecture | Paper Date | Test R² |
|------|-------|--------------|------------|---------|
| 🥇 | FourierKAN | Fourier-based KAN (statistically verified) | Jun 2024 | 0.9474 |
| 🥈 | Griffin (RG-LRU) | Gated Linear Recurrence (Google DeepMind) | Feb 2024 | 0.9415 |
| 🥉 | Liquid NN | Liquid Time-Constant Networks (MIT) | Jun 2020 | 0.9457 |
| 4th | Smooth Fourier | Physics-Informed Fourier Basis | Novel | 0.9441 |
| 5th | Griffin++ | Optimized RG-LRU Variant | Novel | 0.9434 |
| 6th | MEGA | EMA + Gated Attention (Meta) | Sep 2022 | 0.9415 |
| 7th | Neural ODE | ODE-based Dynamics (NeurIPS 2018 Best) | Jun 2018 | 0.9410 |
| 8th | JacobiKAN | Jacobi Polynomial KAN | Jun 2024 | 0.9391 |
| 9th | KAN | Polynomial KAN (MIT) | Apr 2024 | 0.9387 |
| 10th | TTT-Linear | Test-Time Training (Stanford/Meta) | Jul 2024 | 0.9382 |
| 11th | Hyena | Long Conv + Gating (Stanford) | Mar 2023 | 0.9379 |
| 12th | 1D-CNN | Spectral Convolutions | Classic | 0.9366 |
| 13th | Bayesian | MC Dropout | ICML 2016 | 0.9359 |
| 14th | Transformer | ViT-style Spectral | Oct 2020 | 0.9331 |
| 15th | xLSTM | Exponential Gating | May 2024 | 0.9262 |
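The Test R² column above is the standard coefficient of determination. A minimal implementation (equivalent to scikit-learn's `r2_score` for a 1-D target; how scores are aggregated across the 283 wavelengths is a separate choice not shown here):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y, y))                     # perfect prediction -> 1.0
print(r2_score(y, np.full(4, y.mean())))  # predicting the mean -> 0.0
```

R² = 0.94 therefore means the model explains about 94% of the variance the mean-spectrum baseline leaves unexplained.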
📖 Beginner's Guide to the Data

New to exoplanet spectroscopy? Start here! These visualizations explain everything from scratch.

What Is This Data?
📊 What IS this data? The big picture, key numbers, and input vs output explained
Data Patterns
🔍 Hidden patterns: Why spectra are smooth, low-dimensional, and predictable
Why ML Works
🤖 Why ML works here: Connecting data patterns to predictability
🔬 Deep Dive: Comprehensive Data Analysis

Ready to go deeper? These visualizations explore the full dataset in detail.

Complete Data Overview
📊 Complete Overview: Key statistics, spectral coverage, data pipeline, and distributions
Wavelength Science
🌈 Wavelength Science: Mean spectrum, variance analysis, and wavelength correlations
Planet Diversity
🪐 Planet Diversity: Sample spectra, PCA clustering, dimensionality analysis
Calibration Data
🔧 Calibration Data: Dark frames, flat fields, dead pixels - what the raw data looks like
Statistical Analysis
📈 Advanced Statistics: Distribution shapes, smoothness, normality tests
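The dimensionality analysis mentioned above boils down to PCA on the spectrum matrix. A self-contained sketch with synthetic data standing in for the real 673 × 283 matrix (the rank-5 structure is an assumption chosen to mimic the "low-dimensional" finding, not a measured property of the dataset):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the 673 x 283 spectrum matrix: each spectrum is
# a random mix of 5 smooth basis shapes plus small noise, so ~5 principal
# components should capture nearly all of the variance.
wl = np.linspace(0.0, 1.0, 283)
basis = np.stack([np.sin((k + 1) * np.pi * wl) for k in range(5)])
spectra = rng.normal(size=(673, 5)) @ basis + 0.01 * rng.normal(size=(673, 283))

X = spectra - spectra.mean(axis=0)            # center before PCA
s = np.linalg.svd(X, compute_uv=False)        # singular values
explained = np.cumsum(s**2) / np.sum(s**2)    # cumulative explained variance
print(explained[:5])                          # rises to ~1.0 by the 5th PC
```

When a handful of components explain nearly all the variance, a model only has to learn a few effective degrees of freedom rather than 283 independent values, which is the core of the "why ML works here" argument.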
🧠 Novel 2024 Architectures

FourierKAN (Jun 2024) 🥇

Fourier series as learnable activation functions. Ideal for spectral data with periodic wavelength patterns. arXiv:2406.01034
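The core idea fits in a few lines. A hedged sketch of a single FourierKAN "edge" function in numpy (the sizes and coefficients below are arbitrary and untrained; the actual layer in arXiv:2406.01034 learns one coefficient set per input-output edge by backpropagation):

```python
import numpy as np

def fourier_edge(x, a, b):
    """One KAN 'edge' function: phi(x) = sum_k a_k*cos(k*x) + b_k*sin(k*x)."""
    k = np.arange(1, len(a) + 1)          # integer frequencies 1..K
    return (a * np.cos(np.outer(x, k)) + b * np.sin(np.outer(x, k))).sum(axis=1)

# Hypothetical size and random (untrained) coefficients, for shape only.
rng = np.random.default_rng(42)
K = 8
a, b = rng.normal(size=K), rng.normal(size=K)
x = np.linspace(-np.pi, np.pi, 5)
phi = fourier_edge(x, a, b)
print(phi.shape)                          # (5,)
```

Because the frequencies are integers, each edge function is 2π-periodic in its input, which is what makes this basis a natural fit for smooth, wave-like spectral data.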

KAN (Apr 2024)

Kolmogorov-Arnold Networks use learnable polynomial/spline activations on edges. Published by MIT & Northeastern. arXiv:2404.19756

Mamba (Dec 2023)

State Space Model with selective mechanism by Gu & Dao (CMU/Princeton). O(n) complexity. arXiv:2312.00752
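The O(n) claim comes from the fact that a state-space layer is, at heart, a gated linear recurrence computed in a single pass over the sequence. A minimal scalar sketch (in Mamba proper, a_t and b_t are learned functions of the input, which is the "selective" part; here they are fixed arrays for illustration):

```python
import numpy as np

def selective_scan(x, a, b):
    """O(n) gated linear recurrence: h_t = a_t*h_{t-1} + b_t*x_t.
    In Mamba the 'selective' part is that a_t and b_t are learned
    functions of x_t; here they are fixed arrays for illustration."""
    h, out = 0.0, []
    for a_t, b_t, x_t in zip(a, b, x):
        h = a_t * h + b_t * x_t
        out.append(h)
    return np.array(out)

x = np.ones(5)
h = selective_scan(x, a=np.full(5, 0.5), b=np.full(5, 1.0))
print(h)   # geometric approach toward b*x/(1 - a) = 2.0
```

One multiply-add per step gives linear cost in sequence length, versus the quadratic cost of full attention.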

xLSTM (May 2024)

Extended LSTM by Hochreiter et al. (JKU Linz). Exponential gating and matrix memory. arXiv:2405.04517

1D-CNN

Classic but effective for spectral data. Treats input as 1D signal and extracts local patterns through convolution. Fast and proven.
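The sliding-window operation is easy to picture with a fixed kernel; a real 1D-CNN learns many such kernels per layer instead of using a hand-picked one. A sketch (the 3-tap smoothing kernel is chosen purely for illustration):

```python
import numpy as np

# Slide a fixed 3-tap smoothing kernel along a synthetic 283-bin spectrum.
# A 1D-CNN layer performs exactly this operation, but with many learned
# kernels whose weights are fit by backpropagation.
spectrum = np.sin(np.linspace(0.0, 4.0 * np.pi, 283)) + 1.0
kernel = np.array([0.25, 0.5, 0.25])
smoothed = np.convolve(spectrum, kernel, mode="same")
print(smoothed.shape)   # (283,)
```

Each output bin is a weighted average of its neighborhood, so stacked conv layers build up detectors for local spectral shapes such as absorption dips.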

🛠️ Tech Stack

FourierKAN (Novel)
KAN (Novel)
Mamba SSM
xLSTM
PyTorch + CUDA
RTX 3060 12GB
170 GB ZIP (~360 GB unzipped)
scikit-learn
📚 All 17 Papers Cited

๐Ÿ† Griffin (RG-LRU)

arXiv:2402.19427 (Feb 2024)
De, Smith et al., Google DeepMind. Gated Linear Recurrence.

FourierKAN 🥈

arXiv:2406.01034 (June 2024)
Fourier Kolmogorov-Arnold Network.

KAN

arXiv:2404.19756 (April 2024)
Liu et al., MIT & Northeastern. Learnable polynomial activations.

JacobiKAN

arXiv:2406.09798 (June 2024)
Fractional KAN with Jacobi orthogonal polynomials.

WavKAN

arXiv:2405.12832 (May 2024)
Bozorgasl & Chen, Boise State. Morlet wavelet multi-resolution.

Liquid NN 🥉

arXiv:2006.04439 (June 2020)
Hasani, Lechner et al. MIT & TU Wien. Liquid Time-Constant Networks.

Neural ODE

arXiv:1806.07366 (NeurIPS 2018 Best Paper)
Chen et al., U of Toronto. Continuous ODE dynamics.

Hyena

arXiv:2302.10866 (March 2023)
Poli et al., Stanford. Long convolutions + data-controlled gating.

MEGA

arXiv:2209.10655 (Sep 2022)
Ma et al., Meta AI (FAIR). EMA + Gated Attention.

TTT-Linear

arXiv:2407.04620 (July 2024)
Sun et al., Stanford & Meta. Test-Time Training with hidden state learning.

Mamba

arXiv:2312.00752 (Dec 2023)
Gu & Dao, CMU & Princeton. Selective state spaces, O(n) complexity.

Mamba2

arXiv:2405.21060 (ICML 2024)
Dao & Gu. Structured State Space Duality. 2-8× faster.

xLSTM

arXiv:2405.04517 (May 2024)
Beck, Hochreiter et al., JKU Linz. Exponential gating & matrix memory.

MC Dropout (Bayesian)

arXiv:1506.02142 (ICML 2016)
Gal & Ghahramani, Cambridge. Dropout as Bayesian approximation.

Vision Transformer

arXiv:2010.11929 (Oct 2020)
Dosovitskiy et al., Google Research. Patch-based transformer.