Gaia Eclipsing Binary Temperature Prediction Models

Pre-trained Random Forest models for effective temperature (Teff) prediction of eclipsing binary stars using Gaia DR3 photometry.

Models Overview

This repository contains the models used in the best-of-four ensemble approach for temperature prediction of 2.1 million eclipsing binaries.

Best-of-Four Ensemble Models

For each object, the ensemble keeps the prediction with the lowest Random Forest uncertainty among these four models:

| Model | Description | Features | MAE | R² | Ensemble contribution |
|---|---|---|---|---|---|
| Flag 1 | Highest-quality Gaia sources | Gaia photometry | 195 K | 0.91 | 29.2% |
| Teff-only | Colors only (corrected) | Gaia photometry | 290 K | 0.67 | 25.6% |
| Teff+logg | Colors + surface gravity | Gaia + logg | 277 K | 0.73 | 23.8% |
| Teff+clustering | Colors + cluster features | Gaia + 5 cluster probabilities | 610 K | 0.65 | 21.4% |
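The selection rule can be sketched in NumPy as below. This is a minimal illustration, not the pipeline used to build the catalog. It assumes that each model's predictions and Random Forest uncertainties are already aligned in arrays of shape (n_models, n_objects), with NaN where a model has no prediction. It also reads the contribution column as the fraction of covered objects for which each model wins. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def best_of_n(preds, sigmas):
    """Per object, keep the prediction with the smallest RF uncertainty.

    preds, sigmas: arrays of shape (n_models, n_objects); NaN = no prediction.
    Returns (teff, teff_err, winner, share); winner is -1 and teff/teff_err
    are NaN for objects that no model covers.
    """
    preds = np.asarray(preds, dtype=float)
    sig = np.asarray(sigmas, dtype=float)
    # Missing uncertainties can never win the argmin
    sig = np.where(np.isnan(sig), np.inf, sig)

    winner = np.argmin(sig, axis=0)          # index of the winning model per object
    cols = np.arange(preds.shape[1])
    teff = preds[winner, cols]
    teff_err = sig[winner, cols]

    covered = np.isfinite(teff_err)          # at least one model had an uncertainty
    teff = np.where(covered, teff, np.nan)
    teff_err = np.where(covered, teff_err, np.nan)
    winner = np.where(covered, winner, -1)

    # Fraction of covered objects won by each model (assumed meaning of "contribution")
    share = np.bincount(winner[covered], minlength=preds.shape[0]) / max(covered.sum(), 1)
    return teff, teff_err, winner, share


# Toy example: 4 models x 4 objects (hypothetical numbers); the last object has no prediction
nan = np.nan
preds = [[5800, 6100, 9000, nan],
         [5750, 6050, nan, nan],
         [5900, 6000, 9500, nan],
         [5600, 6200, nan, nan]]
sigmas = [[150, 400, 300, nan],
          [200, 100, nan, nan],
          [250, 300, 200, nan],
          [600, 500, nan, nan]]
teff, teff_err, winner, share = best_of_n(preds, sigmas)
```

Objects that no model covers are reported with winner -1. This mirrors the fact that the catalog covers 97.2% rather than all of the eclipsing binaries.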

Ensemble Performance:

  • Mean uncertainty: 203 K (22.8% improvement over the single best model)
  • Coverage: 97.2% of 2.1M eclipsing binaries

Available Models

1. Flag 1 Model (Primary)

Model ID: rf_gaia_colors_flag1_20251218_094937

Description: Trained exclusively on high-quality Gaia GSP-Phot sources (flag 1) with corrected temperatures.

Training Data:

  • 711,666 high-quality Gaia sources
  • Target: Corrected Teff (polynomial correction for stars > 10,000 K)

Features (6):

  • Gaia photometry: g, bp, rp
  • Gaia colors: bp_rp, g_rp, g_bp
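The helper below is a sketch for building the Flag 1 feature matrix from the three magnitudes. It assumes each color is the plain difference of the two magnitudes in its name, which is consistent with the example values under Usage. It also assumes the columns must follow the order listed above. The helper name is hypothetical. In the Gaia DR3 archive, g, bp and rp presumably correspond to the phot_g_mean_mag, phot_bp_mean_mag and phot_rp_mean_mag columns.

```python
import numpy as np

# Feature order as listed for the Flag 1 model (assumed to match training)
FLAG1_FEATURES = ["g", "bp", "rp", "bp_rp", "g_rp", "g_bp"]


def flag1_features(g, bp, rp):
    """Return an (n, 6) array of features in FLAG1_FEATURES order."""
    g, bp, rp = (np.asarray(x, dtype=float) for x in (g, bp, rp))
    # Colors assumed to be simple magnitude differences, e.g. bp_rp = bp - rp
    return np.column_stack([g, bp, rp, bp - rp, g - rp, g - bp])


X = flag1_features([15.5], [16.2], [14.8])
```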

Performance:

  • MAE: 195 K
  • RMSE: 312 K
  • R²: 0.9126
  • Within 10%: 98.5%
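For reference, the four reported metrics can be computed from true and predicted temperatures as in the NumPy sketch below. It assumes "Within 10%" means the fraction of objects with |predicted − true| / true ≤ 0.1. The exact definition behind the numbers in this card is not stated, and the function name is hypothetical.

```python
import numpy as np


def teff_metrics(y_true, y_pred):
    """MAE, RMSE, R² and fraction within 10% for temperatures in K."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_pred - y_true
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mae": float(np.mean(np.abs(resid))),
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
        # Assumed definition: relative error of at most 10%
        "within_10pct": float(np.mean(np.abs(resid) / y_true <= 0.10)),
    }


m = teff_metrics([5000, 6000, 8000, 10000], [5100, 5500, 8000, 12000])
```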

Files:

  • rf_gaia_colors_flag1_20251218_094937.pkl (1.37 GB)
  • rf_gaia_colors_flag1_20251218_094937_metadata.json
  • rf_gaia_colors_flag1_20251218_094937_SUMMARY.txt

2. Cluster Model

Model ID: rf_gaia_cluster_teff_corrected_20251208_092934

Description: Gaia photometry enhanced with K-means clustering features for improved temperature prediction.

Training Data:

  • Same training set as other corrected models
  • Target: Corrected Teff

Features (12):

  • Gaia photometry: g, bp, rp
  • Gaia colors: bp_rp, g_rp, g_bp, bp_g
  • Clustering probabilities: 5 cluster probabilities from K-means (k=5)

Performance:

  • MAE: 610 K
  • RMSE: 1171 K
  • R²: 0.6535
  • Within 10%: 65.6%

Top Features:

  1. bp_rp (20.5%)
  2. g_bp (17.6%)
  3. cluster_prob_2 (13.0%)
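The percentages above look like the per-feature shares that a fitted scikit-learn forest exposes as feature_importances_ (impurity-based, summing to 1). That is an assumption. The helper below ranks such importances. The feature names are hypothetical, in particular the 0-based cluster_prob_* indexing and the column order, so verify them against the training setup before trusting the labels.

```python
import numpy as np

# Hypothetical names for the cluster model, in the order listed under Features
CLUSTER_FEATURES = (["g", "bp", "rp", "bp_rp", "g_rp", "g_bp", "bp_g"]
                    + [f"cluster_prob_{i}" for i in range(5)])


def top_features(importances, names, k=3):
    """Return the k (name, share) pairs with the largest importance."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]   # indices sorted by descending importance
    return [(names[i], float(importances[i])) for i in order]


# With a loaded model: top_features(model.feature_importances_, CLUSTER_FEATURES)
demo = top_features([0.1, 0.3, 0.6], ["a", "b", "c"], k=2)
```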

Files:

  • rf_gaia_cluster_teff_corrected_20251208_092934.pkl (1.26 GB)
  • rf_gaia_cluster_teff_corrected_20251208_092934_metadata.json
  • rf_gaia_cluster_teff_corrected_20251208_092934_SUMMARY.txt
  • rf_gaia_cluster_teff_corrected_20251208_092934_clustering_kmeans.pkl (4.1 MB)
  • rf_gaia_cluster_teff_corrected_20251208_092934_clustering_scaler.pkl (1 KB)

Note: This model requires the clustering model and scaler for feature generation before prediction.
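scikit-learn's KMeans has no predict_proba method, and this card does not say how the five cluster probabilities are derived. If the pickled clustering object turns out to be a plain KMeans, one common way to get soft assignments is a softmax over negative squared distances to the centroids, as sketched below. This is an assumption and not necessarily the scheme used in training. The helper name and the temperature parameter are hypothetical. Probabilities computed differently from the training-time ones will not match what the forest expects.

```python
import numpy as np


def soft_cluster_probs(x_scaled, centers, temperature=1.0):
    """Softmax over negative squared distances to each centroid.

    x_scaled: (n, d) scaled features; centers: (k, d), e.g. kmeans.cluster_centers_.
    Returns an (n, k) array whose rows sum to 1.
    """
    x = np.asarray(x_scaled, dtype=float)
    c = np.asarray(centers, dtype=float)
    d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


probs = soft_cluster_probs([[0.0, 0.0], [5.0, 5.0]], [[0.0, 0.0], [10.0, 10.0]])
```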

Usage

Loading a Model

import joblib
from huggingface_hub import hf_hub_download

# Download the model file from the Hugging Face Hub (about 1.37 GB; cached after the first call)
model_path = hf_hub_download(
    repo_id="Dedulek/gaia-eb-teff-models",
    filename="rf_gaia_colors_flag1_20251218_094937.pkl",
)

# Load the fitted Random Forest
model = joblib.load(model_path)
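The Cluster model needs three files rather than one. The sketch below fetches them together using the file names listed under "2. Cluster Model". The helper names are hypothetical. huggingface_hub is imported only when a download is actually requested.

```python
CLUSTER_ID = "rf_gaia_cluster_teff_corrected_20251208_092934"
REPO_ID = "Dedulek/gaia-eb-teff-models"


def cluster_model_files(model_id=CLUSTER_ID):
    """File names needed for the Cluster model: main forest, K-means, scaler."""
    return [
        f"{model_id}.pkl",
        f"{model_id}_clustering_kmeans.pkl",
        f"{model_id}_clustering_scaler.pkl",
    ]


def download_cluster_model(repo_id=REPO_ID):
    """Download each file and return {file name: local cached path}."""
    from huggingface_hub import hf_hub_download  # imported lazily

    return {name: hf_hub_download(repo_id=repo_id, filename=name)
            for name in cluster_model_files()}


# Usage (requires network access and joblib):
# paths = download_cluster_model()
# model = joblib.load(paths[f"{CLUSTER_ID}.pkl"])
```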

Making Predictions (Flag 1 Model)

import numpy as np

# Features in the order listed for this model: g, bp, rp, bp_rp, g_rp, g_bp
features = np.array([[
    15.5,  # g magnitude
    16.2,  # bp magnitude
    14.8,  # rp magnitude
    1.4,   # bp_rp color (bp - rp)
    0.7,   # g_rp color (g - rp)
    -0.7,  # g_bp color (g - bp)
]])

# Predict temperature (point estimate: the mean over all trees)
teff_pred = model.predict(features)[0]
print(f"Predicted Teff: {teff_pred:.0f} K")

# Uncertainty: predict() returns no uncertainty; RF uncertainties have to be
# computed from the individual tree predictions (model.estimators_)
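One way to get a per-object uncertainty from a fitted scikit-learn forest is the spread of its individual tree predictions, available through model.estimators_. The sketch below returns the mean and standard deviation across all trees. This assumes the approach is roughly what "full tree method" under Model Training refers to. The exact estimator behind the published uncertainties may differ, and the function name is hypothetical.

```python
import numpy as np


def rf_predict_with_uncertainty(model, X):
    """Mean and standard deviation of the per-tree predictions.

    model: a fitted forest exposing estimators_ (e.g. RandomForestRegressor).
    X: (n, n_features) array in the training feature order.
    """
    X = np.asarray(X, dtype=float)
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])  # (n_trees, n)
    # For RandomForestRegressor the mean matches model.predict(X) up to rounding
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# teff, teff_sigma = rf_predict_with_uncertainty(model, features)
```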

Using Cluster Model

import joblib
import numpy as np

# Load the main model and its clustering components
# (local copies of the files listed under "2. Cluster Model")
prefix = "rf_gaia_cluster_teff_corrected_20251208_092934"
model = joblib.load(f"{prefix}.pkl")
kmeans = joblib.load(f"{prefix}_clustering_kmeans.pkl")
scaler = joblib.load(f"{prefix}_clustering_scaler.pkl")

# Base features in the listed order: g, bp, rp, bp_rp, g_rp, g_bp, bp_g
# (bp_g = bp - g, assuming the same naming convention as the other colors)
base_features = np.array([[15.5, 16.2, 14.8, 1.4, 0.7, -0.7, 0.7]])

# Generate clustering features from the scaled base features
scaled_features = scaler.transform(base_features)
# Note: scikit-learn's KMeans has no predict_proba(); this call assumes the
# pickled clustering object provides one (check type(kmeans) after loading)
cluster_probs = kmeans.predict_proba(scaled_features)  # shape (1, 5)

# Combine base and cluster features (7 + 5 columns) and check the width
full_features = np.hstack([base_features, cluster_probs])
assert full_features.shape[1] == model.n_features_in_, "feature count mismatch"

# Predict
teff_pred = model.predict(full_features)[0]
print(f"Predicted Teff: {teff_pred:.0f} K")

Model Training

All models were trained using:

  • Algorithm: Random Forest Regressor
  • Trees: 100 estimators
  • Max depth: None (fully grown trees)
  • Uncertainty: Full tree method for accurate RF uncertainties
  • Target: Corrected Teff (polynomial correction applied for stars > 10,000 K)
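The listed settings correspond to a scikit-learn configuration along these lines. random_state and n_jobs are illustrative additions that this card does not state, and the helper name is hypothetical. scikit-learn is imported only when the estimator is built.

```python
# Hyperparameters listed under Model Training
RF_PARAMS = {"n_estimators": 100, "max_depth": None}  # None = fully grown trees


def make_teff_forest(random_state=42, n_jobs=-1):
    """Build an unfitted RandomForestRegressor with the listed settings.

    random_state and n_jobs are illustrative, not documented choices.
    """
    from sklearn.ensemble import RandomForestRegressor

    return RandomForestRegressor(**RF_PARAMS, random_state=random_state, n_jobs=n_jobs)


# forest = make_teff_forest().fit(X_train, y_train_corrected)  # hypothetical arrays
```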

Correction Coefficients

Temperature correction coefficients are available in the companion dataset repository, Dedulek/gaia-eb-teff-datasets.
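The card states only that a polynomial correction was applied to stars above 10,000 K. Assuming the correction is a polynomial in Teff applied to values above that threshold, it could be evaluated as below. The coefficients are not reproduced here and must come from the dataset repository. The function name and the coefficient ordering (highest degree first, as numpy.polyval expects) are assumptions.

```python
import numpy as np


def correct_teff(teff, coeffs, threshold=10_000.0):
    """Apply a polynomial correction to temperatures above `threshold` (K).

    coeffs: polynomial coefficients, highest degree first (numpy.polyval order).
    Temperatures at or below the threshold are returned unchanged.
    """
    teff = np.asarray(teff, dtype=float)
    return np.where(teff > threshold, np.polyval(coeffs, teff), teff)


# Placeholder coefficients for illustration only (the identity polynomial)
example = correct_teff([5000.0, 12000.0], coeffs=[1.0, 0.0])
```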

Companion Dataset

The predictions from these models are available in: Dedulek/gaia-eb-teff-datasets

The dataset includes:

  • Best-of-four catalog (2.1M eclipsing binaries)
  • Photometry data (Gaia + Pan-STARRS + 2MASS)
  • Quality flags and uncertainties

Citation

If you use these models, please cite:

@misc{gaia_eb_teff_models_2024,
  title={Gaia Eclipsing Binary Temperature Prediction Models},
  author={Your Name/Team},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Dedulek/gaia-eb-teff-models}}
}

Model Performance Comparison

Flag 1 Model

  • Best for: High-quality predictions with lowest uncertainty
  • Strengths: Trained on the highest-quality Gaia sources; best MAE (195 K) and R² (0.91) of the four ensemble models
  • Use case: When you need maximum accuracy and have Gaia flag 1 sources

Cluster Model

  • Best for: Objects with distinct clustering patterns
  • Strengths: Captures population structure, good for certain stellar types
  • Use case: Complementary to photometry-only models

License

These models are released under CC BY 4.0. You are free to use, share, and adapt with attribution.

Updates

  • 2025-12-18: Added Flag 1 model (29.2% contribution to ensemble)
  • 2025-12-09: Best-of-four ensemble approach (203 K mean uncertainty)
  • 2025-12-08: Added Cluster model with K-means features

Contact

For questions or issues, please open an issue in the repository or contact the authors.
