Gaia Eclipsing Binary Temperature Prediction Models
Pre-trained Random Forest models for effective temperature (Teff) prediction of eclipsing binary stars using Gaia DR3 photometry.
Models Overview
This repository contains the models used in the best-of-four ensemble approach for temperature prediction of 2.1 million eclipsing binaries.
Best-of-Four Ensemble Models
The ensemble selects the prediction with the lowest Random Forest uncertainty for each object from four models:
| Model | Description | Features | Performance | Contribution |
|---|---|---|---|---|
| Flag 1 | Highest quality Gaia sources | Gaia photometry | MAE: 195 K, R²: 0.91 | 29.2% |
| Teff-only | Colors only (corrected) | Gaia photometry | MAE: 290 K, R²: 0.67 | 25.6% |
| Teff+logg | Colors + surface gravity | Gaia + logg | MAE: 277 K, R²: 0.73 | 23.8% |
| Teff+clustering | Colors + cluster features | Gaia + 5 cluster probs | MAE: 610 K, R²: 0.65 | 21.4% |
Ensemble Performance:
- Mean uncertainty: 203 K (22.8% improvement over single best model)
- Coverage: 97.2% of 2.1M eclipsing binaries
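The selection rule described above (keep, for each object, the prediction with the lowest Random Forest uncertainty) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `best_of_four`, the array layout, and the use of NaN for "model not applicable" are all assumptions, and the numbers are made up.

```python
import numpy as np

def best_of_four(preds, sigmas):
    """Pick, per object, the prediction whose uncertainty is smallest.

    preds, sigmas: arrays of shape (n_models, n_objects). NaN marks a model
    that could not be applied to an object (assumed convention).
    Returns (teff, sigma, model_index); model_index is -1 where no model applies.
    """
    preds = np.asarray(preds, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Treat missing predictions as infinitely uncertain so they are never chosen.
    masked = np.where(np.isnan(preds) | np.isnan(sigmas), np.inf, sigmas)
    best = np.argmin(masked, axis=0)
    cols = np.arange(preds.shape[1])
    no_model = np.isinf(masked[best, cols])
    teff = np.where(no_model, np.nan, preds[best, cols])
    sigma = np.where(no_model, np.nan, sigmas[best, cols])
    best = np.where(no_model, -1, best)
    return teff, sigma, best

# Toy input: 4 models (rows) x 3 objects (columns); values are illustrative only.
preds = [[5800.0, 7200.0, np.nan],
         [5900.0, 7000.0, np.nan],
         [5750.0, np.nan, np.nan],
         [6100.0, 7500.0, np.nan]]
sigmas = [[150.0, 400.0, np.nan],
          [300.0, 250.0, np.nan],
          [120.0, np.nan, np.nan],
          [500.0, 600.0, np.nan]]

teff, sigma, chosen = best_of_four(preds, sigmas)
print(chosen)  # index of the selected model per object, -1 if none applies
```

Objects for which no model yields a prediction stay NaN, which is one way a catalog could end up with less than 100% coverage.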
Available Models
1. Flag 1 Model (Primary)
Model ID: rf_gaia_colors_flag1_20251218_094937
Description: Trained exclusively on high-quality Gaia GSP-Phot sources (flag 1) with corrected temperatures.
Training Data:
- 711,666 high-quality Gaia sources
- Target: Corrected Teff (polynomial correction for stars hotter than 10,000 K)
Features (6):
- Gaia photometry: g, bp, rp
- Gaia colors: bp_rp, g_rp, g_bp
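As a sketch of how the six input columns might be assembled from the three magnitudes: the helper name `flag1_features` is hypothetical, and both the column order and the convention that each color is a plain magnitude difference (for example `bp_rp = bp - rp`) are assumptions, chosen to agree with the example values in the Usage section.

```python
import numpy as np

def flag1_features(g, bp, rp):
    """Build the (n, 6) matrix [g, bp, rp, bp_rp, g_rp, g_bp] (assumed order)."""
    g, bp, rp = (np.asarray(x, dtype=float) for x in (g, bp, rp))
    return np.column_stack([
        g, bp, rp,
        bp - rp,  # bp_rp
        g - rp,   # g_rp
        g - bp,   # g_bp
    ])

X = flag1_features([15.5], [16.2], [14.8])
print(np.round(X, 2))
```

If the saved metadata JSON lists the feature names, that order should take precedence over the one assumed here.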
Performance:
- MAE: 195 K
- RMSE: 312 K
- R²: 0.9126
- Within 10%: 98.5%
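For reference, the four quantities listed above can be computed as in the sketch below. The function name `regression_metrics` is hypothetical, and "Within 10%" is interpreted here as the share of predictions whose absolute error is at most 10% of the reference Teff; the README does not spell out that definition, so treat it as an assumption. The input values are made up.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R² and the percentage of predictions within 10%."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Coefficient of determination: 1 - SS_res / SS_tot
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Assumed definition: |error| <= 10% of the reference temperature
    within_10 = 100.0 * np.mean(np.abs(err) <= 0.10 * y_true)
    return {"mae": mae, "rmse": rmse, "r2": r2, "within_10_pct": within_10}

metrics = regression_metrics(
    [5000, 6000, 8000, 12000],   # reference Teff in K (illustrative)
    [5100, 5900, 8000, 14000],   # predicted Teff in K (illustrative)
)
print(metrics)
```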
Files:
- rf_gaia_colors_flag1_20251218_094937.pkl (1.37 GB)
- rf_gaia_colors_flag1_20251218_094937_metadata.json
- rf_gaia_colors_flag1_20251218_094937_SUMMARY.txt
2. Cluster Model
Model ID: rf_gaia_cluster_teff_corrected_20251208_092934
Description: Gaia photometry enhanced with K-means clustering features for improved temperature prediction.
Training Data:
- Same training set as other corrected models
- Target: Corrected Teff
Features (12):
- Gaia photometry: g, bp, rp
- Gaia colors: bp_rp, g_rp, g_bp, bp_g
- Clustering probabilities: 5 cluster probabilities from K-means (k=5)
Performance:
- MAE: 610 K
- RMSE: 1171 K
- R²: 0.6535
- Within 10%: 65.6%
Top Features:
- bp_rp (20.5%)
- g_bp (17.6%)
- cluster_prob_2 (13.0%)
Files:
- rf_gaia_cluster_teff_corrected_20251208_092934.pkl (1.26 GB)
- rf_gaia_cluster_teff_corrected_20251208_092934_metadata.json
- rf_gaia_cluster_teff_corrected_20251208_092934_SUMMARY.txt
- rf_gaia_cluster_teff_corrected_20251208_092934_clustering_kmeans.pkl (4.1 MB)
- rf_gaia_cluster_teff_corrected_20251208_092934_clustering_scaler.pkl (1 KB)
Note: This model requires the clustering model and scaler for feature generation before prediction.
Usage
Loading a Model
import joblib
from huggingface_hub import hf_hub_download
# Download the model file from the Hugging Face Hub (1.37 GB; cached locally after the first call)
model_path = hf_hub_download(
    repo_id="Dedulek/gaia-eb-teff-models",
    filename="rf_gaia_colors_flag1_20251218_094937.pkl"
)
# Load the model (unpickling needs a scikit-learn version compatible with the one used for training)
model = joblib.load(model_path)
Making Predictions (Flag 1 Model)
import numpy as np
# Prepare features (example): one row per object, columns in the order listed under Features
features = np.array([[
    15.5,  # g magnitude
    16.2,  # bp magnitude
    14.8,  # rp magnitude
    1.4,   # bp_rp color
    0.7,   # g_rp color
    -0.7   # g_bp color
]])
# Predict temperature
teff_pred = model.predict(features)[0]
print(f"Predicted Teff: {teff_pred:.0f} K")
# model.predict() returns the point estimate only.
# Note: the RF uncertainty requires the full tree method (see Model Training).
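The prediction example above yields only a point estimate. One common way to attach an uncertainty to a Random Forest prediction is to take the spread of the individual trees' predictions; whether this is exactly the "full tree method" used for these models is an assumption, as is the helper name `rf_predict_with_std`. To stay runnable without the 1.37 GB download, the sketch fits a small stand-in forest on synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_predict_with_std(forest, X):
    """Return (mean, std) of the per-tree predictions for each row of X."""
    X = np.asarray(X, dtype=float)
    # estimators_ holds the fitted decision trees of the forest.
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)

# Stand-in forest on synthetic data (NOT the published model).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 3.0, size=(200, 1))
y_demo = 9000.0 - 1500.0 * X_demo[:, 0] + rng.normal(0.0, 100.0, size=200)
demo_forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

mean_teff, std_teff = rf_predict_with_std(demo_forest, [[0.5], [1.4]])
print(mean_teff, std_teff)
```

With the downloaded model, the same helper would be called as `rf_predict_with_std(model, features)`. For a regressor, the mean over trees coincides with `model.predict`.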
Using Cluster Model
import joblib
import numpy as np
# Load the main model and its clustering components
model = joblib.load("rf_gaia_cluster_teff_corrected_20251208_092934.pkl")
kmeans = joblib.load("rf_gaia_cluster_teff_corrected_20251208_092934_clustering_kmeans.pkl")
scaler = joblib.load("rf_gaia_cluster_teff_corrected_20251208_092934_clustering_scaler.pkl")
# Prepare base features: g, bp, rp, bp_rp, g_rp, g_bp, bp_g (bp_g = bp - g)
base_features = np.array([[15.5, 16.2, 14.8, 1.4, 0.7, -0.7, 0.7]])
# Scale the photometric features with the scaler saved at training time
scaled_features = scaler.transform(base_features)
# Get the cluster probabilities.
# Note: scikit-learn's KMeans does not define predict_proba(), so this call
# works only if the pickled clustering object provides that method.
cluster_probs = kmeans.predict_proba(scaled_features)
# Append the cluster probabilities to the base features
full_features = np.hstack([base_features, cluster_probs])
# Predict temperature
teff_pred = model.predict(full_features)[0]
print(f"Predicted Teff: {teff_pred:.0f} K")
Model Training
All models were trained using:
- Algorithm: Random Forest Regressor
- Trees: 100 estimators
- Max depth: None (fully grown trees)
- Uncertainty: Full tree method for accurate RF uncertainties
- Target: Corrected Teff (polynomial correction applied to stars hotter than 10,000 K)
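The configuration listed above maps onto scikit-learn roughly as in this sketch. The training table here is synthetic, the color-temperature relation is made up, and `random_state` is an assumption (the README does not state a seed); only `n_estimators=100` and `max_depth=None` come from the list.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the training table (the real set is not reproduced here).
rng = np.random.default_rng(42)
g = rng.uniform(10.0, 19.0, size=300)
bp = g + rng.uniform(0.0, 1.0, size=300)
rp = g - rng.uniform(0.0, 1.0, size=300)
X_train = np.column_stack([g, bp, rp, bp - rp, g - rp, g - bp])
# Made-up color-temperature relation, only to give the forest something to fit.
y_train = 9500.0 - 2500.0 * (bp - rp) + rng.normal(0.0, 50.0, size=300)

rf_sketch = RandomForestRegressor(
    n_estimators=100,  # "Trees: 100 estimators"
    max_depth=None,    # fully grown trees
    random_state=42,   # assumption: seed chosen for reproducibility of this sketch
)
rf_sketch.fit(X_train, y_train)
print(len(rf_sketch.estimators_), rf_sketch.n_features_in_)
```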
Correction Coefficients
Temperature correction coefficients are available in the dataset repository:
- File: correction/teff_correction_coeffs_deg2.pkl
- Repository: Dedulek/gaia-eb-teff-datasets
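A degree-2 polynomial correction restricted to hot stars might be applied along these lines. Everything specific here is an assumption: the layout of the pickle is not documented in this README, the coefficients below are placeholders rather than the published values, the `numpy.polyval` ordering (highest power first) is a guess, and so is the idea that the corrected value is the polynomial evaluated at the uncorrected Teff.

```python
import numpy as np

# Placeholder coefficients (highest power first); NOT the values stored in
# correction/teff_correction_coeffs_deg2.pkl.
PLACEHOLDER_COEFFS = np.array([0.0, 1.1, -1000.0])
THRESHOLD_K = 10_000.0  # correction applies only above this temperature

def correct_teff(teff, coeffs=PLACEHOLDER_COEFFS, threshold=THRESHOLD_K):
    """Apply the polynomial to stars hotter than `threshold`; leave the rest unchanged."""
    teff = np.asarray(teff, dtype=float)
    corrected = np.polyval(coeffs, teff)
    return np.where(teff > threshold, corrected, teff)

print(correct_teff([5000.0, 10000.0, 20000.0]))
```

With the real coefficients loaded from the dataset repository, only the `coeffs` argument would change, provided the stored layout matches the assumed ordering.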
Companion Dataset
The predictions from these models are available in: Dedulek/gaia-eb-teff-datasets
The dataset includes:
- Best-of-four catalog (2.1M eclipsing binaries)
- Photometry data (Gaia + Pan-STARRS + 2MASS)
- Quality flags and uncertainties
Citation
If you use these models, please cite:
@misc{gaia_eb_teff_models_2024,
title={Gaia Eclipsing Binary Temperature Prediction Models},
author={Your Name/Team},
year={2024},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/Dedulek/gaia-eb-teff-models}}
}
Model Performance Comparison
Flag 1 Model
- Best for: High-quality predictions with the lowest uncertainty
- Strengths: Trained on the highest-quality Gaia sources; lowest MAE (195 K) and highest R² (0.91) of the four models
- Use case: When you need maximum accuracy and have Gaia flag 1 sources
Cluster Model
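- Best for: Objects with distinct clustering patterns
- Strengths: Captures population structure, good for certain stellar types
- Use case: Complementary to photometry-only models; it has the highest MAE of the four (610 K) but still supplies the selected prediction for 21.4% of objects in the ensemble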
License
These models are released under CC BY 4.0. You are free to use, share, and adapt with attribution.
Updates
- 2024-12-18: Added Flag 1 model (29.2% contribution to ensemble)
- 2024-12-09: Best-of-four ensemble approach (203 K mean uncertainty)
- 2024-12-08: Added Cluster model with K-means features
Contact
For questions or issues, please open an issue in the repository or contact the authors.