theonegareth committed on
Commit
2072f33
·
verified ·
1 Parent(s): 41fc109

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +138 -0
README.md CHANGED
@@ -1,3 +1,141 @@
1
  ---
 
2
  license: mit
3
  ---
1
  ---
2
+ language: en
3
  license: mit
4
+ library_name: sklearn
5
+ tags:
6
+ - sklearn
7
+ - gold-price-prediction
8
+ - time-series
9
+ - classification
10
+ - financial-prediction
11
+ datasets:
12
+ - custom
13
+ metrics:
14
+ - accuracy
15
+ - f1-score
16
+ - roc-auc
17
+ model-index:
18
+ - name: Gold Price Direction Predictor
19
+ results:
20
+ - task:
21
+ type: classification
22
+ name: Binary Classification
23
+ dataset:
24
+ type: custom
25
+ name: Antam Gold Prices
26
+ metrics:
27
+ - type: accuracy
28
+ value: 0.55 # Approximate from training
29
+ name: Accuracy
30
+ - type: f1
31
+ value: 0.56 # Approximate
32
+ name: F1 Score
33
+ - type: roc_auc
34
+ value: 0.58 # Approximate
35
+ name: ROC AUC
36
  ---
37
+
38
+ # Gold Price Direction Predictor
39
+
40
+ This model predicts the next-day direction of gold prices (up or down) based on historical Antam gold price data and technical indicators.
41
+
42
+ ## Model Description
43
+
44
+ - **Model Type**: Binary classifier; the best-scoring of Gradient Boosting, XGBoost, and LightGBM (selected by F1 score)
45
+ - **Task**: Predict whether gold price will go up or down the next day
46
+ - **Input**: Feature vector with technical indicators (returns, lags, RSI, MACD, Bollinger Bands, etc.)
47
+ - **Output**: Probability that the price goes up (between 0 and 1); the class prediction applies an optimized threshold to this probability
48
+
49
+ ## Intended Uses & Limitations
50
+
51
+ ### Intended Uses
52
+ - Financial analysis and decision support
53
+ - Educational purposes for machine learning in finance
54
+ - Research on gold price prediction
55
+
56
+ ### Limitations
57
+ - Trained on historical Antam gold prices only
58
+ - May not generalize to other markets or time periods
59
+ - Holdout accuracy is about 55% (ROC AUC about 0.58), only modestly better than random guessing
60
+ - Requires up-to-date feature computation for real-time use
61
+
62
+ ## How to Use
63
+
64
+ ### Loading the Model
65
+
66
+ ```python
67
+ from huggingface_hub import hf_hub_download
68
+ from joblib import load
69
+
70
+ # Download model
71
+ model_path = hf_hub_download("theonegareth/GoldPricePredictor", "gold_direction_model.joblib")
72
+ model = load(model_path)
73
+ ```
74
+
75
+ ### Making Predictions
76
+
77
+ The model expects a pandas DataFrame with the same feature columns used in training.
78
+
79
+ ```python
80
+ import pandas as pd
81
+
82
+ # Example feature vector (you need to compute these from your data)
83
+ features = pd.DataFrame({
84
+ 'ret': [0.01],
85
+ 'log_ret': [0.00995],
86
+ 'ret_lag_1': [0.005],
87
+ # ... all required features
88
+ })
89
+
90
+ # Predict probability of going up
91
+ proba_up = model.predict_proba(features)[:, 1]
92
+ prediction = (proba_up >= 0.52).astype(int) # Using optimized threshold
93
+ ```
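The snippet above applies a threshold of 0.52, described as optimized. The model card does not say how that value was chosen. A minimal sketch of one common approach, assuming the threshold is picked to maximize F1 on a validation set, might look like the following. The `best_threshold` helper, the search grid, and the toy arrays are hypothetical and only illustrate the idea.

```python
import numpy as np
from sklearn.metrics import f1_score


def best_threshold(y_true, proba_up, grid=None):
    """Return the (threshold, F1) pair with the highest F1 on validation data.

    Hypothetical helper: the grid and the choice of F1 are assumptions,
    not the procedure documented for this model.
    """
    if grid is None:
        grid = np.linspace(0.30, 0.70, 41)  # candidate thresholds in 0.01 steps
    scores = [
        f1_score(y_true, (proba_up >= t).astype(int), zero_division=0)
        for t in grid
    ]
    best = int(np.argmax(scores))  # first threshold reaching the maximum
    return float(grid[best]), float(scores[best])


# Toy validation labels and predicted probabilities (made up for illustration)
y_val = np.array([0, 0, 1, 1, 0, 1, 1, 0])
p_val = np.array([0.40, 0.48, 0.55, 0.61, 0.51, 0.45, 0.70, 0.35])

threshold, score = best_threshold(y_val, p_val)
print(f"threshold={threshold:.2f}, f1={score:.3f}")
```

Choosing the threshold on validation data and reporting metrics on a later holdout keeps the threshold from being tuned to the test set.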
94
+
95
+ ### Feature Engineering
96
+
97
+ To use this model, you need to compute the same features from your gold price data:
98
+
99
+ - Daily returns and log returns
100
+ - Lagged returns (1-5 days)
101
+ - Rolling means and standard deviations (3, 5, 10, and 20 days)
102
+ - RSI (14-day)
103
+ - MACD and signal
104
+ - Bollinger Bands
105
+ - Day of week and month
106
+
107
+ See the training notebooks for the complete `add_features_adaptive` function.
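The feature list above can be sketched in pandas as follows. This is not the `add_features_adaptive` function from the training notebooks. It is a simplified illustration under several assumptions: rolling statistics are computed on returns, RSI uses simple moving averages, MACD uses the conventional 12/26/9 spans, and Bollinger Bands use a 20-day window with 2 standard deviations. Only `ret`, `log_ret`, and `ret_lag_1` are column names that appear in this README; every other column name, and the `add_basic_features` helper itself, is hypothetical, and the names would need to match those used in training.

```python
import numpy as np
import pandas as pd


def add_basic_features(prices: pd.Series) -> pd.DataFrame:
    """Build an illustrative feature table from a date-indexed daily price series."""
    df = pd.DataFrame({"price": prices})

    # Daily returns and log returns
    df["ret"] = df["price"].pct_change()
    df["log_ret"] = np.log(df["price"]).diff()

    # Lagged returns (1-5 days)
    for lag in range(1, 6):
        df[f"ret_lag_{lag}"] = df["ret"].shift(lag)

    # Rolling means and standard deviations of returns (assumed to be on returns)
    for window in (3, 5, 10, 20):
        df[f"ret_mean_{window}"] = df["ret"].rolling(window).mean()
        df[f"ret_std_{window}"] = df["ret"].rolling(window).std()

    # RSI (14-day), simple-moving-average variant
    delta = df["price"].diff()
    avg_gain = delta.clip(lower=0).rolling(14).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi_14"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD and signal line (12/26/9 spans are an assumption)
    ema_fast = df["price"].ewm(span=12, adjust=False).mean()
    ema_slow = df["price"].ewm(span=26, adjust=False).mean()
    df["macd"] = ema_fast - ema_slow
    df["macd_signal"] = df["macd"].ewm(span=9, adjust=False).mean()

    # Bollinger Bands (20-day mean plus/minus 2 standard deviations, assumed)
    mid = df["price"].rolling(20).mean()
    sd = df["price"].rolling(20).std()
    df["bb_upper"] = mid + 2 * sd
    df["bb_lower"] = mid - 2 * sd

    # Calendar features
    df["day_of_week"] = df.index.dayofweek
    df["month"] = df.index.month

    # Drop the warm-up rows where rolling windows are not yet full
    return df.dropna()


# Synthetic prices for demonstration only (not Antam data)
dates = pd.bdate_range("2023-01-02", periods=120)
rng = np.random.default_rng(0)
prices = pd.Series(
    1_000_000 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
    index=dates,
)
features = add_basic_features(prices)
print(features.shape)
```

The `price` column is kept only for inspection and would be dropped before calling `predict_proba` unless the trained model also used it.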
108
+
109
+ ## Training Data
110
+
111
+ - Source: Antam historical gold prices (Indonesian market)
112
+ - Period: [Insert date range from your data]
113
+ - Features: 25+ technical indicators
114
+ - Target: Next-day price direction (up=1, down=0)
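A target of this kind (up=1, down=0) can be built by comparing each day's price with the next day's. The sketch below is an assumption about how that might be done, not the training code: `next_day_direction` is a hypothetical helper, and an unchanged price is treated as "down" (0), which the model card does not specify.

```python
import pandas as pd


def next_day_direction(prices: pd.Series) -> pd.Series:
    """Return 1 where the next day's price is higher than today's, else 0."""
    next_price = prices.shift(-1)
    target = (next_price > prices).astype(int)
    # The final day has no next-day price, so its label is undefined: drop it
    return target.iloc[:-1]


# Toy prices for illustration
prices = pd.Series([100.0, 101.0, 100.5, 100.5, 102.0])
target = next_day_direction(prices)
print(target.tolist())  # [1, 0, 0, 1]
```

Because the label for day t uses the price from day t+1, the features for day t must only use data available up to day t to avoid look-ahead leakage.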
115
+
116
+ ## Performance
117
+
118
+ Based on holdout testing:
119
+ - Accuracy: ~55%
120
+ - F1 Score: ~0.56
121
+ - ROC AUC: ~0.58
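For reference, the three metrics above can be computed with `sklearn.metrics` as in the sketch below. The labels and probabilities are made-up toy values, not this model's holdout results, and the 0.52 threshold is taken from the usage example in this README. Accuracy and F1 use the thresholded predictions, while ROC AUC uses the raw probabilities.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy holdout labels and predicted probabilities (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
proba_up = np.array([0.60, 0.55, 0.58, 0.45, 0.40, 0.53, 0.70, 0.30])

# Class predictions at the threshold used in the README example
y_pred = (proba_up >= 0.52).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, proba_up),  # threshold-independent
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```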
122
+
123
+ See the confusion matrix, ROC curve, and feature importance plots in the repository.
124
+
125
+ ## Training Procedure
126
+
127
+ 1. Data preprocessing and feature engineering
128
+ 2. Time-series split for cross-validation
129
+ 3. Hyperparameter tuning with RandomizedSearchCV
130
+ 4. Model selection based on F1 score
131
+ 5. Threshold optimization for final predictions
132
+
133
+ Models compared: Gradient Boosting, XGBoost, LightGBM
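Steps 2 to 4 of the procedure can be sketched with scikit-learn as shown below. This is a minimal illustration on synthetic data, not the actual training script: the parameter ranges, number of splits, and number of search iterations are assumptions, and only scikit-learn's `GradientBoostingClassifier` is shown (XGBoost and LightGBM would be tuned the same way and compared by their best F1).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic, chronologically ordered data standing in for the real features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)

# Each fold trains on earlier rows and validates on the rows that follow
cv = TimeSeriesSplit(n_splits=4)

# Hypothetical search space, kept small so the example runs quickly
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=4,
    scoring="f1",  # model selection by F1, as in step 4
    cv=cv,
    random_state=0,
)
search.fit(X, y)

best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```

Using `TimeSeriesSplit` rather than shuffled K-fold keeps every validation fold strictly later than its training data, which matters for price series where shuffling would leak future information.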
134
+
135
+ ## Contact
136
+
137
+ For questions or problems, please open an issue on this repository.
138
+
139
+ ## License
140
+
141
+ MIT License