Commit 71f5c62 (verified), committed by LocalOptimum
1 Parent(s): 28198d9

Upload sentiment analysis model

README.md ADDED
@@ -0,0 +1,193 @@
+ ---
+ language: zh
+ license: apache-2.0
+ tags:
+ - sentiment-analysis
+ - chinese
+ - finance
+ - finbert
+ - crypto
+ - text-classification
+ datasets:
+ - custom
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ model-index:
+ - name: Chinese Financial Sentiment Analysis (Crypto)
+   results:
+   - task:
+       type: text-classification
+       name: Sentiment Analysis
+     metrics:
+     - type: accuracy
+       value: 0.645
+       name: Accuracy
+     - type: f1
+       value: 0.6365
+       name: F1 Score
+     - type: precision
+       value: 0.6394
+       name: Precision
+     - type: recall
+       value: 0.645
+       name: Recall
+ ---
+
+ # Chinese Financial Sentiment Analysis Model (Crypto Focus)
+
+ ## Model Description
+
+ This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.
+
+ ## Training Data
+
+ - **Size**: 1000 manually annotated Chinese financial news items
+ - **Source**: Cryptocurrency-related news and tweets
+ - **Annotation**: AI-assisted labeling with manual correction
+ - **Distribution**:
+   - Positive: 420 (42.0%)
+   - Neutral: 420 (42.0%)
+   - Negative: 160 (16.0%)
+
+ ## Performance Metrics
+
+ Performance on a test set of 200 samples:
+
+ | Metric | Value |
+ |--------|-------|
+ | Accuracy | 64.50% |
+ | F1 Score | 63.65% |
+ | Precision | 63.94% |
+ | Recall | 64.50% |
+
+ ## Usage
+
+ ### Quick Start
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "YOUR_USERNAME/sentiment-finetuned-1000"  # replace with your username
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Analyze text
+ text = "比特币突破10万美元创历史新高"  # "Bitcoin breaks $100,000, a record high"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+
+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=-1).item()
+
+ # Map the class index to a label using the mapping stored in config.json
+ # (0 = Neutral, 1 = Positive, 2 = Negative)
+ sentiment = model.config.id2label[predicted_class]
+ confidence = predictions[0][predicted_class].item()
+
+ print(f"Sentiment: {sentiment}")
+ print(f"Confidence: {confidence:.4f}")
+ ```
+
+ ### Batch Processing
+
+ ```python
+ # Reuses `tokenizer`, `model`, and `torch` from the Quick Start example
+ texts = [
+     "币安获得阿布扎比监管授权",  # "Binance obtains regulatory authorization in Abu Dhabi"
+     "以太坊完成Fusaka升级",  # "Ethereum completes the Fusaka upgrade"
+     "某交易所遭攻击损失100万美元",  # "An exchange is attacked and loses $1 million"
+ ]
+
+ inputs = tokenizer(texts, return_tensors="pt", truncation=True,
+                    max_length=128, padding=True)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_classes = torch.argmax(predictions, dim=-1)
+
+ # Use the label mapping from config.json rather than a hard-coded list
+ for text, pred in zip(texts, predicted_classes):
+     print(f"{text} -> {model.config.id2label[pred.item()]}")
+ ```
+
+ ## Training Configuration
+
+ - **Base Model**: yiyanghkust/finbert-tone-chinese
+ - **Epochs**: 5
+ - **Batch Size**: 16
+ - **Learning Rate**: 2e-5
+ - **Max Length**: 128
+ - **Device**: NVIDIA GeForce RTX 3060 Laptop GPU
+ - **Training Time**: ~5 minutes
+
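+ ## Use Cases
+
+ - ✅ Sentiment analysis of cryptocurrency news
+ - ✅ Social media opinion monitoring
+ - ✅ Financial market sentiment indicators
+ - ✅ Real-time news sentiment tracking
+ - ✅ Supplementary reference for investment decisions
+
+ ## Limitations
+
+ - ⚠️ Primarily targets financial news in the cryptocurrency domain; performance may be poor in other financial domains
+ - ⚠️ Negative samples are relatively scarce (16%), so the model may be less sensitive to negative sentiment
+ - ⚠️ Accuracy may drop on short texts (fewer than 10 characters)
+ - ⚠️ Supports Simplified Chinese only
+ - ⚠️ The model does not replace human judgment and is for reference only
+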
+ ## License
+
+ Apache-2.0
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{watchtower-sentiment-2025,
+   title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
+   author={WatchTower Team},
+   year={2025},
+   howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
+   note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
+ }
+ ```
+
+ ## Base Model
+
+ This model is fine-tuned from the following model:
+ - [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)
+
+ Thanks to the original authors for their contribution!
+
+ ## Changelog
+
+ ### v2.0 (2025-12-09)
+ - ✅ Expanded the training data to 1000 items
+ - ✅ Corrected annotation errors to improve data quality
+ - ✅ Adjusted the class distribution to improve model balance
+ - ✅ F1 score improved by about 2 percentage points (0.6165 → 0.6365)
+
+ ### v1.0 (Initial Release)
+ - Initial version based on 500 annotated items
+
+ ## Contact
+
+ For questions or suggestions, please open an issue or PR.
+
+ ---
+
+ **Maintainer**: WatchTower Team
+ **Last Updated**: 2025-12-09
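
The README reports recall equal to accuracy (both 0.645). Support-weighted averaging over classes produces exactly that pattern, so the precision, recall, and F1 figures are probably weighted averages, although the README does not say so. A minimal sketch showing why the two numbers coincide; the labels below are invented for illustration and are not the model's test set:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        score += (n / total) * (hits / n)  # the class size n cancels out
    return score

# Invented example labels (assumption, not real evaluation data)
y_true = ["Positive", "Positive", "Neutral", "Neutral", "Negative", "Neutral"]
y_pred = ["Positive", "Neutral", "Neutral", "Neutral", "Positive", "Negative"]

print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

Because each class's recall is weighted by its share of the data, the sum reduces to total correct predictions divided by total samples, which is accuracy.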
config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "directionality": "bidi",
+   "dtype": "float32",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "Neutral",
+     "1": "Positive",
+     "2": "Negative"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "Negative": 2,
+     "Neutral": 0,
+     "Positive": 1
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "pooler_fc_size": 768,
+   "pooler_num_attention_heads": 12,
+   "pooler_num_fc_layers": 3,
+   "pooler_size_per_head": 128,
+   "pooler_type": "first_token_transform",
+   "position_embedding_type": "absolute",
+   "problem_type": "single_label_classification",
+   "transformers_version": "4.57.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 21128
+ }
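
Note that `id2label` in this config orders the classes Neutral, Positive, Negative, so index 0 is Neutral rather than Positive. A small sketch of decoding one row of logits with this mapping; the `decode` helper and the sample logits are illustrative assumptions, not part of the commit:

```python
import json

# id2label copied from config.json; JSON object keys are strings,
# so they are converted to integers before indexing by class id.
CONFIG_FRAGMENT = '{"id2label": {"0": "Neutral", "1": "Positive", "2": "Negative"}}'
id2label = {int(k): v for k, v in json.loads(CONFIG_FRAGMENT)["id2label"].items()}

def decode(logits):
    """Return the label whose logit is largest (hypothetical helper)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]

# Invented logits: the largest value sits at index 1
print(decode([0.2, 1.7, -0.4]))
```

Reading the mapping from the config rather than hard-coding a list keeps inference code in step with whatever order the model was trained with.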
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:201a42ab8939395ab1923d8eb7bd5505c645a8331572e4cec417007a7853e761
+ size 409103316
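
The entry above is a Git LFS pointer, not the weights themselves: it records the spec version, the SHA-256 of the real file, and its size in bytes. A rough sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper, and the value-count estimate assumes 4 bytes per value (config.json lists `float32`) and ignores the safetensors header:

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:201a42ab8939395ab1923d8eb7bd5505c645a8331572e4cec417007a7853e761
size 409103316
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line and unpack the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

info = parse_lfs_pointer(POINTER)
approx_values = info["size"] // 4  # assumption: float32 weights, header ignored

print(f"{info['size'] / 2**20:.1f} MiB, roughly {approx_values / 1e6:.0f}M float32 values")
```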
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": false,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
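
The special-token IDs in `added_tokens_decoder` determine how a single sequence is framed for BERT: `[CLS]` (101) first, `[SEP]` (102) last, and `[PAD]` (0) filling the rest of a batch row. A minimal sketch of that layout, assuming right-side padding; `frame` is a hypothetical helper that mimics the layout and is not the tokenizer's actual code, and the content token IDs are invented:

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # IDs taken from tokenizer_config.json

def frame(token_ids, max_length):
    """Truncate, wrap in [CLS] ... [SEP], then right-pad to max_length."""
    body = token_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(ids)     # 1 marks real tokens
    padding = max_length - len(ids)
    return ids + [PAD_ID] * padding, attention_mask + [0] * padding

# Content IDs below are made up for illustration
ids, mask = frame([3683, 4294, 2355], max_length=8)
print(ids)
print(mask)
```

The attention mask is what lets the model ignore the padded positions, which is why the batch example in the README can pass `padding=True` safely.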
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92dfe961fb52dc1b69afe75a1a36ee5850f2525ca6066e2863f577c3d75cba51
+ size 5841
vocab.txt ADDED
The diff for this file is too large to render. See raw diff