Files changed (1)
  1. README.md +108 -96
README.md CHANGED
@@ -1,97 +1,109 @@
- ---
- library_name: transformers
- tags:
- - text-generation-inference
- - PRM
- - Code
- - Math
- license: apache-2.0
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-1.5B-Instruct
- pipeline_tag: text-generation
- ---
-
- ![PRM.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/2inJGKPx_BrMcID7Osto-.png)
-
- # **Deepthink-1.5B-Open-PRM**
-
- > **Deepthink-1.5B-Open-PRM** is a **process-supervised reasoning model** fine-tuned from **Qwen2.5 1.5B** using **Process Reward Models (PRM)**. It excels at **step-by-step mathematical problem solving** in both **English** and **Simplified Chinese**, offering interpretable, logically structured responses for use in **education**, **STEM tutoring**, and **lightweight math agents**.
-
- ## **Key Features**
-
- 1. **Process Reward Model Supervision (PRM)**
-    Fine-tuned with PRMs to reward high-quality intermediate reasoning steps — fostering step-by-step interpretability, accuracy, and educational transparency.
-
- 2. **Compact Foundation (Qwen2.5 0.5B)**
-    Built upon the highly efficient Qwen2.5 1.5B architecture and scaled up through distillation and reward-based alignment to 1.5B parameters, balancing reasoning quality and deployment efficiency.
-
- 3. **Bilingual Math Capability**
-    Fluent in solving and explaining math problems in both **English** and **Simplified Chinese**, making it ideal for multilingual classrooms and tutoring platforms.
-
- 4. **Process-Supervised Math Reasoning**
-    Trained to reason like a teacher — showing each logical step before delivering an answer. Ideal for learners who need to understand the “how” and “why” behind each solution.
-
- 5. **Long-Context & Word Problem Reasoning**
-    Especially proficient with multi-step arithmetic, word problems, logic puzzles, and middle school to early college-level math.
-
- ## **Quickstart with Transformers**
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "prithivMLmods/Deepthink-1.5B-Open-PRM"
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "Solve: A tank can be filled by one pipe in 6 hours and emptied by another in 9 hours. How long will it take to fill the tank if both pipes are opened together?"
-
- messages = [
-     {"role": "system", "content": "You are a helpful math tutor who explains each step clearly."},
-     {"role": "user", "content": prompt}
- ]
-
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=512
- )
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- ## **Intended Use**
-
- - **Math Education Agents**: Tutors that explain problems step by step, helping users build understanding through reasoning.
- - **Bilingual Learning Platforms**: Apps that teach math in both Chinese and English.
- - **STEM-Oriented Assistants**: Supports early-stage problem solving in science and engineering contexts.
- - **Lightweight LLM Deployments**: Optimized for low-resource environments, from browsers to mobile devices.
-
- ## **Limitations**
-
- 1. **Domain Specificity**
-    Primarily tuned for math reasoning — performance may degrade on unrelated tasks like creative writing or open dialogue.
-
- 2. **Model Size Constraint**
-    While efficient, 1.5B parameters may struggle with highly abstract or very long multi-domain tasks.
-
- 3. **PRM Bias Generalization**
-    PRM training can bias toward rewardable structures results should still be reviewed for correctness and completeness.
-
- 4. **Prompt Structure Sensitivity**
+ ---
+ library_name: transformers
+ tags:
+ - text-generation-inference
+ - PRM
+ - Code
+ - Math
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-1.5B-Instruct
+ pipeline_tag: text-generation
+ ---
+
+ ![PRM.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/2inJGKPx_BrMcID7Osto-.png)
+
+ # **Deepthink-1.5B-Open-PRM**
+
+ > **Deepthink-1.5B-Open-PRM** is a **process-supervised reasoning model** fine-tuned from **Qwen2.5 1.5B** using **Process Reward Models (PRMs)**. It excels at **step-by-step mathematical problem solving** in both **English** and **Simplified Chinese**, offering interpretable, logically structured responses for use in **education**, **STEM tutoring**, and **lightweight math agents**.
+
+ ## **Key Features**
+
+ 1. **Process Reward Model Supervision (PRM)**
+    Fine-tuned with PRMs to reward high-quality intermediate reasoning steps, fostering step-by-step interpretability, accuracy, and educational transparency. A schematic sketch of step-level scoring follows this list.
+
+ 2. **Compact Foundation (Qwen2.5 1.5B)**
+    Built on the efficient Qwen2.5 1.5B Instruct architecture and refined through distillation and reward-based alignment, balancing reasoning quality with deployment efficiency.
+
+ 3. **Bilingual Math Capability**
+    Fluent in solving and explaining math problems in both **English** and **Simplified Chinese**, making it ideal for multilingual classrooms and tutoring platforms. A Simplified Chinese usage example follows the quickstart below.
+
+ 4. **Process-Supervised Math Reasoning**
+    Trained to reason like a teacher — showing each logical step before delivering an answer. Ideal for learners who need to understand the “how” and “why” behind each solution.
+
+ 5. **Long-Context & Word Problem Reasoning**
+    Especially proficient with multi-step arithmetic, word problems, logic puzzles, and middle school to early college-level math.
+
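+ To make the process-supervision idea concrete, the following is a minimal, self-contained sketch of how a process reward model scores a candidate solution step by step and reranks several candidates (best-of-N). The `score_step` function is a hypothetical stand-in for a real PRM scorer and is not part of this repository.
+
+ ```python
+ from typing import List
+
+ def score_step(question: str, prior_steps: List[str], step: str) -> float:
+     """Hypothetical stand-in for a process reward model: returns a reward in
+     [0, 1] for one intermediate reasoning step, given the question and the
+     steps produced so far. Replace with a real PRM scorer before use."""
+     raise NotImplementedError
+
+ def solution_score(question: str, solution: str) -> float:
+     """Average the per-step rewards over a newline-separated solution."""
+     steps = [line.strip() for line in solution.splitlines() if line.strip()]
+     rewards = [score_step(question, steps[:i], step) for i, step in enumerate(steps)]
+     return sum(rewards) / len(rewards) if rewards else 0.0
+
+ def best_of_n(question: str, candidates: List[str]) -> str:
+     """Keep the candidate whose intermediate steps the PRM rates highest."""
+     return max(candidates, key=lambda sol: solution_score(question, sol))
+ ```
+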
+ ## **Quickstart with Transformers**
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "prithivMLmods/Deepthink-1.5B-Open-PRM"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Solve: A tank can be filled by one pipe in 6 hours and emptied by another in 9 hours. How long will it take to fill the tank if both pipes are opened together?"
+
+ messages = [
+     {"role": "system", "content": "You are a helpful math tutor who explains each step clearly."},
+     {"role": "user", "content": prompt}
+ ]
+
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
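+
+ For reference, a correct step-by-step answer to the quickstart prompt works out as follows: the filling pipe contributes 1/6 of the tank per hour, the emptying pipe removes 1/9 per hour, so the net rate is 1/6 - 1/9 = 1/18 of the tank per hour, and the tank fills in 18 hours.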
+
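+ Because the card advertises bilingual tutoring, the same chat-template flow can be reused with a Simplified Chinese prompt. The sketch below assumes the `model` and `tokenizer` objects from the quickstart are still loaded; the system message and question are illustrative examples, not taken from the model card.
+
+ ```python
+ # Reuses `model` and `tokenizer` from the quickstart above.
+ zh_messages = [
+     # "You are a patient math teacher; explain every reasoning step."
+     {"role": "system", "content": "你是一位耐心的数学老师，请逐步讲解每一个推理步骤。"},
+     # "Solve: a train travels at 72 km/h and takes 50 seconds to cross an 800 m bridge. How long is the train?"
+     {"role": "user", "content": "解题：一列火车以每小时72千米的速度行驶，通过一座800米长的大桥用了50秒，求火车的长度。"}
+ ]
+
+ zh_text = tokenizer.apply_chat_template(
+     zh_messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ zh_inputs = tokenizer([zh_text], return_tensors="pt").to(model.device)
+
+ zh_ids = model.generate(**zh_inputs, max_new_tokens=512)
+ zh_ids = [out[len(inp):] for inp, out in zip(zh_inputs.input_ids, zh_ids)]
+ print(tokenizer.batch_decode(zh_ids, skip_special_tokens=True)[0])
+ ```
+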
+ ## **Intended Use**
+
+ - **Math Education Agents**: Tutors that explain problems step by step, helping users build understanding through reasoning.
+ - **Bilingual Learning Platforms**: Apps that teach math in both Chinese and English.
+ - **STEM-Oriented Assistants**: Supports early-stage problem solving in science and engineering contexts.
+ - **Lightweight LLM Deployments**: Optimized for low-resource environments, from browsers to mobile devices.
+
+ ## **Limitations**
+
+ 1. **Domain Specificity**
+    Primarily tuned for math reasoning — performance may degrade on unrelated tasks like creative writing or open dialogue.
+
+ 2. **Model Size Constraint**
+    While efficient, 1.5B parameters may struggle with highly abstract or very long multi-domain tasks.
+
+ 3. **PRM Bias Generalization**
+    PRM training can bias toward rewardable structures — results should still be reviewed for correctness and completeness.
+
+ 4. **Prompt Structure Sensitivity**
  Well-structured queries yield more accurate and educationally useful outputs.