Luigi committed on
Commit
cf62bd5
·
1 Parent(s): 913c94a

remove unused scripts

test_multilingual.py DELETED
@@ -1,208 +0,0 @@
- #!/usr/bin/env python3
- """
- Test multilingual summarization functionality.
- Tests summarization and title generation across different languages.
- """
-
- import sys
- import os
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
- from src.summarization import summarize_transcript, generate_title
- from src.utils import available_gguf_llms
-
- # Test transcripts in different languages
- TEST_TRANSCRIPTS = {
-     "english": """
-     Hello everyone, today we're going to discuss artificial intelligence and its impact on modern society.
-     AI has become increasingly important in our daily lives, from voice assistants like Siri and Alexa,
-     to recommendation systems on Netflix and YouTube. The technology is advancing rapidly, with machine
-     learning algorithms becoming more sophisticated every day. However, we must also consider the ethical
-     implications of AI development, including privacy concerns, job displacement, and the potential for bias
-     in automated decision-making systems. It's crucial that we develop AI responsibly to ensure it benefits
-     all of humanity rather than just a select few.
-     """,
-
-     "french": """
-     Bonjour à tous, aujourd'hui nous allons discuter de l'intelligence artificielle et de son impact sur la société moderne.
-     L'IA est devenue de plus en plus importante dans notre vie quotidienne, des assistants vocaux comme Siri et Alexa,
-     aux systèmes de recommandation sur Netflix et YouTube. La technologie progresse rapidement, avec des algorithmes
-     d'apprentissage automatique devenant plus sophistiqués chaque jour. Cependant, nous devons également considérer
-     les implications éthiques du développement de l'IA, y compris les préoccupations de confidentialité, le déplacement
-     d'emplois, et le potentiel de biais dans les systèmes de prise de décision automatisée. Il est crucial que nous
-     développions l'IA de manière responsable pour assurer qu'elle bénéficie à toute l'humanité plutôt qu'à une élite.
-     """,
-
-     "spanish": """
-     Hola a todos, hoy vamos a discutir sobre la inteligencia artificial y su impacto en la sociedad moderna.
-     La IA se ha vuelto cada vez más importante en nuestra vida diaria, desde asistentes de voz como Siri y Alexa,
-     hasta sistemas de recomendación en Netflix y YouTube. La tecnología está avanzando rápidamente, con algoritmos
-     de aprendizaje automático volviéndose más sofisticados cada día. Sin embargo, también debemos considerar
-     las implicaciones éticas del desarrollo de la IA, incluyendo preocupaciones de privacidad, desplazamiento
-     laboral, y el potencial de sesgo en sistemas de toma de decisiones automatizada. Es crucial que desarrollemos
-     la IA de manera responsable para asegurar que beneficie a toda la humanidad en lugar de solo a unos pocos.
-     """,
-
-     "german": """
-     Hallo zusammen, heute werden wir über künstliche Intelligenz und ihre Auswirkungen auf die moderne Gesellschaft sprechen.
-     KI ist in unserem täglichen Leben immer wichtiger geworden, von Sprachassistenten wie Siri und Alexa,
-     bis hin zu Empfehlungssystemen auf Netflix und YouTube. Die Technologie entwickelt sich rasant weiter, mit
-     maschinellen Lernalgorithmen, die jeden Tag ausgefeilter werden. Allerdings müssen wir auch die ethischen
-     Implikationen der KI-Entwicklung berücksichtigen, einschließlich Datenschutzbedenken, Arbeitsplatzverlust
-     und das Potenzial für Voreingenommenheit in automatisierten Entscheidungssystemen. Es ist entscheidend, dass
-     wir KI verantwortungsvoll entwickeln, um sicherzustellen, dass sie der gesamten Menschheit zugutekommt und nicht nur wenigen.
-     """,
-
-     "chinese": """
-     大家好,今天我们来讨论人工智能及其对现代社会的影响。
-     人工智能已经在我们的日常生活中变得越来越重要,从Siri和Alexa这样的语音助手,
-     到Netflix和YouTube上的推荐系统。技术正在快速发展,机器学习算法每天都变得更加复杂。
-     然而,我们也必须考虑人工智能发展的伦理含义,包括隐私问题、就业流失,
-     以及自动化决策系统中潜在的偏见。至关重要的是,我们要负责任地开发人工智能,
-     以确保它惠及全人类而不是少数人。
-     """,
-
-     "japanese": """
-     皆さんこんにちは、今日は人工知能とそれが現代社会に与える影響について議論します。
-     AIは私たちの日常生活でますます重要になっています。SiriやAlexaのような音声アシスタントから、
-     NetflixやYouTubeの推薦システムまで。技術は急速に進化しており、機械学習アルゴリズムは
-     日々より洗練されたものになっています。しかし、私たちはAI開発の倫理的影響も考慮しなければなりません。
-     プライバシーの懸念、雇用の喪失、自動化された意思決定システムにおける偏りの可能性を含めて。
-     私たちはAIを責任を持って開発し、それが少数の人々ではなく全人類に利益をもたらすことを確実にすることが重要です。
-     """,
-
-     "arabic": """
-     مرحباً بكم جميعاً، اليوم سنناقش الذكاء الاصطناعي وتأثيره على المجتمع الحديث.
-     أصبح الذكاء الاصطناعي أكثر أهمية في حياتنا اليومية، من المساعدين الصوتيين مثل Siri وAlexa،
-     إلى أنظمة التوصية على Netflix وYouTube. التكنولوجيا تتطور بسرعة، مع خوارزميات التعلم الآلي
-     التي تصبح أكثر تعقيداً كل يوم. ومع ذلك، يجب أن نعتبر أيضاً الآثار الأخلاقية لتطوير الذكاء الاصطناعي،
-     بما في ذلك مخاوف الخصوصية، والتهجير الوظيفي، وإمكانية التحيز في أنظمة اتخاذ القرارات الآلية.
-     من المهم أن نطور الذكاء الاصطناعي بمسؤولية لضمان أنه يفيد البشرية جمعاء وليس فئة قليلة فقط.
-     """
- }
-
- def test_multilingual_summarization():
-     """Test summarization in multiple languages"""
-     print("Testing multilingual summarization...")
-
-     # Use the first available model
-     model_name = list(available_gguf_llms.keys())[0]
-     print(f"Using model: {model_name}")
-
-     for language, transcript in TEST_TRANSCRIPTS.items():
-         print(f"\n--- Testing {language.upper()} ---")
-         print(f"Original transcript length: {len(transcript)} characters")
-
-         try:
-             # Generate title
-             title = generate_title(transcript, model_name)
-             print(f"Generated title: {title}")
-
-             # Generate summary
-             summary_parts = list(summarize_transcript(transcript, model_name, "Summarize this transcript"))
-             summary = ''.join(summary_parts)
-             print(f"Generated summary: {summary[:200]}..." if len(summary) > 200 else f"Generated summary: {summary}")
-
-             # Basic validation - check if summary is in the same language family
-             # This is a simple heuristic check
-             if language == "english":
-                 # English should contain mostly ASCII characters for basic words
-                 english_words = sum(1 for word in summary.split() if word.isascii() and len(word) > 2)
-                 total_words = len(summary.split())
-                 if total_words > 0:
-                     english_ratio = english_words / total_words
-                     print(f"English word ratio: {english_ratio:.2f}")
-                     if english_ratio < 0.3:
-                         print(f"⚠️ WARNING: Low English ratio detected ({english_ratio:.2f})")
-
-             elif language == "chinese":
-                 # Chinese should contain Chinese characters
-                 chinese_chars = sum(1 for char in summary if '\u4e00' <= char <= '\u9fff')
-                 if len(summary) > 0:
-                     chinese_ratio = chinese_chars / len(summary)
-                     print(f"Chinese character ratio: {chinese_ratio:.2f}")
-                     if chinese_ratio < 0.1:
-                         print(f"⚠️ WARNING: Low Chinese character ratio detected ({chinese_ratio:.2f})")
-
-             elif language == "japanese":
-                 # Japanese should contain Hiragana/Katakana/Kanji
-                 japanese_chars = sum(1 for char in summary if ('\u3040' <= char <= '\u309f') or ('\u30a0' <= char <= '\u30ff') or ('\u4e00' <= char <= '\u9fff'))
-                 if len(summary) > 0:
-                     japanese_ratio = japanese_chars / len(summary)
-                     print(f"Japanese character ratio: {japanese_ratio:.2f}")
-                     if japanese_ratio < 0.1:
-                         print(f"⚠️ WARNING: Low Japanese character ratio detected ({japanese_ratio:.2f})")
-
-             elif language == "arabic":
-                 # Arabic should contain Arabic characters
-                 arabic_chars = sum(1 for char in summary if '\u0600' <= char <= '\u06ff')
-                 if len(summary) > 0:
-                     arabic_ratio = arabic_chars / len(summary)
-                     print(f"Arabic character ratio: {arabic_ratio:.2f}")
-                     if arabic_ratio < 0.1:
-                         print(f"⚠️ WARNING: Low Arabic character ratio detected ({arabic_ratio:.2f})")
-
-             print("✅ Test passed")
-
-         except Exception as e:
-             print(f"❌ Test failed for {language}: {e}")
-
- def test_language_consistency():
-     """Test that titles and summaries maintain language consistency"""
-     print("\n\nTesting language consistency between titles and summaries...")
-
-     model_name = list(available_gguf_llms.keys())[0]
-
-     for language, transcript in TEST_TRANSCRIPTS.items():
-         print(f"\n--- Testing consistency for {language.upper()} ---")
-
-         try:
-             title = generate_title(transcript, model_name)
-             summary_parts = list(summarize_transcript(transcript, model_name, "Summarize this transcript"))
-             summary = ''.join(summary_parts)
-
-             # Simple consistency check - both should be non-empty and in similar character sets
-             if not title or not summary:
-                 print("❌ Empty title or summary generated")
-                 continue
-
-             # Check if both title and summary contain similar language characteristics
-             def get_language_chars(text):
-                 """Count language-specific characters (the caller converts the count to a ratio)"""
-                 if language == "chinese":
-                     return sum(1 for char in text if '\u4e00' <= char <= '\u9fff')
-                 elif language == "japanese":
-                     return sum(1 for char in text if ('\u3040' <= char <= '\u309f') or ('\u30a0' <= char <= '\u30ff') or ('\u4e00' <= char <= '\u9fff'))
-                 elif language == "arabic":
-                     return sum(1 for char in text if '\u0600' <= char <= '\u06ff')
-                 else:
-                     return sum(1 for char in text if char.isascii())
-
-             title_chars = get_language_chars(title)
-             summary_chars = get_language_chars(summary)
-
-             title_ratio = title_chars / len(title) if len(title) > 0 else 0
-             summary_ratio = summary_chars / len(summary) if len(summary) > 0 else 0
-
-             ratio_diff = abs(title_ratio - summary_ratio)
-             print(f"Language character ratios - Title: {title_ratio:.2f}, Summary: {summary_ratio:.2f}")
-             if ratio_diff > 0.3:
-                 print(f"⚠️ WARNING: Large difference in language character ratios ({ratio_diff:.2f})")
-             else:
-                 print("✅ Language consistency maintained")
-
-         except Exception as e:
-             print(f"❌ Consistency test failed for {language}: {e}")
-
-         except Exception as e:
-             print(f"❌ Consistency test failed for {language}: {e}")
-
- if __name__ == "__main__":
-     print("Starting multilingual summarization tests...")
-     print("=" * 60)
-
-     test_multilingual_summarization()
-     test_language_consistency()
-
-     print("\n" + "=" * 60)
-     print("Multilingual tests completed!")
 
test_multilingual_quick.py DELETED
@@ -1,86 +0,0 @@
- #!/usr/bin/env python3
- """
- Quick test for multilingual summarization functionality.
- Tests title generation in multiple languages.
- """
-
- import sys
- import os
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
- from src.summarization import generate_title
- from src.utils import available_gguf_llms
-
- # Test transcripts in different languages (shorter versions)
- TEST_TRANSCRIPTS = {
-     "english": "Hello everyone, today we're going to discuss artificial intelligence and its impact on modern society. AI has become increasingly important in our daily lives.",
-
-     "french": "Bonjour à tous, aujourd'hui nous allons discuter de l'intelligence artificielle et de son impact sur la société moderne. L'IA est devenue de plus en plus importante dans notre vie quotidienne.",
-
-     "spanish": "Hola a todos, hoy vamos a discutir sobre la inteligencia artificial y su impacto en la sociedad moderna. La IA se ha vuelto cada vez más importante en nuestra vida diaria.",
-
-     "chinese": "大家好,今天我们来讨论人工智能及其对现代社会的影响。人工智能已经在我们的日常生活中变得越来越重要。",
- }
-
- def test_multilingual_titles():
-     """Test title generation in multiple languages"""
-     print("Testing multilingual title generation...")
-
-     # Use the first available model
-     model_name = list(available_gguf_llms.keys())[0]
-     print(f"Using model: {model_name}")
-
-     for language, transcript in TEST_TRANSCRIPTS.items():
-         print(f"\n--- Testing {language.upper()} ---")
-         print(f"Transcript: {transcript[:100]}...")
-
-         try:
-             # Generate title
-             title = generate_title(transcript, model_name)
-             print(f"Generated title: {title}")
-
-             # Basic language validation
-             if language == "english":
-                 # Check if title contains English words
-                 english_words = sum(1 for word in title.split() if word.isascii() and len(word) > 2)
-                 if english_words == 0:
-                     print("⚠️ WARNING: No English words detected in title")
-                 else:
-                     print("✅ English title generated")
-
-             elif language == "chinese":
-                 # Check if title contains Chinese characters
-                 chinese_chars = sum(1 for char in title if '\u4e00' <= char <= '\u9fff')
-                 if chinese_chars == 0:
-                     print("⚠️ WARNING: No Chinese characters detected in title")
-                 else:
-                     print("✅ Chinese title generated")
-
-             elif language in ["french", "spanish"]:
-                 # Check if title contains language-specific patterns
-                 if language == "french":
-                     # French often has accented characters or specific words
-                     french_indicators = any(char in title for char in "àâäéèêëïîôöùûüÿç") or "IA" in title or "société" in title
-                     if not french_indicators:
-                         print("⚠️ WARNING: Title may not be in French")
-                     else:
-                         print("✅ French title generated")
-                 elif language == "spanish":
-                     # Spanish often has accented characters or specific words
-                     spanish_indicators = any(char in title for char in "áéíóúüñ") or "IA" in title or "sociedad" in title or "impacto" in title
-                     if not spanish_indicators:
-                         print("⚠️ WARNING: Title may not be in Spanish")
-                     else:
-                         print("✅ Spanish title generated")
-
-         except Exception as e:
-             print(f"❌ Test failed for {language}: {e}")
-
- if __name__ == "__main__":
-     print("Starting multilingual title tests...")
-     print("=" * 50)
-
-     test_multilingual_titles()
-
-     print("\n" + "=" * 50)
-     print("Multilingual title tests completed!")
 
test_summary_language.py DELETED
@@ -1,51 +0,0 @@
- #!/usr/bin/env python3
- """
- Quick test for multilingual summarization functionality.
- Tests summary generation in one language to verify functionality.
- """
-
- import sys
- import os
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
- from src.summarization import summarize_transcript
- from src.utils import available_gguf_llms
-
- def test_single_language_summary():
-     """Test summary generation in one language"""
-     print("Testing multilingual summary generation...")
-
-     # Use the first available model
-     model_name = list(available_gguf_llms.keys())[0]
-     print(f"Using model: {model_name}")
-
-     # Test with French transcript
-     french_transcript = "Bonjour à tous, aujourd'hui nous allons discuter de l'intelligence artificielle. L'IA transforme notre société de manière profonde. Elle impacte notre travail, notre vie quotidienne, et même notre façon de penser. Il est important de comprendre ces changements."
-
-     print("\n--- Testing FRENCH Summary ---")
-     print(f"Transcript: {french_transcript}")
-
-     try:
-         # Generate summary
-         summary_parts = list(summarize_transcript(french_transcript, model_name, "Résumez ce transcript"))
-         summary = ''.join(summary_parts)
-         print(f"Generated summary: {summary}")
-
-         # Check if summary contains French
-         french_indicators = any(char in summary for char in "àâäéèêëïîôöùûüÿç") or "IA" in summary or "société" in summary or "intelli" in summary
-         if french_indicators:
-             print("✅ French summary generated")
-         else:
-             print("⚠️ WARNING: Summary may not be in French")
-
-     except Exception as e:
-         print(f"❌ Summary test failed: {e}")
-
- if __name__ == "__main__":
-     print("Starting summary language test...")
-     print("=" * 40)
-
-     test_single_language_summary()
-
-     print("\n" + "=" * 40)
-     print("Summary language test completed!")