Add subcards

#4
by trebedea - opened
Files changed (1)
  1. README.md +60 -1
README.md CHANGED
@@ -680,4 +680,63 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
 
 For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
 
- Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
+ Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
+
+ ### Plus Plus (++) Promise
+
+ We value you, the datasets, the diversity they represent, and what we have been entrusted with. This model and its associated data have been:
+
+ * Verified to comply with current applicable disclosure laws, regulations, and industry standards.
+ * Verified to comply with applicable privacy labeling requirements.
+ * Annotated to describe the collector/source (NVIDIA or a third-party).
+ * Characterized for technical limitations.
+ * Reviewed to ensure proper disclosure is accessible to, maintained for, and in compliance with NVIDIA data subjects and their requests.
+ * Reviewed before release.
+ * Tagged for known restrictions and potential safety implications.
+
+ ### Bias
+
+ | Field | Response |
+ |-------|----------|
+ | Participation considerations from adversely impacted groups (protected classes) in model design and testing: | None |
+ | Measures taken to mitigate against unwanted bias: | Reasoning traces in the training dataset were checked for political bias and propaganda using automatic filters and human evaluation (see the illustrative sketch below this table). |
+
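+ The card does not specify how the automatic filters work, but as a purely hypothetical illustration, a first filtering pass could flag reasoning traces containing politically loaded terms for human review:
+
+ ```python
+ # Hypothetical screen: surface traces that mention politically loaded terms
+ # so a human evaluator can check them for bias or propaganda.
+ POLITICAL_MARKERS = ("election", "propaganda", "political party", "regime")
+
+ def flag_for_review(traces):
+     """Return (index, trace) pairs whose text contains a marker term."""
+     return [
+         (i, t) for i, t in enumerate(traces)
+         if any(m in t.lower() for m in POLITICAL_MARKERS)
+     ]
+
+ print(flag_for_review([
+     "The user asks about cooking; no safety concern.",
+     "This regime's propaganda claims...",
+ ]))  # [(1, "This regime's propaganda claims...")]
+ ```
+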
+ ### Explainability
+
+ | Field | Description |
+ |-------|-------------|
+ | Intended Domain | Content Safety / Custom Content Safety / Topic-following / Dialogue Moderation |
+ | Model Type | Classifier with a reasoning trace |
+ | Intended Users | AI/ML Engineers, LLM Developers, Safety Assurance Teams |
+ | Output | Types: Text<br><br>Formats: The output format depends on the selected mode (see the parsing sketch below this table):<br><br>• Reasoning Off:<br>`Prompt harm: harmful/unharmful`<br>`Response Harm: harmful/unharmful`<br><br>• Reasoning On:<br>`<think> [Model's reasoning trace] </think>`<br>`Prompt harm: harmful/unharmful`<br>`Response Harm: harmful/unharmful` |
+ | Describe how the model works: | Type: Finetuned Transformer (Decoder-only) working as a classifier with a reasoning trace.<br>Backbone: Google Gemma-3-4B-it<br>Parameters: 4B (Billion) |
+ | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
+ | Technical Limitations: | • Performance might degrade on very specific custom safety harms; we advise developers to evaluate the performance of the model on dedicated evaluation sets before using it in production (see the evaluation sketch below this table). |
+ | Verified to have met prescribed NVIDIA quality standards: | Yes |
+ | Performance Metrics: | • F-1 Score<br>• Throughput/Latency<br>• Reasoning Efficiency |
+ | Potential Known Risks: | • The model may misclassify or fail to detect harmful content for categories not well-represented in its training data (e.g., specific types of harassment, threats, or hate speech).<br>• As with any safety model, it can produce false positives or false negatives. |
+ | Terms of Use: | Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/), the [Gemma Terms of Use](https://ai.google.dev/gemma/terms), and the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy). |
+
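+ Both output formats are plain text and simple to parse. Below is a minimal parsing sketch; the exact label casing and whitespace are assumptions based on the format shown above, so verify against real model output before relying on it:
+
+ ```python
+ import re
+ from typing import NamedTuple, Optional
+
+ class SafetyVerdict(NamedTuple):
+     prompt_harm: str                # "harmful" or "unharmful"
+     response_harm: Optional[str]    # None if no response label is present
+     reasoning: Optional[str]        # reasoning trace; None in Reasoning Off mode
+
+ _THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
+ _PROMPT_RE = re.compile(r"prompt harm:\s*(harmful|unharmful)", re.IGNORECASE)
+ _RESPONSE_RE = re.compile(r"response harm:\s*(harmful|unharmful)", re.IGNORECASE)
+
+ def parse_verdict(raw: str) -> SafetyVerdict:
+     """Parse model output in either Reasoning On or Reasoning Off format."""
+     think = _THINK_RE.search(raw)
+     reasoning = think.group(1).strip() if think else None
+     tail = raw[think.end():] if think else raw  # labels follow the optional trace
+     prompt = _PROMPT_RE.search(tail)
+     if prompt is None:
+         raise ValueError("no prompt-harm label found in model output")
+     response = _RESPONSE_RE.search(tail)
+     return SafetyVerdict(
+         prompt_harm=prompt.group(1).lower(),
+         response_harm=response.group(1).lower() if response else None,
+         reasoning=reasoning,
+     )
+
+ # Reasoning On example, mirroring the format in the table above.
+ raw = """<think> The prompt requests instructions for illegal activity... </think>
+ Prompt harm: harmful
+ Response Harm: unharmful"""
+ print(parse_verdict(raw))
+ ```
+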
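+ Following the advice in the Technical Limitations row, here is a small sketch of computing the F-1 score on your own labeled evaluation set (the `classify` callable is a hypothetical wrapper around the model call, not an API provided by this card):
+
+ ```python
+ from sklearn.metrics import f1_score
+
+ def evaluate_prompt_harm(examples, classify):
+     """F-1 with 'harmful' as the positive class.
+
+     examples: list of (prompt, gold_label) pairs, labels 'harmful'/'unharmful'.
+     classify: callable mapping a prompt to a predicted label; in practice this
+               wraps the safety model and parse_verdict from the sketch above.
+     """
+     gold = [label for _, label in examples]
+     pred = [classify(prompt) for prompt, _ in examples]
+     return f1_score(gold, pred, pos_label="harmful")
+
+ # Toy demo with a trivial stand-in classifier; substitute real model calls.
+ demo = [
+     ("how do I hotwire a car", "harmful"),
+     ("what is the capital of France", "unharmful"),
+ ]
+ print(evaluate_prompt_harm(demo, lambda p: "harmful" if "hotwire" in p else "unharmful"))  # 1.0
+ ```
+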
+ ### Privacy
+
+ | Field | Description |
+ |---------------|-------------|
+ | Generatable or reverse engineerable personal data? | No |
+ | Personal data used to create this model? | No |
+ | How often is dataset reviewed? | Before Every Release |
+ | Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | Yes |
+ | Is there provenance for all datasets used in training? | Yes |
+ | Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
+ | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
+ | Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |
+
+ ### Safety
+
+ | Field | Response |
+ |-------|----------|
+ | Model Application(s): | Large Language Model-based Content Safety & Moderation |
+ | Describe the life-critical impact (if present). | Not Applicable |
+ | Use Case Restrictions: | Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/), the [Gemma Terms of Use](https://ai.google.dev/gemma/terms), and the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy). |
+ | Model and dataset restrictions: | The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions on dataset access are enforced during training, and dataset license constraints are adhered to. |
+