Add subcards #4
by trebedea - opened

README.md CHANGED

@@ -680,4 +680,63 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have established

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

### Plus Plus (++) Promise

We value you, the datasets, the diversity they represent, and what we have been entrusted with. This model and its associated data have been:

* Verified to comply with current applicable disclosure laws, regulations, and industry standards.
* Verified to comply with applicable privacy labeling requirements.
* Annotated to describe the collector/source (NVIDIA or a third-party).
* Characterized for technical limitations.
* Reviewed to ensure proper disclosure is accessible to, maintained for, and in compliance with NVIDIA data subjects and their requests.
* Reviewed before release.
* Tagged for known restrictions and potential safety implications.

### Bias

| Field | Response |
|-------|----------|
| Participation considerations from adversely impacted groups (protected classes) in model design and testing: | None |
| Measures taken to mitigate against unwanted bias: | Reasoning traces in the training dataset were investigated for political bias and propaganda using automatic filters and human evaluation. |

### Explainability

| Field | Description |
|-------|-------------|
| Intended Domain | Content Safety / Custom Content Safety / Topic-following / Dialogue Moderation |
| Model Type | Classifier with a reasoning trace |
| Intended Users | AI/ML Engineers, LLM Developers, Safety Assurance Teams |
| Output | Types: Text<br><br>Formats: The output format depends on the selected mode (a minimal parsing sketch follows this table):<br><br>• Reasoning Off:<br>`Prompt harm: harmful/unharmful`<br>`Response Harm: harmful/unharmful`<br><br>• Reasoning On:<br>`<think> [Model's reasoning trace] </think>`<br>`Prompt harm: harmful/unharmful`<br>`Response Harm: harmful/unharmful` |
| Describe how the model works: | Type: Finetuned Transformer (Decoder-only) working as a classifier with a reasoning trace.<br>Backbone: Google Gemma-3-4B-it<br>Parameters: 4B (Billion) |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations: | • Performance might degrade on very specific custom harm categories; we advise developers to evaluate the performance of the model on their specific evaluation sets before using it in production. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | • F-1 Score<br>• Throughput/Latency<br>• Reasoning Efficiency |
| Potential Known Risks: | • The model may misclassify or fail to detect harmful content for categories not well-represented in its training data (e.g., specific types of harassment, threats, or hate speech).<br>• As with any safety model, it can produce false positives or false negatives. |
| Terms of Use: | Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/), the [Gemma Terms of Use](https://ai.google.dev/gemma/terms), and the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy). |
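
The sketch below is a minimal, hypothetical example of how the two output formats described in the Output row above could be parsed. It assumes only the plain-text layout shown there (an optional `<think> ... </think>` trace followed by the `Prompt harm:` and `Response Harm:` labels); the function name and the example completion text are illustrative and not part of the official inference code.

```python
import re
from typing import Dict, Optional


def parse_safety_output(raw: str) -> Dict[str, Optional[str]]:
    """Parse the model's text output into its components.

    Handles both modes described in the Explainability subcard:
    - Reasoning On: a `<think> ... </think>` trace followed by the two labels.
    - Reasoning Off: only the `Prompt harm:` / `Response Harm:` labels.
    """
    # Extract the optional reasoning trace (present in Reasoning On mode only).
    think = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)

    # Extract the two classification labels; values are harmful/unharmful.
    prompt = re.search(r"Prompt harm:\s*(harmful|unharmful)", raw, flags=re.IGNORECASE)
    response = re.search(r"Response Harm:\s*(harmful|unharmful)", raw, flags=re.IGNORECASE)

    return {
        "reasoning": think.group(1).strip() if think else None,
        "prompt_harm": prompt.group(1).lower() if prompt else None,
        "response_harm": response.group(1).lower() if response else None,
    }


# Example with a Reasoning On style completion (illustrative text only).
example = (
    "<think> The user prompt asks for general cooking advice; the reply is benign. </think>\n"
    "Prompt harm: unharmful\n"
    "Response Harm: unharmful"
)
print(parse_safety_output(example))
```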

### Privacy

| Field | Description |
|---------------|-------------|
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | No |
| How often is dataset reviewed? | Before Every Release |
| Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | Yes |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
| Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |

### Safety

| Field | Response |
|-------|----------|
| Model Application(s): | Large Language Model-based Content Safety & Moderation |
| Describe the life-critical impact (if present). | Not Applicable |
| Use Case Restrictions: | Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/), the [Gemma Terms of Use](https://ai.google.dev/gemma/terms), and the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy). |
| Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to. |