Model Overview

Model Summary

OpenAI's GPT OSS Safeguard is an open-weight safety reasoning model designed for content classification and foundational safety tasks.Built upon the GPT OSS architecture, this model allows developers to classify text content based on their own provided safety policies, making it highly adaptable for use cases like LLM input-output filtering and content labeling for trust and safety. Released under the permissive Apache 2.0 license, it encourages broad experimentation, customization, and commercial deployment without restrictive licensing concerns.

A key innovation of the GPT OSS Safeguard is its ability to interpret a developer's written policy directly at inference time.This "bring your own policy" approach offers significant flexibility, as the safety guidelines are not hard-coded into the model but are provided with the content to be classified.This allows for iterative refinement of policies to improve performance and tailor the model's behavior to specific needs, such as a gaming forum moderating discussions on cheating or a review site screening for fake reviews.

To enhance transparency and trust, the model provides full access to its chain-of-thought reasoning process, enabling developers to understand and debug the model's decisions.Furthermore, it features a configurable reasoning effort, allowing users to adjust between low, medium, and high settings to balance performance and latency for their specific application.This model is part of OpenAI's commitment to advancing open-source AI safety tools in collaboration with the community.

For more details, please refer to GPT OSS Blog, GitHub.

Weights are released under the Apache 2 License . Keras model code is released under the Apache 2 License.

Links

Installation

Keras and KerasHub can be installed with:

pip install -U -q keras-hub
pip install -U -q keras

Jax, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment see the Keras Getting Started page.

Available GPT OSS Presets.

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

Preset Parameters Description
gpt_oss_safeguard_20b_en 20B This preset has 21 billion total parameters, with 3.6 billion active parameters, a context length of over 128k, and is de-quantized from MXFP4.
gpt_oss_safeguard_120b_en 120B This preset has 117 billion total parameters, with 5.1 billion active parameters, a 128k context length, and is de-quantized from MXFP4.
Downloads last month
9
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including keras/gpt_oss_safeguard_20b_en