---
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- pytorch
- jax
- code_x_glue_cc_defect_detection
- code
- roberta
- security
- vulnerability-detection
- codebert
- apache-2.0
license: apache-2.0
---

# CodeBERT fine-tuned for Java Vulnerability Detection

CodeBERT model fine-tuned for detecting security vulnerabilities in Java code.

## Model Description

This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of secure/insecure Java code.

## Intended Uses

- Detect security vulnerabilities in Java source code
- Binary classification: Safe (LABEL_0) vs Vulnerable (LABEL_1)
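
The label mapping above can be sketched as a small post-processing step. This is a minimal illustration, not part of the model's API: `ID2LABEL` and `interpret_logits` are hypothetical names, and the sketch assumes the classifier returns two raw logits ordered as LABEL_0, LABEL_1.

```python
import math

# Label names as listed in this card; the dict itself is a hypothetical helper.
ID2LABEL = {0: "Safe", 1: "Vulnerable"}


def interpret_logits(logits):
    """Turn two raw logits into (label, probability) using a softmax."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability (first index wins a tie).
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]


label, prob = interpret_logits([-1.2, 2.3])
print(f"{label} ({prob:.2%})")
```

Reporting the probability alongside the label makes it possible to apply a stricter threshold than a plain argmax when false positives are costly.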

## How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier once.
tokenizer = AutoTokenizer.from_pretrained("mangsense/codebert_java")
model = AutoModelForSequenceClassification.from_pretrained("mangsense/codebert_java")
model.eval()

# Tokenize the Java snippet; anything past 512 tokens (the CodeBERT limit) is truncated.
inputs = tokenizer("your code here", return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients; no labels are needed to get a prediction.
with torch.no_grad():
    logits = model(**inputs).logits

# Predicted class index: 0 = Safe (LABEL_0), 1 = Vulnerable (LABEL_1)
print(logits.argmax(dim=-1).item())
```
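
Because the snippet above truncates its input, code beyond the model's length limit is never seen by the classifier. One possible workaround, sketched here under assumptions, is to split the token ids into overlapping windows and score each window separately. `sliding_windows` is a hypothetical helper; the default window of 510 assumes a 512-token limit minus the two special tokens a RoBERTa-style tokenizer adds, and the stride of 255 is an arbitrary choice. The model was not necessarily trained or evaluated this way.

```python
def sliding_windows(token_ids, window=510, stride=255):
    """Split a list of token ids into overlapping windows.

    Hypothetical helper: window and stride defaults are assumptions,
    not values taken from this model's training setup.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Short inputs fit in a single window.
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        # Stop once the current window reaches the end of the input.
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


# Stand-in for real token ids produced by a tokenizer.
chunks = sliding_windows(list(range(1000)))
print(len(chunks), [len(c) for c in chunks])
```

Each window could then be classified on its own, for example flagging the file if any window is predicted Vulnerable; that aggregation rule is itself an assumption and should be validated before use.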

## Training Data

Trained on the CodeXGLUE Defect Detection dataset.

## Limitations

- Focused on Java code only
- May not detect all types of vulnerabilities