---
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- pytorch
- jax
- code_x_glue_cc_defect_detection
- code
- roberta
- security
- vulnerability-detection
- codebert
- apache-2.0
license: apache-2.0
---
# CodeBERT fine-tuned for Java Vulnerability Detection
CodeBERT model fine-tuned for detecting security vulnerabilities in Java code.
## Model Description
This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of secure/insecure Java code.
## Intended Uses
- Detect security vulnerabilities in Java source code
- Binary classification: Safe (LABEL_0) vs Vulnerable (LABEL_1)
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "mangsense/codebert_java"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Tokenize a Java snippet; inputs longer than the model's maximum length are truncated
inputs = tokenizer("your code here", return_tensors="pt", truncation=True, padding="max_length")

# Optional: pass a ground-truth label (0 = Safe, 1 = Vulnerable) to also get the loss
labels = torch.tensor([1])  # batch size 1

with torch.no_grad():
    outputs = model(**inputs, labels=labels)

loss = outputs.loss
logits = outputs.logits  # shape: (batch_size, 2)
print(torch.argmax(logits, dim=-1).item())  # 0 = Safe (LABEL_0), 1 = Vulnerable (LABEL_1)
```
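The usage example prints only the predicted class index. To report a readable label together with a confidence score, the two logits can be passed through a softmax and mapped onto the LABEL_0/LABEL_1 scheme from Intended Uses. The sketch below is plain Python and does not depend on `transformers`; `LABEL_NAMES` and `decode_prediction` are hypothetical helpers, and the logit values are made up for illustration (with a real model you would pass something like `outputs.logits[0].tolist()`).

```python
import math

# Label scheme from this model card: index 0 -> LABEL_0 (Safe), index 1 -> LABEL_1 (Vulnerable).
# The display names here are an assumption for readability.
LABEL_NAMES = {0: "Safe", 1: "Vulnerable"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Map one row of 2-class logits to (label_name, probability_of_that_label)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_NAMES[idx], probs[idx]


# Hypothetical logits for a single snippet
label, prob = decode_prediction([-1.2, 2.3])
print(label, round(prob, 3))
```

Note that the softmax probability reflects the model's confidence, not a calibrated likelihood that the code is actually exploitable.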
## Training Data
Trained on the CodeXGLUE Defect Detection dataset.
## Limitations
- Focused on Java code only
- May not detect all types of vulnerabilities