---
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- pytorch
- jax
- code_x_glue_cc_defect_detection
- code
- roberta
- security
- vulnerability-detection
- codebert
- apache-2.0
license: apache-2.0
---

# CodeBERT fine-tuned for Java Vulnerability Detection

A CodeBERT model fine-tuned to detect security vulnerabilities in Java code.

## Model Description

This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of secure/insecure Java code.
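
The classifier is a standard two-label sequence-classification head on top of the CodeBERT encoder. For orientation, the sketch below shows how such a setup is typically constructed with `AutoModelForSequenceClassification`; the `num_labels=2` choice and label names follow this card, while the actual training hyperparameters of this checkpoint are not documented here.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch only: start from the CodeBERT encoder and attach a 2-way classification head.
base = "microsoft/codebert-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(
    base,
    num_labels=2,  # 0 = Safe, 1 = Vulnerable
    id2label={0: "LABEL_0", 1: "LABEL_1"},
    label2id={"LABEL_0": 0, "LABEL_1": 1},
)
```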

## Intended Uses

- Detect security vulnerabilities in Java source code
- Binary classification: Safe (LABEL_0) vs Vulnerable (LABEL_1)
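
The label mapping above surfaces directly through the `text-classification` pipeline. The sketch below assumes the default `LABEL_0`/`LABEL_1` names from the model config and uses a made-up SQL-concatenation snippet purely as an illustration.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline
classifier = pipeline("text-classification", model="mangsense/codebert_java")

# Illustrative (hypothetical) Java snippet with naive string concatenation in SQL
snippet = 'String q = "SELECT * FROM users WHERE name = \'" + userInput + "\'";'
print(classifier(snippet, truncation=True))
# Returns a list of {'label': ..., 'score': ...} dicts, e.g. 'LABEL_1' for vulnerable
```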

## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("mangsense/codebert_java")
model = AutoModelForSequenceClassification.from_pretrained("mangsense/codebert_java")

# Tokenize the code to classify (replace "your code here" with an actual Java snippet)
inputs = tokenizer("your code here", return_tensors="pt", truncation=True, padding="max_length")
labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1; only needed if you want a loss
outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits

print(np.argmax(logits.detach().numpy()))  # 0 = Safe (LABEL_0), 1 = Vulnerable (LABEL_1)
```
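
Continuing from the example above, the raw logits can be turned into a confidence score with a softmax; the 0.5 threshold below is an assumption for illustration, not part of the released model.

```python
import torch

# Normalize the logits from the previous example into class probabilities
probs = torch.softmax(logits, dim=-1)
p_vulnerable = probs[0, 1].item()
print(f"P(vulnerable) = {p_vulnerable:.3f}")
print("Vulnerable" if p_vulnerable >= 0.5 else "Safe")
```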

## Training Data

The model was fine-tuned on the CodeXGLUE Defect Detection dataset.
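
The dataset is available on the Hugging Face Hub under the same identifier as the tag above. The sketch below only inspects it with the `datasets` library; field names follow the published dataset card, and any filtering used for this checkpoint is not documented here.

```python
from datasets import load_dataset

# CodeXGLUE defect-detection dataset: function-level code with a binary 'target' label
ds = load_dataset("code_x_glue_cc_defect_detection")
print(ds)  # train / validation / test splits

sample = ds["train"][0]
print(sample["func"][:200])          # the function body
print("vulnerable:", sample["target"])
```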

## Limitations

- Focused on Java code only
- May not detect all types of vulnerabilities