---
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- pytorch
- jax
- code_x_glue_cc_defect_detection
- code
- roberta
- security
- vulnerability-detection
- codebert
- apache-2.0
license: apache-2.0
---

# CodeBERT fine-tuned for Java Vulnerability Detection

A CodeBERT model fine-tuned to detect security vulnerabilities in Java code.

## Model Description

This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of Java code as secure or insecure.

## Intended Uses

- Detect security vulnerabilities in Java source code
- Binary classification: safe (`LABEL_0`) vs. vulnerable (`LABEL_1`)

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mangsense/codebert_java")
model = AutoModelForSequenceClassification.from_pretrained("mangsense/codebert_java")

# Tokenize a code snippet, truncating/padding to the model's maximum length
inputs = tokenizer("your code here", return_tensors="pt",
                   truncation=True, padding="max_length")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# 0 = safe, 1 = vulnerable
print(torch.argmax(outputs.logits, dim=-1).item())
```

## Training Data

Trained on the CodeXGLUE Defect Detection dataset.

## Limitations

- Trained for Java code only
- May not detect all types of vulnerabilities
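The model's raw logits are unnormalized scores; if you want a confidence value rather than just the predicted label, apply a softmax. A minimal sketch (the logits below are made up for illustration, not real model output):

```python
import torch

# Hypothetical logits for one snippet: [safe_score, vulnerable_score]
logits = torch.tensor([[-1.2, 2.3]])

# Normalize to class probabilities
probs = torch.softmax(logits, dim=-1)

pred = torch.argmax(probs, dim=-1).item()   # 0 = safe, 1 = vulnerable
confidence = probs[0, pred].item()          # probability of the predicted class

print(pred)                    # 1 -> flagged as vulnerable
print(round(confidence, 3))    # 0.971
```

In practice, replace the hypothetical `logits` tensor with `outputs.logits` from the inference example above.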