How to finetune models
#56 opened by yangguofeng
Hello,
I currently have access to approximately 30 million protein sequences and am fine-tuning ProtGPT2 with a LoRA-based approach because of limited computational resources. Since ProtGPT2 contains 36 transformer blocks, I am wondering whether it is more effective to apply LoRA adapters to all blocks, or to restrict them to a subset (for example, the higher or middle-to-high layers) in order to balance performance and efficiency. I would appreciate any guidance on which blocks tend to be most important for adaptation in this setting while preserving the pretrained protein-level priors.
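For context, here is a minimal sketch of how I am currently restricting the adapters to a subset of blocks with the PEFT library. The checkpoint name, rank, and layer range are placeholders rather than settled choices, and I am assuming ProtGPT2 follows the standard GPT-2 layout (attention projections in `c_attn`, blocks indexed 0-35 under `transformer.h`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "nferruz/ProtGPT2"  # ProtGPT2 checkpoint on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                     # placeholder rank
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn"],       # GPT-2-style fused QKV projection
    fan_in_fan_out=True,             # GPT-2 uses Conv1D layers (transposed weights)
    layers_to_transform=list(range(18, 36)),  # adapt only the upper half of the 36 blocks
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only a small fraction should be trainable
```

My uncertainty is mainly about the `layers_to_transform` range above: whether adapting only the upper blocks like this is sufficient, or whether spreading the adapters across all 36 blocks (or the middle layers) would work better for protein sequence adaptation.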