What is the maximum possible max_new_tokens?
#15
opened by PF94
I tried the example inference code and it worked great. But when I increase max_new_tokens to 2048, most of the time I get either RuntimeError: CUDA error: device-side assert triggered or RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED. In config.json, max_position_embeddings and max_length are both 4096. Does that mean the model can output up to 4096 tokens?
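For reference, here is a minimal sketch of the kind of call I mean, assuming a standard transformers-style causal LM (the model name and prompt are placeholders, not the actual checkpoint). My understanding is that the context window covers the prompt plus the generated tokens, so the safe budget for new tokens would be max_position_embeddings minus the prompt length:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-model-here"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Hello, how are you?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# If max_position_embeddings is 4096 (per config.json), the prompt
# tokens count against it, so the remaining generation budget is:
max_positions = model.config.max_position_embeddings
budget = max_positions - inputs["input_ids"].shape[1]

# Cap max_new_tokens at the remaining budget instead of a fixed 2048.
outputs = model.generate(**inputs, max_new_tokens=min(2048, budget))
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is this the right way to think about it, or should max_new_tokens be able to go up to 2048 regardless of prompt length?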