Evaluating In-Context Learning Ability

by mustafaa - opened Jul 20, 2024

Jul 20, 2024

edited Jul 24, 2024

Hello,

First of all, thank you for this amazing project. I am evaluating Chameleon's in-context learning ability, but I think I am missing something about the inference process. In a zero-shot setting the model's outputs look normal, but in a few-shot setting its responses become erratic: it sometimes avoids answering and occasionally outputs irrelevant characters. I do not see this problem in the zero-shot setting. Below is the code I used.

import torch

def load_model(self, args) -> None:
    """
    Load the Chameleon model and processor.

    Parameters:
    - args: The arguments to load the model.

    Returns:
    None
    """

    from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

    print('Loading Chameleon!!!')

    # Load the weights in bfloat16 onto the GPU and switch to inference mode.
    self.model = ChameleonForConditionalGeneration.from_pretrained(
        args.hf_path,
        device_map="cuda:0",
        torch_dtype=torch.bfloat16,
    ).to(args.device).eval()

    self.processor = ChameleonProcessor.from_pretrained(args.hf_path)

    # Sampling settings shared by every generate() call.
    self.generation_cfg = {
        'do_sample': True,
        'temperature': 0.7,
        'top_p': 0.9,
        'repetition_penalty': 1.2,
    }

    # CoT settings need room for reasoning; direct answers are short.
    if args.is_zero_cot_active or args.is_few_cot_active:
        self.generation_cfg['max_new_tokens'] = 512
    else:
        self.generation_cfg['max_new_tokens'] = 50

    print('Chameleon loaded!!!')

def calculate_generated_text(self, prompt, vision_x):
    """
    Calculate generated text given a prompt and vision data.

    Example prompts:
    - Zero-shot: "<image> <Question> <Options> Answer: "
    - Few-shot: "<image> <Question> <Options> Answer: <Answer> <image> <Question> <Options> Answer: "

    Parameters:
    - prompt (str): The input prompt, with one "<image>" placeholder per image.
    - vision_x (list[PIL Images]): List of PIL Images, in the same order as the placeholders.

    Returns:
    Tuple[str, str]: The raw generated text (prompt included) and the answer alone (salt answer).
    """
    if self.model is None or self.processor is None:
        raise AttributeError('Model or processor is not initialized. Call load_model first!')

    # Tokenize the prompt, preprocess the images, and move both to the model's device.
    inputs = self.processor(prompt, images=vision_x, padding=True, return_tensors="pt").to(device=self.model.device, dtype=torch.bfloat16)

    out = self.model.generate(**inputs, **self.generation_cfg)

    # Full decoded sequence, prompt included.
    generated_text = self.processor.decode(out[0], skip_special_tokens=True)

    # Decode only the newly generated tokens instead of slicing the string by the
    # prompt length: the decoded prompt need not match the prompt string character for character.
    prompt_len = inputs["input_ids"].shape[1]
    salt_answer = self.processor.decode(out[0][prompt_len:], skip_special_tokens=True)

    return generated_text, salt_answer
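Because the few-shot prompt interleaves several images with text, it can help to build the prompt string and the image list together so the "<image>" placeholders and the images cannot drift out of order. This is a minimal sketch, not part of the Chameleon or transformers API: `Example` and `build_few_shot_prompt` are hypothetical names, the newline separator between demonstrations is an assumption (the template in the docstring uses a space), and dropping the trailing space after the final "Answer:" is a common precaution rather than a documented requirement.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Example:
    """One multiple-choice item (hypothetical structure for illustration)."""
    image: Any            # a PIL image in practice; any object works here
    question: str
    options: str
    answer: Optional[str] = None  # left as None for the query item


def build_few_shot_prompt(
    demos: List[Example], query: Example, sep: str = "\n"
) -> Tuple[str, List[Any]]:
    """Interleave demonstrations and the query into one prompt plus an image list.

    Images are returned in the same order as their "<image>" placeholders,
    so the i-th placeholder corresponds to the i-th image.
    """
    parts: List[str] = []
    images: List[Any] = []
    for ex in demos:
        if ex.answer is None:
            raise ValueError("every demonstration needs an answer")
        parts.append(f"<image> {ex.question} {ex.options} Answer: {ex.answer}")
        images.append(ex.image)
    # The query stops at "Answer:" so the model continues with its answer.
    parts.append(f"<image> {query.question} {query.options} Answer:")
    images.append(query.image)
    prompt = sep.join(parts)
    # Sanity check: exactly one placeholder per image.
    if prompt.count("<image>") != len(images):
        raise ValueError("number of <image> placeholders does not match number of images")
    return prompt, images


demos = [Example("img_1", "What color is the sky?", "(A) blue (B) green", "A")]
query = Example("img_2", "What color is grass?", "(A) blue (B) green")
prompt, images = build_few_shot_prompt(demos, query)
print(prompt)
print(images)  # ['img_1', 'img_2']
```

The returned `prompt` and `images` can then be passed as `prompt` and `vision_x` to `calculate_generated_text`.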

eve1234

Jul 23, 2024

Hi @mustafaa. I'm very interested in trying this model, but there are no clear instructions yet, so I tried your code. How do you deal with prompt length? I got a ValueError when executing inputs = processor(...), and the error persists even after I set generation_cfg.max_length and generation_cfg.max_new_tokens = 2048. My prompt is "<image> Briefly describe the image. ". The image alone takes more than 1000 tokens, so I can't really reduce the input length.

ValueError: Input length of input_ids is 1029, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.

mustafaa

Jul 24, 2024

Hi @eve1234. I think I made a mistake in the generate call. It should be:

out = self.model.generate(**inputs, **self.generation_cfg)

If that doesn't work, you can try passing each argument separately, as in this post.
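The `**` in the corrected call is what matters here: it unpacks `generation_cfg` into keyword arguments, so `max_new_tokens` actually reaches `generate`. When it does not, transformers falls back to its default `max_length` of 20, which a 1029-token prompt already exceeds. The toy function below is a hypothetical stand-in written only to illustrate the argument passing and the length check; it is not the transformers implementation.

```python
from typing import Any, Optional, Sequence


def toy_generate(
    input_ids: Sequence[int],
    max_length: int = 20,
    max_new_tokens: Optional[int] = None,
    **other_kwargs: Any,
) -> int:
    """Hypothetical stand-in for generate(): returns the total length budget.

    Only mimics the length check; it is not the transformers implementation.
    """
    if max_new_tokens is not None:
        # max_new_tokens is counted on top of the prompt, so long prompts are fine.
        return len(input_ids) + max_new_tokens
    if len(input_ids) >= max_length:
        raise ValueError(
            f"Input length of input_ids is {len(input_ids)}, "
            f"but max_length is set to {max_length}."
        )
    return max_length


generation_cfg = {"do_sample": True, "temperature": 0.7, "max_new_tokens": 50}
input_ids = list(range(1029))  # same length as the prompt in the error above

# Unpacked with **: max_new_tokens arrives as a keyword argument.
print(toy_generate(input_ids, **generation_cfg))  # 1079

# If the config never reaches the call, the default max_length of 20 applies.
try:
    toy_generate(input_ids)
except ValueError as err:
    print(err)
```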
