Paper: Resolving Interference When Merging Models (arXiv:2306.01708)
This is a merge of pre-trained language models created using mergekit.
This model was merged with the TIES merge method, using GoToCompany/llama3-8b-cpt-sahabatai-v1-instruct as the base.
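TIES (TrIm, Elect Sign & merge) proceeds in three steps: each fine-tuned model's task vector (its delta from the base) is trimmed to keep only the highest-magnitude fraction given by `density`, a majority sign is elected per parameter, and only the values that agree with that sign are averaged back onto the base. The following is a minimal sketch of that procedure on single tensors; it is illustrative only, not mergekit's actual implementation, and all function and variable names here are hypothetical:

```python
import torch

def trim(task_vector, density):
    """Zero all but the top `density` fraction of entries by magnitude."""
    k = int(density * task_vector.numel())
    if k == 0:
        return torch.zeros_like(task_vector)
    # Threshold at the k-th largest absolute value.
    threshold = task_vector.abs().flatten().kthvalue(task_vector.numel() - k + 1).values
    return torch.where(task_vector.abs() >= threshold, task_vector,
                       torch.zeros_like(task_vector))

def ties_merge(base, finetuned, density=0.5, weights=None):
    """Merge fine-tuned tensors into `base` via trim / elect sign / disjoint merge."""
    weights = weights or [1.0] * len(finetuned)
    # 1. Trim: sparsify each weighted task vector (fine-tuned minus base).
    deltas = torch.stack([w * trim(ft - base, density)
                          for w, ft in zip(weights, finetuned)])
    # 2. Elect sign: per-parameter majority sign, weighted by total magnitude.
    elected = torch.sign(deltas.sum(dim=0))
    # 3. Disjoint merge: average only entries that agree with the elected sign.
    agree = (torch.sign(deltas) == elected) & (deltas != 0)
    merged_delta = (deltas * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged_delta
```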
The following models were included in the merge:
- ./full_model_0
- ./full_model_1
- ./full_model_2
- ./full_model_3
The following YAML configuration was used to produce this model:
models:
  - model: ./full_model_0
    parameters:
      density: 0.5 # Keeps the top 50% of most changed weights
      weight: 0.33 # Give each model roughly equal weight
  - model: ./full_model_1
    parameters:
      density: 0.5 # Keeps the top 50% of most changed weights
      weight: 0.33 # Give each model roughly equal weight
  - model: ./full_model_2
    parameters:
      density: 0.5 # Keeps the top 50% of most changed weights
      weight: 0.33 # Give each model roughly equal weight
  - model: ./full_model_3
    parameters:
      density: 0.5 # Keeps the top 50% of most changed weights
      weight: 0.33 # Give each model roughly equal weight
merge_method: ties
base_model: GoToCompany/llama3-8b-cpt-sahabatai-v1-instruct # TIES needs the original base model
parameters:
  normalize: true
dtype: float16
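To reproduce the merge, this configuration is passed to mergekit's CLI (for example `mergekit-yaml config.yml ./merged-model`; the output path is your choice and is not part of the config). The resulting checkpoint loads like any other Llama 3 model. A minimal usage sketch, assuming the merge output was written to the hypothetical directory `./merged-model`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "./merged-model" is a hypothetical output directory: whatever path
# was passed to mergekit when running the YAML config above.
tokenizer = AutoTokenizer.from_pretrained("./merged-model")
model = AutoModelForCausalLM.from_pretrained("./merged-model",
                                             torch_dtype=torch.float16)

inputs = tokenizer("Halo, apa kabar?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```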
Base model: meta-llama/Meta-Llama-3-8B-Instruct