Similarity

This page summarizes how layer redundancy patterns vary across tasks and across languages, and how similar those patterns are to one another according to the correlation table and heatmaps below.

The task-level analysis shows that redundancy patterns are strongly structured by the underlying task.

Across MMLU tasks, redundant layers are concentrated toward the final layers (approximately layers 25 to 31), which supports the idea that merging should primarily target later layers.
Redundancy patterns are not uniform across tasks, indicating clear task dependence.
Tasks that are conceptually similar tend to have more similar redundancy patterns (for example, Math and Computer Science), although this does not hold for every pair: Legal and Humanities has the lowest correlation in the table (0.863).
Task-level similarity is generally high, with every pair above 0.86; the strongest pair is Medical and Math with a correlation of 0.966, followed by Math and Computer Science at 0.951.

	medical	legal	math	cs	humanities
medical	1.000000	0.887484	0.966250	0.940329	0.931864
legal	0.887484	1.000000	0.884862	0.899014	0.862810
math	0.966250	0.884862	1.000000	0.951172	0.903280
cs	0.940329	0.899014	0.951172	1.000000	0.885816
humanities	0.931864	0.862810	0.903280	0.885816	1.000000
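
The pairwise claims above can be checked directly against the table. The following is a minimal sketch, assuming the values are copied from the table as printed; the helper name `ranked_pairs` is hypothetical and only serves this illustration.

```python
import numpy as np

# Task labels and correlation values copied from the table above.
tasks = ["medical", "legal", "math", "cs", "humanities"]
corr = np.array([
    [1.000000, 0.887484, 0.966250, 0.940329, 0.931864],
    [0.887484, 1.000000, 0.884862, 0.899014, 0.862810],
    [0.966250, 0.884862, 1.000000, 0.951172, 0.903280],
    [0.940329, 0.899014, 0.951172, 1.000000, 0.885816],
    [0.931864, 0.862810, 0.903280, 0.885816, 1.000000],
])


def ranked_pairs(labels, matrix):
    """Return (label_a, label_b, value) for each unordered pair, highest first."""
    # Upper triangle without the diagonal: each pair appears exactly once.
    rows, cols = np.triu_indices(len(labels), k=1)
    pairs = [(labels[i], labels[j], float(matrix[i, j])) for i, j in zip(rows, cols)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


pairs = ranked_pairs(tasks, corr)
strongest, weakest = pairs[0], pairs[-1]
mean_off_diagonal = sum(value for _, _, value in pairs) / len(pairs)

print(strongest)  # ('medical', 'math', 0.96625)
print(weakest)    # ('legal', 'humanities', 0.86281)
print(round(mean_off_diagonal, 3))  # 0.911
```

Ranking only the upper triangle avoids counting each symmetric pair twice and keeps the diagonal of 1.0 values out of the comparison.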

MMLU heatmaps (tasks)

The language-level analysis reveals that varying the language, even for a fixed task, has an even larger impact on redundancy.

When the task is fixed (medical) but the language changes, layer similarity varies more strongly.
Redundancy across languages becomes less consistent and more irregular.
Cross-language correlations are lower (for example, Spanish and Chinese have a correlation of 0.730), whereas all cross-task correlations stay above 0.86.
These trends suggest that language has a stronger influence on layer redundancy than the task domain.
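
This page does not state how the correlations were computed. As a rough sketch of one way such a similarity matrix could be produced, the example below assumes each condition (a task or a language) has one redundancy score per layer and uses the Pearson correlation between those per-layer profiles. The function name `profile_similarity`, the condition names, the 32-layer depth, and the synthetic profiles are all assumptions for illustration, not the measured data.

```python
import numpy as np


def profile_similarity(profiles):
    """Pearson correlation between per-layer redundancy profiles.

    profiles: dict mapping a condition name to a 1-D sequence of
    per-layer scores; all sequences must have the same length.
    Returns (names, matrix), where matrix[i, j] is the correlation
    between the profiles of names[i] and names[j].
    """
    names = list(profiles)
    lengths = {len(profiles[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all profiles must cover the same number of layers")
    stacked = np.array([profiles[name] for name in names], dtype=float)
    # np.corrcoef treats each row as one variable (here: one condition).
    return names, np.corrcoef(stacked)


# Synthetic placeholder profiles (NOT measured values), assuming 32 layers
# with scores that rise toward the final layers plus some noise.
rng = np.random.default_rng(0)
num_layers = 32
base = np.arange(num_layers) / (num_layers - 1)
profiles = {
    "condition_a": base + 0.05 * rng.standard_normal(num_layers),
    "condition_b": base + 0.05 * rng.standard_normal(num_layers),
    "condition_c": base + 0.30 * rng.standard_normal(num_layers),
}

names, similarity = profile_similarity(profiles)
for name, row in zip(names, similarity):
    print(name, np.round(row, 3))
```

With real per-layer scores in place of the synthetic ones, the same call would give a task matrix and a language matrix that can be compared pair by pair, as done in the text above.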

GMMLU heatmaps (languages)