NLP Assignment
This site accompanies our course project for CS 613: Natural Language Processing at IIT Gandhinagar.
Our project is titled "Beyond Naive Merging: Enhancing LLM Compression via Alpha Optimization, Task-Specific Similarity, and Neural Alignment".
Project overview
Large Language Models (LLMs) are powerful but expensive to deploy. We build on the "Pruning via Merging" (MKA) method and study how to make layer-merging-based compression more principled and reliable. Concretely, we:
- Treat the merge weight \(\alpha\) as a trainable parameter and optimize it with data-driven methods rather than a fixed similarity heuristic (see the first sketch after this list).
- Show that layer similarity is task- and language-dependent by analyzing similarity heatmaps across MMLU domains and Global MMLU languages.
- Propose an "align-then-merge" pipeline that first aligns neurons and then merges layers, improving stability at higher compression ratios (see the second sketch after this list).
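To make the first point concrete, here is a minimal sketch of the data-driven \(\alpha\) optimization idea: two adjacent layers are interpolated as \(\alpha W_i + (1-\alpha) W_{i+1}\) and \(\alpha\) is fit on a small calibration set. This is an illustration under our own simplifications, not the project's exact training code; `forward_fn` is a hypothetical helper that runs a layer from a weight dict, and `calib_batches` is assumed to be a small list of (input, target) pairs.

```python
import torch

def merge_layers(layer_a: dict, layer_b: dict, alpha: torch.Tensor) -> dict:
    """Interpolate matching weight tensors of two layers with weight alpha."""
    return {name: alpha * layer_a[name] + (1 - alpha) * layer_b[name]
            for name in layer_a}

def fit_alpha(layer_a, layer_b, forward_fn, calib_batches, steps=100, lr=1e-2):
    """Learn alpha by minimizing the merged layer's output error on calibration data.

    forward_fn(weights, x) runs the layer with the given weight dict
    (hypothetical helper); calib_batches is a list of (x, target) pairs,
    e.g. with target being the original two-layer output to approximate.
    """
    # Parameterize alpha through a sigmoid so it stays in (0, 1).
    logit = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([logit], lr=lr)
    for _ in range(steps):
        for x, target in calib_batches:
            alpha = torch.sigmoid(logit)
            merged = merge_layers(layer_a, layer_b, alpha)
            loss = torch.nn.functional.mse_loss(forward_fn(merged, x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(logit).item()
```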
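The align-then-merge pipeline can be sketched in the same spirit. Our project aligns neurons via manifold alignment (as the repository name indicates); the sketch below substitutes a simpler activation-correlation matching solved with the Hungarian algorithm, purely to illustrate the align-first, merge-second ordering.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_then_merge(acts_a, acts_b, W_a, W_b, alpha=0.5):
    """Permute layer B's neurons to match layer A, then interpolate weights.

    acts_a, acts_b: (num_tokens, hidden) activations on a calibration set.
    W_a, W_b: (hidden, hidden_out) weight matrices of the two layers.
    """
    # Correlation between every pair of neurons across the calibration tokens.
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)                   # (hidden, hidden)
    # Hungarian matching: permutation of B's neurons maximizing total correlation.
    row, col = linear_sum_assignment(-corr)
    W_b_aligned = W_b[col]                    # reorder B's rows to match A's neurons
    # Merge only after alignment, so corresponding neurons are interpolated.
    return alpha * W_a + (1 - alpha) * W_b_aligned
```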
The main experimental results and plots are available via the navigation links:
- Neural Alignment – prompt browser and alignment-based comparisons.
- Similarity – similarity visualizations and correlation tables.
- Alpha Values – analysis of learned \(\alpha\) values and their effect on accuracy under compression.
Our code is available at: https://github.com/Jain-Laksh/Layer-Merging-via-Manifold-Alignment.
Acknowledgements
- Course: CS 613 – Natural Language Processing, IIT Gandhinagar
- Instructor: Prof. Mayank Singh
- Teaching Assistant: Sailesh Panda
- Team: "Dropout Squad" (third year B.Tech, IIT Gandhinagar)
- Aditya Borate
- Aryan Solanki
- Laksh Jain
- Nishchay Bhutoria
- Parthiv Patel
- Rudra Pratap Singh
- Soham Gaonkar