Memory Efficient LM Compression using Fisher Information from Low-Rank Representations
Abstract
Although modern language models (LMs) demonstrate excellent performance on diverse text processing tasks, the substantial GPU memory required to load and run inference with these models can be prohibitive for users. To compress and accelerate LMs, various techniques are used, such as quantization, distillation, pruning, and low-rank factorization. In this work, we focus on improving a method from the latter category, namely the recently proposed Fisher-Weighted Singular Value Decomposition (FWSVD). Despite its efficiency, FWSVD requires fine-tuning of the whole model on a downstream task. We introduce a simple yet powerful modification of FWSVD that enables compression of models that were previously out of reach for the original approach. By combining LoRA with FWSVD, we demonstrate that low-rank-based compression can be achieved without storing the full gradients, sometimes even outperforming the original full fine-tuning. We evaluate our proposed approach on various NLP tasks, including NLU, NER, text summarization, and QA, showing its effectiveness compared to strong baselines.
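To make the core idea concrete, below is a minimal PyTorch-style sketch of the Fisher-weighted factorization step. The function name and the per-row squared-gradient Fisher estimate passed in as `row_fisher` are illustrative assumptions, not the paper's exact implementation; in the proposed approach such an estimate would come from lightweight LoRA adapter training rather than from full-model fine-tuning gradients.

```python
import torch

def fisher_weighted_svd(W: torch.Tensor, row_fisher: torch.Tensor, rank: int):
    """Sketch of an FWSVD-style low-rank factorization of a weight matrix.

    W          : (out, in) weight matrix to compress
    row_fisher : (out,) per-row Fisher estimate, e.g. accumulated squared
                 gradients summed over the input dimension (here assumed to be
                 obtained from LoRA adapter training, per the paper's idea)
    rank       : target rank r of the factorization
    Returns A (out, r) and B (r, in) such that W is approximately A @ B.
    """
    # Scale rows by the square root of their Fisher weight so that rows the
    # downstream task is sensitive to are reconstructed more faithfully.
    d = row_fisher.clamp_min(1e-8).sqrt()                     # (out,)
    U, S, Vh = torch.linalg.svd(d[:, None] * W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]
    # Undo the row scaling on the left factor so that A @ B approximates W.
    A = (U_r * S_r) / d[:, None]                              # (out, r)
    B = Vh_r                                                  # (r, in)
    return A, B
```

Replacing a dense linear layer's weight with the two factors `A` and `B` reduces its parameter count from `out * in` to `(out + in) * r`; the weighting by `row_fisher` is what distinguishes FWSVD from a plain truncated SVD.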