Publications

My publications

  <div class="row">
    <div class="col-sm-2 abbr"><abbr class="badge">Arxiv</abbr></div>

    <!-- Entry bib key -->
    <div id="yin2023OWL" class="col-sm-8">
    
      <!-- Title -->
      <div class="title">Outlier Weighed Layerwise Sparsity OWL: A Missing Secret Sauce for Pruning LLMs to High Sparsity</div>
      <!-- Author -->
      <div class="author"><b>Yin, Lu</b>;&nbsp;You, Wu;&nbsp;Zhenyu, Zhang;&nbsp;Cheng-Yu, Hsieh;&nbsp;Yaqing, Wang;&nbsp;Yiling, Jia;&nbsp;Mykola, Pechenizkiy;&nbsp;Yi, Liang;&nbsp;Zhangyang, Wang;&nbsp;and Shiwei, Liu
      </div>

      <!-- Journal/Book title and date -->
      <div class="periodical">
        <em>In Arxiv</em> 2023
      </div>
    
      <!-- Links/Buttons -->
      <div class="links">
        <a class="abstract btn btn-sm z-depth-0" role="button">Abs</a>
        <a href="https://arxiv.org/pdf/2310.05175.pdf" class="btn btn-sm z-depth-0" role="button">HTML</a>
      </div>

      <!-- Hidden abstract block -->
      <div class="abstract hidden">
        <p>Large Language Models (LLMs), renowned for their remarkable performance, present a challenge due to their colossal model size when it comes to practical deployment. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters can be pruned in one-shot without hurting performance. Building upon insights gained from pre-LLM models, prevailing LLM pruning strategies have consistently adhered to the practice of uniformly pruning all layers at equivalent sparsity. However, this observation stands in contrast to the prevailing trends observed in the field of vision models, where non-uniform layerwise sparsity typically yields substantially improved results. To elucidate the underlying reasons for this disparity, we conduct a comprehensive analysis of the distribution of token features within LLMs. In doing so, we discover a strong correlation with the emergence of outliers, defined as features exhibiting significantly greater magnitudes compared to their counterparts in feature dimensions. Inspired by this finding, we introduce a novel LLM pruning methodology that incorporates a tailored set of non-uniform layerwise sparsity ratios specifically designed for LLM pruning, termed as Outlier Weighed Layerwise sparsity (OWL). The sparsity ratio of OWL is directly proportional to the outlier ratio observed within each layer, facilitating a more effective alignment between layerwise weight sparsity and outlier ratios. Our empirical evaluation, conducted across the LLaMA-V1 family and OPT, spanning various benchmarks, demonstrates the distinct advantages offered by OWL over previous methods. For instance, our approach exhibits a remarkable performance gain, surpassing the state-of-the-art Wanda and SparseGPT by 61.22 and 6.80 perplexity at a high sparsity level of 70%, respectively. </p>
      </div>
    </div>
  </div>

</li>

    2. AAAI
      Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost
      Yin, Lu; Liu, Shiwei; Fang, Meng; Huang, Tianjin; Menkovski, Vlado; and Pechenizkiy, Mykola
      In Thirty-Seventh AAAI Conference on Artificial Intelligence 2023
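
    The OWL abstract above describes the paper's core mechanism: score how outlier-heavy each layer is and spread a global sparsity budget non-uniformly across layers, rather than pruning every layer at the same ratio. The sketch below is a simplified illustration of that allocation idea only, assuming a magnitude-only outlier criterion and a hypothetical clamp of ±lam around the global target; it is not the authors' code, and the exact outlier metric and allocation formula are given in the paper.

    # A minimal, illustrative sketch of outlier-aware layerwise sparsity allocation.
    # The outlier criterion, the +/- lam clamp, and all names here are simplifying
    # assumptions for illustration; see the OWL paper for the actual method.
    import numpy as np

    def layer_outlier_ratio(weight: np.ndarray, m: float = 5.0) -> float:
        """Fraction of weights whose magnitude exceeds m times the layer's mean magnitude
        (a simplified stand-in for the paper's activation-aware outlier score)."""
        mag = np.abs(weight)
        return float((mag > m * mag.mean()).mean())

    def allocate_layerwise_sparsity(weights, target_sparsity=0.7, lam=0.08):
        """Return one sparsity ratio per layer, each within [target - lam, target + lam],
        with the mean across layers equal to the global target."""
        ratios = np.array([layer_outlier_ratio(w) for w in weights])
        centered = ratios - ratios.mean()           # zero-mean deviation per layer
        span = np.abs(centered).max()
        adjust = np.zeros_like(centered) if span == 0 else centered / span * lam
        # Assumed direction (one reading of the method): layers with more outliers
        # are pruned less, i.e. they receive a lower sparsity ratio.
        return list(np.clip(target_sparsity - adjust, 0.0, 1.0))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        layers = [rng.normal(size=(64, 64)) for _ in range(3)]
        layers[1][:4, :4] *= 50.0                   # inject outliers into the second layer
        print(allocate_layerwise_sparsity(layers))  # the second layer gets the lowest ratio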

    2022

    1. LoG (Best Paper)
      You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets
      Huang, Tianjin; Chen, Tianlong; Fang, Meng; Menkovski, Vlado; Zhao, Jiaxu; Yin, Lu; Pei, Yulong; Mocanu, Decebal Constantin; Wang, Zhangyang; Pechenizkiy, Mykola; and Liu, Shiwei
      In Learning on Graphs Conference 2022
    2. UAI
      Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training
      Yin, Lu; Menkovski, Vlado; Fang, Meng; Huang, Tianjin; Pei, Yulong; Pechenizkiy, Mykola; Mocanu, Decebal Constantin; and Liu, Shiwei
      In The 38th Conference on Uncertainty in Artificial Intelligence 2022

    2021

    1. ICML
      Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training
      Liu, Shiwei; Yin, Lu; Mocanu, Decebal Constantin; and Pechenizkiy, Mykola
      In International Conference on Machine Learning 2021
    2. NeurIPS
      Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
      Liu, Shiwei; Chen, Tianlong; Chen, Xiaohan; Atashgahi, Zahra; Yin, Lu; Kou, Huanyu; Shen, Li; Pechenizkiy, Mykola; Wang, Zhangyang; and Mocanu, Decebal Constantin
      Advances in Neural Information Processing Systems 2021

    Data Efficiency and Knowledge Elicitation

    2021

    1. AAAI (Workshop)
      Semantic-Based Few-Shot Learning by Interactive Psychometric Testing
      Yin, Lu; Menkovski, Vlado; Pei, Yulong; and Pechenizkiy, Mykola
      AAAI 2022 Workshop on Interactive Machine Learning (IML@AAAI22) 2021
    2. ACML (Long Oral)
      Hierarchical Semantic Segmentation using Psychometric Learning
      Yin, Lu; Menkovski, Vlado; Liu, Shiwei; and Pechenizkiy, Mykola
      Proceedings of Machine Learning Research 2021
    3. IJCAI (DC)
      Beyond labels: knowledge elicitation using deep metric learning and psychometric testing
      Yin, Lu
      In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence 2021

    2020

    1. ECML
      Knowledge Elicitation Using Deep Metric Learning and Psychometric Testing
      Yin, Lu; Menkovski, Vlado; and Pechenizkiy, Mykola
      In Joint European Conference on Machine Learning and Knowledge Discovery in Databases 2020
