Parameter Pruning Research - glqglq/ml_dl_wiki GitHub Wiki

Common Questions

  • Which layer should be pruned first?
    • Use energy estimation: Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning (CVPR 2017)
    • NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications (ECCV 2018)
  • Can pruned weights be recovered later?
    • Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
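The soft-pruning idea can be sketched in a few lines (a toy setup, not the paper's training loop): the lowest-norm filters are zeroed at the end of each epoch but still receive gradient updates, so a zeroed filter can regain a large norm and be kept later.

```python
def l2_norm(filt):
    return sum(w * w for w in filt) ** 0.5

def soft_prune(filters, prune_num):
    """Zero the prune_num filters with the smallest L2 norm, in place."""
    order = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]))
    for i in order[:prune_num]:
        filters[i] = [0.0] * len(filters[i])
    return filters

filters = [[0.9, -0.4], [0.05, 0.02], [0.6, 0.7]]
soft_prune(filters, prune_num=1)             # filter 1 has the smallest norm and is zeroed
filters[1] = [w + 0.9 for w in filters[1]]   # a later gradient update can revive it
soft_prune(filters, prune_num=1)             # filter 1 now survives; filter 2 is zeroed instead
```

Because the mask is recomputed from scratch each time, "pruning" a filter is never final until training ends, which is the recovery property the question asks about.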
  • Global pruning or layer-wise pruning?
  • What pruning ratio?
    • A single global ratio vs. per-layer ratios
    • Hand-picked ratios vs. learned ratios:
      • Hand-picked ratios are usually set according to each layer's sensitivity.
      • Learned ratios:
        • Learning Efficient Convolutional Networks through Network Slimming
        • Formulate pruning as a min-max optimization problem, then alternate between two modules that prune filters and adjust the pruning rate to control the accuracy loss: Play and Prune: Adaptive Filter Pruning for Deep Model Compression
        • Depending on the requirement (e.g., preserving accuracy vs. bounding computation), use reinforcement learning to learn the optimal sparsity ratio for each layer: ADC: Automated Deep Compression and Acceleration with Reinforcement Learning
        • Project the original parameters onto the feasible set defined by the constraints to automatically find the optimal per-layer sparsity ratio: "Learning-Compression" Algorithms for Neural Net Pruning
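The projection idea in the last item can be illustrated on made-up weights: projecting onto a global L0 constraint simply keeps the k largest-magnitude weights across the whole network, so each layer's sparsity ratio falls out automatically instead of being hand-picked. (This is only the projection step, not the full alternating algorithm.)

```python
def project_global_topk(layers, k):
    """Keep the k largest-magnitude weights across all layers, zero the rest."""
    flat = [(abs(w), li, wi) for li, layer in enumerate(layers)
            for wi, w in enumerate(layer)]
    flat.sort(reverse=True)
    keep = {(li, wi) for _, li, wi in flat[:k]}
    return [[w if (li, wi) in keep else 0.0 for wi, w in enumerate(layer)]
            for li, layer in enumerate(layers)]

layers = [[3.0, -0.1, 0.2], [-2.5, 1.5, 0.05]]   # two toy "layers"
pruned = project_global_topk(layers, k=3)
# Per-layer ratios differ automatically: layer 0 keeps 1 of 3 weights,
# layer 1 keeps 2 of 3, because layer 1 holds more of the large weights.
```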
  • What pruning criterion?
    • weight:
      • L1 norm: Pruning Filters for Efficient ConvNets (ICLR 2017)
      • Similarity: Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019)
      • BN scaling factors: Learning Efficient Convolutional Networks through Network Slimming
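The three weight-based criteria above can be compared on toy numbers (the filters and BN scales below are made up): L1 norm flags small-magnitude filters, the geometric-median idea flags filters that are close to the others (redundant even if large), and Network Slimming ranks by the BN scale gamma.

```python
filters = [[1.0, -2.0], [0.1, 0.1], [1.1, -1.9]]
gammas = [0.8, 0.02, 0.5]   # hypothetical BN scaling factors, one per filter

# L1-norm criterion: smallest absolute weight sum is pruned first.
l1_scores = [sum(abs(w) for w in f) for f in filters]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Geometric-median-style criterion: a filter with a small total distance
# to all other filters is near the "center" and therefore redundant.
gm_scores = [sum(dist(f, g) for g in filters) for f in filters]

prune_by_l1 = l1_scores.index(min(l1_scores))   # filter 1: tiny weights
prune_by_gm = gm_scores.index(min(gm_scores))   # filter 2: nearly duplicates filter 0
prune_by_bn = gammas.index(min(gammas))         # filter 1: tiny gamma
```

Note that the similarity criterion picks a different filter than the norm-based ones: filter 2 has a large norm but is almost a copy of filter 0.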
    • gradient / loss:
      • Optimal Brain Damage (NIPS 1990); computing the Hessian matrix or an approximation of it is time-consuming
      • Second-order derivatives: Optimal Brain Surgeon (NIPS 1993); computing the Hessian matrix or an approximation of it is time-consuming
      • Taylor expansion of the loss with the activation set to zero: Pruning Convolutional Neural Networks for Resource Efficient Inference (ICLR 2017)
      • The same expansion taken in the weights instead, with the term squared: Importance Estimation for Neural Network Pruning (CVPR 2019)
      • Approximate the Hessian matrix with Fisher information: Faster gaze prediction with dense networks and Fisher pruning
      • Absolute value of the derivative of the normalized objective with respect to each parameter: SNIP: Single-shot Network Pruning based on Connection Sensitivity (ICLR 2019)
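The first-order Taylor criterion and SNIP's connection sensitivity can be sketched with made-up activations and gradients: the Taylor score is |activation x dLoss/dActivation| (the estimated loss change if the activation were zeroed), and SNIP scores each connection by |weight x gradient|, normalized over all connections.

```python
acts  = [0.9, 0.0, 2.0]    # per-channel activations (toy values)
agrad = [0.1, 5.0, 0.01]   # dLoss/dActivation per channel (toy values)
# Taylor criterion: estimated |loss change| from zeroing each channel.
# A large activation with a near-zero gradient (channel 2) still scores low.
taylor = [abs(a * g) for a, g in zip(acts, agrad)]

w     = [0.5, -0.01, 2.0]  # connection weights (toy values)
wgrad = [0.2, 0.3, -0.1]   # dLoss/dWeight (toy values)
saliency = [abs(wi * gi) for wi, gi in zip(w, wgrad)]
total = sum(saliency)
snip = [s / total for s in saliency]   # normalized connection sensitivities
```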
    • activation:
      • Average Percentage of Zeros (APoZ) in the activations: Network trimming: A data-driven neuron pruning approach towards efficient deep architectures
      • Information entropy: An Entropy-based Pruning Method for CNN Compression
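APoZ is just the fraction of post-ReLU outputs that are zero for each channel, averaged over a batch; channels that are almost always zero are pruning candidates. A minimal computation on made-up activations:

```python
batch_acts = [   # rows: samples, columns: channels (after ReLU), toy values
    [0.0, 1.2, 0.0],
    [0.0, 0.3, 0.7],
    [0.0, 0.0, 0.4],
    [0.5, 0.8, 0.0],
]
n = len(batch_acts)
# APoZ per channel: fraction of samples on which the channel output is zero.
apoz = [sum(1 for row in batch_acts if row[c] == 0.0) / n for c in range(3)]
# Channel 0 is zero on 3 of 4 samples (APoZ 0.75) and would be pruned first.
```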
    • reconstruction error:
      • Minimize the feature reconstruction error to decide which channels to prune:
        • Greedy method: ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
        • Lasso regression: Channel Pruning for Accelerating Very Deep Neural Networks (ICCV 2017)
      • Minimize the reconstruction error of the network's penultimate layer, taking the accumulated back-propagation error into account: NISP: Pruning Networks using Neuron Importance Score Propagation (CVPR 2018)
      • Add an extra discrimination-aware loss at intermediate layers (to strengthen their discriminative power) while also keeping a feature-reconstruction loss; use the gradients of both losses with respect to the parameters to decide which channels to prune: Discrimination-aware Channel Pruning for Deep Neural Networks
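The greedy, ThiNet-style selection can be caricatured on one output value of a toy layer (the per-channel contributions are hypothetical, not the paper's code): the next layer's output is a sum of per-channel contributions, and we repeatedly drop the channel whose removal increases the squared reconstruction error the least.

```python
contrib = {   # hypothetical contribution of each input channel to one output
    "c0": 3.0,
    "c1": 0.05,
    "c2": -1.0,
}
target = sum(contrib.values())   # output of the unpruned layer

def greedy_prune(contrib, target, keep):
    """Greedily remove channels, minimizing squared reconstruction error."""
    kept = dict(contrib)
    while len(kept) > keep:
        # Channel whose removal leaves the reconstruction closest to target.
        worst = min(kept,
                    key=lambda c: (target - (sum(kept.values()) - kept[c])) ** 2)
        del kept[worst]
    return set(kept)

kept = greedy_prune(contrib, target, keep=2)   # "c1" contributes least and goes
```

In the real method the error is measured over many sampled output locations and the surviving weights are re-fit by least squares; this sketch only shows the greedy selection loop.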
  • Whether to train with sparsity regularization?
    • Structured sparsity via group Lasso:
      • Learning Structured Sparsity in Deep Neural Networks
      • Sparse Convolutional Neural Networks
    • Introduce a learnable mask and sparsify it with the APG algorithm: Data-Driven Sparse Structure Selection for Deep Neural Networks (ECCV 2018)
    • Solve with ADMM, a classic constrained-optimization algorithm: A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers (ECCV 2018)
    • L1 regularization: Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017)
    • ISTA: Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers (ICLR 2018)
    • L1 regularization + width multiplier + shrink-expand: MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks (CVPR 2018)
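The mechanism by which group-Lasso/ISTA-style training zeroes whole structures is block soft-thresholding, the proximal operator of the group penalty: each group's L2 norm is shrunk by the threshold, and any group whose norm falls below it collapses to exact zeros. A toy illustration (made-up weights and threshold):

```python
def group_soft_threshold(group, thresh):
    """Proximal operator of the group-Lasso penalty for one group."""
    norm = sum(w * w for w in group) ** 0.5
    if norm <= thresh:
        return [0.0] * len(group)        # the whole filter/channel is removed
    scale = (norm - thresh) / norm       # shrink the group's norm by thresh
    return [w * scale for w in group]

groups = [[3.0, 4.0], [0.06, 0.08]]      # L2 norms: 5.0 and 0.1
out = [group_soft_threshold(g, thresh=0.5) for g in groups]
# The weak group becomes exactly zero; the strong one is only shrunk,
# which is why these methods produce structured (whole-filter) sparsity.
```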
  • Solution methods
    • Greedy search
    • Mathematical programming
    • Bayesian methods
    • Gradient-based methods
    • Clustering-based methods
  • One-shot pruning vs. iterative pruning.
  • Pruning before, during, or after training.
  • How often should pruning be performed?
  • After pruning, should fine-tuning start from the pre-pruning weights?
  • In iterative layer-wise pruning, should the other layers stay fixed while fine-tuning after pruning one layer?
  • After all layers have been pruned layer by layer, should the whole network be fine-tuned globally?

References