Adaptive Optimizers with Sparse Group Lasso
November 27, 2024
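This document shows how to use the TFPlus optimizers that support Sparse Group Lasso. For the underlying theory, see Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction, ECML PKDD '21.
Lasso and Group Lasso can sparsify and compress a model by automatically selecting the important features. We propose a general feature-selection framework based on Sparse Group Lasso and have added Sparse Group Lasso support to most commonly used optimizers. These optimizers are widely used in Ant Group's online-learning and offline-training scenarios. The optimizers open-sourced so far are Group Adam and Sparse Group Ftrl (Group AdaGrad), which correspond to Adam and AdaGrad, respectively.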
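As a rough illustration of what the three regularization strengths control, the sketch below evaluates a sparse group lasso penalty over a list of weight groups (for example, one embedding vector per feature). The function name `sparse_group_lasso_penalty` is hypothetical, and the `sqrt(group size)` weighting on the group term is a common convention assumed here; the exact form used by TFPlus is the one given in the paper.

```python
import math


def sparse_group_lasso_penalty(groups, l1, l2, l21):
    """Illustrative penalty: l1 * ||w||_1 + l21 * sqrt(d) * ||w||_2 + l2 * ||w||_2^2 per group.

    `groups` is a list of weight vectors. The sqrt(d) factor (d = group size)
    is an assumption of this sketch, not a statement about TFPlus internals.
    """
    total = 0.0
    for w in groups:
        l1_norm = sum(abs(x) for x in w)   # element-wise sparsity term
        sq_norm = sum(x * x for x in w)    # squared L2 norm (ridge term)
        l2_norm = math.sqrt(sq_norm)       # group-level sparsity term
        total += l1 * l1_norm + l21 * math.sqrt(len(w)) * l2_norm + l2 * sq_norm
    return total


# One active group and one group that has already been zeroed out.
groups = [[3.0, -4.0], [0.0, 0.0]]
penalty = sparse_group_lasso_penalty(groups, l1=1.0, l2=0.5, l21=2.0)
```

A group whose weights are all zero contributes nothing to the penalty, which is why the group term favors dropping whole features rather than individual coordinates.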
Usage example
import tfplus
l1 = 1.0E-5
l2 = 1.0E-5
l21 = 1.0E-5
# Group Adam
opt = tfplus.train.GroupAdamOptimizer(
    learning_rate=1e-4,
    initial_accumulator_value=0.0,
    beta1=0.9,
    beta2=0.999,
    epsilon=1e-8,
    l1_regularization_strength=l1,
    l2_regularization_strength=l2,
    l21_regularization_strength=l21,
    use_locking=False,
    name="GroupAdam",
    accum_name=None,
    linear_name=None,
    version=4)
"""Construct a new Group Adam optimizer.

Args:
    learning_rate: A float value or a constant float `Tensor`.
    initial_accumulator_value: The starting value for accumulators.
        Only zero or positive values are allowed.
    beta1: A float value or a constant float tensor.
        The exponential decay rate for the 1st moment estimates.
    beta2: A float value or a constant float tensor.
        The exponential decay rate for the 2nd moment estimates.
    epsilon: A small constant for numerical stability. This epsilon is
        "epsilon hat" in the Kingma and Ba paper (in the formula just before
        Section 2.1), not the epsilon in Algorithm 1 of the paper.
    l1_regularization_strength: A float value, must be greater than or
        equal to zero.
    l2_regularization_strength: A float value, must be greater than or
        equal to zero.
    l21_regularization_strength: A float value, must be greater than or
        equal to zero.
    use_locking: If `True` use locks for update operations.
    name: Optional name prefix for the operations created when applying
        gradients. Defaults to "GroupAdam".
    accum_name: The suffix for the variable that keeps the gradient squared
        accumulator. If not present, defaults to name.
    linear_name: The suffix for the variable that keeps the linear gradient
        accumulator. If not present, defaults to name + "_1".
    version: The specific version of GroupAdam.

Raises:
    ValueError: If one of the arguments is invalid.
"""
# Sparse Group FTRL
opt = tfplus.train.SparseGroupFtrlOptimizer(
    learning_rate=1e-2,
    learning_rate_power=-0.5,
    initial_accumulator_value=0.1,
    l1_regularization_strength=l1,
    l2_regularization_strength=l2,
    l21_regularization_strength=l21,
    use_locking=False,
    name="SparseGroupFtrl",
    accum_name=None,
    linear_name=None,
    l2_shrinkage_regularization_strength=0.0)
"""SparseGroupFtrlOptimizer inherits from tf.compat.v1.train.FtrlOptimizer
and implements FTRL with sparse group lasso for KvVariable.
If var is a regular TensorFlow variable, the optimizer behaves the same as
tf.compat.v1.train.FtrlOptimizer.

Args:
    learning_rate: A float value or a constant float `Tensor`.
    learning_rate_power: Same as in tf.compat.v1.train.FtrlOptimizer.
    initial_accumulator_value: The starting value for accumulators.
        Only zero or positive values are allowed.
    l1_regularization_strength: A float value, must be greater than or
        equal to zero.
    l2_regularization_strength: A float value, must be greater than or
        equal to zero.
    l21_regularization_strength: A float value, must be greater than or
        equal to zero.
    use_locking: If `True` use locks for update operations.
    name: Optional name prefix for the operations created when applying
        gradients. Defaults to "SparseGroupFtrl".
    accum_name: The suffix for the variable that keeps the gradient squared
        accumulator. If not present, defaults to name.
    linear_name: The suffix for the variable that keeps the linear gradient
        accumulator. If not present, defaults to name + "_1".
    l2_shrinkage_regularization_strength: Same as in
        tf.compat.v1.train.FtrlOptimizer.

Raises:
    ValueError: If one of the arguments is invalid.
"""