Welcome to the 杨旭东 wiki!
Machine Learning
Distance
convert the cosine similarity into an angular distance
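For reference, the usual conversion is angular distance = arccos(cosine similarity) / π, which lies in [0, 1] and, unlike plain cosine distance, satisfies the triangle inequality. A small numpy sketch (the function name is mine):
import numpy as np

def angular_distance(u, v):
    # cosine similarity, clipped to [-1, 1] to guard against floating-point drift
    cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    cos_sim = np.clip(cos_sim, -1.0, 1.0)
    # arccos maps similarity in [-1, 1] to an angle in [0, pi]; divide by pi to get [0, 1]
    return np.arccos(cos_sim) / np.pi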
Cleaning Dirty Data
An Empirical Study of Example Forgetting during Deep Neural Network Learning - github
Confidence Learning
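For the confidence-learning reference above, the core idea can be sketched without any library: estimate a per-class confidence threshold from out-of-sample predicted probabilities, then flag examples whose given label disagrees with the class the model confidently predicts. A minimal numpy illustration (not the actual cleanlab implementation; it assumes every class appears at least once in labels):
import numpy as np

def suspect_label_errors(pred_probs, labels):
    """pred_probs: (n, k) out-of-sample predicted probabilities; labels: (n,) given labels."""
    n, k = pred_probs.shape
    # per-class threshold: average predicted probability of class j over examples labeled j
    thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(k)])
    suspects = []
    for i in range(n):
        # classes the model predicts "confidently" for example i
        confident = [j for j in range(k) if pred_probs[i, j] >= thresholds[j]]
        if confident:
            best = max(confident, key=lambda j: pred_probs[i, j])
            if best != labels[i]:
                suspects.append(i)  # confidently predicted class disagrees with the given label
    return suspects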
TensorFlow
Freezing part of a network's weights
- When defining the variable, add trainable=False
- Don't call optimizer.minimize() directly; split the steps apart, as follows:
# tvars = tf.trainable_variables()  # replace this line with the following lines to control which variables get updated
tvars = []
for collection in ["adapters", "layer_norm", "head"]:
    tvars += tf.get_collection(collection)
grads = tf.gradients(loss, tvars)
# This is how the model was pre-trained.
(grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)
train_op = optimizer.apply_gradients(  # only the parameters in the tvars list are updated
    zip(grads, tvars), global_step=global_step)
The variables need to be added to the designated collections when they are created, as follows:
w1 = tf.get_variable(
    "weights1", [in_size, hidden_size],
    initializer=tf.truncated_normal_initializer(stddev=init_scale),
    collections=["adapters", tf.GraphKeys.GLOBAL_VARIABLES])
b1 = tf.get_variable(
    "biases1", [1, hidden_size],
    initializer=tf.zeros_initializer(),
    collections=["adapters", tf.GraphKeys.GLOBAL_VARIABLES])
Alternatively, you can use tvars = tf.trainable_variables() to get all trainable variables, then iterate over them and pick out the target variables by name:
for var in tvars:
    if var.name in target_variable_names:
        # do something here
        pass
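Combining the name-based filtering with the apply_gradients pattern from above, a minimal sketch (target_variable_names is assumed to be a set of the variable names that should keep training):
tvars_to_train = [var for var in tf.trainable_variables()
                  if var.name in target_variable_names]
grads = tf.gradients(loss, tvars_to_train)
# only the variables in tvars_to_train receive updates; everything else stays frozen
train_op = optimizer.apply_gradients(zip(grads, tvars_to_train), global_step=global_step)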
Siamese Networks
output1 = siamese_nn(address1, num_features)
# Declare that we will use the same variables on the second string
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
    output2 = siamese_nn(address2, num_features)
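Here siamese_nn stands for whatever twin sub-network you use; the only requirement is that both calls create their variables under the same names, so that reuse=True makes the second call share the first call's weights. A purely illustrative sketch (layer sizes and names are made up):
def siamese_nn(inputs, num_features):
    # identical variable names in both calls, so the weights are shared under reuse=True
    with tf.variable_scope("siamese"):
        hidden = tf.layers.dense(inputs, num_features, activation=tf.nn.relu, name="fc1")
        return tf.layers.dense(hidden, 1, name="fc2")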
Counting a model's parameters and FLOPs
import os
import tensorflow as tf
from functools import reduce
from operator import mul

tf.app.flags.DEFINE_string("model_dir", "", "saved model path")
FLAGS = tf.app.flags.FLAGS

def get_num_params():
    num_params = 0
    for variable in tf.trainable_variables():
        shape = variable.get_shape()
        num_params += reduce(mul, [dim.value for dim in shape], 1)
    return num_params

def count_flops(graph):
    flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOPs: {}'.format(flops.total_float_ops))

with tf.Session() as sess:
    meta_graph_def = tf.saved_model.loader.load(sess, ['serve'], FLAGS.model_dir)
    print("Param Num:", get_num_params())
    count_flops(sess.graph)
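As a side note, get_num_params can be written more compactly with TensorShape.num_elements(), which gives the same count (it returns None only when a dimension is unknown, which doesn't happen for trainable variables):
def get_num_params():
    # total number of scalars across all trainable variables
    return sum(v.get_shape().num_elements() for v in tf.trainable_variables())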
Writing TFRecord files
I ran into an error when reading a TFRecord file:
DataLossError (see above for traceback): corrupted record at 0
The root cause was using the wrong API when writing the TFRecord file: it should be writer = tf.python_io.TFRecordWriter(output_file), but I had used writer = tf.gfile.GFile(output_file, 'w'). Every record in a TFRecord file carries CRC checksum information, and writing with GFile does not produce these checksums, so reading the file fails.
writer = tf.python_io.TFRecordWriter(output_file)
count = 0
for record in result:
    item_id = record["item_id"]
    label = record["label"]
    weight = record["weight"]
    input_ids = record["input_ids"]
    input_mask = record["input_mask"]
    input_types = record["input_types"]
    logits = record["logits"]
    probs = record["probs_float"]

    features = collections.OrderedDict()
    features["title_input_ids"] = create_int_feature(input_ids)
    features["title_input_mask"] = create_int_feature(input_mask)
    features["title_input_types"] = create_int_feature(input_types)
    features["item_id"] = create_int_feature([item_id])
    # features["embedding"] = create_float_feature(embedding)
    features["logits"] = create_float_feature(logits)
    features["probs"] = create_float_feature(probs)
    features["label"] = create_int_feature([label])
    features["weight"] = create_float_feature([weight])

    tf_example = tf.train.Example(features=tf.train.Features(feature=features))
    writer.write(tf_example.SerializeToString())
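The snippet relies on import collections for the OrderedDict and uses create_int_feature / create_float_feature without showing them; these are the usual helpers from the BERT reference code, reproduced here for completeness. Also remember to call writer.close() after the loop so the file is flushed.
def create_int_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

def create_float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))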
Using two learning rates for different groups of parameters (e.g. a BiLSTM stacked on top of BERT) can be implemented like this:
with tf.variable_scope('opt'):
    params_of_bert = []
    params_of_other = []
    for var in tf.trainable_variables():
        vname = var.name
        if vname.startswith("bert"):
            params_of_bert.append(var)
        else:
            params_of_other.append(var)
    opt1 = tf.train.AdamOptimizer(1e-4)
    opt2 = tf.train.AdamOptimizer(1e-3)
    gradients_bert = tf.gradients(loss, params_of_bert)
    gradients_other = tf.gradients(loss, params_of_other)
    gradients_bert_clipped, norm_bert = tf.clip_by_global_norm(gradients_bert, 5.0)
    gradients_other_clipped, norm_other = tf.clip_by_global_norm(gradients_other, 5.0)
    train_op_bert = opt1.apply_gradients(zip(gradients_bert_clipped, params_of_bert))
    train_op_other = opt2.apply_gradients(zip(gradients_other_clipped, params_of_other))
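The two apply_gradients calls return separate ops and neither was given a global_step, so it is common to group them into a single training op and advance the step counter once. A minimal sketch following on from the snippet above:
global_step = tf.train.get_or_create_global_step()
with tf.control_dependencies([train_op_bert, train_op_other]):
    # run both optimizer updates first, then increment the global step once
    train_op = tf.assign_add(global_step, 1)
If you don't track a global step here, train_op = tf.group(train_op_bert, train_op_other) works just as well.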
Common Issues
- Failed to get convolution algorithm. This is probably because cuDNN failed to initialize
Add config.gpu_options.allow_growth=True and config.gpu_options.force_gpu_compatible=False to your code:
session_config = tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=FLAGS.log_device_placement,
    gpu_options=tf.GPUOptions(allow_growth=True, force_gpu_compatible=False)
)
config = tf.estimator.RunConfig(
    tf_random_seed=FLAGS.tf_random_seed,
    session_config=session_config,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    keep_checkpoint_max=FLAGS.keep_checkpoint_max,
    log_step_count_steps=FLAGS.log_step_count_steps,
    save_summary_steps=FLAGS.log_step_count_steps,
    # train_distribute=distribution
)
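If you are not going through the Estimator API, the same options can be passed directly when constructing the session; a minimal sketch:
session_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(allow_growth=True, force_gpu_compatible=False))
with tf.Session(config=session_config) as sess:
    # GPU memory is now allocated on demand instead of being grabbed up front
    sess.run(tf.global_variables_initializer())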
- Losses collection is not thread local, so it can't be used inside the model_fn call when using a DistributeStrategy
When calling tf.losses.add_loss inside model_fn with the Estimator API, the loss is added to the tf.GraphKeys.LOSSES collection, and tf.losses.get_total_loss aggregates all the losses from that collection.
Unfortunately, when using tf.contrib.distribute.MirroredStrategy as the distribution strategy, the collection is updated from all concurrent model_fn calls. This leads to tower losses being aggregated into the total loss of the other towers as well.
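A practical workaround is to keep the losses in local tensors inside model_fn and sum them explicitly, instead of going through tf.losses.add_loss / tf.losses.get_total_loss and the shared LOSSES collection. A rough sketch (build_network and the auxiliary term are placeholders, not code from this wiki):
def model_fn(features, labels, mode, params):
    logits = build_network(features, params)  # hypothetical model-building helper
    ce_loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    aux_loss = tf.reduce_mean(tf.square(logits))  # stand-in for any extra loss term
    # each tower sums only its own tensors; no global collection involved
    total_loss = ce_loss + 0.01 * aux_loss
    # ... build the train_op and tf.estimator.EstimatorSpec from total_loss as usual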
- Using tf.norm() when computing the loss function may trigger a NaN exception
Here's a very simple reproduction:
import tensorflow as tf

with tf.Graph().as_default():
    y = tf.zeros(shape=[1], dtype=tf.float32)
    dist = tf.norm(y, axis=0)
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())
Prints: [ nan]
The issue is that tf.norm computes sum(x ** 2) ** 0.5. The gradient is x / sum(x ** 2) ** 0.5 (see e.g. https://math.stackexchange.com/a/84333), so when sum(x ** 2) is zero we're dividing by zero.
There's not much to be done in terms of a special case: the gradient as x approaches all zeros depends on which direction it's approaching from. For example if x is a single-element vector, the limit as x approaches 0 could either be 1 or -1 depending on which side of zero it's approaching from.
So in terms of solutions, you could just add a small epsilon:
import tensorflow as tf

def safe_norm(x, epsilon=1e-12, axis=None):
    return tf.sqrt(tf.reduce_sum(x ** 2, axis=axis) + epsilon)

with tf.Graph().as_default():
    y = tf.constant([0.])
    dist = safe_norm(y, axis=0)
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())
Prints: [ 0.]
Note that this is not actually the Euclidean norm. It's a good approximation as long as the input is much larger than epsilon.