
Welcome to the 杨旭东 wiki!

Natural Language Processing

Machine Learning

Distance

convert the cosine similarity into an angular distance

f1

Dirty Data Cleaning

An Empirical Study of Example Forgetting during Deep Neural Network Learning - github

Confidence Learning

TensorFlow

Freeze part of a network's weights

  • Pass trainable=False when defining the variable (a short sketch of this appears at the end of this section)
  • Do not call optimizer.minimize() directly; split it into explicit steps, as follows:
  # tvars = tf.trainable_variables()  # replace this line with the lines below to control which variables get updated
  tvars = []
  for collection in ["adapters", "layer_norm", "head"]:
    tvars += tf.get_collection(collection)

  grads = tf.gradients(loss, tvars)

  # This is how the model was pre-trained.
  (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)

  train_op = optimizer.apply_gradients(  # only the parameters in the tvars list get updated
      zip(grads, tvars), global_step=global_step)

The variables need to be added to the specified collections when they are created, as follows:

    w1 = tf.get_variable(
        "weights1", [in_size, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=init_scale),
        collections=["adapters", tf.GraphKeys.GLOBAL_VARIABLES])
    b1 = tf.get_variable(
        "biases1", [1, hidden_size],
        initializer=tf.zeros_initializer(),
        collections=["adapters", tf.GraphKeys.GLOBAL_VARIABLES])

Alternatively, you can call tvars = tf.trainable_variables() to get all the trainable variables, then iterate over them and pick out the target variables by name:

  target_vars = []
  for var in tvars:
    if var.name in target_variable_names:
      target_vars.append(var)  # collect the matched variables and pass only these to the optimizer
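
For the first approach (trainable=False), here is a minimal sketch; the variable names and shapes below are illustrative and not from the original code:

import tensorflow as tf

# A variable created with trainable=False is excluded from tf.trainable_variables(),
# so optimizer.minimize() / apply_gradients over the trainable variables will not touch it.
frozen_w = tf.get_variable("frozen_weights", [128, 64], trainable=False)
head_w = tf.get_variable("head_weights", [64, 2])  # trainable by default

print([v.name for v in tf.trainable_variables()])  # only 'head_weights:0' is listed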

Siamese Network

output1 = siamese_nn(address1, num_features)
# Declare that we will use the same variables on the second string
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
    output2 = siamese_nn(address2, num_features)
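
siamese_nn itself is not shown on this page. A hypothetical definition, just to illustrate why the reuse=True scope makes the two branches share weights (both calls request variables with the same names via tf.get_variable):

def siamese_nn(input_vector, num_features):
    # Variables created via tf.get_variable are looked up by name, so when the
    # enclosing scope is re-entered with reuse=True the second call gets the
    # same W and b as the first call.
    w = tf.get_variable("W", [num_features, 1],
                        initializer=tf.truncated_normal_initializer(stddev=0.1))
    b = tf.get_variable("b", [1], initializer=tf.zeros_initializer())
    return tf.squeeze(tf.matmul(input_vector, w) + b)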

Count a model's number of parameters and FLOPs

import os
import tensorflow as tf
from functools import reduce
from operator import mul

tf.app.flags.DEFINE_string("model_dir", "", "saved model path")
FLAGS = tf.app.flags.FLAGS


def get_num_params():
    num_params = 0
    for variable in tf.trainable_variables():
        shape = variable.get_shape()
        num_params += reduce(mul, [dim.value for dim in shape], 1)
    return num_params


def count_flops(graph):
  flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
  print('FLOPs: {}'.format(flops.total_float_ops))


with tf.Session() as sess:
  meta_graph_def = tf.saved_model.loader.load(sess, ['serve'], FLAGS.model_dir)
  print("Param Num:", get_num_params())
  count_flops(sess.graph)

Writing TFRecord files

I ran into an error when reading a tfrecord file:

DataLossError (see above for traceback): corrupted record at 0

The root cause turned out to be using the wrong API when writing the tfrecord file: it should be writer = tf.python_io.TFRecordWriter(output_file), but I had used writer = tf.gfile.GFile(output_file, 'w'). Every record in a tfrecord file carries CRC checksum information, and writing with gfile does not produce those checksums, so reading fails.

    import collections
    import tensorflow as tf

    # Helper functions for building feature protos (as in BERT's run_classifier.py).
    def create_int_feature(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

    def create_float_feature(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))

    writer = tf.python_io.TFRecordWriter(output_file)
    count = 0
    for record in result:
        item_id = record["item_id"]
        label = record["label"]
        weight = record["weight"]
        input_ids = record["input_ids"]
        input_mask = record["input_mask"]
        input_types = record["input_types"]
        logits = record["logits"]
        probs = record["probs_float"]
        features = collections.OrderedDict()
        features["title_input_ids"] = create_int_feature(input_ids)
        features["title_input_mask"] = create_int_feature(input_mask)
        features["title_input_types"] = create_int_feature(input_types)
        features["item_id"] = create_int_feature([item_id])
        # features["embedding"] = create_float_feature(embedding)
        features["logits"] = create_float_feature(logits)
        features["probs"] = create_float_feature(probs)
        features["label"] = create_int_feature([label])
        features["weight"] = create_float_feature([weight])
        tf_example = tf.train.Example(features=tf.train.Features(feature=features))
        writer.write(tf_example.SerializeToString())
        count += 1
    writer.close()
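
A quick way to confirm the fix is to read the file back. A minimal sketch (reusing output_file from above):

import tensorflow as tf

# Iterate over the raw records; with TFRecordWriter the CRC checksums are present,
# so this no longer raises DataLossError.
for i, serialized in enumerate(tf.python_io.tf_record_iterator(output_file)):
    example = tf.train.Example.FromString(serialized)
    if i == 0:
        print(example.features.feature["label"])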

When optimizing, sometimes two learning rates are needed (for example, a BiLSTM stacked on top of BERT). The implementation looks like this:

with tf.variable_scope('opt'):
    params_of_bert = []
    params_of_other = []
    for var in tf.trainable_variables():
        vname = var.name
        if vname.startswith("bert"):
            params_of_bert.append(var)
        else:
            params_of_other.append(var)
    opt1 = tf.train.AdamOptimizer(1e-4)
    opt2 = tf.train.AdamOptimizer(1e-3)
    gradients_bert = tf.gradients(loss, params_of_bert)
    gradients_other = tf.gradients(loss, params_of_other)
    gradients_bert_clipped, norm_bert = tf.clip_by_global_norm(gradients_bert, 5.0)
    gradients_other_clipped, norm_other = tf.clip_by_global_norm(gradients_other, 5.0)
    train_op_bert = opt1.apply_gradients(zip(gradients_bert_clipped, params_of_bert))
    train_op_other = opt2.apply_gradients(zip(gradients_other_clipped, params_of_other))
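
The snippet above builds two separate train ops. To apply both updates in a single training step they can be grouped; this line is a small addition that is not in the original snippet:

train_op = tf.group(train_op_bert, train_op_other)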

Common Problems

  1. Failed to get convolution algorithm. This is probably because cuDNN failed to initialize. Add config.gpu_options.allow_growth=True and config.gpu_options.force_gpu_compatible=False to the code:
    session_config = tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=FLAGS.log_device_placement,
        gpu_options=tf.GPUOptions(allow_growth=True, force_gpu_compatible=False)
    )
    config = tf.estimator.RunConfig(
        tf_random_seed=FLAGS.tf_random_seed,
        session_config=session_config,
        save_checkpoints_steps=FLAGS.save_checkpoints_steps,
        keep_checkpoint_max=FLAGS.keep_checkpoint_max,
        log_step_count_steps=FLAGS.log_step_count_steps,
        save_summary_steps=FLAGS.log_step_count_steps,
        # train_distribute=distribution
    )
  2. Losses collection is not thread local, so it can't be used inside a model_fn call when using DistributeStrategy

When tf.losses.add_loss is called inside model_fn in the Estimator API, the loss is added to the tf.GraphKeys.LOSSES collection. tf.losses.get_total_loss aggregates all the losses in the tf.GraphKeys.LOSSES collection.

Unfortunately, when tf.contrib.distribute.MirroredStrategy is used as the distribute strategy, the collection is updated by all concurrent model_fn calls. This leads to tower losses also being aggregated into the total loss of the other towers.

See the github issue for details.
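
A minimal sketch of the pattern that runs into this (a hypothetical Estimator model_fn, training mode only; names are illustrative). Every tf.losses.* call below drops its loss into the shared tf.GraphKeys.LOSSES collection, which is exactly what gets mixed between towers under MirroredStrategy:

def model_fn(features, labels, mode, params):
    logits = tf.layers.dense(features["x"], 2)
    # Added to tf.GraphKeys.LOSSES automatically.
    tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    # Explicitly added to the same collection.
    tf.losses.add_loss(0.01 * tf.nn.l2_loss(logits))
    # Sums everything currently in the LOSSES collection; under MirroredStrategy
    # this also includes losses added by the other towers' model_fn calls.
    total_loss = tf.losses.get_total_loss()
    train_op = tf.train.AdamOptimizer(1e-3).minimize(
        total_loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=total_loss, train_op=train_op)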

  3. Using tf.norm() when computing the loss function may produce NaN gradients

Here's a very simple reproduction:

import tensorflow as tf

with tf.Graph().as_default():
  y = tf.zeros(shape=[1], dtype=tf.float32)
  dist = tf.norm(y,axis=0)
  (grad,) = tf.gradients(dist, [y])
  with tf.Session():
    print(grad.eval())

Prints: [ nan]

The issue is that tf.norm computes sum(x**2) ** 0.5. The gradient is x / sum(x**2) ** 0.5 (see e.g. https://math.stackexchange.com/a/84333), so when sum(x**2) is zero we're dividing by zero.

There's not much to be done in terms of a special case: the gradient as x approaches all zeros depends on which direction it's approaching from. For example if x is a single-element vector, the limit as x approaches 0 could either be 1 or -1 depending on which side of zero it's approaching from.

So in terms of solutions, you could just add a small epsilon:

import tensorflow as tf

def safe_norm(x, epsilon=1e-12, axis=None):
  return tf.sqrt(tf.reduce_sum(x ** 2, axis=axis) + epsilon)

with tf.Graph().as_default():
  y = tf.constant([0.])
  dist = safe_norm(y,axis=0)
  (grad,) = tf.gradients(dist, [y])
  with tf.Session():
    print(grad.eval())

Prints: [ 0.]

Note that this is not actually the Euclidean norm. It's a good approximation as long as the input is much larger than epsilon.