
Welcome to the 杨旭东 wiki!

Natural Language Processing

Machine Learning

Distance

convert the cosine similarity into an angular distance

f1

Dirty Data Cleaning

An Empirical Study of Example Forgetting during Deep Neural Network Learning - github

Confidence Learning

TensorFlow

Freeze part of a network's weights

  • Pass trainable=False when defining the variable (a short sketch of this appears at the end of this section)
  • Do not call optimizer.minimize() directly; split it into explicit steps, as follows:
  # tvars = tf.trainable_variables()  # replace this line with the lines below to control which variables get updated
  tvars = []
  for collection in ["adapters", "layer_norm", "head"]:
    tvars += tf.get_collection(collection)

  grads = tf.gradients(loss, tvars)

  # This is how the model was pre-trained.
  (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)

  train_op = optimizer.apply_gradients(  # only the parameters in the tvars list get updated
      zip(grads, tvars), global_step=global_step)

The variables need to be added to the specified collections when they are created, as follows:

    w1 = tf.get_variable(
        "weights1", [in_size, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=init_scale),
        collections=["adapters", tf.GraphKeys.GLOBAL_VARIABLES])
    b1 = tf.get_variable(
        "biases1", [1, hidden_size],
        initializer=tf.zeros_initializer(),
        collections=["adapters", tf.GraphKeys.GLOBAL_VARIABLES])

Alternatively, you can call tvars = tf.trainable_variables() to get all the trainable variables, then iterate over them and pick out the target variables by name:

  target_vars = []
  for var in tvars:
    if var.name in target_variable_names:
      target_vars.append(var)  # collect the matched variables and pass only these to the optimizer
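
For the first approach (trainable=False), here is a minimal sketch; the variable names and shapes below are illustrative and not from the original code:

import tensorflow as tf

# A variable created with trainable=False is excluded from tf.trainable_variables(),
# so optimizer.minimize() / apply_gradients over the trainable variables will not touch it.
frozen_w = tf.get_variable("frozen_weights", [128, 64], trainable=False)
head_w = tf.get_variable("head_weights", [64, 2])  # trainable by default

print([v.name for v in tf.trainable_variables()])  # only 'head_weights:0' is listed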

Siamese Network

output1 = siamese_nn(address1, num_features)
# Declare that we will use the same variables on the second string
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
    output2 = siamese_nn(address2, num_features)
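
siamese_nn itself is not shown on this page. A hypothetical definition, just to illustrate why the reuse=True scope makes the two branches share weights (both calls request variables with the same names via tf.get_variable):

def siamese_nn(input_vector, num_features):
    # Variables created via tf.get_variable are looked up by name, so when the
    # enclosing scope is re-entered with reuse=True the second call gets the
    # same W and b as the first call.
    w = tf.get_variable("W", [num_features, 1],
                        initializer=tf.truncated_normal_initializer(stddev=0.1))
    b = tf.get_variable("b", [1], initializer=tf.zeros_initializer())
    return tf.squeeze(tf.matmul(input_vector, w) + b)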

Count a model's number of parameters and FLOPs

import os
import tensorflow as tf
from functools import reduce
from operator import mul

tf.app.flags.DEFINE_string("model_dir", "", "saved model path")
FLAGS = tf.app.flags.FLAGS


def get_num_params():
    num_params = 0
    for variable in tf.trainable_variables():
        shape = variable.get_shape()
        num_params += reduce(mul, [dim.value for dim in shape], 1)
    return num_params


def count_flops(graph):
  flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
  print('FLOPs: {}'.format(flops.total_float_ops))


with tf.Session() as sess:
  meta_graph_def = tf.saved_model.loader.load(sess, ['serve'], FLAGS.model_dir)
  print("Param Num:", get_num_params())
  count_flops(sess.graph)

Writing TFRecord files

I ran into an error when reading a tfrecord file:

DataLossError (see above for traceback): corrupted record at 0

The root cause turned out to be using the wrong API when writing the tfrecord file: it should be writer = tf.python_io.TFRecordWriter(output_file), but I had used writer = tf.gfile.GFile(output_file, 'w'). Every record in a tfrecord file carries CRC checksum information, and writing with gfile does not produce those checksums, so reading fails.

    import collections
    import tensorflow as tf

    # Helper functions for building feature protos (as in BERT's run_classifier.py).
    def create_int_feature(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

    def create_float_feature(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))

    writer = tf.python_io.TFRecordWriter(output_file)
    count = 0
    for record in result:
        item_id = record["item_id"]
        label = record["label"]
        weight = record["weight"]
        input_ids = record["input_ids"]
        input_mask = record["input_mask"]
        input_types = record["input_types"]
        logits = record["logits"]
        probs = record["probs_float"]
        features = collections.OrderedDict()
        features["title_input_ids"] = create_int_feature(input_ids)
        features["title_input_mask"] = create_int_feature(input_mask)
        features["title_input_types"] = create_int_feature(input_types)
        features["item_id"] = create_int_feature([item_id])
        # features["embedding"] = create_float_feature(embedding)
        features["logits"] = create_float_feature(logits)
        features["probs"] = create_float_feature(probs)
        features["label"] = create_int_feature([label])
        features["weight"] = create_float_feature([weight])
        tf_example = tf.train.Example(features=tf.train.Features(feature=features))
        writer.write(tf_example.SerializeToString())
        count += 1
    writer.close()
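
A quick way to confirm the fix is to read the file back. A minimal sketch (reusing output_file from above):

import tensorflow as tf

# Iterate over the raw records; with TFRecordWriter the CRC checksums are present,
# so this no longer raises DataLossError.
for i, serialized in enumerate(tf.python_io.tf_record_iterator(output_file)):
    example = tf.train.Example.FromString(serialized)
    if i == 0:
        print(example.features.feature["label"])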

When optimizing, sometimes two learning rates are needed (for example, a BiLSTM stacked on top of BERT). The implementation looks like this:

with tf.variable_scope('opt'):
    params_of_bert = []
    params_of_other = []
    for var in tf.trainable_variables():
        vname = var.name
        if vname.startswith("bert"):
            params_of_bert.append(var)
        else:
            params_of_other.append(var)
    opt1 = tf.train.AdamOptimizer(1e-4)
    opt2 = tf.train.AdamOptimizer(1e-3)
    gradients_bert = tf.gradients(loss, params_of_bert)
    gradients_other = tf.gradients(loss, params_of_other)
    gradients_bert_clipped, norm_bert = tf.clip_by_global_norm(gradients_bert, 5.0)
    gradients_other_clipped, norm_other = tf.clip_by_global_norm(gradients_other, 5.0)
    train_op_bert = opt1.apply_gradients(zip(gradients_bert_clipped, params_of_bert))
    train_op_other = opt2.apply_gradients(zip(gradients_other_clipped, params_of_other))
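
The snippet above builds two separate train ops. To apply both updates in a single training step they can be grouped; this line is a small addition that is not in the original snippet:

train_op = tf.group(train_op_bert, train_op_other)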

Common Problems

  1. Failed to get convolution algorithm. This is probably because cuDNN failed to initialize. Add config.gpu_options.allow_growth=True and config.gpu_options.force_gpu_compatible=False to the code:
    session_config = tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=FLAGS.log_device_placement,
        gpu_options=tf.GPUOptions(allow_growth=True, force_gpu_compatible=False)
    )
    config = tf.estimator.RunConfig(
        tf_random_seed=FLAGS.tf_random_seed,
        session_config=session_config,
        save_checkpoints_steps=FLAGS.save_checkpoints_steps,
        keep_checkpoint_max=FLAGS.keep_checkpoint_max,
        log_step_count_steps=FLAGS.log_step_count_steps,
        save_summary_steps=FLAGS.log_step_count_steps,
        # train_distribute=distribution
    )
  2. Losses collection is not thread local, so it can't be used inside a model_fn call when using DistributeStrategy

When tf.losses.add_loss is called inside model_fn in the Estimator API, the loss is added to the tf.GraphKeys.LOSSES collection. tf.losses.get_total_loss aggregates all the losses in the tf.GraphKeys.LOSSES collection.

Unfortunately, when tf.contrib.distribute.MirroredStrategy is used as the distribute strategy, the collection is updated by all concurrent model_fn calls. This leads to tower losses also being aggregated into the total loss of the other towers.

See the github issue for details.
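
A minimal sketch of the pattern that runs into this (a hypothetical Estimator model_fn, training mode only; names are illustrative). Every tf.losses.* call below drops its loss into the shared tf.GraphKeys.LOSSES collection, which is exactly what gets mixed between towers under MirroredStrategy:

def model_fn(features, labels, mode, params):
    logits = tf.layers.dense(features["x"], 2)
    # Added to tf.GraphKeys.LOSSES automatically.
    tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    # Explicitly added to the same collection.
    tf.losses.add_loss(0.01 * tf.nn.l2_loss(logits))
    # Sums everything currently in the LOSSES collection; under MirroredStrategy
    # this also includes losses added by the other towers' model_fn calls.
    total_loss = tf.losses.get_total_loss()
    train_op = tf.train.AdamOptimizer(1e-3).minimize(
        total_loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=total_loss, train_op=train_op)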

  3. Using tf.norm() when computing the loss function may produce NaN gradients

Here's a very simple reproduction:

import tensorflow as tf

with tf.Graph().as_default():
  y = tf.zeros(shape=[1], dtype=tf.float32)
  dist = tf.norm(y,axis=0)
  (grad,) = tf.gradients(dist, [y])
  with tf.Session():
    print(grad.eval())

Prints: [ nan]

The issue is that tf.norm computes sum(x**2) ** 0.5. The gradient is x / sum(x**2) ** 0.5 (see e.g. https://math.stackexchange.com/a/84333), so when sum(x**2) is zero we're dividing by zero.

There's not much to be done in terms of a special case: the gradient as x approaches all zeros depends on which direction it's approaching from. For example if x is a single-element vector, the limit as x approaches 0 could either be 1 or -1 depending on which side of zero it's approaching from.

So in terms of solutions, you could just add a small epsilon:

import tensorflow as tf

def safe_norm(x, epsilon=1e-12, axis=None):
  return tf.sqrt(tf.reduce_sum(x ** 2, axis=axis) + epsilon)

with tf.Graph().as_default():
  y = tf.constant([0.])
  dist = safe_norm(y,axis=0)
  (grad,) = tf.gradients(dist, [y])
  with tf.Session():
    print(grad.eval())

Prints: [ 0.]

Note that this is not actually the Euclidean norm. It's a good approximation as long as the input is much larger than epsilon.