Porting an MNIST CNN to FPGA with LeFlow

2020-05-26

Overview

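This article follows the previous one, "How to Convert TensorFlow Code to FPGA", which introduced LeFlow. Here we use LeFlow to port a custom CNN to an FPGA, using the simplest possible application: MNIST handwritten digit recognition. The article walks through the whole flow as it would run during experimentation and model release: algorithm development, prediction, model conversion, FPGA synthesis, simulation, and programming the device.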

Algorithm

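The algorithm is a fairly simple CNN modeled on a fully convolutional network. The first fully connected layer is implemented as a 7×7 convolution with VALID padding instead of Flatten, because Flatten caused problems in earlier tests: the weights were too large and LeFlow failed to compile. Here is the training code: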

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST data set
mnist_data = input_data.read_data_sets("MNIST_data/", one_hot=True)

# The CNN graph
x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
x_image = tf.reshape(x, [-1, 28, 28, 1])

fc_size = 20
w_c1 = tf.Variable(tf.truncated_normal([3, 3, 1, 3], stddev=0.1), name="w_c1")
b_c1 = tf.Variable(tf.constant(0.1, shape=[3]), name="b_c1")
w_c2 = tf.Variable(tf.truncated_normal([3, 3, 3, 8], stddev=0.1), name="w_c2")
b_c2 = tf.Variable(tf.constant(0.1, shape=[8]), name="b_c2")
# First "fully connected" layer, implemented as a 7x7 VALID convolution instead of Flatten
w_fc1 = tf.Variable(tf.truncated_normal([7, 7, 8, fc_size], stddev=0.1), name="w_fc1")
b_fc1 = tf.Variable(tf.constant(0.1, shape=[fc_size]), name="b_fc1")
w_fc2 = tf.Variable(tf.truncated_normal([fc_size, 10], stddev=0.1), name="w_fc2")
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10]), name="b_fc2")

# conv 3x3 -> ReLU -> 2x2 max pool: 28x28x1 -> 14x14x3
h_p1 = tf.nn.max_pool(tf.nn.relu(tf.add(tf.nn.conv2d(x_image, w_c1, strides=[1, 1, 1, 1], padding='SAME'), b_c1)), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# conv 3x3 -> ReLU -> 2x2 max pool: 14x14x3 -> 7x7x8
h_p2 = tf.nn.max_pool(tf.nn.relu(tf.add(tf.nn.conv2d(h_p1, w_c2, strides=[1, 1, 1, 1], padding='SAME'), b_c2)), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

keep_prob = tf.placeholder("float", name="keep_prob")
# 7x7 VALID conv: 7x7x8 -> 1x1x20, reshaped to [batch, 20]; dropout is used only in training
h_fc1 = tf.nn.dropout(tf.reshape(tf.nn.relu(tf.add(tf.nn.conv2d(h_p2, w_fc1, strides=[1, 1, 1, 1], padding='VALID'), b_fc1)), [-1, fc_size]), keep_prob)
y = tf.nn.softmax(tf.add(tf.matmul(h_fc1, w_fc2), b_fc2), name="output")

# The placeholder for the correct result
real_y = tf.placeholder(tf.float32, [None, 10], name="real_y")

# Loss function
cross_entropy = -tf.reduce_sum(real_y * tf.log(y))

# Optimization
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Correct prediction
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(real_y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Initialization
init = tf.global_variables_initializer()

saver = tf.train.Saver()

with tf.Session() as session:
    session.run(init)

    # Train on MNIST: 20000 steps, each on a mini-batch of 50 images
    epochs = 20000
    for i in range(epochs):
        batch_x, batch_y = mnist_data.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch_x, real_y: batch_y, keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_accuracy))
        session.run(train_step, feed_dict={x: batch_x, real_y: batch_y, keep_prob: 0.8})

    network_accuracy = session.run(accuracy, feed_dict={x: mnist_data.test.images, real_y: mnist_data.test.labels, keep_prob: 1.0})

    print('The accuracy over the MNIST data is {:.2f}%'.format(network_accuracy * 100))

    saver.save(session, "Model/model.ckpt")


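The network graph is shown below. Both the parameters and the structure are fairly simple. Handwritten digit images are simple in themselves and carry little feature information, so a network like this is basically sufficient. It uses only common operators, so LeFlow can compile it without problems.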

[Figure: Network graph]
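To see how small this network really is, and how much data later has to be loaded into FPGA memory, the layer shapes and parameter counts can be traced by hand. The following is a plain-Python sketch (Python 3) that assumes TensorFlow's usual output-size rules: SAME gives ceil(size / stride), and VALID gives (size - kernel) // stride + 1. The function names are illustrative only.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a TensorFlow conv/pool layer on a square input."""
    if padding == "SAME":
        return -(-size // stride)            # ceil(size / stride)
    return (size - kernel) // stride + 1     # VALID

def trace(fc_size=20):
    """Return per-layer (name, height, channels, params) and the total parameter count."""
    # (name, kernel, out_channels, padding, followed_by_2x2_maxpool)
    convs = [("conv1", 3, 3, "SAME", True),
             ("conv2", 3, 8, "SAME", True),
             ("fc1 (7x7 conv)", 7, fc_size, "VALID", False)]
    h, c, total, rows = 28, 1, 0, []
    for name, k, out_c, pad, pool in convs:
        h = out_size(h, k, 1, pad)
        n = k * k * c * out_c + out_c        # weights + biases
        c = out_c
        if pool:
            h = out_size(h, 2, 2, "SAME")
        rows.append((name, h, c, n))
        total += n
    n = fc_size * 10 + 10                    # fc2: matmul weights + biases
    rows.append(("fc2 (matmul)", 1, 10, n))
    return rows, total + n

rows, total = trace()
for name, h, c, n in rows:
    print("%-15s -> %dx%dx%d, %d params" % (name, h, h, c, n))
print("total params:", total)   # total params: 8324
```

The shapes go 28x28x1, 14x14x3, 7x7x8, 1x1x20, and finally 10. There are 8,324 parameters in total, roughly 33 KB as float32, and the 7x7 "fc1" convolution accounts for most of them.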

Model and prediction

Next, train the model:

python cnnMNIST_train.py

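When training finishes, checkpoint files are written to the Model directory. A checkpoint can normally be reloaded to continue training. It stores the graph structure and the weights separately, so it consists of several files. For release, the checkpoint is usually frozen into a single .pb file, which is more convenient to handle, so we simulate that scenario here and freeze the model: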

sh freeze.sh

Next comes the inference code. It loads the frozen model file, feeds in a test image, and reads the output:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.platform import gfile
import numpy as np
import datetime

# Load the MNIST data set
mnist_data = input_data.read_data_sets("MNIST_data/", one_hot=True)

sess = tf.Session()
# Load the frozen model and import it into the session's graph under the "prefix" scope
with gfile.FastGFile('./Model/model.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    with sess.graph.as_default():
        tf.import_graph_def(graph_def, name='prefix')

sess.run(tf.global_variables_initializer())

x = sess.graph.get_tensor_by_name('prefix/input:0')
keep_prob = sess.graph.get_tensor_by_name('prefix/keep_prob:0')
y = sess.graph.get_tensor_by_name('prefix/output:0')

# Uncomment to list the tensor names in the graph
# for op in sess.graph.get_operations():
#     print(op.name)

# Test image 123; its true label is 6
test_image = 123

starttime = datetime.datetime.now()
# for i in range(10):
#     test_image = i
ret = sess.run(y, feed_dict={x: [mnist_data.test.images[test_image]], keep_prob: 1.0})
#     print("Expected Result: " + str(np.argmax(mnist_data.test.labels[test_image])))
#     print("Real Result: " + str(np.argmax(ret)))

endtime = datetime.datetime.now()

# total_seconds() covers the whole duration (.microseconds would drop whole seconds)
delta = (endtime - starttime).total_seconds() * 1000.0

print("Expected Result: " + str(np.argmax(mnist_data.test.labels[test_image])))
print("Real Result: " + str(np.argmax(ret)))
print(ret)
print("Use time: %s ms" % str(delta))
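A note on the timing: with datetime, timedelta.microseconds holds only the sub-second component, so it silently loses whole seconds once a run takes 1 s or more, while total_seconds() returns the full duration. The sketch below (Python 3, standard library only) shows the difference. It also includes a hypothetical helper, time_ms, that averages several calls on a monotonic clock, since the first sess.run can include one-off setup cost.

```python
import datetime
import time

# A 1.5 s duration: .microseconds keeps only the sub-second part
d = datetime.timedelta(seconds=1, microseconds=500000)
print(d.microseconds / 1000.0)      # 500.0  (the whole second is lost)
print(d.total_seconds() * 1000.0)   # 1500.0

def time_ms(fn, repeats=10):
    """Average wall-clock time of fn() in milliseconds, measured with a monotonic clock."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats
```

In predict.py this could be used as time_ms(lambda: sess.run(y, feed_dict={...})). The feed_dict contents are the same as in the script above.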

Run the prediction:

python predict.py

Output:

Expected Result: 6
Real Result: 6
[[9.6263975e-06 5.2255353e-05 2.6832073e-04 9.6497261e-06 7.0781505e-04
1.0373291e-05 9.9832779e-01 1.9901098e-07 6.1113882e-04 2.8554557e-06]]
Use time: 19.535 ms
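"Real Result" is the index of the largest of the ten softmax probabilities. The quick plain-Python check below uses the vector printed above (copied verbatim; nothing else is assumed). It confirms the class and shows how large the margin over the runner-up is, which matters later when judging whether FPGA rounding error could change the prediction.

```python
# Softmax output printed by predict.py for test image 123
ret = [9.6263975e-06, 5.2255353e-05, 2.6832073e-04, 9.6497261e-06, 7.0781505e-04,
       1.0373291e-05, 9.9832779e-01, 1.9901098e-07, 6.1113882e-04, 2.8554557e-06]

pred = max(range(len(ret)), key=lambda i: ret[i])   # argmax without numpy
top1, top2 = sorted(ret, reverse=True)[:2]

print(pred)                 # 6
print(round(sum(ret), 6))   # 1.0  (softmax outputs sum to 1)
print(top1 - top2)          # margin of class 6 over the runner-up (class 4)
```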

TO FPGA

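Both the model file and the prediction results look correct, so we can now convert the model into the Verilog code the FPGA needs. As the earlier example showed, the code to be compiled has to re-implement the network's inference structure and is then compiled. This could be done in either the training or the prediction script, but keeping it in a separate script makes the logic clearer. The code is as follows: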

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import sys
sys.path.append('../../src')
import processMif as mif
import additionalOptions as options

def load_graph(frozen_graph_filename):
    # Load the protobuf file from disk and parse it to retrieve the
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Import the graph_def into a fresh Graph with the built-in helper
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(
            graph_def,
            input_map=None,
            return_elements=None,
            name="prefix",
            op_dict=None,
            producer_op_list=None
        )
    return graph

# Load the frozen model
graph = load_graph("./Model/model.pb")

# List the operation names in the imported graph
for op in graph.get_operations():
    print(op.name)

# Get the model nodes (in the frozen graph the weights are constants)
x = graph.get_tensor_by_name('prefix/input:0')
y = graph.get_tensor_by_name('prefix/output:0')
w_c1 = graph.get_tensor_by_name('prefix/w_c1:0')
b_c1 = graph.get_tensor_by_name('prefix/b_c1:0')
w_c2 = graph.get_tensor_by_name('prefix/w_c2:0')
b_c2 = graph.get_tensor_by_name('prefix/b_c2:0')

w_fc1 = graph.get_tensor_by_name('prefix/w_fc1:0')
w_fc2 = graph.get_tensor_by_name('prefix/w_fc2:0')
b_fc1 = graph.get_tensor_by_name('prefix/b_fc1:0')
b_fc2 = graph.get_tensor_by_name('prefix/b_fc2:0')

# Load the MNIST data set
mnist_data = input_data.read_data_sets("MNIST_data/", one_hot=True)

test_image = 123

with tf.Session(graph=graph) as session:
    # Inference only, placed on the XLA CPU device for LeFlow; training-only ops such as dropout are left out
    with tf.device("device:XLA_CPU:0"):
        hp1 = tf.nn.max_pool(tf.nn.relu(tf.add(tf.nn.conv2d(tf.reshape(x, [-1, 28, 28, 1]), w_c1, strides=[1, 1, 1, 1], padding='SAME'), b_c1)), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        hp2 = tf.nn.max_pool(tf.nn.relu(tf.add(tf.nn.conv2d(hp1, w_c2, strides=[1, 1, 1, 1], padding='SAME'), b_c2)), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        h_fc1 = tf.nn.relu(tf.add(tf.nn.conv2d(hp2, w_fc1, strides=[1, 1, 1, 1], padding='VALID'), b_fc1))
        y = tf.nn.softmax(tf.add(tf.matmul(tf.reshape(h_fc1, [-1, 20]), w_fc2), b_fc2))

    ret = session.run(y, feed_dict={x: [mnist_data.test.images[test_image]]})

    print("Expected Result: " + str(np.argmax(mnist_data.test.labels[test_image])))
    print("Real Result: " + str(ret))

    # Create the memory initialization files for testing; the order of the list passed to createMem matters
    param1 = mnist_data.test.images[test_image]
    param0 = w_c1.eval()
    param2 = b_c1.eval()
    param3 = w_c2.eval()
    param4 = b_c2.eval()
    param5 = w_fc1.eval()
    param6 = b_fc1.eval()
    param7 = w_fc2.eval()
    param8 = b_fc2.eval()
    mif.createMem([param0, param1, param2, param3, param4, param5, param6, param7, param8])


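The basic flow is as follows. First load the required parameters and weights from the .pb model, then implement the inference algorithm under with tf.device("device:XLA_CPU:0"):. Operators that only matter during training, such as dropout, must not be included there. Finally, create the MIF memory files that LegUp needs with mif.createMem, paying attention to the order of the parameters.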
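The article does not show what processMif.createMem writes. As a hypothetical illustration of the Intel Quartus .mif (Memory Initialization File) format itself, the sketch below renders a flat list of floats with one 32-bit IEEE-754 word per address. The one-word-per-float layout and the to_mif/float_to_hex names are assumptions for illustration, not LeFlow's implementation. A numpy weight array would first be flattened in row-major order, for example w.flatten().tolist().

```python
import struct

def float_to_hex(value):
    """IEEE-754 single-precision bit pattern of value, as 8 uppercase hex digits."""
    return struct.pack(">f", value).hex().upper()

def to_mif(values):
    """Render a flat list of floats as a .mif image: one 32-bit word per address."""
    lines = ["WIDTH=32;",
             "DEPTH=%d;" % len(values),
             "ADDRESS_RADIX=UNS;",
             "DATA_RADIX=HEX;",
             "CONTENT BEGIN"]
    for addr, v in enumerate(values):
        lines.append("    %d : %s;" % (addr, float_to_hex(v)))
    lines.append("END;")
    return "\n".join(lines)

print(to_mif([1.0, 0.5, -2.0]))
```

For this input the three words are 3F800000, 3F000000 and C0000000. Whatever the real tool emits, each parameter ends up as a separate memory image, which is one reason the order of the createMem list has to be kept consistent.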

Compile the code with LeFlow:

../../src/LeFlow cnnMNIST_to_fpga.py

Simulate in the ModelSim GUI:

cd cnnMNIST_to_fpga_files
make w

View the simulation results:

Simulating even a moderately complex algorithm like this takes a long time, around ten hours, so be patient.

[Figure: Simulation results]

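The output is stored in the temp signals, usually in the last temp. Earlier ones may hold it as well: they are intermediate values, and if they were not modified afterwards they are identical to the later ones. As the figure shows, the result is an array of binary words, which can be converted to floating point and compared with the earlier predictions.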
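Converting a word from the waveform back to a number is a reinterpretation of its bits, not a numeric conversion. The minimal sketch below uses Python's struct module and assumes that each result word is an IEEE-754 single-precision float (float32, as in the TensorFlow model) with bit 31 written first. It accepts binary text, optionally with underscores, or hex text with base=16. The function name is hypothetical.

```python
import struct

def word_to_float32(text, base=2):
    """Interpret one simulator word (binary by default, hex with base=16) as an IEEE-754 float32."""
    word = int(text.replace("_", ""), base)
    return struct.unpack(">f", struct.pack(">I", word))[0]

print(word_to_float32("0011_1111_1000_0000_0000_0000_0000_0000"))  # 1.0
print(word_to_float32("3F000000", base=16))                        # 0.5
print(word_to_float32("BF800000", base=16))                        # -1.0
```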

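The comparison shows some error, roughly one part in a thousand, which does not affect the classification result. The design can then be programmed onto the FPGA with Quartus over JTAG, which is not covered in detail here.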
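The claim that the error "does not affect the classification" can be checked mechanically. Compute the largest element-wise difference between the CPU vector (ret from TensorFlow) and the decoded FPGA vector, then confirm that both pick the same class. The plain-Python sketch below does this; the compare/argmax names are illustrative.

```python
def argmax(v):
    """Index of the largest element."""
    return max(range(len(v)), key=lambda i: v[i])

def compare(cpu, fpga):
    """Return (max absolute error, True if both vectors predict the same class)."""
    if len(cpu) != len(fpga):
        raise ValueError("vectors must have the same length")
    max_err = max(abs(a - b) for a, b in zip(cpu, fpga))
    return max_err, argmax(cpu) == argmax(fpga)
```

If every element is off by at most e, the predicted class cannot change as long as the gap between the top two probabilities exceeds 2e. With a margin close to 1, as in the output above, an error of about 0.001 is far from that limit.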

Summary

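This article showed how to use LeFlow to convert a simple CNN for an FPGA and apply it to handwritten digit recognition. For simple networks LeFlow is quite convenient: both writing the code and compiling it are straightforward. Its shortcomings are also clear, however: complex networks still run into serious problems. Next we will look at more complex networks and applications, such as facial expression recognition and license plate recognition.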