Hands on TensorRT on Nvidia TX2



You are most probably familiar with deep learning frameworks like TensorFlow, PyTorch, MXNet, etc. These frameworks are general-purpose tools geared towards learning a model from data. They are great for research prototyping, but not tailored for deployment. A lot of open-source code is available that builds various applications on top of these frameworks; however, when it comes to deploying the trained models, using them can be a sub-optimal solution, even though an overwhelming majority of people will still use these frameworks for deployment, at least for the next 2-3 years. In theory it is possible to install tensorflow-gpu on this device, but so far I have not been able to compile TensorFlow on aarch64 (TX2). I was suggested to use TensorRT instead, which comes pre-installed on the TX2.

Nvidia has come up with TensorRT, a high-performance runtime inference engine that extracts maximum GPU performance on everything from server GPUs to embedded GPUs like the Jetson. In particular, they developed libnvinfer, a CUDA-based library geared for scalable inference. I am trying to get TensorRT working on the DJI Manifold 2 (Nvidia TX2). Nvidia claims that TensorRT inference is roughly 10x-40x faster than running the equivalent TensorFlow model. For a quick overview of what TensorRT is, I recommend the official webinar (30 min) from Nvidia [Link].

DJI Manifold 2-G. This is essentially an Nvidia TX2. The Manifold 2-C is essentially an Intel i7.

Running the Official Samples

A good first step is to get the official samples working correctly. On my device (DJI Manifold 2-G, aka Nvidia TX2), they are located at `/usr/src/tensorrt/`. Copy this whole folder to your home directory. These samples demonstrate the C++ API of TensorRT.

$ cd /usr/src/tensorrt/
$ ls
bin/     data/     samples/
$ cp -r /usr/src/tensorrt $HOME
$ cd $HOME/tensorrt/samples/
$ make all

This should hopefully compile all the samples. The executables are generated in the bin directory. Now if you go to the bin directory and try to execute sample_mnist, you will see the program crash.

ERROR: cudnnEngine.cpp (56) - Cuda Error in initializeCommonContext: 4
ERROR: cudnnEngine.cpp (56) - Cuda Error in initializeCommonContext: 4
sample_mnist: sampleMNIST.cpp:63: void caffeToGIEModel(const string&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, unsigned int, nvinfer1::IHostMemory*&): Assertion `engine' failed.
Aborted (core dumped)

Looking at the code, it is easy to see that the program assumes you are in the corresponding data directory. The program also crashes when run without sudo. I haven’t figured out why, but if any reader has info on this, please do comment. Something like the following works:

dji@manifold2:~/tensorrt_officialsamples/bin$ cd ../data/mnist/
dji@manifold2:~/tensorrt_officialsamples/data/mnist$ sudo ../../bin/sample_mnist

@@@@@@@           %@@@@@@@@@
@@@@@@@           %@@@@@@@@@
@@@@@@@#:-#-.     %@@@@@@@@@
@@@@@@@@@@@@#    #@@@@@@@@@@
@@@@@@@@@@@@@    #@@@@@@@@@@
@@@@@@@@@@@@@:  :@@@@@@@@@@@
@@@@@@@@@%+==   *%%%%%%%%%@@
@@@@@@@@%                 -@
@@@@@@@@@#+.          .:-%@@
@@@@@@@@@@@*     :-###@@@@@@
@@@@@@@@@@@*   -%@@@@@@@@@@@
@@@@@@@@@@@*   *@@@@@@@@@@@@
@@@@@@@@@@@*   @@@@@@@@@@@@@
@@@@@@@@@@@*   #@@@@@@@@@@@@
@@@@@@@@@@@*   *@@@@@@@@@@@@
@@@@@@@@@@@*   *@@@@@@@@@@@@
@@@@@@@@@@@*   @@@@@@@@@@@@@
@@@@@@@@@@@*   @@@@@@@@@@@@@

7: **********

If you got output like this, congratulations: your TensorRT is working correctly. I highly recommend you read the code of sample_mnist. It demonstrates the toy MNIST digit-image classification example, deployed using TensorRT’s C++ API. To get to know how it works, read here. Many of you may be like me and more interested in using the Python API; read on to learn about my experience with it.

TensorRT Python API

For Jetson devices, python-tensorrt is only available with JetPack 4.2. See here for info. So for my device, as of May 2019, C++ is the only way to get TensorRT model deployment.

TensorRT C++ API

While there are several ways to specify the network in TensorRT, my desired usage is this: I wish to use my pretrained Keras model with TensorRT. If you are familiar with Keras, then you know that a model can be built with the Sequential API or the Functional API; in both cases the model is of type keras.models.Model. Yet another way is to load a pretrained model from a .h5 file or a .json file. TensorRT provides a UFF parser, which can load a .uff file. UFF is Nvidia’s network-and-weights definition file format. One can convert a Keras model to UFF through TensorFlow’s intermediate .pb (proto-binary) format.

keras —> .pb —> .uff —> load with UFFParser on TX2

Step-1: Keras Model to Tensorflow Proto-binary (.pb)

Using TensorFlow and Keras it is possible to produce a .pb file: the idea is to grab the TensorFlow session underlying the Keras model, convert the graph’s variables to constants (a “frozen” graph), and serialize the result. See details in the next step.

Step-2: Proto-binary (.pb) to Nvidia’s .uff


You could also see this for a minimalist demo.

To run this script, you need the full TensorFlow (at least TF 1.12) as well as TensorRT (I used 5.1) on your x86 computer. Note that installing the full TensorFlow is not recommended on the TX2 device. Alternatively, you may use my docker image, which can already run this script. Simply clone the cartwheel_train repo and run the script. Make sure to adjust the paths before running it. Note down the input and output tensor names which this script outputs; these are needed for the UFFParser.

$(host) mkdir $HOME/docker_ws
$(host) docker run --runtime=nvidia -it  -v $HOME/docker_ws:/app  mpkuse/kusevisionkit:tfgpu-1.12-tensorrt-5.1 bash
$(docker) cd /app 
$(docker) git clone https://github.com/mpkuse/cartwheel_train
$(docker) cd cartwheel_train #make sure you adjust the path in the script before executing 
$(docker) python test_kerasmodel_to_pb.py
Load model_json_fname: models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//model.json
Load JSON file:  models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//model.json
Load Weights:  models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//core_model.1000.keras
**Converted output node names are: [u'net_vlad_layer_1/l2_normalize_1']**
Saved the graph definition in ascii format at models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//output_model.pbtxt
Saved the freezed graph at models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//output_model.pb
Now do
		cd models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss/
		convert-to-uff output_model.pbtxt
$(docker) convert-to-uff output_model.pbtxt # this will produce the .uff file.

Step-3: TensorRT Load .uff with UFFParser C++ API

I adapted a standalone example from the official samples. It loads a pretrained MNIST model in .uff format, and it works on the TX2.


One thought on “Hands on TensorRT on Nvidia TX2”

  1. Hi there,

    We’ve had the same problem running TensorRT programs without sudo. In our case, it appears that some permissions were not set correctly, I guess because we didn’t do a proper flash of the OS using Jetpack.

    Running the following command should fix this. (Found out by running the program using strace -f.)
    sudo chown -R nvidia:nvidia /home/nvidia/.nv/

    We also had some issue running TensorRT from another user (not nvidia), I believe we resolved that by adding that user to the ‘video’ group.

    Hope this helps you!
