About

I’m a compute architect at NVIDIA, focused on accelerating the deep learning software stack on cutting-edge GPUs such as Hopper and Blackwell. Currently, I am developing a deep learning compiler and improving end-to-end training performance. In my spare time, I follow emerging deep learning algorithms in areas such as embodied intelligence, AI4Science, LLMs, and graphics.

A summary of my work:

Some of my contributions to deep learning compilers:

  • Fuser (2022): Added support for graph ops such as gather, scatter, and index_select. code blog
  • pytorch_geometric (2022): Added TorchScript support for the PyG community. code

Some of my contributions to deep learning models and frameworks for public users:

Some of my contributions to fast kernels: