About me

Jiajia Li is an Assistant Professor in the Department of Computer Science at North Carolina State University (NCSU), Raleigh, NC. Her research emphasizes high-performance computing, with a focus on the interaction among applications, numerical methods, data structures, algorithms, automatic performance tuning, and computer architectures. She pursues high-performance sparse (multi-)linear algebra, solvers, and tensor decompositions for large-scale data analytics and domain applications on diverse computer architectures.


From 2018 to 2022, Jiajia Li was an Assistant Professor in the Department of Computer Science at the College of William & Mary (W&M), Williamsburg, VA, and a Research Scientist in the High Performance Computing group at Pacific Northwest National Laboratory (PNNL), Richland, WA. She received her Ph.D. (Aug. 2018) in Computational Science & Engineering from the Georgia Institute of Technology, advised by Professor Richard Vuduc. Her honors include selection as a Rising Star in Computational and Data Sciences, a Best Student Paper Award, and an IBM PhD Fellowship. Earlier, she was a research intern at the IBM Thomas J. Watson Research Center and the Intel Parallel Computing Lab in the summers of 2016 and 2015, respectively. She also holds a Ph.D. (Jul. 2013) from the Institute of Computing Technology, Chinese Academy of Sciences, and received her B.S. (Jul. 2008) in Computational Mathematics from Dalian University of Technology through its Accelerated Student Program (ranked 2/180).


Please feel free to drop me an email at jiajia.li@ncsu.edu if you have questions about the CS Ph.D. program, research collaborations, or advice on research, careers, or international life.


For more information, please click here for the Curriculum Vitae.

News

  • July 2026: Zecheng Li will present our paper TypeCraft: A Lightweight Data Type Profiler with High Resolution at OSDI'26 in Seattle, WA, USA.
  • May 2026: Zhaonan Meng will present our paper STTID: High-Performance Sparse Tensor-Train Interpolative Decomposition at IPDPS'26 in New Orleans, LA, USA.

Projects

Selected current and past projects. Expand each group for details.

Current Projects
Past Projects
  • SHARWK: Scalable Hypergraph Analysis Via Random Walk Kernels

    PI: Jiajia Li
    DOE EXPRESS project #656071, 11/14/2022 – 12/30/2023, Total amount: $78,382

  • HiParTI: Application-Algorithm-Architecture Co-Design for Large-Scale, Sparse Tensor/Matrix Methods

    PI: Jiajia Li; Team: Ang Li, Ajay Panyala
    DOE PNNL LDRD project

  • Parallel Tensor Infrastructure (ParTI) on multicore CPUs and GPUs

    Code released on GitHub: [ParTI]

  • SMAT (SpMV Auto-tuner)

More past projects

Awards

Selected Awards and Honors
  • The 39th IEEE International Conference on Computer Design (ICCD’21) Best Paper Award

  • Rising Stars in Computational and Data Sciences, 2019 [Link]

  • Principles and Practice of Parallel Programming (PPoPP’19) Best Paper Award Finalist

  • ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC'18) Best Student Paper Award
    [PNNL Press] [GaTech Press]

  • SIAM ALA'18 Student Travel Grant

  • GaTech CoC Graduate Student Council Travel Grant

  • IBM PhD Fellowship for 2017-2018 [Link]

  • Travel grant from the Institute for Pure and Applied Mathematics (IPAM) for Big Data Meets Computation Workshop 2017

  • Selected student attendee, IEEE-WIE Women's Leadership Summit, 2016

  • ZhuLiYueHua Award for Excellent Ph.D. Students of the Chinese Academy of Sciences (Top 0.2%), 2013

  • Merit Student of Institute of Computing Technology, 2013

  • Xia Peisu Scholarship of Institute of Computing Technology (Top 1%), 2011

  • Outstanding Research Assistant of the Computer Architecture Laboratory at the University of Chinese Academy of Sciences, 2011

  • Outstanding Student of the Computer Architecture Laboratory at the University of Chinese Academy of Sciences, 2010

Publications

Recent Publications (2026-2024)
  • TypeCraft: A Lightweight Data Type Profiler with High Resolution

    Zecheng Li, Xu Liu, Namhyung Kim, Blake Jones, Alexey Alexandrov, Jiajia Li.

    USENIX Symposium on Operating Systems Design and Implementation (OSDI).

  • SmartDispatch: Dynamic Substitution of NumPy-style APIs on Heterogenous CPU-GPU Systems

    Jinku Cui, Yueming Hao, Shuyin Jiao, Jiajia Li, Xu Liu.

    Foundations of Software Engineering (FSE).

  • G-HEMP: Fast Multi-GPU Private Inference for Large-Scale GCNs with Homomorphic Encryption

    Ran Ran, Zhaoting Gong, Zhaowei Li, Xianting Lu, Jiajia Li, Wujie Wen.

    Machine Learning and Systems (MLSys).

  • STTID: High-Performance Sparse Tensor-Train Interpolative Decomposition

    Zhaonan Meng, Miles Stoudenmire, Karl Pierce, Frank Mueller, Jiajia Li.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs

    Yanbo Zhao, Yueming Hao, Zecheng Li, Shuyin Jiao, Xu Liu, Jiajia Li.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC).

  • DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads

    Qidong Zhao, Hao Wu, Yueming Hao, Zilingfeng Ye, Jiajia Li, Xu Liu, Keren Zhou.

    ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

  • SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation

    Zecheng Li, Shruti Shivakumar, Jiajia Li, Ramakrishnan Kannan.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • SRSparse: Generating Codes for High-Performance Sparse Matrix-Vector Semiring Computations

    Zhen Du, Ying Liu, Ninghui Sun, Huimin Cui, Xiaobing Feng, Jiajia Li.

    ACM Transactions on Architecture and Code Optimization (TACO).

  • Advancing Matrix Operations for High-Performance and Memory-Efficient Automata Processing on GPUs

    Zhenlin Wu, Tianao Ge, Jiajia Li, Xinyu Chen, Hongyuan Liu.

    ACM Transactions on Architecture and Code Optimization (TACO).

  • gHyPart: GPU-friendly End-to-End Hypergraph Partitioner

    Zhenlin Wu, Haosong Zhao, Hongyuan Liu, Wujie Wen, Jiajia Li.

    ACM Transactions on Architecture and Code Optimization (TACO).

  • FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks

    Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, Jiajia Li.

    International Conference on Supercomputing (ICS).

  • PINE: Efficient Yet Effective Piecewise Linear Trees

    Zecheng Li, Jiajia Li.

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC).

  • Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design

    Guofeng Feng, Weile Jia, Ninghui Sun, Guangming Tan, Jiajia Li.

    Principles and Practice of Parallel Programming (PPoPP).

  • Accelerating Neural Differential Equations for Irregularly-Sampled Dynamical Systems Using Variational Formulation

    Hongjue Zhao, Yuchen Wang, Hairong Qi, Jiajia Li, Lui Sha, Han Zhao, Huajie Shao.

    ICLR Workshop on AI4DifferentialEquations in Science.

Earlier Publications
  • Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness

    Zhen Xie, Jie Liu, Jiajia Li, Dong Li.

    Principles and Practice of Parallel Programming (PPoPP).

  • Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition

    Zheng Miao, Jon Calhoun, Rong Ge, Jiajia Li.

    ACM Transactions on Parallel Computing (TOPC).

  • AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices

    Zhen Du, Jiajia Li, Yinshan Wang, Xueqi Li, Guangming Tan, Ninghui Sun.

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC).

  • BALA-CPD: BALanced and Asynchronous Distributed Tensor Decomposition

    Zheng Miao, Jiajia Li, Jon Calhoun, Rong Ge.

    IEEE Cluster.

  • DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs

    Cheng Tan, Nicolas Bohm Agostini, Tong Geng, Chenhao Xie, Jiajia Li, Ang Li, Kevin Barker, Antonino Tumeo.

    IEEE International Symposium on High-Performance Computer Architecture (HPCA).

  • LB-HM: Load Balance-Aware Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications

    Zhen Xie, Jie Liu, Yuchen Ma, Jiajia Li, Dong Li.

    Principles and Practice of Parallel Programming (PPoPP).

  • A High Performance Sparse Tensor Algebra Compiler in MLIR

    Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, Gokcen Kestor.

    LLVM-HPC at SC.

  • DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications

    Cheng Tan, Tong Geng, Chenhao Xie, Nicolas Bohm Agostini, Jiajia Li, Ang Li, Kevin Barker, Antonino Tumeo.

    IEEE International Conference on Computer Design (ICCD). Best Paper Award

  • A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs

    Tong Geng, Chunshu Wu, Cheng Tan, Chenhao Xie, Anqi Guo, Pouya Haghi, Sarah Yuan He, Jiajia Li, Martin Herbordt, Ang Li.

    IEEE High Performance Extreme Computing Conference (HPEC).

  • Efficient Parallel Sparse Symmetric Tucker Decomposition for High-Order Tensors

    Shruti Shivakumar, Jiajia Li, Ramakrishnan Kannan, Srinivas Aluru.

    SIAM Conference on Applied and Computational Discrete Algorithms (ACDA).

  • Athena: High-Performance Sparse Tensor Contraction Sequence on Heterogeneous Memory

    Jiawen Liu, Dong Li, Roberto Gioiosa, Jiajia Li.

    International Conference on Supercomputing (ICS).
    [paper] [bib]

  • Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory

    Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, Jiajia Li.

    Principles and Practice of Parallel Programming (PPoPP).
    [paper] [bib] [code]

  • Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory

    Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, Jiajia Li.

    Non-Volatile Memories Workshop (NVMW).

  • A Sparse Tensor Benchmark Suite for CPUs and GPUs

    Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin Barker.

    IEEE International Symposium on Workload Characterization (IISWC).
    [paper] [bib] [code-PASTA]

  • Generic, Sparse Tensor Core for Neural Networks

    Xiaolong Wu, Yang Yi, Dave (Jing) Tian, Jiajia Li.

    Machine Learning for Software Hardware Co-Design (MLSH) at PACT.

  • Programming Strategies for Irregular Algorithms on the Emu Chick

    Eric Hein, Srinivas Eswar, Abdurrahman Yasar, Jiajia Li, Jeffrey S. Young, Tom Conte, Umit V. Catalyurek, Rich Vuduc, Jason Riedy, Bora Ucar.

    ACM Transactions on Parallel Computing.

  • Sparsity-Aware Distributed Tensor Decomposition

    Zheng Miao, Jon C. Calhoun, Rong Ge, Jiajia Li.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC).

  • High-Performance Sparse Tensor Algebra Compiler

    Ruiqin Tian, Jiajia Li, Bin Ren, Gokcen Kestor.

    Women in High Performance Computing Workshop (WHPC) at SC.

  • On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics

    Jesun Sahariar Firoz, Ang Li, Jiajia Li, Kevin Barker.

    IEEE High Performance Extreme Computing Conference (HPEC).

  • A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs

    Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin Barker.

    Principles and Practice of Parallel Programming (PPoPP).

  • Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

    Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan Tallent, Kevin Barker.

    IEEE Transactions on Parallel and Distributed Systems.

  • An Efficient Mixed-Mode Representation of Sparse Tensors

    Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Prashant Rawat, Sriram Krishnamoorthy, P. (Saday) Sadayappan.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC).

  • Efficient and Effective Sparse Tensor Reordering

    Jiajia Li, Bora Ucar, Umit Catalyurek, Jimeng Sun, Kevin Barker, Richard Vuduc.

    International Conference on Supercomputing (ICS).

  • PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite

    Jiajia Li, Yuchen Ma, Xiaolong Wu, Ang Li, Kevin Barker.

    CCF Transactions on High Performance Computing.

  • A Microbenchmark Characterization of the Emu Chick

    Jeffrey S. Young, Eric Hein, Srinivas Eswar, Patrick Lavin, Jiajia Li, Jason Riedy, Richard Vuduc, Thomas M. Conte.

    Journal of Parallel Computing.

  • A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs

    Ke Meng, Jiajia Li, Guangming Tan.

    Principles and Practice of Parallel Programming (PPoPP). Best Paper Award Finalist

  • Load-balanced Sparse MTTKRP on GPUs

    Israt Nisa, Jiajia Li, Aravind Sukumaran Rajam, Richard Vuduc, P. (Saday) Sadayappan.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • An Autotuning Protocol to Rapidly Build Autotuners

    Junhong Liu, Guangming Tan, Yulong Luo, Jiajia Li, Zeyao Mo, Ninghui Sun.

    ACM Transactions on Parallel Computing.

  • Scalable Tensor Decompositions in High Performance Computing Environments

    Jiajia Li.

    Ph.D. Dissertation, Georgia Institute of Technology.

  • HiCOO: Hierarchical Storage of Sparse Tensors

    Jiajia Li, Jimeng Sun, Richard Vuduc.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC). Best Student Paper Award

  • Optimizing Sparse Tensor Times Matrix on GPUs

    Yuchen Ma, Jiajia Li, Xiaolong Wu, Chenggang Yan, Jimeng Sun, Richard Vuduc.

    Journal of Parallel and Distributed Computing.

  • An Initial Characterization of the Emu Chick

    Eric Hein, Tom Conte, Jeffrey Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Richard Vuduc, Jason Riedy.

    IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

  • Bridging the Gap between Deep Learning and Sparse Matrix Format Selection

    Yue Zhao, Jiajia Li, Chunhua Liao, Xipeng Shen.

    Principles and Practice of Parallel Programming (PPoPP).

  • Design and Implementation of Adaptive SpMV Library for Multicore and Manycore Architecture

    Guangming Tan, Junhong Liu, Jiajia Li.

    ACM Transactions on Mathematical Software.

  • Model-Driven Sparse CP Decomposition for Higher-Order Tensors

    Jiajia Li, Jee Choi, Ioakeim Perros, Jimeng Sun, Richard Vuduc.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • Bridging the Gap between Deep Learning and Sparse Matrix Format Selection

    Yue Zhao, Jiajia Li, Chunhua Liao, Xipeng Shen.

    Parallel Architectures and Compilation Techniques (PACT).

  • Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

    Xiuxia Zhang, Guangming Tan, Shuangbai Xue, Jiajia Li, Keren Zhou, Mingyu Chen.

    Principles and Practice of Parallel Programming (PPoPP). Best Artifact Award

  • Optimizing Sparse Tensor Times Matrix on Multi-core and Many-core Architectures

    Jiajia Li, Yuchen Ma, Chenggang Yan, Richard Vuduc.

    IA3 Workshop at SC.

  • An Input-Adaptive and In-Place Approach to Dense Tensor-Times-Matrix Multiply

    Jiajia Li, Casey Battaglino, Ioakeim Perros, Jimeng Sun, Richard Vuduc.

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC).

  • Introducing High Performance Computing Concepts into Engineering Undergraduate Curriculum: A Success Story

    B. Neelima, Jiajia Li.

    EduHPC Workshop at SC.

  • Research on Sparse Matrix Vector Multiplication Auto-tuning Method

    Jiajia Li.

    Ph.D. Thesis, University of Chinese Academy of Sciences.

  • SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication

    Jiajia Li, Guangming Tan, Mingyu Chen, Ninghui Sun.

    Programming Language Design and Implementation (PLDI).

  • An Optimized Large-Scale Hybrid DGEMM Design for CPUs and ATI GPUs

    Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun.

    International Conference on Supercomputing (ICS).

  • Study of Choosing the Best Storage Format of Sparse Matrix Vector Multiplication

    Jiajia Li, Xiuxia Zhang, Guangming Tan, Mingyu Chen.

    Journal of Computer Research and Development. (in Chinese)

  • Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks

    Jiajia Li, Guangming Tan, Mingyu Chen.

    International Conference on Parallel and Distributed Systems (ICPADS).

Software

Research Software and Open-Source Tools
  • HiParTI

    A Hierarchical Parallel Tensor Infrastructure

  • PASTA

    A Parallel Sparse Tensor Algorithm Benchmark Suite

  • ParTI

    A Parallel Tensor Infrastructure for Data Analysis

  • AdaTM

    Adaptive Tensor Memoization algorithm for CP decomposition

  • InTensLi

    Input-adaptive and in-place dense tensor-times-matrix multiply

  • SMAT

    Sparse Matrix-vector multiplication Auto-Tuner

  • HDGEMM

    A Hybrid DGEMM library on a Heterogeneous CPU-AMD GPU Architecture

Activities

Organizing and Editorial Activities
  • PC Vice Chair for Big Data Infrastructure of IEEE International Conference on Big Data (BigData'26).

  • AI & ML Track PC Chair of the International Supercomputing Conference (ISC'26).

  • Artifact Evaluation Co-Chair of Principles and Practice of Parallel Programming (PPoPP'26).

  • Finance Chair of International Conference on Parallel Architectures and Compilation Techniques (PACT'25).

  • Artifact Evaluation Co-Chair of Principles and Practice of Parallel Programming (PPoPP'25).

  • Registration Chair of ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24).

  • General Co-Chair of the 1st Workshop on Cross-stack Optimization of Tensor Methods (XTensor'24) at ASPLOS'24.

  • Application and Algorithms Track PC Co-Chair of HPC Asia'24.

  • Industry Liaison Chair of Principles and Practice of Parallel Programming (PPoPP'23).

  • Artifact Evaluation Co-Chair of ACM SIGPLAN International Conference on Compiler Construction (CC'23).

  • Program Chair of Emerging Parallel and Distributed Runtime Systems and Middleware Workshop (IPDRM'22).

  • Co-Chair of the International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'21).

Program Committee and Reviewing Service
  • 2026: Program committee member of SC, BigData, DAC, HPDC, ICS, PPoPP, IPDPS, ICPP, and CLUSTER.

  • 2025: Program committee member of SC, BigData, CCGrid, ICS, PPoPP, IPDPS, and SPAA.

  • 2024: Program committee member of ICPP and IPDPS; session chair at SC'24.

  • 2023: Program committee member of BigData, ICS, PPoPP, IPDPS, ICPP, and ICDCS.

  • 2022: Program committee member of SC, IPDPS, ISC, CLUSTER, PPoPP, and SIAM PP.

  • 2021: Program committee member of SC, ICS, LCTES, ICPP, CLUSTER, ICDCS, ISC, and NPC.

  • Selected prior service: Euro-Par'19, HiPC'19, HPC China (2013-2019, 2022), and reviewing for TPDS, TNNLS, JPDC, ParCo, Algorithmica, IEEE Access, and THPC.

Teaching


Recent Courses Taught
  • Architecture of Parallel Computers

    NCSU CSC/ECE 506-01, Fall 2025, Enrollment: 40

  • Parallel Algorithms

    NCSU CSC 491-005, CSC 591-126, Fall 2025, Enrollment: 8

  • Parallel Systems

    NCSU CSC 548-01, ECE 591-029, Spring 2025, Enrollment: 62

  • Seminar in Computer Science

    NCSU CSC 801-002, Fall 2024, Enrollment: 16+

  • Parallel Algorithms

    NCSU CSC 591/791-126, ECE 591-025, Fall 2024, Enrollment: 19

  • Accelerating Deep Learning

    NCSU CSC 495-004/591-104, Spring 2024, Enrollment: 26

  • Earlier Graduate and Undergraduate Courses

    Additional offerings include Efficient Tensor Computation for AI and Scientific Applications, Parallel Systems, and courses at William & Mary on Algorithms and Accelerating Deep Learning.

People


Ph.D. Students
  • Current

    • Feiyang Zheng, started 2025
    • Devadatta Mandaogane, started 2025
    • Rahmy Salman, started 2024
    • Zhaonan Meng, started 2024
    • Sai Krishna Teja Varma Manthena, started 2024
    • Zecheng Li, started 2023
    • Sogolsadat Mansouri, started 2022
    • Yanbo Zhao (co-advise), started 2022
    • Yi Wang (co-advise), started 2021
    • Jinku Cui (co-advise), started 2020
  • Alumni

    • Qidong Zhao (co-advise), graduated 2025, now at Google
Master's Students
  • Current

    • Zizhong Wang, started 2025
  • Alumni

    • Sri Harshavardhan Reddy Deverapalli, graduated 2026, now at NCSU
    • Mushtaq Ahmed Shaikh, graduated 2025
    • Ahmed Taimoor, graduated 2025, now at NCSU
    • Devadatta Mandaogane, graduated 2025, now at NCSU
    • Swarnamalya Mohan, graduated 2024
    • Sounder Rajendran, graduated 2024, now at AMD
    • Sai Krishna Teja Varma Manthena, graduated 2024, now at NCSU
    • Karthik Ganapathi Subramanian, graduated 2024
    • Po-Hsun Lin, graduated 2024