About me

Jiajia Li is an Assistant Professor in the Department of Computer Science at North Carolina State University (NCSU), Raleigh, NC. Her research emphasizes high-performance computing, with a focus on the interaction among applications, numerical methods, data structures, algorithms, automatic performance tuning, and computer architectures. She pursues high-performance sparse (multi-)linear algebra, solvers, and tensor decompositions for large-scale data analytics and domain applications on diverse computer architectures.


From 2018 to 2022, Jiajia Li was an Assistant Professor in the Department of Computer Science at the College of William & Mary (W&M), Williamsburg, VA, and a Research Scientist in the High Performance Computing group at Pacific Northwest National Laboratory (PNNL), Richland, WA. She received her Ph.D. (Aug. 2018) in Computational Science & Engineering from the Georgia Institute of Technology, advised by Professor Richard Vuduc. Her honors include selection as a Rising Star in Computational and Data Sciences, a Best Student Paper Award, and an IBM PhD Fellowship. Earlier, she was a research intern at the IBM Thomas J. Watson Research Center and the Intel Parallel Computing Lab in the summers of 2016 and 2015, respectively. She also holds a Ph.D. (Jul. 2013) from the Institute of Computing Technology, Chinese Academy of Sciences, and received her B.S. (Jul. 2008) in Computational Mathematics from Dalian University of Technology through its Accelerated Student Program (ranked 2/180).


Please feel free to drop me an email at jiajia.li@ncsu.edu if you have questions about the CS Ph.D. program, research collaborations, or advice on research, careers, or international life.


For more information, please click here for the Curriculum Vitae.

News

  • July 2026: Zecheng Li will present our paper TypeCraft: A Lightweight Data Type Profiler with High Resolution at OSDI'26 in Seattle, WA, USA.
  • May 2026: Zhaonan Meng will present our paper STTID: High-Performance Sparse Tensor-Train Interpolative Decomposition at IPDPS'26 in New Orleans, LA, USA.

Projects

Selected current and past projects. Expand each group for details.

Current Projects
Past Projects
  • SHARWK: Scalable Hypergraph Analysis Via Random Walk Kernels

    PI: Jiajia Li
    DOE EXPRESS project #656071, 11/14/2022 – 12/30/2023, Total amount: $78,382

  • HiParTI: Application-Algorithm-Architecture Co-Design for Large-Scale, Sparse Tensor/Matrix Methods

    PI: Jiajia Li; Team: Ang Li, Ajay Panyala
    DOE PNNL LDRD project

  • Parallel Tensor Infrastructure (ParTI) on multicore CPUs and GPUs

    Code released on GitHub: [ParTI]

  • SMAT (SpMV Auto-tuner)

More past projects

Awards

Selected Awards and Honors
  • The 39th IEEE International Conference on Computer Design (ICCD’21) Best Paper Award

  • Rising Stars in Computational and Data Sciences, 2019 [Link]

  • Principles and Practice of Parallel Programming (PPoPP’19) Best Paper Award Finalist

  • ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC'18) Best Student Paper Award
    [PNNL Press] [GaTech Press]

  • SIAM ALA'18 Student Travel Grant

  • GaTech CoC Graduate Student Council Travel Grant

  • IBM PhD Fellowship for 2017-2018 [Link]

  • Travel grant from the Institute for Pure and Applied Mathematics (IPAM) for Big Data Meets Computation Workshop 2017

  • Selected student attendee, IEEE-WIE Women's Leadership Summit, 2016

  • ZhuLiYueHua Award for Excellent Ph.D. Students of the Chinese Academy of Sciences (Top 0.2%), 2013

  • Merit Student of Institute of Computing Technology, 2013

  • Xia Peisu Scholarship of Institute of Computing Technology (Top 1%), 2011

  • Outstanding Research Assistant of the Computer Architecture Laboratory at the University of Chinese Academy of Sciences, 2011

  • Outstanding Student of the Computer Architecture Laboratory at the University of Chinese Academy of Sciences, 2010

Publications

Recent Publications (2026-2024)
  • TypeCraft: A Lightweight Data Type Profiler with High Resolution

    Zecheng Li, Xu Liu, Namhyung Kim, Blake Jones, Alexey Alexandrov, Jiajia Li.

    USENIX Symposium on Operating Systems Design and Implementation (OSDI).

  • SmartDispatch: Dynamic Substitution of NumPy-style APIs on Heterogenous CPU-GPU Systems

    Jinku Cui, Yueming Hao, Shuyin Jiao, Jiajia Li, Xu Liu.

    Foundations of Software Engineering (FSE).

  • G-HEMP: Fast Multi-GPU Private Inference for Large-Scale GCNs with Homomorphic Encryption

    Ran Ran, Zhaoting Gong, Zhaowei Li, Xianting Lu, Jiajia Li, Wujie Wen.

    Machine Learning and Systems (MLSys).

  • STTID: High-Performance Sparse Tensor-Train Interpolative Decomposition

    Zhaonan Meng, Miles Stoudenmire, Karl Pierce, Frank Mueller, Jiajia Li.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs

    Yanbo Zhao, Yueming Hao, Zecheng Li, Shuyin Jiao, Xu Liu, Jiajia Li.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC).

  • DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads

    Qidong Zhao, Hao Wu, Yueming Hao, Zilingfeng Ye, Jiajia Li, Xu Liu, Keren Zhou.

    ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

  • SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation

    Zecheng Li, Shruti Shivakumar, Jiajia Li, Ramakrishnan Kannan.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • SRSparse: Generating Codes for High-Performance Sparse Matrix-Vector Semiring Computations

    Zhen Du, Ying Liu, Ninghui Sun, Huimin Cui, Xiaobing Feng, Jiajia Li.

    ACM Transactions on Architecture and Code Optimization (TACO).

  • Advancing Matrix Operations for High-Performance and Memory-Efficient Automata Processing on GPUs

    Zhenlin Wu, Tianao Ge, Jiajia Li, Xinyu Chen, Hongyuan Liu.

    ACM Transactions on Architecture and Code Optimization (TACO).

  • gHyPart: GPU-friendly End-to-End Hypergraph Partitioner

    Zhenlin Wu, Haosong Zhao, Hongyuan Liu, Wujie Wen, Jiajia Li.

    ACM Transactions on Architecture and Code Optimization (TACO).

  • FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks

    Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, Jiajia Li.

    International Conference on Supercomputing (ICS).

  • PINE: Efficient Yet Effective Piecewise Linear Trees

    Zecheng Li, Jiajia Li.

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC).

  • Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design

    Guofeng Feng, Weile Jia, Ninghui Sun, Guangming Tan, Jiajia Li.

    Principles and Practice of Parallel Programming (PPoPP).

  • Accelerating Neural Differential Equations for Irregularly-Sampled Dynamical Systems Using Variational Formulation

    Hongjue Zhao, Yuchen Wang, Hairong Qi, Jiajia Li, Lui Sha, Han Zhao, Huajie Shao.

    ICLR Workshop on AI4DifferentialEquations in Science.

Earlier Publications
  • Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness

    Zhen Xie, Jie Liu, Jiajia Li, Dong Li.

    Principles and Practice of Parallel Programming (PPoPP).

  • Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition

    Zheng Miao, Jon Calhoun, Rong Ge, Jiajia Li.

    ACM Transactions on Parallel Computing (TOPC).

  • AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices

    Zhen Du, Jiajia Li, Yinshan Wang, Xueqi Li, Guangming Tan, Ninghui Sun.

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC).

  • BALA-CPD: BALanced and Asynchronous Distributed Tensor Decomposition

    Zheng Miao, Jiajia Li, Jon Calhoun, Rong Ge.

    IEEE Cluster.

  • DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs

    Cheng Tan, Nicolas Bohm Agostini, Tong Geng, Chenhao Xie, Jiajia Li, Ang Li, Kevin Barker, Antonino Tumeo.

    IEEE International Symposium on High-Performance Computer Architecture (HPCA).

  • LB-HM: Load Balance-Aware Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications

    Zhen Xie, Jie Liu, Yuchen Ma, Jiajia Li, Dong Li.

    Principles and Practice of Parallel Programming (PPoPP).

  • A High Performance Sparse Tensor Algebra Compiler in MLIR

    Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, Gokcen Kestor.

    LLVM-HPC at SC.

  • DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications

    Cheng Tan, Tong Geng, Chenhao Xie, Nicolas Bohm Agostini, Jiajia Li, Ang Li, Kevin Barker, Antonino Tumeo.

    IEEE International Conference on Computer Design (ICCD). Best Paper Award

  • A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs

    Tong Geng, Chunshu Wu, Cheng Tan, Chenhao Xie, Anqi Guo, Pouya Haghi, Sarah Yuan He, Jiajia Li, Martin Herbordt, Ang Li.

    IEEE High Performance Extreme Computing Conference (HPEC).

  • Efficient Parallel Sparse Symmetric Tucker Decomposition for High-Order Tensors

    Shruti Shivakumar, Jiajia Li, Ramakrishnan Kannan, Srinivas Aluru.

    SIAM Conference on Applied and Computational Discrete Algorithms (ACDA).

  • Athena: High-Performance Sparse Tensor Contraction Sequence on Heterogeneous Memory

    Jiawen Liu, Dong Li, Roberto Gioiosa, Jiajia Li.

    International Conference on Supercomputing (ICS).
    [paper] [bib]

  • Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory

    Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, Jiajia Li.

    Principles and Practice of Parallel Programming (PPoPP).
    [paper] [bib] [code]

  • Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory

    Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, Jiajia Li.

    Non-Volatile Memories Workshop (NVMW).

  • A Sparse Tensor Benchmark Suite for CPUs and GPUs

    Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin Barker.

    IEEE International Symposium on Workload Characterization (IISWC).
    [paper] [bib] [code-PASTA]

  • Generic, Sparse Tensor Core for Neural Networks

    Xiaolong Wu, Yang Yi, Dave (Jing) Tian, Jiajia Li.

    Machine Learning for Software Hardware Co-Design (MLSH) at PACT.

  • Programming Strategies for Irregular Algorithms on the Emu Chick

    Eric Hein, Srinivas Eswar, Abdurrahman Yasar, Jiajia Li, Jeffrey S. Young, Tom Conte, Umit V. Catalyurek, Rich Vuduc, Jason Riedy, Bora Ucar.

    ACM Transactions on Parallel Computing.

  • Sparsity-Aware Distributed Tensor Decomposition

    Zheng Miao, Jon C. Calhoun, Rong Ge, Jiajia Li.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC).

  • High-Performance Sparse Tensor Algebra Compiler

    Ruiqin Tian, Jiajia Li, Bin Ren, Gokcen Kestor.

    Women in High Performance Computing Workshop (WHPC) at SC.

  • On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics

    Jesun Sahariar Firoz, Ang Li, Jiajia Li, Kevin Barker.

    IEEE High Performance Extreme Computing Conference (HPEC).

  • A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs

    Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin Barker.

    Principles and Practice of Parallel Programming (PPoPP).

  • Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

    Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan Tallent, Kevin Barker.

    IEEE Transactions on Parallel and Distributed Systems.

  • An Efficient Mixed-Mode Representation of Sparse Tensors

    Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Prashant Rawat, Sriram Krishnamoorthy, P. (Saday) Sadayappan.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC).

  • Efficient and Effective Sparse Tensor Reordering

    Jiajia Li, Bora Ucar, Umit Catalyurek, Jimeng Sun, Kevin Barker, Richard Vuduc.

    International Conference on Supercomputing (ICS).

  • PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite

    Jiajia Li, Yuchen Ma, Xiaolong Wu, Ang Li, Kevin Barker.

    CCF Transactions on High Performance Computing.

  • A Microbenchmark Characterization of the Emu Chick

    Jeffrey S. Young, Eric Hein, Srinivas Eswar, Patrick Lavin, Jiajia Li, Jason Riedy, Richard Vuduc, Thomas M. Conte.

    Journal of Parallel Computing.

  • A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs

    Ke Meng, Jiajia Li, Guangming Tan.

    Principles and Practice of Parallel Programming (PPoPP). Best Paper Award Finalist

  • Load-balanced Sparse MTTKRP on GPUs

    Israt Nisa, Jiajia Li, Aravind Sukumaran Rajam, Richard Vuduc, P. (Saday) Sadayappan.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • An Autotuning Protocol to Rapidly Build Autotuners

    Junhong Liu, Guangming Tan, Yulong Luo, Jiajia Li, Zeyao Mo, Ninghui Sun.

    ACM Transactions on Parallel Computing.

  • Scalable Tensor Decompositions in High Performance Computing Environments

    Jiajia Li.

    Ph.D. Dissertation, Georgia Institute of Technology.

  • HiCOO: Hierarchical Storage of Sparse Tensors

    Jiajia Li, Jimeng Sun, Richard Vuduc.

    ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC). Best Student Paper Award

  • Optimizing Sparse Tensor Times Matrix on GPUs

    Yuchen Ma, Jiajia Li, Xiaolong Wu, Chenggang Yan, Jimeng Sun, Richard Vuduc.

    Journal of Parallel and Distributed Computing.

  • An Initial Characterization of the Emu Chick

    Eric Hein, Tom Conte, Jeffrey Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Richard Vuduc, Jason Riedy.

    IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

  • Bridging the Gap between Deep Learning and Sparse Matrix Format Selection

    Yue Zhao, Jiajia Li, Chunhua Liao, Xipeng Shen.

    Principles and Practice of Parallel Programming (PPoPP).

  • Design and Implementation of Adaptive SpMV Library for Multicore and Manycore Architecture

    Guangming Tan, Junhong Liu, Jiajia Li.

    ACM Transactions on Mathematical Software.

  • Model-Driven Sparse CP Decomposition for Higher-Order Tensors

    Jiajia Li, Jee Choi, Ioakeim Perros, Jimeng Sun, Richard Vuduc.

    IEEE International Parallel and Distributed Processing Symposium (IPDPS).

  • Bridging the Gap between Deep Learning and Sparse Matrix Format Selection

    Yue Zhao, Jiajia Li, Chunhua Liao, Xipeng Shen.

    Parallel Architectures and Compilation Techniques (PACT).

  • Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

    Xiuxia Zhang, Guangming Tan, Shuangbai Xue, Jiajia Li, Keren Zhou, Mingyu Chen.

    Principles and Practice of Parallel Programming (PPoPP). Best Artifact Award

  • Optimizing Sparse Tensor Times Matrix on Multi-core and Many-core Architectures

    Jiajia Li, Yuchen Ma, Chenggang Yan, Richard Vuduc.

    IA3 Workshop at SC.

  • An Input-Adaptive and In-Place Approach to Dense Tensor-Times-Matrix Multiply

    Jiajia Li, Casey Battaglino, Ioakeim Perros, Jimeng Sun, Richard Vuduc.

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC).

  • Introducing High Performance Computing Concepts into Engineering Undergraduate Curriculum: A Success Story

    B. Neelima, Jiajia Li.

    EduHPC Workshop at SC.

  • Research on Sparse Matrix Vector Multiplication Auto-tuning Method

    Jiajia Li.

    Ph.D. Thesis, University of Chinese Academy of Sciences.

  • SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication

    Jiajia Li, Guangming Tan, Mingyu Chen, Ninghui Sun.

    Programming Language Design and Implementation (PLDI).

  • An Optimized Large-Scale Hybrid DGEMM Design for CPUs and ATI GPUs

    Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun.

    International Conference on Supercomputing (ICS).

  • Study of Choosing the Best Storage Format of Sparse Matrix Vector Multiplication

    Jiajia Li, Xiuxia Zhang, Guangming Tan, Mingyu Chen.

    Journal of Computer Research and Development. (in Chinese)

  • Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks

    Jiajia Li, Guangming Tan, Mingyu Chen.

    International Conference on Parallel and Distributed Systems (ICPADS).

Software

Research Software and Open-Source Tools
  • HiParTI

    A Hierarchical Parallel Tensor Infrastructure

  • PASTA

    A Parallel Sparse Tensor Algorithm Benchmark Suite

  • ParTI

    A Parallel Tensor Infrastructure for Data Analysis

  • AdaTM

    Adaptive Tensor Memoization algorithm for CP decomposition

  • InTensLi

    Input-adaptive and in-place dense tensor-times-matrix multiply

  • SMAT

    Sparse Matrix-vector multiplication Auto-Tuner

  • HDGEMM

    A Hybrid DGEMM library on a Heterogeneous CPU-AMD GPU Architecture

Activities

Organizing and Editorial Activities
  • PC Vice Chair for Big Data Infrastructure of IEEE International Conference on Big Data (BigData'26).

  • AI & ML Track PC Chair of the International Supercomputing Conference (ISC'26).

  • Artifact Evaluation Co-Chair of Principles and Practice of Parallel Programming (PPoPP'26).

  • Finance Chair of International Conference on Parallel Architectures and Compilation Techniques (PACT'25).

  • Artifact Evaluation Co-Chair of Principles and Practice of Parallel Programming (PPoPP'25).

  • Registration Chair of ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24).

  • General Co-Chair of the 1st Workshop on Cross-stack Optimization of Tensor Methods (XTensor'24) at ASPLOS'24.

  • Application and Algorithms Track PC Co-Chair of HPC Asia'24.

  • Industry Liaison Chair of Principles and Practice of Parallel Programming (PPoPP'23).

  • Artifact Evaluation Co-Chair of ACM SIGPLAN International Conference on Compiler Construction (CC'23).

  • Program Chair of Emerging Parallel and Distributed Runtime Systems and Middleware Workshop (IPDRM'22).

  • Co-Chair of the International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'21).

Program Committee and Reviewing Service
  • 2026: Program committee member of SC, BigData, DAC, HPDC, ICS, PPoPP, IPDPS, ICPP, and CLUSTER.

  • 2025: Program committee member of SC, BigData, CCGrid, ICS, PPoPP, IPDPS, and SPAA.

  • 2024: Program committee member of ICPP and IPDPS; session chair at SC'24.

  • 2023: Program committee member of BigData, ICS, PPoPP, IPDPS, ICPP, and ICDCS.

  • 2022: Program committee member of SC, IPDPS, ISC, CLUSTER, PPoPP, and SIAM PP.

  • 2021: Program committee member of SC, ICS, LCTES, ICPP, CLUSTER, ICDCS, ISC, and NPC.

  • Selected prior service: Euro-Par'19, HiPC'19, HPC China (2013-2019, 2022), and reviewing for TPDS, TNNLS, JPDC, ParCo, Algorithmica, IEEE Access, and THPC.

Teaching


Recent Courses Taught
  • Architecture of Parallel Computers

    NCSU CSC/ECE 506-01, Fall 2025, Enrollment: 40

  • Parallel Algorithms

    NCSU CSC 491-005, CSC 591-126, Fall 2025, Enrollment: 8

  • Parallel Systems

    NCSU CSC 548-01, ECE 591-029, Spring 2025, Enrollment: 62

  • Seminar in Computer Science

    NCSU CSC 801-002, Fall 2024, Enrollment: 16+

  • Parallel Algorithms

    NCSU CSC 591/791-126, ECE 591-025, Fall 2024, Enrollment: 19

  • Accelerating Deep Learning

    NCSU CSC 495-004/591-104, Spring 2024, Enrollment: 26

  • Earlier Graduate and Undergraduate Courses

    Additional offerings include Efficient Tensor Computation for AI and Scientific Applications, Parallel Systems, and courses at William & Mary on Algorithms and Accelerating Deep Learning.

People


Ph.D. Students
  • Current

    • Feiyang Zheng, started 2025
    • Devadatta Mandaogane, started 2025
    • Rahmy Salman, started 2024
    • Zhaonan Meng, started 2024
    • Sai Krishna Teja Varma Manthena, started 2024
    • Zecheng Li, started 2023
    • Sogolsadat Mansouri, started 2022
    • Yanbo Zhao (co-advise), started 2022
    • Yi Wang (co-advise), started 2021
    • Jinku Cui (co-advise), started 2020
  • Alumni

    • Qidong Zhao (co-advise), graduated 2025, now at Google
Master's Students
  • Current

    • Zizhong Wang, started 2025
  • Alumni

    • Sri Harshavardhan Reddy Deverapalli, graduated 2026, now at NCSU
    • Mushtaq Ahmed Shaikh, graduated 2025
    • Ahmed Taimoor, graduated 2025, now at NCSU
    • Devadatta Mandaogane, graduated 2025, now at NCSU
    • Swarnamalya Mohan, graduated 2024
    • Sounder Rajendran, graduated 2024, now at AMD
    • Sai Krishna Teja Varma Manthena, graduated 2024, now at NCSU
    • Karthik Ganapathi Subramanian, graduated 2024
    • Po-Hsun Lin, graduated 2024