# publications

## 2023

## 2022

- ASPLOSPaulihedral: A Generalized Block-Wise Compiler Optimization Framework for Quantum Simulation KernelsGushu Li, Anbang Wu,
*Yunong Shi*, and 3 more authors*ASPLOS*, 2022The quantum simulation kernel is an important subroutine appearing as a very long gate sequence in many quantum programs. In this paper, we propose Paulihedral, a block-wise compiler framework that can deeply optimize this subroutine by exploiting high-level program structure and optimization opportunities. Paulihedral first employs a new Pauli intermediate representation that can maintain the high-level semantics and constraints in quantum simulation kernels. This naturally enables new large-scale optimizations that are hard to implement at the low gate-level. In particular, we propose two technology-independent instruction scheduling passes, and two technology-dependent code optimization passes which reconcile the circuit synthesis, gate cancellation, and qubit mapping stages of the compiler. Experimental results show that Paulihedral can outperform state-of-the-art compiler infrastructures in a wide-range of applications on both near-term superconducting quantum processors and future fault-tolerant quantum computers.

- PLDIGiallar: Push-Button Verification for the Qiskit Quantum CompilerRunzhou Tao,
*Yunong Shi*, Jianan Yao, and 5 more authors*PLDI*, 2022This paper presents Giallar, a fully-automated verification toolkit for quantum compilers. Giallar requires no manual specifications, invariants, or proofs, and can automatically verify that a compiler pass preserves the semantics of quantum circuits. To deal with unbounded loops in quantum compilers, Giallar abstracts three loop templates, whose loop invariants can be automatically inferred. To efficiently check the equivalence of arbitrary input and output circuits that have complicated matrix semantics representation, Giallar introduces a symbolic representation for quantum circuits and a set of rewrite rules for showing the equivalence of symbolic quantum circuits. With Giallar, we implemented and verified 44 (out of 56) compiler passes in 13 versions of the Qiskit compiler, the open-source quantum compiler standard, during which three bugs were detected in and confirmed by Qiskit. Our evaluation shows that most of Qiskit compiler passes can be automatically verified in seconds and verification imposes only a modest overhead to compilation performance.

## 2021

- PLDIGleipnir: Toward Practical Error Analysis for Quantum ProgramsRunzhou Tao,
*Yunong Shi*, Jianan Yao, and 3 more authors*PLDI 2021*, 2021Practical error analysis is essential for the design, optimization, and evaluation of Noisy Intermediate-Scale Quantum(NISQ) computing. However, bounding errors in quantum programs is a grand challenge, because the effects of quantum errors depend on exponentially large quantum states. In this work, we present Gleipnir, a novel methodology toward practically computing verified error bounds in quantum programs. Gleipnir introduces the (ρ,δ)-diamond norm, an error metric constrained by a quantum predicate consisting of the approximate state ρ and its distance δ to the ideal state ρ. This predicate (ρ,δ) can be computed adaptively using tensor networks based on the Matrix Product States. Gleipnir features a lightweight logic for reasoning about error bounds in noisy quantum programs, based on the (ρ,δ)-diamond norm metric. Our experimental results show that Gleipnir is able to efficiently generate tight error bounds for real-world quantum programs with 10 to 100 qubits, and can be used to evaluate the error mitigation performance of quantum compiler transformations.

- NANOCOMOn the Co-Design of Quantum Software and HardwareGushu Li, Anbang Wu,
*Yunong Shi*, and 3 more authors*NANOCOM*, 2021A quantum computing system naturally consists of two components, the software system and the hardware system. Quantum applications are programmed using the quantum software and then executed on the quantum hardware. However, the performance of existing quantum computing system is still limited. Solving a practical problem that is beyond the capability of classical computers on a quantum computer has not yet been demonstrated. In this review, we point out that the quantum software and hardware systems should be designed collaboratively to fully exploit the potential of quantum computing. We first review three related works, including one hardware-aware quantum compiler optimization, one application-aware quantum hardware architecture design flow, and one co-design approach for the emerging quantum computational chemistry. Then we discuss some potential future directions following the co-design principle.

- ISCASoftware-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum ProcessorsGushu Li,
*Yunong Shi*, and Ali Javadi-Abhari*ISCA*, 2021Computational chemistry is the leading application to demonstrate the advantage of quantum computing in the near term. However, large-scale simulation of chemical systems on quantum computers is currently hindered due to a mismatch between the computational resource needs of the program and those available in today’s technology. In this paper we argue that significant new optimizations can be discovered by co-designing the application, compiler, and hardware. We show that multiple optimization objectives can be coordinated through the key abstraction layer of Pauli strings, which are the basic building blocks of computational chemistry programs. In particular, we leverage Pauli strings to identify critical program components that can be used to compress program size with minimal loss of accuracy. We also leverage the structure of Pauli string simulation circuits to tailor a novel hardware architecture and compiler, leading to significant execution overhead reduction by up to 99%. While exploiting the high-level domain knowledge reveals significant optimization opportunities, our hardware/software framework is not tied to a particular program instance and can accommodate the full family of computational chemistry problems with such structure. We believe the co-design lessons of this study can be extended to other domains and hardware technologies to hasten the onset of quantum advantage.

## 2020

- PIEEEResource-Efficient Quantum Computing by Breaking Abstractions
*Yunong Shi*, Pranav Gokhale, Prakash Murali, and 11 more authors*Proceedings of the IEEE*, 2020Building a quantum computer that surpasses the computational power of its classical counterpart is a great engineering challenge. Quantum software optimizations can provide an accelerated pathway to the first generation of quantum computing applications that might save years of engineering effort. Current quantum software stacks follow a layered approach similar to the stack of classical computers, which was designed to manage the complexity. In this review, we point out that greater efficiency of quantum computing systems can be achieved by breaking the abstractions between these layers. We review several works along this line, including two hardware-aware compilation optimizations that break the quantum Instruction Set Architecture (ISA) abstraction and two error-correction/informationprocessing schemes that break the qubit abstraction. Last, we discuss several possible future directions.

## 2019

- ASPLOSOptimized Compilation of Aggregated Instructions for Realistic Quantum Computers
*Yunong Shi*, Nelson Leung, Pranav Gokhale, and 4 more authors*ASPLOS*, 2019Recent developments in engineering and algorithms have made real-world applications in quantum computing possible in the near future. Existing quantum programming languages and compilers use a quantum assembly language composed of 1- and 2-qubit (quantum bit) gates. Quantum compiler frameworks translate this quantum assembly to electric signals (called control pulses) that implement the specified computation on specific physical devices. However, there is a mismatch between the operations defined by the 1- and 2-qubit logical ISA and their underlying physical implementation, so the current practice of directly translating logical instructions into control pulses results in inefficient, high-latency programs. To address this inefficiency, we propose a universal quantum compilation methodology that aggregates multiple logical operations into larger units that manipulate up to 10 qubits at a time. Our methodology then optimizes these aggregates by (1) finding commutative intermediate operations that result in more efficient schedules and (2) creating custom control pulses optimized for the aggregate (instead of individual 1- and 2-qubit operations). Compared to the standard gate-based compilation, the proposed approach realizes a deeper vertical integration of high-level quantum software and low-level, physical quantum hardware. We evaluate our approach on important near-term quantum applications on simulations of superconducting quantum architectures. Our proposed approach provides a mean speedup of 5times, with a maximum of 10times. Because latency directly affects the feasibility of quantum computation, our results not only improve performance but also have the potential to enable quantum computation sooner than otherwise possible.

- MICROPartial Compilation of Variational Algorithms for Noisy Intermediate-Scale Quantum MachinesPranav Gokhale, Yongshan Ding, Thomas Propson, and 6 more authors
*In*, 2019Quantum computing is on the cusp of reality with Noisy Intermediate-Scale Quantum (NISQ) machines currently under development and testing. Some of the most promising algorithms for these machines are variational algorithms that employ classical optimization coupled with quantum hardware to evaluate the quality of each candidate solution. Recent work used GRadient Descent Pulse Engineering (GRAPE) to translate quantum programs into highly optimized machine control pulses, resulting in a significant reduction in the execution time of programs. This is critical, as quantum machines can barely support the execution of short programs before failing.However, GRAPE suffers from high compilation latency, which is untenable in variational algorithms since compilation is interleaved with computation. We propose two strategies for partial compilation, exploiting the structure of variational circuits to pre-compile optimal pulses for specific blocks of gates. Our results indicate significant pulse speedups ranging from 1.5x-3x in typical benchmarks, with only a small fraction of the compilation latency of GRAPE.

- NJPFault-tolerant preparation of approximate GKP states
*Yunong Shi*, Christopher Chamberland, and Andrew Cross*New Journal of Physics*, Sep 2019Gottesman–Kitaev–Preskill (GKP) states appear to be amongst the leading candidates for correcting errors when encoding qubits into oscillators. However the preparation of GKP states remains a significant theoretical and experimental challenge. Until now, no clear definitions for fault-tolerantly preparing GKP states have been provided. Without careful consideration, a small number of faults can lead to large uncorrectable shift errors. After proposing a metric to compare approximate GKP states, we provide rigorous definitions of fault-tolerance and introduce a fault-tolerant phase estimation protocol for preparing such states. The fault-tolerant protocol uses one flag qubit and accepts only a subset of states in order to prevent measurement readout errors from causing large shift errors. We then show how the protocol can be implemented using circuit QED. In doing so, we derive analytic expressions which describe the leading order effects of the nonlinear dispersive shift and Kerr nonlinearity. Using these expressions, we show that to mitigate the nonlinear dispersive shift and Kerr terms would require the protocol to be implemented on time scales four orders of magnitude longer than the time scales relevant to the protocol for physically motivated parameters. Despite these restrictions, we numerically show that a subset of the accepted states of the fault-tolerant phase estimation protocol maintain good error correcting capabilities even in the presence of noise.