#### Toward Failure Recoverable And Secured Persistent Memory Systems

IEEE Data & Storage Symposium 2022

#### Sihang Liu

University of Virginia (Current)

University of Waterloo (Joining in 2023 as an Assistant Professor)



June 9, 2022

#### **Demand for Memory and Storage**



Source: Kanev et al. ISCA'15.

#### Memory and storage take the majority of time

#### **Memory and Storage Technologies**



| Tier    | Technology | Characteristics                              |
|---------|------------|----------------------------------------------|
| Memory  | DRAM       | Volatile, low-capacity, high-cost, fast      |
| Storage | HDD/SSD    | Persistent, high-capacity, low-cost, slow    |

Persistent Memory: Intel Optane Persistent Memory

Persistent memory *unifies* memory and storage

## **System Stack for Persistent Memory**



Unify memory and storage

Enable better performance using direct management of persistent data

## **System Stack Redesign for Persistent Memory**



#### My research: Seamlessly integrate persistent memory by redesigning both the software and hardware

## **Program Correctness**



[ASPLOS'21, ASPLOS'20, ASPLOS'19]

Persistent Memory System

Ensure *correct failure-recovery* for persistent memory programs

## **Efficiency and Security**

System Stack for Persistent Memory

PM Applications

PM Library

Processor PM HW Support

Persistent Memory

[*In-submission*, PACT'21, ISCA'19, HPCA'18]



Design *efficient* and *secured* persistent memory hardware

# Outline

System Stack for Persistent Memory

PM Applications

PM Library

Processor PM HW Support

Persistent Memory

#### **Software Support for Persistent Memory**



A test case generator for better *testing efficiency* 



#### Hardware System for Persistent Memory



# **PMTest:**

# A Fast and Flexible Testing Framework for Persistent Memory Programs

Sihang Liu, Yizhou Wei, Jishen Zhao, Aasheesh Kolli, and Samira Khan

The 2019 International Conference on Architectural Support for Programming Languages and Operating Systems (**ASPLOS**)

**NVMW Memorable Paper Award—Finalist** 

# What if the system fails?

#### Conventional System vs. Persistent Memory System



#### Faster, direct access benefits storage applications

## **Software for Persistent Memory Systems**



Conventional system: the file system handles recovery

Persistent memory system: the program customizes recovery

The burden of *failure-recovery* lies on the programmers

## **Programming for Persistent Memory Systems**

- Support for crash consistency has *two fundamental guarantees* 
  - Persistence: writes become persistent on demand
  - Ordering: one write becomes persistent before another
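The two guarantees can be illustrated with a toy model. The sketch below is not a real persistent memory API: the `SimulatedPM` class and its `write`, `flush`, and `fence` methods are hypothetical names, and the model assumes a worst-case crash in which only data that has been flushed and then fenced survives.

```python
class SimulatedPM:
    """Toy model: writes stay volatile until they are flushed and fenced."""

    def __init__(self):
        self.cache = {}       # written, not yet flushed
        self.in_flight = {}   # flushed, waiting for a fence
        self.persistent = {}  # guaranteed to survive a crash

    def write(self, addr, value):
        self.cache[addr] = value

    def flush(self, addr):
        # Persistence on demand: start writing this address back.
        if addr in self.cache:
            self.in_flight[addr] = self.cache.pop(addr)

    def fence(self):
        # Ordering: everything flushed so far is persistent before
        # any later write can become persistent.
        self.persistent.update(self.in_flight)
        self.in_flight.clear()

    def crash(self):
        """Worst-case crash: only fenced data survives."""
        return dict(self.persistent)


def publish(pm, ordered):
    """Write a value, then a flag that marks the value as valid."""
    pm.write("data", 42)
    if ordered:
        pm.flush("data")
        pm.fence()  # data becomes persistent before the flag is written
    pm.write("valid", 1)
    pm.flush("valid")
    pm.fence()
```

In this model, calling `publish` with `ordered=False` can leave the `valid` flag persistent while `data` is lost, which is the kind of inconsistency a recovery procedure would trip over.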



#### **Example of Persistent Memory Programming**



#### Ensuring crash consistency is hard!



## **Programming for Persistent Memory Is Hard!**



Directly using *low-level primitives* to implement crash-consistent programs is *not trivial* 

System Experts

E.g., software developed by Lenovo misuses low-level primitives such as persist barrier()



Libraries are developed for persistent memory to make programming easier

(e.g., Intel's PMDK library)

[Screenshots of bug fixes and reports. A commit titled "examples: btree: snapshot node before modifying it" (#3134) is labeled "Found by PMTest". In an issue thread, pbalcer commented on May 11, 2021: "Yup, you are right. But I think the map_init should be called unconditionally after an open for all the maps (might need to check for NULL though). Feel free to create a PR. Thanks!" The issue was closed on Aug 25, 2021.]

#### We need to test persistent memory programs

#### **Challenge I: Expose Crash Consistency Issues**



How can we expose crash consistency bugs?

## **Challenge II: Various Persistent Memory Systems**



#### How can we cover *various software* and *hardware*?

# Operations for crash consistency are similar: guarantees of *ordering* and *persistence*



#### **Expose persistence and ordering**

### **Expose Persistence and Ordering**

Construct *persistence intervals* from the instruction trace

Persistence interval: a time interval in which a write may become persistent

Deduce *persistence* and *ordering* 



#### **Expose Persistence and Ordering**

Construct *persistence intervals* from the instruction trace

*Persistence interval: a time interval in which a write may become persistent*

*Specification: A becomes persistent before B*
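A minimal sketch of this idea follows. It is an illustration under stated assumptions, not the tool's implementation: the trace format, the function names, and the use of trace positions as time are all hypothetical, and each address is assumed to be written once. A write's interval opens at the write and closes at the first fence that follows a flush of that address; if that never happens, the interval stays open.

```python
import math


def persistence_intervals(trace):
    """Map each written address to (start, end) in trace positions.

    start: position of the write.
    end:   position of the first fence after a flush of that address,
           or math.inf if the write is never flushed and fenced.
    """
    intervals = {}
    flushed = set()  # addresses flushed since the last fence
    for t, op in enumerate(trace):
        kind = op[0]
        if kind == "write":
            intervals[op[1]] = (t, math.inf)
            flushed.discard(op[1])
        elif kind == "flush":
            addr = op[1]
            # Only a still-open interval can be closed by the next fence.
            if addr in intervals and intervals[addr][1] == math.inf:
                flushed.add(addr)
        elif kind == "fence":
            for addr in flushed:
                intervals[addr] = (intervals[addr][0], t)
            flushed.clear()
    return intervals


def is_persisted(intervals, addr):
    """Persistence: the write is guaranteed persistent by the end of the trace."""
    return intervals[addr][1] != math.inf


def is_ordered_before(intervals, a, b):
    """Ordering: A's interval closes no later than B's interval opens."""
    return intervals[a][1] <= intervals[b][0]
```

For example, the trace write A, flush A, fence, write B, flush B, fence satisfies "A becomes persistent before B", while write A, write B, flush A, flush B, fence does not, because the two intervals overlap.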



# **Our Work: PMTest**



Workflow:

- Track accesses to persistent memory
- Deduce the ordering and persistence
- Check against specifications
- Detect crash consistency issues





# These tools have detected **18 bugs** in existing software **produced by the industry** for persistent memory systems



[1] Cross-Failure Bug Detection in Persistent Memory Programs. Sihang Liu, Korakit Seemakhupt, Yizhou Wei, Thomas Wenisch, Aasheesh Kolli, and Samira Khan. ASPLOS. 2020.

[2] PMFuzz: Test Case Generation for Persistent Memory Programs. Sihang Liu\*, Suyash Mahar\*, Baishakhi Ray, and Samira Khan. ASPLOS. 2021.





# Outline

System Stack for Persistent Memory

PM Applications

PM Library

Processor PM HW Support

Persistent Memory

#### **Software Support for Persistent Memory**



#### Hardware System for Persistent Memory



#### **Janus:**

# Optimizing Memory and Storage Support for Non-Volatile Memory Systems

Sihang Liu, Korakit Seemakhupt, Gennady Pekhimenko, Aasheesh Kolli, and Samira Khan

The 2019 International Symposium on Computer Architecture (ISCA)

MICRO Top Picks—Honorable Mention

#### **Persistent Memory Hardware**

Persistent memory hardware comes with different types of support



**Backend operations** in persistent memory hardware

### **Increased Write Latency**



Backend operations *increase write latency* 

## Why is write latency critical?



#### **Overhead of Backend Operations**



Backend operations *increase* the execution time

## **Challenges in Optimizing Backend Operations**

Each backend operation seems *indivisible* 

➡ Integration leads to **serialized** operations



## **Decomposition of Backend Operations**

However, it is possible to decompose them into sub-operations



- Generate counter
- Encrypt counter
- Data ⊕ Encrypted counter
- Generate MAC (for integrity verification)

#### **Decomposition of Backend Operations**

**Decomposing** the example operations:



Decomposing backend operations enables more optimizations
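The sub-operations named on the earlier slide (generate counter, encrypt counter, XOR with the data, generate MAC) can be sketched as separate functions. This is a hedged illustration only: the function names and the key are hypothetical, a keyed hash (SHAKE-256) stands in for the block cipher that real hardware would use, and the exact inputs to the counter encryption and the MAC are assumptions.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key, for illustration only


def generate_counter(counters, addr):
    """Sub-operation 1: bump the per-address write counter."""
    counters[addr] = counters.get(addr, 0) + 1
    return counters[addr]


def encrypt_counter(counter, addr, length):
    """Sub-operation 2: turn (address, counter) into a one-time pad.

    SHAKE-256 is a stand-in for a block cipher such as AES.
    """
    seed = KEY + addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hashlib.shake_256(seed).digest(length)


def xor_data(data, pad):
    """Sub-operation 3: data XOR encrypted counter."""
    return bytes(d ^ p for d, p in zip(data, pad))


def generate_mac(ciphertext, counter, addr):
    """Sub-operation 4: MAC for integrity verification."""
    msg = ciphertext + addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def secure_write(counters, addr, data):
    """The undivided operation is simply the four sub-operations in sequence."""
    counter = generate_counter(counters, addr)
    pad = encrypt_counter(counter, addr, len(data))
    ciphertext = xor_data(data, pad)
    mac = generate_mac(ciphertext, counter, addr)
    return counter, ciphertext, mac
```

Note that in this sketch the first two sub-operations need only the address, while the last two also need the data, which is what makes finer-grained scheduling possible.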

## **Optimization: Parallelization**

There are two types of dependencies:



#### 1. Dependency within each operation

### **Optimization: Parallelization**

There are two types of dependencies:

- *Intra-operation* dependency
- *Inter-operation* dependency



Sub-operations without dependency can execute in parallel



Sub-operations can *pre-execute* as soon as their data/address dependency is resolved



Address-dependent sub-operations can pre-execute as soon as the address of the write is available



Data-dependent sub-operations can pre-execute as soon as the data of the write is available



Both-dependent sub-operations can pre-execute as soon as the data and address of the write are available
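The effect of parallelization and pre-execution can be sketched with a small scheduling model. Everything in it is an assumption made for illustration: the latencies, the input arrival times, the unlimited number of execution units, and the second backend operation (called `dedup_*` here) are not taken from the slides.

```python
# name: (latency, input it needs, sub-operations it depends on)
# Listed in dependency order. All numbers are made up for illustration.
SUB_OPS = {
    "generate_counter": (1, "addr", []),
    "encrypt_counter":  (4, "addr", ["generate_counter"]),
    "xor_data":         (1, "both", ["encrypt_counter"]),
    "generate_mac":     (4, "both", ["xor_data"]),
    # Hypothetical second backend operation, independent of the first.
    "dedup_hash":       (4, "data", []),
    "dedup_lookup":     (2, "data", ["dedup_hash"]),
}

# When each input of the write becomes known, and when the write
# itself reaches the memory backend (hypothetical cycle counts).
INPUT_TIME = {"addr": 0, "data": 5, "both": 5}
WRITE_ARRIVAL = 10


def serialized():
    """Baseline: every sub-operation runs back to back after the write arrives."""
    return WRITE_ARRIVAL + sum(lat for lat, _, _ in SUB_OPS.values())


def schedule(earliest_start):
    """Finish time when each sub-operation starts as soon as it is allowed to."""
    done = {}
    for name, (lat, needs, deps) in SUB_OPS.items():
        start = max([earliest_start(needs)] + [done[d] for d in deps])
        done[name] = start + lat
    return max(done.values())


def parallelized():
    """Sub-operations without dependency overlap, starting at write arrival."""
    return schedule(lambda needs: WRITE_ARRIVAL)


def pre_executed():
    """Sub-operations start as soon as their address/data input is available."""
    return max(WRITE_ARRIVAL, schedule(lambda needs: INPUT_TIME[needs]))
```

With these made-up numbers the write completes at cycle 26 when serialized, 20 when parallelized, and 11 when pre-executed, so the latency added after the write arrives drops from 16 to 10 to 1 cycles.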

## **Our Work: Janus**

Janus is a Roman god with *two faces*: One looks into the *past* and another into the *future* 

When dependent data and address become available

Past



Pre-execute operations with dependency resolved



# **Performance Improvement**

Backend operations Original writeback latency



Janus:

Parallelization



Serialized Parallelized

### Parallelization reduces the latency of each operation

# **Performance Improvement**

Backend operations Original writeback latency



Janus:

- Parallelization
- Pre-execution



Serialized / Parallelized / Pre-executed

### Pre-execution moves the latency off the critical path

### **Software-Hardware Co-design**

Automated software instrumentation for pre-execution

Find pre-execution opportunities based on *address/data* dependencies



### Software-Hardware Co-design

Janus hardware executes the instrumented program



# **Evaluation Methodology**

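#### Platform: gem5 simulation

| Component                      | Configuration                               |
|--------------------------------|---------------------------------------------|
| Processor                      | Out-of-Order, 4GHz                          |
| L1 D/I, L2 cache               | 64/32KB, 2MB per core (shared)              |
| Backend memory operation cache | 512KB per core for each operation (shared)  |
| Backend memory operation units | 4 units per core                            |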

#### Design points

- **Baseline**: all backend operations are serialized
- Janus: pre-execute parallelized backend operations

# **Evaluation Methodology**

#### Storage-class workloads

| Workload   | Description                                                          |
|------------|----------------------------------------------------------------------|
| Array Swap | Randomly swap two locations in an array                              |
| Queue      | Randomly push/pop items to/from a queue                              |
| Hash Table | Randomly insert key-value pairs into a hash table                    |
| B-Tree     | Randomly insert key-value pairs into a B-tree                        |
| RB-Tree    | Randomly insert key-value pairs into a red-black tree                |
| TATP       | Add items to a telecommunication table with the TATP input generator |
| TPCC       | Add items to a hash table with the TPCC input generator              |

### Performance



Janus provides a 2X speedup on average over the serialized baseline

# Summary

System Stack for Persistent Memory

PM Applications

PM Library

Processor PM HW Support

Persistent Memory

#### **Software Support for Persistent Memory**



#### Hardware System for Persistent Memory



## **Future Directions**

- Adoption of PM in larger-scale systems
  - How can datacenter-scale workloads better utilize persistent memory?
  - How can we redesign the networking system to better leverage the lower latency of PM?
- Integration of computation logic into PM
  - What computation logic can we place on PM to accelerate memory-intensive workloads?
- More security challenges of PM systems
  - How can we design software systems for PM that ensure existing security guarantees?
