

### **Smart Formal for Scalable Verification**

Ashish Darbari

Axiomise



1





## PACKET-BASED DESIGNS

What's the game here?





## Sequential Multi-packet Designs

- Units of data flow together in what we call as packets
  - Fixed number of packets
  - Variable number of packets
- Usual requirement is that packets must not be:
  - Reordered
  - Dropped
  - Modified in flight
- State of each packet depends on the state of all others ahead of it





#### **SoC** Verification





## Verification Challenge

- Directed testing is too directed
- Constrained random inherently incomplete FSMs challenging
- Formal verification capable of
  - Hunting deep corner case bugs
  - Build exhaustive proofs
- But formal without a good methodology doesn't scale





### Variable Packet Design

BUFFER DEPTH e.g., 8 DEEP







#### **Design Description**







### **Example Behaviour**







### **Verification Strategy**

- Build mechanisms to track data
- Provide any constraints or assumptions
- Write checks/assertions to establish "correctness always holds"
- Write cover properties to check certain behaviours can occur
- Ensure that you have not missed any bug in your test bench





## **Formal Verification Strategy**

- No sequence generation, stimulus is free
- Just constrain out the illegal stimulus
- Checking by formal tools is symbolic covering all combinations of 0s and 1s
- Track inputs going into the DUT and check if the expected ones come out
  - But, tracking is not done for every incoming value explicitly
  - Track one symbolic value you track all the values... yeah seems like magic!
  - Designated values that are tracked are called "watched values"











SYSTEMS INITIATIVE

#### **Smart Tracker in Action**

FIFO is EMPTY





#### First Write





#### Second Write





#### Third Write





#### Watched Data Appears on Input





#### **Three Elements Ahead of Watched Data**























# Adapting Smart Tracker





## Adapting Smart Tracker

- Tracking would have to be done for all packets, not just one!
- Challenge is that we have variable number of packets, not just fixed size!
- Watched data can be any packet between 0 and P-1 for a max of P packets!
- Checkers verify that the design works for all watched packets not just one!





## **Observing Sampling In**

```
always @ (posedge clk or negedge resetn)
if (!resetn)
       sample in started <= 1'b0;</pre>
else
       sample in started <= ready to start sampling in;</pre>
always @ (posedge clk or negedge resetn)
if (!resetn)
       sample in finished <= 1'b0;</pre>
else
       sample_in_finished <= ready_to_finish_sampling_in;</pre>
```





## **Observing Sampling Out**

```
always @ (posedge clk or negedge resetn)
if (!resetn)
       sample out started <= 1'b0;</pre>
else
       sample out started <= ready to start sampling out;</pre>
always @ (posedge clk or negedge resetn)
if (!resetn)
       sample out finished <= 1'b0;</pre>
else
       sample_out_finished <= ready_to_finish_sampling_out;</pre>
```





#### Ready to Sample?

assign ready\_to\_start\_sampling\_in = sample\_in\_started || sample\_in\_started\_c; assign ready\_to\_finish\_sampling\_in = sample\_in\_finished || sample\_in\_finished\_c; assign ready\_to\_start\_sampling\_out = sample\_out\_started || sample\_out\_started\_c; assign ready\_to\_finish\_sampling\_out = sample\_out\_finished || sample\_out\_finished\_c;

sample\_in\_started\_c: on an input handshake when the very first input data matches wd[0]
sample\_in\_finish\_c: if we already started and are going to finish and have an input handshake
sample\_out\_started\_c: start sampling out when tracking counter is 1
sample\_out\_finish\_c: stop sampling out if all packets would be be read out and we started to sample out





### **Tracking Counter**

always @(posedge clk or negedge resetn)

if (!resetn)

```
tracking_counter <= `h0;
else if
tracking_counter <= tracking_counter + incr - decr;</pre>
```

assign incr = hsk\_i && (pkt\_wptr[wptr]==cpkt\_len[wptr]) && !input\_sampled\_completely;

assign decr = hsk o && (pkt rptr[rptr]==0) && !sample out started;





## **Counting Output Packets**

always @ (posedge clk or negedge resetn)

if (!resetn)

```
counter_out <= `h0;</pre>
```

else if (input\_sampled\_completely && not\_sampled\_out\_completely)

```
counter_out <= counter_out + 1'b1;</pre>
```





### Master Ordering Checks

endgenerate

SYSTEMS INITIATIVE



0100001100000001101000 `**901010101001010**010110  $1001111100101010111^{-1}10010101000$ **`1010110010001001100** 01000110001100010 10010101010100100 0101 **BUG** 1001 1100101010010101010 .0100010101001010101 100100111001010101010101 101010100001010101 0100001110010011100. 01001010101010**~10101010** 1010101 001010110001010001110J1000111001010100





## **Catching Reordering Bug**

Carried out on an Intel i5 7300 with 6 GB RAM

| File | View Source | ce Trace    | Tools Window Help                                         | . 🕞 🔝        | •               |        | 10 <sup>42,248</sup> states |
|------|-------------|-------------|-----------------------------------------------------------|--------------|-----------------|--------|-----------------------------|
|      | 📝 💟 se      | Q 🕜 App     | mode: FPV 💌                                               |              |                 |        | ~141K flops                 |
| VCF: | GoalList    |             |                                                           |              |                 |        |                             |
|      | Time 12H    | Max Cycle ( | -1 <enter match="" name="" value=""></enter>              |              |                 |        |                             |
|      |             |             |                                                           | LL           | 512-deep buffer |        |                             |
|      | status 🔺    | depth       | name                                                      | elapsed_time | engine          | type   |                             |
| 1    | ×           | 5           | packet_design.u_pkt_design_sva.as_all_packets[0].but_last | 00:02:16     | bl              | assert | Up to 8 packets             |
| 2    | ×           | 7           | packet_design.u_pkt_design_sva.as_all_packets[1].but_last | 00:02:42     | bl              | assert | Packet size: 32 bit         |
| 3    | ×           | 9           | packet_design.u_pkt_design_sva.as_all_packets[2].but_last | 00:03:46     | b1              | assert |                             |
| 4    | ×           | 14          | packet_design.u_pkt_design_sva.as_all_packets[3].but_last | 00:06:49     | sl              | assert | Bug found in less than 40   |
| 5    | ×           | 16          | packet_design.u_pkt_design_sva.as_all_packets[4].but_last | 00:26:49     | el              | assert |                             |
| 6    | ×           | 16          | packet_design.u_pkt_design_sva.as_all_packets[5].but_last | 00:07:05     | s1              | assert | minutes                     |
| 7    | ×           | 18          | packet_design.u_pkt_design_sva.as_all_packets[6].but_last | 00:39:22     | el              | assert |                             |
| 8    | ×           | 4           | packet_design.u_pkt_design_sva.as_last_packet             | 00:01:16     | el              | assert |                             |





#### **Inconclusive Proofs**

| File View So | u <u>r</u> ce Trace | Tool <u>s W</u> indow <u>H</u> elp                                                                                     |                          |
|--------------|---------------------|------------------------------------------------------------------------------------------------------------------------|--------------------------|
|              |                     | npmode: EPV 🔽                                                                                                          | 10 <sup>38</sup> states! |
| VCF:GoalList |                     |                                                                                                                        | ~128 flops               |
| Time 12H     | Max Cycl            | e -1<br>Enter name Match Value>                                                                                        | 8-deep buffer            |
| status       | depth               | name                                                                                                                   | Up to 4-packets          |
| 1 🔺          | 22                  | packet_design.u_pkt_design_sva.as_all_packets[0].but_last                                                              | Packet size: 4 bit       |
|              |                     |                                                                                                                        |                          |
| 2 🔺          | 22                  | packet_design.u_pkt_design_sva.as_all_packets[1].but_last                                                              |                          |
| 2 🔺          | 22<br>23            | packet_design.u_pkt_design_sva.as_all_packets[1].but_last<br>packet_design.u_pkt_design_sva.as_all_packets[2].but_last | INCONCLUSIVE PROOF       |

Carried out on an Intel i5 7300 with 6 GB RAM





## What's the Biggest Design Provable in an Hour?

When rubber hits the ground





### **Increasing Buffer Depth**



---- 2 PACKETS WITH 1 BIT WIDE DATA





### **Increasing Number of Packets**







#### **Increasing Packet Size**



---- 2 DEEP BUFFER WITH 2 PACKETS











## **Proof Engineering**

Scalable formal verification = "Proof Engineering"

#### Invariants Assume Guarantee







## Invariants and Assume Guarantee

- Break the whole puzzle into smaller jigsaws
- Identify helper lemmas as individual components of jigsaw
- Identify how they fit together to complete the full puzzle
- PROVE helper lemmas then ASSUME them to prove other lemmas







#### Invariants Help in Proof Convergence









## Invariants and Assume Guarantee

L0: If **sample in started and sample out finished** then the tracking counter is zero.

- L1: If watched data input **has not** been sampled in then *tracking counter* is *less than or equal to* the difference between the write and the read pointers.
- L2: If watched data input **has been** sampled in but sampling out has not started then *the tracking counter is less than or equal to* the difference between the write and the read pointers.
- L3: If the **sampling out has not started** then the *very first watched data* that was sampled in the design must be residing at the location given by the function of read pointer and tracking counter.
- L4: If the **very first watched data value** has been seen at the output port then *the next data value* to be seen on the output will be the next watched data value sampled into the design.



### **Increasing Packets**











#### **Increasing Packet Size**



← 256 DEEP BUFFER WITH 8 PACKETS





## **Proof Engineering**

Scalable formal verification = "Proof Engineering"

> Invariants Assume Guarantee

#### Decomposition







## **Structural Decomposition**

#### Split wide vectors into groups of 8-bits Creates more properties to prove But each property itself is tractable

|    | status               | depth | name (C)                                                       |        | engine     | elapsed_time |
|----|----------------------|-------|----------------------------------------------------------------|--------|------------|--------------|
| 4  |                      |       | packet_design.u_pkt_design_sva.all_packets_bits[0].index_7_0   |        |            |              |
| 5  | <ul> <li></li> </ul> |       | packet_design.u_pkt_design_sva.all_packets_bits[1].index_15_8  |        | el         | 00:16:29     |
| 6  | · ·                  |       | packet_design.u_pkt_design_sva.all_packets_bits[1].index_23_16 |        | el         | 00:30:33     |
| 7  | · •                  |       | packet_design.u_pkt_design_sva.all_packets_bits[1].index_31_24 |        | e3         | 05:26:31     |
| 8  | 1                    | 1     | packet_design.u_pkt_design_sva.all_packets_bits[1].index_7_0   | _/     |            |              |
| 9  | ×                    |       | packet_design.u_pkt_design_sva.all_packets_bits[2].index_15_8  |        | e3         | 01:55:23     |
| 10 | ×                    |       | packet_design.u_pkt_design_sva.all_packets_bits[2].index_23_16 |        | e3         | 05:39:03     |
| 11 | ~                    |       | packet_design.u_pkt_design_sva.all_packets_bits[2].index_31_24 |        | e3         | 05:56:47     |
| 12 |                      |       | packet_design.u_pkt_design_sva.all_packets_bits[2].index_7_0   |        |            |              |
|    | status               | depth | name (C)                                                       | engine | elapsed_ti | me type      |
| 1  | <u> </u>             | 46    | packet_design.u_pkt_design_sva.all_packets_bits[2].index_7_0   | bl     | 02:02:0    | 9 assert     |
| 2  | ×                    |       | packet_design.u_pkt_design_sva.all_packets_bits[1].index_7_0   | el     | 00:48:2    | 4 assert     |
| 3  | ~                    |       | packet_design.u_pkt_design_sva.all_packets_bits[0].index_7_0   | el     | 00:40:1    | 8 assert     |
|    | status 🔦 depth       |       | name (C)                                                       | engine | elapsed_t  | ime type     |
| 1  | <ul> <li></li> </ul> |       | backet_design.u_pkt_design_sva.all_packets_bits[2].index_7_0   |        | 00:14:2    | 1 assert     |

10<sup>10,000</sup> states! ~32,768 flops

256-deep buffer Up to 4-packets Packet size: 32 bit

PROOF in 7 hours 54 min

Running out of compute power at this point!!





## **Scaling Compute Power**

- On a 128 GB, 18-core machine
- For a design with **500K+ flops** (10<sup>164,228</sup> states)
  - 256-deep, up to 64 packets, each packet being 32-bit
- Bug hunting
  - Random reordering bug can be found in under 20 min of run time
- Exhaustive Proof obtained in under 3 hours







92 sec to find corner case bugs!

Design flops126+ millionGates449+ millionProperty flops7.9+ millionCheck typeend-to-endCompile time35 minutes!Cut-pointingNoBlack-boxing:No

338 million flops, 1.1 billion gates, 100.2 seconds to find bugs!





SYSTEMS INITIATIVE



## Summary

- Verification of serial designs is a challenge
- What we need is: Methodology + Technology
- On a laptop we find:
  - bugs in designs with 141K flops in less than 40 min
  - proof in designs with 24K flops in less than 35 min
- On a server with 128GB RAM we find:
  - catch bugs in designs with 500k+ flops in less than 20 min
  - proof in 3 hours (256 deep, 64 packets, 32-bit packet size)
- There is no other verification paradigm quite like formal!

