



MUNICH, GERMANY OCTOBER 15-16, 2024

## G-QED for Pre-Silicon Verification Saranyu Chattopadhyay<sup>\*</sup>, Keerthi Devarajegowda<sup>+</sup>, Mo Fadiheh<sup>\*</sup> Stanford University<sup>\*</sup>, Siemens EDA<sup>+</sup>







### Traditional Verification Still a Challenge



#### Short design-to-deployment



[Siemens Wilson Research 2022]



### <u>Generalized</u> Quick Error Detection



SYSTEMS INITIATIVE

### Agenda

1. What is Formal Verification?

2. G-QED for Designers

3. G-QED Checks Deep Dive

4. G-QED Demo







## What is Formal Verification?



#### Formal Verification



// signal values t ##2 check/rd\_data == dat == b



#### Formal Verification Vs Simulation



But - Even after extensive simulation, significant state space remains uncovered





#### Formal Verification Vs Simulation



proof, verifying extensive state space in the process





### Verification Challenges









# G-QED for Designers



#### Example Design Under Verification





#### G-QED for Designers



| like Instructions | like Software-visible states |
|-------------------|------------------------------|
|                   |                              |

| Action Set | Arch State Update                                | Output  |
|------------|--------------------------------------------------|---------|
| SetVal     | Accu = Data                                      |         |
| MAC        | Accu = Accu+Data <sub>0</sub> ×Data <sub>1</sub> |         |
| OutGen     |                                                  | F(Accu) |





#### G-QED for Designers







#### G-QED Checks: Sound & Complete





#### G-QED Checks: Sound & Complete



#### RB, SAC: routinely in industry







## G-QED Checks Deep Dive



#### Functional Consistency (FC) Concept



**Expected:** Output consistent irrespective of context





#### FC Check on Action OutGen()



SYSTEMS INITIATIVE

#### FC Implementation Problem

Difficult to implement **FC** this way

arch state @  $t_1 ==$  arch state @  $t_2$  $\rightarrow$  same Output

Hard to specify exact clock cycles  $t_1$  and  $t_2$ 





#### Hard to Specify Exact Clock Cycles







#### Gets Worse Generalizing for Any Design

| Arch state elements read over multiple cycles |                                       |  |
|-----------------------------------------------|---------------------------------------|--|
|                                               |                                       |  |
|                                               |                                       |  |
| Pood                                          | cuclos non contiguous                 |  |
| Reau                                          | cycles non-contiguous                 |  |
|                                               |                                       |  |
|                                               |                                       |  |
| Same arch sta                                 | te element read <b>multiple times</b> |  |
|                                               |                                       |  |
|                                               | (2024                                 |  |





#### Solution: G-QED



OUT<sub>i</sub> = f(IN<sub>i</sub>, arch state value used by IN<sub>i</sub>)





#### Solution: G-QED





#### Solution: G-QED



acce

SYSTEMS INITIATIVE



#### **Response Bound**







#### Response Bound Bug









### Single Action Correctness

- FC takes care of bugs related to sequence of instructions or inputs
- Single Action Correctness can be checked from a known reachable state
  - From *some* reachable state (e.g., reset state)









#### Single Action Correctness

- Single Action Correctness is straightforward to apply
- However, writing SVA (properties) for complex operations can still be challenging (ex: Floating Point Units)



### Single Action Correctness for Floating Point Units

#### **Challenges**

- Complex IEEE 754 floating-point specification
  - Arithmetic and comparison operations
  - Bfloat16, half, single, and double precision
  - Five rounding modes
  - Five exception flags
- IEEE 754 design and verification experts in short supply
- Simulation cannot guarantee compliance

| Floating-point essential for advanced |
|---------------------------------------|
| artificial intelligence applications  |

Multiply/add loops in convolution layer of a CNN

| <pre>for(r=0; r<r; pre="" r++)<=""></r;></pre>                              | //output feature map           |
|-----------------------------------------------------------------------------|--------------------------------|
| <pre>for (q=0; q<q; pre="" q++)<=""></q;></pre>                             | //input feature map            |
| <pre>for (m=0; m<m; m++)<="" pre=""></m;></pre>                             | //row in feature map           |
| <pre>for (n=0; n<n; n++)<="" pre=""></n;></pre>                             | //column in feature map        |
| <pre>for(k=0; k<k; k++)<="" pre=""></k;></pre>                              | //row in convolution kerenel   |
| for (1=0; 1 <l; 1++)<="" td=""><td>//column in convolution kernel</td></l;> | //column in convolution kernel |
| Y[r][m][n]+=W[r]                                                            | [q][k][l]*X[q][m+k][n+l];      |



IEEE 754 Half / Single / Double / ... Precision

#### **Only formal can prove compliance**





#### Questa FPU

#### Ensure FPU designs meet expectations

- Prove correctness with easy setup, without C++ model
- Supports broad array of FPU functionality
  - Half/single/double, bfloat16 and custom precisions
  - All 5 rounding modes and 5 exceptions flags
  - ADD, SUB, MUL, DIV, FMA, ABS, NEG functions
  - Conversion and comparison operations
- Parameters specify ambiguities in the standard
- Easy to model intended deviations from the IEEE-754 standard
- Enables RISC-V configuration







#### Built-in template assertion

- All standard-derived information part of template
  - User provides mapping of FPU DUT to SVA module
  - Design specifics like encoding of operations, modes, and concrete protocol added by user
  - Extraction supported by numerous covers to extract timing and encoding







### Built-in template assertion

<u>File Edit Tools Syntax Buffers Window H</u>elp

#### 🖻 🗔 🔚 占 [ 🥱 🎓 [ 💥 🗊 🖸 [ 🖄 🌳 🧇 [ 🖓 🚱 [ 🍕 📰 🥥 [ 🕱 🐯

| <pre>module fp_checker import ieee_754 single_precision::*; #(</pre>                                                                                                                                            |           |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
| <pre>conf_t conf = '{UNF_BEFORE_ROUNDING: 1'b1, // set to 0 if tinyness is computed after rounding<br/>INV FMA QNAN: 1'b1, default:1'b1}, // set to 0 if fma of 0*Inf+NaN is not raising an INV exception</pre> |           |
| ieee_flags_t supported_flags = '{INX: 1'b1, // set to 0 to skip 'inexact' exception check                                                                                                                       |           |
| INV: 1'b1, // set to 0 to skip 'invalid' exception check                                                                                                                                                        |           |
| UNF: 1'b1, // set to 0 to skip 'underflow' exception check                                                                                                                                                      |           |
| OVF: 1'b1, // set to 0 to skip 'overflow' exception check<br>DIV0: 1'b1}) (// set to 0 to skip 'division by 0' exception check                                                                                  |           |
| input CLk,                                                                                                                                                                                                      |           |
| <pre>input reset_n,</pre>                                                                                                                                                                                       |           |
| (* OSS_TCL_VALUE_PRINTER = "ieee754_fp" *)                                                                                                                                                                      |           |
| input ieee_t op_a, // DUT source operands (* OSS TCL VALUE PRINTER = "ieee754 fp" *)                                                                                                                            |           |
| (* OSS_TCL_VALUE_PRINTER = "ieee754_fp" *)<br>input ieee t op b, // DUT source operands Can mix and match different pre                                                                                         | ecisions  |
| (* OSS_TCL_VALUE_PRINTER = "ieee754_fp" *)                                                                                                                                                                      |           |
| <pre>input ieee_t op_c='0, // DUT fma 3rd source operands</pre>                                                                                                                                                 |           |
| <pre>input start=1'b1, // DUT start of computation with source IEEE flags definition input[1:0] rmode='0, // rounding mode in DUT encoding</pre>                                                                |           |
| <pre>// TODO: adapt bit width of fpu operation encoding</pre>                                                                                                                                                   |           |
| <pre>input[2:0] fpu_op=0, // signal encoding fpu operation in DUT (please increase bitwidth if needed)</pre>                                                                                                    |           |
| (* OSS_TCL_VALUE_PRINTER = "ieee754_fp" *)                                                                                                                                                                      |           |
| <pre>input ieee_t result, // DUT result of computation input result valid, // DUT result valid</pre>                                                                                                            |           |
| sva/fp template.sv                                                                                                                                                                                              |           |
| // end of covers                                                                                                                                                                                                |           |
|                                                                                                                                                                                                                 |           |
| endmodule                                                                                                                                                                                                       |           |
| bind fpu fp checker #(                                                                                                                                                                                          |           |
| .supported flags('{INV:1'b0, default:1'b1})                                                                                                                                                                     |           |
| ) fp_checker (                                                                                                                                                                                                  |           |
| <pre>// leave yet unknown signals (like fpu_op, rmode or flags) unconnected and reverse-engineer with contained covers</pre>                                                                                    |           |
| .clk(clk_i),<br>.reset n(1'b1),                                                                                                                                                                                 |           |
| .start(start i), // DUT start of computation with source operands                                                                                                                                               |           |
| .fpu_op(fpu_op_i), // signal encoding fpu operation in DUT Connect to the design via bind s                                                                                                                     | statement |
| .op_a(opa_i), // DUT source operand a                                                                                                                                                                           |           |
| .op_b(opb_i), // DUT source operand b<br>.op c(), // DUT source operand c                                                                                                                                       |           |
| .rmode(rmode i), // rounding mode in DUT encoding                                                                                                                                                               |           |
| .result(output_o), // DUT result of computation                                                                                                                                                                 |           |
| .result_valid(ready_o),// DUT result valid                                                                                                                                                                      |           |
| .INX_flag(ine_o), // DUT 'inexact' flag<br>.OVF flag(overflow o), // DUT 'overflow' flag                                                                                                                        |           |
| .UNF flag(underflow o), // DUT 'underflow' flag (aka tiny)                                                                                                                                                      |           |
| .DIV0 flag(div zero o), // DUT 'division by 0' flag                                                                                                                                                             |           |
| .INV flag(); // DUT 'invalid' flag                                                                                                                                                                              |           |





#### Built-in template assertion





green: provided by App blue : provided by user





#### Assertions and cover properties







#### Questa FPU

#### Ensure FPU designs meet expectations

- FPU operations are tedious and difficult to verify using simulation
- Questa FPU solution has the building blocks to construct simple readable properties
- FPU App reduces verification from months down to days
- Provers and disprovers performance enables bug finding in minutes
- Full proof is possible on restricted cases
- Bounded proof is available for all cases
- FPU App provides comprehensive verification
  - Fully compliant external reference model increases verification confidence





## G-QED DEMO





Check: 
$$IN == IN_k \rightarrow OUT == OUT_k$$







## Questions

