# Verification with multi-core parallel simulations: Have you found your sweet spot yet?

# Design types and applications suitable for multi-core simulations

#### Serial-scan chain and MBIST Design-For-Test (DFT) Simulations

|         | Design type                             | Single-core sim runtime | Number of cores | Speedup with Questa MC2 |
|---------|-----------------------------------------|-------------------------|-----------------|-------------------------|
| Design1 | SoC with MBIST test                     | 1 hour                  | 5               | 4.65x                   |
| Design2 | SoC with serial scan test               | 12 hours                | 6               | 3x                      |
| Design3 | Network SoC design with DFT             | 16 hours                | 4               | 3.8x                    |
| Design4 | SoC sub-block GLS with DFT              | 4 hours                 | 4               | 2.5x                    |
| Design5 | Serial scan ATPG test (with SDF timing) | 2 hours                 | 4               | 1.8x                    |
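One way to compare these rows is parallel efficiency, meaning speedup divided by the number of cores. The poster does not report this metric; the sketch below simply derives it from the table values, and the names `dft_results` and `parallel_efficiency` are illustrative.

```python
# Parallel efficiency = speedup / cores.
# This is a standard derived metric, not a figure reported on the poster.

# (cores, speedup) pairs copied from the DFT table above.
dft_results = {
    "Design1": (5, 4.65),
    "Design2": (6, 3.0),
    "Design3": (4, 3.8),
    "Design4": (4, 2.5),
    "Design5": (4, 1.8),
}


def parallel_efficiency(cores: int, speedup: float) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup / cores


efficiency = {
    name: parallel_efficiency(cores, speedup)
    for name, (cores, speedup) in dft_results.items()
}

for name, eff in efficiency.items():
    print(f"{name}: {eff:.2f}")
```

Read this way, Design1 and Design3 scale close to linearly, while Design2 reaches about half of the ideal on 6 cores.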

#### FPGA and FPGA Fabric Designs

|          | Design type         | Single-core<br>sim runtime | Number of cores | Speedup with<br>Questa MC2 |
|----------|---------------------|----------------------------|-----------------|----------------------------|
| Design6  | FPGA Fabric GLS     | -                          | 4               | 4.5x                       |
| Design7  | TDD LTE FPGA design | 7 hours                    | 4               | 2.3x                       |
| Design8  | FPGA VHDL           | 30 minutes                 | 4               | 1.9x                       |
| Design9  | SoC-FPGA Verilog    | 28 hours                   | 5               | 3x                         |
| Design10 | FPGA chip design    | 10 hours                   | 4               | 3.6x                       |

#### Multi-core Processor, SoC or Server Designs

|          | Design type                                                                | Single-core<br>sim runtime | Number of cores | Speedup with Questa MC2 |
|----------|----------------------------------------------------------------------------|----------------------------|-----------------|-------------------------|
| Design11 | Multi-core processor GLS (with SDF timing)                                 | 53 hours                   | 4               | 2.3x                    |
| Design12 | VHDL RTL SoC                                                               | 1 hour to 4.5<br>days      | 4               | 2.4x                    |
| Design13 | Microprocessor, Verilog RTL                                                | 1 hour                     | 5               | 3x                      |
| Design14 | High-bandwidth memory with DRAM cell matrix; stimuli in VECTOR file format | 5 hours                    | 3               | 1.3x                    |
| Design15 | ASIC Server design                                                         | -                          | 4               | 2.5x                    |

#### Image, Graphics and Video processing Designs

|          | Design type                             | Single-core sim runtime | Number of cores | Speedup with Questa MC2 |
|----------|-----------------------------------------|-------------------------|-----------------|-------------------------|
| Design16 | ASIC video processing design            | 6.25 hours              | 7               | 2.5x                    |
| Design17 | Graphics design                         | 7 hours                 | 4               | 2.3x                    |
| Design18 | Graphics design with debug logging      | 27 hours                | 4               | 2.7x                    |
| Design19 | Video Processing design                 | < 1 hour                | 4               | 1.2x                    |
| Design20 | Image processing design, Verilog<br>RTL | 2 hours                 | 5               | 2x                      |

#### Designs with long simulation runtimes

|          | Design type                                       | Single-core<br>sim runtime | Number of cores | Questa MC2<br>sim runtime | Speedup with Questa MC2 |
|----------|---------------------------------------------------|----------------------------|-----------------|---------------------------|-------------------------|
| Design11 | Multi-core processor GLS design (with SDF timing) | 53 hours                   | 4               | 23 hours                  | 2.3x                    |
| Design17 | Graphics design                                   | 7 hours                    | 4               | 3 hours                   | 2.3x                    |
| Design10 | FPGA chip design                                  | 10 hours                   | 4               | < 3 hours                 | 3.6x                    |
| Design3  | Network SoC design with DFT                       | 16 hours                   | 4               | ~4 hours                  | 3.8x                    |
| Design2  | SoC with serial scan test                         | 12 hours                   | 6               | 4 hours                   | 3x                      |
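The multi-core runtimes in this table are consistent with dividing the single-core runtime by the reported speedup. The following is a minimal sketch of that arithmetic, assuming speedup is defined as single-core time over multi-core time; the function name `multicore_runtime` is illustrative and not part of any tool.

```python
def multicore_runtime(single_core_hours: float, speedup: float) -> float:
    """Estimated multi-core wall-clock time, assuming speedup = T_single / T_multi."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return single_core_hours / speedup


# (single-core hours, speedup) pairs copied from the table above.
long_runs = {
    "Design11": (53, 2.3),
    "Design17": (7, 2.3),
    "Design10": (10, 3.6),
    "Design3": (16, 3.8),
    "Design2": (12, 3.0),
}

estimates = {
    name: multicore_runtime(hours, speedup)
    for name, (hours, speedup) in long_runs.items()
}

for name, hours in estimates.items():
    saved = long_runs[name][0] - hours
    print(f"{name}: about {hours:.1f} hours, saving about {saved:.1f} hours")
```

For Design11, for example, the estimate is roughly 23 hours, which turns a run of more than two days into one that finishes within a day.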

#### Long debug turnaround times

|          | Design type         | Single-core sim runtime | Number of cores | Questa MC2<br>sim runtime | Speedup with<br>Questa MC2 |
|----------|---------------------|-------------------------|-----------------|---------------------------|----------------------------|
| Design18 | Graphics design     | 27 hours                | 4               | 10 hours                  | 2.7x                       |
| Design7  | TDD LTE FPGA design | 7 hours                 | 4               | 3 hours                   | 2.3x                       |

# Design types and applications not suitable for multi-core simulations

Very heavy inter-partition communication overhead that offsets multi-core speedup

Produces parallel, balanced and distributed activity in the design

Insufficient parallel and balanced activity in scan chains

Clean and balanced design partitions with long runtimes

Sequential nature of simulation causes overheads that defeat multi-core gains

Multiple balanced processing elements/cores, highly parallel simulation activity

Design cannot be partitioned through encrypted region, hence unbalanced partitions

Typically have multiple parallel processing cores and long runtimes

Better regression throughput is achieved through superior grid/resource utilization

Heavy testbench or other … blocks

#### Designs with heavy hierarchical references across the design

|          | Design type              | Single-core sim runtime | Number of cores | Speedup with Questa MC2 |
|----------|--------------------------|-------------------------|-----------------|-------------------------|
| Design21 | Scan chain test with SDF | 2.5 hours               | 4               | -14.5x                  |

#### Parallel-load Design-For-Test (DFT) Simulations

|          | Design type                                     | Single-core<br>sim runtime | Number<br>of cores | Speedup with Questa MC2 |
|----------|-------------------------------------------------|----------------------------|--------------------|-------------------------|
| Design22 | Multi-media MBIST design with parallel load DFT | 51 hours                   | 4                  | 1.24x                   |
| Design23 | GLS SoC DFT                                     | 10.5 hours                 | 4                  | 1.3x                    |

#### Designs with heavy serial simulation activity

|          | Design type      | Single-core<br>sim runtime | Number of cores | Speedup with Questa MC2 |
|----------|------------------|----------------------------|-----------------|-------------------------|
| Design24 | SerDes           | < 10 minutes               | 3               | 1.0x                    |
| Design25 | OVM based design | 1 hour                     | 2               | -1.2x                   |
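A common way to reason about why serial activity caps the gain is Amdahl's law. The poster does not cite it, so the sketch below is only an illustrative model; `parallel_fraction` is a hypothetical input (the share of simulation work that can run concurrently), not a value measured for these designs.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: hypothetical share of the work that parallelizes (0.0 to 1.0).
    cores: number of cores used.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A mostly serial workload gains very little, even on 3 cores.
mostly_serial = amdahl_speedup(parallel_fraction=0.1, cores=3)

# A mostly parallel workload gets much closer to the core count.
mostly_parallel = amdahl_speedup(parallel_fraction=0.9, cores=4)

print(f"mostly serial:   {mostly_serial:.2f}x")
print(f"mostly parallel: {mostly_parallel:.2f}x")
```

This model never predicts a slowdown, because it ignores synchronization and communication costs, so it cannot account for the negative entries in the table above.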

#### Designs with heavy encrypted IP/blocks

|          | Design type                                              | Single-core sim runtime | Number of cores | Speedup with Questa MC2 |
|----------|----------------------------------------------------------|-------------------------|-----------------|-------------------------|
| Design26 | Design with encrypted 3<sup>rd</sup> party IP            | 80 minutes              | 2               | 1.13x                   |
| Design27 | VHDL image processing design with encrypted ASIC library | 3 hours                 | 4               | 1.27x                   |

#### Short/bulk tests, and regression tests

|          | Design type     | Single-core<br>runtimes of<br>individual tests | Number<br>of cores | Speedup with<br>Questa MC2 |
|----------|-----------------|------------------------------------------------|--------------------|----------------------------|
| Design19 | Video Processor | 7.5 – 21 minutes                               | 4                  | 1.2x                       |
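For short tests, the 1.2x figure can be compared with using the same four cores to run four independent single-core tests at once. The sketch below is a rough model under stated assumptions: every test takes the same time, tests are independent, and grid scheduling and licensing overheads are ignored. The count of 100 tests is hypothetical, and the 21-minute duration is the upper end of the range in the table.

```python
import math


def wall_clock_multicore(num_tests: int, test_minutes: float, speedup: float) -> float:
    """Tests run one after another, each one using all cores at the given speedup."""
    return num_tests * test_minutes / speedup


def wall_clock_farm(num_tests: int, test_minutes: float, cores: int) -> float:
    """Single-core tests run concurrently, one per core, in equal-length batches."""
    return math.ceil(num_tests / cores) * test_minutes


NUM_TESTS = 100      # hypothetical regression size
TEST_MINUTES = 21    # upper end of the Design19 range
CORES = 4
SPEEDUP = 1.2        # Design19 speedup from the table

multicore_total = wall_clock_multicore(NUM_TESTS, TEST_MINUTES, SPEEDUP)
farm_total = wall_clock_farm(NUM_TESTS, TEST_MINUTES, CORES)

print(f"multi-core per test: about {multicore_total:.0f} minutes")
print(f"one test per core:   about {farm_total:.0f} minutes")
```

Under these assumptions, spreading single-core tests across the cores finishes the regression in well under half the time, which is why short and bulk tests are poor candidates for multi-core simulation.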

Complete simulations faster, with reduced latency

#### Long simulations with unbalanced simulation activity

|          | Design type       | Single-core<br>sim runtime | Number of cores | Speedup with Questa MC2 |
|----------|-------------------|----------------------------|-----------------|-------------------------|
| Design25 | OVM based design  | 1 hour                     | 2               | -1.2x                   |
| Design28 | Networking design | 77 minutes                 | 4               | 1.2x                    |
| Design29 | Video design      | 2.5 hours                  | 4               | -1.8x                   |

Improve simulation turnaround time for faster debug

A discussion of the design qualification criteria is available in the reference paper:

Shobana Sudhakar and Rohit K Jain, "The Need for Speed: Understanding design factors that make multi-core parallel simulations efficient," DVCON 2013.

Rohit K Jain rohit\_jain@mentor.com



Shobana Sudhakar shobana\_sudhakar@mentor.com

