

## Next Gen System Design and Verification for Transportation

David Aerne

Jacob Wiltgen

**Richard Pugh** 







### Changes in Automotive

#### Consumer demands are reshaping the industry





#### Impact on Tier1/Tier2 suppliers

To realize level 4/5, design and verification must evolve

#### Intelligence moving to the edge







#### Challenges facing Tier1/Tier2

To realize level 4/5, design and verification must evolve

- Explosion in design complexity and size to address automotive market needs
- Continually changing algorithms and sensors
- Satisfying functional requirements and ensuring functional safety
- Volume of testing required to reach Level 5 autonomy
  - Millions of tests and billions of road miles
- System and Systems of Systems testing
  - V2X
  - Sense-Compute-Actuate







### **Tutorial Overview**

A workflow demonstrating the use of HLS and emulation in a safety-critical application







## Using HLS to rapidly develop AI Algorithms





## Computer Vision/AI Application Challenges

Automotive and other "real-time" applications are especially challenging

- Computationally very expensive
  - Billions of operations/second
- High responsiveness required
  - High-bandwidth and low-latency
  - Real-time processing of data required
- ADAS solution must consume less than 100 W
- Continually evolving algorithms and sensors
- Each provider wants to add their "secret sauce"







### Convolutional Neural Networks: Training vs Inference (Embedded AI)

#### Training

Compute intensive, very large datasets and memory, CPU/GPU farms, floating point required

#### Inference

Uses data from the trained network; the end system often has real-time requirements; maps to FPGA/ASIC and/or dedicated HW; fixed point; power is often a consideration





accellera

SYSTEMS INITIATIVE

#### Next-generation CV Designs Require Parallelism

Fully connected layers use matrix multiplication

- Convolutional Neural Networks use lots of 2-d convolutional filters
  - Billions of multiply-accumulate operations per second
- Multiple convolutional layers
- Networks are constantly evolving
  - Data rates, number of layers, image size, etc.
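To make the multiply-accumulate workload concrete, here is a minimal sketch of a single-channel 3x3 filter applied directly, with a counter for the MAC operations. The `Image` type and the function name are illustrative assumptions, not code from the tutorial. As is usual for CNNs, the kernel is applied without flipping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major single-channel image (illustrative type, not from the tutorial).
struct Image {
    std::size_t rows, cols;
    std::vector<std::int32_t> px;
    std::int32_t at(std::size_t r, std::size_t c) const { return px[r * cols + c]; }
};

// "Valid" 3x3 filter (no padding): the output shrinks by 2 in each dimension.
// mac_count is incremented once per multiply-accumulate.
Image conv3x3(const Image& in, const std::int32_t k[3][3], std::uint64_t& mac_count) {
    assert(in.rows >= 3 && in.cols >= 3 && in.px.size() == in.rows * in.cols);
    Image out{in.rows - 2, in.cols - 2,
              std::vector<std::int32_t>((in.rows - 2) * (in.cols - 2))};
    for (std::size_t r = 0; r < out.rows; ++r) {
        for (std::size_t c = 0; c < out.cols; ++c) {
            std::int32_t acc = 0;
            for (std::size_t i = 0; i < 3; ++i) {
                for (std::size_t j = 0; j < 3; ++j) {
                    acc += in.at(r + i, c + j) * k[i][j];
                    ++mac_count;
                }
            }
            out.px[r * out.cols + c] = acc;
        }
    }
    return out;
}
```

Every output pixel costs nine MACs per input channel, which is why the operation count grows so quickly with image size, channel count, and layer count.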





#### Numerous Hardware/Memory CNN Architectures






#### What are the Choices for Hardware Platform?

There is no clear winner today as this market is emerging

- CPU
  - Not fast or efficient enough
- DSP
  - Good at image processing but not enough performance for deep AI
- GPU
  - Good at training but too power hungry for a long-term inference solution
- FPGA
  - Mostly meets performance/latency; not the lowest power; cost eventually becomes a problem at volume; RTL flow not practical
- ASIC
  - Lowest power, meets performance/latency, lowest volume cost; high NRE and no field modifications/upgrades; algorithms still changing; RTL flow not practical
- Dedicated AI and CV processors or accelerators in IP and ASIC
  - Popping up like weeds; high performance, but they lock the customer in, and many target servers
- Some scalable combination of the above



Flexibility





#### High-Level Synthesis – Design at a Higher Level

- HLS generates high quality RTL from C++/SystemC level descriptions
  - Micro-architecture exploration is accelerated
    - Parallelism, Throughput, Latency, Area (loop unrolling and pipelining)
    - Memories vs. Registers (resource allocation)
    - Integrated Power estimation
  - Library of IP components, e.g. math, DSP, video algorithms
  - ASIC and FPGA target support
- Verify at a higher level of abstraction (HLS-ready C++ source)
  - Perform Formal checks prior to synthesis
  - Simulate 50-1000x faster
  - Achieve Code and Functional coverage goals
- Post RTL generation verification and optimization
  - SLEC HLS to formally prove C++ model and RTL equivalency
  - Integrated Power Optimization
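The area-versus-throughput trade behind loop unrolling can be sketched in plain C++. The rolled loop below corresponds to one multiplier reused on every iteration; the hand-unrolled version exposes four independent multiply-accumulates per iteration, which hardware can execute in parallel. In an HLS flow the same source would normally stay rolled and the unroll factor would be set as a tool constraint, so the manual rewrite here is only an illustration, and the function names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 16;

// Rolled form: one multiplier, N sequential iterations.
std::int32_t dot_rolled(const std::int16_t a[N], const std::int16_t b[N]) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < N; ++i) {
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}

// Unrolled by 4: four independent accumulators, N/4 iterations.
// In hardware this maps to four multipliers working side by side.
std::int32_t dot_unrolled4(const std::int16_t a[N], const std::int16_t b[N]) {
    static_assert(N % 4 == 0, "N must be a multiple of the unroll factor");
    std::int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (std::size_t i = 0; i < N; i += 4) {
        acc0 += static_cast<std::int32_t>(a[i + 0]) * b[i + 0];
        acc1 += static_cast<std::int32_t>(a[i + 1]) * b[i + 1];
        acc2 += static_cast<std::int32_t>(a[i + 2]) * b[i + 2];
        acc3 += static_cast<std::int32_t>(a[i + 3]) * b[i + 3];
    }
    return (acc0 + acc1) + (acc2 + acc3);  // small adder tree at the end
}
```

Both functions compute the same result; only the amount of work exposed per iteration differs.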









### Why HLS is So Much More Productive than RTL

• HLS separates functionality from implementation with powerful tool capabilities for controlling implementation

Functionality described in C++ or SystemC



HLS Tool Implementation Control

#### Automatically

- Builds concurrent RTL from C++ Classes
- Adds Interfaces and Infers memories
- Constraints drive architecture
- Constraints drive parallelism (unrolling)
- Resource sharing for minimal area
- Schedules operations to close Timing
- Implements Power optimizations





## YOLO Tiny\* progressive refinement

- YOLO Tiny: Real-time object detection and classification CNN
- This YOLO Tiny demo is based on Google's open-source *TensorFlow* machine learning framework and is written in Python
- The intent of the demo is to show techniques for progressive refinement from high-level abstracted TensorFlow CNN layer models written in Python down to HLS-synthesized RTL, i.e.,

Original TensorFlow code  $\implies$  HLS ready C++ blocks  $\implies$  Synthesized RTL blocks






## YOLO Tiny\*

- Real-time object detection and classification
  - Detects over 20 different objects
  - Yet even this "small" CNN is computationally intensive
  - Over 70 Billion MAC/s using over 25 million weights
- Made up of mostly 2-d convolution and pooling layers
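A back-of-the-envelope estimate shows where numbers of this size come from. The sketch below counts MACs and weights for one convolution layer; the layer shape used in the usage note (208x208 output, 16 input channels, 32 output channels, 3x3 kernels) is an illustrative assumption rather than a figure stated on the slide.

```cpp
#include <cassert>
#include <cstdint>

// Shape of one convolution layer (illustrative struct, not from the tutorial).
struct ConvLayer {
    std::uint64_t out_h, out_w;    // output feature map size
    std::uint64_t in_ch, out_ch;   // channel counts
    std::uint64_t k;               // square kernel size
};

// One MAC per kernel tap, per input channel, per output channel, per output pixel.
constexpr std::uint64_t macs_per_frame(const ConvLayer& l) {
    return l.out_h * l.out_w * l.in_ch * l.out_ch * l.k * l.k;
}

// Number of weights the layer must store or fetch.
constexpr std::uint64_t weight_count(const ConvLayer& l) {
    return l.in_ch * l.out_ch * l.k * l.k;
}
```

For the assumed shape, `macs_per_frame` gives 199,360,512 MACs for a single frame of a single layer, while `weight_count` gives only 4,608 weights. Multiplying by the number of layers and the frame rate gives the per-second load.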










### Memory Architecture and Power Considerations

- Keeping data local is key to minimizing power consumption
  - Very important for ASIC
- Floating-point is costly
  - Used in training of networks
  - Not needed in inference engine
- Fixed-point bit-widths don't need to be powers of two





Energy numbers are from Mark Horowitz "Computing's Energy Problem (and what we can do about it)", ISSCC 2014





#### Memory Inference and 0-time Back-Door Memory Accesses

- Large C++ arrays automatically mapped to ASIC or FPGA memories
- Arrays on the design interface can be synthesized as memory interfaces
- Internal arrays synthesized to instantiated (black boxed) memories

#### Catapult Architectural Constraints View



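    // 2-D convolution stage, defined elsewhere in the design.
    void conv2d(int in[1080][1920], int kernel[3][3], int out[1080][1920]);

    void top(int inputImg[1080][1920], int kernel0[3][3], int kernel1[3][3],
             int outputImg[1080][1920]) {
      int img[1080][1920];  // internal array, mapped to an instantiated memory

      conv2d(inputImg, kernel0, img);   // interface array in, internal memory out
      conv2d(img, kernel1, outputImg);  // internal memory in, interface array out
    }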









### Precise Modeling of Bit-accuracy

- HLS uses exact bit-widths to meet specification and save power/area
  - Bit-widths need not be powers of two (4, 8, 16, 32, 64 bits)
- Rapid simulation of true hardware behavior
- RTL is correct by construction
  - Precise consistency of representation and simulation results between C++ algorithm and synthesized RTL
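Bit-accurate modeling can be imitated in standard C++ for illustration. The sketch below quantizes a value to a signed fixed-point format with `W` total bits and `I` integer bits, truncating toward negative infinity and saturating at the range limits. HLS flows typically use dedicated bit-accurate datatype libraries for this; the `Fixed` template here is a hypothetical stand-in, and the truncate-and-saturate behavior is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Signed fixed-point value: W total bits, I integer bits (sign included),
// so W - I fractional bits. W does not have to be a power of two.
template <int W, int I>
struct Fixed {
    static_assert(W >= 2 && W <= 32 && I >= 1 && I <= W, "unsupported format");
    static constexpr int kFrac = W - I;
    static constexpr std::int64_t kMax = (std::int64_t{1} << (W - 1)) - 1;
    static constexpr std::int64_t kMin = -(std::int64_t{1} << (W - 1));

    std::int64_t raw;  // stored integer, value = raw / 2^kFrac

    // Quantize: truncate toward negative infinity, then saturate.
    static Fixed from_double(double x) {
        assert(std::isfinite(x));
        double scaled = std::floor(x * static_cast<double>(std::int64_t{1} << kFrac));
        if (scaled > static_cast<double>(kMax)) scaled = static_cast<double>(kMax);
        if (scaled < static_cast<double>(kMin)) scaled = static_cast<double>(kMin);
        return Fixed{static_cast<std::int64_t>(scaled)};
    }

    double to_double() const {
        return static_cast<double>(raw) / static_cast<double>(std::int64_t{1} << kFrac);
    }
};
```

With an 11-bit format such as `Fixed<11, 3>`, the value 0.3 is stored as the integer 76 and reads back as 0.296875, so the quantization error that the hardware will exhibit is already visible in the C++ simulation.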









## 2-D Convolution with Windowing IP








### Catapult and Veloce Solve the Verification Bottleneck

- Quickly verify synthesizable HLS C++ and RTL in the TensorFlow environment
  - Test the quantized HLS against the floating point model in TensorFlow
- Reduce RTL verification from hours to minutes



## YOLOTiny: Selected layers of TensorFlow testbench broken out to HLS-ready C++ implementation-targeted algorithms






# YOLOTiny: C++ implementations of CNNs replaced with synthesized RTL blocks



1-stage breakout

9-stage breakout



## Easily Test HW Models and RTL Quickly

- Swap any layer or the entire design
  - HLS C++ executable or RTL running on Veloce is a Python function call in TensorFlow







#### Memory Access Bottleneck Changes Between Layers

- Some CNN Layers have millions of coefficients
  - Not possible to store everything locally for all layers
- Every layer is different
  - Weight vs feature map storage



#### YOLO Tiny CNN



### **Quickly Explore CNN Architectures Using HLS**

- HLS constraints allow architectural exploration
  - Massive parallelism is possible (if fed efficiently)
  - Evaluate PPA across multiple architectures
- Easily code multiple architectures in C++
  - Sliding-window architecture processes fmap data in raster order
  - In-place architecture reads weights once



#### YOLO Tiny

#### **Original Algorithm**




### Hybrid Architecture for Minimizing RAM Access

- Dual-layer architecture
  - Based on feature map and weight storage requirements
  - DRAM traffic is minimized
- Sliding-window architecture processes fmap data in raster order
  - Weights stored locally in ROM
  - Windowed fmap data provides local reuse
- In-place architecture reads weights once
  - Caches weights locally for reuse

Figure labels: Sliding-Window Convolution / Max Pooling stages, FIFO, AXI4 stream, **Off-chip DRAM**






## Sliding Window Architecture (Layers 1-4)

- Reordering Loops Facilitates Loop Unrolling
- Use "sliding window" to store fmap data locally
- Small number of weights, store locally

**Original Algorithm**

**Reordered Loops** 





## Sliding Window Architecture

- "windowed" fmap data and kernel data stored in registers
- Multiple output channel data can be computed in parallel
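The sliding-window idea can be sketched for a single channel as follows. Pixels arrive in raster order and each is read exactly once; two line buffers hold the previous two rows, and a 3x3 register window shifts by one column per pixel. The function name and the single-channel simplification are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 3x3 "valid" filter over a row-major image streamed in raster order.
std::vector<std::int32_t> conv3x3_sliding(const std::vector<std::int32_t>& in,
                                          std::size_t rows, std::size_t cols,
                                          const std::int32_t k[3][3]) {
    assert(rows >= 3 && cols >= 3 && in.size() == rows * cols);
    std::vector<std::int32_t> line0(cols, 0);  // row r-2
    std::vector<std::int32_t> line1(cols, 0);  // row r-1
    std::int32_t win[3][3] = {};               // register window
    std::vector<std::int32_t> out;
    out.reserve((rows - 2) * (cols - 2));

    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const std::int32_t px = in[r * cols + c];  // single read per pixel

            // Shift the window left and load the new right-hand column.
            for (int i = 0; i < 3; ++i) {
                win[i][0] = win[i][1];
                win[i][1] = win[i][2];
            }
            win[0][2] = line0[c];
            win[1][2] = line1[c];
            win[2][2] = px;

            // Age the line buffers for the next row.
            line0[c] = line1[c];
            line1[c] = px;

            // The window is fully valid once two rows and two columns have passed.
            if (r >= 2 && c >= 2) {
                std::int32_t acc = 0;
                for (int i = 0; i < 3; ++i)
                    for (int j = 0; j < 3; ++j)
                        acc += win[i][j] * k[i][j];
                out.push_back(acc);
            }
        }
    }
    return out;
}
```

Because the nine window values live in registers, all nine multiplies can be unrolled and fed every cycle without touching memory again.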







## In-place Architecture (Layers 5-9)

- Layers processed one after another
- Feature maps stored locally in SRAM
- Weights read from system memory





### Reordering Loops to Keep Feature Map Data Local

- Loops are organized so that weights are only read once from system DRAM
- Weights are held stationary across feature maps
- Feature maps are computed in order and stored in local SRAM
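A minimal sketch of this loop ordering, assuming 3x3 kernels and a "valid" output size: the channel loops are outermost, each kernel is fetched once into a small local cache, and it is then held stationary while the whole feature map is swept. The `Shape` struct, the memory layouts, and the read counter standing in for DRAM traffic are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Shape { std::size_t in_ch, out_ch, rows, cols; };

// weights layout: [out_ch][in_ch][3][3]; fmap_in layout: [in_ch][rows][cols].
// weight_reads counts simulated fetches from external memory.
std::vector<std::int32_t> conv_weight_stationary(const Shape& s,
                                                 const std::vector<std::int32_t>& fmap_in,
                                                 const std::vector<std::int32_t>& weights,
                                                 std::uint64_t& weight_reads) {
    assert(s.rows >= 3 && s.cols >= 3);
    assert(fmap_in.size() == s.in_ch * s.rows * s.cols);
    assert(weights.size() == s.out_ch * s.in_ch * 9);
    const std::size_t orows = s.rows - 2, ocols = s.cols - 2;
    std::vector<std::int32_t> fmap_out(s.out_ch * orows * ocols, 0);  // stands in for local SRAM

    for (std::size_t oc = 0; oc < s.out_ch; ++oc) {
        for (std::size_t ic = 0; ic < s.in_ch; ++ic) {
            // Fetch this kernel exactly once and keep it local.
            std::int32_t k[9];
            for (std::size_t i = 0; i < 9; ++i) {
                k[i] = weights[(oc * s.in_ch + ic) * 9 + i];
                ++weight_reads;
            }
            // Sweep the whole feature map with the stationary kernel.
            for (std::size_t r = 0; r < orows; ++r) {
                for (std::size_t c = 0; c < ocols; ++c) {
                    std::int32_t acc = 0;
                    for (std::size_t i = 0; i < 3; ++i)
                        for (std::size_t j = 0; j < 3; ++j)
                            acc += fmap_in[(ic * s.rows + r + i) * s.cols + c + j] * k[i * 3 + j];
                    fmap_out[(oc * orows + r) * ocols + c] += acc;
                }
            }
        }
    }
    return fmap_out;
}
```

With 2 input channels, 3 output channels, and a 4x4 input, this ordering performs 54 weight reads. An ordering with the pixel loops outermost that refetches weights for each of the 4 output pixels would perform 216.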

#### **Reordered Loops**







#### Flush out issues via formal analysis of HLS design source

- Quickly and easily find coding bugs and errors before synthesis or simulation
- Certain C++ language behavior is ambiguous for hardware
  - Leads to mismatches between C++ and RTL simulation
  - Difficult to debug
- Combination of static "lint" and formal based checks plus QoR checks, e.g.
  - Un-initialized memory read
  - Out of bounds reads and writes
  - Accumulator of native C type







### Achieve Coverage Closure on HLS design source

- Bringing RTL coverage to HLS
  - C++ and SystemC design source
- Match coverage concepts from RTL
  - Statement, Branch, FEC and Toggle
  - Functional Coverage including covergroups, coverpoints, bins, crosses
- Synthesis Aware Coverage
  - Function inlining/instances
  - Loop unrolling
- Coverage Data saved within UCDB
  - Test plan integration and merging
  - Track progress towards goals
  - Prevent Systematic Faults

Coverage Summary By Instance (78.11%)

| Instance                   | Branches | Expressions | Statements | Total  |
|----------------------------|----------|-------------|------------|--------|
| Total                      | 80.48%   | 55.31%      | 98.54%     | 78.11% |
| \simpleCNN::run_73_3       | -        | -           | 100%       | 100%   |
| \arb_resp::run[with T = a  | 95.83%   | 100%        | 100%       | 98.61% |
| \conv1::run[with TfdataIn  | 87.17%   | 0%          | 100%       | 62.39% |
| \conv2::run[with TfdataIn  | 87.17%   | 0%          | 100%       | 62.39% |

Covergroups Coverage (62.5%)

| Covergroups            | Bins | Hits | Goal | Coverage |
|------------------------|------|------|------|----------|
| main_23::MyCCoverGroup | 8    | 5    | 100  | 62.5%    |
| MyCCoverGroup_1_inst   | 8    | 5    | 0    | 62.5%    |






## Quickly Implement CNN Architectures Using HLS

- Easily code multiple architectures in C++
  - Sliding-window architecture processes fmap data in raster order
  - In-place architecture reads weights once
- HLS constraints allow architectural exploration
  - Massive parallelism is possible
  - Evaluate power, performance, and area (PPA) across multiple architectures and microarchitectures

#### Reordered Loops (Layers 5-9)

256-parallel multipliers









# Meet power requirements via automatic power estimation and optimization

- Fast and accurate power estimation
  - Integrated solution
  - Power analysis and exploration
  - Guidance on how to reduce power
- Best Power Optimization
  - Gating of clocks, flops, memories and data
- Automatic flow
  - SLEC for formal verification
  - API Integration with emulation

Figure: power report screenshot (power.rpt, values in uW) listing static, dynamic, and total power, split into memory, register, and combinational contributions, for pre_pwropt_default_Verilog and post_pwropt_default_Verilog at the inst0 and ac_conv2d_v_post_pool_DTYPE_8_8_64_run_inst levels.





#### Catapult HLS is the Only Solution for Rapid Algorithm to RTL

- Accelerate design time with higher level of abstraction
  - 5x less code than RTL
  - New features added in days not weeks
- Quickly evaluate power and performance of algorithms
  - Rapidly explore multiple options for optimal PPA
- Enable late functional changes without impacting schedule
  - Algorithms can be easily modified and regenerated
  - New technology nodes are easy













#### Catapult Enables Re-Use between FPGA and ASIC

- Enable designers to bring algorithms into high-speed HW/FPGA for fast Proof of Concept or demonstrator
- Key IP blocks reused from FPGA to ASIC to save months of redevelopment
  - Any ASIC library can be characterized to HLS
- Easily move between eFPGA and ASIC
- Same C code can be retargeted for different market/application within days



High speed FPGA and ASIC





### Summary

- Next generation CV/AI algorithms are massively complex
- Delivering optimized RTL with the best PPA on time is difficult
  - Achieving the optimal architecture is hard to do in hand-coded RTL
  - Going from CV/AI development platform to RTL is not well understood
  - Verifying the RTL is time consuming
    - Billions of computations
    - Massively parallel hardware
- Catapult provides a complete methodology from high-level model to PPA optimized and rapidly verified RTL
- <Catapult\_install>/shared/examples/ml/tinyYOLO\_v2





### Functional Safety workflow of a High Level Synthesis Design





#### What is Functional Safety?

Driving down the risk of electrical and electronic systems malfunctioning due to failures

#### **Systematic Faults**



- Incomplete Specs
- Misinterpreted Specs
- Bad RTL
- HW/SW Interface Problems

#### Challenges

- Process & requirements
- IC complexity
- Exhaustive & efficient


#### Random Faults



- EMI
- Electro-migration
- Permanent or transient
- Latent

**ISO 26262** 

#### Challenges

- Manual -> automation
- Scale with IC complexity

#### **Malicious Faults**



- Encryption Vulnerabilities
- Denial of Service
- Untrusted IC
- Hardware Trojan



#### Functional Safety in an HLS Workflow



Mentor Safe IC Workflow





Systematic Faults

Random HW Faults



#### First Time Right Safe IC Workflow



- Calculation of FMEDA metrics and providing early safety architectural guidance
- Creation of safe designs to mitigate the effects of random hardware faults
- Proving design safeness that achieves target ASIL levels through fault campaigns





### Making YOLO Tiny Safe

- Tutorial Goals
  - Evaluate the safeness of the YOLO Tiny design (a.k.a. Simple CNN design)
  - Achieve ASIL B metric targets through minimal design enhancement
  - Prove that the design achieves ASIL B targets through a fault injection campaign
  - Create the FMEDA work product required for certification

Figure labels: YOLO Tiny; Sliding-Window Convolution / Max Pooling stages; In-place Convolution / Max Pooling; weights and results; AXI4 stream; **Off-chip DRAM**



- Assumptions
  - All logic in YOLO Tiny is safety critical
  - All single point faults must be protected by HW based safety mechanisms (no BIST)
  - ASIL B targets



#### **FMEDA Work Product**

• Evidence the design has sufficient protection from permanent and transient random HW failures



#### Simple CNN FMEDA Worksheet







### YOLO Tiny Architecture Overview

- First 4 stages are independent
- Stages 5-9 use shared logic and memory
- High degree of combinatorial logic due to Mult/Adds







#### First Time Right Safe IC Workflow



- Calculation of FMEDA metrics and providing early safety architectural guidance
- Creation of safe designs to mitigate the effects of random hardware faults
- Proving design safeness that achieves target ASIL levels through fault campaigns





#### **FIT Computation**







#### **YOLO Tiny FIT Computation**

- Perform FIT calculation on each sub-block/filter
- Sub-blocks rolled up to compute total YOLO Tiny FIT



#### Simple CNN FMEDA Worksheet

| Component Name       | Safety<br>Related? | Violates<br>SG | Register<br>Count | Number of<br>Gates | PERM FIT<br>(λ) | TRAN FIT<br>(λ) |
|----------------------|--------------------|----------------|-------------------|--------------------|-----------------|-----------------|
| 2D_Conv_Stg1         | yes                | yes            | 21347             | 130027             | 0.15            | 0.25            |
| 2D_Conv_Stg2         | yes                | yes            | 5074              | 67626              | 0.09            | 0.59            |
| 2D_Conv_Stg3         | yes                | yes            | 4492              | 65585              | 0.09            | 0.84            |
| 2D_Conv_Stg4         | yes                | yes            | 4533              | 66051              | 0.09            | 2.06            |
| 2D_Conv_Stg5_to_stg9 | yes                | yes            | 60190             | 1105385            | 0.99            | 2.07            |



#### 1.93 FIT (Perm) 8.09 FIT (Trans)



Simplified FMEDA worksheet






### Identifying the safety holes

- Gain understanding of underlying architecture and RTL that was created via **HLS** synthesis
- Identify FIT contribution at the micro level to systematically guide safety mechanism exploration



2D_Conv_Stg2: subset of contributors

| Instance           | Perm %  | Trans % |
|--------------------|---------|---------|
| 2D_Conv_Stg2.inst0 | 9.06588 | 8.00971 |
| 2D_Conv_Stg2.inst1 | 64.2728 | 44.1694 |
| 2D_Conv_Stg2.inst2 | 26.4449 | 47.4349 |

Large FIT contributors – Address first



### Safety Exploration

• Estimate achievable diagnostic coverage using design analysis to measure the effectiveness of proposed safety mechanisms





#### **Exploration Results**

Exploration performed on sub blocks and rolled up to attain top level estimated Diagnostic Coverage

Simple CNN FMEDA Worksheet, without memory protection (estimated analysis). Low transient coverage due to missing SMs on memories.

| Component Name       | Safety<br>Related? | Violates<br>SG | Register<br>Count | Number of<br>Gates | PERM FIT<br>(λ) | TRAN FIT<br>(λ) | Primary SM DC (PERM)<br>Estimated % | Primary SM DC (TRAN)<br>Estimated % |
|----------------------|--------------------|----------------|-------------------|--------------------|-----------------|-----------------|-------------------------------------|-------------------------------------|
| 2D_Conv_Stg1         | yes                | yes            | 21347             | 130027             | 0.15            | 0.25            | 92                                  | 25                                  |
| 2D_Conv_Stg2         | yes                | yes            | 5074              | 67626              | 0.09            | 0.59            | 92                                  | 1                                   |
| 2D_Conv_Stg3         | yes                | yes            | 4492              | 65585              | 0.09            | 0.84            | 90                                  | 8                                   |
| 2D_Conv_Stg4         | yes                | yes            | 4533              | 66051              | 0.09            | 2.06            | 87                                  | 3                                   |
| 2D_Conv_Stg5_to_stg9 | yes                | yes            | 60190             | 1105385            | 0.99            | 2.07            | 92                                  | 31                                  |

Simple CNN FMEDA Worksheet, with memory protection (estimated analysis)

| Component Name       | Safety<br>Related? | Violates<br>SG | Register<br>Count | Number of<br>Gates | PERM FIT<br>(λ) | TRAN FIT<br>(λ) | Primary SM DC (PERM)<br>Estimated % | Primary SM DC (TRAN)<br>Estimated % |
|----------------------|--------------------|----------------|-------------------|--------------------|-----------------|-----------------|-------------------------------------|-------------------------------------|
| 2D_Conv_Stg1         | yes                | yes            | 21347             | 130027             | 0.15            | 0.25            | 92                                  | 95                                  |
| 2D_Conv_Stg2         | yes                | yes            | 5074              | 67626              | 0.09            | 0.59            | 92                                  | 97                                  |
| 2D_Conv_Stg3         | yes                | yes            | 4492              | 65585              | 0.09            | 0.84            | 90                                  | 97                                  |
| 2D_Conv_Stg4         | yes                | yes            | 4533              | 66051              | 0.09            | 2.06            | 87                                  | 98                                  |
| 2D_Conv_Stg5_to_stg9 | yes                | yes            | 60190             | 1105385            | 0.99            | 2.07            | 92                                  | 94                                  |
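One way to roll per-block estimates up to a top-level figure is a FIT-weighted average, in which each block contributes a residual failure rate of λ·(1 − DC). The sketch below assumes that roll-up; it has the same form as the ISO 26262 single-point fault metric when every element is safety related, but the deck does not state which formula its tooling applies. The `Row` struct and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One worksheet row: failure rate in FIT and diagnostic coverage in percent.
struct Row {
    double fit;
    double dc_percent;
};

double total_fit(const std::vector<Row>& rows) {
    double sum = 0.0;
    for (std::size_t i = 0; i < rows.size(); ++i) sum += rows[i].fit;
    return sum;
}

// Failure rate left uncovered by the safety mechanisms.
double residual_fit(const std::vector<Row>& rows) {
    double sum = 0.0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        sum += rows[i].fit * (1.0 - rows[i].dc_percent / 100.0);
    }
    return sum;
}

// FIT-weighted diagnostic coverage in percent.
double rolled_up_dc(const std::vector<Row>& rows) {
    const double total = total_fit(rows);
    assert(total > 0.0);
    return 100.0 * (1.0 - residual_fit(rows) / total);
}
```

Applied to the five rows of the worksheets, this roll-up gives about 91.6% for permanent faults, about 14.4% for transient faults without memory protection, and about 96.2% for transient faults with memory protection. For reference, ISO 26262-5 sets a single-point fault metric target of at least 90% for ASIL B.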



#### First Time Right Safe IC Workflow



- Calculation of FMEDA metrics and providing early safety architectural guidance
- Creation of safe designs to mitigate the effects of random hardware faults
- Proving design safeness that achieves target ASIL levels through fault campaigns





### Safety Mechanism Overview

- Wide variety of safety mechanism approaches available
- PPA requirements + use model + Safety target => the optimal safety mechanism



Lockstep duplication







Triple Modular Redundancy

**Endpoint Triplication** 
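The behavioral difference between these mechanisms can be sketched in a few lines. Lockstep duplication detects a mismatch but cannot tell which copy is wrong, triple modular redundancy masks a fault in any single copy, and a parity bit detects an odd number of flipped bits in a register. The function names are illustrative and the models are purely behavioral, not the logic that an insertion tool would generate.

```cpp
#include <cassert>
#include <cstdint>

// Lockstep duplication: raise an alarm when the two copies disagree.
inline bool lockstep_alarm(std::uint32_t primary, std::uint32_t shadow) {
    return primary != shadow;
}

// Triple modular redundancy: bitwise 2-of-3 majority vote.
inline std::uint32_t tmr_vote(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a & b) | (a & c) | (b & c);
}

// Parity: XOR of all 32 bits (true when the number of set bits is odd).
inline bool parity_bit(std::uint32_t v) {
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1u) != 0;
}

// Register protected by a stored parity bit: alarm on mismatch at read time.
inline bool parity_alarm(std::uint32_t stored_value, bool stored_parity) {
    return parity_bit(stored_value) != stored_parity;
}
```

Parity costs one extra bit per register, duplication roughly doubles the logic, and TMR roughly triples it, which is why the choice depends on PPA budget and safety target.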







### Safety Enhancement of YOLO Tiny

- Automated safety mechanism insertion performed on sub blocks
- Safety Exploration recommended a mix of register parity and duplication to hit safety target
  - 2D filters were duplication heavy due to large fan-in cones









#### Memory ECC to protect from transients

• Memory ECC to increase the transient fault diagnostic coverage



Transistors inside memory model covered by safety mechanism.
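As an illustration of how an error-correcting code recovers from a transient bit flip, here is a minimal Hamming(7,4) sketch that corrects any single flipped bit in a 7-bit codeword. The deck does not say which code the memory ECC uses; production memories typically protect wider words and add double-error detection, so this small code is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Encode 4 data bits into a 7-bit codeword.
// Bit index = position - 1; parity at positions 1, 2, 4; data at 3, 5, 6, 7.
std::uint8_t hamming74_encode(std::uint8_t data) {
    assert(data < 16);
    const unsigned d0 = data & 1u, d1 = (data >> 1) & 1u;
    const unsigned d2 = (data >> 2) & 1u, d3 = (data >> 3) & 1u;
    const unsigned p1 = d0 ^ d1 ^ d3;  // covers positions 3, 5, 7
    const unsigned p2 = d0 ^ d2 ^ d3;  // covers positions 3, 6, 7
    const unsigned p4 = d1 ^ d2 ^ d3;  // covers positions 5, 6, 7
    return static_cast<std::uint8_t>(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

struct Decoded {
    std::uint8_t data;
    bool corrected;  // true if a single-bit error was found and fixed
};

Decoded hamming74_decode(std::uint8_t cw) {
    assert(cw < 128);
    auto bit = [&cw](unsigned pos) { return (cw >> (pos - 1)) & 1u; };
    const unsigned s1 = bit(1) ^ bit(3) ^ bit(5) ^ bit(7);
    const unsigned s2 = bit(2) ^ bit(3) ^ bit(6) ^ bit(7);
    const unsigned s4 = bit(4) ^ bit(5) ^ bit(6) ^ bit(7);
    const unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);  // position of the flipped bit, 0 if none
    if (syndrome != 0) {
        cw = static_cast<std::uint8_t>(cw ^ (1u << (syndrome - 1)));
    }
    const std::uint8_t data = static_cast<std::uint8_t>(
        bit(3) | (bit(5) << 1) | (bit(6) << 2) | (bit(7) << 3));
    return Decoded{data, syndrome != 0};
}
```

A plain Hamming code miscorrects double-bit errors; adding one overall parity bit is the usual way to detect them.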





#### First Time Right Safe IC Workflow



- Calculation of FMEDA metrics and providing early safety architectural guidance
- Creation of safe designs to mitigate the effects of random hardware faults
- Proving design safeness that achieves target ASIL levels through fault campaigns





### Fault Campaign Overview

- Fault campaign consists of fault injection of permanent and transient faults
- Goal: Classify all faults after reducing scope of fault campaign through fault optimization and collapsing







#### Fault List Optimization



| Module               | DC % | Fault Count (before) | Fault Count (after) | % Reduction |
|----------------------|------|----------------------|---------------------|-------------|
| 2D_Conv_Stg1         | 92%  | 130660               | 106750              | 19%         |
| 2D_Conv_Stg2         | 90%  | 69712                | 49427               | 30%         |
| 2D_Conv_Stg3         | 92%  | 67845                | 47115               | 31%         |
| 2D_Conv_Stg4         | 87%  | 68316                | 47504               | 31%         |
| 2D_Conv_Stg5_to_Stg9 | 92%  | 1130651              | 616150              | 45%         |



YOLO Tiny Optimization




### Fault Classification

• The goal of a fault campaign is to classify all faults in the fault list





### **Stimulus Optimization**

- Grade stimulus so only stimulus which has the potential for fault classification is used
- Filter stimulus to increase efficiency of fault injection simulation







#### **Concurrent Fault Simulation**

- Fault list can contain tens of thousands of faults
  - Must execute faults in parallel to close on fault list
- Each fault executed concurrently in a single simulation
- Deviation checked against golden model
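The compare-against-golden step can be sketched with a toy design: a duplicated 8-bit adder with a lockstep comparator, into which stuck-at faults are injected on one sum bit at a time. Unlike a concurrent fault simulator, this sketch evaluates faults one after another. The design, the three class names, and the classification rule are simplifying assumptions, not the tool's actual taxonomy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class FaultClass { NoEffect, Detected, Undetected };

// Stuck-at fault on one bit of the primary or shadow adder output.
struct Fault {
    bool in_shadow;
    unsigned bit;
    bool stuck_value;
};

struct Outputs {
    std::uint8_t sum;
    bool alarm;
};

static std::uint8_t force_bit(std::uint8_t v, unsigned bit, bool value) {
    return value ? static_cast<std::uint8_t>(v | (1u << bit))
                 : static_cast<std::uint8_t>(v & ~(1u << bit));
}

// Toy design under test; fault == nullptr gives the golden (fault-free) model.
Outputs dut(std::uint8_t a, std::uint8_t b, const Fault* fault) {
    std::uint8_t primary = static_cast<std::uint8_t>(a + b);
    std::uint8_t shadow = primary;
    if (fault != nullptr) {
        assert(fault->bit < 8);
        if (fault->in_shadow) shadow = force_bit(shadow, fault->bit, fault->stuck_value);
        else primary = force_bit(primary, fault->bit, fault->stuck_value);
    }
    return Outputs{primary, primary != shadow};
}

// Detected: the alarm fires. Undetected: the output deviates with no alarm.
// NoEffect: this stimulus never activates or propagates the fault.
FaultClass classify(const Fault& f,
                    const std::vector<std::pair<std::uint8_t, std::uint8_t>>& stimulus) {
    bool deviated = false;
    for (std::size_t i = 0; i < stimulus.size(); ++i) {
        const Outputs golden = dut(stimulus[i].first, stimulus[i].second, nullptr);
        const Outputs faulty = dut(stimulus[i].first, stimulus[i].second, &f);
        if (faulty.alarm) return FaultClass::Detected;
        if (faulty.sum != golden.sum) deviated = true;
    }
    return deviated ? FaultClass::Undetected : FaultClass::NoEffect;
}
```

A fault that stays in the `NoEffect` class says more about the stimulus than about the design, which is the motivation for grading and filtering stimulus before a campaign.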







### Intelligent Fault injection

- Find the right point to inject a fault
- Ensure there is propagation potential
- Identification of high cost fault nodes







#### Fault Campaign Manager

- Parallelize window fault injection across compute cores
- Distribute jobs across machine grid







#### **YOLO Tiny Fault Classification**

| Fault Node | Fault Type | InjectTime | AlarmTime | Resolution | Stimulus |
|------------|------------|------------|-----------|------------|----------|
| ac_conv2d_coeff_store_DTYPE_FTYPE_FTYPE_DTYPE_208_208_16_32.inst2.ac_accum_h_ac_fixed_45_21_true_AC_TRN_AC_WRAP_FTYPE_DTYPE_208_208_16_32_run_inst.add_1274.n4 | SA0 | 1040 | 1044 | Detected | vsim.vcd |
| ac_conv2d_coeff_store_DTYPE_FTYPE_FTYPE_DTYPE_208_208_16_32.inst2.ac_accum_h_ac_fixed_45_21_true_AC_TRN_AC_WRAP_FTYPE_DTYPE_208_208_16_32_run_inst.add_1274.n4 | SA1 | 1080 | 1084 | Detected | vsim.vcd |
| ac_conv2d_coeff_store_DTYPE_FTYPE_FTYPE_DTYPE_208_208_16_32.inst2.ac_accum_h_ac_fixed_45_21_true_AC_TRN_AC_WRAP_FTYPE_DTYPE_208_208_16_32_run_inst.FMAP_OCHAN_outChan_5_2_lpi_3_dfm_2_0_1[0] | SA0 | 2010 | 2014 | Detected | vsim.vcd |
| ac_conv2d_coeff_store_DTYPE_FTYPE_FTYPE_DTYPE_208_208_16_32.inst2.ac_accum_h_ac_fixed_45_21_true_AC_TRN_AC_WRAP_FTYPE_DTYPE_208_208_16_32_run_inst.FMAP_OCHAN_outChan_5_2_lpi_3_dfm_2_0_1[0] | SA1 | 2040 | 2044 | Detected | vsim.vcd |

2D\_Conv\_Stg2 Fault Classification





#### **Certification Work Products**

• Close the loop linking estimated metrics to proven metrics





#### Summary

- Demonstrated a functional safety workflow on a real HLS design
- If you want to learn more, here is how Mentor<sup>®</sup> tools mapped to this workflow







#### **Break – 15 minutes**





### Emulation Hardware-in-the-Loop for System-of-Systems Verification







## A broad portfolio of solutions is needed for autonomous vehicle development



Ensuring digital continuity, multi-domain traceability and functional safety of autonomous systems








## Electronic system of system challenges for AV verification and validation





# Self-driving technology requires massive verification cycles to reach safety for "Level 5"

#### "14.2 billion miles of testing is needed"

Akio Toyoda, CEO of Toyota, Paris Auto Show 2016

#### "Design validation will be a major – if not the largest – cost component"

Roland Berger "Autonomous Driving" 2014

#### "While hardware innovations will deliver - software will remain a critical bottleneck"

McKinsey, "When will the robots hit the road?"










# Multiple variants of the same scenario are part of the verification process



These scenarios and their multiple variants can be tested in real time when using a high-performance computing environment
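Variants of a single scenario are commonly produced by sweeping its parameters. A minimal sketch, assuming hypothetical parameter names and values (none are taken from the tutorial or from any scenario tool):

```python
from itertools import product

# Hypothetical parameters for one "approach a braking lead vehicle" scenario.
ego_speeds_kph = [30, 50, 70]
lead_decels_mps2 = [2.0, 4.0, 6.0]
weather = ["clear", "rain", "fog"]

# Every combination of parameter values is one variant of the scenario.
variants = [
    {"ego_speed_kph": v, "lead_decel_mps2": a, "weather": w}
    for v, a, w in product(ego_speeds_kph, lead_decels_mps2, weather)
]

print(len(variants))  # 3 * 3 * 3 combinations
```

The count grows multiplicatively with each added parameter, which is why throughput of the execution platform matters.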



<u>Safety and Security</u> key challenges are addressed early in the design cycle by virtualizing the system

Reinvent the vehicle development processes to address:

- Increasing software and hardware complexity
- Massive validation and verification cycles
- Growing number and variety of sensors
- Reconciling agility with better traceability





# Hardware emulation is the ideal platform for system of systems verification and validation



- Emulators typically run thousands of times faster than a software simulator running on a general-purpose computer
  - An emulator is a special purpose supercomputer for modeling digital integrated circuits
- An emulator includes a HW system, OS SW and SW applications
- Emulation technology enables new design and verification methodologies from chips to systems





## How a hardware emulator works...



- Virtual system for pre-silicon software verification
- Full visibility into hardware design for efficient debug
  - Not available with an FPGA prototype



- Fault injection, monitoring, results analysis for safety-critical applications



## Veloce Automotive Solution — four pillars





# Functional safety – failure analysis & higher reliability



### **Veloce Fault App**

### **Run Fault Campaigns**

- Perform mission-critical safety circuit verification
- Analyze effectiveness of safety mechanisms in the design
- Mimic the effects of transient and hard faults on the design
- Targets safety-critical industries (automotive, aerospace, military)

Value: Optimize and accelerate Fault campaigns
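The idea behind a fault campaign can be illustrated on a toy model: inject each stuck-at fault, compare against the fault-free result, and record whether the safety mechanism raised an alarm. This is a conceptual sketch only and does not use the Veloce Fault App interface; the 8-bit adder, the partial upper-nibble checker, and all function names are hypothetical.

```python
from collections import Counter

def add8(a, b):
    """Fault-free 8-bit adder (wraps on overflow)."""
    return (a + b) & 0xFF

def stuck_at(value, bit, level):
    """Force one bit of an 8-bit value to 0 (SA0) or 1 (SA1)."""
    return value | (1 << bit) if level else value & ~(1 << bit) & 0xFF

def classify(a, b, bit, level):
    golden = add8(a, b)
    faulty = stuck_at(golden, bit, level)
    if faulty == golden:
        return "no effect"  # this stimulus does not activate the fault
    # Hypothetical safety mechanism: a redundant copy checks the upper nibble only.
    alarm = (faulty >> 4) != (golden >> 4)
    return "detected" if alarm else "undetected"

# One stimulus, every SA0/SA1 fault on the 8 output bits.
results = Counter(
    classify(0x35, 0x4A, bit, level) for bit in range(8) for level in (0, 1)
)
print(dict(results))
```

Faults that are activated but not flagged point at gaps in the safety mechanism; faults with no effect usually need additional stimulus before they can be classified.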





## Digital twin technology



- Reduce development times and increase quality while shortening time to market by <u>shifting left</u>
- More efficient and reliable software by providing high-speed <u>virtual platforms long before silicon</u>
- Supports <u>geographically dispersed teams</u> <u>collaborating</u> on pre-silicon development and pre- and post-silicon debug
- <u>Track progress</u> to requirements and schedule through incremental metrics for safety, security, power, performance and benchmarks pre-silicon





A complete autonomous vehicle verification and validation environment at system level







# Virtual testing of autonomous driving functions accelerates time to safety goals



- Prescan (TASS International)
  - World modeling and scenario building
    - Road sections, bridges, etc.
    - Trees, buildings, traffic signs
    - Cars, trucks, pedestrians
    - Weather conditions
  - Sensor model library
    - Camera
    - Radar
    - Lidar
    - Ultrasonic
    - Infrared
    - V2X
    - GPS





## High Performance Solution: PreScan with Veloce emulation

#### **PreScan generates virtual driving scenarios and sensor data**

SENSE

AEB Euro NCAP test: traveling at 50 kph approaching a vehicle at 50 kph decelerating at 6 m/s². Initial distance between the vehicles is 12 m. The AEBS will give a warning at 2.6 s TTC (time to collision), will apply 40% brake at 1.6 s TTC and 100% brake at 0.9 s TTC.

Veloce

COMPUTE

Veloce verifies the most complex chip designs

Verification of ADAS chips in the context of many different traffic scenarios

Full design visibility for comprehensive debug of SW and HW and SW/HW interactions

VALUE
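The staged AEBS response in the scenario text (warning, then partial brake, then full brake as time to collision shrinks) can be sketched as a threshold function. TTC is computed here with the common constant-velocity approximation, gap divided by closing speed. The function names, the reading of the two brake levels as 40% and 100%, and the example instant with the lead vehicle at 30 kph are assumptions.

```python
def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Constant-velocity TTC: gap / closing speed; infinite if not closing."""
    closing = ego_speed_mps - lead_speed_mps
    return gap_m / closing if closing > 0 else float("inf")

def aebs_action(ttc_s):
    """Return (warning, brake_fraction) for a given TTC in seconds."""
    if ttc_s <= 0.9:
        return True, 1.0   # full braking
    if ttc_s <= 1.6:
        return True, 0.4   # partial braking
    if ttc_s <= 2.6:
        return True, 0.0   # warning only
    return False, 0.0      # no intervention

# Hypothetical instant: 12 m gap, ego at 50 kph, lead already slowed to 30 kph.
ego = 50 / 3.6
lead = 30 / 3.6
ttc = time_to_collision(12.0, ego, lead)
print(round(ttc, 2), aebs_action(ttc))
```

At the start of the scenario both vehicles travel at the same speed, so the closing speed is zero and this TTC is infinite; the thresholds are crossed only once the lead vehicle has decelerated.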





# LMS Amesim Platform for multi-domain system simulation

- Unique capabilities to create real-time ready models preserving physical relevance



Highlighted algebraic loops

Performance analyzer - Frequency analysis of the chassis model





# Comprehensive verification to shift left the development cycle

# Integrated heterogeneous system of systems framework to simulate and verify multiple ECUs







Verification of System of Systems with Multiple ECUs using Digital Twin Virtual Environment



- Modeled using Vector CANoe
- Packet Exerciser & Monitor:
  - Get different driving inputs
  - Display speed/RPM

### **Engine Control ECU**

- **PowerPC Virtual Platform (Mentor Vista)**
- AUTOSAR Stack + Application:
  - Reads Input Combination from Dashboard & gives Command/Pedal/Brake Angle to Transmission Controller ECU

### **Transmission Controller ECU**

- Hardware Model (Mentor Veloce Strato)
  - RISC-V + Memory Subsystem + CAN Controller
  - Application: Reads Pedal Angle from Engine Controller ECU and Calculates RPM/Speed

### Brake System

Simcenter Amesim Co-simulation Slave FMU

#### ADAS Control ECU (5)

- Mentor Veloce HYCON
  - Read camera stream, detect object and supply object distance
- Simcenter Prescan
  - Perform evasive maneuver using distance information
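The data flow between the ECUs described above (dashboard input to engine ECU, pedal angle to transmission ECU, RPM/speed back to the display) can be pictured as a fixed-step co-simulation loop. The sketch below is a toy stand-in, not the Vista, Veloce, or CANoe interfaces: the class names, the linear maps, and every constant are hypothetical.

```python
class EngineEcu:
    """Stand-in for the engine control ECU: dashboard input -> pedal angle."""
    def step(self, throttle_pct):
        return 0.45 * throttle_pct  # hypothetical linear map, in degrees

class TransmissionEcu:
    """Stand-in for the transmission controller: pedal angle -> (rpm, speed)."""
    def step(self, pedal_angle_deg):
        rpm = 800 + 100 * pedal_angle_deg  # hypothetical idle speed plus gain
        speed_kph = 0.02 * (rpm - 800)
        return rpm, speed_kph

def run(throttle_trace):
    engine, transmission = EngineEcu(), TransmissionEcu()
    log = []
    for throttle in throttle_trace:           # one co-simulation step per sample
        pedal = engine.step(throttle)         # dashboard -> engine ECU
        log.append(transmission.step(pedal))  # engine ECU -> transmission ECU
    return log

log = run([0, 50, 100])
for rpm, speed in log:
    print(f"rpm={rpm:.0f} speed={speed:.1f} kph")
```

In the real environment each `step` call would be replaced by a message exchange between tools running at different abstraction levels, with the framework keeping their simulation times aligned.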





**RPM/Speed** 

Vista Platform

**Brake System** (FMU)

Simcenter Amesim



### DEMO : Advanced Emergency Braking System







# Summary: Veloce Transforms Automotive Design

Secure rapid achievement of safety requirements

- Addresses complexity, size, and accurate timing needs of automotive electronic systems
  - Full visibility and debug for automotive designs
- Removes risk associated with safety-critical systems
  - Functional verification for systematic failure analysis
  - Safety verification for random failure analysis
  - Fault tolerance and coverage
  - ISO 26262 compliance
- Digital Twin Functionality
  - Full ECU verification to shift left the development cycle
  - Delivering on time-to-market







# Conclusion






# Mentor<sup>®</sup> HLS, Emulation and FuSa Workflow





# Siemens and Mentor provide comprehensive transportation solutions!







# **Questions?**

