# Development of Flexi Performance Analysis Platform for Multi-SoC Networking Systems

Srinivasan Reddy Devarajan

Anant Raj Gupta

Ingo Volkening

Franz Josef Schaefer

Vijay Raj Franklin Paul Rajan William

Intel Corporation





# Agenda

- Introduction
- Multi-SoC Platform
  - Platform Overview
  - Development of the Platform
  - Refinement of Models
- Results Overview
- Summary





### Introduction





A solid methodology for developing a multi-SoC flexi performance analysis platform is mandatory for first-pass silicon.







#### Feedback to Designers



© Accellera Systems Initiative





\*not limited to above list

- Inter-SoC Interconnects
  - Chip-level interconnects are required for communication between the various SoCs in the multi-SoC system. PCIe is one of the most widely used inter-SoC interconnects.



PCIe Abstract Performance Model


- PCIe performance evaluation models are developed as transaction-level models based on the SystemQ infrastructure.
- Different PCIe models are developed for different PCIe generations, based on their encoding/decoding techniques and overhead traffic.
- The number of lanes in the PCIe link and the availability of outstanding-transaction slots on the ports connected to PCIe play a major role in determining the specifications of the SoCs attached to the multi-SoC platform.



- These models are approximately timed.
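The generation-dependent encoding overhead and the lane count mentioned above already fix a hard bandwidth ceiling before any transaction-level simulation runs. A minimal Python sketch of that ceiling follows; it assumes the standard PCIe Gen1 to Gen5 per-lane signalling rates and line encodings, while the function name `link_bandwidth_gbps` and the lumped `protocol_efficiency` factor for packet and flow-control overhead are hypothetical and not part of SystemQ.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency by PCIe generation.
# Gen1/Gen2 use 8b/10b encoding; Gen3 and later use 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def link_bandwidth_gbps(generation: int, lanes: int,
                        protocol_efficiency: float = 1.0) -> float:
    """Usable bandwidth per direction in Gbit/s after line encoding.

    protocol_efficiency is a hypothetical lumped factor (0 < f <= 1) for
    packet headers, ACKs and flow-control updates; 1.0 ignores them.
    """
    if generation not in PCIE_GENERATIONS:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    if not 0.0 < protocol_efficiency <= 1.0:
        raise ValueError("protocol_efficiency must be in (0, 1]")
    rate_gt_s, encoding_efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * encoding_efficiency * lanes * protocol_efficiency


if __name__ == "__main__":
    for gen in (2, 3):
        for lanes in (1, 2):
            print(f"Gen{gen} x{lanes}: {link_bandwidth_gbps(gen, lanes):.2f} Gbit/s")
```

Such a figure is only an upper bound; outstanding-transaction limits and contention, which the transaction-level models capture, can keep achieved throughput well below it.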



### Multi-SoC Platform – Refinement of Models

• Orthogonal Refinement of SystemQ Models



- Communication refinement is achieved by adding more communication features to the channels that connect the various queue-server pairs in the refined models.
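SystemQ's own API is not shown in this paper, so the following is only a plain-Python sketch of what communication refinement can mean for a queue-server pair. The hypothetical `Channel` starts out ideal; refining it with a latency and a finite bandwidth changes delivery times without touching the queue-server logic in `run_queue_server` (also hypothetical), which is one way to read "orthogonal" refinement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    """Hypothetical channel between two queue-server pairs.

    With the defaults the channel is ideal (zero delay, unlimited bandwidth).
    Refining it adds a fixed latency and a finite bandwidth, which also makes
    back-to-back transfers serialize on the channel.
    """
    latency: float = 0.0
    bandwidth: Optional[float] = None  # bytes per time unit; None = unlimited
    _free_at: float = field(default=0.0, init=False, repr=False)

    def transfer(self, ready_time: float, size: int) -> float:
        """Return the time a packet that is ready at ready_time reaches the far end."""
        start = max(ready_time, self._free_at)  # wait if the channel is still busy
        serialization = 0.0 if self.bandwidth is None else size / self.bandwidth
        self._free_at = start + serialization
        return start + serialization + self.latency


def run_queue_server(arrivals: List[float], sizes: List[int],
                     service_rate: float, channel: Channel) -> List[float]:
    """FIFO queue plus single server feeding a channel.

    arrivals must be non-decreasing; returns each packet's delivery time.
    """
    server_free_at = 0.0
    deliveries = []
    for t_arrive, size in zip(arrivals, sizes):
        start = max(t_arrive, server_free_at)  # queueing delay while the server is busy
        server_free_at = start + size / service_rate
        deliveries.append(channel.transfer(server_free_at, size))
    return deliveries


if __name__ == "__main__":
    burst = [0.0, 0.0, 0.0]
    sizes = [100, 100, 100]
    print(run_queue_server(burst, sizes, 100.0, Channel()))
    print(run_queue_server(burst, sizes, 100.0, Channel(latency=0.5, bandwidth=50.0)))
```

With the ideal channel the three packets are delivered at 1.0, 2.0 and 3.0; the refined channel (latency 0.5, bandwidth 50) becomes the bottleneck and they arrive at 3.5, 5.5 and 7.5.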




- Micro-Architectural Refinement of SystemQ Model
  - Micro-architectural refinement is highly recommended for IPs/subsystems with multiple data sources and sinks that communicate with SoCs attached through the SoC interconnect.



Micro-Architectural Refinement of SystemQ Models

• What is achieved?


- Buffer Optimization
- Optimization of Pending Transactions
- Lane Width Optimization
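The paper does not describe how buffer sizes are optimized. As a first-cut estimate that a simulation sweep would then refine, a fluid approximation gives the peak backlog of a FIFO that receives a burst faster than it drains. The sketch below assumes one isolated burst and constant rates, and its function names are hypothetical.

```python
import math


def peak_occupancy_bytes(burst_bytes: float, ingress_rate: float,
                         egress_rate: float) -> float:
    """Fluid approximation of a FIFO's peak fill level during one burst.

    The burst arrives at ingress_rate for burst_bytes / ingress_rate time
    units while the FIFO drains at egress_rate, so the backlog peaks when the
    last byte of the burst arrives.
    """
    if ingress_rate <= 0 or egress_rate < 0:
        raise ValueError("ingress_rate must be positive and egress_rate non-negative")
    if egress_rate >= ingress_rate:
        return 0.0  # the drain keeps up; no backlog builds in the fluid model
    return burst_bytes * (1.0 - egress_rate / ingress_rate)


def buffer_entries(burst_bytes: float, ingress_rate: float,
                   egress_rate: float, entry_bytes: int) -> int:
    """Smallest number of fixed-size buffer entries that holds the peak backlog."""
    if entry_bytes <= 0:
        raise ValueError("entry_bytes must be positive")
    peak = peak_occupancy_bytes(burst_bytes, ingress_rate, egress_rate)
    return math.ceil(peak / entry_bytes)


if __name__ == "__main__":
    # A 32 KiB burst arriving twice as fast as it drains, stored in 256-byte entries.
    print(buffer_entries(32 * 1024, ingress_rate=16.0, egress_rate=8.0, entry_bytes=256))
```

Here the backlog peaks at 16 KiB, so 64 entries of 256 bytes suffice; real traffic with back-to-back bursts or several sources sharing a buffer needs the simulation sweep rather than this single-burst estimate.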



# Results

• Simulation sweep of critical performance parameters:

PCIe 2x vs PCIe 1x

| System Scenario                    | Up Stream<br>(Norm. Gbps) | Down Stream<br>(Norm. Gbps) | Latency<br>(Norm. ms) | Comments             |
|------------------------------------|---------------------------|-----------------------------|-----------------------|----------------------|
| WAN Standalone on 2x<br>PCIe lanes | 1                         | 4                           | 2                     | Meets KPI            |
| WAN Standalone on 1x<br>PCIe lanes | 4                         | 4                           |                       | Does not<br>meet KPI |

x denotes the multiplication factor (for example, x = 2).

#### **Conclusions:**

Clearly, PCIe 1x lanes do not meet the peak throughput rates.

The WAN $\Leftrightarrow$ Network Processor path needs a PCIe 2x SoC interconnect.

#### **Sweep Parameters**

- Number of PCIe lanes
- Number of pending transactions
- Overload conditions and others

#### Markers



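| Throughput                   | Latency                           |
|------------------------------|-----------------------------------|
| Gbps ≥ Expected Rate (ER)    | Latency ≤ Expected Latency (EL)   |
| 95% of ER < Gbps < ER        | EL < Latency < 105% of EL         |
| 90% of ER < Gbps < 95% of ER | 105% of EL < Latency < 110% of EL |
| 85% of ER < Gbps < 90% of ER | 110% of EL < Latency < 115% of EL |
| Gbps ≤ 85% of ER             | Latency ≥ 115% of EL              |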
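To post-process sweep results against this legend automatically, each measurement can be mapped to a band index, 0 for the first row and 4 for the last. The Python sketch below is illustrative only: the function names are hypothetical, and where the legend's strict inequalities leave a boundary value uncovered (for example exactly 95% of ER), the value is assigned to the better band.

```python
from typing import Tuple

# Band 0 is the best marker row, band 4 the worst.
THROUGHPUT_FLOORS = (1.00, 0.95, 0.90)  # fractions of ER opening bands 0..2
LATENCY_CEILINGS = (1.00, 1.05, 1.10)   # fractions of EL closing bands 0..2


def throughput_band(gbps: float, expected_rate: float) -> int:
    """Map a measured rate to a marker row using the ratio to ER."""
    ratio = gbps / expected_rate
    for band, floor in enumerate(THROUGHPUT_FLOORS):
        if ratio >= floor:
            return band
    return 3 if ratio > 0.85 else 4  # the legend puts exactly 85% of ER in band 4


def latency_band(latency: float, expected_latency: float) -> int:
    """Map a measured latency to a marker row using the ratio to EL."""
    ratio = latency / expected_latency
    for band, ceiling in enumerate(LATENCY_CEILINGS):
        if ratio <= ceiling:
            return band
    return 3 if ratio < 1.15 else 4  # the legend puts exactly 115% of EL in band 4


def classify(gbps: float, expected_rate: float,
             latency: float, expected_latency: float) -> Tuple[int, int]:
    """Return (throughput band, latency band) for one simulation result."""
    return throughput_band(gbps, expected_rate), latency_band(latency, expected_latency)


if __name__ == "__main__":
    print(classify(9.7, 10.0, 2.15, 2.0))
```

Keeping the two indices separate avoids assuming how the original slides combined throughput and latency into a single KPI verdict.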







# Results

• Simulation sweep of critical performance parameters:

| Up Stream<br>(Norm. Gbps) | Down Stream<br>(Norm. Gbps) | Latency<br>(Norm. ms) | Comments             |
|---------------------------|-----------------------------|-----------------------|----------------------|
|                           |                             | 2                     | Meets KPI            |
| 5                         |                             |                       | Does not<br>meet KPI |
| 4                         |                             | 27                    | Does not<br>meet KPI |
| 8                         | 1                           |                       | Does not<br>meet KPI |

Varying nPending on PCIe 2x

x denotes the multiplication factor (for example, x = 4).

#### **Conclusions:**

PCIe 2x lanes with 3x pending transactions on the interconnect ports only barely meet the expected rate.

The WAN $\Leftrightarrow$ Network Processor path needs PCIe 2x lanes with 4x pending transactions on the interconnect ports.

#### Sweep Parameters

- Number of PCIe lanes
- Number of pending transactions
- Overload conditions and others
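The pending-transaction conclusion can be sanity-checked with Little's law: a port that keeps at most nPending transactions of a given payload in flight over a round-trip latency cannot exceed nPending × payload / latency, however wide the link is. The sketch below sweeps nPending against that bound. The function names and the payload, latency and link figures in the example are made up, and the simulated platform additionally captures contention and burstiness that this bound ignores.

```python
from typing import Optional


def achievable_rate_gbps(n_pending: int, payload_bytes: int,
                         round_trip_ns: float, link_gbps: float) -> float:
    """Upper bound on a port's throughput with n_pending transactions in flight.

    Little's law: throughput <= outstanding bits / round-trip time, and
    bits per nanosecond equals Gbit/s. The link bandwidth is a second ceiling.
    """
    window_gbps = n_pending * payload_bytes * 8 / round_trip_ns
    return min(window_gbps, link_gbps)


def min_pending_for_rate(expected_gbps: float, payload_bytes: int,
                         round_trip_ns: float, link_gbps: float,
                         max_pending: int = 64) -> Optional[int]:
    """Sweep nPending upward and return the first value that reaches
    expected_gbps, or None if the link (or max_pending) caps throughput below it."""
    for n in range(1, max_pending + 1):
        if achievable_rate_gbps(n, payload_bytes, round_trip_ns, link_gbps) >= expected_gbps:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 256-byte payloads, 1 us round trip, 16 Gbit/s link.
    print(min_pending_for_rate(6.0, 256, 1000.0, 16.0))   # 3
    print(min_pending_for_rate(20.0, 256, 1000.0, 16.0))  # None: link-limited
```

The second call shows the other failure mode from the lane sweep: once the link itself is the ceiling, adding pending slots no longer helps and only more lanes can.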

#### Markers



| Throughput                   | Latency                           |
|------------------------------|-----------------------------------|
| Gbps ≥ Expected Rate (ER)    | Latency ≤ Expected Latency (EL)   |
| 95% of ER < Gbps < ER        | EL < Latency < 105% of EL         |
| 90% of ER < Gbps < 95% of ER | 105% of EL < Latency < 110% of EL |
| 85% of ER < Gbps < 90% of ER | 110% of EL < Latency < 115% of EL |
| Gbps ≤ 85% of ER             | Latency ≥ 115% of EL              |



# Results

• Simulation sweep of critical performance parameters:

| System Scenario                                              | Up Stream<br>(Norm. Gbps) | Down Stream<br>(Norm. Gbps) | Latency<br>(Norm. ms) | Comments             |
|--------------------------------------------------------------|---------------------------|-----------------------------|-----------------------|----------------------|
| WAN Standalone with<br>10% overload                          |                           |                             | 1                     | Meets KPI            |
| WAN Standalone with<br>20% overload                          |                           |                             |                       | Meets KPI            |
| Use case with all large packets                              |                           |                             | 5                     | Meets KPI            |
| Use case with short WAN packets<br>and all other packets large |                         |                             | 3                     | Meets KPI            |
| WAN Standalone with throttled<br>upstream (US) on PCIe 1x    |                           |                             |                       | Does not<br>meet KPI |

#### **Conclusions:**

The degrading latency trend under the 20% overload condition confirms that the design is not over-engineered.

The system architecture handles the customer use cases of the multi-SoC system.



Technical feedback on business opportunities for future systems

#### Sweep Parameters

- Number of PCIe lanes
- Number of pending transactions
- Overload conditions and others

#### Markers



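| Throughput                   | Latency                           |
|------------------------------|-----------------------------------|
| Gbps ≥ Expected Rate (ER)    | Latency ≤ Expected Latency (EL)   |
| 95% of ER < Gbps < ER        | EL < Latency < 105% of EL         |
| 90% of ER < Gbps < 95% of ER | 105% of EL < Latency < 110% of EL |
| 85% of ER < Gbps < 90% of ER | 110% of EL < Latency < 115% of EL |
| Gbps ≤ 85% of ER             | Latency ≥ 115% of EL              |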



### Summary



#### Multi-SoC AnyWAN<sup>™</sup> Architecture

• Developed a flexi performance analysis platform for a multi-SoC system based on the AnyWAN<sup>™</sup> Architecture

#### Optimised System Development

- Performed early design-space exploration and developed the optimal realizable architecture for the multi-SoC system
- Systematically explored architecture tradeoffs and provided design feedback on interconnect parameters in the multi-SoC system
- Design feedback on architectural & micro-architectural parameters of all the SoCs to meet the peak performance requirements

#### **Extendable Platform**

• Early technical feedback on architecture reusability to create business case for successive products





### Questions







### Back-up





- IPs/Subsystems
  - Transaction-level models based on the SystemQ infrastructure are developed.
  - IP protocol and behavior are signed off based on all the intra- and inter-dependencies of the various events.



Basic Elements of an IP

- These models are approximately timed.





- Traffic Sources and Sinks
  - Capable of sourcing and sinking traffic flows based on configurations such as packet length, packet-burst intervals, packet-burst distribution, source and destination IDs, source and sink rates, reporter/report interval, etc.
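The configuration list above mentions a reporter with a report interval. As an illustration only, a sink-side reporter could bin received bytes into fixed windows and emit one throughput sample per window. The `TrafficSink` class below is a hypothetical sketch, not the SystemQ sink, and it assumes time is measured in nanoseconds.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TrafficSink:
    """Hypothetical sink that consumes packets and reports throughput.

    Time is assumed to be in nanoseconds, so bits per nanosecond is Gbit/s.
    Packets must be delivered in non-decreasing time order.
    """
    report_interval: float
    reports: List[Tuple[float, float]] = field(default_factory=list, init=False)
    _window_start: float = field(default=0.0, init=False, repr=False)
    _window_bytes: int = field(default=0, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.report_interval <= 0:
            raise ValueError("report_interval must be positive")

    def receive(self, time: float, length_bytes: int) -> None:
        # Close every report window that ended before this packet arrived,
        # including empty ones, so idle periods show up as 0 Gbit/s.
        while time >= self._window_start + self.report_interval:
            self._emit_report()
        self._window_bytes += length_bytes

    def _emit_report(self) -> None:
        window_end = self._window_start + self.report_interval
        gbps = self._window_bytes * 8 / self.report_interval
        self.reports.append((window_end, gbps))
        self._window_start = window_end
        self._window_bytes = 0


if __name__ == "__main__":
    sink = TrafficSink(report_interval=100.0)
    for t, size in [(10.0, 100), (50.0, 100), (150.0, 50), (320.0, 10)]:
        sink.receive(t, size)
    print(sink.reports)
```

A window is reported only when a later packet proves it has closed, so the last, still-open window never appears in `reports`; a real reporter would also flush on a timer or at the end of simulation.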



• Time generators specify the time elapsed between two consecutive messages. For exponentially distributed inter-message times, the cumulative distribution function is:

$$F(t) = 1 - e^{-\lambda t}, \quad t \ge 0$$

where $\mu = \frac{1}{\lambda}$ is the mean inter-message time, so that

$$F^{-1}(x) = -\mu \ln(1 - x), \quad 0 \le x < 1$$

• Length generators model the length of each message, which is generated every time the timer elapses and triggers message generation.

- These models are approximately timed.
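The inverse CDF above is exactly what inverse-transform sampling needs: drawing x uniformly from [0, 1) and evaluating $F^{-1}(x)$ yields exponentially distributed inter-message times with mean $\mu$. The Python sketch below pairs such a time generator with a weighted length generator. The function names and the length/weight parameters are hypothetical, and Python's `random` module stands in for whatever generator the platform uses.

```python
import math
import random
from typing import Iterator, Sequence, Tuple


def exponential_interval(rng: random.Random, mean: float) -> float:
    """Inverse-transform sample: F^-1(x) = -mean * ln(1 - x), x uniform in [0, 1)."""
    x = rng.random()  # in [0.0, 1.0), so 1 - x is in (0, 1] and the log is defined
    return -mean * math.log(1.0 - x)


def message_stream(seed: int, mean_interval: float, lengths: Sequence[int],
                   weights: Sequence[float], count: int) -> Iterator[Tuple[float, int]]:
    """Yield (timestamp, length) pairs: the time generator schedules each message,
    and the length generator picks its size when the timer elapses."""
    if mean_interval <= 0:
        raise ValueError("mean_interval must be positive")
    rng = random.Random(seed)  # seeded so simulation runs are reproducible
    now = 0.0
    for _ in range(count):
        now += exponential_interval(rng, mean_interval)
        length = rng.choices(lengths, weights=weights, k=1)[0]
        yield now, length


if __name__ == "__main__":
    for timestamp, length in message_stream(1, 10.0, [64, 1500], [0.7, 0.3], 5):
        print(f"t={timestamp:8.3f}  length={length}")
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps each traffic source's stream independent and repeatable across sweep runs.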



- Bridges (Abstractions/Protocols)
  - Components at different timing accuracy
    - The multi-SoC platform comprises components working at different timing accuracies, so accurate bridges are required to translate data from one component to another without loss of information.
  - Components at different protocols
    - Components may use different SystemQ message protocols (i.e., different packet information in the message) to model the IP behavior in each SoC.
  - As the data flow traverses multiple SoCs and data-arrival triggers cause events in other SoCs, care must be taken when developing the bridges to retain the most critical information of the data flow, which is used at all stages.
  - These models are approximately timed.
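A bridge between two message protocols has to carry the flow-critical fields across unchanged. The sketch below uses two hypothetical message formats (packet-level and beat-level) and a fixed bridge latency to show a lossless translation in both directions. All class and field names are assumptions, and the sketch does not model the timing-accuracy conversion that the platform's bridges also perform.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PacketMsg:
    """Hypothetical packet-level message used by one SoC model."""
    flow_id: int
    src_id: int
    dst_id: int
    length_bytes: int
    created_at: float  # creation time, needed for end-to-end latency


@dataclass(frozen=True)
class BurstMsg:
    """Hypothetical beat-level message used by a more detailed SoC model."""
    flow_id: int
    src_id: int
    dst_id: int
    beats: int        # number of bus beats the packet occupies
    valid_bytes: int  # exact payload size; the last beat may be partial
    created_at: float


class PacketBurstBridge:
    """Translates between the two message protocols with a fixed latency.

    Flow ID, source/destination IDs, exact length and creation time are
    carried across so downstream SoCs can still attribute events to the flow.
    """

    def __init__(self, beat_bytes: int, latency: float) -> None:
        if beat_bytes <= 0:
            raise ValueError("beat_bytes must be positive")
        self.beat_bytes = beat_bytes
        self.latency = latency

    def to_burst(self, msg: PacketMsg, now: float) -> Tuple[BurstMsg, float]:
        """Return the translated message and the time it leaves the bridge."""
        if msg.length_bytes <= 0:
            raise ValueError("packet length must be positive")
        beats = math.ceil(msg.length_bytes / self.beat_bytes)
        burst = BurstMsg(msg.flow_id, msg.src_id, msg.dst_id,
                         beats, msg.length_bytes, msg.created_at)
        return burst, now + self.latency

    def to_packet(self, msg: BurstMsg) -> PacketMsg:
        """Reverse translation; lossless because valid_bytes keeps the exact length."""
        return PacketMsg(msg.flow_id, msg.src_id, msg.dst_id,
                         msg.valid_bytes, msg.created_at)


if __name__ == "__main__":
    bridge = PacketBurstBridge(beat_bytes=32, latency=5.0)
    packet = PacketMsg(flow_id=7, src_id=1, dst_id=2, length_bytes=100, created_at=12.5)
    burst, leaves_at = bridge.to_burst(packet, now=20.0)
    print(burst, leaves_at, bridge.to_packet(burst) == packet)
```

Storing only the beat count would lose the exact packet length on the way back; keeping `valid_bytes` alongside it is what makes the round trip lossless.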



