

## High-Definition Routing Congestion Prediction for Large-Scale FPGAs

Mohamed Baker Alawieh<sup>1</sup>, Wuxi Li<sup>1</sup>, **Yibo Lin**<sup>2</sup>, Love Singhal<sup>3</sup>, Mahesh Iyer<sup>3</sup> and David Z. Pan<sup>1</sup> <sup>1</sup>ECE Department, University of Texas at Austin <sup>2</sup>CS Department, Peking University <sup>3</sup>Intel Corporation, USA

# **FPGA Routing Congestion Prediction**


### Field Programmable Gate Arrays

- High energy efficiency
- Good reprogrammability
- Rapidly growing capacity

## FPGA Placement

Has a significant impact on FPGA routing quality

## **Routability Aware**

Incorporates congestion prediction into the placement process



## **Congestion**

Even primitive congestion prediction techniques have shown a significant impact on routing quality





## **Conventional Approaches**



#### RouteNet

Predicts congestion hotspots and design rule violations [Xie+, ICCAD'18]

#### RUDY

Bounding-box-based routing demand estimation; overestimates the routing demand [Spindler+, DATE'07]
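A minimal RUDY sketch, assuming each net is given as a list of pin locations already snapped to integer tile coordinates on a `width` × `height` grid (the function and parameter names are hypothetical). Each net spreads its half-perimeter wirelength $w + h$ uniformly over its bounding box, so every covered tile receives $(w + h)/(wh)$.

```python
def rudy_map(nets, width, height):
    """Estimate routing demand per tile with RUDY.

    nets: list of nets, each a list of (x, y) pin tiles (0 <= x < width,
    0 <= y < height). The bounding box is taken over whole tiles, so a net
    spanning columns x0..x1 has width w = x1 - x0 + 1 (likewise h).
    """
    grid = [[0.0] * width for _ in range(height)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        w, h = x1 - x0 + 1, y1 - y0 + 1
        # Half-perimeter wirelength (w + h) spread uniformly over w * h tiles.
        density = (w + h) / (w * h)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                grid[y][x] += density
    return grid


# Two nets on a 4 x 2 tile grid: a 2 x 2 net and a 4 x 1 horizontal net.
demand = rudy_map([[(0, 0), (1, 1)], [(0, 0), (3, 0)]], width=4, height=2)
for row in demand:
    print(row)
# [2.25, 2.25, 1.25, 1.25]
# [1.0, 1.0, 0.0, 0.0]
```

Each net contributes exactly $w + h$ in total, so the map sums to the total half-perimeter wirelength (here $4 + 5 = 9$).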

#### **GAN-Based**

Predicts congestion from placement; cannot handle industrial-size designs [Yu+, DAC'19]

#### Regression-based Prediction

Predicts congestion from global routing information [Pui+, ICCAD'17]



## Conditional GANs for Image Translation [Isola+, CVPR 2017]

**CGANs**

Conditional GANs



### Image Translation

- CGANs can be used for image translation
- They apply domain transfer: take an image from one domain and generate the corresponding output in another
- Training uses pairs of matched images

*[Cartoon credit: Gall, 18, dzone.com]*

**GANs**

Generative Adversarial Networks

# **GAN-based Congestion Estimation**

#### **Placement and Netlist Information**



## **Features**

Uses the VTR academic tool; works for small designs only

Netlist information is encoded using flying lines

\*For a large design with over 700K nets:

This representation becomes unusable for large designs, since drawing all the flying lines clutters the image



**CGAN-Based Image Translation** 







*Only 5K of the 700K nets shown*

*All 700K nets shown*



*pix2pix* model [Isola+, CVPR 2017]: limited to 256×256 resolution; cannot handle large-scale FPGAs

# **High-Definition Routing Prediction for Large FPGAs**

#### GAN Model

*pix2pix* model [Isola+, CVPR 2017]: limited to 256×256 resolution; cannot handle large-scale FPGAs



Virtex UltraScale+ VU19 has ~663K CLB slices

Use a high-definition image translation model that handles resolutions up to 4000×1000



#### Features

Uses the VTR academic tool; works for small designs only



Novel feature encoding for placement and netlist information, using different channels of the input image

## **Input Features Encoding**

## **Pin Density**

Reflects placement information; encoded on the blue channel
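The slides do not say how pin density is binned or scaled. A minimal sketch, assuming pins are counted per bin and the map is normalized by the densest bin (the function name, bin sizes, and normalization are hypothetical choices, not the authors' encoding):

```python
def pin_density_map(pins, width, height, bin_w=1.0, bin_h=1.0):
    """Count pins per bin, then scale so the densest bin equals 1.0.

    pins: iterable of (x, y) pin coordinates, assumed non-negative.
    bin_w, bin_h: bin size in the same units as the coordinates.
    """
    grid = [[0.0] * width for _ in range(height)]
    for px, py in pins:
        # Clamp to the last bin so pins on the far edge stay inside the map.
        x = min(int(px // bin_w), width - 1)
        y = min(int(py // bin_h), height - 1)
        grid[y][x] += 1.0
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid


density = pin_density_map([(0.2, 0.3), (0.7, 0.9), (1.5, 0.1), (3.9, 1.9)], width=4, height=2)
print(density)
# [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.0, 0.5]]
```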



## Vertical Demand

Estimates vertical routing demand; computed analogously to RUDY; encoded on the green channel



#### Resulting **RGB** image



## Horizontal Demand

Estimates horizontal routing demand; computed analogously to RUDY; encoded on the red channel
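"Analogous to RUDY" leaves the exact split open. One plausible sketch, assuming nets are pin lists on integer tiles and that a net spanning $w \times h$ tiles contributes about $w$ units of horizontal and $h$ units of vertical wire, each spread over its bounding box; the per-tile values $1/h$ and $1/w$ then sum to the RUDY value $(w+h)/(wh)$. The 8-bit scaling by each channel's maximum and all names are hypothetical.

```python
def directional_demand(nets, width, height):
    """Split RUDY into horizontal and vertical demand maps (hypothetical split).

    A net whose bounding box spans w x h tiles gets roughly w units of
    horizontal and h units of vertical wire; spreading each uniformly over the
    w * h tiles gives 1/h horizontal and 1/w vertical demand per tile.
    """
    horiz = [[0.0] * width for _ in range(height)]
    vert = [[0.0] * width for _ in range(height)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        w, h = x1 - x0 + 1, y1 - y0 + 1
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                horiz[y][x] += 1.0 / h
                vert[y][x] += 1.0 / w
    return horiz, vert


def to_channel(grid):
    """Scale a map to 8-bit values (0-255) relative to its own maximum."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0 for _ in row] for row in grid]
    return [[round(255 * v / peak) for v in row] for row in grid]


def encode_input_image(horiz, vert, pin_density):
    """Build RGB pixels: R = horizontal demand, G = vertical demand, B = pin density."""
    r, g, b = to_channel(horiz), to_channel(vert), to_channel(pin_density)
    return [[(r[y][x], g[y][x], b[y][x]) for x in range(len(r[y]))]
            for y in range(len(r))]


# One 4 x 1 horizontal net and a single pin hotspot in the bottom-right tile.
horiz, vert = directional_demand([[(0, 0), (3, 0)]], width=4, height=2)
pin_map = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
image = encode_input_image(horiz, vert, pin_map)
print(image[0][1], image[1][3])
# (255, 255, 0) (0, 0, 255)
```

Normalizing each channel separately keeps all three features in the full 0-255 range; a shared scale would be an equally valid choice.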



## **Output Features Encoding**

### Vertical Routing

Routing congestion along the vertical direction



### Horizontal Routing

Routing congestion along the horizontal direction


Resulting **RGB** image (blue channel left empty)
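The slides state only that the two congestion maps occupy two channels and that blue stays empty. A minimal sketch, assuming congestion is routed track usage divided by a per-direction track capacity, saturated at 1, with horizontal congestion on red and vertical on green (the channel assignment, normalization, and names are hypothetical). The decoder maps a predicted image back to estimated usage.

```python
def _level(usage, capacity):
    """Map usage / capacity to 0-255, saturating at full capacity."""
    return round(255 * min(usage / capacity, 1.0))


def encode_target_image(h_usage, v_usage, h_cap, v_cap):
    """Pack routed congestion into RGB pixels: R = horizontal, G = vertical, B = 0."""
    return [[(_level(h, h_cap), _level(v, v_cap), 0) for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(h_usage, v_usage)]


def decode_usage(image, h_cap, v_cap):
    """Invert the encoding (up to rounding and saturation) for a predicted image."""
    h = [[r / 255 * h_cap for r, _, _ in row] for row in image]
    v = [[g / 255 * v_cap for _, g, _ in row] for row in image]
    return h, v


# Third tile is over capacity horizontally (30 > 20), so red saturates at 255.
target = encode_target_image([[20, 0, 30]], [[8, 40, 0]], h_cap=20, v_cap=40)
print(target)
# [[(255, 51, 0), (0, 255, 0), (255, 0, 0)]]
```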

# **High Definition Image Translation**

#### pix2pixHD [Wang+, CVPR'18]

#### Generator Design

Dual-generator architecture for high-resolution generation

Global Generator $(G_1)$: performs the core translation; works at half the target resolution

Local Enhancer $(G_2)$: generates the high-resolution image; fine-tunes details in the image
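The resolution bookkeeping behind the dual-generator design fits in a few lines. Assuming, hypothetically, a pix2pixHD-style configuration in which $G_1$ downsamples its input 4 times by a factor of 2 and $G_2$ adds one more factor-of-2 stage, both image dimensions must be multiples of $2^{4+1} = 32$, and $G_1$ sees half the width and height that $G_2$ outputs. The layer counts and helper names are illustrative assumptions, not values from the slides.

```python
def padded_size(width, height, n_down_global=4, n_local=1):
    """Round width/height up to the multiple required by all stride-2 stages."""
    base = 2 ** (n_down_global + n_local)

    def round_up(n):
        return -(-n // base) * base  # ceiling division, then scale back up

    return round_up(width), round_up(height)


def stage_resolutions(width, height):
    """Resolution produced by the local enhancer G2 and used by the global generator G1."""
    return {"G2": (width, height), "G1": (width // 2, height // 2)}


w, h = padded_size(4000, 1000)
print((w, h), stage_resolutions(w, h))
# (4000, 1024) {'G2': (4000, 1024), 'G1': (2000, 512)}
```

Under these assumptions a 4000×1000 map would be padded to 4000×1024, and $G_1$ would work on a quarter of the pixels.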




#### Discriminator Design

Three-level (multi-scale) discrimination: discriminators $D_1$, $D_2$, $D_3$ evaluate the image at three different scales
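In pix2pixHD the three discriminators see the real and generated images at different scales: $D_1$ at full resolution, $D_2$ and $D_3$ after downsampling by 2 and 4. A minimal sketch of building that image pyramid for a single-channel map, assuming plain 2×2 average pooling (pix2pixHD's own pooling layer may differ; the helper names are hypothetical):

```python
def avg_pool_2x(grid):
    """2x2 average pooling with stride 2 (a trailing odd row/column is dropped)."""
    return [[(grid[2 * y][2 * x] + grid[2 * y][2 * x + 1]
              + grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1]) / 4
             for x in range(len(grid[0]) // 2)]
            for y in range(len(grid) // 2)]


def discriminator_pyramid(grid, num_scales=3):
    """Return the inputs for D_1..D_num_scales: full, 1/2, 1/4, ... resolution."""
    scales = [grid]
    for _ in range(num_scales - 1):
        scales.append(avg_pool_2x(scales[-1]))
    return scales


image = [[float(4 * y + x) for x in range(4)] for y in range(4)]
for k, level in enumerate(discriminator_pyramid(image), start=1):
    print(f"D_{k}: {len(level[0])}x{len(level)}")
# D_1: 4x4
# D_2: 2x2
# D_3: 1x1
```

The coarsest discriminator has the largest effective receptive field and judges global structure, while $D_1$ focuses on fine detail.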

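Loss: GAN loss plus feature matching loss

$$\min_{G} \Big( \big( \max_{D_1, D_2, D_3} \sum_{k=1,2,3} \mathcal{L}_{GAN}(G, D_k) \big) + \lambda \sum_{k=1,2,3} \mathcal{L}_{FM}(G, D_k) \Big)$$

$$\mathcal{L}_{GAN}(G, D_k) = \mathbb{E}_{x, y}[\log D_k(x, y)] + \mathbb{E}_x[\log (1 - D_k(x, G(x)))]$$

$$\mathcal{L}_{FM}(G, D_k) = \mathbb{E}_{x, y} \sum_{i=1}^T \|D_k^{(i)}(x, y) - D_k^{(i)}(x, G(x))\|_1$$

where $x$ is the input feature image, $y$ the ground-truth congestion image, $D_k^{(i)}$ the output of the $i$-th layer of $D_k$, and $T$ the number of layers.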
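To make the objective concrete, a minimal pure-Python sketch evaluating the terms on toy numbers. It assumes discriminator outputs are probabilities in (0, 1) over a batch, and that intermediate discriminator features are given as flat lists per layer; the function names and the value of $\lambda$ are hypothetical. Following the slide, $\mathcal{L}_{FM}$ sums unnormalized L1 distances (the pix2pixHD paper additionally divides each layer's term by its number of elements).

```python
import math


def gan_loss(d_real, d_fake):
    """L_GAN(G, D_k): mean log D_k(x, y) + mean log(1 - D_k(x, G(x)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - q) for q in d_fake) / len(d_fake)
    return real_term + fake_term


def feature_matching_loss(real_feats, fake_feats):
    """L_FM(G, D_k): sum over layers i = 1..T of ||D_k^(i)(x, y) - D_k^(i)(x, G(x))||_1."""
    return sum(sum(abs(a - b) for a, b in zip(real_layer, fake_layer))
               for real_layer, fake_layer in zip(real_feats, fake_feats))


def generator_objective(per_discriminator, lam):
    """Sum over k of L_GAN(G, D_k) + lam * L_FM(G, D_k), evaluated for fixed D_k."""
    return sum(gan_loss(d_real, d_fake) + lam * feature_matching_loss(rf, ff)
               for d_real, d_fake, rf, ff in per_discriminator)


# One discriminator with T = 2 layers of toy features.
terms = [([0.9], [0.2], [[1.0, 2.0], [0.0]], [[1.5, 2.0], [1.0]])]
print(generator_objective(terms, lam=10.0))
```

In training, each $D_k$ is updated to increase $\mathcal{L}_{GAN}$ while $G$ is updated to decrease the full sum; the feature matching term depends on $D_k$ only as a fixed feature extractor.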

## **Experimental Setup**

#### **Benchmark**

- ISPD 2016
- Placement: elfPlace [Li+, ICCAD'19]
- Routing: NCTU-GR [Liu+, TCAD'13]
- For each design, 200 placements are generated and routed, and the resulting congestion maps are obtained

#### **Training Setup**

- Train 12 different models
- Each model uses 11 designs for training and 1 for testing

#### **Evaluation Metrics**

- NRMS: normalized root mean square error
- SSIM: structural similarity index
- EMD: earth mover's distance between pixel distributions

*Table: ISPD 2016 design statistics and available device resources*
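Two of the metrics can be sketched directly. NRMS is computed here as the root-mean-square error divided by the range of the ground-truth map; normalization conventions vary (by range or by mean), so this choice is an assumption. For 8-bit images, the earth mover's distance between two pixel-value histograms reduces to the summed absolute difference of their cumulative distributions. SSIM is omitted; scikit-image provides it as `skimage.metrics.structural_similarity`. The helper names are hypothetical.

```python
import math


def nrms(pred, target):
    """RMS error between two equally sized maps, normalized by the target's range."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)) / len(t))
    span = max(t) - min(t)
    return rmse / span if span else rmse


def emd_pixels(pred, target, levels=256):
    """1-D earth mover's distance between the pixel-value histograms of two 8-bit images."""
    def histogram(img):
        counts = [0] * levels
        for row in img:
            for v in row:
                counts[v] += 1
        total = sum(counts)
        return [c / total for c in counts]

    hp, ht = histogram(pred), histogram(target)
    carried, distance = 0.0, 0.0
    for a, b in zip(hp, ht):
        carried += a - b          # probability mass that must move past this bin
        distance += abs(carried)
    return distance


print(nrms([[1, 2], [3, 4]], [[1, 2], [3, 6]]))   # 0.2
print(emd_pixels([[0, 0]], [[2, 2]]))             # 2.0
```

EMD compares only the distributions of pixel values, not their locations, so it complements the pixel-wise NRMS.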

## **Sample Results – FPGA-2**

Baselines: RUDY [Spindler+, DATE'07] and pix2pix [Yu+, DAC'19]\*



## **Sample Results – FPGA-8**

Baselines: RUDY [Spindler+, DATE'07] and pix2pix [Yu+, DAC'19]\*



## **Quantitative Comparison**




## **Model Application**

**In Placement:** Models were used for routability estimation within elfPlace, replacing RUDY

**Routed WL reduction:** up to 7%

Routed wirelength with RUDY vs. the proposed model as the routability estimator:

| Design  | Full capacity: RUDY | Full capacity: Proposed | Full capacity: Imp. | Reduced capacity: RUDY | Reduced capacity: Proposed | Reduced capacity: Imp. |
|---------|---------------------|-------------------------|---------------------|------------------------|----------------------------|------------------------|
| FPGA-1  | 336117              | 336117                  | 0.00%               | 336117                 | 336117                     | 0.00%                  |
| FPGA-2  | 691618              | 691618                  | 0.00%               | 691618                 | 691618                     | 0.00%                  |
| FPGA-3  | 3062734             | 3062734                 | 0.00%               | 3062734                | 3062734                    | 0.00%                  |
| FPGA-4  | 5550659             | 5551473                 | -0.01%              | 5557608                | 5551473                    | 0.11%                  |
| FPGA-5  | 10538770            | 9797007                 | 7.04%               | N/A                    | N/A                        | N/A                    |
| FPGA-6  | 5773333             | 5773333                 | 0.00%               | 5777149                | 5773333                    | 0.07%                  |
| FPGA-7  | 9182199             | 9163640                 | 0.20%               | 9199730                | 9163640                    | 0.39%                  |
| FPGA-8  | 9053192             | 9053192                 | 0.00%               | 9055093                | 9055093                    | 0.00%                  |
| FPGA-9  | 11641853            | 11635870                | 0.05%               | 11652436               | 11635870                   | 0.14%                  |
| FPGA-10 | 5515319             | 5515319                 | 0.00%               | 5515319                | 5515319                    | 0.00%                  |
| FPGA-11 | 11777500            | 11757650                | 0.16%               | 11877778               | 11757650                   | 1.01%                  |
| FPGA-12 | 6235694             | 6235694                 | 0.00%               | 6224962                | 6235694                    | -0.17%                 |

#### **FPGA-5** is the most congested design
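The Imp. columns are consistent with the relative reduction (RUDY - Proposed) / RUDY, so a negative value means the proposed estimator led to slightly longer routed wirelength. A short check recomputing a few entries from the Model Application table (the helper name is hypothetical):

```python
def improvement(rudy_wl, proposed_wl):
    """Percent routed-wirelength reduction of the proposed estimator over RUDY."""
    return 100.0 * (rudy_wl - proposed_wl) / rudy_wl


# (design, RUDY WL, proposed WL), taken from the Model Application table
rows = [
    ("FPGA-5, full capacity", 10538770, 9797007),
    ("FPGA-4, full capacity", 5550659, 5551473),
    ("FPGA-11, reduced capacity", 11877778, 11757650),
]
for name, rudy_wl, proposed_wl in rows:
    print(f"{name}: {improvement(rudy_wl, proposed_wl):.2f}%")
# FPGA-5, full capacity: 7.04%
# FPGA-4, full capacity: -0.01%
# FPGA-11, reduced capacity: 1.01%
```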

## **Conclusions**

- We propose an accurate FPGA routing congestion estimation framework based on high-definition image translation
- Our proposed approach demonstrates superior accuracy compared to state-of-the-art techniques
- When used for routability estimation in placement, our approach reduces routed wirelength by up to 7%

## **Future Work**

- Further improve feature representation
  - Preserve original connectivity information in feature encoding
- Develop a new placement algorithm built around such accurate congestion estimation
- Extend the application to ASICs