References
Exploiting 2.5D/3D Heterogeneous Integration for AI Computing[^1]
Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling[^2]
End-to-End Benchmarking of Chiplet-Based In-Memory Computing[^3]
Summary
HISIM[^4] is a modeling and benchmarking tool for heterogeneous integration of chiplets that communicate through a NoP[^5]. Its components are: partitioning, mapping and placement; computing unit/processing unit; heterogeneous interconnection; network/routing engine; and thermal analysis. It provides a technology roadmap[^6], power/latency prediction, thermal analysis for electro-thermal co-design, and cycle-accurate simulation for design space exploration.
For 3D interconnection, it uses a simplified TSV model for RC extraction.
It generalizes the technology roadmap and electrical modeling for 2.5D/3D interconnect so that they are comparable with on-chip interconnect.
Novelties
- It is the first to support hardware performance evaluation of chiplet architectures. It handles monolithic, 2.5D and 3D architectures for AI models at the same time, so their performance can be compared and the configuration optimized.
- It integrates in-memory computing (IMC) for AI models.
Source Code
Placement:
place_1: tiers determined by tiles per tier; int(math.sqrt(N_tile))
place_5: tiles per tier determined by number of tiers; mesh_edge=int(math.sqrt(N_tile))
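The two placement modes above can be read as two ways of splitting a fixed tile count across tiers. The sketch below is only one possible reading of these notes, not HISIM code: the helper names, the use of `math.ceil`, and the example tile counts are all assumptions. Only `int(math.sqrt(N_tile))` comes from the notes.

```python
import math

def tiers_from_tiles_per_tier(total_tiles, tiles_per_tier):
    """place_1 style (assumed): tiles per tier is given, derive the tier count."""
    return math.ceil(total_tiles / tiles_per_tier)

def tiles_per_tier_from_tiers(total_tiles, n_tier):
    """place_5 style (assumed): tier count is given, derive tiles per tier."""
    return math.ceil(total_tiles / n_tier)

def mesh_edge(n_tile):
    """Edge length of a square mesh for n_tile tiles, as written in the notes."""
    return int(math.sqrt(n_tile))

total_tiles = 100  # hypothetical workload size
n_tiers = tiers_from_tiles_per_tier(total_tiles, 36)
per_tier = tiles_per_tier_from_tiers(total_tiles, 4)
print("tiers:", n_tiers, "tiles per tier:", per_tier, "mesh edge:", mesh_edge(per_tier))
```

Note that `int(math.sqrt(...))` rounds down, so a tile count that is not a perfect square gives a mesh with fewer than `n_tile` positions.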
Variable | Explanation | Relationships
---|---|---
computing_data[i][1] | tiles for the ML layer |
computing_data[i][9] | specific chiplet | computing_data
computing_data[i][14] | flops |
placement[^7] | 1: calc tier, 2.5D[^8]; 5: calc tiles per tier ❓[^9] |
placement | 1️⃣ tier edge to edge 2️⃣ from the bottom to top tier1 ❓ 3️⃣ the hotspots far from each other 4️⃣ put all hotspots in the same place 5️⃣ tile-to-tile 3D connection |
layer_start_tile, layer_end_tile[^10] | 1️⃣: tiles per tier given[^11]; 5️⃣: ❓[^12] | computing_data[i][9], computing_data[i][1][^13], layer_start_tile_tier
each_tile_activation_Q[^14] | computing_data[i][8], computing_data[i][1] |
tile_total[^15] | 1️⃣ layer[^16] 2️⃣ tile array 3️⃣ tile element | tile_index [[x,y,computing_data[i][9]]]; last item each_tile_activation_Q *3 ❓
empty_tile_total | depends on placement; [[[x0,y0,tier], [x0,y1,tier],…]] | tile_index[^17]
Q_3d_scatter[^18] | ❓ |
layer_HOP_2d[^19], layer_HOP_3d[^20] | → ↑ |
Local/Global Routing: layer_Q[^21], Q_3d[^22], Q_2d[^22]; one tile connects to all next-level tiles | Local Routing ❓ Q_3d+=(tile_total[i][-1][2])*(len(tile_total[i+1])-1), \frac{Q_{tier}}{N_{layer}}\cdot N_{layer\_next} ❓; Global Routing ❓ Q_3d[^23], Q_2d | each_tile_activation_Q
total_router_area[^24] | Orion[^25] | single_router_area1, edge_single_router, channel_width
area_2_5d ❗️ | aib edge length[^26] ❗️; layer_Q_2_5d[^27] | layer_aib[^28], aib_out
L_booksim_2d[^29], L_booksim_3d[^30], L_2_5d[^31] | Latency from Booksim[^32]; delay factors ❗️[^33] | aib_out[2], power_summary_router
power: total_2d_channel_power, total_2d_router_power, total_tsv_channel_power, total_3d_router_power | Latency from Booksim[^32]; power_summary_router func[^34] |
tier_2d_hop_list_power ❓, tier_3d_hop_list_power ❓, tier_2d_hop_list ❓[^35], tier_3d_hop_list ❓ | |
total_energy: energy_2d[^36], energy_3d[^37], energy_2_5d[^38] | ❓ L_booksim_2d, ❓ L_booksim_3d |
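The notes attached to layer_HOP_2d and layer_HOP_3d describe them as Manhattan distances from the current layer to the next. A minimal sketch of that idea follows, assuming each tile is stored as `[x, y, tier]` and that every tile of one layer talks to every tile of the next. The function name and the all-pairs summation are assumptions for illustration, not the HISIM implementation.

```python
def hops_between_layers(tiles_a, tiles_b):
    """Sum in-plane (2D) and vertical (3D) hop counts over all tile pairs.

    Each tile is [x, y, tier]. In-plane hops are |dx| + |dy|;
    vertical hops are |dtier|.
    """
    hop2d = 0
    hop3d = 0
    for xa, ya, ta in tiles_a:
        for xb, yb, tb in tiles_b:
            hop2d += abs(xa - xb) + abs(ya - yb)
            hop3d += abs(ta - tb)
    return hop2d, hop3d

# Hypothetical two-layer mapping: layer 0 on tier 0, layer 1 split over tiers 0 and 1.
layer0 = [[0, 0, 0], [1, 0, 0]]
layer1 = [[0, 1, 0], [1, 1, 1]]
hop2d, hop3d = hops_between_layers(layer0, layer1)
print("hop2d =", hop2d, "hop3d =", hop3d)
```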
Variable | Explanation | Relationships
---|---|---
power_router, power_tsv | power_router: not placement 2[^39], placement 2[^40]; power_tsv: not placement 2[^41], placement 2[^42] | tier_2d_hop_list_power, tier_3d_hop_list_power
imc_size, r_size | area_single_tile, single_router_area |
thermal 3D | get_unitsize(), create_cube(), load_power(), conductance_G(), solver() |
thermal 2.5D | |
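The notes attached to L_booksim_2d and L_booksim_3d give the latency as `[hop*(trc+tva+tsa+tst+tl)+(tenq)*[Q/W]]/fclk_noc`. The sketch below evaluates that expression. Reading the inner brackets around `Q/W` as a ceiling is an assumption, and the one-cycle delay factors and the example traffic numbers are placeholders rather than Booksim values.

```python
import math

def noc_latency(hops, q_bits, width_bits, fclk_noc,
                trc=1, tva=1, tsa=1, tst=1, tl=1, tenq=1):
    """Latency in seconds for `hops` router hops and `q_bits` of traffic.

    trc, tva, tsa, tst, tl: per-hop delay factors in cycles.
    tenq: cycles per flit; flits = ceil(q_bits / width_bits) (assumed).
    """
    per_hop_cycles = trc + tva + tsa + tst + tl
    flits = math.ceil(q_bits / width_bits)
    cycles = hops * per_hop_cycles + tenq * flits
    return cycles / fclk_noc

# Hypothetical example: 3 hops, 1024 bits over a 32-bit channel at 1 GHz.
latency = noc_latency(hops=3, q_bits=1024, width_bits=32, fclk_noc=1e9)
print("latency [s] =", latency)
```

The same function covers the 2D and 3D cases by passing hop2d/Q_2d/W2d or hop3d/Q_3d/W3d.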
Components
Component | State-of-the-Art | This Work
---|---|---
Interconnect | vv ff |
Network | NoC: Booksim[^43], Orion[^44], Nirgam[^45][^46]; 3D NoC: Ratatoskr[^47] | cycle-accurate link delay
Category | Parameter | Description
---|---|---
mono/2.5D | l_{wire} | Length of wire
mono/2.5D | w_{wire} | Width of wire
mono/2.5D | t_{wire} | Thickness of wire
mono/2.5D | p_{wire} | Pitch of wire
3D | d_{TSV} | TSV metal diameter
3D | h_{TSV} | TSV height
3D | p_{TSV} | TSV pitch
3D | t_{ox\_TSV} | Thickness of TSV insulator
TSV
The lumped-circuit model of a TSV is based on the paper Modeling and Analysis of Through-Silicon Via (TSV) Noise Coupling and Suppression Using a Guard Ring[^48].
For the cylindrical TSV model, four C_{\mathrm{TSV}} sit in parallel in the horizontal directions, and two each of R_{\mathrm{TSV}}, L_{\mathrm{TSV}}, M_{\mathrm{TSV}} sit in series in the vertical direction, which is why the capacitance formula carries a factor of 1/4 and the others a factor of 1/2. In Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling[^2], only resistance and capacitance are considered.
Resistance
R_{\mathrm{TSV}}=\begin{cases}\frac{1}{2} \frac{1}{\sigma_{\mathrm{TSV}}} \frac{h_{\text{unit}}}{\pi r_{\mathrm{TSV}}^{2}}, & \text{if } \delta_{\mathrm{TSV}} \ge r_{\mathrm{TSV}}\\ \frac{1}{2} \frac{1}{\sigma_{\mathrm{TSV}}} \frac{h_{\text{unit}}}{\pi\left(r_{\mathrm{TSV}}^{2}-\left(r_{\mathrm{TSV}}-\delta_{\mathrm{TSV}}\right)^{2}\right)}, & \text{if } \delta_{\mathrm{TSV}} \lt r_{\mathrm{TSV}}\end{cases}[\Omega]
where the skin depth is \delta_{\mathrm{TSV}}=\frac{1}{\sqrt{\pi f \mu \sigma_{\mathrm{TSV}}}}[\mathrm{m}]. When the skin depth is smaller than the radius, current flows only in an outer ring of thickness \delta_{\mathrm{TSV}}, so the conducting area shrinks from a full disc to an annulus.
Self Inductance
L_{\mathrm{TSV}}=\frac{1}{2}\left(L\left(r_{\mathrm{TSV}}\right)-L\left(p_{\mathrm{TSV}-\mathrm{TSV}}\right)\right) \frac{h_{\text {unit }}}{h_{\mathrm{TSV}}}[H]
L(x)=\frac{\mu h_{\mathrm{TSV}}}{2 \pi}\left[\ln \left(\left(\frac{h_{\mathrm{TSV}}}{x}\right)+\sqrt{\left(\frac{h_{\mathrm{TSV}}}{x}\right)^{2}+1}\right)+\frac{x}{h_{\mathrm{TSV}}}-\sqrt{\left(\frac{x}{h_{\mathrm{TSV}}}\right)^{2}+1}\right]
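As a numeric check of the self-inductance expressions above, here is a minimal sketch. The geometry values (5 µm radius, 100 µm height, 20 µm for p_{TSV-TSV}) and the choice h_unit = h_TSV are assumptions for illustration, and the function names are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space [H/m]

def partial_inductance(x, h_tsv, mu=MU_0):
    """L(x) as defined above, in henries."""
    ratio = h_tsv / x
    bracket = (math.log(ratio + math.sqrt(ratio**2 + 1))
               + x / h_tsv
               - math.sqrt((x / h_tsv) ** 2 + 1))
    return mu * h_tsv / (2 * math.pi) * bracket

def tsv_self_inductance(r_tsv, p_tsv_tsv, h_tsv, h_unit):
    """L_TSV = 1/2 * (L(r_TSV) - L(p_TSV-TSV)) * h_unit / h_TSV."""
    diff = partial_inductance(r_tsv, h_tsv) - partial_inductance(p_tsv_tsv, h_tsv)
    return 0.5 * diff * h_unit / h_tsv

r_tsv = 5e-6        # TSV radius [m] (assumed)
h_tsv = 100e-6      # TSV height [m] (assumed)
p_tsv_tsv = 20e-6   # value used for p_TSV-TSV [m] (assumed)
L_tsv = tsv_self_inductance(r_tsv, p_tsv_tsv, h_tsv, h_unit=h_tsv)
print("L_tsv [H] =", L_tsv)
```

With these assumed dimensions the result is on the order of ten picohenries.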
Mutual Inductance
M_{\mathrm{TSV}}=\frac{1}{2} L\left(2 r_{\mathrm{TSV}}+d_{\mathrm{TSV}-\mathrm{TSV}}\right) \frac{h_{\text{unit}}}{h_{\mathrm{TSV}}}-\frac{1}{2} L\left(\sqrt{p_{\mathrm{TSV}-\mathrm{TSV}}^{2}+\left(2 r_{\mathrm{TSV}}+d_{\mathrm{TSV}-\mathrm{TSV}}\right)^{2}}\right) \frac{h_{\text{unit}}}{h_{\mathrm{TSV}}}[H]
where L(x) is the same function as defined under Self Inductance.
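The mutual-inductance expression can be evaluated the same way. This is a sketch under stated assumptions: the symbols d_{TSV-TSV} and p_{TSV-TSV} are used exactly as written in the formula without asserting their geometric meaning, and the numeric values and function names are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space [H/m]

def partial_inductance(x, h_tsv, mu=MU_0):
    """L(x) as defined under Self Inductance, in henries."""
    ratio = h_tsv / x
    bracket = (math.log(ratio + math.sqrt(ratio**2 + 1))
               + x / h_tsv
               - math.sqrt((x / h_tsv) ** 2 + 1))
    return mu * h_tsv / (2 * math.pi) * bracket

def tsv_mutual_inductance(r_tsv, d_tsv_tsv, p_tsv_tsv, h_tsv, h_unit):
    """M_TSV = 1/2 * [L(2r + d) - L(sqrt(p^2 + (2r + d)^2))] * h_unit / h_TSV."""
    near = 2 * r_tsv + d_tsv_tsv
    far = math.sqrt(p_tsv_tsv**2 + near**2)
    diff = partial_inductance(near, h_tsv) - partial_inductance(far, h_tsv)
    return 0.5 * diff * h_unit / h_tsv

# Assumed example values, in metres.
M_tsv = tsv_mutual_inductance(r_tsv=5e-6, d_tsv_tsv=10e-6, p_tsv_tsv=20e-6,
                              h_tsv=100e-6, h_unit=100e-6)
print("M_tsv [H] =", M_tsv)
```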
Capacitance
C_{\mathrm{TSV}}=\frac{1}{4}\cdot \frac{2 \pi \varepsilon_{0} \varepsilon_{r, \text { ox\_TSV }} h_{\text {sub }}}{\ln \left(\frac{r_{\mathrm{TSV}}+t_{\text {ox\_TSV }}}{r_{\mathrm{TSV}}}\right)} \cdot\frac{h_{\text {unit }}}{h_{\text {sub }}}[F]
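A short numeric sketch of the capacitance formula above. The SiO2 relative permittivity of 3.9 is a standard value. The liner thickness, radius and heights are assumed example values, and the function name is hypothetical.

```python
import math

EPS_0 = 8.854e-12  # permittivity of free space [F/m]

def tsv_capacitance(r_tsv, t_ox, h_sub, h_unit, eps_r_ox=3.9):
    """C_TSV = 1/4 * coaxial oxide capacitance over h_sub, scaled by h_unit / h_sub."""
    coaxial = 2 * math.pi * EPS_0 * eps_r_ox * h_sub / math.log((r_tsv + t_ox) / r_tsv)
    return 0.25 * coaxial * h_unit / h_sub

# Assumed geometry: 5 um radius, 0.5 um oxide liner, 100 um substrate, h_unit = h_sub.
C_tsv = tsv_capacitance(r_tsv=5e-6, t_ox=0.5e-6, h_sub=100e-6, h_unit=100e-6)
print("C_tsv [F] =", C_tsv)
```

Because h_sub appears in both the numerator and the denominator, the result scales only with h_unit.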
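# Cu TSV skin depth and resistance at 1 GHz
import math

freq = 1e9            # signal frequency [Hz]
sigma = 5.8e7         # Cu conductivity [S/m]
mu = 4e-7 * math.pi   # permeability of free space [H/m]

# Skin depth: delta = 1 / sqrt(pi * f * mu * sigma)
TSV_skin_depth = 1 / math.sqrt(math.pi * freq * mu * sigma)
print("TSV_skin_depth=", TSV_skin_depth)

r_tsv = 5e-6          # TSV radius [m]
d_tsv = 2 * r_tsv     # TSV diameter [m]
h_tsv = 100e-6        # TSV height [m], used here as h_unit

# Conducting cross-section: full disc if the skin depth covers the radius, else an annulus
if TSV_skin_depth >= r_tsv:
    area = math.pi * r_tsv**2
else:
    area = math.pi * (r_tsv**2 - (r_tsv - TSV_skin_depth)**2)
R_tsv = 0.5 * (1 / sigma) * h_tsv / area
print("R_tsv=", R_tsv)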
Resources
Heterogeneous Integration SIMulator GitHub Repo
1. Z. Wang et al., "Exploiting 2.5D/3D Heterogeneous Integration for AI Computing," 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 2024, pp. 758-764, doi: 10.1109/ASP-DAC58780.2024.10473875.
2. Z. Wang et al., "Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling," 2023 IEEE 15th International Conference on ASIC (ASICON), Nanjing, China, 2023, pp. 1-4, doi: 10.1109/ASICON58565.2023.10396377.
3. G. Krishnan et al., "End-to-End Benchmarking of Chiplet-Based In-Memory Computing," Neuromorphic Computing, IntechOpen, Nov. 15, 2023, doi: 10.5772/intechopen.111926.
4. Heterogeneous Integration Simulator
5. Network on Package
6. 2.5D- and 3D-based interconnect that is different from traditional ones
7. 1 as vertical; 5 as zigzag
8. tile arrangement of computing_data
9. vertical arrangement of computing_data
10. delta=computing_data[][1]-1
11. check computing_data[i][9]
12. computing_data from 1 to N_tier, then N_tier+1 to 2*N_tier, meaning the csv is in a vertical arrangement
13. layer_end_tile-layer_start_tile+1
14. computing_data[i][8]/computing_data[i][1]
15. [[[x0, y0, tier0],[x1, y0, tier0]],[[x4, y4, tier0],[x4, y5, tier0]]]
16. computing_data row
17. [[x0, y0, tier]]
18. tile_total[i][-1][2]*num_tiles_this_layer/N_tile
19. Manhattan distance of current layer to next layer, append diff hop2d
20. Manhattan distance of current layer to next layer, append diff hop3d
21. append
22. sum
23. different tiers: (tile_total[i][-1][2])*(len(tile_total[i+1])-1)/(N_tile*percent_router)
24. Orion, power_summary_router func, mesh_edge
25. H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik, "Orion: A power-performance simulator for interconnection networks," 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), 2002, pp. 294-305.
26. (edge_single_router+edge_single_tile)*mesh_edge
27. layer_Q_2_5d[i]*1e-6/8
28. using aib func
29. [hop2d*(trc+tva+tsa+tst+tl)+(tenq)*[Q_2d/W2d]]/fclk_noc
30. [hop3d*(trc+tva+tsa+tst+tl)+(tenq)*[Q_3d/W3d]]/fclk_noc
31. aib_out[2]
32. N. Jiang, D. U. Becker, G. Michelogiannakis, J. Balfour, B. Towles, D. E. Shaw, J. Kim, and W. J. Dally, "A detailed and flexible cycle-accurate network-on-chip simulator," 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013, pp. 86-96.
33. trc, tva, tsa, tst, tl, tenq
34. hop, bandwidth, delay, chiplet num, mesh edge
35. average number of 2D hops per layer per tile within each tier
36. [total_2d_channel_power+total_2d_router_power]*L_booksim_2d*fclk_noc
37. [total_tsv_channel_power+total_3d_router_power]*L_booksim_3d*fclk_noc
38. total_2_5d_channel_power*L_2_5d*fclk_noc
39. power_router[i] = tier_2d_hop_list_power[i]
40. power_router[len(tier_2d_hop_list_power)-i-1] = tier_2d_hop_list_power[i]
41. power_tsv[i] = tier_3d_hop_list_power[i]
42. power_tsv[len(tier_3d_hop_list_power)-i-1] = tier_3d_hop_list_power[i]
43. ✅ cycle-accurate simulation, performance evaluation
44. power, area models
45. 🚫 no latency report
46. file
47. ✅ performance results for 3D routers and links through RTL; 🚫 longer simulation time, no trace-based simulation
48. J. Cho et al., "Modeling and Analysis of Through-Silicon Via (TSV) Noise Coupling and Suppression Using a Guard Ring," IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 1, no. 2, pp. 220-233, Feb. 2011, doi: 10.1109/TCPMT.2010.2101892.