Exploiting 2.5D/3D Heterogeneous Integration for AI Computing

References

Exploiting 2.5D/3D Heterogeneous Integration for AI Computing1

Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling2

End-to-End Benchmarking of Chiplet-Based In-Memory Computing3

Summary

HISIM4 is a modeling and benchmarking tool for heterogeneous integration of chiplets that communicate through a NoP5. Its components are: partitioning, mapping and placement; computing unit/processing unit; heterogeneous interconnection; network/routing engine; and thermal analysis. It provides a technology roadmap6, power/latency prediction, thermal analysis for electro-thermal co-design, and cycle-accurate simulation for design space exploration.

For 3D interconnection, a simplified TSV model is used for RC extraction.

The technology roadmap and electrical modeling are generalized for 2.5D/3D interconnect, so that they are comparable with on-chip interconnect.

Novelties

  • It is the first to support hardware performance evaluation of chiplet architectures. It handles monolithic, 2.5D and 3D architectures for AI models at the same time, so their performance can be compared and the configuration optimized.
  • It integrates in-memory computing for AI models.

Source Code

Placement:

place_1: the number of tiers is determined by the tiles per tier; int(math.sqrt(N_tile))

place_5: the tiles per tier are determined by the number of tiers; mesh_edge=int(math.sqrt(N_tile))
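
A minimal sketch of how the two placement options might derive the stack geometry. It assumes that option 1 takes the tiles per tier as given and derives the tier count, and that option 5 takes the tier count as given and derives the tiles per tier. The function names, the ceiling division and the example numbers are illustrative assumptions, not code from the HISIM repo; only `int(math.sqrt(N_tile))` comes from the notes.

```python
import math

def tiers_from_tiles_per_tier(total_tiles: int, n_tile: int) -> int:
    """Option 1 (assumed): tiles per tier is given, the tier count is derived."""
    return math.ceil(total_tiles / n_tile)

def tiles_per_tier_from_tiers(total_tiles: int, n_tier: int) -> int:
    """Option 5 (assumed): the tier count is given, tiles per tier is derived."""
    return math.ceil(total_tiles / n_tier)

def mesh_edge(n_tile: int) -> int:
    """Edge length of the square 2D mesh on one tier, as written in the notes."""
    return int(math.sqrt(n_tile))

total_tiles = 100  # hypothetical total number of tiles a model needs

n_tier = tiers_from_tiles_per_tier(total_tiles, n_tile=36)
print(n_tier, mesh_edge(36))        # 3 6

n_tile = tiles_per_tier_from_tiers(total_tiles, n_tier=4)
print(n_tile, mesh_edge(n_tile))    # 25 5
```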

Variable | Explanation | Relationships
computing_data[i][1]tiles for the ML layer
computing_data[i][9]specific chipletcomputing_data
computing_data[i][14]flops
placement71: calc tier, 2.5D8
5: calc tiles per tier❓9
placement1️⃣Tier Edge to Edge
2️⃣from the bottom to top tier1❓
3️⃣the hotspot far from each other
4️⃣put all hotspot in the same place
5️⃣tile-to-tile 3D connection
layer_start_tile
layer_end_tile10
1️⃣: tiles per tier given11
5️⃣:❓12
computing_data[i][9]
computing_data[i][1]13
layer_start_tile_tier
each_tile_activation_Q14computing_data[i][8]
computing_data[i][1]

tile_total15
1️⃣layer16
2️⃣tile array
3️⃣tile element
tile_index [[x,y,computing_data[i][9]]]
last item each_tile_activation_Q*3❓
empty_tile_totaldepends on placement
[[[x0,y0,tier], [x0,y1,tier],…]]
tile_index17
Q_3d_scatter18
layer_HOP_2d19
layer_HOP_3d20

Local/Global Routing
layer_Q21
Q_3d22
Q_2d22
one tile connects to all next level tiles
Local Routing
❓Q_3d+=(tile_total[i][-1][2])*(len(tile_total[i+1])-1)
\frac{Q_{tier}}{N_{layer}}\cdot N_{layer_next}

Global Routing
❓Q_3d23), Q_2d
each_tile_activation_Q
total_router_area24Orion25single_router_area1
edge_single_router
channel_width
area_2_5d❗️aib
edge length26
❗️layer_Q_2_5d27
layer_aib28
aib_out
L_booksim_2d29
L_booksim_3d30
L_2_5d31
Latency from Booksim32
delay factors❗️33
aib_out[2]
power_summary_router
power
total_2d_channel_power
total_2d_router_power
total_tsv_channel_power
total_3d_router_power
Latency from Booksim32
power_summary_router func34
tier_2d_hop_list_power
tier_3d_hop_list_power
tier_2d_hop_list35
tier_3d_hop_list
total_energy
energy_2d36
energy_3d37
energy_2_5d38
❓L_booksim_2d
❓L_booksim_3d
network_model.py
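
The hop counting and the analytical latency recorded in the footnotes for layer_HOP_2d, layer_HOP_3d and L_booksim_2d/L_booksim_3d can be sketched as below. This is a rough illustration under assumptions: the function names are made up, tiles are taken as [x, y, tier] triples as in tile_index, the delay factors default to one cycle each as placeholders, and the data volume and channel width are assumed to share the same unit.

```python
def hops_between(src, dst):
    """Return (2D hops, 3D hops) between two tiles given as [x, y, tier]."""
    hop2d = abs(src[0] - dst[0]) + abs(src[1] - dst[1])  # Manhattan distance in the plane
    hop3d = abs(src[2] - dst[2])                         # number of tiers crossed
    return hop2d, hop3d

def noc_latency(hops, q, width, fclk_noc,
                trc=1, tva=1, tsa=1, tst=1, tl=1, tenq=1):
    """Latency in seconds, following the form noted for L_booksim_2d / L_booksim_3d.

    hops      : hop count (hop2d or hop3d)
    q         : data volume to move (Q_2d or Q_3d)
    width     : channel width (W2d or W3d), same unit as q
    fclk_noc  : NoC clock frequency in Hz
    trc..tenq : delay factors in cycles (placeholder values, not HISIM defaults)
    """
    cycles = hops * (trc + tva + tsa + tst + tl) + tenq * (q / width)
    return cycles / fclk_noc

hop2d, hop3d = hops_between([0, 0, 0], [2, 3, 1])
print(hop2d, hop3d)  # 5 1

latency_2d = noc_latency(hop2d, q=1024, width=32, fclk_noc=1e9)
print(latency_2d)
```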
Variable | Explanation | Relationships
power_router
power_tsv
not placement 239
placement 240
not placement 241
placement 242
tier_2d_hop_list_power
tier_3d_hop_list_power
imc_size
r_size
area_single_tile
single_router_area
thermal 3Dget_unitsize()
create_cube()
load_power()
conductance_G()
solver()
thermal 2.5D
thermal_model.py
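
The footnotes for power_router and power_tsv say that the per-tier power list is copied in order unless placement is 2, in which case it is written in reverse. A minimal sketch of that mapping, assuming a plain Python list per tier; the helper name `map_tier_power` and the example numbers are hypothetical:

```python
def map_tier_power(tier_power, placement):
    """Map per-tier power onto the thermal model's tiers.

    For placement 2 the order is reversed (index i goes to len - i - 1);
    for every other placement the order is kept.
    """
    n = len(tier_power)
    mapped = [0.0] * n
    for i, p in enumerate(tier_power):
        if placement == 2:
            mapped[n - i - 1] = p
        else:
            mapped[i] = p
    return mapped

tier_2d_hop_list_power = [0.1, 0.2, 0.3]  # hypothetical per-tier router power
print(map_tier_power(tier_2d_hop_list_power, placement=1))  # [0.1, 0.2, 0.3]
print(map_tier_power(tier_2d_hop_list_power, placement=2))  # [0.3, 0.2, 0.1]
```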

Components

Component | State-of-the-Art | This Work
Interconnectvv
ff
NetworkNoC: Booksim43, Orion44, Nirgam4546
3D NoC: Ratatoskr47
cycle-accurate link delay
Category | Parameter | Description
mono/2.5D | l_{wire} | Length of wire
mono/2.5D | w_{wire} | Width of wire
mono/2.5D | t_{wire} | Thickness of wire
mono/2.5D | p_{wire} | Pitch of wire
3D | d_{TSV} | TSV metal diameter
3D | h_{TSV} | TSV height
3D | p_{TSV} | TSV pitch
3D | t_{ox_TSV} | Thickness of TSV insulator
2D, 2.5D wire and 3D TSV interconnect parameters

TSV

The lumped circuit model of the TSV is based on the paper “Modeling and Analysis of Through-Silicon Via (TSV) Noise Coupling and Suppression Using a Guard Ring”48.

For the cylindrical TSV model, four C_{\mathrm{TSV}} are connected in parallel in the horizontal directions, and two each of R_{\mathrm{TSV}}, L_{\mathrm{TSV}} and M_{\mathrm{TSV}} are connected in series in the vertical direction. In Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling2, only resistance and capacitance are considered.

Resistance

R_{\mathrm{TSV}}=\begin{cases}\frac{1}{2} \frac{1}{\sigma_{\mathrm{TSV}}} \frac{h_{\text{unit}}}{\pi r_{\mathrm{TSV}}^{2}}, & \text{if } \delta_{\mathrm{TSV}} \ge r_{\mathrm{TSV}}\\ \frac{1}{2} \frac{1}{\sigma_{\mathrm{TSV}}} \frac{h_{\text{unit}}}{\pi\left(r_{\mathrm{TSV}}^{2}-\left(r_{\mathrm{TSV}}-\delta_{\mathrm{TSV}}\right)^{2}\right)}, & \text{if } \delta_{\mathrm{TSV}} \lt r_{\mathrm{TSV}}\end{cases}[\Omega]

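where the skin depth is \delta_{\mathrm{TSV}}=\frac{1}{\sqrt{\pi f \mu \sigma_{\text{TSV}}}}[\mathrm{m}], with f the signal frequency, \mu the permeability and \sigma_{\text{TSV}} the conductivity of the TSV metal.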

Self Inductance

L_{\mathrm{TSV}}=\frac{1}{2}\left(L\left(r_{\mathrm{TSV}}\right)-L\left(p_{\mathrm{TSV}-\mathrm{TSV}}\right)\right) \frac{h_{\text {unit }}}{h_{\mathrm{TSV}}}[H]
L(x)=\frac{\mu h_{\mathrm{TSV}}}{2 \pi}\left[\ln \left(\left(\frac{h_{\mathrm{TSV}}}{x}\right)+\sqrt{\left(\frac{h_{\mathrm{TSV}}}{x}\right)^{2}+1}\right)+\frac{x}{h_{\mathrm{TSV}}}-\sqrt{\left(\frac{x}{h_{\mathrm{TSV}}}\right)^{2}+1}\right]

Mutual Inductance

M_{\mathrm{TSV}}=\frac{1}{2}\left[L\left(2 r_{\mathrm{TSV}}+d_{\mathrm{TSV}-\mathrm{TSV}}\right)-L\left(\sqrt{p_{\mathrm{TSV}-\mathrm{TSV}}^{2}+\left(2 r_{\mathrm{TSV}}+d_{\mathrm{TSV}-\mathrm{TSV}}\right)^{2}}\right)\right] \frac{h_{\text{unit}}}{h_{\mathrm{TSV}}}[H]
where L(x) is the same function as in the self-inductance expression.
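
The self and mutual inductance expressions can be evaluated numerically with a short script. This is only a sketch: the function names are made up, \mu is assumed to be the permeability of free space, and the spacing values d_{TSV-TSV} and p_{TSV-TSV} in the example are hypothetical numbers chosen for illustration.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space [H/m] (assumed value for mu)

def L_partial(x, h_tsv, mu=MU0):
    """L(x) from the notes, for a distance x [m] and TSV height h_tsv [m]."""
    a = h_tsv / x
    b = x / h_tsv
    return mu * h_tsv / (2 * math.pi) * (
        math.log(a + math.sqrt(a**2 + 1)) + b - math.sqrt(b**2 + 1)
    )

def self_inductance(r_tsv, p_tsv_tsv, h_tsv, h_unit):
    """L_TSV = 1/2 * (L(r) - L(p)) * h_unit / h_TSV, in henry."""
    return 0.5 * (L_partial(r_tsv, h_tsv) - L_partial(p_tsv_tsv, h_tsv)) * h_unit / h_tsv

def mutual_inductance(r_tsv, d_tsv_tsv, p_tsv_tsv, h_tsv, h_unit):
    """M_TSV = 1/2 * (L(2r + d) - L(sqrt(p^2 + (2r + d)^2))) * h_unit / h_TSV, in henry."""
    near = 2 * r_tsv + d_tsv_tsv
    far = math.sqrt(p_tsv_tsv**2 + near**2)
    return 0.5 * (L_partial(near, h_tsv) - L_partial(far, h_tsv)) * h_unit / h_tsv

# Hypothetical geometry: 5 um radius, 100 um height, 10 um spacing, 20 um pitch
r, h = 5e-6, 100e-6
print("L_tsv =", self_inductance(r, p_tsv_tsv=20e-6, h_tsv=h, h_unit=h))
print("M_tsv =", mutual_inductance(r, d_tsv_tsv=10e-6, p_tsv_tsv=20e-6, h_tsv=h, h_unit=h))
```

L(x) decreases as x grows, so both results are positive as long as the pitch is larger than the radius.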

Capacitance

C_{\mathrm{TSV}}=\frac{1}{4}\cdot \frac{2 \pi \varepsilon_{0} \varepsilon_{r, \text { ox\_TSV }} h_{\text {sub }}}{\ln \left(\frac{r_{\mathrm{TSV}}+t_{\text {ox\_TSV }}}{r_{\mathrm{TSV}}}\right)} \cdot\frac{h_{\text {unit }}}{h_{\text {sub }}}[F]
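
As a quick check of the capacitance expression, here is a small sketch that evaluates it. The function name, the oxide thickness in the example and the relative permittivity of 3.9 (a typical value for SiO2) are assumptions for illustration, not values taken from the papers.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity [F/m]

def tsv_capacitance(r_tsv, t_ox, h_sub, h_unit, eps_r_ox=3.9):
    """C_TSV in farad: one quarter of the coaxial oxide capacitance, scaled by h_unit / h_sub.

    r_tsv    : TSV metal radius [m]
    t_ox     : insulator (oxide) thickness [m]
    h_sub    : substrate height [m]
    h_unit   : height of the modeled unit segment [m]
    eps_r_ox : relative permittivity of the insulator (3.9 is an assumed SiO2 value)
    """
    coax = 2 * math.pi * EPS0 * eps_r_ox * h_sub / math.log((r_tsv + t_ox) / r_tsv)
    return 0.25 * coax * h_unit / h_sub

# Hypothetical geometry: 5 um radius, 0.5 um oxide, 100 um substrate
c_tsv = tsv_capacitance(r_tsv=5e-6, t_ox=0.5e-6, h_sub=100e-6, h_unit=100e-6)
print("C_tsv =", c_tsv)
```

Note that h_sub cancels out of the result, so C_TSV scales only with h_unit.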
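# Cu TSV skin depth and resistance at 1 GHz
import math

freq = 1e9             # signal frequency [Hz]
sigma = 5.8e7          # copper conductivity [S/m]
mu = 4e-7 * math.pi    # permeability of free space [H/m]

# Skin depth: delta = 1 / sqrt(pi * f * mu * sigma)
TSV_skin_depth = 1 / math.sqrt(math.pi * freq * mu * sigma)
print("TSV_skin_depth=", TSV_skin_depth)

r_tsv = 5e-6           # TSV radius [m]
d_tsv = 2 * r_tsv      # TSV diameter [m]
h_tsv = 100e-6         # TSV height [m], used here as h_unit

# Conducting cross-section: full disc if delta >= r_tsv, else the skin-depth annulus
if TSV_skin_depth >= r_tsv:
    area = math.pi * r_tsv**2
else:
    area = math.pi * (r_tsv**2 - (r_tsv - TSV_skin_depth)**2)
R_tsv = 0.5 * (1 / sigma) * h_tsv / area
print("R_tsv=", R_tsv)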
Python

Resources

Heterogeneous Integration SIMulator Github Repo

  1. Z. Wang et al., “Exploiting 2.5D/3D Heterogeneous Integration for AI Computing,” 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 2024, pp. 758-764, doi: 10.1109/ASP-DAC58780.2024.10473875.
  2. Z. Wang et al., “Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling,” 2023 IEEE 15th International Conference on ASIC (ASICON), Nanjing, China, 2023, pp. 1-4, doi: 10.1109/ASICON58565.2023.10396377.
  3. G. Krishnan et al., “End-to-End Benchmarking of Chiplet-Based In-Memory Computing,” Neuromorphic Computing, IntechOpen, Nov. 15, 2023, doi: 10.5772/intechopen.111926.
  4. Heterogeneous Integration Simulator
  5. Network on Package
  6. 2.5D and 3D based interconnect that is different from traditional ones
  7. 1 as vertical; 5 as zigzag
  8. tile arrangement of computing_data
  9. vertical arrangement of computing_data
  10. delta=computing_data[][1]-1
  11. check computing_data[i][9]
  12. computing_data from 1 to N_tier, then N_tier+1 to 2*N_tier, meaning the csv is in a vertical arrangement
  13. layer_end_tile-layer_start_tile+1
  14. computing_data[i][8]/computing_data[i][1]
  15. [[[x0, y0, tier0],[x1, y0, tier0]],[[x4, y4, tier0],[x4, y5, tier0]]]
  16. computing_data row
  17. [[x0, y0, tier]]
  18. tile_total[i][-1][2]*num_tiles_this_layer/N_tile
  19. Manhattan distance of current layer to next layer, append diff hop2d
  20. Manhattan distance of current layer to next layer, append diff hop3d
  21. append
  22. sum
  23. different tiers: tile_total[i][-1][2])*(len(tile_total[i+1])-1)/(N_tile*percent_router
  24. Orion, power_summary_router func, mesh_edge
  25. H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik, “Orion: A power-performance simulator for interconnection networks,” in 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), 2002, pp. 294-305.
  26. (edge_single_router+edge_single_tile)*mesh_edge
  27. layer_Q_2_5d[i]*1e-6/8
  28. using aib func
  29. [hop2d*(trc+tva+tsa+tst+tl)+(tenq)*[Q_2d/W2d]]/fclk_noc
  30. [hop3d*(trc+tva+tsa+tst+tl)+(tenq)*[Q_3d/W3d]]/fclk_noc
  31. aib_out[2]
  32. N. Jiang, D. U. Becker, G. Michelogiannakis, J. Balfour, B. Towles, D. E. Shaw, J. Kim, and W. J. Dally, “A detailed and flexible cycle-accurate network-on-chip simulator,” in 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013, pp. 86-96.
  33. trc, tva, tsa, tst, tl, tenq
  34. hop, bandwidth, delay, chiplet num, mesh edge
  35. average number of 2D hops per layer per tile within each tier.
  36. [total_2d_channel_power+total_2d_router_power]*L_booksim_2d*fclk_noc
  37. [total_tsv_channel_power+total_3d_router_power]*L_booksim_3d*fclk_noc
  38. total_2_5d_channel_power*L_2_5d*fclk_noc
  39. power_router[i] = tier_2d_hop_list_power[i]
  40. power_router[len(tier_2d_hop_list_power)-i-1]=tier_2d_hop_list_power[i]
  41. power_tsv[i] = tier_3d_hop_list_power[i]
  42. power_tsv[len(tier_3d_hop_list_power)-i-1] = tier_3d_hop_list_power[i]
  43. ✅ cycle-accurate simulation, performance evaluation
  44. power, area models
  45. 🚫 no latency report
  46. file
  47. ✅ performance results for 3D routers and links through RTL; 🚫 longer simulation time, no trace-based simulation
  48. J. Cho et al., “Modeling and Analysis of Through-Silicon Via (TSV) Noise Coupling and Suppression Using a Guard Ring,” in IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 1, no. 2, pp. 220-233, Feb. 2011, doi: 10.1109/TCPMT.2010.2101892.
