Exploiting 2.5D/3D Heterogeneous Integration for AI Computing

References

Exploiting 2.5D/3D Heterogeneous Integration for AI Computing [1]

Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling [2]

End-to-End Benchmarking of Chiplet-Based In-Memory Computing [3]

Summary

HISIM [4] is a modeling and benchmarking tool for the heterogeneous integration of chiplets that communicate through a NoP [5]. Its components are partitioning, mapping, and placement; computing/processing units; heterogeneous interconnect; a network/routing engine; and thermal analysis. It provides a technology roadmap [6], power/latency prediction, thermal analysis for electro-thermal co-design, and cycle-accurate simulation for design space exploration.

For 3D interconnects, a simplified TSV model is used for RC extraction.

It generalizes the technology roadmap and electrical modeling to 2.5D/3D interconnects, making them comparable with on-chip interconnects.

Novelties

  • It is the first tool to support hardware performance evaluation of chiplet architectures. It handles monolithic, 2.5D, and 3D architectures for AI models in one framework, so their performance can be compared and the configuration optimized.
  • It integrates in-memory computing (IMC) for AI models.

Source Code

Placement:

place_1: the number of tiers is derived from the given number of tiles per tier, with mesh edge int(math.sqrt(N_tile));

place_5: the number of tiles per tier is derived from the given number of tiers; mesh_edge=int(math.sqrt(N_tile))
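The two placement rules, together with the relationship noted in the table (footnotes 10 and 13: layer_end_tile = layer_start_tile + computing_data[i][1] - 1), can be illustrated with a minimal sketch. It assumes tiles are numbered contiguously layer by layer and that a placement-1-style layout fills one mesh_edge x mesh_edge tier row by row before starting the next (an assumption; the notes only describe placement 1 as "vertical"). The functions layer_tile_ranges, place_fill_tiers, and tiles_per_tier_given_tiers are hypothetical and not taken from the HISIM repository.

```python
import math


def layer_tile_ranges(tiles_per_layer):
    """Contiguous 0-indexed (start, end) tile ranges, one per layer (hypothetical).

    tiles_per_layer[i] plays the role of computing_data[i][1], so
    layer_end_tile = layer_start_tile + tiles_per_layer[i] - 1.
    """
    ranges = []
    start = 0
    for n in tiles_per_layer:
        end = start + n - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def place_fill_tiers(total_tiles, N_tile):
    """Placement-1-style sketch: tiles per tier are given, the tier count is derived.

    Assumes each tier is a mesh_edge x mesh_edge grid filled row by row.
    Returns (num_tiers, [[x, y, tier], ...]).
    """
    mesh_edge = int(math.sqrt(N_tile))
    tiles_per_tier = mesh_edge * mesh_edge
    num_tiers = math.ceil(total_tiles / tiles_per_tier)
    coords = []
    for t in range(total_tiles):
        tier, pos = divmod(t, tiles_per_tier)
        x, y = divmod(pos, mesh_edge)
        coords.append([x, y, tier])
    return num_tiers, coords


def tiles_per_tier_given_tiers(total_tiles, N_tier):
    """Placement-5-style sketch: the tier count is given, tiles per tier are derived."""
    return math.ceil(total_tiles / N_tier)


ranges = layer_tile_ranges([4, 2, 3])
num_tiers, coords = place_fill_tiers(9, N_tile=4)
print(ranges, num_tiers, coords[4])  # prints [(0, 3), (4, 5), (6, 8)] 3 [0, 0, 1]
```

With a 2 x 2 mesh per tier, the fifth tile (index 4) is the first tile of the second layer and also the first tile of tier 1.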

Variable | Explanation | Relationships
computing_data[i][1] | tiles for the ML layer |
computing_data[i][9] | specific chiplet | computing_data
computing_data[i][14] | FLOPs |
placement [7] | 1: calc tier, 2.5D [8]; 5: calc tiles per tier ❓ [9] |
placement | 1️⃣ tier edge to edge; 2️⃣ from the bottom to the top tier1 ❓; 3️⃣ hotspots far from each other; 4️⃣ all hotspots in the same place; 5️⃣ tile-to-tile 3D connection |
layer_start_tile, layer_end_tile [10] | 1️⃣: tiles per tier given [11]; 5️⃣: ❓ [12] | computing_data[i][9], computing_data[i][1] [13]
layer_start_tile_tier | |
each_tile_activation_Q [14] | | computing_data[i][8], computing_data[i][1]

tile_total [15] | 1️⃣ layer [16]; 2️⃣ tile array; 3️⃣ tile element |
tile_index | [[x, y, computing_data[i][9]]]; last item: each_tile_activation_Q*3 ❓ |
empty_tile_total | depends on placement | [[[x0, y0, tier], [x0, y1, tier], …]], tile_index [17]
Q_3d_scatter [18] ❓ | |
layer_HOP_2d [19] | |
layer_HOP_3d [20] | |

Local/Global Routing
layer_Q [21], Q_3d [22], Q_2d [22] | one tile connects to all next-level tiles |
Local Routing | ❓ Q_3d += (tile_total[i][-1][2])*(len(tile_total[i+1])-1) | Q (tier N, layer N, next layer) ❓
Global Routing | ❓ Q_3d [23], Q_2d | each_tile_activation_Q
total_router_area [24] | Orion [25] | single_router_area1, edge_single_router, channel_width
area_2_5d ❗️ | aib | edge length [26]
❗️ layer_Q_2_5d [27] | |
layer_aib [28] | | aib_out
L_booksim_2d [29], L_booksim_3d [30], L_2_5d [31] | latency from BookSim [32] | delay factors ❗️ [33], aib_out[2]
power_summary_router | power: total_2d_channel_power, total_2d_router_power, total_tsv_channel_power, total_3d_router_power | latency from BookSim [32], power_summary_router func [34]
tier_2d_hop_list_power ❓, tier_3d_hop_list_power ❓ | |
tier_2d_hop_list ❓ [35], tier_3d_hop_list ❓ | |
total_energy | energy_2d [36], energy_3d [37], energy_2_5d [38] | ❓ L_booksim_2d, ❓ L_booksim_3d
network_model.py
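The hop and latency relationships in the table (footnotes 19, 20, 29, and 30) can be written out as a short sketch. It assumes the 2D hop count between a tile of the current layer and a tile of the next layer is their Manhattan distance in (x, y) and the 3D hop count is their tier difference. The per-hop delays trc, tva, tsa, tst, tl and the enqueue delay tenq are taken to be in NoC clock cycles (read here as route computation, VC allocation, switch allocation, switch traversal, and link traversal; this expansion is an assumption). The helper names are hypothetical, and the bracketed Q/W term is treated as plain division, as written in the footnotes.

```python
def hops_between(src, dst):
    """Return (hop2d, hop3d) between two [x, y, tier] tile coordinates.

    hop2d is the Manhattan distance in the x-y plane; hop3d is the tier difference.
    """
    hop2d = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    hop3d = abs(src[2] - dst[2])
    return hop2d, hop3d


def analytical_latency(hops, Q, W, fclk_noc, trc, tva, tsa, tst, tl, tenq):
    """[hops*(trc+tva+tsa+tst+tl) + tenq*(Q/W)] / fclk_noc, in seconds.

    Per-hop delays are in cycles; Q is the data volume and W the channel
    width in the same unit (for example bits), so Q/W counts flits.
    """
    per_hop_cycles = trc + tva + tsa + tst + tl
    cycles = hops * per_hop_cycles + tenq * (Q / W)
    return cycles / fclk_noc


hop2d, hop3d = hops_between([0, 0, 0], [2, 3, 1])
L_2d = analytical_latency(hop2d, Q=64, W=32, fclk_noc=1e9,
                          trc=1, tva=1, tsa=1, tst=1, tl=1, tenq=1)
print(hop2d, hop3d, L_2d)
```

The same function covers L_booksim_3d by passing hop3d, Q_3d, and W3d instead.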
Variable | Explanation | Relationships
power_router | not placement 2 [39]; placement 2 [40] | tier_2d_hop_list_power
power_tsv | not placement 2 [41]; placement 2 [42] | tier_3d_hop_list_power
imc_size
r_size
area_single_tile
single_router_area
thermal 3D | get_unitsize(), create_cube(), load_power(), conductance_G(), solver() |
thermal 2.5D
thermal_model.py
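thermal_model.py is organized as get_unitsize(), create_cube(), load_power(), conductance_G(), and solver(), and power_router/power_tsv are filled from the per-tier hop-list power in reversed order for placement 2 (footnotes 39 to 42). The sketch below is not HISIM's implementation; it is a generic steady-state thermal resistance network on a 3D grid that follows the same sequence of steps under stated assumptions: uniform lateral and vertical conductances, a heat sink attached only to tier 0, and a dense NumPy solve of G·T = P. The names map_tier_power and solve_steady_state are hypothetical.

```python
import numpy as np


def map_tier_power(tier_power, placement):
    """Copy per-tier power into layer order; placement 2 reverses it.

    Mirrors power_router[len-i-1] = tier_2d_hop_list_power[i] for placement 2
    and power_router[i] = tier_2d_hop_list_power[i] otherwise.
    """
    n = len(tier_power)
    out = [0.0] * n
    for i, p in enumerate(tier_power):
        out[n - i - 1 if placement == 2 else i] = p
    return out


def solve_steady_state(power, g_lat, g_vert, g_sink, t_amb=25.0):
    """Solve G @ T = P for a (nz, ny, nx) grid of power sources [W].

    g_lat: conductance between lateral neighbours [W/K]
    g_vert: conductance between vertically stacked cells [W/K]
    g_sink: conductance from each tier-0 cell to ambient at t_amb [W/K]
    """
    nz, ny, nx = power.shape
    n = nz * ny * nx

    def idx(z, y, x):
        return (z * ny + y) * nx + x

    G = np.zeros((n, n))
    rhs = power.astype(float).ravel().copy()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                i = idx(z, y, x)
                # Link to the +x, +y, and +z neighbours (each pair is visited once).
                for dz, dy, dx, g in ((0, 0, 1, g_lat), (0, 1, 0, g_lat), (1, 0, 0, g_vert)):
                    zz, yy, xx = z + dz, y + dy, x + dx
                    if zz < nz and yy < ny and xx < nx:
                        j = idx(zz, yy, xx)
                        G[i, i] += g
                        G[j, j] += g
                        G[i, j] -= g
                        G[j, i] -= g
                if z == 0:
                    # Heat-sink boundary: conductance to a fixed ambient temperature.
                    G[i, i] += g_sink
                    rhs[i] += g_sink * t_amb
    return np.linalg.solve(G, rhs).reshape(nz, ny, nx)


tier_power = map_tier_power([0.0, 1.0], placement=1)
power = np.array(tier_power).reshape(2, 1, 1)
print(solve_steady_state(power, g_lat=1.0, g_vert=1.0, g_sink=1.0))
```

In this two-cell example, 1 W dissipated in the upper tier flows through two 1 W/K conductances in series, so the tiers settle at 26 °C and 27 °C above a 25 °C ambient. A real model would derive the conductances from the unit size and material properties rather than passing them in directly.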

Components

Component | State-of-the-Art | This Work
Interconnectvv
ff
Network | NoC: BookSim [43], Orion [44], NIRGAM [45][46]
3D NoC: Ratatoskr [47]
cycle-accurate link delay
Category | Parameter | Description
mono/2.5D | l_wire | Length of wire
mono/2.5D | w_wire | Width of wire
mono/2.5D | t_wire | Thickness of wire
mono/2.5D | p_wire | Pitch of wire
3D | d_TSV | TSV metal diameter
3D | h_TSV | TSV height
3D | p_TSV | TSV pitch
3D | t_ox,TSV | Thickness of TSV insulator
2D, 2.5D wire and 3D TSV interconnect parameters

TSV

The TSV lumped-circuit model is based on the paper “Modeling and Analysis of Through-Silicon Via (TSV) Noise Coupling and Suppression Using a Guard Ring” [48].

For a cylindrical TSV, the model places four C_TSV elements in parallel in the horizontal directions and two each of R_TSV, L_TSV, and M_TSV in series in the vertical direction; this is where the factors 1/4 and 1/2 in the formulas below come from. In Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling [2], only resistance and capacitance are considered.

Resistance

R_{\mathrm{TSV}}=\begin{cases}\dfrac{1}{2}\,\dfrac{1}{\sigma_{\mathrm{TSV}}}\,\dfrac{h_{\mathrm{unit}}}{\pi r_{\mathrm{TSV}}^{2}}, & \text{if } \delta_{\mathrm{TSV}}\ge r_{\mathrm{TSV}}\\[6pt] \dfrac{1}{2}\,\dfrac{1}{\sigma_{\mathrm{TSV}}}\,\dfrac{h_{\mathrm{unit}}}{\pi\left(r_{\mathrm{TSV}}^{2}-\left(r_{\mathrm{TSV}}-\delta_{\mathrm{TSV}}\right)^{2}\right)}, & \text{if } \delta_{\mathrm{TSV}}<r_{\mathrm{TSV}}\end{cases}\ [\Omega]

where the skin depth is \delta_{\mathrm{TSV}}=\dfrac{1}{\sqrt{\pi f \mu \sigma_{\mathrm{TSV}}}}\ [\mathrm{m}]

Self Inductance

L_{\mathrm{TSV}}=\frac{1}{2}\left(L\left(r_{\mathrm{TSV}}\right)-L\left(p_{\mathrm{TSV-TSV}}\right)\right)\frac{h_{\mathrm{unit}}}{h_{\mathrm{TSV}}}\ [\mathrm{H}]
L(x)=\frac{\mu h_{\mathrm{TSV}}}{2\pi}\left[\ln\left(\frac{h_{\mathrm{TSV}}}{x}+\sqrt{\left(\frac{h_{\mathrm{TSV}}}{x}\right)^{2}+1}\right)+\frac{x}{h_{\mathrm{TSV}}}-\sqrt{\left(\frac{x}{h_{\mathrm{TSV}}}\right)^{2}+1}\right]
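The self-inductance expression translates directly into code. This is a minimal sketch assuming μ = μ0 (non-magnetic Cu and Si) and SI units; the function names L_partial and L_tsv and the 20 µm pitch in the example are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space [H/m]


def L_partial(x, h_tsv, mu=MU0):
    """L(x) = mu*h/(2*pi) * [ln(h/x + sqrt((h/x)^2 + 1)) + x/h - sqrt((x/h)^2 + 1)]."""
    a = h_tsv / x
    return mu * h_tsv / (2 * math.pi) * (
        math.log(a + math.sqrt(a * a + 1)) + x / h_tsv - math.sqrt((x / h_tsv) ** 2 + 1)
    )


def L_tsv(r_tsv, p_tsv_tsv, h_tsv, h_unit, mu=MU0):
    """L_TSV = 1/2 * (L(r_TSV) - L(p_TSV-TSV)) * h_unit / h_TSV  [H]."""
    return 0.5 * (L_partial(r_tsv, h_tsv, mu) - L_partial(p_tsv_tsv, h_tsv, mu)) * h_unit / h_tsv


# Example: 5 um radius, 20 um pitch (assumed), 100 um tall TSV modeled as one unit.
print("L_tsv =", L_tsv(r_tsv=5e-6, p_tsv_tsv=20e-6, h_tsv=100e-6, h_unit=100e-6))
```

Because L(x) decreases monotonically with x, L_TSV stays positive whenever the pitch exceeds the radius.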

Mutual Inductance

M_{\mathrm{TSV}}=\frac{1}{2}L\left(2r_{\mathrm{TSV}}+d_{\mathrm{TSV-TSV}}\right)\frac{h_{\mathrm{unit}}}{h_{\mathrm{TSV}}}-\frac{1}{2}L\left(\sqrt{p_{\mathrm{TSV-TSV}}^{2}+\left(2r_{\mathrm{TSV}}+d_{\mathrm{TSV-TSV}}\right)^{2}}\right)\frac{h_{\mathrm{unit}}}{h_{\mathrm{TSV}}}\ [\mathrm{H}]
where L(x) is the same function as in the self-inductance expression.
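The mutual-inductance expression reuses L(x). This self-contained sketch keeps the same assumptions (μ = μ0, SI units) and reads 2r_TSV + d_TSV-TSV as the center-to-center distance between the coupled TSVs, which is an interpretation; the function names and example dimensions are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space [H/m]


def L_partial(x, h_tsv, mu=MU0):
    """Same L(x) as in the self-inductance formula."""
    a = h_tsv / x
    return mu * h_tsv / (2 * math.pi) * (
        math.log(a + math.sqrt(a * a + 1)) + x / h_tsv - math.sqrt((x / h_tsv) ** 2 + 1)
    )


def M_tsv(r_tsv, d_tsv_tsv, p_tsv_tsv, h_tsv, h_unit, mu=MU0):
    """M_TSV = 1/2*L(2r+d)*h_unit/h - 1/2*L(sqrt(p^2 + (2r+d)^2))*h_unit/h  [H]."""
    s = 2 * r_tsv + d_tsv_tsv
    diag = math.sqrt(p_tsv_tsv ** 2 + s ** 2)
    scale = h_unit / h_tsv
    return 0.5 * L_partial(s, h_tsv, mu) * scale - 0.5 * L_partial(diag, h_tsv, mu) * scale


print("M_tsv =", M_tsv(r_tsv=5e-6, d_tsv_tsv=10e-6, p_tsv_tsv=20e-6, h_tsv=100e-6, h_unit=100e-6))
```

Setting p_TSV-TSV to zero makes both arguments equal, so the two terms cancel and M_TSV goes to zero, which is a quick sanity check on the formula's structure.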

Capacitance

C_{\mathrm{TSV}}=\frac{1}{4}\cdot\frac{2\pi\varepsilon_{0}\varepsilon_{r,\mathrm{ox\_TSV}}h_{\mathrm{sub}}}{\ln\left(\frac{r_{\mathrm{TSV}}+t_{\mathrm{ox\_TSV}}}{r_{\mathrm{TSV}}}\right)}\cdot\frac{h_{\mathrm{unit}}}{h_{\mathrm{sub}}}\ [\mathrm{F}]
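```python
# Cu TSV skin depth and resistance at 1 GHz
import math

freq = 1e9             # signal frequency [Hz]
sigma = 5.8e7          # Cu conductivity [S/m]
mu = 4e-7 * math.pi    # permeability of free space [H/m]

# Skin depth: delta_TSV = 1 / sqrt(pi * f * mu * sigma_TSV)
TSV_skin_depth = 1 / math.sqrt(math.pi * freq * mu * sigma)
print("TSV_skin_depth=", TSV_skin_depth)

r_tsv = 5e-6           # TSV radius [m]
d_tsv = 2 * r_tsv      # TSV diameter [m]
h_tsv = 100e-6         # TSV height [m], used as h_unit (whole TSV as one segment)

# R_TSV = 1/2 * (1/sigma) * h_unit / A_eff; A_eff shrinks to an annulus
# of thickness delta when the skin depth is smaller than the radius.
if TSV_skin_depth >= r_tsv:
    A_eff = math.pi * r_tsv**2
else:
    A_eff = math.pi * (r_tsv**2 - (r_tsv - TSV_skin_depth)**2)
R_tsv = 0.5 * (1 / sigma) * h_tsv / A_eff
print("R_tsv=", R_tsv)
```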
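The capacitance formula can be evaluated the same way as the resistance above. This is a minimal sketch assuming an SiO2 liner (ε_r ≈ 3.9, used as a default) and an illustrative 0.5 µm liner thickness; C_tsv and its parameter names are hypothetical.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity [F/m]


def C_tsv(r_tsv, t_ox, h_sub, h_unit, eps_r_ox=3.9):
    """C_TSV = 1/4 * 2*pi*eps0*eps_r*h_sub / ln((r + t_ox)/r) * h_unit/h_sub  [F].

    eps_r_ox defaults to 3.9 (SiO2 liner), an assumption.
    """
    c_coax = 2 * math.pi * EPS0 * eps_r_ox * h_sub / math.log((r_tsv + t_ox) / r_tsv)
    return 0.25 * c_coax * h_unit / h_sub


print("C_tsv =", C_tsv(r_tsv=5e-6, t_ox=0.5e-6, h_sub=100e-6, h_unit=100e-6))
```

The h_sub factors cancel, so the result scales linearly with h_unit: halving the segment height halves each of the four parallel capacitances.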

Resources

Heterogeneous Integration SIMulator GitHub Repo

  1. Z. Wang et al., “Exploiting 2.5D/3D Heterogeneous Integration for AI Computing,” 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 2024, pp. 758-764, doi: 10.1109/ASP-DAC58780.2024.10473875.
  2. Z. Wang et al., “Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling,” 2023 IEEE 15th International Conference on ASIC (ASICON), Nanjing, China, 2023, pp. 1-4, doi: 10.1109/ASICON58565.2023.10396377.
  3. G. Krishnan et al., “End-to-End Benchmarking of Chiplet-Based In-Memory Computing,” in Neuromorphic Computing, IntechOpen, Nov. 15, 2023, doi: 10.5772/intechopen.111926.
  4. Heterogeneous Integration Simulator
  5. Network on Package
  6. 2.5D- and 3D-based interconnects, which differ from traditional interconnects
  7. 1 as vertical; 5 as zigzag
  8. tile arrangement of computing_data
  9. vertical arrangement of computing_data
  10. delta=computing_data[][1]-1
  11. check computing_data[i][9]
  12. computing_data from 1 to N_tier, then N_tier+1 to 2*N_tier, meaning the CSV is in a vertical arrangement
  13. layer_end_tile-layer_start_tile+1
  14. computing_data[i][8]/computing_data[i][1]
  15. [[[x0, y0, tier0],[x1, y0, tier0]],[[x4, y4, tier0],[x4, y5, tier0]]]
  16. computing_data row
  17. [[x0, y0, tier]]
  18. tile_total[i][-1][2]*num_tiles_this_layer/N_tile
  19. Manhattan distance from the current layer to the next layer; append the diff to hop2d
  20. Manhattan distance from the current layer to the next layer; append the diff to hop3d
  21. append
  22. sum
  23. different tiers: (tile_total[i][-1][2])*(len(tile_total[i+1])-1)/(N_tile*percent_router)
  24. Orion, power_summary_router func, mesh_edge
  25. H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik, “Orion: A power-performance simulator for interconnection networks,” in Proc. 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), 2002, pp. 294-305.
  26. (edge_single_router+edge_single_tile)*mesh_edge
  27. layer_Q_2_5d[i]*1e-6/8
  28. using aib func
  29. [hop2d*(trc+tva+tsa+tst+tl)+(tenq)*[Q_2d/W2d]]/fclk_noc
  30. [hop3d*(trc+tva+tsa+tst+tl)+(tenq)*[Q_3d/W3d]]/fclk_noc
  31. aib_out[2]
  32. N. Jiang et al., “A detailed and flexible cycle-accurate network-on-chip simulator,” in 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013, pp. 86-96.
  33. trc, tva, tsa, tst, tl, tenq
  34. hop, bandwidth, delay, chiplet num, mesh edge
  35. average number of 2D hops per layer per tile within each tier
  36. [total_2d_channel_power+total_2d_router_power]*L_booksim_2d*fclk_noc
  37. [total_tsv_channel_power+total_3d_router_power]*L_booksim_3d*fclk_noc
  38. total_2_5d_channel_power*L_2_5d*fclk_noc
  39. power_router[i] = tier_2d_hop_list_power[i]
  40. power_router[len(tier_2d_hop_list_power)-i-1]=tier_2d_hop_list_power[i]
  41. power_tsv[i] = tier_3d_hop_list_power[i]
  42. power_tsv[len(tier_3d_hop_list_power)-i-1] = tier_3d_hop_list_power[i]
  43. ✅cycle-accurate simulation, performance evaluation
  44. power, area models
  45. 🚫no latency report
  46. file
  47. ✅performance results for 3D routers and links through RTL 🚫longer simulation time, no trace-based simulation
  48. J. Cho et al., “Modeling and Analysis of Through-Silicon Via (TSV) Noise Coupling and Suppression Using a Guard Ring,” IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 1, no. 2, pp. 220-233, Feb. 2011, doi: 10.1109/TCPMT.2010.2101892.
