1. 6/23/2014 © 2014 ANSYS, Inc. 1
Methods for Achieving RTL to Gate
Power Consistency
Design Automation Conference 2014
2. 6/23/2014 © 2014 ANSYS, Inc. 2
PowerArtist™: RTL Design-for-Power Platform
• Power Analysis and Debug
• Automated Power Reduction (Original RTL → Low-Power RTL)
• Links with Physical: PACE, RPM (RTL Power ↔ Physical Power)
3. 6/23/2014 © 2014 ANSYS, Inc. 3
Objectives of RTL Power Analysis
• Power trade-off analysis using relative accuracy
• Sign off power with absolute accuracy
• Analysis driven power reduction
[Figure: Total Power Savings Available (normalized, 0.00 to 1.00) and Cumulative Area Overhead (normalized, 0.00 to 1.00) vs. # RTL Changes (Design Effort, 1 to 291). Annotations: maximum acceptable area impact; maximum possible power savings; only 5 changes gave 50% saving.]
4. 6/23/2014 © 2014 ANSYS, Inc. 5
RTL Power: Inputs for PowerArtist
• RTL (VHDL, Verilog, SystemVerilog), for example:
    module PA (
    ...
    always @ (posedge clk) begin
      dout <= din1;
    end
    assign out = sel ? dout : din2;
    ...
    endmodule
• Power domains (UPF / CPF), e.g. Vdd1, Vdd2
• Capacitance model (WLM / PACE)
• Activity (FSDB / VCD / SAIF)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)
[Diagram: the inputs above feed RTL Power Analysis; the RTL is shown inferred as registers, a mux, and an AND gate driven by clk.]
5. 6/23/2014 © 2014 ANSYS, Inc. 6
Factors Affecting RTL Power Accuracy
Synthesis Modeling
• Inferencing
• Multi-VT
• Cell Selection
• Micro-architecture
• Algorithmic
RTL Models
• Activity Propagation
• Timing
• Power Computation
Physical Models
• Clock Tree
• Wire Cap
• Transition Time
Low Power Structures
• Voltage / Power Domains
• CPF / UPF
NOTE: Algorithmic and Low Power structures are not configured for accuracy
6. 6/23/2014 © 2014 ANSYS, Inc. 7
Synthesis Modeling Aspects for RTL Power
Inferencing
• Keep optimization settings consistent with synthesis
• Enable DesignWare flow (if DW components are present)
Multi-VT
• Apply consistent multi-VT settings from synthesis
Cell Selection
• Fine-tune cell selection based on synthesis netlist
• Apply boundary conditions based on load / frequency
Microarchitecture
• Apply microarchitectures for macros (e.g. adders, multipliers)
7. 6/23/2014 © 2014 ANSYS, Inc. 8
Synthesis Modeling Aspects in PowerArtist
Constant Multipliers
    b = 8'b11000100;
    assign z = a * b;
• Modeled with CSA (carry-save adder) structures
Chains of Adders
    assign z = a + b + c + d;
• Modeled either as a chain of adders (((a + b) + c) + d) or as a CSA tree (two CSA stages for a, b, c, d followed by one final +)
Look-Up Table Optimization
    case (address)
      8'd0 : data = {32'd0};
      8'd1 : data = {32'd12};
      …
    endcase
• address → optimized AND-OR plane (sharing common logic) → data
• Cell mapping to basic 2-input cells
• Modeled using AOIs
• Un-encoded mux
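The constant-multiplier and adder-chain cases on this slide can be illustrated with a small Python sketch. It is only a bit-level illustration of the arithmetic, not PowerArtist's implementation; the helper names (`shift_add_terms`, `multiply_by_constant`, `carry_save_add`, `add_four`) are hypothetical and the inputs are assumed to be non-negative integers.

```python
def shift_add_terms(constant: int) -> list:
    """Bit positions set in `constant`: one shifted copy of the operand per set bit."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(a: int, constant: int) -> int:
    """Multiply using only shifts and adds, the way a constant multiplier can be built."""
    return sum(a << shift for shift in shift_add_terms(constant))


def carry_save_add(x: int, y: int, z: int) -> tuple:
    """3:2 compression: reduce three addends to a sum word and a carry word."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def add_four(a: int, b: int, c: int, d: int) -> int:
    """a + b + c + d using two carry-save stages and one carry-propagate add."""
    s1, c1 = carry_save_add(a, b, c)
    s2, c2 = carry_save_add(s1, c1, d)
    return s2 + c2  # the only carry-propagate adder in the tree


B = 0b11000100  # the constant from the slide (196 decimal)
print(shift_add_terms(B))          # bit positions of the three shifted addends
print(multiply_by_constant(5, B))  # same value as 5 * 196
print(add_four(3, 5, 7, 9))        # same value as 3 + 5 + 7 + 9
```

The constant has only three set bits, so `a * b` needs three shifted addends rather than a full 8-bit multiplier array, and the four-operand sum needs one carry-propagate adder instead of three. Structural differences of this kind are why the RTL tool's choice of microarchitecture needs to match what synthesis builds.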
8. 6/23/2014 © 2014 ANSYS, Inc. 9
RTL Power Accuracy
Using Wire Load Models
– Large difference seen with
simple wire load models
– Clock and Combo power show
the largest difference
– Total power shows 40%
difference wrt gate level
Mobile SoC Case Study
** Note: GATE considered to be most accurate
[Figure: RTL Wire Load Models vs. Gate Level (Different Power Categories). Series: RTL WLM, GATE, %diff. Axes: Power (Watts), 0.000 to 0.120; % Difference, -100% to 100%. %diff data labels, in chart order: 28.8%, 11.0%, -9.2%, 69.2%, 41.2%, 32.3%, 40.2%.]
9. 6/23/2014 © 2014 ANSYS, Inc. 10
Physical Aspects Modeling for Power
Clock Tree
• Modeling clock tree
• Balanced and clock mesh topology
Wire Cap
• Accurately model post-layout wire capacitance
• Model capacitance profile for different types of nets
Transition Time
• Accurately model slew for realistic power
• Both clock and logic nets
10. 6/23/2014 © 2014 ANSYS, Inc. 11
Physical Modeling: Clock Tree
• RTL clock power accuracy requirements
– Understand clock gating methodology
– Understand clock tree topology and buffering
• Difficult for RTL designers to get data from backend team
[Figures: Balanced Clock Tree; Clock Mesh Topology]
11. 6/23/2014 © 2014 ANSYS, Inc. 12
Physical Modeling: Wire Cap
Traditional Wire Load Models
• Not available in some vendor libraries; often not calibrated
• Custom WLMs not portable across blocks and designs
• Simplistic modeling results in poor accuracy
[Figure: 40nm, 45k nets with fanout 1. The WLM assigns 1 fF to all nets, whereas SPEF capacitance varies from 0.2 fF to >129 fF.]
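To see why a flat wire load model hurts power accuracy, the sketch below compares switching power computed with one fixed capacitance per net against per-net extracted values. It uses the standard relation of 0.5 · C · Vdd² energy per transition. The capacitance list, supply voltage, and toggle rate are hypothetical illustration values, chosen only to lie inside the 0.2 fF to 129 fF range quoted on the slide; they are not data from the case study.

```python
FF = 1e-15  # one femtofarad in farads


def switching_power_watts(cap_farads: float, vdd_volts: float,
                          toggles_per_second: float) -> float:
    """Switching power of one net: 0.5 * C * Vdd^2 of energy per transition."""
    return 0.5 * cap_farads * vdd_volts ** 2 * toggles_per_second


# Hypothetical values for illustration only (not from the slides' design).
extracted_caps_ff = [0.2, 0.8, 1.5, 4.0, 129.0]  # per-net caps, as SPEF would give
WLM_CAP_FF = 1.0                                 # flat value a simple WLM assigns
VDD = 0.9                                        # volts
TOGGLE_RATE = 100e6                              # transitions per second per net

power_extracted = sum(
    switching_power_watts(c * FF, VDD, TOGGLE_RATE) for c in extracted_caps_ff
)
power_wlm = sum(
    switching_power_watts(WLM_CAP_FF * FF, VDD, TOGGLE_RATE) for _ in extracted_caps_ff
)
error_pct = (power_wlm - power_extracted) / power_extracted * 100.0

print(f"extracted: {power_extracted:.3e} W, WLM: {power_wlm:.3e} W, "
      f"error: {error_pct:.1f}%")
```

Because voltage and toggle rate are the same in both sums, the error reduces to the ratio of total capacitance, so a few long nets that the WLM underestimates dominate the result.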
12. 6/23/2014 © 2014 ANSYS, Inc. 13
PACE™ for RTL Power Accuracy
PACE applies from RTL to Pre-layout Power
• Clock tree models
– Determine buffer and CG cells per inferred clock tree
– Supports both balanced clock tree as well as clock mesh
• Wire capacitance models
– Granular, power-oriented vs. traditional WLMs
[Diagram: Bridge the RTL ↔ Implementation Gap. Representative Layout and its Post-Layout Power → PowerArtist Calibration (PACE) → Statistical Models: Wire Cap and Clock → RTL Power. Implementation effects listed: clock distribution, parasitics, multiple Vt, low-power structures. The RTL shown is:]
    module PA (
    ...
    always @ (posedge clk)
    begin
      dout <= din1;
    end
    assign out = sel ? dout :
                 din2;
    ...
    endmodule
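The general idea of calibrating a statistical capacitance model from a representative layout can be sketched as follows. This is a toy illustration only and does not describe how PACE works internally: the choice of fanout as the only feature, the use of the median, the function names, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def calibrate_cap_by_fanout(reference_nets):
    """Build {fanout: median cap in fF} from (fanout, cap_ff) pairs.

    `reference_nets` stands in for nets extracted from a reference layout.
    """
    buckets = defaultdict(list)
    for fanout, cap_ff in reference_nets:
        buckets[fanout].append(cap_ff)
    return {fanout: median(caps) for fanout, caps in buckets.items()}


def estimate_cap(model, fanout):
    """Return the calibrated cap, falling back to the nearest calibrated fanout."""
    if fanout in model:
        return model[fanout]
    nearest = min(model, key=lambda known: abs(known - fanout))
    return model[nearest]


# Hypothetical extracted nets: (fanout, capacitance in fF).
reference = [(1, 0.4), (1, 0.6), (1, 2.0), (2, 1.1), (2, 1.5), (4, 3.0)]
model = calibrate_cap_by_fanout(reference)

for fanout in (1, 2, 5):
    print(fanout, estimate_cap(model, fanout))
```

A model calibrated this way is applied to RTL nets that have no layout yet, which is the sense in which a calibrated model is more granular than a single wire load value.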
13. 6/23/2014 © 2014 ANSYS, Inc. 14
RTL Power Accuracy
Using PACE Cap Models
– Tighter correlation seen with PACE Cap models
– Register and Combo power are within +/-20%
– Total power shows <5% difference wrt gate level
Mobile SoC Case Study
** Note: GATE considered to be most accurate
[Figure: PACE Cap Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap, GATE, %diff. Axes: Power (Watts), 0.000 to 0.120; % Difference, -100% to 100%. %diff data labels, in extraction order: -13.4%, 5.1%, -9.2%, 22.8%, 8.1%, -37.4%, 3.0%.]
14. 6/23/2014 © 2014 ANSYS, Inc. 15
RTL Power Accuracy
Using PACE Cap + Clock Models
– Best correlation seen with PACE Cap + Clock models
– Overall correlation is within +/-15%
Mobile SoC Case Study
** Note: GATE considered to be most accurate
[Figure: PACE Cap+Clk Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap+Clock, GATE, %diff w/ PACE, %diff w/ WLM. Axes: Power (Watts), 0.000 to 0.120; % Difference, -100.0% to 100.0%. %diff data labels, in extraction order: -13.4%, 9.9%, -9.2%, -12.8%, -9.0%, -13.6%, -9.4%.]
15. 6/23/2014 © 2014 ANSYS, Inc. 16
RTL Power Accuracy
Using PACE Cap + Clock Models
– Total power difference with WLM is greater than +/-30%
– With PACE models, total power is within +/-20%
Mobile SoC Blocks Case Study
** Note: GATE considered to be most accurate
[Figure: Total Power Comparison. Series: RTL WLM, RTL PACE, GATE. Categories: Design 1, Design 2, Design 3. Axis: Power (Watts), 0.000 to 0.120.]
16. 6/23/2014 © 2014 ANSYS, Inc. 17
RTL Power Accuracy
Using PACE Cap + Clock Models
– Clock power with PACE is within +/-20% as well
Mobile SoC Blocks Case Study
** Note: GATE considered to be most accurate
[Figure: Clock Power wrt RTL PACE vs. GATE. Series: GATE, RTL PACE, %diff. Categories: Design 1, Design 2, Design 3. Axes: Power (Watts), 0.00E+00 to 8.00E-02; % diff, 0.0% to 25.0%. %diff data labels, in extraction order: 15.5%, 19.0%, 20.7%.]
17. 6/23/2014 © 2014 ANSYS, Inc. 18
Nvidia Case Study: RTL Power Accuracy
Average power in mW. DW = number of black-boxed DesignWare (DW) instances. % columns compare RTL PowerArtist against post-synthesis PT-PX.

                                                Post-synthesis PT-PX (mW)    RTL PowerArtist (mW)         RTL PowerArtist vs PT-PX (%)
DESIGN                          Instances   DW    Dynamic  Leakage    Total    Dynamic  Leakage    Total    Dynamic  Leakage    Total
PR                                 580320    0     82.524  114.210  196.735     92.900  111.734  204.635     12.57%   -2.17%    4.02%
TD                                 268993    0     89.209   38.713  127.923    101.755   35.089  136.844     14.06%   -9.36%    6.97%
TTM                                158407   14     64.828   21.353   86.181     63.583   20.212   83.795     -1.92%   -5.34%   -2.77%
TTF                                134152   64     47.850   14.874   62.724     32.563   13.431   45.995    -31.95%   -9.70%  -26.67%
SMI                               1137155  101    145.497  201.661  347.158    125.133  135.635  260.768    -14.00%  -32.74%  -24.88%
SRF                                509095   24    263.894   75.515  339.409    258.332   73.897  332.229     -2.11%   -2.14%   -2.12%
Average Power, all designs                        115.634   77.721  193.355    112.378   65.000  177.378     -2.82%  -16.37%   -8.26%
Average Power excluding SMI/TTF                   125.114   62.448  187.562    129.143   60.233  189.376      3.22%   -3.55%    0.97%
Average Power PR/TD only                           85.867   76.462  162.329     97.328   73.412  170.739     13.35%   -3.99%    5.18%

• Power correlation performed for 6 designs ranging from 130K to 1.13M instances
• In general, very good average power correlation observed (the outliers, SMI and TTF, have the most black-boxed DW instances)
• 8-16 tests being run across the blocks
** Source : Nvidia-Apache Webinar, July 2013 (Miki)
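The percentage columns in the table are consistent with a signed difference of the RTL estimate relative to the post-synthesis value, (RTL − gate) / gate. The short Python check below reproduces several table entries under that reading; the function name `percent_diff` is chosen here for illustration.

```python
def percent_diff(rtl_mw: float, gate_mw: float) -> float:
    """Signed difference of the RTL estimate relative to the gate-level value, in %."""
    return (rtl_mw - gate_mw) / gate_mw * 100.0


# (design, metric, post-synthesis PT-PX mW, RTL PowerArtist mW) copied from the table.
samples = [
    ("PR", "dynamic", 82.524, 92.900),
    ("PR", "leakage", 114.210, 111.734),
    ("PR", "total", 196.735, 204.635),
    ("TTF", "dynamic", 47.850, 32.563),
    ("TTF", "total", 62.724, 45.995),
]

for design, metric, gate_mw, rtl_mw in samples:
    print(f"{design:4s} {metric:8s} {percent_diff(rtl_mw, gate_mw):7.2f}%")
```

A positive value means the RTL estimate is higher than the post-synthesis number, and a negative value means it is lower.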
18. 6/23/2014 © 2014 ANSYS, Inc. 19
Summary
• RTL power analysis enables early design trade-offs with high power impact
• PowerArtist provides predictable RTL power accuracy wrt GATE
• PowerArtist has advanced synthesis and physical modeling techniques
• PowerArtist PACE modeling is proven across designs
• Use PowerArtist for RTL power sign-off with absolute accuracy