Treelink Model Prediction Algorithm
       Competition Write-up

      鸣嵩
Decision Tree 1
• The classic decision tree
 o Deciding from the weather whether it is a good day to play tennis
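The slide's tree figure did not survive extraction. As a stand-in, here is the well-known textbook PlayTennis tree (from Quinlan's ID3 work, popularized in Mitchell's Machine Learning) written as C conditionals. The slide's own tree may differ, and the enum and function names are hypothetical.

```c
/* Textbook PlayTennis tree, used as a stand-in for the slide's figure. */
typedef enum { SUNNY, OVERCAST, RAIN } Outlook;
typedef enum { HUMIDITY_HIGH, HUMIDITY_NORMAL } Humidity;
typedef enum { WIND_WEAK, WIND_STRONG } Wind;

/* Returns 1 if the tree says "play tennis", 0 otherwise. */
int play_tennis(Outlook outlook, Humidity humidity, Wind wind) {
    switch (outlook) {
    case SUNNY:
        return humidity == HUMIDITY_NORMAL; /* sunny: play only if humidity is normal */
    case OVERCAST:
        return 1;                           /* overcast: always play */
    case RAIN:
        return wind == WIND_WEAK;           /* rain: play only if the wind is weak */
    }
    return 0;                               /* unreachable for valid enum values */
}
```

Each internal node tests one attribute, and each path from the root ends in a yes/no leaf.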
Decision Tree 2
• Deciding on the values of numeric variables
Decision Tree 3
• Deciding on the components of a vector
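A decision node of this kind tests a numeric variable, or one component of the input vector, against a threshold. A minimal pointer-based sketch, assuming a hypothetical Node type and the convention that x[feature] < threshold means "go left":

```c
#include <stddef.h>

/* Hypothetical node layout. An internal node compares one component of the
   input vector with a threshold; a node with no children is a leaf. */
typedef struct Node {
    int feature;                     /* which component x[feature] to test */
    float threshold;                 /* assumed convention: go left if x[feature] < threshold */
    const struct Node *left, *right; /* both NULL for a leaf */
    float value;                     /* output value, used only at leaves */
} Node;

/* Walk from the root to a leaf and return the leaf's value. */
float eval_node(const Node *n, const float *x) {
    while (n->left != NULL)
        n = (x[n->feature] < n->threshold) ? n->left : n->right;
    return n->value;
}
```

One comparison per level is made, so the cost of a prediction grows with the depth of the tree, not with its number of nodes.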
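Decision Tree 4
• The competition model contains 190 decision trees
 o A Treelink (decision tree forest) model
 o The output is the sum of the results computed by all the trees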

• Each tree is a complete binary tree with 4 levels


[Figure: Tree 1, Tree 2, …, Tree 190]
One Prediction on a Single Decision Tree
• The input is a float vector px[0] ~ px[51]; the output is a float pY[0]
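The following models that computation in C under stated assumptions: the heap-ordered array layout, the Tree type, eval_tree and predict are hypothetical, and x < threshold is assumed to mean "go left". The slides describe generated code rather than a loop over arrays, so this is a reference for the arithmetic (4 comparison levels per tree, 190 trees, outputs summed into pY[0]), not the competition implementation.

```c
#define NUM_FEATURES 52   /* px[0] ~ px[51] */
#define NUM_TREES    190
#define DEPTH        4    /* comparison levels per tree */
#define NUM_INNER    15   /* 2^4 - 1 internal nodes */
#define NUM_LEAVES   16   /* 2^4 leaves */

/* Hypothetical layout: a complete binary tree in heap order,
   node i has children 2i+1 and 2i+2; leaves are numbered 15..30. */
typedef struct {
    int   feat[NUM_INNER];   /* feature index tested at each internal node */
    float thr[NUM_INNER];    /* go left if px[feat] < thr */
    float leaf[NUM_LEAVES];  /* leaf outputs, left to right */
} Tree;

/* Descend DEPTH levels and return the leaf value that is reached. */
static float eval_tree(const Tree *t, const float *px) {
    int n = 0;
    for (int d = 0; d < DEPTH; d++)
        n = (px[t->feat[n]] < t->thr[n]) ? 2 * n + 1 : 2 * n + 2;
    return t->leaf[n - NUM_INNER];
}

/* Forest output: the sum of all tree outputs, written to pY[0]. */
void predict(const Tree *forest, const float *px, float *pY) {
    float sum = 0.0f;
    for (int i = 0; i < NUM_TREES; i++)
        sum += eval_tree(&forest[i], px);
    pY[0] = sum;
}
```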
Code Problem 1
• 0.019217 is a double
• 0.019217f is a float
• double vs. float
  o double: 8 bytes, 64 bits in a register
  o float: 4 bytes, 32 bits in a register

• Comparing a float with a double
  requires an extra type conversion
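A small illustration of the difference (function names hypothetical; the instructions named in the comments are what x86-64 GCC typically emits, not a guarantee). The two functions can disagree for an input equal to 0.019217f itself, because the float literal is a rounded version of the double one, so adding the f suffix is not guaranteed to reproduce the original model's decisions bit for bit.

```c
/* x is promoted to double before comparing against a double literal
   (typically cvtss2sd followed by ucomisd). */
int below_double_literal(float x) { return x < 0.019217; }

/* With the f suffix the comparison stays in single precision
   (typically a single ucomiss). */
int below_float_literal(float x) { return x < 0.019217f; }
```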
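Code Problem 2
                      Disassembly (gcc 4.1)

Notes
x_test[28] => 0x70(%rdi) => %xmm2 (float) => %xmm3 (double)
0.00706646 => 0x19345(%rip)

x86 instructions cannot take a floating-point immediate, so the constant is compiled into the binary (addressed here relative to %rip) and loaded into a register when the comparison runs.
Cost: an extra memory access, and where the data lives is up to the compiler.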
Improvement
• Use the 32-bit immediates of the x86 instruction set
  o mov, cmp, add, …
  o Convert the floats to integers

• Saves one memory access


              Disassembly

Notes:
x_test[2] => 0x8(%rdi) => %r10d
1028025 => 0xfafb0 (32-bit immediate in the cmp instruction)
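The slides do not show how the floats were mapped to integers. One plausible approach, sketched here under that assumption, is fixed-point scaling: multiply inputs and thresholds by a common factor (SCALE, hypothetical) and round, so each threshold becomes a 32-bit integer constant the compiler can place directly in cmp. Rounding means the integer test can differ from the float test for inputs within about 1/SCALE of a threshold.

```c
#include <stdint.h>

/* Hypothetical scale factor; the slides do not give the real mapping. */
#define SCALE 1000000.0f

/* Round v * SCALE to the nearest int32 (halfway cases away from zero). */
static inline int32_t to_fixed(float v) {
    float s = v * SCALE;
    return (int32_t)(s + (s >= 0.0f ? 0.5f : -0.5f));
}

/* Threshold converted ahead of time: to_fixed(0.019217f) under the assumed SCALE. */
#define THR_FIXED 19217

/* Float test: the constant is loaded from memory. */
int go_left_float(float x)   { return x < 0.019217f; }

/* Integer test: the constant can be encoded as a 32-bit immediate in cmp. */
int go_left_fixed(int32_t ix) { return ix < THR_FIXED; }
```

The inputs are converted once per prediction, while every one of the tree's comparisons then uses an immediate operand.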
SIMD
Batch conversion of floats to integers
GCC intrinsics
• #include <emmintrin.h>
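The slide names only the header. As one possible shape for the batch conversion, this sketch scales four floats at a time and rounds them to int32 with SSE2 intrinsics (_mm_loadu_ps, _mm_mul_ps, _mm_cvtps_epi32, _mm_storeu_si128), with a scalar fallback on non-SSE2 targets. SCALE is a hypothetical fixed-point factor and batch_to_fixed a hypothetical name; NUM_FEATURES = 52 follows the px[0] ~ px[51] input.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define NUM_FEATURES 52          /* px[0] ~ px[51] */
#define SCALE        1000000.0f  /* hypothetical fixed-point scale factor */

/* Convert all features to scaled int32 values in one pass. */
void batch_to_fixed(const float *px, int32_t *ix) {
#if defined(__SSE2__)
    const __m128 scale = _mm_set1_ps(SCALE);
    for (int i = 0; i < NUM_FEATURES; i += 4) {            /* 52 = 13 x 4, no tail */
        __m128 v = _mm_loadu_ps(px + i);                    /* load 4 floats */
        __m128i q = _mm_cvtps_epi32(_mm_mul_ps(v, scale));  /* scale, then round per MXCSR (default: nearest-even) */
        _mm_storeu_si128((__m128i *)(ix + i), q);           /* store 4 int32 values */
    }
#else
    /* Scalar fallback; rounds halfway cases away from zero instead. */
    for (int i = 0; i < NUM_FEATURES; i++) {
        float s = px[i] * SCALE;
        ix[i] = (int32_t)(s + (s >= 0.0f ? 0.5f : -0.5f));
    }
#endif
}
```

One cvtps2dq handles four features, so the 52 inputs need 13 conversions instead of 52 scalar ones.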
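Know Your Processor
Nehalem E5620
         • Long pipeline, ≥ 15 stages
         • x86 instructions are decoded into micro-ops, which execute out of order
           o Micro-ops waiting to execute sit in the Reservation Station
           o Multiple ALUs execute them concurrently and out of order
           o The Reorder Buffer puts the results back in program order
           o Instruction Retirement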
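Pipeline
• Example: 4-stage and 8-stage pipelines
Intel's Long Pipeline
Branch Prediction
Front End
            • Fetches x86 instructions, 16 bytes per clock cycle
            • Decodes x86 instructions into micro-ops (μops)
            • Micro-op (μop) cache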
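Out-of-Order Execution 1
• Register renaming: temporary registers are allocated
• Micro-ops enter the reservation station
• Micro-ops are issued to the execution units (EUs): several EUs for arithmetic, plus Load/Store
Out-of-Order Execution 2
• Results computed in the EUs and the Load/Store units are stored in temporary registers
• Completed results trigger the execution of the instructions that depend on them
• Results are written out in program order
• Instructions retire: results are actually written to memory and the architectural registers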
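Quantitative Instruction Analysis
• Instruction fetch: 16 bytes/cycle
• Decoding x86 instructions into μops
   o Simple instructions: 3/cycle
   o Complex instructions: 1/cycle

• 6 ports from the reservation station to the EUs
   o P0, P1, P5 lead to the ALUs
   o P2, P3, P4 lead to the Load/Store units

• Instruction retirement: 4 μops/cycle
• Length of the dependency chain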
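The limits above combine into a rough lower bound on the cycles per iteration of a loop body. This is a simplified model, assuming these are the only constraints (it ignores decoder limits, port conflicts and cache misses): B_code is code bytes per iteration, N counts μops (all of them, those on P0/1/5, and loads on P2), and L_chain is the latency of any loop-carried dependency chain.

```latex
T_{\text{iter}} \;\ge\; \max\left(
  \frac{B_{\text{code}}}{16},\;
  \frac{N_{\mu\text{op}}}{4},\;
  \frac{N_{\text{P0/1/5}}}{3},\;
  \frac{N_{\text{P2}}}{1},\;
  L_{\text{chain}}
\right)
```

With the counts from the quantitative-analysis table (19 μops, 16 on P0/1/5, 4 loads on P2) and the fetch term left out because the code size is not given: max(19/4, 16/3, 4/1) = max(4.75, 5.33, 4) ≈ 5.3 cycles, with the ALU ports as the binding limit.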
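Instruction Optimization
• Long pipeline, ≥ 15 stages
 o Branch mispredictions are expensive
    • Reduce the branch misprediction rate
 o Reduce or eliminate conditional branches
    • Use bit operations instead of compare-and-branch
    • Use cmovg instead of compare-and-branch
• Make full use of the out-of-order execution of Intel processors
 o Avoid long dependency chains between instructions
 o Avoid hidden dependencies between instructions, e.g., on EFLAGS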
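To illustrate the dependency-chain point (function names hypothetical): summing tree outputs into one accumulator makes every addition wait for the previous one, while several independent accumulators give the out-of-order core parallel chains to overlap. Integers are used because integer addition is associative, so both functions return exactly the same sum; with floats, GCC will not reorder the additions without flags such as -ffast-math, and the result may change in the last bits. At -O2/-O3 the compiler may already vectorize or split the integer loop itself, so measure before relying on this.

```c
#include <stdint.h>

/* One accumulator: each addition depends on the previous one,
   forming a single chain of n additions. */
int64_t sum_one_chain(const int32_t *v, int n) {
    int64_t s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent accumulators: four chains of about n/4 additions,
   which the out-of-order core can execute in parallel. */
int64_t sum_four_chains(const int32_t *v, int n) {
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```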
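Eliminating a Conditional Branch
• How do we eliminate this if statement?
    if (a < b) {
        r = c;
    } else {
        r = d;
    }
• Bit-operation version 1 (assumes a - b does not overflow and >> is an arithmetic shift)
    int mask = (a - b) >> 31;   /* all ones if a < b, otherwise 0 */
    r = (mask & c) | (~mask & d);
• Bit-operation version 2
    int mask = (a - b) >> 31;
    r = d + (mask & (c - d));
• cmovg version
    r = (a < b) ? c : d;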
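A runnable version of the four variants, with the assumptions made explicit: a - b and c - d must not overflow int32_t, and >> must be an arithmetic shift on negative values (implementation-defined in C, but what GCC and Clang do on x86). The function names are hypothetical.

```c
#include <stdint.h>

/* Reference: the original if statement. */
int32_t select_branch(int32_t a, int32_t b, int32_t c, int32_t d) {
    if (a < b)
        return c;
    return d;
}

/* Bit version 1: mask is all ones (-1) when a < b, otherwise 0. */
int32_t select_bits1(int32_t a, int32_t b, int32_t c, int32_t d) {
    int32_t mask = (a - b) >> 31;
    return (mask & c) | (~mask & d);
}

/* Bit version 2: d + (c - d) == c when mask is all ones. The parentheses
   around mask & (c - d) are required because + binds tighter than &. */
int32_t select_bits2(int32_t a, int32_t b, int32_t c, int32_t d) {
    int32_t mask = (a - b) >> 31;
    return d + (mask & (c - d));
}

/* Conditional expression: GCC often emits cmp + cmov here, but may branch. */
int32_t select_cmov(int32_t a, int32_t b, int32_t c, int32_t d) {
    return (a < b) ? c : d;
}
```

Version 2 saves one operation over version 1 (no ~mask and no OR), at the cost of also requiring that c - d not overflow.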
Don't Overuse the CMOV Instruction
CMOV (and, more generically, any
"predicated instruction") tends to
generally a bad idea on an
aggressively out-of-order CPU.

                   -- Linus Torvalds
优化结果
     只保留前两层比较,因
     为branch命中率较高



  第四层用bit运算代替比较,
  充分发挥处理器的乱序执行




  第三层用cmovg,优点:指令少
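A sketch of this per-level strategy under stated assumptions: the tree is stored as heap-ordered arrays (Tree4, eval_ref, select_lt and eval_mixed are hypothetical names), and inputs, thresholds and leaf values are assumed to be already converted to int32. The competition code had its constants inlined as immediates, so this array version only models the control flow; whether the level-3 select becomes a cmovg is up to the compiler.

```c
#include <stdint.h>

/* Hypothetical layout: 4 comparison levels in heap order (node i has
   children 2i+1 and 2i+2), 15 internal nodes, leaves numbered 15..30. */
typedef struct {
    int     feat[15];
    int32_t thr[15];   /* go left if x[feat] < thr */
    int32_t leaf[16];  /* leaf k is stored at leaf[k - 15] */
} Tree4;

/* Reference version: one compare-and-branch per level. */
int32_t eval_ref(const Tree4 *t, const int32_t *x) {
    int n = 0;
    for (int level = 0; level < 4; level++)
        n = (x[t->feat[n]] < t->thr[n]) ? 2 * n + 1 : 2 * n + 2;
    return t->leaf[n - 15];
}

/* Branch-free "c if a < b else d" (mask trick; assumes no overflow in
   a - b or c - d and an arithmetic >> on negative values). */
static inline int32_t select_lt(int32_t a, int32_t b, int32_t c, int32_t d) {
    int32_t mask = (a - b) >> 31;
    return d + (mask & (c - d));
}

int32_t eval_mixed(const Tree4 *t, const int32_t *x) {
    int n;
    /* Levels 1-2: ordinary branches, reported to predict well. */
    if (x[t->feat[0]] < t->thr[0]) n = 1; else n = 2;
    if (x[t->feat[n]] < t->thr[n]) n = 2 * n + 1; else n = 2 * n + 2;
    /* n is now a level-3 node (3..6); l and r are its level-4 children. */
    int l = 2 * n + 1, r = 2 * n + 2;
    /* Level 4: evaluate both children without branches. */
    int32_t vl = select_lt(x[t->feat[l]], t->thr[l], t->leaf[2 * l + 1 - 15], t->leaf[2 * l + 2 - 15]);
    int32_t vr = select_lt(x[t->feat[r]], t->thr[r], t->leaf[2 * r + 1 - 15], t->leaf[2 * r + 2 - 15]);
    /* Level 3: a simple select, a candidate for cmov. */
    return (x[t->feat[n]] < t->thr[n]) ? vl : vr;
}
```

Level 4 does extra work (both children are evaluated), but that work is independent and branch-free, which suits the out-of-order core.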
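Optimization Result

                                  Disassembly: 13 instructions on average

Execution time for 1,000,000 inputs × 190 trees: 0.44 s on an E5620 @ 2.40GHz
Average per tree evaluation: (0.44 × 2.4 × 1,000,000,000) / (1,000,000 × 190) ≈ 5.56 clock cycles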
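Quantitative Analysis
Instruction                  μops  P0/1/5  P0  P1  P5  P2  P3  P4
mov   0x70(%rdi),%edx           6       3   x   x   x   3
lea   -0xf540(%r9),%eax         1       1       1
sar   $0x1f,%eax                1       1   x       x
sub   $0xe76606,%edx            1       1   x   x   x
and   $0x1d5c6,%eax             1       1   x   x   x
sar   $0x1f,%edx                1       1   x       x
add   $0x3cc1,%eax              1       1   x   x   x
and   $0x6079,%edx              1       1   x   x   x
sub   $0x50fd,%edx              1       1   x   x   x
cmpl  $0xfe3f93,0xb8(%rdi)      1       1   x   x   x   1
cmovg %eax,%edx                 2       2   x   x   x
lea   (%rdx,%rcx,1),%r8d        1       1       1
jg    0x403e18                  1       1           1
Total                          19      16               4
Cycles needed                4.75     5.3               4

x: the μops may execute on that port; a number: μops bound to that single port.
Cycles needed: 19 μops / 4 retired per cycle = 4.75; 16 ALU μops / 3 ALU ports (P0, P1, P5) ≈ 5.3; 4 loads / 1 load port (P2) = 4. The ALU ports are the bottleneck.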
Quantitative Analysis
Theoretical: 5.3 clock cycles
Measured: 5.5 clock cycles
The two agree closely.
