
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data

Hortonworks DataFlow & Apache Nifi presented at Oslo Hadoop Big Data Meetup in Oslo, Norway 2015-11-19.

Published in: Data & Analytics

  1. Hortonworks DataFlow: Enterprise Data Flow powered by Apache NiFi. Mats Johansson, Solutions Engineer, EMEA. © Hortonworks Inc. 2011 – 2015. All Rights Reserved.
  2. Disclaimer
     This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed.
     Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
     This document's description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
     Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
     Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  3. IoAT Data Grows Faster Than We Consume It
     Much of the new data exists in flight, between systems and devices, as part of the Internet of Anything (IoAT).
     The opportunity: unlock transformational business value from a full fidelity of data and analytics for all data.
     • New (Internet of Anything): sensors and machines, geolocation, server logs, files & emails, clickstream, social media
     • Traditional data sources: ERP, CRM, SCM
  4. Internet of Anything is Driving New Requirements
     • Need trusted insights from data at the very edge through to the data lake, in real time and with full fidelity
       - Data generated by sensors, machines, geolocation devices, logs, clickstreams, social feeds, etc.
     • Modern applications need access to both data-in-motion and data-at-rest
     • IoAT data flows are multi-directional and point-to-point
       - Very different from existing ETL, data movement, and streaming technologies, which are generally one-directional
     • The perimeter is outside the data center and can be very jagged
       - This "Jagged Edge" creates new opportunity for security, data protection, data governance and provenance
  5. Architectural Limitations Today
     • Traditional data movement software has been built for the world of standardized data and one-way flows
     • Tools built for newer types of data tend to be custom, difficult to manage, and architecturally disjoint
     • Businesses cannot easily collect, conduct, and curate secure multi-directional and point-to-point IoAT data flows
     • IoAT data flows are not optimized: they use costly or limited bandwidth and cannot dynamically prioritize the most valuable data
     • It is difficult to gain actionable insights from the combination of data-in-motion and data-at-rest
  6. Introducing Hortonworks DataFlow
     Hortonworks DataFlow and the Hortonworks Data Platform deliver the industry's most complete solution for management of Big Data.
     [Diagram: The IoAT Data Flow. Labels: Internet of Anything; Hortonworks DataFlow powered by Apache NiFi (perishable insights); Hortonworks Data Platform powered by Apache Hadoop (historical insights); store data and metadata; enrich context.]
  7. Simplistic View of IoAT & Data Flow
     [Diagram labels: Acquire Data; The Data Flow Thing; Process and Analyze Data; Store Data.]
  8. Realistic View of IoAT and Data Flow
     Global interactions with customers, business partners, and things, spanning different volume, velocity, bandwidth, and latency needs.
  9. Meeting IoAT Edge Requirements
     Gather, deliver, prioritize: track from the edge through to the datacenter.
     • Small footprints: operate with very little power
     • Limited bandwidth: can create high latency
     • Data availability: exceeds transmission bandwidth
     • Data must be secured: throughout its journey
  10. Dataflow requirements within the Data Center
      • Understanding: ability to observe precisely how systems exchange data, in real time and historically
      • Agility: ability to interact with and alter live flows and iterate on new ones
      • Dynamic access controls: the entitlements of users and systems and the sensitivity of data can change frequently
      • Cross-cutting concerns: address common needs once, such as enrichment, filtering, transformation
      • Enable architecture transition: legacy vs. modern is an 'always' event; format, schema, and protocol conversion is routine
  11. Apache NiFi: Collect, Conduct, Curate
      • Collect: Bring Together (logs, files, feeds, sensors)
        - Aggregate all IoAT data from sensors, geolocation devices, machines, logs, files, and feeds via a highly secure lightweight agent
      • Conduct: Mediate the Data Flow (deliver, secure, govern, audit)
        - Mediate point-to-point and bi-directional data flows, delivering data reliably to real-time applications and storage platforms such as HDP
      • Curate: Gain Insights (parse, filter, transform, fork, clone)
        - Parse, filter, join, transform, fork, and clone data in motion to empower analytics and perishable insights
  12. A Brief History of Apache NiFi
      • 2006: NiagaraFiles (NiFi) is first conceived by Joe Witt at the National Security Agency (NSA)
      • November 2014: NiFi is donated to the Apache Software Foundation (ASF) through NSA's Technology Transfer Program and enters the ASF incubator
      • July 2015: NiFi reaches ASF top-level project status
  13. Apache NiFi: Three key concepts
      • Manage the flow of information
      • Data provenance
      • Secure the control plane and data plane
  14. Apache NiFi – Key Features
      • Guaranteed delivery
      • Data buffering
        - Backpressure
        - Pressure release
      • Prioritized queuing
      • Flow-specific QoS
        - Latency vs. throughput
        - Loss tolerance
      • Data provenance
      • Recovery/recording a rolling log of fine-grained history
      • Visual command and control
      • Flow templates
      • Pluggable/multi-role security
      • Designed for extension
      • Clustering
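Editor's sketch: two of the features listed on slide 14, backpressure and prioritized queuing, can be illustrated with a toy queue between two processing steps. This is not NiFi code or NiFi's API; the class name `Connection`, the `backpressure_threshold` parameter, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq
import itertools


class Connection:
    """Toy queue between two processors (hypothetical; not NiFi's API)."""

    def __init__(self, backpressure_threshold):
        # Assumed knob: maximum number of queued objects before producers must wait.
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Tie-breaker so objects with equal priority leave in arrival order.
        self._order = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority, payload):
        """Enqueue unless backpressure applies. Lower number = more urgent."""
        if self.is_full():
            return False  # the upstream step should pause and retry later
        heapq.heappush(self._heap, (priority, next(self._order), payload))
        return True

    def poll(self):
        """Return the most urgent payload, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

In this sketch a rejected `offer` is the backpressure signal: the producer keeps its data and slows down instead of overrunning the consumer, while `poll` always hands out the most urgent item first.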
  15. Common Apache NiFi Use Cases
      • Predictive analytics: ensure the highest-value data is captured and available for analysis
      • Compliance: gain full transparency into provenance and flow of data
      • IoT optimization: secure, prioritize, enrich and trace data at the edge
      • Fraud detection: move sales transaction data in real time to analyze on demand
      • Big Data ingest: easily and efficiently ingest data into Hadoop
      • Value resources: gain visibility into how data sources are used to determine value
  16. Flow Based Programming (FBP)
      FBP term | NiFi term | Description
      Information Packet | FlowFile | Each object moving through the system.
      Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
      Bounded Buffer | Connection | The linkage between processors, acting as a queue and allowing various processes to interact at differing rates.
      Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
      Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
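Editor's sketch: the FBP vocabulary on slide 16 maps naturally onto a few small classes. The code below is a minimal model under stated assumptions, not NiFi's implementation: `FlowFile`, `Processor` and `FlowController` borrow the slide's terms, but every attribute and method name (`inbox`, `outbox`, `run_once`, `connect`, `run`) is hypothetical, and scheduling is reduced to a single-threaded loop.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus descriptive attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Processor:
    """Black box: takes FlowFiles from its inbox and emits results downstream."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform  # function: FlowFile -> FlowFile, or None to drop
        self.inbox = deque()        # connection (queue) feeding this processor
        self.outbox = None          # connection to the next step, if any

    def run_once(self):
        """Process one queued FlowFile; return False when there was nothing to do."""
        if not self.inbox:
            return False
        result = self.transform(self.inbox.popleft())
        if result is not None and self.outbox is not None:
            self.outbox.append(result)
        return True


class FlowController:
    """Scheduler: knows how processors are connected and runs them."""

    def __init__(self):
        self.processors = []

    def add(self, processor):
        self.processors.append(processor)
        return processor

    def connect(self, source, destination):
        # A connection is simply the destination's input queue.
        source.outbox = destination.inbox

    def run(self):
        # Keep cycling until no processor has queued work left.
        progressed = True
        while progressed:
            progressed = any([p.run_once() for p in self.processors])
```

A flow is then built by composition: create processors, connect them, seed the first inbox, and call `run()`. A process group would be one more layer of the same idea, a set of connected processors exposed through input and output queues.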
  17. Hortonworks DataFlow (powered by Apache NiFi)
      • Visual user interface: HTML5, drag and drop, for agile execution
      • High throughput, low bandwidth: for any data, big or small
      • Provenance metadata: for governance and compliance
      • Secure end-to-end data routing: with encryption and compression
  18. Basics of Connecting Systems
      For every connection between a producer (P1) and a consumer (C1), these must agree:
      1. Protocol
      2. Format
      3. Schema
      4. Priority
      5. Size of event
      6. Frequency of event
      7. Authorization access
      8. Relevance
  19. Using Messaging
      Only a subset agree when using messaging (P1 to Messaging to C1 ... CN):
      1. Protocol
      2. Format
      3. Schema
      4. Priority
      5. Size of event
      6. Frequency of event
      7. Authorization access
      8. Relevance
      More issues to consider:
      • How do you know what the data flow looks like?
      • How is it managed?
      • How is it working, today and yesterday?
  20. Using an Enterprise Service Bus (ESB)
      Still, only a subset agree when using an ESB (P1 to Broker/Messaging to C1 ... CN):
      1. Protocol
      2. Format
      3. Schema
      4. Priority
      5. Size of event
      6. Frequency of event
      7. Authorization access
      8. Relevance
      Even more issues to consider:
      • Remote procedure calls (RPC) and throughput issues are introduced
      • Design and deploy management: slow setup, not interactive
      • You can scale out, but not up or down
      • You still don't know what the data flow looks like
  21. Architecture
      • Master: NiFi Cluster Manager (NCM). One OS/host running a JVM that contains the web server and the NiFi Cluster Manager with its request replicator.
      • Slaves: NiFi nodes. Each OS/host runs a JVM that contains a web server and a flow controller (Processor 1 ... Extension N), backed by a FlowFile repository, a content repository and a provenance repository on local storage.
      • High availability: control plane vs. data plane…
  22. Define a Hortonworks DataFlow
      • Easy-to-use drag-and-drop UI
      • Flexible to define the data flow
  23. HDF – Powered by Apache NiFi
  24. Add processor for data intake
      Step 1: Drag and drop the processor icon from the top menu.
  25. Choose the specific processor
      Step 2: Choose one of the processors (currently 90 available; designed for extension).
  26. Example: Pick Twitter Processor
  27. Configure the processor
      Step 3: Select the processor and choose the option to Configure.
      Step 4: Adjust parameters as required.
  28. Another processor for data output
      Step 5: Drag and drop the processor icon from the top menu.
      Step 6: Example: choose the PutHDFS processor.
  29. Configure second processor
      Step 7: Configure the second processor.
  30. Connect processors, configure connection
      Step 8.
  31. Click Start to begin processing
      Step 9.
  32. See processors update with real-time changes
      Step 10: As data flows, the GUI updates in real time.
  33. Dynamically adjust and tune data flow as needed
      Step 11: Adjust and tune the dataflow in real time. Data can also be replicated for testing and comparison.
  34. Understand the data path with Data Provenance
      Step 14: Select Data Provenance.
  35. Trace lineage of a particular piece of data
      Step 15: Use the icon for Data Lineage.
  36. Every change to data is tracked: processing, views
      Step 16: Each provenance event is tracked.
  37. Updates as changes happen
      Step 17: The view updates as data flows.
  38. Easily access and trace changes to dataflow
  39. Audit trail of Hortonworks DataFlow user actions
  40. NiFi is complementary to Hadoop
      • Operations: Deployment flexibility from devices to data center. Delivers data flow QoS across dimensions such as loss-tolerant vs. guaranteed delivery, low latency vs. high throughput, and priority-based queuing.
      • Governance: Starting at the source, captures fine-grained metadata regarding all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end state, delivering comprehensive governance (aka provenance, chain of custody).
      • Security: Secures the data movement from beginning to end. Allows fine-grained data authorization policies to be enforced at the flow level.
  41. Operations
      • Reporting tasks (push)
      • Statistics / status (pull)
      • Dynamic flow changes
        - Push new business rules via REST API (closed loop)
        - Pull updates periodically from web services
      • Site-to-site
        - Stay at the 'flow level' rather than suddenly doing file transfer protocols
      • Extensible
      • Optimized user experience: log hunts should be the exception
      Scale down, up, and out, in containers and on virtual machines.
  42. The Need for Data Provenance
      • For operators: traceability, lineage; recovery and replay
      • For compliance: audit trail
      • For business: value sources; value IT investment
      [Diagram: lineage from BEGIN to END.]
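Editor's sketch: the traceability and lineage needs on slide 42 come down to recording an event for everything that happens to a piece of data, then walking those events backwards. The code below is a simplified illustration, not NiFi's provenance repository; `ProvenanceEvent`, `ProvenanceLog`, `record` and `lineage` are hypothetical names, and the sketch assumes events are recorded in chronological order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str                  # e.g. "RECEIVE", "FORK", "MODIFY", "SEND"
    flowfile_id: str                 # the object the event is about
    parent_id: Optional[str] = None  # set when the object was derived from another


class ProvenanceLog:
    """Append-only event log (hypothetical; assumes chronological recording)."""

    def __init__(self):
        self._events = []

    def record(self, event_type, flowfile_id, parent_id=None):
        self._events.append(ProvenanceEvent(event_type, flowfile_id, parent_id))

    def lineage(self, flowfile_id):
        """Return the events for an object and all its ancestors, oldest first."""
        ids = {flowfile_id}
        # Walk newest to oldest so each derivation event reveals one more ancestor.
        for event in reversed(self._events):
            if event.flowfile_id in ids and event.parent_id is not None:
                ids.add(event.parent_id)
        return [event for event in self._events if event.flowfile_id in ids]
```

With such a log, an operator can answer "where did this come from?" for any object, and the same history serves as the audit trail that the compliance bullet asks for.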
  43. Extending Data Governance from the Edge to Hadoop
      Data governance requirements:
      • Transparent: governance standards and protocols must be clearly defined and available to all
      • Reproducible: recreate the relevant data landscape at a given point in time
      • Auditable: trace all relevant events and assets with appropriate historical lineage
      • Consistent: compliance practices must be consistent
      The Hadoop data platform must snap into existing data governance frameworks and openly exchange metadata.
      [Diagram labels: Internet of Anything; traditional data systems (SCM, CRM, ERP); ETL/DQ; MDM; archive; holistic data governance; business analytics; visualization & dashboards.]
  44. The Need for Fine-grained Security and Compliance
      It's not enough to say you have encrypted communications.
      • Enterprise authorization services: entitlements change often
      • People and systems with different roles require different access levels
      • Tagged/classified data
  45. Security
      • Administration: central management and consistent security
        - NiFi Cluster Manager
      • Authentication: authenticate users and systems
        - 2-way SSL support out of the box; additional types coming
      • Authorization: provision access to data
        - Pluggable authorization designed to fit any Identity and Access Management (IAM) scheme
        - File-based authority provider out of the box
        - Multi-role
      • Audit: maintain a record of data access
        - Detailed logging of all user actions
        - Detailed logging of key system behaviors
        - Data provenance enables unparalleled tracking from the edge through the lake
      • Data protection: protect data at rest and in motion
        - Support a variety of SSL/encrypted protocols
        - Tag and utilize tags on data for fine-grained access controls
        - Encrypt/decrypt content using pre-shared key mechanisms
      Roles as listed on the slide:
      • Administrator: configure system threads, user accounts, and flow audit history
      • Data Flow Manager: manipulate the dataflow
      • Read Only: view the dataflow only
      • NiFi: configure system threads, user accounts, and flow audit history
      • Proxy: manipulate the dataflow
      • Provenance: query the provenance repository and download content
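Editor's sketch: the "multi-role" authorization idea on slide 45 can be pictured as a lookup from roles to permitted actions, where a user may hold several roles at once. This is an illustration only and not NiFi's authority provider; the action names (`configure_system`, `view_flow`, `modify_flow`, `query_provenance`, `download_content`) are invented here, and granting `view_flow` to the Data Flow Manager role is an assumption.

```python
# Hypothetical role-to-action table based on the role descriptions on the slide.
ROLE_PERMISSIONS = {
    "Administrator": {"configure_system"},
    "Data Flow Manager": {"view_flow", "modify_flow"},  # view_flow is assumed
    "Read Only": {"view_flow"},
    "Provenance": {"query_provenance", "download_content"},
}


def is_allowed(user_roles, action):
    """Return True if any of the user's roles grants the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)
```

A user holding both "Read Only" and "Provenance" could then view the flow and query provenance but not modify the flow, which is the kind of separation the slide's role list describes.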
  47. Operations: Planned
  50. Planned Apache NiFi Enhancements
      • In progress: enhanced configuration management of flows
      • Started: extension and template registry
      • Targeted to NiFi 0.4.0 release: first-class Avro support
      • Started: interactive queue management
      • Started: multi-tenant data flow
      • Future: pluggable authentication
      • Future: reference-able process groups
      • Future: variable registry
      https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
  51. Try It Yourself: download NiFi and the HDP Sandbox from hortonworks.com/sandbox. Tweet: #hadooproadshow
  52. Thank you! Mats Johansson, mjohansson@hortonworks.com, @matsjo66, https://se.linkedin.com/in/matsjo66