
GraphLab Conference 2014 Yucheng Low - Scalable Data Structures: SFrame & SGraph

Scalable Data Structures: SFrame & SGraph

Published in: Data & Analytics

  1. The Technology Behind GraphLab Create. Yucheng Low, PhD, Chief Architect
  2. GraphLab Philosophy: Users-First Architecture
  3. [Diagram: Architecture, Systems, User]
  4. Systems-First Architectures (PowerGraph): Systems define constraints. Optimize for performance. [Diagram: Architecture, Systems, User]
  5. Users-First Architectures: Users define constraints. Optimize for user interaction. [Diagram: Architecture, Systems, User]
  6. What is a Users-First Architecture for Data Science?
  7. SFrame and SGraph: Building on decades of database and systems research. Built by data scientists, for data scientists.
  8. SFrame: Scalable Tabular Data Manipulation. SGraph: Scalable Graph Manipulation. [Figure: User Com. Title Body User Disc.]
  9. Enabling users to easily and efficiently translate between both representations to get the best of both worlds.
  10. SFrame: Scalable Tabular Data Manipulation. SGraph: Scalable Graph Manipulation. [Figure: User Com. Title Body User Disc.]
  12. SFrame Design
      Pain Point #1: Resource Limits. Jobs fail because:
      • The machine runs out of memory
      • The Java heap size was not set correctly
      • Resource configuration X needs to be bigger
  13. SFrame Design
      • Graceful Degradation as a first principle
        • Always Works
      Pain Point #2: Too Strict or Too Weak Schemas. We want strong schema types. We also want weak schema types. Missing values.
  14. SFrame Design
      • Graceful Degradation as a first principle
        • Always Works
      • Rich Datatypes
        • Strong schema types: int, double, string, ...
        • Weak schema types: list, dictionary
  15. SFrame Design
      • Graceful Degradation as a first principle
        • Always Works
      • Rich Datatypes
        • Strong schema types: int, double, string, ...
        • Weak schema types: list, dictionary
      Pain Point #3: Feature Manipulation. It is difficult or costly to inspect existing features and create new features, and hard to perform data exploration.
  16. SFrame Design
      • Graceful Degradation as a first principle
        • Always Works
      • Rich Datatypes
        • Strong schema types: int, double, string, ...
        • Weak schema types: list, dictionary
      • Columnar Architecture
        • Easy feature engineering + vectorized feature operations
        • Immutable columns + lazy evaluation
        • Statistics + visualization + sketches
  17. SFrame Python API Example
      import graphlab as gl
      # Make a little SFrame of 1 column and 5 values:
      sf = gl.SFrame({'x': [1, 2, 3, 4, 5]})
      # Normalize column x:
      sf['x'] = sf['x'] / sf['x'].sum()
      # Use a Python lambda to create a new column:
      sf['x-squared'] = sf['x'].apply(lambda x: x * x)
      # Create a new column using a vectorized operator:
      sf['x-cubed'] = sf['x-squared'] * sf['x']
      # Create a new SFrame taking only 2 of the columns:
      sf2 = sf[['x', 'x-squared']]
  18. SFrame Querying
      Supports most typical SQL SELECT operations using a Pythonic syntax.
      SQL:
        SELECT Book.title AS title, COUNT(*) AS authors
        FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn
        GROUP BY Book.title;
      SFrame Python:
        Book.join(Book_author, on='isbn').groupby('title', {'authors': gl.aggregate.COUNT()})
  19. SFrame Columnar Encoding
      Netflix dataset (columns: user, movie, rating): 99M rows, 3 integer columns.
      1.4GB raw, 289MB gzip compressed.
  20. SFrame Columnar Encoding
      Netflix dataset (columns: user, movie, rating): 99M rows, 3 integer columns; 1.4GB raw, 289MB gzip compressed.
      Type-aware compression:
      • Variable Bit Length Encode
      • Frame Of Reference Encode
      • ZigZag Encode
      • Delta / Delta ZigZag Encode
      • Dictionary Encode
      • General Purpose LZ4
      [Figure: SFrame File]
  21. SFrame Columnar Encoding
      Netflix dataset (columns: user, movie, rating): 99M rows, 3 integer columns; 1.4GB raw, 289MB gzip compressed.
      Type-aware compression: Variable Bit Length, Frame Of Reference, ZigZag, Delta / Delta ZigZag, Dictionary, General Purpose LZ4.
      SFrame File:
        Column   Size     Bits/int
        User     176 MB   14.2
        Movie    257 KB   0.02
        Rating   47 MB    3.8
        --------------------------
        Total    223 MB
  22. SFrame Columnar Encoding: as slide 21, with the added label "10s".
  23. SFrames Distributed
      • Distributed Dataflow
      • Columnar Query Optimizations
      • Communicate columnar compressed blocks rather than row tuples.
      The choice of distributed or local execution is a question of query optimization.
  24. SFrame: Scalable Tabular Data Manipulation. SGraph: Scalable Graph Manipulation. [Figure: User Com. Title Body User Disc.]
  26. SGraph
      • SFrame-backed graph representation. Inherits SFrame properties:
        • Data types, external memory, columnar, compression, etc.
      • Layout optimized for batch external memory computation.
  27. SGraph Layout: Vertex SFrames 1, 2, 3, 4. Vertices partitioned into p = 4 SFrames.
  28. SGraph Layout: Vertex SFrames 1, 2, 3, 4. Vertices partitioned into p = 4 SFrames. Example vertex SFrame:
        __id   Name   Address   ZipCode
        1011   John   …         98105
        2131   Jack   …         98102
  29. SGraph Layout: Vertex SFrames 1, 2, 3, 4. Edges partitioned into p^2 = 16 Edge SFrames:
        (1,1) (2,1) (3,1) (4,1)
        (1,2) (2,2) (3,2) (4,2)
        (1,3) (2,3) (3,3) (4,3)
        (1,4) (2,4) (3,4) (4,4)
  32. SGraph Layout: Vertex SFrames, Edge SFrames
  37. Deep Integration of SFrames and SGraphs
      • Seamless interaction between graph data and table data.
      • Queries can be performed easily across graphs and tables.
  38. Demo
  39. SFrame: Scalable Tabular Data Manipulation. SGraph: Scalable Graph Manipulation. [Figure: User Com. Title Body User Disc.] Users-first architecture. Built by data scientists, for data scientists.
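Slide 16 lists "Immutable columns + lazy evaluation" as part of the columnar design. The sketch below illustrates that idea in plain Python under stated assumptions: LazyColumn is a hypothetical class, not the GraphLab SArray API. apply() only records a function and returns a new column, and nothing is computed until materialize() runs every pending function in one fused pass over the data.

```python
class LazyColumn:
    """Hypothetical immutable column whose element-wise operations are deferred.

    apply() never touches the data; it returns a new column that records one
    more pending function. materialize() runs all pending functions in a single
    fused pass over the source values.
    """

    def __init__(self, source, ops=()):
        self._source = tuple(source)  # frozen copy: the column cannot be mutated
        self._ops = tuple(ops)        # pending element-wise functions, in order

    def apply(self, fn):
        # Returns a NEW column; self is left unchanged (immutability).
        return LazyColumn(self._source, self._ops + (fn,))

    def materialize(self):
        # Evaluate now: each value flows through every pending op exactly once.
        out = []
        for value in self._source:
            for fn in self._ops:
                value = fn(value)
            out.append(value)
        return out


if __name__ == "__main__":
    evaluated = []

    def square(v):
        evaluated.append(v)  # record each actual evaluation
        return v * v

    x = LazyColumn([1, 2, 3, 4, 5])
    x_squared = x.apply(square)
    print(evaluated)                # [] : nothing has been computed yet
    print(x_squared.materialize())  # [1, 4, 9, 16, 25]
    print(x.materialize())          # [1, 2, 3, 4, 5] : original column unchanged
```

Because columns never change in place, a derived column can safely share its parent's data and defer all work until a result is actually needed.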
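Slide 18 pairs a SQL join-and-count with its SFrame equivalent. To make the semantics of that query concrete without GraphLab installed, here is a minimal pure-Python sketch: an inner hash join on isbn followed by a per-title row count. The table contents and the author column are hypothetical, and a hash join is simply a standard choice for illustration; the slides do not say which join algorithm SFrame uses.

```python
from collections import Counter

# Hypothetical toy tables; only the isbn and title columns come from the slide.
BOOK = [
    {"isbn": "111", "title": "Graphs"},
    {"isbn": "222", "title": "Tables"},
]
BOOK_AUTHOR = [
    {"isbn": "111", "author": "Ann"},
    {"isbn": "111", "author": "Bob"},
    {"isbn": "222", "author": "Cat"},
]


def hash_join(left, right, key):
    """Inner equi-join of two lists of row dicts on one key column."""
    index = {}
    for row in right:  # build phase: bucket right-hand rows by key
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:  # probe phase: emit one merged row per match
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


def group_count(rows, key):
    """GROUP BY key with COUNT(*), returned as {key_value: count}."""
    return dict(Counter(row[key] for row in rows))


authors_per_title = group_count(hash_join(BOOK, BOOK_AUTHOR, "isbn"), "title")
print(authors_per_title)  # {'Graphs': 2, 'Tables': 1}
```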
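Slide 20 names Frame Of Reference and Variable Bit Length encoding among SFrame's type-aware compression schemes. A minimal sketch of how the two can combine on one block of integers, assuming a simple layout: store the block minimum once, then pack every offset at the smallest bit width that fits. This is illustrative only and not SFrame's actual on-disk format; all function names are hypothetical.

```python
def for_encode(block):
    """Frame-of-reference: store the block minimum once, plus offsets from it."""
    base = min(block)
    return base, [v - base for v in block]


def for_decode(base, offsets):
    return [base + d for d in offsets]


def bit_width(offsets):
    """Smallest fixed width (in bits) that can hold every offset in the block."""
    return max(offsets).bit_length()  # 0 when all offsets are 0


def bit_pack(offsets, width):
    """Pack offsets at `width` bits each into one Python int, lowest bits first."""
    packed = 0
    for i, d in enumerate(offsets):
        packed |= d << (i * width)
    return packed


def bit_unpack(packed, width, count):
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


if __name__ == "__main__":
    ratings = [3, 5, 4, 1, 2, 5, 4, 3]  # hypothetical block of 1-5 star ratings
    base, offsets = for_encode(ratings)
    width = bit_width(offsets)
    packed = bit_pack(offsets, width)
    print(base, width)  # 1 3 -> 3 bits per rating vs. 32 for a plain int32
    assert for_decode(base, bit_unpack(packed, width, len(ratings))) == ratings
```

Choosing the width per block, rather than once per column, lets blocks with a narrow value range shrink further.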
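Slide 20 also lists ZigZag and Delta / Delta ZigZag encoding. The sketch below uses the standard zigzag mapping, which sends signed integers of small magnitude to small unsigned integers, and applies it to successive differences. For slowly changing or nearly sorted columns the result is mostly small numbers that can then be packed into few bits. Function names and the layout (first value stored raw) are assumptions, not SFrame internals.

```python
def zigzag_encode(n):
    """Interleave signed ints as unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def delta_zigzag_encode(values):
    """Keep the first value; store each later value as zigzag(value - previous)."""
    if not values:
        return []
    return [values[0]] + [zigzag_encode(cur - prev) for prev, cur in zip(values, values[1:])]


def delta_zigzag_decode(encoded):
    if not encoded:
        return []
    values = [encoded[0]]
    for z in encoded[1:]:
        values.append(values[-1] + zigzag_decode(z))
    return values


if __name__ == "__main__":
    user_ids = [1011, 1011, 1012, 1009, 1009]  # hypothetical nearly sorted ids
    encoded = delta_zigzag_encode(user_ids)
    print(encoded)  # [1011, 0, 2, 5, 0]
    assert delta_zigzag_decode(encoded) == user_ids
```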
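The per-column figures on slide 21 can be cross-checked with simple arithmetic: average bits per integer is the column's compressed size in bits divided by the 99M rows. The check below assumes decimal units (1 MB = 10^6 bytes, 1 KB = 10^3 bytes) and treats 99M as exact; under those assumptions it reproduces the 14.2, 0.02 and 3.8 bits/int figures and the 223 MB total.

```python
ROWS = 99e6  # "99M rows" as printed on the slide (rounded figure)


def bits_per_int(size_bytes, rows=ROWS):
    """Average stored bits per value for one column."""
    return size_bytes * 8 / rows


# Column sizes from the slide, assuming 1 MB = 10**6 bytes and 1 KB = 10**3 bytes.
user = bits_per_int(176e6)
movie = bits_per_int(257e3)
rating = bits_per_int(47e6)
total_mb = (176e6 + 257e3 + 47e6) / 1e6

print(round(user, 1), round(movie, 2), round(rating, 1))  # 14.2 0.02 3.8
print(round(total_mb))  # 223
```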
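Slides 27 to 29 describe the SGraph layout: vertices split into p = 4 vertex SFrames and edges into p^2 = 16 edge SFrames labelled (i, j). A minimal sketch of such a layout, under assumptions the slides do not state: vertices are assigned by a stable hash of their id (vertex_partition is a hypothetical helper), edge block (i, j) holds edges whose source lies in vertex partition i and whose target lies in partition j, and indices run from 0 rather than 1. One plausible reason this suits "batch external memory computation" (slide 26) is that processing a single edge block only requires vertex partitions i and j to be loaded.

```python
import zlib
from collections import defaultdict


def vertex_partition(vertex_id, p):
    """Hypothetical stable hash partitioner: vertex id -> partition in 0..p-1."""
    return zlib.crc32(str(vertex_id).encode("utf-8")) % p


def partition_graph(vertex_ids, edges, p):
    """Split vertices into p groups and edges into p*p (src_part, dst_part) blocks."""
    vertex_parts = defaultdict(list)
    for v in vertex_ids:
        vertex_parts[vertex_partition(v, p)].append(v)
    edge_blocks = defaultdict(list)
    for src, dst in edges:
        # Assumed convention: block (i, j) = source partition i, target partition j.
        block = (vertex_partition(src, p), vertex_partition(dst, p))
        edge_blocks[block].append((src, dst))
    return vertex_parts, edge_blocks


def vertex_parts_needed(block):
    """Vertex partitions that must be in memory to process one edge block."""
    i, j = block
    return {i, j}


if __name__ == "__main__":
    vertices = [1011, 2131, 3001, 4002, 5003, 6004]
    edges = [(1011, 2131), (2131, 3001), (3001, 1011), (4002, 5003), (6004, 1011)]
    vparts, eblocks = partition_graph(vertices, edges, p=4)
    print({k: len(v) for k, v in sorted(vparts.items())})
    for block in sorted(eblocks):
        print(block, eblocks[block], "needs vertex partitions", sorted(vertex_parts_needed(block)))
```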
