



        Recent IT Development and Women:
          Big Data and The Power of Women in Goryeo


                        KWiSE Annual Meeting
                            Chapman University, CA
                                Oct 20th 2012



                             Jongwook Woo (PhD)
                 High-Performance Internet Computing Center (HiPIC)
        Educational Partner with Cloudera and Grants Awardee of Amazon AWS
                        Computer Information Systems Department
                         California State University, Los Angeles

 Jongwook Woo
                                                                             CSULA
Contents
Part I. Big Data
    Fundamentals of Big Data
    Data-Intensive Computing: Hadoop
    Big Data Supporters and Use Cases

Part II. The Power of Women in Goryeo
 Dynasty
    North East Asia before the Mongol Empire
    Korea and Mongol
    The Empress Gi




Part I
Big Data
    Fundamentals of Big Data
    NoSQL DB: HBase, MongoDB
    Data-Intensive Computing: Hadoop
    Big Data Supporters and Use Cases




Experience in Big Data
 Grants
     Received Amazon AWS in Education Research Grant (July
      2012 - July 2014)
     Received Amazon AWS in Education Coursework Grants (July
      2012 - July 2013, Jan 2011 - Dec 2011)

 Partnership
     Received Academic Education Partnership with Cloudera since
      June 2012

 Certificate
     Certificate of Achievement in the Big Data University Training
      Course, “Hadoop Fundamentals I”, July 8 2012

 Cloud Computing Blog
     http://dal-cloudcomputing.blogspot.com/

What is Big Data, Map/Reduce, Hadoop, NoSQL DB on Cloud Computing




Big Data

Too much data
    Terabytes (10^12 bytes), petabytes (10^15 bytes)
           – Because of the web
           – Sensor data, bioinformatics, social
             computing, smartphones, online games…

Cannot be handled with the legacy
 approach
    Too big
    Un-/Semi-structured data


Two Issues in Big Data

How to store Big Data
    NoSQL DB

How to compute Big Data
    Parallel Computing with multiple cheap
     computers
           – No need for supercomputers




Contents
Fundamentals of Big Data

Data-Intensive Computing: Hadoop

Big Data Supporters and Use Cases




Data nowadays

• Data Issues
    o Data grows to 10 TB, and then to 100 TB.
    o Unstructured data comes from sources
      like Facebook, Twitter, RFID readers, sensors,
      and so on.
    o Need to derive information from both the
      relational data and the unstructured data
        • as soon as possible.

• Solution to efficiently compute Big
   Data
    o Hadoop Map/Reduce

Solutions in Big Data Computation
 Map/Reduce by Google
    (Key, Value) parallel computing
 Apache Hadoop
     Big Data
                 Data Computation (MapReduce, Pig)

 Integrating MapReduce and RDB
     Oracle + Hadoop
     Sybase IQ
     Vertica + Hadoop
     Hadoop DB
     Greenplum
     Aster Data
 Integrating MapReduce and NoSQL DB
     MongoDB MapReduce
     HBase
Apache Hadoop
 Motivated by Google Map/Reduce and GFS
     open source project of the Apache Foundation.
     framework written in Java
        – originally developed by Doug Cutting
                 • who named it after his son's toy elephant.

 Two core Components
     Storage: HDFS
       – High Bandwidth Clustered storage
     Processing: Map/Reduce
       – Fault Tolerant Distributed Processing

 Hadoop scales linearly with
     data size
     Analysis complexity

Hadoop issues
 Map/Reduce is not DB
     Algorithm in Restricted Parallel Computing

 HDFS and HBase
     Cannot compete with the functions in RDBMS

 But, useful for
     Semi-structured data model and high-level dataflow query
      language on top of MapReduce
        – Pig, Hive, Jaql, Cascading, Cloudbase
     Useful for huge (petabytes or terabytes) but uncomplicated data
           – Web crawling
           – log analysis
               • Log file for web companies
           – New York Times case


MapReduce Pros & Cons Summary

Good when
    Huge data for input, intermediate, output
    Little synchronization required
    Read once; batch oriented datasets (ETL)

Bad for
    Fast response time
    Large amount of shared data
    Fine-grained synchronization needed
    CPU-intensive rather than data-intensive work
    Continuous input stream


MapReduce in Detail
Functions borrowed from functional
 programming languages (e.g., Lisp)

Provides Restricted parallel programming
 model on Hadoop
        User implements Map() and Reduce()
        Libraries (Hadoop) take care of
         EVERYTHING else
             – Parallelization
             – Fault Tolerance
             – Data Distribution
             – Load Balancing

Map
Converts input data to (key, value) pairs
map() functions run in parallel,
     creating different intermediate (key, value)
     pairs from different input data sets




Reduce
reduce() combines those intermediate values
 into one or more final values for that same
 key
reduce() functions also run in parallel,
    each working on a different output key
Bottleneck:
    reduce phase can't start until map phase is
     completely finished.
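
The split described on the MapReduce slides, where the user writes Map() and Reduce() and the library handles the rest, can be sketched as a single-process toy in Python. This is not Hadoop's API: `run_mapreduce`, `word_length_map`, and `sum_reduce` are hypothetical names, the sample documents are made up, and a thread pool merely stands in for a cluster. Note how all map results are collected before any reduce call starts, which is the bottleneck noted on the Reduce slide.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce(inputs, map_fn, reduce_fn, workers=4):
    """Toy driver: run map_fn on every input, group by key, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each (key, value) input is mapped independently.
        # list() waits for every map task, so reduce cannot start early.
        mapped = list(pool.map(lambda kv: list(map_fn(*kv)), inputs))

        # Shuffle: collect every intermediate value under its key.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce phase: one reduce call per key, also run in parallel.
        keys = sorted(groups)
        results = pool.map(lambda k: reduce_fn(k, groups[k]), keys)
        return dict(results)


def word_length_map(name, text):
    """Hypothetical map(): emit (word length, 1) for every word."""
    for word in text.split():
        yield len(word), 1


def sum_reduce(key, values):
    """Hypothetical reduce(): add up all values seen for one key."""
    return key, sum(values)


docs = [("a.txt", "big data on hadoop"), ("b.txt", "map and reduce")]
length_counts = run_mapreduce(docs, word_length_map, sum_reduce)
print(length_counts)
```

The driver is the only part that knows about parallelism and grouping; the two user functions stay pure, which is what makes the model "restricted" but easy to distribute.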




Example: Sort URLs in order of most hits
Compute the URLs with the most hits
         Stored in log files

Map()
         Input: <logFilename, file text>
         Output: Parses file and emits <url, hit counts> pairs
                 – e.g., <http://hello.com, 1>

Reduce()
         Input: <url, list of hit counts> from multiple map
          nodes
         Output: Sums all values for the same key and emits
          <url, TotalCount>
                 – e.g., <http://hello.com, (3, 5, 2, 7)> => <http://hello.com, 17>
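
The Map() and Reduce() signatures above can be sketched in plain Python. This is a minimal illustration, not Hadoop code: the log format (URL as the first whitespace-separated field of each line), the file names, and the helper names `map_hits`, `reduce_hits`, and `count_hits` are all assumptions made for the example.

```python
from collections import defaultdict


def map_hits(log_filename, file_text):
    """Map(): parse one log file and emit a (url, 1) pair per request line."""
    for line in file_text.splitlines():
        fields = line.split()
        if fields:                  # skip blank lines
            yield fields[0], 1      # assumed format: URL is the first field


def reduce_hits(url, hit_counts):
    """Reduce(): sum all hit counts collected for one URL."""
    return url, sum(hit_counts)


def count_hits(log_files):
    """Group the map output by URL, then reduce each group."""
    grouped = defaultdict(list)
    for filename, text in log_files.items():
        for url, count in map_hits(filename, text):
            grouped[url].append(count)
    return dict(reduce_hits(url, counts) for url, counts in grouped.items())


# Made-up sample logs from two hypothetical nodes.
logs = {
    "node1.log": "http://hello.com 200\nhttp://hi.com 200\nhttp://hello.com 200\n",
    "node2.log": "http://hello.com 404\nhttp://halo.com 200\n",
}
totals = count_hits(logs)
print(totals)
```

In a real cluster the grouping step is done by the framework between the map and reduce phases; here `count_hits` plays that role in a single process.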

Map/Reduce for URL visits

                              Input Log Data


              Map1()          Map2()       …      Mapm()
 (http://hi.com, 1)         (http://halo.com, 1)
 (http://hello.com, 3)      (http://hello.com, 5)
 …                          …
                         Data Aggregation/Combine
    (http://hi.com, <1, 1, …, 1>)       (http://halo.com, <1, 5>)
                        (http://hello.com, <3, 5, 2, 7>)
                 Reduce1 ()    Reduce2()    …      Reducel()

  (http://hi.com, 32)                       (http://halo.com, 6)
                          (http://hello.com, 17)
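
The last step implied by the example, ordering URLs by total hits, can be sketched from the numbers in the diagram. The hi.com list is elided on the slide, so its reduce output (32) is taken as given rather than recomputed; the variable names are illustrative only.

```python
# Aggregated map output from the diagram: url -> list of partial hit counts.
aggregated = {
    "http://halo.com": [1, 5],
    "http://hello.com": [3, 5, 2, 7],
}

# Reduce: sum the partial counts for each URL.
reduced = {url: sum(counts) for url, counts in aggregated.items()}
# The hi.com input list is elided on the slide; use its stated total.
reduced["http://hi.com"] = 32

# Final ordering: largest hit count first.
ranked = sorted(reduced.items(), key=lambda item: item[1], reverse=True)
for url, total in ranked:
    print(f"{total:>3}  {url}")
```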
Legacy Example

In late 2007, the New York Times
 wanted to make available over the web
 its entire archive of articles,
    11 million in all, dating back to 1851.
    four-terabyte pile of images in TIFF format.
    needed to translate that four-terabyte pile of TIFFs
     into more web-friendly PDF files.
           – not a particularly complicated but large computing chore,
                 • requiring a whole lot of computer processing time.




Legacy Example (Cont’d)

In late 2007, the New York Times
 wanted to make available over the web
 its entire archive of articles,
    a software programmer at the Times, Derek Gottfrid,
       – playing around with Amazon Web Services, Elastic
         Compute Cloud (EC2),
           • uploaded the four terabytes of TIFF data into Amazon's
             Simple Storage Service (S3)
           • In less than 24 hours, 11 million PDFs, all stored
             neatly in S3 and ready to be served up to visitors to the
             Times site.
     The total cost for the computing job? $240
           – 10 cents per computer-hour times 100 computers times 24 hours
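
The $240 figure follows directly from the three numbers on the slide. A small worked check, using `Decimal` to avoid floating-point rounding on currency; the per-PDF figure is an extra derived from the same slide numbers, not something the source states.

```python
from decimal import Decimal

rate_per_computer_hour = Decimal("0.10")  # 10 cents per computer-hour
computers = 100
hours = 24

total_cost = rate_per_computer_hour * computers * hours
print(f"Total cost: ${total_cost:.2f}")   # Total cost: $240.00

# Derived illustration: average compute cost per converted PDF.
pdfs = 11_000_000
cost_per_pdf = total_cost / pdfs
print(f"Cost per PDF: ${cost_per_pdf:.7f}")
```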




Contents
Fundamentals of Big Data

Data-Intensive Computing: Hadoop

Big Data Supporters and Use Cases




Supporters of Big Data

 Apache Hadoop Supporters
     Cloudera
           – Like Red Hat for Linux
           – HiPIC is an Academic Partner
     Hortonworks
           – Pig
           – Consulting and training
     Facebook
           – Hive
     IBM
           – Jaql

 NoSQL DB supporters
     MongoDB
           – HiPIC is trying to collaborate
     HBase, CouchDB, Apache Cassandra (originally by FB), etc.
Similarities in Pig, Hive, and Jaql

•    translate high-level languages into MapReduce jobs
     o the programmer can work at a higher level
         than writing MapReduce jobs in Java or other
           lower-level languages
•    programs are much smaller than Java code.
•    option to extend these languages,
      o often by writing user-defined functions in Java.
•    Interoperability
      o programs written in these high-level languages can
        be embedded inside other languages as well.



Pig
•    developed at Yahoo Research around 2006
     o moved into the Apache Software Foundation in
        2007.
•    PigLatin,
      o Pig's language
      o a data flow language
      o well suited to processing unstructured data
         Easy to write MapReduce code




Hive
•    developed at Facebook
      o turns Hadoop into a data warehouse
            o complete with a dialect of SQL for querying.
•    HiveQL
     o a declarative language (SQL dialect)
•    Difference from PigLatin
      o     you do not specify the data flow,
              but instead describe the result you want
                    Hive figures out how to build a data flow to
                     achieve it.
      o a schema is required.



Jaql

• developed at IBM.
• a data flow language
    o its native data structure format is JSON (JavaScript
      Object Notation).




Use Cases

Amazon AWS

Facebook

Twitter

Craigslist

HuffPost | AOL




Amazon AWS

 amazon.com
     Consumer and seller business

 aws.amazon.com
     IT infrastructure business
           – Focus on your business not IT management
     Pay as you go
           – Pay for servers by the hour
           – Pay for storage per gigabyte per month
           – Pay for data transfer per gigabyte
     Services with many APIs
           – S3: Simple Storage Service
           – EC2: Elastic Compute Cloud
                 • Provide many virtual Linux servers
                 • Can run on multiple nodes
                     – Hadoop and HBase
                     – MongoDB
Amazon AWS (Cont’d)

Customers on aws.amazon.com
    Samsung
      – Smart TV hub sites: TV applications are on AWS
    Netflix
           – ~25% of US internet traffic
           – ~100% on AWS
    NASA JPL
           – Analyze more than 200,000 images
    NASDAQ
           – Using AWS S3



Facebook [7]

Using Apache HBase
    For Titan and Puma
    HBase for FB
           – Provide excellent write performance and good reads
           – Nice features
                 • Scalable
                 • Fault Tolerance
                 • MapReduce




Titan: Facebook

Message services in FB
    Hundreds of millions of active users
    15+ billion messages a month
    50K instant messages a second

Challenges
    High write throughput
           – Every message, instant message, SMS, email
    Massive Clusters
           – Must be easily scalable

Solution
    Clustered HBase
Puma: Facebook

 ETL
     Extract, Transform, Load
       – Integrating data from many data sources into a data warehouse
     Data analytics
       – Domain owners' web analytics for ads and apps
                 • clicks, likes, shares, comments, etc.

 ETL before Puma
     8 – 24 hours
        – Procedures: Scribe, HDFS, Hive, MySQL

 ETL after Puma
     Puma
        – Real time MapReduce framework
     2 – 30 secs
        – Procedures: Scribe, HDFS, Puma, HBase


Twitter [8]

Three Challenges
    Collecting Data
           – Scribe, as at FB
    Large Scale Storage and analysis
           – Cassandra: ColumnFamily key-value store
           – Hadoop
    Rapid Learning over Big Data
           – Pig
                 • 5% of the code compared with Java
                 • 5% of the development time
                 • Within 20% of the running time




Craigslist in MongoDB [9]

Craigslist
    ~700 cities, worldwide
    ~1 billion hits/day
    ~1.5 million posts/day
    Servers
           – ~500 servers
           – ~100 MySQL servers

Migrate to MongoDB
    Scalable, Fast, Proven, Friendly




HuffPost | AOL [10]



Two Machine Learning Use Cases
   Comment Moderation
         – Evaluate All New HuffPost User Comments Every
           Day
                  • Identify Abusive / Aggressive Comments
                  • Auto Delete / Publish ~25% Comments Every Day
   Article Classification
         – Tag Articles for Advertising
                  • E.g.: scary, salacious, …

Build a flexible ML platform running on
 Hadoop
   Pig for the Hadoop implementation.

Conclusion
Era of Big Data

Need to store and compute Big Data

Storage: NoSQL DB

Computation: Hadoop MapReduce

Need to analyze Big Data in mobile
 computing, SNS for Ad, User Behavior,
 Patterns, Bioinformatics, Medical data …




Part II
The power of Women in Goryeo
 Dynasty
    North East Asia before the Mongol Empire
    Korea and Mongol
    The Empress Gi




Three kingdoms (AD 907 - 1125)




Before Mongol

Three kingdoms balanced power
    Goryeo, Yo (Liao, Cathay, Khitan, 契丹),
     Song
      – Goryeo-Yo: 3 wars
                 • First invasion (AD 993): Seo Hui (서희)
                 • Second invasion with 400K troops (AD 1010):
                   Gang Jo (강조)
                 • Third invasion with 100K troops (AD 1018):
                   Gang Gam-chan (강감찬)
                   – Goryeo became famous after this victory




Three kingdoms (AD 1115 - 1234)




Before Mongol

Three kingdoms balanced power
 (AD 1115 - 1234)
    Goryeo, Gum (Jin, Jurchen, Yojin, 金朝),
     Southern Song
           – Yun Gwan (윤관) invaded the Jurchen Wanyan (完顏) clan
            (AD 1111), followed by many battles
           – Jin defeated the Liao dynasty in AD 1121
           – Jin wanted to keep peace with Goryeo
                 • Addressed as from the emperor (big brother) to the
                   king (little brother)



Part II. The power of Women in Goryeo Dynasty

Korea and Mongol
    Wars since AD 1231 (18th year of King Gojong, 고종 18)
Goryeo (Korea) dynasty
    Military dictatorship of the Choe family ended in AD 1258
     (45th year of King Gojong, 고종 45)
Mongol
    Had been conquering China (the Southern Song dynasty)
     since AD 1257
      – Möngke Khan
                 • Right battalion
           – Kublai
                 • Left battalion


Korea and Mongol (Cont’d)




                 Mongol Empire in 1227 at Genghis Khan's death
                 [http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]
Korea and Mongol (Cont’d)



                       [Map annotations]
                       – 1236: Beginning of the invasion of Europe by Hulagu
                       – Ariq Böke controlled Mongol at Karakorum
                       – 1231: Beginning of the invasion of Korea
                       – 1236: Beginning of the invasion of South Asia by Möngke Khan and Kublai




                 Mongol Empire after Genghis Khan's death (1227)
                 under Möngke Khan
                 [http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]
Korea and Mongol (Cont’d)

World in AD 1257 – 1260
    1257: Mongols were attacking Vietnam
    1258: Mongols occupied Baghdad
    1259: Mongols were invading Syria
      – The death of Möngke Khan
    1260: The succession war began
           – Between Möngke's brothers: Kublai Khan and Ariq Böke.
           – Kublai and the youngest brother Hulagu returned to
             Karakorum: Capital of the Mongol empire
                 • Kara: north, Korum: Khori (Space, 골, 고을)




Korea and Mongol (Cont’d)

Again Goryeo and Mongol in 1259
    Goryeo decided to make a peace treaty with Mongol
           – Actually to surrender
    April 21 1259 (46th year of King Gojong, 고종 46): The Crown Prince left to
     meet the Khan
    May 17th 1259: The Crown Prince met the Mongol army
     at Yoyang (Liaoyang), which was about to invade
     Goryeo
           – Stopped the Mongol army
    June 30 1259: King Go-Jong passed away
    July 30 1259: The Khan passed away
           – The Mongol army held the prince back to hide the Khan's
            death
    The prince met Kublai at Gaebong, close to the
     Yellow River
           – Dec 1259: Kublai was returning to Karakorum

Korea and Mongol (Cont’d)




                 [Map labels: Hulagu; Ariq Böke controlled Mongol at Karakorum; Goryeo's Crown Prince; Kublai]




                 Mongol Empire after Möngke Khan's death (1259)
                 [http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]
Goryeo and Mongol in 1260-1264

The great meeting and the great Khan
    Kublai warmly welcomed the prince
           – Kublai was so happy and said
                 • “God is helping me. The kingdom of Goryeo, which was
                   never defeated even by the Chinese emperor Dang Tae-Jong
                   (Taizong of Tang), has surrendered to me”
                 • He knew that Goryeo originated from Goguryeo
    Kublai appointed the prince king of Goryeo
     (Won-Jong)
           – as Go-Jong had passed away
    They came together to Beijing in Jan 1260.
    April 1260: Won-Jong's enthronement ceremony in
     Goryeo
    August 21 1264: Ariq Böke surrendered to Kublai
     at Xanadu (Karakorum)

The great meeting and the marriage

 Sept 1264: King Won-Jong went to Beijing and met
  the Khan
     Another great welcome from the Khan

 1269: Kublai decided that his daughter would marry the
  crown prince of Goryeo
     1269, Aug 1270: Won-Jong and the crown prince asked Kublai for the
      marriage
     1271, 1272: the prince went to Beijing and returned
       – Volunteered to lead the invasion of Japan
     April 1273: Defeated Sambyolcho at Jeju island

 May 1274: The crown prince of Goryeo and the
  princess of the Mongol empire (Holdorogerimisil, Princess Jeguk, 제국공주)
  married at the palace in the capital of the
  Mongol empire

 Aug 1274: The prince became the king (King Chungnyeol, 충렬왕)
Korea and Mongol (Cont’d)




Mongol Empire in 1300-1405: this map is not
correct, as Goryeo was an independent
kingdom
[http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]

Korea and Mongol (Cont’d)




                                        The Mongol Empire and the
                                        Kingdom of Goryeo tied with
                                        marriages




Mongol Empire in
[http://en.wikipedia.org/wiki/Kublai_Khan]

The political position

 The king was ranked 7th in the
  Mongol empire
     This was the power of the princess
        – A daughter of Kublai
     Note that Kublai Khan had 12 sons.
     Goryeo received many benefits from the empire
        – “Only Goryeo in the world kept the king and kingdom”
        – When the king went to the palace of the empire, all Mongol
           officials wanted to give presents.
        – The king asked the Khan to restrain the Mongol generals in
           Goryeo

 The king was later ranked 4th in the
  empire
     The next great Khan: Temur
     The princess was his aunt
     The Khan asked that the king be ranked 4th in the empire
The Empress Gi (기황후, 奇皇后)

Born to Gi Ja-o (奇子敖)
 in Haengju (幸州), Goryeo
   Became a concubine of
    Toghun Temür Khan
         – Became the first
           empress in 1365
   Her son Ayurshiridar was
    designated Crown Prince
    in 1353.
         – Supported by Korean
           eunuch Bak Bulhwa
           (朴不花)
         – became a Khan called
           Biligtü Khan in 1370.


The Empress Gi (기황후, 奇皇后)

 Good for Goryeo
    She prohibited the practice of sending Korean women to
     the Mongol empire for marriage and slavery
    She put an end to any discussion of making the Goryeo
     kingdom one of the provinces of the Mongol empire




The Empress Gi (기황후, 奇皇后)

 Her elder brother was Gi Cheol (奇轍,
 Bayan Bukha).
     He came to threaten the position of the king of Goryeo
     King Gongmin exterminated the Gi family in 1356




The Empress Gi (기황후, 奇皇后)

 Ming China occupied the capital of the
 empire, Dadu (大都, Beijing), in 1368
    The empress was disappointed that Goryeo did not
     send any reinforcements
    She fled north to Shangdu (上都, Xanadu)




Conclusion II
Women have the power to influence their husbands:
 King and Khan (Emperor)
     They can raise their husbands' social positions
A woman can make her son a Khan

Women possess political power to
 positively affect the motherland

We need to know history and educate kids




Question?




References Part I
1) Introduction to MongoDB, Nosh Petigara, Jan 11, 2011

2) Hadoop Fundamentals I, Big Data University

3) “Large Scale Data Analysis with Map/Reduce”, Marin
   Dimitrov, Feb 2010

4) “BFS & MapReduce”, Edward J Yoon
   http://blog.udanax.org/2009/02/breadth-first-search-
   mapreduce.html, Feb 26 2009

5) “Market Basket Analysis Algorithm with no-SQL DB HBase
   and Hadoop”,Jongwook Woo, Siddharth Basopia, Yuhang
   Xu, Seon Ho Kim, The Third International Conference on
   Emerging Databases (EDB 2011), Songdo Park Hotel,
   Incheon, Korea, Aug. 25-27, 2011




References
6) “Market Basket Analysis Algorithm with Map/Reduce of
   Cloud Computing”, Jongwook Woo and Yuhang Xu, The 2011
   international Conference on Parallel and Distributed
   Processing Techniques and Applications (PDPTA 2011),Las
   Vegas (July 18-21, 2011)

7) Building Realtime Big Data Services at Facebook with
   Hadoop and HBase, Jonathan Gray, Facebook, Nov 11, 2011,
   Hadoop World NYC

8) Analyzing Big Data at Twitter, Kevin Weil, Web 2.0 Expo, NYC,
   Sep 2010

9) Lessons Learned from Migrating 2+ Billion Documents at
   Craigslist, Jeremy Zawodny, 2011

10) Machine Learning on Hadoop at Huffington Post | AOL, Thu
    Kyaw and Sang Chul Song, Hadoop DC, Oct 4, 2011


References
11) “MapReduce Debates and Schema-Free”, Woohyun Kim,
    www.coordguru.com, http://blog.naver.com/wisereign, March
    3 2010

12) “Large Scale Data Analysis with Map/Reduce”, Marin
    Dimitrov, Feb 2010

13) “HBase Schema Design Case Studies”, Qingyan Liu, July 13
    2009




References Part II
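1) The Daughters of Genghis Khan Who Married into Goryeo (고려에 시집온 징기스칸의 딸들), Lee Han-su (이한수), Nov 8 2006, Gimm-Young Publishers (김영사)

2) Kublai Khan's Japan Expedition and King Chungnyeol (쿠빌라이 칸의 일본원정과 충렬왕), Lee Seung-han (이승한), 2009, Pureun Yeoksa (푸른역사)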




                                            CSULA
  Jongwook Woo

Weitere ähnliche Inhalte

Was ist angesagt?

Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
Data infrastructure at Facebook
Data infrastructure at Facebook Data infrastructure at Facebook
Data infrastructure at Facebook AhmedDoukh
 
Pig programming is more fun: New features in Pig
Pig programming is more fun: New features in PigPig programming is more fun: New features in Pig
Pig programming is more fun: New features in Pigdaijy
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceJ Singh
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
Big Data and Data Intensive Computing: Education and Training
Big Data and Data Intensive Computing: Education and TrainingBig Data and Data Intensive Computing: Education and Training
Big Data and Data Intensive Computing: Education and TrainingJongwook Woo
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemMahabubur Rahaman
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsSkillspeed
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonCaserta
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course pptNjain85
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, HortonworksHortonworks
 
Hive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionHive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionXplenty
 

Recent IT Development and Women: Big Data and The Power of Women in Goryeo

  • 1. Recent IT Development and Women: Big Data and The Power of Women in Goryeo
      KWiSE Annual Meeting, Chapman University, CA, Oct 20th 2012
      Jongwook Woo (PhD)
      High-Performance Internet Computing Center (HiPIC)
      Educational Partner with Cloudera and Grants Awardee of Amazon AWS
      Computer Information Systems Department
      California State University, Los Angeles (CSULA)
  • 2. Contents
      - Part I. Big Data
        - Fundamentals of Big Data
        - Data-Intensive Computing: Hadoop
        - Big Data Supporters and Use Cases
      - Part II. The Power of Women in Goryeo Dynasty
        - North East Asia before the Mongol Empire
        - Korea and Mongol
        - The Empress Gi
  • 3. Part I: Big Data
      - Fundamentals of Big Data
      - NoSQL DB: HBase, MongoDB
      - Data-Intensive Computing: Hadoop
      - Big Data Supporters and Use Cases
  • 4. Experience in Big Data
      - Grants
        - Received Amazon AWS in Education Research Grant (July 2012 - July 2014)
        - Received Amazon AWS in Education Coursework Grants (July 2012 - July 2013, Jan 2011 - Dec 2011)
      - Partnership
        - Received Academic Education Partnership with Cloudera since June 2012
      - Certificate
        - Certificate of Achievement in the Big Data University Training Course, “Hadoop Fundamentals I”, July 8 2012
      - Cloud Computing Blog
        - http://dal-cloudcomputing.blogspot.com/
  • 5. What is Big Data, Map/Reduce, Hadoop, NoSQL DB on Cloud Computing
  • 6. Big Data
      - Too much data
        - Terabytes (10^12 bytes), petabytes (10^15 bytes)
          - Because of the web
          - Sensor data, bioinformatics, social computing, smartphones, online games, …
      - Cannot be handled with the legacy approach
        - Too big
        - Un-/semi-structured data
  • 7. Two Issues in Big Data
      - How to store Big Data
        - NoSQL DB
      - How to compute Big Data
        - Parallel computing with multiple cheap computers
          - No need for supercomputers
  • 8. Contents
      - Fundamentals of Big Data
      - Data-Intensive Computing: Hadoop
      - Big Data Supporters and Use Cases
  • 9. Data nowadays
      - Data issues
        - Data grows to 10 TB, and then 100 TB.
        - Unstructured data comes from sources like Facebook, Twitter, RFID readers, sensors, and so on.
        - Need to derive information from both the relational data and the unstructured data as soon as possible.
      - Solution to efficiently compute Big Data
        - Hadoop Map/Reduce
  • 10. Solutions in Big Data Computation
      - Map/Reduce by Google
        - (Key, Value) parallel computing
      - Apache Hadoop
        - Big Data computation (MapReduce, Pig)
      - Integrating MapReduce and RDB
        - Oracle + Hadoop
        - Sybase IQ
        - Vertica + Hadoop
        - Hadoop DB
        - Greenplum
        - Aster Data
      - Integrating MapReduce and NoSQL DB
        - MongoDB MapReduce
        - HBase
  • 11. Apache Hadoop
      - Motivated by Google Map/Reduce and GFS
        - Open source project of the Apache Foundation
        - Framework written in Java
          - Originally developed by Doug Cutting, who named it after his son's toy elephant
      - Two core components
        - Storage: HDFS
          - High-bandwidth clustered storage
        - Processing: Map/Reduce
          - Fault-tolerant distributed processing
      - Hadoop scales linearly with
        - Data size
        - Analysis complexity
  • 12. Hadoop issues
      - Map/Reduce is not a DB
        - Algorithm in restricted parallel computing
      - HDFS and HBase
        - Cannot compete with the functions in an RDBMS
      - But useful for
        - Semi-structured data models and high-level dataflow query languages on top of MapReduce
          - Pig, Hive, Jaql, Cascading, Cloudbase
        - Huge (peta- or terabytes) but non-complicated data
          - Web crawling
          - Log analysis: log files for web companies
          - New York Times case
  • 13. MapReduce Pros & Cons Summary
      - Good when
        - Huge data for input, intermediate, output
        - Little synchronization is required
        - Read once; batch-oriented datasets (ETL)
      - Bad for
        - Fast response time
        - Large amounts of shared data
        - Fine-grained synchronization needed
        - CPU-intensive rather than data-intensive work
        - Continuous input streams
  • 14. MapReduce in Detail
      - Functions borrowed from functional programming languages (e.g., Lisp)
      - Provides a restricted parallel programming model on Hadoop
        - User implements Map() and Reduce()
        - Libraries (Hadoop) take care of EVERYTHING else
          - Parallelization
          - Fault tolerance
          - Data distribution
          - Load balancing
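The functional-programming roots mentioned on slide 14 can be seen in a few lines of plain Python. This is only an illustrative sketch and not Hadoop code: it uses Python's built-in `map()` and `functools.reduce()`, and the sample records are assumptions made up for the example.

```python
from functools import reduce

# Hypothetical input records; any list of strings would do.
words = ["big", "data", "hadoop", "map", "reduce"]

# map(): apply the same function to every record independently.
# No call depends on another, which is what lets a framework run them in parallel.
lengths = list(map(len, words))

# reduce(): fold the mapped values into a single result.
total_chars = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)      # [3, 4, 6, 3, 6]
print(total_chars)  # 22
```

On a real cluster the user would supply only the two functions; the scheduling, data movement, and retries listed on the slide are left to the framework.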
  • 15. HiPIC Map Convert input data to (key, value) pairs map() functions run in parallel,  creating different intermediate (key, value) values from different input data sets CSULA Jongwook Woo
  • 16. HiPIC Reduce reduce() combines those intermediate values into one or more final values for that same key reduce() functions also run in parallel, each working on a different output key Bottleneck: reduce phase can‟t start until map phase is completely finished. CSULA Jongwook Woo
  • 17. HiPIC Example: Sort URLs in the largest hit order Compute the largest hit URLs  Stored in log files Map()  Input <logFilename, file text>  Output: Parses file and emits <url, hit counts> pairs – eg. <http://hello.com, 1> Reduce()  Input: <url, list of hit counts> from multiple map nodes  Output: Sums all values for the same key and emits <url, TotalCount> – eg.<http://hello.com, (3, 5, 2, 7)> => <http://hello.com, 17> CSULA Jongwook Woo
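The Map() and Reduce() signatures on slide 17 might look roughly as follows when simulated on a single machine. This is a sketch under stated assumptions: the log format (one requested URL per line), the file names, and the function names `map_log` and `reduce_hits` are hypothetical, and the grouping loop stands in for work that Hadoop would do across nodes.

```python
from collections import defaultdict

def map_log(log_filename, file_text):
    """Map(): parse one log file and emit a (url, 1) pair per hit."""
    for line in file_text.splitlines():
        url = line.strip()
        if url:
            yield (url, 1)

def reduce_hits(url, hit_counts):
    """Reduce(): sum all hit counts for one URL."""
    return (url, sum(hit_counts))

# Hypothetical log files: one requested URL per line.
logs = {
    "log1.txt": "http://hello.com\nhttp://hi.com\nhttp://hello.com\n",
    "log2.txt": "http://hello.com\nhttp://halo.com\n",
}

# Group intermediate values by key (the framework's job on a real cluster).
grouped = defaultdict(list)
for name, text in logs.items():
    for url, count in map_log(name, text):
        grouped[url].append(count)

totals = [reduce_hits(url, counts) for url, counts in grouped.items()]

# Sort in largest-hit order; ties are broken by URL.
ranked = sorted(totals, key=lambda pair: (-pair[1], pair[0]))
print(ranked)

# The slide's own example: (3, 5, 2, 7) sums to 17.
print(reduce_hits("http://hello.com", [3, 5, 2, 7]))
```

The final sort runs after the reduce step here; at scale it would typically be a second job or a post-processing step over the much smaller reduced output.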
  • 18. Map/Reduce for URL visits
      Input log data
      -> Map_1(), Map_2(), …, Map_m() emit: (http://hi.com, 1), (http://halo.com, 1), (http://hello.com, 3), (http://hello.com, 5), …
      -> Data aggregation/combine: (http://hi.com, <1, 1, …, 1>), (http://halo.com, <1, 5>), (http://hello.com, <3, 5, 2, 7>)
      -> Reduce_1(), Reduce_2(), …, Reduce_l() emit: (http://hi.com, 32), (http://halo.com, 6), (http://hello.com, 17)
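The aggregation step in the middle of this diagram, where pairs from many mappers are routed to a smaller number of reducers, can be sketched as below. The numbers are the ones on the slide; everything else is an assumption for illustration: two reducers, a CRC32-based `partition` helper (Hadoop's real partitioner is not shown in the slides), and the way the pairs are split across three mappers.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # assumed value for "l" in the diagram

def partition(url, num_reducers=NUM_REDUCERS):
    """Pick a reducer for a key; CRC32 keeps the choice stable across runs."""
    return zlib.crc32(url.encode("utf-8")) % num_reducers

# Intermediate pairs as emitted by several mappers (values from the slide).
mapper_outputs = [
    [("http://hi.com", 1)] * 32,
    [("http://halo.com", 1), ("http://hello.com", 3), ("http://hello.com", 5)],
    [("http://halo.com", 5), ("http://hello.com", 2), ("http://hello.com", 7)],
]

# Shuffle: route each pair to its reducer and group values by key.
reducer_inputs = [defaultdict(list) for _ in range(NUM_REDUCERS)]
for output in mapper_outputs:
    for url, count in output:
        reducer_inputs[partition(url)][url].append(count)

# Reduce: each reducer sums only the keys it owns.
results = {}
for grouped in reducer_inputs:
    for url, counts in grouped.items():
        results[url] = sum(counts)

print(results)
```

Because every pair with the same URL lands on the same reducer, each reducer can compute its totals without talking to the others.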
  • 19. Legacy Example
      - In late 2007, the New York Times wanted to make its entire archive of articles available over the web: 11 million in all, dating back to 1851.
        - A four-terabyte pile of images in TIFF format
        - Needed to translate that four-terabyte pile of TIFFs into more web-friendly PDF files
          - Not a particularly complicated but a large computing chore, requiring a whole lot of computer processing time
  • 20. Legacy Example (Cont’d)
      - In late 2007, the New York Times wanted to make its entire archive of articles available over the web.
        - A software programmer at the Times, Derek Gottfrid,
          - playing around with Amazon Web Services' Elastic Compute Cloud (EC2),
            - uploaded the four terabytes of TIFF data into Amazon's Simple Storage Service (S3)
            - In less than 24 hours, 11 million PDFs were all stored neatly in S3 and ready to be served up to visitors to the Times site.
      - The total cost for the computing job? $240
        - 10 cents per computer-hour times 100 computers times 24 hours
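The $240 figure follows directly from the three numbers on the slide. A minimal check, working in cents to keep the arithmetic exact (the variable names are just for illustration):

```python
# Figures taken from the slide.
rate_cents_per_computer_hour = 10
computers = 100
hours = 24

total_cents = rate_cents_per_computer_hour * computers * hours
total_dollars = total_cents / 100

print(f"${total_dollars:.2f}")  # $240.00
```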
  • 21. Contents
      - Fundamentals of Big Data
      - Data-Intensive Computing: Hadoop
      - Big Data Supporters and Use Cases
  • 22. Supporters of Big Data
      - Apache Hadoop supporters
        - Cloudera
          - Like Linux and Red Hat
          - HiPIC is an Academic Partner
        - Hortonworks
          - Pig
          - Consulting and training
        - Facebook
          - Hive
        - IBM
          - Jaql
      - NoSQL DB supporters
        - MongoDB
          - HiPIC tries to collaborate
        - HBase, CouchDB, Apache Cassandra (originally by FB), etc.
  • 23. Similarities in Pig, Hive, and Jaql
      - Translate high-level languages into MapReduce jobs
        - The programmer can work at a higher level than writing MapReduce jobs in Java or other lower-level languages
      - Programs are much smaller than Java code.
      - Option to extend these languages,
        - often by writing user-defined functions in Java.
      - Interoperability
        - Programs written in these high-level languages can be embedded inside other languages as well.
  • 24. Pig
      - Developed at Yahoo Research around 2006
        - Moved into the Apache Software Foundation in 2007.
      - Pig Latin
        - Pig's language
        - A data flow language
        - Well suited to processing unstructured data
          - Easy to write MapReduce code
  • 25. Hive
      - Developed at Facebook
        - Turns Hadoop into a data warehouse
        - Complete with a dialect of SQL for querying
      - HiveQL
        - A declarative language (SQL dialect)
      - Difference from Pig Latin
        - You do not specify the data flow,
          - but instead describe the result you want
          - Hive figures out how to build a data flow to achieve it.
        - A schema is required
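The contrast between a data flow language and a declarative one can be illustrated without a Hadoop cluster. The sketch below is an analogy only: plain Python loops stand in for Pig Latin's step-by-step style, and Python's standard `sqlite3` module stands in for Hive, so the SQL is SQLite's dialect rather than HiveQL. The table name `hits`, its columns, and the sample rows are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical hit log: (url, hits) rows.
rows = [("http://hello.com", 3), ("http://hi.com", 1),
        ("http://hello.com", 5), ("http://halo.com", 6)]

# Data-flow style (in the spirit of Pig Latin): spell out each step.
totals = Counter()
for url, hits in rows:       # step 1: read each record
    totals[url] += hits      # step 2: aggregate per URL
dataflow_result = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))  # step 3: order

# Declarative style (in the spirit of HiveQL): describe the result only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (url TEXT, hits INTEGER)")  # a schema is required
conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
declarative_result = conn.execute(
    "SELECT url, SUM(hits) AS total FROM hits "
    "GROUP BY url ORDER BY total DESC, url ASC"
).fetchall()
conn.close()

print(dataflow_result == declarative_result)
```

Both paths produce the same ranking; the difference is who decides the execution steps, the programmer in the first case and the query engine in the second.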
  • 26. Jaql
      - Developed at IBM
      - A data flow language
        - Its native data structure format is JSON (JavaScript Object Notation).
  • 27. Use Cases
      - Amazon AWS
      - Facebook
      - Twitter
      - Craigslist
      - HuffPost | AOL
  • 28. Amazon AWS
      - amazon.com
        - Consumer and seller business
      - aws.amazon.com
        - IT infrastructure business
          - Focus on your business, not IT management
        - Pay as you go
          - Pay for servers by the hour
          - Pay for storage per gigabyte per month
          - Pay for data transfer per gigabyte
        - Services with many APIs
          - S3: Simple Storage Service
          - EC2: Elastic Compute Cloud
            - Provides many virtual Linux servers
            - Can run on multiple nodes
              - Hadoop and HBase
              - MongoDB
  • 29. Amazon AWS (Cont’d)
      - Customers on aws.amazon.com
        - Samsung
          - Smart TV hub sites: TV applications are on AWS
        - Netflix
          - ~25% of US internet traffic
          - ~100% on AWS
        - NASA JPL
          - Analyze more than 200,000 images
        - NASDAQ
          - Using AWS S3
  • 30. Facebook [7]
      - Using Apache HBase
        - For Titan and Puma
        - HBase for FB
          - Provides excellent write performance and good reads
          - Nice features
            - Scalable
            - Fault tolerance
            - MapReduce
  • 31. Titan: Facebook
      - Message services in FB
        - Hundreds of millions of active users
        - 15+ billion messages a month
        - 50K instant messages a second
      - Challenges
        - High write throughput
          - Every message, instant message, SMS, email
        - Massive clusters
          - Must be easily scalable
      - Solution
        - Clustered HBase
  • 32. Puma: Facebook
      - ETL
        - Extract, Transform, Load
          - Data integration from many data sources into a data warehouse
        - Data analytics
          - Domain owners' web analytics for ads and apps: clicks, likes, shares, comments, etc.
      - ETL before Puma
        - 8 to 24 hours
          - Procedures: Scribe, HDFS, Hive, MySQL
      - ETL after Puma
        - Puma
          - Real-time MapReduce framework
        - 2 to 30 seconds
          - Procedures: Scribe, HDFS, Puma, HBase
  • 33. Twitter [8]
      - Three challenges
        - Collecting data
          - Scribe, as at FB
        - Large-scale storage and analysis
          - Cassandra: ColumnFamily key-value store
          - Hadoop
        - Rapid learning over Big Data
          - Pig
            - 5% of Java code
            - 5% of dev time
            - Within 20% of running time
  • 34. Craigslist in MongoDB [9]
      - Craigslist
        - ~700 cities, worldwide
        - ~1 billion hits/day
        - ~1.5 million posts/day
        - Servers
          - ~500 servers
          - ~100 MySQL servers
      - Migrate to MongoDB
        - Scalable, Fast, Proven, Friendly
  • 35. HuffPost | AOL [10]
      - Two machine learning use cases
        - Comment moderation
          - Evaluate all new HuffPost user comments every day
            - Identify abusive / aggressive comments
            - Auto delete / publish ~25% of comments every day
        - Article classification
          - Tag articles for advertising
            - E.g.: scary, salacious, …
      - Build a flexible ML platform running on Hadoop
      - Pig for the Hadoop implementation
  • 36. Conclusion
      - Era of Big Data
      - Need to store and compute Big Data
        - Storage: NoSQL DB
        - Computation: Hadoop MapReduce
      - Need to analyze Big Data in mobile computing and SNS for ads, user behavior, patterns, bioinformatics, medical data, …
  • 37. Part II: The Power of Women in Goryeo Dynasty
      - North East Asia before the Mongol Empire
      - Korea and Mongol
      - The Empress Gi
  • 38. Three kingdoms (AD 907 - 1125)
  • 39. Before Mongol
      - Three kingdoms balanced power
        - Goryeo, Yo (Liao, Cathay, Khitan, 契丹), Song
          - Goryeo-Yo: 3 wars
            - First invasion (AD 993): Seo Hui (서희)
            - Second invasion with 400K (AD 1010): Gang Jo (강조)
            - Third invasion with 100K (AD 1018): Gang Gam-chan (강감찬)
              - Goryeo became famous after this victory
  • 40. Three kingdoms (AD 1115 - 1234)
  • 41. Before Mongol
      - Three kingdoms balanced power (AD 1115 - 1234)
        - Goryeo, Gum (Jin, Jurchen, Yojin, 金朝), South Song
          - Yun Gwan (윤관) invaded the Jurchen Wanyan (完顏) clan (AD 1111), followed by many battles
          - Jin defeated the Liao dynasty in AD 1121
            - Wanted to keep peace with Goryeo
              - From the emperor as big brother to the king as little brother
  • 42. Part II. The Power of Women in Goryeo Dynasty: Korea and Mongol
      - Wars since AD 1231 (18th year of King Gojong, 고종 18)
      - Goryeo (Korea) dynasty
        - Military dictatorship of the Choe family ended in AD 1258 (45th year of King Gojong, 고종 45)
      - Mongol
        - Had been conquering China (the South Song dynasty) since AD 1257
          - Möngke Khan: right battalion
          - Kublai: left battalion
  • 43. Korea and Mongol (Cont’d)
      [Map: Mongol Empire in 1227 at Genghis Khan's death] [http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]
  • 44. Korea and Mongol (Cont’d)
      [Map: Mongol Empire after Genghis Khan's death (1227) under Möngke Khan] [http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]
      Map labels:
      - 1231: Beginning of the invasion of Korea
      - 1236: Beginning of the invasion of Europe, by Hulagu
      - 1236: Beginning of the invasion of South Asia, by Möngke Khan and Kublai
      - Ariq Böke controlled Mongol at Karakorum
  • 45. Korea and Mongol (Cont'd)
      – World in AD 1257-1260
          • 1257: Mongols were attacking Vietnam
          • 1258: Mongols occupied Baghdad
          • 1259: Mongols were invading Syria
              – The death of Möngke Khan
          • 1260: The succession war began
              – Fought by Möngke's brothers, Kublai Khan and Ariq Böke
              – Kublai and the youngest brother Hulagu returned to KaraKorum, the capital of the Mongol empire
                  • Kara: north, Korum: Khori (space, 골, 고을)
  • 46. Korea and Mongol (Cont'd)
      – Again Goryeo and Mongol in 1259
          • Goryeo decided to make a peace treaty with the Mongols
              – Actually to surrender
          • April 21, 1259 (46th year of King Gojong, 고종 46): The Crown Prince left to meet the Khan
          • May 17, 1259: The Crown Prince met the Mongol army at Yoyang (Liaoyang), which was about to invade Goryeo
              – Stopped the Mongol army
          • June 30, 1259: King Go-Jong passed away
          • July 30, 1259: The Khan passed away
              – The Mongol army stopped the prince to hide the Khan's death
          • The prince met Kublai at Gaebong, close to the Yellow River
              – Dec 1259: Kublai was returning to KaraKorum
  • 47. Korea and Mongol (Cont'd)
      – Map: Mongol Empire after Möngke Khan's death (1259) [http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]
      – Map labels: Hulagu; Ariq Böke controlled Mongol at Karakorum; Goryeo's Crown Prince; Kublai
  • 48. Goryeo and Mongol in 1260-1264
      – The great meeting and the great Khan
      – Kublai welcomed the prince with great favor
          • Kublai was so happy and said
              – "God is helping me. The kingdom of Goryeo, which was never defeated even by the Chinese emperor Dang Tae-Jong, has surrendered to me"
              – He knew that Goryeo originated from GoGuRyeo
      – Kublai appointed the prince king of Goryeo (Won-Jong)
          • as Go-Jong had passed away
      – They came together to Beijing in Jan 1260
      – April 1260: Won-Jong's enthronement ceremony in Goryeo
      – August 21, 1264: Ariq Böke surrendered to Kublai at Xanadu (Shangdu)
  • 49. The great meeting and the marriage
      – Sept 1264: King Won-Jong went to Beijing and met the Khan
          • Another great welcome from the Khan
      – 1269: Kublai decided that his daughter would marry the crown prince of Goryeo
      – 1269, Aug 1270: Won-Jong and the crown prince asked Kublai for the marriage
      – 1271, 1272: The prince went to Beijing and returned
          • Volunteered to lead the invasion of Japan
      – April 1273: Defeated the Sambyolcho at Jeju island
      – May 1274: The crown prince of Goryeo and the princess of the Mongol empire (Holdorogerimisil, Princess Jeguk, 제국공주) married at the palace in the capital of the Mongol empire
      – Aug 1274: The prince became the king (King Chungnyeol, 충렬왕)
  • 50. Korea and Mongol (Cont'd)
      – Map: Mongol Empire in 1300-1405 [http://en.wikipedia.org/wiki/Timeline_of_the_Mongol_Empire]
      – This map is not correct, as Goryeo was an independent kingdom
  • 51. Korea and Mongol (Cont'd)
      – The Mongol Empire and the Kingdom of Goryeo were tied by marriages
      – Map: Mongol Empire in [http://en.wikipedia.org/wiki/Kublai_Khan]
  • 52. The political position
      – The position of the king was ranked 7th in the Mongol empire
          • It is the power of the princess
              – A daughter of Kublai
          • Note that Kublai Khan had 12 sons
      – Goryeo received many benefits from the empire
          • "Only Goryeo in the world kept the king and kingdom"
          • When the king went to the palace of the empire, all Mongol officials wanted to give presents
          • The king asked the Khan to suppress Mongol generals in Goryeo
      – The position of the king was ranked 4th in the empire
          • The next great Khan was Temur
          • The princess was his aunt
          • The Khan asked that the king be ranked 4th in the empire
  • 53. The Empress Gi (기황후, 奇皇后)
      – Born to Gi Ja-o (奇子敖) in Haengju (幸州), Goryeo
      – Became a concubine of Toghun Temür Khan
          • Became the first empress in 1365
      – Her son Ayurshiridar was designated Crown Prince in 1353
          • Supported by the Korean eunuch Bak Bulhwa (朴不花)
          • Became Khan, called Biligtü Khan, in 1370
  • 54. The Empress Gi (기황후, 奇皇后)
      – Good for Goryeo
          • She prohibited the practice of sending Korean women to the Mongol empire for marriage and slavery
          • She ended any discussion of making the Goryeo kingdom a province of the Mongol empire
  • 55. The Empress Gi (기황후, 奇皇后)
      – Her elder brother was Gi Cheol (奇轍, Bayan Bukha)
      – He came to threaten the position of the king of Goryeo
      – King Gongmin exterminated the Gi family in 1356
  • 56. The Empress Gi (기황후, 奇皇后)
      – Ming China occupied the capital of the empire, Dadu (大都, Beijing), in 1368
      – The empress was disappointed that Goryeo did not send any reinforcements
      – She fled north to Shangdu (上都, Xanadu)
  • 57. Conclusion II
      – A woman has the power to control her husband: King and Khan (Emperor)
          • Can raise their social positions
      – A woman can make her son a Khan
      – A woman possesses political power to positively affect her motherland
      – We need to know history and educate kids
  • 58. Question?
  • 59. References Part I
      1) "Introduction to MongoDB", Nosh Petigara, Jan 11, 2011
      2) "Hadoop Fundamentals I", Big Data University
      3) "Large Scale Data Analysis with Map/Reduce", Marin Dimitrov, Feb 2010
      4) "BFS & MapReduce", Edward J Yoon, http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html, Feb 26, 2009
      5) "Market Basket Analysis Algorithm with no-SQL DB HBase and Hadoop", Jongwook Woo, Siddharth Basopia, Yuhang Xu, Seon Ho Kim, The Third International Conference on Emerging Databases (EDB 2011), Songdo Park Hotel, Incheon, Korea, Aug 25-27, 2011
  • 60. References
      6) "Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing", Jongwook Woo and Yuhang Xu, The 2011 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2011), Las Vegas, July 18-21, 2011
      7) "Building Realtime Big Data Services at Facebook with Hadoop and HBase", Jonathan Gray, Facebook, Hadoop World NYC, Nov 11, 2011
      8) "Analyzing Big Data at Twitter", Kevin Weil, Web 2.0 Expo, NYC, Sep 2010
      9) "Lessons Learned from Migrating 2+ Billion Documents at Craigslist", Jeremy Zawodny, 2011
      10) "Machine Learning on Hadoop at Huffington Post | AOL", Thu Kyaw and Sang Chul Song, Hadoop DC, Oct 4, 2011
  • 61. References
      11) "MapReduce Debates and Schema-Free", Woohyun Kim, www.coordguru.com, http://blog.naver.com/wisereign, March 3, 2010
      12) "Large Scale Data Analysis with Map/Reduce", Marin Dimitrov, Feb 2010
      13) "HBase Schema Design Case Studies", Qingyan Liu, July 13, 2009
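  • 62. References Part II
      1) 고려에 시집온 징기스칸의 딸들 [The Daughters of Genghis Khan Who Married into Goryeo], Lee Han-su (이한수), Gimm-Young Publishers (김영사), Nov 8, 2006
      2) 쿠빌라이 칸의 일본원정과 충렬왕 [Kublai Khan's Expedition to Japan and King Chungnyeol], Lee Seung-han (이승한), Pureun Yeoksa (푸른역사), 2009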