:)
tiny   :projects
Tesseract OCR

  1985: HP       2006: Google
  2006: TIFF     2011: *
  2009: Text     2010: layout
  2007: 6        2011: 33

Arabic, English, Bulgarian, Catalan, Czech, Chinese (Simplified and
Traditional), Danish (standard and Fraktur script), German, Greek,
Finnish, French, Hebrew, Croatian, Hungarian, Indonesian, Italian,
Japanese, Korean, Latvian, Lithuanian, Dutch, Norwegian, Polish,
Portuguese, Romanian, Russian, Slovak (standard and Fraktur script),
Slovenian, Spanish, Serbian, Swedish, Tagalog, Thai, Turkish,
Ukrainian and Vietnamese

Officially supported:

Probably runs on:

Image processing
Google Refine
Runs on:
Runs in:
Major features:

  Import from anywhere
  Faceting
  Clustering
  Split/create custom columns
  GREL transformations
  Export/etc
google protocol buffers

  message Person {
    required int32 id = 1;
    required string name = 2;
    optional string email = 3;
  }

  >

  Person person;
  person.set_id(123);
  person.set_name("Bob");
  person.set_email("bob@example.com");

  fstream out("person.pb", ios::out ...
  person.SerializeToOstream(&out);
  out.close();
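
As a rough illustration of what SerializeToOstream writes for this message, the sketch below hand-encodes the same three fields following the protocol buffers wire format: each field starts with a varint key of (field_number << 3) | wire_type, followed by the value. AppendVarint, AppendKey and EncodePerson are hypothetical helper names made up for this example, not part of the protobuf API, and the sketch always writes the optional email field. Real code should use the generated Person class as on the slide.

```cpp
#include <cstdint>
#include <string>

// Append an unsigned integer as a base-128 varint: 7 payload bits per byte,
// high bit set on every byte except the last.
void AppendVarint(std::string& out, std::uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Field key = (field_number << 3) | wire_type, itself written as a varint.
void AppendKey(std::string& out, std::uint32_t field_number,
               std::uint32_t wire_type) {
  AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | wire_type);
}

// Illustrative hand encoding of the Person message from the slide.
std::string EncodePerson(std::int32_t id, const std::string& name,
                         const std::string& email) {
  std::string out;
  AppendKey(out, 1, 0);  // id: wire type 0 (varint)
  // Negative int32 values are sign-extended to 64 bits before encoding.
  AppendVarint(out, static_cast<std::uint64_t>(id));
  AppendKey(out, 2, 2);  // name: wire type 2 (length-delimited)
  AppendVarint(out, name.size());
  out += name;
  AppendKey(out, 3, 2);  // email: wire type 2 (length-delimited)
  AppendVarint(out, email.size());
  out += email;
  return out;
}
```

Under these assumptions, EncodePerson(123, "Bob", "bob@example.com") yields 24 bytes: 08 7B for the id, 12 03 plus "Bob" for the name, and 1A 0F plus the 15-byte address for the email.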
          512   bytes / tweet
  340,000,000   tweets / day (2012)
7,253,333,333   bytes / hour
    2,014,814   bytes / second
        1.921   Mbytes / second
       15.371   Mbits / second

            8   Tbytes / day (2011)

  Google: ~ 377M searches/day
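
The tweet figures on this slide follow from one multiplication and a few divisions. The sketch below redoes that arithmetic; the struct and function names are made up for illustration, and it assumes 1 Mbyte = 1,048,576 bytes, which is the convention that reproduces the slide's Mbyte and Mbit figures.

```cpp
#include <cstdint>

// Derived rates for a stream of fixed-size messages (illustrative type).
struct Throughput {
  std::uint64_t bytes_per_day;
  std::uint64_t bytes_per_hour;
  std::uint64_t bytes_per_second;
  double mbytes_per_second;  // assumes 1 Mbyte = 1,048,576 bytes
  double mbits_per_second;
};

// Integer division truncates, matching the whole-number figures on the slide.
Throughput TweetThroughput(std::uint64_t bytes_per_tweet,
                           std::uint64_t tweets_per_day) {
  Throughput t{};
  t.bytes_per_day = bytes_per_tweet * tweets_per_day;
  t.bytes_per_hour = t.bytes_per_day / 24;
  t.bytes_per_second = t.bytes_per_hour / 3600;
  t.mbytes_per_second =
      static_cast<double>(t.bytes_per_second) / (1024.0 * 1024.0);
  t.mbits_per_second = t.mbytes_per_second * 8.0;
  return t;
}
```

Calling TweetThroughput(512, 340000000) gives 174,080,000,000 bytes per day, 7,253,333,333 bytes per hour and 2,014,814 bytes per second, which is a little under 2 Mbytes (about 15.4 Mbits) per second.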
>   + =   ?

MapReduce
snappy
http://code.google.com/p/snappy/

  Fast
  Stable
  Robust
  Free and BSD
Size: compression ratio (%) (less is better)
[Bar chart, axis 0-80%, comparing: lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
Data types
[Bar chart: compression ratio (axis 0-6) of snappy and zlib for plain text, html and jpeg]
Size

Output is from 20% to 100% bigger than zlib's

:(

...not for Amazon Glacier
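
The size penalty can be checked against compression ratios. The sketch below assumes the plain-text ratios given in the editor's notes are read as snappy 1.5-1.7 and zlib 2.7; that attribution is an assumption of this example, and PercentBigger is a hypothetical helper name. Compressed size is proportional to 1 / ratio, so the relative overhead is the quotient of the two ratios minus one.

```cpp
// How much bigger (in percent) the output of a weaker compressor is,
// given both compression ratios (original size / compressed size).
// Compressed size is proportional to 1 / ratio.
double PercentBigger(double stronger_ratio, double weaker_ratio) {
  return (stronger_ratio / weaker_ratio - 1.0) * 100.0;
}
```

With those assumed ratios, PercentBigger(2.7, 1.7) is about 59 and PercentBigger(2.7, 1.5) is about 80, so for plain text the estimate lands inside the 20% to 100% range quoted on the slide.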
Speed: compression (MB/s) (more is better)
[Bar chart, axis 0-250 MB/s, comparing: lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
Speed: decompression (MB/s) (more is better)
[Bar chart, axis 0-500 MB/s, comparing: lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
On 1 core of a 64-bit Core i7 processor:

  • Compression:   250 MB/s
  • Decompression: 500 MB/s

:P
Portable, but primarily optimized for 64-bit x86-compatible processors
Used in:

  BigTable
  MapReduce
  Google RPC
  Hadoop
Bindings:
@TarasRoshko

HTTP headers here:
http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
QA?

Ostap Andrusiv
Software Engineer
Eleks software
@p1f

Tiny Google Projects

Editor's Notes

  1-11. http://www.cloudera.com/blog/2011/09/snappy-and-hadoop/
  12. In-memory test (compression and decompression) with ENWIK8 using 1 core of Intel Xeon X5355 @ 2.66GHz (64-bit compilation under gcc 4.1.1 (Linux) -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math --param inline-unit-growth=999 -DNDEBUG)
  13. Compression ratio by data type:

                    snappy     zlib
      plain text    1.5-1.7    2.7
      html          2-4        3-7
      jpeg          1          1
  14. http://aws.amazon.com/glacier/
  15. http://pastebin.com/SFaNzRuf
  16. http://encode.ru/threads/1255-Google-released-Snappy-compression-decompression-library
  17-18. http://www.cloudera.com/blog/2011/09/snappy-and-hadoop/