SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Downloaden Sie, um offline zu lesen
Mining Lectures
 Marcel Caraciolo - @marcelcaraciolo




                                       1
Who’s me ?                    Marcel Pinheiro Caraciolo

 Brazilian, lover of crabs

 Director of P&D - brazilian startup Orygens
 M.S.C Candidate at Data Mining and Recommender Systems
 Current moderator of the Local Python User Group at Pernambuco

           Interested at machine learning,
      recommender systems and mobile computing

  Blogging about machine learning with Python since 2008
              http://aimotion.blogspot.com

 Young apprentice with Python programming since 2008.



                                                                  2
How I started this analysis?




          24 hours ago...

                               3
Question

          How were the topics
distributed around the Scipy Conference
           General Sessions ?




                                          4
Scrapping of Scipy Conference




    Small Web-Crawler for extracting the
            approved lectures
         urllib2, re, BeautifulSoap...
                                           5
Resume


41        Lectures


820   minutes length




                       6
It means...



=~   4100 tweets posted.



                           7
Or watch...



    Star Wars Trilogy


         2x

                        8
Or finish Super Mario Game...




          82 x!
                               9
Na nossa língua agora...
Or open the Eclipse




 Abrir o Eclipse 2 vezes!
         2 x!
                            11


                             10
Most popular Authors

  Dharhas Pothina - 3


  Wes McKinney - 2


   All the others - 1



                        11
Playing with the text...




The most frequent words at the conference
                 nltk, re
                                            12
But let’s take a deeper look.
 I used the clustering algorithm K-Means

 Tool used for visualization Ubigraph




                                           13
Distribution of the Lectures

 Basic Frameworks
       matplotlib, ipython



                                    Building frameworks
                                      performance, models, web services



   Parallelism
   performance, gpu, statistical



                                               Visualization
              Numpy                                 data analysis, statistical

             toolkits using Numpy

                                                                                 14
To sum up...
Mining english text is so
     much easier!!!
        Submit your work also!

 Spread the scientific python over the
             community

I expect to be back to Scipy next year!



                                          15
https://github.com/marcelcaraciolo/clustering_scipy

        Mining Lectures
           Marcel Caraciolo - @marcelcaraciolo



                                                      16

Weitere ähnliche Inhalte

Andere mochten auch

Introduction to text mining
Introduction to text miningIntroduction to text mining
Introduction to text mining
Lars Juhl Jensen
 
Ict Presentation
Ict PresentationIct Presentation
Ict Presentation
amoi286
 

Andere mochten auch (20)

Scipy, numpy and friends
Scipy, numpy and friendsScipy, numpy and friends
Scipy, numpy and friends
 
Computação Científica com Python, Numpy e Scipy
Computação Científica com Python, Numpy e ScipyComputação Científica com Python, Numpy e Scipy
Computação Científica com Python, Numpy e Scipy
 
Recommender Systems with Ruby (adding machine learning, statistics, etc)
Recommender Systems with Ruby (adding machine learning, statistics, etc)Recommender Systems with Ruby (adding machine learning, statistics, etc)
Recommender Systems with Ruby (adding machine learning, statistics, etc)
 
The Joy of SciPy
The Joy of SciPyThe Joy of SciPy
The Joy of SciPy
 
SciPy India 2009
SciPy India 2009SciPy India 2009
SciPy India 2009
 
Tutorial de numpy
Tutorial de numpyTutorial de numpy
Tutorial de numpy
 
Sistemas de Recomendação e Inteligência Coletiva
Sistemas de Recomendação e Inteligência ColetivaSistemas de Recomendação e Inteligência Coletiva
Sistemas de Recomendação e Inteligência Coletiva
 
Introduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersIntroduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning Programmers
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Textmining
TextminingTextmining
Textmining
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
 
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
 
Opening sequence conventions
Opening sequence conventionsOpening sequence conventions
Opening sequence conventions
 
Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - I
 
Introduction to text mining
Introduction to text miningIntroduction to text mining
Introduction to text mining
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 
Emotion detection from text using data mining and text mining
Emotion detection from text using data mining and text miningEmotion detection from text using data mining and text mining
Emotion detection from text using data mining and text mining
 
Ict Presentation
Ict PresentationIct Presentation
Ict Presentation
 

Ähnlich wie Mining Scipy Lectures

The Joy of SciPy, Part I
The Joy of SciPy, Part IThe Joy of SciPy, Part I
The Joy of SciPy, Part I
Dinu Gherman
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Simplilearn
 

Ähnlich wie Mining Scipy Lectures (20)

O Reilly Learning Python 3rd Edition
 O Reilly Learning Python 3rd Edition O Reilly Learning Python 3rd Edition
O Reilly Learning Python 3rd Edition
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
 
What is Python? An overview of Python for science.
What is Python? An overview of Python for science.What is Python? An overview of Python for science.
What is Python? An overview of Python for science.
 
Caffe - A deep learning framework (Ramin Fahimi)
Caffe - A deep learning framework (Ramin Fahimi)Caffe - A deep learning framework (Ramin Fahimi)
Caffe - A deep learning framework (Ramin Fahimi)
 
Deep Learning for Machine Translation, by Satoshi Enoue, SYSTRAN
Deep Learning for Machine Translation, by Satoshi Enoue, SYSTRANDeep Learning for Machine Translation, by Satoshi Enoue, SYSTRAN
Deep Learning for Machine Translation, by Satoshi Enoue, SYSTRAN
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
 
Deep Learning for Machine Translation, by Jean Senellart, SYSTRAN
Deep Learning for Machine Translation, by Jean Senellart, SYSTRANDeep Learning for Machine Translation, by Jean Senellart, SYSTRAN
Deep Learning for Machine Translation, by Jean Senellart, SYSTRAN
 
Real-World Functional Programming @ Incubaid
Real-World Functional Programming @ IncubaidReal-World Functional Programming @ Incubaid
Real-World Functional Programming @ Incubaid
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
 
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari
 
The Joy of SciPy, Part I
The Joy of SciPy, Part IThe Joy of SciPy, Part I
The Joy of SciPy, Part I
 
Python as the Zen of Data Science
Python as the Zen of Data SciencePython as the Zen of Data Science
Python as the Zen of Data Science
 
Introduction to Biological Network Analysis and Visualization with Cytoscape ...
Introduction to Biological Network Analysis and Visualization with Cytoscape ...Introduction to Biological Network Analysis and Visualization with Cytoscape ...
Introduction to Biological Network Analysis and Visualization with Cytoscape ...
 
Mag pi18 Citation "PhotoReportage"
Mag pi18 Citation "PhotoReportage"Mag pi18 Citation "PhotoReportage"
Mag pi18 Citation "PhotoReportage"
 
Numba
NumbaNumba
Numba
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
Transfer Leaning Using Pytorch synopsis Minor project pptx
Transfer Leaning Using Pytorch  synopsis Minor project pptxTransfer Leaning Using Pytorch  synopsis Minor project pptx
Transfer Leaning Using Pytorch synopsis Minor project pptx
 
Datascope runs on python
Datascope runs on pythonDatascope runs on python
Datascope runs on python
 
venv and pip.pdf
venv and pip.pdfvenv and pip.pdf
venv and pip.pdf
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
 

Mehr von Marcel Caraciolo

Construindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PE
Construindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PEConstruindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PE
Construindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PE
Marcel Caraciolo
 

Mehr von Marcel Caraciolo (20)

Como interpretar seu próprio genoma com Python
Como interpretar seu próprio genoma com PythonComo interpretar seu próprio genoma com Python
Como interpretar seu próprio genoma com Python
 
Joblib: Lightweight pipelining for parallel jobs (v2)
Joblib:  Lightweight pipelining for parallel jobs (v2)Joblib:  Lightweight pipelining for parallel jobs (v2)
Joblib: Lightweight pipelining for parallel jobs (v2)
 
Construindo softwares de bioinformática para análises clínicas : Desafios e...
Construindo softwares  de bioinformática  para análises clínicas : Desafios e...Construindo softwares  de bioinformática  para análises clínicas : Desafios e...
Construindo softwares de bioinformática para análises clínicas : Desafios e...
 
Como Python ajudou a automatizar o nosso laboratório v.2
Como Python ajudou a automatizar o nosso laboratório v.2Como Python ajudou a automatizar o nosso laboratório v.2
Como Python ajudou a automatizar o nosso laboratório v.2
 
Como Python pode ajudar na automação do seu laboratório
Como Python pode ajudar na automação do  seu laboratórioComo Python pode ajudar na automação do  seu laboratório
Como Python pode ajudar na automação do seu laboratório
 
Oficina Python: Hackeando a Web com Python 3
Oficina Python: Hackeando a Web com Python 3Oficina Python: Hackeando a Web com Python 3
Oficina Python: Hackeando a Web com Python 3
 
Opensource - Como começar e dá dinheiro ?
Opensource - Como começar e dá dinheiro ?Opensource - Como começar e dá dinheiro ?
Opensource - Como começar e dá dinheiro ?
 
Big Data com Python
Big Data com PythonBig Data com Python
Big Data com Python
 
Benchy, python framework for performance benchmarking of Python Scripts
Benchy, python framework for performance benchmarking  of Python ScriptsBenchy, python framework for performance benchmarking  of Python Scripts
Benchy, python framework for performance benchmarking of Python Scripts
 
Python e 10 motivos por que devo conhece-la ?
Python e 10 motivos por que devo conhece-la ?Python e 10 motivos por que devo conhece-la ?
Python e 10 motivos por que devo conhece-la ?
 
Construindo Sistemas de Recomendação com Python
Construindo Sistemas de Recomendação com PythonConstruindo Sistemas de Recomendação com Python
Construindo Sistemas de Recomendação com Python
 
Python, A pílula Azul da programação
Python, A pílula Azul da programaçãoPython, A pílula Azul da programação
Python, A pílula Azul da programação
 
Construindo Soluções Científicas com Big Data & MapReduce
Construindo Soluções Científicas com Big Data & MapReduceConstruindo Soluções Científicas com Big Data & MapReduce
Construindo Soluções Científicas com Big Data & MapReduce
 
Como Python está mudando a forma de aprendizagem à distância no Brasil
Como Python está mudando a forma de aprendizagem à distância no BrasilComo Python está mudando a forma de aprendizagem à distância no Brasil
Como Python está mudando a forma de aprendizagem à distância no Brasil
 
Novas Tendências para a Educação a Distância: Como reinventar a educação ?
Novas Tendências para a Educação a Distância: Como reinventar a educação ?Novas Tendências para a Educação a Distância: Como reinventar a educação ?
Novas Tendências para a Educação a Distância: Como reinventar a educação ?
 
Aula WebCrawlers com Regex - PyCursos
Aula WebCrawlers com Regex - PyCursosAula WebCrawlers com Regex - PyCursos
Aula WebCrawlers com Regex - PyCursos
 
Arquivos Zip com Python - Aula PyCursos
Arquivos Zip com Python - Aula PyCursosArquivos Zip com Python - Aula PyCursos
Arquivos Zip com Python - Aula PyCursos
 
PyFoursquare: Python Library for Foursquare
PyFoursquare: Python Library for FoursquarePyFoursquare: Python Library for Foursquare
PyFoursquare: Python Library for Foursquare
 
Sistemas de Recomendação: Como funciona e Onde Se aplica?
Sistemas de Recomendação: Como funciona e Onde Se aplica?Sistemas de Recomendação: Como funciona e Onde Se aplica?
Sistemas de Recomendação: Como funciona e Onde Se aplica?
 
Construindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PE
Construindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PEConstruindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PE
Construindo Comunidades Open-Source Bem Sucedidas: Experiências do PUG-PE
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Mining Scipy Lectures

  • 1. Mining Lectures Marcel Caraciolo - @marcelcaraciolo 1
  • 2. Who’s me ? Marcel Pinheiro Caraciolo Brazilian, lover of crabs Director of P&D - brazilian startup Orygens M.S.C Candidate at Data Mining and Recommender Systems Current moderator of the Local Python User Group at Pernambuco Interested at machine learning, recommender systems and mobile computing Blogging about machine learning with Python since 2008 http://aimotion.blogspot.com Young apprentice with Python programming since 2008. 2
  • 3. How I started this analysis? 24 hours ago... 3
  • 4. Question How were the topics distributed around the Scipy Conference General Sessions ? 4
  • 5. Scrapping of Scipy Conference Small Web-Crawler for extracting the approved lectures urllib2, re, BeautifulSoap... 5
  • 6. Resume 41 Lectures 820 minutes length 6
  • 7. It means... =~ 4100 tweets posted. 7
  • 8. Or watch... Star Wars Trilogy 2x 8
  • 9. Or finish Super Mario Game... 82 x! 9
  • 10. Na nossa língua agora... Or open the Eclipse Abrir o Eclipse 2 vezes! 2 x! 11 10
  • 11. Most popular Authors Dharhas Pothina - 3 Wes McKinney - 2 All the others - 1 11
  • 12. Playing with the text... The most frequent words at the conference nltk, re 12
  • 13. But let’s take a deeper look. I used the clustering algorithm K-Means Tool used for visualization Ubigraph 13
  • 14. Distribution of the Lectures Basic Frameworks matplotlib, ipython Building frameworks performance, models, web services Parallelism performance, gpu, statistical Visualization Numpy data analysis, statistical toolkits using Numpy 14
  • 15. To sum up... Mining english text is so much easier!!! Submit your work also! Spread the scientific python over the community I expect to be back to Scipy next year! 15
  • 16. https://github.com/marcelcaraciolo/clustering_scipy Mining Lectures Marcel Caraciolo - @marcelcaraciolo 16