Meetup#4, Smart.Data@OK.ru
SPb_Data_Science
Talk by Dmitry Bugaychenko, OK
1.
Smart.Data@OK.ru
How to make the world a bit better using a petabyte of data and a couple of good books
Dmitry Bugaychenko
2.
What is it about?
• About us and the size
• Smart data vs. big data
• Smart data tasks
• Smart data deck
• Open data
3.
OK.ru is about
• Family, friends, classmates
• A sea of positivity and humor
• An enormous collection of multimedia content
• The largest platform for online games
4.
OK.ru is
The largest entertainment social network between the Atlantic Ocean and the Amur River
5.
OK.ru in numbers
• 200 000 000 registered users
• 10 000 000 communities
• 10 000 000 000 connections
• Daily:
– 40 000 000 unique users
– 250 000 000 messages
– 25 000 000 posts
– 44 000 000 photos
– 7 000 000 friendships
– 9 000 000 000 entries added to news feeds
– …
6.
OK.ru in technical numbers
• 7000+ servers
• 10 TB of posts
• 20 TB of likes
• 80 TB of messages in discussions
• 400 GB of social connections
• …
• And 6 TB of new data in our Hadoop each day
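To put the daily volumes quoted above on a per-second scale, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and load spread evenly over the day, which real traffic is not; the variable names are illustrative only.

```python
# Back-of-the-envelope rates from the daily figures on the slides.
# Assumptions: decimal units (1 TB = 10**12 bytes), uniform load over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60

daily = {
    "messages": 250_000_000,
    "posts": 25_000_000,
    "photos": 44_000_000,
    "news feed entries": 9_000_000_000,
}

# Average events per second for each activity type.
per_second = {name: count / SECONDS_PER_DAY for name, count in daily.items()}

# Average ingest rate for 6 TB of new Hadoop data per day, in MB/s.
hadoop_bytes_per_day = 6 * 10**12
hadoop_mb_per_second = hadoop_bytes_per_day / SECONDS_PER_DAY / 10**6

for name, rate in per_second.items():
    print(f"{name}: ~{rate:,.0f} per second on average")
print(f"Hadoop ingest: ~{hadoop_mb_per_second:.1f} MB/s on average")
```

Under these assumptions the news feed alone averages on the order of a hundred thousand new entries per second, and the Hadoop ingest averages roughly 70 MB/s.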
7.
What is it about?
• About us and the size
• Smart data vs. big data
• Smart data tasks
• Smart data deck
• Open data
8.
Big Data vs. Smart Data
• 100 gigabytes, how much is it?
– 50000 electronic books
– 10000 mp3 music tracks
– 5000 photos from a modern smartphone
– 10 HD movies
– 1 season of your favorite TV series in good quality
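The comparison above implies a rough size for each kind of item. A small sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB); the dictionary keys are just labels chosen for this example.

```python
# Per-item sizes implied by the "100 gigabytes" comparison on the slide.
# Assumption: decimal units, 1 GB = 1000 MB.
TOTAL_MB = 100 * 1000

items = {
    "electronic book": 50_000,
    "mp3 track": 10_000,
    "smartphone photo": 5_000,
    "HD movie": 10,
}

# Divide the same 100 GB budget by each item count.
implied_size_mb = {name: TOTAL_MB / count for name, count in items.items()}

for name, size in implied_size_mb.items():
    print(f"{name}: ~{size:g} MB each")
```

That works out to about 2 MB per book, 10 MB per track, 20 MB per photo and 10 GB per movie.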
9.
Big Data vs. Smart Data
• 100 gigabytes is enough to
– Auto-generate a music catalog
– Triple activity in the music layer
– Double activity in the video “related” area
• 400 gigabytes:
– +50% clicks on the communities page
– An extra 1M community visits from the “recommended communities” portlet
• 10 TB: +20% clicks on “Like” in the news feed
10.
Smart data
The size of data is important, but your ability to employ the data to improve your product is way more important!
11.
Be smart!
1. Set the Goal
2. Find the Model
3. Select the Toolset
4. Collect the Data needed
5. Mine the Data, train the Model, apply results
6. Profit!
12.
Be smart!
1. Set the Goal
2. Find the Model
3. Select the Toolset
4. Collect the Data needed
5. Mine the Data, train the Model, apply results
6. Profit!
7. Repeat from step 1
13.
What is it about?
• About us and the size
• Smart data vs. big data
• Smart data tasks
• Smart data deck
• Open data
14.
Smart data for music
• The data
– Metadata of UGC uploads and copyrighted content
– Users' playbacks and playlists
• Stage 0: construct a music catalog using a bunch of statistical algorithms
– Improved search and navigation
– Images in the music layer
– +20-30% playbacks
15.
Smarter data for music
• Stage 1: mining collaborative correlations
– Similar artists and “Artist radio”
• Stage 2: mining temporal correlations and combining with metadata and collaborative part
– Personalization of the main page
– The most affecting feature (+100% to playbacks)
• Stage 3: segmenting users’ tastes
– “My Radio” feature
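The slides do not say how the collaborative correlations are computed. One common way to get "similar artists" from playback logs is item-to-item cosine similarity over sets of listeners; the sketch below assumes that approach, and the function name, data layout and artist labels are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def similar_artists(playbacks, top_n=3):
    """Rank artists by cosine similarity of their listener sets.

    playbacks: iterable of (user_id, artist) pairs (hypothetical log format).
    Returns {artist: [(other_artist, similarity), ...]} sorted by similarity.
    """
    artists_by_user = defaultdict(set)
    for user, artist in playbacks:
        artists_by_user[user].add(artist)

    listeners = defaultdict(int)     # artist -> number of distinct listeners
    co_listeners = defaultdict(int)  # sorted (a, b) pair -> shared listeners
    for artists in artists_by_user.values():
        for a in artists:
            listeners[a] += 1
        for a, b in combinations(sorted(artists), 2):
            co_listeners[(a, b)] += 1

    scores = defaultdict(list)
    for (a, b), shared in co_listeners.items():
        # Cosine similarity between binary "listened / did not listen" vectors.
        sim = shared / sqrt(listeners[a] * listeners[b])
        scores[a].append((b, sim))
        scores[b].append((a, sim))

    return {
        a: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_n]
        for a, pairs in scores.items()
    }


# Toy playback log with made-up users and artists.
playbacks = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "C"),
]
result = similar_artists(playbacks)
print(result["A"])
```

In this toy log, artist B shares more listeners with A than C does, so B ranks first among A's neighbours. At the scale described in the talk, the pair counting would have to run as a distributed job rather than in memory.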
16.
Smartest data for music
• Stage 4: music content analysis
– “ContentId” system
– Deduplication for search index
– Artists genre tagging
– Outliers removal
– Large investments, but very limited user effect
17.
Smart data for communities
• The data:
– Log of visits and actions in communities
– Communities metadata
– Posts content
• Stage 1: mine collaborative correlations, apply toolbox used for music
– “Recommended communities” portlet
– Highest CTR among all “add” content on the main page
18.
Smarter data for communities
• Stage 2: extend the recommender with metadata (tags) and regional/demographic data
– Improved CTR for recommended communities
– Personalization of the communities page
– +50% clicks on the communities page
• Stage 3: mine communities' content for their semantics
– Implemented a fancy distributed Robust LDA model for community posts
– Not in production yet: waiting for one of you, guys, to complete it ;)
19.
Smart data for video
• The data
– Likes for videos
– “View events”
• Stage 1: mine collaborative correlations
– Doubled clicks in the “related video” area
– Likes perform better than “view events”
• Stage 2: advanced collaborative models for top personalization
– Different variations of SVD
– Limited effect. To be continued (maybe with one of you ;) )
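The talk does not specify which SVD variations were tried. As one illustrative possibility, here is a minimal "Funk SVD"-style matrix factorization trained with stochastic gradient descent in plain Python. The hyperparameters, the toy like data and the use of explicit 0/1 targets are assumptions made for this sketch only.

```python
import random


def factorize(ratings, n_factors=2, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Learn user and item factor vectors from (user, item, value) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for k in range(n_factors):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Predicted affinity: dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))


def mse(P, Q, ratings):
    """Mean squared error of the model on the given triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


# Toy data: two groups of users with opposite tastes (1.0 = like, 0.0 = no like).
likes = [
    (u, v, 1.0 if (u in ("u1", "u2")) == (v in ("v1", "v2")) else 0.0)
    for u in ("u1", "u2", "u3", "u4")
    for v in ("v1", "v2", "v3", "v4")
]

P, Q = factorize(likes)
print("training MSE:", mse(P, Q, likes))
print("u1 -> v1:", predict(P, Q, "u1", "v1"))
print("u1 -> v3:", predict(P, Q, "u1", "v3"))
```

This only checks the fit on training data. A real evaluation would hold out part of the likes and would have to deal with the fact that most user-video pairs are unobserved rather than explicit dislikes.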
20.
Smart data for news feed
• The data:
– News feed impressions (our largest data set)
– Likes, comments and clicks
• Stage 1: improve ranking using CTR
– We built infrastructure capable of calculating CTR for all our content in real time (at up to 4 000 000 events per second)
– +10% to CTR when considering CTR!
• Stage 2: improved models for news feed ranking
– An SVD-based collaborative approach is running in experimental settings and looks promising
– More models are waiting: content-based (LDA or whatever), social-based, ensembles
– Join the movement!
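To illustrate the idea of ranking by CTR, here is a minimal in-memory sketch of per-item click-through counters with additive smoothing, so that items with very few impressions are not ranked on noise. It is not the streaming infrastructure described in the talk; the class name, the prior values and the post identifiers are assumptions for the example.

```python
from collections import defaultdict


class CtrTracker:
    """Running impression/click counters with a smoothed CTR estimate."""

    def __init__(self, prior_ctr=0.02, prior_weight=100):
        # The prior behaves like `prior_weight` virtual impressions
        # with an assumed click-through rate of `prior_ctr`.
        self.prior_clicks = prior_ctr * prior_weight
        self.prior_weight = prior_weight
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record(self, item_id, clicked):
        """Register one impression and, optionally, a click on it."""
        self.impressions[item_id] += 1
        if clicked:
            self.clicks[item_id] += 1

    def ctr(self, item_id):
        """Smoothed CTR; unseen items fall back to the prior."""
        clicks = self.clicks.get(item_id, 0) + self.prior_clicks
        shown = self.impressions.get(item_id, 0) + self.prior_weight
        return clicks / shown

    def rank(self, item_ids):
        """Order candidate items by smoothed CTR, highest first."""
        return sorted(item_ids, key=self.ctr, reverse=True)


tracker = CtrTracker()
for n in range(1000):            # post_a: 1000 impressions, 50 clicks
    tracker.record("post_a", clicked=(n < 50))
tracker.record("post_b", True)   # post_b: 2 impressions, 1 click
tracker.record("post_b", False)

print(tracker.rank(["post_c", "post_b", "post_a"]))
```

With these numbers, the well-observed post_a outranks post_b even though post_b has a raw CTR of 50% from just two impressions, and the never-shown post_c falls back to the prior.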
21.
Smart data for antispam
• Stolen accounts detection
• Pornography detection
• Textual spam classification
• Automatic registration detection
• …
22.
Smart data for BI
• We would like to understand why we see certain effects and how we can influence them
• Using data analysis and visualization to find the answers and insights
23.
More areas willing to become smart!
• Help users to find friends (people you may know and more)
• Presents
• Games
• Photos
• Operational data
• …
24.
What is it about?
• About us and the size
• Smart data vs. big data
• Smart data tasks
• Smart data deck
• Open data
25.
Our technologies
• Hadoop 2.x
• Apache Pig
• Apache Spark
• Apache Kafka
• Apache Samza
• Apache Cassandra
• Python, R, Tableau
26.
Data processing pipeline
27.
What is it about?
• About us and the size
• Smart data vs. big data
• Smart data tasks
• Smart data deck
• Open data
28.
Likes data set
• The first dataset we opened to the public
• Contains posts made in communities
• Contains users’ likes for the posts
• Get it at http://likesdataset.sh2014.org/index/
• Try it yourself in the mini-contest for predicting likes
29.
SNA Hackathon dataset
• Large dataset (100+ GB)
• Contains
– Fragment of the social graph
– Users’ posts, likes and logins
– Communities' posts
– Demographic data
– Complaints
• To get it: mailto:elena.mikhaylova@corp.mail.ru
30.
New users dataset
• The freshest one (data up to April 8, 2015)
• Contains
– New user registrations (40K)
– All activities related to those users (both their own and others')
– Social graph for all involved users (4M)
– Demographic data for the users
• Potential tasks: friend recommendations, bot detection, profile data correction…
• To get it: mailto:dmitry.bugaychenko@corp.mail.ru
31.
What is it about?
• About us and the size
• Smart data vs. big data
• Smart data tasks
• Smart data deck
• Open data
• A bit of humor (author's personal opinion)
32.
Data mining at a lab
33.
Data mining in industry
34.
Data mining at OK.ru
35.
Thank you for your attention!
?