SlideShare ist ein Scribd-Unternehmen logo
1 von 4
Downloaden Sie, um offline zu lesen
IQT QUARTERLY




ThE NEW chALLENgE OF DATA INFLATION
By Jon Gosier



The online chatter of individuals, networks of friends or professionals, and the data being
created by networked devices are growing exponentially, although our time to consume
it all remains depressingly finite. There is a plethora of solutions that approach the
challenge of data inflation in different ways. But what methodologies have worked at
scale and how are they being used in the field?


The Infinite Age                                            entirely? Is the content the original blog post, or is it
                                                            that post coupled with the likes, retweets, comments,
First, let’s look at the landscape. The amount of           and so on?
content being created daily is incredible, and
includes the growth of image data, videos, networked        Content is now born and almost instantaneously
devices, and the contextual metadata created around         multiplies, not just through copying, but also through
these components.                                           the interactions individuals have with it — making it
                                                            their own and subsequently augmenting it.
The Information Age is behind us; we’re now living in
an Infinite Age, where the flow of data is ever-growing     In the Infinite Age, one person may have created the
and ever-changing. The life of this content may be          content, but because others consumed it, it’s as much
ephemeral (social media) or mostly constant (static         a representation of the reader as it is the author.
documents and files). In either case, the challenge is
dealing with it at human scale.                             The Inflation Epoch
In the time it takes to finish reading a given blog post,   In physics, the moment just after the Big Bang, but
chances are high that it’s been retweeted, cited,           prior to the early formation of the universe as we
aggregated, pushed, commented on, or ‘liked’ all over       understand it, is called the Planck Epoch or the GUT
the web. An all-encompassing verb to describe this is       Era (for the Grand Unified Theory that explains how all
sharing. All of this sharing confuses our ability           the four forces of nature were once unified). The
to define the essence of that original content because      Planck Epoch is defined as the moment of accelerated
value has been added at different points by people          growth when the universe expanded from a singularity
who’ve consumed it.                                         smaller than a proton.

Does that additional activity become a part of the          We mirror this moment a trillion times each day when
original content, or does it exist as something else        online content is created and augmented by the




 06      Vol. 3 No. 3    Identify. Adapt. Deliver. ™
IQT QUARTERLY




surrounding activity from others online. From the             have massive data centers that enable these indexes
moment we publish, the dogpile of interaction that            and queries to take seconds.
follows is a bit like the entropy that followed the Big
                                                              As big data becomes a growing problem inside of
Bang: exponential and rapid before gradually trailing off
                                                              organizations as much as outside, index and search
to a semi-static state where people may still interact,
                                                              technology has become one way to deal with data. A
but usually with less frequency, before leaving to
                                                              new swath of technologies allow organizations to
consume new content being published elsewhere.
                                                              accomplish this without the same level of
When people talk about big data, this is one of the           infrastructure because, understandably, not every
problems they are discussing. The amount of content           company can afford to spend like Google does on its
that is created daily can be measured in the billions         data challenges. Companies like Vertica, Cloudera,
(articles, blog posts, tweets, text messages, photos,         10Gen, and others provide database technology that
etc.), but that’s only where the problem starts. There’s      can be deployed internally or across cloud servers
also the activity that surrounds each of those items          (think Amazon Web Services), which makes dealing
— comment threads, photo memes, likes on Facebook,            with inflated content easier by structuring at the
and retweets on Twitter. Outside of this social               database level so that retrieving information takes
communication, there are other problems that must             fewer computing resources.
be solved, but for the sake of this article we’ll limit our
                                                              This approach allows organizations to capture
focus to big data as it relates to making social media
                                                              enormous quantities of data in a database so that it
communication actionable.
                                                              can be retrieved and made actionable later.
How are people dealing with social data at this scale to
make it actionable without becoming overwhelmed?
                                                              Methodology 2 – contextualization and
Methodology 1 – Index and Search                              Feature Extraction
                                                              Through the development of search technologies,
Search technology as a method for sorting through
incredibly large data sets is the one that most people        the phrase “feature extraction” became common
understand because they use it regularly in the form of       terminology in information retrieval circles. Feature
Google. Keeping track of all the URLs and links that          extraction uses algorithms to pull out the individual
make up the web is a seemingly Sisyphean task, but            nuances of content. This is done at what I call an
Google goes a step beyond that by crawling and                atomic level, meaning any characteristic of data that
indexing large portions of public content online. This        can be quantified. For instance, in an email, the TO:
helps Google perform searches much faster than                and FROM: addresses would be features of that email.
having to crawl the entire web every time a search is         The timestamp indicating when that email was sent is
executed. Instead, connections are formed at the              also a feature. The subject would be another. Within
database level, allowing queries to occur faster while        each of those there are more features as well, but this
using less server resources. Companies like Google            is the general high-level concept.




                                                                                                   Waveform 1 = Value
                                                                                                   for all Categories
                                                                                                   over time

                                                                                                   Waveform 2 = Value
                                                                                                   for Sub-Category
                                                                                                   over time

                                                                                                   Waveform 3 = Value
                                                                                                   for disparate Category
                                                                                                   over time


 Stacked Waveform graph used to plot data with values in a positive and negative domain (like sentiment analysis).
 Produced by the author using metaLayer.




                                                              IQT QUARTERLY WINTER 2012         Vol. 3 No. 3         07
IQT QUARTERLY




For the most part, search uses such features to help
improve the efficiency of the index. In contextualization
technologies, the practice is to use these features to
modify the initial content, adding it as metadata and
thereby creating a type of inflated content (as we’ve
augmented the original).

When users drag photos onto a map on Flickr, they are
contextualizing them by tagging the files with location
data. This location data makes the inflated content
actionable; we can view it on a map, or we can segment
photos by where they were taken. This new location
data is an augmentation of the original metadata,
and creates something that previously did not exist.
When we’re dealing in the hundreds of thousands,
millions, and billions of content items, feature extraction
is used to carry out the previous example at scale.

The following is a real-world use case, though I’ve
                                                               Infographic showing network analysis of the SwiftRiver
been careful not to divulge any of the client’s                platform. Produced by the author using Circos.
proprietary details.

Recently at my company metaLayer, a colleague
came to us with one terabyte of messages pulled               Methodology 3 – Visualization
from Twitter. These messages had been posted                  Visualization is another way to deal with excessive data
during Hurricane Irene. The client needed to generate         problems. Visualization may sound like an abstract
conclusions about these messages that were not                term, but the visual domain is actually one of the most
going to be possible in their original loosely structured     basic things humans can use for relating complex
form. He asked us to help him structure this data,            concepts to one another. We’ve used symbols for
find the items he was looking for (messages about             thousands of years to do this. Data visualizations and
Hurricane Irene or people affected by it), and extract        infographics simply use symbols to cross barriers of
the features that would be useful for him.                    understanding about the complexities of research so
The client lacked the sufficient context to identify what     that, regardless of expertise, everyone involved has a
was most relevant in the data set. So we used our             better understanding of a problem.
platform to algorithmically extract features like             Content producers like the New York Times have found
sentiment, location, and taxonomic values from each           that visualizations are a great way to increase
Twitter message using natural language processing.            audience engagement with news content. The
Because it was an algorithmic process, this only took         explosion of interest in infographics and data
a few hours, allowing the client to get a baseline that       visualizations online echoes this.
made the rest of his research possible. Now the data
could be visualized or segmented in ways that weren’t         To visualize excessive data sets, leveraging some of the
possible with the initial content. The inflated content       previous methods makes discovering hidden patterns
— metadata generated algorithmically — included               and trends a visual, and likely more intuitive, process.
elements that made his research possible. We now
had an individual profile of every message that gave          Methodology 4 – crowd Filtering
us a clue about its tone, location of origin, and how to      and curation
categorize the message. This allowed the client's team
                                                              The rise of crowdsourcing methodologies presents a
to look at the data with a new level of confidence.
                                                              new framework for dealing with certain types of
In the context of our social data challenge, these            information overload. Websites like Digg and Reddit
extracted features might be used on their own by an           are examples of using crowdsourcing to vet and
application, or they might become part of an index or         prioritize data by the cumulative interests of a given
database like the one mentioned previously.                   community. On these websites, anyone can contribute
                                                              information, but only the material deemed to be




 08       Vol. 3 No. 3    Identify. Adapt. Deliver. ™
IQT QUARTERLY




                                                              Big data can refer to managing
                                                              an excess of data, to an overwhelming
                                                              feed of data, or to the rapid proliferation
                                                              of inflated content due to the meta-
                                                              values added through sharing.




interesting by the highest number of people will rise          Curation has risen as a term over the past decade to
to the top. By leveraging the community of users and           refer to the collection, intermixing, and re-syndication
consumers itself as a resource, the admin passes the           of inflated content. It is used to refer to the
responsibility of finding the most relevant content to         construction of narratives without actually producing
the users.                                                     any original content, instead taking relevant bits of
                                                               content created by others and using them as the
This, of course, won’t work in quite the same way for
                                                               building blocks for something new. This presentation
an organization’s internal project or your data mining
                                                               of public data as edited by others represents a new
project, but it does work if you want to limit the
                                                               work, though this “new work” may be made up of
information collected to those in the crowd who have
                                                               nothing original at all. By carefully selecting curators
some measure of authority or authenticity.
                                                               whom you know will be selective in what they curate,
The news curation service Storyful.com is a great              the aggregate of information produced should be of a
example of using crowd-sourced information to report           higher quality.
on breaking events around the globe. Its system works
                                                               conclusion
not because the masses are telling the stories (that
would lead to unmanageable chaos), but because the             Big data, like most tech catchphrases, means different
staff behind Storyful has pre-vetted contributors. This        things to different people. It can refer to managing an
is known as bounded crowdsourcing, which simply                excess of data, to an overwhelming feed of data, or to
means to extend some measure of authority or                   the rapid proliferation of inflated content due to the
preference to a subset of a larger group. In this case,        meta-values added through sharing. Pulling
the larger group is anyone using social media around           actionable information from streams of social
the world, whereas the bounded group is only those             communication represents a unique challenge in that
around the world that Storyful’s staff has deemed to           it embodies all aspects of the phrase and the
be consistent in their reliability, authority, and             accompanying challenges. Ultimately, if content is
authenticity. This is commonly referred to as curating         growing exponentially, the methods to manage it have
the curators.                                                  to be capable of equal speed and scale.



Jon Gosier is the co-founder of metaLayer Inc., which offers products for visualization, analytics, and the structure of
indiscriminate data sets. metaLayer gives companies access to complex visualization and algorithmic tools, making
intelligence solutions intuitive and more affordable. Jon is a former staff member at Ushahidi, an organization that
provides open source data products for global disaster response and has worked on signal to noise problems with
international journalists and defense organizations as Director of its SwiftRiver team.




                                                               IQT QUARTERLY WINTER 2012           Vol. 3 No. 3       09

Weitere ähnliche Inhalte

Was ist angesagt?

Abundance Economics
Abundance EconomicsAbundance Economics
Abundance EconomicsMelanie Swan
 
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...Amit Sheth
 
Blockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACsBlockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACsMelanie Swan
 
PowerMeeting6nov2010
PowerMeeting6nov2010PowerMeeting6nov2010
PowerMeeting6nov2010Weigang Wang
 

Was ist angesagt? (7)

Abundance Economics
Abundance EconomicsAbundance Economics
Abundance Economics
 
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A compreh...
 
Blockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACsBlockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACs
 
Data dynamite presentation
Data dynamite presentationData dynamite presentation
Data dynamite presentation
 
Avatar
AvatarAvatar
Avatar
 
8108-37744-1-PB.pdf
8108-37744-1-PB.pdf8108-37744-1-PB.pdf
8108-37744-1-PB.pdf
 
PowerMeeting6nov2010
PowerMeeting6nov2010PowerMeeting6nov2010
PowerMeeting6nov2010
 

Ähnlich wie The New Challenge of Data Inflation

Towards the Design of Intelligible Object-based Applications for the Web of T...
Towards the Design of Intelligible Object-based Applications for the Web of T...Towards the Design of Intelligible Object-based Applications for the Web of T...
Towards the Design of Intelligible Object-based Applications for the Web of T...Pierrick Thébault
 
Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0New Media Days
 
Brainframes, digital technologies and connected intelligence -Derrick de Kerc...
Brainframes, digital technologies and connected intelligence -Derrick de Kerc...Brainframes, digital technologies and connected intelligence -Derrick de Kerc...
Brainframes, digital technologies and connected intelligence -Derrick de Kerc...thiteu
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detectionIJICTJOURNAL
 
Tags, Networks, Narrative
Tags, Networks, NarrativeTags, Networks, Narrative
Tags, Networks, NarrativeBruce Mason
 
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital businessStrategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital businessMarco Brambilla
 
IRJET - Twitter Spam Detection using Cobweb
IRJET - Twitter Spam Detection using CobwebIRJET - Twitter Spam Detection using Cobweb
IRJET - Twitter Spam Detection using CobwebIRJET Journal
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic WaveKaniska Mandal
 
Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Talis Consulting
 
Estudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social MediaEstudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social Mediaeliasvillagran
 
Selas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedSelas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedZiya NISANOGLU
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ NettabDuncan Hull
 
The Structured Data Hub in 2019
The Structured Data Hub in 2019The Structured Data Hub in 2019
The Structured Data Hub in 2019Richard Zijdeman
 
14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdf14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdfMehwishKanwal14
 
Delivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the CloudDelivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the CloudBooz Allen Hamilton
 
Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008Karen S Calhoun
 

Ähnlich wie The New Challenge of Data Inflation (20)

Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 
Towards the Design of Intelligible Object-based Applications for the Web of T...
Towards the Design of Intelligible Object-based Applications for the Web of T...Towards the Design of Intelligible Object-based Applications for the Web of T...
Towards the Design of Intelligible Object-based Applications for the Web of T...
 
Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0
 
Ljc
LjcLjc
Ljc
 
Brainframes, digital technologies and connected intelligence -Derrick de Kerc...
Brainframes, digital technologies and connected intelligence -Derrick de Kerc...Brainframes, digital technologies and connected intelligence -Derrick de Kerc...
Brainframes, digital technologies and connected intelligence -Derrick de Kerc...
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detection
 
Tags, Networks, Narrative
Tags, Networks, NarrativeTags, Networks, Narrative
Tags, Networks, Narrative
 
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital businessStrategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
 
ICICCE0280
ICICCE0280ICICCE0280
ICICCE0280
 
IRJET - Twitter Spam Detection using Cobweb
IRJET - Twitter Spam Detection using CobwebIRJET - Twitter Spam Detection using Cobweb
IRJET - Twitter Spam Detection using Cobweb
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic Wave
 
Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Linked Data Workshop Stanford University
Linked Data Workshop Stanford University
 
Estudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social MediaEstudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social Media
 
Selas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedSelas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media Excerpted
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ Nettab
 
The Structured Data Hub in 2019
The Structured Data Hub in 2019The Structured Data Hub in 2019
The Structured Data Hub in 2019
 
14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdf14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdf
 
Delivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the CloudDelivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the Cloud
 
Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008
 

Mehr von Jon Gosier

Film + TV Private Finance Insights 2022 [Report]
Film + TV Private Finance Insights 2022 [Report]Film + TV Private Finance Insights 2022 [Report]
Film + TV Private Finance Insights 2022 [Report]Jon Gosier
 
SCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M Movies
SCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M MoviesSCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M Movies
SCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M MoviesJon Gosier
 
Fan Meta Data: Goldmine or Landmine?
Fan Meta Data: Goldmine or Landmine?Fan Meta Data: Goldmine or Landmine?
Fan Meta Data: Goldmine or Landmine?Jon Gosier
 
Who Influences Who
Who Influences WhoWho Influences Who
Who Influences WhoJon Gosier
 
Why Startups Fail
Why Startups FailWhy Startups Fail
Why Startups FailJon Gosier
 
The Science of Failing Forward: Congress of Future Science and Technology Lea...
The Science of Failing Forward: Congress of Future Science and Technology Lea...The Science of Failing Forward: Congress of Future Science and Technology Lea...
The Science of Failing Forward: Congress of Future Science and Technology Lea...Jon Gosier
 
ODX: Designing for Better Outcomes
ODX: Designing for Better OutcomesODX: Designing for Better Outcomes
ODX: Designing for Better OutcomesJon Gosier
 
IOT4I Internet of Things for Impact | Netherlands
IOT4I Internet of Things for Impact | NetherlandsIOT4I Internet of Things for Impact | Netherlands
IOT4I Internet of Things for Impact | NetherlandsJon Gosier
 
Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015
Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015
Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015Jon Gosier
 
Africa's Third Act: Why the Continent Matters Now More than Ever
Africa's Third Act: Why the Continent Matters Now More than EverAfrica's Third Act: Why the Continent Matters Now More than Ever
Africa's Third Act: Why the Continent Matters Now More than EverJon Gosier
 
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Pa...
Predicting Macroeconomic Trends  Through Real-Time Mobile Data Collection [Pa...Predicting Macroeconomic Trends  Through Real-Time Mobile Data Collection [Pa...
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Pa...Jon Gosier
 
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...Jon Gosier
 
21st Century Strategies for Financial Inclusion
21st Century Strategies for Financial Inclusion21st Century Strategies for Financial Inclusion
21st Century Strategies for Financial InclusionJon Gosier
 
About Ebola Deeply - Slides from Foreign Press Center
About Ebola Deeply - Slides from Foreign Press CenterAbout Ebola Deeply - Slides from Foreign Press Center
About Ebola Deeply - Slides from Foreign Press CenterJon Gosier
 
Philly Tech Week 2014 - The Future of Digital Marketing - Data-Mining
Philly Tech Week 2014 - The Future of Digital Marketing - Data-MiningPhilly Tech Week 2014 - The Future of Digital Marketing - Data-Mining
Philly Tech Week 2014 - The Future of Digital Marketing - Data-MiningJon Gosier
 
Using Predictive Analytics for Anticipatory Investigation and Intervention
Using Predictive Analytics for Anticipatory Investigation and InterventionUsing Predictive Analytics for Anticipatory Investigation and Intervention
Using Predictive Analytics for Anticipatory Investigation and InterventionJon Gosier
 
Data Mining Online Audiences with D8A Group
Data Mining Online Audiences with D8A GroupData Mining Online Audiences with D8A Group
Data Mining Online Audiences with D8A GroupJon Gosier
 
Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...
Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...
Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...Jon Gosier
 
The Humanitarian Face of Big Data | ICCM 2013
The Humanitarian Face of Big Data | ICCM 2013The Humanitarian Face of Big Data | ICCM 2013
The Humanitarian Face of Big Data | ICCM 2013Jon Gosier
 
Structuring Your Social Enterprise #FTW
Structuring Your Social Enterprise #FTWStructuring Your Social Enterprise #FTW
Structuring Your Social Enterprise #FTWJon Gosier
 

Mehr von Jon Gosier (20)

Film + TV Private Finance Insights 2022 [Report]
Film + TV Private Finance Insights 2022 [Report]Film + TV Private Finance Insights 2022 [Report]
Film + TV Private Finance Insights 2022 [Report]
 
SCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M Movies
SCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M MoviesSCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M Movies
SCALETECH 2022: What Scaling $300M Startups Taught me about Scaling $300M Movies
 
Fan Meta Data: Goldmine or Landmine?
Fan Meta Data: Goldmine or Landmine?Fan Meta Data: Goldmine or Landmine?
Fan Meta Data: Goldmine or Landmine?
 
Who Influences Who
Who Influences WhoWho Influences Who
Who Influences Who
 
Why Startups Fail
Why Startups FailWhy Startups Fail
Why Startups Fail
 
The Science of Failing Forward: Congress of Future Science and Technology Lea...
The Science of Failing Forward: Congress of Future Science and Technology Lea...The Science of Failing Forward: Congress of Future Science and Technology Lea...
The Science of Failing Forward: Congress of Future Science and Technology Lea...
 
ODX: Designing for Better Outcomes
ODX: Designing for Better OutcomesODX: Designing for Better Outcomes
ODX: Designing for Better Outcomes
 
IOT4I Internet of Things for Impact | Netherlands
IOT4I Internet of Things for Impact | NetherlandsIOT4I Internet of Things for Impact | Netherlands
IOT4I Internet of Things for Impact | Netherlands
 
Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015
Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015
Africa's Third Act: Why the Continent Matters Now More than Ever #BTW2015
 
Africa's Third Act: Why the Continent Matters Now More than Ever
Africa's Third Act: Why the Continent Matters Now More than EverAfrica's Third Act: Why the Continent Matters Now More than Ever
Africa's Third Act: Why the Continent Matters Now More than Ever
 
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Pa...
Predicting Macroeconomic Trends  Through Real-Time Mobile Data Collection [Pa...Predicting Macroeconomic Trends  Through Real-Time Mobile Data Collection [Pa...
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Pa...
 
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...
Predicting Macroeconomic Trends Through Real-Time Mobile Data Collection [Sli...
 
21st Century Strategies for Financial Inclusion
21st Century Strategies for Financial Inclusion21st Century Strategies for Financial Inclusion
21st Century Strategies for Financial Inclusion
 
About Ebola Deeply - Slides from Foreign Press Center
About Ebola Deeply - Slides from Foreign Press CenterAbout Ebola Deeply - Slides from Foreign Press Center
About Ebola Deeply - Slides from Foreign Press Center
 
Philly Tech Week 2014 - The Future of Digital Marketing - Data-Mining
Philly Tech Week 2014 - The Future of Digital Marketing - Data-MiningPhilly Tech Week 2014 - The Future of Digital Marketing - Data-Mining
Philly Tech Week 2014 - The Future of Digital Marketing - Data-Mining
 
Using Predictive Analytics for Anticipatory Investigation and Intervention
Using Predictive Analytics for Anticipatory Investigation and InterventionUsing Predictive Analytics for Anticipatory Investigation and Intervention
Using Predictive Analytics for Anticipatory Investigation and Intervention
 
Data Mining Online Audiences with D8A Group
Data Mining Online Audiences with D8A GroupData Mining Online Audiences with D8A Group
Data Mining Online Audiences with D8A Group
 
Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...
Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...
Data-Driven Crisis Monitoring: Turning Online Activity into Actionable Insigh...
 
The Humanitarian Face of Big Data | ICCM 2013
The Humanitarian Face of Big Data | ICCM 2013The Humanitarian Face of Big Data | ICCM 2013
The Humanitarian Face of Big Data | ICCM 2013
 
Structuring Your Social Enterprise #FTW
Structuring Your Social Enterprise #FTWStructuring Your Social Enterprise #FTW
Structuring Your Social Enterprise #FTW
 

Kürzlich hochgeladen

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Kürzlich hochgeladen (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

The New Challenge of Data Inflation

  • 1. IQT QUARTERLY ThE NEW chALLENgE OF DATA INFLATION By Jon Gosier The online chatter of individuals, networks of friends or professionals, and the data being created by networked devices are growing exponentially, although our time to consume it all remains depressingly finite. There is a plethora of solutions that approach the challenge of data inflation in different ways. But what methodologies have worked at scale and how are they being used in the field? The Infinite Age entirely? Is the content the original blog post, or is it that post coupled with the likes, retweets, comments, First, let’s look at the landscape. The amount of and so on? content being created daily is incredible, and includes the growth of image data, videos, networked Content is now born and almost instantaneously devices, and the contextual metadata created around multiplies, not just through copying, but also through these components. the interactions individuals have with it — making it their own and subsequently augmenting it. The Information Age is behind us; we’re now living in an Infinite Age, where the flow of data is ever-growing In the Infinite Age, one person may have created the and ever-changing. The life of this content may be content, but because others consumed it, it’s as much ephemeral (social media) or mostly constant (static a representation of the reader as it is the author. documents and files). In either case, the challenge is dealing with it at human scale. The Inflation Epoch In the time it takes to finish reading a given blog post, In physics, the moment just after the Big Bang, but chances are high that it’s been retweeted, cited, prior to the early formation of the universe as we aggregated, pushed, commented on, or ‘liked’ all over understand it, is called the Planck Epoch or the GUT the web. An all-encompassing verb to describe this is Era (for the Grand Unified Theory that explains how all sharing. All of this sharing confuses our ability the four forces of nature were once unified). The to define the essence of that original content because Planck Epoch is defined as the moment of accelerated value has been added at different points by people growth when the universe expanded from a singularity who’ve consumed it. smaller than a proton. Does that additional activity become a part of the We mirror this moment a trillion times each day when original content, or does it exist as something else online content is created and augmented by the 06 Vol. 3 No. 3 Identify. Adapt. Deliver. ™
  • 2. IQT QUARTERLY surrounding activity from others online. From the have massive data centers that enable these indexes moment we publish, the dogpile of interaction that and queries to take seconds. follows is a bit like the entropy that followed the Big As big data becomes a growing problem inside of Bang: exponential and rapid before gradually trailing off organizations as much as outside, index and search to a semi-static state where people may still interact, technology has become one way to deal with data. A but usually with less frequency, before leaving to new swath of technologies allow organizations to consume new content being published elsewhere. accomplish this without the same level of When people talk about big data, this is one of the infrastructure because, understandably, not every problems they are discussing. The amount of content company can afford to spend like Google does on its that is created daily can be measured in the billions data challenges. Companies like Vertica, Cloudera, (articles, blog posts, tweets, text messages, photos, 10Gen, and others provide database technology that etc.), but that’s only where the problem starts. There’s can be deployed internally or across cloud servers also the activity that surrounds each of those items (think Amazon Web Services), which makes dealing — comment threads, photo memes, likes on Facebook, with inflated content easier by structuring at the and retweets on Twitter. Outside of this social database level so that retrieving information takes communication, there are other problems that must fewer computing resources. be solved, but for the sake of this article we’ll limit our This approach allows organizations to capture focus to big data as it relates to making social media enormous quantities of data in a database so that it communication actionable. can be retrieved and made actionable later. How are people dealing with social data at this scale to make it actionable without becoming overwhelmed? Methodology 2 – contextualization and Methodology 1 – Index and Search Feature Extraction Through the development of search technologies, Search technology as a method for sorting through incredibly large data sets is the one that most people the phrase “feature extraction” became common understand because they use it regularly in the form of terminology in information retrieval circles. Feature Google. Keeping track of all the URLs and links that extraction uses algorithms to pull out the individual make up the web is a seemingly Sisyphean task, but nuances of content. This is done at what I call an Google goes a step beyond that by crawling and atomic level, meaning any characteristic of data that indexing large portions of public content online. This can be quantified. For instance, in an email, the TO: helps Google perform searches much faster than and FROM: addresses would be features of that email. having to crawl the entire web every time a search is The timestamp indicating when that email was sent is executed. Instead, connections are formed at the also a feature. The subject would be another. Within database level, allowing queries to occur faster while each of those there are more features as well, but this using less server resources. Companies like Google is the general high-level concept. Waveform 1 = Value for all Categories over time Waveform 2 = Value for Sub-Category over time Waveform 3 = Value for disparate Category over time Stacked Waveform graph used to plot data with values in a positive and negative domain (like sentiment analysis). Produced by the author using metaLayer. IQT QUARTERLY WINTER 2012 Vol. 3 No. 3 07
  • 3. IQT QUARTERLY For the most part, search uses such features to help improve the efficiency of the index. In contextualization technologies, the practice is to use these features to modify the initial content, adding it as metadata and thereby creating a type of inflated content (as we’ve augmented the original). When users drag photos onto a map on Flickr, they are contextualizing them by tagging the files with location data. This location data makes the inflated content actionable; we can view it on a map, or we can segment photos by where they were taken. This new location data is an augmentation of the original metadata, and creates something that previously did not exist. When we’re dealing in the hundreds of thousands, millions, and billions of content items, feature extraction is used to carry out the previous example at scale. The following is a real-world use case, though I’ve Infographic showing network analysis of the SwiftRiver been careful not to divulge any of the client’s platform. Produced by the author using Circos. proprietary details. Recently at my company metaLayer, a colleague came to us with one terabyte of messages pulled Methodology 3 – Visualization from Twitter. These messages had been posted Visualization is another way to deal with excessive data during Hurricane Irene. The client needed to generate problems. Visualization may sound like an abstract conclusions about these messages that were not term, but the visual domain is actually one of the most going to be possible in their original loosely structured basic things humans can use for relating complex form. He asked us to help him structure this data, concepts to one another. We’ve used symbols for find the items he was looking for (messages about thousands of years to do this. Data visualizations and Hurricane Irene or people affected by it), and extract infographics simply use symbols to cross barriers of the features that would be useful for him. understanding about the complexities of research so The client lacked the sufficient context to identify what that, regardless of expertise, everyone involved has a was most relevant in the data set. So we used our better understanding of a problem. platform to algorithmically extract features like Content producers like the New York Times have found sentiment, location, and taxonomic values from each that visualizations are a great way to increase Twitter message using natural language processing. audience engagement with news content. The Because it was an algorithmic process, this only took explosion of interest in infographics and data a few hours, allowing the client to get a baseline that visualizations online echoes this. made the rest of his research possible. Now the data could be visualized or segmented in ways that weren’t To visualize excessive data sets, leveraging some of the possible with the initial content. The inflated content previous methods makes discovering hidden patterns — metadata generated algorithmically — included and trends a visual, and likely more intuitive, process. elements that made his research possible. We now had an individual profile of every message that gave Methodology 4 – crowd Filtering us a clue about its tone, location of origin, and how to and curation categorize the message. This allowed the client's team The rise of crowdsourcing methodologies presents a to look at the data with a new level of confidence. new framework for dealing with certain types of In the context of our social data challenge, these information overload. Websites like Digg and Reddit extracted features might be used on their own by an are examples of using crowdsourcing to vet and application, or they might become part of an index or prioritize data by the cumulative interests of a given database like the one mentioned previously. community. On these websites, anyone can contribute information, but only the material deemed to be 08 Vol. 3 No. 3 Identify. Adapt. Deliver. ™
  • 4. IQT QUARTERLY Big data can refer to managing an excess of data, to an overwhelming feed of data, or to the rapid proliferation of inflated content due to the meta- values added through sharing. interesting by the highest number of people will rise Curation has risen as a term over the past decade to to the top. By leveraging the community of users and refer to the collection, intermixing, and re-syndication consumers itself as a resource, the admin passes the of inflated content. It is used to refer to the responsibility of finding the most relevant content to construction of narratives without actually producing the users. any original content, instead taking relevant bits of content created by others and using them as the This, of course, won’t work in quite the same way for building blocks for something new. This presentation an organization’s internal project or your data mining of public data as edited by others represents a new project, but it does work if you want to limit the work, though this “new work” may be made up of information collected to those in the crowd who have nothing original at all. By carefully selecting curators some measure of authority or authenticity. whom you know will be selective in what they curate, The news curation service Storyful.com is a great the aggregate of information produced should be of a example of using crowd-sourced information to report higher quality. on breaking events around the globe. Its system works conclusion not because the masses are telling the stories (that would lead to unmanageable chaos), but because the Big data, like most tech catchphrases, means different staff behind Storyful has pre-vetted contributors. This things to different people. It can refer to managing an is known as bounded crowdsourcing, which simply excess of data, to an overwhelming feed of data, or to means to extend some measure of authority or the rapid proliferation of inflated content due to the preference to a subset of a larger group. In this case, meta-values added through sharing. Pulling the larger group is anyone using social media around actionable information from streams of social the world, whereas the bounded group is only those communication represents a unique challenge in that around the world that Storyful’s staff has deemed to it embodies all aspects of the phrase and the be consistent in their reliability, authority, and accompanying challenges. Ultimately, if content is authenticity. This is commonly referred to as curating growing exponentially, the methods to manage it have the curators. to be capable of equal speed and scale. Jon Gosier is the co-founder of metaLayer Inc., which offers products for visualization, analytics, and the structure of indiscriminate data sets. metaLayer gives companies access to complex visualization and algorithmic tools, making intelligence solutions intuitive and more affordable. Jon is a former staff member at Ushahidi, an organization that provides open source data products for global disaster response and has worked on signal to noise problems with international journalists and defense organizations as Director of its SwiftRiver team. IQT QUARTERLY WINTER 2012 Vol. 3 No. 3 09