Convergence Partners has released its latest research report on big data and its meaning for Africa. The report argues that big data poses a threat to those it overlooks, namely a large percentage of Africa’s populace, who remain on big data’s periphery.
2. 17 December 2013
‘Data is the new oil; like oil, it must be refined before it can be used.’
Summary
Of concern to us in the developing world is that the current ecosystem around big
data creates a new kind of digital divide: the big data rich (developed world) and the
big data poor (developing world). This report argues that big data poses a threat to
those it overlooks, namely a large percentage of Africa’s populace, who remain on
big data’s periphery. As most Africans use feature phones, and not smartphones,
‘they do not regularly contribute data to be analysed, as they do not routinely engage
in activities that big data is designed to capture’1. Additionally, the report discusses
the political economy of big data and its implications for policymaking, warns against a
scramble for Africa’s data and outlines opportunities for the Continent to fully exploit
the advent of big data. It argues that the active
involvement of policymakers, business and civil society is required to ensure that Africa
leverages the benefits, and addresses the potential pitfalls, that the big data
phenomenon may create.
Background
The phenomenal adoption of mobile phones on the African continent over the last
twenty years2, in tandem with the proliferation of connected devices and the fledgling
‘Internet of things’, has heralded the arrival of the big data era on the Continent.
Africans are increasingly emitting and creating digital information with their mobile
phones, Internet use and various forms of digital transactions. Globally, ‘computing
has become ubiquitous, creating countless new digital puddles and oceans of
information’3. Google states that ‘the first five exabytes of information created were
between the dawn of civilisation and 2003, whereas that much information is now
created every two days, with the pace increasing.’4 Every animate and inanimate
object on earth will soon be generating data, and Cisco forecasts that thirty-seven
billion intelligent devices will connect to the Internet by 2020.5 These devices and
sensors drive exponentially growing data traffic, which in 2012 was almost twelve
times larger than all global Internet traffic in 2000.
1 Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
2 ITU estimates 63.5% mobile penetration in 2013
3 Bollier D. (2010) The Promise and Peril of big Data. (The Aspen Institute: Maryland)
4 Ibid
This wealth of new data, in turn, accelerates advances in computing – creating a
virtuous cycle of big data. Analytics is now more accessible, owing to the
precipitous drop in the price of both storage technologies and processing bandwidth6.
Cluster computing systems provide the storage capacity, computing power and high-speed
local area networks to handle these large data sets. In conjunction with ‘new
forms of computation combining statistical analysis, optimisation and artificial
intelligence’7, researchers are able to construct statistical models from large
collections of data to infer how the system should respond to new data.
Notwithstanding these developments, the analytics of big data is still in its infancy
globally, and more so in emerging economies.
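To make the idea of constructing a statistical model from data concrete, the following is a minimal sketch and not a method described in this report. It fits a straight line to a handful of observations by ordinary least squares and then uses the fitted line to respond to a new input. The observations and the names fit_line and predict are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new observation."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations, invented for illustration only:
# monthly airtime top-ups (x) against mobile data usage in MB (y).
top_ups = [1, 2, 3, 4, 5]
usage_mb = [12, 19, 31, 42, 48]

model = fit_line(top_ups, usage_mb)
estimate = predict(model, 6)  # the model's response to a previously unseen input
print(model, estimate)
```

Real big data analytics relies on far larger data sets and richer models, but the pattern is the same: estimate parameters from existing data, then apply them to new data.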
5 http://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
6 Russom P. (2011) Big Data Analytics. (TDWI research: Washington)
7 Bollier D. (2010) The Promise and Peril of big Data. (The Aspen Institute: Maryland)
Big data is, in many ways, a poor term. Traditionally, it has been understood using
three characteristics, namely: volume, variety and velocity.
Source: TDWI research
Big data is thus conceived of as a ‘massive volume of both structured and
unstructured data, generated internally by and externally to organisations, that is so
large that it's difficult to process with traditional database and software techniques’8.
Though there is little doubt that the quantities of data now available are often quite
large, this is not the defining characteristic of this new data ecosystem. ‘Big data is
less about data that is big, than it is about a capacity to search, aggregate and cross-reference
large data sets.’9 Karen Levy argues that ‘data is big not because of the
number of points that comprise a particular dataset, nor the statistical methods used
to analyse them, nor the computational power on which such analysis relies. Instead,
data is big because of the depth to which it has come to pervade our personal
connections to one another’10. Big data is thus better defined as a socio-technical
phenomenon that rests on the interplay of:
• ‘Technology: maximising computation power and algorithmic accuracy
to gather, analyse, link, and compare large data sets;
• Analysis: drawing on large data sets to identify patterns in order to
make economic, social, technical, and legal claims; and
• Mythology: the widespread belief that large data sets offer a higher
form of intelligence and knowledge that can generate insights that were
previously impossible, with the aura of truth, objectivity, and
accuracy’11
8 Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
9 Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
Big Data and its Discontents
Since the turn of the century, and particularly since the advent of social media,
consumers have volunteered volumes of personal data. Unstructured data, which
constitutes 80% of all data, describes information formatted as natural language
rather than numerical figures. Unstructured data encompasses everything from
social media interactions, to recordings, to emails and more. As previously stated,
the proliferation of smartphones, tablets and other devices has exponentially
accelerated data creation to the extent that it is now estimated that the rate at which
data is generated and captured is doubling every 90 days.
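If the 90-day doubling estimate quoted above is taken at face value, its implication can be worked out with simple compound-growth arithmetic. The sketch below is illustrative only: the function name growth_factor is an assumption, and the doubling period is the estimate quoted in the text rather than a measured figure.

```python
def growth_factor(elapsed_days, doubling_days=90):
    """Multiplicative growth after elapsed_days, given a fixed doubling period."""
    return 2 ** (elapsed_days / doubling_days)


# Growth implied over one 365-day year if the rate doubles every 90 days.
one_year = growth_factor(365)
print(round(one_year, 1))
```

Under this assumption the rate of data generation would grow by a factor of roughly 16.6 over a single year, which illustrates how quickly such estimates compound.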
Though the promise of big data lies within the ability to make predictions based on it,
it is imperative that a cautionary note be sounded to data evangelists who have a
utopian view of the promise of big data. Though admittedly more useful than
traditional statistics, big data is not a panacea as there are questions around the
reliability, accuracy and representativeness of its data sets. ‘Technology is neither
good nor bad; nor is it neutral. Technology’s interaction with the social ecology is
such that technical developments frequently have environmental, social and human
consequences that go far beyond the immediate purposes of the technical devices
and practices themselves’12. Like other socio-technical phenomena, big data triggers
both utopian and dystopian rhetoric. ‘On one hand, big data is seen as a powerful
tool to address various societal ills, offering the potential of new insights into areas
as diverse as medical research and climate change. On the other, big data is seen
as a troubling manifestation of big brother enabling invasions of privacy, decreasing
civil liberties and increasing state control’13.
10 Levy K. (2013) Relational Big Data. (Stanford Law Review: Stanford)
11 Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
Of particular concern to us in the developing world is that the current ecosystem
around big data creates a new kind of digital divide: the big data rich (developed
world) and the big data poor (developing world). This report argues that big data
poses a threat to those it overlooks, namely a large percentage of Africa’s populace,
who remain on big data’s periphery. As most Africans use feature phones, and not
smartphones, ‘they do not regularly contribute data to be analysed, as they do not
routinely engage in activities that big data is designed to capture’14. Consequently,
their preferences and needs risk being ignored when governments use big data and
advanced analytics to shape public policy. The danger is that as we increasingly rely
on big data’s numbers to speak for themselves, we risk misunderstanding the results
and in turn misallocating important public resources15. Thus, with every big data set,
we need to ask which people and data sets are excluded. Many African data sets
exhibit this ‘signal problem’ where data are assumed to accurately reflect the social
world, whereas there are significant gaps with little or no signal coming from
particular communities.
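One simple way to ask which people are excluded from a data set is to compare each group's share of the data with its share of the population. The following is a minimal sketch of that check, not a procedure taken from this report. The group labels, population shares, record counts and the 0.5 threshold are all hypothetical values chosen for illustration, and the name representation_ratios is an assumption.

```python
def representation_ratios(population_shares, dataset_counts):
    """Ratio of each group's share of the data set to its share of the population.

    A ratio near 1 means the group is represented proportionally; a ratio
    well below 1 indicates a weak signal from that group.
    """
    total_records = sum(dataset_counts.values())
    ratios = {}
    for group, population_share in population_shares.items():
        data_share = dataset_counts.get(group, 0) / total_records
        ratios[group] = data_share / population_share
    return ratios


# Hypothetical figures, invented for illustration only.
population_shares = {"smartphone users": 0.2, "feature phone users": 0.8}
dataset_counts = {"smartphone users": 900, "feature phone users": 100}

ratios = representation_ratios(population_shares, dataset_counts)

# Flag groups whose share of the data is less than half their population share.
under_represented = [group for group, ratio in ratios.items() if ratio < 0.5]
print(ratios, under_represented)
```

In this invented example, feature phone users make up most of the population but only a small fraction of the records, so any conclusion drawn from the data set alone would mostly describe smartphone users.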
In a future where big data, and the predictions it makes possible, will fundamentally
reorder government and the marketplace, ‘the exclusion of poor and otherwise
marginalised people from datasets has troubling implications for economic
opportunity, social mobility and democratic participation’16. These technologies may
create a new kind of voicelessness, where certain groups’ preferences and
behaviours receive little or no consideration when political elites decide how to
distribute goods and services, and how to reform public and private institutions. Of
course, the poor (and most Africans by extension) are in many ways already
marginalised, but big data could reinforce and exacerbate existing problems.
12 Ibid
13 Ibid
14 Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
15 http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/
16 Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
Moreover, the use, abuse and misuse of data offer a troubling lesson about the
limitations of information as the world hurtles toward the big data era. The underlying
data in most African countries are of poor quality, unrepresentative and potentially
biased, making it more likely that they will be misanalysed and used misleadingly.
Even more damning is that data can fail to capture what it purports to quantify. As
big data is largely in the languages of the developed world, it further isolates African
language content. However, an opportunity exists for Africans to create
African-language data sets, whether by converting the large existing
stock of analogue African data (through crowdsourcing the digitisation process),
or by uploading the extensive video content that resides in African broadcasters’
archives.
Big data holds substantial potential for the future, and large dataset analysis has
important uses. However, the promise of big data is, and will be, best fulfilled when
its limitations, biases and features are adequately understood and taken into account
when interpreting the data17. As is evident, big data is the source of both
tremendous promise and disquieting surveillance. In reality, like any complex social
phenomenon, big data is both of these: a set of heterogeneous resources and
practices deployed in multiple ways toward diverse ends.
The Political Economy of Big Data
As articulated above, big data has the potential to ‘solidify existing inequalities and
stratifications, and to create new ones’18. It could restructure societies so that the
only people who matter – quite literally the ones who count – are those who regularly
contribute to the right data flows. Manovich has argued that there are three classes
of people in the realm of big data, namely: ‘those who create data (both consciously
and by leaving digital footprints), those who have the means to collect it, and those
who have the expertise to analyse it. This last group is the smallest, and most
privileged as they are the ones that get to determine the rules about how big data will
be used and who gets to participate’19. However, in the African context it is
necessary to ask questions about what all this data means, who gets to access it,
how data analysis is deployed, and to what ends. It is worth noting that there is a
scarcity of data analysts on the Continent, which raises the question of who will
determine the African agenda, ask relevant questions and ensure inclusivity in
the research undertaken. It is imperative that big data on the Continent do away with
the ‘politics of the missing’ to render visible the poor and marginalised in developing
countries. However, Africa’s paucity of reliable communications infrastructure poses
a significant challenge for the application of big data, as the network backbone
required for big data systems is sorely lacking. Key constraints are that current
network deployments do not have sufficient reach into the populace, are of poor
quality, and are overpriced and of low capacity. It is imperative that these factors be
addressed, as they are vital for a thriving big data ecosystem.
17 United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
18 Ibid
The data emanating from mobile phones holds particular promise, in part because
for many low-income people it is their only form of interactive technology. Utilising
this data created by mobile phones can improve our understanding of vulnerable
populations, and quicken governments’ response to the emergence of new trends20.
Big Data and Development
Though big data and real-time analytics are no modern panacea for age-old
development challenges, ‘the diffusion of data science into the realm of development
constitutes a genuine opportunity to bring powerful new tools to the fight against
poverty, hunger and disease’21. To this end, the United Nations launched Global
Pulse in 2009 ‘to leverage innovations in digital data, rapid data collection and
analysis to help decision makers gain a real-time understanding of how crises impact
vulnerable populations’22. Big data for development is about turning imperfect,
complex, often unstructured data into actionable information. This implies using
advanced computational tools, such as machine learning, which were developed in
other fields, to reveal trends and correlations within and across large datasets that
would otherwise remain undiscovered. Additionally, the GSMA has developed
‘Mobile for Development Intelligence’ with the aim of persuading mobile operators to
share data with researchers and development organisations. Its mission statement
holds that ‘open access to high quality data will improve decision making, increase
total investment from the commercial mobile industry and development sector, and
accelerate economic, environmental and social impact from mobile solutions’23.
19 Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
20 World Economic Forum. (2012) Big Data, Big Impact: New Possibilities for International Development. (World Economic Forum: Geneva)
21 Ibid
The data philanthropy discussed above, which entails corporations anonymising their
data and providing it to development organisations to mine for insights, patterns and
trends in (or near) real time, is still in its infancy. Data philanthropy is a laudable
advancement as it seeks to minimise Africa’s information asymmetries through the
creation of data commons, which are a critical input of big data for development. This
data can be conceived of as a public good, as it is both non-rivalrous and non-excludable:
one party’s use of the data does not restrict its availability to
others. As such, the benefit of creating and maintaining a data commons is that
the information serves society as a whole, while protecting individual security. A
more concerted effort is required to make open data commons a reality, and a
success.
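As a rough illustration of what preparing data for a commons might involve, the sketch below replaces subscriber identifiers with salted hashes and then shares only regional counts, suppressing regions with very few records. This is a sketch under stated assumptions, not the practice of any operator or of the organisations named in this report. The record layout, the salt, the min_count threshold and the function names are hypothetical. Salted hashing is pseudonymisation rather than full anonymisation, so a real release would need stronger safeguards.

```python
import hashlib
from collections import Counter


def pseudonymise(subscriber_id, salt):
    """Replace an identifier with a truncated salted SHA-256 digest."""
    digest = hashlib.sha256((salt + subscriber_id).encode("utf-8")).hexdigest()
    return digest[:16]


def aggregate_by_region(records, min_count=3):
    """Count records per region, dropping regions with fewer than min_count."""
    counts = Counter(region for _, region in records)
    return {region: n for region, n in counts.items() if n >= min_count}


# Hypothetical call records as (subscriber_id, region), invented for illustration.
raw_records = [
    ("27820000001", "North"),
    ("27820000002", "North"),
    ("27820000003", "North"),
    ("27820000004", "South"),
]

SALT = "example-salt"  # assumption: a secret value held only by the data owner
pseudonymised = [(pseudonymise(sid, SALT), region) for sid, region in raw_records]

# Only the aggregated, thresholded counts would be shared.
shared = aggregate_by_region(pseudonymised)
print(shared)
```

Suppressing small counts is a design choice: a region with a single record could otherwise reveal an individual even without an identifier attached.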
22 Ibid
23 United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
Source: United Nations Global Pulse
However, though data may be public (or semi-public) this does not simplistically
equate with full permission being given for all uses24. Big data researchers rarely
acknowledge that there is a ‘considerable difference between being in public (e.g.
sitting in a park) and being public (e.g. actively courting attention)’. The ethical and
policy implications of big data will be addressed below.
24 Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
Policy Implications of Big Data
The advent of big data presents significant opportunities and challenges for Africa’s
information and communications technologies (ICT) policy making. Big data is, at its
core, a social phenomenon – though the dominant narrative reduces people to mere
data points to be acted upon. ‘Big data and its attendant practices aren’t monoliths;
rather diverse and socially contingent, a fact which any policy analysis of big data
phenomena must consider’25. As lines between the physical and digital world
continue to blur, and as big data and advanced analytics increasingly ‘shape
governmental decision-making about the allocation of resources, equality and
privacy principles will grow increasingly intertwined’26. Moreover, exclusion from or
underrepresentation in government datasets could then mean losing out on
important government services and public goods. Policymakers thus need to be
aware of the possibility that the big data revolution may create new forms of
inequality and subordination, which raise broad democracy concerns.
As such, ensuring that the big data revolution is a joint revolution, ‘one whose
benefits are broadly and equitably shared, may also require, paradoxically, a right
not to be forgotten – a right against exclusion’27. A data antisubordination policy28
would ensure this. This antisubordination policy would, at a minimum ‘provide those
who live outside or on the margins of data flows some guarantee that their status as
persons with ‘light data footprints’ will not subject them to unequal treatment by the
state in the allocation of public goods and services’29. This mooted data
antisubordination policy would also require public institutions to
mitigate the disparate impact that their use of big data may have on persons who live
outside or on the margins of government datasets. Similarly, public servants relying
on big data for policymaking and other core democratic functions should be
compelled to take steps to ensure that big data’s marginalised groups continue to
have a voice in democratic processes.
25 Levy K. (2013) Relational Big Data. (Stanford Law Review: Stanford)
26 Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
27 Ibid
28 Ibid
29 Ibid
In the field of public policy, ‘it is the predictive power of big data analytics that
understandably attracts the most attention as insights on human behaviour can be
gleaned from these data’30. The increase in the availability of data has occurred
relatively fast, and as such is not yet balanced by the emergence of privacy
legislation or ethical frameworks that can mitigate potentially damaging uses of the
data. As the big data pools are predominantly in the hands of powerful intermediary
institutions, not ordinary people, they may thus be misused and abused. If
policymakers do not insist on ‘building privacy, transparency, autonomy and other
protections into big data related activities from the outset, this will diminish big data’s
lofty ambitions’31. There is a need for a healthier balance of power between those
who generate the data, and those who make inferences and decisions based on it.
‘African countries represent a strong testing ground for data protections as the power
imbalance between the producers and users (mainly large multinational
corporations) of personal data there, is one of the largest anywhere’32. It is highly
likely that individuals, and even governments, may lack the information, resources or
access to make corporations or countries accountable when they breach data
protection guidelines.
It is evident that increasingly powerful and secretive surveillance programmes, such as PRISM,
combined with numerous other massive datasets, pose a significant risk to personal
privacy and civil liberties, especially in the African context. In the era of big data,
policy should include protections lest these worrisome orthodoxies crystallise.
30 United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
31 http://blogs.oii.ox.ac.uk/policy/the-scramble-for-africas-data
32 Ibid
The Scramble for Africa’s Data
After the last decade’s exponential rise in ICT use, Africa is fast becoming a source
of big data as Africans increasingly ‘emit digital information with their mobile phone
calls, internet use and various forms of digitised transactions’33. ‘The emergence of
big data in Africa has the potential to make the continent’s citizens a rich mine of
information, with the default mode being for this to happen without their consent or
involvement, and without ethical and normative frameworks to ensure data protection
or to weigh the risks against the benefits’34. It is increasingly likely that there will be a
new scramble for Africa: a digital resource grab, and African countries need to be
fully cognisant of this, and circumspect in their approach and monitoring thereof.
Opportunities for Africa
Notwithstanding the severe lack of qualified people on the Continent to exploit the
attendant benefits of big data, significant opportunities exist. ‘In light of the serious
problems with both illiteracy and information access in the developing world,
especially Africa, there is a widespread belief that speech technology can play a
significant role in improving the quality of life of developing-world citizens’35. It is often
said that African societies rely on oral traditions to transfer knowledge and culture
between generations, and the developing fields of phonetic search, machine learning
and natural language processing, coupled with big data, bode well for Africa’s
ability to fully harness and leverage analytic power. There are over 2000
languages on the Continent, with the development of ‘voice-search
systems being a useful tool in delivering on the original promise’ of big data
analytics. Though ‘speech technology has to date played a much smaller role in the
developing world, the rapid spread of telephone networks through the developing
world’36, leads to optimism that this situation will change significantly in years to
come. Recently, ‘a novel application of speech technology, namely the use of
speech recognition to perform searches through Web content and personal
information’37, has become increasingly popular in the developed world and it is this
paper’s contention that this could be duplicated and appropriated for the African
continent.
33 Ibid
34 Ibid
35 Barnard E., Moreno P., Schalkwyk J., and van Heerden C. (2010) Voice Search for Development. (Human Language Technologies Research Group: Pretoria)
36 Ibid
However, though ‘voice search lends itself to efficient and low-cost data collection
(thereby addressing resource constraints)’38, ‘digital content that is relevant to people
of the developing world is generally scarce and distributed across numerous sources
without any form of integration’39. African universities and the graduates produced
(mainly computer scientists and linguists) will thus have to redouble their efforts to
ensure that ‘voice search makes web-based content available regardless of the
original source of the data, which will go some way towards solving issues of content
availability’40.
Africa has already demonstrated its excellent science and engineering skills by
designing and starting to build the 64-dish MeerKAT telescope – as a pathfinder to
the Square Kilometre Array (SKA) – in South Africa. ‘The technology being
developed is cutting-edge and the project is creating a large group of young
scientists and engineers with world-class expertise in technologies that will be crucial
for development’41. Though many of today’s data scientists are formally trained in
computer science, maths or economics, they can emerge from any field with a strong
data and computational focus. Hal Varian argues that ‘the ability to take data, understand it,
process it, extract value from it, visualise it, and communicate it, is
going to be a hugely important skill in coming decades’42. Ben Fry takes this a step
further and argues for an entirely new field that combines the skills from often
disjointed areas of expertise in the analytics of big data.
He argues that fields such as statistics, data mining, graphic design, and information
visualisation each offer ways to find meaning and patterns in data, but practitioners of
each are often unaware of, or unskilled in, the methods of the adjacent fields
required for a solution. As such, to fully exploit opportunities that stem from big data,
African universities need to reorientate themselves and their curricula to ensure that
all their graduates are ‘data literate’ (meaning competent in finding, manipulating,
managing, and interpreting data), as well as being adept at mathematical and
hypothetico-deductive reasoning.
37 Ibid
38 Ibid
39 Ibid
40 Ibid
41 Botman H. (2013) The role of universities in the development of Africa. (Paper presented to the Swiss Federal Institute of Technology)
42 http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
Conclusion
The increased analytics and predictive power associated with big data conjure
utopian and dystopian scenarios. This paper argues that the advancement of big
data and the Internet of things, though a significant milestone in the development of
social science and the Internet, is not an end in itself. Having highlighted the gains of
big data analytics and its ability to transform society, the paper warns against a
‘dictatorship of data’ wherein data governs us in ways that may do as much harm as
good. The cautionary note sounded cites ‘political and social equality considerations
where the vulnerable are likely to be further relegated to an inferior status’43, as well
as the policy implications of big data and a likely scramble for Africa’s data, as
reasons to be circumspect about the promise of big data. ‘Technology is neither good
nor bad; nor is it neutral’44.
43 Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
44 Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
REFERENCES
Barnard E., Moreno P., Schalkwyk J., and van Heerden C. (2010) Voice Search for Development. (Human Language
Technologies Research Group: Pretoria)
Bollier D. (2010) The Promise and Peril of big Data. (The Aspen Institute: Maryland)
Botman H. (2013) The role of universities in the development of Africa. (Paper presented to the Swiss Federal Institute of
Technology)
Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly
phenomenon. (Information, Communication and Society Journal)
Clarke R., and Wigan M. (2013) Big Data’s unintended consequences. (IEEE: Washington)
Crawford K. (2013) The Hidden Biases in Big Data. (Harvard Business Review Blog Network)
Retrieved from http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/
Cukier K., and Mayer-Schonberger V. (2013) The Dictatorship of Data. (MIT Technology Review)
Retrieved from http://www.technologyreview.com/news/514591/the-dictatorship-of-data/
Einav L., and Levin J. (2013) The Data Revolution and Economic Analysis. (Working Paper 19035) Retrieved from
http://www.nber.org/papers/w19035
Hartzog W., and Selinger E. Big Data in Small Hands. (Stanford Law Review: Stanford)
King J., and Richards N. (2013) Three paradoxes of Big Data. (Stanford Law Review: Stanford)
Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
Levy K. (2013) Relational Big Data. (Stanford Law Review: Stanford)
Michael K., and Miller K. (2013) Big Data: New Opportunities and New Challenges. (IEEE: Washington)
Pietsch W. (2013) Big Data – The New Science of Complexity. (Munich Center for Technology in Society: Munich)
Polonetsky J., and Tene O. (2013) Privacy and Big Data: making ends meet. (Stanford Law Review: Stanford)
Russom P. (2011) Big Data Analytics. (TDWI research: Washington)
Taylor L. (2013) The Scramble for Africa’s Data: Resource Grab or Developmental opportunity. (Oxford Internet Institute:
Oxford)
United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
World Economic Forum. (2012) Big Data, Big Impact: New Possibilities for International Development. (World Economic
Forum: Geneva)