In emerging markets, eight out of ten small businesses cannot access the loans they need to grow. USAID’s Development Credit Authority (DCA) uses risk-sharing agreements to mobilize local private capital to fill this financing gap. The goal of this collaboration between UN Global Pulse and USAID is to explore how big data could support the work of USAID’s Development Credit Authority.
Kenya has become an established tech leader in Africa in recent years – generating greater volumes of digital data as a result. The goal of this study is to explore what new sources of digital data, and methods for analysis, could be helpful in answering the question: “What barriers to accessing loans do small businesses in Kenya face?”
Accordingly, this presentation paints a picture of the big data landscape in Kenya, shows preliminary findings, and lays the groundwork for further investigation.
2. Purpose of the project:
The goal of this study is to determine the feasibility of answering
the question “what barriers to accessing loans do small
businesses in Kenya face?” through analysis of new sources of
digital data.
The research involved the following elements:
- Digital landscape in Kenya
- Custom analysis of select sources of social media data and
online search data
- Assessment of the digital footprint of DCA clients and menu of
potentially relevant data sources for further investigation
- Conclusions & recommendations
The exercise is intended to inspire new thinking about how USAID’s
Development Credit Authority can use new sources of digital data
to inform its work.
3. Relevant sources of Big Data for Development:
WHAT PEOPLE SAY (i.e., international and local online news sources, publicly accessible
blogs, forum posts, comments and public social media content, online
advertising, e-commerce sites and websites created by local retailers
that list prices and inventory)
WHAT PEOPLE DO (i.e., aggregated transactional data from the use of digital services
such as financial services (including purchases, money transfers,
savings and loan repayments), communications services (such as
anonymized records of mobile phone usage patterns) or information
services (such as anonymized records of search queries)).
For reference, UN Global Pulse’s introductory guide, “Big Data for Development: A Primer,” is available online and for
download at: http://unglobalpulse.org/bigdataprimer
4. Social Data?
What is social data?
• Social data is the text that individuals share digitally, e.g.
via Twitter, blogs, Facebook
• Social data is a massive amount of qualitative data.
How is social data analyzed?
• Trends in social data can be analyzed by aggregating
volumes of text relating to a set of predefined keywords.
• Computer algorithms can also automatically detect words
that co-occur with predefined keywords, giving context.
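As a rough illustration of the two analysis steps above, the following minimal Python sketch counts posts that mention a predefined keyword and tallies the words that co-occur with those keywords. The sample posts, the keyword set, and the stopword list are all hypothetical and chosen only for illustration; this is not the method used by any particular monitoring platform.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real analysis would use an archive of public posts.
posts = [
    "I need a loan to expand my dairy farm",
    "Bank loan interest is too high",
    "Rally this weekend in Nyeri",
    "Applied for a loan, the interest rate doubled",
]

# Assumed keyword and stopword sets, for illustration only.
KEYWORDS = {"loan", "loans", "mkopo"}
STOPWORDS = {"a", "i", "is", "my", "to", "the", "for", "in", "this", "too"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_volume(posts, keywords):
    """Count the posts that contain at least one predefined keyword."""
    return sum(1 for p in posts if keywords & set(tokenize(p)))


def co_occurring_words(posts, keywords, stopwords):
    """Count words appearing in the same post as a keyword (once per post)."""
    counts = Counter()
    for p in posts:
        tokens = set(tokenize(p))
        if keywords & tokens:
            counts.update(tokens - keywords - stopwords)
    return counts


volume = keyword_volume(posts, KEYWORDS)
top_context = co_occurring_words(posts, KEYWORDS, STOPWORDS).most_common(1)
```

In this toy data set, three of the four posts mention a loan keyword, and "interest" is the word that most often appears alongside it.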
5. EXAMPLE 1: Rice prices in Indonesia
The number of tweets discussing the price of rice in Indonesia follows a trend
similar to the official inflation statistics for the food basket.
http://unglobalpulse.org/projects/twitter-and-perceptions-crisis-related-stress
6. EXAMPLE 2: Finance chatter in the US
Twitter chatter on the topic of finance in the US increased significantly
during the US debt ceiling debate in 2009.
http://unglobalpulse.org/projects/twitter-and-perceptions-crisis-related-stress
7. Kenya’s Digital Landscape
• Dec 2012: 78.0% of Kenya’s adults had mobile phones
• Sept 2012: smartphones estimated at only 7%
• But in Jan 2013, Safaricom launched a new smartphone that sold out in
less than two weeks.
• Dec 2012: internet subscriptions stood at 9.4m, growth of 75.1% over
the previous year.
• Including non-subscribers, 41.1% of the population was accessing the internet
by Dec 2012.
Sources: Communications Commission of Kenya Quarterly Sector Statistics Report (2012/13),
AudienceScapes, Internet Access and Use in Kenya (2010)
8. Phone Ownership in Kenya
A 2009 survey showed comparable rates of mobile ownership
across every income bracket in the country.
Source: Ownership and Usage Patterns in Kenya. Amy Wesolowski, Nathan Eagle, Abdisalan M. Noor, Robert
W. Snow, Caroline O. Buckee
9. Gathering Contextual Knowledge:
DCA Client Survey
10 DCA clients from Kenya Commercial Bank were surveyed. All of these
clients were farmers, from peri-urban or rural areas in Central Kenya.
10. Social Media Monitoring
• Various tools/platforms, both proprietary and open-source,
allow for social media filtering & analysis
• For this analysis, Global Pulse used the Crimson Hexagon
ForSight platform, which:
– Provides access to full archive of public Tweets
– Can automate categorization of tweets, once an analyst
creates a set of rules and filters
11. Building a taxonomy of keywords
Step One
• The field survey asked DCA clients to describe, in colloquial language, the
words they tend to use when discussing loans/finance.
Step Two
• Use the keywords gleaned from survey to create a taxonomy
• Test and refine taxonomy iteratively by exploring Twitter data
Step Three
• Exclude words that create “noise” in the data (i.e., irrelevant posts)
• For example, Kenyan bank KCB sponsors sporting events, so those
tweets are excluded:
– Sample tweet: @theARsite Kenya: Amwari to Test New Evo 9 Car At Ngong Ahead of
Next Month's KCB Nyeri Rally (All Africa): Share With Fr... http://bit.ly/YtiS9v
12. Keywords
(loan OR loans OR mkopo OR wakopo) AND ("Top up" OR "Payback period"
OR installments OR expansion OR mpesa OR mbesa OR financing OR
"business financing" OR biashara OR dairy OR msoto OR red OR doh OR
qualify OR stocking OR application OR maximum OR duration OR interests OR
delay OR security OR "land title" OR deed OR "deposit dates" OR tembelea OR
"fixed deposit receipts" OR secured OR "calculated interest" OR interest OR
guarantees OR guarantor OR lawyer OR Agricultural OR agriculture OR
development OR application OR procedures OR payback OR improvement OR
n’gombe OR wakora OR repay OR balance OR "agreement letter" OR period
OR clear OR siri OR security OR sambaza OR defaulted OR "cooperative
society" OR Faulu OR credit OR Agrovets OR mfugo OR zidisha OR "penalty
charges" OR penalty OR Emergency OR "ketes temiship" OR inflation OR
expectations OR capital OR terms OR payment OR "nilitemelea banki" OR farm
OR status OR assets OR asset OR mshwari OR land OR animal OR animals OR
"long term" OR "short term" OR "mini statement" OR "mini statements" OR
ministatements OR "shamba shape ups" OR "fixed accounts" OR mshwari OR
zidisha OR bank OR banki) AND -helb AND -@MweuDeh AND -hooker AND -@helbpage AND -Hooker AND -@HELBpage AND -"car-jacker" AND -Chelsea
AND -Manchester
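The query above has three parts: a post must contain a loan term, must also contain at least one context term, and must not contain any excluded term. A minimal Python sketch of that logic follows. It uses only a small subset of the terms listed above, and the matching rule (case-insensitive, whole words or phrases) is an assumption made for illustration, not a description of how the monitoring platform evaluates queries.

```python
import re

# Small illustrative subsets of the terms in the query above.
LOAN_TERMS = ["loan", "loans", "mkopo", "wakopo"]
CONTEXT_TERMS = ["top up", "interest", "mpesa", "mshwari", "bank", "land title"]
EXCLUDED_TERMS = ["helb", "@helbpage", "chelsea", "manchester"]


def contains(text, term):
    """Case-insensitive match of a whole word or phrase (assumed rule)."""
    pattern = r"(?<![\w@])" + re.escape(term.lower()) + r"(?!\w)"
    return re.search(pattern, text.lower()) is not None


def matches_query(text):
    """(any loan term) AND (any context term) AND NOT (any excluded term)."""
    has_loan = any(contains(text, t) for t in LOAN_TERMS)
    has_context = any(contains(text, t) for t in CONTEXT_TERMS)
    is_excluded = any(contains(text, t) for t in EXCLUDED_TERMS)
    return has_loan and has_context and not is_excluded


# Hypothetical example posts.
relevant = matches_query("Need a loan from the bank, interest is double")
student_loan = matches_query("HELB loan application delayed by the bank")
no_loan_term = matches_query("Great bank rally today")
```

Here the first post is kept, the second is dropped because it mentions HELB, and the third is dropped because it has no loan term.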
13. General loan monitor
Categories were rationalized due to lack of data to break down to a more granular level
Original categories:
- I want a loan (Business, Personal)
- I have a loan, negative (Business, Personal)
- I have a loan, positive (Business, Personal)
- I have a loan, neutral (Business, Personal)
- Information seeking
- Information provision
Final categories:
- General Loan, positive
- General Loan, negative
- Seeking information on loans
- Providing information on loans
*Jokes, sports chat, extraneous noise filtered out
14. Much of the growth in chatter about loans is related to the
launch of M-shwari,
a new mobile-based savings and small loans service
available to M-Pesa customers.
15. General Nature of Loan Chatter
Jan 1 2013 to March 14, 2013 (before and after M-shwari is launched)
16. Sample tweet content:
“I need a bizness
loan…interest is
double”
Understanding
sentiment around bank
loans helps people
make the best decisions
when they need loans.
17. CONTEXT IS KEY!
Another spike in relevant
tweets came after an
announcement by a Vice
Presidential candidate that
if elected, the government
would offer an interest-free
loan to women and youth.
While most tweets were
neutral in response, the
announcement was also
met with skepticism – with
people tweeting things like
“gullible” and “silly season”
18. Loans by Sector
From January 1, 2012 to August 25, 2013, there
were 5,317 relevant posts, representing an
average of 9.2 posts per day. Growth in Twitter
chatter is similar to that seen in the
first monitor, with sustained growth after the
launch of M-shwari. This growth is driven by
chatter about business and personal loans, as
opposed to government loans.
19. Looking at volume and sentiments of tweets
related to a specific bank
20. Google Trends
Google Trends makes tools publicly available to track the volume of
searches over time by country. Using Google Trends, it is possible to:
- Track relative changes in search volumes over time.
- Compare different search volumes.
Limitations
- Can’t create subcategories within one search term
- Only one word that commonly occurs with the initial keyword is
given.
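To make "relative changes in search volumes" concrete, the sketch below rescales raw counts so that the highest value across the compared terms becomes 100 and everything else is expressed relative to that peak. The weekly counts are made up, and the scaling is a simplified assumption about how a relative index of this kind can be built; it is not a reproduction of Google's own calculation.

```python
# Hypothetical weekly search counts for two terms (made-up numbers).
raw_counts = {
    "loan": [20, 50, 40],
    "mshwari": [5, 10, 25],
}


def to_relative_index(series_by_term):
    """Scale all series so the single highest value across terms equals 100."""
    peak = max(max(series) for series in series_by_term.values())
    return {
        term: [round(100 * value / peak) for value in series]
        for term, series in series_by_term.items()
    }


index = to_relative_index(raw_counts)
```

Because every value is expressed relative to the peak, the index shows how terms compare and how they move over time, but not how many searches actually took place.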
21. Searches for “loan”
• There is no straightforward way to create sub-categories within overall loan
searches. In Google Trends there are two ways to approximate this.
• The first is to specify a full search phrase in quotes, for example “business loan”
or “personal loan.” No data was returned.
• The second is to exclude words from the search, for example “Loan -student.”
• Google Trends shows the top co-occurring word with “loan,” which is HELB, the
student loan authority.
22. How to access other potentially
relevant sources of data
This study included a preliminary analysis of readily available data (Twitter and
Google search).
However, other sources of data which may reveal highly relevant “digital
signals” about the topic would require more effort to access. Namely, this
includes mobile phone data and information found on disparate websites.
Data from Mobile Services
Partnerships or subscriptions with service providers (like M-Shwari) would be
required to access the data.
Content from Websites
A great deal of information is available online and is updated as things change.
While it is not feasible to gather this information by hand, a “scraper” can be built
to automatically collect the data and integrate it into a usable format.
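A minimal sketch of the parsing half of such a scraper is shown below, using only Python's standard library. The HTML snippet, the class names ("product", "rate"), and the figures are hypothetical; a real bank page would have its own markup, and fetching pages would also need to respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; real sites will use different markup.
SAMPLE_HTML = """
<table>
  <tr><td class="product">Business loan</td><td class="rate">18%</td></tr>
  <tr><td class="product">Farm input loan</td><td class="rate">15%</td></tr>
</table>
"""


class RateTableParser(HTMLParser):
    """Collect one dict per table row, keyed by each cell's class name."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None    # class of the <td> currently being read
        self._current = {}    # cells gathered for the current row

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._current:
            self.rows.append(self._current)
            self._current = {}


parser = RateTableParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
```

The resulting list of rows could then be written to a spreadsheet or database on a schedule, so that changes to published products and rates are captured over time.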
23. Example of website which publishes
relevant real-time content: Equity Bank
24. Opportunities
• Despite the small number of contextually relevant tweets available in
2013, a Kenya-specific Twitter culture is emerging.
• Twitter is being used to seek, access and share information about
loans, especially mobile loans, as well as to comment on the news
related to personal and business loans.
• Much of the chatter is related to M-Shwari and other Safaricom
products. A future iteration of the monitors could focus solely on
non-traditional banking or exclude M-shwari to focus solely on
traditional bank loans.
• Monitoring banks’ Twitter handles could provide insight into (1)
products & services available, and, to a lesser extent, (2) information
seeking behavior.
• Chance to get in early and set up monitors, rather than
reverse-engineer the analysis later
25. Challenges
• Kenya’s current digital landscape: the small quantity of relevant social
media data restricts its utility (small changes, for example due to
a popular retweet or the behavior of one Twitter user, can create
spikes in the charts)
• The social media chatter is largely driven by news/events or
information-seeking, rather than being substantively about loans
• There is a lot of “noise” in the data. For KCB, this noise includes
sports chatter. For both banks, this includes news items related
to the overall business of the bank, not necessarily directly
related to bank services.
• In the short term, social data can likely provide only supplementary
insight about barriers to finance in Kenya (e.g. analysis could be
useful for revealing early themes or trends, to inform the topics
for focus groups to validate).
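The first challenge above, where a single retweeted post or one very active user creates a spike, can be illustrated with a small Python sketch. The posts are hypothetical, and the two damping rules (dropping retweets, capping each user at one post per day) are assumptions chosen for illustration rather than steps taken in this study.

```python
from collections import Counter, defaultdict

# Hypothetical posts as (day, user, text) tuples.
posts = [
    ("2013-03-01", "u1", "Need a loan for my shop"),
    ("2013-03-02", "u2", "Bank loan rates are up"),
    ("2013-03-02", "u3", "RT @u2: Bank loan rates are up"),
    ("2013-03-02", "u4", "RT @u2: Bank loan rates are up"),
    ("2013-03-02", "u5", "loan loan loan"),
    ("2013-03-02", "u5", "anyone got a loan?"),
    ("2013-03-02", "u5", "still need that loan"),
]


def daily_counts(posts, drop_retweets=False, max_per_user=None):
    """Count posts per day, optionally skipping retweets and capping users."""
    counts = Counter()
    per_user = defaultdict(int)
    for day, user, text in posts:
        if drop_retweets and text.startswith("RT @"):
            continue
        if max_per_user is not None:
            if per_user[(day, user)] >= max_per_user:
                continue
            per_user[(day, user)] += 1
        counts[day] += 1
    return dict(counts)


raw = daily_counts(posts)
damped = daily_counts(posts, drop_retweets=True, max_per_user=1)
```

In the raw counts the second day looks six times busier than the first, but only two distinct users posted original content that day, which is what the damped counts show.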
26. What comes next?
• Big data projects work best when there are several iterations and collaboration
between topical domain experts (who understand both the context and the
programmatic information gaps or needs) and data scientists/analysts.
• There is a need to be imaginative in how new data sources could be used to supplement
traditional data-collection and decision-making processes within organizations.
• While social media is on the rise in Kenya, it is also clear that, for the purposes of
informing research on financial inclusion, it is not saturated enough yet.
• If this project were to be extended in Kenya, other digital data sources might be
more useful for exploration. Accessing that data would require establishment of
partnership agreements (in the case of mobile phone data), or new capacities (in
the case of scraping data from websites).
• If DCA is interested in beginning to use Twitter data to inform its work
now, it might be a good idea to pilot the methodology in a country
that has a stronger social media culture.