TEXT
MINING
Team 4
Syed Aqib Ali
Syeda Ramsha Habib Gilani
Lateefah Omoyosola Yusuf
Rochelle Star Velasquez
TABLE OF CONTENTS
1. What is Text Mining?
2. Introduction
3. Main Models Used
4. Key Contributions
5. Marketing and Non-marketing Applications
6. Limitations
7. Avenues for future research
8. Key Takeaways
WHAT IS TEXT MINING?
Text mining is the process of extracting high-quality,
meaningful information and patterns from text.
Text analysis involves information retrieval, lexical
analysis to study word frequency distributions, pattern
recognition, information extraction, data mining
techniques including link and association analysis,
visualization, and predictive analytics.
INTRODUCTION
● A research study (Netzer, Lemaire, & Herzenstein, 2019)
applying text mining and machine learning tools to
loan applications.
● The authors find that loan applicants' choice
of words reveals insights into their intentions,
circumstances, and personality.
● This information is powerful in predicting
loan repayment, going beyond typical
financial and demographic factors.
Setting and Data
1. Potential borrowers on Prosper submit a loan request for a specific
amount, with a specific maximum interest rate they are willing to pay.
2. The requested loan amount must fall within a set range (between $1,000 and
$25,000 in the data).
3. Prosper verifies all financial information, including the potential
borrower’s credit score.
Textual, Financial, and Demographic Variables
1. Textual variables:
a. The number of characters in the title and the text box.
b. The percentage of words with six or more letters.
c. SMOG (Simple Measure of Gobbledygook): measures writing quality by mapping it to the
number of years of formal education needed to easily understand the text on first reading.
d. Count of spelling mistakes.
e. Bigrams: two-word combinations (these help to capture context and patterns).
2. Financial variables:
a. Loan amount, borrower’s credit grade, debt-to-income ratio.
3. Demographic variables:
a. Gender, age, location, race.
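Several of the textual variables listed above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `text_features`, the regex tokenizer, and the vowel-group syllable heuristic used for SMOG are all assumptions made for the example. Spelling-mistake counts are left out because they need an external dictionary.

```python
import math
import re

def text_features(text):
    """Compute simple textual variables for one loan request (illustrative)."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Percentage of words with six or more letters.
    pct_long = 100.0 * sum(len(w) >= 6 for w in words) / len(words) if words else 0.0

    # Crude syllable estimate: count groups of consecutive vowels (an assumption).
    def syllables(word):
        return max(1, len(re.findall(r"[aeiouy]+", word)))

    polysyllables = sum(syllables(w) >= 3 for w in words)
    # Standard SMOG grade formula, scaled to a 30-sentence sample.
    smog = 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291 if sentences else 0.0

    # Bigrams: adjacent two-word combinations.
    bigrams = list(zip(words, words[1:]))

    return {
        "n_chars": len(text),
        "pct_long_words": pct_long,
        "smog": smog,
        "bigrams": bigrams,
    }

features = text_features("I need to borrow money. I pay all bills on time.")
print(features["n_chars"], round(features["pct_long_words"], 2), features["bigrams"][:2])
```

The SMOG value here is only as good as the syllable heuristic, so it should be read as a rough readability grade rather than an exact one.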
PROCESS OF
TEXT MINING
Process 01
The tm package in R was used to select the
distinct words in each loan application.
Process 02
Porter’s stemming algorithm was used to collapse
variations of a word into one stem, e.g., “borrower,”
“borrowed,” “borrowing,” and “borrowers”
become “borrow” (3.5M words → 30,920 unique
words and 1,052 bigrams).
Process 03
The PyEnchant 1.6.6 package in Python was
used to count spelling mistakes in the
loan applications. Misspellings may serve
as a proxy for characteristics correlated
with lower income.
Process 04
The authors used term frequency-inverse
document frequency (tf-idf) to weigh how often
a word is used in a loan request against how
often it is used across all loan requests,
accounting for the length of the request.
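As a rough illustration of the tf-idf weighting, the sketch below uses one common variant: term frequency divided by document length, multiplied by the log of the inverse document frequency. The slides do not state the exact formula the authors used, so the variant, the function name `tf_idf`, and the toy token lists are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # tf is normalized by document length; idf down-weights common terms.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy, already-tokenized loan requests (invented for illustration).
requests = [
    ["need", "loan", "fix", "roof"],
    ["need", "loan", "pay", "bills"],
    ["loan", "promise", "pay", "back"],
]
scores = tf_idf(requests)
print(scores[0])
```

A word such as "loan" that appears in every request gets weight zero, while a word that is rare across requests, such as "roof", gets the highest weight in its document.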
MAIN MODELS USED
MODEL 1 - Predictive model
Aim:
To evaluate whether the text used by borrowers in their loan application predicts
their loan default.
Machine Learning Methods:
Ensemble stacking approach
1. Train each model on the calibration data (two logistic regressions and three
tree-based methods).
2. Build a weighting model to combine the models calibrated in the first step.
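The two-step stacking idea can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' setup: it uses two small logistic base models (one per feature) instead of two logistic regressions plus three tree-based methods, a logistic regression as the weighting model, and invented toy data. All function and variable names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted default probability for one feature vector."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy rows: [financial_score, text_score]; label 1 = default (all values invented).
part1_X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.7, 0.6], [0.8, 0.9], [0.9, 0.7]]
part1_y = [0, 0, 0, 1, 1, 1]
part2_X = [[0.2, 0.3], [0.4, 0.2], [0.6, 0.8], [0.9, 0.8]]
part2_y = [0, 0, 1, 1]

# Step 1: train each base model (here, one per feature) on the first part.
base_models = [fit_logistic([[row[k]] for row in part1_X], part1_y) for k in range(2)]

def base_predictions(row):
    return [predict(model, [row[k]]) for k, model in enumerate(base_models)]

# Step 2: fit a weighting model on the base models' predictions for held-out rows.
combiner = fit_logistic([base_predictions(row) for row in part2_X], part2_y)

def stacked_probability(row):
    return predict(combiner, base_predictions(row))

high_risk = stacked_probability([0.85, 0.9])
low_risk = stacked_probability([0.15, 0.1])
print(round(low_risk, 3), round(high_risk, 3))
```

Fitting the weighting model on rows the base models did not see is a common way to keep the combiner from simply rewarding whichever base model overfits most.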
Result
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of Marketing Research,
56(6), 960-980.
MODEL 2 - Words and writing styles of defaulted loan requests
Aim:
Learn which words, writing styles, and general ideas conveyed by the text are more
likely to be associated with defaulted loan requests.
Machine Learning Methods:
1) Machine learning tools
Naive Bayes
L1-regularized binary logistic model
2) Standard econometric tools
Logistic regression on the topics extracted from
a latent Dirichlet allocation (LDA) analysis
and on the sub-dictionaries of the Linguistic
Inquiry and Word Count (LIWC) dictionary.
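To give a feel for how a naive Bayes style analysis can rank words, the sketch below computes a smoothed log ratio of each word's probability in defaulted versus repaid requests. It is a minimal illustration, not the authors' model: the function name `word_log_odds`, the add-one smoothing, and the toy labels are assumptions, and the toy data are invented rather than taken from the study.

```python
import math
from collections import Counter

def word_log_odds(defaulted_docs, repaid_docs, alpha=1.0):
    """Smoothed log ratio of P(word | defaulted) to P(word | repaid)."""
    default_counts = Counter(w for doc in defaulted_docs for w in doc)
    repaid_counts = Counter(w for doc in repaid_docs for w in doc)
    vocab = set(default_counts) | set(repaid_counts)
    n_default = sum(default_counts.values())
    n_repaid = sum(repaid_counts.values())
    scores = {}
    for word in vocab:
        # Add-alpha smoothing keeps words unseen in one class from giving log(0).
        p_default = (default_counts[word] + alpha) / (n_default + alpha * len(vocab))
        p_repaid = (repaid_counts[word] + alpha) / (n_repaid + alpha * len(vocab))
        scores[word] = math.log(p_default / p_repaid)
    return scores

# Toy tokenized requests with invented labels (not results from the study).
defaulted = [["promise", "pay", "back"], ["promise", "need", "help"]]
repaid = [["pay", "bills", "time"], ["pay", "loan", "time"]]

scores = word_log_odds(defaulted, repaid)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word:8s} {score:+.2f}")
```

Positive scores mark words that are relatively more frequent in the defaulted group of this toy sample; negative scores mark words more frequent in the repaid group.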
Result
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of
Marketing Research, 56(6), 960-980.
MODEL 3 - Potential Borrower’s Personality
Aim:
Further exploration of potential traits and states of borrowers.
Machine Learning Methods:
Applying the LIWC dictionaries.
Results:
Defaulting loan requests are written in a manner consistent
with the writing styles of extroverts and liars.
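A dictionary-based analysis of this kind reports, for each category, the percentage of words in a text that belong to that category's word list. The sketch below shows the mechanics only. The real LIWC word lists are proprietary, so the `CATEGORIES` mini-dictionary, its category names, and the function name `category_shares` are hypothetical stand-ins.

```python
import re

# Hypothetical mini-dictionary; the real LIWC word lists are proprietary.
CATEGORIES = {
    "social": {"family", "friend", "wife", "husband", "boys"},
    "religion": {"god", "bless", "pray"},
    "money": {"loan", "pay", "bills", "borrow"},
}

def category_shares(text, categories=CATEGORIES):
    """Percentage of words in the text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(w in vocab for w in words) / len(words)
        for name, vocab in categories.items()
    }

shares = category_shares("Thank you, God bless you, and I promise to pay you back.")
print(shares)
```

The resulting percentages can then be used as explanatory variables, for example in a logistic regression on default.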
KEY CONTRIBUTIONS
Analyzing applications
Borrower 1: “I am a hard working person, married for 25 years, and have
two wonderful boys. Please let me explain why I need help. I would use
the $2,000 loan to fix our roof. Thank you, God bless you, and I promise to
pay you back.”
Borrower 2: “While the past year in our new place has been more than
great, the roof is now leaking and I need to borrow $2,000 to cover the
cost of the repair. I pay all bills (e.g., car loans, cable, utilities) on time.”
Which borrower is more likely to default?
KEY CONTRIBUTIONS
Textual information
in the loan request
significantly helps
predict loan default.
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of
Marketing Research, 56(6), 960-980.
KEY CONTRIBUTIONS
Words indicative of
loan repayment.
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of
Marketing Research, 56(6), 960-980.
KEY CONTRIBUTIONS
Defaulting loan requests mimic the
writing styles of extroverts and liars.
KEY CONTRIBUTIONS
Evidence that people with different
educational backgrounds and
economic situations use words
differently.
KEY CONTRIBUTIONS
Evidence that text can supplement
traditional measures and replace
some aspects of them.
KEY CONTRIBUTIONS
Help lenders avoid defaulting borrowers
and help borrowers better express
themselves when requesting a loan.
MARKETING AND
NON-MARKETING
APPLICATIONS
MARKETING APPLICATIONS
• Sentiment analysis
• Brand monitoring
• Customer feedback analysis
• Churn prediction
• Predictive analysis
• Market research
• Personalized marketing
• Social media analytics
NON-MARKETING APPLICATIONS
• Psychological profiling
• Fraud detection
• Credit risk assessment
• Customer service
LIMITATIONS
1. Text data may not be available for all loan
applications, as some borrowers may not
provide any text or may provide incomplete
or inaccurate information.
2. Text data may be subject to
interpretation and bias, as different lenders
may interpret the same text differently
based on their own biases and assumptions.
3. The use of text data to predict loan
default raises ethical and legal concerns.
FURTHER RESEARCH
● Whether the predictive ability of text
analysis regarding future behavior extends
to other behaviors and industries.
● Extension of results to other types of
communication, e.g., phone calls
and online chats.
● How word usage can change
over time.
FURTHER RESEARCH
● Explore the role of emotions and
mental states in financial behaviors.
● Investigate the impact of different
writing styles on loan default.
● Apply the findings to other
loan types and platforms.
● Develop more accurate and
efficient text-mining and machine
learning tools for analyzing loan
applications.
KEY TAKEAWAYS
● Text mining and machine learning tools can be
employed to infer borrowers' psychographics and to
predict the likelihood of future loan defaults.
KEY TAKEAWAYS
● The LIWC dictionaries associated with
extroversion and deception are significantly
correlated with default.
KEY TAKEAWAYS
● There may be variables that are affected by
both the observable text and unobservable
personality traits.
Thank you
for your
attention!

Text Mining - Advanced Customer Analytics
