Dr Luke Blaxill (Leverhulme Early Career Research Fellow, Anglia Ruskin University) - Negotiated Texts Network Seminar (29 June 2017)
Historians are increasingly surrounded by an ever-growing forest of machine-readable textual sources. The old challenge of scarcity has been replaced by that of abundance. Despite this, the impact of text mining in History has been remarkably weak. Historians, who continue to be extremely interested in language, continue overwhelmingly to prize sharply focussed analyses based on close readings. Macroscopic computational approaches based on large (or even small) corpora remain at the fringes, despite the traditional barrier of cost and manpower being considerably mitigated by the march of technology.
This paper explores the transformative potential of text mining in this field in two areas of political language where large corpora have become available. The first example is based on election platform speeches in the late Victorian and Edwardian era. In this age of emerging democracy, even local constituency candidates would routinely hold over a hundred public meetings in an election campaign, speaking at length to large audiences which were often reported very thoroughly by a diligent and wordy press. I argue that even very simple text mining techniques in a relatively small corpus (4 million words) can challenge historical consensus on the contents of general election campaigns, on the significance of issues such as imperialism and Irish Home Rule, and the respective visibility of party leaders such as Gladstone and Disraeli.
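As a rough illustration of what "very simple text mining techniques" can mean in practice, the sketch below counts how often a keyword occurs per million words spoken by each party. The three-sentence corpus, the `tokenize` and `per_million` helpers and the party labels are all hypothetical stand-ins, not the speaker's corpus or method.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: (party, text) pairs standing in for reported speeches.
speeches = [
    ("Liberal", "Home Rule for Ireland is the question of the hour. Ireland must be heard."),
    ("Conservative", "The Empire must be defended, and the Union with Ireland preserved."),
    ("Conservative", "Our Empire is the pride of the nation."),
]

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def per_million(speeches, keyword):
    """Return {party: occurrences of keyword per million words spoken by that party}."""
    hits, totals = Counter(), Counter()
    for party, text in speeches:
        tokens = tokenize(text)
        totals[party] += len(tokens)
        hits[party] += tokens.count(keyword)
    return {party: 1_000_000 * hits[party] / totals[party] for party in totals}

rates = per_million(speeches, "empire")
for party, rate in sorted(rates.items()):
    print(f"{party}: {rate:,.0f} per million words")
```

Normalising by total words matters because parties and years are unevenly represented in any newspaper-derived corpus, so raw counts alone would mislead.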
The second is an analysis of the language of women MPs in Parliament since 1945. Drawing upon the outputs of the Digging into Linked Parliamentary Data ('Dilipad') project – which has added gender and party coding to the digital edition of Hansard – I present a wide-ranging empirical analysis of the role of gender in the 677 million words of Commons debates from 1945 to 2015. I investigate whether there is strong evidence to support the central feminist claim that women's contributions to Commons debates are substantively different to those of men, ask whether the 'gender effect' has been strengthening or weakening as the number of women in Parliament has increased since the 1997 election, and examine the effect of party, such as the oft-made claim that Labour (with its greater proportion of female MPs and ideological sympathy for feminism) was more focussed on representing women than the Conservatives.
Overall, I argue that these techniques, whether used conservatively in a supplementary capacity alongside traditional approaches, or more boldly to lead analysis, have considerable potential to reshape historians' work in the digital age. They allow us to analyse texts too large to read, help overcome flaws in the human ability to estimate frequency intuitively, and allow greater verifiability, more precise communication of quantity, and a more empirical approach to working.
A War of Words? Text Mining Political Speeches in Britain in the 19th and 20th Centuries
1. Negotiated Texts Network Seminar. A War of Words? Text Mining Political Speeches in Britain in the 19th and 20th Centuries
Dr. Luke Blaxill
Leverhulme Early Career Research Fellow
luke.blaxill@anglia.ac.uk
12. ‘Gladstone’: Top Contexts
Foreign Policy Weakness 42%; Inferiority to Disraeli 17%; Good orator 8%
Disunity 16%; Disestablishment 11%; General Gordon 7%; Financial incompetence 7%
Irish Home Rule 57%; Liberal Unionists 11%; Abandoning Land Reform 7%
Irish Home Rule 56%; Disunity 14%; Newcastle Programme 12%; General Greatness 11%
1880: General Greatness 20%; Financial competence 17%; Bringer of Peace 14%; Superiority to Disraeli 14%
1885: Manifesto/Programme 21%; Party unity 21%; General Greatness 20%
1886: Irish Home Rule 40%; General Greatness 29%; Party unity 17%; Superiority to Chamberlain 13%
1892: Irish Home Rule 39%; General Greatness 30%; Newcastle Programme 12%
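Context tallies of this kind are typically built by pulling every occurrence of a keyword out of the corpus together with its surrounding words, then classifying each hit by hand. A minimal keyword-in-context (KWIC) sketch follows. The sample sentences and the `kwic` helper are hypothetical, and the slides do not say which tool produced the original figures.

```python
import re

def kwic(text, keyword, width=5):
    """Yield (left context, keyword, right context) for each hit of keyword in text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield left, tok, right

# Invented sample text, for illustration only.
sample = ("Mr Gladstone has abandoned General Gordon to his fate. "
          "We cannot trust Gladstone with the finances of this country.")

lines = list(kwic(sample, "Gladstone", width=4))
for left, hit, right in lines:
    print(f"{left:>35} [{hit}] {right}")
```

Each concordance line can then be assigned a context label (for example "General Gordon" or "Financial incompetence"), and the labels counted to give percentages.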
20. Topic %
Macroeconomics 12.30
Civil Rights, Immigration 3.26
Health 6.27
Agriculture 4.02
Labour and Employment 3.78
Education 4.93
Environment 1.76
Energy 3.25
Transportation 6.39
Law, Crime, and Family 6.36
Social Welfare 3.90
Planning and Housing 5.43
Banking, Finance 4.12
Defence 11.61
Space, Science, Technology 1.41
Foreign Trade 0.61
International Affairs 8.36
Government Operations 11.10
Colonial and Territorial Issues 1.14
Table 1: Percentage of speeches delivered in the Commons since 1945 by CAP topic codes
21. Variable Name - Explanation
Gender (Female=1) - Dummy variable: the main explanatory variable of interest.
Party (Labour=1) - Dummy variable, with Conservative=0, Labour=1, and Liberal=2. We excluded the other parties because they didn’t have a consistent presence in the House of Commons since 1945.
Party Status (Government=1) - Dummy variable that indicates whether the member sits on the government or opposition benches.
Seniority - Integer variable that counts the number of times a member was elected.
Portfolio (Responsible=1) - Dummy variable: set to 1 for members whose portfolio applies to the specific policy area. It is not surprising, for example, that the Health Secretary will talk more about Health issues, and this variable allows us to control for this.
Topic Typology - Dummy variable that aligns with the typology introduced in table ??. Soft issues are coded as 0; hard issues as 1; and all others as 2.
Parliamentary Session - Categorical Variable: Each parliamentary s
Total Speeches (Exposure) - Integer variable that records the total number of speeches across all topics.
Table 3: Variables that are included in our model
22. Topic                     Gender (Female=1)   Party (Lab)
Macroeconomics                -4.941 ***          -0.732
Civil Rights, Immigration      7.712 ***           0.458
Health                         6.703 ***           1.975 *
Agriculture                    0.507              -4.153 ***
Labour and Employment         -2.532 *             7.423 ***
Education                      2.458 *             1.401
Environment                    0.336              -3.381 **
Energy                        -4.559 ***           6.038 ***
Transportation                -2.131 *            -0.508
Law, Crime, and Family         4.628 ***          -0.306
Social Welfare                 5.534 ***           2.422 *
Planning and Housing           0.371               3.089 **
Banking, Finance              -2.631 **           -3.332 **
Defence                       -5.037 ***          -2.466 *
Space, Science                -3.419 **            0.261
Foreign Trade                 -0.940              -0.357
International Affairs         -1.646              -3.703 ***
Government Operations         -4.352 ***           1.568
Commonwealth Issues           -1.111              -2.079 *
Table 4: z-scores and significance for the gender and party variables. * p < 0.05, ** p < 0.01, *** p < 0.001. Standard errors are clustered by MP.
We therefore code as ‘Male Topics’: Macroeconomics; Energy; Defence; Space Science; Govt Operations
We therefore code as ‘Female Topics’: Civil Rights + Immigration; Health; Law, Crime and Family; Social Welfare; Education
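To make the z-scores in Table 4 concrete, here is a deliberately simplified sketch of how a single z-score for a gender effect on speech counts could be computed. It assumes independent Poisson counts with total speeches as the exposure, and it ignores the control variables and the MP-level clustering described in the slides, so it is not the model behind the reported figures. The counts are hypothetical.

```python
import math

def rate_ratio_z(count_a, exposure_a, count_b, exposure_b):
    """Wald z-score for the log rate ratio of group A versus group B.

    Assumes independent Poisson counts; the standard error of the log rate
    ratio is then sqrt(1/count_a + 1/count_b).
    """
    log_rr = math.log((count_a / exposure_a) / (count_b / exposure_b))
    se = math.sqrt(1 / count_a + 1 / count_b)
    return log_rr / se

def stars(z):
    """Map a z-score to the conventional two-sided significance stars."""
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else ""

# Hypothetical counts: health speeches out of all speeches, by gender.
z = rate_ratio_z(count_a=120, exposure_a=1000, count_b=600, exposure_b=9000)
print(f"z = {z:.2f} {stars(z)}")
```

A positive z here means group A speaks on the topic at a higher rate than group B relative to its total number of speeches, which is the role the exposure variable plays in the model.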
23. Figure 1: ratio of speeches on civil rights, health, education and social welfare. Red line represents the ratio for all MPs, black line for male MPs and grey line for female MPs
25. HEALTH DEBATES: z-score and topic words
6.31 abort pregnanc women foetus termin contracept woman rape babi gynaecologist
2.87 cancer research transplant cell organ screen fluorid human donor blood
2.83 smoke tobacco alcohol cigarett advertis drink ban smoker pub product
0.98 vaccin osteopath immunis whoop acupunctur hairdress osteopat regi cough measl
-5.29 nhs bill trust mental amend nurs carer communiti doctor claus
Table 5: Z-scores for topics most related to gender. Topic model is trained on all health debates.
26. Figure 7: predicted number of speeches for male (left) and female MPs (right). The blue line represents the ‘male’ topics, the red line the ‘female’ topics. Dotted lines show the 95% confidence intervals.
30. Education Issue in 1885: top contexts
Liberal mentions of School, Child, Education (163 mentions): score and percentage of total mentions
Poor people priced out of education: 34 (21%)
General expressions of support in favour of free education: 34 (21%)
Will give dignity to poor/will help poor: 19 (12%)
Improve social mobility of poor: 14 (9%)
Reassurance that religious aspect to education will be kept: 9 (6%)
Conservative mentions of School, Child, Education (146 mentions): score and percentage of total mentions
Weaken voluntary schools: 26 (18%)
Expensive, wasteful: 24 (16%)
Highlighting Liberal attack on religious basis of education: 19 (13%)
Stops people being stakeholders*: 13 (9%)
Criticising compulsory and universal education: 16 (11%)
Poor standard of board schools: 5 (3%)
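The percentage column in these context tables is each score divided by the party's total mentions. The short sketch below reproduces the Liberal figures from the slide as a check on the arithmetic. The `percentage_shares` helper is an illustrative name, not something taken from the slides.

```python
def percentage_shares(scores, total):
    """Return each context's share of all mentions, rounded to a whole percent."""
    return {context: round(100 * n / total) for context, n in scores.items()}

# Scores for Liberal mentions of School, Child, Education in 1885 (from the slide).
liberal_total = 163
liberal_scores = {
    "Poor people priced out of education": 34,
    "General expressions of support in favour of free education": 34,
    "Will give dignity to poor/will help poor": 19,
    "Improve social mobility of poor": 14,
    "Reassurance that religious aspect to education will be kept": 9,
}

liberal_shares = percentage_shares(liberal_scores, liberal_total)
listed = sum(liberal_scores.values())

for context, pct in liberal_shares.items():
    print(f"{pct:>3}%  {context}")
print(f"Listed contexts cover {listed} of {liberal_total} mentions")
```

Because only the top contexts are listed, the scores account for 110 of the 163 Liberal mentions, so the percentages are not expected to sum to 100.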
31. Church Disestablishment in 1885: top contexts
Context of Liberal mentions of Church (105 total): score and percentage of total mentions
Proclamations in favour of Disestablishment: 37 (35%)
Disestablishment as a route to Religious Equality: 17 (16%)
Candidate distancing themselves from Disestablishment: 10 (10%)
Attacks on Conservatives for politicising the Church: 9 (9%)
Context of Conservative mentions of Church (111 total): score and percentage of total mentions
Attacks on Liberals for trying to weaken/abolish Church: 36 (32%)
General vows to protect Church: 31 (28%)
Benefits of Church (Education): 5 (5%)
Benefits of Church (Church sponsored charities): 11 (10%)
Benefits of Church (Classless, available for all Classes): 10 (9%)
Benefits of Church (Education): 3 (3%)
Benefits of Church (Improvement of character): 3 (3%)
32. Mentions of Tariff Reform in East Anglia, Jan 1910 compared to 1906: Liberals
33. Mentions of Tariff Reform in East Anglia, Jan 1910 compared to 1906: Liberals + Conservatives