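How to Use Propensity Score Analysis in Clinical Epidemiology Research: Studying Treatment Effects in Observational Studies

Yasuyuki Okumura (奥村泰之)
Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science
4th Clinical Research Practice Course Workshop
2020/2/27 (Thu) 12:30-16:30
JAM Kinzoku Rodo Kaikan, 3rd floor, meeting rooms 301 and 302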

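Outline
• Introduction
• Worked example
• Types of effect
• Selecting covariates
• Estimating the propensity score
• Using the propensity score (1): overview
• Using the propensity score (2): matching
• Using the propensity score (3): weighting
• Assessing balance
• Estimating the effect
• Loose ends
• Reporting guidelines
• Data analysis in R (1): matching
• Data analysis in R (2): weighting
• Closing remarks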

English terminology
Propensity (Score) Analysis/Methods/Matching
Matching Methods
Weighting Methods

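Rapid growth in papers from Japan
[Figure: bar chart of the number of PubMed records per publication year (x-axis ticks 1999 to 2017, y-axis 0 to 150). Counts stay at 0 or 1 per year in the earliest years and reach 71, 81, 97, 134, and 166 in the five most recent years.]
Search date: 2019/01/7
Search query: (propensity score*[tiab] OR propensity match*[tiab] OR propensity analy*[tiab]) AND (japan*[tiab])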

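Purposes of propensity score analysis
• After the outcome is measured: run the propensity score analysis to reduce selection bias and examine the effect of a treatment
• Before the outcome is measured: run the propensity score analysis to restrict the population that will be followed up
Stuart EA: Stat Sci 25:1-21, 2010.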

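Roles of variables and levels of measurement
One outcome
  Quantitative variable / qualitative variable / time to event
  • Quality of life, alive/dead, survival time
One treatment variable
  Nominal scale (mostly two levels)
  • Treated group / control group
One or more covariates
  Quantitative variables / qualitative variables
[Figure: diagram linking covariates, treatment variable, and outcome]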

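Workflow of a propensity score analysis
1. Select covariates
2. Estimate the propensity score
3. Use the propensity score (matching / stratification / weighting / covariate adjustment)
4. Assess balance
5. Estimate the effect
6. Interpret the effect
The steps are iterated as needed.
Ali MS et al: J Clin Epidemiol. 2014 Nov 26. pii: S0895-4356(14)00347-3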

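Mortality risk associated with haloperidol use in acute myocardial infarction
Park Y et al: BMJ. 2018 Mar 28;360:k1218.

Research question
Patient
  Inpatients aged 18 years or older with acute myocardial infarction who were prescribed an antipsychotic
Exposure
  Use of oral haloperidol
Comparison
  Use of oral risperidone/olanzapine/quetiapine
Outcome
  In-hospital death within 7 days of starting the prescription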

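Background (1)
• The safety of antipsychotics for behavioral and psychological symptoms of dementia has been studied
• In 2005 and 2008, the US FDA added boxed warnings to the labels stating that both typical and atypical agents increase the risk of death
• At that time, however, there was not enough evidence to conclude whether typical agents carry the higher mortality risk
• Later, studies in outpatient and nursing-home settings showed that typical agents carry the higher mortality risk

Background (2)
• In hospital, antipsychotics are frequently used to manage delirium, but their safety has not been adequately studied
• Antipsychotics have adverse cardiovascular effects such as QTc prolongation and arrhythmia
• Patients hospitalized for heart disease are therefore expected to be more vulnerable to the risks of antipsychotics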

Data source
Database name       Premier Research Database
Country             United States
Coverage            Discharge data from 700 hospitals
Representativeness  About 20%

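Eligibility criteria
• Aged 18 years or older
• Primary diagnosis of acute myocardial infarction
• Use of an oral antipsychotic (haloperidol/risperidone/olanzapine/quetiapine)
• Hospital stay of at least 3 days
• No antipsychotic use on at least 2 of the first 3 hospital days
• Not receiving two or more antipsychotics when the antipsychotic was started
• No diagnosis of schizophrenia or bipolar disorder
• No coronary artery bypass surgery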

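Treatment variable
Treated group
  Use of a typical agent (oral haloperidol)
Control group
  Use of an atypical agent (oral risperidone/olanzapine/quetiapine)
Definition of treatment discontinuation
  No prescription for 2 days or more
Definition of treatment change
  Change of drug class (typical ➠ atypical)
  Use of a non-oral antipsychotic

Definitions of outcome and follow-up
Outcome
  In-hospital death
Follow-up period
  7 days from the start of antipsychotic use
Censoring
  Death, discharge, or end of follow-up

Analysis populations
Primary analysis: ITT* population
  *intention-to-treat
  Treatment changes and discontinuations are ignored
Secondary analysis: as-treated population
  Treatment changes and discontinuations are treated as censoring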

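Covariates
• Patient characteristics
  Sex, age, race, etc.
• Hospital characteristics
  Region, number of beds, population, etc.
• Chronic conditions (discharge summary)
  Charlson index, Parkinson's disease, etc.
• Medications (before the index date)
  Antiplatelets, anticoagulants, intravenous heparin, etc.
• Procedures (before the index date)
  Percutaneous coronary intervention, intra-aortic balloon, etc.
• Other
  Days until the antipsychotic was started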

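Flow diagram
Inpatients aged 18 years or older with an antipsychotic prescription between 2003 and 2014 (n = 125264)
➠ Primary diagnosis of acute myocardial infarction (n = 64140)
➠ Oral exposure or comparator drug (n = 28303)
   Excluded:
   • Length of stay < 3 days (n = 1688)
   • Antipsychotic on hospital day 1 or 2 (n = 17434)
   • Non-oral antipsychotic (n = 477)
   • Schizophrenia/bipolar disorder (n = 783)
   • Coronary artery bypass surgery (n = 1342)
   • Missing baseline characteristics (n = 1)
➠ Study cohort (n = 6578)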

Propensity score estimated by logistic regression
[Figure: distributions of the estimated propensity score for haloperidol and for atypical antipsychotics]

Use of the propensity score
• Patients who received an atypical antipsychotic were matched to patients who received haloperidol using a 1:1 nearest neighbor matching algorithm with a caliper of 0.2 of the standard deviation of the propensity score on the logit scale.
In short: nearest neighbor matching / 1:1 / caliper 0.2 SD

Assessing balance (excerpt)
                      Before matching              After matching
Characteristic        Haloperidol   Atypical       Haloperidol   Atypical
                      (n=1668)      (n=4910)       (n=1659)      (n=1659)
Mean age              77.0          74.6           77.0          76.8
White                 68.9%         73.0%          69.0%         69.6%
Emergency admission   91.9%         94.2%          91.9%         91.7%
Electrocardiogram     1.0%          0.2%           0.5%          0.6%
Antiplatelet          90.1%         86.4%          90.1%         90.0%
Statin                65.3%         58.8%          65.3%         65.3%

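Effect estimate
Analysis population    Hazard ratio (95% confidence interval)
ITT*                   1.51 (1.22 to 1.85)
As-treated             1.93 (1.34 to 2.76)
*intention-to-treat
Haloperidol 7.8% vs. atypical agents 5.5%

Conclusion
Haloperidol has long been used to manage delirium. For patients with heart disease, who are vulnerable to haloperidol, atypical antipsychotics nevertheless appear to be less harmful.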

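Average treatment effect
Average Treatment Effect (ATE)
The difference in the expected outcome between every member of the population being treated and every member being a control
Schafer JL, Kang J: Psychol Methods 13: 279-313, 2008.

                           Outcome (Yi)
ID    Treatment (Zi)    Treated    Control    Di
1     Control           7.6        6.1        1.5
2     Treated           7.9        7.2        0.7
3     Control           4.1        5.2        -1.1
4     Treated           7.1        4.8        2.3
...
N     Control           8.3        6.9        1.4
Note) Values shown in blue on the slide are missing (unobserved) values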

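Average treatment effect for the treated
Average Treatment effect for the Treated (ATT)
The difference in the expected outcome, among the members of the population who were treated, between being treated and being a control
Schafer JL, Kang J: Psychol Methods 13: 279-313, 2008.

                           Outcome (Yi)
ID    Treatment (Zi)    Treated    Control    Di
1     Control           7.6        6.1        1.5
2     Treated           7.9        7.2        0.7
3     Control           4.1        5.2        -1.1
4     Treated           7.1        4.8        2.3
...
N     Control           8.3        6.9        1.4
Note) Values shown in blue on the slide are missing (unobserved) values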
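The two definitions can be checked against the rows printed in the table. The following is a minimal sketch, assuming that only the five displayed rows exist and that both potential outcomes are known for each person (in real data one of the two is always missing); the variable names are illustrative.

```python
# Rows shown on the slide: (id, treated?, outcome if treated, outcome if control).
# Assumption: both potential outcomes are known, which never holds in real data.
rows = [
    (1, False, 7.6, 6.1),
    (2, True, 7.9, 7.2),
    (3, False, 4.1, 5.2),
    (4, True, 7.1, 4.8),
    (5, False, 8.3, 6.9),  # the slide's row "N"
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Individual effect D_i = Y_i(treated) - Y_i(control)
effects = [(treated, y1 - y0) for _, treated, y1, y0 in rows]

ate = mean(d for _, d in effects)                    # averaged over everyone
att = mean(d for treated, d in effects if treated)   # averaged over treated members only

print(f"ATE = {ate:.2f}, ATT = {att:.2f}")
```

In this toy table the ATT is larger than the ATE because the two treated people happen to have large individual effects, which is the kind of difference that makes the choice of estimand matter.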

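Choosing the type of effect
Is it feasible to give the treatment to every patient who meets the eligibility criteria of the effectiveness study?
  Yes ➠ Average treatment effect (ATE)
  No ➠ Average treatment effect for the treated (ATT)
Desai RJ, Franklin JM: BMJ. 2019 Oct 23;367:l5657.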

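Usefulness of antithrombotic therapy for preventing stroke in atrial fibrillation (warfarin vs dabigatran)
Because the two drugs are interchangeable, every patient diagnosed with atrial fibrillation could receive dabigatran
➠ Average treatment effect (ATE)

Usefulness of a smoking-cessation booklet for smokers in primary care (booklet provided vs not provided)
Every smoker in primary care could receive the booklet at low cost
➠ Average treatment effect (ATE)
Austin PC: Multivariate Behav Res 46: 399-424, 2011.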

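Risk of congenital malformations from antipsychotics in pregnant women (antipsychotic vs no medication)
Because of safety concerns, not every pregnant woman with a diagnosis such as schizophrenia can receive an antipsychotic
➠ Average treatment effect for the treated (ATT)

Usefulness of smoking-cessation counseling for smokers (intervention vs no intervention)
Because of cost concerns, not every smoker can receive smoking-cessation counseling
➠ Average treatment effect for the treated (ATT)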

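How the propensity score is used and the type of effect
Method                  ATE    ATT
Matching                △      ○
Weighting               ○      ○
Stratification          ○      ○
Covariate adjustment    ○      ✕
ATE = average treatment effect; ATT = average treatment effect for the treated
Ali MS et al: Front Pharmacol. 2019 Sep 18;10:973.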

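Estimates differ by type of effect × method
Effect of CPAP in acute decompensated heart failure (use vs. non-use)
Method and type of effect    n       Difference in survival (95% confidence interval)
Matching, ATT                952     0.03 (−0.02, 0.08)
Weighting, ATT               4953    0.02 (0.01, 0.02)*
Weighting, ATE               4953    0.05 (−0.01, 0.12)
ATE = average treatment effect; ATT = average treatment effect for the treated
Pirracchio R et al: Stat Methods Med Res. 2016 Oct;25(5):1938-1954.
The type of effect should be chosen before the analysis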

Two reasons why estimates differ widely by type of effect
• Departure from the positivity assumption (violation or near violation of positivity assumption)
• Effects that are heterogeneous across the propensity score distribution (non-uniformity of treatment effect across the PS strata)
Pirracchio R et al: Stat Methods Med Res. 2016 Oct;25(5):1938-1954.

Reporting examples
Average treatment effect (ATE)
 In our study, the insured were men with health insurance whilst the
counterfactual group were those without insurance coverage. Under ideal
conditions, the effective strategy would be to obtain the average
effect of insurance coverage on prostate cancer screening, also known as
the average treatment effect (ATE).[1]
Average treatment effect for the treated (ATT)
 The question of interest was whether the treated group (the BIC
[Breakfast in the Classroom] schools) had different outcomes than they
would have if not provided with the BIC program or the average
treatment effect among the treated;[2]
36
[1] Kangmennaang J, Luginaah I: J Cancer Epidemiol. 2016;2016:7284303.
[2] Anzman-Frasca S et al: JAMA Pediatr. 2015 Jan;169(1):71-7.

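Methods for selecting covariates
• Domain-specific knowledge[1] (most important)
• Statistical testing[1]
• Strength of association[1]
• High-dimensional propensity score[2]
[1] Ali MS et al: Front Pharmacol. 2019 Sep 18;10:973.
[2] Jackson JW et al: Curr Epidemiol Rep. 2017 Dec;4(4):271-280.
[Figure: diagram linking covariates, treatment variable, and outcome]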
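Covariate types (1): confounding variable
Confounder (X): a variable that determines both the treatment (Z) and the outcome (Y)
Example: acetaminophen prescription at age 1 (Z), asthma (Y), viral infection (X)
A variable that should be adjusted for
Ali MS et al: Am J Clin Nutr. 2016 Aug;104(2):247-58.
Williams TC et al: Pediatr Res. 2018 Oct;84(4):487-493.

Covariate types (2): risk factor
Risk factor (R): a variable associated with the outcome (Y) but not with the treatment (Z)
Example: type of antipsychotic (Z), death (Y), age (R)
A variable that should be adjusted for

Covariate types (3): instrumental variable
Instrumental variable (IV): a variable strongly associated with the treatment (Z) but not directly related to the outcome (Y)
Example: surgical procedure* (Z), death (Y), share of each procedure chosen in the previous year (IV)
*carotid endarterectomy vs carotid artery stenting
A variable that should not be adjusted for
Columbo JA et al: JAMA Netw Open. 2018 Sep 7;1(5):e181676.

Covariate types (4): intermediate variable
Intermediate variable (I): a post-treatment variable that is affected by the treatment (Z) and lies on the causal path between treatment and outcome
Example: preeclampsia (Z), cerebral palsy in the child (Y), preterm birth (I)
Ananth CV, Schisterman EF et al: Am J Obstet Gynecol. 2017 Aug;217(2):167-175.

Covariate types (5): collider
Collider (C): a variable affected by two or more factors
Example: obesity (Z), death (Y), diabetes (C), unmeasured confounders such as genetics (U)
Zahir SF et al: Diabetologia. 2019 May;62(5):754-758.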

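Points to note when selecting covariates
① Covariates must truly be pre-treatment variables[1]
② Lean toward including more covariates[2]
③ Choose covariates that are measured reliably[3]
④ When categorizing a covariate, use clinically meaningful cut-offs[4]
⑤ When a nonlinear relationship is expected, add quadratic or cubic terms[4]
⑥ Run sensitivity analyses, for example by varying the period over which covariates are measured[1]
[3] Harris H, Horst SJ: Pract Assess Res Eval. (2016) 21:1–11
[4] Yang JY et al: Gastrointest Endosc. 2019 Sep;90(3):360-369.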

記載例①共変量選択理由
 We identified potential confounders that were plausibly associated with
both the choice of antipsychotic and the risk of in-hospital death based
on clinical knowledge, using information from hospital admission to the
day before initiation of an antipsychotic. In addition to hospital
characteristics (teaching, urban), the covariates included patient
characteristics and conditions that were plausibly associated with the
choice of antipsychotic and the risk of in-hospital death (see
supplementary table S1 for the list of covariates).
45

記載例②共変量リスト
46
Pasternak B et al: BMJ. 2018 Mar 8;360:k678.

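What is a propensity score?
Formal definition
  The conditional probability of receiving a treatment given the observed covariates
Intuitive definition
  The probability that a person receives the treatment, taking that person's pre-treatment covariates into account
Trojano M et al: Int MS J 16: 90-7, 2009.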

Statistical models for estimating the propensity score
• Logistic regression
• Penalized/regularized regression
• Classification and Regression Trees (CART)
• Random forest
• Support vector machine
• Boosting
• Super learner
• Neural networks

統計モデルの⽐較研究①
50
Logistic regression vs. CART vs. Bagging vs.
Random Forest vs. Neural Network vs. naive
Bayes[1]
Random Forestがベスト
Logistic regressionとNeural Networkも良い
Logistic regression vs. boosted CART vs.
Covariate-balancing propensity score(CBPS)[2]
CBPSが良い
バランス評価をすればLogistic regressionも良い
[1] Cannas M, Arpino B: Biom J. 2019 Jul;61(4):1049-1072.
[2] Wyss R et al: Am J Epidemiol. 2014 Sep 15;180(6):645-55.

統計モデルの⽐較研究②
51
Logistic regression vs. CART vs. pruned CART
vs. bagged CART vs. Random Forest vs.
boosted CART[1]
⾮線形性あるいは⾮加法性のいずれかの条件では，す
べての⼿法は許容可能な⽔準
⾮線形性かつ⾮加法性の条件では，boosted CARTと
random forestを推奨
Logistic regression vs. CART vs. pruned CART
vs. Neural Network[2]
Logistic regressionの頑健性は⾼い
[1] Lee BK et al: Stat Med. 2010 Feb 10;29(3):337-46.
[2] Setoguchi S et al: Pharmacoepidemiol Drug Saf. 2008 Jun;17(6):546-55.

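Variable roles in propensity score estimation
Dependent variable
  Treatment variable (Z) (treated/control)
Independent variables
  Covariates (X)
Estimated propensity score ($\hat{e}_i$)
  Predicted value: the predicted probability of being in the treated group
Properties
  Possible range: 0 to 1
  Size: same as the sample size
[Figure: covariates (X) ➠ treatment variable (Z)]
Austin PC et al: Analysis of observational health care data using SAS (pp51-84). SAS press. 2010.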

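Propensity scores in an observational study
ID    Treatment (Zi)    Propensity score (ei)
1     Control           0.2
2     Treated           0.2
3     Control           0.3
4     Treated           0.3
...
N     Control           0.4
• The score differs from person to person
• People with similar propensity scores are likely to have similar covariate patterns

Propensity scores in an RCT
ID    Treatment (Zi)    Propensity score (ei)
1     Control           0.5
2     Treated           0.5
3     Control           0.5
4     Treated           0.5
...
N     Control           0.5
• The score is fixed by the study design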

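Checks before estimating the propensity score
Age      Esophageal dilation performed (n=177)    Not performed (n=597)
0-9      0%                                       28%
10-17    6%                                       26%
18-29    28%                                      13%
30-39    21%                                      13%
40+      45%                                      21%
Levels that almost fully determine whether a person is treated should be handled through the eligibility criteria (e.g. restrict to adults)
Yang JY et al: Gastrointest Endosc. 2019 Sep;90(3):360-369.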

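Checks after estimating the propensity score
[Figure: number of patients (y-axis) by propensity score for steroid use (x-axis), with three regions labeled "always steroid use", "steroid use and non-use both possible", and "never steroid use"]
If the common support is narrow, the groups cannot be compared
Leisman DE: Crit Care Med. 2019 Feb;47(2):176-185.

What to do when the common support is narrow*
*When removing the region without common support would drop most of the subjects
• Change the definition of the control group
• Change the eligibility criteria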

記載例①傾向スコア推定後の評価
 The largely overlapping distributions of propensity scores (see
supplementary figure S2) suggest that haloperidol and atypical
antipsychotics were used interchangeably in many instances, judged by
the measured covariates.
58
⾮定型薬

記載例②ロジスティック回帰
59
 The propensity score for fluoroquinolone exposure was estimated by a
logistic regression model, including 47 covariates as predictors,
covering demographic information, medical history, prescription drug use,
and healthcare use (web table 2).
Pasternak B et al: BMJ. 2018 Mar 8;360:k678.

記載例③boosted CART
 Because of the observational nature of the present study and to minimize
confounding, we modeled the probabilities of developing high ESs at 11
and 15 years old using the PS weighting (PSW) method, which used
generalized boosted modeling (GBM) to calculate PSs, in twang
package in R.[ref] GBM has been made popular in the machine learning
community as one of the latest prediction methods, allowing researchers
to powerfully estimate exposure probability (PS) on the basis of many
predicting covariates. It fits several models, both linear and nonlinear,
using a regression tree and then merging predictions computed by each
model.[ref] Regression trees do not require researchers to specify functional
forms of variables (ie, they handle continuous, nominal, ordinal, and
missing independent variables, as well as nonlinear and interaction
effects).[ref] Covariables used to compute PS at 11 years old and PS at 15
years old using GBM were chosen considering previous work on ESs and
CVD risk2 and can be found in the Figure. Number of interaction trees
was set on 5000, shrinkage in 0.01 and level of interactions in 2, which
were basically set to minimize prediction errors by means of subsampling
strategies.[ref]
60
Belem da Silva CT et al: J Am Heart Assoc. 2019 Jan 22;8(2):e011011.

Using the propensity score (1): overview

Ways to use the propensity score
• Propensity score matching
• Inverse probability of treatment weighting using the propensity score
• Stratification/subclassification on the propensity score
• Covariate adjustment using the propensity score

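Propensity score matching
[Figure: treated and control groups ➠ estimate the propensity score ➠ propensity score matching (people without a matching partner are dropped) ➠ assess the outcome]
Kuss O et al: Dtsch Arztebl Int. 2016 Sep 5;113(35-36):597-603.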

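Inverse probability weighting using the propensity score
[Figure: treated and control groups ➠ estimate the propensity score ➠ weighting (some people receive large weights) ➠ assess the outcome]

Treated group
Propensity score (PS)    Weight (1/PS)
0.49                     2.04
0.61                     1.64
0.89                     1.12
0.95                     1.05

Control group
Propensity score (PS)    Weight (1/[1-PS])
0.21                     1.27
0.49                     1.96
0.61                     2.56
0.89                     9.09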
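The weights in the two tables follow directly from the propensity scores. A minimal sketch that recomputes them (the function name `ipw_weight` is illustrative, not from the slides):

```python
def ipw_weight(ps, treated):
    """Inverse probability of treatment weight: 1/PS if treated, 1/(1-PS) if control."""
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


treated_ps = [0.49, 0.61, 0.89, 0.95]
control_ps = [0.21, 0.49, 0.61, 0.89]

treated_w = [round(ipw_weight(ps, True), 2) for ps in treated_ps]
control_w = [round(ipw_weight(ps, False), 2) for ps in control_ps]

print(treated_w)  # [2.04, 1.64, 1.12, 1.05]
print(control_w)  # [1.27, 1.96, 2.56, 9.09]
```

The control with a score of 0.89 receives a weight above 9, which is the "large weight" case flagged in the figure.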

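Stratification on the propensity score
[Figure: treated and control groups ➠ estimate the propensity score ➠ stratify (e.g. into quintiles) ➠ assess the outcome]

Covariate adjustment using the propensity score
[Figure: treated and control groups ➠ estimate the propensity score ➠ assess the outcome with a regression model that includes the treatment variable and the propensity score as independent variables]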

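Features of matching
Strengths
• Subgroup analysis is easy
Weaknesses
• No consensus on whether the pairing should be accounted for when assessing the outcome
• When subjects are dropped during matching, precision and generalizability decrease

Features of weighting
Strengths
• Can be extended to time-dependent covariates
Weaknesses
• Extreme weights and regions without common support are a problem
• Assessing the outcome requires the bootstrap or a robust variance
• Subgroup analysis is complicated

Features of stratification
Strengths
• The method for assessing the outcome is clear
Weaknesses
• Interpretation becomes complicated when the effect is modified across propensity score strata
• Prone to residual confounding

Features of covariate adjustment
Strengths
• Easy to carry out
Weaknesses
• Highly model dependent
• Assumes that the relationship between the propensity score and the outcome is linear and that there is no interaction between the propensity score and the treatment variable
• When the outcome is qualitative, the effect is complicated to interpret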

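Target populations of matching and weighting
Matching
  What would happen if everyone who could be matched, among the members sampled from the population, received one treatment vs. the other treatment?
Weighting
  What would happen if every member sampled from the population received the treatment vs. did not receive the treatment?
Thomas L et al: JAMA. 2020 Jan 10. doi: 10.1001/jama.2019.21558.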

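After matching, the variance of age and number of visits is smaller
[Figure: scatter plots of number of visits (y-axis) against age (x-axis) for the treated and control groups, before matching and after matching (matched groups only)]

[Figure: scatter plots of number of visits (y-axis) against age (x-axis) for the treated and control groups, before weighting and after weighting]
Controls aged 75 or older with 16 or more visits receive excessive weights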

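Share of studies using each propensity score method
[Figure: horizontal bar chart of the share (%, axis 0 to 100) of studies using covariate adjustment, stratification, weighting, and matching in four reviews: surgery [1] 2016-2018 (303 papers), cancer surgery [2] 2014-2015 (306 papers), surgery [3] 2013-2014 (129 papers), and all fields [4] 2011-2012 (296 papers). Bar labels as extracted: 21, 16, 10, 10 / 14, 4, 9, 1 / 7, 5, 6, 5 / 69, 75, 68, 84]
[1] Grose E et al: J Am Coll Surg. 2020 Jan;230(1):101-112.e2. (1 paper unclear)
[2] Yao XI et al: J Natl Cancer Inst. 2017 Aug 1;109(8). (21 papers used more than one method)
[3] Lonjon G et al: Ann Surg. 2017 May;265(5):901-909.
[4] Ali MS et al: J Clin Epidemiol. 2015 Feb;68(2):112-21. (3 papers unclear, 26 papers used more than one method)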

傾向スコアの利⽤②マッチング法

マッチングの設定
①アルゴリズム
②構成⽐
③抽出法
76

Types of algorithm
Bai H, Clark MH: Propensity score methods and applications. Sage. 2019
Traditional matching / Greedy matching / Complex matching
Exact
Mahalanobis
Nearest neighbor
Caliper
Mahalanobis with PS
Subclassification
Optimal
Kernel
Full
Radius
Interval
Mahalanobis caliper
Genetic
Difference-in-difference

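Nearest neighbor matching
① Pick one person at random from the treated group
② From the control group, pair that person with the control whose propensity score is closest
③ Repeat ① and ②
[Figure: propensity scores of the treated group ($\hat{e}_i^{1}$) and of the control group ($\hat{e}_i^{0}$) plotted on parallel axes]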

Nearest neighbor matching
Make pairs whose distance is small

Treated group    Control group
ID    PS         ID    PS
t1    0.48       c1    0.31
t2    0.97       c2    0.00
t3    0.69       c3    0.74
t4    0.68       c4    0.02
t5    0.96       c5    0.52
t6    0.34       c6    0.29

ID    PS1     ID    PS2     |PS1-PS2|
t1    0.48    c5    0.52    0.04
t2    0.97    c3    0.74    0.23
t3    0.69    c1    0.31    0.37
t4    0.68    c6    0.29    0.39
t5    0.96    c4    0.02    0.93
t6    0.34    c2    0.00    0.34
Total                       2.31
The distances range from 0.04 to 0.93 and sum to 2.31
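The pairing above can be reproduced with a short greedy routine. This is a minimal sketch, assuming the treated units are processed in the listed order t1 to t6 rather than in random order, and using the rounded scores printed on the slide (so individual distances can differ from the slide's by 0.01).

```python
treated = {"t1": 0.48, "t2": 0.97, "t3": 0.69, "t4": 0.68, "t5": 0.96, "t6": 0.34}
control = {"c1": 0.31, "c2": 0.00, "c3": 0.74, "c4": 0.02, "c5": 0.52, "c6": 0.29}


def greedy_nearest_neighbor(treated, control):
    """1:1 greedy matching without replacement, in the order the treated units are given."""
    available = dict(control)
    pairs = []
    for t_id, t_ps in treated.items():
        if not available:
            break
        # Closest control that has not been used yet
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        pairs.append((t_id, c_id, abs(available[c_id] - t_ps)))
        del available[c_id]
    return pairs


pairs = greedy_nearest_neighbor(treated, control)
for t_id, c_id, dist in pairs:
    print(t_id, c_id, f"{dist:.2f}")
total = sum(dist for _, _, dist in pairs)
print(f"total distance: {total:.2f}")
```

With the rounded scores the total comes to 2.32 rather than the slide's 2.31, presumably because the slide computed distances from unrounded scores. Because the routine is greedy, a different processing order can produce different pairs.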

A problem with nearest neighbor matching
Some pairs have a large distance

Treated    PS1     Control    PS2     |PS1-PS2|
t1         0.48    c5         0.52    0.04
t2         0.97    c3         0.74    0.23
t3         0.69    c1         0.31    0.37
t4         0.68    c6         0.29    0.39
t5         0.96    c4         0.02    0.93
t6         0.34    c2         0.00    0.34

Caliper matching*
*Sometimes classified as a setting rather than an algorithm
Treated units whose distance is 0.3 or more are excluded

Treated    PS1     Control    PS2     |PS1-PS2|
t1         0.48    c1         0.31    0.17
t2         0.97    c3         0.74    0.23
t3         0.69    c5         0.52    0.16
t4         0.68    (no match)
t5         0.96    (no match)
t6         0.34    c6         0.29    0.05

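Specifying the caliper
How it works
  Only people within a fixed propensity score distance (the caliper) are eligible for matching
Recommended value
  0.2 times the standard deviation of the logit-transformed estimated propensity score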
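The recommended caliper can be computed in a few lines. This is a minimal sketch with hypothetical scores (not taken from the slides), assuming the sample standard deviation is taken over all units; some authors pool the two groups' variances instead.

```python
import math
import statistics


def caliper_width(propensity_scores, multiplier=0.2):
    """Caliper = multiplier x SD of logit(PS); every score must lie strictly in (0, 1)."""
    logits = [math.log(ps / (1.0 - ps)) for ps in propensity_scores]
    return multiplier * statistics.stdev(logits)


# Hypothetical estimated propensity scores for illustration only
scores = [0.10, 0.25, 0.40, 0.50, 0.60, 0.75, 0.90]
width = caliper_width(scores)
print(f"caliper on the logit scale: {width:.3f}")
```

Note that the resulting width applies to distances between logit-transformed scores, not to distances between raw scores.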

Optimal matching
Make the pairs that minimize the total distance

Treated    PS1     Control    PS2     |PS1-PS2|
t1         0.48    c6         0.29    0.19
t2         0.97    c4         0.02    0.95
t3         0.69    c5         0.52    0.16
t4         0.68    c2         0.00    0.68
t5         0.96    c3         0.74    0.21
t6         0.34    c1         0.31    0.03
Total                                 2.22
The distances range from 0.03 to 0.95 and sum to 2.22
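For six pairs, the optimal assignment can be found by brute force. The sketch below is an illustration under stated assumptions: it uses the rounded scores printed on the slides and an exhaustive search, whereas real software relies on network-flow or assignment algorithms.

```python
from itertools import permutations

treated = {"t1": 0.48, "t2": 0.97, "t3": 0.69, "t4": 0.68, "t5": 0.96, "t6": 0.34}
control = {"c1": 0.31, "c2": 0.00, "c3": 0.74, "c4": 0.02, "c5": 0.52, "c6": 0.29}
t_ids = list(treated)


def total_distance(c_ids):
    """Sum of |PS1 - PS2| when treated unit i is paired with c_ids[i]."""
    return sum(abs(treated[t] - control[c]) for t, c in zip(t_ids, c_ids))


# Exhaustive search over all 6! = 720 one-to-one assignments
best = min(permutations(control), key=total_distance)
best_total = total_distance(best)

# Pairs taken from the nearest neighbor slide and from the optimal matching slide
greedy_total = total_distance(["c5", "c3", "c1", "c6", "c4", "c2"])
slide_total = total_distance(["c6", "c4", "c5", "c2", "c3", "c1"])

print(f"optimal: {best_total:.2f}, slide's optimal pairs: {slide_total:.2f}, greedy: {greedy_total:.2f}")
```

With rounded scores the minimum is 2.24 (the slide reports 2.22, presumably from unrounded scores), and several assignments tie at that minimum, so the search may return pairs that differ from the slide's while having the same total.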

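Full matching
Matched sets combine one treated person with several controls, or one control with several treated people
[Figure: propensity scores of the treated group ($\hat{e}_i^{1}$) and of the control group ($\hat{e}_i^{0}$), with matched sets linked]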

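Types of matching ratio
One-to-one/pair matching
  One treated person is matched with one control
Fixed ratio matching
  One treated person is matched with M controls
One-to-many (1:M)/variable ratio matching
  One treated person is matched with 1 to M controls
Leite W: Practical propensity score methods using R. Sage, 2016.

One-to-many (1:M) matching
Several controls per treated person (the number of controls varies from person to person)
[Figure: propensity scores of the treated group ($\hat{e}_i^{1}$) and of the control group ($\hat{e}_i^{0}$), with each treated person linked to one or more controls]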

Types of sampling
Without replacement
  The same control cannot be used more than once as a partner for treated people
With replacement
  The same control can be used more than once as a partner for treated people

Matching with replacement
The same control is reused: control c3 is used 4 times

Treated    PS1     Control    PS2     |PS1-PS2|
t1         0.48    c5         0.52    0.04
t2         0.97    c3         0.74    0.23
t3         0.69    c3         0.74    0.06
t4         0.68    c3         0.74    0.07
t5         0.96    c3         0.74    0.21
t6         0.34    c1         0.31    0.03
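Matching with replacement simply picks the closest control for each treated person, independently of earlier picks. A minimal sketch using the rounded scores from the slides:

```python
from collections import Counter

treated = {"t1": 0.48, "t2": 0.97, "t3": 0.69, "t4": 0.68, "t5": 0.96, "t6": 0.34}
control = {"c1": 0.31, "c2": 0.00, "c3": 0.74, "c4": 0.02, "c5": 0.52, "c6": 0.29}

# Each treated unit takes its nearest control; controls stay available for reuse
matches = {
    t_id: min(control, key=lambda c_id: abs(control[c_id] - t_ps))
    for t_id, t_ps in treated.items()
}
reuse = Counter(matches.values())

print(matches)
print(reuse.most_common())
```

The reuse counts matter later: a control that stands in for four treated people carries four times the influence, which the effect estimate and its variance need to reflect.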

Typical matching settings
① Algorithm: caliper matching
② Ratio: one-to-one
③ Sampling: without replacement
[1] Grose E et al: J Am Coll Surg. 2020 Jan;230(1):101-112.e2.
[2] Yao XI et al: J Natl Cancer Inst. 2017 Aug 1;109(8).
[4] Ali MS et al: J Clin Epidemiol. 2015 Feb;68(2):112-21.

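Subgroup analysis
How should the propensity score be estimated and the matching be done?

Five approaches to subgroup analysis (1)
Wang SV et al: Am J Epidemiol. 2018 Aug 1;187(8):1799-1807.
Method: Estimate the propensity score in the whole sample and match in the whole sample. Using the same propensity score, match again within each subgroup (ignoring whether a person was matched in the whole sample)
Advantage: The sample size of the main analysis is maximized
Drawback: The subgroups do not necessarily correspond to the population of the main analysis

Five approaches to subgroup analysis (2)
Method: Estimate the propensity score in the whole sample and match in the whole sample. Using the same propensity score, match again within each subgroup (restricted to people matched in the whole sample)
Advantage: The subgroups correspond to the population of the main analysis
Drawback: Subgroup sample sizes are smaller than in (1)

Five approaches to subgroup analysis (3)
Method: Estimate the propensity score and match within each subgroup. Pool the subgroups for the main analysis.
Drawback: Subgroups defined post hoc do not correspond to the population of the main analysis. For them to correspond, the subgroups must be pooled anew for the main analysis. Convergence problems are likely when estimating the propensity score.

Five approaches to subgroup analysis (4)
Method: Estimate the propensity score in the whole sample and match within each subgroup. Pool the subgroups for the main analysis.
Drawback: Subgroups defined post hoc do not correspond to the population of the main analysis. For them to correspond, the subgroups must be pooled anew for the main analysis.

Five approaches to subgroup analysis (5)
Method: Estimate the propensity score in the whole sample and match in the whole sample. Estimate the effect within each subgroup without further adjustment.
Advantage: No second round of matching is needed within subgroups.
Drawback: The statistical properties are the worst of all the approaches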

記載例①Caliper matching/1:1/⾮復元抽出
 We applied a 1:1 nearest-neighbor risk-set matching algorithm on the
propensity score without replacement, with a maximum caliper width
of 0.1 of the SD of the logit of the propensity score.
96
Henriquez DDCA et al: JAMA Netw Open. 2019 Nov 1;2(11):e1915628.

記載例②Optimal matching/1:M/⾮復元抽出
 After deriving a propensity score for each patient, variable optimal
matching for each hypothermia-treated patient was performed, with up
to 4 controls without replacement for each treated patient, using an
algorithm match with a caliper width no greater than 0.2 times the
standard deviation of the logit of the propensity score.
97
Chan PS et al: JAMA. 2016 Oct 4;316(13):1375-1382.

 In the next step, patients were matched on estimated propensity scores
using a combination of exact and full matching.[ref] The matching was
exact concerning calendar quarter and in-hospital PCI. Full matching
means that a patient treated with fondaparinux could be matched to
several patients treated with LMWH and vice versa. The caliper (upper
limit to the allowed difference in propensity score between matched
patients treated with LMWH and fondaparinux) was 0.002 (except for
eGFR >15-30, for which the caliper was 0.005, and eGFR ≤15, for which the
caliper was 0.01). Unmatched patients were removed in the subsequent
analysis.
98
記載例③Full matching/M:M/⾮復元抽出
Szummer K et al: JAMA. 2015 Feb 17;313(7):707-16.

 We tested for the presence of effect modification in several relevant
subgroups. First, we restricted the analysis to patients without evidence
of antibiotic use, disease-modifying antirheumatic drug use, or infections
in the baseline period (defined as 180 days before cohort entry). Second,
we stratified the analysis by sex to reflect the differences in incidence and
severity of UTIs[Urinary Tract Infections][ref]. ...(中略)...
 Within each subgroup, the propensity score was reestimated and
patients were rematched on the newly estimated score using 1:1
nearest-neighbor matching within a caliper width of 0.01.
99
記載例④サブグループ分析
Dave CV et al: Ann Intern Med. 2019 Jul 30. doi: 10.7326/M18-3136.

傾向スコアの利⽤③重み付け法

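Weighting in probability sample surveys
Characteristic        Survey sample    Weighted pseudo-population    Whole population
Non-Hispanic Black    25%              12%                           13%
Mexican Americans     28%              9%                            9%
12-19 years old       24%              12%                           12%
• Some groups are sampled from the whole population with higher probability
• Once the weights are taken into account, the distribution approximates that of the whole population
[Figure: population ➠ (probability sampling) ➠ sample ➠ (weighting) ➠ pseudo-population]
Centers for Disease Control and Prevention
(https://www.cdc.gov/nchs/tutorials/NHANES/SurveyDesign/Weighting/OverviewExamples.htm)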
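Weighting in propensity score analysis
                  Sample                 Pseudo-population
Characteristic    Treated    Control     Treated    Control
Male              88%        44%         80%        80%
Female            12%        56%         20%        20%
• Weighting creates a pseudo-population in which the covariates and the treatment variable are independent
• The sample is regarded as having been drawn, with unequal probabilities, from a population in which covariates and treatment are independent
[Figure: sample ➠ (weighting) ➠ pseudo-population]
Robins JM et al: Epidemiology. 2000 Sep;11(5):550-60.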
Numerical example (1): sample and covariate distribution
Sex       Group      Event    n
Female    Treated    No       30
Female    Treated    Yes      20
Male      Treated    No       252
Male      Treated    Yes      108
Female    Control    No       10
Female    Control    Yes      40
Male      Control    No       16
Male      Control    Yes      24

Covariate distribution in the sample
Characteristic    Treated    Control
Male              88%        44%
Female            12%        56%

Numerical examples (2) to (4): propensity score, weights, and pseudo-population
Sex       Group      Event    n      Propensity score (e)    Weight    Pseudo-population
Female    Treated    No       30     0.5                     2.0       60
Female    Treated    Yes      20     0.5                     2.0       40
Male      Treated    No       252    0.9                     1.1       280
Male      Treated    Yes      108    0.9                     1.1       120
Female    Control    No       10     0.5                     2.0       20
Female    Control    Yes      40     0.5                     2.0       80
Male      Control    No       16     0.9                     10.0      160
Male      Control    Yes      24     0.9                     10.0      240
(2) Estimating the propensity score: it is the predicted value from a logistic regression; people with identical covariates have identical propensity scores
(3) Computing the weights (inverse probability weighting): 1/e for the treated group, 1/(1-e) for the control group
(4) Creating the pseudo-population: make as many copies as the weight (n × weight)

Numerical example (5): pseudo-population and covariate distribution
Characteristic    Treated    Control
Male              80%        80%
Female            20%        20%
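The whole numerical example can be replayed in code. This is a minimal sketch, assuming (as holds for a saturated logistic regression with one categorical covariate) that the fitted propensity score equals the observed share of treated people within each sex; the helper names are illustrative.

```python
# (sex, z, event, n): z = 1 treated / 0 control, event = 1 yes / 0 no
cells = [
    ("F", 1, 0, 30), ("F", 1, 1, 20), ("M", 1, 0, 252), ("M", 1, 1, 108),
    ("F", 0, 0, 10), ("F", 0, 1, 40), ("M", 0, 0, 16), ("M", 0, 1, 24),
]


def propensity_by_sex(cells):
    """Share of treated people within each sex (the fitted score for a single categorical covariate)."""
    total, treated = {}, {}
    for sex, z, _, n in cells:
        total[sex] = total.get(sex, 0) + n
        treated[sex] = treated.get(sex, 0) + z * n
    return {sex: treated[sex] / total[sex] for sex in total}


ps = propensity_by_sex(cells)  # F: 0.5, M: 0.9

# Pseudo-population size of each cell = n x inverse probability weight
pseudo = []
for sex, z, event, n in cells:
    weight = 1 / ps[sex] if z == 1 else 1 / (1 - ps[sex])
    pseudo.append((sex, z, event, n * weight))


def male_share(rows, z):
    group = [(sex, size) for sex, zz, _, size in rows if zz == z]
    return sum(size for sex, size in group if sex == "M") / sum(size for _, size in group)


for z, label in [(1, "treated"), (0, "control")]:
    print(label, f"before: {male_share(cells, z):.0%}", f"after: {male_share(pseudo, z):.0%}")
```

After weighting, both groups are 80% male, matching the sex distribution of the full sample (400 men among 500 people), which is what independence of covariate and treatment looks like in the pseudo-population.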

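An intuition for large weights
• In the treated group, people with a small propensity score resemble controls in their covariates more than people with a large score do, so they receive large weights
• In the control group, people with a large propensity score resemble treated people in their covariates more than people with a small score do, so they receive large weights
Stuart EA et al: Psychiatr Ann. 2009 Jul 1;39(7):41451.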

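Extreme weights
[Figure: scatter plot of number of visits (y-axis) against age (x-axis) for the treated and control groups]
One outlying person carries the influence of 8 people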

What to do about extreme weights (1)
Check for model misspecification[1]
  Examine whether misspecification of the propensity score model is producing abnormally large weights
Trimming[2]
  Exclude patients outside the common support of the propensity score distribution, that is, patients with zero probability of being in the treated or the control group
Truncation[1,2]
  Exclude patients at the lower or upper end of the propensity score distribution (e.g. at or below the 1st percentile / at or above the 99th percentile)
[1] Leite W: Practical propensity score methods using R. Sage, 2016.
[2] Desai RJ, Franklin JM: BMJ. 2019 Oct 23;367:l5657.

What to do about extreme weights (2)
Stabilized weights[1,2]
  Use weights that are stabilized so that the propensity score weights do not become extreme
Other weighting methods[2]
  Marginal mean/fine stratification weights
  Matching weights
  Overlap weights

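Relationship between the statistical model and extreme weights
Share of weights ≥ 10
Model                  Additive and linear    Mildly 1) non-additive    Moderately 2) non-additive
                                              and nonlinear             and nonlinear
Logistic regression    0.37%                  0.42%                     0.52%
CART                   0.16%                  0.13%                     0.48%
Random forest          0.25%                  0.15%                     0.48%
Boosted CART           0.004%                 0.005%                    0.01%
1) Three two-way interaction terms + one quadratic term
2) Ten two-way interaction terms + three quadratic terms
Boosted CART rarely produces extreme weights
Lee BK et al: PLoS One. 2011 Mar 31;6(3):e18174.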

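Four weights (1): inverse probability weighting
Name: Inverse probability weighting
Type of effect: ATE in the whole population
Features:
• The type of effect is clear
• Prone to extreme weights
• The weights need trimming
ATE = average treatment effect

Four weights (1'): stabilized weights
Definition[1]
$$sw = \frac{z \cdot pr(z=1)}{e} + \frac{(1-z) \cdot pr(z=0)}{1-e}$$
• z: treatment variable (treated = 1; control = 0)
• pr(z=1): share of the treated group
• e: propensity score
• pr(z=0): share of the control group
Problem[2]
  Quite often the extreme weights are not reduced
[1] Austin PC, Stuart EA et al: Stat Med. 2015 Dec 10;34(28):3661-79.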

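Four weights (2): stratification-based weights
Name: Marginal mean/Fine stratification weights
Type of effect: ATE in the whole population
Features:
• The type of effect is clear
• Extreme weights are unlikely
• Performs well when the share of treated people is low
ATE = average treatment effect

Definition
$$w = \frac{n_s \cdot pr(Z=z)}{n_{zs}}$$
• $n_s$: number of people in stratum s
• pr(Z=z): share of each level of the treatment variable
• $n_{zs}$: number of people in each treatment × stratum cell

Propensity score    Treated    Control    Total
stratum (s)         (n1)       (n0)       (n)
1                   121        1473       1594
2                   196        1397       1593
3                   236        1357       1593
4                   355        1238       1593
5                   451        1142       1593
Total               1359       6607       7966
Weight for the treated group in stratum 1: (1594 × 1359/7966) / 121 ≈ 2.25
People outside the common support get a weight of zero, and the strata are formed from people within the common support
Linden A: J Eval Clin Pract. 2014 Dec;20(6):1065-71.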
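The stratum weights can be computed from the table alone. This is a minimal sketch, assuming the weight takes the form n_s × pr(Z=z) / n_zs as given by the symbols defined on the slide; the function name `mmws` is illustrative.

```python
# stratum -> (treated n, control n), from the table above
strata = {1: (121, 1473), 2: (196, 1397), 3: (236, 1357), 4: (355, 1238), 5: (451, 1142)}

n_total = sum(n1 + n0 for n1, n0 in strata.values())
p_treated = sum(n1 for n1, _ in strata.values()) / n_total


def mmws(stratum, treated):
    """Weight = n_s * pr(Z=z) / n_zs for a person in `stratum` with treatment status `treated`."""
    n1, n0 = strata[stratum]
    n_s = n1 + n0
    if treated:
        return n_s * p_treated / n1
    return n_s * (1 - p_treated) / n0


print(f"treated, stratum 1: {mmws(1, True):.2f}")
print(f"control, stratum 1: {mmws(1, False):.2f}")

# The weighted treated group keeps its original size
weighted_treated = sum(n1 * mmws(s, True) for s, (n1, _) in strata.items())
print(f"weighted treated total: {weighted_treated:.0f}")
```

Because each weight depends only on cell counts, no single person can receive an extreme weight unless a treatment-by-stratum cell is nearly empty.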

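Four weights (3): matching weights
Name: Matching weights
Type of effect: ATE in a subpopulation
Features:
• Weights lie between 0 and 1
• Extends naturally to treatment variables with three or more levels
• When the common support of the propensity score is wide and the treated and control groups are of similar size, the type of effect approaches the ATE in the whole population
• When the common support is wide but the treated and control groups differ in size, the type of effect approaches the ATT
ATE = average treatment effect; ATT = average treatment effect for the treated

Definition
$$mw_i = \frac{\min(e_i, 1-e_i)}{z_i e_i + (1-z_i)(1-e_i)}$$
• $e_i$: propensity score
• $z_i$: treatment variable (treated = 1; control = 0)

Treatment    Propensity    1 − propensity    Smaller value       mw
variable     score (e)     score (1−e)       [min(e, 1−e)]
1            0.84          0.16              0.16                0.19
1            0.40          0.60              0.40                1.00
0            0.03          0.97              0.03                0.03
0            0.53          0.47              0.47                1.00
Weight for person 1: min(0.84, 0.16) / 0.84 = 0.19
Weight for person 3: min(0.03, 0.97) / (1 − 0.03) = 0.03
Li L, Greene T: Int J Biostat. 2013 Jul 31;9(2):215-34.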

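Four weights (4): overlap weights
Name: Overlap weights
Type of effect: ATE in the overlap population
Features:
• Weights lie between 0 and 1
• Covariate balance is exact
• When it is realistic for a patient to end up in either the treated or the control group, the type of effect can be interpreted as the ATE
• The analysis population is the population for which the treatment decision is in clinical equipoise
• When the common support of the propensity score is limited, the population does not represent the patients treated in routine care
ATE = average treatment effect

Definition
$$ow_i = z_i (1-e_i) + (1-z_i) e_i$$
• $e_i$: propensity score
• $z_i$: treatment variable (treated = 1; control = 0)

Treatment    Propensity    1 − propensity    ow
variable     score (e)     score (1−e)
1            0.84          0.16              0.16
1            0.40          0.60              0.60
0            0.03          0.97              0.03
0            0.53          0.47              0.53
Weight for person 1: 1 − 0.84 = 0.16
Weight for person 3: 0.03
Li F et al: Am J Epidemiol. 2019 Jan 1;188(1):250-257.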

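Weights for the average treatment effect for the treated
Standardised mortality ratio weights
Definition
$$w = z + \frac{(1-z)\, e}{1-e}$$
• z: treatment variable (treated = 1; control = 0)
• e: propensity score
[1] Austin PC, Stuart EA et al: Stat Med. 2015 Dec 10;34(28):3661-79.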
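The weights discussed in this section differ only in their numerators, which a single helper makes visible. The following is a minimal sketch; the function `weights` and its dictionary keys are illustrative names, and the four (z, e) pairs are the ones used in the matching weights and overlap weights tables.

```python
def weights(z, e, p_treated=None):
    """Weights for one person (z = 1 treated, 0 control; e = propensity score)."""
    # Probability of the treatment the person actually received
    received = z * e + (1 - z) * (1 - e)
    result = {
        "ipw": 1 / received,                        # inverse probability weight (ATE)
        "matching": min(e, 1 - e) / received,       # matching weight
        "overlap": z * (1 - e) + (1 - z) * e,       # overlap weight
        "att": z + (1 - z) * e / (1 - e),           # standardised mortality ratio weight (ATT)
    }
    if p_treated is not None:
        # Stabilized weight: marginal share of the received treatment over its probability
        result["stabilized"] = (z * p_treated + (1 - z) * (1 - p_treated)) / received
    return result


for z, e in [(1, 0.84), (1, 0.40), (0, 0.03), (0, 0.53)]:
    rounded = {name: round(value, 2) for name, value in weights(z, e).items()}
    print(z, e, rounded)
```

The matching and overlap weights never exceed 1, whereas the inverse probability weight for the control with e = 0.53 is already above 2 and grows without bound as e approaches 1.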

記載例①逆確率重み付け
 Inverse probability of treatment weighting on the propensity score
was used to balance comparison groups on recorded indicators of
baseline health, including known indications for baclofen use (including
off-label indications).[ref]
122
Muanda FT et al: JAMA. 2019 Nov 9. doi: 10.1001/jama.2019.17725.

記載例②マッチングした重み付け
 Comparison of OS[overall survival] between patients who underwent a
second PM[pulmonary metastasectomy] and patients who did not
required attention to factors associated with selection. To address this, we
used the matching weights method [ref]. This approach is a weighting
analogue to the 1:1 pair-matching method, although shown to be more
efficient, that provides better balance across covariates. Unlike 1:1 pair
matching, which excludes any unmatched patients, the matching weights
approach never discards any patients; instead, it only down-weights
some of the patients. The matching weights approach is a variant of the
inverse probability weights method; the matching weights can be
considered the probability of being selected to the matched data set. With
the application of the patient-level matching weights, each patient
contributes a fraction of itself to the overall cohort used in the analyses.
123
Chudgar NP et al: Ann Thorac Surg. 2017 Dec;104(6):1837-1845.

 We used propensity score weighting because it could produce one
interpretable overall treatment effect and would not diminish our sample
size. To estimate the average effect of treatment on individuals using
SGLT-2[sodium-glucose cotransporter 2] inhibitors, the average
treatment effect of the treated (ATT) weighting was applied; that is,
we compared the hazards of outcomes among individuals using SGLT-2
inhibitors with the hypothesized situation had they taken DPP-4[dipeptidyl
peptidase 4] inhibitors, GLP-1[glucagon-like peptide 1] agonists, or older
agents instead of SGLT-2 inhibitors. This approach is specifically useful
when systematic differences likely occur between the study sample and
the overall population.[ref]
124
記載例③処置群の平均処置効果のための
重み付け
Chang HY et al: JAMA Intern Med. 2018 Sep 1;178(9):1190-1198.

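Methods for assessing balance
• Absolute standardized difference[1,2]
• Variance ratio[2]
• Graphical display[1,2]
• Statistical testing[1,2]
• C statistic[1,2]
• Goodness-of-fit test[1,2]
• Overlap coefficient[1]
• Kolmogorov–Smirnov distance[2]
• Lévy distance[2]
Some of these are marked on the slide as recommended methods
[1] Ali MS et al: J Clin Epidemiol. 2015 Feb;68(2):112-21.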

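Absolute standardized difference
absolute Standardized Difference in means or proportions
Definition (means)
$$d = \frac{|M_t - M_c|}{\sqrt{(sd_t^2 + sd_c^2)/2}}$$
• $M_t$: mean in the treated group
• $M_c$: mean in the control group
• $sd_t$: standard deviation in the treated group
• $sd_c$: standard deviation in the control group
Definition (proportions)
$$d = \frac{|M_t - M_c|}{\sqrt{[M_t(1-M_t) + M_c(1-M_c)]/2}}$$
• $M_t$: proportion in the treated group
• $M_c$: proportion in the control group
Note
  Some authors multiply the standardized difference by 100 and report it as a percentage
Austin PC: Stat Med. 2009 Nov 10;28(25):3083-107.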

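How to use the standardized difference
• Compute the standardized difference for each covariate[1]
• Use a value below 0.10 or below 0.25 as the threshold for acceptable balance[1,2]
• To assess balance across all covariates, average the standardized differences of the individual covariates[1]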
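As an illustration of the per-covariate check, the sketch below computes the absolute standardized difference for statin use in the haloperidol example (65.3% vs 58.8% before matching, 65.3% vs 65.3% after) and applies the 0.10 threshold. The function names are illustrative; the formulas are the usual pooled-variance forms for proportions and means.

```python
import math


def std_diff_proportion(p_t, p_c):
    """Absolute standardized difference for a binary covariate."""
    pooled = (p_t * (1 - p_t) + p_c * (1 - p_c)) / 2
    return 0.0 if pooled == 0 else abs(p_t - p_c) / math.sqrt(pooled)


def std_diff_mean(m_t, m_c, sd_t, sd_c):
    """Absolute standardized difference for a continuous covariate."""
    return abs(m_t - m_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)


before = std_diff_proportion(0.653, 0.588)
after = std_diff_proportion(0.653, 0.653)
print(f"before: {before:.2f}, after: {after:.2f}, balanced after matching: {after < 0.10}")
```

Before matching the statin variable sits above the 0.10 threshold (about 0.13); after matching the difference is zero. Mean age cannot be checked the same way here because the excerpt does not report standard deviations.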

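Variance ratio
Definition[1]
$$VR = \frac{sd_t^2}{sd_c^2}$$
• $sd_t$: standard deviation in the treated group
• $sd_c$: standard deviation in the control group
Acceptable range[2]
  Strict criterion: 0.8 to 1.2
  Lenient criterion: 0.5 to 2.0
[1] Austin PC: Stat Med. 2009 Nov 10;28(25):3083-107.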

What to do when balance is poor
• Add covariates[1]
• Add interaction terms between covariates[1]
• Examine nonlinearity of quantitative covariates[1]
• Use doubly robust estimation[2]
[1] Austin PC: Multivariate Behav Res 46: 399-424, 2011.

記載例①標準化差の⽅法と結果
⽅法
 Covariate balance between the two groups was assessed after
matching, and we considered an absolute standardized difference less
than 0.1 as evidence of balance.[ref]
結果
 We matched 99.5% of the haloperidol initiators to atypical antipsychotic
initiators (n=1659), and all covariates included in the propensity score
were well balanced after matching.
131

記載例②標準化差の表
132
マッチング前
マッチング後

記載例③判断基準の変更理由
 We calculated standardised differences to evaluate the balance of
variables in each predicted propensity score matched cohort. We first
regarded standardised differences less than 0.1 as having well
matched balance,[ref] but we could not achieve the value for the variable
of “defibrillation before matching” in the shockable cohort even with a
very narrow calliper width (0.001). When we attempted to achieve better
balancing of standardised differences (<0.1) by setting the calliper width
much narrower (<0.001), we lost a large number of patients. In the end,
we decided to avoid losing these patients by using a tight range of
target and chose a value of 0.25 rather than 0.1 of standardised
differences, as some statisticians have suggested,[ref] before doing our
final analyses.
133
Izawa J et al: BMJ. 2019 Feb 28;364:l430.

⽅法
 A standardised difference with an absolute value less than 0.10 and a
variance ratio between 4/5 and 5/4 was considered sufficient to
support the assumption of balance of the covariate between the
treatment groups [ref].
結果
 The absolute standardised differences were as high as 0.89, with 79% of
the covariates having an absolute standardised difference >0.10. The most
extreme variance ratio was 10.1 and 68% of the covariates had a variance
ratio <4/5 or >5/4. After matching, the largest absolute standardised
difference was 0.12 and only 11% of the covariates had an absolute
standardised difference >0.10. The most extreme variance ratio was 1.34
and only 7% of the covariates had a variance ratio <4/5 or >5/4. Thus,
the propensity score matching largely removed the imbalances in the
covariates (Figs. 1 and 2).
134
記載例④分散⽐の⽅法と結果
Jakobsen CJ et al: Eur J Cardiothorac Surg. 2009 Nov;36(5):863-8.

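Effect estimation after matching
Should the pairing be taken into account?

Outcome          Unpaired[1-2]                      Paired[3-4]
Quantitative     Independent two-sample t test      Paired t test
                 U test                             Wilcoxon signed-rank test
                 Generalized linear model           Generalized linear mixed model
                                                    Generalized estimating equations
Qualitative      Chi-squared test                   McNemar test
                 Generalized linear model           Agresti and Min's method
                                                    Conditional logistic regression
                                                    Generalized linear mixed model
                                                    Generalized estimating equations
Time to event    Proportional hazards model         Stratified proportional hazards model
                                                    Robust estimation
[1] Stuart EA: Stat Med. 2008 May 30;27(12):2062-5; [2] Stuart EA: Stat Sci. 2010 Feb 1;25(1):1-21.
[3] Austin PC: Stat Med. 2008 May 30;27(12):2037-49.; [4] Austin PC: Stat Med. 2011 May 20;30(11):1292-301.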

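Elizabeth A. Stuart's position
After matching, the two groups are not guaranteed to be well matched on every covariate, so the pairing does not need to be taken into account
Stuart EA: Stat Med. 2008 May 30;27(12):2062-5
Stuart EA: Stat Sci. 2010 Feb 1;25(1):1-21

Peter C. Austin's position
After matching, the two groups are not independent, so the pairing should be taken into account
Austin PC: Stat Med. 2011 May 20;30(11):1292-301.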

マッチング法で対応を考慮すべきかは未決着
 Although there is still some debate as to whether accounting for the
matched nature of the data is necessary, Austin and colleagues[ref]
advocate that accounting for the matched nature of the sample when
estimating the precision or significance of the treatment effect is
necessary, as matching was done after exposure.[1]
 Whether or not to account for the matched nature of the data in
estimating the variance of the treatment effect, for example, using
paired t-test for continuous outcome or McNemar’s test for binary
outcome, is an ongoing discussion.[ref][2]
139

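Effect estimation after weighting
Outcome          Statistical model
Quantitative     Generalized linear model (multiple regression)
Qualitative      Generalized linear model (logistic regression)
Time to event    Proportional hazards model
• The weights are taken into account in the model
• Standard errors are estimated with a robust variance or the bootstrap
Linden A, Adams JL: J Eval Clin Pract. 2010 Feb;16(1):175-9.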

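After propensity score matching, adjust for residual confounding with a statistical model?
Dependent variable
  Outcome (Y)
Independent variables
  Treatment variable (Z) (treated/control)
  Covariates (X)
When the outcome is qualitative or a time to event, the effect becomes difficult to interpret
Austin PC: Stat Methods Med Res. 2017 Feb;26(1):201-222.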

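Doubly robust estimation after matching ①
Doubly robust estimation/Double-adjustment
Step 1: in the matched control group, fit a regression model of the outcome on the covariates (note)
Note: the propensity score is sometimes entered instead of the covariates. To address residual confounding, adjusting for covariates with a standardized difference of 0.1 or more is recommended.
[Diagram: covariates (X) → outcome (Y); treatment = control group]
Austin PC et al: Stat Methods Med Res. 2017 Feb;26(1):201-222.
Nguyen TL et al: BMC Med Res Methodol. 2017 Apr 28;17(1):78.

Doubly robust estimation after matching ②
Step 2: for the treated group, predict the outcome each participant would have had as a control
[Diagram: covariates (X) → predicted value; treatment = treated group]

Doubly robust estimation after matching ③
Step 3: estimate the outcomes
• Treated-group outcome: the observed outcomes of the treated group
• Control-group outcome: the predicted values from step 2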
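The three steps can be sketched in base R as follows. This is a minimal sketch for a continuous outcome. The simulated data stand in for an already matched sample, and the function name `dr_att_sketch` is hypothetical.

```r
# Stand-in for a matched sample (hypothetical data): 100 treated, 100 controls
set.seed(3)
n <- 200
x <- rnorm(n)
z <- rep(c(1, 0), each = n / 2)
y <- 1 + 1.5 * z + 0.8 * x + rnorm(n)   # true effect in the treated = 1.5
matched <- data.frame(x, z, y)

dr_att_sketch <- function(d) {
  # Step 1: outcome model fitted in the matched control group only
  fit0 <- lm(y ~ x, data = d[d$z == 0, ])
  # Step 2: predicted outcome of each treated participant under control
  treated <- d[d$z == 1, ]
  y0_hat <- predict(fit0, newdata = treated)
  # Step 3: observed treated outcomes versus their predicted control outcomes
  mean(treated$y) - mean(y0_hat)
}

att <- dr_att_sketch(matched)
print(att)
```

For a binary outcome the same structure would apply with `glm(..., family = binomial)` and `predict(..., type = "response")`. A confidence interval could be obtained by bootstrapping the function.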

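Doubly robust estimation with weighting ①
Step 1: estimate the propensity score (e) in the whole sample
[Diagram: covariates (X) → treatment (Z)]
Funk MJ et al: Am J Epidemiol. 2011 Apr 1;173(7):761-7.
Li X, Shen C et al: Circ Cardiovasc Qual Outcomes. 2020 Jan;13(1):e006065.

Doubly robust estimation with weighting ②
Step 2: within each treatment group, fit a regression model of the outcome on the covariates (note)
Note: the propensity score is sometimes entered instead of the covariates.
[Diagram, model 1: covariates (X) → outcome (Y); treatment = treated group]
[Diagram, model 0: covariates (X) → outcome (Y); treatment = control group]

Doubly robust estimation with weighting ③
Step 3: over the whole sample, use model 1 to predict the outcome as if everyone were treated, and model 0 to predict the outcome as if everyone were a control
[Diagram, model 1: covariates (X) → predicted value; treatment = treated group]
[Diagram, model 0: covariates (X) → predicted value; treatment = control group]

Doubly robust estimation with weighting ④
Step 4: estimate the outcomes
[Formula: outcome in the treated group]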
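A minimal base R sketch of steps 1 to 4 for a continuous outcome. It uses the augmented inverse probability weighting form of the doubly robust estimator, which is the form described in Funk et al. The simulated data and all object names are assumptions made for illustration.

```r
# Simulated data (hypothetical): true average treatment effect = 1
set.seed(4)
n <- 1000
x <- rnorm(n)
z <- rbinom(n, 1, plogis(0.4 * x))
y <- 2 + 1 * z + 0.5 * x + rnorm(n)
dat <- data.frame(x, z, y)

# Step 1: propensity score e estimated in the whole sample
e <- fitted(glm(z ~ x, data = dat, family = binomial))

# Step 2: outcome models fitted separately in each treatment group
fit1 <- lm(y ~ x, data = dat[dat$z == 1, ])   # model 1 (treated)
fit0 <- lm(y ~ x, data = dat[dat$z == 0, ])   # model 0 (control)

# Step 3: predictions for everyone under treatment and under control
m1 <- predict(fit1, newdata = dat)
m0 <- predict(fit0, newdata = dat)

# Step 4: combine observed outcomes, propensity score and predictions
mu1 <- mean(dat$z * dat$y / e - (dat$z - e) / e * m1)
mu0 <- mean((1 - dat$z) * dat$y / (1 - e) + (dat$z - e) / (1 - e) * m0)
ate <- mu1 - mu0
print(c(mu1 = mu1, mu0 = mu0, ate = ate))
```

The estimate stays consistent if either the propensity score model or the outcome models are correctly specified, which is the sense of "doubly robust". Standard errors are not computed here; the bootstrap is one option.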

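Sensitivity analysis for matching ①
Rosenbaum's method (1:1 matching without replacement)
[Diagram: measured covariates (X), treatment (Z), outcome (Y), unmeasured covariate (U)]
Could an unmeasured covariate explain away the effect of the treatment on the outcome?
Liu W et al: Prev Sci. 2013 Dec;14(6):570-80.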

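Sensitivity analysis for matching ②
[Diagram: measured covariates (X), treatment (Z), outcome (Y), unmeasured covariate (U)]
• Assume several values for the effect of the unmeasured covariate on treatment assignment (Γ). Note: Γ ≥ 1.
• Assume that the effect of the unmeasured covariate on the outcome is infinitely large. Note: in theory this assumption can be relaxed.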

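Sensitivity analysis for matching ③
[Diagram: measured covariates (X), treatment (Z), outcome (Y), unmeasured covariate (U)]
Taking the influence of the unmeasured covariate into account, compute the upper bound (and the lower bound) of the p value for the effect of the treatment on the outcome.

Sensitivity analysis for matching ④
Effect of penicillin on mortality in primary biliary cholangitis
Γ | Lower bound of p value | Upper bound of p value
1.0 | 0.008 | 0.008
1.1 | 0.003 | 0.023
1.2 | < 0.001 | 0.050
1.3 | < 0.001 | 0.092
1.4 | < 0.001 | 0.151
• With no effect of an unmeasured covariate (Γ = 1.0), the result is significant.
• If an unmeasured covariate raises the odds of being treated by 20% (Γ = 1.2), the result is no longer significant.
Lu B et al: Stat Med. 2018 May 20;37(11):1846-1858.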
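A table of this kind can be approximated in base R with Rosenbaum bounds for the Wilcoxon signed-rank statistic. This is a minimal sketch, assuming 1:1 matched pairs, a one-sided test for a positive effect, and the large-sample normal approximation. The function name `rosenbaum_bounds` and the simulated pair differences are hypothetical, and the `psens` function in rbounds may differ in details.

```r
# d: within-pair outcome differences (treated minus control)
rosenbaum_bounds <- function(d, gamma) {
  d <- d[d != 0]                          # zero differences carry no information
  n <- length(d)
  t_plus <- sum(rank(abs(d))[d > 0])      # Wilcoxon signed-rank statistic
  # Upper-tail p value when a positive difference has probability p
  bound <- function(p) {
    mu <- p * n * (n + 1) / 2
    sigma <- sqrt(p * (1 - p) * n * (n + 1) * (2 * n + 1) / 6)
    pnorm((t_plus - mu) / sigma, lower.tail = FALSE)
  }
  c(gamma = gamma,
    p_lower = bound(1 / (1 + gamma)),     # hidden bias working against treatment
    p_upper = bound(gamma / (1 + gamma))) # hidden bias working in favour of it
}

# Simulated pair differences (hypothetical data)
set.seed(5)
d <- rnorm(40, mean = 0.5)
res <- t(sapply(c(1.0, 1.1, 1.2, 1.3, 1.4), function(g) rosenbaum_bounds(d, g)))
print(round(res, 4))
```

At Γ = 1 the two bounds coincide with the ordinary signed-rank p value. As Γ grows the interval widens, and the smallest Γ at which the upper bound crosses 0.05 is the value usually reported.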

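Sensitivity analysis for weighting ①
Carnegie's method
[Diagram: measured covariates (X), treatment (Z), outcome (Y), unmeasured covariate (U)]
Could an unmeasured covariate explain away the effect of the treatment on the outcome?
Carnegie NB et al: Journal of Research on Educational Effectiveness, 9:3, 395-420, 2016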

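Sensitivity analysis for weighting ②
[Diagram: measured covariates (X), treatment (Z), outcome (Y), unmeasured covariate (U)]
• Assume several values for the effect of the unmeasured covariate on the treatment (ζz).
• Assume several values for the effect of the unmeasured covariate on the outcome (ζy).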

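Sensitivity analysis for weighting ③
[Diagram: measured covariates (X), treatment (Z), outcome (Y), unmeasured covariate (U)]
Taking the influence of the unmeasured covariate into account, compute the effect of the treatment on the outcome and its standard error.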

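Sensitivity analysis for weighting ④
[Figure annotation: effect when the unmeasured confounder has no influence (ζy = 0 [partial regression coefficient])]

Sensitivity analysis for weighting ⑤
[Figure annotations: the measured covariate with the largest influence on the treatment and the outcome; the region in which statistical significance is lost]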

Reporting example ①: matching, methods and results
Methods
 Relative risk (RR) was estimated as the ratio of the probability of the
outcome event in patients treated using the transcarotid approach
compared with patients treated using the transfemoral approach. The
95% CIs were constructed using methods that accounted for the
matched nature of the cohorts.[ref]
Results
Schermerhorn ML et al: JAMA. 2019 Dec 17;322(23):2313-2322.

Reporting example ②: weighting, methods and results
Methods
 For each outcome, 3 separate Cox proportional hazards regression
models with propensity score ATT weighting were constructed to
examine the association between the use of SGLT-2[sodium-glucose
cotransporter 2] inhibitors (relative to 3 reference groups) and the
outcome. We calculated robust estimates of SEs for all variables in the
models. [ref]
Results
Chang HY et al: JAMA Intern Med. 2018 Sep 1;178(9):1190-1198.

 To address potential for selection bias, we used augmented inverse
probability weighting (AIPW) propensity score methods to estimate
the average effect of drug benefit user group and PUM[potentially unsafe
medication] exposure. We used the teffects aipw command in STATA,
version 13 (StataCorp). Augmented inverse probability weighting
combines 2 models: inverse probability weighting in the drug benefit user
group selection propensity model with regression adjustment in the
outcomes model (PUM exposure) [ref]. When these 2 approaches are
combined, AIPW is called “doubly robust estimation” because only 1 of the
2 models needs to be correctly specified to obtain an unbiased estimator.
Specifically, logistic regression was used in the propensity model to
estimate the probability of belonging to either user group, and
weighted logistic regression and weighted linear regression were
used to model PUM exposure and the number of days of PUM
exposure, respectively [ref]. All covariates described here were included in
both the drug benefit user group selection and PUM exposure models. To
account for the highly skewed nature of the days of PUM exposure
variables, we estimated SEs and 95% CIs using a bias-corrected
bootstrap approach [ref].
Reporting example ③: doubly robust estimation, methods
Thorpe JM et al: Ann Intern Med. 2017 Feb 7;166(3):157-163.

Methods
 In addition, we performed a sensitivity analysis to evaluate the impact
of an unmeasured confounder as previously described.[ref] Sensitivity
analysis was implemented using the R package Rbounds available at
http://cran.r-project.org/web/packages/rbounds/.
Results
 Furthermore, in a sensitivity analysis, the association between bivalirudin
use and vascular complications and GI bleed was robust enough to the
effect of an unmeasured confounder. The association with transfusion
was moderately robust, whereas the results corresponding to CABG
were only mildly robust (Table I in the online-only Data Supplement).
Reporting example ④: sensitivity analysis, methods and results
Perdoncin E et al: Circ Cardiovasc Interv. 2013 Dec;6(6):688-93

Discussion
 However, our sensitivity analysis suggested that the antibleeding
efficacy estimates were fairly robust to the presence of an unmeasured
confounder and are likely extant.
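Appendix
Reporting example ⑤: sensitivity analysis, discussion and appendix
Perdoncin E et al: Circ Cardiovasc Interv. 2013 Dec;6(6):688-93

Contents
• Four assumptions of propensity score analysis
• Handling missing covariate values
• Conventions
• Advanced models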

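Four assumptions ①: Conditional exchangeability
Definition
• $Y(0)$: the outcome a participant would have in the control group
• $Y(1)$: the outcome a participant would have in the treated group
• $Z$: the participant's treatment
• $X$: the participant's covariates
• Among participants with the same covariate values, the treatment is independent of the outcomes that would be obtained (conditional independence): $(Y(0), Y(1)) \perp Z \mid X$
Example of violation
• There is an unmeasured confounder.
Pan W, Bai H: Propensity score analysis: fundamental and development. Guilford. 2015.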

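Four assumptions ②: Consistency
Definition
• $Y$: the participant's observed outcome
• $Y(z)$: the outcome the participant would have if the treatment were $z$
• If $Z = z$, then $Y = Y(z)$
Example of violation
• When a participant's treatment ($z$) is "aspirin use", the outcome differs depending on whether the drug is taken with food ($k$), an unmeasured confounder.
Cole SR, Frangakis CE: Epidemiology. 2009 Jan;20(1):3-5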

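Four assumptions ③: Positivity
Definition
• For every combination of values that the observed covariates can take, there are participants in both the treated and the control group.
Example of a deterministic violation
• Men have no uterus, so in a study of the effect of hysterectomy on mortality they can never be in the treated group.
Example of a random violation
Group | 31-35 years | 36-40 years | 41-45 years | 46-50 years
Aspirin use | 0 | 2 | 0 | 3
No aspirin use | 9 | 7 | 9 | 6
Westreich D, Cole SR et al: Am J Epidemiol. 2010 Mar 15;171(6):674-7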
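A random positivity violation of this kind shows up as an empty cell when the treatment is cross-tabulated against covariate strata. A minimal base R sketch using the counts from the table above (the object names are assumptions for illustration):

```r
# Counts from the table: rows = treatment group, columns = age stratum
counts <- matrix(c(0, 2, 0, 3,
                   9, 7, 9, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(aspirin = c("use", "non-use"),
                                 age = c("31-35", "36-40", "41-45", "46-50")))

# Strata in which at least one treatment group has no participants
violating <- colnames(counts)[apply(counts == 0, 2, any)]
print(violating)
```

With individual-level data the same table would come from `table(treatment, stratum)`. With many covariates, inspecting the overlap of the propensity score distributions serves the same purpose.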

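Four assumptions ④: No interference
Definition
• One patient's treatment does not affect another patient's outcome.
Example of violation
• One patient's vaccination prevents infection in other patients.
Hernán MA: Stat Methods Med Res. 2012 Feb;21(1):3-5.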

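Handling missing covariate values ①
Complete case analysis
• Restrict the analysis to cases with no missing values.
Missing indicator
• Substitute a single value (for example, 0) for the missing values of a covariate, and create a missing indicator that records whether each value was missing. Include both the covariate and the missing indicator in the propensity score model.
Single imputation
• Substitute a single value (such as the mean or the mode) for the missing values of a covariate, then analyze.
Choi J et al: Eur J Epidemiol. 2019 Jan;34(1):23-36.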
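A minimal base R sketch of the missing indicator approach described above, on simulated data. All variable names are assumptions for illustration. It shows only the mechanics: the selection tables later in this section rate the missing indicator method poorly.

```r
# Simulated data (hypothetical): x2 has 20% of its values missing
set.seed(7)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
z <- rbinom(n, 1, plogis(0.5 * x1 + 0.5 * x2))
x2[sample(n, 60)] <- NA

# Missing indicator: 1 = x2 was missing, 0 = x2 was observed
r_x2 <- as.integer(is.na(x2))
# Substitute a single value (here 0) for the missing values
x2_filled <- ifelse(is.na(x2), 0, x2)

# The propensity score model includes both the filled covariate and the indicator
ps_fit <- glm(z ~ x1 + x2_filled + r_x2, family = binomial)
ps <- fitted(ps_fit)
print(summary(ps))
```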

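Handling missing covariate values ②
Multiple imputation
1. Start from the data set that contains missing values.
2. Create multiple imputed data sets (data set 1, data set 2, ..., data set m).
3. Estimate the propensity score and match within each imputed data set (matched data set 1, matched data set 2, ..., matched data set m).
4. Estimate the effect in each matched data set (effect estimate 1, effect estimate 2, ..., effect estimate m).
5. Combine the m estimates into a single pooled effect estimate.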
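The final pooling step can be sketched in base R. The slide does not name the pooling rule. This sketch assumes Rubin's rules, the usual choice for multiple imputation, and the function name `pool_rubin` and the input numbers are made up for illustration.

```r
# est: effect estimates from the m matched data sets
# se:  their standard errors
pool_rubin <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  w <- mean(se^2)                    # within-imputation variance
  b <- var(est)                      # between-imputation variance
  total <- w + (1 + 1 / m) * b       # total variance
  c(estimate = q_bar, se = sqrt(total))
}

pooled <- pool_rubin(est = c(1.0, 1.2, 1.1), se = c(0.5, 0.5, 0.5))
print(pooled)
```

The pooled standard error is larger than any single within-imputation standard error whenever the estimates differ across imputed data sets, which reflects the uncertainty due to the missing values.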

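Handling missing covariate values ③
Multiple imputation together with missing indicator
• Create multiple imputed data sets by multiple imputation. Create a missing indicator that records whether each covariate value was missing. Include both the imputed covariate and the missing indicator in the propensity score model.

Choosing a missing-data method ①
Unmeasured confounding = none; effect modification = none
Method | MCAR | MAR | MNAR
Complete case analysis | ◎ | ◎ | ◎
Missing indicator | ✕ | ✕ | ✕
Multiple imputation (note) | ◎ | ◎ | ◎
Multiple imputation with missing indicator (note) | ◎ | ◎ | ◎
Note: the imputation model also includes the outcome.
MCAR = missing completely at random
MAR = missing at random
MNAR = missing not at random

Choosing a missing-data method ②
Unmeasured confounding = none; effect modification = present
Method | MCAR | MAR | MNAR
Complete case analysis | ◎ | ✕ | ✕
Missing indicator | ✕ | ✕ | ✕
Multiple imputation (note) | ◎ | ◎ | ✕
Multiple imputation with missing indicator (note) | ◎ | ◎ | ✕
Note: the imputation model includes the treatment-by-covariate, covariate-by-outcome, and treatment-by-outcome interaction terms.

Choosing a missing-data method ③
Unmeasured confounding = present; effect modification = none
Method | MCAR | MNAR
Complete case analysis | ✕ | △
Missing indicator | ✕ | ✕
Multiple imputation (note) | ✕ | ✕
Multiple imputation with missing indicator (note) | ✕ | △
Note: the imputation model also includes the outcome.

Missingness mechanism ①: MCAR (missing completely at random)
[Diagram nodes: X1, X2, X2* (covariate with missing values), R (missing indicator), Z, Y]
Missingness is independent of both observed and unobserved variables.
(Example: whether X2 is missing is unrelated to any other variable.)

Missingness mechanism ②: MAR (missing at random)
[Diagram nodes: X1, X2, X2* (covariate with missing values), R (missing indicator), Z, Y]
Missingness depends on observed variables.
(Example: X2 is missing when the value of X1 is high.)

Missingness mechanism ③: MNAR (missing not at random)
[Diagram nodes: X1, X2, X2* (covariate with missing values), R (missing indicator), Z, Y]
Missingness depends on unobserved variables.
(Example: X2 is missing when the value of X2 itself is high.)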

Conventions ①: sample size
Study | First quartile | Median | Third quartile
[1] Surgery 2016-2018 (303 papers) | 503 | 1803 | 6658
[2] Cancer surgery 2014-2015 (306 papers) | 307 | 699 | 2783
[3] Surgery 2013-2014 (129 papers) | 348 | 904 | 4133
[2] Yao XI et al: J Natl Cancer Inst. 2017 Aug 1;109(8).

Conventions ②: number of covariates
Study | First quartile | Median | Third quartile
[1] Intensive care 2006-2009 (47 papers) | 9 | 15 | 22
[2] All fields 1983-2003 (177 papers) | 10 | 17 | 28
[3] All fields 2001 (47 papers) | 8 | 17 | 27
[4] Surgery 2013-2014 (129 papers) | 7 | 12 | 18
[1] Gayat E et al: Intensive Care Med. 2010 Dec;36(12):1993-2003.
[2] Stürmer T et al: J Clin Epidemiol. 2006 May;59(5):437-47.
[3] Weitzen S et al: Pharmacoepidemiol Drug Saf. 2004 Dec;13(12):841-53.

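Advanced models
• Treatments with three or more levels
• Continuous treatments
• Time-dependent treatments
• Complex sampling designs
• Multilevel data structures
• High-dimensional propensity scores
• Outcome misclassification bias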

Grose's recommendations ①: statistical model
Grose E et al: J Am Coll Surg. 2020 Jan;230(1):101-112.e2.
Item (reporting rate)
• Statistical model used to estimate the propensity score (79%)
Example
 Propensity scores for each patient were obtained from a multivariable
logistic regression model based on patient characteristics, year of
surgery, comorbidities, and hospital volume and location.

Grose's recommendations ②: covariates
• Covariates included in the propensity score model (91%)
Example
 The model included the following variables with pretreatment
characteristics: sex (male or female), age (≤17 years, 18-64 years, or ≥65
years), witness (witnessed or unwitnessed), bystander CPR (any CPR or no
CPR), first rhythm (ventricular fibrillation, ventricular tachycardia, pulseless
electrical activity, asystole, or others), and response time (<10 minutes or
≥10 minutes).

Grose's recommendations ③: rationale for covariate selection
• Rationale for the covariates included in the propensity score model (10%)
Examples
 The 2 mesh groups were propensity-score matched using factors that
have been shown previously to be associated with increased risk of 30-
day wound events after ventral hernia repair.
 Variables were selected from an initial univariate analysis comparing
the surgery and chemotherapy groups, and variables that differed
significantly between the 2 groups were chosen for propensity matching.

Grose's recommendations ④: sample size after matching
• Sample size after matching (94%)
Example
 After co-variable adjustment, 31 of the 37 patients in the hepatectomy+
RFA group were matched 1:3 with 93 of the 516 patients in the
hepatectomy-alone group.

Grose's recommendations ⑤: details of the matching method
• ① Algorithm (74%)
• ② Matching ratio (97%)
• ③ Sampling with or without replacement (30%)
Examples
 Patients were matched (1:1) using the nearest neighbor method
without replacement and a caliper width of 0.2 of the standard
deviation of the logit of the estimated propensity score.
 After score calculation, we performed 1:1 matching using a greedy
nearest-neighbor algorithm without replacement of the remaining 88
GDP patients to 88 control patients using a caliper of 20% of the logit
of the score’s standard deviation.

Grose's recommendations ⑥: balance assessment
• Assessment of covariate balance (52%)
Examples
 Standardized differences were estimated before and after matching to
evaluate the balance of covariates; small absolute values <0.10 SD
indicated balance between the cohorts.
 Standardized differences greater than 0.2 were considered to indicate
large imbalance among covariates used in the propensity score for
matching.

Grose's recommendations ⑦: effect estimation
• Effect estimation that accounts for the pairing (56%)
Example
 The analysis compared matched pairs using McNemar test for
categorical variables, paired t tests for symmetrically distributed
variables, and Wilcoxon signed-rank test for skewed continuous
variables.

Grose's recommendations ⑧: predictive accuracy
• Predictive accuracy of the propensity score model (21%)
Example
 We tested discrimination of the propensity model with the c-statistic.

Grose's recommendations ⑨: representativeness
• Characteristics of the cases that could not be matched (15%)
Example
 Details of outcome data for the overall and unmatched population are
presented in TABLE E2.

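Lonjon's recommendations ①: rationale for covariate selection
Explanation
• ① Enter all baseline characteristics, or
• ② Select variables using domain knowledge or statistics
Examples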
 Preoperative risk factors and demographic and operative variables were
entered in the propensity models irrespective of their significance (all
factors in Table 1).
 The factors considered to be the most important confounders also
contributing to deep-infection risk were chosen for the propensity-score
algorithm. [. . .] These factors were chosen, based on consensus among
the investigators, as the factors most important for predicting later
infection but also as those most divergent between the immediate and
delayed-closure groups (Table II).
192
Lonjon G et al: Ann Surg. 2017 May;265(5):901-909.

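Lonjon's recommendations ②: ITT analysis
Explanation
• As in RCTs, an intention-to-treat (ITT) analysis is preferable so that the effect estimate is not overstated.
Example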
 Every unplanned extension of an incision for any task other than retrieving
the specimen was considered to be a conversion. Laparoscopic operations
that had to be converted to open surgery were analyzed according to the
intention-to-treat principle.
193

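Lonjon's recommendations ③: handling missing values
Explanation
• ① Exclude cases with missing values
• ② Treat missingness as a category
• ③ Multiple imputation
Examples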
 Consistent with previously established methods specific to addressing
missing data in propensity score calculations, a separate category
for ’unknown’ was created for missing data for nominal variables.
 Missing data were infrequent (5% on any variable). We performed
additional analyses using various missing data statistical approaches
including multiple imputation and weighted estimating equations.
194

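Lonjon's recommendations ④: details of the matching method
Explanation
• ① Algorithm
• ② Matching ratio
• ③ Sampling with or without replacement
Example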
 A 1:1 match on the propensity score, without replacement, was
performed using the psmatch2 procedure, with a conservative caliper
width of 20% of the standard deviation of the log of propensity score.
195

Lonjon's recommendations ⑤: balance assessment
Explanation
• Standardized differences below 10% are desirable.
Example
 We estimated standardised differences for all covariates before and
after matching, with a standardised difference of 10% or more
considered indicative of imbalance.
196

Lonjon's recommendations ⑥: effect estimation
Explanation
• Effect estimation methods for paired data are preferable.
Example
 Continuous outcomes were compared in the PS-matched groups using
paired t-tests or Wilcoxon signed rank test as appropriate; differences
in proportions were compared using the McNemar’s test.
197

Lonjon's recommendations ⑦: representativeness
Explanation
• To assess representativeness, covariates should be reported both before and after matching.
Example

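Yao's recommendations ①
Items to report
Title and abstract
• State that propensity scores were used.
Methods
• Describe how propensity score analysis was used to address bias.
• Describe how the propensity score was used (for example, matching).
• Describe how the propensity score was estimated (for example, logistic regression).
• List the variables used to estimate the propensity score.
• Explain the covariate selection procedure.
Yao XI et al: J Natl Cancer Inst. 2017 Aug 1;109(8).

Yao's recommendations ②
Items to report
Methods
• For matching, describe ① the algorithm and caliper, ② the matching ratio, ③ sampling with or without replacement, ④ the analysis method for paired data, and ⑤ the balance assessment method.
• For weighting, describe the balance assessment method.
• For stratification, describe ① the number of strata and ② the balance assessment method.
• Describe how the assumptions of propensity score analysis were assessed.
• Explain how missing values were handled when estimating the propensity score.

Yao's recommendations ③
Items to report
Results
• For matching, report the sample size in each treatment group before and after matching.
• Describe the distribution of covariates before the propensity score was applied.
• Describe the distribution and comparability of covariates after the propensity score was applied.
• Report the number of missing values for each variable of interest.
• Report the estimate and precision from the propensity score analysis.
• Report effect estimates and precision before and after adjustment.

Yao's recommendations ④
Items to report
Discussion
• Discuss whether covariate balance was achieved, and interpret the results with caution.
• For matching, discuss the impact of the reduction in sample size after matching (especially when the reduction exceeds 50%).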

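Stuart's recommendations ①
Items to report [section]
• State the causal question of interest clearly, including definitions of the intervention group, the control group, the outcome, and the unit of the sample (e.g., patients or wards). [I, M, D]
• Make clear the type of effect of interest (average treatment effect/average treatment effect in the treated) along with the outcome contrast (risk ratio, odds ratio, risk difference, etc.). [I, M, R, D]
• State the eligibility criteria and reasons for exclusion leading to the analysis population. Eligibility criteria must be defined using pre-treatment variables. [M]
I = introduction; M = methods; R = results; D = discussion
Stuart EA: Propensity scores and matching methods. "The Reviewer's Guide to Quantitative Methods in the Social Sciences" Routledge. 2018.

Stuart's recommendations ②
• Make clear the definitions of all covariates along with the rationale for their selection. If adjustment for post-treatment variables is of interest, use statistical methods suited to that purpose. [M]
• State in the methods that the absence of unmeasured confounders is a key assumption of propensity score analysis, and discuss the plausibility of that assumption in the discussion. [M, D]
• Make the propensity score model clear. For parametric methods, specify details such as interaction terms; for nonparametric methods, specify details such as stopping rules. [M]

Stuart's recommendations ③
• Make clear how the propensity score was used, including the details of matching, weighting, or stratification, along with the rationale. [M]
• If specific covariates were handled in a non-standard way, such as exact matching, state this. [M]
• Give the name and version of the software used for the propensity score analysis and the effect estimation. [M]
• Where relevant, discuss other propensity score approaches that were tried and the rationale for the choice of the primary analysis. Choosing the approach with the best covariate balance as the primary analysis is common practice. [M, R]

Stuart's recommendations ④
• Report the amount of missing data and how it was handled for all items. [M]
• Report the effect estimation method. State whether covariate adjustment was used in the effect estimation and, if so, which covariates. [M]
• Report the sample size before and after propensity score adjustment for the treated and control groups separately. [R]

Stuart's recommendations ⑤
• Plot the region of common support of the propensity score before and after propensity score adjustment. [R]
• Show covariate balance before and after propensity score adjustment, along with whether it is at an acceptable level. [R]
• Describe and discuss the methods and results of the sensitivity analysis for unmeasured confounders. [M, R, D]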

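Ali's recommendations ①
Items to report
Covariate selection
• Covariate selection method
• Whether empirical findings were considered
• Whether associations with the treatment or the outcome were considered
Propensity score estimation
• Estimation method (e.g., logistic regression)
• Variables and interaction terms included in the propensity score model

Ali's recommendations ②
Items to report
Matching
• Matching algorithm, matching ratio, and caliper
• Whether sampling was with replacement, and whether the analysis accounted for the pairing
• Sample size before and after matching
• Number and characteristics of unmatched participants
• Baseline characteristics of the treated and control groups before and after matching
• Covariate balance after matching

Ali's recommendations ③
Items to report
Weighting
• Range of the inverse probability weights and the stabilized weights (mean, maximum, minimum)
• Definition of the weights
• Whether the weights were truncated and, if so, how
• Covariate balance in the weighted population

Ali's recommendations ④
Items to report
Covariate balance
• Balance metric
• Whether balance was achieved after the propensity score was applied
Effect estimation
• Statistical method
• Whether additional covariate adjustment was performed
• Whether a sensitivity analysis was performed
Interpretation of the effect
• Interpretation in line with the type of effect (average treatment effect/average treatment effect in the treated)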

Austin's recommendations
Items to report | Reporting rate [1] | Reporting rate [2] | Reporting rate [3]
(1)-① Rationale for covariate selection | -- | -- | 96% (45/47)
(1)-② Propensity score estimation | -- | -- | --
(2) Description of the matching method | 48% (29/60) | 55% (24/44) | 68% (32/47)
(3) Balance check | 0% (0/60) | 0% (0/44) | 4% (2/47)
(4) Effect estimation | 13% (8/60) | 25% (11/44) | 28% (13/47)
[1] Austin PC: J Thorac Cardiovasc Surg 134:1128-35, 2007.
[2] Austin PC: Circ Cardiovasc Qual Outcomes 1: 62-7, 2008.
[3] Austin PC: Stat Med 27:2037-49, 2008.

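Data analysis environment R ①: matching

Workflow for matching
① Loading the data
② Assessing balance before matching
③ Estimating the propensity score
④ Using the propensity score
⑤ Distribution of the propensity score
⑥ Assessing balance after matching
⑦ Building the matched data set
⑧ Estimating the effect
⑨ Doubly robust estimation
⑩ Sensitivity analysis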

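Packages for matching
Package | Main functions | Purpose
Matching | Match | Using the propensity score
twang | ps | Estimating the propensity score (Generalized Boosted Regression)
cobalt | bal.plot, bal.tab, love.plot | Distribution of the propensity score and covariate balance
rbounds | psens | Sensitivity analysis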

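Example data set
data(lalonde)
Effectiveness of a job training program
Variable | Description | Coding | Role
treat | Job training program | 1 = treated; 0 = control | Treatment
age | Age | Continuous | Covariate
educ | Years of education | Continuous | Covariate
race | Race | Black; Hispanic; White | Covariate
married | Marital status | 1 = married; 0 = other | Covariate
nodegree | Degree | 1 = no degree; 0 = other | Covariate
re74 | Earnings in 1974 (pre-treatment) | Continuous | Covariate
re75 | Earnings in 1975 (pre-treatment) | Continuous | Covariate
re78 | Earnings in 1978 (post-treatment) | Continuous | Outcome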

① Loading the data

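② Assessing balance before matching: function
bal.tab(formula, data, disp.mean=T, disp.sd=T,
binary="std", continuous="std", disp.v.ratio=T)
• formula: treatment ~ covariates
• data: data frame
• disp.mean=T: display means
• disp.sd=T: display standard deviations
• binary="std": display standardized differences for binary variables
• continuous="std": display standardized differences for continuous variables
• disp.v.ratio=T: display variance ratios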
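② Balance before matching: output
[Output annotations: means and standard deviations of the control group; means and standard deviations of the treated group; standardized differences; variance ratios]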

③ Estimating the propensity score: function
glm(formula, data, family=binomial)
• family=binomial: logistic regression
Return value of glm()
• fitted.values: predicted values (the propensity score)
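A minimal base R sketch of this step on simulated data. The data frame and its coefficients are made up for illustration and are not the lalonde data. It estimates the propensity score by logistic regression and converts it to the logit scale, which is the distance used for matching in the following steps.

```r
# Simulated data (hypothetical), loosely named after two lalonde covariates
set.seed(10)
n <- 400
dat <- data.frame(age = rnorm(n, 40, 10), educ = rpois(n, 10))
dat$treat <- rbinom(n, 1, plogis(-0.03 * (dat$age - 40) + 0.1 * (dat$educ - 10)))

# Logistic regression of the treatment on the covariates
fit <- glm(treat ~ age + educ, data = dat, family = binomial)

# fitted.values holds the estimated propensity score
dat$ps <- fit$fitted.values
# Logit transformation of the propensity score
dat$logit_ps <- qlogis(dat$ps)
print(summary(dat$ps))
```

The logit of the fitted probability equals the linear predictor of the model, so `predict(fit, type = "link")` returns the same values.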

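③ Estimating the propensity score: output
[Output annotations: propensity score estimated by logistic regression; logit transformation of the estimated propensity score]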

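③ Estimating the propensity score: function
ps(formula, data, n.trees, interaction.depth,
shrinkage, stop.method="es.mean", verbose=F)
• n.trees: number of trees to fit
• interaction.depth: order of the interaction terms
• shrinkage: learning-rate parameter (often set between 0.001 and 0.1)
• stop.method="es.mean": stopping rule for the iterations, based on minimizing the mean standardized difference
• verbose=F: do not print the fitting progress
If a warning appears, increase n.trees or decrease shrinkage.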

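③ Estimating the propensity score: output
[Output annotations: propensity score estimated by boosting; logit transformation of the estimated propensity score]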

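④ Using the propensity score: function
Match(Y, Tr, X, replace=F, caliper=0.2, M=1,
ties=F)
• Y: vector of outcomes
• Tr: vector of treatment indicators (1 = treated; 0 = control)
• X: vector of distances (the logit-transformed propensity score)
• replace=F: matching without replacement
• caliper=0.2: 0.2 times the standard deviation of the distance
• M=1: 1:1 matching of treated to control participants
• ties=F: when several controls have the same propensity score, pick one at random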
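To illustrate what these arguments control, here is a simplified base R sketch of greedy 1:1 nearest-neighbor matching without replacement within a caliper. It is not the algorithm implemented in `Match()`. The function name `greedy_match`, the processing order, and the toy data are assumptions for illustration.

```r
greedy_match <- function(logit_ps, treat, caliper = 0.2) {
  width <- caliper * sd(logit_ps)        # caliper in SD units of the distance
  treated <- which(treat == 1)
  controls <- which(treat == 0)          # controls still available
  pairs <- NULL
  for (i in treated) {
    if (length(controls) == 0) break
    dist <- abs(logit_ps[controls] - logit_ps[i])
    j <- which.min(dist)                 # nearest available control
    if (dist[j] <= width) {
      pairs <- rbind(pairs, c(index.treated = i, index.control = controls[j]))
      controls <- controls[-j]           # without replacement: use each control once
    }
  }
  pairs
}

# Toy data (hypothetical): three treated followed by three controls
logit_ps <- c(-1.0, 0.0, 1.0, -0.9, 0.1, 3.0)
treat <- c(1, 1, 1, 0, 0, 0)
pairs <- greedy_match(logit_ps, treat, caliper = 0.2)
print(pairs)
```

The third treated participant has no control within the caliper and is left unmatched, which is how a caliper trades sample size for closer matches.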

④ Using the propensity score: output
[Output annotations: matching applied (caliper 0.2, 1:1, without replacement); effect estimate; sample size before and after matching]

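⑤ Distribution of the propensity score: function
bal.plot(obj, treat, covs, var.name, which="both",
type="histogram", mirror=T)
• obj: object returned by Match()
• treat: vector of treatment indicators (1 = treated; 0 = control)
• covs: data frame containing the propensity score
• var.name: name of the propensity score variable
• which="both": distributions before and after adjustment
• type="histogram": draw histograms
• mirror=T: show the treated and control groups facing each other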

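⑤ Distribution of the propensity score: output
[Output annotations: distribution before matching; distribution after matching]

⑥ Assessing balance after matching: function
bal.tab(M, formula, data, disp.mean=T, disp.sd=T,
binary="std", continuous="std", disp.v.ratio=T)
• M: object returned by Match()
• disp.mean=T: display means
• disp.sd=T: display standard deviations
• continuous="std": display standardized differences for continuous variables
• disp.v.ratio=T: display variance ratios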

⑥ Balance after matching: output
[Output annotations: control group; treated group; standardized differences; variance ratios]

⑥ Assessing balance after matching: function
love.plot(x, formula, data, binary="std", abs=T,
threshold=0.1)
• x: object returned by Match()
• abs=T: display absolute values
• threshold=0.1: mark 0.1 as the cutoff

⑥ Balance assessment: output
[Output annotation: standardized differences before and after matching]

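⑦ Building the matched data set: return values
Return values of Match()
• index.treated: row numbers of the treated participants
• index.control: row numbers of the control participants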

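⑦ Building the matched data set: output
[Output annotation: matched treated and control groups (wide format)]

⑦ Building the matched data set: output
[Output annotation: matched treated and control groups (long format)]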

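⑧ Estimating the effect: function
t.test(x, y, paired, var.equal = T)
• x: numeric vector (outcomes of the treated group)
• y: numeric vector (outcomes of the control group)
• paired=T: paired test; paired=F: test for two independent groups
• var.equal=T: Student's t-test (equal variances assumed)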

⑧ Estimating the effect: output
[Output annotations: confidence interval; mean difference]

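⑨ Doubly robust estimation: custom function
dr.att(data, idx, yvar, tvar, covar, family)
• data: data frame after matching
• idx: index used for bootstrapping (specify 1:nrow(data) when no confidence interval is needed)
• yvar: name of the outcome variable
• tvar: name of the treatment variable
• covar: names of the covariates
• family="binomial": binary outcome; family="gaussian": continuous outcome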

⑨ Doubly robust estimation: output
[Output annotation: mean difference from the doubly robust estimation]
