Statistical Approaches to the Inverse Problem of
Scatterometry
submitted by
Diplom-Mathematiker
Mark-Alexander Henn
from Mainz
Dissertation approved by
Faculty II – Mathematics and Natural Sciences
of the Technische Universität Berlin
for the award of the academic degree
Doctor of Natural Sciences
– Dr. rer. nat. –
Doctoral committee:
Chair: Prof. Dr. rer. nat. Ulrike Woggon
Reviewer: Prof. Dr. rer. nat. Markus Bär
Reviewer: Prof. Dr. rer. nat. Harald Engel
Reviewer: Dr. rer. nat. habil. Andreas Rathsfeld
Date of the oral defense: 5 July 2013
Berlin 2013
D 83
“Each of us, deep down, believes that the whole world issues from his
own precious body, like images projected from a tiny slide onto an
earth-sized screen. And then, deeper down, each of us knows he’s wrong.”
Chad Harbach, The Art of Fielding (2011)
Abstract
In the present work statistical approaches to the inverse problem of scatterometry
are discussed. Scatterometry is the dimensional characterization of periodic nano-
structures as they are used in the manufacturing of lithographic masks. In contrast to
direct imaging methods, such as electron microscopy, scatterometry is a non-imaging
indirect measuring method. The critical dimensions (CDs), such as line widths and heights of the surface profile, are determined from the measured light diffraction pattern.
The classical way to solve the inverse problem is the least squares (LSQ) approach.
Starting with a model function that depends on the parameters to be reconstructed,
the norm of the difference between the measured and simulated data is minimized. The
right choice of weights that account for the variances in the measurement data plays
a crucial role here, as an inappropriate choice of weights may cause an unsatisfactory
reconstruction and furthermore an overestimation of the associated uncertainties of
the reconstructed parameters.
Therefore, maximum likelihood estimation (MLE) is introduced as a method to solve the inverse problem of scatterometry. With MLE it is possible to determine the variances of the measurement data in addition to the critical dimensions. For a simplified model function that neglects significant effects, however, MLE yields estimates for the variances of the measurement data that are far too large. The present work therefore investigates two types of systematic errors and their effect on the measured diffraction pattern. These errors stem from line roughness and from variations of the multilayer system beneath the periodic line structure.
It is shown that the estimated variances of the measurement data decrease when the systematic errors are included in the model function. Furthermore, this procedure yields estimates for the critical dimensions that are consistent with results from alternative measurement methods.
In the last part, an example of a Bayesian approach to solving the inverse problem of scatterometry is given. In contrast to LSQ and MLE, the solution to the inverse problem in Bayesian terms is not a single estimate for the parameters of interest but rather their probability distribution. An advantage of the Bayesian approach is that information about the critical dimensions obtained by alternative methods can be incorporated as prior knowledge. It is demonstrated that several measurement methods can be combined in this way and that, as a result, the uncertainties of the critical dimensions can be drastically reduced.
Zusammenfassung
In the present work, various statistical methods for solving the inverse problem in scatterometry are presented. Here, scatterometry refers to the dimensional characterization of periodic nanostructures such as those used in the manufacture of lithography masks. In contrast to imaging methods such as electron microscopy, scatterometry is an indirect measurement method, i.e., critical dimensions such as line widths or heights of the sample are computed from the scattered light intensities measured as a function of angle (the diffraction pattern).
The classical way to solve the inverse problem is to interpret it as a regression problem in the sense of the method of least squares. Starting from a model function that depends on the parameters to be reconstructed, the distance between the measured data and the values computed by the model is minimized. The required weighting of the different measured values, according to the variances expected for them in the measurement, plays a major role. If the weights are not known precisely, the reconstructed parameters can deviate strongly from the actual values, and the uncertainties estimated for the reconstructed parameters may also be considerably too large.
In this work, the maximum likelihood method (MLE) is therefore first applied to solve the inverse problem. In addition to the critical dimensions of the sample, it allows the variances of the measured values to be estimated. For overly simplified model functions that fail to take certain significant effects into account, however, MLE yields estimates of the variances of the input data that are considerably too large. The present work therefore investigates two types of systematic errors and their influence on the measured values: the so-called line roughness and variations of the multilayer system on which the lines are deposited.
It is shown that the estimated variances of the measurement data decrease considerably when the model function is extended by the two systematic effects mentioned above, and that the reconstructed critical dimensions are then also consistent with the results of alternative measurement methods.
Finally, the last part of this work gives an outlook on the Bayesian approach to solving the inverse problem. In contrast to the two approaches discussed before, the Bayesian method does not aim to determine a single value for the critical dimensions; instead, the probability distribution of the parameters is determined. A major advantage of this method is that information about the critical dimensions obtained with alternative measurement techniques can be incorporated as prior knowledge. This makes it possible to combine different measurement techniques in order to reduce the uncertainties associated with the critical dimensions.
Contents
Title Page
Abstract
Zusammenfassung
List of Figures
List of Tables
List of Symbols
List of Abbreviations
Citations to Previously Published Work
Acknowledgments
Dedication
1 Introduction
2 Preliminaries
2.1 Experimental Setups
2.2 Mathematical Modeling of Scatterometry
2.3 Inverse Problem Theory
3 Maximum Likelihood and Least Squares
3.1 Measurement Error Model
3.2 Results
3.3 Chapter Summary
4 The Effect of Systematic Errors on Scatterometry
4.1 Line Edge and Line Width Roughness
4.2 Multilayer System Variations
4.3 Chapter Summary
5 The Effect of Systematic Errors on the Reconstruction using MLE
5.1 Maximum Likelihood Estimation and Model Selection
5.2 Results for the EUV Mask
5.3 Results for the MoSi Mask
5.4 Chapter Summary
6 Bayesian Approach to Scatterometry
6.1 Approximation of the Likelihood Function
6.2 Using Prior Knowledge
6.3 Bayesian Approach to EUV Scatterometry
6.4 Chapter Summary
7 Summary, Conclusions and Outlook
7.1 Summary and Conclusions
7.2 Outlook
Bibliography
List of Figures
1.1 Sizes of semiconductor manufacturing process nodes
2.1 Scheme of a scatterometric setup
2.2 Scheme of the spectroscopic reflectometer
2.3 Scheme of the goniometric reflectometer
2.4 Scheme of the computational domain
2.5 Cross section of the EUV mask
2.6 Cross section of the MoSi mask
3.1 χ² as a function of the CDs for different b/a
3.2 Reconstructed CDs and SWA as a function of b/a for simulated data
3.3 Reconstructed noise parameter a and b/a for simulated data
3.4 Reconstructed SWAs for LSQ and MLE for simulated data
3.5 RMSD and mean estimated standard deviations for LSQ and MLE
3.6 Reconstructed CDs and SWA as a function of b/a for dataset D4
3.7 Reconstructed noise parameter a and b/a for measured EUV data
3.8 Reconstructed SWAs for measured EUV data
3.9 Reconstructed noise parameter a and b/a for measured DUV data
3.10 Reconstructed SWA for measured DUV data
4.1 AFM image showing LEWR
4.2 Super cell design for LER/LWR computations
4.3 Simulated diffraction patterns for perturbed line-space structures in EUV
4.4 Normalized deviations from the efficiencies of the unperturbed reference line structure in EUV
4.5 Standard deviations relative to the mean perturbed efficiencies in EUV
4.6 Effect of LEWR on the transmitted modes for the MoSi mask I
4.7 Effect of LEWR on the transmitted modes for the MoSi mask II
4.8 Effect of LEWR on the reflected modes for the MoSi mask I
4.9 Effect of LEWR on the reflected modes for the MoSi mask II
4.10 Normalized deviations from the efficiencies of the unperturbed reference line structure in DUV
4.11 Effect of MLS variations on simulated efficiencies
5.1 Reconstructed SWAs for the different models for simulated data
5.2 RMSD from the actual value and square root of the mean of the estimated variances for the different models
5.3 Reconstructed roughness parameter σ_r and noise parameters a and b for the different models for simulated data
5.4 Likelihood-ratio test values for simulated data
5.5 Dependency of the log-likelihood on the heights for dataset D4 and model M3
5.6 Reconstructed SWAs for the different models for measured EUV data
5.7 Reconstructed top CD deviation for the different models for measured EUV data
5.8 Reconstructed roughness parameter σ_r and noise parameters a and b for measured EUV data
5.9 Comparison of reconstructed SWAs to 3D/AFM data
5.10 Likelihood-ratio test values for measured EUV data
5.11 Reconstructed SWAs for measured DUV data with and without LEWR correction
5.12 Reconstructed roughness parameter σ_r and noise parameters a and b for measured DUV data
5.13 Likelihood-ratio values for measured DUV data
6.1 Distribution of the geometry parameters for field H5 I
6.2 Distribution of the geometry parameters for field H5 II
6.3 Weights for L_approx and π_post for different prior variances for field H5
6.4 Dependence of posterior estimators on prior knowledge for field H5
6.5 Dependence of posterior standard deviation on prior knowledge for field H5
List of Tables
2.1 Details of the EUV mask I
2.2 Details of the MoSi mask
3.1 Design values of the EUV mask I
4.1 Geometric parameters and optical constants for LER/LWR computations
4.2 Means and standard deviations for the MLS parameters
5.1 Models used for MLE with EUV data
5.2 Details of the EUV mask II
5.3 Design values of the EUV mask II
5.4 Optimized parameters using different starting values for dataset D4 and model M3
5.5 Results of the reconstruction using different initial values for measured data
5.6 Square roots of the mean of the estimated variances for the geometry parameters for measured data
5.7 Models used for MLE with DUV data
6.1 Mean values and standard deviations from 3D/CD-AFM analysis of field H5
6.2 Standard deviations of the geometry parameters for different priors for field H5
xiii
List of Symbols
ℝⁿ n-dimensional vector space of real numbers
Iₙ Identity matrix of size n
‖·‖ Euclidean norm in ℝⁿ
ℤ Set of integers
ℂ Set of complex numbers
∅ Empty set
i Imaginary unit
Re z Real part of z
Im z Imaginary part of z
µ₀ Permeability of free space
ε Permittivity
E Electric field
H Magnetic field
Δ Laplace operator
∂ₙ Normal derivative
ω Angular frequency
p Parameter vector
y Measurement data vector
J Jacobian
L Likelihood function
I Fisher information matrix
X Random variable
x̂ Estimate of X
u(x̂) Uncertainty associated with x̂
π Probability density function
N(µ, σ²) Univariate normal distribution with mean µ and variance σ²
Σ Covariance matrix
σ_r Roughness parameter
List of Abbreviations
AFM Atomic force microscopy
CD Critical dimension
DUV Deep ultraviolet
EUV Extreme ultraviolet
FEM Finite element method
LER Line edge roughness
LWR Line width roughness
L:S Line-to-space ratio
LSQ Least squares
MLE Maximum likelihood estimation
MLS Multilayer system
PDE Partial differential equation
PDF Probability density function
PTB Physikalisch-Technische Bundesanstalt (national metrology institute of Germany)
RCWA Rigorous coupled-wave analysis
RMSD Root-mean-square deviation
SVD Singular value decomposition
SWA Sidewall angle
TE Transverse electric polarization
TM Transverse magnetic polarization
Citations to Previously Published Work
Some results from Chapter 3 appear in:
“A maximum likelihood approach to the inverse problem of scatterometry,”
M.-A. Henn, H. Gross, F. Scholze, M. Wurm, C. Elster and M. Bär,
Opt. Express 20, 12 (2012)
Results from Chapter 4 have already been published in the following three papers:
“Modeling of line roughness and its impact on the diffraction intensities and the reconstructed critical dimensions in scatterometry,”
H. Gross, M.-A. Henn, S. Heidenreich, A. Rathsfeld and M. Bär,
Appl. Opt. 51, 30 (2012)
“Improved grating reconstruction by determination of line roughness in extreme ultraviolet scatterometry,”
M.-A. Henn, S. Heidenreich, H. Gross, A. Rathsfeld, F. Scholze and M. Bär,
Opt. Lett. 37, 24 (2012)
“The effect of line roughness on DUV scatterometry,”
M.-A. Henn, S. Heidenreich, H. Gross, B. Bodermann and M. Bär,
Proc. SPIE 8789 (2013)
Some of the results from Chapter 5 have been published in:
“Improved Reconstruction of Critical Dimensions in Extreme Ultraviolet Scatterometry by Modeling Systematic Errors,”
M.-A. Henn, H. Gross, S. Heidenreich, F. Scholze, C. Elster and M. Bär,
Meas. Sci. Technol. 25, 4 (2014)
Acknowledgments
First, I would like to thank my supervisor Prof. Markus Bär and my dear colleagues at the Physikalisch-Technische Bundesanstalt (PTB), Dr. Clemens Elster, Dr. Hermann Groß and Dr. Sebastian Heidenreich, for their support, suggestions and fruitful comments during my work on this thesis. Special thanks go to Dr. Bernd Bodermann, Dr. Matthias Wurm and Dr. Frank Scholze for providing the measurement data evaluated in this work, and to Dr. Gaoliang Dai for allowing me to use his 3D/AFM measurement results.
Furthermore, I would like to thank Prof. Harald Engel of the Technische Universität Berlin for being the second supervisor of this dissertation, Dr. Andreas Rathsfeld of the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) for helping me with all my questions about DIPOG and programming issues, Gerd Lindner at the PTB for helping me out with my IT problems, and Dr. Sergio Alonso for being such a great office co-worker.
I would like to express my sincere gratitude to several special people for guiding me on my way through the scientific world from the very beginning: Prof. Christine Papadakis, Carla Lehrbach, my late godmother Cornelia Beilstein and Prof. Rainer Wüst. Moreover, many thanks to all my dear friends for reminding me that there is more to life than mathematics and physics.
Finally and most importantly, I would like to thank my family, especially my sister Daniela, my brother-in-law Christian, my nieces Anna and Luisa and most of all my mother for their support, love and encouragement. Nothing would have been possible without you.
This work is dedicated to my mother.
Chapter 1
Introduction
Moore’s law, formulated in 1965 [52], states that the number of transistors on
integrated circuits doubles approximately every two years, consequently leading to
a decrease in the critical dimensions of semiconductor elements, as shown in Fig.
1.1 below. In 2011, Intel launched the first commercial 22 nm semiconductors [37],
where 22 nm denotes the average half-pitch (i.e., half the distance between identical
features) of a memory cell using this technology. In light of this progress, it is now
crucially important to further develop reliable, robust and accurate methods to moni-
tor the manufacturing process. Advanced lithography, e.g. extreme ultraviolet (EUV)
lithography [8], is the key technology for the manufacturing of such semiconductors.
Highly accurate diffractive optical elements (photo masks) play an important role and
their dimensional characterization is an active field of current research [22].
As an alternative to direct imaging methods, such as atomic force microscopy (AFM), scatterometry is a non-imaging indirect optical method. In scatterometry the sample is irradiated with light of a specific wavelength and the diffraction pattern of the scattered light is recorded. In this work two types of scatterometry are investigated: deep ultraviolet (DUV) scatterometry [53, 75, 76], operating at a wavelength of about 193 nm, and extreme ultraviolet (EUV) scatterometry [44, 57], using radiation with wavelengths around 13.5 nm. Photo masks used in EUV lithography usually consist of lines of absorbing material set atop a multilayer system (MLS) that serves as a Bragg mirror for wavelengths in the EUV range. DUV scatterometry can be used to evaluate the critical dimensions of the absorber structure, as the influence of the multilayer features is negligible for light in the DUV range. On the other hand, the high sensitivity of EUV radiation to variations in the multilayer can additionally be used to evaluate the properties of the multilayer. The short wavelength of EUV is also advantageous since it produces a large number of diffraction orders from the irradiated periodic structures. EUV and DUV scatterometry therefore complement each other for metrology on such masks.
Figure 1.1: Comparison of sizes of semiconductor manufacturing process nodes with some microscopic objects and visible light wavelengths. (Illustration by Cmglee [1])
As an indirect method, scatterometry depends heavily on the post-processing of the actual measurement data, i.e., the conversion of the measurement data into information about the critical dimensions of the investigated photo mask. This post-processing involves solving an inverse problem [69]. In contrast to the problem of predicting how a given mask reacts to irradiation by a beam of a given wavelength, called the forward problem or the evaluation of the forward model, the inverse problem is ill-posed in the Hadamard sense [31]. This means that a solution may not exist or may not be unique, and even if a unique solution exists, small errors in the input data can lead to large errors in the solution. Explicit use of a priori information and detailed knowledge about the variances of the input data are therefore necessary in order to obtain a reliable solution.
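The instability described above can already be seen in a toy linear problem. The following sketch is purely illustrative: the 2×2 matrix `A` is a hypothetical, deliberately ill-conditioned stand-in for a forward model (it is not a scatterometry model), and the data perturbation is chosen by hand. A relative data error of roughly 3.5e-5 changes the least-squares solution by 100 %.

```python
import numpy as np


def solve_least_squares(A, y):
    """Unregularized least-squares solution of A p = y."""
    p, *_ = np.linalg.lstsq(A, y, rcond=None)
    return p


# Hypothetical, deliberately ill-conditioned "forward model": the two
# columns are almost parallel, so very different parameter vectors
# produce nearly identical data.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
p_true = np.array([1.0, 1.0])
y_exact = A @ p_true

# Perturb the data by a tiny amount (relative size about 3.5e-5).
y_noisy = y_exact + np.array([0.0, 1e-4])

p_exact = solve_least_squares(A, y_exact)
p_noisy = solve_least_squares(A, y_noisy)

rel_data_err = np.linalg.norm(y_noisy - y_exact) / np.linalg.norm(y_exact)
rel_param_err = np.linalg.norm(p_noisy - p_true) / np.linalg.norm(p_true)

print("condition number:", np.linalg.cond(A))
print("relative data error:", rel_data_err)
print("relative parameter error:", rel_param_err)
```

The amplification factor is bounded by the condition number of `A` (about 4·10⁴ here). This is why the thesis stresses prior information and a careful description of the data variances rather than a plain inversion.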
There are different approaches to the forward problem in scatterometry. Sev-
eral works use the rigorous coupled-wave analysis (RCWA) [42, 50, 51, 53, 54, 70].
However, Bodermann and Ehret [9] and Berger et al. [6] have shown that the finite
element method (FEM) yields better results if more complex structures are investigated. In the present case of periodic line structures, Maxwell’s equations reduce to the Helmholtz equation [43, 58] with appropriate boundary conditions, which is then solved with FEM [14]. In this work the software package DIPOG [20], developed by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) in Berlin, is used as the forward solver.
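For orientation, the reduction to the Helmholtz equation can be written out under standard assumptions that this passage does not state explicitly: time-harmonic fields with angular frequency ω, non-magnetic materials (permeability µ₀ everywhere), a permittivity ε(x, y) that is independent of the groove coordinate z, and the classical mounting. With u the field component parallel to the grooves (u = E_z in TE, u = H_z in TM), a grating period d, and ε₊ the permittivity of the cover medium, one common form is the following sketch; the names d and ε₊ and the sign convention of α are assumptions made here:

```latex
\begin{align*}
  &\text{TE } (u = E_z): && \Delta u + \omega^2 \mu_0\, \varepsilon(x,y)\, u = 0,\\
  &\text{TM } (u = H_z): && \nabla\cdot\Big(\tfrac{1}{\varepsilon(x,y)}\,\nabla u\Big) + \omega^2 \mu_0\, u = 0,\\
  &\text{quasi-periodicity:} && u(x+d,\,y) = e^{i\alpha d}\, u(x,y), \qquad
      \alpha = \omega \sqrt{\mu_0 \varepsilon_+}\, \sin\theta_{\mathrm{inc}}.
\end{align*}
```

The quasi-periodicity condition restricts the computation to a single period, which is closed off by radiation conditions above and below the grating. A FEM solver such as DIPOG then solves the problem on this truncated domain.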
The forward model depends on many parameters. A common way to reduce the dimensionality of the problem, and hence the number of possible solutions to the inverse problem, is to assume that the forward model depends only on a small number of parameters. All remaining influence parameters are fixed to certain values, accounting for a priori knowledge. One example of such a simplification is to model the mask’s surface profile as a symmetric polygonal domain composed of a finite number of trapezoidal layers of different materials. One is then interested in the widths and sidewall angles of the trapezoids, while the heights and the optical constants of the materials are kept fixed.
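Such a reduced parameterization can be sketched as a small function that turns a bottom width, a height, and a sidewall angle into the vertices of one symmetric trapezoidal layer. The function names, the convention that the sidewall angle (SWA) is measured from the substrate plane (90° meaning vertical walls), and the use of nanometres are assumptions for illustration. This is not the input format of DIPOG.

```python
import math


def trapezoid_vertices(bottom_cd, height, swa_deg, x_center=0.0, y_base=0.0):
    """Vertices (counter-clockwise) of one symmetric trapezoidal layer.

    bottom_cd : line width at the bottom of the layer (e.g. in nm)
    height    : layer height
    swa_deg   : sidewall angle in degrees, measured from the substrate plane
                (90 deg = vertical sidewalls) -- assumed convention
    """
    if not 0.0 < swa_deg <= 90.0:
        raise ValueError("sidewall angle must lie in (0, 90] degrees")
    # Horizontal inset of each sidewall over the height of the layer.
    inset = height / math.tan(math.radians(swa_deg))
    top_cd = bottom_cd - 2.0 * inset
    if top_cd < 0.0:
        raise ValueError("sidewall angle too shallow: top width would be negative")
    half_b, half_t = bottom_cd / 2.0, top_cd / 2.0
    return [
        (x_center - half_b, y_base),
        (x_center + half_b, y_base),
        (x_center + half_t, y_base + height),
        (x_center - half_t, y_base + height),
    ]


def stack_profile(layers):
    """Stack layers given as (bottom_cd, height, swa_deg), listed bottom to top."""
    polygons, y = [], 0.0
    for bottom_cd, height, swa_deg in layers:
        polygons.append(trapezoid_vertices(bottom_cd, height, swa_deg, y_base=y))
        y += height
    return polygons


# Hypothetical single absorber line: 100 nm bottom CD, 10 nm high, 45 deg SWA.
print(trapezoid_vertices(100.0, 10.0, 45.0))
```

In a reconstruction, only the free entries of such a list (for example the widths and angles) would be varied, while the heights stay fixed, as described above.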
The classical approach to the inverse problem is to set it up as a weighted least squares problem, i.e., to find the combination of forward-model parameters that minimizes the weighted difference between the measurement data and the model data. The weight factors in the least squares function account for the variances of the measurement data and therefore represent knowledge about the underlying measurement error model. The weighted least squares approach is widely used in scatterometry [3, 30, 54, 59, 67] because it is robust, well developed and easy to implement. However, it depends strongly on the weight factors used. Inadequate weights can bias the reconstruction, so that LSQ yields incorrect values for the parameters of interest and, in addition, wrong estimates of the associated uncertainties. Therefore, maximum likelihood estimation (MLE) [49] is introduced as a method to solve the inverse problem of scatterometry. In MLE the variances of the input data are treated as unknowns that are reconstructed as well. Although MLE initially yields a more reliable solution to the inverse problem and more reliable associated uncertainties, the reconstructed variances of the measurement data are much larger than those estimated by the experimenters. This is because the forward model used is still a simplification of the actual experiment and does not account for systematic errors.
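The difference between LSQ with fixed weights and MLE can be illustrated with a toy linear forward model y ≈ A p. This is a hypothetical stand-in for the nonlinear scatterometry model, which in practice needs a FEM solve for every evaluation. The error model σ_j² = (a·y_j)² + b², with a relative part a and a constant part b, is an assumption for this sketch. The symbols only echo the noise parameters a and b named in the List of Figures; the exact form used in the thesis is not given in this section. For simplicity the variances are evaluated at the measured y_j, so that for fixed (a, b) the parameters p follow from weighted LSQ. A coarse grid search over (a, b) keeps the example dependency-free; a real implementation would use a nonlinear optimizer.

```python
import numpy as np


def weighted_lsq(A, y, sigma2):
    """Weighted LSQ for y ~ A p with variances sigma2.

    Returns the estimate and its covariance (A^T W A)^{-1}, W = diag(1/sigma2).
    """
    w = 1.0 / sigma2
    AtW = A.T * w                     # equals A^T W without forming W
    normal = AtW @ A
    p = np.linalg.solve(normal, AtW @ y)
    return p, np.linalg.inv(normal)


def neg_log_likelihood(residuals, sigma2):
    """Gaussian negative log-likelihood without the constant (m/2) log(2 pi)."""
    return 0.5 * np.sum(np.log(sigma2) + residuals**2 / sigma2)


def mle_grid(A, y, a_grid, b_grid):
    """Maximize the likelihood over p and the noise parameters (a, b).

    Assumed error model: sigma_j^2 = (a * y_j)^2 + b^2 at the measured y_j,
    so that p can be profiled out by weighted LSQ for each (a, b).
    """
    best = None
    for a in a_grid:
        for b in b_grid:
            sigma2 = (a * y) ** 2 + b**2
            p, _ = weighted_lsq(A, y, sigma2)
            nll = neg_log_likelihood(y - A @ p, sigma2)
            if best is None or nll < best[0]:
                best = (nll, p, a, b)
    _, p_hat, a_hat, b_hat = best
    return p_hat, a_hat, b_hat


# Simulated data from the assumed error model (all numbers hypothetical).
rng = np.random.default_rng(0)
m = 400
x = np.linspace(0.0, 10.0, m)
A = np.column_stack([np.ones(m), x])
p_true = np.array([1.0, 0.5])
y_model = A @ p_true
a_true, b_true = 0.05, 0.05
y = y_model + rng.normal(size=m) * np.sqrt((a_true * y_model) ** 2 + b_true**2)

p_hat, a_hat, b_hat = mle_grid(
    A, y, np.linspace(0.01, 0.10, 91), np.linspace(0.005, 0.10, 96)
)

# LSQ with a fixed, too optimistic assumption sigma_j = 0.01 for all j ...
_, cov_wrong = weighted_lsq(A, y, np.full(m, 0.01**2))
# ... versus LSQ with the variances implied by the MLE noise estimate.
_, cov_mle = weighted_lsq(A, y, (a_hat * y) ** 2 + b_hat**2)

print("p_hat:", p_hat, "a_hat:", a_hat, "b_hat:", b_hat)
print("std (wrong weights):", np.sqrt(np.diag(cov_wrong)))
print("std (MLE weights):  ", np.sqrt(np.diag(cov_mle)))
```

With the too small fixed variances, LSQ reports uncertainties that are far too optimistic. The MLE step instead estimates the noise level from the data themselves, which is exactly the property exploited in the thesis.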
In this work, two types of systematic errors and their influence on the measured efficiencies are discussed and eventually incorporated into the MLE approach: line roughness [7, 26, 36, 39, 40, 62] and variations of the multilayer system. It is demonstrated that incorporating these systematic effects into the modeling scheme can improve the quality of the reconstruction and reduce the estimated uncertainties. However, the forward model becomes more complicated as more systematic errors and more parameters are added. To assess the quality of the various models and the corresponding reconstructions, the likelihood-ratio test [49, 66] is employed as a method of model selection.
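For two nested models, for example a model without and a model with one additional roughness parameter, the likelihood-ratio test compares the maximized log-likelihoods. The sketch below is a generic textbook version, not necessarily the exact variant used in Chapter 5. It handles only one extra parameter, where the tail probability of the χ² distribution with one degree of freedom can be written with the complementary error function. The log-likelihood values in the example are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params=1):
    """Likelihood-ratio test for two nested models.

    If the simpler model is true, the statistic 2 (l_complex - l_simple) is
    asymptotically chi-square distributed with `extra_params` degrees of
    freedom (Wilks' theorem). This sketch only covers df = 1, where
    P(chi2_1 > t) = erfc(sqrt(t / 2)).
    """
    if extra_params != 1:
        raise NotImplementedError("this sketch only covers one extra parameter")
    stat = 2.0 * (loglik_complex - loglik_simple)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical maximized log-likelihoods: model without vs. with roughness.
stat, p_value = likelihood_ratio_test(loglik_simple=-100.0, loglik_complex=-98.0)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.4f}")
```

A small p-value favors the more complex model. If the extra parameter lies on the boundary of its admissible range under the simpler model (e.g. a roughness parameter equal to zero), the χ²₁ reference distribution is conservative.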
The LSQ approach and MLE are both deterministic methods for solving the inverse problem: the solution is a single set of parameters along with their uncertainties. An alternative to these deterministic approaches is given by the Bayesian framework [38, 68]. Here the solution to the inverse problem is no longer an estimate with uncertainties, but the probability distribution of the parameters of interest, called the posterior distribution. This is particularly advantageous because the likelihood function that is maximized in MLE, as well as the least squares function that is minimized in LSQ, tend to have several local maxima and minima, respectively, so that a single estimate may not convey the complete information. The Bayesian approach, however, requires that all parameters on which the forward model depends are modeled as random variables and represented by probability distributions that include all the information available about the parameters prior to the measurement. This prior information can also include results of other measurements, such as AFM measurements [15, 17, 18]. The possibility of combining several measurement techniques to reduce the overall uncertainty of the parameters of interest is what makes the Bayesian approach so attractive. Its main disadvantage, however, is that a rigorous evaluation of the posterior distribution is very time-consuming. In this work, an approach that helps to circumvent this disadvantage by a simple approximation method is derived and applied to measurement data.
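Why combining measurement techniques reduces the uncertainty is easiest to see in the Gaussian case. The sketch assumes that the likelihood has been approximated by a Gaussian in the parameters and that the prior (e.g. from AFM) is Gaussian too. Under these assumptions the posterior is Gaussian, its precision matrix is the sum of the two precision matrices, and its mean is the precision-weighted average of the two means. This illustrates the principle only; it is not claimed to be the approximation derived in Chapter 6, and all numbers are hypothetical.

```python
import numpy as np


def gaussian_posterior(mean_lik, cov_lik, mean_prior, cov_prior):
    """Posterior of a Gaussian (approximated) likelihood times a Gaussian prior.

    Precisions add; the posterior mean is the precision-weighted average
    of the likelihood mean and the prior mean.
    """
    prec_lik = np.linalg.inv(cov_lik)
    prec_prior = np.linalg.inv(cov_prior)
    cov_post = np.linalg.inv(prec_lik + prec_prior)
    mean_post = cov_post @ (prec_lik @ mean_lik + prec_prior @ mean_prior)
    return mean_post, cov_post


# Hypothetical CD in nm: scatterometry alone 50 +/- 2, AFM prior 52 +/- 1.
mean_post, cov_post = gaussian_posterior(
    np.array([50.0]), np.array([[4.0]]),
    np.array([52.0]), np.array([[1.0]]),
)
print("posterior mean:", mean_post, "posterior std:", np.sqrt(np.diag(cov_post)))
```

The posterior standard deviation (about 0.89 nm) is smaller than that of either input alone. The same function works unchanged for several geometry parameters with full covariance matrices.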
The thesis is structured as follows: Chapter 2 gives a detailed description of the
two scatterometric measurement setups. It also contains the mathematical framework
for the solution to the forward problem and basic principles of the three methods used
to solve the inverse problem in this work, namely the least squares approach (LSQ),
the maximum likelihood estimation (MLE) and the Bayesian approach. A comparison
of the LSQ and MLE approaches both in terms of simulated and measured data is
given in Chapter 3. The effects systematic errors have on scatterometry are discussed
in Chapter 4, while the extension of the forward model to include the systematic
errors is given in Chapter 5. Chapter 5 also demonstrates the application of the
several models on simulated and measurement data and gives a possible ranking of
the model complexity. It is in Chapter 6 that the Bayesian approach is applied to
actual measurement data. The work closes with a summary and the conclusions in
Chapter 7.
Chapter 2
Preliminaries
We start by collecting some basic facts about scatterometry and some mathemat-
ical concepts that will be useful in the following chapters. We give an overview of the
experimental setups used in scatterometry in Section 2.1. The mathematical concepts
used to model the interaction of electromagnetic waves with matter are introduced
in Section 2.2, and Section 2.3 presents the basics of inverse problem theory.
2.1 Experimental Setups
The following section is mainly based on [27, 74]. A scatterometer can most
generally be defined as a device that illuminates a sample and measures properties
of light scattered from that sample; a schematic representation is shown in Fig. 2.1
below.
Figure 2.1: Scheme of a scatterometric setup.
There are several different scatterometric techniques, differing in the light source used and in the measured properties of the scattered light. We
will concentrate on two of them, namely standard scatterometry and spectroscopic
reflectometry. A standard scatterometer uses light from a monochromatic source with
a fixed polarization state. We will only consider the classical case, i.e., the case in
which the direction of the inspecting light beam is chosen to be inside the cross section
plane perpendicular to the groove direction. The resulting scattered wave directions
are then located in the same plane. It is called transverse electric polarization (TE),
or S polarization, when the incident electric field E is parallel to the grooves of the
sample, and transverse magnetic polarization (TM), or P polarization, when E is
perpendicular to the grooves.
Note that the measured samples throughout this work are line-space structures
(cf. Fig. 2.1), i.e., groups of parallel lines (bridges) placed on a plane surface. The
bridges are assumed to have the same cross section in the plane perpendicular to the
line direction (groove direction). Since the bridges are equally spaced, the
line-space structure forms a grating. The grating is constant in the groove direction
and periodic in the surface direction perpendicular to the grooves. Because of the
periodicity of the grating structure, the outgoing light propagates only into a finite
number of directions. The scatterometer measures the efficiencies, i.e., the portion
of energy conveyed to these discrete outgoing beams. Since each of these beams can
be associated with a diffraction order, the measured data is also called a diffraction
pattern, representing how much energy is transferred to each diffraction order.
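The finite number of outgoing directions follows from the grating equation for the classical mounting. With angles measured from the surface normal and the convention that the zeroth (specular) order leaves at θ_inc, order n leaves at sin θ_n = sin θ_inc + n λ/d. It propagates only if |sin θ_n| ≤ 1; all other orders are evanescent. The sketch below counts the propagating reflected orders, assuming the medium above the grating is vacuum. The pitch of 420 nm is a hypothetical value, not one of the masks studied here, and sign conventions for θ_n differ between texts.

```python
import math


def propagating_orders(wavelength, period, theta_inc_deg):
    """Reflected diffraction orders that propagate above the grating.

    Grating equation (classical mounting, vacuum above the grating):
        sin(theta_n) = sin(theta_inc) + n * wavelength / period
    Returns a dict {n: theta_n in degrees} for all n with |sin(theta_n)| <= 1.
    """
    s_inc = math.sin(math.radians(theta_inc_deg))
    ratio = wavelength / period
    n_max = math.floor((1.0 - s_inc) / ratio)
    n_min = math.ceil((-1.0 - s_inc) / ratio)
    orders = {}
    for n in range(n_min, n_max + 1):
        s = max(-1.0, min(1.0, s_inc + n * ratio))  # guard against rounding
        orders[n] = math.degrees(math.asin(s))
    return orders


# Hypothetical 420 nm pitch, 6 deg angle of incidence.
euv = propagating_orders(13.5, 420.0, 6.0)   # EUV wavelength
duv = propagating_orders(193.0, 420.0, 6.0)  # DUV wavelength
print(len(euv), "EUV orders;", len(duv), "DUV orders:", sorted(duv))
```

For the same grating, the EUV wavelength yields 62 propagating orders against only 4 in the DUV. This is the reason given in Chapter 1 for the richer information content of EUV diffraction patterns.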
Usually the measurement is performed at a fixed angle of incidence θ_inc, either in reflection or in transmission mode, depending on the optical constants of the sample. The device is called a goniometric scatterometer when the angle of incidence can additionally be varied during the measurement. Reflectometric measurements can also be realized: here the light source and the detector are moved simultaneously such that the detector is always positioned at the angle −θ_inc, measured from the surface normal. With a spectral reflectometer the wavelength of the inspecting light can be varied, using either a tunable laser system or a broad-band light source with an adjustable monochromator.
In this work experimental data from two scatterometric setups are used. The first is a spectroscopic reflectometer operating with a light source in the EUV (extreme ultraviolet) spectral range of about 13–14 nm; in addition to the specular reflex, it can also detect higher diffraction orders. The second is a goniometric scatterometer operating at a wavelength of 193 nm in the DUV (deep ultraviolet) range. Both experimental setups are described in further detail in the following sections. Note that for both methods the probed area is finite, so that the measured efficiencies are in both cases averages over the probed area.
2.1.1 EUV – Experimental Setup
The first type of measurement data is obtained with an EUV spectroscopic reflectometer, shown in Fig. 2.2. It is operated at the soft X-ray radiometry beam line in the PTB's synchrotron radiation laboratory at BESSY II in Berlin [44, 63, 64]. The beam line provides monochromatized radiation in the spectral range from 0.7 nm to 35 nm, including the EUV spectral range around 13.5 nm. The probed area is around 1 mm². The measurements shown here are obtained by scanning the detector angle in-plane for three different wavelengths at a fixed angle of incidence of 6° for TE polarization.
For further processing, only the measured diffraction efficiencies of the discrete diffraction orders were used; diffusely scattered radiation was not considered. For the structures investigated, EUV scatterometry offers the advantage of operating in a regime where the wavelength is much shorter than the characteristic dimensions of the structures (a few hundred nanometers). Therefore, many diffraction orders can be measured, providing information on the higher harmonics in the spatial frequency range, which correspond to smaller structure details. A typical measurement dataset consists of 69 to 75 efficiencies for diffraction orders in the range of −10 to 14. Depending on the investigated structure, such a dataset covers a wide dynamic range, from 10⁻³ % for higher orders up to several tens of percent for the 0th order.
Figure 2.2: Scheme of the spectroscopic reflectometer.
2.1.2 DUV – Experimental Setup
The second type of data derives from measurements with a DUV goniometric scatterometer of the ultra-high resolution microscopy working group at PTB in Braunschweig. The light source is a frequency-quadrupled TiSa (titanium-sapphire) laser with a fundamental wavelength between 772 nm and 840 nm; via this frequency conversion, wavelengths down to 193 nm are available. A measurement dataset in the present case includes reflected and transmitted diffraction efficiencies for the orders −4 to 2 at seven different angles of incidence for a TM-polarized laser beam with a wavelength of 193 nm. The measurement spot size is about 100 µm in diameter for the present measurements.
A dataset typically consists of 43 data points with transmitted and reflected efficiencies; see [77] for further details on the experiment. A scheme of the measurement setup is shown in Fig. 2.3 below. The dynamic range of the measurement data is not as wide as that of the EUV measurements and covers the range from 0.1% to 3%.
Figure 2.3: Scheme of the goniometric reflectometer.
2.2 Mathematical Modeling of Scatterometry
The following section is mainly taken from [4, 43, 58]. The mathematical model used here to describe the propagation of electromagnetic waves in matter is based on Maxwell's equations. The efficiencies and phase shifts for the different diffraction directions are calculated from the data of the incident light and from characteristic parameters of the irradiated surface profile. The optical grating is modeled as an infinite plate consisting of different periodic non-magnetic materials with permeability $\mu_0$ and dielectric constant ε. We use the coordinate system shown in Fig. 2.4 throughout the calculation.
Figure 2.4: Schematic representation of the computational domain.
The investigated mask is furthermore assumed to be periodic in the x-direction with period d and homogeneous in the z-direction, i.e., ε is invariant with respect to z. The upper cover material is vacuum and the incident wave is normalized to have unit amplitude. We consider coplanar diffraction with incident wave directions restricted to the xy-plane, leading to reflected and transmitted plane waves in the same plane. The incident light can then be described as a superposition of TE-polarized and TM-polarized light. Note that the magnetic field H and the electric field E remain parallel to the structures in the TM and TE cases, respectively, so that the transverse component u of the respective field ($E_z$ for TE, $H_z$ for TM) can be determined from the two-dimensional Helmholtz equation

$$\Delta u(x,y) + k^2(x,y)\,u(x,y) = 0, \qquad (2.1)$$

with the wavenumber function $k(x,y) = \omega\,\left(\mu_0\,\varepsilon(x,y)\right)^{1/2}$ and the angular frequency ω of the incident light wave. Note that the wavenumber function is constant in areas filled with the same material. On material interfaces, the solution u and its normal derivative $\partial_n u$ for TE polarization, and the solution u and the product $k^{-2}\,\partial_n u$ for TM polarization, have to be continuous across the interface. The usual outgoing wave conditions for half-spaces are required in the infinite regions.
The domain Ω in the cross-section plane can therefore be reduced to a rectangle with the x-coordinate varying between zero and the period d and with two artificial boundaries $\Gamma^\pm = \{y = b^\pm\}$ located beneath the substrate ($\Gamma^-$) and in the covering vacuum ($\Gamma^+$). On the lateral boundaries, quasi-periodic boundary conditions are imposed such that

$$u(0,y) = u(d,y)\,\exp(-i\alpha_0 d),$$

and non-local boundary conditions are imposed on $\Gamma^\pm$.
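For instance, on $\Gamma^+$ the trace $\partial_n u|_{\Gamma^+}$ of the normal derivative $\partial_n u$ must equal the y-derivative of the Rayleigh expansion (see Eq. (2.2)) of the trace $u|_{\Gamma^+}$ of u from Ω. The component $E_z$ admits an expansion into Rayleigh series above and beneath the grating structure. For TE polarization they are given by

$$E_z(x,y) = \sum_{n=-\infty}^{\infty} A_n^+ \exp\left(i\beta_n^+ y\right)\exp(i\alpha_n x) + A_0^{\mathrm{inc}} \exp\left(-i\beta_0^+ y\right)\exp(i\alpha_0 x), \quad y \ge b^+, \qquad (2.2)$$

and

$$E_z(x,y) = \sum_{n=-\infty}^{\infty} A_n^- \exp\left(-i\beta_n^- y\right)\exp(i\alpha_n x), \quad y \le b^-. \qquad (2.3)$$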
Here $\beta_n^\pm = \sqrt{(k^\pm)^2 - \alpha_n^2}$, $k^\pm = k(x, b^\pm)$, $A_0^{\mathrm{inc}} = 1$, $\alpha_0 = k^+ \sin\theta_{\mathrm{inc}}$ and $\alpha_n = k^+ \sin\theta_{\mathrm{inc}} + \frac{2\pi}{d}\,n$. The Rayleigh coefficients $A_n^\pm$ of interest are those with $n \in U^\pm$,

$$U^\pm = \begin{cases} \{n \in \mathbb{Z} : |\alpha_n| < k^\pm\} & \text{if } \operatorname{Im} k^\pm = 0, \\ \emptyset & \text{if } \operatorname{Im} k^\pm > 0, \end{cases}$$

as they describe the magnitude and phase shift of the propagating plane waves. The modulus $|A_n^\pm|$ is the amplitude of the nth reflected/transmitted wave mode and $\arg\left(A_n^\pm / |A_n^\pm|\right)$ is the corresponding phase shift. Terms with $n \notin U^\pm$ lead to evanescent waves.
The efficiency of the nth diffracted wave is defined as the ratio of its energy to the energy of the incoming wave. The energy in turn is defined as the flux of the Poynting vector $P = \operatorname{Re}\left(E \times \overline{H}\right)/2$ through a reference area parallel to the plane of the grating. The efficiencies can be expressed as

$$e_n^\pm = \frac{\beta_n^\pm\,|A_n^\pm|^2}{\beta_0^+\,|A_0^{\mathrm{inc}}|^2}, \qquad (n,\pm) \in \{(n,+) : n \in U^+\} \cup \{(n,-) : n \in U^-\}, \qquad (2.4)$$

see [58], Eq. (1.50). These efficiencies of propagating modes are defined for non-absorbing materials, i.e., materials with $\operatorname{Im} k^\pm = 0$. The efficiencies for TM polarization can be derived analogously. Once Eq. (2.1) has been solved with the finite element method (FEM) for elliptic PDEs [14], the Rayleigh coefficients can be obtained by a discretized Fourier series expansion of the solution restricted to $\Gamma^\pm$ [13] (see Eqs. (2.2) and (2.3)). Equation (2.4) then yields the efficiencies. For our investigations we use the software package DIPOG [20], developed at the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) in Berlin.
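The selection of the propagating orders $U^+$ and the efficiency formula of Eq. (2.4) can be illustrated with a short Python sketch. It assumes a vacuum cover (so $k^+ = 2\pi/\lambda$), treats only the reflected orders in the cover, and takes the Rayleigh coefficients $A_n^+$ as given input; in practice these come from the FEM solution (e.g. in DIPOG). All function names and the coefficient dictionary are hypothetical illustrations and are not part of DIPOG.

```python
import cmath
import math


def alpha(n, wavelength, period, theta_inc_deg):
    """alpha_n = k^+ sin(theta_inc) + 2*pi*n/d, with k^+ = 2*pi/lambda (vacuum cover assumed)."""
    k_plus = 2.0 * math.pi / wavelength
    return k_plus * math.sin(math.radians(theta_inc_deg)) + 2.0 * math.pi * n / period


def propagating_orders(wavelength, period, theta_inc_deg, n_max=100):
    """Orders n in U^+ = {n : |alpha_n| < k^+} for a non-absorbing cover."""
    k_plus = 2.0 * math.pi / wavelength
    return [n for n in range(-n_max, n_max + 1)
            if abs(alpha(n, wavelength, period, theta_inc_deg)) < k_plus]


def beta(n, wavelength, period, theta_inc_deg):
    """beta_n^+ = sqrt((k^+)^2 - alpha_n^2); purely imaginary for evanescent orders."""
    k_plus = 2.0 * math.pi / wavelength
    a_n = alpha(n, wavelength, period, theta_inc_deg)
    return cmath.sqrt(k_plus ** 2 - a_n ** 2)


def reflected_efficiencies(coeffs, wavelength, period, theta_inc_deg, a_inc=1.0):
    """Eq. (2.4) in the cover: e_n^+ = beta_n^+ |A_n^+|^2 / (beta_0^+ |A_0^inc|^2).

    `coeffs` is a hypothetical dict {n: A_n^+}; orders outside U^+ are ignored.
    """
    beta0 = beta(0, wavelength, period, theta_inc_deg).real
    result = {}
    for n in propagating_orders(wavelength, period, theta_inc_deg):
        if n in coeffs:
            b_n = beta(n, wavelength, period, theta_inc_deg).real
            result[n] = b_n * abs(coeffs[n]) ** 2 / (beta0 * abs(a_inc) ** 2)
    return result


# DUV-like example: lambda = 193 nm, d = 560 nm, normal incidence
print(propagating_orders(193.0, 560.0, 0.0))  # [-2, -1, 0, 1, 2]
```

At normal incidence the condition $|\alpha_n| < k^+$ reduces to $|n| < d/\lambda$, which for these values admits only the orders −2 to 2.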
Note that the method described above can be applied to arbitrarily complex periodic structures; there are thus no limits for the forward problem. However, using the FEM as a model function in the inverse problem demands a simplification of the modeling. It is therefore reasonable to restrict the geometry of the profile to certain classes of gratings that can be described by a small number of parameters. A common approach is to define the profile of the grating as a polygonal structure composed of several materials. In this case, the coordinates of the corner points of the polygons, together with the optical constants of the materials, completely define the grating structure. The distribution of the dielectric constant $\varepsilon_p(x,y)$ can then be defined piecewise, with the cases expressed by combinations of linear inequalities in x and y that depend on the chosen parameter vector p.

Consequently, the wavenumber function also depends on these parameters, and the Helmholtz equation reads

$$\Delta u(x,y) + k^2(p,x,y)\,u(x,y) = 0, \qquad (2.5)$$

with $k(p,x,y) = \omega\,\left(\mu_0\,\varepsilon_p(x,y)\right)^{1/2}$. The period of the grating in the x-direction, the coordinates of the corner points (usually given relative to the period), the thickness of the absorber and the specification of the optical constants of the materials, $p_1, \ldots, p_k$, are then sufficient for a complete characterization of the sample.
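As an illustration of defining $\varepsilon_p(x,y)$ through linear inequalities, the following Python sketch assigns the dielectric constant for a single symmetric trapezoidal line on a substrate. The parameterization (bottom-right corner, height, top-right corner, line base at y = 0), the default values and the function name are hypothetical choices for this example; they mirror, but are not identical to, the parameter sets used for the masks below.

```python
def eps_p(x, y, p, period, eps_line, eps_cover=1.0, eps_sub=2.25):
    """Piecewise dielectric constant of one symmetric trapezoidal line (hypothetical layout).

    p = (x_br, h, x_tr): x-coordinates of the bottom-right and top-right corners
    and the line height h > 0. The line sits on the substrate surface y = 0 and is
    symmetric about x = period / 2, so its left corners are period - x_br and period - x_tr.
    """
    x_br, h, x_tr = p
    x = x % period                      # periodicity in the x-direction
    if y < 0.0:                         # substrate half-space
        return eps_sub
    if y > h:                           # above the line: cover material
        return eps_cover
    # Right sidewall x_r(y) = x_br + (x_tr - x_br) * y / h. Multiplying by h > 0
    # turns "x <= x_r(y)" and "period - x <= x_r(y)" into linear inequalities in x, y.
    right_ok = h * x <= h * x_br + (x_tr - x_br) * y
    left_ok = h * (period - x) <= h * x_br + (x_tr - x_br) * y
    return eps_line if (right_ok and left_ok) else eps_cover
```

With the MoSi optical constants of Table 2.2 one could, for instance, pass `eps_line = (2.308 + 0.5975j) ** 2`, using the relation ε = (n + ik)² for non-magnetic materials.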
Together with the wavelength λ of the incident light and the angle of incidence $\theta_{\mathrm{inc}}$, all parameters necessary to model the resulting diffraction pattern are given, and we end up with the model function, which maps the parameter vector $p = (p_1, \ldots, p_k, \lambda, \theta_{\mathrm{inc}})$ onto the corresponding diffraction pattern, i.e., the efficiencies and phase shifts for the different diffraction directions:

$$f : \mathbb{R}^{k+2} \to \mathbb{R}^m, \quad p \mapsto f(p).$$

Note that the PDE from Eq. (2.5) needs to be solved for every evaluation of the model function.
2.2.1 Geometry Model for EUV
The first object under investigation is a photomask with a periodic line pattern
designed for use in EUV lithography. It will be called the EUV mask. The cross
section profile for one spatial period is shown in Fig. 2.5 below.
Figure 2.5: Cross section of the investigated EUV mask.
| Material | Height / nm | SWA / ° | n | k |
|---|---|---|---|---|
| TaO | 12.00 | 82.7 | 0.948 | 0.0310 |
| TaN | – | – | 0.942 | 0.0337 |
| SiO2 (buffer) | 8.000 | 90 | 0.984 | 0.0082 |
| SiO2 (capping) | 1.234 | | 0.984 | 0.0082 |
| Si | 12.869 | | 1.000 | 0.0018 |
| MoSi | 0.147 | | 0.970 | 0.0043 |
| Mo | 2.141 | | 0.925 | 0.0062 |
| MoSi | 1.972 | | 0.970 | 0.0043 |
| Si | 2.838 | | 1.000 | 0.0018 |
| Substrate | ≈ 6.35 · 10⁶ | | 0.984 | 0.0082 |

Table 2.1: Geometric parameters and optical constants of the EUV mask used for simulations, at a wavelength of 13.4 nm; period d = 720 nm.
It consists of a symmetric polygonal domain composed of three trapezoidal layers of different materials (TaO, TaN, and SiO2). These trapezoids are defined by the heights $p_i$, $i = 1, 6, 11$, of the three layers and by the coordinates $p_i$, $i = 2, 3, 7, 8, 12, 13$, which define the x-coordinates of the corner positions. Beneath the line-space structure there are two absorbing layers of SiO2 and Si on top of a Mo/Si multilayer stack (MLS). The stack consists of a structure periodically repeated in the y-direction, composed of a Mo layer and a Si layer separated by two interdiffusion layers of molybdenum silicide (MoSi). Note that the MLS is added to enable the reflection of EUV waves: it acts as a Bragg mirror at the design wavelength of about 13.4 nm.
The most important geometric profile parameters are the height $p_6$ of the TaN layer (55–60 nm) and the x-coordinates $p_2$ and $p_7$ of the right corners of the TaN layer. The complex indices of refraction of the involved materials at a wavelength of 13.4 nm are given in Table 2.1.
A symmetric profile is imposed, i.e., the x-coordinates of the left corners depend on those of the corresponding right corners such that $p_3 = d - p_2$ and $p_8 = d - p_7$, where d is the period of the EUV mask. Furthermore, the sidewall angle (SWA) of the TaO layer is fixed to 82.7°. This is done in order to model a certain edge rounding, i.e., the cross-section area of this trapezoidal layer equals that of a corresponding TaO layer with curved upper edges of radius about 6 nm. Additionally, we assume that the SWA of the SiO2 layer is constant at 90°. The SWA of the TaN layer depends on the x-coordinates of the corners and on the height of the TaN layer, i.e.,

$$\tan(\mathrm{SWA}) = \frac{p_6}{p_2 - p_7}.$$
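The relation between the corner coordinates, the height and the sidewall angle can be evaluated with a few lines of Python. This sketch assumes, as the formula implies, that $p_2$ is the lower and $p_7$ the upper right corner; the function names are hypothetical. The same helper applies to the MoSi mask below with $(p_1, p_3, p_4)$.

```python
import math


def sidewall_angle_deg(x_bottom_right, height, x_top_right):
    """SWA in degrees from tan(SWA) = height / (x_bottom_right - x_top_right).

    EUV mask (TaN layer): sidewall_angle_deg(p2, p6, p7)
    MoSi mask:            sidewall_angle_deg(p1, p3, p4)
    atan2 also covers the vertical case x_bottom_right == x_top_right (90 degrees).
    """
    return math.degrees(math.atan2(height, x_bottom_right - x_top_right))


def mirrored_corner(x_right, period):
    """Symmetric profile: x-coordinate of the left corner, e.g. p3 = d - p2."""
    return period - x_right
```

For example, a 55 nm high TaN layer whose lower right corner lies 5 nm to the right of its upper right corner has a sidewall angle of about 84.8°.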
The geometric features of main interest, i.e., the critical dimensions (CDs) to
be determined by scatterometry, are the height, top width and bottom width of the
absorbing structure and the SWA of the TaN absorber layer, which depend on the pa-
rameters p6, p7 and p2 (cf. Fig. 2.5). In the following we will refer to these parameters
as height, top CD, bottom CD and SWA. In our evaluations all remaining parameters
are set to the values given in Table 2.1, which represent the manufacturer’s design
values [59]. Note that the model function for the EUV mask only depends on the
parameter vector p = (p2, p6, p7) once an incident wavelength λ, which additionally
determines the optical constants, and an incident angle θinc are given.
2.2.2 Geometry Model for DUV
The second object under investigation is another line-space structure, called the
MoSi mask. Its cross section is a trapezoidal domain made of molybdenum silicide
(MoSi) based on a glass substrate (cf. Fig. 2.6).
Figure 2.6: Cross section of the investigated MoSi photomask.
The trapezoid is completely defined by its height $p_3$ and by the x-coordinates $p_i$, $i = 1, 2, 4, 5$, of its corners. Again a symmetric profile is imposed, and the sidewall angle of the MoSi absorber layer depends on the corners and the height such that

$$\tan(\mathrm{SWA}) = \frac{p_3}{p_1 - p_4}.$$
The critical dimensions to be determined by scatterometry are the height, top width
and bottom width of the absorbing structure and the SWA of the MoSi absorber
layer.
The optical constants and the design values of the manufacturer for the MoSi
mask can be found in Table 2.2 below. As for the EUV mask, the model function
for the MoSi mask depends on three parameters p = (p1, p3, p4) once an incident
wavelength λ and an incident angle θinc are given.
| Material | Height / nm | SWA / ° | n | k |
|---|---|---|---|---|
| MoSi | – | – | 2.308 | 0.5975 |
| Substrate | ≈ 6.35 · 10⁶ | | 1.575 | 0 |

Table 2.2: Geometric parameters and optical constants of the MoSi mask at a wavelength of 193 nm; period d = 560 nm.
2.3 Inverse Problem Theory
Equipped with a model function like the ones defined in Section 2.2,

$$f : \mathbb{R}^n \to \mathbb{R}^m, \quad p \mapsto f(p),$$

the (finite-dimensional) inverse problem reads as follows: given a noisy realization $y \in \mathbb{R}^m$ of the model, usually called the measurement data, compute an estimate of the parameters $p \in \mathbb{R}^n$. If we neglect the noise and assume that

$$y = f(p),$$

the straightforward way would be to invert the model function f, such that

$$p = f^{-1}(y).$$

Unfortunately, most inverse problems are ill-posed in the Hadamard sense [31], which means that at least one of the following holds:
1. a solution may not exist,
2. the solution may not be unique,
3. the inverse function $f^{-1}$ may not be continuous, hence small errors in the data y may cause large errors in the estimate of p.
We will present a short overview of three different approaches to solving the inverse problem. Motivated by the actual inverse problem of scatterometry, we will focus on finite-dimensional problems. The subsections discussing the least squares and maximum likelihood approaches are mainly based on [32, 38, 41, 45], while the subsection on the Bayesian approach is based on [10, 23, 48] and also on [38, 41].
2.3.1 Least Squares Approach
We first consider a linear model function, i.e., a model function that can be written as

$$f : \mathbb{R}^n \to \mathbb{R}^m, \quad p \mapsto Mp,$$

for some matrix $M \in \mathbb{R}^{m\times n}$ with rank $r \le \min(m,n)$. The singular value decomposition (SVD) of the matrix M is defined as

$$M = U D V^T, \qquad (2.6)$$

with

$$D = \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m\times n}, \qquad (2.7)$$

$$\Sigma_r = \operatorname{diag}(\sigma_1, \ldots, \sigma_r), \quad \sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0, \quad UU^T = I_m, \quad VV^T = I_n.$$

One way to define the condition number of the matrix M is

$$\operatorname{cond}(M) = \frac{\sigma_1}{\sigma_r}.$$
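For a model with only two parameters (n = 2), the singular values, and hence the condition number, can be computed without a linear algebra library, since $\sigma_i^2$ are the eigenvalues of the 2×2 matrix $M^TM$. The following pure-Python sketch is restricted to that case and is meant only to make the definition concrete. It returns $\sigma_1/\sigma_2$, which equals $\sigma_1/\sigma_r$ when M has full column rank, and flags rank deficiency by returning infinity. For general matrices, or for very ill-conditioned ones where the subtraction below loses precision, a library SVD routine should be used instead.

```python
import math


def singular_values_2col(M):
    """Singular values (sigma_1 >= sigma_2) of an m x 2 matrix given as a list of rows."""
    # Entries of the symmetric Gram matrix G = M^T M
    g11 = sum(row[0] * row[0] for row in M)
    g12 = sum(row[0] * row[1] for row in M)
    g22 = sum(row[1] * row[1] for row in M)
    half_trace = 0.5 * (g11 + g22)
    det = g11 * g22 - g12 * g12
    disc = math.sqrt(max(half_trace * half_trace - det, 0.0))
    lam1 = half_trace + disc                  # largest eigenvalue of G
    lam2 = max(half_trace - disc, 0.0)        # smallest eigenvalue of G, clipped at 0
    return math.sqrt(lam1), math.sqrt(lam2)


def cond(M):
    """sigma_1 / sigma_2 for an m x 2 matrix; infinite if sigma_2 = 0 (rank-deficient)."""
    s1, s2 = singular_values_2col(M)
    return math.inf if s2 == 0.0 else s1 / s2
```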
Ill-conditioning of the matrix M can be classified as follows:
• The problem is rank-deficient, i.e., the solution is not unique, if r < n ≤ m or if m < n.
• The problem is numerically rank-deficient if M has a few small singular values, with a clear gap between the small and the larger values.
• The problem is discrete ill-posed if there is no such gap between the singular values, but there are many small singular values.
Numerically rank-deficient and discrete ill-posed problems are both characterized by a very large condition number and can be considered underdetermined problems.
In general, the noisy data y will not lie in ran(M), so that the inverse problem has no classical solution with Mp − y = 0. In this case we seek a solution such that Mp is, in a certain sense, close to the measurement y, most commonly using the least squares method. First, one defines an objective function

$$\chi^2(p) = \left\|\Omega(Mp - y)\right\|^2.$$

The weight matrix Ω adjusts the importance of individual data points and is usually chosen to reflect the variances of the measurement data. Throughout this work we assume that $\Omega = \operatorname{diag}\left(\omega_1^{1/2}, \ldots, \omega_m^{1/2}\right)$, hence

$$\chi^2(p) = \sum_{j=1}^{m} \omega_j \left[(Mp)_j - y_j\right]^2.$$
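The least squares solution to the inverse problem is the parameter vector $p_{\mathrm{LSQ}}$ such that

$$p_{\mathrm{LSQ}} = \arg\min_p \chi^2(p).$$

Differentiating $\chi^2$ with respect to p and setting $\partial\chi^2/\partial p = 0$ yields the set of normal equations

$$M^T\Omega^T\left(\Omega M p - \Omega y\right) = 0.$$

If r = n ≤ m, the LSQ solution can formally be obtained as

$$p_{\mathrm{LSQ}} = \left(M^T\Omega^T\Omega M\right)^{-1} M^T\Omega^T\Omega\, y. \qquad (2.8)$$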
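For two parameters, Eq. (2.8) can be evaluated directly by solving the 2×2 normal equations. The following Python sketch does this with Cramer's rule for a hypothetical straight-line model $f_j(p) = p_1 + p_2 x_j$; the weights are the diagonal entries $\omega_j$ of $\Omega^T\Omega$. The function name is an illustrative assumption.

```python
def weighted_lsq_2(M, y, w):
    """Solve (M^T W M) p = M^T W y for two parameters, with W = Omega^T Omega = diag(w)."""
    a11 = sum(wj * r[0] * r[0] for r, wj in zip(M, w))
    a12 = sum(wj * r[0] * r[1] for r, wj in zip(M, w))
    a22 = sum(wj * r[1] * r[1] for r, wj in zip(M, w))
    b1 = sum(wj * r[0] * yj for r, yj, wj in zip(M, y, w))
    b2 = sum(wj * r[1] * yj for r, yj, wj in zip(M, y, w))
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("M^T W M is singular (rank-deficient problem)")
    # Cramer's rule for the 2 x 2 system
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


# Straight-line model f_j(p) = p1 + p2 * x_j, noise-free data y_j = 2 + 3 x_j
xs = [0.0, 1.0, 2.0, 3.0]
M = [[1.0, x] for x in xs]
y = [2.0 + 3.0 * x for x in xs]
p_lsq = weighted_lsq_2(M, y, [1.0, 2.0, 3.0, 4.0])   # approximately (2.0, 3.0)
```

For consistent, noise-free data the weights do not change the result; with noisy data they determine how strongly each data point pulls on the solution.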
In the case that the matrix M is rank-deficient, i.e., r < n, and if $\Omega = I_m$, we obtain the LSQ solution of minimal norm using the Moore-Penrose pseudoinverse $M^\dagger$ of M [56]. Using the SVD from Eqs. (2.6) and (2.7), the pseudoinverse is obtained as

$$M^\dagger = V \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T,$$

and the LSQ solution can be calculated via

$$p_{\mathrm{LSQ}} = M^\dagger y.$$
If the matrix M is numerically rank-deficient, a regularization technique known as truncated SVD can be applied, whereby the small singular values $\{\sigma_{k+1}, \ldots, \sigma_r\}$ are replaced by zeros. The pseudoinverse in this case is obtained as

$$M_k^\dagger = V \begin{pmatrix} \Sigma_k^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T, \qquad \Sigma_k^{-1} \in \mathbb{R}^{k\times k}.$$
Problems in which M is discrete ill-posed require further regularization, which means that the ill-posed problem is replaced by a well-posed approximation. The regularization of inverse problems is a vast field, and many techniques are available to deal with this problem. Since no further regularization is applied in this work, we will not present those approaches, but refer to [21, 25, 71].
In the case of a nonlinear model function

$$f : \mathbb{R}^n \to \mathbb{R}^m, \quad p \mapsto f(p),$$

the solution to the inverse problem is found by minimizing the resulting weighted least squares function

$$\chi^2(p) = \left\|\Omega\left(f(p) - y\right)\right\|^2 = \sum_{j=1}^{m} \omega_j \left[f_j(p) - y_j\right]^2. \qquad (2.9)$$
In this work the minimizer of Eq. (2.9) is found by a Gauss-Newton-type iterative optimization proposed by Drège, Alassaad and Byrne [3]. Starting with an initial estimate $p^0$, the model function is approximated in each iteration by a first-order Taylor expansion around the current estimate $p^k$, such that

$$\chi^2(p) \approx \left\|\Omega\left(f(p^k) + J(p^k)\,(p - p^k) - y\right)\right\|^2, \qquad J = \left(\frac{\partial f_j}{\partial p_i}\right)_{j,i}.$$

The next iterate is found by minimizing this linear LSQ problem, analogously to Eq. (2.8), leading to the iteration formula

$$p^{k+1} = p^k + \left(J(p^k)^T\Omega^T\Omega\, J(p^k)\right)^{-1} J(p^k)^T\Omega^T\Omega\left(y - f(p^k)\right).$$
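The iteration formula can be sketched in pure Python for a hypothetical two-parameter toy model $f_j(p) = p_1 \exp(-p_2 x_j)$, which stands in for the FEM-based model function (each evaluation of the real f requires a PDE solve). This is a plain, undamped Gauss-Newton loop written for illustration; it is not the specific routine of [3] or the one available in DIPOG.

```python
import math


def toy_model(p, xs):
    """Hypothetical model f_j(p) = p1 * exp(-p2 * x_j)."""
    return [p[0] * math.exp(-p[1] * x) for x in xs]


def toy_jacobian(p, xs):
    """Rows J[j] = (df_j/dp1, df_j/dp2)."""
    return [[math.exp(-p[1] * x), -p[0] * x * math.exp(-p[1] * x)] for x in xs]


def gauss_newton(y, xs, w, p0, max_iter=50, tol=1e-12):
    """Undamped Gauss-Newton for the weighted LSQ function of Eq. (2.9), two parameters."""
    p = list(p0)
    for _ in range(max_iter):
        r = [yj - fj for yj, fj in zip(y, toy_model(p, xs))]   # y - f(p^k)
        J = toy_jacobian(p, xs)
        # Normal equations (J^T W J) dp = J^T W r with W = Omega^T Omega = diag(w)
        a11 = sum(wj * Jj[0] * Jj[0] for Jj, wj in zip(J, w))
        a12 = sum(wj * Jj[0] * Jj[1] for Jj, wj in zip(J, w))
        a22 = sum(wj * Jj[1] * Jj[1] for Jj, wj in zip(J, w))
        b1 = sum(wj * Jj[0] * rj for Jj, rj, wj in zip(J, r, w))
        b2 = sum(wj * Jj[1] * rj for Jj, rj, wj in zip(J, r, w))
        det = a11 * a22 - a12 * a12
        dp1 = (a22 * b1 - a12 * b2) / det
        dp2 = (a11 * b2 - a12 * b1) / det
        p = [p[0] + dp1, p[1] + dp2]
        if abs(dp1) + abs(dp2) < tol:
            break
    return p


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = toy_model([2.0, 0.7], xs)                          # noise-free synthetic data
p_hat = gauss_newton(y, xs, [1.0] * len(xs), [1.8, 0.6])
```

Without damping or a line search, Gauss-Newton relies on a reasonable initial estimate; practical implementations usually add such safeguards.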
Clearly, the variances of the input data influence the variances of the reconstructed parameters: a higher variance of the measurement noise typically leads to a higher variance of the reconstructed parameter values. One way to estimate these variances is to calculate the approximate covariance matrix as proposed in [3, 55]. If we assume that the measurement errors are normally distributed with zero mean and that the model function f is approximately linear in the relevant region of the parameter values $p_i$, then the errors of the reconstructed parameters are again normally distributed random numbers with zero mean. The standard deviations $u(p_i)$ of the quantities $p_i$ are given by the square roots of the main diagonal entries of the covariance matrix Σ of the parameters. The matrix Σ can be approximated as

$$\Sigma \approx \left(J^T\Omega^T\Omega J\right)^{-1}, \qquad (2.10)$$

with $\Omega = \operatorname{diag}\left(\omega_1^{1/2}, \ldots, \omega_m^{1/2}\right)$. Hence

$$u(p_i) \approx \left(\Sigma_{i,i}\right)^{1/2}.$$
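Eq. (2.10) can be made concrete for two parameters with an explicit 2×2 inverse. The Python sketch below assumes that the weights are the inverse variances, $\omega_j = \sigma_j^{-2}$, and that J is the Jacobian at the minimizer; the Jacobian used in the example belongs to a hypothetical straight-line model and the function names are illustrative.

```python
import math


def lsq_covariance_2(J, w):
    """Sigma ~ (J^T Omega^T Omega J)^{-1}, Eq. (2.10), for two parameters; w_j = omega_j."""
    a11 = sum(wj * r[0] * r[0] for r, wj in zip(J, w))
    a12 = sum(wj * r[0] * r[1] for r, wj in zip(J, w))
    a22 = sum(wj * r[1] * r[1] for r, wj in zip(J, w))
    det = a11 * a22 - a12 * a12
    # Explicit inverse of the symmetric 2 x 2 matrix
    return [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]


def standard_uncertainties(J, w):
    """u(p_i) = sqrt(Sigma_ii)."""
    S = lsq_covariance_2(J, w)
    return math.sqrt(S[0][0]), math.sqrt(S[1][1])


# Straight-line model f_j = p1 + p2 * x_j at x = 0, 1, 2 with unit variances
J_line = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
u_unit = standard_uncertainties(J_line, [1.0, 1.0, 1.0])   # (sqrt(5/6), sqrt(1/2))
```

Halving all noise standard deviations (weights multiplied by 4) halves both uncertainties, in line with the remark that lower measurement noise leads to smaller parameter variances.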
Note that in the earlier works [3, 30, 55] the scaling factors $\omega_j$ in Eq. (2.10) were chosen according to a predefined error model. A modified approach to the variance estimation for LSQ is to choose the scaling factors according to the residuals of the optimal solution, based on the following reasoning. A consistent solution $\hat p$ to the optimization problem (cf. Eq. (2.9)) should pass the χ²-test, namely

$$\chi^2_{\min} = \sum_{j=1}^{m} \omega_j \left[f_j(\hat p) - y_j\right]^2 \in \left[\chi^2_{\nu,\alpha/2},\ \chi^2_{\nu,1-\alpha/2}\right], \qquad (2.11)$$

where ν = m − n denotes the degrees of freedom and $\left[\chi^2_{\nu,\alpha/2}, \chi^2_{\nu,1-\alpha/2}\right]$ the confidence interval of the corresponding $\chi^2_\nu$ distribution for a specific significance level, e.g., α = 0.05. If this is not the case, we can fulfill the condition of Eq. (2.11) by rescaling the variances of the input data, and hence the weights, with a scaling factor κ,

$$\tilde\omega_j = \kappa\,\omega_j, \qquad \kappa = \frac{\nu}{\chi^2_{\min}},$$

chosen such that the rescaled $\chi^2_{\min}$ equals ν.
Note that this rescaling of the weights does not affect the result of the optimization, i.e., the parameter values that minimize the function in Eq. (2.9), since all weights are multiplied by the same constant. However, it adjusts the estimated variances of the reconstructed parameters: by Eq. (2.10), Σ is multiplied by 1/κ. In this work, LSQ refers to the minimization of the function in Eq. (2.9) followed by a rescaling of the variances according to Eq. (2.11). In [28, 29, 30] it has been shown that the variances of the reconstructed parameters calculated using the above approximation are comparable to those obtained using a more time-consuming Monte Carlo-type method, assuming known variances of the measurement data and local linearity of the model function f around the minimum.
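The rescaling step itself is a one-liner once the residuals at the minimum are known. The following Python sketch computes $\chi^2_{\min}$ and the factor κ = ν/χ²_min; the interval check of Eq. (2.11) needs χ² quantiles (e.g. from a statistics library) and is deliberately omitted here, so the sketch always rescales. Function names are hypothetical.

```python
def chi2_min(residuals, w):
    """chi^2_min = sum_j omega_j * r_j^2 with r_j = f_j(p_hat) - y_j."""
    return sum(wj * rj * rj for rj, wj in zip(residuals, w))


def rescale_weights(residuals, w, n_params):
    """Return (kappa, rescaled weights) such that the rescaled chi^2_min equals nu = m - n."""
    nu = len(residuals) - n_params
    kappa = nu / chi2_min(residuals, w)
    return kappa, [kappa * wj for wj in w]


# Four residuals of size 1 with unit weights and two fitted parameters:
# chi^2_min = 4 but nu = 2, so kappa = 0.5 and the weights are halved.
kappa, w_new = rescale_weights([1.0, -1.0, 1.0, -1.0], [1.0] * 4, 2)
```

Here the residuals are larger than the assumed variances suggest, so κ < 1: the weights shrink and the covariance of Eq. (2.10) grows by the factor 1/κ = 2.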
2.3.2 Maximum Likelihood Approach
Note that the weighted least squares approach described in the previous subsection requires complete knowledge of the variances of the measurement data. Since this requirement is seldom fulfilled in practice, we introduce a method that is capable of solving inverse problems without this knowledge, known as maximum likelihood estimation (MLE). All that is required for MLE is a specification of the underlying statistical model. In the present work we exclusively deal with normally distributed measurement errors, and we therefore focus on MLE for such an error model.
If we assume that the measurement errors

$$\epsilon_j = y_j - f_j(p), \quad j = 1, \ldots, m,$$

i.e., the differences between the measured and the model values, are uncorrelated and normally distributed with zero mean and unknown variances $\sigma_1^2, \ldots, \sigma_m^2$, then their joint probability density, regarded as a function of the unknowns, is the likelihood function

$$L(\sigma_1, \ldots, \sigma_m, p) = \prod_{j=1}^{m} \left(2\pi\sigma_j^2\right)^{-1/2} \exp\left(-\frac{\left(f_j(p) - y_j\right)^2}{2\sigma_j^2}\right). \qquad (2.12)$$
The maximum likelihood estimator is the parameter combination that maximizes this likelihood function, i.e.,

$$\hat\theta_{\mathrm{MLE}} = \left(\hat\sigma_1, \ldots, \hat\sigma_m, \hat p\right) = \arg\max_{\sigma_j,\,p} L(\sigma_j, p).$$

This parameter combination is the one most likely to have produced the measured data y. Note that for fixed variances the maximum likelihood estimator is identical to the LSQ solution with weights $\omega_j = \sigma_j^{-2}$, since then $-\log L = \chi^2(p)/2 + \text{const}$.
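The equivalence between MLE with fixed variances and weighted LSQ can be checked numerically: the difference between $-\log L$ from Eq. (2.12) and $\chi^2/2$ from Eq. (2.9) (with $\omega_j = \sigma_j^{-2}$) does not depend on the model values, so both are minimized by the same p. The Python sketch below uses made-up data and model values purely for illustration.

```python
import math


def neg_log_likelihood(f_vals, y, sigmas):
    """-log L from Eq. (2.12) for model values f_j(p), data y_j and standard deviations sigma_j."""
    return sum(0.5 * math.log(2.0 * math.pi * s * s) + (f - yj) ** 2 / (2.0 * s * s)
               for f, yj, s in zip(f_vals, y, sigmas))


def weighted_chi2(f_vals, y, sigmas):
    """chi^2 from Eq. (2.9) with weights omega_j = sigma_j^-2."""
    return sum((f - yj) ** 2 / (s * s) for f, yj, s in zip(f_vals, y, sigmas))


y_data = [1.0, 2.0, 3.0]
sig = [0.1, 0.2, 0.3]
# The offset -log L - chi^2 / 2 is the same for two different sets of model values:
offsets = [neg_log_likelihood(f, y_data, sig) - 0.5 * weighted_chi2(f, y_data, sig)
           for f in ([1.1, 1.9, 3.2], [0.5, 2.5, 2.0])]
```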
If the second derivative of the logarithm of the likelihood function exists and is finite, then the covariance matrix of the maximum likelihood estimator $\hat\theta$ can asymptotically be expressed in terms of the negative second derivative of log L evaluated at $\hat\theta$, the (observed) Fisher information matrix [49]

$$I = -\left(\frac{\partial^2 \log L}{\partial\theta_i\,\partial\theta_j}\right)_{i,j}.$$

The standard error of the maximum likelihood estimator $\hat\theta$ can then be calculated as

$$u(\hat\theta_i) = \left(\Sigma_{i,i}\right)^{1/2}, \quad \text{with } \Sigma = I^{-1}. \qquad (2.13)$$
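Eq. (2.13) can be illustrated with a finite-difference Hessian of log L. To keep the example checkable, the Python sketch uses a hypothetical toy likelihood that is simpler than Eq. (2.12): i.i.d. normal data with unknown mean μ and standard deviation σ. Its MLE is the sample mean and the (biased) sample standard deviation, and the resulting standard error of $\hat\mu$ is $\hat\sigma/\sqrt{n}$. Central differences are an assumption of this sketch; analytic derivatives are preferable when available.

```python
import math


def log_lik_normal(theta, data):
    """log L for i.i.d. N(mu, sigma^2) data; theta = (mu, sigma)."""
    mu, sigma = theta
    n = len(data)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma * sigma)
            - sum((x - mu) ** 2 for x in data) / (2.0 * sigma * sigma))


def observed_information(loglik, theta, h=1e-4):
    """I = -(Hessian of log L) at theta, via central finite differences."""
    k = len(theta)

    def shifted(i, di, j, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return loglik(t)

    info = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            d2 = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                  - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4.0 * h * h)
            info[i][j] = -d2
    return info


def standard_errors_2(info):
    """u(theta_i) = sqrt((I^{-1})_ii) for a 2 x 2 information matrix, Eq. (2.13)."""
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    return math.sqrt(info[1][1] / det), math.sqrt(info[0][0] / det)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
theta_hat = (3.0, math.sqrt(2.0))          # MLE: sample mean, biased sample std
info = observed_information(lambda t: log_lik_normal(t, data), theta_hat)
u_mu, u_sigma = standard_errors_2(info)    # about sqrt(0.4) and sqrt(0.2)
half_width_95 = 1.96 * u_mu                # 95% error bar, as used in this work
```

Here $u(\hat\mu) \approx 0.632$, so the 95% error bar on $\hat\mu$ is about ±1.24.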
If the operations of integration with respect to y and differentiation with respect to θ can be interchanged for the second derivative of log L, it can be proven that the maximum likelihood estimator is asymptotically efficient. This means that it is consistent and, in addition, asymptotically attains the Cramér-Rao lower bound. Consequently, no asymptotically unbiased estimator has a lower asymptotic mean squared error than the MLE. Note that the error bars presented throughout this work, both for LSQ and MLE, correspond to 95% confidence intervals, which are obtained by multiplying the standard errors by a factor of 1.96.
2.3.3 Bayesian Approach
The methods introduced in the previous subsections, LSQ and MLE, both yield single estimates for the parameters of interest. These estimates are found by minimizing the weighted least squares function and by maximizing the likelihood function, respectively. However, especially for MLE but also for LSQ, several local extrema may appear, such that a single estimate may be uninformative. The Bayesian approach to the inverse problem takes a different point of view: the solution is no longer a single estimate but a probability density for the parameters of interest, called the posterior probability distribution.
We now outline the framework needed to understand this approach. Note that this subsection is mainly based on [38, 41, 68]. The main principles of the Bayesian approach are:
1. All variables included in the model are modeled as random variables.
2. The randomness describes our degree of information concerning their realizations.
3. The degree of information concerning these values is encoded in the probability distributions.
4. The solution of the inverse problem is the posterior probability distribution.
Denoting random variables by capital letters, the model takes the form

$$Y = g(\Theta),$$

with Y taking values in $\mathbb{R}^m$ and Θ taking values in $\mathbb{R}^{n+k}$. Note that this model function g is not identical to the model function f discussed before, as it usually contains information about the measurement errors and the appropriate error model in addition to the model function for the physical model. The directly observable random variable Y is called the measurement and its realization $Y = y_{\mathrm{observed}}$ the data. The non-observable random variable Θ that is of primary interest is called the unknown.
Any information about Θ that is available before the measurement is encoded in a probability density

$$\theta \mapsto \pi_{\mathrm{pri}}(\theta),$$

called the prior density, expressing what we know about the unobservable parameters prior to the measurement. After analyzing the measurement setting as well as all additional information available about the variables, we obtain the joint probability density π(θ, y) of Θ and Y. The marginal density of the unknown Θ must then be

$$\int_{\mathbb{R}^m} \pi(\theta, y)\,dy = \pi_{\mathrm{pri}}(\theta).$$

If, on the other hand, we knew the value of the unknown, the conditional probability density of Y given this information would be

$$\pi(y\,|\,\theta) = \frac{\pi(\theta, y)}{\pi_{\mathrm{pri}}(\theta)}, \quad \text{if } \pi_{\mathrm{pri}}(\theta) \neq 0.$$

This conditional probability density of Y is called the likelihood function, as it expresses the likelihood of different measurement outcomes given Θ = θ (cf. Eq. (2.12)).
If the measurement data $Y = y_{\mathrm{observed}}$ are given, we end up with the conditional probability distribution

$$\pi(\theta\,|\,y_{\mathrm{observed}}) = \frac{\pi(\theta, y_{\mathrm{observed}})}{\pi(y_{\mathrm{observed}})}, \quad \text{if } \pi(y_{\mathrm{observed}}) \neq 0,$$

called the posterior distribution, as it expresses what we know about Θ after the observation $Y = y_{\mathrm{observed}}$.
The goal in the Bayesian framework is to find the conditional probability density $\pi(\theta\,|\,y_{\mathrm{observed}})$ for a given set of measurement data $Y = y_{\mathrm{observed}}$. Combining the above results (Bayes' theorem), it can be expressed as

$$\pi_{\mathrm{post}}(\theta) = \pi(\theta\,|\,y_{\mathrm{observed}}) = \frac{\pi_{\mathrm{pri}}(\theta)\,\pi(y_{\mathrm{observed}}\,|\,\theta)}{\pi(y_{\mathrm{observed}})}. \qquad (2.14)$$

Looking at Eq. (2.14), solving an inverse problem in the Bayesian framework can be split into three steps:
1. Collect all the available prior information about the unknown Θ and construct a prior probability density $\pi_{\mathrm{pri}}$ that reflects this information.
2. Find the likelihood function π(y | θ) that describes the interrelation between the observation and the unknown.
3. Develop methods to explore the posterior probability density if it cannot be expressed analytically.
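For a single parameter, the three steps can be carried out by brute force: evaluate prior times likelihood on a grid and normalize numerically, which approximates the evidence $\pi(y_{\mathrm{observed}})$ in Eq. (2.14). The Python sketch below uses a hypothetical linear toy model y = 2θ with Gaussian noise (σ = 0.1), a uniform prior on [0, 1] and $y_{\mathrm{observed}} = 1$; it only illustrates Eq. (2.14) and is not a method for the multi-parameter scatterometry problem, where grid evaluation quickly becomes impractical.

```python
import math


def grid_posterior(prior, likelihood, grid):
    """Posterior density pi_pri * pi(y|theta) / pi(y) on a uniform grid, Eq. (2.14)."""
    dt = grid[1] - grid[0]
    unnormalized = [prior(t) * likelihood(t) for t in grid]
    evidence = sum(unnormalized) * dt        # Riemann-sum approximation of pi(y_observed)
    return [u / evidence for u in unnormalized]


def posterior_mean_sd(grid, density):
    """Mean and standard deviation of a density tabulated on a uniform grid."""
    dt = grid[1] - grid[0]
    mean = sum(t * d for t, d in zip(grid, density)) * dt
    var = sum((t - mean) ** 2 * d for t, d in zip(grid, density)) * dt
    return mean, math.sqrt(var)


# Hypothetical toy problem: y = 2 * theta + noise, noise ~ N(0, 0.1^2), y_observed = 1.0
Y_OBS, SIGMA = 1.0, 0.1


def uniform_prior(theta):
    return 1.0 if 0.0 <= theta <= 1.0 else 0.0


def gaussian_likelihood(theta):
    # Normalizing constant omitted; it cancels against the evidence.
    return math.exp(-(2.0 * theta - Y_OBS) ** 2 / (2.0 * SIGMA ** 2))


grid = [i / 2000 for i in range(2001)]
posterior = grid_posterior(uniform_prior, gaussian_likelihood, grid)
post_mean, post_sd = posterior_mean_sd(grid, posterior)   # close to 0.5 and 0.05
```

Because the prior is flat over the region where the likelihood is non-negligible, the posterior here is essentially the normalized likelihood, centered at θ = 0.5 with standard deviation σ/2.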
Chapter 3
Maximum Likelihood and Least Squares
This chapter presents a comparison between the least squares method and maximum likelihood estimation. In Section 3.1 we introduce our model of the measurement errors and show how it is incorporated into the two approaches. Section 3.2 presents the results obtained by the two methods for both simulated and measured datasets. It is demonstrated how LSQ can lead to unsatisfactory results if the knowledge of the measurement error is incomplete, and how MLE can be applied to circumvent this problem. The results have already been published in [34].
3.1 Measurement Error Model
Usually, a measurement dataset that characterizes the diffraction pattern is given by a vector $y = (y_1, \ldots, y_m)$ consisting of efficiencies or phase shift differences for different wavelengths, incident angles or polarization states, where the jth data point is the sum of the value of the model function and a contribution due to measurement noise:

$$y_j = f_j(p) + \epsilon_j,$$

where $\epsilon_j$ denotes the corresponding measurement error. If there is no correlation between the measurements and there are no further systematic errors, we can assume the errors $\epsilon_j$ of the different measurements to be independent. We furthermore assume that they can be modeled as a sum of two independent, normally distributed random variables, both with zero mean, such that

$$\epsilon_j = \epsilon_{j,1} + \epsilon_{j,2}, \quad \text{with } \epsilon_{j,1} \sim N\left(0, (a f_j)^2\right) \text{ and } \epsilon_{j,2} \sim N\left(0, b^2\right).$$
From an experimental point of view, power fluctuations of the incident beam during the recording of the diffraction patterns are the main source of the first term. The second term describes the contribution of background noise, which is independent of the measured light intensities. The overall variance of the errors then reads

$$\sigma_j^2 = (a f_j)^2 + b^2. \qquad (3.1)$$
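The error model can be used directly to generate synthetic data, as is done for the simulated datasets below. The following Python sketch draws the two noise contributions separately and checks that their sum has the standard deviation of Eq. (3.1). The function names and the fixed random seed are illustrative choices, not part of the original procedure.

```python
import math
import random


def noise_std(f, a, b):
    """sigma_j = sqrt((a f_j)^2 + b^2), Eq. (3.1)."""
    return math.sqrt((a * f) ** 2 + b ** 2)


def simulate_data(f_vals, a, b, rng):
    """y_j = f_j + eps_j1 + eps_j2, eps_j1 ~ N(0, (a f_j)^2), eps_j2 ~ N(0, b^2)."""
    return [f + rng.gauss(0.0, abs(a * f)) + rng.gauss(0.0, b) for f in f_vals]


rng = random.Random(0)
n_samples = 100000
samples = simulate_data([0.1] * n_samples, 0.1, 0.01, rng)
mean = sum(samples) / n_samples
emp_sd = math.sqrt(sum((v - mean) ** 2 for v in samples) / (n_samples - 1))
# emp_sd should be close to noise_std(0.1, 0.1, 0.01) = sqrt(2) * 0.01
```

The two contributions are equal where $a f_j = b$, i.e. at $f_j = b/a$: for a = 10% and b = 10⁻³ this happens at an efficiency of 10⁻², so larger efficiencies are dominated by the relative term and smaller ones by the background term.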
Based on this error model and the resulting variances in Eq. (3.1), the weighted least squares function given by Eq. (2.9) takes the following form:

$$\chi^2(p) = \sum_{j=1}^{m} \omega_j \left[f_j(p) - y_j\right]^2, \qquad \omega_j = \sigma_j^{-2} = \left[(a f_j(p))^2 + b^2\right]^{-1}. \qquad (3.2)$$

Throughout this chapter, the LSQ solution to the inverse problem refers to the minimum of this weighted least squares function. Note that, for fixed a and b, it depends only on the geometry parameters p of the model function. The least squares function is minimized with a Gauss-Newton-type optimization routine available in DIPOG.
For given measurement data y, the likelihood function for this error model is a function of a, b and the geometry parameters p [49] (see Eq. (2.12)):

$$L(a, b, p) = \prod_{j=1}^{m} \left(2\pi\left[(a f_j(p))^2 + b^2\right]\right)^{-1/2} \exp\left(-\frac{\left(f_j(p) - y_j\right)^2}{2\left[(a f_j(p))^2 + b^2\right]}\right).$$

The MLE solution to the inverse problem refers to the values of a, b and p corresponding to the maximum of this likelihood function. For the maximization, DIPOG is used only as a black box to solve the forward problem and to calculate the gradients of the model function with respect to the parameters p. The optimization is performed using a MATLAB routine modified by the author.
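The objective that the MLE maximizes can be written down compactly as a negative log-likelihood in (a, b) for given model values $f_j(p)$. The Python sketch below only evaluates this objective; the joint optimization over a, b and p, and the forward solver that supplies $f_j(p)$, are outside its scope. The function name is hypothetical and not part of DIPOG or of the author's MATLAB routine.

```python
import math


def neg_log_likelihood_ab(a, b, f_vals, y):
    """-log L(a, b, p) for the error model of Eq. (3.1), given model values f_j(p)."""
    total = 0.0
    for f, yj in zip(f_vals, y):
        var = (a * f) ** 2 + b ** 2          # sigma_j^2 from Eq. (3.1)
        total += 0.5 * math.log(2.0 * math.pi * var) + (f - yj) ** 2 / (2.0 * var)
    return total


# Pure background noise (f_j = 0): the optimal b^2 is the mean squared residual.
y_bg = [0.1, -0.2, 0.3, -0.1]
b_hat = math.sqrt(sum(v * v for v in y_bg) / len(y_bg))
nll_at_b_hat = neg_log_likelihood_ab(0.0, b_hat, [0.0] * 4, y_bg)
```

Unlike χ² in Eq. (3.2), this objective contains the log-variance term, which penalizes inflating a and b; this is what allows the noise parameters to be estimated from the data.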
3.2 Results
We now apply the LSQ and the MLE approaches to several simulated and mea-
sured datasets both for EUV and DUV scatterometry.
3.2.1 Dependency on the Chosen Weight Factors for EUV Data
In this section, we solve the reconstruction problem using simulated data superposed with noise whose variances are parametrized by a and b. We start with a least squares approach assuming fixed uncertainty parameters a and b. For one realization of a simulated dataset perturbed by noise with a = 10% and b = 10⁻³ (cf. Eq. (3.1)), Figure 3.1 shows the values of the χ² function defined in Eq. (3.2) in dependence on the bottom CD and top CD for two different noise models, i.e., two different weightings in the least squares function from Eq. (3.2). Figure 3.2 shows the dependency of the reconstructed CDs on the ratio b/a for the same example.
Note that the reconstruction results depend only on the ratio of b to a and not on the absolute values of the two parameters: since $\omega_j = a^{-2}\left[f_j(p)^2 + (b/a)^2\right]^{-1}$, changing a at fixed b/a only rescales $\chi^2$ by a constant factor and leaves its minimizer unchanged. The geometrical parameters of the mask used in the simulations were set to a bottom CD of 550 nm and a top CD of 546.9 nm, corresponding to a sidewall angle of 90° for the TaN and the SiO₂ layer and 82.7° for the TaO layer on top of the absorber line (cf. Fig. 2.5).
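The ratio invariance can be checked numerically. The sketch below assumes a hypothetical one-parameter toy model (all efficiencies proportional to p) and made-up "measurements"; it is not the DIPOG forward model. A brute-force grid search returns the same minimizer for (a, b) and (2a, 2b), whereas a different ratio b/a may move the minimum.

```python
from typing import Callable, List, Sequence


def chi2(f: Sequence[float], y: Sequence[float], a: float, b: float) -> float:
    """Weighted least squares function of Eq. (3.2)."""
    return sum((fj - yj) ** 2 / ((a * fj) ** 2 + b ** 2) for fj, yj in zip(f, y))


def grid_argmin(model: Callable[[float], List[float]], y: Sequence[float],
                a: float, b: float, grid: Sequence[float]) -> float:
    """Brute-force minimizer of chi2 over a 1-D parameter grid (illustration only)."""
    return min(grid, key=lambda p: chi2(model(p), y, a, b))


# Hypothetical one-parameter model: all efficiencies scale linearly with p.
G = [1.0, 0.5, 0.05, 0.01]


def toy_model(p: float) -> List[float]:
    return [p * gj for gj in G]


y_obs = [1.08, 0.47, 0.052, 0.0093]            # made-up "measurements"
grid = [0.5 + 0.001 * i for i in range(1001)]  # p in [0.5, 1.5]

p_ref = grid_argmin(toy_model, y_obs, a=0.10, b=1e-3, grid=grid)
p_same_ratio = grid_argmin(toy_model, y_obs, a=0.20, b=2e-3, grid=grid)   # b/a unchanged
p_other_ratio = grid_argmin(toy_model, y_obs, a=0.01, b=1e-3, grid=grid)  # b/a ten times larger
print(p_ref, p_same_ratio, p_other_ratio)
```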
[Figure 3.1, two panels (b/a = 0.01 and b/a = 0.1): $\chi^2$ plotted over bottom CD / nm (horizontal) and top CD / nm (vertical).]
Figure 3.1: $\chi^2$ as a function of bottom CD and top CD for different ratios b/a.
[Figure 3.2, two panels: reconstructed bottom CD and top CD (CD / nm) and sidewall angle / ° versus b/a from 0 to 0.1.]
Figure 3.2: Reconstructed CDs and SWA as a function of the ratio b/a for a simulated dataset.
Figure 3.1 clearly shows how the minimum (the blue dot) shifts when the ratio b/a changes, resulting in two different solutions to the inverse problem. This dependency naturally also affects the reconstructed sidewall angle (see Fig. 3.2). The reconstructed top CD decreases and the reconstructed bottom CD increases by several nanometers when the value of b/a used in the reconstruction is chosen one order of magnitude larger than the true value (here $b/a = 10^{-3}/0.1 = 0.01$).
This sensitivity led us to treat the noise parameters a and b as additional variables that need to be reconstructed as well: a wrong or incomplete assessment of the variances of the input parameters (scattering efficiencies) not only leads to an under- or overestimation of the variances of the output parameters (reconstructed geometry parameters) but also causes significant systematic errors in the results. This is especially important when determining the setup for a new experiment, as the knowledge of the variances of the measurement errors is usually incomplete.
3.2.2 Application to Simulated EUV Data
In the following, MLE is applied to datasets obtained by simulating the diffraction pattern of an EUV mask with bottom CD = 550 nm, top CD = 546.9 nm and height = 58 nm. The results are compared to those of the LSQ method applied to the same numerically generated datasets. The simulation parameters are similar to those of the actual EUV measurements on such masks. The −10th to 12th diffraction orders of an incident beam at a fixed angle of incidence of 6° are computed numerically for three different wavelengths of the incoming EUV radiation, resulting in a set of 3 × 23 = 69 data points.
The simulated data were perturbed with a normally distributed measurement error with $a = 10\%$ and $b = 10^{-3}$ (cf. Eq. (3.1)), resulting in a total of 50 noisy datasets. The weights $\omega_j$ for LSQ in Eq. (3.2) were defined with $a = 1\%$ and $b = 10^{-3}$, a typical estimate of the variances in the real measurement processes used in earlier publications [30]. For MLE, these noise parameters were treated as unknowns and hence had to be reconstructed as well.
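The generation of the noisy datasets can be sketched as below, assuming independent Gaussian errors with the variance of Eq. (3.1). The exact efficiencies `f_exact` are hypothetical placeholders for the 69 simulated values, and the seeded `random.Random` generator is our choice for reproducibility, not necessarily what was used in the original study.

```python
import random
from typing import List, Sequence


def simulate_datasets(f_true: Sequence[float], a: float, b: float,
                      n_sets: int, seed: int = 0) -> List[List[float]]:
    """Draw n_sets noisy copies y_j = f_j + eps_j, eps_j ~ N(0, (a f_j)^2 + b^2), cf. Eq. (3.1)."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [
        [fj + rng.gauss(0.0, ((a * fj) ** 2 + b ** 2) ** 0.5) for fj in f_true]
        for _ in range(n_sets)
    ]


# Hypothetical exact efficiencies for 3 wavelengths x 23 orders = 69 data points.
f_exact = [0.01 + 0.001 * j for j in range(69)]
datasets = simulate_datasets(f_exact, a=0.10, b=1e-3, n_sets=50)
# An LSQ fit would weight these data with a = 0.01, b = 1e-3 (a misspecified
# noise model), whereas MLE treats a and b as unknowns.
```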
The results for the noise parameter a and the ratio b/a, along with the approximate 95% confidence intervals, are presented in Fig. 3.3. MLE is able to reconstruct the noise parameter a from a dataset of limited size with a typical relative error of 10%–20%, while the errors in b can be substantially larger. Figure 3.4 presents the reconstructed sidewall angles along with the approximate 95% confidence intervals for the solutions of the two methods. There is a systematic shift of about 1.5° between the actual value of 90° and the mean sidewall angle estimated by LSQ, while the mean sidewall angle estimated by MLE is almost identical to the actual value. This bias is caused by the nonlinearity of the mathematical model; for the present model it is found to vanish if the correct weights are chosen.
[Figure 3.3, two panels over datasets 1–50: reconstructed noise parameter a (left) and ratio b/a (right).]
Figure 3.3: Reconstructed noise parameter a (left panel) and ratio b/a (right panel) with approximate 95% confidence intervals for MLE. The green dotted lines represent the actual values, the red dotted lines the mean values.
[Figure 3.4, two panels (LSQ left, MLE right): SWA / ° (75 to 100) versus dataset 1–50.]
Figure 3.4: Comparison of the reconstructed sidewall angles with approximate 95% confidence intervals for the solutions of LSQ (left panel) and MLE (right panel). The green dotted lines represent the actual values, the red dotted lines the mean values.
For a consistent reconstruction, the true value of a geometry parameter should lie within the 95% confidence interval around the reconstructed value. Figure 3.4 shows that the percentage of consistently reconstructed sidewall angles is comparable for both methods (92% for LSQ, 94% for MLE). However, the root-mean-square deviations (RMSD) of the reconstructed values from the true values, shown in Fig. 3.5, are about twice as high for LSQ as for MLE.
[Figure 3.5, two panels (LSQ left, MLE right): relative deviation / % (0 to 4) of RMSD and mean estimated standard deviation for top CD, bottom CD, height and SWA.]
Figure 3.5: Comparison of the RMSD and the mean estimated standard deviations in % of the actual value for LSQ (left panel) and for MLE (right panel).
3.2.3 Application to Measured EUV Data
In the following, we apply maximum likelihood estimation to measurement data from EUV scatterometry at PTB. The EUV mask under investigation is structured into 121 dies, each of which contains two different scatterometry fields with periodic line and space structures. The dies D4, D8, F6 and H8 have identical design values, whereas the dies H4, H5 and H6 have different periods but are assumed to have the same bottom CD; for detailed values see Table 3.1.
Dataset           Period/nm   Bottom CD/nm
H4                840         140
H5                420         140
H6                280         140
D4, D8, F6, H8    720         540
Table 3.1: Design values of the EUV mask.
The uncertainty of the periods is about 370 pm [73]. All dies share the remaining geometrical and structural parameters given in Table 2.1. The dependence of the reconstructed solution on the choice of weights, shown in Fig. 3.6, has the same qualitative shape as the one obtained for simulated input data in Fig. 3.2.
[Figure 3.6, two panels: reconstructed bottom CD and top CD (CD / nm) and sidewall angle / ° versus b/a from 0 to 0.1.]
Figure 3.6: Reconstructed CDs and SWA as a function of the ratio b/a for measurement dataset D4.
[Figure 3.7, two panels: noise parameter a (left) and ratio b/a (right) for datasets H4, H5, H6, D4, D8, F6 and H8.]
Figure 3.7: Reconstructed noise parameter a (left panel) and ratio b/a (right panel) with approximate 95% confidence intervals for measured data from the EUV scatterometer. The dotted lines represent the mean values of the reconstructed values.
The reconstructed noise parameters a and the ratios b/a for all measured datasets are plotted in Fig. 3.7. The mean estimated value of b/a lies around $1.2 \cdot 10^{-2}$, which differs significantly from the values of approximately $3 \cdot 10^{-2}$ to $10^{-1}$ ($a = 1\%$–$3\%$, $b = 10^{-3}$) used for the LSQ method in previous publications [30]. The mean estimated background noise b of about $2 \cdot 10^{-3}$ agrees quite well with the value given by the experimenters, whereas the relative error a lies around 16%, about an order of magnitude larger than expected. Similar to the observations for simulated input data, the sidewall angles reconstructed with MLE are systematically shifted towards the design value of 90° compared to the LSQ solutions with fixed weights (see Fig. 3.8). Note that the mean sidewall angle obtained by SEM in [59] was approximately 86°.
[Figure 3.8: sidewall angle / ° (60 to 100) for datasets H4, H5, H6, D4, D8, F6 and H8; MLE and LSQ results.]
Figure 3.8: Reconstructed sidewall angles with approximate 95% confidence intervals for measured data from EUV scatterometry. The dotted lines represent the mean values for the two methods.
3.2.4 Application to Measured DUV Data
Finally, we apply the MLE approach to measurement data from the DUV scatterometer. The dataset consists of 11 measurements on different mask fields, which were fabricated with identical design specifications, e.g., a period of 560 nm [60]. The MLE results for the parameters a and b, which characterize the uncertainty of the input data, are presented with their approximate 95% confidence intervals in Fig. 3.9. The parameter b, representing the background noise of the measurement setup, lies around $3 \cdot 10^{-2}$, which is in fair agreement with the value of $4 \cdot 10^{-2}$ given by the experimenters in [76].
Recall that the reconstructed SWA is strongly sensitive to changes in the ratio b/a for small values of b/a and almost insensitive for large values (cf. Figs. 3.2 and 3.6). Hence, the geometrical features obtained by LSQ with fixed weights $a = 2.5\%$ and $b = 4 \cdot 10^{-2}$ (b/a = 1.6) are almost identical to those obtained by MLE (b/a ≈ 1), both in terms of consistency and variance, as shown in Fig. 3.10.
[Figure 3.9, two panels: noise parameter a (left) and ratio b/a (right) versus dataset number.]
Figure 3.9: Reconstructed noise parameter a and ratio b/a with approximate 95% confidence intervals for measured data from DUV scatterometry. The dotted lines represent the mean values of the reconstructed values.
[Figure 3.10, two panels (LSQ left, MLE right): sidewall angle / ° (80 to 88) versus dataset number.]
Figure 3.10: Reconstructed sidewall angles with approximate 95% confidence intervals for measured data from the DUV scatterometer for LSQ (left panel) and for MLE (right panel). The dotted lines represent the mean values.
3.3 Chapter Summary
In this chapter we have seen how maximum likelihood estimation (MLE) can substantially improve the results for the critical dimensions compared to the standard least squares (LSQ) approach used in earlier papers [3, 30, 54]. Using simulated data, we have demonstrated the sensitivity of the LSQ approach to the assumed statistical model. Strong systematic deviations in the reconstructed CDs and sidewall angles were observed when inappropriate weights accounting for the measurement errors of the input data were chosen in the least squares function. Maximum likelihood estimation has been proposed as an alternative, in which the parameters modeling the measurement errors are included as variables in the optimization process.
The ability of MLE to solve the inverse problem has been investigated by applying it to simulated datasets with known variances of the input data. With MLE, both the geometrical parameters and the noise model parameters could be reconstructed with sufficient accuracy. The variances of the reconstructed parameters were estimated using the Fisher information matrix. Furthermore, MLE has been applied to several sets of measurement data from different photomasks, both for EUV and DUV scatterometry. Including the parameters of the error model in the optimization improves the reconstruction of the mask geometry and leads to much better agreement between the optimized model data and the corresponding measurement data. The knowledge of the variances obtained in this way also allows a more realistic estimate of the accuracy of the reconstructed parameters.
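To illustrate how a Fisher-information-based uncertainty arises, the sketch below treats the simplified case of a single parameter: the observed information is the second derivative of $-\log L$ at the estimate, approximated here by a central finite difference, and the standard error is its inverse square root. This is a hedged simplification with hypothetical function names; the analysis above uses the full Fisher information matrix over (a, b, p), where the standard error of parameter i would be $\sqrt{(I^{-1})_{ii}}$.

```python
import math
from typing import Callable, Tuple


def observed_information(nll: Callable[[float], float], theta_hat: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of d^2(-log L)/d(theta)^2 at the MLE."""
    return (nll(theta_hat + h) - 2.0 * nll(theta_hat) + nll(theta_hat - h)) / h ** 2


def approx_ci95(nll: Callable[[float], float], theta_hat: float,
                h: float = 1e-3) -> Tuple[float, float, float]:
    """Approximate 95% interval theta_hat +/- 1.96 * se with se = I(theta_hat)^(-1/2)."""
    info = observed_information(nll, theta_hat, h)
    if info <= 0.0:
        raise ValueError("theta_hat does not look like a local minimum of nll")
    se = 1.0 / math.sqrt(info)
    return theta_hat - 1.96 * se, theta_hat + 1.96 * se, se


# Toy check: Gaussian mean with known sigma = 2 and n = 4 points, so se = sigma / sqrt(n) = 1.
DATA = [1.0, 2.0, 3.0, 6.0]
SIGMA = 2.0


def nll_mean(mu: float) -> float:
    return sum((x - mu) ** 2 for x in DATA) / (2.0 * SIGMA ** 2)


print(approx_ci95(nll_mean, theta_hat=3.0))
```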
Application of MLE to EUV data yielded relative variances of 10%–20% superposed by an absolute background noise in the range of $1$–$2 \cdot 10^{-3}$ for the measurement data. The resulting variances of the CDs were found to be in the range of 2–3 nm, whereas the height variances are approximately 0.5 nm. The sidewall angles were systematically larger than those obtained with the LSQ method and agreed better both with the design values and with independent measurements by scanning electron microscopy [59]. For DUV data, MLE yielded relative variances of 3% superposed by an absolute background noise of about $3 \cdot 10^{-2}$. The variances of the CDs were found to be in the range of 1–2 nm, and the height variances were determined to be 0.5 nm.
The relatively high variances of the geometrical parameters of the EUV mask and the relative measurement noise of about 10%–20%, a value much higher than the values given by the experimenters, are presumably caused by the much higher sensitivity of the EUV mask to systematic errors. These stem from oversimplifications in the mathematical model (e.g., the assumption of a perfectly periodic line structure without roughness) and from incomplete knowledge of crucial model parameters (e.g., the periods of the multilayer). In contrast, the MoSi mask has a much simpler structure and is therefore more robust against model errors. Nevertheless, in both cases accuracies in the 1 nm range are within reach.
In the following chapters, we will demonstrate how incorporating systematic errors, such as roughness [39] and deviations in the multilayer structure [30], into the model and into the MLE procedure used to solve the inverse problem leads to even better reconstruction results.
Chapter 4
The Effect of Systematic Errors on Scatterometry
The last chapter closed with the conjecture that the relatively high variances of the geometrical parameters of the EUV mask may be caused by systematic errors due to oversimplifications in the mathematical model used to evaluate the measurement data. In this chapter, we therefore discuss the influence of systematic errors on the measurement data in scatterometry. Two types of errors are investigated: errors caused by line edge roughness (LER) and line width roughness (LWR), and errors caused by variations of the multilayer system. The results presented in the first section are mainly taken from [26, 35, 36]; those presented in the second section can be found in [33].
4.1 Line Edge and Line Width Roughness
The first source of systematic errors we investigate is line edge and line width roughness. Images taken by atomic force microscopy (cf. Fig. 4.1) show that the assumption that the geometry and material properties of the grating under investigation are invariant in one direction is not realistic. Instead, the absorber lines vary along the z-axis (coordinate system as in Fig. 2.4). Rigorously modeling this variation as a three-dimensional structure is currently very time-consuming or even impossible; therefore, a two-dimensional model of the roughness is used. In this model, roughness is represented as a superposition of two effects. The first, called line edge roughness or LER, is a variation of the center position of the absorbing structure. The second, called line width roughness or LWR, is a variation of the width of the absorbing structures. Note that the computational domain for FEM must contain several profile lines in order to simulate roughness effects in this setting. This extended computational domain is called a super cell. It consists of N neighboring absorber lines with an overall period $P = N \cdot d$, where d is the period of the unperturbed line-space structure. Referring to Fig. 4.2, LER is modeled by random perturbations of the center positions $x_i$, $i = 1, \dots, N$, whereas the line width and the period d of each profile in this chain are fixed to their nominal values. LWR is represented by randomly perturbed line widths $CD_i$, $i = 1, \dots, N$, with unperturbed centers $x_i$ and constant pitch.

Figure 4.1: Image taken by an atomic force microscope showing the presence of roughness on a photomask. (Image by Advanced Mask Technology Center (AMTC) [2])
Figure 4.2: Super cell containing many profile lines used for roughness modeling by randomly changed center positions for LER and randomly varied line widths for LWR.
The perturbations are assumed to be normally distributed around the unperturbed center positions and the nominal line width, with variances $\sigma_{x_i}^2$ and $\sigma_{CD_i}^2$. Realizations with perturbations larger than the period of the grating are not considered. The perturbations of different lines are assumed to be independent. In this modeling concept, the positions of the left and right edges of a line are obviously correlated: the correlation coefficient is +1 for LER and −1 for LWR. If both effects are superimposed, denoted by LEWR, both the left and the right edges are rough. Since each edge lies at $x_i \pm CD_i/2$ with independent $x_i$ and $CD_i$, the variance of each line edge is given by

$$\sigma_{\text{edge}}^2 = \sigma_{x_i}^2 + \left(\frac{\sigma_{CD_i}}{2}\right)^2.$$
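The LEWR super cell model can be sketched as follows, assuming that realizations with a perturbation larger than the period are simply redrawn (our reading of "not considered"). The function names are hypothetical; the rigorous FEM computation of the diffraction pattern for such a super cell is not reproduced here. The sampled edge positions reproduce the edge variance formula above.

```python
import random
from typing import List, Tuple


def sample_super_cell(n_lines: int, period: float, cd: float,
                      sigma_x: float, sigma_cd: float,
                      rng: random.Random) -> List[Tuple[float, float]]:
    """Left/right edge positions of n_lines lines in a LEWR-perturbed super cell.

    Centers x_i ~ N(i * period, sigma_x^2) (LER) and widths CD_i ~ N(cd, sigma_cd^2) (LWR),
    all independent. Draws whose perturbation exceeds the period are redrawn
    (assumption: our reading of "realizations ... larger than the period ... are not considered").
    """
    edges = []
    for i in range(n_lines):
        while True:
            dx = rng.gauss(0.0, sigma_x)
            dcd = rng.gauss(0.0, sigma_cd)
            if abs(dx) < period and abs(dcd) < period:
                break
        x_i = i * period + dx
        cd_i = cd + dcd
        edges.append((x_i - cd_i / 2.0, x_i + cd_i / 2.0))
    return edges


def sigma_edge(sigma_x: float, sigma_cd: float) -> float:
    """Standard deviation of a single edge: sqrt(sigma_x^2 + (sigma_cd / 2)^2)."""
    return (sigma_x ** 2 + (sigma_cd / 2.0) ** 2) ** 0.5


# d = 280 nm, L:S = 0.5 gives a nominal line width of about 93 nm; 24 lines per super cell.
cell = sample_super_cell(24, 280.0, 93.3, 2.8, 2.8, random.Random(0))
print(round(sigma_edge(2.8, 2.8), 2), round(sigma_edge(5.6, 5.6), 2))  # 3.13 6.26
```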
Regarding the impact of line roughness on EUV gratings, Kato et al. [39, 40, 62] used the same approach of randomly distributed center positions or line widths in their analytical considerations based on Fraunhofer diffraction. Germer [24] applied similar design principles to the profile variations of silicon lines he investigated in the visible spectral range. Schuster et al. [65] studied the impact of LER on silicon gratings on the basis of sinusoidal perturbations of the line positions with amplitudes in the range of 2–8 nm and for incident light with wavelengths of 400 nm and 250 nm. Bilski et al. [7] used an RCWA model to demonstrate that the presence of LER influences the reconstructed CDs.
We present computations for FEM domains containing 24 rectangular absorber lines with a period of 280 nm and a line-to-space ratio (L:S), i.e., top CD/(d − top CD), of 0.5, leading to super cells with a period P of 6.72 µm. About 1,000 diffraction patterns were calculated for each of two different perturbation scenarios. Standard deviations of 2.8 nm and 5.6 nm, i.e., 1% and 2% of the period d = 280 nm, were used in the two scenarios to create random samples of super cells with normally distributed center positions and line widths.
The resulting diffraction orders range from −9 to +8; efficiencies smaller than $10^{-3}$% were excluded. A wavelength of λ = 13.389 nm and an angle of incidence of $\theta_{\text{inc}} = 6°$ were applied, both typical values for EUV scatterometry. The optical indices of the material components and the fixed mask parameters are given in Tab. 4.1.
Material          Height/nm   SWA/°   n       k
TaO               12.00       90      0.948   0.0310
TaN               60          90      0.942   0.0342
SiO2 (buffer)     8.40        90      0.975   0.0153
SiO2 (capping)    1.654       –       0.975   0.0153
Si                12.869      –       1.000   0.0018
MoSi              0.147       –       0.970   0.0042
Mo                2.141       –       0.926   0.0062
MoSi              1.972       –       0.970   0.0042
Si                2.838       –       1.000   0.0018
Table 4.1: Geometric parameters and optical constants at a wavelength of 13.389 nm of the EUV mask used for LER/LWR simulations, period d = 280 nm.
Figures 4.3–4.5 show the details of these calculations. The simulated efficiencies as a function of the diffraction order in Fig. 4.3 exhibit significantly increased variances, relative to the calculated efficiencies, for higher diffraction orders. Furthermore, doubling the line roughness leads to a disproportionate growth of the variances of the efficiencies. The mean efficiencies over all samples are depicted as diamonds in Fig. 4.3.
[Figure 4.3, panels (a) and (b): efficiencies / % (logarithmic, $10^{-3}$ to $10^{1}$) versus diffraction orders −10 to 10 at λ = 13.389 nm; panel titles "LEWR simulations; σ = 2.8 nm; 24 lines" and "LEWR simulations; σ = 5.6 nm; 24 lines".]
Figure 4.3: Simulated diffraction patterns for randomly perturbed line-space structures at a wavelength of 13.389 nm; blue circles depict the calculated efficiencies and diamonds the mean efficiencies of all 1,000 samples; two different random perturbations of the center positions and the widths (LEWR): (a) $\sigma_{x_i} = \sigma_{CD_i} = 2.8$ nm and (b) $\sigma_{x_i} = \sigma_{CD_i} = 5.6$ nm.
[Figure 4.4, panels (a) and (b): normalized deviations (REF − SIM)/REF versus order −8 to 8; panel titles "σ = 2.8 nm; 24 lines" and "σ = 5.6 nm; 24 lines".]
Figure 4.4: Normalized deviations from the efficiencies of the unperturbed reference line structure, depicted as circles; diamonds represent the mean over the deviations of all 1,000 samples; dashed lines indicate the mean ± standard deviation of the normalized deviations; two different random perturbations of the center positions and the widths (LEWR): (a) $\sigma_{x_i} = \sigma_{CD_i} = 2.8$ nm and (b) $\sigma_{x_i} = \sigma_{CD_i} = 5.6$ nm.
[Figure 4.5: normalized standard deviation versus order −8 to 8; FEM results for $\sigma_{x_i} = \sigma_{CD_i} = 5.60$ nm and $2.80$ nm, together with the approximations $1 - \exp(-\sigma_r^2 k_j^2/3)$ for $\sigma_r = 6.09$ nm and $\sigma_r = 3.09$ nm.]
Figure 4.5: Standard deviations relative to the mean perturbed efficiencies, shown as circles for the two examples of Fig. 4.4; diamonds depict approximations by an exponential function.
For both degrees of roughness, a systematic nonlinear decrease of the mean efficiencies for higher diffraction orders is observed, along with increasing variances. The deviations between the unperturbed efficiencies and the mean perturbed efficiencies, normalized to the unperturbed values, are always greater than zero. Figure 4.4 reveals this by comparing the efficiencies with, and normalizing them to, the reference values obtained from the unperturbed line-space structure.
The systematic decrease can be approximated by an exponential function as follows. We assume that the general aperiodic perturbation in the sense of the applied LEWR model (cf. Fig. 4.2) can be characterized by a roughness parameter $\sigma_r$ that scales with the imposed perturbation $\sigma_{\text{edge}} = \sqrt{\sigma_{x_i}^2 + \sigma_{CD_i}^2/4}$ of the given grating samples, such that $\sigma_r = \alpha \cdot \sigma_{\text{edge}}$. With $f_{j,\text{pert}}$ denoting the mean perturbed efficiency, the mean normalized deviations relative to the references can then be approximated by

$$\frac{f_{j,\text{ref}}(p) - f_{j,\text{pert}}}{f_{j,\text{ref}}(p)} \approx 1 - \exp\left(-\sigma_r^2 k_j^2\right) = 1 - \exp\left(-(\alpha \sigma_{\text{edge}})^2 k_j^2\right), \tag{4.1}$$

with $\sigma_r = 3.09$ nm (α = 0.99) and $\sigma_r = 6.09$ nm (α = 0.97). Here the diffraction order $n_j$ is expressed by the x-component of the wavevector of the corresponding propagating plane wave mode for incidence angle $\theta_{\text{inc}} = 0°$ (cf. [27]), i.e., $k_j = 2\pi n_j/d$.
Equation (4.1) implies that random perturbations of line and space widths cause an exponential damping of the mean efficiencies similar to a Debye-Waller factor [19, 72]. The exponent is proportional to the product of the squared diffraction order $n_j^2$ and a constant $\sigma_r^2$, which approximates the variance of the line centers and widths. These results confirm the validity of the main formula derived by Kato and Scholze [39] using the Fraunhofer approximation.
The increase of the variances of the efficiencies with higher diffraction orders also becomes very clear. For the two examples of LEWR perturbations, Fig. 4.5 depicts the standard deviations of the efficiencies relative to their mean values. These can also be approximated by an exponential function, $1 - \exp(-\sigma_r^2 k_j^2/3)$, i.e., with the exponent of Eq. (4.1) weighted by 1/3, using the values $\sigma_r = \alpha \cdot \sigma_{\text{edge}} = 3.09$ nm (α = 0.99) and 6.09 nm (α = 0.97) determined from the damping of the mean efficiencies.
The most important consequence of these numerical experiments is that the revealed LER/LWR bias has to be included in the model by an order-dependent damping factor. The representation of the measurements and their associated variances given in Section 3.1 must therefore be extended at least to

$$y_j = \exp\left(-\sigma_r^2 k_j^2\right) f_j(p) + \epsilon_j, \tag{4.2}$$

$$\sigma_j^2 \approx \left(a \exp\left(-\sigma_r^2 k_j^2\right) f_j(p)\right)^2 + b^2. \tag{4.3}$$
However, the variances of the efficiencies due to LEWR, i.e., the intensity fluctuations around the damped efficiencies, are not considered here. For real EUV measurements taken from surface areas with a measurement spot size of about 500 µm × 500 µm, we expect their contribution to $\sigma_j^2$ to be significantly reduced by spatial averaging compared to the values calculated in our investigations.
In order to estimate how the variances depend on the spatial averaging, i.e., on the number of lines in the super cell, the above calculations are repeated for a super cell consisting of N = 48 lines, leading to a period of 13.44 µm. Due to the computational costs, however, only 100 diffraction patterns are calculated. It turns out that the order-dependent damping is similar to the case of 24 lines and that the associated variances are scaled by a factor $24/N = 1/2$ compared to those calculated for a super cell of 24 lines. Hence, for N = 1,785, i.e., a virtual extension of our FEM super cell period to 499.8 µm (= 0.280 µm · 1,785), approximately the size of the probed area in EUV scatterometry, the normalized standard deviations of the efficiencies would be scaled down by a factor of about $\sqrt{24/1785} \approx 0.116$ compared to the values obtained with 24 lines in Fig. 4.5. For higher diffraction orders, LEWR-based fluctuations of the efficiencies of several percent remain; they correspond to variances of the form of the first term on the right-hand side of Eq. (4.3), with typical values of the factor a in the range of 0.01–0.03 (cf. Section 3.1).
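The extrapolation to the probed area amounts to simple arithmetic, sketched below under the assumption stated in the text that the variances scale as 24/N (observed for N = 48 versus 24 lines), so that standard deviations scale as $\sqrt{24/N}$. The function names are hypothetical.

```python
import math


def super_cell_period_um(n_lines: int, d_um: float) -> float:
    """Super cell period P = N * d (in micrometers)."""
    return n_lines * d_um


def std_scale_factor(n_lines: int, n_ref: int = 24) -> float:
    """Scaling of normalized standard deviations relative to an n_ref-line super cell.

    Assumes variances scale as n_ref / N, as observed for N = 48 versus 24 lines.
    """
    return math.sqrt(n_ref / n_lines)


print(super_cell_period_um(48, 0.280))     # about 13.44 um
print(super_cell_period_um(1785, 0.280))   # about 499.8 um
print(round(std_scale_factor(1785), 3))    # 0.116
```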
However, we will see in Section 4.2 that the contribution of multilayer variations to the variances is about an order of magnitude higher than that of LEWR. For the sake of simplicity, we therefore only take into account the bias caused by the order-dependent damping factor and neglect the slightly increased variances due to LEWR. This leads to a modified model function that, for a given wavelength and a given angle of incidence, depends on the geometric parameters p and on the general aperiodic perturbation $\sigma_r$:

$$f : \mathbb{R}^n \to \mathbb{R}^m, \quad (p, \sigma_r) \mapsto f(p, \sigma_r) = \left(\exp\left(-\sigma_r^2 k_j^2\right) f_j(p)\right)_{j=1}^{m} \tag{4.4}$$

with $k_j = 2\pi n_j/d$, $j \in \{1, \dots, m\}$.
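A minimal sketch of the damped model function (4.4) and of the variance model (4.3) is given below, assuming the undamped efficiencies $f_j(p)$ and their diffraction orders $n_j$ are supplied by a rigorous solver. The function names are hypothetical; $\sigma_r$ simply enters as an additional argument alongside the solver output.

```python
import math
from typing import List, Sequence


def damped_efficiencies(f: Sequence[float], orders: Sequence[int],
                        period: float, sigma_r: float) -> List[float]:
    """Eq. (4.4): exp(-sigma_r^2 k_j^2) * f_j(p) with k_j = 2 pi n_j / d.

    f holds the undamped efficiencies f_j(p) from a rigorous solver; orders holds n_j.
    """
    if len(f) != len(orders):
        raise ValueError("one diffraction order per efficiency is required")
    out = []
    for fj, nj in zip(f, orders):
        kj = 2.0 * math.pi * nj / period
        out.append(math.exp(-(sigma_r * kj) ** 2) * fj)
    return out


def damped_variances(f: Sequence[float], orders: Sequence[int], period: float,
                     sigma_r: float, a: float, b: float) -> List[float]:
    """Eq. (4.3): sigma_j^2 ~ (a * damped f_j)^2 + b^2."""
    return [(a * g) ** 2 + b ** 2 for g in damped_efficiencies(f, orders, period, sigma_r)]


# Hypothetical efficiencies (%) for orders 0, 4 and 8 of a 280 nm grating.
print(damped_efficiencies([5.0, 1.0, 0.1], [0, 4, 8], period=280.0, sigma_r=3.09))
```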
The presented results were obtained for a typical EUV measurement setup, with a fixed angle of incidence and diffraction patterns consisting only of reflected modes due to the special design of the EUV mask. Similar investigations were also performed for the DUV measurement setup of the MoSi mask. They showed that the damping effect also occurs for transmitted modes and that it is independent of the angle of incidence of the light source; cf. Figs. 4.6–4.10.
[Figure 4.6, two panels: efficiencies / % (logarithmic, $10^{-3}$ to $10^{2}$) and normalized deviations versus diffraction order −2 to 6.]
Figure 4.6: Effect of LEWR on the transmitted modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\text{inc}} = -43.6°$ and $\sigma_{x_i} = \sigma_{CD_i} = 5.6$ nm for 100 samples.
[Figure 4.7, two panels: efficiencies / % (logarithmic, $10^{-1}$ to $10^{2}$) and normalized deviations versus diffraction order −5 to 5.]
Figure 4.7: Effect of LEWR on the transmitted modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\text{inc}} = 0°$ and $\sigma_{x_i} = \sigma_{CD_i} = 5.6$ nm for 100 samples.
[Figure 4.8, two panels: efficiencies / % (logarithmic, $10^{-2}$ to $10^{1}$) and normalized deviations versus diffraction order −1 to 5.]
Figure 4.8: Effect of LEWR on the reflected modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\text{inc}} = -43.6°$ and $\sigma_{x_i} = \sigma_{CD_i} = 5.6$ nm for 100 samples.
[Figure 4.9, two panels: efficiencies / % (logarithmic, $10^{-2}$ to $10^{1}$) and normalized deviations versus diffraction order −3 to 3.]
Figure 4.9: Effect of LEWR on the reflected modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\text{inc}} = 0°$ and $\sigma_{x_i} = \sigma_{CD_i} = 5.6$ nm for 100 samples.
[Figure 4.10: normalized deviations versus diffraction order −2 to 6; FEM results and the approximation $1 - \exp(-\sigma_r^2 k_j^2)$ with $\sigma_r = 6.26$ nm.]
Figure 4.10: Normalized deviations from the transmitted efficiencies of the unperturbed reference line structure for an incident angle $\theta_{\text{inc}} = -43.6°$ and $\sigma_{x_i} = \sigma_{CD_i} = 5.6$ nm. Squares depict the mean over all 100 samples; the solid line depicts the approximation from Eq. (4.1) with $\sigma_r = 6.26$ nm.
4.2 Multilayer System Variations
The construction of the measured EUV photomask itself is another source of systematic errors. To improve its reflectivity, the mask is placed on top of a multilayer system (MLS) that serves as a Bragg mirror [11, 46]. This MLS consists of 49 periodically repeated groups of four layers. Each group consists of a molybdenum (Mo) layer and a silicon (Si) layer separated by two interdiffusion MoSi layers; two absorbing layers (SiO2, Si) lie on top of the stack (see Fig. 2.5). The layer heights are usually fixed during the optimization process. However, incomplete knowledge of these heights can cause errors in the reconstruction.
The sensitivity of the reconstruction to changes in these parameters has already been investigated in [30] using a simple least squares approach to the inverse problem. In this section, we concentrate on the direct impact of changes in the MLS on the simulated diffraction pattern. For simplicity, we only investigate the influence of the heights $h_{fc}$ and $h_{sc}$ of the first and second absorbing layers, respectively, and of a scaling factor κ by which all layer heights in the 49 periodically repeated groups are scaled simultaneously. These parameters are summarized in the vector $\nu = (h_{fc}, h_{sc}, \kappa)$.
We use two different line-to-space ratios in this investigation. For the first, a top CD of 93 nm and a period of 280 nm are chosen, resulting in an L:S of 1:2; the second has a top CD of 540 nm and a period of 720 nm, leading to an L:S of 3:1. For the investigation, 1,000 random samples of MLS parameters are drawn, and the corresponding diffraction patterns are calculated for a wavelength of 13.4 nm and an incident angle of 6°. The three parameters are chosen to be independent and normally distributed with the means and standard deviations given in Tab. 4.2. The remaining geometric parameters and simulation conditions are the same in all cases.
Parameter   µ         σ
h_fc        1.2 nm    0.01 nm
h_sc        12.9 nm   0.5 nm
κ           1         $10^{-3}$
Table 4.2: Means and standard deviations of the three MLS parameters.
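The Monte Carlo procedure can be sketched as follows: draw ν from the independent normal distributions of Tab. 4.2 and collect, per diffraction order, the mean and standard deviation of the normalized deviations (f_ref − f_pert)/f_ref. The callable `forward` is a hypothetical stand-in for the rigorous FEM solver, which is not reproduced here, and the toy forward model in the example is deliberately not a diffraction model; all names are illustrative.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# Table 4.2: (mean, standard deviation); heights in nm, kappa dimensionless.
MLS_PRIOR: Dict[str, Tuple[float, float]] = {
    "h_fc": (1.2, 0.01),
    "h_sc": (12.9, 0.5),
    "kappa": (1.0, 1e-3),
}


def sample_mls(n: int, seed: int = 0) -> List[Dict[str, float]]:
    """Draw n independent, normally distributed parameter vectors nu = (h_fc, h_sc, kappa)."""
    rng = random.Random(seed)
    return [{name: rng.gauss(mu, sd) for name, (mu, sd) in MLS_PRIOR.items()} for _ in range(n)]


def normalized_deviation_stats(
    forward: Callable[[Dict[str, float]], Sequence[float]],
    samples: Sequence[Dict[str, float]],
) -> Tuple[List[float], List[float]]:
    """Per-order mean and standard deviation of (f_ref - f_pert) / f_ref.

    `forward` is a hypothetical stand-in for the rigorous solver (FEM in the text);
    the reference pattern is evaluated at the Table 4.2 means.
    """
    nu_ref = {name: mu for name, (mu, _) in MLS_PRIOR.items()}
    ref = forward(nu_ref)
    devs = [[(r - q) / r for r, q in zip(ref, forward(nu))] for nu in samples]
    columns = list(zip(*devs))  # one tuple per diffraction order
    return ([statistics.fmean(c) for c in columns],
            [statistics.stdev(c) for c in columns])


def toy_forward(nu: Dict[str, float]) -> List[float]:
    """Hypothetical two-'order' toy response, NOT a diffraction model."""
    return [nu["kappa"], nu["h_sc"] / 12.9]


means, stds = normalized_deviation_stats(toy_forward, sample_mls(1000))
print(means, stds)
```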
The resulting distributions of efficiencies for both line-to-space ratios are shown in
Fig. 4.11. Even though the variations in the MLS are small, the resulting perturbation
of the simulated diffraction pattern is considerable. As one might expect, the overall
variance of the efficiencies increases as the line-to-space ratio decreases, because a
lower L:S means that a larger part of the surface consists of spaces and more radiation
is reflected. In the present case the efficiencies for the mask with an L:S of 1:2 are
about an order of magnitude higher than those for the mask with an L:S of 3:1, and
hence variations of the MLS have a stronger impact on the efficiencies. For the mask
with an L:S of 1:2 the relative variation of the efficiencies is about 15% on average
and can be as high as 300% of the unperturbed value; for the mask with an L:S of 3:1
it is about 12% on average and at most 80% of the unperturbed value. However, in
contrast to LER/LWR, the effect of MLS variations on the diffraction pattern cannot
be described by a suitable analytic formula.
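Summary numbers of the kind quoted above can be computed from the simulated efficiencies along the following lines. This is a hedged sketch resting on two assumptions: the normalized deviation is taken to be the relative deviation (η − η₀)/η₀ from the unperturbed efficiencies η₀ (all assumed nonzero), and "on average" is read as the mean absolute relative deviation over all samples and diffraction orders. The function names are hypothetical.

```python
import statistics


def normalized_deviations(perturbed, reference):
    """Relative deviation (eta - eta_ref) / eta_ref for each diffraction order.

    Assumed normalization; the convention used in Fig. 4.11 may differ
    (e.g. in sign).
    """
    if len(perturbed) != len(reference):
        raise ValueError("perturbed and reference must cover the same orders")
    return [(eta - ref) / ref for eta, ref in zip(perturbed, reference)]


def deviation_summary(samples, reference):
    """Mean and maximum absolute relative deviation over all samples and orders."""
    abs_devs = [
        abs(d) for eta in samples for d in normalized_deviations(eta, reference)
    ]
    return statistics.mean(abs_devs), max(abs_devs)


def per_order_band(samples, reference):
    """Per order: (mean, standard deviation) of the normalized deviations.

    These give a mean marker and a mean +/- std band for each order;
    at least two samples are needed for the standard deviation.
    """
    devs = [normalized_deviations(eta, reference) for eta in samples]
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*devs)]
```

As a toy illustration (made-up numbers, not thesis data): with a reference pattern [1.0, 2.0] and two perturbed patterns [1.1, 2.0] and [0.9, 3.0], `deviation_summary` returns a mean of about 0.175 and a maximum of 0.5.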
[Figure 4.11: left panels show efficiencies/% (logarithmic scale) versus diffraction order; right panels show normalized deviations versus diffraction order; diffraction orders −5 to 5 (top) and −10 to 10 (bottom).]
Figure 4.11: Simulated diffraction patterns for randomly perturbed MLS (left panels)
and normalized deviations from the efficiencies of the unperturbed MLS (right panels);
diamond symbols represent the mean over all samples, dashed lines indicate the mean
± standard deviation. The top panels show the effect for a period d = 280 nm and
L:S of 1:2, the bottom panels for a period of 720 nm and L:S of 3:1.
DISS2013

  • 1. Statistical Approaches to the Inverse Problem of Scatterometry vorgelegt von Diplom-Mathematiker Mark-Alexander Henn aus Mainz Von der Fakult¨at II – Mathematik und Naturwissenschaften der Technischen Universit¨at Berlin zur Erlangung des akademischen Grades Doktor der Naturwissenschaften – Dr. rer. nat. – genehmigte Dissertation Promotionsausschuss: Vorsitzende: Prof. Dr. rer. nat. Ulrike Woggon Berichter: Prof. Dr. rer. nat. Markus B¨ar Berichter: Prof. Dr. rer. nat. Harald Engel Berichter: Dr. rer. nat. habil. Andreas Rathsfeld Tag der wissenschaftlichen Aussprache: 5. Juli 2013 Berlin 2013 D 83
  • 2.
  • 3. iii “Each of us, deep down, believes that the whole world issues from his own precious body, like images projected from a tiny slide onto an earth-sized screen. And then, deeper down, each of us knows he’s wrong.” Chad Harbach, The Art of Fielding (2011)
  • 4.
  • 5. Abstract In the present work statistical approaches to the inverse problem of scatterometry are discussed. Scatterometry is the dimensional characterization of periodic nano- structures as they are used in the manufacturing of lithographic masks. In contrast to direct imaging methods, such as electron microscopy, scatterometry is a non-imaging indirect measuring method. The critical dimensions (CDs) such as line widths and heights of the surface profile are determined from the measured light diffraction pat- tern. The classical way to solve the inverse problem is the least squares (LSQ) approach. Starting with a model function that depends on the parameters to be reconstructed, the norm of the difference between the measured and simulated data is minimized. The right choice of weights that account for the variances in the measurement data plays a crucial role here, as an inappropriate choice of weights may cause an unsatisfactory reconstruction and furthermore an overestimation of the associated uncertainties of the reconstructed parameters. Therefore the maximum likelihood estimation (MLE) is introduced as a method to solve the inverse problem of scatterometry. By doing this, it is possible to determine the variances of the measurement data in addition to the determination of the critical dimensions. In the case of a simplified model function, in which significant effects are not considered, MLE yields estimates for the variances of the measurement data that are way too large. Thus the present work investigates two types of systematic errors and their effect on the measured diffraction pattern. These errors stem from line roughness and variations of the absorbing structure beneath the periodic line structure. It is shown how the estimated variances for the measurement data reduce if the systematic errors are included into the model function. 
Furthermore this procedu- re yields estimates for the critical dimensions that are consistent with results from alternative measurement methods. In the last part an example for a Bayesian approach to solving the inverse problem of scatterometry is given. In contrast to LSQ and MLE, the solution to the inverse problem in Bayesian terms is not a single estimate for the parameters of interest but rather their corresponding probability distribution. An advantage of the Bayesian v
  • 6. vi Abstract approach is that information about the critical dimensions obtained by alternative methods can be incorporated as prior knowledge. It is demonstrated that several measurement methods can be combined. As a result the uncertainties for the critical dimensions can be drastically reduced.
  • 7. Zusammenfassung In der vorliegenden Arbeit werden verschiedene statistische Verfahren zur L¨osung des inversen Problems in der Scatterometrie vorgestellt. Scatterometrie bezeichnet hierbei die dimensionelle Charakterisierung periodischer Nanostrukturen wie sie zum Beispiel in der Herstellung von Lithographiemasken benutzt werden. Im Gegensatz zu bildgebenden Verfahren, etwa der Elektronenmikroskopie, handelt es sich bei der Scatterometrie um eine indirekte Messmethode, d.h. aus den winkelabh¨angig gemesse- nen Streulichtintensit¨aten (dem Beugungsmuster) werden kritische Dimensionen wie z.B. Linienbreiten oder H¨ohen der Probe berechnet. Der klassische Weg das inverse Problem zu l¨osen besteht darin, es als Regressions- problem im Sinne der Methode der kleinsten Quadrate zu interpretieren. Ausgehend von einer Modellfunktion, die von den zu rekonstruierenden Parametern abh¨angt, wird der Abstand zwischen den gemessenen Daten und den vom Modell berechneten Werten minimiert. Eine dabei erforderliche Gewichtung der unterschiedlichen Mess- werte, im Sinne der f¨ur diese zu erwartenden Varianzen in der Messung, spielt eine große Rolle. Im Falle nicht genau bekannter Gewichte k¨onnen die rekonstruierten Parameter stark von den tats¨achlichen Werten abweichen und auch die f¨ur die rekon- struierten Parameter gesch¨atzten Unsicherheiten sind unter Umst¨anden deutlich zu groß. In dieser Arbeit wird daher zun¨achst die Maximum Likelihood Methode (MLE) als Verfahren zur L¨osung des inversen Problems angewandt. Diese erm¨oglicht es, ne- ben den kritischen Dimensionen der Probe, auch die Varianzen der Messwerte zu sch¨atzen. F¨ur zu stark vereinfachte Modellfunktionen die dazu f¨uhren, dass bestimm- te signifikante Effekte nicht ber¨ucksichtigt werden, liefert MLE jedoch Sch¨atzungen f¨ur die Varianzen der Eingangsdaten die deutlich zu groß sind. Daher werden in der vorliegenden Arbeit zwei Arten systematischer Fehler und deren Einfluss auf die Mess- werte untersucht. 
Hierbei handelt es sich um die sogenannte Linienrauheit und um Variationen der Absorberstruktur auf die die Linien aufgebracht werden. Es wird gezeigt, dass sich die gesch¨atzten Varianzen der Messdaten deutlich redu- zieren, wenn die Modellfunktion um die beiden o.g. systematischen Effekte erweitert wird, und dass auch die rekonstruierten kritischen Dimensionen konsistent zu den Ergebnissen alternativer Messmethoden sind. vii
  • 8. viii Zusammenfassung Im letzten Teil dieser Arbeit wird schließlich ein Ausblick auf den Bayes’schen Ansatz zur L¨osung des inversen Problems gegeben. Im Gegensatz zu den beiden vor- her diskutierten Ans¨atzen, geht es bei der Bayes’schen Methode nicht darum einen einzelnen Wert f¨ur die kritischen Dimensionen zu bestimmen, es wird vielmehr die Wahrscheinlichkeitsverteilung der Parameter bestimmt. Ein großer Vorteil dieser Me- thode ist es, dass Informationen ¨uber die kritischen Dimensionen, die ¨uber alterna- tive Messverfahren bestimmt wurden, als Vorwissen eingebracht werden k¨onnen und es damit erm¨oglicht wird verschiedene Messverfahren zu kombinieren, um die den kritischen Dimensionen zugeordneten Unsicherheiten zu reduzieren.
  • 9. Contents Title Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Zusammenfassung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi Citations to Previously Published Work . . . . . . . . . . . . . . . . . . . xvii Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi 1 Introduction 1 2 Preliminaries 7 2.1 Experimental Setups . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Mathematical Modeling of Scatterometry . . . . . . . . . . . . . . . . 11 2.3 Inverse Problem Theory . . . . . . . . . . . . . . . . . . . . . . . . . 19 3 Maximum Likelihood and Least Squares 29 3.1 Measurement Error Model . . . . . . . . . . . . . . . . . . . . . . . . 30 3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.3 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4 The Effect of Systematic Errors on Scatterometry 41 4.1 Line Edge and Line Width Roughness . . . . . . . . . . . . . . . . . 42 4.2 Multilayer System Variations . . . . . . . . . . . . . . . . . . . . . . 51 4.3 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5 The Effect of Systematic Errors on the Reconstruction using MLE 55 5.1 Maximum Likelihood Estimation and Model Selection . . . . . . . . . 56 5.2 Results for the EUV Mask . . . . . . . . . . . . . . . . . . . . . . . . 56 5.3 Results for the MoSi Mask . . . . . . . . . . . . . . 
. . . . . . . . . . 66 5.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 ix
  • 10. x Contents 6 Bayesian Approach to Scatterometry 71 6.1 Approximation of the Likelihood Function . . . . . . . . . . . . . . . 72 6.2 Using Prior Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.3 Bayesian Approach to EUV Scatterometry . . . . . . . . . . . . . . . 74 6.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 7 Summary, Conclusions and Outlook 81 7.1 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . 81 7.2 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Bibliography 85
  • 11. List of Figures 1.1 Sizes of semiconductor manufacturing process nodes . . . . . . . . . . 2 2.1 Scheme of a scatterometric setup . . . . . . . . . . . . . . . . . . . . 8 2.2 Scheme of the spectroscopic reflectometer . . . . . . . . . . . . . . . . 10 2.3 Scheme of the goniometric reflectometer . . . . . . . . . . . . . . . . 11 2.4 Scheme of the computational domain . . . . . . . . . . . . . . . . . . 12 2.5 Cross section of the EUV mask . . . . . . . . . . . . . . . . . . . . . 15 2.6 Cross section of the MoSi mask . . . . . . . . . . . . . . . . . . . . . 17 3.1 χ2 in dependence on CDs for different b/a . . . . . . . . . . . . . . . 32 3.2 Reconstructed CDs and SWA in dependence on b/a for simulated data 32 3.3 Reconstructed noise parameter a and b/a for simulated data . . . . . 34 3.4 Reconstructed SWAs for LSQ and MLS for simulated data . . . . . . 34 3.5 RMSD and mean estimated standard deviations for LSQ and MLE . 35 3.6 Reconstructed CDs and SWA in dependence on b/a for dataset D4 . . 36 3.7 Reconstructed noise parameter a and b/a for measured EUV data . . 36 3.8 Reconstructed SWAs for measured EUV data . . . . . . . . . . . . . 37 3.9 Reconstructed noise parameter a and b/a for measured DUV data . . 38 3.10 Reconstructed SWA for measured DUV data . . . . . . . . . . . . . . 38 4.1 AFM image showing LEWR . . . . . . . . . . . . . . . . . . . . . . . 42 4.2 Super cell design for LER/LWR computations . . . . . . . . . . . . . 43 4.3 Simulated diffraction patterns for perturbed line-space structures in EUV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.4 Normalized deviations from the efficiencies of the unperturbed refer- ence line structure in EUV . . . . . . . . . . . . . . . . . . . . . . . . 45 4.5 Standard deviations relative to the mean perturbed efficiencies in EUV 46 4.6 Effect of LEWR on the transmitted modes for the MoSi mask I . . . 49 4.7 Effect of LEWR on the transmitted modes for the MoSi mask II . . . 
49 4.8 Effect of LEWR on the reflected modes for the MoSi mask I . . . . . 50 4.9 Effect of LEWR on the reflected modes for the MoSi mask II . . . . . 50 xi
  • 12. xii List of Figures 4.10 Normalized deviations from the efficiencies of the unperturbed refer- ence line structure in DUV . . . . . . . . . . . . . . . . . . . . . . . . 51 4.11 Effect of MLS variations on simulated efficiencies . . . . . . . . . . . 53 5.1 Reconstructed SWAs for the different models for simulated data . . . 59 5.2 RMSD from the actual value and square root of mean of the estimated variances for the different models . . . . . . . . . . . . . . . . . . . . 59 5.3 Reconstructed roughness parameter σr and noise parameters a and b for the different models for simulated data . . . . . . . . . . . . . . . 60 5.4 Likelihood-ratio test values for simulated data . . . . . . . . . . . . . 60 5.5 Dependency of log-likelihood on heights for dataset D4 and model M3 62 5.6 Reconstructed SWAs for the different models for measured EUV data 63 5.7 Reconstructed top CD deviation for the different models for measured EUV data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 5.8 Reconstructed roughness parameter σr and noise parameters a and b for measured EUV data . . . . . . . . . . . . . . . . . . . . . . . . . 64 5.9 Comparison of reconstructed SWAs to 3D/AFM data . . . . . . . . . 65 5.10 Likelihood-ratio test values for measured EUV data . . . . . . . . . . 66 5.11 Reconstructed SWAs for measured DUV data with and without LEWR correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.12 Reconstructed roughness parameter σr and noise parameters a and b for measured DUV data . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.13 Likelihood-ratio values for measured DUV data . . . . . . . . . . . . 68 6.1 Distribution of the geometry parameters for field H5 I . . . . . . . . . 76 6.2 Distribution of the geometry parameters for field H5 II . . . . . . . . 77 6.3 Weights for Lapprox and πpost for different prior variances for field H5 . 78 6.4 Dependence of posterior estimators on prior knowledge for field H5 . 
79 6.5 Dependence of posterior standard deviation on prior knowledge for field H5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
  • 13. List of Tables 2.1 Details of the EUV mask I . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 Details of the MoSi mask . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1 Design values of the EUV mask I . . . . . . . . . . . . . . . . . . . . 35 4.1 Geometric parameters and optical constants for LER/LWR computations 44 4.2 Means and standard deviations for the MLS parameters . . . . . . . . 52 5.1 Models used for MLE with EUV data . . . . . . . . . . . . . . . . . . 57 5.2 Details of the EUV mask II . . . . . . . . . . . . . . . . . . . . . . . 58 5.3 Design values of the EUV mask II . . . . . . . . . . . . . . . . . . . . 61 5.4 Optimized parameters using different starting values for dataset D4 and model M3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.5 Results of the reconstruction using different initial values for measured data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.6 Square roots of the mean of the estimated variances for the geometry parameters for measured data . . . . . . . . . . . . . . . . . . . . . . 65 5.7 Models used for MLE with DUV data . . . . . . . . . . . . . . . . . . 66 6.1 Mean values and standard deviations from 3D/CD-AFM analysis of field H5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 6.2 Standard deviations of the geometry parameters for different priors for field H5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 xiii
List of Symbols
$\mathbb{R}^n$: n-dimensional vector space of real numbers
$I_n$: Identity matrix of size n
$\|\cdot\|$: Euclidean norm in $\mathbb{R}^n$
$\mathbb{Z}$: Set of integers
$\mathbb{C}$: Set of complex numbers
$\emptyset$: Empty set
$i$: Imaginary unit
$\operatorname{Re} z$: Real part of z
$\operatorname{Im} z$: Imaginary part of z
$\mu_0$: Permeability of free space
$\varepsilon$: Permittivity
$E$: Electric field
$H$: Magnetic field
$\Delta$: Laplace operator
$\partial_n$: Normal derivative
$\omega$: Angular frequency
$p$: Parameter vector
$y$: Measurement data vector
$J$: Jacobian
$L$: Likelihood function
$I$: Fisher information matrix
$X$: Random variable
$\hat{x}$: Estimate of X
$u(\hat{x})$: Uncertainty associated with $\hat{x}$
$\pi$: Probability density function
$\mathcal{N}(\mu, \sigma^2)$: Univariate normal distribution with mean $\mu$ and variance $\sigma^2$
$\Sigma$: Covariance matrix
$\sigma_r$: Roughness parameter
List of Abbreviations
AFM: Atomic force microscopy
CD: Critical dimension
DUV: Deep ultraviolet
EUV: Extreme ultraviolet
FEM: Finite element method
LER: Line edge roughness
LWR: Line width roughness
L:S: Line-to-space ratio
LSQ: Least squares
MLE: Maximum likelihood estimation
MLS: Multilayer system
PDE: Partial differential equation
PDF: Probability density function
PTB: Physikalisch-Technische Bundesanstalt (National metrology institute of Germany)
RCWA: Rigorous coupled-wave analysis
RMSD: Root-mean-square deviation
SVD: Singular value decomposition
SWA: Sidewall angle
TE: Transverse electric polarization
TM: Transverse magnetic polarization
Citations to Previously Published Work
Some results from Chapter 3 appear in:
“A maximum likelihood approach to the inverse problem of scatterometry,” M.-A. Henn, H. Gross, F. Scholze, M. Wurm, C. Elster and M. Bär, Optics Express 20, 12 (2012)

Results from Chapter 4 have already been published in the following three papers:
“Modeling of line roughness and its impact on the diffraction intensities and the reconstructed critical dimensions in scatterometry,” H. Gross, M.-A. Henn, S. Heidenreich, A. Rathsfeld and M. Bär, Appl. Opt. 51, 30 (2012)
“Improved grating reconstruction by determination of line roughness in extreme ultraviolet scatterometry,” M.-A. Henn, S. Heidenreich, H. Gross, A. Rathsfeld, F. Scholze and M. Bär, Optics Letters 37, 24 (2012)
“The effect of line roughness on DUV scatterometry,” M.-A. Henn, S. Heidenreich, H. Gross, B. Bodermann and M. Bär, Proc. SPIE 8789, (2013)

Some of the results from Chapter 5 have been published in:
“Improved Reconstruction of Critical Dimensions in Extreme Ultraviolet Scatterometry by Modeling Systematic Errors,” M.-A. Henn, H. Gross, S. Heidenreich, F. Scholze, C. Elster and M. Bär, Meas. Sci. Technol. 25, 4 (2014)
Acknowledgments
First I would like to thank my supervisor Prof. Markus Bär and my dear colleagues at the Physikalisch-Technische Bundesanstalt (PTB), Dr. Clemens Elster, Dr. Hermann Groß and Dr. Sebastian Heidenreich, for their support, suggestions and fruitful comments during my work on this thesis. Special thanks go to Dr. Bernd Bodermann, Dr. Matthias Wurm and Dr. Frank Scholze for providing the measurement data evaluated in this work, and to Dr. Gaoliang Dai for allowing me to use his 3D/AFM measurement results.

Furthermore, I would like to thank Prof. Harald Engel of the Technische Universität Berlin for being the second supervisor of this dissertation, Dr. Andreas Rathsfeld of the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) for helping me with all my questions about DIPOG and programming issues, Gerd Lindner at the PTB for helping me out with my IT problems, and Dr. Sergio Alonso for being such a great office co-worker.

I would like to express my sincere gratitude to several special people for guiding me on my way through the scientific world from the very beginning: Prof. Christine Papadakis, Carla Lehrbach, my late godmother Cornelia Beilstein and Prof. Rainer Wüst. Many thanks also to all my dear friends for reminding me that there is more to life than mathematics and physics.

Finally and most importantly, I would like to thank my family, especially my sister Daniela, my brother-in-law Christian, my nieces Anna and Luisa and most of all my mother, for their support, love and encouragement. Nothing would have been possible without you.
This work is dedicated to my mother.
Chapter 1 Introduction
Moore’s law, formulated in 1965 [52], states that the number of transistors on integrated circuits doubles approximately every two years, which has been accompanied by a steady decrease in the critical dimensions of semiconductor elements, as shown in Fig. 1.1 below. In 2011, Intel launched the first commercial 22 nm semiconductors [37], where 22 nm denotes the average half-pitch (i.e., half the distance between identical features) of a memory cell using this technology. In light of this progress, it is crucially important to further develop reliable, robust and accurate methods to monitor the manufacturing process. Advanced lithography, e.g. extreme ultraviolet (EUV) lithography [8], is the key technology for the manufacturing of such semiconductors. Highly accurate diffractive optical elements (photo masks) play an important role, and their dimensional characterization is an active field of current research [22].

As an alternative to direct imaging methods, such as atomic force microscopy (AFM), scatterometry is a non-imaging, indirect optical method. In scatterometry the sample is irradiated with light of a specific wavelength and the diffraction pattern of the scattered light is recorded. In this work two types of scatterometry are investigated: deep ultraviolet (DUV) scatterometry [53, 75, 76], operating at wavelengths of about 193 nm, and extreme ultraviolet (EUV) scatterometry [44, 57], using radiation with wavelengths around 13.5 nm. Photo masks used in EUV lithography are usually constructed of lines of absorbing materials set atop a multilayer system (MLS) that serves as a Bragg mirror for wavelengths in the EUV range. DUV scatterometry can
be used to evaluate the critical dimensions of the absorber structure, as the influence of the multilayer features is negligible for light in the DUV range. On the other hand, the high sensitivity of EUV radiation to variations in the multilayer can be used to evaluate its properties as well. The short wavelength of EUV radiation is also advantageous since it produces a large number of diffraction orders from the irradiated periodic structures. Therefore, EUV and DUV scatterometry complement each other for metrology on such masks.

Figure 1.1: Comparison of sizes of semiconductor manufacturing process nodes with some microscopic objects and visible light wavelengths. (Illustration by Cmglee [1])

As an indirect method, scatterometry depends heavily on the post-processing of the actual measurement data, i.e., the conversion of the measurement data into information about the critical dimensions of the investigated photo mask. This post-processing involves solving an inverse problem [69]. In contrast to the problem of predicting how a given mask responds to irradiation by a beam of a given wavelength, called the forward problem or the evaluation of the forward model, the inverse problem is ill-posed in the Hadamard sense [31]. This means that a unique solution may not exist and, even if it exists, small errors in the input data can lead to large errors in the solution. Explicit use of a priori information and detailed knowledge about the
variances of the input data are necessary to obtain a reliable solution.

There are different approaches to the forward problem in scatterometry. Several works use the rigorous coupled-wave analysis (RCWA) [42, 50, 51, 53, 54, 70]. However, Bodermann and Ehret [9] and Berger et al. [6] have shown that the finite element method (FEM) yields better results if more complex structures are investigated. In the present case of periodic line structures, Maxwell’s equations reduce to the Helmholtz equation [43, 58] with appropriate boundary conditions, which is then solved with the FEM [14]. In this work the software package DIPOG [20], developed by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) in Berlin, is used as the forward solver.

In general, the forward model depends on many parameters. A common way to reduce the dimensionality of the problem, and hence the number of possible solutions to the inverse problem, is to assume that the forward model depends only on a small number of parameters. All remaining influence parameters are fixed to certain values, accounting for a priori knowledge. One example of such a simplification for modeling the surface profile is to assume that the mask’s profile is a symmetric polygonal domain composed of a finite number of trapezoidal layers of different materials. The parameters of interest would then be the widths and sidewall angles of the polygons, while the heights and optical constants of the materials are kept constant.

The classical approach to the inverse problem is to set it up as a weighted least squares problem, i.e., to find the combination of forward model parameters that minimizes the weighted difference between the measurement data and the model data.
The weight factors in the least squares function account for the variances of the measurement data and therefore represent knowledge about the underlying measurement error model. The weighted least squares approach is widely used in scatterometry [3, 30, 54, 59, 67], since it is robust, well developed and easy to implement. However, it strongly depends on the chosen weight factors. Inadequate weights can bias the reconstruction, such that LSQ yields incorrect values for the parameters of interest and, in addition, wrong estimates of the associated uncertainties. Therefore, maximum likelihood estimation (MLE) [49] is introduced as a method to solve the inverse problem of scatterometry. In MLE the variances of the input data are treated as unknowns that are reconstructed along with the other parameters.

Even though MLE leads to a more reliable solution to the inverse problem and to more reliable associated uncertainties, the reconstructed variances for measurement data are much larger than those estimated by the experimenters. This is because the forward model is still a simplification of the actual experiment and does not account for systematic errors. In this work two types of systematic errors and their influence on the measured efficiencies are discussed and eventually incorporated into the MLE approach, namely line roughness [7, 26, 36, 39, 40, 62] and variations of the multilayer system. It is demonstrated that incorporating these systematic effects into the modeling scheme can improve the quality of the reconstruction and reduce the estimated uncertainties. However, the forward model becomes more complicated as more systematic errors and parameters are added. In order to assess the quality of the different models and the corresponding reconstructions, the likelihood-ratio test [49, 66] is employed as a method of model selection.

The LSQ approach and MLE are both deterministic methods to find the solution to the inverse problem. This means that the solution is a single set of parameters along with their uncertainties. An alternative to these deterministic approaches is given by the Bayesian framework [38, 68]. Here the solution to the inverse problem is no longer an estimate with uncertainties, but a probability distribution of the parameters of interest, called the posterior distribution.
This is particularly advantageous since the likelihood function maximized in MLE, as well as the least squares function minimized in LSQ, tend to have several local maxima and minima, respectively, such that a single estimate may not give the complete information. The Bayesian approach, however, requires that all parameters on which the forward model depends are modeled as random variables and represented by probability distributions, which encode all the information available about the parameters prior to the measurement. This prior information can also include
information obtained from other measurement techniques, such as AFM [15, 17, 18]. The fact that several measurement techniques can be combined to reduce the overall uncertainty of the parameters of interest is what makes the Bayesian approach so attractive. Its main disadvantage, however, is that a rigorous evaluation of the posterior distribution is very time-consuming. In this work a simple approximation method that circumvents this disadvantage is derived and applied to measurement data.

The thesis is structured as follows: Chapter 2 gives a detailed description of the two scatterometric measurement setups. It also contains the mathematical framework for the solution of the forward problem and the basic principles of the three methods used to solve the inverse problem in this work, namely the least squares approach (LSQ), maximum likelihood estimation (MLE) and the Bayesian approach. A comparison of the LSQ and MLE approaches for both simulated and measured data is given in Chapter 3. The effects of systematic errors on scatterometry are discussed in Chapter 4, while the extension of the forward model to include these systematic errors is given in Chapter 5. Chapter 5 also demonstrates the application of the different models to simulated and measurement data and gives a possible ranking of the models by complexity. In Chapter 6 the Bayesian approach is applied to actual measurement data. The work closes with a summary and conclusions in Chapter 7.
Chapter 2 Preliminaries
We start by collecting some basic facts about scatterometry and some mathematical concepts that will be useful in the following chapters. We give an overview of the experimental setups used in scatterometry in Section 2.1. The mathematical concepts used to model the interaction of electromagnetic waves with matter are introduced in Section 2.2, and Section 2.3 presents the basics of inverse problem theory.
2.1 Experimental Setups
The following section is mainly based on [27, 74]. A scatterometer can most generally be defined as a device that illuminates a sample and measures properties of the light scattered from that sample; a schematic representation is shown in Fig. 2.1 below.

Figure 2.1: Scheme of a scatterometric setup.

There are several different scatterometric techniques, which differ in the light source used and in the measured properties of the scattered light. We will concentrate on two of them, namely standard scatterometry and spectroscopic reflectometry. A standard scatterometer uses light from a monochromatic source with a fixed polarization state. We will only consider the classical case, i.e., the case in which the direction of the incident light beam lies in the cross-section plane perpendicular to the groove direction. The directions of the scattered waves then lie in the same plane. The polarization is called transverse electric (TE), or S polarization, when the incident electric field E is parallel to the grooves of the sample, and transverse magnetic (TM), or P polarization, when E is perpendicular to the grooves.

Note that the samples measured throughout this work are line-space structures (cf. Fig. 2.1), i.e., groups of parallel lines (bridges) placed on a plane surface. The bridges are assumed to have the same cross section in the plane perpendicular to the line direction (groove direction). Since the bridges are equally spaced, the line-space structure forms a grating. The grating is constant in the groove direction
and periodic in the surface direction perpendicular to the grooves. Because of the periodicity of the grating structure, the outgoing light propagates only in a finite number of directions. The scatterometer measures the efficiencies, i.e., the fractions of the incident energy conveyed into these discrete outgoing beams. Since each of these beams can be associated with a diffraction order, the measured data is also called a diffraction pattern, representing how much energy is transferred to each diffraction order. Usually the measurement is performed at a fixed angle of incidence $\theta_{\mathrm{inc}}$, in either reflection or transmission mode, depending on the optical constants of the sample. The measurement device is called a goniometric scatterometer when the angle of incidence can additionally be varied during the measurement. Reflectometric measurements can also be realized; here the light source and the detector are moved simultaneously, such that the detector is always positioned at an angle $-\theta_{\mathrm{inc}}$ measured from the surface normal. With a spectral reflectometer it is possible to vary the wavelength of the incident light. This can be done either with a tunable laser system or with a broad-band light source and an adjustable monochromator.

In this work experimental data from two scatterometric setups are used. The first one is a spectroscopic reflectometer operating with a light source in the EUV (extreme ultraviolet) spectrum of about 13–14 nm, which can also detect diffraction orders other than the specular reflection. The second one is a goniometric scatterometer operating at a wavelength of 193 nm in the DUV (deep ultraviolet) range. Both experimental setups are described in further detail in the following sections. Note that for both methods the probed area has a finite size, so the measured efficiencies are in both cases averages over the probed area.
2.1.1 EUV – Experimental Setup
The first type of measurement data is obtained with an EUV spectroscopic reflectometer, shown in Fig. 2.2. It is operated at the soft X-ray radiometry beam line in the PTB’s synchrotron radiation laboratory at BESSY II in Berlin [44, 63, 64]. The beam line provides monochromatized radiation in the spectral range from 0.7 nm to 35 nm, including the EUV spectral range around 13.5 nm. The probed area is around
1 mm². The measurements shown here are obtained by scanning the detector angle in-plane for three different wavelengths at a fixed angle of incidence of 6° with TE polarization. For further processing, only the measured efficiencies of the discrete diffraction orders were used; diffusely scattered radiation was not considered. For the structures investigated, EUV scatterometry offers the advantage of working in the regime where the wavelength is much shorter than the characteristic dimensions of the structures (a few 100 nm). Therefore, many diffraction orders can be measured, providing information on the higher harmonics in the spatial frequency range corresponding to smaller structure details. A typical set of measurement data consists of 69 to 75 efficiencies for diffraction orders in the range of −10 to 14. Depending on the investigated structure, the data in such a measurement dataset cover a wide dynamic range, from 10⁻³ % for higher orders up to several tens of percent for the 0th order.

Figure 2.2: Scheme of the spectroscopic reflectometer.
2.1.2 DUV – Experimental Setup
The second type of data derives from measurements with a DUV goniometric scatterometer of the ultra-high resolution microscopy working group at PTB in Braunschweig. The light source is a frequency-quadrupled TiSa laser with a fundamental wavelength from 772 nm to 840 nm; wavelengths down to 193 nm are available via frequency conversion. In the present case, a measurement dataset includes reflected and transmitted diffraction efficiencies of the orders −4 to 2 at seven different angles of incidence for a TM-polarized laser beam with a wavelength of 193 nm. The measurement spot is about 100 µm in diameter for the present measurements. A dataset typically consists of 43 data points with transmitted and reflected efficiencies; see [77] for further details on the experiment. A scheme of the measurement setup is shown in Fig. 2.3 below. The dynamic range of the measurement data is not as wide as that of the EUV measurements and covers the range from 0.1% to 3%.

Figure 2.3: Scheme of the goniometric reflectometer.

2.2 Mathematical Modeling of Scatterometry
The following section is mainly based on [4, 43, 58]. The mathematical model used here to describe the propagation of electromagnetic waves in matter is based on Maxwell’s equations. The efficiencies and phase shifts for the different diffraction directions are calculated from the data of the incident light and from characteristic
parameters of the irradiated surface profile. The optical grating is modeled as an infinite plate consisting of different periodic non-magnetic materials with permeability $\mu_0$ and permittivity (dielectric constant) $\varepsilon$. We use the coordinate system shown in Fig. 2.4 throughout the calculation.

Figure 2.4: Schematic representation of the computational domain.

The investigated mask is furthermore assumed to be periodic in x-direction with period d and homogeneous in z-direction, i.e., $\varepsilon$ is invariant with respect to z. The upper cover material is vacuum and the incident wave is normalized to have unit amplitude. We consider coplanar diffraction with incident wave directions restricted to the xy-plane, leading to reflected and transmitted plane waves in the same plane. The incident light can then be described as a superposition of TE-polarized and TM-polarized light. Note that the magnetic field H and the electric field E remain parallel to the structures in the TM and TE cases, respectively, so that the transverse component of the respective field can be determined from the two-dimensional Helmholtz equation

$$\Delta u(x, y) + k^2(x, y)\, u(x, y) = 0 \tag{2.1}$$

with the wavenumber function $k(x, y) = \omega\,(\mu_0\,\varepsilon(x, y))^{1/2}$ and angular frequency $\omega$
of the incident light wave. Note that the wavenumber function is constant in areas filled with the same material. On material interfaces, the solution $u$ and its normal derivative $\partial_n u$ (TE polarization), or the solution $u$ and the product $k^{-2}\,\partial_n u$ (TM polarization), have to be continuous across the interface. The usual outgoing wave conditions for half-spaces are required in the infinite regions. The domain $\Omega$ in the cross-section plane can therefore be reduced to a rectangle with the x-coordinate varying between zero and the period d and with two artificial boundaries $\Gamma^\pm = \{y = b^\pm\}$, located in the substrate beneath the structure ($\Gamma^-$) and in the covering vacuum ($\Gamma^+$). On the lateral part, quasi-periodic boundary conditions are imposed such that $u(0, y) = u(d, y)\exp(-i\alpha_0 d)$, and non-local boundary conditions are imposed on $\Gamma^\pm$. For instance, on $\Gamma^+$ the trace $\partial_n u|_{\Gamma^+}$ of the normal derivative must equal the y-derivative of the Rayleigh expansion (see Eq. (2.2)) of the trace $u|_{\Gamma^+}$ of $u$ from $\Omega$.

The component $E_z$ admits an expansion into Rayleigh series above and beneath the grating structure. For TE polarization they are given by

$$E_z(x, y) = \sum_{n=-\infty}^{\infty} A_n^+ \exp(i\beta_n^+ y)\exp(i\alpha_n x) + A_0^{\mathrm{inc}} \exp(-i\beta_0^+ y)\exp(i\alpha_0 x), \quad y \ge b^+, \tag{2.2}$$

and

$$E_z(x, y) = \sum_{n=-\infty}^{\infty} A_n^- \exp(-i\beta_n^- y)\exp(i\alpha_n x), \quad y \le b^-. \tag{2.3}$$

Here $\beta_n^\pm = \sqrt{(k^\pm)^2 - \alpha_n^2}$, $k^\pm = k(x, b^\pm)$, $A_0^{\mathrm{inc}} = 1$, $\alpha_0 = k^+ \sin\theta_{\mathrm{inc}}$ and $\alpha_n = k^+ \sin\theta_{\mathrm{inc}} + \frac{2\pi}{d} n$. The Rayleigh coefficients $A_n^\pm$ of interest are those with $n \in U^\pm$, where

$$U^\pm = \begin{cases} \{n \in \mathbb{Z} : |\alpha_n| < k^\pm\} & \text{if } \operatorname{Im} k^\pm = 0, \\ \emptyset & \text{if } \operatorname{Im} k^\pm > 0, \end{cases}$$

as they describe the magnitude and phase shift of the propagating plane waves. The modulus $|A_n^\pm|$ is the amplitude of the nth reflected/transmitted wave mode and $\arg(A_n^\pm / |A_n^\pm|)$ is the corresponding phase shift. Terms with $n \notin U^\pm$ lead to evanescent waves.
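As a small illustration of the definitions of $\alpha_n$, $\beta_n^\pm$ and $U^\pm$, the following Python sketch enumerates the propagating orders for a non-absorbing half-space. It is a minimal sketch under stated assumptions and not part of DIPOG: the function name `propagating_orders` and the parameter `n_half` (a real refractive index of the half-space in which the orders are counted, with a vacuum cover) are hypothetical, and the angles of incidence in the example are chosen for illustration only.

```python
import math


def propagating_orders(wavelength, period, theta_inc_deg, n_half=1.0):
    """List (n, beta_n) for all orders n with |alpha_n| < k in one half-space.

    Assumptions (illustrative, not DIPOG's interface):
    - the cover is vacuum, so k^+ = 2*pi/wavelength enters alpha_n;
    - the half-space has a real refractive index n_half (Im k = 0);
      for Im k > 0 the set U is empty and no efficiencies are defined;
    - wavelength and period are given in the same unit (e.g. nm).
    """
    k_plus = 2.0 * math.pi / wavelength              # wavenumber of the cover
    k_half = n_half * k_plus                          # k^+ or k^- of the half-space
    alpha0 = k_plus * math.sin(math.radians(theta_inc_deg))
    g = 2.0 * math.pi / period                        # 2*pi/d

    # |alpha0 + g*n| < k_half  <=>  (-k_half - alpha0)/g < n < (k_half - alpha0)/g
    n_lo = math.ceil((-k_half - alpha0) / g)
    n_hi = math.floor((k_half - alpha0) / g)

    orders = []
    for n in range(n_lo, n_hi + 1):
        alpha_n = alpha0 + g * n
        if abs(alpha_n) < k_half:                     # strict: grazing orders excluded
            beta_n = math.sqrt(k_half**2 - alpha_n**2)
            orders.append((n, beta_n))
    return orders


# EUV mask: lambda = 13.4 nm, d = 720 nm, theta_inc = 6 deg, reflected orders
euv = propagating_orders(13.4, 720.0, 6.0)
print(euv[0][0], euv[-1][0], len(euv))  # -59 48 108

# MoSi mask at a hypothetical normal incidence: lambda = 193 nm, d = 560 nm
print([n for n, _ in propagating_orders(193.0, 560.0, 0.0)])  # [-2, -1, 0, 1, 2]
# transmitted orders into the glass substrate (n = 1.575 from Table 2.2)
print([n for n, _ in propagating_orders(193.0, 560.0, 0.0, n_half=1.575)])  # [-4, -3, -2, -1, 0, 1, 2, 3, 4]
```

Under these assumptions more than a hundred reflected orders propagate in the EUV case, compared with only a handful for DUV, which is in line with the statement in Section 2.1 that EUV scatterometry gives access to many diffraction orders. Solving the inequality for n before looping avoids scanning an arbitrary range of orders.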
The efficiency of the nth diffracted wave is defined as the ratio of its energy to the energy of the incoming wave. The energy in turn is defined as the flux of the time-averaged Poynting vector $P = \operatorname{Re}(E \times \overline{H})/2$ through a reference area parallel to the plane of the grating. The efficiencies can be expressed as

$$e_n^\pm = \frac{\beta_n^\pm\,|A_n^\pm|^2}{\beta_0^+\,|A_0^{\mathrm{inc}}|^2}, \quad (n, \pm) \in \{(n, +) : n \in U^+\} \cup \{(n, -) : n \in U^-\}, \tag{2.4}$$

see [58], Eq. (1.50). These efficiencies of propagating modes are only defined for non-absorbing cover and substrate materials, i.e., if $\operatorname{Im} k^\pm = 0$. The efficiencies for TM polarization can be derived analogously.

Once Eq. (2.1) is solved with the finite element method (FEM) for elliptic PDEs [14], the Rayleigh coefficients can be obtained by a discretized Fourier series expansion of the solution restricted to $\Gamma^\pm$ [13] (see Eqs. (2.2) and (2.3)). Equation (2.4) then yields the efficiencies. For our investigations we use the software package DIPOG [20], developed at the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) in Berlin.

Note that the method described above can be applied to arbitrarily complex periodic structures, so the forward problem itself imposes no restrictions on the geometry. However, using the FEM solver as the model function in the inverse problem demands a simplification of the modeling. It is therefore reasonable to restrict the geometry of the profile to classes of gratings that can be described by a small number of parameters. A common approach is to define the profile of the grating as a polygonal structure composed of several materials. In this case, the coordinates of the corner points of the polygons together with the optical constants of the materials completely define the grating structure. The distribution of the permittivity $\varepsilon_p(x, y)$ can then be defined case by case through a combination of linear inequalities in x and y that depend on the chosen parameter vector p.
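For the simplest case, a single symmetric trapezoid per period on a substrate (as for the MoSi mask described below), such a case distinction can be sketched in a few lines of Python. This is a minimal illustration and not the author’s implementation: the parameter ordering `(x_bottom_right, height, x_top_right)`, the placement of the absorber between y = 0 and y = h with the substrate below and the vacuum cover above, and the use of relative permittivities $(n + ik)^2$, where k is the extinction coefficient from the tables and not the wavenumber, are all assumptions; the geometry values are made up.

```python
def eps_trapezoid(x, y, p, period, eps_absorber, eps_substrate, eps_cover=1.0 + 0.0j):
    """Relative permittivity eps_p(x, y) of a symmetric trapezoidal line.

    p = (x_br, h, x_tr): x-coordinates of the bottom-right and top-right
    corners and the height (hypothetical ordering). The absorber is assumed
    to occupy 0 <= y <= h, the substrate y < 0 and the vacuum cover y > h.
    """
    x_br, h, x_tr = p
    x = x % period                       # the profile is d-periodic in x
    if y < 0.0:
        return eps_substrate
    if y > h:
        return eps_cover
    # Right sidewall as a linear inequality: h*x - (x_tr - x_br)*y <= h*x_br;
    # the left sidewall is its mirror image about x = d/2.
    x_right = x_br + (x_tr - x_br) * y / h
    x_left = period - x_right
    return eps_absorber if x_left <= x <= x_right else eps_cover


# Optical constants of Table 2.2 (193 nm); geometry values are hypothetical.
eps_mosi = complex(2.308, 0.5975) ** 2
eps_glass = complex(1.575, 0.0) ** 2
p = (330.0, 70.0, 320.0)  # bottom CD 100 nm, top CD 80 nm for d = 560 nm
for point in [(280.0, 35.0), (10.0, 35.0), (280.0, -5.0), (840.0, 35.0)]:
    print(point, eps_trapezoid(*point, p, 560.0, eps_mosi, eps_glass))
```

The four sample points fall into the absorber, the vacuum cover, the substrate and the periodic copy of the absorber, respectively. In a FEM setting such a function would typically only be used to assign material constants to mesh elements; it is shown here merely to make the dependence of $\varepsilon_p$ on p explicit.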
The explicit form of the wavenumber function consequently also depends on these parameters, such that the Helmholtz equation reads

$$\Delta u(x, y) + k^2(p, x, y)\, u(x, y) = 0, \tag{2.5}$$

with $k(p, x, y) = \omega\,(\mu_0\,\varepsilon_p(x, y))^{1/2}$. The period of the grating in x-direction, the coordinates of the corner points (usually given relative to the period), the thickness of
the absorber and the optical constants of the materials, $p_1, \ldots, p_k$, are then sufficient for a complete characterization of the sample. Together with the wavelength $\lambda$ of the incident light and the angle of incidence $\theta_{\mathrm{inc}}$, all parameters necessary to model the resulting diffraction pattern are given, and we end up with the model function, which maps the parameter vector $p = (p_1, \ldots, p_k, \lambda, \theta_{\mathrm{inc}})$ onto the corresponding diffraction pattern, i.e., the efficiencies and phase shifts for the different diffraction directions:

$$f : \mathbb{R}^{k+2} \to \mathbb{R}^m, \quad p \mapsto f(p).$$

Note that the PDE (2.5) needs to be solved for every evaluation of the model function.

2.2.1 Geometry Model for EUV
The first object under investigation is a photomask with a periodic line pattern designed for use in EUV lithography. It will be called the EUV mask. The cross-section profile for one spatial period is shown in Fig. 2.5 below.

Figure 2.5: Cross section of the investigated EUV mask.
Material | Height/nm | SWA/° | n | k
TaO | 12.00 | 82.7 | 0.948 | 0.0310
TaN | - | - | 0.942 | 0.0337
SiO₂ (buffer) | 8.000 | 90 | 0.984 | 0.0082
SiO₂ (capping) | 1.234 | | 0.984 | 0.0082
Si | 12.869 | | 1.000 | 0.0018
MoSi | 0.147 | | 0.970 | 0.0043
Mo | 2.141 | | 0.925 | 0.0062
MoSi | 1.972 | | 0.970 | 0.0043
Si | 2.838 | | 1.000 | 0.0018
Substrate | ≈ 6.35 · 10⁶ | | 0.984 | 0.0082
Table 2.1: Geometric parameters and optical constants at a wavelength of 13.4 nm of the EUV mask used for simulations, period d = 720 nm.

The profile consists of a symmetric polygonal domain composed of three trapezoidal layers of different materials (TaO, TaN and SiO₂). These trapezoids are defined by the heights of the three layers, $p_i$ with $i = 1, 6, 11$, and by the coordinates $p_i$ with $i = 2, 3, 7, 8, 12, 13$, which define the x-coordinates of the corner positions. Beneath the line-space structure there are two absorbing layers of SiO₂ and Si on top of a molybdenum/silicon multilayer stack (MLS). The stack consists of a structure periodically repeated in y-direction, composed of a Mo layer and a Si layer separated by two interdiffusion layers of molybdenum silicide (MoSi). Note that the MLS is added to enable the reflection of EUV waves: it acts as a Bragg mirror at the design wavelength of about 13.4 nm.

Important geometric profile parameters are the height $p_6$ of the TaN layer (55–60 nm) and the x-coordinates $p_2$ and $p_7$ of the right corners of the TaN layer. The complex indices of refraction of the involved materials are given in Table 2.1 for a wavelength of 13.4 nm. A symmetric profile is imposed, i.e., the x-coordinates of the corresponding left corners depend on those of the right corners such that $p_3 = d - p_2$ and $p_8 = d - p_7$, where d is the period of the EUV mask. Furthermore, the sidewall angle (SWA) of the TaO layer is fixed to 82.7°. This is done in order to model a certain edge rounding, i.e., the cross-sectional area of this trapezoidal layer equals that of a corresponding TaO layer with rounded upper edges of a radius of about 6 nm. Additionally, we assume that
the SWA of the SiO₂ layer is constant at 90°. The SWA of the TaN layer depends on the x-coordinates of the corners and the height of the TaN layer, i.e.,

$$\tan(\mathrm{SWA}) = \frac{p_6}{p_2 - p_7}.$$

The geometric features of main interest, i.e., the critical dimensions (CDs) to be determined by scatterometry, are the height, top width and bottom width of the absorbing structure and the SWA of the TaN absorber layer, which depend on the parameters $p_6$, $p_7$ and $p_2$ (cf. Fig. 2.5). In the following we will refer to these parameters as height, top CD, bottom CD and SWA. In our evaluations all remaining parameters are set to the values given in Table 2.1, which represent the manufacturer’s design values [59]. Note that the model function for the EUV mask depends only on the parameter vector $p = (p_2, p_6, p_7)$ once an incident wavelength $\lambda$, which also determines the optical constants, and an angle of incidence $\theta_{\mathrm{inc}}$ are given.

2.2.2 Geometry Model for DUV
The second object under investigation is another line-space structure, called the MoSi mask. Its cross section is a trapezoidal domain made of molybdenum silicide (MoSi) on a glass substrate (cf. Fig. 2.6).

Figure 2.6: Cross section of the investigated MoSi photomask.
The trapezoid is completely defined by its height $p_3$ and by the x-coordinates $p_i$, $i = 1, 2, 4, 5$, of its corners. Again a symmetric profile is imposed, and the sidewall angle of the MoSi absorber layer depends on the corners and the height such that

$$\tan(\mathrm{SWA}) = \frac{p_3}{p_1 - p_4}.$$

The critical dimensions to be determined by scatterometry are the height, top width and bottom width of the absorbing structure and the SWA of the MoSi absorber layer. The optical constants and the manufacturer’s design values for the MoSi mask can be found in Table 2.2 below. As for the EUV mask, the model function for the MoSi mask depends on three parameters $p = (p_1, p_3, p_4)$ once an incident wavelength $\lambda$ and an angle of incidence $\theta_{\mathrm{inc}}$ are given.

Material | Height/nm | SWA/° | n | k
MoSi | - | - | 2.308 | 0.5975
Substrate | ≈ 6.35 · 10⁶ | | 1.575 | 0
Table 2.2: Geometric parameters and optical constants at a wavelength of 193 nm of the MoSi mask, period d = 560 nm.
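The relation between the corner coordinates and the critical dimensions can be sketched in a few lines of Python. This is a minimal illustration, not the author’s code: it assumes that $p_1$ and $p_4$ are the bottom-right and top-right corners and that the left corners lie at $d - p_1$ and $d - p_4$ (mirror symmetry about $x = d/2$, as stated explicitly for the EUV mask); the function name `trapezoid_cds` and the numerical values are hypothetical.

```python
import math


def trapezoid_cds(x_bottom_right, height, x_top_right, period):
    """Return (height, top CD, bottom CD, SWA in degrees) of a symmetric trapezoid.

    Assumes the left corners sit at d - x (mirror symmetry about x = d/2),
    so each width equals 2*x - d. The SWA follows
    tan(SWA) = height / (x_bottom_right - x_top_right).
    """
    swa = math.degrees(math.atan2(height, x_bottom_right - x_top_right))
    top_cd = 2.0 * x_top_right - period
    bottom_cd = 2.0 * x_bottom_right - period
    return height, top_cd, bottom_cd, swa


# MoSi mask, p = (p1, p3, p4) with hypothetical values, d = 560 nm
print(trapezoid_cds(330.0, 70.0, 320.0, 560.0))  # SWA of about 81.87 degrees
```

The same helper would apply to the TaN layer of the EUV mask with $(p_2, p_6, p_7)$, since $\tan(\mathrm{SWA}) = p_6/(p_2 - p_7)$ has the same form. Using `atan2` instead of the arctangent of the quotient keeps a vertical sidewall ($p_1 = p_4$) well defined at 90°.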
2.3 Inverse Problem Theory
Equipped with a model function like the ones defined in Section 2.2,

$$f : \mathbb{R}^n \to \mathbb{R}^m, \quad p \mapsto f(p),$$

the (finite-dimensional) inverse problem reads as follows: given a noisy realization $y \in \mathbb{R}^m$ of the model, usually called the measurement data, compute an estimate of the parameters $p \in \mathbb{R}^n$. If we assume that $y = f(p)$, the straightforward way would be to invert the model function f, such that $p = f^{-1}(y)$. Unfortunately, most inverse problems are ill-posed in the Hadamard sense [31], which means that at least one of the following holds:
1. a solution does not exist,
2. the solution is not unique,
3. the inverse function $f^{-1}$ is not continuous, hence small errors in the data y may cause large errors in the estimate of p.

We will present a short overview of three different approaches to solve the inverse problem. Motivated by the actual inverse problem of scatterometry, we focus on finite-dimensional problems. The subsections discussing the least squares and maximum likelihood approaches are mainly based on [32, 38, 41, 45], while the subsection about the Bayesian approach is based on [10, 23, 48] and also on [38, 41].

2.3.1 Least Squares Approach
We first consider a linear model function, i.e., a model function that can be written as

$$f : \mathbb{R}^n \to \mathbb{R}^m, \quad p \mapsto Mp,$$
for some matrix $M \in \mathbb{R}^{m \times n}$ with rank $r \le \min(m, n)$. The singular value decomposition (SVD) of $M$ is

$$M = U D V^T, \tag{2.6}$$

with

$$D = \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m \times n}, \tag{2.7}$$

$\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$, $UU^T = I_m$ and $VV^T = I_n$. One common definition of the condition number of $M$ is

$$\operatorname{cond}(M) = \frac{\sigma_1}{\sigma_r}.$$

Ill-conditioning of the matrix $M$ can be classified as follows:
• The problem is rank-deficient, i.e., the solution is not unique; this is the case if $r < n$, in particular whenever $m < n$.
• The problem is numerically rank-deficient if $M$ has a few small singular values, with a clear gap between the small and the larger ones.
• The problem is discrete ill-posed if the singular values decay gradually, without a clear gap, so that many of them are small.

Numerically rank-deficient and discrete ill-posed problems are both characterized by a very large condition number and can be regarded as effectively underdetermined. In general, the noisy realization of the model satisfies $y \notin \operatorname{ran}(M)$, so the inverse problem has no classical solution with $Mp - y = 0$. In this case we seek a solution such that $Mp$ is, in some sense, close to the measurement $y$, most commonly by the least squares method. First, one defines the objective function

$$\chi^2(p) = \left\|\Omega(Mp - y)\right\|^2.$$

The weight matrix $\Omega$ adjusts the importance of individual data points and is usually chosen to represent the variances of the measurement data. Throughout this work
we assume that $\Omega = \operatorname{diag}(\omega_1^{1/2}, \dots, \omega_m^{1/2})$, hence

$$\chi^2(p) = \sum_{j=1}^{m} \omega_j \left[(Mp)_j - y_j\right]^2.$$

The least squares solution to the inverse problem is the parameter vector

$$p_{\mathrm{LSQ}} = \arg\min_p \chi^2(p).$$

Differentiating $\chi^2$ with respect to $p$ and setting $\partial\chi^2/\partial p = 0$ yields the normal equations

$$M^T \Omega^T (\Omega M p - \Omega y) = 0.$$

If $r = n \le m$, the LSQ solution can formally be obtained as

$$p_{\mathrm{LSQ}} = \left(M^T \Omega^T \Omega M\right)^{-1} M^T \Omega^T \Omega\, y. \tag{2.8}$$

If the matrix $M$ is rank-deficient, i.e., $r < n$, and $\Omega = I_m$, we obtain the (minimum-norm) LSQ solution using the Moore-Penrose pseudoinverse $M^\dagger$ of $M$ [56]. Using the SVD from Eqs. (2.6) and (2.7), the pseudoinverse is

$$M^\dagger = V \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T,$$

and the LSQ solution is $p_{\mathrm{LSQ}} = M^\dagger y$. If $M$ is numerically rank-deficient, a regularization technique known as truncated SVD can be applied, in which the small singular values $\{\sigma_{k+1}, \dots, \sigma_r\}$ are replaced by zeros, i.e., their reciprocals are dropped from the pseudoinverse. In this case the pseudoinverse reads

$$M_k^\dagger = V \begin{pmatrix} \Sigma_k^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T, \quad \Sigma_k^{-1} \in \mathbb{R}^{k \times k}.$$

Problems in which $M$ is discrete ill-posed require further regularization, which means that the ill-posed problem is replaced by a well-posed approximation. The
regularization of inverse problems is a vast field, and many techniques are available. Since no further regularization is applied in this work, we do not present those approaches but refer to [21, 25, 71].

In the case of a nonlinear model function $f: \mathbb{R}^n \to \mathbb{R}^m$, $p \mapsto f(p)$, the solution to the inverse problem is found by minimizing the weighted least squares function

$$\chi^2(p) = \left\|\Omega\left(f(p) - y\right)\right\|^2 = \sum_{j=1}^{m} \omega_j \left[f_j(p) - y_j\right]^2. \tag{2.9}$$

In this work the minimizer of Eq. (2.9) is found by a Gauss-Newton-type iterative optimization proposed by Drège, Alassaad and Byrne [3]. Starting with an initial estimate $p^0$, in each iteration the model function is replaced by its first-order Taylor expansion around the current estimate $p^k$, such that

$$\chi^2(p) \approx \left\|\Omega\left(f(p^k) + J(p^k)\,(p - p^k) - y\right)\right\|^2, \quad J = \left(\frac{\partial f_j}{\partial p_i}\right)_{j,i} \in \mathbb{R}^{m \times n}.$$

The next iterate minimizes this linear LSQ problem, analogously to Eq. (2.8), which leads to the iteration

$$p^{k+1} = p^k + \left(J(p^k)^T \Omega^T \Omega\, J(p^k)\right)^{-1} J(p^k)^T \Omega^T \Omega \left(y - f(p^k)\right).$$

The variances of the input data influence the variances of the reconstructed parameters: a higher variance of the measurement noise typically leads to a higher variance of the reconstructed parameter values. One way to estimate these variances is to compute the approximate covariance matrix proposed in [3, 55]. If we assume that the model function $f$ is approximately linear in the relevant region of parameter values $p$, then the errors of the reconstructed parameters are, again, normally distributed random variables with zero mean. The standard deviations $u(p_i)$ of the quantities $p_i$ are given by the square roots of the diagonal entries of the covariance matrix $\Sigma$ of the parameters, which can be approximated as

$$\Sigma \approx \left(J^T \Omega^T \Omega J\right)^{-1}, \tag{2.10}$$
with $\Omega = \operatorname{diag}(\omega_1^{1/2}, \dots, \omega_m^{1/2})$. Hence $u(p_i) \approx (\Sigma_{i,i})^{1/2}$. Note that in the earlier works [3, 30, 55] the weights $\omega_j$ in Eq. (2.10) were chosen according to a predefined error model. A modified approach to variance estimation for LSQ is to choose the weights according to the residuals of the optimal solution, based on the following reasoning. A consistent solution $\hat p$ to the optimization problem (cf. Eq. (2.9)) should pass the $\chi^2$-test, namely

$$\chi^2_{\min} = \sum_{j=1}^{m} \omega_j \left[f_j(\hat p) - y_j\right]^2 \in \left[\chi^2_{\nu,\alpha/2},\, \chi^2_{\nu,1-\alpha/2}\right], \tag{2.11}$$

where $\nu = m - n$ denotes the degrees of freedom and $[\chi^2_{\nu,\alpha/2}, \chi^2_{\nu,1-\alpha/2}]$ the central $(1-\alpha)$ interval of the corresponding $\chi^2_\nu$ distribution for a chosen significance level, e.g., $\alpha = 0.05$. If this is not the case, the condition of Eq. (2.11) can be fulfilled by rescaling the variances of the input data, and hence the weights, with a scaling factor $\kappa$, $\tilde\omega_j = \kappa\,\omega_j$, chosen such that the rescaled $\chi^2_{\min}$ equals $\nu$, i.e., $\kappa = \nu/\chi^2_{\min}$. This rescaling does not affect the result of the optimization, i.e., the parameter values that minimize Eq. (2.9); it only adjusts the estimated variances of the reconstructed parameters. In this work LSQ refers to the minimization of Eq. (2.9) followed by a rescaling of the variances according to the inclusion in Eq. (2.11). In [28, 29, 30] it has been shown that the variances of the reconstructed parameters obtained with the above approximation are comparable to those obtained with a more time-consuming Monte Carlo-type method, provided that the variances of the measurement data are known and the model function $f$ is locally linear around the minimum.

2.3.2 Maximum Likelihood Approach

Note that the weighted least squares approach described in the previous subsection requires complete knowledge of the variances of the measurement data.
Since this requirement is seldom fulfilled in practice, we introduce a method that can solve inverse problems without this knowledge, known as maximum likelihood estimation (MLE). All that MLE requires is a specification of the underlying statistical model. In the present work we deal exclusively with normally distributed measurement errors and therefore focus on MLE for such an error model. Assume that the measurement errors

$$\epsilon_j = y_j - f_j(p), \quad j = 1, \dots, m,$$

i.e., the differences between the measured and the modeled values, are uncorrelated and normally distributed with zero mean and unknown variances $\sigma_1^2, \dots, \sigma_m^2$. Then their joint probability density, viewed as a function of the unknowns, is the likelihood function

$$L(\sigma_1, \dots, \sigma_m, p) = \prod_{j=1}^{m} \left(2\pi\sigma_j^2\right)^{-1/2} \exp\left(-\frac{\left(f_j(p) - y_j\right)^2}{2\sigma_j^2}\right). \tag{2.12}$$

The maximum likelihood estimator is the parameter combination that maximizes this likelihood function, i.e.,

$$\hat\theta_{\mathrm{MLE}} = (\hat\sigma_1, \dots, \hat\sigma_m, \hat p) = \arg\max_{\sigma_j,\, p} L(\sigma_1, \dots, \sigma_m, p).$$

This is the parameter combination most likely to have produced the measured data $y$. Note that for fixed variances the maximum likelihood estimator coincides with the LSQ solution with weights $\omega_j = \sigma_j^{-2}$. If the second derivatives of the logarithm of the likelihood function exist and are finite, then the covariance matrix of the maximum likelihood estimator $\hat\theta$ can asymptotically be expressed through the negative second derivative of $\log L$, the Fisher information matrix [49],

$$I = -\left(\frac{\partial^2 \log L}{\partial\theta_i\, \partial\theta_j}\right)_{i,j}.$$

The standard error of the maximum likelihood estimator $\hat\theta$ can then be calculated as

$$u(\hat\theta_i) = (\Sigma_{i,i})^{1/2}, \quad \Sigma = I^{-1}. \tag{2.13}$$
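To make Eq. (2.13) concrete, the following sketch uses a deliberately simple model rather than the scatterometry model (assumption: i.i.d. data $y_j \sim \mathcal{N}(\mu, \sigma^2)$ with unknown $\theta = (\mu, \sigma)$). It evaluates the observed Fisher information by central finite differences at the MLE, inverts it to obtain standard errors, and forms 95% half-widths with the factor 1.96. The helper names `neg_log_likelihood` and `numerical_hessian` are hypothetical.

```python
import numpy as np


def neg_log_likelihood(theta, y):
    """-log L for i.i.d. y_j ~ N(mu, sigma^2), theta = (mu, sigma); cf. Eq. (2.12)."""
    mu, sigma = theta
    r = y - mu
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma**2) + (r / sigma) ** 2)


def numerical_hessian(fun, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function of a vector."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            hess[i, j] = (fun(theta + ei + ej) - fun(theta + ei - ej)
                          - fun(theta - ei + ej) + fun(theta - ei - ej)) / (4.0 * h * h)
    return hess


rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=0.5, size=400)

# For this simple model the MLE is known in closed form:
# the sample mean and the (ddof=0) sample standard deviation.
theta_hat = np.array([y.mean(), y.std()])

# Observed Fisher information = Hessian of -log L at the MLE; Sigma = I^{-1} (Eq. 2.13).
fisher = numerical_hessian(lambda t: neg_log_likelihood(t, y), theta_hat)
cov = np.linalg.inv(fisher)
std_err = np.sqrt(np.diag(cov))

# 95% confidence half-widths using the factor 1.96.
half_width = 1.96 * std_err
print("MLE:", theta_hat, "standard errors:", std_err, "95% half-widths:", half_width)
```

For this toy model the observed information at the MLE is exactly $\operatorname{diag}(n/\hat\sigma^2,\, 2n/\hat\sigma^2)$, so the numerical standard errors should reproduce $\hat\sigma/\sqrt{n}$ and $\hat\sigma/\sqrt{2n}$; for the scatterometry likelihood no such closed form is available and the derivatives have to come from the forward solver.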
If the operations of integration with respect to $y$ and differentiation with respect to $\theta$ can be interchanged for the second derivative of $\log L$, it can be proven that the maximum likelihood estimator is asymptotically efficient. This means that it is asymptotically consistent and additionally attains the Cramér-Rao lower bound. Consequently, no asymptotically unbiased estimator has a lower asymptotic mean squared error than the MLE. Note that the error bars presented throughout this work, both for LSQ and for MLE, correspond to 95% confidence intervals, obtained by multiplying the standard errors by a factor of 1.96.

2.3.3 Bayesian Approach

The methods introduced in the previous subsections, LSQ and MLE, both yield single estimates of the parameters of interest, found by minimizing the weighted least squares function and by maximizing the likelihood function, respectively. However, several local extrema may appear, especially for MLE but also for LSQ, so that a single estimate may be uninformative. The Bayesian approach to the inverse problem takes a different point of view: the solution is no longer a single estimate but a probability density for the parameters of interest, called the posterior probability distribution. The framework necessary to understand this approach is given below; this subsection is mainly based on [38, 41, 68]. The main principles of the Bayesian approach are:
1. All variables included in the model are modeled as random variables.
2. The randomness describes our degree of information concerning their realizations.
3. The degree of information concerning these values is coded in the probability distributions.
4. The solution of the inverse problem is the posterior probability distribution.
Denoting random variables by capital letters, the model takes the form

$$Y = g(\Theta),$$

with $Y$ taking values in $\mathbb{R}^m$ and $\Theta$ taking values in $\mathbb{R}^{n+k}$. Note that this model function $g$ is not identical to the model function $f$ discussed before, as it usually contains, in addition to the physical model, information about the measurement errors and the appropriate error model. The directly observable random variable $Y$ is called the measurement, and its realization $Y = y_{\mathrm{observed}}$ the data. The non-observable random variable $\Theta$ that is of primary interest is called the unknown. Any information about $\Theta$ available before the measurement is coded into a probability density

$$\theta \mapsto \pi_{\mathrm{pri}}(\theta),$$

called the prior density, which expresses what we know about the unobservable parameters prior to the measurement. After analyzing the measurement setting as well as all additional information available about the variables, we obtain the joint probability density $\pi(\theta, y)$ of $\Theta$ and $Y$. The marginal density of the unknown $\Theta$ must then be

$$\int_{\mathbb{R}^m} \pi(\theta, y)\, dy = \pi_{\mathrm{pri}}(\theta).$$

If, on the other hand, we knew the value of the unknown, the conditional probability density of $Y$ given this information would be

$$\pi(y \mid \theta) = \frac{\pi(\theta, y)}{\pi_{\mathrm{pri}}(\theta)}, \quad \text{if } \pi_{\mathrm{pri}}(\theta) \neq 0.$$

This conditional density of $Y$ is called the likelihood function, as it expresses the likelihood of different measurement outcomes given $\Theta = \theta$ (cf. Eq. (2.12)). If the measurement data $Y = y_{\mathrm{observed}}$ are given, we end up with the conditional probability distribution

$$\pi(\theta \mid y_{\mathrm{observed}}) = \frac{\pi(\theta, y_{\mathrm{observed}})}{\pi(y_{\mathrm{observed}})}, \quad \text{if } \pi(y_{\mathrm{observed}}) \neq 0,$$

called the posterior distribution, as it expresses what we know about $\Theta$ after the observation $Y = y_{\mathrm{observed}}$.
The goal in the Bayesian framework is to find the conditional probability density $\pi(\theta \mid y_{\mathrm{observed}})$ for a given set of measurement data $Y = y_{\mathrm{observed}}$. Combining the above results, it can be expressed as

$$\pi_{\mathrm{post}}(\theta) = \pi(\theta \mid y_{\mathrm{observed}}) = \frac{\pi_{\mathrm{pri}}(\theta)\, \pi(y_{\mathrm{observed}} \mid \theta)}{\pi(y_{\mathrm{observed}})}. \tag{2.14}$$

In view of Eq. (2.14), solving an inverse problem in the Bayesian framework can be split into three steps:
1. Collect all available prior information on the unknown $\Theta$ and construct a prior probability density $\pi_{\mathrm{pri}}$ that reflects this information.
2. Find the likelihood function $\pi(y \mid \theta)$ that describes the relation between the observation and the unknown.
3. Develop methods to explore the posterior distribution if it cannot be expressed analytically.
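The three steps can be illustrated on a hypothetical one-dimensional linear toy problem, $y_j = \theta x_j + \epsilon_j$ with Gaussian noise and an assumed Gaussian prior. This is a sketch only: the posterior is evaluated on a grid, and the evidence $\pi(y_{\mathrm{observed}})$ is simply the normalizing constant, computed numerically. Because the toy model is linear and Gaussian, the grid result can be checked against the closed-form posterior; for the nonlinear scatterometry model, step 3 would instead need sampling methods.

```python
import numpy as np

# Hypothetical toy problem: y_j = theta * x_j + eps_j, eps_j ~ N(0, sigma^2).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
sigma = 0.1
theta_true = 0.7
y_obs = theta_true * x + rng.normal(0.0, sigma, size=x.size)

theta = np.linspace(-1.0, 2.0, 3001)   # grid over the unknown
dtheta = theta[1] - theta[0]

# Step 1: prior density, assumed N(0.5, 0.2^2) for illustration (log form).
prior_mean, prior_std = 0.5, 0.2
log_prior = -0.5 * ((theta - prior_mean) / prior_std) ** 2

# Step 2: likelihood pi(y_obs | theta) for the Gaussian error model (log form).
resid = y_obs[None, :] - theta[:, None] * x[None, :]
log_like = -0.5 * np.sum((resid / sigma) ** 2, axis=1)

# Step 3: posterior via Eq. (2.14); subtracting the maximum avoids underflow,
# and the normalization replaces the division by pi(y_obs).
log_post = log_prior + log_like
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta
post_mean = np.sum(theta * post) * dtheta
post_std = np.sqrt(np.sum((theta - post_mean) ** 2 * post) * dtheta)

# Closed-form Gaussian posterior, available only because this toy model is linear.
precision = 1.0 / prior_std**2 + np.sum(x**2) / sigma**2
exact_mean = (prior_mean / prior_std**2 + np.sum(x * y_obs) / sigma**2) / precision
exact_std = 1.0 / np.sqrt(precision)
print(post_mean, exact_mean, post_std, exact_std)
```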
Chapter 3
Maximum Likelihood and Least Squares

This chapter presents a comparison between the least squares method and maximum likelihood estimation. In Section 3.1 we introduce our model of the measurement errors and show how it is incorporated into the two approaches. Section 3.2 presents the results obtained by the two methods for both simulated and measured datasets. It is demonstrated how LSQ can lead to unsatisfactory results if the knowledge of the measurement error is incomplete and how MLE can be applied to circumvent this problem. The results have already been published in [34].
3.1 Measurement Error Model

Usually a measurement dataset that characterizes the diffraction pattern is given by a vector $y = (y_1, \dots, y_m)$ consisting of efficiencies or phase shift differences for different wavelengths, angles of incidence or polarization states, where the $j$th data point is the sum of the value of the model function and a contribution due to measurement noise:

$$y_j = f_j(p) + \epsilon_j,$$

where $\epsilon_j$ denotes the corresponding measurement error. If the measurements are uncorrelated and there are no further systematic errors, we can assume the errors $\epsilon_j$ of the different measurements to be independent. We furthermore assume that they can be modeled as a sum of two normally distributed random variables, both with zero mean, such that

$$\epsilon_j = \epsilon_{j,1} + \epsilon_{j,2}, \quad \epsilon_{j,1} \sim \mathcal{N}\left(0, (a f_j)^2\right), \quad \epsilon_{j,2} \sim \mathcal{N}\left(0, b^2\right).$$

From an experimental point of view, power fluctuations of the incident beam during the recording of the diffraction patterns are the main source of the first term. The second term describes background noise that is independent of the measured light intensities. Assuming the two contributions to be independent, the overall variance of the errors reads

$$\sigma_j^2 = (a f_j)^2 + b^2. \tag{3.1}$$

Based on this error model and the resulting variances in Eq. (3.1), the weighted least squares function given by Eq. (2.9) takes the form

$$\chi^2(p) = \sum_{j=1}^{m} \omega_j \left[f_j(p) - y_j\right]^2, \quad \omega_j = \sigma_j^{-2} = \left[(a f_j(p))^2 + b^2\right]^{-1}. \tag{3.2}$$

Throughout this chapter the LSQ solution to the inverse problem refers to the minimum of this weighted least squares function. Note that it depends only on the geometry parameters $p$ of the model function. For the minimization of the least squares function, a Gauss-Newton-type optimization routine available in DIPOG is used.
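The LSQ procedure of this chapter (weights from the error model of Eq. (3.1), Gauss-Newton iteration, covariance approximation (2.10) and the $\chi^2$-test rescaling (2.11)) can be sketched as follows. This is a minimal sketch under stated assumptions, not DIPOG: the forward solver is replaced by a hypothetical two-parameter toy model `model(p, t)` $= p_0 e^{-p_1 t}$, and the weights are evaluated once at the starting guess and then held fixed, whereas Eq. (3.2) lets them depend on $p$ (how DIPOG treats that dependence is not described here). The data are simulated with $a = 10\%$ but fitted with an assumed $a = 1\%$, mimicking the setting of Section 3.2.2.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist


def model(p, t):
    """Hypothetical stand-in for the forward solver: f_j(p) = p0 * exp(-p1 * t_j)."""
    return p[0] * np.exp(-p[1] * t)


def jacobian(p, t):
    """J[j, i] = d f_j / d p_i for the toy model."""
    e = np.exp(-p[1] * t)
    return np.column_stack([e, -p[0] * t * e])


def noise_variance(f, a, b):
    """Eq. (3.1): relative noise a*f and background noise b, added in quadrature."""
    return (a * f) ** 2 + b ** 2


def gauss_newton(y, t, w, p0, n_iter=50, tol=1e-12):
    """Gauss-Newton for chi^2(p) = sum_j w_j (f_j(p) - y_j)^2 with fixed weights w."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        J = jacobian(p, t)
        JtW = J.T * w                  # J^T Omega^T Omega, since Omega^T Omega = diag(w)
        step = np.linalg.solve(JtW @ J, JtW @ (y - model(p, t)))
        p = p + step
        if np.linalg.norm(step) <= tol * (1.0 + np.linalg.norm(p)):
            break
    J = jacobian(p, t)
    cov = np.linalg.inv((J.T * w) @ J)  # Eq. (2.10)
    chi2_min = np.sum(w * (model(p, t) - y) ** 2)
    return p, cov, chi2_min


rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 69)           # 69 points, like the 3 x 23 EUV dataset
p_true = np.array([0.3, 0.8])
f_true = model(p_true, t)
y = f_true + rng.normal(0.0, np.sqrt(noise_variance(f_true, 0.10, 1e-3)))

# LSQ with an assumed error model a = 1 %, b = 1e-3 (weights frozen at the start value).
p_start = np.array([0.28, 0.9])
w = 1.0 / noise_variance(model(p_start, t), 0.01, 1e-3)
p_lsq, cov, chi2_min = gauss_newton(y, t, w, p_start)

# Only b/a matters: a = 10 %, b = 1e-2 scales every weight by 1/100.
w_scaled = 1.0 / noise_variance(model(p_start, t), 0.10, 1e-2)
p_scaled, _, _ = gauss_newton(y, t, w_scaled, p_start)

# Chi-square test of Eq. (2.11) and rescaling with kappa = nu / chi2_min.
nu = t.size - p_lsq.size
alpha = 0.05
lo, hi = chi2_dist.ppf(alpha / 2, nu), chi2_dist.ppf(1 - alpha / 2, nu)
passes = bool(lo <= chi2_min <= hi)
kappa = nu / chi2_min
cov_rescaled = cov / kappa              # weights kappa * w give covariance cov / kappa
print(p_lsq, p_scaled, passes, np.sqrt(np.diag(cov_rescaled)))
```

With the too-optimistic $a = 1\%$ the fit fails the $\chi^2$-test, and the rescaling with $\kappa < 1$ inflates the parameter uncertainties, while the minimizer itself is unchanged. The second fit shows why only the ratio $b/a$ enters the LSQ solution.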
For the given error model and measurement data $y$, the likelihood function is a function of $a$, $b$ and the geometry parameters $p$ [49] (see Eq. (2.12)):

$$L(a, b, p) = \prod_{j=1}^{m} \left(2\pi\left[(a f_j(p))^2 + b^2\right]\right)^{-1/2} \exp\left(-\frac{\left(f_j(p) - y_j\right)^2}{2\left[(a f_j(p))^2 + b^2\right]}\right).$$

The MLE solution to the inverse problem refers to the values of $a$, $b$ and $p$ at the maximum of this likelihood function. For the maximization, DIPOG is used only as a black box to solve the forward problem and to calculate the gradients of the model function with respect to the parameters $p$. The optimization itself is performed with a MATLAB routine modified by the author.

3.2 Results

We now apply the LSQ and MLE approaches to several simulated and measured datasets for both EUV and DUV scatterometry.

3.2.1 Dependency on the Chosen Weight Factors for EUV Data

In this section we solve the reconstruction problem using simulated data superposed by noise whose variances are parametrized by $a$ and $b$. We start with a least squares approach assuming fixed uncertainty parameters $a$ and $b$. For one realization of a simulated dataset perturbed by noise with $a = 10\%$ and $b = 10^{-3}$ (cf. Eq. (3.1)), Figure 3.1 shows the values of the $\chi^2$-function defined in Eq. (3.2) as a function of the bottom CD and top CD for two different noise models, i.e., two different weightings in the least squares function of Eq. (3.2). Figure 3.2 shows the dependency of the reconstructed CDs on the ratio $b/a$ for the same example. Note that the reconstruction results depend only on the ratio $b/a$ and not on the absolute values of the two parameters, since scaling $a$ and $b$ by a common factor $c$ scales every weight $\omega_j$, and hence $\chi^2$, by $c^{-2}$. The geometrical parameters of the mask used in the simulations were set to a bottom CD of 550 nm and a top CD of 546.9
nm, corresponding to a sidewall angle of 90° for the TaN and the SiO2 layers and 82.7° for the TaO layer on top of the absorber line (cf. Fig. 2.5).

Figure 3.1: $\chi^2$ as a function of bottom CD/nm and top CD/nm for different ratios $b/a$ (left panel: $b/a = 0.01$; right panel: $b/a = 0.1$).

Figure 3.2: Reconstructed CDs (bottom CD, top CD) and SWA as functions of the ratio $b/a$ for a simulated dataset.

In Fig. 3.1 one clearly sees how the minimum (blue dot) shifts when the ratio $b/a$ is changed, resulting in two different solutions to the inverse problem. This dependency also affects the reconstructed sidewall angle (see Fig. 3.2): the reconstructed top CD decreases and the bottom CD increases by several nanometers when the value of $b/a$ used in the reconstruction is chosen one order of magnitude larger than the true value. It is this sensitivity that led us to treat the noise parameters $a$ and $b$ as additional variables to be reconstructed as well, since a wrong or
incomplete assessment of the variances of the input parameters (scattering efficiencies) not only leads to an under- or overestimation of the variances of the output parameters (reconstructed geometry parameters) but also causes significant systematic errors in the results. This is especially important when setting up a new experiment, as the knowledge of the variances of the measurement errors is then usually incomplete.

3.2.2 Application to Simulated EUV Data

In the following, MLE is applied to datasets obtained by simulating the diffraction pattern of an EUV mask with bottom CD = 550 nm, top CD = 546.9 nm and height = 58 nm. The results are compared with those of the LSQ method applied to the same numerically generated datasets. The parameters used in the simulation are similar to those of the actual EUV measurements on such masks. The −10th to 12th diffraction orders of an incident beam at a fixed angle of incidence of 6° are computed numerically for three different wavelengths of the incoming EUV radiation, resulting in a set of 3 × 23 data points.

The simulated data were perturbed assuming a normally distributed measurement error with $a = 10\%$ and $b = 10^{-3}$ (cf. Eq. (3.1)), resulting in a total of 50 noisy datasets. The weights $\omega_j$ for LSQ in Eq. (3.2) were defined with $a = 1\%$ and $b = 10^{-3}$, a typical estimate of the variances in the real measurement processes used in earlier publications [30]. For MLE these noise parameters were treated as unknowns and hence had to be reconstructed as well.

The results for the noise parameter $a$ and the ratio $b/a$, along with the approximate 95% confidence intervals, are presented in Fig. 3.3. MLE is found to be capable of reconstructing the noise parameter $a$ from a dataset of limited size with a typical relative error of 10%–20%, while the errors in $b$ can be substantially larger.
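The central idea of MLE in this chapter, treating $a$ and $b$ as unknowns alongside $p$, can be sketched with a hypothetical two-parameter toy model in place of the DIPOG forward solver. SciPy's Nelder-Mead minimizer applied to $-\log L(a, b, p)$ stands in for the author's MATLAB routine, and $a$, $b$ are optimized on a log scale to keep them positive; both are choices of this sketch, not of the thesis.

```python
import numpy as np
from scipy.optimize import minimize


def model(p, t):
    """Hypothetical stand-in for the forward solver: f_j(p) = p0 * exp(-p1 * t_j)."""
    return p[0] * np.exp(-p[1] * t)


def neg_log_likelihood(x, t, y):
    """-log L(a, b, p) for the error model sigma_j^2 = (a f_j(p))^2 + b^2.

    x = (p0, p1, log a, log b); optimizing log a and log b keeps a, b > 0.
    """
    p, a, b = x[:2], np.exp(x[2]), np.exp(x[3])
    f = model(p, t)
    var = (a * f) ** 2 + b ** 2
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - f) ** 2 / var)


rng = np.random.default_rng(3)
t = np.linspace(0.0, 5.0, 300)
p_true, a_true, b_true = np.array([0.3, 0.8]), 0.10, 1e-3
f_true = model(p_true, t)
y = f_true + rng.normal(0.0, np.sqrt((a_true * f_true) ** 2 + b_true ** 2))

# Rough starting guess for geometry and noise model; one restart for robustness.
x0 = np.array([0.28, 0.9, np.log(0.05), np.log(3e-3)])
opts = {"maxiter": 20000, "maxfev": 20000, "xatol": 1e-8, "fatol": 1e-10}
res = minimize(neg_log_likelihood, x0, args=(t, y), method="Nelder-Mead", options=opts)
res = minimize(neg_log_likelihood, res.x, args=(t, y), method="Nelder-Mead", options=opts)

p_hat, a_hat, b_hat = res.x[:2], np.exp(res.x[2]), np.exp(res.x[3])
print("p:", p_hat, "a:", a_hat, "b:", b_hat, "b/a:", b_hat / a_hat)
```

As in the simulations described here, the relative noise level $a$ is typically recovered more accurately than $b$, because only the data points with small efficiencies, where $b$ dominates the variance, carry information about the background noise. Standard errors for $(a, b, p)$ would follow from the Fisher information as in Eq. (2.13).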
Figure 3.4 below presents the reconstructed sidewall angles along with the approximate 95% confidence intervals for the solutions of the two methods. Note that there is a systematic shift of about 1.5° between the actual value of 90° and the mean sidewall angle estimated by LSQ, while the mean estimated sidewall angle obtained
by MLE is almost identical to the actual value. This bias is due to the nonlinearity of the mathematical model; for the present model it vanishes if the correct weights are chosen.

Figure 3.3: Reconstructed noise parameter $a$ (left panel) and ratio $b/a$ (right panel) per dataset, with approximate 95% confidence intervals for MLE. The green dotted lines represent the actual values, the red dotted lines the mean values.

Figure 3.4: Comparison of the reconstructed sidewall angles (SWA/°, per dataset) with approximate 95% confidence intervals for LSQ (left panel) and MLE (right panel). The green dotted lines represent the actual values, the red dotted lines the mean values.

For a consistent reconstruction, the true value of the geometry parameter should lie within the 95% confidence interval around the reconstructed value. Figure 3.4 shows that the percentage of consistently reconstructed sidewall angles is comparable for both methods (92% for LSQ, 94% for MLE). However, the root-mean-square
deviations (RMSD) of the reconstructed values from the true values, shown in Fig. 3.5, are about twice as high for LSQ as for MLE.

Figure 3.5: Comparison of the RMSD and the mean estimated standard deviations, as relative deviation in % of the actual value, for top CD, bottom CD, height and SWA, for LSQ (left panel) and MLE (right panel).

3.2.3 Application to Measured EUV Data

In the following we apply maximum likelihood estimation to measurement data from EUV scatterometry at PTB. The EUV mask under investigation is structured into 121 dies, each of which contains two different scatterometry fields with periodic line-space structures. While the dies D4, D8, F6 and H8 have identical design values, dies H4, H5 and H6 have different periods but are assumed to have the same bottom CD; for detailed values see Table 3.1.

Dataset           Period/nm   Bottom CD/nm
H4                840         140
H5                420         140
H6                280         140
D4, D8, F6, H8    720         540

Table 3.1: Design values of the EUV mask.

Note that the uncertainty of the periods is about 370 pm [73]. All dies share the remaining geometrical and structural parameters given in Table 2.1. Note that
the dependence of the reconstructed solution on the choice of weights shown in Fig. 3.6 has the same qualitative shape as the one obtained for simulated input data in Fig. 3.2.

Figure 3.6: Reconstructed CDs (bottom CD, top CD) and SWA as functions of the ratio $b/a$ for measurement dataset D4.

Figure 3.7: Reconstructed noise parameter $a$ (left panel) and ratio $b/a$ (right panel) with approximate 95% confidence intervals for measured data from the EUV scatterometer (datasets H4, H5, H6, D4, D8, F6, H8). The dotted lines represent the mean values of the reconstructed values.

The reconstructed noise parameters $a$ and ratios $b/a$ for all measured datasets are plotted in Fig. 3.7. The mean estimated value of $b/a$ is about $1.2 \cdot 10^{-2}$, which differs significantly from the values of approximately $3 \cdot 10^{-2}$ to $10^{-1}$ ($a = 1\%$–$3\%$, $b = 10^{-3}$) used for the LSQ method in previous publications [30]. The mean estimated value of the background noise $b$, about $2 \cdot 10^{-3}$, agrees quite well with the value given by the experimenters; the relative error $a$ is about 16%, which is
about an order of magnitude larger than expected. As in the case of simulated input data, we observe a systematic shift of the sidewall angles reconstructed by MLE towards the design value of 90°, compared with the LSQ solutions with fixed weights (see Fig. 3.8). Note that the mean sidewall angle obtained by SEM in [59] was approximately 86°.

Figure 3.8: Reconstructed sidewall angles (MLE and LSQ) with approximate 95% confidence intervals for measured data from EUV scatterometry (datasets H4, H5, H6, D4, D8, F6, H8). The dotted lines represent the mean values for the two methods.

3.2.4 Application to Measured DUV Data

Finally, we apply the MLE approach to measurement data from the DUV scatterometer. The dataset consists of 11 measurements on different mask fields, which were fabricated with identical design specifications, e.g., a period of 560 nm [60]. The MLE results for the parameters $a$ and $b$ characterizing the uncertainty of the input data, with their approximate 95% confidence intervals, are presented in Fig. 3.9. The parameter $b$ representing the background noise of the measurement setup is about $3 \cdot 10^{-2}$, in fair agreement with the value of $4 \cdot 10^{-2}$ given by the experimenters in [76]. Recall that the reconstructed SWA is sensitive to changes in the ratio $b/a$ for small values of $b/a$ and insensitive for large values of $b/a$ (cf. Figs. 3.2 and 3.6). Hence the geometrical features obtained by LSQ with fixed weights $a = 2.5\%$ and $b = 4 \cdot 10^{-2}$ ($b/a = 1.6$) are almost identical to those obtained by MLE ($b/a \approx 1$), both in terms of consistency and variance, as shown in Fig. 3.10.
Figure 3.9: Reconstructed noise parameter $a$ and ratio $b/a$ per dataset, with approximate 95% confidence intervals, for measured data from DUV scatterometry. The dotted lines represent the mean values of the reconstructed values.

Figure 3.10: Reconstructed sidewall angles per dataset, with approximate 95% confidence intervals, for measured data from the DUV scatterometer for LSQ (left panel) and MLE (right panel). The dotted lines represent the mean values.
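The two figures of merit used in Section 3.2.2 to compare LSQ and MLE, the share of consistent reconstructions (true value inside the approximate 95% interval, estimate ± 1.96 u) and the RMSD, are straightforward to compute. A minimal sketch follows; the function name `consistency_and_rmsd` is hypothetical and the sidewall-angle values are made up for illustration, not taken from the thesis.

```python
import numpy as np


def consistency_and_rmsd(estimates, std_errors, true_value, z=1.96):
    """Share of reconstructions whose interval estimate +/- z*u contains the
    true value, and the root-mean-square deviation from the true value."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    consistent = np.abs(estimates - true_value) <= z * std_errors
    rmsd = np.sqrt(np.mean((estimates - true_value) ** 2))
    return consistent.mean(), rmsd


# Made-up sidewall angles (degrees) and standard errors, for illustration only.
swa_hat = [89.0, 90.5, 91.0, 86.0]
swa_u = [1.0, 1.0, 0.4, 1.5]
share, rmsd = consistency_and_rmsd(swa_hat, swa_u, true_value=90.0)
rel_rmsd_percent = 100.0 * rmsd / 90.0   # relative deviation, as plotted in Fig. 3.5
print(share, rmsd, rel_rmsd_percent)
```

The two measures are complementary: a method with large error bars can be consistent while still having a large RMSD, which is why both are reported for LSQ and MLE.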
3.3 Chapter Summary

In this chapter we have seen how maximum likelihood estimation (MLE) can substantially improve the results for the critical dimensions compared with the standard least squares (LSQ) approach used in earlier papers [3, 30, 54]. Using simulated data, we have demonstrated the sensitivity of the LSQ approach to the assumed statistical model. Strong systematic deviations of the reconstructed CDs and sidewall angles were observed if inappropriate weights for the measurement errors of the input data were chosen in the least squares function. Maximum likelihood estimation has been proposed as an alternative, in which the parameters modeling the measurement errors are included as variables in the optimization process.

The ability of MLE to solve the inverse problem has been investigated by applying it to simulated datasets with known variances of the input data. Using MLE, the geometrical parameters and the noise model parameters could be reconstructed with sufficient accuracy. The variances of the reconstructed parameters were estimated using the Fisher information matrix. Furthermore, MLE has been applied to several sets of measurement data from different photomasks for both EUV and DUV scatterometry. It has been shown that including the parameters of the error model in the optimization improves the reconstruction of the mask geometry and leads to much better agreement between the model data of the optimized solution and the corresponding measurement data. The obtained knowledge of the variances also allows a more realistic estimate of the accuracy of the reconstructed parameters.

Application of MLE to EUV data yielded relative noise levels ($a$) of 10%–20% superposed by absolute background noise in the range $1$–$2 \cdot 10^{-3}$ for the measurement data.
The resulting uncertainties of the CDs were found to be in the range of 2–3 nm, whereas the height uncertainties are approximately 0.5 nm. The sidewall angles were systematically larger than with the LSQ method and showed better agreement with the design values as well as with independent measurements by scanning electron microscopy [59]. For DUV data, MLE yielded relative noise levels of 3% superposed by absolute background noise of about $3 \cdot 10^{-2}$. The uncertainties of the CDs were
found to be in the range of 1–2 nm, and the height uncertainties were determined to be 0.5 nm.

The relatively high uncertainties of the geometrical parameters of the EUV mask and the relative measurement noise of about 10%–20%, a value much higher than the reference values given by the experimenters, are presumably caused by the much higher sensitivity of the EUV mask to systematic errors. These stem from oversimplifications in the mathematical model (e.g., the assumption of a perfectly periodic line structure without roughness) and from incomplete knowledge of crucial model parameters (e.g., the periods of the multilayer). In contrast, the MoSi mask has a much simpler structure and is therefore more robust against model errors. Nevertheless, in both cases accuracies in the 1 nm range are within reach. In the following chapters we demonstrate how incorporating systematic errors, such as roughness [39] and deviations in the multilayer structure [30], into the modeling and into the MLE procedure used to solve the inverse problem leads to even better reconstruction results.
Chapter 4
The Effect of Systematic Errors on Scatterometry

The last chapter closed with the conjecture that the relatively high uncertainties of the geometrical parameters of the EUV mask may be caused by systematic errors due to an oversimplification of the mathematical model used to evaluate the measurement data. In this chapter we therefore discuss the influence of systematic errors on the measurement data in scatterometry. Two types of errors are investigated: the errors caused by line edge roughness (LER) and line width roughness (LWR), and the errors caused by variations of the multilayer system. The results presented in the first section are mainly taken from [26, 35, 36]; those presented in the second section can be found in [33].
4.1 Line Edge and Line Width Roughness

The first source of systematic errors we investigate is line edge and line width roughness. Images taken by atomic force microscopy (cf. Fig. 4.1) show that the assumption that the geometry and material properties of the grating under investigation are invariant in one direction is not realistic. Instead, the absorber lines vary along the z-axis (coordinate system as in Fig. 2.4).

Figure 4.1: Image taken by an atomic force microscope showing the presence of roughness on a photomask. (Image by Advanced Mask Technology Center (AMTC) [2])

A rigorous modeling of this variation in terms of a three-dimensional structure is currently very time-consuming or even impossible; therefore, a two-dimensional model of the roughness is used. In this model, roughness is represented as a superposition of two effects. The first, called line edge roughness (LER), is a variation of the center position of the absorbing structure. The second, called line width roughness (LWR), is a variation of the width of the absorbing structures. Note that in this setting the computational domain for the finite element method (FEM) must contain several profile lines in order to simulate roughness effects. This extended computational domain is called a super cell. It consists of $N$ neighboring absorber lines with an overall period of $P = N \cdot d$, where $d$ is the period of the unperturbed line-space structure. Referring to Fig. 4.2, LER is modeled by random perturbations of the center positions $x_i$, $i = 1, \dots, N$, whereas the line width and the period $d$ of each profile in this chain are fixed to their nominal values. LWR is represented by randomly perturbed line widths $\mathrm{CD}_i$, $i = 1, \dots, N$, with unperturbed centers $x_i$ and constant pitch.
Figure 4.2: Super cell containing many profile lines, used for roughness modeling by randomly changed center positions (LER) and randomly varied line widths (LWR).

The perturbations are assumed to be normally distributed around the unperturbed center positions and the nominal line width, with variances $\sigma_{x_i}^2$ and $\sigma_{\mathrm{CD}_i}^2$. Realizations with perturbations larger than the period of the grating are not considered. For different lines, these perturbations are assumed to be independent. In this modeling concept the positions of the left and right edges of a line are correlated: the correlation coefficient is $+1$ for LER and $-1$ for LWR. If both effects are superimposed (denoted by LEWR), both the left and the right edges are rough. The variance of each line edge is then given by
$$\sigma_{\mathrm{edge}}^2 = \sigma_{x_i}^2 + \left(\frac{\sigma_{\mathrm{CD}_i}}{2}\right)^2 .$$
Regarding the impact of line roughness on EUV gratings, Kato et al. [39, 40, 62] used the same approach of randomly distributed center positions or line widths for their analytical considerations based on Fraunhofer diffraction. Germer [24] applied similar design principles to his profile variations of silicon lines investigated in the visible spectral range. Schuster et al. [65] studied the impact of LER on silicon gratings on the basis of sinusoidal perturbations of the line positions. They used amplitudes in the range of 2–8 nm and incident light with wavelengths of 400 nm and 250 nm. Bilski et al. [7] used an RCWA model to demonstrate that the presence of LER influences the reconstructed CDs.
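The LEWR sampling scheme described above can be sketched in Python with NumPy. This is a minimal illustration, not the FEM workflow used for the results. The function name `sample_lewr_super_cell`, the placement of the nominal centers, and the redraw-based rejection of perturbations larger than the period are assumptions. The sketch checks numerically that a single edge has the standard deviation $\sigma_{\mathrm{edge}}$ given above.

```python
import numpy as np

def sample_lewr_super_cell(n_lines, period, cd_nominal, sigma_x, sigma_cd, rng):
    """One LEWR realization: left and right edge positions of n_lines absorber lines."""
    nominal_centers = (np.arange(n_lines) + 0.5) * period
    while True:
        dx = rng.normal(0.0, sigma_x, n_lines)    # LER: shifts both edges together (corr. +1)
        dcd = rng.normal(0.0, sigma_cd, n_lines)  # LWR: moves edges in opposite directions (corr. -1)
        # Redraw realizations with perturbations larger than the period (assumed rejection rule).
        if np.all(np.abs(dx) < period) and np.all(np.abs(dcd) < period):
            break
    centers = nominal_centers + dx
    widths = cd_nominal + dcd
    return centers - widths / 2, centers + widths / 2

d = 280.0                # period of the unperturbed grating in nm
cd = d / 3               # top CD for L:S = 0.5
sigma_x = sigma_cd = 2.8
rng = np.random.default_rng(seed=1)

# Deviations of the left edges from their nominal positions, pooled over 2000 super cells.
nominal_left = (np.arange(24) + 0.5) * d - cd / 2
deviations = np.concatenate([
    sample_lewr_super_cell(24, d, cd, sigma_x, sigma_cd, rng)[0] - nominal_left
    for _ in range(2000)
])
empirical_sigma_edge = deviations.std()
theoretical_sigma_edge = np.sqrt(sigma_x**2 + sigma_cd**2 / 4)  # about 3.13 nm
```

The left edge equals the nominal position plus $\Delta x_i - \Delta\mathrm{CD}_i/2$. Its variance is therefore $\sigma_{x_i}^2 + \sigma_{\mathrm{CD}_i}^2/4$, which the empirical value reproduces.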
We present computations for FEM domains containing 24 rectangular absorber lines with a period of 280 nm and a line-to-space ratio (L:S), i.e., top CD/(d − top CD), of 0.5. This leads to super cells with a period of P = 6.72 µm. About 1,000 diffraction patterns were calculated for each of two perturbation scenarios. Standard deviations of 2.8 nm and 5.6 nm (1% and 2% of the period d = 280 nm) were used to create random samples of super cells with normally distributed center positions and line widths. The resulting diffraction orders extend from −9 to +8; efficiencies smaller than $10^{-3}$ % were excluded. A wavelength of $\lambda = 13.389$ nm and an angle of incidence of $\theta_{\mathrm{inc}} = 6^\circ$ were applied; both values are typical for EUV scatterometry. The optical indices of the material components and the fixed mask parameters are given in Tab. 4.1 below.

Material          Height/nm   SWA/°   n       k
TaO               12.00       90      0.948   0.0310
TaN               60          90      0.942   0.0342
SiO2 (buffer)     8.40        90      0.975   0.0153
SiO2 (capping)    1.654       –       0.975   0.0153
Si                12.869      –       1.000   0.0018
MoSi              0.147       –       0.970   0.0042
Mo                2.141       –       0.926   0.0062
MoSi              1.972       –       0.970   0.0042
Si                2.838       –       1.000   0.0018

Table 4.1: Geometric parameters and optical constants (n, k) at a wavelength of 13.389 nm of the EUV mask used for the LER/LWR simulations; period d = 280 nm.

Figures 4.3–4.5 show the details of these calculations. Fig. 4.3 plots the simulated efficiencies as a function of the diffraction order. Relative to the calculated efficiencies, the variances increase significantly for higher diffraction orders. Furthermore, doubling the line roughness leads to a disproportionate growth of the variances of the efficiencies. The mean efficiencies over all samples are depicted as diamonds in Fig. 4.3.
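The setup parameters follow from a few lines of arithmetic. The sketch below uses hypothetical names throughout. It recovers the top CD from L:S = top CD/(d − top CD), the super cell period P = N·d, and the two roughness levels. It also shows one possible way to apply the $10^{-3}$ % efficiency cutoff. The efficiencies in the usage line are placeholders, not simulation results.

```python
import numpy as np

d = 280.0                # period of the unperturbed line-space structure in nm
ls_ratio = 0.5           # L:S = top CD / (d - top CD)
n_lines = 24             # absorber lines per super cell

top_cd = ls_ratio * d / (1 + ls_ratio)      # solves top_cd = ls_ratio * (d - top_cd)
super_period_um = n_lines * d / 1000.0      # P = N * d, converted to micrometres
sigmas_nm = [0.01 * d, 0.02 * d]            # 1 % and 2 % of the period

def keep_significant_orders(orders, efficiencies_percent, threshold_percent=1e-3):
    """Drop diffraction orders whose efficiency (given in %) is below the threshold."""
    orders = np.asarray(orders)
    efficiencies_percent = np.asarray(efficiencies_percent)
    mask = efficiencies_percent >= threshold_percent
    return orders[mask], efficiencies_percent[mask]

# Illustrative placeholder efficiencies, not simulation results.
kept_orders, kept_eff = keep_significant_orders(
    [-2, -1, 0, 1, 2], [5e-4, 0.2, 1.5, 0.3, 2e-3])
```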
Figure 4.3: Simulated diffraction patterns for randomly perturbed line-space structures at a wavelength of 13.389 nm; blue circles depict the calculated efficiencies and diamonds the mean efficiencies of all 1,000 samples; two different random perturbations of the center positions and the widths (LEWR): (a) $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 2.8$ nm and (b) $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.6$ nm. [Plots: efficiencies/% (logarithmic) vs. diffraction orders (λ = 13.389 nm); panel titles "LEWR simulations; σ = 2.8 nm; 24 lines" and "LEWR simulations; σ = 5.6 nm; 24 lines".]

Figure 4.4: Normalized deviations (REF − SIM)/REF from the efficiencies of the unperturbed reference line structure, depicted as circles; diamonds represent the mean over the deviations of all 1,000 samples; dashed lines indicate the mean ± standard deviation of the normalized deviations; two different random perturbations of the center positions and the widths (LEWR): (a) $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 2.8$ nm and (b) $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.6$ nm. [Plots: normalized deviations vs. diffraction order; panels for σ = 2.8 nm and σ = 5.6 nm, 24 lines.]
Figure 4.5: Standard deviations relative to the mean perturbed efficiencies, shown as circles for the two examples of Fig. 4.4; diamonds depict approximations by the exponential function $1 - \exp(-\sigma_r^2 k_j^2/3)$ with $\sigma_r = 3.09$ nm (for $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 2.80$ nm) and $\sigma_r = 6.09$ nm (for $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.60$ nm). [Plot: normalized standard deviation vs. diffraction order.]

A systematic nonlinear decrease of the mean efficiencies for higher diffraction orders, along with increasing variances, is observed for both degrees of roughness. Consider the deviations between the unperturbed efficiencies and the mean perturbed efficiencies, normalized to the unperturbed values: these are always greater than zero. Figure 4.4 reveals this by comparing with, and normalizing by, the reference efficiencies obtained from the unperturbed line-space structure. The systematic decrease can be approximated by an exponential function as follows. We assume that the general aperiodic perturbation in the sense of the applied LEWR model (cf. Fig. 4.2) can be characterized by a roughness parameter $\sigma_r$. This parameter scales with the imposed perturbation $\sigma_{\mathrm{edge}} = \sqrt{\sigma_{x_i}^2 + \sigma_{\mathrm{CD}_i}^2/4}$ of the given grating samples, such that $\sigma_r = \alpha \cdot \sigma_{\mathrm{edge}}$. The mean normalized deviations relative to the references can then be approximated by
$$\frac{f_{j,\mathrm{ref}}(p) - f_{j,\mathrm{pert}}}{f_{j,\mathrm{ref}}(p)} \approx 1 - \exp\left(-\sigma_r^2 k_j^2\right) = 1 - \exp\left(-(\alpha\,\sigma_{\mathrm{edge}})^2 k_j^2\right), \tag{4.1}$$
with $\sigma_r = 3.09$ nm ($\alpha = 0.99$) and $\sigma_r = 6.09$ nm ($\alpha = 0.97$). The diffraction order $n_j$ is expressed by the corresponding x-component of the wavevector of the propagating plane wave mode for incidence angle $\theta_{\mathrm{inc}} = 0^\circ$ (cf. [27]), i.e., $k_j = 2\pi n_j/d$.
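As a minimal sketch of Eq. (4.1), the following Python code evaluates the approximated mean normalized deviation. It also recomputes α = σ_r/σ_edge from the fitted values σ_r = 3.09 nm and 6.09 nm. The helper `normalized_deviation_stats` is a hypothetical illustration of how the per-order statistics plotted in Fig. 4.4 could be computed from an array of sampled efficiencies; it is not taken from the original work.

```python
import numpy as np

def sigma_edge(sigma_x, sigma_cd):
    """Standard deviation of a single line edge under the LEWR model."""
    return np.sqrt(sigma_x**2 + sigma_cd**2 / 4)

def wavevector_x(n, d=280.0):
    """x-component k_j = 2*pi*n_j/d of diffraction order n_j at normal incidence (1/nm)."""
    return 2 * np.pi * np.asarray(n, dtype=float) / d

def mean_normalized_deviation(n, sigma_r, d=280.0):
    """Approximation (4.1) of (f_ref - f_pert) / f_ref for diffraction order(s) n."""
    return 1 - np.exp(-(sigma_r * wavevector_x(n, d)) ** 2)

def normalized_deviation_stats(f_ref, f_samples):
    """Per-order mean and standard deviation of (f_ref - f_sample) / f_ref.

    f_ref: shape (m,) unperturbed efficiencies; f_samples: shape (n_samples, m).
    """
    f_ref = np.asarray(f_ref, dtype=float)
    dev = (f_ref - np.asarray(f_samples, dtype=float)) / f_ref
    return dev.mean(axis=0), dev.std(axis=0)

# alpha = sigma_r / sigma_edge for the two LEWR scenarios and their fitted sigma_r values.
alphas = [sigma_r / sigma_edge(s, s) for s, sigma_r in [(2.8, 3.09), (5.6, 6.09)]]
orders = np.arange(-8, 9)
damping_curve = mean_normalized_deviation(orders, sigma_r=6.09)
```

The computed values round to α = 0.99 and α = 0.97, as quoted above. The damping curve is zero for the specular order and symmetric in $n_j$.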
Equation (4.1) implies that random perturbations of line and space widths cause an exponential damping of the mean efficiencies, similar to a Debye-Waller factor [19, 72]. The exponent is proportional to the product of the squared diffraction order $n_j^2$ and a constant $\sigma_r^2$ that approximates the variance of the line centers and widths. These outcomes confirm the validity of the main formula derived by Kato and Scholze [39] using the Fraunhofer approximation.

The increase of the variances of the efficiencies with higher diffraction orders also becomes very clear. For the two given examples of LEWR perturbations, Fig. 4.5 depicts the standard deviations of the efficiencies relative to their mean values. They can also be approximated by an exponential function, $1 - \exp(-\sigma_r^2 k_j^2/3)$, whose exponent is weighted by 1/3 compared to Eq. (4.1). This approximation uses the same values $\sigma_r = 3.09$ nm ($\alpha = 0.99$) and $\sigma_r = 6.09$ nm ($\alpha = 0.97$), with $\sigma_r = \alpha \cdot \sigma_{\mathrm{edge}}$, that characterize the damping of the mean efficiencies.

The most important consequence of these numerical experiments is that the revealed LER/LWR bias has to be included in the model by an order-dependent damping factor. The representation of the measurements and their associated variances given in Section 3.1 thus extends at least to
$$y_j = \exp\left(-\sigma_r^2 k_j^2\right) f_j(p) + \epsilon_j, \tag{4.2}$$
$$\sigma_j^2 \approx \left(a \exp\left(-\sigma_r^2 k_j^2\right) f_j(p)\right)^2 + b^2. \tag{4.3}$$
However, the variances of the efficiencies due to LEWR, i.e., the intensity fluctuations around the damped efficiencies, are not considered here. Real EUV measurements are taken from surface areas with a measurement spot size of about 500 µm × 500 µm. For these, we expect the LEWR contribution to $\sigma_j^2$ to be significantly reduced by spatial averaging compared to the values calculated in our investigations.
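The extended measurement model of Eqs. (4.2) and (4.3) can be sketched as follows. The Gaussian distribution of $\epsilon_j$, the function names, and the concrete values of a and b in the usage lines are assumptions for illustration. The function `lewr_relative_std` encodes the empirical 1/3-weighted fit of Fig. 4.5. Eq. (4.3) does not contain this LEWR fluctuation term.

```python
import numpy as np

def wavevector_x(n, d):
    """k_j = 2*pi*n_j/d for diffraction order(s) n (units 1/nm if d is in nm)."""
    return 2 * np.pi * np.asarray(n, dtype=float) / d

def damping_factor(n, sigma_r, d):
    """Order-dependent damping exp(-sigma_r^2 k_j^2)."""
    return np.exp(-(sigma_r * wavevector_x(n, d)) ** 2)

def noise_variance(f, n, sigma_r, d, a, b):
    """Variance model (4.3): (a * exp(-sigma_r^2 k_j^2) f_j)^2 + b^2."""
    return (a * damping_factor(n, sigma_r, d) * np.asarray(f, dtype=float)) ** 2 + b**2

def simulate_measurement(f, n, sigma_r, d, a, b, rng):
    """Measurement model (4.2), assuming independent eps_j ~ N(0, sigma_j^2) per order."""
    mean = damping_factor(n, sigma_r, d) * np.asarray(f, dtype=float)
    return mean + rng.normal(0.0, np.sqrt(noise_variance(f, n, sigma_r, d, a, b)))

def lewr_relative_std(n, sigma_r, d):
    """Empirical fit of Fig. 4.5: relative standard deviation ~ 1 - exp(-sigma_r^2 k_j^2 / 3)."""
    return 1 - np.exp(-(sigma_r * wavevector_x(n, d)) ** 2 / 3)

# Illustrative call with placeholder efficiencies and hypothetical noise parameters.
rng = np.random.default_rng(seed=2)
orders = np.arange(-3, 4)
f_sim = np.full(orders.shape, 0.5)          # placeholder undamped efficiencies f_j(p)
y = simulate_measurement(f_sim, orders, sigma_r=3.09, d=280.0, a=0.02, b=1e-4, rng=rng)
```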
To estimate how the variances depend on spatial averaging, i.e., on the number of lines in the super cell, the above calculations were repeated for a super cell consisting of N = 48 lines, which gives a period of 13.44 µm. Owing to the computational costs, only 100 diffraction patterns were calculated. It turns out that the order-dependent damping is similar to the case of 24 lines. The associated variances are scaled by a factor of 24/N = 1/2 compared to those calculated for a super cell consisting of 24 lines. Consider N = 1,785, i.e., a virtual extension of our FEM super cell period to 499.8 µm (= 0.280 µm · 1,785), which is approximately the size of the probed area in EUV scatterometry. For this case the normalized standard deviations of the efficiencies would be scaled down by a factor of about $\sqrt{24/1{,}785} \approx 0.116$ compared to the values given in Fig. 4.5 obtained with 24 lines. For higher diffraction orders, LEWR-based fluctuations of the efficiencies of several percent remain. These correspond to variances as in the first term on the right-hand side of Eq. (4.3), with typical values of the factor a in the range of 0.01–0.03 (cf. Section 3.1). However, we will see in Section 4.2 that the contribution of multilayer variations to the variances is about an order of magnitude higher than that of LEWR. Therefore, for the sake of simplicity, we only take into account the bias caused by the order-dependent damping factor and neglect the slightly increased variances due to LEWR. This leads to a modified model function that, for a given wavelength and a given angle of incidence, depends on the geometric parameters p and on the general aperiodic perturbation $\sigma_r$:
$$f: \mathbb{R}^n \to \mathbb{R}^m, \quad (p, \sigma_r) \mapsto f(p, \sigma_r) = \left(\exp\left(-\sigma_r^2 k_j^2\right) f_j(p)\right)_{j=1}^{m}, \tag{4.4}$$
with $k_j = 2\pi n_j/d$, $j \in \{1, \dots, m\}$.

The presented results were obtained for a typical EUV measurement setup. This setup has a fixed angle of incidence, and owing to the special design of the EUV mask its diffraction patterns consist only of reflected modes. Similar investigations were additionally performed for the DUV measurement setup of the MoSi mask. It turned out that the damping effect also occurs for transmitted modes and that it is independent of the angle of incidence of the light source; cf. Figs. 4.6–4.10.
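The averaging argument and the modified model function (4.4) can be illustrated with a short sketch. Standard deviations scale with $\sqrt{24/N}$ because the variances scale with 24/N. The function `make_damped_model` wraps a hypothetical undamped forward model (for example, a wrapper around an FEM solver) so that $\sigma_r$ becomes an additional input. All names and the toy base model are assumptions.

```python
import numpy as np

def std_scaling(n_lines, n_ref=24):
    """Scale factor for standard deviations when averaging over n_lines instead of n_ref lines.

    Variances scale with n_ref / n_lines, so standard deviations scale with its square root.
    """
    return np.sqrt(n_ref / n_lines)

d_um = 0.280
n_spot = 1785
spot_period_um = d_um * n_spot        # virtual super cell period, about 499.8 um
spot_factor = std_scaling(n_spot)     # about 0.116

def make_damped_model(base_model, orders, d):
    """Wrap an undamped model p -> (f_1(p), ..., f_m(p)) into f(p, sigma_r) of Eq. (4.4).

    base_model is a hypothetical callable, e.g. a wrapper around an FEM solver.
    """
    k = 2 * np.pi * np.asarray(orders, dtype=float) / d
    def model(p, sigma_r):
        return np.exp(-(sigma_r * k) ** 2) * np.asarray(base_model(p), dtype=float)
    return model

# Toy base model for illustration only: constant efficiencies for orders -2..2.
toy = make_damped_model(lambda p: np.ones(5), orders=np.arange(-2, 3), d=280.0)
```

This design keeps the expensive rigorous solver unchanged and applies the damping as a cheap post-processing step. In a reconstruction, $\sigma_r$ can therefore be treated as one more parameter alongside p.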
Figure 4.6: Effect of LEWR on the transmitted modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\mathrm{inc}} = -43.6^\circ$ and $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.6$ nm for 100 samples. [Plots: efficiencies/% (logarithmic) and normalized deviations vs. diffraction order.]

Figure 4.7: Effect of LEWR on the transmitted modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\mathrm{inc}} = 0^\circ$ and $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.6$ nm for 100 samples. [Plots: efficiencies/% (logarithmic) and normalized deviations vs. diffraction order.]
Figure 4.8: Effect of LEWR on the reflected modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\mathrm{inc}} = -43.6^\circ$ and $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.6$ nm for 100 samples. [Plots: efficiencies/% (logarithmic) and normalized deviations vs. diffraction order.]

Figure 4.9: Effect of LEWR on the reflected modes for the MoSi mask at a wavelength of 193 nm, incident angle $\theta_{\mathrm{inc}} = 0^\circ$ and $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.6$ nm for 100 samples. [Plots: efficiencies/% (logarithmic) and normalized deviations vs. diffraction order.]
Figure 4.10: Normalized deviations from the transmitted efficiencies of the unperturbed reference line structure for an incident angle $\theta_{\mathrm{inc}} = -43.6^\circ$ and $\sigma_{x_i} = \sigma_{\mathrm{CD}_i} = 5.6$ nm; squares depict the mean over all 100 samples (FEM results), the solid line depicts the approximation $1 - \exp(-\sigma_r^2 k_j^2)$ from Eq. (4.1) with $\sigma_r = 6.26$ nm. [Plot: normalized deviations vs. diffraction order.]

4.2 Multilayer System Variations

The construction of the measured EUV photomask itself is another source of systematic errors. To improve its reflectivity, the mask is set atop a multilayer system (MLS) that serves as a Bragg mirror [11, 46]. This MLS consists of 49 periodically repeated groups of four layers. Each group consists of a molybdenum (Mo) layer and a silicon (Si) layer separated by two interdiffusion MoSi layers. Two absorbing layers (SiO2, Si) lie on top of the MLS (see Fig. 2.5). The layer heights are usually fixed during the optimization process. However, incomplete knowledge of these heights can cause errors in the reconstruction. The sensitivity of the reconstruction with respect to changes in these parameters has already been investigated in [30] using a simple least squares approach to the inverse problem. In this section we concentrate on the direct impact of changes in the MLS on the simulated diffraction pattern. For the sake of simplicity we only investigate the influence of three parameters. The first two are the heights of the first and second absorbing layers, $h_{fc}$ and $h_{sc}$, respectively. The third is a parameter $\kappa$, a scaling factor by which the heights of all layers in the 49 periodically repeated groups
of MoSi layers are simultaneously scaled. These parameters are summarized in the vector $\nu = (h_{fc}, h_{sc}, \kappa)$.

We use two different line-to-space ratios in this investigation. For the first one, a top CD of 93 nm and a period of 280 nm are chosen, resulting in an L:S of 1:2. The second one has a top CD of 540 nm and a period of 720 nm, leading to an L:S of 3:1. For the investigation, 1,000 random samples of the MLS parameters are drawn. The corresponding diffraction patterns are then calculated for a wavelength of 13.4 nm and an incident angle of 6°. The three parameters are chosen to be independent and normally distributed, with the means and standard deviations given in Tab. 4.2. The remaining geometric parameters and simulation conditions are the same in all cases.

Parameter     µ          σ
$h_{fc}$      1.2 nm     0.01 nm
$h_{sc}$      12.9 nm    0.5 nm
$\kappa$      1          $10^{-3}$

Table 4.2: Means and standard deviations of the three MLS parameters.

The resulting distributions of efficiencies for both line-to-space ratios are shown in Fig. 4.11. Even though the variations in the MLS are small, the resulting perturbation of the simulated diffraction pattern is quite strong. As one might expect, the overall variance of the efficiencies increases as the line-to-space ratio decreases, since a lower L:S means that more radiation is reflected. In the present case the efficiencies for the mask with an L:S of 1:2 are about an order of magnitude higher than those for the mask with an L:S of 3:1. Hence, variations of the MLS have a stronger impact on the efficiencies of the 1:2 mask. For the mask with an L:S of 1:2, the variation is about 15% on average and can be as high as 300% of the unperturbed value. For the mask with an L:S of 3:1, it is about 12% on average and at most 80% of the unperturbed value. However, in contrast to LER/LWR, the effect of MLS variations on the diffraction pattern cannot be described by a suitable analytic formula.
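The sampling of the MLS parameters can be sketched as below. The means and standard deviations are those of Tab. 4.2. Two things are assumptions: the assignment of the last four rows of Tab. 4.1 to one repeated group (MoSi, Mo, MoSi, Si), and all function names. The sketch stops at the layer heights, since the diffraction patterns themselves require a rigorous solver.

```python
import numpy as np

# Means and standard deviations of nu = (h_fc, h_sc, kappa) from Tab. 4.2
# (heights in nm, kappa dimensionless).
MLS_PARAMS = {"h_fc": (1.2, 0.01), "h_sc": (12.9, 0.5), "kappa": (1.0, 1e-3)}

# Assumption: nominal heights (nm) of one repeated group, taken from the last four
# rows of Tab. 4.1 in the order MoSi, Mo, MoSi, Si.
GROUP_HEIGHTS = np.array([0.147, 2.141, 1.972, 2.838])

def sample_mls(n_samples, rng):
    """Draw independent normal samples of nu; returns an array of shape (n_samples, 3)."""
    return np.column_stack([rng.normal(mu, sd, n_samples) for mu, sd in MLS_PARAMS.values()])

def layer_stack(nu, n_groups=49):
    """Layer heights from top to bottom: two top layers, then n_groups groups scaled by kappa."""
    h_fc, h_sc, kappa = nu
    return np.concatenate([[h_fc, h_sc], np.tile(kappa * GROUP_HEIGHTS, n_groups)])

rng = np.random.default_rng(seed=0)
samples = sample_mls(1000, rng)
stacks = np.array([layer_stack(nu) for nu in samples])  # shape (1000, 2 + 4 * 49)
```

A single scaling factor κ keeps the ratios of the layer heights within each group fixed. Because the same κ applies to all 49 groups, it changes the Bragg period of the whole stack coherently.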
Figure 4.11: Simulated diffraction patterns for randomly perturbed MLS (left panels) and normalized deviations from the efficiencies of the unperturbed MLS (right panels); diamond symbols represent the mean over all samples; dashed lines indicate the mean ± standard deviation. The top panels show the effect for a period d = 280 nm and an L:S of 1:2, the bottom panels for a period of 720 nm and an L:S of 3:1. [Plots: efficiencies/% (logarithmic) and normalized deviations vs. diffraction order.]