Date: 14th November 2018
Location: Fast Data Theatre
Time: 14:30 - 15:00
Speaker: Neil Condon
Organisation: Edwards Vacuum
About: Semiconductor fabrication plants build devices with billions of components, each 1000x smaller than a human hair, with some features that are only a few atoms across. They are the most highly automated manufacturing environments in the world. In large fabs, time-series sensor data alone tops 50 TB/day, and combining that data with subject-matter expertise fast enough to keep the automated production equipment functioning is a major and growing challenge. The level of collaboration required to build high-performance real-time analytics, combined with the IP-sensitive nature of the data, results in a unique DataOps environment, where the use of Cloud resources serves to complicate rather than simplify the value equation. We’ll explore some of the challenges, and discuss the attributes of a PaaS that could help the industry tackle its fast-data challenges.
Big Data LDN 2018: USING FAST-DATA TO MAKE SEMICONDUCTORS
1. USING FAST DATA TO MAKE… SEMICONDUCTORS
Neil Condon | 14 November 2018
2. ABOUT ME
§ Neil Condon (neil.condon@edwardsvacuum.com)
§ 15+ years working alongside leading edge semiconductor players
§ Responsible for understanding future customer needs
§ Advocate for the data-driven business
– Technology and culture
– New, valuable products and services
§ Contributor to the IRDS: Technology road-mapping
– Factory Integration
§ Big data combined with subject-matter expertise
§ Data and IP security
3. ABOUT MY COMPANY
§ Global critical subsystems supplier to the industry
– Vacuum equipment, for low-pressure processing, of all kinds
– Gas treatment equipment, for safety and environmental stewardship
§ Wholly owned by Atlas Copco, since 2014 (~46,000 employees; ~US$12.7bn)
§ 6300 employees, globally
§ US$2.15bn revenue (2017)
§ Headquartered, with core R&D in the UK
§ Manufacturing in US, UK, Korea, China
§ Customers all over the world…
4. OUTLINE
§ Background to Semiconductor Manufacturing
– Themes that define the industry
§ Role of Fast Data in Semi Manufacturing
– What we know, and what’s missing
§ Example: semi-supervised learning from time-series
§ “Big Data” Semi Roadmap Challenges
6. A SENSE OF SCALE, IN SEMI…
x 300,000 /month
>100 sensors @ 10 Hz
>1000x 1 min steps
>1.1 GB / wafer
>11 TB / day
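The figures on this slide can be cross-checked with simple arithmetic. The sketch below treats each lower bound as an exact value and assumes a 30-day month and decimal units (1 TB = 1000 GB); those assumptions are not from the slide.

```python
# Back-of-envelope check of the slide's figures.
# Assumptions (not from the slide): 30-day month, decimal units,
# and each ">" lower bound treated as an exact value.
SENSORS = 100              # ">100 sensors"
SAMPLE_RATE_HZ = 10        # "@ 10 Hz"
STEPS = 1000               # ">1000x ... steps"
STEP_SECONDS = 60          # "1 min steps"
WAFERS_PER_MONTH = 300_000
GB_PER_WAFER = 1.1
DAYS_PER_MONTH = 30

# Number of sensor readings collected over the life of one wafer.
samples_per_wafer = SENSORS * SAMPLE_RATE_HZ * STEPS * STEP_SECONDS

# Storage per reading implied by the per-wafer figure.
bytes_per_sample = GB_PER_WAFER * 1e9 / samples_per_wafer

# Daily volume implied by the per-wafer figure and the wafer rate.
wafers_per_day = WAFERS_PER_MONTH / DAYS_PER_MONTH
tb_per_day = wafers_per_day * GB_PER_WAFER / 1000

print(f"{samples_per_wafer:,} samples per wafer")
print(f"{bytes_per_sample:.1f} bytes per sample")
print(f"{tb_per_day:.1f} TB per day")
```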
7. COMPLEX AUTOMATED TRANSPORT & ROUTING
[Diagram: Product X and Product Y routed through Operations A to E]
8. A COMPLEX SUPPLY CHAIN
[Diagram: Fabless firms → Designs → Fabs & Foundries → Completed wafers → Dicing & test → Functioning “die” (binned by performance) → Packaging → Packaged “chips” → Application → “Final” Product]
[Other labels on the diagram: Process design rules; Device/location test results; Performance distribution; Equipment; Process technology]
9. 2017 - 20XX, SEMI MARKET SECTOR DYNAMICS
§ Sector Revenue Growth
– CAGR (1985-2012) 10.1%
– CAGR (2013-2018) 4.3%*
§ Sector Revenue Drivers
– Since 2013: Tablet+Gaming shrinking; Automotive+IoT now significant
§ Sector Cost Growth
– Fab Costs ↑ 168%
– Process Development Costs ↑ 225%
– Chip Design Costs ↑ 341%
§ Sector Cost Drivers
– Increasing complexity + need for SW/HW co-development
§ 2017 - 20XX: Operational Challenges for the Semiconductor Industry
Sources: PricewaterhouseCoopers; Morgan Stanley (2017); Intl. Business Strategies (2017); IC Insights (2018)
10. OUTLINE
§ Background to Semiconductor Manufacturing
– Themes that define the industry
§ Role of Fast Data in Semi Manufacturing
– What we know, and what’s missing
§ Example: semi-supervised learning from time-series
§ “Big Data” Semi Roadmap Challenges
15. THE SEMICONDUCTOR STORY…
§ 1000+ complex processes, in sequence
– If all are executed correctly, we get functioning products.
– If not – what went wrong?
§ It’s dark, and the torch batteries are fading…
– More and more of the information that would confirm we’re on track isn’t available.
– It’s expensive and slow to actually look at the wafer.
§ We can’t see where we’ve got to, so we “listen” for other indications
– We try to do all the right things well, in the right order
– We “listen” to the production equipment, for noises that confirm that all is well
§ Yield Management (YM) / Fault Detection & Classification (FDC) tools help us to identify problems
– We use what we see as feedback/feed-forward to the factory automation
§ If these are inadequate, or mislead us, things can get messy…
16. HOW DO WE TRACK WHAT’S GOING ON?
§ Post-process inspection is slow
§ Inspection equipment is $$$
[Diagram labels: iall, i1, i2, i3]
17. OUTLINE
§ Background to Semiconductor Manufacturing
– Themes that define the industry
§ Role of Fast Data in Semi Manufacturing
– What we know, and what’s missing
§ Example: semi-supervised learning from time-series
§ “Big Data” Semi Roadmap Challenges
18. THE “FDC” PARADIGM
§ FDC engineers evaluate variability by monitoring time-series traces:
spike
slope
plateau
damped oscillation
exponential decay
“control plan” = { parametric features }
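As a rough illustration of a control plan as a set of parametric features, the sketch below extracts three features (spike, ramp slope, plateau level) from a synthetic trace and checks them against limits. The function names, the feature definitions and the limit values are all hypothetical; real FDC tools define features in their own ways.

```python
from statistics import mean, median


def slope(values, dt):
    """Least-squares slope of evenly spaced samples, in units per second."""
    n = len(values)
    t_mean = (n - 1) * dt / 2
    v_mean = mean(values)
    num = sum((i * dt - t_mean) * (v - v_mean) for i, v in enumerate(values))
    den = sum((i * dt - t_mean) ** 2 for i in range(n))
    return num / den


def trace_features(values, dt, plateau_window):
    """Illustrative features: the last `plateau_window` samples are the plateau."""
    centre = median(values)
    return {
        "spike": max(abs(v - centre) for v in values),   # largest excursion
        "ramp_slope": slope(values[:-plateau_window], dt),
        "plateau": mean(values[-plateau_window:]),
    }


def violations(features, control_plan):
    """Names of features that fall outside their (low, high) limits."""
    return [name for name, (lo, hi) in control_plan.items()
            if not lo <= features[name] <= hi]


# Synthetic 10 Hz trace: a 2 s linear ramp, then a 3 s plateau at 10.0.
dt = 0.1
trace = [0.5 * i for i in range(20)] + [10.0] * 30

# Hypothetical control plan: limits per parametric feature.
plan = {"spike": (0.0, 12.0), "ramp_slope": (4.0, 6.0), "plateau": (9.5, 10.5)}

feats = trace_features(trace, dt, plateau_window=30)
print(feats, violations(feats, plan))
```

A plan of this kind is tied to the recipe it was written for: change the ramp rate or plateau level and the limits need revising.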
19. THE “FDC” PARADIGM
§ FDC engineers evaluate variability by monitoring time-series traces:
Contextual triggers: e.g. signals from tools, or the MES
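One way to picture contextual triggers is as events that open and close the window of a trace to be analysed. The sketch below is a minimal illustration only; the event names `STEP_START` and `STEP_END` and the tuple layouts are assumptions, not actual tool or MES messages.

```python
def windows(samples, events, start="STEP_START", end="STEP_END"):
    """Yield the samples that fall between each start/end trigger pair.

    samples: list of (timestamp, value); events: list of (timestamp, name).
    The event names are hypothetical placeholders.
    """
    opened = None
    for t, name in sorted(events):
        if name == start:
            opened = t
        elif name == end and opened is not None:
            # Half-open window: includes the start time, excludes the end time.
            yield [(ts, v) for ts, v in samples if opened <= ts < t]
            opened = None


# Ten samples at integer timestamps, and two triggered process steps.
samples = [(t, float(t)) for t in range(10)]
events = [(2, "STEP_START"), (5, "STEP_END"), (7, "STEP_START"), (9, "STEP_END")]

for w in windows(samples, events):
    print(w)
```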
20. COMMON BUGBEARS OF FDC AT SCALE
Lots of process recipes, particularly in a Foundry context
Recipes evolve in R2R control paradigm
Recipes adjusted in CIP for throughput or yield
Control Plan CIP, to improve fault detection performance
Different sites develop different CPs for the same process
…
Desirable:
§ Faster creation of Control Plans
– semi-supervised machine learning?
§ “Elastic” Control Plans
– plans that adapt and “stretch” to programmed recipe variation, without explicit revision?
§ New methods of variability measurement, which are “schema-lite”
– full sensor trace analytics?
22. MOTIF DISCOVERY - TECHNICAL CHALLENGES
Why is this hard?
§ A priori, the number of motifs is unknown
– Many general-purpose clustering techniques, e.g. K-means, can’t help
§ The length of motifs is unknown, and varies (e.g. R2R)
– You may have to try sub-sequences of all lengths, and compare ones of different lengths
§ Distance computation (measuring similarity) is expensive
– Euclidean distance – O(N); Dynamic Time Warping – O(N²), in general
§ In a real-time context it’s hard to summarise data to simplify the problem
– Many techniques (e.g. normalisation) rely on statistics of the whole TS, which we don’t have in real-time
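To make the cost difference between the two distance measures concrete, here is a minimal sketch of both: Euclidean distance is a single pass over equal-length sequences, while the textbook dynamic-time-warping recurrence fills a table with one cell per pair of points. This is the plain unconstrained form of DTW, not necessarily the variant used in the work described here.

```python
import math


def euclidean(a, b):
    """O(N) distance; only defined for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained dynamic time warping, O(len(a) * len(b)) time.

    Keeps only the previous row of the cost table, so memory is O(len(b)).
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row 0: only the empty match costs 0
    for x in a:
        curr = [inf]                        # column 0 is unreachable after row 0
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            # Cheapest way to reach this cell: step in a, step in both, or step in b.
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return math.sqrt(prev[-1])


# The same bump, shifted by one sample.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(euclidean(a, b), dtw(a, b))
```

Euclidean distance penalises the time shift, whereas DTW can align the two bumps; DTW also accepts sequences of different lengths, which matters when motif length varies.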
23. EFFECTIVE ALGORITHMS ARE COMPLEX
§ ML pipelines use many stages, and leverage domain knowledge in most subroutines
[Flowchart nodes: Real-time data; Subsequence selection; Known motifs & occurrences; Matching algorithm; Domain rules; Last n motifs; Expectation theory; Match? (Y/N); Partial match? (Y/N); Common subsequence discovery; Domain hyper-parameters; Anomaly? (Y/N); Normal; Advisory; Adaptive control; Predictive maintenance algorithms; Product application knowledge (shown twice)]
24. OUTLINE
§ Background to Semiconductor Manufacturing
– Themes that define the industry
§ Role of Fast Data in Semi Manufacturing
– What we know, and what’s missing
§ Example: semi-supervised learning from time-series
§ “Big Data” Semi Roadmap Challenges
26. “BIG DATA” & ANALYTICS REQUIREMENTS
YEAR OF PRODUCTION: 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025
Velocity Big Data Requirements
§ FICS design to support peak equipment data transfer rates (production rate for each variable).
– 10 Hz | 100 Hz | 100 Hz | 1 kHz | 1 kHz | 1 kHz | 1 kHz | >1 kHz
§ FICS factory data transfer rates (Bytes / s) per 1000 tools in fab.
– >2 MHz | >10 MHz | >10 MHz | >16 MHz | >16 MHz | >16 MHz | >16 MHz | >16 MHz
Variety Big Data Requirements
§ Standards to support automatic merging of data stores (Maintenance, Diagnostic output, Trace, Process Control, Yield and Execution Log) across FI space.
– Partial | Partial | Partial | Full | Full | Full | Full | Full
Value Big Data Requirements
§ Enterprise-wide integration of fab and facility data stores: TBD
§ Performance of data I/O to/from the cloud: TBD
§ Data integration up and down the supply-chain: TBD
§ Standards for secure cloud data access: TBD
§ Migration to “Big Data friendly” ecosystems (e.g. Hadoop)
– Used for offline analysis and modelling: Partial | Partial | Partial | Full | Full | Full | Full | Full
– Used for “real-time” online diagnostics and control: None | Minimal | Minimal | Partial | Partial | Partial | Partial | Partial
Legend: Manufacturable solutions exist/are being optimised; Manufacturable solutions are known; Manufacturable solutions are NOT known
Reference: https://irds.ieee.org/roadmap-2017
27. SECURITY ROADMAP
YEAR OF PRODUCTION: 2018 2019 2020 2021 2022 2023 2024 2025
Security for data sharing
§ Classification of data into Proprietary/Licensed/Shared/Public categories and establishment of Data Owners and Licensees
§ Establishment of Distributed Trust mechanisms for Data Owners, Consumers and Autonomous Agents
§ Establishment of non-repudiation, tamper detection, traceability and loss management features for data.
§ Development of industry standards for each category of data in a fab and the standard security level for that category
§ To balance data confidentiality and integrity with availability by partitioning data with IP protections and standardizing data encryption.
§ Adoption of IT standards for identity and access management including human and non-human access,
§ To facilitate central management on user accounts management throughout Fab including production equipment (may include single sign-on as appropriate)
Security for Equipment Operation by the FICS
§ Protection of the equipment's instrumentation and control systems from attack.
§ IP protection capabilities and achieving balance between data availability and IP protection
Security for Big Data and Leveraging Big Data for Security
§ Security protocols in place to support Cloud Computing as a solution for FI systems
§ Application of big data analytics to identifying security issues
Status key: Research required; Development underway; Qualification/pre-production; Continuous improvement; To be determined
Reference: https://irds.ieee.org/roadmap-2017
28. A SECURE PLATFORM THAT…
§ Spans on-premise, multi-premise and multi-cloud
– Allows data to be moved between trusted parties
– Allows cost-effective use of cloud services for analytics etc.
§ Allows access control policy to be administered and applied consistently, wherever the data are
– Land the data in the fab
– Transfer/replicate the data, for a purpose, to an equipment supplier, or the cloud.
– Data owners can rescind access or delete copies of the data, wherever they reside physically
§ Allows analytics to be built and deployed where it’s needed
– Portable, orchestratable, life-cycle managed
§ Captures the valuable output from analytics
– So that it can also be secured
– So that it can be “consumed” in “real-time”, and used in factory automation
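The land / replicate-for-a-purpose / rescind pattern above can be pictured with a toy in-memory model. This is a sketch of the idea only: the `Dataset` class, its methods and the party and location names are hypothetical, and a real platform would enforce the policy with identity management and encryption rather than a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy model: one owner, purpose-bound grants, and tracked replicas."""
    owner: str
    grants: dict = field(default_factory=dict)     # party -> permitted purpose
    replicas: dict = field(default_factory=dict)   # location -> holding party

    def grant(self, party, purpose):
        """Owner allows `party` to use the data for one stated purpose."""
        self.grants[party] = purpose

    def replicate(self, party, location):
        """Record a copy at `location`; only the owner or a grantee may hold one."""
        if party != self.owner and party not in self.grants:
            raise PermissionError(f"{party} has no grant")
        self.replicas[location] = party

    def can_read(self, party, purpose):
        return party == self.owner or self.grants.get(party) == purpose

    def rescind(self, party):
        """Withdraw the grant and drop every replica that party holds."""
        self.grants.pop(party, None)
        for location in [l for l, p in self.replicas.items() if p == party]:
            del self.replicas[location]


# Land the data in the fab, then share it with a supplier for one purpose.
ds = Dataset(owner="fab")
ds.replicate("fab", "fab-on-premise")
ds.grant("supplier", "pump-diagnostics")
ds.replicate("supplier", "supplier-cloud")
print(ds.can_read("supplier", "pump-diagnostics"), sorted(ds.replicas))

# The owner later rescinds access; the supplier's copy goes with it.
ds.rescind("supplier")
print(ds.can_read("supplier", "pump-diagnostics"), sorted(ds.replicas))
```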
[Diagram: “A Fast Data PaaS for Semi?” with elements Multi-premise, Cloud, Identity mgmt, Big Analytics, Small Analytics]