In this talk, Mark Baker (CSL) shows how CSL Behring integrates and analyzes data from multiple manufacturing sites, using Apache NiFi to feed a central Hadoop data lake at CSL Behring.
The challenge of merging data from disparate systems has been a leading driver behind investments in data warehousing systems as well as in Hadoop. While data warehousing solutions are purpose-built for RDBMS integration, Hadoop adds the benefits of economical scale, not to mention the variety of structured and unstructured formats it can handle. Whether using a data warehouse, Hadoop, or both, physical data movement and consolidation is the primary method of integration.
There may also be challenges in synchronizing rapidly changing data from a system of record to a consolidated Hadoop platform.
This introduces the need for “data federation”, where data is integrated without being copied between systems.
For historical/batch use cases, data is replicated from remote data hubs into a central data lake using Apache NiFi.
We will demo data analysis with Apache Zeppelin using Apache Spark and Apache Hive.
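As a sketch of the kind of Zeppelin analysis the demo covers, a notebook paragraph might run a Hive query over the consolidated batch data. The table and column names below are illustrative assumptions, not from the talk:

```sql
-- Illustrative only: average batch-step duration per manufacturing site,
-- over a hypothetical Hive table populated by the NiFi flows
SELECT site, step_name, AVG(duration_seconds) AS avg_duration_seconds
FROM mes_batch_steps
GROUP BY site, step_name
ORDER BY site, step_name;
```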
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache NiFi and Apache Zeppelin to a central Hadoop data lake at CSL Behring
1. INGELA VIKSTROM, ANABEL SILVA, SANDRO PRATO
CSL Bio21 Research Scientists
Australia
MARK BAKER
Head of Big Data Infrastructure
CSL Behring
ANALYZING DATA FROM MULTIPLE
MANUFACTURING SITES USING A CENTRAL
HADOOP DATA LAKE
2. Outline
• CSL Behring
– Introduction of CSL Behring
• CSL Behring’s products and focus
• Growth and global placement of manufacturing facilities
• Current PACE globalization initiative
• Streamlining global processes to improve efficiency
• Partnership with Hortonworks to create our Big Data
Platform
• HDP for Data lake and analytics using Zeppelin
• HDF for secure data movement from global manufacturing sites to
our central data repositories, SAP HANA & HDP
• Q & A
3. CSL Behring’s Products and Focus
• CSL Behring
– CSL Behring is a global biotherapeutics leader
– Focused on serving patients’ needs by using the latest
technologies
• Deliver innovative therapies that are used to treat rare
and serious conditions.
– One of our “super orphan” therapies treats a condition affecting
approximately 300 patients in the U.S. and only one million
worldwide. To meet growing demand and bring more therapies to
more patients, we continue to invest in the expansion of all our
manufacturing facilities.
4. Business Driver
PACE globalization initiative
• PACE is a global transformation initiative that fulfills our
promise to patients by aligning our processes and
enhancing collaboration to achieve sustainable business
excellence
• Provide advanced analytics capabilities to exploit
existing and new data assets, support decision-making,
and provide predictive models
• Build user community with the right skills and right tools
5. Global Manufacturing Facilities
• Manufacturing Sites
• United States
– Kankakee
• Germany
– Marburg
• Switzerland
– Bern
• Australia
– Melbourne
• Historically separated by region and operated
independently
8. Challenges
• Each manufacturing system uses a different backend
database and schema to log the batch execution steps
– 12 x SCADA and MES systems
• Edge servers must not impact MES system performance
– Sensitive systems required impact assessment prior to direct
data extracts
• Data must be encrypted in motion and at rest
– HIPAA compliance and EU privacy requirements
• Data must be compressed over the WAN
– Due to bandwidth constraints on the intranet
• Multiple time zones and string encodings
9. NiFi
• Allows the creation of custom processors for each MES
system (Python).
• Uses back pressure to avoid full database pulls after
network/hardware outages.
• Encrypts data over the wire.
• Compresses data over the wire.
• Allows data enrichment, e.g. the addition of a UTC column.
• ETL functionality allows special characters (e.g. ṏ) to be
transformed into data that analytical tools can process.
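A minimal sketch of the last two bullets — adding a UTC column and normalizing special characters per record. In NiFi this logic would sit inside a custom Python/ExecuteScript processor; the record shape and field names here are illustrative assumptions, not CSL Behring's actual schema:

```python
# Sketch only: per-record enrichment as a plain Python function.
# Assumes each MES record arrives as a dict with a site-local
# ISO-8601 timestamp field (field name "logged_at" is hypothetical).
import unicodedata
from datetime import datetime, timezone

def enrich(record, local_ts_field="logged_at", site_tz=timezone.utc):
    """Add a UTC timestamp column and normalize special characters."""
    out = dict(record)
    # Data enrichment: derive a UTC column from the site-local timestamp,
    # handling sites in multiple time zones
    local = datetime.fromisoformat(out[local_ts_field])
    if local.tzinfo is None:
        local = local.replace(tzinfo=site_tz)
    out["logged_at_utc"] = local.astimezone(timezone.utc).isoformat()
    # Decompose accented characters (e.g. 'ṏ') and drop the combining
    # marks so downstream analytical tools receive plain ASCII
    for key, value in out.items():
        if isinstance(value, str):
            decomposed = unicodedata.normalize("NFKD", value)
            out[key] = "".join(
                ch for ch in decomposed if not unicodedata.combining(ch)
            )
    return out
```

In a real flow, back pressure and wire-level encryption/compression are handled by NiFi itself (connection back-pressure thresholds and site-to-site TLS/compression settings), not by the processor code.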