In our presentation, Michaël and I will present our solution to tackle a particular problem. Michaël will focus on explaining the specific requirements of his office, while I will focus on how we solved the problem.
The client wanted to migrate specific data from one visualization platform to another in an automated way. The source platform communicated through a streaming platform of JSON messages, with all new information being added to this stream (about 20k-30k messages daily). Relevant messages needed to be filtered out of this data stream and transformed in a specific way, as defined by an Excel spreadsheet. This spreadsheet went beyond a simple one-to-one transformation: it defined which objects should be migrated and which attributes should be inherited. For instance, it even defined whether an attribute should get its value from an external table based on a certain join condition. This was just one example of the complex mapping we were facing.
Additionally, the spreadsheet needed to be dynamically adjustable, keeping it generic and future-proof, as the source system is in full expansion and offers customizable data models for each of its object classes. This means the Excel mapping sheet for the migration was not yet complete and would be expanded and modified in the future at minimal cost. We solved this problem with a set of five workspaces, all connected to each other through JobSubmitters and automations on FME Server, running on a scheduled basis. The mapping workspace dynamically analyzed which information needed to be migrated, and how, based on the external lookup table. The complex mapping was solved by building custom transformers for the task. In the process, the message queue was also cleaned, and multiple output data products were produced, each fit for the customer's needs, only after running validation tests to allow error handling and appropriate logging.
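The spreadsheet-driven filtering and mapping described above can be sketched in plain Python. Everything here is a hypothetical simplification: the rule table (`MAPPING_RULES`), the message shape, and the attribute names are invented for illustration; the real solution reads the rules from the Excel sheet and runs inside FME workspaces.

```python
import json

# Hypothetical mapping rules, as they might be read from the Excel sheet:
# each row says whether an object class is migrated and how its attributes
# are renamed on the target platform.
MAPPING_RULES = {
    "Valve": {"migrate": True, "attributes": {"diam_mm": "diameter"}},
    "Note":  {"migrate": False, "attributes": {}},
}

def transform(message: str) -> list[dict]:
    """Filter and remap the objects contained in one JSON message."""
    out = []
    for obj in json.loads(message)["objects"]:
        rule = MAPPING_RULES.get(obj["class"])
        if not rule or not rule["migrate"]:
            continue  # class unknown to the sheet, or flagged as not migrated
        mapped = {"class": obj["class"]}
        for src, dst in rule["attributes"].items():
            if src in obj:
                mapped[dst] = obj[src]  # rename attribute per the sheet
        out.append(mapped)
    return out

msg = json.dumps({"objects": [
    {"class": "Valve", "diam_mm": 80},
    {"class": "Note", "text": "ignore me"},
]})
print(transform(msg))  # only the Valve survives, with diam_mm renamed
```

Because the rules live in data rather than in the script, extending the mapping later means editing the sheet, not the workflow.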
5. The Peak of Data Integration 2023
Topic
Using simple spreadsheets to extract specific data from a big data platform.
Advantages?
• User-friendly: separates configuration from scripting
• Transparency: paying attention to functional design
• Future-proof: allows the source and target platforms to scale
• Limited to no adaptations needed to the script
6. Problem
● New business processes shorten the information management cycle
● One source platform replacing x different sources
● Data available through a message queue streaming JSON objects
● About 30k messages/day, highly variable in size
● 1 message contains 1-n objects
● ‘Last one wins’
● Automate ETL to uniformise and shorten data updates
● Allow for efficient future scaling
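The ‘last one wins’ rule above means that when several messages describe the same object, only the newest state matters. A minimal Python sketch, assuming a hypothetical message shape with an `id` per object and messages arriving in chronological order:

```python
def last_one_wins(messages):
    """Keep only the newest state per object id.

    Assumes each message carries a list of objects and that the
    stream is processed in arrival order (hypothetical shape).
    """
    latest = {}
    for msg in messages:
        for obj in msg["objects"]:
            latest[obj["id"]] = obj  # a later message overwrites earlier state
    return latest

stream = [
    {"objects": [{"id": 1, "status": "planned"}]},
    {"objects": [{"id": 1, "status": "built"}, {"id": 2, "status": "planned"}]},
]
result = last_one_wins(stream)  # object 1 ends up as "built"
```

Collapsing the stream this way also keeps the 30k-messages/day volume manageable: downstream steps only see one state per object.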
7. Difficulties
● How to set up a workflow fit for the job?
● Performant and robust in a non-controllable message stream volume
● How to make this generic & future-proof?
● Source and target platforms have different stakeholders and evolve independently
● How to do complex mapping?
● Finding the breakoff point of spreadsheet configuration
8. Difficulties
● How to deploy this on FME Form and different FME Flow environments?
● Creating a portable script on FME Form connecting to the source platform environments
● Scalability?
● Accounting for future scaling in the source as well as target platforms
25. Call to Action
1. Think further than the standard transformers
2. Make your process future-proof by making it as generic as possible
3. SchemaMapper cannot be configured with scripted parameters
4. @Value(@Value(Attribute)) is a handy trick for generic flows
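In FME, @Value(Attribute) reads the value of an attribute; nesting it as @Value(@Value(Attribute)) first reads Attribute to get the *name* of another attribute, then reads that one, so which attribute is read becomes data-driven. A rough Python analogue (the feature dictionary and attribute names are invented for illustration):

```python
feature = {
    "target_field": "diameter",  # the mapping sheet says which field to read
    "diameter": 80,
    "material": "PVC",
}

def value(feature, name):
    """Rough analogue of FME's @Value(): read an attribute by name."""
    return feature[name]

# @Value(@Value(target_field)): the inner lookup yields "diameter",
# the outer lookup yields 80 -- the attribute to read comes from the data.
indirect = value(feature, value(feature, "target_field"))
```

This double indirection is what lets one generic workflow serve object classes whose attribute schemas differ, as long as the lookup table names the right field per class.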