2. Integration Techniques for ELNs
• My background
• Why do we need to integrate ELNs?
• What kinds of integration do we need to do?
• What prerequisites are there?
• Some examples of technologies and techniques
• Summary
• You can download copies of this presentation from
our web site
http://www.amphora-research.com/
3. My background
• MEng in Information Systems Engineering
• First “ELN” was a consulting project for Kodak
• Started in 1996
• Completely electronic, fully integrated
• Thousands of users, worldwide
• This grew into Amphora
• Merged with PatentPad in 2003
• Paper or electronic records according to legal
preference
• Scientists still get an “Electronic” system
• Partner with a wide variety of “ELN” vendors
• Member of CENSA, working on long term
records, serving on Steering Team
4. Experience
• Primarily in ELNs for discovery
• Where patents are a major concern
• I am sure some of this is relevant to regulated areas,
but that’s not my focus
• Work a lot with other “ELN” vendors
• Seldom do you buy one system
• Which means we end up seeing a lot of integration!
• In a variety of industries, all sizes of deployment
• Pharma
• Biotech
• Chemicals
• Customers around the world, offices in the US &
the UK
5. What’s an ELN?
• The term “ELN” is now used to describe a wide
variety of systems
• Science specific
• Reaction planning tools, Cheminformatics
databases, structure drawing tools
• Analysis packages, LIMS
• Workflow tools
• General
• Knowledge/Document Management
• Scientific data management
• Laptop/Tablet computers
6. Observations
• The term “ELN”
• Is so ambiguous it can mean almost anything
(especially to a marketing person)
• Doesn’t help us much from a systems architecture
perspective
• A company is unlikely to have just one system that
could be called an “ELN”
• Those ELNs will need to integrate with your
existing & future systems
• Your needs will change with time, so you need to
be able to protect your investment
• In data
• In tools
• In processes
7. Deconstructing “ELN”
• At first sight, making an ELN project succeed can look very
complex
• ELN functionality can be split into two dimensions
• Some aspects are common to everyone
• Other requirements are specific to a particular group of
scientists
• Splitting out the functionality into these dimensions really
helps to keep you sane
[Figure: “Broad” aspects (Security, Collaboration, Patent Protection etc.) spanning niche systems A, B, C, D]
8. Benefits
• The corporate functions (Legal, Records, etc.) can
buy/provide a system that provides a service to
the niche-specific systems
• Meet corporate requirements for records etc.
• Provide cross-discipline collaboration
• The individual niches can buy/find systems to
support their specific needs
• Leverage existing investments
• Justified according to the benefits they bring
• Removes any need to balance competing requirements
• Reduce the need
• Systems can be acquired/purchased in a phased
approach tailored to the needs & requirements of
the business
• Life is a lot less stressful
9. Different levels of abstraction
The “Experiment” is generally the boundary between Broad Vs Deep systems
[Figure: “Broad” aspects; levels Projects, Experiments, Reports, Raw Data; niche systems A, B, C, D]
10. Types of integration
• The Broad/Deep boundary is often exposed as network-level services,
which are relatively standardized
• Integrations between different niche systems are generally custom
[Figure: “Broad” aspects; niche systems A, B, C, D]
11. What prerequisites are there?
• From your ELN product(s)
• Open Interfaces
• Open Data
• Plumbing
• Various technologies, some simple, some more complex
• Expertise - often in-house, sometimes consultants
• Good news - the Open Source movement is really
helpful
• Tools & techniques
• Drive for openness
• Remember: you need to ask your vendor for all of
the “Open” stuff before you sign the order
12. Open Interfaces
• What’s an “Interface”?
• Where one system “prods” another to do something
• Or get some information out
• Or put some information in
• Generally some data is passed back & forth
• What’s “open”?
• Something you can use without undue burden or
barrier
• This covers both commercial and technical aspects
• Concerns are very similar to those involved with Open
Data
13. Open Data
• This is currently a bit of a blind spot for
purchasers of IT systems
• Unfortunately, Open Data is absolutely critical
• For long term records
• For your ability to build up an integrated system
• To protect your IP (partly from a patent perspective, but
mainly from a re-use aspect)
• To maintain a balanced relationship with your vendors
• This absolutely needs to be part of the ELN
purchasing process
14. “Good” (open) file formats
• Publicly documented
• Legally unencumbered
• No patents, copyright concerns etc.
• Any patents or copyright must be in the public domain
• Ideally, self documenting (XML is a good start)
• Degrade gracefully
• If you can’t read the data, at least you can see a picture
• Based on more open, primitive formats where
possible
• At least two implementations of readers, one of
which is Open Source
• Widely used (W3C or IETF standards are good
signs)
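As a rough illustration of the “self documenting” point above, here is a minimal Python sketch that writes an experiment record as plain XML using only the standard library, then reads it back with a generic parser. The element names (experiment, title, author, observation) and the sample values are made up for illustration; they are not taken from any ELN product or standard.

```python
import xml.etree.ElementTree as ET


def build_record(exp_id, title, author, observations):
    """Return an experiment record as a UTF-8 XML byte string.

    Element and attribute names are illustrative assumptions only.
    """
    root = ET.Element("experiment", {"id": exp_id})
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "author").text = author
    obs_parent = ET.SubElement(root, "observations")
    for text in observations:
        ET.SubElement(obs_parent, "observation").text = text
    return ET.tostring(root, encoding="utf-8")


record = build_record("EXP-0001", "Test reaction", "A. Scientist",
                      ["Solution turned blue", "Yield 82%"])

# Any XML parser can read the record back without vendor software.
parsed = ET.fromstring(record)
print(parsed.findtext("title"))
```

Because the tags name their own content, someone opening the file in a text editor years later can still work out what it holds.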
15. Data formats for the long term
• Good
• For text: Plain ASCII, Unicode, HTML, possibly RTF
• For graphics: PNG, SVG
• For structured data: XML
• To preserve appearance: PDF
• Worry about
• Storing files in databases
• The database file format is probably undocumented
• Store objects on the file system and use the
database to point to them
• Anything that is proprietary - there’s no excuse for it,
and it dramatically increases your risk
• Binary files generally
• Mixing content in files (e.g. embedding XML in PDF)
• Proprietary digital signatures
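The advice above to store objects on the file system and use the database to point to them could look something like the following Python sketch. The table layout, file names and the SHA-256 checksum column are assumptions added for illustration; SQLite and a temporary directory stand in for whatever database and file store you actually use.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_record(db, store_dir, name, content):
    """Write content to a plain file and index it in the database."""
    digest = hashlib.sha256(content).hexdigest()
    path = Path(store_dir) / name
    path.write_bytes(content)
    db.execute(
        "INSERT INTO records (name, path, sha256) VALUES (?, ?, ?)",
        (name, str(path), digest),
    )
    db.commit()
    return path


def load_record(db, name):
    """Look up the pointer in the database, then read the file itself."""
    row = db.execute(
        "SELECT path, sha256 FROM records WHERE name = ?", (name,)
    ).fetchone()
    content = Path(row[0]).read_bytes()
    # The stored checksum lets us detect a changed or damaged file.
    if hashlib.sha256(content).hexdigest() != row[1]:
        raise ValueError("checksum mismatch for " + name)
    return content


store_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (name TEXT PRIMARY KEY, path TEXT, sha256 TEXT)")
store_record(db, store_dir, "exp-0001.xml", b"<experiment id='EXP-0001'/>")
print(load_record(db, "exp-0001.xml"))
```

The record itself stays readable with ordinary file tools even if the database is lost; the database only holds the index.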
16. IP concerns & data formats
• Companies have always used Proprietary Data
Formats as a competitive weapon
• Companies are waking up to the use of IP tools
(licenses, patents, copyrights) to reinforce their
control over data formats
• Just because a format is published doesn’t mean it
is open
• The Microsoft Office XML formats are a particularly
bad example
• Right now it looks positively radioactive
• They’re being very careful what they say which
indicates to me they’re planning something
• http://www.groklaw.net/article.php?
story=20050330133833843
• (see section: 4. Dissecting Microsoft’s “Patent License”)
17. Standards
• There are so many to choose from!
• Two key ways of generating “Standards”
• De Facto - dominant supplier/format
• De Jure - committee based
• Who gets to “bless” a standard?
• What makes a “good standard”?
• De Jure process has difficulty keeping up with the real
world
• De Facto process has risk of lock-in
• Pragmatic approach
• Expect your suppliers to use open file formats
• If there is an acceptable standard, use it
• Make sure you are using the right kind of format for
each purpose
18. Technologies and techniques
• There are a wide variety of tools you can use to
integrate IT systems
• Tight Vs Loose coupling
• Synchronous Vs Asynchronous
• Text Vs Binary
• Proprietary Vs Open
• Simple Vs Complex
• As a rule
• Loose is cheaper than Tight coupling
• Asynchronous is easier to manage than
Synchronous
• Text is easier to work with, and more flexible than
Binary
• Open interfaces are always better than Proprietary
• Simple approaches are better than Complex ones
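To make the loose, asynchronous, text-based end of these trade-offs concrete, here is a minimal Python sketch of two systems exchanging XML messages through a shared directory. The directory, file naming scheme and the status-update message format are all assumptions for illustration. The producer and consumer never call each other directly, so either side can be down, upgraded or replaced without breaking the other.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def drop_message(spool_dir, name, exp_id, status):
    """Producer: write a small XML message into the shared directory."""
    root = ET.Element("status-update", {"experiment": exp_id})
    root.text = status
    tmp = Path(spool_dir) / (name + ".tmp")
    tmp.write_bytes(ET.tostring(root, encoding="utf-8"))
    # Rename last so the consumer never sees a half-written file.
    final = Path(spool_dir) / (name + ".xml")
    tmp.rename(final)
    return final


def collect_messages(spool_dir):
    """Consumer: read and remove every complete message, in name order."""
    results = []
    for path in sorted(Path(spool_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        results.append((root.get("experiment"), root.text))
        path.unlink()
    return results


spool = tempfile.mkdtemp()
drop_message(spool, "0001", "EXP-0001", "signed")
drop_message(spool, "0002", "EXP-0002", "witnessed")
updates = collect_messages(spool)
print(updates)
```

In practice the consumer would run on a schedule, and a failed run simply leaves the files in place for the next attempt.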
19. Considerations when picking tools
• Use stable interfaces
• Get a commitment from the vendor about what they’ll
keep stable across version upgrades
• Use public, documented interfaces
• Sample code is really really useful
• Pick language-neutral interfaces where possible
• Platform-neutrality
• Don’t worry (too much) about locking yourself into
Windows on the client
• But if you lock yourself to Windows on the server, it is
going to hurt
20. Glue Languages
• There are a number of really useful “Glue”
languages around
• Python (and Jython, and other relatives)
• Perl (although I have some concerns about
maintainability)
• Groovy, Beanshell, etc.
• All of them
• Play well with XML, http, SOAP etc.
• Play well with OLE
• Are cross platform
• My personal preference is Python
• You can learn it in a matter of hours
• You can read other people’s code
• It does everything I need it to do
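As a small example of the kind of glue work meant here, the Python sketch below converts a CSV export from one system into XML for another. The column names (sample_id, purity_pct), the sample values and the XML layout are hypothetical; a real export would dictate its own.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text):
    """Convert CSV rows (with a header line) into an XML document string."""
    root = ET.Element("results")
    for row in csv.DictReader(io.StringIO(csv_text)):
        sample = ET.SubElement(root, "sample", {"id": row["sample_id"]})
        ET.SubElement(sample, "purity").text = row["purity_pct"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical export from an analysis package.
export = "sample_id,purity_pct\nS-1,98.2\nS-2,95.7\n"
xml_text = csv_to_xml(export)
print(xml_text)
```

The whole conversion fits in one short function, which is roughly the scale at which glue languages earn their keep.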
21. Cool stuff
• SOAP/Web Services
• Valuable in many areas
• But don’t treat it as a religion
• There are lighter alternatives which bring most of the
benefits for much less effort
• The whole WS-* effort seems to have got out of control
• REST (XML over http) - a lighter alternative to
SOAP
• File swapping (generally, in XML)
• HTTP GET/POST
• Wonderfully easy to debug!
• Very flexible
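The XML-over-HTTP style above can be sketched with nothing but the Python standard library. This is a toy under stated assumptions: the /records/ URL layout, the in-memory store and the response bodies are invented for the demo, and there is no authentication or error handling beyond a 404.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RECORDS = {}  # in-memory stand-in for a real record store


class RecordHandler(BaseHTTPRequestHandler):
    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET /records/<id> returns the stored XML, or 404.
        body = RECORDS.get(self.path.rsplit("/", 1)[-1])
        if body is None:
            self._reply(404, b"<error>not found</error>")
        else:
            self._reply(200, body)

    def do_POST(self):
        # POST /records/<id> stores the request body as the record.
        length = int(self.headers.get("Content-Length", 0))
        RECORDS[self.path.rsplit("/", 1)[-1]] = self.rfile.read(length)
        self._reply(201, b"<ok/>")

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


# Ignore any proxy settings; the demo only talks to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, data=None):
    """GET the URL, or POST when data (bytes) is given; return the body."""
    with opener.open(urllib.request.Request(url, data=data)) as resp:
        return resp.read()


server = HTTPServer(("127.0.0.1", 0), RecordHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d/records/" % server.server_port

created = fetch(base + "EXP-0002", data=b"<experiment id='EXP-0002'/>")
fetched = fetch(base + "EXP-0002")
print(fetched)

server.shutdown()
server.server_close()
```

Because every exchange is plain text over HTTP, the same requests can be replayed from a browser or a command-line client when something needs debugging.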
22. Nice things to see
• Integration points exposed as stable URLs
• For example, in our PatentSafe product, we have
committed to stable URL formats to
• Submit a record via http (content & metadata)
• Get a record for display to the user
• These can be used by other systems
• And also embedded in Word documents...
• Lack of wheel re-invention
• e.g. LDAP is The One True place for user information
• e.g. RSS/Atom is The One True alerting mechanism
• Example code
• In multiple languages
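For the RSS/Atom alerting bullet above, a minimal Python sketch of generating an Atom feed might look like this. The feed title, author name, dates and the eln.example.com URLs are placeholders, not anything a real product exposes; only the Atom namespace and element names come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def atom_feed(feed_title, feed_id, author, updated, entries):
    """Build a small Atom feed; entries are (title, url, updated) tuples."""
    feed = ET.Element("{%s}feed" % ATOM)
    ET.SubElement(feed, "{%s}title" % ATOM).text = feed_title
    ET.SubElement(feed, "{%s}id" % ATOM).text = feed_id
    ET.SubElement(feed, "{%s}updated" % ATOM).text = updated
    author_el = ET.SubElement(feed, "{%s}author" % ATOM)
    ET.SubElement(author_el, "{%s}name" % ATOM).text = author
    for title, url, when in entries:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM)
        ET.SubElement(entry, "{%s}title" % ATOM).text = title
        ET.SubElement(entry, "{%s}id" % ATOM).text = url
        ET.SubElement(entry, "{%s}link" % ATOM, {"href": url})
        ET.SubElement(entry, "{%s}updated" % ATOM).text = when
    return ET.tostring(feed, encoding="unicode")


# Placeholder values throughout; the URLs are hypothetical.
feed_xml = atom_feed(
    "New experiments",
    "http://eln.example.com/feeds/new",
    "ELN alerts",
    "2005-01-01T12:00:00Z",
    [("EXP-0001 signed", "http://eln.example.com/records/EXP-0001",
      "2005-01-01T12:00:00Z")],
)
print(feed_xml)
```

Any existing feed reader can then act as the alerting client, which is the point of not re-inventing the wheel.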
23. Here be dragons
• OLE - sometimes it is unavoidable (e.g. UI stuff),
but avoid it when you can
• Tight coupling
• Buggy
• Proprietary
• Reduces your platform options
• File format issues are awful
• Version-to-version compatibility is “interesting”
• Direct database access
• Tight coupling
• Difficult to guarantee system integrity
• If you wrote both systems you might want to do this
24. Open Source
• Definitely one to watch
• Not the “Free” lunch you might think, but a
pragmatic business tool
• Examples
• Linux
• Postgres
• JBoss, Tomcat etc.
• Ghostscript
• Open Source is part of everyone’s infrastructure
• Make sure you can run your systems on a variety of
platforms
25. Why?
• Good for records
• Gives you top-to-bottom control
• Good for TCO
• We’re finding the Open Source infrastructure easier to
set up and more reliable than proprietary alternatives
• Enables a better solution
• Transparent systems mean you can do things the
original designers didn't think of
• This is especially important for ELNs
26. Other stuff to watch
• XML generally (what did we ever do without it?)
• Jabber (as a computer messaging and IM framework)
• Portals & Portlets
• Especially JSR 168, WSRP
• Remember you may well want to portalize any useful application
• AJAX
• Google is my hero
• You can build usable, functional Web Applications
• If you haven’t seen GMail I can send you an “invite”
• VMWare - virtualize your world
• Wow
• Great for server consolidation, great for testing, great for
development
• Wikis
• Beginning to turn into a lightweight application
environment
27. Trends to watch
• File format nasties
• Closed/Private interfaces
• Unlikely to be stable
• DMCA and other copyright legislation
28. Summary
• You’ll be assembling an “ELN System” from a
series of components
• Some you have, some you’ll build, some you’ll buy
• Get the open stuff before you sign the deal
• Open, documented, stable interfaces
• Open file formats
• Use open, loosely coupled approaches where
possible
• If you can, keep the capability to own the
integration issues in-house