Aspera bt-big-data-cloud
1. Enabling The Big Data Cloud for HPC
and Collaboration With High-Speed Data
Transport
2. PRESENTER AND AGENDA
PRESENTER
Daniel Kumi
Director, New Market Development
daniel@asperasoft.com
AGENDA
• Who and Why Aspera?
• WAN Transport
• Wireless Transport
• Customer Use Cases
• Cloud and Big Data – Transfer Challenges for HPC and Collaboration
• Aspera On Demand
• BT-Aspera Discussion
3. ASPERA’S MISSION
Creating next-generation transport technologies
that move the world’s digital assets at maximum speed,
regardless of file size, transfer distance and network conditions.
4. Aspera: moving the world’s digital assets at
maximum speed
50% YOY growth in revenue and employees
Over 10,000 licenses sold, and over 1,500 customers worldwide
Expanded to Asia PAC and Latin America through direct and channel
Patents issued or pending in 32 countries
Continuing to innovate: fasp3™, fasp-MC™, mobile transport, cloud enablement
8. What Happened to my Bandwidth?
WAN, Seattle to Paris: 1000 Mbps
• 170ms RTT
• 0.001% packet loss rate
WAN Throughput is 1000Mbps
Max TCP Throughput ~29Mbps
Where’s my 970Mbps?
At 29Mbps
50GB transfer will take 4 hrs
1TB transfer will take 3.3 days
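The ~29 Mbps ceiling is in the range that a standard loss-based TCP model predicts for this link. The sketch below uses the Mathis et al. approximation, throughput ≈ (MSS / RTT) · C / √p. That model is brought in here as an assumption for illustration and is not necessarily how the slide's figure was derived; the 1460-byte MSS and the constant C = √(3/2) are assumed values as well.

```python
import math

def mathis_tcp_throughput_bps(rtt_s, loss_rate, mss_bytes=1460):
    """Approximate steady-state TCP throughput in bits per second.

    Uses the Mathis approximation: (MSS / RTT) * C / sqrt(p).
    mss_bytes and C are assumed values, not taken from the slides.
    """
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

# Slide scenario: 170 ms RTT, 0.001% packet loss (0.00001 as a fraction).
estimate = mathis_tcp_throughput_bps(rtt_s=0.170, loss_rate=0.00001)
print(f"Estimated TCP ceiling: {estimate / 1e6:.1f} Mbps")
```

Note that the link capacity (1000 Mbps) does not appear in the formula at all: under this model the ceiling is set by round-trip time and loss, which is the point the slide makes.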
9. BIG-DATA and WAN TRANSFER WITH TCP
TCP WAS DESIGNED IN THE EARLY 80’S
• When data was small & bandwidth was limited
• Fantastic for reliable data delivery
• Not fast enough for big-data
TCP IS THE ENGINE THAT DRIVES
• FTP, HTTP & HTTPS
• RSYNC, SCP & DICOM
• CIFS & NFS
TCP DOES NOT LIKE NETWORK LATENCY/ RTT
• Geographic distance increases latency
• Network congestion increases latency
TCP DOES NOT LIKE PACKET LOSS
• Loss is caused by congestion
• Different network capacity
• Wireless and satellite communications
10. So if TCP doesn’t work, what’s the answer?
The Aspera Solution
11. Same WAN Scenario with Aspera
WAN, Seattle to Paris: 1000 Mbps
• 170ms RTT
• 0.001% packet loss rate
WAN is 1000Mbps
Max TCP Throughput ~29Mbps
Max Aspera Throughput ~995Mbps (gain of x34)
ROI measured in $$ cost of not using 971Mbps
At 29 Mbps (TCP)
• 50GB transfer will take ~4 hrs
• 1TB transfer will take 3.3 days
At 995 Mbps (Aspera)
• 50GB transfer will take ~7 mins
• 1TB transfer will take 2.4 hrs
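The transfer times on this slide follow from dividing file size by sustained throughput. A minimal sketch of that arithmetic is below; it assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead, so its results land close to the slide's rounded figures rather than exactly on them. The function names are illustrative.

```python
def transfer_time_seconds(size_bytes, rate_mbps):
    """Time to move size_bytes at a sustained rate given in megabits per second."""
    return size_bytes * 8 / (rate_mbps * 1e6)

def humanize(seconds):
    """Format a duration in the unit a slide would use."""
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hrs"
    return f"{seconds / 86400:.1f} days"

GB = 1e9   # decimal gigabyte (assumption)
TB = 1e12  # decimal terabyte (assumption)

for label, size in (("50GB", 50 * GB), ("1TB", 1 * TB)):
    for rate in (29, 995):
        print(f"{label} at {rate} Mbps: {humanize(transfer_time_seconds(size, rate))}")
```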
12. FASP™ — HIGH-PERFORMANCE DATA TRANSPORT
MAXIMUM LINE-RATE WAN TRANSFER SPEED
• Transfer performance scales with bandwidth independent
of transfer distance and resilient to packet loss
• Optimal end-to-end throughput efficiency
CONGESTION AVOIDANCE AND POLICY CONTROL
• Automatic, full utilization of available bandwidth
• On-the-fly prioritization and bandwidth allocation
UNCOMPROMISING SECURITY AND RELIABILITY
• Secure, user/endpoint authentication
• AES-128 cryptography in transit & at-rest
SCALABLE MANAGEMENT, MONITORING AND CONTROL
• Real-time progress, performance and bandwidth utilization
• Detailed transfer history, logging, and manifest
ENTERPRISE-CLASS FILE DELIVERY
• Transfers up to thousands of times faster than FTP/HTTP(S)
• Precise and predictable transfer times
• Extreme scalability (concurrency and throughput)
13. FASP vs TCP PERFORMANCE
fasp Bandwidth ROI
FTP: Limited by Distance & Packet Loss, Not B/W
FTP      Across US     US – EU       US – ASIA     Satellite
1 GB     1 – 2 hrs     2 – 4 hrs     4 – 20 hrs    8 – 20 hrs
10 GB    15 – 20 hrs   20 – 40 hrs   Impractical   Impractical
100 GB   Impractical   Impractical   Impractical   Impractical
Aspera: Scales Linearly with Bandwidth
fasp™    2 Mbps     10 Mbps    45 Mbps    100 Mbps   200 Mbps   1 Gbps
1 GB     70 min.    14 min.    3.2 min.   1.4 min.   42 sec.    8.4 sec.
10 GB    11.7 hrs   140 min.   32 min.    14 min.    7 min.     1.4 min.
100 GB              23.3 hrs   5.3 hrs    2.3 hrs    1.2 hrs    14 min.
Distance & Packet Loss Independent
14. 6 Gbps Scalable WAN Throughput
~6Gbps Big-Data Throughput
• Latency independent
• Loss independent
x3000 improvement vs. TCP
• 1TB data moved in 20 min
• 2 days with TCP over LAN conditions
Scale to ~10Gbps with IQ Accelerator
15. High Speed Mobile Data Transfer with fasp-AIR™
fasp-AIR SDK – maximum data transfer speed and predictability for
mobile devices
• Embeddable software library allows app developers to integrate superior
transport capabilities into their own applications, such as faster and more
predictable downloads/uploads.
• Available for Android and iOS on Aspera Developer Network
• Designed for wireless networks with high latency and high packet loss
• Integrated transfer queuing, pause, resume and progress reporting
• Achieves significant performance improvements for upload and download
speeds over 3G, 4G and 802.11 g/n.
16. fasp-AIR Benchmarks on Verizon 4G
In some cases (highlighted in orange), speeds will vary
greatly, depending on available bandwidth and the underlying
condition of the wireless network.
18. Large-scale Global Collaboration: 1000 Genomes
Petabytes of data transferred monthly
• Files range in size from KBs to many GBs
Repository contents
• 2,500 genomes from 27 populations
• Several types of variations: SNPs, small insertions and deletions,
structural variants, and copy number variants
Available on web - 4 locations
• 1000genomes.org, AWS, NCBI, and EBI websites
• Technology web sites use:
• Aspera Connect Server
• Aspera Developers’ Network and SDK
[Diagram labels: NIH, Data Cloud, Upload/Download]
• Researchers across all locations use:
• Aspera Connect client
• (Freely distributable with server license)
19. Researcher to Researcher Collaboration
Faspex in use by world-renowned Cancer
Research Center in Seattle, WA
Use case: Genomic research
Genomic research results sharing
• Research made available to collaborators
• Research published—globally
Workflow
• Illumina > Storage > Researcher > Aspera
• Publish one-to-many
Collaboration options
• Person-to-person, one-to-many (faspex server)
• Publish-subscribe (faspex or connect server)
21. CLOUD COMPUTING — WHY IS IT SO COMPELLING?
THE POTENTIAL OF INFINITE COMPUTING RESOURCES, ON DEMAND
• Eliminates the need to plan ahead
• Allows companies to meet demand
• Without the lead-time bottleneck
THE ELIMINATION OF AN UP-FRONT COMMITMENT
• Reduce capital outlay and investment risk
• Start small & increase h/w resources to match need
• Auto-scale to meet demand
PAY-FOR-USE RESOURCE MODEL
• CPUs by the hour
• Storage by the day
• Bandwidth by the GB
22. SO? WHAT CAN I DO WITH IT?
DATA PROCESSING & CONTENT CREATION
• Compute Intensive: 10’s, 100’s, 1000’s of CPU cores
• Transcoding, rendering, encoding, watermarking
• Big-data analytics & HPC
STORAGE FOR ARCHIVE & D/R
• Near-line for editing, creative apps and processing
• B2B / B2C data workflow
• Offsite storage for disaster recovery and business continuity
DATA & CONTENT DISTRIBUTION
• OTT, play out, release, project & event specific marketing
• Collaborative data exchange
• CDN and global delivery
23. GETTING IN AND OUT OF THE CLOUD
KNOWING WHEN TO CHOOSE THE RIGHT TOOL
24. CHALLENGES OF STORING BIG FILES IN THE CLOUD?
BEWARE THE OBJECT STORE:
• Not like traditional NAS or SAN
• Bigger, better, but possibly much more complex
• a.k.a. Google File System, Amazon S3, Hadoop Distributed File System
• Simple read/write of data "blobs", indexed by a key
• Multiple replicas are distributed across storage for durability and optimized for access
• Should work well for storing large numbers of files
UNDERSTAND CHUNKS, BLOCKS and BLOBS
• You need to deal with chunks, blocks and blobs
• "Chunk" sizes are small (64 MB/128 MB)
• Large media files must be "chunked" (1TB file = transporting and reassembling 10,000+ chunks!)
• Multi-chunk APIs impede workflow and are complex
• Data I/O uses the standard HTTP(S) protocol
• VERY SLOW at distance
• Single HTTP stream slow even locally (<100 Mbps).
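To make the chunk count concrete, the sketch below splits a file into fixed-size byte ranges, which is the bookkeeping any multi-chunk upload has to do before transporting and reassembling the pieces. It assumes binary units (1 TB = 1024^4 bytes) and is not tied to any particular object store API; `chunk_ranges` is a hypothetical helper name.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

def chunk_ranges(file_size, chunk_size):
    """Yield (offset, length) for each chunk needed to cover file_size bytes."""
    offset = 0
    while offset < file_size:
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length

ranges_64 = list(chunk_ranges(1 * TIB, 64 * MIB))
ranges_128 = list(chunk_ranges(1 * TIB, 128 * MIB))
print(f"1 TB in 64 MB chunks:  {len(ranges_64)} chunks")
print(f"1 TB in 128 MB chunks: {len(ranges_128)} chunks")
```

Each of those ranges becomes a separate HTTP request that must be tracked, retried on failure, and reassembled in order, which is where the workflow complexity comes from.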
BIG-DATA SERVICES WILL NEED A HIGH-SPEED BRIDGE TO THE CLOUD
• Large files moved at full bandwidth capacity with global access
• Overcome the WAN and storage bottleneck
• Support files of any size or quantity
• Transparent to the end user/data owner (GUI, command line, API, browser, etc.)
• No hardware to support B2B, B2C, C2B workflow
29. OVERCOMING BOTH BOTTLENECKS
#1: TRANSFER DATA TO EC2 OVER WAN (EFFECTIVE THROUGHPUT)
• http transfer over WAN (single stream): <10 Mbps
  • Typical internet conditions
  • 50–250ms latency & 0.1–3% packet loss
• 15 parallel http streams: <10 to 100 Mbps
• Aspera fasp transfer over WAN to EC2: up to 1Gbps (per EC2 Extra Large Instance)
#2: TRANSFER DATA FROM EC2 TO S3 (EFFECTIVE THROUGHPUT)
• Standard single stream http: 10 to 100 Mbps
• Aspera S3 Proxy: up to 1Gbps (per EC2 Extra Large Instance)
  • With parallel I/O http streams
ASPERA + AWS | ~10 TB transferred per 24 hours | PER EC2 INSTANCE
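The ~10 TB per 24 hours figure is what a sustained 1 Gbps works out to over a day. The sketch below shows that conversion and a rough instance count for a larger daily volume. It assumes decimal terabytes, a constant line rate, and independent instances that each sustain the same rate; the 50 TB target is a made-up example, not a figure from the slides.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume_tb(rate_gbps):
    """Terabytes (decimal) moved in 24 hours at a sustained rate in Gbps."""
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12

def instances_needed(target_tb_per_day, rate_gbps_per_instance=1.0):
    """Instances required if each one sustains rate_gbps_per_instance."""
    return math.ceil(target_tb_per_day / daily_volume_tb(rate_gbps_per_instance))

print(f"One instance at 1 Gbps: {daily_volume_tb(1.0):.1f} TB per day")
print(f"Instances for a hypothetical 50 TB per day: {instances_needed(50)}")
```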
30. ASPERA DIRECT-TO-S3 — LINE RATE ACCESS TO THE CLOUD
UNRIVALED ASPERA PERFORMANCE
• Built on Aspera fasp™ technology for maximum transfer speed
• Regardless of file size, transfer distance and network conditions
• Precise bandwidth control ensures the available bandwidth is utilized to achieve maximum transfer speeds, while
being fair to other business-critical network traffic
SEAMLESS INTEGRATION WITH S3
• Integrated with S3 multi-part HTTP for maximum "last foot" performance
• Simple configuration of S3 credentials, for both shared and dedicated docroot
• Transfers directly into S3 are seamless and transparent to user
ENTERPRISE-GRADE SECURITY AND RELIABILITY
• Secure authentication with encryption in transit & at rest (AES-128, FIPS 140-2, HIPAA Compliant)
• Packet-level data integrity verification
• Automatic resume of partial or failed transfers
• Full support for AWS S3 Server-Side Encryption at rest
INTEROPERATES WITH ALL ASPERA HOST OPTIONS
• Any platform (Windows, Linux, Mac, UNIX, iOS, Android)
• Any Aspera Clients (CLI, Desktop, Point-to-Point, Mobile, Web, Embedded)
• Any Aspera Servers (Enterprise, Connect, faspex)
31. ASPERA FOR AWS: DIRECT-TO-S3
1. Upload using typical multi-part HTTP client
2. fasp high-speed upload Direct-to-S3
[Diagram labels: Client, Dallas, TX; Aspera Client; Aspera Transfer Server, Herndon, VA; fasp; HTTP multipart; Scale out]
32. HYBRID CLOUD DEPLOYMENT (PUBLIC/PRIVATE)
Shares app transparently communicates with Aspera
server Nodes in cloud and in enterprise
User browses content across authorized shares
High-speed data transfers with Datacenter
High-speed data transfers with Direct-to-S3
[Diagram labels: Client, NY, NY; fasp; Shares; DMZ; Node (x2); Herndon, VA; Datacenter, Emeryville, CA]
33. ASPERA SOFTWARE ON DEMAND
Aspera Server
• Universal file transfer server supports desktop, web, mobile & embedded
Aspera faspex
• Global Person-to-person file ingest & distribution
Aspera Shares
• Global Person-to-person file transfer & exchange
Aspera Console
• Global transfer monitoring, reporting & control
KEY FEATURES
• On demand high-performance data transport to and from remote infrastructures
• Unlimited scale out of transfer capacity with additional AMIs
• Support for all Aspera Server software and use cases
• Additional Client Options: Mobile, Outlook Plug-in & Cargo (Aspera faspex)
• Flexible Storage Options: Local, EBS, AWS S3
• Seamlessly interoperates with on-premise Aspera deployments
• Integrated Management and Monitoring
APPLICATIONS AND USE CASES
• High Performance Computing On Demand
• Content Aggregation, Transformation and Distribution
• Time-boxed event or project-based collaboration, ad-hoc distribution or content ingest
34. Aspera software product & technology portfolio
Distribute
Complete portfolio of servers and end point clients for high-speed digital content delivery and distribution.
Enterprise and Connect Server
• Universal file transfer server and web-based interface and directory listing
Client and Point-to-point
• Uni- and bi-directional transfer clients
Connect
• Web browser plug-in for high-speed uploads and downloads
Mobile
• High-speed transfer for mobile devices
Sync
• Highly scalable, multidirectional file replication and synchronization
Collaborate
Global person-to-person and project-based exchange and collaboration of files and directories, of any size, over any distance, over any network.
faspex Server
• Secure digital delivery and collaborative file transfers with remote users and partners
• Integrated e-mail notifications for delivery and successful download
• Comprehensive administration, user management & access control
faspex Multi-Server / HA
• Automated bi-directional relays between sites and multiple servers
• 3-tier architecture with support for clustering and high availability
Cargo
• Automated client downloads
Automate
Web-based application and SDK for creating and managing automated workflows, from simple file forwarding, to complex process orchestration.
Orchestrator
• Intuitive graphical workflow designer
• File processing decision tree and flow
• Rich and flexible plug-in architecture for third-party process integration
• Comprehensive library of plug-ins for transcoding, virus checking, quality checking, archive, notifications
• High volume processing
• Detailed dashboard, workflow, and step-level progress reporting.
• Open development framework for designing and integrating processing and automation pipelines
Transport
Our unique, patented transport technologies provide unparalleled speed, efficiency, concurrency and bandwidth control over any size, distance, and network
fasp™
• Patented, file-based bulk data transport
fasp3™
• Next-gen protocol for any bulk data
Aspera On-Demand S3|Direct
• High-speed transfer direct to cloud storage (S3)
fasp-AIR™
• Uploads and downloads over 3G, LTE and Wi-Fi networks
fasp-MC™
• High-speed delivery over multicast
Console transport management
• Centralized web-based management, monitoring, and reporting
36. ASPERA DEVELOPER NETWORK
A complete set of SDKs provides developers with guides, reference information, and sample code to assist them with
integrating Aspera technology into their own applications. Aspera fasp™ technology can be used in desktop, network-
based, and web applications in place of FTP, HTTP, or custom TCP-based copy protocols.
ASPERA TRANSFER APIs
Aspera Web Services
A SOAP based web service API that allows initiation, monitoring and controlling of fasp based file transfers.
Aspera Web
Javascript API exposed by Aspera Connect client. It allows integration of fasp based file transfers into web applications.
Connect 2.8 developer Preview 2
Introducing the new Connect 2.8 developer preview! Integrate the functionality of Aspera Connect 2.8, a fasp-based file transfer client, into your own web applications, while customizing it to your unique brand.
fasp Manager
A class library that allows initiation, monitoring and controlling of fasp based file transfers.
Aspera Multicast SDK
A Java class library that allows initiation and management of IP multicast based data transmissions using Aspera fasp-MC™.
ASPERA MOBILE APIs
Android SDK
Aspera Android SDK provides a Java API to transfer files using fasp-AIR™.
iPhone SDK
Aspera iPhone SDK provides an Objective C API to transfer files using fasp-AIR.
ASPERA APPLICATION APIs
faspex™ Web API
The Aspera faspex Web API provides a set of services that enables users to create and receive digital deliveries via a Web interface, while taking advantage of fasp high-speed transfer technology.
OTHER INFORMATION
Supporting Tools and Libraries
Supporting tools and libraries let you perform other common tasks surrounding file transfers.
General Reference
Reference on error codes, log file locations, configuration files and more.
37. Aspera software product & technology portfolio
Distribute
Complete portfolio of servers and clients for high-speed data delivery and distribution.
APIs
Enterprise and Connect Server
• Universal file transfer server and web-based interface and directory listing
Client and Point-to-point
• Uni- and bi-directional transfer clients
Connect
• Web browser plug-in
Mobile
• High-speed transfer for mobile devices
Sync
• Highly scalable, multidirectional file replication and synchronization
Collaborate
Global person-to-person and project-based exchange and collaboration of files and directories.
APIs
faspex™ Server
• Secure digital delivery and collaborative file transfers with remote users and partners
• Web, email, mobile client options
• Comprehensive administration, user management & access control
faspex™ Multi-Server / HA
• Automated bi-directional relays between sites
• 3-tier architecture with support for clustering, HA
Cargo
• Automated package downloads
Automate
Web-based application and SDK for creating and managing automated file-based workflows.
APIs
Orchestrator
• Intuitive graphical workflow designer
• File processing decision tree and flow
• Rich and flexible plug-in architecture for third-party process integration
• Comprehensive library of plug-ins for transcoding, A/V, QC, archive, notifications
• High volume processing
• Detailed dashboard, workflow, and step-level progress reporting.
• Open development framework for designing and integrating automation pipelines
Transport
APIs
Our unique, patented transport technologies provide unparalleled speed, efficiency, concurrency and bandwidth control over any size, distance, and network
fasp™
• Patented, file-based bulk data transport
fasp3™
• Next-gen protocol for any bulk data
Aspera On-Demand S3|Direct
• High-speed transfer direct to cloud storage (S3)
fasp-AIR™
• Uploads and downloads over 3G, LTE and Wi-Fi networks
fasp-MC™
• High-speed delivery over multicast
Console transport management
• Centralized web-based management, monitoring, and reporting
Due to many uncontrolled parameters such as … it is a difficult task to compare the performance of different protocols. For the same protocol, its performance can vary … To exclude these uncontrolled factors, we repeat the same test many times over a relatively long time and then compare the mean, worst and best scenarios. We also manually verified the performance on the iPhone …
Aspera products used: Connect Server, SDK
Fred Hutch: 3000 employees. Researchers doing a variety of research.
Raw data comes off of mass spectrometers: data comes off instrument to HDD on instrument PC and copied to file servers (Sun). Lab Key Server proteomics pipeline. Researchers log into web app. Web server mounts Sun server. Researchers (from other institutions) need to validate results by comparing raw data. Customers can log into proteomics pipeline. Comparing results. Researchers all over the world: Broad, Harvard, Berkeley, and other countries.
Data set: Few hundred Megs to 25 Gigs. Mainly 1 to 5 Gigs range. About 10 percent compared remotely. Some labs it’s all remote; other labs don’t collaborate remotely. Total data set size has file sizes between 500Megs to 1Gigs x 2 or 25 for total data set size. Raw file converted to MZ-XML by pipeline, so a 500Meg file turns into a 1.5Gig file, just used for conversion to results.
Next gen DNA research. In-house pipeline. Collaborators outside the Hutch. Pipeline software from Illumina, customized with scripts. Submitting jobs to cluster, sending out results. For example, working on publishing a paper, would pretty much share everything: all results (not necessarily all data).
Storage: Sun Server mounting 3PAR. HPC cluster with nodes mounting storage. Directories mounted from servers running Aspera (e.g. Faspex). Clients: Diversity of Linux, Windows, Macs. Aspera Developers Network: Fget fsend
Other use case: long term collaboration between researchers. Could be one to one or one to many. Collaborating on papers and such. Proteomics and genomics coming in as well.
First bottleneck's solution: Transfer bulk data over WAN using Aspera fasp, which overcomes TCP limitations under network latency and packet loss. Aspera solutions yield 100x performance improvements.
fasp Technology and Software Suite for Predictable, High Speed File-based Transfer
• Unique in the world, patented transport technology providing unparalleled speed, efficiency, concurrency and bandwidth control
• Fully integrated, cross platform software suite for interoperable file transfer – any size, any distance, and network BW
• Secure standard for the industry
• Integrated global management, tracking and reporting
• Extends to all Cloud based storage
• Extends to all major mobile platforms
Key Business Areas Enabled with Latest Aspera Software
• High Speed Content Delivery, Synchronization and Distribution (including Cloud!)
• Ad Hoc Content Ingest, Delivery and Collaboration
• Integrated File Workflow Automation and Orchestration
• Mobile Platform File Delivery