Sally Kleinfeldt and Aaron VanDerlip describe ore.bigfile, a minimalist solution to the problem of uploading, downloading, and versioning very large files in Plone.
Large Files without the Trials
1. Large Files
Without the Trials
Aaron VanDerlip and Sally Kleinfeldt
Plone Symposium East 2010
Friday, May 28, 2010
2. Acknowledgments
• Bioneers provides environmental education
and social connectivity through
conferences, radio and TV, books, and online
materials
• Engaged Jazkarta to build a file asset server
based on Plone to help them organize,
capture, and store multimedia and textual
content with files as large as 5 GB.
3. Acknowledgments
• Aaron VanDerlip - Project Manager
• Kapil Thangavelu - Developer
Bioneers funded a project “for a file-asset server system based on Plone”, that would “support the upload and
retrieval of files as large as 5GB”.
4. What is a Big File?
• Anything that makes you wait...
5. Plone Problems with
Big Files
1. Uploading/Downloading
2. Versioning
6. Uploading Big Files
• Both the user and a Zope thread are
waiting for the file transfer
7. Friday, May 28, 2010
Typically Zope has to process the entire Request coming from Apache. This can cause Zope to
block if it has to process large Request bodies.
8. Uploading Big Files
• Browser encodes file in multipart mime
format
• Zope must undo this encoding
• CPU and memory intensive, and SLOW
• Zope thread is blocked during this process
10. Learning from Rails
• Get file encoding/decoding and read/write
operations out of Plone
• Web servers are really good at this -
Apache, Nginx, and Lighttpd
• Our implementation uses Apache
• Apache file streaming is fast and threads
are cheap
Elizabeth Leddy mentioned the similarities between Ruby and Python web apps yesterday;
we adopted Rails tools where appropriate.
11. Learning from Rails
• Uploads: Apache plus mod_porter
http://therailsway.com/tags/porter
• Downloads: Apache plus mod_xsendfile
http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/
• ...and of course ZODB Blob storage
12. Mod Porter
• Parses the multipart mime data
• Writes the file to disk
• Changes the Request to contain a pointer
to the temp file on disk
• All done efficiently in C code inside your
Apache process
13. Mod Porter
Mod Porter processes the multipart mime data quickly and writes it to disk. It then sends the
modified, lighter-weight Request to Zope.
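From the application's side, the rewritten Request carries form fields that point at a file on disk rather than the file bytes. A minimal sketch of reading such a request follows. The field names `file_path` and `file_name` are hypothetical (the slides do not show the names Mod Porter emits), and `extract_upload` is an illustrative helper, not part of ore.bigfile:

```python
import os

# Assumption: matches the PorterDir value in the Apache config.
UPLOAD_DIR = "/mnt/uploads-Apache"


def extract_upload(form, upload_dir=UPLOAD_DIR):
    """Return (temp_path, original_name) from a rewritten upload request.

    `form` is a plain dict standing in for the parsed request form.
    The keys "file_path" and "file_name" are hypothetical.
    """
    temp_path = os.path.realpath(form["file_path"])
    root = os.path.realpath(upload_dir)
    # The path arrives as request data, so refuse anything that
    # resolves outside the upload directory.
    if os.path.commonpath([temp_path, root]) != root:
        raise ValueError("upload path is outside the upload directory")
    # Keep only the final component of the client-supplied name.
    return temp_path, os.path.basename(form["file_name"])
```

Checking the path against the upload directory matters because the application now trusts a path taken from the request instead of reading bytes it received itself.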
14. Apache Config for
Mod Porter
LoadModule apreq_module /usr/lib/apache2/modules/mod_apreq2.so
LoadModule porter_module /usr/lib/apache2/modules/mod_porter.so
# Apache has a default read limit of 64MB, set it higher
APREQ2_ReadLimit 2G
...
Porter On
# Files below this size will not be handled by mod-porter
PorterMinSize 14M
# Where the uploaded files are stored
PorterDir /mnt/uploads-Apache
15. X-Sendfile
• HTTP header
• Set an X-Sendfile header and the path of a
file on your response
• Apache does the rest
16. Apache Config for
X-Sendfile
LoadModule xsendfile_module /usr/lib/apache2/modules/mod_xsendfile.so
...
EnableSendfile On
XSendFile on
# Config to send file resources directly from blob storage
XSendFilePath /mnt/bioneers/var/blobstorage
17. Using X-Sendfile
from Python
def download(self, response, file_path):
    # Apache (mod_xsendfile) sees this header and streams the file itself
    response.setHeader("X-Sendfile", file_path)
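A slightly fuller sketch of the same idea, runnable outside Zope. `FakeResponse` and `send_blob` are hypothetical names introduced for illustration; only the `setHeader` call comes from the slide:

```python
import mimetypes
import os


class FakeResponse:
    """Stand-in for a Zope response; only setHeader is modelled."""

    def __init__(self):
        self.headers = {}

    def setHeader(self, name, value):
        self.headers[name] = value


def send_blob(response, file_path, download_name):
    """Ask the web server to stream file_path instead of doing it in Python."""
    content_type = mimetypes.guess_type(download_name)[0] or "application/octet-stream"
    response.setHeader("Content-Type", content_type)
    response.setHeader(
        "Content-Disposition",
        'attachment; filename="%s"' % os.path.basename(download_name),
    )
    # The web server replaces the empty body with the file's contents.
    response.setHeader("X-Sendfile", file_path)
    return ""  # empty body: the application never reads the file
```

The handler returns an empty body on purpose: the Zope thread is released as soon as the headers are set, and Apache does the slow part.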
18. Blob Storage
• Uploads
• Blob.consumeFile moves file from
Apache’s temp area to blob storage
(ZODB/blob.py)
• Uses os.rename, file never enters Plone
• Downloads
• Served directly from blob storage
19. Upload Process
The file data is written to local disk. Blob.consumeFile is then called with parameters from the Request
that contain the location of the file.
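The benefit of a rename is that Python never reads the file's bytes. The sketch below illustrates that idea with the standard library only. It is not ZODB's implementation, and `adopt_upload` is a hypothetical helper:

```python
import os
import shutil


def adopt_upload(temp_path, storage_dir, name):
    """Move an uploaded temp file into storage_dir and return the new path."""
    os.makedirs(storage_dir, exist_ok=True)
    target = os.path.join(storage_dir, name)
    try:
        # Same filesystem: a metadata-only operation, however large the file.
        os.rename(temp_path, target)
    except OSError:
        # For example a different filesystem: fall back to copy-and-delete.
        shutil.move(temp_path, target)
    return target
```

A rename is only cheap when source and destination are on the same filesystem, so keeping the upload directory and the blob storage on one volume preserves the fast path.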
20. What About Really
Really Big Files?
• Use FTP
• Supports continuation and batching
• Handles files too large for browser limits
• Content editors use FTP to transfer files to
an upload directory
SFTP guarantees continuation
22. Uploading with FTP
For very large file uploads (which may run into browser limits), the file is uploaded using SFTP to support continuation. The file
name is passed via Plone to Blob.consumeFile and the file is processed in a similar manner.
23. ore.bigfile
• Minimally intrusive, works with the grain of
Plone
• Provides Big File content type
• IFrontendFileServer interface defines two
methods that provide web server support
for upload and download
• Apache and Nginx implementations
provided
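A two-method interface with one implementation per web server might look like the sketch below. The slides do not give the real method names or signatures, so every name here (`FrontendFileServer`, `upload_path`, `serve`, the `file_path` field, the `/blobs/` prefix) is hypothetical, and a plain dict stands in for the response headers:

```python
from abc import ABC, abstractmethod


class FrontendFileServer(ABC):
    """Hypothetical stand-in for IFrontendFileServer: one method per direction."""

    @abstractmethod
    def upload_path(self, form):
        """Return the on-disk path of the file the web server already saved."""

    @abstractmethod
    def serve(self, headers, file_path):
        """Set the headers that make the web server stream file_path."""


class ApacheFileServer(FrontendFileServer):
    def upload_path(self, form):
        return form["file_path"]  # hypothetical field name

    def serve(self, headers, file_path):
        headers["X-Sendfile"] = file_path


class NginxFileServer(FrontendFileServer):
    def upload_path(self, form):
        return form["file_path"]  # hypothetical field name

    def serve(self, headers, file_path):
        # Nginx's X-Accel-Redirect takes an internal location URI rather
        # than a filesystem path; the "/blobs/" mapping is an assumption.
        headers["X-Accel-Redirect"] = "/blobs/" + file_path.lstrip("/")
```

Keeping the web-server specifics behind one small interface is what lets the content type itself stay unaware of which front end is deployed.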
24. ore.bigfile
Limitations
• Upload directory is hardcoded
• Possibility of error on very large images
which Mod Porter intercepts
25. Versioning Big Files
CMFEditions has a file size limit of 34 MB.
It also makes a new copy of the file for every version, even if only the metadata changed.
26. Solution
• Bypass CMFEditions - no file size limitation
• Create a new version only when file
changes (not metadata)
• Allow old versions to be purged
• Version information stored on Big File
object using annotations
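The versioning rules above can be sketched with a dict standing in for the object's annotations. The slides do not say how ore.bigfile detects that a file changed. Content hashing is used here as one possible approach, and the key and function names are hypothetical:

```python
import hashlib

VERSIONS_KEY = "bigfile.versions"  # hypothetical annotation key


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(annotations, path):
    """Append a version only if the file content changed; return True if added."""
    versions = annotations.setdefault(VERSIONS_KEY, [])
    digest = file_digest(path)
    if versions and versions[-1]["digest"] == digest:
        return False  # metadata-only edit: no new version
    versions.append({"digest": digest, "path": path})
    return True


def purge_old_versions(annotations, keep=1):
    """Drop all but the newest `keep` versions; return the removed entries."""
    versions = annotations.get(VERSIONS_KEY, [])
    cut = max(len(versions) - keep, 0)
    removed = versions[:cut]
    del versions[:cut]
    return removed
```

Hashing a multi-gigabyte file is not free, so a real implementation might instead treat every new upload as a new version and skip the comparison.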
27. Conclusion
• ore.bigfile solves the Big File problem for a
particular use case, not feature complete
• It does so by taking advantage of mature
web server technology
• The code is minimally intrusive
• It provides a strategy for implementation
we can learn from as we improve Plone’s
Big File story
29. http://svn.objectrealms.net/view/public/browser/ore.bigfile
Questions
Why not Tramline?
- older, not blob-aware, no ftp, no versioning
- requires modification of mod_python