Web Archiving Whitepaper Aleph Archives
- 1. Web Archiving for
Compliance & eDiscovery
ALEPH ARCHIVES Ltd. ✉ 600 Blv de Maisonneuve suite 1700 - Montréal, Québec (Canada) / chemin des Croix-Rouges 16 - 1007 Lausanne (Switzerland)
✎ info@aleph-archives.com ☞ aleph-archives.com
- 2. Copyright © 2012 Aleph Archives. All Rights Reserved.
WEB ARCHIVING
INTRODUCTION
Quick access to digital data and electronic information stored online is a «must have» when preparing
a strategy for litigation or for a statutory compliance dispute.
Many obstacles, however, stand in the way of providing and managing such access efficiently, given
both the frequent complexity of these disputes and the legal context that must be taken into account.
It is often impossible, or too late, to obtain the relevant information when it is needed, for example
during eDiscovery processes.
Aleph Archives is an IT service provider dedicated to companies with specific needs regarding Web
content preservation. Aleph Archives offers turnkey tools to easily and efficiently retrieve relevant data
stored online.
According to recent research, the average life expectancy of a website is less than 75 days, and
disputes over the content of websites are on the increase. In a number of countries, regulations govern
compliance and archiving in the different industry sectors (e.g. Sarbanes-Oxley Act, Health Insurance
Portability and Accountability Act, Gramm-Leach-Bliley Act and Federal Rules of Civil Procedure in the
US), and authorities supervise those sectors on that basis (e.g. SEC and FINRA in the US, Financial
Services Authority in the UK).
Through a unique cloud-based Web archiving platform named CAMA®, Aleph Archives provides
«Web Preservation» services for regulatory compliance, litigation support and eDiscovery, helping
corporate entities and legal and governmental authorities collect, manage and archive their large and
growing volume of Web content. CAMA® is the only platform that archives and keeps records of
your websites, webpages, and web presence at large. CAMA® provides clear evidence of the website
content that was shown to a particular end user during a visit and, equally important, of the content
(and hence the data) that was not.
Web archiving for eDiscovery is a recent "technological niche", as opposed to legacy eDiscovery,
which has been used for years to preserve electronic data (e.g. email, files). The Web archiving
eDiscovery process is based on three main features, as outlined by the Electronic Discovery
Reference Model: thorough gathering of electronically stored information from websites, full access
to and playback of any archived web content, and conversion to a form that allows full-text search.
PRODUCTS & INNOVATION
CAMA® Web Archiving Platform
Aleph Archives is a pioneer in the domain of Web archiving. We offer high-quality archive accessibility
and rendering. With CAMA®, Aleph Archives takes the web archiving process and the related quality
assurance (QA) to a higher level by working with crawl engineering experts, dedicated QA teams and
a powerful yet easy-to-use archive access technology (see footnote 1).
[Figure: CAMA® in action: archived (07/04/2011) version of Toyota’s corporate website.
Callouts: «Load the archived version with a click»; «Testimonials and videos».]
Aleph Archives targets companies in need of strict, reliable archiving processes to ensure compliance
with SEC and FINRA regulations. The CAMA® Web archiving platform is more efficient and more
reliable than the solutions of its main competitors. Aleph Archives offers technology that is open
(WARC, ISO 28500:2009; see footnote 2), adaptive (cloud-based computing) and innovative (scheduled
crawls, export of Web archives as PDF/PNG, antiviral check, CAMA® Appliance, real-time deduplication
of search results, multilingual search and translation, etc.).
1 Products demo at: http://www.youtube.com/user/alepharchives/
2 WARC ISO file format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
CAMA® belongs to the category of « client-based & web-served » archiving solutions (refer to
Appendices A and B for more details), which allow creating and maintaining stable, time-structured,
verifiably authentic and independent versions of the corporate web presence, « social media » included.
[Figure: CAMA® in action: archived (05/10/2011) version of AerzteZeitung online, a German newspaper.
Callout: «Play all embedded videos as usual».]
Aleph Archives’s strategy aims to satisfy all of its clients: CAMA® offers high-quality archived
websites (which can be filed as evidence in case of litigation), easy-to-use browsing and access tools,
and a fully Web-based service that reduces costs (refer to Appendix C).
[Figure: live version (08/02/2011) of the NY Daily newspaper.]
[Figure: CAMA® in action: archived (10/05/2011) version of the NY Daily newspaper.
Callouts: «Timeline»; «QR code, Digital Signing, and Timestamping»; «Options Pane».]
MARKET SECTORS: who is
CAMA® suitable for?
Corporates
a. E-Discovery
Litigation Protection: Websites contain a growing proportion of business records that must be
preserved for long periods of time. This content is frequently requested during discovery proceedings
because of the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is
critical that all relevant electronic content be made available for e-discovery purposes.
Legal Hold: When a hold on data is required, it is imperative that an organization immediately begin
preserving all relevant data. Our web archiving platform CAMA® allows organizations to immediately
place a hold on data when requested by a court or on the advice of legal counsel. If an organization is
not able to adequately place a hold on data when it is obligated to do so, it can suffer a variety of
serious consequences, ranging from embarrassment to major legal sanctions or heavy fines.
b. Regulatory Compliance
For just about every organization, there is a large and growing number of regulatory obligations to
preserve electronic content. Some of the more important requirements are:
• Sarbanes-Oxley Act of 2002
• Health Insurance Portability and Accountability Act of 1996 (HIPAA)
• Securities and Exchange Commission Rules (SEC)
• Financial Industry Regulatory Authority (FINRA)
• Model Requirements for the Management of Electronic Records (MoReq)
c. Maintain Corporate Memory & Knowledge Management
Web archiving can be very useful for maintaining a corporate record of what has been posted to a Web
site, how long this content was maintained or when it was replaced. For example, a company may want
a record of its Web site for historical purposes, or it may need an archive in order to re-use some of its
content at a later date. Maintaining an accurate archive of Web content can significantly reduce the
costs associated with recreating this content.
Government
Virtually all government agencies have regulatory obligations to preserve electronic content. Because
your agency’s online content is increasing in both complexity and volume, and because governments
are held accountable for the information they publish on the web, you need to employ a records
retention policy.
The 2006 changes to the Federal Rules of Civil Procedure indicate that all organizations (including
governments) must be able to find, capture, and produce electronically stored information that might be
relevant to a judicial or regulatory request. This can’t be done with server backups, CMS revision
control, or other outdated methods. You need a solution that can provide indisputable proof of the
integrity and authenticity of your online records (as required by the Federal Rules of Evidence).
For example, 2010 saw the Executive Office of the President (EOP) issue a solicitation to:
« Provide the necessary services to capture, store, extract to approved formats, and transfer content
published by EOP on publicly-accessible web sites, along with information posted by non-EOP persons
on publicly-accessible web sites where the EOP offices under PRA maintains a presence, throughout
the term of the contract. »
Other requirements come from:
• Presidential Records Act (PRA)
• National Archives and Records Administration (NARA)
• E-GOV - electronic records management initiatives
• Guidance on Managing Records in Web 2.0/Social Media Platforms, October 20th, 2010
• Library of Congress
• Federal Rules of Civil Procedure (FRCP)
• Department of Commerce
• Department of Energy
• Department of Justice
• Environmental Protection Agency
• Office of Management & Budget
• Securities and Exchange Commission Rules (SEC)
• Library & Archives Canada
Website and « social media archiving » (see footnote 3) is a good solution for e-discovery preparedness.
Aleph Archives technology uses web bots (i.e. crawlers) that capture all web pages (including social
media). The web pages are stored exactly as they are captured (including links, rich media, video, and
Flash), which satisfies regulatory requirements for digital records. Aleph Archives also provides a digital
timestamp and signature for each archived page, ensuring data integrity and authenticity. With this SaaS
solution (no tedious software installation), governments can sign up and begin archiving in less than
an hour.
Adopting a web archiving policy is essential, and not just for big cities or the federal government.
Aleph Archives’s pricing is competitive so that even small towns can stay prepared.
The Internet will only continue to grow in scale and complexity, and governments are increasingly
interested in how it can be used for civic growth and development. The issue of records retention must
be addressed from the start, so that agencies can move forward confidently online.
« Government websites are public records and must be archived to comply with
Public Records Laws. Start archiving now. »
Finance
Online marketing/communications can present a challenge for securities traders, investment advisors,
banks, and others in the financial services industry. The benefits of advancing technologies must be
weighed against the risks associated with non-compliance in the area of books and records retention.
Failure to meet the demands of industry standards can result in hefty fines and bad publicity.
Multiple sets of guidelines for the financial industry (from the SEC, FDIC, FSA, SOX, FINRA, and
others) demand the preservation of business records (both paper AND electronic) in such a way that
the data can be reproduced for a regulator in a timely and complete manner. These requirements are
now being extended to include newer tools such as social media platforms, and FINRA has advised
that no compliance grace period will be in effect for these new technologies.
It’s critical that firms implement a robust records retention policy for their websites and « social media
pages ». Should your corporate web presence be investigated or questioned, a perfect representation
of your company’s online activity is a necessity, and that is exactly what CAMA® provides.
« Website archiving is vital to fulfilling many key FINRA and SEC regulations.
Start complying today. »
3 Twitter and Government Transparency
Food and Drugs Companies
In archiving their electronic data, publicly traded companies need to comply with the records
management regulations of the Sarbanes-Oxley (SOX) Act.
The past year has seen a dramatic increase in the FDA’s enforcement of regulations that deal with
product claims and labeling. In an effort to be more proactive, the agency has been investigating
companies for compliance with the FD&C Act, particularly section 403 A, which deals specifically
with product descriptions and claims. As a result, a number of companies have received warning
letters addressing the product claims made on their labels or websites. These letters are viewable
online, which damages brand reputation.
Since most marketing now happens via websites, social media, and other Internet tools, it is of the
utmost importance for your company to have a reliable, accurate archive of all online activity. Should
your claims be investigated or questioned, defensible evidence of your website’s precise content is a
necessity, and that is exactly what CAMA® provides.
Using crawling technology, we take automated snapshots of your website. Only new or changed
pages are archived, saving storage space. The whole process is automatic: you don’t have to
remember to do anything.
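As an illustration of archiving only new or changed pages, the following minimal sketch compares a content digest of each page with the digest stored at the previous crawl. It is not CAMA®'s actual implementation; the function names, the URLs and the use of SHA-256 are assumptions made for the example.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact page content."""
    return hashlib.sha256(body).hexdigest()


def select_pages_to_archive(snapshot, last_digests):
    """Keep only pages that are new or whose content changed since the last crawl.

    snapshot:     {url: page bytes} from the current crawl (hypothetical structure)
    last_digests: {url: digest} recorded at the previous crawl
    """
    to_archive = {}
    for url, body in snapshot.items():
        if last_digests.get(url) != content_digest(body):
            to_archive[url] = body
    return to_archive


# Hypothetical example data: one unchanged page, one changed page, one new page.
previous = {
    "https://example.com/": content_digest(b"<h1>Home</h1>"),
    "https://example.com/about": content_digest(b"<h1>About</h1>"),
}
current = {
    "https://example.com/": b"<h1>Home</h1>",
    "https://example.com/about": b"<h1>About us</h1>",
    "https://example.com/news": b"<h1>News</h1>",
}
changed = select_pages_to_archive(current, previous)
print(sorted(changed))
```

In this example the unchanged home page is skipped, while the modified and the new page are selected for archiving.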
« Have a reliable, accurate and defensible archive of all online activity. »
Law firms
Companies creating content online, or their law firms, can use CAMA® to provide legal proof of
intellectual property. CAMA® provides each page with a digital timestamp and a digital signature that
cannot be altered without detection and hence creates legal proof of copyright. This trusted, irrefutable
evidence stands up in a court of law if copyright ownership is ever questioned.
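The idea of a timestamp and signature that cannot be altered without detection can be sketched as follows. This is a simplified stand-in, not CAMA®'s mechanism: it uses a keyed HMAC from the Python standard library instead of a public-key signature and a trusted timestamping authority, and the key, function names and record fields are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key; a real service would use a protected signing key.
SECRET_KEY = b"demo-key-not-for-production"


def seal_page(url: str, body: bytes, captured_at: str) -> dict:
    """Bind the URL, capture time and content digest together under one tag."""
    digest = hashlib.sha256(body).hexdigest()
    message = f"{url}|{captured_at}|{digest}".encode("utf-8")
    tag = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return {"url": url, "captured_at": captured_at, "sha256": digest, "tag": tag}


def verify_page(record: dict, body: bytes) -> bool:
    """Recompute the tag from the presented content and compare it in constant time."""
    digest = hashlib.sha256(body).hexdigest()
    message = f"{record['url']}|{record['captured_at']}|{digest}".encode("utf-8")
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


page = b"<html><body>Original product claim</body></html>"
record = seal_page("https://example.com/product", page, "2011-05-17T00:00:00Z")
print(verify_page(record, page))                # unchanged content
print(verify_page(record, page + b"<!-- -->"))  # altered content
```

Any change to the page bytes, the URL or the recorded capture time produces a different tag, so the alteration is detected at verification.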
« Use websites as legal evidence in court. Have CAMA® create integral and
authentic evidence with support for e-Discovery. »
“ This Court sees no reason to treat web
sites differently than other electronic files. ”
Arteria Prop. Pty Ltd. vs. Universal Funding V.T.O., Inc
CAMA® for Social Media e-Discovery
Organizations and their employees are leveraging social media tools at unprecedented levels. With
over 150 million blogs, an average of 140 million tweets every day, and more than 800 million users of
social media sites worldwide (Facebook, LinkedIn, MySpace...), organizations are challenged to define
usage policies and implement solutions to appropriately govern, discover and preserve relevant
information from these complex and malleable data sources. Complicating the challenge of performing
discovery on social media sites is the fact that these sites also include rich media such as audio and
video, adding to an already complex environment. Legacy tools and manual processes cannot
effectively manage the risk associated with social media sites and interactive content.
To successfully manage discovery of social media and protect themselves from potential risk,
organizations must embrace new technologies to harness and understand the meaning of the social media
content. Since social media content can be subject to legal hold if it contains relevant information,
legal teams must be prepared to search, identify, preserve and collect this information. Social media
sites must be managed like other enterprise data sources, as part of a comprehensive Social Media
eDiscovery and information governance program. Given the complexity and volume of social media
content, legal teams must be prepared with an automated solution that can understand meaning and
cull through voluminous data sources to find relevant information.
According to a report issued by Gartner, Inc., a leading technology research and advisory firm, half of
all companies will have been asked to produce material from social media sites for e-Discovery by the
end of 2013. Debra Logan, vice president and distinguished analyst at Gartner, wrote:
« In e-Discovery, there is no difference between social media and electronic or even paper artifacts.
The phrase to remember is if it exists, it is discoverable. Unique aspects of social media present
additional challenges, but as with an overall information governance strategy, the key to avoiding or
mitigating potential legal issues in the use of social media for business purposes is to have a governance
framework, policy and user education. »
In addition to the challenge of meeting the legal hold and preservation obligation, organizations,
including those in the Financial Services, Healthcare, and Pharmaceutical industries, must ensure that
employees are not violating regulations by creating or posting non-compliant content. As regulators
recognize the influence and risks associated with social media channels, they are beginning to require
organizations to actively monitor and govern employees' social media interactions.
For instance, FINRA (Financial Industry Regulatory Authority) Regulatory Notice 10-06 requires
member firms to supervise and archive content posted to social media sites. The Food and Drug
Administration (FDA), Federal Trade Commission (FTC), and the National Futures Association (NFA) are also
developing rules associated with the use of social media, and the Federal Courts have issued
guidelines for monitoring and managing social media site usage (see Resources & Links section).
For example, if you don’t have an archiving system, you could be in trouble trying to find something
you posted.
[Figure: CAMA® in action: archived (05/17/2011) version of NYTimes newspaper on Facebook.
Callouts: «Loading archived version»; «All media types (Flash, photos, videos, posts...) are preserved
in their native format»; «All links are clickable. Browse the archived pages, play videos, load images...».]
According to Facebook (see footnote 4):
« Currently, you can only search for content that has been posted in the last 30 days. The range of the
search history may be expanded in the future. »
4 The same applies to Twitter and LinkedIn; see Archiving Social Media prepares you for e-Discovery
Aleph Archives’s advanced web archiving platform for e-Discovery enables organizations to
proactively manage, search for, identify and preserve any social media content. CAMA® enables
organizations to take advantage of the power and business value of social networks, while ensuring FRCP
and regulatory compliance.
Unique Selling Proposition
The main competitive advantages of the CAMA® platform are:
• superior technology to capture multiple web formats in dynamic websites,
• more comprehensive web archiving process with crawl engineering experts,
• high-quality archive accessibility and rendering,
• Universal Archives View (UAW) independent from OSes and browser types or versions,
• optimized full-text search engine tailored to very large web archive collections (billions of documents),
• deduplicated full-text search results in real-time,
• daily archiving capabilities,
• support of WARC ISO file format,
• dedicated quality assurance teams and processes,
• ability to be deployed over commodity machines,
• fault tolerant software design,
• high availability (see footnote 5).
CAMA® is the only solution on the market that can run without an Internet connection while
accessing the archives and that can also be fully deployed « In-House » (i.e. inside the
customer’s infrastructure). The « In-House » solution gives you the freedom to exploit the full potential
of CAMA® (training required).
DISASTER & DATA RECOVERY
« Your data safe and secure »
Aleph Archives’s “retention service” includes shadow copies of your archived
data in geographically distinct locations (USA, Canada, Switzerland, France). This
means that two copies of your web archives exist at any given time to provide
high data availability and avoid data loss.
5 See our Service Level Agreement (SLA)
Pricing Model
Cloud-based solution
This section describes the implementation process for Aleph’s enterprise web archiving service and
the pricing for the Set Up phases and for the provision of archive services thereafter.
Aleph may calculate the fees using one of two methods of estimation.
1. Where requirements are not fully defined, a simple overall price can be provided, which will be
based on the size and scope of the archive policy in broad terms. A breakdown of these fees may be
provided for transparency.
2. Where requirements are more fully defined, a more rigorous approach to estimating fees may be
used. This will provide a price per URL (i.e. archived resource), which will be more accurate than the
simple overall price, in that it is based on the specifics of an archive strategy defined by the more
detailed requirements. Three parameters are involved here: the scope, the frequency, and the price per
URL.
• The scope defines which URLs are "in" a particular crawl: the list of URLs the customer would
like to archive.
• The archiving frequency for each scope can be daily, weekly, monthly, quarterly, or annual.
Aleph Archives is the only web archiving company offering a daily archiving service.
• The price per URL is composed of:
‣ System administration charges;
‣ Archiving services fees;
‣ Infrastructure and storage costs (retention, data integrity, data security, etc.).
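As a rough illustration of how the three parameters could combine, the sketch below multiplies the number of URLs in scope by the number of captures per year and by a price per URL. The formula, the function name and the figures are assumptions for the example only; they are not Aleph Archives's actual pricing.

```python
# Captures per year for each archiving frequency named above.
CAPTURES_PER_YEAR = {
    "daily": 365,
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "annually": 1,
}


def annual_fee_cents(url_count: int, frequency: str, price_per_url_cents: int) -> int:
    """Estimate a yearly fee, assuming every URL in scope is billed at each capture.

    Amounts are kept in integer cents to avoid floating-point rounding.
    """
    return url_count * CAPTURES_PER_YEAR[frequency] * price_per_url_cents


# Hypothetical scope of 500 URLs at a made-up price of 2 cents per archived URL.
weekly = annual_fee_cents(500, "weekly", 2)
daily = annual_fee_cents(500, "daily", 2)
print(weekly, daily)
```

Under these assumptions, moving the same scope from weekly to daily archiving multiplies the estimate by 365/52, which shows why scope and frequency are priced together.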
InHouse solution
Customers interested in the InHouse version of CAMA® are welcome to contact us for a quote.
APPENDIX A.
Web Archiving Policy
A web archiving policy is the only means of creating and maintaining a stable, time-structured,
verifiably authentic and independent version of the corporate web presence. « Independent » means that
access to the content must be possible without requiring the original CMS version to be installed,
configured and running. Having a web archiving policy is the only way the corporate Web-publishing
infrastructure can evolve without threatening accessibility to legacy content. It is also the only way to
avoid the continuous licensing and maintenance costs of legacy CMSs.
A substantial and enduring web archive can be achieved by generating a flat, stable and
time-structured version of the published content, capturing authentic snapshots according to the corporate
archiving policy. These snapshots must be taken as user-centric views of the content, i.e. accurately
reflecting the user’s experience of that particular content. In addition, they must be stored and made
accessible in precisely the same form, thereby meeting legal and compliance requirements as
authentic copies. They must also enable discovery using familiar web paradigms such as full-text search, as
well as more sophisticated e-discovery techniques including metadata, tagging, filters and complex
search.
A1. How to choose your web archiving solution?
Web archiving has made significant progress during the last five to seven years. It now offers a choice
of approach to both policy and supporting technology. These choices should be considered carefully
against business objectives before the decision is made. The main differences lie in the capture and
access methods used.
Three different methods exist to capture and archive web content:
a. client-side archiving
b. transaction archiving
c. server-side archiving
A2. Client-side Archiving
« Client-side archiving » uses an archival crawler, derived from search engine crawler technologies,
with significant enhancements to ensure that complex and hard-to-reach content can be found and
captured, as well as stored without change. Starting from seed pages or entry points, these tools
automatically capture pages and parse them to extract all links. The process repeats and continues as
long as newly discovered pages remain within the scope defined for the crawl. The captured web
content and embedded files are stored unchanged, as original and authentic copies that are an exact
equivalent of what the generic user would have received in their browser at the time, and preserved in a
flat, standards-based and self-contained file format that can be confidently considered future-proof.
This is especially important within a legal context.
To be effective this method requires a crawler with excellent link extraction and path-finding algorithms
that can work in a wide range of circumstances and site/page designs. In addition to client-side
archiving, there are two alternative methods to capture web content. Both methods need to be operated
from the server side, require prior authorisation to services, and need access to both front-end and
back-end servers.
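The crawl loop described above (start from seeds, capture each page, extract its links, continue while new pages stay in scope) can be sketched as follows. This is a toy illustration, not the CAMA® crawler: the site is an in-memory dictionary standing in for HTTP fetches, only `<a href>` links are extracted, and all names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, in_scope):
    """Breadth-first capture of all in-scope pages reachable from the seeds."""
    captured = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:  # page could not be retrieved
            continue
        captured[url] = body  # stored unchanged
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, fragment dropped
            if link not in seen and in_scope(link):
                seen.add(link)
                queue.append(link)
    return captured


# Hypothetical three-page site with one out-of-scope external link.
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://other.example/">Ext</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="team#staff">Team</a>',
    "https://example.com/team": "<p>Team</p>",
}
archive = crawl(
    ["https://example.com/"],
    SITE.get,
    lambda u: u.startswith("https://example.com/"),
)
print(sorted(archive))
```

A production crawler would also have to follow links hidden in scripts, forms and embedded media, which is where the link extraction quality mentioned above matters.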
A3. Transaction Archiving
The first of these alternative methods, called « transaction archiving », consists of the systematic
capture and archiving of all browser/server exchanges (request/response pairs), resulting from the
interaction of users with sites, regardless of their content type and how they are produced.
Transaction archiving enables tracking and recording of every actual instantiation of content in an
authentic flat HTML form, easy to maintain and preserve over time. Moreover, it can be used to archive
hidden web content, provided this content is requested, i.e. read, by the websites’ users during the
capture time.
However, transaction archiving generates unnecessary duplicates of frequently-visited pages and
raises serious privacy concerns as the method implicitly relies on usage tracking.
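To make the duplication issue concrete, the sketch below logs request/response pairs and counts how many logged exchanges repeat an identical request and response. The log structure and function names are hypothetical and only illustrate the principle.

```python
import hashlib
from collections import Counter


def record_exchange(log, method, url, status, body):
    """Append one browser/server exchange as a (request, response) pair."""
    response = (status, hashlib.sha256(body).hexdigest())
    log.append({"request": (method, url), "response": response})


def duplicate_count(log):
    """Count exchanges that repeat an already-seen request with an identical response."""
    counts = Counter((entry["request"], entry["response"]) for entry in log)
    return sum(n - 1 for n in counts.values())


exchanges = []
record_exchange(exchanges, "GET", "https://example.com/", 200, b"<h1>Home</h1>")
record_exchange(exchanges, "GET", "https://example.com/", 200, b"<h1>Home</h1>")
record_exchange(exchanges, "GET", "https://example.com/news", 200, b"<h1>News</h1>")
print(len(exchanges), duplicate_count(exchanges))
```

Every visit to a popular, unchanged page adds another identical pair to the log, which is the redundancy the paragraph above refers to.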
A4. Server-side Archiving
The second, and more obvious, alternative to client-side archiving is « server-side archiving ». This
consists of directly copying files in the document folders to back-up servers. Although it might appear
to be the simplest approach, it is in fact seriously flawed, from both the preservation and archive
access points of view.
To make certain that any web content archived using this method can be properly restored,
server-side archiving requires that all original CMSs, databases and other software be archived alongside the
content or be actively maintained in an operational state, or that the content be migrated to newer
CMSs, databases, etc. In any case, these activities will be required for the whole period of archive
retention. Interestingly, IT backups essentially rely on this method in almost all cases, systematically
failing to provide the long-term preservation and access capabilities that are essential for legal and
compliance requirements. However, for some types of hidden-web content, this method can prove to be
useful, mainly in situations where it is required to archive parts of websites that a client-side crawler
cannot reach.
A5. Comparison of Content Capture Methods
The following table summarises the main content capture methods, where: ✔ = fully supported
and ● = possible/custom development.
Server-side Transaction Client-side
Content captured as user sees it, unchanged, and authentic ✔ ✔
Archive access independent of original publishing technology ✔ ✔
Able to capture interactive or query based content ✔ ✔ ●
Retains web URL space (not dependent on server link mapping) ✔ ✔
De-duplication possible ● ✔
Easily directed and scheduled capture ✔ ✔
Flexible archival scope, for a wide range of needs ✔ ✔
Able to capture browser/server exchanges (request/response pairs) ✔
Web server technology independence ✔
Archiving services can be centralized in one place ✔
Cost effective and efficient operations over time ✔
In most cases client-side archiving is the best approach for capturing content. The quality of the
resulting archive will depend mainly on the capabilities of the crawler, particularly with respect to link
extraction, even when links are encoded in scripts and executables. This is one of the key determinants
for capture of all files in a consistent and timely manner.
APPENDIX B.
Accessing your Web Archives
Two different methods exist to provide access to archives:
a. website-copier approach
b. Web-served approach
The choice is largely determined by how the files are stored. This is critically important, because web
URLs use different naming conventions from file systems, with different permissible and reserved
characters, escaping rules, case sensitivity, etc.
B1. Website-copier Approach
Website copiers write all captured files directly to disk, and therefore need to modify names and links
as they are stored in order to make the archive accessible. This results in an archive that is not an
authentic version of the original server’s response stream.
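The following sketch shows why a copier has to rewrite names: a URL is mapped to a file-system-safe path by replacing reserved characters and folding case. The mapping rules and the function name are invented for the illustration and do not correspond to any particular website copier.

```python
import re
from urllib.parse import urlsplit


def url_to_local_path(url: str) -> str:
    """Map a URL to a file-system-safe relative path (lossy, for illustration only)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directories need a file name on disk
    if parts.query:
        path += "_" + parts.query  # the query string must become part of the name
    # Replace characters that are reserved on common file systems or in queries.
    safe = re.sub(r'[<>:"|?*\\=&]', "_", path)
    # Fold case, as a case-insensitive file system effectively would.
    return parts.netloc + safe.lower()


first = url_to_local_path("https://example.com/Search?q=a&lang=en")
second = url_to_local_path("https://example.com/search?q_a=lang_en")
print(first)
print(first == second)
```

Two distinct URLs end up with the same local name here, and every link pointing to them would have to be rewritten to that name, so the stored copy no longer matches what the server originally sent.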
B2. Web-served Approach
Archive web servers, on the other hand, store responses from the original server unchanged in
container files. This ensures the content and server response stream are kept in an authentic form.
The emerging standard for web archive container files is WARC (see footnote 6), the Web ARChiving
file format, ISO standard ISO 28500:2009. It is already being adopted as the foundation for web archive
storage and preservation. A WARC file records the sequence of harvested web files captured by the
crawler, each page preceded by a header containing metadata that briefly describes the harvested
content, its length and checksum.
WARC ensures the preservation of the original naming scheme and linking, thereby providing archive
storage of content in an authentic form, as well as providing the means for additional integrity checks
during the entire period of custodianship.
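A simplified sketch of such a record is shown below: a header block carrying the target URI, capture date, length and checksum, followed by the unchanged content. It follows the general layout of a WARC `resource` record but is only an illustration; it is not a complete or validated implementation of ISO 28500, and the function names are hypothetical.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def sha1_base32(data: bytes) -> str:
    """SHA-1 digest in Base32, the form commonly used in WARC digest fields."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def build_warc_record(target_uri: str, payload: bytes, captured_at: datetime) -> bytes:
    """Assemble one simplified 'resource' record: header lines, blank line, content."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: text/html",
        f"WARC-Block-Digest: sha1:{sha1_base32(payload)}",
        f"Content-Length: {len(payload)}",
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + payload + b"\r\n\r\n"


def verify_warc_record(record: bytes) -> bool:
    """Integrity check: recompute length and checksum from the stored content."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")[1:]  # skip the version line
    fields = dict(line.split(": ", 1) for line in lines)
    length = int(fields["Content-Length"])
    payload = rest[:length]
    return (
        len(payload) == length
        and fields["WARC-Block-Digest"] == "sha1:" + sha1_base32(payload)
    )


capture_time = datetime(2011, 5, 17, tzinfo=timezone.utc)
record = build_warc_record("https://example.com/", b"<h1>Hello</h1>", capture_time)
tampered = record.replace(b"Hello", b"Jello")
print(verify_warc_record(record), verify_warc_record(tampered))
```

Because the original URI is kept in the header rather than encoded in a file name, the content can be served back under its original address, and the stored checksum allows the integrity check to be repeated at any time.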
6 WARC file ISO format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
B3. Comparison of Access Methods
The following table summarises the main archive access methods, where: ✔ = fully supported
and ● = possible/custom development.
Feature                                              | Website Copy | Web-served Archive
Searchable                                           | ✔            | ✔
Browsable                                            | ✔            | ✔
Content directly navigable from disk                 | ✔            | ●
Content stored and accessed unchanged, and authentic |              | ✔
Links independent of naming conventions              |              | ✔
Storage and preservation of metadata                 | ✔            | ✔
Access independent of file location                  |              | ✔
Standards-based archives                             |              | ✔
There is a consensus today that the website-copier approach has serious limitations concerning
authenticity of the archive, whereas the Web-served approach can ensure authenticity by design. In
professional use therefore, especially where legal and regulatory obligations are business priorities, the
Web-served approach is a necessity.
APPENDIX C.
Web Archiving as a Best Practice
The web has matured into a central communication channel for businesses and government agencies,
with digital media (websites and other web-based content) all but replacing print media as the primary
mode of communication with customers, constituents, prospects, investors, and others.
Organizations using the web must keep accurate records of web content: online communication is
just as much of a liability as any other form of communication. As a recent case ruled: « This Court sees
no reason to treat websites differently than other electronic files. »
Web archiving has become a best practice for any organization using the web to communicate.
Organizations that neglect to retain accurate records of their web presence are placing themselves at
unnecessary risk, from both a compliance and a litigation standpoint.
Protect your organization by regularly archiving web content with the Aleph Archives Web Archiving
Platform CAMA®. We provide all the technology and services you need to archive your websites and web
presence from any domain.
APPENDIX D.
ALEPH ARCHIVES’s CAMA® PLATFORM
ARCHITECTURE OVERVIEW
More details about the architecture internals are available upon request.
APPENDIX E.
Elements of a Web Archiving Plan
Setup
Aleph Archives runs, tests, and calibrates the CAMA® robots to get the best rules in order to capture
your website(s) with the highest quality.
Capture
The cost related to website crawl and engineering of the target URL’s on a specified frequency.
Retention
The annual cost of storing and retaining archives of the target websites. The standard plan calls for
7 years of retention.
Operation
Includes keeping the designated servers and machines up and running for CAMA®, archive access,
retention, and quality assurance.
Quality Assurance (QA)
- QA Level 1: we check and verify one level deep (depth 1) from the website root (i.e., the home page).
- QA Level 2: we check and verify two levels deep from the root, and so on for QA Level 3 and
QA Level 4.
QA can go as far down in website depth as the client needs. In industry practice, QA Level 4 is
sufficient for most enterprises for regulatory compliance, legal, and operational purposes.
- Exhaustive QA: we check and verify all designated websites and levels, verifying every page to
the website’s full depth. Exhaustive QA may be cost-prohibitive, depending on the customer’s
requirements. Upon request, Aleph Archives will provide a price quotation for Exhaustive QA.
- Mixed QA: we combine a sampled QA per website level with an exhaustive QA to a certain level.
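The QA levels above can be read as depth limits on the website's link graph. The following Python sketch illustrates that reading only; it is not CAMA® code. The function names, the sample link graph, and the interpretation of Mixed QA (every page down to a given level, then a fixed-size random sample from each deeper level) are assumptions made for illustration.

```python
import random
from collections import deque


def pages_within_depth(links, root, max_depth=None):
    """Breadth-first walk of a link graph.

    Returns {url: depth} for every page reachable from `root` in at most
    `max_depth` link hops. max_depth=None means no limit (Exhaustive QA).
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        if max_depth is not None and depths[url] == max_depth:
            continue  # do not follow links beyond the requested QA level
        for target in links.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


def mixed_qa_selection(depths, exhaustive_depth, sample_size, seed=0):
    """Assumed reading of Mixed QA: all pages down to `exhaustive_depth`,
    plus up to `sample_size` randomly chosen pages from each deeper level."""
    rng = random.Random(seed)
    selected = [url for url, d in depths.items() if d <= exhaustive_depth]
    deeper = {}
    for url, d in depths.items():
        if d > exhaustive_depth:
            deeper.setdefault(d, []).append(url)
    for d in sorted(deeper):
        level = sorted(deeper[d])
        selected.extend(rng.sample(level, min(sample_size, len(level))))
    return selected


# Hypothetical link graph: each page maps to the pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/cama"],
    "/products/cama": ["/products/cama/specs"],
    "/about": ["/"],
}

qa_level_1 = pages_within_depth(site, "/", 1)   # home page and its direct links
qa_level_2 = pages_within_depth(site, "/", 2)   # one level further down
exhaustive = pages_within_depth(site, "/")      # every reachable page
mixed = mixed_qa_selection(exhaustive, exhaustive_depth=1, sample_size=1)
```

In this sketch, the set of pages to verify grows with each QA level, which is why Exhaustive QA on a large website can become costly.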
APPENDIX F.
Aleph Archives provides the following CAMA® Plans:
FEATURE PROFESSIONAL ENTERPRISE PREMIUM
Crawl engineering team ✔ ✔ ✔
WARC format (ISO 28500:2009) compliance ✔ ✔ ✔
Scheduled crawls ✔ ✔ ✔
Archives summary pane ✔ ✔ ✔
Document format handling (HTML, Word, PowerPoint, PDF, Flash …) ✔ ✔ ✔
Full text search standard advanced advanced
Full text search history ✔ ✔ ✔
Full text search queries import & export ✔ ✔ ✔
Automatic language detection ✔ ✔ ✔
Documents metadata extraction and indexing ✔ ✔ ✔
Infinite archives retention ✔ ✔ ✔
ARC to WARC batch migration ✔ ✔ ✔
WARC to WARC batch conversion ✔ ✔ ✔
Archives verification and repair tools ✔ ✔ ✔
Text summarizer ✔ ✔ ✔
Audit trails identification and traceability ✔ ✔
Deduplicated full text search ✔ ✔
Archived resources export (PDF, PNG) ✔ ✔
Multi-core aware archives servers ✔ ✔
Archives redundancy ✔ ✔
Load balancing for archives access ✔ ✔
Antivirus checker ✔ ✔
Trusted archives (digital signatures) ✔ ✔
SEC 17a-4 and FINRA compliance ✔ ✔
Secured archives access (SSL Encryption) ✔ ✔
Multilanguage instant translator ✔ ✔
Custom Branding ✔ ✔
Archives compression ✔ ✔
Archived data processing and management ✔
CAMA® Appliance ✔
CAMA® Appliance on USB pen drive ✔
CAMA® Kit (Access API) ✔
CAMA® 64bits ✔
Quality Assurance team (level) basic medium high
Custom metadata limit 30 unlimited unlimited
Collections limit 100 unlimited unlimited
Accounts limit 10 unlimited unlimited
Crawled resources per month up to 500K up to 5M unlimited
Archived resources per month up to 500GB up to 1TB up to 2TB
A « Custom Plan » is also available via an online form which allows customers to choose the product
features that best suit their needs.
RESOURCES & LINKS
☞ Aleph Archives
- Website
- Products demo
☞ Records Management
Finance
- FINRA Regulation Notices
- FINRA Guidance
- FINRA Regulatory Notice 10-06 on Social Media
- Summary of NASD Rule 3110 — Books and Records
- Federal Rules of Evidence 901 — Data Integrity & Authenticity
- SEC — Division of Trading and Markets
- SEC — Division of Investment Management
- SEC Rule 17a-4 — Books and Records
- Sarbanes-Oxley Act (SOX)
- Financial Services Authority (FSA) Handbook (Europe)
- FSA Handbook Section 3.2 — see Records Requirements, Sec 3.2.20 (Europe)
- Model Requirements for the Management of Electronic Records (MoReq) (Europe)
Food and Drug Administration
- Federal Rules of Evidence 901 — Data Integrity & Authenticity
- FDA Guidance Documents — Food
- FDA Compliance & Enforcement – Food
- FDA Guidance Documents — Drugs
- Code of Federal Regulations (CFR) Title 21
- Model Requirements for the Management of Electronic Records (MoReq) (Europe)
- Pharma Social Media Wiki
- FDASM (Everything About the FDA, Internet, Social Media)