ARMA Winnipeg | AI Auto-classification
1. Artificial Intelligence and Auto-Classification: Are They a Silver Bullet for Records Management and Compliance?
Amitabh Srivastav, VP, Operations & Governance
ARMA Winnipeg
January 29, 2020
2. Agenda – Part 1
1 Introduction
2 Terms and Definitions
3 Digital Transformation Journey
4 Unstructured Information
3. Agenda – Part 2
5 Information Chaos
6 What is Artificial Intelligence?
7 THEMIS CS for Auto-classification
8 Key Takeaways
4. “Difficulties are just things to overcome, after all”
– Sir Ernest Henry Shackleton –
Part 1
6. Profile
Amitabh Srivastav
IGP, CIP, PMP
Since 2001 worked with ECM technologies and implemented enterprise-wide programs and projects worth several millions of dollars to clients in the public and private sectors
Extensive IG / IM experience with a strong portfolio of qualifications in strategy, transformation, and risk management
Combine strategic IG / IM thinking, risk management techniques, and practical implementation experience
Provide CxO / VP-level consulting advice on current technology solutions, industry trends, and best practices with a focus on digital transformation, change management, records threat management, and compliance
7. HELUX Highlights
• A Microsoft Preferred Partner in Content Services specializing in SharePoint, O365, and Cloud technologies
• Using AI and machine learning, our THEMIS products re-imagine the way we do information management and digital transformation
12. Key concepts (cont.)
Information Management
Taxonomy
Metadata
Information Architecture
Knowledge Management
ECM / EDRMS
User Experience
User Interface
File Plan
Retention / Disposition
Security Model
Archiving
13. Content Services (CS) is ECM+
Document Management
Records Management
ECM+
XaaS
Content Services
CaaS, MCaaS, DaaS, BaaS?
14. Content Services and Microsoft’s Modern Approach to ECM+ …
Content Services
Records Management
Document Management
Information Architecture
Artificial Intelligence
Auto-Classification
User Experience
User Interface
File Plan
Retention / Disposition
Security Model
Archiving
15. … Content Services include …
Content Services
Knowledge Management
Search
e-Discovery
Digital Asset Management
Digital Rights Management
17. The current state
85% will never be retrieved 50% are duplicates
“… digital technologies, tools, and social media platforms now allow individuals to create
information at a torrid pace and instantaneously share it globally …” (IGBoK, 1st Ed., p 111)
“Information chaos and confusion are preventing organizations from achieving their digital
transformation objectives. Many organizations believe they must modernize their
information management strategy in order to meet this challenge and survive.” (John
Brown, CEO, HELUX)
18. Digital Transformation (DT) “pain points”
Cyber Attacks
Data Breaches
BYOD
Remote Workforce
Change Management
Cloud Services
Content Repositories
Information Chaos
Content Monetization
19. Digital Transformation enablers
1 Cloud Enablement: “… deploying capabilities or services … that exists outside the firewall.”
2 Intelligent Capture: “… workflows to convert physical information into digital formats using multiple channels …”
3 Repository Neutral Content: “… repositories that are independent of … different systems and underlying technology platforms …”
4 Integrated Collaboration: “… technology platform … that allows teams to save, search, and share information assets …”
5 Information Governance: “… specification of decision rights and an accountability framework to encourage desirable behavior …” (Gartner)
6 Content Services: “… delivers content and / or services on demand, regardless of its source, to any device, and anywhere …”
7 Auto-Classification: “… using rules … to automate how content is captured, analyzed, and governed over its lifecycle.” (AIIM)
8 Customer Experience: “… the end-users' “felt experiences” … with an organization’s on-line services and digital products …”
Source: Intelligent Information Management Maturity (I2M2) Model
20. What is Information Architecture (IA)?
ARMA:
“The structure and interrelationship of information, especially with an eye towards using business rules, observed use behaviors, and effective interface design to facilitate access to information.”
(Glossary of Records Management and Information Governance Terms, 5th Edition, ARMA International TR 22-2016)
Treasury Board Secretariat:
“Information Architecture is the structure of the information components of an enterprise, their interrelationships, and the principles and guidelines governing their design and evolution over time. Information architecture enables the sharing, reuse, horizontal aggregation, and analysis of information.”
(The TBS Information Management Policy, Govt. of Canada)
21. Information Architecture applies structure to content sources
Social Media
Blogs
Videos / Pictures
Audio
Emails and Documents
Direct Messages
25. The Cost of Search
49% said they have trouble locating documents
43% have trouble with document approval requests and document sharing
33% struggle with document versioning
The average knowledge worker spends 2.5 hours per day (15% to 30% of the workday) searching for information (IDC)
The inability to find and retrieve documents costs an organization that employs 1,000 workers $25 million per year
26. The High Cost of Document
For every $1 spent to create a document, $10 is spent on management
30 billion documents are created every year (McKinsey Global Institute)
85% will never be retrieved
50% are duplicates
60% are obsolete
Document Creation
Document Management
27. Unstructured information comes from …
Source: https://www.statista.com/chart/17518/internet-use-one-minute/
28. … result in information chaos
Risk of losing control of information
Content sprawl
Risk of data breaches
Unmanaged growth
Risk of non-compliance
Poor Governance
Risk of litigation
Information leaks
29. Agenda – Part 1 recap
1 Introduction
2 Terms and Definitions
3 Digital Transformation Journey
4 Unstructured Information
30. Agenda – Part 2 recap
5 Information Chaos
6 What is Artificial Intelligence?
7 THEMIS CS for Auto-classification
8 Key Takeaways
32. Beware of the “Document Chaos Monster”
Chaos Monster Victims
Data Security
Storage Costs
User Productivity
Compliance
Office 365 Adoption
SILOED CONTENT
UNSTRUCTURED CONTENT
ROT CONTENT (Redundant, Obsolete, or Trivial)
MISCLASSIFIED CONTENT
33. The Challenge of Taming the “Document Chaos Monster”
SILOED CONTENT
UNSTRUCTURED CONTENT
ROT CONTENT (Redundant, Obsolete, or Trivial)
MISCLASSIFIED CONTENT
Rely on Users to Classify Documents
- inconsistent, incomplete, lack of knowledge
Automated Classification Processes
- not smart enough, incomplete processes
Classification Workflows
- incomplete, inconsistent, reliance on legacy data classification codes
AI Auto-Classification
- intelligent, complete, up to date, scalable to large data sets, consistent, ongoing
34. “Document Chaos Monster” pain points
Internal Drivers
• e-Discovery
• Records management
• Analytics for decision-making
• Metrics for predictive analytics
• Process inefficiencies
• Uncontrolled storage costs
• ROT
• Business continuity and resiliency
• Disaster recovery
External Drivers
• Privacy regulations
• Regulatory fines
• Consumer trust
• Competitive environment
• Political and legal environment
• Reputational damage
• Monetize content
• Digital rights
35. 6 What is Artificial Intelligence
Part 2
36. “At last … an AI solution!”
“Artificial intelligence (AI) has crossed the chasm; more companies and more executives than ever before have come to realize the value that augmented intelligence offers their firms. These companies have actively moved to implement the technology in their organizations.”
(AI's Disruption Of Data Management: Is A Different Approach Needed?, https://www.forbes.com/sites/forbestechcouncil/2019/09/06/ais-disruption-of-data-management-is-a-different-approach-needed/#1061832a24df)
40. Natural Intelligence (NI):
“… is the opposite of artificial intelligence: it is all the systems of control present in biology.”
(http://www.cs.bath.ac.uk/~jjb/web/uni.html)
Definitions
41. Artificial General Intelligence (AGI):
“… is the intelligence of a machine that has the capacity to understand or learn any intellectual task that a human being can.”
(www.wikipedia.org)
Definitions
42. Artificial Intelligence (AI):
“… is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans, … that perceives its environment, and learns, makes decisions, and takes actions that maximize its chances of successfully achieving its goals without human input.”
(Amitabh Srivastav, HELUX)
Definitions (modified from Wikipedia)
Red text is my modification
43. Neural Networks:
“A neural network is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes.”
(www.wikipedia.org)
Definitions
https://commons.wikimedia.org/w/index.php?curid=5084582
44. Machine Learning (ML):
“… is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead.”
(www.wikipedia.org)
Definitions
45. Deep Learning (DL):
“Deep learning is part of a broader family of machine learning methods based on artificial neural networks.”
(www.wikipedia.org)
Definitions
46. Supervised Learning:
Trains the classifier on many labeled examples, such as images of a child playing with a dog, so that it learns to differentiate between the two. Another example is recognizing handwriting.
(“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
AI deep learning models for auto-classification
47. Unsupervised Learning:
Is the process of the classifier learning without labeled examples organized into a dataset; there is no feedback to the classifier
(“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
AI deep learning models for auto-classification
48. Semi-supervised Learning:
It uses a small amount of labeled data bolstering a larger set of unlabeled data
(“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
AI deep learning models for auto-classification
49. Transfer Learning:
Is an approach in which the classifier is trained on data that is augmented by some other already trained model
(“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
AI deep learning models for auto-classification
50. Reinforcement Learning:
Is useful in use cases where the feedback to the learning system only arrives after some end state is reached, or after a significant delay
(“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
AI deep learning models for auto-classification
51. Use AI to auto-classify and enable
Compliance
e-Discovery
Records Mgmt.
ATIP Responses
Open Govt.
Archival
Unstructured Analytics
52. Use AI to search repositories to classify content
Laptops
Desktops
Cell phones
Tablets
On-premise
Cloud storage
Cloud services
Hybrid
Offsite storage
Unstructured data accounts for 80% of content on devices and in repositories
Very large amount of “dark data” stored in repositories
54. Are these products AI “in action?”
Product | Description
Alexa, Siri, Cortana, and Google Assistant | These are virtual (voice / digital) assistants that respond to voice queries and use NLP to answer questions, make recommendations, and perform actions
Watson | An AI product from IBM that uses NLP to answer questions and is used in healthcare, education, weather forecasting, etc.
Debater | An AI project from IBM, designed to participate in full live debates with expert human debaters
55. What about AI for content?
Term | Description
Classifiers | Classifiers can greatly increase the number of content items that are labeled by learning from the input data given to them and then using this knowledge to classify new observations
Entity Extractors | Extract and process information to identify and classify key elements from text into pre-defined categories to help transform unstructured data to structured data
Image Recognizers | Give a machine the ability to interpret the input received through computer vision and categorize what it “sees”
Independent Component Analysis | Looks for patterns in data that are not obvious to humans
Machine to Machine Learning | How will AI treat content, especially ethical considerations
56. What is Microsoft’s Project Cortex?
“Project Cortex uses AI to create a knowledge network that reasons over your organization’s data and automatically organizes it into shared topics like projects and customers. It also delivers relevant knowledge to people across your organization through topic cards and topic pages in the apps they use every day.”
(www.microsoft.com)
57. Using the metadata as a foundation
Coherent across Microsoft 365
Discover enterprise content based on terms
Consistent tagging experience with contextual term suggestions and Auto Tagging
Improved enterprise content type syndication, discovery, and enforcement for consistent metadata schemas across tenant
58. 7 THEMIS CS for Auto-classification
Part 2
60. How does THEMIS CS “Slay the Monster?”
Unstructured Content
Information Architecture Design
Artificial Intelligence Rules
Structured Content with metadata
Internal Drivers
61. THEMIS CS can search repositories to classify content
Laptops
Desktops
Cell phones
Tablets
On-premise
Cloud storage
Cloud services
Hybrid
Offsite storage
Unstructured content on devices and in repositories
Very large amount of “dark data” stored in repositories
63. IA … then and now … using THEMIS CS
OLD WAY
CREATE IA MANUALLY OFFLINE
Use Excel or Word
SLOW, TEDIOUS, AND COSTLY PROCESS
SPECIALIZED TEAM
IA Expert, IT, IM, Developer
Requirements Gathering
IA Assembly in Excel
Send to Development
Back to Users
User Acceptance (Maybe!)
Rinse & Repeat
MODERN WAY
CREATE IA INTUITIVELY ONLINE
QUICK, PAINLESS PROCESS
ANY TEAM
Your Team
Import & Analysis
Design & Visualization
User Acceptance
Rapid Deployment
64. IA made easy using THEMIS CS
THEMIS CS turns the complex, time-intensive task of building and maintaining a robust IA into an automated, intelligent process using AI.
Cut your deployment times by 50% or more
Guaranteed 100% error-free deployments
Effective Team Collaboration
65. IA made easy using the THEMIS CS process
IA Visualization
Publish and GO LIVE!
THEMIS IA Designer
IA Analysis Wizard
Iteration and Acceptance
Deploy to Sandbox
THEMIS CS’s step-by-step wizard will ensure you deploy an error-free IA built on top of industry best practices
66. THEMIS CS features for auto-classification
Information Architecture Assistant using a Chatbot: Chatbot assists in designing the information architecture and configuring GCdocs or SharePoint
Information Architecture Assistant using Machine Learning: ML recommends appropriate taxonomy, folder structure (GCdocs), site structure (SharePoint) based on user-provided information and using best practices
THEMIS Blueprint: HELUX-hosted service that stores information architecture blueprint snippets and then builds a blueprint based on best practices
67. Eight ways THEMIS CS enables Digital Transformation
1 Digitize Paper
2 Analyze ROT
3 Import Information Architecture
4 Apply AI Rules
5 Accurate Auto-Classification
6 Rapid Deployment
7 Improve e-Discovery
8 Increase ROI on Content
68. THEMIS CS uses AI for ROT analysis
Inventory the target content sources
• Shared drives
• Laptops / Desktops
• Off-line repositories
• Cloud storage
• Mobile devices
Rules to identify the content types
• Personal information
• Health information
• Employee information
• Confidential data
• Public information
Schema to classify content
• File plan
• Taxonomy
• Metadata
• Retention and disposition schedule
Consider additional rules
• Relevant regulations
• Industry standards
• Best practices
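The inventory step above can be illustrated with a minimal Python sketch. It does not show how THEMIS CS works internally (the deck does not say); it only assumes that redundant files can be found by hashing file contents and that obsolete files can be flagged by age. The function names and the `max_age_days` cutoff are hypothetical, and "trivial" content is not covered.

```python
import hashlib
import os
import time
from collections import defaultdict


def inventory(root):
    """Walk a folder tree and return (path, size, modified_time, sha256) rows."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            info = os.stat(path)
            rows.append((path, info.st_size, info.st_mtime, digest))
    return rows


def find_rot(rows, max_age_days=365 * 7, now=None):
    """Flag redundant (same hash) and obsolete (older than the cutoff) files."""
    now = time.time() if now is None else now
    by_hash = defaultdict(list)
    for path, _size, _mtime, digest in rows:
        by_hash[digest].append(path)
    # Keep the first copy (sorted by path); every other copy is redundant.
    redundant = sorted(p for paths in by_hash.values() for p in sorted(paths)[1:])
    cutoff = now - max_age_days * 86400  # assumed age cutoff, in seconds
    obsolete = sorted(path for path, _s, mtime, _d in rows if mtime < cutoff)
    return {"redundant": redundant, "obsolete": obsolete}
```

Calling `find_rot(inventory("some_folder"))` returns two lists of paths. In practice the cutoff would come from the retention and disposition schedule, not from a fixed number of days.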
69. THEMIS CS uses AI for email auto-categorization
Extract email headers
• From
• To
• Subject
• Date
• Copied to
• Attachments
• Etc.
Rules to identify email content
• Personal information
• Health information
• Employee information
• Confidential data
• Public information
Identify duplicate emails
• Duplicate threshold
• “Near duplicates”
Put unknown emails into “quarantine”
• Does not match any rules
• Matches rules for further analysis
Schema to classify email and content
• File plan
• Taxonomy
• Metadata
• Retention and disposition schedule
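The email steps above (extract headers, apply rules, detect near duplicates, quarantine the rest) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the THEMIS CS implementation: the `RULES` patterns, the 0.9 similarity threshold, and the function names are all hypothetical, and the sketch assumes single-part plain-text messages.

```python
import re
from difflib import SequenceMatcher
from email import message_from_string

# Hypothetical rules: category name -> pattern that signals it.
RULES = {
    "Personal information": re.compile(r"\b\d{3}-\d{3}-\d{3}\b"),
    "Health information": re.compile(r"\b(diagnosis|prescription)\b", re.I),
    "Confidential data": re.compile(r"\bconfidential\b", re.I),
}


def extract_headers(raw):
    """Parse a raw message and return selected headers plus the body."""
    msg = message_from_string(raw)
    names = ("From", "To", "Cc", "Subject", "Date")
    headers = {name: msg.get(name, "") for name in names}
    # Assumes a single-part message, so the payload is a plain string.
    return headers, msg.get_payload()


def categorize(raw):
    """Return matching categories, or ['Quarantine'] when no rule matches."""
    headers, body = extract_headers(raw)
    text = headers["Subject"] + "\n" + body
    matches = [name for name, pattern in RULES.items() if pattern.search(text)]
    return matches or ["Quarantine"]


def is_near_duplicate(body_a, body_b, threshold=0.9):
    """Treat two bodies as near duplicates when their similarity meets the threshold."""
    return SequenceMatcher(None, body_a, body_b).ratio() >= threshold
```

The categories returned by `categorize` would then be mapped onto the file plan, taxonomy, and retention schedule. Pairwise comparison with `SequenceMatcher` is simple but slow on large mailboxes, so a production system would likely use a different near-duplicate technique.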
70. THEMIS CS Use Case for AI and Auto-Classification
Problem Description
• Several terabytes of pictures and videos on shared drives
• Many years' worth of physical pictures
• Difficult to work with physical pictures
• Storage costs are increasing
• Volume of content is increasing
Business Challenge
• Identify duplicates and “near duplicates”
• Identify content to dispose
• Tag content with appropriate metadata
• Retain content for on-going operations and possible litigation
• Reduce storage costs
• Accurately and consistently classify the digital and physical content
Solution
• Use THEMIS IA to rapidly build the architecture, rules, and metadata to tag content
• Use THEMIS AI to search the shared drives and apply the rules and auto-classify the content
• Use THEMIS RM to apply the retention and disposition schedules to the auto-classified content
Benefits
• Correctly and consistently auto-classify content
• Identify and dispose of ROT
• Improve accuracy of search
• Improve e-Discovery
• Reduce storage costs
• Teach THEMIS AI additional rules to auto-classify more content
• THEMIS AI is a “resource” available 24/7 to handle increasing volumes
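The step of applying retention and disposition schedules to auto-classified content can be illustrated with a short Python sketch. The categories, retention periods, and the `legal_hold` flag are hypothetical stand-ins chosen for the picture-and-video use case; the deck does not describe how THEMIS RM represents a schedule.

```python
from datetime import date

# Hypothetical retention schedule: record category -> years to retain.
RETENTION_YEARS = {"Photograph": 10, "Video": 5, "Transitory": 1}


def disposition_date(category, created, schedule=RETENTION_YEARS):
    """Return the earliest date on which an item may be disposed of."""
    years = schedule[category]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return created.replace(year=created.year + years, day=28)


def due_for_disposition(items, today):
    """List item names whose retention has ended and that are not on legal hold."""
    return [
        item["name"]
        for item in items
        if not item.get("legal_hold", False)
        and disposition_date(item["category"], item["created"]) <= today
    ]
```

The legal-hold check reflects the business challenge of retaining content for possible litigation: an item on hold is never returned, even after its retention period has ended.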
72. THEMIS CS controls the “Document Chaos Monster”
SILOED CONTENT
UNSTRUCTURED CONTENT
ROT CONTENT (Redundant, Obsolete, or Trivial)
MISCLASSIFIED CONTENT
Chaos Monster Victims
Data Security
Storage Costs
User Productivity
Compliance
Office 365 Adoption
73. The THEMIS CS Advantage
THEMIS CS
Manually build IA
with spreadsheets
Hire consultant
Time to Deployment
Cost Effective
Accuracy
Error Free
User Satisfaction
Ongoing Monitoring &
Improvements
74. THEMIS CS benefits
IA integrates:
File plan
Taxonomy
Retention and disposition schedules
Security groups, permissions, and user accounts
Metadata
Content types, document types, categories and attributes
IA enables:
Improved UX and UI via better navigation
Improved search experience via better auto-classification
Improved collaboration and knowledge sharing via a more intuitive design
Improved change management via more user awareness and easier user adoption
The presentation is in two parts
First, talk about terms and definitions
Many if not most organizations are on a DT journey
Increasing volume of unstructured information content
The increasing volume is leading to information chaos and overload
So how can AI help?
First, we need to understand what AI is
Then talk about what THEMIS AI can do to improve auto-classification
How can THEMIS AI help control the chaos and turn unstructured information into structured information via auto-classification
Wrap up with Q&A
We chose this particular topic because of numerous questions from clients about how AI can help them identify and manage their content
When you are dealing with DG / CxO level clients, they are looking at governance, risk, and compliance. So the question eventually gets to what does the organization need?
HELUX focuses on the Microsoft technology stack and its eco-system
IG includes some key concepts
Data Management comprises all disciplines related to managing data as a valuable resource.
It is all the data points an organization captures that provide context and insight into actionable intelligence
It is a five-step process that I am developing for a client and for a possible future presentation
Are there additional key concepts that you feel should be here?
Looking at IM, it is broken down into some sub-concepts that are key
All of us should recognize these and be familiar with them, including IA
IA is critical because it provides the rules for AI
DT has changed the relationship
CS is a more modern approach to ECM in terms of delivering services
Goes beyond the EDRMS file plans, retention, disposition, content management
It is social media, EFSS, multiple repositories, various types of services from IaaS to SaaS
Focus is on delivering content on demand, anytime, anywhere, and on any device
BaaS for recordkeeping is one of the exciting areas for CS. You can visit the Blockchain Council for more information.
A recently published whitepaper also explores blockchain for trusted storage systems, including use cases for records management
CS has changed the relationship
We see that some of the old key concepts are re-arranged, and they include ...
CS adds some new concepts, such as search and e-Discovery
Managing your digital assets and monetizing your digital content
While DT promises many benefits, we know there are some key pain points
Many of these we know
Do you know what BYOD is?
Others are not given the same weight and importance, in particular CM, because transformation is about how it affects workers
We know the upside of cloud services, but the downside is all that content sits in multiple repositories
It's hard to find and leads to chaos, which I will talk about more
One area that is overlooked is content monetization … making money from your content
It’s your streaming services and many other services … it’s so easy to sign up for a subscription service
There is a range of definitions for Information Architecture
Here are two of them
Key thing to notice is structure and interrelationship of the information
The interrelationship has rules
It is important to make the connection between IA and CS, since it is the link in modern content management
Once you apply structure, then you can auto-classify the content
The volume of digital information and data is increasing at rates never witnessed before. We now live in a world where devices outnumber humans and we can’t possibly consume all of it.
In fact the volume of content is doubling every two years, and by 2020 the digital universe – the data we create and copy annually – will grow by a factor of 10 – from 4.4 zettabytes (or 4.4 trillion gigabytes) in 2013 to 44 zettabytes. Gartner and IDC estimate that within organizations, over 80% of this data is unstructured content like documents, emails, and video. In 2017, it's at least 10 ZB: documents, images, videos, files.
OR – 220 billion content databases.
So, the key question that needs to be answered is how do we manage all of this content effectively, and how do we surface information to our employees in a way that is contextual and personalized in order to help them with the right insights at the right time.
Transition: What’s needed is a content collaboration platform that brings all of this together.
There are also the “people costs” of lost productivity spent searching for information
Once you find a document, do you have the correct version?
The costs are multiplied by a factor of 10 and this will most likely only increase!
IG chaos is coming from many sources
Costs are increasing and exposing organizations to risks
Popular media, webinars, blogs, etc. mention that more information was generated in the last few years than in all of human history
The information chaos is only increasing and presenting more difficulties to all organizations, and I mean ALL
These are the same DT pain points as before
You can see the inter-connection between DT and Information chaos
We are familiar with these risks
This leads to the “Document Chaos Monster” affecting these key areas
This is fitting since this is the day before Halloween
It's not a bad costume to go as …
We leave it to workers, and expect them, to deal with this monster problem by correctly and consistently classifying documents
But it just will not happen
Let’s look a bit deeper into the pain points
Consideration for internal drivers:
Improve search and the ability to find relevant information
Improve the quality of information to support informed decision-making
Identify and retain information that has business and / or archival value
Identify and secure vital records for business resiliency
Manage and reduce costs, including storage, IT costs, and process inefficiencies
Consideration for external drivers:
Increase trust by effectively managing and securing personal data
Manage and mitigate data and information security risks
Manage and mitigate legal and compliance risks
Manage digital assets and digital rights by monetizing the content and improving the customer’s experience
Increase trust by managing and demonstrating corporate social responsibility
Improve citizen access to information, including their personal information
How many people recognize these images, and especially the one on the left?
How many recognize the image on the left? Warning, if you raise your hand, then you are giving your age away
These robots were not very nice, but they did display AI that we would consider the same as NI
They could react to an almost infinite range of situations, like you and I, and problem-solve.
That is what most people think of when talking about AI
Solving the Rubik's cube does require learning, but is it AI, or closer to NI, or is that Machine Learning (ML)?
Same thing with building something with blocks.
A child and a robot could do the same, and learn how to stack the blocks.
Is that AI?
This is still in the realm of science fiction as you see in the movies
AI or ML has deep learning which imitates the processes of the human brain in analyzing data and creating patterns to make AI decisions
This imitates the neurons in the human mind
ML is considered a sub-set of AI
For this presentation it is essentially the same thing
Think of this as having the IKEA manual to build the furniture
In supervised learning, the classifier adjusts weights during each training iteration in order to minimize the classification error
Key is having correct labels for input-output pairs in order to train the classifier
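The idea of adjusting weights on each training iteration to reduce classification error can be shown with a perceptron, one of the simplest supervised classifiers. This is an illustrative sketch only, not the model used by THEMIS AI; the feature values in the usage note (counts of two hypothetical keywords per document) are invented for the example.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Train a two-class linear classifier on labeled examples.

    samples: list of feature tuples; labels: list of 0/1 targets.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, labels):
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = target - predicted  # 0 when the prediction is right
            # Nudge the weights toward the correct label only on a mistake.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, features):
    """Classify one example with the trained weights."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0
```

For example, with samples `[(3, 0), (2, 1), (0, 3), (1, 2)]` and labels `[1, 1, 0, 0]`, the trained weights separate the two groups. Without correct labels for each input, the error term has nothing to measure against, which is why labeled input-output pairs are the key.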
Think of this as not having the IKEA manual to build the furniture
The classifier has to figure out the pattern from raw data
The AI model learns from unlabeled data
It also needs a method of evaluating the accuracy of what the classifier has learnt
This is perhaps like assembling the IKEA furniture by looking at the picture on the box
This lies between the supervised learning and unsupervised learning
Think of this as having built the first chair, now you are building the second chair from the IKEA furniture
The advantage of using transfer learning is that it enables a model to start from an already trained set of features
The model can be customized for a specific purpose
Think of this as building the IKEA chair: you made a mistake and then rebuilt the chair, so the second time you avoid repeating the mistake
This is the case for autonomous driving cars and strategy games, where learning happens in a feedback loop
This is essentially a reward system
The challenge is auto-classifying content to enable these key areas.
There might be more, but I believe these are the key ones
The next challenge is also doing it with a high degree of accuracy
So the question is how good is the AI and how well can it learn
Do you know where your information assets are?
Imagine if you can point the AI toward the repositories to identify content and classify it
Information volumes continue to grow, and companies have limited visibility into their information assets
Alexa ordering dollhouses in early 2017
A 6-year-old girl in Dallas, TX, by mistake triggered Alexa into ordering an expensive dollhouse by asking “can you play dollhouse with me and get me a dollhouse”
Alexa took it as a command and ordered the dollhouse
When a newscaster covered the story, the news anchor said “Alexa ordered me a dollhouse”, and Alexa heard the command and tried to place an order, too.
In the end, no orders were placed, but it is an interesting and cautionary story
IA provides the rules to classify the content and AI uses the rules to auto-classify the content
It is as simple as that
Unstructured content doesn’t have an implicit organization
Email and attachments
Shared files, active and archived
Desktop and “loose” files
Paper files, imaged files
Cloud storage & collaborative files
Using IA design to give structure
Using AI rules to classify the content
Apply metadata
Outcome: structured content to address the pain points
Remember this slide from before?
THEMIS CS can search these repositories to identify content and classify it
Today many organizations are looking at moving to O365 as the way forward from Office on the desktop
Regardless of whether they use Office or O365, users are creating information and using it on a daily basis
So this information needs to be managed from the time it is created
It is very important to understand the content lifecycle and manage the content in a structured way
THEMIS CS can manage the content from the front end, at the time of creation, to the back end
THEMIS CS with the IA can provide structure to the content, including migration into O365 in a structured way
THEMIS CS with AI can look at the content and classify it in order to apply content lifecycle management
THEMIS CS frees users to work without worrying about IG, risk, and compliance, and to focus on saving, searching, sharing, and doing their job
But THEMIS CS can help the organization improve its GRC, because it needs to worry about compliance, e-Discovery, ATIP, PII, etc.
So THEMIS bookends content preparation and on-going governance
SO WHAT IS IA
Anyone involved in developing an IA knows how tedious the process is
It is an endless loop of design sessions before a draft is ready for review and approval
THEMIS CS stops the endless loop
You have good ROI because you reduce the number of specialized team members, and the duration and effort to develop and deploy the IA
The IA is easier because the step-by-step process is easier, too.
THEMIS IA can deploy to both GCdocs and SharePoint using the same wizard and designer
You change the target environment
So I have been talking about THEMIS CS as a suite of products
The IA product builds the IA, and the RM product provides records management and retention
The AI feature includes three capabilities to improve auto-classification
High-level steps for ROT analysis
High-level steps for auto-classifying emails
Let’s deep dive into a use case for auto-classification
When you are dealing with DG / senior management, they want to know what is being done to improve IG, reduce risks, increase compliance, and improve productivity and user adoption
THEMIS CS
There are cost savings that translate into improved ROI along these KPIs