Driving Behavioral Change for
Information Management through
Data-Driven Green Strategy
A Case Study
Urmi Majumder and Fernando Aguilar Islas
EDW 2024
Topics Covered
⬢ What is a Green Information Management (IM) Strategy, and
why should you have one?
⬢ How can Artificial Intelligence (AI) and Machine Learning (ML)
support your Green IM Strategy through content deduplication?
⬢ How can an organization use insights into their data to
influence employee behavior for IM?
⬢ How can you reap additional benefits from content reduction
that go beyond Green IM?
⬢ 15+ years of experience in enterprise system architecture,
design, implementation and operations
⬢ Leads the development of technical solutions in support of
a wide variety of knowledge and data management solutions
⬢ Principal architect in knowledge graphs, enterprise AI, and
scalable data management systems
⬢ Ph.D. in Computer Science, Duke University
Urmi Majumder
Principal Data Architecture Consultant
Fernando Aguilar Islas
Data Science Consultant
⬢ 9+ years of experience serving as data scientist for graph-powered
machine learning and AI-based solutions
⬢ Implemented several knowledge graph-based enterprise data
catalogs
⬢ Experience leading and supporting the integration and
implementation of 20+ data projects
⬢ MS, Applied Statistics, Penn State University
ENTERPRISE KNOWLEDGE
Green Information
Management (IM)
Strategy
Green Information Management:
Putting the “Green” in Information Management (IM)
What is it?
Green Information Management (gIM) is a strategic approach
focused on optimizing and minimizing the environmental impact of
information-related processes within an organization.
Why should enterprises have a green IM strategy?
Sustainability
Reduce resource
consumption and waste
associated with IM practices.
Cost Efficiency
Reduce energy consumption
through streamlined
processes and optimized
infrastructure.
Compliance
Address regulatory
requirements to demonstrate
adherence to green
standards.
Corporate
Responsibility
Commit to environmental
stewardship.
A supply-chain giant is committed to an
organizational goal of becoming a
Net-Zero Emissions Business.
The organization realized that it had a huge digital carbon
footprint due to the proliferation of duplicate content (over
226K documents occupying ~1 PB of space) across content
management systems and collaboration software such as
SharePoint and Microsoft Teams, which unintentionally create
silos because of a lack of visibility and awareness.
Case Study: The Challenge
The Solution
AI-Powered Digital Carbon Footprint Calculator
ORIGINAL STATE
● Rules-based non-record deletion application deleting forgotten
non-sensitive documents periodically.
● Algorithm could only delete documents that were not modified for
at least 3 years and marked as non-records. But a lack of sharing
culture meant most documents were unnecessarily marked as sensitive.
THE NEED
● Need to aggregate tens of primary sources that have slightly
different metadata and access levels, yet hold duplicate or
near-duplicate content, to build a content similarity and resultant
carbon footprint dashboard.
● Need to augment the rule-based approach, which relies solely on
metadata, with AI that relies on content similarity to identify
duplicate and near-duplicate content.
SOLUTION
● Implemented the data pipelines and matching algorithms to connect
data siloed in different systems.
● Automated duplicate content identification to give the
organization the ability to drill down into duplicate
content-related findings across data sources and improve QA.
● Built a BI dashboard to provide a clear view into content
duplication and its connection to CO2 emissions.
The AI Connection:
How to Use AI for
Green IM
Overall Solution Phased Approach
Phase I: Proof of Concept
● Refine use case, prioritize requirements and
define KPIs
● Conduct Exploratory Data Analysis
● Develop Matching Algorithms
● Implement Content Deduplication Data
Pipeline
● Implement CO2 Emissions BI Dashboard
● Track KPIs
Phase II: Productionalization
● Scale Data Pipeline
● Enhance BI Dashboard to make it
actionable
● Integrate Pipeline with Content
Management System and
Collaboration Software
● Develop broader Green IM strategy
Metadata Ingestion
● Data Source Integration
● Metadata Extraction
● Content Extraction
● Content Vectorization using AI
Content Deduplication
● Rule-Based Metadata Similarity Analysis
● Stochastic Content Similarity Analysis
● Combining Metadata and Content Findings to identify
duplicates and near-duplicates
CO2 Emissions Viewer
● Duplicate Storage Impact Analysis
● Resultant CO2 Emissions Calculation
● BI Dashboard for summary statistics and drill down by
key metadata fields
End-to-End Process Overview
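The rule-based metadata similarity step above can be sketched roughly as follows. The slides do not state the actual rules, so the rule used here (same normalized file name and same size in bytes) is an assumption, and the names `DocMeta`, `normalize_name`, and `metadata_duplicate_groups` are hypothetical, introduced only for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DocMeta:
    doc_id: str
    source: str       # e.g. "SharePoint" or "Teams"
    file_name: str
    size_bytes: int


def normalize_name(file_name: str) -> str:
    """Lower-case the name and strip copy markers such as ' (1)' or ' - Copy'."""
    stem, dot, ext = file_name.lower().rpartition(".")
    if not dot:  # no extension present
        stem, ext = ext, ""
    stem = re.sub(r"(\s*\(\d+\)|\s*-\s*copy)+$", "", stem).strip()
    return f"{stem}.{ext}" if ext else stem


def metadata_duplicate_groups(docs):
    """Group documents whose normalized name and size agree (assumed rule)."""
    groups = defaultdict(list)
    for doc in docs:
        key = (normalize_name(doc.file_name), doc.size_bytes)
        groups[key].append(doc.doc_id)
    # Only groups with more than one member are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


docs = [
    DocMeta("a1", "SharePoint", "Q3 Report.docx", 52_000),
    DocMeta("b7", "Teams", "Q3 Report (1).docx", 52_000),
    DocMeta("c3", "SharePoint", "Q3 Report.docx", 61_500),
]
print(metadata_duplicate_groups(docs))  # [['a1', 'b7']]
```

Metadata rules of this kind are cheap to run but miss renamed or lightly edited copies, which is where the content-based analysis comes in.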
Data Ingestion
● Source system identification
● Establishment of data crawlers that meet system-specific
access requirements
Metadata & Content Extraction
● Metadata extraction from either the source system or a
supporting metadata store
● Content extraction based on file type
Metadata Enrichment
● Use of reference data/taxonomy management system
● Content vectorization via use of Generative AI
Matching Algorithm Execution
● Rules-based duplicate inferencing on content metadata
● Stochastic duplicate inferencing on content vectors
Duplicate Analysis Output
● Combination of rule-based and stochastic analysis to identify
duplicates
● Resultant Storage Impact
● Resultant CO2 Emissions
Content Deduplication Process
Content Deduplication Pipeline
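As a rough illustration of duplicate inferencing on content vectors, the sketch below compares document vectors pairwise and combines the result with a metadata match. The deck does not name the similarity measure, thresholds, or combination rule. Cosine similarity, the 0.99 and 0.90 thresholds, and the rule in `classify_pair` are all assumptions, and the three-dimensional vectors are toy values standing in for real embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_pair(similarity, metadata_match,
                  dup_threshold=0.99, near_threshold=0.90):
    """Assumed combination rule: very high content similarity is a duplicate;
    high similarity is a duplicate only if the metadata rules also match."""
    if similarity >= dup_threshold:
        return "duplicate"
    if similarity >= near_threshold:
        return "duplicate" if metadata_match else "near-duplicate"
    return "distinct"


def find_duplicates(vectors, metadata_pairs):
    """Label every pair of documents, keeping only non-distinct pairs."""
    findings = {}
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine_similarity(vectors[a], vectors[b])
        label = classify_pair(sim, (a, b) in metadata_pairs)
        if label != "distinct":
            findings[(a, b)] = label
    return findings


# Toy vectors; a real pipeline would use embeddings from a transformer model.
vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [1.0, 0.0, 0.0],
    "c": [0.9, 0.3, 0.0],
    "d": [0.0, 1.0, 0.0],
}
metadata_pairs = {("a", "c")}  # pairs flagged by the rule-based step
print(find_duplicates(vectors, metadata_pairs))
```

Comparing every pair is quadratic in the number of documents, so at the scale described in the case study some form of blocking or approximate nearest-neighbor search would likely be needed. That choice is outside what the slides describe.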
Content Deduplication Pipeline
Conceptual Architecture
Minimize Data Movement
⬢ Reduce energy consumption from duplicate content storage and
data transfer
⬢ No copies: extract content from the original file for in-memory
processing
Use Transformer Models for Vectorization
⬢ < 100M parameters (e.g., DistilBERT)
⬢ Use less memory and storage space due to smaller model size
Run Analysis Pipelines in Cloud Infrastructure
⬢ Take advantage of resource efficiency at scale
⬢ Use under-utilized regions (e.g., Azure Norway East region) or
regions powered by 100% renewable energy (e.g., AWS US East 2)
Training OpenAI’s GPT 3.5 requires 1K
GPU processors running in parallel for
weeks at a time
Content Deduplication Pipeline
Green Application Development Considerations
Reconcile a high volume of distinct content items to a significantly
lower number of unique content items across siloed systems
Give users a clear view into content duplication and its
connection to CO2 emissions through a meaningful dashboard
Establish a plug-and-play architecture for extracting content from
many file types and vectorizing it using multiple
Generative AI models to best align the content similarity
pipeline to the organizational needs
Benefits of the AI-Powered Digital
Carbon Footprint Calculator
The Power of Data:
Drive Social Changes
Through Data
Size of the Opportunity
An estimated*
50%
of corporate data is
duplicated across the
organization
Real World Example
⬢ An email server contains 100 instances of the same 1
MB file attachment sent to 100 people
⬢ Without content deduplication, if all 100 people back up
their mailboxes, the backups would consume 100 MB of storage
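The email example above can be reproduced with a minimal sketch that stores each distinct attachment once, keyed by a content hash. Using SHA-256 as the key and treating 1 MB as 1,000,000 bytes are assumptions made for illustration. The function names are hypothetical.

```python
import hashlib

MB = 1_000_000  # assumption: decimal megabytes


def storage_without_dedup(attachments):
    """Every copy is stored in full."""
    return sum(len(data) for data in attachments)


def storage_with_dedup(attachments):
    """Each distinct payload is stored once, keyed by its SHA-256 digest."""
    unique = {}
    for data in attachments:
        unique.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    return sum(unique.values())


attachment = bytes(MB)          # a 1 MB payload (all zero bytes, illustrative)
mailboxes = [attachment] * 100  # the same file delivered to 100 people

print(storage_without_dedup(mailboxes) // MB)  # 100
print(storage_with_dedup(mailboxes) // MB)     # 1
```

Hash-based matching only catches byte-identical copies. Near-duplicates need a content similarity approach.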
In the supply-chain
organization, ~226K
documents occupied
~1 PB of storage,
resulting in 228 tonnes
of CO2 emissions.
15% content reduction through duplicate identification:
34 tonnes CO2, equivalent to 20 flights from JFK to LHR
* https://www.xillio.com/blog/recognize-duplicate-folder-structures-with-xillio-insights
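The figures on this slide are consistent with a simple linear model, sketched below. The assumption that emissions scale linearly with stored volume is ours. The slides do not describe the calculation method behind the 228 tonnes, and the per-flight value is only what the slide's own numbers imply.

```python
# Figures taken from the slide.
TOTAL_STORAGE_PB = 1.0          # ~1 PB of documents
TOTAL_EMISSIONS_TONNES = 228.0  # resulting CO2 emissions
REDUCTION_SHARE = 0.15          # 15% content reduction
FLIGHTS_EQUIVALENT = 20         # JFK to LHR flights quoted on the slide


def emissions_for(storage_pb, tonnes_per_pb):
    """Assumed linear relation between stored volume and CO2 emissions."""
    return storage_pb * tonnes_per_pb


# Emission factor implied by the slide: tonnes of CO2 per PB stored.
tonnes_per_pb = TOTAL_EMISSIONS_TONNES / TOTAL_STORAGE_PB

removed_pb = TOTAL_STORAGE_PB * REDUCTION_SHARE
avoided_tonnes = emissions_for(removed_pb, tonnes_per_pb)
implied_tonnes_per_flight = avoided_tonnes / FLIGHTS_EQUIVALENT

print(round(avoided_tonnes, 1))             # 34.2
print(round(implied_tonnes_per_flight, 2))  # 1.71
```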
Why is content duplication so
prevalent in the enterprise?
NON-DELIBERATE action on part of the user
● Users forget a document exists and
recreate it
● Users cannot find what they are looking
for and create it from scratch
● Users save email attachments,
sometimes the same file multiple times
● Users download files from the intranet,
sometimes the same file multiple times
DELIBERATE action on part of the user
● Maintain backup copy
● Copy file for easier transfer/distribution
● Use separate files for different document
versions
Defensible Deletion
● Redundant, obsolete, trivial data held on
to by users just in case
● Non-record deletion policy in an
organization can save storage space by
deleting documents not marked as
records that have not been modified
for a predefined period
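A non-record deletion rule of this kind can be sketched as below, using the three-year period and the non-sensitive condition mentioned in the case study. The `Document` fields, the function name, and the approximation of a year as 365 days are assumptions for illustration, not the organization's actual implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    doc_id: str
    is_record: bool
    is_sensitive: bool
    last_modified: date


def eligible_for_deletion(doc, today, retention_years=3):
    """True if the document is a non-sensitive non-record that has not been
    modified for at least `retention_years` (a year approximated as 365 days)."""
    cutoff = today - timedelta(days=365 * retention_years)
    return (not doc.is_record
            and not doc.is_sensitive
            and doc.last_modified <= cutoff)


today = date(2024, 3, 1)
docs = [
    Document("d1", False, False, date(2020, 1, 15)),  # old, non-record
    Document("d2", False, True, date(2020, 1, 15)),   # marked sensitive
    Document("d3", True, False, date(2019, 6, 30)),   # marked as a record
    Document("d4", False, False, date(2023, 6, 1)),   # recently modified
]
print([d.doc_id for d in docs if eligible_for_deletion(d, today)])  # ['d1']
```

The example also shows why mislabeling matters: `d2` and `d3` are just as stale as `d1`, but their sensitivity and record markings keep them out of reach of the rule.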
Barriers to Automated
Content Removal
● Content incorrectly marked as records
due to lack of proper compliance
training
● Content marked with higher sensitivity
labels because of a knowledge-hoarding
culture
● Content duplicated to associate
different access permissions due to
limited cross-departmental collaboration
Automated Content Renewal
DATA-DRIVEN USER
BEHAVIOR CHANGE: Goals
“Educate and empower to influence positive behavior change.”
Educate
● Facilitate self-directed and social
learning opportunities for green
information management
Empower
● Facilitate evidence-based decisions by offering
easy access to a personal CO2 emissions viewer
● Propel users into action by equipping them with the
right interactive tool to act on the findings in the
flow of work
● Provide the data needed to identify personal
emissions trends and a way to track progress over
time
Pilot CO2 Emissions Viewer: Demo Time!
DATA-DRIVEN USER
BEHAVIOR CHANGE:
Recommended
Actions
“Educate and empower to influence positive behavior change.”
Educate
● Educate users to
use links instead
of attachments
for file sharing
● Educate users on
componentized
content
management
Empower
● Provide accurate data
○ Establish KPIs measuring accuracy of duplicate detection pipeline
● Frame up the data in the context of the bigger picture
○ Enable visualization of immediate CO2 emission reduction as a result of
deduplication
○ Enable visualization of impact of content reduction over time
○ Enable visualization of a personal digital footprint counter for unique content
over time
● Create a “don’t make me think” experience with push-of-a-button actions
available in the end-user application
○ Enable system triggers to remove content through the application interface
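One way to make the accuracy KPIs above concrete is to score the pipeline's predicted duplicate pairs against a manually labeled sample. The slides do not say which KPIs were used. Precision, recall, and F1 are assumed here as common choices, and `detection_kpis` is a hypothetical helper.

```python
def detection_kpis(predicted_pairs, labeled_pairs):
    """Compare predicted duplicate pairs with hand-labeled true duplicates.
    Pairs are unordered, so ("a", "b") and ("b", "a") count as the same pair."""
    predicted = {frozenset(p) for p in predicted_pairs}
    actual = {frozenset(p) for p in labeled_pairs}
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = [("a", "b"), ("a", "c"), ("d", "e"), ("f", "g")]
labeled = [("b", "a"), ("a", "c"), ("d", "e"), ("h", "i"), ("j", "k")]
kpis = detection_kpis(predicted, labeled)
print(kpis["precision"], kpis["recall"])  # 0.75 0.6
```

Precision matters most when findings trigger push-button deletion, since a false positive could remove unique content. Recall indicates how much duplicate storage is still going undetected.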
The Bigger Impact:
Beyond Green IM
05 Generative AI (LLMs)
● Increase efficiency in RAG applications by removing noise and bias
● Decrease costs associated with vectorizing content
04 Legal and Regulatory Compliance
● Reduce exposure to copyright, trademark, or other intellectual
property rights violations
● Decrease the risk of privacy breaches, as duplicate content may contain
sensitive information that can be accessed by unauthorized parties
03 Content Auditing and Analysis
● Identify redundant or obsolete data
● Surface similar content with different associated metadata
02 Cloud Data Migration
● Minimize the volume of data to be transferred, optimizing network
bandwidth and reducing associated costs and energy consumption
● Lower operational costs and environmental impact
01 Mergers and Acquisitions
● Streamline the integration of data from merged entities, ensuring a
more efficient and sustainable data consolidation process
● Lower infrastructure costs and minimize environmental impact through
efficient content management practices
Content Deduplication Use Cases
Questions?
Thank you for listening.
We are happy to take any
questions at this time.
Urmi Majumder
umajumder@enterprise-knowledge.com
www.linkedin.com/in/urmim/
Fernando Aguilar Islas
fislas@enterprise-knowledge.com
www.linkedin.com/in/feraguilaris/

Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPTiSEO AI
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfFIDO Alliance
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 

Kürzlich hochgeladen (20)

BT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptx
BT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptxBT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptx
BT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptx
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 

Driving Behavioral Change for Information Management through Data-Driven Green Strategy (EDW 2024)

  • 1. Driving Behavioral Change for Information Management through Data-Driven Green Strategy A Case Study Urmi Majumder and Fernando Aguilar Islas EDW 2024
  • 2. Topics Covered ⬢ What is a Green Information Management (IM) Strategy, and why should you have one? ⬢ How can Artificial Intelligence (AI) and Machine Learning (ML) support your Green IM Strategy through content deduplication? ⬢ How can an organization use insights into their data to influence employee behavior for IM? ⬢ How can you reap additional benefits from content reduction that go beyond Green IM?
  • 3. Speakers
      Urmi Majumder, Principal Data Architecture Consultant
      ⬢ 15+ years of experience in enterprise system architecture, design, implementation and operations
      ⬢ Leads the development of technical solutions in support of a wide variety of knowledge and data management solutions
      ⬢ Principal architect in knowledge graphs, enterprise AI, and scalable data management systems
      ⬢ Ph.D. in Computer Science, Duke University
      Fernando Aguilar Islas, Data Science Consultant
      ⬢ 9+ years of experience serving as data scientist for graph-powered machine learning and AI-based solutions
      ⬢ Implemented several knowledge graph-based enterprise data catalogs
      ⬢ Experience leading and supporting the integration and implementation of 20+ data projects
      ⬢ MS, Applied Statistics, Penn State University
  • 5. Green Information Management: Putting the “Green” in Information Management (IM)
      What is it? Green Information Management (gIM) is a strategic approach focused on optimizing information-related processes within an organization and minimizing their environmental impact.
      Why should enterprises have a green IM strategy?
      ⬢ Sustainability: Reduce resource consumption and waste associated with IM practices.
      ⬢ Cost Efficiency: Reduce energy consumption through streamlined processes and optimized infrastructure.
      ⬢ Compliance: Address regulatory requirements to demonstrate adherence to green standards.
      ⬢ Corporate Responsibility: Commit to environmental stewardship.
  • 6. Case Study: The Challenge
      A supply-chain giant is committed to an organizational goal of becoming a Net-Zero Emissions Business. The organization realized that it has a huge digital carbon footprint due to the proliferation of duplicate content: over 226K documents occupying ~1 PB of space. The duplication arose through the use of content management systems and collaboration software, such as SharePoint and Microsoft Teams, that unintentionally build silos because of a lack of visibility and awareness.
  • 7. The Solution: AI-Powered Digital Carbon Footprint Calculator
      ORIGINAL STATE
      ● Rules-based non-record deletion application deleting forgotten non-sensitive documents periodically.
      ● Algorithm could only delete documents that were not modified for at least 3 years and marked as non-records. But a lack of sharing culture meant most documents were unnecessarily marked as sensitive.
      THE NEED
      ● Need to aggregate tens of primary sources that have slightly different metadata and access levels, yet hold duplicate or near-duplicate content, to build a content similarity and resultant carbon footprint dashboard.
      ● Need to augment the rule-based approach, which relies solely on metadata, with AI that relies on content similarity to identify duplicate and near-duplicate content.
      SOLUTION
      ● Implemented the data pipelines and matching algorithms to connect data siloed in different systems.
      ● Automated duplicate content identification to give the organization the ability to drill down into duplicate content-related findings across data sources and improve QA.
      ● Built a BI dashboard to provide a clear view into content duplication and its connection to CO2 emissions.
  • 8. The AI Connection: How to Use AI for Green IM
  • 9. Overall Solution: Phased Approach
      Phase I: Proof of Concept
      ● Refine use case, prioritize requirements and define KPIs
      ● Conduct Exploratory Data Analysis
      ● Develop Matching Algorithms
      ● Implement Content Deduplication Data Pipeline
      ● Implement CO2 Emissions BI Dashboard
      ● Track KPIs
      Phase II: Productionalization
      ● Scale Data Pipeline
      ● Enhance BI Dashboard to make it actionable
      ● Integrate Pipeline with Content Management System and Collaboration Software
      ● Develop broader Green IM strategy
  • 10. The AI Connection: End-to-End Process Overview
      Metadata Ingestion
      ● Data Source Integration
      ● Metadata Extraction
      ● Content Extraction
      ● Content Vectorization using AI
      Content Deduplication
      ● Rule-Based Metadata Similarity Analysis
      ● Stochastic Content Similarity Analysis
      ● Combining Metadata and Content Findings to identify duplicates and near-duplicates
      CO2 Emissions Viewer
      ● Duplicate Storage Impact Analysis
      ● Resultant CO2 Emissions Calculation
      ● BI Dashboard for summary statistics and drill down by key metadata fields
  • 11. The AI Connection: Content Deduplication Pipeline, Content Deduplication Process
      Data Ingestion
      ● Source system identification
      ● Establishment of data crawlers that meet system-specific access requirements
      Metadata & Content Extraction
      ● Metadata extraction from either the source system or a supporting metadata store
      ● Content extraction based on file type
      Metadata Enrichment
      ● Use of reference data/taxonomy management system
      ● Content vectorization via use of Generative AI
      Matching Algorithm Execution
      ● Rules-based duplicate inferencing on content metadata
      ● Stochastic duplicate inferencing on content vectors
      Duplicate Analysis Output
      ● Combination of rule-based and stochastic analysis to identify duplicates
      ● Resultant storage impact
      ● Resultant CO2 emissions
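The matching step on this slide combines a rule-based check on metadata with a similarity check on content vectors. The following is a minimal sketch of that idea, not the presenters' implementation. The `Doc` fields, the name-and-size rule in `metadata_match`, and the 0.9 threshold are all hypothetical choices made for illustration, and the content vectors are assumed to have been computed earlier in the pipeline.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Doc:
    doc_id: str
    name: str
    size_bytes: int
    vector: Sequence[float]  # content embedding, assumed precomputed upstream


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def metadata_match(a: Doc, b: Doc) -> bool:
    """Hypothetical rule: same file name (ignoring case) and same size."""
    return a.name.lower() == b.name.lower() and a.size_bytes == b.size_bytes


def classify_pair(a: Doc, b: Doc, threshold: float = 0.9) -> str:
    """Combine the metadata rule and the content similarity into one label."""
    similar = cosine(a.vector, b.vector) >= threshold  # threshold is an assumption
    if similar and metadata_match(a, b):
        return "duplicate"
    if similar:
        return "near-duplicate"
    return "distinct"
```

In this sketch a pair counts as a duplicate only when both signals agree, and as a near-duplicate when the content is similar but the metadata differs. A real pipeline would tune both the rule and the threshold against labeled examples.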
  • 12. Content Deduplication Pipeline: Conceptual Architecture
  • 13. Content Deduplication Pipeline: Green Application Development Considerations
      Minimize Data Movement
      ⬢ Reduce energy consumption from duplicate content storage and data transfer
      ⬢ No copies: extract content from the original file for in-memory processing
      Use Transformer Models for Vectorization
      ⬢ < 100M parameters (e.g., DistilBERT)
      ⬢ Use less memory and storage space due to smaller model size
      Run Analysis Pipelines in Cloud Infrastructure
      ⬢ Take advantage of resource efficiency at scale
      ⬢ Use under-utilized regions (e.g., Azure Norway East region) or regions powered by 100% renewable energy (e.g., AWS US East 2)
      Training OpenAI’s GPT-3.5 requires 1K GPU processors running in parallel for weeks at a time
  • 14. Benefits of the AI-Powered Digital Carbon Footprint Calculator
      ⬢ Reconcile a high volume of distinct content items to a significantly lower number of unique content items across siloed systems
      ⬢ Give users a clear view into content duplication and its connection to CO2 emissions through a meaningful dashboard
      ⬢ Establish a plug-and-play architecture for extracting content from many file types and vectorizing it using multiple Generative AI models, to best align the content similarity pipeline with organizational needs
  • 15. The Power of Data: Drive Social Changes Through Data
  • 16. Size of the Opportunity
      An estimated* 50% of corporate data is duplicated across the organization
      Real World Example
      ⬢ An email server contains 100 instances of the same 1 MB file attachment sent to 100 people
      ⬢ Without content deduplication, if all 100 people back up their mailboxes, it would consume 100 MB of storage
      In the supply-chain organization, ~226K documents occupied ~1 PB of storage, resulting in 228 tonnes of CO2 emissions.
      15% content reduction through duplicate identification: 34 tonnes CO2 (equivalent to 20 flights from JFK to LHR)
      * https://www.xillio.com/blog/recognize-duplicate-folder-structures-with-xillio-insights
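The figures on this slide can be checked with simple arithmetic. The sketch below assumes, for illustration only, that emissions scale linearly with the amount of content stored, so that a 15% content reduction removes 15% of the 228 tonnes. The function names are hypothetical.

```python
def storage_without_dedup_mb(copies: int, size_mb: float) -> float:
    """Storage consumed when every copy of an attachment is kept."""
    return copies * size_mb


def storage_saved_by_dedup_mb(copies: int, size_mb: float) -> float:
    """Storage freed if only one copy of the attachment is kept."""
    return (copies - 1) * size_mb


def emissions_avoided_tonnes(total_tonnes: float, reduction_fraction: float) -> float:
    """Emissions avoided, assuming emissions are proportional to content stored."""
    return total_tonnes * reduction_fraction


# Email example from the slide: 100 copies of a 1 MB attachment.
print(storage_without_dedup_mb(100, 1))   # 100
print(storage_saved_by_dedup_mb(100, 1))  # 99

# Case-study figures: 228 tonnes of CO2 and a 15% content reduction.
print(round(emissions_avoided_tonnes(228, 0.15), 1))  # 34.2
```

Under that linear assumption the result, about 34.2 tonnes, is consistent with the 34 tonnes quoted on the slide.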
  • 17. Why is content duplication so prevalent in the enterprise?
      NON-DELIBERATE action on the part of the user
      ● Users forget a document exists and recreate it
      ● Users cannot find what they are looking for and create it from scratch
      ● Users save email attachments, sometimes the same file multiple times
      ● Users download files from the intranet, sometimes the same file multiple times
      DELIBERATE action on the part of the user
      ● Maintain backup copy
      ● Copy file for easier transfer/distribution
      ● Use separate files for different document versions
  • 18. Automated Content Renewal
      Defensible Deletion
      ● Redundant, obsolete, trivial data held on to by users just in case
      ● A non-record deletion policy in an organization can save storage space by deleting documents that are not marked as records and have not been modified for a predefined period
      Barriers to Automated Content Removal
      ● Content incorrectly marked as records due to lack of proper compliance training
      ● Content marked with higher sensitivity labels because of a knowledge hoarding culture
      ● Content duplicated to associate different access permissions due to limited cross-departmental collaboration
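A non-record deletion policy of this kind reduces to a small eligibility rule. The sketch below is one way to express it, assuming the 3-year period and the non-sensitive condition mentioned on slide 7. The `DocumentMeta` fields and the `"non-sensitive"` label are hypothetical names chosen for illustration, and a year is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DocumentMeta:
    path: str
    is_record: bool
    sensitivity: str          # hypothetical labels, e.g. "non-sensitive" or "sensitive"
    last_modified: datetime


def eligible_for_deletion(doc: DocumentMeta, now: datetime, retention_years: int = 3) -> bool:
    """True if the document is a non-sensitive non-record untouched for the retention period."""
    cutoff = now - timedelta(days=365 * retention_years)  # approximate years as 365 days
    return (
        not doc.is_record
        and doc.sensitivity == "non-sensitive"
        and doc.last_modified <= cutoff
    )
```

The barriers listed on the slide show up directly in this rule: a document wrongly marked as a record, or given a higher sensitivity label than it needs, never becomes eligible no matter how old it is.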
  • 19. DATA-DRIVEN USER BEHAVIOR CHANGE: Goals
      “Educate and empower to influence positive behavior change.”
      Educate
      ● Facilitate self-directed and social learning opportunities for green information management
      Empower
      ● Facilitate evidence-based decisions by offering easy access to a personal CO2 emissions viewer
      ● Propel users into action by equipping them with the right interactive tool to act on the findings in the flow of work
      ● Provide the data needed to identify personal emissions trends and a way to track progress over time
  • 20. Pilot CO2 Emissions Viewer: Demo Time!
  • 21. DATA-DRIVEN USER BEHAVIOR CHANGE: Recommended Actions
      “Educate and empower to influence positive behavior change.”
      Educate
      ● Educate users to use links instead of attachments for file sharing
      ● Educate users on componentized content management
      Empower
      ● Provide accurate data
        ○ Establish KPIs measuring accuracy of duplicate detection pipeline
      ● Frame up the data in the context of the bigger picture
        ○ Enable visualization of immediate CO2 emission reduction as a result of deduplication
        ○ Enable visualization of impact of content reduction over time
        ○ Enable visualization of a personal digital footprint counter for unique content over time
      ● Create a “don’t make me think” experience with push-of-a-button actions available in the end-user application
        ○ Enable system triggers to remove content through the application interface
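The slide does not say which KPIs measure the accuracy of the duplicate detection pipeline. One common option, shown here purely as an assumption, is precision and recall of the predicted duplicate pairs against a manually labeled sample. The function name and the pair representation are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(
    predicted_pairs: Iterable[Tuple[str, str]],
    labeled_pairs: Iterable[Tuple[str, str]],
) -> Tuple[float, float]:
    """Precision and recall of predicted duplicate pairs against labeled pairs.

    Pairs are unordered, so ("a", "b") and ("b", "a") count as the same pair.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    labeled = {frozenset(p) for p in labeled_pairs}
    true_positives = len(predicted & labeled)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# One of two predictions is correct, and one of two labeled pairs is found.
print(precision_recall([("a", "b"), ("c", "d")], [("b", "a"), ("e", "f")]))  # (0.5, 0.5)
```

Precision matters most when users are asked to delete content at the push of a button, since a false positive could remove a unique document. Recall indicates how much duplicate storage the pipeline still misses.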
  • 23. Content Deduplication Use Cases
      01 Mergers and Acquisitions
      ● Streamline the integration of data from merged entities, ensuring a more efficient and sustainable data consolidation process
      ● Lower infrastructure costs and minimize environmental impact through efficient content management practices
      02 Cloud Data Migration
      ● Minimize the volume of data to be transferred, optimizing network bandwidth and reducing associated costs and energy consumption
      ● Lower operational costs and environmental impact
      03 Content Auditing and Analysis
      ● Identify redundant or obsolete data
      ● Surface similar content with different associated metadata
      04 Legal and Regulatory Compliance
      ● Reduce exposure to violations of copyrights, trademarks, or other intellectual property rights
      ● Decrease the risk of privacy breaches, as duplicate files may contain sensitive information that can be accessed by unauthorized parties
      05 Generative AI (LLMs)
      ● Increase efficiency in RAG applications by removing noise and bias
      ● Decrease costs associated with vectorizing content
  • 24. Questions? Thank you for listening. We are happy to take any questions at this time.
      Urmi Majumder: umajumder@enterprise-knowledge.com, www.linkedin.com/in/urmim/
      Fernando Aguilar Islas: fislas@enterprise-knowledge.com, www.linkedin.com/in/feraguilaris/