BP 301: What’s your second most valuable asset and nearly doubles every year?
Henning Kunz, panagenda Consulting
Florian Vogler, panagenda
Introduction
 Henning Kunz
– For about 20 years Services and Consulting guy in the Collaboration space
– More infrastructure than development
– With panagenda more and more analytics as a basis for
agile transformation projects
 Florian Vogler
– For almost all his life Client Management guru
– Development and infrastructure
– panagenda's visionary figurehead
Agenda
 Speaking of the 2nd most valuable asset and introduction
 Why are we doing this?
 Where in the world are files?
 Collecting BIG data – Basics
 Statistics – Basics
 Collecting from the file system
 Collecting from IBM Notes & Domino
 Sample reports
 Possibilities are endless (this session is not)
Before we start with the introduction
 Answer to 2nd most valuable asset
 1st most valuable asset?
What can you expect from this session?
 Thoughts on a company's file inventory
 Some code snippets to gain inventory information
 Demo is based on inventory information collected from our personal production notebooks
(and a demo backend system) using the code snippets
– Visualization is prepared using a Visual Analytics Tool
 Some ideas on how to use the outcome
FILES ARE EVERYWHERE
A file – from easy …
 In the simplest sense, a file has
– a potentially mind-boggling number of
attributes, e.g.
• folder structure
• filename
• size
– Content (which may result in attributes, too)
A file – … to complex
 Content is king!
– Zip files – header vs. files vs. file
• Zipping the same files twice can yield two zip files with different hashes, even though the files inside are identical
– Office files (pptx, xlsx, …)
• Contain a lot of information “inside”
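The "header vs. files" point can be illustrated with a minimal Python sketch (Python is used for illustration only; the session's own snippets are PowerShell). The member name, payload and timestamps are made up. Both archives hold the same payload but carry different timestamps in their headers, so the archive hashes differ while the hash of the contained file stays the same.

```python
import hashlib
import io
import zipfile


def make_zip(payload: bytes, timestamp: tuple) -> bytes:
    """Build an in-memory zip with one member whose header carries `timestamp`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("report.txt", date_time=timestamp)
        zf.writestr(info, payload)
    return buf.getvalue()


def member_md5(archive: bytes, name: str) -> str:
    """Hash the uncompressed content of one member, ignoring the zip headers."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return hashlib.md5(zf.read(name)).hexdigest()


payload = b"same content in both archives"
zip_a = make_zip(payload, (2015, 1, 27, 10, 0, 0))
zip_b = make_zip(payload, (2015, 1, 27, 10, 5, 0))

archive_hashes_differ = hashlib.md5(zip_a).hexdigest() != hashlib.md5(zip_b).hexdigest()
member_hashes_equal = member_md5(zip_a, "report.txt") == member_md5(zip_b, "report.txt")
print(archive_hashes_differ, member_hashes_equal)
```

Hashing the members rather than the container is one way to recognize identical content inside differently packed archives.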
Why are we doing this (=Why are files so important / interesting)?
 Storage Amount = Storage (and backup!) Cost
– Increase free disk space, Reduce cost
– Beware of DAOS, Centera, … before you get too excited
 Understand which (types of) files are created (rather: originated), updated, …
 … and by whom → identify knowledge / working-together clusters → Social Business
Going further (not covered in this session)
 Security & Compliance
 Content
 Beyond Windows (Linux, Mac, Mobile, …)
Mostly for French and German attendees
 Some of the use cases and examples covered could be a problem with regard to
works council regulations
 → Rethink the use case without end-user information
– E.g. instead of “who has (created) PowerPoint files” → “how many PowerPoint files do we
have across how many users” (min/avg/max, without information about actual end users)
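A minimal Python sketch of such an anonymized report, assuming the inventory is available as (user, filename) pairs; the function name and the sample data are hypothetical. User identifiers are used only for counting and never appear in the result.

```python
from collections import Counter
from statistics import mean


def pptx_summary(records):
    """records: iterable of (user, filename). Returns aggregate figures only."""
    per_user = Counter(
        user for user, name in records
        if name.lower().endswith((".ppt", ".pptx"))
    )
    counts = list(per_user.values())
    if not counts:
        return {"files": 0, "users": 0, "min": 0, "avg": 0.0, "max": 0}
    return {
        "files": sum(counts),
        "users": len(counts),  # users with at least one PowerPoint file
        "min": min(counts),
        "avg": mean(counts),
        "max": max(counts),
    }


sample = [
    ("u1", "a.pptx"), ("u1", "b.pptx"), ("u1", "c.docx"),
    ("u2", "d.ppt"),
    ("u3", "e.pptx"), ("u3", "f.pptx"), ("u3", "g.pptx"),
]
summary = pptx_summary(sample)
print(summary)
```

Note that min/avg/max here cover only users who own at least one PowerPoint file; with very few users, even aggregates may still point to individuals.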
For everyone: Things to be aware of
 The name of a file (or folder) can be a big problem on its own
– 2015-01-27_money_transfers_to_carribean_account_789XA3_PW_richmaker.xls
– Layoff_in_german_office_Q2_2015.docx
– Increase_salary_of_mr_jones_to_200000.txt
 The mere existence of a file (or folder) can create (at least an ethical) problem on its own
– On someone‘s laptop you find confidential, unauthorized, inappropriate information
• e.g. internal DWG (CAD) files, a copy of the meeting minutes from the last meeting of the board
of management, customer data, performance figures, …
– And now?
Where files are stored
 „Local“ file system
– „Fixed“ disks (C:, D:, …)
– Local removable disks: A:, B:, USB sticks, CD-ROM, …
 Network file system
– Mounted / mapped / UNC / synched (offline files)
– File server
 NSFs (Email / Applications)
– Local (with or without consistent ACL, with or without DB level encryption)
– Server
– Beware of reader fields, author fields, …
 Connections Files, FileNet, Documentum, SharePoint, Dropbox, Teamdrive, …
How to collect: WYSIWYG or AYCE
 “WYSIWYG”
– Local execution = in context of current OS user
• Other users have to login, too (may never happen)
– Network scanning in context of current OS user
• Shared network drives across departments/company
 “AYCE”
– Local execution as Admin (e.g. with SuRunAs)
• Includes Windows profiles from all users
– Batch network scanning
– Root mount scanning
What to collect
 Simple File attributes
– Name, “extension”, size, created, last modified, … (mind dates and time zones!)
 Complex (but much more useful) file attributes
– Office properties like Author, Subject, last printed, last whatever, …
– Zip / Rar / 7z / gzip / …
– (e.g. MD5) hash (same  same vs. similar)
 Very complex file attributes
– Security (R/W/…) – NSF & File system
– Fingerprints (“Linux magic numbers”)
 Hilariously complex: Content (also: similar instead of just same)
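As an illustration of the fingerprint idea, a minimal Python sketch that classifies a file by its leading bytes instead of its extension. The table holds a few well-known signatures only and the labels are made up; a real scanner would need a much larger list.

```python
# Well-known leading byte signatures ("magic numbers"); deliberately incomplete.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip container (also docx/xlsx/pptx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (legacy doc/xls/ppt)"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff(head: bytes) -> str:
    """Return a label for the first bytes of a file, or 'unknown'."""
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return sniff(fh.read(8))
```

A mismatch between extension and fingerprint (for example a `.txt` file that starts with `PK\x03\x04`) is itself a finding worth reporting.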
Mission impossible
 “Impossible” File attributes
– Not accessible
– Not visible from viewpoint of scanner
– Not used (e.g. multiuser PCs where a user doesn’t log on again)
– Encrypted (e.g. Zip with password)
Examples of what not to do
 Do not harm human beings, animals, plants or goods with your findings
– Be good, do good, be a hero!
 Do not analyze for files with same filename
– Approx. 60-70% of all files on a single machine
 Do not just delete duplicates
 Also: do not do nothing
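A minimal Python sketch of the alternative to filename matching: group inventory rows by content hash and report groups with more than one path. The function name, paths and shortened hash values are hypothetical. The sketch only reports candidates; deciding what to do with them stays a human task.

```python
from collections import defaultdict


def duplicate_groups(records):
    """records: iterable of (path, md5). Returns {md5: [paths]} for repeated hashes."""
    by_hash = defaultdict(list)
    for path, digest in records:
        by_hash[digest].append(path)
    return {d: sorted(paths) for d, paths in by_hash.items() if len(paths) > 1}


inventory = [
    (r"C:\a\readme.txt", "h1"),      # same name as the next row, different content
    (r"C:\b\readme.txt", "h2"),
    (r"C:\a\budget.xlsx", "h3"),     # different names, identical content
    (r"D:\copy\budget_v2.xlsx", "h3"),
]
groups = duplicate_groups(inventory)
print(groups)
```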
A VERY SHORT STATISTICS PITCH
Frequency distribution
 In statistics, a frequency distribution is a table that displays the frequency of various
outcomes in a sample.
 e.g. session survey feedback by 100 session participants
Answer                             Count
Speaker skill was brilliant        15
Speaker skill was good             60
Speaker skill was ok               12
Speaker skill was somewhat poor    8
Speaker skill was very poor        5
Grouped data
 A raw dataset can be organized by constructing a table showing the frequency distribution
of the variable (whose values are given in the raw dataset). Such a frequency table is often
referred to as grouped data.
 e.g. time taken to answer a survey by 15 participants
 sorted into equal-width intervals (bins) or qualitative characteristics
Time taken [s]: 10 11 9 10 14 20 11 9 14 10 9 13 12 21 24
Interval       Count
<5 s           0
5s<=t<10s      3
10s<=t<15s     9
15s<=t<20s     0
20s<=t<25s     3
Class     Interval       Count
Fast      <10s           3
Normal    10s<=t<20s     9
Slow      >=20s          3
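The grouping above can be reproduced with a few lines of Python; this is a sketch using the 15 sample values from the slide, with hypothetical helper names. Empty intervals do not appear as keys, but a `Counter` returns 0 for them.

```python
from collections import Counter

times = [10, 11, 9, 10, 14, 20, 11, 9, 14, 10, 9, 13, 12, 21, 24]


def bin_label(t, width=5):
    """Label of the equal-width interval [lower, lower + width) containing t."""
    lower = (t // width) * width
    return f"{lower}s<=t<{lower + width}s"


def speed_class(t):
    """Qualitative grouping: Fast (<10 s), Normal (10 s to <20 s), Slow (>=20 s)."""
    if t < 10:
        return "Fast"
    if t < 20:
        return "Normal"
    return "Slow"


by_interval = Counter(bin_label(t) for t in times)
by_class = Counter(speed_class(t) for t in times)
print(by_interval)
print(by_class)
```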
Histogram
A histogram is a graphical representation of the distribution of data. To construct a
histogram, the first step is to "bin" the range of values and then count how many values fall
into each interval.
e.g. time needed in [s] to rush from Dolphin Southern Hemisphere 1 to Swan Mockingbird 1-2
(Sample of 50 Participants)
Collect/Measure (raw sample, rushtime [s]):
197 187 186 179 156
179 181 173 188 188
163 202 174 178 193
169 192 170 185 172
192 169 179 174 164
181 161 137 204 167
198 185 186 148 148
185 197 231 175 184
176 175 176 187 210
180 174 180 204 158
Bin and Count:
rushtime[s]  Count
140          1
150          2
160          5
170          10
180          13
190          11
200          6
210          1
220          0
230          1
[Figure: histogram of Count over Rushtime [s], bins 140 to 230]
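A minimal Python sketch that renders the bin counts from the slide's table as a text histogram. The counts are copied from the table as given; the rendering function is hypothetical.

```python
# Bin label (rushtime in s) -> count, as listed in the slide's table.
bins = {140: 1, 150: 2, 160: 5, 170: 10, 180: 13,
        190: 11, 200: 6, 210: 1, 220: 0, 230: 1}


def render(bins):
    """One text row per bin: label, a bar of '#' characters, and the count."""
    return [f"{label:>4} | {'#' * count} {count}"
            for label, count in sorted(bins.items())]


lines = render(bins)
print("\n".join(lines))
```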
SCAN FILESYSTEMS
Local
 Scan local Windows-based drives
(locally mounted hard disks, portable drives or mounted)
 Using PowerShell
– Script 1. Collect file system information with MD5 and SHA1 hashes
– Needs PowerShell V4
– Uses: Scripting.FileSystemObject, get-acl cmdlet, get-hash cmdlet
– Run locally with ‘super user’ rights
 3 Result files
– Folders (Folder Path, LastWriteTime, Size, FileCount, Depth , FolderName)
– ACLs (Folder Path, IdentityReference, AccessControlType)
– Files (Folder Path, FileName, CreationTime, LastWriteTime, Size, Extension, MD5, SHA1)
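For readers without the PowerShell script at hand, here is a minimal Python sketch of the "Files" result only. It is not the session's Script 1. Column names follow the list above; CreationTime is left out because its meaning differs between platforms in Python, and the folder and ACL results are not covered.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

FIELDS = ["FolderPath", "FileName", "LastWriteTime", "Size", "Extension", "MD5", "SHA1"]


def file_row(path):
    """Collect simple attributes plus MD5 and SHA1 for one file."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # read in 1 MiB chunks
            md5.update(chunk)
            sha1.update(chunk)
    st = os.stat(path)
    folder, name = os.path.split(path)
    return {
        "FolderPath": folder,
        "FileName": name,
        # Stored as UTC to avoid time zone ambiguity between scanned machines.
        "LastWriteTime": datetime.fromtimestamp(st.st_mtime, timezone.utc).isoformat(),
        "Size": st.st_size,
        "Extension": os.path.splitext(name)[1].lower(),
        "MD5": md5.hexdigest(),
        "SHA1": sha1.hexdigest(),
    }


def scan(root):
    """Yield one row per readable file below root; unreadable files are skipped."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            try:
                yield file_row(os.path.join(folder, name))
            except OSError:
                continue


def write_csv(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_csv(scan(r"C:\Users"), "files.csv")` would produce the file list; both the path and the output name are placeholders.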
A short note on PowerShell Execution Policy
 There is something like execution security in PowerShell
 Execution Policy is Undefined in all scopes by default
– The effective policy is then Restricted (on Windows clients): individual commands run from the console, but scripts do not
 Policy types
– Restricted, AllSigned, RemoteSigned, Unrestricted, Bypass, Undefined
 Scope
– LocalMachine, CurrentUser, Process
A short note on PowerShell Execution Policy
 To see current settings
Get-ExecutionPolicy -List
 To set
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
 RemoteSigned allows execution of “own” unsigned scripts
– “own” means scripts created locally (written/edited/saved in PowerShell ISE
on the local machine), as opposed to scripts downloaded from the Internet
– we will not talk about signing PowerShell scripts in this session;
it's not like “sign using the current user's ID”
http://technet.microsoft.com/en-us/library/hh847748.aspx
PowerShell Snippet
Enhancement: Collecting Office attributes for .doc* files
 Scan local Windows-based drives
(locally mounted hard disks, portable drives or mounted )
 Using PowerShell
– Script 2. Collect file system information with MD5 and SHA1 hashes and .doc* attributes
– Uses: -ComObject Word.Application
BuiltInDocumentProperties
 3 Result files
– Folders (Folder Path, LastWriteTime, Size, FileCount, Depth , FolderName)
– ACLs (Folder Path, IdentityReference, AccessControlType)
– Files (Folder Path, FileName, CreationTime, LastWriteTime, Size, Extension, MD5, SHA1,
Created, Author, Title, Last print date)
Snippet 2
BuiltinDocumentProperties
1 Title
2 Subject
3 Author
4 Keywords
5 Comments
6 Template
7 Last author
8 Revision number
9 Application name
10 Last print date
11 Creation date
12 Last save time
13 Total editing time
14 Number of pages
15 Number of words
16 Number of characters
17 Security
18 Category
19 Format
20 Manager
21 Company
22 Number of bytes
23 Number of lines
24 Number of paragraphs
25 Number of slides
26 Number of notes
27 Number of hidden Slides
28 Number of multimedia clips
29 Hyperlink base
30 Number of characters (with spaces)
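The session's Script 2 reads these properties through Word's COM object. As a different technique that needs no Office installation, the following Python sketch reads a few core properties directly from an Office Open XML file (.docx, .xlsx, .pptx), which is a zip container holding `docProps/core.xml`. It does not work for legacy binary .doc files, and the output field names are chosen to mirror the result columns above.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an OOXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output column -> element in docProps/core.xml
FIELDS = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Created": "dcterms:created",
    "LastPrinted": "cp:lastPrinted",
}


def core_properties(source):
    """source: path or binary file-like object of a .docx/.xlsx/.pptx file.

    Returns a dict of the selected properties; missing ones are empty strings.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return {key: root.findtext(tag, default="", namespaces=NS)
            for key, tag in FIELDS.items()}
```

Calling `core_properties("report.docx")` (a placeholder path) returns a dictionary that can be appended to the file row of the inventory.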
Collecting inventory from “Fileserver 2.0”
 Scan SharePoint Inventory
 Using PowerShell
– Script 3. Collect item information from SharePoint Server
– Uses: SharePoint cmdlets
– Result: Web Application, Site, Web, List, Item ID, Item URL, Item Title, Item Created,
Item Modified, File Size, Author, Versions, Filename
Snippet 3
SCAN FILES IN NSF CONTAINERS
IBM Notes & Domino
 NSFs (Email / Applications)
– Local (with or without consistent ACL, with or without DB level encryption)
– Server
– ACL, reader fields, author fields, document / field encryption, …
– zip-file content
– Fields in general (Subject, from, to, cc:, bcc:, created, modified, Body, …)
• The Subject of a Notes document can be just as problematic as the name of a file (attachment)
• Actually this may apply to pretty much any field
• Note: Message Tracking ID
– ATTNQ# (today‘s *00#.*)
Fs_free_main.exe ConnectED 2015 Edition
 Special stand-alone version to scan the local file system and NSF files
 Inspects zip file content (deliberately limited to filesystem)
 Runs from command line with parameters
– Uses local notes.ini and user.id / server.id
– Therefore in security context of used id-file (ACLs, Reader Fields, DB/Document Encryption)
– Lists (unprotected) zip file content
– Based on C-API
 Result: Path, Size, Modified, MD5, SHA-1
CHART TIME
….EXAMPLE RESULTS DEMO…
Script 1:     16,728 folders, 127,000 files
Script 2:     1,150 doc files
Script 3:     1,316 SP files
Fs.freemain:  1,200,000 records (250 MB)
POSSIBILITIES ARE ENDLESS….
Beyond the shown
 Until now we just analyzed what's out there
 How could we use that information?
 Let's think about some interesting use cases
File Server Migrations – File Consolidations
 Use the analysis to understand your file inventory
 With respect to
– File types → which files fit into the target system
(e.g. office files, pdf, jpg, png, wav versus xml, properties, files from non-office applications)
– And their
• Volume distribution
• Count distribution
– Uniqueness of local files
– Time stamps (retention, usage hint)
 And act/size based on that information
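A minimal Python sketch of the count and volume distribution per file type, assuming the inventory is reduced to (extension, size in bytes) pairs; the function name and the sample numbers are hypothetical.

```python
from collections import defaultdict


def by_extension(files):
    """files: iterable of (extension, size_in_bytes).

    Returns (extension, count, total_bytes) rows, largest volume first.
    """
    stats = defaultdict(lambda: [0, 0])
    for ext, size in files:
        entry = stats[ext.lower()]
        entry[0] += 1      # count distribution
        entry[1] += size   # volume distribution
    return sorted(((ext, count, volume) for ext, (count, volume) in stats.items()),
                  key=lambda row: row[2], reverse=True)


sample = [(".pptx", 4_000_000), (".PPTX", 6_000_000), (".xml", 2_000),
          (".xml", 3_000), (".xml", 1_000), (".wav", 50_000_000)]
report = by_extension(sample)
for ext, count, volume in report:
    print(f"{ext:<6} {count:>3} files {volume:>12} bytes")
```

Sorting by volume and by count often gives different rankings: many small files matter for migration effort, a few large ones for storage sizing.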
Suggest Community Clusters
 Based on analysis outcomes
– Inventory overlap
– Same authors, editors
– Same access rights
– Metadata
 Think of it as a one-time function to rearrange your file world in the first step
 Could be used in the context of an attachment, like SwiftFile*, in the second step
– may require content analysis
*http://www-01.ibm.com/support/docview.wss?uid=swg24034409
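One way to quantify "inventory overlap" is the Jaccard index over the sets of file hashes found per user or location. The following Python sketch assumes such sets are already available; the threshold, the names and the sample data are hypothetical, and it ignores authors, access rights and other metadata.

```python
from itertools import combinations


def overlap(a, b):
    """Jaccard index: shared hashes divided by all hashes of both inventories."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_pairs(inventories, threshold=0.5):
    """inventories: {owner: set of file hashes}. Returns (owner1, owner2, score)
    for every pair whose overlap reaches the threshold, best match first."""
    pairs = []
    for (o1, h1), (o2, h2) in combinations(sorted(inventories.items()), 2):
        score = overlap(h1, h2)
        if score >= threshold:
            pairs.append((o1, o2, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


inventories = {
    "team-a": {"h1", "h2", "h3"},
    "team-b": {"h2", "h3", "h4"},
    "team-c": {"h9"},
}
suggestions = suggest_pairs(inventories)
print(suggestions)
```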
Company File Locations
 You do not have to store this file again …
 As a hint for a so far unknown collaboration cluster / community
 Used in the context of an attachment inside Notes
– Shows all MD5-identical files found at previously scanned locations inside the company
 Biggest challenges
– Real time performance (needs ongoing periodic scanning of all sources)
– Security trimming
(the accounts & groups of all scanned sources have to be resolved/mapped)
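A minimal Python sketch of such a lookup with security trimming, assuming earlier scans produced (MD5, location, reader groups) entries and that the groups of all sources have already been mapped to common names. The class, the locations and the group names are hypothetical.

```python
from collections import defaultdict


class LocationIndex:
    """Maps a content hash to the locations where that content was found."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def add(self, md5, location, readers):
        """Register one scanned file with the groups allowed to read it."""
        self._by_hash[md5].append((location, frozenset(readers)))

    def lookup(self, md5, user_groups):
        """Return only those locations the asking user is allowed to see."""
        groups = set(user_groups)
        return [location for location, readers in self._by_hash.get(md5, [])
                if readers & groups]


index = LocationIndex()
index.add("h1", r"\\fileserver\sales\offer.pdf", {"sales"})
index.add("h1", r"\\fileserver\board\offer_final.pdf", {"board"})
visible = index.lookup("h1", {"sales", "everyone"})
print(visible)
```

The index is only as fresh as the last scan, which is why the periodic scanning mentioned above matters.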
THANK YOU
NOTE: POSSIBILITIES ARE ENDLESS, EVEN MORE SO BEYOND FILES
florian.vogler@panagenda.com, henning.kunz@panagenda.com
come and visit us in the TechnOasis #PED G3 A-C!
Download the latest slide deck and code snippets www.panagenda.com/connected2015files

 
Hybrid Environments and What They Mean for HCL Notes and Nomad
Hybrid Environments and What They Mean for HCL Notes and NomadHybrid Environments and What They Mean for HCL Notes and Nomad
Hybrid Environments and What They Mean for HCL Notes and Nomad
 

Kürzlich hochgeladen

Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 

Kürzlich hochgeladen (20)

VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 

BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year?

  • 1. BP 301: What’s your second most valuable asset and nearly doubles every year? Henning Kunz, panagenda Consulting Florian Vogler, panagenda
  • 2. Introduction
      ▪ Henning Kunz
          – For about 20 years, a services and consulting guy in the collaboration space
          – More infrastructure than development
          – With panagenda, more and more analytics as a basis for agile transformation projects
      ▪ Florian Vogler
          – Client management guru for almost all his life
          – Development and infrastructure
          – panagenda's visionary figurehead
  • 3. Agenda
      ▪ Speaking of the 2nd most valuable asset and introduction
      ▪ Why are we doing this?
      ▪ Where in the world are files?
      ▪ Collecting BIG data – Basics
      ▪ Statistics – Basics
      ▪ Collecting from the file system
      ▪ Collecting from IBM Notes & Domino
      ▪ Sample reports
      ▪ Possibilities are endless (this session is not)
  • 4. Before we start with the introduction
      ▪ Answer to 2nd most valuable asset
      ▪ 1st most valuable asset?
  • 5. What can you expect from this session?
      ▪ Thoughts on a company's file inventory
      ▪ Some code snippets to gain inventory information
      ▪ Demo is based on inventory information collected from our personal production notebooks (and a demo backend system) using the code snippets
          – Visualization is prepared using a visual analytics tool
      ▪ Some ideas on how to use the outcome
  • 7. A file – from easy …
      ▪ In the simplest sense, a file has
          – a potentially mind-boggling number of attributes, e.g.
              • folder structure
              • filename
              • size
          – content (which may result in attributes, too)
  • 8. A file – … to complex
      ▪ Content is king!
          – Zip files – header vs. files vs. file
              • Zipping the same files twice gives each zip file its own, different hash …
          – Office files (pptx, xlsx, …)
              • Contain a lot of information "inside"
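To illustrate the "header vs. files" point, here is a minimal Python sketch. It is not one of the session's snippets. It builds two archives in memory from the same payload, and only the timestamp stored in the entry headers differs (an assumption chosen to make the effect reproducible). The helper names `build_zip` and `member_hashes` are illustrative.

```python
import hashlib
import io
import zipfile


def build_zip(files, date_time):
    """Build an in-memory zip; date_time is stored in each entry's header."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()


def member_hashes(zip_bytes):
    """Return {member name: MD5 of the uncompressed content}."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {n: hashlib.md5(zf.read(n)).hexdigest() for n in zf.namelist()}


# Same payload, header timestamps two seconds apart (hypothetical values)
files = {"report.txt": b"quarterly numbers", "notes.txt": b"meeting minutes"}
zip_a = build_zip(files, (2015, 1, 27, 10, 0, 0))
zip_b = build_zip(files, (2015, 1, 27, 10, 0, 2))

same_archive_hash = hashlib.md5(zip_a).hexdigest() == hashlib.md5(zip_b).hexdigest()
same_member_hashes = member_hashes(zip_a) == member_hashes(zip_b)
print(same_archive_hash, same_member_hashes)
```

Comparing per-member hashes instead of the archive hash is one way to recognize identical content inside differently packaged zip files. It only works for archives that can be opened without a password.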
  • 9. Why are we doing this (= why are files so important / interesting)?
      ▪ Storage amount = storage (and backup!) cost
          – Increase free disk space, reduce cost
          – Beware of DAOS, Centera, … before you get too excited
      ▪ Understand which (types of) files are created (rather: originated), updated, …
      ▪ … and by whom
      ▪ Identify knowledge / working-together clusters
      ▪ Social Business
      Going further (not covered in this session)
      ▪ Security & Compliance
      ▪ Content
      ▪ Beyond Windows (Linux, Mac, Mobile, …)
  • 10. Mostly for French and German attendees
      ▪ Some of the use cases and examples covered could be a problem with regard to works council regulations
      ▪ → Rethink the use case without end user information
          – E.g. instead of "who has (created) PowerPoint files" → "how many PowerPoint files do we have across how many users" (min/avg/max, without information about actual end users)
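The "min/avg/max without end user information" idea could look like the following Python sketch. The records, user names and the function `anonymous_summary` are hypothetical and only show the aggregation step. Whether such a report satisfies a given works council is a separate question.

```python
from collections import Counter
from statistics import mean

# Hypothetical inventory records: (user, file extension)
records = [
    ("alice", ".pptx"), ("alice", ".pptx"), ("alice", ".docx"),
    ("bob", ".pptx"),
    ("carol", ".pptx"), ("carol", ".pptx"), ("carol", ".pptx"),
    ("dave", ".xlsx"),
]


def anonymous_summary(records, extension):
    """Summarize files of one type per user without exposing user names."""
    per_user = Counter(user for user, ext in records if ext == extension)
    counts = list(per_user.values())
    if not counts:
        return {"files": 0, "users": 0, "min": 0, "avg": 0.0, "max": 0}
    return {
        "files": sum(counts),
        "users": len(counts),  # only users owning at least one such file
        "min": min(counts),
        "avg": mean(counts),
        "max": max(counts),
    }


summary = anonymous_summary(records, ".pptx")
print(summary)
```

The user names are needed while counting but never leave the function. Only the aggregate is reported.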
  • 11. For everyone: Things to be aware of
      ▪ The name of a file (or folder) can be a big problem on its own
          – 2015-01-27_money_transfers_to_carribean_account_789XA3_PW_richmaker.xls
          – Layoff_in_german_office_Q2_2015.docx
          – Increase_salary_of_mr_jones_to_200000.txt
      ▪ The mere existence of a file (or folder) can create an (at least ethical) problem on its own
          – On someone's laptop you find confidential, unauthorized, inappropriate information
              • e.g. internal DWG (CAD) files, a copy of the minutes from the last board of management meeting, customer data, performance figures, …
          – And now?
  • 12. Where files are stored
      ▪ "Local" file system
          – "Fixed" disks (C:, D:, …)
          – Local removable disks: A:, B:, USB sticks, CD-ROM, …
      ▪ Network file system
          – Mounted / mapped / UNC / synched (offline files)
          – File server
      ▪ NSFs (Email / Applications)
          – Local (with or without consistent ACL, with or without DB level encryption)
          – Server
          – Beware of reader fields, author fields, …
      ▪ Connections Files, FileNet, Documentum, SharePoint, Dropbox, Teamdrive, …
  • 13. How to collect: WYSIWYG or AYCE
      ▪ "WYSIWYG"
          – Local execution = in context of current OS user
              • Other users have to log in, too (may never happen)
          – Network scanning in context of current OS user
              • Shared network drives across departments/company
      ▪ "AYCE"
          – Local execution as Admin (e.g. with SuRunAs)
              • Includes Windows profiles from all users
          – Batch network scanning
          – Root mount scanning
  • 14. What to collect
      ▪ Simple file attributes
          – Name, "extension", size, created, last modified, … (dates and time zones!)
      ▪ Complex (but much more useful) file attributes
          – Office properties like Author, Subject, last printed, last whatever, …
          – Zip / Rar / 7z / gzip / …
          – Hash (e.g. MD5): same = same, vs. similar
      ▪ Very complex file attributes
          – Security (R/W/…)
          – NSF & file system
          – Fingerprints ("Linux magic numbers")
      ▪ Hilariously complex: content (also: similar instead of just same)
  • 15. Mission impossible
      ▪ "Impossible" file attributes
          – Not accessible
          – Not visible from viewpoint of scanner
          – Not used (e.g. multiuser PCs where a user doesn't log on again)
          – Encrypted (e.g. Zip with password)
  • 16. Examples of what not to do
      ▪ Do not harm human beings, animals, plants or goods with your findings
          – Be good, do good, be a hero!
      ▪ Do not analyze for files with the same filename
          – This matches approx. 60-70% of all files on a single machine
      ▪ Do not just delete duplicates
      ▪ Also: do not do nothing
  • 18. Frequency distribution
      ▪ In statistics, a frequency distribution is a table that displays the frequency of various outcomes in a sample.
      ▪ E.g. session survey feedback by 100 session participants

      Answer                             Count
      Speaker skill was brilliant           15
      Speaker skill was good                60
      Speaker skill was ok                  12
      Speaker skill was somewhat poor        8
      Speaker skill was very poor            5
  • 19. Grouped data
      ▪ A raw dataset can be organized by constructing a table showing the frequency distribution of the variable (whose values are given in the raw dataset). Such a frequency table is often referred to as grouped data.
      ▪ E.g. time taken to answer a survey by 15 participants
      ▪ Sorted into equal-width intervals (bins) or by qualitative characteristics

      Time taken [s]: 10 11 9 10 14 20 11 9 14 10 9 13 12 21 24

      Interval       Count
      <5s                0
      5s<=t<10s          3
      10s<=t<15s         9
      15s<=t<20s         0
      20s<=t<25s         3

      Category   Interval       Count
      Fast       <10s               3
      Normal     10s<=t<20s         9
      Slow       >=20s              3
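The grouping on this slide can be reproduced with a few lines of Python. This is a sketch using the 15 survey times from the slide. The function names `bin_label` and `category` are illustrative, and the first bin is labelled `0s<=t<5s` here rather than `<5s`.

```python
from collections import Counter

# Raw data from the slide: time taken [s] by 15 participants
times = [10, 11, 9, 10, 14, 20, 11, 9, 14, 10, 9, 13, 12, 21, 24]


def bin_label(t, width=5):
    """Label the half-open interval [lo, lo + width) that contains t."""
    lo = (t // width) * width
    return f"{lo}s<=t<{lo + width}s"


def category(t):
    """Qualitative grouping used on the slide."""
    if t < 10:
        return "Fast"
    if t < 20:
        return "Normal"
    return "Slow"


interval_counts = Counter(bin_label(t) for t in times)
category_counts = Counter(category(t) for t in times)
print(dict(interval_counts))
print(dict(category_counts))
```

A `Counter` returns 0 for bins that received no values, so empty intervals such as `15s<=t<20s` need no special handling when the table is printed.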
  • 20. Histogram
      A histogram is a graphical representation of the distribution of data. To construct a histogram, the first step is to "bin" the range of values and then count how many values fall into each interval.
      E.g. time needed in [s] to rush from Dolphin Southern Hemisphere 1 to Swan Mockingbird 1-2 (sample of 50 participants)

      Collect/Measure (rush time [s]):
      197 187 186 179 156 179 181 173 188 188
      163 202 174 178 193 169 192 170 185 172
      192 169 179 174 164 181 161 137 204 167
      198 185 186 148 148 185 197 231 175 184
      176 175 176 187 210 180 174 180 204 158

      Bin and Count:
      Rushtime [s]   Count
      140                1
      150                2
      160                5
      170               10
      180               13
      190               11
      200                6
      210                1
      220                0
      230                1

      [Histogram: Count (0 to 14) vs. Rushtime [s] (140 to 230)]
  • 22. Local
      ▪ Scan local Windows based drives (locally mounted hard disks, portable drives or mounted drives)
      ▪ Using PowerShell
          – Script 1: Collect file system information with MD5 and SHA1 hashes
          – Needs PowerShell V4
          – Uses: Scripting.FileSystemObject, get-acl cmdlet, get-hash cmdlet
          – Run locally with "super user" rights
      ▪ 3 result files
          – Folders (Folder Path, LastWriteTime, Size, FileCount, Depth, FolderName)
          – ACLs (Folder Path, IdentityReference, AccessControlType)
          – Files (Folder Path, FileName, CreationTime, LastWriteTime, Size, Extension, MD5, SHA1)
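The session's Script 1 is written in PowerShell and is not reproduced in this transcript. As a rough illustration of the same idea, the following Python sketch walks a directory tree and emits one record per file with MD5 and SHA-1 hashes. The field names mirror the "Files" result file. `scan` and `file_hashes` are illustrative names, and CreationTime and ACLs are left out because they are platform dependent.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def file_hashes(path, chunk_size=1 << 20):
    """Compute MD5 and SHA-1 in one pass, reading the file in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def scan(root):
    """Yield one record per readable file below root."""
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                stat = os.stat(path)
                md5, sha1 = file_hashes(path)
            except OSError:
                continue  # file not accessible to the scanning user: skip it
            yield {
                "FolderPath": folder,
                "FileName": name,
                "LastWriteTime": datetime.fromtimestamp(
                    stat.st_mtime, timezone.utc
                ).isoformat(),
                "Size": stat.st_size,
                "Extension": os.path.splitext(name)[1].lower(),
                "MD5": md5,
                "SHA1": sha1,
            }


# Usage on a throwaway directory
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "hello.TXT"), "wb") as fh:
        fh.write(b"hello")
    rows = list(scan(root))
print(rows[0]["FileName"], rows[0]["Size"], rows[0]["MD5"])
```

Timestamps are stored in UTC so that records collected in different time zones stay comparable. Reading in chunks keeps memory use flat for large files.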
  • 23. A short note on PowerShell Execution Policy
      ▪ There is something like execution security in PowerShell
      ▪ Execution Policy is Undefined by default (on Windows clients this is effectively Restricted)
          – Thus it permits individual commands from the console, but will not run scripts
      ▪ Policy types
          – Restricted, AllSigned, RemoteSigned, Unrestricted, Bypass, Undefined
      ▪ Scope
          – LocalMachine, CurrentUser, Process
  • 24. A short note on PowerShell Execution Policy
      ▪ To see current settings: Get-ExecutionPolicy -List
      ▪ To set: Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
      ▪ RemoteSigned allows execution of "own" unsigned scripts
          – "own" means scripts written/edited/saved in PowerShell ISE on the local machine
          – We will not talk about signing PowerShell scripts in this session; it's not like "sign using current user's id"
      http://technet.microsoft.com/en-us/library/hh847748.aspx
  • 26. Enhancement: Collecting Office attributes for .doc* files
      ▪ Scan local Windows based drives (locally mounted hard disks, portable drives or mounted drives)
      ▪ Using PowerShell
          – Script 2: Collect file system information with MD5 and SHA1 hashes and .doc* attributes
          – Uses: -ComObject Word.Application BuiltInDocumentProperties
      ▪ 3 result files
          – Folders (Folder Path, LastWriteTime, Size, FileCount, Depth, FolderName)
          – ACLs (Folder Path, IdentityReference, AccessControlType)
          – Files (Folder Path, FileName, CreationTime, LastWriteTime, Size, Extension, MD5, SHA1, Created, Author, Title, Last print date)
  • 27. Snippet 2: BuiltinDocumentProperties
       1  Title
       2  Subject
       3  Author
       4  Keywords
       5  Comments
       6  Template
       7  Last author
       8  Revision number
       9  Application name
      10  Last print date
      11  Creation date
      12  Last save time
      13  Total editing time
      14  Number of pages
      15  Number of words
      16  Number of characters
      17  Security
      18  Category
      19  Format
      20  Manager
      21  Company
      22  Number of bytes
      23  Number of lines
      24  Number of paragraphs
      25  Number of slides
      26  Number of notes
      27  Number of hidden slides
      28  Number of multimedia clips
      29  Hyperlink base
      30  Number of characters (with spaces)
  • 28. Collecting inventory from "Fileserver 2.0"
      ▪ Scan SharePoint inventory
      ▪ Using PowerShell
          – Script 3: Collect item information from SharePoint Server
          – Uses: SharePoint cmdlets
          – Result: Web Application, Site, Web, List, Item ID, Item URL, Item Title, Item Created, Item Modified, File Size, Author, Versions, Filename
  • 30. SCAN FILES IN NSF CONTAINERS
  • 31. IBM Notes & Domino
      ▪ NSFs (Email / Applications)
          – Local (with or without consistent ACL, with or without DB level encryption)
          – Server
          – ACL, reader fields, author fields, document / field encryption, …
          – zip-file content
          – Fields in general (Subject, from, to, cc:, bcc:, created, modified, Body, …)
              • The Subject of a Notes document can be just as problematic as the name of a file (attachment)
              • Actually this may apply to pretty much any field
              • Note: Message Tracking ID – ATTNQ# (today's *00#.*)
  • 32. Fs_free_main.exe ConnectED 2015 Edition
      ▪ Special stand-alone version to scan the local file system and NSF files
      ▪ Inspects zip file content (deliberately limited to the file system)
      ▪ Runs from the command line with parameters
          – Uses local notes.ini and user.id / server.id
          – Therefore runs in the security context of the ID file used (ACLs, Reader Fields, DB/Document Encryption)
          – Lists (unprotected) zip file content
          – Based on C-API
      ▪ Result: Path,Size,Modified,md5,sha-1
  • 33. CHART TIME … EXAMPLE RESULTS DEMO …
      Script 1: 16,728 folders, 127,000 files
      Script 2: 1,150 doc files
      Script 3: 1,316 SP files
      Fs.freemain: 1,200,000 records (250 MB)
  • 35. Beyond the shown
      ▪ Until now we just analyzed what's out there
      ▪ How could we use that information?
      ▪ Let's think about some interesting use cases
  • 36. File Server Migrations – File Consolidations
      ▪ Use the analysis to understand your file inventory
      ▪ With respect to
          – File types → which files fit into the target system (e.g. office files, pdf, jpg, png, wav versus xml, properties, files from non-office applications)
          – And their
              • Volume distribution
              • Count distribution
          – Uniqueness of local files
          – Time stamps (retention, usage hint)
      ▪ And act/size based on that information
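A count and volume distribution per file type can be derived from the collected inventory along the following lines. This is a Python sketch with made-up rows. `TARGET_TYPES` is an assumed list of types the target system accepts, not something stated in the session.

```python
from collections import defaultdict

# Hypothetical inventory rows: (extension, size in bytes)
inventory = [
    (".docx", 120_000), (".docx", 80_000), (".pdf", 500_000),
    (".xml", 2_000), (".pdf", 300_000), (".properties", 1_000),
]

# Assumed list of file types the target system accepts
TARGET_TYPES = {".docx", ".xlsx", ".pptx", ".pdf", ".jpg", ".png", ".wav"}


def distribution(rows):
    """Return {extension: {"count": n, "bytes": total}}."""
    stats = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ext, size in rows:
        stats[ext]["count"] += 1
        stats[ext]["bytes"] += size
    return dict(stats)


stats = distribution(inventory)
fits = {ext: s for ext, s in stats.items() if ext in TARGET_TYPES}
stays = {ext: s for ext, s in stats.items() if ext not in TARGET_TYPES}
print(fits)
print(stays)
```

Splitting the result into `fits` and `stays` gives a first sizing estimate for the target system and a list of file types that need a different home.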
  • 37. Suggest Community Clusters
      ▪ Based on analysis outcomes
          – Inventory overlap
          – Same authors, editors
          – Same access rights
          – Metadata
      ▪ Think of it as a one-time functionality to rearrange your files world in the first step
      ▪ Could be used in the context of an attachment like SwiftFile* in the second step
          – may require content analysis
      *http://www-01.ibm.com/support/docview.wss?uid=swg24034409
  • 38. Company File Locations
      ▪ You do not have to store this file again …
      ▪ As a hint for a so far unknown collaboration cluster / community
      ▪ Used in the context of an attachment inside Notes
          – Shows all MD5 identical files found at previously scanned locations inside the company
      ▪ Biggest challenges
          – Real time performance (needs ongoing periodic scanning of all sources)
          – Security trimming (the accounts & groups of all scanned sources have to be resolved/mapped)
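The "MD5 identical files at previously scanned locations" lookup boils down to an index from hash to locations. A minimal Python sketch follows. The records, paths and function names are hypothetical, and security trimming is deliberately not handled.

```python
from collections import defaultdict

# Hypothetical records from earlier scans: (MD5, location)
scanned = [
    ("5d41402abc4b2a76b9719d911017c592", r"C:\Users\user1\Documents\plan.pptx"),
    ("5d41402abc4b2a76b9719d911017c592", r"\\fileserver\projects\plan_final.pptx"),
    ("7d793037a0760186574b0282f2f435e7", r"C:\Users\user2\Desktop\notes.txt"),
]


def build_index(records):
    """Map each MD5 to every location where that content was seen."""
    index = defaultdict(list)
    for md5, location in records:
        index[md5].append(location)
    return index


def known_locations(index, md5):
    """Return locations already holding identical content (empty if none)."""
    return index.get(md5, [])


index = build_index(scanned)
hits = known_locations(index, "5d41402abc4b2a76b9719d911017c592")
print(len(hits))
```

In a real deployment the result would have to be filtered by what the requesting user is allowed to see, which is the security trimming challenge named on the slide.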
  • 40. THANK YOU
      NOTE: POSSIBILITIES ARE ENDLESS – MORE SO BEYOND FILES
      florian.vogler@panagenda.com, henning.kunz@panagenda.com
      Come and visit us in the TechnOasis #PED G3 A-C!
      Download the latest slide deck and code snippets: www.panagenda.com/connected2015files