1
2
3
I’m a former chemistry researcher who was really bad at the data management game
the first time I played it.
Now I’m a data services librarian who has produced a book, a blog, and videos in this
area.
I want to make the data management game easy and understandable to all players.
This presentation will not only show you tools but also provide tips on leveling up
during the game.
4
5
6
Beware flash drives as a storage option.
7
Cloud storage is a great option for the 3-2-1 Rule’s offsite copy.
Not all cloud storage is created equal (read Google Drive’s terms of service). And don’t
rely on cloud storage alone for your data (several horror stories here).
Many cloud storage providers offer free storage up to a certain amount, and then it’s a
paid plan.
I like SpiderOak. It is primarily a cloud backup service, so it is less suited to file
sharing (other options are available for that).
It’s billed as “zero knowledge” cloud storage. Files are encrypted on your computer
before they are sent to SpiderOak’s servers, so the company can’t read your files and they
stay secure while travelling across the internet (this is really important).
I combine this with my local computer and an external hard drive to make my 3 copies.
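Keeping three copies only helps if the copies actually match. As a minimal sketch (the function names `sha256_of` and `copies_match` are my own, and the paths you pass in are whatever your three locations happen to be), a checksum comparison in Python could look like this:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths: list[Path]) -> bool:
    """True when every listed copy exists and all copies have identical contents."""
    if not paths or not all(p.is_file() for p in paths):
        return False
    return len({sha256_of(p) for p in paths}) == 1
```

This only checks copies you can read as local files (for example the computer and the external drive); a cloud copy would need to be downloaded or synced first.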
8
9
10
I don’t use Bulk Rename Utility often, but it’s so useful when I do.
Bulk Rename Utility is free for personal users on Windows.
It allows you to rename a large number of files at the same time (such as when you
have a file naming convention you want to apply to existing files).
The interface looks complicated but that is because it is so powerful.
You can: replace particular characters, add or remove things at a particular position,
easily add numbering or dates, swap parts of the file name around, etc.
It takes a few minutes to learn, but it’s a great tool to have in your back pocket.
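If you are not on Windows, or prefer a script, the same idea can be sketched in a few lines of Python. This is not how Bulk Rename Utility works internally; it is an illustration of applying a naming convention to existing files. The convention used here (`<prefix>_<three-digit number>`) and the function names are assumptions for the example.

```python
from pathlib import Path


def plan_renames(folder: Path, prefix: str, pattern: str = "*.csv") -> list[tuple[Path, Path]]:
    """Pair each matching file with its new name: <prefix>_<NNN><original suffix>."""
    files = sorted(p for p in folder.glob(pattern) if p.is_file())
    return [
        (old, old.with_name(f"{prefix}_{number:03d}{old.suffix}"))
        for number, old in enumerate(files, start=1)
    ]


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Rename files, refusing to overwrite anything that already exists."""
    for old, new in plan:
        if old == new:
            continue  # already follows the convention
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new}")
        old.rename(new)
```

Building the plan first and applying it second lets you review the new names before anything on disk changes.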
11
12
Regular expressions (regex) are an amazing tool for search and replace.
Regex doesn’t stand alone, but rather plugs into other tools like Bulk Rename Utility,
Notepad++, Java, etc.
Regex works by pattern matching, allowing you to search for all social security numbers
in a document, reformat any phone numbers, change the order of sections in a
document but keep the text the same, etc.
Regex takes a bit more learning but is incredibly useful for anyone doing text
manipulation or clean up.
The first link on this slide is to a tutorial I like.
The second link is to a tool, RegExr, that allows you to test your written regular
expressions against text.
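To give a flavor of what these patterns look like, here is a small sketch using Python's built-in `re` module. The patterns assume US-style formats (ten-digit phone numbers and `NNN-NN-NNNN` social security numbers) and are illustrative only; real data usually needs more careful patterns.

```python
import re

# Assumed formats: 555-123-4567, 555.123.4567, (555) 123-4567, or 5551234567.
PHONE = re.compile(r"(?<!\d)\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)")
# Assumed format: three digits, two digits, four digits, separated by hyphens.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# "Lastname, Firstname" with single-word names.
NAME = re.compile(r"(\w+), (\w+)")


def normalize_phones(text: str) -> str:
    """Rewrite every matched phone number as (NNN) NNN-NNNN."""
    return PHONE.sub(r"(\1) \2-\3", text)


def find_ssns(text: str) -> list[str]:
    """Return every substring that looks like a social security number."""
    return SSN.findall(text)


def swap_names(text: str) -> str:
    """Reorder 'Lastname, Firstname' to 'Firstname Lastname', keeping the text itself."""
    return NAME.sub(r"\2 \1", text)
```

The parentheses in each pattern capture pieces of the match, and `\1`, `\2`, `\3` in the replacement put those pieces back in a new order or layout.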
13
14
15
Versioning files by hand takes up a lot of hard drive space.
A version control system, like Git, stores history far more compactly than a folder of
full copies (Git compresses its data and can store similar versions as differences). It
also streamlines the versioning process.
Such tools came out of computer science but are being used by many researchers.
Git is free and open source.
Git is different from GitHub: Git handles the version control, while GitHub
hosts the files and versions and can make them available to others.
Git is really useful but has a learning curve. Because of that, I recommend starting with
the GUI version unless you are comfortable with the command line.
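A version control system lets you see exactly what changed from one version to the next. The sketch below uses Python's standard `difflib` module to show what such a difference looks like. It illustrates the concept only and is not how Git stores or computes its history; the function name and the `v1`/`v2` labels are made up for the example.

```python
import difflib


def describe_changes(old_text: str, new_text: str,
                     old_name: str = "v1", new_name: str = "v2") -> list[str]:
    """Return a unified diff between two versions of a text, one line per entry."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    )
    return list(diff)


if __name__ == "__main__":
    before = "sample,mass_g\nA,1.02\nB,0.98\n"
    after = "sample,mass_g\nA,1.02\nB,0.89\n"
    for line in describe_changes(before, after):
        print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added, which is the same notation Git uses when it shows you a change.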
16
17
This tool originated in computer code.
You don’t need anything more complicated than a text editor to make one! I use
Notepad++.
18
19
20
21
Excel is a useful tool but isn’t always the best tool for cleaning data.
It’s especially bad with dates and tends to mangle them.
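One way to sidestep the date problem is to read the file as plain text and convert dates explicitly, so nothing gets reinterpreted behind your back. A minimal sketch with Python's standard library follows; the input format (`month/day/year`) and the function names are assumptions for illustration, so adjust `input_format` to match your own data.

```python
import csv
import io
from datetime import datetime


def to_iso_date(value: str, input_format: str = "%m/%d/%Y") -> str:
    """Parse a date written in a known format and return it as YYYY-MM-DD."""
    return datetime.strptime(value.strip(), input_format).date().isoformat()


def clean_date_column(csv_text: str, column: str,
                      input_format: str = "%m/%d/%Y") -> list[dict]:
    """Read CSV text and rewrite one column of dates in ISO 8601 form."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row[column] = to_iso_date(row[column], input_format)
        rows.append(row)
    return rows
```

A value that does not fit the stated format raises a `ValueError` instead of being silently changed, which is the behavior you want when cleaning data.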
22
OpenRefine is a free, open source tool that was previously known as Google Refine.
It is the best tool for cleaning up tabular data.
OpenRefine can break data down by “facet” (variable values or ranges), allowing you to
do quick parsing, counting, or editing.
Editing includes straight replacement, math, basic text manipulation (uppercase to
lowercase, etc.), or other functions using General Refine Expression Language (GREL,
formerly Google Refine Expression Language).
You can also break multi-component cells apart or combine them into one.
The tool also allows for text clean up, providing a number of different algorithms for
text matching.
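To show roughly what faceting and text matching do, here is a Python sketch. The `fingerprint` function is a simplified take on the key-collision idea (lowercase, strip punctuation, sort the unique words); it is an approximation written for this example, not OpenRefine's actual code, and the function names are my own.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def text_facet(values: list[str]) -> Counter:
    """Count how often each distinct value appears, like a simple text facet."""
    return Counter(values)


def fingerprint(value: str) -> str:
    """Reduce a value to a normalized key so near-duplicates collide."""
    normalized = unicodedata.normalize("NFKD", value.strip().lower())
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    no_punct = ascii_only.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group distinct spellings that share a fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For example, "Acetic Acid", "acetic acid " and "Acid, Acetic" all reduce to the key `acetic acid`, so they land in one cluster that you can then merge into a single spelling.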
23
24
25

Briney - Leveling Up Data Management - With Notes
