Pickling & CSV
Preservation through Serialization and Tabulation
Pickle
Module for (de)serialization: Storing complete Python objects into files and later
loading them back.
● Supports almost all data types – good.
● Works only with Python – bad.
import pickle
pickle.dump(obj, openBinaryFile)  # Save obj to a file opened in binary mode ("wb")
obj = pickle.load(openBinaryFile)  # Restore an object from a file opened in binary mode ("rb")
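A minimal round-trip sketch of the two calls above. The sample dictionary, the file name `record.pickle`, and the temporary directory are assumptions made only for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical sample data; any picklable object would do.
record = {"name": "Ada", "scores": [90, 85], "active": True}

# Hypothetical file location inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "record.pickle")

    # Pickle files are binary: write with "wb"...
    with open(path, "wb") as outfile:
        pickle.dump(record, outfile)

    # ...and read with "rb".
    with open(path, "rb") as infile:
        restored = pickle.load(infile)

print(restored == record)  # True: same content
print(restored is record)  # False: a new object was built
```

Loading a pickle can execute code, so only unpickle files that come from a source you trust.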
What Is CSV?
● “Comma Separated Values”
● Tabular file with rows and columns.
● All rows have the same number of fields.
● Fields separated by commas.
○ “Commas” do not have to be commas. Any other character can be used, such as TAB (TSV,
“tab separated values”), vertical bar, space...
● The first row often serves as headers.
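To illustrate that the separator need not be a comma, here is a small sketch that reads the same two-row table once as TSV and once with a vertical bar. The sample text is hypothetical; `delimiter` is a standard parameter of `csv.reader`.

```python
import csv
import io

# Hypothetical sample data: the same table with two different separators.
tsv_text = "Student\tClass\nDoe\tJunior\n"
bar_text = "Student|Class\nDoe|Junior\n"

# io.StringIO stands in for an open text file.
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
bar_rows = list(csv.reader(io.StringIO(bar_text), delimiter="|"))

print(tsv_rows)              # [['Student', 'Class'], ['Doe', 'Junior']]
print(tsv_rows == bar_rows)  # True
```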
CSV Example
Student, ID, E-mail Address, Phone Number, Class, Academic Level
"Almarar, Hassan A", 16897**, halmarar2@suffolk.edu, Junior, UG
"Arakelyan, Artur", 17577**, aarakelyan@suffolk.edu, Sophomore, UG
"Batista, Christopher A", 16357**, cbatista@suffolk.edu, Senior, UG
Complete file...
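The names in the example contain commas, which is why they are quoted. The sketch below contrasts a naive `split(",")` with `csv.reader` on one line; the line itself is a made-up record in the style of the example. `skipinitialspace=True` is used here on the assumption that the spaces after the commas are not part of the data.

```python
import csv
import io

# Hypothetical line in the style of the example above.
line = '"Doe, Jane", 1234567, jdoe@example.edu, Junior, UG\n'

# Naive splitting breaks the quoted name in two.
naive = line.strip().split(",")

# csv.reader honors the quotes; skipinitialspace drops the blank after each comma.
parsed = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(len(naive))  # 6
print(parsed)      # ['Doe, Jane', '1234567', 'jdoe@example.edu', 'Junior', 'UG']
```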
Reading CSV
import csv
with open("path-words.csv", newline="") as csvfileIn:
    reader = csv.reader(csvfileIn, delimiter=',', quotechar='"')
    # next() returns the next row parsed as a list; use it to read the headers, if necessary
    headers = next(reader)
    # Process the rest of the file
    for row in reader:
        do_something(row)
    # Or, instead of the loop, since reader is an iterator:
    all_rows = list(reader)
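The loop and `list(reader)` are alternatives, because a reader can be consumed only once. A small sketch of that behavior, using hypothetical in-memory text in place of a real file:

```python
import csv
import io

# Hypothetical file contents standing in for a real CSV file.
text = "word,count\nalpha,1\nbeta,2\n"

reader = csv.reader(io.StringIO(text))
headers = next(reader)   # consumes the first row
all_rows = list(reader)  # consumes everything that is left
leftover = list(reader)  # nothing remains for a second pass

print(headers)   # ['word', 'count']
print(all_rows)  # [['alpha', '1'], ['beta', '2']]
print(leftover)  # []
```

Note that every field comes back as a string; convert numbers explicitly, for example with `int(row[1])`.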
Writing CSV
import csv
with open("path-words.csv", "w", newline="") as csvfileOut:
    writer = csv.writer(csvfileOut, delimiter=',', quotechar='"')
    writer.writerow([..., ..., ...])  # Write headers
    # Write the rest of the file; each row is a list of strings or numbers
    writer.writerows([row1, row2, row3 ...])
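A runnable sketch of the same pattern, writing to an in-memory buffer instead of a file so that the result can be inspected. The header and row values are hypothetical.

```python
import csv
import io

# io.StringIO stands in for a file opened with "w" and newline="".
buffer = io.StringIO()
writer = csv.writer(buffer)

writer.writerow(["Student", "Class"])  # headers
writer.writerows([["Doe, Jane", "Junior"], ["Roe, Rich", 2]])  # strings or numbers

output = buffer.getvalue()
print(output.splitlines())
# ['Student,Class', '"Doe, Jane",Junior', '"Roe, Rich",2']
```

The writer adds quotes only where they are needed, here around the names that contain a comma.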
Example: Who Are the Students? (students.py)
import collections
import csv
with open("class-2017.csv", newline="") as mystudents:
    reader = csv.reader(mystudents)
    headers = next(reader)
    class_position = headers.index("Class")  # Where is the Class column?
    class_levels = [row[class_position] for row in reader]
who_s_who = collections.Counter(class_levels)  # Summary
with open("class-summary.csv", "w", newline="") as levels:
    writer = csv.writer(levels)
    writer.writerow(['Class', 'count'])  # New headers
    writer.writerows(who_s_who.items())  # New content
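The same idea can be packaged as a function and tried on in-memory data. This is only a sketch: the function name `summarize_column` and the sample table are hypothetical, and `most_common()` is used in place of `items()` so that the most frequent value comes first.

```python
import collections
import csv
import io

def summarize_column(infile, outfile, column):
    """Count the values of one column and write the counts as CSV."""
    reader = csv.reader(infile)
    headers = next(reader)
    position = headers.index(column)  # where is the column?
    counts = collections.Counter(row[position] for row in reader)
    writer = csv.writer(outfile)
    writer.writerow([column, "count"])      # new headers
    writer.writerows(counts.most_common())  # new content, largest count first
    return counts

# Hypothetical sample table in place of class-2017.csv.
sample = io.StringIO("Student,Class\nA,Junior\nB,Senior\nC,Junior\n")
result = io.StringIO()
counts = summarize_column(sample, result, "Class")

print(counts["Junior"])  # 2
print(result.getvalue().splitlines())
# ['Class,count', 'Junior,2', 'Senior,1']
```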
Whenever possible, use Pandas