So Who Wants to
                           Be a Munger?
                                Dana
                              LSRC 2009




Friday, August 28, 2009
Who am I?

                   • Dana
                   • 8 years in corporate world
                   • Responsible for munging a massive
                          amount of data every day
                   • Now develop Rails Applications for a
                          living


Why is this important?
                   • We live in a data-driven society
                   • Companies feed on reports
                   • Clients have data and want ways to display it
                   • Important to know what data you have and what needs to
                          happen with it
                   • The more you know about the final output, the easier
                          you can manipulate the data




The Process



The Rule of 3
                              In - Munge - Out
                   • Read data into some construct
                          • anything that understands each()
                   • Transform the data
                   • Output transformed data
                          • some format that understands
                            puts()


1 - Reading



A Basic Munging Script
           open("new_numbers.txt", "w") do |f|      # the output file
             File.foreach("numbers.txt") do |n|     # the input file
               n.capitalize!                        # the transformation
               f.puts n
             end
           end

           numbers.txt      new_numbers.txt
           one              One
           two              Two
           three            Three
           four             Four
           five             Five



Simplify

               • Don’t confuse reading with munging
               • May have to read various files for the same output
               • Use Ruby’s each() and puts() methods to your advantage

               # input:  pass some object to munge
               # output: pass another object as output
               def munge(input, output)
                 input.each do |record|
                   record.capitalize!
                   output.puts record
                 end
               end



Why this is better
            names = %w[dana james sarah storm gypsy]
            stream = $stdout
            munge(names, stream)

            numbers = open("numbers.txt")
            stream = open("new_numbers.txt", "w")
            munge(numbers, stream)




each() and puts()
                             class Rubyist
                               def each
                                 yield "i"
                                 yield "love"
                                 yield "ruby"
                               end
                             end

                             class Speaker
                               def puts(words)
                                 `say #{words}`
                               end
                             end
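
As a sketch of why these two methods are enough: any object with each() can act as a source, and any object with puts() can act as a sink. WordSource and ArraySink below are hypothetical stand-ins, not classes from the talk, and munge() is repeated here (with a non-mutating capitalize) so the example runs on its own.

```ruby
# munge() from the earlier slide, repeated so this example is self-contained.
# It uses capitalize instead of capitalize! so the source data is not changed.
def munge(input, output)
  input.each do |record|
    output.puts record.capitalize
  end
end

# Hypothetical source: all it needs is each().
class WordSource
  def each
    yield "i"
    yield "love"
    yield "ruby"
  end
end

# Hypothetical sink: all it needs is puts(). It collects records in memory.
class ArraySink
  attr_reader :records

  def initialize
    @records = []
  end

  def puts(record)
    @records << record
  end
end

sink = ArraySink.new
munge(WordSource.new, sink)
p sink.records
```

An in-memory sink like this is also a convenient way to check a munging step without touching any files.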


Reaching ultimate
                           munging power
          class Munger

            def initialize(input, output)
              @input = input
              @output = output
            end

            def munge
              @input.each do |record|
                munged = yield(record)
                @output.puts munged unless munged.nil?
              end
            end
          end

          m = Munger.new(open("numbers.txt"),
                         open("new_numbers.txt", "w"))
          m.munge do |n|
            n.strip!
            if n =~ /\At/i
              n.reverse
            elsif n == "four"
              nil
            else
              n.capitalize
            end
          end
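
A minimal way to try the Munger without files is to hand it StringIO objects, which respond to each() and puts() just as files do. This is only a sketch: the class is repeated from the slide so the block is self-contained, and the input text is assumed to be the five numbers from the earlier example.

```ruby
require "stringio"

# Munger as defined on the slide, repeated so this sketch is self-contained.
class Munger
  def initialize(input, output)
    @input  = input
    @output = output
  end

  def munge
    @input.each do |record|
      munged = yield(record)
      @output.puts munged unless munged.nil?   # nil means "drop this record"
    end
  end
end

# Assumed sample input: the contents of numbers.txt from the earlier slide.
input  = StringIO.new("one\ntwo\nthree\nfour\nfive\n")
output = StringIO.new

Munger.new(input, output).munge do |n|
  n = n.strip
  if n =~ /\At/i
    n.reverse        # words starting with "t" are reversed
  elsif n == "four"
    nil              # "four" is filtered out
  else
    n.capitalize
  end
end

print output.string
```

Returning nil from the block is what lets the same class both transform and filter records.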




Data

                   • Different kinds of data
                          • Structured - record oriented data
                          • Unstructured
                           • Most difficult to work with
                   • Vast majority of data reading is
                          pattern matching


Somewhere in between
        Callouts on the slide: the "Part Code ..." line is the headers; the Salesperson,
        Customer, and SA Sort Code lines are the hierarchical categories.

                          SAA_R_009 26-Mar-2009 15:26                           1: BOB's BILLARD HALL               Page 6

                          Part Code       Description              Qty Period QTY LastYr QTY Var Lbs Period Lbs LastYr Lbs Var
                          --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
                            Salesperson 22 BILL PRICE
                              Customer 1014 KECK'S MEAT & FOODSERVICE
                                 SA Sort Code 4.42 PORK RIBS
                          44-531           53/3 CU PRK RIB SOY                0          0       0          0          2    -100
                          44-531-0         100/2.5 CU PRK RIB SOY            10         21     -52         14         31     -55
                          15-230           53/3 BB PUB BURGER               150          0       0       150k          0       0
                          3680             40/4 RB PRK WHLMSC HARKER        187        243     -23        412        405       2
                          3681             30/5.3 RB PRK WHLMS HARKR        207        162      28        378        243      56
                          3686             30/5.3 RB PRK WHLMS HARKR         27         45     -40         72        180     -60
                          3008             33/4.92 RB PRK HARKER            270        300     -10        580        600      -3
                          3010             25/6.4 RB PRK CNTRY HARKR        510        540      -6      1,000      1,080      -7
                          3402             40/4 RU PRK RIB PAT HARKR          0          0       0          0        40k    -100
                          3403             51/3.14 RU PRK RIB HARKER        558        900     -38      1,008      1,170     -14
                          3404             40/4.14 RU PRK RIB HARKER        73k      1,052     -30      1,296      1,592     -19
                                                                     ---------- ---------- ------- ---------- ---------- -------
                                 SA Sort Code subtotals                   2,567      3,263     -21      6,260      5,703       9

                                 SA Sort Code 19.1 WAFFLES
                          5018             36/5 KING B WAFFLES             10         10       0         10         14     -29
                                                                   ---------- ---------- ------- ---------- ---------- -------
                                 SA Sort Code subtotals                    10         10       0         10         14     -29
                                                                   ---------- ---------- ------- ---------- ---------- -------

                          SAA_R_009 26-Mar-2009 15:26                           1: BOB's BILLARD HALL               Page 7
                          Part Code       Description              Qty Period   QTY LastYr   QTY Var   Lbs Period   Lbs LastYr   Lbs Var
                          --------------- ------------------------ ----------   ----------   -------   ----------   ----------   -------
                              Customer subtotals                        2,577        3,273       -21        6,270        5,717         9
                                                                   ----------   ----------   -------   ----------   ----------   -------
                            Salesperson subtotals                       9,857        8,756        12       45,889       42,556         8
                                                                   ----------   ----------   -------   ----------   ----------   -------
                          Report Totals                                15,008       13,225        13       75,896       72,359         4


           require "munger"

           class RossReader

             def initialize(file)
               @file = file
             end

             def each
               open(@file) do |report|
                 report.each do |line|
                   break if line =~ /\AReport Totals/
                   next if line =~ /\A\s+\z/ or
                           line =~ /\A\s+-/ or
                           line =~ /\b(sub)?totals\b/i
                   yield line
                 end # report.each
               end # open
             end # def

           end

           report = Munger.new(RossReader.new("sample_report.txt"), open("ross_writer.txt", "w"))
           report.munge do |n|
             n
           end


    SAA_R_009 26-Mar-2009 15:26                           1: BOB's BILLARD HALL           Page 6
    Part Code        Description               Qty Period QTY LastYr QTY Var Lbs Period   Lbs LastYr Lbs Var
    --------------- ------------------------ ---------- ---------- ------- ----------     ---------- -------
      Salesperson 22 BILL PRICE
         Customer 1014 KECK'S MEAT & FOODSERVICE
           SA Sort Code 4.42 PORK RIBS
    44-531           53/3 CU PRK RIB SOY                0           0       0         0            2    -100
    44-531-0         100/2.5 CU PRK RIB SOY            10          21     -52        14           31     -55
    15-230           53/3 BB PUB BURGER               150           0       0      150k            0       0
    3680             40/4 RB PRK WHLMSC HARKER        187         243     -23       412          405       2
    3681             30/5.3 RB PRK WHLMS HARKR        207         162      28       378          243      56
    3686             30/5.3 RB PRK WHLMS HARKR         27          45     -40        72          180     -60
    3008             33/4.92 RB PRK HARKER            270         300     -10       580          600      -3
    3010             25/6.4 RB PRK CNTRY HARKR        510         540      -6     1,000        1,080      -7
    3402             40/4 RU PRK RIB PAT HARKR          0           0       0         0          40k    -100
    3403             51/3.14 RU PRK RIB HARKER        558         900     -38     1,008        1,170     -14
    3404             40/4.14 RU PRK RIB HARKER        73k       1,052     -30     1,296        1,592     -19
           SA Sort Code 19.1 WAFFLES
    5018             36/5 KING B WAFFLES               10          10       0        10           14     -29
    SAA_R_009 26-Mar-2009 15:26                           1: BOB's BILLARD HALL           Page 7
    Part Code        Description               Qty Period QTY LastYr QTY Var Lbs Period   Lbs LastYr Lbs Var
    --------------- ------------------------ ---------- ---------- ------- ----------     ---------- -------




Ugly Headers

      SAA_R_009 26-Mar-2009 15:26                       1: BOB's BILLARD HALL          Page 6

      Part Code       Description              Qty Period QTY LastYr QTY Var Lbs Period Lbs LastYr Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
        Salesperson 22 BILL PRICE
          Customer 1014 KECK'S MEAT & FOODSERVICE
            SA Sort Code 4.42 PORK RIBS

      SAA_R_009 26-Mar-2009 15:26                       1: BOB's BILLARD HALL          Page 7

      Part Code       Description              Qty Period QTY LastYr QTY Var Lbs Period Lbs LastYr Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------




unpack()
                   • Designed for breaking up binary data
                   • Very handy for this kind of fixed-width work
                   • unpack() takes in a format string
                          • You describe what the data looks like
                   • “a” means ASCII character
                   • “x” means skip

                   "cookies and cream".unpack("a7xa3xa5")
                   # => ["cookies", "and", "cream"]

                   "--- --- -----".split.
                     map { |d| "a#{d.length}" }.join("x")
                   # => "a3xa3xa5"
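
Putting the two snippets together, the dashed rule under a fixed-width header can describe the layout of every row beneath it. The sketch below uses made-up sample lines (rule, header, and row are illustrative, not from the report) to build the format string and then pair header names with values.

```ruby
# Derive an unpack() format from the dashed rule under a fixed-width header.
# The three sample lines are made up for illustration.
rule   = "------ ----- ---"
header = "Name   Color Qty"
row    = "apple  red    12"

# One "a<width>" per dashed column, joined by "x" to skip the separator space.
format = rule.split.map { |dashes| "a#{dashes.length}" }.join("x")

# Cut the header and a data row with the same format, then trim the padding.
fields = header.unpack(format).map { |f| f.strip }
values = row.unpack(format).map { |f| f.strip }

p format
p fields.zip(values)
```

Because the widths come from the rule itself, the code does not need hard-coded column positions.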



               def initialize(file)
                 @file    = file
                 @headers = nil
                 @format  = nil
               end

               def each
                 open(@file) do |report|
                   parse_header(Array.new(4) { report.gets })

                   report.each do |line|
                     ...
                   end # report.each
                 end # open
               end # def

               def parse_header(headers)
                 @format = headers[3].split.map { |col| "a#{col.size}" }.join("x")
                 @headers = headers[2].unpack(@format).map { |f| f.strip }
               end




                          def initialize(file)
                            @file      = file
                            @in_header = false
                            @headers   = nil
                            @format    = nil
                          end

                          def each
                            open(@file) do |report|
                              parse_header(Array.new(4) { report.gets })

                              report.each do |line|
                                if line =~ /\ASAA_R/
                                  @in_header = true
                                elsif @in_header
                                  @in_header = false if line =~ /\A-/
                                else
                                  ...
                                end
                              end # report.each
                            end # open
                          end # def
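
The header handling above is a two-state machine: a line starting with the report ID switches "in header" on, and the dashed rule switches it off again. Here is a rough standalone sketch of that idea over an in-memory array; the sample lines are shortened and made up, and unlike the slide's each() it lets the loop skip the first header too instead of reading it with parse_header.

```ruby
# Shortened, made-up sample lines; the state machine mirrors the slide's logic.
lines = [
  "SAA_R_009 26-Mar-2009 15:26    Page 6",
  "",
  "Part Code  Qty",
  "---------- ---",
  "44-531       0",
  "SAA_R_009 26-Mar-2009 15:26    Page 7",
  "Part Code  Qty",
  "---------- ---",
  "5018        10"
]

in_header = false
data = []

lines.each do |line|
  if line =~ /\ASAA_R/
    in_header = true                      # a page header starts here
  elsif in_header
    in_header = false if line =~ /\A-/    # the dashed rule ends the header
  else
    data << line                          # everything else is report data
  end
end

p data
```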


              Salesperson 22 BILL PRICE
                Customer 1014 KECK'S MEAT & FOODSERVICE
                  SA Sort Code 4.42 PORK RIBS
           44-531           53/3 CU PRK RIB SOY             0       0     0       0       2 -100
           44-531-0         100/2.5 CU PRK RIB SOY         10      21   -52      14      31 -55
           15-230           53/3 BB PUB BURGER            150       0     0    150k       0    0
           3680             40/4 RB PRK WHLMSC HARKER     187     243   -23     412     405    2
           3681             30/5.3 RB PRK WHLMS HARKR     207     162    28     378     243   56
           3686             30/5.3 RB PRK WHLMS HARKR      27      45   -40      72     180 -60
           3008             33/4.92 RB PRK HARKER         270     300   -10     580     600   -3
           3010             25/6.4 RB PRK CNTRY HARKR     510     540    -6   1,000   1,080   -7
           3402             40/4 RU PRK RIB PAT HARKR       0       0     0       0     40k -100
           3403             51/3.14 RU PRK RIB HARKER     558     900   -38   1,008   1,170 -14
           3404             40/4.14 RU PRK RIB HARKER     73k   1,052   -30   1,296   1,592 -19
                  SA Sort Code 19.1 WAFFLES
           5018             36/5 KING B WAFFLES            10      10     0      10      14   -29




assoc()
                   • lookup method
                   • call it on an array of arrays
                   • pass in the data you want to look up
                   • walks through the outer array and returns the inner
                          array that starts with the argument
                   • slower than a hash - don’t use on LARGE amounts of data
                   • assoc() becomes a poor man’s ordered hash

                   names = [["James", "Gray"], ["Dana", "Gray"]]
                   p names.assoc("James")
                   # => ["James", "Gray"]
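
A short sketch of the lookup behavior described above, using the names array from the slide. The replacement surname and the extra "Storm" entry are made-up values for illustration.

```ruby
names = [["James", "Gray"], ["Dana", "Gray"]]

found   = names.assoc("James")   # the inner array whose first element matches
missing = names.assoc("Storm")   # nil when no inner array starts with the key

# assoc() returns the stored inner array itself, so it can be updated in place.
# "Smith" is a made-up value for illustration.
pair = names.assoc("Dana")
pair[-1] = "Smith"

# Append only when the key is not present yet; insertion order is preserved.
names << ["Storm", "Gray"] unless names.assoc("Storm")

p found
p missing
p names
```

The in-place update combined with ordered appends is what makes an array of pairs behave like a small ordered hash.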



                  def initialize(file)
                    ...
                    @categories = []
                  end

                  def each
                    open(@file) do |report| ...
                          if line =~ /\A\s+(\w[\w\s]+?)\s+(\d.+?)\s+\z/
                            if cat = @categories.assoc($1)
                              cat[-1] = $2
                            else
                              @categories << [$1, $2]
                            end
                          else
                            yield @headers.zip(line.unpack(@format).map { |f| f.strip }) + @categories
                          end
                        end
                      end # report.each
                    end # open
                  end # def




                          [["Part Code", "44-531"],
                           ["Description", "53/3 CU PRK RIB SOY"],
                           ["Qty Period", "0"], ["QTY LastYr", "0"],
                           ["Var", "0"],
                           ["Lbs Period", "0"],
                           ["Lbs LastYr", "2"],
                           ["Var", "-100"],
                           ["Salesperson", "22 BILL PRICE"],
                           ["Customer", "1014 KECK'S MEAT & FOODSERVICE"],
                           ["SA Sort Code", "4.42 PORK RIBS"]]
                          [["Part Code", "44-531-0"],
                           ["Description", "100/2.5 CU PRK RIB SOY"],
                           ["Qty Period", "10"], ["QTY LastYr", "21"],
                           ["Var", "-52"],
                           ["Lbs Period", "14"],
                           ["Lbs LastYr", "31"],
                           ["Var", "-55"],
                           ["Salesperson", "22 BILL PRICE"],
                           ["Customer", "1014 KECK'S MEAT & FOODSERVICE"],
                           ["SA Sort Code", "4.42 PORK RIBS"]]
                              ...
                          [["Part Code", "5018"],
                           ["Description", "36/5 KING B WAFFLES"],
                           ["Qty Period", "10"], ["QTY LastYr", "10"],
                           ["Var", "0"],
                           ["Lbs Period", "10"],
                           ["Lbs LastYr", "14"],
                           ["Var", "-29"],
                           ["Salesperson", "22 BILL PRICE"],
                           ["Customer", "1014 KECK'S MEAT & FOODSERVICE"],
                           ["SA Sort Code", "19.1 WAFFLES"]]


     class RossReader

       def initialize(file)
         @file       = file
         @in_header  = false
         @headers    = nil
         @format     = nil
         @categories = []
       end

       def parse_header(headers)
         @format  = headers[3].split.map { |col| "a#{col.size}" }.join("x")
         @headers = headers[2].unpack(@format).map { |f| f.strip }
       end

       def each
         open(@file) do |report|
           parse_header(Array.new(4) { report.gets })

           report.each do |line|
             if line =~ /\ASAA_R/
               @in_header = true
             elsif @in_header
               @in_header = false if line =~ /\A-/
             else
               break if line =~ /\AReport Totals/
               next if line =~ /\A\s+\z/ or
                       line =~ /\A\s+-/ or
                       line =~ /\b(sub)?totals\b/i
               if line =~ /\A\s+(\w[\w\s]+?)\s+(\d.+?)\s+\z/
                 if cat = @categories.assoc($1)
                   cat[-1] = $2
                 else
                   @categories << [$1, $2]
                 end
               else
                 yield @headers.zip(line.unpack(@format).map { |f| f.strip }) + @categories
               end
             end
           end # report.each
         end # open
       end # def

     end




2 - Writing



               require "rubygems"
               require "faster_csv"

               class CSVWriter
                 def initialize
                   @headers = nil
                 end

                 def puts(record)
                   if @headers.nil?
                     @headers = record.map { |field| field.first }
                     FCSV { |csv| csv << @headers }
                   end
                   FCSV { |csv| csv << record.map { |field| field.last } }
                 end

               end
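
FasterCSV became Ruby's standard csv library in Ruby 1.9, so on a current Ruby the same writer can be sketched without the gem. This is an adaptation, not the slide's code: passing the destination IO into initialize is an assumption made here so the output can be captured, and the two sample records are taken from the report data.

```ruby
require "csv"
require "stringio"

# A sketch of the slide's writer on top of the standard csv library.
# The io argument is an assumption; the slide's version writes to $stdout.
class CSVWriter
  def initialize(io = $stdout)
    @io      = io
    @headers = nil
  end

  # record is an array of [header, value] pairs, as RossReader yields them.
  def puts(record)
    if @headers.nil?
      # The first record also supplies the header row.
      @headers = record.map { |field| field.first }
      @io << CSV.generate_line(@headers)
    end
    @io << CSV.generate_line(record.map { |field| field.last })
  end
end

out    = StringIO.new
writer = CSVWriter.new(out)
writer.puts [["Part Code", "44-531"], ["Description", "53/3 CU PRK RIB SOY"]]
writer.puts [["Part Code", "5018"],   ["Description", "36/5 KING B WAFFLES"]]
print out.string
```

Since the writer only has to respond to puts(), it drops into Munger as the output object unchanged.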




3 - Munging



             require "munger"
             require "ross_reader"
             require "csv_writer"

             report = Munger.new(RossReader.new(ARGV.shift), CSVWriter.new)
             report.munge do |record|
               record.each do |field|
                 if field.last =~ /\A(?:\d+,)+\d+k?\z/
                   field.last.delete!(",")
                 end
                 field.last.sub!(/\A\d+k\z/) { |num| num.to_i * 1000 }
               end
               record
             end
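
To see what the two clean-up steps do to individual values, they can be pulled into a small helper. normalize_number is a hypothetical name introduced here, and it works on a copy rather than mutating the field as the slide does; the sample values come from the report.

```ruby
# Hypothetical helper applying the slide's two clean-up steps to a copy
# of one field value.
def normalize_number(text)
  value = text.dup
  # Step 1: drop thousands separators from values like "1,008".
  value.delete!(",") if value =~ /\A(?:\d+,)+\d+k?\z/
  # Step 2: expand a trailing "k" into thousands, e.g. "150k".
  value.sub!(/\A\d+k\z/) { |num| (num.to_i * 1000).to_s }
  value
end

%w[1,008 150k 73k -100 44-531].each do |sample|
  puts "#{sample} -> #{normalize_number(sample)}"
end
```

The anchored patterns leave part codes such as 44-531 and negative values such as -100 untouched.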




So what can I do
                              with all this?

                   • Output your data into a spreadsheet
                          such as Excel
                   • Open the data in your text editor
                   • Import the data into a database
                   • Let’s see it in action


Examples



    require "munger"
    require "rubygems"
    require "faster_csv"
    require "active_record"

    class DBWriter
      def initialize(model, path = "db.sqlite")
        ActiveRecord::Base.establish_connection(
          :adapter  => "sqlite3",
          :database => path
        )
        @model = model
      end

      def puts(record)
        @model.create!(record)
      end
    end

    class PartCode < ActiveRecord::Base
    end

    unless File.exist? "db.sqlite"
      class CreatePartCodes < ActiveRecord::Migration
        def self.up
          create_table :part_codes do |t|
            t.string  :part_code
            t.string  :description
            t.integer :qty_period
            t.integer :qty_lastyr
            t.integer :qty_var
            t.integer :lbs_period
            t.integer :lbs_lastyr
            t.integer :lbs_var
            t.string  :salesperson
            t.string  :customer
            t.string  :sa_sort_code
          end
        end

        def self.down
          drop_table :part_codes
        end
      end
    end




       reader = FCSV($stdin, :headers => true, :header_converters => :symbol)
       writer = DBWriter.new(PartCode)
       CreatePartCodes.up if defined? CreatePartCodes
       m = Munger.new(reader, writer)
       m.munge do |row|
         row.to_hash
       end




Congratulations!
                          You, too, are now
                             a Munger!


Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Empfohlen

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Who Wants To Be a Munger

  • 1. So Who Wants to Be a Munger? Dana LSRC 2009 Friday, August 28, 2009
  • 2. Who am I?
      • Dana
      • 8 years in corporate world
      • Responsible for munging a massive amount of data every day
      • Now develop Rails Applications for a living
  • 3. Why is this important?
      • We live in a data driven society
      • Companies feed on reports
      • Clients have data and want ways to display it
      • Important to know what data you have and what needs to happen with it
      • The more you know about the final output, the easier you can manipulate the data
  • 5. The Rule of 3: In - Munge - Out
      • Read data into some construct
          • anything that understands each()
      • Transform the data
      • Output transformed data
          • some format that understands puts()
  • 6. 1 - Reading
  • 7. A Basic Munging Script

      open("new_numbers.txt", "w") do |f|      # The output file
        File.foreach("numbers.txt") do |n|     # The input file
          n.capitalize!                        # The transformation
          f.puts n
        end
      end

      numbers.txt -> new_numbers.txt
      one         -> One
      two         -> Two
      three       -> Three
      four        -> Four
      five        -> Five
  • 8. Simplify
      • Don’t confuse reading with munging
      • May have to read various files for the same output
      • Use Ruby’s each() and puts() methods to your advantage

      # input:  pass some object to munge
      # output: pass out another object as output
      def munge(input, output)
        input.each do |record|
          record.capitalize!
          output.puts record
        end
      end
  • 9. Why this is better

      # Two possible inputs: an Array or an open file
      names   = %w[dana james sarah storm gypsy]
      numbers = open("numbers.txt")

      # Two possible outputs: the terminal or a file
      stream = $stdout
      stream = open("new_numbers.txt", "w")

      munge(names, stream)
      munge(numbers, stream)
  • 10. each() and puts()

      class Rubyist
        def each
          yield "i"
          yield "love"
          yield "ruby"
        end
      end

      class Speaker
        def puts(words)
          `say #{words}`
        end
      end
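To make the duck typing on slide 10 concrete, here is a minimal sketch that feeds a custom source and a custom sink to a munge method shaped like the one on slide 8. WordSource and ListSink are hypothetical names introduced only for this illustration, and the sketch uses the non-destructive capitalize so it also works when string literals are frozen.

```ruby
# Hypothetical source: anything that responds to each() can be munged.
class WordSource
  def each
    yield "i"
    yield "love"
    yield "ruby"
  end
end

# Hypothetical sink: anything that responds to puts() can receive output.
class ListSink
  attr_reader :lines

  def initialize
    @lines = []
  end

  def puts(text)
    @lines << text
  end
end

# Same shape as munge on slide 8, but it does not modify the input records.
def munge(input, output)
  input.each do |record|
    output.puts record.capitalize
  end
end

sink = ListSink.new
munge(WordSource.new, sink)
```

Because munge only ever calls each() and puts(), neither class needs to inherit from anything or know about files.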
  • 11. Reaching ultimate munging power

      class Munger
        def initialize(input, output)
          @input = input
          @output = output
        end

        def munge
          @input.each do |record|
            munged = yield(record)
            # a nil result from the block drops the record
            @output.puts munged unless munged.nil?
          end
        end
      end

      m = Munger.new(open("numbers.txt"),
                     open("new_numbers.txt", "w"))
      m.munge do |n|
        n.strip!
        if n =~ /\At/i
          n.reverse
        elsif n == "four"
          nil
        else
          n.capitalize
        end
      end
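The Munger class from slide 11 can be tried without any files. The sketch below repeats the class and, as an assumption for the sake of a self-contained run, substitutes an Array for numbers.txt and a StringIO for new_numbers.txt; the block logic is the same as on the slide except that it works on a stripped copy instead of calling strip! on the record.

```ruby
require "stringio"

# Munger as shown on slide 11
class Munger
  def initialize(input, output)
    @input = input
    @output = output
  end

  def munge
    @input.each do |record|
      munged = yield(record)
      @output.puts munged unless munged.nil?
    end
  end
end

input  = %w[one two three four five]  # stands in for numbers.txt
output = StringIO.new                 # stands in for new_numbers.txt

Munger.new(input, output).munge do |n|
  word = n.strip
  if word =~ /\At/i
    word.reverse      # words starting with "t" are reversed
  elsif word == "four"
    nil               # returning nil drops the record
  else
    word.capitalize
  end
end

result = output.string
```

Here "two" and "three" come out reversed, "four" is dropped, and the remaining words are capitalized.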
  • 12. Data
      • Different kinds of data
          • Structured - record oriented data
          • Unstructured
              • Most difficult to work with
      • Vast majority of data reading is pattern matching
  • 13. Somewhere in between
      Slide annotations: "headers", "hierarchical categories"

      SAA_R_009  26-Mar-2009 15:26  1: BOB's BILLARD HALL  Page 6
      Part Code  Description  Qty Period  QTY LastYr  QTY Var  Lbs Period  Lbs LastYr  Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
        Salesperson  22 BILL PRICE
        Customer  1014 KECK'S MEAT & FOODSERVICE
        SA Sort Code  4.42 PORK RIBS
      44-531  53/3 CU PRK RIB SOY  0  0  0  0  2  -100
      44-531-0  100/2.5 CU PRK RIB SOY  10  21  -52  14  31  -55
      15-230  53/3 BB PUB BURGER  150  0  0  150k  0  0
      3680  40/4 RB PRK WHLMSC HARKER  187  243  -23  412  405  2
      3681  30/5.3 RB PRK WHLMS HARKR  207  162  28  378  243  56
      3686  30/5.3 RB PRK WHLMS HARKR  27  45  -40  72  180  -60
      3008  33/4.92 RB PRK HARKER  270  300  -10  580  600  -3
      3010  25/6.4 RB PRK CNTRY HARKR  510  540  -6  1,000  1,080  -7
      3402  40/4 RU PRK RIB PAT HARKR  0  0  0  0  40k  -100
      3403  51/3.14 RU PRK RIB HARKER  558  900  -38  1,008  1,170  -14
      3404  40/4.14 RU PRK RIB HARKER  73k  1,052  -30  1,296  1,592  -19
        ---------- ---------- ------- ---------- ---------- -------
        SA Sort Code subtotals  2,567  3,263  -21  6,260  5,703  9
        SA Sort Code  19.1 WAFFLES
      5018  36/5 KING B WAFFLES  10  10  0  10  14  -29
        ---------- ---------- ------- ---------- ---------- -------
        SA Sort Code subtotals  10  10  0  10  14  -29
        ---------- ---------- ------- ---------- ---------- -------
      SAA_R_009  26-Mar-2009 15:26  1: BOB's BILLARD HALL  Page 7
      Part Code  Description  Qty Period  QTY LastYr  QTY Var  Lbs Period  Lbs LastYr  Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
        Customer subtotals  2,577  3,273  -21  6,270  5,717  9
        ---------- ---------- ------- ---------- ---------- -------
        Salesperson subtotals  9,857  8,756  12  45,889  42,556  8
        ---------- ---------- ------- ---------- ---------- -------
        Report Totals  15,008  13,225  13  75,896  72,359  4
  • 14. A first reader: stop at the report totals and skip blank, separator, and subtotal lines

      require "munger"

      class RossReader
        def initialize(file)
          @file = file
        end

        def each
          open(@file) do |report|
            report.each do |line|
              break if line =~ /\AReport Totals/
              next if line =~ /\A\s+\z/ or
                      line =~ /\A\s+-/ or
                      line =~ /\b(sub)?totals\b/i
              yield line
            end # report.each
          end # open
        end # def
      end

      report = Munger.new(RossReader.new("sample_report.txt"),
                          open("ross_writer.txt", "w"))
      report.munge do |n|
        n
      end
  • 15. Output after the first pass: totals and separators are gone, page headers remain

      SAA_R_009  26-Mar-2009 15:26  1: BOB's BILLARD HALL  Page 6
      Part Code  Description  Qty Period  QTY LastYr  QTY Var  Lbs Period  Lbs LastYr  Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
        Salesperson  22 BILL PRICE
        Customer  1014 KECK'S MEAT & FOODSERVICE
        SA Sort Code  4.42 PORK RIBS
      44-531  53/3 CU PRK RIB SOY  0  0  0  0  2  -100
      44-531-0  100/2.5 CU PRK RIB SOY  10  21  -52  14  31  -55
      15-230  53/3 BB PUB BURGER  150  0  0  150k  0  0
      3680  40/4 RB PRK WHLMSC HARKER  187  243  -23  412  405  2
      3681  30/5.3 RB PRK WHLMS HARKR  207  162  28  378  243  56
      3686  30/5.3 RB PRK WHLMS HARKR  27  45  -40  72  180  -60
      3008  33/4.92 RB PRK HARKER  270  300  -10  580  600  -3
      3010  25/6.4 RB PRK CNTRY HARKR  510  540  -6  1,000  1,080  -7
      3402  40/4 RU PRK RIB PAT HARKR  0  0  0  0  40k  -100
      3403  51/3.14 RU PRK RIB HARKER  558  900  -38  1,008  1,170  -14
      3404  40/4.14 RU PRK RIB HARKER  73k  1,052  -30  1,296  1,592  -19
        SA Sort Code  19.1 WAFFLES
      5018  36/5 KING B WAFFLES  10  10  0  10  14  -29
      SAA_R_009  26-Mar-2009 15:26  1: BOB's BILLARD HALL  Page 7
      Part Code  Description  Qty Period  QTY LastYr  QTY Var  Lbs Period  Lbs LastYr  Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
  • 16. Ugly Headers

      SAA_R_009  26-Mar-2009 15:26  1: BOB's BILLARD HALL  Page 6
      Part Code  Description  Qty Period  QTY LastYr  QTY Var  Lbs Period  Lbs LastYr  Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
        Salesperson  22 BILL PRICE
        Customer  1014 KECK'S MEAT & FOODSERVICE
        SA Sort Code  4.42 PORK RIBS
      SAA_R_009  26-Mar-2009 15:26  1: BOB's BILLARD HALL  Page 7
      Part Code  Description  Qty Period  QTY LastYr  QTY Var  Lbs Period  Lbs LastYr  Lbs Var
      --------------- ------------------------ ---------- ---------- ------- ---------- ---------- -------
  • 17. unpack()
      • Designed for breaking up binary data
      • Very handy for this kind of fixed-width work
      • unpack() takes in a format string
      • You describe what the data looks like
      • “a” means ascii character
      • “x” means skip

      "cookies and cream".unpack("a7xa3xa5")
      # => ["cookies", "and", "cream"]

      "--- --- -----".split.map { |d| "a#{d.length}" }.join("x")
      # => "a3xa3xa5"
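The two snippets on slide 17 combine naturally: derive the format string from a row of dashes, then use it to split a data row. The following is a minimal sketch under assumed data; the three-column layout and its widths are hypothetical and much narrower than the real report.

```ruby
# Hypothetical fixed-width layout: a dashes row marking the column widths...
dashes = "------- ---------- -----"

# ...and a data row padded to those widths (built with ljust/rjust so the
# spacing is exact rather than hand-counted).
row = "44-531".ljust(7) + " " + "PORK RIBS".ljust(10) + " " + "10".rjust(5)

# "a<width>" reads one column, "x" skips the single-space gap between columns.
format = dashes.split.map { |col| "a#{col.length}" }.join("x")

# unpack keeps the padding, so strip each field afterwards.
fields = row.unpack(format).map { |f| f.strip }
```

Deriving the format from the dashes row means the code keeps working if the report's column widths change, which is the approach parse_header takes in the slides that follow.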
  • 18. Parsing the first page header to get the column names and the unpack() format

      def initialize(file)
        @file = file
        @headers = nil
        @format = nil
      end

      def each
        open(@file) do |report|
          # the first four lines of the report are the page header
          parse_header(Array.new(4) { report.gets })
          report.each do |line|
            ...
          end # report.each
        end # open
      end # def

      def parse_header(headers)
        # headers[3] is the row of dashes, headers[2] holds the column names
        @format = headers[3].split.map { |col| "a#{col.size}" }.join("x")
        @headers = headers[2].unpack(@format).map { |f| f.strip }
      end
  • 19. Skipping the repeated page headers

      def initialize(file)
        @file = file
        @in_header = false
        @headers = nil
        @format = nil
      end

      def each
        open(@file) do |report|
          parse_header(Array.new(4) { report.gets })
          report.each do |line|
            if line =~ /\ASAA_R/
              # a new page header starts
              @in_header = true
            elsif @in_header
              # the row of dashes ends the page header
              @in_header = false if line =~ /\A-/
            else
              ...
            end
          end # report.each
        end # open
      end # def
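The @in_header flag on slide 19 is a two-state machine: a line starting with SAA_R switches it on, and the row of dashes switches it off again. Below is a minimal sketch of that idea over an in-memory array; the shortened lines are hypothetical stand-ins for the report, and unlike the slide code, the first page header is handled by the same flag instead of by parse_header.

```ruby
# Hypothetical, shortened report lines with a page header on each page
lines = [
  "SAA_R_009 26-Mar-2009 15:26 Page 6",
  "Part Code Description",
  "--------- -----------",
  "5018 36/5 KING B WAFFLES",
  "SAA_R_009 26-Mar-2009 15:26 Page 7",
  "Part Code Description",
  "--------- -----------",
  "3680 40/4 RB PRK WHLMSC HARKER"
]

in_header = false
data = []

lines.each do |line|
  if line =~ /\ASAA_R/
    in_header = true                     # a page header starts here
  elsif in_header
    in_header = false if line =~ /\A-/   # the dashes row closes the header
  else
    data << line                         # everything else is report data
  end
end
```

Only the two data rows survive; every line between a SAA_R line and its dashes row is discarded.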
  • 20. What is left once the headers are skipped

        Salesperson  22 BILL PRICE
        Customer  1014 KECK'S MEAT & FOODSERVICE
        SA Sort Code  4.42 PORK RIBS
      44-531  53/3 CU PRK RIB SOY  0  0  0  0  2  -100
      44-531-0  100/2.5 CU PRK RIB SOY  10  21  -52  14  31  -55
      15-230  53/3 BB PUB BURGER  150  0  0  150k  0  0
      3680  40/4 RB PRK WHLMSC HARKER  187  243  -23  412  405  2
      3681  30/5.3 RB PRK WHLMS HARKR  207  162  28  378  243  56
      3686  30/5.3 RB PRK WHLMS HARKR  27  45  -40  72  180  -60
      3008  33/4.92 RB PRK HARKER  270  300  -10  580  600  -3
      3010  25/6.4 RB PRK CNTRY HARKR  510  540  -6  1,000  1,080  -7
      3402  40/4 RU PRK RIB PAT HARKR  0  0  0  0  40k  -100
      3403  51/3.14 RU PRK RIB HARKER  558  900  -38  1,008  1,170  -14
      3404  40/4.14 RU PRK RIB HARKER  73k  1,052  -30  1,296  1,592  -19
        SA Sort Code  19.1 WAFFLES
      5018  36/5 KING B WAFFLES  10  10  0  10  14  -29
  • 21. assoc()
      • lookup method
      • call it on an array of arrays
      • pass in the data you want to lookup
      • walks through the outer array and returns the inner array that starts with the argument
      • slower than a hash - don’t use on LARGE amounts of data
      • assoc() becomes a poor man’s ordered hash

      names = [["James", "Gray"], ["Dana", "Gray"]]
      p names.assoc("James")
      # prints ["James", "Gray"]
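The "poor man's ordered hash" idea can be sketched as an update-or-append loop, which is how the reader on the next slide tracks the current Salesperson, Customer, and SA Sort Code. The category values below are taken from the sample report; the updates array itself is a hypothetical stand-in for lines arriving in report order.

```ruby
categories = []

# Hypothetical stream of (category, value) pairs in the order they appear
updates = [
  ["Salesperson", "22 BILL PRICE"],
  ["SA Sort Code", "4.42 PORK RIBS"],
  ["SA Sort Code", "19.1 WAFFLES"]
]

updates.each do |name, value|
  if (pair = categories.assoc(name))
    pair[-1] = value              # category seen before: overwrite in place
  else
    categories << [name, value]   # new category: append, keeping the order
  end
end
```

assoc returns the inner array itself rather than a copy, so assigning to pair[-1] updates the entry inside categories.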
  • 22. Tracking the hierarchical categories and yielding records

      def initialize(file)
        ...
        @categories = []
      end

      def each
        open(@file) do |report|
          ...
              if line =~ /\A\s+(\w[\w\s]+?)\s+(\d.+?)\s+\z/
                # category line: update the existing entry or add a new one
                if cat = @categories.assoc($1)
                  cat[-1] = $2
                else
                  @categories << [$1, $2]
                end
              else
                # data line: pair each header with its field, then append the categories
                yield @headers.zip(line.unpack(@format).map { |f| f.strip }) +
                      @categories
              end
            end
          end # report.each
        end # open
      end # def
  • 23. The records the reader yields

      [["Part Code", "44-531"], ["Description", "53/3 CU PRK RIB SOY"],
       ["Qty Period", "0"], ["QTY LastYr", "0"], ["Var", "0"],
       ["Lbs Period", "0"], ["Lbs LastYr", "2"], ["Var", "-100"],
       ["Salesperson", "22 BILL PRICE"],
       ["Customer", "1014 KECK'S MEAT & FOODSERVICE"],
       ["SA Sort Code", "4.42 PORK RIBS"]]

      [["Part Code", "44-531-0"], ["Description", "100/2.5 CU PRK RIB SOY"],
       ["Qty Period", "10"], ["QTY LastYr", "21"], ["Var", "-52"],
       ["Lbs Period", "14"], ["Lbs LastYr", "31"], ["Var", "-55"],
       ["Salesperson", "22 BILL PRICE"],
       ["Customer", "1014 KECK'S MEAT & FOODSERVICE"],
       ["SA Sort Code", "4.42 PORK RIBS"]]

      ...

      [["Part Code", "5018"], ["Description", "36/5 KING B WAFFLES"],
       ["Qty Period", "10"], ["QTY LastYr", "10"], ["Var", "0"],
       ["Lbs Period", "10"], ["Lbs LastYr", "14"], ["Var", "-29"],
       ["Salesperson", "22 BILL PRICE"],
       ["Customer", "1014 KECK'S MEAT & FOODSERVICE"],
       ["SA Sort Code", "19.1 WAFFLES"]]
  • 24. The complete RossReader

      class RossReader
        def initialize(file)
          @file = file
          @in_header = false
          @headers = nil
          @format = nil
          @categories = []
        end

        def parse_header(headers)
          @format = headers[3].split.map { |col| "a#{col.size}" }.join("x")
          @headers = headers[2].unpack(@format).map { |f| f.strip }
        end

        def each
          open(@file) do |report|
            parse_header(Array.new(4) { report.gets })
            report.each do |line|
              if line =~ /\ASAA_R/
                @in_header = true
              elsif @in_header
                @in_header = false if line =~ /\A-/
              else
                break if line =~ /\AReport Totals/
                next if line =~ /\A\s+\z/ or
                        line =~ /\A\s+-/ or
                        line =~ /\b(sub)?totals\b/i
                if line =~ /\A\s+(\w[\w\s]+?)\s+(\d.+?)\s+\z/
                  if cat = @categories.assoc($1)
                    cat[-1] = $2
                  else
                    @categories << [$1, $2]
                  end
                else
                  yield @headers.zip(line.unpack(@format).map { |f| f.strip }) +
                        @categories
                end
              end
            end # report.each
          end # open
        end # def
      end
  • 25. 2 - Writing
  • 26. A CSV writer

      require "rubygems"
      require "faster_csv"

      class CSVWriter
        def initialize
          @headers = nil
        end

        def puts(record)
          # the first record also supplies the header row
          if @headers.nil?
            @headers = record.map { |field| field.first }
            FCSV { |csv| csv << @headers }
          end
          FCSV { |csv| csv << record.map { |field| field.last } }
        end
      end
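The slide relies on the faster_csv gem, which was later adopted as Ruby's standard csv library (Ruby 1.9 onward). As a hedged alternative for current Rubies, the sketch below follows the same header-then-rows logic with CSV.generate_line. StdCSVWriter and its io parameter are hypothetical additions for this illustration and are not part of the talk.

```ruby
require "csv"
require "stringio"

# Hypothetical variant of CSVWriter built on the standard csv library.
# It writes to any object that supports <<, defaulting to $stdout.
class StdCSVWriter
  def initialize(io = $stdout)
    @io = io
    @headers = nil
  end

  def puts(record)
    # The first record also supplies the header row.
    if @headers.nil?
      @headers = record.map { |field| field.first }
      @io << CSV.generate_line(@headers)
    end
    @io << CSV.generate_line(record.map { |field| field.last })
  end
end

out = StringIO.new
writer = StdCSVWriter.new(out)
writer.puts [["Part Code", "5018"], ["Description", "36/5 KING B WAFFLES"]]
writer.puts [["Part Code", "3680"], ["Description", "40/4 RB PRK WHLMSC HARKER"]]
csv_text = out.string
```

Since the class still exposes puts(), it can be handed to Munger in place of CSVWriter without other changes.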
  • 27. 3 - Munging
  • 28. Cleaning up the numbers

      require "munger"
      require "ross_reader"
      require "csv_writer"

      report = Munger.new(RossReader.new(ARGV.shift), CSVWriter.new)
      report.munge do |record|
        record.each do |field|
          # strip thousands separators: "1,080" becomes "1080"
          if field.last =~ /\A(?:\d+,)+\d+k?\z/
            field.last.delete!(",")
          end
          # expand the "k" suffix: "73k" becomes "73000"
          field.last.sub!(/\A\d+k\z/) { |num| num.to_i * 1000 }
        end
        record
      end
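The two substitutions on slide 28 can be checked in isolation. The sketch below wraps them in a helper; clean_number is a hypothetical name introduced here, and it works on a copy rather than modifying the field in place as the slide code does.

```ruby
# Hypothetical helper applying the two clean-up rules from slide 28
def clean_number(text)
  value = text.dup
  # Rule 1: drop thousands separators, but only from comma-grouped digits
  value.delete!(",") if value =~ /\A(?:\d+,)+\d+k?\z/
  # Rule 2: expand a trailing "k" into thousands
  value.sub(/\A\d+k\z/) { |num| (num.to_i * 1000).to_s }
end

samples = ["1,080", "73k", "-100", "53/3 CU PRK RIB SOY"]
cleaned = samples.map { |s| clean_number(s) }
```

Anchoring both patterns with \A and \z keeps text fields such as the description untouched, even when they contain digits.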
  • 29. So what can I do with all this?
      • Output your data into a spreadsheet such as Excel
      • Open the data in your text editor
      • Import the data into a database
      • Let’s see it in action
  • 31. A database writer

      require "munger"
      require "rubygems"
      require "faster_csv"
      require "active_record"

      class DBWriter
        def initialize(model, path = "db.sqlite")
          ActiveRecord::Base.establish_connection(
            :adapter  => "sqlite3",
            :database => path
          )
          @model = model
        end

        def puts(record)
          @model.create!(record)
        end
      end

      class PartCode < ActiveRecord::Base
      end

      # define the migration only when the database does not exist yet
      unless File.exist? "db.sqlite"
        class CreatePartCodes < ActiveRecord::Migration
          def self.up
            create_table :part_codes do |t|
              t.string  :part_code
              t.string  :description
              t.integer :qty_period
              t.integer :qty_lastyr
              t.integer :qty_var
              t.integer :lbs_period
              t.integer :lbs_lastyr
              t.integer :lbs_var
              t.string  :salesperson
              t.string  :customer
              t.string  :sa_sort_code
            end
          end

          def self.down
            drop_table :part_codes
          end
        end
      end
  • 32. Munging the CSV into the database

      reader = FCSV($stdin, :headers           => true,
                            :header_converters => :symbol)
      writer = DBWriter.new(PartCode)

      # create the table only if the migration was defined
      CreatePartCodes.up if defined? CreatePartCodes

      m = Munger.new(reader, writer)
      m.munge do |row|
        row.to_hash
      end
  • 33. Congratulations! You, too, are now a Munger!