Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations

ENUG Conference
Cheng Library, William Paterson University, Wayne, New Jersey
Thursday, October 21, 2010

Ray Schwartz, Systems Specialist Librarian
Cheng Library, William Paterson University, Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu

Outline
• What is Data Mining and Data
  Warehousing and Why Do We Do It?
• Our Library and University
• Patron Statistical Categories
• Application Server
• Reporting



Collecting Transactional Data

• ILSs collect transactional data for circulation
  and allocation of collection funds.
• ILL and Document Delivery services supply
  general transactional data.
• Reports from vendor services
   – Bibliographic utilities
   – Subscription agents
   – Book jobbers




Collecting Transactional Data, cont.
•   Most ILSs have search and web server logs
•   Most (if not all) Databases have usage reports
•   Link Resolver logs
•   Proxy Server logs
•   Many other ways of collecting transactional
    data.
    – Gate counts
    – Reference transaction counts
    – Reshelving counts

What would we like to see?
• Breakdowns by department and majors.

• Combined usage by department/majors
  of more than one library service.




What is Data Mining and Data Warehousing?
• Extracting data from legacy systems and other
  resources;
• cleaning, scrubbing and preparing data for decision
  support;
• maintaining data in appropriate data stores;
• accessing and analysing data using a variety of end
  user tools;
• and mining data for significant relationships.


 •   Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing:
     Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.

• The primary purpose of these efforts is
  to provide easy access to specifically
  prepared data that can be used with
  decision support applications such as
  management reports, queries, decision
  support systems, executive information
  systems and data mining.




•   Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing:
    Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.

Our University
•   9000 undergraduates
•   1000 graduates (mostly education majors)
•   400 faculty
•   800 adjuncts
•   1000 staff




Our Library
•   19 librarians and 26 library staff
•   350,000 volumes
•   18,000 audiovisual items
•   47,000 print and electronic periodicals
•   124 general and subject specific databases
•   $1,100,000 Non-Salary Allocations




Our Transactions
•   600,000 Database Searches
•   413,000 Gate Counts
•   40,000 Library Materials Circulation
•   34,000 Equipment Circulation
•   19,000 Reference Queries
•   3,000 Interlibrary Loans
•   5,000 Documents Delivered



Our Systems
•   Voyager ILS
•   Clio ILL Software
•   EZProxy Server
•   Banner – University ERP
•   University Networked Drive K:
•   University Email Server
•   University Web Server

Vendor Services
• Serials Solutions
    • A to Z list
    • MARC Record Service
    • Link Resolver
• OCLC – Bibliographic Utility
    • Worldcat Collection Analysis
•   Coutts (was Blackwell) – Book Jobber
•   Ebsco – Subscription Agent
•   Marcive – Authority Control
•   Database Vendors
Email Reports from the ILS




Voyager Overdue and Fine Notices - Daily




Quarterly Extract for Serials Solutions AtoZ Service




Which categories of patrons are accessing which services?




First Step – Patron Statistical Categories




• Voyager Patron Database allows a maximum
  of 10 statistical categories per patron record.

• Decide which statistical categories are needed
  for each patron group defined.

• Work with your University Information Systems
  Department to extract the relevant data from
  the relevant sources.
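
A minimal sketch of the category-selection step, assuming the university extract arrives as one dictionary per person. The field names, the `FIELDS` list and the function name are hypothetical; only the limit of 10 and the "M- " / "S- " label prefixes (as they appear in the reports in this deck) come from the slides.

```python
MAX_STAT_CATEGORIES = 10  # per-patron limit stated on the slide

# Hypothetical (prefix, source field) pairs; adjust to the local extract.
FIELDS = [
    ("M- ", "major"),
    ("S- ", "department"),
    ("", "status"),
    ("", "college"),
    ("", "year_of_study"),
]


def stat_categories(person, fields=FIELDS):
    """Build the statistical-category labels for one patron record."""
    labels = []
    for prefix, key in fields:
        value = person.get(key)
        if value:  # skip fields that are missing or empty for this person
            labels.append(prefix + value)
    if len(labels) > MAX_STAT_CATEGORIES:
        raise ValueError(
            f"{len(labels)} categories exceed the limit of {MAX_STAT_CATEGORIES}"
        )
    return labels


student = {"major": "History", "status": "Undergrad", "year_of_study": "Junior"}
labels = stat_categories(student)
```

Raising an error instead of silently truncating makes it obvious when a patron group has been given more categories than the record can hold.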



Groups and Services
Groups
• Major
• Status
     – Undergrad or Grad
     – Faculty, Adjunct Faculty or Staff
• Department
• College
• Degree
• No. of Credits
• Year of Study
• Campus Location
Services
• Circulation
     – Books
     – Media
     – Reserve
     – By Fund Code
     – Location
• ILL / Document Delivery
• Databases
• Library Web Pages
     – Subject Area Resource Guides
     – Reference Requests
• Catalog
• Other Vendor Services
     – Serials Solutions



History Department - 12 months - Feb. 2008

PATRON STATUS             BOOK CIRC  MEDIA CIRC  EQUIP CIRC  TOTAL CIRC  MEMBERS  BORROWERS  % BORROWING  CIRC/MEMBER  CIRC/BORROWER
UNDERGRADUATE STUDENTS        2,715         250         698       3,663      238        186          78%        15.39          19.69
GRADUATE STUDENTS               419          13          76         508       14         13          93%        36.29          39.08
ADJUNCT FACULTY                 100          65          20         185       32         20          63%         5.78           9.25
FULL-TIME FACULTY               159         115         194         468       24         23          96%        19.50          20.35
HISTORY TOTALS                3,393         443         988       4,824      308        242          79%        15.66          19.93
LIBRARY TOTALS               23,370       8,713      20,703      52,756    7,418      4,981          67%         7.11          10.59



DEFINITIONS:
BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
MEDIA CIRCULATION = audio & video materials, including media reserves

EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
MEMBER = declared major or department member
BORROWER = any member who borrowed materials
Library Total = declared undergrad & grad majors, adjuncts & full time faculty borrowers
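
The three ratio columns follow directly from the definitions above. A small sketch, using the undergraduate row of the History table as input; the function name and dictionary keys are illustrative and not taken from the original report.

```python
def circulation_ratios(total_circ, members, borrowers):
    """Derive the ratio columns of the department report from raw counts."""
    return {
        # share of members who borrowed at least one item, as a whole percent
        "pct_borrowing": round(100 * borrowers / members),
        # average loans per declared major or department member
        "circ_per_member": round(total_circ / members, 2),
        # average loans per member who actually borrowed
        "circ_per_borrower": round(total_circ / borrowers, 2),
    }


# Undergraduate row: book + media + equipment circulation, members, borrowers.
undergrad = circulation_ratios(2715 + 250 + 698, members=238, borrowers=186)
```

For this row the result matches the slide: 78% borrowing, 15.39 loans per member and 19.69 loans per borrower.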



Communications Majors FY08/09

Statistical Categories // Item Type / Location / Call No Type / Call No Majors  Freshman  Sophomore  Junior  Senior
M- DVD / Media Services / Other / DVD                                      194        17         31      52      94
M- VideoCass / Media Services / Other / VC                                 228        11         40      67     110
T- Book / 2nd Floor - Circulating / Library of Congress / B                 34         9          8      11       6
T- Book / 2nd Floor - Circulating / Library of Congress / BD                 3         1                          2
T- Book / 2nd Floor - Circulating / Library of Congress / BF                30         5          5      12       8
...
2nd Floor Circulating                                                     1531       222        310     403     596
T- Juvenile / CMC /                                                        125        14         26      20      35
T- NJDoc / Askew Documents Room / Other /                                    1                                    1
New Jersey History                                                          10         0          2       7       1
T- ReserveBk / Reserves Desk /                                             189        13         46      68      62
T- SpecColl / Special Collection / Library of Congress / LC                  3                            3
T- Book-McNaughton / Leisure Lounge / Library of Congress / F                2                            1       1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HF               1                    1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HS               2                            2
T- Book-McNaughton / Leisure Lounge / Library of Congress / HV               5         1                  2       2
T- Book-McNaughton / Leisure Lounge / Library of Congress / ML               1                            1
T- Book-McNaughton / Leisure Lounge / Library of Congress / PN               3         3
T- Book-McNaughton / Leisure Lounge / Library of Congress / PS              29         4                 10      15
T- Book-McNaughton / Leisure Lounge / Library of Congress / RC               2         1                          1
T- Book-McNaughton / Leisure Lounge / Library of Congress / TL               1                                    1
Leisure Lounge                                                              49         9          1      19      20

Challenges with combining data from various services
• Little to no linkage of data

• Multiple user IDs for authentication




Second Step – Set Up an Application Server




What is an Application Server?
• A machine or its software that works in
  conjunction with a web server to deliver
  application services such as the dynamic
  creation of a webpage from content stored in a
  database. From http://www.webtools.ca.gov/help/Glossary.asp

• Web Server Software (Apache or IIS)
• Database Management System – DBMS (MySQL,
  Oracle, MS SQL Server)
• Scripting Language (Perl, PHP, ColdFusion, ASP)

Why an Application Server?
• Relevant data in logfiles needs to be in
  a database to be analyzed.

• Need your own DBMS to create new
  tables and queries.
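
One way to picture the second point, assuming SQLite as a stand-in for the DBMS on the application server. The table and column names (`proxy_hits`, `monthly_hits`) and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the library's own DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proxy_hits (usr TEXT, hit_date TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO proxy_hits VALUES (?, ?, ?)",
    [
        ("theuser", "2008-01-01", "http://search.ebscohost.com/"),
        ("theuser", "2008-01-15", "http://www.lexis-nexis.com/universe"),
        ("another", "2008-02-03", "http://search.ebscohost.com/"),
    ],
)

# With our own DBMS we are free to derive new summary tables from the raw rows.
conn.execute(
    """
    CREATE TABLE monthly_hits AS
    SELECT substr(hit_date, 1, 7) AS month, COUNT(*) AS hits
    FROM proxy_hits
    GROUP BY substr(hit_date, 1, 7)
    """
)
monthly = conn.execute(
    "SELECT month, hits FROM monthly_hits ORDER BY month"
).fetchall()
```

Once the log rows sit in a table, each new report is one more query or derived table rather than another pass over the raw logfiles.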




• Decide how you will use the Application
  Server.

• Decide on the best and most plausible
  configuration.




Authentication of ILL and other forms is routed through the EZProxy server




Daily and Weekly Email Reports from the Application Server
Circ Fines Audit Daily Report - Daily at 6:05 AM.
Dupe Patron Record Report - Daily at 5:56 AM.
Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM.
Media Service Scheduling Rooms Report - Daily at 6:02 AM.
Media Services Equipment Pickup Summary - Daily at 7:00 AM.
Received Title Alert - Daily at 6:59 AM.
Reserves Overdues - Daily at 5:59 AM.
Scheduled LIS Tasks - Daily at 6:00 AM.

ILL Borrowing Overdues Report - Weekly at 5:59 AM.
ILL Lending Reports - Weekly at 6:15 AM.



Monthly Email Reports from the Application Server
Circ Fines Audit - Monthly at 6:10 AM.
Circulation by Location and Item Type - Monthly at 6:21 AM.
Circulation Lost and Paid - Monthly at 6:25 AM.
Circulation Online Renewal Count - Monthly at 6:30 AM.
Media Circulation - Monthly at 6:35 AM.
Reserve Circulation - Monthly at 6:40 AM.




On Demand Reports




Lending Services Reports

Lists of patrons with fines between $10 and $19.99
•   Student and Alumni fines list - Sorted by either Name, Amount or Notice Date.
•   PALS and Courtesy Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with fines over $19.99
•   Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or
    Notes.
•   PALS and Courtesy Patron fines list - Sorted by Name.
•   VALE Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with overdues older than 30 days
•   Student and Alumni overdues list - Sorted by either Name, IID or Notes.
•   PALS and Courtesy Patron overdues list - Sorted by Name.
•   All other Patron overdues list except VALE - Sorted by Name.


Lending Services Reports, cont.
Lists of VALE patrons with overdues older than 6 months
• VALE patron overdues list - Sorted by Name.
Miscellaneous Reports
• Patrons with the word "Collection Agency" or "CA" in their notes.
•   Patrons with the word "FINE" in one of their notes.
•   Patrons with the word "SOILS" in their notes.
•   Patrons with the word "FALL07 SOILS" in their notes.
•   Patrons with the word "HOLD" in their notes.
• Combined list of HOLD, FINE, and CA.
Circulation Reports by Item Type from 2003 to the present
• All Staff.
•   All Colleges
•   Undergraduates by Major.
•   Graduates by Major
•   Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009
    and 30-Nov-2009
One of Our Projects
• Mining EZProxy logfiles and linking to patron
  statistical categories from the Voyager Patron
  Database

  – What majors and departments are accessing
    which database services?

  – What majors and departments are accessing
    the ILL services?
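
A rough sketch of the linking idea, again with SQLite standing in for the real DBMS. `ezproxylogs` is the table name used by the loading script in this deck; `patron_stats`, its columns, the user IDs and the example.edu URLs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ezproxylogs (usr TEXT, url TEXT);
    CREATE TABLE patron_stats (usr TEXT, stat_category TEXT);
    """
)
conn.executemany(
    "INSERT INTO ezproxylogs VALUES (?, ?)",
    [
        ("jdoe", "http://www.example.edu/ILL_article.cfm"),
        ("jdoe", "http://www.example.edu/ILL_article.cfm"),
        ("asmith", "http://www.example.edu/ILL_article.cfm"),
        ("asmith", "http://search.ebscohost.com/"),
    ],
)
conn.executemany(
    "INSERT INTO patron_stats VALUES (?, ?)",
    [("jdoe", "M- History"), ("asmith", "M- Nursing")],
)

# Join each log row to the patron's category, then count ILL form hits per major.
ill_by_major = conn.execute(
    """
    SELECT p.stat_category, COUNT(*) AS hits
    FROM ezproxylogs AS l
    JOIN patron_stats AS p ON p.usr = l.usr
    WHERE l.url LIKE '%ILL_article.cfm%'
    GROUP BY p.stat_category
    ORDER BY hits DESC, p.stat_category
    """
).fetchall()
```

The output keeps only the category and the count, so the report itself carries no user IDs.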


ILL request form authentications by major
Article                              Book
Count  Major                         Count  Major
   62  M- Psychology                    90  M- History
   60  M- Sociology                     28  M- Non-Degree
   42  M- Applied Clinical Psych        25  M- Pub Pol & Intl Affairs
   35  M- Education                     20  M- Spanish
   31  M- History                       18  M- English
   30  M- Spanish                       16  M- Undecided
   29  M- Nursing                       14  M- Art
   19  M- Communication Disorders       14  M- Education
   19  M- Communication                 11  M- Sociology
   14  M- Biotechnology                 10  M- Biology
   14  M- Counseling                     9  M- Music
   14  M- English                        9  M- Special Programs
   12  M- Non-Degree                     8  M- Psychology
   10  M- Community/Sch Health           7  M- Biotechnology
    7  M- Biology                        7  M- Political Science
    7  M- Political Science              6  M- Anthropology
    6  M- Undecided                      6  M- Music - Jazz Studies
    5  M- Comm Media Studies             4  M- Business
    5  M- Reading                        4  M- Communication
    4  M- Business                       4  M- Nursing
Which Databases are accessed by Majors and Departments?




By Major and Host
Major                       Count Host
M- Nursing                    3377 ebscohost.com
M- Non-Degree                 3010 ebscohost.com
M- Psychology                 2303 ebscohost.com
M- Counseling                 1487 ebscohost.com
M- Communication              1359 ebscohost.com
M- Education                  1267 ebscohost.com
M- Business                   1246 proquest.umi.com
M- Sociology                  1152 ebscohost.com
M- Business                   1145 lexis-nexis.com
M- Undecided                  1100 ebscohost.com
M- Applied Clinical Psych     1075 ebscohost.com
M- English                    1034 ebscohost.com
M- Sociology                   916 csa.com
M- Business                    794 ebscohost.com
M- Accounting                  738 lexis-nexis.com
M- Reading                     683 ebscohost.com
M- Physical Education          653 ebscohost.com
M- Special Programs            600 ebscohost.com
M- Non-Degree                  463 ereserve.wpunj.edu
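
A sketch of how a major-by-host tally like the one above could be produced once each hit carries a statistical category. The slides do not say how host names were shortened (for example `search.ebscohost.com` to `ebscohost.com`), so this version simply counts by full host name; the function name and sample records are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def hits_by_major_and_host(records):
    """Count (major, host) pairs from (major, target URL) records."""
    counts = Counter()
    for major, url in records:
        host = urlsplit(url).hostname or "(no host)"
        counts[(major, host)] += 1
    # most_common() sorts by count, highest first
    return counts.most_common()


records = [
    ("M- Nursing", "http://search.ebscohost.com/login.aspx?profile=c8h"),
    ("M- Nursing", "http://search.ebscohost.com/login.aspx?profile=c8h"),
    ("M- Business", "http://www.lexis-nexis.com/universe"),
]
tally = hits_by_major_and_host(records)
```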
By Dept and Host
Department               Count Host
S- Information Systems      933 webscript.exe?fs.scr
S- Psychology Dept.         742 ebscohost.com
S- Accounting and Law       559 lexis-nexis.com
S- Political Sci Dept.      308 lexis-nexis.com
S- Nursing Dept.            204 ebscohost.com
S- Market & Mgt. Dept.      175 proquest.umi.com
S- Library                  167 ebscohost.com
S- Sociology Dept.          151 ebscohost.com
S- Sociology Dept.          134 csa.com
S- History Dept.            121 serials.abc-clio.com
S- Exercise & Mov Sci       110 ebscohost.com
S- Political Sci Dept.      104 ebscohost.com
S- Library                  103 ILL_article.cfm
S- Library                  100 webscript.exe?fs.scr
S- History Dept.             94 webscript.exe?fs.scr
By Dept and Service

Department                Count Service
S- Information Systems       933 http://www.wpunj.edu/scripts/webscript.exe?fs.scr
S- Accounting and Law        549 http://www.lexis-nexis.com/universe
S- Psychology Dept.          364 http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=psych
S- Nursing Dept.             114 http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=c8h
S- Sociology Dept.            96 http://www.csa.com/htbin/dbrng.cgi?&db=socioabs-set-c&adv=1
S- Sociology Dept.            75 http://search.ebscohost.com/login.asp?profile=asp
S- Philosophy Dept.           74 http://webspirs4.silverplatter.com:8900/c119646?sp.form.first.p=srchmain.htm&sp.dbid.p=S(PHIL
S- Library                    65 http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=asp
S- Anthropology Dept.         62 http://www.sciencedirect.com/
S- History Dept.              61 http://serials.abc-clio.com/active/start?_appname=serials&initialdb=AHL
S- Psychology Dept.           61 http://search.ebscohost.com/login.asp?profile=psyart
S- History Dept.              58 http://serials.abc-clio.com/active/start?_appname=serials&initialdb=HA
S- Psychology Dept.           54 http://search.ebscohost.com/login.asp?profile=psych
S- Psychology Dept.           42 http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=psyart
S- English Dept.              42 http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=mzh
IP Address Location = 149.151.VlanID.*
Admin VLANs                       Labs VLANs
  Vlan ID         Vlan Name        Vlan ID      Vlan Name
     2             Servers            3         Lab Servers
     4             Admin              9          Imaging
     5             Science           160          Lib Labs
     6           Test Servers        174         STU VPN
     7               NAS             175       Ben Shahn Lab
    101       Energy Management      178        Hobart Lab
    102            Diebold           179          SCI Lab
    104             Xerox            187          CS Lab
    150         Media Services       192          Atrium
    161         Dorms Offices        209           Labs
    162              RBI             212        Resnet Labs
    163             Police           214         Raub Labs
    164          Maintenance         228          VR Labs
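
Since the third octet of an on-campus address is the VLAN ID, a hit can be mapped to a campus location with a simple lookup. A minimal sketch; the dictionary holds only a few of the VLANs listed above, and the function name and the "unlisted VLAN" fallback are assumptions.

```python
import ipaddress

# A subset of the VLAN table from the slide: VLAN ID -> VLAN name.
VLAN_NAMES = {
    150: "Media Services",
    160: "Lib Labs",
    178: "Hobart Lab",
    212: "Resnet Labs",
}


def vlan_for_ip(ip):
    """Return the VLAN name for a 149.151.x.x address, or None if off campus."""
    octets = ipaddress.IPv4Address(ip).packed  # four bytes, one per octet
    if octets[0] != 149 or octets[1] != 151:
        return None
    return VLAN_NAMES.get(octets[2], "unlisted VLAN")


location = vlan_for_ip("149.151.160.25")
```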
FY08/09 On Campus Hits to Databases by Class C IP Address




Patron Privacy and Standards




Using Voyager as the model for Patron Privacy




• Active Circ transactions are stored in a
  table with patron ID and statistical
  categories.
• Completed Circ transactions are stored
  in a table without the patron ID, but still
  with the patron statistical categories.
• The Patron Table contains the total
  counts of transactions for each
  patron, but no link to which transactions
  they are.


• EZProxy transactions would be stored in
  one table with patron statistical
  categories, but without the user ID.
• User IDs would be stored in another
  table with counts for each service, divided
  by academic year.
• Logs are collected, loaded and deleted
  monthly.
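
A sketch of the proposed split, assuming each parsed hit is a (user ID, service, date) tuple. The July start of the academic year, the function names and the sample data are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

ACADEMIC_YEAR_START_MONTH = 7  # assumption: the academic year starts in July


def academic_year(day):
    """Label a date with its academic year, e.g. '2007/08'."""
    start = day.year if day.month >= ACADEMIC_YEAR_START_MONTH else day.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def split_for_privacy(hits, categories_by_user):
    """Separate transactions (no user ID) from per-user counts (no detail)."""
    transactions = []        # service, date and categories only
    user_counts = Counter()  # (user ID, service, academic year) -> hits
    for user_id, service, day in hits:
        categories = tuple(categories_by_user.get(user_id, ()))
        transactions.append((service, day, categories))
        user_counts[(user_id, service, academic_year(day))] += 1
    return transactions, user_counts


hits = [
    ("theuser", "ebscohost.com", date(2008, 1, 1)),
    ("theuser", "ebscohost.com", date(2008, 9, 2)),
]
transactions, user_counts = split_for_privacy(
    hits, {"theuser": ["M- History", "Undergrad"]}
)
```

As with the Voyager model, the transaction rows keep the statistical categories but cannot be traced back to a person, while the per-user table keeps only totals.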


Example of EZProxy log entry
•   IP address      nj.dhcp.embarqhsd.net
•   (Not used)      -
•   User ID         theuser
•   Date/time       1/1/2008 4:25:15 AM
•   Method          GET
•   Page retrieved  http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
•   Version         HTTP/1.1
•   Response code   302
•   No. of bytes    537
•   Referring URL   http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
•   User agent      Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
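
In this entry the service being requested is carried in the `url=` parameter of the proxy URL. A small sketch of pulling it out; the function name is illustrative, and treating everything after `url=` as the target is an assumption based on the two URLs shown above.

```python
from urllib.parse import urlsplit


def target_url(proxy_url):
    """Return the destination carried in the proxy URL's url= parameter."""
    query = urlsplit(proxy_url).query
    if query.startswith("url="):
        return query[len("url="):]
    # Otherwise look for url= after another parameter such as session=...
    _, found, rest = query.partition("&url=")
    return rest if found else None


page = target_url(
    "http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa"
    "&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr"
)
```

The remainder is taken whole instead of being parsed as ordinary query parameters, because target URLs often contain their own `&` and `?` characters.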



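Perl Script for loading ezproxy log into MySQL
use strict;
use warnings;

# Map month abbreviations in the log timestamp to two-digit month numbers.
my %month = (Jan=>'01',Feb=>'02',Mar=>'03',Apr=>'04',May=>'05',Jun=>'06',
             Jul=>'07',Aug=>'08',Sep=>'09',Oct=>'10',Nov=>'11',Dec=>'12');

# Four leading fields, [dd/Mon/yyyy:hh:mm:ss zone], "method target protocol",
# status, bytes, "referrer", "user agent" (17 capture groups in all).
my $pattern =
    '^(\S*) (\S*) (\S*) (\S*) '.
    '\[(..)/(...)/(....):(..):(..):(..) .....\]'.
    ' "(\S*) (\S*) (\S*)" '.
    '(\d*) (-|\d*) "([^"]*)" "([^"]*)"';

while (<>) {
    if (m/$pattern/) {
        # Escape single quotes in the free-text fields.
        my ($tgt,$ref,$agt) = (esc($12),esc($16),esc($17));
        # A byte count of '-' means no bytes were sent; store NULL.
        my $byt = $15 eq '-' ? 'NULL' : $15;
        print "INSERT INTO ezproxylogs VALUES ('$1','$2','$3',".
              " TIMESTAMP '$7/$month{$6}/$5 $8:$9:$10','$11','$tgt',".
              "'$13',$14,$byt,'$ref','$agt');\n";
    } else {
        # MySQL needs a space after "--" to treat the line as a comment.
        print "-- Skipped line $.\n";
    }
}

# Double any single quotes so the value is safe inside an SQL string literal.
sub esc {
    my ($p) = @_;
    $p =~ s/'/''/g;
    return $p;
}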
Created table to assist the linking
SELECT PATRON_ADDRESS.ADDRESS_TYPE,
       Left([ADDRESS_LINE1], InStr([ADDRESS_LINE1], "@") - 1) AS usr,
       PATRON_ADDRESS.PATRON_ID,
       PATRON_ADDRESS.ADDRESS_STATUS,
       PATRON_ADDRESS.EFFECT_DATE,
       PATRON_ADDRESS.EXPIRE_DATE,
       PATRON_ADDRESS.MODIFY_DATE,
       PATRON_ADDRESS.MODIFY_OPERATOR_ID
INTO emailprefix
FROM PATRON_ADDRESS
WHERE (((PATRON_ADDRESS.ADDRESS_TYPE)="3"));
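
The `usr` column is the part of the email address before the "@", which is what lets patron records be matched to the user IDs in the proxy logs. The same extraction in a short sketch; the function name and sample address are illustrative, and returning `None` for an address without "@" is an added guard, not something the query above does.

```python
def email_prefix(address_line):
    """Return the text before '@', or None if the value is not an email address."""
    local_part, at_sign, _domain = address_line.partition("@")
    return local_part if at_sign else None


usr = email_prefix("theuser@example.edu")
```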
Reporting and Standards
• Reporting
     –   Emailed periodically - e.g., daily dossiers,
         and other event triggered reports.
     –   On demand, via email, web pages or a
         printer.
• Standards
     –   Share data for comparative research.
     –   Groups of libraries and consortia




Questions?



More Related Content

Similar to Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations

Adam Rusbridge (EDINA) - Clarifying e-journal subscription history
Adam Rusbridge (EDINA) - Clarifying e-journal subscription historyAdam Rusbridge (EDINA) - Clarifying e-journal subscription history
Adam Rusbridge (EDINA) - Clarifying e-journal subscription historysherif user group
 
Crushing, Blending, and Stretching Data
Crushing, Blending, and Stretching DataCrushing, Blending, and Stretching Data
Crushing, Blending, and Stretching DataRay Schwartz
 
SCONUL Statistics: The view from the shop floor.
SCONUL Statistics: The view from the shop floor.SCONUL Statistics: The view from the shop floor.
SCONUL Statistics: The view from the shop floor.Selena Killick
 
SALT - Surfacing the Academic Long Tail
SALT - Surfacing the Academic Long TailSALT - Surfacing the Academic Long Tail
SALT - Surfacing the Academic Long TailLisa Charnock
 
Managing discovery and linking services
Managing discovery and linking servicesManaging discovery and linking services
Managing discovery and linking servicesNASIG
 
Evaluating and Selecting Library Services PlatformNEW
Evaluating and Selecting Library Services PlatformNEWEvaluating and Selecting Library Services PlatformNEW
Evaluating and Selecting Library Services PlatformNEWmahongzn
 
Big and Small Web Data
Big and Small Web DataBig and Small Web Data
Big and Small Web DataMarieke Guy
 
Electronic Collection Management: How statistics can, and can't, help.
Electronic Collection Management: How statistics can, and can't, help.Electronic Collection Management: How statistics can, and can't, help.
Electronic Collection Management: How statistics can, and can't, help.Selena Killick
 
Acquisitions institute 2011 ocul pda project
Acquisitions institute 2011 ocul pda projectAcquisitions institute 2011 ocul pda project
Acquisitions institute 2011 ocul pda projectTony Horava
 
Defining the Libraries' Role in Research: A Needs Assessment  Case Study
Defining the Libraries' Role in Research:  A Needs Assessment  Case StudyDefining the Libraries' Role in Research:  A Needs Assessment  Case Study
Defining the Libraries' Role in Research: A Needs Assessment  Case StudyKathryn Crowe
 
Andrew Cox Research data management
Andrew Cox Research data managementAndrew Cox Research data management
Andrew Cox Research data managementIncisive_Events
 
Piloting an E-Journals Preservation Registry Service (PEPRS)
Piloting an E-Journals Preservation Registry Service (PEPRS)Piloting an E-Journals Preservation Registry Service (PEPRS)
Piloting an E-Journals Preservation Registry Service (PEPRS)EDINA, University of Edinburgh
 
Alma, the Cloud & the Evolution of the Library Systems Department - Kevin Kidd
Alma, the Cloud & the Evolution of the Library Systems Department - Kevin KiddAlma, the Cloud & the Evolution of the Library Systems Department - Kevin Kidd
Alma, the Cloud & the Evolution of the Library Systems Department - Kevin KiddKevin Kidd
 
2013 DataCite Summer Meeting - Purdue University Research Repository (PURR) (...
2013 DataCite Summer Meeting - Purdue University Research Repository (PURR) (...2013 DataCite Summer Meeting - Purdue University Research Repository (PURR) (...
2013 DataCite Summer Meeting - Purdue University Research Repository (PURR) (...datacite
 

Similar to Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations (20)

EDINA Serials UKLA SafeNet
EDINA Serials UKLA SafeNetEDINA Serials UKLA SafeNet
EDINA Serials UKLA SafeNet
 
Adam Rusbridge (EDINA) - Clarifying e-journal subscription history
Adam Rusbridge (EDINA) - Clarifying e-journal subscription historyAdam Rusbridge (EDINA) - Clarifying e-journal subscription history
Adam Rusbridge (EDINA) - Clarifying e-journal subscription history
 
Crushing, Blending, and Stretching Data
Crushing, Blending, and Stretching DataCrushing, Blending, and Stretching Data
Crushing, Blending, and Stretching Data
 
SCONUL Statistics: The view from the shop floor.
SCONUL Statistics: The view from the shop floor.SCONUL Statistics: The view from the shop floor.
SCONUL Statistics: The view from the shop floor.
 
Qs4 group c corti
Qs4 group c cortiQs4 group c corti
Qs4 group c corti
 
SALT - Surfacing the Academic Long Tail
SALT - Surfacing the Academic Long TailSALT - Surfacing the Academic Long Tail
SALT - Surfacing the Academic Long Tail
 
Managing discovery and linking services
Managing discovery and linking servicesManaging discovery and linking services
Managing discovery and linking services
 
Evaluating and Selecting Library Services PlatformNEW
Evaluating and Selecting Library Services PlatformNEWEvaluating and Selecting Library Services PlatformNEW
Evaluating and Selecting Library Services PlatformNEW
 
Big and Small Web Data



Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations

  • 1. Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations ENUG Conference Cheng Library, William Paterson University, Wayne, New Jersey, Thursday, October 21, 2010 Ray Schwartz, Systems Specialist Librarian Cheng Library, William Paterson University, Wayne, New Jersey, USA schwartzr2 @ wpunj.edu
  • 2. Outline • What is Data Mining and Data Warehousing and Why Do We Do It? • Our Library and University • Patron Statistical Categories • Application Server • Reporting 2
  • 3. Collecting Transactional Data • ILSs collect transactional data for circulation and allocation of collection funds. • ILL and Document Delivery services supply general transactional data. • Reports from vendor services – Bibliographic utilities – Subscription agents – Book jobbers 3
  • 4. Collecting Transactional Data cont. • Most ILSs have search and web server logs • Most (if not all) Databases have usage reports • Link Resolver logs • Proxy Server logs • Many other ways of collecting transactional data. – Gate counts – Reference transaction counts – Reshelving counts 4
  • 5. What would we like to see? • Breakdowns by department and majors. • Combined usage by department/majors of more than one library service. 5
  • 6. What is Data Mining and Data Warehousing • Extracting data from legacy systems and other resources; • cleaning, scrubbing and preparing data for decision support; • maintaining data in appropriate data stores; • accessing and analysing data using a variety of end user tools; • and mining data for significant relationships. • Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall. 6
  • 7. • The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining. • Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall. 7
  • 8. Our University • 9000 undergraduates • 1000 graduates (mostly education majors) • 400 faculty • 800 adjuncts • 1000 staff 8
  • 9. Our Library • 19 librarians and 26 library staff • 350,000 volumes • 18,000 audiovisual items • 47,000 print and electronic periodicals • 124 general and subject specific databases • $1,100,000 Non-Salary Allocations 9
  • 10. Our Transactions • 600,000 Database Searches • 413,000 Gate Counts • 40,000 Library Materials Circulation • 34,000 Equipment Circulation • 19,000 Reference Queries • 3,000 Interlibrary Loans • 5,000 Documents Delivered 10
  • 11. Our Systems • Voyager ILS • Clio ILL Software • EZProxy Server • Banner – University ERP • University Networked Drive K: • University Email Server • University Web Server 11
  • 12. Vendor Services • Serials Solutions • A to Z list • MARC Record Service • Link Resolver • OCLC – Bibliographic Utility • Worldcat Collection Analysis • Coutts (was Blackwell) – Book Jobber • Ebsco – Subscription Agent • Marcive – Authority Control • Database Vendors 12
  • 13. Email Reports from the ILS 13
  • 14. Voyager Overdue and Fine Notices - Daily 14
  • 15. Quarterly Extract for Serials Solutions AtoZ Service 15
  • 16. Which categories of patrons are accessing which services? 16
  • 17. First Step – Patron Statistical Categories 17
  • 18. • Voyager Patron Database allows a maximum of 10 statistical categories per patron record. • Decide which statistical categories are needed for each patron group defined. • Work with your University Information Systems Department to extract the relevant data from the relevant sources. 18
  • 19. Groups and Services
      Groups
      • Major
      • Status
        – Undergrad or Grad
        – Faculty, Adjunct Faculty or Staff
      • Department
      • College
      • Degree
      • No. of Credits
      • Year of Study
      • Campus Location
      Services
      • Circulation
        – Books
        – Media
        – Reserve
        – By Fund Code
        – Location
      • ILL / Document Delivery
      • Databases
      • Library Web Pages
        – Subject Area Resource Guides
        – Reference Requests
      • Catalog
      • Other Vendor Services
        – Serials Solutions
  • 20. History Department - 12 months - Feb. 2008
      PATRON STATUS           BOOK CIRC  MEDIA CIRC  EQUIP CIRC  TOTAL CIRC  MEMBERS  BORROWERS  % BORROWING  CIRC/MEMBER  CIRC/BORROWER
      UNDERGRADUATE STUDENTS      2,715         250         698       3,663      238        186          78%        15.39          19.69
      GRADUATE STUDENTS             419          13          76         508       14         13          93%        36.29          39.08
      ADJUNCT FACULTY               100          65          20         185       32         20          63%         5.78           9.25
      FULL-TIME FACULTY             159         115         194         468       24         23          96%        19.50          20.35
      HISTORY TOTALS              3,393         443         988       4,824      308        242          79%        15.66          19.93
      LIBRARY TOTALS             23,370       8,713      20,703      52,756    7,418      4,981          67%         7.11          10.59
      DEFINITIONS:
      BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
      MEDIA CIRCULATION = audio & video materials, including media reserves
      EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
      MEMBER = declared major or department member
      BORROWER = any member who borrowed materials
      Library Total = declared undergrad & grad majors, adjuncts & full time faculty borrowers
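The derived columns in the History Department figures above appear to be simple ratios of the counts. A minimal Python sketch, assuming % BORROWING = BORROWERS / MEMBERS, CIRC/MEMBER = TOTAL CIRC / MEMBERS and CIRC/BORROWER = TOTAL CIRC / BORROWERS (the function name is hypothetical, not taken from the slides):

```python
def circulation_ratios(total_circ, members, borrowers):
    """Return (% borrowing, circ per member, circ per borrower).

    Assumes the three derived columns are plain ratios of the counts.
    """
    pct_borrowing = round(100 * borrowers / members)
    circ_per_member = round(total_circ / members, 2)
    circ_per_borrower = round(total_circ / borrowers, 2)
    return pct_borrowing, circ_per_member, circ_per_borrower


# Undergraduate row: TOTAL CIRC 3,663, MEMBERS 238, BORROWERS 186
print(circulation_ratios(3663, 238, 186))
```

Under these assumptions the undergraduate and graduate rows reproduce the figures shown on the slide, which suggests the definitions are at least close to what was used.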
  • 21. Communications Majors FY08/09
      Columns: Statistical Categories // Item Type / Location / Call No Type / Call No; Communications Majors; Freshman; Sophomore; Junior; Senior
      M- DVD / Media Services / Other / DVD 194 17 31 52 94
      M- VideoCass / Media Services / Other / VC 228 11 40 67 110
      T- Book / 2nd Floor - Circulating / Library of Congress / B 34 9 8 11 6
      T- Book / 2nd Floor - Circulating / Library of Congress / BD 3 1 2
      T- Book / 2nd Floor - Circulating / Library of Congress / BF 30 5 5 12 8
      ...
      2nd Floor Circulating 1531 222 310 403 596
      T- Juvenile / CMC / 125 14 26 20 35
      T- NJDoc / Askew Documents Room / Other / 1 1
      New Jersey History 10 0 2 7 1
      T- ReserveBk / Reserves Desk / 189 13 46 68 62
      T- SpecColl / Special Collection / Library of Congress / LC 3 3
      T- Book-McNaughton / Leisure Lounge / Library of Congress / F 2 1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / HF 1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / HS 2 2
      T- Book-McNaughton / Leisure Lounge / Library of Congress / HV 5 1 2 2
      T- Book-McNaughton / Leisure Lounge / Library of Congress / ML 1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / PN 3 3
      T- Book-McNaughton / Leisure Lounge / Library of Congress / PS 29 4 10 15
      T- Book-McNaughton / Leisure Lounge / Library of Congress / RC 2 1 1
      T- Book-McNaughton / Leisure Lounge / Library of Congress / TL 1 1
      Leisure Lounge 49 9 1 19 20
  • 22. Challenges with combining data from various services • Little to no linkage of data • Multiple user IDs for authentication 22
  • 23. Second Step – Set Up an Application Server
  • 24. What is an Application Server? • A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database. From http://www.webtools.ca.gov/help/Glossary.asp • Web Server Software (Apache or IIS) • Database Management System – DBMS (MySQL, Oracle, MS SQL Server) • Scripting Language (Perl, PHP, ColdFusion, ASP) 24
  • 25. Why an Application Server? • Relevant data in logfiles needs to be in a database to be analyzed. • You need your own DBMS to create new tables and queries.
  • 26. • Decide how you will use the Application Server. • Decide on the best and most plausible configuration. 26
  • 27. Authentication of ILL and other forms are routed through the EZProxy server 27
  • 28. Daily and Weekly Email Reports from the Application Server Circ Fines Audit Daily Report - Daily at 6:05 AM. Dupe Patron Record Report - Daily at 5:56 AM. Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM. Media Service Scheduling Rooms Report - Daily at 6:02 AM. Media Services Equipment Pickup Summary - Daily at 7:00 AM. Received Title Alert - Daily at 6:59 AM. Reserves Overdues - Daily at 5:59 AM. Scheduled LIS Tasks - Daily at 6:00 AM. ILL Borrowing Overdues Report - Weekly at 5:59 AM. ILL Lending Reports - Weekly at 6:15 AM. 28
  • 29. Monthly Email Reports from the Application Server Circ Fines Audit - Monthly at 6:10 AM. Circulation by Location and Item Type - Monthly at 6:21 AM. Circulation Lost and Paid - Monthly at 6:25 AM. Circulation Online Renewal Count - Monthly at 6:30 AM. Media Circulation - Monthly at 6:35 AM. Reserve Circulation - Monthly at 6:40 AM. 29
  • 32. Lending Services Reports Lists of patrons with fines between $10 and $19.99 • Student and Alumni fines list - Sorted by either Name, Amount or Notice Date. • PALS and Courtesy Patron fines list - Sorted by Name. • All other Patron fines list - Sorted by Name. Lists of patrons with fines over $19.99 • Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or Notes. • PALS and Courtesy Patron fines list - Sorted by Name. • VALE Patron fines list - Sorted by Name. • All other Patron fines list - Sorted by Name. Lists of patrons with overdues older than 30 days • Student and Alumni overdues list - Sorted by either Name, IID or Notes. • PALS and Courtesy Patron overdues list - Sorted by Name. • All other Patron overdues list except VALE - Sorted by Name. 32
  • 33. Lending Services Reports, cont. Lists of VALE patrons with overdues older than 6 months • VALE patron overdues list - Sorted by Name. Miscellaneous Reports • Patrons with the word "Collection Agency" or "CA" in their notes. • Patrons with the word "FINE" in one of their notes. • Patrons with the word "SOILS" in their notes. • Patrons with the word "FALL07 SOILS" in their notes. • Patrons with the word "HOLD" in their notes. • Combined list of HOLD, FINE, and CA. Circulation Reports by Item Type from 2003 to the present • All Staff. • All Colleges • Undergraduates by Major. • Graduates by Major • Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009 and 30-Nov-2009 33
  • 34. One of Our Projects • Mining EZProxy logfiles and linking to patron statistical categories from the Voyager Patron Database – What majors and departments are accessing which database services? – What majors and departments are accessing the ILL services? 34
  • 35. ILL request form authentications by major
      Article
      Count  Major
         62  M- Psychology
         60  M- Sociology
         42  M- Applied Clinical Psych
         35  M- Education
         31  M- History
         30  M- Spanish
         29  M- Nursing
         19  M- Communication Disorders
         19  M- Communication
         14  M- Biotechnology
         14  M- Counseling
         14  M- English
         12  M- Non-Degree
         10  M- Community/Sch Health
          7  M- Biology
          7  M- Political Science
          6  M- Undecided
          5  M- Comm Media Studies
          5  M- Reading
          4  M- Business
      Book
      Count  Major
         90  M- History
         28  M- Non-Degree
         25  M- Pub Pol & Intl Affairs
         20  M- Spanish
         18  M- English
         16  M- Undecided
         14  M- Art
         14  M- Education
         11  M- Sociology
         10  M- Biology
          9  M- Music
          9  M- Special Programs
          8  M- Psychology
          7  M- Biotechnology
          7  M- Political Science
          6  M- Anthropology
          6  M- Music - Jazz Studies
          4  M- Business
          4  M- Communication
          4  M- Nursing
  • 36. Which Databases are accessed by Majors and Departments? 36
  • 37. By Major and Host
      Major                      Count  Host
      M- Nursing                  3377  ebscohost.com
      M- Non-Degree               3010  ebscohost.com
      M- Psychology               2303  ebscohost.com
      M- Counseling               1487  ebscohost.com
      M- Communication            1359  ebscohost.com
      M- Education                1267  ebscohost.com
      M- Business                 1246  proquest.umi.com
      M- Sociology                1152  ebscohost.com
      M- Business                 1145  lexis-nexis.com
      M- Undecided                1100  ebscohost.com
      M- Applied Clinical Psych   1075  ebscohost.com
      M- English                  1034  ebscohost.com
      M- Sociology                 916  csa.com
      M- Business                  794  ebscohost.com
      M- Accounting                738  lexis-nexis.com
      M- Reading                   683  ebscohost.com
      M- Physical Education        653  ebscohost.com
      M- Special Programs          600  ebscohost.com
      M- Non-Degree                463  ereserve.wpunj.edu
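A count like the one above could be produced by tallying log rows once each row carries a patron statistical category and a target URL. The Python sketch below is only an illustration: the row layout, the function names, and the rule for shortening a host (keep the last two labels) are assumptions, and that rule would not reproduce every host label on the slide (for example proquest.umi.com). The sample rows are made up; only the URLs come from the deck.

```python
from collections import Counter
from urllib.parse import urlparse


def short_host(url):
    """Reduce a URL to its last two host labels (an assumed rule)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def hits_by_category_and_host(rows):
    """rows: iterable of (stat_category, url) pairs.

    Returns a Counter keyed by (stat_category, shortened host).
    """
    return Counter((category, short_host(url)) for category, url in rows)


# Made-up sample rows, for illustration only.
sample = [
    ("M- Nursing", "http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=c8h"),
    ("M- Nursing", "http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=c8h"),
    ("M- Sociology", "http://www.csa.com/htbin/dbrng.cgi?&db=socioabs-set-c&adv=1"),
]
for (category, host), count in hits_by_category_and_host(sample).most_common():
    print(category, count, host)
```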
  • 38. By Dept and Host
      Department              Count  Host
      S- Information Systems    933  webscript.exe?fs.scr
      S- Psychology Dept.       742  ebscohost.com
      S- Accounting and Law     559  lexis-nexis.com
      S- Political Sci Dept.    308  lexis-nexis.com
      S- Nursing Dept.          204  ebscohost.com
      S- Market & Mgt. Dept.    175  proquest.umi.com
      S- Library                167  ebscohost.com
      S- Sociology Dept.        151  ebscohost.com
      S- Sociology Dept.        134  csa.com
      S- History Dept.          121  serials.abc-clio.com
      S- Exercise & Mov Sci     110  ebscohost.com
      S- Political Sci Dept.    104  ebscohost.com
      S- Library                103  ILL_article.cfm
      S- Library                100  webscript.exe?fs.scr
      S- History Dept.           94  webscript.exe?fs.scr
  • 39. By Dept and Service
      Department              Count  Service
      S- Information Systems    933  http://www.wpunj.edu/scripts/webscript.exe?fs.scr
      S- Accounting and Law     549  http://www.lexis-nexis.com/universe
      S- Psychology Dept.       364  http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=psych
      S- Nursing Dept.          114  http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=c8h
      S- Sociology Dept.         96  http://www.csa.com/htbin/dbrng.cgi?&db=socioabs-set-c&adv=1
      S- Sociology Dept.         75  http://search.ebscohost.com/login.asp?profile=asp
      S- Philosophy Dept.        74  http://webspirs4.silverplatter.com:8900/c119646?sp.form.first.p=srchmain.htm&sp.dbid.p=S(PHIL
      S- Library                 65  http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=asp
      S- Anthropology Dept.      62  http://www.sciencedirect.com/
      S- History Dept.           61  http://serials.abc-clio.com/active/start?_appname=serials&initialdb=AHL
      S- Psychology Dept.        61  http://search.ebscohost.com/login.asp?profile=psyart
      S- History Dept.           58  http://serials.abc-clio.com/active/start?_appname=serials&initialdb=HA
      S- Psychology Dept.        54  http://search.ebscohost.com/login.asp?profile=psych
      S- Psychology Dept.        42  http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=psyart
      S- English Dept.           42  http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=mzh
  • 40. IP Address Location = 149.151.VlanID.*
      Admin VLANs
      Vlan ID  Vlan Name
            2  Servers
            4  Admin
            5  Science
            6  Test Servers
            7  NAS
          101  Energy Management
          102  Diebold
          104  Xerox
          150  Media Services
          161  Dorms Offices
          162  RBI
          163  Police
          164  Maintenance
      Labs VLANs
      Vlan ID  Vlan Name
            3  Lab Servers
            9  Imaging
          160  Lib Labs
          174  STU VPN
          175  Ben Shahn Lab
          178  Hobart Lab
          179  SCI Lab
          187  CS Lab
          192  Atrium
          209  Labs
          212  Resnet Labs
          214  Raub Labs
          228  VR Labs
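Because the third octet of an on-campus address is the VLAN ID, an address in a log can be mapped to a campus location with a simple lookup. A minimal Python sketch, assuming dotted IPv4 strings and using only a few of the VLANs listed above (the dictionary and function names are hypothetical):

```python
# A few VLAN IDs from the slide; the remaining entries would be added the same way.
ADMIN_VLANS = {2: "Servers", 4: "Admin", 150: "Media Services", 163: "Police"}
LAB_VLANS = {3: "Lab Servers", 160: "Lib Labs", 175: "Ben Shahn Lab", 187: "CS Lab"}


def vlan_for_ip(ip):
    """Return (group, VLAN name) for a 149.151.VlanID.* address.

    Returns None for off-campus or malformed addresses, and
    ("Unknown", None) for campus VLAN IDs missing from the lookup tables.
    """
    octets = ip.split(".")
    if len(octets) != 4 or octets[:2] != ["149", "151"]:
        return None
    if not octets[2].isdigit():
        return None
    vlan_id = int(octets[2])
    if vlan_id in ADMIN_VLANS:
        return ("Admin", ADMIN_VLANS[vlan_id])
    if vlan_id in LAB_VLANS:
        return ("Labs", LAB_VLANS[vlan_id])
    return ("Unknown", None)


print(vlan_for_ip("149.151.160.25"))
```

Log entries that record a hostname instead of a numeric address would need a separate rule; this sketch simply returns None for them.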
  • 41. FY08/09 On Campus Hits to Databases by Class C IP Address 41
  • 42. Patron Privacy and Standards 42
  • 43. Using Voyager as the model for Patron Privacy 43
  • 44. • Active Circ transactions are stored in a table with patron ID and statistical categories. • Completed Circ transactions are stored in a table without the patron ID, but still with the patron statistical categories. • The Patron Table contains the total counts of transactions for each patron, but no link to which transactions they are. 44
  • 45. • EZProxy transactions would be stored in one table with patron statistical categories, but without the user ID. • User IDs would be stored in another table with counts for each service, divided by academic year. • Logs are collected monthly, then loaded and deleted monthly.
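The two-table arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the MySQL database, and the table names, column names, and `record_hit` helper are hypothetical rather than the presenter's schema.

```python
import sqlite3


def open_store():
    """Create the two tables in an in-memory database (illustration only)."""
    db = sqlite3.connect(":memory:")
    # Transaction rows keep the statistical category but never the user ID.
    db.execute(
        "CREATE TABLE transactions (stat_category TEXT, service TEXT, hit_date TEXT)"
    )
    # Per-user rows keep only running counts, not the individual transactions.
    db.execute(
        "CREATE TABLE user_counts ("
        " user_id TEXT, academic_year TEXT, service TEXT, hits INTEGER,"
        " PRIMARY KEY (user_id, academic_year, service))"
    )
    return db


def record_hit(db, user_id, stat_category, service, hit_date, academic_year):
    """Store one proxy hit without linking the transaction to the user."""
    db.execute(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        (stat_category, service, hit_date),
    )
    key = (user_id, academic_year, service)
    cur = db.execute(
        "UPDATE user_counts SET hits = hits + 1"
        " WHERE user_id = ? AND academic_year = ? AND service = ?",
        key,
    )
    if cur.rowcount == 0:
        db.execute("INSERT INTO user_counts VALUES (?, ?, ?, 1)", key)
    db.commit()


db = open_store()
record_hit(db, "theuser", "M- History", "ebscohost.com", "2008-01-01", "FY07/08")
print(db.execute("SELECT * FROM transactions").fetchall())
print(db.execute("SELECT * FROM user_counts").fetchall())
```

The point of the split is that a report on the transactions table can break usage down by major or department, while nothing in that table identifies which patron made a given request.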
  • 46. Example of EZProxy log entry
      • IP address: nj.dhcp.embarqhsd.net
      • (Not used): -
      • User id: theuser
      • Date/time: 1/1/2008 4:25:15 AM
      • Method: GET
      • Page retrieved: http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
      • Version: HTTP/1.1
      • Response code: 302
      • No. of bytes: 537
      • Referring URL: http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
      • User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
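To show how the fields above might be pulled out of a raw log line, here is a minimal Python sketch. It assumes the log is written in the common NCSA combined layout (host, unused field, user, bracketed timestamp, quoted request, status, bytes, quoted referrer, quoted user agent); the actual log format configured on the server is not shown in the slides, and the sample line, including its timezone offset, is reconstructed from the field values rather than copied from a real log.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [timestamp] "method url version" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>\S+)" '
    r'(?P<status>\d{3}) (?P<size>-|\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["when"] = datetime.strptime(record["when"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return record


# Reconstructed sample; the -0500 offset is an assumption.
sample = (
    'nj.dhcp.embarqhsd.net - theuser [01/Jan/2008:04:25:15 -0500] '
    '"GET http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa'
    '&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr HTTP/1.1" '
    '302 537 '
    '"http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)
record = parse_line(sample)
print(record["user"], record["status"], record["size"])
```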
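  • 47. Perl Script for loading ezproxy log into MySQL
      use strict;
      # Map month abbreviations in the log timestamp to two-digit numbers.
      my %month=(Jan=>'01',Feb=>'02',Mar=>'03',Apr=>'04',May=>'05',Jun=>'06',Jul=>'07',
                 Aug=>'08',Sep=>'09',Oct=>'10',Nov=>'11',Dec=>'12');
      while (<>){
        # Leading fields, [dd/Mon/yyyy:hh:mm:ss zone], "method target version",
        # response code, bytes (or -), "referrer", "user agent".
        my $pattern = '^(\S*) (\S*) (\S*) (\S*) '.
          '\[(..)/(...)/(....):(..):(..):(..) .....\]'.
          ' "(\S*) (\S*) (\S*)" '.
          '(\d*) (-|\d*) "([^"]*)" "([^"]*)"';
        if (m/$pattern/){
          my ($tgt,$ref,$agt) = (esc($12),esc($16),esc($17));
          # A byte count of "-" means no value was logged; store NULL.
          my $byt = $15 eq '-'?'NULL':$15;
          print "INSERT INTO ezproxylogs VALUES ('$1','$2','$3',".
            " TIMESTAMP '$7/$month{$6}/$5 $8:$9:$10','$11','$tgt',".
            "'$13',$14,$byt,'$ref','$agt');\n";
        }else{
          # Emit an SQL comment so skipped lines are visible in the output.
          print "-- Skipped line $.\n";
        }
      }
      # Double single quotes so the value is safe inside an SQL string literal.
      sub esc{
        my ($p) = @_;
        $p =~ s/'/''/g;
        return $p;
      }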
  • 48. Created table to assist the linking
      SELECT PATRON_ADDRESS.ADDRESS_TYPE,
             Left([ADDRESS_LINE1],InStr([ADDRESS_LINE1],"@")-1) AS usr,
             PATRON_ADDRESS.PATRON_ID,
             PATRON_ADDRESS.ADDRESS_STATUS,
             PATRON_ADDRESS.EFFECT_DATE,
             PATRON_ADDRESS.EXPIRE_DATE,
             PATRON_ADDRESS.MODIFY_DATE,
             PATRON_ADDRESS.MODIFY_OPERATOR_ID
      INTO emailprefix
      FROM PATRON_ADDRESS
      WHERE (((PATRON_ADDRESS.ADDRESS_TYPE)="3"));
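The query above takes the part of ADDRESS_LINE1 before the "@" as a user name, which can then be matched against the user id in the proxy log. A minimal Python sketch of the same idea, assuming patron address rows arrive as (patron_id, address_type, address_line1) tuples; the function names, the lower-casing, and the sample rows are assumptions for illustration:

```python
def email_prefix(address):
    """Return the text before '@', or None when there is no usable prefix."""
    at = address.find("@")
    return address[:at] if at > 0 else None


def build_prefix_index(patron_addresses):
    """Map email prefix -> patron ID, keeping only address type "3" rows.

    patron_addresses: iterable of (patron_id, address_type, address_line1).
    """
    index = {}
    for patron_id, address_type, line1 in patron_addresses:
        if address_type != "3":  # same filter as the WHERE clause in the query
            continue
        usr = email_prefix(line1)
        if usr:
            index[usr.lower()] = patron_id  # lower-casing is an assumption
    return index


# Made-up rows, for illustration only.
rows = [
    (101, "3", "theuser@student.example.edu"),
    (101, "1", "12 Main Street"),
    (102, "3", "no-at-sign-here"),
]
index = build_prefix_index(rows)
print(index.get("theuser"))
```

Unlike the Left/InStr expression, this version skips rows that have no "@" rather than failing on them.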
  • 49. Reporting and Standards • Reporting – Emailed periodically - e.g., daily dossiers, and other event triggered reports. – On demand, via email, web pages or a printer. • Standards – Share data for comparative research. – Groups of libraries and consortia 49
  • 53. Questions? Ray Schwartz, Systems Specialist Librarian Cheng Library, William Paterson University, Wayne, New Jersey, USA schwartzr2 @ wpunj.edu 53