3. Introduction
My background
Requirements (Python, Django, Matplotlib, Ajax) and other third-party libraries.
What this talk is about (we will be restricted to Python, Matplotlib and Django).
What this talk is not about (we are not trying to re-implement Google Analytics).
Source code is available at https://github.com/kenluck2001/PyCon2012_Talk
4. MOTIVATION
There is a need to represent business analytics data in graphical form,
because a picture is worth a thousand words.
Source: en.wikipedia.org
5. Where do we find data?
Source: en.wikipedia.org
7. Steps for data gathering
Identify the data source.
Preprocess the data (remove nulls and wide characters), e.g. with Google Refine.
Process the data (perform some statistical analysis).
Present the clean data in a descriptive format, i.e. data visualization.
See Appendix 1
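The four steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the code from the talk: the inline CSV sample, the column names `month` and `radiation`, and the helper names `load_rows`, `clean` and `summarize` are all assumptions made for the example.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real source would be a file or a database.
SAMPLE_CSV = """month,radiation
1,10.5
2,NA
3,14.0
4,
5,20.5
"""

def load_rows(text):
    """Step 1: read the identified data source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows, column):
    """Step 2: drop nulls ('NA' or empty) and convert the rest to float."""
    values = []
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw and raw != "NA":
            values.append(float(raw))
    return values

def summarize(values):
    """Step 3: compute simple descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }

rows = load_rows(SAMPLE_CSV)
radiation = clean(rows, "radiation")
summary = summarize(radiation)

# Step 4: present the result (plain text here; a chart in practice).
print(summary)
```

In a real pipeline the last step would hand the cleaned values to a plotting library instead of printing them.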
8. Visual Representation of data
Chart / diagram format
Text format
Tables
Log files
Source: devk2.wordpress.com Source: elementsdatabase.com
9. Categorization of data
Real-time (generating charts in real time. This can also include a
mechanism for refreshing the site to get the latest chart).
See Appendix 2
Batch-based (creating charts from a CSV file. Example in my blog).
See Appendix 2
10. Rules of Data Collection
Keep data in the most easily processable form, e.g. a database or CSV.
Store a timestamp with the collected data (the time the data was
collected or processed) so that it can be filtered.
Gather data that is relevant to the business needs.
When the data grows too large, prune stale or old data that is no
longer needed.
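The timestamp and pruning rules can be illustrated with the standard-library `sqlite3` module. This is a minimal sketch under stated assumptions: the table `page_views`, its columns, and the helpers `create_store`, `record` and `prune` are hypothetical, and the 120-day default simply mirrors the cutoff used in Appendix 4.

```python
import sqlite3
from datetime import datetime, timedelta

def create_store():
    """Create an in-memory table whose rows always carry a timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views ("
        "url TEXT NOT NULL, collected_at TEXT NOT NULL)"
    )
    return conn

def record(conn, url, when):
    """Store one observation together with the time it was collected."""
    # ISO 8601 strings of the same shape sort chronologically,
    # so they can be compared as plain text.
    conn.execute(
        "INSERT INTO page_views VALUES (?, ?)", (url, when.isoformat())
    )

def prune(conn, now, max_age_days=120):
    """Delete rows older than max_age_days; return how many were removed."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM page_views WHERE collected_at <= ?", (cutoff,)
    )
    return cur.rowcount

now = datetime(2012, 6, 1, 12, 0, 0)
conn = create_store()
record(conn, "/home", now - timedelta(days=200))  # stale row
record(conn, "/bids", now - timedelta(days=30))
record(conn, "/home", now)

deleted = prune(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM page_views").fetchone()[0]
print(deleted, remaining)
```

Keeping the timestamp in every row is what makes both filtering by period and pruning by age a single query.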
11. Where is the data visualization done?
Server
See Appendices 2 to 6
Client
Examples of JavaScript libraries
D3.js ( http://d3js.org/ )
gRaphael.js ( http://g.raphaeljs.com/ )
12. Factors to Consider for Choice of Visualization
Where do we perform the visualization processing?
Is it on the server or the client?
It depends
Security
Scalability
13. Tools needed for data analysis
csvkit ( http://csvkit.readthedocs.org/en/latest/ )
networkx (graphs) ( http://networkx.lanl.gov/ )
pySAL (spatial analysis) ( http://code.google.com/p/pysal/ )
15. Appendix 1
## This describes a scatter plot of solar radiation against the month.
It aims to illustrate the steps of data gathering. The CSV file comes from
the Data Science Hackathon website. The source code is available in a
folder named “plotCode”.

    import csv
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    from matplotlib.figure import Figure

    def prepareList(month_most_common_list):
        '''Prepare the input for processing by removing all unnecessary
        values: entries equal to "NA" are dropped.'''
        output_list = []
        for x in month_most_common_list:
            if x != 'NA':
                output_list.append(x)
        return output_list
16. Appendix 1 contd.
    def plotSolarRadiationAgainstMonth(filename):
        # 'rb' is the Python 2 idiom; on Python 3 use open(filename, newline='')
        trainRowReader = csv.reader(open(filename, 'rb'), delimiter=',')
        month_most_common_list = []
        Solar_radiation_64_list = []
        for row in trainRowReader:
            month_most_common = row[3]
            Solar_radiation_64 = row[6]
            month_most_common_list.append(month_most_common)
            Solar_radiation_64_list.append(Solar_radiation_64)
        # Convert all elements in the list to float, skipping the first
        # element because it is a description of the field.
        month_most_common_list = [float(i) for i in
                                  prepareList(month_most_common_list)[1:]]
        Solar_radiation_64_list = [float(i) for i in
                                   prepareList(Solar_radiation_64_list)[1:]]
        fig = Figure()
        ax = fig.add_subplot(111)
        title = 'Scatter Diagram of solar radiation against month of the year'
        ax.set_xlabel('Most common month')
        ax.set_ylabel('Solar Radiation')
        fig.suptitle(title, fontsize=14)
        try:
19. Appendix 3
    from django.contrib.admin.views.decorators import staff_member_required
    from django.http import HttpResponse
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    from matplotlib.figure import Figure
    from YAAS.stats.models import RegisteredUser, OnlineUser, StatBid

    # Scatter diagram of number of bids made against number of online users
    # (weekly report)
    @staff_member_required
    def weeklyScatterOnlinUsrBid(request, week_no):
        page_title = 'Weekly Scatter Diagram based on Online user versus Bid'
        weekno = week_no
        fig = Figure()
        ax = fig.add_subplot(111)
        # stat is defined elsewhere in the YAAS project (not shown here)
        year = stat.getYear()
        onlUserObj = OnlineUser.objects.filter(week=weekno).filter(year=year)
        bidObj = StatBid.objects.filter(week=weekno).filter(year=year)
        onlUserlist = list(onlUserObj.values_list('no_of_online_user', flat=True))
        bidlist = list(bidObj.values_list('no_of_bids', flat=True))
        title = 'Scatter Diagram of number of online User against number of bids (week {0}){1}'.format(weekno, year)
        ax.set_xlabel('Number of online Users')
        ax.set_ylabel('Number of Bids')
        fig.suptitle(title, fontsize=14)
        try:
            ax.scatter(onlUserlist, bidlist)
        except ValueError:
            # scatter raises ValueError when the two lists differ in length
            pass
20. Appendix 4
# Example of how old database records may be deleted to recover some space.
From the folder named “YAAS”. Check task.py

    import datetime

    # Runs at 01:30 on day 0 of the week
    @periodic_task(run_every=crontab(hour=1, minute=30, day_of_week=0))
    def deleteOldItemsandBids():
        # Cutoff: anything that ended 120 or more days ago is deleted
        hunderedandtwentydays = (datetime.datetime.today()
                                 - datetime.timedelta(days=120))
        myItem = Item.objects.filter(end_date__lte=hunderedandtwentydays).delete()
        myBid = Bid.objects.filter(end_date__lte=hunderedandtwentydays).delete()

    # populate the registereduser and onlineuser model at regular intervals