This document discusses how startups and developers can leverage big data analytics. It defines big data in terms of volume, variety, velocity and veracity of data. It recommends that startups focus on collecting and analyzing their own data sources like web server logs, web analytics, and user comments. The document presents ELK (Elasticsearch, Logstash, Kibana) as a tool to index, search, and visualize this small startup data. It also discusses storing data in DKAN and using GeoServer to incorporate geospatial data. Overall, the document aims to demonstrate how startups can apply big data principles to analyze the small data sources available to them.
2. Why Big Data?
Big Data is not only for big players.
Big Data is also for us: startups and developers.
Data is raw gold. Information about us is the end
product.
Data defines us: web server logs, web page analytics and
comments about our products.
3. What Is Big Data?
Big data is a term for data sets that are so large or
complex that traditional data processing applications are
inadequate to deal with them. Challenges include
analysis, capture, data curation, search, sharing, storage,
transfer, visualization, querying, updating and
information privacy. (Wikipedia)
Let's redefine big data for us.
4. What Is Big Data?
Volume . Variety . Velocity . Veracity
● Volume: very big data
● Variety: multiple sources
● Velocity: streaming data
● Veracity: accuracy of the data
6. Redefine Big Data For Startups
Four important terms:
● Data Sets
● Data Processing
● Analytics
● Visualization
Big Data is big. We need to focus.
7. What Should We Call Our Big Data?
● Small Data
● Startup Data
● No Data
We need to visualize our data from day 0.
It's a must.
8. Why Big Data?
Big data analytics examines large amounts of data to
uncover hidden patterns, correlations and other insights.
(SAS)
We need to find our own insights and visualize our future.
9. Data Sets
We have no data, or not enough of it. If we want data, we
have to go and look for it.
We can use our own data, or
we have a place to start: www.data.gov.my
10. Data Set : Our Own Data?
● Web server logs
○ IP addresses of the visitors, mapped to countries (IP2Country).
● Web access analysis
○ Most visited pages
● Comments from our users.
○ Good, bad, like, dislike.
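As a rough illustration of the "most visited pages" idea, here is a minimal Python sketch that counts requested paths in a web server log. It assumes the log uses the Common or Combined Log Format; the sample lines and the names `most_visited` and `SAMPLE_LOG` are made up for this example.

```python
import re
from collections import Counter

# Matches the request part of a log line, e.g. "GET /pricing HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def most_visited(log_lines, top_n=3):
    """Count requested paths and return the top_n as (path, hits) pairs."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common(top_n)


# Hypothetical sample lines, invented for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0800] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2016:13:56:01 +0800] "GET / HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2016:13:57:12 +0800] "GET /pricing HTTP/1.1" 200 2326',
]

print(most_visited(SAMPLE_LOG))
```

Lines that do not match the pattern are skipped rather than treated as errors, which is usually what we want for a first look at the data.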
12. Issues With The Data?
Lack of usable information.
We need to collect data on our own.
This is a business opportunity for startups.
14. Good Bad Like Dislike
What we want to learn from big data, and from any data we
analyze, is this:
GOOD BAD LIKE DISLIKE
Sentiment analysis
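A very small sketch of the good/bad idea in Python, assuming a simple word-list approach. The word lists and the names `classify` and `tally` are hypothetical; real sentiment analysis needs a far larger lexicon or a trained model, and this version does not understand negation ("not good" still counts as positive).

```python
import re
from collections import Counter

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "love", "like", "fast"}
NEGATIVE = {"bad", "slow", "hate", "dislike", "broken"}


def classify(comment):
    """Label a comment by counting positive and negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(comments):
    """Summarize a batch of comments as label counts."""
    return Counter(classify(c) for c in comments)


print(tally(["I love it, great app", "Slow and broken", "It is an app"]))
```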
15. When Who Where What Why How
When - @timestamp is important for data analysis.
Who - Anonymity is important, but we need to know the visitor's gender
and age.
Where - Anonymity is important, but we still need the IP address to know
which country, state or county the visitor is from.
What - The operating system and the browser version.
Why - The keywords that led them to us.
How - How they found out about us.
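Several of these questions can be answered from a single web server log line. The Python sketch below assumes the Combined Log Format and pulls out When (`@timestamp`), Where (client IP), What (user agent), How (referrer) and Why (search keywords, assuming the referrer still carries a `q` parameter, which many search engines no longer send). The field names, `parse_line` and the sample line are illustrative assumptions. Who (gender, age) is not in the log and has to come from somewhere else.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlparse

# Combined Log Format: ip, identity, user, [time], "request", status, size,
# "referrer", "user agent".
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Turn one log line into an event dict, or None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    query = parse_qs(urlparse(m.group("referrer")).query)
    return {
        "@timestamp": when.isoformat(),          # When
        "client_ip": m.group("ip"),              # Where (needs an IP-to-country lookup)
        "user_agent": m.group("agent"),          # What
        "referrer": m.group("referrer"),         # How
        "keywords": query.get("q", [""])[0],     # Why (often empty in practice)
    }


# Hypothetical sample line, invented for illustration only.
SAMPLE_LINE = (
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0800] "GET /pricing HTTP/1.1" 200 2326 '
    '"https://www.google.com/search?q=startup+analytics" "Mozilla/5.0 (X11; Linux x86_64)"'
)

print(parse_line(SAMPLE_LINE))
```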
16. How To Visualize Our Data
I'm a fan of ELK:
Elasticsearch, Logstash & Kibana.
ELK is one of the Big Data tools.
17. Index The Data With ES
Use Elasticsearch to index our data.
One misconception: ES is not for storage.
Don't use ES as the only place to keep our data.
The data needs to be archived elsewhere.
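To make the indexing step concrete, here is a minimal Python sketch that prepares a request body for the Elasticsearch `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one document line. The index name `weblogs` and the sample documents are assumptions. Older versions such as 5.x also expect a document type, which can be given in the request path. The sketch only builds the payload and sends nothing.

```python
import json


def to_bulk_payload(index_name, docs):
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents, invented for illustration only.
docs = [
    {"@timestamp": "2016-10-10T13:55:36+08:00", "path": "/pricing"},
    {"@timestamp": "2016-10-10T13:56:01+08:00", "path": "/"},
]

print(to_bulk_payload("weblogs", docs))
```

The original log files stay in the archive, so the index can always be rebuilt from them.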
18. ES Search API
The results come back as JSON. Developers love JSON. (Maybe.)
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/_exploring_your_data.html
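A minimal Python sketch of what a search request and its JSON result can look like. It builds a request body with a `match` query and a `terms` aggregation, then reads the buckets from a response. The field names `country` and `path`, the aggregation name `top_pages` and the sample response are assumptions; depending on the index mapping, the aggregated field may need to be a keyword field.

```python
import json


def top_pages_query(country, size=5):
    """Request body: top pages for visitors from one country."""
    return {
        "size": 0,  # we only want the aggregation, not the matching documents
        "query": {"match": {"country": country}},
        "aggs": {"top_pages": {"terms": {"field": "path", "size": size}}},
    }


def read_buckets(response):
    """Extract (page, hits) pairs from the aggregation part of a response."""
    buckets = response["aggregations"]["top_pages"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Hypothetical, trimmed response, invented for illustration only.
SAMPLE_RESPONSE = {
    "aggregations": {
        "top_pages": {
            "buckets": [
                {"key": "/pricing", "doc_count": 42},
                {"key": "/", "doc_count": 17},
            ]
        }
    }
}

print(json.dumps(top_pages_query("MY"), indent=2))
print(read_buckets(SAMPLE_RESPONSE))
```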
21. DKAN
We can store data with DKAN. DKAN follows CKAN.
It is an open source open data platform with a full suite of
cataloging, publishing and visualization features that
allows organizations to easily share data with the public.
http://www.nucivic.com/dkan/
Take advantage of the DKAN Datastore API.
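As a sketch of how a Datastore API call might be assembled in Python: the endpoint path `api/action/datastore/search.json` and the `resource_id`, `limit` and `offset` parameters are assumptions based on the DKAN 1.x documentation and should be checked against the installed version. The base URL and resource ID are placeholders. The sketch only builds the URL and sends no request.

```python
from urllib.parse import urlencode


def datastore_search_url(base_url, resource_id, limit=10, offset=0):
    """Build a datastore search URL (endpoint path assumed from DKAN 1.x docs)."""
    params = urlencode({"resource_id": resource_id, "limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}/api/action/datastore/search.json?{params}"


# Placeholder site and resource ID, invented for illustration only.
print(datastore_search_url("https://data.example.org/", "abc-123", limit=5))
```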
23. GeoSpatial Is Important
Our data needs to have spatial information (GPS
coordinates).
We can use GeoServer to run our own map server.
http://geoserver.org/
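One way to attach GPS coordinates to our records is GeoJSON, a standard format that many mapping tools can read or convert. The Python sketch below turns records with latitude and longitude into a GeoJSON FeatureCollection. The key names `lat` and `lon` and the sample record are assumptions, and how the result gets loaded into GeoServer depends on the data store configured there.

```python
import json


def to_feature(record, lat_key="lat", lon_key="lon"):
    """Convert one record into a GeoJSON Point feature."""
    properties = {k: v for k, v in record.items() if k not in (lat_key, lon_key)}
    return {
        "type": "Feature",
        # GeoJSON stores coordinates as [longitude, latitude], in that order.
        "geometry": {"type": "Point", "coordinates": [record[lon_key], record[lat_key]]},
        "properties": properties,
    }


def to_feature_collection(records):
    """Wrap all records in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": [to_feature(r) for r in records]}


# Hypothetical record, invented for illustration only.
records = [{"name": "Kuala Lumpur office", "lat": 3.139, "lon": 101.6869}]

print(json.dumps(to_feature_collection(records), indent=2))
```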
24. The End
Q & A
linuxmalaysia@gmail.com
019-6085482
http://linuxmalaysia.harisfazillah.info/