2. About Me
• Education
– NCU (MIS)
– NCCU (CS)
• Experience
– Big data platforms and analytics projects
• Teaching
– III
• Community
– TW Spark User Group
– TW Hadoop User Group
– Director, Taiwan Data Engineering Association (台灣資料工程協會)
Research
III MIC (columnist)
AI technology
Big Data & Machine learning
Team
Consultant at 慶騰資訊 and 聯瞻資訊
12. Data Engineering – Data Governance
• Data stewardship
– DBA
– Data Lifecycle => Company’s data governance policies
• Data quality
– Accuracy, completeness and consistency
– Data linking tools, together with version control, workflow,
and project management systems, help organizations
attain better data quality
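The three quality dimensions above (accuracy, completeness, consistency) can be made concrete with a few simple checks. The following is a minimal Python sketch, not taken from the slides: the record fields `brand` and `qty`, the sample rows, and the "quantity must be non-negative" rule are all assumptions chosen for illustration.

```python
# Hypothetical inventory records; field names and values are illustrative only.
records = [
    {"brand": "A", "qty": 10},
    {"brand": "B", "qty": None},  # missing value: lowers completeness
    {"brand": "C", "qty": -3},    # negative stock: fails the accuracy rule
    {"brand": "A", "qty": 12},    # same brand, different qty: inconsistent
]


def completeness(rows, field):
    """Share of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def accuracy(rows, field):
    """Share of non-missing values that pass an assumed validity rule (>= 0)."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(v >= 0 for v in values) / len(values)


def inconsistent_keys(rows, key, field):
    """Keys that map to more than one distinct non-missing value."""
    seen = {}
    for r in rows:
        if r[field] is not None:
            seen.setdefault(r[key], set()).add(r[field])
    return sorted(k for k, vals in seen.items() if len(vals) > 1)
```

In practice each dimension would be defined by the organization's own governance policy; the point here is only that each one can be expressed as a measurable rule over the data.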
13. Data Engineering – Data Governance
• Master data management
– A discipline that establishes a master reference to
ensure consistent use of data across large organizations
– Metadata repositories help impose that common reference,
even though different product groups or lines of business
in the company may promote different views on how best
to present data
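The idea of a master reference can be illustrated with a tiny lookup. This is a hedged Python sketch under stated assumptions: the master IDs, brand names, and alias table below are hypothetical and do not come from the slides or from any particular MDM product.

```python
# Hypothetical master reference: one canonical record per real-world brand.
MASTER_BRANDS = {"B001": "Acer", "B002": "ASUS"}

# Hypothetical aliases used by different lines of business for the same brand.
ALIASES = {
    "acer": "B001",
    "acer inc.": "B001",
    "asus": "B002",
    "asustek": "B002",
}


def to_master_id(local_name):
    """Map a department-specific brand label to its master ID, or None if unknown."""
    return ALIASES.get(local_name.strip().lower())


def canonical_name(local_name):
    """Return the canonical brand name for a local label, or None if unmapped."""
    master_id = to_master_id(local_name)
    return MASTER_BRANDS.get(master_id) if master_id else None
```

Each business unit can keep its own label, while reports and joins use the shared master ID, which is what keeps data use consistent across the organization.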
14. Data Engineering – Data Governance
• Data governance use cases
– Business process management, legacy modernization,
financial and regulatory compliance, credit risk
management, analytics, business intelligence
applications, data warehouses, and data lakes
52. Hands-on: Visual Data Query and Analysis
• Check inventory levels for the most popular products
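%jdbc(hive)
-- Top 5 most-visited URLs, joined to inventory data by brand
SELECT MOB, C.QTY, CNT, C.SALES, C.TELEPHONE
FROM (
  -- Extract the product name from the URL's 'page' query parameter
  SELECT PARSE_URL(URL, 'QUERY', 'page') AS MOB, CNT
  FROM (
    SELECT URL, COUNT(*) CNT
    FROM WEBLOG
    GROUP BY URL
    ORDER BY CNT DESC
    LIMIT 5  -- kept next to ORDER BY so the top 5 is well defined
  ) A
) B
LEFT JOIN (
  SELECT * FROM WEBLOGMAP
) C ON B.MOB = C.BRAND
ORDER BY QTY DESC;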
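To make the logic of the query above easier to follow, here is a rough Python sketch of the same steps: count hits per URL, keep the top N, extract the `page` query parameter, and left-join against an inventory map. The sample URLs, the `weblogmap` dictionary, and the function names are hypothetical stand-ins for the WEBLOG and WEBLOGMAP tables; `page_param` only approximates Hive's `PARSE_URL(url, 'QUERY', 'page')`.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse


def page_param(url):
    """Approximate PARSE_URL(url, 'QUERY', 'page'): first `page` value, or None."""
    values = parse_qs(urlparse(url).query).get("page")
    return values[0] if values else None


# Hypothetical rows standing in for WEBLOG.URL.
weblog_urls = [
    "http://shop.example.com/item?page=acer",
    "http://shop.example.com/item?page=acer",
    "http://shop.example.com/item?page=asus",
    "http://shop.example.com/index",
]

# Hypothetical stand-in for WEBLOGMAP, keyed by BRAND.
weblogmap = {"acer": {"qty": 7}}


def top_pages_with_stock(urls, stock, n=5):
    """Return (mob, qty, cnt) rows for the n most-visited URLs."""
    top = Counter(urls).most_common(n)  # GROUP BY URL, ORDER BY CNT DESC, LIMIT n
    rows = []
    for url, cnt in top:
        mob = page_param(url)
        qty = stock.get(mob, {}).get("qty")  # LEFT JOIN: None when no brand matches
        rows.append((mob, qty, cnt))
    return rows


rows = top_pages_with_stock(weblog_urls, weblogmap)
```

Rows whose URL has no `page` parameter, or whose brand is missing from the inventory map, still appear with `None` in the joined columns, which mirrors the NULLs a LEFT JOIN produces.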
53. Hands-on: Visual Data Query and Analysis
• Write the query results back to a Hive table
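%jdbc(hive)
-- Persist the top-5 result as a new Hive table (CREATE TABLE AS SELECT)
CREATE TABLE RESULT AS
SELECT MOB, C.QTY, CNT, C.SALES, C.TELEPHONE
FROM (
  -- Extract the product name from the URL's 'page' query parameter
  SELECT PARSE_URL(URL, 'QUERY', 'page') AS MOB, CNT
  FROM (
    SELECT URL, COUNT(*) CNT
    FROM WEBLOG
    GROUP BY URL
    ORDER BY CNT DESC
    LIMIT 5  -- kept next to ORDER BY so the top 5 is well defined
  ) A
) B
LEFT JOIN (
  SELECT * FROM WEBLOGMAP
) C ON B.MOB = C.BRAND;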