Big Data LDN 2017: Billions of Rows, the 5ws and H of Interpreting Fast and Fresh Data.
1. One Billion rows
Fresh and Fast Data
November 16, 2017
KEITH BOLAM
ENGINEERING SOLUTIONS MANAGER
2. FIVE WS AND H
I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
(Kipling 1902)
3. Your analytics strategy
– cost-effective
– performance that meets the speed of your business
– on-premise or cloud deployment
– accessible using tools and skills that are available
– Bring new capabilities - predictive analytics and ML
32. Actian Vector X100 Analytics Engine - The Secret Ingredient
Top performance in industry-standard TPC-H benchmark
Key innovations:
– Exploits Intel’s vector instruction set to process more data elements per instruction
– Columnar data format to reduce data transfers from disk and improve compression optimization
– Optimizes use of large CPU caches to beat simple in-memory databases
– Tracks write activity to allow updates without impacting performance, maintains full ACID compliance via WAL and PDTs
– Tracks block metadata to avoid unnecessary data transfers from disk
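The last bullet (block metadata used to avoid disk reads) can be illustrated with a small sketch. This is not Actian's implementation: the `Block` layout, `build_blocks`, `count_greater_than`, and the sample fare values are all hypothetical, and the only idea taken from the slide is that a per-block summary can rule a block out before it is read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One block of a single column plus its min/max summary (hypothetical layout)."""
    values: List[float]
    min_value: float
    max_value: float


def build_blocks(column: List[float], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each block."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_greater_than(blocks: List[Block], threshold: float) -> Tuple[int, int]:
    """Count values > threshold, scanning only blocks that could contain a match.

    Returns (matching_rows, blocks_read).
    """
    matches = 0
    blocks_read = 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # the metadata alone proves no row in this block qualifies
        blocks_read += 1
        matches += sum(1 for v in block.values if v > threshold)
    return matches, blocks_read


# Made-up, sorted column so that each block covers a narrow value range.
fares = [float(i) for i in range(1, 101)]    # 1.0 .. 100.0
blocks = build_blocks(fares, block_size=10)  # 10 blocks of 10 values each
matches, blocks_read = count_greater_than(blocks, threshold=85.0)
print(matches, blocks_read)  # 15 matching rows, found by reading 2 of 10 blocks
```

The benefit depends on how well values cluster within blocks: on a randomly ordered column most blocks would span the threshold and still have to be read.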
36. CREATE TABLE actian.yellow_tripdata_staging2_vw AS
SELECT year(tpep_dropoff_datetime) AS "year",
       month(tpep_dropoff_datetime) AS "month",
       day(tpep_dropoff_datetime) AS "day",
       hour(tpep_dropoff_datetime) AS "hour",
       TO_CHAR(tpep_dropoff_datetime, 'YYYYMM') AS "yearmonth",
       tpep_dropoff_datetime, tpep_pickup_datetime,
       passenger_count, trip_distance,
       pickup_location_id, dropoff_location_id, fare_amount,
       time(timestamp(tpep_pickup_datetime), 0) AS "Starttime",
       time(timestamp(tpep_dropoff_datetime), 0) AS "Endtime",
       vendor_id
FROM actian.yellow_tripdata_staging2
WHERE pickup_location_id IS NOT NULL
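To make the derived columns in the statement above concrete, here is a minimal Python sketch of what each expression produces for a single trip. It rests on assumptions: that `TO_CHAR(..., 'YYYYMM')` yields a six-character year-month string, and that `time(timestamp(...), 0)` keeps the time of day with zero fractional-second digits (modelled here as truncation). The function name `derive_columns` and the sample timestamps are invented for illustration.

```python
from datetime import datetime


def derive_columns(pickup: datetime, dropoff: datetime) -> dict:
    """Mimic the derived columns of the CREATE TABLE ... AS SELECT for one row."""
    return {
        "year": dropoff.year,
        "month": dropoff.month,
        "day": dropoff.day,
        "hour": dropoff.hour,
        "yearmonth": dropoff.strftime("%Y%m"),  # assumed equivalent of 'YYYYMM'
        # Assumption: precision 0 means fractional seconds are dropped.
        "Starttime": pickup.time().replace(microsecond=0),
        "Endtime": dropoff.time().replace(microsecond=0),
    }


# Made-up trip, not taken from the taxi data set.
row = derive_columns(
    pickup=datetime(2017, 11, 16, 9, 58, 12, 250000),
    dropoff=datetime(2017, 11, 16, 10, 21, 45),
)
for name, value in row.items():
    print(name, value)
```

Note that the date parts all come from the drop-off timestamp, so a trip that crosses midnight is bucketed by the day it ended.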
24 real cores / 48 threads
KiB Mem : 26385971
22 x 500 GB 15k rpm spinning disks, RAID 5
37. CREATE TABLE actian.yellow_tripdata_staging2_vw AS
SELECT year(tpep_dropoff_datetime) AS "year",
       month(tpep_dropoff_datetime) AS "month",
       day(tpep_dropoff_datetime) AS "day",
       hour(tpep_dropoff_datetime) AS "hour",
       TO_CHAR(tpep_dropoff_datetime, 'YYYYMM') AS "yearmonth",
       tpep_dropoff_datetime, tpep_pickup_datetime,
       passenger_count, trip_distance,
       pickup_location_id, dropoff_location_id, fare_amount,
       time(timestamp(tpep_pickup_datetime), 0) AS "Starttime",
       time(timestamp(tpep_dropoff_datetime), 0) AS "Endtime",
       vendor_id
FROM actian.yellow_tripdata_staging2
WHERE pickup_location_id IS NOT NULL
Executing . . .
(0.000504 secs)
(120737401 rows in 300.498619 secs)
continue
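The timing above can be turned into a throughput figure with simple arithmetic. The row count and elapsed time are taken from the output; the per-core number is only a rough guide, since it assumes the work was spread evenly over the 24 physical cores, which the slide does not state.

```python
rows = 120_737_401     # rows reported by the statement
seconds = 300.498619   # elapsed time reported by the statement
cores = 24             # physical cores listed on the slide

rows_per_second = rows / seconds
rows_per_second_per_core = rows_per_second / cores  # assumes an even spread

print(f"{rows_per_second:,.0f} rows/s overall")
print(f"{rows_per_second_per_core:,.0f} rows/s per physical core")
```

This works out to roughly 400 thousand rows per second. Because the statement is a CREATE TABLE ... AS SELECT, that rate covers reading each row, computing the derived columns, and writing the new table to spinning disks, not just scanning.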
24 real cores / 48 threads
KiB Mem : 26385971
22 x 500 GB 15k rpm spinning disks, RAID 5