Azure HDInsight
1.
2. @ashishth
Free
Proven @ Scale
No Lock-in
Many Options
Not Really Free
Operationalization is Hard
Expertise
60% of Big Data projects will fail*
*According to Gartner, 60% of Advanced Analytics projects will fail in 2017
Cloud Optimizations / Security
29. • Hive Low-Latency Analytical Processing (LLAP)
• Serves queries directly from Azure Blob/ADLS
• Works with TEXT, JSON, CSV, TSV, ORC, and Parquet
• Super-fast performance with TEXT data
• Modern, scalable query-concurrency architecture
• Security with Apache Ranger and Active Directory
32. Intelligent cache
Automatically reacting to changes in underlying data
o Shared cache between queries
o Cache eviction is based on the source file's last-modified date
o Every query checks the modified date and reloads if a new file has arrived
[Diagram: cache tiers (DRAM, SSD) backed by ADLS/Blob store, with updates flowing in from storage]
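The eviction rule above (reload whenever the source file's last-modified date changes) can be sketched in a few lines of plain Python. This is a toy illustration of the policy, not HDInsight's implementation; the class and method names are invented for this example.

```python
import os

class MTimeCache:
    """Toy last-modified-date cache: entries are shared between readers,
    and an entry is reloaded whenever the source file's mtime changes,
    mirroring the LLAP eviction rule described above."""

    def __init__(self):
        self._cache = {}  # path -> (mtime, contents)

    def read(self, path):
        mtime = os.path.getmtime(path)
        entry = self._cache.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]          # cache hit: file unchanged
        with open(path) as f:        # miss or stale: reload from source
            contents = f.read()
        self._cache[path] = (mtime, contents)
        return contents
```

Checking a timestamp on every read keeps the cache consistent with newly arrived files at the cost of one metadata lookup per query.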
33. • LLAP, Spark, and Presto against 1 TB of data derived from the TPC-DS benchmark
• Out-of-the-box HDInsight configuration
• 45 queries derived from the TPC-DS benchmark that ran successfully on all engines
36. • We used a number of different concurrency levels to test concurrency performance
• 99 queries on 1 TB of data, on a 32-worker-node cluster with max concurrency set to 32.
Test 1: Run all 99 queries, 1 at a time - Concurrency = 1
Test 2: Run all 99 queries, 2 at a time - Concurrency = 2
Test 3: Run all 99 queries, 4 at a time - Concurrency = 4
Test 4: Run all 99 queries, 8 at a time - Concurrency = 8
Test 5: Run all 99 queries, 16 at a time - Concurrency = 16
Test 6: Run all 99 queries, 32 at a time - Concurrency = 32
Test 7: Run all 99 queries, 64 at a time - Concurrency = 64
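The seven tests above follow one pattern: run the same 99 queries while capping how many execute at once. A minimal harness for that pattern, sketched in Python with a thread pool; `run_query` is a stand-in (in the real test each query would go to the cluster over JDBC/ODBC), and the timings here are only illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_query(query_id):
    """Stand-in for submitting one TPC-DS query to the cluster.
    Sleeps briefly so the harness is runnable anywhere."""
    time.sleep(0.01)
    return query_id

def run_suite(num_queries=99, concurrency=1):
    """Run all queries, at most `concurrency` at a time, and return
    (wall-clock seconds, results), mirroring Tests 1-7 above."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(run_query, range(num_queries)))
    return time.monotonic() - start, results

# Sweep the same concurrency levels as Tests 1-7.
for level in (1, 2, 4, 8, 16, 32, 64):
    elapsed, _ = run_suite(concurrency=level)
```

Total query count stays fixed while only the degree of parallelism varies, which isolates the engine's concurrency behavior from query complexity.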
40. Use IntelliJ to run and debug Spark applications remotely on an HDInsight cluster.
Developers can inspect variables, watch intermediate data, step through code, and even edit the app and resume execution, all against Azure HDInsight clusters with cluster data.
Set breakpoints in both driver and executor code. Debugging executor code lets developers detect data-related errors by viewing intermediate RDD values, tracking distributed task operations, and stepping through execution units.
Set breakpoints in Spark's external libraries, allowing developers to step into Spark code and debug inside the Spark framework.
View both driver and executor execution logs in the console panel.
41. • Interactive responses bring the best properties of Python and Spark, with the flexibility to execute one or multiple statements.
• Built-in Python language services such as IntelliSense auto-suggest, auto-complete, and error markers, among others.
• Preview and export your PySpark interactive query results to CSV, JSON, and Excel formats.
• Integration with Azure for HDInsight cluster management and query submission.
• Links to the Spark UI and YARN UI for further troubleshooting.
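To make the export formats concrete, here is a small stdlib-only sketch of turning query-result rows into CSV and JSON. The sample rows are invented for illustration; this shows only what the exports contain, not the tooling's actual implementation.

```python
import csv
import io
import json

# Hypothetical sample rows, standing in for a PySpark interactive
# query result.
rows = [
    {"name": "hivesampletable", "count": 59793},
    {"name": "weblogs", "count": 120034},
]

def to_csv(rows):
    """Serialize a list of dict rows to CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Serialize the same rows as a JSON array."""
    return json.dumps(rows, indent=2)
```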
55. OMS Agent for Linux on the HDInsight nodes (Head, Worker, Zookeeper), running FluentD with an HDInsight plugin:
1. `in_tail` input plugin for all logs; a regexp turns each line into a JSON object
2. Filter for WARN and above for each log type, using the `grep` filter plugin
3. Output to the out_oms_api output type
4. Exec plugin for metrics
[Diagram: Spark, Hive/LLAP, Storm, Kafka, and HBase each feed their config (omsconfig) into the Log Analytics (OMS) service]
HDInsight Log Analytics Architecture
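Steps 1 and 2 of the pipeline above (regexp-parse each tailed log line into a JSON object, then keep only WARN-and-above records) can be sketched in plain Python. The log-line format and severity set here are assumptions for illustration; real HDInsight component logs vary by service, and the actual parsing is done by FluentD's `in_tail` and `grep` plugins, not this code.

```python
import json
import re

# Assumed log format: "<date> <time> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<time>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)$")
SEVERITY = {"WARN", "ERROR", "FATAL"}  # WARN and above; everything else is dropped

def parse_and_filter(lines):
    """Step 1: regexp -> JSON object per line. Step 2: grep-style
    filter that keeps only WARN-and-above records."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # unparseable line; a real pipeline might tag these
        record = m.groupdict()
        if record["level"] in SEVERITY:
            yield json.dumps(record)

sample = [
    "2017-06-01 10:00:01 INFO Region server started",
    "2017-06-01 10:00:02 WARN Replication lag is growing",
    "2017-06-01 10:00:03 ERROR Lost connection to Zookeeper",
]
```

Filtering at the agent keeps INFO-level noise off the wire, so only actionable records reach the Log Analytics (OMS) service.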