7. Huahin Framework
http://huahinframework.org
In June 2012, we released software developed in our office
as open source (OSS).
* It is the software we used for the panel log analysis.
* Please refer to the slides from "Hadoop Conference
Japan 2011 Fall" for more information.
http://goo.gl/C9tzf
Huahin Framework is an umbrella name for several
products.
8. Huahin Framework
The origin of the name Huahin Framework
Our office has a custom of taking code names from
wine regions.
Huahin = Hua Hin = a tourist destination in Thailand = a wine region
And when it comes to Thailand...
it's the elephant!
Hence, Huahin.
9. Huahin Framework
Huahin Framework components
It mainly consists of the following:
• Huahin Core
• Huahin Tools
• Huahin Manager
10. Huahin Framework
Huahin Core
• Simplifies MapReduce programs
• No need to write your own Writable and
Secondary Sort classes
• Basic grouping, sorting, etc. borrow ideas from SQL
• You can still write natural (plain) MapReduce if you want
• The same relationship as C++ being a superset of C
• Hive or Pig can do the same, but Huahin Core is for when
you really need performance (parallel computation, etc.)
• Huahin Unit is provided as a test driver
• Wraps MRUnit
• Implementation example
11. Huahin Framework
Huahin Example
• Page top 10 rank example
First, natural MapReduce.
Second, Huahin MapReduce.
12. Huahin Framework
Input data for the page top 10 ranking
Example (tab-delimited: date, user, path):
Jan 21, 2013 user1 /index.html
Jan 21, 2013 user1 /index2.html
Jan 21, 2013 user2 /contents/foo.html
Jan 21, 2013 user42 /bar.html
Jan 21, 2013 user3 /index.html
Jan 21, 2013 user7 /news/index.html
Jan 21, 2013 user4 /release/2013.html
Jan 21, 2013 user3 /index2.html
Jan 21, 2013 user7 /download.html
Jan 21, 2013 user5 /bar.html
Jan 21, 2013 user12 /release/2012.html
Jan 21, 2013 user5 /contents/foo.html
Jan 21, 2013 user23 /page2.html
Jan 21, 2013 user53 /news.html
Jan 21, 2013 user6 /download.html
Jan 21, 2013 user21 /bar.html
Jan 21, 2013 user18 /index.html
13. Huahin Framework
Page top 10 rank of natural MapReduce
JobTools
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;

public class PathTop10RankJobTool extends Configured implements Tool {
    @Override
    public int run(String[] arg0) throws Exception {
        // Job 1: count page views per (date, page) and store them in "first" as a SequenceFile
        Job firstJob = new Job(getConf(), "first");
        firstJob.setJarByClass(PathTop10RankJobTool.class);
        TextInputFormat.setInputPaths(firstJob, "input");
        firstJob.setInputFormatClass(TextInputFormat.class);
        firstJob.setMapperClass(PathTop10RankFirstMapper.class);
        firstJob.setMapOutputKeyClass(FirstKeyWritable.class);
        firstJob.setMapOutputValueClass(IntWritable.class);
        firstJob.setReducerClass(PathTop10RankFirstReducer.class);
        firstJob.setOutputKeyClass(SecondKeyWritable.class);
        firstJob.setOutputValueClass(IntWritable.class);
        SequenceFileOutputFormat.setOutputPath(firstJob, new Path("first"));
        firstJob.setOutputFormatClass(SequenceFileOutputFormat.class);
        if (!firstJob.waitForCompletion(true)) {
            return -1;
        }

        // Job 2: identity Mapper; the partitioner, sort comparator, and
        // grouping comparator rank the pages of each date (secondary sort)
        Job secondJob = new Job(getConf(), "second");
        secondJob.setJarByClass(PathTop10RankJobTool.class);
        SequenceFileInputFormat.setInputPaths(secondJob, "first");
        secondJob.setInputFormatClass(SequenceFileInputFormat.class);
        secondJob.setMapperClass(Mapper.class);
        secondJob.setMapOutputKeyClass(SecondKeyWritable.class);
        secondJob.setMapOutputValueClass(IntWritable.class);
        secondJob.setGroupingComparatorClass(PathTop10RankGroupingComparatorClass.class);
        secondJob.setPartitionerClass(PathTop10RankPartitioner.class);
        secondJob.setSortComparatorClass(PathTop10RankingSortComparator.class);
        secondJob.setReducerClass(PathTop10RankSecondReducer.class);
        secondJob.setOutputKeyClass(SecondKeyWritable.class);
        secondJob.setOutputValueClass(IntWritable.class);
        TextOutputFormat.setOutputPath(secondJob, new Path("output"));
        secondJob.setOutputFormatClass(TextOutputFormat.class);
        return secondJob.waitForCompletion(true) ? 0 : -1;
    }
}
14. Huahin Framework
Page top 10 rank of natural MapReduce
FirstMapper
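import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits ((date, page), 1) for each tab-delimited access line.
public class PathTop10RankFirstMapper
        extends Mapper<LongWritable, Text, FirstKeyWritable, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Columns: 0 = date, 1 = user, 2 = page path
        String[] s = value.toString().split("\t");
        context.write(new FirstKeyWritable(s[0], s[2]), ONE);
    }
}
FirstReducer
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the page views of each (date, page) and re-keys the result for the second job.
public class PathTop10RankFirstReducer
        extends Reducer<FirstKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
    @Override
    protected void reduce(FirstKeyWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int pv = 0;
        for (IntWritable i : values) {
            pv += i.get();
        }
        context.write(
                new SecondKeyWritable(key.getDate().toString(), key.getPage().toString(), pv),
                new IntWritable(pv));
    }
}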
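The first job is effectively a word count keyed by (date, page). To make that concrete without a cluster, here is a minimal in-memory sketch, assuming the tab-delimited date/user/path layout of the sample data; `FirstJobSketch` and `countPageViews` are hypothetical names, not part of Hadoop or Huahin.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the first job: "map" each line to
// ((date, page), 1), then "reduce" by summing the ones per key.
class FirstJobSketch {
    static Map<String, Integer> countPageViews(List<String> lines) {
        // TreeMap keeps the keys sorted, like the shuffle's sort by key.
        Map<String, Integer> pv = new TreeMap<>();
        for (String line : lines) {
            String[] s = line.split("\t");     // date, user, path
            String key = s[0] + "\t" + s[2];   // composite (date, page) key
            pv.merge(key, 1, Integer::sum);    // sum the emitted ones
        }
        return pv;
    }
}
```

For example, two `/index.html` lines and one `/bar.html` line on the same date yield counts of 2 and 1.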
15. Huahin Framework
Page top 10 rank of natural MapReduce
SecondReducer
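import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Receives every page of one date, sorted by page views, and keeps the first 10.
public class PathTop10RankSecondReducer
        extends Reducer<SecondKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
    @Override
    protected void reduce(SecondKeyWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int rank = 0;
        for (IntWritable i : values) {
            // Emit at most 10 records (ranks 0 to 9).
            if (rank >= 10) {
                break;
            }
            // Hadoop updates "key" in place while iterating, so it matches the current value.
            context.write(key, i);
            rank++;
        }
    }
}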
16. Huahin Framework
Page top 10 rank of natural MapReduce
FirstKeyWritable
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Map output key of the first job: (date, page).
public class FirstKeyWritable implements WritableComparable<FirstKeyWritable> {
    private Text date = new Text();
    private Text page = new Text();

    public FirstKeyWritable() {
    }

    public FirstKeyWritable(String date, String page) {
        this.date.set(date);
        this.page.set(page);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.date.readFields(in);
        this.page.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        this.date.write(out);
        this.page.write(out);
    }

    // Sort by date, then by page.
    @Override
    public int compareTo(FirstKeyWritable o) {
        int compare = this.date.toString().compareTo(o.date.toString());
        if (compare != 0) {
            return compare;
        }
        return this.page.toString().compareTo(o.page.toString());
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof FirstKeyWritable)) {   // also false for null
            return false;
        }
        FirstKeyWritable o = (FirstKeyWritable) obj;
        return this.date.equals(o.getDate()) && this.page.equals(o.getPage());
    }

    // The default HashPartitioner uses hashCode(), so equal keys must hash alike.
    @Override
    public int hashCode() {
        return 31 * date.hashCode() + page.hashCode();
    }

    public Text getDate() {
        return date;
    }

    public void setDate(Text date) {
        this.date = date;
    }

    public Text getPage() {
        return page;
    }

    public void setPage(Text page) {
        this.page = page;
    }
}
SecondKeyWritable
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Output key of the first job and map output key of the second job: (date, page, pv).
// Its natural order is by date only; the custom comparators refine it.
public class SecondKeyWritable implements WritableComparable<SecondKeyWritable> {
    private Text date = new Text();
    private Text page = new Text();
    private IntWritable pv = new IntWritable();

    public SecondKeyWritable() {
    }

    public SecondKeyWritable(String date, String page, int pv) {
        this.date.set(date);
        this.page.set(page);
        this.pv.set(pv);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.date.readFields(in);
        this.page.readFields(in);
        this.pv.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        this.date.write(out);
        this.page.write(out);
        this.pv.write(out);
    }

    @Override
    public int compareTo(SecondKeyWritable o) {
        return this.date.toString().compareTo(o.date.toString());
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof SecondKeyWritable)) {   // also false for null
            return false;
        }
        SecondKeyWritable o = (SecondKeyWritable) obj;
        return this.date.equals(o.getDate());
    }

    // Consistent with equals(): only the date is hashed.
    @Override
    public int hashCode() {
        return date.hashCode();
    }

    // TextOutputFormat writes this, then a tab and the value (pv).
    @Override
    public String toString() {
        return this.date + "\t" + this.page;
    }

    public Text getDate() {
        return date;
    }

    public void setDate(Text date) {
        this.date = date;
    }

    public Text getPage() {
        return page;
    }

    public void setPage(Text page) {
        this.page = page;
    }

    public IntWritable getPv() {
        return pv;
    }

    public void setPv(IntWritable pv) {
        this.pv = pv;
    }
}
17. Huahin Framework
Page top 10 rank of natural MapReduce
GroupingComparator
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups reduce input by date only, so one reduce() call sees every page of a day.
public class PathTop10RankGroupingComparatorClass extends WritableComparator {
    public PathTop10RankGroupingComparatorClass() {
        super(SecondKeyWritable.class, true);
    }

    // Override the WritableComparable overload: WritableComparator's raw byte
    // comparison calls compare(WritableComparable, WritableComparable), not compare(Object, Object).
    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        return ((SecondKeyWritable) a).getDate().compareTo(((SecondKeyWritable) b).getDate());
    }
}
SortComparator
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Sorts by date ascending, then by page views descending (most-viewed page first).
public class PathTop10RankingSortComparator extends WritableComparator {
    public PathTop10RankingSortComparator() {
        super(SecondKeyWritable.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        SecondKeyWritable one = (SecondKeyWritable) a;
        SecondKeyWritable another = (SecondKeyWritable) b;
        int compare = one.getDate().compareTo(another.getDate());
        if (compare != 0) {
            return compare;
        }
        return another.getPv().compareTo(one.getPv());   // reversed for descending order
    }
}
Partitioner
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends every key of the same date to the same reducer.
public class PathTop10RankPartitioner extends Partitioner<SecondKeyWritable, IntWritable> {
    @Override
    public int getPartition(SecondKeyWritable key, IntWritable value, int numPartitions) {
        // Mask the sign bit: Math.abs(Integer.MIN_VALUE) is still negative.
        return (key.getDate().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
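The Partitioner, sort comparator, and grouping comparator together implement a secondary sort. The sketch below models the intended flow in plain Java (route by date, sort by date and then page views descending, group by date, keep the first N per group). `SecondarySortSketch`, `Key`, and `topN` are hypothetical names, and the loop over partitions only stands in for separate reducers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the second job's secondary sort.
class SecondarySortSketch {
    // Stand-in for SecondKeyWritable: natural key (date) plus payload (page, pv).
    static final class Key {
        final String date;
        final String page;
        final int pv;

        Key(String date, String page, int pv) {
            this.date = date;
            this.page = page;
            this.pv = pv;
        }
    }

    // Partitioner: route by date only, masking the sign bit of the hash.
    static int partition(Key k, int numPartitions) {
        return (k.date.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Sort comparator: date ascending, then pv descending.
    static final Comparator<Key> SORT = Comparator.comparing((Key k) -> k.date)
            .thenComparing((a, b) -> Integer.compare(b.pv, a.pv));

    // Partition, sort each partition, then keep the first `limit` keys of each date group.
    static List<Key> topN(List<Key> keys, int numPartitions, int limit) {
        List<Key> out = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            List<Key> bucket = new ArrayList<>();
            for (Key k : keys) {
                if (partition(k, numPartitions) == p) {
                    bucket.add(k);
                }
            }
            bucket.sort(SORT);
            // Grouping by date: count the rank of each key within its date group.
            Map<String, Integer> rankByDate = new HashMap<>();
            for (Key k : bucket) {
                int rank = rankByDate.merge(k.date, 1, Integer::sum);
                if (rank <= limit) {
                    out.add(k);
                }
            }
        }
        return out;
    }
}
```

With one partition, pages with 5, 3, and 1 views on the same date and a `limit` of 2 keep the first two. Adding partitions changes which "reducer" handles a date, but not the set of kept keys, because the partitioner never splits a date.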
18. Huahin Framework
Page top 10 rank of natural MapReduce
• This is very long...
• About 307 lines
19. Huahin Framework
Page top 10 rank of Huahin MapReduce
JobTool
// Two chained jobs: job1 counts page views per (DATE, PATH); job2 keeps the top 10 per DATE.
public class PathRankingJobTool extends SimpleJobTool {
    @Override
    protected String setInputPath(String[] args) {
        return args[0];
    }

    @Override
    protected String setOutputPath(String[] args) {
        return args[1];
    }

    /* (non-Javadoc)
     * @see org.huahin.core.SimpleJobTool#setup()
     */
    @Override
    protected void setup() throws Exception {
        // Column labels of the tab-delimited input
        final String[] labels = new String[] { "DATE", "USER", "URL" };
        SimpleJob job1 = addJob(labels, StringUtil.TAB);
        job1.setFilter(FirstFilter.class);
        job1.setSummarizer(FirstSummarizer.class);
        SimpleJob job2 = addJob();
        job2.setSummarizer(SecondSummarizer.class);
    }
}
FirstFilter
// Emits one record per access, grouped by (DATE, PATH), with PV = 1.
public class FirstFilter extends Filter {
    @Override
    public void init() {
    }

    @Override
    public void filter(Record record, Writer writer)
            throws IOException, InterruptedException {
        Record emitRecord = new Record();
        emitRecord.addGrouping("DATE", record.getValueString("DATE"));
        emitRecord.addGrouping("PATH", record.getValueString("URL"));
        emitRecord.addValue("PV", 1);
        writer.write(emitRecord);
    }

    @Override
    public void filterSetup() {
    }
}
FirstSummarizer
// Sums PV for each (DATE, PATH) group, then regroups by DATE with PV as the sort key.
public class FirstSummarizer extends Summarizer {
    @Override
    public void init() {
    }

    @Override
    public void summarize(Writer writer)
            throws IOException, InterruptedException {
        int pv = 0;
        while (hasNext()) {
            Record record = next(writer);
            pv += record.getValueInteger("PV");
        }
        Record emitRecord = new Record();
        emitRecord.addGrouping("DATE", getGroupingRecord().getGroupingString("DATE"));
        emitRecord.addSort(pv, Record.SORT_UPPER, 1);
        emitRecord.addValue("PATH", getGroupingRecord().getGroupingString("PATH"));
        emitRecord.addValue("PV", pv);
        writer.write(emitRecord);
    }

    @Override
    public void summarizerSetup() {
    }
}
SecondSummarizer
// For each DATE group, emits PATH and PV of the first 10 records in sort order.
public class SecondSummarizer extends Summarizer {
    @Override
    public void init() {
    }

    @Override
    public void summarize(Writer writer)
            throws IOException, InterruptedException {
        int rank = 1;
        while (hasNext()) {
            if (rank > 10) {
                break;
            }
            Record record = next(writer);
            Record emitRecord = new Record();
            emitRecord.addValue("PATH", record.getValueString("PATH"));
            // FirstSummarizer emits the count under "PV"
            emitRecord.addValue("PV", record.getValueInteger("PV"));
            writer.write(emitRecord);
            rank++;
        }
    }

    @Override
    public void summarizerSetup() {
    }
}
20. Huahin Framework
Page top 10 rank of Huahin MapReduce
• This is very short!!
• About 100 lines
21. Huahin Framework
Huahin Core
• Other features
• Simple Join
• Big Join
• etc.
22. Huahin Framework
Huahin Tools
• A collection of tools for generic operations
• Currently only Apache log formatting...
• Operating environments
• On-premises Hadoop
• Standalone
• Multithreaded execution for small data
• EMR
• S3://huahin/tools/huahin-tools.0.1.0.jar
23. Huahin Framework
Huahin Manager
• A manager for MapReduce Jobs
• Get the Job list
• Get Job details
• Kill a Job
• Execute Jobs
• Run queue management
• MapReduce JAR
• Hive scripts
• Pig scripts
• Execute Hive queries
• Execute Pig Latin
• All operations are performed through the REST API.
• Supports Apache Hadoop 1.0.x and 2.0.2-alpha
• Supports CDH3 and CDH4
24. Huahin Framework
Huahin Manager
• For 2.0.2-alpha and CDH4
• Get the Application list
• Get Cluster information
• Kill an Application
• Proxy to YARN APIs
25. Huahin Framework
Huahin Manager
• EMR support
• Bootstrap action setting:
s3://huahin/manager/configure
• Configure a security group so that the REST API can be accessed.
• The security group to configure is created when EMR
starts up: ElasticMapReduce-master
• Values to set:
• Port range: 9010
• Source: IP addresses that are allowed to connect
26. Huahin Framework
Huahin Manager
Operating environment of Huahin Manager
[Diagram components: various operations, REST API, Huahin Manager, Hadoop cluster, HiveServer (1 and 2)]
27. Huahin Framework
Huahin EManager
A manager specialized for EMR
• Manages Job Flows
• Get the Job Flow list
• Get Job Flow details
• Kill a Job Flow Step
• Execute Jobs
• Run queue management
• Register a queue
• Get queue details
• Remove a queue
28. Huahin Framework
Huahin EManager
• Register queue
• The following EMR-supported job types can be assigned
to the queue:
• Hive
• Pig
• Streaming
• Custom JAR
• EManager lets you specify the size of the cluster to start.
EManager assigns a queue to whichever cluster is free.
(A nice point of EMR is that you can bring up multiple
clusters!)
29. Huahin Framework
Huahin EManager
Operating environment of Huahin EManager
[Diagram: Huahin EManager runs on premises or on an Amazon EC2 instance and receives various operations; on the Amazon Elastic MapReduce master node, Huahin Manager is started by the bootstrap action; the components communicate through the REST API.]
※ NOTICE: Set up the security group.
30. Huahin Framework
Huahin EManager
Operating environment of Huahin EManager
Differences from starting EMR with the Management Console or
the command-line tools:
• EManager reuses one Job Flow
Instead of starting and terminating EMR every time, it keeps the
Job Flow alive to save cost and improve performance.
※ This cannot currently be done from the Management Console,
but it can be done from the command line and the SDK.
• However, it reboots automatically when the number of Steps
reaches the upper limit of 255.
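The Step-limit rule can be pictured as a small counter. This is a hypothetical sketch, not EManager's code: it assumes the reboot is triggered as soon as the reused Job Flow has run 255 Steps, and `JobFlowRecyclePolicy` is an illustrative name.

```java
// Hypothetical policy: reuse one Job Flow until its Step count reaches the limit.
class JobFlowRecyclePolicy {
    static final int MAX_STEPS = 255;   // Step limit mentioned on the slide
    private int steps = 0;

    // Records one submitted Step; returns true when the Job Flow should be rebooted.
    // The counter is reset to model the freshly started Job Flow.
    boolean addStepAndCheckReboot() {
        steps++;
        if (steps >= MAX_STEPS) {
            steps = 0;
            return true;
        }
        return false;
    }

    int steps() {
        return steps;
    }
}
```

Steps 1 to 254 report no reboot; the 255th Step reports a reboot and the count starts again from zero.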
31. Huahin Framework
Huahin EManager
Operating environment of Huahin EManager
Differences from starting EMR with the Management Console or
the command-line tools:
• The Job Flow stays up in one-hour units
• for cost (billing) and performance
• It shuts down automatically just before the next billing
point.
• However, if a Job is still running, the Job Flow is carried
over into the next billing period.
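Likewise, the hourly shutdown rule can be sketched as a pure function of uptime. The class name and the `marginMinutes` parameter (how long before the next billed hour an idle Job Flow is shut down) are assumptions for illustration; the slide does not state EManager's actual margin.

```java
// Hypothetical shutdown rule for a Job Flow billed per started hour.
class HourlyShutdownPolicy {
    private final int marginMinutes;   // assumed safety margin before the next billed hour

    HourlyShutdownPolicy(int marginMinutes) {
        this.marginMinutes = marginMinutes;
    }

    // True when the Job Flow should be terminated now: it is within
    // marginMinutes of the next full hour of uptime and no Job is running.
    boolean shouldShutDown(long uptimeMinutes, boolean jobRunning) {
        if (jobRunning) {
            return false;   // carried over into the next billing hour
        }
        long minutesIntoHour = uptimeMinutes % 60;
        return minutesIntoHour >= 60 - marginMinutes;
    }
}
```

With a 5-minute margin, an idle Job Flow at 55 or 119 minutes of uptime is shut down, while one with a running Job is kept into the next hour.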
32. Huahin Framework
Huahin EManager
Register queue
Queues are registered with the PUT or POST method.
• PUT: If the script or JAR is already on S3, EManager starts the
Job Flow or simply executes the Step.
• POST: Uploads a local JAR or script to S3, then boots the Job
Flow and executes the Step. This feature is not available in
EMR itself. An option removes the POSTed files after execution.
• All registration parameters are given in JSON.
33. Huahin Framework
Huahin EManager
Register queue
Example of PUT for Hive:
$ curl -X PUT http://localhost:9020/queue/register/hive \
-F ARGUMENTS='{"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'
Optional arguments are given in the JSON.
Example of POST for Hive:
$ curl -X POST http://localhost:9020/queue/register/hive \
-F SCRIPT=@wordcount.hql \
-F ARGUMENTS='{"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'
Optional arguments are given in the JSON.
Set "deleteOnExit" to "true" to delete the files after execution.
By default, they are not deleted.
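The ARGUMENTS value is plain JSON, so a client can build it with a few lines of code. A minimal sketch, assuming `deleteOnExit` is sent as a JSON boolean and that the strings contain no control characters; `HiveQueueArguments` is a hypothetical helper, not part of EManager.

```java
import java.util.List;

// Hypothetical helper that builds the ARGUMENTS value for /queue/register/hive.
class HiveQueueArguments {
    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String build(String script, List<String> arguments, boolean deleteOnExit) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"script\":").append(quote(script));
        sb.append(",\"arguments\":[");
        for (int i = 0; i < arguments.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(arguments.get(i)));
        }
        sb.append(']');
        if (deleteOnExit) {
            sb.append(",\"deleteOnExit\":true");   // assumed boolean encoding
        }
        return sb.append('}').toString();
    }
}
```

`build("s3://huahin/wordcount.hql", Arrays.asList("arg1", "arg2"), false)` reproduces the ARGUMENTS string from the PUT example.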
34. Huahin Framework
Huahin EManager
List of Job Flow
Example of getting the list of all Job Flows:
$ curl -X GET http://localhost:9020/jobflow/list
Example of getting the list of running Job Flows:
$ curl -X GET http://localhost:9020/jobflow/runnings
Example of getting Job Flow details:
$ curl -X GET http://localhost:9020/jobflow/describe/j-XXXXXXXXXXXX
35. Huahin Framework
Huahin EManager
Queue API
Example of listing registered queues:
$ curl -X GET http://localhost:9020/queue/list
Example of listing running queues:
$ curl -X GET http://localhost:9020/queue/runnings
Example of getting queue details:
$ curl -X GET http://localhost:9020/queue/describe/S_XXXXXXXXXXXX
Example of deleting a queue:
$ curl -X DELETE http://localhost:9020/queue/kill/S_XXXXXXXXXXXX
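The same endpoints can be called from Java with `java.net.HttpURLConnection`. A minimal sketch: `EManagerClient` and its method names are hypothetical, the base URL and paths are taken from the curl examples, and a request is only sent when the caller reads the response (for example via `getResponseCode()`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical minimal client for the EManager endpoints shown in the curl examples.
class EManagerClient {
    private final String baseUrl;   // e.g. "http://localhost:9020"

    EManagerClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Prepares a request without sending it; reading the response sends it.
    HttpURLConnection prepare(String method, String path) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(baseUrl + path).toURL().openConnection();
            conn.setRequestMethod(method);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    HttpURLConnection queueList() {
        return prepare("GET", "/queue/list");
    }

    HttpURLConnection describeQueue(String queueId) {
        return prepare("GET", "/queue/describe/" + queueId);
    }

    HttpURLConnection killQueue(String queueId) {
        return prepare("DELETE", "/queue/kill/" + queueId);
    }

    HttpURLConnection killStep(String stepId) {
        return prepare("DELETE", "/jobflow/kill/step/" + stepId);
    }
}
```

For example, `new EManagerClient("http://localhost:9020").killQueue(id).getResponseCode()` would issue the same DELETE request as the last curl example.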
36. Huahin Framework
Huahin EManager
Killing a Job
Hadoop has a command to kill a running Job:
hadoop job -kill job_XXXXXXXXXX
However, EMR has no such function. If you start a Job by
mistake, the only option is to terminate the Job Flow.
Otherwise, you have to SSH into the master node of the EMR
cluster and type the command above.
Troublesome...
37. Huahin Framework
Huahin EManager
Killing a Job
We made a Kill API available
in EManager (and Manager)!
Example of killing a Step:
$ curl -X DELETE http://localhost:9020/jobflow/kill/step/S_XXXXXXXXXXXX
38. Huahin Framework
Conclusion
• Huahin Core
• Unlike Hive and Pig
• For when you want to use MapReduce in a fairly
natural way
• Huahin Tools
• Still a work in progress...
• Huahin Manager
• All operations through the REST API
• Integration with other systems
• Huahin EManager
• Integration with other systems
• Cost and performance management
• Kill a Step of a Job Flow!
39. Huahin Framework
The current version
• Huahin Core 0.1.4
• Huahin Unit 0.1.4
• Huahin Tools 0.1.0
• Huahin Manager
• 0.1.4 for Apache Hadoop 1.0.4
• 0.1.4 for CDH3
• 0.2.1 for Apache Hadoop 2.0.2-alpha
• 0.2.1 for CDH4
• Huahin EManager 0.1.1