2. 2
Yahoo! JAPAN
Providing over 100 services
on PC and mobile
64.9 billion PV/month
Hadoop cluster
Total 6,000 nodes
120 PB of a variety of data
Search
Shopping
News
Auction
Mail
3. Agenda
3
1. About HDFS Erasure Coding
• Key points
• Implementation
• Compare to replication
2. HDFS Erasure Coding Tests
• System Tests
• Basic Operations
• Reconstruct Erasure Coding Blocks
• Other features
• Performance Tests
3. Usage in Yahoo! JAPAN
• Principal workloads in our production
• Future plan
5. What is Erasure Coding?
A technique to provide data availability and durability
5
Original raw data
Data chunks Parity
striping encoding
Storing raw data and parity
6. Key points
• Missing or corrupted data can be reconstructed from the surviving
data and parity
• Parity is typically smaller than the original data
6
7. HDFS Erasure Coding
• New feature in Hadoop 3.0
• To reduce storage overhead
• Half the overhead of 3x replication
• Same durability as replication
7
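The "half of 3x replication" claim can be checked with a quick arithmetic sketch (this is my own illustration, not code from the deck):

```python
# Storage overhead = extra bytes stored per byte of user data.
def overhead(total_units, data_units):
    return (total_units - data_units) / data_units

replication_overhead = overhead(3, 1)   # three full copies of each block
ec_overhead = overhead(6 + 3, 6)        # RS(6,3): 6 data + 3 parity blocks
```

Replication costs 200% extra; RS(6,3) costs 50% extra, i.e. a quarter of replication's overhead and half its total footprint.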
8. Agenda
8
1. About HDFS Erasure Coding
• Key points
• Implementation
• Compare to replication
2. HDFS Erasure Coding Tests
• System Tests
• Basic Operations
• Reconstruct Erasure Coding Blocks
• Other features
• Performance Tests
3. Usage in Yahoo! JAPAN
• Principal workloads in our production
• Future plan
9. Implementation (Phase1)
• Striped Block Layout
• Striping original raw data
• Encode striped data to parity data
• Targets cold data
• Not modified and rarely accessed
• Reed Solomon(6,3) Codec
• System default codec
• 6 data and 3 parity blocks
9
10. 10
Striped Block Management
• Raw data is striped
• The basic unit of striped data
is called “cell”
• The “cell” is 64KB
64KB
Original raw data
striping
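The striping described above can be sketched in a few lines — splitting the raw bytes into 64KB cells and dealing them round-robin into 6 data blocks (a simplified illustration, not the HDFS implementation):

```python
CELL = 64 * 1024      # cell size used by HDFS erasure coding
DATA_BLOCKS = 6       # RS(6,3) spreads cells over 6 data blocks

def stripe(raw: bytes):
    """Split raw bytes into 64KB cells and deal them round-robin into
    6 data blocks (cell 0 -> block 0, cell 1 -> block 1, ...)."""
    blocks = [bytearray() for _ in range(DATA_BLOCKS)]
    for i in range(0, len(raw), CELL):
        blocks[(i // CELL) % DATA_BLOCKS] += raw[i:i + CELL]
    return blocks

blocks = stripe(bytes(13 * CELL))   # 13 full cells of zero bytes
```

With 13 cells, block 0 receives cells 0, 6, and 12 while blocks 1 through 5 receive two cells each.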
11. 11
Striped Block Management
• The cells are written in blocks
in order
• With striped layout
64KB
Original raw data
striping
23. Striped Block Management
Block Group
•Combining data and parity striped blocks
23
Striped raw data Striped parity data
index0 index1 index2 index3 index4 index5 index6 index7 index8
24. Internal Blocks
• The striped blocks in a Block Group are called “internal blocks”
• Every internal block has an index
24
Striped raw data Striped parity data
index0 index1 index2 index3 index4 index5 index6 index7 index8
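The index-to-role mapping shown in the diagram is simple to state in code (an illustrative sketch):

```python
DATA_UNITS, PARITY_UNITS = 6, 3

def block_role(index):
    # In an RS(6,3) block group, indexes 0-5 hold raw data, 6-8 hold parity.
    return "data" if index < DATA_UNITS else "parity"

roles = [block_role(i) for i in range(DATA_UNITS + PARITY_UNITS)]
```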
25. Agenda
25
1. About HDFS Erasure Coding
• Key points
• Implementation
• Compare to replication
2. HDFS Erasure Coding Tests
• System Tests
• Basic Operations
• Reconstruct Erasure Coding Blocks
• Other features
• Performance Tests
3. Usage in Yahoo! JAPAN
• Principal workloads in our production
• Future plan
26. Replication vs EC
                  Replication   Erasure Coding (RS-6-3)
Target            HOT           COLD
Storage overhead  200%          50%
Data durability   ✔             ✔
Data locality     ✔             ✖
Write performance ✔             ✖
Read performance  ✔             △
26
27. Replication vs EC
27
Target: cold data will not be modified and is rarely accessed.
29. Replication vs EC
29
The 3x replication mechanism can tolerate losing 2 of the 3 replicas.
With Erasure Coding, even if 3 of the 9 internal blocks fail, the missing
data can be reconstructed.
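The "any 3 of 9" tolerance can be verified exhaustively (my own sketch, checking every possible triple failure):

```python
from itertools import combinations

TOTAL_BLOCKS, DATA_BLOCKS = 9, 6   # RS(6,3): any 6 of 9 blocks recover the data

def recoverable(num_failed):
    return TOTAL_BLOCKS - num_failed >= DATA_BLOCKS

# Every possible set of 3 simultaneous failures still leaves 6 survivors.
triple_failures_ok = all(
    recoverable(len(c)) for c in combinations(range(TOTAL_BLOCKS), 3)
)
# A fourth concurrent failure would be fatal.
quad_failure_fatal = not recoverable(4)
```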
30. Replication vs EC
30
Data locality is lost with Erasure Coding.
However, cold data in Erasure Coding format is not accessed frequently.
31. Replication vs EC
31
With Erasure Coding, calculating the parity data decreases write throughput.
Read performance does not decrease much, but if some internal blocks are
missing, read throughput drops.
32. Replication vs EC
32
With Erasure Coding, to recover missing data a node needs to read the
surviving raw data and parity data from remote nodes, then reconstruct the
missing data. This process consumes network traffic and CPU resources.
33. Agenda
33
1. About HDFS Erasure Coding
• Key points
• Implementation
• Compare to replication
2. The results of the erasure coding testing
• System tests
• Performance tests
3. Usage in our production
• Principal workloads in our production
• Future plan
34. HDFS EC Tests
• System Tests
• Basic operations
• Reconstruct EC blocks
• Decommission DataNodes
• Other Features
34
35. HDFS EC Tests
• System Tests
• Basic operations
• Reconstruct EC blocks
• Decommission DataNodes
• Other Features
35
36. Basic Operations
“hdfs erasurecode -setPolicy”
• Target
• Only directory
• Must be empty
• Sub-directory and files inherit policy
• Superuser privilege needed
• Default policy: Reed Solomon(6,3)
36
37. Basic Operations
To confirm whether a directory has an Erasure Coding
policy:
“hdfs erasurecode -getPolicy”
• Show the information about the codec and cell size
37
38. File Operations
After setting an EC policy,
basic file operations work against EC files
• File format is transparent to HDFS clients
• Writes/reads tolerate DataNode failures
The move operation is a little different:
• File format is not automatically changed by a move operation
38
39. HDFS EC Tests
• System Tests
• Basic operations
• Reconstruct EC blocks
• Decommission DataNodes
• Other Features
39
40. Reconstruct EC Blocks
The missing blocks can be reconstructed with at least 6 living
internal blocks.
40
Reading 6 living blocks to reconstruct the missing block
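The any-6-of-9 recovery can be demonstrated with a toy systematic (6,3) code built from Lagrange interpolation over the rationals. This is my own illustration: HDFS's real Reed-Solomon codec works over GF(2^8), but the recovery property is the same idea.

```python
from fractions import Fraction

DATA_UNITS, PARITY_UNITS = 6, 3

def interpolate(points):
    """Return the unique polynomial of degree < len(points) through the
    given (x, y) points, as a callable (Lagrange interpolation)."""
    def p(x):
        total = Fraction(0)
        for xi, yi in points:
            term = Fraction(yi)
            for xj, _ in points:
                if xj != xi:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p

def encode(data):
    # Data cells sit at x = 0..5; parity cells are the polynomial's
    # values at x = 6..8, so any 6 points pin down the polynomial.
    p = interpolate(list(enumerate(data)))
    return list(data) + [p(DATA_UNITS + j) for j in range(PARITY_UNITS)]

def reconstruct(stored):
    # stored: 9 values, with None marking missing internal blocks
    survivors = [(x, y) for x, y in enumerate(stored) if y is not None]
    assert len(survivors) >= DATA_UNITS, "need at least 6 living blocks"
    p = interpolate(survivors[:DATA_UNITS])
    return [p(x) for x in range(DATA_UNITS)]

group = encode([10, 20, 30, 40, 50, 60])
damaged = group[:]
for idx in (1, 4, 7):       # lose two data blocks and one parity block
    damaged[idx] = None
recovered = reconstruct(damaged)
```

Erasing any three internal blocks, data or parity, still leaves six points, which is exactly enough to re-derive the polynomial and hence the original data.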
41. Reconstruct EC Blocks
• The cost of reconstruction is independent of the number of missing
internal blocks
• Whether one or three internal blocks are missing, the
reconstruction cost is the same
• Block groups with more missing internal blocks have higher priority
41
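The prioritization rule can be sketched as a simple ordering: groups missing more internal blocks are queued first, since one more failure could make them unrecoverable. (The names and fields below are illustrative, not HDFS APIs.)

```python
# Hypothetical block-group records; "missing" counts lost internal blocks.
block_groups = [
    {"id": "bg-a", "missing": 1},
    {"id": "bg-b", "missing": 3},
    {"id": "bg-c", "missing": 2},
]
# Reconstruct the most-damaged (highest-risk) groups first.
queue = sorted(block_groups, key=lambda g: g["missing"], reverse=True)
order = [g["id"] for g in queue]
```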
43. HDFS EC Tests
• System Tests
• Basic operations
• Reconstruct EC blocks
• Decommission DataNodes
• Other Features
43
44. Decommission DNs
Decommissioning is basically the same as block recovery.
However, with Erasure Coding, simply transferring the internal blocks to
another DataNode is sufficient.
44
Decommissioning node Other living nodes
Internal block
Simple transfer
Internal block
45. HDFS EC Tests
• System Tests
• Basic operations
• Reconstruct EC blocks
• Decommission DataNodes
• Other Features
45
47. Unsupported
• Flush and Synchronize(hflush/hsync)
• Append to EC files
• Truncate EC files
These features are not supported.
However, those are not so critical, because the target of HDFS erasure coding
is storing cold data.
47
48. Agenda
48
1. About HDFS Erasure Coding
• Key points
• Implementation
• Compare to replication
2. The results of the erasure coding testing
• System tests
• Performance tests
3. Usage in our production
• Principal workloads in our production
• Future plan
52. Read/Write Throughput
ErasureCodingBenchmarkThroughput
• Writes and reads files in replication and erasure coding formats with
multiple threads
• With Erasure Coding, write throughput was about 65% of replication’s
• Read throughput decreased slightly with Erasure Coding
52
                  Replication   Erasure Coding
Write             111.3 MB/s    73.94 MB/s
Read (stateful)   111.3 MB/s    111.3 MB/s
Read (positional) 111.3 MB/s    107.79 MB/s
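The percentages quoted above follow directly from the table (my own arithmetic check):

```python
# Ratios computed from the benchmark table (MB/s).
replication_write, ec_write = 111.3, 73.94
write_ratio = ec_write / replication_write     # ~0.66, i.e. about 65%

replication_pread, ec_pread = 111.3, 107.79
pread_ratio = ec_pread / replication_pread     # ~0.97, a slight drop
```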
53. Read With Missing Internal Blocks
Throughput decreased when an internal block was missing,
because the client needed to reconstruct the missing block
53
                  All blocks living   An internal block missing
Read (stateful)   111.3 MB/s          108.94 MB/s
Read (positional) 107.79 MB/s         92.25 MB/s
55. 55
CPU Time of TeraGen
X-axis: number of rows of outputs
Y-axis: log scaled CPU times(ms)
CPU time increased with the erasure
coding format.
The overhead of calculating parity
data affected write performance.
56. 56
TeraSort Map/Reduce Time Cost
Total time spent by map(left) and reduce(right)
X-axis: number of rows of outputs
Y-axis: log scaled CPU times(ms)
The time spent by the reduce tasks
increased significantly with erasure
coding,
because the main workload of the
reduce phase was writing.
58. Elapsed Time of Distcp
• Our real log data (2TB)
• Copying from replication to EC took about three times as long as copying
from replication to replication
• Currently, distcp is the best way to convert to Erasure Coding
58
                             Real elapsed time   CPU time
Replication to EC            17min 11sec         64,754,291ms
Replication to Replication   5min 5sec           20,187,156ms
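The ratio behind "about three times as long" falls out of the table (my own check; it comes to roughly 3.4x):

```python
# Elapsed-time ratio from the distcp table, in seconds.
to_ec = 17 * 60 + 11          # 17min 11sec
to_replication = 5 * 60 + 5   # 5min 5sec
ratio = to_ec / to_replication
```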
59. Agenda
59
1. About HDFS Erasure Coding
• Key points
• Implementation
• Compare to replication
2. HDFS Erasure Coding Tests
• System Tests
• Basic Operations
• Reconstruct Erasure Coding Blocks
• Other features
• Performance Tests
3. Usage in Yahoo! JAPAN
• Principal workloads in our production
• Future plan
60. 60
Target
Raw data of weblogs
Daily total 6.2TB
Up to 400 days
The capacity used in our production
HDFS.
We need to reduce storage costs.
61. EC with Archive Disk
We are using Erasure Coding on archive storage.
The cost of an archive disk is 70% of a normal disk.
In total, the storage cost of EC on archive disks can be reduced to 35% of 3x
replication on normal disks.
61
X3 replication with normal disk Erasure coding with archive disk
35% of x3 replica
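The 35% figure combines the two savings multiplicatively (my own arithmetic sketch):

```python
normal_disk = 1.0                    # cost of one unit of normal-disk capacity
archive_disk = 0.7 * normal_disk     # archive disk costs ~70% of normal disk

replication_cost = 3.0 * normal_disk # 3x replication stores 3 full copies
ec_cost = 1.5 * archive_disk         # RS(6,3) stores 1.5x the raw data
fraction = ec_cost / replication_cost
```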
62. 62
EC Files as Hive Input
Erasure Coding ORC files
2TB, 17billion records
• Query execution time seemed to
increase when the input was
erasure coded files
• But this is acceptable, considering
the queries are rarely executed
Queries
63. Agenda
63
1. About HDFS Erasure Coding
• Key points
• Implementation
• Compare to replication
2. HDFS Erasure Coding Tests
• System Tests
• Basic Operations
• Reconstruct Erasure Coding Blocks
• Other features
• Performance Tests
3. Usage in Yahoo! JAPAN
• Principal workloads in our production
• Future plan
64. Future Phase of HDFS EC
• Codecs
• Intel ISA-L
• Hitchhiker algorithm
• Contiguous layout
• To provide data locality
• Implement hflush/hsync
If these were implemented, the erasure coding format could be
used in many more scenarios
64
65. Conclusion
• HDFS Erasure Coding
• The target is storing cold data
• It cuts storage costs in half without sacrificing
data durability
• It’s ready for production
65
I’m Takuya Fukudome from Yahoo JAPAN. In this session, I’ll talk about HDFS Erasure Coding in Action
At first, please let me introduce Yahoo! JAPAN
We at Yahoo! JAPAN provide many web services on PC and mobile in Japan. Thanks to these services and our users, about 120PB of a wide variety of data has accumulated in our Hadoop clusters.
In this session, I’ll share the new HDFS feature, erasure coding, which is being used in our cluster and helps us save about half of our storage costs. Improving overall ROI without notable loss of data durability, HDFS erasure coding is a feature you might want to adopt in your data platform.
In this session,
first I’ll cover some key points of HDFS erasure coding in detail.
Then I’ll share and discuss the results of our erasure coding tests in our cluster.
Finally, I’ll walk through the usage and practical workloads of HDFS erasure coding in our production cluster.
First, I’ll introduce the HDFS erasure coding.
What is Erasure Coding?
Erasure coding is a technique to provide data availability and durability, in which the original raw data is split into chunks and the data stripes are encoded into parity data.
The striped raw data and parity data are stored on different storages.
If some data chunks are lost or corrupted, the missing chunks can be reconstructed from the surviving raw data and the parity data.
Because the parity data is typically smaller than the original raw data, erasure coding saves storage space.
Erasure coding is a new feature in HDFS, planned for release in Hadoop 3.0. It aims to reduce storage space overhead.
The storage overhead of erasure coded files is half that of the 3x replication mechanism,
and HDFS Erasure Coding provides the same level of data durability as replication.
Next I’ll introduce the implementation of Erasure Coding in HDFS.
In HDFS erasure coding, the original raw data and its parity data are written to storage in a striped layout.
In phase 1, erasure coding targets cold data, which is not modified and rarely accessed. The codec used to calculate parity data is Reed Solomon(6,3): the original raw data is striped and written into 6 blocks, and 3 blocks of parity data are calculated and written as well.
Let’s take a look at how data is striped and written in Erasure Coding.
As described before, the original raw data is striped. The basic unit of striped data is called a “cell”, and a cell is typically 64KB.
In Reed Solomon(6,3), the cells are written into 6 blocks in order.
Then six cells of raw data are used to calculate three cells of parity data. These nine cells are called a “stripe”.
The parity data is calculated for every stripe of the original raw data, and each parity cell is written to a parity block.
There are always 3 parity cells in every stripe, written into 3 blocks in order.
Finally, combining the 6 raw data blocks and the 3 parity blocks gives an Erasure Coding Block Group.
The striped blocks in a block group are called internal blocks.
Each internal block has its own index.
The index of an internal block is determined by the order in which raw data and parity data are written. These indexes are also used when reading and reconstructing data.
The internal blocks with indexes 0 to 5 hold raw data, and the others, with indexes 6 to 8, hold parity data.
To wrap up the introduction to HDFS Erasure Coding, let’s compare erasure coding and the replication mechanism along several dimensions,
so that we can get a more concrete impression of HDFS Erasure Coding.
First, unlike replication, erasure coding targets cold data, which is not modified and rarely accessed.
Next, the storage overhead of 3x replication is 200%; the storage overhead of erasure coding, on the other hand, is approximately 50%.
Erasure Coding saves about 50% of storage compared to 3x replication.
As for data durability, the 3x replication mechanism can tolerate losing two of the three replicas.
With Reed Solomon(6,3) Erasure Coding, on the other hand, even if three of the nine storage locations fail, HDFS can reconstruct the missing data.
The durability of erasure coding is at a similar level to replication.
When it comes to data locality, unfortunately it is lost with erasure coding,
because each piece of striped data is stored on a different DataNode.
This can slow down jobs running against Erasure Coding files.
However, cold data in Erasure Coding format is typically not accessed frequently.
The next two categories are write performance and read performance.
With erasure coding, the calculation of parity data decreases write throughput.
Read performance does not decrease much.
But if some internal blocks are missing, read throughput drops,
because the client has to reconstruct the missing blocks.
The last one is the cost of data recovery.
With replication, copying a replica to another DataNode is enough.
With erasure coding, to recover missing data, a node needs to read the surviving raw data and parity data from remote nodes and reconstruct the data.
This process consumes network traffic and CPU resources,
so the cost of data recovery is higher with erasure coding.
We have walked through the key points of Erasure Coding; next, let’s check the test results.
Tests were conducted in two major categories.
First, system tests.
In the system tests, basic operations, such as writing and reading Erasure Coding files, were tested.
Important features like NameNode HA and automatic reconstruction of erasure coding files were also tested.
First, I’ll talk about basic operations like write and read.
By the way, by following the basic operation tests, you can also figure out how to start using Erasure Coding.
First of all, a new subcommand, “erasurecode”, has been added to the “hdfs” command, and “hdfs erasurecode -setPolicy” is the starting point for using HDFS erasure coding.
It sets an erasure coding policy on the specified directory, which must be empty.
Sub-directories and files under an Erasure Coding directory inherit the Erasure Coding policy.
After the policy is set, files written or distcp’d into the Erasure Coding directory are stored as erasure coded files.
The erasure coding policy specifies the codec and cell size used.
Please make sure you have superuser privileges to use the setPolicy command.
In phase 1, the system default policy is Reed Solomon(6,3).
If you want to confirm whether a directory has an erasure coding policy, you can run the “hdfs erasurecode -getPolicy” command.
It shows the codec and cell size set on the specified path.
After setting the Erasure Coding policy, basic file operations were conducted against Erasure Coding files, including write, read, copy, and move.
Basically, we can write and read files without considering whether the target path uses replication or erasure coding.
Furthermore, writes and reads against Erasure Coding files can tolerate a reasonable number of DataNode failures during the operation.
The copy operation is basically the same as write and read.
However, the move operation is a little different.
Please be aware that a file’s format is not automatically converted between replication and erasure coding by a move operation.
When erasure coded files are moved to a normal directory that has no erasure coding policy, the files remain in erasure coding format, and vice versa.
Next is the automatic reconstruction of Erasure Coding blocks.
With the system default codec, Reed Solomon(6,3), missing blocks can be reconstructed from at least 6 living internal blocks.
Notably, the cost of reconstructing missing internal blocks is independent of how many internal blocks are missing.
Whether one or three internal blocks are missing, the reconstruction cost is the same.
So during reconstruction, block groups with more missing internal blocks have higher priority, because they are at greater risk of being lost.
Block group recovery under rack failures was tested too.
Along with erasure coding support, a new block placement policy, “BlockPlacementPolicyRackFaultTolerant”, was added to HDFS.
With this policy, the NameNode chooses storages so that internal blocks are distributed across as many racks as possible.
We can write erasure coding files with nodes in only a few racks, but it’s better to have at least 6 racks.
That is enough to satisfy rack fault tolerance and prevents losing many internal blocks if a rack fails.
Next, I’ll talk about decommissioning.
Decommissioning is also very important, and from the perspective of data completeness it is basically the same as block recovery.
The difference is that with erasure coding, if all internal blocks are alive, transferring blocks between nodes needs no reconstruction.
This optimization matters, because reconstructing internal blocks costs more than simply transferring them.
We also tested other HDFS features with Erasure Coding files,
for example NameNode HA, quotas, and HDFS fsck.
These features work fine with Erasure Coding.
In particular, fsck has been modified to print the necessary information about Block Groups.
But in phase 1, some features are not yet supported with erasure coding.
For example, hflush and hsync are not supported in phase 1.
Therefore, appending to or truncating erasure coded files is not supported either.
However, this is not critical, because in the current phase the target of HDFS erasure coding is storing cold data.
After the system tests, let’s move on to the performance tests.
First, I’ll share the results of the write and read benchmarks.
Then we’ll check the performance of Erasure Coding files with teragen and terasort, the typical MapReduce jobs.
Finally, the very practical distcp workload, which is widely used, was examined from a performance perspective.
Let me introduce our test clusters.
We used two clusters in these tests:
an alpha cluster and a beta cluster,
with 37 nodes and 82 nodes respectively.
We ran the read/write tests and the teragen/terasort tests on the alpha cluster, and the tests using real data on the beta cluster.
First is the result of the read and write benchmark.
In this test, we used the benchmark tool ErasureCodingBenchmarkThroughput,
which has been added to the HDFS test package.
This tool writes and reads files in replication and erasure coding formats with multiple threads.
This table shows the result of the benchmark in our cluster.
Write throughput decreased in the erasure coding format as expected: writing erasure coded files achieved about 65% of the throughput of writing replicated files.
Read throughput decreased slightly in the erasure coding format,
but the throughput of stateful reading was the same as replication.
In addition, we tested reading with missing internal blocks.
Please look at this table.
You can see that throughput dropped when an internal data block was missing,
because the client needed to reconstruct the missing block before it could read the data.
We also ran teragen and terasort, the typical MapReduce benchmarks.
This chart shows the CPU time of the teragen jobs.
The vertical axis shows log scaled CPU time and the horizontal axis shows the number of rows in the output files.
CPU time increased in the erasure coding format compared to the replication format, as expected.
The overhead of calculating parity data affected write performance.
Now please look at these charts.
They show the total time spent by the map and reduce phases of the terasort jobs.
As you can see, the time spent by the map tasks is approximately the same between the erasure coding and replication formats,
but the time spent by the reduce tasks increased significantly with erasure coding.
That is because the main workload of terasort’s map phase is reading, while the main workload of its reduce phase is writing.
These results confirm the previous read/write benchmark of erasure coding files.
Finally, I’ll show the elapsed time of distcp as a method to convert replicated files to erasure coded files.
This is a typical workload when adopting erasure coding in HDFS.
We used our real web log data, 2TB in size, as the target data.
This table shows the result.
The elapsed time of copying the 2TB files to erasure coded files was about 17 minutes.
That was about three times the elapsed time of copying from replication to replication.
Still, distcp is currently the best way to convert existing replicated files to erasure coded files.
So far I have talked about the features and performance of HDFS erasure coding; now I’ll talk about how we at Yahoo! JAPAN use HDFS erasure coding in our data platform.
The data we target with erasure coding amounts to 6.2TB every day, and we store this data for up to 400 days.
We plan to convert this data to erasure coding format gradually, and we use archive storage with erasure coding to reduce storage costs further.
We are now using erasure coding in our production cluster on a trial basis, converting old data to erasure coding format steadily.
The cost of the archive disks used in our cluster is about 70% of the cost of normal disks.
Furthermore, Erasure Coding halves the capacity required compared to 3x replication.
In total, we can reduce storage costs to 35% of 3x replication.
We are also using erasure coded ORC files as Hive input data.
This chart shows the execution time of Hive queries on the erasure coded ORC files.
The ORC files are about 2TB and hold 17 billion records.
The vertical axis shows execution time in seconds; the horizontal axis shows the different queries.
The red bars use erasure coded files and the blue bars use replicated files.
As you can see, execution time seemed to increase when the input data was erasure coded.
But this is acceptable, considering these queries are rarely executed.
Finally, I’m going to talk about the future phases of HDFS erasure coding and the new features and enhancements we want to bring into our clusters.
First, we are going to try other codecs, such as the Intel ISA-L library and the Hitchhiker algorithm, to improve write performance.
We are also interested in contiguous layout erasure coding, which can provide data locality.
And we will contribute to implementing hflush and hsync.
If these were implemented in HDFS erasure coding, we would be able to use the erasure coding format in many more scenarios.
At last, let me conclude.
In this talk, HDFS erasure coding was introduced and the results of the tests in our clusters were shared.
Thanks to erasure coding, we are really reducing storage costs in production. It’s ready for production.