The document discusses compaction and splitting in Apache Accumulo, a distributed key-value store. It explains that Accumulo tables are divided into non-overlapping key ranges called tablets, and that compaction merges a tablet's sorted files into a single file to improve read performance. Splitting divides a large tablet into two tablets in order to balance workload across servers. The document details the compaction algorithms used by Accumulo and HBase and how each system decides when to compact files and split tablets.
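As a concrete illustration of file selection for compaction, the sketch below models Accumulo's documented default ratio test: a set of files is compacted when the combined size of the set is at least the compaction ratio times the size of the largest file in the set (the ratio defaults to 3 via `table.compaction.major.ratio`). The function name and structure here are illustrative, not Accumulo's actual code.

```python
def select_files_to_compact(sizes, ratio=3.0):
    """Pick which files a tablet should merge in one major compaction.

    Sketch of Accumulo's default strategy: sort files by size descending
    and choose the largest "suffix" set (a file and everything smaller)
    whose total size is at least `ratio` times its biggest member.
    """
    sizes = sorted(sizes, reverse=True)
    total = sum(sizes)  # combined size of sizes[i:] as the loop advances
    for i, size in enumerate(sizes):
        if size * ratio <= total:
            # This file plus all smaller files satisfy the ratio test.
            return sizes[i:]
        total -= size  # drop the largest remaining file and retry
    return []  # no qualifying set; skip compaction for now


# One very large file next to three small ones: only the small files
# are merged, so the big file is not rewritten needlessly.
print(select_files_to_compact([100, 10, 10, 10]))
```

The ratio test is what keeps compaction cost logarithmic in practice: a file is only rewritten when doing so roughly triples the amount of data per file, so each key is rewritten a bounded number of times as the tablet grows.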
Hortonworks Data Platform (HDP) is the only 100% open source Apache Hadoop distribution that provides a complete and reliable foundation for enterprises that want to build, deploy, and manage big data solutions. It allows you to confidently capture, process, and share data in any format, at scale, on commodity hardware or in a cloud environment. As the foundation for the next-generation enterprise data architecture, HDP delivers the components needed to uncover business insights from the growing streams of data flowing into and throughout your business. HDP is a fully integrated data platform that includes the stable core functions of Apache Hadoop (HDFS and MapReduce), the baseline tools to process big data (Apache Hive, Apache HBase, Apache Pig), as well as a set of advanced capabilities (Apache Ambari, Apache HCatalog, and High Availability) that make big data operational and ready for the enterprise.