Hug Hbase Presentation.

Jack Levin

Why? – HBASE is used for 99% of the backend

HBase Best Practices, or Taming the Beast

2. 10,000 concurrent requests per second

3. Super fast!

4. Huge datastore: 2 billion rows.

5. Backend is scalable

6. Does not lose data

7. Why? HBase is used for 99% of the backend

9. ImageShack: 25 million monthly uniques

10. Yfrog: 33 million monthly uniques

11. 4 HBase clusters of various sizes (50 TB to 1 PB)

12. Storing and serving 250 million photos (500 KB average per file), 60 servers

13. Yfrog is powered by the smaller 50 TB cluster, with 2 billion rows, 20 servers

14. Using 0.89x and 0.90x versions

15.

16. Lots of RAM is good, but only up to a point; just avoid swapping.

17. We use sub-$1k desktop-grade servers; they work great!

18. Check your network hardware for packet drops. We had outifDiscards interrupting ZooKeeper messages, and region servers would kill themselves during packet loss. Use ping -f to test for packet loss between core nodes.

19. JVM GC takes a lot of CPU when misconfigured, e.g. with a small NewSize.

20. Single NameNode? No problem: build two clusters and have your app tier log queries, replicate them, and replay them when needed.

21. Inexpensive 2 TB Hitachi disks (~$100) work great; you get more units for your money.
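The packet-loss check from item 18 can be scripted so it runs between every pair of core nodes. The following is a minimal sketch, not the tooling used in the talk: the function names and the sample summary line are illustrative assumptions, and it assumes the usual Linux ping summary format ("N% packet loss"). Flood ping (-f) normally requires root.

```python
import re
import subprocess


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage found in ping's summary line."""
    match = re.search(r"([\d.]+)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def flood_ping(host: str, count: int = 10000) -> float:
    """Run `ping -f -c COUNT HOST` and return the reported loss percentage.

    Hypothetical helper: flood ping usually needs root privileges.
    """
    result = subprocess.run(
        ["ping", "-f", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_packet_loss(result.stdout)


# Illustrative summary line, not output captured from a real cluster.
SAMPLE_SUMMARY = (
    "10000 packets transmitted, 9990 received, 0.1% packet loss, time 1520ms"
)

if __name__ == "__main__":
    print(parse_packet_loss(SAMPLE_SUMMARY))
```

Any non-zero loss between core nodes is worth investigating before blaming HBase or ZooKeeper.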

22.

23. 2. Set up HDFS to work flawlessly (pay attention to ulimits, thread limits, hardware stats, graphs, iowait, etc.)

24. 3. Adjust JVM GC NewSize to be at least 100 MB (if young-generation GC is too slow at 100 MB, you need faster CPUs).

25. 4. For metadata rows (small rows), adjust your HBase block size to 4 or 8 KB; you will see less IO, and more blocks will fit into RAM.
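The block-size advice in item 25 comes down to simple arithmetic. The sketch below compares HBase's default 64 KB block size with an 8 KB block; the 256 MB cache size is a hypothetical figure chosen for illustration, not a number from the talk.

```python
KB = 1024
MB = 1024 * KB

DEFAULT_BLOCK_BYTES = 64 * KB  # HBase's default HFile block size
SMALL_BLOCK_BYTES = 8 * KB     # the smaller size suggested for metadata rows
CACHE_BYTES = 256 * MB         # hypothetical block-cache size


def blocks_that_fit(cache_bytes: int, block_bytes: int) -> int:
    """Number of whole blocks that fit into a cache of the given size."""
    return cache_bytes // block_bytes


def bytes_read_per_lookup(block_bytes: int) -> int:
    """An uncached point read has to load at least one whole block."""
    return block_bytes


if __name__ == "__main__":
    for size in (DEFAULT_BLOCK_BYTES, SMALL_BLOCK_BYTES):
        print(
            size // KB, "KB blocks:",
            blocks_that_fit(CACHE_BYTES, size), "blocks cached,",
            bytes_read_per_lookup(size), "bytes read per uncached lookup",
        )
```

With random reads of small rows, each cached block then holds fewer unneeded neighbouring rows and each miss moves less data off disk. The trade-off is a larger block index, so this is probably only worth doing for tables dominated by small-row lookups.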

26.

27.

28. Memstore size graph should be fairly flat with even flushes over time.

29. Iowait graphs should not go over 70-80% during major compaction, and 20% during minor compactions. Otherwise just add more disks and/or nodes.

30. Monitor and graph Thrift threads (via ps -eLf | grep PID). If your threads end up over 25,000, you may run out of RAM. We have dedicated Thrift boxes so that we don't accidentally kill RS nodes.

31. We use Nagios to monitor and alert for DN, RS, ZK, NN, etc on their web tcp ports – very helpful.

32. Run hbck to check for consistency of meta structures.
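The thread count from item 30 can be turned into a number suitable for graphing or alerting. This is a rough sketch under stated assumptions: the function names and the sample ps -eLf output are illustrative, and only the 25,000 threshold comes from the slides. Matching on the PID column avoids the false hits that a plain grep PID can produce on the PPID or LWP columns.

```python
THREAD_LIMIT = 25_000  # threshold mentioned in item 30


def count_threads(ps_output: str, pid: int) -> int:
    """Count rows of `ps -eLf` output whose PID column equals `pid`.

    `ps -eLf` prints one row per thread (LWP), so the row count is the
    thread count of the process.
    """
    wanted = str(pid)
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[1] == wanted:
            count += 1
    return count


def too_many_threads(thread_count: int, limit: int = THREAD_LIMIT) -> bool:
    """True when the thread count exceeds the alerting threshold."""
    return thread_count > limit


# Illustrative output only; the process names are made up.
SAMPLE_PS = """\
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
hbase     4242     1  4242  0    3 10:00 ?        00:00:01 java ThriftServer
hbase     4242     1  4243  0    3 10:00 ?        00:00:00 java ThriftServer
hbase     4242     1  4244  0    3 10:00 ?        00:00:00 java ThriftServer
root      5000  4242  5000  0    1 10:00 ?        00:00:00 sshd
"""

if __name__ == "__main__":
    n = count_threads(SAMPLE_PS, 4242)
    print(n, too_many_threads(n))
```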

33.

34. Mixed RAM brands: boxes crashed for no apparent reason.

35. glibc in FC13 had a race-condition bug that would lock up nodes and crash JVM processes under high load. Solution: yum -y update glibc (invalid binfree)

36. When running in a mixed hardware environment, some boxes were slow enough to affect HDFS for the whole cluster. Looking at “runnable threads” and “fsreadlatency” in Ganglia always pointed to which boxes were 'slow'.

37. Running Cloudera HDFS under the user 'hadoop', which was restricted to 1024 threads by default, would crash datanodes, but only during compactions. Setting the hadoop soft (and hard) nproc limit to 32000 in limits.conf resolved it.

38. GC sometimes autotuned NewSize down to 20 MB, which caused 20 to 30 GC runs per second, flatlining the CPU at 100% and killing the RS. Manually setting NewSize to 128 MB resolved this issue.
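The nproc problem from item 37 only showed up during compactions, so it may be worth checking the limit before a daemon starts. The sketch below uses Python's standard resource module on Linux; the function names are illustrative assumptions, and 32000 is the value from the slides.

```python
import resource

REQUIRED_NPROC = 32000  # value from item 37


def nproc_limit_ok(soft_limit: int, required: int = REQUIRED_NPROC) -> bool:
    """True when a soft nproc limit is unlimited or at least `required`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


def current_nproc_soft_limit() -> int:
    """Soft limit on processes/threads for the user running this script."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft


if __name__ == "__main__":
    soft = current_nproc_soft_limit()
    status = "ok" if nproc_limit_ok(soft) else "too low"
    print("soft nproc limit:", soft, status)
```

Run it as the same user that runs the datanode (here 'hadoop'), since limits.conf entries are applied per user.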

39.

40. No strange crashes

41. No OOME

42. Fast: 0.5 ms puts, 2-3 ms reads, 10 ms disk reads.

43. Recovers quickly when nodes are taken down

44. Oncall team can finally relax

45.

46. Load test HBase with YCSB. Just leave it running for a week; if nothing crashes, you are good. Best not to test with live user traffic :)

47. Do not worry about NameNode redundancy; just back up the /name dir frequently. Set up a secondary HBase cluster with the money you save by not buying 'server'-grade nodes.

48. Burn in your disks, even if they are new

49. Put Memcached between your app tier and HBase. App bugs will hit Memcached first, keeping HBase safe from the assault, which could otherwise drive up your utilization.
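The Memcached tip in item 49 is a cache-aside pattern. Here is a minimal sketch of the idea, with a plain dict standing in for Memcached and another dict standing in for HBase; the class and variable names are hypothetical, and no real client library is involved.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Serve reads from a cache and fall through to the store on a miss."""

    def __init__(self, fetch_from_store: Callable[[str], Optional[bytes]]):
        self._fetch = fetch_from_store
        self._cache: Dict[str, bytes] = {}  # stand-in for Memcached
        self.store_reads = 0                # how often the store was hit

    def get(self, key: str) -> Optional[bytes]:
        if key in self._cache:
            return self._cache[key]
        self.store_reads += 1
        value = self._fetch(key)
        if value is not None:
            self._cache[key] = value        # populate the cache on a miss
        return value


# Stand-in for an HBase table; the row key and value are made up.
fake_store = {"row1": b"photo-metadata"}
reader = CacheAside(fake_store.get)

# A buggy app loop that requests the same row 1000 times.
for _ in range(1000):
    reader.get("row1")

if __name__ == "__main__":
    print(reader.store_reads)
```

In this sketch the repeated requests reach the backing store only once. Missing keys are not cached here, so a loop over nonexistent rows would still reach the store; a real deployment would also need expiry and invalidation on writes.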

50.

51. JD Cryans

52. Michael Stack

53. And everyone else on the HBase user list who helped us out during the rough times.
