Gobblin is used at LinkedIn to ingest data from three Kafka clusters, spanning over 2,500 topics and more than 50 billion records (over 15 TB) per day, into related systems such as copycat. It supports at-least-once, at-most-once, and exactly-once semantics for record publishing and checkpoint persistence. To balance load across mappers, Gobblin uses both normal and two-level bin packing, and it supports deduplicated and non-deduplicated compaction, with options for handling late-arriving events. More information is available on Gobblin's GitHub page and user forum.
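The bin-packing idea mentioned above can be illustrated with a minimal sketch: given estimated workloads per topic partition, a greedy largest-first strategy assigns each partition to the currently least-loaded mapper. This is an illustrative approximation, not Gobblin's actual implementation; the function name, the per-partition load estimates, and the fixed mapper count are all assumptions for the example.

```python
from typing import Dict, List


def pack_partitions(loads: Dict[str, int], num_mappers: int) -> List[List[str]]:
    """Greedy bin packing (illustrative, not Gobblin's code):
    assign each partition, largest workload first, to the mapper
    with the smallest total load so far."""
    bins: List[List[str]] = [[] for _ in range(num_mappers)]
    totals = [0] * num_mappers
    for partition, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        i = totals.index(min(totals))  # least-loaded mapper
        bins[i].append(partition)
        totals[i] += load
    return bins


# Hypothetical per-partition record counts for two topics
loads = {"topicA-0": 9, "topicA-1": 7, "topicB-0": 5, "topicB-1": 3}
assignment = pack_partitions(loads, num_mappers=2)
```

With these numbers the two mappers end up with equal totals (9 + 3 and 7 + 5), which is the property the packing aims for; a two-level variant would apply a similar split first across containers and then across mappers within each container.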