Gehören Sie zu den Ersten, denen das gefällt!
Twitter collects petabytes of data every day and empowers its engineers and data scientists for large data processing with an hybrid on-premises and cloud model. In this talk, we will look at its GCP architecture and the resource hierarchy. We will deep dive into the storage design that uses Google Cloud Storage to organize petabytes of data that are replicated from on-premises HDFS clusters. We will take a look at how the user-management tooling has been designed to create and manage access for thousands of accounts (human and service accounts) at Twitter. We will talk about how the design deals with the security measures for accounts and tooling systems running in GCP and the complexities of dataset permissions. We will share the challenges we faced as we tried to design our system at scale and our learnings and solutions.