Hierarchical – a single source of data is pushed out for distribution worldwide.
Federation – each site manages its own data; the databases already exist.
Sensors (bottom-up) – sensors push to a central DB, so data flows from the bottom up.
Hybrid – combinations of the above.
Scope – the difference between a generic grid and one adapted to a particular domain.
VOs (virtual organizations):
  Collaborative – created by entities that share a single common goal.
  Regulated – controlled by a single organization.
  Economy-based – entities enter into collaborations with consumers out of a profit motive (SLAs etc.).
  Reputation-based – entities are invited to join based on the level of service they are known to provide.
Data sources – no real notion of transient data yet.
Management – self-explanatory.
Function – a 3-tier stack (each tier implicitly includes everything below it).
  File I/O – remote data appears as if it were local.
  Overlay network – manages routing.
Security – also mutually exclusive. There can be multiple options for auth.
  Fine-grained – more flexible ownership of data (certs with tickets etc.).
Fault tolerance – cached transfer = store-and-forward.
Transfer mode – latency management.
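The "cached transfer = store-and-forward" note can be illustrated with a minimal sketch. It assumes a route of named hops where every hop keeps a complete copy of the payload, so a failed link is retried from the last hop that holds the data rather than from the original source. The names `store_and_forward`, `make_flaky_link`, and the retry limit are hypothetical and not taken from any real grid middleware.

```python
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]


def make_flaky_link(failures: Dict[Link, int]) -> Callable[[str, str, bytes], bool]:
    """Build a fake link (assumption, for illustration only) that fails a
    given number of times for selected (src, dst) pairs, then succeeds."""
    remaining = dict(failures)

    def link_send(src: str, dst: str, payload: bytes) -> bool:
        if remaining.get((src, dst), 0) > 0:
            remaining[(src, dst)] -= 1
            return False
        return True

    return link_send


def store_and_forward(
    payload: bytes,
    route: List[str],
    link_send: Callable[[str, str, bytes], bool],
    max_retries: int = 3,
) -> List[Link]:
    """Move payload hop by hop along route and return every attempted link.

    After a successful send the receiving hop becomes the new holder of the
    cached copy, so later failures never go back to the original source.
    """
    attempts: List[Link] = []
    holder = route[0]
    for next_hop in route[1:]:
        for _ in range(max_retries):
            attempts.append((holder, next_hop))
            if link_send(holder, next_hop, payload):
                holder = next_hop  # next_hop now caches the full payload
                break
        else:
            # No break happened: every retry on this link failed.
            raise ConnectionError(
                f"link {holder} -> {next_hop} failed {max_retries} times"
            )
    return attempts
```

With a route of source, cache, dest and one failure on the last link, only the cache-to-dest send is repeated; the source is contacted once.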
Centralized (master / copy) or decentralized (many copies with no master).
Storage integration – control over the FS (kernel level) or use of the file system (high level).
Transfer protocols – open = data is available outside of the replication method.
Metadata – two types of attributes (user-defined = VOs etc.).
Update type – how it is updated.
Replica update propagation – epidemic vs on-demand.
Catalog – the replica catalog can be organized as a tree, a hash, or a DB.
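As a rough illustration of the hash option for the replica catalog, the sketch below maps a logical file name to the set of physical locations that hold a copy. The class name `ReplicaCatalog`, its methods, and the location strings are assumptions made for this example; they do not describe any particular grid catalog service.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class ReplicaCatalog:
    """Hash-based catalog: logical file name -> physical replica locations."""

    def __init__(self) -> None:
        self._replicas: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, logical_name: str, physical_location: str) -> None:
        """Record that physical_location holds a copy of logical_name."""
        self._replicas[logical_name].add(physical_location)

    def unregister(self, logical_name: str, physical_location: str) -> None:
        """Remove one replica; drop the entry once no replicas remain."""
        locations = self._replicas.get(logical_name)
        if locations is None:
            return
        locations.discard(physical_location)
        if not locations:
            del self._replicas[logical_name]

    def lookup(self, logical_name: str) -> List[str]:
        """Return all known locations, sorted for a stable result."""
        return sorted(self._replicas.get(logical_name, ()))
```

A tree or database backend would keep the same three operations and change only how the mapping is stored.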
When and where to create a replica of the data.
Method – whether to adapt to changes in demand, bandwidth, or storage availability (adapting means more overhead).
Granularity – how big the replicated unit is.
Objective function – why –
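One way to picture a strategy that adapts to demand and storage availability is a simple threshold policy: a site gets its own replica once it has requested a file often enough and has room for it. This policy, the class name `ThresholdReplicator`, and its parameters are assumptions chosen for illustration; the notes do not name a specific strategy.

```python
from collections import Counter
from typing import Dict, Set, Tuple


class ThresholdReplicator:
    """Create a replica at a site after `threshold` accesses, if it fits."""

    def __init__(self, threshold: int, free_space: Dict[str, int]) -> None:
        self.threshold = threshold
        self.free_space = dict(free_space)          # site -> free storage units
        self.access_counts: Counter = Counter()     # (site, file) -> accesses
        self.replicas: Set[Tuple[str, str]] = set() # (site, file) pairs

    def record_access(self, site: str, file_name: str, file_size: int) -> bool:
        """Record one access; return True if it triggers a new replica."""
        key = (site, file_name)
        if key in self.replicas:
            return False  # already local, nothing to decide
        self.access_counts[key] += 1
        hot = self.access_counts[key] >= self.threshold   # the "when"
        fits = self.free_space.get(site, 0) >= file_size  # the "where"
        if hot and fits:
            self.replicas.add(key)
            self.free_space[site] -= file_size
            return True
        return False
```

The bookkeeping on every access is the extra overhead that an adaptive method pays compared with a fixed placement.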
Application model they are targeted towards.
Scope – community-based (QoS, SLAs etc.) vs individual uses.
Data replication – attach to replication.
Utility – makespan – time it takes for all jobs to go in a se
Locality:
  Spatial – locate a job so that all of its data is available on data hosts close to the point of computation (moving jobs to the data).
  Temporal – if the data is close to the compute node, subsequent jobs that require the same data are scheduled to the same node (moving data to jobs).
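The locality and makespan notes can be combined in a small sketch. It assumes each job needs exactly one dataset and has a known runtime, sends a job to a node that already hosts its data when one exists, and otherwise picks the least-loaded node and treats the data as resident there for later jobs. Makespan is taken in its usual sense, the completion time of the last job. The function names and the job tuple layout are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Job = Tuple[str, str, float]  # (job id, dataset name, runtime)


def schedule_by_locality(
    jobs: Iterable[Job],
    data_hosts: Dict[str, Set[str]],
    nodes: List[str],
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Assign jobs to nodes, preferring nodes that already hold the data.

    Assumes every node named in data_hosts also appears in nodes.
    Returns (job id -> node, node -> total runtime assigned).
    """
    hosts = {d: set(h) for d, h in data_hosts.items()}  # local copy, updated below
    load = {n: 0.0 for n in nodes}
    assignment: Dict[str, str] = {}
    for job_id, dataset, runtime in jobs:
        # Spatial locality: restrict to nodes hosting the dataset, if any.
        candidates = hosts.get(dataset) or set(nodes)
        # Least-loaded candidate; sorting makes tie-breaking deterministic.
        node = min(sorted(candidates), key=lambda n: load[n])
        # Temporal locality: the data now lives on this node, so later
        # jobs needing the same dataset are steered to it.
        hosts.setdefault(dataset, set()).add(node)
        load[node] += runtime
        assignment[job_id] = node
    return assignment, load


def makespan(load: Dict[str, float]) -> float:
    """Completion time of the last job when each node runs its jobs in sequence."""
    return max(load.values(), default=0.0)
```

A scheduler that ignored locality could balance load more evenly, so this sketch trades some makespan for fewer data movements.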