1. Map Reduce
databases project and need the explanation and answer to help me learn.
Hello
Having work related to Map Reduce. Are you interested? Need task 3 and 4 only.
Requirements:
Task and Mark distribution: 1. ER model and table generation (252. SQL programming
(30%) 3. Sequential and distributed processing (254. Research report (20%) Notes: 1. You
are expected to use the students can contact Centre for Academic Writing (CAW)2. Please
notify your registry 3. Any student requiring an extension or deferral should follow the
university process as outlined R model and table generation (25%) processing (25%) the
CUHarvard referencing format. For support and advice on how this Centre for Academic
Writing (CAW). registry course support team and module leader for disability supportAny
student requiring an extension or deferral should follow the university process as outlined
For support and advice on how this and module leader for disability support. Any student
requiring an extension or deferral should follow the university process as outlined •
7086CEM – Data Management Systems This assignment is made up of four parts: - Part A
deals with database design, using E-R modelling. - Part B concerns database creation and
querying, using SQL. - Part C covers parallel processing frameworks - Part D involves a
research report Coursework requirements Please note that plagiarism (copying and
pasting from sources) and collusion (submitting the same content as a fellow student) are
detected automatically by Turnitin and flagged. You should be able to check for similarity.
You are expected to submit your work. The report must be self-contained. Links to
external datasets are not acceptable. Appendices are not required. Report structure Please
follow the instructions below. You are expected: a) to include the name and the code of the
module, the title of the CW, your names and your student Ids. b) to follow the structure of
the CW brief, and present your work accordingly, i.e.: Part A 1. 2. ...... Part B 1. 2. a) b) c)
d) e) ...... Submission process for the group CW is submitted as a group of two students as
follows: a) You are expected to work as a group on ALL the parts of the CW. b) one student
will submit individually in his/her name on Aula, the whole report, with both names on the
front page, as a single pdf file, and c) the other student must submit individually in his/her
name on Aula, one page only, with both names, as a pdf file, where he/she presents
reflections on the collaborative work.
Part A. Conceptual modelling An equipment company wishes to create a database to
2. support the hiring of tools and machinery to clients. The company has three types of
equipment: power tools, such as drills and vacuum cleaners, plants such as excavators and
scaffolding. Each piece of equipment is identified by a number. Power tools are described by
their model and the voltage they use, whereas plants are classified by their model and their
size in tonnes. Scaffolding can be traditional, aluminium or fibreglass; in addition, its width
can be single or double. A large piece of equipment may be composed of smaller pieces of
equipment. The company has various outlets and each has staff including a manager and
several senior technicians who are responsible for supervising the work of allocated groups
of technicians. A supervision record is also kept for a specific date. All employees are
identified by their number, name, date of birth (DOB) and address. Furthermore, a record is
kept on their employment records and their qualifications. Each outlet has a stock of
equipment that may be hired by clients for varying periods of time, from a minimum of four
hours to a maximum of six months. Each hire agreement between a client and the company
is uniquely identified by using a hire number. Each client is identified by a number and a
name. The company insists that each client must take out insurance cover for each
equipment hire period. Each insurance cover is identified by a unique insurance number
and includes the description of the insurance. The company wishes to keep a record of the
member of staff who was in charge of a specific hire agreement. Each piece of equipment is
checked for faults when it is returned by the client, and the faults/defects/damage
recorded. The company keeps a record of the hire history of each client. 1. Create an ER
diagram for the above scenario and indicate the cardinality of relationships and the nature
of the associations (mandatory or optional). You should allocate adequate attributes to the
entities of interest, especially the identifiers. (20%) 2. Generate, with justification,
relational tables from the ER diagram. Indicate clearly the names of the tables, the
attributes, the primary keys and the foreign keys. (5%) Guidance: i) Create the ER diagram
and clearly identify any identifiers, and indicate the cardinality of relationships and the
nature of the associations (mandatory or optional). ii) Generate tables and include primary
and foreign keys. Use the schema notation; you do not have to produce SQL statements.
Example of table generation in schema form: Course(courseId, courseName)
Student (studentId, name, courseId*)
Part B: SQL programming Consider the following “Publication” Database schema, where
AuthorId, CategoryId, PubIicationd, UserId are unique identifiers. Category (CategoryId,
Type) Author (AuthorId, Name) Publisher(PublisherID, PublisherName) Publication
(PublicationId, AuthorId, Title, CategorytId, PublishedYear, PublisherId) Request (UserId,
PublicationId, RequestDate) User (UserId, Name, Email, Password) TABLE: Author
AuthorId Name A011 Dingle R A012 Ransome A A013 Wardale R A014 Alexander T A015
Spurrier S TABLE: Publisher PublisherId PublisherName P01 Pearson P02 HarperCollins
P03 Simon and Schuster TABLE: Category CategoryId Type C911 Short stories C912
Journal articles C913 Biography C914 Illustrations
TABLE: Publication PubId AuthorId Title CategoryId PublishedYear PublisherId P001 A011
The Blue Treacle C911 1911 P01 P002 A012 In Aleppo Once C911 2001 P01 P003 A012
Illustrating Arthur Ransome C914 1973 P03 P004 A012 Ransome the Artist C914 1994
P03 P005 A014 Bohemia In London C912 2008 P02 P006 A011 The Best of Childhood
3. C911 2002 P01 P007 A015 Distilled Enthusiasms C912 2010 P02 TABLE: Request UserId
PublicationId RequestDate U016 P001 05-Oct-2020 U241 P001 28-Sep-2020 U55 P002 08-
Sep-2020 U016 P004 06-Oct-2020 U121 P002 23-Sep-2020 TABLE: User UserId Name
Email Password U111 Kenderine J KenderineJ@hotmail.com Kenj2 U241 Wang F
WangF@hotmail.com Wanf05 U55 Flavel K FlavelK@hotmail.com Flak77 U016 Zidane Z
ZidaneZ@hotmail.com Zidz13 U121 Keita R KeitaR@hotmail.com Keir22
1. Use appropriate data types and write the SQL statements to create the tables defined in
the schema above. (10%) 2. Write SQL Statements to return the following data from the
Publication database. a) The names and the email of the users and the dates of all their
requests. (4%) b) The details
of the publications that were requested during September 2020 in descending order of
requested dates. The details should include publication ids, titles and dates on which
publications were requested. (4%) c) The names of the users, the
title of the publications they requested, and the names of their authors.
(4%) d) The number of publications requested for each of the categories (e.g.
‘Illustrations’). (4%) e) The names of users who requested more
than one publication. (4%) Guidance: Please use standard SQL in
Oracle Live SQL. Indicate clearly the primary keys and the foreign keys. State the SQL
statements and give the results. Select and use appropriate data. The presentation of each
query should have a text summary which includes i) the query itself, ii) the corresponding
SQL statement solution, iii) the result of the execution of the statement and iv) evidence that
you have used standard SQL and implemented each statement on a database (use
screenshots, no need for appendices). A small data sample is given. When appropriate, you
can create and insert additional data records in order to make sure that the queries return
results. Links to external datasets or code and are not acceptable. The report must be self-
contained.
Part C. Sequential and parallel processing Consider a university data store about module
coursework (CW) submissions by students. For each CW submission for a specific module,
each record of the data store holds the following data: 1. Module code 2. Module name 3.
Semester id 4. Year 5. CW title 6. Student id 7. Student name 8. CW submission date 9. Actual
date of CW submission by student Examples of records would have the following values:
(7086CEM, Data Management Systems, S1, 2022, Database Management, 10101010, Lionel
Messi, 28/11/2022, 30/11/2022) (7053CEM, Big Data and Visualisation, S2, 2021, Data
Analysis, 07070707, Cristiano Ronaldo, 21/10/2021, 25/10/2021) (7071CEM, Information
Retrieval, S3, 2020, Comparative Evaluation, 11011010, Zinedine Zidane, 09/09/2020,
25/10/2020) A record will contain a list of key-value pairs, as presented in the lectures. A
block will consist of a sequence of records. You are expected to create similar additional
records for this part in order to produce a mapReduce solution in question 2, below. For
each module, the university Registry would like to determine the number of CW
submissions which were submitted late, by one day or more. 1. Assuming that the data is
stored in a relational database, produce with justification a) the SQL statement to create the
corresponding table for the data store and b) the SQL statement to determine the number of
CW submissions that were submitted late by one day or more, for each module. (You do not
4. need to implement the SQL statements in Oracle Live SQL). (5%) 2. Assuming that the data
is too large to be processed in a centralised manner, and that it is stored in an ordinary file,
produce a decentralised solution, equivalent to 1.b) above, which applies MapReduce to the
data processing. Justify your decisions and all the steps of your solution. (20%) Guidance:
You should study carefully the examples of mapReduce covered in the lecture notes. You
should consider the structure of the key in the (key, value) pair in the original record and in
the mapping stage. This is not a programming exercise. The solution should follow the
structure of the solutions given in the lecture notes.
Part D. Research report Write in your own words a brief report where you introduce and
discuss data warehouses. You should cover the rationale for data warehouses, their
structure and their use. The maximum length of the report is 700 words. Longer reports will
be penalised. (20%) Guidance: Use your own words for the report. Copying and
pasting is plagiarism. You should include relevant references. The maximum length of the
report is 700 words. Longer reports will be penalised.
Marking Rubric Marking Scheme Grade Part A Part B Part C Part D <40 Incorrect
interpretation of scenario and Incomplete formulation of solution Limited identification
of entities and poor annotation of relationships Incorrect generation of relational tables
Limited or absent rationale Poor interpretation of requirements and of queries DDL
and DML SQL statements limited in scope Incomplete and incorrect SQL statements
Absence of rationale Partial understanding of requirements and partially correct SQL
formulation Partial understanding of context and relevance of parallel processing
Incomplete steps in the application of MapReduce Partial justification of design decisions
Lack of understanding of requirements Inadequate identification of issues
Incompetent understanding of structural and processing components Poorly written
essay 40-49 Partial interpretation of scenario and formulation of solution Partially
correct ER diagram with relevant entities and relationships Partial consistency in the
generation of the relational tables Partial justification of design decisions Basic
understanding of requirements and partially correct interpretation Relevant use of DDL
and DML statements in solution formulation Partially complete SQL statements Limited
justification of solution Partial understanding of requirements and partially correct SQL
formulation Partial understanding of context and relevance of parallel processing
Incomplete steps in the application of MapReduce Partial justification of design decisions
Partial interpretation of requirements Limited presentation of key issues Relevant
description of structural and processing components Mostly descriptive essay 50-59
Adequate and consistent interpretation of scenario and satisfactory conceptual modelling
Mostly correct generation of ER diagram with relevant entities and relationships
Relatively competent generation of the relational tables Adequately justified design
decisions Adequate understanding of requirements and mostly correct interpretation
Adequate use of DDL and DML SQL statements in solution formulation Mostly complete
SQL statements Adequate justification of solution Adequate understanding of
requirements and correct SQL formulation Adequate presentation of context of
application of parallel processing Mostly correct application of the different steps of
MapReduce Adequately justified solution Adequate interpretation of requirements
5. Key issues well identified and partially addressed Adequate presentation of key structural
and processing components Adequately written essay 60-69 Good and clear
interpretation of scenario and good solution formulation of the two parts Correct and
complete identification of entities and relationships Consistent generation of relational
tables Good interpretation of requirements and good formulation of solution Correct
use of DDL and DML statements in solution formulation Complete and relevant treatment
of queries Good justification of decisions Good solution to the initial query in terms of
SQL statements Focused presentation of context of application of parallel processing
Clearly stated and correct sequential steps of MapReduce Well expressed rationale
Good understanding and statement of requirements Well focused presentation of main
issues Good description of structural and processing components Well presented
Well expressed rationale essay with some reflection 70+ Very good and clear
interpretation of scenario and excellent solution formulation of the two parts Correct and
coherent identification of well annotated entities and types of relationships Logical and
consistent generation of relational tables Excellent justification of design decisions
Excellent interpretation of requirements and very good formulation of solution Excellent
use of DDL and DML statements in solution formulation Complete treatment of queries
and SQL formulation Very good rationale Very good solution to the initial query in
terms of SQL statements Excellent presentation on the need for an overall parallel
solution Clear deployment and annotation of the sequential steps of MapReduce
Excellent rationale Excellent understanding and interpretation of requirements Very
good identification and formulation of key issues Relevant and specific presentation of
structural and processing components Reflective writing supported by an excellent
structure