SlideShare ist ein Scribd-Unternehmen logo
1 von 4
Downloaden Sie, um offline zu lesen
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 749
Socially Smart an Aggregation System for Social Media using
Web Scraping
Namburu Srikanth1, Vennapusa Tejaswini2, Chethala Harish3, D. Praveen Kumar4
1,2,3Student, Computer Science and Engineering, Hindustan Institute of Technology and Science, Tamilnadu, India
4Assistant Professor, Department of Computer Science and Engineering, Hindustan Institute of Technology and
Science, Tamilnadu, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract -Abstract - Socially Smart is a dedicated platform
that aggregates all the latest social media posts from multiple
social media web-based platforms such as famous reddit blog,
Onion News and GitHub and summarizes them to display in a
short and crisp words. This online web-based platform
provides a service type interactiontousersacrosstheweb. The
main motto of this application is to access the required and
useful social media content at same place which leads to save
the time being spent on social media. It will fetch the top and
trending posts without less priority posts. This Socially Smart
platform uses Web Crawling or Web Scraping and API tofetch
the top posts form various social media sites. The web
crawling method fetches the posts form the mentioned URL,
similarly the API Based method also gets the top posts from
the websites API URL which is given already. The userswillget
excellent experience on this platform as it fetches data form
the most used social media websites and the data is also the
most commented and trending posts only and presents them
with simple and pleasing User Interface.
Keywords – Social Media aggregator, Web Scraping / Web
Scraping and API.
1. INTRODUCTION
With the creation of new social media platforms, there is a
rapid growth of Social Media users. There web applications
meant for different purposes. Following these websites is
very time consuming and mostlyunimportantinformationis
consumed in these sites. In this busy world it is very difficult
for an individual to find time for every social website may it
be regional news, coding blogs or reddit. This online
platform helps to reduce time spent on these websites by
collecting the data fromthosethroughwebcrawling,API and
presenting the top and most trending data online.
This web-based application acts as a common point tosocial
media sites. Socially Smart fetches the individual posts form
the social media websites and provides them in a single
flexible platform. It reduces the time being spent on social
media and the efficiency of the useful info is also increases
since the posts fetched the most commented and trending
posts on respective websites. The struggle to visit every
website reduces. The users can get full up-to-minute daily
trending posts.
Data Science which is basically a data-driven subject about
how enormous data should be handled and finding easier
ways to extract knowledge or patterns from the data in
various forms, either structured or unstructured. Socially
Smart uses a technique called as Web scraping/Web
crawling which is part of Data Science. Web Crawling/ Web
Scraping is a procedure to acquire the information from the
web-based applications which don’t have API functionality.
This method focuses on converting the formlessdata (HTML
/ XML format) on the web to structured data that can be
stored in database such as SQL-Lite in meaningful format.
2. RELATED WORK
The Socially Smart system architecture is asshowninFigure
1.
Fig 1. Socially Smart system Architecture
2.1 Existing System:
In the existing system, not all the information is available
under a single roof. Therefore, this becomes a problem of
time consumption. Not all the websites will have the same
information. So, the social media user will need to reach
multiple websites in order to see the posts. Therefore, to
solve these issues, the proposed system which we are
making provides information available for the user under a
single roof. There are some existing platforms such as
Newsone which provides news in a consolidated manner
which acts as a news aggregator.
2.1.1 Existing System: Disadvantages
- Provides the source URL for only the news. In this
system it just displays the name of only digital
newspapers available in the web.
Databas
e Server
Backend
Python
Django
Applicatio
n Server
Front End
HTML
Browser
Internet
Request
Response
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 750
- Displays all the news that scraped without any
narrowing of news totopsearchedormostfollowed
post.
- Not Categorized, This System provides the latest
news in a short manner but is not Categorize from
which website it scrapes data from.
2.2 Proposed System:
We have so many social media applicationsopening
every app and using them is time consuming. User cannot
visit each and every social media website separatelytouseit
for hours so I propose Socially Smart:Anaggregationsystem
for social media using Web scraping for fetching the data
form a couple of websites and displays the posts to the user
at a single place. First the data is fetched using two different
methods one is Web Scraping / Web Crawling and the other
one is fetching the data using websites api. Once the data is
fetched from the websites it needs to be stored in a database
for that purpose, we are using MySQL database.Forscraping
of data, we use the library called Beautiful Soup.
Once the data is scraped and stored in SQL database we use
Django framework for creating a web-based platform and
displaying the data from the database.WechoosetheDjango
because it is relatively easy to create websitesina shorttime
and without compromising the security of the website. On
top of Django the html templates are relatively simple and
does the work of displaying title to the users. If the users
want to see a particular post, he can click on the title it will
take him directly to the post in respective site. The posts
have given heading from which the posts are came from and
number of comments also for the reddit posts. Along with
this user have a functionality to take notes if he finds any
post interesting and want to research about it or want to
save it, he can simply click on new note and can write title
and the link of the post for future study. It will be there
permanently until the user decides to delete it. Yes,ithasthe
functionality to delete or update the texts.
2.2.1 Advantages of the proposed system:
- Latest posts from different social media websites
can be made available to people.
- It updates you with social media posts time to time
when the trending post changes in the websites.
- Get a flexible and simple readingexperiencewithits
simple UI/UX.
- Have the notes taking functionality for future
reference.
3. Working principle of Socially Smart
3.1 Application Logic:
First it visits the social media websitesspecifiedand
scrapes the data once done it will store the data in
the SQL lite database deleting the previous data presentinit.
For this the user needs to give the event triggering click in
home page. Once the data is stored in the database the logic
will analyse the data and keeps only the high commented
and trending posts by deleting the remaining. Now theposts
in the database are finally displayed to user in simple UI/UX.
Fig 2. Application Logic
3.2 DATABASE SCHEMA
The schema contains three classes reddit, Onion news and
one for the Notes functionality.
The three classes are -
 Headlines - Onion News
 Story - Reddit
 Notes - For Notes App.
Fig 3. Database Schema
3.3 Web Scraping / Web Crawling:
Web scraping, often called web crawling or web
spidering or programmaticallygoingovera collectionof web
pages and extracting data, is a powerful tool for working
with data on the web. It is a technique employed to extract
large amounts of data from websites whereby the data is
extracted and saved to a local file in your computer or to a
database in table format. Web scraping services is the
technique of automating this process, so that instead of
manually copying the data from websites, the Web Scraping
software will perform the same task within a fraction of the
time.
3.3.1 Web Crawling steps
Here are some basic steps performed by most web crawler:
My SQL Database
Web
Application Logic
Database Interface
UI
Database Interface
Fetcher
UI
Text
Url
Text
Url
Link
Url
Comments
NOTES
HEADLIN
E
STORY
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 751
1. Enter a URL and use a HTTP request to access the
URL
2. Now fetch all the contents in the URL and parse the
data
3. Store the data in any desired database.
4. Enqueue all the URLs in a page.
5. Use the URLs in queue and repeat from process 1.
Figure 4. Shows the Stages of Web Crawling
4. RESULTS
The most trending and most commented content
will get crawled by a click. Finally, the reader will be able to
get the posts from different social media. The user can also
take notes and store URL if he wants this functionality is
provided. The Figure 4 showsthesampleUIofSociallySmart
where the posts are categorized by their websitesalongwith
comments for reddit posts and the functionality of notes
taking.
Fig 5. The sample UI of Socially Smart
5. CONCLUSION:
Socially Smart is a meaningful platform which
reduces the user time being spent on social media by
fetching all the required data to a single place reducing the
time spent on opening every website. Instead of reading all
the posts the user gets to read only top posts only so more
efficiency is achieved. User can also take notes which
altogether makes it a great Social media aggregation
platform.
6. FUTUREWORK:
In future one can develop mobile application of this
system so that mobile user can easily access thisapplication.
Adding new websites such as GitHub, Facebook, twitter and
some more news websites will make it very good and
flexible.
A research work is going to enhance the Socially
Smart which can be personalized and customized according
to the reader. For an example, if a reader has interest in
sports the sports news will be shown as a priority. And also,
additional tab called Education can be added so that the
students will get the latest posts about education and
current affairs which will help them to crack thecompetitive
examinations
7. REFERENCES:
1. Sandeep Sirsat and Vinay Chavan , “ Pattern
matching for extraction of core contents from news
web pages” IEEE Transaction on Web Research
(ICWR),23 June 2016.
2. Kolari P and Joszhi.A, “Application of Web Scraping
and Google API service”, IEEE Transactions on
Knowledge and Data Engineering, Vol.6,No.4,2014.
3. Deepak Kumar Mahto and Lisha Singh, “A Dive into
Web Scraper World”, IEEE Transaction on
Computing for Sustainable Global Development,
Vol.2, No.1, March 2014.
4. Suraj B. Karale and G.A. Patil, “Extracting brief note
from Internet Newspaper”, IEEE Transaction on
Computing for Sustainable Global Development,
Vol.45, No.1 March 2016.
5. TehPohey Lee, Abdul Azim Abdul Ghani and Chang
Yu Huang,”Survey on application tools of Really
Simple Syndication (RSS): A case study at Klang
Valley”, Vol.3, 2008.
6. S.K. Malik, S.M. Ravi, "Information Extraction Using
Web Usage Mining Web Scrapping and Semantic
Annotation", IEEE International Conference on
Computational Intelligence and Communication
Networks (CICN), 2011.
7. Fister, S. Fong, Yan Zhuang, "Data Reconstructionof
Abandoned Websites", IEEE 2nd International
Symposiumon Computational and Business
Intelligence (ISCBI), 2014.
DOCUMENT
LOAD
PARSING
EXTRACTION
TRANSFORMATION
Pull the complete
webpage (HTML)
Parse the HTML
document into
something the script
can understand
Use the result
of parsing to
extract the
desired
content
Convert the
data into
meaningful
format
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 752
8. Quang Thai Le, D Pishva, "Application of Web
Scraping and Google API service to optimize
convenience stores distribution", 17th IEEE
International Conference on Advanced
Communication Technology (ICACT), 2015.
9. Zixuan Zhuang Mehdi Bahrami, Mukesh Singhal, "A
cloud-based web crawler architecture", 18th
International Conference on Intelligence in Next
Generation Networks 978-1-4799-1866-9,pp.216-
223, 2015.

Weitere ähnliche Inhalte

Ähnlich wie IRJET- Socially Smart an Aggregation System for Social Media using Web Scraping

Efficient and effective video sharing in online Social network using revocati...
Efficient and effective video sharing in online Social network using revocati...Efficient and effective video sharing in online Social network using revocati...
Efficient and effective video sharing in online Social network using revocati...IRJET Journal
 
Implementation of Sentimental Analysis of Social Media for Stock Prediction ...
Implementation of Sentimental Analysis of Social Media for Stock  Prediction ...Implementation of Sentimental Analysis of Social Media for Stock  Prediction ...
Implementation of Sentimental Analysis of Social Media for Stock Prediction ...IRJET Journal
 
Decentralized Social Network
Decentralized Social NetworkDecentralized Social Network
Decentralized Social NetworkIRJET Journal
 
online blogging system
online blogging systemonline blogging system
online blogging system001vaibhav
 
Ijsred v2 i5p95
Ijsred v2 i5p95Ijsred v2 i5p95
Ijsred v2 i5p95IJSRED
 
IRJET- Summarized News Application using TF-IDF
IRJET- Summarized News Application using TF-IDFIRJET- Summarized News Application using TF-IDF
IRJET- Summarized News Application using TF-IDFIRJET Journal
 
online news portal system
online news portal systemonline news portal system
online news portal systemArman Ahmed
 
IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...
IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...
IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...IRJET Journal
 
IRJET- Android Application for WIFI based Library Book Locator
IRJET-  	  Android Application for WIFI based Library Book LocatorIRJET-  	  Android Application for WIFI based Library Book Locator
IRJET- Android Application for WIFI based Library Book LocatorIRJET Journal
 
IJSRED-V2I5P30
IJSRED-V2I5P30IJSRED-V2I5P30
IJSRED-V2I5P30IJSRED
 
IRJET- News Recommendation based on User Preferences and Location
IRJET-  	  News Recommendation based on User Preferences and LocationIRJET-  	  News Recommendation based on User Preferences and Location
IRJET- News Recommendation based on User Preferences and LocationIRJET Journal
 
Vertex – The All in one Web Application
Vertex – The All in one Web ApplicationVertex – The All in one Web Application
Vertex – The All in one Web ApplicationIRJET Journal
 
IRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data VisualizationIRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data VisualizationIRJET Journal
 
IRJET- Animal Welfare and Wellness Application using Javascript
IRJET- Animal Welfare and Wellness Application using JavascriptIRJET- Animal Welfare and Wellness Application using Javascript
IRJET- Animal Welfare and Wellness Application using JavascriptIRJET Journal
 
IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)
IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)
IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)IRJET Journal
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine ScrapperIRJET Journal
 
WEB 2.0
WEB 2.0WEB 2.0
WEB 2.0ARJUN
 

Ähnlich wie IRJET- Socially Smart an Aggregation System for Social Media using Web Scraping (20)

Efficient and effective video sharing in online Social network using revocati...
Efficient and effective video sharing in online Social network using revocati...Efficient and effective video sharing in online Social network using revocati...
Efficient and effective video sharing in online Social network using revocati...
 
Implementation of Sentimental Analysis of Social Media for Stock Prediction ...
Implementation of Sentimental Analysis of Social Media for Stock  Prediction ...Implementation of Sentimental Analysis of Social Media for Stock  Prediction ...
Implementation of Sentimental Analysis of Social Media for Stock Prediction ...
 
Decentralized Social Network
Decentralized Social NetworkDecentralized Social Network
Decentralized Social Network
 
Ceramic invoice final
Ceramic invoice finalCeramic invoice final
Ceramic invoice final
 
online blogging system
online blogging systemonline blogging system
online blogging system
 
Ijsred v2 i5p95
Ijsred v2 i5p95Ijsred v2 i5p95
Ijsred v2 i5p95
 
tweet segmentation
tweet segmentation tweet segmentation
tweet segmentation
 
IRJET- Summarized News Application using TF-IDF
IRJET- Summarized News Application using TF-IDFIRJET- Summarized News Application using TF-IDF
IRJET- Summarized News Application using TF-IDF
 
online news portal system
online news portal systemonline news portal system
online news portal system
 
IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...
IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...
IRJET- Implementation of E-Marketing using SPIP, Mysql Database, Html, and Bo...
 
IRJET- Android Application for WIFI based Library Book Locator
IRJET-  	  Android Application for WIFI based Library Book LocatorIRJET-  	  Android Application for WIFI based Library Book Locator
IRJET- Android Application for WIFI based Library Book Locator
 
IJSRED-V2I5P30
IJSRED-V2I5P30IJSRED-V2I5P30
IJSRED-V2I5P30
 
IRJET- News Recommendation based on User Preferences and Location
IRJET-  	  News Recommendation based on User Preferences and LocationIRJET-  	  News Recommendation based on User Preferences and Location
IRJET- News Recommendation based on User Preferences and Location
 
Vertex – The All in one Web Application
Vertex – The All in one Web ApplicationVertex – The All in one Web Application
Vertex – The All in one Web Application
 
IRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data VisualizationIRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data Visualization
 
IRJET- Animal Welfare and Wellness Application using Javascript
IRJET- Animal Welfare and Wellness Application using JavascriptIRJET- Animal Welfare and Wellness Application using Javascript
IRJET- Animal Welfare and Wellness Application using Javascript
 
IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)
IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)
IRJET- Saas: Sharepoint Online Implementation as Platform (Task Monitor)
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine Scrapper
 
Complete-Mini-Project-Report
Complete-Mini-Project-ReportComplete-Mini-Project-Report
Complete-Mini-Project-Report
 
WEB 2.0
WEB 2.0WEB 2.0
WEB 2.0
 

Mehr von IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

Mehr von IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Kürzlich hochgeladen

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 

Kürzlich hochgeladen (20)

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

IRJET- Socially Smart an Aggregation System for Social Media using Web Scraping

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 749 Socially Smart an Aggregation System for Social Media using Web Scraping Namburu Srikanth1, Vennapusa Tejaswini2, Chethala Harish3, D. Praveen Kumar4 1,2,3Student, Computer Science and Engineering, Hindustan Institute of Technology and Science, Tamilnadu, India 4Assistant Professor, Department of Computer Science and Engineering, Hindustan Institute of Technology and Science, Tamilnadu, India ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract -Abstract - Socially Smart is a dedicated platform that aggregates all the latest social media posts from multiple social media web-based platforms such as famous reddit blog, Onion News and GitHub and summarizes them to display in a short and crisp words. This online web-based platform provides a service type interactiontousersacrosstheweb. The main motto of this application is to access the required and useful social media content at same place which leads to save the time being spent on social media. It will fetch the top and trending posts without less priority posts. This Socially Smart platform uses Web Crawling or Web Scraping and API tofetch the top posts form various social media sites. The web crawling method fetches the posts form the mentioned URL, similarly the API Based method also gets the top posts from the websites API URL which is given already. The userswillget excellent experience on this platform as it fetches data form the most used social media websites and the data is also the most commented and trending posts only and presents them with simple and pleasing User Interface. Keywords – Social Media aggregator, Web Scraping / Web Scraping and API. 1. INTRODUCTION With the creation of new social media platforms, there is a rapid growth of Social Media users. There web applications meant for different purposes. Following these websites is very time consuming and mostlyunimportantinformationis consumed in these sites. In this busy world it is very difficult for an individual to find time for every social website may it be regional news, coding blogs or reddit. This online platform helps to reduce time spent on these websites by collecting the data fromthosethroughwebcrawling,API and presenting the top and most trending data online. This web-based application acts as a common point tosocial media sites. Socially Smart fetches the individual posts form the social media websites and provides them in a single flexible platform. It reduces the time being spent on social media and the efficiency of the useful info is also increases since the posts fetched the most commented and trending posts on respective websites. The struggle to visit every website reduces. The users can get full up-to-minute daily trending posts. Data Science which is basically a data-driven subject about how enormous data should be handled and finding easier ways to extract knowledge or patterns from the data in various forms, either structured or unstructured. Socially Smart uses a technique called as Web scraping/Web crawling which is part of Data Science. Web Crawling/ Web Scraping is a procedure to acquire the information from the web-based applications which don’t have API functionality. This method focuses on converting the formlessdata (HTML / XML format) on the web to structured data that can be stored in database such as SQL-Lite in meaningful format. 2. RELATED WORK The Socially Smart system architecture is asshowninFigure 1. Fig 1. Socially Smart system Architecture 2.1 Existing System: In the existing system, not all the information is available under a single roof. Therefore, this becomes a problem of time consumption. Not all the websites will have the same information. So, the social media user will need to reach multiple websites in order to see the posts. Therefore, to solve these issues, the proposed system which we are making provides information available for the user under a single roof. There are some existing platforms such as Newsone which provides news in a consolidated manner which acts as a news aggregator. 2.1.1 Existing System: Disadvantages - Provides the source URL for only the news. In this system it just displays the name of only digital newspapers available in the web. Databas e Server Backend Python Django Applicatio n Server Front End HTML Browser Internet Request Response
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 750 - Displays all the news that scraped without any narrowing of news totopsearchedormostfollowed post. - Not Categorized, This System provides the latest news in a short manner but is not Categorize from which website it scrapes data from. 2.2 Proposed System: We have so many social media applicationsopening every app and using them is time consuming. User cannot visit each and every social media website separatelytouseit for hours so I propose Socially Smart:Anaggregationsystem for social media using Web scraping for fetching the data form a couple of websites and displays the posts to the user at a single place. First the data is fetched using two different methods one is Web Scraping / Web Crawling and the other one is fetching the data using websites api. Once the data is fetched from the websites it needs to be stored in a database for that purpose, we are using MySQL database.Forscraping of data, we use the library called Beautiful Soup. Once the data is scraped and stored in SQL database we use Django framework for creating a web-based platform and displaying the data from the database.WechoosetheDjango because it is relatively easy to create websitesina shorttime and without compromising the security of the website. On top of Django the html templates are relatively simple and does the work of displaying title to the users. If the users want to see a particular post, he can click on the title it will take him directly to the post in respective site. The posts have given heading from which the posts are came from and number of comments also for the reddit posts. Along with this user have a functionality to take notes if he finds any post interesting and want to research about it or want to save it, he can simply click on new note and can write title and the link of the post for future study. It will be there permanently until the user decides to delete it. Yes,ithasthe functionality to delete or update the texts. 2.2.1 Advantages of the proposed system: - Latest posts from different social media websites can be made available to people. - It updates you with social media posts time to time when the trending post changes in the websites. - Get a flexible and simple readingexperiencewithits simple UI/UX. - Have the notes taking functionality for future reference. 3. Working principle of Socially Smart 3.1 Application Logic: First it visits the social media websitesspecifiedand scrapes the data once done it will store the data in the SQL lite database deleting the previous data presentinit. For this the user needs to give the event triggering click in home page. Once the data is stored in the database the logic will analyse the data and keeps only the high commented and trending posts by deleting the remaining. Now theposts in the database are finally displayed to user in simple UI/UX. Fig 2. Application Logic 3.2 DATABASE SCHEMA The schema contains three classes reddit, Onion news and one for the Notes functionality. The three classes are -  Headlines - Onion News  Story - Reddit  Notes - For Notes App. Fig 3. Database Schema 3.3 Web Scraping / Web Crawling: Web scraping, often called web crawling or web spidering or programmaticallygoingovera collectionof web pages and extracting data, is a powerful tool for working with data on the web. It is a technique employed to extract large amounts of data from websites whereby the data is extracted and saved to a local file in your computer or to a database in table format. Web scraping services is the technique of automating this process, so that instead of manually copying the data from websites, the Web Scraping software will perform the same task within a fraction of the time. 3.3.1 Web Crawling steps Here are some basic steps performed by most web crawler: My SQL Database Web Application Logic Database Interface UI Database Interface Fetcher UI Text Url Text Url Link Url Comments NOTES HEADLIN E STORY
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 751 1. Enter a URL and use a HTTP request to access the URL 2. Now fetch all the contents in the URL and parse the data 3. Store the data in any desired database. 4. Enqueue all the URLs in a page. 5. Use the URLs in queue and repeat from process 1. Figure 4. Shows the Stages of Web Crawling 4. RESULTS The most trending and most commented content will get crawled by a click. Finally, the reader will be able to get the posts from different social media. The user can also take notes and store URL if he wants this functionality is provided. The Figure 4 showsthesampleUIofSociallySmart where the posts are categorized by their websitesalongwith comments for reddit posts and the functionality of notes taking. Fig 5. The sample UI of Socially Smart 5. CONCLUSION: Socially Smart is a meaningful platform which reduces the user time being spent on social media by fetching all the required data to a single place reducing the time spent on opening every website. Instead of reading all the posts the user gets to read only top posts only so more efficiency is achieved. User can also take notes which altogether makes it a great Social media aggregation platform. 6. FUTUREWORK: In future one can develop mobile application of this system so that mobile user can easily access thisapplication. Adding new websites such as GitHub, Facebook, twitter and some more news websites will make it very good and flexible. A research work is going to enhance the Socially Smart which can be personalized and customized according to the reader. For an example, if a reader has interest in sports the sports news will be shown as a priority. And also, additional tab called Education can be added so that the students will get the latest posts about education and current affairs which will help them to crack thecompetitive examinations 7. REFERENCES: 1. Sandeep Sirsat and Vinay Chavan , “ Pattern matching for extraction of core contents from news web pages” IEEE Transaction on Web Research (ICWR),23 June 2016. 2. Kolari P and Joszhi.A, “Application of Web Scraping and Google API service”, IEEE Transactions on Knowledge and Data Engineering, Vol.6,No.4,2014. 3. Deepak Kumar Mahto and Lisha Singh, “A Dive into Web Scraper World”, IEEE Transaction on Computing for Sustainable Global Development, Vol.2, No.1, March 2014. 4. Suraj B. Karale and G.A. Patil, “Extracting brief note from Internet Newspaper”, IEEE Transaction on Computing for Sustainable Global Development, Vol.45, No.1 March 2016. 5. TehPohey Lee, Abdul Azim Abdul Ghani and Chang Yu Huang,”Survey on application tools of Really Simple Syndication (RSS): A case study at Klang Valley”, Vol.3, 2008. 6. S.K. Malik, S.M. Ravi, "Information Extraction Using Web Usage Mining Web Scrapping and Semantic Annotation", IEEE International Conference on Computational Intelligence and Communication Networks (CICN), 2011. 7. Fister, S. Fong, Yan Zhuang, "Data Reconstructionof Abandoned Websites", IEEE 2nd International Symposiumon Computational and Business Intelligence (ISCBI), 2014. DOCUMENT LOAD PARSING EXTRACTION TRANSFORMATION Pull the complete webpage (HTML) Parse the HTML document into something the script can understand Use the result of parsing to extract the desired content Convert the data into meaningful format
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 752 8. Quang Thai Le, D Pishva, "Application of Web Scraping and Google API service to optimize convenience stores distribution", 17th IEEE International Conference on Advanced Communication Technology (ICACT), 2015. 9. Zixuan Zhuang Mehdi Bahrami, Mukesh Singhal, "A cloud-based web crawler architecture", 18th International Conference on Intelligence in Next Generation Networks 978-1-4799-1866-9,pp.216- 223, 2015.