Tabloid

Tabloid is a content aggregation service. It aggregates RSS feeds as well as well-known blogs and websites built into the service. Content can be separated into categories and marked read or unread.


  1. TABLOID
     Submitted to: Mr. Sandeep Kumar Singh, Ms. Parul Agarwal
     Submitted by: Shivam Prakash (10103609), Lohitaksh Varshney (10103599)
  2. Introduction
     • Tabloid is a content aggregation service. It aggregates RSS feeds as well as well-known blogs and websites built into the service. Content can be separated into categories and marked read or unread.
     • An RSS feed lets a website publish information once and syndicate it automatically among millions of subscribers. An aggregator (also called a reader) captures and organizes feeds to simplify the consumption of news. Users no longer need to keep revisiting the same site to see what is new; they add RSS feeds to their reader and wait for the news to come to them. Readers give information consumers huge advantages.
     • Tabloid offers a variety of experiences, but for the most part readers swipe upwards to browse a paginated view. The UI is richer and has a magazine style similar to Feedly and Flipboard.
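The capture-and-organize step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the sample feed, the `parse_rss` helper and the dictionary fields are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real aggregator would download this XML from the feed URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Car News</title>
    <item>
      <title>New electric sedan announced</title>
      <link>http://example.com/sedan</link>
    </item>
    <item>
      <title>Road test: compact SUV</title>
      <link>http://example.com/suv</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text, category):
    """Return one dict per <item>, tagged with a user-chosen category and unread by default."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "category": category,
            "read": False,
        }
        for item in channel.findall("item")
    ]


entries = parse_rss(SAMPLE_RSS, category="automobile")
entries[0]["read"] = True  # mark the first article as read
unread = [e["title"] for e in entries if not e["read"]]
```

Keeping `category` and `read` on each entry is one simple way to support the "separated into categories" and "marked read or unread" features; a real service would store these in a database.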
  3. Features of Tabloid
     • Users can customize their own magazine according to their interests through subscriptions.
     • Content can be separated into categories. For example, a user who likes news about cars and related topics can file it under a name of their choosing, such as "automobile".
     • Subscriptions can be easily added or removed. Users can preview a subscription's contents before subscribing.
     • Users no longer need to keep revisiting the same site to see what is new. For example, a user interested in cars is kept updated with related news.
     • Recommendations based on the user's Facebook profile activity suggest other relevant content that might interest them, which overcomes the cold start problem in content-based analysis.
     • Recommendations based on the news feeds read.
     • Recommendations based on the articles read.
  4. Problem Statement
     • The aim of the project is to create a web application that customizes news from various sources and recommends content the user would like.
     • It should aggregate RSS feeds as well as well-known blogs and websites built into the service. Content can be separated into categories and can be marked.
     • It should capture and organize feeds to simplify the consumption of news. Users no longer need to keep revisiting the same site to see what is new; they only need to add RSS feeds to the website and wait for the news to arrive in their subscriptions.
  5. Benefits of the application
     • A simplified user interface where the user can easily find news and feeds by searching for them.
     • Users can review a newspaper or feed before subscribing to it.
     • The application's recommendation system works automatically to surface the most relevant news and feeds that the user would like to read.
     • The application removes repetition of similar news/feeds across subscriptions by selecting the best one among them.
  6. Comparison with present technologies
     Tool: Feedly
     Description: Feedly is a news aggregator application for various Web browsers and mobile devices running iOS and Android, also available as a cloud-based service.
     Weaknesses:
       • No search. Old read/unread feed content cannot be searched as it could with Google Reader.
       • The same news repeats from different subscriptions.
       • Recommendation is slow because there is no prior data about a new user.

     Tool: Flipboard
     Description: Flipboard is a personalized magazine app designed for phones and tablets, but it can be accessed on a PC too. It takes stories from around the web based on the user's interests and delivers them in an attractive visual feed.
     Weaknesses:
       • The same news repeats from different subscriptions.
       • Recommendation is slow because there is no prior data about a new user.

     Tool: Digital newsstand
     Description: A digital newsstand is a digital distribution platform for downloadable newspapers and magazines.
     Weaknesses:
       • The UI is not interactive.
       • No recommendation system is used.
  7. Tools Used
     • Gensim: used to extract semantic topics from documents. A corpus is built and used to infer topics. The statuses fetched previously are then compared with this corpus using LDA, and a topic is assigned to each status. The user's interests are derived from the topics of their statuses, and this information can then be used to find other users with similar interests.
     • Pygoogle: a very basic Google search module for Python. It is limited to 64 results; at present we fetch only the top five.
     • NLTK: a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
     • Goose: used to extract the article, tagline and main image from a specified URL.
     • Feedfinder: a kind of search engine that looks for RSS feeds starting at any top-level URL and crawls the whole site.
  8. Tools Used (continued)
     • Beautiful Soup: a Python library for parsing HTML documents, including those with malformed mark-up such as non-closed tags (it is named after "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which makes it useful for web scraping.
     • Flask: used to run a local server on our desktop that takes the feeds/articles recommended by the Python script and displays them to the user as HTML.
     APIs Used
     • Clipped: analyses text grammatically to extract the most important information from an article. The algorithm identifies sentence structures statistically and graphically determines predicate-subject relationships to generate the best summaries possible.
     • Google RSS Feeder: the Google Feed API takes the pain out of developing mashups in JavaScript, because feeds can be mashed up using only a few lines of JavaScript rather than complex server-side proxies. This makes it easy to quickly integrate feeds on a website.
     • RSS Feed Validator: a validator for syndicated feeds. To use it, enter the address of the feed and click Validate. If the validator finds any problems in the feed, it gives a message for each type of problem and highlights where the problem first occurs in the feed.
  9. Novelty
     • Tabloid gives blog or website readers a single destination for checking updates.
     • Users can customize their own magazine according to their interests through subscriptions.
     • Content can be separated into categories. For example, a user who likes news about cars and related topics can categorize it under a name of their choosing, such as "automobile".
     • Subscriptions can be easily added or removed. Users can preview a subscription's contents before subscribing.
     • Recommendations based on the user's Facebook profile activity suggest other relevant content that might interest them.
     • There is no cold start problem: once the user signs up, the application has access to all the data from the user's profile, so there is always data to work with.
  10. Snapshots
  11. Implementation Issues
     Problem 1
     • A key issue with content-based filtering is whether the system can learn user preferences from the user's actions on one content source and apply them to other content types. This is a cold start problem.
     Solution
     • The user is asked to register with their Facebook account so that a dataset can be created from their statuses and comments.
  12. Problem 2
     • To abstract the features of the items in the system, a tf-idf representation (also called the vector space representation) is generally used, but in our case the results were not optimal.
     Solution
     • Semantic analysis was used instead, as it does not require prior semantic understanding of the documents.
  13. Problem 3
     • Document similarity comparisons were needed to remove repeated documents carrying similar news.
     Solution
     • An LSA model was created to discover the abstract "topics" that occur in a collection of documents.
  14. Algorithms
     Levenshtein distance
     • The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
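The definition above corresponds to a standard dynamic-programming computation. The sketch below is a generic textbook version, not taken from the project; the function name `levenshtein` is chosen only for this example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    # Row 0 of the DP table: turning "" into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into "" takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (costs 0 if the characters match)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.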
  15. Tf-idf
     • Term frequency–inverse document frequency (tf-idf) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others.
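A small sketch may make the weighting concrete. Tf-idf has several variants; this one assumes relative term frequency multiplied by log(N / df), with plain whitespace tokenization. The sample documents and the `tf_idf` helper are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the car show opens today",
    "the new car is electric",
    "the election results are in",
]
w = tf_idf(docs)
```

Here "the" occurs in every document, so its weight is zero, while "show" (one document) outweighs "car" (two documents) within the first document.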
  16. Cosine Similarity
     • Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0,1].
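As a rough illustration of how this measure can flag repeated news, the sketch below compares bag-of-words vectors of headlines. The headlines, the `bag_of_words` helper and the 0.6 threshold are assumptions made for the example, not values from the project.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors given as {key: value} dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


def bag_of_words(text):
    """Word-count vector of a text, using whitespace tokenization."""
    return Counter(text.lower().split())


a = bag_of_words("new electric car unveiled today")
b = bag_of_words("electric car unveiled today by maker")
c = bag_of_words("election results announced")

THRESHOLD = 0.6  # hypothetical cut-off for treating two headlines as the same story
is_duplicate = cosine_similarity(a, b) >= THRESHOLD
```

Because word counts are never negative, all similarities here fall in [0, 1], matching the "positive space" remark above.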
  17. Latent semantic analysis (LSA)
     • Latent semantic analysis (LSA) is a technique in natural language processing, in particular in vectorial semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text.
     • A matrix containing word counts per paragraph (rows represent unique words and columns represent each paragraph) is constructed from a large piece of text, and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of columns while preserving the similarity structure among rows.
     • Words are then compared by taking the cosine of the angle between the two vectors formed by any two rows. Values close to 1 represent very similar words, while values close to 0 represent very dissimilar words.
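The reduction step can be written compactly. The notation below is the conventional one and is assumed here rather than taken from the slides: X is the word-by-paragraph count matrix, k is the number of retained concepts, and \hat{t}_i is the reduced vector for word i.

```latex
\[
X = U \Sigma V^{\top} \;\approx\; U_k \Sigma_k V_k^{\top} = X_k ,
\]
\[
\operatorname{sim}(i, j) =
\frac{\hat{t}_i \cdot \hat{t}_j}{\lVert \hat{t}_i \rVert \, \lVert \hat{t}_j \rVert},
\qquad
\hat{t}_i = \text{row } i \text{ of } U_k \Sigma_k .
\]
```

Keeping only the k largest singular values gives each word a k-dimensional vector, and the cosine between two such vectors is the word similarity described above.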
  18. Limitations of the solution
     New feeds
     • Tabloid lets the user add a news feed simply by entering the feed's URL in the search box. The URL is checked by a URL validator API, and sometimes a correct URL fails validation, which can cause trouble for the user.
     News feeds searched as strings are not added dynamically
     • Once a text is searched, the result does not update immediately, because the corresponding RSS feed is first searched for and then updated in the database.
  19. Findings
     • The latent structure approach is useful for helping people find textual information in large collections. It helps overcome vocabulary problems that pose severe limits on human-computer interaction by automatically extracting underlying semantic factors. In terms of helping users find objects of interest, the LSI approach compares quite favorably with existing keyword-based methods, and suggests some new possibilities for customization and selection.
  20. Findings (continued)
     • Propositions typically represent semantic information at a clause level, while LSA is more successful performing analyses at a sentence or paragraph level. The few words in a clause make the vectors in LSA highly dependent on the words used in that clause, while sentences contain enough words to permit a vector that more accurately captures the semantics of the sentence.
  21. Conclusion
     • Tabloid is a content aggregation service. It aggregates RSS feeds as well as well-known blogs and websites built into the service.
     • With an improved recommendation system, it can stand out compared to other aggregation services.
     • Users no longer need to keep revisiting the same site to see what is new.
     • Subscriptions can be easily added or removed. Users can preview a subscription's contents before subscribing.
  22. Future Work
     • Extend the approach to other social networking websites such as Twitter and LinkedIn to further mitigate the cold start problem.
     • Improve the efficiency of the algorithm for more accurate and faster results.
  23. Thank You
