Hacking RSS: Filtering & Processing Obscene Amounts of Information (short version)
1. Hacking RSS: Filtering & Processing Obscene Amounts of Information
#hackingRSS
Dawn Foster
Intel Community Manager for MeeGo
dawn@fastwonder.com
2. Information Overload
CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
3. Who Cares?
• Most of it is …
  • complete crap
  • out of date / obsolete
  • not interesting to you
  • irrelevant for you
Junk Pile: http://www.flickr.com/photos/zen/4013525/
4. You Want to Find the Needle
Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
5. RSS Alone is a Start
• Sources you care about delivered right to you. But …
  • Do you care about everything in each feed?
  • What about the feeds you aren't subscribed to?
  • Can you keep up with what you have?
6. Prioritize Your Reader
• Put things you care about at the top
• Categorize
• Don't try to read everything
7. The Real Magic is in Filtering RSS
Complete Crap
Interesting
Maybe Relevant
Yay!
• In my Google Reader right now:
  • Analyst research blogs mentioning Online Community
  • Analyst research blogs mentioning MeeGo
  • Searches across social sites mentioning me, my projects, my websites, etc., filtering out things I don't care about
  • My favorite blogs filtered using PostRank to find only the posts with a lot of comments or social mentions
8. RSS Filtering Tools
• Yahoo Pipes (my favorite)
  • More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …)
  • Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever kills it.
• Other Options
  • FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out.
  • RSS readers with filtering / alerts (FeedDemon)
  • Code: write your own filters
• Note: many free RSS filtering services have gone out of business; they can be bandwidth intensive & costly to host.
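For the "write your own filters" option, here is a minimal sketch in Python using only the standard library. The sample feed, the `filter_feed` name and its parameters are assumptions made up for illustration; it keeps items that mention at least one wanted term and none of the unwanted ones, then hands back RSS again.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed so the sketch runs without a network connection.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>MeeGo tablet demo</title><description>Hands-on video</description><link>http://example.com/1</link></item>
<item><title>Celebrity gossip</title><description>Nothing about MeeGo here</description><link>http://example.com/2</link></item>
<item><title>Weekly roundup</title><description>Open source news</description><link>http://example.com/3</link></item>
</channel></rss>"""

def filter_feed(rss_text, include, exclude=(), fields=("title", "description")):
    """Return RSS text keeping items that match an include term and no exclude term."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    for item in list(channel.findall("item")):
        # Search the chosen fields case-insensitively.
        text = " ".join((item.findtext(f) or "") for f in fields).lower()
        wanted = any(term.lower() in text for term in include)
        blocked = any(term.lower() in text for term in exclude)
        if not wanted or blocked:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")

filtered = filter_feed(SAMPLE_FEED, include=["meego"], exclude=["gossip"])
kept_titles = [i.findtext("title") for i in ET.fromstring(filtered).iter("item")]
print(kept_titles)
```

Because the output is still RSS, the result could be served to any feed reader, which is the same "feeds in, filtered feed out" shape the hosted tools offer.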
10. PostRank
• Best posts in a feed
• Ranked on engagement (links, sharing, comments)
• Can get output as RSS feed
• Feed includes PostRank number as a field
11. What's In a Feed? PostRank (Yahoo Pipes View)
• Content in feeds varies wildly depending on site.
• Common: title, author, pubDate, link, content, description
• Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these)
• API: usually has additional data & can output RSS
• If it's in the feed, you can use it!
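One way to see what a feed really contains is to dump every field of an item, including the site-specific ones a reader hides. The sketch below assumes a made-up item; the `pr` namespace URI and the `postrank` element are hypothetical stand-ins, not the real PostRank schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical item: the "pr" namespace URI and element name are invented for illustration.
SAMPLE = """<rss version="2.0" xmlns:pr="http://example.com/ns/postrank">
<channel><item>
<title>Tablet review</title>
<author>jane@example.com</author>
<pubDate>Mon, 04 Oct 2010 12:00:00 GMT</pubDate>
<link>http://example.com/tablet</link>
<description>Short summary</description>
<pr:postrank>7.4</pr:postrank>
</item></channel></rss>"""

def item_fields(item):
    """Map each child tag (namespace URI stripped) to its text."""
    fields = {}
    for child in item:
        name = child.tag.split("}")[-1]  # "{uri}postrank" -> "postrank"
        fields[name] = (child.text or "").strip()
    return fields

item = ET.fromstring(SAMPLE).find("channel/item")
fields = item_fields(item)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Any field that shows up in this listing is something a filter or sort could use.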
12. Reformatting / Modifying RSS Feeds
Don't be satisfied with default RSS feed formats!
Twitter Search
Twitter RSS Feed
Modify & more quickly scan key data
13. Yahoo Pipes: Reformat Twitter Feed
• Input:
  • Twitter Search feed
• Loop String Build:
  • Author
  • : (spacing)
  • Title
• Loop Assign:
  • Store result back into title
• Output:
  • 1 RSS feed
  • Efficient format
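The string-build-then-assign loop above can be approximated outside Pipes in a few lines. This is a sketch under assumptions: the sample feed is invented, `retitle` is a hypothetical helper, and a real Twitter search feed would have used different element names.

```python
import xml.etree.ElementTree as ET

# Made-up feed with an author field per item.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Trying out Yahoo Pipes</title><author>alice</author></item>
<item><title>RSS is not dead</title><author>bob</author></item>
</channel></rss>"""

def retitle(rss_text, field="author", sep=": "):
    """Prefix each item's title with another field, like Loop + String Builder."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        title = item.find("title")
        if title is None:
            continue  # nothing to rewrite for this item
        prefix = item.findtext(field) or ""
        # String build, then assign the result back into the title.
        title.text = f"{prefix}{sep}{title.text or ''}"
    return root

new_titles = [i.findtext("title") for i in retitle(SAMPLE).iter("item")]
print(new_titles)
```

Passing a different `field` gives other compact formats, for example a score followed by the title.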
14. BackTweets (BackType API)
• Data about links on Twitter
• Finds links regardless of shortening service
• No RSS feeds
• But … you can use API + Pipes to build one!
15. BackType + Twitter API + Pipes Output
• Data from BackType + Twitter
• Built an RSS feed using Yahoo Pipes
• Included the information relevant for me
• Could have included or filtered on: name, listed count, location, profile image, user URL, ...
16. Admit it, we ALL do vanity searches
• You can enter your search queries in Google, Twitter, Flickr …
  • Add a new project & you have to update all of them
  • Can be hard to filter out some results
  • May have duplicates from multiple searches
• Yahoo Pipes
  • Update keywords in a CSV file
  • Use CSV file as input into a bunch of searches (RSS or API inputs)
  • Filter out what you don't want
  • Get 1 filtered RSS feed as output
2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
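The keyword-CSV approach might look roughly like this in Python. Everything named here is an assumption for illustration: the CSV layout, the two search endpoints (placeholders, not real services) and the helper names. Fetching and parsing the search results is left out; the sketch covers building one search URL per keyword per site, then merging results while dropping duplicates and unwanted matches.

```python
import csv
import io
from urllib.parse import urlencode

# One keyword per row; edit this file when a new project comes along.
KEYWORDS_CSV = "keyword\nDawn Foster\nMeeGo\nfastwonder\n"

# Hypothetical search endpoints that return RSS; substitute real ones.
SEARCH_TEMPLATES = [
    "http://search.example.com/feed?",
    "http://photos.example.org/search.rss?",
]

def load_keywords(csv_text):
    """Read the keyword column, skipping empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["keyword"].strip() for row in reader if row["keyword"].strip()]

def build_search_urls(keywords, templates=SEARCH_TEMPLATES):
    """One URL per (site, keyword) pair."""
    return [base + urlencode({"q": kw}) for base in templates for kw in keywords]

def merge_results(result_lists, unwanted=()):
    """Combine results, dropping duplicate links and titles with unwanted terms."""
    seen, merged = set(), []
    for items in result_lists:
        for item in items:
            title = item["title"].lower()
            if item["link"] in seen or any(w.lower() in title for w in unwanted):
                continue
            seen.add(item["link"])
            merged.append(item)
    return merged

urls = build_search_urls(load_keywords(KEYWORDS_CSV))
results = merge_results(
    [
        [{"title": "MeeGo news", "link": "http://a.example/1"}],
        [{"title": "MeeGo news", "link": "http://a.example/1"},
         {"title": "MeeGo spam offer", "link": "http://b.example/2"}],
    ],
    unwanted=["spam"],
)
print(len(urls), results)
```

Keeping the keywords in one file means a new project is a one-line change rather than an edit to every saved search.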
17. How Should / Shouldn't You Use All of This?
• Do:
  • Use this for personal productivity
  • Play around, create prototypes and understand the possibilities
• Don't:
  • Don't violate licenses on content or republish w/o permission
  • Don't use in critical or production environments
• For production use or putting data on websites:
  • Re-write in a real programming language with cached results and error checking
XKCD Comic: http://xkcd.com/327/
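As a rough idea of what "cached results and error checking" could mean in a rewrite, the sketch below wraps any fetch function with a time-based cache and falls back to the last good copy when a fetch fails. `CachedFetcher` and its parameters are hypothetical names, and the fake fetch function stands in for a real HTTP call.

```python
import time

class CachedFetcher:
    """Cache fetch results for ttl_seconds; serve a stale copy if a refresh fails."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # url -> (timestamp, body)

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # still fresh, no network call
        try:
            body = self._fetch(url)
        except OSError:
            if cached:
                return cached[1]  # refresh failed: serve the stale copy
            raise  # nothing cached, so surface the error
        self._cache[url] = (now, body)
        return body

# Demo with a fake fetch that works once and then fails.
calls = []
def fake_fetch(url):
    calls.append(url)
    if len(calls) > 1:
        raise OSError("feed unavailable")
    return "<rss/>"

now = [0.0]
fetcher = CachedFetcher(fake_fetch, ttl_seconds=60, clock=lambda: now[0])
first = fetcher.get("http://example.com/feed")   # fetched
second = fetcher.get("http://example.com/feed")  # cache hit
now[0] = 120.0
third = fetcher.get("http://example.com/feed")   # expired, fetch fails, stale copy served
print(first, second, third, len(calls))
```

Caching also cuts the number of upstream requests, which matters once a page on a public site depends on the feed.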
18. Learn More
About Dawn:
• Intel Community Manager for MeeGo
• Author of Companies and Communities
• More Info: http://fastwonderblog.com
• Dawn@FastWonder.com
• @geekygirldawn on Twitter
Additional Reading & audio from 1 hour version of this talk:
• http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
21. Yahoo Pipes: Reformat PostRank Feed
• Input:
  • 3 PostRank feeds
• Loop String Build:
  • PostRank
  • : (spacing)
  • Title
• Loop Assign:
  • Store result back into title
• Output:
  • 1 RSS feed
  • Efficient format
22. Yahoo Pipes PostRank Example
• Input PostRank feeds:
  • Engadget
  • CrunchGear
  • Boy Genius
• Filter by content
  • Tablet
• Sort:
  • PostRank
• Output
  • 1 RSS feed
  • Best tablet posts
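The merge, filter and sort steps of this pipe can be sketched as follows. The post titles and PostRank numbers are invented sample data, and `best_posts` is a hypothetical helper; real input would come from parsing the three PostRank feeds.

```python
# Invented sample data standing in for three parsed PostRank feeds.
FEEDS = {
    "Engadget": [
        {"title": "New tablet announced", "postrank": 8.2},
        {"title": "Camera roundup", "postrank": 9.1},
    ],
    "CrunchGear": [
        {"title": "Tablet pricing leaks", "postrank": 6.5},
    ],
    "Boy Genius": [
        {"title": "Phone firmware update", "postrank": 7.0},
        {"title": "Hands-on with a 7-inch tablet", "postrank": 9.7},
    ],
}

def best_posts(feeds, keyword, field="title"):
    """Merge all feeds, keep items mentioning keyword, sort by PostRank (highest first)."""
    merged = [dict(item, source=name) for name, items in feeds.items() for item in items]
    matching = [i for i in merged if keyword.lower() in i[field].lower()]
    return sorted(matching, key=lambda i: i["postrank"], reverse=True)

top = best_posts(FEEDS, "tablet")
for post in top:
    print(post["postrank"], post["source"], post["title"])
```

Filtering before sorting keeps the sort small, although with feeds of this size the order makes no practical difference.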
23. Using Web APIs 101
• Many API calls are basically URLs
• Constructing URLs
  • Use API documentation/examples to format the URL
  • http://api.twitter.com/1/statuses/show/ID.xml
  • Version 1 of the API: show status for ID in .format
• API keys
  • Tell the API who you are (like a password)
• Rate limiting
  • Only get so much & you're cut off
  • Limited by IP or API key
  • Chill out for a while & come back
XKCD Comic: http://xkcd.com/844/
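"Chill out for a while & come back" can be automated with a retry loop that waits longer after each refusal. This is a sketch under assumptions: `RateLimited` is a hypothetical exception that your own fetch function would raise when the API cuts you off, and the fake fetch below replaces a real HTTP call. The URL is the one from the slide and is used only as a string.

```python
import time

class RateLimited(Exception):
    """Hypothetical error a fetch function raises when the API cuts you off."""

def fetch_with_backoff(fetch, url, retries=3, first_wait=1.0, sleep=time.sleep):
    """Call fetch(url); on RateLimited, wait and retry with a doubling delay."""
    wait = first_wait
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == retries:
                raise  # out of retries: let the caller decide
            sleep(wait)
            wait *= 2

# Demo with a fake fetch that is refused twice, then succeeds.
attempts = []
def fake_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise RateLimited(url)
    return "<status/>"

waits = []  # record the waits instead of actually sleeping
body = fetch_with_backoff(
    fake_fetch, "http://api.twitter.com/1/statuses/show/ID.xml", sleep=waits.append
)
print(body, waits)
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` would apply.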
24. Backtweets API + Twitter API + Yahoo Pipes
• What we want to do:
  • Start with a set of URLs (blog posts in a feed)
  • Find any tweet mentioning those URLs
  • Return the tweet and data about the person who posted it
• Mission: Build feed using only data from these 2 APIs
• BackType API provides Tweet ID (not humanly useful)
  • http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
  • List of Twitter Status IDs for Tweets linking to URL
  • Note: I think this feature may be deprecated
• Twitter API uses Tweet ID to get everything else
  • http://api.twitter.com/1/statuses/show/ID.xml
  • Returns a single status with all relevant data for ID
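Building the two request URLs from the templates on this slide could look like the sketch below. The helper names and the example post URL and key are assumptions; the URL shapes are copied from the slide, and both services may no longer respond, so nothing is fetched here.

```python
from urllib.parse import urlencode

# URL shapes taken from the slide; shown for their structure only.
BACKTYPE_BASE = "http://api.backtype.com/tweets/search/links.xml"
TWITTER_STATUS = "http://api.twitter.com/1/statuses/show/{tweet_id}.xml"

def backtype_url(post_url, api_key):
    """URL asking BackType for tweets that link to post_url."""
    # urlencode escapes the blog post URL so it can travel inside a query string.
    return BACKTYPE_BASE + "?" + urlencode({"q": post_url, "mode": "batch", "key": api_key})

def twitter_status_url(tweet_id):
    """URL asking Twitter for the full status behind one tweet ID."""
    return TWITTER_STATUS.format(tweet_id=tweet_id)

step1 = backtype_url("http://example.com/post?id=1", "KEY")
step2 = twitter_status_url(12345)
print(step1)
print(step2)
```

The escaping step is the part that is easy to miss by hand: a blog post URL containing `?` or `&` would otherwise be read as extra parameters of the API call.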
25. BackTweets API: Get Tweet ID
• Take WebWorkerDaily Author Feed
• Use WWD URLs to build URLs for BackType API call
• Fetch data from BackType URLs to get Tweet ID
26. Twitter API: Get Data Based on Tweet ID
• Use BackType tweet ID to build URL for Twitter API
• Fetch data about Tweet & User from Twitter API
• Re-build title to show "user (followers): tweet"
27. Add Filters to BackType + Twitter Example
• Show only tweets from people with 1000+ followers
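The last two steps, rebuilding the title as "user (followers): tweet" and keeping only accounts with 1000+ followers, are sketched below. The tweet records are invented, and the dictionary keys are assumptions about what you would extract from the Twitter API response.

```python
# Invented records standing in for data pulled from the Twitter API.
tweets = [
    {"user": "alice", "followers": 2500,
     "text": "Great post on RSS filtering http://example.com/p"},
    {"user": "bob", "followers": 40,
     "text": "Reading http://example.com/p"},
]

def build_feed_items(tweets, min_followers=1000):
    """Keep tweets from accounts at or above min_followers and rebuild the title."""
    return [
        {"title": f"{t['user']} ({t['followers']}): {t['text']}"}
        for t in tweets
        if t["followers"] >= min_followers
    ]

items = build_feed_items(tweets)
print(items)
```

Putting the follower count in the title means the number is visible in any reader, even ones that hide custom fields.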