Before Atom, RSS, and the like had really caught on, many people were looking for ways to handle syndicated content. This was a pretty successful talk that I ended up giving quite a bit.
2001: Bridging the Gap between RSS and Java Old School Style
Enabling Live Newsfeeds using RSS, Servlets and Transformations
Russell Castagnaro
Russell@4charity.com
Introduction
Presenter
• Russell Castagnaro
• Chief Mentor
• 4Charity.com
• SyncTank Solutions, Inc
• russell@4charity.com
• Experience
Introduction
4Charity.com
• Application Service Provider for the Non-Profit industry
• Pure Java development
• http://www.4charity.com
• Locations:
  • San Francisco, CA (HQ)
  • Honolulu, HI (Tech Team)
Goals?
Leverage the Servlet 2.2 API
Employ XML for data and configuration
Use Resource Description Framework (RDF) for content data
Format XML using XSL Transformations (XSLT)
Eliminate hard-coded values
What’s the deal?
Newsfeeds are becoming a requirement for portal sites.
Easy integration with existing web services is a key requirement.
How can we avoid writing custom code for information providers?
Can we avoid applets!!?
Background
In 1999 I wrote an information portal application.
Live newsfeeds seemed like a good idea.
I wrote custom parsers and employed an open-source tool called Cocoon.
Every time the HTML changed, I had to recode!
Code Example
Needed a different ‘ParseSpec’ for each content provider
URLToXMLConsumer.java
SpaceProducer.java
These worked great for 2 months...
‘ParseSpec’
#HeadlineEntry
cacheTime=6000
HeadlineEntry=start=\n,end=<p>,attributes=Link,URL,Headline,Source,Date
HeadlineEntry.Link=start=<a href=",end=">
HeadlineEntry.Headline=start=">,end=</a>
HeadlineEntry.Source=start=<font size="-1">,end=</font>
#HeadlineEntry.Description=start=<br>,end=<br>
HeadlineEntry.Date=start=- <i>,end=</i>
HeadlineEntry.DTD="http://space.synctank.com/dtds/newsfeed.dtd"
HeadlineEntry.Doctype=Newsfeed
HeadlineEntry.URL=http://search.news.yahoo.com/search/news?p=space+aerospace&n=
HeadlineEntry.QTY=10
HeadlineEntry.XML=version="1.0"
HeadlineEntry.Header=
<?xml-stylesheet href="http://space.synctank.com/xsl/spacenews.xsl" type="text/xsl"?>\n
<?cocoon-process type="xslt"?>\n
<!-- ============================================================ -->\n
<!-- spacenews.xml -->\n
<!-- Simple XML file that uses the Newsfeed DTD. -->\n
<!-- Author: XML Loader Russell Castagnaro Thu Nov 18 22:59:07 HST 1999 -->\n
<!-- ============================================================ -->\n
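The ParseSpec above describes each field by a start marker and an end marker. A minimal sketch of how one such rule could be applied to a fragment of HTML follows. The class `DelimiterExtractor`, its `between` method and the sample markup are hypothetical and are not the original producer code.

```java
import java.util.Optional;

/** Hypothetical helper illustrating start/end marker extraction; not the original producer code. */
public class DelimiterExtractor {

    /**
     * Returns the text between the first occurrence of start and the next
     * occurrence of end, or empty if either marker is missing.
     */
    public static Optional<String> between(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return Optional.empty();
        }
        from += start.length();
        int to = text.indexOf(end, from);
        if (to < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(from, to));
    }

    public static void main(String[] args) {
        // Made-up fragment shaped like the markup the ParseSpec targets.
        String html = "<a href=\"http://example.com/story.html\">Shuttle launch delayed</a>"
                + "<font size=\"-1\">(Example Wire)</font>";

        // Mirrors the HeadlineEntry.Link, .Headline and .Source rules.
        System.out.println(between(html, "<a href=\"", "\">").orElse("?"));
        System.out.println(between(html, "\">", "</a>").orElse("?"));
        System.out.println(between(html, "<font size=\"-1\">", "</font>").orElse("?"));
    }
}
```

Because each rule matches literal markup, any change to the provider's HTML makes the rule return nothing, which is why this approach needed constant recoding.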
Java Code
URLToXMLProducer.xml and subclasses
Nice Features
All search providers’ content was converted to one XML document type
Once the XML was created, all search engines’ results were handled easily with XSLT
Document Type Definition
<?xml version="1.0" encoding="US-ASCII" ?>
<!-- Newsfeed.dtd -->
<!-- Simple DTD that defines a grammar for news Feeds. -->
<!-- Author: Russell Castagnaro Nov 15 1999 -->
<!ELEMENT Newsfeed (HeadlineEntry)+>
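<!-- Description is optional: the sample feed omits it. -->
<!ELEMENT HeadlineEntry (Link, Headline, Source, Description?, Date)>
<!ELEMENT Link (#PCDATA)>
<!-- Headlines may carry <b> highlighting from the search results. -->
<!ELEMENT Headline (#PCDATA | b)*>
<!ELEMENT b (#PCDATA)>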
<!ELEMENT Source (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
NewsFeed Content (XML)
<?xml version="1.0"?>
<?xml-stylesheet href="spacenews.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<Newsfeed>
<HeadlineEntry>
<Link>http://dailynews.yahoo.com/h/ap/19991222/sc/space_shuttle_77.html</Link>
<Headline>Shuttle Astronauts Begin <b>Space</b>walk</Headline>
<Source>(Associated Press)</Source>
<Date>Dec 22 6:08 PM EST</Date>
</HeadlineEntry>
<HeadlineEntry>
<Link>http://biz.yahoo.com/rf/991222/xr.html</Link>
<Headline>RESEARCH ALERT - Boeing raised to buy</Headline>
<Source>(Reuters)</Source>
<Date>Dec 22 12:03 PM EST</Date>
</HeadlineEntry>
</Newsfeed>
Transforming the Newsfeed
Make the news feed human readable:
• Create a Stylesheet using the XML DOCTYPE rules
• Transform the XML Document Using the XSL Document
* Specifics on transformations coming soon!
The StyleSheet
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="no"/>
<xsl:template match="/">
<TABLE width="100%" cellpadding="0" cellspacing="0" border="0">
<TR><TD bgcolor="#3366CC" align="left" valign="middle">
<font face="helvetica, arial" size="2" color="#FFFFFF">
<nobr><b>News</b></nobr></font>
</TD><TD align="right" bgcolor="#3366CC" valign="top" >
<a href="/space/news/spacenews.xml">
<font face="helvetica, arial" size="1" color="#FFFFFF">View</font></a>
<IMG SRC="/space/images/spacer2.gif" BORDER="0" WIDTH="5" HEIGHT="2"/>
</TD></TR>
<TR><TD>
<font size="2" face="Arial, Helvetica, sans-serif">
<b>Space and Aerospace News</b></font><BR/>
<xsl:apply-templates/>
</TD></TR></TABLE>
</xsl:template>
<xsl:template match="HeadlineEntry">
<B><FONT face="helvetica, arial" size="1">
<A HREF="{Link}"><xsl:value-of select="Headline"/></A>
</FONT></B> - <I>
<FONT size="-2" face="Arial, Helvetica, sans-serif">
<xsl:value-of select="Source"/></FONT></I><BR/>
</xsl:template>
</xsl:stylesheet>
HTML Content
Then the Display Format Changed
Simple changes in the format from any site required significant changes
Changing the parsing rules was not trivial
Eventually this became boring and tiresome
Interesting Points
I was not interested in manipulating XML documents within Java*
I did not want to deal with DOM or SAX
I was interested in displaying data in a clean, efficient manner
The producer code I created was a bit embarrassing
*I was not lazy. I had a very full schedule at the time... Sheesh!
Time Warp (Oct 2000)
None of my parsing instructions still worked
I had no interest in using the old code
There had to be a better way
I heard about O’Reilly’s Meerkat project…
Enter RDF Site Summary
Preliminary format was v0.91 from Netscape (remember them?)
RDF (Resource Description Framework) Site Summary (RSS 0.91): http://my.netscape.com/publish/formats/rss-0.91.dtd
Eliminates the need to parse through HTML for content.
Standard: version 1.0, built on the W3C’s RDF, has now been published
RSS Example
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
"http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel>
<title> Space science news</title> <link>http://www.moreover.com</link>
<description>Space science news - news headlines from around the web, refreshed every 15
minutes</description> <language>en-us</language>
<image>
<title>moreover...</title> <url>http://i.moreover.com/pics/rss.gif</url>
<link>http://www.moreover.com</link> <width>144</width> <height>16</height>
<description>News headlines from more than 1,800 sources, harvested every 15
minutes...</description>
</image>
<item>
<title>NASA releases space station crew logs</title>
<link>http://c.moreover.com/click/here.pl?r16768175</link>
<description>floridatoday.com Mar 22 2001 12:20AM ET</description>
</item> <item>
<title>Tough love but support for space by George W. Bushs team</title>
<link>http://c.moreover.com/click/here.pl?r16768185</link>
<description>floridatoday.com Mar 22 2001 12:20AM ET</description>
</item>
</channel>
</rss>
C:\development\Castagnaro\space\space-moreover.xml
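An RSS 0.91 document like the one above can be rendered with a small stylesheet. The sketch below is only an assumption about what such a stylesheet might contain, and it is driven through the standard `javax.xml.transform` API. The class name `RssToHtml`, the stylesheet text and the shortened sample feed are all illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative only: renders RSS 0.91 items as an HTML list using JAXP. */
public class RssToHtml {

    // Hypothetical stylesheet written for this sketch.
    static final String RSS_XSL =
        "<?xml version=\"1.0\"?>"
        + "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/rss/channel\">"
        + "<h3><xsl:value-of select=\"title\"/></h3>"
        + "<ul><xsl:apply-templates select=\"item\"/></ul>"
        + "</xsl:template>"
        + "<xsl:template match=\"item\">"
        + "<li><a href=\"{link}\"><xsl:value-of select=\"title\"/></a>"
        + " - <i><xsl:value-of select=\"description\"/></i></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an RSS document held in memory and returns the HTML. */
    public static String transform(String rssXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(RSS_XSL)));
            StringWriter html = new StringWriter();
            t.transform(new StreamSource(new StringReader(rssXml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The DOCTYPE is left out so the parser does not try to download the DTD.
        String rss =
            "<rss version=\"0.91\"><channel>"
            + "<title>Space science news</title>"
            + "<link>http://www.moreover.com</link>"
            + "<description>Sample channel</description>"
            + "<item><title>NASA releases space station crew logs</title>"
            + "<link>http://c.moreover.com/click/here.pl?r16768175</link>"
            + "<description>floridatoday.com Mar 22 2001 12:20AM ET</description></item>"
            + "</channel></rss>";
        System.out.println(transform(rss));
    }
}
```

Since every RSS 0.91 provider uses the same element names, one stylesheet of this kind should serve any feed, with no per-provider parsing rules.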
Access to RSS Feeds
Where do you find providers???
Directory of open RSS providers:
• http://www.superopendirectory.com/directory/4/standards/rss/sources
RSS Providers
• 10.am
  • http://10.am/search/-rss?search=<your term here>
  • List of topics: http://10.am/extra/ocsdirectory.xml
• echofactor
  • http://www.echofactor.com/feed_categories.html?format=RSS
• MoreOver
  • http://w.moreover.com/categories/category_list.html
Now we need to make this content readable!
Transforming XML to HTML
We have many options for performing XSL Transformations:
• Depend on the client’s browser to transform the XML
• Write a Servlet to handle the transformation
• Use software that is widely available and standards based
Issues:
• IE 5.x is one of the few browsers that support XSL transformations
• Publicly available software has many merits too
• Servlets are easy enough. Transformations can be done in < 10 lines
Transformation in a Servlet
import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import org.apache.xalan.xslt.*; // Xalan-Java 1.x processor classes

public void service(HttpServletRequest req, HttpServletResponse res)
    throws IOException, ServletException {
  // Set the content type before obtaining the writer.
  res.setContentType("text/html");
  PrintWriter out = res.getWriter();
  // The file names come straight from the request; validate them before real use.
  File xmlFile = new File(sourcePath, req.getParameter("XML"));
  File xslFile = new File(sourcePath, req.getParameter("XSL"));
  try {
    XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
    processor.process(new XSLTInputSource(new FileReader(xmlFile)),
        new XSLTInputSource(new FileReader(xslFile)),
        new XSLTResultTarget(out));
  } catch (Exception e) {
    out.println("Error: " + e.getMessage());
  }
  out.flush();
}
One Problem
We have to get the XML (RSS) file from the content provider!
Use the networking classes to access the URL
Be considerate of your provider!
New Code
public void doGet(HttpServletRequest req, HttpServletResponse res) {
  try {
    // Set the content type before obtaining the writer.
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    // Fetch the RSS document from the content provider.
    URL url = new URL(sourceURL);
    URLConnection con = url.openConnection();
    con.connect();
    InputStream in = con.getInputStream();
    FileReader fr = new FileReader(xslsrc);
    try {
      XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
      processor.process(new XSLTInputSource(in), new XSLTInputSource(fr),
          new XSLTResultTarget(out));
    } catch (Exception e) {
      log("Error: " + e.getMessage());
    } finally {
      // Always release the network stream and the stylesheet reader.
      in.close();
      fr.close();
    }
    out.flush();
  } catch (Exception e) { …
  }
}
XSLT Model
[Diagram: a Request reaches the Servlet; the URL-loaded XML NewsFeed and the XSL Document go into the XSLT Processor; the resulting HTML is returned in the Response.]
Setting up your servlet
Most Appservers or Webservers support WARs and Deployment Descriptors
You create a WebApp which has servlets, parameters and servlet mappings
Deployment Descriptor
<web-app>
<servlet>
<servlet-name>newsServlet</servlet-name>
<servlet-class>com.synctank.http.servlets.RSSServlet</servlet-class>
<init-param>
<param-name>ERROR_URL</param-name>
<param-value>/error.jsp</param-value>
<description>The error page for this app.</description>
</init-param>
<init-param>
<param-name>SOURCE_SERVLET_URI</param-name>
<param-value>http://www.moreover.com/cgi-local/page?o=rss&amp;c=Space%20science%20news</param-value>
<description>An absolute url that points to your XML</description>
</init-param>
<init-param>
<param-name>STYLESHEET</param-name>
<param-value>/xsl/rss.xsl</param-value>
<description>The Stylesheet for presentation of the headlines. Should be a
subdirectory of the war. The default is /xsl/rss.xsl </description>
</init-param>
<load-on-startup>0</load-on-startup>
</servlet>
<servlet-mapping>
<servlet-name>newsServlet</servlet-name>
<url-pattern>/newsy</url-pattern>
</servlet-mapping>
<welcome-file-list>
<welcome-file>/foo/news.html</welcome-file>
</welcome-file-list>
<error-page>
<error-code>404</error-code>
<location>/error.jsp</location>
</error-page>
</web-app>
War directory structure
Root
• WEB-INF
  • web.xml
  • classes
    • com/synctank/http/servlets/RSSServlet.class
• xsl
  • rss.xsl
• docs
  • Index.html
• error.jsp
Moving Forward
RSS version 1.0, built on the W3C’s RDF, has been published
1.0 has more flexibility
Once more providers support
Review
Don’t do the time!
Leverage RSS and open content providers
Use XSL to transform XML content to your format of choice
Cache requests to content providers (keep them free!)
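The last point, caching requests to content providers, can be sketched as a small time-based cache sitting in front of the fetch. Everything named here (`FeedCache`, the `fetcher` function, the 15-minute time-to-live) is hypothetical; the talk does not show how its own caching was implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical time-based cache placed in front of a feed fetcher. */
public class FeedCache {

    private static final class Entry {
        final String body;
        final long fetchedAt;

        Entry(String body, long fetchedAt) {
            this.body = body;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // e.g. downloads the RSS at a URL
    private final long ttlMillis;
    private final LongSupplier clock;               // injectable so expiry can be tested

    public FeedCache(Function<String, String> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached body for url, refetching only once the entry has reached the TTL. */
    public String get(String url) {
        long now = clock.getAsLong();
        Entry e = entries.get(url);
        if (e == null || now - e.fetchedAt >= ttlMillis) {
            // Two threads may occasionally both refetch; acceptable for a sketch.
            e = new Entry(fetcher.apply(url), now);
            entries.put(url, e);
        }
        return e.body;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        FeedCache cache = new FeedCache(
                url -> "<rss/> fetch #" + (++calls[0]), // stand-in for a real HTTP fetch
                15 * 60 * 1000L,                         // assumed 15-minute TTL
                System::currentTimeMillis);
        cache.get("http://example.com/feed.rss");
        cache.get("http://example.com/feed.rss");
        System.out.println("provider requests: " + calls[0]);
    }
}
```

A servlet could hold one such cache and transform the cached body on each request, so page views do not translate directly into hits on the provider.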