A journal about the experiences I have developing little applications in C#, Perl, HTML and JavaScript, and about the new things that I use. Always Geeky; Always Nerdy; Always poor Grammer!

I am a Software Analyst Developer working in Southport, England but living in Liverpool. I develop mainly in C# and ASP.NET, and I have been developing commercial software for several years now. I maintain this site (hosted at SwitchMedia UK) as a way of exploring new technologies (such as AJAX) and of generally talking about techie geek issues. The site is built with a host of Perl scripts and a liberal use of JavaScript. I enjoy experimenting with new technologies, and anything that I make I host here.

Saturday, September 08, 2007

Microformat.net

I would like to take this opportunity to announce that I have created a usable (although beta) release of a generic Microformat parser for .NET. I don't know of any other framework built specifically for .NET that lets you easily find Microformats in an HTML/XML stream, so I believe that this project is a first (and hopefully a de facto choice in time to come).

The project can be found on Codeplex at http://www.codeplex.com/microformat.  The current release is Iteration 3.

The parser is stream based and uses an application configuration (see below for an example) to define how it should parse the HTML/XML stream. This flexible configuration means that if the spec for a Microformat changes, or a new Microformat is introduced, no code needs to change in the framework for its users to see the changed data.

<configuration>
  <configSections>
    <section name="MicroformatsSection" type="Microformats.ConfigurationSections.MicroformatConfigSection, Microformat.net"/>
  </configSections>
  <MicroformatsSection>
    <Microformats>
      <Microformat type="rel-tag" rootType="rel" root="tag" dataType="System.Uri" />
      <Microformat type="hCard" rootType="class" root="vcard" dataType="System.String">
        <Fields>
          <Field name="fn" dataType="System.String" plurality="Singular"/>
          <Field name="url" dataType="System.Uri" plurality="Singular"/>
          <Field name="email" dataType="System.Uri" plurality="Singular"/>
          <Field name="adr" dataType="Microformat" plurality="Singular"/>
        </Fields>
      </Microformat>
      <Microformat type="adr" rootType="class" root="adr" dataType="System.String">
        <Fields>
          <Field name="post-office-box" dataType="System.String" plurality="Singular"/>
          <Field name="extended-address" dataType="System.String" plurality="Singular"/>
          <Field name="street-address" dataType="System.String" plurality="Singular"/>
          <Field name="locality" dataType="System.String" plurality="Singular"/>
          <Field name="region" dataType="System.String" plurality="Singular"/>
          <Field name="postal-code" dataType="System.String" plurality="Singular"/>
          <Field name="country-name" dataType="System.String" plurality="Singular"/>
        </Fields>
      </Microformat>
    </Microformats>
  </MicroformatsSection>
</configuration>

The above configuration says that the following Microformats are to be searched for: rel-tag, hCard and adr. Microformat configurations can also be nested: the hCard spec allows an adr inside an hCard, so the hCard definition has an adr field whose dataType is "Microformat", which reuses the adr definition rather than duplicating it. Unfortunately, it is currently possible to define a circular reference in the configuration, and the plurality of elements is not yet implemented; both will be fixed soon. Not all of the hCard spec is defined in this configuration (I kept it short to show how the config works). More generally, any part of a Microformat that you leave out of the configuration will not appear in the output of the framework.
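
To illustrate the point about new formats needing only configuration, here is a sketch of what an entry for the geo Microformat might look like. It would sit alongside the other Microformat elements. The class names (geo, latitude, longitude) come from the geo spec; the attribute values simply mirror the hCard and adr entries above, and the fragment has not been verified against the parser, so treat it as an assumption rather than a tested setup.

```xml
<!-- Hypothetical entry: assumes geo is handled like any other class-rooted Microformat -->
<Microformat type="geo" rootType="class" root="geo" dataType="System.String">
  <Fields>
    <!-- System.String is used because it is the only plain value type shown above -->
    <Field name="latitude" dataType="System.String" plurality="Singular"/>
    <Field name="longitude" dataType="System.String" plurality="Singular"/>
  </Fields>
</Microformat>
```

If that holds, the reading code would not need to change at all; geo objects would simply start appearing among the results of Read().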

The code that follows shows how easy it is to use this framework:

using (TextReader ms = new StringReader(@"<html><body><div class=""vcard author"">
<a class=""url fn"" href=""http://www.kinlan.co.uk/"">Paul Kinlan</a>
<a class=""email"" href=""mailto:paul.kinlan@gmail.com"">paul.kinlan@gmail.com</a>
<div class=""adr"">
<span class=""locality"">Liverpool</span>,<span class=""region"">Merseyside</span>
</div>
</div>
<a href=""http://test.com/test"" rel=""tag"">Test Tag</a></body></html>"))
{
    // Needs using directives for System and System.IO, plus the
    // Microformat.net namespaces that define Microformat and IField.
    using (Microformats.Readers.MicroformatReader mr = new Microformats.Readers.MicroformatReader(ms))
    {
        Microformat m = null;

        // Read() hands back one Microformat per call and null when none are left.
        while ((m = mr.Read()) != null)
        {
            Console.Out.Write("Found Microformat: " + m.Name + ". Machine Value:" + m.MachineValue + "\n");

            // List every configured field found inside this Microformat.
            foreach (IField f in m.Fields)
            {
                Console.Out.WriteLine("\t" + f.Name + ": " + f.MachineValue);
            }
        }
    }
}

The first statement wraps the sample HTML in a TextReader that can be passed to the MicroformatReader. Once the stream has been handed to the framework, it is as simple as calling mr.Read() repeatedly to iterate across all the valid Microformats in the document, stopping when it returns null. The Read() method returns fully constructed Microformat objects that can be examined and used directly in your programs.

I still have a lot of work to do, but the framework appears (to me at least) to be quite flexible. I would greatly appreciate any comments and feedback, and if you use the framework I would love to hear about it. If anyone is interested in joining the project, let me know.