
A journal about the experiences I have developing little applications in C#, Perl, HTML and JavaScript, and talking about the new things that I use. Always Geeky; Always Nerdy; Always poor Grammar!

I am a Software Analyst Developer working in Southport, England but living in Liverpool. I develop mainly in C# and ASP.Net, and I have been developing commercial software for several years now. I maintain this site (hosted at SwitchMedia UK) as a way of exploring new technologies (such as AJAX) and just generally talking about techie geek issues. This site is built from a host of Perl scripts and a liberal use of JavaScript. I enjoy experimenting with new technologies, and anything that I make I host here.


Friday, May 26, 2006

Some things about XLinq

I have been playing around with XLinq in C# 3.0, and I must say I am not all that satisfied by the querying aspects of the API.

Let me say this up front: I have not explored its potential fully, and I am definitely not an expert on the subject, but I was hoping for something more. My biggest gripe at the moment is that it is an "in-memory" query language (unless I am mistaken), which means that the XML document has to be fully loaded into memory before it can be queried.

I don't have the code I was using to hand at the moment, but I wanted to load a 900MB XML file and do some simple processing on it. I had the code ready to iterate across the XML document and do a simple select on the data fields that I wanted, returning a List<> of the correct objects (this part seemed cool). I ran out of memory, though :( I did the same thing with a plain XmlReader in about the time it took me to write the select statement (admittedly I also had to learn about XLinq first), and it was so much quicker, with a far smaller memory footprint.

It struck me that using XLinq was overkill here: it didn't offer me anything extra for this simple task, and it had to load the whole document into memory. I would like to see an XLinq that didn't have to load the whole document, but could instead push the data SAX-style, or pull it with an XmlReader, as it scanned through the document. I am fairly sure (after thinking about it) that this would be achievable on Microsoft's part, because I presume (and I am only presuming) that if XLinq has to forward-scan and depth-traverse the DOM anyway, that is much like scanning through the document with an XmlReader; after all, the way the .Select arguments are ordered is quite intuitive for that sort of scanning. Other operations, such as grouping, could be performed once the array of filtered objects has been brought back. That way the document would not have to be completely loaded into memory first, and only a subset of the data would ever be held.
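To illustrate the difference, here is a minimal sketch of the two approaches. The file layout is made up for the example (repeating <item> elements, each holding a <name>); my real file was different, but the shape of the code is the same:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public class Demo
{
    // XLinq approach: the whole file is parsed into an in-memory tree
    // before the query runs, so memory use grows with document size.
    public static List<string> NamesWithXLinq(string path)
    {
        XDocument doc = XDocument.Load(path);   // loads the entire document
        return doc.Descendants("item")
                  .Select(item => (string)item.Element("name"))
                  .ToList();
    }

    // XmlReader approach: forward-only pull parsing, one node at a time,
    // so memory use stays roughly constant no matter how big the file is.
    public static List<string> NamesWithXmlReader(string path)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(path))
        {
            while (reader.ReadToFollowing("name"))
                names.Add(reader.ReadElementContentAsString());
        }
        return names;
    }
}
```

Both return the same results on a small file; the difference only bites once the document no longer fits comfortably in memory.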

Maybe XLinq can do this already. I definitely don't see how, but I know I can miss things. It just seems that it will be fine for small documents, but beyond a certain size it loses its appeal.
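For what it's worth, the hybrid I'm after can be approximated by hand: wrap an XmlReader in an iterator that yields one XElement subtree at a time, so a LINQ query only ever sees one element's worth of tree in memory. This is just a sketch, assuming XNode.ReadFrom is available (I haven't checked whether it is in the May CTP) and using the same made-up element names as above:

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class Streaming
{
    // Yields each repeating element with the given name, one at a time.
    // Only one subtree is materialised in memory at any moment.
    public static IEnumerable<XElement> StreamElements(string path, string name)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                    // ReadFrom consumes the subtree and advances the reader
                    // past it, so we must not call Read() again here.
                    yield return (XElement)XNode.ReadFrom(reader);
                else
                    reader.Read();
            }
        }
    }
}
```

A query like Streaming.StreamElements("big.xml", "item").Select(e => (string)e.Element("name")) then pulls elements lazily as the reader scans, which is roughly what I wish the library did for me.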

Comments:

Hi, I'm the program manager for XLinq at Microsoft. I wanted to let you know that we are looking into this very problem right now. It would be good to hear from you and others in more detail about how your big XML file is structured. Your idea of having a LINQ-queryable XmlReader stream is one we have considered, but that doesn't really leverage the rest of XLinq. We'd prefer something akin to the XStreamingElement class in the May CTP, where a repeated element structure is evaluated "lazily". The trick is to define the structure of the streaming input without a) requiring a schema, b) requiring the user to learn a different technology such as XPath (remember that XLinq is not necessarily aimed at an audience of XML experts who already know such things), and c) making it so complex that users might as well use XmlReader to do the job.

Specific question: does your 900MB document have a regular structure, e.g. is it 900,000 1K elements that have the same structure, 900 1MB documents with varying structures, one big amorphous thing, or what? Is there some less structured "header" information at the beginning before any regular repeating structure begins? I think we'll be able to offer something that is simple to use and powerful for the case where large documents consist of many relatively well-structured top-level elements, but we're wondering how much complexity beyond that we can feasibly support before saying "just use XmlReader".

Thanks! You can contact me via the "contact us" form at blogs.msdn.com/mikechampion if you want to follow up, or leave a comment in one of the entries there.

By Anonymous Mike Champion, at Sunday, June 04, 2006 8:45:00 PM

After we published the code samples for LINQ in Action's first chapters in LINQPad a few weeks ago, the samples of three more chapters have just been added. These chapters cover LINQ to XML. Thanks Ji...

By Blogger AkshayV, at Tuesday, June 30, 2009 5:27:00 AM