I must say this up front: I have not explored its potential fully, and I am definitely not an expert on the subject, but I was hoping for something more. For instance, my biggest gripe at the moment is that it is an "in-memory" query language (unless I am mistaken), which means that the XML document has to be fully loaded into memory.
I don't have the code I was using to hand at the moment, but I wanted to load a 900MB XML file to do some simple processing on it. I had the code ready to iterate across the XML document and do a simple select on the data fields that I wanted, and it would return a List<> of the correct objects (this part seemed cool). I ran out of memory though :( Writing the same thing with a plain XmlReader took me about as long as it had taken to write the select statement (admittedly I had to learn about XLinq first), and it was so much quicker, with a much smaller memory footprint. It just struck me that using XLinq was overkill: it didn't offer me anything extra for this simple task, and it had to load the whole document into memory.
I would like to see an XLinq that didn't have to load the whole document but could SAX push or XmlReader pull the data as it scanned through the document. I am pretty sure (after I thought about it) that this would be achievable quite easily on Microsoft's part, because I presume (and I am only presuming) that XLinq has to forward scan and depth traverse the DOM, which would be much like scanning through the document with an XmlReader. After all, the way the .Select arguments are ordered is pretty intuitive for that sort of scanning. Other operations such as grouping could be done once the array of filtered objects has been brought back. This way the document would not have to be completely loaded into memory first, and only a subset of the data would be loaded.
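To make the idea above concrete, here is a rough sketch of pull-style filtering. It is written in Python with the standard library's `xml.etree.ElementTree.iterparse`, purely as an illustration; it is not XLinq or XmlReader code. The element names (`catalog`, `record`, `name`, `price`) and the helper `stream_records` are invented for the example, and it assumes the document is a long run of similarly shaped child elements under one root.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag="record"):
    """Yield one dict per <record> element without keeping the whole tree."""
    # iterparse hands back parser events as the input is read,
    # instead of building the full document first.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # Hypothetical shape: each record has simple child fields.
            yield {child.tag: child.text for child in elem}
            root.clear()  # discard finished records so memory use stays flat


# Tiny stand-in for the large file (assumed structure, not the real data).
SAMPLE = b"""<catalog>
  <record><name>widget</name><price>10</price></record>
  <record><name>gadget</name><price>25</price></record>
  <record><name>gizmo</name><price>40</price></record>
</catalog>"""

# The "select" runs while streaming; only the matching values are kept.
expensive = [
    r["name"]
    for r in stream_records(io.BytesIO(SAMPLE))
    if int(r["price"]) > 20
]
```

Only the filtered results end up in the list, so grouping or sorting can then be done on that much smaller set rather than on the whole document.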
Maybe XLinq can do this already. I am definitely not seeing how, but I know I can miss things. It just seems that it will be fine for small documents, but beyond a certain document size it loses its appeal.
Comments:
Hi, I'm the program manager for XLinq at Microsoft. I wanted to let you know that we are looking into this very problem right now. It would be good to hear from you and others in more detail about how your big XML file is structured. Your idea of having a LINQ-queryable XmlReader stream is one we have considered, but that doesn't really leverage the rest of XLinq. We'd prefer something akin to the XStreamingElement class in the May CTP, where a repeated element structure is evaluated "lazily". The trick is to define the structure of the streaming input without a) requiring a schema, b) requiring the user to learn a different technology such as XPath (remember that XLinq is not necessarily aimed at an audience of XML experts who already know such things), and c) making it so complex that users might as well use XmlReader to do the job.
Specific question: does your 900MB document have a regular structure, e.g. is it 900,000 1K elements that have the same structure, 900 1MB documents with varying structures, one big amorphous thing, or what? Is there some less structured "header" information at the beginning before any regular repeating structure begins? I think we'll be able to offer something that is simple to use and powerful for the case where large documents consist of many relatively well-structured top-level elements, but we're wondering how much complexity beyond that we can feasibly support before saying "just use XmlReader".
Thanks! You can contact me via the "contact us" form at blogs.msdn.com/mikechampion if you want to follow up, or leave a comment in one of the entries there.