
A journal about my experiences developing little applications in C#, Perl, HTML and JavaScript, and about new things that I use. Always Geeky; Always Nerdy; Always poor Grammar!

I am a Software Analyst Developer working in Southport, England, but living in Liverpool. I develop mainly in C# and ASP.NET, and I have been developing commercial software for several years now. I maintain this site (hosted at SwitchMedia UK) as a way of exploring new technologies (such as AJAX) and of generally talking about techie geek issues. This site is built with a host of Perl scripts and a liberal use of JavaScript. I enjoy experimenting with new technologies, and anything that I make I host here.


Saturday, January 15, 2005

Binary XML

I have just been reading a bit about Binary XML. Am I missing something?

There are lots of arguments against XML which I will try and summarise:
  • Text processing is slow;
  • The verbosity can cause storage problems;
  • You can't store binary data.

Basically, most of the arguments against using XML for data storage and transmission revolve around the time it takes to process the text, and around storing data as strings rather than in its binary representation (such as "12" instead of 00001100 [a binary byte]).
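To make the "12" versus 00001100 point concrete, here is a small sketch in Python (my choice purely for illustration) using the standard `struct` module. It also shows that binary is not automatically smaller: a fixed-width 4-byte integer is bigger than the two characters "12" and only wins for larger numbers.

```python
import struct

value = 12

# Text representation: the decimal digits as ASCII characters
as_text = str(value).encode("ascii")      # b"12" -> 2 bytes

# Binary representation: a single unsigned byte
as_byte = struct.pack("B", value)         # b"\x0c" -> 1 byte

# Binary representation: a fixed-width 4-byte little-endian unsigned int
as_uint32 = struct.pack("<I", value)      # 4 bytes, larger than the text here

print(len(as_text), len(as_byte), len(as_uint32))  # 2 1 4
print(format(as_byte[0], "08b"))                    # 00001100

# For larger numbers the fixed-width binary form starts to win
big = 1234567890
print(len(str(big)), len(struct.pack("<I", big)))   # 10 4
```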

I have been working with XML-based data for a while now and I don't really see many problems with storing data as text. It doesn't seem to be that slow for what I use. Processing speed and data size can be improved by slightly reducing the verbosity of the tags and attributes, and in the end you can always gzip. As for binary data, couldn't you just store it as text (raw bytes such as 0x00 aren't allowed in XML, even inside a CDATA section, so something like base64) and also record the endianness?
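A rough sketch of both ideas using only Python's standard library: shorter tag names plus gzip, and binary data carried as base64 with the byte order recorded in an attribute. The record layout, the tag names and the `endian`/`type` attributes are all made up for illustration, and I haven't quoted sizes because they depend entirely on the data.

```python
import base64
import gzip
import struct
import xml.etree.ElementTree as ET

# Hypothetical records: the same data with long tag names and with short ones
verbose = "".join(
    f"<CustomerRecord><CustomerIdentifier>{i}</CustomerIdentifier>"
    f"<CustomerName>Customer {i}</CustomerName></CustomerRecord>"
    for i in range(1000)
)
terse = "".join(f"<c><i>{i}</i><n>Customer {i}</n></c>" for i in range(1000))
verbose_doc = f"<Customers>{verbose}</Customers>".encode("utf-8")
terse_doc = f"<cs>{terse}</cs>".encode("utf-8")

for label, doc in (("verbose", verbose_doc), ("terse", terse_doc)):
    print(label, len(doc), "bytes raw,", len(gzip.compress(doc)), "bytes gzipped")

# Binary data: raw bytes can't go straight into XML (not even in CDATA),
# so encode them as base64 and record the byte order in a (hypothetical) attribute.
samples = [1, -2, 300]
payload = struct.pack("<3i", *samples)  # little-endian signed 32-bit ints
blob = ET.Element("blob", endian="little", type="int32")
blob.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(blob, encoding="unicode")

# Reading it back: decode the base64, then unpack using the recorded byte order
parsed = ET.fromstring(xml_text)
fmt = "<" if parsed.get("endian") == "little" else ">"
raw = base64.b64decode(parsed.text)
restored = list(struct.unpack(f"{fmt}{len(raw) // 4}i", raw))
print(restored)  # [1, -2, 300]
```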

With Binary XML I just don't see the point. How much space would you really save over a well-defined XML format? And how are you supposed to know what data you are looking at in Binary XML once the verbosity is gone? (I suppose a schema would come into play, but would that also be binary?) Why not just drop the XML and have a binary data store that you define and process however you want? How would you throw a nice little XPath query together? Would you have to perform a translation BINARY TOKEN <-> TEXTUAL TOKEN?
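The BINARY TOKEN <-> TEXTUAL TOKEN question can be illustrated with a toy encoder in Python. This is a made-up format for illustration only (it is not WBXML, EXI, Fast Infoset or any real Binary XML specification), and `encode`, `decode`, `START` and `END` are hypothetical names: tag names go into a string table once, each element becomes a one-byte token, and to run an XPath-style query (here ElementTree's limited `findall` syntax) the bytes have to be translated back into a tree first.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical marker bytes for this toy format (not any standard Binary XML).
START, END = 0x01, 0x02


def encode(root: ET.Element) -> bytes:
    """Encode tag names and text only; attributes and tail text are ignored.

    Layout: [tag count][name length][name]... then, per element:
    START, tag id, uint16 text length, text bytes, children..., END.
    Limits: at most 255 distinct tags, names up to 255 bytes, text up to 65535 bytes.
    """
    table: list[str] = []
    for el in root.iter():
        if el.tag not in table:
            table.append(el.tag)
    ids = {name: i for i, name in enumerate(table)}

    out = bytearray([len(table)])
    for name in table:
        raw = name.encode("utf-8")
        out += bytes([len(raw)]) + raw

    def walk(el: ET.Element) -> None:
        text = (el.text or "").encode("utf-8")
        out.extend([START, ids[el.tag]])  # TEXTUAL TOKEN -> BINARY TOKEN
        out.extend(struct.pack("<H", len(text)))
        out.extend(text)
        for child in el:
            walk(child)
        out.append(END)

    walk(root)
    return bytes(out)


def decode(data: bytes) -> ET.Element:
    """Rebuild an element tree from encode()'s output (BINARY -> TEXTUAL)."""
    count = data[0]
    pos = 1
    table = []
    for _ in range(count):
        n = data[pos]
        table.append(data[pos + 1:pos + 1 + n].decode("utf-8"))
        pos += 1 + n

    def read(parent):
        nonlocal pos
        if data[pos] != START:
            raise ValueError(f"expected START at offset {pos}")
        tag = table[data[pos + 1]]
        (length,) = struct.unpack_from("<H", data, pos + 2)
        pos += 4
        el = ET.Element(tag) if parent is None else ET.SubElement(parent, tag)
        el.text = data[pos:pos + length].decode("utf-8") or None
        pos += length
        while data[pos] != END:
            read(el)
        pos += 1  # skip END
        return el

    return read(None)


doc = ET.fromstring(
    "<orders><order><id>1</id><total>9.99</total></order>"
    "<order><id>2</id><total>20.00</total></order></orders>"
)
blob = encode(doc)
print(len(blob), "bytes binary vs", len(ET.tostring(doc)), "bytes text")

# To run an XPath-style query you still translate back to a tree first
restored = decode(blob)
print([e.text for e in restored.findall("./order/total")])  # ['9.99', '20.00']
```

Even in this toy the tag names still have to live somewhere (the string table), which is more or less the schema question above.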

Would a Binary XML parser be better for memory-constrained devices? <- I don't know :)
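I can't answer that, but it's worth noting that text XML doesn't have to be loaded whole: a streaming parser can keep memory roughly bounded. A minimal sketch, assuming Python's `xml.etree.ElementTree.iterparse` and a hypothetical `<orders>/<order>/<total>` layout; it says nothing about how a binary parser would compare.

```python
import io
import xml.etree.ElementTree as ET


def sum_totals(source) -> float:
    """Stream through hypothetical <order> elements without building the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's start
    total = 0.0
    for event, el in context:
        if event == "end" and el.tag == "order":
            total += float(el.findtext("total", default="0"))
            root.clear()  # discard the orders processed so far
    return total


data = b"<orders>" + b"".join(
    b"<order><id>%d</id><total>1.50</total></order>" % i for i in range(10000)
) + b"</orders>"
print(sum_totals(io.BytesIO(data)))  # 15000.0
```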

Some good links I have found

Comments:

You're right: for what you use, you most likely won't notice any storage/speed issues.

For enterprise-scale usage, though, this might be quite significant.

Assume the following:
- the text compression ratio is usually approx. 20:1;
- a heavily loaded web front end can produce up to 2 GB of log data every day;
- the web farm is 100 nodes.

So what do we have? 2 GB x 100 nodes = 200 GB of logs produced every day. If we were to use a binary/compressed format, that could be reduced to 10 GB, saving us 190 GB each day.

EU rulings are that sensitive data must be kept for at least 5 years.

Now tell me: does 190 x 365 x 5 GB, which is approx. 347 TB, make a difference?

By Anonymous, at Saturday, May 26, 2007 4:24:00 PM
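The arithmetic in the comment above, as a quick check in Python. The figures are the commenter's assumptions; leap days are ignored and 1 TB is taken as 1000 GB.

```python
GB_PER_NODE_PER_DAY = 2   # heavily loaded web front end (commenter's figure)
NODES = 100               # size of the web farm
COMPRESSION_RATIO = 20    # assumed ~20:1 for text logs
YEARS = 5                 # claimed retention period

raw_per_day = GB_PER_NODE_PER_DAY * NODES              # 200 GB
compressed_per_day = raw_per_day / COMPRESSION_RATIO    # 10 GB
saved_per_day = raw_per_day - compressed_per_day        # 190 GB
saved_total_gb = saved_per_day * 365 * YEARS            # 346,750 GB

print(f"{saved_total_gb:,.0f} GB ≈ {saved_total_gb / 1000:.0f} TB")  # 346,750 GB ≈ 347 TB
```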

But realistically, how much of all those terabytes actually needs to be archived? Yes, I know the junior programmer will always leave the trace level on max, but is a busy web server really going to produce 2 GB a day of transactional data (data sufficient to reconstruct a transaction)? I think not. It's that transactional data that has to go into permanent storage, and we have DBMSs for that purpose.

By Anonymous, at Monday, June 04, 2007 1:58:00 PM