There are lots of arguments against XML, which I will try to summarise:
- Text processing is slow;
- The verbosity can cause storage problems;
- You can't store binary data.
Basically, most of the arguments against using XML for data storage and transmission revolve around the time it takes to process text and the cost of storing data as strings rather than in their binary representation (such as "12" instead of 00001100 [binary byte]).
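To make the "12" versus 00001100 comparison concrete, here is a minimal Python sketch. It only uses the standard library; the variable names are my own and purely illustrative.

```python
import struct

value = 12

# Text form: the two characters "1" and "2", as they would sit in an XML file.
as_text = str(value).encode("ascii")

# Binary form: a single unsigned byte holding the same number.
as_binary = struct.pack("B", value)

print(len(as_text), as_text)
print(len(as_binary), format(as_binary[0], "08b"))

# Reading the text form needs a string-to-number conversion;
# the binary form is unpacked directly.
from_text = int(as_text)
from_binary = struct.unpack("B", as_binary)[0]
```

For a single small number the difference is one byte; the argument only starts to matter when there are millions of values.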
I have been working with XML-based data for a while now and I don't really see many problems with storing data as text. It doesn't seem to be that slow for what I use it for. Processing time and data size can be clawed back by slightly reducing the verbosity of the tags and attributes, and in the end you can always gzip. As for binary data, couldn't you just store the data in a CDATA section and also include the endianness?
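A rough sketch of both ideas in Python, using only the standard library. The element and attribute names (`readings`, `reading`, `blob`, `byteorder`) are made up for the example. One assumption to flag: instead of raw bytes in a CDATA section I use Base64, because some byte values (NUL, for instance) are not legal XML characters even inside CDATA.

```python
import base64
import gzip
import xml.etree.ElementTree as ET

# Build a deliberately repetitive document (hypothetical element names).
root = ET.Element("readings")
for i in range(1000):
    ET.SubElement(root, "reading", sensor="s1", value=str(i))

# Compare the size of the plain text with the gzipped text.
xml_bytes = ET.tostring(root, encoding="utf-8")
compressed = gzip.compress(xml_bytes)
print(len(xml_bytes), "bytes as text,", len(compressed), "bytes gzipped")

# Embed a binary payload as Base64 text and record its byte order
# in an attribute (the attribute name is an assumption, not a standard).
payload = bytes(range(16))  # includes 0x00, which XML text cannot carry raw
blob = ET.SubElement(root, "blob", encoding="base64", byteorder="little")
blob.text = base64.b64encode(payload).decode("ascii")

# Round trip: serialise, parse again, and decode the payload.
parsed = ET.fromstring(ET.tostring(root, encoding="utf-8"))
restored = base64.b64decode(parsed.find("blob").text)
```

Base64 adds roughly a third to the size of the payload, so this is a convenience trade-off rather than a space saving.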
With binary XML I just don't see the point. How much space would you save if you had a well-defined XML format? How are you supposed to know what data you are looking at in binary XML, given the loss of verbosity? (I suppose a schema would come into play, but would that also be binary?) Why not just drop the XML and have a binary data store which you define and process how you want? How would you throw a nice little XPath query together? Would you have to perform a translation BINARY TOKEN <-> TEXTUAL TOKEN?
Would a Binary XML parser be better for memory constrained devices? <- I don't know :)
Some good links I have found:
- http://www.xml.com/pub/a/2003/08/13/deviant.html [Binary XML, Again], A nice article about some of the pros and cons.
- http://www.www2004.org/proceedings/docs/1p345.pdf [An Evaluation of Binary XML Encoding Optimizations for Fast Stream Based XML Processing], A paper comparing Binary XML based technologies. I haven't seen most of the technologies but it is a good read anyway.
- http://lists.xml.org/archives/xml-dev/200104/msg00207.html [RE: "Binary XML" proposals], A thread about storing Binary Data in an XML file that seemed to gravitate towards BinaryXML :)
Comments:
You're right, for what you use you most likely won't notice any storage/speed issues.
For enterprise-size usage this might be quite significant though.
Assume the following:
- text compression ratio is usually around 20:1
- a heavily loaded web frontend can produce up to 2 GB of log data every day
- the web farm is 100 nodes
So, what do we have? We have 2 x 100 = 200 GB of logs produced every day. If we were to use a binary/compressed format, that could be reduced to 10 GB, thus saving us 190 GB each day.
EU rulings are that sensitive data must be kept for at least 5 years.
Now tell me, does 190 x 365 x 5 GB, which is approx 347 TB, make a difference?
But realistically, of that total, how much is necessary to archive? Yup, I know that the junior programmer will always leave the trace level on max, but is a busy web server going to produce 2 GB of transactional data (sufficient to reconstruct a transaction)? I think not. It's that transactional data that has to be in permanent storage, and we have a DBMS for that purpose.
By , at Monday, June 04, 2007 1:58:00 PM
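The storage estimate from the first comment can be checked with a few lines of Python. The inputs are the commenter's assumptions (2 GB per node per day, 100 nodes, a 20:1 compression ratio, 5 years of retention), not measurements; the constant names are mine.

```python
# Inputs taken from the comment; all of them are assumptions.
NODES = 100
RAW_GB_PER_NODE_PER_DAY = 2
COMPRESSION_RATIO = 20
RETENTION_YEARS = 5
DAYS_PER_YEAR = 365

raw_gb_per_day = NODES * RAW_GB_PER_NODE_PER_DAY
compressed_gb_per_day = raw_gb_per_day / COMPRESSION_RATIO
saved_gb_per_day = raw_gb_per_day - compressed_gb_per_day

# Total saving over the whole retention period, in GB and in decimal TB.
saved_gb_total = saved_gb_per_day * DAYS_PER_YEAR * RETENTION_YEARS
saved_tb_total = saved_gb_total / 1000

print(f"raw: {raw_gb_per_day} GB/day, compressed: {compressed_gb_per_day} GB/day")
print(f"saved over {RETENTION_YEARS} years: {saved_tb_total} TB")
```

Lowering `RAW_GB_PER_NODE_PER_DAY` to whatever share of the logs is really transactional shows how quickly the total shrinks, which is the point of the reply.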
