Thursday, August 25, 2005

more xml schema validation

I thought I had XML validation sorted after the following entries (java and xml, and schema validation).

So today, when I went to validate some XML files against an XSD, I was a bit surprised when it didn't work.

What was more surprising was that it had been working, and when I changed something small, it stopped.

As part of a unit test I was validating an XML file against a schema, and it was working fine. However, I was loading the XSD into the parser via a webserver. This wouldn't do, since other developers would want to run the unit tests and would not want the hassle of starting up a webserver and dropping the XSD file in place before running the tests.

So I changed the URL to load the XSD via the file:// protocol. See below:

xsd="file://"+path+"wishlist.xsd";


Running that however gave the following exception



org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element 'wishList'.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
....


But I knew the XSD was correct because it had been working before when I retrieved it via http:// on my webserver....

Do you see the problem? Well I didn't initially. After some poking around I found the solution.

xsd="file:///"+path+"wishlist.xsd";

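See the difference? Subtle, isn't it? Basically a file URL needs three forward slashes (file:///wishlist.xsd): two after the scheme, then an empty host name, then the slash that starts the path. With only two slashes, the first part of the path is read as a host name, so Xerces presumably went looking for the schema in the wrong place. That was the problem. Surely a better error message could have been generated there (maybe "XSD not found" or something).

Anyway, I then started getting the following error: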
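
To make the slash counting concrete, here is a small sketch of how a URI parser splits the two forms, plus one way to avoid building the URL by hand. It uses only java.net.URI and java.io.File; the class name FileUrlDemo and the "schemas" directory are made up for illustration, and I am assuming Xerces resolves the URL along the same lines.

```java
import java.io.File;
import java.net.URI;

class FileUrlDemo {
    // Host component as seen by the URI parser, or null when there is none.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Path component as seen by the URI parser.
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    // Lets java.io.File produce the file URL, so no slashes are counted by hand.
    static String schemaUrl(String dir, String name) {
        return new File(dir, name).toURI().toString();
    }

    public static void main(String[] args) {
        // Two slashes: "schemas" is taken as the host, not as a directory.
        System.out.println(hostOf("file://schemas/wishlist.xsd"));  // schemas
        System.out.println(pathOf("file://schemas/wishlist.xsd"));  // /wishlist.xsd

        // Three slashes: no host, and the whole thing is the path.
        System.out.println(hostOf("file:///schemas/wishlist.xsd")); // null
        System.out.println(pathOf("file:///schemas/wishlist.xsd")); // /schemas/wishlist.xsd

        // Absolute file URL for a (hypothetical) schemas directory.
        System.out.println(schemaUrl("schemas", "wishlist.xsd"));
    }
}
```

Note that File.toURI() produces the single-slash form (file:/...), which is also a valid file URL with no host.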



org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
...


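The short answer to this problem was my file's character encoding. I had saved my XSD using a free editor (Crimson). By default it saves files in ASCII encoding, but my XSD was expected to be in UTF-8. (Pure ASCII is also valid UTF-8, so presumably the file contained at least one non-ASCII character in the editor's default single-byte encoding.) Changing the character encoding and re-saving solved the problem.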
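
A quick way to check for this kind of problem before the parser trips over it is to try decoding the bytes strictly as UTF-8. The following is only a sketch using the standard java.nio.charset classes; the class name Utf8Check and the sample element are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "<name>caf\u00e9</name>";

        // Saved in a single-byte encoding: the accented character becomes a lone 0xE9 byte.
        byte[] singleByte = text.getBytes(StandardCharsets.ISO_8859_1);
        // Saved as UTF-8: the same character becomes a valid two-byte sequence.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(singleByte)); // false
        System.out.println(isValidUtf8(utf8));       // true
    }
}
```

To check a real file, the bytes could come from java.nio.file.Files.readAllBytes instead of the in-memory strings used here.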

Amazing how a simple little task can tie you down for hours..
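
For a unit test, one way to sidestep schema URLs altogether is to hand the schema to the validator as a stream. The sketch below uses the standard javax.xml.validation API rather than the Xerces parser setup described in this post, so treat it as an alternative, not as what was done here. The class name and the tiny wishList schema are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class LocalSchemaValidation {
    // Hypothetical schema, standing in for wishlist.xsd.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"wishList\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"item\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the XML validates against the schema, false on any validation error.
    static boolean isValid(String xsd, String xml) throws IOException {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // The schema is read from a stream, so no URL is involved.
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValid(XSD, "<wishList><item>book</item></wishList>")); // true
        // Wrong element name: no declaration for 'wishlist' in the schema.
        System.out.println(isValid(XSD, "<wishlist/>"));                           // false
    }
}
```

In a real test the StringReader for the schema could be replaced with a stream opened from the test's classpath, which keeps the XSD next to the test code.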

2 comments:

Anonymous said...

it has ever been file:/// since the old remote exploit of ie4

Anonymous said...

I just wanted to say thank you.

I had:

org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence

problem.

I googled it and found a solution (text editor saving in ascii) here.

So, thx. :)