Woodstox and the W3C 503 error
This morning, I was testing our static analysis tool and it threw a very strange error:
Could not read source file: Server returned HTTP response code: 503 for URL http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
It came from the Woodstox parser, which was trying to fetch the DTD referenced in the DOCTYPE declaration of one of our XHTML files.
So, I browsed to said dtd and found this page instead:
IP blocked due to re-requesting files too often
Your IP address has been blocked from accessing our site for 24 hours due to abuse.
The specific type of abuse we observed is: re-requesting the same resource too frequently. Specifically, we received at least 500 requests for the same resource (URI) from your IP address within a ten-minute time interval.
If you are using an application that makes HTTP requests to other sites, please configure it to use an outgoing HTTP cache instead of re-requesting the same files over and over again.
... and so on.
This led me to a lot of interesting research that I won't go over here. I am simply going to show one way I found to work around the issue when using the Java XMLStreamReader with Woodstox.
To get an XMLStreamReader, one can do this:
InputStream is = ...;
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(is);
This will create a "ValidatingStreamReader", which requests the DTD each time it sees one referenced. Hence the complaint from the W3C that its xhtml1-transitional DTD was being requested too often.
There are two ways that I see to solve this, and I found one after digging in the API for a few minutes. If I change my code to read this:
InputStream is = ...;
XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
XMLStreamReader reader = factory.createXMLStreamReader(is);
Then it won't request any dtds when parsing the xml file.
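To make the snippet above concrete, here is a minimal, self-contained sketch of the same idea. The class name NoDtdParse and the helper firstElementName are hypothetical names made up for illustration, and the checked XMLStreamException is wrapped only to keep the example short. It uses whatever StAX implementation is on the classpath (Woodstox if present, otherwise the JDK default).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class NoDtdParse {

    // Hypothetical helper: returns the local name of the first element,
    // parsing with DTD support switched off so no DTD is fetched.
    static String firstElementName(InputStream is) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // The standard StAX switch used in the post.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(is);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null; // no element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // A tiny XHTML document that references the W3C DTD from the error message.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>t</title></head><body/></html>";
        InputStream is = new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstElementName(is));
    }
}
```

One trade-off to keep in mind: with DTD support off, entities that only the DTD declares (for example &nbsp; in XHTML) are no longer defined, so a document that uses them will likely fail to parse or need separate handling.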
I found another property when digging through the Woodstox code that I can't figure out how to access. It was in InputConfigFlags and is referenced in ReaderConfig, which is an object fashioned in the Woodstox implementation of XMLStreamReader:
/**
* If true, input factory is allowed cache parsed external DTD subsets,
* potentially speeding up things for which DTDs are needed for: entity
* substitution, attribute defaulting, and of course DTD-based validation.
*/
final static int CFG_CACHE_DTDS = 0x00010000;
This seems like the more appropriate solution. Any ideas on how to access it?
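One possible answer, offered as a sketch rather than a confirmed fix: Woodstox appears to expose this flag through a public string property, WstxInputProperties.P_CACHE_DTDS. I am assuming its value is "com.ctc.wstx.cacheDTDs", so check that against the Woodstox version in use. The class name DtdCacheConfig and the SHARED field are hypothetical. The code guards the call with isPropertySupported so it also runs against a StAX implementation that does not know the property.

```java
import javax.xml.stream.XMLInputFactory;

public class DtdCacheConfig {

    // Assumption: the string behind WstxInputProperties.P_CACHE_DTDS.
    // Verify against the Woodstox version on the classpath.
    static final String WSTX_CACHE_DTDS = "com.ctc.wstx.cacheDTDs";

    // Builds a factory and turns DTD caching on where the implementation supports it.
    static XMLInputFactory newCachingFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        if (factory.isPropertySupported(WSTX_CACHE_DTDS)) {
            factory.setProperty(WSTX_CACHE_DTDS, Boolean.TRUE);
        }
        return factory;
    }

    // A cache kept by the factory can only help if the same factory is reused,
    // so hold one instance instead of calling newInstance() for every file.
    static final XMLInputFactory SHARED = newCachingFactory();
}
```

If the cache does live on the factory, as the comment in the Woodstox source suggests, then creating a fresh factory for every file would defeat it regardless of the flag. Reusing DtdCacheConfig.SHARED for every createXMLStreamReader call is probably the more important half of this change.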