One of the entries that have been collecting dust in the “draft” folder for this blog was about how nice it would be to have a search engine for XML documents. So, when the announcement of Google Code Search came out, I thought it was finally done and I could delete the never-published entry. Well, it turns out it doesn’t support searching XML documents. I don’t care to debate whether XML (or some XML dialects) is code or not. All I know is that it would be very nice to be able to do things such as:
- look for instances of a specific GED
- compare how often different XSD constructs are used (choice, sequence…)
- look for all wsdl:binding elements that implement a given portType
- look for all wsdl:port elements and all the WS-A EPRs that have an address in the hp.com domain
- look for all XML documents for which a given XPath query evaluates to “true”
- look at the entire Web (or a subset of it) as one giant SML model and query it
- even for good old HTML/XHTML documents, it would be nice to search them as XML documents and be able to look for pages that contain a certain string as part of the title element or as part of a list.
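To make a couple of these wishes concrete, here is a minimal sketch in Python of what such queries could look like when run over a tiny in-memory corpus rather than the Web. Everything in it is an assumption made for illustration: the `CORPUS` documents, their file names, and the helper functions `count_xsd_constructs` and `bindings_for_port_type` are hypothetical. It uses the standard library's `xml.etree.ElementTree`, which only supports a limited subset of XPath, so a real engine would need a full XPath processor and an index.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XSD_NS = "http://www.w3.org/2001/XMLSchema"
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical mini-corpus standing in for crawled documents.
CORPUS = {
    "order.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:choice>
          <xs:element name="card" type="xs:string"/>
          <xs:element name="invoice" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>""",
    "stock.wsdl": """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://example.org/stock"
    targetNamespace="http://example.org/stock">
  <wsdl:portType name="StockPortType"/>
  <wsdl:binding name="StockSoapBinding" type="tns:StockPortType"/>
  <wsdl:binding name="OtherBinding" type="tns:OtherPortType"/>
</wsdl:definitions>""",
}


def count_xsd_constructs(corpus, names=("choice", "sequence", "all")):
    """Count how often each XSD construct appears across the corpus."""
    counts = Counter()
    for text in corpus.values():
        root = ET.fromstring(text)
        for name in names:
            # ElementTree addresses namespaced elements as {namespace}local-name.
            counts[name] += len(root.findall(f".//{{{XSD_NS}}}{name}"))
    return counts


def bindings_for_port_type(corpus, local_name):
    """List (document, binding name) pairs whose type points at a portType.

    Simplification: only the local part of the QName is compared; a real
    search would resolve the prefix to its namespace first.
    """
    hits = []
    for doc, text in corpus.items():
        root = ET.fromstring(text)
        for binding in root.iter(f"{{{WSDL_NS}}}binding"):
            if binding.get("type", "").split(":")[-1] == local_name:
                hits.append((doc, binding.get("name")))
    return hits


if __name__ == "__main__":
    print(count_xsd_constructs(CORPUS))
    print(bindings_for_port_type(CORPUS, "StockPortType"))
```

The point of the sketch is only that these questions are structural: they depend on element names, namespaces and attribute values, which a plain text search cannot express reliably.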
In the meantime, people are going to have fun searching for passwords embedded in source code and other vulnerabilities.