Search engine for XML documents

In Everything, Tech on October 10, 2006 by @vambenepe

One of the entries that has been collecting dust in the “draft” folder for this blog was about how nice it would be to have a search engine for XML documents. So when Google Code Search was announced, I thought it was finally done and I could delete the never-published entry. Well, it turns out it doesn’t support searching XML documents. I don’t care to debate whether XML (or some XML dialects) is code or not; all I know is that it would be very nice to be able to do things such as:

  • look for instances of a specific GED (global element declaration)
  • compare how often different XSD constructs are used (choice, sequence…)
  • look for all wsdl:binding elements that implement a given portType
  • look for all wsdl:port elements and all the WS-A EPRs that have an address in the hp.com domain
  • look for all XML documents for which a given XPath query evaluates to “true”
  • look at the entire Web (or a subset of it) as one giant SML model and query it
  • even for good old HTML/XHTML documents, it would be nice to search them as XML documents and be able to look for pages that contain a certain string as part of the title element or as part of a list.
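
To make a couple of these wishes concrete, here is a minimal Python sketch of what two of the queries could look like when run against a single document. It is only an illustration under assumptions: the function names, the sample WSDL and the "compare only the local part of the QName" shortcut are all hypothetical, and a real search engine would run this kind of query against an index of crawled documents rather than parse each one on demand.

```python
import xml.etree.ElementTree as ET
from collections import Counter

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical sample document, used only to exercise the functions below.
SAMPLE_WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example">
  <wsdl:types>
    <xs:schema>
      <xs:element name="order">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="id" type="xs:string"/>
            <xs:choice>
              <xs:element name="a" type="xs:string"/>
              <xs:element name="b" type="xs:string"/>
            </xs:choice>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </wsdl:types>
  <wsdl:portType name="OrderPort"/>
  <wsdl:binding name="OrderSoapBinding" type="tns:OrderPort"/>
  <wsdl:binding name="OtherBinding" type="tns:OtherPort"/>
</wsdl:definitions>"""


def bindings_for_port_type(xml_text, port_type_local_name):
    """Return the names of wsdl:binding elements that refer to a portType.

    Simplification (an assumption of this sketch): only the local part of
    the QName in the "type" attribute is compared, so the namespace the
    prefix is bound to is ignored.
    """
    root = ET.fromstring(xml_text)
    names = []
    for binding in root.iter(f"{{{WSDL_NS}}}binding"):
        local_part = binding.get("type", "").split(":")[-1]
        if local_part == port_type_local_name:
            names.append(binding.get("name"))
    return names


def count_xsd_constructs(xml_text, constructs=("choice", "sequence", "all")):
    """Count how often each listed XSD construct appears in one document."""
    root = ET.fromstring(xml_text)
    wanted = {f"{{{XSD_NS}}}{name}": name for name in constructs}
    counts = Counter()
    for element in root.iter():
        if element.tag in wanted:
            counts[wanted[element.tag]] += 1
    return counts


if __name__ == "__main__":
    print(bindings_for_port_type(SAMPLE_WSDL, "OrderPort"))
    print(count_xsd_constructs(SAMPLE_WSDL))
```

Both functions match on namespace-qualified element names rather than on raw text, which is exactly what a plain-text or regular-expression code search cannot do reliably, since the same element can appear under any prefix.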

In the meantime, people are going to have fun searching for passwords embedded in source code and other vulnerabilities.
