Google Now Crawls Forms

Google is continuing its effort to bring relevant webpages to the Google searching community. The company recently announced that its spiders will now enter data into a small number of HTML forms that they come across while crawling the web. By doing this, Google hopes to find additional pages and URLs that are not currently reachable through normal text links and are part of the hidden web.

The Google spiders will only query forms that use the GET method and will not submit forms that use the POST method or contain a password text box. They will also skip forms that seem to require personal information such as logins or user IDs. Google says that the spider will enter a small number of queries into forms that it encounters while crawling high-quality websites. It also states that only a small number of very useful sites will have their forms crawled and that Googlebot will continue to adhere to nofollow and noindex tags as well as robots.txt files.
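To make these rules concrete, here is a minimal Python sketch of how a webmaster might check which forms on a page fit the description above. It is not Google's implementation, which has not been published. The `FormScanner` class, the `eligible_forms` function, and the `PERSONAL_HINTS` list are hypothetical names invented for this illustration, and matching field names against a short hint list is only a rough guess at what "seems to require personal information" could mean.

```python
from html.parser import HTMLParser

# Hypothetical hint list (an assumption): the announcement only mentions
# forms that "seem to require personal information" such as logins or user IDs.
PERSONAL_HINTS = ("login", "logon", "user", "passw", "email")


class FormScanner(HTMLParser):
    """Records, for each <form> on a page, whether it matches the stated rules."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per form, in document order
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML treats a missing method attribute as GET.
            method = (attrs.get("method") or "get").lower()
            self._current = {"method": method, "eligible": method == "get"}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            field_type = (attrs.get("type") or "text").lower()
            name = (attrs.get("name") or "").lower()
            # A password box or a personal-looking field name rules the form out.
            if field_type == "password" or any(h in name for h in PERSONAL_HINTS):
                self._current["eligible"] = False

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def eligible_forms(html):
    """Return one boolean per form: True if the form fits the stated rules."""
    scanner = FormScanner()
    scanner.feed(html)
    return [form["eligible"] for form in scanner.forms]
```

Calling `eligible_forms(page_source)` on a page's HTML returns one flag per form. A simple GET search box would come back as True, while a POST form or a login form would come back as False. Passing this check says nothing about whether a site is among the small number Google actually selects.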

This new crawl should have very little, if any, impact on typical websites, but it should lead to higher-quality search engine results.

Jayant Madhavan and Alon Halevy from Google’s Crawling and Indexing Team state on the Google Webmaster Central Blog:
“This experiment is part of Google’s broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience.”