How To Download A Crawler?


Hello,
We are building an application that requires a crawler for images and web pages. The crawler should:
1. Be able to store and index all common web page and image formats (ASP, JSP, XHTML, ColdFusion, HTM, DHTML). Images should be supported at different compression ratios and be convertible, but saved in the most compact and most commonly stored format.
2. Run on multiple operating systems, support distributed computing, and integrate with multiple programming languages.
3. Support backup and recovery.


  1. #1 by bailo_de at July 4th, 2009

    Well, the answer really depends on your purposes, but I use a crawler called "wget", which seems to fulfil your needs. It is a GPL-licensed, multiplatform application with lots of options. You call it from the command line.
    For example, say you want to download a whole web page such as http://www.only-from-this-domain.com/kil...
    and save it in a local folder, following only links that belong to www.only-from-this-domain.com, to a depth of three levels, and converting the links to local ones. Then you could use this command line (note that -D takes a domain name, not a full URL):
    wget -D www.only-from-this-domain.com -r -l3 -Pc:localfolder --convert-links -H "http://www.only-from-this-domain.com/ki…
    Regards, from
    Bogotá, Colombia.
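
The command in the answer above can also be spelled out with wget's long option names, which makes each part easier to read. The following is only a minimal sketch: the function name mirror_cmd, the RUN variable, the example.com domain and the ./mirror folder are illustrative assumptions and do not come from the original post. By default it only prints the command so it can be reviewed before anything is downloaded.

```shell
#!/bin/sh
# Sketch: build a recursive, domain-restricted wget command.
# All names and values below are hypothetical placeholders.

mirror_cmd() {
  # $1 = domain to stay within, $2 = start URL,
  # $3 = local destination folder, $4 = recursion depth (default 3)
  domain=$1
  start_url=$2
  dest=$3
  depth=${4:-3}

  # --recursive         follow links found in downloaded pages (-r)
  # --level             maximum recursion depth (-l)
  # --span-hosts        allow other hosts (-H), limited by --domains
  # --domains           only follow links on these domains (-D)
  # --convert-links     rewrite links so the local copy is browsable
  # --directory-prefix  folder where the files are saved (-P)
  set -- wget --recursive --level="$depth" --span-hosts \
    --domains="$domain" --convert-links \
    --directory-prefix="$dest" "$start_url"

  if [ "${RUN:-0}" = "1" ]; then
    "$@"          # actually run wget
  else
    echo "$*"     # dry run: just show the command
  fi
}

mirror_cmd example.com http://example.com/ ./mirror 3
```

Running the script as-is prints the assembled command; setting RUN=1 in the environment makes it execute wget instead. Keeping the dry run as the default is a deliberate choice, since a recursive download can fetch a lot of data.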

Powered by Yahoo! Answers