README.rdoc

July 19, 2010 ยท View on GitHub

= R.Daneel

An EventMachine+Ruby library to fetch urls obeying robots.txt rules.

RDaneel is built it on top of @igrigorik's {em-http-request}[http://github.com/igrigorik/em-http-request]

== Features

  • Support following redirects, honoring robots.txt for each host in the redirect chain.
  • Support an external cache to store robots.txt
  • Compatible with all options defined in em-http-request

== Install

$ gem install rdaneel

== Examples

=== Following redirects

require 'rdaneel'

EM.run do r = RDaneel.new("http://bit.ly/cbEnpa") r.callback{ puts r.http_client.response_header.status puts r.http_client.response[0,80] puts r.redirects puts r.uri EM.stop } r.errback{ puts "should not happen" EM.stop } r.get(:redirects => 3) end

=> 200 => http://bit.ly:80/cbEnpa => http://github.com:80/hasmanydevelopers/RDaneel

=== Denied by robots.txt

require 'rdaneel'

EM.run do r = RDaneel.new("http://github.com/hasmanydevelopers/RDaneel/tarball/v0.0.0") r.callback{ puts "should not happen" EM.stop } r.errback{ puts r.error EM.stop } r.get(:redirects => 3) end

=> robots denied

== Why RDaneel?

R Daneel Olivaw is a fictional robot created by Isaac Asimov - http://en.wikipedia.org/wiki/R._Daneel_Olivaw

== Acknowledge

To Ilya Grigorik (@igrigorik) for em-http-request lib and his support and advice.

== Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
  • Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2010 has_many :developers. See LICENSE for details.