web-crawling.md
July 15, 2021
Bookmarks tagged [web-crawling]
www.codever.land/bookmarks/t/web-crawling
anemone
https://github.com/chriskite/anemone
Ruby library and CLI for crawling websites.
- tags: ruby, web-crawling
- :octocat: source code
LinkThumbnailer
https://github.com/gottfrois/link_thumbnailer
Ruby gem that generates thumbnail images and videos from a given URL, much like the link previews on popular social websites.
- tags: ruby, web-crawling
- :octocat: source code
Mechanize
https://github.com/sparklemotion/mechanize
Mechanize is a Ruby library that makes automated web interaction easy.
- tags: ruby, web-crawling
- :octocat: source code
MetaInspector
https://github.com/jaimeiniesta/metainspector
Ruby gem for web scraping purposes.
- tags: ruby, web-crawling
- :octocat: source code
Upton
https://github.com/propublica/upton
A batteries-included framework for easy web-scraping.
- tags: ruby, web-crawling
- :octocat: source code
Wombat
https://github.com/felipecsl/wombat
Web scraper with an elegant DSL that parses structured data from web pages.
- tags: ruby, web-crawling
- :octocat: source code
cola
https://github.com/chineking/cola
A distributed crawling framework.
- tags: python, web-crawling, web-scraping
- :octocat: source code
feedparser
https://pythonhosted.org/feedparser/
Universal feed parser.
- tags: python, web-crawling, web-scraping
grab
https://github.com/lorien/grab
Site scraping framework.
- tags: python, web-crawling, web-scraping
- :octocat: source code
MechanicalSoup
https://github.com/MechanicalSoup/MechanicalSoup
A Python library for automating interaction with websites.
- tags: python, web-crawling, web-scraping
- :octocat: source code
portia
https://github.com/scrapinghub/portia
Visual scraping for Scrapy.
- tags: python, web-crawling, web-scraping
- :octocat: source code
pyspider
https://github.com/binux/pyspider
A powerful spider system.
- tags: python, web-crawling, web-scraping
- :octocat: source code
robobrowser
https://github.com/jmcarp/robobrowser
A simple, Pythonic library for browsing the web without a standalone web browser.
- tags: python, web-crawling, web-scraping
- :octocat: source code
scrapy
A fast, high-level screen-scraping and web-crawling framework.
- tags: python, web-crawling, web-scraping
- :octocat: source code
Apache Nutch
Highly extensible, highly scalable web crawler for production environments.
- tags: java, web-crawling
Crawler4j
https://github.com/yasserg/crawler4j
Simple and lightweight web crawler.
- tags: java, web-crawling
- :octocat: source code
jsoup
Scrapes, parses, manipulates and cleans HTML.
- tags: java, web-crawling
StormCrawler
SDK for building low-latency and scalable web crawlers.
- tags: java, web-crawling
webmagic
https://github.com/code4craft/webmagic
Scalable crawler with downloading, URL management, content extraction, and persistence.
- tags: java, web-crawling
- :octocat: source code
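
Several of the entries above describe a crawler in terms of the same four stages: downloading, URL management, content extraction, and persistence. As a rough illustration of how those stages fit together, here is a minimal sketch that uses only the Python standard library. It is not taken from any of the listed projects; the names `LinkExtractor`, `crawl`, and `PAGES`, and the `example.test` URLs, are hypothetical and exist only for this example. Downloading is simulated with an in-memory dictionary so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Content extraction: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. `fetch(url)` must return HTML text or None."""
    frontier = deque([start_url])  # URL management: pages still to visit
    seen = {start_url}             # URL management: never queue a URL twice
    store = {}                     # persistence: url -> outgoing links

    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)          # downloading (injected, so it can be faked)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links against the current page and drop #fragments.
        links = [urldefrag(urljoin(url, href)).url for href in parser.links]
        store[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store


# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "<p>no links</p>",
}

result = crawl("http://example.test/", PAGES.get)
for page, links in result.items():
    print(page, "->", links)
```

A production crawler would also need politeness controls such as robots.txt handling, rate limiting, and retries, which is a large part of what the frameworks listed here provide.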