Command Line Tools

October 29, 2024

This list contains network and data processing tools with a command-line interface, written in any programming language.

Network

Web Scraping

  • pipet - A Swiss Army knife for scraping and extracting data using selectors, JavaScript, and Unix pipes
  • trafilatura - Gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

URLs

  • courlan - Clean, filter and sample URLs to optimize data collection: Deduplication, spam, content and language filters