appimages.scraper

June 22, 2018

Search for AppImage releases over the web.

Run

Normal run: scrapy crawl generic.crawler -a project_file=./projects/org.appimage.appimaged.json
Output results to JSON: scrapy crawl appimage.github.io -o result.json -t json

Sometimes authors don't provide good metadata about their project, so we can help by supplying preset values. In the following example, look at the presets field and the description field inside it. The preset is used as a fallback value when the author has not filled in that field.

{
  "urls" : ["https://github.com/AppImage/AppImageKit/releases"],
  "presets": {
        "id" : "org.appimage.appimaged",
        "description" : {"null": "Daemon to monitor AppImage files in the user home dir."}
  }
}
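
The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the helper name apply_presets, the shape of the scraped metadata, and the rule for what counts as a missing field are all assumptions. Only the presets field itself comes from the project file format.

```python
def apply_presets(scraped, presets):
    """Return scraped metadata with preset values filling missing fields.

    Hypothetical helper: a field counts as missing when it is absent,
    None, or empty in the scraped data.
    """
    merged = dict(presets)  # start from the fallback values
    for key, value in scraped.items():
        if value not in (None, "", {}, []):
            merged[key] = value  # real scraped data wins over a preset
    return merged


project = {
    "urls": ["https://github.com/AppImage/AppImageKit/releases"],
    "presets": {
        "id": "org.appimage.appimaged",
        "description": {"null": "Daemon to monitor AppImage files in the user home dir."},
    },
}

# Assumed scraper output in which the author left the description empty.
scraped = {"id": "org.appimage.appimaged", "description": None, "name": "appimaged"}
metadata = apply_presets(scraped, project["presets"])
```

Here the description falls back to the preset text, while the name field, which the scraped data does provide, is kept unchanged.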

Multiple applications released on a single page?

No problem, use the match field. It expects a Python regex, which is used to match the right AppImage download links for the app you are scraping.

{
  "urls" : ["https://github.com/AppImage/AppImageKit/releases"],
  "match" : ".*\/appimagetool.*",
  "presets": {
    "id" : "org.appimagekit.appimaged",
    "description" : {"null": "Daemon to monitor AppImage files in the user home dir."}
  }
}
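
To show how such a pattern narrows down the links, here is a minimal sketch using Python's re module. The helper name filter_links and the example.org URLs are hypothetical, and the use of re.match (anchored at the start of the link) is an assumption about how the scraper applies the pattern. Note that once the JSON file is parsed, the escaped "\/" is a plain "/".

```python
import re


def filter_links(links, pattern):
    """Keep only the links that the regex matches from their start.

    Hypothetical helper, not part of the scraper's code.
    """
    regex = re.compile(pattern)
    return [link for link in links if regex.match(link)]


# Made-up download links standing in for a release page with two apps.
links = [
    "https://example.org/releases/download/continuous/appimagetool-x86_64.AppImage",
    "https://example.org/releases/download/continuous/appimaged-x86_64.AppImage",
]

# The value of the "match" field after JSON decoding.
matched = filter_links(links, r".*/appimagetool.*")
```

With these sample links, only the appimagetool link is kept; the appimaged link on the same page is ignored.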
