web-content-extracting.md
July 15, 2021
Bookmarks tagged [web-content-extracting]
www.codever.land/bookmarks/t/web-content-extracting
html2text
https://github.com/Alir3z4/html2text
Convert HTML to Markdown-formatted text.
- tags: python, web-content-extracting
- :octocat: source code
lassie
https://github.com/michaelhelmick/lassie
Web Content Retrieval for Humans.
- tags: python, web-content-extracting
- :octocat: source code
micawber
https://github.com/coleifer/micawber
A small library for extracting rich content from URLs.
- tags: python, web-content-extracting
- :octocat: source code
newspaper
https://github.com/codelucas/newspaper
News extraction, article extraction and content curation in Python.
- tags: python, web-content-extracting
- :octocat: source code
python-readability
https://github.com/buriy/python-readability
Fast Python port of arc90's readability tool.
- tags: python, web-content-extracting
- :octocat: source code
requests-html
https://github.com/kennethreitz/requests-html
Pythonic HTML Parsing for Humans.
- tags: python, web-content-extracting
- :octocat: source code
sumy
https://github.com/miso-belica/sumy
A module for automatic summarization of text documents and HTML pages.
- tags: python, web-content-extracting
- :octocat: source code
textract
https://github.com/deanmalmgren/textract
Extract text from any document: Word, PowerPoint, PDFs, etc.
- tags: python, web-content-extracting
- :octocat: source code
toapi
https://github.com/gaojiuli/toapi
Every web site provides APIs.
- tags: python, web-content-extracting
- :octocat: source code
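
To give a rough idea of the kind of work the libraries above automate, here is a minimal sketch of plain-text extraction from HTML using only Python's standard-library `html.parser`. It is not how any of the listed projects is implemented. The names `TextExtractor` and `extract_text` and the choice of block-level tags are assumptions made for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping script/style bodies."""

    SKIPPED = {"script", "style"}
    # Assumed set of tags that should start a new line in the output.
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace per line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def extract_text(html):
    """Return the visible text of an HTML string, one block element per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p {color: red}</style></head>"
        "<body><h1>Title</h1><p>Hello <b>world</b>.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(extract_text(sample))
```

Real pages need much more than this (boilerplate removal, encoding detection, link and image handling), which is where the bookmarked projects come in.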