Name: python3-html-text | Distribution: Fedora Project |
Version: 0.6.2 | Vendor: Fedora Project |
Release: 1.fc40 | Build date: Fri Oct 25 05:45:29 2024 |
Group: Unspecified | Build host: buildvm-a64-18.iad2.fedoraproject.org |
Size: 30568 | Source RPM: python-html-text-0.6.2-1.fc40.src.rpm |
Packager: Fedora Project | |
Url: https://github.com/zytedata/html-text | |
Summary: Extract text from HTML |
How is html_text different from .xpath('//text()') from lxml or .get_text() from Beautiful Soup?

- Text extracted with html_text does not contain inline styles, JavaScript, comments, and other text that is not normally visible to users;
- html_text normalizes whitespace, but in a smarter way than .xpath('normalize-space()'), adding spaces around inline elements (which are often used as block elements in HTML markup) and trying to avoid adding extra spaces for punctuation;
- html_text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers.
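The points above can be illustrated with a minimal standard-library sketch. This is not html_text's implementation: the class `VisibleTextParser`, the helper `visible_text`, and the tag sets `HIDDEN_TAGS` and `BLOCK_TAGS` are hypothetical names chosen for this example. It only shows the general idea of dropping invisible content, collapsing whitespace, and breaking lines after block elements. It does not attempt the smarter spacing around inline elements and punctuation described above.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag sets for this sketch; html_text's own rules may differ.
HIDDEN_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}


class VisibleTextParser(HTMLParser):
    """Collect text a user would normally see, one block per line."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside <script> or <style>
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self._hidden_depth = max(0, self._hidden_depth - 1)
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        # Comments never reach this method: HTMLParser routes them to
        # handle_comment, which does nothing unless it is overridden.
        if self._hidden_depth == 0:
            # Collapse any run of whitespace (including source newlines).
            self._parts.append(re.sub(r"\s+", " ", data))

    def text(self):
        lines = "".join(self._parts).split("\n")
        cleaned = (re.sub(r" {2,}", " ", line).strip() for line in lines)
        return "\n".join(line for line in cleaned if line)


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Title</h1><!-- note --><p>Hello   <b>world</b>!</p>"
    "<script>var x = 1;</script></body></html>"
)
print(visible_text(sample))
```

With the packaged library installed, the upstream README documents `html_text.extract_text(html)` as the entry point for this kind of extraction. The sketch above is only a rough approximation of that behaviour.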
License: MIT
* Fri Oct 18 2024 Benson Muite <benson_muite@emailplus.org> - 0.6.2-1 - Initial packaging
/usr/lib/python3.12/site-packages/html_text
/usr/lib/python3.12/site-packages/html_text-0.6.2.dist-info
/usr/lib/python3.12/site-packages/html_text-0.6.2.dist-info/INSTALLER
/usr/lib/python3.12/site-packages/html_text-0.6.2.dist-info/LICENSE
/usr/lib/python3.12/site-packages/html_text-0.6.2.dist-info/METADATA
/usr/lib/python3.12/site-packages/html_text-0.6.2.dist-info/WHEEL
/usr/lib/python3.12/site-packages/html_text-0.6.2.dist-info/top_level.txt
/usr/lib/python3.12/site-packages/html_text/__init__.py
/usr/lib/python3.12/site-packages/html_text/__pycache__
/usr/lib/python3.12/site-packages/html_text/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/html_text/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/html_text/__pycache__/html_text.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/html_text/__pycache__/html_text.cpython-312.pyc
/usr/lib/python3.12/site-packages/html_text/html_text.py
/usr/share/doc/python3-html-text
/usr/share/doc/python3-html-text/README.rst
Generated by rpm2html 1.8.1
Fabrice Bellet, Sat Nov 16 05:36:41 2024