Name: python-Scrapy-doc
Version: 2.12.0
Release: 1.3
Distribution: openSUSE Tumbleweed
Vendor: openSUSE
Build date: Tue Dec 3 09:24:29 2024
Build host: reproducible
Group: Unspecified
Size: 0
Source RPM: python-Scrapy-2.12.0-1.3.src.rpm
Packager: https://bugs.opensuse.org
Url: https://scrapy.org
Summary: Documentation for python-Scrapy
Provides documentation for python-Scrapy.
License: BSD-3-Clause
* Tue Dec 03 2024 Steve Kowalik <steven.kowalik@suse.com>
- Update to 2.12.0:
  * Dropped support for Python 3.8, added support for Python 3.13
  * start_requests can now yield items
  * Added scrapy.http.JsonResponse
  * Added the CLOSESPIDER_PAGECOUNT_NO_ITEM setting
* Thu Jul 11 2024 Dirk Müller <dmueller@suse.com>
- update to 2.11.2 (bsc#1224474, CVE-2024-1968):
  * Redirects to non-HTTP protocols are no longer followed. Please see the
    23j4-mw76-5v7h security advisory for more information. (:issue:`457`)
  * The Authorization header is now dropped on redirects to a different
    scheme (http:// or https://) or port, even if the domain is the same.
    Please see the 4qqq-9vqf-3h3f security advisory for more information.
  * When using system proxy settings that are different for http:// and
    https://, redirects to a different URL scheme will now also trigger the
    corresponding change in proxy settings for the redirected request.
    Please see the jm3v-qxmh-hxwv security advisory for more information.
    (:issue:`767`)
  * :attr:`Spider.allowed_domains <scrapy.Spider.allowed_domains>` is now
    enforced for all requests, not only requests from spider callbacks.
  * :func:`~scrapy.utils.iterators.xmliter_lxml` no longer resolves XML
    entities.
  * defusedxml is now used to make
    :class:`scrapy.http.request.rpc.XmlRpcRequest` more secure.
  * Restored support for brotlipy, which had been dropped in Scrapy 2.11.1
    in favor of brotli. (:issue:`6261`) Note that brotlipy is deprecated,
    both in Scrapy and upstream; use brotli instead if you can.
  * Made :setting:`METAREFRESH_IGNORE_TAGS` default to ["noscript"]. This
    prevents
    :class:`~scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware`
    from following redirects that would not be followed by web browsers
    with JavaScript enabled.
  * During :ref:`feed export <topics-feed-exports>`, do not close the
    underlying file from :ref:`built-in post-processing plugins
    <builtin-plugins>`.
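The new and changed settings above can be sketched in a project's settings.py; the numeric values below are illustrative choices, not Scrapy defaults:

```python
# settings.py sketch (assumes Scrapy >= 2.12; values are illustrative)

# Stop the spider once 10 consecutive pages have been crawled without
# producing any items (CLOSESPIDER_PAGECOUNT_NO_ITEM is new in 2.12.0).
CLOSESPIDER_PAGECOUNT_NO_ITEM = 10

# Since 2.11.2 this defaults to ["noscript"]; set it to [] to restore the
# old behavior of following <meta refresh> redirects inside <noscript>.
METAREFRESH_IGNORE_TAGS = ["noscript"]
```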
  * :class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
    now properly applies the unique and canonicalize parameters.
  * Do not initialize the scheduler disk queue if :setting:`JOBDIR` is an
    empty string.
  * Fix :attr:`Spider.logger <scrapy.Spider.logger>` not logging custom
    extra information.
  * robots.txt files with a non-UTF-8 encoding no longer prevent parsing
    the UTF-8-compatible (e.g. ASCII) parts of the document.
  * :meth:`scrapy.http.cookies.WrappedRequest.get_header` no longer raises
    an exception if default is None.
  * :class:`~scrapy.selector.Selector` now uses
    :func:`scrapy.utils.response.get_base_url` to determine the base URL of
    a given :class:`~scrapy.http.Response`. (:issue:`6265`)
  * The :meth:`media_to_download` method of :ref:`media pipelines
    <topics-media-pipeline>` now logs exceptions before stripping them.
  * When passing a callback to the :command:`parse` command, build the
    callback callable with the right signature.
  * Add a FAQ entry about :ref:`creating blank requests
    <faq-blank-request>`.
  * Document that :attr:`scrapy.selector.Selector.type` can be "json".
  * Make builds reproducible.
  * Packaging and test fixes
* Mon Mar 25 2024 Dirk Müller <dmueller@suse.com>
- update to 2.11.1 (bsc#1220514, CVE-2024-1892, bsc#1221986):
  * Addressed `ReDoS vulnerabilities`_ (bsc#1220514, CVE-2024-1892):
    - scrapy.utils.iterators.xmliter is now deprecated in favor of
      :func:`~scrapy.utils.iterators.xmliter_lxml`, which
      :class:`~scrapy.spiders.XMLFeedSpider` now uses. To minimize the
      impact of this change on existing code,
      :func:`~scrapy.utils.iterators.xmliter_lxml` now supports indicating
      the node namespace with a prefix in the node name, and big files with
      highly nested trees when using libxml2 2.7+.
    - Fixed regular expressions in the implementation of the
      :func:`~scrapy.utils.response.open_in_browser` function.

    .. _ReDoS vulnerabilities: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

  * :setting:`DOWNLOAD_MAXSIZE` and :setting:`DOWNLOAD_WARNSIZE` now also
    apply to the decompressed response body. Please see the
    `7j7m-v7m3-jqm7 security advisory`_ for more information. (bsc#1221986)

    .. _7j7m-v7m3-jqm7 security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-7j7m-v7m3-jqm7

  * Also in relation to the `7j7m-v7m3-jqm7 security advisory`_, the
    deprecated scrapy.downloadermiddlewares.decompression module has been
    removed.
  * The Authorization header is now dropped on redirects to a different
    domain. Please see the `cw9j-q3vf-hrrv security advisory`_ for more
    information.
  * The OS signal handling code was refactored to no longer use private
    Twisted functions. (:issue:`6024`, :issue:`6064`, :issue:`6112`)
  * Improved documentation for :class:`~scrapy.crawler.Crawler`
    initialization changes made in the 2.11.0 release. (:issue:`6057`,
    :issue:`6147`)
  * Extended documentation for :attr:`Request.meta
    <scrapy.http.Request.meta>`.
  * Fixed the :reqmeta:`dont_merge_cookies` documentation. (:issue:`5936`)
  * Added a link to Zyte's export guides to the :ref:`feed exports
    <topics-feed-exports>` documentation.
  * Added a missing note about backward-incompatible changes in
    :class:`~scrapy.exporters.PythonItemExporter` to the 2.11.0 release
    notes.
  * Added a missing note about removing the deprecated
    scrapy.utils.boto.is_botocore() function to the 2.8.0 release notes.
  * Other documentation improvements. (:issue:`6128`, :issue:`6144`,
    :issue:`6163`, :issue:`6190`, :issue:`6192`)
- drop twisted-23.8.0-compat.patch (upstream)
* Wed Jan 10 2024 Daniel Garcia <daniel.garcia@suse.com>
- Add patch twisted-23.8.0-compat.patch gh#scrapy/scrapy#6064
- Update to 2.11.0:
  - Spiders can now modify settings in their from_crawler methods,
    e.g. based on spider arguments.
  - Periodic logging of stats.
  - Bug fixes.
- 2.10.0:
  - Added Python 3.12 support, dropped Python 3.7 support.
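Since 2.11.1, the download-size limits above also cover the decompressed body, so they can no longer be bypassed by highly compressed ("zip bomb") responses. A settings.py sketch with illustrative limits:

```python
# settings.py sketch (limits are illustrative, not Scrapy defaults);
# since 2.11.1 these apply to the decompressed response body as well,
# not just the bytes received over the wire.
DOWNLOAD_MAXSIZE = 64 * 1024 * 1024   # abort downloads larger than 64 MiB
DOWNLOAD_WARNSIZE = 16 * 1024 * 1024  # log a warning above 16 MiB
```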
  - The new add-ons framework simplifies configuring 3rd-party components
    that support it.
  - Exceptions to retry can now be configured.
  - Many fixes and improvements for feed exports.
- 2.9.0:
  - Per-domain download settings.
  - Compatibility with new cryptography and new parsel.
  - JMESPath selectors from the new parsel.
  - Bug fixes.
- 2.8.0:
  - This is a maintenance release, with minor features, bug fixes, and
    cleanups.
* Mon Nov 07 2022 Yogalakshmi Arunachalam <yarunachalam@suse.com>
- Update to v2.7.1:
  * Relaxed the restriction introduced in 2.6.2 so that the
    Proxy-Authorization header can again be set explicitly in certain
    cases, restoring compatibility with scrapy-zyte-smartproxy 2.1.0 and
    older
  * Bug fixes
  * Full changelog:
    https://docs.scrapy.org/en/latest/news.html#scrapy-2-7-1-2022-11-02
* Thu Oct 27 2022 Yogalakshmi Arunachalam <yarunachalam@suse.com>
- Update to v2.7.0
  Highlights:
  * Added Python 3.11 support, dropped Python 3.6 support
  * Improved support for :ref:`asynchronous callbacks <topics-coroutines>`
  * :ref:`Asyncio support <using-asyncio>` is enabled by default on new
    projects
  * Output names of item fields can now be arbitrary strings
  * Centralized :ref:`request fingerprinting <request-fingerprints>`
    configuration is now possible
  Modified requirements:
  * Python 3.7 or greater is now required; support for Python 3.6 has been
    dropped. Support for the upcoming Python 3.11 has been added. The
    minimum required version of some dependencies has changed as well:
    - lxml: 3.5.0 → 4.3.0
    - Pillow (:ref:`images pipeline <images-pipeline>`): 4.0.0 → 7.1.0
    - zope.interface: 5.0.0 → 5.1.0
    (:issue:`5512`, :issue:`5514`, :issue:`5524`, :issue:`5563`,
    :issue:`5664`, :issue:`5670`, :issue:`5678`)
  Deprecations:
  - :meth:`ImagesPipeline.thumb_path
    <scrapy.pipelines.images.ImagesPipeline.thumb_path>` must now accept an
    item parameter (:issue:`5504`, :issue:`5508`).
  - The scrapy.downloadermiddlewares.decompression module is now deprecated
    (:issue:`5546`, :issue:`5547`).
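The asyncio reactor that new projects enable by default since 2.7 is selected through the TWISTED_REACTOR setting; a minimal settings.py fragment (the setting name and reactor path are Scrapy's documented values):

```python
# settings.py sketch: opt in to the asyncio reactor explicitly
# (the default for projects generated by `scrapy startproject` since 2.7)
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```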
  Complete changelog:
  https://github.com/scrapy/scrapy/blob/2.7/docs/news.rst
* Fri Sep 09 2022 Yogalakshmi Arunachalam <yarunachalam@suse.com>
- Update to v2.6.2
  Security bug fix:
  * When HttpProxyMiddleware processes a request with proxy metadata, and
    that proxy metadata includes proxy credentials, HttpProxyMiddleware
    sets the Proxy-Authorization header, but only if that header is not
    already set.
  * There are third-party proxy-rotation downloader middlewares that set
    different proxy metadata every time they process a request.
  * Because of request retries and redirects, the same request can be
    processed by downloader middlewares more than once, including both
    HttpProxyMiddleware and any third-party proxy-rotation downloader
    middleware.
  * These third-party proxy-rotation downloader middlewares could change
    the proxy metadata of a request to a new value, but fail to remove the
    Proxy-Authorization header from the previous value of the proxy
    metadata, causing the credentials of one proxy to be sent to a
    different proxy.
  * To prevent the unintended leaking of proxy credentials, the behavior of
    HttpProxyMiddleware is now as follows when processing a request:
    + If the request being processed defines proxy metadata that includes
      credentials, the Proxy-Authorization header is always updated to
      feature those credentials.
    + If the request being processed defines proxy metadata without
      credentials, the Proxy-Authorization header is removed unless it was
      originally defined for the same proxy URL. To remove proxy
      credentials while keeping the same proxy URL, remove the
      Proxy-Authorization header.
    + If the request has no proxy metadata, or that metadata is a falsy
      value (e.g. None), the Proxy-Authorization header is removed.
    + It is no longer possible to set a proxy URL through the proxy
      metadata but set the credentials through the Proxy-Authorization
      header. Set proxy credentials through the proxy metadata instead.
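The credential-handling rules above can be sketched in plain Python. This is an illustrative re-implementation of the described behavior, not Scrapy's actual HttpProxyMiddleware code; the function name and signature are invented for the example:

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def apply_proxy_auth(headers, proxy, auth_proxy=None):
    """Sketch of the 2.6.2 Proxy-Authorization rules (illustrative only).

    headers:    mutable dict of header name -> value
    proxy:      the request's proxy metadata (URL, possibly with user:pass)
    auth_proxy: proxy URL the current Proxy-Authorization header was set for
    Returns the proxy URL stripped of credentials (or None).
    """
    if not proxy:
        # Falsy proxy metadata (e.g. None): always drop stale credentials.
        headers.pop("Proxy-Authorization", None)
        return None
    parts = urlsplit(proxy)
    if parts.username:
        # Credentials in the proxy metadata: always refresh the header.
        creds = f"{parts.username}:{parts.password or ''}".encode()
        headers["Proxy-Authorization"] = b"Basic " + base64.b64encode(creds)
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    bare = urlunsplit(parts._replace(netloc=host))
    if not parts.username and bare != auth_proxy:
        # No credentials and a different proxy URL: drop the old header.
        headers.pop("Proxy-Authorization", None)
    return bare
```

Note how the header survives only when the credential-less proxy URL matches the one the header was set for, which is the rule that stops one proxy's credentials leaking to another.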
  * Also fixes the following regressions introduced in 2.6.0:
    + CrawlerProcess supports again crawling multiple spiders (issue 5435,
      issue 5436)
    + Installing a Twisted reactor before Scrapy does (e.g. importing
      twisted.internet.reactor somewhere at the module level) no longer
      prevents Scrapy from starting, as long as a different reactor is not
      specified in TWISTED_REACTOR (issue 5525, issue 5528)
    + Fixed an exception that was being logged after the spider finished
      under certain conditions (issue 5437, issue 5440)
    + The --output/-o command-line parameter supports again a value
      starting with a hyphen (issue 5444, issue 5445)
    + The scrapy parse -h command no longer throws an error (issue 5481,
      issue 5482)
* Fri Mar 04 2022 Ben Greiner <code@bnavigator.de>
- Update runtime requirements and test deselections
* Wed Mar 02 2022 Matej Cepl <mcepl@suse.com>
- Update to v2.6.1:
  * Security fixes for cookie handling (CVE-2022-0577 aka bsc#1196638,
    GHSA-mfjm-vh54-3f96)
  * Python 3.10 support
  * asyncio support is no longer considered experimental, and works
    out-of-the-box on Windows regardless of your Python version
  * Feed exports now support pathlib.Path output paths and per-feed item
    filtering and post-processing
- Remove unnecessary patches:
  - remove-h2-version-restriction.patch
  - add-peak-method-to-queues.patch
* Sun Jan 16 2022 Ben Greiner <code@bnavigator.de>
- Skip a failing test in python310: exception format not recognized
Files:
/usr/share/doc/packages/python-Scrapy-doc
/usr/share/doc/packages/python-Scrapy-doc/html
Generated by rpm2html 1.8.1
Fabrice Bellet, Sun Mar 30 23:22:36 2025