Name: python-Scrapy-doc
Version: 2.12.0
Release: 1.3
Distribution: openSUSE Tumbleweed
Vendor: openSUSE
Build date: Tue Dec 3 09:24:29 2024
Build host: reproducible
Group: Unspecified
Size: 0
Source RPM: python-Scrapy-2.12.0-1.3.src.rpm
Packager: https://bugs.opensuse.org
Url: https://scrapy.org
Summary: Documentation for python-Scrapy
Provides documentation for python-Scrapy.
License: BSD-3-Clause
* Tue Dec 03 2024 Steve Kowalik <steven.kowalik@suse.com>
- Update to 2.12.0:
  * Dropped support for Python 3.8, added support for Python 3.13
  * start_requests can now yield items
  * Added scrapy.http.JsonResponse
  * Added the CLOSESPIDER_PAGECOUNT_NO_ITEM setting
* Thu Jul 11 2024 Dirk Müller <dmueller@suse.com>
- update to 2.11.2 (bsc#1224474, CVE-2024-1968):
  * Redirects to non-HTTP protocols are no longer followed. Please see the
    23j4-mw76-5v7h security advisory for more information. (:issue:`457`)
  * The Authorization header is now dropped on redirects to a different
    scheme (http:// or https://) or port, even if the domain is the same.
    Please see the 4qqq-9vqf-3h3f security advisory for more information.
  * When using system proxy settings that are different for http:// and
    https://, redirects to a different URL scheme will now also trigger the
    corresponding change in proxy settings for the redirected request.
    Please see the jm3v-qxmh-hxwv security advisory for more information.
    (:issue:`767`)
  * :attr:`Spider.allowed_domains <scrapy.Spider.allowed_domains>` is now
    enforced for all requests, not only requests from spider callbacks.
  * :func:`~scrapy.utils.iterators.xmliter_lxml` no longer resolves XML
    entities.
  * defusedxml is now used to make
    :class:`scrapy.http.request.rpc.XmlRpcRequest` more secure.
  * Restored support for brotlipy, which had been dropped in Scrapy 2.11.1
    in favor of brotli. (:issue:`6261`) Note that brotlipy is deprecated,
    both in Scrapy and upstream; use brotli instead if you can.
  * Made :setting:`METAREFRESH_IGNORE_TAGS` default to ["noscript"]. This
    prevents
    :class:`~scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware`
    from following redirects that would not be followed by web browsers
    with JavaScript enabled.
  * During :ref:`feed export <topics-feed-exports>`, do not close the
    underlying file from :ref:`built-in post-processing plugins
    <builtin-plugins>`.
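The new and changed settings above can be sketched in a project's settings.py; the numeric values below are illustrative choices, not Scrapy defaults:

```python
# settings.py sketch (assumes Scrapy >= 2.12; values are illustrative)

# Stop the spider once 10 consecutive pages have been crawled without
# producing any items (CLOSESPIDER_PAGECOUNT_NO_ITEM is new in 2.12.0).
CLOSESPIDER_PAGECOUNT_NO_ITEM = 10

# Since 2.11.2 this defaults to ["noscript"]; set it to [] to restore the
# old behavior of following <meta refresh> redirects inside <noscript>.
METAREFRESH_IGNORE_TAGS = ["noscript"]
```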
  * :class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
    now properly applies the unique and canonicalize parameters.
  * Do not initialize the scheduler disk queue if :setting:`JOBDIR` is an
    empty string.
  * Fix :attr:`Spider.logger <scrapy.Spider.logger>` not logging custom
    extra information.
  * robots.txt files with a non-UTF-8 encoding no longer prevent parsing
    the UTF-8-compatible (e.g. ASCII) parts of the document.
  * :meth:`scrapy.http.cookies.WrappedRequest.get_header` no longer raises
    an exception if default is None.
  * :class:`~scrapy.selector.Selector` now uses
    :func:`scrapy.utils.response.get_base_url` to determine the base URL of
    a given :class:`~scrapy.http.Response`. (:issue:`6265`)
  * The :meth:`media_to_download` method of :ref:`media pipelines
    <topics-media-pipeline>` now logs exceptions before stripping them.
  * When passing a callback to the :command:`parse` command, build the
    callback callable with the right signature.
  * Add a FAQ entry about :ref:`creating blank requests
    <faq-blank-request>`.
  * Document that :attr:`scrapy.selector.Selector.type` can be "json".
  * Make builds reproducible.
  * Packaging and test fixes
* Mon Mar 25 2024 Dirk Müller <dmueller@suse.com>
- update to 2.11.1 (bsc#1220514, CVE-2024-1892, bsc#1221986):
  * Addressed `ReDoS vulnerabilities`_ (bsc#1220514, CVE-2024-1892):
    - scrapy.utils.iterators.xmliter is now deprecated in favor of
      :func:`~scrapy.utils.iterators.xmliter_lxml`, which
      :class:`~scrapy.spiders.XMLFeedSpider` now uses. To minimize the
      impact of this change on existing code,
      :func:`~scrapy.utils.iterators.xmliter_lxml` now supports indicating
      the node namespace with a prefix in the node name, and big files with
      highly nested trees when using libxml2 2.7+.
    - Fixed regular expressions in the implementation of the
      :func:`~scrapy.utils.response.open_in_browser` function.

    .. _ReDoS vulnerabilities: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

  * :setting:`DOWNLOAD_MAXSIZE` and :setting:`DOWNLOAD_WARNSIZE` now also
    apply to the decompressed response body. Please see the
    `7j7m-v7m3-jqm7 security advisory`_ for more information. (bsc#1221986)

    .. _7j7m-v7m3-jqm7 security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-7j7m-v7m3-jqm7

  * Also in relation to the `7j7m-v7m3-jqm7 security advisory`_, the
    deprecated scrapy.downloadermiddlewares.decompression module has been
    removed.
  * The Authorization header is now dropped on redirects to a different
    domain. Please see the `cw9j-q3vf-hrrv security advisory`_ for more
    information.
  * The OS signal handling code was refactored to no longer use private
    Twisted functions. (:issue:`6024`, :issue:`6064`, :issue:`6112`)
  * Improved documentation for :class:`~scrapy.crawler.Crawler`
    initialization changes made in the 2.11.0 release. (:issue:`6057`,
    :issue:`6147`)
  * Extended documentation for :attr:`Request.meta
    <scrapy.http.Request.meta>`.
  * Fixed the :reqmeta:`dont_merge_cookies` documentation. (:issue:`5936`)
  * Added a link to Zyte's export guides to the :ref:`feed exports
    <topics-feed-exports>` documentation.
  * Added a missing note about backward-incompatible changes in
    :class:`~scrapy.exporters.PythonItemExporter` to the 2.11.0 release
    notes.
  * Added a missing note about removing the deprecated
    scrapy.utils.boto.is_botocore() function to the 2.8.0 release notes.
  * Other documentation improvements. (:issue:`6128`, :issue:`6144`,
    :issue:`6163`, :issue:`6190`, :issue:`6192`)
- drop twisted-23.8.0-compat.patch (upstream)
* Wed Jan 10 2024 Daniel Garcia <daniel.garcia@suse.com>
- Add patch twisted-23.8.0-compat.patch gh#scrapy/scrapy#6064
- Update to 2.11.0:
  - Spiders can now modify settings in their from_crawler methods,
    e.g. based on spider arguments.
  - Periodic logging of stats.
  - Bug fixes.
- 2.10.0:
  - Added Python 3.12 support, dropped Python 3.7 support.
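Since 2.11.1, the download-size limits above also cover the decompressed body, so they can no longer be bypassed by highly compressed ("zip bomb") responses. A settings.py sketch with illustrative limits:

```python
# settings.py sketch (limits are illustrative, not Scrapy defaults);
# since 2.11.1 these apply to the decompressed response body as well,
# not just the bytes received over the wire.
DOWNLOAD_MAXSIZE = 64 * 1024 * 1024   # abort downloads larger than 64 MiB
DOWNLOAD_WARNSIZE = 16 * 1024 * 1024  # log a warning above 16 MiB
```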
  - The new add-ons framework simplifies configuring 3rd-party components
    that support it.
  - Exceptions to retry can now be configured.
  - Many fixes and improvements for feed exports.
- 2.9.0:
  - Per-domain download settings.
  - Compatibility with new cryptography and new parsel.
  - JMESPath selectors from the new parsel.
  - Bug fixes.
- 2.8.0:
  - This is a maintenance release, with minor features, bug fixes, and
    cleanups.
* Mon Nov 07 2022 Yogalakshmi Arunachalam <yarunachalam@suse.com>
- Update to v2.7.1:
  * Relaxed the restriction introduced in 2.6.2 so that the
    Proxy-Authorization header can again be set explicitly in certain
    cases, restoring compatibility with scrapy-zyte-smartproxy 2.1.0 and
    older
  * Bug fixes
  * Full changelog:
    https://docs.scrapy.org/en/latest/news.html#scrapy-2-7-1-2022-11-02
* Thu Oct 27 2022 Yogalakshmi Arunachalam <yarunachalam@suse.com>
- Update to v2.7.0
  Highlights:
  * Added Python 3.11 support, dropped Python 3.6 support
  * Improved support for :ref:`asynchronous callbacks <topics-coroutines>`
  * :ref:`Asyncio support <using-asyncio>` is enabled by default on new
    projects
  * Output names of item fields can now be arbitrary strings
  * Centralized :ref:`request fingerprinting <request-fingerprints>`
    configuration is now possible
  Modified requirements:
  * Python 3.7 or greater is now required; support for Python 3.6 has been
    dropped. Support for the upcoming Python 3.11 has been added. The
    minimum required version of some dependencies has changed as well:
    - lxml: 3.5.0 → 4.3.0
    - Pillow (:ref:`images pipeline <images-pipeline>`): 4.0.0 → 7.1.0
    - zope.interface: 5.0.0 → 5.1.0
    (:issue:`5512`, :issue:`5514`, :issue:`5524`, :issue:`5563`,
    :issue:`5664`, :issue:`5670`, :issue:`5678`)
  Deprecations:
  - :meth:`ImagesPipeline.thumb_path
    <scrapy.pipelines.images.ImagesPipeline.thumb_path>` must now accept an
    item parameter (:issue:`5504`, :issue:`5508`).
  - The scrapy.downloadermiddlewares.decompression module is now deprecated
    (:issue:`5546`, :issue:`5547`).
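The asyncio reactor that new projects enable by default since 2.7 is selected through the TWISTED_REACTOR setting; a minimal settings.py fragment (the setting name and reactor path are Scrapy's documented values):

```python
# settings.py sketch: opt in to the asyncio reactor explicitly
# (the default for projects generated by `scrapy startproject` since 2.7)
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```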
  Complete changelog:
  https://github.com/scrapy/scrapy/blob/2.7/docs/news.rst
* Fri Sep 09 2022 Yogalakshmi Arunachalam <yarunachalam@suse.com>
- Update to v2.6.2
  Security bug fix:
  * When HttpProxyMiddleware processes a request with proxy metadata, and
    that proxy metadata includes proxy credentials, HttpProxyMiddleware
    sets the Proxy-Authorization header, but only if that header is not
    already set.
  * There are third-party proxy-rotation downloader middlewares that set
    different proxy metadata every time they process a request.
  * Because of request retries and redirects, the same request can be
    processed by downloader middlewares more than once, including both
    HttpProxyMiddleware and any third-party proxy-rotation downloader
    middleware.
  * These third-party proxy-rotation downloader middlewares could change
    the proxy metadata of a request to a new value, but fail to remove the
    Proxy-Authorization header from the previous value of the proxy
    metadata, causing the credentials of one proxy to be sent to a
    different proxy.
  * To prevent the unintended leaking of proxy credentials, the behavior of
    HttpProxyMiddleware is now as follows when processing a request:
    + If the request being processed defines proxy metadata that includes
      credentials, the Proxy-Authorization header is always updated to
      feature those credentials.
    + If the request being processed defines proxy metadata without
      credentials, the Proxy-Authorization header is removed unless it was
      originally defined for the same proxy URL. To remove proxy
      credentials while keeping the same proxy URL, remove the
      Proxy-Authorization header.
    + If the request has no proxy metadata, or that metadata is a falsy
      value (e.g. None), the Proxy-Authorization header is removed.
    + It is no longer possible to set a proxy URL through the proxy
      metadata but set the credentials through the Proxy-Authorization
      header. Set proxy credentials through the proxy metadata instead.
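The credential-handling rules above can be sketched in plain Python. This is an illustrative re-implementation of the described behavior, not Scrapy's actual HttpProxyMiddleware code; the function name and signature are invented for the example:

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def apply_proxy_auth(headers, proxy, auth_proxy=None):
    """Sketch of the 2.6.2 Proxy-Authorization rules (illustrative only).

    headers:    mutable dict of header name -> value
    proxy:      the request's proxy metadata (URL, possibly with user:pass)
    auth_proxy: proxy URL the current Proxy-Authorization header was set for
    Returns the proxy URL stripped of credentials (or None).
    """
    if not proxy:
        # Falsy proxy metadata (e.g. None): always drop stale credentials.
        headers.pop("Proxy-Authorization", None)
        return None
    parts = urlsplit(proxy)
    if parts.username:
        # Credentials in the proxy metadata: always refresh the header.
        creds = f"{parts.username}:{parts.password or ''}".encode()
        headers["Proxy-Authorization"] = b"Basic " + base64.b64encode(creds)
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    bare = urlunsplit(parts._replace(netloc=host))
    if not parts.username and bare != auth_proxy:
        # No credentials and a different proxy URL: drop the old header.
        headers.pop("Proxy-Authorization", None)
    return bare
```

Note how the header survives only when the credential-less proxy URL matches the one the header was set for, which is the rule that stops one proxy's credentials leaking to another.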
  * Also fixes the following regressions introduced in 2.6.0:
    + CrawlerProcess supports again crawling multiple spiders (issue 5435,
      issue 5436)
    + Installing a Twisted reactor before Scrapy does (e.g. importing
      twisted.internet.reactor somewhere at the module level) no longer
      prevents Scrapy from starting, as long as a different reactor is not
      specified in TWISTED_REACTOR (issue 5525, issue 5528)
    + Fixed an exception that was being logged after the spider finished
      under certain conditions (issue 5437, issue 5440)
    + The --output/-o command-line parameter supports again a value
      starting with a hyphen (issue 5444, issue 5445)
    + The scrapy parse -h command no longer throws an error (issue 5481,
      issue 5482)
* Fri Mar 04 2022 Ben Greiner <code@bnavigator.de>
- Update runtime requirements and test deselections
* Wed Mar 02 2022 Matej Cepl <mcepl@suse.com>
- Update to v2.6.1:
  * Security fixes for cookie handling (CVE-2022-0577 aka bsc#1196638,
    GHSA-mfjm-vh54-3f96)
  * Python 3.10 support
  * asyncio support is no longer considered experimental, and works
    out-of-the-box on Windows regardless of your Python version
  * Feed exports now support pathlib.Path output paths and per-feed item
    filtering and post-processing
- Remove unnecessary patches:
  - remove-h2-version-restriction.patch
  - add-peak-method-to-queues.patch
* Sun Jan 16 2022 Ben Greiner <code@bnavigator.de>
- Skip a failing test in python310: exception format not recognized
Files:
/usr/share/doc/packages/python-Scrapy-doc
/usr/share/doc/packages/python-Scrapy-doc/html
Generated by rpm2html 1.8.1
Fabrice Bellet, Sun Mar 30 23:22:36 2025