Name: python312-dask-complete
Distribution: openSUSE:Factory:zSystems
Version: 2024.12.0
Vendor: openSUSE
Release: 1.2
Build date: Wed Dec 4 11:16:13 2024
Group: Unspecified
Build host: reproducible
Size: 1531
Source RPM: python-dask-2024.12.0-1.2.src.rpm
Packager: https://bugs.opensuse.org
Url: https://dask.org
Summary: All dask components
A flexible library for parallel computing in Python.

Dask is composed of two parts:

- Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
- “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of dynamic task schedulers.

This package pulls in all the optional dask components.
License: BSD-3-Clause
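The description says Dask's scheduler executes task graphs. In the classic graph format from Dask's specification docs, a graph is a dict that maps each key to a value or to a task. A task is a tuple whose first element is a callable and whose remaining elements are arguments, and those arguments may name other keys. Below is a deliberately simplified, single-threaded evaluator for that format. Recent releases in the changelog below move Dask internally to a TaskSpec representation. `evaluate_key` and its helpers are illustrative names, not Dask's scheduler API, and the sketch does not detect cycles.

```python
from operator import add, mul
from typing import Any, Dict, Hashable, Optional

Graph = Dict[Hashable, Any]


def is_task(obj: Any) -> bool:
    # In the classic format, a task is a tuple whose first element is callable.
    return isinstance(obj, tuple) and len(obj) > 0 and callable(obj[0])


def _is_key(obj: Any, graph: Graph) -> bool:
    try:
        return obj in graph
    except TypeError:  # unhashable literals cannot be graph keys
        return False


def _resolve(obj: Any, graph: Graph, cache: Dict[Hashable, Any]) -> Any:
    if is_task(obj):
        func, *args = obj
        return func(*(_resolve(a, graph, cache) for a in args))
    if isinstance(obj, list):  # argument lists are resolved element-wise
        return [_resolve(a, graph, cache) for a in obj]
    if _is_key(obj, graph):  # a reference to another key in the graph
        return evaluate_key(graph, obj, cache)
    return obj  # a plain literal


def evaluate_key(graph: Graph, key: Hashable,
                 cache: Optional[Dict[Hashable, Any]] = None) -> Any:
    """Compute ``key`` depth-first, reusing results already stored in ``cache``."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _resolve(graph[key], graph, cache)
    return cache[key]


def inc(x: int) -> int:
    return x + 1


graph = {
    "x": 1,
    "y": (inc, "x"),               # y = inc(x)
    "z": (add, "y", 10),           # z = y + 10
    "w": (sum, ["x", "y", "z"]),   # w = x + y + z
    "v": (mul, (add, "x", 2), 3),  # nested task: (x + 2) * 3
}

print(evaluate_key(graph, "z"))  # 12
print(evaluate_key(graph, "w"))  # 15
print(evaluate_key(graph, "v"))  # 9
```

The shared `cache` means a key used by several tasks is computed once per call. A real scheduler adds dependency ordering, parallel workers and memory management on top of this basic idea.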
* Wed Dec 04 2024 Ben Greiner <code@bnavigator.de>
- Update to 2024.12.0
  * Revert "Add LLM chatbot to Dask docs (#11556)" @dchudz (#11577)
  * Automatically rechunk if array in to_zarr has irregular chunks @phofl (#11553)
  * Blockwise uses Task class @fjetter (#11568)
  * Migrate rechunk and reshape to task spec @phofl (#11555)
  * Cache svg-representation for arrays @dcherian (#11560)
  * Fix empty input for containers @fjetter (#11571)
  * Convert Bag graphs to TaskSpec graphs during optimization @fjetter (#11569)
  * add LLM chatbot to Dask docs @dchudz (#11556)
  * Add support for Python 3.13 @phofl (#11456)
  * Fuse data nodes in linear fusion too @phofl (#11549)
  * Migrate slicing code to task spec @phofl (#11548)
  * Speed up ArraySliceDep tokenization @phofl (#11551)
  * Fix fusing of p2p barrier tasks @phofl (#11543)
  * Remove infra/mentions of GPU CI @charlesbluca (#11546)
  * Temporarily disable gpuCI update CI job @jrbourbeau (#11545)
  * Use BlockwiseDep to implement map_blocks keywords @phofl (#11542)
  * Remove optimize_slices @phofl (#11538)
  * Make reshape_blockwise a noop if shape is the same @phofl (#11541)
  * Remove read-only flag from open_array in open_zarr @phofl (#11539)
  * Implement linear_fusion for task spec class @phofl (#11525)
  * Remove recursion from TaskSpec @fjetter (#11477)
  * Fixup test after dask-expr change @phofl (#11536)
  * Bump codecov/codecov-action from 3 to 5 @dependabot (#11532)
  * Create dask-expr frame directly without roundtripping @phofl (#11529)
  * Add scikit-image nightly back to upstream CI @jrbourbeau (#11530)
  * Remove from_dask_dataframe import @phofl (#11528)
  * Ensure that from_array creates a copy @phofl (#11524)
  * Simplify and improve performance of normalize chunks @phofl (#11521)
  * Fix flaky nanquantile test @phofl (#11518)
  * Fix tests for new read_only kwarg in zarr=3 @phofl (#11516)

* Wed Nov 27 2024 Ben Greiner <code@bnavigator.de>
- Skip tokenize test on python313
  * See gh#dask/dask#11456, gh#dask/dask#11457

* Fri Nov 22 2024 Dirk Müller <dmueller@suse.com>
- reenable for python 313 as numba is now available

* Tue Nov 19 2024 Dirk Müller <dmueller@suse.com>
- update to 2024.11.2:
  * fix critical performance regression in 2024.11.1
- skip dask on 313 as numba is not yet ready for it

* Tue Nov 12 2024 Dirk Müller <dmueller@suse.com>
- update to 2024.11.1:
  * Zarr-Python 3 compatibility (:pr:`11388`)
  * Avoid exponentially increasing taskgraph in overlap
  * Ensure numba tokenization does not use slow pickle path

* Sun Sep 08 2024 Dirk Müller <dmueller@suse.com>
- update to 2024.8.2:
  * Avoid capturing code of xdist @fjetter
  * Reduce memory footprint of culling P2P rechunking
  * Add tests for choosing default rechunking method
  * Increase visibility of GPU CI updates @charlesbluca
  * Bump test_pause_while_idle timeout @fjetter
  * Concatenate small input chunks before P2P rechunking
  * Remove dump cluster from gen_cluster @fjetter
  * Bump `numpy>=1.24` and `pyarrow>=14.0.1` minimum versions
  * Fix PipInstall plugin on Worker @hendrikmakait
  * Remove more Python 3.10 compatibility code @jrbourbeau
  * Use task-based rechunking to prechunk along partial boundaries @hendrikmakait
  * Ensure client_desires_keys does not corrupt Scheduler state @fjetter
  * Bump minimum ``cloudpickle`` to 3 @jrbourbeau

* Thu Aug 29 2024 Ben Greiner <code@bnavigator.de>
- Update to 2024.8.1
  * Improve output chunksizes for reshaping Dask Arrays
  * Improve scheduling efficiency for Xarray Rechunk-GroupBy-Reduce patterns
  * Drop support for Python 3.9
- Release 2024.8.0
  * Improve efficiency and performance of slicing with positional indexers
  * Improve scheduling efficiency for Xarray GroupBy-Reduce patterns
- Release 2024.7.1
  * More resilient distributed lock
- Release 2024.7.0
  * Drop support for pandas 1.x
  * Publish-subscribe APIs deprecated
- Overhaul multibuild setup: Prepare for python313

* Thu Jul 11 2024 Ben Greiner <code@bnavigator.de>
- Do not pin to numpy < 2. Dask supports it, downstream packages must take care of their own.
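The 2024.12.0 entry says `to_zarr` now rechunks arrays with irregular chunks automatically, and the same release simplifies `normalize_chunks`. Zarr stores one chunk shape per dimension. A Dask dimension therefore maps cleanly onto it only when every chunk except the last has the same size and the last one is not larger.

The sketch below checks that rule for a single dimension. It also builds uniform chunks the way an integer chunk size is expanded: 10 elements in blocks of 4 become `(4, 4, 2)`. The function names are hypothetical. Choosing the largest existing chunk as the new size in `regularize` is an assumption made for illustration, not the policy Dask uses.

```python
from typing import Sequence, Tuple


def is_regular(chunks: Sequence[int]) -> bool:
    """True if all chunks but the last share one size and the last is not larger."""
    if len(chunks) <= 1:
        return True
    return len(set(chunks[:-1])) == 1 and chunks[-1] <= chunks[0]


def uniform_chunks(length: int, size: int) -> Tuple[int, ...]:
    """Split ``length`` elements into blocks of ``size`` plus a smaller remainder."""
    if size <= 0:
        raise ValueError("size must be positive")
    full, rest = divmod(length, size)
    return (size,) * full + ((rest,) if rest else ())


def regularize(chunks: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical fix-up: keep regular chunks, otherwise re-split at the largest chunk size."""
    if is_regular(chunks):
        return tuple(chunks)
    return uniform_chunks(sum(chunks), max(chunks))


print(is_regular((4, 4, 2)))  # True
print(is_regular((4, 2, 4)))  # False
print(uniform_chunks(10, 4))  # (4, 4, 2)
print(regularize((3, 5, 2)))  # (5, 5)
```

For a multi-dimensional array, the same check would be applied to each dimension's chunk tuple independently.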
* Mon Jul 08 2024 Steve Kowalik <steven.kowalik@suse.com>
- Update to 2024.6.2:
  * profile._f_lineno: handle next_line being None in Python 3.13
  * Cache global query-planning config
  * Python 3.13 fixes
  * Fix test_map_freq_to_period_start for pandas=3
  * Tokenizing memmap arrays will now avoid materializing the array into memory.
  * Fix test_dt_accessor with query planning disabled
  * Remove deprecated dask.compatibility module
  * Ensure compatibility for xarray.NamedArray
  * Avoid rounding error in test_prometheus_collect_count_total_by_cost_multipliers
  * Log key collision count in update_graph log event
  * Rename safe to expected in Scheduler.remove_worker
  * Eagerly update aggregate statistics for TaskPrefix instead of calculating them on-demand
  * Improve graph submission time for P2P rechunking by avoiding unpack recursion into indices
  * Add safe keyword to remove-worker event
  * Improved errors and reduced logging for P2P RPC calls
  * Adjust P2P tests for dask-expr
  * Iterate over copy of Server.digests_total_since_heartbeat to avoid RuntimeError
  * Add Prometheus gauge for task groups
  * Fix too strict assertion in shuffle code for pandas subclasses
  * Reduce noise from erring tasks that are not supposed to be running

* Fri Apr 26 2024 Ben Greiner <code@bnavigator.de>
- Update to 2024.4.2
  * Trivial Merge Implementation
  * Auto-partitioning in read_parquet
- Release 2024.4.1
  * Fix an error when importing dask.dataframe with Python 3.11.9.
- Release 2024.4.0
  * Query planning fixes
  * GPU metric dashboard fixes
- Release 2024.3.1
  * Demote an exception to a warning if dask-expr is not installed when upgrading.
- Release 2024.3.0
  * Query planning
  * Sunset of Pandas 1.X support

* Tue Mar 05 2024 Ben Greiner <code@bnavigator.de>
- Update to 2024.2.1
  * Allow silencing dask.DataFrame deprecation warning
  * More robust distributed scheduler for rare key collisions
  * More robust adaptive scaling on large clusters
- The test subpackage now directly depends on pandas-test which does not use pytest-asyncio anymore

* Wed Feb 14 2024 Ben Greiner <code@bnavigator.de>
- Update to 2024.2.0
  * Deprecate Dask DataFrame implementation
  * Improved tokenization
  * https://docs.dask.org/en/stable/changelog.html#v2024-2-0
- Really drop python39 from testing instead of testing it with every other test flavor

* Tue Feb 06 2024 Dirk Müller <dmueller@suse.com>
- drop python39 from testing

* Sun Feb 04 2024 Ben Greiner <code@bnavigator.de>
- Add python312 test flavor

* Tue Jan 30 2024 Dirk Müller <dmueller@suse.com>
- update to 2024.1.1:
  * This release contains compatibility updates for the latest pandas and scipy releases. See :pr:`10834`, :pr:`10849`, :pr:`10845`, and :pr-distributed:`8474` from `crusaderky`_ for details.

* Sat Jan 20 2024 Dirk Müller <dmueller@suse.com>
- update to 2024.1.0:
  * Released on January 12, 2024
  * P2P rechunking now utilizes the relationships between input and output chunks. For situations that do not require all-to-all data transfer, this may significantly reduce the runtime and memory/disk footprint. It also enables task culling.
  * The fastparquet Parquet engine has been deprecated. Users should migrate to the pyarrow engine by installing PyArrow and removing engine="fastparquet" in read_parquet or to_parquet calls.
  * This release improves serialization robustness for arbitrary data. Previously there were some cases where serialization could fail for non-msgpack serializable data. In those cases we now fall back to using pickle.
  * Deprecate shuffle keyword in favour of shuffle_method for DataFrame methods (:pr:`10738`) `Hendrik Makait`_
  * Deprecate automatic argument inference in repartition
  * Deprecate compute parameter in set_index
  * Deprecate inplace in eval
  * Deprecate Series.view
  * Deprecate npartitions="auto" for set_index & sort_values

* Mon Dec 18 2023 Dirk Müller <dmueller@suse.com>
- update to 2023.12.1:
  * Dask DataFrames are now much more performant by using a logical query planner.
  * ``read_parquet`` will now infer the Arrow types ``pa.date32()``, ``pa.date64()`` and ``pa.decimal()`` as an ``ArrowDtype`` in pandas. These dtypes are backed by the original Arrow array, and thus avoid the conversion to NumPy object.
  * This release contains several updates that fix a possible deadlock introduced in 2023.9.2 and improve the robustness of P2P-based merging when the cluster is dynamically scaling up.
  * The ``distributed.scheduler.pickle`` configuration option is no longer supported. As of the 2023.4.0 release, ``pickle`` is used to transmit task graphs, so it can no longer be disabled. We now raise an informative error when ``distributed.scheduler.pickle`` is set to ``False``.
  * Update DataFrame page
  * Add changelog entry for ``dask-expr`` switch
  * [Dask.order] Remove non-runnable leaf nodes from ordering
  * Update installation docs
  * Fix software environment link in docs
  * Avoid converting non-strings to arrow strings for read_parquet
  * Dask.order rewrite using a critical path approach
  * Avoid substituting keys that occur multiple times
  * Add missing image to docs
  * Update landing page
  * Make meta check simpler in dispatch
  * Pin PR Labeler
  * Reorganize docs index a bit
  * Avoid ``RecursionError`` when failing to pickle key in ``SpillBuffer`` and using ``tblib=3``
  * Allow tasks to override ``is_rootish`` heuristic
  * Remove GPU executor (:pr-distributed:`8399`) `Hendrik Makait`_
  * Do not rely on logging for subprocess cluster
  * Update gpuCI ``RAPIDS_VER`` to ``24.02``
  * Ensure output chunks in P2P rechunking are distributed homogeneously (:pr-distributed:`8207`) `Florian Jetter`_
  * Trivial: fix typo (:pr-distributed:`8395`)

* Sat Dec 02 2023 Dirk Müller <dmueller@suse.com>
- update to 2023.12.0:
  * Bokeh 3.3.0 compatibility
  * Add ``network`` marker to ``test_pyarrow_filesystem_option_real_data``
  * Bump GPU CI to CUDA 11.8 (:pr:`10656`)
  * Tokenize ``pandas`` offsets deterministically
  * Add tokenize ``pd.NA`` functionality
  * Update gpuCI ``RAPIDS_VER`` to ``24.02`` (:pr:`10636`)
  * Fix precision handling in ``array.linalg.norm`` (:pr:`10556`) `joanrue`_
  * Add ``axis`` argument to ``DataFrame.clip`` and ``Series.clip`` (:pr:`10616`) `Richard (Rick) Zamora`_
  * Update changelog entry for in-memory rechunking (:pr:`10630`) `Florian Jetter`_
  * Fix flaky ``test_resources_reset_after_cancelled_task``
  * Bump GPU CI to CUDA 11.8
  * Bump ``conda-incubator/setup-miniconda``
  * Add debug logs to P2P scheduler plugin
  * ``O(1)`` access for ``/info/task/`` endpoint
  * Remove stringification from shuffle annotations
  * Don't cast ``int`` metrics to ``float``
  * Drop asyncio TCP backend
  * Add offload support to ``context_meter.add_callback``
  * Test that ``sync()`` propagates contextvars
  * Fix ``test_statistical_profiling_cycle``
  * Replace ``Client.register_plugin``'s ``idempotent`` argument with ``.idempotent`` attribute on plugins
  * Fix test report generation
  * Install ``pyarrow-hotfix`` on ``mindeps-pandas`` CI
  * Reduce memory usage of scheduler process - optimize ``scheduler.py::TaskState`` class
  * Update cuDF test with explicit ``dtype=object``
  * Fix ``Cluster`` / ``SpecCluster`` calls to async close methods

* Thu Nov 16 2023 Ondřej Súkup <mimi.vx@gmail.com>
- Update to 2023.11.0
  * Zero-copy P2P Array Rechunking
  * Deprecating PyArrow <14.0.1
  * Improved PyArrow filesystem for Parquet
  * Improve Type Reconciliation in P2P Shuffling
  * official support for Python 3.12
  * Reduced memory pressure for multi array reductions
  * improved P2P shuffling robustness
  * Reduced scheduler CPU load for large graphs

* Sun Sep 10 2023 Ben Greiner <code@bnavigator.de>
- Update to 2023.9.1
  ## Enhancements
  * Stricter data type for dask keys (GH#10485) crusaderky
  * Special handling for None in DASK_ environment variables (GH#10487) crusaderky
  ## Bug Fixes
- Release 2023.9.0
  ## Bug Fixes
  * Remove support for np.int64 in keys (GH#10483) crusaderky
  * Fix _partitions dtype in meta for shuffling (GH#10462) Hendrik Makait
  * Don’t use exception hooks to shorten tracebacks (GH#10456) crusaderky
- Release 2023.8.1
  ## Enhancements
  * Adding support for cgroup v2 to cpu_count (GH#10419) Johan Olsson
  * Support multi-column groupby with sort=True and split_out>1 (GH#10425) Richard (Rick) Zamora
  * Add DataFrame.enforce_runtime_divisions method (GH#10404) Richard (Rick) Zamora
  * Enable file mode="x" with a single_file=True for Dask DataFrame to_csv (GH#10443) Genevieve Buckley
  ## Bug Fixes
  * Fix ValueError when running to_csv in append mode with single_file as True (GH#10441)
- Release 2023.8.0
  ## Enhancements
  * Fix for make_timeseries performance regression (GH#10428) Irina Truong
- Release 2023.7.1
  * This release updates Dask DataFrame to automatically convert text data using object data types to string[pyarrow] if pandas>=2 and pyarrow>=12 are installed. This should result in significantly reduced memory consumption and increased computation performance in many workflows that deal with text data. You can disable this change by setting the dataframe.convert-string configuration value to False with dask.config.set({"dataframe.convert-string": False})
  ## Enhancements
  * Convert to pyarrow strings if proper dependencies are installed (GH#10400) James Bourbeau
  * Avoid repartition before shuffle for p2p (GH#10421) Patrick Hoefler
  * API to generate random Dask DataFrames (GH#10392) Irina Truong
  * Speed up dask.bag.Bag.random_sample (GH#10356) crusaderky
  * Raise helpful ValueError for invalid time units (GH#10408) Nat Tabris
  * Make repartition a no-op when divisions match (divisions provided as a list) (GH#10395) Nicolas Grandemange
  ## Bug Fixes
  * Use dataframe.convert-string in read_parquet token (GH#10411) James Bourbeau
  * Category dtype is lost when concatenating MultiIndex (GH#10407) Irina Truong
  * Fix FutureWarning: The provided callable... (GH#10405) Irina Truong
  * Enable non-categorical hive-partition columns in read_parquet (GH#10353) Richard (Rick) Zamora
  * concat ignoring DataFrame without columns (GH#10359) Patrick Hoefler
- Release 2023.7.0
  ## Enhancements
  * Catch exceptions when attempting to load CLI entry points (GH#10380) Jacob Tomlinson
  ## Bug Fixes
  * Fix typo in _clean_ipython_traceback (GH#10385) Alexander Clausen
  * Ensure that df is immutable after from_pandas (GH#10383) Patrick Hoefler
  * Warn consistently for inplace in Series.rename (GH#10313) Patrick Hoefler
- Release 2023.6.1
  ## Enhancements
  * Remove no longer supported clip_lower and clip_upper (GH#10371) Patrick Hoefler
  * Support DataFrame.set_index(..., sort=False) (GH#10342) Miles
  * Cleanup remote tracebacks (GH#10354) Irina Truong
  * Add dispatching mechanisms for pyarrow.Table conversion (GH#10312) Richard (Rick) Zamora
  * Choose P2P even if fusion is enabled (GH#10344) Hendrik Makait
  * Validate that rechunking is possible earlier in graph generation (GH#10336) Hendrik Makait
  ## Bug Fixes
  * Fix issue with header passed to read_csv (GH#10355) GALI PREM SAGAR
  * Respect dropna and observed in GroupBy.var and GroupBy.std (GH#10350) Patrick Hoefler
  * Fix H5FD_lock error when writing to hdf with distributed client (GH#10309) Irina Truong
  * Fix for total_mem_usage of bag.map() (GH#10341) Irina Truong
  ## Deprecations
  * Deprecate DataFrame.fillna/Series.fillna with method (GH#10349) Irina Truong
  * Deprecate DataFrame.first and Series.first (GH#10352) Irina Truong
- Release 2023.6.0
  ## Enhancements
  * Add missing not in predicate support to read_parquet (GH#10320) Richard (Rick) Zamora
  ## Bug Fixes
  * Fix for incorrect value_counts (GH#10323) Irina Truong
  * Update empty describe top and freq values (GH#10319) James Bourbeau

* Sat Jun 10 2023 ecsos <ecsos@opensuse.org>
- Add %{?sle15_python_module_pythons}

* Mon Jun 05 2023 Steve Kowalik <steven.kowalik@suse.com>
- Tighten bokeh requirement to match distributed.
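The 2023.8.1 entry adds cgroup v2 support to `cpu_count`. Under cgroup v2, a container's CPU limit is exposed in the `cpu.max` file as `<quota> <period>` in microseconds. The quota is the literal `max` when no limit applies.

Below is a minimal sketch of turning that file into a CPU count. It assumes the file lives at `/sys/fs/cgroup/cpu.max` and that a fractional quota rounds up. The function names are hypothetical, and this is not Dask's own implementation.

```python
import math
import os
from typing import Optional

CGROUP_V2_CPU_MAX = "/sys/fs/cgroup/cpu.max"  # assumed unified-hierarchy mount point


def cpu_limit_from_cpu_max(text: str) -> Optional[int]:
    """Parse cgroup v2 ``cpu.max`` contents ("<quota> <period>") into a CPU count.

    Returns None when there is no limit ("max") or the contents are unusable.
    """
    fields = text.split()
    if len(fields) != 2 or fields[0] == "max":
        return None
    try:
        quota_us, period_us = int(fields[0]), int(fields[1])
    except ValueError:
        return None
    if quota_us <= 0 or period_us <= 0:
        return None
    # Assumption: a fractional allowance such as 1.5 CPUs rounds up to 2.
    return math.ceil(quota_us / period_us)


def effective_cpu_count(path: str = CGROUP_V2_CPU_MAX) -> int:
    """Host CPU count, capped by the cgroup v2 quota when one is readable."""
    count = os.cpu_count() or 1
    try:
        with open(path) as f:
            limit = cpu_limit_from_cpu_max(f.read())
    except OSError:  # file missing (cgroup v1, non-Linux) or unreadable
        limit = None
    return min(count, limit) if limit is not None else count


print(cpu_limit_from_cpu_max("max 100000"))     # None
print(cpu_limit_from_cpu_max("150000 100000"))  # 2
print(effective_cpu_count())
```

Keeping the parser separate from the file access makes the quota arithmetic testable on machines that have no cgroup v2 hierarchy.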
* Fri May 26 2023 Ben Greiner <code@bnavigator.de> - Update to 2023.5.1 * This release drops support for Python 3.8. As of this release Dask supports Python 3.9, 3.10, and 3.11. [#]# Enhancements * Drop Python 3.8 support (GH#10295) Thomas Grainger * Change Dask Bag partitioning scheme to improve cluster saturation (GH#10294) Jacob Tomlinson * Generalize dd.to_datetime for GPU-backed collections, introduce get_meta_library utility (GH#9881) Charles Blackmon-Luca * Add na_action to DataFrame.map (GH#10305) Patrick Hoefler * Raise TypeError in DataFrame.nsmallest and DataFrame.nlargest when columns is not given (GH#10301) Patrick Hoefler * Improve sizeof for pd.MultiIndex (GH#10230) Patrick Hoefler * Support duplicated columns in a bunch of DataFrame methods (GH#10261) Patrick Hoefler * Add numeric_only support to DataFrame.idxmin and DataFrame.idxmax (GH#10253) Patrick Hoefler * Implement numeric_only support for DataFrame.quantile (GH#10259) Patrick Hoefler * Add support for numeric_only=False in DataFrame.std (GH#10251) Patrick Hoefler * Implement numeric_only=False for GroupBy.cumprod and GroupBy.cumsum (GH#10262) Patrick Hoefler * Implement numeric_only for skew and kurtosis (GH#10258) Patrick Hoefler * mask and where should accept a callable (GH#10289) Irina Truong * Fix conversion from Categorical to pa.dictionary in read_parquet (GH#10285) Patrick Hoefler [#]# Bug Fixes * Spurious config on nested annotations (GH#10318) crusaderky * Fix rechunking behavior for dimensions with known and unknown chunk sizes (GH#10157) Hendrik Makait * Enable drop to support mismatched partitions (GH#10300) James Bourbeau * Fix divisions construction for to_timestamp (GH#10304) Patrick Hoefler * pandas ExtensionDtype raising in Series reduction operations (GH#10149) Patrick Hoefler * Fix regression in da.random interface (GH#10247) Eray Aslan * da.coarsen doesn’t trim an empty chunk in meta (GH#10281) Irina Truong * Fix dtype inference for engine="pyarrow" in read_csv (GH#10280) 
Patrick Hoefler - Release 2023.5.0 [#]# Enhancements * Implement numeric_only=False for GroupBy.corr and GroupBy.cov (GH#10264) Patrick Hoefler * Add support for numeric_only=False in DataFrame.var (GH#10250) Patrick Hoefler * Add numeric_only support to DataFrame.mode (GH#10257) Patrick Hoefler * Add DataFrame.map to dask.DataFrame API (GH#10246) Patrick Hoefler * Adjust for DataFrame.applymap deprecation and all NA concat behaviour change (GH#10245) Patrick Hoefler * Enable numeric_only=False for DataFrame.count (GH#10234) Patrick Hoefler * Disallow array input in mask/where (GH#10163) Irina Truong * Support numeric_only=True in GroupBy.corr and GroupBy.cov (GH#10227) Patrick Hoefler * Add numeric_only support to GroupBy.median (GH#10236) Patrick Hoefler * Support mimesis=9 in dask.datasets (GH#10241) James Bourbeau * Add numeric_only support to min, max and prod (GH#10219) Patrick Hoefler * Add numeric_only=True support for GroupBy.cumsum and GroupBy.cumprod (GH#10224) Patrick Hoefler * Add helper to unpack numeric_only keyword (GH#10228) Patrick Hoefler [#]# Bug Fixes * Fix clone + from_array failure (GH#10211) crusaderky * Fix dataframe reductions for ea dtypes (GH#10150) Patrick Hoefler * Avoid scalar conversion deprecation warning in numpy=1.25 (GH#10248) James Bourbeau * Make sure transform output has the same index as input (GH#10184) Irina Truong * Fix corr and cov on a single-row partition (GH#9756) Irina Truong * Fix test_groupby_numeric_only_supported and test_groupby_aggregate_categorical_observed upstream errors (GH#10243) Irina Truong - Release 2023.4.1 [#]# Enhancements * Implement numeric_only support for DataFrame.sum (GH#10194) Patrick Hoefler * Add support for numeric_only=True in GroupBy operations (GH#10222) Patrick Hoefler * Avoid deep copy in DataFrame.__setitem__ for pandas 1.4 and up (GH#10221) Patrick Hoefler * Avoid calling Series.apply with _meta_nonempty (GH#10212) Patrick Hoefler * Unpin sqlalchemy and fix compatibility issues 
(GH#10140) Patrick Hoefler [#]# Bug Fixes * Partially revert default client discovery (GH#10225) Florian Jetter * Support arrow dtypes in Index meta creation (GH#10170) Patrick Hoefler * Repartitioning raises with extension dtype when truncating floats (GH#10169) Patrick Hoefler * Adjust empty Index from fastparquet to object dtype (GH#10179) Patrick Hoefler - Release 2023.4.0 [#]# Enhancements * Override old default values in update_defaults (GH#10159) Gabe Joseph * Add a CLI command to list and get a value from dask config (GH#9936) Irina Truong * Handle string-based engine argument to read_json (GH#9947) Richard (Rick) Zamora * Avoid deprecated GroupBy.dtypes (GH#10111) Irina Truong [#]# Bug Fixes * Revert grouper-related changes (GH#10182) Irina Truong * GroupBy.cov raising for non-numeric grouping column (GH#10171) Patrick Hoefler * Updates for Index supporting numpy numeric dtypes (GH#10154) Irina Truong * Preserve dtype for partitioning columns when read with pyarrow (GH#10115) Patrick Hoefler * Fix annotations for to_hdf (GH#10123) Hendrik Makait * Handle None column name when checking if columns are all numeric (GH#10128) Lawrence Mitchell * Fix valid_divisions when passed a tuple (GH#10126) Brian Phillips * Maintain annotations in DataFrame.categorize (GH#10120) Hendrik Makait * Fix handling of missing min/max parquet statistics during filtering (GH#10042) Richard (Rick) Zamora [#]# Deprecations * Deprecate use_nullable_dtypes= and add dtype_backend= (GH#10076) Irina Truong * Deprecate convert_dtype in Series.apply (GH#10133) Irina Truong - Drop dask-pr10042-parquetstats.patch * Tue Apr 04 2023 Ben Greiner <code@bnavigator.de> - Drop python38 test flavor * Thu Mar 30 2023 Ben Greiner <code@bnavigator.de> - Enable pyarrow in the [complete] extra * Mon Mar 27 2023 Ben Greiner <code@bnavigator.de> - Update to 2023.3.2 [#]# Enhancements * Deprecate observed=False for groupby with categoricals (GH#10095) Irina Truong * Deprecate axis= for some groupby 
operations (GH#10094) James Bourbeau * The axis keyword in DataFrame.rolling/Series.rolling is deprecated (GH#10110) Irina Truong * DataFrame._data deprecation in pandas (GH#10081) Irina Truong * Use importlib_metadata backport to avoid CLI UserWarning (GH#10070) Thomas Grainger * Port option parsing logic from dask.dataframe.read_parquet to to_parquet (GH#9981) Anton Loukianov [#]# Bug Fixes * Avoid using dd.shuffle in groupby-apply (GH#10043) Richard (Rick) Zamora * Enable null hive partitions with pyarrow parquet engine (GH#10007) Richard (Rick) Zamora * Support unknown shapes in *_like functions (GH#10064) Doug Davis [#]# Maintenance * Restore Entrypoints compatibility (GH#10113) Jacob Tomlinson * Allow pyarrow build to continue on failures (GH#10097) James Bourbeau * Fix test_set_index_on_empty with pyarrow strings active (GH#10054) Irina Truong * Temporarily skip pyarrow_compat tests with pandas 2.0 (GH#10063) James Bourbeau * Sun Mar 26 2023 Ben Greiner <code@bnavigator.de> - Add dask-pr10042-parquetstats.patch gh#dask/dask#10042 - Enable python311 build: numba is not a strict requirement * Sat Mar 11 2023 Ben Greiner <code@bnavigator.de> - Update to v2023.3.1 [#]# Enhancements * Support pyarrow strings in MultiIndex (GH#10040) Irina Truong * Improved support for pyarrow strings (GH#10000) Irina Truong * Fix flaky RuntimeWarning during array reductions (GH#10030) James Bourbeau * Extend complete extras (GH#10023) James Bourbeau * Raise an error with dataframe.convert_string=True and pandas<2.0 (GH#10033) Irina Truong * Rename shuffle/rechunk config option/kwarg to method (GH#10013) James Bourbeau * Add initial support for converting pandas extension dtypes to arrays (GH#10018) James Bourbeau * Remove randomgen support (GH#9987) Eray Aslan [#]# Bug Fixes * Skip rechunk when rechunking to the same chunks with unknown sizes (GH#10027) Hendrik Makait * Custom utility to convert parquet filters to pyarrow expression (GH#9885) Richard (Rick) Zamora * Consider 
numpy scalars and 0d arrays as scalars when padding (GH#9653) Justus Magin * Fix parquet overwrite behavior after an adaptive read_parquet operation (GH#10002) Richard (Rick) Zamora [#]# Maintenance * Remove stale hive-partitioning code from pyarrow parquet engine (GH#10039) Richard (Rick) Zamora * Increase minimum supported pyarrow to 7.0 (GH#10024) James Bourbeau * Revert “Prepare drop packunpack (GH#9994) (GH#10037) Florian Jetter * Have codecov wait for more builds before reporting (GH#10031) James Bourbeau * Prepare drop packunpack (GH#9994) Florian Jetter * Add CI job with pyarrow strings turned on (GH#10017) James Bourbeau * Fix test_groupby_dropna_with_agg for pandas 2.0 (GH#10001) Irina Truong * Fix test_pickle_roundtrip for pandas 2.0 (GH#10011) James Bourbeau * Wed Mar 08 2023 Benjamin Greiner <code@bnavigator.de> - Update dependencies - Skip one more test failing because of missing pyarrow * Wed Mar 08 2023 Dirk Müller <dmueller@suse.com> - update to 2023.3.0: * Bag must not pick p2p as shuffle default (:pr:`10005`) * Minor follow-up to P2P by default (:pr:`10008`) `James Bourbeau`_ * Add minimum version to optional ``jinja2`` dependency (:pr:`9999`) `Charles Blackmon-Luca`_ * Enable P2P shuffling by default * P2P rechunking * Efficient `dataframe.convert_string` support for `read_parquet` * Allow p2p shuffle kwarg for DataFrame merges * Change ``split_row_groups`` default to "infer" * Add option for converting string data to use ``pyarrow`` strings * Add support for multi-column ``sort_values`` * ``Generator`` based random-number generation in``dask.array`` * Support ``numeric_only`` for simple groupby aggregations for ``pandas`` 2.0 compatibility * Fix profilers plot not being aligned to context manager enter time * Relax dask.dataframe assert_eq type checks * Restore ``describe`` compatibility for ``pandas`` 2.0 * Improving deploying Dask docs * More docs for ``DataFrame.partitions`` * Update docs with more information on default Delayed scheduler * 
Deployment Considerations documentation * Temporarily rerun flaky tests * Update parsing of FULL_RAPIDS_VER/FULL_UCX_PY_VER * Increase minimum supported versions to ``pandas=1.3`` and ``numpy=1.21`` * Fix ``std`` to work with ``numeric_only`` for ``pandas`` 2.0 * Temporarily ``xfail`` ``test_roundtrip_partitioned_pyarrow_dataset`` (:pr:`9977`) * Fix copy on write failure in `test_idxmaxmin` (:pr:`9944`) * Bump ``pre-commit`` versions (:pr:`9955`) `crusaderky`_ * Fix ``test_groupby_unaligned_index`` for ``pandas`` 2.0 * Un-``xfail`` ``test_set_index_overlap_2`` for ``pandas`` 2.0 * Fix ``test_merge_by_index_patterns`` for ``pandas`` 2.0 * Bump jacobtomlinson/gha-find-replace from 2 to 3 (:pr:`9953`) * Fix ``test_rolling_agg_aggregate`` for ``pandas`` 2.0 compatibility * Bump ``black`` to ``23.1.0`` * Run GPU tests on python 3.8 & 3.10 (:pr:`9940`) * Fix ``test_to_timestamp`` for ``pandas`` 2.0 (:pr:`9932`) * Fix an error with ``groupby`` ``value_counts`` for ``pandas`` 2.0 compatibility * Config converter: replace all dashes with underscores * Sun Feb 26 2023 Ben Greiner <code@bnavigator.de> - Prepare test multiflavors for python311, but skip python311 * Numba is not ready for python 3.11 yet gh#numba/numba#8304 * Fri Feb 17 2023 Ben Greiner <code@bnavigator.de> - Update to 2023.2.0 [#]# Enhancements * Update numeric_only default in quantile for pandas 2.0 (GH#9854) Irina Truong * Make repartition a no-op when divisions match (GH#9924) James Bourbeau * Update datetime_is_numeric behavior in describe for pandas 2.0 (GH#9868) Irina Truong * Update value_counts to return correct name in pandas 2.0 (GH#9919) Irina Truong * Support new axis=None behavior in pandas 2.0 for certain reductions (GH#9867) James Bourbeau * Filter out all-nan RuntimeWarning at the chunk level for nanmin and nanmax (GH#9916) Julia Signell * Fix numeric meta_nonempty index creation for pandas 2.0 (GH#9908) James Bourbeau * Fix DataFrame.info() tests for pandas 2.0 (GH#9909) James Bourbeau [#]# 
Bug Fixes * Fix GroupBy.value_counts handling for multiple groupby columns (GH#9905) Charles Blackmon-Luca * Sun Feb 05 2023 Ben Greiner <code@bnavigator.de> - Update to 2023.1.1 [#]# Enhancements * Add to_backend method to Array and _Frame (GH#9758) Richard (Rick) Zamora * Small fix for timestamp index divisions in pandas 2.0 (GH#9872) Irina Truong * Add numeric_only to DataFrame.cov and DataFrame.corr (GH#9787) James Bourbeau * Fixes related to group_keys default change in pandas 2.0 (GH#9855) Irina Truong * infer_datetime_format compatibility for pandas 2.0 (GH#9783) James Bourbeau [#]# Bug Fixes * Fix serialization bug in BroadcastJoinLayer (GH#9871) Richard (Rick) Zamora * Satisfy broadcast argument in DataFrame.merge (GH#9852) Richard (Rick) Zamora * Fix pyarrow parquet columns statistics computation (GH#9772) aywandji [#]# Documentation * Fix “duplicate explicit target name” docs warning (GH#9863) Chiara Marmo * Fix code formatting issue in “Defining a new collection backend” docs (GH#9864) Chiara Marmo * Update dashboard documentation for memory plot (GH#9768) Jayesh Manani * Add docs section about no-worker tasks (GH#9839) Florian Jetter [#]# Maintenance * Additional updates for detecting a distributed scheduler (GH#9890) James Bourbeau * Update gpuCI RAPIDS_VER to 23.04 (GH#9876) * Reverse precedence between collection and distributed default (GH#9869) Florian Jetter * Update xarray-contrib/issue-from-pytest-log to version 1.2.6 (GH#9865) James Bourbeau * Dont require dask config shuffle default (GH#9826) Florian Jetter * Un-xfail datetime64 Parquet roundtripping tests for new fastparquet (GH#9811) James Bourbeau * Add option to manually run upstream CI build (GH#9853) James Bourbeau * Use custom timeout in CI builds (GH#9844) James Bourbeau * Remove kwargs from make_blockwise_graph (GH#9838) Florian Jetter * Ignore warnings on persist call in test_setitem_extended_API_2d_mask (GH#9843) Charles Blackmon-Luca * Fix running S3 tests locally (GH#9833) James 
Bourbeau - Release 2023.1.0 [#]# Enhancements * Use distributed default clients even if no config is set (GH#9808) Florian Jetter * Implement ma.where and ma.nonzero (GH#9760) Erik Holmgren * Update zarr store creation functions (GH#9790) Ryan Abernathey * iteritems compatibility for pandas 2.0 (GH#9785) James Bourbeau * Accurate sizeof for pandas string[python] dtype (GH#9781) crusaderky * Deflate sizeof() of duplicate references to pandas object types (GH#9776) crusaderky * GroupBy.__getitem__ compatibility for pandas 2.0 (GH#9779) James Bourbeau * append compatibility for pandas 2.0 (GH#9750) James Bourbeau * get_dummies compatibility for pandas 2.0 (GH#9752) James Bourbeau * is_monotonic compatibility for pandas 2.0 (GH#9751) James Bourbeau * numpy=1.24 compatability (GH#9777) James Bourbeau [#]# Documentation * Remove duplicated encoding kwarg in docstring for to_json (GH#9796) Sultan Orazbayev * Mention SubprocessCluster in LocalCluster documentation (GH#9784) Hendrik Makait * Move Prometheus docs to dask/distributed (GH#9761) crusaderky [#]# Maintenance * Temporarily ignore RuntimeWarning in test_setitem_extended_API_2d_mask (GH#9828) James Bourbeau * Fix flaky test_threaded.py::test_interrupt (GH#9827) Hendrik Makait * Update xarray-contrib/issue-from-pytest-log in upstream report (GH#9822) James Bourbeau * pip install dask on gpuCI builds (GH#9816) Charles Blackmon-Luca * Bump actions/checkout from 3.2.0 to 3.3.0 (GH#9815) * Resolve sqlalchemy import failures in mindeps testing (GH#9809) Charles Blackmon-Luca * Ignore sqlalchemy.exc.RemovedIn20Warning (GH#9801) Thomas Grainger * xfail datetime64 Parquet roundtripping tests for pandas 2.0 (GH#9786) James Bourbeau * Remove sqlachemy 1.3 compatibility (GH#9695) McToel * Reduce size of expected DoK sparse matrix (GH#9775) Elliott Sales de Andrade * Remove executable flag from dask/dataframe/io/orc/utils.py (GH#9774) Elliott Sales de Andrade - Drop dask-pr9777-np1.24.patch * Mon Jan 02 2023 Ben Greiner 
<code@bnavigator.de> - Update to 2022.12.1 ## Enhancements * Support dtype_backend="pandas|pyarrow" configuration (GH#9719) James Bourbeau * Support cupy.ndarray to cudf.DataFrame dispatching in dask.dataframe (GH#9579) Richard (Rick) Zamora * Make filesystem-backend configurable in read_parquet (GH#9699) Richard (Rick) Zamora * Serialize all pyarrow extension arrays efficiently (GH#9740) James Bourbeau ## Bug Fixes * Fix bug when repartitioning with tz-aware datetime index (GH#9741) James Bourbeau * Partial functions in aggs may have arguments (GH#9724) Irina Truong * Add support for simple operation with pyarrow-backed extension dtypes (GH#9717) James Bourbeau * Rename columns correctly in case of SeriesGroupby (GH#9716) Lawrence Mitchell ## Maintenance * Add zarr to Python 3.11 CI environment (GH#9771) James Bourbeau * Add support for Python 3.11 (GH#9708) Thomas Grainger * Bump actions/checkout from 3.1.0 to 3.2.0 (GH#9753) * Avoid np.bool8 deprecation warning (GH#9737) James Bourbeau * Make sure dev packages aren’t overwritten in upstream CI build (GH#9731) James Bourbeau * Avoid adding data.h5 and mydask.html files during tests (GH#9726) Thomas Grainger - Release 2022.12.0 ## Enhancements * Remove statistics-based set_index logic from read_parquet (GH#9661) Richard (Rick) Zamora * Add support for use_nullable_dtypes to dd.read_parquet (GH#9617) Ian Rose * Fix map_overlap in order to accept pandas arguments (GH#9571) Fabien Aulaire * Fix pandas 1.5+ FutureWarning in .str.split(..., expand=True) (GH#9704) Jacob Hayes * Enable column projection for groupby slicing (GH#9667) Richard (Rick) Zamora * Support duplicate column cum-functions (GH#9685) Ben * Improve error message for failed backend dispatch call (GH#9677) Richard (Rick) Zamora ## Bug Fixes * Revise meta creation in arrow parquet engine (GH#9672) Richard (Rick) Zamora * Fix da.fft.fft for array-like inputs (GH#9688) James Bourbeau * Fix groupby-aggregation when grouping on an index by name 
(GH#9646) Richard (Rick) Zamora ## Maintenance * Avoid PytestReturnNotNoneWarning in test_inheriting_class (GH#9707) Thomas Grainger * Fix flaky test_dataframe_aggregations_multilevel (GH#9701) Richard (Rick) Zamora * Bump mypy version (GH#9697) crusaderky * Disable dashboard in test_map_partitions_df_input (GH#9687) James Bourbeau * Use latest xarray-contrib/issue-from-pytest-log in upstream build (GH#9682) James Bourbeau * xfail ttest_1samp for upstream scipy (GH#9670) James Bourbeau * Update gpuCI RAPIDS_VER to 23.02 (GH#9678) - Add dask-pr9777-np1.24.patch gh#dask/dask#9777 - Move to PEP517 build * Mon Nov 21 2022 Ben Greiner <code@bnavigator.de> - Go back to bokeh 2.4 * gh#dask/dask#9659 * we provide a legacy bokeh2 instead * Sun Nov 20 2022 Ben Greiner <code@bnavigator.de> - Update to version 2022.11.1 ## Enhancements * Restrict bokeh=3 support (GH#9673) Gabe Joseph (ignored in rpm, fixed by bokeh 3.0.2, see gh#dask/dask#9659) * Updates for fastparquet evolution (GH#9650) Martin Durant ## Maintenance * Revert importlib.metadata workaround (GH#9658) James Bourbeau - Release 2022.11.0 ## Enhancements * Generalize from_dict implementation to allow usage from other backends (GH#9628) GALI PREM SAGAR ## Bug Fixes * Avoid pandas constructors in dask.dataframe.core (GH#9570) Richard (Rick) Zamora * Fix sort_values with Timestamp data (GH#9642) James Bourbeau * Generalize array checking and remove pd.Index call in _get_partitions (GH#9634) Benjamin Zaitlen * Fix read_csv behavior for header=0 and names (GH#9614) Richard (Rick) Zamora ## Maintenance * Allow bokeh=3 (GH#9659) James Bourbeau * Add pre-commit to catch breakpoint() (GH#9638) James Bourbeau * Bump xarray-contrib/issue-from-pytest-log from 1.1 to 1.2 (GH#9635) * Remove blosc references (GH#9625) Naty Clementi * Harden test_repartition_npartitions (GH#9585) Richard (Rick) Zamora - Release 2022.10.2 * This was a hotfix and has no changes in this repository. 
The necessary fix was in dask/distributed, but we decided to bump this version number for consistency. - Release 2022.10.1 ## Enhancements * Enable named aggregation syntax (GH#9563) ChrisJar * Add extension dtype support to set_index (GH#9566) James Bourbeau * Redesigning the array HTML repr for clarity (GH#9519) Shingo OKAWA ## Bug Fixes * Fix merge with empty left DataFrame (GH#9578) Ian Rose ## Maintenance * Require Click 7.0+ in Dask (GH#9595) John A Kirkham * Temporarily restrict bokeh<3 (GH#9607) James Bourbeau * Resolve importlib-related failures in upstream CI (GH#9604) Charles Blackmon-Luca * Remove setuptools host dep, add CLI entrypoint (GH#9600) Charles Blackmon-Luca * More Backend dispatch class type annotations (GH#9573) Ian Rose - Create a -test subpackage in order to avoid rpmlint errors - Drop extra conftest: included in sdist. * Fri Oct 21 2022 Ben Greiner <code@bnavigator.de> - Update to version 2022.10.0 * Backend library dispatching for IO in Dask-Array and Dask-DataFrame (GH#9475) Richard (Rick) Zamora * Add new CLI that is extensible (GH#9283) Doug Davis * Groupby median (GH#9516) Ian Rose * Fix array copy not being a no-op (GH#9555) David Hoese * Add support for string timedelta in map_overlap (GH#9559) Nicolas Grandemange * Shuffle-based groupby for single functions (GH#9504) Ian Rose * Make datetime.datetime tokenize idempotently (GH#9532) Martin Durant * Support tokenizing datetime.time (GH#9528) Tim Paine * Avoid race condition in lazy dispatch registration (GH#9545) James Bourbeau * Do not allow setitem to np.nan for int dtype (GH#9531) Doug Davis * Stable demo column projection (GH#9538) Ian Rose * Ensure pickle-able binops in delayed (GH#9540) Ian Rose * Fix project CSV columns when selecting (GH#9534) Martin Durant * Update Parquet best practice (GH#9537) Matthew Rocklin - move -all metapackage to -complete, mirroring upstream's [complete] extra. 
* Fri Sep 30 2022 Arun Persaud <arun@gmx.de> - update to version 2022.9.2: * Enhancements + Remove factorization logic from array auto chunking (:pr:`9507`) `James Bourbeau`_ * Documentation + Add docs on running Dask in a standalone Python script (:pr:`9513`) `James Bourbeau`_ + Clarify custom-graph multiprocessing example (:pr:`9511`) `nouman`_ * Maintenance + Groupby sort upstream compatibility (:pr:`9486`) `Ian Rose`_ * Fri Sep 16 2022 Arun Persaud <arun@gmx.de> - update to version 2022.9.1: * New Features + Add "DataFrame" and "Series" "median" methods (:pr:`9483`) `James Bourbeau`_ * Enhancements + Shuffle "groupby" default (:pr:`9453`) `Ian Rose`_ + Filter by list (:pr:`9419`) `Greg Hayes`_ + Added "distributed.utils.key_split" functionality to "dask.utils.key_split" (:pr:`9464`) `Luke Conibear`_ * Bug Fixes + Fix overlap so that "set_index" doesn't drop rows (:pr:`9423`) `Julia Signell`_ + Fix assigning pandas "Series" to column when "ddf.columns.min()" raises (:pr:`9485`) `Erik Welch`_ + Fix metadata comparison "stack_partitions" (:pr:`9481`) `James Bourbeau`_ + Provide default for "split_out" (:pr:`9493`) `Lawrence Mitchell`_ * Deprecations + Allow "split_out" to be "None", which then defaults to "1" in "groupby().aggregate()" (:pr:`9491`) `Ian Rose`_ * Documentation + Fixing "enforce_metadata" documentation, not checking for dtypes (:pr:`9474`) `Nicolas Grandemange`_ + Fix "it's" --> "its" typo (:pr:`9484`) `Nat Tabris`_ * Maintenance + Workaround for parquet writing failure using some datetime series but not others (:pr:`9500`) `Ian Rose`_ + Filter out "numeric_only" warnings from "pandas" (:pr:`9496`) `James Bourbeau`_ + Avoid "set_index(..., inplace=True)" where not necessary (:pr:`9472`) `James Bourbeau`_ + Avoid passing groupby key list of length one (:pr:`9495`) `James Bourbeau`_ + Update "test_groupby_dropna_cudf" based on "cudf" support for "group_keys" (:pr:`9482`) `James Bourbeau`_ + Remove "dd.from_bcolz" (:pr:`9479`) `James Bourbeau`_ + Added 
"flake8-bugbear" to "pre-commit" hooks (:pr:`9457`) `Luke Conibear`_ + Bind loop variables in function definitions ("B023") (:pr:`9461`) `Luke Conibear`_ + Added assert for comparisons ("B015") (:pr:`9459`) `Luke Conibear`_ + Set top-level default shell in CI workflows (:pr:`9469`) `James Bourbeau`_ + Removed unused loop control variables ("B007") (:pr:`9458`) `Luke Conibear`_ + Replaced "getattr" calls for constant attributes ("B009") (:pr:`9460`) `Luke Conibear`_ + Pin "libprotobuf" to allow nightly "pyarrow" in the upstream CI build (:pr:`9465`) `Joris Van den Bossche`_ + Replaced mutable data structures for default arguments ("B006") (:pr:`9462`) `Luke Conibear`_ + Changed "flake8" mirror and updated version (:pr:`9456`) `Luke Conibear`_ * Sat Sep 10 2022 Arun Persaud <arun@gmx.de> - update to version 2022.9.0: * Enhancements + Enable automatic column projection for "groupby" aggregations (:pr:`9442`) `Richard (Rick) Zamora`_ + Accept superclasses in NEP-13/17 dispatching (:pr:`6710`) `Gabe Joseph`_ * Bug Fixes + Rename "by" columns internally for cumulative operations on the same "by" columns (:pr:`9430`) `Pavithra Eswaramoorthy`_ + Fix "get_group" with categoricals (:pr:`9436`) `Pavithra Eswaramoorthy`_ + Fix caching-related "MaterializedLayer.cull" performance regression (:pr:`9413`) `Richard (Rick) Zamora`_ * Documentation + Add maintainer documentation page (:pr:`9309`) `James Bourbeau`_ * Maintenance + Revert skipped fastparquet test (:pr:`9439`) `Pavithra Eswaramoorthy`_ + "tmpfile" does not end files with period on empty extension (:pr:`9429`) `Hendrik Makait`_ + Skip failing fastparquet test with latest release (:pr:`9432`) `James Bourbeau`_ * Thu Sep 01 2022 Steve Kowalik <steven.kowalik@suse.com> - Update to 2022.8.1: * Implement ma.*_like functions (:pr:`9378`) `Ruth Comer`_ * Fuse compatible annotations (:pr:`9402`) `Ian Rose`_ * Shuffle-based groupby aggregation for high-cardinality groups (:pr:`9302`) `Richard (Rick) Zamora`_ * Unpack namedtuple 
(:pr:`9361`) `Hendrik Makait`_ * Fix SeriesGroupBy cumulative functions with axis=1 (:pr:`9377`) `Pavithra Eswaramoorthy`_ * Sparse array reductions (:pr:`9342`) `Ian Rose`_ * Fix make_meta while using categorical column with index (:pr:`9348`) `Pavithra Eswaramoorthy`_ * Don't allow incompatible keywords in DataFrame.dropna (:pr:`9366`) `Naty Clementi`_ * Make set_index handle entirely empty dataframes (:pr:`8896`) `Julia Signell`_ * Improve dataclass handling in unpack_collections (:pr:`9345`) `Hendrik Makait`_ * Fix bag sampling when there are some smaller partitions (:pr:`9349`) `Ian Rose`_ * Add support for empty partitions to da.min/da.max functions (:pr:`9268`) `geraninam`_ * Use entry_points utility in sizeof (:pr:`9390`) `James Bourbeau`_ * Add entry_points compatibility utility (:pr:`9388`) `Jacob Tomlinson`_ * Upload environment file artifact for each CI build (:pr:`9372`) `James Bourbeau`_ * Remove werkzeug pin in CI (:pr:`9371`) `James Bourbeau`_ * Fix type annotations for dd.from_pandas and dd.from_delayed (:pr:`9362`) `Jordan Yap`_ * Ensure make_meta doesn't hold ref to data (:pr:`9354`) `Jim Crist-Harif`_ * Revise divisions logic in from_pandas (:pr:`9221`) `Richard (Rick) Zamora`_ * Warn if user sets index with existing index (:pr:`9341`) `Julia Signell`_ * Add keepdims keyword for da.average (:pr:`9332`) `Ruth Comer`_ * Change repr methods to avoid Layer materialization (:pr:`9289`) `Richard (Rick) Zamora`_ * Make sure order kwarg will not crash the astype method (:pr:`9317`) `Genevieve Buckley`_ * Fix bug for cumsum on cupy chunked dask arrays (:pr:`9320`) `Genevieve Buckley`_ * Match input and output structure in _sample_reduce (:pr:`9272`) `Pavithra Eswaramoorthy`_ * Include meta in array serialization (:pr:`9240`) `Frédéric BRIOL`_ * Fix Index.memory_usage (:pr:`9290`) `James Bourbeau`_ * Fix division calculation in dask.dataframe.io.from_dask_array (:pr:`9282`) `Jordan Yap`_ * Switch js-yaml for yaml.js in config converter (:pr:`9306`) `Jacob 
Tomlinson`_ * Update da.linalg.solve for SciPy 1.9.0 compatibility (:pr:`9350`) `Pavithra Eswaramoorthy`_ * Update test_getitem_avoids_large_chunks_missing (:pr:`9347`) `Pavithra Eswaramoorthy`_ * Import loop_in_thread fixture in tests (:pr:`9337`) `James Bourbeau`_ * Temporarily xfail test_solve_sym_pos (:pr:`9336`) `Pavithra Eswaramoorthy`_ * Update gpuCI RAPIDS_VER to 22.10 (:pr:`9314`) * Return Dask array if all axes are squeezed (:pr:`9250`) `Pavithra Eswaramoorthy`_ * Make cycle reported by toposort shorter (:pr:`9068`) `Erik Welch`_ * Unknown chunk slicing - raise informative error (:pr:`9285`) `Naty Clementi`_ * Fix bug in HighLevelGraph.cull (:pr:`9267`) `Richard (Rick) Zamora`_ * Sort categories (:pr:`9264`) `Pavithra Eswaramoorthy`_ * Use max (instead of sum) for calculating warnsize (:pr:`9235`) `Pavithra Eswaramoorthy`_ * Fix bug when filtering on partitioned column with pyarrow (:pr:`9252`) `Richard (Rick) Zamora`_ * Add type annotations to dd.from_pandas and dd.from_delayed (:pr:`9237`) `Michael Milton`_ * Update test_plot_multiple for upcoming bokeh release (:pr:`9261`) `James Bourbeau`_ * Add typing to common array properties (:pr:`9255`) `Illviljan`_ * Mon Jul 11 2022 Arun Persaud <arun@gmx.de> - update to version 2022.7.0: * Enhancements + Support "pathlib.PurePath" in "normalize_token" (:pr:`9229`) `Angus Hollands`_ + Add "AttributeNotImplementedError" for properties so IPython glob search works (:pr:`9231`) `Erik Welch`_ + "map_overlap": multiple dataframe handling (:pr:`9145`) `Fabien Aulaire`_ + Read entrypoints in "dask.sizeof" (:pr:`7688`) `Angus Hollands`_ * Bug Fixes + Fix "TypeError: 'Serialize' object is not subscriptable" when writing parquet dataset with "Client(processes=False)" (:pr:`9015`) `Lucas Miguel Ponce`_ + Correct dtypes when "concat" with an empty dataframe (:pr:`9193`) `Pavithra Eswaramoorthy`_ * Documentation + Highlight note about persist (:pr:`9234`) `Pavithra Eswaramoorthy`_ + Update release-procedure to include more 
detail and helpful commands (:pr:`9215`) `Julia Signell`_ + Better SEO for Futures and Dask vs. Spark pages (:pr:`9217`) `Sarah Charlotte Johnson`_ * Maintenance + Use "math.prod" instead of "np.prod" on lists, tuples, and iters (:pr:`9232`) `crusaderky`_ + Only import IPython if type checking (:pr:`9230`) `Florian Jetter`_ + Tougher mypy checks (:pr:`9206`) `crusaderky`_ * Fri Jun 24 2022 Ben Greiner <code@bnavigator.de> - Update to 2022.6.1 * Enhancements - Dask in pyodide (GH#9053) Ian Rose - Create dask.utils.show_versions (GH#9144) Sultan Orazbayev - Better error message for unsupported numpy operations on dask.dataframe objects. (GH#9201) Julia Signell - Add allow_rechunk kwarg to dask.array.overlap function (GH#7776) Genevieve Buckley - Add minutes and hours to dask.utils.format_time (GH#9116) Matthew Rocklin - More retries when writing parquet to remote filesystem (GH#9175) Ian Rose * Bug Fixes - Timedelta deterministic hashing (GH#9213) Fabien Aulaire - Enum deterministic hashing (GH#9212) Fabien Aulaire - shuffle_group(): avoid converting to arrays (GH#9157) Mads R. B. 
Kristensen * Deprecations - Deprecate extra format_time utility (GH#9184) James Bourbeau - Release 2022.6.0 * Enhancements - Add feature to show names of layer dependencies in HLG JupyterLab repr (GH#9081) Angelos Omirolis - Add arrow schema extraction dispatch (GH#9169) GALI PREM SAGAR - Add sort_results argument to assert_eq (GH#9130) Pavithra Eswaramoorthy - Add weeks to parse_timedelta (GH#9168) Matthew Rocklin - Warn that cloudpickle is not always deterministic (GH#9148) Pavithra Eswaramoorthy - Switch parquet default engine (GH#9140) Jim Crist-Harif - Use deterministic hashing with _iLocIndexer / _LocIndexer (GH#9108) Fabien Aulaire - Enforce consistent schema in to_parquet pyarrow (GH#9131) Jim Crist-Harif * Bug Fixes - Fix pyarrow.StringArray pickle (GH#9170) Jim Crist-Harif - Fix parallel metadata collection in pyarrow engine (GH#9165) Richard (Rick) Zamora - Improve pyarrow partitioning logic (GH#9147) James Bourbeau - pyarrow 8.0 partitioning fix (GH#9143) James Bourbeau - Release 2022.05.2 * Enhancements - Add a dispatch for non-pandas Grouper objects and use it in GroupBy (GH#9074) brandon-b-miller - Error if read_parquet & to_parquet files intersect (GH#9124) Jim Crist-Harif - Visualize task graphs using ipycytoscape (GH#9091) Ian Rose - Release 2022.05.1 * New Features - Add DataFrame.from_dict classmethod (GH#9017) Matthew Powers - Add from_map function to Dask DataFrame (GH#8911) Richard (Rick) Zamora * Enhancements - Improve to_parquet error for appended divisions overlap (GH#9102) Jim Crist-Harif - Enabled user-defined process-initializer functions (GH#9087) ParticularMiner - Mention align_dataframes=False option in map_partitions error (GH#9075) Gabe Joseph - Add kwarg enforce_ndim to dask.array.map_blocks() (GH#8865) ParticularMiner - Implement Series.GroupBy.fillna / DataFrame.GroupBy.fillna methods (GH#8869) Pavithra Eswaramoorthy - Allow fillna with Dask DataFrame (GH#8950) Pavithra Eswaramoorthy - Update error message for assignment with 1-d 
dask array (GH#9036) Pavithra Eswaramoorthy - Collection Protocol (GH#8674) Doug Davis - Patch around pandas ArrowStringArray pickling (GH#9024) Jim Crist-Harif - Band-aid for compute_as_if_collection (GH#8998) Ian Rose - Add p2p shuffle option (GH#8836) Matthew Rocklin * Bug Fixes - Fixup column projection with no columns (GH#9106) Jim Crist-Harif - Blockwise cull NumPy dtype (GH#9100) Ian Rose - Fix column-projection bug in from_map (GH#9078) Richard (Rick) Zamora - Prevent nulls in index for non-numeric dtypes (GH#8963) Jorge López - Fix is_monotonic methods for more than 8 partitions (GH#9019) Julia Signell - Handle enumerate and generator inputs to from_map (GH#9066) Richard (Rick) Zamora - Revert is_dask_collection; back to previous implementation (GH#9062) Doug Davis - Fix Blockwise.clone does not handle iterable literal arguments correctly (GH#8979) JSKenyon - Array setitem hardmask (GH#9027) David Hassell - Fix overlapping divisions error on append (GH#8997) Ian Rose * Deprecations - Add pre-deprecation warnings for read_parquet kwargs chunksize and aggregate_files (GH#9052) Richard (Rick) Zamora - Release 2022.05.0 * This is a bugfix release with doc changes only - Release 2022.04.2 * This release includes several deprecations/breaking API changes to dask.dataframe.read_parquet and dask.dataframe.to_parquet: - to_parquet no longer writes _metadata files by default. If you want to write a _metadata file, you can pass in write_metadata_file=True. - read_parquet now defaults to split_row_groups=False, which results in one Dask dataframe partition per parquet file when reading in a parquet dataset. If you’re working with large parquet files you may need to set split_row_groups=True to reduce your partition size. - read_parquet no longer calculates divisions by default. If you require read_parquet to return dataframes with known divisions, please set calculate_divisions=True. - read_parquet has deprecated the gather_statistics keyword argument. 
Please use the calculate_divisions keyword argument instead. - read_parquet has deprecated the require_extensions keyword argument. Please use the parquet_file_extension keyword argument instead. * New Features - Add removeprefix and removesuffix as StringMethods (GH#8912) Jorge López * Enhancements - Call fs.invalidate_cache in to_parquet (GH#8994) Jim Crist-Harif - Change to_parquet default to write_metadata_file=None (GH#8988) Jim Crist-Harif - Let arg reductions pass keepdims (GH#8926) Julia Signell - Change split_row_groups default to False in read_parquet (GH#8981) Richard (Rick) Zamora - Improve NotImplementedError message for da.reshape (GH#8987) Jim Crist-Harif - Simplify to_parquet compute path (GH#8982) Jim Crist-Harif - Raise an error if you try to use vindex with a Dask object (GH#8945) Julia Signell - Avoid pre_buffer=True when a precache method is specified (GH#8957) Richard (Rick) Zamora - from_dask_array uses blockwise instead of merging graphs (GH#8889) Bryan Weber - Use pre_buffer=True for “pyarrow” Parquet engine (GH#8952) Richard (Rick) Zamora * Bug Fixes - Handle dtype=None correctly in da.full (GH#8954) Tom White - Fix dask-sql bug caused by blockwise fusion (GH#8989) Richard (Rick) Zamora - to_parquet errors for non-string column names (GH#8990) Jim Crist-Harif - Make sure da.roll works even if shape is 0 (GH#8925) Julia Signell - Fix recursion error issue with set_index (GH#8967) Paul Hobson - Stringify BlockwiseDepDict mapping values when produces_keys=True (GH#8972) Richard (Rick) Zamora - Use DataFrameIOLayer in DataFrame.from_delayed (GH#8852) Richard (Rick) Zamora - Check that values for the in predicate in read_parquet are correct (GH#8846) Bryan Weber - Fix bug for reduction of zero dimensional arrays (GH#8930) Tom White - Specify dtype when deciding division using np.linspace in read_sql_query (GH#8940) Cheun Hong * Deprecations - Deprecate gather_statistics from read_parquet (GH#8992) Richard (Rick) Zamora - Change 
require_extension to top-level parquet_file_extension read_parquet kwarg (GH#8935) Richard (Rick) Zamora - Release 2022.04.1 * New Features - Add missing NumPy ufuncs: abs, left_shift, right_shift, positive. (GH#8920) Tom White * Enhancements - Avoid collecting parquet metadata in pyarrow when write_metadata_file=False (GH#8906) Richard (Rick) Zamora - Better error for failed wildcard path in dd.read_csv() (fixes [#8878]) (GH#8908) Roger Filmyer - Return da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series (GH#8558) Julia Signell - Let get_dummies use meta computation in map_partitions (GH#8898) Julia Signell - Masked scalars input to da.from_array (GH#8895) David Hassell - Raise ValueError in merge_asof for duplicate kwargs (GH#8861) Bryan Weber * Bug Fixes - Make is_monotonic work when some partitions are empty (GH#8897) Julia Signell - Fix custom getter in da.from_array when inline_array=False (GH#8903) Ian Rose - Correctly handle dict-specification for rechunk. 
(GH#8859) Richard - Fix merge_asof: drop index column if left_on == right_on (GH#8874) Gil Forsyth * Deprecations - Warn users that engine='auto' will change in future (GH#8907) Jim Crist-Harif - Remove pyarrow-legacy engine from parquet API (GH#8835) Richard (Rick) Zamora - Release 2022.04.0 * This is the first release with support for Python 3.10 * New Features - Add Python 3.10 support (GH#8566) James Bourbeau * Enhancements - Add check on dtype.itemsize in order to produce a useful error (GH#8860) Davide Gavio - Add mild typing to common utils functions (GH#8848) Matthew Rocklin - Add sanity checks to divisions setter (GH#8806) Jim Crist-Harif - Use Blockwise and map_partitions for more tasks (GH#8831) Bryan Weber * Bug Fixes - Fix dataframe.merge_asof to preserve right_on column (GH#8857) Sarah Charlotte Johnson - Fix “Buffer dtype mismatch” for pandas >= 1.3 on 32bit (GH#8851) Ben Greiner - Fix slicing fusion by altering SubgraphCallable getter (GH#8827) Ian Rose * Deprecations - Remove support for PyPy (GH#8863) James Bourbeau - Drop setuptools at runtime (GH#8855) crusaderky - Remove dataframe.tseries.resample.getnanos (GH#8834) Sarah Charlotte Johnson - Drop dask-fix8169-pandas13.patch and dask-py310-test.patch * Sun Mar 27 2022 Ben Greiner <code@bnavigator.de> - dask.dataframe requires dask.bag (revealed by swifter test suite) * Fri Mar 25 2022 Ben Greiner <code@bnavigator.de> - Update to 2022.3.0 * Bag: add implementation for reservoir sampling * Add ma.count to Dask array * Change to_parquet default to compression="snappy" * Add weights parameter to dask.array.reduction * Add ddf.compute_current_divisions to get divisions on a sorted index or column * Pass __name__ and __doc__ through on DelayedLeaf * Raise exception for not implemented merge how option * Move Bag.map_partitions to Blockwise * Improve error messages for malformed config files * Revise column-projection optimization to capture common dask-sql patterns * Useful error for empty divisions * 
Scipy 1.8.0 compat: copy private classes into dask/array/stats.py - Release 2022.2.1 * Add aggregate functions first and last to dask.dataframe.pivot_table * Add std() support for datetime64 dtype for pandas-like objects * Add materialized task counts to HighLevelGraph and Layer html reprs * Do not allow iterating a DataFrameGroupBy * Fix missing newline after info() call on empty DataFrame * Add groupby.compute as a not implemented method * Improve multi dataframe join performance * Include bool type for Index * Allow ArrowDatasetEngine subclass to override pandas->arrow conversion also for partitioned write * Increase performance of k-diagonal extraction in da.diag() and da.diagonal() * Change linspace creation to match numpy when num equal to 0 * Tokenize dataclasses * Update tokenize to treat dict and kwargs differently - Release 2022.2.0 * Add region to to_zarr when using existing array * Add engine_kwargs support to dask.dataframe.to_sql * Add include_path_column arg to read_json * Add expand_dims to Dask array * Add scheduler option to assert_eq utilities * Fix eye inconsistency with NumPy for dtype=None * Fix concatenate inconsistency with NumPy for axis=None * Type annotations, part 1 * Really allow any iterable to be passed as a meta * Use map_partitions (Blockwise) in to_parquet - Update dask-fix8169-pandas13.patch - Add dask-py310-test.patch -- gh#dask/dask#8566 - Make the distributed/dask update sync requirement even more obvious. 
* Sat Jan 29 2022 Ben Greiner <code@bnavigator.de> - Update to 2022.1.1 * Add dask.dataframe.series.view() * Update tz for fastparquet + pandas 1.4.0 * Cleaning up misc tests for pandas compat * Moving to SQLAlchemy >= 1.4 * Pandas compat: Filter sparse warnings * Fail if meta is not a pandas object * Use fsspec.parquet module for better remote-storage read_parquet performance * Move DataFrame ACA aggregations to HLG * Add optional information about originating function call in DataFrameIOLayer * Blockwise array creation redux * Refactor config default search path retrieval * Add optimize_graph flag to Bag.to_dataframe function * Make sure that delayed output operations still return lists of paths * Pandas compat: Fix to_frame name to not pass None * Pandas compat: Fix axis=None warning * Expand Dask YAML config search directories * Fix groupby.cumsum with series grouped by index * Fix derived_from for pandas methods * Enforce boolean ascending for sort_values * Fix parsing of __setitem__ indices * Avoid divide by zero in slicing * Downgrade meta error in * Pandas compat: Deprecate append when pandas >= 1.4.0 * Replace outdated columns argument with meta in DataFrame constructor * Refactor deploying docs * Pin coverage in CI * Move cached_cumsum imports to be from dask.utils * Update gpuCI RAPIDS_VER to 22.04 * Update docstring for from_delayed function * Handle plot_width / plot_height deprecations * Remove unnecessary pyyaml importorskip * Specify scheduler in DataFrame assert_eq * Tue Jan 25 2022 Ben Greiner <code@bnavigator.de> - Revert python310 enablement -- gh#dask/distributed#5460 * Tue Jan 25 2022 Dirk Müller <dmueller@suse.com> - reenable python 3.10 build as distributed is also reenabled * Thu Jan 20 2022 Ben Greiner <code@bnavigator.de> - Update to 2022.1.0 * Add groupby.shift method (GH#8522) kori73 * Add DataFrame.nunique (GH#8479) Sarah Charlotte Johnson * Add da.ndim to match np.ndim (GH#8502) Julia Signell * Replace interpolation with method and 
method with internal_method (GH#8525) Julia Signell * Remove daily stock demo utility (GH#8477) James Bourbeau * Add Series and Index is_monotonic* methods (GH#8304) Daniel Mesejo-León * Deprecate token keyword argument to map_blocks (GH#8464) James Bourbeau * Deprecation warning for default value of boundary kwarg in map_overlap (GH#8397) Genevieve Buckley - Skip python310: Not supported by distributed yet -- gh#dask/distributed#5350
/usr/share/licenses/python312-dask-complete /usr/share/licenses/python312-dask-complete/LICENSE.txt
Generated by rpm2html 1.8.1
Fabrice Bellet, Tue Jan 14 23:24:16 2025