Writing dask arrays to a ZipStore causes a corrupt zip file #3516

@csubich

Description

Zarr version

3.1.3

Numcodecs version

0.16.1

Python Version

3.11.3

Operating System

Linux

Installation

Via binder, with xarray's blank_template.ipynb documentation example

Description

When dask provides the data for a zarr array backed by a ZipStore, the resulting zip file on disk is corrupt: reading it back raises a BadZipFile exception. The same write to a LocalStore succeeds. Nontrivial sharding or chunking is not required; a single-element array is enough to trigger the corruption.

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
#   "dask[array]",
#   "numpy",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

# Reproducer: write a one-element dask array into a ZipStore-backed zarr array
import dask.array as da
import numpy as np
import zarr

dask_array = da.zeros((1,), dtype=np.float32)

store = zarr.storage.ZipStore("bug.zarr.zip", mode="w", read_only=False)
# store = zarr.storage.LocalStore("bug.zarr", read_only=False)  # Does not error
group = zarr.open_group(store=store, mode="w")
zarr_array = group.create_array(
    name="data",
    shape=dask_array.shape,
    dtype=dask_array.dtype,
    overwrite=True,
)

da.to_zarr(dask_array, zarr_array)
store.close()

# Read the store back in
store_read = zarr.storage.ZipStore("bug.zarr.zip", mode="r", read_only=True)
# store_read = zarr.storage.LocalStore("bug.zarr", read_only=True)  # Does not error
group_read = zarr.open_group(store=store_read, mode="r")
# Error raised when reading the ZipStore back:
## ---------------------------------------------------------------------------
## BadZipFile                                Traceback (most recent call last)
## [...]
## BadZipFile: Bad magic number for file header

array_read = group_read["data"]
array_read[0]  # Should be array(0., dtype=float32)

# zarr.print_debug_info()
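
To narrow down which entries in the archive are affected, the zip file can be inspected with the standard library alone, independent of zarr and dask. The following is a minimal diagnostic sketch, not part of the reproducer above: `check_zip_members` is a hypothetical helper name, and it only reports which members fail to read; it makes no claim about the root cause.

```python
import zipfile


def check_zip_members(path):
    """Try to read every entry listed in the archive's central directory.

    Returns a list of (name, header_offset, error) tuples, where error is
    None if the member could be read and a short message otherwise.
    (Hypothetical helper for diagnosis; not a zarr or dask API.)
    """
    report = []
    with zipfile.ZipFile(path, mode="r") as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    # Reading the full member exercises both the local file
                    # header check and the CRC-32 check.
                    member.read()
                error = None
            except (zipfile.BadZipFile, OSError) as exc:
                error = f"{type(exc).__name__}: {exc}"
            report.append((info.filename, info.header_offset, error))
    return report
```

Calling `check_zip_members("bug.zarr.zip")` on the file written by the reproducer should list each stored key together with its recorded header offset and any read error. Repeated names or unexpected offsets in that list may help show where the archive's layout went wrong.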

Additional output

No response
