ZFS on Linux/src c1b5801 module/zfs arc.c

Minimize aggsum_compare(&arc_size, arc_c) calls.

For a busy ARC, having arc_size close to arc_c is the desired state.  But
then it is quite likely that aggsum_compare(&arc_size, arc_c) will need
to flush per-CPU buckets to find the exact comparison result.  Doing that
often in a hot path undermines the whole idea of using aggsum there, since
it replaces a few simple atomic additions with dozens of lock acquisitions.

Replacing aggsum_compare() with aggsum_upper_bound() in the code that
increases arc_p when the ARC is growing (arc_size < arc_c) saves,
according to PMC profiles, ~5% of CPU time in aggsum code during
sequential writes to 12 ZVOLs with 16KB block size on a large
dual-socket system.

I suppose there is some minor arc_p behavior change due to the lower
precision of the new code, but I don't think it is a big deal, since it
should affect only a very small window in time (aggsum buckets are
flushed every second) and in ARC size (buckets are limited to 10 average
ARC blocks per CPU).
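
The trade-off described above can be illustrated with a small,
single-threaded toy model. This is only a sketch: the toy_* names, the
bucket count and the bucket limit are invented for illustration and are
not the real aggsum implementation, which uses per-CPU buckets and locks.

```c
#include <stddef.h>
#include <stdint.h>

#define	TOY_NBUCKETS		4	/* stand-in for the number of CPUs */
#define	TOY_BUCKET_LIMIT	10	/* largest delta a bucket may hold */

struct toy_aggsum {
	int64_t flushed;		/* part of the sum that is cheap to read */
	int64_t delta[TOY_NBUCKETS];	/* unflushed per-bucket deltas */
	unsigned flushes;		/* bucket flushes performed so far */
};

/* Add to one bucket; spill into the shared counter if the bucket is full. */
void
toy_add(struct toy_aggsum *as, size_t bucket, int64_t d)
{
	as->delta[bucket] += d;
	if (as->delta[bucket] > TOY_BUCKET_LIMIT ||
	    as->delta[bucket] < -TOY_BUCKET_LIMIT) {
		as->flushed += as->delta[bucket];
		as->delta[bucket] = 0;
		as->flushes++;
	}
}

/* Cheap bounds: neither one touches the buckets. */
int64_t
toy_upper_bound(const struct toy_aggsum *as)
{
	return (as->flushed + (int64_t)TOY_NBUCKETS * TOY_BUCKET_LIMIT);
}

int64_t
toy_lower_bound(const struct toy_aggsum *as)
{
	return (as->flushed - (int64_t)TOY_NBUCKETS * TOY_BUCKET_LIMIT);
}

/* Exact compare: must flush every bucket when the bounds straddle target. */
int
toy_compare(struct toy_aggsum *as, int64_t target)
{
	if (toy_lower_bound(as) > target)
		return (1);
	if (toy_upper_bound(as) < target)
		return (-1);
	for (size_t i = 0; i < TOY_NBUCKETS; i++) {
		as->flushed += as->delta[i];
		as->delta[i] = 0;
		as->flushes++;
	}
	return ((as->flushed > target) - (as->flushed < target));
}
```

When the sum is close to the target, toy_compare() flushes every bucket
on each call, while toy_upper_bound() answers with lower precision but
without touching any bucket.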

Reviewed-by: Chris Dunlop <chris at onthe.net.au>
Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Allan Jude <allanjude at freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Closes #8901 
Delta  File
+1 -1  module/zfs/arc.c
+1 -1  1 files

ZFS on Linux/src 63b88f7 . META

Tag zfs-0.8.1

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Delta  File
+1 -1  META
+1 -1  1 files

ZFS on Linux/src b1b4ac2 config always-python.m4 always-pyzfs.m4

Python config cleanup

Don't require Python at configure/build unless building pyzfs.
Move ZFS_AC_PYTHON_MODULE to always-pyzfs.m4 where it is used.
Make test syntax more consistent.

Sponsored by: iXsystems, Inc.
Reviewed-by: Neal Gompa <ngompa at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ryan Moeller <ryan at ixsystems.com>
Closes #8895 

ZFS on Linux/src 7218b29 include/sys zio_compress.h

lz4_decompress_abd declared but not defined

`lz4_decompress_abd` is declared in zio_compress.h but it is not defined
anywhere. The declaration should be removed.

Reviewed by: Dan Kimmel <dan.kimmel at delphix.com>
Reviewed-by: Allan Jude <allanjude at freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens at delphix.com>
External-issue: DLPX-47477
Closes #8894 

ZFS on Linux/src 53dce5a include/sys vdev_removal.h, man/man5 zfs-module-parameters.5

panic in removal_remap test on 4K devices

If the zfs_remove_max_segment tunable is changed to be not a multiple of
the sector size, then the device removal code will malfunction and try
to create mappings that are smaller than one sector, leading to a panic.

On debug bits this assertion will fail in spa_vdev_copy_segment():
    ASSERT3U(DVA_GET_ASIZE(&dst), ==, size);

On nondebug, the system panics with a stack like:
    metaslab_free_concrete()
    metaslab_free_impl()
    metaslab_free_impl_cb()
    vdev_indirect_remap()
    free_from_removing_vdev()
    metaslab_free_impl()
    metaslab_free_dva()
    metaslab_free()

Fortunately, the default for zfs_remove_max_segment is 1MB, so this
can't occur by default.  We hit it during this test because
removal_remap.ksh changes zfs_remove_max_segment to 1KB. When testing on
4KB-sector disks, we hit the bug.
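
One way to keep such a tunable safe is to round it to a whole number of
sectors before use. The helper below is a hypothetical sketch of that
idea, not the code from this commit (the rest of the message is elided
here); toy_max_segment() and its rounding direction are assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: round a segment-size tunable up to a multiple of
 * the sector size (1 << ashift), and never return less than one sector.
 */
uint64_t
toy_max_segment(uint64_t tunable, unsigned ashift)
{
	uint64_t sector = (uint64_t)1 << ashift;
	uint64_t rounded = (tunable + sector - 1) & ~(sector - 1);

	return (rounded < sector ? sector : rounded);
}
```

With a 1KB setting on 4KB-sector disks (ashift 12), this sketch yields
4096, so no mapping smaller than one sector would be requested.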

This change makes the zfs_remove_max_segment tunable more robust,

    [13 lines not shown]

ZFS on Linux/src be89734 man/man5 zfs-module-parameters.5, module/zfs zio.c

compress metadata in later sync passes

Starting in sync pass 5 (zfs_sync_pass_dont_compress), we disable
compression (including of metadata).  Ostensibly this helps the sync
passes to converge (i.e. for a sync pass to not need to allocate
anything because it is 100% overwrites).

However, in practice it increases the average number of sync passes,
because when we turn compression off, many blocks' sizes will change
and thus we have to re-allocate (not overwrite) them.  It also increases
the number of 128KB allocations (e.g. for indirect blocks and spacemaps)
because these will not be compressed.  The 128K allocations are
especially detrimental to performance on highly fragmented systems,
which may have very few free segments of this size, and may need to load
new metaslabs to satisfy 128K allocations.

We should increase zfs_sync_pass_dont_compress.  In practice on a highly
fragmented system we see a few 5-pass txg's, a tiny number of 6-pass
txg's, and no txg's with more than 6 passes.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Reviewed by: Pavel Zakharov <pavel.zakharov at delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Reviewed-by: George Wilson <george.wilson at delphix.com>

    [3 lines not shown]

ZFS on Linux/src ae5c78e module/zfs vdev_queue.c

Move write aggregation memory copy out of vq_lock

A memory copy is too heavy an operation to do under a congested lock.
Moving it out reduces congestion many times over, to the point of being
almost invisible.  Since the original zio has been removed from the
queue and the child zio has not executed yet, I don't see why the copy
would need protection.  My guess is that it just remained like this from
the time when the lock was not dropped here; dropping it was added later
to fix a lock ordering issue.

Multi-threaded sequential write tests with both HDD and SSD pools
with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show a major
reduction of lock congestion, saving from 15% to 35% of CPU time
and increasing throughput by 10% to 40%.

Reviewed-by: Richard Yao <ryao at gentoo.org>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Closes #8890 
Delta    File
+12 -10  module/zfs/vdev_queue.c
+12 -10  1 files

ZFS on Linux/src d3230d7 man/man5 zfs-module-parameters.5, module/zfs metaslab.c

looping in metaslab_block_picker impacts performance on fragmented pools

On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).
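
The bounded search can be sketched as follows. This is a simplified
illustration over a plain array, assuming segments sorted by offset; the
toy_seg type and toy_pick() are invented names, and the real allocator
works on trees rather than arrays.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_seg {
	uint64_t start;		/* offset of the free segment */
	uint64_t size;		/* length of the free segment */
};

/*
 * segs[] is sorted by start.  Scan forward from cursor, but give up once
 * a segment lies more than max_search bytes past it.  On failure, fall
 * back to the smallest segment that fits (a linear scan here; a
 * size-sorted structure would make this a single lookup).
 * Returns the index of the chosen segment, or -1 if nothing fits.
 */
long
toy_pick(const struct toy_seg *segs, size_t n, uint64_t cursor,
    uint64_t size, uint64_t max_search)
{
	long best = -1;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].start < cursor)
			continue;
		if (segs[i].start - cursor > max_search)
			break;
		if (segs[i].size >= size)
			return ((long)i);
	}
	for (size_t i = 0; i < n; i++) {
		if (segs[i].size >= size &&
		    (best < 0 || segs[i].size < segs[best].size))
			best = (long)i;
	}
	return (best);
}
```

The first loop is what bounds the number of iterations; the fallback
only runs when no nearby segment is large enough.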


    [8 lines not shown]

ZFS on Linux/src 9c7da9a include zfs_namecheck.h, lib/libzfs libzfs_dataset.c

Restrict filesystem creation if the name refers to either '.' or '..'

This change restricts filesystem creation if the given name
contains either '.' or '..'
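
A minimal sketch of such a check, assuming it is applied to each
'/'-separated component of the name (the commit message does not spell
this out). toy_has_dot_component() is a hypothetical helper, not the
function added by this change.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if any '/'-separated component is exactly "." or "..". */
bool
toy_has_dot_component(const char *name)
{
	const char *p = name;

	while (*p != '\0') {
		size_t len = strcspn(p, "/");

		if ((len == 1 && p[0] == '.') ||
		    (len == 2 && p[0] == '.' && p[1] == '.'))
			return (true);
		p += len;
		if (*p == '/')
			p++;
	}
	return (false);
}
```

Under this assumption a name such as "pool/fs.1" is still accepted,
because the dot is only part of a component.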

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Signed-off-by: TulsiJain <tulsi.jain at delphix.com>
Closes #8842 
Closes #8564 

ZFS on Linux/src 3475724 module/zfs vdev_removal.c

ztest: dmu_tx_assign() gets ENOSPC in spa_vdev_remove_thread()

When running zloop, we occasionally see the following crash:

    dmu_tx_assign(tx, TXG_WAIT) == 0 (0x1c == 0)
    ASSERT at ../../module/zfs/vdev_removal.c:1507:spa_vdev_remove_thread()
    /sbin/ztest(+0x89c3)[0x55faf567b9c3]

The error value 0x1c is ENOSPC.

The transaction used by spa_vdev_remove_thread() should not be able to
fail due to being out of space, i.e. we should not call
dmu_tx_hold_space().  This will allow the removal thread to schedule its
work even when the pool is low on space.  The "slop space" will provide
enough free space to sync out the txg.

Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens at delphix.com>
External-issue: DLPX-37853
Closes #8889 

ZFS on Linux/src daddbdc module/zfs zfs_sysfs.c

Fix lockdep warning on insmod

sysfs_attr_init() is required to make lockdep happy for dynamically
allocated sysfs attributes. This fixed #8868 on Fedora 29 running
kernel-debug.

This requirement was introduced in 2.6.34.
See include/linux/sysfs.h for what it actually does.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro at gmail.com>
Closes #8868 
Closes #8884 

ZFS on Linux/src d9b4bf0 include/sys zap.h, man/man5 zfs-module-parameters.5

fat zap should prefetch when iterating

When iterating over a ZAP object, we're almost always certain to iterate
over the entire object. If there are multiple leaf blocks, we can
realize a performance win by issuing reads for all the leaf blocks in
parallel when the iteration begins.

For example, if we have 10,000 snapshots, "zfs destroy -nv
pool/fs@1%9999" can take 30 minutes when the cache is cold. This change
provides a >3x performance improvement, by issuing the reads for all ~64
blocks of each ZAP object in parallel.

Reviewed-by: Andreas Dilger <andreas.dilger at whamcloud.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens at delphix.com>
External-issue: DLPX-58347
Closes #8862 

ZFS on Linux/src d9cd66e module/zfs arc.c

Target ARC size can get reduced to arc_c_min

Sometimes the target ARC size is reduced to arc_c_min, which impacts
performance.  We've seen this happen as part of the random_reads
performance regression test, where the ARC size is reduced before the
reads test starts, which impacts how long it takes for the system to reach
good IOPS performance.

We call arc_reduce_target_size when arc_reap_cb_check() returns TRUE,
and arc_available_memory() is less than arc_c>>arc_shrink_shift.

However, arc_available_memory() could easily be low, even when arc_c is
low, because we can have tons of unused bufs in the abd kmem cache. This
would be especially true just after the DMU requests a bunch of stuff be
evicted from the ARC (e.g. due to "zpool export").

To fix this, the ARC should reduce arc_c by the requested amount, not
all the way down to arc_size (or arc_c_min), which can be very small.
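
The intended behavior can be sketched as a pure function. The names
below are hypothetical stand-ins (c for arc_c, c_min for arc_c_min,
to_free for the requested amount), and this is not the actual
arc_reduce_target_size() code.

```c
#include <stdint.h>

/*
 * Sketch: shrink the target by exactly the requested amount, clamping
 * at the minimum, instead of collapsing it to the current ARC size.
 */
uint64_t
toy_reduce_target(uint64_t c, uint64_t c_min, uint64_t to_free)
{
	if (c > to_free && c - to_free > c_min)
		return (c - to_free);
	return (c_min);
}
```

A target of 1000 with a request of 200 becomes 800 in this sketch, no
matter how small the current ARC size happens to be.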

Reviewed-by: Tim Chase <tim at chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens at delphix.com>
External-issue: DLPX-59431
Closes #8864 
Delta  File
+0 -2  module/zfs/arc.c
+0 -2  1 files

ZFS on Linux/src 10269e0 module/zfs vdev_raidz_math.c

Fix typo in vdev_raidz_math.c

Fix typo in vdev_raidz_math.c

Reviewed by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Brad Forschinger <github at bnjf.id.au>
Closes #8875 
Closes #8880 

ZFS on Linux/src 7288881 module/zfs arc.c

Fix comparison signedness in arc_is_overflowing()

When the ARC size is very small, aggsum_lower_bound(&arc_size) may
return negative values, which, due to an unsigned comparison, caused
delays while waiting for arc_adjust() to "fix" it by calling
aggsum_value(&arc_size).  Using a signed comparison there fixes the
problem.
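
The effect is easy to reproduce in isolation. In the sketch below,
lower_bound stands in for the value returned by aggsum_lower_bound() and
target for the ARC size limit; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy variant: the signed lower bound is converted to unsigned. */
bool
toy_overflowing_unsigned(int64_t lower_bound, uint64_t target)
{
	return ((uint64_t)lower_bound >= target);
}

/* Fixed variant: the comparison is done on signed values. */
bool
toy_overflowing_signed(int64_t lower_bound, uint64_t target)
{
	return (lower_bound >= (int64_t)target);
}
```

A lower bound of -1 converts to the largest 64-bit unsigned value, so
the first variant reports an overflow for any target.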

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Closes #8873
Delta  File
+2 -2  module/zfs/arc.c
+2 -2  1 files

ZFS on Linux/src 581c77e module/zfs dmu_recv.c

Fix incorrect error message for raw receive

This patch fixes an incorrect error message that comes up when
doing a non-forcing, raw, incremental receive into a dataset
that has a newer snapshot than the "from" snapshot. In this
case, the current code prints a confusing message about an IVset
guid mismatch.

This functionality is supported by non-raw receives as an
undocumented feature, but was never supported by the raw receive
code. If this is desired in the future, we can probably figure
out a way to make it work.

Reviewed by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Matthew Ahrens <mahrens at delphix.com>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Issue #8758
Closes #8863

ZFS on Linux/src ba505f9 cmd/arc_summary Makefile.am

arc_summary: prefer python3 version and install when there is no python

This matches the behavior of other python scripts, such as arcstat and
dbufstat, which are always installed but whose install-exec-hook actions
will simply touch up the shebang if a python interpreter was configured
*and* that interpreter is a python2 interpreter.

Fixes installation in a minimal build chroot without python available.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ryan Moeller <ryan at freqlabs.com>
Signed-off-by: Eli Schwartz <eschwartz at archlinux.org>
Closes #8851

ZFS on Linux/src eaa21b2 scripts kmodtool

Fix %post and %postun generation in kmodtool

During zfs-kmod RPM build, $(uname -r) gets unintentionally evaluated on
the build host, once and for all. It should be evaluated during the
execution of the scriptlets on the installation host. Escaping the $
character avoids evaluating it during build.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Neal Gompa <ngompa at datto.com>
Signed-off-by: Samuel Verschelde <stormi-xcp at ylix.fr>
Closes #8866
Delta  File
+2 -2  scripts/kmodtool
+2 -2  1 files

ZFS on Linux/src 5662fd5 include/sys abd.h, module/zfs abd.c arc.c

single-chunk scatter ABDs can be treated as linear

Scatter ABD's are allocated from a number of pages.  In contrast to
linear ABD's, these pages are disjoint in the kernel's virtual address
space, so they can't be accessed as a contiguous buffer.  Therefore
routines that need a linear buffer (e.g. abd_borrow_buf() and friends)
must allocate a separate linear buffer (with zio_buf_alloc()), and copy
the contents of the pages to/from the linear buffer.  This can have a
measurable performance overhead on some workloads.

https://github.com/zfsonlinux/zfs/commit/87c25d567fb7969b44c7d8af63990e
("abd_alloc should use scatter for >1K allocations") increased the use
of scatter ABD's, specifically switching 1.5K through 4K (inclusive)
buffers from linear to scatter.  For workloads that access blocks whose
compressed sizes are in this range, that commit introduced an additional
copy into the read code path.  For example, performance of the
sequential_reads_arc_cached tests in the test suite was reduced by
around 5% (this is doing reads of 8K-logical blocks, compressed to 3K,
which are cached in the ARC).

This commit treats single-chunk scattered buffers as linear buffers,
because they are contiguous in the kernel's virtual address space.

All single-page (4K) ABD's can be represented this way.  Some multi-page
ABD's can also be represented this way, if we were able to allocate a

    [20 lines not shown]

ZFS on Linux/src b873825 include/sys zil_impl.h zil.h, man/man5 zfs-module-parameters.5

make zil max block size tunable

We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations.  The large allocations are for ZIL blocks.  If there is a
lot of fragmentation, the large allocations can be hard to satisfy.

The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more.  In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.

To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.

External-issue: DLPX-61719
Reviewed-by: George Wilson <george.wilson at delphix.com>
Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens at delphix.com>
Closes #8865 

ZFS on Linux/src 5a902f5 module/zfs arc.c

Fix comparison signedness in arc_is_overflowing()

When the ARC size is very small, aggsum_lower_bound(&arc_size) may
return negative values, which, due to an unsigned comparison, caused
delays while waiting for arc_adjust() to "fix" it by calling
aggsum_value(&arc_size).  Using a signed comparison there fixes the
problem.

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Closes #8873 
Delta  File
+2 -2  module/zfs/arc.c
+2 -2  1 files

ZFS on Linux/src c08c30e module/zfs dmu_recv.c

Fix incorrect error message for raw receive

This patch fixes an incorrect error message that comes up when
doing a non-forcing, raw, incremental receive into a dataset
that has a newer snapshot than the "from" snapshot. In this
case, the current code prints a confusing message about an IVset
guid mismatch.

This functionality is supported by non-raw receives as an
undocumented feature, but was never supported by the raw receive
code. If this is desired in the future, we can probably figure
out a way to make it work.

Reviewed by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Matthew Ahrens <mahrens at delphix.com>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Issue #8758 
Closes #8863 

ZFS on Linux/src cfc16f8 tests/zfs-tests/include blkdev.shlib

Improve ZTS block_device_wait debugging

The udevadm settle timeout can be 120 or 180 seconds by default
for some distributions. If a long delay is experienced, it could
be due to some strangeness in a malfunctioning device that isn't
related to the devices under test. To help debug this condition,
a notice is given if settle takes too long.

Arguments can now be passed to block_device_wait. The expected
arguments are block device pathnames.

Reviewed by: John Kennedy <john.kennedy at delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling at RichardElling.com>
Closes #8839

ZFS on Linux/src 4cb1b54 tests/zfs-tests/include blkdev.shlib, tests/zfs-tests/tests/functional/rsend send-wDR_encrypted_zvol.ksh

Block_device_wait does not return an error code

Reviewed by: John Kennedy <john.kennedy at delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling at RichardElling.com>
Closes #8839

ZFS on Linux/src bef70af tests/zfs-tests/include libtest.shlib

Remove redundant redundant remove

Reviewed by: John Kennedy <john.kennedy at delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling at RichardElling.com>
Closes #8839

ZFS on Linux/src 2ff615b tests/zfs-tests/include libtest.shlib

Fix logic error in setpartition function

Reviewed by: John Kennedy <john.kennedy at delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling at RichardElling.com>
Closes #8839

ZFS on Linux/src 215e4fe cmd/arc_summary Makefile.am

arc_summary: prefer python3 version and install when there is no python

This matches the behavior of other python scripts, such as arcstat and
dbufstat, which are always installed but whose install-exec-hook actions
will simply touch up the shebang if a python interpreter was configured
*and* that interpreter is a python2 interpreter.

Fixes installation in a minimal build chroot without python available.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Ryan Moeller <ryan at freqlabs.com>
Signed-off-by: Eli Schwartz <eschwartz at archlinux.org>
Closes #8851 

ZFS on Linux/src 01d1e88 scripts kmodtool

Fix %post and %postun generation in kmodtool

During zfs-kmod RPM build, $(uname -r) gets unintentionally evaluated on
the build host, once and for all. It should be evaluated during the
execution of the scriptlets on the installation host. Escaping the $
character avoids evaluating it during build.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Neal Gompa <ngompa at datto.com>
Signed-off-by: Samuel Verschelde <stormi-xcp at ylix.fr>
Closes #8866 
Delta  File
+2 -2  scripts/kmodtool
+2 -2  1 files

ZFS on Linux/src d6920fb cmd Makefile.am, config always-python.m4 always-pyzfs.m4

Make Python detection optional and more portable

Previously, --without-python would cause ./configure to fail. Now it is
able to proceed, and the Python scripts will not be built.

Use portable parameter expansion matching instead of nonstandard
substring matching to detect the Python version.  This test is
duplicated in several places, so define a function for it.

Don't assume the full path to binaries, since different platforms do
install things in different places.  Use AC_CHECK_PROGS instead.

When building without Python, also build without pyzfs.

Sponsored by: iXsystems, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Richard Laager <rlaager at wiktel.com>
Reviewed-by: Eli Schwartz <eschwartz93 at gmail.com>
Signed-off-by: Ryan Moeller <ryan at freqlabs.com>
Closes #8809 
Closes #8731 

ZFS on Linux/src 9fd95a2 contrib/initramfs/scripts zfs.in

If $ZFS_BOOTFS contains guid, replace the guid portion with $pool

Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Richard Laager <rlaager at wiktel.com>
Signed-off-by: Garrett Fields <ghfields at gmail.com>
Closes #8356 

ZFS on Linux/src 5802560 module/zfs dmu.c

Fix integer overflow in get_next_chunk()

The type of dn->dn_datablksz is uint32_t, so it needs to be cast to
uint64_t to avoid an overflow when the record size is greater than 4 MiB.
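
The overflow can be shown with a stand-alone sketch. The multiplier of
1024 block pointers per indirect block and the toy_range_* names are
assumptions made for illustration; the commit message does not show the
expression that was fixed.

```c
#include <stdint.h>

/* 32-bit multiply: the product wraps before it is widened to 64 bits. */
uint64_t
toy_range_32(uint32_t blksz, uint32_t entries)
{
	return (blksz * entries);
}

/* Cast first, so the multiply is carried out in 64 bits. */
uint64_t
toy_range_64(uint32_t blksz, uint32_t entries)
{
	return ((uint64_t)blksz * entries);
}
```

For an 8 MiB block size and 1024 entries the true product is 2^33, which
wraps to 0 in the 32-bit version.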

Reviewed-by: Tom Caputi <tcaputi at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Olivier Mazouffre <olivier.mazouffre at ims-bordeaux.fr>
Closes #8778 
Closes #8797 
Delta  File
+2 -2  module/zfs/dmu.c
+2 -2  1 files

ZFS on Linux/src 0c6206e tests/test-runner/bin test-runner.py

test-runner.py: change shebang to python3

In commit 6e72a5b9b61066146deafda39ab8158c559f5f15 python scripts which
work with python2 and python3 changed the shebang from /usr/bin/python
to /usr/bin/python3. This gets adapted by the build-system on systems
which don't provide python3.
This commit changes test-runner.py to also use /usr/bin/python3,
enabling the change during buildtime and fixing a minor lintian issue
for those Debian packages, which depend on a specific python version
(python3/python2).

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Reviewed by: John Kennedy <john.kennedy at delphix.com>
Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
Closes #8803

ZFS on Linux/src 94866d8 tests/runfiles linux.run, tests/zfs-tests/tests/functional/link_count link_count_root_inode.ksh Makefile.am

Add link count test for root inode

Add tests for
97aa3ba44("Fix link count of root inode when snapdir is visible")
as suggested in #8727.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro at osnexus.com>
Closes #8732 

ZFS on Linux/src a0bf249 config kernel.m4

Allow TRIM_UNUSED_KSYM when built as a builtin module

If ZFS is built with enable_linux_builtin, it seems to be possible
to compile the kernel with TRIM_UNUSED_KSYM.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Torsten Wörtwein <twoertwein at gmail.com>
Closes #8820 
Delta  File
+3 -2  config/kernel.m4
+3 -2  1 files

ZFS on Linux/src 58b2de6 module/zfs bqueue.c

Wait in 'S' state when send/recv pipe is blocking

Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: DHE <git at dehacked.net>
Closes #8733 
Closes #8752 

ZFS on Linux/src 27b446f tests/zfs-tests/tests/functional/alloc_class Makefile.am, tests/zfs-tests/tests/functional/cli_root/zpool_import zpool_import.kshlib

tests: fix cosmetic permission issues during `make install`

Files in dist_*_SCRIPTS get installed with 0755, those in dist_*_DATA
with 0644. This commit moves all .kshlib, .shlib and .cfg files in the
testsuite to dist_pkgdata_DATA, and removes the shebang from
zpool_import.kshlib.

This ensures that the files are installed with appropriate permissions
and silences some warnings from lintian.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Reviewed by: John Kennedy <john.kennedy at delphix.com>
Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
Closes #8803

ZFS on Linux/src 4f8eef2 module/zfs dmu.c

Revert "Report holes when there are only metadata changes"

This reverts commit ec4f9b8f30, which introduced a narrow race that
can lead to lseek(, SEEK_DATA) incorrectly returning ENXIO.  Resolve
the issue by reverting this change to restore the previous behavior,
which depends solely on checking the dirty list.

Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8816 
Closes #8834 
Delta   File
+3 -28  module/zfs/dmu.c
+3 -28  1 files

ZFS on Linux/src 5108d27 tests/zfs-tests/tests/functional/hkdf Makefile.am

hkdf_test binary should only have one icp instance

The build for test binary hkdf_test was linking both against libicp 
and libzpool. This results in two instances of libicp inside the 
binary but the call to icp_init() only initializes one of them!

Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Don Brady <don.brady at delphix.com>
Closes #8850 

ZFS on Linux/src 11ad06d module/zfs dsl_scan.c

Make zfs_async_block_max_blocks handle zero correctly

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: TulsiJain <tulsi.jain at delphix.com>
Closes #8829
Closes #8289

ZFS on Linux/src 02010e9 man/man1 raidz_test.1

Fixed a small typo in man/man1/raidz_test.1

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Chris Dunlop <chris at onthe.net.au>
Signed-off-by: Peter Wirdemo <peter.wirdemo at gmail.com>
Closes #8855 

ZFS on Linux/src 51de7cc cmd/zfs zfs_main.c, cmd/zpool zpool_main.c

Endless loop in zpool_do_remove() on platforms with unsigned char

On systems where "char" is an unsigned type, the value returned by
getopt(), once stored in a "char", will never be negative (-1), leading
to an endless loop.  This issue prevents both 'zpool remove' and
'zstreamdump' from working on some systems.
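
The failure mode can be reproduced without getopt() itself. In this
sketch, getopt_result stands in for the value returned by getopt(), and
"unsigned char" is used explicitly to mimic platforms where plain "char"
is unsigned; the function names are hypothetical.

```c
#include <stdbool.h>

/* Storing the result in an unsigned char turns -1 into 255. */
bool
toy_sees_end_unsigned_char(int getopt_result)
{
	unsigned char c = (unsigned char)getopt_result;
	int end_marker = -1;

	return (c == end_marker);	/* c promotes to a non-negative int */
}

/* Storing the result in an int preserves the -1 end marker. */
bool
toy_sees_end_int(int getopt_result)
{
	int c = getopt_result;
	int end_marker = -1;

	return (c == end_marker);
}
```

A loop that waits for the first variant to report the end of the options
never terminates, which is why the option character is normally kept in
an int.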

Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Chris Dunlop <chris at onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8789 

ZFS on Linux/src 8dc8bbd module/zfs dnode_sync.c

Reinstate raw receive check when truncating

This patch re-adds a check that was removed in 369aa50. The check
confirms that a raw receive is not occurring before truncating an
object's dn_maxblkid. At the time, it was believed that all cases
that would hit this code path would be handled in other places,
but that was not the case.

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Closes #8852 
Closes #8857 

ZFS on Linux/src 35050ef module/zfs zfs_znode.c

Fix integer overflow of ZTOI(zp)->i_generation

The ZFS on-disk format stores each inode's generation ID as a 64
bit number on disk and in-core. However, the Linux kernel's inode
generation is only a 32 bit number. In most places, the code handles
this correctly, but the cast is missing in zfs_rezget(). For many pools,
this isn't an issue since the generation ID is computed as the
current txg when the inode is created and many pools don't have
more than 2^32 txgs.

For the pools that have more txgs, this issue causes any inode with
a high enough generation number to report IO errors after a call to
"zfs rollback" while holding the file or directory open. This patch
simply adds the missing cast.
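
The mismatch can be sketched as a comparison between the 64-bit on-disk
generation and the 32-bit value kept in the kernel inode. The function
names are hypothetical and this is not the actual zfs_rezget() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Without the cast, a generation above 2^32 can never match. */
bool
toy_gen_matches_uncast(uint64_t disk_gen, uint32_t inode_gen)
{
	return (disk_gen == inode_gen);
}

/* With the cast, both sides are compared as 32-bit values. */
bool
toy_gen_matches_cast(uint64_t disk_gen, uint32_t inode_gen)
{
	return ((uint32_t)disk_gen == inode_gen);
}
```

For a generation of 2^32 + 7 the inode holds 7, so only the second
variant reports a match.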

Reviewed-by: Alek Pinchuk <apinchuk at datto.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Closes #8858 

ZFS on Linux/src aaf3b30 module/zfs zfs_ioctl.c, tests/zfs-tests/tests/functional/cli_root/zpool_create zpool_create_encrypted.ksh

Double-free of encryption wrapping key due to invalid pool properties

This commit fixes a double-free in zfs_ioc_pool_create() triggered by
specifying an unsupported combination of properties when creating a pool
with encryption enabled.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tom Caputi <tcaputi at datto.com>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8791 

ZFS on Linux/src a1eaf0d module/zfs vdev.c

Exclude log device ashift from normal class

When opening a log device during import its allocation bias will
not yet have been set by vdev_load().  This results in the log
device's ashift being incorrectly applied to the maximum ashift
of the vdevs in the normal class, which in turn prevents the
removal of any top-level devices due to the ashift check in the
spa_vdev_remove_top_check() function.

This issue is resolved by including vdev_islog in the check since
it will be set correctly during vdev_open().

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8735 
Delta  File
+1 -4  module/zfs/vdev.c
+1 -4  1 files

ZFS on Linux/src ad0157e lib/libzfs libzfs_dataset.c

zfs: don't pretty-print objsetid property

The objsetid property, while being stored as a number, is a dataset
identifier and should not be pretty-printed.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Chris Dunlop <chris at onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8784 

ZFS on Linux/src cd75d5f cmd/zfs zfs_main.c

zfs: missing newline character in zfs_do_channel_program() error message

This commit simply adds a missing newline ("\n") character to the error
message printed by the zfs command when the provided pool parameter
can't be found.

Reviewed-by: Chris Dunlop <chris at onthe.net.au>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8783 
Delta  File
+2 -1  cmd/zfs/zfs_main.c
+2 -1  1 files

ZFS on Linux/src a727f69 config kernel-shrink.m4

Linux 5.2 compat: Fix config/kernel-shrink.m4 test failure

"whether ->count_objects callback exists" test failed with
"error: error" message for using an incomplete function shrinker_cb().

This is caused by torvalds/linux@83da1bed86. It's configurable,
but we want to be able to compile with the default kbuild settings.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro at osnexus.com>
Closes #8776 

ZFS on Linux/src e2e7b0a tests/zfs-tests/tests/functional/reservation reservation_003_pos.ksh reservation_003_pos.sh

Rename reservation tests from *.sh to *.ksh

Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Igor Kozhukhov <igor at dilos.org>
Closes #8729 

ZFS on Linux/src e0b3689 . Makefile.am, tests/zfs-tests/include zpool_script.shlib

zfs-tests: fix warnings when packaging some .shlib files

This change prevents the following warning when packaging some zfs-tests
files:

   *** WARNING: ./usr/src/zfs-0.8.0/tests/zfs-tests/include/zpool_script.shlib
   is executable but has empty or no shebang, removing executable bit

Reviewed by: John Kennedy <john.kennedy at delphix.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Signed-off-by: loli10K <ezomori.nozomu at gmail.com>
Closes #8787