Question on ZFS

Wed Feb 14 00:21:27 UTC 2024

On Tue, Feb 13, 2024 at 05:03:36PM -0700, Bob Proulx wrote:
> Phil Marsh wrote:
> > Hi Bob, All,
> > I was wondering. Do you recommend using an SSD cache for ZFS, i.e. an L2ARC
> > cache?
>
> I was rather hoping Zak would have jumped in with a response on this
> one as I know Zak is running several large high performance arrays.
> But not having heard anything I will try to muddle through. :-)

Sorry!  I was out in the mountains when this came through and meant to
respond but then got lost in to-do items when I got back.

I typically use arrays made up of several vdevs, each vdev being a
3-disk raidz1.  Lately these disks have been Micron 7+ TB SATA SSDs.
So a typical zpool status is something like this:

  pool: poolajax2
 state: ONLINE
  scan: scrub repaired 0 in 6h3m with 0 errors on Sun Feb 11 07:03:05 2024
config:

        NAME                                                STATE     READ WRITE CKSUM
        poolajax2                                           ONLINE       0     0     0
          raidz1-0                                          ONLINE       0     0     0
            ata-Micron_5210_MTFDDAK7T6QDE_1951258907D       ONLINE       0     0     0
            ata-Micron_5210_MTFDDAK7T6QDE_19512589194       ONLINE       0     0     0
            ata-Micron_5210_MTFDDAK7T6QDE_1951258908E       ONLINE       0     0     0
          raidz1-2                                          ONLINE       0     0     0
            ata-Micron_5300_MTFDDAK7T6TDS_22153919723       ONLINE       0     0     0
            ata-Micron_5300_MTFDDAK7T6TDS_22133A91C46       ONLINE       0     0     0
            ata-Micron_5300_MTFDDAK7T6TDS_22133A91C42       ONLINE       0     0     0

These SSDs are large but not exceptionally fast.  Splitting them up
like this allows more IOPS and good overall throughput (i.e. it
scales pretty much with the number of non-redundant drives -- in this
case 4; see below).

Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ajax           126G   142  99 891682  99 484110  80   419  99 1703975  90 15249 353
Latency             69761us   12634us     154ms   28101us   59982us   22794us
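
As a rough sanity check on the "4 non-redundant drives" figure, here is
a small sketch of the arithmetic.  The variable names are mine; the
throughput numbers are the block write and block read columns from the
bonnie++ run above, and dividing evenly across drives is only an
approximation.

```shell
#!/bin/sh
# Pool layout from the zpool status: 2 raidz1 vdevs of 3 disks each.
vdevs=2
disks_per_vdev=3
parity=1            # raidz1 spends one disk's worth of space per vdev on parity

# Non-redundant (data) drives across the whole pool.
data_drives=$(( vdevs * (disks_per_vdev - parity) ))

# Sequential block figures from the benchmark, in K/sec.
block_write_kps=891682
block_read_kps=1703975

echo "data drives:     $data_drives"
echo "write per drive: $(( block_write_kps / data_drives )) K/sec"
echo "read per drive:  $(( block_read_kps / data_drives )) K/sec"
```

That works out to 4 data drives, and roughly 223,000 K/sec written and
426,000 K/sec read per drive.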

But I also use an L2ARC ("cache") on an Optane NVMe:

        logs
          nvme-INTEL_SSDPED1D280GA_PHMB7515005680CGN-part1  ONLINE       0     0     0
        cache
          nvme-INTEL_SSDPED1D280GA_PHMB7515006280CGN-part2  ONLINE       0     0     0
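
For anyone wanting to try a similar layout, log and cache devices are
attached to an existing pool with "zpool add".  A minimal sketch that
only prints the commands so they can be reviewed first; the function
name, pool name and device paths below are placeholders, not the ones
from this system.

```shell
#!/bin/sh
# Print (do not run) the commands that would attach a log device and a
# cache device to an existing pool.
# Arguments: pool name, log device, cache device -- all hypothetical here.
print_zpool_add() {
    pool=$1
    log_dev=$2
    cache_dev=$3
    echo "zpool add $pool log $log_dev"
    echo "zpool add $pool cache $cache_dev"
}
```

For example, "print_zpool_add tank /dev/disk/by-id/EXAMPLE-part1
/dev/disk/by-id/EXAMPLE-part2" prints the two commands; run them by
hand as root once they look right.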

I am not sure if the log makes a performance difference (i.e. I have
not measured it), but the cache definitely does.  If I prefetch most
of my working set when I boot, the response time is much faster
loading from NVMe than from the SSD array.  E.g. a prefetch script:

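#!/bin/sh
# Read every file under $HOME below the size limit once, to warm the cache.
cd || exit 1
# IFS= and -r keep leading spaces and backslashes in file names intact.
find . -type f -size -10048576c | while IFS= read -r a; do
    dd if="$a" of=/dev/null bs=1M  > /dev/null 2>&1
    printf "."    # progress dot; "echo -n" is not portable in /bin/sh
done
echo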
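
One way to check whether reads are really being served from the cache
device after a prefetch like the one above is to look at the l2_hits
and l2_misses counters.  This sketch assumes OpenZFS on Linux, which
exposes them in /proc/spl/kstat/zfs/arcstats; the function name is my
own, and other platforms keep these counters elsewhere.

```shell
#!/bin/sh
# Print the L2ARC hit percentage from an arcstats-style file
# (lines of "name type data").  Prints "n/a" if no L2ARC reads
# have been counted.  The default path is an assumption (Linux).
l2arc_hit_pct() {
    awk '$1 == "l2_hits"   { h = $3 }
         $1 == "l2_misses" { m = $3 }
         END {
             if (h + m > 0) printf "%.1f\n", 100 * h / (h + m)
             else print "n/a"
         }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

Calling "l2arc_hit_pct" with no argument reads the live counters;
passing a file name reads a saved copy instead.  The counters are
cumulative since boot, so comparing before and after a workload is
more telling than a single reading.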

Hope this helps
Zak

--
Zak Smith
307-543-7820 office
Please do not send private or confidential information via email.