Can't backup on two different machines when VOLUME_SIZE_LIMIT is enabled

open
israsanc
Posted 9 months ago

Can't backup on two different machines when VOLUME_SIZE_LIMIT is enabled #81

When I try to use the same timemachine volume on a second MacBook Pro, it fails and the log shows:

timemachine | fruit_tmsize_do_dirent: tmsize potential overflow: bandsize [67108864] nbands [1459]
timemachine | sys_disk_free: VFS disk_free failed. Error was : No error information

This doesn't occur for the first backup/machine, and it doesn't occur if I don't set VOLUME_SIZE_LIMIT.

mbentley
Created 9 months ago

What is the full Docker command to start the time machine? How much disk space is being used on the Docker host where your persistent data is?

I know it works with two Macs, as I back up two myself, but let's narrow down the potential issues.

israsanc
Created 9 months ago

This is my service definition in the docker-compose yaml:

  timemachine:
    container_name: timemachine
    image: mbentley/timemachine:smb-armv7l
    hostname: timemachine
    domainname: {my_domain}
    mac_address: {random_mac_address}
    networks:
      macvlan:
        ipv4_address: {local_ip}
    environment:
      - CUSTOM_SMB_CONF=false
      - CUSTOM_USER=false
      - DEBUG_LEVEL=1
      - MIMIC_MODEL=TimeCapsule8,119
      - EXTERNAL_CONF=
      - HIDE_SHARES=no
      - TM_USERNAME=timemachine
      - TM_GROUPNAME=timemachine
      - TM_UID=1000
      - TM_GID=1000
      - PASSWORD={my_password}
      - SET_PERMISSIONS=false
      - SHARE_NAME=TimeMachine
      - SMB_INHERIT_PERMISSIONS=no
      - SMB_NFS_ACES=yes
      - SMB_METADATA=stream
      - SMB_PORT=445
      - SMB_VFS_OBJECTS=acl_xattr fruit streams_xattr
      - VOLUME_SIZE_LIMIT=1 T
      - WORKGROUP=WORKGROUP
    volumes:
      - ./timemachine-opt-timemachine:/opt/timemachine
      - ./timemachine-var-lib-samba:/var/lib/samba
      - ./timemachine-var-cache-samba:/var/cache/samba
      - ./timemachine-run-samba:/run/samba
    ports:
      - 137:137/udp
      - 138:138/udp
      - 139:139
      - 445:445
    restart: unless-stopped

I'm using the macvlan driver to avoid conflicts with avahi, and my filesystem is btrfs. My current backup uses only 92G according to du.

mbentley
Created 9 months ago

I think what you're hitting is related to what is being seen, or at least what was attempted to be worked around, here: https://gitlab.com/artmg/samba/-/commit/b1714dbf74035550ff30494858e3d879c8d46003

Taking a look at the comment in the diff:

	/*
	 * Arithmetic on 32-bit systems may cause overflow, depending on
	 * size_t precision. First we check its unlikely, then we
	 * force the precision into target off_t, then we check that
	 * the total did not overflow either.
	 */

The total (bandsize * nbands, using the values from your log) would be 97911832576 bytes, which converted to GiB (which is what it is measured against, not GB) is 91.1875 GiB. That matches what you're seeing on disk via du. I am not much of a programmer and I don't have experience in C, so I am not exactly sure what it is doing, but it seems to be failing at https://gitlab.com/samba-team/samba/-/blob/b0ba7cd4f96a6ea227943cb05ef51a463e292b2d/source3/modules/vfs_fruit.c#L4995-4999
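As a quick check of that arithmetic, here is a minimal Python sketch. It assumes the total is simply bandsize multiplied by nbands, using the two values from the log line; the variable names are only illustrative.

```python
# Values taken from the log line in this thread.
bandsize = 67108864   # bytes per band (64 MiB)
nbands = 1459         # number of bands reported

total_bytes = bandsize * nbands
total_gib = total_bytes / 2**30   # GiB uses powers of 1024
total_gb = total_bytes / 10**9    # GB uses powers of 1000

print(total_bytes)          # 97911832576
print(total_gib)            # 91.1875
print(round(total_gb, 2))   # 97.91
```

The GiB figure is the one that lines up with the roughly 92G reported by du.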

Based on the output you provided: bandsize [67108864] nbands [1459]

And then looking at the if statement's math: bandsize > SIZE_MAX/nbands

The actual math (I believe) should be:

67108864 > 1099511627776 / 1459
67108864 > 753606324

That should evaluate to false, so it should never enter that branch and output the message you're seeing unless it really was overflowing as warned.
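One thing worth checking, offered as an assumption rather than a confirmed diagnosis: in C, SIZE_MAX is the largest value a size_t can hold, not the configured volume limit, so on a 32-bit build such as armv7l it would normally be 4294967295. The Python sketch below evaluates the same comparison with both a 32-bit and a 64-bit SIZE_MAX. The function name guard_trips is hypothetical, and integer division stands in for the C expression.

```python
SIZE_MAX_32 = 2**32 - 1   # assumed size_t maximum on a 32-bit build (e.g. armv7l)
SIZE_MAX_64 = 2**64 - 1   # assumed size_t maximum on a 64-bit build

def guard_trips(bandsize: int, nbands: int, size_max: int) -> bool:
    """Mirror the comparison bandsize > SIZE_MAX / nbands (nbands must be nonzero)."""
    return bandsize > size_max // nbands

bandsize, nbands = 67108864, 1459   # values from the log line in this thread

print(SIZE_MAX_32 // nbands)                        # 2943774
print(guard_trips(bandsize, nbands, SIZE_MAX_32))   # True
print(guard_trips(bandsize, nbands, SIZE_MAX_64))   # False
```

If that assumption holds, the comparison would be true on a 32-bit build whenever bandsize * nbands exceeds roughly 4 GiB, whatever limit is configured.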

That seems odd to me. Could you get the contents of the smb.conf that is generated inside your container? For example, mine:

# docker exec -it timemachine cat /etc/samba/smb.conf
[global]
   access based share enum = no
   hide unreadable = no
   inherit permissions = no
   load printers = no
   log file = /var/log/samba/log.%m
   logging = file
   max log size = 1000
   security = user
   server min protocol = SMB2
   server role = standalone server
   smb ports = 445
   workgroup = WORKGROUP
   vfs objects = acl_xattr fruit streams_xattr
   fruit:aapl = yes
   fruit:nfs_aces = yes
   fruit:model = TimeCapsule8,119
   fruit:metadata = stream
   fruit:veto_appledouble = no
   fruit:posix_rename = yes
   fruit:wipe_intentionally_left_blank_rfork = yes
   fruit:delete_empty_adfiles = yes

[TimeMachine]
   path = /opt/timemachine
   inherit permissions = no
   read only = no
   valid users = timemachine
   vfs objects = acl_xattr fruit streams_xattr
   fruit:time machine = yes
   fruit:time machine max size = 2 T

I want to make sure that it is setting fruit:time machine max size as expected.

israsanc
Created 9 months ago

Thank you for your help. Using du without the human-readable switch says 95520632.

It seems you've found a good clue to follow. I'll investigate this myself as well.

My current smb.conf:

[global]
   access based share enum = no
   hide unreadable = no
   inherit permissions = no
   load printers = no
   log file = /var/log/samba/log.%m
   logging = file
   max log size = 1000
   security = user
   server min protocol = SMB2
   server role = standalone server
   smb ports = 445
   workgroup = WORKGROUP
   vfs objects = acl_xattr fruit streams_xattr
   fruit:aapl = yes
   fruit:nfs_aces = yes
   fruit:model = TimeCapsule8,119
   fruit:metadata = stream
   fruit:veto_appledouble = no
   fruit:posix_rename = yes
   fruit:wipe_intentionally_left_blank_rfork = yes
   fruit:delete_empty_adfiles = yes

[TimeMachine]
   path = /opt/timemachine
   inherit permissions = no
   read only = no
   valid users = timemachine
   vfs objects = acl_xattr fruit streams_xattr
   fruit:time machine = yes
   fruit:time machine max size = 1 T

mbentley
Created 9 months ago

Hmm yeah, it seems to be setting it correctly. I recall some strange compose behaviors with values that include spaces, but at first glance I see nothing that could be impacted here. I almost never use compose because of how often I find myself fighting syntax issues instead of the actual problem I am solving, so my memory there is a bit fuzzy.

xrvo
Created 9 months ago

I have the same issue: I get the "tmsize potential overflow" error in the logs.

I'm using the armv7l docker image with VOLUME_SIZE_LIMIT = 500G

I also independently traced the problem down to the same place in the Samba repository that @mbentley pointed out. Samba had a fix applied on Mar. 3, 2020, and the change is apparently in the installed version, since the diff shows the error message string changing from tmsize overflow to tmsize potential overflow. The issue persists, however.

mbentley
Created 9 months ago

From what I can tell from the code in this commit, it would exit the function at the return false; so it never reaches the modified line tm_size = (off_t)bandsize * (off_t)nbands;. I am not sure if that is the intent. The changed message makes it sound like it should report a potential overflow and then perhaps do some further check, but I might just be misunderstanding: the original implementation mentions that it can't check for multiplication overflow by performing the multiplication. I don't know enough about what exactly it is doing, and why, to bring it up with someone who does.
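To make that control flow concrete, here is a rough Python model of it, not the Samba source: the guard is evaluated first, and the widened multiplication only runs if the guard passes. The function names, the use of None for the early return false, and the alternative guard against a 64-bit off_t maximum are all assumptions for illustration.

```python
from typing import Optional

SIZE_MAX_32 = 2**32 - 1    # assumed size_t maximum on a 32-bit build
OFF_T_MAX_64 = 2**63 - 1   # assumed maximum of a 64-bit signed off_t

def tm_size_as_described(bandsize: int, nbands: int, size_max: int) -> Optional[int]:
    """Model of the flow described above: guard first, multiply second.

    nbands is assumed to be nonzero.
    """
    if bandsize > size_max // nbands:
        # Stands in for logging "tmsize potential overflow" and `return false;`.
        return None
    # Stands in for tm_size = (off_t)bandsize * (off_t)nbands;
    return bandsize * nbands

def tm_size_off_t_guard(bandsize: int, nbands: int) -> Optional[int]:
    """Hypothetical variant that checks against the off_t range instead."""
    if bandsize > OFF_T_MAX_64 // nbands:
        return None
    return bandsize * nbands

print(tm_size_as_described(67108864, 1459, SIZE_MAX_32))   # None
print(tm_size_off_t_guard(67108864, 1459))                 # 97911832576
```

Under these assumptions the first function bails out even though the product fits comfortably in a 64-bit off_t, which would be consistent with the message still appearing after the fix.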

hollie
Created 8 months ago

I can confirm I am hitting the same issue running the armv7l docker image with VOLUME_SIZE_LIMIT set to 1 T.

The error in the log is:

fruit_tmsize_do_dirent: tmsize potential overflow: bandsize [67108864] nbands [6372]
sys_disk_free: VFS disk_free failed. Error was : No error information

This error also prevents other clients from connecting via Samba: you can mount the share, but when you start browsing it in Finder it results in a 'network share is temporarily unavailable' error. (That might not be the exact error in English; it is translated from my local language.)

My current workaround is to remove the VOLUME_SIZE_LIMIT parameter from the configuration when starting the docker container. Then all is working as expected.

mbentley
Created 7 months ago

Looking at another image available, there might be another way to apply a limit: https://github.com/awlx/samba-timemachine/blob/main/entrypoint#L37

I'll have to look into the use of a .com.apple.TimeMachine.quota.plist file as an alternative.
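For reference, a minimal Python sketch of what generating such a file might look like. It assumes the plist holds a single GlobalQuota key with the limit in bytes; the key name, the helper name build_quota_plist, and the 1 TiB value are assumptions that should be verified against the linked entrypoint before relying on them.

```python
import plistlib

def build_quota_plist(limit_bytes: int) -> bytes:
    """Return XML plist bytes with one assumed key, GlobalQuota, in bytes."""
    return plistlib.dumps({"GlobalQuota": limit_bytes}, fmt=plistlib.FMT_XML)

quota_xml = build_quota_plist(1024**4)   # 1 TiB, as an example limit
print(quota_xml.decode("utf-8"))

# The bytes would then be written to .com.apple.TimeMachine.quota.plist
# at the root of the share (the exact path depends on your volume layout).
```

Whether macOS honors this file on an SMB share, and whether it avoids the overflow check discussed above, is untested here.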

bugsyb
Created 1 week ago

Hi @mbentley ,

It seems I'm running into the very same issue:

fruit_tmsize_do_dirent: tmsize potential overflow: bandsize [8388608] nbands [2805]
sys_disk_free: VFS disk_free failed. Error was : Argument list too long

The limit is set to 1T too, and it happens during the initial copy (migration) of an existing Time Machine disk. I initialized the sparse image by adding a new disk; once the backup started, I cancelled it, mounted the sparse "disk image" and started copying over the source from the HDD (Time Machine).

Plenty of these messages pop up continuously.

Environment - it's aarch64 with alpine:latest as of today (PRETTY_NAME="Alpine Linux v3.14"). HW side is:

model name	: Amlogic S922X rev a
Hardware	: Hardkernel ODROID-N2
Revision	: 0400

Most probably it pulled armv7 and not armv8, since for other images it was using armv7 unless I forced arm64 with arm64v8/alpine:latest. I'm no longer sure how to check this on an existing container.

Would you be able to help with how to overcome the problem, or explain what the consequences of leaving it like this might be? I wouldn't like to play with backups if something is odd on the underlying fs.

Removing the quota doesn't seem to be a good idea here, as the same FS (ext4) is used for other services, so the quota needs to be enforced at the software level. If it isn't limited, Time Machine will happily eat all the space, won't it?
