Geo-replication shows faulty due to "stale file handle" in gsyncd.py
Issue
Volume geo-replication status is intermittently switching between faulty and active.
In the log file for the geo-replication session, Python stack traces of the following form appear:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1177, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 477, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1077, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 835, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 768, in process_change
    st = lstat(go)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 432, in lstat
    return os.lstat(e)
OSError: [Errno 116] Stale file handle: '.gfid/2035abcc-57f6-4e63-bda0-c0e203360412'
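The unhandled exception is an ESTALE (errno 116) raised by os.lstat() on a GFID-based path, which kills the worker and flips the session to faulty until the monitor restarts it. As a rough illustration of the failure mode (not the actual Red Hat fix), a wrapper around os.lstat() can treat ESTALE like ENOENT and return the errno instead of raising; the helper name lstat_or_errno is hypothetical and only sketches how syncdutils.lstat could be hardened:

```python
import errno
import os

def lstat_or_errno(path):
    """lstat() that returns the errno value instead of raising
    for expected transient conditions (ESTALE, ENOENT), so a
    crawl loop can skip the entry rather than crash.
    Hypothetical helper for illustration only."""
    try:
        return os.lstat(path)
    except OSError as e:
        if e.errno in (errno.ESTALE, errno.ENOENT):
            return e.errno
        raise
```

A caller would then check whether the result is a stat object or an integer errno (for example, isinstance(st, int)) and skip or retry the entry instead of letting the worker die.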
Environment
- Red Hat Storage Server 2.1