
How to repair bad sectors in OpenBSD (Part II)

fixing bad sectors on /var filesystem

 

In Part I of this write-up we were able to overwrite the corrupted data blocks using the dd command and mark the partition clean by running fsck on it.
Now we will be moving a whole partition to a new place on the disk.

 

issue clarification:

 

OK, this is a very common situation for a physical host running a non ‘self-healing’ filesystem (like UFS): after a recent power outage, /var/log/messages is flooded with the following errors:

Apr 11 23:20:54 odin last message repeated 3 times
Apr 11 23:20:54 odin /bsd: wd0e: uncorrectable data error reading fsbn 7424 of 7420-7451 (wd0 bn 15209824; cn 946 tn 195 sn 49), retrying
Apr 11 23:20:54 odin /bsd: wd0e: uncorrectable data error reading fsbn 7424 of 7420-7451 (wd0 bn 15209824; cn 946 tn 195 sn 49)
Apr 12 00:10:23 odin /bsd: wd0e: uncorrectable data error reading fsbn 7420 of 7420-7451 (wd0 bn 15209820; cn 946 tn 195 sn 45), retrying
Apr 12 00:10:53 odin last message repeated 3 times
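
To get a quick overview of how many distinct blocks are affected, the log can be reduced to the absolute block numbers. This is a minimal sketch, not part of the original procedure: the function name bad_blocks is made up for illustration, and the pattern assumes the message format shown above with the disk name wd0 hard-coded.

```shell
# bad_blocks: read syslog lines (files given as arguments, or stdin) and print
# each distinct absolute block number ("wd0 bn N") reported as unreadable.
# Assumption: messages look exactly like the excerpt above; disk name is wd0.
bad_blocks() {
    sed -n 's/.*uncorrectable data error.*(wd0 bn \([0-9][0-9]*\);.*/\1/p' "$@" |
        sort -n -u
}

# Usage (log path as in the post):
#   bad_blocks /var/log/messages
```

Retries and "last message repeated" lines collapse into one entry per block, which makes it easier to see whether the damage is confined to a small area.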

Let’s see, who is lucky enough to be /dev/wd0e?

mount |grep wd0e 
/dev/wd0e on /var type ffs (local, nodev, nosuid)
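
The kernel message carries two block numbers, and they can be cross-checked against the partition table. The sketch below assumes that "fsbn" is counted in sectors from the start of the partition and "wd0 bn" from the start of the disk; the numbers in the log agree with that reading. The offset (15202400) and size (4204120) of wd0e come from the disklabel wd0 output further down in this post; the function names are illustrative only.

```shell
# abs_block FSBN OFFSET: absolute disk block for a partition-relative block
abs_block() {
    echo $(( $1 + $2 ))
}

# in_partition BLOCK OFFSET SIZE: succeeds if BLOCK lies inside the partition
in_partition() {
    [ "$1" -ge "$2" ] && [ "$1" -lt $(( $2 + $3 )) ]
}

# fsbn 7424 on wd0e, which starts at sector 15202400
abs_block 7424 15202400

# is the reported "wd0 bn 15209824" inside wd0e (offset 15202400, size 4204120)?
in_partition 15209824 15202400 4204120 && echo "block is inside wd0e"
```

The first call prints 15209824, the same number the kernel reports as "wd0 bn", so the bad sectors sit about 7400 sectors into the /var partition.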

OK, the issue is rather critical: most services use the /var mount point for I/O, and most of the system databases, caches and log files live there too.


 

solution :

 

try to run fsck first:

  1. stop/kill services which survived,
  2. reboot into single user mode and run fsck -y on wd0e
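
The fsck step could be sketched as follows. This is an assumption-laden illustration rather than the exact commands used here: the run wrapper and check_partition are hypothetical helpers, and DRY_RUN defaults to 1 so that the commands are only printed. A read-only pass with fsck -n comes first, so the damage can be reviewed before fsck -y is allowed to change anything.

```shell
# run: print the command when DRY_RUN=1 (the default here), execute it otherwise
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# check_partition PART: report-only pass first, then let fsck repair
check_partition() {
    run fsck -n "/dev/r$1"   # answer "no" to every question, change nothing
    run fsck -y "/dev/r$1"   # answer "yes" to every question
}

check_partition wd0e
```

Run it with DRY_RUN=0 from single user mode to execute the commands for real.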

Then, if that did not help and the issue persists ...

 

Plan B:

 

Assuming we have some space on the HDD that is not used by any partition, the high-level action plan is:

  • we will add a new partition,
  • create a new file system on it
  • copy over data from corrupted /var,
  • and remount /var
  • retire affected partition

reference: OpenBSD FAQ

as root

disklabel wd0 
# /dev/rwd0c:
type: ESDI
disk: ESDI/IDE disk
label: IC35L090AVV207-0
duid: 327813c700099302
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 9726
total sectors: 156250000
boundstart: 64
boundend: 156248190
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:          2097152               64  4.2BSD   2048 16384    1 # /
  b:          4716576          2097216    swap                   # none
  c:        156250000                0  unused                   
  d:          4206798          6813792  4.2BSD   2048 16384    1 # /tmp
  e:          4204120         15202400  4.2BSD   2048 16384    1 # /var
  f:         20974656         31975584  4.2BSD   2048 16384    1 # /usr
  g:         82847168         73400992  4.2BSD   2048 16384    1 # /d01
  k:          8394233         65006752  4.2BSD   2048 16384    1 # /home
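
Plan B depends on unallocated space, and the gaps can be worked out from the size and offset columns above. The following is a rough sketch (free_gaps is a made-up helper, not a disklabel feature). It assumes the output format shown above and ignores partition c, which always covers the whole disk.

```shell
# free_gaps: read `disklabel wd0` output on stdin and print "start size"
# (both in sectors) for every unallocated gap between boundstart and boundend.
free_gaps() {
    awk '
        /^boundstart:/ || /^boundend:/ { print $2, 0 }   # zero-length sentinels
        $1 ~ /^[a-p]:$/ && $1 != "c:"  { print $3, $2 }  # offset, size
    ' | sort -n | awk '
        NR > 1 && $1 > end { print end, $1 - end }       # gap start, gap size
        $1 + $2 > end      { end = $1 + $2 }             # furthest sector used so far
    '
}

# Usage:
#   disklabel wd0 | free_gaps
```

A 3G partition needs 6291456 sectors of 512 bytes. For the label above, the gap right before wd0e (4181810 sectors starting at 11020590) is too small, while the gaps starting at 19406520 (12569064 sectors) and 52950240 (12056512 sectors) are both large enough.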
 
use disklabel to create a spare partition:
disklabel -E wd0
a l ## add a new partition with the letter "l" (press Enter to accept the suggested offset)
3G  ## size: 3GB should be enough 
q   ## quit; confirm when asked whether to write the new label
 
create a new filesystem using the newfs command
newfs wd0l
cp -p /etc/fstab /etc/fstab.bad
mkdir -p /mnt/newvar
echo "327813c700099302.l /mnt/newvar ffs rw,nodev,nosuid 1 2" >>/etc/fstab 
mount /mnt/newvar
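
A DUID is always 16 hex digits, so it is easy to lose one when copying it by hand. As a hedged alternative, the fstab line can be built straight from the disklabel output. fstab_entry is a hypothetical helper; the mount options are simply the ones used in this post.

```shell
# fstab_entry LETTER MOUNTPOINT: read `disklabel` output on stdin and print an
# fstab line that refers to the partition by DUID.
fstab_entry() {
    # accept only a full 16-digit hex DUID
    duid=$(sed -n 's/^duid: *\([0-9a-f]\{16\}\)$/\1/p')
    if [ -z "$duid" ]; then
        echo "fstab_entry: no 16-digit duid found" >&2
        return 1
    fi
    echo "$duid.$1 $2 ffs rw,nodev,nosuid 1 2"
}

# Usage:
#   disklabel wd0 | fstab_entry l /mnt/newvar >>/etc/fstab
```

If the DUID line is missing or too short, nothing is printed and the function returns non-zero, so a broken entry never reaches /etc/fstab.
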
  • stop all services on the machine
  • copy everything from /var/ to /mnt/newvar/ at the file level
  • record all the error messages from the copy process and try to resolve them manually

for example:

cp: /var/www/opt/otrs/scripts/test/sample/ImportExport/ImportExportFormatCSV002-MSExcel-Semicolon.csv: Input/output error
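
One possible way to do the copy and keep track of the failures is sketched below. The helper names and the log path are hypothetical, and the sed pattern assumes cp reports unreadable files in the format shown in the example above.

```shell
# copy_tree SRC DST LOG: copy the contents of SRC into DST, preserving modes,
# ownership and timestamps; everything cp writes to stderr is saved in LOG.
copy_tree() {
    cp -Rp "$1/." "$2/" 2>"$3"
}

# failed_files LOG: print the paths cp could not read because of I/O errors
failed_files() {
    sed -n 's/^cp: \(.*\): Input\/output error$/\1/p' "$1"
}

# Usage (log path is only an example):
#   copy_tree /var /mnt/newvar /root/var-copy.log
#   failed_files /root/var-copy.log
```

The resulting list contains exactly the files that sit on bad sectors. These are the ones to restore from a backup or reinstall from packages.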
 
remount /var from the new filesystem
umount -f /dev/wd0l
umount -f /dev/wd0e 
## drop the old wd0e entry and point the new partition at /var
grep -v wd0e /etc/fstab | sed -e 's:/mnt/newvar:/var:' > /etc/fstab.new
mv /etc/fstab.new /etc/fstab
mount /var && mount |grep var 
/dev/wd0l on /var type ffs (local, nodev, nosuid)
  • restart services
  • reboot
  • check the logs
  • (optional) delete the old /var partition (wd0e) using disklabel
disklabel -E wd0
>d e 
>q
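
For the "check the logs" step above, a small counter per partition can help. This is only a sketch: errors_for is a made-up name, and the pattern assumes the kernel message format quoted at the top of this post. Old wd0e entries stay in the log, so the point is that the wd0e count stops growing and the wd0l count stays at 0.

```shell
# errors_for PART [FILE...]: count "uncorrectable data error" lines for PART
# in the given files (or on stdin when no file is given)
errors_for() {
    part=$1
    shift
    grep -c "/bsd: $part: uncorrectable data error" "$@"
}

# Usage:
#   errors_for wd0l /var/log/messages
#   errors_for wd0e /var/log/messages
```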

All set!

