Unofficial empeg BBS


#319820 - 27/02/2009 01:26 11-hour long fsck and still running...
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049
I'm 11.5 hours into a manual fsck on a 60GB drive. Quite a few duplicate/bad blocks so far. Dead or dying drive, right?

I'm doing this because a sync was interrupted when I tripped over the power supply and unplugged the empeg power. I had been having intermittent no hard drive detected errors on startup, but I had been attributing those to an IDE header that needs soldering.

Is there any situation where a properly functioning 60GB disk would take this long to complete fsck?

Thanks again, guys.

Jim

#319829 - 27/02/2009 08:04 Re: 11-hour long fsck and still running... [Re: TigerJimmy]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31596
Loc: Seattle, WA
Good question. Mark, would cable or header trouble also cause this? My guess is yes it might.

Did you look at the cable and header before starting the FSCK?
_________________________
Tony Fabris

#319831 - 27/02/2009 09:04 Re: 11-hour long fsck and still running... [Re: tfabris]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
Originally Posted By: tfabris
My guess is yes it might.

Mine too...
Quote:
Did you look at the cable and header before starting the FSCK?

A very good question; an fsck that intermittently finds that it can't write to the drive is likely to do more harm than good.

Peter

#319834 - 27/02/2009 12:10 Re: 11-hour long fsck and still running... [Re: TigerJimmy]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14491
Loc: Canada
Originally Posted By: TigerJimmy
I'm 11.5 hours into a manual fsck on a 60GB drive. Quite a few duplicate/bad blocks so far. Dead or dying drive, right?

Dunno. But as usual, a serial port log will tell the true story. No point in even speculating without first looking there.

11.5 hours is too long. Something else is wrong. If you are doing this from (J)emplode, then the app may simply have gotten confused.

Use the serial port and Control-C, and kick off the fsck by hand from the command line.

-ml

#319897 - 02/03/2009 15:42 Re: 11-hour long fsck and still running... [Re: mlord]
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049
This was a command line fsck. It finally finished after about 23 hours. The second drive completed in under 3 minutes. That rules out the interface, doesn't it? I'll post a serial port log shortly.

Thanks!

#319902 - 02/03/2009 16:31 Re: 11-hour long fsck and still running... [Re: TigerJimmy]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31596
Loc: Seattle, WA
Quote:
That rules out the interface, doesn't it?


Hm, Mark would be able to answer this definitively, but I don't think so. I think that if some of the connections on the header or cable were bad, they could cause errors on one drive but not on the other, especially if the connection problems were on the cable connector that attaches to the drive (like the one pictured in the FAQ).
_________________________
Tony Fabris

#320164 - 09/03/2009 14:09 Re: 11-hour long fsck and still running... [Re: mlord]
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049

Those are disk errors, right???

Thanks in advance,

Jim

empeg-car bootstrap v1.02 20001106 (hugo@empeg.com)
If there is anyone present who wants to upgrade the flash, let them speak now,
or forever hold their peace...it seems not. Let fly the Penguins of Linux!

e000 v1.04
Copying kernel...
Calling linux kernel...
Uncompressing Linux..................................... done, booting the kernel.

Linux version 2.2.17-rmk5-np17-empeg55-hijack-v508 (hijack@rtr.ca) (gcc version 2.95.3 20010315 (release)) #2 Fri Jan 9 16:06:35 EST 2009
Processor: Intel StrongARM-1100 revision 11
Checking for extra DRAM:
c1000000: wrote ffffffff, read e28cc001
NetWinder Floating Point Emulator V0.94.1 (c) 1998 Corel Computer Corp.
empeg-car player (hardware revision 9, serial number 40103176) 16MB DRAM
Command line: mem=16m
Calibrating delay loop... 207.67 BogoMIPS
Memory: 15000k/16M available (996k code, 20k reserved, 364k data, 4k init)
Dentry hash table entries: 2048 (order 2, 16k)
Buffer cache hash table entries: 16384 (order 4, 64k)
Page cache hash table entries: 4096 (order 2, 16k)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.2
Based upon Swansea University Computer Society NET3.039
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
TCP: Hash tables configured (ehash 16384 bhash 16384)
IrDA (tm) Protocols for Linux-2.2 (Dag Brattli)
Starting kswapd v 1.5
SA1100 serial driver version 4.27 with no serial options enabled
ttyS00 at 0xf8010000 (irq = 15) is a SA1100 UART
ttyS01 at 0xf8050000 (irq = 17) is a SA1100 UART
ttyS02 at 0xf8030000 (irq = 16) is a SA1100 UART
Signature is 206f6972 'rio '
Tuner: loopback=0, ID=-1
show_message("Hijack v508 by Mark Lord")
empeg display initialised.
empeg dsp audio initialised
empeg dsp mixer initialised
empeg dsp initialised
empeg audio-in initialised, CS4231A revision a0
empeg remote control/panel button initialised.
empeg usb initialised, PDIUSBD12 id 1012
empeg state support initialised 0089/88c1 (save to d0004500).
empeg RDS driver initialised
empeg power-pic driver initialised (first boot)
RAM disk driver initialized: 16 RAM disks of 4096K size
empeg single channel IDE
Probing primary interface...
ide_data_test: wrote 0x0000 read 0xff80
ide_data_test: wrote 0xffff read 0xff80
ide_data_test: wrote 0xaaaa read 0xaa80
ide_data_test: wrote 0x5555 read 0x5580
ide_data_test: wrote 0x0000 read 0xff20
ide_data_test: wrote 0xffff read 0xff20
ide_data_test: wrote 0xaaaa read 0xaa20
ide_data_test: wrote 0x5555 read 0x5520
ide_data_test: wrote 0x0000 read 0xff20
ide_data_test: wrote 0xffff read 0xff20
ide_data_test: wrote 0xaaaa read 0xaa20
ide_data_test: wrote 0x5555 read 0xd720
hda: TOSHIBA MK8025GAS, ATA DISK drive
ide_data_test: wrote 0x0000 read 0xff00
ide_data_test: wrote 0xffff read 0xff00
ide_data_test: wrote 0xaaaa read 0xaa00
ide_data_test: wrote 0x5555 read 0x5500
hda: TOSHIBA MK8025GAS, ATA DISK drive
ide_data_test: wrote 0x0000 read 0xff00
ide_data_test: wrote 0xffff read 0xff00
ide_data_test: wrote 0xaaaa read 0xaa00
ide_data_test: wrote 0x5555 read 0x5500
hda: TOSHIBA MK8025GAS, ATA DISK drive
ide_data_test: wrote 0x0000 read 0xff00
ide_data_test: wrote 0xffff read 0xff00
ide_data_test: wrote 0xaaaa read 0xaa00
ide_data_test: wrote 0x5555 read 0xd500
hda: TOSHIBA MK8025GAS, ATA DISK drive
ide_data_test: wrote 0x0000 read 0xff00
ide_data_test: wrote 0xffff read 0xff00
ide_data_test: wrote 0xaaaa read 0xaa00
ide_data_test: wrote 0x5555 read 0x5500
hda: TOSHIBA MK8025GAS, ATA DISK drive
ide_data_test: wrote 0x0000 read 0xff00
ide_data_test: wrote 0xffff read 0xff00
ide_data_test: wrote 0xaaaa read 0xaa00
ide_data_test: wrote 0x5555 read 0x5500
hda: TOSHIBA MK8025GAS, ATA DISK drive
ide0 at 0x000-0x007,0x038 on irq 6
hda: TOSHIBA MK8025GAS, 76319MB w/0kB Cache, CHS=9729/255/63
empeg-flash driver initialized
smc chip id/revision 0x3349
smc9194.c:v0.12 03/06/96 by Erik Stahlman (erik@vt.edu)
SMC9194: SMC91C94(r:9) at 0x4008000 IRQ:7 INTF:TP MEM:6144b MAC 00:02:d7:28:0c:68
Partition check:
hda: unknown partition table
RAMDISK: ext2 filesystem found at block 0
RAMDISK: Loading 320 blocks [1 disk] into ram disk... |/-\|/-\|/-\|/-\|/-\done.
EXT2-fs warning: checktime reached, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
empeg-pump v0.03 (19980601)
Press Ctrl-A to enter pump...attempt to access beyond end of device
03:05: rw=0, want=2, limit=0
dev 03:05 blksize=1024 blocknr=1 sector=2 size=1024 count=1
EXT2-fs: unable to read superblock
Kernel panic: VFS: Unable to mount root fs on 03:05


#320169 - 09/03/2009 14:35 Re: 11-hour long fsck and still running... [Re: TigerJimmy]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14491
Loc: Canada
Nope.

But it looks like this is a new disk?
One that has never had empeg stuff on it before?

-ml

#320185 - 09/03/2009 18:23 Re: 11-hour long fsck and still running... [Re: mlord]
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049
Ah, no. It's an old disk that the builder image wouldn't (re)build. So I installed the developer image and zeroed out the partition table (the first part of the manual build process) and then tried the builder image again (because I was lazy and doing other stuff). Now I just get Hard Disk Not Found...


Edited by TigerJimmy (09/03/2009 18:25)

#320186 - 09/03/2009 18:26 Re: 11-hour long fsck and still running... [Re: TigerJimmy]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14491
Loc: Canada
Well, the drive is fine. So zero the partition table (again), and then grab builder_bigdisk_v4.upgrade and zap it with that.

-ml

#320187 - 09/03/2009 18:46 Re: 11-hour long fsck and still running... [Re: mlord]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31596
Loc: Seattle, WA
Quote:
Well, the drive is fine. So zero the partition table (again), and then grab builder_bigdisk_v4.upgrade and zap it with that.


Which is located at http://rtr.ca/bigdisk/ by the way.

This will take care of the drive, but he still needs to solve this problem he mentioned in his original post:

Quote:
I had been having intermittent no hard drive detected errors on startup
_________________________
Tony Fabris

#320189 - 09/03/2009 19:56 Re: 11-hour long fsck and still running... [Re: mlord]
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049
That's the builder that didn't do it the first time...

OK, I'm done for the day, so I'll tear into it and see if I can't make it work.

Thanks!

#320190 - 09/03/2009 20:20 Re: 11-hour long fsck and still running... [Re: TigerJimmy]
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049
The builder seems to have worked now. I think the issue may have been that the only drive attached was set up as the slave. Could that be a problem?

#320191 - 09/03/2009 20:49 Re: 11-hour long fsck and still running... [Re: TigerJimmy]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31596
Loc: Seattle, WA
Originally Posted By: TigerJimmy
the issue may have been that the only drive attached was the slave? Could that be a problem?


If you neglected to remove the slave jumper before attempting to run the builder on it, yeah, that would do it. There has to be a master there before a slave will work, so if you've got only one drive in there, it's gotta be jumperless (i.e., a master).
_________________________
Tony Fabris

#320203 - 10/03/2009 14:03 Re: 11-hour long fsck and still running... [Re: tfabris]
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049
Yes, that was the problem. I had one drive only, plugged in to the slave position on the cable, and jumpered as a slave, but with no master drive installed. Changed all that around and ran the builder with the disk on the master and it built and worked. In retrospect, probably a stupid mistake.

#320204 - 10/03/2009 14:04 Re: 11-hour long fsck and still running... [Re: tfabris]
TigerJimmy
old hand

Registered: 15/02/2002
Posts: 1049
Originally Posted By: tfabris
Quote:
Well, the drive is fine. So zero the partition table (again), and then grab builder_bigdisk_v4.upgrade and zap it with that.


Which is located at http://rtr.ca/bigdisk/ by the way.

This will take care of the drive, but he still needs to solve this problem he mentioned in his original post:

Quote:
I had been having intermittent no hard drive detected errors on startup


Yeah, I'm pretty sure that Stu has this fixed for me. He's resoldered the IDE header and recrimped the cable. I haven't had the detection problems while running on my spare (except for when the disk wasn't building because it was a slave-only configuration).

#320208 - 10/03/2009 15:40 Re: 11-hour long fsck and still running... [Re: mlord]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
Originally Posted By: TigerJimmy
Those are disk errors, right???
[...]
ide_data_test: wrote 0x0000 read 0xff00
ide_data_test: wrote 0xffff read 0xff00
ide_data_test: wrote 0xaaaa read 0xaa00
ide_data_test: wrote 0x5555 read 0x5500
Originally Posted By: mlord
Nope.

I think these messages have worried a lot of people, over the years... if these differences between "wrote" and "read" are perfectly normal, is there perhaps some way the message could be reworded to sound less error-like?

Peter

#320210 - 10/03/2009 15:46 Re: 11-hour long fsck and still running... [Re: peter]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31596
Loc: Seattle, WA
Yup, as Peter is saying, the messages tend to look a bit like errors even when they aren't. Don't know how Mark could make them look less like errors... because until the drive is fully spun up and functioning, they really are errors (if I'm understanding the way they work correctly).

(For completeness' sake, here is the description of how the IDE data test messages are used.)
_________________________
Tony Fabris

#320240 - 11/03/2009 12:22 Re: 11-hour long fsck and still running... [Re: peter]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14491
Loc: Canada
Mmm.. I wonder if perhaps this:

ide_probe: wrote 0xffff read 0xff00

??

#320246 - 11/03/2009 12:49 Re: 11-hour long fsck and still running... [Re: mlord]
Roger
carpal tunnel

Registered: 18/01/2000
Posts: 5683
Loc: London, UK
Originally Posted By: mlord
Mmm.. I wonder if perhaps this:

ide_probe: wrote 0xffff read 0xff00

??


Maybe highlight the ones that match:

ide_probe: wrote 0xffff read 0xff00
ide_probe: wrote 0xffff read 0xffff - OK

That way, normal disks go from being not OK (without shouting) to OK (with shouting); broken disks never state OK. Bit less scary?
_________________________
-- roger
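Roger's proposal boils down to appending a marker only when the value read back equals the pattern written. A minimal Python sketch of that formatting rule follows; `format_probe_line` is a hypothetical name, and the real messages are printed by Hijack's kernel IDE probe code, not by anything like this.

```python
def format_probe_line(wrote: int, read: int) -> str:
    """Format one probe result, adding " - OK" only when the read-back matches (hypothetical helper)."""
    line = f"ide_probe: wrote 0x{wrote:04x} read 0x{read:04x}"
    if read == wrote:
        line += " - OK"
    return line

print(format_probe_line(0xFFFF, 0xFF00))  # ide_probe: wrote 0xffff read 0xff00
print(format_probe_line(0xFFFF, 0xFFFF))  # ide_probe: wrote 0xffff read 0xffff - OK
```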

#320249 - 11/03/2009 14:02 Re: 11-hour long fsck and still running... [Re: Roger]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14491
Loc: Canada
Perhaps, yes.

I wonder, though, whether it would then lead to even more inquiries as to why some of the tests "fail" (not "OK") whereas others don't.

People are strange beasts at times. The only way to keep them from asking is to remove the messages (MS style). But these are incredibly useful diagnostics, so they're staying put.

Cheers

#320252 - 11/03/2009 14:12 Re: 11-hour long fsck and still running... [Re: mlord]
Roger
carpal tunnel

Registered: 18/01/2000
Posts: 5683
Loc: London, UK
Originally Posted By: mlord
The only way to keep them from asking is to remove the messages (MS style).


As I'm currently spending a few months in the development end of our support team, I'm beginning to come round to that point of view :)
_________________________
-- roger

#320260 - 11/03/2009 14:59 Re: 11-hour long fsck and still running... [Re: Roger]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
Originally Posted By: Roger
That way, normal disks go from being not OK (without shouting) to OK (with shouting); broken disks never state OK. Bit less scary?

A bit, maybe, but note that the drive in this thread never tests OK by that criterion (which is also the criterion in the FAQ) -- the bottom four bits come back zero the whole time (and the bottom 8 most of the time), but apparently this is still actually OK?

Peter
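Peter's observation can be checked mechanically. The illustrative Python sketch below (not part of Hijack) transcribes the read-back values from the eight probe rounds in the serial log posted earlier in this thread, ORs them together to find which data bits never came back as 1, and counts how many probes echoed their pattern exactly, which is the "OK" criterion under discussion.

```python
# Patterns written by each probe round, in order.
PATTERNS = (0x0000, 0xFFFF, 0xAAAA, 0x5555)

# Values read back in each of the eight rounds, transcribed from the serial log in this thread.
ROUNDS = [
    (0xFF80, 0xFF80, 0xAA80, 0x5580),
    (0xFF20, 0xFF20, 0xAA20, 0x5520),
    (0xFF20, 0xFF20, 0xAA20, 0xD720),
    (0xFF00, 0xFF00, 0xAA00, 0x5500),
    (0xFF00, 0xFF00, 0xAA00, 0x5500),
    (0xFF00, 0xFF00, 0xAA00, 0xD500),
    (0xFF00, 0xFF00, 0xAA00, 0x5500),
    (0xFF00, 0xFF00, 0xAA00, 0x5500),
]

ever_set = 0  # OR of every value read back: bits that came back as 1 at least once
matches = 0   # probes where the value read back equalled the pattern written
for reads in ROUNDS:
    for wrote, read in zip(PATTERNS, reads):
        ever_set |= read
        if read == wrote:
            matches += 1

never_set = ~ever_set & 0xFFFF  # data bits that never came back as 1
print(f"bits never read as 1: 0x{never_set:04x}")  # 0x005f: bits 0-4 and bit 6
print(f"exact matches: {matches} of {len(ROUNDS) * len(PATTERNS)}")  # 0 of 32
```

So by the exact-match criterion this drive would never print "OK", even though Mark confirms below that it is fine.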

#320261 - 11/03/2009 16:21 Re: 11-hour long fsck and still running... [Re: peter]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14491
Loc: Canada
Yup, that drive is just fine.

The "data test" messages are *ONLY* meaningful in the context of a known hardware fault.

-ml

#320262 - 11/03/2009 16:33 Re: 11-hour long fsck and still running... [Re: mlord]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
Originally Posted By: mlord
Mmm.. I wonder if perhaps this:

ide_probe: wrote 0xffff read 0xff00

??

How about just:

ide_probe: 0xffffff00

? That way you'd still get all the data, but most users wouldn't even perceive that the two halves of the number were in some way "meant" to be the same.

Peter
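Peter's combined format simply puts the written pattern in the upper 16 bits and the value read back in the lower 16. A hypothetical Python sketch (the helper names are made up; this is not Hijack code) showing that no information is lost, since the two halves can be split back out:

```python
from typing import Tuple

def pack_probe(wrote: int, read: int) -> int:
    """Combine a 16-bit written pattern and a 16-bit read-back into one 32-bit value."""
    return ((wrote & 0xFFFF) << 16) | (read & 0xFFFF)

def unpack_probe(value: int) -> Tuple[int, int]:
    """Recover (wrote, read) from a packed 32-bit value."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

packed = pack_probe(0xFFFF, 0xFF00)
print(f"ide_probe: 0x{packed:08x}")  # ide_probe: 0xffffff00
wrote, read = unpack_probe(packed)
print(f"wrote 0x{wrote:04x} read 0x{read:04x}")  # wrote 0xffff read 0xff00
```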
