Unoffical empeg BBS

#293452 - 04/02/2007 16:36 Bad drive? Kernel panics and segfault errors
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
A friend of mine sent me his empeg so I could troubleshoot it. It has two drives, latest hijack and 2.0 final.
The boot log looks pretty horrendous:

see attachment 2 posts down

It just loops. Not looking good. Can anyone decipher that? Side note: when he gave it to me, apparently he'd already tried to reseat the IDE cables, and the one on the second drive was plugged in upside down....


Edited by loren (04/02/2007 17:19)
_________________________
|| loren ||

#293453 - 04/02/2007 16:58 Re: Bad drive? Kernel panics and segfault errors [Re: loren]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
You've mangled the log, probably by trying to cut'n'paste using hyperterm.

I think Tony has a FAQ on that someplace.

#293454 - 04/02/2007 17:07 Re: Bad drive? Kernel panics and segfault errors [Re: loren]
schofiel
carpal tunnel

Registered: 25/06/1999
Posts: 2993
Loc: Wareham, Dorset, UK
You don't actually say what's wrong with the player, so please come back and give us a description.

However, I am willing to bet you have got the "Click of Death":


ide0 at 0x000-0x007,0x038 on irq 6
hda: IBM-DJSA-220, 19077MB w/1874kB Cache, CHS=38760/16/63
ide_data_test: wrote 0xaaaa read 0x
hdb: FUJITSU MHL2300AT, 28615MB w/2048kB Cache, CHS=58140/16/63 0x5555 read 0x0080


There's your problem - the Fujitsu 2300, which was one of the drives affected by the chips damaged by the fire retardant in the plastic from years back - bet it's dead.
_________________________
One of the few remaining Mk1 owners... #00015

#293455 - 04/02/2007 17:20 Re: Bad drive? Kernel panics and segfault errors [Re: schofiel]
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
I'm attaching a proper log to this post. Sorry guys.

It boots up, then flashes a segfault error and sigkill, and goes into a loop.

If both drives have the proper install, should I be able to disconnect the one with the errors and have the player boot properly after a database rebuild?

Thanks!


Attachments
293976-capture.txt (156 downloads)



Edited by loren (04/02/2007 17:20)
_________________________
|| loren ||

#293456 - 04/02/2007 19:56 Re: Bad drive? Kernel panics and segfault errors [Re: loren]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
Much better, thanks.

Hardware looks healthy, except for possible memory-gone-bad.

But it's probably a bad tune or something.

So, go into the Hijack menu (long knob press), and check the Vital Signs to find out what the suspect FID (track) is. Write it down for later.

Then kill off the player with control^C, and then do a player -i to reset things. Then see how it behaves.

If all is well, then poke around in /drive0/fids (or whatever that's called), and find out what the name of the tune was, by looking at the FID file with the same number you wrote down, swapping the final 0 for a 1. Then play that tune again, and see if the crashes resume.
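The "swap the final 0 for a 1" step can be sketched as a tiny shell helper. This is only an illustration: the function name `fid_companion` and the sample FID `1a30` are made up, and the `/drive*/fids/` location is assumed from the paths used elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: given a FID that ends in 0, print the companion
# FID that ends in 1 (the file to inspect for the tune's name).
fid_companion() {
    case "$1" in
        *0) printf '%s1\n' "${1%0}" ;;   # strip the trailing 0, append 1
        *)  echo "expected a FID ending in 0, got: $1" >&2
            return 1 ;;
    esac
}

# Example (1a30 is an invented FID, not one from this player):
#   tag=$(fid_companion 1a30)      # tag is now 1a31
#   cat /drive*/fids/"$tag"        # directory name assumed, see above
```

The helper only rewrites the name; it does not check that the file exists on either drive.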

Get the idea?

Cheers

#293457 - 05/02/2007 03:18 Re: Bad drive? Kernel panics and segfault errors [Re: mlord]
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
Thanks Mark!

I couldn't get into Hijack to check the Vital Signs... it reboots way too quick. BUT... I did kill the player and reset it with -i and the player now seems to be stable. So no idea what the bad track was, if that was the cause. I'll have to see if Ryan remembers what he was playing when it started.

So what are the odds that it is a disk problem... bad sector or something? Or do you definitely think it's a bad track? Pretty crazy, I had no idea a bad track could cause something like that.

I'll try and run smartctl just to check things out while I have it here on the "test bench".

Thanks again!
_________________________
|| loren ||

#293458 - 05/02/2007 04:16 Re: Bad drive? Kernel panics and segfault errors [Re: loren]
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
Here are the only errors I get with smartctl:

Code:

empeg:/drive0/var# ./smartctl -l error /dev/hda
smartctl version 5.33 [arm-empeg-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 2

ATA Error Count: 0
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 0 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in standby mode.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
a2 00 86 b6 e2 51 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
df 20 84 b4 e0 20 70 10 43d+09:17:39.225 MEDIA UNLOCK
df 00 84 b4 e0 20 70 10 43d+09:17:39.225 MEDIA UNLOCK
df e0 81 b4 e0 20 70 10 43d+09:17:39.225 MEDIA UNLOCK
df c0 81 b4 e0 20 70 10 43d+09:17:39.225 MEDIA UNLOCK
df a0 81 b4 e0 20 70 10 43d+09:17:39.225 MEDIA UNLOCK

Error -1 occurred at disk power-on lifetime: 719 hours (29 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 59 82 f2 7e bb e3 Error: IDNF at LBA = 0x03bb7ef2 = 62619378

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 22 1b 02 e0 00 38d+05:30:19.500 READ MULTIPLE
c4 00 08 1a 1b 02 e0 00 38d+05:30:19.500 READ MULTIPLE
c4 00 08 12 1b 02 e0 00 38d+05:30:19.500 READ MULTIPLE
c4 00 08 0a 1b 02 e0 00 38d+05:30:19.500 READ MULTIPLE
c4 00 08 02 1b 02 e0 00 38d+05:30:19.500 READ MULTIPLE

Error -2 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 00 00 00 00 00 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------

empeg:/drive0/var#



Mean anything?
_________________________
|| loren ||

#293459 - 05/02/2007 13:39 Re: Bad drive? Kernel panics and segfault errors [Re: loren]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
Quote:
10 59 82 f2 7e bb e3 Error: IDNF at LBA = 0x03bb7ef2 = 62619378


That's a (single) bad sector. I don't know if it was the cause of the troubles, though.

Perhaps someone with more sleep than I could figure out which partition that was on. It's at a point right around offset 30GB on the drive, so that's probably a corrupted tune file.
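For anyone following along, the arithmetic behind that offset can be sketched in shell. The register values come straight from the quoted smartctl line; the function name `lba28_from_regs` is made up, 512-byte sectors are assumed, and the script needs a shell with 64-bit arithmetic (so it is probably better run on a desktop than on the player).

```shell
#!/bin/sh
# Rebuild a 28-bit LBA from the ATA task-file registers.
# Arguments: SN CL CH DH, as hex digits without the 0x prefix.
# The low nibble of DH holds LBA bits 27..24, then CH, CL, SN follow.
lba28_from_regs() {
    echo $(( ((0x$4 & 0x0f) << 24) | (0x$3 << 16) | (0x$2 << 8) | 0x$1 ))
}

# Registers from the error entry: SN=f2 CL=7e CH=bb DH=e3
lba=$(lba28_from_regs f2 7e bb e3)

# Assumption: 512 bytes per sector
bytes=$(( lba * 512 ))
mib=$(( lba / 2048 ))      # 2048 sectors of 512 bytes per MiB

echo "LBA:   $lba"         # 62619378, matching smartctl's own decode
echo "bytes: $bytes"       # 32061121536
echo "MiB:   $mib"         # 30575, i.e. a little under 30 GiB
```

Mapping that offset to a partition would still need the partition table from the drive itself, which is not shown in this thread.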

I wouldn't call the drive bad yet at this point.

But perhaps someday I'll add some stuff to the Hijack kernel so that a simple read-rewrite disk repair utility could be used in situations like this.

Cheers

#293460 - 05/02/2007 13:43 Re: Bad drive? Kernel panics and segfault errors [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
One thing you could try, is: connect over serial, kill off the player again (Control^C), and then just try reading all of the tune files. It should eventually die on the one with the bad sector (if any).

Something like this:

Code:
  cat /drive*/fids/* /drive*/fids/*/* > /dev/null



Note that this will take many hours to complete. Hopefully it will also print out the name of the bad file, if it finds one.
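If the plain `cat` turns out not to name the failing file clearly, a loop that reads one file at a time is a possible variation. This is a sketch only: the function name `scan_fids` is made up, it has not been tried on an empeg, and the player's shell may lack some of these features.

```shell
#!/bin/sh
# scan_fids DIR...: read every file one and two levels below each DIR
# and print the name of any file that cannot be read all the way through.
# Returns non-zero if at least one file failed.
scan_fids() {
    bad=0
    for dir in "$@"; do
        for f in "$dir"/* "$dir"/*/*; do
            [ -d "$f" ] && continue                   # skip subdirectories
            [ -e "$f" ] || [ -L "$f" ] || continue    # skip unmatched glob patterns
            if ! cat "$f" > /dev/null 2>&1; then
                echo "READ ERROR: $f"
                bad=$((bad + 1))
            fi
        done
    done
    [ "$bad" -eq 0 ]
}

# Usage on the player, covering the same files as the command above:
#   scan_fids /drive*/fids
```

Unlike the one-line version, this keeps going after the first failure, so it lists every unreadable file in a single pass. It will take at least as long to run.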

Cheers


Edited by mlord (05/02/2007 13:44)

#293461 - 05/02/2007 15:44 Re: Bad drive? Kernel panics and segfault errors [Re: mlord]
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
K, i'll give that a shot. Is there no way to get the drive to map the bad sector so it won't use it anymore? I'd always assumed disk check utils did that... but you know what they say about assuming.
_________________________
|| loren ||

#293462 - 05/02/2007 16:52 Re: Bad drive? Kernel panics and segfault errors [Re: loren]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
Quote:
K, i'll give that a shot. Is there no way to get the drive to map the bad sector so it won't use it anymore? I'd always assumed disk check utils did that... but you know what they say about assuming.


The drive will map it out of use on the next write to that sector. But until rewritten, it has to continue returning "error" on reads, to guarantee that we know we've lost data.

-ml

#293463 - 05/02/2007 17:19 Re: Bad drive? Kernel panics and segfault errors [Re: mlord]
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
Great. Thanks Mark, all seems to be well with the player... for now! =] Ryan will be really stoked, he hasn't been able to sync it for months he says.
_________________________
|| loren ||

#293464 - 05/02/2007 19:38 Hijack v468: Show playlist.trk info in VitalSigns and on errors. [Re: loren]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
Hijack v468 is now available.

New in this version: The playlist FID and current track index (counting from zero within the playlist) are now shown in the Vital Signs display, and in the Hijack popup error windows (e.g. sigkill). The popup error messages are also now echoed to the serial port.

All of this is intended to make it easier to identify bad tracks which repeatedly cause the player software to die.

Cheers

#293465 - 05/02/2007 20:37 Re: Hijack v468: Show playlist.trk info in VitalSigns and on errors. [Re: mlord]
loren
carpal tunnel

Registered: 23/08/2000
Posts: 3826
Loc: SLC, UT, USA
Brilliant... once again above and beyond. Thanks Mark!
_________________________
|| loren ||

#293466 - 05/02/2007 20:47 Re: Hijack v468: Show playlist.trk info in VitalSigns and on errors. [Re: mlord]
maczrool
pooh-bah

Registered: 13/01/2002
Posts: 1649
Loc: Louisiana, USA
[simpsons reference] Hijack, is there anything it can't do? [/simpsons reference]

Stu
_________________________
If you want it to break, buy Sony!
