Unoffical empeg BBS

#197412 - 09/01/2004 04:37 Validating website for broken links
Roger
carpal tunnel

Registered: 18/01/2000
Posts: 5683
Loc: London, UK
I've decided to install Drupal to manage the content on my website, and I'm (slowly) migrating the content across.

I'm doing it by taking the HTML of each page on my old home page and creating new pages in Drupal from that content.

The old HTML has (relative) links in it, which no longer work under Drupal (not surprisingly). I ought to go through and fix up the links to point at the other pages correctly (once I've imported those pages).

So, can anyone point me at a good tool that will spider my website, looking for broken links?
_________________________
-- roger

#197413 - 09/01/2004 06:52 Re: Validating website for broken links [Re: Roger]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
It's not particularly pretty, but I'm guessing you could do a recursive wget on the site and it would probably complain about broken links ...

#197414 - 09/01/2004 10:02 Re: Validating website for broken links [Re: Roger]
juenk
journeyman

Registered: 12/01/2002
Posts: 84
Loc: Waardenburg, The Netherlands
Is this useful?

jelle
_________________________
Empeg M2A Blue # 010101908 80Gb
Empeg M2A Blue # 030102771 with backlight buttons - Need repair (IDE cable connection on main board) - volunteers?

#197415 - 14/01/2004 06:33 Re: Validating website for broken links [Re: mschrag]
Roger
carpal tunnel

Registered: 18/01/2000
Posts: 5683
Loc: London, UK
"recursive wget on the site and it would probably complain about broken links"

That doesn't appear to tell me where the link is from.
_________________________
-- roger

#197416 - 14/01/2004 08:14 Re: Validating website for broken links [Re: Roger]
g_attrill
old hand

Registered: 14/04/2002
Posts: 1172
Loc: Hants, UK
Server logs? A grep 404 should do it.

I think there is a verbose option for wget.

Gareth
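
For what it's worth, the server-log idea can also show where each broken link came from, provided the log records the referer. Here is a rough Python sketch, assuming the Apache/NCSA "combined" log format; the regular expression, the function name and the sample log lines are all made up for illustration and are not taken from Roger's server.

```python
import re
from collections import defaultdict

# Assumption: Apache/NCSA "combined" log format, i.e.
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def broken_links_by_referer(lines):
    """Map each path that returned 404 to the set of referring pages."""
    broken = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "404":
            broken[m.group("path")].add(m.group("referer"))
    return dict(broken)


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [14/Jan/2004:08:00:00 +0000] '
    '"GET /temp/test.html HTTP/1.0" 200 86 "-" "Wget/1.8.2"',
    '192.0.2.1 - - [14/Jan/2004:08:00:01 +0000] '
    '"GET /ohgod.html HTTP/1.0" 404 280 '
    '"http://gary.rwd.com/temp/test.html" "Wget/1.8.2"',
]

if __name__ == "__main__":
    for path, referers in broken_links_by_referer(SAMPLE_LOG).items():
        print(path, "<-", ", ".join(sorted(referers)))
```

The referer column is only useful if the client actually sends that header, so entries showing "-" still leave you guessing.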

#197417 - 14/01/2004 08:39 Re: Validating website for broken links [Re: Roger]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
try:

wget -nv -r http://gary.rwd.com/temp/test.html

It gives the following output (test.html has a link to ohgod.html that is missing):

[mschrag@gary mschrag]$ wget -nv -r http://gary.rwd.com/temp/test.html
12:36:47 URL:http://gary.rwd.com/temp/test.html [86/86] -> "gary.rwd.com/temp/test.html" [1]
http://gary.rwd.com/robots.txt:
12:36:47 ERROR 404: Not Found.
http://gary.rwd.com/ohgod.html:
12:36:47 ERROR 404: Not Found.

FINISHED --12:36:47--
Downloaded: 86 bytes in 1 files

EDIT: This is with GNU Wget 1.8.2
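
If you want to pull just the failures out of that output, something like the Python sketch below might do. It assumes the layout shown above (a URL line ending in a colon, followed by a timestamped ERROR line), which may well differ in other wget versions; the function name is my own. Note that it still only lists the missing targets, not the pages that link to them.

```python
import re

# Sample copied from the wget 1.8.2 transcript above.
WGET_OUTPUT = """\
12:36:47 URL:http://gary.rwd.com/temp/test.html [86/86] -> "gary.rwd.com/temp/test.html" [1]
http://gary.rwd.com/robots.txt:
12:36:47 ERROR 404: Not Found.
http://gary.rwd.com/ohgod.html:
12:36:47 ERROR 404: Not Found.
"""

# Assumption: error lines look like "HH:MM:SS ERROR <code>: <reason>".
ERROR_LINE = re.compile(r"^\d\d:\d\d:\d\d ERROR (\d{3}):")


def failed_urls(output):
    """Return (url, status) pairs for each URL line followed by an ERROR line."""
    failures = []
    previous = None
    for line in output.splitlines():
        m = ERROR_LINE.match(line)
        if m and previous and previous.endswith(":"):
            # Drop the trailing colon that wget prints after the URL.
            failures.append((previous[:-1], int(m.group(1))))
        previous = line.strip()
    return failures


if __name__ == "__main__":
    for url, status in failed_urls(WGET_OUTPUT):
        print(status, url)
```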


Edited by mschrag (14/01/2004 08:39)

#197418 - 14/01/2004 08:43 Re: Validating website for broken links [Re: Roger]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
Oh wait -- did I misread? Are you asking which page the link was on? I think you are. Whoops ... back to the drawing board.

#197419 - 14/01/2004 08:48 Re: Validating website for broken links [Re: Roger]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
OK .. here we go:

http://validator.w3.org/checklink/

It has a link to the source at the bottom. You can run it from the command line too. I had to remove the -T from the top of the script because taintperl was complaining.

Here's the output:

[mschrag@gary mschrag]$ checklink.pl -q -r http://gary.rwd.com/temp/test.html

Processing http://gary.rwd.com/temp/test.html


List of broken links and redirects:

http://gary.rwd.com/temp/test3.html Line: 6
Code: 404 Not Found
To do: The link is broken. Fix it NOW!

----------------------------------------

Processing http://gary.rwd.com/temp/test2.html


List of broken links and redirects:

http://gary.rwd.com/ohgod.html Line: 4
Code: 404 Not Found
To do: The link is broken. Fix it NOW!
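
To show the general idea behind that kind of report (source page plus line number for every broken link), here is a small Python sketch. It is not how checklink.pl works internally. The class and function names, the `fetch` callback and the in-memory `SITE` are all hypothetical, chosen so the example runs without touching the network.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (href, line number) for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # getpos() returns (line, column) of the tag being handled.
                self.links.append((href, self.getpos()[0]))


def check_site(start_url, fetch):
    """Crawl from start_url and report broken links.

    fetch(url) is assumed to return the page's HTML, or None for a 404.
    Returns a list of (source_page, line, broken_url) tuples.
    """
    host = urlparse(start_url).netloc
    broken, seen, queue = [], {start_url}, [start_url]
    while queue:
        page = queue.pop(0)
        html = fetch(page)
        if html is None:
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href, line in collector.links:
            # Resolve relative links against the page and drop any #fragment.
            target = urldefrag(urljoin(page, href))[0]
            if fetch(target) is None:
                broken.append((page, line, target))
            elif target not in seen and urlparse(target).netloc == host:
                seen.add(target)
                queue.append(target)
    return broken


# Hypothetical two-page site held in memory, for illustration only.
SITE = {
    "http://example.invalid/temp/test.html":
        '<html>\n<body>\n<a href="test2.html">two</a>\n'
        '<a href="test3.html">three</a>\n</body>\n</html>\n',
    "http://example.invalid/temp/test2.html":
        '<html>\n<body>\n<a href="/ohgod.html">oops</a>\n</body>\n</html>\n',
}

if __name__ == "__main__":
    start = "http://example.invalid/temp/test.html"
    for page, line, target in check_site(start, SITE.get):
        print(f"{page} line {line}: {target} is broken")
```

Against a real site you would replace `SITE.get` with a function that performs an HTTP request and returns None on a 404.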
