A script for scanning the internet for one particular index.html?

 
 
Tee Jay

 
      09-22-2007, 02:13 PM
Hello,

I want to know how I make a script for scanning the entire internet
(127.0.0.1 - 255.255.255.255)
for one unique index.html file (I got the index.html file I'm looking for)
and then, whenever it finds a
match to the index.html file, saving the result to a text file, with all
scanning done over standard HTTP port 80.

So the end result will be a text file containing the IP addresses of hosts
whose index.html matches the one I have.
I'm sure it can be done; I'm just not that good at Linux scripting.


--
Tee Jay/Sapphire
((E-Mail Removed))



 
Man-wai Chang ToDie

 
      09-22-2007, 03:00 PM
> I want to know how I make a script for scanning the entire internet
> (127.0.0.1 - 255.255.255.255)
> for one unique index.html file (I got the index.html file I'm looking for)


use wget to fetch the index.html into a file then grep -i its content.

--
@~@ Might, Courage, Vision, SINCERITY.
/ v \ Simplicity is Beauty! May the Force and Farce be with you!
/( _ )\ (Xubuntu 7.04) Linux 2.6.22.6
^ ^ 22:58:01 up 1 day 7:14 1 user load average: 0.00 0.00 0.00
news://news.3home.net news://news.hkpcug.org news://news.newsgroup.com.hk
 
Michael Heiming

 
      09-22-2007, 03:04 PM
In comp.os.linux.networking Man-wai Chang ToDie <(E-Mail Removed)>:
>> I want to know how I make a script for scanning the entire internet
>> (127.0.0.1 - 255.255.255.255)
>> for one unique index.html file (I got the index.html file I'm looking for)


> use wget to fetch the index.html into a file then grep -i its content.


I'd suggest learning how to use a search engine, and that you stop
multi-posting...

--
Michael Heiming (X-PGP-Sig > GPG-Key ID: EDD27B94)
mail: echo (E-Mail Removed) | perl -pe 'y/a-z/n-za-m/'
#bofh excuse 157: Incorrect time synchronization
 
Floyd L. Davidson

 
      09-22-2007, 03:50 PM
Man-wai Chang ToDie <(E-Mail Removed)> wrote:
>> I want to know how I make a script for scanning the entire internet
>> (127.0.0.1 - 255.255.255.255)
>> for one unique index.html file (I got the index.html file I'm looking for)

>
>use wget to fetch the index.html into a file then grep -i its content.


How many decades will it take a PC to search half the Internet that way?

--
Floyd L. Davidson <http://www.apaflo.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska) (E-Mail Removed)
 
Allen Kistler

 
      09-22-2007, 05:11 PM
Floyd L. Davidson wrote:
> Man-wai Chang ToDie <(E-Mail Removed)> wrote:
>>> I want to know how I make a script for scanning the entire internet
>>> (127.0.0.1 - 255.255.255.255)
>>> for one unique index.html file (I got the index.html file I'm looking for)

>>
>> use wget to fetch the index.html into a file then grep -i its content.

>
> How many decades will it take a PC to search half the Internet that way?


Not quite half.

Routable addresses are (theoretically):
the range 1.0.0.0/8 - 223.0.0.0/8
minus:
10.0.0.0/8
127.0.0.0/8
169.254.0.0/16
172.16.0.0/12
192.0.2.0/24
192.168.0.0/16

In the meantime, however, this request seems an awful lot like what bots
do to look for new servers they can infect.
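
For a rough sense of scale, the address arithmetic above can be sketched in Python. The block list is copied from the list above; the rate of one probe per second is purely an assumption for illustration, not a measurement, and the function names are made up for this sketch.

```python
import ipaddress

# Excluded blocks, exactly as listed above.
EXCLUDED = [
    "10.0.0.0/8",
    "127.0.0.0/8",
    "169.254.0.0/16",
    "172.16.0.0/12",
    "192.0.2.0/24",
    "192.168.0.0/16",
]


def routable_address_count():
    """Addresses in 1.0.0.0/8 through 223.0.0.0/8 minus the excluded blocks."""
    total = 223 * 2**24  # 223 /8 blocks (first octets 1 to 223)
    excluded = sum(ipaddress.ip_network(net).num_addresses for net in EXCLUDED)
    return total - excluded


def years_to_probe(addresses, probes_per_second):
    """Years needed to touch every address once at a fixed, assumed rate."""
    seconds = addresses / probes_per_second
    return seconds / (365 * 24 * 3600)


count = routable_address_count()
print("theoretically routable addresses:", count)
print("share of the 32-bit space: %.3f" % (count / 2**32))
# ASSUMPTION: one sequential probe per second from a single PC.
print("years at 1 probe/second: %.1f" % years_to_probe(count, 1))
```

Under that assumed rate the answer comes out at more than a century, so "decades" is, if anything, generous for a single sequential fetch loop.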
 
Ken Sims

 
      09-22-2007, 06:29 PM
Hi Allen and the group -

On Sat, 22 Sep 2007 12:11:23 -0500, Allen Kistler
<(E-Mail Removed)> wrote:

>Floyd L. Davidson wrote:
>> Man-wai Chang ToDie <(E-Mail Removed)> wrote:
>>>> I want to know how I make a script for scanning the entire internet
>>>> (127.0.0.1 - 255.255.255.255)
>>>> for one unique index.html file (I got the index.html file I'm looking for)
>>>
>>> use wget to fetch the index.html into a file then grep -i its content.

>>
>> How many decades will it take a PC to search half the Internet that way?

>
>Not quite half.
>
>Routable addresses are (theoretically):
> the range 1.0.0.0/8 - 223.0.0.0/8
>minus:
> 10.0.0.0/8
> 127.0.0.0/8
> 169.254.0.0/16
> 172.16.0.0/12
> 192.0.2.0/24
> 192.168.0.0/16
>
>In the meantime, however, this request seems an awful lot like what bots
>do to look for new servers they can infect.


That comment brings up another point. With the proliferation of
virtual named hosts, going by IP address is going to miss a lot. I
have multiple domains on my server, and in some cases multiple hosts
per domain. And the default container (which is what access by IP
address without a host name or with the IP address as host name would
get) does not contain any of those sites.
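
A minimal sketch of why that happens, assuming a toy server that picks a document purely from the Host header. The host names, paths, and the 192.0.2.10 address are all hypothetical placeholders, and real servers have more elaborate matching rules (ports, aliases, wildcards) that are ignored here.

```python
# Hypothetical virtual-host table for one server; names and paths are made up.
VHOSTS = {
    "www.example.com": "/srv/www/example.com/index.html",
    "blog.example.com": "/srv/www/blog/index.html",
    "www.example.org": "/srv/www/example.org/index.html",
}
# What a client gets when the Host header matches none of the names above.
DEFAULT_DOCUMENT = "/srv/www/default/index.html"


def build_request(host):
    """Build the text of a plain HTTP/1.1 GET for /index.html."""
    return (
        "GET /index.html HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_host(request):
    """Return the Host header value (lowercased), or None if it is absent."""
    for line in request.split("\r\n")[1:]:
        if not line:  # blank line marks the end of the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower()
    return None


def select_document(request):
    """Pick the document the toy server would serve for this request."""
    return VHOSTS.get(parse_host(request), DEFAULT_DOCUMENT)


# Same server, same port, different Host header, different index.html.
print(select_document(build_request("blog.example.com")))
print(select_document(build_request("192.0.2.10")))  # by IP: default container
```

A scan that only ever sends the IP address as the host name would see the default document every time, whatever else the machine serves.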

--
Ken
http://www.kensims.net/
 
Tee Jay

 
      09-22-2007, 06:32 PM

"Allen Kistler" <(E-Mail Removed)> wrote in message
news:%0cJi.10362$(E-Mail Removed) et...
> Floyd L. Davidson wrote:
> > Man-wai Chang ToDie <(E-Mail Removed)> wrote:
> >>> I want to know how I make a script for scanning the entire internet
> >>> (127.0.0.1 - 255.255.255.255)
> >>> for one unique index.html file (I got the index.html file I'm looking for)
> >>
> >> use wget to fetch the index.html into a file then grep -i its content.

> >
> > How many decades will it take a PC to search half the Internet that way?

>
> Not quite half.
>
> Routable addresses are (theoretically):
> the range 1.0.0.0/8 - 223.0.0.0/8
> minus:
> 10.0.0.0/8
> 127.0.0.0/8
> 169.254.0.0/16
> 172.16.0.0/12
> 192.0.2.0/24
> 192.168.0.0/16
>
> In the meantime, however, this request seems an awful lot like what bots
> do to look for new servers they can infect.


OK, I can assure you it's not for that. It's actually more of a hypothetical
question than a request. I just want to know stuff like that.


--
Tee Jay



 
david

 
      09-22-2007, 09:56 PM
On Sat, 22 Sep 2007 20:32:59 +0200, Tee Jay rearranged some electrons to
say:

> "Allen Kistler" <(E-Mail Removed)> wrote in message
> news:%0cJi.10362$(E-Mail Removed) et...
>> Floyd L. Davidson wrote:
>> > Man-wai Chang ToDie <(E-Mail Removed)> wrote:
>> >>> I want to know how I make a script for scanning the entire internet
>> >>> (127.0.0.1 - 255.255.255.255)
>> >>> for one unique index.html file (I got the index.html file I'm
>> >>> looking for)
>> >>
>> >> use wget to fetch the index.html into a file then grep -i its
>> >> content.
>> >
>> > How many decades will it take a PC to search half the Internet that
>> > way?

>>
>> Not quite half.
>>
>> Routable addresses are (theoretically):
>> the range 1.0.0.0/8 - 223.0.0.0/8
>> minus:
>> 10.0.0.0/8
>> 127.0.0.0/8
>> 169.254.0.0/16
>> 172.16.0.0/12
>> 192.0.2.0/24
>> 192.168.0.0/16
>>
>> In the meantime, however, this request seems an awful lot like what
>> bots do to look for new servers they can infect.

>
> Ok, that I can ensure it's not for. It's actually more a hypothetic
> question rather than a request. I just wanna know stuff like that



"I want to know how I make a script for scanning the entire internet
for one unique index.html file (I got the index.html file I'm looking for)
and then whenever it finds a match to the index.html file saving the
results into a text file, all scanning done using standard http port 80.
So the end result will be a text file containing the ip addresses found
matching the index.html I got"

Doesn't sound like a hypothetical to me. It sounds like you are doing
something specific (and suspicious).




 
Mark Hobley

 
      09-23-2007, 11:08 AM
Tee Jay <(E-Mail Removed)> wrote:
> I want to know how I make a script for scanning the entire internet
> (127.0.0.1 - 255.255.255.255)


That is not the entire internet, as already mentioned.

An alternative approach might be to find some sort of fingerprint (a unique
segment) in the index file and use a search engine (such as Google) to try
to match the fingerprint. (The search engine may provide an API that can be
used for this.)

The search will of course be limited to pages indexed by the search engine,
but you would need a lot of computing power and sufficient bandwidth if you
want to outperform Google. (There are billions of pages to process.)
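
As a minimal sketch of the fingerprint idea, assuming "fingerprint" simply means the longest run of visible text in the file: the code below pulls that run out of a local copy of the page so it can be pasted into a search engine as a quoted phrase. The helper names and the sample page are made up for illustration, and no search engine API is called, since none is specified here.

```python
import hashlib
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip:
            self.chunks.append(text)


def pick_fingerprint(html_text):
    """Return the longest visible text chunk, or '' if there is none."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    return max(collector.chunks, key=len, default="")


def content_digest(data):
    """SHA-256 of the raw bytes, for an exact comparison of two copies."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample page; in practice read your own index.html instead.
sample = (
    "<html><head><title>Hi</title><style>p{color:red}</style></head>"
    "<body><h1>Welcome</h1>"
    "<p>Test page for the orange widget project, build 42</p></body></html>"
)
print('"%s"' % pick_fingerprint(sample))
print(content_digest(sample.encode("utf-8")))
```

The longest chunk is only a heuristic. A phrase that is long but common (a licence notice, say) makes a poor fingerprint, so it is worth checking the result by eye. The digest is only useful for confirming that a candidate page is byte-for-byte identical to your copy.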

Regards,

Mark.

--
Mark Hobley
393 Quinton Road West
QUINTON
Birmingham
B32 1QE

Email: markhobley at hotpop dot donottypethisbit com

http://markhobley.yi.org/

 
Bill Marcum

 
      09-24-2007, 03:04 PM
On Sat, 22 Sep 2007 12:11:23 -0500, Allen Kistler
<(E-Mail Removed)> wrote:
>
>
> Floyd L. Davidson wrote:
>> Man-wai Chang ToDie <(E-Mail Removed)> wrote:
>>>> I want to know how I make a script for scanning the entire internet
>>>> (127.0.0.1 - 255.255.255.255)
>>>> for one unique index.html file (I got the index.html file I'm looking for)
>>>
>>> use wget to fetch the index.html into a file then grep -i its content.

>>
>> How many decades will it take a PC to search half the Internet that way?

>
> Not quite half.
>

And by the time it is done, half the internet will probably have switched
to IPv6.

--
I have gained this by philosophy:
that I do without being commanded what others do only from fear of the law.
-- Aristotle
 