In message <gssdi0$sit$(E-Mail Removed)>, Chris Hills
<(E-Mail Removed)> writes
>On 24/04/09 09:29, Dave Saville wrote:
>> Surely *something* is going to show up in your web logs? Tip: When
>> processing log files, exclude what you know you don't want. Then
>> whatever is left is out of the ordinary. You can't program for what
>> you don't know is there (yet) :-)
>
>Dave
>
>When my sites get accessed the user agents are indeed logged. However,
>once I know what the crawler agent is, it will be too late since it
>will already have been crawled :-)
>
>Regards,
>
>Chris

Does Phorm actually crawl the sites? Will the traffic they profile show
up in the logs in any way that differs from the original requester's?

AIUI Phorm's methodology is to look at the web pages individuals are
browsing in order to profile the individual, not the website. It
identifies the individual from cookies set on the user's machine, and
then passes on the original request to the website as if it came from
the individual.

Phorm say (see
http://www.cl.cam.ac.uk/~rnc1/080518-phorm.pdf) that when
a website is first visited (by any ISP customer) its robots.txt file
is retrieved and cached (for a month). That implies you might see one
request for robots.txt in the log once a month from a Phorm IP. The rest
of the traffic to and from your website that they profile will not be
identifiable by you in any way.
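
If that reading is right, the only trace in the log would be the odd robots.txt fetch. A minimal sketch of how one might list which IPs asked for robots.txt and when, assuming the common Apache/nginx "combined" log format. The function name, the pattern and the sample lines (documentation-range IPs) are mine for illustration, not anything Phorm has published:

```python
import re
from collections import defaultdict

# Assumption: Apache/nginx "combined" log format. Adjust for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def robots_requests(lines):
    """Map each client IP to the timestamps of its robots.txt requests."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like access-log entries
        path = m.group("path").split("?")[0].lower()
        if path == "/robots.txt":
            hits[m.group("ip")].append(m.group("when"))
    return dict(hits)

# Made-up sample lines, using IPs reserved for documentation.
SAMPLE = [
    '192.0.2.10 - - [24/Apr/2009:09:29:00 +0100] "GET /robots.txt HTTP/1.1" 200 68 "-" "-"',
    '198.51.100.7 - - [24/Apr/2009:09:30:12 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "-"',
    '192.0.2.10 - - [24/May/2009:10:01:44 +0100] "GET /robots.txt HTTP/1.1" 200 68 "-" "-"',
]

if __name__ == "__main__":
    for ip, times in robots_requests(SAMPLE).items():
        print(ip, times)
```

An IP that fetches robots.txt about once a month and never requests anything else would be a candidate, but that is only a guess at what the pattern would look like.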

The same document also suggests that they will only respect a
User-Agent: * construction and not one targeted at their own agent. See
para 44: "we work on the basis that if a site allows spidering of its
contents by search engines, then its material is being openly published.
Conversely, if the site has disallowed spidering and indexing by search
engines, we respect those restrictions in robots.txt". If they aren't
going to respect a User-Agent: Phorm (and why would they?) then there is
no real point in knowing what the agent is really called anyway.

I suspect a website will have no way to block Phorm while still
allowing Google etc. to index it, short of someone resorting to the
courts.
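
To make the robots.txt point concrete, here is a rough sketch using Python's standard urllib.robotparser. "Phorm" as an agent name is a placeholder (the real name is not known), and wildcard_only() is only my model of the policy quoted from para 44, i.e. a reader that consults nothing but the "*" section:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, url="/index.html"):
    """What a standards-following crawler named `agent` would conclude."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

def wildcard_only(robots_txt, url="/index.html"):
    """Model of the quoted policy: only the "*" section is consulted.

    Emulated by asking with a name that matches no named section.
    """
    return allowed(robots_txt, "unnamed-agent", url)

# Targeted block: "Phorm" is a placeholder agent name.
TARGETED = """\
User-agent: Phorm
Disallow: /

User-agent: *
Disallow:
"""

# Blanket block: shuts out every compliant crawler, search engines included.
BLANKET = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    print("targeted, agent Phorm:    ", allowed(TARGETED, "Phorm"))
    print("targeted, agent Googlebot:", allowed(TARGETED, "Googlebot"))
    print("targeted, wildcard only:  ", wildcard_only(TARGETED))
    print("blanket, agent Googlebot: ", allowed(BLANKET, "Googlebot"))
    print("blanket, wildcard only:   ", wildcard_only(BLANKET))
```

Under that model the targeted file stops nothing, and the only file that does stop the profiling also drops the site out of the search engines.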
--
Invalid