Networking Forums  


linux c socket read html problem

  #1  
Old 06-12-2007, 04:55 PM

Hi all!

I have a problem that confuses me. I create a socket, connect to the
web host on port 80, and request the HTML. When I read data from the fd
(the socket created above), I get some data that does not seem to belong
to the web site's HTML. For example, when I read www.google.cn, it
returns the following:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=GB2312
Set-Cookie:
PREF=ID=2089898e46137a4a:NW=1:TM=1181662196:LM=1181662196:S=T5zGUwA1MR8QBCoI;
expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Server: GWS/2.1
Transfer-Encoding: chunked
X-Google-Backends: prcsat-gfe.l.google.com:80,mctf10:80
X-Google-Service: www
X-Google-Request-Trace: mctf10:80,prcsat-gfe.l.google.com:80,mctf10:80
Date: Tue, 12 Jun 2007 15:29:26 GMT

bcf -----> this is the data that confuses me
<html><head><meta http-equiv="content-type" content="text/html;
charset=GB2312"><title>Google</title><style><!--
body,td,a,p,.h{font-family:""}
.h{font-size:20px}
.h{color:#3366cc}
.q{color:#00c}
--></style>
<script>
<!--
function sf(){document.f.q.focus();}
// -->
</script>
</head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b
alink=#ff0000 onload="sf();if(document.images){new Image().src='/
images/nav_logo3.png'}" topmargin=3 marginheight=3><div align=right
id=guser style="font-size:84%;padding-bottom:4px" width=100%><nobr><a
href="https://www.google.com/accounts/Login?continue=http://
203.208.33.101/& hl=zh-CN">怬</a></nobr></div><center><br
id=lgpd><table cellpadding=0 cellspacing=0 border=0><tr><td
align=right valign=bottom><img src=images/hp0.gif width=158 height=78
alt="Google"></td><td valign=bottom><img src=images/hp1.gif width=50
height=78 alt=""></td><td valign=bottom><img src=images/hp2.gif
width=68 height=78 alt=""></td></tr><tr><td class=h align=right
valign=top><b></b></td><td valign=top><img src=images/hp3.gif width=50
height=32 alt=""></td><td valign=top class=h><font color=#666666
style=font-size:16px><b>中文(简体)</b></font></td></tr></table><br><form
action="/search" name=f><style>#lgpd{display:none}</style><script
defer><!--
//-->
</script><table border=0 cellspacing=0 cellpadding=4><tr><td
nowrap><font size=-1><b>网页</b> <a class=q href="http://
images.google.com/imghp?ie=GB2312&oe=GB2312&hl=zh-CN&tab=wi">图片</
a> <a class=q href="http://news.google.com/nwshp?
ie=GB2312&oe=GB2312&hl=zh-CN&tab=wn">资讯</a> <a class=q
href="http://groups.google.com/grphp?ie=GB2312&oe=GB2312&hl=zh-CN&
tab=wg">论坛</a> <b><a href="/intl/zh-CN/options/" class=q>更多 </a></
b></font></td></tr> </table><table cellpadding=0 cellspacing=0><tr
valign=top><td width=25%> </td><td align=center nowrap><input name=hl
type=hidden value=zh-CN><input type=hidden name=ie
value="GB2312"><input maxlength=2048 name=q size=55 title="Google搜索"
value=""><br><input name=btnG type=submit value="Google 搜索"><input
name=btnI type=submit value="手气不错"></td><td nowrap width=25%><font
size=-1> <a href=/advanced_search?hl=zh-CN>高级搜索</a><br> <a href=/
preferences?hl=zh-CN>使用偏好</a><br> <a href=/language_tools?hl=zh-CN>
语言工具</a></font></td> </tr><tr><td align=center colspan=3><font
size=-1><input id=all type=radio name=lr value="" checked><label
for=all>所有网页 </label><input id=ch type=radio name=lr value="lang_zh-
CN|lang_zh-TW"><label for=ch>中文网页 </label><input id=il type=radio
name=lr value="lang_zh-CN"><label for=il>简体中文网页 </label></font></
td></tr></table></form><br><br><font size=-1><a href="/intl/zh-CN/
ads/">广告计划</a> - <a href="/intl/zh-CN/about.html">Google 大全</a> - <a
href=http://www.google.com/ncr>Google.com in English</a></
font><p><font size=-1> 2007 Google</font></p></center></body></
5 -----> this data confuses me too
html>
0 -----> and so does this

Why does this happen? Can someone tell me what I should be looking out for? Thanks!



step
  #2  
Old 06-13-2007, 02:41 AM
David Schwartz

On Jun 12, 8:55 am, step <fxl...@gmail.com> wrote:

> HTTP/1.1 200 OK


I'm betting you sent 'HTTP/1.1' in your query.

> 0 -----> and so does this
>
> Why does this happen? Can someone tell me what I should be looking out for? Thanks!


Did you read the HTTP 1.1 specification? DO NOT EVER CLAIM TO SUPPORT
A PROTOCOL YOU DO NOT ACTUALLY SUPPORT.

DS
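
One way to act on this advice is to request HTTP/1.0 instead, since RFC 2616 forbids a server from sending transfer-codings such as "chunked" to an HTTP/1.0 client. The following is only a minimal sketch: the helper name `build_http10_get` is hypothetical and not from this thread, and the `Host` header is included on the assumption that the server uses name-based virtual hosting.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): formats an HTTP/1.0 GET
 * request for `path` on `host` into `buf`, which holds `cap` bytes.
 * Returns the request length, or -1 if the buffer is too small. */
int build_http10_get(char *buf, size_t cap, const char *host, const char *path)
{
    assert(buf != NULL && host != NULL && path != NULL);

    /* Announcing HTTP/1.0 means the reply must not be chunked; the body
     * then ends at Content-Length or when the server closes the socket. */
    int n = snprintf(buf, cap,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

The resulting buffer can be passed to `write()` or `send()` on the connected socket. The other option is to keep HTTP/1.1 and decode the chunked body.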

  #3  
Old 06-13-2007, 12:59 PM
Lew Pitcher

On Jun 12, 11:55 am, step <fxl...@gmail.com> wrote:
> Hi all!
>
> I have a problem that confuses me. I create a socket, connect to the
> web host on port 80, and request the HTML. When I read data from the fd
> (the socket created above), I get some data that does not seem to belong
> to the web site's HTML. For example, when I read www.google.cn, it
> returns the following:
> HTTP/1.1 200 OK
> Cache-Control: private
> Content-Type: text/html; charset=GB2312
> Set-Cookie:
> PREF=ID=2089898e46137a4a:NW=1:TM=1181662196:LM=1181662196:S=T5zGUwA1MR8QBCoI;
> expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
> Server: GWS/2.1
> Transfer-Encoding: chunked
> X-Google-Backends: prcsat-gfe.l.google.com:80,mctf10:80
> X-Google-Service: www
> X-Google-Request-Trace: mctf10:80,prcsat-gfe.l.google.com:80,mctf10:80
> Date: Tue, 12 Jun 2007 15:29:26 GMT
>
> bcf -----> this is the data that confuses me
> <html><head><meta http-equiv="content-type" content="text/html;

[snip]
> font><p><font size=-1> 2007 Google</font></p></center></body></
> 5 -----> this data confuses me too
> html>
> 0 -----> and so does this
>
> Why does this happen? Can someone tell me what I should be looking out for? Thanks!


To augment David's response, what you see is HTTP "chunked"
transfer-coding control information: the "bcf", "5", and "0" lines are
chunk sizes written in hexadecimal, and they are not part of the HTML.
Take a look at section 3.6.1 of RFC 2616, which describes the format of
a "chunked" response.

From the RFC:

3.6.1 Chunked Transfer Coding

The chunked encoding modifies the body of a message in order to
transfer it as a series of chunks, each with its own size indicator,
followed by an OPTIONAL trailer containing entity-header fields. This
allows dynamically produced content to be transferred along with the
information necessary for the recipient to verify that it has
received the full message.

[snip modified BNF for chunk encoding]

The chunk-size field is a string of hex digits indicating the size of
the chunk. The chunked encoding is ended by any chunk whose size is
zero, followed by the trailer, which is terminated by an empty line.

The trailer allows the sender to include additional HTTP header
fields at the end of the message. The Trailer header field can be
used to indicate which header fields are included in a trailer (see
section 14.40).
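
Putting the quoted rules together, a chunked body that has already been read into memory could be decoded roughly as follows. This is a sketch under stated assumptions, not a complete HTTP client: `dechunk` is a hypothetical name, the input is assumed to start right after the blank line that ends the response headers and to be complete, chunk extensions are skipped, and the trailer is ignored rather than parsed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the chunk data from `in` (in_len bytes) to
 * `out` (out_cap bytes) without the chunk-size lines. Stores the decoded
 * length in *out_len and returns 0, or returns -1 if the input is
 * malformed or truncated, or if `out` is too small. */
int dechunk(const char *in, size_t in_len,
            char *out, size_t out_cap, size_t *out_len)
{
    assert(in != NULL && out != NULL && out_len != NULL);
    size_t pos = 0, written = 0;

    for (;;) {
        /* 1. Read the chunk-size: a run of hex digits. */
        size_t size = 0;
        int digits = 0;
        while (pos < in_len && isxdigit((unsigned char)in[pos])) {
            unsigned char c = (unsigned char)in[pos++];
            size_t d = isdigit(c) ? (size_t)(c - '0')
                                  : (size_t)(tolower(c) - 'a' + 10);
            if (size > (SIZE_MAX - d) / 16)
                return -1;              /* size would overflow */
            size = size * 16 + d;
            digits++;
        }
        if (digits == 0)
            return -1;

        /* 2. Skip the rest of the size line, including any extension. */
        while (pos < in_len && in[pos] != '\n')
            pos++;
        if (pos == in_len)
            return -1;
        pos++;                          /* step over '\n' */

        /* 3. A zero-size chunk ends the body; the trailer is ignored. */
        if (size == 0) {
            *out_len = written;
            return 0;
        }

        /* 4. Copy the chunk data. */
        if (size > in_len - pos || size > out_cap - written)
            return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        /* 5. Each chunk's data is followed by CRLF. */
        if (in_len - pos < 2 || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

For instance, the tail of the response in the original post, where a 5-byte chunk holds "html>", decodes back into a contiguous "</html>". A real client reading from a socket would have to do this incrementally, because a single `read()` may return only part of a chunk.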
