PHP: file_get_contents returns garbage - Help please


    Dear all,

    I am trying to write code that fetches the HTML source of "http://vnexpress.net" (a news site in my country). I am using file_get_contents as below:

    <?
    $homepage = file_get_contents('http://vnexpress.net');
    echo $homepage;
    ?>

    I also tried cURL with the request headers set to match the Chrome browser, but it had no effect. My PHP script always returns garbage like this:

    ‹í½`I–%&/mÊ{JõJ×àt¡€`$ؐ@ìÁˆÍæ’ìiG#)«*ÊeVe]f@Ì흼÷Þ{ï½÷Þ{ï½÷º;N'÷ßÿ?\fdl öÎJÚÉž!€ªÈ?~|?"ÿ®O¿|(>J¹ý=žçà ™Å’~ý±{¼ÈÛŒpnWÛù/Z—Ÿ}tR-Û|Ùn¿¹^å¥Sù볏Úü]{0Óé<«›¼ýì«7϶>Jïý؏9PËl‘ö Q]Mª¶ñ^?{ñôô÷=ûòùó/¿‹Wi¯ù,o¦u±Â¸¼wÞÌÿÑ¿my‘¶Å2]γå<ý…ÙbuËœ.þã¿÷Ï-è“ÿøïùËÛôû“ÿ‰¿é?þ{ÿÒi:ýÿ ž¿~ÅŸþõm:ÿGÿ"zñ¢øÿÞ?œ¾¡÷Ûÿø ïý«§éOÃ’d›¾È ª¥æÿ ·#ˆ—ÿñßû§ïþÑ¿4ÿÇïŸ]ŒÒ·u9«¨ãQºšÿ£ñ*-×€NßÌ+¢Ùü÷þñÓQ:e—ôç9Jà ¿ÂÃ¿Å¾Â¿wDÈ>ü“—£ôòûC—„Úßš Ã’:«¨¿¿çï$`í?ú—,ÐäoÇJ™¶hËüxà ‹Â¸ryúnUçM“n§ÿØŸ„~ÓÉ?úWô.5‘¡UË ²XæéÞþüñ]y':˜”·ùõUUÏüi±G?Yä-5K_äWMú4+摄)5«×£/öèŒÞ©—y;ú"»È~€O>ÏÚü*»={ùfô²ª Û¬Y¤Gþ„Œ€.‘ƒýG2Æýè¿·2 Cþf4€Âh±ÎðåräHÕþ~ûË—òò4-ÿÑ¿h1úö?úÂ¥/xÆ^ÿ£Q‘~þþ-KH½üÕÓÑ·iÊþÁÑ?ö'q£¿ço§iøɧ'£< ÖÑ문¨–Òë‹‹Õè'Oߌ^ÌÿÑ¿d™à ŽÃ¨Å¸Ã‘sê3øgÓK¿x¡(þM¸Âk†‘f4NzgVäÔûš‡B£~}A#z’-G¯¨ÍORÛ7kú±D³iúœ°ùv¶¬ ú7}A?5üHñ“`È)~âÇ·×ùè)5ÃŒµ§Ù?^" 3‡Ÿ)~!ôg}A? _ú˜·9ÿHõ§Œ¿Qúüçɲ¯^>{uúúÛ ïìììà…>£µíï?%Ö¸¨êëß¿˜ù¯°Jõé¼’Õm 1-óþ©ö@\ø6*óò³šöºÌ›yž·¥ó:?ÿì £»¯ò¦Z×ÓüîIµXTË»³ü<[—íxÚ¿·¤Ë¨¨0þ›Fùc?–ŠÚÂÿ=ÐY .'ìÍ[ÙjU4Ã’Mwë¦ù„Ô,}óćóÕëיׯ ï~þü.´õÅöt¾Ó‹è}q‡øÿcÑyÚ #÷ÓÙe&Ÿ~”–ôî:» ¯üO›zJðŸ“:«¯ÍXõϽñO7=¾+miˆ©à ¥Â³Ã¡Â¾nÝ×O.óF7Ïʬ™‘ÏŠ¬ÛžŸ:{÷Š˜=oÚ°Ëov®.I|µ,¾Ñ.,ïïìÜ=®ë/òå:„O„â‡HG´{Ͼ^˧Œä]fuúòøóÓßÿõÙ›ÓÏvÃɾ?=}õÙî!։ >?{|öôË/N_¿9;ùýÏ^ÚÏÉ¡ £X^|QÍò*Ý;â„¢ü´á?ä÷Û!o>íJ'£O)t@à ¿ÃŸÃœ·Â@~ÿ‹ì¥ŸÉŸù™ô{ßg¬ñçxµ næ[ßûø÷§qO§ÕzÙ~^ÕU[M«2ý=Rmx÷nÓ”§äoú“çï¤Ÿ¤/ªê¢Ì·É¾–×dššñ´ZÜ¥ž~ºù˜;Å€à ªÃ“vs‘·:˜æÉõ›ì‚ÍÜ ë{;ß?Lݖ*«©Á âòq±lòº}’ŸWu¾uAna#äû%w¶øŸ+é–+i ¨Ã„ÃŒÅòÓ¤ºêëíÝñ§ã{ãE±$4=¾–9!ßRà ¾`ì¸ðP÷¿q‚Ábš¿]µ4d>³Ðù/ùc´ ]*Ã’DJ }¹ÌwÙuˆ, ÑhQöí½žmùF‚ÌtIfYÍo‡2¯ów×Ç%(]½»ÛÐþ˜TïØ9€/ÀÝIç>ü>ú·†|¹;&sÃŒSÃ’M« ¿OªÙ5¨'(È¿‹åjmð˜³YNAQ1£Èl]Æ’užÔÕñÍw‹YKÛeV®©ñ._~=¯®ž¨^à ”$Üäaé{ä_ñ‹?öxV\rCúùº,f9éJòŸžçç< èÿÙGûþv¯Hv-+ýíÐûnUIˆú¨..æ„Kµ:¬.ózû¼¬® N‡ÓuÝTõ£UEm^‚ÜÛYY\èK‡«l6#cðhgõî ÐÂKÏ‹wùìw-+Kje¿É&MU®Ûü0½û*"ÿ4%gËœ>£n¿u÷°¤1&]’`鏌þO €ôÁ8-· Iþt²ñ˜Ü¤i€f=Y$¬uÞ®ëez2ϧoÖlË:b ¯ùÕ1¹S$ØLYÃ’=âÞ Ô1ɱDîcJÿÜ]RÆIÍ$ý‘f%ô/éŸ8‡ð7؝—LV¶DÅ¡6óM0øËÛ7ÿñßûà —Rr2€”õûë”a¤ŒÒâ#“j/éïËå3ü¹GÚ*ZRânV]ÑôõŠ4Ú²@ÀDâd›A:Oè/óFF”AÑ/:†ö]+äHÏiN ö¤5«å”rooOcQžÄ“ÄÊ‘žnÕî²Î1ó Ã©ÃÂ£)r²N–^Ÿ~úìéÁéɃígÇ>ÝÞ?~p² }ðìôÞöÓ'Ÿîܲòðä””žŒêâ6¯s 
¼Â´Å¸Â¬Ã›Ã—úÊã»à,þÝ2¦ûåwÝÞþ^qžRZ(=;M?ý>xô$þ>;ýÔ؆º'&`uÄ:Ùèèþ‹xÆ’5ôŽQ¹Ä8¤ÑÉ2S2;«/ ²èô«¾Ä¿‹OI…ø¦ó|ö(Ý»O¡§šþ ?;zü»~/_Ίóïoo3ª„²¡5MÉ1}øc×%ÿ„?}ä7!&¡#\àƒPñcòÐ¥9ÿCï˜æ#²þò%l± †–hÃ¥b…žŸÓ‡"ýï¾Hù´µ¢>Í· â„¢ úDÐ ÌçsÅ¡D4•/†À¾B |JT‘»{@3FŸÐ·¤wèŸhøM½ÉåI¤Q Ë#¿]  `Ã…LªS¾åÀ)-Í/-•|ôû³8nQ]”“ùÛQEVZX‘’קÜ죣SüàÓXäuqÿ§‘Ùy»]×*S?‹›;ÿéj

    My code works perfectly with other sites but always fails with "vnexpress.net". I hope you can give me some advice. Thank you so much.

    #2
    Works fine for me. At first it didn't, but then I added the correct PHP tags <?php ?>
    and I got all the content fine. It was in a different language, but it worked for me.

    PHP Code:
    <?php
    $homepage = file_get_contents('http://vnexpress.net');
    echo $homepage;
    ?>


      #3
      Still no effect

      Dear Ghost, thank you for your answer.

      I changed <? to <?php, but it still has no effect. I am using HostGator and Chrome. Could you tell me which browser you are using and what the php.ini configuration on your host is? Anyway, thank you.



        #4
        I can't get to my host's php.ini file. When I try, I get a "forbidden" warning and a PHP error log entry saying it's off limits.

        The browser I'm using is Opera 12.02.


          #5
          Refresh many times to view "vnexpress.net"

          OK, friend. Just one more question: did you refresh the result many times? For me the first request always succeeds, but later ones always fail. Thank you.



            #6
            Yeah, I tried refreshing a few times. It was a little slow to fetch the content, as there is a lot of it,
            but it got it every time.

            I have also just tried this on my localhost, which I use for development, error reporting, etc.,
            and it worked fine there too.

            Here's the php.ini file from my localhost.
            Maybe looking through it will help you out.
            Attached Files


              #7
              Hi, that's not garbage; it's gzip-encoded data.

              If you run this code:

              PHP Code:
              <?php
              print_r(get_headers("http://vnexpress.net/"));
              you'll see this:

              Code:
              Array
              (
                  [0] => HTTP/1.1 200 OK
                  [1] => Server: Caching-NOC-V1.0
                  [2] => Date: Sat, 13 Oct 2012 02:20:33 GMT
                  [3] => Content-Type: text/html
                  [4] => Connection: close
                  [5] => Cache-Control: max-age=0
                  [6] => Expires: Sat, 13 Oct 2012 02:20:33 GMT
                  [7] => Set-Cookie: linkPDA=0; expires=Tue, 11-Dec-2012 17:00:00 GMT; path=/
                  [8] => Content-Encoding: gzip
                  [9] => Vary: Accept-Encoding
                  [10] => Cache-Control: vne-hk-34,cached
              )
              NOTICE:
              Code:
              Content-Encoding: gzip
              Solution:

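              PHP Code:
              <?php
              $file = file_get_contents("http://vnexpress.net/");
              // Strip the 10-byte gzip header and the 8-byte trailer, then inflate the raw deflate data.
              echo gzinflate(substr($file, 10, -8));
              It'll print the HTML now.
              There are many other ways to do the same thing.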
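One of those other ways, as a minimal sketch rather than a definitive fix: the `substr($file, 10, -8)` trick relies on the gzip header being exactly 10 bytes, which holds only when the header carries no optional fields (file name, comment, extra data). `gzdecode()` (PHP 5.4+, zlib extension) parses the header and trailer itself, so it is less fragile. The helper name `decode_http_body()` is hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Sketch: decode a fetched body only when the response headers say it is gzip-encoded.
// decode_http_body() is a hypothetical helper name, not a PHP built-in.

function decode_http_body(string $body, array $headers): string
{
    foreach ($headers as $header) {
        // Header names are case-insensitive; match "Content-Encoding: gzip" (or x-gzip).
        if (preg_match('/^Content-Encoding:\s*(x-)?gzip\s*$/i', $header)) {
            $decoded = gzdecode($body); // handles the gzip header and trailer itself
            // Fall back to the raw body if the data turns out not to be valid gzip.
            return $decoded === false ? $body : $decoded;
        }
    }
    // No gzip header found: return the body unchanged.
    return $body;
}

// Usage, assuming allow_url_fopen is enabled on the host:
// $raw = file_get_contents('http://vnexpress.net/');
// echo decode_http_body($raw, $http_response_header);
```

After a successful `file_get_contents()` call on an HTTP URL, PHP fills `$http_response_header` in the calling scope with the same kind of array that `get_headers()` printed above, so no second request is needed to inspect the encoding.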
              Last edited by softwarefreak; 13.10.12, 02:34.
              I need some facebook likes, can you please help me
              http://facebook.com/softwarefreakin
              I noticed social media is really powerful
              Well DONE is better than well SAID



                #8
                Oh, You are my GOD


                Your reply touched my heart because it is so professional. I tested it and it can load any website like a boss. Special thanks, softwarefreak. I visited your site and LIKEd your Facebook page, and I also FOLLOWed your Twitter stream.

                HOPE ALL BEST

                PS: thanks also to Ghost...



                  #9
                  @tien, LOL, there's the catch with a signature after my posts :P


                    #10
                    Does that mean that sites that use gzip, or send a header like that, will return garbled output when you grab their contents?

                    I just tested on my own gzip page, but it didn't return anything like that.
                    I don't know if we're using the same gzip functions.
                    Last edited by modfiles; 13.10.12, 13:24.



                      #11
                      When you use the PHP function file_get_contents() to fetch a remote web page, it doesn't send an Accept-Encoding header (or any other custom request header) by default, unless you pass it a context built with stream_context_create(). A remote server is supposed to check whether the client supports compression over HTTP (HTTP_ACCEPT_ENCODING) before compressing. Since no such header was sent, the server shouldn't have replied with gzip-encoded data, but this one does so anyway.
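To make the point about request headers concrete, here is a rough sketch of two ways to control what the client advertises. `build_plain_context()` and `fetch_with_curl()` are hypothetical helper names, and the User-Agent string is a placeholder. The first explicitly asks for an uncompressed reply through a stream context; the second lets cURL negotiate and decode compression itself.

```php
<?php
// Sketch: control the Accept-Encoding header instead of relying on defaults.
// build_plain_context() and fetch_with_curl() are hypothetical helper names.

function build_plain_context(string $userAgent = 'Mozilla/5.0 (compatible; example-fetcher)')
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            // "identity" asks the server not to compress the response.
            'header'  => "Accept-Encoding: identity\r\n" .
                         "User-Agent: {$userAgent}\r\n",
            'timeout' => 10,
        ],
    ]);
}

function fetch_with_curl(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Empty string: advertise every encoding cURL supports and decode the reply automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    return curl_exec($ch); // string on success, false on failure
}

// Usage, assuming allow_url_fopen is enabled and the curl extension is loaded:
// echo file_get_contents('http://vnexpress.net/', false, build_plain_context());
// echo fetch_with_curl('http://vnexpress.net/');
```

A server that ignores Accept-Encoding, as this one appears to, may still send gzip even when asked for `identity`, so it remains worth checking the Content-Encoding response header before echoing the body. Setting cURL headers by hand to mimic a browser, without `CURLOPT_ENCODING`, would likely request gzip and then leave the compressed body undecoded.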