

Groups > comp.lang.java.programmer > #21855 > unrolled thread

security quirk

Started by: RichD <r_delaney2001@yahoo.com>
First post: 2013-01-29 20:55 -0800
Last post: 2013-02-01 05:11 -0800
Articles: 12 (8 participants)



Contents

  security quirk RichD <r_delaney2001@yahoo.com> - 2013-01-29 20:55 -0800
    Re: security quirk Roedy Green <see_website@mindprod.com.invalid> - 2013-01-30 01:55 -0800
    Re: security quirk Martin Musatov <marty.musatov@gmail.com> - 2013-01-30 06:37 -0800
      Re: security quirk "Auric__" <not.my.real@email.address> - 2013-01-30 22:40 +0000
    Re: security quirk Gandalf  Parker <gandalf@the.dead.ISP.of.Community.net> - 2013-01-30 14:39 +0000
      Re: security quirk RichD <r_delaney2001@yahoo.com> - 2013-01-30 11:39 -0800
        Re: security quirk alex23 <wuwei23@gmail.com> - 2013-01-30 17:16 -0800
        Re: security quirk Roedy Green <see_website@mindprod.com.invalid> - 2013-01-31 04:20 -0800
        Re: security quirk Gandalf  Parker <gandalf@the.dead.ISP.of.Community.net> - 2013-01-31 14:07 +0000
          Re: security quirk Roedy Green <see_website@mindprod.com.invalid> - 2013-02-01 05:11 -0800
    Re: security quirk Big Bad Bob <BigBadBob-at-mrp3-dot-com@testing.local> - 2013-01-30 11:59 -0800
    Re: security quirk Arne Vajhøj <arne@vajhoej.dk> - 2013-01-30 20:40 -0500

#21855 — security quirk

From: RichD <r_delaney2001@yahoo.com>
Date: 2013-01-29 20:55 -0800
Subject: security quirk
Message-ID: <b968c6c6-5aa9-4584-bd7a-5b097f17c54d@pu9g2000pbc.googlegroups.com>
I read Wall Street Journal, and occasionally check
articles on their Web site.  It's mostly free, with some items
available to subscribers only.  It seems random, which ones
they block, about 20%.

Anywho, sometimes I use their search utility, the usual author
or title search, and it blocks, then I look it up on Google, and
link from there, and it loads!  ok, Web gurus, what's going on?


--
Rich



#21863

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2013-01-30 01:55 -0800
Message-ID: <kfrhg8dhgkjnu42mdt6v138vuj0m0jmao5@4ax.com>
In reply to: #21855
On Tue, 29 Jan 2013 20:55:44 -0800 (PST), RichD
<r_delaney2001@yahoo.com> wrote, quoted or indirectly quoted someone
who said :

>Anywho, sometimes I use their search utility, the usual author
>or title search, and it blocks, then I look it up on Google, and
>link from there, and it loads!  ok, Web gurus, what's going on?

This is not Java, but one way this could happen is Google buys or gets
a free subscription to the WSJ.  That enables them to spider and index
it.

The WSJ designed their security system around their own search engine
refusing to find pages, not on refusing to serve them once the URL is
known.  I have a dim view of the WSJ for reasons unrelated to the
competence of their programmers.
-- 
Roedy Green Canadian Mind Products http://mindprod.com
The first 90% of the code accounts for the first 90% of the development time.
The remaining 10% of the code accounts for the other 90% of the development 
time. 
~ Tom Cargill  Ninety-ninety Law 
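
As a rough illustration of the design described in this post, here is a minimal Python sketch. It assumes nothing about how the WSJ site is actually built: the article store and the names `ARTICLES`, `site_search`, `serve_unchecked` and `serve_checked` are all invented for the example. The point is only that hiding a page from the site's own search is not the same as refusing to serve it.

```python
# Hypothetical article store: URL path -> (title, subscribers_only flag).
ARTICLES = {
    "/news/markets-rally": ("Markets Rally", False),
    "/news/fed-minutes": ("Fed Minutes Analysis", True),
}


def site_search(term, is_subscriber=False):
    """The site's own search: hides subscriber-only hits from non-subscribers."""
    return [
        path
        for path, (title, locked) in ARTICLES.items()
        if term.lower() in title.lower() and (is_subscriber or not locked)
    ]


def serve_unchecked(path):
    """The flawed handler: serves any known URL with no access check."""
    title, _locked = ARTICLES[path]
    return 200, title


def serve_checked(path, is_subscriber=False):
    """A handler that enforces the restriction at serve time instead."""
    title, locked = ARTICLES[path]
    if locked and not is_subscriber:
        return 403, "Subscribers only"
    return 200, title
```

With `serve_unchecked`, anyone who learns the URL from an outside index gets the page even though `site_search` never shows it to them; `serve_checked` closes that gap.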



#21868

From: Martin Musatov <marty.musatov@gmail.com>
Date: 2013-01-30 06:37 -0800
Message-ID: <2b0bd8ac-1aa1-4b42-8e60-f83b64201d8e@d8g2000pbm.googlegroups.com>
In reply to: #21855
On Jan 29, 8:55 pm, RichD <r_delaney2...@yahoo.com> wrote:
> I read Wall Street Journal, and occasionally check<NotepadPlus>
    <UserLang name="MUSATOV" ext=".myl" udlVersion="2.0">
        <Settings>
            <Global caseIgnored="no" allowFoldOfComments="no"
forceLineCommentsAtBOL="no" foldCompact="yes" />
            <Prefix Keywords1="no" Keywords2="no" Keywords3="no"
Keywords4="no" Keywords5="no" Keywords6="no" Keywords7="no"
Keywords8="no" />
        </Settings>
        <KeywordLists>
            <Keywords name="Comments" id="0">00commentBegin 01comment
02commentEnd 03 04</Keywords>
            <Keywords name="Numbers, additional" id="1"></Keywords>
            <Keywords name="Numbers, prefixes" id="2"></Keywords>
            <Keywords name="Numbers, extras with prefixes" id="3"></
Keywords>
            <Keywords name="Numbers, suffixes" id="4"></Keywords>
            <Keywords name="Operators1" id="5">();</Keywords>
            <Keywords name="Operators2" id="6"></Keywords>
            <Keywords name="Folders in code1, open" id="7">Open</
Keywords>
            <Keywords name="Folders in code1, middle" id="8">middle</
Keywords>
            <Keywords name="Folders in code1, close" id="9">Close</
Keywords>
            <Keywords name="Folders in code2, open" id="10">Open</
Keywords>
            <Keywords name="Folders in code2, middle" id="11">middle</
Keywords>
            <Keywords name="Folders in code2, close" id="12">Close</
Keywords>
            <Keywords name="Folders in comment, open" id="13">Open</
Keywords>
            <Keywords name="Folders in comment, middle"
id="14">middle</Keywords>
            <Keywords name="Folders in comment, close" id="15">Close</
Keywords>
            <Keywords name="Keywords1" id="16">%%</Keywords>
            <Keywords name="Keywords2" id="17"></Keywords>
            <Keywords name="Keywords3" id="18"></Keywords>
            <Keywords name="Keywords4" id="19"></Keywords>
            <Keywords name="Keywords5" id="20"></Keywords>
            <Keywords name="Keywords6" id="21"></Keywords>
            <Keywords name="Keywords7" id="22"></Keywords>
            <Keywords name="Keywords8" id="23"></Keywords>
            <Keywords name="Delimiters" id="24"></Keywords>
        </KeywordLists>
        <Styles>
            <WordsStyle name="DEFAULT" styleID="0" fgColor="FFFFFF"
bgColor="000000" fontName="Monotype Corsiva" fontStyle="7"
fontSize="14" nesting="0" />
            <WordsStyle name="COMMENTS" styleID="1" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="LINE COMMENTS" styleID="2"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="NUMBERS" styleID="3" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS1" styleID="4" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS2" styleID="5" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS3" styleID="6" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS4" styleID="7" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS5" styleID="8" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS6" styleID="9" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS7" styleID="10" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="KEYWORDS8" styleID="11" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="OPERATORS" styleID="12" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="FOLDER IN CODE1" styleID="13"
fgColor="FFFFFF" bgColor="000000" fontName="" fontStyle="7"
fontSize="10" nesting="0" />
            <WordsStyle name="FOLDER IN CODE2" styleID="14"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="FOLDER IN COMMENT" styleID="15"
fgColor="FFFFFF" bgColor="000000" fontName="Times New Roman"
fontStyle="7" fontSize="8" nesting="0" />
            <WordsStyle name="DELIMITERS1" styleID="16"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="DELIMITERS2" styleID="17"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="DELIMITERS3" styleID="18"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="DELIMITERS4" styleID="19"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="DELIMITERS5" styleID="20"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="DELIMITERS6" styleID="21"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="DELIMITERS7" styleID="22"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
            <WordsStyle name="DELIMITERS8" styleID="23"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
        </Styles>
    </UserLang>
</NotepadPlus>

> articles on their Web site.  It's mostly free, with some items
> available to subscribers only.  It seems random, which ones
> they block, about 20%.
>
> Anywho, sometimes I use their search utility, the usual author
> or title search, and it blocks, then I look it up on Google, and
> link from there, and it loads!  ok, Web gurus, what's going on?
>
> --
> Rich



#21888

From: "Auric__" <not.my.real@email.address>
Date: 2013-01-30 22:40 +0000
Message-ID: <XnsA1589F8B53EBauricauricauricauric@78.46.70.116>
In reply to: #21868
Martin Musatov wrote:

> On Jan 29, 8:55 pm, RichD <r_delaney2...@yahoo.com> wrote:
>> I read Wall Street Journal, and occasionally check<NotepadPlus>
>     <UserLang name="MUSATOV" ext=".myl" udlVersion="2.0">
[snip]
>     </UserLang>
> </NotepadPlus>

Ignoring the big ol' unnecessary crosspost... What the fuck?

-- 
Oooh, I just learned a new euphemism.



#21869

From: Gandalf Parker <gandalf@the.dead.ISP.of.Community.net>
Date: 2013-01-30 14:39 +0000
Message-ID: <XnsA15843BB95369gandalfparker@78.46.70.116>
In reply to: #21855
RichD <r_delaney2001@yahoo.com> contributed wisdom to
news:b968c6c6-5aa9-4584-bd7a-5b097f17c54d@pu9g2000pbc.googlegroups.com:

> Web gurus, what's going on?
> 

That is the fault of the site itself.
If they are going to block access to users then they should also block 
access to the automated spiders that hit the site to collect data.



#21881

From: RichD <r_delaney2001@yahoo.com>
Date: 2013-01-30 11:39 -0800
Message-ID: <badd4188-196b-45e3-ba8a-511d471282fa@nh8g2000pbc.googlegroups.com>
In reply to: #21869
On Jan 30, Gandalf  Parker <gand...@the.dead.ISP.of.Community.net>
wrote:
> > Web gurus, what's going on?
>
> That is the fault of the site itself.
> If they are going to block access to users then they should also block
> access to the automated spiders that hit the site to collect data.

well yeah, but what's going on, under the hood?
How does it get confused?  How could this
happen?  I'm looking for some insight, regarding a
hypothetical programming glitch -


--
Rich



#21895

From: alex23 <wuwei23@gmail.com>
Date: 2013-01-30 17:16 -0800
Message-ID: <ee336fe9-fb4f-455c-a345-18a6751db5be@qi8g2000pbb.googlegroups.com>
In reply to: #21881
On Jan 31, 5:39 am, RichD <r_delaney2...@yahoo.com> wrote:
> well yeah, but what's going on, under the hood?
> How does it get confused?  How could this
> happen?  I'm looking for some insight, regarding a
> hypothetical programming glitch -

As has been stated, this has nothing to do with Python, so please stop
posting your questions here.

However, here's an answer to get you to stop repeating yourself: it's
not uncommon to find that content you're restricted from accessing via
a site's own search is available to you through Google. This has to do
with Google's policy of _requiring_ that pages that it is allowed to
index _must_ be available for view. Any site that allows Google to
index its pages and then blocks you from viewing them will swiftly
find itself website non grata in Google search. As most
websites are attention whores, they'll do anything to ensure they
remain within Google's indices.



#21920

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2013-01-31 04:20 -0800
Message-ID: <3bokg85edd5f494u8pa4kvnrclke98egv1@4ax.com>
In reply to: #21881
On Wed, 30 Jan 2013 11:39:41 -0800 (PST), RichD
<r_delaney2001@yahoo.com> wrote, quoted or indirectly quoted someone
who said :

>well yeah, but what's going on, under the hood?
>How does it get confused?  How could this
>happen?  I'm looking for some insight, regarding a
>hypothetical programming glitch -
Monitor the responses in all newsgroups you post to.



#21922

From: Gandalf Parker <gandalf@the.dead.ISP.of.Community.net>
Date: 2013-01-31 14:07 +0000
Message-ID: <XnsA1593E485EC8Egandalfparker@46.4.102.18>
In reply to: #21881
RichD <r_delaney2001@yahoo.com> contributed wisdom to
news:badd4188-196b-45e3-ba8a-511d471282fa@nh8g2000pbc.googlegroups.com:

> On Jan 30, Gandalf  Parker <gand...@the.dead.ISP.of.Community.net>
> wrote:
>> > Web gurus, what's going on?
>>
>> That is the fault of the site itself.
>> If they are going to block access to users then they should also block
>> access to the automated spiders that hit the site to collect data.
> 
> well yeah, but what's going on, under the hood?
> How does it get confused?  How could this
> happen?  I'm looking for some insight, regarding a
> hypothetical programming glitch -

(from alt.hacker)

You don't understand. It is not in the code. It is in the site.
It is as if someone comes and picks fruit off of your tree, and you are 
questioning the tree for how it bears fruit. 

The site creates web pages. 
Google collects web pages.
The site needs to set things like robots.txt to tell Google to NOT collect 
the pages in the archives. Which is not an absolute protection but at least 
it's an effort that works for most sites.
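
For anyone who wants to see what that looks like from the crawler's side, here is a small Python sketch using the standard library's `urllib.robotparser`. The rules in `ROBOTS_TXT`, the `/archive/` path and the `example.com` URLs are made up for illustration and are not taken from any real site; `may_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to keep crawlers
# out of its archive section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a well-behaved crawler may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Note that this check is performed voluntarily by the crawler. The server still serves the page to anything that asks, which is why robots.txt is not an absolute protection.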



#21954

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2013-02-01 05:11 -0800
Message-ID: <lffng89fvr74fulvo98lqd0hhu9tkl8spr@4ax.com>
In reply to: #21922
On Thu, 31 Jan 2013 14:07:21 +0000 (UTC), Gandalf  Parker
<gandalf@the.dead.ISP.of.Community.net> wrote, quoted or indirectly
quoted someone who said :

>The site creates web pages. 
>Google collects web pages.
>The site needs to set things like robots.txt to tell Google to NOT collect 
>the pages in the archives. Which is not an absolute protection but at least 
>it's an effort that works for most sites.

To the site, Google is just a voracious reader.  If they block readers
from hoovering up content, that automatically stops Google.

The site owners wanted Google to spider the site, bring in customers,
then hit them with a fee.  They forgot that anyone coming in directly
via Google's links would bypass their own search engine.



#21882

From: Big Bad Bob <BigBadBob-at-mrp3-dot-com@testing.local>
Date: 2013-01-30 11:59 -0800
Message-ID: <5e2dncvzz7Y55pTMnZ2dnUVZ_smdnZ2d@earthlink.com>
In reply to: #21855
On 01/29/13 20:55, RichD so wittily quipped:
> I read Wall Street Journal, and occasionally check
> articles on their Web site.  It's mostly free, with some items
> available to subscribers only.  It seems random, which ones
> they block, about 20%.
>
> Anywho, sometimes I use their search utility, the usual author
> or title search, and it blocks, then I look it up on Google, and
> link from there, and it loads!  ok, Web gurus, what's going on?

in my last post, I quoted an article from 'The Register' where they talk 
about how Facebook (literally) "broke" that feature.

[this works in a LOT of places, but sometimes you have to enable cookies 
or javascript to actually see the content]



#21896

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2013-01-30 20:40 -0500
Message-ID: <5109cb8c$0$288$14726298@news.sunsite.dk>
In reply to: #21855
On 1/29/2013 11:55 PM, RichD wrote:
> I read Wall Street Journal, and occasionally check
> articles on their Web site.  It's mostly free, with some items
> available to subscribers only.  It seems random, which ones
> they block, about 20%.
>
> Anywho, sometimes I use their search utility, the usual author
> or title search, and it blocks, then I look it up on Google, and
> link from there, and it loads!  ok, Web gurus, what's going on?

WSJ want their articles to be findable from Google.

So they open up for Google indexing them.

If they require any type of registration to see an article,
then Google will remove the link.

So therefore WSJ (and many other web sites!) gives more access
if you come from Google than if not.

Arne
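
One plausible way a site could give "more access if you come from Google" is to look at the HTTP Referer header. The Python sketch below is only a guess at the general technique, not a description of the WSJ's real logic; `came_from_google`, `can_view` and the host-matching rule are assumptions made up for the example.

```python
from urllib.parse import urlparse


def came_from_google(referer):
    """True if the Referer header names a google.com host (assumed rule)."""
    host = urlparse(referer or "").hostname or ""
    return host == "google.com" or host.endswith(".google.com")


def can_view(locked, is_subscriber, referer):
    """Decide whether to serve the full article for this request."""
    if not locked or is_subscriber:
        return True
    # Non-subscribers see a locked article only when they clicked
    # through from a Google result.
    return came_from_google(referer)
```

Since the Referer header is supplied by the client, a gate like this is a business rule rather than a security control: it explains why the same URL can be blocked from the site's own search page yet load from a Google link.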




