Re: Failure of Java HTTP, some puzzles for those with not enough to do

Newsgroups comp.lang.java.help
Subject Re: Failure of Java HTTP, some puzzles for those with not enough to do
From Ian Shef <invalid@avoiding.spam>
References <f7qad7h1a48tcaunk3u44jn8783hnutt2b@4ax.com> <Z4eBq.279$LO2.151@newsfe13.iad> <b4uad7tkpp8lkjk07ko7512ae3es8slbsp@4ax.com>
Message-ID <Xns9FAD643CFAAC3vaj4088ianshef@138.125.254.103> (permalink)
Date 2011-11-30 16:51 +0000
Organization Raytheon Company


Roedy Green <see_website@mindprod.com.invalid> wrote in 
news:b4uad7tkpp8lkjk07ko7512ae3es8slbsp@4ax.com:

<snip>
> 
> One trick I ran into was you could not just go to the page. You had to
> go to the home page and navigate your way there picking up cookies as
> you went.
> 
One web site that I screen-scrape forces me not only to pick up cookies along 
the way, but also to pick up the values of variables that are buried in 
JavaScript.  Sometimes these variables are passed as a parameter in a POST, 
and sometimes as part of a document window.
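
As a rough illustration of the idea (not the code I actually run), here is a
minimal sketch that pulls a quoted JavaScript variable out of a fetched page
and turns it into a form-encoded POST body.  The variable name sessionToken,
the parameter name token, and the sample HTML are all made up for the
example, and the regular expression assumes a simple name = 'value'
assignment; a real page may need something more elaborate.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsVariableScraper {

    /**
     * Finds the first assignment of the form  name = 'value'  or
     * name = "value"  in the page source and returns the value.
     * Assumption: the value is a plain quoted string with no escaped quotes.
     */
    public static Optional<String> extractJsVariable(String html, String name) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name) + "\\s*=\\s*(['\"])(.*?)\\1");
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /** Builds a one-parameter application/x-www-form-urlencoded body. */
    public static String formBody(String paramName, String value) {
        return URLEncoder.encode(paramName, StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; a real scraper would fetch this over HTTP.
        String html = "<script>var sessionToken = 'abc123';</script>";
        String token = extractJsVariable(html, "sessionToken")
                .orElseThrow(() -> new IllegalStateException("variable not found"));
        System.out.println(formBody("token", token));
    }
}
```

The resulting string would be written to the output stream of the POST
request, alongside whatever cookies were collected on the way.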

I use WebScarab to view how a browser handles the web page, and then I figure 
out how to emulate this behavior in Java.  The web page keeps evolving;  the 
authors seem to be finding increasingly arcane ways to make this hard.  I 
don't know that the authors are intentionally making this hard.  It could be 
that they are using tools that automate the process and that are evolving as 
well.

Sometimes there are multiple redirects, with each redirect providing another 
critical piece of data (cookie, POST parameter, document window variable) in 
the chain.
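
For the redirect chain, one approach is to switch off automatic redirect
handling and walk the chain by hand, so that every hop can be inspected.
The following is only a sketch under my own assumptions: the class and
method names are invented, the cookie handling ignores attributes such as
Path and Domain, and it relies on the response header map using the key
"Set-Cookie" exactly as spelled.  Each hop is also the place where a POST
parameter or JavaScript variable would be harvested from the response.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RedirectChain {

    /** Copies the name=value part of each Set-Cookie header into the jar. */
    public static void storeCookies(List<String> setCookieHeaders,
                                    Map<String, String> jar) {
        if (setCookieHeaders == null) {
            return;
        }
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim(); // drop Path, Secure, ...
            int eq = pair.indexOf('=');
            if (eq > 0) {
                jar.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
    }

    /** Formats the jar as the value of a Cookie request header. */
    public static String cookieHeader(Map<String, String> jar) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : jar.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Follows 3xx responses one hop at a time, collecting cookies as it goes. */
    public static String follow(String startUrl, Map<String, String> jar,
                                int maxHops) throws IOException {
        URI uri = URI.create(startUrl);
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // we want to see every hop
            if (!jar.isEmpty()) {
                conn.setRequestProperty("Cookie", cookieHeader(jar));
            }
            int status = conn.getResponseCode();
            storeCookies(conn.getHeaderFields().get("Set-Cookie"), jar);
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status < 300 || status >= 400 || location == null) {
                return uri.toString(); // end of the chain
            }
            uri = uri.resolve(location); // Location may be relative
        }
        throw new IOException("Too many redirects");
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> jar = new LinkedHashMap<>();
        // Placeholder URL; substitute the site's real entry page.
        System.out.println(follow("http://example.com/", jar, 10));
        System.out.println(cookieHeader(jar));
    }
}
```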

Good Luck!



