Groups > comp.lang.python > #100079 > unrolled thread
| Started by | iverson.zhou@gmail.com |
|---|---|
| First post | 2015-12-07 00:03 -0800 |
| Last post | 2015-12-08 08:40 +0100 |
| Articles | 2 — 2 participants |
Help with stale exception Python iverson.zhou@gmail.com - 2015-12-07 00:03 -0800
Re: Help with stale exception Python dieter <dieter@handshake.de> - 2015-12-08 08:40 +0100
| From | iverson.zhou@gmail.com |
|---|---|
| Date | 2015-12-07 00:03 -0800 |
| Subject | Help with stale exception Python |
| Message-ID | <99840119-b7c2-4171-b2f8-7c600c85bd87@googlegroups.com> |
I'm new to Python and programming. I've been learning it for 3 weeks now but have hit a lot of obstacles along the way. I found some of your insights very useful as a starter, but I have come across many more complicated challenges that aren't very intuitive.
For example, I'm trying to scrape this website (accessed through my university library's proxy, which gives full access) using Selenium, because it is very heavily JavaScript driven. There is a button that lets the user navigate to the next company page. My script finds the elements of interest on each page, writes them to a CSV file, clicks through to the next page, and repeats. I have a couple of problems I need some help with. Firstly, the element I'm really interested in is only the company website (which isn't always there), but when it is there, its location can change (see http://pasteboard.co/2GOHkbAD.png and http://pasteboard.co/2GOK2NBT.png) depending on the number of elements at the parent level. I'm using driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")
hoping to capture all the information (e.g. phone, email, website) and then do some cleansing later on. However, it appears that this method does not capture all the web elements on every page and write them to the CSV. Some pages were written to the file but some were missing. I couldn't figure out why.
A second problem, which is more complicated and has been driving me nuts, is that the DOM changes when the web content changes, and elements are destroyed and maybe recreated after driver.find_element_by_id('detail-pagination-next-btn').click()
I have tried countless methods (e.g. explicit and implicit waits) but the stale error still persists; once an element is stale, it seems to stay stale.
Has anyone come up with a solution to this, and what is the best way to deal with DOM tree changes?
Your help is much appreciated. My code is attached:
with open('C:/Python34/email.csv','w') as f:
    z=csv.writer(f, delimiter='\t',lineterminator = '\n',)
    while True:
        row = []
        for link in driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]"):
            try:
                row.append(str(link.text))
                z.writerow(link.text)
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                time.sleep(10)
                c=driver.find_element_by_id('detail-pagination-next-btn')
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                c.click()
                time.sleep(10)
                continue
            except StaleElementReferenceException as e:
                c=driver.find_element_by_id('detail-pagination-next-btn')
                for link in driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]"):
                    row.append(str(link.text))
                    z.writerow(link.text)
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                    time.sleep(10)
                    c=driver.find_element_by_id('detail-pagination-next-btn')
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                    c.click()
                    time.sleep(10)
much appreciated
Iverson
| From | dieter <dieter@handshake.de> |
|---|---|
| Date | 2015-12-08 08:40 +0100 |
| Message-ID | <mailman.51.1449560439.12405.python-list@python.org> |
| In reply to | #100079 |
iverson.zhou@gmail.com writes:
> ...
> I have tried countless methods (e.g. explicit and implicit waits) but the stale error still persists; once an element is stale, it seems to stay stale.
>
> Has anyone come up with a solution to this, and what is the best way to deal with DOM tree changes?
> ...
> with open('C:/Python34/email.csv','w') as f:
>     z=csv.writer(f, delimiter='\t',lineterminator = '\n',)
>     while True:
>         row = []
>         for link in driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]"):
The "find_elements_by_xpath" call likely gives you a list of elements
on the *initial* page. The "for" loop then iterates over this list,
so every "link" it yields belongs to that initial page.
>             try:
>                 ...
>                 c=driver.find_element_by_id('detail-pagination-next-btn')
>                 ...
>                 c.click()
>                 ...
>             except StaleElementReferenceException as e:
This "click" may change the page which means that
you would now be on a page different from the initial page.
It is likely that your web access framework contains some form
of automatic garbage collection: when you switch to a new page,
references to the old page may become stale.
This could explain why you sometimes observe a
"StaleElementReferenceException".
Read the documentation of your web access framework to find
out whether you can control when elements become stale as you
switch pages.
If this is impossible, avoid accessing elements of a page
once you have switched to a new page. To this end, extract (and store)
all relevant information from the page before you switch to a new one
and, if necessary, use the extracted information to later restore
the previous page state.
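The advice above could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested fix for the site in question: the function name "scrape_all_pages", its parameters, and the "wait_until_replaced" callback are all hypothetical, and the sketch assumes the "next" button is simply absent on the last page. It relies on the driver's generic find_elements(by, value) method with the plain strings "xpath" and "id" (the values behind By.XPATH and By.ID). Note also that csv.writer.writerow expects a sequence of fields; given a bare string, it writes one field per character, so the sketch passes a list.

```python
import csv


def scrape_all_pages(driver, out_file, row_xpath, next_id,
                     wait_until_replaced, max_pages=1000):
    """Write one tab-separated row per page; return the number of pages read.

    All names here are illustrative. `wait_until_replaced(driver, element)`
    is a hypothetical callback that blocks until the old page is gone.
    """
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    pages = 0
    while pages < max_pages:  # guard against paging forever
        # Look the elements up again on every page; never reuse old ones.
        elements = driver.find_elements("xpath", row_xpath)
        # Copy the text into plain strings, which cannot go stale.
        texts = [element.text for element in elements]
        writer.writerow(texts)  # a list of fields, not a bare string
        pages += 1

        # Assumption: the "next" button is absent on the last page.
        buttons = driver.find_elements("id", next_id)
        if not buttons:
            break
        # Keep one old element only so the callback can watch it disappear.
        marker = elements[0] if elements else buttons[0]
        buttons[0].click()
        # After the click, `elements` and `buttons` may be stale; the loop
        # does not read them again and re-locates everything next time round.
        wait_until_replaced(driver, marker)
    return pages
```

With real Selenium, the callback could plausibly be something like lambda d, el: WebDriverWait(d, 50).until(EC.staleness_of(el)), which waits for the old element to be detached instead of sleeping for a fixed time. Passing the wait in as a parameter is just a design choice that keeps the loop itself free of framework details.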
As you can see from my explanation: your question is much more
related to your web access framework than to Python in general.
Likely, there is a forum (dedicated to this framework)
that can better help you with this question than this general
Python list.