| Newsgroups | comp.lang.python |
|---|---|
| Date | 2015-05-09 03:31 -0700 |
| References | <f4d02ce1-f528-4632-acb1-af667690a064@googlegroups.com> <mailman.260.1431113391.12865.python-list@python.org> <8360473a-45ac-4270-9bf3-81932da5f223@googlegroups.com> <554d768d$0$13000$c3e8da3$5496439d@news.astraweb.com> |
| Message-ID | <580ee0d6-a703-4da3-af2d-105589a1780f@googlegroups.com> |
| Subject | Re: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape |
| From | zljubisicmob@gmail.com |
Steven,
please do look at the code below:
# C:\Users\zoran\PycharmProjects\mm_align\hrt3.cfg contents
# [Dir]
# ROOTDIR = C:\Users\zoran\hrt

import os
import shutil
import configparser
import requests
import re

Config = configparser.ConfigParser()
Config.optionxform = str # preserve case in ini file
cfg_file = os.path.join('C:\\Users\\zoran\\PycharmProjects\\mm_align\\hrt3.cfg')
Config.read(cfg_file)

ROOTDIR = Config.get('Dir', 'ROOTDIR')
print(ROOTDIR)

html = requests.get("http://radio.hrt.hr/prvi-program/arhiva/ujutro-prvi-poligraf-politicki-grafikon/118/").text
art_html = re.search('<article id="aod_0">(.+?)</article>', html, re.DOTALL).group(1)

for p_tag in re.finditer(r'<p>(.*?)</p>', art_html, re.DOTALL):
    if '<strong>' not in p_tag.group(1):
        title = p_tag.group(1)
        title = title[:232]
        title = title.replace(" ", "_").replace("/", "_").replace("!", "_").replace("?", "_")\
            .replace('"', "_").replace(':', "_").replace(',', "_").replace('&quot;', '')\
            .replace('\n', '_').replace('&#39;', '')
        print(title)

        src_file = os.path.join(ROOTDIR, 'src_' + title + '.txt')
        dst_file = os.path.join(ROOTDIR, 'des_' + title + '.txt')
        print(len(src_file), src_file)
        print(len(dst_file), dst_file)

        with open(src_file, mode='w', encoding='utf-8') as s_file:
            s_file.write('test')
        shutil.move(src_file, dst_file)
It works, but if you change title = title[:232] to title = title[:233], you will get "FileNotFoundError: [Errno 2] No such file or directory".
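The 232/233 boundary can be checked with a little arithmetic. The sketch below only recomputes the path lengths that the script prints; the `MAX_PATH` value of 260 (which counts the terminating NUL, leaving 259 usable characters) is an assumption about the classic Windows limit, and `full_path_length` is a hypothetical helper, not part of the script above. It hints that the path length, rather than the title text, may be what matters.

```python
import ntpath  # Windows path rules, importable on any platform

ROOTDIR = r'C:\Users\zoran\hrt'  # value from hrt3.cfg
MAX_PATH = 260  # assumed classic Windows limit, counting the terminating NUL


def full_path_length(title_length, prefix='src_', suffix='.txt'):
    """Length of the joined path for a title of the given length."""
    title = 'x' * title_length  # placeholder title, only the length matters
    return len(ntpath.join(ROOTDIR, prefix + title + suffix))


for n in (232, 233):
    length = full_path_length(n)
    print(n, length, 'fits' if length < MAX_PATH else 'too long')
```

The 'des_' prefix has the same length as 'src_', so both file names cross the assumed limit at the same title length.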
As you can see, ROOTDIR contains \U.
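For comparison, here is a minimal sketch of how the same \U behaves in the two places it can appear. It assumes the cfg contents shown in the comment at the top of the script and feeds them in through `read_string` instead of reading the file from disk; the names `CFG_TEXT`, `config` and `rootdir` are only for illustration. As far as I can tell, configparser hands the backslashes back untouched, while the same text written as a normal string literal in source code does not compile.

```python
import configparser

# Same contents as hrt3.cfg, supplied inline instead of read from disk.
CFG_TEXT = r"""
[Dir]
ROOTDIR = C:\Users\zoran\hrt
"""

config = configparser.ConfigParser()
config.optionxform = str  # preserve case in ini file
config.read_string(CFG_TEXT)

rootdir = config.get('Dir', 'ROOTDIR')
print(rootdir)       # backslashes come back as plain characters
print(len(rootdir))

# The same path as a normal (non-raw) string literal in source code
# is rejected by the compiler because of the \U escape.
try:
    compile("p = 'C:\\Users\\zoran\\hrt'", '<demo>', 'exec')
except SyntaxError as exc:
    print('SyntaxError:', exc.msg)
```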
Regards.