Re: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Newsgroups comp.lang.python
Date 2015-05-09 03:31 -0700
References <f4d02ce1-f528-4632-acb1-af667690a064@googlegroups.com> <mailman.260.1431113391.12865.python-list@python.org> <8360473a-45ac-4270-9bf3-81932da5f223@googlegroups.com> <554d768d$0$13000$c3e8da3$5496439d@news.astraweb.com>
Message-ID <580ee0d6-a703-4da3-af2d-105589a1780f@googlegroups.com> (permalink)
Subject Re: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
From zljubisicmob@gmail.com



Steven,

please have a look at the code below:

# C:\Users\zoran\PycharmProjects\mm_align\hrt3.cfg contents
# [Dir]
# ROOTDIR = C:\Users\zoran\hrt


import os
import shutil
import configparser
import requests
import re

Config = configparser.ConfigParser()
Config.optionxform = str # preserve case in ini file
cfg_file = 'C:\\Users\\zoran\\PycharmProjects\\mm_align\\hrt3.cfg'
Config.read(cfg_file)



ROOTDIR = Config.get('Dir', 'ROOTDIR')

print(ROOTDIR)

html = requests.get("http://radio.hrt.hr/prvi-program/arhiva/ujutro-prvi-poligraf-politicki-grafikon/118/").text

art_html = re.search('<article id="aod_0">(.+?)</article>', html, re.DOTALL).group(1)
for p_tag in re.finditer(r'<p>(.*?)</p>', art_html, re.DOTALL):
    if '<strong>' not in p_tag.group(1):
        title = p_tag.group(1)

# Truncate first, then replace characters that are awkward in file names.
title = title[:232]
title = title.replace(" ", "_").replace("/", "_").replace("!", "_").replace("?", "_")\
                    .replace('"', "_").replace(':', "_").replace(',', "_").replace('&#34;', '')\
                    .replace('\n', '_').replace('&#39;', '')

print(title)

src_file = os.path.join(ROOTDIR, 'src_' + title + '.txt')
dst_file = os.path.join(ROOTDIR, 'des_' + title + '.txt')

print(len(src_file), src_file)
print(len(dst_file), dst_file)

with open(src_file, mode='w', encoding='utf-8') as s_file:
    s_file.write('test')


shutil.move(src_file, dst_file)

It works, but if you change title = title[:232] to title = title[:233], you will get "FileNotFoundError: [Errno 2] No such file or directory".
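
One thing worth checking is whether the total path length is what changes between the two cases. The sketch below is only an illustration. It assumes the sanitised title keeps its full 232 or 233 characters (the replace calls that map entities to '' can shorten it) and uses a placeholder title of repeated 'x'. The helper name dst_path_length is made up for this example. ntpath is used so the arithmetic comes out the same on any platform. The figure 260 is the classic Windows MAX_PATH limit, which counts the terminating NUL, so the longest usable path would be 259 characters.

```python
import ntpath  # Windows path rules, importable on any platform

ROOTDIR = r'C:\Users\zoran\hrt'   # value from hrt3.cfg
MAX_PATH = 260                    # classic Windows limit, counts the trailing NUL


def dst_path_length(title_len):
    """Length of the 'des_' path for a placeholder title of title_len characters."""
    title = 'x' * title_len       # placeholder, not the real scraped title
    return len(ntpath.join(ROOTDIR, 'des_' + title + '.txt'))


for n in (232, 233):
    length = dst_path_length(n)
    print(n, length, 'fits' if length < MAX_PATH else 'too long')
```

Under those assumptions, 232 gives a path just under the limit and 233 reaches it, which would match the point at which the error appears.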
As you can see ROOTDIR contains \U.
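
For comparison, \U escapes are processed only in string literals in Python source code, not in text read from a file at run time. The following is a small sketch under one assumption: the config content is the one shown in the comments at the top of the script, passed inline through read_string instead of being read from disk.

```python
import configparser

# Same content as the [Dir] section of hrt3.cfg, given inline as a raw string
# so that no file on disk is needed for this check.
CFG_TEXT = r"""
[Dir]
ROOTDIR = C:\Users\zoran\hrt
"""

config = configparser.ConfigParser()
config.optionxform = str          # preserve case of option names, as in the script
config.read_string(CFG_TEXT)

rootdir = config.get('Dir', 'ROOTDIR')
print(repr(rootdir))              # backslashes arrive unchanged, no escape decoding
```

If the value prints with its backslashes intact, the \U in ROOTDIR is probably not what triggers the FileNotFoundError.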

Regards.



