| Started by | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| First post | 2013-08-31 09:41 +0300 |
| Last post | 2013-09-02 20:49 -0400 |
| Articles | 20 on this page of 50 — 11 participants |
UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 09:41 +0300
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-08-31 16:53 +1000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:02 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:18 +0300
Re: UnicodeDecodeError issue Peter Otten <__peter__@web.de> - 2013-08-31 09:25 +0200
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:58 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 11:31 +0300
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-08-31 11:28 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 15:58 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 16:07 +0300
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-08-31 15:44 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-08-31 23:50 -0700
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 17:12 +1000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 10:23 +0300
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 17:28 +1000
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 10:35 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 16:59 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 15:40 +0000
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 20:51 +1000
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-09-01 08:35 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 17:08 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 17:25 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 15:36 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 19:10 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 01:23 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 23:14 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 07:16 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 11:38 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 14:49 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 12:21 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 18:05 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 18:28 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-09-04 01:35 -0700
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-04 11:26 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-04 14:38 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-04 12:38 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-04 17:29 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-05 00:17 +0000
Re: UnicodeDecodeError issue Steven D'Aprano <steve@pearwood.info> - 2013-09-05 03:07 +0000
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-05 13:59 +1000
Re: UnicodeDecodeError issue Steven D'Aprano <steve@pearwood.info> - 2013-09-05 05:28 +0000
Re: UnicodeDecodeError issue MRAB <python@mrabarnett.plus.com> - 2013-09-02 12:56 +0100
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 12:24 +0000
Re: UnicodeDecodeError issue MRAB <python@mrabarnett.plus.com> - 2013-09-02 15:44 +0100
Re: UnicodeDecodeError issue wxjmfauth@gmail.com - 2013-09-03 08:23 -0700
Re: UnicodeDecodeError issue Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-09-04 10:01 +0200
Re: UnicodeDecodeError issue wxjmfauth@gmail.com - 2013-09-04 07:08 -0700
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-03 08:45 +1000
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-09-03 14:56 +0000
Re: UnicodeDecodeError issue Joel Goldstick <joel.goldstick@gmail.com> - 2013-09-02 20:49 -0400
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-01 17:08 +0300 |
| Message-ID | <kvvhoi$1v4o$1@news.ntua.gr> |
| In reply to | #53406 |
On 1/9/2013 11:35 AM, Steven D'Aprano wrote:
> On Sat, 31 Aug 2013 23:50:23 -0700, Ferrous Cranus wrote:
>
>> On Saturday, 31 August 2013 9:41:27 AM UTC+3, Ferrous Cranus wrote:
>>> Suddenly my website superhost.gr running my main python script presents
>>> me with this error:
>>>
>>> Code:
>>>
>>> UnicodeDecodeError('utf-8', b'\xb6\xe3\xed\xf9\xf3\xf4\xef
>>> \xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2', 0, 1,
>>> 'invalid start byte')
>>>
>>> Does anyone know what this means?
>>>
>>> --
>>> Webhost <http://superhost.gr>
>>
>> Good morning Steven,
>>
>> Ye i'm aware that i need to define variables before i try to make use of
>> them. I have study all of your examples and then re-view my code and i
>> can *assure* you that the line statement that tied to set the 'host'
>> variable is very early at the top of the script(of course after
>> imports), and the cur.execute comes after.
>>
>> The problem here is not what you say, that i try to drink k a coffee
>> before actually making one first but rather than i cannot drink the
>> coffee although i know *i have tried* to make one first.
>>
>>
>> i will upload the code for you to prove my sayings at pastebin.
>>
>> http://pastebin.com/J97guApQ
>
>
> You are mistaken. In line 20-25, you have this:
>
> try:
>     gi = pygeoip.GeoIP('/usr/local/share/GeoIPCity.dat')
>     city = gi.time_zone_by_addr( os.environ['REMOTE_ADDR'] ) or gi.time_zone_by_addr( os.environ['HTTP_CF_CONNECTING_IP'] )
>     host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or socket.gethostbyaddr( os.environ['HTTP_CF_CONNECTING_IP'] )[0] or "Proxy Detected"
> except Exception as e:
>     print( repr(e), file=open( '/tmp/err.out', 'w' ) )
>
>
> An error occurs inside that block, *before* host gets set. Who knows what
> the error is? You have access to the err.out file, but apparently you
> aren't reading it to find out.
>
> Then, 110 lines later, at line 135, you try to access the value of "host"
> that never got set.
>
> Your job is to read the error in /tmp/err.out, see what is failing, and
> fix it.
>
>
But I am, Steven! That is why I use it, and I read it immediately after
my script runs in the browser.
I have even included a sys.exit(0) after the try/except block.
Here it is:
errout = open( '/tmp/err.out', 'w' )  # opens and truncates the error output file
try:
    gi = pygeoip.GeoIP('/usr/local/share/GeoIPCity.dat')
    city = gi.time_zone_by_addr( os.environ['REMOTE_ADDR'] ) or gi.time_zone_by_addr( os.environ['HTTP_CF_CONNECTING_IP'] )
    host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or socket.gethostbyaddr( os.environ['HTTP_CF_CONNECTING_IP'] )[0] or "Proxy Detected"
except Exception as e:
    print( "Xyzzy exception-", repr( sys.exc_info() ), file=errout )
    errout.flush()
sys.exit(0)
and the output of the error file is:
nikos@superhost.gr [~]# cat /tmp/err.out
UnicodeDecodeError('utf-8', b'\xb6\xe3\xed\xf9\xf3\xf4\xef
\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2', 0, 1,
'invalid start byte')
--
Webhost <http://superhost.gr>
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-01 17:25 +0300 |
| Message-ID | <kvvioj$21rg$1@news.ntua.gr> |
| In reply to | #53415 |
On 1/9/2013 5:08 PM, Ferrous Cranus wrote:
> On 1/9/2013 11:35 AM, Steven D'Aprano wrote:
>> On Sat, 31 Aug 2013 23:50:23 -0700, Ferrous Cranus wrote:
>>
>>> On Saturday, 31 August 2013 9:41:27 AM UTC+3, Ferrous Cranus wrote:
>>>> Suddenly my website superhost.gr running my main python script presents
>>>> me with this error:
>>>>
>>>> Code:
>>>>
>>>> UnicodeDecodeError('utf-8', b'\xb6\xe3\xed\xf9\xf3\xf4\xef
>>>> \xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2', 0, 1,
>>>> 'invalid start byte')
>>>>
>>>> Does anyone know what this means?
>>>>
>>>> --
>>>> Webhost <http://superhost.gr>
>>>
>>> Good morning Steven,
>>>
>>> Ye i'm aware that i need to define variables before i try to make use of
>>> them. I have study all of your examples and then re-view my code and i
>>> can *assure* you that the line statement that tied to set the 'host'
>>> variable is very early at the top of the script(of course after
>>> imports), and the cur.execute comes after.
>>>
>>> The problem here is not what you say, that i try to drink k a coffee
>>> before actually making one first but rather than i cannot drink the
>>> coffee although i know *i have tried* to make one first.
>>>
>>>
>>> i will upload the code for you to prove my sayings at pastebin.
>>>
>>> http://pastebin.com/J97guApQ
>>
>>
>> You are mistaken. In line 20-25, you have this:
>>
>> try:
>> gi = pygeoip.GeoIP('/usr/local/share/GeoIPCity.dat')
>> city = gi.time_zone_by_addr( os.environ['REMOTE_ADDR'] ) or
>> gi.time_zone_by_addr( os.environ['HTTP_CF_CONNECTING_IP'] )
>> host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or
>> socket.gethostbyaddr( os.environ['HTTP_CF_CONNECTING_IP'] )[0]
>> or "Proxy Detected"
>> except Exception as e:
>> print( repr(e), file=open( '/tmp/err.out', 'w' ) )
>>
>>
>> An error occurs inside that block, *before* host gets set. Who knows what
>> the error is? You have access to the err.out file, but apparently you
>> aren't reading it to find out.
>>
>> Then, 110 lines later, at line 135, you try to access the value of "host"
>> that never got set.
>>
>> Your job is to read the error in /tmp/err.out, see what is failing, and
>> fix it.
>>
>>
>
> But i'm Steven! That why i make use of it to read it immediately after
> my script run at browser time.
>
> i have even included a sys.exit(0) after the try:/except block:
>
> Here is it:
>
>
> errout = open( '/tmp/err.out', 'w' ) # opens and truncates the
> error output file
> try:
> gi = pygeoip.GeoIP('/usr/local/share/GeoIPCity.dat')
> city = gi.time_zone_by_addr( os.environ['REMOTE_ADDR'] ) or
> gi.time_zone_by_addr( os.environ['HTTP_CF_CONNECTING_IP'] )
> host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or
> socket.gethostbyaddr( os.environ['HTTP_CF_CONNECTING_IP'] )[0] or "Proxy
> Detected"
> except Exception as e:
> print( "Xyzzy exception-", repr( sys.exc_info() ), file=errout )
> errout.flush()
>
> sys.exit(0)
>
> and the output of error file is:
>
>
> nikos@superhost.gr [~]# cat /tmp/err.out
> UnicodeDecodeError('utf-8', b'\xb6\xe3\xed\xf9\xf3\xf4\xef
> \xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2', 0, 1,
> 'invalid start byte')
>
But I noticed that err.out and /usr/local/apache/logs/error_log produced
different output.
In any case I checked both:
nikos@superhost.gr [~]# chmod 777 /tmp/err2.out
Output of error_log:
nikos@superhost.gr [~]# [Sun Sep 01 14:23:46 2013] [error] [client 173.245.49.120] Premature end of script headers: metrites.py
[Sun Sep 01 14:23:46 2013] [error] [client 173.245.49.120] File does not exist: /home/nikos/public_html/500.shtml
I have also changed the error output filename, and that file
turns out empty:
nikos@superhost.gr [~]# cat /tmp/err2.out
--
Webhost <http://superhost.gr>
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-01 15:36 +0000 |
| Message-ID | <mailman.450.1378049809.19984.python-list@python.org> |
| In reply to | #53415 |
On 1/9/2013 10:08, Ferrous Cranus wrote:
<snip>
> Here is it:
>
>
> errout = open( '/tmp/err.out', 'w' ) # opens and truncates the error
> output file
> try:
> gi = pygeoip.GeoIP('/usr/local/share/GeoIPCity.dat')
> city = gi.time_zone_by_addr( os.environ['REMOTE_ADDR'] ) or
> gi.time_zone_by_addr( os.environ['HTTP_CF_CONNECTING_IP'] )
> host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or
> socket.gethostbyaddr( os.environ['HTTP_CF_CONNECTING_IP'] )[0] or "Proxy
> Detected"
> except Exception as e:
> print( "Xyzzy exception-", repr( sys.exc_info() ), file=errout )
> errout.flush()
>
> sys.exit(0)
>
> and the output of error file is:
>
>
> nikos@superhost.gr [~]# cat /tmp/err.out
> UnicodeDecodeError('utf-8', b'\xb6\xe3\xed\xf9\xf3\xf4\xef
> \xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2', 0, 1,
> 'invalid start byte')
>
Nope. The label "Xyzzy exception" is not in that file, so that's not
the file you created in this run. Further, if that line existed before,
it would have been wiped out by the open with mode "w".
I suggest you add yet another write to that file, immediately after
opening it:
errout = open( '/tmp/err.out', 'w' )  # opens and truncates the error output file
print("starting run", file=errout)
errout.flush()
Until you can reliably examine the same file that was logging your
errors, you're just spinning your wheels. You might even want to write
the time to the file, so that you can tell whether it was now, or 2 days
ago that the run was made.
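A minimal sketch of that suggestion, assuming Python 3. The helper name open_run_log and the temporary path used in the demo are illustrative only and are not part of the original script:

```python
import datetime
import os
import tempfile


def open_run_log(path):
    """Open (and truncate) the log, then write a timestamped start marker."""
    log = open(path, "w", encoding="utf-8")
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    print("starting run", stamp, "pid", os.getpid(), file=log)
    log.flush()  # make the marker visible even if the script dies later
    return log


# Demo against a throwaway file; the real script would pass '/tmp/err.out'.
demo_path = os.path.join(tempfile.mkdtemp(), "err.out")
errout = open_run_log(demo_path)
print("Xyzzy exception-", "example entry", file=errout)
errout.close()

with open(demo_path, encoding="utf-8") as f:
    first_line = f.readline()
print(first_line.strip())
```

If the timestamp in the marker does not match the run you just triggered, you are looking at a stale file or at a different path.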
--
DaveA
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-01 19:10 +0300 |
| Message-ID | <kvvoua$2ifa$1@news.ntua.gr> |
| In reply to | #53417 |
On 1/9/2013 6:36 PM, Dave Angel wrote:
> On 1/9/2013 10:08, Ferrous Cranus wrote:
>
> <snip>
>> Here is it:
>>
>>
>> errout = open( '/tmp/err.out', 'w' ) # opens and truncates the error
>> output file
>> try:
>> gi = pygeoip.GeoIP('/usr/local/share/GeoIPCity.dat')
>> city = gi.time_zone_by_addr( os.environ['REMOTE_ADDR'] ) or
>> gi.time_zone_by_addr( os.environ['HTTP_CF_CONNECTING_IP'] )
>> host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or
>> socket.gethostbyaddr( os.environ['HTTP_CF_CONNECTING_IP'] )[0] or "Proxy
>> Detected"
>> except Exception as e:
>> print( "Xyzzy exception-", repr( sys.exc_info() ), file=errout )
>> errout.flush()
>>
>> sys.exit(0)
>>
>> and the output of error file is:
>>
>>
>> nikos@superhost.gr [~]# cat /tmp/err.out
>> UnicodeDecodeError('utf-8', b'\xb6\xe3\xed\xf9\xf3\xf4\xef
>> \xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2', 0, 1,
>> 'invalid start byte')
>>
>
> Nope. The label "Xyzzy exception" is not in that file, so that's not
> the file you created in this run. Further, if that line existed before,
> it would have been wiped out by the open with mode "w".
>
> i suggest you add yet another write to that file, immediately after
> opening it:
>
> errout = open( '/tmp/err.out', 'w' ) # opens and truncates the error
> print("starting run", file=errorout)
> errout.flush()
>
> Until you can reliably examine the same file that was logging your
> errors, you're just spinning your wheels. you might even want to write
> the time to the file, so that you can tell whether it was now, or 2 days
> ago that the run was made.
>
>
I tried it and it printed nothing.
But suddenly the webpage started to run, and I get no invalid byte entries
and no weird messages; files.py is working as expected.
What on earth?
Now I have this error:
# =================================================================================================================
# DATABASE INSERTS - do not increment the counter if a Cookie is set to the visitors browser already
# =================================================================================================================
if( not vip and re.search( r'(msn|gator|amazon|yandex|reverse|cloudflare|who|fetch|barracuda|spider|google|crawl|pingdom)', host ) is None ):
    print( "i'm in and data is: ", host )
    try:
        #find the needed counter for the page URL
        if os.path.exists( path + page ) or os.path.exists( cgi_path + page ):
            cur.execute('''SELECT ID FROM counters WHERE url = %s''', page )
            data = cur.fetchone()        #URL is unique, so should only be one
        if not data:
            #first time for page; primary key is automatic, hit is defaulted
            cur.execute('''INSERT INTO counters (url) VALUES (%s)''', page )
            cID = cur.lastrowid        #get the primary key value of the new record
        else:
            #found the page, save primary key and use it to issue hit UPDATE
            cID = data[0]
            cur.execute('''UPDATE counters SET hits = hits + 1 WHERE ID = %s''', cID )
        #find the visitor record for the (saved) cID and current host
        cur.execute('''SELECT * FROM visitors WHERE counterID = %s and host = %s''', (cID, host) )
        data = cur.fetchone()        #cID&host are unique
        if not data:
            #first time for this host on this page, create new record
            cur.execute('''INSERT INTO visitors (counterID, host, city, useros, browser, lastvisit) VALUES (%s, %s, %s, %s, %s, %s)''', (cID, host, city, useros, browser, date) )
        else:
            #found the page, save its primary key for later use
            vID = data[0]
            #UPDATE record using retrieved vID
            cur.execute('''UPDATE visitors SET city = %s, useros = %s, browser = %s, hits = hits + 1, lastvisit = %s
                           WHERE counterID = %s and host = %s''', (city, useros, browser, date, vID, host) )
        con.commit()        #if we made it here, the transaction is complete
    except pymysql.ProgrammingError as e:
        print( repr(e) )
        con.rollback()        #something failed, rollback the entire transaction
        sys.exit(0)
I get no counter increment when visitors visit my webpage.
What on earth is going on?
How did the previous error with the invalid byte get solved?
--
Webhost <http://superhost.gr>
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-02 01:23 +0300 |
| Message-ID | <l00epm$1ce3$1@news.ntua.gr> |
| In reply to | #53419 |
On 1/9/2013 7:10 PM, Ferrous Cranus wrote:
> On 1/9/2013 6:36 PM, Dave Angel wrote:
>> On 1/9/2013 10:08, Ferrous Cranus wrote:
>>
>> <snip>
>>> Here is it:
>>>
>>>
>>> errout = open( '/tmp/err.out', 'w' ) # opens and truncates the
>>> error
>>> output file
>>> try:
>>> gi = pygeoip.GeoIP('/usr/local/share/GeoIPCity.dat')
>>> city = gi.time_zone_by_addr( os.environ['REMOTE_ADDR'] ) or
>>> gi.time_zone_by_addr( os.environ['HTTP_CF_CONNECTING_IP'] )
>>> host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or
>>> socket.gethostbyaddr( os.environ['HTTP_CF_CONNECTING_IP'] )[0] or "Proxy
>>> Detected"
>>> except Exception as e:
>>> print( "Xyzzy exception-", repr( sys.exc_info() ), file=errout )
>>> errout.flush()
>>>
>>> sys.exit(0)
>>>
>>> and the output of error file is:
>>>
>>>
>>> nikos@superhost.gr [~]# cat /tmp/err.out
>>> UnicodeDecodeError('utf-8', b'\xb6\xe3\xed\xf9\xf3\xf4\xef
>>> \xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2', 0, 1,
>>> 'invalid start byte')
>>>
>>
>> Nope. The label "Xyzzy exception" is not in that file, so that's not
>> the file you created in this run. Further, if that line existed before,
>> it would have been wiped out by the open with mode "w".
>>
>> i suggest you add yet another write to that file, immediately after
>> opening it:
>>
>> errout = open( '/tmp/err.out', 'w' ) # opens and truncates the
>> error
>> print("starting run", file=errorout)
>> errout.flush()
>>
>> Until you can reliably examine the same file that was logging your
>> errors, you're just spinning your wheels. you might even want to write
>> the time to the file, so that you can tell whether it was now, or 2 days
>> ago that the run was made.
>>
>>
>
>
> I tried it and it printed nothing.
> But suddenly thw ebpage sttaed to run and i get n invalid byte entried
> and no weird messge files.py is working as expcted.
> what on earht?
>
> Now i ahve thso error:
>
> #
> =================================================================================================================
>
> # DATABASE INSERTS - do not increment the counter if a Cookie is set to
> the visitors browser already
> #
> =================================================================================================================
>
> if( not vip and re.search(
> r'(msn|gator|amazon|yandex|reverse|cloudflare|who|fetch|barracuda|spider|google|crawl|pingdom)',
> host ) is None ):
>
> print( "i'm in and data is: ", host )
> try:
> #find the needed counter for the page URL
> if os.path.exists( path + page ) or os.path.exists( cgi_path +
> page ):
> cur.execute('''SELECT ID FROM counters WHERE url = %s''',
> page )
> data = cur.fetchone() #URL is unique, so should only
> be one
>
> if not data:
> #first time for page; primary key is automatic, hit is
> defaulted
> cur.execute('''INSERT INTO counters (url) VALUES (%s)''',
> page )
> cID = cur.lastrowid #get the primary key value of
> the new record
> else:
> #found the page, save primary key and use it to issue hit
> UPDATE
> cID = data[0]
> cur.execute('''UPDATE counters SET hits = hits + 1 WHERE ID
> = %s''', cID )
>
> #find the visitor record for the (saved) cID and current host
> cur.execute('''SELECT * FROM visitors WHERE counterID = %s and
> host = %s''', (cID, host) )
> data = cur.fetchone() #cID&host are unique
>
> if not data:
> #first time for this host on this page, create new record
> cur.execute('''INSERT INTO visitors (counterID, host, city,
> useros, browser, lastvisit) VALUES (%s, %s, %s, %s, %s, %s)''', (cID,
> host, city, useros, browser, date) )
> else:
> #found the page, save its primary key for later use
> vID = data[0]
> #UPDATE record using retrieved vID
> cur.execute('''UPDATE visitors SET city = %s, useros = %s,
> browser = %s, hits = hits + 1, lastvisit = %s
> WHERE counterID = %s and host =
> %s''', (city, useros, browser, date, vID, host) )
>
> con.commit() #if we made it here, the transaction is
> complete
>
> except pymysql.ProgrammingError as e:
> print( repr(e) )
> con.rollback() #something failed, rollback the entire
> transaction
> sys.exit(0)
>
>
> i get no counter increment when visitors visit my webpage.
> What on eart is going on?
>
> How the previous error with the invalid byte somehtign got solved?
>
I still wonder how the invalid byte message disappeared.
--
Webhost <http://superhost.gr>
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-01 23:14 +0000 |
| Message-ID | <mailman.462.1378077287.19984.python-list@python.org> |
| In reply to | #53437 |
On 1/9/2013 18:23, Ferrous Cranus wrote:
<snip>
>>
> i still wonder how come the invalid byte messge dissapeared
>
Too bad you never bothered to narrow it down to its source. It could
be anywhere on those three lines. If I had to guess, I'd figure it was
one of those environment variables. The Linux environment variables are
strings of bytes, and the os.environ is a dict of strings. Apparently
it converts them using utf-8, and if you've somehow set them using some
other encoding, you could be getting that error.
Have you tried to decode those bytes in various encodings other than
utf-8 ?
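One way to narrow the failure down to its source is to run each statement of the try block as its own step and log which one raised. This is only a sketch: run_steps is a hypothetical helper, and the lambdas are stand-ins for the real pygeoip and socket calls, which are not reproduced here.

```python
import io
import traceback


def run_steps(steps, log):
    """Run each (label, callable) pair separately so a failure names its step."""
    results = {}
    for label, func in steps:
        try:
            results[label] = func()
        except Exception:
            print("step failed:", label, file=log)
            traceback.print_exc(file=log)  # full traceback of the current error
            log.flush()
    return results


# Stand-in steps: the first succeeds with a placeholder value, the second
# reproduces the same kind of decode error on purpose.
log = io.StringIO()
steps = [
    ("city", lambda: "Europe/Athens"),
    ("host", lambda: b"\xb6\xe3\xed".decode("utf-8")),
]
results = run_steps(steps, log)
print(results)
print(log.getvalue())
```

With one label per statement, the log says which call produced the UnicodeDecodeError instead of leaving three candidate lines.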
--
Signature file not found
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-02 07:16 +0300 |
| Message-ID | <l013f1$230h$1@news.ntua.gr> |
| In reply to | #53438 |
On 2/9/2013 2:14 AM, Dave Angel wrote:
> On 1/9/2013 18:23, Ferrous Cranus wrote:
>
> <snip>
>>>
>> i still wonder how come the invalid byte messge dissapeared
>>
>
> Too bad you never bothered to narrow it down to its source.
If only I had known how, up until yesterday when they were still appearing.
> It could
> be anywhere on those three lines. If I had to guess, I'd figure it was
> one of those environment variables. The Linux environment variables are
> strings of bytes, and the os.environ is a dict of strings. Apparently
> it converts them using utf-8, and if you've somehow set them using some
> other encoding, you could be getting that error.
>
> Have you tried to decode those bytes in various encodings other than
> utf-8 ?
No, because I wasn't aware of which string/variable they pertained to.
--
Webhost <http://superhost.gr>
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-02 11:38 +0000 |
| Message-ID | <mailman.484.1378121913.19984.python-list@python.org> |
| In reply to | #53453 |
On 2/9/2013 00:16, Ferrous Cranus wrote:
>>
>> Have you tried to decode those bytes in various encodings other than
>> utf-8 ?
>
>
> No, because i wasn't aware of what string/variable they were pertaining at.
>
>
http://pypi.python.org/pypi/chardet
is a package which tries to 'guess' an encoding for a string of bytes.
I happen to have the 2.7 version installed, but not the 3.x version, so
the following is in 2.7. Same thing should work in 3.3....
>>> chardet.detect(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')
{'confidence': 0.9638983132261467, 'encoding': 'windows-1253'}
>>> print b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2'.decode('windows-1253')
¶γνωστοόνομα συστήματος
I don't have a clue what it might be; it's not English, and I don't
know whatever language it may be in.
Does that string make any sense to you? You may want to try it on your
own machine, since the email may obscure the encoding. Or you might
want to do the decode using whatever the default encoding is for that
server.
The Linux 'file' utility thinks this string is in ISO-8859, so you might
want to try a decode('ISO-8859-1') as well. (and maybe ISO-8859-2, -3,
-4, and -5)
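The trial-and-error decoding suggested above can be sketched in Python 3 as follows. The candidate list is an assumption: it adds ISO-8859-7, the Greek member of the ISO-8859 family, because the text turned out to be Greek. The helper name try_decodings is illustrative.

```python
# Byte string copied from the chardet call above.
raw = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
       b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')


def try_decodings(data, encodings):
    """Return {encoding: decoded text, or the exception that decoding raised}."""
    outcome = {}
    for name in encodings:
        try:
            outcome[name] = data.decode(name)
        except UnicodeDecodeError as exc:
            outcome[name] = exc
    return outcome


results = try_decodings(raw, ["utf-8", "cp1253", "iso8859_7", "latin-1"])
for name, value in results.items():
    print(name, "->", repr(value))
```

Note that latin-1 never fails, because every byte maps to some character, so a successful decode alone does not prove the encoding is right; the result still has to read as sensible text.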
--
DaveA
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-02 14:49 +0300 |
| Message-ID | <l01tvf$1a8k$1@news.ntua.gr> |
| In reply to | #53474 |
On 2/9/2013 2:38 PM, Dave Angel wrote:
> On 2/9/2013 00:16, Ferrous Cranus wrote:
>
>
>>>
>>> Have you tried to decode those bytes in various encodings other than
>>> utf-8 ?
>>
>>
>> No, because i wasn't aware of what string/variable they were pertaining at.
>>
>>
>
> http://pypi.python.org/pypi/chardet
>
> is a package which tries to 'guess' an encoding for a string of bytes.
> I happen to have the 2.7 version installed, but not the 3.x version, so
> the following is in 2.7. Same thing should work in 3.3....
>
>>>> chardet.detect(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')
> {'confidence': 0.9638983132261467, 'encoding': 'windows-1253'}
>>>> print b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2'.decode('windows-1253')
> ¶γνωστοόνομα συστήματος
>
>
> I don't have a clue what it might be; it's not English, and I don't
> know whatever language it may be in.
>
> Does that string make any sense to you?
Yes, it does; it means "Unknown Hostname".
> The Linux 'file' utility thinks this string is in ISO-8859, so you might
> want to try a decode('ISO-8859-1') as well. (and maybe ISO-8859-2, -3,
> -4, and -5)
How did you test it? As far as I know, the utility analyzes a file's
encoding, not a string's encoding.
nikos@superhost.gr [~]# file www/cgi-bin/files.py
www/cgi-bin/files.py: a /usr/bin/python script text executable
--
Webhost <http://superhost.gr>
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-02 12:21 +0000 |
| Message-ID | <mailman.487.1378124525.19984.python-list@python.org> |
| In reply to | #53475 |
On 2/9/2013 07:49, Ferrous Cranus wrote:
<snip>
> On 2/9/2013 2:38 PM, Dave Angel wrote:
>>
>> Does that string make any sense to you?
>
> Yes it does, it mean "Unknown Hostname"
>
>> The Linux 'file' utility thinks this string is in ISO-8859, so you might
>> want to try a decode('ISO-8859-1') as well. (and maybe ISO-8859-2, -3,
>> -4, and -5)
>
> How did you test it? The utility afaik analyzes a file's encodings not
> string encodings.
>
Starting with the byte string in the error message:
>>> f = open("junk.txt", "w")
>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>> f.close()
> nikos@superhost.gr [~]# file www/cgi-bin/files.py
> www/cgi-bin/files.py: a /usr/bin/python script text executable
>
>
No point in doing that, as the string in question doesn't exist there.
--
DaveA
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-02 18:05 +0300 |
| Message-ID | <l029ev$2a1g$1@news.ntua.gr> |
| In reply to | #53478 |
On 2/9/2013 3:21 PM, Dave Angel wrote:
> Starting with the byte string in the error message:
>>>> f = open("junk.txt", "w")
>>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>>> f.close()
Indeed, but yet again, file checks the encoding of the filename that
consists of these lines above, not of the actual strings.
--
Webhost <http://superhost.gr>
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-02 18:28 +0000 |
| Message-ID | <mailman.511.1378146537.19984.python-list@python.org> |
| In reply to | #53492 |
On 2/9/2013 11:05, Ferrous Cranus wrote:
> On 2/9/2013 3:21 PM, Dave Angel wrote:
>> Starting with the byte string in the error message:
>>>>> f = open("junk.txt", "w")
>>>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>>>> f.close()
>
>
> Indeed but yet again, file checks out the encoding of the filename that
> consists of these lines above, not of the actual strings.
>
>
'file' does nothing interesting with the filename, it just opens it and
examines the contents. For example,
file www/cgi-bin/files.py
will examine the Python source file, not run it.
So first in the interpreter, I ran
>>> f = open("junk.txt", "w")
>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>> f.close()
then at the bash prompt, I ran:
davea@think2:~$ file junk.txt
junk.txt: ISO-8859 text
davea@think2:~$
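The interpreter session above is Python 2.7. Under Python 3, a file opened in text mode "w" rejects bytes with a TypeError, so the same experiment needs binary mode. A sketch, writing into a temporary directory (an assumption made here to avoid touching the current directory):

```python
import os
import tempfile

# Byte string from the error message, plus the trailing newline used above.
raw = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
       b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')

path = os.path.join(tempfile.mkdtemp(), "junk.txt")

# "wb" writes the bytes exactly as given, with no encoding step in between.
with open(path, "wb") as f:
    f.write(raw)

# Read the file back in binary mode to confirm what landed on disk.
with open(path, "rb") as f:
    written = f.read()

print(path, len(written), "bytes")
```

Running file on the resulting path then inspects exactly these bytes, not the text of any script that produced them.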
--
DaveA
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-09-04 01:35 -0700 |
| Message-ID | <3e549761-4323-4379-b4e4-ce51597d59c0@googlegroups.com> |
| In reply to | #53522 |
On Monday, 2 September 2013 9:28:36 PM UTC+3, Dave Angel wrote:
> On 2/9/2013 11:05, Ferrous Cranus wrote:
>
> > On 2/9/2013 3:21 PM, Dave Angel wrote:
> >> Starting with the byte string in the error message:
> >>>>> f = open("junk.txt", "w")
> >>>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
> >>>>> f.close()
> >
> > Indeed but yet again, file checks out the encoding of the filename that
> > consists of these lines above, not of the actual strings.
>
> 'file' does nothing interesting with the filename, it just opens it and
> examines the contents. For example,
>
> file www/cgi-bin/files.py
>
> will examine the Python source file, not run it.
>
> So first in the interpreter, I ran
>
> >>>> f = open("junk.txt", "w")
> >>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
> >>>> f.close()
>
> then at the bash prompt, I ran:
>
> davea@think2:~$ file junk.txt
> junk.txt: ISO-8859 text
That is one clever idea, Dave.
I take it that the charset of the file 'junk.txt' gets identified from the encoding of the characters read from within the file?
But wait a minute: what editor do you use to write these 3 lines?
I mean, I am a bit confused.
For example, I run 'nano tets.py' on a file which contains:
f = open("junk.txt", "w")
f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
f.close()
then when i save the file within nano for example by default in utf-8 charset
how would it be able to detect the bytestring within that is supposed to be of greek-iso's
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-04 11:26 +0000 |
| Message-ID | <mailman.38.1378294002.5461.python-list@python.org> |
| In reply to | #53609 |
On 4/9/2013 04:35, Ferrous Cranus wrote:
> That is one clever idea, Dave.
>
> I take it that the charset of the file 'junk.txt' gets identified by the encoding of the characters that are read from within the file?
'file' only guesses the most likely encoding for 'junk.txt'. But at
least it can know it's not utf-8, since decoding those bytes as utf-8
would give a decoding error.
That's why, whenever 'file' makes its verdict, it's up to you to check
it by displaying the data after decoding it with that tentative
encoding.
>
> But wait a minute: what editor do you use to write these 3 lines?
> I mean, I am a bit confused.
As I said right above, "in the interpreter, I ran"...
And if that's not clear enough, you can see the >>>> prompts that the
Python interpreter uses. By interpreter, I mean I ran Python with no
parameters. I did not run IDLE or any other IDE, which might take it
upon itself to interfere.
>
> I, for example, run 'nano tets.py', which has within:
>
> f = open("junk.txt", "w")
> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
> f.close()
>
> Then, when I save the file within nano, for example by default in the utf-8 charset,
That's the encoding for the file tets.py, and you'll notice that it's
actually ASCII. Notice that the string I copied from the error message
uses escape sequences for all non-ASCII bytes.
>
> how would it be able to detect the bytestring within, which is supposed to be greek-iso?
I wouldn't be running 'file' on the tets.py file, but on the junk.txt
file created when you run
python tets.py
So since the tets.py file was a sidetrack, I just ran those three lines
in the interpreter.
--
DaveA
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-04 14:38 +0300 |
| Message-ID | <l07641$vh2$1@dont-email.me> |
| In reply to | #53616 |
On 4/9/2013 2:26 PM, Dave Angel wrote:
> I wouldn't be running 'file' on the tets.py file, but on the junk.txt
> file created when you run
> python tets.py
>
> So since the tets.py file was a sidetrack, I just ran those three lines
> in the interpreter.
>
I'm still confused about this.
Say we save those 3 lines inside junk.txt, and we save it by default as utf-8.
When we run 'file junk.txt',
what will file respond with?
The filename's charset?
Or
will it look at the bytestring within to decide what encoding it uses?
--
Webhost <http://superhost.gr>
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-04 12:38 +0000 |
| Message-ID | <mailman.42.1378298304.5461.python-list@python.org> |
| In reply to | #53618 |
On 4/9/2013 07:38, Ferrous Cranus wrote:
> On 4/9/2013 2:26 PM, Dave Angel wrote:
>>
>>>>
>>>> So first in the interpreter, I ran
>>>>
>>>>
>>>>
>>>>>>>> f = open("junk.txt", "w")
>>>>
>>>>>>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>>>
>>>>>>>> f.close()
>>>>
>>>>
>>>>
<snip>
>> So since the tets.py file was a sidetrack, I just ran those three lines
>> in the interpreter.
>>
> I'm still confused about this.
>
> Say we save those 3 lines inside junk.txt, and we save it by default as utf-8.
>
> When we run 'file junk.txt',
>
> what will file respond with?
junk2.txt: ASCII text
>
> The filename's charset?
>
> Or
>
> will it look at the bytestring within to decide what encoding it uses?
>
'file' isn't magic. And again, it doesn't look at the filename, it
looks at the content. What heuristics it uses, I don't know, but it has
hundreds of them. (I wish you hadn't confused the issue by using the
same name junk.txt for an entirely different purpose.) When it looks at a
file like this one, it looks only at the bytes within it. In this
case, the instance of 'file' on my machine decides it's an ASCII file.
If I add a silly shebang line
#!/usr/tmp/pyttthon
it says
junk2.txt: a /usr/tmp/pyttthon script, ASCII text executable
It doesn't know it's python, it just trusts the shebang line. And it
identifies it as ASCII, not utf-8, since there are no non-ascii
characters in it. It certainly does not try to interpret the b'xxxx'
byte string by Python syntax rules.
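The difference between the text of the source line and the value Python builds from it can be sketched as follows, assuming Python 3. The names SOURCE_TEXT, source_bytes, and value are illustrative, and ast.literal_eval stands in for "running the line".

```python
import ast

# The literal as it sits in the .py file. A raw string keeps the
# backslashes as ordinary characters, just as they are on disk.
SOURCE_TEXT = r"b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n'"

# What 'file' gets to look at: every character encodes as plain ASCII.
source_bytes = SOURCE_TEXT.encode("ascii")
source_is_ascii = all(byte < 128 for byte in source_bytes)

# What exists only after Python evaluates the literal.
value = ast.literal_eval(SOURCE_TEXT)
value_has_high_bytes = any(byte >= 128 for byte in value)

print(source_is_ascii, value_has_high_bytes)
```

Both flags come out True: the file on disk is ASCII, and the non-ASCII bytes appear only in the object created at run time.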
--
DaveA
| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Date | 2013-09-04 17:29 +0300 |
| Message-ID | <l07g4b$lcv$3@dont-email.me> |
| In reply to | #53624 |
On 4/9/2013 3:38 PM, Dave Angel wrote:
> 'file' isn't magic. And again, it doesn't look at the filename, it
> looks at the content.
So, you are saying that it looks at the content of the file and not at
what encoding we used to save the file?
But the contents have within:
f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
so it should have said greek-iso and not ascii.
--
Webhost <http://superhost.gr>
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-05 00:17 +0000 |
| Message-ID | <mailman.68.1378340276.5461.python-list@python.org> |
| In reply to | #53627 |
On 4/9/2013 10:29, Ferrous Cranus wrote:
> On 4/9/2013 3:38 PM, Dave Angel wrote:
>> 'file' isn't magic. And again, it doesn't look at the filename, it
>> looks at the content.
> So, you are saying that it looks at the content of the file and not at
> what encoding we used to save the file?
That's right. There's no place where your text editor stores the
encoding it used, so 'file' has to guess, based only on the content.
>
> But the contents have within:
>
> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1
> \xf3\xf\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>
> so it should have said greek-iso and not ascii.
>
No, that line is totally ASCII. Only when it's EXECUTED by Python will
a non ASCII byte string object be created. Like I said, 'file' doesn't
know the first thing about Python syntax, nor should it.
--
Signature file not found
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2013-09-05 03:07 +0000 |
| Message-ID | <5227f57e$0$2743$c3e8da3$76491128@news.astraweb.com> |
| In reply to | #53660 |
On Thu, 05 Sep 2013 00:17:36 +0000, Dave Angel wrote:
> On 4/9/2013 10:29, Ferrous Cranus wrote:
>
>> On 4/9/2013 3:38 PM, Dave Angel wrote:
>>> 'file' isn't magic. And again, it doesn't look at the filename, it
>>> looks at the content.
>> So, you are saying that it looks at the content of the file and not at
>> what encoding we used to save the file?
>
> That's right. There's no place where your text editor stores the
> encoding it used, so 'file' has to guess, based only on the content.
Correct. The thing that people often fail to understand is that there is
no *reliable* way to store the encoding used for a text file in the text
file itself. The encoding is *metadata*, not data: it is data about the
data, and consequently it has to be stored "out of band". It has to be
stored somewhere else, outside of the file.
In the case of text files, it is usually not stored anywhere at all. IBM
mainframes assume that text files are using EBCDIC; modern Linux systems
assume text files are UTF-8; old DOS applications assume text files are
ASCII. Some text editors will try to guess the encoding, using various
heuristics such as "if the file starts with \xFE\xFF it is UTF-16" but
none of them are foolproof:
http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx
sometimes with amusing consequences:
http://www.hoax-slayer.com/bush-hid-the-facts-notepad.html
>> But the contents have within:
>>
>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1
>> \xf3\xf\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>
>> so it should have said greek-iso and not ascii.
But the above byte string is also valid ISO-8859-5 (Cyrillic):
'Жуэљѓєяќэяьсѓ\x0fѓєоьсєяђ\n'
ISO-8859-2 (Central European):
'śăíůóôďüíďěáó\x0fóôŢěáôďň\n'
and ISO-8859-4 (Baltic):
'ļãíųķôīüíīėáķ\x0fķôŪėáôīō\n'
Surely you don't expect the file utility to actually recognise that
'Άγνωστοόνομασ\x0fστήματος\n'
makes a valid Greek phrase while the others are not meaningful?
> No, that line is totally ASCII. Only when it's EXECUTED by Python will
> a non ASCII byte string object be created. Like I said, 'file' doesn't
> know the first thing about Python syntax, nor should it.
Technically, it's not ASCII, since ASCII only knows about bytes \x00
through \x7F (decimal 0 through 127). That's why it isn't correct to
describe Python bytes strings as "ASCII strings". They're byte strings
that happen to be displayed as ASCII-plus-other-stuff.
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-09-05 13:59 +1000 |
| Message-ID | <mailman.71.1378353565.5461.python-list@python.org> |
| In reply to | #53664 |
On Thu, Sep 5, 2013 at 1:07 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> Technically, it's not ASCII, since ASCII only knows about bytes \x00
> through \x7F (decimal 0 through 127). That's why it isn't correct to
> describe Python bytes strings as "ASCII strings". They're byte strings
> that happen to be displayed as ASCII-plus-other-stuff.
The line of code is itself entirely ASCII. The sequence REVERSE SOLIDUS,
LATIN SMALL LETTER X, LATIN SMALL LETTER B, DIGIT SIX is four Unicode
characters that are in the ASCII set. That Python interprets them as
representing the byte value 182 doesn't change that; the line of code
*is* ASCII.
ChrisA
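For anyone who wants to see those four characters for themselves, here is a small sketch assuming Python 3. The variable names are illustrative.

```python
import unicodedata

# The escape sequence as typed in the source file: four separate characters.
escape_text = r"\xb6"

names = [unicodedata.name(ch) for ch in escape_text]
code_points = [ord(ch) for ch in escape_text]
all_ascii = all(cp < 128 for cp in code_points)

# The single byte Python creates when it interprets that escape.
byte_value = b"\xb6"[0]

print(names)
print(code_points, all_ascii)
print(byte_value)
```

The code points are 92, 120, 98 and 54, all below 128, while the byte produced from them is 182.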