

Groups > comp.lang.python > #91736 > unrolled thread

Parser needed.

Started by: "Skybuck Flying" <skybuck2000@hotmail.com>
First post: 2015-06-02 02:29 +0200
Last post: 2015-06-09 04:31 +0200
Articles: 17 on this page of 57, 10 participants



Contents

  Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 02:29 +0200
    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 02:55 +0200
      Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:02 +0200
        Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:10 +0200
          Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:16 +0200
            Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:27 +0200
              Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:29 +0200
                Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:31 +0200
                  Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:34 +0200
                    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 04:02 +0200
                      Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 04:22 +0200
                        Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 04:27 +0200
                          Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 04:39 +0200
                  Re: Parser needed. Joel Goldstick <joel.goldstick@gmail.com> - 2015-06-01 21:41 -0400
                    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 04:04 +0200
    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 02:56 +0200
    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:01 +0200
    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:03 +0200
      Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:04 +0200
    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:07 +0200
      Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:07 +0200
    Re: Parser needed. Michael Torrie <torriem@gmail.com> - 2015-06-01 19:12 -0600
      Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:19 +0200
        Re: Parser needed. Michael Torrie <torriem@gmail.com> - 2015-06-01 19:31 -0600
          Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 03:35 +0200
            Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 04:11 +0200
              Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 04:20 +0200
        Re: Parser needed. Grant Edwards <invalid@invalid.invalid> - 2015-06-02 13:48 +0000
    Re: Parser needed. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-02 04:42 +0100
    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 06:23 +0200
      Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 06:29 +0200
        Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 06:39 +0200
          Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 06:45 +0200
            Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 06:51 +0200
              Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 06:53 +0200
            Re: Parser needed. MRAB <python@mrabarnett.plus.com> - 2015-06-02 17:43 +0100
              Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-02 19:09 +0200
            Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 18:01 +0200
    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 19:02 +0200
      Re: Parser needed. Steven D'Aprano <steve@pearwood.info> - 2015-06-05 03:14 +1000
        Re: Parser needed. Michael Torrie <torriem@gmail.com> - 2015-06-04 11:25 -0600
      Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 19:23 +0200
        Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 19:34 +0200
          Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 19:38 +0200
            Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 19:44 +0200
              Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 19:50 +0200
                Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-04 20:00 +0200
                  Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-09 04:30 +0200
                    Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-09 14:20 +0200
                      Re: Parser needed. Michael Torrie <torriem@gmail.com> - 2015-06-09 17:33 -0600
                        Re: Parser needed. Rustom Mody <rustompmody@gmail.com> - 2015-06-09 19:08 -0700
                        Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-11 13:15 +0200
                          Re: Parser needed. Joel Goldstick <joel.goldstick@gmail.com> - 2015-06-11 08:35 -0400
                          Re: Parser needed. Larry Martell <larry.martell@gmail.com> - 2015-06-11 08:38 -0400
                            Re: Parser needed. Rustom Mody <rustompmody@gmail.com> - 2015-06-11 08:35 -0700
                    Re: Parser needed. Tony the Tiger <tony@tiger.invalid> - 2015-06-14 16:01 +0000
                  Re: Parser needed. "Skybuck Flying" <skybuck2000@hotmail.com> - 2015-06-09 04:31 +0200



#92072

From: Michael Torrie <torriem@gmail.com>
Date: 2015-06-04 11:25 -0600
Message-ID: <mailman.167.1433438768.13271.python-list@python.org>
In reply to: #92069

On 06/04/2015 11:14 AM, Steven D'Aprano wrote:
> On Fri, 5 Jun 2015 03:02 am, Skybuck Flying wrote:
> 
>> Yeah... my first nice parser for this kind of stuff...
>>
>> Python is really nice for this stuff...
>>
>> Piece a cake.. now I just need to stuff it in some dictionary and I am
>> done or so ;)
>>
>> Though a dictionary might be hard to traverse in sequence...
> 
> Use an ordered dict:
> 
> from collections import OrderedDict

I prefer a list of dicts, or a list of objects you define.  That way it
keeps order, but you can have an arbitrary structure on each item.
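
The two suggestions above can be compared side by side. This is only a minimal sketch: the record contents and the names `records`, `by_ref` and `entities` are made up for illustration and are not taken from the parser discussed in the thread.

```python
from collections import OrderedDict

# Hypothetical parsed records; the field names are illustrative only.
records = [("ref_a", {"dead": False}), ("ref_b", {"dead": True})]

# Option 1: an OrderedDict keeps insertion order and allows lookup by key.
by_ref = OrderedDict()
for ref, fields in records:
    by_ref[ref] = fields

# Option 2: a list of dicts keeps order; each item can have any structure.
entities = [dict(ref=ref, **fields) for ref, fields in records]

ordered_refs = list(by_ref)                  # keys in insertion order
listed_refs = [e["ref"] for e in entities]   # same order, read from the list
```

The OrderedDict suits lookups by reference, while the list is simpler when the records are only ever walked in sequence.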



#92071

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-04 19:23 +0200
Message-ID: <74dd4$55708983$5419aafe$17465@news.ziggo.nl>
In reply to: #92067

Very nice code almost done.

Now I am trying to do the code correctly and fast, thus using a dictionary, 
but I run into a little problem:

The dictionary is declared as:

DemoEntityRefIndex = {}

Pairs are added as:

DemoEntityRefIndex[Ref] = DemoEntityIndex

And now I try to retrieve the demo entity index as follows:

DemoEntityIndex = DemoEntityRefIndex[Ref]

But somehow it fails with error:

[error] script [ ParseDemoFile ] stopped with error in line 211
[error] KeyError ( EntityRef )
[error] --- Traceback --- error source first line: module ( function ) 
statement 146: main ( ProcessUpdateEntityRef ) DemoEntityIndex = 
DemoEntityRefIndex[Ref]

Hmm..

I may try dict.get, maybe that'll help.
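
For reference, the difference between plain indexing and dict.get can be sketched as follows. The dictionary contents here are invented; only the names DemoEntityRefIndex and the key "EntityRef" come from the post, and the helper `lookup` is hypothetical.

```python
# Hypothetical index mapping entity references to list positions.
DemoEntityRefIndex = {"ref_a": 0, "ref_b": 1}

def lookup(ref):
    """Return the stored index, or None when the reference is unknown."""
    return DemoEntityRefIndex.get(ref)

try:
    DemoEntityRefIndex["EntityRef"]   # plain indexing raises on a missing key
    missing_raises = False
except KeyError:
    missing_raises = True

found = lookup("ref_b")          # key present: the stored index
not_found = lookup("EntityRef")  # key absent: None instead of an exception
```

A KeyError therefore usually means the key was never added, so dict.get hides the symptom rather than the cause.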

Bye,
  Skybuck.











#92076

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-04 19:34 +0200
Message-ID: <ebb0e$55708c3f$5419aafe$19843@news.ziggo.nl>
In reply to: #92071

I tried get, but now there is a new error somewhere else:

[error] TypeError ( list indices must be integers )
[error] --- Traceback --- error source first line: module ( function ) 
statement 133: main ( ProcessUpdateEntityDead ) 
DemoEntityDead[DemoEntityIndex] = Dead

Apparently the returned index from get is not an integer?

Weird... so far dictionaries offer misery.
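
The TypeError above is what Python raises when None is used as a list index, which is what dict.get returns for a missing key. A minimal sketch, assuming the structures look roughly like the names in the traceback (the contents and the helper `mark_dead` are hypothetical):

```python
# Hypothetical data: one known reference and a parallel list of flags.
DemoEntityRefIndex = {"ref_a": 0}
DemoEntityDead = [False]

# dict.get returns None for an unknown key; None is not a valid list index.
try:
    DemoEntityDead[DemoEntityRefIndex.get("unknown")] = True
    raised = False
except TypeError:
    raised = True

def mark_dead(ref, dead):
    """Update the flag for ref; return False if ref was never indexed."""
    index = DemoEntityRefIndex.get(ref)
    if index is None:
        return False
    DemoEntityDead[index] = dead
    return True

ok = mark_dead("ref_a", True)
skipped = mark_dead("unknown", True)
```

Checking for None explicitly, rather than testing truthiness, matters here because index 0 is a valid position.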

Bye,
  Skybuck. 



#92078

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-04 19:38 +0200
Message-ID: <cb46a$55708d22$5419aafe$20596@news.ziggo.nl>
In reply to: #92076

I feel my conclusion is a bit hasty... but using dictionaries is not easy, 
that's for sure.

Apparently the problem is

DemoEntityIndex is None?

But why would it be None?

Hmmm strange... maybe some refs are not in there... hmmm...

Yeah could be... I cut some stuff out... so I'd better check for None then 
for now ;)

So for now consider this solved, unless I return ! ;) :)

Bye,
  Skybuck.



#92079

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-04 19:44 +0200
Message-ID: <8dfe$55708e76$5419aafe$21711@news.ziggo.nl>
In reply to: #92078

Something strange happens with: 36044817

near the update section... for some reason it doesn't copy it properly...

Hmm...

Maybe a bug in output or an additional new line or maybe something wrong...

Hmm..

Bye,
  Skybuck.



#92080

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-04 19:50 +0200
Message-ID: <7241f$55708fd7$5419aafe$23020@news.ziggo.nl>
In reply to: #92079

Ok problem found.

The data contains:

  EntityRef EntityRef

So perhaps I screwed it up or perhaps the data is a bit bad.

I'll check on my web drive:

http://www.skybuck.org/Games/StartrekOnline/Parser/SpaceFleetAlertEnemyExample.demo

Firefox doesn't find it... so apparently I fucked up the data a bit...

Ok so it's not a bug... thank god for that.

So this is not really a problem.

Now to redownload this file for now or so ;)

Bye,
  Skybuck. 



#92082

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-04 20:00 +0200
Message-ID: <4252d$55709249$5419aafe$25285@news.ziggo.nl>
In reply to: #92080

Well... I must say I am impressed:

Python parses the file/info I want in just:

Seconds: 0.0139999389648

For +/- 20,000 lines of input data/text.

This makes it very usable, cool!

Now I try the bigger file:

+/- 285,000 lines of input data/text:

Seconds: 0.0929999351501

Very impressive !

I guess I was wrong about python...

Python pretty good at text processing wieee =D
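
A timing like the ones above can be taken roughly as follows. This is a sketch only: the post does not show how the time was measured, `parse_lines` is a hypothetical stand-in for the real parser, and the input line is synthetic.

```python
import time

def parse_lines(lines):
    """Stand-in parser: split each line into whitespace-separated tokens."""
    return [line.split() for line in lines]

# Synthetic input of roughly the size mentioned in the post.
lines = ["EntityRef 36044817 Dead 0"] * 20000

start = time.perf_counter()
parsed = parse_lines(lines)
elapsed = time.perf_counter() - start

print("Seconds:", elapsed)
```

time.perf_counter is meant for measuring short intervals; the absolute numbers depend on the machine and on what the parser actually does.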

Bye,
  Skybuck =D



#92352

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-09 04:30 +0200
Message-ID: <74396$55764fb5$5419aafe$6142@news.ziggo.nl>
In reply to: #92082

I made it way too difficult on myself with that stupid dictionary bs...

What I really wanted was to know if the ref was already in the reflist.

Turns out python has a really nice simple operation for that:

   if Ref not in EntityRef:
       EntityRef.append(Ref)

DONE ! =D

No need for dictionary.
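
One caveat: a membership test on a list scans the list, so it gets slower as the list grows. A minimal sketch of the same idea with a set kept alongside the list (the function name `unique_in_order` is hypothetical, and whether it matters depends on how many refs there are):

```python
def unique_in_order(refs):
    """Return refs without duplicates, preserving first-seen order."""
    seen = set()     # membership checks on a set take constant time on average
    ordered = []     # keeps the original order, like the list in the post
    for ref in refs:
        if ref not in seen:
            seen.add(ref)
            ordered.append(ref)
    return ordered

result = unique_in_order(["a", "b", "a", "c", "b"])
```

For a few hundred refs the plain list check is likely fine; the set only pays off on larger inputs.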

Anyway... I am trying a more robust parser... because my own parser right 
now didn't work out for new inputs.

It almost worked except for first item... don't know what problem was.... 
maybe just this...

But I'll try and do it the usual way.

"Tokenize", "Parse" etc.

It's significantly slower though.

Maybe an idea for a new kind of parser could be:

Find all recognized token keywords in one go... and stuff all found indexes 
into a list... and then perhaps sort the list...

and then use the sorted list... to return token found at that position...

and then start processing like that...

Benefit of this idea is less characters to compare... and could be done with 
just integer compare... or even lookup table processing.

Added benefit is still in-order which is nice.

For now I won't do it... cause I am not sure it would be an improvement.
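
The keyword-position idea could be sketched with the standard re module. The keyword list below is an assumption, since the real token set is not shown in the thread, and `keyword_positions` is a hypothetical helper. Because re.finditer scans left to right, the positions already come out in order and no separate sort is needed.

```python
import re

# Hypothetical keyword set; the real one is not given in the thread.
KEYWORDS = ["EntityRef", "Dead", "Update"]

# Longest keywords first, so a keyword is never shadowed by a shorter prefix.
_pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
)

def keyword_positions(text):
    """Return (offset, keyword) pairs in the order they occur in text."""
    return [(m.start(), m.group()) for m in _pattern.finditer(text)]

positions = keyword_positions("Update EntityRef 7 Dead 1")
```

Note that this matches keywords anywhere, including inside longer words, so a real tokenizer would probably add word boundaries.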

Another idea which might be an improvement is.

Parallel searching... of course that might not be possible... though... multi 
core does exist and threading too.

But to mimic parallel searching and to prevent problems.

A sliding window approach could be taken.

And perhaps items found that way in a certain buffer or so and then still 
added to processing list or so...

which kinda mimics parallel search... but just uses data cache nicely.

Though it's somewhat of a complex method... I will avoid it for now.

The split() routine is real nice... to get rid of fudd/white space... and 
just tokens which is nice.

So for now I will use that as my tokenizer ;) =D

and bracket level counting and sections and stuff like that yeah...
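
A split()-based tokenizer with bracket level counting might look like the sketch below. It assumes brackets appear as separate whitespace-delimited "{" and "}" tokens, which is a guess about the input format; `tokens_with_depth` is a hypothetical name.

```python
def tokens_with_depth(text, open_tok="{", close_tok="}"):
    """Split text on whitespace and tag each token with its bracket depth."""
    depth = 0
    result = []
    for token in text.split():
        if token == open_tok:
            depth += 1
        elif token == close_tok:
            if depth == 0:
                raise ValueError("closing bracket without an opening one")
            depth -= 1
        else:
            result.append((depth, token))
    if depth != 0:
        raise ValueError("unclosed bracket at end of input")
    return result

tagged = tokens_with_depth("Section { Ref 1 { Dead 0 } }")
```

If brackets can be glued to neighbouring tokens, split() alone is not enough and the brackets have to be separated out first.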

Bye,
  Skybuck. 



#92358

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-09 14:20 +0200
Message-ID: <48a90$5576da1e$5419aafe$34050@news.ziggo.nl>
In reply to: #92352

Euhm...

My parser is already done... since today

Loving it too

Wrote it myself... based on the c# code technique explained somewhere in 
this thread too

Bye,
  Skybuck.



#92378

From: Michael Torrie <torriem@gmail.com>
Date: 2015-06-09 17:33 -0600
Message-ID: <mailman.327.1433892811.13271.python-list@python.org>
In reply to: #92358

On 06/09/2015 06:20 AM, Skybuck Flying wrote:
> Euhm...
> 
> My parser is already done... since today
> 
> Loving it too
> 
> Wrote it myself... based on the c# code technique explained somewhere in 
> this thread too

I'm glad you're having fun, and making good progress.  And it's good to
hear of success with Python.  However, I wouldn't really call this a
"thread;" more a "monologue."  I'm not sure list members really
appreciate the blow-by-blow description of a project, for the future.  I
know I find it rather noisy.  I don't think anyone has actually read any
of your messages (I'm pretty sure some have just blacklisted your
messages), except to determine you're trying to parse something rather
vague.  Might I suggest a blog is a more appropriate place to ruminate
over a project (and posting code and examples) and use the list for
questions and advice?



#92383

From: Rustom Mody <rustompmody@gmail.com>
Date: 2015-06-09 19:08 -0700
Message-ID: <bf92c4ed-c616-458f-9d71-060aeb15cd46@googlegroups.com>
In reply to: #92378

On Wednesday, June 10, 2015 at 5:04:17 AM UTC+5:30, Michael Torrie wrote:
> On 06/09/2015 06:20 AM, Skybuck Flying wrote:
> > Euhm...
> > 
> > My parser is already done... since today
> > 
> > Loving it too
> > 
> > Wrote it myself... based on the c# code technique explained somewhere in 
> > this thread too
> 
> I'm glad you're having fun, and making good progress.  And it's good to
> hear of success with Python.  However, I wouldn't really call this a
> "thread;" more a "monologue."  I'm not sure list members really
> appreciate the blow-by-blow description of a project, for the future.  I
> know I find it rather noisy.  I don't think anyone has actually read any
> of your messages (I'm pretty sure some have just blacklisted your
> messages), except to determine you're trying to parse something rather
> vague.  Might I suggest a blog is a more appropriate place to ruminate
> over a project (and posting code and examples) and use the list for
> questions and advice?

+1



#92464

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-11 13:15 +0200
Message-ID: <efad0$55796dd4$5419aafe$46357@news.ziggo.nl>
In reply to: #92378

Well it did help a little bit.

Somebody asked if there was already a parser for it.

I answered yes in C#.

So I took a closer look at it... and learned something from it.

Maybe I would have done that anyway... or maybe not...

Now we will never know... but I am happy that the parser is now ok, done and 
pretty easily extendable.

I think it's probably my first really good one.

I did do a little assembler before... but not sure if it was any good.

I also read before I started the thread... what "parse" means in the dictionary.

That also helped me understand it a little bit better.

And also I read how it's usually done... tokens/tokenize etc... kinda 
already knew that... but still.

Turned out to be quite easy...

But at the start it always seemed so difficult...

So any little bits of help/advice/information can help ! :)

Bye,
  Skybuck. 



#92477

From: Joel Goldstick <joel.goldstick@gmail.com>
Date: 2015-06-11 08:35 -0400
Message-ID: <mailman.397.1434026144.13271.python-list@python.org>
In reply to: #92464

On Thu, Jun 11, 2015 at 7:15 AM, Skybuck Flying <skybuck2000@hotmail.com> wrote:
> Well it did help a little bit.
>
> Somebody asked if there was already a parser for it.
>
> I answered yes in C#.
>
> So I took a closer look at it... and learned something from it.
>
> Maybe I would have done that anyway... or maybe not...
>
> Now we will never know... but I am happy that the parser is now ok, done and
> pretty easy extendable.
>
> I think it's probably my first really good one.
>
> I did do a little assembler before... but not sure if it was any good.
>
> I also read before I started the thread... what "parse" means in dictionary.
>
> That also helped me understand it a little bit better.
>
> And also I read how it's usually done... tokens/tokenize etc... kinda
> already knew that... but still.
>
> Turned out to be quite easy...
>
> But at the start /always it seemed so difficult...
>
> So any little bits of help/advice/information can help ! :)
>
>
> Bye,
>  Skybuck.
> --
> https://mail.python.org/mailman/listinfo/python-list


But you aren't asking questions.  You are having a conversation with
yourself on a public Q&A list.  It's unpleasant.
-- 
Joel Goldstick
http://joelgoldstick.com



#92478

From: Larry Martell <larry.martell@gmail.com>
Date: 2015-06-11 08:38 -0400
Message-ID: <mailman.398.1434026290.13271.python-list@python.org>
In reply to: #92464

On Thu, Jun 11, 2015 at 8:35 AM, Joel Goldstick
<joel.goldstick@gmail.com> wrote:
> but you aren't asking questions.  You are having a conversation with
> yourself on a public q/a list.  Its unpleasant

Well, he did mention masterbation in another post.



#92485

From: Rustom Mody <rustompmody@gmail.com>
Date: 2015-06-11 08:35 -0700
Message-ID: <b128dc4b-ac60-4ad9-bf5a-db3296327167@googlegroups.com>
In reply to: #92478

On Thursday, June 11, 2015 at 6:08:22 PM UTC+5:30, Larry....@gmail.com wrote:
> On Thu, Jun 11, 2015 at 8:35 AM, Joel Goldstick wrote:
> > but you aren't asking questions.  You are having a conversation with
> > yourself on a public q/a list.  Its unpleasant
> 
> Well, he did mention masterbation in another post.

Er...

Those of us who happen to be teachers are getting pointed at by the misspelling+archaism combo above.

Thought python had no rogue pointers? <wink>



#92618

From: Tony the Tiger <tony@tiger.invalid>
Date: 2015-06-14 16:01 +0000
Message-ID: <Tzhfx.967682$LI3.100944@fx24.am4>
In reply to: #92352

On Tue, 09 Jun 2015 04:30:13 +0200, Skybuck Flying wrote:

> Bye,
>   Skybuck.

Yes, that's VERY true.

 /Grrr
-- 
          ___                  ___
 (\_--_/)  | _ ._    _|_|_  _   |o _  _ ._
 ( 9  9 )  |(_)| |\/  |_| |(/_  ||(_|(/_|
 stripes are forever - as overripe ferrets



#92353

From: "Skybuck Flying" <skybuck2000@hotmail.com>
Date: 2015-06-09 04:31 +0200
Message-ID: <97a06$55764fe4$5419aafe$6212@news.ziggo.nl>
In reply to: #92082

Oh I think I forgot to mention... parser is now getting close to 1 second... 
with tokenizer and such.

But I think this is still within acceptable performance level for now.

Bye,
  Skybuck.
