Groups > comp.lang.python > #90729 > unrolled thread
| Started by | bruceg113355@gmail.com |
|---|---|
| First post | 2015-05-16 06:28 -0700 |
| Last post | 2015-05-16 23:24 +0000 |
| Articles | 14 — 9 participants |
Fastest way to remove the first x characters from a very long string bruceg113355@gmail.com - 2015-05-16 06:28 -0700
Re: Fastest way to remove the first x characters from a very long string Joel Goldstick <joel.goldstick@gmail.com> - 2015-05-16 09:43 -0400
Re: Fastest way to remove the first x characters from a very long string Chris Angelico <rosuav@gmail.com> - 2015-05-16 23:45 +1000
Re: Fastest way to remove the first x characters from a very long string bruceg113355@gmail.com - 2015-05-16 07:02 -0700
Re: Fastest way to remove the first x characters from a very long string bruceg113355@gmail.com - 2015-05-16 09:22 -0700
Re: Fastest way to remove the first x characters from a very long string Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-16 10:57 -0600
Re: Fastest way to remove the first x characters from a very long string Chris Angelico <rosuav@gmail.com> - 2015-05-17 02:59 +1000
Re: Fastest way to remove the first x characters from a very long string bruceg113355@gmail.com - 2015-05-16 10:35 -0700
Re: Fastest way to remove the first x characters from a very long string Cameron Simpson <cs@zip.com.au> - 2015-05-17 08:41 +1000
Re: Fastest way to remove the first x characters from a very long string Grant Edwards <invalid@invalid.invalid> - 2015-05-16 14:59 +0000
Re: Fastest way to remove the first x characters from a very long string Rustom Mody <rustompmody@gmail.com> - 2015-05-16 08:13 -0700
Re: Fastest way to remove the first x characters from a very long string bruceg113355@gmail.com - 2015-05-16 09:24 -0700
Re: Fastest way to remove the first x characters from a very long string Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2015-05-16 18:55 +0200
Re: Fastest way to remove the first x characters from a very long string Denis McMahon <denismfmcmahon@gmail.com> - 2015-05-16 23:24 +0000
| From | bruceg113355@gmail.com |
|---|---|
| Date | 2015-05-16 06:28 -0700 |
| Subject | Fastest way to remove the first x characters from a very long string |
| Message-ID | <6a383ce2-5975-4225-b4f2-f744c9d7a516@googlegroups.com> |
I have a string that contains 10 million characters.

The string is formatted as:

"0000001 : some hexadecimal text ... \n
0000002 : some hexadecimal text ... \n
0000003 : some hexadecimal text ... \n
...
0100000 : some hexadecimal text ... \n
0100001 : some hexadecimal text ... \n"

and I need the string to look like:

"some hexadecimal text ... \n
some hexadecimal text ... \n
some hexadecimal text ... \n
...
some hexadecimal text ... \n
some hexadecimal text ... \n"

I can split the string at the ":" then iterate through the list removing the first 8 characters then convert back to a string. This method works, but it takes too long to execute.

Any tricks to remove the first n characters of each line in a string faster?

Thanks,
Bruce
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2015-05-16 09:43 -0400 |
| Message-ID | <mailman.71.1431783817.17265.python-list@python.org> |
| In reply to | #90729 |
On Sat, May 16, 2015 at 9:28 AM, <bruceg113355@gmail.com> wrote:
> I have a string that contains 10 million characters.
>
> The string is formatted as:
>
> "0000001 : some hexadecimal text ... \n
> 0000002 : some hexadecimal text ... \n
> 0000003 : some hexadecimal text ... \n
> ...
> 0100000 : some hexadecimal text ... \n
> 0100001 : some hexadecimal text ... \n"
>
> and I need the string to look like:
>
> "some hexadecimal text ... \n
> some hexadecimal text ... \n
> some hexadecimal text ... \n
> ...
> some hexadecimal text ... \n
> some hexadecimal text ... \n"
>
> I can split the string at the ":" then iterate through the list removing the first 8 characters then convert back to a string. This method works, but it takes too long to execute.
>
> Any tricks to remove the first n characters of each line in a string faster?
>
slicing might be faster than searching for :
Do you need to do this all at once? If not, use a generator
> Thanks,
> Bruce
> --
> https://mail.python.org/mailman/listinfo/python-list
--
Joel Goldstick
http://joelgoldstick.com
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-16 23:45 +1000 |
| Message-ID | <mailman.72.1431783967.17265.python-list@python.org> |
| In reply to | #90729 |
On Sat, May 16, 2015 at 11:28 PM, <bruceg113355@gmail.com> wrote:
> I have a string that contains 10 million characters.
>
> The string is formatted as:
>
> "0000001 : some hexadecimal text ... \n
> 0000002 : some hexadecimal text ... \n
> 0000003 : some hexadecimal text ... \n
> ...
> 0100000 : some hexadecimal text ... \n
> 0100001 : some hexadecimal text ... \n"
>
> and I need the string to look like:
>
> "some hexadecimal text ... \n
> some hexadecimal text ... \n
> some hexadecimal text ... \n
> ...
> some hexadecimal text ... \n
> some hexadecimal text ... \n"
>
> I can split the string at the ":" then iterate through the list removing the first 8 characters then convert back to a string. This method works, but it takes too long to execute.
>
> Any tricks to remove the first n characters of each line in a string faster?
Given that your definition is "each line", what I'd advise is first
splitting the string into lines, then changing each line, and then
rejoining them into a single string.
lines = original_text.split("\n")
new_text = "\n".join(line[8:] for line in lines)
Would that work?
ChrisA
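A minimal sketch of the split, slice and join idea above, run on a tiny made-up sample. The function name `strip_prefix`, the sample text, and the prefix width of 10 (the length of "0000001 : " in this sample) are illustrative assumptions, not taken from the original data.

```python
def strip_prefix(text, width):
    """Drop the first `width` characters of every line in `text`."""
    lines = text.split("\n")
    return "\n".join(line[width:] for line in lines)


# Hypothetical sample in the "number : hex text" layout from the question.
sample = "0000001 : DE AD\n0000002 : BE EF\n"

# The prefix width is measured from the sample rather than hard-coded.
result = strip_prefix(sample, len("0000001 : "))
print(repr(result))
```

The width is passed in as a parameter because the right value depends on how the real lines are laid out.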
| From | bruceg113355@gmail.com |
|---|---|
| Date | 2015-05-16 07:02 -0700 |
| Message-ID | <4272d9d9-3d5b-4b8b-9875-6b66634b490c@googlegroups.com> |
| In reply to | #90731 |
On Saturday, May 16, 2015 at 9:46:17 AM UTC-4, Chris Angelico wrote:
> On Sat, May 16, 2015 at 11:28 PM, <bruceg113355@gmail.com> wrote:
> > I have a string that contains 10 million characters.
> >
> > The string is formatted as:
> >
> > "0000001 : some hexadecimal text ... \n
> > 0000002 : some hexadecimal text ... \n
> > 0000003 : some hexadecimal text ... \n
> > ...
> > 0100000 : some hexadecimal text ... \n
> > 0100001 : some hexadecimal text ... \n"
> >
> > and I need the string to look like:
> >
> > "some hexadecimal text ... \n
> > some hexadecimal text ... \n
> > some hexadecimal text ... \n
> > ...
> > some hexadecimal text ... \n
> > some hexadecimal text ... \n"
> >
> > I can split the string at the ":" then iterate through the list removing the first 8 characters then convert back to a string. This method works, but it takes too long to execute.
> >
> > Any tricks to remove the first n characters of each line in a string faster?
>
> Given that your definition is "each line", what I'd advise is first
> splitting the string into lines, then changing each line, and then
> rejoining them into a single string.
>
> lines = original_text.split("\n")
> new_text = "\n".join(line[8:] for line in lines)
>
> Would that work?
>
> ChrisA
Hi Chris,
I meant to say I can split the string at the \n.
Your approach using .join is what I was looking for.
Thank you,
Bruce
| From | bruceg113355@gmail.com |
|---|---|
| Date | 2015-05-16 09:22 -0700 |
| Message-ID | <f3836ca1-3f87-4c5c-9839-4d8a35aa77e4@googlegroups.com> |
| In reply to | #90733 |
On Saturday, May 16, 2015 at 10:06:31 AM UTC-4, Stefan Ram wrote:
> bruceg113355@gmail.com writes:
> >Your approach using .join is what I was looking for.
>
> I'd appreciate a report of your measurements.
# Original Approach
# -----------------
ss = ss.split("\n")
ss1 = ""
for sdata in ss:
    ss1 = ss1 + (sdata[OFFSET:] + "\n")
# Chris's Approach
# ----------------
lines = ss.split("\n")
new_text = "\n".join(line[8:] for line in lines)
Test #1, Number of Characters: 165110
Original Approach: 18ms
Chris's Approach: 1ms
Test #2, Number of Characters: 470763
Original Approach: 593ms
Chris's Approach: 16ms
Test #3, Number of Characters: 944702
Original Approach: 2.824s
Chris's Approach: 47ms
Test #4, Number of Characters: 5557394
Original Approach: 122s
Chris's Approach: 394ms
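For anyone who wants to repeat a comparison like this, here is a rough sketch using the standard `timeit` module. The input is synthetic (5,000 made-up lines with an 8-character prefix), and the function names `concat_loop` and `join_lines` are placeholders. Timings depend heavily on the interpreter version and string type (CPython can sometimes optimise this kind of concatenation in place), so the figures above should not be expected to reproduce.

```python
import timeit

OFFSET = 8  # width of the prefix to drop; matches the synthetic data below


def concat_loop(text):
    """Build the result by repeated string concatenation."""
    out = ""
    for line in text.split("\n"):
        out = out + (line[OFFSET:] + "\n")
    return out


def join_lines(text):
    """Build the result with a single str.join() call."""
    return "\n".join(line[OFFSET:] for line in text.split("\n"))


# Synthetic input: seven digits plus a space, then some payload text.
text = "".join("%07d some hex text\n" % i for i in range(1, 5001))

for func in (concat_loop, join_lines):
    seconds = timeit.timeit(lambda: func(text), number=3)
    print("%s: %.4fs for 3 runs" % (func.__name__, seconds))
```

The two functions differ only in how the output string is assembled, which is the part being measured.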
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-05-16 10:57 -0600 |
| Message-ID | <mailman.77.1431795491.17265.python-list@python.org> |
| In reply to | #90740 |
On Sat, May 16, 2015 at 10:22 AM, <bruceg113355@gmail.com> wrote:
> # Chris's Approach
> # ----------------
> lines = ss.split("\n")
> new_text = "\n".join(line[8:] for line in lines)
Looks like the approach you have may be fast enough already, but I'd
wager the generator expression could be replaced with:
map(operator.itemgetter(slice(8, None)), lines)
for a modest speed-up. On the downside, this is less readable.
Substitute itertools.imap for map if using Python 2.x.
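A small sketch showing that the `operator.itemgetter` form and the generator expression give the same result, written for Python 3. The sample lines are made up; whether the `map` form is actually faster would need measuring on real data.

```python
import operator

# Hypothetical sample lines with an 8-character prefix.
lines = ["0000001 DE AD", "0000002 BE EF"]

# itemgetter(slice(8, None)) builds a callable that behaves like
# `lambda s: s[8:]`.
drop_prefix = operator.itemgetter(slice(8, None))

via_map = "\n".join(map(drop_prefix, lines))
via_genexp = "\n".join(line[8:] for line in lines)

print(via_map == via_genexp)
```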
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-17 02:59 +1000 |
| Message-ID | <mailman.78.1431795548.17265.python-list@python.org> |
| In reply to | #90740 |
On Sun, May 17, 2015 at 2:22 AM, <bruceg113355@gmail.com> wrote:
> # Original Approach
> # -----------------
> ss = ss.split("\n")
> ss1 = ""
> for sdata in ss:
>     ss1 = ss1 + (sdata[OFFSET:] + "\n")
>
>
> # Chris's Approach
> # ----------------
> lines = ss.split("\n")
> new_text = "\n".join(line[8:] for line in lines)
Ah, yep. This is exactly what str.join() exists for :) Though do make
sure the results are the same for each - there are two noteworthy
differences between these two. Your version has a customizable OFFSET,
where mine is hard-coded; I'm sure you know how to change that part.
The subtler one is that "\n".join(...) won't put a \n after the final
string - your version ends up adding one more newline. If that's
important to you, you'll have to add one explicitly. (I suspect
probably not, though; ss.split("\n") won't expect a final newline, so
you'll get a blank entry in the list if there is one, and then you'll
end up reinstating the newline when that blank gets joined in.) Just
remember to check correctness before performance, and you should be
safe.
ChrisA
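The trailing-newline point can be checked on a tiny example. This is only an illustration with made-up two-line input; the variable names are placeholders.

```python
# Input that ends with a newline: split() yields a final empty string.
text = "0000001 ab\n0000002 cd\n"
lines = text.split("\n")
print(lines)

# Joining puts "\n" between items, so the empty last item restores
# the final newline.
joined = "\n".join(line[8:] for line in lines)
print(repr(joined))

# Input without a final newline: no empty item, so no newline at the end.
text_no_newline = "0000001 ab\n0000002 cd"
joined_no_newline = "\n".join(line[8:] for line in text_no_newline.split("\n"))
print(repr(joined_no_newline))
```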
| From | bruceg113355@gmail.com |
|---|---|
| Date | 2015-05-16 10:35 -0700 |
| Message-ID | <8354e4ba-0a80-48a1-b2e9-6edb4b67ff36@googlegroups.com> |
| In reply to | #90745 |
On Saturday, May 16, 2015 at 12:59:19 PM UTC-4, Chris Angelico wrote:
> On Sun, May 17, 2015 at 2:22 AM, <bruceg113355@gmail.com> wrote:
> > # Original Approach
> > # -----------------
> > ss = ss.split("\n")
> > ss1 = ""
> > for sdata in ss:
> >     ss1 = ss1 + (sdata[OFFSET:] + "\n")
> >
> >
> > # Chris's Approach
> > # ----------------
> > lines = ss.split("\n")
> > new_text = "\n".join(line[8:] for line in lines)
>
> Ah, yep. This is exactly what str.join() exists for :) Though do make
> sure the results are the same for each - there are two noteworthy
> differences between these two. Your version has a customizable OFFSET,
> where mine is hard-coded; I'm sure you know how to change that part.
> The subtler one is that "\n".join(...) won't put a \n after the final
> string - your version ends up adding one more newline. If that's
> important to you, you'll have to add one explicitly. (I suspect
> probably not, though; ss.split("\n") won't expect a final newline, so
> you'll get a blank entry in the list if there is one, and then you'll
> end up reinstating the newline when that blank gets joined in.) Just
> remember to check correctness before performance, and you should be
> safe.
>
> ChrisA
Hi Chris,
Your approach more than meets my requirements.
Data is formatted correctly and performance is simply amazing.
OFFSET and \n are small details.
Thank you again,
Bruce
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2015-05-17 08:41 +1000 |
| Message-ID | <mailman.87.1431817664.17265.python-list@python.org> |
| In reply to | #90746 |
On 16May2015 10:35, bruceg113355@gmail.com <bruceg113355@gmail.com> wrote:
>On Saturday, May 16, 2015 at 12:59:19 PM UTC-4, Chris Angelico wrote:
>> On Sun, May 17, 2015 at 2:22 AM, <bruceg113355@gmail.com> wrote:
>> > # Original Approach
>> > # -----------------
>> > ss = ss.split("\n")
>> > ss1 = ""
>> > for sdata in ss:
>> >     ss1 = ss1 + (sdata[OFFSET:] + "\n")
>> >
>> > # Chris's Approach
>> > # ----------------
>> > lines = ss.split("\n")
>> > new_text = "\n".join(line[8:] for line in lines)
[...]
>
>Your approach more than meets my requirements.
>Data is formatted correctly and performance is simply amazing.
>OFFSET and \n are small details.
The only comment I'd make at this point is to consider if you really need a
single string at the end. Keeping it as a list of lines may be more flexible.
(It will consume more memory.) If you're doing more stuff with the string as
lines then you'd need to re-split it, and so forth.
Cheers,
Cameron Simpson <cs@zip.com.au>
| From | Grant Edwards <invalid@invalid.invalid> |
|---|---|
| Date | 2015-05-16 14:59 +0000 |
| Message-ID | <mj7m19$qn$1@reader1.panix.com> |
| In reply to | #90729 |
On 2015-05-16, bruceg113355@gmail.com <bruceg113355@gmail.com> wrote:
> I have a string that contains 10 million characters.
>
> The string is formatted as:
>
> "0000001 : some hexadecimal text ... \n
> 0000002 : some hexadecimal text ... \n
> 0000003 : some hexadecimal text ... \n
> ...
> 0100000 : some hexadecimal text ... \n
> 0100001 : some hexadecimal text ... \n"
>
> and I need the string to look like:
>
> "some hexadecimal text ... \n
> some hexadecimal text ... \n
> some hexadecimal text ... \n
> ...
> some hexadecimal text ... \n
> some hexadecimal text ... \n"
>
> I can split the string at the ":" then iterate through the list
> removing the first 8 characters then convert back to a string. This
> method works, but it takes too long to execute.
>
> Any tricks to remove the first n characters of each line in a string faster?
Well, if the strings are all in a file, I'd probably just use sed:
$ sed 's/^........//g' file1.txt >file2.txt
or
$ sed 's/^.*://g' file1.txt >file2.txt
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2015-05-16 08:13 -0700 |
| Message-ID | <3faf260f-7777-4a60-8212-981340e478b3@googlegroups.com> |
| In reply to | #90737 |
On Saturday, May 16, 2015 at 8:30:02 PM UTC+5:30, Grant Edwards wrote:
> On 2015-05-16, bruceg113355 wrote:
>
> > I have a string that contains 10 million characters.
> >
> > The string is formatted as:
> >
> > "0000001 : some hexadecimal text ... \n
> > 0000002 : some hexadecimal text ... \n
> > 0000003 : some hexadecimal text ... \n
> > ...
> > 0100000 : some hexadecimal text ... \n
> > 0100001 : some hexadecimal text ... \n"
> >
> > and I need the string to look like:
> >
> > "some hexadecimal text ... \n
> > some hexadecimal text ... \n
> > some hexadecimal text ... \n
> > ...
> > some hexadecimal text ... \n
> > some hexadecimal text ... \n"
> >
> > I can split the string at the ":" then iterate through the list
> > removing the first 8 characters then convert back to a string. This
> > method works, but it takes too long to execute.
> >
> > Any tricks to remove the first n characters of each line in a string faster?
>
> Well, if the strings are all in a file, I'd probably just use sed:
>
> $ sed 's/^........//g' file1.txt >file2.txt
>
> or
>
> $ sed 's/^.*://g' file1.txt >file2.txt
And if they are not in a file you could start by putting them (it) there :-)
Seriously... How does your 'string' come into existence?
How/when do you get hold of it?
| From | bruceg113355@gmail.com |
|---|---|
| Date | 2015-05-16 09:24 -0700 |
| Message-ID | <5b815f6c-d639-4193-a644-ea2e1a1759a6@googlegroups.com> |
| In reply to | #90738 |
On Saturday, May 16, 2015 at 11:13:45 AM UTC-4, Rustom Mody wrote:
> On Saturday, May 16, 2015 at 8:30:02 PM UTC+5:30, Grant Edwards wrote:
> > On 2015-05-16, bruceg113355 wrote:
> >
> > > I have a string that contains 10 million characters.
> > >
> > > The string is formatted as:
> > >
> > > "0000001 : some hexadecimal text ... \n
> > > 0000002 : some hexadecimal text ... \n
> > > 0000003 : some hexadecimal text ... \n
> > > ...
> > > 0100000 : some hexadecimal text ... \n
> > > 0100001 : some hexadecimal text ... \n"
> > >
> > > and I need the string to look like:
> > >
> > > "some hexadecimal text ... \n
> > > some hexadecimal text ... \n
> > > some hexadecimal text ... \n
> > > ...
> > > some hexadecimal text ... \n
> > > some hexadecimal text ... \n"
> > >
> > > I can split the string at the ":" then iterate through the list
> > > removing the first 8 characters then convert back to a string. This
> > > method works, but it takes too long to execute.
> > >
> > > Any tricks to remove the first n characters of each line in a string faster?
> >
> > Well, if the strings are all in a file, I'd probably just use sed:
> >
> > $ sed 's/^........//g' file1.txt >file2.txt
> >
> > or
> >
> > $ sed 's/^.*://g' file1.txt >file2.txt
>
>
> And if they are not in a file you could start by putting them (it) there :-)
>
> Seriously... How does your 'string' come into existence?
> How/when do you get hold of it?
Data is coming from a wxPython TextCtrl widget.
The widget is displaying data received on a serial port for a user to analyze.
| From | Irmen de Jong <irmen.NOSPAM@xs4all.nl> |
|---|---|
| Date | 2015-05-16 18:55 +0200 |
| Message-ID | <55577688$0$2821$e4fe514c@news.xs4all.nl> |
| In reply to | #90741 |
On 16-5-2015 18:24, bruceg113355@gmail.com wrote:
> Data is coming from a wxPython TextCtrl widget.
Hm, there should be a better source of the data before it ends up in the textctrl widget.
> The widget is displaying data received on a serial port for a user to analyze.
If this is read from a serial port, can't you process the data directly when it arrives?
This may give you the chance to simply operate on the line as soon as it arrives from the port, before pasting it all in the textctrl
Irmen
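One way to picture processing each line as it arrives is a generator that strips the prefix from whatever iterable of lines it is given. This is only a sketch: the function name `strip_line_numbers` is made up, the 8-character offset is assumed, and a plain list stands in for the serial port, since the thread does not show how the data is actually read.

```python
def strip_line_numbers(lines, offset=8):
    """Yield each incoming line with its first `offset` characters removed."""
    for line in lines:
        yield line[offset:]


# Stand-in for lines arriving from the serial port (hypothetical data).
incoming = iter(["0000001 DE AD\n", "0000002 BE EF\n"])

# Each line is cleaned as it is consumed; nothing is re-split later.
cleaned = list(strip_line_numbers(incoming))
print(cleaned)
```

Because the generator handles one line at a time, the large combined string never has to be built and then taken apart again.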
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-05-16 23:24 +0000 |
| Message-ID | <mj8jik$f1c$5@dont-email.me> |
| In reply to | #90729 |
On Sat, 16 May 2015 06:28:19 -0700, bruceg113355 wrote:
> I have a string that contains 10 million characters.
>
> The string is formatted as:
>
> "0000001 : some hexadecimal text ... \n 0000002 : some hexadecimal text
> ... \n 0000003 : some hexadecimal text ... \n ...
> 0100000 : some hexadecimal text ... \n 0100001 : some hexadecimal text
> ... \n"
>
> and I need the string to look like:
>
> "some hexadecimal text ... \n some hexadecimal text ... \n some
> hexadecimal text ... \n ...
> some hexadecimal text ... \n some hexadecimal text ... \n"
Looks to me as if you have a 10 Mbyte encoded file with line numbers as ASCII text and you're trying to strip the line numbers before decoding the file.
Are you looking for a one-off solution, or do you have a lot of these files?
If you have a lot of files to process, you could try using something like sed.
sed -E -i.old 's/^[0-9]+ : //' *.ext
--
Denis McMahon, denismfmcmahon@gmail.com
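If the text is already in memory as a Python string, the same idea (remove a leading run of digits followed by " : " from every line) can be sketched with the standard `re` module. The sample text and the name `LINE_NUMBER` are illustrative assumptions; the pattern assumes the "digits, space, colon, space" layout shown in the question.

```python
import re

# Hypothetical sample in the layout from the question.
text = "0000001 : DE AD\n0000002 : BE EF\n"

# re.MULTILINE makes ^ match at the start of every line, not just the
# start of the string.
LINE_NUMBER = re.compile(r"^\d+ : ", flags=re.MULTILINE)

stripped = LINE_NUMBER.sub("", text)
print(repr(stripped))
```

Unlike a fixed-width slice, this keeps working if the line numbers vary in length, at the cost of running a regular expression over the whole string.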