find and replace string in binary file

Started by: loial <jldunn2000@gmail.com>
First post: 2014-03-04 04:27 -0800
Last post: 2014-03-05 07:06 -0500
Articles: 9 (8 participants)


Contents

  find and replace string in binary file loial <jldunn2000@gmail.com> - 2014-03-04 04:27 -0800
    Re: find and replace string in binary file MRAB <python@mrabarnett.plus.com> - 2014-03-04 13:08 +0000
    Re: find and replace string in binary file Peter Otten <__peter__@web.de> - 2014-03-04 14:18 +0100
    Re: find and replace string in binary file Chris Angelico <rosuav@gmail.com> - 2014-03-05 09:44 +1100
    Re: find and replace string in binary file emile <emile@fenx.com> - 2014-03-04 16:13 -0800
      Re: find and replace string in binary file loial <jldunn2000@gmail.com> - 2014-03-05 01:59 -0800
        Re: find and replace string in binary file Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-05 12:42 +0000
        Re: find and replace string in binary file Emile van Sebille <emile@fenx.com> - 2014-03-05 09:46 -0800
    Re: find and replace string in binary file Dave Angel <davea@davea.name> - 2014-03-05 07:06 -0500

#67666 — find and replace string in binary file

From: loial <jldunn2000@gmail.com>
Date: 2014-03-04 04:27 -0800
Subject: find and replace string in binary file
Message-ID: <01951a7d-2ab3-4203-a9c5-2f79017a980d@googlegroups.com>

How do I read a binary file, find/identify a character string and replace it with another character string and write out to another file?

It's the finding of the string in a binary file that I am not clear on.

Any help appreciated


#67671

From: MRAB <python@mrabarnett.plus.com>
Date: 2014-03-04 13:08 +0000
Message-ID: <mailman.7708.1393938532.18130.python-list@python.org>
In reply to: #67666

On 2014-03-04 12:27, loial wrote:
> How do I read a binary file, find/identify a character string and
> replace it with another character string and write out to another
> file?
>
> Its the finding of the string in a binary file that I am not clear
> on.
>
> Any help appreciated
>
Read it in chunks and search each chunk (the chunks should be at least
as long as the search string).

You should note that the string you're looking for could be split
across 2 chunks, so when writing the code make sure that you include
some overlap between adjacent chunks (it's best if the overlap is at
least N-1 characters, where N is the length of the search string).
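
A minimal sketch of the chunk-and-overlap approach described above, written for Python 3 with bytes objects. The function name `replace_in_stream` and the `chunk_size` default are illustrative assumptions, not something from the thread. It holds back the last N-1 bytes of each buffer so that a match split across two reads is still found:

```python
import io

def replace_in_stream(infile, outfile, search, replace, chunk_size=64 * 1024):
    """Copy infile to outfile, replacing each occurrence of `search` (bytes)."""
    if not search:
        raise ValueError("search must not be empty")
    overlap = len(search) - 1  # a match may begin in the last N-1 bytes
    buf = b""
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pos = 0
        while True:
            hit = buf.find(search, pos)
            if hit == -1:
                break
            outfile.write(buf[pos:hit])
            outfile.write(replace)
            pos = hit + len(search)
        # Bytes before `safe` can no longer take part in a future match.
        safe = max(pos, len(buf) - overlap)
        outfile.write(buf[pos:safe])
        buf = buf[safe:]
    outfile.write(buf)  # flush the held-back tail

# Usage with in-memory files; a tiny chunk_size forces matches to straddle reads.
src = io.BytesIO(b"abc needle def needle")
dst = io.BytesIO()
replace_in_stream(src, dst, b"needle", b"pin", chunk_size=4)
result = dst.getvalue()
```

With real files, both would be opened in binary mode ('rb' and 'wb').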


#67672

From: Peter Otten <__peter__@web.de>
Date: 2014-03-04 14:18 +0100
Message-ID: <mailman.7709.1393939155.18130.python-list@python.org>
In reply to: #67666

loial wrote:

> How do I read a binary file, find/identify a character string and replace
> it with another character string and write out to another file?
> 
> Its the finding of the string in a binary file that I am not clear on.

That's not possible. You have to convert either binary to string or string 
to binary before you can replace. Whatever you choose, you have to know the 
encoding of the file. Consider

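# Python 3
ENCODING = "iso-8859-1"
source = "input.txt"    # placeholder path, not from the original post
dest = "output.txt"     # placeholder path, not from the original post
with open(source, encoding=ENCODING) as infile:
    data = infile.read()
with open(dest, "w", encoding=ENCODING) as outfile:
    outfile.write(data.replace("nötig", "möglich"))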

If the file is indeed iso-8859-1 this will replace occurrences of the bytes

b'n\xf6tig' with b'm\xf6glich'

But if you guessed wrong and the file is actually utf-8, it may instead
contain the bytes b'n\xc3\xb6tig', which your script incorrectly decodes as
'nÃ¶tig' and therefore leaves as is.
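
The encoding mismatch described above can be checked directly in Python 3. This is only an illustration of the byte values involved; the variable names are made up for the example:

```python
text = "nötig"

latin1_bytes = text.encode("iso-8859-1")  # one byte for 'ö'
utf8_bytes = text.encode("utf-8")         # two bytes for 'ö'

# Decoding UTF-8 data with the wrong codec does not fail, it just
# produces different characters, so the search string is never found.
misread = utf8_bytes.decode("iso-8859-1")
found = text in misread

print(latin1_bytes, utf8_bytes, repr(misread), found)
```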


#67743

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-05 09:44 +1100
Message-ID: <mailman.7760.1393973045.18130.python-list@python.org>
In reply to: #67666

On Wed, Mar 5, 2014 at 12:18 AM, Peter Otten <__peter__@web.de> wrote:
> loial wrote:
>
>> How do I read a binary file, find/identify a character string and replace
>> it with another character string and write out to another file?
>>
>> Its the finding of the string in a binary file that I am not clear on.
>
> That's not possible. You have to convert either binary to string or string
> to binary before you can replace. Whatever you choose, you have to know the
> encoding of the file.

If it's actually a binary file (as in, an executable, or an image, or
something), then the *file* won't have an encoding, so you'll need to
know the encoding of the particular string you want and encode your
string to bytes.

ChrisA
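
As a rough Python 3 sketch of the point above: the search and replacement text are encoded to bytes with whatever encoding that particular string uses inside the file, and the replacement is then done on bytes. The helper name `replace_text_in_bytes` and the sample data are assumptions for illustration only; UTF-16-LE is used here merely as an example of an encoding that differs from UTF-8.

```python
def replace_text_in_bytes(data, old_text, new_text, encoding):
    """Encode both strings with `encoding`, then replace on the raw bytes."""
    old = old_text.encode(encoding)
    new = new_text.encode(encoding)
    return data.replace(old, new)

# Made-up binary blob containing one UTF-16-LE string between other bytes.
blob = b"\x00\x01" + "möglich".encode("utf-16-le") + b"\xff"

# The right encoding finds the string; the wrong one silently changes nothing.
patched = replace_text_in_bytes(blob, "möglich", "MÖGLICH", "utf-16-le")
unchanged = replace_text_in_bytes(blob, "möglich", "MÖGLICH", "utf-8")
```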


#67762

From: emile <emile@fenx.com>
Date: 2014-03-04 16:13 -0800
Message-ID: <mailman.7773.1393978398.18130.python-list@python.org>
In reply to: #67666

On 03/04/2014 02:44 PM, Chris Angelico wrote:
> On Wed, Mar 5, 2014 at 12:18 AM, Peter Otten <__peter__@web.de> wrote:
>> loial wrote:
>>
>>> How do I read a binary file, find/identify a character string and replace
>>> it with another character string and write out to another file?
>>>
>>> Its the finding of the string in a binary file that I am not clear on.
>>
>> That's not possible. You have to convert either binary to string or string
>> to binary before you can replace. Whatever you choose, you have to know the
>> encoding of the file.
>
> If it's actually a binary file (as in, an executable, or an image, or
> something), then the *file* won't have an encoding, so you'll need to
> know the encoding of the particular string you want and encode your
> string to bytes.


On 2.7 it's as easy as it sounds without having to think much about 
encodings and such.  I find it mostly just works.

emile@paj39:~$ which python
/usr/bin/python
emile@paj39:~$ python
Python 2.7.3 (default, Sep 26 2013, 16:38:10)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> image = open('/usr/bin/python','rb').read()
 >>> image.find("""Type "help", "copyright", "credits" """)
1491592
 >>> image = image[:1491592]+"Echo"+image[1491592+4:]
 >>> open('/home/emile/pyecho','wb').write(image)
 >>>
emile@paj39:~$ chmod a+x /home/emile/pyecho
emile@paj39:~$ /home/emile/pyecho
Python 2.7.3 (default, Sep 26 2013, 16:38:10)
[GCC 4.7.2] on linux2
Echo "help", "copyright", "credits" or "license" for more information.

YMMV,

Emile
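
The find-and-slice session above can be wrapped in a small function. The sketch below is written for Python 3, where the data and both strings must be bytes; the name `overwrite_first` and the sample data are hypothetical. It also checks the result of `find`, which returns -1 when nothing matches, because slicing with -1 would quietly corrupt the data.

```python
def overwrite_first(data, marker, patch):
    """Overwrite the start of the first `marker` with `patch`, in place."""
    offset = data.find(marker)
    if offset == -1:
        raise ValueError("marker not found")
    # Skipping len(patch) bytes keeps the overall length unchanged.
    return data[:offset] + patch + data[offset + len(patch):]

# Made-up stand-in for the contents of an executable.
image = b"\x7fELF\x00" + b'Type "help", "copyright"' + b"\x00"
patched = overwrite_first(image, b'Type "help"', b"Echo")
```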


#67830

From: loial <jldunn2000@gmail.com>
Date: 2014-03-05 01:59 -0800
Message-ID: <687bff67-c486-402d-9f50-82e018c75585@googlegroups.com>
In reply to: #67762

Thanks Emile.

Unfortunately I have to use Python 2.6 for this.


On Wednesday, 5 March 2014 00:13:00 UTC, emile  wrote:
> On 03/04/2014 02:44 PM, Chris Angelico wrote:
> 
> > On Wed, Mar 5, 2014 at 12:18 AM, Peter Otten <__peter__@web.de> wrote:
> 
> >> loial wrote:
> 
> >>
> 
> >>> How do I read a binary file, find/identify a character string and replace
> 
> >>> it with another character string and write out to another file?
> 
> >>>
> 
> >>> Its the finding of the string in a binary file that I am not clear on.
> 
> >>
> 
> >> That's not possible. You have to convert either binary to string or string
> 
> >> to binary before you can replace. Whatever you choose, you have to know the
> 
> >> encoding of the file.
> 
> >
> 
> > If it's actually a binary file (as in, an executable, or an image, or
> 
> > something), then the *file* won't have an encoding, so you'll need to
> 
> > know the encoding of the particular string you want and encode your
> 
> > string to bytes.
> 
> 
> 
> 
> 
> On 2.7 it's as easy as it sounds without having to think much about 
> 
> encodings and such.  I find it mostly just works.
> 
> 
> 
> emile@paj39:~$ which python
> 
> /usr/bin/python
> 
> emile@paj39:~$ python
> 
> Python 2.7.3 (default, Sep 26 2013, 16:38:10)
> 
> [GCC 4.7.2] on linux2
> 
> Type "help", "copyright", "credits" or "license" for more information.
> 
>  >>> image = open('/usr/bin/python','rb').read()
> 
>  >>> image.find("""Type "help", "copyright", "credits" """)
> 
> 1491592
> 
>  >>> image = image[:1491592]+"Echo"+image[1491592+4:]
> 
>  >>> open('/home/emile/pyecho','wb').write(image)
> 
>  >>>
> 
> emile@paj39:~$ chmod a+x /home/emile/pyecho
> 
> emile@paj39:~$ /home/emile/pyecho
> 
> Python 2.7.3 (default, Sep 26 2013, 16:38:10)
> 
> [GCC 4.7.2] on linux2
> 
> Echo "help", "copyright", "credits" or "license" for more information.
> 
> 
> 
> YMMV,
> 
> 
> 
> Emile


#67841

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-03-05 12:42 +0000
Message-ID: <mailman.7817.1394023373.18130.python-list@python.org>
In reply to: #67830

On 05/03/2014 09:59, loial wrote:

I'm pleased to see that you have answers.  In return would you please 
read and action this https://wiki.python.org/moin/GoogleGroupsPython to 
prevent us seeing double line spacing, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence



#67861

From: Emile van Sebille <emile@fenx.com>
Date: 2014-03-05 09:46 -0800
Message-ID: <mailman.7826.1394041582.18130.python-list@python.org>
In reply to: #67830

On 3/5/2014 1:59 AM, loial wrote:

> Unfortunately I have to use python 2.6 for this

Did you try it?

Emile



#67834

From: Dave Angel <davea@davea.name>
Date: 2014-03-05 07:06 -0500
Message-ID: <mailman.7811.1394020928.18130.python-list@python.org>
In reply to: #67666

loial <jldunn2000@gmail.com> wrote in message:
> How do I read a binary file, find/identify a character string and replace it with another character string and write out to another file?
> 
> Its the finding of the string in a binary file that I am not clear on.
> 
> Any help appreciated
> 

I see from another message that you're using Python 2.6. That
makes a huge difference and should have been in your query, along
with a minimal code sample.

Is the binary file under 100 MB or so? Then open it (in binary
mode 'rb'), and read it. You'll now have a (large) byte string
containing the entire file.

The next question is whether you're sure that your search and
replace strings are ASCII. Assuming that is probably a mistake,
but it will get you started.

Now the substitution is trivial:
        new_bytes = old_bytes.replace(search, replace)
It's also possible to emulate that with find and slice, mainly if
you need to report progress to the user.
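
One way the find-and-slice emulation with progress reporting might look, as a sketch only. The function name `replace_with_progress` and the `report` callback are assumptions made for this example; the code is written for Python 3 bytes, though the same loop should also work on Python 2 byte strings.

```python
def replace_with_progress(old_bytes, search, replace, report=print):
    """Like old_bytes.replace(search, replace), reporting each match."""
    if not search:
        raise ValueError("search must not be empty")
    pieces = []
    pos = 0
    count = 0
    while True:
        hit = old_bytes.find(search, pos)
        if hit == -1:
            break
        pieces.append(old_bytes[pos:hit])  # untouched bytes before the match
        pieces.append(replace)
        pos = hit + len(search)
        count += 1
        report("replaced match %d at offset %d" % (count, hit))
    pieces.append(old_bytes[pos:])  # remainder after the last match
    return b"".join(pieces)

messages = []
new_bytes = replace_with_progress(b"spam eggs spam", b"spam", b"ham",
                                  report=messages.append)
```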

If the search and/or replace strings are not ASCII, you have to
know what encoding the file may have used for them.  You need to
build a Unicode string, encode it the same way the file does,
and then call the replace method.

Now for a huge caveat.  If you don't know the binary format,
you're risking the creation of pure junk. Here are just two
examples of what might go wrong, assuming the file is an
executable.  The same risks exist for other files, but I'm just
supposing.

If the two byte strings are not the same length, then all the
remaining code and data in the file will be moved to a new spot.
If you're lucky, the code will crash quickly, since all
pointers referencing that code and data are incorrect.
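
A hedged sketch of one way to avoid shifting the rest of the file: refuse replacements that are longer than the search string, and pad shorter ones to the same length. The function name `replace_same_length` is made up here, and padding with NUL bytes is an assumption that only makes sense if the target is a NUL-terminated C-style string; other formats may need different padding or none at all.

```python
def replace_same_length(data, search, replace, pad=b"\x00"):
    """Replace `search` with `replace`, padded so the file size is unchanged."""
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    if len(replace) > len(search):
        raise ValueError("replacement is longer than the search string")
    padded = replace + pad * (len(search) - len(replace))
    return data.replace(search, padded)

# Made-up data: a C-style string surrounded by other bytes.
data = b"\x01\x02Hello, world\x00\x03"
patched = replace_same_length(data, b"Hello, world", b"Hi")
```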

If some non-textual part of the file happens to match your search
string, you're likely to trash that portion of the code.  If
the search string is long enough, maybe this is unlikely.  But
I recall taking the challenge of writing assembly programs which
could be generated entirely from one or more type commands
(MS-DOS).
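
Before replacing anything, it may help to list every place the search string occurs, with a little surrounding context, so accidental matches in non-textual data can be spotted by eye. This is just a sketch; `find_all`, `show_matches` and the `context` parameter are hypothetical names for this example.

```python
def find_all(data, search):
    """Return the offsets of all (possibly overlapping) matches."""
    if not search:
        raise ValueError("search must not be empty")
    offsets = []
    pos = data.find(search)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(search, pos + 1)
    return offsets

def show_matches(data, search, context=8):
    """Format each match offset together with a few neighbouring bytes."""
    lines = []
    for off in find_all(data, search):
        start = max(0, off - context)
        end = off + len(search) + context
        lines.append("%8d: %r" % (off, data[start:end]))
    return lines

for line in show_matches(b"\x00abc\x90\x90abc text", b"abc"):
    print(line)
```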



-- 
DaveA
