From: Tim Watts <tw@dionic.net>
Newsgroups: comp.databases.mysql
Subject: Re: Can MySql database store images?
Followup-To: comp.databases.mysql
Date: Wed, 27 Apr 2011 16:32:01 +0100
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <hgpl88-ifi.ln1@squidward.dionic.net>
References: <npqdnTD4r-AUWi7QnZ2dnUVZ5qudnZ2d@giganews.com> <ip15se$33s$1@dont-email.me> <55md88-7pb.ln1@squidward.dionic.net> <ip1c0d$dde$1@dont-email.me> <ip21sn$cb8$1@dont-email.me> <ip23aj$v3q$2@dont-email.me> <ip361h$ml9$1@dont-email.me> <68mf88-4l5.ln1@squidward.dionic.net> <ip57sl$rc5$1@dont-email.me> <ip6jp4$kca$1@dont-email.me> <ip7uo6$u89$1@dont-email.me> <ip8gb0$2ni$2@news.albasani.net> <slrnirg20d.2mh.hellsop@nibelheim.ninehells.com> <n3fl88-n0k.ln1@squidward.dionic.net> <slrnirg9ps.2mh.hellsop@nibelheim.ninehells.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7Bit

Peter H. Coffin wrote:

> On Wed, 27 Apr 2011 13:34:31 +0100, Tim Watts wrote:
>> Peter H. Coffin wrote:
>>
>>> On Wed, 27 Apr 2011 08:24:16 +0100, The Natural Philosopher wrote:
>>>> Norman Peelman wrote:
>>>>> Doug Miller wrote:
>>>>>> In article <ip57sl$rc5$1@dont-email.me>, Norman Peelman
>>>>>>> 460 * 5.3kb
>>>>>>>
>>>>>> You wrote above "460 images ... with an average size of 5393kb" .
>>>>>> 5393kb is 5.4 MEGAbytes, not 5.3kb.
>>>>> 
>>>>>   Yes, my fingers were going faster than my brain.
>>>>> 
>>>>> Average of 5393 bytes (5kb)
>>>>> Max of 10240 bytes (10kb)
>>>>> 
>>>>> ...these are small images.
>>>>> 
>>>>> Dump file (w/images) = 29.8MB
>>>>> Dump file zipped = 1.7MB
>>>>> 
>>>>> 
>>>> It is seldom possible to compress images more than they are already
>>>> compressed.
>>>>
>>>> So I still think you have made a mistake.
>>> 
>>> Dump files can (intentionally and with malice aforethought) export
>>> binary columns in hexadecimal text, which is rather compressible. It's
>>> also very safe from things like people fussing with it with text
>>> editors, being copied and pasted into emails for demonstrative purposes,
>>> and other kinds of mistreatment.
>>> 
>>
>> I think the point that is being overlooked is that the text, whilst
>> compressible, is itself a re-encoding of an already highly compressed bit
>> of data.
> 
> Some parts of the file are highly-compressed data. Well,
> fairly-highly-compressed, anyway. The actual compression used in, for
> example, JFIF/.jpeg is Huffman coding. More of the initial 'compression'
> in those comes not from actual compression but from lossy tricks that
> make an image look about the same as the original to the eye. That
> isn't 'compression' in the sense of leaving data that cannot be
> compressed any further. What this means is that fairly small images
> don't have a lot of "compressed data" in them in the first place, and
> for graphics with small fields of image data, format overhead might
> easily be half the file.
> 
>> If anything, the intermediate text encoding should make things worse
>> overall, not better.
> 
> One would think so at first glance, but text is really easy to compress.
> 
>> We are still talking about 2.2MB compressed image data mixed up with
>> other stuff in an exploded ASCII form being, somehow, recompressed down
>> to a total which is less than the sum of the original images alone.
> 
> That's the key to what I think is happening here. See, one image may be
> compressible for some small gains. But many images, especially with very
> similar information in the overhead portions of the formats, like
> they're mostly all the same sizes, or use similar color palettes, end up
> being compressible because the compressor can squeeze out duplicate
> information *between* the images as well as within each image itself.
> 
>> It would make me want to double check the dumps to see they really had
>> everything...
> 
> Always a worthwhile step. But if the dump restores okay, the size alone
> isn't necessarily a warning that something else is wrong.
> 

Nice explanation, Peter. That makes sense (in particular the commonality
between images).
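
For anyone who wants to poke at this without real images, here is a rough
Python sketch of the effect as I understand it. Everything in it is an
assumption for illustration: make_blobs and dump_sizes are names I made up,
random bytes stand in for the already-compressed image data, one identical
block of bytes stands in for the shared format overhead (sized at roughly
half of each blob, per Peter's guess), and zlib's DEFLATE stands in for
whatever zip tool was actually used on the dump.

```python
import os
import zlib


def make_blobs(count=460, header_len=2500, payload_len=2900):
    """Build fake 'images': one shared header plus a random payload each.

    The header is identical in every blob (stand-in for format overhead);
    the payload is random bytes (stand-in for already-compressed data).
    """
    header = os.urandom(header_len)
    return [header + os.urandom(payload_len) for _ in range(count)]


def dump_sizes(blobs):
    """Return byte counts for the raw blobs, a hex text dump, and the zipped dump."""
    raw = b"".join(blobs)
    # One line of hex text per blob, roughly like a hex-encoded dump column.
    hex_text = b"\n".join(blob.hex().encode("ascii") for blob in blobs)
    return {
        "raw": len(raw),
        "hex": len(hex_text),
        "hex_zlib": len(zlib.compress(hex_text, 9)),
    }


if __name__ == "__main__":
    for label, blobs in [
        ("shared header", make_blobs()),
        ("no shared header", make_blobs(header_len=0, payload_len=5400)),
    ]:
        print(label, dump_sizes(blobs))
```

With the shared header I would expect the compressed hex dump to come out
smaller than the raw blobs added together, and without it, larger. The hex
text always doubles the size first; the compressor can only win it back and
more when there is duplication between blobs. Note the repeats have to fall
within DEFLATE's 32 KB window, which small images laid end to end do.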

Cheers

Tim

-- 
Tim Watts