Re: Voice compression

From	pozz <pozzugno@gmail.com>
Newsgroups	comp.arch.embedded
Subject	Re: Voice compression
Date	2025-04-03 19:53 +0200
Organization	A noiseless patient Spider
Message-ID	<vsmhuo$188tp$1@dont-email.me>
References	<vsjotj$14v21$1@dont-email.me> <vsjtmt$2ej99$1@dont-email.me>

On 02/04/2025 19:55, Rafael Deliano wrote:
> CVSD uses a bit-serial data stream. Harris datasheets for obsolete 
> Codecs are HC55516, HC55532. The "recording"-circuit can be an analog 
> hack ( Kop, flipflop, 4 Bit shiftregister ) that sends data via SPI.
> The "playback" would have to emulate this circuit in software and output
> via a 8 bit D/A ( R2R resistor network, but serial ICs may be easier in 
> SMD ).
> 16kBit/sec is very moderate quality, 24kBit/sec more reasonable.
> We used these in the 80ies for digital answering machines in cars for 
> the analog telephone system via radio that predated GSM in Germany. 
> 24kBit was for incoming messages in RAM, 16 kBit for the fixed messages 
> from EPROM. CVSD was ok, as the analog radio was a bit noisy
> anyway.

Thank you for the suggestion. I implemented a simple CVSD codec in 
Python just to test the quality, and ended up with the two functions in [1].

I started from this audio [2] and obtained this one [3] after an 
encode/decode round trip. It's a short speech sample from an Italian 
voice. I think you can hear how bad the quality of the decoded audio is.

I suspect I made some errors, because I don't think this is the real 
quality of this codec. You said it was used in the past, but even if 
expectations were lower back then, the quality I get from my 
implementation is very poor, practically unusable.
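
To put a number on "very poor" instead of relying on listening alone, one option is a plain signal-to-noise ratio between the original and the decoded samples. This is only a minimal sketch: it assumes both signals are lists of PCM samples at the same rate and already time-aligned, and `snr_db` is a name made up here, not part of any library.

```python
import math


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between two aligned sample sequences.

    Hypothetical helper: compares only the overlapping part, so a decoder
    that drops a few trailing samples can still be measured.
    """
    n = min(len(original), len(decoded))
    signal = sum(x * x for x in original[:n])
    noise = sum((x - y) ** 2 for x, y in zip(original[:n], decoded[:n]))
    if noise == 0:
        return math.inf   # perfect reconstruction
    if signal == 0:
        return -math.inf  # silent input, any error is pure noise
    return 10.0 * math.log10(signal / noise)
```

SNR does not track perceived speech quality closely, so it is better suited to comparing two variants of the same codec than to judging one in absolute terms.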

[2] https://we.tl/t-RmC6EszYRS
[3] https://we.tl/t-oVbXFy5twW


> At 32kBit/sec ADPCM is better, but you probably do not intend to use a 
> 64kBit PCM codec as a frontend. If you use a handset or a digital
> PCM-link, the quality of CVSD may be not competitive. For playback via
> a loudspeaker sufficient, there is usually enough background noise.

My sounds are quite clean: they are generated by a TTS engine and then 
flashed into the chip's memory.
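
Since the messages end up in flash, a rough size estimate at the bit rates quoted above may be useful. The sketch below assumes 1 bit per sample (as in CVSD), so the bit rate equals the sample rate; `stored_bytes` and the 10-second duration are illustrative choices, not figures from my project.

```python
def stored_bytes(duration_s, bitrate_bps):
    """Bytes needed to store duration_s seconds at bitrate_bps, rounded up."""
    total_bits = int(duration_s * bitrate_bps)
    return (total_bits + 7) // 8


# Ten seconds of speech at the rates mentioned in the quoted text.
print(stored_bytes(10, 16000))  # 20000 bytes, CVSD at 16 kbit/s
print(stored_bytes(10, 24000))  # 30000 bytes, CVSD at 24 kbit/s
print(stored_bytes(10, 64000))  # 80000 bytes, 64 kbit/s PCM for comparison
```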


[1]
def cvsd_encode(samples):
     prev_sample = 0
     step_size = 16
     STEP_SIZE_MIN = 16
     STEP_SIZE_MAX = 16384

     encoded_stream = bytearray()
     encoded_byte = ""
     last_bits = 0x00
     for sample in samples:
         bit = 1 if sample >= prev_sample else 0

         # Update the previous-sample value (the running estimate)
         if bit == 1:
             prev_sample += step_size
         else:
             prev_sample -= step_size

         # Adapt the step size by looking at the last 3 bits
         last_bits = last_bits << 1
         last_bits += 1 if bit == 1 else 0
         last_bits &= 0x07
         if last_bits == 0x00 or last_bits == 0x07:
             step_size = step_size * 2
         else:
             step_size = step_size // 2
         # Clamp the step size
         if step_size > STEP_SIZE_MAX:
             step_size = STEP_SIZE_MAX
         elif step_size < STEP_SIZE_MIN:
             step_size = STEP_SIZE_MIN

         encoded_byte += "1" if bit == 1 else "0"
         if len(encoded_byte) == 8:
             encoded_stream += bytes([int(encoded_byte,2)])
             encoded_byte = ""

     return encoded_stream


def cvsd_decode(bitstream):
     prev_sample = 0
     step_size = 16
     STEP_SIZE_MIN = 16
     STEP_SIZE_MAX = 16384

     samples = []
     last_bits = 0x00
     for byte in bitstream:
         for sbit in f"{byte:08b}":
             bit = 1 if sbit == "1" else 0
             if bit == 1:
                 prev_sample += step_size
             else:
                 prev_sample -= step_size

             samples += [prev_sample]

             # Adapt the step size by looking at the last 3 bits
             last_bits = last_bits << 1
             last_bits += 1 if bit == 1 else 0
             last_bits &= 0x07
             if last_bits == 0x00 or last_bits == 0x07:
                 step_size = step_size * 2
             else:
                 step_size = step_size // 2
             # Clamp the step size
             if step_size > STEP_SIZE_MAX:
                 step_size = STEP_SIZE_MAX
             elif step_size < STEP_SIZE_MIN:
                 step_size = STEP_SIZE_MIN

     return samples
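
One thing that may be worth trying: the functions above double or halve the step on every single bit, whereas CVSD is normally described as adapting the step slowly, growing it on a run of equal bits and letting it decay gradually otherwise. The sketch below is a variant along those lines, not a verified fix. The names `cvsd_step_update`, `cvsd_encode_slow` and `cvsd_decode_slow`, and the `attack` and `decay` values, are hypothetical and not taken from any datasheet. It returns a list of bits rather than packed bytes to keep the example short.

```python
def cvsd_step_update(step, last_bits, step_min=16, step_max=16384,
                     attack=64, decay=0.98):
    """Return the next step size (hypothetical slow-adaptation rule)."""
    if last_bits in (0b000, 0b111):
        step = step + attack   # three equal bits: the estimate lags, grow
    else:
        step = step * decay    # otherwise shrink gradually, not by half
    return min(max(step, step_min), step_max)


def cvsd_encode_slow(samples):
    """Encode samples to a list of bits; also return the tracked estimate."""
    estimate, step, last_bits = 0.0, 16.0, 0
    bits, recon = [], []
    for sample in samples:
        bit = 1 if sample >= estimate else 0
        last_bits = ((last_bits << 1) | bit) & 0x07
        step = cvsd_step_update(step, last_bits)
        estimate += step if bit else -step
        bits.append(bit)
        recon.append(estimate)
    return bits, recon


def cvsd_decode_slow(bits):
    """Rebuild the estimate from the bits, mirroring the encoder exactly."""
    estimate, step, last_bits = 0.0, 16.0, 0
    out = []
    for bit in bits:
        last_bits = ((last_bits << 1) | bit) & 0x07
        step = cvsd_step_update(step, last_bits)
        estimate += step if bit else -step
        out.append(estimate)
    return out
```

The decoder repeats the encoder's arithmetic step by step, so its output matches the estimate the encoder tracked internally. Whether this adaptation rule actually sounds better on the TTS samples would still have to be checked by ear or with a measurement.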
