Restoring Audio Recordings at Home

Some topics readers need like air. To be honest, I have received dozens of letters with questions on this subject, and the total has already passed a hundred. Well, let's figure it out...

To be honest, sound recorded from a low-quality source is practically impossible to restore to its original level. It can be run through noise reduction, have a couple of harmonics added or excited, but with every such step we move further and further from the original. In other words, we will do what women of a certain age sometimes do to look more beautiful: apply makeup and hide the blemishes. And sometimes, you must admit, they do it perfectly. So let's look at the sources readers actually use. There are three main ones: a tape recorder, a TV, and an LP turntable. The loss of quality from such sources is obvious, but each one requires its own approach.

Tape recorder: noise, hum, harmonic distortion, wow and flutter, and the specific spectral signature of the playback path.

TV: in some cases a "rattling" sound (correctable with a compressor), noise, problems caused by converting modern material from mono to stereo, smearing of the stereo image because some modern sets delay one channel relative to the other to create surround effects, and the specific spectral signature of the playback path.

LP turntables: crackle, clicks, noise, and the cutoff of certain frequency bands. If we are dealing with old vinyl from the Melodiya label, the frequency range is in some cases very narrow, though this ensures maximum intelligibility. Let's agree not to discuss high-end equipment here; we are talking about ordinary audio systems.

Noises

Almost all of the devices under consideration share one drawback: noise. This is the most serious part of restoration work. Noise is a sound made up of many different harmonics (overtones) and in some cases extends across the entire frequency range. In fact, the only algorithm I have come across that processes frequency bands intelligently is MP3 encoding. We will not go into the algorithms themselves, but note that any file encoded to MP3 acquires a more "contoured" sound. I remember running an experiment with repeated encoding and decoding of the same file; as I recall, the file was taken from a Suzanne Vega vocal track. The result was a file of whispering obtained from the vocals. In addition, some extra noise was generated, which became noticeable after a certain point. MP3 encoding must therefore be used very carefully; it may or may not suit each individual case.

Using a Compressor/Gate/Limiter chain was once considered effective. Personally, I do not consider this correct, since almost everything we get from the devices listed above has already been processed by this combination during mastering or premastering. In this case we simply "kill" some harmonics, which is no way out.

The most advanced solution turned out to be creating a noise sample and mixing it into the original file out of phase. It is this method that became popular, since in some cases it is completely harmless to the useful signal. Its implementations can be seen in Cool Edit and Sonic Foundry Sound Forge.

You can also create noise samples yourself. For example, if we are dealing with a cassette recording, we are interested in the fragment of "silence" between songs. Having recorded this fragment, we process it as follows: looking at the waveform, we select a section that repeats cyclically, then trim its beginning and end exactly at the zero crossings of the wave. We then apply the "Invert" operation (present in almost every audio editor) to this fragment. That's it: a noise sample in antiphase has been created. After that, in a program like AudioMulch, we loop the prepared sample, start the cassette playing, and adjust the sample's delay. With a good match, a significant part of the noise should disappear. Many will ask: why such a cumbersome scheme? My answer: imagine you need to record 90 minutes of material. If every file is processed in Cool Edit after recording, those 90 minutes will turn into several days of work. The most important thing in this method is not to touch the playback and recording level controls, or everything goes to waste.
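The antiphase idea can be sketched in a few lines of Python. This is a toy illustration with synthetic signals, not the actual AudioMulch workflow; the signals and numbers are made up for the demonstration:

```python
import numpy as np

n = 1000
# A cyclic "noise" (hum) and a useful signal, both synthetic.
noise = 0.05 * np.sin(2 * np.pi * 50 * np.arange(n) / 1000)
signal = np.sin(2 * np.pi * 5 * np.arange(n) / 1000)
recording = signal + noise

# Step 1: invert the captured noise sample (the "Invert" operation).
anti_noise = -noise

# Step 2: mix the inverted sample with the recording. With perfect
# alignment the periodic noise cancels and the useful signal survives.
restored = recording + anti_noise

residual = np.max(np.abs(restored - signal))
```

With real tape the alignment is never perfect, which is exactly why the delay has to be adjusted by ear.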

Quantization noise

One source of noise and distortion is the recording path itself, that is, your sound card. To keep losses on this side minimal, you need to set the recording level correctly: the amplitude peaks should sit as high as possible without being clipped. If the recording level is low, quantization noise will be present in the signal to a greater extent. To combat quantization noise there is the dithering operation: generating and adding a low-level noise that decorrelates the quantization error from the signal.
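A rough sketch of what dithering does, as a toy model rather than the algorithm of any particular converter (the triangular dither shape and the 16-bit step are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 44100, endpoint=False)
x = 0.01 * np.sin(2 * np.pi * 440 * t)   # a very quiet signal

step = 2.0 / 2**16                       # step of a 16-bit quantizer over [-1, 1)

def quantize(sig, q):
    # round each sample to the nearest quantization level
    return np.round(sig / q) * q

plain = quantize(x, step)                # plain quantization: error tracks the signal
# TPDF dither: add +-1 LSB triangular noise before rounding, which
# decorrelates the quantization error at the cost of a faint hiss.
dither = (rng.uniform(-0.5, 0.5, x.size) + rng.uniform(-0.5, 0.5, x.size)) * step
dithered = quantize(x + dither, step)
```

The dithered version carries slightly more total error per sample, but the error no longer follows the signal, so it is heard as benign hiss instead of distortion.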

Wow and flutter

This type of distortion, known as wow and flutter, is specific to analog tape recorders. During playback the tape speed undergoes short-term fluctuations (the transport has no single mechanism responsible for keeping it synchronized). As a result the frequency spectrum is slightly smeared: as is well known, when the tape slows down the sound gets lower, and when it speeds up the sound gets higher. The most helpful trick, in my experience, is an MP3 -> WAV encode/decode cycle; the recording takes on a more "digital" sound.

Removing clicks and pops

However many DeClicker software modules I have encountered, I have never been satisfied with their results: in most cases the frequency range suffers. Life even taught me a good lesson once. One band's album had been recorded to ADAT through a Lexicon, but there were obvious synchronization problems, and clicks appeared constantly. The recorded material was not of the best quality to begin with. I realized that with DeClickers I would never get output matching my idea of quality, so I had to do the processing manually. And you know, I got an unexpected result. What did I do?

In Sound Forge, at maximum Zoom (waveform magnification), I treated each click with the Normalize, Fade In, and Fade Out operations. In the overall picture these momentary "dropouts" were not noticeable, but the clicks disappeared.
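A minimal sketch of that manual repair, on a synthetic signal with a hypothetical window width (the real clicks in Sound Forge were, of course, handled by eye and ear):

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 220 * t)   # a clean tone...
click_at = 22050
audio[click_at] += 0.9                # ...with an artificial click added

# Fade the few samples around the click down to zero and back up,
# mimicking the Fade Out / Fade In operations described above.
half = 32                             # half-width of the window (hypothetical)
fade = np.concatenate([np.linspace(1, 0, half), np.linspace(0, 1, half)])
audio[click_at - half:click_at + half] *= fade
```

A 64-sample dip at 44.1 kHz lasts under 1.5 ms, which is why such "disappearances" pass unnoticed.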

If you want to restore recordings at a high level, avoid hum-prone devices. Use standalone decks rather than all-in-one systems. Take it as fact that every module standing in the path from the medium to the ADC adds its own "contributions" to the useful signal. I have worked with guitar sound long enough to know how futile it can be to fight hum from interference and cheap gadgets. To remove hum, you can use tone generators, feeding the signal they produce in antiphase with the original.

Many hum-removal algorithms are close to the noise-removal ones: the same sample file, the same antiphase.

In some cases low-frequency hum can simply be removed with an equalizer.
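As a crude illustration with a synthetic 50 Hz hum, here is a brute-force FFT "equalizer" rather than any commercial EQ (parameters are made up for the demonstration):

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr                     # exactly one second of audio
music = np.sin(2 * np.pi * 440 * t)        # the useful signal
hum = 0.5 * np.sin(2 * np.pi * 50 * t)     # 50 Hz mains hum
noisy = music + hum

# With a 1-second window, FFT bin k sits at k Hz, so zeroing the bins
# below 100 Hz cuts the hum entirely while leaving 440 Hz untouched.
spectrum = np.fft.rfft(noisy)
spectrum[:100] = 0
clean = np.fft.irfft(spectrum)

residual = np.max(np.abs(clean - music))
```

A real equalizer uses smooth filters instead of hard bin cuts, but the principle is the same: the hum lives in a narrow low band the music barely needs.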

Improving sound quality

In fact, little has been invented.

1. Enhancers.

2. Surround.

3. Exciting harmonics and adding distortion.

4. Generation of additional harmonics and their addition to the original signal.

5. MaxxBass system or similar.

6. Hi Cutoff and Hi Boost.

7. Compression.

How all this works can be seen in programs such as DFX (a WinAmp plug-in), Ray Gun, Restoration-NR, T-RackS, etc.

What should be on a “restorer’s” desktop?

Dart Pro 32; Cool Edit Pro 1.2

Arboretum Systems Restoration-NR

Arboretum Systems Ray Gun

Blade MP3 Enc (or any other encoder)

T-RackS; AudioMulch 0.9 b4

PAS Analysis&EQ 2.0

Conclusions

The technological level of music production rises day by day, so record restoration is the domain of highly qualified specialists and studios with expensive equipment. In truth, turning a rough product into a polished one takes, above all, experience. In this article I have shown how I solve these problems, though it is by no means certain that these are the definitive solutions. The main thing I want to say is that the result will never come close to ideal. A perfectly clean sound is the sound of a GM synthesizer, and it is not a given that listeners love it. So when working on restoration, aim simply for the purity and transparency that modern music-industry products are known for. Christopher M. IHIHO

MINISTRY OF EDUCATION

RUSSIAN FEDERATION

MOSCOW STATE INSTITUTE

RADIO ENGINEERING ELECTRONICS AND AUTOMATION

(TECHNICAL UNIVERSITY)

Coursework in computer science

An old record: what digital sound is, and sound restoration using digital processing.

Student Chistyakov I.A.

Group OTO 4-04

Teacher Andrianova E. G.

The work is admitted to defense ______________________________

The coursework was defended with a grade of ___________________

Moscow 2005

1. Introduction

2. Part one: more theoretical

A. Theory of digital sound

B. Digitization of sound and its storage on digital media

C. Advantages and disadvantages of digital audio

D. On the question of sound processing

E. Equipment

F. Software

3. Part two: more practical

1. Connecting the player to a computer

2. Setting up the sound card

3. Restoration

4. Preparing files

5. Dividing the wave file into separate compositions

6. Prospects and problems

7. Glossary of terms

1. Introduction

Multimedia hardware capabilities have grown significantly of late, and the area receives a fair amount of attention, yet the average user still cannot form a clear picture of what his "iron friend" can do in the field of sound. Everything is limited to reproducing the screams and explosions in games and films (fortunately, technical progress has reached that level) and listening to the home music library (or is it time to invent another name, something like a "digital library"?).

In this work we will try to understand the main aspects of the problem: a little about the anatomy of hearing, the theory of digital sound, and what can be extracted from an old vinyl record or audio cassette.

What do we actually know about a computer's audio capabilities, beyond the fact that the home computer has a sound card and two speakers? Unfortunately, whether through a lack of literature or for other reasons, the user is most often familiar with nothing beyond the built-in Windows mixer and Sound Recorder. To find out what a computer can do with sound, you only need to take an interest, and capabilities will open up that you may never have imagined. And none of it is as difficult as it might seem at first glance.

2. Part one: more theoretical.

All recording, processing, and playback of sound ultimately serves the one organ with which we perceive it: the ear. We have two of them :).
Without understanding what we hear, what matters to us and what does not, and what underlies various musical regularities, it is impossible to design good audio equipment or to compress and process sound effectively. What follows is only the barest basics.
From the outside we see the so-called outer ear. Nothing of special interest here. Then comes the ear canal, roughly 0.5 cm in diameter and about 3 cm long. Next is the eardrum, to which small bones are attached: the middle ear. These bones pass the eardrum's vibration on to another membrane leading into the inner ear, a fluid-filled tube about 0.2 mm in diameter and another 3-4 cm long, coiled like a snail. The point of having a middle ear is that air vibrations are too weak to move the fluid directly; the middle ear, together with the eardrum and the inner-ear membrane, forms a hydraulic amplifier. The eardrum's area is many times larger than the inner-ear membrane's, so the pressure (which equals F/S) is amplified tens of times.
Inside the inner ear, along its entire length, stretches something resembling a string: another elongated membrane, stiff at the start of the ear and soft toward the end. Each section of this membrane vibrates in its own range: low frequencies in the soft section near the end, the highest at the very start. Along this membrane run the nerves that sense the vibrations and relay them to the brain using two principles.
The first is the impulse principle. Since nerves can still transmit vibrations (binary pulses) at rates up to 400-450 Hz, this principle serves the low-frequency region of hearing; nothing else would work there, as the membrane's vibrations are too strong and engage too many nerves. The impulse principle is stretched up to about 4 kHz by a trick: several nerves (up to ten) fire in different phases, pooling their bandwidth. This method is good because the brain perceives the information more fully: on the one hand we still get easy frequency separation, and on the other we can also examine the vibrations themselves, their shape and features, not just the frequency spectrum. This principle covers the region most important to us: the spectrum of the human voice. In general, all the information that matters most to us lies below 4 kHz.

The second principle is simply the location of the excited nerve, used for sounds above 4 kHz. Here nothing matters but the bare fact of excitation: not the phase, not the duty cycle... just the bare spectrum.
Thus in the high-frequency region we have purely spectral hearing of modest resolution, while for frequencies close to the human voice our hearing is fuller, based not only on spectral separation but on additional analysis of the information by the brain itself, yielding, for example, a more complete stereo picture. More on this below.

The bulk of sound perception happens in the 1-4 kHz range; the human voice lies in the same range (as do the sounds of most natural processes important to us). Correct transmission of this frequency segment is the first condition of natural sound.

About sensitivity (power and frequency)

Now about decibels. In short, the decibel is an additive, relative, logarithmic measure of sound loudness (power) that both reflects human loudness perception well and is simple to calculate.

In acoustics it is customary to measure loudness in dB SPL (Sound Pressure Level). The zero of this scale is roughly the quietest sound a person can hear, and the scale counts upward from there. A person can meaningfully hear sounds up to roughly 120 dB SPL; at 140 dB severe pain is felt, and at 150 dB the ears are damaged. Normal conversation is about 60-70 dB SPL.
Later in this section, dB means dB relative to SPL zero.
The ear's sensitivity varies greatly with frequency. It is highest in the 1-4 kHz region, where the basic tones of the human voice lie.
The 0 dB threshold is defined for a tone near 3 kHz. Sensitivity falls off sharply in both directions: for a 100 Hz sound we need as much as 40 dB (100 times the vibration amplitude), and for 10 kHz about 20 dB.
We can usually tell two sounds apart in loudness when they differ by about 1 dB. Even so, 1 dB is more a lot than a little; our perception of loudness is simply very compressed and flattened.
But the full range, 120 dB, is truly enormous: in amplitude it is a factor of a million!

By the way, doubling the amplitude corresponds to a 6 dB increase in volume. Careful, though: 12 dB is 4 times the amplitude, but a difference of 18 dB is already 8 times, not 6 as one might think. dB is a logarithmic measure.
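The arithmetic is easy to check with the plain logarithmic formula (nothing here is specific to any device):

```python
import math

def amp_to_db(ratio):
    # amplitude ratio -> decibels: dB = 20 * log10(ratio)
    return 20 * math.log10(ratio)

double = amp_to_db(2)          # doubling the amplitude: ~6 dB
quad = amp_to_db(4)            # 4x: ~12 dB (6 + 6, since ratios multiply)
eight = amp_to_db(8)           # 8x: ~18 dB, not "6 dB times 3 = 6x"
full_range = amp_to_db(10**6)  # a millionfold amplitude span: 120 dB
```

Each doubling adds the same 6 dB; that additivity under multiplication is exactly what "logarithmic measure" means.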

Spectral sensitivity behaves similarly. We can tell two simple tones apart when they differ in frequency by about 0.3% near 3 kHz, but near 100 Hz a 4% difference is needed! For reference, adjacent notes (counting semitones, i.e. two adjacent piano keys including the black ones) differ by about 6%.
In general, in the 1-4 kHz region the ear's sensitivity is at its maximum in every respect, and it is not all that great if we look at the non-logarithmic values digital technology has to work with. Take note: much of what happens in digital audio processing can look terrible numerically and still sound indistinguishable from the original.

In digital processing, dB are counted from zero downward into negative values: zero is the maximum level the digital circuit can represent.

A. Now, about digital audio itself.

Some facts and concepts that are hard to do without.

According to the theory of the mathematician Fourier, a sound wave can be represented as a spectrum of the frequencies it contains.

The frequency components of the spectrum are sinusoidal oscillations (so-called pure tones), each with its own amplitude and frequency. Thus any vibration, however complex in shape (the human voice, for example), can be represented as a sum of simple sinusoidal oscillations of particular frequencies and amplitudes. Conversely, by generating various vibrations and superimposing them on one another (mixing), we can obtain various sounds.

Background: the human auditory system and brain can distinguish frequency components of sound from 20 Hz to about 20 kHz (the upper limit varies with age and other factors). In addition, the threshold of hearing fluctuates considerably depending on the sound's intensity.

B. Digitization of sound and its storage on digital media

"Ordinary" analog sound is represented in analog equipment as a continuous electrical signal. A computer operates on data in digital form, which means that sound inside a computer is represented digitally. How does the conversion of an analog signal into a digital one take place?

Digital audio is a way of representing an electrical signal by discrete numerical values of its amplitude. Say we have a good-quality analog recording ("good quality" here meaning a low-noise recording containing spectral components across the entire audible range, roughly 20 Hz to 20 kHz) and want to "input" it into the computer, that is, digitize it, without losing quality. How is this achieved, and how does digitization work? A sound wave is a complicated function: the dependence of amplitude on time. It might seem that, being a function, it could be written into the computer "as is": describe its mathematical form and store it in memory.
In practice this is impossible, since sound vibrations cannot be captured by an analytical formula (like y = cos x, for example). Only one path remains: describe the function by storing its discrete values at particular points. In other words, at each moment of time the signal's amplitude can be measured and written down as a number. This method has its own drawback: we cannot record amplitude values with infinite precision and are forced to round them. In effect we approximate the function along both coordinate axes, amplitude and time (to approximate at points means, simply put, to take the function's values at points and write them down with finite accuracy). Digitization thus comprises two processes: sampling and quantization. Sampling is the process of taking the values of the converted signal at fixed time intervals (Fig. 1).

Quantization is the process of replacing the signal's real values with approximate ones of a certain precision (Fig. 2). Digitization, then, means recording the signal's amplitude at fixed intervals and writing down the results as rounded digital values (amplitude is a continuous quantity, so its exact value cannot be written in a finite number of digits, hence the rounding). The recorded amplitude values are called samples.
Obviously, the more often we measure the amplitude (the higher the sampling rate) and the less we round the results (the more quantization levels), the more accurate a digital representation of the signal we obtain.
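The two steps can be sketched as follows (the sampling rate, bit depth, and test tone are illustrative; a real converter does this in hardware):

```python
import numpy as np

fs = 8000       # sampling rate, Hz (illustrative)
bits = 8        # quantization depth (illustrative)

# Sampling: take the signal's value at discrete instants 1/fs apart.
t = np.arange(fs) / fs
samples = np.sin(2 * np.pi * 440 * t)

# Quantization: round each value to one of 2**bits levels on [-1, 1).
step = 2.0 / 2**bits
quantized = np.round(samples / step) * step

# The rounding error never exceeds half a quantization step.
max_error = np.max(np.abs(quantized - samples))
```

Raising `fs` tightens the time axis and raising `bits` tightens the amplitude axis, which is exactly the trade-off against file size discussed below.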

The digitized signal can be stored as a set of successive amplitude values.

Now about practical problems. Firstly, we must keep in mind that the computer’s memory is not infinite, so every time when digitizing it is necessary to find some kind of compromise between the quality (directly dependent on the parameters used during digitization) and the volume occupied by the digitized signal.

Secondly, the sampling rate sets an upper limit on the frequencies of the digitized signal: the maximum frequency of the spectral components equals half the sampling rate. Simply put, to capture complete information about sound in the band up to 22,050 Hz, a sampling rate of at least 44.1 kHz is required.

There are other problems and nuances associated with audio digitization.
Without going into great detail, note that "digital sound", owing to the discreteness of the amplitude information, acquires various noises and distortions (the phrase "this digital sound contains such-and-such frequencies and noise" means that when the sound is converted back from digital to analog, those frequencies and noises will be present in it). For example, jitter: noise arising because the samples are not taken at perfectly equal time intervals but with small deviations. That is, if sampling nominally runs at 44.1 kHz, samples are taken not exactly every 1/44100 of a second but sometimes slightly earlier, sometimes slightly later, and since the input signal changes constantly, this error leads to "capturing" a slightly wrong signal level. As a result, some flutter and distortion may be audible when the digitized signal is played back. Jitter stems from the imperfect stability of analog-to-digital converters; to combat it, highly stable clock generators are used.
Another nuisance is quantization noise. As we said, when the amplitude is quantized it is rounded to the nearest level; this error produces a sensation of "dirty" sound.

Reference: the standard parameters for Audio CD are a sampling rate of 44.1 kHz and 16-bit quantization.
These parameters give 65,536 (2^16) amplitude quantization levels with values taken 44,100 times per second.

In practice the digitization process (sampling and quantization) is invisible to the user; all the rough work is done by programs that issue the appropriate commands to the sound card's driver (the operating system's control routine). Any program capable of recording an analog signal into the computer (whether the built-in Windows Sound Recorder or a powerful audio editor) digitizes the signal with particular parameters, and those parameters can matter in later work with the recording. This is why it is important to understand how digitization works and what factors influence its results.

After all, we need to hear the sound, and numbers as such cannot be heard.

2. Converting sound from digital to analog

How to listen to sound after digitization? That is, how to convert it back from digital to analog?

To convert the sampled signal back into an analog form suitable for analog devices (amplifiers and filters) and for playback through speakers, a digital-to-analog converter (DAC) is used. The conversion is the inverse of sampling: knowing the magnitudes of the samples (the signal's amplitudes) and taking a certain number of samples per unit of time, the original signal is restored by interpolation (Fig. 3).
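Interpolation between samples can be sketched like this, using linear interpolation as a simplified stand-in for a real DAC's reconstruction filter (all numbers are illustrative):

```python
import numpy as np

fs = 1000                                # sampling rate, far above the tone
t = np.arange(fs + 1) / fs               # sample instants covering [0, 1]
samples = np.sin(2 * np.pi * 5 * t)      # a 5 Hz tone, heavily oversampled

# "Restore" the continuous signal on a much finer time grid by linear
# interpolation between neighbouring samples.
fine = np.linspace(0, 1, 8 * fs)
restored = np.interp(fine, t, samples)
ideal = np.sin(2 * np.pi * 5 * fine)

error = np.max(np.abs(restored - ideal))  # tiny when oversampling this much
```

The closer the signal frequency gets to half the sampling rate, the worse simple interpolation does, which is why real DACs use smoothing filters rather than straight lines.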

Until recently, sound reproduction on home computers was a problem, since they were not equipped with dedicated DACs. At first the built-in PC speaker served as the computer's simplest sound device. Generally speaking, this speaker is still present in almost every PC, though hardly anyone remembers how to coax it into playing. In short, the speaker hangs off a port on the motherboard that has two states, 1 and 0; if that port is toggled quickly enough, more or less believable sounds can be extracted from the speaker. Different frequencies are obtained because the speaker cone has inertia and cannot jump instantly from place to place: it swings smoothly in response to abrupt voltage changes, and by driving it at different rates you can produce air vibrations of different frequencies. A natural alternative to the speaker was the so-called Covox: the simplest DAC, built from a few matched resistors (or a ready-made chip) that convert the digital representation of the signal into analog, i.e. into actual amplitude values. The Covox is easy to build, so it was a hit among hobbyists right up until sound cards became affordable to everyone.

In a modern computer, sound is recorded and reproduced by a sound card, either plugged in or built into the motherboard. Its job is audio input and output: in practice, the sound card is the converter that turns analog sound into digital and back. Simplified, its operation can be described as follows.
Suppose an analog signal is fed to the sound card's input and the card is enabled (in software). First, the input signal goes to an analog mixer, which combines signals and adjusts volume and balance; the mixer exists, in particular, so the user can control levels. The adjusted and balanced signal then enters the analog-to-digital converter, where it is sampled and quantized, and the resulting bit stream representing the digitized audio travels over the data bus into the computer. Audio output works almost the same way, only in reverse: the data stream sent to the card passes through a digital-to-analog converter, which turns the numbers describing the amplitude into an electrical signal; the resulting analog signal can then pass through any analog stages for further processing, including playback. Note that if the card has an interface for exchanging digital data, none of its analog blocks are used when working with digital audio.

There are many ways to store digital audio. As we said, digitized sound is a set of amplitude values taken at fixed intervals. Thus, firstly, a block of digitized audio can be written to a file "as is", i.e. as a sequence of numbers (amplitude values). In this case there are two ways to store the information.

The first (Fig. 4) is PCM (Pulse Code Modulation): coding the signal by recording absolute amplitude values (in signed or unsigned representation). This is the form in which data is recorded on all audio CDs.

The second method (Fig. 5) is ADPCM (Adaptive Differential PCM): recording the signal not as absolute values but as relative changes in amplitude (increments).
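The difference between the two representations fits in a toy example (plain deltas, without the "adaptive" part of real ADPCM):

```python
import numpy as np

# PCM: absolute amplitude values.
pcm = np.array([0, 3, 5, 6, 6, 4, 1, -2])

# Delta coding: keep the first value, then store only the changes.
encoded = np.concatenate(([pcm[0]], np.diff(pcm)))

# Decoding: a running sum turns the increments back into absolute values.
decoded = np.cumsum(encoded)
```

The increments of a smooth signal are small numbers, so they can be stored in fewer bits than the absolute values; that is the whole point of the relative representation.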

Secondly, it is possible to compress or simplify the data so that it takes up less memory than if it were written “as is”. Here too there are two ways.

Lossless coding is an audio encoding method that allows 100% recovery of the data from the compressed stream. It is used when preserving the original quality is critical: for example, after mixing in a recording studio, the material must be archived at original quality for possible later use. Today's lossless algorithms (Monkey's Audio, for example) reduce the data volume by 20-50% while guaranteeing exact restoration of the original from the compressed data. Such encoders are essentially data archivers (like ZIP, RAR and others), only specialized for audio.

The second way, which we will discuss in a little more detail, is lossy coding. Its goal is to make the restored signal sound as similar to the original as possible while packing the data as small as possible. This is achieved with algorithms that "simplify" the original signal (discarding "unneeded", barely audible details), so the decoded signal ceases to be identical to the original and merely sounds similar. There are many compression methods, and many programs implementing them. The best known are MPEG-1 Layer I, II, III (the last being the famous MP3), MPEG-2 AAC (Advanced Audio Coding), Ogg Vorbis, Windows Media Audio (WMA), TwinVQ (VQF), MPEGPlus, TAC, and others. On average such encoders compress by a factor of 10-14. It should be stressed that all lossy coders rest on a so-called psychoacoustic model, which is precisely what governs the "simplification" of the original signal. More exactly, the encoder analyzes the signal, determining regions whose frequency content contains nuances inaudible to the human ear (masked or inaudible frequencies), and removes them from the signal. The degree of compression thus depends on the degree of "simplification": strong compression is achieved through "aggressive simplification" (when the encoder "deems" many nuances unnecessary), which naturally leads to severe quality degradation, since not only inconspicuous but also significant details of the sound may be removed.

As we said, there are quite a few modern lossy coders.
The most widespread format is MPEG-1 Layer III, the famous MP3.
Its popularity is entirely deserved: it was the first widely available codec of its kind to reach such compression with excellent sound quality. Today there are many alternatives, and the choice is up to the user. MP3's advantages are its ubiquity and fairly high encoding quality, which keeps objectively improving thanks to encoders developed by enthusiasts (the Lame encoder, for example). A powerful alternative is Microsoft's Windows Media Audio codec (.WMA and .ASF files). In various tests this codec ranges from "like MP3" to "noticeably worse than MP3" at medium bitrates, and, more often, "better than MP3" at low bitrates. Ogg Vorbis (.OGG files) is a completely license-free codec created by independent developers; it usually does better than MP3, its only drawback being low prevalence, which can be a decisive argument when choosing a codec for long-term audio storage. Let's also remember the still-young MP3 Pro codec, announced in July 2001 by Coding Technologies together with Thomson Multimedia. It is a continuation, or rather a development, of the old MP3: compatible with MP3 backwards (fully) and forwards (partially). Thanks to its new SBR (Spectral Band Replication) technology, the codec behaves noticeably better than other formats at low bitrates, but its quality at medium and high bitrates is often inferior to virtually all the codecs described. MP3 Pro is therefore better suited to Internet audio broadcasting and to making preview clips of songs and music.

Speaking about ways to store sound in digital form, one cannot help but mention storage media. The familiar audio CD appeared in the early 80s and became truly widespread only in recent years (thanks to the significant drop in the cost of both the media and the drives). Before that, digital data was carried on magnetic tape cassettes - not ordinary ones, but cassettes designed for so-called DAT tape recorders. Outwardly unremarkable - tape recorders like any others - they were always expensive, a pleasure not everyone could afford, and were used mainly in recording studios. Their advantage was that, despite using familiar media, the data on them was stored in digital form with practically no losses on reading or writing (which is very important for studio processing and storage of sound). Today a large number of different storage media have appeared in addition to the familiar CD. Media improve every year, becoming more accessible and more compact, which opens up great opportunities for mobile audio players. A huge number of portable digital player models are already on sale, and we can assume this is still far from the peak of this technology's development.

D. Advantages and disadvantages of digital audio

From the point of view of the average user there are many benefits: the compactness of modern storage media allows him, for example, to transfer all the discs and records in his collection into digital form and store them for many years on one small hard drive or a dozen or two CDs; with special software he can thoroughly "clean" old recordings from reels and records, removing noise and crackle from their sound; and he can not only correct the sound but also embellish it: add richness and volume, restore lost frequencies.
In addition to the listed manipulations with sound at home, the Internet also comes to the aid of the audio enthusiast. For example, the network allows people to share music, listen to hundreds of thousands of different Internet radio stations, and showcase their audio creations to the public, all with just a computer and the Internet. And finally, recently a huge mass of various portable digital audio equipment has appeared, the capabilities of even the most average representative of which often make it possible to easily take with you on the road a collection of music equal in duration to tens of hours.

From a professional's point of view, digital audio opens up truly immense possibilities. Where sound and radio studios once occupied several dozen square meters, they can now be replaced by a good computer that surpasses ten such studios combined in capability and costs many times less than one. This removes many financial barriers and makes sound recording more accessible to professionals and amateurs alike. Modern software lets you do whatever you want with sound. Effects that used to require ingenious devices - not always the height of engineering, sometimes simply homemade contraptions - are today achieved, even the most complex and previously unimaginable ones, by pressing a couple of buttons. Of course, this picture is somewhat exaggerated, and a computer does not replace the person - the sound engineer, director or editor - but it is safe to say that the compactness, mobility, colossal power and assured quality of modern digital sound-processing technology have already driven the old analog equipment almost completely out of the studios.

However, digital data has one undeniable and very important advantage: as long as the medium is preserved, the data on it does not degrade over time. Magnetic tape demagnetizes and loses recording quality; a record gets scratched, adding clicks and crackle to the sound; but a CD, hard drive or electronic memory is either readable (if intact) or not - there is no aging effect. An important caveat: this does not apply to Audio CDs (CD-DA is the standard defining the parameters and recording format of audio CDs), since, although it is a digital medium, the aging effect does not spare it. The reason lies in how audio data is stored on and read from an Audio CD. Information on all types of CD is stored frame by frame, and each frame has a header by which it can be identified. However, different types of CD have different structures and mark frames differently.
Since computer CD-ROM drives are designed to read primarily Data CDs (there are various variations of the Data-CD standard, each of which extends the base CD-DA standard), they are often unable to "orient" themselves correctly on an Audio CD, where frames are marked differently than on a Data CD (audio CD frames have no special header, so determining the offset of each frame requires tracking the information inside it). When reading a Data CD the drive easily orients itself on the disc and never confuses frames, but when reading an audio CD it cannot position itself precisely; a scratch or a speck of dust can then cause it to read the wrong frame, producing skips or crackle in the sound. The same problem (the inability of most drives to position themselves accurately on a CD-DA) causes another unpleasant effect: copying information from an Audio CD is troublesome even with fully intact discs, because the correct "orientation on the disc" depends entirely on the drive and cannot be reliably controlled by software.

The widespread distribution and further development of the already mentioned lossy audio encoders (MP3, AAC and others) has opened up the broadest possibilities for distributing and storing audio. Modern communication channels have long been able to transfer large amounts of data in a relatively short time, but the slowest link remains the connection between the end user and the provider: the telephone lines through which most users access the Internet do not allow fast transfer, and volumes as large as uncompressed audio or video would take far too long to send over such channels. The emergence of lossy encoders providing ten- to fifteen-fold compression, however, turned the transmission and exchange of audio data into an everyday activity for every Internet user and removed the barriers created by weak communication channels.
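To make the point concrete, here is a hypothetical back-of-the-envelope calculation; the 56 kbit/s dial-up speed and 128 kbit/s MP3 bitrate are our illustrative assumptions, not figures from the text:

```python
# Transfer times over a 56 kbit/s dial-up link for a 4-minute track,
# uncompressed CD-quality PCM versus MP3 at 128 kbit/s.

LINK_KBPS = 56            # dial-up modem under ideal conditions
TRACK_SECONDS = 4 * 60    # a 4-minute song

def transfer_minutes(audio_kbps):
    """Minutes needed to send the track at the given audio bitrate."""
    size_kbit = audio_kbps * TRACK_SECONDS
    return size_kbit / LINK_KBPS / 60

wav_min = transfer_minutes(1411.2)   # raw CD audio: ~100 minutes
mp3_min = transfer_minutes(128)      # MP3: ~9 minutes
print(f"WAV: {wav_min:.0f} min, MP3: {mp3_min:.0f} min")
```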
It should be said in this connection that digital mobile communications, developing by leaps and bounds today, owe much to lossy coding: audio transmission protocols over mobile channels operate on roughly the same principles as the familiar music encoders. Further progress in audio encoding therefore steadily reduces the cost of data transmission in mobile systems, from which the end user only benefits: communication becomes cheaper, new possibilities appear, the battery life of mobile devices grows, and so on. Not least, lossy encoding helps save money on buying CDs with your favorite songs: today you need only go online to find almost any song you are interested in. Of course, this state of affairs has long been an eyesore for record companies - right under their noses, instead of buying CDs, people exchange songs directly over the Internet, turning what was once a goldmine into a low-profit business - but that is a matter of ethics and finance. One thing is certain: nothing can be done about it; the boom in music exchange over the Internet, generated precisely by the emergence of lossy coders, can no longer be stopped. And the average user only benefits.

E. On the issue of sound processing

Sound processing means various transformations of audio information aimed at changing certain characteristics of the sound. It includes methods for creating sound effects, filtering, clearing sound of unwanted noise, changing timbre and so on. All this huge variety of transformations ultimately boils down to the following basic types:

1. Amplitude transformations. Performed on the amplitude of the signal; they amplify or attenuate it, or change it according to some law over certain sections of the signal.

2. Frequency transformations. Performed on the frequency components of the sound: the signal is represented as a frequency spectrum over short time intervals, the necessary frequency components are processed (for example, filtered), and the spectrum is transformed back into a waveform.

3. Phase transformations. The phase of the signal is shifted in one way or another; applied to a stereo signal, for example, such transformations can create the effect of rotating or "surround" sound.

4. Time transformations. Implemented by superimposing, stretching or compressing signals; they allow you to create, for example, echo or chorus effects, and to influence the spatial characteristics of the sound.
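As a tiny illustration of the first type, amplitude transformation, here is a sketch of a linear fade-in (NumPy-based; the function name and parameters are ours, not from the text):

```python
import numpy as np

# The simplest amplitude transformation: multiply the start of the
# signal by a linear ramp envelope to produce a fade-in.

def fade_in(signal: np.ndarray, fade_samples: int) -> np.ndarray:
    """Apply a linear fade-in over the first `fade_samples` samples."""
    out = signal.astype(float).copy()
    ramp = np.linspace(0.0, 1.0, fade_samples)  # 0 -> 1 envelope
    out[:fade_samples] *= ramp
    return out

tone = np.ones(10)        # a dummy constant signal
faded = fade_in(tone, 5)  # first 5 samples ramp from 0 to full level
```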

A discussion of each of these types of transformation could fill a scientific work of its own, so let's limit ourselves to a few practical examples of their use in creating real sound effects:

Echo. Implemented using time transformations: to obtain an echo, a time-delayed copy of the input signal is superimposed on the original. For the human ear to perceive the second copy as a distinct repetition rather than as part of the main signal, the delay must be at least about 50 ms. More than one copy can be superimposed on the main signal, producing a multiple-repetition (polyphonic) echo at the output. For the echo to appear to fade away, the delayed copies must also be attenuated in amplitude.
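A minimal sketch of the echo scheme just described, assuming NumPy; the parameter names and values are our own illustrative choices:

```python
import numpy as np

# Echo: mix attenuated, delayed copies of the signal back into it.

def echo(signal, sample_rate, delay_ms=250, decay=0.5, repeats=3):
    d = int(sample_rate * delay_ms / 1000)       # delay in samples
    out = np.zeros(len(signal) + d * repeats)    # room for the tail
    out[:len(signal)] += signal
    for i in range(1, repeats + 1):              # each copy quieter
        out[i * d : i * d + len(signal)] += signal * decay ** i
    return out

fs = 8000
x = np.ones(100)                        # a dummy 100-sample burst
y = echo(x, fs, delay_ms=50, decay=0.5, repeats=2)
```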

Reverberation. The effect gives the sound the spaciousness of a large hall, where each sound generates a corresponding, slowly fading echo; in practice, reverberation can "revive," for example, a soundtrack recorded in a dead-sounding room. Reverb differs from echo in that the delayed signal superimposed on the input is the output signal rather than a delayed copy of the input. In other words, a reverb block is a feedback loop: the block's output is connected to its input, so the already processed signal is fed back in on every pass and mixed with the original.
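The feedback loop described above is, in essence, a comb filter; a minimal sketch follows (parameter names and values are ours):

```python
import numpy as np

# Feedback comb filter: the *output*, delayed and attenuated,
# is fed back into the input, producing a fading reverberant tail.

def reverb(signal, delay_samples, feedback=0.5):
    out = np.zeros(len(signal) + delay_samples * 4)  # room for the tail
    out[:len(signal)] = signal
    for n in range(delay_samples, len(out)):
        out[n] += feedback * out[n - delay_samples]  # recycled output
    return out

impulse = np.zeros(10)
impulse[0] = 1.0
tail = reverb(impulse, delay_samples=10, feedback=0.5)
# the impulse recurs at multiples of the delay, fading geometrically
```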

Chorus (choir). As a result of its application, the sound of the signal turns into the sound of a choir or the simultaneous sound of several instruments. The scheme for obtaining such an effect is similar to the scheme for creating an echo effect, with the only difference being that the delayed copies of the input signal are subjected to weak frequency modulation (on average from 0.1 to 5 Hz) before mixing with the input signal. Increasing the number of voices in a choir is achieved by adding copies of the signal with different delay times.
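A rough sketch of this scheme: one copy of the signal whose delay is slowly modulated by a low-frequency sine (the LFO rate falls in the 0.1-5 Hz range mentioned above; all other parameter names and values are our illustrative assumptions):

```python
import numpy as np

# Chorus: mix in a copy of the signal read through a slowly
# modulated delay line, detuning it slightly against the original.

def chorus(signal, fs, lfo_hz=1.0, base_delay_ms=20.0, depth_ms=5.0):
    n = np.arange(len(signal))
    # time-varying delay in samples, swept by the LFO
    delay = (base_delay_ms
             + depth_ms * np.sin(2 * np.pi * lfo_hz * n / fs)) * fs / 1000.0
    idx = n - delay
    # linear interpolation of the delayed, modulated copy
    lo = np.clip(np.floor(idx).astype(int), 0, len(signal) - 1)
    hi = np.clip(lo + 1, 0, len(signal) - 1)
    frac = idx - np.floor(idx)
    delayed = (1 - frac) * signal[lo] + frac * signal[hi]
    delayed[idx < 0] = 0.0             # silence before the copy starts
    return signal + 0.5 * delayed      # mix the wet copy at half level

fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s of 440 Hz
y = chorus(x, fs)
```

Adding more copies with different base delays, as the text notes, increases the number of "voices" in the choir.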

Of course, as in every other field, signal processing has its stumbling blocks. For example, when decomposing a signal into its frequency spectrum there is an uncertainty principle that cannot be overcome: it is impossible to obtain an accurate spectral picture of a signal at a specific moment in time. Either we analyze a longer section of the signal to obtain a more accurate spectrum, or, if we care more about exactly when a change in the spectrum occurred, we must sacrifice the accuracy of the spectrum itself. In other words, we can have either an exact spectrum over a long section of the signal, or a very approximate spectrum over a short one - never the exact spectrum at a point.
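The tradeoff can be seen in numbers: the frequency resolution of an N-point FFT is fs/N, so sharpening the spectrum necessarily lengthens the analyzed time window (the window sizes below are illustrative choices of ours):

```python
# Frequency resolution vs. time resolution of an N-point FFT
# at a CD sampling rate: resolution in Hz is fs / N.

fs = 44100
for n_window in (256, 4096):
    freq_res = fs / n_window           # Hz between adjacent FFT bins
    time_span = n_window / fs * 1000   # window length in ms
    print(f"N={n_window}: {freq_res:.1f} Hz bins over {time_span:.1f} ms")
# N=256  -> coarse ~172 Hz bins, but a short 5.8 ms window
# N=4096 -> fine ~10.8 Hz bins, but a long 92.9 ms window
```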

F. Equipment

An important part of any conversation about sound concerns equipment. There are many different devices for processing and for sound input/output; for an ordinary personal computer, the sound card deserves the closest look. Sound cards are usually divided into audio, music and audio-music cards. By design, they fall into two groups: main cards (installed on the computer motherboard and providing audio input and output) and daughter cards (fundamentally different in design: they usually connect to a special connector on the main board). Daughter boards are most often used to provide or extend the capabilities of a MIDI synthesizer.

Audio, music and sound cards are made in the form of devices inserted into the motherboard slot (or already built into it initially).
Externally they usually offer two analog inputs - line and microphone - and several analog outputs: line outputs and a headphone output. Recently cards have also begun to carry a digital input and output for audio transfer between digital devices. The analog inputs and outputs usually use 1/8" jacks like those on headphones. In fact a sound card has more than two inputs: there are analog CD, MIDI and other inputs which, unlike the microphone and line inputs, sit not on the rear bracket but on the board itself; there may be others still, for example for connecting a voice modem. Digital inputs and outputs are usually implemented as an S/PDIF interface with a corresponding connector (S/PDIF is short for Sony/Philips Digital Interface). S/PDIF is a "consumer" version of the more complex professional AES/EBU standard (Audio Engineering Society / European Broadcasting Union). The S/PDIF signal carries 16-bit stereo data digitally at any sampling rate.
In addition to the above, audio and music boards have a MIDI interface with connectors for connecting MIDI devices and joysticks, as well as for connecting a daughter music card (although recently the ability to connect the latter has become rare). Some models of sound cards, for user convenience, are equipped with a front panel installed on the front side of the computer system unit, on which connectors connected to various inputs and outputs of the sound card are located.

Let's define several main blocks that make up sound and sound-music boards.

1. Digital signal processing unit (codec). This block performs the analog-to-digital and digital-to-analog conversions (ADC and DAC). It determines such card characteristics as the maximum sampling rate for recording and playback, the maximum quantization depth and the maximum number of processed channels (mono or stereo). The card's noise characteristics depend to a large extent on the quality and complexity of this block's components.

2. Synthesizer block. Present in music cards. Performed on the basis of either FM or WT synthesis, or both at once. It can work either under the control of its own processor or under the control of a special driver.

3. Interface block. Provides data transfer over various interfaces (for example, S/PDIF). A pure sound card often lacks this block.

4. Mixing unit. In sound cards the mixing unit regulates: the signal levels from the line inputs; the levels from the MIDI input and the digital audio input; the overall signal level; panning; and timbre.

Let's consider the most important parameters characterizing audio and audio-music boards: the maximum sampling rate in recording and playback modes, and the maximum quantization depth (bit depth) in recording and playback modes. In addition, since audio-music boards also carry a synthesizer, its parameters belong among their characteristics as well. Naturally, the greater the quantization depth at which a card can encode a signal, the higher the achievable quality. All modern sound cards can encode at 16 bits, and consumer cards with 24-bit encoding have recently appeared (the Audigy and Audigy II card lines from Creative). One further important characteristic is the ability to play and record audio streams simultaneously.
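As an aside on why the quantization depth matters: by the standard rule of thumb (our addition, not a claim from the text), each extra bit adds roughly 6 dB of theoretical dynamic range, so the jump from 16 to 24 bits is substantial:

```python
import math

# Theoretical dynamic range of linear PCM: 20*log10(2^bits),
# i.e. about 6.02 dB per quantization bit.

def dynamic_range_db(bits):
    return 20 * math.log10(2 ** bits)

print(f"16-bit: {dynamic_range_db(16):.1f} dB")   # ~96.3 dB
print(f"24-bit: {dynamic_range_db(24):.1f} dB")   # ~144.5 dB
```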
The ability of a card to play and record simultaneously is called full duplex. One more characteristic often plays a decisive role when buying a sound card: the signal-to-noise ratio (S/N), which affects the purity of recording and playback. The signal-to-noise ratio is the ratio of signal power to noise power at the device's output, usually measured in dB; a ratio of 80-85 dB can be considered good, and 95-100 dB ideal. Bear in mind, however, that playback and recording quality is strongly affected by interference from other computer components (the power supply and so on), which can degrade the real signal-to-noise ratio. In practice there are quite a few ways to fight this. Some people suggest grounding the computer; others, to shield the sound card from interference as thoroughly as possible, take it outside the computer case entirely. Complete protection is very difficult, though, since even the elements of the card itself interfere with one another; this too is fought, by shielding every element on the board, but however much effort goes into the problem, the influence of interference can never be eliminated entirely.
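For reference, the dB figures above come from the standard decibel formula for a power ratio; a sketch:

```python
import math

# Signal-to-noise ratio in decibels: SNR_dB = 10 * log10(Ps / Pn).

def snr_db(p_signal, p_noise):
    """SNR in dB from signal power and noise power (same units)."""
    return 10 * math.log10(p_signal / p_noise)

# A signal 10^8 times stronger than the noise gives 80 dB
# ("good" by the scale quoted above).
print(snr_db(1e8, 1.0))
```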

Another equally important characteristic is the coefficient of nonlinear distortion, or Total Harmonic Distortion (THD), which also critically affects the purity of the sound. THD is measured as a percentage: 1% is "dirty" sound; 0.1% is normal sound; 0.01% is clean Hi-Fi sound; 0.002% is Hi-End class sound. Nonlinear distortion is the result of inaccuracy in reconstructing the analog signal from its digital form. Simplified, the coefficient is measured as follows: a pure sinusoidal signal is fed to the sound card's input, and the signal taken at the output has a spectrum that is a sum of sinusoids (the original sine plus its harmonics). A special formula then computes the quantitative ratio between the original signal and the harmonics appearing at the output; this ratio is the total harmonic distortion (THD).
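The text does not spell out the "special formula"; a commonly used form of it, which we assume here, divides the RMS sum of the harmonic amplitudes by the fundamental:

```python
import math

# THD as commonly defined: the RMS sum of the harmonic amplitudes
# relative to the fundamental, expressed as a percentage.

def thd_percent(fundamental, harmonics):
    """fundamental: output amplitude of the original sine;
    harmonics: amplitudes of its 2nd, 3rd, ... harmonics."""
    return 100 * math.sqrt(sum(h * h for h in harmonics)) / fundamental

# e.g. two harmonics far weaker than the fundamental:
print(f"{thd_percent(1.0, [0.001, 0.0005]):.3f}%")  # ~0.112%, "normal"
```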

What is a MIDI synthesizer? The term "synthesizer" usually denotes an electronic musical instrument in which sound is created and processed, changing its color and characteristics; naturally, the device takes its name from its main purpose, sound synthesis. There are two main methods of sound synthesis: FM (Frequency Modulation) and WT (WaveTable). Since we cannot discuss them in detail here, we will describe only the basic idea of each. FM synthesis rests on the idea that even the most complex oscillation is essentially a sum of simple sinusoids; thus, by superimposing the signals of a finite number of sine-wave generators and varying the sine frequencies, one can produce sounds resembling real ones. Wavetable synthesis works on a different principle: sound is synthesized by manipulating pre-recorded (digitized) sounds of real musical instruments. These sounds (called samples) are stored in the synthesizer's permanent memory.
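A toy illustration of the FM idea (all numbers are arbitrary; real FM synthesizers chain several such operators):

```python
import numpy as np

# Toy FM operator: a carrier sine whose phase is modulated by a
# second sine, producing a harmonically rich, instrument-like tone.

def fm_tone(fs=8000, seconds=1.0, carrier=440.0, modulator=110.0, index=2.0):
    t = np.arange(int(fs * seconds)) / fs
    # phase-modulation form of the classic FM operator
    return np.sin(2 * np.pi * carrier * t
                  + index * np.sin(2 * np.pi * modulator * t))

tone = fm_tone()   # 1 second of an FM tone at an 8 kHz sample rate
```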

It should be noted that since MIDI data is a set of commands, music written in MIDI is likewise written as synthesizer commands. In other words, a MIDI score is a sequence of instructions: which note to play, with which instrument, at what duration and pitch, and so on. The familiar MIDI files (.MID) are nothing more than collections of such commands. Naturally, since there are a great many manufacturers of MIDI synthesizers, the same file can sound different on different synthesizers: the file stores not the instruments themselves but only instructions telling the synthesizer which instruments to play, and different synthesizers render those instruments differently.
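To see how literally "a set of commands" should be taken, here is what a raw MIDI Note On message looks like at the byte level (the helper names are ours):

```python
# A MIDI Note On message is just three bytes:
# a status byte (0x90 | channel), a note number, and a velocity.

def note_on(channel, note, velocity):
    """Build a MIDI Note On message (channel 0-15, note/velocity 0-127)."""
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note):
    """Build the matching Note Off message (velocity 0)."""
    return bytes([0x80 | channel, note, 0])

msg = note_on(0, 60, 100)   # middle C, channel 1, medium-loud
print(msg.hex())            # '903c64'
```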

Let's return to audio and music boards. Now that we have clarified what MIDI is, we cannot ignore the characteristics of a sound card's built-in hardware synthesizer. A modern synthesizer is most often based on a wavetable (WaveTable: in short, sound is synthesized from a set of recorded sounds by dynamically superimposing them and varying their parameters); earlier the main type of synthesis was FM (Frequency Modulation: synthesis by generating simple sinusoidal oscillations and mixing them). The main characteristics of a WT synthesizer are: the number of instruments in ROM and the ROM size; the presence of RAM and its maximum size; the number of available signal-processing effects and whether per-channel effect processing is possible (given an effects processor, of course); the number of generators, which determines the maximum number of voices in polyphonic mode; and, perhaps most importantly, the standard to which the synthesizer conforms (GM, GS or XG). Incidentally, a synthesizer's memory size is not always fixed: recently synthesizers have begun to forgo their own ROM and use the computer's main RAM instead, in which case all the sounds the synthesizer uses are stored in a file on disk and read into RAM as needed.

G. Software

The topic of software is very broad, so we will consider a small fraction of programs for audio processing.

The most important class of programs is digital audio editors. At a minimum, such programs can record (digitize) audio and save it to disk. The more developed representatives of the genre can do much more: multi-channel mixing on several virtual tracks, processing with special effects (both built-in and externally connected, on which more later), noise removal, advanced navigation and tools such as a spectroscope and other virtual devices, control of external devices, conversion between audio formats, signal generation, CD burning and much more. Some of these programs: Cool Edit Pro (Syntrillium), Sound Forge (Sonic Foundry), Nuendo (Steinberg), Samplitude Producer (Magix), WaveLab (Steinberg), DART.

The main features of the Cool Edit Pro 2.0 editor (see Screenshot 1, an example of the program's working window in multi-track mode): editing and mixing audio on 128 tracks; 45 built-in DSP effects, including tools for mastering, analysis and restoration of audio; 32-bit processing; support for 24-bit/192 kHz audio; powerful tools for working with loops; DirectX support; SMPTE/MTC control; support for video and MIDI; and more.

Screenshot 1.

The main features of the Sound Forge 6.0a editor (see Screenshot 2, an example of the program's working window): powerful non-destructive editing, multitasking background processing, support for files with parameters up to 32-bit/192 kHz, a preset manager, support for files over 4 GB, video support, a large set of processing effects, crash recovery, preview of applied effects, a spectral analyzer and more.

Screenshot 2

Specialized audio restorers also play an important role in audio processing. Such programs can restore lost sound quality, remove unwanted clicks, noise and crackle, clean up the specific interference of cassette recordings, and make other adjustments. Programs of this kind: DART, Clean (from Steinberg Inc.), Audio Cleaning Lab (from Magix Ent.), Wave Corrector.

The main features of the Clean 3.0 restorer (see Screenshot 3 - working window of the program): elimination of all kinds of crackles and noise, auto-correction mode, a set of effects for processing corrected sound, including the “surround sound” function with visual acoustic modeling of the effect, recording a CD with prepared data, “intelligent” hint system, support for external VST plug-ins and other features.

Screenshot 3

Part two: more practical

Recently the topic of archiving old vinyl discs and cassettes has become relevant: today everyone listens to music on a computer, and it is a pity when old recordings stay out of reach.

Music digitization

Try giving your old records and audio cassettes a new life. Until the mid-eighties, music lovers were divided into two camps: the specialists, who knew how to keep precious vinyl records and tapes in excellent condition, and the amateurs, who did not worry about fingerprints or scratches on a record's surface. CDs removed such problems: they are more compact and harder to damage, and when they became widespread, the gramophone record was retired. But in the digital age these musical treasures can continue to delight listeners: process the old recordings on your computer and burn them to CD!

1. Connecting the player to the computer

First of all, connect the player to your computer. This can be done in several ways. Some players have their own amplifier; connect its output to the Line-In input of the sound card. If your player lacks an amplifier and its signal is too weak, use an external amplifier, such as a stereo system. Cables with various connectors can be bought in electronics stores or stalls; if you need an unusual combination of connectors that is not sold ready-made, buy the connectors separately and assemble the cable yourself. Be careful that no ground loop appears, as it will later cause hum; to avoid this, additionally connect the player's ground wire to the computer case. It is highly advisable to listen first to what you intend to restore, since a surprise such as a skipping record is most unwelcome in the middle of recording a track to the hard disk. If such a problem turns up, adjust the stylus pressure; if there is no adjustment, you will have to place a small weight on the tonearm head (very undesirable, but there is no other way out). If all went well, the speakers (or headphones) will reward you with the characteristic crackle and surface noise of vinyl, or the unsuppressed surf-like hiss of something like a Yauza 221-1S playing a type I MK-60 cassette.

2. Configuring sound card capabilities

Use the Windows Volume Control program to adjust the incoming signal level: double-click the speaker icon in the system tray. The Line-In and Wave level controls must be enabled, with their sliders no lower than the middle of the scale. The microphone input is often a powerful source of interference and unnecessary noise, so its Mute box should be checked. Now configure the recording parameters: in Volume Control open Options - Properties, select the Recording option, and make sure Line-In is checked in the list of volume controls to display. Click OK. In the recording mixer select Line-In as the recording source and set its slider roughly to the middle of the scale. Leave the Volume Control window open for now.

3.Restoration

Now, strictly speaking, we can move from wishes to deeds.
There is a sufficient amount of software for restoring old recordings.
There are professional solutions for sound engineers, closer to full-blown sound editors, which the average user would need five years to master - and in that time whatever could have been restored would be lost forever. Fortunately, there are programs in this world that can be figured out fairly quickly while still yielding fairly high quality in the final material. One such solution is CoolEdit from Syntrillium (www.syntrillium.com), a sound editor with the ability to record and process music.
Well, have you downloaded the program? Installed it? I hope you have not forgotten the "Audio Cleanup" plug-in, which will be very useful to us. Launch the editor and look at the main program window (Fig. 2): a standard Windows interface (to a certain extent, as people like to say, intuitive). Let's agree on one thing: there will be no detailed account of all the capabilities of Cool Edit 2000 version 1.1 here - we will consider only what we really need for sound restoration.

Now it is time to make a small but very important adjustment to some program parameters. Let me explain why: many readers of these lines will probably not want to limit themselves to cleaning sound from vinyl, since no one has yet abolished the ordinary compact cassette - to say nothing of recording voice from a microphone. And owners of an SB Live! card will find the next short paragraph useful too.

Almost all owners of stationary cassette decks know that the frequency range of the recorded material lies within 40-14,000 Hz for a type I cassette; chromium dioxide tape gives a wider range. But few know that the deck nevertheless reproduces individual details over an even wider range (20-20,000 Hz) which, buried under all kinds of noise and interference, never reach the music lover's tender ear. Cool Edit lets us correct this shortcoming and flatten the cassette deck's amplitude-frequency response to quite a decent level.

Now, especially for the lucky owners of an SB Live! card: as you probably know, the codec's frequency response is far from ideal - above 4.5 kHz a step-like roll-off of the upper frequencies begins, which in many cases is no good at all. With the help of Cool Edit we will overcome this obstacle too.

To correct the above shortcomings, let’s set up the editor’s FFT filter, which will be our assistant on initial stage sound cleaning.
Let's open any wav file or record a few seconds of silence by pressing the button with the red dot located in the button panel in the lower right corner.
Then go to the program menu: Transform - Filters - FFT Filter. In the window that opens (Fig. 2), let's create an absolutely thoughtless preset (setting), for which we drag the yellow line around a little with the mouse. Using the "Add" button, let's give our setting any censored name and save it. What for? Because we are now going to edit the "Cool.ini" file, located at:
X:\Program Files\Cool2000 - namely, introduce some additional parameters to correct the annoying shortcomings of the cassette deck and the "SB Live!" card.
Open the "Cool.ini" file and look for the section that stores the FFT filter presets. The caveat is that this section appears in the file only after we have used the FFT filter at least once. That is why we needed the body movements of creating an abstract preset. Now let's see where in that section our setting ended up - we simply find the name we gave our preset. The rest is easy: on a free line we write this "tiny" parameter:

Item29=MCRESTORATION,3,19,0,20,426,5,845,0,1288,0,1986,0,2259,0,2855,6,3179,9,3444,1,3583,28,3688,42,3773,48,3848,61,3925,76,3957,96,3998,100,4004,100,4012,5,4096,5,19,0,20,426,5,845,0,1288,0,1986,0,2259,0,2855,6,3179,9,3444,21,3583,28,3688,42,3773,48,3848,61,3925,76,3957,96,3998,100,4004,100,4012,5,4096,5,2,0,12000,1,2,0,0,1000,100,5,-10,100,-0.5,12,24000,1,0,1,1,48000

We definitely write down this number in one line! These are parameters for correcting the frequency response of cassette recorders. Igor Babailov kindly shared the edited Cool.ini file, which he spent many days and nights working on: www.hot.ee/uvs/Cool.zip. For which I give him my deepest bow.
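The long Item lines above are essentially lists of (FFT bin, gain) breakpoint pairs describing a frequency-response curve. As a rough illustration only (the exact Cool.ini field layout is not documented here, so treat this interpretation as an assumption), such pairs can be expanded into a smooth per-bin gain curve like this:

```python
def gain_curve(points, n_bins):
    """points: sorted (fft_bin, gain) breakpoints; returns one gain per bin,
    linearly interpolated between breakpoints (flat beyond the ends)."""
    curve = []
    for b in range(n_bins):
        if b <= points[0][0]:
            curve.append(float(points[0][1]))
        elif b >= points[-1][0]:
            curve.append(float(points[-1][1]))
        else:
            # find the two breakpoints surrounding bin b and interpolate
            for (x0, y0), (x1, y1) in zip(points, points[1:]):
                if x0 <= b <= x1:
                    t = (b - x0) / (x1 - x0)
                    curve.append(y0 + t * (y1 - y0))
                    break
    return curve

# hypothetical breakpoints in the spirit of the Item29 line, not its real values
points = [(0, 0), (1024, 50), (4096, 100)]
curve = gain_curve(points, 4097)
```

Each resulting value is the gain the filter would apply to that frequency bin; the editor then multiplies the spectrum of each FFT block by this curve.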
I don't think the frequency response of different cassette decks varies all that much. On the other hand, if you spent lots and lots of evergreens on a Nakamichi Dragon or a Marantz, then you obviously don't need these settings.
Owners of "SB Live!" are strongly recommended to first pass any recorded phonogram through this filter with the following preset:

Item36=SBCORRECtion,3,20,0,0.83,0.532,1,793,1,1003,2,1223,4,1713,5,2046,10,2391,12,2569,15,2710,18,3066,24,3234,27,3398,35,3480,41,3546,47,3628,56,3726,70,3825,89,4096,100,20,0,0,83,0,532,1,793,1,1003,2,1223,4,1713,5,2046,10,2391,12,2569,15,2710,18,3066,24,3234,27,3398,35,3480,41,3546,47,3628,56,3726,70,3825,89,4096,100,2,0,12000,1,2,0,0,1000,100,3,-10,100,0,14,24000,1,0,0,1,48000

We write this value in the same section of the same Cool.ini, not forgetting that the parameters must be written in one unbroken line.
Since our computer and player are connected, tested for functionality and ready for work, and the “Cool Edit 2000” program feels great after making additions, we can safely begin the process of recording a phonogram on a hard drive.
We start recording either through "File" - "New", or by pressing the record button on the panel. Don't worry, recording will not start immediately: we will first be asked to choose the characteristics of the signal to be recorded. In the window that opens, select Sample Rate = 44100 and 16-bit sound in stereo mode - why would we need mono? Place the tonearm on the record, and when you press the OK button recording will begin, as evidenced by the recording indicator in the form of pulsating red bars and a timer. It is highly desirable to record the piece together with the empty sections at the beginning and end of the phonogram, i.e. where only the rustle of the vinyl mass itself is heard - we will need this later for deep sound cleaning. You can even capture a few seconds of the previous or next song.
In the place we need, press the Stop button, and in the program window we observe the recorded phonogram (Fig. 3).

Fig.3
Let's save the recorded masterpiece to a wav file (File - Save as). Next, if you use "SB Live!", open the already familiar FFT filter (Transform - Filters - FFT Filter) and, selecting the SBCORRECtion preset and pressing the OK button, begin equalizing the frequency response of the recorded signal (Fig. 4). Then we proceed to removing the crackles and clicks from the resulting phonogram.

To do this, go to the menu Transform - Noise Reduction - Click/Pop Eliminator, and in the window that opens select the optimal sound-cleaning parameters (the values were provided by seasoned professional restorers who are old hands at this business).
In Fig. 5 these parameters are presented for everyone to see. But first we need to click the Auto Find All Levels button, and only then OK to start the sound cleaning process.
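Cool Edit's Click/Pop Eliminator is a black box, but the basic idea behind any declicker can be sketched in a few lines. This is a toy version under a naive assumption (a click is a sample that jumps too far from its neighbour); real tools detect clicks statistically and resynthesize the whole damaged span:

```python
def declick(samples, threshold):
    """Patch samples whose jump from the previous sample exceeds `threshold`
    with the mean of their neighbours (crude linear interpolation)."""
    out = list(samples)
    for i in range(1, len(out) - 1):
        if abs(out[i] - out[i - 1]) > threshold:
            out[i] = (out[i - 1] + out[i + 1]) / 2
    return out

# a single spike of 0.9 in otherwise gentle material
clean = declick([0.0, 0.1, 0.9, 0.1, 0.0], threshold=0.5)
```

The "Auto Find All Levels" button in the real dialog plays roughly the role of picking that threshold automatically from the material.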

If, after zooming in, we have not found the initial section, do not be alarmed: move the mouse cursor to the slider at the top of the picture and drag it to the left, where you will find the beginning of the phonogram. Now use the mouse to select the initial noise-only area, roughly 3 seconds of sound (the time is indicated on the panel below), and in the menu Transform - Noise Reduction - Noise Reduction, using the Get Profile from Selection button, take a snapshot of the noise we hate so much. But first, set the optimal values of some parameters, which can be seen in Fig. 7 along with the noise profile. To hear a preliminary result of the noise reduction, click the Preview button, not forgetting to select the phonogram to be cleaned in advance.
If the result is satisfactory, press the OK button and begin the noise reduction process, which lasts much less than clearing the signal from crackles and clicks. After finishing, it would be a good idea to save the result under a different name in order to have an idea of ​​all the stages of sound purification.
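The principle behind profile-based noise reduction can be caricatured like this. A deliberately crude sketch: a broadband noise gate built from the measured noise floor of a "silent" stretch, whereas Cool Edit applies attenuation per frequency band, not per frame:

```python
import math

def rms(frame):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def noise_gate(samples, noise_profile, frame_len=4, margin=2.0, atten=0.1):
    """Attenuate frames whose energy is close to the measured noise floor."""
    floor = rms(noise_profile)  # our stand-in for "Get Profile from Selection"
    out = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        gain = atten if rms(frame) < margin * floor else 1.0
        out.extend(s * gain for s in frame)
    return out

profile = [0.01, -0.01, 0.01, -0.01]        # a "silent" stretch: only noise
signal = profile + [0.5, -0.5, 0.5, -0.5]   # noise first, then music
cleaned = noise_gate(signal, profile)
```

The quiet frame is pulled down tenfold while the loud one passes untouched; the real tool does the same comparison independently in each FFT band, which is why a representative noise profile matters so much.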

It should be said that after completely clearing the phonogram of noise, the signal level decreases quite noticeably, and therefore you will have to normalize the signal using the Transform - Amplitude - Normalize command, which takes very little time.
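Normalization itself is the simplest operation of the lot; a minimal sketch of what such a command does:

```python
def normalize(samples, peak=1.0):
    """Scale the whole take so its loudest sample reaches `peak`
    (the idea behind a Normalize command)."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return list(samples)  # pure silence: nothing to scale
    k = peak / loudest
    return [s * k for s in samples]

boosted = normalize([0.1, -0.5, 0.25])
```

Every sample is multiplied by the same factor, so the relative dynamics of the recording are untouched; only the overall level rises.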
Finally, the time has come to carry out the final phase of our process, namely, trimming our phonogram around the edges and saving it, dear, in a file. I’ll say right away that you can trim unnecessary starting and ending sections of an audio file in “Cool Edit” by selecting the unnecessary section and using the Edit-Cut command.
For those who want to convert recordings from compact cassettes into digital form, the task is shorter by exactly one step - cleaning the phonogram of crackles and clicks is not needed. After recording the music to the hard drive, select the MCRESTORation setting in the FFT filter, equalize the frequency response of the signal from the cassette deck, and then simply proceed to noise reduction and signal normalization.
Well, we save the cleaned and trimmed piece of the phonogram, brought into divine form, as you like - wav format or another. With the first one you can do anything, even encoding into MP3. Oh, I almost forgot, you still need to record this whole thing on a CD, so that you can then comfortably listen to it somewhere on your friends’ player, puffing out your cheeks and nostrils from your own importance in the process of updating the sound.

As a result, you must burn the prepared audio files to a CD.

4. Preparing files

Now let's burn the music to CD. We use Nero. Other programs, of course, work similarly. Launch Nero and select Compile a new CD and Audio-CD.
The song viewer and file manager will open. Drag and drop WAV files into the Songs window. You can drag and drop in this window to change the order of files.
Then select all the songs, right-click on them and select
Properties. Set the length of pauses between songs. If pauses already exist in the files themselves, then this value is best set to zero. After this, call the Filters tab. There are also tools for improving sound quality here. Using the normalization function, you can equalize the volume of individual songs: Highlight the Normalize field and set the method to Maximum. Nero will adjust all songs to the maximum possible volume so that there is no sound distortion.
Close the dialog by clicking OK.

5. Splitting the wave file into separate compositions

Now divide large files into separate parts so that the CD player can recognize the songs. Click on the appropriate file and open the Properties dialog again. Call the Indexes, Limits, Split tab.
Nero will display the waveform in sequence. Then make the division: a notch in the wave marks the likely beginning of a composition. You can check whether that is true by clicking on that spot and pressing Play. If it turns out to be a transition, select the range and click Zoom In. Click to insert a gray line where you want to separate the compositions, then click Split. Go to Full View, repeat everything again if necessary, and finally confirm what you have done by clicking OK. Use "Properties" to assign names to the individual compositions. After that, click the burn-CD button in the toolbar, set the speed, click Write - and that's it, you are the owner of your own masterpiece!

6. Prospects and issues

The prospects for the development and use of digital audio seem very broad. It would seem that everything that could be done in this area has already been done. However, it is not. There remain a lot of problems that are still completely untouched.

For example, the field of speech recognition is still very undeveloped. Attempts have long been made to create software capable of reliably recognizing human speech, but none of them has yet led to the desired result. A long-awaited breakthrough in this area could incredibly simplify entering information into a computer. Just imagine that instead of typing text you could simply dictate it while drinking coffee somewhere near your computer. There are many programs supposedly offering this capability, but none of them is universal, and they go astray as soon as the speaker's voice deviates slightly from the learned tone.
Such work brings not so much convenience as grief. An even more difficult task (quite possibly impossible to solve at all) is recognizing common sounds, for example, the sound of a violin in the sounds of an orchestra or identifying a piano part. One can hope that someday this will become possible, because the human brain can easily cope with such tasks, but today it is too early to talk about even the slightest changes in this area.

There is also room for exploration in the field of audio synthesis. Today there are several methods of sound synthesis, but none of them makes it possible to synthesize sound that could not be distinguished from the real thing. If, say, the sounds of a piano or trombone are even more or less amenable to implementation, they have not yet been able to achieve the believable sound of a saxophone or electric guitar - there are a lot of sound nuances that are almost impossible to recreate artificially.

Thus, we can safely say that in the field of processing, creation and synthesis of sound and music, we are still very far from that decisive word that will put an end to the development of this branch of human activity.

7. Glossary of terms

1) DSP – Digital Signal Processor (digital signal processor).
A device (or software engine) designed for digital signal processing.

2) Bitrate - as applied to data streams, the number of bits transferred per second (bits per second). As applied to audio files (for example, after lossy encoding), how many bits describe one second of audio.
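A quick worked example of this definition, using the CD parameters mentioned elsewhere in the article:

```python
# Bitrate of uncompressed CD audio: bits per sample, times samples per
# second, times channels.
sample_rate = 44100   # samples per second
bit_depth = 16        # bits per sample
channels = 2          # stereo
bitrate = sample_rate * bit_depth * channels  # bits per second
kbps = bitrate / 1000                         # ~1411 kbit/s
```

Compare that with a typical 128-192 kbit/s MP3 to see how much lossy encoding throws away.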

3) Sound - an acoustic wave propagating in space; at each point in space it can be represented as a function of amplitude versus time.

4) Interface - a set of software and hardware designed to organize the interaction of various devices.

5) Interpolation - finding intermediate values of a quantity from some of its known values; e.g., finding the values of a function f(x) at points x lying between the known points x0, x1, ..., xn.
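The simplest (linear) case of this definition, as a one-line sketch:

```python
def lerp(x0, y0, x1, y1, x):
    """Linear interpolation: the value of f at x, given only
    f(x0) = y0 and f(x1) = y1."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

mid = lerp(0.0, 0.0, 10.0, 100.0, 5.0)  # halfway between the known points
```

Audio editors use higher-order variants of the same idea when patching damaged samples or resampling.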

Since the existence of analog technology for recording and reproducing sound, huge archives with truly priceless audio materials have been accumulated. However, now few people will prefer old vinyl records and compact cassettes to new digital recording media. The new requirements for sound quality that listeners are now making do not allow us to simply take and transfer old phonograms to a new digital medium. In addition, modern technology can significantly improve the source material of modern recordings, correct many mistakes made by the sound engineer and the shortcomings of the acoustics of the studios themselves. In this article we will look at the arsenal of tools in a modern recording studio, try to identify the pros and cons that modern digital technology has brought to work with sound, and talk a little about the problems of perceiving sound recorded on digital media.

Most modern computers have multimedia capabilities that allow you to work with sound. But, alas, I must upset many of you. As a rule, sound adapters for computers do not meet professional studio equipment standards. For example, audio cards that are not equipped with digital audio input/output interfaces such as AES/EBU or, at the very least, S/PDIF cannot be considered suitable for professional remastering and restoration. And we will immediately reject the possibility of working through the analog inputs/outputs, due to the unsatisfactory quality of the DAC (digital-to-analog converter) and ADC (analog-to-digital converter) built into such audio adapters. Thus, systems designed to work as part of conventional multimedia computers are not suitable for professional remastering and restoration of phonograms, but they do allow a musician or sound engineer to experiment with sound at home. I am talking about programs like Sound Forge, Samplitude Studio, DART and some others. Those already familiar with these systems will probably agree that they are hardly suitable in a production environment because of their very slow operation. In addition, the quality of sound processing by these systems leaves much to be desired. The same applies to digital filters and more complex algorithms - interpolation, noise reduction and impulse noise suppression. Difficulties also arise during ordinary editing: in a tricky passage it is sometimes almost impossible to make a splice inaudible.

The quality of the final phonogram strongly depends on the choice of ADC at the stage of recording the source material. As you know, the standard sampling rate for CDs was set at 44.1 kHz. Although, by the well-known Kotelnikov (Nyquist-Shannon) theorem, this value can be considered sufficient, in practice it is clearly not enough. Imagine, for example, that such a digital path needs to reproduce a sinusoidal sound with a frequency of 20 kHz. Each sample then lands at a different ("floating") phase of the signal period, and as a result we get a slow amplitude envelope that was not in the original signal. To avoid this effect, good DACs and ADCs use oversampling circuits, and the new DVD-Audio standard uses a different sampling frequency - 96 kHz. As a philosophical note, we can add that sound itself (as well as the perception of sound) is continuous in nature, while digital technology introduces discreteness into it. Hence the tendency to increase both the bit depth and the sampling frequency of audio digitization.
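The "floating phase" effect is easy to see numerically (a small illustration of the argument above, not a claim about any particular converter):

```python
import math

fs, f = 44100, 20000  # CD sampling rate, a 20 kHz tone
spp = fs / f          # samples per period: ~2.2, barely above Nyquist's minimum of 2

# ten consecutive samples of a constant-amplitude sine
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(10)]

# the sample magnitudes wobble strongly even though the sine never changes
wobble = max(abs(s) for s in samples) - min(abs(s) for s in samples)
```

With only ~2.2 samples per period, each sample catches the sine at a different phase, so the sample magnitudes swing between near zero and near full scale - exactly the spurious envelope the text describes.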


If you are going to do sound restoration, get ready to repel numerous attacks and serious accusations against you, caused by the fact that not everyone will hear in your soundtrack what they have become accustomed to for many years or what they want to hear. There are probably no people left who have heard Fyodor Chaliapin live, but there are many people who claim that they know how his voice should sound. Although no one thinks about what they are used to hearing distorted voice from old records. The position “don’t touch the original with your hands”, it seems to me, is not entirely correct. On the contrary, one should not strive for complete compliance with the original, but identify its shortcomings and distortions, correcting them. Of course, with this approach you can get a restored version that is very different from the original. And the “experts” will wave their hands at you and curse you. And the listeners will thank and listen with pleasure to their favorite musicians and actors, without forcing themselves to abstract themselves from noise and interference. And if their number increases thanks to your work, this will mean that your work was not in vain.

Recovering old records

February 2, 2015
Not long ago there was news on Habr that scientists managed to restore one of the very oldest recordings, made back in 1905, without damaging the medium. The main achievement here is precisely the integrity of the medium, since the recording was made not on just anything, but on a wax cylinder. This is almost the very first method of sound recording/playback to see wide use. Before that, the carrier was a glass cylinder coated with soot (there was genuinely no way to play those back), then foil became the coating, and only then wax.

I, of course, became interested in such a rarity and decided to hear how people lived back there in 1905... To my surprise, the original recording turned out to be quite noisy, although it was claimed to have been processed by various noise-reduction algorithms, etc. It is worth noting, of course, that compared to other recordings restored from wax cylinders this one was really quite good - the quality is already comparable to the first gramophone records. However, as we know, the best is the enemy of the good.

I’m generally a fan of old pre-war songs, and often I have to slightly update the sound of exhibits mined from the depths of the network. In particular, for example, recordings of songs from old films suffer from this, since the original film itself does not shine with sound quality.

In the case we are considering, we actually have an impression of the track made by a laser. Now I am not considering the fact that gentlemen archaeologists also applied some procedures to the original sound, but I will assume that they tried to reproduce the original as accurately as possible. Since the linear dimensions of the track, in this case, are quite large, there is practically no need to talk about digitization errors, especially since a fairly high sampling frequency was chosen for the source file, approximately four times higher than the frequency of sounds in the recording. This means that we can assume that we have before us an almost perfect cast of the original sound wave.

Here we are faced with the peculiarities of the material and the method of reproduction itself.

Fact number one: the material from which the cylinder is made is quite soft (wax), and even if it were perfectly new, it is impossible to capture sounds with a wavelength less than a certain value.
Fact number two: in addition to the material, the recording technology itself on such cylinders also makes its own adjustments - the sound was literally written with a needle along the cylinder.
Fact number three: the reproduction of such a cylinder itself destroyed the medium.

The first fact gives us a limit on the maximum recording frequency of approximately 5-6 kHz and, as we will see later, this is very important. And the second and third facts indicate that you don’t have to worry too much about maintaining the steepness of the fronts and the shape of the waves - the accuracy is not the same initially.

First, let's turn on the spectral representation of the signal and take up the equalizer (and the equalizer is everything to us).

What do we see on the spectrogram? Our file's sampling frequency is a whole 22,000 Hz, while, as we can see, there are no sounds in the recording above 4.5 kHz, which was to be expected (see fact number one). However, if you look more closely, some dirt still seeps in higher (to make it easier to see, I increased the contrast and brightness in the square circled in red). It's not clear where it comes from, but without going into details, the first thing we can do is, with a clear conscience, cut away everything above the so-called Nyquist frequency for our sampling rate (11 kHz). Since there was still a decent margin there, I didn't waste time on trifles and removed everything above 8 kHz, as well as everything below 100 Hz, since, judging by the spectrogram, there was nothing useful there either.

After thinking a little, I approached this point even more radically, namely, I didn’t bother with the equalizer, but started all over again and immediately changed the file sampling frequency to 11 kHz.
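The rate change itself can be sketched in a toy way. Note that simple pair-averaging is only a crude stand-in for the proper low-pass filtering a real resampler (or the editor's own resample command) performs before decimation:

```python
def downsample_by_2(samples):
    """Halve the sampling rate (e.g. 22 kHz -> 11 kHz). Averaging each pair of
    samples acts as a crude anti-alias low-pass before throwing half of them
    away; real resamplers use proper filter kernels instead."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]

half_rate = downsample_by_2([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
```

After this step the new Nyquist limit is 5.5 kHz, which is still comfortably above the ~4.5 kHz ceiling of the cylinder recording.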

Next, without further ado, we will use a tool that is available in many modern audio editors: Noise Reduction. The idea is simple - we select a place in the track where we have nothing but noise. Next, we create a pattern of these noises (Capture Profile). In the simplest case, one single Noise Reduction Level slider will be enough for you.

The presenter of that video tutorial, by the way, says that the engine has supposedly been updated (we are talking about version CS5.5) and now knows how to avoid creating unnecessary artifacts during noise reduction, but you and I know that in practice there is almost no difference. And the settings are still the same, except that the window has been redrawn in the new version.

There are some other nuances that this charismatic guy does not touch on in the video, for example, the “window width” for the Fourier transform (FFT Size).

The window width affects the frequency and time resolution of the signal - increasing the window width increases the frequency resolution, but decreases the time resolution and increases the computational cost of performing the fast Fourier transform.

Without going into details: when getting rid of random (this is important) noise, you should strive to use as many points as possible (Snapshots in profile) with the maximum possible FFT Size for a given segment. All this means that for a high-quality "noise profile" we need as long a segment as possible that contains nothing but noise. In general, the nice thing about Noise Reduction is that it can be used not only against noise, but also against various background sounds (the rustle of a forest, rain, etc.)
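The window-width trade-off quoted above can be put in numbers (the 22 050 Hz rate here is just an example figure):

```python
def fft_freq_resolution(fs, n):
    """Frequency resolution of an N-point FFT at sampling rate fs: Hz per bin."""
    return fs / n

def fft_time_span(fs, n):
    """Time covered by one N-point analysis window, in seconds."""
    return n / fs

# at a 22 050 Hz sampling rate:
fine = fft_freq_resolution(22050, 4096)    # ~5.4 Hz bins, but ~186 ms windows
coarse = fft_freq_resolution(22050, 1024)  # ~21.5 Hz bins, but ~46 ms windows
```

Quadrupling the FFT size makes the frequency bins four times finer and the time smear four times worse; there is no setting that wins on both axes, which is why the choice is material-dependent.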

In fact, Sound Forge also has more interesting tools, for example the ability to subtract an arbitrary wave from a signal, but I started using Audition back in the old days when it was called CoolEdit, and back then Sound Forge couldn't come close to anything similar, and now I'm too lazy to take up something new.

As usual, I started looking for a noise-only segment in the file, but it turned out the candidates were too short, and the profiles built from them came out too short as well. Because of this, either the noise reduction was not audible at all, or the wildest artifacts appeared. So I began to think about how to deal with this, and while I was thinking, I decided to come at it from the other end.

The most annoying things in our case are the soft clicks, as well as the periodic noise immediately following them. I think this effect appeared because the cylinder's cross-section deviated from a circle, or its axis was not true. At some point during recording the needle dug too deeply into the wax (the initial click), and then for a short stretch there was an unevenness (the characteristic noise that continues for a while after the click); then the cylinder made a full revolution and the noise repeated. As we will see later, we can still get rid of this noise using Noise Reduction, but moving through the file and looking at the waveform, I noticed that there are also rather strange flaws, similar to the characteristic crossover distortion of class AB amplifiers. A very typical example is in the title image of the article, though it is rather debatable, since that distortion has too long a period (I picked it from the file at random). But the next screenshot shows very clearly what I mean.

I decided to write separately about how these things should (and should not) be treated and why, and, to keep the article from growing too long, I hid it under a spoiler. You can skip this part; it is almost a lyrical digression.

About how to remove distortion and why it is not necessary

It's not very clear in the screenshot above, since I zoomed out a little for scale, but the duration of such an oscillation is a measly 80 microseconds. Let's do some simple calculations:

T = 0.00008 s (period, i.e. 80 µs)
F = 1 / T = 12,500 Hz (frequency)

It's time to remember the first fact stated earlier: obviously, there is nowhere in such a recording for 12 kilohertz to come from, so this is almost certainly noise too. One could turn to the spectrogram here; however, since these vibrations have a very tiny amplitude and, for some reason, there are too many of them, they do not stand out at all in the spectral representation and look like a dark, dotted background (the picture with the contrast-boosted square shows exactly this).

It is unlikely that such jambs arose due to the movement of the needle. I believe these are micro-cracks in the wax, which most likely appeared due to time.

In an ideal world it might seem cool to simply cut such places out, in whole periods: since these vibrations occur where we actually have recording defects, we can say for certain that they contain no useful information, and since preserving the composition's original timing is not important to us, we could safely delete them - on average they last no more than 100 µs, a completely tiny interval that is unrealistic to notice by ear.

Only we don't live in an ideal world (though that depends on how you look at it), so this is a pretty bad idea. The thing is, when a section is deleted, so-called smoothing is applied, i.e. the levels of neighboring points are blended. Since this is digital audio, such tiny irregularities after smoothing are the most natural high-frequency noise. We limited it a little by lowering the file's sampling rate, but still. One could try to cut such noise with an equalizer after all the deletions, but, again, that would change the wave shape, and because our sound is digital it all comes down to mathematics - making a clean cut at the desired frequency with an equalizer simply won't work. Besides, as I said above, there are too many such distortions, which makes editing them all manually almost pointless: such fragments last about 100 µs, so for this "improvement" to be even slightly noticeable by ear (theoretically), an incredible number of such areas would need to be removed. Meanwhile, since these distortions remain in the rest of the file, a "clean segment" of a couple of milliseconds would be simply unnoticeable against the noisy background. And one more fly in the ointment: the results of smoothing hundreds of deleted sections will create so much noise that the original version (without the edits) will actually sound better to you than what you end up with.

On top of all this, the waves sometimes interfere with each other in a completely unimaginable way, which makes it very difficult to tell where exactly the distortion is and where, say, the sibilants are - for that you need, at minimum, a good knowledge of the peculiarities of human speech, the formation of harmonics, and everything else. So even if it were not for the smoothing, with intervention on this scale there is a very real chance of simply spoiling the original sound by breaking the interference pattern.

And yet, there is a cure for these problems - below I will show how to clean up the sound, including such distortions.

At one point, when I was looking at the spectrogram of the signal, it dawned on me that at the end of the song, there is a fairly long moment when nothing but the “whistle of birds” sounds. And this whistle on the spectrogram has a completely unambiguous band.

Which means we can cheat. We uncover the equalizer, set the maximum Range (this is the dynamic range, in simpler terms, by how many dB this or that frequency will be amplified/attenuated) and cut the frequencies at which our birds sing, and leave everything lower/higher.

Since even the maximum dynamic range is not enough to completely silence all the birds, I apply the equalizer twice. In general, a separate article could be written about how this works and why things happen this way, but I'm afraid I don't know the mathematics of the algorithms well enough to pontificate on the topic.

So now we have a fairly long section with just noise... and, that's right, we're back where we started. We capture the noise profile (after capturing, press close, not cancel, otherwise all settings will be reset to the previous ones used).

In addition to Noise Reduction, there is also a Hiss Reduction filter, which, as the name suggests, will help us get rid of whistling and all sorts of things like that. The settings there are almost the same as noise reduction, except that FFT Size works somehow differently, but I haven’t figured out exactly how, so I’m acting here empirically, which is what I advise you to do. For Hiss Reduction, you also need to specify the base noise level (Get Noise Floor button), and so, this base level should be captured on the same segment in which we captured the noise profile.

After using these two types of noise reduction, we get a result that is quite suitable for consumption. Except that small artifacts appear at the ends of the spectrum. Here the equalizer comes to our aid again - we mercilessly cut everything below 150 Hz and above 4.5 kHz.

It has become noticeably quieter, but the clicks are still audible. Now the spectrogram comes into play again. If at this stage you listen to the file while watching the spectrogram, it becomes obvious that at the moment of a click the noise has a very wide spectrum, while the melody, on the contrary, follows clear wavy lines (below, for clarity, one click is highlighted in red).

First, let's eliminate the amplitude spikes at the click sites. To do this, switch to the waveform display mode. As a rule, this is just one “off-scale” period of the wave.

In case this period was simply too loud, I usually just turned it down. If it was also very distorted, then it was deleted entirely (too lazy to straighten each one, what can I do).

And here I’ll tell you how to literally edit the amplitude of such jumps in just two clicks.

A hint on how convenient it is to use favorites in Audition

Actually, the idea is trivial. In Audition, for an arbitrary section of the recording, we can set a specific volume-change curve (Amplitude and Compression -> Envelope). That is, for example, we can make a smooth fade-out or a sharp attack - in general, you can draw whatever your heart desires. Typically this tool is used on a large scale; however, I figured out how to use it on a micro scale. Open "Favorites" (Window -> Favorites) and create a new item. Select the Envelope effect and edit the settings. In the settings, create the simplest arc-shaped graph with a single minimum exactly in the center (50% time, 50% amplitude). Come up with a name, save, and move on to our first click.

Now you just need to select, as precisely as possible, the one off-scale period of the "click" wave and double-click the created effect in your favorites. Voila - the click's level becomes approximately equal to the fluctuations surrounding it. A sort of "ultra-precision soft limiter". In principle, you can achieve a similar effect with the Hard Limiter, but it will cut all the sounds in the track to one level, and we only want to cut the unnecessary ones. Besides, there are a number of nuances - for example, it is often simply impossible to find settings at which the limiter cuts only what is needed, say when the clicks are too sharp.
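A sketch of what such a V-shaped micro-envelope does to a selected span. The `floor` parameter and the straight-line V shape are my simplifications; Audition's Envelope graph can be drawn as any curve:

```python
def v_envelope(samples, floor=0.5):
    """Apply a V-shaped gain to a selected span: full gain at the edges,
    dipping linearly to `floor` at the centre, so one off-scale wave
    period is pulled down to the level of its neighbours."""
    n = len(samples)
    centre = (n - 1) / 2
    out = []
    for i, s in enumerate(samples):
        d = abs(i - centre) / centre  # 0 at the middle, 1 at the edges
        gain = floor + (1.0 - floor) * d
        out.append(s * gain)
    return out

# a flat "too loud" span of five samples, softened at its middle
softened = v_envelope([1.0, 1.0, 1.0, 1.0, 1.0], floor=0.5)
```

Because the gain returns to 1.0 at both ends of the selection, the patch joins the surrounding waveform without its own level step - which is exactly why this beats a blanket limiter here.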

Once the loudest clicks have been defeated, it's time to deal with the small distortions scattered throughout the file. In the previous spoiler I explained how you shouldn't delete them; now I'll show a more or less correct way.

Here the spectral display comes in handy again. Look closely at the toolbar in this mode and you will notice something seemingly unrelated to the world of audio editors: a brush. That's exactly what we need.

It allows you to delete arbitrary areas of the spectrum. It's like a dynamic, ultra-precise equalizer.

Remember when I wrote that clicks have a wide spectrum, while musical sounds remain clearly readable against their background? Now we will use this. Select the brush and find a moment where a column of noise stands against the background of music. Paint over the places where there is only noise, without touching the musical lines, then press Del. Move the selection aside to see the result, or simply start painting in a new place. Where we just erased, there is now darkness, that is, silence.

It is a thankless task, really, because in reality sounds have a much more complex nature: besides the noise between the main harmonics there are often less prominent overtones. They may be less important, but they give the sound its natural color and character, which can be lost if you start cutting out everything non-harmonic.

By the way, remember I wrote that when you delete periods, a certain noise appears that seems to come from nowhere? After working with the brush in the spectral view for a while, you can notice the same thing here: white areas appear out of nowhere around a deleted region - that's it.
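Under the hood, the brush's "select and Del" action amounts to zeroing time-frequency bins of a short-time Fourier transform. A minimal sketch with SciPy, using a rectangular region instead of a free-hand stroke (the function name is my own):

```python
import numpy as np
from scipy.signal import stft, istft

def erase_region(x, fs, t0, t1, f0, f1, nperseg=1024):
    """'Brush' analogue: silence a rectangular time/frequency region.

    Like painting over a noise column in the spectral display and
    pressing Del - everything inside the rectangle becomes silence,
    bins outside it are untouched.
    """
    f, t, Z = stft(x, fs, nperseg=nperseg)
    mask_t = (t >= t0) & (t <= t1)
    mask_f = (f >= f0) & (f <= f1)
    Z[np.ix_(mask_f, mask_t)] = 0.0
    _, y = istft(Z, fs, nperseg=nperseg)
    return y[:len(x)]
```

The overlap-add resynthesis in `istft` is also where the "noise out of nowhere" comes from: hard edges in the spectrum smear slightly back into the signal.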

The attentive reader is probably wondering: why did I use an equalizer, and not this brush, to get the noise profile (where the "birds whistle"), since there is nothing there except the whistling?

A fair question; it really could have been done that way. But firstly, I was too lazy to trace the entire envelope, and secondly, the whistle is not an ideal synthesized tone: it may contain additional harmonics that are also present in the rest of the song. If we built the noise profile from them, we would obviously remove them from the music as well. Although, frankly, the main role in this particular case was played by my laziness...

So, as a final touch, we run the Automatic Click Remover in a slightly stronger-than-average mode (top slider 30, bottom 75) - it will remove any sharp clicks that may have appeared as a result of our manipulations. Then, with the equalizer, we once again cut everything above 5 kHz and below 100 Hz, and normalize the file to 100%. I also deleted the very beginning of the file, roughly half a second: after all our manipulations there was nothing left there anyway.
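The closing EQ-and-normalize step is easy to reproduce outside Audition. A sketch with SciPy (the function name is mine; the declicker itself is Audition's tool and is not reproduced here):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def finish(x, fs):
    """Final touches: band-pass 100 Hz - 5 kHz, then normalize to 100%."""
    # equalizer-style cut of everything below 100 Hz and above 5 kHz
    sos = butter(4, [100.0, 5000.0], btype='bandpass', fs=fs, output='sos')
    y = sosfiltfilt(sos, x)
    return y / np.max(np.abs(y))   # peak normalization to 0 dBFS
```

`sosfiltfilt` runs the filter forwards and backwards, so the pass adds no phase distortion of its own.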

By the way, the original article did not mention, and the file had no tags for, the original performers: Harlan and Belmont.
And Byron Harlan is even on

Anatoly Veytsenfeld: Let's start with general approaches to sound restoration. What in an existing phonogram do we try to save and bring out, and what can we neglect and leave as is? Take vinyl records as an example: what can we do with them using the tools available today?

Alexey Lukin: Vinyl records are a fairly common case, because today they are restored not only by professionals who could not find the master tapes (or for which none survive), but also by amateurs who collect records and want to carry that pleasant crackle from their collection with them, on an iPad for example. In English this is called a needle drop: a digitized record listened to on a computer.

The first thing to say about vinyl is that it is very important to play it correctly. Before we talk about high-quality ADCs and software, it is important to clean the record well. Some use water and a soft cloth, others apply special adhesive compounds and then peel them off together with the dust; some even practice "wet playing", where the record is played while damp or even covered with water... in general, there are many methods, and there is literature and restorers' lore on the subject.

Let us assume the material has already been captured from the record and digitized at maximum quality. What is the sequence of restoration steps?

The first thing we encounter in such a recording is click suppression. For records, especially old, worn ones, clicks are the most noticeable interference, and they in turn make it harder to suppress other interference such as noise or mains hum. Therefore the first step is to suppress clicks and crackle. There are many software products for this; among the best are iZotope RX; Click Repair, an inexpensive, high-quality competitor to RX for click removal (if you don't need all of RX's functionality); and CEDAR, an expensive hi-end hardware restoration system that also exists as plug-ins for SADiE and now for Pro Tools.

You should start working with a record's phonogram with an automatic declicker. It finds clicks in real time and suppresses everything it can find. When processing a recording with a declicker, I recommend making a couple of passes: sometimes, after one powerful click is suppressed on the first pass, several small clicks are revealed next to it that the declicker had not noticed. Two or even three passes can therefore improve the cleaning quality.
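The "couple of passes" advice is easy to see on a toy model. Below is a deliberately naive declicker (detection by second difference, repair by linear interpolation); real products such as RX or Click Repair use far more elaborate signal models, so treat this only as an illustration of the detect-and-interpolate idea:

```python
import numpy as np

def declick(x, threshold=5.0):
    """Toy declicker: flag samples whose second difference is far above
    the local norm and patch them by linear interpolation."""
    d2 = np.abs(np.diff(x, 2))
    sigma = np.median(d2) + 1e-12
    bad = np.zeros(len(x), dtype=bool)
    bad[1:-1] = d2 > threshold * sigma    # d2[j] is centred on x[j+1]
    good = ~bad
    idx = np.arange(len(x))
    return np.interp(idx, idx[good], x[good])

def declick_two_pass(x, threshold=5.0):
    """A huge click can hide smaller ones next to it from the detector,
    so - as recommended in the text - run the declicker twice."""
    return declick(declick(x, threshold), threshold)
```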

The second step is removing what we at iZotope call crackle. By this we mean not individual clicks, say 10 per second, but a continuous stream of clicks merging in time, which sounds almost like noise - yet not a smooth noise but a kind of "sand", very small clicks following each other very frequently. There is an automatic De-crackle process for this as well. Like the declicker, it clears out the very small clicks the declicker may have missed. A declicker and a decrackler are based on different principles and can therefore complement each other: if one method fails at something, the other can come to the rescue. In general, when processing audio I often recommend combining different tools. If two of them work, say a denoiser and Deconstruct (another noise-reduction module), and both give good results, you can mix them in some proportion or run them in sequence at higher and lower strengths, and their advantages add up.

But back to vinyl. After the clicks and the fine "sand" are suppressed, I recommend checking the recording's stereo balance. It often happens that the recording on the vinyl is monophonic but the record is played on a stereo player. If the player's stylus is well chosen, such playback is actually an advantage: the scratches in the groove are not symmetric, so the clicks end up in different channels, while the useful signal sits in the center of the stereo panorama. This allows both clicks and stationary noise to be suppressed more effectively.

How can we use this? The first step is to make sure the desired signal really sits in the center of the stereo panorama. If it does not - the channels are not fully balanced or the stylus is not perfectly vertical (these may be shortcomings of the turntable or the phono preamplifier) - the amplitude balance should be equalized.

Another important point in channel matching is azimuth alignment. The term comes from tape recording and means equalizing the delay between the left and right channels so that the desired signal is centered not only in amplitude but also in time. You can, of course, pick the delay manually, or use a goniometer or vectorscope to see whether the recording is centered, but there is also a software tool called an azimuth corrector, found in the channel-operations section. It selects both the channel levels and the time shift automatically, based on analysis of the phonogram. Thanks to oversampling and resampling, the alignment is carried out to an accuracy of about one hundredth of a sample, which is impossible to achieve manually. As a result, when the channels are summed to mono there is no high-frequency roll-off.
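The idea of sub-sample delay alignment can be sketched with a cross-correlation estimator refined by parabolic interpolation. This toy version reaches roughly tenth-of-a-sample accuracy, nowhere near the 1/100-sample figure quoted for RX's azimuth corrector; all names and sign conventions here are my own:

```python
import numpy as np

def estimate_delay(left, right):
    """Inter-channel delay in fractional samples via cross-correlation.

    A positive result means `left` lags `right`. The integer lag from
    argmax is refined with a parabolic fit through the peak and its
    neighbours - a rough stand-in for true oversampled alignment.
    """
    n = len(left)
    X = np.fft.rfft(left, 2 * n)
    Y = np.fft.rfft(right, 2 * n)
    xc = np.fft.irfft(X * np.conj(Y))
    xc = np.roll(xc, n)                 # put lag 0 at index n
    k = np.argmax(xc)
    a, b, c = xc[k - 1], xc[k], xc[k + 1]
    frac = 0.5 * (a - c) / (a - 2 * b + c)
    return (k - n) + frac

def apply_delay(x, delay):
    """Shift a channel by a fractional number of samples (FFT phase)."""
    n = len(x)
    freqs = np.fft.rfftfreq(n)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay), n)
```

Once the delay is estimated, `apply_delay` on the lagging channel centres the signal in time before any mono summing.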

Once the channels are aligned, we can apply MS encoding: sum the channels to get the mid, subtract them to get the side (difference) channel, and process the two independently. If you know the recording is mono, you can discard the Side channel - it contains only noise and crackle - while the Mid channel, half the sum of left and right, contains the useful signal. Instead of MS encoding you can use a new RX 3 option: center-channel extraction. It extracts center-channel sounds more efficiently than MS encoding. With plain MS encoding, the noises and clicks that were in the S channel are reduced in the M channel by 3-6 dB, while the dedicated center-extraction tool can reduce click levels by 20 decibels. The tool was originally designed to extract centered vocals from a song. You could simply add the two channels out of phase, but then the accompaniment drops by about 3 dB; this tool suppresses off-center sounds much more strongly.
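The mid/side arithmetic described above is one line in each direction; a sketch (the roughly 3 dB noise advantage for a mono record falls out of averaging uncorrelated channel noise):

```python
import numpy as np

def ms_encode(left, right):
    """Mid = half-sum (the centred signal); side = half-difference
    (for a mono record: mostly the uncorrelated noise and clicks)."""
    return 0.5 * (left + right), 0.5 * (left - right)

def ms_decode(mid, side):
    """Inverse transform: exact reconstruction of left and right."""
    return mid + side, mid - side
```

Averaging two channels with independent noise halves the noise power, which is the 3 dB figure mentioned above; the dedicated center-extraction tool goes well beyond this simple sum.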

All this applies when the recording on the record is known to be mono. With a stereo recording, you can move straight on to noise reduction. There are several tools for it. The most obvious is the denoiser. It works well when the noise is stationary, that is, constant in time, not changing in power or spectrum. The denoiser is "trained" on a noise-only fragment of the phonogram and then applied to the entire recording. If some fragments of the record are more damaged than others, or the declicker could not cope with them at the first stage, additional operations are needed. For poorly suppressed or long clicks, for example, use the Spectral Repair tool. It has several modes, allowing you either to reduce the amplitude of the selected sound until it merges with the surrounding background, or to completely resynthesize the selected fragment from the surrounding material, replacing it with a synthetic "patch" that in many cases sounds quite realistic. Spectral Repair works effectively not only on fragments isolated in time, but also on regions bounded in both frequency and time, when the interference is limited to a certain frequency range.
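The "train on a noise fragment, then apply everywhere" idea is classic spectral subtraction. A bare-bones sketch (RX's denoiser is far more refined; the function name and parameters are mine):

```python
import numpy as np
from scipy.signal import stft, istft

def denoise(x, noise_clip, fs, reduction=1.0, nperseg=1024):
    """Spectral subtraction: learn a noise profile (mean magnitude per
    frequency bin) from a noise-only fragment, then subtract it from
    every frame of the recording, keeping the original phase."""
    _, _, N = stft(noise_clip, fs, nperseg=nperseg)
    profile = np.mean(np.abs(N), axis=1, keepdims=True)
    _, _, Z = stft(x, fs, nperseg=nperseg)
    mag = np.abs(Z)
    new_mag = np.maximum(mag - reduction * profile, 0.0)
    _, y = istft(new_mag * np.exp(1j * np.angle(Z)), fs, nperseg=nperseg)
    return y[:len(x)]
```

The `reduction` knob plays the role of the denoiser's threshold/strength control: raising it for more damaged passages is exactly the manual adjustment described later in the interview.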

But back to noise suppression. If the noise on a vinyl record changes significantly over time, a denoiser will not be very effective and some manual work is needed. For passages where the noise level rises, you can apply the denoiser with a higher threshold. The Advanced version of RX 3 has a Dialogue Denoiser. It differs from the conventional denoiser in simpler controls and convenient automation: it can update the noise profile in real time, and if you automate the profile to follow how the noise level on the record changes - drawing a periodic change of the profile, for example - you can suppress the noise more accurately.

Another tool for noise that changes over time is the Deconstruct module. It separates the signal into tonal and noise components. Unlike a denoiser, it distinguishes them not by level but by their tonal or noisy character. For example, with a recording of a flute in a room, a denoiser would analyze the room noise and subtract it from the recording, whereas Deconstruct pays no attention to signal level: it picks out only the flute's harmonics and lets you suppress, say, breath noise or the sound of air escaping from the valves.

But such "distilled purification "may already be a distortion of the realism of the sound...

Absolutely right - it is a dangerous tool, and it should be used only where you know all the noise is unwanted. If noise is part of the useful signal, such as drum brushes or hissing consonants, using Deconstruct is not justified. It should be applied only to selected fragments, not to the whole recording.

Another tool for noise that changes over time is the already mentioned Spectral Repair: you can select the loudest patches of noise, apply horizontal or vertical interpolation, and bring their level down.
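Horizontal interpolation of this kind can be imitated on an STFT: replace the magnitudes inside the selected rectangle by blending the frames on either side of it. A much simplified sketch (RX's Spectral Repair also offers vertical and pattern-based modes, none of which is reproduced here):

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_repair(x, fs, t0, t1, f0, f1, nperseg=1024):
    """Toy 'horizontal interpolation': rebuild the magnitude inside a
    time/frequency rectangle from the frames just before and after it,
    keeping the original phase."""
    f, t, Z = stft(x, fs, nperseg=nperseg)
    ti = np.where((t >= t0) & (t <= t1))[0]
    fi = np.where((f >= f0) & (f <= f1))[0]
    lo, hi = ti[0] - 1, ti[-1] + 1        # anchor frames on either side
    mag = np.abs(Z)
    for j, col in enumerate(ti, 1):
        w = j / (len(ti) + 1)             # linear blend between anchors
        mag[fi, col] = (1 - w) * mag[fi, lo] + w * mag[fi, hi]
    Z = mag * np.exp(1j * np.angle(Z))
    _, y = istft(Z, fs, nperseg=nperseg)
    return y[:len(x)]
```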

This is perhaps the most typical chain of operations for vinyl restoration. Depending on the situation you may add other, more specific tools, but in short: first remove clicks and crackle, then remove noise. Of course, as with any restoration, the loss of useful signal must be balanced against the degree of noise removal. Experienced restorers know that clicks and noise disturb ordinary listeners far less than they disturb restorers themselves, who are trained to monitor distortion; for the average listener, clicks and crackle in an old recording are familiar. If they are too loud and intrusive, suppress them; if they merely create a background atmosphere, they can stay. Often it is enough simply to remove the clicks without touching the faint residual background noise.

What format is optimal for digitizing vinyl?

It is better to capture at the maximum bit depth, 24 bits; the file format can even be 32-bit floating point, or 24-bit, depending on your workstation. A sampling rate of 44.1 kHz is quite enough both for high-quality sound and for the processing tools I have described. High sampling rates, 96 kHz and above, give no benefit for record restoration quality.

What monitoring is best during restoration? Large monitors, small monitors, headphones?

This largely depends on the restorer's preferences, but most use good, large monitors, turning to headphones only part of the time for certain operations.

But some defects - small clicks, that "rash" - are heard better in headphones...

Yes, such things are better checked with headphones, but after finishing the work it is better to audition the full picture on good monitors. Monitors are closer to what sound engineers rely on when creating recordings - although lately this has been gradually shifting toward headphones.

Yes, technical restoration is still a laboratory process. It is not about the coloration monitors can provide; it is about working directly with the sound information, without regard to artistic decisions. What matters here is removing defects, not beautifying the sound as in mastering. So the main thing we must hear during restoration is all the defects, all the rubbish...

Yes, I agree. Whatever the restoration engineer finds more convenient to work with is what he should use. But different kinds of monitoring are needed; you cannot do the job by meters alone.

Now let's talk about the specifics and problems of restoring magnetic tape. Tapes differ too: there are reels from the 1950s and others from the 80s; there are studio reels and there are cassettes...

Yes, and they differ significantly in noise level and in wow and flutter. When digitizing a tape, as always, we start by choosing the equipment. A professional reel should be played on a well-maintained professional tape recorder. There is a whole science of handling old tapes; for that, it is better to consult the relevant literature. If we are talking about cassettes, you need access to decks with low wow and flutter and a professional output level.

There is an opinion that a cassette should be played back on the very machine it was recorded on, so that the head azimuths match exactly...

I only partly agree. It is always important to know what the tape was recorded on and how that can affect playback, and it is good if the tape can be digitized on the device it was recorded on; but sometimes playing it on a professional deck is better than introducing wow and flutter from a bad machine. A compromise is needed, and it is best to have both playback options - the original device and a professional one.

Once the recording has been digitized, the first thing - since there are usually no clicks - is to check whether the recording is truly stereophonic or close to mono. If it is nearly mono and the stereo image can be sacrificed, then, just as with vinyl, you should align the azimuth and the amplitude balance of the channels and then apply either MS encoding or center-channel extraction. This gives significant suppression of noise, which in a stereo recording is often uncorrelated between the left and right channels. After that, stationary noise should be suppressed, since it is the main type of interference in magnetic recording. Such noise yields quite well to a denoiser.

What if the recording has a high level of nonlinear distortion?

If it is a property of the particular tape or cassette throughout the recording, there is little to be done; if the overload occurs only in places, use a declipper. It does not always work, but in most cases it improves the result. In the third version of RX the declipper received improvements to both the interpolation algorithm and the user interface. It can now analyze the histogram of sample levels automatically and, via the Suggest button, propose the level at which clipping occurs - and this level can be set separately for the positive and negative half-waves. That matters when the tape saturates differently at positive and negative magnetization. With independent clipping thresholds you can more accurately preserve the information that is not yet distorted and interpolate only the regions beyond the clipping point.
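The separate positive/negative thresholds are easy to illustrate. Below is a toy declipper that treats out-of-threshold samples as missing and refits each gap with a parabola; real declippers, including RX's, use much stronger signal models, and the Suggest-style histogram analysis is not reproduced:

```python
import numpy as np

def declip(x, pos_thresh, neg_thresh, context=8):
    """Toy declipper with independent thresholds for the positive and
    negative half-waves: each clipped run is re-synthesised from a
    parabola fitted to the unclipped samples around it. Only smooth,
    isolated peaks are restored well."""
    y = x.astype(float).copy()
    bad = (x >= pos_thresh) | (x <= neg_thresh)
    idx = np.arange(len(x))
    # group consecutive clipped samples into runs
    starts = np.where(bad & ~np.roll(bad, 1))[0]
    ends = np.where(bad & ~np.roll(bad, -1))[0]
    for s, e in zip(starts, ends):
        lo = max(s - context, 0)
        hi = min(e + 1 + context, len(x))
        support = idx[lo:hi][~bad[lo:hi]]
        coeffs = np.polyfit(support, x[support], 2)
        y[s:e + 1] = np.polyval(coeffs, idx[s:e + 1])
    return y
```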

But there are distortions that arise not from level overload but for other reasons, and they live at low levels - all sorts of growling, hoarseness, muddiness. Can that be cleaned out?

That is a complex operation and will most likely have to be done manually. The two tools I would try are Deconstruct and Spectral Repair. Deconstruct can broadly separate harmonic components from non-harmonic ones and suppress the latter, although non-harmonic components are sometimes part of the desired signal and the tool does not always distinguish them well. In general, though, Deconstruct lets you "thin out" the spectrum, making it a little less cluttered with small sound artifacts while keeping the larger fundamental harmonics.

But what about low-level signals, when the useful signal barely rises above the noise floor - how can we separate signal from noise there, technically and perhaps even conceptually?

A very difficult question. A common problem is someone speaking into a lavalier microphone while their clothing rubs against it, producing a rustle that mixes with the speech. This noise is non-stationary - it changes very quickly and is overlaid by speech - which makes it impossible to simply reduce its level. You have to work manually with Spectral Repair, carefully selecting small fragments and reducing the amplitude of the rustle. You can also use Deconstruct, but again only on selections made by hand. I know of no good automatic tools for such cases.

Sometimes a stereo lavalier is used; in that case the voice comes from a particular direction, so with azimuth correction you can center the voice and apply center-channel extraction. And since the lavalier is stereo, the left and right capsules pick up the rustles differently, so the rustles are spread wider across the stereo panorama. This does not always work, though; it is one of the hardest problems in editing.

Have any "cures" for wow and flutter appeared?

There are no good automatic "cures" yet, but some technologies have appeared, such as the hardware-and-software Plangent Processes. These are systems that digitize a tape together with its bias tone. This applies not only to cassettes but also to professional reel tapes. The recording is made at a high sampling rate from a tape recorder that does not filter out the bias signal, and special software uses the frequency fluctuations of this bias tone to resample the recording and compensate for the wow and flutter. A very sound approach, I think, but it requires special equipment. Besides, it is patented, so implementing the idea in other products is not easy: the patent issues would have to be settled with the technology's authors.
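The resampling step this describes can be sketched as a time-warp: the measured instantaneous bias frequency gives the momentary tape speed, and the audio is re-read on a grid that is uniform in tape position. Everything here - the names, the assumption that the bias frequency track has already been extracted, the plain linear interpolation - is illustrative, not the patented process itself:

```python
import numpy as np

def correct_wow(x, bias_freq_track, nominal_bias):
    """Pilot-tone wow correction sketch.

    bias_freq_track: measured instantaneous bias frequency, one value
    per sample (would come from analysing the recorded bias tone).
    The ratio to the nominal bias frequency is the momentary tape
    speed; integrating it gives the true tape position, and resampling
    onto a uniform tape-position grid undoes the speed variation.
    """
    speed = bias_freq_track / nominal_bias     # >1 means tape ran fast
    true_time = np.cumsum(speed)               # tape position, in samples
    true_time -= true_time[0]
    uniform = np.arange(len(x))
    return np.interp(uniform, true_time, x)
```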

To correct wow and flutter without a bias-tone pilot, Celemony recently released the Capstan software. Based on changes in the pitch of the musical material itself, it tries to identify the most likely wow-and-flutter profile and then apply its inverse to the recording. The hardest part is deciding what the speed variation actually was: how do you tell the vibrato of a string instrument from a pitch drift of the whole orchestra? If the whole orchestra drifts, the fact and character of the variation are easier to establish, but with a solo instrument it is hard to distinguish vibrato from flutter. By all accounts the program does not cope very well in fully automatic mode, but the area is promising and will, I think, keep developing. Although, as old media are transferred to digital form, the relevance of such work will decline. In general, the nature of this activity is shifting from restoring old media to correcting defects in modern recordings - defects caused not so much by the media as by difficult recording conditions: poor rooms, extraneous noise, clipping and other deviations from a properly set-up recording chain.

But it seems to me that the field of activity for restorers is very wide. Not everything has been digitized yet, a lot is in collections and storage rooms, even the latest concert recordings always need cleaning, there are endless and often very interesting and valuable amateur archives, not to mention such a field of application as forensics...

Now a question about modern media. Nowadays, it is often necessary to use compressed file formats as sources - MP3, etc. What problems might there be during restoration?

Interestingly, much of the distortion in lossy formats is signal clipping that appears during decoding. A file is typically encoded after being normalized to 0 dB and limited with a hard threshold, so a significant share of its samples sit exactly at 0 dB. Although the original file has no clipping, MP3 compression involves filtering and approximation with a simpler signal, so the signal levels change and the peak levels can rise. When such a file is decoded, clipping appears that was not in the original.

The first piece of advice here is to use a decoder that can write to a floating-point format, so that no clipping occurs on decompression. Programs that use QuickTime - on Macs and many PCs - decode MP3 to floating point, and all the restored peaks above 0 dB are preserved in the file. If such a file is simply played back it will clip, but you can tame the peaks with a limiter or just lower the overall level so their shape is preserved. That is the first and most important thing.
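In code, the "lower the level so the peak shapes are preserved" option amounts to a simple rescale of the floating-point decode (a sketch; the decoding itself would be done by an external decoder, and the function name is mine):

```python
import numpy as np

def tame_overs(x, ceiling=1.0):
    """If a float decode restored peaks above 0 dBFS, scale the whole
    signal down so the loudest peak sits at the ceiling; unlike a
    limiter, this preserves the peaks' shape exactly."""
    peak = np.max(np.abs(x))
    if peak <= ceiling:
        return x
    return x * (ceiling / peak)
```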

Beyond that, unfortunately, things are much harder. I know of no good algorithm for restoring recordings that have been compressed to MP3, AC3, AAC or any other lossy format. Some software tools claim to restore the quality of MP3 files, but in reality they perform operations that do not restore the signal; they are a kind of "creative" sound enhancement. Transient Shaper, for example, shapes the transient components of the signal: emphasizing transients increases the clarity of attacks and percussion in the phonogram, but such an operation can hardly be called repairing MP3 artifacts.

There are also exciters that try to synthesize the high frequencies lost in MP3 encoding, but they can introduce distortion below the cutoff frequency. Since the exciter has no information about what those high frequencies actually were, it restores them as it sees fit, which does not necessarily bring us closer to the original, although it may sound like an improvement.

So, I repeat: it is best to use a decoder that can decode a file limited at 0 dB without clipping, and then act according to the situation. There are reports that iTunes tracks are normalized to -1 dB to avoid clipping on playback. In general, the quality of specific MP3 encoders matters enormously; at the same bitrate and bit-depth settings they differ greatly in sound quality.

Hence the well-known general rule: during restoration, avoid working with compressed files whenever possible.

I completely agree - if an uncompressed file is available, preference should be given to it.

But there is a huge amount of material tied to audiovisual products - online videos, YouTube clips and so on - where, unfortunately, unpredictable things happen to the soundtrack, and the sound can change greatly in character as a result.

Yes, that is a separate big topic. I have not researched it specifically, but I know that heavy compression is applied to both video and audio, and the audio compression may depend on which video quality the user chooses. If HD video is selected, the audio is transmitted at higher quality; if SD 360 is selected, the audio gets a low bitrate. Apparently the servers store both high- and low-quality versions. Whether a user can influence how the service processes a track is really a question for YouTube's engineers. Perhaps automatic compression, noise reduction or normalization will appear soon. For a user who has never worked with sound, any automation may seem a blessing, but for sound engineers and creators of professional media content it is, of course, a big problem not to know what is happening to your recording on the server.

New methods of audio signal measurement are also appearing. The third version of RX gained a statistics window which, besides the usual RMS, shows parameters such as True Peak Level, Sample Peak Level, LU and LUFS. A few words on how to "read" these indicators?

  • Sample Peak Level is the maximum level of the digital samples - the peak level most workstations show - while True Peak Level is the level of the analog signal that a DAC will reconstruct from those samples. Since the DAC includes an oversampling filter, it produces not steps or straight lines between samples but a smooth curve which, oscillating between the digital samples, can even exceed their level. So the peak level of the reconstructed analog signal must be monitored as well, which is what the true-peak reading is for. It usually turns out slightly higher - sometimes up to a decibel - than the digital peak level, depending on how rich the signal is in high frequencies and sharp, strong transients.
  • LU and LUFS are loudness measurements according to modern standards such as ITU-R BS.1770. Unlike an objective electrical-level measurement, loudness measurement approximates auditory perception: it uses a special weighting curve based on equal-loudness contours, and several time integrations - momentary, short-term (corresponding to VU-meter readings), and integrated over the whole phonogram or a selected fragment. These measures reflect tonal balance and the character of the sound better than a purely physical RMS figure. Especially important is the calculated loudness range, showing how much the phonogram has been "flattened". All this lets you check a phonogram against the standards now being introduced in broadcasting.
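The sample-peak vs true-peak distinction is easy to demonstrate: a sine at a quarter of the sampling rate whose samples all land at ±0.707 has a sample peak of about -3 dBFS but a reconstructed peak of 0 dBFS. A sketch using polyphase oversampling, in the spirit of the BS.1770 true-peak meter (function names are mine):

```python
import numpy as np
from scipy.signal import resample_poly

def sample_peak_db(x):
    """Peak of the digital samples, as most workstations display it."""
    return 20 * np.log10(np.max(np.abs(x)))

def true_peak_db(x, oversample=4):
    """Approximate the reconstructed analog peak by 4x oversampling:
    the interpolated waveform can rise above every stored sample."""
    return 20 * np.log10(np.max(np.abs(resample_poly(x, oversample, 1))))
```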

We touched above on the creative side of restoration and enhancement. On one old, noisy phonogram I once had to draw in the cymbal and hi-hat strokes by hand, also manually selecting the segments between strokes and attenuating them; the strokes then became more "readable", bright and distinct. But it is extremely labor-intensive work - one second of sound can take several minutes! Can it be automated somehow?

I am not sure the operation you describe is universal; it suits certain sounds of certain instruments. It would be hard to automate, would require many adjustments, and the result might be worse than manual editing.

To what extent is restoration purely technical, engineering work, and to what extent creative? What may a restorer allow himself, and what should he not?

This is a very subjective question. It all depends on the listeners. There are people who will be unhappy with any interference in the recording, even the most minimal suppression of noise and clicks. But most listeners adhere to some kind of middle ground. They are ready to accept significant noise reduction, of course, if the timbre does not suffer. They will be happy with the mono to stereo conversion. I think this is completely acceptable and in keeping with the spirit of the times.

Word "restoration "means restoration. Do we have the right to strive to recreate the lost informational elements of sound using methods such as exciter, etc.?

I think this is something every sound engineer should decide. I myself, if asked, limit myself to eliminating defects and noise, and then transfer the material to the sound engineer, and he decides what to do next. I believe that anything is acceptable, but you have to be honest with the listener. It is necessary to write on the album covers what was done - creative restoration or minimal noise reduction. In this case, purists will buy one remastering, and casual listeners who don't know what the untouched original sounded like will buy another.

I believe the restorer has the right to "help" the performer overcome the shortcomings of old recording technology. It is not the fault of the masters of the past that half the sounds in their recordings are sometimes inaudible, or that the microphone changed the timbre of a voice. The third version of RX gained a useful tool that can isolate the first ten harmonics, so you can raise, say, the upper singing formant and give back to the singer's voice the brightness lost in recording. Clearly, this is no longer a purely engineering decision.

Yes, equalization is already part of restoration - its creative part. Such things demand good musical knowledge from the restorer: theory and harmony, the history of music and its styles.

And thank you, on behalf of restorers, to your company and its developers for such powerful and deep restoration tools, not just the ability to remove clicks. I think that developing this functionality further will open incredible possibilities for recovering lost sound, and you should keep moving in this direction.

Yes, we are working on this; such trends and results are emerging in the scientific community, and our task at iZotope is to bring scientific research and practical development together, to give restorers as many tools to work with as possible.

Thank you for your work, and we will wait for new products!

Anatoly Veytsenfeld


