Audio Terminologies & File Formats That You Should Know

 


1. Acoustic Terminologies

2. Audio editing terminologies

3. Hardware Terminologies

4. General audio terminologies

5. Audio file formats


 

 

Acoustic Terminologies:

Acoustic Foam — A specific type of open-celled expanded polyurethane foam that allows sound waves to enter and flow through the foam, absorbing their energy and preventing them from being reflected. The density and depth of the foam affect the frequency range over which it is effective as an absorber.

Acoustic Treatment — A generic term embracing a range of products or constructions intended to absorb, diffuse or reflect sound waves in a controlled manner, with the intention of bestowing a room with an acceptable reverberation time and overall sound character.

Bass Trap — A special type of acoustic absorber which is optimized to absorb low-frequency sound waves.

Boundary — A physical obstruction to sound waves, such as a wall, or a large solid object. When sound waves reach a boundary they create a high pressure area at the surface.

Decoupler (also isolator) — A device intended to prevent the transmission of physical vibration over a specific frequency range, such as a rubber or foam block.

Early Reflections — The initial sound reflections from walls, floors, and ceilings following a sound created in an acoustically reflective environment.

Fidelity — The accuracy or precision of a reproduced acoustic sound wave when compared to the electrical input signal.

Flutter Echoes — Short time-span sound echoes which can be created when sound waves bounce between opposite walls in a small or moderately sized room. A shorter version of the ‘slapback’ echo which can be experienced in a larger hall when sound from a stage is reflected strongly from the rear wall.

Isolation Room — A separate room or enclosure designed to provide acoustic isolation from external noise. Often used alongside a studio’s main live room to record vocals or drums, for example, without spill from other instruments.

Isolator (also decoupler) — A device intended to prevent the transmission of physical vibrations over a specific frequency range, such as a rubber or foam block. The term can also be applied to audio isolation transformers, used to provide galvanic isolation between the source and destination, thus avoiding ground loops. 

 

 

Audio editing terminologies:

Backup — A safety copy of software or other digital data. A popular saying is that unless data exists in three physically separate locations at the same time, it hasn’t been backed up properly!

Chord — Three or more different musical notes played at the same time.

Chromatic — A scale of pitches rising or falling in semitone steps.

Clipboard: The clipboard is where sample data is saved when you cut or copy it from a data window. You can then paste, mix, or cross fade the sample data stored on the clipboard with another data window. This sample data can also be used by other Windows applications that support Sound data on the clipboard, such as Sound Recorder.

Clipping — When an audio signal is allowed to overload the system conveying it, clipping is said to have occurred and severe distortion results. The ‘clipping point’ is reached when the audio system can no longer accommodate the signal amplitude, either because an analogue signal voltage nears or exceeds the circuitry’s power supply voltage, or because a digital sample amplitude exceeds the quantiser’s number range. In both cases, the signal peaks are ‘clipped’ because the system can’t support the peak excursions, so a sine wave source signal becomes more like a square wave. In an analogue system clipping produces strong harmonic distortion artifacts at frequencies above the fundamental. In a digital system those high-frequency harmonics cause aliasing, which results in anharmonic distortion where the distortion artifacts reproduce at frequencies below the source fundamental. This is why digital clipping sounds so unlike analogue clipping, and is far more unpleasant and less musical.
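As a rough illustration of the digital case, the sketch below clamps a sine wave that is driven to twice the level the system can represent. The function name `hard_clip` and the full-scale limit of 1.0 are assumptions made for this example only.

```python
import math

def hard_clip(samples, limit=1.0):
    """Clamp every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]

# One cycle of a sine wave at twice the assumed full-scale level.
n = 16
sine = [2.0 * math.sin(2 * math.pi * i / n) for i in range(n)]

# The peaks are flattened, so the wave starts to resemble a square wave.
clipped = hard_clip(sine)
```

Samples that stay inside the limit pass through unchanged; only the peak excursions are altered.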

Click Track — An audible metronome pulse which assists musicians in playing in time.

Clone — An exact duplicate. Often refers to digital copies of digital tapes.

Comping — Short for ‘compilation.’ The process of recording the same performance (e.g. a lead vocal) several times on multiple tracks to allow the subsequent selection of the best sections and assembling them to create a ‘compilation’ performance which would be constructed on a final track.

Cross fade: Mixing two pieces of audio by fading one out as the other fades in.

Cross fade Loop: Sometimes a sample loop cannot be easily created from the given source material. In these instances, a cross fade can be applied to the beginning and end of the loop to aid in the smooth transition between the two. The Cross fade Loop function provides a method of creating sampling loops in material that is otherwise difficult to loop.
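A cross fade can be sketched in a few lines. This example assumes a simple linear gain ramp and two segments of equal length; real editors usually offer other curve shapes, and the function name `crossfade` is hypothetical.

```python
def crossfade(fade_out, fade_in):
    """Linear cross fade between two equal-length lists of samples."""
    if len(fade_out) != len(fade_in):
        raise ValueError("both segments must have the same length")
    n = len(fade_out)
    if n < 2:
        return list(fade_in)
    mixed = []
    for i in range(n):
        gain_in = i / (n - 1)        # ramps from 0.0 up to 1.0
        gain_out = 1.0 - gain_in     # ramps from 1.0 down to 0.0
        mixed.append(gain_out * fade_out[i] + gain_in * fade_in[i])
    return mixed

# Fading a constant signal out into silence.
result = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

For a cross fade loop, the same idea would be applied with the audio just before the loop start as one segment and the audio at the loop end as the other.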

Cut and Paste Editing — The ability to copy or move sections of a recording to new locations.

DC Offset: DC offset occurs when hardware, such as a sound card, adds DC current to a recorded audio signal. This current results in a recorded waveform that is not centered around the baseline (-infinity). Glitches and other unexpected results can occur when sound effects are applied to files that contain DC offsets. Sound Forge software can compensate for this DC offset by adding a constant value to the samples in the sound file.
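A minimal sketch of the compensation step, assuming the offset is estimated as the average of all samples (one common approach; the source does not say how Sound Forge estimates it). The function name `remove_dc_offset` is hypothetical.

```python
def remove_dc_offset(samples):
    """Return (corrected_samples, estimated_offset).

    The offset is estimated as the mean sample value, then the
    constant -offset is added to every sample.
    """
    offset = sum(samples) / len(samples)
    corrected = [s - offset for s in samples]
    return corrected, offset

# A square-ish wave that sits 1.0 above the centre line.
corrected, offset = remove_dc_offset([0.5, 1.5, 0.5, 1.5])
```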


Destructive Editing: Destructive editing is the type of editing whereby all cuts, deletes, mixes and other processes are actually applied to the sound file. Any time you delete a section of a sound file in Sound Forge software, the sound file on disk is actually rewritten without the deleted section.

Drag and Drop: A quick way to perform certain operations using the mouse. To drag and drop, you click and hold a highlighted selection, drag it (hold the left mouse button down and move the mouse) and drop it (let go of the mouse button) at another position on the screen.

Dubbing — The practice of transferring material from one medium to another, or of adding further material to an existing recording (cf. Over-Dub).

Erase — To remove recorded material from an analogue tape, or to remove digital data from any form of storage media.

File — A container for stored digital data that usually has a meaningful name. For example, a Standard MIDI File is a specific type of file designed to allow sequence information to be interchanged between different types of sequencer.

Frame Rate: Audio uses frame rates only for the purposes of synchronizing to video or other audio. To synchronize with audio, a rate of 30 non-drop is typically used. To synchronize with video, 30 drop is usually used.

Frequency Modulation (FM): Frequency Modulation (FM) is a process by which the frequency (pitch) of a sound is varied over time. Sub-audio frequency modulation results in pitch-bending effects (vibrato). Frequency modulation within audio band frequencies (20 Hz – 20,000 Hz) creates many different side-band frequencies that drastically alter the timbre of the sound.
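The difference between the two cases can be heard by changing only the modulator frequency. The sketch below uses the textbook form sin(2πf_c·t + I·sin(2πf_m·t)); the function name, the sample rate and the modulation index of 2.0 are illustrative assumptions.

```python
import math

def fm_tone(carrier_hz, mod_hz, index, sample_rate=48000, n_samples=480):
    """Sine carrier whose phase is varied by a sine modulator."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        modulator = index * math.sin(2 * math.pi * mod_hz * t)
        out.append(math.sin(2 * math.pi * carrier_hz * t + modulator))
    return out

vibrato = fm_tone(440.0, 5.0, index=2.0)     # sub-audio modulator: pitch wobble
bright = fm_tone(440.0, 220.0, index=2.0)    # audio-rate modulator: side-bands
```

With an index of 0 the modulator has no effect and the result is a plain sine wave at the carrier frequency.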

Frequency Spectrum: The frequency spectrum of a signal refers to its range of frequencies. In audio, the audible frequency range is between 20 Hz and 20,000 Hz. The frequency spectrum sometimes refers to the distribution of these frequencies. For example, bass-heavy sounds have a large frequency content in the low end (20 Hz – 200 Hz) of the spectrum.

Invert Data: Inverting sound data reverses the polarity of a waveform around its baseline. Inverting a waveform does not change the sound of a file; however, when you mix different sound files, phase cancellation can occur, producing a “hollow” sound. Inverting one of the files can prevent phase cancellation.
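Polarity inversion is simply a sign change on every sample. The sketch below, with hypothetical helper names `invert` and `mix`, shows the extreme case of phase cancellation: a signal mixed with its own inverted copy sums to silence.

```python
def invert(samples):
    """Flip the polarity of a waveform around the baseline."""
    return [-s for s in samples]

def mix(a, b):
    """Sum two equal-length signals sample by sample."""
    return [x + y for x, y in zip(a, b)]

signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
cancelled = mix(signal, invert(signal))   # complete cancellation
restored = invert(invert(signal))         # inverting twice restores the original
```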

Loop — The process of defining a portion of audio within a DAW, and configuring the system to replay that portion repeatedly. Also, a circuit condition where the output is connected back to the input.

Marker: A marker is an anchored, accessible reference point in a file. Markers can be used for quick navigation.

Nondestructive Editing: This type of editing involves a pointer-based system of keeping track of edits. When you delete a section of audio in a nondestructive system, the audio on disk is not actually deleted. Instead, a set of pointers is established to tell the program to skip the deleted section during playback.
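The pointer idea can be sketched with a playlist of (start, end) sample ranges that refer to the untouched source audio. This is an illustrative data structure, not how any particular editor stores its edits; `delete_range` and `render` are hypothetical names.

```python
def delete_range(playlist, start, end):
    """Remove source samples [start, end) from a playlist of (start, end) ranges."""
    result = []
    for seg_start, seg_end in playlist:
        if seg_start < start:                      # part before the deleted range
            result.append((seg_start, min(seg_end, start)))
        if seg_end > end:                          # part after the deleted range
            result.append((max(seg_start, end), seg_end))
    return result

def render(source, playlist):
    """Play back the source by following the playlist pointers."""
    out = []
    for seg_start, seg_end in playlist:
        out.extend(source[seg_start:seg_end])
    return out

source = list(range(10))                 # stands in for audio on disk
playlist = delete_range([(0, 10)], 3, 6)
playback = render(source, playlist)
```

The source list is never modified, so the edit can be undone by restoring the earlier playlist.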

Normalize: Refers to raising the volume so that the highest level sample in the file reaches a user-defined level. Use normalization to make sure you are using all of the dynamic range available to you.
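Peak normalization amounts to finding the largest absolute sample value and applying one gain to the whole file. A minimal sketch, assuming samples are floats and the target is expressed as a linear peak value (the function name is hypothetical):

```python
def normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)      # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

louder = normalize([0.25, -0.5, 0.1])   # peak of 0.5 is raised to 1.0
```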

One-Shot: One-shots are RAM-based audio clips that are not designed to loop. Things such as cymbal crashes and sound bites could be considered one-shots. Longer files can be treated as one-shots if your computer has sufficient memory.

Zero-Crossing: A zero-crossing is the point where a fluctuating signal crosses the baseline. By making edits at zero-crossings with the same slope, the chance of creating glitches is minimized.
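Finding candidate edit points can be sketched as a scan for sign changes. The example below treats the slope direction as a parameter so that both ends of an edit can be placed on crossings with the same slope; the function name is an assumption.

```python
def zero_crossings(samples, rising=True):
    """Return indices where the signal crosses the baseline in one direction."""
    indices = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if rising and prev < 0 <= cur:
            indices.append(i)
        elif not rising and prev > 0 >= cur:
            indices.append(i)
    return indices

wave = [-1, 1, 1, -1, -1, 1]
rising_points = zero_crossings(wave, rising=True)
falling_points = zero_crossings(wave, rising=False)
```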

Zipper Noise: Zipper noise occurs when you apply a changing gain to a signal, such as when fading out. If the gain does not change in small enough increments, zipper noise can become very noticeable. In Sound Forge software, fades are accomplished using 64-bit arithmetic, thereby creating no audible zipper noise.

 

 

 

Hardware Terminologies:

A-Type Plug — A domestic and semi-pro form of jack plug, also known as TS or TRS and widely used for electric instruments, headphones and line-level connections on semi-pro equipment. (cf. B-Type Plug)

A-Weighting — A form of electrical filter which is designed to mimic the relative sensitivity of the human ear to different frequencies at low sound pressure levels (notionally 40 Phons or about 30dBA SPL). Essentially, the filter rolls off the low frequencies below about 700Hz and the highs above about 10kHz. This filtering is often used when making measurements of low-level sounds, like the noise floor of a device. (See also C-Weighting and K-Weighting)
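For readers who want numbers, the A-weighting curve is commonly given as a closed-form magnitude response with corner frequencies of 20.6, 107.7, 737.9 and 12194 Hz, normalised to 0 dB at 1 kHz. The sketch below evaluates that commonly quoted formula; the function name is an assumption, and the glossary itself does not give the formula.

```python
import math

def a_weighting_db(frequency_hz):
    """Approximate A-weighting gain in dB at a given frequency."""
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 * f2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB term normalises the curve to 0 dB at 1 kHz.
    return 20 * math.log10(numerator / denominator) + 2.00

low = a_weighting_db(100.0)      # strongly attenuated
mid = a_weighting_db(1000.0)     # reference point, close to 0 dB
```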

AC — Alternating Current (cf. DC). Audio signals are represented in the electrical domain as currents flowing alternately forward and back in the circuits as an analogue of the compression and rarefaction of acoustic air pressure.

Active — Describes a circuit containing transistors, ICs, tubes and other devices that require power to operate, and which are capable of amplification.

Active Loudspeaker or Monitor — A loudspeaker system in which the input signal is passed to a line-level crossover, the suitably filtered outputs of which feed two (or more) power amplifiers, each connected directly to its own drive unit. The line-level crossover and amplifiers are usually (but not always) built in to the loudspeaker cabinet.

A/D [A-D] Converter — A device which converts an analogue audio signal into a digital representation.

ADAT Lightpipe — A widely used eight-channel optical digital audio interface developed by Alesis as a bespoke interface for the company’s digital eight-track tape machines in the early 1990s (Alesis Digital Audio Tape). The interface transfers up to eight channels of 24-bit digital audio at base sample rates (44.1 or 48kHz) via a single fibre-optic cable. This ‘lightpipe’ is physically identical to that used for the TOSlink optical S/PDIF stereo interface found on many digital consumer hi-fi devices, but while the fibre itself can be used interchangeably for either format, the S/PDIF and ADAT interfaces are not compatible in any other way. The interface incorporates embedded clocking, and padding zeros are introduced automatically if the word length is less than 24 bits.

Although not supported by all ADAT interfaces, most modern devices employ the S/MUX (Sample Multiplexing) protocol (licensed from Sonorus) which allows higher sample rates to be employed at the cost of fewer channels of audio. The S/MUX2 format operates at double sample rates (88.2 and 96kHz) but carries only four channels, while S/MUX4 operates at quad rates (176.4 and 192kHz) with two channels. S/MUX uses a clever technique that divides the high sample rate data across the nominal channels in such a way that accidental level changes or dithering applied identically to each channel in the data stream will not destroy the wanted demultiplexed signal.
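The trade-off between sample rate and channel count described above can be written as a small lookup. This is only an arithmetic sketch of the figures in the text; the function name is hypothetical and nothing here models the actual S/MUX bitstream.

```python
def smux_channels(sample_rate_hz, base_channels=8):
    """Channels carried over one ADAT lightpipe at a given sample rate."""
    for base_rate in (44100, 48000):
        multiple, remainder = divmod(sample_rate_hz, base_rate)
        if remainder == 0 and multiple in (1, 2, 4):
            # Doubling the rate halves the channel count.
            return base_channels // multiple
    raise ValueError("unsupported sample rate: %r" % sample_rate_hz)

channels_at_96k = smux_channels(96000)
```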

AES3 — A digital audio interface which passes two digital audio channels, plus embedded clocking data, with up to 24 bits per sample and sample rates up to 384kHz. Developed by the Audio Engineering Society and the European Broadcasting Union, it is often known as the AES-EBU interface. Standard AES3 is connected using 3-pin XLRs with a balanced cable of nominal 110 Ohm impedance and with a signal voltage of up to 7V pk-pk. The related AES3-id format uses BNC connectors with unbalanced 75 Ohm coaxial cables and a 1V pk-pk signal. In both cases the datastream is structured identically to S/PDIF, although some of the Channel status codes are used differently.

AES10 — An AES standard which defines the MADI interface (serial Multichannel Audio Digital Interface). MADI can convey either 56 or 64 channels via single coaxial or optical connections.

AES11 — An AES standard that defines the use of a specific form of AES3 signal for clocking purposes. Also known as DARS (Digital Audio Reference Signal).

AES17 — An AES standard that defines a method of evaluating the dynamic range performance of A-D and D-A converters.

AES42 — An AES standard which defines the connectivity, powering, remote control and audio format of ‘digital microphones.’ The audio information is conveyed as AES3 data, while a bespoke modulated 10V phantom power supply conveys remote control and clocking information.

AES59 — An AES standard which defines the use and pin-outs of 25-pin D-sub connectors for eight-channel balanced analogue audio and bi-directional eight-channel digital interfacing. It conforms fully with the established Tascam interface standard.

AFL — After-Fade Listen. A system used within mixing consoles to allow specific signals to be monitored at the level set by their fader. Aux sends are generally monitored AFL rather than PFL (see PFL).

Amp/Amplifier — An Amplifier is an electrical device that typically increases the voltage or power of an electrical signal. The amount of amplification can be specified as a multiplication factor (eg. x10) or in decibels (eg. 20dB).
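The x10 and 20dB figures are the same gain expressed two ways, provided the factor refers to voltage. A short sketch of the standard conversions (function names are assumptions); note that a power ratio uses 10·log10 rather than 20·log10.

```python
import math

def voltage_gain_to_db(ratio):
    """Convert a voltage multiplication factor to decibels."""
    return 20 * math.log10(ratio)

def power_gain_to_db(ratio):
    """Convert a power multiplication factor to decibels."""
    return 10 * math.log10(ratio)

def db_to_voltage_gain(db):
    """Convert decibels back to a voltage multiplication factor."""
    return 10 ** (db / 20)

times_ten_in_db = voltage_gain_to_db(10)
```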

Audio Interface — A device which acts as the physical bridge between the computer’s workstation software and the recording environment. An audio interface may be connected to the computer (via FireWire, USB, Thunderbolt, Dante, AVB or other current communication protocols) to pass audio and MIDI data to and from the computer. Audio Interfaces are available with a wide variety of different facilities including microphone preamps, DI inputs, analogue line inputs, ADAT or S/PDIF digital inputs, analogue line and digital outputs, headphone outputs, and so on. The smallest audio interfaces provide just two channels in and out, while the largest may offer 30 or more.

Autolocator — A common facility on tape machines or other recording devices that enables specific time points to be stored and recalled. For example, you may store the start of a verse as a locate point so that you can get the tape machine or DAW to automatically relocate to the start of the verse after you’ve recorded an overdub.

Auxiliary Sends (Auxes) — A separate output signal derived from an input channel on a mixing console, usually with the option to select a pre- or post-fader source and to adjust the level. Corresponding auxiliary sends from all channels are bussed together before being made available to feed an internal signal processor or external physical output. Sometimes also called effects or cue sends.

Aux Return — Dedicated mixer inputs used to add effects to the mix. Aux return channels usually have fewer facilities than normal mixer inputs, such as no EQ and access to fewer aux sends. (cf. Effects Return)

Azimuth — The alignment of a tape head which references the head gap to the true vertical relative to the tape path. (cf. Wrap and Zenith).

B-Type Plug — A professional form of jack plug derived from the telecommunications industry and also known as the PO316. Widely used for balanced mic and line-level connections on professional patch bays. (cf. A-Type Plug)

Back Electret — A form of electrostatic or capacitor microphone. Instead of creating an electrostatic charge within the capacitor capsule with an external DC voltage, an electret microphone employs a special dielectric material which permanently stores a static-electric charge. A PTFE film is normally used, and where this is attached to the back plate of the capsule the device is called a ‘back electret’. Some very early electret microphones used the dielectric film as the diaphragm but these sounded very poor, which is why later and better designs which used the back electret configuration were specifically denoted as such. Designs which attach the PTFE film to the diaphragm are known as Front Electrets. Modern electret capsules compare directly in quality with traditional DC-biased capacitor capsules, and are available in the same range of configurations — large, medium and small diaphragm sizes, single and dual membrane, fixed or multi-pattern, and so on.

Balanced Wiring — Where protection from electromagnetic interference and freedom from earth references are required, a balanced interface is used. The term ‘balanced’ refers to identical (balanced) impedances to ground from each of two signal carrying conductors which are enclosed, again, within an all-embracing overall screen. This screen is grounded (to catch and remove unwanted RFI), but plays no part in passing the audio signal or providing its voltage reference. Instead, the two signal wires provide the reference voltage for each other — the signal is conveyed ‘differentially’ and the receiver detects the voltage difference between the two signal wires. Any interference instils the same voltage on each wire (common mode) because the impedance to ground is identical for each, and as there is therefore no voltage difference between the signal wires, the interference is ignored completely by the receiver.

Signals conveyed over the balanced interface may appear as equal half-level voltages with opposite polarities on each signal wire — the most commonly described technique. However, modern systems are increasingly using a single-sided approach where one wire carries the entire signal voltage and the other a ground reference for it. Some advantages of this technique include less complicated balanced driver stages, and connection to an unbalanced destination still provides the correct signal level, yet the interference rejection properties are unaffected. Effective interference rejection requires both the sending and receiving devices to have balanced output and input stages respectively.
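The interference rejection described above follows from simple subtraction. The sketch below is an idealised model, assuming perfectly matched impedances so that the same interference voltage appears on both wires; all names are hypothetical.

```python
def receive_balanced(hot, cold):
    """A differential receiver outputs the difference between the two wires."""
    return [h - c for h, c in zip(hot, cold)]

signal = [0.5, -0.25, 1.0]
interference = [0.125, 0.125, 0.25]      # identical on both wires (common mode)

# Symmetrical drive: half the signal on each wire, opposite polarities.
hot_sym = [s / 2 + n for s, n in zip(signal, interference)]
cold_sym = [-s / 2 + n for s, n in zip(signal, interference)]

# Single-sided drive: full signal on one wire, reference on the other.
hot_single = [s + n for s, n in zip(signal, interference)]
cold_single = [0.0 + n for n in interference]

out_sym = receive_balanced(hot_sym, cold_sym)
out_single = receive_balanced(hot_single, cold_single)
```

In both cases the receiver recovers the signal and the common interference cancels, which is why the rejection does not depend on how the signal is distributed between the wires.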

Bass Response — The frequency response of a loudspeaker system at the lower end of the spectrum. The physical size and design of a loudspeaker cabinet and the bass driver (woofer) determine the low frequency extension (the lowest frequency the speaker can reproduce at normal level) and how quickly the signal level falls below that frequency.

Bantam Plug — Also known as TT or Tiny Telephone Plugs. A professional form of miniature jack plug derived from the telecommunications industry and widely used for balanced mic and line-level connections on professional patch bays. (cf. B-Type Plug)

Blumlein Array — A stereo coincident microphone technique devised by Alan Blumlein in the early 1930s, employing a pair of microphones with figure-eight polar patterns, mounted at 90 degrees to each other with the two diaphragms vertically aligned.

BNC — A type of bayonet-locking, two-terminal connector used for professional video and digital audio connections.

Boom — A mechanical means of supporting a microphone above a sound source. Many microphone stands are supplied with a ‘boom arm’ affixed to the top of the stand’s main vertical mast. The term may also be applied to larger, remotely controlled microphone supports used in film and TV studios, or even to the handheld ‘fishpoles’ used by film and TV sound recordists.

Boundary Layer Microphone — A specialized microphone where the diaphragm is placed very close to a boundary (eg. wall, floor or ceiling). In this position the direct and reflected sound adds constructively, giving a 6dB increase in sensitivity. It also avoids the comb-filtering that can occur when a conventionally placed microphone captures the direct sound along with strong first reflections from nearby boundaries. Also known as PZM or Pressure Zone Microphone.

Buffer — An electronic circuit designed to isolate the output of a source device from loading effects due to the input impedance of destination devices.

Bus — (Also sometimes referred to as a buss) An electrical signal path along which multiple signals may travel. A typical audio mixer contains several (mix) busses which carry the stereo mix, subgroups, the PFL signal, the aux sends, and so on. Power supplies are also fed along busses.

C-Weighting — A form of electrical filter which is designed to mimic the relative sensitivity of the human ear to different frequencies at high sound pressure levels (notionally 100 Phons or about 87dBA SPL). Essentially, the filter rolls off the low frequencies below about 20Hz and the highs above about 10kHz. This filtering is often used when making measurements of high-level sounds, such as when calibrating loudspeaker reference levels. (See also A-Weighting and K-Weighting)

Cabinet — The physical construction which encloses and supports the loudspeaker drive units. Usually built of wood or wood composites (although other materials are often used including metal alloys and mineral composites). Cabinets can be ‘sealed’ or ‘vented’ in various ways, the precise design influencing the bass and time-domain characteristics.

Cabinet Resonance — Any box-like construction will resonate at one or more frequencies. In the case of a loudspeaker, such resonances are likely to be undesirable as they may obscure or interfere with the wanted sound from the drive units. Cabinets are usually braced and damped internally to minimise resonances.

Capacitor — A passive, two-terminal electrical component which stores energy in the form of an electrostatic field. The terminals are attached to conductive ‘plates’ which are separated by a non-conductive dielectric. Capacitance is measured in Farads. If a voltage is applied across the terminals of a capacitor a static electric field develops across the dielectric, with positive charge collecting on one plate and negative charge on the other. Where the applied voltage is an alternating signal, a capacitor can be thought of as a form of AC resistance that reduces with increasing signal frequency. The old-fashioned term is a ‘condenser’.
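The "AC resistance" mentioned here is the capacitive reactance, given by the standard formula X_C = 1 / (2πfC). A brief sketch with an assumed 1 microfarad capacitor shows the value falling as frequency rises:

```python
import math

def capacitive_reactance(frequency_hz, capacitance_farads):
    """Reactance in ohms of an ideal capacitor at a given frequency."""
    return 1.0 / (2 * math.pi * frequency_hz * capacitance_farads)

one_microfarad = 1e-6
at_100_hz = capacitive_reactance(100.0, one_microfarad)
at_1_khz = capacitive_reactance(1000.0, one_microfarad)
at_10_khz = capacitive_reactance(10000.0, one_microfarad)
```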

Capacitor Microphone — Also known as a ‘condenser microphone’. This is a specific form of electrostatic microphone which operates on the principle of measuring the change in electrical voltage across a capacitor. The capacitor is formed from two metal electrodes, one fixed (the back-plate) and the other a thin conductive membrane that flexes in response to sound pressure. (See also Back Electret, and RF Capacitor Microphone.)

Capsule — An alternative term for a transducer which converts acoustic sound waves into an electrical signal.

Carbon Microphone — (Also known as a Carbon Button Microphone). An obsolete form of microphone in which carbon granules are contained between two metal contact plates, one of which acts as the diaphragm and moves in response to sound waves. The microphone has to be biased with a DC voltage which causes a current to pass from one metal contact plate, through the carbon granules, to the other metal contact plate. The varying pressure exerted on the carbon granules by the moving diaphragm causes a varying resistance and thus a varying current which is analogous to the sound waves. Carbon Button Microphones were used in the very early days of sound recording and broadcasting, as well as in domestic telephones up until the 1980s when electret capsules became more commonplace.

Cardioid — A specific form of polar response of a unidirectional microphone or loudspeaker. It is an inverted heart-shape which has very low sensitivity at the back (180 degrees), but only slightly reduced sensitivity (typically between 3 and 6dB) at the sides (90/270 degrees).

CD-R — A recordable type of Compact Disc that can only be recorded once – it cannot be erased and reused. The CD-R’s technical characteristics are defined in the ‘Orange Book’.

CD-R Burner — A device capable of recording data onto blank CD-R discs.

Channel — A path for carrying audio or data. In the context of a mixing console a channel is a single strip of controls relating to one input. In the context of MIDI, Channel refers to one of 16 possible data channels over which MIDI data may be sent. The organisation of data by channels means that up to 16 different MIDI instruments or parts may be addressed using a single cable.

Chip — A slang term for an Integrated Circuit or IC.

Clocking — The process of controlling the sample rate of one digital device with an external clock signal derived from another device. In a conventional digital system there must be only one master clock device, with everything else ‘clocked’ or ‘slaved’ from that master.

Close-Miking — A mic technique which involves placing a microphone very close to a sound source, normally with the intention of maximising the wanted sound and minimising any unwanted sound from other nearby sound sources or the room acoustics. In classical music circles the technique is more often known as ‘Accent Miking’.

Coincident — A means of arranging two or more directional microphone capsules such that they receive sound waves from all directions at exactly the same time. The varying sensitivity to sound arriving from different directions due to the directional polar patterns means that information about the directions of sound sources is captured in the form of level differences between the capsule outputs. Specific forms of coincident microphones include ‘XY’ and ‘MS’ configurations, as well as B-format and Ambisonic arrays. Coincident arrays are entirely mono-compatible because there are no timing differences between channels.

Common Mode Rejection — A measure of how well a balanced circuit rejects an interference signal that is common to both sides of the balanced connection.

Compact Cassette — Originally conceived as a recording format for dictation machines in the early 1960s, it became a mainstream music release format in the form of the Musicassette. A plastic shell protected 3.81mm wide (1/8-inch) recording tape which ran at 4.75cm/s. A stereo track was recorded in one direction, and the tape could be turned over to play a second stereo track recorded in the opposite direction.

Compander — An encode-decode device typically employed to pass a wide dynamic range signal over a channel with a lower dynamic range capability. The source signal is compressed in the encoder to reduce the dynamic range, and subsequently expanded by the decoder to restore the original dynamics. The Dolby noise reduction codecs are examples of companders.

Conductor — A material that provides a low resistance path for electrical current.

Cone — A specific shape of drive unit diaphragm intended to push and pull the air to create acoustic sound waves. Most bass drivers use cone-shaped diaphragms, where the electromagnetic motor of the drive unit is connected to the point of the cone, and its outer diameter is supported by some form of flexible membrane.

Console — An alternative term for mixer.

Contact Cleaner — A compound designed to increase the conductivity of electrical contacts such as plugs, sockets and edge connectors. (cf. De-Oxidising Compound)

Control Voltage — A variable voltage signal typically used to control the pitch of an oscillator or filter frequency in an analogue synthesizer. Most analogue synthesizers follow a one volt per octave convention, though there are exceptions. To use a pre-MIDI analogue synthesizer under MIDI control, a MIDI to CV converter is required.
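Under the one volt per octave convention, each added volt doubles the oscillator frequency and each semitone corresponds to 1/12 of a volt. The sketch below assumes, purely for illustration, that MIDI note 60 maps to 0 V and that 0 V produces 261.63 Hz; real converters and synthesizers choose their own reference points.

```python
def midi_note_to_cv(note, zero_volt_note=60):
    """Control voltage for a MIDI note at 1 V per octave (12 semitones)."""
    return (note - zero_volt_note) / 12.0

def cv_to_frequency(volts, zero_volt_hz=261.63):
    """Oscillator frequency for a control voltage at 1 V per octave."""
    return zero_volt_hz * 2.0 ** volts

octave_up_cv = midi_note_to_cv(72)              # one octave above the reference
octave_up_hz = cv_to_frequency(octave_up_cv)    # twice the reference frequency
```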

Converter — A device which transcodes audio signals between the analogue and digital domains. An analogue-to-digital (A-D) converter accepts an analogue signal and converts it to a digital format, while a digital-to-analogue (D-A) converter does the reverse. The sample rate and wordlength of the digital format are often adjustable, as is the relative amplitude of analogue signal for a given digital level.

Daisy Chain — An arrangement of sharing a common data signal between multiple devices. A ‘daisy chain’ is created by connecting the appropriate output (or through) port of one device to the input of the next. This configuration is often used for connecting multiple MIDI instruments together: the MIDI Out of the master device is connected to the MIDI In of the first slave, then the MIDI Thru of the first slave is connected to the MIDI In of the second slave, and so on… A similar arrangement is often used to share a master word clock sample synchronising signal between digital devices.

DAT — An abbreviation of Digital Audio Tape, but often used to refer to DAT recorders (more correctly known as R-DAT because they use a rotating head similar to a video recorder). Digital recorders using fixed or stationary heads (such as DCC) are known as S-DAT machines.

DC — Direct Current. The form of electrical current supplied by batteries and the power supplies inside electrical equipment. The current flows in one direction only.

DCC — A stationary-head digital recorder format developed by Philips, using a bespoke cassette medium similar in size and format to Compact Cassettes. It used an MPEG data reduction system to reduce the amount of data that needs to be stored.

DBX — A manufacturer of audio processing equipment, most notably compressors and tape noise reduction systems. The DBX NR systems were commercial encode/decode analogue noise-reduction processors intended for consumer and semi-pro tape recording. Different models varied in complexity, but essentially DBX compressed the audio signals during recording and expanded them by an identical amount on playback.

DCO — Digitally Controlled Oscillator. Used in digitally-controlled synthesizers.

DDL — An abbreviation of Digital Delay Line, used to create simple delay-based audio effects.

DDP — Disc Description Protocol. A data description format used for specifying the content of optical discs including CD, and used almost universally now for the delivery of disc masters to duplication houses. A DDP file contains four elements: the audio image (.DAT), the DDP identifier (DDPID), the DDP stream descriptor (DDPMS), and a subcode descriptor (PQDESCR). Often an extra text file is also included with track titles and timing data. Many DAWs and audio editing programs can now create DDP files.

De-emphasis — A system which restores the spectral balance to correct for pre-emphasis.

De-Oxidising Compound — A substance formulated to remove oxides from electrical contacts. (cf. Contact Cleaner)

Decca Tree — A form of ‘spaced microphone’ arrangement in which three microphone capsules (usually, but not always, with omnidirectional polar patterns) are placed in a large triangular array roughly two metres wide, with the central microphone one metre further forward. Sounds approaching from different directions arrive at each capsule at different times and with slightly different levels, and these timing and level differences are used to convey the directional information in the recording. The timing differences between channels can result in unwanted colouration if they are combined to produce a mono mix.

Detent — One or more physical click-stops which can be felt when a rotary control is moved. Typically used to identify the centre of a control such as a pan or EQ cut/boost knob, or to give the impression of preset positions on a gain control.

DI — An abbreviation for ‘Direct Instrument’ or ‘Direct Inject’ — the two terms being used interchangeably. Used when an electrical sound source (eg electric guitar, bass or keyboard) is connected directly into an audio chain, rather than captured with a microphone in front of an amp/loudspeaker.

Diaphragm — The movable membrane in a microphone capsule which responds mechanically to variations in the pressure or pressure gradient of sound waves. The mechanical diaphragm vibrations are converted into an electrical signal, usually through electromagnetic or electrostatic techniques such as ribbon, moving coil, capacitor or electret devices.

DI Box — Direct Injection, or Direct Instrument Box. A device which accepts the signal input from a guitar, bass, or keyboard and conditions it to conform to the requirements of a microphone signal at the output. The output is a mic-level, balanced signal with a low source impedance, capable of driving long mic cables. There is usually a facility to break the ground continuity between mic cable and source to avoid unwanted ground loop noises. Both active and passive versions are available, the former requiring power from internal batteries or phantom power via the mic cable. Active DI boxes generally have higher input impedances than passive types and are generally considered to sound better.

Digital (cf. Analogue) — A means of representing information (eg audio or video signals) in the form of binary codes comprising strings of 1s and 0s, or their electrical or physical equivalents. Digital audio circuitry uses discrete voltages or currents to represent the audio signal at specific moments in time (samples). A properly engineered digital system has infinite resolution, the same as an analogue system, but the audio bandwidth is restricted by the sample rate, and the signal-noise ratio (or dynamic range) is restricted by the word-length.

DIN Connector — A consumer multi-pin connection format used for vintage microphones, some consumer audio equipment, and MIDI cabling. Various pin configurations are available.

Diode-Bridge Compressor — A form of audio compressor which uses a diode-bridge (sometimes known as a diode-ring) arrangement as the variable gain-reducing element. The design was popular in the 1960s as it provided faster responses than typical opto-compressors, and less distortion than many FET designs. However, noise can be an issue as the audio signal has to be attenuated heavily before the diode-bridge, and considerable (~40dB) gain added subsequently. The diodes also need to be closely matched to maintain low distortion.

Disc — Used to describe vinyl discs, CDs and MiniDiscs.

Disk — An abbreviation of Diskette, but now used to describe computer floppy, hard and removable data storage disks.

Dome — A specific shape of drive unit diaphragm intended to push and pull the air to create acoustic sound waves. Most tweeters use dome-shaped diaphragms which are driven around the circumference by the drive unit’s motor system. ‘Soft-domes’ are made of a fabric — often silk — while metal domes are constructed from a light metal like aluminium, or some form of metal alloy.

Double-lapped Screen — Also known as a Reussen screen. The signal-carrying wires in a microphone cable are protected from external electrostatic and RF interference by a ‘screen’ which is a surrounding conductor connected to earth or ground. The Reussen screen is a specific form of cable screen, comprising two overlapping and counter-wound layers which are unlikely to ‘open up’ if the cable is bent, yet remain highly flexible.

DSP — Digital Signal Processor. A powerful microchip used to process digital signals.

Drive unit — A physical device designed to generate an acoustic sound wave in response to an electrical input signal. Drive units can be designed to reproduce almost the full audio spectrum, but most are optimised to reproduce a restricted portion, such as a bass unit (woofer) or high-frequency unit (tweeter). A range of technologies are employed, with most being moving-coil units, but ribbon and electrostatic drive units also exist, each with a different balance of advantages and disadvantages. Also known as a ‘driver’.

Drum Pad — A synthetic playing surface which produces electronic trigger signals in response to being hit with drum sticks.

Dynamic Microphone — A type of microphone that works on the electric generator principle, such as moving-coil and ribbon mics. Acoustic sound waves impact the microphone diaphragm, which then moves an electrical conductor within a magnetic field to generate a current, the amplitude and polarity of which reflect the acoustic signal.

Envelope generator — An electronic circuit capable of generating a control signal which represents the envelope of the sound you want to recreate. This may then be used to control the amplitude of an oscillator or other sound source, though envelopes may also be used to control filter or modulation settings. The most common example is the ADSR generator.

E-PROM — Erasable Programmable Read Only Memory. Similar to ROM, but the information on the chip can be erased and replaced using special equipment.

Equaliser (cf. Filter) — A device which allows the user to adjust the tonality of a sound source by boosting or attenuating a specific range of frequencies. Equalisers are available in the form of shelf equalisers, parametric equalisers and graphic equalisers — or as a combination of these basic forms.

EuCon — A control protocol developed by Euphonix which operates at high-speed over an Ethernet connection. It is used between control surfaces and DAW computers to convey information about the positions of faders, knobs, and buttons and to carry display information.

Expander Module — A synthesizer with no keyboard, often rack mountable or in some other compact format.

Fader — A sliding potentiometer control used in mixers and other processors.

Ferric — A type of magnetic tape coating that uses iron oxide.

FET — Field Effect Transistor. A solid-state semiconductor device in which the current flowing between source and drain terminals is controlled by the voltage on the gate terminal. The FET is a very high impedance device, which makes it highly suited for use in impedance converter stages in capacitor and electret microphones.

Figure of Eight — Describes the polar response of a microphone or loudspeaker that is equally sensitive both front and rear, yet rejects sounds coming from the sides. Also called bidirectional.

FireWire — A computer interface format based upon the IEEE 1394 standard and named FireWire by Apple computers (Sony’s i.Link format is also the same interface). FireWire is a serial interface used for high speed isochronous data transfer, including audio and video. FireWire 400 (IEEE 1394-1995 and IEEE 1394a-2000) or S400 interface transfers data at up to 400Mb/s and can operate over cables up to 4.5 metres in length. The standard ‘alpha’ connector is available in four and six-connector versions, the latter able to provide power (up to 25V and 8 watts). The FireWire 800 format (IEEE 1394b-2002) or S800 interface uses a 9-wire ‘beta’ connector and can convey data at up to 800Mb/s.

Flash Drive — A large capacity solid-state memory configured to work like a conventional hard drive. Used in digital cameras and audio recorders in formats such as SD and CF2 cards, as well as in ‘pen drives’ or ‘USB memory sticks’. Some computers are now available with solid state flash drives instead of normal internal hard drives.

Floppy Disk — An obsolete computer disk format using a flexible magnetic medium encased in a protective plastic sleeve.

Fukada Tree — A 7-microphone surround-sound array, broadly equivalent to the stereo Decca Tree. Conceived by Akira Fukada when he worked for the Japanese state broadcaster NHK. The front Left, Centre and Right outputs are generated from a trio of mics arranged in a very similar way to a Decca Tree, with the left and right outriggers spaced 2m apart, and the centre mic 1m forward. The Rear Left and Rear Right channels come from mics spaced 2m apart and placed 2m behind the front outriggers. Instead of using omni mics like a Decca Tree, all five mics are usually cardioids, aimed 60 degrees outwards to maximise channel separation. These five mics are usually supplemented with an extra pair of omni outriggers placed midway between the front and rear mics.

Gain Staging — The act of optimising the signal level through each audio device in a signal chain, or through each section of a mixing console, to maintain an appropriate amount of headroom and keep the signal well above the system noise floor.

Galvanic Isolation — Electrical isolation between two circuits. A transformer provides galvanic isolation because there is no direct electrical connection between the primary and secondary windings; the audio signal is passed via magnetic coupling. An opto-coupler also provides galvanic isolation, as the signal is passed via light modulation.

Gate (CV) — A synthesiser control signal generated whenever a key is depressed on an electronic keyboard and used to trigger envelope generators and other events that need to be synchronised to key action.

Gooseneck — A flexible tube often used to support microphones or small lights. Sometimes also known as a ‘Swan Neck’.

Ground — An alternative term for the electrical Earth or 0 Volts reference. In mains wiring, the ground cable is often physically connected to the planet’s earth via a long conductive metal spike.

Ground Loop / Ground Loop Hum — A condition created when two or more devices are interconnected in such a way that a loop is created in the ground circuit. This can result in audible hums or buzzes in analogue equipment, or unreliability and audio glitches in digital equipment. Typically, a ground loop is created when two devices are connected together using one or more screened audio cables, and both units are also plugged into the mains supply with safety ground connections via the mains plug earth pins. The loop exists between one mains plug, to the first device, through the audio cable screen to the second device, back to the mains supply via the second mains plug, and round to the first device via the building’s power wiring. If the two mains socket ground terminals happen to be at slightly different voltages (which is not unusual), a small current will flow around the ground loop. Although not dangerous, this can result in audible hums or buzzes in poorly designed equipment.

Ground loops can often be prevented by ensuring that the connected audio equipment is powered from a single mains socket or distribution board, thus minimising the loop. In extreme cases it may be necessary to disconnect the screen connection at one end of some of the audio cables, or to use audio isolating transformers in the signal paths. The mains plug earth connection must NEVER be disconnected to try to resolve a ground loop problem as this will render the equipment potentially LETHAL.

Hard Disk Drive (cf. Solid-state Drive) — The conventional means of computer data storage. One or more metal disks (hard disks) hermetically sealed in an enclosure with integral drive electronics and interfacing. The disks are coated in a magnetic material and spun at high speed (typically 7200rpm for audio applications). A series of movable arms carrying miniature magnetic heads are arranged to move closely over the surface of the disks to record (write) and replay (read) data.

Head — The part of a tape machine or disk drive that reads and/or writes information magnetically to and from the storage media.

Hub — Normally used in the context of the USB computer data interface. A hub is a device used to expand a single USB port into several, enabling the connection of multiple devices. Particularly useful where multiple software program authorisation dongles must be connected to the computer.

IC — An abbreviation of Integrated Circuit, a collection of miniaturised transistors and other components on a single silicon wafer, designed to perform a specific function.

IEM — In-Ear Monitor. A wirelessly-connected foldback monitoring system, often used by musicians on stage with in-ear earpieces.

Impedance — The ‘resistance’ or opposition of a medium to a change of state, often encountered in the context of electrical connections (and the way signals of different frequencies are treated), or acoustic treatment (denoting the resistance it presents to air flow). Although measured in Ohms, the impedance of a ‘reactive’ device such as a loudspeaker drive unit will usually vary with signal frequency and will be higher than the resistance when measured with a static DC voltage. Signal sources have an output impedance and destinations have an input impedance. In analogue audio systems the usual arrangement is to source from a very low impedance and feed a destination of a much higher (typically 10 times) impedance. This is called a ‘voltage matching’ interface. In digital and video systems it is more normal to find ‘matched impedance’ interfacing where the source, destination and cable all have the same impedance (eg. 75 Ohms in the case of S/PDIF).

Microphones have a very low impedance (150 Ohms or so) while microphone preamps provide an input impedance of 1,500 Ohms or more. Line inputs typically have an impedance of 10,000 Ohms and DI boxes may provide an input impedance of as much as 1,000,000 Ohms to suit the relatively high output impedance of typical guitar pickups.
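
The effect of ‘voltage matching’ can be checked with simple voltage-divider arithmetic. The short Python sketch below is illustrative only: the function name bridging_loss_db is hypothetical, and it assumes that the source and load behave as plain resistances, which real microphones and inputs only approximate.

```python
import math

def bridging_loss_db(source_ohms: float, load_ohms: float) -> float:
    """Level lost (in dB) when a source drives a load, treating both as plain resistances."""
    # The load receives the fraction load / (source + load) of the source voltage.
    fraction = load_ohms / (source_ohms + load_ohms)
    return 20 * math.log10(fraction)

# A 150 Ohm microphone feeding a 1,500 Ohm preamp input (figures from the entry above)
print(round(bridging_loss_db(150, 1500), 2))   # -0.83
# The same microphone feeding an equal 150 Ohm load loses half its voltage
print(round(bridging_loss_db(150, 150), 2))    # -6.02
```

With a load ten times the source impedance less than 1dB of level is lost, which is why the 10:1 arrangement is the usual choice in analogue audio.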

Inductor — A reactive component that presents an increasing impedance with frequency.

Initialise — Resetting a device to its ‘start-up’ state. Sometimes used to mean restoring a piece of equipment to its factory default settings.

Insert Points — The provision on a mixing console or ‘channel strip’ processor of a facility to break into the signal path through the unit to insert an external processor. Budget devices generally use a single connection (usually a TRS socket) with unbalanced send and return signals on separate contacts, requiring a splitter or Y-cable to provide separate send (input to the external device) and return (output from external device) connections. High-end units tend to provide separate balanced send and return connections. (cf. Effects Loop)

Input Impedance — The input impedance of an electrical network is the ‘load’ into which a power source delivers energy. In modern audio systems the input impedance is normally about ten times higher than the source impedance — so a typical microphone preamp has an input impedance of between 1500 and 2500 Ohms, and a line input is usually between 10 and 50k Ohms.

Insulator — A material that does not conduct electricity.

Interface — A device that acts as an intermediary to two or more other pieces of equipment. For example, a MIDI interface enables a computer to communicate with MIDI instruments and keyboards.

Intermittent — Something that happens occasionally and unpredictably, typically a fault condition.

Intermodulation Distortion — A form of non-linear distortion that introduces frequencies not present in and musically unrelated to the original signal. These are invariably based on the sum and difference products of the original frequencies.

I/O — The input/output connections of a system.

IPS — Inches Per Second. Used to describe tape speed. Also the Institute of Professional Sound.

Isopropyl Alcohol — A type of alcohol commonly used for cleaning and de-greasing tape machine heads and guides.

Jackfield — A system of panel-mounted connectors used to bring inputs and outputs to a central point from where they can be routed using plug-in patch cords. Also called a patchbay.

Jack Plug — A commonly used audio connector, usually ¼ inch in diameter and with either two terminals (tip and sleeve known as TS) or three (tip, ring, sleeve called TRS). The TS version can only carry unbalanced mono signals, and is often used for electric instruments (guitars, keyboards, etc). The TRS version is used for unbalanced stereo signals (eg for headphones) or balanced mono signals.

K-Weighting — A form of electrical filter which is designed to mimic the relative sensitivity of the human ear to different frequencies in terms of perceived loudness. It is broadly similar to the A-Weighting curve, except that it adds a shelf boost above 2kHz. This filter is an integral element of the ITU-R BS.1770 loudness measurement protocol.

Lay Length — The distance along the length of a cable over which the twisted core wires complete one complete turn. Shorter lay lengths provide better rejection of electromagnetic interference, but make the cable less flexible and more expensive.

LED — Light Emitting Diode. A form of solid state lamp.

LCD — Liquid Crystal Display.

LFO — Low Frequency Oscillator, often found in synths or effects using modulation.

Logic — A type of electronic circuitry used for processing binary signals comprising two discrete voltage levels.

Loom — A number of separate cables bound together for neatness and convenience.

Low Frequency Oscillator (LFO) — An oscillator used as a modulation source, usually operating with frequencies below 20Hz. The most common LFO waveshape is the sine wave, though there is often a choice of sine, square, triangular and sawtooth waveforms.

Loudspeaker (also Monitor and Speaker) — A device used to convert an electrical audio signal into an acoustic sound wave. A monitor is an accurate loudspeaker intended for critical sound auditioning purposes.

 

 

General audio terminologies:

Acoustic Signature: The acoustic signature of a system is data containing all of the sound characteristics of a system. This includes such things as reverb time, frequency response and other timbral qualities. Impulse files used by Acoustic Mirror can be thought of as acoustic signatures.

Additive Synthesis — A system for generating audio waveforms or sounds by combining basic waveforms or sampled sounds prior to further processing with filters and envelope shapers. The Hammond tonewheel organ was one of the first additive synthesizers.

ADSR — When creating artificial waveforms in a synthesizer, changes in the signal amplitude over time are controlled by an ‘envelope generator’ which typically has controls to adjust the Attack, Decay and Release times and the Sustain level, controlled by the pressing and subsequent release of a key on the keyboard. The Attack phase determines the time taken for the signal to grow to its maximum amplitude, triggered by the pressing of a key. The envelope then immediately enters the Decay phase during which time the signal level reduces until it reaches the Sustain level set by the user. The signal remains at this level until the key is released, at which point the Release phase is entered and the signal level reduces back to zero.
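
The four phases can be sketched as a small Python function. This is a minimal illustration, not how any particular synthesizer works: the name adsr_level and its parameters are hypothetical, the segments are assumed to be straight lines (many instruments use curved ones), and the key is assumed to be held until the Decay phase has finished.

```python
def adsr_level(t, attack, decay, sustain, release, key_down_time):
    """Envelope level (0.0 to 1.0) at t seconds after the key is pressed.

    attack, decay and release are times in seconds; sustain is a level.
    key_down_time is how long the key is held (assumed >= attack + decay).
    """
    if t < 0:
        return 0.0
    if t < attack:
        # Attack: rise from zero to maximum amplitude
        return t / attack
    if t < attack + decay:
        # Decay: fall from maximum to the Sustain level
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < key_down_time:
        # Sustain: hold until the key is released
        return sustain
    if t < key_down_time + release:
        # Release: fall from the Sustain level back to zero
        return sustain * (1.0 - (t - key_down_time) / release)
    return 0.0

# 100ms attack, 200ms decay, 50% sustain, 300ms release, key held for 1 second
for t in (0.05, 0.1, 0.2, 0.5, 1.15, 2.0):
    print(t, round(adsr_level(t, 0.1, 0.2, 0.5, 0.3, 1.0), 3))
```

Multiplying an oscillator's output by this level, sample by sample, would shape its amplitude in the way the entry describes.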

AES — Acronym for Audio Engineering Society, one of the industry’s professional audio associations. (www.aes.org)

A-Law: A-Law is a companding (compressing and expanding) algorithm for voice signals defined by ITU-T Recommendation G.711. The G.711 recommendation defines A-Law as a method of encoding linear PCM signals into a nonlinear 8-bit format. The algorithm is commonly used in European telecommunications. A-Law is very similar to µ-Law; however, each uses a slightly different coder and decoder.
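
The shape of the A-Law curve can be sketched in Python using the continuous form of the companding law with the conventional parameter A = 87.6. This is an illustration only: G.711 itself specifies a segmented 8-bit approximation of this curve, which the sketch does not implement, and the function names are hypothetical.

```python
import math

A = 87.6  # conventional A-Law compression parameter

def alaw_compress(x: float) -> float:
    """Map a linear sample in the range -1..1 onto the A-Law curve (also -1..1)."""
    ax = abs(x)
    if ax < 1 / A:
        y = A * ax / (1 + math.log(A))              # linear region for very quiet signals
    else:
        y = (1 + math.log(A * ax)) / (1 + math.log(A))  # logarithmic region
    return math.copysign(y, x)

def alaw_expand(y: float) -> float:
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1 / (1 + math.log(A)):
        x = ay * (1 + math.log(A)) / A
    else:
        x = math.exp(ay * (1 + math.log(A)) - 1) / A
    return math.copysign(x, y)

# A quiet sample at 1% of full scale is lifted to roughly 16% before coding
print(round(alaw_compress(0.01), 2))   # 0.16
```

Quiet signals are given a larger share of the available code values than loud ones, which is what allows speech to be carried in 8 bits.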

Aliasing: A type of distortion that occurs when digitally recording high frequencies with a low sample rate. For example, in a motion picture, when a car’s wheels appear to slowly spin backward while the car is quickly moving forward, you are seeing the effects of aliasing. Similarly, when you try to record a frequency greater than one half of the sampling rate (the Nyquist Frequency), instead of hearing a high pitch, you may hear a low-frequency rumble.
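
The frequency at which an aliased tone reappears can be worked out with a few lines of Python. This is a sketch under the assumption of an ideal sampler with no anti-aliasing filter; the function name alias_frequency is hypothetical.

```python
def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency heard after sampling, folded into the range 0..Nyquist."""
    folded = signal_hz % sample_rate_hz
    # Anything above the Nyquist frequency is mirrored back below it
    return min(folded, sample_rate_hz - folded)

print(alias_frequency(1000, 44100))    # 1000  (below Nyquist, unchanged)
print(alias_frequency(25000, 44100))   # 19100 (folded back into the audio band)
print(alias_frequency(44000, 44100))   # 100   (a very high tone becomes a low one)
```

The last line is the situation described above: a tone far above the Nyquist frequency turns up as a low-frequency component.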

Ambience — The result of sound reflections in a confined space being added to the original sound. Ambience may also be created electronically by some digital reverb units. The main difference between ambience and reverberation is that ambience doesn’t have the characteristic long delay time of reverberation; the reflections mainly give the sound a sense of space.

Amp (Ampere) — Unit of electrical current (A).

Amplitude — The waveform signal level. It can refer to acoustic sound levels or electrical signal levels.

Amplitude Modulation: Amplitude Modulation (AM) is a process whereby the amplitude (loudness) of a sound is varied over time. When varied slowly, a tremolo effect occurs. If the frequency of modulation is high, many side frequencies are created that can strongly alter the timbre of a sound.

Analog: When discussing audio, this term refers to a method of reproducing a sound wave with voltage fluctuations that are analogous to the pressure fluctuations of the sound wave. This is different from digital recording in that these fluctuations are infinitely varying rather than discrete changes at sample time. See Quantization.

Analogue (cf. Digital) — The origin of the term is that the electrical audio signal inside a piece of equipment can be thought of as being ‘analogous’ to the original acoustic signal. Analogue circuitry uses a continually changing voltage or current to represent the audio signal.

Analogue Synthesis — A system for synthesizing sounds by means of analogue circuitry, usually by filtering simple repeating waveforms.

Arming — Arming a track or channel on a recording device places it in a condition where it is ready to record audio when the system is placed in record mode. Unarmed tracks won’t record audio even if the system is in record mode. When a track is armed the system monitoring usually auditions the input signal throughout the recording, whereas unarmed tracks usually replay any previously recorded audio.

Arpeggiator — A device (or software) that allows a MIDI instrument to sequence around any notes currently being played. Most arpeggiators also allow the sound to be sequenced over several octaves, so that holding down a simple chord can result in an impressive repeating sequence of notes.

ASCII — American Standard Code for Information Interchange. An internationally

Audio Data Reduction — A system used to reduce the amount of data needed to represent some information such as an audio signal. Lossless audio data reduction systems (eg. FLAC and ALAC) can fully and precisely reconstruct the original audio data with bit-accuracy, but the amount of data reduction is rarely much more than 2:1. Lossy audio data reduction systems (eg. MPEG, AAC, AC3 and others) permanently discard audio information that is deemed to have been ‘masked’ by more prominent sounds. The original data can never be retrieved, but the reduction in total data can be considerable (12:1 is common).

Audio Frequency — Signals in the range of human audibility. Nominally 20Hz to 20kHz.

Balance — This word has several meanings in recording. It may refer to the relative levels of the left and right channels of a stereo recording (eg. Balance Control), or it may be used to describe the relative levels of the various instruments and voices within a mix (ie. Mix balance).

Bandwidth — The range of frequencies passed by an electronic circuit such as an amplifier, mixer or filter. The frequency range is usually measured at the points where the level drops by 3dB relative to the maximum.

Baseline: The baseline of a waveform is also referred to as the zero-amplitude axis or negative infinity.

Beats Per Minute (BPM): The tempo of a piece of music can be written as a number of beats in one minute. If the tempo is 60 BPM, a single beat will occur once every second.
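
The relationship between tempo and beat length is a single division, shown here as a small Python sketch. The function name beat_seconds is hypothetical.

```python
def beat_seconds(bpm: float) -> float:
    """Duration of one beat in seconds at the given tempo."""
    return 60.0 / bpm

print(beat_seconds(60))    # 1.0
print(beat_seconds(120))   # 0.5
```

The same figure is often used to set a delay effect so that its repeats fall on the beat.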

Bias — A high-frequency signal used in analogue recording to improve the accuracy of the recorded signal and to drive the erase head. Bias is generated by a bias oscillator.

Bit: The most elementary unit in digital systems. Its value can only be 1 or 0, corresponding to a voltage in an electronic circuit. Bits are used to represent values in the binary numbering system. As an example, the 8-bit binary number 10011010 represents the unsigned value of 154 in the decimal system. In digital sampling, a binary number is used to store individual sound levels, called samples.
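
The binary example above can be verified in Python. Each bit carries a weight that doubles from right to left, and the value is the sum of the weights of the bits set to 1.

```python
bits = "10011010"

# Built-in conversion from a base-2 string
value = int(bits, 2)

# The same result by adding up the weight of every bit that is set
weights = [2 ** (len(bits) - 1 - i) for i, b in enumerate(bits) if b == "1"]

print(value)                  # 154
print(weights, sum(weights))  # [128, 16, 8, 2] 154
print(format(value, "08b"))   # 10011010
```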

Bit Depth: The number of bits used to represent a single sample. For example, 8- or 16-bit are common sample sizes. While 8-bit samples take up less memory (and hard disk space), they are inherently noisier than 16- or 24-bit samples.

Bit Rate — The number of data bits replayed or transferred in a given period of time (normally one second). Normally expressed in terms of kb/s (kilo bits per second) or Mb/s (mega bits per second). For example, the bit rate of a standard CD is (2 channels x 16 bits per sample x 44.1 thousand samples per second) = 1411.2 kilobits/second. Popular MP3 file format bit rates range from 128kb/s to 320kb/s, while the Dolby Digital 5.1 surround soundtrack on a DVD-Video typically ranges between 384 and 448kb/s.
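
The CD calculation above generalises to any uncompressed PCM format. The Python sketch below uses a hypothetical function name, pcm_bit_rate_kbps, and takes 1 kilobit as 1,000 bits, as in the CD figure.

```python
def pcm_bit_rate_kbps(channels: int, bits_per_sample: int, sample_rate_hz: int) -> float:
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return channels * bits_per_sample * sample_rate_hz / 1000

print(pcm_bit_rate_kbps(2, 16, 44100))   # 1411.2 (standard CD)
print(pcm_bit_rate_kbps(2, 24, 96000))   # 4608.0 (stereo 24-bit, 96kHz)
```

Dividing the CD figure by a 128kb/s MP3 rate gives a reduction of roughly 11:1.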

Bi-Timbral — A synthesizer that can generate two different sounds simultaneously.

Bouncing — The process of mixing two or more recorded tracks together and re-recording these onto another track.

BPM — Beats Per Minute.

Buffer: Memory used as an intermediate repository in which data is temporarily held while waiting to be transferred between two locations. A buffer ensures that there is an uninterrupted flow of data between computers. Media players may need to rebuffer when there is network congestion.

Bus: A virtual pathway where signals from tracks and effects are mixed. A bus’s output is a physical audio device in the computer from which the signal will be heard.

Byte: Refers to a set of 8 bits. An 8-bit sample requires one byte of memory to store, while a 16-bit sample takes two bytes of memory to store.

Cut-off Frequency — The frequency above or below which attenuation begins in a filter circuit.

Cycle — One complete vibration (from maximum peak, through the negative peak, and back to the maximum again) of a sound source or its electrical equivalent. One cycle per second is expressed as 1 Hertz (Hz).

Damping — The control of a resonant device. In the context of reverberation, damping refers to the rate at which the reverberant energy is absorbed by the various surfaces in the environment. In the context of a loudspeaker it relates to the cabinet design and internal acoustic absorbers.

DANTE — A form of audio-over-IP (layer 3) created by Australian company Audinate in 2006. DANTE is an abbreviation of ‘Digital Audio Network Through Ethernet’. The format provides low-latency multichannel audio over standard ethernet infrastructures. It has been widely adopted in the broadcast, music studio, and live sound sectors.

DAW — (Digital Audio Workstation): A term first used in the 1980s to describe early ‘tapeless’ recording/sampling machines like the Fairlight and Synclavier. Nowadays, DAW is more commonly used to describe Audio+MIDI ‘virtual studio’ software programs such as Cubase, Logic Pro, Digital Performer, Sonar and such-like. Essentially elaborate software running on a bespoke or generic computer platform which is designed to replicate the processes involved in recording, replaying, mixing and processing real or virtual audio signals. Many modern DAWs incorporate MIDI sequencing facilities as well as audio manipulation, a range of effects and sound generation.

Decibel dB — The deciBel is a method of expressing the ratio between two quantities in a logarithmic fashion. Used when describing audio signal amplitudes because the logarithmic nature matches the logarithmic character of the human sense of hearing. The dB is used when comparing one signal level against another (such as the input and output levels of an amplifier or filter). When the two signal amplitudes are the same, the decibel value is 0dB. If one signal has twice the amplitude of the other the decibel value is +6dB, and if half the size it is -6dB.

When one signal is being compared to a standard reference level the term is supplemented with a suffix letter representing the specific reference. 0dBu implies a reference voltage of 0.775V rms, while 0dBV relates to a reference voltage of 1.0V rms. The two most common standard audio level references are +4dBu (1.228V rms) and -10dBV (0.316V rms). The actual level difference between these is close to 12dB. The term dBm is also sometimes encountered, and this relates to an amount of power rather than a voltage, specifically 1mW dissipated into 600 Ohms (which happens to generate a voltage of 0.775V rms). When discussing acoustic sound levels, 0dB SPL (sound pressure level) is the typical threshold of human hearing at 1kHz.
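
The decibel arithmetic in this entry can be reproduced with a short Python sketch. The function and constant names are hypothetical; the reference voltages are the ones quoted above, and amplitude ratios use the 20 x log10 form.

```python
import math

DBU_REF_VOLTS = 0.775  # 0dBu reference, as quoted above
DBV_REF_VOLTS = 1.0    # 0dBV reference

def ratio_to_db(amplitude_ratio: float) -> float:
    """Express the ratio of two signal amplitudes in decibels."""
    return 20 * math.log10(amplitude_ratio)

def dbu_to_volts(dbu: float) -> float:
    return DBU_REF_VOLTS * 10 ** (dbu / 20)

def dbv_to_volts(dbv: float) -> float:
    return DBV_REF_VOLTS * 10 ** (dbv / 20)

print(round(ratio_to_db(2), 2))     # 6.02  (twice the amplitude)
print(round(ratio_to_db(0.5), 2))   # -6.02 (half the amplitude)

pro = dbu_to_volts(4)         # professional reference level, +4dBu
consumer = dbv_to_volts(-10)  # consumer reference level, -10dBV
print(round(ratio_to_db(pro / consumer), 2))   # 11.79
```

The final figure is the ‘close to 12dB’ difference between the two standard operating levels.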

dB/Octave — A means of measuring the slope or steepness of a filter. The gentlest audio filter is typically 6dB/Octave (also called a first-order slope). Higher values indicate sharper filter slopes. 24dB/octave (fourth order) is the steepest normally found in analogue audio applications.
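
Filter slopes can be explored numerically. The Python sketch below assumes a Butterworth-style low-pass response, which is only one of several common filter shapes, and the function names are hypothetical. Well above the cut-off frequency the level falls by about 6dB per octave for each filter order.

```python
import math

def lowpass_level_db(freq_hz: float, cutoff_hz: float, order: int = 1) -> float:
    """Level (dB) passed by a Butterworth-style low-pass filter of the given order."""
    return -10 * math.log10(1 + (freq_hz / cutoff_hz) ** (2 * order))

def slope_db_per_octave(cutoff_hz: float, order: int) -> float:
    """Level change across one octave, measured well above the cut-off frequency."""
    low = lowpass_level_db(16 * cutoff_hz, cutoff_hz, order)
    high = lowpass_level_db(32 * cutoff_hz, cutoff_hz, order)
    return high - low

print(round(lowpass_level_db(1000, 1000), 2))   # -3.01 (the cut-off point)
print(round(slope_db_per_octave(1000, 1), 1))   # -6.0  (first order)
print(round(slope_db_per_octave(1000, 4), 1))   # -24.1 (fourth order)
```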

Decay — The progressive reduction in amplitude of a sound or electrical signal over time, eg. the reverb decay of a room. In the context of an ADSR envelope shaper, the Decay phase starts as soon as the Attack phase has reached its maximum level.

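Dithering: Dithering is the practice of adding low-level noise to a signal before its word length is reduced, so that the quantization error is heard as steady noise rather than as distortion.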

Dolby — A manufacturer of analogue and digital audio equipment in the fields of tape noise reduction systems and cinema and domestic surround sound equipment. Dolby’s noise-reduction systems included types B, C and S for domestic and semi-professional machines, and types A and SR for professional machines. Recordings made using one of these systems must also be replayed via the same system. These systems varied in complexity and effectiveness, but essentially they all employed multiband encode/decode processing that raised low-level signals during recording, and reversed the process during playback. Dolby’s surround sound systems started with an analogue phase-matrix system with a very elaborate active-steering decoder called ProLogic, before moving into the digital realm with Dolby Digital, Dolby Digital Plus, Dolby True HD and others.

Dynamics — A way of describing the relative levels within a piece of music.

Dynamic Range: The difference between the maximum and minimum signal levels. It can refer to a musical performance (high-volume vs. low-volume signals) or to electrical equipment (peak level before distortion vs. noise floor).

Effect — A treatment applied to an audio signal in order to change or enhance it in some creative way. Effects often involve the use of delays, and include such treatments as reverb and echo.

Envelope — The way in which the amplitude of a sound signal varies over time.

Equivalent Input Noise — A means of describing the intrinsic electronic noise at the output of an amplifier in terms of an equivalent input noise, taking into account the amplifier’s gain.

Fast Fourier Transform (FFT) Analysis: A Fourier Transform is the mathematical method used to convert a waveform from the Time Domain to the Frequency Domain.

Since the Fourier Transform is computationally intensive, it is common to use a technique called a Fast Fourier Transform (FFT) to perform spectral analysis. The FFT uses mathematical shortcuts to lower the processing time at the expense of putting limitations on the analysis size.

The analysis size, also referred to as the FFT size, indicates the number of samples from the sound signal used in the analysis and also determines the number of discrete frequency bands. When a high number of frequency bands are used, the bands have a smaller bandwidth, which allows for more accurate frequency readings.
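
The trade-off between FFT size and frequency resolution can be shown with simple arithmetic. In this Python sketch the function names are hypothetical, and the band count assumes a real-valued audio signal analysed from 0Hz up to and including the Nyquist frequency.

```python
def fft_bin_width_hz(sample_rate_hz: float, fft_size: int) -> float:
    """Bandwidth of each frequency band for a given analysis size."""
    return sample_rate_hz / fft_size

def fft_band_count(fft_size: int) -> int:
    """Number of distinct bands from 0Hz to the Nyquist frequency inclusive."""
    return fft_size // 2 + 1

for size in (1024, 8192):
    width = fft_bin_width_hz(44100, size)
    print(size, fft_band_count(size), round(width, 2))
# 1024 513 43.07
# 8192 4097 5.38
```

The larger analysis size gives narrower bands and therefore finer frequency readings, at the cost of needing more samples (a longer stretch of audio) for each analysis.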

Foldback — A system for making one or more separate mixes audible to musicians while performing, recording and overdubbing. Also known as a Cue mix. May be auditioned via headphones, IEMs or wedge monitors.

Formant — The frequency components or resonances of an instrument or voice sound that don’t change with the pitch of the note being played or sung. For example, the body resonance of an acoustic guitar remains constant, regardless of the note being played.

Frequency — The number of complete cycles of a repetitive waveform that occur in 1 second. A waveform which repeats once per second has a frequency of 1Hz (Hertz).

Frequency Response — The variation in amplitude relative to the signal frequency. A measurement of the frequency range that can be handled by a specific piece of electrical equipment or loudspeaker. (Also see Bandwidth)

FSK — Frequency Shift Keying. An obsolete method of recording a synchronisation control signal onto tape by representing it as two alternating tones.

Fundamental — The lowest frequency component in a harmonically complex sound. (Also see harmonic and partial.)

Gain — The amount by which a circuit amplifies a signal, normally denoted in decibels.

Glitch — Describes an unwanted short term corruption of a signal, or the unexplained, short term malfunction of a piece of equipment.

Group — A collection of signals within a mixer that are combined and routed through a separate fader to provide overall control. In a multitrack mixer several groups are provided to feed the various recorder track inputs.

Harmonic — High frequency components of a complex waveform, where the harmonic frequency is an integer multiple of the fundamental.

Headroom — The available ‘safety margin’ in audio equipment required to accommodate unexpected loud audio transient signals. It is defined as the region between the nominal operating level (0VU) and the clipping point. Typically, a high quality analogue audio mixer or processor will have a nominal operating level of +4dBu and a clipping point of +24dBu — providing 20dB of headroom. Analogue meters, by convention, don’t show the headroom margin at all; but in contrast, digital systems normally do — hence the need to try to restrict signal levels to average around -20dBFS when tracking and mixing with digital systems to maintain a sensible headroom margin. Fully post-produced signals no longer require headroom as the peak signal level is known and controlled. For this reason it has become normal to create CDs with zero headroom.

Hertz (Hz) — The standard measurement of frequency. 10Hz means ten complete cycles of a repeating waveform per second.

Head-Related Transfer Function (HRTF): Sounds are perceived differently depending on the direction the sound comes from. This occurs because of the echoes bouncing from your shoulders and nose and the shape of your ears. A head-related transfer function contains the frequency and phase response information required to make a sound appear to originate from a certain direction in 3-dimensional space.

Hertz (Hz): The unit of measurement for frequency or cycles per second (CPS).

High Resolution — A misnomer, but used to refer to digital formats with long word-lengths and high sample rates, eg. 24/96 or 24/192. In a properly configured digital system, audio resolution is effectively infinite, just as in an analogue system. Word-length defines only the system’s signal-to-noise ratio (equivalent to tape width in analogue systems), while sample rate defines only the audio bandwidth (equivalent to tape speed in analogue systems).

Hiss — Random noise caused by random electrical fluctuations.

Hum — Audio signal contamination caused by the addition of low frequencies, usually related to the mains power frequency.

Hysteresis — A condition whereby the state of a system is dependent on previous events or, in other words, the system’s output can lag behind the input. Most commonly found in audio in the behaviour of ferro-magnetic materials such as in transformers and analogue tape heads, or in electronic circuits such as ‘switch de-bouncing’. Another example is the way a drop-down box on a computer menu remains visible for a short while after the mouse is moved.

Hz — The SI symbol for Hertz, the unit of frequency.

Inverse Telecine (IVTC): Telecine is the process of converting 24 fps (cinema) source to 30 fps video (television) by adding pulldown fields. Inverse telecine, then, is the process of converting 30 fps (television) video to 24 fps (cinema) by removing pulldown.
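
One common way of adding pulldown fields is the 2:3 cadence, in which film frames alternately contribute two and three video fields. The sketch below assumes that cadence (the entry above does not name one) and uses a hypothetical function name; it shows how 4 film frames become 5 interlaced video frames.

```python
def pulldown_2_3(film_frames):
    """Spread film frames over video frames using a 2:3 field cadence."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 2 if index % 2 == 0 else 3   # alternate 2 fields, 3 fields
        fields.extend([frame] * repeats)
    # Pair consecutive fields into interlaced video frames.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


print(pulldown_2_3("ABCD"))
print(len(pulldown_2_3(range(24))), "video frames from 24 film frames")
```

Inverse telecine reverses this by detecting the repeated fields and discarding them.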

k — (lower-case k) The standard abbreviation for kilo, meaning a multiplier of 1000 (one thousand). Used as a prefix to other values to indicate magnitude, eg. 1kHz = 1000Hz, 1kOhm = 1000 Ohms.

K-Metering — An audio level metering format developed by mastering engineer Bob Katz which must be used with a monitoring system set up to a calibrated acoustic reference level. Three VU-like meter scales are provided, differing only in the displayed headroom margin. The K-20 scale is used for source recording and wide dynamic-range mixing/mastering, and affords a 20dB headroom margin. The K-14 scale allows 14dB of headroom and is intended for most pop music mixing/mastering, while the K-12 scale is intended for material with a more heavily restricted dynamic-range, such as for broadcasting. In all cases, the meter’s zero mark is aligned with the acoustic reference level.

Latency (cf. Delay) — The time delay experienced between a sound or control signal being generated and it being auditioned or taking effect, measured in seconds.

Load — An electrical load is a circuit that draws power from another circuit or power supply. The term also describes reading data into a computer system.

Loudness — The perceived volume of an audio signal.

Low-range (low, lows) — The lower portion of the audible frequency spectrum, typically denoting frequencies below about 1kHz.

LUFS — The standard measurement of loudness, as used on Loudness Meters corresponding to the ITU-R BS.1770 specification. The acronym stands for ‘Loudness Units (relative to) Full Scale’. Earlier versions of the specification used LKFS instead, and this label remains in use in America. The K refers to the ‘K-Weighting’ filter used in the signal measurement process.

Mid-Side recording: Mid-side (MS) recording is a microphone technique in which one mic is pointed directly towards the source to record the center (mid) channel, and the other mic is pointed 90 degrees away from the source to record the stereo image. For proper playback on most systems, MS recordings must be converted to your standard left/right (also called AB) track.
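
The conversion to left/right is usually done with a sum-and-difference matrix. The following is a minimal sketch of that idea for a single pair of sample values; the function names are hypothetical, and real decoders often apply additional gain scaling that is not shown here.

```python
def ms_to_lr(mid, side):
    """Decode one mid/side sample pair to left/right."""
    return mid + side, mid - side


def lr_to_ms(left, right):
    """Encode one left/right sample pair back to mid/side."""
    return (left + right) / 2, (left - right) / 2


left, right = ms_to_lr(0.5, 0.25)
print(left, right)
print(lr_to_ms(left, right))
```

Raising or lowering the side signal before decoding widens or narrows the stereo image.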

Mix: Mixing allows multiple sound files to be blended into one file at user-defined relative levels.

Multiple-Bit-Rate Encoding: Multiple-bit-rate encoding allows you to create a single file that contains streams for several bit rates. A multiple-bit-rate file can accommodate users with different Internet connection speeds, or these files can automatically change to a different bit rate to compensate for network congestion without interrupting playback. To take advantage of multiple-bit-rate encoding, you must publish your media files to a Windows Media server or a RealServer G2.

Nyquist Frequency: The Nyquist Frequency is one half of the sample rate and represents the highest frequency that can be recorded at that sample rate without aliasing. (The related term Nyquist Rate strictly refers to the minimum sample rate needed to capture a given bandwidth, i.e. twice the highest frequency present.) For example, the Nyquist Frequency for a sample rate of 44,100 Hz is 22,050 Hz. Any frequencies higher than 22,050 Hz will produce aliasing distortion in the sample if no anti-aliasing filter is used while recording.
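
To make the 44,100 Hz example concrete, this small sketch computes the Nyquist Frequency and predicts where an unfiltered tone would appear after sampling. It assumes an ideal sampler with no anti-aliasing filter, and the function names are invented for illustration.

```python
def nyquist_frequency(sample_rate_hz):
    """Half the sample rate."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a tone is heard after sampling without a filter."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_frequency(44_100))
print(alias_frequency(10_000, 44_100))   # below Nyquist: unchanged
print(alias_frequency(30_000, 44_100))   # above Nyquist: folds back down
```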

Punch-In: Punching-in during recording means automatically starting and stopping recording at user-specified times.

Root Mean Square (RMS): The Root Mean Square (RMS) of a sound is a measurement of the intensity of the sound over a period of time. The RMS level of a sound corresponds to the loudness perceived by a listener when measured over small intervals of time.
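
A minimal sketch of the calculation, assuming the sound is available as a list of sample values scaled to the range -1.0 to 1.0 (the function name rms is chosen for this example):

```python
import math


def rms(samples):
    """Square each sample, take the mean, then take the square root."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# One full cycle of a full-scale sine wave, 1000 samples long.
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(round(rms(sine), 4))
```

A full-scale sine wave has an RMS level of about 0.707 even though its peaks reach 1.0, which is why RMS and peak meters read differently on the same material.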

Sample: The word sample is used in many different (and often confusing) ways when talking about digital sound. Here are some of the different meanings:

  • A discrete measurement of a sound signal taken at a single point in time when digitizing. For example, an audio CD contains 44,100 samples per second for each channel. Each sample is really only a number that contains the amplitude value of the waveform at that instant.
  • A sound that has been recorded in a digital format; used by musicians who make short recordings of musical instruments to be used for composition and performance of music or sound effects. These recordings are called samples.
  • The act of recording sound digitally, i.e. to sample an instrument means to digitize and store it.

Sample Rate: The Sample Rate (also referred to as the Sampling Rate or Sampling Frequency) is the number of samples per second used to store a sound. High sample rates, such as 44,100 Hz, provide higher fidelity than lower sample rates, such as 11,025 Hz. However, more storage space is required when using higher sample rates.

Sample Value: The Sample Value (also referred to as sample amplitude) is the number stored by a single sample.

  • In 32-bit audio, these values range from -2147483648 to 2147483647.
  • In 24-bit audio, they range from -8388608 to 8388607. 
  • In 16-bit audio, they range from -32768 to 32767. 
  • In 8-bit audio, they range from -128 to 127. 

The maximum allowed sample value is often referred to as 100% or 0 dB.
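
The ranges listed above follow directly from the bit depth when samples are stored as signed integers. The sketch below reproduces them and converts a sample value to decibels relative to full scale; it assumes full scale is 2 to the power of (bit depth - 1), and the function names are hypothetical.

```python
import math


def sample_range(bit_depth):
    """Smallest and largest value of a signed integer sample."""
    return -(2 ** (bit_depth - 1)), 2 ** (bit_depth - 1) - 1


def to_dbfs(sample_value, bit_depth):
    """Level of one sample in dB relative to full scale (assumed 2**(bits-1))."""
    if sample_value == 0:
        return float("-inf")
    full_scale = 2 ** (bit_depth - 1)
    return 20 * math.log10(abs(sample_value) / full_scale)


for bits in (8, 16, 24, 32):
    print(bits, sample_range(bits))
print(round(to_dbfs(16384, 16), 2))   # half of full scale in 16-bit audio
```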

Sampler: A sampler is a device that records sounds digitally. Although, in theory, your sound card is a sampler, the term usually refers to a device used to trigger and play back samples while changing the sample pitch.

Secure Digital Music Initiative (SDMI): The Secure Digital Music Initiative (SDMI) is a consortium of recording industry and technology companies organized to develop standards for the secure distribution of digital music. The SDMI specification will answer consumer demand for convenient accessibility to quality digital music, enable copyright protection for artists’ work, and enable technology and music companies to build successful businesses.

SCSI MIDI Device Interface (SMDI): SMDI is a standardized protocol for music equipment communication. Instead of using the slower standard MIDI serial protocol, it uses a SCSI bus for transferring information. Because of its speed, SMDI is often used for sample dumps.

Sign-Bit: Data that has positive and negative values and uses zero to represent silence. Unlike the signed format, twos complement is not used. Instead, negative values are represented by setting the highest bit of the binary number to one without complementing all other bits. This is a format option when opening and saving RAW sound files.

Signed: Data that has positive and negative twos complement values and uses zero to represent silence. This is a format option when opening and saving raw sound files.
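
The difference between the Sign-Bit and Signed formats shows up when the same raw bit pattern is decoded both ways. This is a small sketch under the assumption that the raw value has already been read as an unsigned integer; the function names are made up for the example.

```python
def decode_twos_complement(raw, bits):
    """Signed format: a set top bit means the value is raw minus 2**bits."""
    if raw & (1 << (bits - 1)):
        return raw - (1 << bits)
    return raw


def decode_sign_bit(raw, bits):
    """Sign-Bit format: the top bit is the sign, the rest is the magnitude."""
    magnitude = raw & ((1 << (bits - 1)) - 1)
    return -magnitude if raw & (1 << (bits - 1)) else magnitude


pattern = 0b10011010
print(decode_twos_complement(pattern, 8))
print(decode_sign_bit(pattern, 8))
```

Opening a raw file with the wrong option leaves positive samples intact but scrambles the negative ones, which is usually audible as heavy distortion.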

Signal-to-Noise Ratio: The signal-to-noise ratio (SNR) is a measurement of the difference between a recorded signal and noise levels. A high SNR is always the goal.

The maximum signal-to-noise ratio of digital audio is determined by the number of bits per sample. In 16-bit audio, the signal-to-noise ratio is 96 dB, while in 8-bit audio it is 48 dB. However, in practice this SNR is never achieved, especially when using low-end electronics.
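
The 96 dB and 48 dB figures come from the ratio between the largest value and the smallest step that a given number of bits can represent, roughly 6 dB per bit. A minimal sketch of that rule of thumb (the function name is hypothetical, and more detailed formulas add a small constant for sine-wave signals):

```python
import math


def max_snr_db(bit_depth):
    """Theoretical ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (8, 16, 24):
    print(bits, "bits:", round(max_snr_db(bits), 1), "dB")
```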

Small Computer Systems Interface (SCSI): SCSI is a standard interface protocol for connecting devices to your computer. The SCSI bus can accept up to seven devices at a time including CD-ROM drives, hard drives and samplers.

Society of Motion Picture and Television Engineers (SMPTE): SMPTE time code is used to synchronize time between devices. The time code is calculated in hours:minutes:seconds:frames, where frames are fractions of a second based on the frame rate. Frame rates for SMPTE time code are 24, 25, 29.97 and 30 frames per second.
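
As an illustration of the hours:minutes:seconds:frames layout, the sketch below converts a running frame count into a time code string. It deliberately handles only the whole-number rates (24, 25 and 30 fps); 29.97 fps needs drop-frame counting, which is not shown. The function name is an assumption for this example.

```python
def frames_to_timecode(frame_count, fps):
    """Format a frame count as HH:MM:SS:FF for a whole-number frame rate."""
    if fps not in (24, 25, 30):
        raise ValueError("this sketch only covers 24, 25 and 30 fps")
    if frame_count < 0:
        raise ValueError("frame count must not be negative")
    frames = frame_count % fps
    total_seconds = frame_count // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(frames_to_timecode(1799, 30))
print(frames_to_timecode(90_000, 25))
```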

Sound Card: The sound card is the audio interface between your computer and the outside world. It is responsible for converting analog signals to digital and vice-versa. There are many sound cards available on the market today, covering the spectrum of quality and price. Sound Forge software will work with any Windows-compatible sound card.

Streaming: A method of data transfer in which a file is played while it is downloading. Streaming technologies allow Internet users to receive data as a steady, continuous stream after a brief buffering period. Without streaming, users would have to download files completely before playback.

Telecine: The process of creating 30 fps video (television) from 24 fps film (cinema).

Tempo: Tempo is the rhythmic rate of a musical composition, usually specified in beats per minute (BPM).

µ-Law: µ-Law (mu-Law) is a companding compression algorithm for voice signals defined by ITU-T Recommendation G.711. The G.711 recommendation defines µ-Law as a method of encoding 16-bit PCM signals into a nonlinear 8-bit format. The algorithm is commonly used in North American and Japanese telecommunications. µ-Law is very similar to A-Law; however, each uses a slightly different coder and decoder.
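
The idea behind companding can be sketched with the continuous µ-Law curve, using µ = 255. Note the assumptions: this works on sample values scaled to the range -1.0 to 1.0 and shows only the smooth curve, whereas G.711 itself specifies a segmented 8-bit approximation of it. The function names are invented for illustration.

```python
import math

MU = 255  # value used for 8-bit µ-Law


def mu_law_compress(x):
    """Map a linear sample in [-1, 1] onto the logarithmic µ-Law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


quiet = 0.01
compressed = mu_law_compress(quiet)
print(round(compressed, 3), round(mu_law_expand(compressed), 3))
```

Quiet samples are pushed well up the scale before being stored in 8 bits, so they keep more precision than they would with plain linear 8-bit coding.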

General audio terminologies:

Acoustic Signature: The acoustic signature of a system is data containing all of the sound characteristics of a system. This includes such things as reverb time, frequency response and other timbral qualities. Impulse files used by Acoustic Mirror can be thought of as acoustic signatures.

Additive Synthesis — A system for generating audio waveforms or sounds by combining basic waveforms or sampled sounds prior to further processing with filters and envelope shapers. The Hammond tonewheel organ was one of the first additive synthesizers.

ADSR — When creating artificial waveforms in a synthesizer, changes in the signal amplitude over time are controlled by an ‘envelope generator’ which typically has controls to adjust the Attack, Decay and Release times and the Sustain level, controlled by the pressing and subsequent release of a key on the keyboard. The Attack phase determines the time taken for the signal to grow to its maximum amplitude, triggered by the pressing of a key. The envelope then immediately enters the Decay phase during which time the signal level reduces until it reaches the Sustain level set by the user. The signal remains at this level until the key is released, at which point the Release phase is entered and the signal level reduces back to zero.
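
The four phases can be sketched as a simple function of time. This version assumes straight-line segments and a peak level of 1.0 (real synthesizers often use curved segments), and all names and parameter values are hypothetical.

```python
def held_level(t, attack, decay, sustain_level):
    """Envelope level while the key is still held down."""
    if t < attack:
        return t / attack                                   # Attack: rise to 1.0
    if t < attack + decay:
        return 1.0 - (1.0 - sustain_level) * (t - attack) / decay   # Decay
    return sustain_level                                    # Sustain


def adsr_level(t, attack, decay, sustain_level, release, key_up):
    """Envelope level t seconds after the key press; key_up is the release time."""
    if t < 0:
        return 0.0
    if t < key_up:
        return held_level(t, attack, decay, sustain_level)
    start = held_level(key_up, attack, decay, sustain_level)
    elapsed = t - key_up
    if elapsed >= release:
        return 0.0
    return start * (1.0 - elapsed / release)                # Release: fall to zero


for t in (0.05, 0.1, 0.5, 1.2, 2.0):
    print(t, round(adsr_level(t, 0.1, 0.2, 0.5, 0.4, key_up=1.0), 3))
```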

AES — Acronym for Audio Engineering Society, one of the industry’s professional audio associations. (www.aes.org)

A-Law: A-Law is a companding compression algorithm for voice signals defined by ITU-T Recommendation G.711. The G.711 recommendation defines A-Law as a method of encoding 16-bit PCM signals into a nonlinear 8-bit format. The algorithm is commonly used in European telecommunications. A-Law is very similar to µ-Law; however, each uses a slightly different coder and decoder.

Aliasing: A type of distortion that occurs when digitally recording high frequencies with a low sample rate. For example, in a motion picture, when a car’s wheels appear to slowly spin backward while the car is quickly moving forward, you are seeing the effects of aliasing. Similarly, when you try to record a frequency greater than one half of the sampling rate (the Nyquist Frequency), instead of hearing a high pitch, you may hear a low-frequency rumble.

Ambience — The result of sound reflections in a confined space being added to the original sound. Ambience may also be created electronically by some digital reverb units. The main difference between ambience and reverberation is that ambience doesn’t have the characteristic long delay time of reverberation; the reflections mainly give the sound a sense of space.

Amp (Ampere) — Unit of electrical current (A).

Amplitude — The waveform signal level. It can refer to acoustic sound levels or electrical signal levels.

Amplitude Modulation: Amplitude Modulation (AM) is a process whereby the amplitude (loudness) of a sound is varied over time. When varied slowly, a tremolo effect occurs. If the frequency of modulation is high, many side frequencies are created that can strongly alter the timbre of a sound.

Analog: When discussing audio, this term refers to a method of reproducing a sound wave with voltage fluctuations that are analogous to the pressure fluctuations of the sound wave. This is different from digital recording in that these fluctuations are infinitely varying rather than discrete changes at sample time. See Quantization.

Analogue (cf. Digital) — The origin of the term is that the electrical audio signal inside a piece of equipment can be thought of as being ‘analogous’ to the original acoustic signal. Analogue circuitry uses a continually changing voltage or current to represent the audio signal.

Analogue Synthesis — A system for synthesizing sounds by means of analogue circuitry, usually by filtering simple repeating waveforms.

Arming — Arming a track or channel on a recording device places it in a condition where it is ready to record audio when the system is placed in record mode. Unarmed tracks won’t record audio even if the system is in record mode. When a track is armed the system monitoring usually auditions the input signal throughout the recording, whereas unarmed tracks usually replay any previously recorded audio.

Arpeggiator — A device (or software) that allows a MIDI instrument to sequence around any notes currently being played. Most arpeggiators also allow the sound to be sequenced over several octaves, so that holding down a simple chord can result in an impressive repeating sequence of notes.

ASCII — American Standard Code for Information Interchange. An internationally

Audio Data Reduction — A system used to reduce the amount of data needed to represent some information such as an audio signal. Lossless audio data reduction systems (eg. FLAC and ALAC) can fully and precisely reconstruct the original audio data with bit-accuracy, but the amount of data reduction is rarely much more than 2:1. Lossy audio data reduction systems (eg. MPEG, AAC, AC-3 and others) permanently discard audio information that is deemed to have been ‘masked’ by more prominent sounds. The original data can never be retrieved, but the reduction in total data can be considerable (12:1 is common).

Audio Frequency — Signals in the range of human audibility. Nominally 20Hz to 20kHz.

Balance — This word has several meanings in recording. It may refer to the relative levels of the left and right channels of a stereo recording (eg. Balance Control), or it may be used to describe the relative levels of the various instruments and voices within a mix (ie. Mix balance).

Bandwidth — The range of frequencies passed by an electronic circuit such as an amplifier, mixer or filter. The frequency range is usually measured at the points where the level drops by 3dB relative to the maximum.

Baseline: The baseline of a waveform is also referred to as the zero-amplitude axis, or negative infinity when the level is expressed in decibels.

Beats Per Minute (BPM): The tempo of a piece of music can be written as a number of beats in one minute. If the tempo is 60 BPM, a single beat will occur once every second.
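
The arithmetic is simple enough to sketch directly. The second function, which converts a beat length into a number of samples, assumes a 44,100 Hz sample rate by default; both function names are made up for this example.

```python
def seconds_per_beat(bpm):
    """Length of one beat in seconds."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    return 60.0 / bpm


def samples_per_beat(bpm, sample_rate_hz=44_100):
    """Length of one beat in samples at the given sample rate."""
    return seconds_per_beat(bpm) * sample_rate_hz


print(seconds_per_beat(60), seconds_per_beat(120))
print(samples_per_beat(120))
```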

Bias — A high-frequency signal used in analogue recording to improve the accuracy of the recorded signal and to drive the erase head. Bias is generated by a bias oscillator.

Bit: The most elementary unit in digital systems. Its value can only be 1 or 0, corresponding to a voltage in an electronic circuit. Bits are used to represent values in the binary numbering system. As an example, the 8-bit binary number 10011010 represents the unsigned value of 154 in the decimal system. In digital sampling, a binary number is used to store individual sound levels, called samples.
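
The 10011010 example can be checked by adding up the place values by hand, which is what this short sketch does (the function name is hypothetical; Python's built-in int(text, 2) gives the same answer):

```python
def bits_to_unsigned(bits):
    """Convert a string of 0s and 1s to its unsigned decimal value."""
    value = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError("only the characters 0 and 1 are allowed")
        value = value * 2 + int(bit)   # shift left one place, add the new bit
    return value


print(bits_to_unsigned("10011010"))
```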

Bit Depth: The number of bits used to represent a single sample. For example, 8- or 16-bit are common sample sizes. While 8-bit samples take up less memory (and hard disk space), they are inherently noisier than 16- or 24-bit samples.

Bit Rate — The number of data bits replayed or transferred in a given period of time (normally one second). Normally expressed in terms of kb/s (kilo bits per second) or Mb/s (mega bits per second). For example, the bit rate of a standard CD is (2 channels x 16 bits per sample x 44.1 thousand samples per second) = 1411.2 kilobits/second. Popular MP3 file format bit rates range from 128kb/s to 320kb/s, while the Dolby Digital 5.1 surround soundtrack on a DVD-Video typically ranges between 384 and 448kb/s.
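
The CD calculation above generalises to any uncompressed PCM format. A minimal sketch, with a hypothetical function name, that also compares the CD rate with a 128kb/s MP3:

```python
def pcm_bit_rate_kbps(channels, bits_per_sample, sample_rate_hz):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return channels * bits_per_sample * sample_rate_hz / 1000


cd_rate = pcm_bit_rate_kbps(2, 16, 44_100)
print(cd_rate, "kb/s")
print(round(cd_rate / 128, 1), "to 1 reduction for a 128kb/s MP3")
```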

Bi-Timbral — A synthesizer that can generate two different sounds simultaneously.

Bouncing — The process of mixing two or more recorded tracks together and re-recording these onto another track.

BPM — Beats Per Minute.

Buffer: Memory used as an intermediate repository in which data is temporarily held while waiting to be transferred between two locations. A buffer ensures that there is an uninterrupted flow of data between computers. Media players may need to rebuffer when there is network congestion.

Bus: A virtual pathway where signals from tracks and effects are mixed. A bus’s output is a physical audio device in the computer from which the signal will be heard.

Byte: Refers to a set of 8 bits. An 8-bit sample requires one byte of memory to store, while a 16-bit sample takes two bytes of memory to store.

Cut-off Frequency — The frequency above or below which attenuation begins in a filter circuit.

Cycle — One complete vibration (from maximum peak, through the negative peak, and back to the maximum again) of a sound source or its electrical equivalent. One cycle per second is expressed as 1 Hertz (Hz).

Damping — The control of a resonant device. In the context of reverberation, damping refers to the rate at which the reverberant energy is absorbed by the various surfaces in the environment. In the context of a loudspeaker it relates to the cabinet design and internal acoustic absorbers.

DANTE — A form of audio-over-IP (layer 3) created by Australian company Audinate in 2006. DANTE is an abbreviation of ‘Digital Audio Network Through Ethernet’. The format provides low-latency multichannel audio over standard ethernet infrastructures. It has been widely adopted in the broadcast, music studio, and live sound sectors.

DAW — (Digital Audio Workstation): A term first used in the 1980s to describe early ‘tapeless’ recording/sampling machines like the Fairlight and Synclavier. Nowadays, DAW is more commonly used to describe Audio+MIDI ‘virtual studio’ software programs such as Cubase, Logic Pro, Digital Performer, Sonar and such-like. Essentially elaborate software running on a bespoke or generic computer platform which is designed to replicate the processes involved in recording, replaying, mixing and processing real or virtual audio signals. Many modern DAWs incorporate MIDI sequencing facilities as well as audio manipulation, a range of effects and sound generation.

Decibel (dB) — The decibel is a method of expressing the ratio between two quantities in a logarithmic fashion. Used when describing audio signal amplitudes because the logarithmic nature matches the logarithmic character of the human sense of hearing. The dB is used when comparing one signal level against another (such as the input and output levels of an amplifier or filter). When the two signal amplitudes are the same, the decibel value is 0dB. If one signal has twice the amplitude of the other the decibel value is approximately +6dB, and if half the size it is approximately -6dB.

When one signal is being compared to a standard reference level the term is supplemented with a suffix letter representing the specific reference. 0dBu implies a reference voltage of 0.775V rms, while 0dBV relates to a reference voltage of 1.0V rms. The two most common standard audio level references are +4dBu (1.228V rms) and -10dBV (0.316V rms). The actual level difference between these is close to 12dB. The term dBm is also sometimes encountered, and this relates to an amount of power rather than a voltage, specifically 1mW dissipated into 600 Ohms (which happens to generate a voltage of 0.775V rms). When discussing acoustic sound levels, 0dB SPL (sound pressure level) is the typical threshold of human hearing at 1kHz.
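
These figures can be reproduced with the usual formula for amplitude ratios, 20 times the base-10 logarithm of the ratio. The sketch below is illustrative only; the function names are invented, and the reference voltages are the ones quoted above.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Express an amplitude (voltage) ratio in decibels."""
    return 20 * math.log10(ratio)


def dbu_to_volts(dbu):
    """Convert a dBu level to volts rms (reference 0.775V)."""
    return 0.775 * 10 ** (dbu / 20)


def dbv_to_volts(dbv):
    """Convert a dBV level to volts rms (reference 1.0V)."""
    return 1.0 * 10 ** (dbv / 20)


print(round(amplitude_ratio_to_db(2), 2), round(amplitude_ratio_to_db(0.5), 2))
pro, consumer = dbu_to_volts(4), dbv_to_volts(-10)
print(round(pro, 3), round(consumer, 3))
print(round(amplitude_ratio_to_db(pro / consumer), 2), "dB between the two")
```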

dB/Octave — A means of measuring the slope or steepness of a filter. The gentlest audio filter is typically 6dB/Octave (also called a first-order slope). Higher values indicate sharper filter slopes. 24dB/octave (fourth order) is the steepest normally found in analogue audio applications.
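
To see where the 6dB and 24dB figures come from, the sketch below measures the roll-off of an idealised low-pass filter across one octave, well above its cutoff frequency. It assumes a Butterworth response, which is only one of several filter shapes, and the function names are hypothetical.

```python
import math


def butterworth_lowpass_gain_db(freq_hz, cutoff_hz, order):
    """Gain in dB of an idealised Butterworth low-pass filter."""
    ratio = freq_hz / cutoff_hz
    return -10 * math.log10(1 + ratio ** (2 * order))


def slope_db_per_octave(cutoff_hz, order):
    """Change in gain between 16x and 32x the cutoff (one octave apart)."""
    low = butterworth_lowpass_gain_db(16 * cutoff_hz, cutoff_hz, order)
    high = butterworth_lowpass_gain_db(32 * cutoff_hz, cutoff_hz, order)
    return high - low


for order in (1, 2, 4):
    print("order", order, round(slope_db_per_octave(1000, order), 1), "dB/octave")
print(round(butterworth_lowpass_gain_db(1000, 1000, 1), 2), "dB at the cutoff")
```

Each additional filter order adds roughly another 6dB of attenuation per octave.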

Decay — The progressive reduction in amplitude of a sound or electrical signal over time, eg. the reverb decay of a room. In the context of an ADSR envelope shaper, the Decay phase starts as soon as the Attack phase has reached its maximum level.

Dithering: Dithering is the practice of adding noise to a signal to mask quantization noise.
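
A small experiment shows why the added noise helps. The sketch below quantizes a level that is smaller than one quantization step, first without and then with dither. It assumes triangular (TPDF) dither spanning plus or minus one step, which is a common choice but not the only one, and the function names are made up for this example.

```python
import random


def quantize(value, step):
    """Round a value to the nearest multiple of the quantization step."""
    return round(value / step) * step


def quantize_with_dither(value, step, rng):
    """Add triangular (TPDF) noise of up to +/- one step, then quantize."""
    noise = (rng.random() - rng.random()) * step
    return quantize(value + noise, step)


rng = random.Random(0)            # fixed seed so the run is repeatable
quiet_signal = 0.3                # smaller than one quantization step
plain = quantize(quiet_signal, 1.0)
dithered = [quantize_with_dither(quiet_signal, 1.0, rng) for _ in range(20_000)]
print("without dither:", plain)
print("average with dither:", round(sum(dithered) / len(dithered), 2))
```

Without dither the quiet level is rounded away completely; with dither it survives on average, at the cost of a little added noise.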

Dolby — A manufacturer of analogue and digital audio equipment in the fields of tape noise reduction systems and cinema and domestic surround sound equipment. Dolby’s noise-reduction systems included types B, C and S for domestic and semi-professional machines, and types A and SR for professional machines. Recordings made using one of these systems must also be replayed via the same system. These systems varied in complexity and effectiveness, but essentially they all employed multiband encode/decode processing that raised low-level signals during recording, and reversed the process during playback. Dolby’s surround sound systems started with an analogue phase-matrix system with a very elaborate active-steering decoder called ProLogic, before moving into the digital realm with Dolby Digital, Dolby Digital Plus, Dolby True HD and others.

Dynamics — A way of describing the relative levels within a piece of music.

Dynamic Range: The difference between the maximum and minimum signal levels. It can refer to a musical performance (high-volume vs. low-volume signals) or to electrical equipment (peak level before distortion vs. noise floor).

Effect — A treatment applied to an audio signal in order to change or enhance it in some creative way. Effects often involve the use of delays, and include such treatments as reverb and echo.

Envelope — The way in which the amplitude of a sound signal varies over time.

Equivalent Input Noise — A means of describing the intrinsic electronic noise at the output of an amplifier in terms of an equivalent input noise, taking into account the amplifier’s gain.

Fast Fourier Transform (FFT) Analysis: A Fourier Transform is the mathematical method used to convert a waveform from the Time Domain to the Frequency Domain.

Since the Fourier Transform is computationally intensive, it is common to use a technique called a Fast Fourier Transform (FFT) to perform spectral analysis. The FFT uses mathematical shortcuts to lower the processing time at the expense of putting limitations on the analysis size.

The analysis size, also referred to as the FFT size, indicates the number of samples from the sound signal used in the analysis and also determines the number of discrete frequency bands. When a high number of frequency bands are used, the bands have a smaller bandwidth, which allows for more accurate frequency readings.

Foldback — A system for making one or more separate mixes audible to musicians while performing, recording and overdubbing. Also known as a Cue mix. May be auditioned via headphones, IEMs or wedge monitors.

Formant — The frequency components or resonances of an instrument or voice sound that doesn’t change with the pitch of the note being played or sung. For example, the body resonance of an acoustic guitar remains constant, regardless of the note being played.

Frequency — The number of complete cycles of a repetitive waveform that occur in 1 second. A waveform which repeats once per second has a frequency of 1Hz (Hertz).

Frequency Response — The variation in amplitude relative to the signal frequency. A measurement of the frequency range that can be handled by a specific piece of electrical equipment or loudspeaker. (Also see Bandwidth)

FSK — Frequency Shift Keying. An obsolete method of recording a synchronisation control signal onto tape by representing it as two alternating tones.

Fundamental — The lowest frequency component in a harmonically complex sound. (Also see harmonic and partial.)

Gain — The amount by which a circuit amplifies a signal, normally denoted in decibels.

Glitch — Describes an unwanted short term corruption of a signal, or the unexplained, short term malfunction of a piece of equipment.

Group — A collection of signals within a mixer that are combined and routed through a separate fader to provide overall control. In a multitrack mixer several groups are provided to feed the various recorder track inputs.

Harmonic — High frequency components of a complex waveform, where the harmonic frequency is an integer multiple of the fundamental.

Headroom — The available ‘safety margin’ in audio equipment required to accommodate unexpected loud audio transient signals. It is defined as the region between the nominal operating level (0VU) and the clipping point. Typically, a high quality analogue audio mixer or processor will have a nominal operating level of +4dBu and a clipping point of +24dBu — providing 20dB of headroom. Analogue meters, by convention, don’t show the headroom margin at all; but in contrast, digital systems normally do — hence the need to try to restrict signal levels to average around -20dBFS when tracking and mixing with digital systems to maintain a sensible headroom margin. Fully post-produced signals no longer require headroom as the peak signal level is known and controlled. For this reason it has become normal to create CDs with zero headroom.

Hertz (Hz) — The standard measurement of frequency. 10Hz means ten complete cycles of a repeating waveform per second.

Head-Related Transfer Function (HRTF): Sounds are perceived differently depending on the direction the sound comes from. This occurs because of the echoes bouncing from your shoulders and nose and the shape of your ears. A head-related transfer function contains the frequency and phase response information required to make a sound appear to originate from a certain direction in 3-dimensional space.

Hertz (Hz): The unit of measurement for frequency or cycles per second (CPS).

High Resolution — A misnomer, but used to refer to digital formats with long word-lengths and high sample rates, eg. 24/96 or 24/192. Audio resolution is infinite and identical to analogue systems in properly configured digital systems. Word-length defines only the system’s signal-to-noise ratio (equivalent to tape width in analogue systems) , while sample rate defines only the audio bandwidth (equivalent to tape speed in analogue systems).

Hiss — Random noise caused by random electrical fluctuations.

Hum — Audio Signal contamination caused by the addition of low frequencies, usually related to the mains power frequency.

Hysteresis — A condition whereby the state of a system is dependent on previous events or, in other words, the system’s output can lag behind the input. Most commonly found in audio in the behaviour of ferro-magnetic materials such as in transformers and analogue tape heads, or in electronic circuits such a ‘switch de-bouncing’. Another example is the way a drop-down box on a computer menu remains visible for a short while after the mouse is moved.

Hz — The SI symbol for Hertz, the unit of frequency.

Inverse Telecine (IVTC): Telecine is the process of converting 24 fps (cinema) source to 30 fps video (television) by adding pulldown fields. Inverse telecine, then, is the process of converting 30 fps (television) video to 24 fps (cinema) by removing pulldown.

k — (lower-case k) The standard abbreviation for kilo, meaning a multiplier of 1000 (one thousand). Used as a prefix to other values to indicate magnitude, eg. 1kHz = 1000Hz, 1kOhm = 1000 Ohms.

K-Metering — An audio level metering format developed by mastering engineer Bob Katz which must be used with a monitoring system set up to a calibrated acoustic reference level. Three VU-like meter scales are provided, differing only in the displayed headroom margin. The K-20 scale is used for source recording and wide dynamic-range mixing/mastering, and affords a 20dB headroom margin. The K-14 scale allows 14dB of headroom and is intended for most pop music mixing/mastering, while the K-12 scale is intended for material with a more heavily restricted dynamic-range, such as for broadcasting. In all cases, the meter’s zero mark is aligned with the acoustic reference level.

Latency (cf. Delay) — The time delay experienced between a sound or control signal being generated and it being auditioned or taking effect, measured in seconds.

Load — An electrical load is a circuit that draws power from another circuit or power supply. The term also describes reading data into a computer system.

Loudness — The perceived volume of an audio signal.

Low-range (low, lows) — The lower portion of the audible frequency spectrum, typically denoting frequencies below about 1kHz.

LUFS — The standard measurement of loudness, as used on Loudness Meters conforming to the ITU-R BS.1770 specification. The acronym stands for ‘Loudness Units (relative to) Full Scale’. Earlier versions of the specification used LKFS instead, and this label remains in use in America. The K refers to the ‘K-Weighting’ filter used in the signal measurement process.

Mid-Side recording: Mid-side (MS) recording is a microphone technique in which one mic is pointed directly towards the source to record the center (mid) channel, and the other mic is pointed 90 degrees away from the source to record the stereo image. For proper playback on most systems, MS recordings must be converted to your standard left/right (also called AB) track.
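The conversion from mid/side to left/right is a simple sum and difference. The sketch below assumes both channels are lists of floating-point samples of equal length; the function names are chosen for this example only, and the 0.5 scaling on the way back is one common convention rather than the only one.

```python
def ms_to_lr(mid, side):
    """Decode mid/side samples into left/right samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def lr_to_ms(left, right):
    """Encode left/right samples into mid/side samples."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
print(left, right)
```

Raising or lowering the side channel before decoding widens or narrows the stereo image, which is the main practical attraction of the technique.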

Mix: Mixing allows multiple sound files to be blended into one file at user-defined relative levels.

Multiple-Bit-Rate Encoding: Multiple-bit-rate encoding allows you to create a single file that contains streams for several bit rates. A multiple-bit-rate file can accommodate users with different Internet connection speeds, or these files can automatically change to a different bit rate to compensate for network congestion without interrupting playback. To take advantage of multiple-bit-rate encoding, you must publish your media files to a Windows Media server or a RealServer G2.

Nyquist Frequency: The Nyquist Frequency (sometimes loosely called the Nyquist Rate) is one half of the sample rate and represents the highest frequency that can be recorded at that sample rate without aliasing. For example, the Nyquist Frequency at a sample rate of 44,100 Hz is 22,050 Hz. Any frequencies higher than 22,050 Hz will produce aliasing distortion in the recording if no anti-aliasing filter is used while recording.
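To see where an out-of-band tone ends up, the folding arithmetic can be sketched as follows. This assumes an ideal sampler with no anti-aliasing filter and a pure sine-wave input; `alias_frequency` is a name invented for this example.

```python
def nyquist_frequency(sample_rate_hz):
    """Half the sample rate: the highest frequency that can be captured."""
    return sample_rate_hz / 2

def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling.

    Tones at or below the Nyquist frequency come back unchanged; higher
    tones fold back into the range 0..Nyquist.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_frequency(44100))        # 22050.0
print(alias_frequency(1000, 44100))    # 1000, below Nyquist so unchanged
print(alias_frequency(30000, 44100))   # 14100, folded back into the audible band
```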

Punch-In: Punching-in during recording means automatically starting and stopping recording at user-specified times.

Root Mean Square (RMS): The Root Mean Square (RMS) of a sound is a measurement of the intensity of the sound over a period of time. The RMS level of a sound corresponds to the loudness perceived by a listener when measured over small intervals of time.
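As a rough illustration, RMS is computed by squaring each sample, averaging the squares, and taking the square root. The sketch assumes samples are floats normalized so that full scale is 1.0; `rms_db` is a helper name made up for this example.

```python
import math

def rms(samples):
    """Root mean square of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_db(samples, full_scale=1.0):
    """RMS level in decibels relative to full scale (assumed to be 1.0)."""
    level = rms(samples)
    return float("-inf") if level == 0 else 20 * math.log10(level / full_scale)

# One cycle of a full-scale sine wave, 1000 samples long.
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(rms(sine))     # close to 0.7071
print(rms_db(sine))  # close to -3.01 dB
```

Measuring over short windows, as the definition describes, simply means calling `rms` on successive slices of the signal.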

Sample: The word sample is used in many different (and often confusing) ways when talking about digital sound. Here are some of the different meanings:

  • A discrete point in time which a sound signal is divided into when digitizing. For example, an audio CD contains 44,100 samples per second per channel. Each sample is really only a number that contains the amplitude value of a waveform measured at one instant.
  • A sound that has been recorded in a digital format; used by musicians who make short recordings of musical instruments to be used for composition and performance of music or sound effects. These recordings are called samples. In this Help system, we try to use sound file instead of sample whenever referring to a digital recording.
  • The act of recording sound digitally, i.e. to sample an instrument means to digitize and store it.

Sample Rate: The Sample Rate (also referred to as the Sampling Rate or Sampling Frequency) is the number of samples per second used to store a sound. High sample rates, such as 44,100 Hz provide higher fidelity than lower sample rates, such as 11,025 Hz. However, more storage space is required when using higher sample rates.
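The storage cost mentioned above is easy to estimate for uncompressed PCM audio. The sketch below assumes whole-byte sample sizes and ignores any file header; `pcm_size_bytes` is a name invented for this example.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Approximate size of uncompressed PCM audio, excluding file headers."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# One minute of 16-bit stereo at two different sample rates.
print(pcm_size_bytes(44100, 16, 2, 60))  # 10584000 bytes, about 10.1 MiB
print(pcm_size_bytes(11025, 16, 2, 60))  # 2646000 bytes, a quarter of the above
```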

Sample Value: The Sample Value (also referred to as sample amplitude) is the number stored by a single sample.

  • In 32-bit audio, these values range from -2147483648 to 2147483647.
  • In 24-bit audio, they range from -8388608 to 8388607. 
  • In 16-bit audio, they range from -32768 to 32767. 
  • In 8-bit audio, they range from -128 to 127. 

The maximum allowed sample value is often referred to as 100% or 0 dB.
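The ranges listed above follow directly from the word length, as this short sketch shows. It assumes signed twos complement integers and measures level relative to the largest positive value, so that the maximum sample value reads 0 dB; the function names are illustrative only.

```python
import math

def sample_range(bits):
    """Smallest and largest value of a signed integer sample."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def level_db(value, bits):
    """Level of a sample value relative to the maximum positive value."""
    _, maximum = sample_range(bits)
    return 20 * math.log10(abs(value) / maximum)

for bits in (8, 16, 24, 32):
    print(bits, sample_range(bits))

print(level_db(32767, 16))  # 0.0, the maximum value
print(level_db(16384, 16))  # roughly -6 dB, about half of full scale
```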

Sampler: A sampler is a device that records sounds digitally. Although, in theory, your sound card is a sampler, the term usually refers to a device used to trigger and play back samples while changing the sample pitch.

Secure Digital Music Initiative (SDMI): The Secure Digital Music Initiative (SDMI) is a consortium of recording industry and technology companies organized to develop standards for the secure distribution of digital music. The SDMI specification will answer consumer demand for convenient accessibility to quality digital music, enable copyright protection for artists’ work, and enable technology and music companies to build successful businesses.

SCSI MIDI Device Interface (SMDI): SMDI is a standardized protocol for music equipment communication. Instead of using the slower standard MIDI serial protocol, it uses a SCSI bus for transferring information. Because of its speed, SMDI is often used for sample dumps.

Sign-Bit: Data that has positive and negative values and uses zero to represent silence. Unlike the signed format, twos complement is not used. Instead, negative values are represented by setting the highest bit of the binary number to one without complementing all other bits. This is a format option when opening and saving RAW sound files.

Signed: Data that has positive and negative twos complement values and uses zero to represent silence. This is a format option when opening and saving raw sound files.
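The difference between the Sign-Bit and Signed formats is easiest to see on a single value. The sketch below encodes small integers into 8-bit patterns both ways; the function names are made up for this illustration.

```python
def encode_signed(value, bits=8):
    """Twos complement bit pattern, as used by the Signed format."""
    return value & ((1 << bits) - 1)

def encode_sign_bit(value, bits=8):
    """Sign-and-magnitude bit pattern, as used by the Sign-Bit format."""
    sign = 1 << (bits - 1)
    magnitude = abs(value)
    if magnitude >= sign:
        raise ValueError("value does not fit in the given word length")
    # Only the highest bit changes for negative values.
    return (sign | magnitude) if value < 0 else magnitude

for value in (3, -3):
    print(value,
          format(encode_signed(value), "08b"),
          format(encode_sign_bit(value), "08b"))
```

Positive values look identical in both formats; for -3 the Signed pattern is 11111101 while the Sign-Bit pattern is 10000011.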

Signal-to-Noise Ratio: The signal-to-noise ratio (SNR) is a measurement of the difference between a recorded signal and noise levels. A high SNR is always the goal.

The maximum signal-to-noise ratio of digital audio is determined by the number of bits per sample. In 16-bit audio, the signal-to-noise ratio is 96 dB, while in 8-bit audio it is 48 dB. However, in practice this SNR is never achieved, especially when using low-end electronics.
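The 96 dB and 48 dB figures come from roughly 6 dB per bit, which can be checked with a one-line formula. This is the simple ratio between the largest value and one quantization step; it is a rule-of-thumb sketch, not a measurement procedure.

```python
import math

def max_snr_db(bits):
    """Theoretical ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(bits, round(max_snr_db(bits), 1))
```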

Small Computer Systems Interface (SCSI): SCSI is a standard interface protocol for connecting devices to your computer. The SCSI bus can accept up to seven devices at a time including CD ROM drives, hard drives and samplers.

Society of Motion Picture and Television Engineers (SMPTE): SMPTE time code is used to synchronize time between devices. The time code is calculated in hours:minutes:seconds:frames, where frames are fractions of a second based on the frame rate. Frame rates for SMPTE time code are 24, 25, 29.97 and 30 frames per second.
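Converting between a running frame count and an hours:minutes:seconds:frames label can be sketched as below. The example deliberately covers only the whole-number rates (24, 25 and 30 fps) in non-drop-frame form; 29.97 fps drop-frame time code needs extra frame-number skipping that is not shown here. The function names are illustrative.

```python
def frames_to_timecode(total_frames, fps):
    """Format a frame count as HH:MM:SS:FF (non-drop-frame, integer fps)."""
    frames = total_frames % fps
    total_seconds = total_frames // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

def timecode_to_frames(timecode, fps):
    """Inverse of frames_to_timecode."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

print(frames_to_timecode(1234, 24))   # 00:00:51:10
print(frames_to_timecode(90000, 25))  # 01:00:00:00
```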

Sound Card: The sound card is the audio interface between your computer and the outside world. It is responsible for converting analog signals to digital and vice-versa. There are many sound cards available on the market today, covering the spectrum of quality and price. Sound Forge software will work with any Windows-compatible sound card.

Streaming: A method of data transfer in which a file is played while it is downloading. Streaming technologies allow Internet users to receive data as a steady, continuous stream after a brief buffering period. Without streaming, users would have to download files completely before playback.

Telecine: The process of creating 30 fps video (television) from 24 fps film (cinema).

Tempo: Tempo is the rhythmic rate of a musical composition, usually specified in beats per minute (BPM).

µ-Law: µ-Law (mu-Law) is a companded compression algorithm for voice signals defined by ITU-T Recommendation G.711. The G.711 recommendation defines µ-Law as a method of encoding linear PCM signals (14-bit in the recommendation, often supplied as 16-bit in software) into a nonlinear 8-bit format. The algorithm is commonly used in North American and Japanese telecommunications, whereas A-Law is used in Europe. µ-Law is very similar to A-Law; however, each uses a slightly different coder and decoder.
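The companding idea can be illustrated with the continuous µ-law curve, using µ = 255. Note that this is only the textbook curve applied to normalized floating-point samples in the range -1.0 to 1.0; the bit-exact G.711 encoder uses a segmented approximation of it, which this sketch does not reproduce. The function names are invented for the example.

```python
import math

MU = 255  # value used for 8-bit mu-law

def mu_law_compress(x):
    """Map a linear sample in [-1, 1] onto the mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

quiet = 0.01
print(mu_law_compress(quiet))                 # quiet input is boosted to about 0.23
print(mu_law_expand(mu_law_compress(quiet)))  # back to roughly 0.01
```

Quiet sounds receive proportionally more of the 8-bit code range than loud ones, which suits speech, where most of the signal sits well below full scale.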

Waveform: A waveform is the visual representation of wave-like phenomena, such as sound or light. For example, when the amplitude of sound pressure is graphed over time, pressure variations usually form a smooth waveform. 

 

 

 

 

Audio File Formats:

A list of the most commonly used audio file formats / extensions, in alphabetical order:

(.32) – Yamaha DX-series SysEx dumps (FM-synthesis instruments.)

(.3GP/.3GPP) file format: The .3GA file extension is used by 3GPP multimedia format files. 3GPP stands for “3rd Generation Partnership Project” and is a multimedia container format developed for 3G wireless carrier services such as MMS, PSS or IMS. 3GA files identify only the audio stream saved in the 3GPP container. They are similar to 3GP files, which are used for capturing both audio and video content. Both formats are used for transmitting recordings over the Internet between 3G smartphones and are supported by devices from many manufacturers, such as Nokia or Samsung. Features: .3GP: AMR, mono, text info.

(.3G2) – 3GPP ‘project 2’ file format: Features: 3G2: AMR / MPEG-4 AAC, mono / stereo.

(.404) – Muon DS404 bank/patch files: Features: 16-bits mono, loop, name, articulation, sample start & end offsets, multiple filter types.

(.AA) – Audible Audio: is an audio file format developed by Audible.com, used for storing and distributing audio books sold on their website. The books on Audible.com can be downloaded in different versions and sound qualities, and the AA format is used for the lower quality audiobook versions, in contrast to AAX, which is the higher quality alternative to the AA format. In its essence, the audio data stored in AA files is encoded at low bit rates using either MP3 or AAC encoding. However, what makes AA files different from standard MP3 and AAC files is that they may also contain metadata (such as chapter markers, bookmarks etc.) and that they also utilize copy-protection through a proprietary digital rights management (DRM) technology. To further ensure that copying AA files is as hard as possible, Audible.com keeps the exact AA file format specification closed to the public, making it extremely hard for third-party developers to break the DRM encryption and invent tools that allow you to grab the necessary audio data and convert it to a popular open audio format like MP3 for example. Therefore playback of AA files is limited to a narrow set of Audible-ready devices such as the Amazon Kindle or the Apple iPod. Nevertheless, there are a few clever tricks, such as playing the AA file through Windows Media Player and capturing the stereo output of your sound interface with Audacity in order to grab the audio data and export it to a more convenient audio format that can be played on any portable media player.

(.AA3) – ATRAC3 Audio: is a proprietary compressed audio file format developed by the Sony Corporation and used extensively in most of their audio and video editing applications such as Sony Vegas, Sony Sound Forge and Sony ACID. The AA3 format is also a common choice among game developers for storing game sounds and is also used for storing audio data on CDs and MiniDiscs. Audio data stored in AA3 files is encoded with the Adaptive Transform Acoustic Coding (ATRAC) algorithm. The ATRAC algorithm was originally developed in 1992 by the engineers from Sony, working in close cooperation with engineers from the LSI Corporation, having the aim to create an algorithm that allows encoding at high speeds with the lowest possible amount of power consumption (one of the reasons why, for example, batteries on Sony Walkmans last longer when playing ATRAC files rather than MP3). This led to the successful implementation of the ATRAC1 algorithm. With the addition of numerous improvements to the codec, ATRAC3 saw its release in 1999, followed by ATRAC3plus in 2002. AA3 files are based on the ATRAC3 and ATRAC3plus algorithms. ATRAC3 is a complex sub-band coding algorithm, where the raw signal is split into four sub-bands through three stacked quadrature mirror filters (QMF). Modified discrete cosine transform (MDCT) is then applied to each sub-band, leading to a further compression of the signal. On the other hand, ATRAC3plus relies on 16-channel QMF filtering, followed by Generalized Harmonic Analysis (GHA) and a 128-point MDCT. Naturally, ATRAC3plus provides better audio quality compared to ATRAC3 and is therefore used on a broader set of devices, such as the Sony PlayStation 3 and Sony Hi-MD Walkmans. The major downside of the AA3 format is that it lacks decent cross-platform support.

(.AAC) – Advanced Audio Coding file format: is most commonly known as the default audio format for Apple’s iPhone and iPod devices, as well as being the format used by the iTunes store. It is also the standard format for Sony’s PlayStation 3. The .AAC format supports extensions for proprietary digital rights management (DRM), meaning you’ll need an AAC player that supports these extensions if you wish to play these files in anything other than iTunes. Compared to the MP3 format, the AAC audio format produces better quality audio at the same bit-rate.

(.AAX) – Audible Enhanced Audiobook format files: these may be played by the iTunes software program. AAX files contain enhancements over the standard AA audiobook format in that an AAX file may contain links and images, making it especially attractive for children’s books. AAX files are an archive format that includes a timeline and images and shows them to the listener at the appropriate time.

(.ABC) – Extension: is used by ABC music files, which use text to represent musical scores.

(.AC3) – Audio Codec 3: is an audio file format developed by Dolby Laboratories and used for storing surround sound audio for playback on modern Dolby Digital 5.1 home theater surround sound systems. For that purpose, AC3 files may contain up to six channels of compressed audio data, with five of them meant to carry information for normal range speakers and a sixth channel being reserved strictly for subwoofer operation. Audio data stored in AC3 files is encoded with the Dolby AC-3 algorithm, which is Dolby’s third generation audio encoding algorithm. It is a lossy compression algorithm based on the technique of perceptual coding, which produces audio with higher fidelity than the preceding Dolby Pro-Logic technology. AC3 audio is used on nearly all modern digital media like DVD and Blu-Ray. For proper playback, home theater systems need to be equipped with an amplifier compatible with the Dolby Digital technology. On desktop computers, proper AC3 support is largely dependent on the sound interface. Certain sound cards cannot interpret AC3 streams properly and output a PCM stream instead, while others come with native support for AC3 surround sound audio. Devices that do not come with built-in support for the AC3 codec can use a software processing application, like the AC3Filter, to provide proper playback of multimedia files containing AC3 audio.

(.ACD) – ACD extension: files with this extension are known as Acid Project files; however, other file types may also use this extension.

(.ACM) – InterPlay ACM Audio: is a special compressed audio file format extensively used for storing sound effects and human speech for most of the 2D role-playing game titles published by InterPlay Entertainment in the late 1990s and early 2000s. Some of these games include Fallout, Baldur’s Gate, Icewind Dale and Planescape Torment. While the encoding scheme used in ACM files doesn’t achieve as good a compression ratio as MP3, Vorbis or any other of the popular audio compression formats, ACM files require fewer system resources for decoding, which made them an optimal choice for the games from that period, which were pushing the computer hardware of the time to its limits. Although there is no official statement to confirm that, it is assumed that BioWare originally developed the ACM format, since all of the games using it were based on BioWare’s Infinity Game Engine. A typical ACM file consists of two main parts – a short header, containing important metadata, and an ACM-Stream, containing packed data blocks. The header starts with a special ACM-Signature defining the exact version of the ACM file, followed by information about the total number of samples and channels stored in the file and its bitrate. Although in theory these parameters may vary, in practice all ACM files are either 16-bit mono or 16-bit stereo at 22050 Hz. The ACM-Stream section consists of special data structures called BitBlocks that contain the encoded audio data. BitBlocks can have a variable length and can be unpacked into a variable number of samples, which, when combined together, reconstruct the encoded raw audio data. ACM files are generally not natively supported by media players. However, there are a number of applications, such as ACM2WAV and ACMPlayer for Windows and libacm for Linux, which allow playback and conversion of ACM files to more common formats, such as WAV.

(.ACT) – File extension: ACT files are associated with ‘Mikomi’ voice dictation machines (voice recorders). .ACT files store the recorded audio in the dictation machine’s internal memory, from which it can then be transferred to a PC via USB. One model known to use the .ACT format is the ‘Mikomi ET-880’.

(.ADG) – ADG extension: files with this extension are known as Ableton Device Group files; however, other file types may also use this extension.

(.ADTS) – Audio Data Transport Stream: is a container format for audio data encoded with the Advanced Audio Coding (AAC) encoding scheme, originally introduced with the MPEG-2 specification. The Advanced Audio Coding scheme is a lossy compression scheme, which was developed with the intention of replacing the MP3 encoding format as a standard. However, while AAC-encoded files were achieving better sound quality in smaller file sizes compared to MP3 files at similar bit rates, the MP3 format was supported by a wider range of applications and devices, leaving it in a dominant position. While AAC can be encountered as a stand-alone extension, AAC-encoded data stored in ADTS files is used particularly for real-time audio streaming in online radio applications and the like. In contrast to other popular streaming formats, an ADTS file consists of a series of frames, each containing header data followed by the compressed audio data. This file structure makes ADTS an ideal audio streaming format, as it makes it easier for software applications to read the stream at any given point of its broadcast. Since AAC encoding was also featured in the updated MPEG-4 specification, it is not unusual to also discover AAC-encoded audio data stored in MPEG-4 (MP4) container files.
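The frame-by-frame structure is what lets a player join a stream at any point: it looks for the header’s sync pattern and then hops from frame to frame using the length stored in each header. The sketch below shows that idea on raw bytes. The bit positions reflect the commonly documented 7-byte ADTS fixed header and should be checked against the MPEG specification before being relied upon; the function names and the sample bytes are made up for this example, and the optional CRC is ignored.

```python
# Sample rates selected by the 4-bit sampling frequency index.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]

def parse_adts_header(data, offset=0):
    """Read the main fields of the ADTS header starting at offset."""
    h = data[offset:offset + 7]
    # A frame starts with twelve 1-bits (the syncword).
    if len(h) < 7 or h[0] != 0xFF or (h[1] & 0xF0) != 0xF0:
        raise ValueError("no ADTS syncword at this offset")
    index = (h[2] >> 2) & 0x0F
    frame_length = ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5)
    if frame_length < 7:
        raise ValueError("implausible frame length")
    return {
        "sample_rate": SAMPLE_RATES[index] if index < len(SAMPLE_RATES) else None,
        "channels": ((h[2] & 0x01) << 2) | (h[3] >> 6),
        "frame_length": frame_length,  # header plus payload, in bytes
    }

def iter_frames(data):
    """Yield (offset, header fields) for each consecutive frame."""
    offset = 0
    while offset + 7 <= len(data):
        info = parse_adts_header(data, offset)
        yield offset, info
        offset += info["frame_length"]

# Two hand-made frames: a 7-byte header followed by 10 dummy payload bytes.
frame = bytes([0xFF, 0xF1, 0x50, 0x80, 0x02, 0x3F, 0xFC]) + bytes(10)
for position, fields in iter_frames(frame + frame):
    print(position, fields)
```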

(.AFC) – Mass Effect Audio File Container: is an audio container format developed by Canadian video game development company BioWare for their Mass Effect science fiction video game series based on the Unreal 3 Engine. Being a container format, an AFC file can contain one or multiple audio streams. The audio data stored in AFC files may represent either speech, background music and/or sound effects used in the game. In its essence, an AFC file is just a regular OGG file with audio data encoded as RIFF WAVE via the WaveWorks Interactive Sound Engine (Wwise). The only thing that makes AFC files different from regular OGG files is the file header, which is slightly different. AFC files usually have a sampling rate of about 32000Hz at a variable bit rate (VBR) ranging from 75 to about 105kbps. There are currently no desktop media players that provide AFC playback natively. However, there is a good number of independently developed AFC conversion tools such as afc2ogg or the Gibbed Mass Effect 3 Audio Extractor, which allow users to easily convert AFC files to OGG and play them back in any OGG-compatible desktop media player.

(.AHX) – AHX extension: files with this extension are known as WinAHX Tracker Module files; however, other file types may also use this extension.

(.AIF/.AIFC/.AIFF) – Audio Interchange File Format: The .AIF file format is used on computers and other electronic audio devices. The format is uncompressed and lossless (no reduction in quality) and because of this, is popular with professional audio/video applications – in particular those on the Apple Macintosh operating system. The format has now been superseded by the .AIFF format. The .AIF audio format produces files much larger than other formats, so is not suited for transferring across the internet. The compressed equivalent of this format uses the .AIFC file extension. The actual format was based on the Amiga’s IFF format, and was co-developed by Apple.

Features:

  • AIFF: 1..32-bits PCM, loop, key-range, mono / stereo, name.
  • AIFC: As AIFF plus IMA 4-bit ADPCM / G.721 ADPCM / G.723 ADPCM / G.726 ADPCM / DWVW ADPCM / GSM 06.10 / MACE3 / MACE6 / μ-law / A-law.

(.AKAI) – AKAI S-series floppies & image files: The AKAI S-900/950/1000/3000 all use floppy disk formats that can’t be read by Windows. However, you can use special third party utilities to create ‘floppy disk images’. 12-bits (S900/S950), 16-bits (S1000/S3000), mono, instrument, layers, loop, name, sample start & end offsets, low-pass filter.

(.AKP) – AKAI S5000/S6000 programs: 16-bits mono / stereo, articulation, instrument, and low-pass filter.

(.AL/.ALAW/.ALW) – Raw CCITT/ITU G.711 A-law (European telephony format) audio: There’s no standard file extension for this type of data and there’s no way to auto-detect the format, so you’ll have to use a .ALAW or .ALW file extension to make the program recognize what it is. Features: A-law, mono, 8 kHz.

(.AMR) – Adaptive Multi-Rate compression algorithm: is a compressed audio file format used for storing audio data encoded with the patented Adaptive Multi-Rate audio codec. The format was originally introduced and popularized by the Ericsson Corporation, but can now be found on every modern mobile device. The AMR codec is based primarily on the Algebraic Code-Excited Linear Prediction algorithm (ACELP), developed by the VoiceAge Corporation. In October 1998, AMR was adopted by the 3GPP organization as the standard audio codec for the 3G mobile system specification, making Adaptive Multi-Rate encoding dominant in VoIP and mobile communications. With that decision, an increasing number of applications were developed to natively work with and export AMR files, gradually setting the AMR file format as the standard for storing audio on mobile devices. However, AMR compression is only useful on human speech recordings, as the ACELP algorithm is designed specifically with that purpose in mind. Therefore, AMR-encoding more complex audio data, such as music for example, usually produces unacceptable results in terms of accuracy and sound quality. Modern smartphones provide alternative applications designed primarily for recording music, which produce better results. These applications don’t use AMR at all and let you store your recordings in a more convenient format like WAV instead. Speaking of AMR files, there is a common misconception about the type of codecs AMR files use. There are indeed two variations of the AMR codec: AMR-NB (Narrow Band) and AMR-WB (Wide Band), but AMR files only use the AMR-NB codec. The AMR-NB codec provides encoding at low bitrates, while its Wide Band alternative is used mostly for transmitting high-quality voice over the mobile network and/or for storing speech recordings in AWB files. Although widely popular on mobile devices, the AMR file format is still uncommon for desktop environments. Fortunately, AMR files can easily be converted to more convenient desktop audio formats such as WAV or OGG, using free applications like the MIKSOFT Mobile Media Converter. Features: MIME ‘AMR file storage format’: 4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, or 12.2 kbit/s compressed, mono, 8 kHz.

(.AMS) – Extreme Tracker modules: Features:  8/16-bits, mono, name, bank.

(.AMS/.AIS/.ASE) – Velvet Studio modules/instruments/samples: Features:  8 / 16-bits, mono, name, loop, instrument, collection.

(.AMZ) – Amazon MP3 Downloader Download: is a special file format developed by Amazon.com for use with their Amazon MP3 Downloader desktop application. AMZ files contain no actual audio data. Instead, they’re used to activate the Amazon MP3 Downloader and initiate MP3 or M4A downloads of tracks bought from the Amazon music store. AMZ files are generated dynamically by the Amazon music store and enable an easier bulk download of albums or playlists purchased by the user. The information stored inside an AMZ file is only valid for a limited amount of time, meaning that after a certain amount of time passes, the AMZ file expires and Amazon MP3 Downloader can no longer use it in order to retrieve the requested MP3 or M4A files. A major downside to AMZ files is that they lack good cross-platform support and Amazon is generally restricting Linux users from retrieving AMZ files.

(.AOB) – AOB extension: files with this extension are known as DVD-Audio Audio Object files; however, other file types may also use this extension.

(.APC) – Cryo APC Audio: is an uncommon audio file format invented by Cryo Interactive Entertainment and used extensively in games developed by the company. APC files can be either mono or stereo and provide support to different sample rates. A typical Cryo APC Audio file usually consists of a header followed by sound data compressed with the IMA ADPCM audio compression algorithm. There are no popular media players that support the APC file format natively. However, a free multi-platform software application like FFmpeg can be used to convert APC files to other, more traditional audio formats.

(.APE) – Monkey’s Audio losslessly compressed files: is a lossless compression audio format used for storage of audio data compressed with the Monkey’s Audio algorithm originally developed in 2000 by Matthew T. Ashland. Similar to other lossless audio compression formats, APE files retain the sound quality of the original recording and can be used as a source for its exact reconstruction. At the same time, APE files are much smaller in size compared to the original source, due to the advanced compression techniques that the algorithm is based on. Tests even show that the Monkey’s Audio algorithm achieves slightly better compression rates in comparison to other popular lossless compression formats like FLAC or WV. This format is well suited e.g. for compressing important archive data without losing any quality. Features: 8 / 16 / 24-bits, lossless compression.

(.APF) – File Extension: Used by Sony Ericsson phones to store acoustics settings.

(.APEX) – AVM Sample Studio bank files: 8 / 16-bits, loop, name, lfos, envelopes, layers, instruments, drum kit, collection.

(.ARL) – Aureal ‘Aspen’ bank files: 8 / 16-bits, loop, name, lfos, envelopes, layers, instruments, drum-kit, collection.

(.ASF) – Active Streaming Format files: 16 / 24 bit mono / stereo, 8, 11, 16, 22, 32, 44.1, 48, 88, 96 kHz, compresses to 5-160 kbit/s, text meta data.

(.AU) – Sun/NeXT/DEC audio files: AU (Sun Audio) is an audio container format originally developed by Sun Microsystems and introduced to the broader audience with their Sun Operating System (SunOS). AU files were a common encounter on SPARC and NeXT computer workstations, as well as on some of the early websites. Today, the AU file format is supported as standard on Unix-based platforms and in Java. Initially, AU files were really plain and simple, storing only a basic µ-law-encoded representation of the audio data inside them. However, later on, the AU specification was upgraded to support a wider array of audio data encodings, including various PCM, IEEE floating point and ADPCM representations of the data. A-law encoding support was also added. Nowadays, the AU file format is considered a bit outdated and therefore many modern digital audio players and desktop audio applications cannot properly recognize and play AU files, which is a major disadvantage. Fortunately, there are plenty of free applications available that allow users to easily convert AU files to more common file types like WAV for example. Sun Microsystems AU files should not be confused with Audacity AU files. Audacity is a popular free audio editing application that is also cross-platform. When you import audio data into your Audacity project, this data is automatically saved as AU files when you save your project. The AU files in this case represent Audacity Audio Blocks rather than Sun Audio files, and are therefore in no way compatible with the applications capable of playing standard AU files. Features: 8 / 16 / 24 / 32-bit PCM, 32 / 64-bit floating point, μ-law, A-law, G.721 (4-bit) ADPCM, G.723 (3 or 5-bit) ADPCM, mono / stereo.

(.AVR) – AVR extension: files with this extension are known as Audio Visual Research files; however, other file types may also use this extension.

(.AWB) – The .AWB file extension: belongs to an audio format known as ‘Adaptive Multi-Rate Wideband’. This format is supported by many mobile phone manufacturers and is the format used in creating Nokia True Tones.

(.BMW) – Buzz:  is the first ever “easy to use” free modular software based synthesizer. In contrast to .bmx files, .bmw-files do not store wave data in the file itself but rather link to the external sample data.

(.BN4) – Yamaha DX-series SysEx dumps: Yamaha DX21 / DX27 / DX100 voice SysEx dumps.

(.BNK) – Yamaha DX-series SysEx dumps: Yamaha DX11 / TX81z / DX21 / DX27 / DX100 voice SysEx dumps.

(.BNK) – Ad Lib banks: FM-synthesis instruments.

(.BWF) – Broadcast wave files: This is essentially the same as a WAV format file, but with restrictions on what data formats are allowed, and with some extra information strings stored in them. Normally they also use the .WAV file extension rather than .BWF. Any application that can read BWF files also ought to be able to read PCM data format WAV files. 8 / 16 / 24 / 32-bit PCM / MPEG audio layer II, mono / stereo, name, comment etc.

(.C01) – Audio Interchange File Format: The .C01 files are a variant used by the Typhoon OS for the Yamaha TX16W.

(.CAF) – Core Audio Files: an audio type that may be created and loaded through the use of the Core Audio API from Apple. These files may contain text annotations, audio channel data, audio markers, and more, as specified by the Core Audio API. CAF files are much like AIFF files, but are not limited to 4GB in size. Further, CAF files may contain any number of audio channels. Applications that load CAF files use the Core Audio API; they include QuickTime Player 7 or higher.

(.CDA) – Audio CD tracks: These files are virtual ‘placeholders’ that Windows shows in place of audio tracks on an audio CD. You can digitally read the audio track (a.k.a. ‘ripping’) by simply opening the .CDA file. 16-bits, stereo. .CDA files are shortcuts that reference an individual track on an audio CD. These .CDA files do not contain any audio data; they simply provide a means to access individual tracks on a CD via an icon. As these files do not contain any audio information, they cannot themselves be converted to another audio format. As the shortcut generally references a particular track, for example a favorite song, the original CD should be in the drive before clicking the shortcut.

(.CFA) – Conformed Audio files: are created with some of the applications included in the Adobe Creative Suite version 4 or later. Developed by Adobe Systems Inc., the suite is designed for video and audio editing, graphic design and web development. The applications that create CFA files are Adobe Premiere Pro, After Effects, Encore and Soundbooth. They are created by default each time an audio file is imported to a new project. As the name of the format suggests, CFA files are conformed to a proprietary format owned by Adobe. They are also used for improving the performance of the applications above when working with audio data and when creating previews for waveform files. The new format is easier to access and to parse, resulting in shorter processing times. Still, because of initial caching, users may encounter a short delay when importing audio data for the first time. Since CFA files contain uncompressed audio data, they can become very large. This can be avoided by disabling caching. However, this will result in longer processing times for audio previews. CFA files are automatically created by the Adobe Media Encoder, a standalone component of the Adobe Creative Suite that comes with all the applications mentioned above. All programs that use CFA files reference the same cache.

(.CDR) – Audio CD compatible raw data: This is raw CD-format audio, i.e. it does not refer to reading data from an actual audio CD. 16-bits, stereo.
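
Because raw CD-format audio carries no header, its playing time has to be derived from the byte count. The following is a minimal sketch of that arithmetic. It assumes the standard audio CD sample rate of 44.1 kHz, which the entry above does not state, and the function name is hypothetical.

```python
# Raw CD-format audio: 16-bit samples, 2 channels.
CD_SAMPLE_RATE = 44_100   # Hz; standard audio CD rate (assumed, not stated in the entry)
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16 bits


def raw_cd_duration_seconds(num_bytes: int) -> float:
    """Return the playing time of headerless CD-format audio of the given size."""
    bytes_per_second = CD_SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
    return num_bytes / bytes_per_second
```

Under these assumptions one second of audio occupies 176,400 bytes, so a one-minute track is roughly 10 MB.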

(.CMF) – Creative Labs FM music files: FM-synthesis instruments, song.

(.COD) – 3GPP ‘AMR interface format 2’: 4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, or 12.2 kbit/s compressed, mono, 8 kHz.

(.CPR) – Cubase Project: A file format developed by Steinberg Media Technologies and originally introduced in 2002 with the release of Cubase SX 1.0. Cubase Project files contain no actual audio data. Instead, they contain the metadata necessary for properly rendering a project inside Cubase. This metadata includes (but is not limited to) references to all audio files associated with the project, the number and type of tracks used, any automation settings or effects applied to those tracks, mixer settings, VST settings, and many other global preference settings. CPR files may also be used to create templates: empty Cubase projects that act as preset environments for specific purposes, such as multi-track recording, surround mixing or stereo mastering. Since CPR files contain no actual audio data, opening a CPR file properly on another machine requires all external audio and VST files to be transferred to that machine as well. In addition, to convert a CPR project to MP3 for playback on other devices, you need to open the file in Cubase and use the File > Export > Audio Mixdown menu.

(.CWP) – Cakewalk SONAR Project: Files with the .CWP extension are Cakewalk SONAR project files, although other file types may also use this extension.

(.DAC) – DAC Sound file: stored in TI/MIT DAC format (byte reversed ADC file). 

(.DCF) – DRM Content Format: A media container format used for storing data protected with a proprietary Digital Rights Management (DRM) technology for copy protection. DCF files are mostly used on cell phones (most notably those manufactured by Nokia, Motorola and Sony Ericsson) for distributing paid ringtones, videos or wallpapers. Content stored in DCF files is locked and becomes accessible only after DRM activation. During activation, the cell phone connects to a remote DRM server, which authenticates the phone, checks the rights object and, if everything is valid, unlocks the content stored in the file. Since DCF files are most commonly used for ringtones, they are often categorized as audio files, although they can actually contain a wide array of media types.

(.DCM) – DCM modules: .DCM files are music module files that are DCM encoded. 8 / 16-bits, mono, loop, collection.

(.DCT) – Dictation Audio: Files with the .DCT extension are dictation audio files.

(.DEWF) – SoundCap / SoundEdit instruments: 8-bit, mono, loop.

(.DIG) – Digilink format: 16-bits, stereo.

(.DIG) – Sound Designer I files: Files with the DIG extension are associated with Sound Designer. The program was developed in 1985 by Digidesign to serve as recording and editing software for Macintosh Systems. The DIG file format was the main format used by the Sound Designer Application. DIG files are monophonic and of low sampling size and bitrate. Because of the low quality sound, the DIG format is not used anymore. 8, 16-bits, mono, name, loop.

(.DLS) – Downloadable Sounds level 1 & 2 / Mobile DLS:

  • Level 1 & up: 8 / 16-bit PCM, mono, loop, envelopes, lfo, collection, drum-kits, instruments.
  • Level 2 & up: layers, per region articulation, two lfo’s, low-pass filter.
  • Level 2+ & up: μ-Law / A-Law / 24 / 32-bit PCM / 32-bit floating point / DVI-IMA 4-bit ADPCM / MPEG layer III compression.
  • Level 2++ & up: stereo, fractional loop points, reverse & bidirectional loops, layer level articulation, LFO ramp time, LFO shapes, filter type, etc.

(.DMF) – Delusion/XTracker digital music format: 8 / 16-bits, mono, loop, name, instruments, collection.

(.DSF) – Delusion/XTracker sample format: 8 / 16-bits, mono, loop, name, instruments, collection.

(.DR8) – FXpansion DR-008 drumkits: 32-bit floats, mono/stereo, name, low-pass filter.

(.DSM) – Digital Sound Module format: 8-bits, mono, name, loop, collection.

(.DSS) – Digital Speech Standard: A high-compression digital audio file format developed in 1994 by Grundig Business Systems at the University of Nuremberg, and officially released and specified in 1997 by the International Voice Association, a cooperative venture formed by Grundig Business Systems, Philips Electronics and the Olympus Corporation. DSS files are designed specifically for compressing human speech and are used extensively in digital voice recorders and dictation devices produced by the above-mentioned companies. The major advantage of DSS files is their reduced file size: compared to a WAV file of the same length, for example, a DSS file is only about 12% of the size. DSS is a proprietary format, meaning its specification is closed to third-party developers outside the International Voice Association. DSS files may contain metadata specifying how long the recording is or exactly when it was made. DSS files can be played on a PC or Mac with the freely distributed Olympus DSS Player Lite, and a number of free conversion tools can convert DSS files to more common audio formats such as MP3 or WAV; one of the most popular is probably the Switch Sound File Converter Plus, available on CNET. Some software requires the Olympus DSS Player software to be installed in order to read these files (DSS Player 3.x, 2000, or Pro versions supported). 16-bits, mono, Meta info.

(.DTM) – DigiTrekker modules: 8 / 16-bits, mono, loop, name, collection.

(.DTS) – Digital surround audio data: A .DTS file contains ‘digital surround audio data’ or a ‘soundtrack’. DVD movies often use .DTS files to store multi-channel sound data, which is used to best advantage with 5.1-channel audio setups. This essentially recreates the ‘surround sound’ format popular in cinemas and theatres. Most DVDs come with a selection of audio formats to increase the choices available to the user and to fit in with technological constraints: not all televisions and DVD players support the DTS format, so .DTS soundtracks are user selectable rather than standard. Some audio releases (CDs, etc.) also use the DTS format to enhance the listening experience.

(.DTSHD) is a file extension corresponding to a DTS HD lossy or lossless audio stream. The DTSHD audio format is generally used with a corresponding video stream in order to author Blu-Ray and High Definition (HD) DVDs. Further, a DTSHD lossy audio stream may be restored to its original lossless stream using the DTSHD Master Suite application for manipulating this format from DTS, Inc.

(.DVF) – Files with the extension .DVF are audio files recorded using certain models of Sony dictation / voice recorders. DVF files may require the use of the Sony plug-in ‘Sony ICD 1.2 Download’ in order to play on PC.

(.DWD) – DiamondWare digitized files: 8 / 16-bits, mono / stereo.

(.DX7) – Yamaha DX7 voice SysEx dumps / Raw Yamaha DX7 32-voice data.

(.EDA) – Ensoniq ASR disk images. 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EDE) – Ensoniq EPS disk images. 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EDK) – Ensoniq KT disk images. 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EDQ) – Ensoniq SQ1/SQ2/KS32 disk images. 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EDV) – Ensoniq VFX-SD disk images: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EFA) – Ensoniq ASR instrument files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EFE) – Ensoniq EPS instrument files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EFK) – Ensoniq KT instrument files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EFQ) – Ensoniq SQ1/SQ2/KS32 instrument files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EFS) – Ensoniq SQ80 instrument files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EFV) – Ensoniq VFX-SD instrument files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.EMB) – Everest embedded bank files: Compression, loop, name, lfos, envelopes, layers, instruments, drum-kit, and collection.

(.EMD) – ABT extended modules: 16-bits, mono, loop, name, collection.

(.EMX) – eMusic Download Manager download file: These files represent downloads associated with the eMusic Download Manager. The eMusic Download Manager is a desktop application developed for the eMusic.com online store that downloads music tracks or audio books in MP3 format to a client machine. eMusic users pay a monthly subscription fee and are allowed to download a fixed number of tracks every month. The store is preferred mostly by people interested in downloading entire albums. EMX files contain the metadata needed to point the eMusic Download Manager to the exact location of the MP3 files associated with the album queued for download. They contain no actual audio data of their own, only a reference to where the audio data can be located.

(.ENC) – Encore Musical Notation: Besides Encore notation files, the .ENC extension is used for files that have been encoded in the UUE encoding format and then saved with the generic “.enc” file extension. Such files are often encoded by a specific encoding program; examples of programs that produce encoded .ENC files are IBM Lotus 1-2-3 and Adobe Flash. The UUE encoding and generic naming convention help keep the file from being opened or viewed in plain form by unauthorized users.

(.ESPS) – ESPS audio files: 8 / 16 / 32-bit integer / 32 / 64-bit floating point, mono.

(.EXS) – Logic EXS24 instruments: 8/16/24/32-bits, names, loops, layers, instrument, and trigger type.

(.EUI) – Ensoniq EPS family compacted disk images: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.FAR/.FSM/.F2R/.F3R) – Farandoyle tracker formats: 8 / 16-bits, mono, loop, name, collection.

(.F32/.F64) – Floating point raw 32/64-bit IEEE data: 32 / 64-bits floats, mono.
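
Since these files are headerless, a reader only needs to know the sample width and byte order to decode them. The sketch below shows one way to do this with Python's standard `struct` module. The function name is hypothetical, and little-endian byte order is an assumption, since the entry above does not specify one.

```python
import struct


def read_raw_floats(data: bytes, bits: int = 32, little_endian: bool = True) -> list[float]:
    """Decode headerless IEEE floating-point samples (mono) from raw bytes."""
    code = {32: "f", 64: "d"}[bits]          # struct codes for 32/64-bit IEEE floats
    size = bits // 8                         # bytes per sample
    count = len(data) // size                # ignore any trailing partial sample
    prefix = "<" if little_endian else ">"   # byte order is an assumption, see above
    return list(struct.unpack(f"{prefix}{count}{code}", data[:count * size]))
```

For a file on disk, the bytes can be obtained with `open(path, "rb").read()` and passed to the function with `bits=32` for .F32 or `bits=64` for .F64.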

(.FFF + .DAT) – Gravis UltraSound PnP bank files: The .FFF file contains all the instrument parameters, while the accompanying .DAT file contains the actual sample data that is downloaded to the synth’s memory. 8 / 16 bit / μ-law, mono, loop, lfo, amplitude envelope, layers, instruments, drum kits, collection.

(.FLAC) – Free lossless audio codec files: Files with the FLAC (Free Lossless Audio Codec) extension are created using a compression algorithm developed by the Xiph.org non-profit organization. FLAC is the most widespread lossless audio codec; it has an open source reference implementation and is both non-proprietary and unpatented. The FLAC file format enables efficient audio compression without any loss of sound quality. FLAC files are typically up to 50 – 60% smaller than the original audio files, depending on the material. The process can also be reversed, resulting in an identical copy of the uncompressed audio file. FLAC files offer support for album cover art, metadata tagging and fast seeking. This format is well suited, for example, to compressing important archive data without losing any quality. 8/16/24 bits, lossless compression, mono / stereo, meta info.

(.FLP) – Fruity Loops Project Files: The proprietary file format used by ‘Fruity Loops Studio’, often known simply as ‘Fruity Loops’. Fruity Loops is described as a ‘digital audio workstation’: essentially a music creation tool based around a pattern-based sequencer, featuring editing, mixing and recording utilities and MIDI support. .FLP files can be converted to .MP3 or .WAV format within Fruity Loops. Note that .FLP files may not work properly when loaded in a version of Fruity Loops different from the one in which they were created.

(.FNK) – FunkTracker modules: 8-bit PCM, mono, loop, name, panning, collection.

(.FOR) – Kurzweil Forte files: 16-bits, mono, loop, names, layers, instruments, collection, MIDI song, trigger type.

(.FSB) – FMOD SoundSystem sound banks: 8…32-bits, mono/stereo, loop, name.  

(.FST) – Studio State File: Files with the extension .FST are studio state files created and used by ‘Fruity Loops Studio’. .FST files are used to store current channel presets such as generator and effects presets.

(.FZB/.FZF/.FZV) – Casio FZ-1 formats: 16-bits, mono, loop, name, instruments, sample start & end offsets.

(.G721) – G.721 4-bit (32 kbps) ADPCM format data: ADPCM 4-bit lossy compression, mono.

(.G723) – G.723 3/5-bit ADPCM format data: ADPCM 3 / 5-bit lossy compression, mono.

(.G723-3) – G.723 3-bit (24 kbps) ADPCM format data: ADPCM 3-bit lossy compression, mono.

(.G723-5) – G.723 5-bit ADPCM format data: ADPCM 5-bit lossy compression, mono.

(.G726) – G.726 2/3/4/5-bit ADPCM format data: ADPCM 2 / 3 / 4 / 5-bit lossy compression, mono.

(.G726-2) – G.726 2-bit (16 kbps) ADPCM format data: ADPCM 2-bit lossy compression, mono.

(.G726-3) – G.726 3-bit (24 kbps) ADPCM format data: ADPCM 3-bit lossy compression, mono.

(.G726-4) – G.726 4-bit (32 kbps) ADPCM format data: ADPCM 4-bit lossy compression, mono.

(.G726-5) – G.726 5-bit (40 kbps) ADPCM format data: ADPCM 5-bit lossy compression, mono.
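
The bit rates quoted for the G.721, G.723 and G.726 entries follow directly from the number of bits per sample. As a small worked example, and assuming the 8 kHz telephony sample rate that those figures imply (the function name is hypothetical):

```python
SAMPLE_RATE_HZ = 8000  # telephony sample rate implied by the quoted bit rates (assumed)


def adpcm_bit_rate_kbps(bits_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Bit rate of a mono ADPCM stream in kbit/s."""
    return bits_per_sample * sample_rate_hz / 1000


# One entry per ADPCM word size used by G.726.
rates = {bits: adpcm_bit_rate_kbps(bits) for bits in (2, 3, 4, 5)}
```

Here `rates` maps 2, 3, 4 and 5 bits to 16, 24, 32 and 40 kbit/s respectively, matching the figures in the entries above.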

(.GDM) – Bells, Whistles, Sound Boards modules: 8-bit, mono, loop, name, collection.

(.GIG/.GI) – GigaStudio / GigaSampler files: 8 / 16 / 24-bits, mono, loop, envelopes, lfo’s, collection, instruments, layers, lossless PCM compression (of 16 / 24-bits PCM formats), sample start offsets, multiple filter types.

(.GKH) – Ensoniq EPS family disk image files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets.

(.GPX) – Guitar Pro 6 Document file format: GPX (Guitar Pro 6 Document) is a container file format developed by Arobas Music for storing musical scores and guitar or bass tablatures created with its multitrack editor Guitar Pro. A GPX file strictly contains musical notation. Unlike older Guitar Pro formats such as GP3, GP4 and GP5, data stored in GPX files is encoded using a proprietary dynamic dictionary compression scheme. Once uncompressed, the data represents a “little hard-disk” (an independent file system) consisting of various XML files that hold the data used for rendering the file inside the Guitar Pro editor, such as information about the bars, tracks, voices, beats and notes that form the score. On its own a GPX file contains no real audio data, only metadata used for generating audio with the Guitar Pro editor. A major downside of GPX files is that they cannot be natively exported to one of the older Guitar Pro formats, making it impossible to open Guitar Pro 6 documents without buying the latest copy of Guitar Pro. GPX files can still be exported to MIDI and thus imported into an older version of the editor; however, that usually renders inaccurate tablatures, making it a less-than-ideal solution.

(.GSM) – GSM 06.10 audio streams: There are at least two possible types of ‘packing’: one with packets of 32.5 bytes/frame and one with 33 bytes/frame (where the first 4 bits in a packet are ignored). Which packing style is used is detected automatically. GSM compression, mono, 8000 Hz.
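
Because every GSM 06.10 frame encodes a fixed block of 160 samples (20 ms at 8000 Hz), the playing time of a headerless stream can be estimated from its size. The sketch below illustrates this for either packing style; the function name is hypothetical and the stream is assumed to contain nothing but frames.

```python
SAMPLES_PER_FRAME = 160   # each GSM 06.10 frame covers 160 samples
SAMPLE_RATE_HZ = 8000     # as stated in the entry above


def gsm_duration_seconds(num_bytes: int, bytes_per_frame: float = 33) -> float:
    """Estimate playing time of a raw GSM 06.10 stream (frames only, no header)."""
    frames = int(num_bytes // bytes_per_frame)   # whole frames only
    return frames * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
```

With the 33-byte packing, 50 frames (1650 bytes) make up one second, which corresponds to 13.2 kbit/s. Pass `bytes_per_frame=32.5` for the tighter packing.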

(.GSM) – US Robotics voice modems GSM files: There are two types of these files. One has a file header and is used by e.g. the QuickLink software. The other has no header and is used by e.g. the VoiceGuide and RapidComm software. Which type a file is can usually be detected, but when saving files to be used by some voice modem software, you may have to try both and see which one works with that program. GSM compression, mono.

(.HCOM) – Sound Tools HCOM format: Huffman compression, mono.

(.IBK) – Creative Labs FM banks: FM-synthesis instruments, song.

(.IFF) – Interchange file format 8SVX and 16SV data types: A container file format developed by Electronic Arts and Commodore in 1985. As this format is simply a wrapper, the actual data inside can be anything from audio (most common) to images and video. The format itself is made up of sections called ‘chunks’, which are identified by four-letter IDs. The three types of chunks are:

  • FORM – defines the format of the file
  • LIST – includes the properties of the file
  • CAT – includes the remaining data

The 16SV flavor is rather uncommon and is perhaps only used by some PC trackers. Thus if you save .IFF files with ’16-bit PCM’ data format, then do not expect them to be readable by most .IFF readers. 8 / 16-bits, loop, name, mono / stereo.
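
To make the chunk layout concrete, here is a minimal sketch that walks the chunks inside a FORM container. It relies on the standard IFF layout (a four-letter ID followed by a 32-bit big-endian length, with chunk data padded to an even size); the function name is hypothetical, and LIST and CAT containers are not handled.

```python
import struct


def list_iff_chunks(data: bytes) -> tuple[str, list[tuple[str, int]]]:
    """Return the FORM type (e.g. '8SVX') and the (ID, size) of each chunk inside it."""
    group_id, group_size, form_type = struct.unpack(">4sI4s", data[:12])
    if group_id != b"FORM":
        raise ValueError("not an IFF FORM file")
    chunks = []
    pos = 12                                  # first chunk follows the FORM type
    end = min(8 + group_size, len(data))      # group size excludes the 8-byte header
    while pos + 8 <= end:
        chunk_id, size = struct.unpack(">4sI", data[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii"), size))
        pos += 8 + size + (size & 1)          # chunk data is padded to an even length
    return form_type.decode("ascii"), chunks
```

For a typical 8SVX sample this yields the form type `8SVX` and chunks such as `VHDR` (the voice header) and `BODY` (the sample data).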

(.IMW) – IncrediMail MIDI/WAV: contains a collection of data used by IncrediMail, an email client that produces multimedia e-mail.

(.IMY) – iMelody mobile ringtone format: Files with the extension .IMY are ringtone files supported by various mobile phone manufacturers, including Sony Ericsson, Motorola, Siemens and Alcatel. The .IMY (iMelody) format is a monophonic (non-polyphonic) ringtone format that supports volume modification and can control the phone’s backlight, LED lights and vibration. The .IMY file format is based on Sony Ericsson’s proprietary ‘eMelody’ format. .IMY ringtone files can be transferred to a mobile phone via text message, text file, Bluetooth, data cable, MMS or e-mail. The iMelody format is not the same as the i-Melody format used by Japanese i-Mode phones.

(.INI) – Gravis UltraSnd.ini bank setup: Same as for .PAT + collection + drumkits.

(.INI) – IBM MWave DSP synthesizer bank setup (MWSYNTH.INI): 8 / 16-bits, loop, name, instruments, layers, collection.

(.INRS) – INRS-Telecommunications audio files: 16-bits mono.

(.INS) – Ensoniq instrument files: 16-bits, mono, loop, names, fine-tune, layers, instruments, collection, sample start offsets, trigger type, pedal switch.

(.INS) – Ad Lib instruments: FM-synthesis instruments.

(.INS) – Sample Cell II PC/Mac instruments: There are two different (incompatible but very similar) varieties of these files, a PC format and a Mac format. Each file contains one instrument definition and references a number of external waveform files. For the PC format, the external files are .WAV files. For the Mac format (usually .INS) they can be either .AIF(F) or .SD2(F) files. When reading an .INS file you should put all these external files into the same directory as the .INS file. It is very important that the external files retain the filenames they were originally written with, so when transferring Mac files to a PC, take extra care to retain the ‘long’ file names. Moreover, any referenced .SD2 file must be transferred as a ‘flattened’ MacBinary format file. 8 / 16-bits mono, instrument, sample start offsets.

(.INS) – Cakewalk instrument definition files: Names and midi program assignments for instruments.

(.IST) – Digitrakker instrument files: 8, 16-bits, mono, loop, name, huffman packing, collection.

(.IT) – Impulse Tracker modules: 8 / 16-bits, mono / stereo, loop, name, instrument, collection.

(.ITI) – Impulse Tracker instrument: 8 / 16-bits, mono / stereo, loop, name, instrument, collection.

(.ITS) – Impulse Tracker sample: 8 / 16-bits, mono / stereo, loop, name, instrument, collection.

(.IVC) – InterVoice sound file format: Files with the IVC extension are associated with the InterVoice audio format developed for recording human speech. IVC files are based on Adaptive Differential Pulse Code Modulation (ADPCM), which lowers the bandwidth required for a given signal-to-noise ratio. The IVC file format is mostly used by computer telephony software. It is also used for compressing voice recordings that were originally created in a lossless file format. IVC files support a maximum sample size of 8 bits and monophonic audio only. However, they can be compressed using various schemes. Features: G.711 µ-Law and A-Law, G.721 ADPCM (32 kb/s) and G.723 ADPCM (24 kb/s) data formats.

(.K25) – Kurzweil K2500 files + CD-ROM’s: 16-bits, mono, loop, names, layers, instruments, collection, MIDI song, multiple filter types, trigger type.

(.K26) – Kurzweil K2600 files + CD-ROM’s: 16-bits, mono, loop, names, layers, instruments, collection, MIDI song, multiple filter types, trigger type.

(.KAR) – Karaoke MIDI File: KAR files are MIDI files that contain song lyrics, combining MIDI data with the text used for karaoke. Within the KAR file, the lyrics are synchronized to display in time with the music. Traditional MIDI files contain music and no lyrics. Like standard MIDI files, KAR files do not contain actual audio data.

(.KFN) – Audio File: Files with the extension .KFN are audio files associated with ‘Karafun’, a karaoke system for the PC. KFN files are downloaded for the karaoke system via the Karafun website.

(.KAWAI12) – KAWAI R50/R50E/R50III/R100 ROM-dump: 12-bit nonlinear audio (DAT LP), mono.

(.KDT) – Konami KDT1 songs: MIDI song data.

(.KFT) – Korg T-series waves: 16-bits mono.

(.KIT) – Native Instruments Battery v1 drum-kit files: 16-bits mono/stereo, loop, envelopes, names, instruments, collection.

(.KMP) – Korg Kronos / M3 / M50 / Triton / Triton LE / Trinity keymap file: A .KMP file contains a ‘multisample’ for the Korg Kronos, M3, M50, Triton, Triton LE or Trinity – corresponding to a ‘layer’ in this program. It references one or more KSF files containing individual waveforms. Note that each synth can load KMP files made for its predecessor(s), but there are some minor differences between the files made for the respective synths. 8 / 16-bits, mono, loop, names.

(.KOE) – AquesTone Vocal Synthesiser Syllable File: A syllable file used by the synthesizer software ‘AquesTone Vocal Synthesiser’. These .KOE files store syllables that the synthesizer then ‘sings’ based on user-assigned keys. AquesTone Vocal Synthesiser can also output singing voices by reading MIDI data and text lyrics.

(.KR1) – Kurzweil K2000/K2500/K2600 split file: 16-bits, mono, loop, names, layers, instruments, collection, MIDI song, multiple filter types, trigger type.

(.KRz) – Kurzweil K2000 file (also K2500 & K2600): 16-bits, mono, loop, names, layers, instruments, collection, MIDI song, multiple filter types, trigger type.

(.KSC) – Korg Kronos / M3 / M50 / Triton / Triton LE / Trinity script file: A .KSC file contains a list of KMP files (multisamples), normally to be used by your own programs, and/or KSF files (samples), normally to be used as drums (but note that these contain no key assignments, so they will not show up as a drum kit). This list is used to load them all into the synth in one batch. When an instrument or a collection is saved to a KSC file, melodic instruments are written as KMP files and drum kits as KSF files, all listed in the KSC file. KMP and KSF files are stored in a sub-directory with the same name as the .KSC file. Note that each synth can load KSC files made for its predecessor(s), but there are some minor differences between the files made for the respective synths. The Kronos uses version 2 of the KSC format, which probably will not work with the older synths. It includes a “uuid”, a unique number identifying the collection of files. 8 / 16-bits, mono, loop, names, instruments, collection.

(.KSF) – Korg Kronos / M3 / M50 / Triton / Triton LE / Trinity sample file: KSF files contain waveforms (samples) for the Korg Kronos / M3 / M50 / Triton / Triton LE / Trinity. 8 / 16-bits, mono, loop.

(.LQT) – Liquid Audio Music File: LQT files are audio/music files created for use in the audio playback application ‘LiquidAudio’. These audio files are DRM (digital rights management) protected.

(.LVP) – Avaya Voice Player – Compressed Voice Audio File: Avaya’s Voice Player, a software application for recording and playing back voice audio, uses the .LVP format. Avaya’s Voice Player can be used as an Internet browser helper application or can be integrated with your e-mail software to turn it into a voice mail system. Audio files with the .LVP extension are compressed, meaning they are smaller and easier to send over the internet.

(.M3U) – Media Playlist File: A playlist file saved in plain text format containing the names (and sometimes the physical locations) of one or more media files, generally MP3 music files. A compatible media player plays through the media files in the order specified in the .M3U playlist file. The .M3U file contains only text and no media data. The format was originally developed for WinAmp but is now supported by many other media players. Extended M3U files begin with a single header line (#EXTM3U) followed by an entry for each media file.
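
Since a playlist is just text, reading one takes only a few lines. The following sketch extracts the media entries in playback order; the function name is hypothetical, and lines beginning with `#` (such as the `#EXTM3U` header and `#EXTINF` directives of extended playlists) are simply skipped.

```python
def parse_m3u(text: str) -> list[str]:
    """Return the media file entries of an M3U playlist, in playback order."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                # skip blank lines, the header and directives
        entries.append(line)        # a file name, path or URL
    return entries
```

Relative entries are normally resolved against the directory containing the playlist file, which this sketch leaves to the caller.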

(.M4A) – iTunes MPEG-4 audio format: MPEG-4 AAC / Apple Lossless (ALAC, read-only), mono / stereo, text info.

(.M4R) – iPhone ring-tone: MPEG-4 AAC / Apple Lossless (ALAC, read-only), mono / stereo, text info.

(.MAP) – Native Instruments Reaktor wavetable files: (8 / 16 / 32-bit PCM) / 32-bit float, mono / stereo, instrument (single layer).

(.MAT) – Matlab variables binary files: 8 / 16-bits / floats, mono / stereo, collection.

(.MAUD) – MAUD sample format: 8 / 16-bits / μ-Law / A-Law, mono / stereo, name.

(.MDL) – Digitrakker module files: 8, 16-bits, mono, loop, name, huffman packing, collection.

(.MED) – OctaMED modules: 8-bits, loop, name, collection.

(.MID) – Standard MIDI files: There are three types of standard MIDI file, of which ‘format 0’ and ‘format 1’ are the ones normally supported. Which one is used when you save depends on the format of the source data:

  • Format 0 has all data for all channels stored in a single track. This is the most portable format.
  • Format 1 has the data for each channel separated into its own track. There is also an extra track (the first one) that contains ‘control information’ (e.g. tempo changes). This is the format most often used by sequencers and is the format that we recommend.
  • Format 2 is used to store temporally disjoint ‘sequences’ in different tracks. This format is seldom used and may not be handled correctly by every program.
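
The format number is stored in the file's header chunk, so it can be checked without parsing any track data. The sketch below reads the 14-byte `MThd` header defined by the Standard MIDI File specification (all fields big-endian); the function name and the keys of the returned dictionary are hypothetical.

```python
import struct


def read_midi_header(data: bytes) -> dict:
    """Read the MThd header chunk: format (0, 1 or 2), track count and time division."""
    chunk_id, length, fmt, num_tracks, division = struct.unpack(">4sIHHH", data[:14])
    if chunk_id != b"MThd" or length < 6:
        raise ValueError("not a standard MIDI file")
    return {"format": fmt, "tracks": num_tracks, "division": division}
```

A format 0 file always reports a single track, while a format 1 file reports the control track plus one track per part.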

(.MID) – Roland D-50 patch SysEx dumps: LA-synthesis instruments.

(.MID) – Roland MT-32 (and compatibles) timbre SysEx dumps: LA-synthesis instruments, 16-bit PCM, loop.

(.MID) – Yamaha DX7 voice SysEx dump: FM-synthesis instruments.

(.MID) – Yamaha DX7s / DX7II / DX200 voice SysEx dump: FM-synthesis instruments.

(.MID) – Yamaha DX11 / TX81z voice SysEx dump: FM-synthesis instruments.

(.MKA) – Matroska audio file: MKA (Matroska Audio) is an audio container format and as such can contain an unlimited number of audio streams, encoded with virtually any existing audio codec, such as MP3 or Vorbis. The Matroska format is being developed with the intention of becoming the ultimate standard in multimedia container formats and is currently well supported on all major platforms. One of the major advantages of the MKA file format is that it supports chapter markers and subtitle tracks, allowing you to store an entire music album, separated track by track, in a single file, along with all its lyrics. Matroska is also free of charge to use and requires no commercial license, making it an excellent choice for developers. 8/16/24/32-bits PCM / 32/64-bits float / MPEG audio layer I / II / III / MPEG AAC / Vorbis Ogg / AC3 / DTS, mono / stereo / multi-channel, text Metadata.

(.MKV) – Matroska video file: 8/16/24/32-bits PCM / 32/64-bits float / MPEG audio layer I / II / III / MPEG AAC / Vorbis Ogg / AC3 / DTS, mono / stereo / multi-channel, text Meta data.

(.MLD) – MFi/MFi2 songs a.k.a. i-Melody: Song, ADPCM 4-bit compressed recordings (v2+ only).

(.MLS) – Miles Sound System compressed DLS file: MLS files are basically ‘DLS level 1’ files with the addition of optional DVI/IMA 4:1 ADPCM compression of the waveform data. 8 / 16-bits PCM / 4-bit ADPCM, mono, loop, envelopes, lfo’s, collection, drumkits, instruments, layers, MIDI song.

(.MMF) – Mobile Music File: The default file extension for files belonging to the SMAF specification. SMAF stands for Synthetic-music Mobile Application Format. The SMAF format was developed by Yamaha and introduced with the production of their MA-1, MA-2, MA-3, MA-5 and MA-7 LSI sound chips (respectively capable of 4-, 16-, 40-, 64- and 128-voice polyphony), which all found broad use in mobile phones sold on the East Asian market around the early 2000s. Internationally, the MMF format was most endorsed by Samsung, being used as the default polyphonic ringtone format on a wide range of their mobile devices. With the increasing popularity of modern smartphones, the MMF format became less relevant to the mobile industry. However, certain lower-end phones not capable of WAV or MP3 playback still use, and may continue to use, the MMF format to store their ringtones. In structure, MMF/SMAF files closely resemble MIDI files. What makes them different from MIDI is that MMF files may also encapsulate descriptive metadata (such as artist, song title and genre), graphics and/or raw PCM audio data. PCM audio data may be present because, besides FM synthesis, playback may also be produced via PCM wavetable synthesis (similar to module files such as MOD, XM, IT or S3M), in which case PCM audio tracks serve as the sound source and define the timbre of the instruments. The PCM method allowed ringtones to sound the same regardless of the sound chip providing playback. To popularize the format, Yamaha published a number of tools on the official Yamaha SMAF website, which allowed users to play MMF files on their desktop computers and convert them to and from widely used formats such as WAV and MID.
Compared to MIDI, regular MMF/SMAF files (using only FM synthesis to produce playback) are only two-thirds of the size, which still makes them a good choice for sharing and distributing this type of audio content. In addition to song data, these files can contain instrument parameters of three types: FM (using FM synthesis), PCM (using wavetable synthesis) and Stream PCM (longer audio clips). Songs of type ‘0’ (used by the MA-1 and MA-2 chip sets) and of type ‘2’ (used by the MA-3 and MA-5 chip sets), ADPCM compressed ‘streams’, PCM instruments, FM-synthesis instruments.

(.MON) – Wavelab Audio Montage File: Used by the digital audio editor, Wavelab to store audio montages.

(.MOD) – Module files: 8-bits, mono, loop, name, fine-tune, collection, MIDI song.

(.MOV) – Apple QuickTime movie format: 8 / 16 / 24 / 32 bit PCM / 32 / 64 bit floats / μ-law / A-law / Apple MACE3 / MACE6 / IMA 4-bit ADPCM / MS 4-bit ADPCM / MPEG audio layer III / GSM 06.10 / MPEG-4 AAC / AMR / Apple Lossless (ALAC) / AC3 / DTS, mono / stereo / multi-channel, text info.

(.MP1) – MPEG audio stream, layer I: 16-bit mono / stereo, 48, 44.1, 32, 24, 22.05, 16, 12, 11.025, 8 kHz, compresses from 8 to 448 kbit/s.

(.MP2) – MPEG audio stream, layer II: 16-bit mono / stereo, 48, 44.1, 32, 24, 22.05, 16, 12, 11.025, 8 kHz, compresses from 8 to 448 kbit/s.

(.MP3) – MPEG audio stream, layer III: 16-bit mono / stereo, 48, 44.1, 32, 24, 22.05, 16, 12, 11.025, 8 kHz, compresses from 8 to 448 kbit/s.

(.MP4) – MPEG-4 base media file format: AMR / MPEG-4 AAC, mono / stereo.

(.MPA) MPEG audio stream, layer I, II, ‘II½’ or III:16 bit mono / stereo, 48, 44.1, 32, 24, 22.05, 16, 12, 11.025, 8 kHz, compresses from 8 to 448 kbit/s.

(.MPC) – Musepack audio compression: Musepack is an open-source lossy audio compression format. It is optimized for “transparent” audio coding at ~180 kbit/s and higher. It is loosely based on the MPEG-1 layer II (MP2) algorithms but adds several improvements, while avoiding some of the compression artifacts that are present with MPEG layer III (.MP3) even at high bit-rates. Platforms that support playback of MPC files include Pocket PC, Palm OS, Symbian OS, Windows CE and Windows Mobile. 32-bit floating-point data, mono / stereo, 32 / 37.8 / 44.1 / 48 kHz, bit-rates from ~20-350 kbit/s, meta info tags.

(.MPEG/.MPG) – MPEG system streams: Handles both MPEG 1 system streams, and MPEG 2 ‘program’ system streams.

(.MSS) – Miles Sound System formats: MSS files combine an ‘Extended MIDI’ or .XMI song with either a .DLS or an .MLS into a single file.

(.MSV) – Sony Memory Stick Compressed Voice File: Compressed audio file which contains voice recordings, requires a Sony plug-in for play-back in Windows Media Player.

(.MT2) – MadTracker 2 modules: 8 / 16-bits, mono / stereo, loop, name, collection (for modules), multiple filter types.

(.MTI) – MadTracker 2 instruments: 8 / 16-bits, mono / stereo, loop, name, collection (for modules), multiple filter types.

(.MTM) – MultiTracker modules: 8 / 16-bits, mono, loop, name, collection.

(.MUS) – Doom/Heretic music files: MIDI song.

(.MUS10) – Mus10 audio files: 12 / 16 / 18 / 20 bits, mono / stereo.

(.MWS) – IBM MWave DSP synthesizer’s instrument extracts: The software synthesizer module for the IBM MWave DSP uses the MWSYNTH.INI file to construct instruments from usual .WAV files. When exporting to the MWS format, it outputs a bunch of .WAV files as well as a .MWS text file containing an instrument extract that can be manually pasted into the MWSYNTH.INI file. 8 / 16-bits, loop, name, layers, instruments. 

(.NCOR) – Adobe Encore Project File: An NCOR file is a project file used by the Adobe Encore program that is used to describe all resources and information needed to author a DVD, Blu-Ray, or web media files. The NCOR file will describe the location and usage of all video files contained in the project as well as any image files, audio files, or menus. In addition, it will also contain the chapters and time lines for web authoring, Blu-Rays, or DVDs.

(.NIST) – NIST SPHERE files: These files do not seem to use any definite file extension, though .NIST, .SD, or .WAV is often seen. 8, 16, 24, 32-bits PCM / μ-Law, mono / stereo / multi.

(.NKI) – Native Instruments Kontakt Instruments:  16..32-bits, loop, name, lfos, envelopes, layers, instruments, collection, sample start & end offsets, filters, scripts.

(.NKM) – Native Instruments Kontakt Multi:  16..32-bits, loop, name, lfos, envelopes, layers, instruments, collection, sample start & end offsets, filters, scripts.

(.NKB) – Native Instruments Kontakt Bank:  16..32-bits, loop, name, lfos, envelopes, layers, instruments, collection, sample start & end offsets, filters, scripts.

(.NKP) – Native Instruments Kontakt Preset:  Presets.

(.NKR) – Native Instruments Kontakt Resource File:  graphics, scripts, and IR samples.

(.NCW) – Native Instruments Kontakt lossless compressed audio file: Samples.

(.NKX) – Native Instruments Kontakt protected monolith: Samples + Patches + Presets.

(.NKC) – Native Instruments Kontakt Cache File: has a list on which position each file within a monolith starts.

(.NRT) – Nokia Ring Tone: is a heavily outdated audio file format developed by the Nokia Corporation and primarily used for storing and playing monophonic ringtones on their early 2000s mobile phones. In their essence, NRT files are not much different from regular MIDI files, except that the size of an NRT file is not supposed to exceed 10-15 kilobytes in order to be properly played by the phone. Older versions of the Nokia PC Suite were able to convert NRT files to more convenient audio file formats; however, newer versions are no longer capable of performing this conversion, making it really hard to conveniently convert NRT files to more popular formats like MIDI, WAV or MP3.

(.NVF) – Creative Labs Nomad voice files: ADPCM 4-bit lossy compression, mono.

(.NWC) – Noteworthy Composer Song File: Song data file used by the music composition and notation processor tool, ‘Noteworthy Composer’.

(.O01) – Typhoon voice files: The actual waveform data are stored in accompanying .C01 file(s). DWVW compression, mono, loop, instrument.

(.OGA) – Ogg Vorbis Audio: An audio container file for audio compressed using the Ogg Vorbis audio compression method. These files may also be given the OGG file extension. The Ogg Vorbis audio compression was created to compete with MP3, AAC, and other popular audio formats as a highly compressed but high quality audio codec. Audio recorded in OGA or OGG format often sounds better than an MP3 of the same size.

(.OGG) – Vorbis Ogg streams: The Vorbis format uses variable bit-rate encoding, and the bit-rate that you select when writing is only the approximate average bit-rate. Also note that not all bit-rates are available for all types of input data, e.g. the 8-kbit/s mode is only available for 8000 Hz mono input; other input formats will be saved using the nearest supported bit-rate. Mono / stereo, compresses to ~32-500 kbit/s, text meta data.

(.OKT) – Oktalyzer modules: 8-bits, mono, loop, name, collection.

(.OMA) – Sony OpenMG Audio File: The OMA extension indicates a compressed audio file created with Sony’s Adaptive Transform Acoustic Coding. Commonly referred to as ATRAC, the compression algorithm was developed by Sony and first used for commercial purposes in 1992. The OMA format is used for files encrypted with OpenMG, an SDMI-compliant DRM (Digital Rights Management) system developed by Sony. Sony used OMA files for selling music online through the Connect Music store, which was shut down in 2008. They may contain DRM protection to prevent playback on unauthorized devices. Audio data contained in OMA files is compressed to approximately 1/20 the data rate of a CD track. Even though the files are considerably smaller, the perceptible loss in quality remains minimal.

(.OPUS) – Opus audio streams: Mono / stereo, compresses to ~6-510 kbit/s, text meta data.

(.OSP) – Orion Sampler programs: 16-bit PCM / 32-bit float, mono, instrument, layers, loop.

(.OUT) – Roland S-7xx series floppy image (S-70, S-700, S-750, S-760, S-770, S-772): 12 / 16-bits, mono, loop, name, fine-tune, lfos, envelopes, layers, instruments, collection, multiple filter types.

(.P) – AKAI S900/S950 programs + CD-ROM’s: These samplers use proprietary CD-ROM and floppy disk formats that can’t be read by Windows. It is thus necessary to talk directly with the floppy disk or CD-ROM hardware. You’ll have to use free 3rd party programs (search for ADISK or AkaiDisk) to transfer the stored files to the PC; also see the .AKAI format. 12-bits, mono, instrument, dual layering, loop, name, sample start & end offsets, low-pass filter.

(.P) – AKAI S1000/S1100/S01 programs + CD-ROM’s: 16-bits, mono, instrument, layers, loop, name, sample start & end offsets, low-pass filter.

(.P) – AKAI S3000/S3200/S2000/S2800 programs + CD-ROM’s: 16-bits, mono, instrument, layers, loop, name, sample start & end offsets, low-pass filter.

(.P3K) – Kurzweil PC3K files: 16-bits, mono, loop, names, layers, instruments, collection, MIDI song, trigger type.

(.PAC) – SB Studio II package files: 8 / 16-bits, mono, loop, name, collection.

(.PAF) – Ensoniq PARIS audio files: 16 / 24-bits, mono / stereo / multi.

(.PAT) – Gravis UltraSound GF1 patch files: 8 / 16-bits, mono, loop (incl. fractional), envelopes, lfos, names, instruments.

(.PBF/ .PPF) Turtle Beach Pinnacle bank/program files: The format stores waveforms in external .WAV files that must be placed in the same directory as the .PBF file. 8 / 16-bits, loop, name, lfos, envelopes, layers, instruments, drum-kit, collection, sample start offsets, multiple filter types.

(.PCAST) – iTunes Podcast file format: PCAST (iTunes Podcast) is a special type of XML file developed by Apple Inc., used for sharing podcasts with their iTunes application. PCAST files allow iTunes users to subscribe to various podcast feeds. However, they contain no audio data of their own, just a reference to the location of the latest podcast episodes. A standard PCAST file is an XML file fully compliant with the RSS 2.0 Specification. It usually contains special iTunes RSS tags, which specify additional information about the author and topic of the podcast. The iTunes RSS tags may also be used to define things such as a set of keywords that describe the podcast in greater detail, a brief summary of the podcast topic, or a category that the podcast fits into. A special tag (in RSS 2.0, the enclosure element) defines the URL, length (in bytes) and MIME type of the podcast. Supported audio-only file types are MP3 and M4A. However, certain PCAST files may also carry information about video podcasts. Being an XML file, a PCAST can be opened not only with iTunes but with any text editor for further tweaking and/or data extraction. New PCAST files can be created by dragging a podcast feed that a user is subscribed to from the iTunes app interface directly to the desktop.
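Because a PCAST file is plain XML, the episode references can be pulled out with a few lines of code. The sketch below is only an illustration: it assumes the file follows the RSS 2.0 layout described above, where each episode carries an enclosure element with url, length and type attributes. The sample feed and the function name list_enclosures are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, invented for illustration only.
SAMPLE_PCAST = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="1234567" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
"""


def list_enclosures(xml_text):
    """Return (url, length_in_bytes, mime_type) for every enclosure element."""
    root = ET.fromstring(xml_text)
    found = []
    for enc in root.iter("enclosure"):
        # Attribute names follow RSS 2.0; a missing length is treated as 0.
        found.append((enc.get("url"), int(enc.get("length", "0")), enc.get("type")))
    return found


print(list_enclosures(SAMPLE_PCAST))
```

Real-world files may differ from this layout, so treat the element and attribute names as assumptions to verify against the file at hand.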

(.PCG) – Korg Trinity / Triton / LE / M3 / M50 / Kronos program – bank files: The PCG file must always be accompanied by a KSC file of the same name, listing the KMP and KSF files that need to be loaded to use the programs in the PCG. When writing Trinity PCG files. 8 / 16-bits, mono, loop, names, instruments, collection, multiple filter types, trigger type.

(PCM) – Pulse Code Modulation: PCM is the most common representation of uncompressed audio signals. Because the samples are stored without lossy compression, fidelity is limited only by the chosen sample rate and bit depth. PCM is the standard encoding for .wav and .aif files.

(.PCM) – OKI MSM6376 synth chip format: This format supports the following sample frequencies: 4000, 5300, 6400, 8000, 10600, 12800, 16000, 21200, 32000 Hz. When saving files, the nearest supported frequency will be used and the waveform first resampled if it is not exactly the same frequency. To save in a certain frequency, first manually resample the waveform to that frequency. 16-bits, mono.
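The "nearest supported frequency" rule can be illustrated with a small lookup. This is a minimal sketch using only the frequency list given above; the function names are hypothetical, and how ties between two equally distant rates are resolved is an assumption (the lower rate wins here).

```python
# Sample rates listed above for the OKI MSM6376 format, in Hz.
SUPPORTED_HZ = [4000, 5300, 6400, 8000, 10600, 12800, 16000, 21200, 32000]


def nearest_supported_rate(rate_hz):
    """Pick the supported rate closest to rate_hz (lower rate wins a tie)."""
    return min(SUPPORTED_HZ, key=lambda f: abs(f - rate_hz))


def needs_resampling(rate_hz):
    """True if the waveform would have to be resampled before saving."""
    return nearest_supported_rate(rate_hz) != rate_hz


for rate in (8000, 11025, 22050, 44100):
    print(rate, "->", nearest_supported_rate(rate), needs_resampling(rate))
```

To force a particular target rate, resample manually to one of the listed values first, as the entry above recommends.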

(.PEK) – Adobe Peak Waveform File: The PEK file extension is associated with the Adobe Creative Suite, version 4 and later. Developed by Adobe Systems Inc., the suite is designed for video and audio editing, graphic design and web development. PEK files are created each time the user imports audio data into Adobe Soundbooth, Encore, After Effects or Premiere Pro. For improved performance, all of these applications use the Adobe Media Encoder, a standalone component of the Adobe Creative Suite, to create CFA files, which conform to a proprietary format owned by Adobe. The new format is easier to access and parse, resulting in shorter processing times. When creating and caching CFA files that contain audio data, the Adobe Media Encoder also creates PEK files. These contain the visual waveform data displayed on the audio track timeline. Both CFA and PEK files can be found in the project directory and moved or deleted manually, but the Media Encoder will create them again if the data is needed. This can be avoided by disabling caching.

(.PGM) – AKAI MPC 1000/2000XL/2000/2500/3000 drum set files: The MPC-1000, the MPC-2000, the MPC-2000XL and the MPC-3000 use different program formats although they all have the same .PGM extension. Only the MPC-2500 uses the same format as the MPC-1000. The MPC-2000 and MPC-2000XL actually share the same PGM file format, but the latter (like the MPC-1000/2500) requires samples in .WAV format, while the former needs them in .SND format (like the MPC-3000).

(.PLA) – Playlist File: Playlist file used by various media players to play media files in a given sequence.

(.PLS) – PlayList File: A .pls file is a playlist, a listing of digital media files (e.g. music tracks) sorted in the intended order. This list is then used by your media player to play the media in the correct sequence. Playlists are often used with music albums, where the included playlist tells your media player that tracks should be played in their original order (track 1 followed by track 2, etc.). These files do not contain any media data; instead they contain a (text) listing of the file names.
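Since a playlist is just a text listing, it can be read with standard tools. The sketch below assumes the commonly seen INI-style layout (a [playlist] section with File1, File2, ... entries and a NumberOfEntries count); this layout is not described in the entry above, and the sample content and the function name read_pls are invented for illustration.

```python
import configparser

# Hypothetical playlist content in the common INI-style .pls layout.
SAMPLE_PLS = """[playlist]
File1=track01.mp3
Title1=First Track
File2=track02.mp3
Title2=Second Track
NumberOfEntries=2
Version=2
"""


def read_pls(text):
    """Return the listed file names in playback order."""
    # interpolation=None keeps '%' characters in paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["playlist"]
    count = section.getint("NumberOfEntries", fallback=0)
    return [section[f"File{i}"] for i in range(1, count + 1) if f"File{i}" in section]


print(read_pls(SAMPLE_PLS))
```

Only the file names are returned; the media itself still has to be located relative to the playlist or by absolute path.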

(.PLM/.PLS) – DisorderTracker2 modules/samples: 8 / 16-bits, mono, loop, name, collection.

(.PRG) – WAVmaker program files: 8 / 16-bits, mono / stereo, loop, name, instrument, sample start & end offsets.

(.PSB) – Turtle Beach Pinnacle sound banks: This format is a list with assignments of .WAV files to MIDI program numbers. 8 / 16-bits, mono/stereo, loop, name, instrument, drum-kit.

(.PSION) – PSION a-law files: A-law, mono.

(.PSM)- Protracker Studio modules: 8 / 16-bits, mono, loop, name, collection.

(.PTM) – Poly Tracker modules: 8 / 16-bits, mono, loop, name, collection.

(.QCP) – Voice File: The .QCP file extension belongs to an audio format created by Qualcomm for storing voice audio data. Although originally designed for storing voice audio, the format has become popular with mobile phone manufacturers (such as LG) for storing ring-tones. It is reported that renaming the extension from .qcp to .wav allows the format to be played in Windows Media Player.

(.RA) – RealAudio files: The .RA file format was created by Real Networks for their media player, RealPlayer. These files usually contain a reference to the location of a streaming audio file on the internet, although they can also contain actual audio data. The audio is compressed using a proprietary compression algorithm, also developed by Real Networks. These files are commonly played through the RealPlayer web-browser plug-in. Voice / music / stereo music, 8, 11, 16, 22, 32, 44.1 kHz, 20, 34, 45, 80, 150 kbit/s.

(.RAM) – RealMedia Metafile: The .RAM file format is simply a plain-text file that contains a link to a RealAudio file (.ra extension).

(.RAW/.SB/.SDW/.SND/.SW/.UB/.UDW/.UW) – Raw PCM data formats: The letters in the file extension have the following meaning: S = signed data, U = unsigned data, B = byte (8-bit) data, W = word (16-bit) data, DW = DWord (32-bit) data. For example, .SB holds signed 8-bit PCM data.
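The extension naming scheme maps directly onto a sample decoder. The following is a minimal sketch, not a reference implementation: it assumes little-endian byte order for the 16-bit and 32-bit variants (raw files carry no header, so this cannot be detected), and the names FORMATS and decode_raw_pcm are hypothetical.

```python
import os
import struct

# Extension -> struct format code: S/U = signed/unsigned,
# B/W/DW = 8/16/32-bit, as described in the entry above.
FORMATS = {
    ".sb": "b", ".ub": "B",
    ".sw": "h", ".uw": "H",
    ".sdw": "i", ".udw": "I",
}


def decode_raw_pcm(filename, data):
    """Decode headerless PCM bytes into a list of integer samples."""
    code = FORMATS[os.path.splitext(filename)[1].lower()]
    size = struct.calcsize("<" + code)  # 1, 2 or 4 bytes per sample
    count = len(data) // size           # ignore any trailing partial sample
    # "<" = little-endian with standard sizes (an assumption, see above).
    return list(struct.unpack(f"<{count}{code}", data[:count * size]))


print(decode_raw_pcm("clip.sb", bytes([0xFF, 0x01])))    # signed 8-bit
print(decode_raw_pcm("clip.ub", bytes([0xFF, 0x01])))    # unsigned 8-bit
print(decode_raw_pcm("clip.sw", b"\x00\x80\xff\x7f"))    # signed 16-bit
```

Sample rate and channel count are not stored in such files either, so they have to be supplied by the user.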

(.RAW) – Rdos Raw OPL capture format: Song, FM-synthesis instruments.

(.RAX) – Real Music Store Audio File: The .RAX file format belongs to music purchased through the online ‘Real Music Store’. These files are intended to be played through Real Player.

(.REX) – Audio Sample File: The .REX file extension is used in audio applications for storing audio samples.

(.RFL) – Reason ReFill Sound Bank: RFL files are component packages for the Propellerhead Reason software. These files can contain patches to update the Reason program as well as various sound samples that can be used in song creation, full song files that can be used by the program, and audio loops stored as REX files. In this format, songs and sound samples will often be found compressed to under half their original size.

(.RNS) – Reason Song File: Propellerhead Reason is an audio production software suite that allows song composition and mixing. The RNS file stores the mixing, sequencing, and rack settings from inside the software. RNS files are project files rather than audio files, so they contain no audio data; they simply save the settings for the song being created or edited, so that opening the RNS file restores the settings that were in use.

(.RSD) – Roland Sound File: The .RSD file format is used by Roland’s Visual MT music Tutor application.

(.RSO) – NXT Brick Audio File: The .RSO file extension is used in Lego Mindstorms NXT programming.

(.RX2) – REX2 Loop File: REX2 is a proprietary audio file format developed by the Swedish music software company Propellerhead for use with their ReCycle music loop editor. The idea behind ReCycle was to enable music producers to create new music loops and edit existing ones, altering the tempo of the loops without changing their pitch and without negatively affecting the sound quality. Originally released in 1994, ReCycle changed computer music production by being the first application to bring the idea of loop slicing to the mainstream. The REX2 file format is nowadays a standard in audio sample looping and is supported by all major DAWs, such as Reaper, Logic, Cubase, Reason and Pro Tools. REX2 files can be either mono or stereo, and use a proprietary lossless compression algorithm, which can reduce their file size by up to 60%. The name REX stands for ReCycle Export, since REX was the native export format of Propellerhead ReCycle.

(.RIF) – Rockwell ADPCM format (Hotfax/Quicklink): 2, 3, 4-bit ADPCM, 7200 Hz, mono.

(.RMI) – RIFF-MIDI files: This is a structured file format containing MIDI data plus some additional optional data such as comments. Perhaps its most notable feature is that it can also contain a DLS instrument collection. This is a recent addition to the format, however, so not all software that handles RMI files supports it yet.

(.ROCKWELL) – Raw Rockwell 2/3/4-bit ADPCM data: 2 / 3 / 4-bit ADPCM, 7200 Hz, mono.

(.ROL) – AdLib Visual Composer songs: This format contains song data only. It is often accompanied by a .BNK file containing FM-synthesis instruments. MIDI song data.

(.ROM) – Roland MT-32 / CM32L / LAPC-1 control and PCM ROM dumps: LA-synthesis instruments, 16-bit PCM, loop.

(.S) – AKAI S900/S950 samples: 12-bits, mono, loop, name.

(.S) – AKAI S1000/S1100/S01/S3000/S3200/S2000/S2800 samples: 16-bits, mono, loop, name.

(.S1A) – Yamaha EX5 ‘all’ format: 16/8-bits, name, envelopes, lfos, filters, layers, instruments, collection.

(.S1M) – Yamaha EX5 ‘waveforms’ format: 16/8-bits, name, envelopes, lfos, filters, layers, instruments, collection.

(.S1V) – Yamaha EX5 ‘voices’ format: 16/8-bits, name, envelopes, lfos, filters, layers, instruments, collection.

(.S3M/.S3I) – ScreamTracker v3 modules / instruments: 8 / 16-bits, mono / stereo, loop, name, note, collection.

(.S3P) – AKAI MESA II/PC S-series programs: The S3P format contains an S3000 program, but encoded in a ‘SysEx’ like format and is used by the MESA II/PC editor. 16-bits, mono, instrument, layers, loop, name, low-pass filter.

(.SA1) – Audio Encryption File: Files with the extension .SA1 are Panasonic AAC files wrapped in an SD (Secure Digital) audio encryption layer, used by Panasonic for digital rights management. AAC (Advanced Audio Coding) is a standardized, lossy compression and encoding scheme for digital audio that was designed to replace the MP3 format.

(.SAM) – Signed 8-bit PCM files: 8-bits, mono.

(.SBI) – Creative Labs FM instruments: FM-synthesis instruments, song.

(.SBK) – EMU SoundFont v1.x banks: The .SBK format (SoundFont v1) has been replaced with .SF2 (SoundFont v2) which should be used for all new files. 16-bits, loop, name, lfos, envelopes, layers, instruments, drum kits, collection, sample start & end offsets, low-pass filter.

(.SBR) – Spectral Band Replication Audio Sound File: Popular in video games, these SBR files are used to store audio data (such as crowd chants/commentary).

(.SC2) – Sample Cell II PC/Mac instruments: 8 / 16-bits, mono, instrument, sample start offsets.

(.SD)- Sound Designer I files: 8, 16-bits, mono, name, loop.

(.SD2) – Sound Designer II flattened/resource fork files: On the Macintosh these files really consist of two file ‘forks’. When they are transferred to a PC, the two forks may be flattened (preserving all data and parameters), or only the data fork (containing the raw sound data) may be copied. If only the data fork has been copied, all information about sample rate and data format is lost in the move to the PC. 8 / 16-bits, mono / stereo.

(.SDK) – Roland S-5xx / S-7xx series floppy disk images + CD-ROM’s (S-50, S-51, S-330, W-30, S-500, S-550, S-70, S-700, S-750, S-760, S-770, S-772): 12 / 16-bits, mono, loop, name, fine-tune, lfos, envelopes, layers, instruments, collection, multiple filter types.

(.SDS/.SDX) – MIDI Sample Dump Standard files: SDS is a raw MIDI data dump of an SDS (Sample Dump Standard) transfer and can be up to twice as large as the actual waveform. SDX is a ‘compacted’ form that the program SDX uses to save SDS samples in order to avoid this problem. It also contains a sample name, which SDS does not. 8…32-bits, mono, loop, (SDX only: name, note).

(.SDS) – SmartSound SDS files: 8..32-bits.

(.SEQ) – Sony Playstation MIDI sequences: MIDI song data.

(.SF) – IRCAM / MTU SoundFile formats: 16-bit PCM / 32-bit float, name, note, mono / stereo.

(.SF2) – EMU SoundFont v2.x banks: 16 / 24-bits, loop, name, lfos, envelopes, layers, instruments, drum kits, collection, sample start & end offsets, low-pass filter.

(.SF2PACK) – MIDI Converter Studio packed Sound Font: This format is based on the Sound Font 2 format, but uses WavPack to compress the waveforms. 16-bits, loop, name, lfos, envelopes, layers, instruments, drum kits, collection, sample start & end offsets, low-pass filter.

(.SFA) – Sound Forge Audio File: The SFA file format was developed by Sonic Foundry for their audio and video editing applications. The company was acquired by Sony in late 2003. SFA files are currently associated with Sony Sound Forge. Files with the SFA extension are lossless audio files that offer near CD quality sound. Due to their nature, SFA files are larger in size when compared to other compressed audio formats.

(.SFARK) – Melody Machine compressed SoundFonts: These files are compressed archives containing an SF2 SoundFont. The compression ratio is typically around 50% but may vary from one file to another.

(.SFI + .SFD) – SoundStage sound files: These files must always come in a .SFI + .SFD pair! The .SFI or ‘Sound File Info’ file contains the sound parameters, the .SFD or ‘Sound File Data’ file contains the actual audio clip. You should only load the .SFI file – the data from the .SFD file will be loaded automatically.

(.SFR) – Sonic Foundry sample resource files: 8 / 16-bits, mono / stereo.

(.SFZ) – rgc:audio SFZ v1 / Cakewalk SFZ v2 instruments: 8 / 16 / 24 / 32-bit PCM / Ogg compressed / FLAC lossless, loop, name, lfos, envelopes, layers, instruments, sample start & end offsets, multiple filter types, trigger type, pedal switch.

(.SGT) – Microsoft DirectMusic Producer Segments File: Used by developers to add audio to their applications to be played through DirectX.

(.SHN) – Shorten lossless compression: This is one of the first more wide-spread formats for lossless audio compression. 8/16-bit PCM, lossless compression, multi-channels.

(.SID) – PlaySID C64 Sound Effects File: Audio sound effects file, containing audio data for the Commodore 64 program, PlaySID.

(.SMD) – J-Phone / SmdEd mobile songs: Song data.

(.SMF) – Standard MIDI File: The SMF extension stands for “Standard MIDI File”. The Musical Instrument Digital Interface (MIDI) specification was developed to meet musicians’ need for a way to connect several different musical instruments with computers and other electronic devices. The MIDI protocol was quickly adopted for early personal computers, as it offered storage and playback of music while taking up very little storage space. SMF files contain the characteristics of each musical event, including the start and end of a note, its volume, its pitch and its vibrato. The MIDI protocol has a data rate of 31.25 kbit/s, uses 8-bit serial transmission and is asynchronous.
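The 31.25 kbit/s figure translates into concrete timing for MIDI messages. The short calculation below is a sketch under one stated assumption that is not in the entry above: each 8-bit byte is framed on the wire by one start bit and one stop bit, giving 10 bits per byte. The function names are hypothetical.

```python
BAUD = 31_250               # bits per second, from the entry above
BITS_PER_BYTE_ON_WIRE = 10  # assumption: 1 start bit + 8 data bits + 1 stop bit


def bytes_per_second():
    """Maximum number of MIDI bytes that fit through the link each second."""
    return BAUD // BITS_PER_BYTE_ON_WIRE


def transmit_time_ms(num_bytes):
    """Time in milliseconds needed to send num_bytes over the serial link."""
    return num_bytes * BITS_PER_BYTE_ON_WIRE / BAUD * 1000


print(bytes_per_second())   # throughput in bytes per second
print(transmit_time_ms(3))  # a typical three-byte note message
```

Under that framing assumption, a three-byte note message occupies the link for just under one millisecond, which is why dense chords are sent as a quick succession of messages rather than truly simultaneously.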

(.SMP) – Samplevision files: 16-bits, mono, loop, name, note.

(.SMP) – Ad Lib Gold samples: 8 / 16-bits, mono / stereo.

(.SMP) – Avalon samples: 8 / 16-bits, mono, loop, name, comment.

(.SND) – AKAI MPC-60/2000/2000XL/3000 sample files: 12/16-bits, mono/stereo, name.

(.SND) – Apple Sound Resource File: Files with the SND extension are attributed to Apple Inc. and their Macintosh platform. The SND file format was developed for Mac OS Classic and was mostly used for games and various programs. SND files contain commands interpreted by the Macintosh Sound Manager, sound samples and wave-table instruments. SND files contain 8-bit mono sound at 8 kHz. Despite the low quality, using current technology SND files can reach 22 kHz for stereo and 11 kHz for mono. Also, due to their small size and basic format, SND files are often used for integrating media files into web documents and for internet streaming.

(.SNDR) – Sounder sound files: 8-bits, mono.

(.SNDT) – SndTools sound files: 8-bits, mono.

(.SNS) – Burnout Paradise Sound File: The .SNS file extension is used by Criterion Games’ Burnout Paradise, a driving game for the PC, Xbox 360 and PS3. Criterion used the .SNS file format to store the game’s soundtrack music.

(.SOU) – SB Studio II sound files: 8 / 16-bits, mono, loop, name, collection.

(.SPD) – Speech data files: 16-bits, mono.

(.SPL) – Digitrakker sample files: 8 / 16-bits, mono, loop, name, Huffman packing, collection.

(.SPPACK) – SPPack sound samples: 8 / 16-bits / μ-Law / A-Law, mono.

(.SQ) – Sony PS2 SCEI sequences: MIDI song data.

(.STM) – ScreamTracker v2 modules: 8-bits, mono, loop, name, collection.

(.STS) – Creamware STS-series sampler programs: Only newer versions of the STS samplers use this format – older versions use the AKAI S3000 .P format. 16-bits, mono / stereo, instrument, layers, loop, name, multiple filter types.

(.SVQ) – Roland sequencer files: are used by the built in sequencer in several Roland synths. MIDI song data.

(.SVX) – Interchange file format: 8SVX and 16SV data types: 8 / 16-bits, loop, name, mono / stereo.

(.SWA) – ShockWave Audio: Contains audio data for flash animations, these .SWA files are created by Adobe ShockWave and are similar to the popular .MP3 format.

(.SXT) – Propellerhead Reason NN-XT format: 16-bits, mono / stereo, instrument, layers, loop, name, articulation, sample start & end offsets, multiple filter types.

(.SYX) – Roland D-50 / MT-32 / Yamaha DX7 / DX7s / DX7II / DX200 / DX21 / DX27 / DX100 / DX11 / TX81z patch SysEx dumps: LA-synthesis instruments.

(.SYW/ W??) – Yamaha SY-series wave files: The Yamaha 16-bit SY-series wave sample files are really named .W??. There are also .T?? (All Data Files, which often contain waves and are much more common than .W??), .J?? (Patch parameters?) and .K?? (Sequenced songs). 16-bits, mono, loop.

(.TRM) – For The Record Voice Recording File: The TRM file format is used by ‘ForTheRecord’ and is an audio file used to store voice recordings in courts.

(.TVN) – Yamaha Tyros 2 custom voice files: 16-bits, mono, loop, envelopes, lfos, filters, layers, and instrument.

(.TVD) – Yamaha Tyros 2 custom drum voice files: 16-bits, mono, loop, envelopes, lfos, filters, layers, and instrument.

(.TXT) – Ascii text parameter description files: Contains human readable information about items at bank, instrument or waveform levels. Useful if you can’t export directly to your destination format and you’ll have to piece together instruments from, say, .WAV files. Human readable information, no sample data.

(.TXT) – Ascii text formatted audio data: Ascii text file with integer sample values in base 10. Samples are delimited by either a comma (ascii no 44) or a new-line (ascii no 10). Comments and lines not containing sample values must begin with a ; or a %. For stereo files (with interleaved samples), you must manually set stereo in the waveform’s properties box, source tab. 1..32 bits PCM / floats.
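The text layout described above is simple enough to parse by hand. The following is a minimal sketch that handles integer samples only; the function names are hypothetical, and the stereo split assumes the usual left-first interleaving, which the entry above does not spell out.

```python
def parse_text_samples(text):
    """Parse comma- or newline-delimited base-10 integer samples."""
    samples = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines starting with ';' or '%'.
        if not stripped or stripped[0] in ";%":
            continue
        for field in stripped.split(","):
            field = field.strip()
            if field:
                samples.append(int(field))
    return samples


def split_stereo(samples):
    """Split interleaved samples into two channels (assumes left comes first)."""
    return samples[0::2], samples[1::2]


example = "; demo data\n100, -100\n200, -200\n% end of file\n"
mono = parse_text_samples(example)
print(mono)
print(split_stereo(mono))
```

Floating-point sample values, which the format also allows, would need float() instead of int() in the parsing step.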

(.TXT) – RTTTL / Nokring mobile ring-tone format: The RTTTL format was first used by the NokRing software; you can use that, or other compatible software, to transfer ring-tones to your Nokia phone. Song, name.

(.TXT) – Steinberg LM-4 banks: 8 / 16-bits, mono / stereo, loop, names, instrument.

(.TXW/.W??) – Yamaha TX16W wave files: The Yamaha TX16W (and probably also the other Yamaha 12-bit samplers) waveform files are really named .W?? (where ?? is a number). There are also .F?? (Filter), .V?? (Voice to MIDI no assign?), .U?? (Performance settings) and .S?? (System setup?) files, which are of no use to you. 12-bits, mono, loop.

(.U255LAW) – Exponential 8-bit format: These are used e.g. by some old drum computers. There is no standard extension for these files, and no way to auto-detect them. 8-bit exp, mono.

(.UAX/.UMX) – Unreal Tournament audio/music packages: 8 / 16-bits, mono / stereo.

(.UL) – Audio file: Audio file created with ULAW, which uses Ulaw encoding.

(.ULAW/.ULW) – Raw CCITT/ITU G.711 μ-Law (US telephony format) audio: This is also the format used by the MIME mail encoding standard’s audio/basic attachments. Some applications save these with a ‘reverse’ bit ordering within each byte. μ-Law, mono, 8 kHz.
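To make the μ-Law encoding more concrete, here is a sketch of the widely used decoding procedure that expands one 8-bit μ-Law byte into a linear sample. It is offered as an illustration rather than a definitive implementation; the function names are hypothetical, and reverse_bits is only a helper for files written with the ‘reverse’ bit ordering mentioned above.

```python
BIAS = 0x84  # bias constant used by the common G.711 mu-law decoding procedure


def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law byte into a signed linear PCM sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def reverse_bits(byte):
    """Mirror the bit order of a byte, for files saved with reversed bits."""
    return int(f"{byte:08b}"[::-1], 2)


print([ulaw_to_linear(b) for b in (0xFF, 0x7F, 0x80, 0x00)])
```

The decoded values fit in a signed 16-bit range, so the output can be written directly as 16-bit PCM.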

(.ULT/.UWF) – UltraTracker modules/wave files: 8 / 16-bits, mono, loop, name, collection.

(.UNI) – MikMod ‘UniMod’ modules: 8 / 16-bits, mono, name, loop, instruments, collection.

(.UVN) – Yamaha Tyros 3 custom voice files: 16-bits, mono, loop, envelopes, lfos, filters, layers, and instrument.

(.UVD) – Yamaha Tyros 3 custom drum voice files: 16-bits, mono, loop, envelopes, lfos, filters, layers, and instrument.

(.V8) – Covox 8-bit files: 8-bits, mono.

(.VAB) – Sony Playstation / PS2 bank files: A VAB file can contain a maximum of 128 instruments (called programs), each of which can have up to 16 regions (called tones). If an instrument has more regions, only some of the lowest ones in the lowest layers are written. 4-bit ADPCM, mono, name, envelope, instruments, collection.

(.VAG) – Sony Playstation / PS2 compressed sound files: Audio file used by the Sony Playstation to store game audio. Due to a limitation of the format, loop points will be aligned to the nearest 28-sample boundary when writing files. 4-bit ADPCM, mono, name.
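The loop-point alignment can be shown with a one-line rounding rule. This is a sketch only: the name align_loop_point is hypothetical, and rounding an exact halfway case upward is an assumption, since the entry above only says "nearest".

```python
BLOCK_SAMPLES = 28  # loop points snap to multiples of 28 samples


def align_loop_point(sample_index):
    """Round a loop point to the nearest multiple of 28 (halfway rounds up)."""
    return (sample_index + BLOCK_SAMPLES // 2) // BLOCK_SAMPLES * BLOCK_SAMPLES


for point in (0, 13, 14, 100, 1000):
    print(point, "->", align_loop_point(point))
```

A loop point can therefore move by up to 14 samples, which is worth checking by ear on short, tightly looped waveforms.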

(.VAP) – Annotated speech files: 8-bit PCM, mu-Law, A-Law, Dialogic 4-bit ADPCM, mono, name.

(.VM1) – Panasonic voice files: ADPCM 4-bit lossy compression, mono.

(.VMO) – Siemens Mobile Phone Voice File: The .VMO file format was used by Siemens’ mobile phones but is no longer used or supported. These files can be converted to .wav format using the vmo2wav application.

(.VOC) – Creative Labs sound files: 8 / 16-bits / μ-Law / A-Law / 2/2.67/4-bit ADPCM, mono / stereo, name.

(.VOX/.VOX-6K/.VOX-8K) – Dialogic 4-bit ADPCM files: VOX files are associated with Dialogic’s Adaptive Differential Pulse Code Modulation (ADPCM) format. Like other 4-bit ADPCM formats, Dialogic VOX files store each sample in just 4 bits. Originally intended for saving human speech, VOX files were optimized to store digitized voice data at a low sampling rate while still maintaining clarity. Another benefit of VOX files is their small size compared to other audio files. The VOX ADPCM audio format is mostly used for computer telephony software. It was also used for computer and arcade games. There are several types of ‘Dialogic’ files: ADPCM 8000 Hz, ADPCM 6000 Hz, PCM8, mu-Law and A-law. These cannot be auto-detected and can all use the same file extension. ADPCM compression (4-bit) / PCM 8-bit / mu-Law / A-Law, mono.

(.VOX) – Talking Technology Incorporated files: 8-bits, mono.

(.VPM) – Garmin GPS Navigation Voice File: Voice file used by Garmin personal navigation GPS devices.

(.VRF) – Ventrilo Audio Recording: The .VRF file format is used by the Voice over IP (VoIP) group communication tool ‘Ventrilo’.

(.VY1/.VY2/.VY3) – Audio Footage File: Files with the extension .VY1, .VY2, .VY3, and so on are audio footage files recorded using the ‘Samsung Yepp VY’ series of digital voice recorders. A .VY? file contains audio recorded to the voice recorder’s internal memory, which can then be uploaded to a PC.

(.VSB) – Virtual Sampler banks: 16-bits, loop, name, lfos, envelopes, layers, instruments, drum kits, collection, sample start & end offsets, multiple filter types.

(.W2A+.W3A) – Yamaha Motif ‘all’ format: 16-bits, name, envelopes, lfos, filters, layers, instruments, collection.

(.W2V+.W3V) – Yamaha Motif ‘voices’ format: 16-bits, name, envelopes, lfos, filters, layers, instruments, collection.

(.W2W+.W3W) – Yamaha Motif ‘waveforms’ format: 16-bits, name, envelopes, lfos, filters, layers, instruments, collection.

(.W4KSND) – Wusik 4000 instrument: 16/24/32-bit PCM / 32-bit floats, mono/stereo.

(.W7A+.W8A) – Yamaha Motif ES ‘all’ formats: 16-bits, name, envelopes, lfos, layers, filters, instruments, collection, multiple filter types.

(.W7V+.W8V) – Yamaha Motif ES ‘voices’ formats: 16-bits, name, envelopes, lfos, layers, filters, instruments, collection, multiple filter types.

(.W7W+.W8W) – Yamaha Motif ES ‘waveforms’ formats: 16-bits, name, envelopes, lfos, layers, filters, instruments, collection, multiple filter types.

(.W64) – Sonic Foundry Wave-64 format: Files with the extension .W64 are audio files in a format created by ‘Sonic Foundry’. .W64 or ‘WAVE64’ files were developed to overcome the 4 GB size limit of the Microsoft ‘WAV’ format. Sony bought Sonic Foundry’s editor assets (31/07/2003). 1..32 bits PCM / μ-Law / A-Law, mono / stereo.

  • A .W64 file is a binary file similar to a RIFF / WAV file.
  • A .W64 file usually stores uncompressed sampled audio as ‘PCM’ (pulse-code modulation).
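
The 4 GB limit of WAV comes from the way RIFF stores chunk sizes: each size is a 4-byte unsigned little-endian integer, so it cannot exceed 2^32 - 1. The sketch below demonstrates this with Python's `struct` module; the helper names are hypothetical, and the 8-byte field is shown only to illustrate how a wider size field, as used by Wave-64, removes the ceiling.

```python
import struct


def riff_size_field(n_bytes: int) -> bytes:
    """Pack a chunk size as RIFF/WAV stores it: 4 bytes, little-endian, unsigned."""
    return struct.pack("<I", n_bytes)


def wide_size_field(n_bytes: int) -> bytes:
    """Pack a size into 8 bytes instead, for comparison (illustrative only)."""
    return struct.pack("<Q", n_bytes)


def fits_in_wav(n_bytes: int) -> bool:
    """True if a chunk of this many bytes can be described by a 32-bit size field."""
    try:
        riff_size_field(n_bytes)
        return True
    except struct.error:
        return False


five_gb = 5 * 1024**3
ok_in_wav = fits_in_wav(five_gb)                  # too large for 32 bits
ok_in_wide = len(wide_size_field(five_gb)) == 8   # fits easily in 64 bits
```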

(Wave) – A digital audio standard developed by Microsoft and IBM. One minute of uncompressed CD-quality audio (44.1 kHz, 16-bit, stereo) requires about 10 MB of storage. The .WAV format is a standard audio format for the Windows operating systems and professional-level audio/video applications. WAV files usually hold uncompressed audio, so there is no reduction in quality, which makes the format ideal for storing high-quality audio. The trade-off is that file sizes are relatively large compared to compressed formats such as MP3.
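
The figure of roughly 10 MB per minute corresponds to CD-quality settings (44.1 kHz, 16-bit, stereo). A quick sketch of the arithmetic, with a hypothetical helper name and ignoring the small file header:

```python
def pcm_bytes(seconds: float,
              sample_rate_hz: int = 44_100,
              bits_per_sample: int = 16,
              channels: int = 2) -> int:
    """Size in bytes of uncompressed PCM audio, excluding the file header."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    return int(seconds * sample_rate_hz * bytes_per_frame)


one_minute_cd = pcm_bytes(60)                    # 10,584,000 bytes
one_minute_phone = pcm_bytes(60, 8_000, 8, 1)    # 480,000 bytes
```

Changing any of the three parameters scales the size proportionally, so 8 kHz, 8-bit mono speech needs only about 0.5 MB per minute.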

(.WAV) – Microsoft wave format: 1..32-bit PCM / 32 / 64-bit floating point / μ-Law / A-Law / Microsoft 4-bit ADPCM / Intel DVI/IMA 2/3/4-bit ADPCM / G.721 4-bit ADPCM / G.723 3/5-bit ADPCM / G.726 2/3/4/5-bit ADPCM / Dialogic 4-bit ADPCM / Rockwell 2/3/4-bit ADPCM / Yamaha 4-bit ADPCM / GSM 06.10 / MPEG audio layer I / II / III, MPEG AAC / Ogg Vorbis / AC3 / DTS / any format with an ‘ACM codec’, mono / stereo / multi-channel, note, loop, root-key, fine-tune, name, comment etc.

(.WAV/.BWF) – Broadcast wave format (EBU BWF): This format, defined by the European Broadcasting Union, is essentially the same as the Microsoft wave format, but with restrictions on which data formats are allowed and with some extra metadata stored in the file. These files normally use the .WAV file extension rather than .BWF, although .BWF also occurs. Any application that can read BWF files ought to be able to read PCM WAV files, and vice versa. 8 / 16 / 24 / 32-bit PCM / MPEG audio layer II, mono / stereo, name, comment etc.
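
Since a BWF file is an ordinary RIFF/WAVE file with extra metadata, one way to tell the two apart is to walk the chunk list and look for the broadcast extension chunk, whose ID is `bext`. The sketch below is a minimal chunk walker under that assumption; the function name is hypothetical, and the test file is a plain WAV generated in memory with Python's standard `wave` module.

```python
import io
import struct
import wave


def list_riff_chunks(data: bytes) -> list:
    """Return the IDs of the top-level chunks in a RIFF/WAVE byte string."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    ids = []
    pos = 12  # skip 'RIFF', the overall size and 'WAVE'
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        ids.append(chunk_id.decode("ascii", errors="replace"))
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return ids


# Build a tiny plain WAV in memory: mono, 16-bit, 8 kHz, 8 silent frames
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8)

chunks = list_riff_chunks(buf.getvalue())
is_bwf = "bext" in chunks  # a plain WAV has no 'bext' chunk
```

A reader that ignores chunks it does not recognize will skip the extra metadata and play the audio as a normal WAV, which is what makes the two formats interchangeable for PCM data.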

(.WA!) – GigaStudio/GigaSampler compressed wave files: 8, 16-bits lossless PCM compression.

(.WFB/.WFP/.WFD) – Turtle Beach WaveFront / WavePatch formats: WaveFront-based synthesizers are currently the Maui, Tropez, Rio and Monterey. A .WFB file contains a whole MIDI bank, a .WFP file a single program and a .WFD file a complete drum set. 8 / 16-bits / μ-Law, name, note, loop (incl. fractional), envelopes, lfos, layers, instruments, drumkit, collection, sample start & end offsets.

(.WLF) – id Software Music Format: Song, FM-synthesis instruments.

(.WMA) – Windows Media Audio files: The .WMA format was created by Microsoft as a competitor to the MP3 format, aiming for better audio quality at similar file sizes. The format is most commonly used for playing audio across the internet. For business rather than technical reasons, these files cannot be played on Apple’s iPod, but they can be played on other portable players such as Microsoft’s Zune. 16 / 24 bit mono / stereo, 8, 11, 16, 22, 32, 44.1, 48, 88, 96 kHz, compresses to 5-160 kbit/s, text metadata.

(.WMV) – Windows Media Video files: 16 / 24 bit mono / stereo, 8, 11, 16, 22, 32, 44.1, 48, 88, 96 kHz, compresses to 5-160 kbit/s, text metadata.

(.WPROJ) – Wwise Project: The .WPROJ file format is used by Wwise, an audio run-time engine developed by Audiokinetic. These are project files using the XML file format.

(.WRF) – Westacott WinRanX instrument files: 16/24/32-bit PCM, mono, loop.

(.WRK) – CakeWalk work files: MIDI song data, embedded FM or MT32 SysEx banks, linked SoundFonts.

(.WUSIKSND) – Wusikstation sound file: 16-bit PCM / 32-bit floats, mono/stereo.

(.WUSIKPACK) – Wusikstation pack file: 16-bit PCM / 32-bit floats, mono/stereo, instruments.

(.WV) – WavPack lossless compression: This format is well suited, for example, to compressing important archive data without losing any quality. 8/16/24/32-bit PCM, 32-bit floating point, lossless compression, multi-channel, gain adjustment (Replay Gain), text metadata.

(.X0A) – Yamaha Motif XS ‘all’ format: 16-bits, mono/stereo, name, envelopes, lfos, filters, layers, instruments, collection, trigger type.

(.X0V) – Yamaha Motif XS ‘voices’ format: 16-bits, mono/stereo, name, envelopes, lfos, filters, layers, instruments, collection, trigger type.

(.X0W) – Yamaha Motif XS ‘waveforms’ format: 16-bits, mono/stereo, name, envelopes, lfos, filters, layers, instruments, collection, trigger type.

(.X3A) – Yamaha Motif XF ‘all’ format: 16-bits, mono/stereo, name, envelopes, lfos, filters, layers, instruments, collection, trigger type.

(.X3V) – Yamaha Motif XF ‘voices’ format: 16-bits, mono/stereo, name, envelopes, lfos, filters, layers, instruments, collection, trigger type.

(.X3W) – Yamaha Motif XF ‘waveforms’ format: 16-bits, mono/stereo, name, envelopes, lfos, filters, layers, instruments, collection, trigger type.

(.XM/.XI) – FastTracker 2 eXtended modules/instruments: 8 / 16-bits, mono, loop, name, instruments, panning, collection.

(.XMI) – Miles Sound System extended MIDI files: These files are a type of ‘time quantized’ MIDI song. They are usually a little smaller than a regular MID file but may in some cases lose a little information. You can also combine an XMI with a DLS into an MSS file. MIDI song.

(.YADPCM) – Raw Yamaha 4-bit ADPCM format: ADPCM 4-bit lossy compression, mono.

(.ZGR) – BeatCreator Loop File: The file extension .ZGR is associated with ‘BeatCreator’, an audio sample/loop editing program. .ZGR files contain the slicing scheme of the saved loop. BeatCreator allows you to edit samples in a variety of ways and then export them, whole or in slices, to loop creation software such as ‘Fruityloops’.

 
