Mp3 Info Tag rev 1 specifications - draft 0

The purpose of this tag is to provide extra information about the mp3 bitstream, encoder and parameters used. This tag should, as much as possible, be meaningful for as many encoders as possible, even if it is unlikely that encoders other than Lame will implement it.

This tag should be backward compatible with the Xing vbr tag, providing basic support for a lot of already written software. As much as possible the current revision (revision 1) should provide information similar to that already provided by revision 0.

A few fields, as they could be necessary for some functionality of already existing software, should not be moved in any version of the tag. They are indicated as "UNMOVABLE".

 

LAME 3.88 Tag example :

frame at 44.1kHz samplerate:
 

0000: FF FB 90 64-00 00 00 00-00 00 00 00-00 00 00 00  ...d............
0010: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0020: 00 00 00 00-58 69 6E 67-00 00 00 0F-00 00 00 74  ....Xing.......t
0030: 00 00 30 C1-00 04 07 09-0B 0D 0F 14-16 18 1A 1D  ..0.............
0040: 1F 23 26 28-2A 2C 2E 33-35 37 39 3C-3E 40 44 47  .#&(*,.3579<>@DG
0050: 49 4B 4D 4F-54 56 58 5A-5D 5F 63 66-68 6A 6C 6E  IKMOTVXZ]_cfhjln
0060: 73 75 77 79-7C 7E 80 84-87 89 8B 8D-8F 94 96 98  suwy|~..........
0070: 9A 9D 9F A3-A6 A8 AA AC-AE B3 B5 B7-B9 BC BE C0  ................
0080: C4 C7 C9 CB-CD CF D4 D6-D8 DA DD DF-E3 E6 E8 EA  ................
0090: EC EE F3 F5-F7 F9 FC FE-00 00 00 58-4C 41 4D 45  ...........XLAME
00A0: 33 2E 38 38-20 28 62 65-74 61 29 00-00 00 00 00  3.88 (beta).....
00B0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00C0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00D0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00E0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00F0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0100: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0110: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0120: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0130: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0140: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0150: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0160: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0170: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0180: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0190: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
01A0: 00                                               .

 

/*

//    ZONE A - Traditional Xing VBR Tag data
//    4 bytes for Header Tag
//    4 bytes for Header Flags
//  100 bytes for entry (NUMTOCENTRIES)
//    4 bytes for FRAME SIZE
//    4 bytes for STREAM_SIZE
//    4 bytes for VBR SCALE. a VBR quality indicator: 0=best 100=worst

//   ZONE B - Initial LAME info
//   20 bytes for LAME tag.  for example, "LAME3.12 (beta 6)"
// ___________
//  140 bytes
//

//   ZONE C - LAME Tag
//   208 bytes unused in 128k frame (in 48kHz case)
//
//   using
//   FrameLengthInBytes = 144 * BitRate / SampleRate + Padding
//
//   this gives
//   Layer III, BitRate=128000, SampleRate=44100, Padding=0
//        ==>  FrameSize=417 bytes
//   Layer III, BitRate=128000, SampleRate=48000, Padding=0
//        ==>  FrameSize=384 bytes
//
//   so this would make the minimal frame size 384 bytes ($0-$17F), hence the available bytes for this field are not 241 as in this 44100Hz case, but at most 208 bytes.
*/
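The arithmetic in the comment block above can be checked with a short sketch. It assumes that the division truncates to whole bytes and that the tag starts at offset 0x24 (36 bytes of frame header and side info), as in the example frames of this document; the function and constant names are illustrative only and do not come from LAME.

```python
# Assumption: the tag starts at 0x24, i.e. 36 bytes of header + side info
# precede it (true for the MPEG-1 stereo example frames in this document).
TAG_OFFSET = 0x24
# Size of ZONE A + ZONE B as summed in the comment block.
ZONES_A_AND_B = 140


def frame_length_bytes(bitrate, sample_rate, padding=0):
    """Layer III frame length; the division is assumed to truncate."""
    return 144 * bitrate // sample_rate + padding


def zone_c_bytes(bitrate, sample_rate):
    """Bytes left in the tag frame after the header, ZONE A and ZONE B."""
    return frame_length_bytes(bitrate, sample_rate) - TAG_OFFSET - ZONES_A_AND_B


print(frame_length_bytes(128000, 44100), zone_c_bytes(128000, 44100))
print(frame_length_bytes(128000, 48000), zone_c_bytes(128000, 48000))
```

For a 128 kbit/s frame this reproduces the 417 and 384 byte frame sizes and the 241 and 208 free bytes quoted above.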

frame at a 48.0kHz samplerate:
 

0000: FF FB 94 64-00 00 00 00-00 00 00 00-00 00 00 00  ...d............
0010: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0020: 00 00 00 00-58 69 6E 67-00 00 00 0F-00 00 00 7E  ....Xing.......~
0030: 00 00 30 C0-00 04 06 08-0C 0E 10 12-16 18 1A 1C  ..0.............
0040: 21 23 25 27-2B 2D 2F 31-35 37 39 3B-3F 41 43 47  !#%'+-/1579;?ACG
0050: 49 4B 4D 51-53 55 57 5B-5D 5F 62 66-68 6A 6C 70  IKMQSUW[]_bfhjlp
0060: 72 74 76 7A-7C 7E 80 84-86 88 8C 8E-90 92 96 98  rtvz|~..........
0070: 9A 9C A1 A3-A5 A7 AB AD-AF B1 B5 B7-B9 BB BF C1  ................
0080: C3 C7 C9 CB-CD D1 D3 D5-D7 DB DD DF-E2 E6 E8 EA  ................
0090: EC F0 F2 F4-F6 FA FC FE-00 00 00 58-4C 41 4D 45  ...........XLAME
00A0: 33 2E 38 38-20 28 62 65-74 61 29 00-00 00 00 00  3.88 (beta).....
00B0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00C0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00D0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00E0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
00F0: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0100: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0110: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0120: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0130: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0140: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0150: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0160: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................
0170: 00 00 00 00-00 00 00 00-00 00 00 00-00 00 00 00  ................

 



Suggested Info Tag extension fields + layout :
 
REMARK:

In the Info Tag, the "Xing" identification string (mostly at 0x24) of the header is replaced by "Info" in case of a CBR file.
This was done to prevent CBR files from being recognized as traditional Xing VBR files by some decoders. Although the two identification strings "Xing" and "Info" are both valid, it is suggested that you keep the identification string "Xing" in case of a VBR bitstream in order to keep compatibility.

now:

LAME VBR & ABR: "Xing"
LAME CBR: "Info"

 
 
 
 
 
 
 
 
                                        00h  00h  00h  $9B  "L"  "A"  "M"  "E"
$A0  "."  $A2  $A3  $A4  $A5  $A6  $A7  $A8  $A9  $AA  $AB  $AC  $AD  $AE  $AF
$B0  $B1  $B2  $B3  $B4  $B5  $B6  $B7  $B8  $B9  $BA  $BB  $BC  $BD  $BE  $BF


byte $9B  VBR Quality 

This field is there to indicate a quality level, although the scale was not specified in the original Xing specifications.

In case of Lame, the meaning is the following:

int    Quality = (100 - 10 * gfp->VBR_q - gfp->quality)h

examples:

V0 and q0 = 100 - 10 * 0 - 0 = 100 => 64h
V0 and q2 = 100 - 10 * 0 - 2 = 98 => 62h
V2 and q5 = 100 - 10 * 2 - 5 = 75 => 4Bh
V9 and q9 = 100 - 10 * 9 - 9 = 1 => 01h
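
The formula and examples above can be reproduced with a minimal sketch. The parameter names mirror gfp->VBR_q and gfp->quality; the function name is an illustrative assumption, not something LAME defines.

```python
def vbr_quality_byte(vbr_q, quality):
    """Value stored at $9B for a given -V (vbr_q) and -q (quality) setting."""
    return 100 - 10 * vbr_q - quality


for vbr_q, quality in [(0, 0), (0, 2), (2, 5), (9, 9)]:
    value = vbr_quality_byte(vbr_q, quality)
    print(f"V{vbr_q} and q{quality} = {value} => {value:02X}h")
```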


bytes $9C-$A4  Encoder short VersionString 


9 characters

examples:

"LAME3.90a" : LAME version 3.90 alpha
"GOGO3.02b" : GOGO version 3.02 beta


byte $A5  Info Tag revision + VBR method 


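two 4-bit fields: the 4 most significant bits hold the Info Tag revision, the 4 least significant bits hold the VBR method.

In case of Lame, the meaning of the VBR method is the following:
2: abr
3: vbr old / vbr rh
4: vbr mtrh
5: vbr mt

examples:

byte $A5 = 03h

= 0000 0011b => Info Tag revision 0, VBR method 3 (vbr old / vbr rh)

byte $A5 = 35h

= 0011 0101b => Info Tag revision 3, VBR method 5 (vbr mt)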
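
Splitting byte $A5 into its two 4-bit fields might look like the sketch below. It assumes the revision sits in the upper nibble and the VBR method in the lower nibble (the order in which the field title lists them); the names are illustrative, and only the four method values listed above are mapped.

```python
# VBR method names as listed in this section; other values are left unmapped.
VBR_METHODS = {2: "abr", 3: "vbr old / vbr rh", 4: "vbr mtrh", 5: "vbr mt"}


def split_revision_method(byte_a5):
    """Return (tag revision, VBR method) from byte $A5.

    Assumption: upper nibble = revision, lower nibble = VBR method.
    """
    return byte_a5 >> 4, byte_a5 & 0x0F


for raw in (0x03, 0x35):
    revision, method = split_revision_method(raw)
    print(f"{raw:02X}h: revision {revision}, method {VBR_METHODS.get(method, 'unlisted')}")
```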


byte $A6  Lowpass filter value 

int    lowpass = (lowpass value) / 100

range: 01h = 01d : 100Hz -> FFh = 255d : 25500Hz

value 00h => unknown

examples:

byte $A6 = C3h

C3h = 195d : 19500Hz

byte $A6 = 78h

78h = 120d : 12000Hz
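
As a small sketch of the conversion, with illustrative function names and integer division assumed when encoding:

```python
def lowpass_byte(lowpass_hz):
    """Encode a lowpass frequency in Hz as byte $A6 (assumes truncation)."""
    return min(lowpass_hz // 100, 0xFF)


def lowpass_hz(byte_a6):
    """Decode byte $A6; 00h means the lowpass value is unknown."""
    return None if byte_a6 == 0 else byte_a6 * 100


print(lowpass_hz(0xC3), lowpass_hz(0x78), lowpass_hz(0x00))
```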


bytes $A7-$AE  Replay Gain 

as defined here: http://www.david.robinson.org/replaylevel/ by David Robinson

three fields:


byte $AF  Encoding flags + ATH Type 


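two 4-bit fields: the 4 most significant bits hold the encoding flags, the 4 least significant bits hold the ATH type.

examples:

byte $AF = 03h

= 0000 0011b => encoding flags 0000b, ATH type 3

byte $AF = 15h

= 0001 0101b => encoding flags 0001b, ATH type 5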


byte $B0  if ABR {specified bitrate} else {minimal bitrate}


IF the file is an ABR file:

range: 01h = 01d : 1 kbit/s (--abr 1)  -> FFh = 255d : 255 kbit/s or larger (--abr 255)

value 00h => unknown

examples:

byte $B0 = C3h

C3h = 195d : --abr 195

byte $B0 = 78h

78h = 120d : --abr 120

byte $B0 = FEh

FEh = 254d : --abr 254

byte $B0 = FFh

FFh = 255d : --abr 255 or higher, eg: --abr 280

IF the file is NOT an ABR file: (CBR/VBR)

the (CBR)/(minimal VBR (-b)) bitrate is stored here 8-255. 255 if bigger.
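
The saturation at 255 described for both cases could be written as follows. This is a sketch with an illustrative function name; mapping non-positive input to 00h (unknown) is an assumption.

```python
def bitrate_byte(kbps):
    """Clamp a bitrate in kbit/s to the single byte stored at $B0."""
    if kbps <= 0:
        return 0x00  # 00h is documented as "unknown"; this mapping is assumed
    return min(kbps, 0xFF)


for kbps in (195, 254, 255, 280):
    print(f"{kbps} kbit/s -> {bitrate_byte(kbps):02X}h")
```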

examples:


bytes $B1-$B3  Encoder delays 


store in 3 bytes:

[xxxxxxxx][xxxxyyyy][yyyyyyyy]

the 12 bit values (0-4095) of how many samples were added at start (encoder delay) in X and how many 0-samples were padded at the end in Y to complete the last frame.

so ideally you could do: #frames*(#samples/frame)-(these two values) = exact number of samples in original wav.

so worst case scenario you'd have a 48kHz file which would give it a range of 0.085s at the end and at the start.

example:
[01101100][00010010][11010010]

X = (011011000001)b = (1729)d, so 1729 samples is the encoder delay
Y = (001011010010)b = (722)d, so 722 samples have been padded at the end of the file
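
The 12+12 bit packing can be sketched as below. The function names are illustrative, and the value of 1152 samples per frame (MPEG-1 Layer III) in the usage line is not stated in this section.

```python
def pack_delays(delay, padding):
    """Pack two 12-bit values into [xxxxxxxx][xxxxyyyy][yyyyyyyy]."""
    if not (0 <= delay <= 4095 and 0 <= padding <= 4095):
        raise ValueError("both values must fit in 12 bits")
    return bytes([delay >> 4, ((delay & 0x0F) << 4) | (padding >> 8), padding & 0xFF])


def unpack_delays(raw):
    """Return (encoder delay X, end padding Y) from bytes $B1-$B3."""
    delay = (raw[0] << 4) | (raw[1] >> 4)
    padding = ((raw[1] & 0x0F) << 8) | raw[2]
    return delay, padding


def original_sample_count(frames, samples_per_frame, delay, padding):
    """#frames * (#samples/frame) - (these two values)."""
    return frames * samples_per_frame - delay - padding


x, y = unpack_delays(bytes([0b01101100, 0b00010010, 0b11010010]))
print(x, y, original_sample_count(100, 1152, x, y))
```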


byte $B4  Misc 

2 lsb: noise shaping, stored as a 2-bit field (0-3)
(00)b: noise shaping: 0
(01)b: noise shaping: 1
(10)b: noise shaping: 2
(11)b: noise shaping: 3
3 bits Stereo mode

msb first:

(000)b: (m)ono
(001)b: (s)tereo
(010)b: (d)ual
(011)b: (j)oint
(100)b: (f)orce
(101)b: (a)uto
(110)b: (i)ntensity
(111)b: (x)undefined / different
 

1 bit unwise settings used

(0)b: no
(1)b: yes (definition encoder side(*))

2 msb Source (not mp3) sample frequency

(00)b: 32kHz or smaller
(01)b: 44.1kHz
(10)b: 48kHz
(11)b: higher than 48kHz

(*)some settings were used which would likely damage quality in normal circumstances. (like disabling all use of the ATH or forcing only short blocks, -b192 ...)
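
Unpacking the four fields of byte $B4 might look like the sketch below. It assumes the fields occupy the byte in the order listed, from the 2 least significant bits (noise shaping) up to the 2 most significant bits (source sample frequency); all names are illustrative.

```python
STEREO_MODES = ["mono", "stereo", "dual", "joint",
                "force", "auto", "intensity", "undefined / different"]
SOURCE_FREQUENCIES = ["32kHz or smaller", "44.1kHz", "48kHz", "higher than 48kHz"]


def decode_misc(byte_b4):
    """Split byte $B4 into its fields (bit positions assumed as listed)."""
    return {
        "noise_shaping": byte_b4 & 0x03,                        # bits 0-1
        "stereo_mode": STEREO_MODES[(byte_b4 >> 2) & 0x07],     # bits 2-4
        "unwise_settings": bool((byte_b4 >> 5) & 0x01),         # bit 5
        "source_frequency": SOURCE_FREQUENCIES[byte_b4 >> 6],   # bits 6-7
    }


print(decode_misc(0b01001101))
```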


byte $B5  MP3 Gain 


any mp3 can be amplified by a factor 2 ^ ( x * 0.25) in a lossless manner by a tool like e.g. mp3gain

byte $B5 is set to (00)h by default.

if this is done, this 8-bit field can be used to log that such a transformation happened, so that it can be undone at any given time.
 

WARNING:

Do NOT alter this field if you do not fully understand its use.  You will damage the ReplayGain fields and musicCRC if you do not implement this correctly.


You can only modify this field if
  1. the TagCRC checks out
  2. you update all three ReplayGain fields with the correct number of 1.5dB steps
  3. the TagCRC is updated after all this
Do NOT change/update the musicCRC.  It will be invalid after you change this mp3gain field.
If an application like mp3gain changes the main music frames of the mp3 then the musicCRC should be invalid.
The Lame Tag CRC should still be valid however (it could be updated by mp3gain).
Only tools like mp3gain should use this field, as it is made for making lossless adjustments to the mp3 after encoding is finished.  No need to support this in LAME or any decoder at all.

2^(a/4) , range of "a" here: -127..0..127
 
[byte $B5]b   a      dB change   amplification factor used was
[11111111]    -127   -190.5dB    0.000000000276883
...           ...
[10000011]    -3     -4.5dB      0.594603557501360533
[10000010]    -2     -3dB        0.707106781186547524
[10000001]    -1     -1.5dB      0.840896415253714543
[00000000]    0      0dB         1.0
[00000001]    1      +1.5dB      1.18920711500272107
[00000010]    2      +3dB        1.41421356237309505
[00000011]    3      +4.5dB      1.68179283050742909
[00000100]    4      +6dB        2.0
[00000101]    5      +7.5dB      2.37841423000544213
...           ...
[00011111]    31     +46.5dB     215.269482304950923
...           ...
[01111111]    127    +190.5dB    3611622602.83833951

the +-190.5dB range is too large, but there was little else to do with the extra bits, so for uniformity we took this range.
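
The bit patterns in the table suggest a sign-and-magnitude encoding (top bit = sign, lower 7 bits = magnitude of "a"). Under that reading, which is inferred from the table rather than stated explicitly, decoding byte $B5 could be sketched as follows; the function names are illustrative.

```python
def decode_gain_steps(byte_b5):
    """Return "a" from byte $B5, assuming sign bit + 7-bit magnitude."""
    magnitude = byte_b5 & 0x7F
    return -magnitude if byte_b5 & 0x80 else magnitude


def gain_factor(a):
    """Amplification factor 2^(a/4)."""
    return 2 ** (a / 4)


def gain_db(a):
    """dB change using the nominal 1.5dB per step of the table."""
    return a * 1.5


for raw in (0b11111111, 0b10000011, 0b00000000, 0b00000100, 0b00011111):
    a = decode_gain_steps(raw)
    print(f"[{raw:08b}] a={a} {gain_db(a):+.1f}dB factor={gain_factor(a)}")
```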


bytes $B6-$B7  Preset and surround info

2 most significant bits: unused

 

3 bits: surround info

0: no surround info
1: DPL encoding
2: DPL2 encoding
3: Ambisonic encoding
4-7: reserved

 

11 least significant bits: Preset used.

0: unknown/ no preset used
This allows a range of 2047 presets. With Lame we would use the value of the internal preset enum.
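
A sketch of splitting the 16-bit field follows. It assumes $B6 is the most significant byte (big-endian, as in the MusicLength examples of this document); the function name is illustrative.

```python
def decode_preset_surround(byte_b6, byte_b7):
    """Return (surround info, preset) from bytes $B6-$B7.

    Assumption: $B6 holds the most significant byte of the 16-bit value.
    """
    value = (byte_b6 << 8) | byte_b7
    surround = (value >> 11) & 0x07   # 3 bits below the 2 unused msb
    preset = value & 0x07FF           # 11 least significant bits
    return surround, preset


print(decode_preset_surround(0x13, 0xE8))
```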


bytes $B8-$BB  MusicLength 


32 bit integer field containing the exact length in bytes of the mp3 file originally made by LAME, excluding ID3 tag info at the end.

The first byte it counts is the first byte of this LAME Tag and the last byte it counts is the last byte of the last mp3 frame containing music.

Should be filelength at the time of LAME encoding, except when using ID3 tags.

practical example:
[misc+ID3v2 tag info][LAME Tag frame][complete mp3 music data][misc+ID3v1/2 tag info]

remark: applying any (ID3v2) kind of tagging or information in FRONT of the LAME/Xing Tag frame is a very bad idea.  You will disable the functionality of all decoders to read the tag info correctly. (for example: VBR mp3 seek info will no longer be usable)

range (1)d-(4,294,967,295)d [ or about 4294967295/(650*1024*1024)/320*1411 = 27.79 hours of 44.1kHz 320kbit/s music. ]

Musiclength not set / unknown / larger than 4G:
$B8 $B9 $BA $BB
00h 00h 00h 00h

use of this field: together with the next field deliver


Examples:
 
$B8 $B9 $BA $BB
(29)h (17)h (A3)h (62)h
would be (2917A362)h = (689,415,010)d bytes
$B8 $B9 $BA $BB
(00)h (3B)h (82)h (B5)h
would be (3B82B5)h = (3,900,085)d bytes
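
The examples read the four bytes most significant byte first. A minimal decoding sketch, with an illustrative function name:

```python
def music_length(raw):
    """Decode bytes $B8-$BB (big-endian); all zeros means not set / unknown."""
    if len(raw) != 4:
        raise ValueError("MusicLength is a 4-byte field")
    value = int.from_bytes(raw, "big")
    return None if value == 0 else value


print(music_length(bytes([0x29, 0x17, 0xA3, 0x62])))
print(music_length(bytes([0x00, 0x3B, 0x82, 0xB5])))
```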


bytes $BC-$BD  MusicCRC 


contains a CRC-16 of the complete mp3 music data as made originally by LAME. reason : this guarantees that the actual mp3 music data is intact, regardless of people adding ID3 tags to the end (or the start) of the file.

practical example:
[misc+ID3v2 tag info][LAME Tag frame][complete mp3 music data as made by LAME][misc+ID3v1/2 tag info]

remark: applying any (ID3v2) kind of tagging or information in FRONT of the LAME/Xing Tag frame is a very bad idea.  You will disable the functionality of all decoders to read the tag info correctly. (for example: VBR mp3 seek info will no longer be usable)

Meaning of this musicCRC:

"if the musicCRC is correct, then this file (or the music data in it) is identical to when encoded by LAME"

It does not say:

This will enable:

CRC-16 should suffice since, in the event of total randomness, on average only one in every 65536 defective files will be falsely identified as not being defective.  Also, CRC-16 routines are present in both mp3 encoders and decoders.

CRCInitValue := $0000;


bytes $BE-$BF  CRC-16 of Info Tag 


contains a CRC-16 of the first 190 bytes ($00-$BD) of the Info header frame. This field is calculated at the end, once all other fields are completed.

reason : safeguards LAME VBR header against easy tampering.  Improving the header functionality as quality control/verification tool for VBR files.

CRCInitValue := $0000;
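
This section only fixes the initial value ($0000) and the covered range ($00-$BD). The sketch below additionally assumes the common reflected CRC-16 with polynomial 0xA001 (often called CRC-16/ARC); the polynomial and the function names are assumptions, not something this draft states.

```python
def crc16(data, crc=0x0000):
    """Bitwise CRC-16 with CRCInitValue = $0000.

    Assumption: reflected polynomial 0xA001 (CRC-16/ARC).
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def info_tag_crc(frame):
    """CRC-16 of the first 190 bytes ($00-$BD) of the Info header frame."""
    if len(frame) < 0xBE:
        raise ValueError("frame is shorter than 190 bytes")
    return crc16(frame[:0xBE])


print(f"{crc16(b'123456789'):04X}")
```

Computing the value last, once every other field is final, matches the order described above.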


Remarks / Ideas: