Disc IDs and Tagging
ID3v1
ID3v1 has no way to hold TOC data of any kind, so it is of little use here. EAC can automatically add the CDDB discid of the CD it is ripping to the comment field, but in my experience this is rare in files that might be downloaded.
ID3v2
ID3v2 does fairly well thanks to its MCDI frame, which can store the binary version of the CD's TOC. Unfortunately, only three programs I know of use this frame (properly, or improperly):
- CDEx (supposedly someone patched it to work "right", but I'm not sure)
- A patched version of Jeremy Zawodny's discid (complete with support for hashing MB discids, as well as CDDB1 discids)
I made the latter, and built it on the SCSI MMC command set's default TOC query response (format 0000b), which details exactly what the CD TOC query should return. The others most likely do the same, but I am not sure.
Basic format in pseudo-C code (all are big-endian, MSB first):
    struct cdtoc {
        uint16_t toc_data_length;
        uint8_t  first_track_number;
        uint8_t  last_track_number;
        /* The following fields are repeated once per track on
         * the CD, and then one extra time for the lead-out. */
        uint8_t  reserved1;     /* This should be 0 */
        uint8_t  adr_ctrl;      /* First 4 bits for the ADR data,
                                 * last 4 bits for Control data */
        uint8_t  track_number;  /* This is 0xAA for the lead-out */
        uint8_t  reserved2;     /* This should be 0 */
        uint32_t lba_address;   /* NOT MSF. */
        /* The lba_address may be misapplied in my hack of discid
         * (off by 150 frames, track 1 starts at LBA 0), but I don't
         * have a way to verify that this is the case. */
    };
Please correct me (email: pipian_at_pipian_dott_com) if my code is indeed wrong about the LBA. Can anyone with low-level ATAPI experience confirm? I may look in the Linux source to double-check.
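As a rough illustration of the layout above, here is a minimal sketch of reading such a frame body in C. The names `parse_mcdi`, `toc_entry` and `MCDI_MAX_ENTRIES` are made up for this example and do not come from any of the programs listed. It assumes that `toc_data_length` counts the bytes that follow the length field itself (so 2 + 8 per descriptor), which is my reading of the MMC response format.

```c
#include <stddef.h>
#include <stdint.h>

#define MCDI_MAX_ENTRIES 100  /* up to 99 tracks plus the lead-out */

/* One decoded track descriptor (hypothetical type for this sketch). */
struct toc_entry {
    uint8_t  adr;      /* upper 4 bits of the ADR/Control byte */
    uint8_t  control;  /* lower 4 bits of the ADR/Control byte */
    uint8_t  track;    /* 0xAA marks the lead-out */
    uint32_t lba;      /* start address exactly as stored in the frame */
};

/* Read a 32-bit big-endian value. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the number of descriptors parsed (lead-out included),
 * or -1 if the buffer does not look like a well-formed TOC. */
int parse_mcdi(const uint8_t *buf, size_t len,
               uint8_t *first, uint8_t *last,
               struct toc_entry *out, int max_out)
{
    size_t toc_len;
    int n, i;

    if (len < 4)
        return -1;
    /* Assumption: the length excludes its own two bytes. */
    toc_len = ((size_t)buf[0] << 8) | buf[1];
    if (toc_len < 2 || toc_len + 2 > len || (toc_len - 2) % 8 != 0)
        return -1;
    n = (int)((toc_len - 2) / 8);
    if (n > max_out)
        return -1;

    *first = buf[2];
    *last  = buf[3];
    for (i = 0; i < n; i++) {
        const uint8_t *d = buf + 4 + 8 * (size_t)i;
        out[i].adr     = d[1] >> 4;
        out[i].control = d[1] & 0x0F;
        out[i].track   = d[2];
        out[i].lba     = read_be32(d + 4);
    }
    return n;
}
```

The parser deliberately leaves the LBA untouched, since whether a 150-frame correction is needed is exactly the open question above.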
As a result, it's easy to calculate either DiscID (CDDB or MusicBrainz) with this method.
In addition to the MCDI tag, I've proposed several possible TXXX frames for storing disc identifiers (MusicBrainz CDIndex, CDDB Discid) in files that don't have the MCDI data.
Of note, however, is that the MCDI tag does not implicitly support storing both the standard TOC and the multi-session TOC (of a similar structure, but used to denote the beginning and end of sessions). As a result, it is a lossy mapping from the full TOC of Enhanced CDs, and the MusicBrainz DiscID may not necessarily be recoverable from the MCDI alone (although in many cases it is). It might be a good idea to stuff this alternate TOC into a PRIV tag.
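To make that concrete, here is a small sketch of the classic CDDB1 discid calculation from a list of track start addresses. The function name `cddb1_discid` is invented for this example. It assumes the addresses are LBAs with track 1 at or near 0, so 150 frames (the two-second pregap) are added before converting to seconds. If the stored values already include that offset, the `+ 150` should be dropped.

```c
#include <stdint.h>

/* Sum of the decimal digits of n. */
static unsigned digit_sum(unsigned n)
{
    unsigned s = 0;
    while (n > 0) {
        s += n % 10;
        n /= 10;
    }
    return s;
}

/* lba[0..ntracks-1] are the track start addresses and lba[ntracks]
 * is the lead-out. Assumption: values are LBAs, so add 150 frames
 * to get the MSF-style frame offsets that CDDB works with. */
uint32_t cddb1_discid(const uint32_t *lba, int ntracks)
{
    unsigned checksum = 0;
    unsigned total_seconds;
    int i;

    /* Checksum: digit sums of every track's start time in seconds. */
    for (i = 0; i < ntracks; i++)
        checksum += digit_sum((unsigned)((lba[i] + 150) / 75));

    /* Playing time: lead-out start minus first track start, in seconds. */
    total_seconds = (unsigned)((lba[ntracks] + 150) / 75)
                  - (unsigned)((lba[0] + 150) / 75);

    return ((uint32_t)(checksum % 255) << 24) |
           ((uint32_t)total_seconds << 8) |
           (uint32_t)ntracks;
}
```

For a made-up three-track disc with tracks at LBA 0, 15000 and 30000 and the lead-out at 45000, the start times are 2, 202 and 402 seconds (digit sums 2, 4 and 6), the playing time is 600 seconds, and the result is 0x0C025803. The MusicBrainz DiscID needs SHA-1 and a Base64 variant on top of the same offsets, so it is left out of this sketch.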
Vorbis
At this time, Vorbis has no way to store binary TOC data for hash calculation. Due to the UTF-8 restriction, the only major possibility would be to concatenate the offsets with "+"s, similar to how Microsoft does it in WMA (explained below).
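A text rendition of that kind could be produced along these lines. This is only a sketch of one possible layout, not an existing Vorbis comment convention: the function name `toc_to_text` is hypothetical, and writing the track count first and using decimal offsets are my own assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "ntracks+offset1+...+leadout" in decimal into out.
 * lba[0..ntracks-1] are track starts, lba[ntracks] is the lead-out.
 * Returns the string length, or -1 if the buffer is too small. */
int toc_to_text(const uint32_t *lba, int ntracks, char *out, size_t cap)
{
    size_t used;
    int n, i;

    n = snprintf(out, cap, "%d", ntracks);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (i = 0; i <= ntracks; i++) {
        n = snprintf(out + used, cap - used, "+%lu", (unsigned long)lba[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

The resulting string is plain ASCII, so it is valid UTF-8 and could be stored as the value of an ordinary comment field.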
APEv2
APEv2 likewise has no such method. Considering that its designers harshly disapprove of binary data, a text rendition would again be necessary.
AAC/iTunes
iTunes seems to have two different ways of holding CDDB IDs. I've not quite figured out what determines which frame gets written. Even though its tagging is essentially ID3v2 ported to the Atomic format, the MCDI frame was not ported over.
Here is what we do know:
(eng)iTunes_CDDB_IDs
: This is short and sweet, and seems to be much more common. It is a string with three fields, joined with "+"s. The first field is the decimal representation of the number of tracks. The second field seems to be the ASCII representation of a 16-byte MD5 hash. The third field may be a decimal length of some sort. The whole string is believed to be the CDDB2 (Gracenote) hash.
(eng)iTunes_CDDB_1 and (eng)iTunes_CDDB_TrackNumber
: These seem to be rarer, and it is unknown what creates them (perhaps a CD that cannot be found in Gracenote?). This one is much more workable, as it actually contains an ASCII representation of the TOC. It appears to follow the normal CDDB layout: the Discid first, then the number of tracks, and then the offset of each track in LBA form up to the lead-out, all joined with "+"s.
All told, I highly doubt iTunes will really be helpful to us for tagging purposes.
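If the layout really is as guessed above, splitting such a string could look like the following sketch. The names `itunes_cddb1` and `parse_itunes_cddb1` are invented for this example. It assumes the Discid is written in hexadecimal and the remaining fields in decimal, which the observations above do not confirm.

```c
#include <stdint.h>
#include <stdlib.h>

/* Decoded fields (hypothetical type for this sketch). */
struct itunes_cddb1 {
    uint32_t discid;        /* assumed to be hexadecimal in the string */
    int      ntracks;
    int      noffsets;      /* track offsets plus the lead-out, if present */
    uint32_t offsets[100];
};

/* Parses "DISCID+ntracks+offset+...+leadout".
 * Returns 0 on success, -1 on malformed input. */
int parse_itunes_cddb1(const char *s, struct itunes_cddb1 *out)
{
    char *end;
    long n;

    out->discid = (uint32_t)strtoul(s, &end, 16);
    if (end == s || *end != '+')
        return -1;

    s = end + 1;
    n = strtol(s, &end, 10);
    if (end == s || n < 1 || n > 99)
        return -1;
    out->ntracks = (int)n;

    out->noffsets = 0;
    while (*end == '+') {
        unsigned long v;
        if (out->noffsets >= 100)
            return -1;
        s = end + 1;
        v = strtoul(s, &end, 10);   /* assumed decimal */
        if (end == s)
            return -1;
        out->offsets[out->noffsets++] = (uint32_t)v;
    }
    return *end == '\0' ? 0 : -1;
}
```

A made-up input such as "0C025803+3+0+15000+30000+45000" (not taken from a real file) would yield three tracks and four offsets, the last being the lead-out.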
WMA
WMA strings (of which the WM/MCDI frame is one) are all encoded in little-endian UTF-16 (in the typical Windows fashion).
The WM/MCDI frame resembles the MCDI frame of ID3v2 in many respects. However, it is a UTF-16 string of hexadecimal values rather than binary data: the number of tracks, followed by the track offsets in hexadecimal.
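Reading such a value could be sketched as follows. The function name `parse_wm_mcdi` is hypothetical. The sketch assumes the fields are joined with "+" (as mentioned in the Vorbis section) and that the string may or may not carry a terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes a little-endian UTF-16 string of "+"-separated hexadecimal
 * fields into vals[]. vals[0] is the track count, the rest are offsets.
 * Returns the number of fields read, or -1 on malformed input. */
int parse_wm_mcdi(const uint8_t *utf16le, size_t nbytes,
                  uint32_t *vals, int max_vals)
{
    uint32_t cur = 0;
    int count = 0, have_digit = 0;
    size_t i;

    for (i = 0; i + 1 < nbytes; i += 2) {
        /* Assemble one UTF-16 code unit, low byte first. */
        unsigned c = (unsigned)utf16le[i] | ((unsigned)utf16le[i + 1] << 8);
        unsigned d;

        if (c == 0)
            break;                          /* optional terminator */
        if (c >= '0' && c <= '9') {
            d = c - '0';
        } else if (c >= 'A' && c <= 'F') {
            d = c - 'A' + 10;
        } else if (c >= 'a' && c <= 'f') {
            d = c - 'a' + 10;
        } else if (c == '+') {
            /* Field separator: store the value collected so far. */
            if (!have_digit || count >= max_vals)
                return -1;
            vals[count++] = cur;
            cur = 0;
            have_digit = 0;
            continue;
        } else {
            return -1;                      /* unexpected character */
        }
        cur = (cur << 4) | d;
        have_digit = 1;
    }

    if (!have_digit || count >= max_vals)
        return -1;
    vals[count++] = cur;
    return count;
}
```

For example, the UTF-16 text "3+96+3A98" decodes to the values 3, 150 and 15000.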
Summary
ID3v2 is the most helpful of the tagging formats, with WMA coming in second. iTunes can be helpful at times, but not often, and ID3v1, APEv2, and Vorbis comments are not helpful at all.