About SourceSafe physical file names and file numbers

If you have worked with SourceSafe long enough, you have probably heard about the physical file names associated with the files and projects in a SourceSafe database. The SourceSafe command-line command ss physical displays the physical file name associated with a logical file or folder in the VSS database.

If you look at the free SourceSafe tools on the www.ezds.com site, you'll find the SSNPL utility, whose description says that every SourceSafe file has a number, a physical file name, and a logical file name; ssnpl converts from one to the other.

If you have asked yourself what the number is good for, the answer is simple: when storing a reference to a physical file in another database file, VSS uses the file number instead of the physical file name, because it requires less space on disk.

So the remaining question is: how does VSS generate these physical file names, and how is the file number calculated?

Let's start our investigation by creating a new database ("mkss.exe C:\temp\vss"). The database is initially empty, and the data folder contains only one file, data\a\aaaaaaaa. If you run the "ss.exe physical $/" command, you'll find that the physical name AAAAAAAA is used by the database root.

From this MSDN article you'll find that the content of the data\aaaaaaaa.cnt file in the SourceSafe database reflects the physical name of the last file added to the database. Indeed, the initial content of this file is 'AAAAAAAA', a sign that the last created file or folder was the database root.

Let's add a file into the database. You'll see that the content of the aaaaaaaa.cnt file is now BAAAAAAA, and SourceSafe has created a new file on disk, data\b\baaaaaaa.

Let's continue adding new files to the database one by one, each time looking at the content of aaaaaaaa.cnt to see how the new physical files are named. It's easy to see that the files are named BAAAAAAA, CAAAAAAA, DAAAAAAA, etc., each file ending up in the data subfolder identified by the first letter of the filename (data\b\baaaaaaa, data\c\caaaaaaa, data\d\daaaaaaa, etc.). After the last letter of the alphabet is reached (ZAAAAAAA), the first letter wraps back to A and the second letter advances: ABAAAAAA, BBAAAAAA, CBAAAAAA, etc., with files being created on disk in data\a\abaaaaaa, data\b\bbaaaaaa, data\c\cbaaaaaa, etc.
So, the SourceSafe naming scheme and location of physical files effectively work as a 26-way hash table, keyed by the first letter of the physical name, that distributes the files on disk across the a-z subfolders of the data folder.
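
The folder layout described above can be sketched in a few lines of Python. This is only an illustration of the observed pattern (data\<first letter>\<name>), not code taken from SourceSafe; the function name physical_path is made up for this example.

```python
from pathlib import PureWindowsPath


def physical_path(data_dir: str, physical_name: str) -> PureWindowsPath:
    """Return where a physical file is expected to live on disk.

    Hypothetical helper: it only encodes the pattern observed above,
    where each file lands in the subfolder named after the first
    letter of its physical name.
    """
    name = physical_name.lower()
    if len(name) != 8 or not all("a" <= ch <= "z" for ch in name):
        raise ValueError(f"not an 8-letter physical name: {physical_name!r}")
    return PureWindowsPath(data_dir) / name[0] / name


if __name__ == "__main__":
    for physical in ("AAAAAAAA", "BAAAAAAA", "ABAAAAAA"):
        print(physical_path(r"C:\temp\vss\data", physical))
```

Because the first letter is the one that changes with every new file, consecutive files are spread evenly across the 26 subfolders.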

If you run the SSNPL tool on the files added to the database ("ssnpl.exe $/ C:\temp\vss\data", etc.), you'll see that their numbers are sequential: 0 (the database root), then 1, 2, 3, ..., 25, 26, 27, etc. for each added file.

It's fairly easy now to deduce how the numbering scheme works:

  • each file added to the database gets the next available number in sequence: 0, 1, 2, etc.
  • the physical file names are composed of 8 letters in the range A-Z: [L0][L1][L2][L3][L4][L5][L6][L7]
  • the number associated with a file is: (L0 - 'A') + 26 * (L1 - 'A') + 26^2 * (L2 - 'A') + 26^3 * (L3 - 'A') + ... + 26^7 * (L7 - 'A')

Basically, the physical file name is a base-26 representation of the file number, written with the least significant digit first, with each base-26 digit represented in the A-Z range instead of 0-9A-P.

You should now be able to convert easily between file numbers and physical file names.
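
For illustration, here is a minimal Python sketch of that conversion, based only on the formula deduced above. The function names number_to_name and name_to_number are my own and have nothing to do with how ssnpl is implemented internally.

```python
NAME_LENGTH = 8   # physical names are 8 letters long
BASE = 26         # one base-26 digit per letter, A = 0 ... Z = 25


def number_to_name(number: int) -> str:
    """Convert a file number to its physical name (least significant letter first)."""
    if not 0 <= number < BASE ** NAME_LENGTH:
        raise ValueError(f"file number out of range: {number}")
    letters = []
    for _ in range(NAME_LENGTH):
        number, digit = divmod(number, BASE)
        letters.append(chr(ord("A") + digit))
    return "".join(letters)


def name_to_number(name: str) -> int:
    """Convert a physical name back to its file number."""
    name = name.upper()
    if len(name) != NAME_LENGTH or not all("A" <= ch <= "Z" for ch in name):
        raise ValueError(f"not an 8-letter physical name: {name!r}")
    # (L0 - 'A') + 26 * (L1 - 'A') + 26^2 * (L2 - 'A') + ...
    return sum((ord(ch) - ord("A")) * BASE ** i for i, ch in enumerate(name))


if __name__ == "__main__":
    for n in (0, 1, 25, 26, 27):
        print(n, number_to_name(n))
    print(name_to_number("CBAAAAAA"))
```

With this sketch, 0 maps to AAAAAAAA (the database root), 1 to BAAAAAAA, 25 to ZAAAAAAA and 26 to ABAAAAAA, which matches the names observed while adding files to the test database.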

As for the mapping between physical file names and logical paths in the database, that is a bit more complex, so I'll leave it for another time...
