The RetroPlatform framework (as used in
players like Amiga Forever and C64 Forever)
employs a score-based mechanism to try to
identify unknown content by analyzing its
media.
Recognition is desirable because it allows
the player to display the proper publisher
name, title, etc., and to preset an
appropriate configuration.
The database of known titles is called the
RetroPlatform Library, and it may be either a
local "Library.rp-lib" file or a live service.
When installed, the file may be at a location
like "C:\ProgramData\Cloanto\RetroPlatform".
After a library update, the previous version
is preserved as "Library.rp-lib.backup".
A numerical score is used to express
similarity between media image files and
known data stored in the RetroPlatform
Library. A value of 100 indicates "full
match", while 0 indicates "completely
unrecognized".
In a first step, the media is divided
into "slices", for each of which a SHA-1
checksum is calculated. Floppy disks, tapes
and executables are sliced into 10 parts. CD
images (e.g. ISO and CUE+ISO/WAV/MP3) are
sliced into 9 parts for the first data
track, plus one part covering all
additional tracks (usually audio). Some data,
such as certain boot block and root block
properties, is processed separately. If the
checksums of all 10 slices match an image in
the database, the recognition score is set
to 100 and the medium is considered to be
the same as the reference one.
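A minimal Python sketch of the slicing step, for illustration only. The function names are hypothetical, and the text does not say how slice boundaries are computed: equal-sized slices via ceiling division, and hashing the concatenation of all additional CD tracks as one slice, are assumptions.

```python
import hashlib


def slice_hashes(data: bytes, parts: int = 10) -> list[str]:
    """Split data into `parts` contiguous slices and return each slice's SHA-1 hex digest.

    Assumption: equal-sized slices; ceiling division makes sure every byte
    falls into some slice (trailing slices may be shorter or empty).
    """
    size = -(-len(data) // parts)  # ceiling division
    return [hashlib.sha1(data[i * size:(i + 1) * size]).hexdigest() for i in range(parts)]


def cd_slice_hashes(first_data_track: bytes, other_tracks: list[bytes]) -> list[str]:
    """9 slices for the first data track, plus 1 slice for all additional tracks.

    Assumption: the additional tracks are concatenated and hashed as a single slice.
    """
    hashes = slice_hashes(first_data_track, parts=9)
    hashes.append(hashlib.sha1(b"".join(other_tracks)).hexdigest())
    return hashes
```

Both helpers return 10 digests, so a floppy, tape or CD image can be compared slice by slice against a library entry in the same way.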
If fewer than 10 slices match, each
matching slice is assigned 9 points, and
1 additional point is assigned for each of
certain matching properties, such as volume
name, creation date and modification date.
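The scoring rule can be sketched as follows, assuming the per-slice digests are already available and that the separately checked properties arrive as a simple count of matches. The function name, the parameters, and the cap at 99 for partial matches are hypothetical, not taken from the player.

```python
def recognition_score(candidate: list[str], reference: list[str],
                      matching_properties: int) -> int:
    """Score a medium against one library entry.

    candidate/reference: per-slice SHA-1 digests (10 each).
    matching_properties: number of separately checked properties
    (e.g. volume name, creation date, modification date) that agree.
    """
    if len(candidate) != len(reference):
        raise ValueError("slice lists must have the same length")
    matched = sum(c == r for c, r in zip(candidate, reference))
    if matched == len(reference):
        return 100  # all slices identical: full match
    score = 9 * matched + 1 * matching_properties  # 9 points per slice, 1 per property
    return min(score, 99)  # assumption: a partial match never reaches 100
```

This sketch does not model blank-slice exclusion, image-format mapping, or any normalization the player may apply, so its numbers need not coincide with the scores the player reports.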
For different Amiga disk image formats
(e.g. ADF, DMS, etc.), the player tries to
map the image to a common
(non-copy-protected) format.
The mechanism of dividing media into
"slices" is highly effective for fixed-size
media like floppy disk images, because it
can help recognize the media even after
small changes to the content. For best
results, portions of the disk which are
recognized as blank do not contribute to the
score (or else two formatted disks with only
a few small files each would have a high
similarity even if the files were
different).
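One way to exclude blank portions is to skip slices that are blank in both images before counting matches. The sketch assumes a slice is blank when every byte has the same value (e.g. all zeros); the player's actual blank detection is not described here and may well differ. The function names are hypothetical.

```python
import hashlib


def is_blank(chunk: bytes) -> bool:
    """Assumption: a slice counts as blank if all of its bytes have the same value."""
    return len(set(chunk)) <= 1


def matching_nonblank_slices(candidate: list[bytes],
                             reference: list[bytes]) -> tuple[int, int]:
    """Return (matching, considered), skipping slices that are blank in both images."""
    matching = considered = 0
    for c, r in zip(candidate, reference):
        if is_blank(c) and is_blank(r):
            continue  # two empty regions say nothing about whether the titles match
        considered += 1
        if hashlib.sha1(c).digest() == hashlib.sha1(r).digest():
            matching += 1
    return matching, considered
```

Without the `continue`, two freshly formatted disks holding a few different small files would share most of their slice digests and look like near-duplicates.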
Variable-size content, such as tape and
CD images, has no empty portions, so no
blank-area processing is required. For such
content the slicing mechanism is less
effective, but still at least as good as
other techniques (e.g. CRC-32 over the full
content).
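For variable-size content, the sketch below contrasts per-slice SHA-1 digests with a single CRC-32 over the whole file (via Python's `zlib.crc32`). It assumes the same naive equal-split slicing used for fixed-size images, which is an assumption. Under that scheme an in-place edit leaves most slices intact, while any change in length moves every slice boundary; that is one plausible reason the mechanism is less effective for such content. The "tape" data is synthetic.

```python
import hashlib
import zlib


def slice_hashes(data: bytes, parts: int = 10) -> list[bytes]:
    size = -(-len(data) // parts)  # ceiling division (assumed boundary rule)
    return [hashlib.sha1(data[i * size:(i + 1) * size]).digest() for i in range(parts)]


def matching(a: bytes, b: bytes) -> int:
    """Number of slice positions whose SHA-1 digests agree."""
    return sum(x == y for x, y in zip(slice_hashes(a), slice_hashes(b)))


# Deterministic, non-repeating stand-in for a 32,000-byte tape image.
tape = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(1000))

patched = bytearray(tape)
patched[100] ^= 0xFF            # same length, one byte changed inside the first slice
patched = bytes(patched)

extended = tape + b"\x00" * 7   # content grows, so every slice boundary shifts

print(zlib.crc32(tape) == zlib.crc32(patched))  # False: a full-content CRC-32 sees any change
print(matching(tape, patched))                   # 9: the other nine slices are untouched
print(matching(tape, extended))                  # 0: every boundary moved
```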
As a practical example applied to a floppy
disk: if a small high score file is written
to a game disk, it is likely to affect only
1 of the 10 SHA-1 checksums ("slices"), and
possibly some fields in the root block.
Overall, the modified disk will still have a
similarity score of about 90 compared to the
original disk.
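The floppy example can be reproduced with a synthetic image the size of a standard double-density Amiga disk (80 cylinders, 2 heads, 11 sectors of 512 bytes, i.e. 901,120 bytes). The filler content, the 64-byte "high score" record and its offset are all made up, and equal-sized slicing is assumed; the point is only that a small write lands in a single slice. Root block changes, which are scored separately, are not modeled.

```python
import hashlib

ADF_SIZE = 80 * 2 * 11 * 512  # 901,120 bytes: standard double-density Amiga disk image


def slice_hashes(data: bytes, parts: int = 10) -> list[bytes]:
    size = -(-len(data) // parts)  # 90,112-byte slices for an ADF-sized image
    return [hashlib.sha1(data[i * size:(i + 1) * size]).digest() for i in range(parts)]


# Stand-in for an original game disk (deterministic filler, not real disk content).
original = bytes((i * 7 + i // 512) % 256 for i in range(ADF_SIZE))

# Hypothetical high score save: 64 bytes written at an arbitrary offset.
offset = 400_000
modified = bytearray(original)
modified[offset:offset + 64] = b"HISCORE!" * 8
modified = bytes(modified)

before, after = slice_hashes(original), slice_hashes(modified)
changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
print(changed)  # [4]: only the slice containing offset 400,000 differs
```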
The recognition logic used by player
tools such as RP9 Toolbox can be adjusted
under Tools/Options/Media Recognition
Scores. Default settings are:
- "Do not prompt" if the score is at least
70 points
- "Always prompt" if similar items are within
20 points
- Ignore media content if under 50
points
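The three default settings could be captured in a small configuration object, for example. The class and field names are hypothetical (they are not the option names used by RP9 Toolbox); the 1-99 and 0-100 ranges come from the descriptions in the following paragraphs, while the non-negative check on the similarity window is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionSettings:
    """Hypothetical mirror of the Media Recognition Scores options."""
    no_prompt_min_score: int = 70  # "Do not prompt" if the score is at least this
    similar_within: int = 20       # "Always prompt" if two matches are this close
    ignore_below: int = 50         # ignore media content (use file name) under this

    def __post_init__(self) -> None:
        if not 1 <= self.no_prompt_min_score <= 99:
            raise ValueError("no_prompt_min_score must be in 1-99")
        if self.similar_within < 0:  # assumption: the window cannot be negative
            raise ValueError("similar_within must be non-negative")
        if not 0 <= self.ignore_below <= 100:
            raise ValueError("ignore_below must be in 0-100")
```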
When opening or importing media, the
player may ask for confirmation if the score
is under a threshold that is considered
reliable for automatic recognition, even if
only one match is found. The default value
is 70 (valid range: 1-99). Regardless of
this value, when an exact match is found in
the library, the player always stops
searching for further matches and never
prompts for details.
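The single-match threshold and the exact-match shortcut can be sketched as a search over (title, score) pairs. The function, its input shape, and treating a score of 100 as the "exact match" signal are assumptions for illustration; disambiguation between several close matches is not handled here.

```python
from collections.abc import Iterable
from typing import Optional


def recognize(candidates: Iterable[tuple[str, int]],
              no_prompt_min_score: int = 70) -> tuple[Optional[str], int, bool]:
    """Return (best title, best score, whether to prompt for confirmation)."""
    best_title, best_score = None, -1
    for title, score in candidates:
        if score == 100:
            return title, score, False  # exact match: stop searching, never prompt
        if score > best_score:
            best_title, best_score = title, score
    # Below the threshold the player asks, even if there is only one match.
    return best_title, best_score, best_score < no_prompt_min_score
```

Because the loop returns on the first exact match, any remaining candidates in a lazy iterator are never consumed.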
Even a high score (e.g. above 70) may
require further disambiguation if more than
one match is found. The default score difference
between two items that are considered
similar is 20. So for example if one match
has a score of 78 and another one a score of
85, the player will prompt for manual
disambiguation (because both matches are
within 20 points of each other).
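A sketch of the disambiguation check, assuming "within 20 points" is inclusive and that only the two best scores need comparing; the function name is hypothetical.

```python
def needs_disambiguation(scores: list[int], similar_within: int = 20) -> bool:
    """Prompt when the best match has a rival within `similar_within` points."""
    if len(scores) < 2:
        return False
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up <= similar_within
```

With the example scores, `needs_disambiguation([78, 85])` returns `True`, since 85 - 78 = 7 is well inside the 20-point window.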
Another adjustable value indicates what score is always
considered too low for media recognition
purposes, causing the player to revert to an
analysis of the file name alone (e.g. RP9 or
TOSEC file name data). The default is 50.
The valid range is 0-100 (0 = "never use
file name", 100 = "never use library
data").
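The fallback threshold can be sketched as below. Reading "under 50" as a strict comparison, and treating a setting of 100 as a special case so that even a full match is ignored ("never use library data"), are assumptions about boundary behavior; the function name is hypothetical.

```python
def use_file_name_only(score: int, ignore_below: int = 50) -> bool:
    """True if the library score is too low and only file name data (e.g. RP9/TOSEC) is used."""
    if not 0 <= ignore_below <= 100:
        raise ValueError("ignore_below must be in 0-100")
    if ignore_below == 100:
        return True  # "never use library data", even for a full match (assumed reading)
    return score < ignore_below  # with 0 this is never true: "never use file name"
```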
Related Links