Cloanto Implementation of INI File Format
Abstract
This document specifies a text-based file format for representing software
configuration data in a format which is easily editable by humans and unambiguously
readable by a simple automatic parser.
Introduction
The format described here is an implementation of the ".INI File Format", as
defined in the Microsoft Windows for Workgroups Resource Kit. Since the original Microsoft
specification is generic, and implementations vary widely, we decided to document how the
INI files and parsers used in Cloanto applications work. As far as possible, existing
specifications and practices for character encoding and programming have been retained.
Where this document says that an application "should" implement a certain
feature in a certain way, it means that the Cloanto INI parser supports this feature as
recommended here. Where this document says that a certain style "should" be
avoided, but parsers "should" support it or ignore it, it means that authors of
INI files should not use that style (the risk being incompatibility with slightly
different INI parsers), but authors of parser software should support it or ignore it as
indicated, rather than generate an error condition. In the following, "INI"
refers to the file format as specified here.
Character Set
For maximum compatibility, the entire INI file (including section names, key names and data values)
should be encoded
using 8-bit characters in the ISO 8859-1 character set (of which 7-bit ASCII is a subset).
Application-specific string values may store data using other character sets, and even
binary data, as long as they are encoded as either ASCII or ISO 8859-1 text. ANSI C Escape
sequences are used for the encoding of special characters in strings.
Alternatively, if the risk of incompatibility with
other application scenarios (e.g. data exchange with other applications,
driver INF files, AutoRun, etc.) can be excluded, the INI file may be
encoded as a Unicode stream, inclusive of an initial byte order mark (BOM)
that describes the encoding (UTF-8, UTF-16 or UTF-32) and endianness (big
endian or little endian).
Known applications that write Unicode INI files include Software Director and the
Windows Registry Editor. Both write little-endian UTF-16 data.
Also see the section "Defensive INI: Detecting Unicode, XML, JSON" in this document.
File Names
If the INI file is to be stored or used on a Microsoft Windows system, the following
characters should not be used in the file name, as they are not supported by older
file systems: "\", "/", ":", "*", "?", """ (double quote character), "<", ">" and "|".
Additionally, certain file names are supported at the file system
level, and can be accessed by programs, but they are not accessible from certain shells
and file manipulation programs. These names include: "COMx", "AUX", "LPTx", "PRN",
"NUL" and "CON", and all variants with a "dot suffix". File names entirely
consisting of dots (".", "..", etc.) are also "problem files".
Where compatibility with older file systems is important, short names
("8+3") should be used for directory and file names.
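The file name rules above can be checked mechanically. The following Python sketch is only an illustration: the function name file_name_problems is hypothetical, and it assumes that the "x" in "COMx" and "LPTx" stands for a single digit, which the text does not state.

```python
import re

# Characters the text lists as unsupported by older Windows file systems.
FORBIDDEN_CHARS = set('\\/:*?"<>|')

# Device names listed in the text. Assumption: "x" is a single digit.
RESERVED_NAME = re.compile(r"^(COM\d|LPT\d|AUX|PRN|NUL|CON)$", re.IGNORECASE)


def file_name_problems(name):
    """Return a list of human-readable problems found in a file name."""
    problems = []
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if name and set(name) == {"."}:
        problems.append("name consists entirely of dots")
    # Variants with a "dot suffix" (e.g. "CON.txt") are also affected,
    # so only the part before the first dot is compared.
    stem = name.split(".", 1)[0]
    if RESERVED_NAME.match(stem):
        problems.append("reserved device name: " + stem)
    return problems
```

An empty list means that none of the listed problems was found; it does not guarantee that the name is valid on every file system.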
File Structure
An INI file is an 8-bit text file divided into sections, each containing zero or more
keys. Each key contains zero or more values.
Example:
[SectionName]
keyname=value
;comment
keyname=value, value, value ;comment
Section names are enclosed in square brackets, and must begin at the beginning of a
line.
Section and key names are case-insensitive. In
consideration of possible future upgrades from INI to XML or JSON, authors of INI files
may want to use consistent and case-exact section and key names, as if INI
parsers were case-sensitive (XML and JSON parsers are case-sensitive).
Section and key names cannot contain spacing characters.
The key name is followed by an equal sign ("=", decimal code 61), optionally
surrounded by spacing characters, which are ignored.
If the same section appears more than once in the same file, or if the same key appears
more than once in the same section, then the last occurrence prevails.
Multiple values for a key are separated by a comma,
optionally followed by one or more spacing
characters (as defined below).
When a parser encounters an unrecognized section name, the entire section (with all its
keys) should be skipped. Within a known section, only unrecognized keys should be skipped.
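The structural rules above can be illustrated with a small Python sketch. It is not the Cloanto parser: the function name parse_ini and the shape of the "known" argument are hypothetical, values are kept as raw text, trailing comments and multiple values are not handled, and Python's splitlines() is used as a simplification of the line terminator rules. It also assumes that a repeated section replaces the earlier one as a whole, which is only one possible reading of "the last occurrence prevails".

```python
def parse_ini(text, known):
    """Parse INI text into {section: {key: raw_value}}.

    `known` maps lowercase section names to sets of lowercase key names.
    Sections and keys that are not listed there are skipped.
    """
    result = {}
    section = None  # name of the section being read, or None while skipping
    for line in text.splitlines():
        # Section names must begin at the beginning of a line.
        if line.startswith("["):
            end = line.find("]")
            name = line[1:end].lower() if end != -1 else None
            if name in known:
                section = name
                # Assumption: a repeated section replaces the earlier one.
                result[section] = {}
            else:
                section = None  # unrecognized section: skip all its keys
            continue
        stripped = line.strip(" \t")
        if section is None or not stripped or stripped.startswith(";"):
            continue
        key, sep, value = stripped.partition("=")
        key = key.strip(" \t").lower()  # names are case-insensitive
        if sep and key in known[section]:
            # A repeated key overwrites the earlier value.
            result[section][key] = value.strip(" \t")
    return result
```

For example, parsing "[Main]", "Name = first", "name=second" with known = {"main": {"name"}} yields {"main": {"name": "second"}}.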
Spacing, Line Terminators and Comments
Both Space (decimal code 32) and Horizontal Tab (HT, decimal code 9) are acceptable
spacing characters.
Lines are terminated by a CR (decimal code 13) or LF (decimal code 10) character. If
CR+LF or LF+CR appear consecutively, they count as a single line terminator, generating
one line, not two lines. A sequence consisting of CR+LF+CR+LF would however result in two
lines, not one line. Where implementation-specific considerations do not advise otherwise,
the recommended line terminator is an LF character.
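The line terminator rules above can be sketched in Python with a regular expression that tries the two-character pairs before the single characters. The names LINE_TERMINATOR and split_lines are hypothetical, and dropping the empty string after a final terminator is an assumption based on treating the end of file like an end of line.

```python
import re

# CR+LF and LF+CR are tried first, so each pair counts as one terminator.
LINE_TERMINATOR = re.compile(r"\r\n|\n\r|\r|\n")


def split_lines(text):
    """Split text into lines using CR, LF, CR+LF or LF+CR as terminators."""
    lines = LINE_TERMINATOR.split(text)
    # A terminator at the very end of the text does not start a new line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

With this sketch, "a" + CR+LF + CR+LF + "b" is split into three entries ("a", an empty line, "b"), because the sequence contains two terminators.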
Comments are introduced by a semicolon character (";", decimal code 59).
Comments must begin at the beginning of a line or after a spacing character. Comments
terminate at the end of the line.
Parsers should treat HT (decimal code 9) characters as space characters (decimal code
32). Comments, empty lines, and spaces at the beginning of a line should be ignored.
Parsers should also ignore binary characters other than HT, LF and CR, and treat an end of
file like an end of line before continuing with specific end-of-file processing.
Parsers should be tolerant towards spacing variations, such as in:
[section name]
keyname=value
; comment
keyname = value, value , value ;comment
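Removing comments from a line, as described above, could look like the following Python sketch. The function name strip_comment is hypothetical. The sketch assumes String values, where a backslash inside quotes escapes the next character; Path values, in which the backslash is not an escape character, would need different handling.

```python
def strip_comment(line):
    """Return the line without its trailing comment and trailing spacing."""
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and in_quotes:
            i += 2  # skip the escaped character (assumes a String value)
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes and (i == 0 or line[i - 1] in " \t"):
            # A comment begins at the start of a line or after spacing.
            return line[:i].rstrip(" \t")
        i += 1
    return line.rstrip(" \t")
```

A semicolon that directly follows a non-spacing character (as in "key=a;b") is left in place, because comments must begin at the beginning of a line or after a spacing character.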
String Values
String values may optionally be enclosed in quote characters (""",
decimal code 34). String values beginning or ending with spaces, or containing commas
or semicolons,
must be enclosed in quotes. Quote and backslash ("\", decimal code 92)
characters, as well as binary characters (decimal ranges 0..31, 127..159) appearing inside
strings must be encoded using the escape sequences
described in this document.
The default character set for text appearing inside string values is ISO 8859-1.
Applications may define certain sections and/or keys to store text which, from the
application side, results as being encoded using different character sets, or even binary
data. However, the string data appearing in the INI file must be encoded as ISO 8859-1
text, using escape sequences as necessary, so as to be compatible with INI parsers.
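As an illustration of the quoting and escaping rules for String values, here is a hedged Python sketch of a writer. The function name encode_string is hypothetical. The choice of octal notation for binary characters is an assumption (hexadecimal would be equally valid), and the sketch also escapes an octal digit that directly follows an octal escape, so that a parser reading digit streams of any length does not merge the two.

```python
def encode_string(value):
    """Encode text as an INI String value, adding escapes and quotes."""
    out = []
    after_octal = False  # True if the previous output was an octal escape
    for ch in value:
        code = ord(ch)
        if code > 255:
            raise ValueError("not representable in ISO 8859-1: %r" % ch)
        if ch in '"\\':
            out.append("\\" + ch)  # quote and backslash are always escaped
            after_octal = False
        elif code < 32 or 127 <= code <= 159 or (after_octal and ch in "01234567"):
            out.append("\\%o" % code)  # binary character, in octal notation
            after_octal = True
        else:
            out.append(ch)
            after_octal = False
    body = "".join(out)
    # Quotes are required for leading or trailing spaces, commas and semicolons.
    needs_quotes = (value[:1] == " " or value[-1:] == " "
                    or "," in value or ";" in value)
    return '"' + body + '"' if needs_quotes else body
```

For example, a value consisting of "a", a line feed and "1" is written as a\12\61, so that the digit "1" is not read as part of the octal stream.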
Path Values
Path values are identical to String values, with the
exception that escape sequences introduced by the backslash character are not
supported. Path values are used to represent file system and registry paths,
where for clarity it is not desirable to use double backslash characters
("\\") to indicate backslash characters inside paths. This means that
a path like "C:\readme.txt" would remain unchanged (whereas it would
have to be indicated as "C:\\readme.txt" in a String field).
Numerical Values
In numerical values, the period (".", decimal code 46) is the only valid
decimal separator. Leading zeros are optional (e.g. "0.5", "000.5" and
".5" are all valid representations for the same value, i.e. 0.5). For
consistency, one leading zero (e.g. "0.5") is the preferred format to represent
values smaller than 1.
Trailing zeros may be used on an application-dependent basis, for example to express
precision (e.g. "0.50", as opposed to "0.5", may indicate that the
smallest currency unit is equal to fifty "cents", as opposed to five
"tenths" of the full unit, and "1.234000" may indicate to use a
precision of six decimal digits in partial or total results).
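The numerical rules above map well onto Python's decimal module, which preserves trailing zeros. The sketch below is an illustration only: the names parse_number and decimal_places are hypothetical, and accepting an optional leading sign is an assumption, as the text does not mention signs.

```python
import re
from decimal import Decimal

# Digits with an optional period as the only decimal separator.
# Assumption: an optional leading "+" or "-" is accepted.
NUMBER = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")


def parse_number(text):
    """Parse an INI numerical value, keeping trailing zeros as precision."""
    text = text.strip(" \t")
    if not NUMBER.match(text):
        raise ValueError("not a valid numerical value: %r" % text)
    return Decimal(text)


def decimal_places(value):
    """Return the number of digits written after the decimal separator."""
    return max(0, -value.as_tuple().exponent)
```

Here "0.5", "000.5" and ".5" parse to equal values, while "0.50" and "1.234000" report 2 and 6 decimal places respectively. A value such as "0,5" is rejected.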
Escape Sequences
Escape sequences, consisting of a backslash followed by a lower case letter or by a
combination of digits, should be used to encode binary data and certain other special
characters and character combinations. The result of each escape sequence is parsed as if
it were a single character. Quote and backslash characters inside a string must always be
preceded by a backslash character. Binary data in key values should be encoded using one
of the octal or hexadecimal notations described below.
Escape Sequence | Represents
\a | Bell (alert)
\b | Backspace
\f | Form feed
\n | New line
\r | Carriage return
\t | Horizontal tab
\v | Vertical tab
\' | Single quotation mark
\" | Double quotation mark
\\ | Backslash
\? | Question mark
\ooo | ASCII character in octal notation
\xhhhh | ASCII character in hexadecimal notation
\ at end of line | Continuation character
The octal (ooo) and hexadecimal (hhh) streams can be of any length, and
terminate as soon as the first non-octal or non-hexadecimal character is encountered. For
example, the digit 8 would terminate an octal stream, and be interpreted as a separate
character. Both uppercase and lowercase letters A..F are acceptable for hexadecimal.
The use of "\'" and "\?" is entirely optional (unlike
"\"", "\\" and binary characters), and implemented only for
compatibility with ANSI C.
Nonprinting sequences, such as "\a" and "\f", produce
device-dependent results. Applications should ignore such codes if they are not applicable
to the context. Applications should try to implement at least "\n" and
"\t". A simple implementation for "\t" could be to replace it with
eight space characters, or with one space character, depending on the intended use.
All escape sequences consisting of a backslash character plus a character that does not
appear in the table listed above are ignored. For example, "\A" or
"\c" would simply be skipped by a parser.
A backslash followed by any combination of CR and LF characters is considered a
continuation character. In this case, the backslash is ignored, and the new line beginning
after the sequence of CR and LF characters ends is treated as the continuation of the
previous line. For line-counting purposes (e.g. to report an error in a certain line),
however, the lines are treated as separate.
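The decoding rules of this section can be summarized in a Python sketch. It is not the Cloanto implementation, and the function name decode_escapes is hypothetical. Two points are assumptions: octal and hexadecimal values are reduced to their low 8 bits (the text does not say what happens to larger values), and "\x" without any hexadecimal digit is skipped like an unknown sequence.

```python
SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
          "v": "\v", "'": "'", '"': '"', "\\": "\\", "?": "?"}
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def _digits_end(text, start, allowed):
    """Return the index just past the run of allowed digits at `start`."""
    end = start
    while end < len(text) and text[end] in allowed:
        end += 1
    return end


def decode_escapes(text):
    """Decode the escape sequences of an INI String value."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # move past the backslash
        if i >= len(text):
            break  # backslash at the very end: nothing follows
        nxt = text[i]
        if nxt in "\r\n":
            # Continuation: ignore the backslash and the run of CR and LF.
            while i < len(text) and text[i] in "\r\n":
                i += 1
        elif nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 1
        elif nxt in OCTAL:
            end = _digits_end(text, i, OCTAL)
            out.append(chr(int(text[i:end], 8) & 0xFF))  # assumption: low 8 bits
            i = end
        elif nxt == "x":
            end = _digits_end(text, i + 1, HEX)
            if end > i + 1:
                out.append(chr(int(text[i + 1:end], 16) & 0xFF))
            i = end
        else:
            i += 1  # sequence not in the table: skipped
    return "".join(out)
```

Because digit streams can be of any length, a letter A..F or a digit that directly follows a hexadecimal escape becomes part of that escape. Writers need to take this into account when encoding.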
INI Files on Web Servers
A few special considerations apply to INI files hosted on
web servers.
MIME Media Types instruct web clients how to handle files
received from a server. The HTTP protocol specification requires that the web
server report the MIME type for content. By default, servers like Internet
Information Services 6.0 serve only files with extensions registered in their
MIME type list. If the web server does not already have a MIME type entry for
INI files, we recommend that it be added by setting the MIME type for extension
".ini" to "application/octet-stream".
If a security filter such as Microsoft Urlscan is running
on the web server to control client requests, it must be configured to allow the
data files to be fetched (e.g. by adding .ini to the [AllowExtensions] section
and removing it from [DenyExtensions] in %SystemRoot%\System32\Inetsrv\Urlscan\URLScan.ini).
If it is not possible to change the security settings, the names of the data
files should be changed (e.g. replacing ".ini" with ".txt").
INI in Autorun.inf Files
AutoRun is a feature of the Microsoft Windows operating
system which makes it possible to automatically run a program (e.g. a menu
window, a setup procedure, etc.) when a medium is inserted in the drive. When a
medium is inserted in an AutoRun-enabled drive, the system looks for a file
named "autorun.inf". For maximum compatibility both with Windows (especially
older versions) and with third-party parser applications, it is recommended
that:
- No comments should be used in the file
- No space characters should be used around "="
- No space characters should be used in Open and Icon paths (not even if the paths are quoted)
- Line terminators should be CR+LF
For additional information on AutoRun you may want to
refer to the MenuBox FAQ,
documentation and Knowledge Base pages.
Defensive INI: Detecting Unicode, XML, JSON
Sometimes data is initially written as an 8-bit INI file, but later, as
the application requirements evolve, this morphs into a Unicode INI and/or an
XML or JSON file. In order to support format changes and extensions, version
information is traditionally placed inside the INI data (e.g. a
"RequiredVersion" key). However, a transition from 8-bit INI to Unicode or
XML or JSON would prevent legacy code from accessing the version information itself. It
may therefore make sense to implement a simple check from the very first
version of the software, so that such a condition can be handled
appropriately (e.g. by displaying a "newer version required" message instead
of a generic error message).
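All the application needs to do is to check whether the file begins with
a Unicode Byte Order Mark (BOM) or an XML header (e.g. "<" or "<?xml"),
or a JSON opening character (i.e. "{" or "["), which are easily recognizable as non-8-bit INI.
Note, however, that "[" also opens an INI section name, so a leading "[" alone does
not distinguish a JSON array from an INI file.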
Here are some frequent header byte patterns (hexadecimal notation):
- 3C 3F 78 6D 6C (beginning of XML "<?xml" header)
- FE FF (UTF-16 BOM, big-endian)
- FF FE (UTF-16 BOM, little-endian)
- EF BB BF (UTF-8 BOM)
- FF FE 00 00 (UTF-32 BOM, little-endian)
- 00 00 FE FF (UTF-32 BOM, big-endian)
It should be noted that an XML or JSON file may begin with a BOM and/or with
"white space" (space, line feed, tab) before the first "structural
character" ("<" for XML, and "{" or "[" for JSON).
XML detection by checking for "<?xml" rather than for just "<" is more
prudent, as it minimizes confusion with HTML content (e.g. server and proxy
error messages) that may also begin with "<".
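The check described in this section can be sketched in Python as follows. The function name sniff_format and the returned labels are hypothetical. The longer BOM patterns are tested first, because FF FE (UTF-16, little-endian) is a prefix of FF FE 00 00 (UTF-32, little-endian). As a design choice of this sketch, a leading "[" is not reported as JSON, since it also opens an INI section name.

```python
# Byte order marks, longest first, with a label for each encoding.
BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]


def sniff_format(head):
    """Classify the first bytes of a file; "ini" means no marker was found."""
    for bom, label in BOMS:
        if head.startswith(bom):
            return label
    # XML and JSON may begin with white space before the first structural character.
    text = head.lstrip(b" \t\r\n")
    if text.startswith(b"<?xml"):
        return "xml"
    if text.startswith(b"{"):
        return "json"
    return "ini"
```

A caller could read the first few bytes of the file, pass them to sniff_format, and display a "newer version required" message for any result other than "ini".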
Specification Information
Homepage: https://cloanto.com/specs/ini/
Version: 1.4 (2009-10-23)
Status: Unmodified spec is free to distribute, free to implement
Last Page Update: 2024-09-14