Cloanto Implementation of INI File Format

Abstract

This document specifies a text-based file format for software configuration data that is easy for humans to edit and can be read unambiguously by a simple automatic parser.

Introduction

The format described here is an implementation of the ".INI File Format", as defined in the Microsoft Windows for Workgroups Resource Kit. Since the original Microsoft specification is generic, and implementations vary widely, we decided to document how the INI files and parsers used in Cloanto applications work. As far as possible, existing specifications and practices for character encoding and programming have been retained.

Where this document says that an application "should" implement a certain feature in a certain way, it means that the Cloanto INI parser supports this feature as recommended here. Where this document says that a certain style "should" be avoided, but parsers "should" support it or ignore it, it means that authors of INI files should not use that style (the risk being incompatibility with slightly different INI parsers), but authors of parser software should support it or ignore it as indicated, rather than generate an error condition. In the following, "INI" refers to the file format as specified here.

Character Set

For maximum compatibility, the entire INI file (including section names, key names and data values) should be encoded using 8-bit characters in the ISO 8859-1 character set (of which 7-bit ASCII is a subset). Application-specific string values may store data using other character sets, and even binary data, as long as they are encoded as either ASCII or ISO 8859-1 text. ANSI C escape sequences are used for the encoding of special characters in strings.

Alternatively, if the risk of incompatibility with other application scenarios (e.g. data exchange with other applications, driver INF files, AutoRun, etc.) can be excluded, the INI file may be encoded as a Unicode stream, inclusive of an initial byte order mark (BOM) that describes the encoding (UTF-8, UTF-16 or UTF-32) and endianness (big endian or little endian).

Known applications that write Unicode INI files include Software Director and the Windows Registry Editor. Both write little-endian UTF-16 data.

Also see the section "Defensive INI: Detecting Unicode, XML, JSON" in this document.

File Names

If the INI file is to be stored or used on a Microsoft Windows system, the following characters should not be used in the file name, as they are not supported by older file systems: "\", "/", ":", "*", "?", """ (double quote character), "<", ">" and "|".

Additionally, certain file names are supported at the file system level, and can be accessed by programs, but they are not accessible from certain shells and file manipulation programs. These names include: "COMx", "AUX", "LPTx", "PRN", "NUL" and "CON", and all variants with a "dot suffix". File names entirely consisting of dots (".", "..", etc.) are also "problem files".
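The file name rules above can be checked mechanically. The following is a minimal Python sketch, not part of the Cloanto parser: the function name is_problem_name is hypothetical, and "COMx" and "LPTx" are assumed to mean the prefix followed by a single digit.

```python
import re

# Characters the text lists as unsupported by older Windows file systems.
FORBIDDEN_CHARS = set('\\/:*?"<>|')

# Device names listed in the text, with or without a "dot suffix".
# Assumption: "x" in COMx and LPTx stands for one decimal digit.
_DEVICE_RE = re.compile(r"^(COM[0-9]|LPT[0-9]|AUX|PRN|NUL|CON)(\..*)?$", re.IGNORECASE)

def is_problem_name(name):
    """Return True if the file name is likely to cause trouble on Windows."""
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return True
    if name and set(name) == {"."}:
        return True  # names consisting entirely of dots
    return bool(_DEVICE_RE.match(name))
```

For example, is_problem_name("NUL.ini") and is_problem_name("a:b.ini") are both true, while is_problem_name("config.ini") is false.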

Where compatibility with older file systems is important, short names ("8+3") should be used for directory and file names.

File Structure

An INI file is an 8-bit text file divided into sections, each containing zero or more keys. Each key contains zero or more values.

Example:

[SectionName]

keyname=value

;comment

keyname=value, value, value ;comment

Section names are enclosed in square brackets, and must begin at the beginning of a line.

Section and key names are case-insensitive. In consideration of possible future upgrades from INI to XML or JSON, authors of INI files may want to use consistent and case-exact section and key names, as if INI parsers were case-sensitive (XML and JSON parsers are case-sensitive).

Section and key names cannot contain spacing characters. The key name is followed by an equal sign ("=", decimal code 61), optionally surrounded by spacing characters, which are ignored.

If the same section appears more than once in the same file, or if the same key appears more than once in the same section, then the last occurrence prevails.

Multiple values for a key are separated by a comma, optionally followed by one or more spacing characters (as defined below).
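As an illustration of the comma rule, the Python sketch below splits the text after the equal sign into individual values. It is only a sketch under assumptions: the names split_values and _unquote are hypothetical, commas inside quoted strings are not treated as separators (as the String Values section requires quoting for them), and escape sequences are left undecoded.

```python
def _unquote(item):
    """Trim surrounding spacing and remove one pair of enclosing quotes, if any."""
    item = item.strip(" \t")
    if len(item) >= 2 and item[0] == '"' and item[-1] == '"':
        return item[1:-1]
    return item

def split_values(raw):
    """Split the text following "=" into a list of raw values."""
    if raw.strip(" \t") == "":
        return []  # a key may contain zero values
    items, buf, in_quotes = [], [], False
    i = 0
    while i < len(raw):
        ch = raw[i]
        if in_quotes and ch == "\\" and i + 1 < len(raw):
            buf.append(raw[i:i + 2])  # keep escaped character (e.g. \") intact
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            items.append(_unquote("".join(buf)))
            buf = []
        else:
            buf.append(ch)
        i += 1
    items.append(_unquote("".join(buf)))
    return items
```

Because a backslash inside quotes is treated as an escape introducer, this sketch suits String values; Path values would need a variant that does not interpret backslashes.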

When a parser encounters an unrecognized section name, the entire section (with all its keys) should be skipped. Within a known section, only unrecognized keys should be skipped.
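The structural rules of this section can be sketched in a few lines of Python. This is an illustrative reading, not the Cloanto implementation: parse_ini and its "known" parameter are hypothetical names, "last occurrence prevails" is assumed to mean that a repeated section replaces the earlier one entirely, and only whole-line comments are handled here.

```python
import re

def parse_ini(text, known=None):
    """Parse INI text into {section: {key: raw value}} with lower-case names.

    known: optional dict mapping lower-case section names to sets of
    lower-case key names; anything not listed is skipped.
    """
    sections = {}
    name = None  # None while outside any section or inside a skipped one
    for raw in re.split(r"\r\n|\n\r|\r|\n", text):
        line = raw.strip(" \t")
        if not line or line.startswith(";"):
            continue  # empty line or comment
        if line.startswith("["):
            end = line.find("]")
            candidate = line[1:end].strip(" \t").lower() if end != -1 else None
            if candidate is None or (known is not None and candidate not in known):
                name = None  # malformed or unrecognized: skip the whole section
            else:
                name = candidate
                sections[name] = {}  # assumption: later section replaces earlier
            continue
        if name is None or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip(" \t").lower()  # names are case-insensitive
        if known is not None and key not in known[name]:
            continue  # unrecognized key inside a known section
        sections[name][key] = value.strip(" \t")  # last occurrence prevails
    return sections
```

Storing names in lower case is one simple way to obtain case-insensitive lookups; an application that needs to preserve the original spelling would keep it alongside the normalized name.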

Spacing, Line Terminators and Comments

Both Space (decimal code 32) and Horizontal Tab (HT, decimal code 9) are acceptable spacing characters.

Lines are terminated by a CR (decimal code 13) or LF (decimal code 10) character. If CR+LF or LF+CR appear consecutively, they count as a single line terminator, generating one line, not two lines. A sequence consisting of CR+LF+CR+LF would however result in two lines, not one line. Where implementation-specific considerations do not advise otherwise, the recommended line terminator is an LF character.
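One way to implement the line terminator rule is a regular expression that tries the two-character pairs before the single characters. The Python sketch below uses the hypothetical name split_lines and assumes that a terminator at the very end of the text does not produce an extra empty line.

```python
import re

# CR+LF and LF+CR are tried first so that each pair counts as one terminator.
_TERMINATOR = re.compile(r"\r\n|\n\r|\r|\n")

def split_lines(text):
    """Split text into lines according to the CR/LF pairing rule."""
    lines = _TERMINATOR.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # end of file directly after a terminator: no extra line
    return lines
```

With this rule, "a" followed by CR+LF+CR+LF and "b" yields three elements ("a", an empty line, "b"), that is, two terminated lines before "b".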

Comments are introduced by a semicolon character (";", decimal code 59). Comments must begin at the beginning of a line or after a spacing character. Comments terminate at the end of the line.

Parsers should treat HT (decimal code 9) characters as space characters (decimal code 32). Comments, empty lines, and spaces at the beginning of a line should be ignored. Parsers should also ignore binary characters other than HT, LF and CR, and treat an end of file like an end of line before continuing with specific end-of-file processing.
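The clean-up steps described above might be sketched in Python as follows. The names normalize and strip_comment are hypothetical. The sketch assumes that "binary characters" means the decimal ranges 0..31 and 127..159 given in the String Values section, and that a semicolon inside a quoted string does not start a comment.

```python
def normalize(line):
    """Turn HT into space, drop other binary characters, ignore leading spaces."""
    out = []
    for ch in line:
        code = ord(ch)
        if ch == "\t":
            out.append(" ")
        elif code < 32 or 127 <= code <= 159:
            continue  # binary character: ignored
        else:
            out.append(ch)
    return "".join(out).lstrip(" ")

def strip_comment(line):
    """Remove a comment that starts at the line start or after a spacing character."""
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == "\\":
            i += 2  # skip an escaped character inside a quoted String value
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes and (i == 0 or line[i - 1] in " \t"):
            return line[:i].rstrip(" \t")
        i += 1
    return line
```

Note that strip_comment treats a backslash inside quotes as an escape introducer, which matches String values; a quoted Path value ending in a backslash would need separate handling.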

Parsers should be tolerant towards spacing variations, such as in:

[section name]

keyname=value

; comment

keyname = value, value , value ;comment

String Values

String values may optionally be enclosed in quote characters (""", decimal code 34). String values beginning or ending with spaces, or containing commas or semicolons, must be enclosed in quotes. Quote and backslash ("\", decimal code 92) characters, as well as binary characters (decimal ranges 0..31, 127..159) appearing inside strings must be encoded using the escape sequences described in this document.
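For authors of software that writes INI files, the quoting and escaping rules above could be applied as in the following Python sketch. The name encode_string is hypothetical, hexadecimal notation is an arbitrary choice among the notations this document allows, and quoting every value that needed an escape is a conservative assumption rather than a stated requirement. Because hexadecimal streams have no fixed length, a hexadecimal digit that directly follows a hexadecimal escape is escaped as well, so that it is not absorbed into the preceding sequence.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def encode_string(value):
    """Encode an ISO 8859-1 string as an INI String value."""
    out = []
    after_hex = False  # True right after a \x escape was written
    for ch in value:
        code = ord(ch)
        if ch == '"' or ch == "\\":
            out.append("\\" + ch)
            after_hex = False
        elif code < 32 or 127 <= code <= 159 or (after_hex and ch in HEX_DIGITS):
            out.append("\\x%02X" % code)
            after_hex = True
        else:
            out.append(ch)
            after_hex = False
    body = "".join(out)
    needs_quotes = (
        body != value                 # assumption: quote whenever escapes were used
        or value != value.strip(" ")  # leading or trailing spaces
        or "," in value
        or ";" in value
    )
    return '"' + body + '"' if needs_quotes else body
```

For example, a line feed followed by the word Bad is written as "\x0A\x42\x61\x64" (in quotes), since B, a and d are all valid hexadecimal digits.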

The default character set for text appearing inside string values is ISO 8859-1. Applications may define certain sections and/or keys to store text which, from the application side, results as being encoded using different character sets, or even binary data. However, the string data appearing in the INI file must be encoded as ISO 8859-1 text, using escape sequences as necessary, so as to be compatible with INI parsers.

Path Values

Path values are identical to String values, with the exception that escape sequences introduced by the backslash character are not supported. Path values are used to represent file system and registry paths, where for clarity it is not desirable to use double backslash characters ("\\") to indicate backslash characters inside paths. This means that a path like "C:\readme.txt" would remain unchanged (whereas it would have to be indicated as "C:\\readme.txt" in a String field).
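The difference from String values can be illustrated with a small Python sketch. The names read_path and write_path are hypothetical. Since Path values have no escape mechanism, the sketch assumes that a path containing a quote character cannot be represented and rejects it.

```python
def read_path(raw):
    """Return a Path value with enclosing quotes removed and backslashes untouched."""
    text = raw.strip(" \t")
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    return text  # no escape processing: "\r" stays backslash + "r"

def write_path(path):
    """Format a path as a Path value, quoting only where the String rules require it."""
    if '"' in path:
        raise ValueError("a quote character cannot be written without escapes")
    needs_quotes = path != path.strip(" ") or "," in path or ";" in path
    return '"' + path + '"' if needs_quotes else path
```

Here read_path applied to C:\readme.txt returns the text unchanged, whereas a String value reader would interpret "\r" as a carriage return.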

Numerical Values

In numerical values, the period (".", decimal code 46) is the only valid decimal separator. Leading zeros are optional (e.g. "0.5", "000.5" and ".5" are all valid representations for the same value, i.e. 0.5). For consistency, one leading zero (e.g. "0.5") is the preferred format to represent values smaller than 1.

Trailing zeros may be used on an application-dependent basis, for example to express precision (e.g. "0.50", as opposed to "0.5", may indicate that the amount is expressed as fifty "cents" rather than as five "tenths" of the full unit, and "1.234000" may indicate that a precision of six decimal digits should be used in partial or total results).
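A numerical value can be read without losing its trailing zeros by using a decimal type instead of binary floating point. The Python sketch below uses the hypothetical name parse_number; accepting an optional leading sign is an assumption, as signs are not discussed in this document.

```python
import re
from decimal import Decimal

# Digits with an optional fractional part, or a fraction without leading digits.
# Assumption: an optional leading "+" or "-" is accepted.
_NUMBER = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)")

def parse_number(raw):
    """Return (value, number of digits after the decimal point)."""
    text = raw.strip(" \t")
    if not _NUMBER.fullmatch(text):
        raise ValueError("not a valid numerical value: %r" % raw)
    value = Decimal(text)
    decimals = -value.as_tuple().exponent  # keeps track of trailing zeros
    return value, decimals
```

With this sketch "0.5", "000.5" and ".5" all produce the same value, while "0.50" additionally reports two decimal digits and "0,5" is rejected.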

Escape Sequences

Escape sequences, consisting of a backslash followed by a lower case letter or by a combination of digits, should be used to encode binary data and certain other special characters and character combinations. The result of each escape sequence is parsed as if it were a single character. Quote and backslash characters inside a string must always be preceded by a backslash character. Binary data in key values should be encoded using one of the octal or hexadecimal notations described below.

Escape Sequence	Represents
\a	Bell (alert)
\b	Backspace
\f	Form feed
\n	New line
\r	Carriage return
\t	Horizontal tab
\v	Vertical tab
\'	Single quotation mark
\"	Double quotation mark
\\	Backslash
\?	Question mark
\ooo	ASCII character in octal notation
\xhhh	ASCII character in hexadecimal notation
\ at end of line	Continuation character

The octal (ooo) and hexadecimal (hhh) streams can be of any length, and terminate as soon as the first non-octal or non-hexadecimal character is encountered. For example, the digit 8 would terminate an octal stream, and be interpreted as a separate character. Both uppercase and lowercase letters A..F are acceptable for hexadecimal.

The use of "\'" and "\?" is entirely optional (unlike "\"", "\\" and binary characters), and implemented only for compatibility with ANSI C.

Nonprinting sequences, such as "\a" and "\f", produce device-dependent results. Applications should ignore such codes if they are not applicable to the context. Applications should try to implement at least "\n" and "\t". A simple implementation for "\t" could be to replace it with eight space characters, or with one space character, depending on the intended use.

All escape sequences consisting of a backslash character plus a character that does not appear in the table listed above are ignored. For example, "\A" or "\c" would simply be skipped by a parser.
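The decoding rules for escape sequences can be summarized in a Python sketch such as the one below. The name decode_escapes is hypothetical. Two details are assumptions not settled by the text: octal and hexadecimal values larger than 255 are reduced to their low 8 bits, and "\x" without any hexadecimal digit is skipped.

```python
SIMPLE = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    "v": "\v", "'": "'", '"': '"', "\\": "\\", "?": "?",
}
OCT = "01234567"
HEX = "0123456789abcdefABCDEF"

def decode_escapes(text):
    """Decode the escape sequences of a String value (without its quotes)."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # move to the character after the backslash
        if i >= n:
            break  # lone backslash at the end: ignored
        ch = text[i]
        if ch in SIMPLE:
            out.append(SIMPLE[ch])
            i += 1
        elif ch in OCT:
            j = i
            while j < n and text[j] in OCT:
                j += 1  # stream ends at the first non-octal character
            out.append(chr(int(text[i:j], 8) & 0xFF))  # assumption: keep low 8 bits
            i = j
        elif ch == "x":
            j = i + 1
            while j < n and text[j] in HEX:
                j += 1  # stream ends at the first non-hexadecimal character
            if j > i + 1:
                out.append(chr(int(text[i + 1:j], 16) & 0xFF))  # assumption: low 8 bits
            i = j
        else:
            i += 1  # unknown sequence such as \A or \c: skipped
    return "".join(out)
```

For example, the sequence \1018 decodes to the letter A (octal 101) followed by the digit 8, because 8 terminates the octal stream.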

A backslash followed by any combination of CR and LF characters is considered a continuation character. In this case, the backslash is ignored, and the new line beginning after the sequence of CR and LF characters ends is treated as the continuation of the previous line. For line-counting purposes (e.g. to report an error in a certain line), however, the lines are treated as separate.
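Line continuation combined with per-line error reporting might look like the following Python sketch. The name logical_lines is hypothetical. As a simplification, any backslash at the end of a physical line is taken as a continuation character, so a line ending in an escaped backslash ("\\") would be misread.

```python
import re

def logical_lines(text):
    """Return (first physical line number, joined text) for each logical line."""
    physical = re.split(r"\r\n|\n\r|\r|\n", text)
    result = []
    buf, start, continuing = "", 0, False
    for number, line in enumerate(physical, start=1):
        if continuing and line == "":
            continue  # still inside the CR/LF run that follows the backslash
        if not continuing:
            buf, start = "", number  # a new logical line begins here
        if line.endswith("\\"):
            buf += line[:-1]  # the backslash itself is ignored
            continuing = True
        else:
            result.append((start, buf + line))
            continuing = False
    if continuing:
        result.append((start, buf))  # continuation reached the end of file
    return result
```

Physical line numbers keep advancing across continuations, so an error found in a later line is still reported against the line where it appears in the file.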

INI Files on Web Servers

A few special considerations apply to INI files hosted on web servers.

MIME Media Types instruct web clients how to handle files received from a server. The HTTP specification requires that the web server report the MIME type for content. By default, servers like Internet Information Services 6.0 serve only files with extensions registered in their MIME type list. If the web server does not already have a MIME type entry for INI files, we recommend that it be added by setting the MIME type for extension ".ini" to "application/octet-stream".

If a security filter such as Microsoft Urlscan is running on the web server to control client requests, it must be configured to allow the data files to be fetched (e.g. by adding .ini to the [AllowExtensions] section and removing it from [DenyExtensions] in %SystemRoot%\System32\Inetsrv\Urlscan\URLScan.ini). If it is not possible to change the security settings, the names of the data files should be changed (e.g. replacing ".ini" with ".txt").

INI in Autorun.inf Files

AutoRun is a feature of the Microsoft Windows operating system which makes it possible to automatically run a program (e.g. a menu window, a setup procedure, etc.) when a medium is inserted in the drive. When a medium is inserted in an AutoRun-enabled drive, the system looks for a file named "autorun.inf". For maximum compatibility both with Windows (especially older versions) and with third-party parser applications, the following is recommended:

No comments should be used in the file
No space characters should be used around "="
No space characters should be used in Open and Icon paths (not even if the paths are quoted)
Line terminators should be CR+LF
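A program that generates "autorun.inf" could apply these recommendations as in the Python sketch below. The name build_autorun and the file names in the usage note are hypothetical; the "[autorun]" section with "open" and "icon" keys is the customary layout of such files.

```python
def build_autorun(open_path, icon_path=None):
    """Return autorun.inf text without comments, spaces around "=", or bare LF."""
    for path in (open_path, icon_path):
        if path is not None and (" " in path or "\t" in path):
            raise ValueError("paths should not contain space characters: %r" % path)
    lines = ["[autorun]", "open=" + open_path]
    if icon_path is not None:
        lines.append("icon=" + icon_path)
    return "\r\n".join(lines) + "\r\n"  # CR+LF line terminators
```

For example, build_autorun("setup.exe", "setup.ico") returns three CR+LF terminated lines. When saving the result, the text should be written in binary mode (e.g. after encoding it as ASCII) so that the line terminators are not translated.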

For additional information on AutoRun you may want to refer to the MenuBox FAQ, documentation and Knowledge Base pages.

Defensive INI: Detecting Unicode, XML, JSON

Sometimes data is initially written as an 8-bit INI file, but later, as the application requirements evolve, the format changes to a Unicode INI and/or an XML or JSON file. In order to support format changes and extensions, version information is traditionally placed inside the INI data (e.g. a "RequiredVersion" key). However, a transition from 8-bit INI to Unicode, XML or JSON would prevent legacy code from accessing the version information itself. It may therefore make sense to implement a simple check from the very first version of the software, so that such a condition can be handled appropriately (e.g. by displaying a "newer version required" message instead of a generic error message).

All the application needs to do is to check whether the file begins with a Unicode Byte Order Mark (BOM), an XML header (e.g. "<" or "<?xml"), or a JSON opening character (i.e. "{" or "["), which are easily recognizable as non-8-bit INI. Note, however, that "[" also opens an INI section name, so on its own it does not distinguish JSON from INI.

Here are some frequent header byte patterns (hexadecimal notation):

3C 3F 78 6D 6C (beginning of XML "<?xml" header)
FE FF (UTF-16 BOM, big-endian)
FF FE (UTF-16 BOM, little-endian)
EF BB BF (UTF-8 BOM)
FF FE 00 00 (UTF-32 BOM, little-endian)
00 00 FE FF (UTF-32 BOM, big-endian)

It should be noted that an XML or JSON file may begin with a BOM and/or with "white space" (space, line feed, tab) before the first "structural character" ("<" for XML, and "{" or "[" for JSON).

XML detection by checking for "<?xml" rather than for just "<" is more prudent, as it minimizes confusion with HTML content (e.g. server and proxy error messages) that may also begin with "<".
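The check described in this section could be implemented along the lines of the Python sketch below. The function name sniff_format and its return labels are hypothetical. Longer BOMs are tested first, because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian BOM. Only "{" is used as a JSON marker here, since "[" also begins an INI section name.

```python
BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32-le"),  # must precede utf-16-le (shared prefix)
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

def sniff_format(data):
    """Classify the first bytes of a file; "8-bit" means no marker was found."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    head = data.lstrip(b" \t\r\n")  # white space may precede the first character
    if head.startswith(b"<?xml"):
        return "xml"
    if head.startswith(b"{"):
        return "json"
    return "8-bit"  # candidate 8-bit INI file (or unrelated content such as HTML)
```

Legacy code could call this on the first few bytes read from the file and display a "newer version required" message for any result other than "8-bit".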


Specification Information

Homepage:	https://cloanto.com/specs/ini/
Version:	1.4 (2009-10-23)
Status:	Unmodified spec is free to distribute, free to implement
Last Page Update:	2024-09-14
