File naming

How we name files can enhance their utility.

When considering how to name files, it is important to know:

a.Where they are going to be used.
b.What information needs to be in the name.
c.The order information needs to be in the name.

Of course, many programs define their own filenames and those cannot be changed, but outputs from those programs that are not going to be used or changed directly by them can be renamed.

Never waste a filename

There is always something useful a filename can store.

Note that AI will likely ignore filenames as an optimisation mechanism for processing because its training data is mainly conventional coding and flow patterns, not multi-mode cross-optimisations. This means that until using filenames as storage becomes more widespread, we will have to make up our own usage patterns.

Where

Where files are stored or used will place limits on what characters are allowed.

Some key issues with filenames are:

a.Some operating systems ignore case, like Windows, but others do not, like Linux. That means that it will be better to use lowercase letters in names to allow the file to be used anywhere without possible sorting or identification issues.
b.Many filesystems do not handle Unicode characters consistently and that can vary between operating systems.
c.Each operating system reserves some special characters for its command-line utilities and paths, and those cannot be used safely in filenames.
d.Most filesystem conventions use a . before the extension. If processing the filenames programmatically, use it as the primary separator between filename parts, so that a single split isolates all of them.
e.At some point, a filename may need to be displayed on a web page. While many people use a _ to separate words in filenames, on a web page the name will not automatically wrap at them, possibly leading to extra-long names that push past the page boundary. Use - instead, which does wrap.
f.Some operating systems impose limits on filename length or total path length. For example, Windows restricts paths to 260 characters by default.

Restrict filenames to the following characters and lengths:

a.Letters: a to z
b.Numbers: 0 to 9
c.Primary separator: .
d.Secondary separator: -
e.Name length: 255 characters
f.Path length: 260 on Windows, 1024 elsewhere
g.Nothing else!

While Windows 11 can be made to allow paths longer than the default of 260 characters, up to over 32,000, through the Settings > System > Advanced > File Explorer > Enable long paths setting, very few systems would have it switched on. While Linux allows paths of up to 4096 characters, macOS, though also based upon Unix, only allows 1024. Use a maximum of 1024 on webservers to be safe and to keep paths readable in site management pages.
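The character and name-length restrictions above can be checked mechanically. The following is a minimal Python sketch rather than a definitive validator. The function name is_portable_name is hypothetical, and the extra rule that separators may only appear between runs of letters and digits (so no leading, trailing or doubled separators) is an assumption added for illustration. Path length is not checked here because it depends on where the file will be stored.

```python
import re

# Limit taken from the list above; adjust for the target systems.
MAX_NAME_LENGTH = 255

# Lowercase letters and digits, with single '.' or '-' separators between runs.
# Forbidding leading, trailing and doubled separators is an added assumption.
ALLOWED_NAME = re.compile(r"[a-z0-9]+(?:[.-][a-z0-9]+)*")


def is_portable_name(name: str) -> bool:
    """Return True if the name uses only a-z, 0-9, '.' and '-' and is short enough."""
    return len(name) <= MAX_NAME_LENGTH and ALLOWED_NAME.fullmatch(name) is not None


print(is_portable_name("2024-03-01.fin.r.pdf"))  # True
print(is_portable_name("My_File.PDF"))           # False
```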

Information

A lot of information can be stored in the filename to avoid having to open it.

The types of information that can be stored in file or folder names are:

a.Dates and times, preferably as UTC so that daylight savings will not create issues.
b.Names of documents or objects, even if abbreviated.
c.Status, maybe even just a letter if there are not many options.
d.Unicode, but converted to lowercase hex.
e.Hash of the file's contents, along with the algorithm used.

Some examples of these are:

a.Format of yyyy-mm-dd or yyyy-mm-dd-hh-mm-ss, depending upon the granularity required. The separators make them far easier to read. Keep leading 0s to facilitate sorting and direct comparisons.
b.finance, or fin if the mnemonic is generally understood. To compare one part across different filenames while still being able to compare the other parts, make the mnemonics all the same length.
c.w, d and r for works-in-progress, drafts and releases respectively.
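The examples above do not cover Unicode or hashes. One possible reading, sketched here in Python, is to store Unicode text as the lowercase hex of its UTF-8 bytes, and to store a hash as the algorithm name followed by the digest. The helper names hex_part and hash_part, the choice of UTF-8, and the algorithm-digest layout are all assumptions for illustration, not a format the text prescribes.

```python
import hashlib


def hex_part(text: str) -> str:
    """Encode Unicode text as lowercase hex of its UTF-8 bytes (assumed encoding)."""
    return text.encode("utf-8").hex()


def hash_part(content: bytes, algorithm: str = "sha256") -> str:
    """Return '<algorithm>-<hex digest>' for the given file content (assumed layout)."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return f"{algorithm}-{digest}"


encoded = hex_part("é")
print(encoded)                                 # c3a9
print(bytes.fromhex(encoded).decode("utf-8"))  # é
```

A long digest uses up a large share of the 255-character name limit, so a shorter algorithm or a truncated digest may be worth considering if several parts must fit in one name.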

All internal references for elements, like articles and files, in my Smallsite Design website design app use their creation date, so their filenames and folders all use the format yyyy-mm-dd-hh-mm-ss.
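As a rough illustration of combining these parts, the Python sketch below joins a UTC timestamp, a mnemonic and a status letter using . as the primary separator and - inside the timestamp. The function name build_name and the particular order of parts are assumptions, not the format used by Smallsite Design.

```python
from datetime import datetime, timezone


def build_name(when: datetime, mnemonic: str, status: str, extension: str) -> str:
    """Join the parts with '.'; the timestamp keeps its leading zeros for sorting."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d-%H-%M-%S")
    return ".".join([stamp, mnemonic, status, extension])


example = build_name(datetime(2024, 3, 1, 9, 5, 7, tzinfo=timezone.utc), "fin", "d", "pdf")
print(example)  # 2024-03-01-09-05-07.fin.d.pdf

# A single split on the primary separator isolates every part again.
stamp, mnemonic, status, extension = example.split(".")
```

This only works if no individual part contains a . of its own, which is one reason to keep - as the separator inside parts.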

Order

Many utilities, like file managers, default to sorting filenames alphabetically, and that can be used to group files according to the order of information in the name.

The key to sorting visually and having ease of processing is to make the content in each part the same length across all filenames. Then order the parts in the way you want to see them listed in your file manager, which will save having to open another program to show them in order.

While file extensions are usually used to distinguish which programs files are managed by, they are at the end of names and so are useless for sorting by program if other parts of the filename are sort criteria. Use a short mnemonic at or near the start of filenames if sorting by program matters.

If programmatically processing files or folders, timestamps (date and time) will typically be first, because file-listing functions like glob in some languages, such as PHP, sort filenames alphabetically by default into an array, so the latest file is simply the last array item.
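A minimal Python sketch of picking the latest file this way follows. Unlike some other languages, Python's glob functions do not promise any particular order, so the names are sorted explicitly. The function name latest_file and the sample filenames are hypothetical, and the temporary folder exists only to make the example self-contained.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_file(folder: Path, pattern: str = "*") -> Optional[Path]:
    """Return the file whose name sorts last, or None if nothing matches."""
    names = sorted(p.name for p in folder.glob(pattern) if p.is_file())
    return folder / names[-1] if names else None


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in (
        "2024-01-01-00-00-00.fin.d.txt",
        "2024-03-01-09-05-07.fin.d.txt",
        "2023-12-31-23-59-59.fin.r.txt",
    ):
        (folder / name).touch()

    latest = latest_file(folder)
    latest_name = latest.name if latest else None
    missing = latest_file(folder, "*.nomatch")

print(latest_name)  # 2024-03-01-09-05-07.fin.d.txt
```

Plain alphabetical sorting gives chronological order here only because every timestamp has the same length and keeps its leading zeros.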

Benefits

Having reliably formatted filenames means that other applications may benefit.

Filenames with known formats become usable by other applications, or new ones can be written to take advantage of the names. For example, document names for a product can consist of the company mnemonic, product id, release number, project id, phase and extension, in that order. A utility can then be written to collect all release phase documents for all products, copy them to a webserver, and make a page with a section for each product with a list of releases, latest first, for the standard product then each project's versions.

Another utility can scour the document repositories and send an email to their documentation managers with a list of all filenames with format errors. Both these utilities can save hundreds of hours of manual maintenance merely because the formal filename formats allow reliable processing.
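The core of both utilities can be sketched in a few lines of Python. The field widths, prefixes and the sample names below are entirely hypothetical, as the text does not give the real format. The sketch only shows how one formal pattern can drive both the error report and the grouping of release documents.

```python
import re

# Hypothetical format: company.product.release.project.phase.extension
# for example acme.p0042.r03.j0007.r.pdf (every field width is an assumption).
NAME_FORMAT = re.compile(
    r"(?P<company>[a-z]{4})\."
    r"(?P<product>p[0-9]{4})\."
    r"(?P<release>r[0-9]{2})\."
    r"(?P<project>j[0-9]{4})\."
    r"(?P<phase>[wdr])\."
    r"(?P<extension>[a-z0-9]+)"
)


def format_errors(names: list[str]) -> list[str]:
    """Return the names that do not follow the agreed format."""
    return [name for name in names if NAME_FORMAT.fullmatch(name) is None]


def releases_by_product(names: list[str]) -> dict[str, list[str]]:
    """Group release-phase names by product id, latest release first."""
    grouped: dict[str, list[str]] = {}
    for name in names:
        match = NAME_FORMAT.fullmatch(name)
        if match and match["phase"] == "r":
            grouped.setdefault(match["product"], []).append(name)
    # Constant-width fields mean a reverse alphabetical sort is latest first.
    return {product: sorted(group, reverse=True) for product, group in grouped.items()}


names = [
    "acme.p0042.r03.j0007.r.pdf",
    "acme.p0042.r04.j0007.r.pdf",
    "acme.p0042.r05.j0007.d.pdf",
    "Acme_p42.pdf",
]
print(format_errors(names))        # ['Acme_p42.pdf']
print(releases_by_product(names))
```

Copying the files to a webserver, building the page and sending the email are left out, as they depend on the environment.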

Filenames are also an easy way to store modest amounts of information without giving the files any content. The names only need to be read from a folder listing. This makes them good for recording asynchronous information rather than using a database with all its overhead. A later process can aggregate them for storage in a single file.

For example, after 30 seconds of a web page being open, a mini-page inline in an iframe element can send the page id, a timestamp and any other desired information like its locale to the website, where it is turned into a file with the received information solely in its filename. A regular aggregation process can then read all the new read files, add their information to a read statistics file and then delete them.
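The recording and aggregation steps might look something like the following Python sketch. It is not how Smallsite Design implements them. The function names, the .read extension, the JSON statistics file and the sample page ids are all assumptions, and the temporary folder stands in for wherever the website would keep these files.

```python
import json
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def record_read(folder: Path, page_id: str, locale: str, when: datetime) -> Path:
    """Create an empty file whose name alone carries the read information."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d-%H-%M-%S")
    marker = folder / f"{stamp}.{page_id}.{locale}.read"
    # Two reads of one page and locale in the same second share a name,
    # so this simple sketch would count them once.
    marker.touch()
    return marker


def aggregate_reads(folder: Path, stats_file: Path) -> Counter:
    """Add each marker file to the per-page totals, then delete it."""
    counts = Counter(json.loads(stats_file.read_text()) if stats_file.exists() else {})
    for marker in sorted(folder.glob("*.read")):
        _stamp, page_id, _locale, _extension = marker.name.split(".")
        counts[page_id] += 1
        marker.unlink()
    stats_file.write_text(json.dumps(counts, sort_keys=True))
    return counts


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    record_read(folder, "a0001", "en-au", datetime(2024, 3, 1, 9, 5, 7, tzinfo=timezone.utc))
    record_read(folder, "a0001", "en-au", datetime(2024, 3, 1, 9, 6, 0, tzinfo=timezone.utc))
    record_read(folder, "a0002", "fr-fr", datetime(2024, 3, 1, 9, 7, 0, tzinfo=timezone.utc))
    totals = aggregate_reads(folder, folder / "read-stats.json")
    remaining = list(folder.glob("*.read"))

print(dict(totals))  # {'a0001': 2, 'a0002': 1}
```

Because each read is its own empty file, concurrent requests never contend for a shared file. Only the aggregation step touches the statistics file.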

The examples cited have all been done for real, with the latter being used in Smallsite Design to show read statistics for each article of a site, which is the most important site statistic, and without using JavaScript.