Managing Web Map tiles on a Debian computer, and other Webmap technical complexities

Web maps are a great everyday tool for viewing a map in a browser, and they get a lot of use, from Google Maps and OpenStreetMap down. What is not really visible in these technologies are the challenges that have to be overcome to produce the tiles ready to go onto a server. Each tile in a TMS or XYZ tiled map system, regardless of zoom level or scale, is only 256 by 256 pixels and thus covers quite a small part of the map's surface. This in turn means a large number of tiles are needed to cover a significant area such as a whole country.
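To give a sense of scale, the tile pyramid quadruples at every zoom level, so the counts climb very quickly. A quick sketch of the worldwide totals (a country is of course only a fraction of these, but the growth rate is the same):

```shell
# Tiles covering the whole world at zoom z is 4^z, and the world map is
# 256 * 2^z pixels wide, since each tile is 256 px square.
for z in 0 5 10 15; do
  echo "zoom $z: $(( 4 ** z )) tiles, $(( 256 * (1 << z) )) px wide"
done
# the zoom 15 line shows over a billion tiles worldwide
```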

Currently the NZ Rail Maps web maps, being the basic maps completed at stage 1 with just a topographical base plus all the overlays currently seen, comprise over 15 million tiles. These create quite a few challenges, among them the inode limits typical of web hosting (an inode is basically a catalogue entry for a file or directory in typical Linux file systems, and one is needed for every file). When NZRM Webmaps was first developed, it was obvious that the hosting platform's inode limit of 250,000 would soon be exceeded, making it impossible to fully implement the webmaps as planned at that time. This quickly resulted in a switch to the mbtiles format, which encapsulates each layer, with its hundreds of thousands of tiles, into a single file; the current web host does not have such a limit, but the format has been kept. It also greatly speeds up uploads, since working with a very large number of small files carries significant overhead, and mbtiles deduplication (storing identical tiles only once) actually reduces the physical number of tiles stored in each layer.
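An mbtiles file is just a SQLite database, and the deduplication works by splitting tile addresses from tile images. The sketch below builds a toy database using the map/images layout that several tools write (the MBTiles spec itself only requires a tiles table or view); three tile positions, such as identical solid-colour sea tiles, point at one stored image:

```shell
# Toy deduplicated mbtiles-style database; all data here is fake.
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE map (zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_id TEXT);
CREATE TABLE images (tile_id TEXT PRIMARY KEY, tile_data BLOB);
CREATE VIEW tiles AS
  SELECT zoom_level, tile_column, tile_row, tile_data
  FROM map JOIN images USING (tile_id);
INSERT INTO images VALUES ('blank', x'89504e470d0a1a0a');
INSERT INTO map VALUES (10,1,1,'blank');
INSERT INTO map VALUES (10,1,2,'blank');
INSERT INTO map VALUES (10,2,1,'blank');
SQL
tiles=$(sqlite3 "$db" 'SELECT count(*) FROM tiles;')
stored=$(sqlite3 "$db" 'SELECT count(*) FROM images;')
echo "tiles=$tiles stored=$stored"   # tiles=3 stored=1
```

The same two count(*) queries against a real layer's mbtiles file show how much duplication it contains.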

The major issues for working with tiles, on either a production computer or a web server, are similar: the files are very small, and there are a lot of them. I have looked briefly at the latter issue above, and now I will consider the former. By default most ext4 file systems on a Linux system use a block size of 4096 bytes (4 KiB), but the average size of a tile is in many cases considerably less than this; the average for NZRM Webmaps is around 1.5 KiB. A lot of disk space is therefore wasted. Related to this are inode limits on the local disk where the tiles are stored while they are being produced, before uploading. When the Webmaps were first developed they were stored on a 2 TB disk on which the average file size, apart from the Webmaps, was much larger than 4 KiB (most of the storage on appspc is occupied by Gimp aerial photo projects, each between 20 and 30 GiB in size), and because of this large average there were plenty of inodes free to allocate to individual webmap tiles. In fact the average file size on this disk, webmaps aside, is around 5.4 MiB, roughly 3,500 times larger than an average web map tile.
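A back-of-envelope calculation shows how much the 4 KiB blocks cost, using the approximate figures above (15 million tiles averaging 1.5 KiB each):

```shell
tiles=15000000   # approximate total tile count
avg=1536         # assumed average tile size in bytes (~1.5 KiB)
block=4096       # default ext4 block size
slack=$(( tiles * (block - avg) ))   # tail space wasted in the last block
echo "wasted space: $(( slack / 1024 / 1024 / 1024 )) GiB"   # ~35 GiB
```

In other words, with 4 KiB blocks the slack alone exceeds the space the tile data itself actually needs.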

Problems started when the webmaps were deployed to their own SSD partition: file corruption began to occur for no apparent reason while QGIS scripts were creating tiles, and in other tools and steps Debian was unable to create new files. While I have not worked out exactly what happened, it seems highly likely that all the available inodes on the partition had been allocated well before the disk was full (possibly it was only 50% full at that point). In addition, a lot of wasted blocks were found, where tiles of 1.5 KiB or smaller each occupied a full 4 KiB block; with so many tiles this added up to quite low storage efficiency.
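This failure mode is easy to miss because the data-usage figures look healthy while the inode table is exhausted. A quick check, pointed here at / as a stand-in for the tile partition's mount point:

```shell
mnt=/            # replace with the tile partition's mount point
df -h "$mnt"     # Use% can be modest...
df -i "$mnt"     # ...while the IUse% column here is at 100%
```

When IUse% reaches 100%, file creation fails with "No space left on device" even though blocks remain free.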

The solution to both issues that has proven successful to date is to create a partition with a block size of just 1024 bytes (1 KiB) and to specify one inode for every 1 KiB of storage. This addresses storage efficiency and also ensures there should be enough inodes to store the large number of web map tiles generated. However, storage efficiency is impaired by the disk space needed to store such a large number of inodes, which on a 220 GiB partition is estimated at around 50 GiB. More efficient on-disk storage of files is thus traded off against the large number of inodes needed for many millions of small files. In the end there is probably not much net improvement in storage efficiency, but on the other hand the efficiency improves as the disk fills, the relationship between tile count and disk space occupied is more nearly linear, and there is assurance there won't be any more problems with insufficient inode availability.
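A minimal sketch of such a format, with the device name as a placeholder, plus the arithmetic behind the inode overhead (assuming ext4's default 256-byte inodes), which lands close to the estimate above:

```shell
# DESTRUCTIVE if run: formats the partition; /dev/sdXn is a placeholder.
#   mkfs.ext4 -b 1024 -i 1024 -L webmaptiles /dev/sdXn
# -b 1024 sets 1 KiB blocks; -i 1024 allocates one inode per KiB of space.

# Cost of that inode density on a 220 GiB partition:
inodes=$(( 220 * 1024 * 1024 ))   # one inode per KiB of space
bytes=$(( inodes * 256 ))         # inode tables at 256 bytes per inode
echo "inode tables: $(( bytes / 1024 / 1024 / 1024 )) GiB"   # 55 GiB
```

So the inode tables consume roughly a quarter of the partition up front, which is the trade-off described above.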

The ext4 file system also supports extent-based storage (in fact extents are enabled by default in current mke2fs configurations), which is said to overcome some of the limitations of the conventional block-mapped file layout, but I haven't yet evaluated it at any level.

When webmaps production started, other issues considered were how to provide operational backups, and how to preview maps before they were uploaded. I even considered installing a version control server to collect multiple editions of each tile as they were generated, making it easy to roll back to a previous generation if some sort of problem occurred. This option was never implemented, and over time keeping multiple generations has proven unnecessary: rebuilding a layer for one volume is quick and easy with the various scripts, especially with a separate computer building the volumes so the editing computer is not tied up. Backups were a little trickier to start with, but what has been implemented is to exclude the individual tiles from backup; instead, only the mbtiles files produced for each layer are backed up, using rsync between computers. The full backup to an external removable disk still copies all the individual tiles, but these backups are incremental and not done very often.
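One way to express that backup policy with rsync's filter rules is sketched below, demonstrated against temporary directories; in practice the destination would be a remote path such as user@backuphost:/backups/, and the layer and file names here are invented:

```shell
# Build a fake layer tree: one .mbtiles file plus a raw tile.
src=$(mktemp -d); dest=$(mktemp -d)
mkdir -p "$src/layer1/tiles"
touch "$src/layer1/layer1.mbtiles" "$src/layer1/tiles/0.png"

# Descend into all directories, copy only *.mbtiles, exclude everything
# else; -m prunes directories left empty by the exclusions.
rsync -am --include='*/' --include='*.mbtiles' --exclude='*' "$src/" "$dest/"

find "$dest" -type f   # only layer1/layer1.mbtiles arrives
```

The order of the include/exclude rules matters: the first rule that matches a file decides its fate, so the catch-all exclude must come last.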

Previewing the maps is handled by an Apache web server with PHP installed on the build computer, which means each layer can be checked as soon as it is built, both as raw tiles and as mbtiles, although Apache permissions can make configuration a bit of a headache. Checking each layer on a local preview avoids the significant delay of waiting for a layer to upload before checking it live. On the live web server, uploads always go to a test preview of the site first, before the files are copied directly on the server to the live section.