bitfontCore

  bitfontCore is a resource which has been gathered from a couple of other bitmapped font resources. The biggest one that I've found is here. You can look through those yourself, but I have already written parsers for the .yaff and .draw files and gathered the font glyphs into one big set. There aren't a lot of pictures for this one, but you can find some in the previous couple of articles which have used this data.

  This project has been evolving for a little while, and I think I now have it packaged well enough to make it easy for myself and others to use. The current format is a large JSON structure which contains two pieces of information per character: first, a set of dimensions, width and height, and second, a list of unsigned integers which make up the bit pattern. Even using JSON, I have been able to encode this efficiently by using the bits of these unsigned int values as boolean on/off values for each pixel in the font glyph. This does limit the usage to purely bitmapped font glyphs of known x,y dimensions, but for this application it seemed like a good fit.

Usage


    "10012": {
        "d": [
            40,
            252,
            68,
            72,
            120,
            72,
            64,
            68,
            252,
            0,
            0,
            0
        ],
        "x": 7,
        "y": 12
    }

  The basic record looks like the above, and should be pretty easy to use in any language. This is entry number 10012, which is an accented capital letter E. To decode the pattern, we get the glyph's x,y dimensions from the "x" and "y" fields. The "y" value is redundant if you can query the number of elements in the data subarray, but the "x" value tells you how many bits of width you need to consider when loading the glyph.
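
  Below is a rough sketch of what loading one of these records might look like in C++. I'm using the nlohmann/json library here purely for illustration, and the struct and function names are hypothetical, not taken from the repo.

    #include <cstdint>
    #include <string>
    #include <vector>
    #include <nlohmann/json.hpp>

    // simple container for one glyph record - names are illustrative, not from the repo
    struct Glyph {
        int width  = 0;                 // "x" field - bits of width to consider per row
        int height = 0;                 // "y" field - number of rows (redundant with rows.size())
        std::vector<uint64_t> rows;     // "d" field - one unsigned integer per row
    };

    // pull one record out of the parsed top-level JSON object by its key, e.g. "10012"
    Glyph LoadGlyph ( const nlohmann::json &font, const std::string &key ) {
        Glyph g;
        const auto &record = font.at( key );
        g.width  = record.at( "x" ).get<int>();
        g.height = record.at( "y" ).get<int>();
        for ( const auto &row : record.at( "d" ) )
            g.rows.push_back( row.get<uint64_t>() );
        return g;
    }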

  Next is row-by-row evaluation of the bit pattern - to get the rightmost bit in the top row, we take the least significant bit of the value stored in the first element of the "d" array and check whether it is set. Subsequent bits in the row come from the second least significant bit, and so on, as shown below, until you reach the specified width. The second element of the "d" array is the second row, and so forth.

  I have limited the visual here to one byte of width, since the "x" value is given as 7. Zero/false/off values are shown as dots and one/true/on values are shown as 0s, just to accentuate the contrast a bit. I used C++ uint64_t's for the encoding, because no glyph in the collection is more than 64 bits wide, and I figure most languages have 64 bit unsigned integer support.


     Decimal   Binary

         40  = [ . . 0 . 0 . . . ]
         252 = [ 0 0 0 0 0 0 . . ]
         68  = [ . 0 . . . 0 . . ]
         72  = [ . 0 . . 0 . . . ]
         120 = [ . 0 0 0 0 . . . ]
         72  = [ . 0 . . 0 . . . ]
         64  = [ . 0 . . . . . . ]
         68  = [ . 0 . . . 0 . . ]
         252 = [ 0 0 0 0 0 0 . . ]
         0   = [ . . . . . . . . ]
         0   = [ . . . . . . . . ]
         0   = [ . . . . . . . . ]
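
  Continuing the sketch above, decoding is just shifting and masking bits - this isn't the repo's exact implementation, just one way to walk the pattern and reproduce the table:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // render a glyph's rows as text, '.' for off pixels and '0' for on pixels, as in
    // the table above - the rightmost pixel of a row is the least significant bit, so
    // printing left to right means walking from the high bit of the width down to bit 0
    void PrintGlyph ( const std::vector<uint64_t> &rows, const int width ) {
        for ( const uint64_t row : rows ) {
            for ( int bit = width - 1; bit >= 0; bit-- ) {
                const bool on = ( row >> bit ) & 1ULL;
                std::cout << ( on ? "0 " : ". " );
            }
            std::cout << "\n";
        }
    }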

  The data is available here. This is the processed JSON record of all the distinct encoded font glyphs, with duplicates removed. There are some alignment issues that may need to be addressed - specifically, I seem to be adding an extra bit to the width of the glyphs. This should be simple enough to fix and re-encode, and while I'm at it, I'll probably trim/pad the data so there is only a single bit of whitespace on all sides of every glyph. The dimensions don't really matter for this application; all the information relating to use as a font has already been stripped off, so each record is simply a bit pattern. Using the full model with the font data is still possible, but it is a bit more involved, and the data files for that are more than 10 times larger. You can find a reference implementation of a decoder as described above in the linked repo.

  I think these have a huge amount of creative potential as a source of random data with interesting forms and a lot of coherent shapes. As far as applications go, you can see a couple of usages in Voraldo, including the Spaceship Generator, and in another quick project where I used some of the glyph dimension data to find well-distributed glyphs by coverage, to replicate pixel intensity values. I'll certainly be using this resource again for things in the future.
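
  For the coverage-based selection mentioned above, the metric is roughly the fraction of set bits over the glyph's bounding box. A hypothetical sketch of that idea (the exact metric in that project may have differed):

    #include <bit>        // std::popcount ( C++20 )
    #include <cstdint>
    #include <vector>

    // rough coverage metric: fraction of on pixels over the glyph's bounding box -
    // the metric used in that project may have differed, this just shows the idea
    float Coverage ( const std::vector<uint64_t> &rows, const int width ) {
        const uint64_t mask = ( width < 64 ) ? ( ( 1ULL << width ) - 1ULL ) : ~0ULL;
        int onCount = 0;
        for ( const uint64_t row : rows )
            onCount += std::popcount( row & mask );   // count set pixels in this row
        return float( onCount ) / float( width * int( rows.size() ) );
    }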

  I think there may be some value in classifying these characters with another data field, giving the most similar character - a-z/A-Z/0-9 alphanumerics or special characters. That would be a largely manual process, and I don't see an easy way to make it happen. Given that there are over 70k glyphs, it's not really practical to do manually without building a tool for it. I still might, we'll see.


Last updated 11/28/2022