JBIG2: A short history
As amazing as it seems, the idea of transmitting images over wire came into being before the Pony Express. Alexander Bain filed the first patent for image transmission via wire in 1843, some 17 years before the first Pony Express delivery in 1860. Bain's idea was realized by the end of the Civil War. By 1865, images were being transmitted over telegraph lines. (By then, the Pony Express had been defunct since 1861.)
A century later, the Xerox Magnafax Telecopier could connect to any telephone line and transmit bi-level images. (A bi-level image is an electronic image in which each picture element [pixel] is represented by only one bit, which can be either on or off.) Within two decades, many businesses around the world had adopted facsimile technology - the fax machine - into their everyday practice.
As people have embraced the Internet and personal computers for both business and home use, the fax machine has waned somewhat in popularity. Today, document transmission can be done via e-mail, a Web site, a wireless personal digital assistant or even a cell phone. But the much-anticipated "paperless office" is still a myth in present practice. (The "paperless office" was once envisioned as an ideal world in which the need for paper would fall significantly as electronic communications and documents increased. In reality, the use of paper copies has expanded even as businesses adopt electronic document media. A variety of studies, including Sellen and Harper's The Myth of the Paperless Office, show that Web use and e-mail use both substantially increase the amount of paper used in an office.)
As the number of electronic documents increases, so does the need to optimize electronic document files for transmission speed and the smallest storage space possible. Scanned books and manuals need to be accessed and manipulated easily. Checks and legal documents must be accurately stored and filed electronically, and available for easy and immediate retrieval. As appropriate, digital media must be ready for Web and LAN-based transmission and display.
There are a number of file formats available for document transmission and storage, some more efficient than others. In the pages that follow, you'll see why file formats make a difference - and why JBIG2 is being heralded as a breakthrough format.
Image compression experts from a variety of companies (including but not limited to Siemens, AT&T, IBM and Xerox) combined their skills to develop the JBIG2 standard for compression of black and white (bitonal) scanned documents, which was approved by the International Telecommunication Union (ITU). The name reflects this collaboration: "JBIG" stands for "Joint Bi-level Image Experts Group," the committee that produced the standard, and the "2" marks it as the group's second standard.
As you're about to see, the advantage of JBIG2 over its predecessors is its much higher compression ratio: a JBIG2 file retains the same information as the original scan in far less space. The results are more efficient storage and faster transmission - sometimes 10x faster - than with TIFF G3, TIFF G4, or JBIG (also known as JBIG1).
The TIFF Group 3 (TIFF G3) standard was introduced in 1980. It gained wide acceptance in the digital imaging world as the "go to" standard for fax-based transmission. The TIFF Group 4 (TIFF G4) standard of 1985 remains the effective standard for archiving and digital scanners.
Due to the higher compression rates offered by JBIG2, digital imaging devices that currently utilize a TIFF compression filter are expected to convert to a JBIG2 filter for bitonal scans in the near future. Consider this example: TIFF G4 is often 15x smaller than a raw 300 dpi (dots per inch) black and white image file. Yet a compact JBIG2 version of the same file will be 5x - 15x smaller than the TIFF G4 file.
Figure 2. Across a standard set of scanned documents, the TIFF files are much smaller than the raw files, while in turn the JBIG2 files are much smaller than the TIFF files.
The comparison is even more striking when the original file has an electronic origin (for example, a file generated by, and converted from, a Corel Suite or Microsoft Office application). In these cases, JBIG2 files can be 20x - 30x smaller than a comparable TIFF G4 encoding.
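To put the ratios above into bytes, here is a minimal worked sketch. The page size (US Letter, 8.5 x 11 inches) and the function name are assumptions made for illustration and are not part of this Primer; only the 15x, 5x - 15x and 20x - 30x ratios come from the text.

```python
# Assumption (not from the Primer): a US Letter page, 8.5 x 11 inches.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def raw_bitonal_bytes(dpi, width_in=PAGE_WIDTH_IN, height_in=PAGE_HEIGHT_IN):
    """Uncompressed size of a 1-bit-per-pixel page, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # 8 pixels per byte, rounded up
    return bytes_per_row * height_px


raw = raw_bitonal_bytes(300)

# Ratios quoted in the text: G4 is often 15x smaller than raw,
# JBIG2 is 5x - 15x smaller than G4 for scanned pages,
# and 20x - 30x smaller for pages of electronic origin.
g4 = raw / 15
jbig2_scanned = (g4 / 15, g4 / 5)
jbig2_electronic = (g4 / 30, g4 / 20)

print(f"raw 300 dpi page : {raw / 1024:8.1f} KB")
print(f"TIFF G4          : {g4 / 1024:8.1f} KB")
print(f"JBIG2 (scanned)  : {jbig2_scanned[0] / 1024:.1f} - {jbig2_scanned[1] / 1024:.1f} KB")
print(f"JBIG2 (electronic): {jbig2_electronic[0] / 1024:.1f} - {jbig2_electronic[1] / 1024:.1f} KB")
```

Under these assumptions a raw page is about 1 MB, the TIFF G4 version is roughly 70 KB, and the JBIG2 version of a scanned page falls to somewhere between about 5 KB and 14 KB. Real results depend heavily on page content.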
There's another reason behind the likely transition from TIFF to JBIG2: a JBIG2-encoded file only shows a minimal increase in size with increased scanner resolution. In contrast, TIFF encoding generates a linear increase in file size corresponding to the increase in scanning resolution. So, in practical terms a 300 dpi TIFF file is usually 50% or so larger than its equivalent 200 dpi TIFF file. In contrast, a 300 dpi JBIG2 file will only be marginally larger than its equivalent 200 dpi file - and in cases where some advanced JBIG2 features can be applied, the JBIG2 300 dpi file may actually be smaller than its 200 dpi equivalent with no loss of accuracy or quality.
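The "50% or so larger" figure follows directly from the linear relationship described above. The short sketch below checks that arithmetic and, for comparison, shows how fast the raw pixel count grows. The function names are illustrative only, and the linear model is the rule of thumb stated in the text, not a property guaranteed by the TIFF format.

```python
def linear_growth(dpi_from, dpi_to):
    """Size multiplier if file size grows linearly with resolution (the TIFF rule of thumb)."""
    return dpi_to / dpi_from


def pixel_growth(dpi_from, dpi_to):
    """Multiplier for the raw pixel count, which grows with the square of resolution."""
    return (dpi_to / dpi_from) ** 2


tiff_factor = linear_growth(200, 300)  # 1.5, i.e. 50% larger
raw_factor = pixel_growth(200, 300)    # 2.25, i.e. 125% more pixels

print(f"TIFF size multiplier (linear model): {tiff_factor}")
print(f"Raw pixel-count multiplier         : {raw_factor}")
```

Moving from 200 dpi to 300 dpi more than doubles the number of pixels, so a format whose file size barely changes with resolution is spending very few bits on the extra detail.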
Introduced as an ITU standard in 1993, JBIG (also called JBIG1) never achieved the acceptance that TIFF G4 enjoyed even though it provided a 20-30% reduction in file size over TIFF G4. Such a reduction rate never generated sufficient enthusiasm among the digital imaging community to justify broad-based industry support. Consequently, JBIG was mostly used for bitonal image compression on a very limited range of (mostly Japanese) MFP devices and digital copiers.
In contrast, the digital media industry has readily received the JBIG2 standard. Almost from the time of its introduction, JBIG2 was supported for bitonal compression in the JPEG 2000 Part 6 specifications, and as a compression filter in Adobe PDF. It quickly became the format of choice for a number of document-heavy organizations including legal, media, financial, scanning and banking firms.
One advantage held by both JBIG and JBIG2 over TIFF G3 and G4 is the JBIG formats' ability to use arithmetic coding instead of Huffman coding. Again, the key difference is the higher compression ratio that arithmetic coding brings to the JBIG standards. Arithmetic coding allows a very probable piece of data to be represented, on average, by a fraction of a bit. In comparison, Huffman coding requires at least one whole bit for every run it encodes, however predictable that run is, resulting in a lower compression ratio for the TIFF formats.
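The whole-bit limit can be seen in a small numerical sketch. The three run classes and their probabilities below are hypothetical, chosen only to represent a page dominated by one very common run. The code builds a textbook Huffman code for them and compares its average length with the entropy, which is the average cost that arithmetic coding can approach. It does not reproduce the fixed code tables that G3 and G4 actually use.

```python
import heapq
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol: the floor that arithmetic coding approaches."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code_lengths(probs):
    """Return the code length in bits that a Huffman code assigns to each symbol."""
    # Heap items are (probability, unique tiebreak, symbols in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    tiebreak = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths


# Hypothetical run statistics (an assumption for illustration only).
run_probs = {"long_white": 0.90, "short_black": 0.05, "short_white": 0.05}

lengths = huffman_code_lengths(run_probs)
huffman_avg = sum(run_probs[s] * lengths[s] for s in run_probs)
ideal_avg = entropy_bits(run_probs)

print("Huffman code lengths:", lengths)
print(f"Huffman average : {huffman_avg:.3f} bits per run")
print(f"Entropy (ideal) : {ideal_avg:.3f} bits per run")
```

With these numbers the Huffman code cannot spend less than one bit on the dominant run, so its average stays above one bit per run, while the entropy is well under one bit. That gap is the room arithmetic coding can exploit.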
JBIG2 introduced a number of features not available in JBIG, including:
- File size reduction of 90% or more.
- Support for several lossy and lossless compression modes.
- Readily available viewer support in the form of Adobe Reader when the JBIG2 file stream is PDF-wrapped.
- Machine learning of font classes.
A simple definition of codec could be "any type of format used to compress a file to a smaller size." Clearly, the formats previously addressed in this Primer fit well into this definition. Compression protocols have been around for some time; development on codecs began decades ago.
But not all compression codecs qualify as "smart compression codecs." Earlier codecs, including JPEG, TIFF G3, and TIFF G4, employed fully-specified compression standards for both encoding and decoding. Growth in CPU power has since made possible newer smart compression codecs, including JBIG2, JPEG 2000, and MPEG-4. These protocols specify precisely how to decode a file, but do not specify precisely how to encode one. This enables a sophisticated vendor to utilize a wide range of techniques to improve image quality, increase the compression ratio and print speed, and enhance other document properties.
In many ways the previous generations of compression codecs were similar in structure to uncompressed image formats. Raw image formats such as BMP store the value of every pixel in the image. The first image compression standards effectively did the same thing. In older compression standards such as TIFF, the decoder would traverse the entire image and "paint" each pixel based on the information in the compressed file. Even if a page was completely blank, the TIFF specs required each row of the compressed file to explicitly state that it was blank. Of course, this limited the compression advantage of the older compression codecs.
To improve significantly on the compression rates of these older standards, JBIG2 uses a symbol dictionary and an addressing stream. The symbol dictionary contains the bitmaps of the symbols that recur on the page (in a text document, effectively the characters of its fonts). The addressing stream says where to place each symbol in the image. It is up to the encoder to choose and define the symbols in the symbol dictionary and to find the most efficient way to properly address them in the image.
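The division of labor between a symbol dictionary and an addressing stream can be sketched in a few lines. This is a toy model under stated assumptions, not the JBIG2 bitstream: the glyph bitmaps, the function names `encode` and `decode`, and the exact-match rule are all hypothetical, and a real encoder must first find the symbols on the page and decide which ones are similar enough to share a dictionary entry.

```python
def encode(placed_glyphs):
    """Split placed glyphs into a symbol dictionary and an addressing stream.

    placed_glyphs: list of (bitmap, x, y), where bitmap is a tuple of
    row strings made of '0' and '1'. Only exact duplicates are merged.
    """
    dictionary = []  # each distinct bitmap is stored once
    index = {}       # bitmap -> position in the dictionary
    addresses = []   # (symbol_id, x, y) for every occurrence on the page
    for bitmap, x, y in placed_glyphs:
        if bitmap not in index:
            index[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        addresses.append((index[bitmap], x, y))
    return dictionary, addresses


def decode(dictionary, addresses, width, height):
    """Rebuild the page by painting each addressed symbol onto a blank canvas."""
    page = [[0] * width for _ in range(height)]
    for symbol_id, x, y in addresses:
        for dy, row in enumerate(dictionary[symbol_id]):
            for dx, bit in enumerate(row):
                if bit == "1":
                    page[y + dy][x + dx] = 1
    return page


# Two tiny hypothetical glyphs, three pixels tall.
GLYPH_I = ("1", "1", "1")
GLYPH_L = ("10", "10", "11")

# A "line of text" reading L I L L: the L bitmap occurs three times.
glyphs = [(GLYPH_L, 0, 0), (GLYPH_I, 3, 0), (GLYPH_L, 5, 0), (GLYPH_L, 8, 0)]

dictionary, addresses = encode(glyphs)
page = decode(dictionary, addresses, width=10, height=3)

print("dictionary entries:", len(dictionary))
print("addressing stream :", addresses)
for row in page:
    print("".join("#" if bit else "." for bit in row))
```

The L bitmap is stored once and referenced three times, so each repeat costs only a symbol number and a position. On a full page of text, where the same characters recur hundreds of times, this is where the savings come from.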
The history of paperless document transmission and the development of compression standards to expedite that transmission demonstrate that efficient encoding of captured bitonal and color documents is more compelling than ever. From fax transmission through the electronic transmission and storage of complex multilayer color image and text documents, JBIG2 is rapidly gaining acceptance as the new bitonal compression standard for the digital imaging and printing industry.