TITLE:
Improving Compression of Short Messages
AUTHORS:
Paul Gardner-Stephen, Andrew Bettison, Romana Challans, Jennifer Hampton, Jeremy Lakeman, Corey Wallis
KEYWORDS:
Lossless; Text Compression; SMS; Twitter; Arithmetic Coding; Mobile; Cellular; Mesh Network
JOURNAL NAME:
International Journal of Communications, Network and System Sciences, Vol. 6 No. 12, December 16, 2013
ABSTRACT:
Compression of short text strings, such as the GSM Short Message
Service (SMS) and Twitter messages, has received relatively little attention
compared to the compression of longer texts. This is not surprising given that
for typical cellular and internet-based networks, the cost of
compression probably outweighs the cost of delivering uncompressed messages.
However, this is not necessarily true in the case where the cost of data transport is high, for example, where satellite
back-haul is involved, or on bandwidth-starved mobile mesh networks, such as
the mesh networks for disaster relief, rural, remote and developing contexts
envisaged by the Serval Project [1-4]. This motivated the development of a
state-of-the-art text compression algorithm that could be used to compress
mesh-based short-message traffic, culminating in the stats3 SMS compression
scheme described in this paper. Stats3 uses word frequency and 3rd-order
letter statistics embodied in a pre-constructed dictionary to effect lossless
compression of short text messages. Our evaluation shows that stats3 typically
reduces text messages to less than half of their original size, and in so
doing substantially outperforms all publicly available SMS compression systems,
while also matching or exceeding the marketing claims of
the commercial options known to the authors. We also outline approaches for
future work that has the potential to further improve the performance and
practical utility of stats3.
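The core idea named above, predicting each character from the characters before it using statistics gathered in advance and then arithmetic-coding it, can be illustrated with a minimal Python sketch. This is not the stats3 implementation. It assumes that "3rd-order" means conditioning on the preceding three characters, uses a hypothetical toy training corpus in place of the pre-constructed dictionary, omits the word-frequency model entirely, and substitutes simple additive smoothing (with a hypothetical constant ALPHA) so that any 8-bit character remains encodable. Rather than emitting a bitstream, it computes the ideal code length, the sum of -log2 p over the message, which an arithmetic coder driven by the same model approaches closely.

```python
import math
from collections import defaultdict

ORDER = 3            # assumption: "3rd-order" = condition on the preceding three characters
ALPHABET_SIZE = 256  # assumption: 8-bit characters
ALPHA = 0.05         # hypothetical additive-smoothing constant (not taken from the paper)
PAD = "\0" * ORDER   # start-of-message context


def train(messages):
    """Count how often each character follows each 3-character context.

    This table is a hypothetical stand-in for the pre-constructed dictionary.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for message in messages:
        padded = PAD + message
        for i in range(ORDER, len(padded)):
            counts[padded[i - ORDER:i]][padded[i]] += 1
    return counts


def probability(counts, context, symbol):
    """Additively smoothed P(symbol | context); never zero, so every character can be coded."""
    table = counts.get(context, {})  # .get avoids inserting empty contexts
    total = sum(table.values())
    return (table.get(symbol, 0) + ALPHA) / (total + ALPHA * ALPHABET_SIZE)


def ideal_code_length_bits(counts, message):
    """Sum of -log2 P(char | preceding three chars) over the whole message."""
    padded = PAD + message
    bits = 0.0
    for i in range(ORDER, len(padded)):
        bits -= math.log2(probability(counts, padded[i - ORDER:i], padded[i]))
    return bits


# Hypothetical toy corpus standing in for real training text.
corpus = [
    "see you at the station at ten",
    "running late, see you soon",
    "at the station now",
]
model = train(corpus)
msg = "see you at the station"
bits = ideal_code_length_bits(model, msg)
print(f"{bits:.1f} bits vs {8 * len(msg)} bits uncompressed")
```

In this sketch a context never seen in training falls back to a uniform 1/256 (exactly 8 bits per character), while text resembling the training data costs far less; the smoothing constant trades the cost of familiar text against the cost of characters the model has not seen in a given context.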