To give you a simple, rudimentary look at how easy it is to disguise data within data, the subheading for this blog is "The art of hiding data in plain sight". This actually contains a hidden message: combined with another piece of data - 100,914,928 - it becomes something totally different. Feel free to try and work it out - the hidden message is at the end of the blog if you are interested.
In today's society, the ability to communicate privately with someone else is getting increasingly difficult. Online conversations are stored across several databases and, depending on what country you are in, the authorities can request access to communications you have been part of (some countries request this in a more civil form than others). This does not even factor in data breaches and hackers who can obtain your data without your consent. This is where steganography (alongside encryption) is a tool which can be used for positive and negative motives. On one hand it can help nefarious cyber criminals and terrorists conceal their communications; on the other, it can be used to bypass censorship where free speech or the flow of information is restricted (I'm sure there are parts of the world you can think of that this may apply in).
Pictured above is Johannes Trithemius, who in the 15th century wrote a trilogy of books, ‘Steganographia’ (also pictured), which is believed to be one of the first examples of recorded steganography due to the hidden data inside each book (Arif and Hajjdiab, 2017). This is where the word steganography (nothing to do with Stegosaurus, sadly) originates from. It is therefore evident that data concealment is nothing new: people have used codes, tattoos and custom languages, among many other methods, in the past. But with new digital mediums providing bountiful steganographic possibilities, I decided to focus my project on how steganography can be performed via digital mediums.
Project origins and aims
As mentioned in a previous blog detailing my second semester at Abertay, this project was part of one of my second semester modules, "Introduction to Security".
I have always been fascinated by the art of hiding a picture within a picture and how messages can be hidden. This led me to want to explore how much data could be stored in different file types without impacting the functionality of the file the data is being hidden in (this file is called the carrier file).
This led to me exploring what file types make the best steganographic mediums and how the process of hiding a message using various file types can be achieved.
What does this do to the functionality and size of the document?
Will it be easy to detect any obvious signs of steganography?
How the project was achieved
The decision to use OpenPuff was straightforward, as it supports the use of a range of file types within the steganography process. Notably, it supports the use of PDF (unlike other open source steganography tools I explored) - a steganographic medium I was keen to experiment with.
For this project I wanted to hide a message that was:
The data chosen for the hidden message in this project was taken from the final submission for a previous module at Abertay, as it satisfies all the above criteria (and can be found here). The hidden message was inserted into CyberChef, where it was encrypted using a ROT-47 cipher and then converted to an ordinal integer array.
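As a rough illustration of that preparation step, here is a minimal Python sketch of a ROT-47 rotation followed by a conversion to a list of character codes. It is not the CyberChef recipe used in the project; the function names and the sample message are made up for the example, and it assumes "ordinal integer array" simply means the decimal code of each character.

```python
def rot47(text: str) -> str:
    """Rotate printable ASCII (codes 33 to 126) by 47 places; other characters pass through."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)
    return "".join(out)


def to_ordinals(text: str) -> list[int]:
    """Turn each character into its integer code (assumed meaning of 'ordinal integer array')."""
    return [ord(ch) for ch in text]


def from_ordinals(codes: list[int]) -> str:
    """Reverse of to_ordinals."""
    return "".join(chr(c) for c in codes)


message = "Hello, World!"          # hypothetical sample, not the project's hidden message
scrambled = rot47(message)         # ROT-47 is its own inverse: applying it twice restores the text
codes = to_ordinals(scrambled)
recovered = rot47(from_ordinals(codes))

print(scrambled)
print(codes)
print(recovered)
```

Because ROT-47 rotates by exactly half of the 94 printable characters, the same function both scrambles and unscrambles the text.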
PDF Carrier file suitability compared to other file types
Steganography within images is something we had covered in our second semester module, Introduction to Security. We looked at how the 1s and 0s that make up each image can be used to hide data. Using only the least significant bit lets you hide less data, but with a minimal impact on the original file. The more significant bits you use to hide the data, the more the image will be altered and the more evidence there is that steganography has taken place.
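The least significant bit idea can be sketched in a few lines of Python. This is a toy version that works on a plain sequence of bytes standing in for pixel values, not a real image format, and it is not how OpenPuff does it; `embed_lsb`, `extract_lsb` and the sample values are hypothetical names chosen for the example.

```python
def embed_lsb(carrier: bytes, payload: bytes) -> bytes:
    """Hide payload in the least significant bit of each carrier byte (one bit per byte)."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier is too small for this payload")
    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit   # clear the lowest bit, then set it to the payload bit
    return bytes(out)


def extract_lsb(carrier: bytes, n_bytes: int) -> bytes:
    """Read n_bytes of payload back out of the lowest bit of each carrier byte."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for b in carrier[i * 8:(i + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)


pixels = bytes(range(200, 232))          # 32 made-up "pixel" bytes
stego = embed_lsb(pixels, b"Hi!")        # 3 bytes of payload need 24 carrier bytes
largest_change = max(abs(a - b) for a, b in zip(pixels, stego))

print(extract_lsb(stego, 3))
print(largest_change)
```

Each carrier byte changes by at most 1, which is why the alteration is hard to see, but it also means eight carrier bytes are needed for every hidden byte.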
To see the strengths and limitations of PDF files as carrier files (files which will carry hidden information), I compared similar sized PDF, mp3 and mp4 files (a file size range of 144kb).
By trying to use these files to hide data using OpenPuff, it was quickly apparent that the PDF file lagged behind the MP3 and MP4 files. It makes grim reading for PDF files when their steganographic potential (ability to conceal data) is compared to a smaller PNG file. The above image shows the Bible.pdf used in this project only being able to store 11% and 7% of the data the mp4 and mp3 files can store respectively (pictured below).
This is highlighted further when compared to the bytes available in the aurora_borealis.png file, which had the capacity to store 51,760 bytes at maximum bit selection (figure 8) and 12,928 bytes at the minimum bit selection (figure 9). Despite the png file being 5.78 times smaller than the Bible.pdf, it could store 3,266.7% and 13,379.2% more data at minimum and maximum bit selection respectively (maximum bit comparison pictured below).
This made disguising the hidden data in PDF carrier files problematic, and to successfully hide and unhide the hidden file using OpenPuff, a png and an mp3 file were needed.
The original Bible.pdf file used in this project, 6,131kb in size, had a capacity to store 512 bytes as a carrier file.
The hidden message of 54kb required 55,126 bytes to be hidden using OpenPuff, which would require 108 Bible.pdf files (55,126 / 512, rounded up) to conceal the hidden message.
This amounts to 662.148mb worth of PDF files to conceal a message 0.008% of its size.
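The arithmetic behind the 662.148mb total can be checked with a short Python sketch. All of the figures come from the paragraph above; the variable names are just for illustration, and sizes are kept in the same kb/mb units the post uses (1mb taken as 1,000kb).

```python
import math

PDF_SIZE_KB = 6_131          # size of Bible.pdf
PDF_CAPACITY_BYTES = 512     # bytes OpenPuff could hide in one copy of it
PAYLOAD_BYTES = 55_126       # bytes needed to hide the message
MESSAGE_SIZE_KB = 54         # size of the hidden message itself

# You cannot use a fraction of a carrier file, so round up.
files_needed = math.ceil(PAYLOAD_BYTES / PDF_CAPACITY_BYTES)
total_kb = files_needed * PDF_SIZE_KB
message_share_percent = MESSAGE_SIZE_KB / total_kb * 100

print(f"PDF files needed: {files_needed}")
print(f"Total carrier size: {total_kb / 1000:.3f}mb")
print(f"Message as a share of carrier size: {message_share_percent:.3f}%")
```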
This demonstrates that concealing large quantities of data in PDF files is ineffective; however, using OpenPuff to communicate shorter messages privately can be effective.
This is where PDF files could be advantageous: they are widespread in workplaces and the greater internet, the difference in file size after steganography is negligible and PDF files retain full functionality. This is potentially very useful for private conversations between two people who don't want any interference and have access to the same PDF files on a common server. But these communications would need to be concise.
You have looked at different files and their suitability for Steganography, how do you actually do it?
Steganography can be broken up into two main steps: encryption and decryption.
OpenPuff is very user friendly and the process of concealing data requires four steps.
The four steps must be replicated by the user who receives the carrier files output from OpenPuff, to reveal the contents of the hidden message.
Step 1 - The user is asked to insert up to three uncorrelated data passwords (you can use only one password, but skimping on security is silly), which for this project were: A - “Security”, B - “Assignment” and C - “Practical”.
Step 2 - Select the data you wish to conceal.
Step 3 - Choose the carrier files which will be used to hide the data
Step 4 - Select how much space within each carrier file type you wish to use for hiding data (a bigger bit selection will alter each file more significantly, and the bit selection level for each file type needs to be replicated to unhide the contents of the hidden file)
In the example I used to hide my message, I used five carrier files to conceal the contents of the hidden data. The order they appear in OpenPuff (the chain order) is fundamental to the success of unhiding the data.
Should the chain order be altered when trying to unhide the message, it will be unsuccessful.
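To get a feel for why the chain order matters, here is a deliberately simplified Python toy. It is not OpenPuff's actual scheme (which also depends on the passwords and bit selection); it only assumes the hidden data is spread across the carriers in sequence, and every name and value in it is hypothetical.

```python
def split_across_carriers(payload: bytes, capacities: list[int]) -> list[bytes]:
    """Give each carrier, in chain order, the next slice of the payload it has room for."""
    if sum(capacities) < len(payload):
        raise ValueError("the carriers do not have enough space for the payload")
    chunks = []
    offset = 0
    for capacity in capacities:
        chunks.append(payload[offset:offset + capacity])
        offset += capacity
    return chunks


def reassemble(chunks: list[bytes]) -> bytes:
    """Join the slices back together in the order the carriers are listed."""
    return b"".join(chunks)


payload = b"meet me at the usual place"   # made-up message
capacities = [8, 8, 8, 8]                 # made-up capacity of four carriers, in bytes

chunks = split_across_carriers(payload, capacities)
right_order = reassemble(chunks)
wrong_order = reassemble([chunks[1], chunks[0], *chunks[2:]])  # first two carriers swapped

print(right_order)
print(wrong_order)
```

Even in this toy, swapping two carriers scrambles the result, so the receiver has to know the exact order the sender used.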
The picture below shows the settings used to hide data in my project. Simply clicking Hide Data! at the bottom right will begin the steganography process.
(Note it is not possible to hide a file if a file type is not accepted, the file is too big (>256mb) or the data carriers do not have enough space to hide the file.)
Every file used the maximum bit selection in this project, except the mp3 file which used medium bit selection, as this was all that was required to cover the size of the hidden message.
After successfully performing steganography, the carrier files are output to a folder (of your choice), which I then e-mailed to another device so the contents could be used on another machine.
I wanted to send it via e-mail to see if steganography would be detected, and if it was easy enough to install OpenPuff on another machine, download the carrier files and extract the hidden message, which (spoiler alert) was successful.
To decrypt the hidden message, it must first be extracted from the carrier files. The carrier files from the previous step were sent between two e-mail accounts created for this project. The original carrier files were also sent, so a comparison between files concealing and not concealing data could be made (see picture below to see the difference). To successfully extract the hidden message, the three uncorrelated passwords, chain order and bit selection settings for each file type must be replicated in OpenPuff.
Should any part of the chain order, bit selection and/or passwords be incorrect, the hidden data will remain concealed (beneficial from a security standpoint!).
A successful extraction will lead to the hidden file being output in a file location desired by the user.
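The comparison between an original carrier file and its data-carrying copy could also be done programmatically. The sketch below is one hedged way of doing that in Python, comparing size, byte-level differences and SHA-256 hashes; it was not part of the project, and the file names and contents are stand-ins written to a temporary folder so the example runs on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def compare_files(original: Path, modified: Path) -> dict:
    """Report how two files differ in size, content and hash."""
    a = original.read_bytes()
    b = modified.read_bytes()
    # Count positions where the bytes differ, plus any bytes one file has beyond the other.
    differing = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return {
        "size_difference": len(b) - len(a),
        "differing_bytes": differing,
        "same_sha256": hashlib.sha256(a).digest() == hashlib.sha256(b).digest(),
    }


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.bin"     # stand-in for an untouched carrier file
    modified = Path(tmp) / "modified.bin"     # stand-in for the same carrier after hiding data
    original.write_bytes(bytes([10, 20, 30, 40]))
    modified.write_bytes(bytes([10, 21, 30, 41]))
    report = compare_files(original, modified)

print(report)
```

A matching file size with a different hash is the kind of result you would expect if only a few low-order bits had been altered.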
The key for decrypting this message can be found on the Bible.pdf used as a data carrier and by following this process on CyberChef, the true contents of the encrypted message can be read by the user.
Any questions about my project, or ideas I should investigate for future research? Please let me know here.
Project resources and sources
What was the answer to the hidden message?
This is by no means a complex hidden message - but just to show you how simple it can be to hide a message in a message.
The number 100,914,928 = 0110 0000 0011 1101 0110 1111 0000 in binary.
If we line this binary number up alongside our hidden message (removing the spaces in the message and adding two 0s at the start of the number to make it the same length as the message), it looks like the following:
0001 1000 0000 1111 0101 1011 1100 00
Thea rtof hidi ngda tain plai nsig ht
If you only keep the letters with a 1 above them, you have the following letters - ARNGDAANPAINS
These letters can be arranged into "Granada, Spain" (with one N left over). Is this a rather strange way to show how steganography can work? Yes. Was this a good use of my time to come up with? Probably not! Was it quick to come up with? Sadly not!
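If you would rather let a computer do the lining up, the masking step above can be sketched in Python. The subheading and the number are taken from this blog; the variable names are just for illustration.

```python
NUMBER = 100_914_928
SUBHEADING = "The art of hiding data in plain sight"

# Remove the spaces so every letter lines up with exactly one bit.
letters = SUBHEADING.replace(" ", "")

# Write the number in binary and pad it with leading 0s to the length of the message.
mask = format(NUMBER, "b").zfill(len(letters))

# Keep only the letters that sit under a 1.
kept = "".join(ch for ch, bit in zip(letters, mask) if bit == "1")

print(mask)
print(letters)
print(kept.upper())
```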