Table of Contents


Abstract

Motivation

History

Principles

Techniques

Related Work

Limitation of Existing Techniques

Proposed Solution

Application

References
Abstract




                                                                     Mark Owens      [9]

       With computers now running at gigahertz processing speeds, information, whether stored or in

transmission, has become increasingly vulnerable to hostile eavesdropping, theft,

wiretapping, etc. This urges us to devise new data hiding techniques to protect and secure data

of vital significance. Steganography is a method of securing data by obscuring its contents in

another medium (called the cover) in which it is saved or transmitted. This doctoral thesis proposal will

present a new steganographic technique for hiding data in (ASCII) text files, together with its

software implementation. Text is a research area in steganography that is considered the

toughest of all to address.


Motivation
       While surfing the Net, I encountered an online article in USA Today titled “Terror

groups hide behind Web encryption”, claiming (though no publicized evidence yet exists) that

terrorists may be using steganography to communicate with each other in planning terrorist

attacks. This sparked my interest in developing a new concealment technique. It is conjectured that

images with hidden messages find ideal cover on bulletin boards, which serve as dead drops for other

terrorists to pick up and decode.


History
       Steganography dates back to ancient Greece when etching messages or images in

wooden tablets and covering them with wax, and tattooing a shaved messenger's head, letting

the hair grow back, and then shaving the head again to read the message were common

practices.


       Early in WWII steganographic technology consisted almost exclusively of invisible inks.

Sources for invisible inks include milk, vinegar, fruit juices and urine that darken when heated.


       The following message was sent by a German spy during WWII:

         Apparently neutral's protest is thoroughly discounted
         and ignored. Isman hard hit. Blockade issue affects
         pretext for embargo on by-products, ejecting suets
         and vegetable oils.


       Taking the second letter in each word the following message emerges:

         Pershing sails from NY June 1.


       When invisible inks became easy to reveal through improved technology, null ciphers

were used. Null ciphers are unencrypted messages concealed inside innocent-sounding

messages. An example of such a message is:


         Fishing freshwater bends and saltwater coasts rewards
         anyone feeling stressed. Resourceful anglers usually
         find masterful leapers fun and admit swordfish rank
         overwhelming anyday.

Taking the third letter in each word the following message emerges:

          Send Lawyers, Guns, and Money.
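
Both null ciphers above follow the same extraction rule, which can be sketched in a few lines of Python. The function name `nth_letters` is ours, not part of any cited tool. The sketch assumes that punctuation is ignored and that "by-products" is read as one hyphenated word; the trailing "i" of the first message stands for the digit 1.

```python
def nth_letters(text: str, n: int) -> str:
    """Collect the n-th letter (1-based) of every whitespace-separated word."""
    picked = []
    for word in text.split():
        # Keep letters only, so apostrophes, hyphens and periods are ignored.
        letters = [c for c in word if c.isalpha()]
        if len(letters) >= n:
            picked.append(letters[n - 1].lower())
    return "".join(picked)


spy_message = (
    "Apparently neutral's protest is thoroughly discounted "
    "and ignored. Isman hard hit. Blockade issue affects "
    "pretext for embargo on by-products, ejecting suets "
    "and vegetable oils."
)
angler_message = (
    "Fishing freshwater bends and saltwater coasts rewards "
    "anyone feeling stressed. Resourceful anglers usually "
    "find masterful leapers fun and admit swordfish rank "
    "overwhelming anyday."
)

print(nth_letters(spy_message, 2))     # pershingsailsfromnyjunei
print(nth_letters(angler_message, 3))  # sendlawyersgunsandmoney
```

The receiver needs only the position n, which plays the role of the shared secret.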


       The Germans developed the microdot technology during WWII. Microdots are text or

photographic images that are shrunk down to the size and shape of a period or the dot of an i or

j. Microdots were usually sent by writing a letter containing periods, i's, or j's, and the intended

recipient could read the messages using a microscope. Because of the extremely small size of

the microdots the messages typically went unnoticed by inspectors.


       A steganographic message generally appears to be something else, like an article or a

picture, or some other "cover" message. Drawings have often been used to conceal information

since it is easy to encode a message by varying lines, colors or other elements in pictures. This

tutorial will focus on image files to hide text messages.


Principles:
       Steganography can be split into two types: fragile and robust. The following

section defines these two types of steganography.


       Fragile

       Fragile steganography involves embedding information into a file which is destroyed if the

       file is modified. This method is unsuitable for recording the copyright holder of the file

       since it can be so easily removed, but is useful in situations where it is important to

       prove that the file has not been tampered with, such as using a file as evidence in a

       court of law, since any tampering would have removed the watermark. Fragile

       steganography techniques tend to be easier to implement than robust methods.


        Robust

        Robust marking aims to embed information into a file which cannot easily be destroyed.

        Although no mark is truly indestructible, a system can be considered robust if the

        number of changes required to remove the mark would render the file useless. Therefore

        the mark should be hidden in a part of the file where its removal would be easily

        perceived.


        There are two main types of robust marking. Fingerprinting involves hiding a unique

identifier for the customer who originally acquired the file and therefore is allowed to use it.

Should the file be found in the possession of somebody else, the copyright owner can use the

fingerprint to identify which customer violated the license agreement by distributing a copy of the

file.


        Unlike fingerprints, Watermarks identify the copyright owner of the file, not the

customer. Whereas fingerprints are used to identify people who violate the license agreement,

watermarks help with prosecuting those who have an illegal copy. Ideally fingerprinting should

be used, but for mass production of CDs, DVDs, etc., it is not feasible to give each disk a

separate fingerprint.


        Watermarks are typically hidden to prevent their detection and removal; such marks are said to

be imperceptible watermarks. However, this need not always be the case. Visible watermarks

can be used and often take the form of a visual pattern overlaid on an image. The use of visible

watermarks is similar to the use of watermarks in non-digital formats (such as the watermark on

British money).


Techniques:
        Information hiding techniques are receiving much attention today. The main motivations

are the fear that encryption services will be outlawed, and the wish of copyright owners to

protect confidential material and intellectual property against unauthorized access and

use in digital materials such as music, film, books and software through the use of digital

watermarks.


       A Steganographic System:




       fE:      steganographic function "embedding"

       fE^-1:   steganographic function "extracting"

       cover:   cover data in which emb will be hidden

       emb:     message to be hidden

       key:     parameter of fE

       stego:   cover data with the hidden message


       A Graphical Version of the Steganographic System:




       Steganographic messages may first be encrypted and then a cover message is modified

to contain the encrypted message, resulting in stego text. Only those who know the technique

used can recover the message and, if required, decrypt it.
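
The roles of fE, fE^-1, cover, emb, key and stego can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the cover is a plain byte string, the message bits replace the least significant bit of selected cover bytes, and the key seeds a pseudo-random generator that picks the positions. It is not the technique proposed in this thesis, and Python's `random` module is not cryptographically secure.

```python
import random


def embed(cover: bytes, emb: bytes, key: int) -> bytes:
    """fE: hide emb in the least significant bits of key-selected cover bytes."""
    bits = [(byte >> s) & 1 for byte in emb for s in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the message")
    # The key decides which cover bytes carry message bits (illustrative choice).
    positions = random.Random(key).sample(range(len(cover)), len(bits))
    stego = bytearray(cover)
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, n_bytes: int, key: int) -> bytes:
    """fE^-1: recover n_bytes of hidden message using the same key."""
    positions = random.Random(key).sample(range(len(stego)), n_bytes * 8)
    bits = [stego[pos] & 1 for pos in positions]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


cover = bytes(range(256)) * 2
stego = embed(cover, b"Hi", key=42)
print(extract(stego, 2, key=42))  # b'Hi'
```

The stego data has the same length as the cover, and each byte differs from the cover by at most its lowest bit.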


        The message may be a few thousand bits (often at 7 or 8 bits per text character)

embedded in millions of other bits. Probably the most typical use is digital images. Digital

images are commonly stored in either 24-bit or 8-bit files. If an 8-bit image is viewed as a grid

and the grid is made up of cells, these cells are called pixels. Each pixel consists of an 8-bit

binary number (or a single byte), and each 8-bit binary number refers to the color palette (a set

of colors defined within the image). In a 24-bit image, all color variations for the pixels are derived from three

primary colors: red, green, and blue. Each primary color is represented by 1 byte (= 8 bits).


        Digital watermarking technology is viewed as “an enabling agent allowing more

widespread sharing and use of that content while decreasing worry over piracy”. Today

steganography is often used for digital watermarking to hide copyright or ownership information

in an image, movie, or audio file. A copyright holder can pull the hidden copyright or ownership

information out of a suspect file to prove it is stolen. Digital watermarking is not used for

authenticating documents. (Digital signatures perform this task.) A digital watermark refers to

the ability to unobtrusively include information in a file, and is commonly executed through a

variety of cryptographic techniques, collectively known as steganography.


        Algorithms and transformations: Another steganography technique is to hide data in

mathematical functions that are in compression algorithms. The idea is to hide the data bits in

the least significant coefficients.


        Other techniques of steganography include spread spectrum steganography, statistical

steganography, distortion, and cover generation steganography.



Related Work (Text Techniques)
        While it is very easy to tell when a book has been copied by

photocopying, since the quality is noticeably different, it is more difficult when it comes to

electronic versions of text. Copies are identical, and it is impossible to tell whether a file is an original or a

copied version. To embed information inside a document we can simply alter some of its

characteristics. These can be either the text formatting or characteristics of the characters. You

may think that if we alter these characteristics it will become visible and obvious to third parties

or attackers. The key to this problem is that we alter the document in a way that it is simply not

visible to the human eye yet it is possible to decode it by computer.




       [Figure: general principle of embedding hidden information inside a document]

       The figure above shows the general principle in embedding hidden information inside a

document. Again, there is an encoder and to decode it, there will be a decoder. The codebook is

a set of rules that tells the encoder which parts of the document it needs to change. It is also

worth pointing out that the marked documents can be either identical or different. By different,

we mean that the same watermark is marked on the document but different characteristics of

each of the documents are changed.


Line Shift Coding Protocol

       In line shift coding, we simply shift various lines inside the document up or down by a

small fraction (such as 1/300th of an inch) according to the codebook. The shifted lines are

undetectable by humans because the shift is so small, but it is detectable when the computer

measures the distances between each of the lines. Differential encoding techniques are

normally used in this protocol, meaning that if you shift a line the adjacent lines are not moved.

These lines will become a control so that the computer can measure the distances between

them.

        By finding out whether a line has been shifted up or down we can represent a single bit,

0 or 1. And if we put the whole document together, we can embed a number of bits and

therefore hide a considerable amount of information.
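
The differential scheme can be sketched in Python as follows. The sketch makes several assumptions that the text does not specify: baseline positions are measured in units of 1/300 inch, the nominal line distance is 50 units, every second line carries a bit while its neighbours act as controls, and a downward shift means 1. The function names are hypothetical.

```python
LINE_GAP = 50  # nominal distance between baselines, in 1/300 inch (assumed)
SHIFT = 1      # one unit = 1/300 inch, the shift size mentioned in the text


def encode_lines(bits):
    """Return baseline positions; every second line is shifted to carry one bit."""
    n_lines = 2 * len(bits) + 1  # unshifted control lines surround each marked line
    baselines = [i * LINE_GAP for i in range(n_lines)]
    for k, bit in enumerate(bits):
        i = 2 * k + 1  # odd-numbered lines carry data, even ones are controls
        baselines[i] += SHIFT if bit else -SHIFT  # down = 1, up = 0 (assumed)
    return baselines


def decode_lines(baselines):
    """Compare the gap above and below each marked line to recover its bit."""
    bits = []
    for i in range(1, len(baselines) - 1, 2):
        gap_above = baselines[i] - baselines[i - 1]
        gap_below = baselines[i + 1] - baselines[i]
        bits.append(1 if gap_above > gap_below else 0)
    return bits


message_bits = [0, 1, 0, 0, 0, 0, 0, 1]
print(decode_lines(encode_lines(message_bits)) == message_bits)  # True
```

Because the decoder compares each marked line only with its unshifted neighbours, it does not need the original document.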


Word Shift Coding Protocol

        The word shift coding protocol is based on the same principle as the line shift coding

protocol. The main difference is instead of shifting lines up or down, we shift words left or right.

This is also known as the justification of the document. The codebook will simply tell the

encoder which of the words is to be shifted and whether it is a left or a right shift. Again, the

decoding technique is measuring the spaces between each word; a left shift could represent

a 0 bit and a right shift a 1 bit.


                        The quick brown fox jumps over the lazy dog.

                        [Figure: the same sentence set with normal spacing and with word-shifted spacing]

        In this example the first line uses normal spacing while the second has had each word

shifted left or right by 0.5 points in order to encode the sequence 01000001 that is 65, the ASCII

character code for A. Without having the original for comparison it is likely that this may not be

noticed and the shifting could be even smaller to make it less noticeable.
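
The example can be reproduced numerically with a short Python sketch. The horizontal word positions are made-up values in points, and the decoder here compares the marked line against the original positions, which is an assumption; the function names are ours.

```python
SHIFT_PT = 0.5  # shift size in points, as in the example above


def shift_words(x_positions, bits):
    """Move the first len(bits) words left (0) or right (1) by SHIFT_PT."""
    shifted = list(x_positions)
    for i, bit in enumerate(bits):
        shifted[i] += SHIFT_PT if bit else -SHIFT_PT
    return shifted


def read_shifts(original, marked, n_bits):
    """Recover bits by comparing each marked word position with the original."""
    return [1 if marked[i] > original[i] else 0 for i in range(n_bits)]


# Hypothetical x positions (in points) of the nine words of the sample sentence.
x_positions = [0.0, 24.0, 58.0, 94.0, 116.0, 152.0, 180.0, 202.0, 230.0]
bits = [int(b) for b in "01000001"]

marked = shift_words(x_positions, bits)
recovered = read_shifts(x_positions, marked, len(bits))
letter = chr(int("".join(str(b) for b in recovered), 2))
print(letter)  # A
```

Eight shifted words carry one ASCII character, so a single line of text holds roughly one hidden character.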


Feature Coding Protocol

        Feature coding differs slightly from the above protocols in that the

document is passed through a parser, which examines the document and automatically

builds a codebook specific to that document. It will pick out all the features that it thinks it can

use to hide information and each of these will be marked into the document. This can use a

number of different characteristics such as the height of certain characters, the dots above i and

j and the horizontal line length of letters such as f and t. Line shifting and word shifting

techniques can also be used to increase the amount of data that can be hidden.


White Space Manipulation


       One way of hiding data in text is to use white space. If done correctly, white space can

be manipulated so that bits can be stored. This is done by adding a certain amount of white

space to the end of lines. The amount of white space corresponds to a certain bit value. Due to

the fact that in practically all text editors, extra white space at the end of lines is skipped over, it

won’t be noticed by the casual viewer. In a large piece of text, this can result in enough room to

hide a few lines of text or some secret codes. A freely available program which uses this

technique is named “SNOW”.
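
A minimal Python sketch of the trailing white space idea is given below. It is not the encoding used by SNOW; the mapping (one trailing space for a 0 bit, two for a 1 bit, one bit per line) and the function names are assumptions chosen for simplicity.

```python
def hide_in_trailing_spaces(text: str, payload: bytes) -> str:
    """Append one space (bit 0) or two spaces (bit 1) to successive lines."""
    bits = [(byte >> s) & 1 for byte in payload for s in range(7, -1, -1)]
    lines = [line.rstrip() for line in text.split("\n")]
    if len(bits) > len(lines):
        raise ValueError("cover text has too few lines")
    for i, bit in enumerate(bits):
        lines[i] += " " * (bit + 1)
    return "\n".join(lines)


def reveal_from_trailing_spaces(text: str) -> bytes:
    """Read the bits back from the number of trailing spaces on each line."""
    bits = []
    for line in text.split("\n"):
        pad = len(line) - len(line.rstrip(" "))
        if pad in (1, 2):  # lines without padding carry no data
            bits.append(pad - 1)
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


cover_text = "\n".join(f"line {i}" for i in range(20))
stego_text = hide_in_trailing_spaces(cover_text, b"OK")
print(reveal_from_trailing_spaces(stego_text))  # b'OK'
```

With one bit per line, a 16-line text hides two characters, which shows why this method needs a large cover for even a short message.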


Text Content

       Another way of hiding information is to conceal it in what seems to be inconspicuous

text. The grammar within the text can be used to store information. It is possible to change

sentences to store information and keep the original meaning. TextHide is a program, which

incorporates this technique to hide secret messages. A simple example is:



       Changed to:


       Another way of using text itself is to use random words as a means of encoding

information. Different words can be given different values. Of course this would be easy to spot

but there are clever implementations, such as SpamMimic which creates a spam email that

contains a secret message. As spam usually has poor grammar, it is far easier for it to escape

notice. The following extract from a spam email encodes a hidden phrase:
Dear Friend , Especially for you - this red-hot intelligence . We will comply with all removal requests .

      This mail is being sent in compliance with Senate bill 2116 , Title 9 ; Section 303 ! THIS IS NOT A GET

      RICH SCHEME . Why work for somebody else when you can become rich inside 57 weeks . Have you

      ever noticed most everyone has a cellphone & people love convenience. Well, now is your chance to

      capitalize on this . WE will help YOU SELL MORE and sell more! You are guaranteed to succeed

      because we take all the risk ! But don't believe us . Ms Simpson of Washington tried us and says "My

      only problem now is where to park all my cars" . This offer is 100% legal. You will blame yourself

      forever if you don't order now ! Sign up a friend and you'll get a discount of 50%. Thank-you for your

      serious consideration of our offer . Dear Decision maker;

      Thank-you for your interest in our briefing . If you are not interested in our publications and wish to be

      removed from our lists, simply do NOT respond and ignore this mail ! This mail is being sent in

      compliance with Senate bill 1623 ; Title 6 ; Section 304 ! THIS

      IS NOT A GET RICH SCHEME ! Why work for somebody else when you can …


        A very basic form of steganography makes use of a cipher. A cipher is basically a key

which can be used to decode some data to retrieve a secret hidden message. Sir Francis Bacon

created one in the 16th century using messages with two different type faces, one bolder than

the other. By looking at the positions of the bold characters in relation to the rest of the text, a

secret message could be decoded. There are many other different ciphers which could be used

to the same effect.
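
The two-typeface idea can be sketched in Python. The sketch rests on stated assumptions: upper case stands in for the bold typeface, each secret letter is written as five binary digits over a 26-letter alphabet (Bacon's original alphabet was smaller), and the receiver knows how many letters to read. The function names are ours.

```python
def bacon_hide(secret: str, cover: str) -> str:
    """Mark cover letters as 'bold' (upper case) according to the secret's bits."""
    bits = []
    for ch in secret.lower():
        if ch.isascii() and ch.isalpha():
            index = ord(ch) - ord("a")
            bits.extend((index >> s) & 1 for s in range(4, -1, -1))
    out, used = [], 0
    for ch in cover:
        if ch.isalpha():
            bold = used < len(bits) and bits[used] == 1
            out.append(ch.upper() if bold else ch.lower())
            used += 1
        else:
            out.append(ch)  # spaces and punctuation carry no data
    if used < len(bits):
        raise ValueError("cover text has too few letters")
    return "".join(out)


def bacon_reveal(stego: str, n_letters: int) -> str:
    """Read five typeface marks per secret letter and turn them back into text."""
    marks = [1 if ch.isupper() else 0 for ch in stego if ch.isalpha()]
    letters = []
    for i in range(n_letters):
        value = 0
        for mark in marks[5 * i:5 * i + 5]:
            value = value * 2 + mark
        letters.append(chr(ord("a") + value))
    return "".join(letters)


cover_sentence = "the quick brown fox jumps over the lazy dog"
stego_sentence = bacon_hide("spy", cover_sentence)
print(bacon_reveal(stego_sentence, 3))  # spy
```

Each hidden letter consumes five cover letters, so the cover must be at least five times as long as the secret.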


XML

        XML is becoming a widely used standard for data exchange. The format also provides

plenty of opportunities for data hiding. This is important for verifying documents to see if they

have been altered and also for copyright reasons. You can embed a code for example, which

can be traced back to the source. A method for hiding information in XML comes courtesy of the

University of Tokyo.


        Many different files can exist when XML is used. There is the XML file itself but there can

be transformation files (.xsl), validation files (.dtd) and style files (.css). All of these files can be
used to hide data but the main XML file is usually the best due to its larger size. This technique

concentrates on just the XML file; more elaborate techniques could use a combination of all four

files to increase robustness.


       One way of hiding data in XML is to use the different tags as allowed by the W3C. For

example, both of these image tags are valid and could be used to indicate different bit settings:


                                         Stego key:
                                     <img></img> -> 0
                                         <img/> -> 1
In this way a piece of XML like the following could be used to encode a simple bit string.
                                         Stego data:
                                 <img src="foo1.jpg"></img>
                                     <img src="foo2.jpg"/>
                                     <img src="foo3.jpg"/>
                                     <img src="foo4.jpg"/>
                                 <img src="foo5.jpg"></img>


       Under this key, the XML data in this case stores the bit string 01110.
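
An encoder and decoder for this tag-style key can be sketched in Python with a regular expression. The function names are hypothetical, and the sketch assumes the document contains no `img` tags other than the ones carrying data. Read with the stego key above, the five-tag listing gives 0, 1, 1, 1, 0.

```python
import re


def encode_img_tags(sources, bits):
    """Write one img element per bit: open/close pair for 0, self-closing for 1."""
    lines = []
    for src, bit in zip(sources, bits):
        if bit == "1":
            lines.append(f'<img src="{src}"/>')
        else:
            lines.append(f'<img src="{src}"></img>')
    return "\n".join(lines)


def decode_img_tags(xml_text):
    """Read the tag style of every img start tag back into a bit string."""
    bits = []
    # Closing tags start with "</" and are therefore not matched here.
    for match in re.finditer(r"<img\b[^>]*>", xml_text):
        bits.append("1" if match.group().endswith("/>") else "0")
    return "".join(bits)


sources = [f"foo{i}.jpg" for i in range(1, 6)]
stego_xml = encode_img_tags(sources, "01110")
print(decode_img_tags(stego_xml))  # 01110
```

Both tag styles parse to the same element, so a program that consumes the XML sees no difference.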

       Other ways of storing data include using the order in which attributes or elements

appear. For example, assigning the combination of element A followed by element B the bit

value of 1 while if A is followed by some element C, it would be assigned the value of 0.


       Hiding data using the scheme outlined above would be pretty easy. In the case of using

white space, a simple text manipulation program could be used to add the spaces and then a

reader could be created to parse the XML and retrieve the hidden data. The same is true for the

usage of different tags. The structure of elements would be a little more difficult as changing

elements could have an adverse impact on the way the XML is displayed but if cleverly

designed, this could be overcome. In this example the containment of elements is used:


               <favorite><fruit>SOMETHING</fruit></favorite> -> 0

               <fruit><favorite>SOMETHING</favorite></fruit> -> 1

               In this example the order of the elements is used:

               <user><name>NAME</name><id>ID</id></user> -> 0

               <user><id>ID</id><name>NAME</name></user> -> 1          [2]
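
Reading the element-order key back can be sketched with Python's standard `xml.etree.ElementTree` module. The `<users>` wrapper element and the function name are assumptions added so that the sample is a single well-formed document.

```python
import xml.etree.ElementTree as ET


def decode_user_order(xml_text: str) -> str:
    """Return one bit per <user>: name before id is 0, id before name is 1."""
    root = ET.fromstring(xml_text)
    bits = []
    for user in root.iter("user"):
        order = [child.tag for child in user][:2]
        if order == ["name", "id"]:
            bits.append("0")
        elif order == ["id", "name"]:
            bits.append("1")
    return "".join(bits)


sample = """<users>
<user><name>NAME</name><id>ID</id></user>
<user><id>ID</id><name>NAME</name></user>
<user><id>ID</id><name>NAME</name></user>
</users>"""

print(decode_user_order(sample))  # 011
```

This only works when the schema allows the child elements in either order; a validation file that fixes the order would expose or break the encoding.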

Microsoft Office Suite


       A great deal of research has been accomplished in the area of hiding data in text, image,

or audio files. There does not seem to be a lot of research in the area of hiding data inside

unused space. The only related work found is by Eric Cole in his book “Hiding Data in Plain

Sight” where he gives several examples of how to hide data in various file structures, including

the properties section of Word documents.


       In the world of spy vs. spy, covert communication, or steganography, is not a new

concept. This ancient art has been used in many ways and in many mediums and has not been

ignored in this century with the bits and bytes of the computerized world. Many methods have

been found for hiding covert messages and data in computer files. One only has to search the

Internet for steganography, or stego for short, to find multiple freeware utilities that will allow

even a novice computer user to create files with hidden communications. However, where there

is a desire to hide communication, there is also a desire to detect that communication. For this

reason, there are also tools available online to detect covert data in image files. How dangerous

is a hiding place that everyone knows about? What if someone sending covert data used file

types less commonly used for steganography such as MS Word documents? Would that

communication escape notice? Can these files even carry a covert message?


       With the large amount of traffic that traverses networks daily it is impossible for any

single administrator or investigator to examine all data. When examining network traffic a

system administrator is limited to the traffic they consider suspicious or dangerous. A system

administrator must know the normal traffic across their network and investigate when something

odd occurs. There are a large number of programs today that will hide data in image or audio

files. Therefore, data could be stored inside one of these and sent across the network

decreasing suspicion. However, what if, instead of pictures, someone sends a Word document?

Then they send a PowerPoint presentation followed by any number of common office

documents. This varying of file types would create less suspicion by appearing to be normal

traffic. Can these files carry covert information? Yes, they contain meta-data and unused bits

that can be replaced without obvious effect.

       The programs mentioned above that hide data in images perform steganography. There

are numerous, well-published ways to use steganography in the hiding of information in image

and audio files. However, a lesser considered area is the simple hiding of information inside

common office files. These spaces are not well-known or well-documented. They can be used

relatively easily to hide data and using them decreases suspicion as stated above. Also, using

these spaces with bit substitution keeps the original file size. This reduces the chance for

automated detection or analysis. For these reasons and more, investigators should be made

aware of these spaces.

Unused Space and Meta-data Defined


       Some files contain readily available spaces that can be used inside their file structures.

One possible example could be meta-data, data about data. Meta-data is ingrained in file

structures but not visible to the user without special tools. Some files also have unused space.

They contain bits that can be overwritten without any adverse or obvious effect on the file.

These spaces are not visible to the average user because they are ignored when the files are

opened. These spaces can be seen when examined at the byte level, something few users

would do. These spaces create an opportunity to hide covert data. This paper shows the results

of examining several common office files to see if they have these spaces and whether or not

they could be used to hide data. It is not our intent to suggest their use, but rather to document

their existence as a vulnerability and possible data leakage point.

The Experiments and General Observations


       The first sets of tests were run on the Microsoft Office documents: Word, Excel, and

PowerPoint. Next, HTML and email files were examined. Finally, compressed files were tested.

Each file type was put through the same set of tests. The presence or absence of meta-data

and unused space was immediately obvious in all file types. It was most prevalent in Microsoft

Word. This file type not only kept metadata but also contained history information about the

document. It contained such things as who created it, and where it was printed.

       Along with these meta-data sections, large groups of the repeated hex value FF or 00

were noticed in some file types. These spaces were ideal for hiding data. For each file type,

several files of different sizes were examined to determine if these spaces were constant. The

spaces seem to be more dependent on the version used to create the file than on the file

contents. Replacing these spaces with our data was accomplished but the data could not be

inserted in this area without noticeable side effects. Inserting data changes the length of the file

and the format of the file structure, so once the file is saved it cannot be opened without error

messages. Sometimes, it could not be opened at all. Therefore, inserting the data is easily done

and possible but it corrupts the file in the process. This held true for all the file types that did not

consist of plain text like web pages. Data inserted at the end of the file did not cause this effect

but did affect the file size, which could help identify that file as containing hidden data. Each file

type was tested to see if data could be hidden at the end of the file, after the end of file pointer.

All proved susceptible to this technique except html and email files. Data in either place proved

to be volatile. Once anyone opens and saves the document, the hidden data is destroyed. Now

details concerning each one of the file types will be discussed.
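
The difference between replacing and inserting can be sketched in Python. The sketch scans a byte string for long runs of 00 or FF and overwrites part of a run in place, so the length stays unchanged. The 64-byte threshold, the function names and the sample data are illustrative assumptions; real Office files are not parsed here.

```python
from itertools import groupby


def find_filler_runs(data: bytes, min_len: int = 64):
    """Return (offset, length) for every run of 0x00 or 0xFF of at least min_len bytes."""
    runs, offset = [], 0
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        if value in (0x00, 0xFF) and length >= min_len:
            runs.append((offset, length))
        offset += length
    return runs


def substitute(data: bytes, offset: int, run_length: int, payload: bytes) -> bytes:
    """Overwrite filler bytes with the payload; the file length does not change."""
    if len(payload) > run_length:
        raise ValueError("payload does not fit in the run")
    out = bytearray(data)
    out[offset:offset + len(payload)] = payload
    return bytes(out)


# Made-up stand-in for a document with 100 bytes of unused space.
sample_file = b"HDR" + b"\x00" * 100 + b"body" + b"\xff" * 10 + b"end"
runs = find_filler_runs(sample_file)
print(runs)  # [(3, 100)]

offset, length = runs[0]
stego_file = substitute(sample_file, offset, length, b"secret")
print(len(stego_file) == len(sample_file))  # True
```

Substitution keeps every later offset in the file where it was, which is why it avoids the corruption that insertion causes.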


Results by File Type

       Word documents were the first to be tested. 780 bytes of repeated values were

discovered and utilized to hide data. Excel files were examined next. The findings were similar

to those of Word; however, Excel had fewer spaces in which to hide data. The largest

continuous block was approximately 420 bytes found just below the header. Finally, PowerPoint

files were examined. The results were the same as for the Excel files, except that they did seem to have

more of the smaller hiding places. In Word the plain text was obvious. In Excel the numbers

could be seen. PowerPoint was not so obvious, making the search for hiding places harder.

       In summary Microsoft Office files provided many opportunities for hiding data. Inserting

data caused the file to become corrupt, but they had plenty of unused space that could be

written over. This could be avoided by inserting data at the end of the file. Another peculiarity

was the need to avoid the area where Microsoft stores its file property information. This area

had to be avoided to prevent others from easily viewing the hidden data. This was discussed in

which provided source code for a program that could be used to hide data in this spot. Other

than this limitation, the inserted data was not apparent and was stable as long as the file was

not altered or saved.

       Web files were tested next. HTML and email files are actually no more than text files that

are interpreted by another program. Text files have no headers and no unused space. There are

ways to hide data in text, but there are no data hiding vulnerabilities in the file structures of a

simple text file that we are aware of. However, web pages contain areas that are ignored during

web page creation. There is no real unused space to hide data in, but these ignored areas

create meta-data hiding opportunities. Web browsers also ignore commands they see as errors,

so data can be hidden by placing it inside the symbols “<>.”

       These methods have a drawback. Web browsers normally contain the option to “view

source.” This is not an often-used tool, but it allows any user to view the hidden text with ease.

The data could be encrypted or made to look like meta-data using a grammar-based

substitution technique but its presence could still be easily detected.

Email files proved to be similar to html files. They are also plain text files that are interpreted by

other programs. Emails contain information about each server that the email traveled through.

Data can easily be hidden here by mimicking this server information. Simply insert the data

following the word “Received:”. Most email programs today would not display this information by

default. Just as in html/htm documents, one has only to view source or open the file in a text

editor to see the hidden data. In summary, web files could be used to hide data easily, but the

ease of use is balanced by the ease of discovery.

       When dealing with electronic transfer where space must be conserved, it would not be

uncommon to see compressed files, such as those created with WinZip. Therefore compressed files were studied

next. Due to the nature of these files, they are not as vulnerable to hiding. One function of a

compression algorithm is to look for long strings of redundant bytes and transform them into

smaller strings that represent them. Therefore, the long strings of repeated values being used to

hide data here would have been reduced or eliminated. However, because of the commonality

of these files, tests were run to confirm this.

       Data was successfully added after the end of file marker, but there were no unused

spaces inside them to use for hiding data. It was also noted that compressing a file with hidden

data and then uncompressing it did not affect the hidden data. In addition while the file was

compressed the hidden data was not readable with the hex editor. The compressed files

containing hidden data were larger than the compressed versions of the unmodified files, because

substituting hidden data for redundant bits leaves less redundancy to remove. This could possibly be a red flag

for hidden data if the reduction ratios of files were used to check file sizes.                   [1]
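
The effect on compression can be checked with a short Python sketch. It uses the standard `zlib` module as a stand-in for the archive tool used in the experiments, and the cover bytes and payload are made up for illustration.

```python
import random
import zlib

# Made-up cover: a header, 780 filler bytes, and some repetitive document text.
cover = b"HEADER" + b"\x00" * 780 + b"document text " * 50

# Pseudo-random payload standing in for hidden (for example encrypted) data.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(400))

# Substitute the payload into the filler area; the length stays the same.
stego = cover[:6] + payload + cover[6 + len(payload):]
assert len(stego) == len(cover)

packed_cover = zlib.compress(cover)
packed_stego = zlib.compress(stego)
print(len(cover), len(packed_cover), len(packed_stego))

# Compressing and uncompressing leaves the hidden data intact.
restored = zlib.decompress(packed_stego)
print(restored[6:6 + len(payload)] == payload)  # True
```

Two files of identical uncompressed size that compress to clearly different sizes are the kind of red flag described above.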


Limitations of Existing Text based Steganographic Techniques


       Following are the major drawbacks in the above cited techniques:


       Data hidden in .doc files is lost when the file is saved in PDF, ASCII text, or similar formats.

       Increase / Decrease in line / word spacing is eye-catching, and so is the separation of

       words / lines with extra spaces.

       Placing extra spaces at the end of a sentence can go un-noticed except if one selects a

       page or an entire document for copy etc., where the extra spaces become prominent.

       Adding spaces past end of file mark can create doubts because of increased file-length.


Proposed Solution


   To date, no known text-based data hiding technique exists that can hide information

without increasing / decreasing document length and / or altering the text appearance.


   The proposed thesis aims to develop a coding technique that will hide data within the actual

contents of the text file used as cover, addressing all of the existing drawbacks in text-based

steganographic systems, duly supported by a complete software solution.


   This will eliminate the possibility of losing hidden data at the time of compression or

conversion of the text to “pdf” file format. In addition, anyone in possession of the actual cover

will not find a change in the contents and layout of the stego-text document on comparison.


APPLICATION:

       This technique can best be applied on web pages for un-noticed global interaction,

where the entire concentration is primarily focused on images and text spacing. A real time

demonstration of this fact will also be given.


                                           References

        6.                47     +     4                  8             !           2       +             9:
        8                  :

2.          9;;                        ;< (6 00 ;!          '1(;
3.      Jonathan Cummins, Patrick Diskin, Samuel Lau and Robert Parlett, Steganography and Digital
        Watermarking, School of Computer Science, The University of Birmingham, 2004.

Phd T H E S I Sproposal

  • 1. Table of Contents Abstract ……………………………………….……… 2 Motivation ……………………………………………… 3 History ……………………………………………… 3 Principals ……………………………………………… 4 Techniques ……………………………………………… 5 Related Work ……………………………………………… 7 Limitation of Existing Techniques ……………………. 18 Proposed Solution ……………………………………… 18 Application ……………………………………………… 18 References ……………………………………………… 19
  • 2. Abstract Mark Owens [9] !quot;# With computers having GHz of processing speed, information / data either stored or in transmission has become more and more vernalable to hostile eavesdropping, theft, wiretapping etc. This urges us to devise new data hiding techniques to protect and secure data of vital significance. Steganography is a method of securing data by obscuring the contents in another media (called Cover) in which it is saved / transmitted. This doctorial thesis proposal will present a new Steganographic Technique for hiding data in (ASCII) text files together with its Software implementation, a research area in Steganography which is considered as toughest among all, to address.
  • 3. $ Motivation While Net surfing, I encountered an on-line article in the USA Today titled “Terror groups hide behind Web encryption” claiming (though not yet publicized evidence exist) terrorists may be using steganography to communicate with each other in planning terrorist attacks, that twigged my interest for evolving a new concealment technique. It is intuited that images with hidden messages have ideal cover on bulletin boards or dead drops for other terrorists to pick up and resolve. History Steganography dates back to ancient Greece when etching messages or images in wooden tablets and covering them with wax, and tattooing a shaved messenger's head, letting the hair grow back, and then shaving the head again to read the message were common practices. Early in WWII steganographic technology consisted almost exclusively of invisible inks. Sources for invisible inks include milk, vinegar, fruit juices and urine that darken when heated. The following message was sent by a German spy during WWII: Apparently neutral's protest is thoroughly discounted and ignored. Isman hard hit. Blockade issue affects pretext for embargo on by products, ejecting suets and vegetable oils. Taking the second letter in each word the following message emerges: Pershing sails from NY June 1. When invisible inks became easy to decode through improved technology, null ciphers were used. Null ciphers are unencrypted messages that are indiscernible in innocent sounding messages. An example of such a message is:
  • 4. % Fishing freshwater bends and saltwater coasts rewards anyone feeling stressed. Resourceful anglers usually find masterful leapers fun and admit swordfish rank overwhelming anyday. Taking the third letter in each word the following message emerges: Send Lawyers, Guns, and Money. The Germans developed the microdot technology during WWII. Microdots are text or photographic images that are shrunk down to the size and shape of a period or the dot of an i or j. Microdots were usually sent by writing a letter containing periods, i's, or j's, and the intended recipient could read the messages using a microscope. Because of the extremely small size of the microdots the messages typically went unnoticed by inspectors. A steganographic message generally appears to be something else, like an article or a picture, or some other quot;coverquot; message. Drawings have often been used to conceal information since it is easy to encode a message by varying lines, colors or other elements in pictures. This tutorial will focus on image files to hide text messages. Principals: Steganography can be split into two types, these are Fragile and Robust. The following section describes the definition of these two different types of steganography. Fragile Fragile steganography involves embedding information into a file which is destroyed if the file is modified. This method is unsuitable for recording the copyright holder of the file since it can be so easily removed, but is useful in situations where it is important to prove that the file has not been tampered with, such as using a file as evidence in a court of law, since any tampering would have removed the watermark. Fragile steganography techniques tend to be easier to implement than robust methods.
  • 5. & Robust Robust marking aims to embed information into a file which cannot easily be destroyed. Although no mark is truly indestructible, a system can be considered robust if the amount of changes required to remove the mark would render the file useless. Therefore the mark should be hidden in a part of the file where its removal would be easily perceived. There are two main types of robust marking. Fingerprinting involves hiding a unique identifier for the customer who originally acquired the file and therefore is allowed to use it. Should the file be found in the possession of somebody else, the copyright owner can use the fingerprint to identify which customer violated the license agreement by distributing a copy of the file. Unlike fingerprints, Watermarks identify the copyright owner of the file, not the customer. Whereas fingerprints are used to identify people who violate the license agreement watermarks help with prosecuting those who have an illegal copy. Ideally fingerprinting should be used but for mass production of CDs, DVDs, etc it is not feasible to give each disk a separate fingerprint. Watermarks are typically hidden to prevent their detection and removal, they are said to be imperceptible watermarks. However this need not always be the case. Visible watermarks can be used and often take the form of a visual pattern overlaid on an image. The use of visible watermarks is similar to the use of watermarks in non-digital formats (such as the watermark on British money). Techniques: Information hiding techniques are receiving much attention today. The main motivation for this is largely due to fear of encryption services getting outlawed, and copyright owners who
  • 6. ' want to track confidential and intellectual property copyright against unauthorized access and use in digital materials such as music, film, book and software through the use of digital watermarks. A Steganographic System: f E: steganographic function quot;embeddingquot; fE-1: steganographic function quot;extractingquot; cover: cover data in which emb will be hidden emb: message to be hidden key: parameter of fE stego: cover data with the hidden message A Graphical Version of the Steganographic System: Steganographic messages may first be encrypted and then a cover message is modified to contain the encrypted message, resulting in stego text. Only those who know the technique used can recover the message and, if required, decrypt it.
  • 7. ( The message may be a few thousand bits (often at 7 or 8 bits per text character) embedded in millions of other bits. Probably the most typical use is digital images. Digital images are commonly stored in either 24-bit or 8-bit files. If an 8-bit image is viewed as a grid and the grid is made up of cells, these cells are called pixels. Each pixel consists of an 8-bit binary number (or a single byte), and each 8-bit binary number refers to the color palette (a set of colors defined within the image). All color variations for the pixels are derived from three primary colors: red, green, and blue. Each primary color is represented by 1 byte (= 8 bits). Digital watermarking technology is viewed as quot;an enabling agent allowing more widespread sharing and use of that content while decreasing worry over piracy”. Today steganography is often used for digital watermarking to hide copyright or ownership information in an image, movie, or audio file. A copyright holder can pull the hidden copyright or ownership information out of a suspect file to prove it is stolen. Digital watermarking is not used for authenticating documents. (Digital signatures perform this task.) A digital watermark refers to the ability to unobtrusively include information in a file, and is commonly executed through a variety of cryptographic techniques, collectively known as steganography. Algorithms and transformations: Another steganography technique is to hide data in mathematical functions that are in compression algorithms. The idea is to hide the data bits in the least significant coefficients. Other techniques of steganography include spread spectrum steganography, statistical steganography, distortion, and cover generation steganography. Related Work (Text Techniques) While it is very easy to tell when you have committed a copyright infringement by photocopying a book, since the quality is widely different, it is more difficult when it comes to
  • 8. * electronic versions of text. Copies are identical and it is impossible to tell if it is an original or a copied version. To embed information inside a document we can simply alter some of its characteristics. These can be either the text formatting or characteristics of the characters. You may think that if we alter these characteristics it will become visible and obvious to third parties or attackers. The key to this problem is that we alter the document in a way that it is simply not visible to the human eye yet it is possible to decode it by computer. + Figure above, shows the general principle in embedding hidden information inside a document. Again, there is an encoder and to decode it, there will be a decoder. The codebook is a set of rules that tells the encoder which parts of the document it needs to change. It is also worth pointing out that the marked documents can be either identical or different. By different, we mean that the same watermark is marked on the document but different characteristics of each of the documents are changed. Line Shift Coding Protocol In line shift coding, we simply shift various lines inside the document up or down by a th small fraction such as 1/300 of an inch) according to the codebook. The shifted lines are undetectable by humans because it is only a small fraction but is detectable when the computer measures the distances between each of the lines. Differential encoding techniques are normally used in this protocol, meaning if you shift a line the adjacent lines are not moved.
  • 9. , These lines will become a control so that the computer can measure the distances between them. By finding out whether a line has been shifted up or down we can represent a single bit, 0 or 1. And if we put the whole document together, we can embed a number of bits and therefore have the ability to hide large information. Word Shift Coding Protocol The word shift coding protocol is based on the same principle as the line shift coding protocol. The main difference is instead of shifting lines up or down, we shift words left or right. This is also known as the justification of the document. The codebook will simply tell the encoder which of the words is to be shifted and whether it is a left or a right shift. Again, the decoding technique is measuring the spaces between each word and a left shift could represent a 0 bit and a right bit representing a 1 bit. The quick brown fox jumps the lazy dog. - ./ 0 Line Shift Coding Protocol In this example the first line uses normal spacing while the second has had each word shifted left or right by 0.5 points in order to encode the sequence 01000001 that is 65, the ASCII character code for A. Without having the original for comparison it is likely that this may not be noticed and the shifting could be even smaller to make it less noticeable. Feature Coding Protocol In feature coding, there is a slight difference with the above protocols, and this is that the document is passed through a parser where it examines the document and it automatically builds a codebook specific to that document. It will pick out all the features that it thinks it can use to hide information and each of these will be marked into the document. This can use a number of different characteristics such as the height of certain characters, the dots above i and
  • 10. 1 j and the horizontal line length of letters such as f and t. Line shifting and word shifting techniques can also be used to increase the amount of data that can be hidden. White Space Manipulation One way of hiding data in text is to use white space. If done correctly, white space can be manipulated so that bits can be stored. This is done by adding a certain amount of white space to the end of lines. The amount of white space corresponds to a certain bit value. Due to the fact that in practically all text editors, extra white space at the end of lines is skipped over, it won’t be noticed by the casual viewer. In a large piece of text, this can result in enough room to hide a few lines of text or some secret codes. A freely available program which uses this technique is named “SNOW”. Text Content Another way of hiding information is to conceal it in what seems to be inconspicuous text. The grammar within the text can be used to store information. It is possible to change sentences to store information and keep the original meaning. TextHide is a program, which incorporates this technique to hide secret messages. A simple example is: Changed to: 2 - 3 Another way of using text itself is to use random words as a means of encoding information. Different words can be given different values. Of course this would be easy to spot but there are clever implementations, such as SpamMimic which creates a spam email that contains a secret message. As spam usually has poor grammar, it is far easier for it to escape notice. The following extract from a spam email encodes the phrase 45
  • 11. Dear Friend , Especially for you - this red-hot intelligence . We will comply with all removal requests . This mail is being sent in compliance with Senate bill 2116 , Title 9 ; Section 303 ! THIS IS NOT A GET RICH SCHEME . Why work for somebody else when you can become rich inside 57 weeks . Have you ever noticed most everyone has a cellphone & people love convenience. Well, now is your chance to capitalize on this . WE will help YOU SELL MORE and sell more! You are guaranteed to succeed because we take all the risk ! But don't believe us . Ms Simpson of Washington tried us and says quot;My only problem now is where to park all my carsquot; . This offer is 100% legal. You will blame yourself forever if you don't order now ! Sign up a friend and you'll get a discount of 50%. Thank-you for your serious consideration of our offer . Dear Decision maker; Thank-you for your interest in our briefing . If you are not interested in our publications and wish to be removed from our lists, simply do NOT respond and ignore this mail ! This mail is being sent in compliance with Senate bill 1623 ; Title 6 ; Section 304 ! THIS IS NOT A GET RICH SCHEME ! Why work for somebody else when you can … A very basic form of steganography makes use of a cipher. A cipher is basically a key which can be used to decode some data to retrieve a secret hidden message. Sir Francis Bacon th created one in the 16 Century using messages with two different type faces, one bolder than the other. By looking at the positions of the bold characters in relation to the rest of the text, a secret message could be decoded. There are many other different ciphers which could be used to the same effect. XML XML is becoming a widely used standard for data exchange. The format also provides plenty of opportunities for data hiding. This is important for verifying documents to see if they have been altered and also for copyright reasons. 
You can embed a code for example, which can be traced back to the source. A method for hiding information in XML comes courtesy of the University of Tokyo. Many different files can exist when XML is used. There is the XML file itself but there can be transformation files (.xsl), validation files (.dtd) and style files (.css). All of these files can be
  • 12. used to hide data but the main XML file is usually the best due to its larger size. This technique concentrates on just the XML file, more elaborate techniques could use a combination of all four files to increase robustness. One way of hiding data in XML is to use the different tags as allowed by the W3C. For example both of these image tags are valid and could be used to indicate different bit settings Stego key: <img></img> -> 0 <img/> -> 1 In this way a piece of XML like the following could be used to encode a simple bit string. Stego data: <img src=”foo1.jpg”></img> <img src=”foo2.jpg”/> <img src=”foo3.jpg”/> <img src=”foo4.jpg”/> <img src=”foo5.jpg”></img> The XML data in this case stores the bit strings 101100 and 010011. Other ways of storing data include using the order in which attributes or elements appear. For example, assigning the combination of element A followed by element B the bit value of 1 while if A is followed by some element C, it would be assigned the value of 0. Hiding data using the scheme outlined above would be pretty easy. In the case of using white space, a simple text manipulation program could be used to add the spaces and then a reader could be created to parse the XML and retrieve the hidden data. The same is true for the usage of different tags. The structure of elements would be a little more difficult as changing elements could have an adverse impact on the way the XML is displayed but if cleverly designed, this could be overcome. In this example the containment of elements is used: <favorite><fruit>SOMETHING</fruit></favorite> -> 0
  • 13. <fruit><favorite>SOMETHING</favorite></fruit> -> 1 In this example the order of the elements is used: <user><name>NAME</name><id>ID</id></user> -> 0 <user><id>ID</id><name>NAME</name></user> -> 1 [2]
Microsoft Office Suite
A great deal of research has been accomplished in the area of hiding data in text, image, or audio files. There does not seem to be much research in the area of hiding data inside unused space. The only related work found is by Eric Cole in his book "Hiding Data in Plain Sight", where he gives several examples of how to hide data in various file structures, including the properties section of Word documents. In the world of spy vs. spy, covert communication, or steganography, is not a new concept. This ancient art has been used in many ways and in many media, and it has not been ignored in this century with the bits and bytes of the computerized world. Many methods have been found for hiding covert messages and data in computer files. One only has to search the Internet for steganography, or stego for short, to find multiple freeware utilities that will allow even a novice computer user to create files with hidden communications. However, where there is a desire to hide communication, there is also a desire to detect that communication. For this reason, there are also tools available online to detect covert data in image files. How dangerous is a hiding place that everyone knows about? What if someone sending covert data used file types less commonly used for steganography, such as MS Word documents? Would that communication escape notice? Can these files even carry a covert message? With the large amount of traffic that traverses networks daily, it is impossible for any single administrator or investigator to examine all data. When examining network traffic, a system administrator is limited to the traffic they consider suspicious or dangerous.
A system administrator must know the normal traffic across their network and investigate when something
  • 14. odd occurs. There are a large number of programs today that will hide data in image or audio files. Data could therefore be stored inside one of these and sent across the network with reduced suspicion. However, what if, instead of pictures, someone sends a Word document, then a PowerPoint presentation, followed by any number of common office documents? This varying of file types would create less suspicion by appearing to be normal traffic. Can these files carry covert information? Yes, they contain meta-data and unused bits that can be replaced without obvious effect. The programs mentioned above that hide data in images perform steganography. There are numerous, well-published ways to use steganography to hide information in image and audio files. However, a less considered area is the simple hiding of information inside common office files. These spaces are not well known or well documented. They can be used relatively easily to hide data, and using them decreases suspicion as stated above. Also, using these spaces with bit substitution keeps the original file size, which reduces the chance of automated detection or analysis. For these reasons and more, investigators should be made aware of these spaces.
Unused Space and Meta-data Defined
Some files contain readily available spaces inside their file structures that can be used. One possible example is meta-data, that is, data about data. Meta-data is ingrained in file structures but is not visible to the user without special tools. Some files also have unused space. They contain bits that can be overwritten without any adverse or obvious effect on the file. These spaces are not visible to the average user because they are ignored when the files are opened. They can be seen when the file is examined at the byte level, something few users would do. These spaces create an opportunity to hide covert data.
This paper shows the results of examining several common office files to see if they have these spaces and whether or not
  • 15. they could be used to hide data. It is not our intent to suggest their use, but rather to document their existence as a vulnerability and possible data leakage point.
The Experiments and General Observations
The first sets of tests were run on the Microsoft Office documents: Word, Excel, and PowerPoint. Next, HTML and email files were examined. Finally, compressed files were tested. Each file type was put through the same set of tests. The presence or absence of meta-data and unused space was immediately obvious in all file types. It was most prevalent in Microsoft Word. This file type not only kept meta-data but also contained history information about the document, such as who created it and where it was printed. Along with these meta-data sections, large groups of the repeated hex value FF or 00 were noticed in some file types. These spaces were ideal for hiding data. For each file type, several files of different sizes were examined to determine whether these spaces were constant. The spaces seem to depend more on the version used to create the file than on the file contents. Replacing (overwriting) these spaces with our data was accomplished, but data could not be inserted in this area without noticeable side effects. Inserting data changes the length of the file and the format of the file structure, so once the file is saved it cannot be opened without error messages. Sometimes it could not be opened at all. Inserting the data is therefore easily done, but it corrupts the file in the process. This held true for all the file types that did not consist of plain text, unlike web pages. Data inserted at the end of the file did not cause this effect but did affect the file size, which could help identify that file as containing hidden data. Each file type was tested to see if data could be hidden at the end of the file, after the end-of-file pointer. All proved susceptible to this technique except HTML and email files.
Data in either place proved to be volatile. Once anyone opens and saves the document, the hidden data is destroyed. Now details concerning each one of the file types will be discussed.
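The difference described above between overwriting filler bytes and inserting new ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool used in the experiments: the function names `find_filler_runs` and `substitute`, the 64-byte minimum run length, and the synthetic cover buffer are all hypothetical.

```python
def find_filler_runs(data: bytes, min_len: int = 64):
    """Return (offset, length) for every run of repeated 0x00 or 0xFF bytes
    that is at least min_len bytes long (min_len is an arbitrary assumption)."""
    runs = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b in (0x00, 0xFF):
            j = i
            while j < n and data[j] == b:
                j += 1
            if j - i >= min_len:
                runs.append((i, j - i))
            i = j
        else:
            i += 1
    return runs


def substitute(data: bytes, offset: int, payload: bytes) -> bytes:
    """Overwrite bytes starting at offset with payload.
    The result has the same length as the input, unlike an insertion."""
    if offset < 0 or offset + len(payload) > len(data):
        raise ValueError("payload does not fit inside the cover data")
    return data[:offset] + payload + data[offset + len(payload):]


# Synthetic cover: a fake header, a run of 0x00, some text, a run of 0xFF.
cover = b"HEADER" + b"\x00" * 200 + b"body text" + b"\xff" * 100
runs = find_filler_runs(cover)
stego = substitute(cover, runs[0][0], b"secret")
```

Because `substitute` replaces bytes one for one, the stego buffer keeps the cover's length, whereas splicing the payload in would shift every later offset, which is consistent with the corruption reported for insertion.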
  • 16. Results by File Type
Word documents were the first to be tested. 780 bytes of repeated values were discovered and used to hide data. Excel files were examined next. The findings were similar to those for Word; however, Excel had fewer spaces in which to hide data. The largest continuous block was approximately 420 bytes, found just below the header. Finally, PowerPoint files were examined. The results were the same as for the Excel files, except that they seemed to have more of the smaller hiding places. In Word the plain text was obvious. In Excel the numbers could be seen. PowerPoint was not so obvious, which made searching for hiding places harder. In summary, Microsoft Office files provided many opportunities for hiding data. Inserting data caused the file to become corrupt, but the files had plenty of unused space that could be written over. The corruption could also be avoided by inserting data at the end of the file. Another peculiarity was the need to avoid the area where Microsoft stores its file property information. This area had to be avoided to prevent others from easily viewing the hidden data. This was discussed in which provided source code for a program that could be used to hide data in this spot. Other than this limitation, the inserted data was not apparent and was stable as long as the file was not altered or saved. Web files were tested next. HTML and email files are actually no more than text files that are interpreted by another program. Text files have no headers and no unused space. There are ways to hide data in text, but there are no data hiding vulnerabilities in the file structure of a simple text file that we are aware of. However, web pages contain areas that are ignored when the page is rendered. There is no real unused space to hide data in, but these ignored areas create meta-data hiding opportunities. Web browsers also ignore commands they see as errors, so data can be hidden by placing it inside the symbols "<>". These methods have a drawback.
Web browsers normally contain the option to “view source.” This is not an often used tool but it allows any user to view the hidden text with ease.
  • 17. The data could be encrypted or made to look like meta-data using a grammar-based substitution technique, but its presence could still be easily detected. Email files proved to be similar to HTML files. They are also plain text files that are interpreted by other programs. Emails contain information about each server that the email traveled through. Data can easily be hidden here by mimicking this server information: simply insert the data following the word "Received:". Most email programs today would not display this information by default. Just as with HTML documents, one has only to view the source or open the file in a text editor to see the hidden data. In summary, web files could be used to hide data easily, but the ease of use is balanced by the ease of discovery. When dealing with electronic transfer where space must be conserved, it would not be uncommon to see compressed files, such as those produced by WinZip. Compressed files were therefore studied next. Due to the nature of these files, they are not as vulnerable to data hiding. One function of a compression algorithm is to look for long strings of redundant bytes and transform them into smaller strings that represent them. The long strings of repeated values used to hide data in the other file types would therefore have been reduced or eliminated. However, because these files are so common, tests were run to confirm this. Data was successfully added after the end-of-file marker, but there were no unused spaces inside the files to use for hiding data. It was also noted that compressing a file with hidden data and then uncompressing it did not affect the hidden data. In addition, while the file was compressed the hidden data was not readable with the hex editor. The compressed files containing hidden data were larger than the compressed originals, because substituting hidden data for the redundant bytes reduced the redundancy available to the compressor.
This could possibly be a red flag for hidden data if the reduction ratios of files were used to check file sizes. [1]
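The size effect described above can be illustrated with a small Python sketch. It is an assumption-laden stand-in rather than a reproduction of the experiment: zlib's DEFLATE is used in place of WinZip, the cover is a synthetic buffer, and random bytes stand in for (for example, encrypted) hidden data. The name `compressed_size` is hypothetical.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib (DEFLATE) compression at level 9."""
    return len(zlib.compress(data, 9))


# Synthetic cover: a fake header, 1024 filler bytes of 0x00, a fake trailer.
clean = b"HEADER" + bytes(1024) + b"TRAILER"

# Overwrite part of the filler with 512 random bytes (stand-in payload).
payload = os.urandom(512)
stego = clean[:6] + payload + clean[6 + len(payload):]

clean_size = compressed_size(clean)
stego_size = compressed_size(stego)
print(len(clean), len(stego))   # identical uncompressed lengths
print(clean_size, stego_size)   # the stego file compresses far less well
```

Both buffers have the same uncompressed length, so a plain size check would see no difference, but the compressed sizes diverge sharply. Comparing compression ratios is one possible way to apply the red-flag idea mentioned above.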
  • 18. Limitations of Existing Text-based Steganographic Techniques
The major drawbacks of the techniques cited above are as follows. Data hidden in .doc files is lost when the file is saved in PDF or ASCII text format. An increase or decrease in line or word spacing is eye-catching, and so is the separation of words or lines with extra spaces. Extra spaces placed at the end of a sentence can go unnoticed, except when one selects a page or an entire document for copying, where the extra spaces become prominent. Adding spaces past the end-of-file mark can raise suspicion because of the increased file length.
Proposed Solution
To date, no known text-based data hiding technique exists that can hide information without increasing or decreasing the document length and/or altering the appearance of the text. The proposed thesis aims to develop a coding technique that will hide data within the actual contents of the text file used as cover, addressing all of the existing drawbacks of text-based steganographic systems, duly supported by a complete software solution. This will eliminate the possibility of losing hidden data when the text is compressed or converted to the PDF file format. In addition, anyone in possession of the actual cover will not find a change in the contents and layout of the stego-text document on comparison.
Application
This technique can best be applied to web pages for unnoticed global interaction, where attention is primarily focused on images and text spacing. A real-time demonstration of this will also be given.
  • 19. References
2. Jonathan Cummins, Patrick Diskin, Samuel Lau and Robert Parlett, "Steganography and Digital Watermarking", School of Computer Science, The University of Birmingham, 2004.